<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/x86/kernel/dumpstack_64.c, branch v6.0</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>x86/mm/64: Improve stack overflow warnings</title>
<updated>2021-09-21T11:57:43+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-09-15T15:12:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=44b979fa302cab91bdd2cc982823e5c13202cd4e'/>
<id>44b979fa302cab91bdd2cc982823e5c13202cd4e</id>
<content type='text'>
Current code has an explicit check for hitting the task stack guard;
but overflowing any of the other stacks will get you a non-descript
general #DF warning.

Improve matters by using get_stack_info_noinstr() to detetrmine if and
which stack guard page got hit, enabling a better stack warning.

In specific, Michael Wang reported what turned out to be an NMI
exception stack overflow, which is now clearly reported as such:

  [] BUG: NMI stack guard page was hit at 0000000085fd977b (stack is 000000003a55b09e..00000000d8cce1a5)

Reported-by: Michael Wang &lt;yun.wang@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Michael Wang &lt;yun.wang@linux.alibaba.com&gt;
Link: https://lkml.kernel.org/r/YUTE/NuqnaWbST8n@hirez.programming.kicks-ass.net
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current code has an explicit check for hitting the task stack guard;
but overflowing any of the other stacks will get you a non-descript
general #DF warning.

Improve matters by using get_stack_info_noinstr() to detetrmine if and
which stack guard page got hit, enabling a better stack warning.

In specific, Michael Wang reported what turned out to be an NMI
exception stack overflow, which is now clearly reported as such:

  [] BUG: NMI stack guard page was hit at 0000000085fd977b (stack is 000000003a55b09e..00000000d8cce1a5)

Reported-by: Michael Wang &lt;yun.wang@linux.alibaba.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Michael Wang &lt;yun.wang@linux.alibaba.com&gt;
Link: https://lkml.kernel.org/r/YUTE/NuqnaWbST8n@hirez.programming.kicks-ass.net
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/irq/64: Adjust the per CPU irq stack pointer by 8</title>
<updated>2021-02-10T22:34:14+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-02-09T23:40:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=951c2a51ae75382d519839e2308394ad43ce4b40'/>
<id>951c2a51ae75382d519839e2308394ad43ce4b40</id>
<content type='text'>
The per CPU hardirq_stack_ptr contains the pointer to the irq stack in the
form that it is ready to be assigned to [ER]SP so that the first push ends
up on the top entry of the stack.

But the stack switching on 64 bit has the following rules:

    1) Store the current stack pointer (RSP) in the top most stack entry
       to allow the unwinder to link back to the previous stack

    2) Set RSP to the top most stack entry

    3) Invoke functions on the irq stack

    4) Pop RSP from the top most stack entry (stored in #1) so it's back
       to the original stack.

That requires all stack switching code to decrement the stored pointer by 8
in order to be able to store the current RSP and then set RSP to that
location. That's a pointless exercise.

Do the -8 adjustment right when storing the pointer and make the data type
a void pointer to avoid confusion vs. the struct irq_stack data type which
is on 64bit only used to declare the backing store. Move the definition
next to the inuse flag so they likely end up in the same cache
line. Sticking them into a struct to enforce it is a seperate change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20210210002512.354260928@linutronix.de

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The per CPU hardirq_stack_ptr contains the pointer to the irq stack in the
form that it is ready to be assigned to [ER]SP so that the first push ends
up on the top entry of the stack.

But the stack switching on 64 bit has the following rules:

    1) Store the current stack pointer (RSP) in the top most stack entry
       to allow the unwinder to link back to the previous stack

    2) Set RSP to the top most stack entry

    3) Invoke functions on the irq stack

    4) Pop RSP from the top most stack entry (stored in #1) so it's back
       to the original stack.

That requires all stack switching code to decrement the stored pointer by 8
in order to be able to store the current RSP and then set RSP to that
location. That's a pointless exercise.

Do the -8 adjustment right when storing the pointer and make the data type
a void pointer to avoid confusion vs. the struct irq_stack data type which
is on 64bit only used to declare the backing store. Move the definition
next to the inuse flag so they likely end up in the same cache
line. Sticking them into a struct to enforce it is a seperate change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20210210002512.354260928@linutronix.de

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/dumpstack/64: Add noinstr version of get_stack_info()</title>
<updated>2020-09-09T09:33:19+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2020-09-07T13:15:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6b27edd74a5e9669120f7bd0ae1f475d124c1042'/>
<id>6b27edd74a5e9669120f7bd0ae1f475d124c1042</id>
<content type='text'>
The get_stack_info() functionality is needed in the entry code for the
#VC exception handler. Provide a version of it in the .text.noinstr
section which can be called safely from there.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200907131613.12703-45-joro@8bytes.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The get_stack_info() functionality is needed in the entry code for the
#VC exception handler. Provide a version of it in the .text.noinstr
section which can be called safely from there.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200907131613.12703-45-joro@8bytes.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/sev-es: Allocate and map an IST stack for #VC handler</title>
<updated>2020-09-09T09:33:19+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2020-09-07T13:15:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=02772fb9b68e6a72a5e17f994048df832fe2b15e'/>
<id>02772fb9b68e6a72a5e17f994048df832fe2b15e</id>
<content type='text'>
Allocate and map an IST stack and an additional fall-back stack for
the #VC handler.  The memory for the stacks is allocated only when
SEV-ES is active.

The #VC handler needs to use an IST stack because a #VC exception can be
raised from kernel space with unsafe stack, e.g. in the SYSCALL entry
path.

Since the #VC exception can be nested, the #VC handler switches back to
the interrupted stack when entered from kernel space. If switching back
is not possible, the fall-back stack is used.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200907131613.12703-43-joro@8bytes.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allocate and map an IST stack and an additional fall-back stack for
the #VC handler.  The memory for the stacks is allocated only when
SEV-ES is active.

The #VC handler needs to use an IST stack because a #VC exception can be
raised from kernel space with unsafe stack, e.g. in the SYSCALL entry
path.

Since the #VC exception can be nested, the #VC handler switches back to
the interrupted stack when entered from kernel space. If switching back
is not possible, the fall-back stack is used.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200907131613.12703-43-joro@8bytes.org
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/entry: Remove DBn stacks</title>
<updated>2020-06-11T13:15:23+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-05-29T21:27:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fd501d4f0399700011acde486576c7c1eb8e7a61'/>
<id>fd501d4f0399700011acde486576c7c1eb8e7a61</id>
<content type='text'>
Both #DB itself, as all other IST users (NMI, #MC) now clear DR7 on
entry. Combined with not allowing breakpoints on entry/noinstr/NOKPROBE
text and no single step (EFLAGS.TF) inside the #DB handler should guarantee
no nested #DB.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/20200529213321.303027161@infradead.org



</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both #DB itself, as all other IST users (NMI, #MC) now clear DR7 on
entry. Combined with not allowing breakpoints on entry/noinstr/NOKPROBE
text and no single step (EFLAGS.TF) inside the #DB handler should guarantee
no nested #DB.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/20200529213321.303027161@infradead.org



</pre>
</div>
</content>
</entry>
<entry>
<title>x86/unwind: Prevent false warnings for non-current tasks</title>
<updated>2020-04-25T10:22:28+00:00</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@redhat.com</email>
</author>
<published>2020-04-25T10:03:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b08418b54831255a7e3700d6bf7dfc2bdae25cd7'/>
<id>b08418b54831255a7e3700d6bf7dfc2bdae25cd7</id>
<content type='text'>
There's some daring kernel code out there which dumps the stack of
another task without first making sure the task is inactive.  If the
task happens to be running while the unwinder is reading the stack,
unusual unwinder warnings can result.

There's no race-free way for the unwinder to know whether such a warning
is legitimate, so just disable unwinder warnings for all non-current
tasks.

Reviewed-by: Miroslav Benes &lt;mbenes@suse.cz&gt;
Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Dave Jones &lt;dsj@fb.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's some daring kernel code out there which dumps the stack of
another task without first making sure the task is inactive.  If the
task happens to be running while the unwinder is reading the stack,
unusual unwinder warnings can result.

There's no race-free way for the unwinder to know whether such a warning
is legitimate, so just disable unwinder warnings for all non-current
tasks.

Reviewed-by: Miroslav Benes &lt;mbenes@suse.cz&gt;
Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Dave Jones &lt;dsj@fb.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/dumpstack/64: Don't evaluate exception stacks before setup</title>
<updated>2019-11-04T23:51:35+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-10-23T18:05:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e361362b08cab1098b64b0e5fd8c879f086b3f46'/>
<id>e361362b08cab1098b64b0e5fd8c879f086b3f46</id>
<content type='text'>
Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  &lt;- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep-&gt;size;
     ==&gt; end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==&gt; regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info-&gt;next_sp   = (unsigned long *)regs-&gt;sp;
     ==&gt; Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  &lt;- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep-&gt;size;
     ==&gt; end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==&gt; regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info-&gt;next_sp   = (unsigned long *)regs-&gt;sp;
     ==&gt; Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de

</pre>
</div>
</content>
</entry>
<entry>
<title>x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr</title>
<updated>2019-04-17T13:27:10+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-04-14T16:00:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=758a2e312228410f2f5092ade558109e93dc3ee8'/>
<id>758a2e312228410f2f5092ade558109e93dc3ee8</id>
<content type='text'>
Preparatory patch to share code with 32bit.

No functional changes.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: "Chang S. Bae" &lt;chang.seok.bae@intel.com&gt;
Cc: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Cc: Nicolai Stange &lt;nstange@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Cc: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190414160145.912584074@linutronix.de
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Preparatory patch to share code with 32bit.

No functional changes.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: "Chang S. Bae" &lt;chang.seok.bae@intel.com&gt;
Cc: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Cc: Nicolai Stange &lt;nstange@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Cc: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190414160145.912584074@linutronix.de
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/dumpstack/64: Speedup in_exception_stack()</title>
<updated>2019-04-17T13:16:57+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-04-14T15:59:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c450c8f532b63475b30e29bc600c25ab0a4ab282'/>
<id>c450c8f532b63475b30e29bc600c25ab0a4ab282</id>
<content type='text'>
The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is an useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least 2
iterations to get to the NMI stack.

As the exception stacks and the guard pages are page aligned the loop can
be avoided completely.

Add a initial check whether the stack pointer is inside the full exception
stack area and leave early if not.

Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.

The table is filled at compile time and for the !KASAN case the interesting
page descriptors exactly fit into a single cache line. Just the last guard
page descriptor is in the next cacheline, but that should not be accessed
in the regular case.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190414160145.543320386@linutronix.de
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is an useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least 2
iterations to get to the NMI stack.

As the exception stacks and the guard pages are page aligned the loop can
be avoided completely.

Add a initial check whether the stack pointer is inside the full exception
stack area and leave early if not.

Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.

The table is filled at compile time and for the !KASAN case the interesting
page descriptors exactly fit into a single cache line. Just the last guard
page descriptor is in the next cacheline, but that should not be accessed
in the regular case.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190414160145.543320386@linutronix.de
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/exceptions: Split debug IST stack</title>
<updated>2019-04-17T13:14:28+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-04-14T15:59:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2a594d4ccf3f10f80b77d71bd3dad10813ac0137'/>
<id>2a594d4ccf3f10f80b77d71bd3dad10813ac0137</id>
<content type='text'>
The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU starts always at top of stack
on exception entry, which means on #DB recursion the second #DB would
overwrite the stack of the first.

The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent without a guard page between them.

Split the debug stack into 3 stacks which are separated by guard pages. The
3rd stack is never mapped into the cpu_entry_area and is only there to
catch triple #DB nesting:

      --- top of DB_stack	&lt;- Initial stack
      --- end of DB_stack
      	  guard page

      --- top of DB1_stack	&lt;- Top of stack after entering first #DB
      --- end of DB1_stack
      	  guard page

      --- top of DB2_stack	&lt;- Top of stack after entering second #DB
      --- end of DB2_stack
      	  guard page

If DB2 would not act as the final guard hole, a second #DB would point the
top of #DB stack to the stack below #DB1 which would be valid and not catch
the not so desired triple nesting.

The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.

 - Adjust the low level entry code so it adjusts top of #DB with the offset
   between the stacks instead of exception stack size.

 - Make the dumpstack code aware of the new stacks.

 - Adjust the in_debug_stack() implementation and move it into the NMI code
   where it belongs. As this is NMI hotpath code, it just checks the full
   area between top of DB_stack and bottom of DB1_stack without checking
   for the guard page. That's correct because the NMI cannot hit a
   stackpointer pointing to the guard page between DB and DB1 stack.  Even
   if it would, then the NMI operation still is unaffected, but the resume
   of the debug exception on the topmost DB stack will crash by touching
   the guard page.

  [ bp: Make exception_stack_names static const char * const ]

Suggested-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: "Chang S. Bae" &lt;chang.seok.bae@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Joerg Roedel &lt;jroedel@suse.de&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: linux-doc@vger.kernel.org
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU starts always at top of stack
on exception entry, which means on #DB recursion the second #DB would
overwrite the stack of the first.

The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent without a guard page between them.

Split the debug stack into 3 stacks which are separated by guard pages. The
3rd stack is never mapped into the cpu_entry_area and is only there to
catch triple #DB nesting:

      --- top of DB_stack	&lt;- Initial stack
      --- end of DB_stack
      	  guard page

      --- top of DB1_stack	&lt;- Top of stack after entering first #DB
      --- end of DB1_stack
      	  guard page

      --- top of DB2_stack	&lt;- Top of stack after entering second #DB
      --- end of DB2_stack
      	  guard page

If DB2 would not act as the final guard hole, a second #DB would point the
top of #DB stack to the stack below #DB1 which would be valid and not catch
the not so desired triple nesting.

The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.

 - Adjust the low level entry code so it adjusts top of #DB with the offset
   between the stacks instead of exception stack size.

 - Make the dumpstack code aware of the new stacks.

 - Adjust the in_debug_stack() implementation and move it into the NMI code
   where it belongs. As this is NMI hotpath code, it just checks the full
   area between top of DB_stack and bottom of DB1_stack without checking
   for the guard page. That's correct because the NMI cannot hit a
   stackpointer pointing to the guard page between DB and DB1 stack.  Even
   if it would, then the NMI operation still is unaffected, but the resume
   of the debug exception on the topmost DB stack will crash by touching
   the guard page.

  [ bp: Make exception_stack_names static const char * const ]

Suggested-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: "Chang S. Bae" &lt;chang.seok.bae@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Joerg Roedel &lt;jroedel@suse.de&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: linux-doc@vger.kernel.org
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Cc: x86-ml &lt;x86@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de
</pre>
</div>
</content>
</entry>
</feed>
