linux.git/arch/x86/kernel/dumpstack_64.c, branch v6.0

x86/mm/64: Improve stack overflow warnings

2021-09-21T11:57:43+00:00

Current code has an explicit check for hitting the task stack guard,
but overflowing any of the other stacks only gets you a nondescript,
generic #DF warning.

Improve matters by using get_stack_info_noinstr() to determine whether,
and which, stack guard page was hit, enabling a more specific warning.

Specifically, Michael Wang reported what turned out to be an NMI
exception stack overflow, which is now clearly reported as such:

  [] BUG: NMI stack guard page was hit at 0000000085fd977b (stack is 000000003a55b09e..00000000d8cce1a5)
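A minimal user-space sketch of the guard page check follows. It assumes
that every stack sits directly above a one-page guard and that a lookup
helper in the spirit of get_stack_info_noinstr() reports which stack, if
any, contains an address. The struct, the table, the addresses and all
sketch_* names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the kernel's struct stack_info. */
struct sketch_stack {
	const char *name;  /* e.g. "NMI" */
	uintptr_t begin;   /* lowest usable address */
	uintptr_t end;     /* one past the highest usable address */
};

/* Hypothetical per-CPU layout: each stack sits directly above one guard page. */
static const struct sketch_stack sketch_stacks[] = {
	{ "DF",  0x11000, 0x13000 },
	{ "NMI", 0x14000, 0x16000 },
	{ "MCE", 0x17000, 0x19000 },
};

/* Return the stack containing addr, or NULL (plays the role of get_stack_info_noinstr()). */
static const struct sketch_stack *sketch_lookup(uintptr_t addr)
{
	for (size_t i = 0; i < sizeof(sketch_stacks) / sizeof(sketch_stacks[0]); i++)
		if (addr >= sketch_stacks[i].begin && addr < sketch_stacks[i].end)
			return &sketch_stacks[i];
	return NULL;
}

/*
 * A fault address hit a guard page if it is not inside any stack, but the
 * address one page higher is: stacks grow down, so an overflow runs into
 * the page just below the stack's lowest usable address.
 */
static const struct sketch_stack *sketch_guard_hit(uintptr_t fault_addr)
{
	if (sketch_lookup(fault_addr))
		return NULL;
	return sketch_lookup(fault_addr + SKETCH_PAGE_SIZE);
}
```

With this made-up layout, a fault at 0x13ff8 is attributed to the NMI
stack's guard page, while a fault inside a stack proper is not treated
as a guard page hit at all.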

Reported-by: Michael Wang 
Signed-off-by: Peter Zijlstra (Intel) 
Tested-by: Michael Wang 
Link: https://lkml.kernel.org/r/YUTE/NuqnaWbST8n@hirez.programming.kicks-ass.net

x86/irq/64: Adjust the per CPU irq stack pointer by 8

2021-02-10T22:34:14+00:00

The per CPU hardirq_stack_ptr contains the pointer to the irq stack in the
form that it is ready to be assigned to [ER]SP so that the first push ends
up on the top entry of the stack.

But the stack switching on 64 bit has the following rules:

    1) Store the current stack pointer (RSP) in the top most stack entry
       to allow the unwinder to link back to the previous stack

    2) Set RSP to the top most stack entry

    3) Invoke functions on the irq stack

    4) Pop RSP from the top most stack entry (stored in #1) so it's back
       to the original stack.

That requires all stack switching code to decrement the stored pointer by 8
in order to be able to store the current RSP and then set RSP to that
location. That's a pointless exercise.

Do the -8 adjustment right when storing the pointer, and make the data type
a void pointer to avoid confusion with the struct irq_stack data type, which
on 64-bit is only used to declare the backing store. Move the definition
next to the inuse flag so that they likely end up in the same cache
line. Putting them into a struct to enforce this is a separate change.
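The four rules, with the -8 adjustment done once when the pointer is
stored, can be modeled in user-space C. This is a sketch only: a plain
array stands in for the irq stack, a global variable stands in for RSP,
and all sketch_* names are hypothetical. The real switch is done in
inline assembly, not in a C function.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STACK_WORDS 512

/* Hypothetical backing store, standing in for struct irq_stack. */
static uint64_t sketch_irq_stack[SKETCH_STACK_WORDS];

/*
 * Points at the top-most 8-byte entry ("top of stack - 8"). Doing the
 * adjustment once, when the pointer is stored, spares every switch site.
 */
static void *sketch_hardirq_stack_ptr = &sketch_irq_stack[SKETCH_STACK_WORDS - 1];

/* Simulated RSP. */
static uint64_t *sketch_rsp;

static void sketch_run_on_irq_stack(void (*func)(void))
{
	uint64_t *top = sketch_hardirq_stack_ptr;

	*top = (uint64_t)(uintptr_t)sketch_rsp;    /* 1) save RSP in the top-most entry */
	sketch_rsp = top;                          /* 2) RSP = top-most entry */
	func();                                    /* 3) run on the irq stack */
	sketch_rsp = (uint64_t *)(uintptr_t)*top;  /* 4) restore the saved RSP */
}

/* Records the simulated RSP observed while running on the irq stack. */
static uint64_t *sketch_seen_rsp;
static void sketch_record_rsp(void) { sketch_seen_rsp = sketch_rsp; }
```

Calling sketch_run_on_irq_stack(sketch_record_rsp) with the simulated RSP
pointing into a task stack runs the callback with RSP at the top-most irq
stack entry, and leaves RSP back at its original value afterwards.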

Signed-off-by: Thomas Gleixner 
Reviewed-by: Kees Cook 
Link: https://lore.kernel.org/r/20210210002512.354260928@linutronix.de

x86/dumpstack/64: Add noinstr version of get_stack_info()

2020-09-09T09:33:19+00:00

The get_stack_info() functionality is needed in the entry code for the
#VC exception handler. Provide a version of it in the .text.noinstr
section which can be called safely from there.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-45-joro@8bytes.org

x86/sev-es: Allocate and map an IST stack for #VC handler

2020-09-09T09:33:19+00:00

Allocate and map an IST stack and an additional fall-back stack for
the #VC handler.  The memory for the stacks is allocated only when
SEV-ES is active.

The #VC handler needs to use an IST stack because a #VC exception can be
raised from kernel space with an unsafe stack, e.g. in the SYSCALL entry
path.

Since the #VC exception can be nested, the #VC handler switches back to
the interrupted stack when entered from kernel space. If switching back
is not possible, the fall-back stack is used.
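The stack selection described above might look roughly like this; it is
also where a get_stack_info_noinstr() style classification of the
interrupted stack pointer is needed. This is a hypothetical sketch: the
enum, the function and the assumption that entry and unknown stacks are
unsafe to resume on are illustrative, not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the interrupted kernel stack pointer. */
enum sketch_stack_type {
	SKETCH_STACK_UNKNOWN,
	SKETCH_STACK_TASK,
	SKETCH_STACK_IRQ,
	SKETCH_STACK_ENTRY,
	SKETCH_STACK_EXCEPTION,
};

/*
 * Pick the stack the #VC handler continues on after entering on its IST
 * stack from kernel space. Switching back to the interrupted stack frees
 * the IST stack for a nested #VC; if the interrupted stack cannot be
 * trusted (assumed here: entry or unknown stacks), use the fall-back stack.
 */
static uintptr_t sketch_vc_resume_sp(enum sketch_stack_type type,
				     uintptr_t interrupted_sp,
				     uintptr_t fallback_top)
{
	switch (type) {
	case SKETCH_STACK_TASK:
	case SKETCH_STACK_IRQ:
	case SKETCH_STACK_EXCEPTION:
		return interrupted_sp;
	default:
		return fallback_top;
	}
}
```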

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-43-joro@8bytes.org

x86/entry: Remove DBn stacks

2020-06-11T13:15:23+00:00

#DB itself, as well as all other IST users (NMI, #MC), now clears DR7 on
entry. Combined with disallowing breakpoints on entry/noinstr/NOKPROBE
text and with no single-stepping (EFLAGS.TF) inside the #DB handler, this
should guarantee that #DB never nests.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Link: https://lkml.kernel.org/r/20200529213321.303027161@infradead.org

x86/unwind: Prevent false warnings for non-current tasks

2020-04-25T10:22:28+00:00

There's some daring kernel code out there which dumps the stack of
another task without first making sure the task is inactive.  If the
task happens to be running while the unwinder is reading the stack,
unusual unwinder warnings can result.

There's no race-free way for the unwinder to know whether such a warning
is legitimate, so just disable unwinder warnings for all non-current
tasks.
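A sketch of the policy, assuming a single global pointer stands in for
the kernel's current and that the unwinder state records which task is
being walked. The types and the sketch_unwind_warn() helper are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct sketch_task { int pid; };

/* Hypothetical stand-in for the kernel's "current" task. */
static struct sketch_task *sketch_current;

/* Hypothetical unwinder state: which task's stack is being walked. */
struct sketch_unwind_state {
	struct sketch_task *task;
	int warnings;  /* number of warnings actually emitted */
};

/*
 * Warn only when unwinding the current task. A non-current task may be
 * running concurrently and changing its stack under the unwinder, so
 * anomalies found there cannot be told apart from real bugs.
 */
static void sketch_unwind_warn(struct sketch_unwind_state *state, const char *msg)
{
	if (state->task != sketch_current)
		return;
	state->warnings++;
	fprintf(stderr, "WARNING: unwind: %s\n", msg);
}
```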

Reviewed-by: Miroslav Benes 
Signed-off-by: Josh Poimboeuf 
Signed-off-by: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Jones 
Cc: Jann Horn 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com

x86/dumpstack/64: Don't evaluate exception stacks before setup

2019-11-04T23:51:35+00:00

Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized, in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  <- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered an exception stack
address, and the subsequent code will then try to dereference the resulting
stack frame at an unmapped address.

    end = begin + (unsigned long)ep->size;
        ==> end = 0x2000

    regs = (struct pt_regs *)end - 1;
        ==> regs = 0x2000 - sizeof(struct pt_regs) = 0x1f58

    info->next_sp   = (unsigned long *)regs->sp;
        ==> Crashes due to accessing &regs->sp = 0x1ff0

Prevent this by checking the validity of the cea_exception_stacks base
address and bailing out if it is zero.
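A minimal sketch of the fix, assuming a single global stands in for
this_cpu_read(cea_exception_stacks) and a made-up size for the exception
stack area; the sketch_* names are hypothetical. The point is the early
bail-out on a zero base, which keeps [0, end) from being classified as
exception stack memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size of the whole per-CPU exception stack area. */
#define SKETCH_ESTACKS_SIZE 0x8000UL

/*
 * Stand-in for this_cpu_read(cea_exception_stacks): stays 0 until the
 * cpu_entry_area mappings have been set up.
 */
static uintptr_t sketch_cea_exception_stacks;

static bool sketch_in_exception_area(uintptr_t stk)
{
	uintptr_t begin = sketch_cea_exception_stacks;
	uintptr_t end;

	/* Not set up yet: do not treat [0, end) as an exception stack. */
	if (!begin)
		return false;

	end = begin + SKETCH_ESTACKS_SIZE;
	return stk >= begin && stk < end;
}
```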

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov 
Signed-off-by: Thomas Gleixner 
Tested-by: Cyrill Gorcunov 
Acked-by: Josh Poimboeuf 
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de

x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr

2019-04-17T13:27:10+00:00

Preparatory patch to share code with 32bit.

No functional changes.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov 
Cc: Alexey Dobriyan 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: "Chang S. Bae" 
Cc: Dominik Brodowski 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Jiri Kosina 
Cc: Josh Poimboeuf 
Cc: Konrad Rzeszutek Wilk 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: Nick Desaulniers 
Cc: Nicolai Stange 
Cc: Peter Zijlstra 
Cc: Pingfan Liu 
Cc: Sean Christopherson 
Cc: Stephen Rothwell 
Cc: Vlastimil Babka 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190414160145.912584074@linutronix.de

x86/dumpstack/64: Speedup in_exception_stack()

2019-04-17T13:16:57+00:00

The current implementation of in_exception_stack() iterates over the
exception stacks array. Most of the time this is a useless exercise, but
even for the actual use cases (perf and ftrace) it takes at least 2
iterations to get to the NMI stack.

As the exception stacks and the guard pages are page aligned, the loop can
be avoided completely.

Add an initial check whether the stack pointer is inside the full exception
stack area, and leave early if not.

Create a lookup table which describes the stack area. The table index is
the page offset from the beginning of the exception stacks. So for any
given stack pointer the page offset is computed and a lookup in the
description table is performed. If it is inside a guard page, return. If
not, use the descriptor to fill in the info structure.

The table is filled at compile time, and for the !KASAN case the interesting
page descriptors fit exactly into a single cache line. Only the last guard
page descriptor is in the next cache line, but that should not be accessed
in the regular case.
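The page-indexed lookup can be sketched in user-space C. The layout
(guard, DF, guard, NMI, guard), the descriptor fields and all sketch_*
names are assumptions chosen for illustration; a descriptor with size 0
marks a guard page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

enum { SKETCH_GUARD = 0, SKETCH_DF = 1, SKETCH_NMI = 2 };

/* Hypothetical per-page descriptor; size == 0 marks a guard page. */
struct sketch_estack_page {
	uint32_t offs;  /* offset of the owning stack from the area start */
	uint16_t size;  /* size of the owning stack */
	uint16_t type;  /* which exception stack */
};

/* One entry per page: guard | DF (2 pages) | guard | NMI (2 pages) | guard */
static const struct sketch_estack_page sketch_pages[] = {
	{ 0,      0,      SKETCH_GUARD },
	{ 0x1000, 0x2000, SKETCH_DF },
	{ 0x1000, 0x2000, SKETCH_DF },
	{ 0,      0,      SKETCH_GUARD },
	{ 0x4000, 0x2000, SKETCH_NMI },
	{ 0x4000, 0x2000, SKETCH_NMI },
	{ 0,      0,      SKETCH_GUARD },
};
#define SKETCH_NPAGES (sizeof(sketch_pages) / sizeof(sketch_pages[0]))

struct sketch_info { int type; uintptr_t begin, end; };

static bool sketch_in_exception_stack(uintptr_t base, uintptr_t stk,
				      struct sketch_info *info)
{
	const struct sketch_estack_page *ep;
	uintptr_t k;

	/* Early exit: outside the whole area, the common case. */
	if (stk < base || stk >= base + SKETCH_NPAGES * SKETCH_PAGE_SIZE)
		return false;

	/* The page offset from the area start indexes the table: no loop. */
	k = (stk - base) >> SKETCH_PAGE_SHIFT;
	ep = &sketch_pages[k];
	if (!ep->size)  /* guard page */
		return false;

	info->type = ep->type;
	info->begin = base + ep->offs;
	info->end = info->begin + ep->size;
	return true;
}
```

The loop is replaced by one subtraction, one shift and one array access,
and the bounds check in front of it keeps the common case (stack pointer
not on an exception stack) down to two comparisons.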

Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov 
Acked-by: Josh Poimboeuf 
Cc: "H. Peter Anvin" 
Cc: Andy Lutomirski 
Cc: Ingo Molnar 
Cc: Josh Poimboeuf 
Cc: Sean Christopherson 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190414160145.543320386@linutronix.de

x86/exceptions: Split debug IST stack

2019-04-17T13:14:28+00:00

The debug IST stack is actually two separate debug stacks to handle #DB
recursion. This is required because the CPU always starts at the top of the
stack on exception entry, which means that on #DB recursion the second #DB
would overwrite the stack of the first.

The low level entry code therefore adjusts the top of stack on entry so a
secondary #DB starts from a different stack page. But the stack pages are
adjacent without a guard page between them.

Split the debug stack into 3 stacks which are separated by guard pages. The
3rd stack is never mapped into the cpu_entry_area and is only there to
catch triple #DB nesting:

      --- top of DB_stack	<- Initial stack
      --- end of DB_stack
      	  guard page

      --- top of DB1_stack	<- Top of stack after entering first #DB
      --- end of DB1_stack
      	  guard page

      --- top of DB2_stack	<- Top of stack after entering second #DB
      --- end of DB2_stack
      	  guard page

If DB2 did not act as the final guard hole, a second #DB would point the
top of the #DB stack to the stack below DB1, which would be valid memory
and would therefore not catch the undesired triple nesting.
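The nesting arithmetic can be sketched as follows, assuming a
hypothetical one-page #DB stack size and a one-page guard below each
stack. Per nesting level the IST top moves down by one stack plus one
guard page, so the third level lands in the unmapped DB2 area. Names and
sizes are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_DB_STKSZ  4096UL  /* hypothetical per-stack size */

/* Distance between the tops of adjacent #DB stacks: one stack plus its guard page. */
#define SKETCH_DB_OFFSET (SKETCH_DB_STKSZ + SKETCH_PAGE_SIZE)

/* Top of stack used by the #DB at the given nesting depth (0 = first #DB). */
static uintptr_t sketch_db_top(uintptr_t db_top, unsigned int depth)
{
	return db_top - depth * SKETCH_DB_OFFSET;
}

/*
 * Only DB_stack and DB1_stack are mapped; DB2 and the guard pages are not.
 * Returns true if the first push at the given depth hits mapped memory.
 */
static bool sketch_db_push_ok(uintptr_t db_top, unsigned int depth)
{
	uintptr_t slot = sketch_db_top(db_top, depth) - 8;
	uintptr_t db_begin = db_top - SKETCH_DB_STKSZ;
	uintptr_t db1_top = db_top - SKETCH_DB_OFFSET;
	uintptr_t db1_begin = db1_top - SKETCH_DB_STKSZ;

	return (slot >= db_begin && slot < db_top) ||
	       (slot >= db1_begin && slot < db1_top);
}
```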

The backing store does not allocate any memory for DB2 and its guard page
as it is not going to be mapped into the cpu_entry_area.

 - Adjust the low level entry code so it adjusts top of #DB with the offset
   between the stacks instead of exception stack size.

 - Make the dumpstack code aware of the new stacks.

 - Adjust the in_debug_stack() implementation and move it into the NMI code
   where it belongs. As this is NMI hotpath code, it just checks the full
   area between the top of DB_stack and the bottom of DB1_stack without
   checking for the guard page. That's correct because the NMI cannot hit a
   stack pointer pointing to the guard page between the DB and DB1 stacks.
   Even if it did, the NMI operation would still be unaffected, but resuming
   the debug exception on the topmost DB stack would crash by touching the
   guard page.
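The NMI-side check reduces to a single range comparison. In this sketch
the layout struct, the addresses and sketch_in_debug_stack() are
hypothetical; a stack pointer inside the guard page between DB and DB1
is deliberately accepted, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, e.g. with db_top = 0x403000, db1_bot = 0x400000:
 *   0x402000..0x403000  DB_stack
 *   0x401000..0x402000  guard page
 *   0x400000..0x401000  DB1_stack
 */
struct sketch_db_layout {
	uintptr_t db_top;   /* top of DB_stack */
	uintptr_t db1_bot;  /* bottom of DB1_stack */
};

/*
 * One range check over DB_stack, the guard page and DB1_stack, without
 * excluding the guard page: cheap enough for the NMI hot path.
 */
static bool sketch_in_debug_stack(const struct sketch_db_layout *l, uintptr_t sp)
{
	return sp >= l->db1_bot && sp < l->db_top;
}
```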

  [ bp: Make exception_stack_names static const char * const ]

Suggested-by: Andy Lutomirski 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov 
Reviewed-by: Sean Christopherson 
Cc: Andy Lutomirski 
Cc: Baoquan He 
Cc: "Chang S. Bae" 
Cc: Dave Hansen 
Cc: Dominik Brodowski 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Joerg Roedel 
Cc: Jonathan Corbet 
Cc: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: "Kirill A. Shutemov" 
Cc: Konrad Rzeszutek Wilk 
Cc: linux-doc@vger.kernel.org
Cc: Masahiro Yamada 
Cc: Peter Zijlstra 
Cc: Qian Cai 
Cc: Sean Christopherson 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190414160145.439944544@linutronix.de