<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/arch/x86/kernel/dumpstack_64.c, branch v3.8</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>x86: Move call to print_modules() out of show_regs()</title>
<updated>2012-06-20T12:33:48+00:00</updated>
<author>
<name>Jan Beulich</name>
<email>JBeulich@suse.com</email>
</author>
<published>2012-06-18T10:40:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0fa0e2f02e8edfbdb5f86d1cab0fa6dc0517489f'/>
<id>0fa0e2f02e8edfbdb5f86d1cab0fa6dc0517489f</id>
<content type='text'>
Printing the list of loaded modules is really unrelated to what
this function is about, and is particularly unnecessary in the
context of the SysRQ key handling (where it has so far been printed
over and over).

It should really be the caller of the function to decide whether
this piece of information is useful (and to avoid redundantly
printing it).

Signed-off-by: Jan Beulich &lt;jbeulich@suse.com&gt;
Link: http://lkml.kernel.org/r/4FDF21A4020000780008A67F@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Printing the list of loaded modules is really unrelated to what
this function is about, and is particularly unnecessary in the
context of the SysRQ key handling (where it has so far been printed
over and over).

It should really be the caller of the function to decide whether
this piece of information is useful (and to avoid redundantly
printing it).

Signed-off-by: Jan Beulich &lt;jbeulich@suse.com&gt;
Link: http://lkml.kernel.org/r/4FDF21A4020000780008A67F@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/debug: Add KERN_&lt;LEVEL&gt; to bare printks, convert printks to pr_&lt;level&gt;</title>
<updated>2012-06-06T07:17:22+00:00</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2012-05-22T02:50:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c767a54ba0657e52e6edaa97cbe0b0a8bf1c1655'/>
<id>c767a54ba0657e52e6edaa97cbe0b0a8bf1c1655</id>
<content type='text'>
Use a more current logging style:

 - Bare printks should have a KERN_&lt;LEVEL&gt; for consistency's sake
 - Add pr_fmt where appropriate
 - Neaten some macro definitions
 - Convert some Ok output to OK
 - Use "%s: ", __func__ in pr_fmt for summit
 - Convert some printks to pr_&lt;level&gt;

Message output is not identical in all cases.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Cc: levinsasha928@gmail.com
Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop
[ merged two similar patches, tidied up the changelog ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use a more current logging style:

 - Bare printks should have a KERN_&lt;LEVEL&gt; for consistency's sake
 - Add pr_fmt where appropriate
 - Neaten some macro definitions
 - Convert some Ok output to OK
 - Use "%s: ", __func__ in pr_fmt for summit
 - Convert some printks to pr_&lt;level&gt;

Message output is not identical in all cases.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Cc: levinsasha928@gmail.com
Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop
[ merged two similar patches, tidied up the changelog ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Avoid double stack traces with show_regs()</title>
<updated>2012-05-09T09:44:42+00:00</updated>
<author>
<name>Jan Beulich</name>
<email>JBeulich@suse.com</email>
</author>
<published>2012-05-09T07:47:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=57da8b960b9a25646a8ddb5a9c1d0b5978e69bec'/>
<id>57da8b960b9a25646a8ddb5a9c1d0b5978e69bec</id>
<content type='text'>
What was called show_registers() so far already showed a stack
trace for kernel faults, and kernel_stack_pointer() isn't even
valid to be used for faults from user mode, hence it was
pointless for show_regs() to call show_trace() after
show_registers().

Simply rename show_registers() to show_regs() and eliminate
the old definition.

Signed-off-by: Jan Beulich &lt;jbeulich@suse.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Link: http://lkml.kernel.org/r/4FAA3D3902000078000826E1@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
What was called show_registers() so far already showed a stack
trace for kernel faults, and kernel_stack_pointer() isn't even
valid to be used for faults from user mode, hence it was
pointless for show_regs() to call show_trace() after
show_registers().

Simply rename show_registers() to show_regs() and eliminate
the old definition.

Signed-off-by: Jan Beulich &lt;jbeulich@suse.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Link: http://lkml.kernel.org/r/4FAA3D3902000078000826E1@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus', 'sched-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2012-02-02T19:11:13+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-02-02T19:11:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2f2fde927243bde5fd106da692efef34be12f81c'/>
<id>2f2fde927243bde5fd106da692efef34be12f81c</id>
<content type='text'>
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  bugs, x86: Fix printk levels for panic, softlockups and stack dumps

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf top: Fix number of samples displayed
  perf tools: Fix strlen() bug in perf_event__synthesize_event_type()
  perf tools: Fix broken build by defining _GNU_SOURCE in Makefile
  x86/dumpstack: Remove unneeded check in dump_trace()
  perf: Fix broken interrupt rate throttling

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW
  sched: Fix ancient race in do_exit()
  sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug
  sched/s390: Fix compile error in sched/core.c
  sched: Fix rq-&gt;nr_uninterruptible update race

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Remove VersaLogic Menlow reboot quirk
  x86/reboot: Skip DMI checks if reboot set by user
  x86: Properly parenthesize cmpxchg() macro arguments
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  bugs, x86: Fix printk levels for panic, softlockups and stack dumps

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf top: Fix number of samples displayed
  perf tools: Fix strlen() bug in perf_event__synthesize_event_type()
  perf tools: Fix broken build by defining _GNU_SOURCE in Makefile
  x86/dumpstack: Remove unneeded check in dump_trace()
  perf: Fix broken interrupt rate throttling

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW
  sched: Fix ancient race in do_exit()
  sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug
  sched/s390: Fix compile error in sched/core.c
  sched: Fix rq-&gt;nr_uninterruptible update race

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Remove VersaLogic Menlow reboot quirk
  x86/reboot: Skip DMI checks if reboot set by user
  x86: Properly parenthesize cmpxchg() macro arguments
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/dumpstack: Remove unneeded check in dump_trace()</title>
<updated>2012-01-28T12:09:06+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2012-01-28T10:52:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d0caf292505d051b1026e85faf3a85e907566f31'/>
<id>d0caf292505d051b1026e85faf3a85e907566f31</id>
<content type='text'>
Smatch complains that we have some inconsistent NULL checking.

If "task" were NULL then it would lead to a NULL dereference
later. We can remove this test because earlier on in the
function we have:

 if (!task)
	task = current;

Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Acked-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Namhyung Kim &lt;namhyung@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Link: http://lkml.kernel.org/r/20120128105246.GA25092@elgon.mountain
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Smatch complains that we have some inconsistent NULL checking.

If "task" were NULL then it would lead to a NULL dereference
later. We can remove this test because earlier on in the
function we have:

 if (!task)
	task = current;

Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Acked-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Namhyung Kim &lt;namhyung@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Link: http://lkml.kernel.org/r/20120128105246.GA25092@elgon.mountain
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bugs, x86: Fix printk levels for panic, softlockups and stack dumps</title>
<updated>2012-01-26T20:28:45+00:00</updated>
<author>
<name>Prarit Bhargava</name>
<email>prarit@redhat.com</email>
</author>
<published>2012-01-26T13:55:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b0f4c4b32c8e3aa0d44fc4dd6c40a9a9a8d66b63'/>
<id>b0f4c4b32c8e3aa0d44fc4dd6c40a9a9a8d66b63</id>
<content type='text'>
rsyslog will display KERN_EMERG messages on a connected
terminal.  However, these messages are useless/undecipherable
for a general user.

For example, after a softlockup we get:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Stack:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Call Trace:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 &lt;e8&gt; ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89

This happens because the printk levels for these messages are
incorrect. Only an informational message should be displayed on
a terminal.

I modified the printk levels for various messages in the kernel
and tested the output by using the drivers/misc/lkdtm.c kernel
modules (ie, softlockups, panics, hard lockups, etc.) and
confirmed that the console output was still the same and that
the output to the terminals was correct.

For example, in the case of a softlockup we now see the much
more informative:

 Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
 BUG: soft lockup - CPU4 stuck for 60s!

instead of the above confusing messages.

AFAICT, the messages no longer have to be KERN_EMERG.  In the
most important case of a panic we set console_verbose().  As for
the other less severe cases the correct data is output to the
console and /var/log/messages.

Successfully tested by me using the drivers/misc/lkdtm.c module.

Signed-off-by: Prarit Bhargava &lt;prarit@redhat.com&gt;
Cc: dzickus@redhat.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rsyslog will display KERN_EMERG messages on a connected
terminal.  However, these messages are useless/undecipherable
for a general user.

For example, after a softlockup we get:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Stack:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Call Trace:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 &lt;e8&gt; ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89

This happens because the printk levels for these messages are
incorrect. Only an informational message should be displayed on
a terminal.

I modified the printk levels for various messages in the kernel
and tested the output by using the drivers/misc/lkdtm.c kernel
modules (ie, softlockups, panics, hard lockups, etc.) and
confirmed that the console output was still the same and that
the output to the terminals was correct.

For example, in the case of a softlockup we now see the much
more informative:

 Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
 BUG: soft lockup - CPU4 stuck for 60s!

instead of the above confusing messages.

AFAICT, the messages no longer have to be KERN_EMERG.  In the
most important case of a panic we set console_verbose().  As for
the other less severe cases the correct data is output to the
console and /var/log/messages.

Successfully tested by me using the drivers/misc/lkdtm.c module.

Signed-off-by: Prarit Bhargava &lt;prarit@redhat.com&gt;
Cc: dzickus@redhat.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT</title>
<updated>2011-12-19T21:09:56+00:00</updated>
<author>
<name>Clemens Ladisch</name>
<email>clemens@ladisch.de</email>
</author>
<published>2011-12-19T21:07:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=13f541c10b30fc6529200d7f9a0073217709622f'/>
<id>13f541c10b30fc6529200d7f9a0073217709622f</id>
<content type='text'>
When printing the code bytes in show_registers(), the markers around the
byte at the fault address could make the printk() format string look
like a valid log level and facility code.  This would prevent this byte
from being printed and result in a spurious newline:

[ 7555.765589] Code: 8b 32 e9 94 00 00 00 81 7d 00 ff 00 00 00 0f 87 96 00 00 00 48 8b 83 c0 00 00 00 44 89 e2 44 89 e6 48 89 df 48 8b 80 d8 02 00 00
[ 7555.765683]  8b 48 28 48 89 d0 81 e2 ff 0f 00 00 48 c1 e8 0c 48 c1 e0 04

Add KERN_CONT where needed, and elsewhere in show_registers() for
consistency.

Signed-off-by: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Link: http://lkml.kernel.org/r/4EEFA7AE.9020407@ladisch.de
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When printing the code bytes in show_registers(), the markers around the
byte at the fault address could make the printk() format string look
like a valid log level and facility code.  This would prevent this byte
from being printed and result in a spurious newline:

[ 7555.765589] Code: 8b 32 e9 94 00 00 00 81 7d 00 ff 00 00 00 0f 87 96 00 00 00 48 8b 83 c0 00 00 00 44 89 e2 44 89 e6 48 89 df 48 8b 80 d8 02 00 00
[ 7555.765683]  8b 48 28 48 89 d0 81 e2 ff 0f 00 00 48 c1 e8 0c 48 c1 e0 04

Add KERN_CONT where needed, and elsewhere in show_registers() for
consistency.

Signed-off-by: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Link: http://lkml.kernel.org/r/4EEFA7AE.9020407@ladisch.de
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Don't use frame pointer to save old stack on irq entry</title>
<updated>2011-07-02T16:06:36+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2011-07-02T14:52:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a2bbe75089d5eb9a3a46d50dd5c215e213790288'/>
<id>a2bbe75089d5eb9a3a46d50dd5c215e213790288</id>
<content type='text'>
rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.

It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.

However, this is a kind of abuse of the frame pointer, whose
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.

But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.

There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.

There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.

This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.

To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.

This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.

And finally irqs that interrupt softirq are sanely unwound.

Before:

    99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                   |
                   --- perf_pending_event
                       irq_work_run
                       smp_irq_work_interrupt
                       irq_work_interrupt
                      |
                      |--41.60%-- __read
                      |          |
                      |          |--99.90%-- create_worker
                      |          |          bench_sched_messaging
                      |          |          cmd_bench
                      |          |          run_builtin
                      |          |          main
                      |          |          __libc_start_main
                      |           --0.10%-- [...]

After:

     1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
            |
            --- perf_pending_event
                irq_work_run
                smp_irq_work_interrupt
                irq_work_interrupt
               |
               |--95.00%-- arch_irq_work_raise
               |          irq_work_queue
               |          __perf_event_overflow
               |          perf_swevent_overflow
               |          perf_swevent_event
               |          perf_tp_event
               |          perf_trace_softirq
               |          __do_softirq
               |          call_softirq
               |          do_softirq
               |          irq_exit
               |          |
               |          |--73.68%-- smp_apic_timer_interrupt
               |          |          apic_timer_interrupt
               |          |          |
               |          |          |--96.43%-- amd_e400_idle
               |          |          |          cpu_idle
               |          |          |          start_secondary

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jan Beulich &lt;JBeulich@novell.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.

It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.

However, this is a kind of abuse of the frame pointer, whose
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.

But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.

There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.

There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.

This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.

To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.

This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.

And finally irqs that interrupt softirq are sanely unwound.

Before:

    99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                   |
                   --- perf_pending_event
                       irq_work_run
                       smp_irq_work_interrupt
                       irq_work_interrupt
                      |
                      |--41.60%-- __read
                      |          |
                      |          |--99.90%-- create_worker
                      |          |          bench_sched_messaging
                      |          |          cmd_bench
                      |          |          run_builtin
                      |          |          main
                      |          |          __libc_start_main
                      |           --0.10%-- [...]

After:

     1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
            |
            --- perf_pending_event
                irq_work_run
                smp_irq_work_interrupt
                irq_work_interrupt
               |
               |--95.00%-- arch_irq_work_raise
               |          irq_work_queue
               |          __perf_event_overflow
               |          perf_swevent_overflow
               |          perf_swevent_event
               |          perf_tp_event
               |          perf_trace_softirq
               |          __do_softirq
               |          call_softirq
               |          do_softirq
               |          irq_exit
               |          |
               |          |--73.68%-- smp_apic_timer_interrupt
               |          |          apic_timer_interrupt
               |          |          |
               |          |          |--96.43%-- amd_e400_idle
               |          |          |          cpu_idle
               |          |          |          start_secondary

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jan Beulich &lt;JBeulich@novell.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Fetch stack from regs when possible in dump_trace()</title>
<updated>2011-07-02T16:04:20+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2011-06-30T17:04:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=47ce11a2b6519f9c7843223ea8e561eb71ea5896'/>
<id>47ce11a2b6519f9c7843223ea8e561eb71ea5896</id>
<content type='text'>
When regs are passed to dump_trace(), we fetch the frame
pointer from the regs but the stack pointer is taken from
the current frame.

Thus the frame and stack pointers may not come from the same
context. For example this can result in the unwinder
thinking the context is in irq, due to the current value of
the stack, but the frame pointer coming from the regs points
to a frame from another place. It then tries to fix up
the irq link but ends up dereferencing a random frame
pointer that doesn't belong to the irq stack:

[ 9131.706906] ------------[ cut here ]------------
[ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
[ 9131.707003] Hardware name: AMD690VM-FMH
[ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
[ 9131.707003] Modules linked in:
[ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
[ 9131.707003] Call Trace:
[ 9131.707003]  &lt;IRQ&gt;  [&lt;ffffffff8104bd4a&gt;] warn_slowpath_common+0x7a/0xb0
[ 9131.707003]  [&lt;ffffffff8104be21&gt;] warn_slowpath_fmt+0x41/0x50
[ 9131.707003]  [&lt;ffffffff8178b873&gt;] ? bad_to_user+0x6d/0x10be
[ 9131.707003]  [&lt;ffffffff8100c2da&gt;] dump_trace+0x2aa/0x330
[ 9131.707003]  [&lt;ffffffff810107d3&gt;] ? native_sched_clock+0x13/0x50
[ 9131.707003]  [&lt;ffffffff8101b164&gt;] perf_callchain_kernel+0x54/0x70
[ 9131.707003]  [&lt;ffffffff810d391f&gt;] perf_prepare_sample+0x19f/0x2a0
[ 9131.707003]  [&lt;ffffffff810d546c&gt;] __perf_event_overflow+0x16c/0x290
[ 9131.707003]  [&lt;ffffffff810d5430&gt;] ? __perf_event_overflow+0x130/0x290
[ 9131.707003]  [&lt;ffffffff810107d3&gt;] ? native_sched_clock+0x13/0x50
[ 9131.707003]  [&lt;ffffffff8100fbb9&gt;] ? sched_clock+0x9/0x10
[ 9131.707003]  [&lt;ffffffff810752e5&gt;] ? T.375+0x15/0x90
[ 9131.707003]  [&lt;ffffffff81084da4&gt;] ? trace_hardirqs_on_caller+0x64/0x180
[ 9131.707003]  [&lt;ffffffff810817bd&gt;] ? trace_hardirqs_off+0xd/0x10
[ 9131.707003]  [&lt;ffffffff810d5764&gt;] perf_event_overflow+0x14/0x20
[ 9131.707003]  [&lt;ffffffff810d588c&gt;] perf_swevent_hrtimer+0x11c/0x130
[ 9131.707003]  [&lt;ffffffff817821a1&gt;] ? error_exit+0x51/0xb0
[ 9131.707003]  [&lt;ffffffff81072e93&gt;] __run_hrtimer+0x83/0x1e0
[ 9131.707003]  [&lt;ffffffff810d5770&gt;] ? perf_event_overflow+0x20/0x20
[ 9131.707003]  [&lt;ffffffff81073256&gt;] hrtimer_interrupt+0x106/0x250
[ 9131.707003]  [&lt;ffffffff812a3bfd&gt;] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 9131.707003]  [&lt;ffffffff81024833&gt;] smp_apic_timer_interrupt+0x53/0x90
[ 9131.707003]  [&lt;ffffffff81789053&gt;] apic_timer_interrupt+0x13/0x20
[ 9131.707003]  &lt;EOI&gt;  [&lt;ffffffff817821a1&gt;] ? error_exit+0x51/0xb0
[ 9131.707003]  [&lt;ffffffff8178219c&gt;] ? error_exit+0x4c/0xb0
[ 9131.707003] ---[ end trace b2560d4876709347 ]---

Fix this by simply taking the stack pointer from regs-&gt;sp
when regs are provided.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When regs are passed to dump_trace(), we fetch the frame
pointer from the regs but the stack pointer is taken from
the current frame.

Thus the frame and stack pointers may not come from the same
context. For example this can result in the unwinder
thinking the context is in irq, due to the current value of
the stack, but the frame pointer coming from the regs points
to a frame from another place. It then tries to fix up
the irq link but ends up dereferencing a random frame
pointer that doesn't belong to the irq stack:

[ 9131.706906] ------------[ cut here ]------------
[ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
[ 9131.707003] Hardware name: AMD690VM-FMH
[ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
[ 9131.707003] Modules linked in:
[ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
[ 9131.707003] Call Trace:
[ 9131.707003]  &lt;IRQ&gt;  [&lt;ffffffff8104bd4a&gt;] warn_slowpath_common+0x7a/0xb0
[ 9131.707003]  [&lt;ffffffff8104be21&gt;] warn_slowpath_fmt+0x41/0x50
[ 9131.707003]  [&lt;ffffffff8178b873&gt;] ? bad_to_user+0x6d/0x10be
[ 9131.707003]  [&lt;ffffffff8100c2da&gt;] dump_trace+0x2aa/0x330
[ 9131.707003]  [&lt;ffffffff810107d3&gt;] ? native_sched_clock+0x13/0x50
[ 9131.707003]  [&lt;ffffffff8101b164&gt;] perf_callchain_kernel+0x54/0x70
[ 9131.707003]  [&lt;ffffffff810d391f&gt;] perf_prepare_sample+0x19f/0x2a0
[ 9131.707003]  [&lt;ffffffff810d546c&gt;] __perf_event_overflow+0x16c/0x290
[ 9131.707003]  [&lt;ffffffff810d5430&gt;] ? __perf_event_overflow+0x130/0x290
[ 9131.707003]  [&lt;ffffffff810107d3&gt;] ? native_sched_clock+0x13/0x50
[ 9131.707003]  [&lt;ffffffff8100fbb9&gt;] ? sched_clock+0x9/0x10
[ 9131.707003]  [&lt;ffffffff810752e5&gt;] ? T.375+0x15/0x90
[ 9131.707003]  [&lt;ffffffff81084da4&gt;] ? trace_hardirqs_on_caller+0x64/0x180
[ 9131.707003]  [&lt;ffffffff810817bd&gt;] ? trace_hardirqs_off+0xd/0x10
[ 9131.707003]  [&lt;ffffffff810d5764&gt;] perf_event_overflow+0x14/0x20
[ 9131.707003]  [&lt;ffffffff810d588c&gt;] perf_swevent_hrtimer+0x11c/0x130
[ 9131.707003]  [&lt;ffffffff817821a1&gt;] ? error_exit+0x51/0xb0
[ 9131.707003]  [&lt;ffffffff81072e93&gt;] __run_hrtimer+0x83/0x1e0
[ 9131.707003]  [&lt;ffffffff810d5770&gt;] ? perf_event_overflow+0x20/0x20
[ 9131.707003]  [&lt;ffffffff81073256&gt;] hrtimer_interrupt+0x106/0x250
[ 9131.707003]  [&lt;ffffffff812a3bfd&gt;] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 9131.707003]  [&lt;ffffffff81024833&gt;] smp_apic_timer_interrupt+0x53/0x90
[ 9131.707003]  [&lt;ffffffff81789053&gt;] apic_timer_interrupt+0x13/0x20
[ 9131.707003]  &lt;EOI&gt;  [&lt;ffffffff817821a1&gt;] ? error_exit+0x51/0xb0
[ 9131.707003]  [&lt;ffffffff8178219c&gt;] ? error_exit+0x4c/0xb0
[ 9131.707003] ---[ end trace b2560d4876709347 ]---

Fix this by simply taking the stack pointer from regs-&gt;sp
when regs are provided.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, dumpstack: Correct stack dump info when frame pointer is available</title>
<updated>2011-03-18T09:51:42+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@gmail.com</email>
</author>
<published>2011-03-18T02:40:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e8e999cf3cc733482e390b02ff25a64cecdc0b64'/>
<id>e8e999cf3cc733482e390b02ff25a64cecdc0b64</id>
<content type='text'>
The current stack dump code scans the entire stack and checks
whether each entry contains a pointer to kernel code. With
CONFIG_FRAME_POINTER=y it can mark each pointer as valid or not
based on the value of the frame pointer; invalid entries are
preceded by a '?' sign.

However, this never happened because the scan start point was
always higher than the frame pointer, so the two could not
meet.

Commit 9c0729dc8062 ("x86: Eliminate bp argument from the stack
tracing routines") delayed the point at which bp is acquired, so
bp was read in a lower frame and all of the entries were marked
invalid.

Fix this by reverting the above commit while retaining the
stack_frame() helper, as suggested by Frederic Weisbecker.

The end result looks like this:

before:

 [    3.508329] Call Trace:
 [    3.508551]  [&lt;ffffffff814f35c9&gt;] ? panic+0x91/0x199
 [    3.508662]  [&lt;ffffffff814f3739&gt;] ? printk+0x68/0x6a
 [    3.508770]  [&lt;ffffffff81a981b2&gt;] ? mount_block_root+0x257/0x26e
 [    3.508876]  [&lt;ffffffff81a9821f&gt;] ? mount_root+0x56/0x5a
 [    3.508975]  [&lt;ffffffff81a98393&gt;] ? prepare_namespace+0x170/0x1a9
 [    3.509216]  [&lt;ffffffff81a9772b&gt;] ? kernel_init+0x1d2/0x1e2
 [    3.509335]  [&lt;ffffffff81003894&gt;] ? kernel_thread_helper+0x4/0x10
 [    3.509442]  [&lt;ffffffff814f6880&gt;] ? restore_args+0x0/0x30
 [    3.509542]  [&lt;ffffffff81a97559&gt;] ? kernel_init+0x0/0x1e2
 [    3.509641]  [&lt;ffffffff81003890&gt;] ? kernel_thread_helper+0x0/0x10

after:

 [    3.522991] Call Trace:
 [    3.523351]  [&lt;ffffffff814f35b9&gt;] panic+0x91/0x199
 [    3.523468]  [&lt;ffffffff814f3729&gt;] ? printk+0x68/0x6a
 [    3.523576]  [&lt;ffffffff81a981b2&gt;] mount_block_root+0x257/0x26e
 [    3.523681]  [&lt;ffffffff81a9821f&gt;] mount_root+0x56/0x5a
 [    3.523780]  [&lt;ffffffff81a98393&gt;] prepare_namespace+0x170/0x1a9
 [    3.523885]  [&lt;ffffffff81a9772b&gt;] kernel_init+0x1d2/0x1e2
 [    3.523987]  [&lt;ffffffff81003894&gt;] kernel_thread_helper+0x4/0x10
 [    3.524228]  [&lt;ffffffff814f6880&gt;] ? restore_args+0x0/0x30
 [    3.524345]  [&lt;ffffffff81a97559&gt;] ? kernel_init+0x0/0x1e2
 [    3.524445]  [&lt;ffffffff81003890&gt;] ? kernel_thread_helper+0x0/0x10

 -v5:
   * fix build breakage with oprofile

 -v4:
   * use 0 instead of regs-&gt;bp
   * separate out printk changes

 -v3:
   * apply comment from Frederic
   * add a couple of printk fixes

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Soren Sandmann &lt;ssp@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
LKML-Reference: &lt;1300416006-3163-1-git-send-email-namhyung@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current stack dump code scans the entire stack and checks
whether each entry contains a pointer to kernel code. With
CONFIG_FRAME_POINTER=y it can mark each pointer as valid or not
based on the value of the frame pointer; invalid entries are
preceded by a '?' sign.

However, this never happened because the scan start point was
always higher than the frame pointer, so the two could not
meet.

Commit 9c0729dc8062 ("x86: Eliminate bp argument from the stack
tracing routines") delayed the point at which bp is acquired, so
bp was read in a lower frame and all of the entries were marked
invalid.

Fix this by reverting the above commit while retaining the
stack_frame() helper, as suggested by Frederic Weisbecker.

The end result looks like this:

before:

 [    3.508329] Call Trace:
 [    3.508551]  [&lt;ffffffff814f35c9&gt;] ? panic+0x91/0x199
 [    3.508662]  [&lt;ffffffff814f3739&gt;] ? printk+0x68/0x6a
 [    3.508770]  [&lt;ffffffff81a981b2&gt;] ? mount_block_root+0x257/0x26e
 [    3.508876]  [&lt;ffffffff81a9821f&gt;] ? mount_root+0x56/0x5a
 [    3.508975]  [&lt;ffffffff81a98393&gt;] ? prepare_namespace+0x170/0x1a9
 [    3.509216]  [&lt;ffffffff81a9772b&gt;] ? kernel_init+0x1d2/0x1e2
 [    3.509335]  [&lt;ffffffff81003894&gt;] ? kernel_thread_helper+0x4/0x10
 [    3.509442]  [&lt;ffffffff814f6880&gt;] ? restore_args+0x0/0x30
 [    3.509542]  [&lt;ffffffff81a97559&gt;] ? kernel_init+0x0/0x1e2
 [    3.509641]  [&lt;ffffffff81003890&gt;] ? kernel_thread_helper+0x0/0x10

after:

 [    3.522991] Call Trace:
 [    3.523351]  [&lt;ffffffff814f35b9&gt;] panic+0x91/0x199
 [    3.523468]  [&lt;ffffffff814f3729&gt;] ? printk+0x68/0x6a
 [    3.523576]  [&lt;ffffffff81a981b2&gt;] mount_block_root+0x257/0x26e
 [    3.523681]  [&lt;ffffffff81a9821f&gt;] mount_root+0x56/0x5a
 [    3.523780]  [&lt;ffffffff81a98393&gt;] prepare_namespace+0x170/0x1a9
 [    3.523885]  [&lt;ffffffff81a9772b&gt;] kernel_init+0x1d2/0x1e2
 [    3.523987]  [&lt;ffffffff81003894&gt;] kernel_thread_helper+0x4/0x10
 [    3.524228]  [&lt;ffffffff814f6880&gt;] ? restore_args+0x0/0x30
 [    3.524345]  [&lt;ffffffff81a97559&gt;] ? kernel_init+0x0/0x1e2
 [    3.524445]  [&lt;ffffffff81003890&gt;] ? kernel_thread_helper+0x0/0x10

 -v5:
   * fix build breakage with oprofile

 -v4:
   * use 0 instead of regs-&gt;bp
   * separate out printk changes

 -v3:
   * apply comment from Frederic
   * add a couple of printk fixes

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Soren Sandmann &lt;ssp@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
LKML-Reference: &lt;1300416006-3163-1-git-send-email-namhyung@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
</feed>
