<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/powerpc/kernel/smp.c, branch v4.18</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>powerpc: smp_send_stop do not offline stopped CPUs</title>
<updated>2018-06-19T11:28:23+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2018-05-19T04:35:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=de6e5d38417e6cdb005843db420a2974993d36ff'/>
<id>de6e5d38417e6cdb005843db420a2974993d36ff</id>
<content type='text'>
Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups. This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:

WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G      D
Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
[...]
NIP [c00000000017c21c] set_task_cpu+0x22c/0x240
LR [c00000000017d580] try_to_wake_up+0x230/0x720
Call Trace:
[c000000001017700] runqueues+0x0/0xb00 (unreliable)
[c00000000017d580] try_to_wake_up+0x230/0x720
[c00000000015a214] insert_work+0x104/0x140
[c00000000015adb0] __queue_work+0x230/0x690
[c000003fc5007910] [c00000000015b26c] queue_work_on+0x5c/0x90
[c0080000135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
[c008000013608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
[c00800001362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
[c008000013628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
[c008000013625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
[c00000000015bcec] process_one_work+0x1bc/0x5f0
[c00000000015ecac] worker_thread+0xac/0x6b0
[c000000000168338] kthread+0x168/0x1b0
[c00000000000b628] ret_from_kernel_thread+0x5c/0xb4

This happens because firstly the CPU is not really offline in the
usual sense, processes and interrupts have not been migrated away.
Secondly smp_send_stop does not happen atomically on all CPUs, so
one CPU can have marked itself offline, while another CPU is still
running processes or interrupts which can affect the first CPU.

Fix this by just not marking the CPU as offline. It's more like
frozen in time, so offline does not really reflect its state properly
anyway. There should be nothing in the crash/panic path that walks
online CPUs and synchronously waits for them, so this change should
not introduce new hangs.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups. This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:

WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G      D
Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
[...]
NIP [c00000000017c21c] set_task_cpu+0x22c/0x240
LR [c00000000017d580] try_to_wake_up+0x230/0x720
Call Trace:
[c000000001017700] runqueues+0x0/0xb00 (unreliable)
[c00000000017d580] try_to_wake_up+0x230/0x720
[c00000000015a214] insert_work+0x104/0x140
[c00000000015adb0] __queue_work+0x230/0x690
[c000003fc5007910] [c00000000015b26c] queue_work_on+0x5c/0x90
[c0080000135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
[c008000013608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
[c00800001362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
[c008000013628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
[c008000013625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
[c00000000015bcec] process_one_work+0x1bc/0x5f0
[c00000000015ecac] worker_thread+0xac/0x6b0
[c000000000168338] kthread+0x168/0x1b0
[c00000000000b628] ret_from_kernel_thread+0x5c/0xb4

This happens because firstly the CPU is not really offline in the
usual sense, processes and interrupts have not been migrated away.
Secondly smp_send_stop does not happen atomically on all CPUs, so
one CPU can have marked itself offline, while another CPU is still
running processes or interrupts which can affect the first CPU.

Fix this by just not marking the CPU as offline. It's more like
frozen in time, so offline does not really reflect its state properly
anyway. There should be nothing in the crash/panic path that walks
online CPUs and synchronously waits for them, so this change should
not introduce new hangs.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/nmi: Add an API for sending "safe" NMIs</title>
<updated>2018-06-03T10:43:43+00:00</updated>
<author>
<name>Michael Ellerman</name>
<email>mpe@ellerman.id.au</email>
</author>
<published>2018-05-02T13:07:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6ba55716a24f5f399ad4d37685e4bb721f8e6dd5'/>
<id>6ba55716a24f5f399ad4d37685e4bb721f8e6dd5</id>
<content type='text'>
Currently the options we have for sending NMIs are not necessarily
safe, that is they can potentially interrupt a CPU in a
non-recoverable region of code, meaning the kernel must then panic().

But we'd like to use smp_send_nmi_ipi() to do cross-CPU calls in
situations where we don't want to risk a panic(), because it doesn't
have the requirement that interrupts must be enabled like
smp_call_function().

So add an API for the caller to indicate that it wants to use the NMI
infrastructure, but doesn't want to do anything "unsafe".

Currently that is implemented by not actually calling cause_nmi_ipi(),
instead falling back to an IPI. In future we can pass the safe
parameter down to cause_nmi_ipi() and the individual backends can
potentially take it into account before deciding what to do.

Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the options we have for sending NMIs are not necessarily
safe, that is they can potentially interrupt a CPU in a
non-recoverable region of code, meaning the kernel must then panic().

But we'd like to use smp_send_nmi_ipi() to do cross-CPU calls in
situations where we don't want to risk a panic(), because it doesn't
have the requirement that interrupts must be enabled like
smp_call_function().

So add an API for the caller to indicate that it wants to use the NMI
infrastructure, but doesn't want to do anything "unsafe".

Currently that is implemented by not actually calling cause_nmi_ipi(),
instead falling back to an IPI. In future we can pass the safe
parameter down to cause_nmi_ipi() and the individual backends can
potentially take it into account before deciding what to do.

Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: move a stray NMI IPI case under NMI_IPI ifdef</title>
<updated>2018-06-03T10:40:29+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2018-05-04T17:19:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=21bfd6a8e9999f40f9eae09ca6ba33e7f75f0be4'/>
<id>21bfd6a8e9999f40f9eae09ca6ba33e7f75f0be4</id>
<content type='text'>
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: move timer broadcast code under GENERIC_CLOCKEVENTS_BROADCAST ifdef</title>
<updated>2018-06-03T10:40:28+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2018-05-04T17:19:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bc9071133144acdbdb28cfc6ee5ce983d8fd5f81'/>
<id>bc9071133144acdbdb28cfc6ee5ce983d8fd5f81</id>
<content type='text'>
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: generic clockevents broadcast receiver call tick_receive_broadcast</title>
<updated>2018-06-03T10:40:27+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2018-05-04T17:19:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3f984620f9a4fe089c0a3c951b75a460211394bb'/>
<id>3f984620f9a4fe089c0a3c951b75a460211394bb</id>
<content type='text'>
The broadcast tick recipient can call tick_receive_broadcast rather
than re-running the full timer interrupt.

It does not have to check for the next event time, because the sender
already determined the timer has expired. It does not have to test
irq_work_pending, because that's a direct decrementer interrupt and
does not go through the clock events subsystem. And it does not have
to read PURR because that was removed with the previous patch.

This results in no code size change, but both the decrementer and
broadcast path lengths are reduced.

Cc: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The broadcast tick recipient can call tick_receive_broadcast rather
than re-running the full timer interrupt.

It does not have to check for the next event time, because the sender
already determined the timer has expired. It does not have to test
irq_work_pending, because that's a direct decrementer interrupt and
does not go through the clock events subsystem. And it does not have
to read PURR because that was removed with the previous patch.

This results in no code size change, but both the decrementer and
broadcast path lengths are reduced.

Cc: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc64/ftrace: Disable ftrace during hotplug</title>
<updated>2018-05-03T12:32:27+00:00</updated>
<author>
<name>Naveen N. Rao</name>
<email>naveen.n.rao@linux.vnet.ibm.com</email>
</author>
<published>2018-04-19T07:04:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=424ef0160f439feb2a1a6e796a281e2bfa7b6997'/>
<id>424ef0160f439feb2a1a6e796a281e2bfa7b6997</id>
<content type='text'>
Disable ftrace when a cpu is about to go offline. When the cpu is woken
up, ftrace will get enabled in start_secondary().

Signed-off-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Disable ftrace when a cpu is about to go offline. When the cpu is woken
up, ftrace will get enabled in start_secondary().

Signed-off-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc64/ftrace: Delay enabling ftrace on secondary cpus</title>
<updated>2018-05-03T12:32:26+00:00</updated>
<author>
<name>Naveen N. Rao</name>
<email>naveen.n.rao@linux.vnet.ibm.com</email>
</author>
<published>2018-04-19T07:04:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d103978636c27fce216bbc8bb289981047b71bd4'/>
<id>d103978636c27fce216bbc8bb289981047b71bd4</id>
<content type='text'>
On the boot cpu, though we enable paca-&gt;ftrace_enabled in early_setup()
(via cpu_ready_for_interrupts()), we don't start tracing until much
later since ftrace is not initialized yet and since we only support
DYNAMIC_FTRACE on powerpc. However, it is possible that ftrace has been
initialized by the time some of the secondary cpus start up. In this
case, we will try to trace some of the early boot code which can cause
problems.

To address this, move setting paca-&gt;ftrace_enabled from
cpu_ready_for_interrupts() to early_setup() for the boot cpu, and towards
the end of start_secondary() for secondary cpus.

Signed-off-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On the boot cpu, though we enable paca-&gt;ftrace_enabled in early_setup()
(via cpu_ready_for_interrupts()), we don't start tracing until much
later since ftrace is not initialized yet and since we only support
DYNAMIC_FTRACE on powerpc. However, it is possible that ftrace has been
initialized by the time some of the secondary cpus start up. In this
case, we will try to trace some of the early boot code which can cause
problems.

To address this, move setting paca-&gt;ftrace_enabled from
cpu_ready_for_interrupts() to early_setup() for the boot cpu, and towards
the end of start_secondary() for secondary cpus.

Signed-off-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Fix deadlock with multiple calls to smp_send_stop</title>
<updated>2018-04-27T06:35:57+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2018-04-27T01:51:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6029755eed95e5c90f763188c87ae3ff41e48e5c'/>
<id>6029755eed95e5c90f763188c87ae3ff41e48e5c</id>
<content type='text'>
smp_send_stop can lock up the IPI path for any subsequent calls,
because the receiving CPUs spin in their handler function. This
started becoming a problem with the addition of an smp_send_stop
call in the reboot path, because panics can reboot after doing
their own smp_send_stop.

The NMI IPI variant was fixed with ac61c11566 ("powerpc: Fix
smp_send_stop NMI IPI handling"), which leaves the smp_call_function
variant.

This is fixed by having smp_send_stop only ever do the
smp_call_function once. This is a bit less robust than the NMI IPI
fix, because any other call to smp_call_function after smp_send_stop
could deadlock, but that has always been the case, and it was not
been a problem before.

Fixes: f2748bdfe1573 ("powerpc/powernv: Always stop secondaries before reboot/shutdown")
Reported-by: Abdul Haleem &lt;abdhalee@linux.vnet.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
smp_send_stop can lock up the IPI path for any subsequent calls,
because the receiving CPUs spin in their handler function. This
started becoming a problem with the addition of an smp_send_stop
call in the reboot path, because panics can reboot after doing
their own smp_send_stop.

The NMI IPI variant was fixed with ac61c11566 ("powerpc: Fix
smp_send_stop NMI IPI handling"), which leaves the smp_call_function
variant.

This is fixed by having smp_send_stop only ever do the
smp_call_function once. This is a bit less robust than the NMI IPI
fix, because any other call to smp_call_function after smp_send_stop
could deadlock, but that has always been the case, and it was not
been a problem before.

Fixes: f2748bdfe1573 ("powerpc/powernv: Always stop secondaries before reboot/shutdown")
Reported-by: Abdul Haleem &lt;abdhalee@linux.vnet.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: Fix smp_send_stop NMI IPI handling</title>
<updated>2018-04-25T10:38:08+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2018-04-25T02:17:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ac61c1156623455c46701654abd8c99720bceea1'/>
<id>ac61c1156623455c46701654abd8c99720bceea1</id>
<content type='text'>
The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count
over the handler function call, which causes later smp_send_nmi_ipi()
callers to spin until the call is finished.

The stop_this_cpu() function never returns, so the busy count is never
decremeted, which can cause the system to hang in some cases. For
example panic() will call smp_send_stop() early on which calls
stop_this_cpu() on other CPUs, then later in the reboot path,
pnv_restart() will call smp_send_stop() again, which hangs.

Fix this by adding a special case to the stop_this_cpu() handler to
decrement the busy count, because it will never return.

Now that the NMI/non-NMI versions of stop_this_cpu() are different,
split them out into separate functions rather than doing #ifdef tricks
to share the body between the two functions.

Fixes: 6bed3237624e3 ("powerpc: use NMI IPI for smp_send_stop")
Reported-by: Abdul Haleem &lt;abdhalee@linux.vnet.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Split out the functions, tweak change log a bit]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count
over the handler function call, which causes later smp_send_nmi_ipi()
callers to spin until the call is finished.

The stop_this_cpu() function never returns, so the busy count is never
decremeted, which can cause the system to hang in some cases. For
example panic() will call smp_send_stop() early on which calls
stop_this_cpu() on other CPUs, then later in the reboot path,
pnv_restart() will call smp_send_stop() again, which hangs.

Fix this by adding a special case to the stop_this_cpu() handler to
decrement the busy count, because it will never return.

Now that the NMI/non-NMI versions of stop_this_cpu() are different,
split them out into separate functions rather than doing #ifdef tricks
to share the body between the two functions.

Fixes: 6bed3237624e3 ("powerpc: use NMI IPI for smp_send_stop")
Reported-by: Abdul Haleem &lt;abdhalee@linux.vnet.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
[mpe: Split out the functions, tweak change log a bit]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc: hard disable irqs in smp_send_stop loop</title>
<updated>2018-04-03T12:59:10+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2018-04-01T10:36:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=855bfe0de1a05a01f89975ea8ba9f5521fb0f567'/>
<id>855bfe0de1a05a01f89975ea8ba9f5521fb0f567</id>
<content type='text'>
The hard lockup watchdog can fire under local_irq_disable
on platforms with irq soft masking.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The hard lockup watchdog can fire under local_irq_disable
on platforms with irq soft masking.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
