<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arc/kernel, branch v4.4-rc5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ARC: dw2 unwind: Remove falllback linear search thru FDE entries</title>
<updated>2015-11-23T16:06:49+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-11-23T14:02:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2e22502c080f27afeab5e6f11e618fb7bc7aea53'/>
<id>2e22502c080f27afeab5e6f11e618fb7bc7aea53</id>
<content type='text'>
Fixes STAR 9000953410: "perf callgraph profiling causing RCU stalls"

| perf record -g -c 15000 -e cycles /sbin/hackbench
|
| INFO: rcu_preempt self-detected stall on CPU
| 1: (1 GPs behind) idle=609/140000000000002/0 softirq=2914/2915 fqs=603
| Task dump for CPU 1:

in-kernel dwarf unwinder has a fast binary lookup and a fallback linear
search (which iterates thru each of ~11K entries) thus takes 2 orders of
magnitude longer (~3 million cycles vs. 2000). Routines written in hand
assembler lack dwarf info (as we don't support assembler CFI pseudo-ops
yet) fail the unwinder binary lookup, hit linear search, failing
nevertheless in the end.

However the linear search is pointless as binary lookup tables are created
from it in first place. It is impossible to have binary lookup fail while
succeed the linear search. It is pure waste of cycles thus removed by
this patch.

This manifested as RCU stalls / NMI watchdog splat when running
hackbench under perf with callgraph profiling. The triggering condition
was perf counter overflowing in routine lacking dwarf info (like memset)
leading to patheic 3 million cycle unwinder slow path and by the time it
returned new interrupts were already pending (Timer, IPI) and taken
rightaway. The original memset didn't make forward progress, system kept
accruing more interrupts and more unwinder delayes in a vicious feedback
loop, ultimately triggering the NMI diagnostic.

Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes STAR 9000953410: "perf callgraph profiling causing RCU stalls"

| perf record -g -c 15000 -e cycles /sbin/hackbench
|
| INFO: rcu_preempt self-detected stall on CPU
| 1: (1 GPs behind) idle=609/140000000000002/0 softirq=2914/2915 fqs=603
| Task dump for CPU 1:

in-kernel dwarf unwinder has a fast binary lookup and a fallback linear
search (which iterates thru each of ~11K entries) thus takes 2 orders of
magnitude longer (~3 million cycles vs. 2000). Routines written in hand
assembler lack dwarf info (as we don't support assembler CFI pseudo-ops
yet) fail the unwinder binary lookup, hit linear search, failing
nevertheless in the end.

However the linear search is pointless as binary lookup tables are created
from it in first place. It is impossible to have binary lookup fail while
succeed the linear search. It is pure waste of cycles thus removed by
this patch.

This manifested as RCU stalls / NMI watchdog splat when running
hackbench under perf with callgraph profiling. The triggering condition
was perf counter overflowing in routine lacking dwarf info (like memset)
leading to patheic 3 million cycle unwinder slow path and by the time it
returned new interrupts were already pending (Timer, IPI) and taken
rightaway. The original memset didn't make forward progress, system kept
accruing more interrupts and more unwinder delayes in a vicious feedback
loop, ultimately triggering the NMI diagnostic.

Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: remove SYNC from __switch_to()</title>
<updated>2015-11-17T16:35:30+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-11-17T12:16:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e81b75f7b206d4c44afd20354a66825239490109'/>
<id>e81b75f7b206d4c44afd20354a66825239490109</id>
<content type='text'>
SYNC in __switch_to() is a historic relic and not needed at all.

 - In UP context it is obviously useless, why would we want to stall
   the core for all updates to stack memory of t0 to complete before
   loading kernel mode callee registers from t1 stack's memory.

 - In SMP, there could be potential race in which outgoing task could
   be concurrently picked for running on a different core, thus writes
   to stack here need to be visible before the reads from stack on
   other core. Peter confirmed that generic schedular already has needed
   barriers (by way of rq lock) so there is no need for additional arch
   barrier.

This came up when Noam was trying to replace this SYNC with EZChip
specific hardware thread scheduling instruction for their platform
support.

Link: http://lkml.kernel.org/r/20151102092654.GM17308@twins.programming.kicks-ass.net
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-kernel@vger.kernel.org
Cc: Noam Camus &lt;noamc@ezchip.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SYNC in __switch_to() is a historic relic and not needed at all.

 - In UP context it is obviously useless, why would we want to stall
   the core for all updates to stack memory of t0 to complete before
   loading kernel mode callee registers from t1 stack's memory.

 - In SMP, there could be potential race in which outgoing task could
   be concurrently picked for running on a different core, thus writes
   to stack here need to be visible before the reads from stack on
   other core. Peter confirmed that generic schedular already has needed
   barriers (by way of rq lock) so there is no need for additional arch
   barrier.

This came up when Noam was trying to replace this SYNC with EZChip
specific hardware thread scheduling instruction for their platform
support.

Link: http://lkml.kernel.org/r/20151102092654.GM17308@twins.programming.kicks-ass.net
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-kernel@vger.kernel.org
Cc: Noam Camus &lt;noamc@ezchip.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: Abstract out ISA specific SLEEP args</title>
<updated>2015-11-16T08:47:02+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-11-16T08:22:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=512b5b89b9ef90e7a6475513fa9402e55b0e831c'/>
<id>512b5b89b9ef90e7a6475513fa9402e55b0e831c</id>
<content type='text'>
No semantical changes, prepares for ARCv2 specific change in next commit

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No semantical changes, prepares for ARCv2 specific change in next commit

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: [arcompact] Handle bus error from userspace as Interrupt not exception</title>
<updated>2015-11-14T07:42:20+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-10-30T19:52:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=541366da6a93f52f468b408ba24ab6bb5e4fd3d8'/>
<id>541366da6a93f52f468b408ba24ab6bb5e4fd3d8</id>
<content type='text'>
Bus errors from userspace on ARCompact based cores are handled by core
as a high priority L2 interrupt but current code treated it as interrupt
Handling an interrupt like exception is certainly not going to go unnoticed.
(and it worked so far as we never saw a Bus error from userspace until
IPPK guys tested a DDR controller with ECC error detection etc hence
needed to explicitly trigger/handle such errors)

 - So move mem_service exception handler from common code into ARCv2 code.
 - In ARCompact code, define  mem_service as L2 interrupt handler which
   just drops down to pure kernel mode and goes of to enqueue SIGBUS

Reported-by: Nelson Pereira &lt;npereira@synopsys.com&gt;
Tested-by: Ana Martins &lt;amartins@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Bus errors from userspace on ARCompact based cores are handled by core
as a high priority L2 interrupt but current code treated it as interrupt
Handling an interrupt like exception is certainly not going to go unnoticed.
(and it worked so far as we never saw a Bus error from userspace until
IPPK guys tested a DDR controller with ECC error detection etc hence
needed to explicitly trigger/handle such errors)

 - So move mem_service exception handler from common code into ARCv2 code.
 - In ARCompact code, define  mem_service as L2 interrupt handler which
   just drops down to pure kernel mode and goes of to enqueue SIGBUS

Reported-by: Nelson Pereira &lt;npereira@synopsys.com&gt;
Tested-by: Ana Martins &lt;amartins@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: boot: Non Master cpus only need to call EARLY_CPU_SETUP once</title>
<updated>2015-10-28T10:43:42+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-10-15T15:02:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=483bcc99c0a349570b30fc9cb20dea20c9387a4b'/>
<id>483bcc99c0a349570b30fc9cb20dea20c9387a4b</id>
<content type='text'>
With prev fixes, all cores now start via common entry point @stext which
already calls EARLY_CPU_SETUP for all cores - so no need to invoke it
again

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With prev fixes, all cores now start via common entry point @stext which
already calls EARLY_CPU_SETUP for all cores - so no need to invoke it
again

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARCv2: smp: [plat-*]: No need to explicitly call mcip_init_smp()</title>
<updated>2015-10-28T10:43:41+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-10-12T09:45:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aa0efcde45a36d1ea2bc5bde4d47f36ec17502de'/>
<id>aa0efcde45a36d1ea2bc5bde4d47f36ec17502de</id>
<content type='text'>
MCIP now registers it's own per cpu setup routine (for IPI IRQ request)
using smp_ops.init_irq_cpu().

So no need for platforms to do that. This now completely decouples
platforms from MCIP.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
MCIP now registers it's own per cpu setup routine (for IPI IRQ request)
using smp_ops.init_irq_cpu().

So no need for platforms to do that. This now completely decouples
platforms from MCIP.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: smp: Introduce smp hook @init_irq_cpu called for all cores</title>
<updated>2015-10-28T10:43:41+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-10-14T09:08:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=286130ebf196d9643800977d57bdb7cda266b49e'/>
<id>286130ebf196d9643800977d57bdb7cda266b49e</id>
<content type='text'>
Note this is not part of platform owned static machine_desc,
but more of device owned plat_smp_ops (rather misnamed) which a IPI
provider or some such typically defines.

This will help us seperate out the IPI registration from platform
specific init_cpu_smp() into device specific init_irq_cpu()

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Note this is not part of platform owned static machine_desc,
but more of device owned plat_smp_ops (rather misnamed) which a IPI
provider or some such typically defines.

This will help us seperate out the IPI registration from platform
specific init_cpu_smp() into device specific init_irq_cpu()

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: smp: Rename platform hook @init_smp -&gt; @init_cpu_smp</title>
<updated>2015-10-28T10:43:40+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-10-13T09:56:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8721a7f5a6f95c38cacbe1be22c820a7698926ef'/>
<id>8721a7f5a6f95c38cacbe1be22c820a7698926ef</id>
<content type='text'>
This conveys better that it is called for each cpu

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This conveys better that it is called for each cpu

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARCv2: smp: [plat-*]: No need to explicitly call mcip_init_early_smp()</title>
<updated>2015-10-28T10:43:40+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-10-12T11:08:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=26b8f996239884451aeb1213747e3ca808c26024'/>
<id>26b8f996239884451aeb1213747e3ca808c26024</id>
<content type='text'>
MCIP now registers it's own probe callback with smp_ops.init_early_smp()
which is called by ARC common code, so no need for platforms to do that.

This decouples the platforms and MCIP and helps confine MCIP details
to it's own file.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
MCIP now registers it's own probe callback with smp_ops.init_early_smp()
which is called by ARC common code, so no need for platforms to do that.

This decouples the platforms and MCIP and helps confine MCIP details
to it's own file.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARC: smp: Introduce smp hook @init_early_smp for Master core</title>
<updated>2015-10-28T10:43:40+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2015-10-12T10:58:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e55af4da026ebdb9ded3cb7708b8a8bd7884ad3a'/>
<id>e55af4da026ebdb9ded3cb7708b8a8bd7884ad3a</id>
<content type='text'>
This adds a platform agnostic early SMP init hook which is called on
Master core before calling setup_processor()

  setup_arch()
     smp_init_cpus()
         smp_ops.init_early_smp()
     ...
     setup_processor()

How this helps:
 - Used for one time init of certain SMP centric IP blocks, before
   calling setup_processor() which probes various bits of core,
   possibly including this block

 - Currently platforms need to call this IP block init from their
   init routines, which doesn't make sense as this is specific to ARC
   core and not platform and otherwise requires copy/paste in all
   (and hence a possible point of failure)

e.g. MCIP init is called from 2 platforms currently (axs10x and sim)
which will go away once we have this.

This change only adds the hooks but they are empty for now. Next commit
will populate them and remove the explicit init calls from platforms.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds a platform agnostic early SMP init hook which is called on
Master core before calling setup_processor()

  setup_arch()
     smp_init_cpus()
         smp_ops.init_early_smp()
     ...
     setup_processor()

How this helps:
 - Used for one time init of certain SMP centric IP blocks, before
   calling setup_processor() which probes various bits of core,
   possibly including this block

 - Currently platforms need to call this IP block init from their
   init routines, which doesn't make sense as this is specific to ARC
   core and not platform and otherwise requires copy/paste in all
   (and hence a possible point of failure)

e.g. MCIP init is called from 2 platforms currently (axs10x and sim)
which will go away once we have this.

This change only adds the hooks but they are empty for now. Next commit
will populate them and remove the explicit init calls from platforms.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
