<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arm64, branch v3.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm</title>
<updated>2014-02-28T19:45:03+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-02-28T19:45:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d8efcf38b13df3e9e889cf7cc214cb85dc53600c'/>
<id>d8efcf38b13df3e9e889cf7cc214cb85dc53600c</id>
<content type='text'>
Pull KVM fixes from Paolo Bonzini:
 "Three x86 fixes and one for ARM/ARM64.

  In particular, nested virtualization on Intel is broken in 3.13 and
  fixed by this pull request"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm, vmx: Really fix lazy FPU on nested guest
  kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
  arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT
  KVM: MMU: drop read-only large sptes when creating lower level sptes
</content>
</entry>
<entry>
<title>arm64: Fix !CONFIG_SMP kernel build</title>
<updated>2014-02-28T16:12:25+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2014-02-28T16:12:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b57fc9e80692043e2a3a74e1d2c047eb700dcd0c'/>
<id>b57fc9e80692043e2a3a74e1d2c047eb700dcd0c</id>
<content type='text'>
Commit fb4a96029c8a (arm64: kernel: fix per-cpu offset restore on
resume) uses per_cpu_offset() unconditionally during CPU wakeup;
however, per_cpu_offset() is only defined for the SMP case, which breaks
!CONFIG_SMP builds.
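
A minimal sketch of the kind of guard that avoids the breakage (illustrative
only, not necessarily the actual hunk; set_my_cpu_offset() and
per_cpu_offset() are the existing per-cpu helpers):

#ifdef CONFIG_SMP
	/* per_cpu_offset() only exists on SMP builds; on UP there is a
	 * single per-cpu area and no offset needs to be restored */
	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
#endif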

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reported-by: Dave P Martin &lt;Dave.Martin@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: mm: Add double logical invert to pte accessors</title>
<updated>2014-02-28T15:44:19+00:00</updated>
<author>
<name>Steve Capper</name>
<email>steve.capper@linaro.org</email>
</author>
<published>2014-02-25T11:38:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=84fe6826c28f69d8708bd575faed7f75e6b6f57f'/>
<id>84fe6826c28f69d8708bd575faed7f75e6b6f57f</id>
<content type='text'>
Page table entries on ARM64 are 64 bits, and some pte functions such as
pte_dirty return a bitwise-and of a flag with the pte value. If the
flag to be tested resides in the upper 32 bits of the pte, then we run
into the danger of the result being dropped if downcast.

For example:
	gather_stats(page, md, pte_dirty(*pte), 1);
where pte_dirty(*pte) is downcast to an int.

This patch adds a double logical invert to all the pte_ accessors to
ensure predictable downcasting.
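
As a rough sketch of the idiom (macro form shown for brevity; PTE_DIRTY and
pte_val() are the existing arm64 definitions):

/* before: the flag lives in the upper word, so an int downcast drops it */
#define pte_dirty(pte)		(pte_val(pte) &amp; PTE_DIRTY)

/* after: !! collapses the test to 0 or 1, which survives any downcast */
#define pte_dirty(pte)		(!!(pte_val(pte) &amp; PTE_DIRTY))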

Signed-off-by: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT</title>
<updated>2014-02-27T18:27:10+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>marc.zyngier@arm.com</email>
</author>
<published>2014-02-26T18:47:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b20c9f29c5c25921c6ad18b50d4b61e6d181c3cc'/>
<id>b20c9f29c5c25921c6ad18b50d4b61e6d181c3cc</id>
<content type='text'>
Commit 1fcf7ce0c602 (arm: kvm: implement CPU PM notifier) added
support for CPU power-management, using a cpu_notifier to re-init
KVM on a CPU that entered CPU idle.

The code assumed that a CPU entering idle would actually be powered
off, losing its state entirely, and would then need to be
reinitialized. It turns out that this is not always the case, and
some HW performs CPU PM without actually killing the core. In this
case, we try to reinitialize KVM while it is still live. It ends up
badly, as reported by Andre Przywara (using a Calxeda Midway):

[    3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
[    3.663897] unexpected data abort in Hyp mode at: 0xc067d150
[    3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0

The trick here is to detect if we've been through a full re-init or
not by looking at HVBAR (VBAR_EL2 on arm64). This involves
implementing the backend for __hyp_get_vectors in the main KVM HYP
code (rather small), and checking the return value against the
default one when the CPU notifier is called on CPU_PM_EXIT.
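
Roughly, the notifier-side check then looks like the sketch below
(illustrative shape only; hyp_default_vectors stands for the stub vectors
value, and the re-init helper name is an assumption):

static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
				    unsigned long cmd, void *v)
{
	if (cmd != CPU_PM_EXIT)
		return NOTIFY_DONE;

	/* HVBAR/VBAR_EL2 still pointing at the stub vectors means the core
	 * really lost its EL2 state, so KVM must be re-initialized */
	if (__hyp_get_vectors() == hyp_default_vectors)
		cpu_init_hyp_mode(NULL);

	return NOTIFY_OK;
}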

Reported-by: Andre Przywara &lt;osp@andrep.de&gt;
Tested-by: Andre Przywara &lt;osp@andrep.de&gt;
Cc: Lorenzo Pieralisi &lt;lorenzo.pieralisi@arm.com&gt;
Cc: Rob Herring &lt;rob.herring@linaro.org&gt;
Acked-by: Christoffer Dall &lt;christoffer.dall@linaro.org&gt;
Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>ARM64: unwind: Fix PC calculation</title>
<updated>2014-02-17T09:16:33+00:00</updated>
<author>
<name>Olof Johansson</name>
<email>olof@lixom.net</email>
</author>
<published>2014-02-14T19:35:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63'/>
<id>e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63</id>
<content type='text'>
The frame PC value in the unwind code used to just take the saved LR
value and use that.  That's incorrect as a stack trace, since it shows
the return path stack, not the call path stack.

In particular, it shows misleading information when the bl is the very
last instruction of a function, since the return address then falls into
the next function. That is easy to see with tail calls to panic(), which
is marked __noreturn and thus has nothing useful placed after the call.

Easiest here is to just correct the unwind code and do a -4, to get the
actual call site for the backtrace instead of the return site.
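
In effect the unwinder computes something like the sketch below (variable
names are illustrative):

	/* The saved LR is the return address, i.e. the instruction after
	 * the bl; step back one AArch64 instruction (4 bytes) so the trace
	 * names the call site rather than the return site. */
	frame_pc = saved_lr - 4;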

Signed-off-by: Olof Johansson &lt;olof@lixom.net&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm</title>
<updated>2014-02-14T19:10:49+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-02-14T19:10:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c1b8ae03c38281e442c2ad3725e921b09f23290a'/>
<id>c1b8ae03c38281e442c2ad3725e921b09f23290a</id>
<content type='text'>
Pull KVM fixes from Paolo Bonzini:
 "A small error handling problem and a compile breakage for ARM64"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  arm64: KVM: Add VGIC device control for arm64
  KVM: return an error code in kvm_vm_ioctl_register_coalesced_mmio()
</content>
</entry>
<entry>
<title>arm64: KVM: Add VGIC device control for arm64</title>
<updated>2014-02-14T10:09:49+00:00</updated>
<author>
<name>Christoffer Dall</name>
<email>christoffer.dall@linaro.org</email>
</author>
<published>2014-02-02T21:41:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2a2f3e269c75edf916de5967079069aeb6a601cb'/>
<id>2a2f3e269c75edf916de5967079069aeb6a601cb</id>
<content type='text'>
This fixes the build breakage introduced by
c07a0191ef2de1f9510f12d1f88e3b0b5cd8d66f and adds support for the device
control API and save/restore of the VGIC state for ARMv8.

The defines were simply missing from the arm64 header files, and
uaccess.h must have been getting pulled in indirectly from somewhere else
on arm.
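
The shape of the fix is roughly as sketched below (illustrative; the group
defines mirror the existing arm32 uapi header):

/* arch/arm64/include/uapi/asm/kvm.h: device control groups for the VGIC */
#define KVM_DEV_ARM_VGIC_GRP_ADDR	0
#define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
#define KVM_DEV_ARM_VGIC_GRP_CPU_REGS	2

/* vgic.c: include uaccess.h explicitly rather than relying on it being
 * pulled in indirectly, which only happens on arm */
#include &lt;linux/uaccess.h&gt;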

Signed-off-by: Christoffer Dall &lt;christoffer.dall@linaro.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>arm64: defconfig: Expand default enabled features</title>
<updated>2014-02-07T17:17:28+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2014-02-07T17:12:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=55834a773fe343912b705bef8114ec93fd337188'/>
<id>55834a773fe343912b705bef8114ec93fd337188</id>
<content type='text'>
FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
Versatile Express. As these attach to a Motherboard Express V2M-P1 it
would be useful to have support for some V2M-P1 peripherals enabled by
default.

Additionally, a couple of features have been introduced since the last
defconfig update (CMA, jump labels) that would be good to have enabled
by default to ensure they are build- and boot-tested.

This patch updates the arm64 defconfig to enable support for these
devices and features. The arm64 Kconfig is modified to select
HAVE_PATA_PLATFORM, which is required to enable support for the
CompactFlash controller on the V2M-P1.

A few options which don't need to appear in defconfig are trimmed:

* BLK_DEV - selected by default
* EXPERIMENTAL - otherwise gone from the kernel
* MII - selected by drivers which require it
* USB_SUPPORT - selected by default

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: asm: remove redundant "cc" clobbers</title>
<updated>2014-02-07T16:46:07+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2014-02-04T12:29:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=95c4189689f92fba7ecf9097173404d4928c6e9b'/>
<id>95c4189689f92fba7ecf9097173404d4928c6e9b</id>
<content type='text'>
cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.
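
For illustration, a sketch modelled on one of those exclusive-load/store
loops (not an exact hunk from the patch):

static inline void atomic_add(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	asm volatile("// atomic_add\n"
"1:	ldxr	%w0, %2\n"
"	add	%w0, %w0, %w3\n"
"	stxr	%w1, %w0, %2\n"
"	cbnz	%w1, 1b"
	: "=&amp;r" (result), "=&amp;r" (tmp), "+Q" (v-&gt;counter)
	: "Ir" (i));
	/* the constraint list previously ended with : "cc", but cbnz only
	 * reads a register, never the NZCV flags, so the clobber is redundant */
}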

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: atomics: fix use of acquire + release for full barrier semantics</title>
<updated>2014-02-07T16:45:43+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2014-02-04T12:29:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8e86f0b409a44193f1587e87b69c5dcf8f65be67'/>
<id>8e86f0b409a44193f1587e87b69c5dcf8f65be67</id>
<content type='text'>
Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

	// A, B, C are independent memory locations

	&lt;Access [A]&gt;

	// atomic_op (B)
1:	ldaxr	x0, [B]		// Exclusive load with acquire
	&lt;op(B)&gt;
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b

	&lt;Access [C]&gt;

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -&gt; B -&gt; C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -&gt; A -&gt; C -&gt; Bs
or Bl -&gt; C -&gt; A -&gt; Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

	&lt;Access [A]&gt;

	// atomic_op (B)
	dmb	ish		// Full barrier
1:	ldxr	x0, [B]		// Exclusive load
	&lt;op(B)&gt;
	stxr	w1, x0, [B]	// Exclusive store
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	&lt;Access [C]&gt;

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

	&lt;Access [A]&gt;

	// atomic_op (B)
1:	ldxr	x0, [B]		// Exclusive load
	&lt;op(B)&gt;
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	&lt;Access [C]&gt;

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.
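
As a concrete sketch, a returning atomic op following this scheme ends up
looking roughly like the following (modelled on atomic_add_return; details
may differ from the exact patch):

static inline int atomic_add_return(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	asm volatile("// atomic_add_return\n"
"1:	ldxr	%w0, %2\n"		/* plain exclusive load, no acquire */
"	add	%w0, %w0, %w3\n"
"	stlxr	%w1, %w0, %2\n"		/* store-release orders prior accesses */
"	cbnz	%w1, 1b"
	: "=&amp;r" (result), "=&amp;r" (tmp), "+Q" (v-&gt;counter)
	: "Ir" (i)
	: "memory");

	smp_mb();	/* trailing dmb ish provides the full barrier */

	return result;
}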

Cc: &lt;stable@vger.kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
</feed>
