<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arm64/lib, branch v5.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>arm64: Avoid premature usercopy failure</title>
<updated>2021-07-15T16:29:14+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2021-07-12T14:27:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=295cf156231ca3f9e3a66bde7fab5e09c41835e0'/>
<id>295cf156231ca3f9e3a66bde7fab5e09c41835e0</id>
<content type='text'>
Al reminds us that the usercopy API must only return complete failure
if absolutely nothing could be copied. Currently, if userspace does
something silly like giving us an unaligned pointer to Device memory,
or a size which overruns MTE tag bounds, we may fail to honour that
requirement when faulting on a multi-byte access even though a smaller
access could have succeeded.

Add a mitigation to the fixup routines to fall back to a single-byte
copy if we faulted on a larger access before anything has been written
to the destination, to guarantee making *some* forward progress. We
needn't be too concerned about the overall performance since this should
only occur when callers are doing something a bit dodgy in the first
place. Particularly broken userspace might still be able to trick
generic_perform_write() into an infinite loop by targeting write() at
an mmap() of some read-only device register where the fault-in load
succeeds but any store synchronously aborts such that copy_to_user() is
genuinely unable to make progress, but, well, don't do that...

CC: stable@vger.kernel.org
Reported-by: Chen Huang &lt;chenhuang5@huawei.com&gt;
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/dc03d5c675731a1f24a62417dba5429ad744234e.1626098433.git.robin.murphy@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Al reminds us that the usercopy API must only return complete failure
if absolutely nothing could be copied. Currently, if userspace does
something silly like giving us an unaligned pointer to Device memory,
or a size which overruns MTE tag bounds, we may fail to honour that
requirement when faulting on a multi-byte access even though a smaller
access could have succeeded.

Add a mitigation to the fixup routines to fall back to a single-byte
copy if we faulted on a larger access before anything has been written
to the destination, to guarantee making *some* forward progress. We
needn't be too concerned about the overall performance since this should
only occur when callers are doing something a bit dodgy in the first
place. Particularly broken userspace might still be able to trick
generic_perform_write() into an infinite loop by targeting write() at
an mmap() of some read-only device register where the fault-in load
succeeds but any store synchronously aborts such that copy_to_user() is
genuinely unable to make progress, but, well, don't do that...

CC: stable@vger.kernel.org
Reported-by: Chen Huang &lt;chenhuang5@huawei.com&gt;
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/dc03d5c675731a1f24a62417dba5429ad744234e.1626098433.git.robin.murphy@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: fix strlen() with CONFIG_KASAN_HW_TAGS</title>
<updated>2021-07-12T12:36:22+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2021-07-12T09:00:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f34b1eb2f8d4bba7d6352e767ef84bee9096d97'/>
<id>5f34b1eb2f8d4bba7d6352e767ef84bee9096d97</id>
<content type='text'>
When the kernel is built with CONFIG_KASAN_HW_TAGS and the CPU supports
MTE, memory accesses are checked at 16-byte granularity, and
out-of-bounds accesses can result in tag check faults. Our current
implementation of strlen() makes unaligned 16-byte accesses (within a
naturally aligned 4096-byte window), and can trigger tag check faults.

This can be seen at boot time, e.g.

| BUG: KASAN: invalid-access in __pi_strlen+0x14/0x150
| Read at addr f4ff0000c0028300 by task swapper/0/0
| Pointer tag: [f4], memory tag: [fe]
|
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-09550-g03c2813535a2-dirty #20
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x1b0
|  show_stack+0x1c/0x30
|  dump_stack_lvl+0x68/0x84
|  print_address_description+0x7c/0x2b4
|  kasan_report+0x138/0x38c
|  __do_kernel_fault+0x190/0x1c4
|  do_tag_check_fault+0x78/0x90
|  do_mem_abort+0x44/0xb4
|  el1_abort+0x40/0x60
|  el1h_64_sync_handler+0xb0/0xd0
|  el1h_64_sync+0x78/0x7c
|  __pi_strlen+0x14/0x150
|  __register_sysctl_table+0x7c4/0x890
|  register_leaf_sysctl_tables+0x1a4/0x210
|  register_leaf_sysctl_tables+0xc8/0x210
|  __register_sysctl_paths+0x22c/0x290
|  register_sysctl_table+0x2c/0x40
|  sysctl_init+0x20/0x30
|  proc_sys_init+0x3c/0x48
|  proc_root_init+0x80/0x9c
|  start_kernel+0x640/0x69c
|  __primary_switched+0xc0/0xc8

To fix this, we can reduce the (strlen-internal) MIN_PAGE_SIZE to 16
bytes when CONFIG_KASAN_HW_TAGS is selected. This will cause strlen() to
align the base pointer downwards to a 16-byte boundary, and to discard
the additional prefix bytes without counting them. All subsequent
accesses will be 16-byte aligned 16-byte LDPs. While the comments say
the body of the loop will access 32 bytes, this is performed as two
16-byte acceses, with the second made only if the first did not
encounter a NUL byte, so the body of the loop will not over-read across
a 16-byte boundary.

No other string routines are affected. The other str*() routines will
not make any access which straddles a 16-byte boundary, and the mem*()
routines will only make acceses which straddle a 16-byte boundary when
which is entirely within the bounds of the relevant base and size
arguments.

Fixes: 325a1de81287 ("arm64: Import updated version of Cortex Strings' strlen")
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Marco Elver &lt;elver@google.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/20210712090043.20847-1-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the kernel is built with CONFIG_KASAN_HW_TAGS and the CPU supports
MTE, memory accesses are checked at 16-byte granularity, and
out-of-bounds accesses can result in tag check faults. Our current
implementation of strlen() makes unaligned 16-byte accesses (within a
naturally aligned 4096-byte window), and can trigger tag check faults.

This can be seen at boot time, e.g.

| BUG: KASAN: invalid-access in __pi_strlen+0x14/0x150
| Read at addr f4ff0000c0028300 by task swapper/0/0
| Pointer tag: [f4], memory tag: [fe]
|
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-09550-g03c2813535a2-dirty #20
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x1b0
|  show_stack+0x1c/0x30
|  dump_stack_lvl+0x68/0x84
|  print_address_description+0x7c/0x2b4
|  kasan_report+0x138/0x38c
|  __do_kernel_fault+0x190/0x1c4
|  do_tag_check_fault+0x78/0x90
|  do_mem_abort+0x44/0xb4
|  el1_abort+0x40/0x60
|  el1h_64_sync_handler+0xb0/0xd0
|  el1h_64_sync+0x78/0x7c
|  __pi_strlen+0x14/0x150
|  __register_sysctl_table+0x7c4/0x890
|  register_leaf_sysctl_tables+0x1a4/0x210
|  register_leaf_sysctl_tables+0xc8/0x210
|  __register_sysctl_paths+0x22c/0x290
|  register_sysctl_table+0x2c/0x40
|  sysctl_init+0x20/0x30
|  proc_sys_init+0x3c/0x48
|  proc_root_init+0x80/0x9c
|  start_kernel+0x640/0x69c
|  __primary_switched+0xc0/0xc8

To fix this, we can reduce the (strlen-internal) MIN_PAGE_SIZE to 16
bytes when CONFIG_KASAN_HW_TAGS is selected. This will cause strlen() to
align the base pointer downwards to a 16-byte boundary, and to discard
the additional prefix bytes without counting them. All subsequent
accesses will be 16-byte aligned 16-byte LDPs. While the comments say
the body of the loop will access 32 bytes, this is performed as two
16-byte acceses, with the second made only if the first did not
encounter a NUL byte, so the body of the loop will not over-read across
a 16-byte boundary.

No other string routines are affected. The other str*() routines will
not make any access which straddles a 16-byte boundary, and the mem*()
routines will only make acceses which straddle a 16-byte boundary when
which is entirely within the bounds of the relevant base and size
arguments.

Fixes: 325a1de81287 ("arm64: Import updated version of Cortex Strings' strlen")
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Marco Elver &lt;elver@google.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/20210712090043.20847-1-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-next/mte' into for-next/core</title>
<updated>2021-06-24T13:05:25+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2021-06-24T13:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fdceddb06a5ff5ad3894cf9e8124d5af38ac5793'/>
<id>fdceddb06a5ff5ad3894cf9e8124d5af38ac5793</id>
<content type='text'>
KASAN optimisations for the hardware tagging (MTE) implementation.

* for-next/mte:
  kasan: disable freed user page poisoning with HW tags
  arm64: mte: handle tags zeroing at page allocation time
  kasan: use separate (un)poison implementation for integrated init
  mm: arch: remove indirection level in alloc_zeroed_user_highpage_movable()
  kasan: speed up mte_set_mem_tag_range
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
KASAN optimisations for the hardware tagging (MTE) implementation.

* for-next/mte:
  kasan: disable freed user page poisoning with HW tags
  arm64: mte: handle tags zeroing at page allocation time
  kasan: use separate (un)poison implementation for integrated init
  mm: arch: remove indirection level in alloc_zeroed_user_highpage_movable()
  kasan: speed up mte_set_mem_tag_range
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-next/kasan' into for-next/core</title>
<updated>2021-06-24T13:04:00+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2021-06-24T13:04:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2c9bd9d806757bc84e9d744044d6937a85df5f60'/>
<id>2c9bd9d806757bc84e9d744044d6937a85df5f60</id>
<content type='text'>
Optimise out-of-line KASAN checking when using software tagging.

* for-next/kasan:
  kasan: arm64: support specialized outlined tag mismatch checks
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Optimise out-of-line KASAN checking when using software tagging.

* for-next/kasan:
  kasan: arm64: support specialized outlined tag mismatch checks
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-next/insn' into for-next/core</title>
<updated>2021-06-24T13:03:24+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2021-06-24T13:03:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=181a126979307a0192f41a4a1fac235d6f4ac9f0'/>
<id>181a126979307a0192f41a4a1fac235d6f4ac9f0</id>
<content type='text'>
Refactoring of our instruction decoding routines and addition of some
missing encodings.

* for-next/insn:
  arm64: insn: avoid circular include dependency
  arm64: insn: move AARCH64_INSN_SIZE into &lt;asm/insn.h&gt;
  arm64: insn: decouple patching from insn code
  arm64: insn: Add load/store decoding helpers
  arm64: insn: Add some opcodes to instruction decoder
  arm64: insn: Add barrier encodings
  arm64: insn: Add SVE instruction class
  arm64: Move instruction encoder/decoder under lib/
  arm64: Move aarch32 condition check functions
  arm64: Move patching utilities out of instruction encoding/decoding
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Refactoring of our instruction decoding routines and addition of some
missing encodings.

* for-next/insn:
  arm64: insn: avoid circular include dependency
  arm64: insn: move AARCH64_INSN_SIZE into &lt;asm/insn.h&gt;
  arm64: insn: decouple patching from insn code
  arm64: insn: Add load/store decoding helpers
  arm64: insn: Add some opcodes to instruction decoder
  arm64: insn: Add barrier encodings
  arm64: insn: Add SVE instruction class
  arm64: Move instruction encoder/decoder under lib/
  arm64: Move aarch32 condition check functions
  arm64: Move patching utilities out of instruction encoding/decoding
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-next/cortex-strings' into for-next/core</title>
<updated>2021-06-24T12:33:57+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2021-06-24T12:33:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5ceb045541ad979fd304ca2321bf1fbb76189867'/>
<id>5ceb045541ad979fd304ca2321bf1fbb76189867</id>
<content type='text'>
Update our kernel string routines to the latest Cortex Strings
implementation.

* for-next/cortex-strings:
  arm64: update string routine copyrights and URLs
  arm64: Rewrite __arch_clear_user()
  arm64: Better optimised memchr()
  arm64: Import latest memcpy()/memmove() implementation
  arm64: Add assembly annotations for weak-PI-alias madness
  arm64: Import latest version of Cortex Strings' strncmp
  arm64: Import updated version of Cortex Strings' strlen
  arm64: Import latest version of Cortex Strings' strcmp
  arm64: Import latest version of Cortex Strings' memcmp
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update our kernel string routines to the latest Cortex Strings
implementation.

* for-next/cortex-strings:
  arm64: update string routine copyrights and URLs
  arm64: Rewrite __arch_clear_user()
  arm64: Better optimised memchr()
  arm64: Import latest memcpy()/memmove() implementation
  arm64: Add assembly annotations for weak-PI-alias madness
  arm64: Import latest version of Cortex Strings' strncmp
  arm64: Import updated version of Cortex Strings' strlen
  arm64: Import latest version of Cortex Strings' strcmp
  arm64: Import latest version of Cortex Strings' memcmp
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: mte: handle tags zeroing at page allocation time</title>
<updated>2021-06-04T18:32:21+00:00</updated>
<author>
<name>Peter Collingbourne</name>
<email>pcc@google.com</email>
</author>
<published>2021-06-02T23:52:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=013bb59dbb7cf876449df860946458a595a96d51'/>
<id>013bb59dbb7cf876449df860946458a595a96d51</id>
<content type='text'>
Currently, on an anonymous page fault, the kernel allocates a zeroed
page and maps it in user space. If the mapping is tagged (PROT_MTE),
set_pte_at() additionally clears the tags. It is, however, more
efficient to clear the tags at the same time as zeroing the data on
allocation. To avoid clearing the tags on any page (which may not be
mapped as tagged), only do this if the vma flags contain VM_MTE. This
requires introducing a new GFP flag that is used to determine whether
to clear the tags.

The DC GZVA instruction with a 0 top byte (and 0 tag) requires
top-byte-ignore. Set the TCR_EL1.{TBI1,TBID1} bits irrespective of
whether KASAN_HW is enabled.

Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Co-developed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Link: https://linux-review.googlesource.com/id/Id46dc94e30fe11474f7e54f5d65e7658dbdddb26
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Link: https://lore.kernel.org/r/20210602235230.3928842-4-pcc@google.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, on an anonymous page fault, the kernel allocates a zeroed
page and maps it in user space. If the mapping is tagged (PROT_MTE),
set_pte_at() additionally clears the tags. It is, however, more
efficient to clear the tags at the same time as zeroing the data on
allocation. To avoid clearing the tags on any page (which may not be
mapped as tagged), only do this if the vma flags contain VM_MTE. This
requires introducing a new GFP flag that is used to determine whether
to clear the tags.

The DC GZVA instruction with a 0 top byte (and 0 tag) requires
top-byte-ignore. Set the TCR_EL1.{TBI1,TBID1} bits irrespective of
whether KASAN_HW is enabled.

Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Co-developed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Link: https://linux-review.googlesource.com/id/Id46dc94e30fe11474f7e54f5d65e7658dbdddb26
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Link: https://lore.kernel.org/r/20210602235230.3928842-4-pcc@google.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: update string routine copyrights and URLs</title>
<updated>2021-06-02T16:58:26+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2021-06-02T15:13:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6b8f648959e5036695f056a60e3444f4753f643e'/>
<id>6b8f648959e5036695f056a60e3444f4753f643e</id>
<content type='text'>
To make future archaeology easier, let's have the string routine comment
blocks encode the specific upstream commit ID they were imported from.
These are the same commit IDs as listed in the commits importing the
code, expanded to 16 characters. Note that the routines have different
commit IDs, each reprsenting the latest upstream commit which changed
the particular routine.

At the same time, let's consistently include 2021 in the copyright
dates.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lore.kernel.org/r/20210602151358.35571-1-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To make future archaeology easier, let's have the string routine comment
blocks encode the specific upstream commit ID they were imported from.
These are the same commit IDs as listed in the commits importing the
code, expanded to 16 characters. Note that the routines have different
commit IDs, each reprsenting the latest upstream commit which changed
the particular routine.

At the same time, let's consistently include 2021 in the copyright
dates.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lore.kernel.org/r/20210602151358.35571-1-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: Rewrite __arch_clear_user()</title>
<updated>2021-06-01T17:34:38+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2021-05-27T15:34:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=344323e0428b9911406bede6cff23d1482c19eae'/>
<id>344323e0428b9911406bede6cff23d1482c19eae</id>
<content type='text'>
Now that we're always using STTR variants rather than abstracting two
different addressing modes, the user_ldst macro here is frankly more
obfuscating than helpful. Rewrite __arch_clear_user() with regular
USER() annotations so that it's clearer what's going on, and take the
opportunity to minimise the branchiness in the most common paths, while
also allowing the exception fixup to return an accurate result.

Apparently some folks examine large reads from /dev/zero closely enough
to notice the loop being hot, so align it per the other critical loops
(presumably around a typical instruction fetch granularity).

Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/1cbd78b12c076a8ad4656a345811cfb9425df0b3.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we're always using STTR variants rather than abstracting two
different addressing modes, the user_ldst macro here is frankly more
obfuscating than helpful. Rewrite __arch_clear_user() with regular
USER() annotations so that it's clearer what's going on, and take the
opportunity to minimise the branchiness in the most common paths, while
also allowing the exception fixup to return an accurate result.

Apparently some folks examine large reads from /dev/zero closely enough
to notice the loop being hot, so align it per the other critical loops
(presumably around a typical instruction fetch granularity).

Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/1cbd78b12c076a8ad4656a345811cfb9425df0b3.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: Better optimised memchr()</title>
<updated>2021-06-01T17:34:38+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2021-05-27T15:34:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9e51cafd783b22018fb15bfb06d65f69349223a9'/>
<id>9e51cafd783b22018fb15bfb06d65f69349223a9</id>
<content type='text'>
Although we implement our own assembly version of memchr(), it turns
out to be barely any better than what GCC can generate for the generic
C version (and would go wrong if the size_t argument were ever large
enough to be interpreted as negative). Unfortunately we can't import the
tuned implementation from the Arm optimized-routines library, since that
has some Advanced SIMD parts which are not really viable for general
kernel library code. What we can do, however, is pep things up with some
relatively straightforward word-at-a-time logic for larger calls.

Adding some timing to optimized-routines' memchr() test for a simple
benchmark, overall this version comes in around half as fast as the SIMD
code, but still nearly 4x faster than our existing implementation.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/58471b42f9287e039dafa9e5e7035077152438fd.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Although we implement our own assembly version of memchr(), it turns
out to be barely any better than what GCC can generate for the generic
C version (and would go wrong if the size_t argument were ever large
enough to be interpreted as negative). Unfortunately we can't import the
tuned implementation from the Arm optimized-routines library, since that
has some Advanced SIMD parts which are not really viable for general
kernel library code. What we can do, however, is pep things up with some
relatively straightforward word-at-a-time logic for larger calls.

Adding some timing to optimized-routines' memchr() test for a simple
benchmark, overall this version comes in around half as fast as the SIMD
code, but still nearly 4x faster than our existing implementation.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Link: https://lore.kernel.org/r/58471b42f9287e039dafa9e5e7035077152438fd.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
