linux.git/arch/arm64/include, branch v6.11

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

2024-08-17T00:02:32+00:00

Pull arm64 fixes from Catalin Marinas:

 - Fix the arm64 __get_mem_asm() to use the _ASM_EXTABLE_##type##ACCESS()
   macro instead of the *_ERR() one in order to avoid writing -EFAULT to
   the value register in case of a fault

 - Initialise all elements of the acpi_early_node_map[] to NUMA_NO_NODE.
   Prior to this fix, only the first element was initialised

 - Move the KASAN random tag seed initialisation after the per-CPU areas
   have been initialised (prng_state is __percpu)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Fix KASAN random tag seed initialization
  arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
  arm64: uaccess: correct thinko in __get_mem_asm()

arm64: uaccess: correct thinko in __get_mem_asm()

2024-08-14T16:51:11+00:00

In the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y version of __get_mem_asm(), we
incorrectly use _ASM_EXTABLE_##type##ACCESS_ERR() such that upon a fault
the extable fixup handler writes -EFAULT into "%w0", which is the
register containing 'x' (the result of the load).

This was a thinko in commit:

  86a6a68febfcf57b ("arm64: start using 'asm goto' for get_user() when available")

Prior to that commit _ASM_EXTABLE_##type##ACCESS_ERR_ZERO() was used
such that the extable fixup handler wrote -EFAULT into "%w0" (the
register containing 'err'), and zero into "%w1" (the register containing
'x'). When the 'err' variable was removed, the extable entry was updated
incorrectly.

Writing -EFAULT to the value register is unnecessary but benign:

* We never want -EFAULT in the value register, and previously this would
  have been zeroed in the extable fixup handler.

* In __get_user_error() the value is overwritten with zero explicitly in
  the error path.

* The asm goto outputs cannot be used when the goto label is taken, as
  older compilers (e.g. clang < 16.0.0) do not guarantee that asm goto
  outputs are usable in this path and may use a stale value rather than
  the value in an output register. Consequently, zeroing in the extable
  fixup handler is insufficient to ensure callers see zero in the error
  path.

* The expected usage of unsafe_get_user() and get_kernel_nofault()
  requires that the value is not consumed in the error path.

Some versions of GCC would mis-compile asm goto with outputs, and
erroneously omit subsequent assignments, breaking the error path
handling in __get_user_error(). This was discussed at:

  https://lore.kernel.org/lkml/ZpfxLrJAOF2YNqCk@J2N7QTR9R3.cambridge.arm.com/

... and was fixed by removing support for asm goto with outputs on those
broken compilers in commit:

  f2f6a8e887172503 ("init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND")

With that out of the way, we can safely replace the usage of
_ASM_EXTABLE_##type##ACCESS_ERR() with _ASM_EXTABLE_##type##ACCESS(),
leaving the value register unchanged in the case a fault is taken, as
was originally intended. This matches other architectures and matches
our __put_mem_asm().

Signed-off-by: Mark Rutland 
Cc: Will Deacon 
Link: https://lore.kernel.org/r/20240807103731.2498893-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

Merge tag 'kvmarm-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

2024-08-13T10:06:27+00:00

KVM/arm64 fixes for 6.11, round #1

 - Use kvfree() for the kvmalloc'd nested MMUs array

 - Set of fixes to address warnings in W=1 builds

 - Make KVM depend on assembler support for ARMv8.4

 - Fix for vgic-debug interface for VMs without LPIs

 - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest

 - Minor code / comment cleanups for configuring PAuth traps

 - Take kvm->arch.config_lock to prevent destruction / initialization
   race for a vCPU's CPUIF which may lead to a UAF

KVM: arm64: Tidying up PAuth code in KVM

2024-08-07T19:11:05+00:00

Tidy up some of the PAuth trapping code to clear up some comments
and avoid clang/checkpatch warnings. Also, don't bother setting
PAuth HCR_EL2 bits in pKVM, since it's handled by the hypervisor.

Signed-off-by: Fuad Tabba 
Link: https://lore.kernel.org/r/20240722163311.1493879-1-tabba@google.com
Signed-off-by: Oliver Upton

arm64: jump_label: Ensure patched jump_labels are visible to all CPUs

2024-08-02T14:07:01+00:00

Although the Arm architecture permits concurrent modification and
execution of NOP and branch instructions, it still requires some
synchronisation to ensure that other CPUs consistently execute the newly
written instruction:

 >  When the modified instructions are observable, each PE that is
 >  executing the modified instructions must execute an ISB or perform a
 >  context synchronizing event to ensure execution of the modified
 >  instructions

Prior to commit f6cc0c501649 ("arm64: Avoid calling stop_machine() when
patching jump labels"), the arm64 jump_label patching machinery
performed synchronisation using stop_machine() after each modification,
however this was problematic when flipping static keys from atomic
contexts (namely, the arm_arch_timer CPU hotplug startup notifier) and
so we switched to the _nosync() patching routines to avoid "scheduling
while atomic" BUG()s during boot.

In hindsight, the analysis of the issue in f6cc0c501649 isn't quite
right: it cites the use of IPIs in the default patching routines as the
cause of the lockup, whereas stop_machine() does not rely on IPIs and
the I-cache invalidation is performed using __flush_icache_range(),
which elides the call to kick_all_cpus_sync(). In fact, the blocking
wait for other CPUs is what triggers the BUG() and the problem remains
even after f6cc0c501649, for example because we could block on the
jump_label_mutex. Eventually, the arm_arch_timer driver was fixed to
avoid the static key entirely in commit a862fc2254bd
("clocksource/arm_arch_timer: Remove use of workaround static key").

This all leaves the jump_label patching code in a funny situation on
arm64 as we do not synchronise with other CPUs to reduce the likelihood
of a bug which no longer exists. Consequently, toggling a static key on
one CPU cannot be assumed to take effect on other CPUs, leading to
potential issues, for example with missing preempt notifiers.

Rather than revert f6cc0c501649 and go back to stop_machine() for each
patch site, implement arch_jump_label_transform_apply() and kick all
the other CPUs with an IPI at the end of patching.

Cc: Alexander Potapenko 
Cc: Mark Rutland 
Cc: Marc Zyngier 
Fixes: f6cc0c501649 ("arm64: Avoid calling stop_machine() when patching jump labels")
Signed-off-by: Will Deacon 
Reviewed-by: Catalin Marinas 
Reviewed-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20240731133601.3073-1-will@kernel.org
Signed-off-by: Catalin Marinas

arm64: cputype: Add Cortex-A725 definitions

2024-08-01T15:11:27+00:00

Add cputype definitions for Cortex-A725. These will be used for errata
detection in subsequent patches.

These values can be found in the Cortex-A725 TRM:

  https://developer.arm.com/documentation/107652/0001/

... in table A-247 ("MIDR_EL1 bit descriptions").

Signed-off-by: Mark Rutland 
Cc: James Morse 
Cc: Will Deacon 
Reviewed-by: Anshuman Khandual 
Link: https://lore.kernel.org/r/20240801101803.1982459-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

arm64: cputype: Add Cortex-X1C definitions

2024-08-01T15:11:26+00:00

Add cputype definitions for Cortex-X1C. These will be used for errata
detection in subsequent patches.

These values can be found in the Cortex-X1C TRM:

  https://developer.arm.com/documentation/101968/0002/

... in section B2.107 ("MIDR_EL1, Main ID Register, EL1").

Signed-off-by: Mark Rutland 
Cc: James Morse 
Cc: Will Deacon 
Reviewed-by: Anshuman Khandual 
Link: https://lore.kernel.org/r/20240801101803.1982459-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

2024-07-26T17:39:10+00:00

Pull arm64 fixes from Will Deacon:
 "The usual summary below, but the main fix is for the fast GUP lockless
  page-table walk when we have a combination of compile-time and
  run-time folding of the p4d and the pud respectively.

   - Remove some redundant Kconfig conditionals

   - Fix string output in ptrace selftest

   - Fix fast GUP crashes in some page-table configurations

   - Remove obsolete linker option when building the vDSO

   - Fix some sysreg field definitions for the GIC"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: Fix lockless walks with static and dynamic page-table folding
  arm64/sysreg: Correct the values for GICv4.1
  arm64/vdso: Remove --hash-style=sysv
  kselftest: missing arg in ptrace.c
  arm64/Kconfig: Remove redundant 'if HAVE_FUNCTION_GRAPH_TRACER'
  arm64: remove redundant 'if HAVE_ARCH_KASAN' in Kconfig

arm64: mm: Fix lockless walks with static and dynamic page-table folding

2024-07-25T12:20:55+00:00

Lina reports random oopsen originating from the fast GUP code when
16K pages are used with 4-level page-tables, the fourth level being
folded at runtime due to lack of LPA2.

In this configuration, the generic implementation of
p4d_offset_lockless() will return a 'p4d_t *' corresponding to the
'pgd_t' allocated on the stack of the caller, gup_fast_pgd_range().
This is normally fine, but when the fourth level of page-table is folded
at runtime, pud_offset_lockless() will offset from the address of the
'p4d_t' to calculate the address of the PUD in the same page-table page.
This results in a stray stack read when the 'p4d_t' has been allocated
on the stack and can send the walker into the weeds.

Fix the problem by providing our own definition of p4d_offset_lockless()
when CONFIG_PGTABLE_LEVELS <= 4 which returns the real page-table
pointer rather than the address of the local stack variable.

Cc: Catalin Marinas 
Cc: Ard Biesheuvel 
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net
Fixes: 0dd4f60a2c76 ("arm64: mm: Add support for folding PUDs at runtime")
Reported-by: Asahi Lina 
Reviewed-by: Ard Biesheuvel 
Link: https://lore.kernel.org/r/20240725090345.28461-1-will@kernel.org
Signed-off-by: Will Deacon

Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2024-07-22T00:15:46+00:00

Pull MM updates from Andrew Morton:

- In the series "mm: Avoid possible overflows in dirty throttling" Jan
Kara addresses a couple of issues in the writeback throttling code.
These fixes are also targetted at -stable kernels.

- Ryusuke Konishi's series "nilfs2: fix potential issues related to
reserved inodes" does that. This should actually be in the
mm-nonmm-stable tree, along with the many other nilfs2 patches. My
bad.

- More folio conversions from Kefeng Wang in the series "mm: convert to
folio_alloc_mpol()"

- Kemeng Shi has sent some cleanups to the writeback code in the series
"Add helper functions to remove repeated code and improve readability
of cgroup writeback"

- Kairui Song has made the swap code a little smaller and a little
faster in the series "mm/swap: clean up and optimize swap cache
index".

- In the series "mm/memory: cleanly support zeropage in
vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
Hildenbrand has reworked the rather sketchy handling of the use of
the zeropage in MAP_SHARED mappings. I don't see any runtime effects
here - more a cleanup/understandability/maintainablity thing.

- Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
of higher addresses, for aarch64. The (poorly named) series is
"Restructure va_high_addr_switch".

- The core TLB handling code gets some cleanups and possible slight
optimizations in Bang Li's series "Add update_mmu_tlb_range() to
simplify code".

- Jane Chu has improved the handling of our
fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
the series "Enhance soft hwpoison handling and injection".

- Jeff Johnson has sent a billion patches everywhere to add
MODULE_DESCRIPTION() to everything. Some landed in this pull.

- In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
has simplified migration's use of hardware-offload memory copying.

- Yosry Ahmed performs more folio API conversions in his series "mm:
zswap: trivial folio conversions".

- In the series "large folios swap-in: handle refault cases first",
Chuanhua Han inches us forward in the handling of large pages in the
swap code. This is a cleanup and optimization, working toward the end
objective of full support of large folio swapin/out.

- In the series "mm,swap: cleanup VMA based swap readahead window
calculation", Huang Ying has contributed some cleanups and a possible
fixlet to his VMA based swap readahead code.

- In the series "add mTHP support for anonymous shmem" Baolin Wang has
taught anonymous shmem mappings to use multisize THP. By default this
is a no-op - users must opt in vis sysfs controls. Dramatic
improvements in pagefault latency are realized.

- David Hildenbrand has some cleanups to our remaining use of
page_mapcount() in the series "fs/proc: move page_mapcount() to
fs/proc/internal.h".

- David also has some highmem accounting cleanups in the series
"mm/highmem: don't track highmem pages manually".

- Build-time fixes and cleanups from John Hubbard in the series
"cleanups, fixes, and progress towards avoiding "make headers"".

- Cleanups and consolidation of the core pagemap handling from Barry
Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
and utilize them".

- Lance Yang's series "Reclaim lazyfree THP without splitting" has
reduced the latency of the reclaim of pmd-mapped THPs under fairly
common circumstances. A 10x speedup is seen in a microbenchmark.

It does this by punting to aother CPU but I guess that's a win unless
all CPUs are pegged.

- hugetlb_cgroup cleanups from Xiu Jianfeng in the series
"mm/hugetlb_cgroup: rework on cftypes".

- Miaohe Lin's series "Some cleanups for memory-failure" does just that
thing.

- Someone other than SeongJae has developed a DAMON feature in Honggyu
Kim's series "DAMON based tiered memory management for CXL memory".
This adds DAMON features which may be used to help determine the
efficiency of our placement of CXL/PCIe attached DRAM.

- DAMON user API centralization and simplificatio work in SeongJae
Park's series "mm/damon: introduce DAMON parameters online commit
function".

- In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
David Hildenbrand does some maintenance work on zsmalloc - partially
modernizing its use of pageframe fields.

- Kefeng Wang provides more folio conversions in the series "mm: remove
page_maybe_dma_pinned() and page_mkclean()".

- More cleanup from David Hildenbrand, this time in the series
"mm/memory_hotplug: use PageOffline() instead of PageReserved() for
!ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
pages" and permits the removal of some virtio-mem hacks.

- Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
__folio_add_anon_rmap()" is a cleanup to the anon folio handling in
preparation for mTHP (multisize THP) swapin.

- Kefeng Wang's series "mm: improve clear and copy user folio"
implements more folio conversions, this time in the area of large
folio userspace copying.

- The series "Docs/mm/damon/maintaier-profile: document a mailing tool
and community meetup series" tells people how to get better involved
with other DAMON developers. From SeongJae Park.

- A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
that.

- David Hildenbrand sends along more cleanups, this time against the
migration code. The series is "mm/migrate: move NUMA hinting fault
folio isolation + checks under PTL".

- Jan Kara has found quite a lot of strangenesses and minor errors in
the readahead code. He addresses this in the series "mm: Fix various
readahead quirks".

- SeongJae Park's series "selftests/damon: test DAMOS tried regions and
{min,max}_nr_regions" adds features and addresses errors in DAMON's
self testing code.

- Gavin Shan has found a userspace-triggerable WARN in the pagecache
code. The series "mm/filemap: Limit page cache size to that supported
by xarray" addresses this. The series is marked cc:stable.

- Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
and cleanup" cleans up and slightly optimizes KSM.

- Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
code motion. The series (which also makes the memcg-v1 code
Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
under config option" and "mm: memcg: put cgroup v1-specific memcg
data under CONFIG_MEMCG_V1"

- Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
adds an additional feature to this cgroup-v2 control file.

- The series "Userspace controls soft-offline pages" from Jiaqi Yan
permits userspace to stop the kernel's automatic treatment of
excessive correctable memory errors. In order to permit userspace to
monitor and handle this situation.

- Kefeng Wang's series "mm: migrate: support poison recover from
migrate folio" teaches the kernel to appropriately handle migration
from poisoned source folios rather than simply panicing.

- SeongJae Park's series "Docs/damon: minor fixups and improvements"
does those things.

- In the series "mm/zsmalloc: change back to per-size_class lock"
Chengming Zhou improves zsmalloc's scalability and memory
utilization.

- Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
pinning memfd folios" makes the GUP code use FOLL_PIN rather than
bare refcount increments. So these paes can first be moved aside if
they reside in the movable zone or a CMA block.

- Andrii Nakryiko has added a binary ioctl()-based API to
/proc/pid/maps for much faster reading of vma information. The series
is "query VMAs from /proc//maps".

- In the series "mm: introduce per-order mTHP split counters" Lance
Yang improves the kernel's presentation of developer information
related to multisize THP splitting.

- Michael Ellerman has developed the series "Reimplement huge pages
without hugepd on powerpc (8xx, e500, book3s/64)". This permits
userspace to use all available huge page sizes.

- In the series "revert unconditional slab and page allocator fault
injection calls" Vlastimil Babka removes a performance-affecting and
not very useful feature from slab fault injection.

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
mm/mglru: fix ineffective protection calculation
mm/zswap: fix a white space issue
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix possible recursive locking detected warning
mm/gup: clear the LRU flag of a page before adding to LRU batch
mm/numa_balancing: teach mpol_to_str about the balancing mode
mm: memcg1: convert charge move flags to unsigned long long
alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
lib: reuse page_ext_data() to obtain codetag_ref
lib: add missing newline character in the warning message
mm/mglru: fix overshooting shrinker memory
mm/mglru: fix div-by-zero in vmpressure_calc_level()
mm/kmemleak: replace strncpy() with strscpy()
mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
mm: ignore data-race in __swap_writepage
hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
mm: shmem: rename mTHP shmem counters
mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
mm/migrate: putback split folios when numa hint migration fails
...