linux.git/arch/csky/include/asm, branch v5.19

csky/tlb: Remove tlb_flush() define

2022-07-21T17:50:13+00:00

The previous patch removed the tlb_flush_end() implementation which
used tlb_flush_range(). This means:

 - csky did double invalidates, a range invalidate per vma and a full
   invalidate at the end

 - csky actually has range invalidates and as such the generic
   tlb_flush implementation is more efficient for it.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Will Deacon 
Tested-by: Guo Ren 
Signed-off-by: Linus Torvalds

mmu_gather: Remove per arch tlb_{start,end}_vma()

2022-07-21T17:50:13+00:00

Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.

 - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
   but does *NOT* want to call it for each VMA.

 - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
   invalidate across multiple VMAs if possible.

With these it is possible to capture the three forms:

  1) empty stubs;
     select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS

  2) start: flush_cache_range(), end: empty;
     select MMU_GATHER_MERGE_VMAS

  3) start: flush_cache_range(), end: flush_tlb_range();
     default

Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Will Deacon 
Cc: David Miller 
Signed-off-by: Linus Torvalds

Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2022-05-26T19:32:41+00:00

Pull MM updates from Andrew Morton:
"Almost all of MM here. A few things are still getting finished off,
reviewed, etc.

- Yang Shi has improved the behaviour of khugepaged collapsing of
readonly file-backed transparent hugepages.

- Johannes Weiner has arranged for zswap memory use to be tracked and
managed on a per-cgroup basis.

- Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
runtime enablement of the recent huge page vmemmap optimization
feature.

- Baolin Wang contributes a series to fix some issues around hugetlb
pagetable invalidation.

- Zhenwei Pi has fixed some interactions between hwpoisoned pages and
virtualization.

- Tong Tiangen has enabled the use of the presently x86-only
page_table_check debugging feature on arm64 and riscv.

- David Vernet has done some fixup work on the memcg selftests.

- Peter Xu has taught userfaultfd to handle write protection faults
against shmem- and hugetlbfs-backed files.

- More DAMON development from SeongJae Park - adding online tuning of
the feature and support for monitoring of fixed virtual address
ranges. Also easier discovery of which monitoring operations are
available.

- Nadav Amit has done some optimization of TLB flushing during
mprotect().

- Neil Brown continues to labor away at improving our swap-over-NFS
support.

- David Hildenbrand has some fixes to anon page COWing versus
get_user_pages().

- Peng Liu fixed some errors in the core hugetlb code.

- Joao Martins has reduced the amount of memory consumed by
device-dax's compound devmaps.

- Some cleanups of the arch-specific pagemap code from Anshuman
Khandual.

- Muchun Song has found and fixed some errors in the TLB flushing of
transparent hugepages.

- Roman Gushchin has done more work on the memcg selftests.

... and, of course, many smaller fixes and cleanups. Notably, the
customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
mm: kfence: use PAGE_ALIGNED helper
selftests: vm: add the "settings" file with timeout variable
selftests: vm: add "test_hmm.sh" to TEST_FILES
selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
selftests: vm: add migration to the .gitignore
selftests/vm/pkeys: fix typo in comment
ksm: fix typo in comment
selftests: vm: add process_mrelease tests
Revert "mm/vmscan: never demote for memcg reclaim"
mm/kfence: print disabling or re-enabling message
include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
mm: fix a potential infinite loop in start_isolate_page_range()
MAINTAINERS: add Muchun as co-maintainer for HugeTLB
zram: fix Kconfig dependency warning
mm/shmem: fix shmem folio swapoff hang
cgroup: fix an error handling path in alloc_pagecache_max_30M()
mm: damon: use HPAGE_PMD_SIZE
tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
nodemask.h: fix compilation error with GCC12
...

Merge tag 'asm-generic-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

2022-05-26T17:50:30+00:00

Pull asm-generic updates from Arnd Bergmann:
 "The asm-generic tree contains three separate changes for linux-5.19:

   - The h8300 architecture is retired after it has been effectively
     unmaintained for a number of years. This is the last architecture
     we supported that has no MMU implementation, but there are still a
     few architectures (arm, m68k, riscv, sh and xtensa) that support
     CPUs with and without an MMU.

   - A series to add a generic ticket spinlock that can be shared by
     most architectures with a working cmpxchg or ll/sc type atomic,
     including the conversion of riscv, csky and openrisc. This series
     is also a prerequisite for the loongarch64 architecture port that
     will come as a separate pull request.

   - A cleanup of some exported uapi header files to ensure they can be
     included from user space without relying on other kernel headers"

* tag 'asm-generic-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  h8300: remove stale bindings and symlink
  sparc: add asm/stat.h to UAPI compile-test coverage
  powerpc: add asm/stat.h to UAPI compile-test coverage
  mips: add asm/stat.h to UAPI compile-test coverage
  riscv: add linux/bpf_perf_event.h to UAPI compile-test coverage
  kbuild: prevent exported headers from including , 
  agpgart.h: do not include  from exported header
  csky: Move to generic ticket-spinlock
  RISC-V: Move to queued RW locks
  RISC-V: Move to generic spinlocks
  openrisc: Move to ticket-spinlock
  asm-generic: qrwlock: Document the spinlock fairness requirements
  asm-generic: qspinlock: Indicate the use of mixed-size atomics
  asm-generic: ticket-lock: New generic ticket-based spinlock
  remove the h8300 architecture

printk: stop including cache.h from printk.h

2022-05-13T14:20:07+00:00

An inclusion of cache.h in printk.h was added in 2014 in commit
c28aa1f0a847 ("printk/cache: mark printk_once test variable
__read_mostly") in order to bring in the definition of __read_mostly.  The
usage of __read_mostly was later removed in commit 3ec25826ae33 ("printk:
Tie printk_once / printk_deferred_once into .data.once for reset") which
made the inclusion of cache.h unnecessary, so remove it.

We have a small amount of code that depended on the inclusion of cache.h
from printk.h; fix that code to include the appropriate header.

This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h
-> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h ->
linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that
would otherwise be introduced by the next patch.

Build tested using {allyesconfig,defconfig} x {arm64,x86_64}.

Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba
Link: https://lkml.kernel.org/r/20220427195820.1716975-1-pcc@google.com
Signed-off-by: Peter Collingbourne 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Andrey Ryabinin 
Cc: Catalin Marinas 
Cc: David Rientjes 
Cc: Dmitry Vyukov 
Cc: Eric W. Biederman 
Cc: Herbert Xu 
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim 
Cc: Kees Cook 
Cc: Pekka Enberg 
Cc: Roman Gushchin 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton

csky: Move to generic ticket-spinlock

2022-05-11T18:50:15+00:00

There is no benefit from custom implementation for ticket-spinlock,
so move to generic ticket-spinlock for easy maintenance.

Signed-off-by: Guo Ren 
Reviewed-by: Arnd Bergmann 
Signed-off-by: Palmer Dabbelt

csky: atomic: Add conditional atomic operations' optimization

2022-04-25T05:51:43+00:00

Add conditional atomic operations' optimization:
 - arch_atomic_fetch_add_unless
 - arch_atomic_inc_unless_negative
 - arch_atomic_dec_unless_positive
 - arch_atomic_dec_if_positive

Comments by Boqun:

FWIW, you probably need to make sure that a barrier instruction inside
an lr/sc loop is a good thing. IIUC, the execution time of a barrier
instruction is determined by the status of store buffers and invalidate
queues (and probably other stuffs), so it may increase the execution
time of the lr/sc loop, and make it unlikely to succeed. But this really
depends on how the arch executes these instructions.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Boqun Feng

csky: atomic: Add custom atomic.h implementation

2022-04-25T05:51:42+00:00

The generic atomic.h used cmpxchg to implement the atomic
operations, it will cause daul loop to reduce the forward
guarantee. The patch implement csky custom atomic operations with
ldex/stex instructions for the best performance.

Important comment by Rutland:
8e86f0b409a4 ("arm64: atomics: fix use of acquire + release for
full barrier semantics")

Link: https://lore.kernel.org/linux-riscv/CAJF2gTSAxpAi=LbAdu7jntZRUa=-dJwL0VfmDfBV5MHB=rcZ-w@mail.gmail.com/T/#m27a0f1342995deae49ce1d0e1f2683f8a181d6c3
Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Mark Rutland

csky: atomic: Optimize cmpxchg with acquire & release

2022-04-25T05:51:42+00:00

Optimize cmpxchg with ASM acquire/release fence ASM instructions
instead of previous generic based. Prevent a fence when cmxchg's
first load != old.

Comments by Rutland:

8e86f0b409a4 ("arm64: atomics: fix use of acquire + release for
full barrier semantics")

Comments by Boqun:

FWIW, you probably need to make sure that a barrier instruction inside
an lr/sc loop is a good thing. IIUC, the execution time of a barrier
instruction is determined by the status of store buffers and invalidate
queues (and probably other stuffs), so it may increase the execution
time of the lr/sc loop, and make it unlikely to succeed. But this really
depends on how the arch executes these instructions.

Link: https://lore.kernel.org/linux-riscv/CAJF2gTSAxpAi=LbAdu7jntZRUa=-dJwL0VfmDfBV5MHB=rcZ-w@mail.gmail.com/T/#m27a0f1342995deae49ce1d0e1f2683f8a181d6c3
Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Cc: Mark Rutland

csky: optimize memcpy_{from,to}io() and memset_io()

2022-04-18T13:23:55+00:00

Optimize memcpy_{from,to}io() and memset_io() by transferring in
64 bit as much as possible with minimized barrier usage.  This
simplest optimization brings faster throughput compare to current
byte-by-byte read and write with barrier in the loop. Code's
skeleton is taken from the powerpc & arm64.

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren