linux-stable.git/arch/mips/include/asm/bitops.h, branch v5.4

mips/atomic: Fix smp_mb__{before,after}_atomic()

2019-08-31T10:06:02+00:00

Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as MIPS
without WEAK_REORDERING_BEYOND_LLSC) fail for:

	*x = 1;
	atomic_inc(u);
	smp_mb__after_atomic();
	r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

	atomic_inc(u);
	*x = 1;
	smp_mb__after_atomic();
	r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

	atomic_inc(u);
	r0 = *y;
	*x = 1;

And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier.
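
As a rough user-space illustration of the idea (not the kernel's actual
MIPS implementation), the sketch below wraps a relaxed atomic increment
in compiler-only barriers. The names compiler_barrier(),
sketch_atomic_inc() and sketch_sequence() are hypothetical, and the GCC
__atomic builtin stands in for the LL/SC loop:

```c
#include <assert.h>

/*
 * Compiler-only barrier: emits no instruction, but the "memory" clobber
 * forbids the compiler from moving memory accesses across it.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical stand-in for an atomic RmW op. The relaxed builtin gives
 * the compiler no ordering constraint by itself; the barriers on both
 * sides are what keep surrounding loads and stores in place.
 */
static inline void sketch_atomic_inc(int *u)
{
	compiler_barrier();
	__atomic_fetch_add(u, 1, __ATOMIC_RELAXED);
	compiler_barrier();
}

static int sketch_x, sketch_y, sketch_u;

/* Mirrors the failing sequence from the message above. */
static int sketch_sequence(void)
{
	sketch_x = 1;
	sketch_atomic_inc(&sketch_u);
	compiler_barrier();	/* stands in for smp_mb__after_atomic() */
	return sketch_y;
}
```

With the barriers inside sketch_atomic_inc(), the compiler can no longer
sink the store to sketch_x below the increment.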

Reported-by: Andrea Parri 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Paul Burton

mips/atomic: Fix loongson_llsc_mb() wreckage

2019-08-31T10:05:17+00:00

The comment describing the loongson_llsc_mb() reorder case doesn't
make any sense whatsoever. Instruction re-ordering is not an SMP
artifact, but rather a CPU-local phenomenon. Clarify the comment by
explaining that these issues cause a coherence failure.

For the branch speculation case: if futex_atomic_cmpxchg_inatomic()
needs one at the bne branch target, then surely the normal
__cmpxchg_asm() implementation does too. We cannot rely on the
barriers from cmpxchg() because cmpxchg_local() is implemented with
the same macro, and branch prediction and speculation are also CPU
local.

Fixes: e02e07e3127d ("MIPS: Loongson: Introduce and use loongson_llsc_mb()")
Cc: Huacai Chen 
Cc: Huang Pei 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Paul Burton

MIPS: mark fls() and ffs() as __always_inline

2019-05-15T02:52:48+00:00

This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common
place.  We need to eliminate potential issues beforehand.

If it is enabled for mips, the following errors are reported:

  arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2':
  sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base'
  sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base'
  sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base'
  sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base'
  sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base'
  arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined references to `mips_gcr_base'

Link: http://lkml.kernel.org/r/20190423034959.13525-7-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Boris Brezillon 
Cc: Borislav Petkov 
Cc: Brian Norris 
Cc: Christophe Leroy 
Cc: David Woodhouse 
Cc: Heiko Carstens 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Marek Vasut 
Cc: Mark Rutland 
Cc: Mathieu Malaterre 
Cc: Miquel Raynal 
Cc: Paul Mackerras 
Cc: Ralf Baechle 
Cc: Richard Weinberger 
Cc: Russell King 
Cc: Stefan Agner 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

MIPS: Loongson: Introduce and use loongson_llsc_mb()

2019-02-04T18:53:34+00:00

On the Loongson-2G/2H/3A/3B there is a hardware flaw: ll/sc and
lld/scd are very weakly ordered. As a workaround, we should add sync
instructions "before each ll/lld" and "at the branch-target between
ll/sc". Otherwise, this flaw will occasionally cause deadlocks (e.g.
when doing heavy load testing with LTP).

Below is the explanation from the CPU designer:

"For Loongson 3 family, when a memory access instruction (load, store,
or prefetch)'s executing occurs between the execution of LL and SC, the
success or failure of SC is not predictable. Although programmer would
not insert memory access instructions between LL and SC, the memory
instructions before LL in program-order, may dynamically executed
between the execution of LL/SC, so a memory fence (SYNC) is needed
before LL/LLD to avoid this situation.

Since Loongson-3A R2 (3A2000), we have improved our hardware design to
handle this case. But we later deduce a rarely circumstance that some
speculatively executed memory instructions due to branch misprediction
between LL/SC still fall into the above case, so a memory fence (SYNC)
at branch-target (if its target is not between LL/SC) is needed for
Loongson 3A1000, 3B1500, 3A2000 and 3A3000.

Our processor is continually evolving and we aim to remove all these
workaround-SYNCs around LL/SC for new-come processor."

Here is an example:

Both cpu1 and cpu2 simultaneously run atomic_add by 1 on the same
atomic var. This bug sometimes causes the 'sc' run by both cpus (in
atomic_add) to succeed at the same time ('sc' returns 1), so the
variable is only *incremented by 1*, which is wrong and unacceptable
(it should be incremented by 2).

Why disable fix-loongson3-llsc in the compiler?
Because the compiler fix will cause problems in the kernel's __ex_table
section.

This patch fixes all the cases in the kernel, but:

+. the fix at the end of futex_atomic_cmpxchg_inatomic is for the
branch-target of 'bne'; there are other cases where
smp_mb__before_llsc() and smp_llsc_mb() fix the ll and the
branch-target coincidentally, such as atomic_sub_if_positive/
cmpxchg/xchg, just like this one.

+. Loongson 3 does not support CONFIG_EDAC_ATOMIC_SCRUB, so no need to
touch edac.h

+. local_ops and cmpxchg_local should not be affected by this bug since
only the owner can write.

+. mips_atomic_set for syscall.c is deprecated and rarely used, just let
it go

Signed-off-by: Huacai Chen 
Signed-off-by: Huang Pei 
[paul.burton@mips.com:
  - Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add
    a comment describing why it's there.
  - Make loongson_llsc_mb() a no-op when
    CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory
    barrier.
  - Add a comment describing the bug & how loongson_llsc_mb() helps
    in asm/barrier.h.]
Signed-off-by: Paul Burton 
Cc: Ralf Baechle 
Cc: ambrosehua@gmail.com
Cc: Steven J . Hill 
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang 
Cc: Zhangjin Wu 
Cc: Li Xuefeng 
Cc: Xu Chenghua

fls: change parameter to unsigned int

2019-01-04T21:13:46+00:00

When testing in userspace, UBSAN pointed out that shifting into the sign
bit is undefined behaviour.  It doesn't really make sense to ask for the
highest set bit of a negative value, so just turn the argument type into
an unsigned int.

Some architectures (eg ppc) already had it declared as an unsigned int,
so I don't expect too many problems.
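
For illustration, here is a portable binary-search sketch of fls() with
an unsigned parameter. It is not the MIPS implementation; the name
sketch_fls() is hypothetical and a 32-bit unsigned int is assumed. With
a signed argument, the left shifts below could move a set bit into the
sign bit, which is the undefined behaviour UBSAN flagged:

```c
#include <assert.h>

/*
 * Returns the 1-based index of the most significant set bit,
 * or 0 if no bit is set. Assumes unsigned int is 32 bits wide.
 */
static inline int sketch_fls(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;	/* well defined: x is unsigned */
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u))
		r -= 1;
	return r;
}
```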

Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox 
Acked-by: Thomas Gleixner 
Acked-by: Geert Uytterhoeven 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

MIPS: Avoid using .set mips0 to restore ISA

2018-11-09T18:23:19+00:00

We currently have 2 commonly used methods for switching ISA within
assembly code, then restoring the original ISA.

  1) Using a pair of .set push & .set pop directives. For example:

     .set	push
     .set	mips32r2
     
     .set	pop

  2) Using .set mips0 to restore the ISA originally specified on the
     command line. For example:

     .set	mips32r2
     
     .set	mips0

Unfortunately method 2 does not work with nanoMIPS toolchains, where the
assembler rejects the .set mips0 directive like so:

     Error: cannot change ISA from nanoMIPS to mips0

In preparation for supporting nanoMIPS builds, switch all instances of
method 2 in generic non-platform-specific code to use push & pop as in
method 1 instead. The .set push & .set pop is arguably cleaner anyway,
and if nothing else it's good to consistently use one method.

Signed-off-by: Paul Burton 
Patchwork: https://patchwork.linux-mips.org/patch/21037/
Cc: linux-mips@linux-mips.org

MIPS: Add nudges to writes for bit unlocks.

2017-10-09T12:53:56+00:00

Flushing the writes lets other CPUs waiting for the lock get it sooner.

Signed-off-by: Chad Reese 
Signed-off-by: David Daney 
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17289/
Signed-off-by: Ralf Baechle

MIPS: Move definitions for 32/64-bit agnostic inline assembler to new file.

2016-05-09T10:00:05+00:00

Inspired by Markos Chandras' patch.  I just didn't want to pull bitops.h
into pgtable.h.

Signed-off-by: Ralf Baechle 
References: https://patchwork.linux-mips.org/patch/11052/

MIPS: Replace smp_mb with release barrier function in unlocks.

2015-06-21T19:54:30+00:00

Replace smp_mb() in arch_write_unlock() and __clear_bit_unlock() with a
smp_mb__before_llsc() call, which provides "release" barrier functionality.

It seems like it was missed in commit f252ffd50c97dae87b45f1dbad24f71358ccfbd6
during introduction of "acquire" and "release" semantics.

[ralf@linux-mips: The original patch submission was labelled a fix but
actually it replaces a barrier with another less restrictive type of
barrier so it doesn't fix any ill behaviour but rather squeezes out a
tad better performance.  Further improvements will be possible once
smp_release() has been merged.]
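
The difference between a full barrier and a "release" barrier in an
unlock path can be sketched in user space with C11 atomics. This is an
analogy only, not the kernel code: the sketch_* names are hypothetical,
and a C11 release fence stands in for smp_mb__before_llsc():

```c
#include <assert.h>
#include <stdatomic.h>

/* Acquire side: set bit nr, return nonzero if it was already set. */
static inline int sketch_test_and_set_bit_lock(unsigned int nr,
					       atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or_explicit(word, mask,
					 memory_order_acquire) & mask) != 0;
}

/*
 * Release side: a release fence followed by a plain (relaxed) write.
 * Only accesses before the unlock must stay before it; a full barrier
 * would also order later accesses, which an unlock does not need.
 * Like __clear_bit_unlock(), this assumes the lock holder is the only
 * writer of the word while the bit is held.
 */
static inline void sketch_clear_bit_unlock(unsigned int nr,
					   atomic_ulong *word)
{
	unsigned long old = atomic_load_explicit(word, memory_order_relaxed);

	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(word, old & ~(1UL << nr),
			      memory_order_relaxed);
}
```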

Signed-off-by: Leonid Yegoshin 
Cc: linux-mips@linux-mips.org
Cc: benh@kernel.crashing.org
Cc: will.deacon@arm.com
Cc: linux-kernel@vger.kernel.org
Cc: markos.chandras@imgtec.com
Cc: macro@linux-mips.org
Cc: Steven.Hill@imgtec.com
Cc: alexander.h.duyck@redhat.com
Cc: davem@davemloft.net
Patchwork: https://patchwork.linux-mips.org/patch/10507/
Signed-off-by: Ralf Baechle

MIPS: bitops.h: Avoid inline asm for constant FLS

2015-04-07T23:09:12+00:00

GCC is smart enough to substitute the final result for FLS calculations
as implemented in the fallback C code we have in `__fls' and `fls'
applied to constant values.  The presence of inline asm defeats the
compiler though, forcing it to emit extraneous CLZ/DCLZ calculation for
processors that support these instructions.

Use `__builtin_constant_p' then to avoid inline asm altogether for
constants.
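
The dispatch pattern can be sketched as follows. This is a hedged
illustration rather than the code in bitops.h: the sketch_* names are
hypothetical, __builtin_clz() stands in for the CLZ inline asm (unlike
real inline asm, the compiler can fold the builtin too, so only the
structure is representative), and a 32-bit unsigned int is assumed:

```c
#include <assert.h>

/* Plain C fallback: the compiler can evaluate this at build time. */
static inline int sketch_fls_generic(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Stand-in for the instruction-based path (CLZ via inline asm). */
static inline int sketch_fls_insn(unsigned int x)
{
	/* __builtin_clz(0) is undefined, so handle zero explicitly. */
	return x ? 32 - __builtin_clz(x) : 0;
}

static inline __attribute__((always_inline)) int sketch_fls(unsigned int x)
{
	/* Constants take the C path so the result folds to a literal. */
	if (__builtin_constant_p(x))
		return sketch_fls_generic(x);
	return sketch_fls_insn(x);
}
```

Both paths return the same value, so the choice only affects what code
the compiler emits.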

Signed-off-by: Maciej W. Rozycki 
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9681/
Signed-off-by: Ralf Baechle