linux-stable.git/arch/sparc/lib, branch linux-3.12.y

sparc64: Fix userspace FPU register corruptions.

2015-08-19T06:36:51+00:00

[ Upstream commit 44922150d87cef616fd183220d43d8fde4d41390 ]

If we have a series of events from userpsace, with %fprs=FPRS_FEF,
like follows:

ETRAP
	ETRAP
		VIS_ENTRY(fprs=0x4)
		VIS_EXIT
		RTRAP (kernel FPU restore with fpu_saved=0x4)
	RTRAP

We will not restore the user registers that were clobbered by the FPU
using kernel code in the inner-most trap.

Traps allocate FPU save slots in the thread struct, and FPU using
sequences save the "dirty" FPU registers only.

This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.

But this is not how trap returns from kernel to kernel operate.

The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.

Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.

Longer term we need to do something smarter to reinstate the partial
save optimizations.  Perhaps the fundament error is having trap entry
and exit allocate FPU save slots and restore register state.  Instead,
the VISEntry et al. calls should be doing that work.

This bug is about two decades old.

Reported-by: James Y Knight 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

sparc64: Fix several bugs in memmove().

2015-04-09T11:13:51+00:00

[ Upstream commit 2077cef4d5c29cf886192ec32066f783d6a80db8 ]

Firstly, handle zero length calls properly.  Believe it or not there
are a few of these happening during early boot.

Next, we can't just drop to a memcpy() call in the forward copy case
where dst <= src.  The reason is that the cache initializing stores
used in the Niagara memcpy() implementations can end up clearing out
cache lines before we've sourced their original contents completely.

For example, considering NG4memcpy, the main unrolled loop begins like
this:

     load   src + 0x00
     load   src + 0x08
     load   src + 0x10
     load   src + 0x18
     load   src + 0x20
     store  dst + 0x00

Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call.  That store at the end there is the one to the
first line in the cache line, thus clearing the whole line, which thus
clobbers "src + 0x28" before it even gets loaded.

To avoid this, just fall through to a simple copy only mildly
optimized for the case where src and dst are 8 byte aligned and the
length is a multiple of 8 as well.  We could get fancy and call
GENmemcpy() but this is good enough for how this thing is actually
used.

Reported-by: David Ahern 
Reported-by: Bob Picco 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks

2014-11-19T11:32:56+00:00

[ Upstream commit 1a17fdc4f4ed06b63fac1937470378a5441a663a ]

Atomicity between xchg and cmpxchg cannot be guaranteed when xchg is
implemented with a swap and cmpxchg is implemented with locks.
Without this, e.g. mcs_spin_lock and mcs_spin_unlock are broken.

Signed-off-by: Andreas Larsson 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

sparc64: Make PAGE_OFFSET variable.

2014-10-31T14:07:22+00:00

commit b2d438348024b75a1ee8b66b85d77f569a5dfed8 upstream.

Choose PAGE_OFFSET dynamically based upon cpu type.

Original UltraSPARC-I (spitfire) chips only supported a 44-bit
virtual address space.

Newer chips (T4 and later) support 52-bit virtual addresses
and up to 47-bits of physical memory space.

Therefore we have to adjust PAGE_SIZE dynamically based upon
the capabilities of the chip.

Note that this change alone does not allow us to support > 43-bit
physical memory, to do that we need to re-arrange our page table
support.  The current encodings of the pmd_t and pgd_t pointers
restricts us to "32 + 11" == 43 bits.

This change can waste quite a bit of memory for the various tables.
In particular, a future change should work to size and allocate
kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
This isn't easy as we really cannot take a TLB miss when accessing
kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.

Signed-off-by: David S. Miller 
Acked-by: Bob Picco

sparc64: Fix FPU register corruption with AES crypto offload.

2014-10-31T14:05:21+00:00

[ Upstream commit f4da3628dc7c32a59d1fb7116bb042e6f436d611 ]

The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
key material is preloaded into the FPU registers, and then we loop
over and over doing the crypt operation, reusing those pre-cooked key
registers.

There are intervening blkcipher*() calls between the crypt operation
calls.  And those might perform memcpy() and thus also try to use the
FPU.

The sparc64 kernel FPU usage mechanism is designed to allow such
recursive uses, but with a catch.

There has to be a trap between the two FPU using threads of control.

The mechanism works by, when the FPU is already in use by the kernel,
allocating a slot for FPU saving at trap time.  Then if, within the
trap handler, we try to use the FPU registers, the pre-trap FPU
register state is saved into the slot.  Then at trap return time we
notice this and restore the pre-trap FPU state.

Over the long term there are various more involved ways we can make
this work, but for a quick fix let's take advantage of the fact that
the situation where this happens is very limited.

All sparc64 chips that support the crypto instructiosn also are using
the Niagara4 memcpy routine, and that routine only uses the FPU for
large copies where we can't get the source aligned properly to a
multiple of 8 bytes.

We look to see if the FPU is already in use in this context, and if so
we use the non-large copy path which only uses integer registers.

Furthermore, we also limit this special logic to when we are doing
kernel copy, rather than a user copy.

Signed-off-by: David S. Miller

sparc: Let memset return the address argument

2014-10-31T14:05:17+00:00

[ Upstream commit 74cad25c076a2f5253312c2fe82d1a4daecc1323 ]

This makes memset follow the standard (instead of returning 0 on success). This
is needed when certain versions of gcc optimizes around memset calls and assume
that the address argument is preserved in %o0.

Signed-off-by: Andreas Larsson 
Signed-off-by: David S. Miller

sparc64: Add membar to Niagara2 memcpy code.

2014-08-26T12:11:58+00:00

[ Upstream commit 5aa4ecfd0ddb1e6dcd1c886e6c49677550f581aa ]

This is the prevent previous stores from overlapping the block stores
done by the memcpy loop.

Based upon a glibc patch by Jose E. Marchesi

Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

sparc64: Remove RWSEM export leftovers

2013-09-05T19:12:51+00:00

The functions

			__down_read
			__down_read_trylock
			__down_write
			__down_write_trylock
			__up_read
			__up_write
			__downgrade_write

are implemented inline, so remove corresponding EXPORT_SYMBOLs
(They lead to compile errors on RT kernel).

Signed-off-by: Kirill Tkhai 
CC: David Miller 
Signed-off-by: David S. Miller

Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS

2013-05-01T00:04:09+00:00

The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files.  Arnd Bergman noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using pre 4.4 gcc.

To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.

Also, make the text a bit more generic by stating that this option
enables compile time checks so we can cover architectures which emit
warnings vs.  ones which emit errors.  The details of how an
architecture decided to implement the checks isn't as important as the
concept of compile time checking of copy_from_user() calls.

While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.

Signed-off-by: Stephen Boyd 
Acked-by: Arnd Bergmann 
Acked-by: Ingo Molnar 
Acked-by: H. Peter Anvin 
Cc: Arjan van de Ven 
Acked-by: Helge Deller 
Cc: Heiko Carstens 
Cc: Stephen Rothwell 
Cc: Chris Metcalf 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

sparc/srmmu: clear trailing edge of bitmap properly

2013-03-31T23:29:12+00:00

srmmu_nocache_bitmap is cleared by bit_map_init().  But bit_map_init()
attempts to clear by memset(), so it can't clear the trailing edge of
bitmap properly on big-endian architecture if the number of bits is not
a multiple of BITS_PER_LONG.

Actually, the number of bits in srmmu_nocache_bitmap is not always
a multiple of BITS_PER_LONG.  It is calculated as below:

        bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;

srmmu_nocache_size is decided proportionally by the amount of system RAM
and it is rounded to a multiple of PAGE_SIZE.  SRMMU_NOCACHE_BITMAP_SHIFT
is defined as (PAGE_SHIFT - 4).  So it can only be said that bitmap_bits
is a multiple of 16.

This fixes the problem by using bitmap_clear() instead of memset()
in bit_map_init() and this also uses BITS_TO_LONGS() to calculate correct
size at bitmap allocation time.

Signed-off-by: Akinobu Mita 
Cc: "David S. Miller" 
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller