linux-stable.git/arch/s390/lib, branch linux-3.9.y

s390/uaccess: fix page table walk

2013-04-02T06:53:08+00:00

When translating user space addresses to kernel addresses the follow_table()
function had two bugs:

- PROT_NONE mappings could be read accessed via the kernel mapping. That is
  e.g. putting a filename into a user page, then protecting the page with
  PROT_NONE and afterwards issuing the "open" syscall with a pointer to
  the filename would incorrectly succeed.

- when walking the page tables it used the pgd/pud/pmd/pte primitives which
  with dynamic page tables give no indication which real level of page tables
  is being walked (region2, region3, segment or page table). So in case of an
  exception the translation exception code passed to __handle_fault() is not
  necessarily correct.
  This is not really an issue since __handle_fault() doesn't evaluate the code.
  Only in case of e.g. a SIGBUS this code gets passed to user space. If user
  space can do something sane with the value is a different question though.

To fix these issues don't use any Linux primitives. Only walk the page tables
like the hardware would do it, however we leave quite some checks away since
we know that we only have full size page tables and each index is within bounds.

In theory this should fix all issues...

Signed-off-by: Heiko Carstens 
Reviewed-by: Gerald Schaefer 
Signed-off-by: Martin Schwidefsky

s390/uaccess: fix clear_user_pt()

2013-03-21T12:35:38+00:00

The page table walker variant of clear_user() is supposed to copy the
contents of the empty zero page to user space.
However since 238ec4ef "[S390] zero page cache synonyms" empty_zero_page
is not anymore the page itself but contains the pointer to the empty zero
pages. Therefore the page table walker variant of clear_user() copied
the address of the first empty zero page and afterwards more or less
random data to user space instead of clearing the given user space range.

Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390/uaccess: fix kernel ds access for page table walk

2013-02-28T08:37:12+00:00

When the kernel resides in home space and the mvcos instruction is not
available uaccesses for kernel ds happen via simple strnlen() or memcpy()
calls.
This however can break badly, since uaccesses in kernel space may fail as
well, especially if CONFIG_DEBUG_PAGEALLOC is turned on.

To fix this implement strnlen_kernel() and copy_in_kernel() functions
which can only be used by the page table uaccess functions. These two
functions detect invalid memory accesses and return the correct length
of processed data.. Both functions are more or less a copy of the std
variants without sacf calls.

Fixes ipl crashes on 31 bit machines as well on 64 bit machines without
mvcos. Caused by changing the default address space of the kernel being
home space.

Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390/uaccess: fix strncpy_from_user string length check

2013-02-28T08:37:11+00:00

The "standard" and page table walk variants of strncpy_from_user() first
check the length of the to be copied string in userspace.
The string is then copied to kernel space and the length returned to the
caller.
However userspace can modify the string at any time while the kernel
checks for the length of the string or copies the string. In result the
returned length of the string is not necessarily correct.
Fix this by copying in a loop which mimics the mvcos variant of
strncpy_from_user(), which handles this correctly.

Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390/uaccess: fix strncpy_from_user/strnlen_user zero maxlen case

2013-02-28T08:37:08+00:00

If the maximum length specified for the to be accessed string for
strncpy_from_user() and strnlen_user() is zero the following incorrect
values would be returned or incorrect memory accesses would happen:

strnlen_user_std() and strnlen_user_pt() incorrectly return "1"
strncpy_from_user_pt() would incorrectly access "dst[maxlen - 1]"
strncpy_from_user_mvcos() would incorrectly return "-EFAULT"

Fix all these oddities by adding early checks.

Reviewed-by: Gerald Schaefer 
Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390/uaccess: shorten strncpy_from_user/strnlen_user

2013-02-28T08:37:07+00:00

Always stay within page boundaries when copying from user within
strlen_user_mvcos()/strncpy_from_user_mvcos(). This allows to
shorten the code a bit and may prevent unnecessary faults, since
we copy quite large amounts of memory to kernel space.

Also directly call the mvcos variants of copy_from_user() to
avoid indirect branches.

Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390/mm: implement software dirty bits

2013-02-14T14:55:23+00:00

The s390 architecture is unique in respect to dirty page detection,
it uses the change bit in the per-page storage key to track page
modifications. All other architectures track dirty bits by means
of page table entries. This property of s390 has caused numerous
problems in the past, e.g. see git commit ef5d437f71afdf4a
"mm: fix XFS oops due to dirty pages without buffers on s390".

To avoid future issues in regard to per-page dirty bits convert
s390 to a fault based software dirty bit detection mechanism. All
user page table entries which are marked as clean will be hardware
read-only, even if the pte is supposed to be writable. A write by
the user process will trigger a protection fault which will cause
the user pte to be marked as dirty and the hardware read-only bit
is removed.

With this change the dirty bit in the storage key is irrelevant
for Linux as a host, but the storage key is still required for
KVM guests. The effect is that page_test_and_clear_dirty and the
related code can be removed. The referenced bit in the storage
key is still used by the page_test_and_clear_young primitive to
provide page age information.

For page cache pages of mappings with mapping_cap_account_dirty
there will not be any change in behavior as the dirty bit tracking
already uses read-only ptes to control the amount of dirty pages.
Only for swap cache pages and pages of mappings without
mapping_cap_account_dirty there can be additional protection faults.
To avoid an excessive number of additional faults the mk_pte
primitive checks for PageDirty if the pgprot value allows for writes
and pre-dirties the pte. That avoids all additional faults for
tmpfs and shmem pages until these pages are added to the swap cache.

Signed-off-by: Martin Schwidefsky

s390/time: rename tod clock access functions

2013-02-14T14:55:10+00:00

Fix name clash with some common code device drivers and add "tod"
to all tod clock access function names.

Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky

s390/mm: use pmd_large() instead of pmd_huge()

2012-10-26T14:44:23+00:00

Without CONFIG_HUGETLB_PAGE, pmd_huge() will always return 0. So
pmd_large() should be used instead in places where both transparent
huge pages and hugetlbfs pages can occur.

Signed-off-by: Gerald Schaefer 
Signed-off-by: Martin Schwidefsky

s390/string: provide asm lib functions for memcpy and memcmp

2012-09-26T13:44:50+00:00

Our memcpy and memcmp variants were implemented by calling the corresponding
gcc builtin variants.
However gcc is free to replace a call to __builtin_memcmp with a call to memcmp
which, when called, will result in an endless recursion within memcmp.
So let's provide asm variants and also fix the variants that are used for
uncompressing the kernel image.
In addition remove all other occurences of builtin function calls.

Signed-off-by: Heiko Carstens 
Signed-off-by: Martin Schwidefsky