linux-stable.git/Documentation, branch v3.2.99

x86/smp: Don't ever patch back to UP if we unplug cpus

2018-02-13T18:32:14+00:00

commit 816afe4ff98ee10b1d30fd66361be132a0a5cee6 upstream.

We still patch SMP instructions to UP variants if we boot with a
single CPU, but not at any other time.  In particular, not if we
unplug CPUs to return to a single cpu.

Paul McKenney points out:

 mean offline overhead is 6251/48=130.2 milliseconds.

 If I remove the alternatives_smp_switch() from the offline
 path [...] the mean offline overhead is 550/42=13.1 milliseconds

Basically, we're never going to get those 120ms back, and the
code is pretty messy.

We get rid of:

 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
    documentation is wrong. It's now the default.

 2) The skip_smp_alternatives flag used by suspend.

 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
    which were only used to set this one flag.

Signed-off-by: Rusty Russell 
Cc: Paul McKenney 
Cc: Suresh Siddha 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

x86/kaiser: Check boottime cmdline params

2018-01-07T01:46:52+00:00

AMD (and possibly other vendors) are not affected by the leak
KAISER is protecting against.

Keep the "nopti" for traditional reasons and add pti=
like upstream.

Signed-off-by: Borislav Petkov 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings

x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling

2018-01-07T01:46:52+00:00

Concentrate it in arch/x86/mm/kaiser.c and use the upstream string "nopti".

Signed-off-by: Borislav Petkov 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings

kaiser: add "nokaiser" boot option, using ALTERNATIVE

2018-01-07T01:46:51+00:00

Added "nokaiser" boot option: an early param like "noinvpcid".
Most places now check int kaiser_enabled (#defined 0 when not
CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S
and entry_64_compat.S are using the ALTERNATIVE technique, which
patches in the preferred instructions at runtime.  That technique
is tied to x86 cpu features, so X86_FEATURE_KAISER fabricated
("" in its comment so "kaiser" not magicked into /proc/cpuinfo).

Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that,
but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when
nokaiser like when !CONFIG_KAISER, but not setting either when kaiser -
neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL
won't get set in some obscure corner, or something add PGE into CR4.
By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled,
all page table setup which uses pte_pfn() masks it out of the ptes.

It's slightly shameful that the same declaration versus definition of
kaiser_enabled appears in not one, not two, but in three header files
(asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h).  I felt safer that way,
than with #including any of those in any of the others; and did not
feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes
them all, so we shall hear about it if they get out of synch.

Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER
from kaiser.c; removed the unused native_get_normal_pgd(); removed
the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some
comments.  But more interestingly, set CR4.PSE in secondary_startup_64:
the manual is clear that it does not matter whether it's 0 or 1 when
4-level-pts are enabled, but I was distracted to find cr4 different on
BSP and auxiliaries - BSP alone was adding PSE, in init_memory_mapping().

(cherry picked from Change-Id: I8e5bec716944444359cbd19f6729311eff943e9a)

Signed-off-by: Hugh Dickins 
Signed-off-by: Ben Hutchings

x86/mm: Add the 'nopcid' boot option to turn off PCID

2018-01-07T01:46:48+00:00

commit 0790c9aad84901ca1bdc14746175549c8b5da215 upstream.

The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.

Signed-off-by: Andy Lutomirski 
Reviewed-by: Nadav Amit 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Arjan van de Ven 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Linus Torvalds 
Cc: Mel Gorman 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar 
[Hugh Dickins: Backported to 3.2:
 - Documentation/admin-guide/kernel-parameters.txt (not in this tree)
 - Documentation/kernel-parameters.txt (patched instead of that)]
Signed-off-by: Hugh Dickins 
Signed-off-by: Ben Hutchings

x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID

2018-01-07T01:46:47+00:00

commit d12a72b844a49d4162f24cefdab30bed3f86730e upstream.

This adds a chicken bit to turn off INVPCID in case something goes
wrong.  It's an early_param() because we do TLB flushes before we
parse __setup() parameters.

Signed-off-by: Andy Lutomirski 
Reviewed-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Luis R. Rodriguez 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Toshi Kani 
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar 
Cc: Hugh Dickins 
Signed-off-by: Ben Hutchings

cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags

2017-10-12T14:27:22+00:00

commit 2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream.

When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Miao Xie 
Cc: Kees Cook 
Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Reported-by: Tetsuo Handa 
Signed-off-by: Zefan Li 
Signed-off-by: Tejun Heo 
[lizf: Backported to 3.4:
 - adjust context
 - check current->flags & PF_MEMPOLICY rather than current->mempolicy]
Signed-off-by: Ben Hutchings

mm: larger stack guard gap, between vmas

2017-07-02T16:12:47+00:00

commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov 
Original-patch-by: Michal Hocko 
Signed-off-by: Hugh Dickins 
Acked-by: Michal Hocko 
Tested-by: Helge Deller  # parisc
Signed-off-by: Linus Torvalds 
[Hugh Dickins: Backported to 3.2]
[bwh: Fix more instances of vma->vm_start in sparc64 impl. of
 arch_get_unmapped_area_topdown() and generic impl. of
 hugetlb_get_unmapped_area()]
Signed-off-by: Ben Hutchings

fs: Give dentry to inode_change_ok() instead of inode

2016-11-20T01:01:43+00:00

commit 31051c85b5e2aaaf6315f74c72a732673632a905 upstream.

inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Jan Kara 
[bwh: Backported to 3.2:
 - Drop changes to f2fs, lustre, orangefs, overlayfs
 - Adjust filenames, context
 - In nfsd, pass dentry to nfsd_sanitize_attrs()
 - In xfs, pass dentry to xfs_change_file_space(), xfs_set_mode(),
   xfs_setattr_nonsize(), and xfs_setattr_size()
 - Update ext3 as well
 - Mark pohmelfs as BROKEN; it's long dead upstream]
Signed-off-by: Ben Hutchings

USB: uas: Add a new NO_REPORT_LUNS quirk

2016-06-15T20:28:11+00:00

commit 1363074667a6b7d0507527742ccd7bbed5e3ceaa upstream.

Add a new NO_REPORT_LUNS quirk and set it for Seagate drives with
an usb-id of: 0bc2:331a, as these will fail to respond to a
REPORT_LUNS command.

Reported-and-tested-by: David Webb 
Signed-off-by: Hans de Goede 
Acked-by: Alan Stern 
Signed-off-by: Greg Kroah-Hartman 
[bwh: Backported to 3.2:
 - Adjust context
 - Drop the UAS changes]
Signed-off-by: Ben Hutchings