linux-stable.git/include/linux, branch linux-2.6.27.y

lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernel

2012-03-17T13:03:57+00:00

commit 3310225dfc71a35a2cc9340c15c0e08b14b3c754 upstream.

PROP_MAX_SHIFT should be set to <=32 on a 64-bit box. This fixes two bugs
in the following lines of bdi_dirty_limit():

	bdi_dirty *= numerator;
	do_div(bdi_dirty, denominator);

1) divide error: do_div() only uses the lower 32 bits of the denominator,
   which may be truncated to 0 when PROP_MAX_SHIFT > 32.

2) overflow: (bdi_dirty * numerator) could easily overflow if numerator
   used up to 48 bits, leaving only 16 bits for bdi_dirty.
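Both failure modes can be reproduced outside the kernel with a small model of those two lines. This is a sketch under stated assumptions: model_do_div() and scale_dirty() are hypothetical helpers, not kernel functions. The model only assumes, as described above, that do_div() takes a 32-bit divisor, and it returns an error code where the kernel would trap (bug 1) or silently wrap (bug 2).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of do_div(n, base): base is only 32 bits wide. */
static uint64_t model_do_div(uint64_t n, uint32_t base)
{
	return n / base;
}

/* Returns 1 if n * m would not fit in 64 bits. */
static int mul_overflows(uint64_t n, uint64_t m)
{
	return m != 0 && n > UINT64_MAX / m;
}

/*
 * Mirrors "bdi_dirty *= numerator; do_div(bdi_dirty, denominator);".
 * Returns 0 on success, -1 where the kernel would hit a divide error
 * (bug 1) and -2 where the multiplication would overflow (bug 2).
 */
static int scale_dirty(uint64_t *bdi_dirty, uint64_t numerator,
		       uint64_t denominator)
{
	uint32_t base = (uint32_t)denominator;	/* the implicit truncation */

	if (base == 0)
		return -1;
	if (mul_overflows(*bdi_dirty, numerator))
		return -2;
	*bdi_dirty = model_do_div(*bdi_dirty * numerator, base);
	return 0;
}
```

A denominator that truncates to a nonzero value is the quieter variant of bug 1: the division succeeds with the wrong divisor, so the limit comes out wrong without any fault.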

Cc: Peter Zijlstra 
Reported-by: Ilya Tumaykin 
Tested-by: Ilya Tumaykin 
Signed-off-by: Wu Fengguang 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

block: fail SCSI passthrough ioctls on partition devices

2012-02-11T14:40:55+00:00

commit 0bfc96cb77224736dfa35c3c555d37b3646ef35e upstream.

[ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl
  and -ENOIOCTLCMD from sd_compat_ioctl. ]

Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device.  This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.

This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent.  In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice.  Still, I'm treating it specially to avoid spamming the logs.

In principle, this restriction should include programs running with
CAP_SYS_RAWIO.  If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities.  However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls.  Their actions will still be logged.
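The filtering described above can be sketched as a small decision function. Everything here is hypothetical and simplified: struct blkdev, its contains pointer (standing in for bd->bd_contains), the CMD_* values and the log text are illustrative, not the kernel's definitions. A device counts as a partition when it is not its own containing device; whitelisted commands pass, the common CDROM_GET_CAPABILITY-style probe fails without logging, and anything else is logged and refused unless the caller has raw-I/O capability.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical command codes; the real ones come from SCSI/CDROM headers. */
enum {
	CMD_HARMLESS_QUERY = 1,		/* stands in for a whitelisted ioctl */
	CMD_GET_CAPABILITY = 2,		/* stands in for CDROM_GET_CAPABILITY */
	CMD_RAW_PASSTHROUGH = 3,	/* stands in for SG_IO */
};

/* Hypothetical block device: 'contains' points at the whole disk. */
struct blkdev {
	const char *name;
	struct blkdev *contains;
};

/* Returns 0 if the ioctl may be forwarded to the whole disk, else -ENOTTY. */
static int verify_blk_ioctl(const struct blkdev *bd, unsigned int cmd,
			    bool has_rawio)
{
	if (bd->contains == bd)		/* whole disk: no filtering */
		return 0;

	switch (cmd) {
	case CMD_HARMLESS_QUERY:
		return 0;
	case CMD_GET_CAPABILITY:
		return -ENOTTY;		/* frequent and harmless: don't spam logs */
	default:
		break;
	}

	/* Log everything else, even for privileged callers. */
	fprintf(stderr, "%s: sending ioctl %#x to a partition!\n",
		bd->name, cmd);
	return has_rawio ? 0 : -ENOTTY;
}
```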

This patch does not affect the non-libata IDE driver.  That driver
however already tests for bd != bd->bd_contains before issuing some
ioctls; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.

Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe 
Cc: James Bottomley 
Signed-off-by: Paolo Bonzini 
[ Make it also print the command name when warning - Linus ]
Signed-off-by: Linus Torvalds 
[bwh: Backport to 2.6.32 - ENOIOCTLCMD does not get converted to
 ENOTTY, so we must return ENOTTY directly]
Signed-off-by: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman 

Signed-off-by: Willy Tarreau

block: add and use scsi_blk_cmd_ioctl

2012-02-11T14:40:54+00:00

commit 577ebb374c78314ac4617242f509e2f5e7156649 upstream.

Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.

Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe 
Cc: James Bottomley 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 
[bwh: Backport to 2.6.32 - adjust context]
Signed-off-by: Ben Hutchings 
[wt: slightly changed the interface to match 2.6.27's scsi_cmd_ioctl()
     which still needs the file pointer but has no mode parameter].

Signed-off-by: Willy Tarreau

af_packet: prevent information leak

2012-02-11T14:40:47+00:00

[ Upstream commit 13fcb7bd322164c67926ffe272846d4860196dc6 ]

In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
added a small information leak.

Add a padding field and make sure it's zeroed before copying to user space.
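The pattern behind the fix: when a structure copied to user space has a hole, name the hole as an explicit padding field and clear it, so no stale kernel bytes go out. A minimal sketch, assuming a hypothetical struct aux_model shaped like a packet auxdata record (three 32-bit fields followed by three 16-bit fields, the last being the VLAN TCI); the names are illustrative, not the exact uapi layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record. Without 'padding', the compiler would add 2 bytes
 * of tail padding after vlan_tci to keep 4-byte alignment, and those bytes
 * would carry whatever was previously in memory. */
struct aux_model {
	uint32_t status;
	uint32_t len;
	uint32_t snaplen;
	uint16_t mac;
	uint16_t net;
	uint16_t vlan_tci;
	uint16_t padding;	/* explicit, so it can be cleared */
};

_Static_assert(sizeof(struct aux_model) == 20, "no implicit holes expected");

static void fill_aux(struct aux_model *aux, uint32_t len, uint16_t vlan_tci)
{
	aux->status = 0;
	aux->len = len;
	aux->snaplen = len;
	aux->mac = 0;
	aux->net = 0;
	aux->vlan_tci = vlan_tci;
	aux->padding = 0;	/* the step the fix adds */
}
```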

Signed-off-by: Eric Dumazet 
CC: Patrick McHardy 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

x86, mm: Add __get_user_pages_fast()

2012-02-11T14:38:10+00:00

Introduce a gup_fast() variant which is usable from IRQ/NMI context.

[ WT: this one is only needed for next patch ]

Signed-off-by: Peter Zijlstra 
CC: Nick Piggin 
Cc: Mike Galbraith 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Willy Tarreau

NLM: Don't hang forever on NLM unlock requests

2012-02-11T14:37:49+00:00

commit 0b760113a3a155269a3fba93a409c640031dd68f upstream.

If the NLM daemon is killed on the NFS server, we can currently end up
hanging forever on an 'unlock' request, instead of aborting. Basically,
if the rpcbind request fails, or the server keeps returning garbage, we
really want to quit instead of retrying.
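The intended policy can be sketched as a classifier that decides, after each failed unlock attempt, whether to retry or give up. This is a hypothetical illustration: the enum and function are invented, and the mapping of -EIO to "server returned garbage" and -EACCES to "rpcbind failed" is an assumption for the example, not a statement of what lockd checks.

```c
#include <assert.h>
#include <errno.h>

enum unlock_action { UNLOCK_DONE, UNLOCK_RETRY, UNLOCK_ABORT };

static enum unlock_action classify_unlock_result(int status)
{
	if (status == 0)
		return UNLOCK_DONE;
	switch (status) {
	case -EIO:	/* assumed: server keeps returning garbage */
	case -EACCES:	/* assumed: rpcbind request failed */
		return UNLOCK_ABORT;	/* quit instead of hanging forever */
	default:
		return UNLOCK_RETRY;	/* other errors may be transient */
	}
}
```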

Tested-by: Vasily Averin 
Signed-off-by: Trond Myklebust 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

seqlock: Don't smp_rmb in seqlock reader spin loop

2012-02-11T14:37:29+00:00

commit 5db1256a5131d3b133946fa02ac9770a784e6eb2 upstream.

Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.
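The reader side after this change can be modeled with C11 atomics. A sketch under assumptions: struct seqlock_model and the *_model functions are hypothetical. A relaxed atomic load plays the role of ACCESS_ONCE(), an acquire fence plays the role of smp_rmb(), and sched_yield() stands in for cpu_relax(), which has no portable userspace equivalent. What matters is the placement: the loop re-reads the counter with no barrier inside it, the value tested is the value returned, and a single barrier follows the loop.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

struct seqlock_model {
	atomic_uint sequence;	/* odd while a writer is in progress */
};

static unsigned read_seqbegin_model(struct seqlock_model *sl)
{
	unsigned ret;

	/* Fresh read on every pass (the ACCESS_ONCE role), and the value we
	 * test is the value we return; no barrier inside the spin. */
	while ((ret = atomic_load_explicit(&sl->sequence,
					   memory_order_relaxed)) & 1)
		sched_yield();	/* stand-in for cpu_relax() */

	/* One barrier after the loop orders the reader's later data loads. */
	atomic_thread_fence(memory_order_acquire);
	return ret;
}

/* Nonzero if a writer ran since 'start', so the reader must retry. */
static int read_seqretry_model(struct seqlock_model *sl, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->sequence,
				    memory_order_relaxed) != start;
}
```

A reader brackets its data loads with the two calls and loops while read_seqretry_model() reports a change.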

A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active.  Bisection showed
af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit, and this is
supported by stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.

Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.

One theory is that the rmb affects the whole core while the
cpu_relax causes a resource rebalance flush; together they
cause an interference cadence that is unbroken when the seqlock
reader has interrupts disabled.

At first I was confused why the refactor in
3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect applying this patch, but after some
study I found that it affected seqcount, not seqlock. The new seqcount
was not factored back into the seqlock.  I defer that to the future.

While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.

Signed-off-by: Milton Miller 
Cc: 
Cc: Linus Torvalds 
Cc: Andi Kleen 
Cc: Nick Piggin 
Cc: Benjamin Herrenschmidt 
Cc: Anton Blanchard 
Cc: Paul McKenney 
Acked-by: Eric Dumazet 
Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

next_pidmap: fix overflow condition

2011-04-30T14:53:38+00:00

commit c78193e9c7bcbf25b8237ad0dec82f805c4ea69b upstream.

next_pidmap() just quietly accepted whatever 'last' pid was passed
in, which is not all that safe when one of the users is /proc.

Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of their arguments.

So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).
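The range check can be illustrated with a scaled-down model. All names and sizes are hypothetical (MODEL_PID_MAX_LIMIT is 64 rather than the kernel's PID_MAX_LIMIT, and the map is a flat byte array rather than pages); what carries over is the order of operations: reject an out-of-range 'last' before computing any index from it, and let the loop bound handle last+1 reaching the end.

```c
#include <assert.h>

#define MODEL_PID_MAX_LIMIT 64u	/* hypothetical stand-in for PID_MAX_LIMIT */
#define BITS_PER_ENTRY 8u

static unsigned char model_pidmap[MODEL_PID_MAX_LIMIT / BITS_PER_ENTRY];

static void model_set_pid(unsigned int pid)
{
	model_pidmap[pid / BITS_PER_ENTRY] |=
		(unsigned char)(1u << (pid % BITS_PER_ENTRY));
}

/* Returns the next allocated pid after 'last', or -1 if there is none. */
static int model_next_pidmap(unsigned int last)
{
	/* Check before any index arithmetic: a huge 'last' from /proc would
	 * otherwise index far past the map (and last + 1 could even wrap). */
	if (last >= MODEL_PID_MAX_LIMIT)
		return -1;

	/* last + 1 may equal the limit; the loop condition covers that. */
	for (unsigned int pid = last + 1; pid < MODEL_PID_MAX_LIMIT; pid++)
		if (model_pidmap[pid / BITS_PER_ENTRY] &
		    (1u << (pid % BITS_PER_ENTRY)))
			return (int)pid;
	return -1;
}
```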

[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]

Reported-by: Tavis Ormandy 
Analyzed-by: Robert Święcki 
Cc: Eric W. Biederman 
Cc: Pavel Emelyanov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

exec: copy-and-paste the fixes into compat_do_execve() paths

2011-04-30T14:53:36+00:00

commit 114279be2120a916e8a04feeb2ac976a10016f2f upstream.

Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c; this will be cleaned up later.

compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.
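The RLIMIT_STACK check that compat_copy_strings() lacked is the one get_arg_page() applies as argument pages are added. A userspace sketch, assuming the rule fs/exec.c uses in this era (argv+envp may take at most a quarter of the stack soft limit); args_fit_stack_limit() and args_fit_current_limit() are hypothetical helpers, while getrlimit() and RLIMIT_STACK are the standard POSIX interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/* Assumed rule: argument strings may occupy at most 1/4 of the stack
 * soft limit, leaving room for the loader and the new program. */
static bool args_fit_stack_limit(unsigned long arg_bytes, rlim_t stack_cur)
{
	if (stack_cur == RLIM_INFINITY)
		return true;
	return arg_bytes <= stack_cur / 4;
}

/* Applies the same rule to the calling process's current soft limit. */
static bool args_fit_current_limit(unsigned long arg_bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return false;	/* be conservative if the limit is unknown */
	return args_fit_stack_limit(arg_bytes, rl.rlim_cur);
}
```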

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(); this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov 
Cc: KOSAKI Motohiro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Andi Kleen 
Cc: Moritz Muehlenhoff 
Signed-off-by: Greg Kroah-Hartman

exec: make argv/envp memory visible to oom-killer

2011-04-30T14:53:36+00:00

commit 3c77f845722158206a7209c45ccddc264d19319c upstream.

Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()->copy_strings() can allocate a lot of memory, but
this is not visible to the oom-killer: nobody can see the nascent
bprm->mm and take it into account.

With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate a new page for argv/envp. When
do_execve() succeeds or fails, we change this counter back.
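The bookkeeping can be modeled as one helper that moves the counter by the difference between the pages now held for argv/envp and the pages already charged, so the same call handles growth and the final reset to zero. A sketch: struct mm_model, struct bprm_model and their fields are hypothetical stand-ins for current's MM_ANONPAGES counter and the per-exec state; the helper is modeled on the acct_arg_size() named in the compat patch, not copied from it.

```c
#include <assert.h>
#include <stddef.h>

struct mm_model {
	long anon_pages;		/* stands in for MM_ANONPAGES */
};

struct bprm_model {
	unsigned long vma_pages;	/* pages already charged for argv/envp */
};

static void acct_arg_size_model(struct mm_model *mm, struct bprm_model *bprm,
				unsigned long pages)
{
	long diff = (long)pages - (long)bprm->vma_pages;

	if (mm == NULL || diff == 0)
		return;
	bprm->vma_pages = pages;
	mm->anon_pages += diff;	/* the counter the OOM killer looks at */
}
```

Calling it with 0 when exec finishes, whether it succeeded or failed, undoes the whole charge in one step.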

Technically this is not 100% correct: we can't know whether the new
page is swapped out, in which case it should move from MM_ANONPAGES to
MM_SWAPENTS, but I don't think this really matters, and everything
becomes correct once exec changes ->mm or fails.

Reported-by: Brad Spengler 
Reviewed-and-discussed-by: KOSAKI Motohiro 
Signed-off-by: Oleg Nesterov 
Signed-off-by: Linus Torvalds 
Cc: Moritz Muehlenhoff 
Signed-off-by: Greg Kroah-Hartman