linux-stable.git/fs, branch v3.18.48

fs: exec: apply CLOEXEC before changing dumpable task flags

2017-01-15T14:49:53+00:00

[ Upstream commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 ]

If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc//fd/... during this window, without requiring CAP_SYS_PTRACE.

The race in question is after set_dumpable has been (for get_link,
though the trace is basically the same for readlink):

[vfs]
-> proc_pid_link_inode_operations.get_link
   -> proc_pid_get_link
      -> proc_fd_access_allowed
         -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

Which will return 0, during the race window and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.

This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).

Cc: dev@opencontainers.org
Cc:  # v3.2+
Reported-by: Michael Crosby 
Signed-off-by: Aleksa Sarai 
Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin

block_dev: don't test bdev->bd_contains when it is not stable

2017-01-15T14:49:52+00:00

[ Upstream commit bcc7f5b4bee8e327689a4d994022765855c807ff ]

bdev->bd_contains is not stable before calling __blkdev_get().
When __blkdev_get() is called on a parition with ->bd_openers == 0
it sets
  bdev->bd_contains = bdev;
which is not correct for a partition.
After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
and then ->bd_contains is stable.

When FMODE_EXCL is used, blkdev_get() calls
   bd_start_claiming() ->  bd_prepare_to_claim() -> bd_may_claim()

This call happens before __blkdev_get() is called, so ->bd_contains
is not stable.  So bd_may_claim() cannot safely use ->bd_contains.
It currently tries to use it, and this can lead to a BUG_ON().

This happens when a whole device is already open with a bd_holder (in
use by dm in my particular example) and two threads race to open a
partition of that device for the first time, one opening with O_EXCL and
one without.

The thread that doesn't use O_EXCL gets through blkdev_get() to
__blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;

Immediately thereafter the other thread, using FMODE_EXCL, calls
bd_start_claiming() from blkdev_get().  This should fail because the
whole device has a holder, but because bdev->bd_contains == bdev
bd_may_claim() incorrectly reports success.
This thread continues and blocks on bd_mutex.

The first thread then sets bdev->bd_contains correctly and drops the mutex.
The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
again in:
			BUG_ON(!bd_may_claim(bdev, whole, holder));
The BUG_ON fires.

Fix this by removing the dependency on ->bd_contains in
bd_may_claim().  As bd_may_claim() has direct access to the whole
device, it can simply test if the target bdev is the whole device.

Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block")
Cc: stable@vger.kernel.org (v2.6.35+)
Signed-off-by: NeilBrown 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

ext4: return -ENOMEM instead of success

2017-01-15T14:49:51+00:00

[ Upstream commit 578620f451f836389424833f1454eeeb2ffc9e9f ]

We should set the error code if kzalloc() fails.

Fixes: 67cf5b09a46f ("ext4: add the basic function for inline data support")
Signed-off-by: Dan Carpenter 
Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

nfs_write_end(): fix handling of short copies

2017-01-15T14:49:51+00:00

[ Upstream commit c0cf3ef5e0f47e385920450b245d22bead93e7ad ]

What matters when deciding if we should make a page uptodate is
not how much we _wanted_ to copy, but how much we actually have
copied.  As it is, on architectures that do not zero tail on
short copy we can leave uninitialized data in page marked uptodate.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin

CIFS: Fix a possible memory corruption during reconnect

2017-01-15T14:49:51+00:00

[ Upstream commit 53e0e11efe9289535b060a51d4cf37c25e0d0f2b ]

We can not unlock/lock cifs_tcp_ses_lock while walking through ses
and tcon lists because it can corrupt list iterator pointers and
a tcon structure can be released if we don't hold an extra reference.
Fix it by moving a reconnect process to a separate delayed work
and acquiring a reference to every tcon that needs to be reconnected.
Also do not send an echo request on newly established connections.

CC: Stable 
Signed-off-by: Pavel Shilovsky 
Signed-off-by: Sasha Levin

CIFS: Fix a possible memory corruption in push locks

2017-01-15T14:49:51+00:00

[ Upstream commit e3d240e9d505fc67f8f8735836df97a794bbd946 ]

If maxBuf is not 0 but less than a size of SMB2 lock structure
we can end up with a memory corruption.

Cc: Stable 
Signed-off-by: Pavel Shilovsky 
Signed-off-by: Sasha Levin

CIFS: Fix missing nls unload in smb2_reconnect()

2017-01-15T14:49:50+00:00

[ Upstream commit 4772c79599564bd08ee6682715a7d3516f67433f ]

Cc: Stable 
Acked-by: Sachin Prabhu 
Signed-off-by: Pavel Shilovsky 
Signed-off-by: Sasha Levin

xfs: set AGI buffer type in xlog_recover_clear_agi_bucket

2017-01-15T14:49:50+00:00

[ Upstream commit 6b10b23ca94451fae153a5cc8d62fd721bec2019 ]

xlog_recover_clear_agi_bucket didn't set the
type to XFS_BLFT_AGI_BUF, so we got a warning during log
replay (or an ASSERT on a debug build).

    XFS (md0): Unknown buffer type 0!
    XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1

Fix this, as was done in f19b872b for 2 other locations
with the same problem.

cc:  # 3.10 to current
Signed-off-by: Eric Sandeen 
Reviewed-by: Brian Foster 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Dave Chinner 
Signed-off-by: Sasha Levin

block: protect iterate_bdevs() against concurrent close

2017-01-15T14:49:50+00:00

[ Upstream commit af309226db916e2c6e08d3eba3fa5c34225200c4 ]

If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev->b_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
 IP: [] blk_get_backing_dev_info+0x10/0x20
 PGD 9e62067 PUD 9ee8067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
 RIP: 0010:[]  [] blk_get_backing_dev_info+0x10/0x20
 RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
 FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
 Stack:
  ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
  0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
  ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
 Call Trace:
  [] __filemap_fdatawrite_range+0x85/0x90
  [] filemap_fdatawrite+0x1f/0x30
  [] fdatawrite_one_bdev+0x16/0x20
  [] iterate_bdevs+0xf2/0x130
  [] sys_sync+0x63/0x90
  [] entry_SYSCALL_64_fastpath+0x12/0x76
 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
 RIP  [] blk_get_backing_dev_info+0x10/0x20
  RSP 
 CR2: 0000000000000508
 ---[ end trace 2487336ceb3de62d ]---

The crash is easily reproducible by running the following command, if an
msleep(100) is inserted before the call to func() in iterate_devs():

 while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done

Fix it by holding the bd_mutex across the func() call and only calling
func() if the bdev is opened.

Cc: stable@vger.kernel.org
Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices")
Reported-and-tested-by: Wei Fang 
Signed-off-by: Rabin Vincent 
Signed-off-by: Jan Kara 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

Btrfs: fix tree search logic when replaying directory entry deletes

2017-01-15T14:49:50+00:00

[ Upstream commit 2a7bf53f577e49c43de4ffa7776056de26db65d9 ]

If a log tree has a layout like the following:

leaf N:
        ...
        item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
                dir log end 1275809046
leaf N + 1:
        item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8
                dir log end 18446744073709551615
        ...

When we pass the value 1275809046 + 1 as the parameter start_ret to the
function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
end up with path->slots[0] having the value 239 (points to the last item
of leaf N, item 240). Because the dir log item in that position has an
offset value smaller than *start_ret (1275809046 + 1) we need to move on
to the next leaf, however the logic for that is wrong since it compares
the current slot to the number of items in the leaf, which is smaller
and therefore we don't lookup for the next leaf but instead we set the
slot to point to an item that does not exist, at slot 240, and we later
operate on that slot which has unexpected content or in the worst case
can result in an invalid memory access (accessing beyond the last page
of leaf N's extent buffer).

So fix the logic that checks when we need to lookup at the next leaf
by first incrementing the slot and only after to check if that slot
is beyond the last item of the current leaf.

Signed-off-by: Robbie Ko 
Reviewed-by: Filipe Manana 
Fixes: e02119d5a7b4 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
Cc: stable@vger.kernel.org  # 2.6.29+
Signed-off-by: Filipe Manana 
[Modified changelog for clarity and correctness]

Signed-off-by: Sasha Levin