linux-stable.git/fs/notify, branch v3.16.40

fanotify: fix list corruption in fanotify_get_response()

2016-11-20T01:17:33+00:00

commit 96d41019e3ac55f6f0115b0ce97e4f24a3d636d2 upstream.

fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).

However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock.  Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.

Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().

Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara 
Reported-by: Miklos Szeredi 
Tested-by: Miklos Szeredi 
Reviewed-by: Miklos Szeredi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.16:
 - s/fsnotify_remove_first_event/fsnotify_remove_notify_event/
 - Adjust context]
Signed-off-by: Ben Hutchings

fsnotify: add a way to stop queueing events on group shutdown

2016-11-20T01:17:33+00:00

commit 12703dbfeb15402260e7554d32a34ac40c233990 upstream.

Implement a function that can be called when a group is being shutdown
to stop queueing new events to the group.  Fanotify will use this.

Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara 
Reviewed-by: Miklos Szeredi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings

fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()

2015-08-27T11:08:16+00:00

commit 8f2f3eb59dff4ec538de55f2e0592fec85966aab upstream.

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.

Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list.  This method
is safe even when entries from the list can disappear once we drop the
lock.

Signed-off-by: Jan Kara 
Reported-by: Ashish Sangwan 
Reviewed-by: Ashish Sangwan 
Cc: Lino Sanfilippo 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Luis Henriques

fsnotify: next_i is freed during fsnotify_unmount_inodes.

2015-01-16T14:34:38+00:00

commit 6424babfd68dd8a83d9c60a5242d27038856599f upstream.

During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:

                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }

As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.

Multiple crash dumps showed:

The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
back at itself.  As this is not the value of list upon entry to the
function, the kernel never exits the loop.

To help narrow down problem, the call to list_del_init in
inode_sb_list_del was changed to list_del.  This poisons the pointers in
the i_sb_list and causes a kernel to panic if it transverse a freed
inode.

Subsequent stress testing paniced in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop showing next_i had become
free.

We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.

The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget.  However, the code doesn't do the
__iget call on next_i

	if i_count == 0 or
	if i_state & (I_FREEING | I_WILL_FREE)

The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list.  This makes the handling of next_i more closely match the
handling of the variable "inode."

The time to reproduce the hang is highly variable (from hours to days.) We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.

During list_for_each_entry_safe, next_i is becoming free causing
the loop to never terminate.  Advance next_i in those cases where
__iget is not done.

Signed-off-by: Jerry Hoemann 
Cc: Jeff Kirsher 
Cc: Ken Helias 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Cc: Jan Kara 
Signed-off-by: Luis Henriques

move d_rcu from overlapping d_child to overlapping d_alias

2015-01-15T10:44:58+00:00

commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro 
[bwh: Backported to 3.16:
 - Apply name changes in all the different places we use d_alias and d_child
 - Adjust context]
Signed-off-by: Ben Hutchings 
Cc: Moritz Muehlenhoff 
Signed-off-by: Luis Henriques

fanotify: enable close-on-exec on events' fd when requested in fanotify_init()

2014-10-30T16:40:16+00:00

commit 0b37e097a648aa71d4db1ad108001e95b69a2da4 upstream.

According to commit 80af258867648 ("fanotify: groups can specify their
f_flags for new fd"), file descriptors created as part of file access
notification events inherit flags from the event_f_flags argument passed
to syscall fanotify_init(2)[1].

Unfortunately O_CLOEXEC is currently silently ignored.

Indeed, event_f_flags are only given to dentry_open(), which only seems to
care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
open_check_o_direct() and O_LARGEFILE in generic_file_open().

It's a pity, since, according to some lookup on various search engines and
http://codesearch.debian.net/, there's already some userspace code which
use O_CLOEXEC:

- in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

- in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                      O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

Additionally, since commit 48149e9d3a7e ("fanotify: check file flags
passed in fanotify_init").  having O_CLOEXEC as part of fanotify_init()
second argument is expressly allowed.

So it seems expected to set close-on-exec flag on the file descriptors if
userspace is allowed to request it with O_CLOEXEC.

But Andrew Morton raised[6] the concern that enabling now close-on-exec
might break existing applications which ask for O_CLOEXEC but expect the
file descriptor to be inherited across exec().

In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
descriptor returned as part of file access notify can break applications
due to deadlock.  So close-on-exec is needed for most applications.

More, applications asking for close-on-exec are likely expecting it to be
enabled, relying on O_CLOEXEC being effective.  If not, it might weaken
their security, as noted by Jan Kara[8].

So this patch replaces call to macro get_unused_fd() by a call to function
get_unused_fd_flags() with event_f_flags value as argument.  This way
O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
interpreted and close-on-exec get enabled when requested.

[1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
[7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
[8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud 
Reviewed-by: Jan Kara 
Reviewed by: Heinrich Schuchardt 
Tested-by: Heinrich Schuchardt 
Cc: Mihai Don\u021bu 
Cc: Pádraig Brady 
Cc: Heinrich Schuchardt 
Cc: Jan Kara 
Cc: Valdis Kletnieks 
Cc: Michael Kerrisk-manpages 
Cc: Lino Sanfilippo 
Cc: Richard Guy Briggs 
Cc: Eric Paris 
Cc: Al Viro 
Cc: Michael Kerrisk 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fs/notify: don't show f_handle if exportfs_encode_inode_fh failed

2014-10-05T20:41:07+00:00

commit 7e8824816bda16bb11ff5ff1e1212d642e57b0b3 upstream.

Currently we handle only ENOSPC.  In case of other errors the file_handle
variable isn't filled properly and we will show a part of stack.

Signed-off-by: Andrey Vagin 
Acked-by: Cyrill Gorcunov 
Cc: Alexander Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fsnotify/fdinfo: use named constants instead of hardcoded values

2014-10-05T20:41:07+00:00

commit 1fc98d11cac6dd66342e5580cb2687e5b1e9a613 upstream.

MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
bytes, so exportfs_encode_inode_fh can return an error.

Signed-off-by: Andrey Vagin 
Acked-by: Cyrill Gorcunov 
Cc: Alexander Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fanotify: fix double free of pending permission events

2014-09-17T16:21:53+00:00

commit 5838d4442bd5971687b72221736222637e03140d upstream.

Commit 85816794240b ("fanotify: Fix use after free for permission
events") introduced a double free issue for permission events which are
pending in group's notification queue while group is being destroyed.
These events are freed from fanotify_handle_event() but they are not
removed from groups notification queue and thus they get freed again
from fsnotify_flush_notify().

Fix the problem by removing permission events from notification queue
before freeing them if we skip processing access response.  Also expand
comments in fanotify_release() to explain group shutdown in detail.

Fixes: 85816794240b9659e66e4d9b0df7c6e814e5f603
Signed-off-by: Jan Kara 
Reported-by: Douglas Leeder 
Tested-by: Douglas Leeder 
Reported-by: Heinrich Schuchard 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

inotify: convert use of typedef ctl_table to struct ctl_table

2014-06-06T23:08:16+00:00

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds