linux-stable.git/fs/fscache, branch linux-3.12.y

FS-Cache: Don't override netfs's primary_index if registering failed

2016-02-24T08:45:13+00:00

commit b130ed5998e62879a66bad08931a2b5e832da95c upstream.

Only override netfs->primary_index when registration succeeds.

Signed-off-by: Kinglong Mee 
Signed-off-by: David Howells 
Signed-off-by: Al Viro 
Signed-off-by: Jiri Slaby

FS-Cache: Increase reference of parent after registering, netfs success

2016-02-24T08:45:13+00:00

commit 86108c2e34a26e4bec3c6ddb23390bf8cedcf391 upstream.

If the netfs is already registered, fscache should not increment the
parent's usage and n_children counts; otherwise they are never decremented.

v2: following David's suggestions,
 take the reference on the parent only on success
 use kmem_cache_free() to free primary_index directly

v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"
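
The registration pattern described in these two commits can be sketched in
user space as follows.  This is only an illustration under stated
assumptions: register_netfs(), struct cookie, struct netfs and fsdef_index
are hypothetical stand-ins, malloc/free replace the kernel slab calls, and
the locking that the real code performs is left out.  The point is that the
new cookie stays local, and the parent's counters and netfs->primary_index
are only touched once the duplicate check has passed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures. */
struct cookie {
    struct cookie *parent;
    int usage;
    int n_children;
};

struct netfs {
    const char *name;
    struct cookie *primary_index;
    struct netfs *next;
};

static struct cookie fsdef_index = { NULL, 1, 0 };  /* root index */
static struct netfs *netfs_list;                    /* registered netfs's */

int register_netfs(struct netfs *netfs)
{
    struct cookie *cookie;

    assert(netfs != NULL && netfs->name != NULL);

    /* Build the new index cookie locally; nothing is published yet. */
    cookie = calloc(1, sizeof(*cookie));
    if (!cookie)
        return -ENOMEM;
    cookie->usage = 1;
    cookie->parent = &fsdef_index;

    /* Fail before touching the parent's counters or the caller's struct. */
    for (struct netfs *p = netfs_list; p; p = p->next) {
        if (strcmp(p->name, netfs->name) == 0) {
            free(cookie);           /* free the unused cookie directly */
            return -EEXIST;
        }
    }

    /* Success: only now pin the parent and publish the cookie. */
    cookie->parent->usage++;
    cookie->parent->n_children++;
    netfs->primary_index = cookie;
    netfs->next = netfs_list;
    netfs_list = netfs;
    return 0;
}
```

With this ordering, a failed registration leaves both the caller's
primary_index and the parent's usage and n_children exactly as they were.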

Signed-off-by: Kinglong Mee 
Signed-off-by: David Howells 
Signed-off-by: Al Viro 
Signed-off-by: Jiri Slaby

FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree

2014-04-03T08:32:20+00:00

commit 7026f1929e18921fd67bf478f475a8fdfdff16ae upstream.

When FS-Cache allocates an object, the following sequence of events can
occur:

 -->fscache_alloc_object()
    -->cachefiles_alloc_object() [via cache->ops->alloc_object]
    <--[returns new object]
    -->fscache_attach_object()
    <--[failed]
    -->cachefiles_put_object() [via cache->ops->put_object]
       -->fscache_object_destroy()
          -->fscache_objlist_remove()
             -->rb_erase() to remove the object from fscache_object_list.

resulting in a crash in the rbtree code.

The problem is that the object is only added to fscache_object_list on
the success path of fscache_attach_object() where it calls
fscache_objlist_add().

So if fscache_attach_object() fails, the object won't have been added to
the objlist rbtree.  We do, however, unconditionally try to remove the
object from the tree.
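
A minimal user-space analogue of the guard is sketched below.  It is an
assumption-laden illustration, not the kernel code: a doubly linked list
stands in for the rbtree, and object_init(), objlist_add(),
objlist_remove() and the "unlinked" marker are hypothetical names.  The
idea is that every object starts in a recognisable unlinked state, so the
removal path can tell that the object was never added and do nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node *prev, *next;
};

struct object {
    int id;
    struct node link;
};

/* Circular list head; an empty list points at itself. */
static struct node object_list = { &object_list, &object_list };

/* A node that points at itself is not on any list. */
static void node_clear(struct node *n)
{
    n->prev = n;
    n->next = n;
}

static bool node_is_unlinked(const struct node *n)
{
    return n->next == n;
}

void object_init(struct object *obj, int id)
{
    obj->id = id;
    node_clear(&obj->link);     /* mark as "never added" from the start */
}

void objlist_add(struct object *obj)
{
    assert(node_is_unlinked(&obj->link));
    obj->link.next = object_list.next;
    obj->link.prev = &object_list;
    object_list.next->prev = &obj->link;
    object_list.next = &obj->link;
}

/* Returns false, and touches nothing, if the object was never added. */
bool objlist_remove(struct object *obj)
{
    if (node_is_unlinked(&obj->link))
        return false;
    obj->link.prev->next = obj->link.next;
    obj->link.next->prev = obj->link.prev;
    node_clear(&obj->link);
    return true;
}

size_t objlist_count(void)
{
    size_t n = 0;

    for (const struct node *p = object_list.next; p != &object_list;
         p = p->next)
        n++;
    return n;
}
```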

Thanks to NeilBrown for finding this and suggesting this solution.

Reported-by: NeilBrown 
Signed-off-by: David Howells 
Tested-by: (a customer of) NeilBrown 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

2013-09-19T17:50:37+00:00

Pull ceph fixes from Sage Weil:
 "These fix several bugs with RBD from 3.11 that didn't get tested in
  time for the merge window: some error handling, a use-after-free, and
  a sequencing issue when unmapping an image races with a notify
  operation.

  There is also a patch fixing a problem with the new ceph + fscache
  code that just went in"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  fscache: check consistency does not decrement refcount
  rbd: fix error handling from rbd_snap_name()
  rbd: ignore unmapped snapshots that no longer exist
  rbd: fix use-after free of rbd_dev->disk
  rbd: make rbd_obj_notify_ack() synchronous
  rbd: complete notifies before cleaning up osd_client and rbd_dev
  libceph: add function to ensure notifies are complete

lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt

2013-09-11T22:59:36+00:00

When users of radix_tree_preload() run from interrupt context
(block/blk-ioc.c is one such possible user), the following race can happen:

radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];
<interrupt>
...
radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];

And we give out one radix tree node twice.  That clearly results in radix
tree corruption with different results (usually OOPS) depending on which
two users of radix tree race.

We fix the problem by making radix_tree_node_alloc() always allocate fresh
radix tree nodes when in interrupt.  Using preloading when in interrupt
doesn't make sense since all the allocations have to be atomic anyway and
we cannot steal nodes from process-context users because some users rely
on radix_tree_insert() succeeding after radix_tree_preload().
The in_interrupt() check is somewhat ugly, but we cannot simply key off the
passed gfp_mask, as that is acquired from root_gfp_mask() and is thus the
same for all preload users.

Another part of the fix is to avoid node preallocation in
radix_tree_preload() when passed gfp_mask doesn't allow waiting.  Again,
preallocation in such case doesn't make sense and when preallocation would
happen in interrupt we could possibly leak some allocated nodes.  However,
some users of radix_tree_preload() require following radix_tree_insert()
to succeed.  To avoid unexpected effects for these users,
radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
and we provide a new function radix_tree_maybe_preload() for those users
which get different gfp mask from different call sites and which are
prepared to handle radix_tree_insert() failure.
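
The allocation rule can be sketched in user space as below.  This is a
simplified model, not the kernel implementation: the pool is a single
global rather than per-CPU, the in_interrupt flag is passed in by the
caller instead of being read from the CPU state, and pool_preload(),
node_alloc() and PRELOAD_SIZE are hypothetical names.  The gfp_mask half of
the fix (radix_tree_maybe_preload()) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PRELOAD_SIZE 4

struct tree_node {
    int slots[4];
};

/* Nodes reserved by a process-context caller for its next insert. */
struct preload_pool {
    int nr;
    struct tree_node *nodes[PRELOAD_SIZE];
};

static struct preload_pool pool;

/* Fill the pool so that a following insert cannot fail for lack of nodes. */
int pool_preload(void)
{
    while (pool.nr < PRELOAD_SIZE) {
        struct tree_node *n = calloc(1, sizeof(*n));

        if (!n)
            return -1;
        pool.nodes[pool.nr++] = n;
    }
    return 0;
}

/*
 * Interrupt-context callers never touch the pool: they always get a fresh
 * node (or NULL), so they cannot take a node that the interrupted
 * process-context caller was in the middle of taking.
 */
struct tree_node *node_alloc(bool in_interrupt)
{
    assert(pool.nr >= 0 && pool.nr <= PRELOAD_SIZE);

    if (!in_interrupt && pool.nr > 0) {
        struct tree_node *n = pool.nodes[--pool.nr];

        pool.nodes[pool.nr] = NULL;
        return n;
    }
    return calloc(1, sizeof(struct tree_node));
}
```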

Signed-off-by: Jan Kara 
Cc: Jens Axboe 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fscache: check consistency does not decrement refcount

2013-09-10T16:04:46+00:00

__fscache_check_consistency() does not decrement the count of active
operations after it finishes in the success case.  This leads to hung tasks
on cookie de-registration (commonly during inode eviction).

INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
kworker/1:2     D ffff880443513fc0     0  4214      2 0x00000000
Workqueue: ceph-msgr con_work [libceph]
  ...
Call Trace:
 [] ? _raw_spin_unlock_irqrestore+0x16/0x20
 [] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
 [] schedule+0x29/0x70
 [] fscache_wait_atomic_t+0xe/0x20 [fscache]
 [] out_of_line_wait_on_atomic_t+0x9f/0xe0
 [] ? autoremove_wake_function+0x40/0x40
 [] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
 [] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
 [] ceph_destroy_inode+0x33/0x200 [ceph]
 [] ? __fsnotify_inode_delete+0xe/0x10
 [] destroy_inode+0x3c/0x70
 [] evict+0x119/0x1b0
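
The invariant behind this fix can be sketched as follows.  This is a
user-space illustration only: struct cookie, op_begin(), op_end(),
check_consistency() and can_relinquish() are hypothetical names, and a
plain int stands in for the kernel's atomic counter.  Every exit path,
including the success path, has to undo the increment, because the
relinquish path waits for the count to reach zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct cookie {
    int n_active;   /* operations currently in flight on this cookie */
    bool stale;     /* pretend result of the backend's check */
};

static void op_begin(struct cookie *c)
{
    c->n_active++;
}

static void op_end(struct cookie *c)
{
    assert(c->n_active > 0);
    c->n_active--;
}

int check_consistency(struct cookie *c)
{
    int ret;

    op_begin(c);
    ret = c->stale ? -ESTALE : 0;
    op_end(c);      /* must run on success as well as on failure */
    return ret;
}

/* Relinquishing a cookie has to wait until no operation is active. */
bool can_relinquish(const struct cookie *c)
{
    return c->n_active == 0;
}
```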

Signed-off-by: Milosz Tanski 
Acked-by: David Howells 
Signed-off-by: Sage Weil

fscache: Netfs function for cleanup post readpages

2013-09-06T08:17:30+00:00

Currently the fscache code expects the netfs to call
fscache_read_or_alloc_pages() inside the aops readpages callback.  It marks all
the pages in the list provided by readahead with PG_private_2.  If the netfs
fails to read all the pages (which is legal), it ends up returning to the
readahead code and triggering a BUG, because the page list still contains
marked pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages.  It will revoke the pages from the
underlying cache backend and unmark them.

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping: (null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345]  [] dump_stack+0x19/0x1b
[12410647.597356]  [] bad_page+0xc7/0x120
[12410647.597359]  [] free_pages_prepare+0x10e/0x120
[12410647.597361]  [] free_hot_cold_page+0x40/0x170
[12410647.597363]  [] __put_single_page+0x27/0x30
[12410647.597365]  [] put_page+0x25/0x40
[12410647.597376]  [] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379]  [] __do_page_cache_readahead+0x1af/0x260
[12410647.597382]  [] ra_submit+0x21/0x30
[12410647.597384]  [] filemap_fault+0x254/0x490
[12410647.597387]  [] __do_fault+0x6f/0x4e0
[12410647.597391]  [] ? __switch_to+0x16d/0x4a0
[12410647.597395]  [] ? finish_task_switch+0x5a/0xc0
[12410647.597398]  [] handle_pte_fault+0xf6/0x930
[12410647.597401]  [] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403]  [] ? xen_pmd_val+0xe/0x10
[12410647.597405]  [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407]  [] handle_mm_fault+0x251/0x370
[12410647.597411]  [] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414]  [] __do_page_fault+0x1aa/0x550
[12410647.597418]  [] ? up_write+0x1d/0x20
[12410647.597422]  [] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425]  [] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427]  [] do_page_fault+0xe/0x10
[12410647.597431]  [] page_fault+0x28/0x30
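
The cleanup step can be modelled in user space as below.  This is a rough
sketch under assumptions, not the kernel API: an array replaces the page
list, a bool replaces PG_private_2, and mark_pages(), read_pages(),
readpages_cancel() and netfs_readpages() are hypothetical names.  It shows
only the ordering: whatever the netfs did not manage to read is unmarked
before control returns to the caller that will free those pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {
    bool private_2;     /* stand-in for PG_private_2 */
    bool uptodate;      /* set once the page has been read */
};

/* Stand-in for the marking done when the pages are offered to the cache. */
static void mark_pages(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].private_2 = true;
}

/* Pretend read: only the first n_ok pages are processed successfully. */
static size_t read_pages(struct page *pages, size_t n, size_t n_ok)
{
    size_t done = n_ok < n ? n_ok : n;

    for (size_t i = 0; i < done; i++)
        pages[i].uptodate = true;
    return done;
}

/* Unmark every page that is still marked in the leftover range. */
void readpages_cancel(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].private_2)
            pages[i].private_2 = false;
}

size_t count_marked(const struct page *pages, size_t n)
{
    size_t marked = 0;

    for (size_t i = 0; i < n; i++)
        if (pages[i].private_2)
            marked++;
    return marked;
}

/* The netfs readpages path: cancel the leftovers before returning. */
size_t netfs_readpages(struct page *pages, size_t n, size_t n_ok)
{
    size_t done;

    mark_pages(pages, n);
    done = read_pages(pages, n, n_ok);
    readpages_cancel(pages + done, n - done);
    return done;
}
```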

Signed-off-by: Milosz Tanski 
Signed-off-by: David Howells

FS-Cache: Add interface to check consistency of a cached object

2013-09-06T08:17:30+00:00

Extend the fscache netfs API so that the netfs can ask whether a cache
object is up to date with respect to its corresponding netfs object:

	int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

	int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call.  FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.

Original-author: Hongyi Jia 
Signed-off-by: David Howells 
cc: Hongyi Jia 
cc: Milosz Tanski

FS-Cache: Don't use spin_is_locked() in assertions

2013-06-19T13:16:47+00:00

Under certain circumstances, spin_is_locked() is hardwired to 0 - even when the
code would normally be in a locked section where it should return 1.  This
means it cannot be used for an assertion that checks that a spinlock is locked.

Remove such usages from FS-Cache.

The following oops might otherwise be observed:

FS-Cache: Assertion failed
BUG: failure at fs/fscache/operation.c:270/fscache_start_operations()!
Kernel panic - not syncing: BUG!
CPU: 0 PID: 10 Comm: kworker/u2:1 Not tainted 3.10.0-rc1-00133-ge7ebb75 #2
Workqueue: fscache_operation fscache_op_work_func [fscache]
7f091c48 603c8947 7f090000 7f9b1361 7f25f080 00000001 7f26d440 7f091c90
60299eb8 7f091d90 602951c5 7f26d440 3000000008 7f091da0 7f091cc0 7f091cd0
00000007 00000007 00000006 7f091ae0 00000010 0000010e 7f9af330 7f091ae0
Call Trace:
7f091c88: [<60299eb8>] dump_stack+0x17/0x19
7f091c98: [<602951c5>] panic+0xf4/0x1e9
7f091d38: [<6002b10e>] set_signals+0x1e/0x40
7f091d58: [<6005b89e>] __wake_up+0x4e/0x70
7f091d98: [<7f9aa003>] fscache_start_operations+0x43/0x50 [fscache]
7f091da8: [<7f9aa1e3>] fscache_op_complete+0x1d3/0x220 [fscache]
7f091db8: [<60082985>] unlock_page+0x55/0x60
7f091de8: [<7fb25bb0>] cachefiles_read_copier+0x250/0x330 [cachefiles]
7f091e58: [<7f9ab03c>] fscache_op_work_func+0xac/0x120 [fscache]
7f091e88: [<6004d5b0>] process_one_work+0x250/0x3a0
7f091ef8: [<6004edc7>] worker_thread+0x177/0x2a0
7f091f38: [<6004ec50>] worker_thread+0x0/0x2a0
7f091f58: [<60054418>] kthread+0xd8/0xe0
7f091f68: [<6005bb27>] finish_task_switch.isra.64+0x37/0xa0
7f091fd8: [<600185cf>] new_thread_handler+0x8f/0xb0

Reported-by: Milosz Tanski 
Signed-off-by: David Howells 
Reviewed-and-tested-By: Milosz Tanski

FS-Cache: The retrieval remaining-pages counter needs to be atomic_t

2013-06-19T13:16:47+00:00

struct fscache_retrieval contains a count of the number of pages that still
need some processing (n_pages).  This is decremented as the pages are
processed.

However, this needs to be atomic because fscache_retrieval_complete() may
(I think) occasionally be called from cachefiles_read_backing_file() and
cachefiles_read_copier() simultaneously.

This happens when an fscache_read_or_alloc_pages() request containing a lot of
pages (say a couple of hundred) is being processed.  The read on each backing
page is dispatched individually because we need to insert a monitor into the
waitqueue to catch when the read completes.  However, under low-memory
conditions, we might be forced to wait in the allocator - and this gives the
I/O on the backing page a chance to complete first.

When the I/O completes, fscache_enqueue_retrieval() chucks the retrieval onto
the workqueue without waiting for the operation to finish the initial I/O
dispatch (we want to release any pages we can as soon as we can), thus both can
end up running simultaneously and potentially attempting to partially complete
the retrieval simultaneously (ENOMEM may occur, backing pages may already be in
the page cache).

This was demonstrated by parallelling the non-atomic counter with an atomic
counter and printing both of them when the assertion fails.  At this point, the
atomic counter has reached zero, but the non-atomic counter has not.

To fix this, make the counter an atomic_t.

Without an atomic counter, the following bug can appear:

	FS-Cache: Assertion failed
	3 == 5 is false
	------------[ cut here ]------------
	kernel BUG at fs/fscache/operation.c:421!

or

	FS-Cache: Assertion failed
	3 == 5 is false
	------------[ cut here ]------------
	kernel BUG at fs/fscache/operation.c:414!

With a backtrace like the following:

RIP: 0010:[] fscache_put_operation+0x1ad/0x240 [fscache]
Call Trace:
 [] fscache_retrieval_work+0x55/0x270 [fscache]
 [] ? fscache_retrieval_work+0x0/0x270 [fscache]
 [] worker_thread+0x170/0x2a0
 [] ? autoremove_wake_function+0x0/0x40
 [] ? worker_thread+0x0/0x2a0
 [] kthread+0x96/0xa0
 [] child_rip+0xa/0x20
 [] ? kthread+0x0/0xa0
 [] ? child_rip+0x0/0x20
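
The atomic countdown can be sketched with C11 atomics as follows.  This is
a user-space illustration under assumptions: struct retrieval here is a
hypothetical stand-in holding only the counter, and retrieval_init() and
retrieval_complete() are made-up names.  Because the subtraction is a
single atomic read-modify-write, two callers finishing batches at the same
time cannot lose an update, and exactly one of them sees the count reach
zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct retrieval {
    atomic_int n_pages;     /* pages still needing processing */
};

void retrieval_init(struct retrieval *op, int n_pages)
{
    atomic_init(&op->n_pages, n_pages);
}

/*
 * Account for n_done processed pages.  Returns true only for the caller
 * whose subtraction takes the counter to zero, so that caller alone
 * completes the operation.
 */
bool retrieval_complete(struct retrieval *op, int n_done)
{
    int before = atomic_fetch_sub(&op->n_pages, n_done);

    assert(before >= n_done);
    return before == n_done;
}
```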

Signed-off-by: David Howells 
Reviewed-and-tested-By: Milosz Tanski 
Acked-by: Jeff Layton