linux-stable.git/fs/ceph, branch v5.4.166

ceph: fix handling of "meta" errors

2021-10-27T07:54:27+00:00

commit 1bd85aa65d0e7b5e4d09240f492f37c569fdd431 upstream.

Currently, we check the wb_err too early for directories, before all of
the unsafe child requests have been waited on. In order to fix that we
need to check the mapping->wb_err later nearer to the end of ceph_fsync.

We also have an overly-complex method for tracking errors after
blocklisting. The errors recorded in cleanup_session_requests go to a
completely separate field in the inode, but we end up reporting them the
same way we would for any other error (in fsync).

There's no real benefit to tracking these errors in two different
places, since the only reporting mechanism for them is in fsync, and
we'd need to advance them both every time.

Given that, we can just remove i_meta_err, and convert the places that
used it to instead just use mapping->wb_err instead. That also fixes
the original problem by ensuring that we do a check_and_advance of the
wb_err at the end of the fsync op.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52864
Reported-by: Patrick Donnelly 
Signed-off-by: Jeff Layton 
Reviewed-by: Xiubo Li 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

ceph: lockdep annotations for try_nonblocking_invalidate

2021-09-26T12:07:11+00:00

[ Upstream commit 3eaf5aa1cfa8c97c72f5824e2e9263d6cc977b03 ]

Signed-off-by: Jeff Layton 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Sasha Levin

ceph: request Fw caps before updating the mtime in ceph_write_iter

2021-09-26T12:07:11+00:00

[ Upstream commit b11ed50346683a749632ea664959b28d524d7395 ]

The current code will update the mtime and then try to get caps to
handle the write. If we end up having to request caps from the MDS, then
the mtime in the cap grant will clobber the updated mtime and it'll be
lost.

This is most noticable when two clients are alternately writing to the
same file. Fw caps are continually being granted and revoked, and the
mtime ends up stuck because the updated mtimes are always being
overwritten with the old one.

Fix this by changing the order of operations in ceph_write_iter to get
the caps before updating the times. Also, make sure we check the pool
full conditions before even getting any caps or uninlining.

URL: https://tracker.ceph.com/issues/46574
Reported-by: Jozef Kováč 
Signed-off-by: Jeff Layton 
Reviewed-by: Xiubo Li 
Reviewed-by: Luis Henriques 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Sasha Levin

ceph: take snap_empty_lock atomically with snaprealm refcount change

2021-08-18T06:57:04+00:00

commit 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f upstream.

There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it when it's still in-use.

Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.

Because these objects can also undergo a 0->1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0->1 transition.

With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson 
Signed-off-by: Jeff Layton 
Reviewed-by: Luis Henriques 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm

2021-08-18T06:57:04+00:00

commit df2c0cb7f8e8c83e495260ad86df8c5da947f2a7 upstream.

They both say that the snap_rwsem must be held for write, but I don't
see any real reason for it, and it's not currently always called that
way.

The lookup is just walking the rbtree, so holding it for read should be
fine there. The "get" is bumping the refcount and (possibly) removing
it from the empty list. I see no need to hold the snap_rwsem for write
for that.

Signed-off-by: Jeff Layton 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

ceph: add some lockdep assertions around snaprealm handling

2021-08-18T06:57:04+00:00

commit a6862e6708c15995bc10614b2ef34ca35b4b9078 upstream.

Turn some comments into lockdep asserts.

Signed-off-by: Jeff Layton 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

ceph: reduce contention in ceph_check_delayed_caps()

2021-08-18T06:56:57+00:00

commit bf2ba432213fade50dd39f2e348085b758c0726e upstream.

Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
workqueue and it can be kept looping for quite some time if caps keep
being added back to the mdsc->cap_delay_list.  This may result in the
watchdog tainting the kernel with the softlockup flag.

This patch breaks this loop if the caps have been recently (i.e. during
the loop execution).  Any new caps added to the list will be handled in
the next run.

Also, allow schedule_delayed() callers to explicitly set the delay value
instead of defaulting to 5s, so we can ensure that it runs soon
afterward if it looks like there is more work.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46284
Signed-off-by: Luis Henriques 
Reviewed-by: Jeff Layton 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty

2021-07-20T14:10:48+00:00

[ Upstream commit 22d41cdcd3cfd467a4af074165357fcbea1c37f5 ]

The checks for page->mapping are odd, as set_page_dirty is an
address_space operation, and I don't see where it would be called on a
non-pagecache page.

The warning about the page lock also seems bogus.  The comment over
set_page_dirty() says that it can be called without the page lock in
some rare cases. I don't think we want to warn if that's the case.

Reported-by: Matthew Wilcox 
Signed-off-by: Jeff Layton 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Sasha Levin

ceph: fix fscache invalidation

2021-05-22T09:38:29+00:00

[ Upstream commit 10a7052c7868bc7bc72d947f5aac6f768928db87 ]

Ensure that we invalidate the fscache whenever we invalidate the
pagecache.

Signed-off-by: Jeff Layton 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Sasha Levin

ceph: fix inode leak on getattr error in __fh_to_dentry

2021-05-19T08:08:26+00:00

[ Upstream commit 1775c7ddacfcea29051c67409087578f8f4d751b ]

Fixes: 878dabb64117 ("ceph: don't return -ESTALE if there's still an open file")
Signed-off-by: Jeff Layton 
Reviewed-by: Xiubo Li 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Sasha Levin