<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ceph/locks.c, branch v3.13</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client</title>
<updated>2013-07-09T19:39:10+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-07-09T19:39:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9a5889ae1ce41f376e6a5b56e17e0c5a755fda80'/>
<id>9a5889ae1ce41f376e6a5b56e17e0c5a755fda80</id>
<content type='text'>
Pull Ceph updates from Sage Weil:
 "There is some follow-on RBD cleanup after the last window's code drop,
  a series from Yan fixing multi-mds behavior in cephfs, and then a
  sprinkling of bug fixes all around.  Some warnings, sleeping while
  atomic, a null dereference, and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
  libceph: fix invalid unsigned-&gt;signed conversion for timespec encoding
  libceph: call r_unsafe_callback when unsafe reply is received
  ceph: fix race between cap issue and revoke
  ceph: fix cap revoke race
  ceph: fix pending vmtruncate race
  ceph: avoid accessing invalid memory
  libceph: Fix NULL pointer dereference in auth client code
  ceph: Reconstruct the func ceph_reserve_caps.
  ceph: Free mdsc if alloc mdsc-&gt;mdsmap failed.
  ceph: remove sb_start/end_write in ceph_aio_write.
  ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
  ceph: fix sleeping function called from invalid context.
  ceph: move inode to proper flushing list when auth MDS changes
  rbd: fix a couple warnings
  ceph: clear migrate seq when MDS restarts
  ceph: check migrate seq before changing auth cap
  ceph: fix race between page writeback and truncate
  ceph: reset iov_len when discarding cap release messages
  ceph: fix cap release race
  libceph: fix truncate size calculation
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull Ceph updates from Sage Weil:
 "There is some follow-on RBD cleanup after the last window's code drop,
  a series from Yan fixing multi-mds behavior in cephfs, and then a
  sprinkling of bug fixes all around.  Some warnings, sleeping while
  atomic, a null dereference, and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
  libceph: fix invalid unsigned-&gt;signed conversion for timespec encoding
  libceph: call r_unsafe_callback when unsafe reply is received
  ceph: fix race between cap issue and revoke
  ceph: fix cap revoke race
  ceph: fix pending vmtruncate race
  ceph: avoid accessing invalid memory
  libceph: Fix NULL pointer dereference in auth client code
  ceph: Reconstruct the func ceph_reserve_caps.
  ceph: Free mdsc if alloc mdsc-&gt;mdsmap failed.
  ceph: remove sb_start/end_write in ceph_aio_write.
  ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
  ceph: fix sleeping function called from invalid context.
  ceph: move inode to proper flushing list when auth MDS changes
  rbd: fix a couple warnings
  ceph: clear migrate seq when MDS restarts
  ceph: check migrate seq before changing auth cap
  ceph: fix race between page writeback and truncate
  ceph: reset iov_len when discarding cap release messages
  ceph: fix cap release race
  libceph: fix truncate size calculation
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: fix up comment for ceph_count_locks() as to which lock to hold</title>
<updated>2013-07-01T16:52:01+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T16:38:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4d1bf79aff7962ab0654d66127ebb6eec17460ab'/>
<id>4d1bf79aff7962ab0654d66127ebb6eec17460ab</id>
<content type='text'>
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locks: protect most of the file_lock handling with i_lock</title>
<updated>2013-06-29T08:57:42+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@redhat.com</email>
</author>
<published>2013-06-21T12:58:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1c8c601a8c0dc59fe64907dcd9d512a3d181ddc7'/>
<id>1c8c601a8c0dc59fe64907dcd9d512a3d181ddc7</id>
<content type='text'>
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the -&gt;fl_link sits on, and the -&gt;fl_block list.

-&gt;fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the  blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the -&gt;fl_link sits on, and the -&gt;fl_block list.

-&gt;fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the  blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: ceph_pagelist_append might sleep while atomic</title>
<updated>2013-05-17T17:45:48+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T18:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40'/>
<id>39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40</id>
<content type='text'>
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: add cpu_to_le32() calls when encoding a reconnect capability</title>
<updated>2013-05-17T17:45:43+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T18:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c420276a532a10ef59849adc2681f45306166b89'/>
<id>c420276a532a10ef59849adc2681f45306166b89</id>
<content type='text'>
In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>new helper: file_inode(file)</title>
<updated>2013-02-23T04:31:31+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2013-01-23T22:07:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=496ad9aa8ef448058e36ca7a787c61f2e63f0f54'/>
<id>496ad9aa8ef448058e36ca7a787c61f2e63f0f54</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: unwind canceled flock state</title>
<updated>2011-06-08T04:36:45+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-05-25T21:56:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0c1f91f27140cf3b6e38dc4e892adac241c73a20'/>
<id>0c1f91f27140cf3b6e38dc4e892adac241c73a20</id>
<content type='text'>
If we request a lock and then abort (e.g., ^C), we need to send a matching
unlock request to the MDS to unwind our lock attempt to avoid indefinitely
blocking other clients.

Reported-by: Brian Chrisman &lt;brchrisman@gmail.com&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we request a lock and then abort (e.g., ^C), we need to send a matching
unlock request to the MDS to unwind our lock attempt to avoid indefinitely
blocking other clients.

Reported-by: Brian Chrisman &lt;brchrisman@gmail.com&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: use ihold when we already have an inode ref</title>
<updated>2011-06-08T04:34:11+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-05-27T16:24:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=70b666c3b4cb2b96098d80e6f515e4bc6d37db5a'/>
<id>70b666c3b4cb2b96098d80e6f515e4bc6d37db5a</id>
<content type='text'>
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: Behave better when handling file lock replies.</title>
<updated>2010-12-01T22:22:34+00:00</updated>
<author>
<name>Herb Shiu</name>
<email>herb_shiu@tcloudcomputing.com</email>
</author>
<published>2010-11-23T21:58:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a5b10629edfa521071ccdb3b1e0e7fb350a044db'/>
<id>a5b10629edfa521071ccdb3b1e0e7fb350a044db</id>
<content type='text'>
Fill in the local lock with response data if appropriate,
and don't call posix_lock_file when reading locks.

Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fill in the local lock with response data if appropriate,
and don't call posix_lock_file when reading locks.

Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: pass lock information by struct file_lock instead of as individual params.</title>
<updated>2010-12-01T22:22:34+00:00</updated>
<author>
<name>Herb Shiu</name>
<email>herb_shiu@tcloudcomputing.com</email>
</author>
<published>2010-11-23T21:42:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=637ae8d547390df75bad42a7e9cb65e625119767'/>
<id>637ae8d547390df75bad42a7e9cb65e625119767</id>
<content type='text'>
Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
