<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ceph/locks.c, branch v3.10</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ceph: ceph_pagelist_append might sleep while atomic</title>
<updated>2013-05-17T17:45:48+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T18:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40'/>
<id>39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40</id>
<content type='text'>
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: add cpu_to_le32() calls when encoding a reconnect capability</title>
<updated>2013-05-17T17:45:43+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T18:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c420276a532a10ef59849adc2681f45306166b89'/>
<id>c420276a532a10ef59849adc2681f45306166b89</id>
<content type='text'>
In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>new helper: file_inode(file)</title>
<updated>2013-02-23T04:31:31+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2013-01-23T22:07:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=496ad9aa8ef448058e36ca7a787c61f2e63f0f54'/>
<id>496ad9aa8ef448058e36ca7a787c61f2e63f0f54</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: unwind canceled flock state</title>
<updated>2011-06-08T04:36:45+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-05-25T21:56:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0c1f91f27140cf3b6e38dc4e892adac241c73a20'/>
<id>0c1f91f27140cf3b6e38dc4e892adac241c73a20</id>
<content type='text'>
If we request a lock and then abort (e.g., ^C), we need to send a matching
unlock request to the MDS to unwind our lock attempt to avoid indefinitely
blocking other clients.

Reported-by: Brian Chrisman &lt;brchrisman@gmail.com&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we request a lock and then abort (e.g., ^C), we need to send a matching
unlock request to the MDS to unwind our lock attempt to avoid indefinitely
blocking other clients.

Reported-by: Brian Chrisman &lt;brchrisman@gmail.com&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: use ihold when we already have an inode ref</title>
<updated>2011-06-08T04:34:11+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-05-27T16:24:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=70b666c3b4cb2b96098d80e6f515e4bc6d37db5a'/>
<id>70b666c3b4cb2b96098d80e6f515e4bc6d37db5a</id>
<content type='text'>
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: Behave better when handling file lock replies.</title>
<updated>2010-12-01T22:22:34+00:00</updated>
<author>
<name>Herb Shiu</name>
<email>herb_shiu@tcloudcomputing.com</email>
</author>
<published>2010-11-23T21:58:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a5b10629edfa521071ccdb3b1e0e7fb350a044db'/>
<id>a5b10629edfa521071ccdb3b1e0e7fb350a044db</id>
<content type='text'>
Fill in the local lock with response data if appropriate,
and don't call posix_lock_file when reading locks.

Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fill in the local lock with response data if appropriate,
and don't call posix_lock_file when reading locks.

Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: pass lock information by struct file_lock instead of as individual params.</title>
<updated>2010-12-01T22:22:34+00:00</updated>
<author>
<name>Herb Shiu</name>
<email>herb_shiu@tcloudcomputing.com</email>
</author>
<published>2010-11-23T21:42:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=637ae8d547390df75bad42a7e9cb65e625119767'/>
<id>637ae8d547390df75bad42a7e9cb65e625119767</id>
<content type='text'>
Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Herb Shiu &lt;herb_shiu@tcloudcomputing.com&gt;
Acked-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: preallocate flock state without locks held</title>
<updated>2010-10-20T22:38:17+00:00</updated>
<author>
<name>Greg Farnum</name>
<email>gregf@hq.newdream.net</email>
</author>
<published>2010-09-17T17:24:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fca4451acfdcf894154e4809529ca28a09db88ff'/>
<id>fca4451acfdcf894154e4809529ca28a09db88ff</id>
<content type='text'>
When the lock_kernel() turns into lock_flocks() and a spinlock, we won't
be able to do allocations with the lock held.  Preallocate space without
the lock, and retry if the lock state changes out from underneath us.

Signed-off-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the lock_kernel() turns into lock_flocks() and a spinlock, we won't
be able to do allocations with the lock held.  Preallocate space without
the lock, and retry if the lock state changes out from underneath us.

Signed-off-by: Greg Farnum &lt;gregf@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: factor out libceph from Ceph file system</title>
<updated>2010-10-20T22:37:28+00:00</updated>
<author>
<name>Yehuda Sadeh</name>
<email>yehuda@hq.newdream.net</email>
</author>
<published>2010-04-06T22:14:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d14c5d2b6e15c21d8e5467dc62d33127c23a644'/>
<id>3d14c5d2b6e15c21d8e5467dc62d33127c23a644</id>
<content type='text'>
This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: Fix warnings</title>
<updated>2010-08-25T19:02:14+00:00</updated>
<author>
<name>Alan Cox</name>
<email>alan@linux.intel.com</email>
</author>
<published>2010-08-25T12:26:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ad8453ab0a5b98884074302ba3cc37664791e261'/>
<id>ad8453ab0a5b98884074302ba3cc37664791e261</id>
<content type='text'>
Just scrubbing some warnings so I can see real problem ones in the build
noise. For 32bit we need to coax gcc politely into believing we really
honestly intend to the casts. Using (u64)(unsigned long) means we cast from
a pointer to a type of the right size and then extend it. This stops the
warning spew.

Signed-off-by: Alan Cox &lt;alan@linux.intel.com&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Just scrubbing some warnings so I can see real problem ones in the build
noise. For 32bit we need to coax gcc politely into believing we really
honestly intend to the casts. Using (u64)(unsigned long) means we cast from
a pointer to a type of the right size and then extend it. This stops the
warning spew.

Signed-off-by: Alan Cox &lt;alan@linux.intel.com&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
