<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ceph/file.c, branch v3.11</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client</title>
<updated>2013-07-09T19:39:10+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-07-09T19:39:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9a5889ae1ce41f376e6a5b56e17e0c5a755fda80'/>
<id>9a5889ae1ce41f376e6a5b56e17e0c5a755fda80</id>
<content type='text'>
Pull Ceph updates from Sage Weil:
 "There is some follow-on RBD cleanup after the last window's code drop,
  a series from Yan fixing multi-mds behavior in cephfs, and then a
  sprinkling of bug fixes all around.  Some warnings, sleeping while
  atomic, a null dereference, and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
  libceph: fix invalid unsigned-&gt;signed conversion for timespec encoding
  libceph: call r_unsafe_callback when unsafe reply is received
  ceph: fix race between cap issue and revoke
  ceph: fix cap revoke race
  ceph: fix pending vmtruncate race
  ceph: avoid accessing invalid memory
  libceph: Fix NULL pointer dereference in auth client code
  ceph: Reconstruct the func ceph_reserve_caps.
  ceph: Free mdsc if alloc mdsc-&gt;mdsmap failed.
  ceph: remove sb_start/end_write in ceph_aio_write.
  ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
  ceph: fix sleeping function called from invalid context.
  ceph: move inode to proper flushing list when auth MDS changes
  rbd: fix a couple warnings
  ceph: clear migrate seq when MDS restarts
  ceph: check migrate seq before changing auth cap
  ceph: fix race between page writeback and truncate
  ceph: reset iov_len when discarding cap release messages
  ceph: fix cap release race
  libceph: fix truncate size calculation
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull Ceph updates from Sage Weil:
 "There is some follow-on RBD cleanup after the last window's code drop,
  a series from Yan fixing multi-mds behavior in cephfs, and then a
  sprinkling of bug fixes all around.  Some warnings, sleeping while
  atomic, a null dereference, and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
  libceph: fix invalid unsigned-&gt;signed conversion for timespec encoding
  libceph: call r_unsafe_callback when unsafe reply is received
  ceph: fix race between cap issue and revoke
  ceph: fix cap revoke race
  ceph: fix pending vmtruncate race
  ceph: avoid accessing invalid memory
  libceph: Fix NULL pointer dereference in auth client code
  ceph: Reconstruct the func ceph_reserve_caps.
  ceph: Free mdsc if alloc mdsc-&gt;mdsmap failed.
  ceph: remove sb_start/end_write in ceph_aio_write.
  ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
  ceph: fix sleeping function called from invalid context.
  ceph: move inode to proper flushing list when auth MDS changes
  rbd: fix a couple warnings
  ceph: clear migrate seq when MDS restarts
  ceph: check migrate seq before changing auth cap
  ceph: fix race between page writeback and truncate
  ceph: reset iov_len when discarding cap release messages
  ceph: fix cap release race
  libceph: fix truncate size calculation
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: fix pending vmtruncate race</title>
<updated>2013-07-03T22:32:56+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2013-07-02T04:40:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b415bf4f9fe25f39934f5c464125e4a2dffb6d08'/>
<id>b415bf4f9fe25f39934f5c464125e4a2dffb6d08</id>
<content type='text'>
The locking order for pending vmtruncate is wrong, it can lead to
following race:

        write                  wmtruncate work
------------------------    ----------------------
lock i_mutex
check i_truncate_pending   check i_truncate_pending
truncate_inode_pages()     lock i_mutex (blocked)
copy data to page cache
unlock i_mutex
                           truncate_inode_pages()

The fix is take i_mutex before calling __ceph_do_pending_vmtruncate()

Fixes: http://tracker.ceph.com/issues/5453
Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The locking order for pending vmtruncate is wrong, it can lead to
following race:

        write                  wmtruncate work
------------------------    ----------------------
lock i_mutex
check i_truncate_pending   check i_truncate_pending
truncate_inode_pages()     lock i_mutex (blocked)
copy data to page cache
unlock i_mutex
                           truncate_inode_pages()

The fix is take i_mutex before calling __ceph_do_pending_vmtruncate()

Fixes: http://tracker.ceph.com/issues/5453
Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: remove sb_start/end_write in ceph_aio_write.</title>
<updated>2013-07-03T22:32:52+00:00</updated>
<author>
<name>Jianpeng Ma</name>
<email>majianpeng@gmail.com</email>
</author>
<published>2013-06-23T16:48:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0405a1499df42a2b9fd4906096c6bb950e15e850'/>
<id>0405a1499df42a2b9fd4906096c6bb950e15e850</id>
<content type='text'>
Either in vfs_write or io_submit,it call file_start/end_write.
The different between file_start/end_write and sb_start/end_write is
file_ only handle regular file.But i think in ceph_aio_write,it only
for regular file.

Signed-off-by: Jianpeng Ma &lt;majianpeng@gmail.com&gt;
Acked-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Either in vfs_write or io_submit,it call file_start/end_write.
The different between file_start/end_write and sb_start/end_write is
file_ only handle regular file.But i think in ceph_aio_write,it only
for regular file.

Signed-off-by: Jianpeng Ma &lt;majianpeng@gmail.com&gt;
Acked-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: export lseek_execute() to modules</title>
<updated>2013-07-03T12:23:27+00:00</updated>
<author>
<name>Jie Liu</name>
<email>jeff.liu@oracle.com</email>
</author>
<published>2013-06-25T04:02:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=46a1c2c7ae53de2a5676754b54a73c591a3951d2'/>
<id>46a1c2c7ae53de2a5676754b54a73c591a3951d2</id>
<content type='text'>
For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
matter in lseek_execute() to update the current file offset
to the desired offset if it is valid, ceph also does the
simliar things at ceph_llseek().

To reduce the duplications, this patch make lseek_execute()
public accessible so that we can call it directly from the
underlying file systems.

Thanks Dave Chinner for this suggestion.

[AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]

v2-&gt;v1:
- Add kernel-doc comments for lseek_execute()
- Call lseek_execute() in ceph-&gt;llseek()

Signed-off-by: Jie Liu &lt;jeff.liu@oracle.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Chris Mason &lt;chris.mason@fusionio.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Cc: Ben Myers &lt;bpm@sgi.com&gt;
Cc: Ted Tso &lt;tytso@mit.edu&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
matter in lseek_execute() to update the current file offset
to the desired offset if it is valid, ceph also does the
simliar things at ceph_llseek().

To reduce the duplications, this patch make lseek_execute()
public accessible so that we can call it directly from the
underlying file systems.

Thanks Dave Chinner for this suggestion.

[AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]

v2-&gt;v1:
- Add kernel-doc comments for lseek_execute()
- Call lseek_execute() in ceph-&gt;llseek()

Signed-off-by: Jie Liu &lt;jeff.liu@oracle.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Chris Mason &lt;chris.mason@fusionio.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Cc: Ben Myers &lt;bpm@sgi.com&gt;
Cc: Ted Tso &lt;tytso@mit.edu&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: don't include aio.h in sched.h</title>
<updated>2013-05-08T03:16:25+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>koverstreet@google.com</email>
</author>
<published>2013-05-07T23:19:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a27bb332c04cec8c4afd7912df0dc7890db27560'/>
<id>a27bb332c04cec8c4afd7912df0dc7890db27560</id>
<content type='text'>
Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet &lt;koverstreet@google.com&gt;
Cc: Zach Brown &lt;zab@redhat.com&gt;
Cc: Felipe Balbi &lt;balbi@ti.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Asai Thambi S P &lt;asamymuthupa@micron.com&gt;
Cc: Selvan Mani &lt;smani@micron.com&gt;
Cc: Sam Bradshaw &lt;sbradshaw@micron.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Reviewed-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet &lt;koverstreet@google.com&gt;
Cc: Zach Brown &lt;zab@redhat.com&gt;
Cc: Felipe Balbi &lt;balbi@ti.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Asai Thambi S P &lt;asamymuthupa@micron.com&gt;
Cc: Selvan Mani &lt;smani@micron.com&gt;
Cc: Sam Bradshaw &lt;sbradshaw@micron.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Reviewed-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: kill off osd data write_request parameters</title>
<updated>2013-05-02T04:18:58+00:00</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-15T19:50:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=406e2c9f9286fc93ae2191a7abf477dea05aadc9'/>
<id>406e2c9f9286fc93ae2191a7abf477dea05aadc9</id>
<content type='text'>
In the incremental move toward supporting distinct data items in an
osd request some of the functions had "write_request" parameters to
indicate, basically, whether the data belonged to in_data or the
out_data.  Now that we maintain the data fields in the op structure
there is no need to indicate the direction, so get rid of the
"write_request" parameters.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the incremental move toward supporting distinct data items in an
osd request some of the functions had "write_request" parameters to
indicate, basically, whether the data belonged to in_data or the
out_data.  Now that we maintain the data fields in the op structure
there is no need to indicate the direction, so get rid of the
"write_request" parameters.

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: fix printk format warnings in file.c</title>
<updated>2013-05-02T04:18:57+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2013-04-19T21:20:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ac7f29bf2ee4a526efb68f947475ff77a43028de'/>
<id>ac7f29bf2ee4a526efb68f947475ff77a43028de</id>
<content type='text'>
Fix printk format warnings by using %zd for 'ssize_t' variables:

fs/ceph/file.c:751:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]
fs/ceph/file.c:762:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc:	ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix printk format warnings by using %zd for 'ssize_t' variables:

fs/ceph/file.c:751:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]
fs/ceph/file.c:762:2: warning: format '%ld' expects argument of type 'long int', but argument 11 has type 'ssize_t' [-Wformat]

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc:	ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: apply write checks in ceph_aio_write</title>
<updated>2013-05-02T04:18:54+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2013-04-12T08:11:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=03d254edebe51949a569c38df6b4b05b7f3c50f9'/>
<id>03d254edebe51949a569c38df6b4b05b7f3c50f9</id>
<content type='text'>
copy write checks in __generic_file_aio_write to ceph_aio_write.
To make these checks cover sync write path.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
copy write checks in __generic_file_aio_write to ceph_aio_write.
To make these checks cover sync write path.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: take i_mutex before getting Fw cap</title>
<updated>2013-05-02T04:18:53+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2013-04-12T08:11:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=37505d5768b9389a5ef7bc1efa465d4484a5462e'/>
<id>37505d5768b9389a5ef7bc1efa465d4484a5462e</id>
<content type='text'>
There is deadlock as illustrated bellow. The fix is taking i_mutex
before getting Fw cap reference.

      write                    truncate                 MDS
---------------------     --------------------      --------------
get Fw cap
                          lock i_mutex
lock i_mutex (blocked)
                          request setattr.size  -&gt;
                                                &lt;-   revoke Fw cap

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is deadlock as illustrated bellow. The fix is taking i_mutex
before getting Fw cap reference.

      write                    truncate                 MDS
---------------------     --------------------      --------------
get Fw cap
                          lock i_mutex
lock i_mutex (blocked)
                          request setattr.size  -&gt;
                                                &lt;-   revoke Fw cap

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: change how "safe" callback is used</title>
<updated>2013-05-02T04:18:52+00:00</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-15T16:20:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=26be88087ae8a04a5b576aa2f490597b649fc132'/>
<id>26be88087ae8a04a5b576aa2f490597b649fc132</id>
<content type='text'>
An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation for the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.

This resolves the original problem reportedin:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation for the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.

This resolves the original problem reportedin:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
