<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ceph/messenger.c, branch v4.2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>libceph: treat sockaddr_storage with uninitialized family as blank</title>
<updated>2015-07-09T17:30:34+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-07-09T10:57:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c44bd69c0c8cfadf0239437635b2933efb1f6c4c'/>
<id>c44bd69c0c8cfadf0239437635b2933efb1f6c4c</id>
<content type='text'>
addr_is_blank() should return true if family is neither AF_INET nor
AF_INET6.  This is what its counterpart entity_addr_t::is_blank_ip() is
doing and it is the right thing to do: in process_banner() we check if
our address is blank and if it is "learn" it from our peer.  As it is,
we never learn our address and always send out a blank one.  This goes
way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and
some ipv6 support groundwork") from 2009.

While at at, do not open-code ipv6_addr_any() and use INADDR_ANY
constant instead of 0.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
addr_is_blank() should return true if family is neither AF_INET nor
AF_INET6.  This is what its counterpart entity_addr_t::is_blank_ip() is
doing and it is the right thing to do: in process_banner() we check if
our address is blank and if it is "learn" it from our peer.  As it is,
we never learn our address and always send out a blank one.  This goes
way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and
some ipv6 support groundwork") from 2009.

While at at, do not open-code ipv6_addr_any() and use INADDR_ANY
constant instead of 0.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: enable ceph in a non-default network namespace</title>
<updated>2015-07-09T17:30:34+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-06-25T14:47:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=757856d2b9568a701df9ea6a4be68effbb9d6f44'/>
<id>757856d2b9568a701df9ea6a4be68effbb9d6f44</id>
<content type='text'>
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net.  Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.

This is based on a patch from Hong Zhiguo &lt;zhiguohong@tencent.com&gt;.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net.  Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.

This is based on a patch from Hong Zhiguo &lt;zhiguohong@tencent.com&gt;.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client</title>
<updated>2015-07-02T18:35:00+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-07-02T18:35:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0c76c6ba246043bbc5c0f9620a0645ae78217421'/>
<id>0c76c6ba246043bbc5c0f9620a0645ae78217421</id>
<content type='text'>
Pull Ceph updates from Sage Weil:
 "We have a pile of bug fixes from Ilya, including a few patches that
  sync up the CRUSH code with the latest from userspace.

  There is also a long series from Zheng that fixes various issues with
  snapshots, inline data, and directory fsync, some simplification and
  improvement in the cap release code, and a rework of the caching of
  directory contents.

  To top it off there are a few small fixes and cleanups from Benoit and
  Hong"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
  rbd: use GFP_NOIO in rbd_obj_request_create()
  crush: fix a bug in tree bucket decode
  libceph: Fix ceph_tcp_sendpage()'s more boolean usage
  libceph: Remove spurious kunmap() of the zero page
  rbd: queue_depth map option
  rbd: store rbd_options in rbd_device
  rbd: terminate rbd_opts_tokens with Opt_err
  ceph: fix ceph_writepages_start()
  rbd: bump queue_max_segments
  ceph: rework dcache readdir
  crush: sync up with userspace
  crush: fix crash from invalid 'take' argument
  ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
  ceph: pre-allocate data structure that tracks caps flushing
  ceph: re-send flushing caps (which are revoked) in reconnect stage
  ceph: send TID of the oldest pending caps flush to MDS
  ceph: track pending caps flushing globally
  ceph: track pending caps flushing accurately
  libceph: fix wrong name "Ceph filesystem for Linux"
  ceph: fix directory fsync
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull Ceph updates from Sage Weil:
 "We have a pile of bug fixes from Ilya, including a few patches that
  sync up the CRUSH code with the latest from userspace.

  There is also a long series from Zheng that fixes various issues with
  snapshots, inline data, and directory fsync, some simplification and
  improvement in the cap release code, and a rework of the caching of
  directory contents.

  To top it off there are a few small fixes and cleanups from Benoit and
  Hong"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
  rbd: use GFP_NOIO in rbd_obj_request_create()
  crush: fix a bug in tree bucket decode
  libceph: Fix ceph_tcp_sendpage()'s more boolean usage
  libceph: Remove spurious kunmap() of the zero page
  rbd: queue_depth map option
  rbd: store rbd_options in rbd_device
  rbd: terminate rbd_opts_tokens with Opt_err
  ceph: fix ceph_writepages_start()
  rbd: bump queue_max_segments
  ceph: rework dcache readdir
  crush: sync up with userspace
  crush: fix crash from invalid 'take' argument
  ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
  ceph: pre-allocate data structure that tracks caps flushing
  ceph: re-send flushing caps (which are revoked) in reconnect stage
  ceph: send TID of the oldest pending caps flush to MDS
  ceph: track pending caps flushing globally
  ceph: track pending caps flushing accurately
  libceph: fix wrong name "Ceph filesystem for Linux"
  ceph: fix directory fsync
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: Fix ceph_tcp_sendpage()'s more boolean usage</title>
<updated>2015-06-29T17:03:46+00:00</updated>
<author>
<name>Benoît Canet</name>
<email>benoit.canet@nodalink.com</email>
</author>
<published>2015-06-25T17:32:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c2cfa19400979dc1a14bba75f83b451b0cd9507a'/>
<id>c2cfa19400979dc1a14bba75f83b451b0cd9507a</id>
<content type='text'>
From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:

bool    last_piece;     /* current is last piece */

In ceph_msg_data_next():

*last_piece = cursor-&gt;last_piece;

A call to ceph_msg_data_next() is followed by:

ret = ceph_tcp_sendpage(con-&gt;sock, page, page_offset,
                        length, last_piece);

while ceph_tcp_sendpage() is:

static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
                             int offset, size_t size, bool more)

The logic is inverted: correct it.

Signed-off-by: Benoît Canet &lt;benoit.canet@nodalink.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:

bool    last_piece;     /* current is last piece */

In ceph_msg_data_next():

*last_piece = cursor-&gt;last_piece;

A call to ceph_msg_data_next() is followed by:

ret = ceph_tcp_sendpage(con-&gt;sock, page, page_offset,
                        length, last_piece);

while ceph_tcp_sendpage() is:

static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
                             int offset, size_t size, bool more)

The logic is inverted: correct it.

Signed-off-by: Benoît Canet &lt;benoit.canet@nodalink.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: Remove spurious kunmap() of the zero page</title>
<updated>2015-06-25T15:30:56+00:00</updated>
<author>
<name>Benoît Canet</name>
<email>benoit.canet@nodalink.com</email>
</author>
<published>2015-06-24T21:18:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6ba8edc0bcbdf337293e60123ddac8fc1c895a3c'/>
<id>6ba8edc0bcbdf337293e60123ddac8fc1c895a3c</id>
<content type='text'>
ceph_tcp_sendpage already does the work of mapping/unmapping
the zero page if needed.

Signed-off-by: Benoît Canet &lt;benoit.canet@nodalink.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ceph_tcp_sendpage already does the work of mapping/unmapping
the zero page if needed.

Signed-off-by: Benoît Canet &lt;benoit.canet@nodalink.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Add a struct net parameter to sock_create_kern</title>
<updated>2015-05-11T14:50:17+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-05-09T02:08:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2'/>
<id>eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2</id>
<content type='text'>
This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: don't overwrite specific con error msgs</title>
<updated>2015-04-20T15:55:37+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-03-23T11:52:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=67c64eb742a49d3d3f5dcef75d0c32a3394e5519'/>
<id>67c64eb742a49d3d3f5dcef75d0c32a3394e5519</id>
<content type='text'>
- specific con-&gt;error_msg messages (e.g. "protocol version mismatch")
  end up getting overwritten by a catch-all "socket error on read
  / write", introduced in commit 3a140a0d5c4b ("libceph: report socket
  read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
  to the fact that -EBADMSG is used for both

Fix it, and tidy up con-&gt;error_msg assignments and pr_errs while at it.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- specific con-&gt;error_msg messages (e.g. "protocol version mismatch")
  end up getting overwritten by a catch-all "socket error on read
  / write", introduced in commit 3a140a0d5c4b ("libceph: report socket
  read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
  to the fact that -EBADMSG is used for both

Fix it, and tidy up con-&gt;error_msg assignments and pr_errs while at it.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "libceph: use memalloc flags for net IO"</title>
<updated>2015-04-07T16:08:35+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-04-02T11:40:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6d7fdb0ab351b33d4c12d53fe44be030b90fc9d4'/>
<id>6d7fdb0ab351b33d4c12d53fe44be030b90fc9d4</id>
<content type='text'>
This reverts commit 89baaa570ab0b476db09408d209578cfed700e9f.

Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support.  (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.)  See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.

On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.

[1] "SOCK_MEMALLOC vs loopback"
    http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
    http://www.spinics.net/lists/ceph-devel/msg23392.html

Conflicts:
	net/ceph/messenger.c [ context: tcp_nodelay option ]

Cc: Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Sage Weil &lt;sage@redhat.com&gt;
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Acked-by: Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 89baaa570ab0b476db09408d209578cfed700e9f.

Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support.  (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.)  See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.

On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.

[1] "SOCK_MEMALLOC vs loopback"
    http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
    http://www.spinics.net/lists/ceph-devel/msg23392.html

Conflicts:
	net/ceph/messenger.c [ context: tcp_nodelay option ]

Cc: Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Sage Weil &lt;sage@redhat.com&gt;
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Acked-by: Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: tcp_nodelay support</title>
<updated>2015-02-19T10:31:40+00:00</updated>
<author>
<name>Chaitanya Huilgol</name>
<email>chaitanya.huilgol@gmail.com</email>
</author>
<published>2015-01-23T11:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ba988f87f532cd2b8c4740aa8ec49056521ae833'/>
<id>ba988f87f532cd2b8c4740aa8ec49056521ae833</id>
<content type='text'>
TCP_NODELAY socket option set on connection sockets,
disables Nagle’s algorithm and improves latency characteristics.
tcp_nodelay(default)/notcp_nodelay option flags provided to
enable/disable setting the socket option.

Signed-off-by: Chaitanya Huilgol &lt;chaitanya.huilgol@sandisk.com&gt;
[idryomov@redhat.com: NO_TCP_NODELAY -&gt; TCP_NODELAY, minor adjustments]
Signed-off-by: Ilya Dryomov &lt;idryomov@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TCP_NODELAY socket option set on connection sockets,
disables Nagle’s algorithm and improves latency characteristics.
tcp_nodelay(default)/notcp_nodelay option flags provided to
enable/disable setting the socket option.

Signed-off-by: Chaitanya Huilgol &lt;chaitanya.huilgol@sandisk.com&gt;
[idryomov@redhat.com: NO_TCP_NODELAY -&gt; TCP_NODELAY, minor adjustments]
Signed-off-by: Ilya Dryomov &lt;idryomov@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: message signature support</title>
<updated>2014-12-17T17:09:50+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zyan@redhat.com</email>
</author>
<published>2014-11-04T08:33:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=33d07337962c7bbd2fd5cf7f1106735c9507fbe2'/>
<id>33d07337962c7bbd2fd5cf7f1106735c9507fbe2</id>
<content type='text'>
Signed-off-by: Yan, Zheng &lt;zyan@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Yan, Zheng &lt;zyan@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
