<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net, branch v6.2-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>treewide: Convert del_timer*() to timer_shutdown*()</title>
<updated>2022-12-25T21:38:09+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2022-12-20T18:45:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=292a089d78d3e2f7944e60bb897c977785a321e3'/>
<id>292a089d78d3e2f7944e60bb897c977785a321e3</id>
<content type='text'>
Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown".  After a timer is set to this state, then it can no
longer be re-armed.

The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed.  It also ignores any locations where
the timer-&gt;function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following
commands:

    $ cat timer.cocci
    @@
    expression ptr, slab;
    identifier timer, rfield;
    @@
    (
    -       del_timer(&amp;ptr-&gt;timer);
    +       timer_shutdown(&amp;ptr-&gt;timer);
    |
    -       del_timer_sync(&amp;ptr-&gt;timer);
    +       timer_shutdown_sync(&amp;ptr-&gt;timer);
    )
      ... when strict
          when != ptr-&gt;timer
    (
            kfree_rcu(ptr, rfield);
    |
            kmem_cache_free(slab, ptr);
    |
            kfree(ptr);
    )

    $ spatch timer.cocci . &gt; /tmp/t.patch
    $ patch -p1 &lt; /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Acked-by: Pavel Machek &lt;pavel@ucw.cz&gt; [ LED ]
Acked-by: Kalle Valo &lt;kvalo@kernel.org&gt; [ wireless ]
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt; [ networking ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown".  After a timer is set to this state, then it can no
longer be re-armed.

The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed.  It also ignores any locations where
the timer-&gt;function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following
commands:

    $ cat timer.cocci
    @@
    expression ptr, slab;
    identifier timer, rfield;
    @@
    (
    -       del_timer(&amp;ptr-&gt;timer);
    +       timer_shutdown(&amp;ptr-&gt;timer);
    |
    -       del_timer_sync(&amp;ptr-&gt;timer);
    +       timer_shutdown_sync(&amp;ptr-&gt;timer);
    )
      ... when strict
          when != ptr-&gt;timer
    (
            kfree_rcu(ptr, rfield);
    |
            kmem_cache_free(slab, ptr);
    |
            kfree(ptr);
    )

    $ spatch timer.cocci . &gt; /tmp/t.patch
    $ patch -p1 &lt; /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Acked-by: Pavel Machek &lt;pavel@ucw.cz&gt; [ LED ]
Acked-by: Kalle Valo &lt;kvalo@kernel.org&gt; [ wireless ]
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt; [ networking ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux</title>
<updated>2022-12-23T19:39:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-23T19:39:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e3b862ed893bf030ebdd78ead99647374a2cfd47'/>
<id>e3b862ed893bf030ebdd78ead99647374a2cfd47</id>
<content type='text'>
Pull 9p updates from Dominique Martinet:

 - improve p9_check_errors to check buffer size instead of msize when
   possible (e.g. not zero-copy)

 - some more syzbot and KCSAN fixes

 - minor headers include cleanup

* tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux:
  9p/client: fix data race on req-&gt;status
  net/9p: fix response size check in p9_check_errors()
  net/9p: distinguish zero-copy requests
  9p/xen: do not memcpy header into req-&gt;rc
  9p: set req refcount to zero to avoid uninitialized usage
  9p/net: Remove unneeded idr.h #include
  9p/fs: Remove unneeded idr.h #include
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull 9p updates from Dominique Martinet:

 - improve p9_check_errors to check buffer size instead of msize when
   possible (e.g. not zero-copy)

 - some more syzbot and KCSAN fixes

 - minor headers include cleanup

* tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux:
  9p/client: fix data race on req-&gt;status
  net/9p: fix response size check in p9_check_errors()
  net/9p: distinguish zero-copy requests
  9p/xen: do not memcpy header into req-&gt;rc
  9p: set req refcount to zero to avoid uninitialized usage
  9p/net: Remove unneeded idr.h #include
  9p/fs: Remove unneeded idr.h #include
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2022-12-21T16:41:32+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-21T16:41:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=609d3bc6230514a8ca79b377775b17e8c3d9ac93'/>
<id>609d3bc6230514a8ca79b377775b17e8c3d9ac93</id>
<content type='text'>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, netfilter and can.

  Current release - regressions:

   - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func

   - rxrpc:
      - fix security setting propagation
      - fix null-deref in rxrpc_unuse_local()
      - fix switched parameters in peer tracing

  Current release - new code bugs:

   - rxrpc:
      - fix I/O thread startup getting skipped
      - fix locking issues in rxrpc_put_peer_locked()
      - fix I/O thread stop
      - fix uninitialised variable in rxperf server
      - fix the return value of rxrpc_new_incoming_call()

   - microchip: vcap: fix initialization of value and mask

   - nfp: fix unaligned io read of capabilities word

  Previous releases - regressions:

   - stop in-kernel socket users from corrupting socket's task_frag

   - stream: purge sk_error_queue in sk_stream_kill_queues()

   - openvswitch: fix flow lookup to use unmasked key

   - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()

   - devlink:
      - hold region lock when flushing snapshots
      - protect devlink dump by the instance lock

  Previous releases - always broken:

   - bpf:
      - prevent leak of lsm program after failed attach
      - resolve fext program type when checking map compatibility

   - skbuff: account for tail adjustment during pull operations

   - macsec: fix net device access prior to holding a lock

   - bonding: switch back when high prio link up

   - netfilter: flowtable: really fix NAT IPv6 offload

   - enetc: avoid buffer leaks on xdp_do_redirect() failure

   - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()

   - dsa: microchip: remove IRQF_TRIGGER_FALLING in
     request_threaded_irq"

* tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  net: fec: check the return value of build_skb()
  net: simplify sk_page_frag
  Treewide: Stop corrupting socket's task_frag
  net: Introduce sk_use_task_frag in struct sock.
  mctp: Remove device type check at unregister
  net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
  can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
  can: flexcan: avoid unbalanced pm_runtime_enable warning
  Documentation: devlink: add missing toc entry for etas_es58x devlink doc
  mctp: serial: Fix starting value for frame check sequence
  nfp: fix unaligned io read of capabilities word
  net: stream: purge sk_error_queue in sk_stream_kill_queues()
  myri10ge: Fix an error handling path in myri10ge_probe()
  net: microchip: vcap: Fix initialization of value and mask
  rxrpc: Fix the return value of rxrpc_new_incoming_call()
  rxrpc: rxperf: Fix uninitialised variable
  rxrpc: Fix I/O thread stop
  rxrpc: Fix switched parameters in peer tracing
  rxrpc: Fix locking issues in rxrpc_put_peer_locked()
  rxrpc: Fix I/O thread startup getting skipped
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, netfilter and can.

  Current release - regressions:

   - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func

   - rxrpc:
      - fix security setting propagation
      - fix null-deref in rxrpc_unuse_local()
      - fix switched parameters in peer tracing

  Current release - new code bugs:

   - rxrpc:
      - fix I/O thread startup getting skipped
      - fix locking issues in rxrpc_put_peer_locked()
      - fix I/O thread stop
      - fix uninitialised variable in rxperf server
      - fix the return value of rxrpc_new_incoming_call()

   - microchip: vcap: fix initialization of value and mask

   - nfp: fix unaligned io read of capabilities word

  Previous releases - regressions:

   - stop in-kernel socket users from corrupting socket's task_frag

   - stream: purge sk_error_queue in sk_stream_kill_queues()

   - openvswitch: fix flow lookup to use unmasked key

   - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()

   - devlink:
      - hold region lock when flushing snapshots
      - protect devlink dump by the instance lock

  Previous releases - always broken:

   - bpf:
      - prevent leak of lsm program after failed attach
      - resolve fext program type when checking map compatibility

   - skbuff: account for tail adjustment during pull operations

   - macsec: fix net device access prior to holding a lock

   - bonding: switch back when high prio link up

   - netfilter: flowtable: really fix NAT IPv6 offload

   - enetc: avoid buffer leaks on xdp_do_redirect() failure

   - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()

   - dsa: microchip: remove IRQF_TRIGGER_FALLING in
     request_threaded_irq"

* tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  net: fec: check the return value of build_skb()
  net: simplify sk_page_frag
  Treewide: Stop corrupting socket's task_frag
  net: Introduce sk_use_task_frag in struct sock.
  mctp: Remove device type check at unregister
  net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
  can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
  can: flexcan: avoid unbalanced pm_runtime_enable warning
  Documentation: devlink: add missing toc entry for etas_es58x devlink doc
  mctp: serial: Fix starting value for frame check sequence
  nfp: fix unaligned io read of capabilities word
  net: stream: purge sk_error_queue in sk_stream_kill_queues()
  myri10ge: Fix an error handling path in myri10ge_probe()
  net: microchip: vcap: Fix initialization of value and mask
  rxrpc: Fix the return value of rxrpc_new_incoming_call()
  rxrpc: rxperf: Fix uninitialised variable
  rxrpc: Fix I/O thread stop
  rxrpc: Fix switched parameters in peer tracing
  rxrpc: Fix locking issues in rxrpc_put_peer_locked()
  rxrpc: Fix I/O thread startup getting skipped
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>prandom: remove prandom_u32_max()</title>
<updated>2022-12-20T02:13:45+00:00</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-10-10T02:45:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3c202d14a9d73fb63c3dccb18feac5618c21e1c4'/>
<id>3c202d14a9d73fb63c3dccb18feac5618c21e1c4</id>
<content type='text'>
Convert the final two users of prandom_u32_max() that slipped in during
6.2-rc1 to use get_random_u32_below().

Then, with no more users left, we can finally remove the deprecated
function.

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Convert the final two users of prandom_u32_max() that slipped in during
6.2-rc1 to use get_random_u32_below().

Then, with no more users left, we can finally remove the deprecated
function.

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Treewide: Stop corrupting socket's task_frag</title>
<updated>2022-12-20T01:28:49+00:00</updated>
<author>
<name>Benjamin Coddington</name>
<email>bcodding@redhat.com</email>
</author>
<published>2022-12-16T12:45:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=98123866fcf3fe95a0c1b198ef122dfdbd351916'/>
<id>98123866fcf3fe95a0c1b198ef122dfdbd351916</id>
<content type='text'>
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current-&gt;task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
CC: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
CC: "Christoph Böhmwalder" &lt;christoph.boehmwalder@linbit.com&gt;
CC: Jens Axboe &lt;axboe@kernel.dk&gt;
CC: Josef Bacik &lt;josef@toxicpanda.com&gt;
CC: Keith Busch &lt;kbusch@kernel.org&gt;
CC: Christoph Hellwig &lt;hch@lst.de&gt;
CC: Sagi Grimberg &lt;sagi@grimberg.me&gt;
CC: Lee Duncan &lt;lduncan@suse.com&gt;
CC: Chris Leech &lt;cleech@redhat.com&gt;
CC: Mike Christie &lt;michael.christie@oracle.com&gt;
CC: "James E.J. Bottomley" &lt;jejb@linux.ibm.com&gt;
CC: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
CC: Valentina Manea &lt;valentina.manea.m@gmail.com&gt;
CC: Shuah Khan &lt;shuah@kernel.org&gt;
CC: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
CC: David Howells &lt;dhowells@redhat.com&gt;
CC: Marc Dionne &lt;marc.dionne@auristor.com&gt;
CC: Steve French &lt;sfrench@samba.org&gt;
CC: Christine Caulfield &lt;ccaulfie@redhat.com&gt;
CC: David Teigland &lt;teigland@redhat.com&gt;
CC: Mark Fasheh &lt;mark@fasheh.com&gt;
CC: Joel Becker &lt;jlbec@evilplan.org&gt;
CC: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
CC: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
CC: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
CC: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
CC: Ilya Dryomov &lt;idryomov@gmail.com&gt;
CC: Xiubo Li &lt;xiubli@redhat.com&gt;
CC: Chuck Lever &lt;chuck.lever@oracle.com&gt;
CC: Jeff Layton &lt;jlayton@kernel.org&gt;
CC: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
CC: Anna Schumaker &lt;anna@kernel.org&gt;
CC: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
CC: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;

Suggested-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Reviewed-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current-&gt;task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
CC: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
CC: "Christoph Böhmwalder" &lt;christoph.boehmwalder@linbit.com&gt;
CC: Jens Axboe &lt;axboe@kernel.dk&gt;
CC: Josef Bacik &lt;josef@toxicpanda.com&gt;
CC: Keith Busch &lt;kbusch@kernel.org&gt;
CC: Christoph Hellwig &lt;hch@lst.de&gt;
CC: Sagi Grimberg &lt;sagi@grimberg.me&gt;
CC: Lee Duncan &lt;lduncan@suse.com&gt;
CC: Chris Leech &lt;cleech@redhat.com&gt;
CC: Mike Christie &lt;michael.christie@oracle.com&gt;
CC: "James E.J. Bottomley" &lt;jejb@linux.ibm.com&gt;
CC: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
CC: Valentina Manea &lt;valentina.manea.m@gmail.com&gt;
CC: Shuah Khan &lt;shuah@kernel.org&gt;
CC: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
CC: David Howells &lt;dhowells@redhat.com&gt;
CC: Marc Dionne &lt;marc.dionne@auristor.com&gt;
CC: Steve French &lt;sfrench@samba.org&gt;
CC: Christine Caulfield &lt;ccaulfie@redhat.com&gt;
CC: David Teigland &lt;teigland@redhat.com&gt;
CC: Mark Fasheh &lt;mark@fasheh.com&gt;
CC: Joel Becker &lt;jlbec@evilplan.org&gt;
CC: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
CC: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
CC: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
CC: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
CC: Ilya Dryomov &lt;idryomov@gmail.com&gt;
CC: Xiubo Li &lt;xiubli@redhat.com&gt;
CC: Chuck Lever &lt;chuck.lever@oracle.com&gt;
CC: Jeff Layton &lt;jlayton@kernel.org&gt;
CC: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
CC: Anna Schumaker &lt;anna@kernel.org&gt;
CC: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
CC: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;

Suggested-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Reviewed-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Introduce sk_use_task_frag in struct sock.</title>
<updated>2022-12-20T01:28:49+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2022-12-16T12:45:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fb87bd47516d9a26b6d549231aa743b20fd4a569'/>
<id>fb87bd47516d9a26b6d549231aa743b20fd4a569</id>
<content type='text'>
Sockets that can be used while recursing into memory reclaim, like
those used by network block devices and file systems, mustn't use
current-&gt;task_frag: if the current process is already using it, then
the inner memory reclaim call would corrupt the task_frag structure.

To avoid this, sk_page_frag() uses -&gt;sk_allocation to detect sockets
that mustn't use current-&gt;task_frag, assuming that those used during
memory reclaim had their allocation constraints reflected in
-&gt;sk_allocation.

This unfortunately doesn't cover all cases: in an attempt to remove all
usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
-&gt;sk_allocation, and used memalloc_nofs critical sections instead.
This breaks the sk_page_frag() heuristic since the allocation
constraints are now stored in current-&gt;flags, which sk_page_frag()
can't read without risking triggering a cache miss and slowing down
TCP's fast path.

This patch creates a new field in struct sock, named sk_use_task_frag,
which sockets with memory reclaim constraints can set to false if they
can't safely use current-&gt;task_frag. In such cases, sk_page_frag() now
always returns the socket's page_frag (-&gt;sk_frag). The first user is
sunrpc, which needs to avoid using current-&gt;task_frag but can keep
-&gt;sk_allocation set to GFP_KERNEL otherwise.

Eventually, it might be possible to simplify sk_page_frag() by only
testing -&gt;sk_use_task_frag and avoid relying on the -&gt;sk_allocation
heuristic entirely (assuming other sockets will set -&gt;sk_use_task_frag
according to their constraints in the future).

The new -&gt;sk_use_task_frag field is placed in a hole in struct sock and
belongs to a cache line shared with -&gt;sk_shutdown. Therefore it should
be hot and shouldn't have negative performance impacts on TCP's fast
path (sk_shutdown is tested just before the while() loop in
tcp_sendmsg_locked()).

Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Sockets that can be used while recursing into memory reclaim, like
those used by network block devices and file systems, mustn't use
current-&gt;task_frag: if the current process is already using it, then
the inner memory reclaim call would corrupt the task_frag structure.

To avoid this, sk_page_frag() uses -&gt;sk_allocation to detect sockets
that mustn't use current-&gt;task_frag, assuming that those used during
memory reclaim had their allocation constraints reflected in
-&gt;sk_allocation.

This unfortunately doesn't cover all cases: in an attempt to remove all
usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
-&gt;sk_allocation, and used memalloc_nofs critical sections instead.
This breaks the sk_page_frag() heuristic since the allocation
constraints are now stored in current-&gt;flags, which sk_page_frag()
can't read without risking triggering a cache miss and slowing down
TCP's fast path.

This patch creates a new field in struct sock, named sk_use_task_frag,
which sockets with memory reclaim constraints can set to false if they
can't safely use current-&gt;task_frag. In such cases, sk_page_frag() now
always returns the socket's page_frag (-&gt;sk_frag). The first user is
sunrpc, which needs to avoid using current-&gt;task_frag but can keep
-&gt;sk_allocation set to GFP_KERNEL otherwise.

Eventually, it might be possible to simplify sk_page_frag() by only
testing -&gt;sk_use_task_frag and avoid relying on the -&gt;sk_allocation
heuristic entirely (assuming other sockets will set -&gt;sk_use_task_frag
according to their constraints in the future).

The new -&gt;sk_use_task_frag field is placed in a hole in struct sock and
belongs to a cache line shared with -&gt;sk_shutdown. Therefore it should
be hot and shouldn't have negative performance impacts on TCP's fast
path (sk_shutdown is tested just before the while() loop in
tcp_sendmsg_locked()).

Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mctp: Remove device type check at unregister</title>
<updated>2022-12-20T01:20:22+00:00</updated>
<author>
<name>Matt Johnston</name>
<email>matt@codeconstruct.com.au</email>
</author>
<published>2022-12-15T05:49:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b389a902dd5be4ece505a2e0463b9b034de04bf5'/>
<id>b389a902dd5be4ece505a2e0463b9b034de04bf5</id>
<content type='text'>
The unregister check could be incorrectly triggered if a netdev
changes its type after register. That is possible for a tun device
using TUNSETLINK ioctl, resulting in mctp unregister failing
and the netdev unregister waiting forever.

This was encountered by https://github.com/openthread/openthread/issues/8523

Neither check at register or unregister is required. They were added in
an attempt to track down mctp_ptr being set unexpectedly, which should
not happen in normal operation.

Fixes: 7b1871af75f3 ("mctp: Warn if pointer is set for a wrong dev type")
Signed-off-by: Matt Johnston &lt;matt@codeconstruct.com.au&gt;
Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.au
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The unregister check could be incorrectly triggered if a netdev
changes its type after register. That is possible for a tun device
using TUNSETLINK ioctl, resulting in mctp unregister failing
and the netdev unregister waiting forever.

This was encountered by https://github.com/openthread/openthread/issues/8523

Neither check at register or unregister is required. They were added in
an attempt to track down mctp_ptr being set unexpectedly, which should
not happen in normal operation.

Fixes: 7b1871af75f3 ("mctp: Warn if pointer is set for a wrong dev type")
Signed-off-by: Matt Johnston &lt;matt@codeconstruct.com.au&gt;
Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.au
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: stream: purge sk_error_queue in sk_stream_kill_queues()</title>
<updated>2022-12-19T12:33:16+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-12-16T16:29:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e0c8bccd40fc1c19e1d246c39bcf79e357e1ada3'/>
<id>e0c8bccd40fc1c19e1d246c39bcf79e357e1ada3</id>
<content type='text'>
Changheon Lee reported TCP socket leaks, with a nice repro.

It seems we leak TCP sockets with the following sequence:

1) SOF_TIMESTAMPING_TX_ACK is enabled on the socket.

   Each ACK will cook an skb put in error queue, from __skb_tstamp_tx().
   __skb_tstamp_tx() is using skb_clone(), unless
   SOF_TIMESTAMPING_OPT_TSONLY was also requested.

2) If the application is also using MSG_ZEROCOPY, then we put in the
   error queue cloned skbs that had a struct ubuf_info attached to them.

   Whenever an struct ubuf_info is allocated, sock_zerocopy_alloc()
   does a sock_hold().

   As long as the cloned skbs are still in sk_error_queue,
   socket refcount is kept elevated.

3) Application closes the socket, while error queue is not empty.

Since tcp_close() no longer purges the socket error queue,
we might end up with a TCP socket with at least one skb in
error queue keeping the socket alive forever.

This bug can be (ab)used to consume all kernel memory
and freeze the host.

We need to purge the error queue, with proper synchronization
against concurrent writers.

Fixes: 24bcbe1cc69f ("net: stream: don't purge sk_error_queue in sk_stream_kill_queues()")
Reported-by: Changheon Lee &lt;darklight2357@icloud.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Changheon Lee reported TCP socket leaks, with a nice repro.

It seems we leak TCP sockets with the following sequence:

1) SOF_TIMESTAMPING_TX_ACK is enabled on the socket.

   Each ACK will cook an skb put in error queue, from __skb_tstamp_tx().
   __skb_tstamp_tx() is using skb_clone(), unless
   SOF_TIMESTAMPING_OPT_TSONLY was also requested.

2) If the application is also using MSG_ZEROCOPY, then we put in the
   error queue cloned skbs that had a struct ubuf_info attached to them.

   Whenever an struct ubuf_info is allocated, sock_zerocopy_alloc()
   does a sock_hold().

   As long as the cloned skbs are still in sk_error_queue,
   socket refcount is kept elevated.

3) Application closes the socket, while error queue is not empty.

Since tcp_close() no longer purges the socket error queue,
we might end up with a TCP socket with at least one skb in
error queue keeping the socket alive forever.

This bug can be (ab)used to consume all kernel memory
and freeze the host.

We need to purge the error queue, with proper synchronization
against concurrent writers.

Fixes: 24bcbe1cc69f ("net: stream: don't purge sk_error_queue in sk_stream_kill_queues()")
Reported-by: Changheon Lee &lt;darklight2357@icloud.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Fix the return value of rxrpc_new_incoming_call()</title>
<updated>2022-12-19T09:51:31+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2022-12-15T16:20:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=31d35a02ad5b803354fe0727686fcbace7a343fe'/>
<id>31d35a02ad5b803354fe0727686fcbace7a343fe</id>
<content type='text'>
Dan Carpenter sayeth[1]:

  The patch 5e6ef4f1017c: "rxrpc: Make the I/O thread take over the
  call and local processor work" from Jan 23, 2020, leads to the
  following Smatch static checker warning:

	net/rxrpc/io_thread.c:283 rxrpc_input_packet()
	warn: bool is not less than zero.

Fix this (for now) by changing rxrpc_new_incoming_call() to return an int
with 0 or error code rather than bool.  Note that the actual return value
of rxrpc_input_packet() is currently ignored.  I have a separate patch to
clean that up.

Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Dan Carpenter &lt;error27@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1]
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dan Carpenter sayeth[1]:

  The patch 5e6ef4f1017c: "rxrpc: Make the I/O thread take over the
  call and local processor work" from Jan 23, 2020, leads to the
  following Smatch static checker warning:

	net/rxrpc/io_thread.c:283 rxrpc_input_packet()
	warn: bool is not less than zero.

Fix this (for now) by changing rxrpc_new_incoming_call() to return an int
with 0 or error code rather than bool.  Note that the actual return value
of rxrpc_input_packet() is currently ignored.  I have a separate patch to
clean that up.

Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Dan Carpenter &lt;error27@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1]
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: rxperf: Fix uninitialised variable</title>
<updated>2022-12-19T09:51:31+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2022-12-15T16:20:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=11e1706bc84f60040578056f8cef3d0139b92dda'/>
<id>11e1706bc84f60040578056f8cef3d0139b92dda</id>
<content type='text'>
Dan Carpenter sayeth[1]:

  The patch 75bfdbf2fca3: "rxrpc: Implement an in-kernel rxperf server
  for testing purposes" from Nov 3, 2022, leads to the following Smatch
  static checker warning:

	net/rxrpc/rxperf.c:337 rxperf_deliver_to_call()
	error: uninitialized symbol 'ret'.

Fix this by initialising ret to 0.  The value is only used for tracing
purposes in the rxperf server.

Fixes: 75bfdbf2fca3 ("rxrpc: Implement an in-kernel rxperf server for testing purposes")
Reported-by: Dan Carpenter &lt;error27@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006124.html [1]
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dan Carpenter sayeth[1]:

  The patch 75bfdbf2fca3: "rxrpc: Implement an in-kernel rxperf server
  for testing purposes" from Nov 3, 2022, leads to the following Smatch
  static checker warning:

	net/rxrpc/rxperf.c:337 rxperf_deliver_to_call()
	error: uninitialized symbol 'ret'.

Fix this by initialising ret to 0.  The value is only used for tracing
purposes in the rxperf server.

Fixes: 75bfdbf2fca3 ("rxrpc: Implement an in-kernel rxperf server for testing purposes")
Reported-by: Dan Carpenter &lt;error27@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006124.html [1]
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
