<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/net/veth.c, branch v5.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>veth: fix races around rq-&gt;rx_notify_masked</title>
<updated>2022-02-09T12:04:53+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-02-08T23:28:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=68468d8c4cd4222a4ca1f185ab5a1c14480d078c'/>
<id>68468d8c4cd4222a4ca1f185ab5a1c14480d078c</id>
<content type='text'>
veth being NETIF_F_LLTX enabled, we need to be more careful
whenever we read/write rq-&gt;rx_notify_masked.

BUG: KCSAN: data-race in veth_xmit / veth_xmit

write to 0xffff888133d9a9f8 of 1 bytes by task 23552 on cpu 0:
 __veth_xdp_flush drivers/net/veth.c:269 [inline]
 veth_xmit+0x307/0x470 drivers/net/veth.c:350
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
 NF_HOOK include/linux/netfilter.h:307 [inline]
 br_forward_finish net/bridge/br_forward.c:66 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
 br_dev_xmit+0x8b6/0x960
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 neigh_hh_output include/net/neighbour.h:525 [inline]
 neigh_output include/net/neighbour.h:539 [inline]
 ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888133d9a9f8 of 1 bytes by task 23563 on cpu 1:
 __veth_xdp_flush drivers/net/veth.c:268 [inline]
 veth_xmit+0x2d6/0x470 drivers/net/veth.c:350
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
 NF_HOOK include/linux/netfilter.h:307 [inline]
 br_forward_finish net/bridge/br_forward.c:66 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
 br_dev_xmit+0x8b6/0x960
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 neigh_hh_output include/net/neighbour.h:525 [inline]
 neigh_output include/net/neighbour.h:539 [inline]
 ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00 -&gt; 0x01

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23563 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00064-gc36c04c2e132 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 948d4f214fde ("veth: Add driver XDP")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
veth being NETIF_F_LLTX enabled, we need to be more careful
whenever we read/write rq-&gt;rx_notify_masked.

BUG: KCSAN: data-race in veth_xmit / veth_xmit

write to 0xffff888133d9a9f8 of 1 bytes by task 23552 on cpu 0:
 __veth_xdp_flush drivers/net/veth.c:269 [inline]
 veth_xmit+0x307/0x470 drivers/net/veth.c:350
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
 NF_HOOK include/linux/netfilter.h:307 [inline]
 br_forward_finish net/bridge/br_forward.c:66 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
 br_dev_xmit+0x8b6/0x960
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 neigh_hh_output include/net/neighbour.h:525 [inline]
 neigh_output include/net/neighbour.h:539 [inline]
 ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888133d9a9f8 of 1 bytes by task 23563 on cpu 1:
 __veth_xdp_flush drivers/net/veth.c:268 [inline]
 veth_xmit+0x2d6/0x470 drivers/net/veth.c:350
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
 NF_HOOK include/linux/netfilter.h:307 [inline]
 br_forward_finish net/bridge/br_forward.c:66 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
 br_dev_xmit+0x8b6/0x960
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 neigh_hh_output include/net/neighbour.h:525 [inline]
 neigh_output include/net/neighbour.h:539 [inline]
 ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00 -&gt; 0x01

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23563 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00064-gc36c04c2e132 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 948d4f214fde ("veth: Add driver XDP")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>veth: Do not record rx queue hint in veth_xmit</title>
<updated>2022-01-06T13:49:54+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2022-01-06T00:46:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=710ad98c363a66a0cd8526465426c5c5f8377ee0'/>
<id>710ad98c363a66a0cd8526465426c5c5f8377ee0</id>
<content type='text'>
Laurent reported that they have seen a significant amount of TCP retransmissions
at high throughput from applications residing in network namespaces talking to
the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
as per systemd default) of the phys device such as ena or virtio_net due to all
traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
setup was _not_ using XDP on veths as the issue is generic.)

More specifically, after edbea9220251 ("veth: Store queue_mapping independently
of XDP prog presence") which made it all the way back to v4.19.184+,
skb_record_rx_queue() would set skb-&gt;queue_mapping to 1 (given 1 RX and 1 TX
queue by default for veths) instead of leaving at 0.

This is eventually retained and callbacks like ena_select_queue() will also pick
single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
is forwarded to that device via upper stack or other means. Similarly, for others
not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.

In general, it is a _bad_ idea for virtual devices like veth to mess around with
queue selection [by default]. Given dev-&gt;real_num_tx_queues is by default 1,
the skb-&gt;queue_mapping was left untouched, and so prior to edbea9220251 the
netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.

Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
from veth_xmit() altogether.

If the veth peer has an XDP program attached, then it would return the first RX
queue index in xdp_md-&gt;rx_queue_index (unless configured in non-default manner).
However, this is still better than breaking the generic case.

Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
Fixes: 638264dc9022 ("veth: Support per queue XDP ring")
Reported-by: Laurent Bernaille &lt;laurent.bernaille@datadoghq.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Cc: Toshiaki Makita &lt;toshiaki.makita1@gmail.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Paolo Abeni &lt;pabeni@redhat.com&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Toshiaki Makita &lt;toshiaki.makita1@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Laurent reported that they have seen a significant amount of TCP retransmissions
at high throughput from applications residing in network namespaces talking to
the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
as per systemd default) of the phys device such as ena or virtio_net due to all
traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
setup was _not_ using XDP on veths as the issue is generic.)

More specifically, after edbea9220251 ("veth: Store queue_mapping independently
of XDP prog presence") which made it all the way back to v4.19.184+,
skb_record_rx_queue() would set skb-&gt;queue_mapping to 1 (given 1 RX and 1 TX
queue by default for veths) instead of leaving at 0.

This is eventually retained and callbacks like ena_select_queue() will also pick
single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
is forwarded to that device via upper stack or other means. Similarly, for others
not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.

In general, it is a _bad_ idea for virtual devices like veth to mess around with
queue selection [by default]. Given dev-&gt;real_num_tx_queues is by default 1,
the skb-&gt;queue_mapping was left untouched, and so prior to edbea9220251 the
netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.

Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
from veth_xmit() altogether.

If the veth peer has an XDP program attached, then it would return the first RX
queue index in xdp_md-&gt;rx_queue_index (unless configured in non-default manner).
However, this is still better than breaking the generic case.

Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
Fixes: 638264dc9022 ("veth: Support per queue XDP ring")
Reported-by: Laurent Bernaille &lt;laurent.bernaille@datadoghq.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Cc: Toshiaki Makita &lt;toshiaki.makita1@gmail.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Paolo Abeni &lt;pabeni@redhat.com&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Toshiaki Makita &lt;toshiaki.makita1@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2021-12-31T14:35:40+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2021-12-31T14:35:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e63a02348958cd7cc8c8401c94de57ad97b5d06c'/>
<id>e63a02348958cd7cc8c8401c94de57ad97b5d06c</id>
<content type='text'>
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2021-12-24T00:09:58+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-12-24T00:09:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8b3f91332291fa280a56215f5189baca185998f5'/>
<id>8b3f91332291fa280a56215f5189baca185998f5</id>
<content type='text'>
include/net/sock.h
  commit 8f905c0e7354 ("inet: fully convert sk-&gt;sk_rx_dst to RCU rules")
  commit 43f51df41729 ("net: move early demux fields close to sk_refcnt")
  https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
include/net/sock.h
  commit 8f905c0e7354 ("inet: fully convert sk-&gt;sk_rx_dst to RCU rules")
  commit 43f51df41729 ("net: move early demux fields close to sk_refcnt")
  https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>veth: ensure skb entering GRO are not cloned.</title>
<updated>2021-12-23T17:44:06+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2021-12-22T18:39:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9695b7de5b4760ed22132aca919570c0190cb0ce'/>
<id>9695b7de5b4760ed22132aca919570c0190cb0ce</id>
<content type='text'>
After commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP"),
if GRO is enabled on a veth device and TSO is disabled on the peer
device, TCP skbs will go through the NAPI callback. If there is no XDP
program attached, the veth code does not perform any share check, and
shared/cloned skbs could enter the GRO engine.

Ignat reported a BUG triggered later-on due to the above condition:

[   53.970529][    C1] kernel BUG at net/core/skbuff.c:3574!
[   53.981755][    C1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[   53.982634][    C1] CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc5+ #25
[   53.982634][    C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[   53.982634][    C1] RIP: 0010:skb_shift+0x13ef/0x23b0
[   53.982634][    C1] Code: ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0
7f 08 84 c0 0f 85 41 0c 00 00 41 80 7f 02 00 4d 8d b5 d0 00 00 00 0f
85 74 f5 ff ff &lt;0f&gt; 0b 4d 8d 77 20 be 04 00 00 00 4c 89 44 24 78 4c 89
f7 4c 89 8c
[   53.982634][    C1] RSP: 0018:ffff8881008f7008 EFLAGS: 00010246
[   53.982634][    C1] RAX: 0000000000000000 RBX: ffff8881180b4c80 RCX: 0000000000000000
[   53.982634][    C1] RDX: 0000000000000002 RSI: ffff8881180b4d3c RDI: ffff88810bc9cac2
[   53.982634][    C1] RBP: ffff8881008f70b8 R08: ffff8881180b4cf4 R09: ffff8881180b4cf0
[   53.982634][    C1] R10: ffffed1022999e5c R11: 0000000000000002 R12: 0000000000000590
[   53.982634][    C1] R13: ffff88810f940c80 R14: ffff88810f940d50 R15: ffff88810bc9cac0
[   53.982634][    C1] FS:  0000000000000000(0000) GS:ffff888235880000(0000) knlGS:0000000000000000
[   53.982634][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.982634][    C1] CR2: 00007ff5f9b86680 CR3: 0000000108ce8004 CR4: 0000000000170ee0
[   53.982634][    C1] Call Trace:
[   53.982634][    C1]  &lt;TASK&gt;
[   53.982634][    C1]  tcp_sacktag_walk+0xaba/0x18e0
[   53.982634][    C1]  tcp_sacktag_write_queue+0xe7b/0x3460
[   53.982634][    C1]  tcp_ack+0x2666/0x54b0
[   53.982634][    C1]  tcp_rcv_established+0x4d9/0x20f0
[   53.982634][    C1]  tcp_v4_do_rcv+0x551/0x810
[   53.982634][    C1]  tcp_v4_rcv+0x22ed/0x2ed0
[   53.982634][    C1]  ip_protocol_deliver_rcu+0x96/0xaf0
[   53.982634][    C1]  ip_local_deliver_finish+0x1e0/0x2f0
[   53.982634][    C1]  ip_sublist_rcv_finish+0x211/0x440
[   53.982634][    C1]  ip_list_rcv_finish.constprop.0+0x424/0x660
[   53.982634][    C1]  ip_list_rcv+0x2c8/0x410
[   53.982634][    C1]  __netif_receive_skb_list_core+0x65c/0x910
[   53.982634][    C1]  netif_receive_skb_list_internal+0x5f9/0xcb0
[   53.982634][    C1]  napi_complete_done+0x188/0x6e0
[   53.982634][    C1]  gro_cell_poll+0x10c/0x1d0
[   53.982634][    C1]  __napi_poll+0xa1/0x530
[   53.982634][    C1]  net_rx_action+0x567/0x1270
[   53.982634][    C1]  __do_softirq+0x28a/0x9ba
[   53.982634][    C1]  run_ksoftirqd+0x32/0x60
[   53.982634][    C1]  smpboot_thread_fn+0x559/0x8c0
[   53.982634][    C1]  kthread+0x3b9/0x490
[   53.982634][    C1]  ret_from_fork+0x22/0x30
[   53.982634][    C1]  &lt;/TASK&gt;

Address the issue by skipping the GRO stage for shared or cloned skbs.
To reduce the chance of OoO, try to unclone the skbs before giving up.

v1 -&gt; v2:
 - use avoid skb_copy and fallback to netif_receive_skb  - Eric

Reported-by: Ignat Korchagin &lt;ignat@cloudflare.com&gt;
Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP")
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Tested-by: Ignat Korchagin &lt;ignat@cloudflare.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/b5f61c5602aab01bac8d711d8d1bfab0a4817db7.1640197544.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit d3256efd8e8b ("veth: allow enabling NAPI even without XDP"),
if GRO is enabled on a veth device and TSO is disabled on the peer
device, TCP skbs will go through the NAPI callback. If there is no XDP
program attached, the veth code does not perform any share check, and
shared/cloned skbs could enter the GRO engine.

Ignat reported a BUG triggered later-on due to the above condition:

[   53.970529][    C1] kernel BUG at net/core/skbuff.c:3574!
[   53.981755][    C1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[   53.982634][    C1] CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc5+ #25
[   53.982634][    C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[   53.982634][    C1] RIP: 0010:skb_shift+0x13ef/0x23b0
[   53.982634][    C1] Code: ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0
7f 08 84 c0 0f 85 41 0c 00 00 41 80 7f 02 00 4d 8d b5 d0 00 00 00 0f
85 74 f5 ff ff &lt;0f&gt; 0b 4d 8d 77 20 be 04 00 00 00 4c 89 44 24 78 4c 89
f7 4c 89 8c
[   53.982634][    C1] RSP: 0018:ffff8881008f7008 EFLAGS: 00010246
[   53.982634][    C1] RAX: 0000000000000000 RBX: ffff8881180b4c80 RCX: 0000000000000000
[   53.982634][    C1] RDX: 0000000000000002 RSI: ffff8881180b4d3c RDI: ffff88810bc9cac2
[   53.982634][    C1] RBP: ffff8881008f70b8 R08: ffff8881180b4cf4 R09: ffff8881180b4cf0
[   53.982634][    C1] R10: ffffed1022999e5c R11: 0000000000000002 R12: 0000000000000590
[   53.982634][    C1] R13: ffff88810f940c80 R14: ffff88810f940d50 R15: ffff88810bc9cac0
[   53.982634][    C1] FS:  0000000000000000(0000) GS:ffff888235880000(0000) knlGS:0000000000000000
[   53.982634][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.982634][    C1] CR2: 00007ff5f9b86680 CR3: 0000000108ce8004 CR4: 0000000000170ee0
[   53.982634][    C1] Call Trace:
[   53.982634][    C1]  &lt;TASK&gt;
[   53.982634][    C1]  tcp_sacktag_walk+0xaba/0x18e0
[   53.982634][    C1]  tcp_sacktag_write_queue+0xe7b/0x3460
[   53.982634][    C1]  tcp_ack+0x2666/0x54b0
[   53.982634][    C1]  tcp_rcv_established+0x4d9/0x20f0
[   53.982634][    C1]  tcp_v4_do_rcv+0x551/0x810
[   53.982634][    C1]  tcp_v4_rcv+0x22ed/0x2ed0
[   53.982634][    C1]  ip_protocol_deliver_rcu+0x96/0xaf0
[   53.982634][    C1]  ip_local_deliver_finish+0x1e0/0x2f0
[   53.982634][    C1]  ip_sublist_rcv_finish+0x211/0x440
[   53.982634][    C1]  ip_list_rcv_finish.constprop.0+0x424/0x660
[   53.982634][    C1]  ip_list_rcv+0x2c8/0x410
[   53.982634][    C1]  __netif_receive_skb_list_core+0x65c/0x910
[   53.982634][    C1]  netif_receive_skb_list_internal+0x5f9/0xcb0
[   53.982634][    C1]  napi_complete_done+0x188/0x6e0
[   53.982634][    C1]  gro_cell_poll+0x10c/0x1d0
[   53.982634][    C1]  __napi_poll+0xa1/0x530
[   53.982634][    C1]  net_rx_action+0x567/0x1270
[   53.982634][    C1]  __do_softirq+0x28a/0x9ba
[   53.982634][    C1]  run_ksoftirqd+0x32/0x60
[   53.982634][    C1]  smpboot_thread_fn+0x559/0x8c0
[   53.982634][    C1]  kthread+0x3b9/0x490
[   53.982634][    C1]  ret_from_fork+0x22/0x30
[   53.982634][    C1]  &lt;/TASK&gt;

Address the issue by skipping the GRO stage for shared or cloned skbs.
To reduce the chance of OoO, try to unclone the skbs before giving up.

v1 -&gt; v2:
 - use avoid skb_copy and fallback to netif_receive_skb  - Eric

Reported-by: Ignat Korchagin &lt;ignat@cloudflare.com&gt;
Fixes: d3256efd8e8b ("veth: allow enabling NAPI even without XDP")
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Tested-by: Ignat Korchagin &lt;ignat@cloudflare.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/b5f61c5602aab01bac8d711d8d1bfab0a4817db7.1640197544.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Let bpf_warn_invalid_xdp_action() report more info</title>
<updated>2021-12-13T21:28:27+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2021-11-30T10:08:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c8064e5b4adac5e1255cf4f3b374e75b5376e7ca'/>
<id>c8064e5b4adac5e1255cf4f3b374e75b5376e7ca</id>
<content type='text'>
In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
</pre>
</div>
</content>
</entry>
<entry>
<title>veth: use ethtool_sprintf instead of snprintf</title>
<updated>2021-11-26T03:15:12+00:00</updated>
<author>
<name>Tonghao Zhang</name>
<email>xiangxia.m.yue@gmail.com</email>
</author>
<published>2021-11-25T02:54:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a0341b73d843143afef52313b9f4470f5523941e'/>
<id>a0341b73d843143afef52313b9f4470f5523941e</id>
<content type='text'>
use ethtools api ethtool_sprintf to instead of snprintf.

Signed-off-by: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Link: https://lore.kernel.org/r/20211125025444.13115-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
use ethtools api ethtool_sprintf to instead of snprintf.

Signed-off-by: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Link: https://lore.kernel.org/r/20211125025444.13115-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: annotate accesses to dev-&gt;gso_max_segs</title>
<updated>2021-11-22T12:49:42+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-11-19T15:43:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6d872df3e3b91532b142de9044e5b4984017a55f'/>
<id>6d872df3e3b91532b142de9044e5b4984017a55f</id>
<content type='text'>
dev-&gt;gso_max_segs is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add netif_set_gso_max_segs() helper.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs()
where we can to better document what is going on.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dev-&gt;gso_max_segs is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add netif_set_gso_max_segs() helper.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs()
where we can to better document what is going on.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: annotate accesses to dev-&gt;gso_max_size</title>
<updated>2021-11-22T12:49:42+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-11-19T15:43:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4b66d2161b8125b6caa6971815e85631cf3cf36f'/>
<id>4b66d2161b8125b6caa6971815e85631cf3cf36f</id>
<content type='text'>
dev-&gt;gso_max_size is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_size()
where we can to better document what is going on.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dev-&gt;gso_max_size is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_size()
where we can to better document what is going on.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>veth: use skb_prepare_for_gro()</title>
<updated>2021-07-29T11:18:12+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2021-07-28T16:24:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d504fff0d14a0fd683e9ec1f736c6e1f894667ae'/>
<id>d504fff0d14a0fd683e9ec1f736c6e1f894667ae</id>
<content type='text'>
Leveraging the previous patch we can now avoid orphaning the
skb in the veth gro path, allowing correct backpressure.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Leveraging the previous patch we can now avoid orphaning the
skb in the veth gro path, allowing correct backpressure.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
