| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, can and IPsec.
Current release - regressions:
- net: add net.core.qdisc_max_burst
- can: propagate CAN device capabilities via ml_priv
Previous releases - regressions:
- dst: fix races in rt6_uncached_list_del() and
rt_del_uncached_list()
- ipv6: fix use-after-free in inet6_addr_del().
- xfrm: fix inner mode lookup in tunnel mode GSO segmentation
- ip_tunnel: spread netdev_lockdep_set_classes()
- ip6_tunnel: use skb_vlan_inet_prepare() in __ip6_tnl_rcv()
- bluetooth: hci_sync: enable PA sync lost event
- eth: virtio-net:
- fix the deadlock when disabling rx NAPI
- fix misalignment bug in struct virtnet_info
Previous releases - always broken:
- ipv4: ip_gre: make ipgre_header() robust
- can: fix SSP_SRC in cases when bit-rate is higher than 1 MBit.
- eth:
- mlx5e: profile change fix
- octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
- macvlan: fix possible UAF in macvlan_forward_source()"
* tag 'net-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
virtio_net: Fix misalignment bug in struct virtnet_info
net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
can: raw: instantly reject disabled CAN frames
can: propagate CAN device capabilities via ml_priv
Revert "can: raw: instantly reject unsupported CAN frames"
net/sched: sch_qfq: do not free existing class in qfq_change_class()
selftests: drv-net: fix RPS mask handling for high CPU numbers
selftests: drv-net: fix RPS mask handling in toeplitz test
ipv6: Fix use-after-free in inet6_addr_del().
dst: fix races in rt6_uncached_list_del() and rt_del_uncached_list()
net: hv_netvsc: reject RSS hash key programming without RX indirection table
tools: ynl: render event op docs correctly
net: add net.core.qdisc_max_burst
net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definition
net: phy: motorcomm: fix duplex setting error for phy leds
net: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
net/mlx5e: Restore destroying state bit after profile cleanup
net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv
net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv
net/mlx5e: Fix crash on profile change rollback failure
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2026-01-15
this is a pull request of 4 patches for net/main, it super-seeds the
"can 2026-01-14" pull request. The dev refcount leak in patch #3 is
fixed.
The first 3 patches are by Oliver Hartkopp and revert the approach to
instantly reject unsupported CAN frames introduced in
net-next-for-v6.19 and replace it by placing the needed data into the
CAN specific ml_priv.
The last patch is by Tetsuo Handa and fixes a J1939 refcount leak for
j1939_session in session deactivation upon receiving the second RTS.
linux-can-fixes-for-6.19-20260115
* tag 'linux-can-fixes-for-6.19-20260115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
net: can: j1939: j1939_xtp_rx_rts_session_active(): deactivate session upon receiving the second rts
can: raw: instantly reject disabled CAN frames
can: propagate CAN device capabilities via ml_priv
Revert "can: raw: instantly reject unsupported CAN frames"
====================
Link: https://patch.msgid.link/20260115090603.1124860-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2026-01-14
1) Fix inner mode lookup in tunnel mode GSO segmentation.
The protocol was taken from the wrong field.
2) Set ipv4 no_pmtu_disc flag only on output SAs. The
insertation of input SAs can fail if no_pmtu_disc
is set.
Please pull or let me know if there are problems.
ipsec-2026-01-14
* tag 'ipsec-2026-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is set
xfrm: Fix inner mode lookup in tunnel mode GSO segmentation
====================
Link: https://patch.msgid.link/20260114121817.1106134-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
receiving the second rts
Since j1939_session_deactivate_activate_next() in j1939_tp_rxtimer() is
called only when the timer is enabled, we need to call
j1939_session_deactivate_activate_next() if we cancelled the timer.
Otherwise, refcount for j1939_session leaks, which will later appear as
| unregister_netdevice: waiting for vcan0 to become free. Usage count = 2.
problem.
Reported-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://patch.msgid.link/b1212653-8fa1-44e1-be9d-12f950fb3a07@I-love.SAKURA.ne.jp
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
For real CAN interfaces the CAN_CTRLMODE_FD and CAN_CTRLMODE_XL control
modes indicate whether an interface can handle those CAN FD/XL frames.
In the case a CAN XL interface is configured in CANXL-only mode with
disabled error-signalling neither CAN CC nor CAN FD frames can be sent.
The checks are now performed on CAN_RAW sockets to give an instant feedback
to the user when writing unsupported CAN frames to the interface or when
the CAN interface is in read-only mode.
Fixes: 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260109144135.8495-4-socketcan@hartkopp.net
[mkl: fix dev reference leak]
Link: https://lore.kernel.org/all/0636c732-2e71-4633-8005-dfa85e1da445@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This reverts commit 1a620a723853a0f49703c317d52dc6b9602cbaa8
and its follow-up fixes for the introduced dependency issues.
commit 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
commit cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure by default")
commit 6abd4577bccc ("can: fix build dependency")
commit 5a5aff6338c0 ("can: fix build dependency")
The entire problem was caused by the requirement that a new network layer
feature needed to know about the protocol capabilities of the CAN devices.
Instead of accessing CAN device internal data structures which caused the
dependency problems a better approach has been developed which makes use of
CAN specific ml_priv data which is accessible from both sides.
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vincent Mailhol <mailhol@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260109144135.8495-2-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK in riscv JIT (Menglong
Dong)
- Fix reference count leak in bpf_prog_test_run_xdp() (Tetsuo Handa)
- Fix metadata size check in bpf_test_run() (Toke Høiland-Jørgensen)
- Check that BPF insn array is not allowed as a map for const strings
(Deepanshu Kartikey)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix reference count leak in bpf_prog_test_run_xdp()
bpf: Reject BPF_MAP_TYPE_INSN_ARRAY in check_reg_const_str()
selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
bpf, test_run: Subtract size of xdp_frame from allowed metadata size
riscv, bpf: Fix incorrect usage of BPF_TRAMP_F_ORIG_STACK
|
|
Fixes qfq_change_class() error case.
cl->qdisc and cl should only be freed if a new class and qdisc
were allocated, or we risk various UAF.
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: syzbot+07f3f38f723c335f106d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6965351d.050a0220.eaf7.00c5.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260112175656.17605-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported use-after-free of inet6_ifaddr in
inet6_addr_del(). [0]
The cited commit accidentally moved ipv6_del_addr() for
mngtmpaddr before reading its ifp->flags for temporary
addresses in inet6_addr_del().
Let's move ipv6_del_addr() down to fix the UAF.
[0]:
BUG: KASAN: slab-use-after-free in inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117
Read of size 4 at addr ffff88807b89c86c by task syz.3.1618/9593
CPU: 0 UID: 0 PID: 9593 Comm: syz.3.1618 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xcd/0x630 mm/kasan/report.c:482
kasan_report+0xe0/0x110 mm/kasan/report.c:595
inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117
addrconf_del_ifaddr+0x11e/0x190 net/ipv6/addrconf.c:3181
inet6_ioctl+0x1e5/0x2b0 net/ipv6/af_inet6.c:582
sock_do_ioctl+0x118/0x280 net/socket.c:1254
sock_ioctl+0x227/0x6b0 net/socket.c:1375
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl fs/ioctl.c:583 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f164cf8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f164de64038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f164d1e5fa0 RCX: 00007f164cf8f749
RDX: 0000200000000000 RSI: 0000000000008936 RDI: 0000000000000003
RBP: 00007f164d013f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f164d1e6038 R14: 00007f164d1e5fa0 R15: 00007ffde15c8288
</TASK>
Allocated by task 9593:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
kasan_save_track+0x14/0x30 mm/kasan/common.c:77
poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:414
kmalloc_noprof include/linux/slab.h:957 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
ipv6_add_addr+0x4e3/0x2010 net/ipv6/addrconf.c:1120
inet6_addr_add+0x256/0x9b0 net/ipv6/addrconf.c:3050
addrconf_add_ifaddr+0x1fc/0x450 net/ipv6/addrconf.c:3160
inet6_ioctl+0x103/0x2b0 net/ipv6/af_inet6.c:580
sock_do_ioctl+0x118/0x280 net/socket.c:1254
sock_ioctl+0x227/0x6b0 net/socket.c:1375
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl fs/ioctl.c:583 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 6099:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
kasan_save_track+0x14/0x30 mm/kasan/common.c:77
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:252 [inline]
__kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284
kasan_slab_free include/linux/kasan.h:234 [inline]
slab_free_hook mm/slub.c:2540 [inline]
slab_free_freelist_hook mm/slub.c:2569 [inline]
slab_free_bulk mm/slub.c:6696 [inline]
kmem_cache_free_bulk mm/slub.c:7383 [inline]
kmem_cache_free_bulk+0x2bf/0x680 mm/slub.c:7362
kfree_bulk include/linux/slab.h:830 [inline]
kvfree_rcu_bulk+0x1b7/0x1e0 mm/slab_common.c:1523
kvfree_rcu_drain_ready mm/slab_common.c:1728 [inline]
kfree_rcu_monitor+0x1d0/0x2f0 mm/slab_common.c:1801
process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
process_scheduled_works kernel/workqueue.c:3340 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
Fixes: 00b5b7aab9e42 ("net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged")
Reported-by: syzbot+72e610f4f1a930ca9d8a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/696598e9.050a0220.3be5c5.0009.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260113010538.2019411-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot was able to crash the kernel in rt6_uncached_list_flush_dev()
in an interesting way [1]
Crash happens in list_del_init()/INIT_LIST_HEAD() while writing
list->prev, while the prior write on list->next went well.
static inline void INIT_LIST_HEAD(struct list_head *list)
{
WRITE_ONCE(list->next, list); // This went well
WRITE_ONCE(list->prev, list); // Crash, @list has been freed.
}
Issue here is that rt6_uncached_list_del() did not attempt to lock
ul->lock, as list_empty(&rt->dst.rt_uncached) returned
true because the WRITE_ONCE(list->next, list) happened on the other CPU.
We might use list_del_init_careful() and list_empty_careful(),
or make sure rt6_uncached_list_del() always grabs the spinlock
whenever rt->dst.rt_uncached_list has been set.
A similar fix is neeed for IPv4.
[1]
BUG: KASAN: slab-use-after-free in INIT_LIST_HEAD include/linux/list.h:46 [inline]
BUG: KASAN: slab-use-after-free in list_del_init include/linux/list.h:296 [inline]
BUG: KASAN: slab-use-after-free in rt6_uncached_list_flush_dev net/ipv6/route.c:191 [inline]
BUG: KASAN: slab-use-after-free in rt6_disable_ip+0x633/0x730 net/ipv6/route.c:5020
Write of size 8 at addr ffff8880294cfa78 by task kworker/u8:14/3450
CPU: 0 UID: 0 PID: 3450 Comm: kworker/u8:14 Tainted: G L syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Workqueue: netns cleanup_net
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
INIT_LIST_HEAD include/linux/list.h:46 [inline]
list_del_init include/linux/list.h:296 [inline]
rt6_uncached_list_flush_dev net/ipv6/route.c:191 [inline]
rt6_disable_ip+0x633/0x730 net/ipv6/route.c:5020
addrconf_ifdown+0x143/0x18a0 net/ipv6/addrconf.c:3853
addrconf_notify+0x1bc/0x1050 net/ipv6/addrconf.c:-1
notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2268 [inline]
call_netdevice_notifiers net/core/dev.c:2282 [inline]
netif_close_many+0x29c/0x410 net/core/dev.c:1785
unregister_netdevice_many_notify+0xb50/0x2330 net/core/dev.c:12353
ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
ops_undo_list+0x3dc/0x990 net/core/net_namespace.c:248
cleanup_net+0x4de/0x7b0 net/core/net_namespace.c:696
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
Allocated by task 803:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
unpoison_slab_object mm/kasan/common.c:340 [inline]
__kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4953 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_noprof+0x18d/0x6c0 mm/slub.c:5270
dst_alloc+0x105/0x170 net/core/dst.c:89
ip6_dst_alloc net/ipv6/route.c:342 [inline]
icmp6_dst_alloc+0x75/0x460 net/ipv6/route.c:3333
mld_sendpack+0x683/0xe60 net/ipv6/mcast.c:1844
mld_send_cr net/ipv6/mcast.c:2154 [inline]
mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
Freed by task 20:
kasan_save_stack mm/kasan/common.c:57 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2540 [inline]
slab_free mm/slub.c:6670 [inline]
kmem_cache_free+0x18f/0x8d0 mm/slub.c:6781
dst_destroy+0x235/0x350 net/core/dst.c:121
rcu_do_batch kernel/rcu/tree.c:2605 [inline]
rcu_core kernel/rcu/tree.c:2857 [inline]
rcu_cpu_kthread+0xba5/0x1af0 kernel/rcu/tree.c:2945
smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
Last potentially related work creation:
kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
__call_rcu_common kernel/rcu/tree.c:3119 [inline]
call_rcu+0xee/0x890 kernel/rcu/tree.c:3239
refdst_drop include/net/dst.h:266 [inline]
skb_dst_drop include/net/dst.h:278 [inline]
skb_release_head_state+0x71/0x360 net/core/skbuff.c:1156
skb_release_all net/core/skbuff.c:1180 [inline]
__kfree_skb net/core/skbuff.c:1196 [inline]
sk_skb_reason_drop+0xe9/0x170 net/core/skbuff.c:1234
kfree_skb_reason include/linux/skbuff.h:1322 [inline]
tcf_kfree_skb_list include/net/sch_generic.h:1127 [inline]
__dev_xmit_skb net/core/dev.c:4260 [inline]
__dev_queue_xmit+0x26aa/0x3210 net/core/dev.c:4785
NF_HOOK_COND include/linux/netfilter.h:307 [inline]
ip6_output+0x340/0x550 net/ipv6/ip6_output.c:247
NF_HOOK+0x9e/0x380 include/linux/netfilter.h:318
mld_sendpack+0x8d4/0xe60 net/ipv6/mcast.c:1855
mld_send_cr net/ipv6/mcast.c:2154 [inline]
mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
The buggy address belongs to the object at ffff8880294cfa00
which belongs to the cache ip6_dst_cache of size 232
The buggy address is located 120 bytes inside of
freed 232-byte region [ffff8880294cfa00, ffff8880294cfae8)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x294cf
memcg:ffff88803536b781
flags: 0x80000000000000(node=0|zone=1)
page_type: f5(slab)
raw: 0080000000000000 ffff88802ff1c8c0 ffffea0000bf2bc0 dead000000000006
raw: 0000000000000000 00000000800c000c 00000000f5000000 ffff88803536b781
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52820(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 9, tgid 9 (kworker/0:0), ts 91119585830, free_ts 91088628818
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x234/0x290 mm/page_alloc.c:1857
prep_new_page mm/page_alloc.c:1865 [inline]
get_page_from_freelist+0x28c0/0x2960 mm/page_alloc.c:3915
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5210
alloc_pages_mpol+0xd1/0x380 mm/mempolicy.c:2486
alloc_slab_page mm/slub.c:3075 [inline]
allocate_slab+0x86/0x3b0 mm/slub.c:3248
new_slab mm/slub.c:3302 [inline]
___slab_alloc+0xb10/0x13e0 mm/slub.c:4656
__slab_alloc+0xc6/0x1f0 mm/slub.c:4779
__slab_alloc_node mm/slub.c:4855 [inline]
slab_alloc_node mm/slub.c:5251 [inline]
kmem_cache_alloc_noprof+0x101/0x6c0 mm/slub.c:5270
dst_alloc+0x105/0x170 net/core/dst.c:89
ip6_dst_alloc net/ipv6/route.c:342 [inline]
icmp6_dst_alloc+0x75/0x460 net/ipv6/route.c:3333
mld_sendpack+0x683/0xe60 net/ipv6/mcast.c:1844
mld_send_cr net/ipv6/mcast.c:2154 [inline]
mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
page last free pid 5859 tgid 5859 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1406 [inline]
__free_frozen_pages+0xfe1/0x1170 mm/page_alloc.c:2943
discard_slab mm/slub.c:3346 [inline]
__put_partials+0x149/0x170 mm/slub.c:3886
__slab_free+0x2af/0x330 mm/slub.c:5952
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:350
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4953 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_noprof+0x18d/0x6c0 mm/slub.c:5270
getname_flags+0xb8/0x540 fs/namei.c:146
getname include/linux/fs.h:2498 [inline]
do_sys_openat2+0xbc/0x200 fs/open.c:1426
do_sys_open fs/open.c:1436 [inline]
__do_sys_openat fs/open.c:1452 [inline]
__se_sys_openat fs/open.c:1447 [inline]
__x64_sys_openat+0x138/0x170 fs/open.c:1447
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
Fixes: 8d0b94afdca8 ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister")
Fixes: 78df76a065ae ("ipv4: take rt_uncached_lock only if needed")
Reported-by: syzbot+179fc225724092b8b2b2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6964cdf2.050a0220.eaf7.009d.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260112103825.3810713-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In blamed commit, I added a check against the temporary queue
built in __dev_xmit_skb(). Idea was to drop packets early,
before any spinlock was acquired.
if (unlikely(defer_count > READ_ONCE(q->limit))) {
kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
return NET_XMIT_DROP;
}
It turned out that HTB Qdisc has a zero q->limit.
HTB limits packets on a per-class basis.
Some of our tests became flaky.
Add a new sysctl : net.core.qdisc_max_burst to control
how many packets can be stored in the temporary lockless queue.
Also add a new QDISC_BURST_DROP drop reason to better diagnose
future issues.
Thanks Neal !
Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption")
Reported-and-bisected-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot is reporting
unregister_netdevice: waiting for sit0 to become free. Usage count = 2
problem. A debug printk() patch found that a refcount is obtained at
xdp_convert_md_to_buff() from bpf_prog_test_run_xdp().
According to commit ec94670fcb3b ("bpf: Support specifying ingress via
xdp_md context in BPF_PROG_TEST_RUN"), the refcount obtained by
xdp_convert_md_to_buff() will be released by xdp_convert_buff_to_md().
Therefore, we can consider that the error handling path introduced by
commit 1c1949982524 ("bpf: introduce frags support to
bpf_prog_test_run_xdp()") forgot to call xdp_convert_buff_to_md().
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Fixes: 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/af090e53-9d9b-4412-8acb-957733b3975c@I-love.SAKURA.ne.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: enable PA Sync Lost event
* tag 'for-net-2026-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: enable PA Sync Lost event
====================
Link: https://patch.msgid.link/20260109211949.236218-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Analog to commit db5b4e39c4e6 ("ip6_gre: make ip6gre_header() robust")
Over the years, syzbot found many ways to crash the kernel
in ipgre_header() [1].
This involves team or bonding drivers ability to dynamically
change their dev->needed_headroom and/or dev->hard_header_len
In this particular crash mld_newpack() allocated an skb
with a too small reserve/headroom, and by the time mld_sendpack()
was called, syzbot managed to attach an ipgre device.
[1]
skbuff: skb_under_panic: text:ffffffff89ea3cb7 len:2030915468 put:2030915372 head:ffff888058b43000 data:ffff887fdfa6e194 tail:0x120 end:0x6c0 dev:team0
kernel BUG at net/core/skbuff.c:213 !
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 1322 Comm: kworker/1:9 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Workqueue: mld mld_ifc_work
RIP: 0010:skb_panic+0x157/0x160 net/core/skbuff.c:213
Call Trace:
<TASK>
skb_under_panic net/core/skbuff.c:223 [inline]
skb_push+0xc3/0xe0 net/core/skbuff.c:2641
ipgre_header+0x67/0x290 net/ipv4/ip_gre.c:897
dev_hard_header include/linux/netdevice.h:3436 [inline]
neigh_connected_output+0x286/0x460 net/core/neighbour.c:1618
NF_HOOK_COND include/linux/netfilter.h:307 [inline]
ip6_output+0x340/0x550 net/ipv6/ip6_output.c:247
NF_HOOK+0x9e/0x380 include/linux/netfilter.h:318
mld_sendpack+0x8d4/0xe60 net/ipv6/mcast.c:1855
mld_send_cr net/ipv6/mcast.c:2154 [inline]
mld_ifc_work+0x83e/0xd60 net/ipv6/mcast.c:2693
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x510/0xa50 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Reported-by: syzbot+7c134e1c3aa3283790b9@syzkaller.appspotmail.com
Closes: https://www.spinics.net/lists/netdev/msg1147302.html
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260108190214.1667040-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add missing entries in netdev_lock_type[] and netdev_lock_name[] :
CAN, MCTP, RAWIP, CAIF, IP6GRE, 6LOWPAN, NETLINK, VSOCKMON,
IEEE802154_MONITOR.
Also add a WARN_ONCE() in netdev_lock_pos() to help future bug hunting
next time a protocol is added without updating these arrays.
Fixes: 1a33e10e4a95 ("net: partially revert dynamic lockdep key changes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260108093244.830280-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Blamed commit did not take care of VLAN encapsulations
as spotted by syzbot [1].
Use skb_vlan_inet_prepare() instead of pskb_inet_may_pull().
[1]
BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
BUG: KMSAN: uninit-value in IP6_ECN_decapsulate+0x7a8/0x1fa0 include/net/inet_ecn.h:321
__INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline]
INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline]
IP6_ECN_decapsulate+0x7a8/0x1fa0 include/net/inet_ecn.h:321
ip6ip6_dscp_ecn_decapsulate+0x16f/0x1b0 net/ipv6/ip6_tunnel.c:729
__ip6_tnl_rcv+0xed9/0x1b50 net/ipv6/ip6_tunnel.c:860
ip6_tnl_rcv+0xc3/0x100 net/ipv6/ip6_tunnel.c:903
gre_rcv+0x1529/0x1b90 net/ipv6/ip6_gre.c:-1
ip6_protocol_deliver_rcu+0x1c89/0x2c60 net/ipv6/ip6_input.c:438
ip6_input_finish+0x1f4/0x4a0 net/ipv6/ip6_input.c:489
NF_HOOK include/linux/netfilter.h:318 [inline]
ip6_input+0x9c/0x330 net/ipv6/ip6_input.c:500
ip6_mc_input+0x7ca/0xc10 net/ipv6/ip6_input.c:590
dst_input include/net/dst.h:474 [inline]
ip6_rcv_finish+0x958/0x990 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:318 [inline]
ipv6_rcv+0xf1/0x3c0 net/ipv6/ip6_input.c:311
__netif_receive_skb_one_core net/core/dev.c:6139 [inline]
__netif_receive_skb+0x1df/0xac0 net/core/dev.c:6252
netif_receive_skb_internal net/core/dev.c:6338 [inline]
netif_receive_skb+0x57/0x630 net/core/dev.c:6397
tun_rx_batched+0x1df/0x980 drivers/net/tun.c:1485
tun_get_user+0x5c0e/0x6c60 drivers/net/tun.c:1953
tun_chr_write_iter+0x3e9/0x5c0 drivers/net/tun.c:1999
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0xbe2/0x15d0 fs/read_write.c:686
ksys_write fs/read_write.c:738 [inline]
__do_sys_write fs/read_write.c:749 [inline]
__se_sys_write fs/read_write.c:746 [inline]
__x64_sys_write+0x1fb/0x4d0 fs/read_write.c:746
x64_sys_call+0x30ab/0x3e70 arch/x86/include/generated/asm/syscalls_64.h:2
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd3/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4960 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_node_noprof+0x9e7/0x17a0 mm/slub.c:5315
kmalloc_reserve+0x13c/0x4b0 net/core/skbuff.c:586
__alloc_skb+0x805/0x1040 net/core/skbuff.c:690
alloc_skb include/linux/skbuff.h:1383 [inline]
alloc_skb_with_frags+0xc5/0xa60 net/core/skbuff.c:6712
sock_alloc_send_pskb+0xacc/0xc60 net/core/sock.c:2995
tun_alloc_skb drivers/net/tun.c:1461 [inline]
tun_get_user+0x1142/0x6c60 drivers/net/tun.c:1794
tun_chr_write_iter+0x3e9/0x5c0 drivers/net/tun.c:1999
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0xbe2/0x15d0 fs/read_write.c:686
ksys_write fs/read_write.c:738 [inline]
__do_sys_write fs/read_write.c:749 [inline]
__se_sys_write fs/read_write.c:746 [inline]
__x64_sys_write+0x1fb/0x4d0 fs/read_write.c:746
x64_sys_call+0x30ab/0x3e70 arch/x86/include/generated/asm/syscalls_64.h:2
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd3/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 0 UID: 0 PID: 6465 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(none)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Fixes: 8d975c15c0cd ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")
Reported-by: syzbot+d4dda070f833dc5dc89a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/695e88b2.050a0220.1c677c.036d.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260107163109.4188620-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
fdb->updated and fdb->used are read and written locklessly.
Add READ_ONCE()/WRITE_ONCE() annotations.
Fixes: 31cbc39b6344 ("net: bridge: add option to allow activity notifications for any fdb entries")
Reported-by: syzbot+bfab43087ad57222ce96@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/695e3d74.050a0220.1c677c.035f.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260108093806.834459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull ceph fixes from Ilya Dryomov:
"A bunch of libceph fixes split evenly between memory safety and
implementation correctness issues (all marked for stable) and a change
in maintainers for CephFS: Slava and Alex have formally taken over
Xiubo's role"
* tag 'ceph-for-6.19-rc5' of https://github.com/ceph/ceph-client:
libceph: make calc_target() set t->paused, not just clear it
libceph: reset sparse-read state in osd_fault()
libceph: return the handler error from mon_handle_auth_done()
libceph: make free_choose_arg_map() resilient to partial allocation
ceph: update co-maintainers list in MAINTAINERS
libceph: replace overzealous BUG_ON in osdmap_apply_incremental()
libceph: prevent potential out-of-bounds reads in handle_auth_done()
|
|
Enable the PA Sync Lost event mask to ensure PA sync loss is properly
reported and handled.
Fixes: 485e0626e587 ("Bluetooth: hci_event: Fix not handling PA Sync Lost event")
Signed-off-by: Yang Li <yang.li@amlogic.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Inspired by yet another syzbot report.
IPv6 tunnels call netdev_lockdep_set_classes() for each tunnel type,
while IPv4 currently centralizes netdev_lockdep_set_classes() call from
ip_tunnel_init().
Make ip_tunnel_init() a macro, so that we have different lockdep
classes per tunnel type.
Fixes: 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers")
Reported-by: syzbot+1240b33467289f5ab50b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/695d439f.050a0220.1c677c.0347.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260106172426.1760721-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
arp_create() is the only dev_hard_header() caller
making assumption about skb->head being unchanged.
A recent commit broke this assumption.
Initialize @arp pointer after dev_hard_header() call.
Fixes: db5b4e39c4e6 ("ip6_gre: make ip6gre_header() robust")
Reported-by: syzbot+58b44a770a1585795351@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260107212250.384552-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Couple of fixes:
- mac80211:
- long-standing injection bug due to chanctx rework
- more recent interface iteration issue
- collect statistics before removing stations
- hwsim:
- fix NAN frequency typo (potential NULL ptr deref)
- fix locking of radio lock (needs softirqs disabled)
- wext:
- ancient issue with compat and events copying some
uninitialized stack data to userspace
* tag 'wireless-2026-01-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: collect station statistics earlier when disconnect
wifi: mac80211: restore non-chanctx injection behaviour
wifi: mac80211_hwsim: disable BHs for hwsim_radio_lock
wifi: mac80211: don't iterate not running interfaces
wifi: mac80211_hwsim: fix typo in frequency notification
wifi: avoid kernel-infoleak from struct iw_point
====================
Link: https://patch.msgid.link/20260108140141.139687-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NULL pointer dereference fix.
msg_get_inq is an input field from caller to callee. Don't set it in
the callee, as the caller may not clear it on struct reuse.
This is a kernel-internal variant of msghdr only, and the only user
does reinitialize the field. So this is not critical for that reason.
But it is more robust to avoid the write, and slightly simpler code.
And it fixes a bug, see below.
Callers set msg_get_inq to request the input queue length to be
returned in msg_inq. This is equivalent to but independent from the
SO_INQ request to return that same info as a cmsg (tp->recvmsg_inq).
To reduce branching in the hot path the second also sets the msg_inq.
That is WAI.
This is a fix to commit 4d1442979e4a ("af_unix: don't post cmsg for
SO_INQ unless explicitly asked for"), which fixed the inverse.
Also avoid NULL pointer dereference in unix_stream_read_generic if
state->msg is NULL and msg->msg_get_inq is written. A NULL state->msg
can happen when splicing as of commit 2b514574f7e8 ("net: af_unix:
implement splice for stream af_unix sockets").
Also collapse two branches using a bitwise or.
Cc: stable@vger.kernel.org
Fixes: 4d1442979e4a ("af_unix: don't post cmsg for SO_INQ unless explicitly asked for")
Link: https://lore.kernel.org/netdev/willemdebruijn.kernel.24d8030f7a3de@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260106150626.3944363-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
qfq_reset
`qfq_class->leaf_qdisc->q.qlen > 0` does not imply that the class
itself is active.
Two qfq_class objects may point to the same leaf_qdisc. This happens
when:
1. one QFQ qdisc is attached to the dev as the root qdisc, and
2. another QFQ qdisc is temporarily referenced (e.g., via qdisc_get()
/ qdisc_put()) and is pending to be destroyed, as in function
tc_new_tfilter.
When packets are enqueued through the root QFQ qdisc, the shared
leaf_qdisc->q.qlen increases. At the same time, the second QFQ
qdisc triggers qdisc_put and qdisc_destroy: the qdisc enters
qfq_reset() with its own q->q.qlen == 0, but its class's leaf
qdisc->q.qlen > 0. Therefore, the qfq_reset would wrongly deactivate
an inactive aggregate and trigger a null-deref in qfq_deactivate_agg:
[ 0.903172] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 0.903571] #PF: supervisor write access in kernel mode
[ 0.903860] #PF: error_code(0x0002) - not-present page
[ 0.904177] PGD 10299b067 P4D 10299b067 PUD 10299c067 PMD 0
[ 0.904502] Oops: Oops: 0002 [#1] SMP NOPTI
[ 0.904737] CPU: 0 UID: 0 PID: 135 Comm: exploit Not tainted 6.19.0-rc3+ #2 NONE
[ 0.905157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 0.905754] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
[ 0.906046] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0
Code starting with the faulting instruction
===========================================
0: 0f 84 4d 01 00 00 je 0x153
6: 48 89 70 18 mov %rsi,0x18(%rax)
a: 8b 4b 10 mov 0x10(%rbx),%ecx
d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
14: 48 8b 78 08 mov 0x8(%rax),%rdi
18: 48 d3 e2 shl %cl,%rdx
1b: 48 21 f2 and %rsi,%rdx
1e: 48 2b 13 sub (%rbx),%rdx
21: 48 8b 30 mov (%rax),%rsi
24: 48 d3 ea shr %cl,%rdx
27: 8b 4b 18 mov 0x18(%rbx),%ecx
...
[ 0.907095] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
[ 0.907368] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
[ 0.907723] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 0.908100] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
[ 0.908451] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
[ 0.908804] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
[ 0.909179] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
[ 0.909572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.909857] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
[ 0.910247] PKRU: 55555554
[ 0.910391] Call Trace:
[ 0.910527] <TASK>
[ 0.910638] qfq_reset_qdisc (net/sched/sch_qfq.c:357 net/sched/sch_qfq.c:1485)
[ 0.910826] qdisc_reset (include/linux/skbuff.h:2195 include/linux/skbuff.h:2501 include/linux/skbuff.h:3424 include/linux/skbuff.h:3430 net/sched/sch_generic.c:1036)
[ 0.911040] __qdisc_destroy (net/sched/sch_generic.c:1076)
[ 0.911236] tc_new_tfilter (net/sched/cls_api.c:2447)
[ 0.911447] rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
[ 0.911663] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6861)
[ 0.911894] netlink_rcv_skb (net/netlink/af_netlink.c:2550)
[ 0.912100] netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
[ 0.912296] ? __alloc_skb (net/core/skbuff.c:706)
[ 0.912484] netlink_sendmsg (net/netlink/af_netlink.c:1894)
[ 0.912682] sock_write_iter (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1) net/socket.c:1195 (discriminator 1))
[ 0.912880] vfs_write (fs/read_write.c:593 fs/read_write.c:686)
[ 0.913077] ksys_write (fs/read_write.c:738)
[ 0.913252] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 0.913438] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
[ 0.913687] RIP: 0033:0x424c34
[ 0.913844] Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 2d 44 09 00 00 74 13 b8 01 00 00 00 0f 05 9
Code starting with the faulting instruction
===========================================
0: 89 02 mov %eax,(%rdx)
2: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
9: eb bd jmp 0xffffffffffffffc8
b: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
12: 00 00 00
15: 90 nop
16: f3 0f 1e fa endbr64
1a: 80 3d 2d 44 09 00 00 cmpb $0x0,0x9442d(%rip) # 0x9444e
21: 74 13 je 0x36
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 05 syscall
2a: 09 .byte 0x9
[ 0.914807] RSP: 002b:00007ffea1938b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 0.915197] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000424c34
[ 0.915556] RDX: 000000000000003c RSI: 000000002af378c0 RDI: 0000000000000003
[ 0.915912] RBP: 00007ffea1938bc0 R08: 00000000004b8820 R09: 0000000000000000
[ 0.916297] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffea1938d28
[ 0.916652] R13: 00007ffea1938d38 R14: 00000000004b3828 R15: 0000000000000001
[ 0.917039] </TASK>
[ 0.917158] Modules linked in:
[ 0.917316] CR2: 0000000000000000
[ 0.917484] ---[ end trace 0000000000000000 ]---
[ 0.917717] RIP: 0010:qfq_deactivate_agg (include/linux/list.h:992 (discriminator 2) include/linux/list.h:1006 (discriminator 2) net/sched/sch_qfq.c:1367 (discriminator 2) net/sched/sch_qfq.c:1393 (discriminator 2))
[ 0.917978] Code: 0f 84 4d 01 00 00 48 89 70 18 8b 4b 10 48 c7 c2 ff ff ff ff 48 8b 78 08 48 d3 e2 48 21 f2 48 2b 13 48 8b 30 48 d3 ea 8b 4b 18 0
Code starting with the faulting instruction
===========================================
0: 0f 84 4d 01 00 00 je 0x153
6: 48 89 70 18 mov %rsi,0x18(%rax)
a: 8b 4b 10 mov 0x10(%rbx),%ecx
d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
14: 48 8b 78 08 mov 0x8(%rax),%rdi
18: 48 d3 e2 shl %cl,%rdx
1b: 48 21 f2 and %rsi,%rdx
1e: 48 2b 13 sub (%rbx),%rdx
21: 48 8b 30 mov (%rax),%rsi
24: 48 d3 ea shr %cl,%rdx
27: 8b 4b 18 mov 0x18(%rbx),%ecx
...
[ 0.918902] RSP: 0018:ffffc900004a39a0 EFLAGS: 00010246
[ 0.919198] RAX: ffff8881043a0880 RBX: ffff888102953340 RCX: 0000000000000000
[ 0.919559] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 0.919908] RBP: ffff888102952180 R08: 0000000000000000 R09: 0000000000000000
[ 0.920289] R10: ffff8881043a0000 R11: 0000000000000000 R12: ffff888102952000
[ 0.920648] R13: ffff888102952180 R14: ffff8881043a0ad8 R15: ffff8881043a0880
[ 0.921014] FS: 000000002a1a0380(0000) GS:ffff888196d8d000(0000) knlGS:0000000000000000
[ 0.921424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.921710] CR2: 0000000000000000 CR3: 0000000102993002 CR4: 0000000000772ef0
[ 0.922097] PKRU: 55555554
[ 0.922240] Kernel panic - not syncing: Fatal exception
[ 0.922590] Kernel Offset: disabled
Fixes: 0545a3037773 ("pkt_sched: QFQ - quick fair queue scheduler")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/20260106034100.1780779-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In __sta_info_destroy_part2(), station statistics are requested after the
IEEE80211_STA_NONE -> IEEE80211_STA_NOTEXIST transition. This is
problematic because the driver may be unable to handle the request due to
the STA being in the NOTEXIST state (i.e. if the driver destroys the
underlying data when transitioning to NOTEXIST).
Move the statistics collection to before the state transition to avoid
this issue.
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20251222-mac80211-move-station-stats-collection-earlier-v1-1-12cd4e42c633@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
During the transition to use channel contexts throughout, the
ability to do injection while in monitor mode concurrent with
another interface was lost, since the (virtual) monitor won't
have a chanctx assigned in this scenario.
It's harder to fix drivers that actually transitioned to using
channel contexts themselves, such as mt76, but it's easy to do
those that are (still) just using the emulation. Do that.
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218763
Reported-and-tested-by: Oscar Alfonso Diaz <oscar.alfonso.diaz@gmail.com>
Fixes: 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx drivers")
Link: https://patch.msgid.link/20251216105242.18366-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
for_each_chanctx_user_* was introdcued as a replacement for
for_each_sdata_link, which visits also other chanctx users that are not
link.
for_each_sdata_link skips not running interfaces, do the same for
for_each_chanctx_user_*
Fixes: 1ce954c98b89 ("wifi: mac80211: add and use chanctx usage iteration")
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260107143736.55c084e2a976.I38b7b904a135dadca339321923b501b2c2c5c8c0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
struct iw_point has a 32bit hole on 64bit arches.
struct iw_point {
void __user *pointer; /* Pointer to the data (in user space) */
__u16 length; /* number of fields or size in bytes */
__u16 flags; /* Optional params */
};
Make sure to zero the structure to avoid disclosing 32bits of kernel data
to user space.
Fixes: 87de87d5e47f ("wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c")
Reported-by: syzbot+bfc7323743ca6dbcc3d3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/695f83f3.050a0220.1c677c.0392.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260108101927.857582-1-edumazet@google.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
syzbot reported a crash in tc_act_in_hw() during netns teardown where
tcf_idrinfo_destroy() passed an ERR_PTR(-EBUSY) value as a tc_action
pointer, leading to an invalid dereference.
Guard against ERR_PTR entries when iterating the action IDR so teardown
does not call tc_act_in_hw() on an error pointer.
Fixes: 84a7d6797e6a ("net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit()")
Link: https://syzkaller.appspot.com/bug?extid=8f1c492ffa4644ff3826
Reported-by: syzbot+8f1c492ffa4644ff3826@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8f1c492ffa4644ff3826
Signed-off-by: Shivani Gupta <shivani07g@gmail.com>
Link: https://patch.msgid.link/20260105005905.243423-1-shivani07g@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Standard UDP receive path does not use skb->destructor.
But skmsg layer does use it, since it calls skb_set_owner_sk_safe()
from udp_read_skb().
This then triggers this warning in skb_attempt_defer_free():
DEBUG_NET_WARN_ON_ONCE(skb->destructor);
We must call skb_orphan() to fix this issue.
Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()")
Reported-by: syzbot+3e68572cf2286ce5ebe9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/695b83bd.050a0220.1c9965.002b.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260105093630.1976085-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The xdp_frame structure takes up part of the XDP frame headroom,
limiting the size of the metadata. However, in bpf_test_run, we don't
take this into account, which makes it possible for userspace to supply
a metadata size that is too large (taking up the entire headroom).
If userspace supplies such a large metadata size in live packet mode,
the xdp_update_frame_from_buff() call in xdp_test_run_init_page() call
will fail, after which packet transmission proceeds with an
uninitialised frame structure, leading to the usual Bad Stuff.
The commit in the Fixes tag fixed a related bug where the second check
in xdp_update_frame_from_buff() could fail, but did not add any
additional constraints on the metadata size. Complete the fix by adding
an additional check on the metadata size. Reorder the checks slightly to
make the logic clearer and add a comment.
Link: https://lore.kernel.org/r/fa2be179-bad7-4ee3-8668-4903d1853461@hust.edu.cn
Fixes: b6f1f780b393 ("bpf, test_run: Fix packet size check for live packet mode")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260105114747.1358750-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When skb_segment_list() is called during packet forwarding, it handles
packets that were aggregated by the GRO engine.
Historically, the segmentation logic in skb_segment_list assumes that
individual segments are split from a parent SKB and may need to carry
their own socket memory accounting. Accordingly, the code transfers
truesize from the parent to the newly created segments.
Prior to commit ed4cccef64c1 ("gro: fix ownership transfer"), this
truesize subtraction in skb_segment_list() was valid because fragments
still carry a reference to the original socket.
However, commit ed4cccef64c1 ("gro: fix ownership transfer") changed
this behavior by ensuring that fraglist entries are explicitly
orphaned (skb->sk = NULL) to prevent illegal orphaning later in the
stack. This change meant that the entire socket memory charge remained
with the head SKB, but the corresponding accounting logic in
skb_segment_list() was never updated.
As a result, the current code unconditionally adds each fragment's
truesize to delta_truesize and subtracts it from the parent SKB. Since
the fragments are no longer charged to the socket, this subtraction
results in an effective under-count of memory when the head is freed.
This causes sk_wmem_alloc to remain non-zero, preventing socket
destruction and leading to a persistent memory leak.
The leak can be observed via KMEMLEAK when tearing down the networking
environment:
unreferenced object 0xffff8881e6eb9100 (size 2048):
comm "ping", pid 6720, jiffies 4295492526
backtrace:
kmem_cache_alloc_noprof+0x5c6/0x800
sk_prot_alloc+0x5b/0x220
sk_alloc+0x35/0xa00
inet6_create.part.0+0x303/0x10d0
__sock_create+0x248/0x640
__sys_socket+0x11b/0x1d0
Since skb_segment_list() is exclusively used for SKB_GSO_FRAGLIST
packets constructed by GRO, the truesize adjustment is removed.
The call to skb_release_head_state() must be preserved. As documented in
commit cf673ed0e057 ("net: fix fraglist segmentation reference count
leak"), it is still required to correctly drop references to SKB
extensions that may be overwritten during __copy_skb_header().
Fixes: ed4cccef64c1 ("gro: fix ownership transfer")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260104213101.352887-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Whenever a mirred redirect to self on egress happens, mirred allocates a
new skb (skb_to_send). The loop to self check was done after that
allocation, but was not freeing the newly allocated skb, causing a leak.
Fix this by moving the if-statement to before the allocation of the new
skb.
The issue was found by running the accompanying tdc test in 2/2
with config kmemleak enabled.
After a few minutes the kmemleak thread ran and reported the leak coming from
mirred.
Fixes: 1d856251a009 ("net/sched: act_mirred: fix loop detection")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260101135608.253079-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SO_ZEROCOPY handling in vsock_connectible_setsockopt() does not get called
on accept()ed sockets due to a missing flag. Flip it.
Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20251229-vsock-child-sock-custom-sockopt-v2-1-64778d6c4f88@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently calc_target() clears t->paused if the request shouldn't be
paused anymore, but doesn't ever set t->paused even though it's able to
determine when the request should be paused. Setting t->paused is left
to __submit_request() which is fine for regular requests but doesn't
work for linger requests -- since __submit_request() doesn't operate
on linger requests, there is nowhere for lreq->t.paused to be set.
One consequence of this is that watches don't get reestablished on
paused -> unpaused transitions in cases where requests have been paused
long enough for the (paused) unwatch request to time out and for the
subsequent (re)watch request to enter the paused state. On top of the
watch not getting reestablished, rbd_reregister_watch() gets stuck with
rbd_dev->watch_mutex held:
rbd_register_watch
__rbd_register_watch
ceph_osdc_watch
linger_reg_commit_wait
It's waiting for lreq->reg_commit_wait to be completed, but for that to
happen the respective request needs to end up on need_resend_linger list
and be kicked when requests are unpaused. There is no chance for that
if the request in question is never marked paused in the first place.
The fact that rbd_dev->watch_mutex remains taken out forever then
prevents the image from getting unmapped -- "rbd unmap" would inevitably
hang in D state on an attempt to grab the mutex.
Cc: stable@vger.kernel.org
Reported-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
|
|
When a fault occurs, the connection is abandoned, reestablished, and any
pending operations are retried. The OSD client tracks the progress of a
sparse-read reply using a separate state machine, largely independent of
the messenger's state.
If a connection is lost mid-payload or the sparse-read state machine
returns an error, the sparse-read state is not reset. The OSD client
will then interpret the beginning of a new reply as the continuation of
the old one. If this makes the sparse-read machinery enter a failure
state, it may never recover, producing loops like:
libceph: [0] got 0 extents
libceph: data len 142248331 != extent len 0
libceph: osd0 (1)...:6801 socket error on read
libceph: data len 142248331 != extent len 0
libceph: osd0 (1)...:6801 socket error on read
Therefore, reset the sparse-read state in osd_fault(), ensuring retries
start from a clean state.
Cc: stable@vger.kernel.org
Fixes: f628d7999727 ("libceph: add sparse read support to OSD client")
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Currently any error from ceph_auth_handle_reply_done() is propagated
via finish_auth() but isn't returned from mon_handle_auth_done(). This
results in higher layers learning that (despite the monitor considering
us to be successfully authenticated) something went wrong in the
authentication phase and reacting accordingly, but msgr2 still trying
to proceed with establishing the session in the background. In the
case of secure mode this can trigger a WARN in setup_crypto() and later
lead to a NULL pointer dereference inside of prepare_auth_signature().
Cc: stable@vger.kernel.org
Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
|
|
free_choose_arg_map() may dereference a NULL pointer if its caller fails
after a partial allocation.
For example, in decode_choose_args(), if allocation of arg_map->args
fails, execution jumps to the fail label and free_choose_arg_map() is
called. Since arg_map->size is updated to a non-zero value before memory
allocation, free_choose_arg_map() will iterate over arg_map->args and
dereference a NULL pointer.
To prevent this potential NULL pointer dereference and make
free_choose_arg_map() more resilient, add checks for pointers before
iterating.
Cc: stable@vger.kernel.org
Co-authored-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Tuo Li <islituo@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
If the osdmap is (maliciously) corrupted such that the incremental
osdmap epoch is different from what is expected, there is no need to
BUG. Instead, just declare the incremental osdmap to be invalid.
Cc: stable@vger.kernel.org
Reported-by: ziming zhang <ezrakiez@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Perform an explicit bounds check on payload_len to avoid a possible
out-of-bounds access in the callout.
[ idryomov: changelog ]
Cc: stable@vger.kernel.org
Signed-off-by: ziming zhang <ezrakiez@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: updates for net
The following patchset contains Netfilter fixes for *net*:
1) Fix overlap detection for nf_tables with concatenated ranges.
There are cases where element could not be added due to a conflict
with existing range, while kernel reports success to userspace.
2) update selftest to cover this bug.
3) synproxy update path should use READ/WRITE once as we replace
config struct while packet path might read it in parallel.
This relies on said config struct to fit sizeof(long).
From Fernando Fernandez Mancera.
4) Don't return -EEXIST from xtables in module load path, a pending
patch to module infra will spot a warning if this happens.
From Daniel Gomez.
5) Fix a memory leak in nf_tables when chain hits 2**32 users
and rule is to be hw-offloaded, from Zilin Guan.
6) Avoid infinite list growth when insert rate is high in nf_conncount,
also from Fernando.
* tag 'nf-26-01-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_conncount: update last_gc only when GC has been performed
netfilter: nf_tables: fix memory leak in nf_tables_newrule()
netfilter: replace -EEXIST with -EBUSY
netfilter: nft_synproxy: avoid possible data-race on update operation
selftests: netfilter: nft_concat_range.sh: add check for overlap detection bug
netfilter: nft_set_pipapo: fix range overlap detection
====================
Link: https://patch.msgid.link/20260102114128.7007-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub added a warning in nf_conntrack_cleanup_net_list() to make debugging
leaked skbs/conntrack references more obvious.
syzbot reports this as triggering, and I can also reproduce this via
ip_defrag.sh selftest:
conntrack cleanup blocked for 60s
WARNING: net/netfilter/nf_conntrack_core.c:2512
[..]
conntrack clenups gets stuck because there are skbs with still hold nf_conn
references via their frag_list.
net.core.skb_defer_max=0 makes the hang disappear.
Eric Dumazet points out that skb_release_head_state() doesn't follow the
fraglist.
ip_defrag.sh can only reproduce this problem since
commit 6471658dc66c ("udp: use skb_attempt_defer_free()"), but AFAICS this
problem could happen with TCP as well if pmtu discovery is off.
The relevant problem path for udp is:
1. netns emits fragmented packets
2. nf_defrag_v6_hook reassembles them (in output hook)
3. reassembled skb is tracked (skb owns nf_conn reference)
4. ip6_output refragments
5. refragmented packets also own nf_conn reference (ip6_fragment
calls ip6_copy_metadata())
6. on input path, nf_defrag_v6_hook skips defragmentation: the
fragments already have skb->nf_conn attached
7. skbs are reassembled via ipv6_frag_rcv()
8. skb_consume_udp -> skb_attempt_defer_free() -> skb ends up
in pcpu freelist, but still has nf_conn reference.
Possible solutions:
1 let defrag engine drop nf_conn entry, OR
2 export kick_defer_list_purge() and call it from the conntrack
netns exit callback, OR
3 add skb_has_frag_list() check to skb_attempt_defer_free()
2 & 3 also solve ip_defrag.sh hang but share same drawback:
Such reassembled skbs, queued to socket, can prevent conntrack module
removal until userspace has consumed the packet. While both tcp and udp
stack do call nf_reset_ct() before placing skb on socket queue, that
function doesn't iterate frag_list skbs.
Therefore drop nf_conn entries when they are placed in defrag queue.
Keep the nf_conn entry of the first (offset 0) skb so that reassembled
skb retains nf_conn entry for sake of TX path.
Note that fixes tag is incorrect; it points to the commit introducing the
'ip_defrag.sh reproducible problem': no need to backport this patch to
every stable kernel.
Reported-by: syzbot+4393c47753b7808dac7d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/693b0fa7.050a0220.4004e.040d.GAE@google.com/
Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260102140030.32367-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
skbuff_fclone_cache was created without defining a usercopy region,
[1] unlike skbuff_head_cache which properly whitelists the cb[] field.
[2] This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is
enabled and the kernel attempts to copy sk_buff.cb data to userspace
via sock_recv_errqueue() -> put_cmsg().
The crash occurs when: 1. TCP allocates an skb using alloc_skb_fclone()
(from skbuff_fclone_cache) [1]
2. The skb is cloned via skb_clone() using the pre-allocated fclone
[3] 3. The cloned skb is queued to sk_error_queue for timestamp
reporting 4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE)
5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb
[4] 6. __check_heap_object() fails because skbuff_fclone_cache has no
usercopy whitelist [5]
When cloned skbs allocated from skbuff_fclone_cache are used in the
socket error queue, accessing the sock_exterr_skb structure in skb->cb
via put_cmsg() triggers a usercopy hardening violation:
[ 5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)!
[ 5.382796] kernel BUG at mm/usercopy.c:102!
[ 5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[ 5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7
[ 5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 5.384903] RIP: 0010:usercopy_abort+0x6c/0x80
[ 5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490
[ 5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246
[ 5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74
[ 5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0
[ 5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74
[ 5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001
[ 5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00
[ 5.384903] FS: 0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000
[ 5.384903] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0
[ 5.384903] PKRU: 55555554
[ 5.384903] Call Trace:
[ 5.384903] <TASK>
[ 5.384903] __check_heap_object+0x9a/0xd0
[ 5.384903] __check_object_size+0x46c/0x690
[ 5.384903] put_cmsg+0x129/0x5e0
[ 5.384903] sock_recv_errqueue+0x22f/0x380
[ 5.384903] tls_sw_recvmsg+0x7ed/0x1960
[ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5.384903] ? schedule+0x6d/0x270
[ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5.384903] ? mutex_unlock+0x81/0xd0
[ 5.384903] ? __pfx_mutex_unlock+0x10/0x10
[ 5.384903] ? __pfx_tls_sw_recvmsg+0x10/0x10
[ 5.384903] ? _raw_spin_lock_irqsave+0x8f/0xf0
[ 5.384903] ? _raw_read_unlock_irqrestore+0x20/0x40
[ 5.384903] ? srso_alias_return_thunk+0x5/0xfbef5
The crash offset 296 corresponds to skb2->cb within skbuff_fclones:
- sizeof(struct sk_buff) = 232 - offsetof(struct sk_buff, cb) = 40 -
offset of skb2.cb in fclones = 232 + 40 = 272 - crash offset 296 =
272 + 24 (inside sock_exterr_skb.ee)
This patch uses a local stack variable as a bounce buffer to avoid the hardened usercopy check failure.
[1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885
[2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104
[3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566
[4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491
[5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719
Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251223203534.1392218-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the ping program uses an IPPROTO_ICMP socket to send ICMP_ECHO
messages, ICMP_MIB_OUTMSGS is counted twice.
ping_v4_sendmsg
ping_v4_push_pending_frames
ip_push_pending_frames
ip_finish_skb
__ip_make_skb
icmp_out_count(net, icmp_type); // first count
icmp_out_count(sock_net(sk), user_icmph.type); // second count
However, when the ping program uses an IPPROTO_RAW socket,
ICMP_MIB_OUTMSGS is counted correctly only once.
Therefore, the first count should be removed.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251224063145.3615282-1-yuan.gao@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When using an 802.1ad bridge with vlan_tunnel, the C-VLAN tag is
incorrectly stripped from frames during egress processing.
br_handle_egress_vlan_tunnel() uses skb_vlan_pop() to remove the S-VLAN
from hwaccel before VXLAN encapsulation. However, skb_vlan_pop() also
moves any "next" VLAN from the payload into hwaccel:
/* move next vlan tag to hw accel tag */
__skb_vlan_pop(skb, &vlan_tci);
__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
For QinQ frames where the C-VLAN sits in the payload, this moves it to
hwaccel where it gets lost during VXLAN encapsulation.
Fix by calling __vlan_hwaccel_clear_tag() directly, which clears only
the hwaccel S-VLAN and leaves the payload untouched.
This path is only taken when vlan_tunnel is enabled and tunnel_info
is configured, so 802.1Q bridges are unaffected.
Tested with 802.1ad bridge + VXLAN vlan_tunnel, verified C-VLAN
preserved in VXLAN payload via tcpdump.
Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Signed-off-by: Alexandre Knecht <knecht.alexandre@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251228020057.2788865-1-knecht.alexandre@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently last_gc is being updated everytime a new connection is
tracked, that means that it is updated even if a GC wasn't performed.
With a sufficiently high packet rate, it is possible to always bypass
the GC, causing the list to grow infinitely.
Update the last_gc value only when a GC has been actually performed.
Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
In nf_tables_newrule(), if nft_use_inc() fails, the function jumps to
the err_release_rule label without freeing the allocated flow, leading
to a memory leak.
Fix this by adding a new label err_destroy_flow and jumping to it when
nft_use_inc() fails. This ensures that the flow is properly released
in this error case.
Fixes: 1689f25924ada ("netfilter: nf_tables: report use refcount overflow")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
The -EEXIST error code is reserved by the module loading infrastructure
to indicate that a module is already loaded. When a module's init
function returns -EEXIST, userspace tools like kmod interpret this as
"module already loaded" and treat the operation as successful, returning
0 to the user even though the module initialization actually failed.
Replace -EEXIST with -EBUSY to ensure correct error reporting in the module
initialization path.
Affected modules:
* ebtable_broute ebtable_filter ebtable_nat arptable_filter
* ip6table_filter ip6table_mangle ip6table_nat ip6table_raw
* ip6table_security iptable_filter iptable_mangle iptable_nat
* iptable_raw iptable_security
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
During nft_synproxy eval we are reading nf_synproxy_info struct which
can be modified on update operation concurrently. As nf_synproxy_info
struct fits in 32 bits, use READ_ONCE/WRITE_ONCE annotations.
Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
set->klen has to be used, not sizeof(). The latter only compares a
single register but a full check of the entire key is needed.
Example:
table ip t {
map s {
typeof iifname . ip saddr : verdict
flags interval
}
}
nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected
nft add element t s '{ "lo" . 10.0.0.0/24 : drop }' # no error, expected
nft add element t s '{ "lo" . 10.0.0.0/8 : drop }' # bug: no error
The 3rd 'add element' should be rejected via -ENOTEMPTY, not -EEXIST,
so userspace / nft can report an error to the user.
The latter is only correct for the 2nd case (re-add of existing element).
As-is, userspace is told that the command was successful, but no elements were
added.
After this patch, 3rd command gives:
Error: Could not process rule: File exists
add element t s { "lo" . 127.0.0.0/8 . "lo" : drop }
^^^^^^^^^^^^^^^^^^^^^^^^^
Fixes: 0eb4b5ee33f2 ("netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion")
Signed-off-by: Florian Westphal <fw@strlen.de>
|