summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
14 daysnetfilter: nft_ct: fix missing expect put in obj evalLi Xiasong
nft_ct_expect_obj_eval() allocates an expectation and may call nf_ct_expect_related(), but never drops its local reference. Add nf_ct_expect_put(exp) before return to balance allocation. Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14 daysnetfilter: nf_conntrack_sip: get helper before allocating expectationLi Xiasong
process_register_request() allocates an expectation and then checks whether a conntrack helper is available. If helper lookup fails, the function returns early and the allocated expectation is left behind. Reorder the code to fetch and validate helper before calling nf_ct_expect_alloc(). This keeps the logic simpler and removes the leak path while preserving existing behavior. Fixes: e14575fa7529 ("netfilter: nf_conntrack: use rcu accessors where needed") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14 daysnetfilter: ctnetlink: check tuple and mask in expectations created via nfqueuePablo Neira Ayuso
Ensure the expectation tuple and mask attributes are present in netlink message, otherwise null-ptr-deref is possible. Fixes: bd0779370588 ("netfilter: nfnetlink_queue: allow to attach expectations to conntracks") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14 daysnetfilter: nf_conntrack_expect: restore helper propagation via expectationPablo Neira Ayuso
A recent series to fix expectations broke helper propagation via expectation, this mechanism is used by the sip and h323 helper. This also propagates the conntrack helper to expected connections. I changed semantics of exp->helper which now tells us the actual helper that created the expectation. Add an explicit assign_helper field to expectations for this purpose and update helpers to use it. Restore this feature for userspace conntrack helper via ctnetlink nfqueue integration so it is again possible to attach a helper to an expectation, where it makes sense. This is not restored via ctnetlink expectation creation as there is no client for such feature. Use the expectation layer 4 protocol number for the helper lookup for consistency. Make sure the expectation using this helper propagation mechanism also go away when the helper is unregistered. Fixes: 9c42bc9db90a ("netfilter: nf_conntrack_expect: honor expectation helper field") Fixes: 917b61fa2042 ("netfilter: ctnetlink: ignore explicit helper on new expectations") Reported-by: Ilya Maximets <i.maximets@ovn.org> Tested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14 daysnetfilter: x_tables: add and use xtables_unregister_table_exitFlorian Westphal
Previous change added xtables_unregister_table_pre_exit to detach the table from the packetpath and to unlink it from the active table list. In case of rmmod, userspace that is doing set/getsockopt for this table will not be able to re-instantiate the table: 1. The larval table has been removed already 2. existing instantiated table is no longer on the xt pernet table list. This adds the second stage helper: unlink the table from the dying list, free the hook ops (if any) and do the audit notification. It replaces xt_unregister_table(). Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: Tristan Madani <tristan@talencesecurity.com> Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14 daysnetfilter: x_tables: add and use xt_unregister_table_pre_exitFlorian Westphal
Remove the copypasted variants of _pre_exit and add one single function in the xtables core. ebtables is not compatible with x_tables and therefore unchanged. This is a preparation patch to reduce noise in the followup bug fixes. Reviewed-by: Tristan Madani <tristan@talencesecurity.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14 daysnetfilter: x_tables: allocate hook ops while under mutexFlorian Westphal
arp/ip(6)t_register_table() add the table to the per-netns list via xt_register_table() before allocating the per-netns hook ops copy via kmemdup_array(). This leaves a window where the table is visible in the list with ops=NULL. If the pernet exit happens runs concurrently the pre_exit callback finds the table via xt_find_table() and passes the NULL ops pointer to nf_unregister_net_hooks(), causing a NULL dereference: general protection fault in nf_unregister_net_hooks+0xbc/0x150 RIP: nf_unregister_net_hooks (net/netfilter/core.c:613) Call Trace: ipt_unregister_table_pre_exit iptable_mangle_net_pre_exit ops_pre_exit_list cleanup_net Fix by moving the ops allocation into the xtables core so the table is never in the list without valid ops. Also ensure the table is no longer processing packets before its torn down on error unwind. nf_register_net_hooks might have published at least one hook; call synchronize_rcu() if there was an error. audit log register message gets deferred until all operations have passed, this avoids need to emit another ureg message in case of error unwinding. Based on earlier patch by Tristan Madani. Fixes: f9006acc8dfe5 ("netfilter: arp_tables: pass table pointer via nf_hook_ops") Fixes: ee177a54413a ("netfilter: ip6_tables: pass table pointer via nf_hook_ops") Fixes: ae689334225f ("netfilter: ip_tables: pass table pointer via nf_hook_ops") Link: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/ Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
14 daysnetfilter: x_tables: allow initial table replace without emitting audit log ↵Florian Westphal
message At the moment we emit the audit log a bit too early, which makes it necessary to also emit an unregister log in case we have to unwind errors after possible hook register failure. Followup patch will be slightly simpler if we can delay the register message until after the hooks have been wired up. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-05ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCUWaiman Long
The ip_vs_ctl.c file and the associated ip_vs.h file are the only places in the kernel where HK_TYPE_KTHREAD cpumask is being retrieved and used. Now that HK_TYPE_KTHREAD/HK_TYPE_DOMAIN cpumask can be changed at run time. We need to use RCU to guard access to this cpumask to avoid a potential UAF problem as the returned cpumask may be freed before it is being used. We can replace HK_TYPE_KTHREAD by HK_TYPE_DOMAIN as they are aliases of each other, but keeping the HK_TYPE_KTHREAD name can highlight the fact that it is the kthread initiated by ipvs that is being controlled. Fixes: 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-05ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_sizeJulian Anastasov
Calling roundup_pow_of_two() with 0 has undefined result: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'unsigned long' CPU: 1 UID: 0 PID: 77 Comm: kworker/u8:4 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026 Workqueue: events_unbound conn_resize_work_handler Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 ubsan_epilogue+0xa/0x30 lib/ubsan.c:233 __ubsan_handle_shift_out_of_bounds+0x385/0x410 lib/ubsan.c:494 __roundup_pow_of_two include/linux/log2.h:57 [inline] ip_vs_rht_desired_size+0x2cf/0x410 net/netfilter/ipvs/ip_vs_core.c:240 ip_vs_conn_desired_size net/netfilter/ipvs/ip_vs_conn.c:765 [inline] conn_resize_work_handler+0x1b6/0x14c0 net/netfilter/ipvs/ip_vs_conn.c:822 process_one_work kernel/workqueue.c:3302 [inline] process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3385 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3466 kthread+0x388/0x470 kernel/kthread.c:436 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Reported-by: syzbot+217f1db9c791e27fe54a@syzkaller.appspotmail.com Fixes: b655388111cf ("ipvs: add resizable hash tables") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-05ipvs: fix races around est_mutex and est_cpulistJulian Anastasov
Sashiko reports for races and possible crash around the usage of est_cpulist_valid and sysctl_est_cpulist. The problem is that we do not lock est_mutex in some places which can lead to wrong write ordering and as result problems when calling cpumask_weight() and cpumask_empty(). Fix them by moving the est_max_threads read/write under locked est_mutex. Do the same for one ip_vs_est_reload_start() call to protect the cpumask_empty() usage of sysctl_est_cpulist. To remove the chance of deadlock while stopping the estimation kthreads, keep the data structure for kthread 0 even after last estimator is removed and do not hold mutexes while stopping this task. Now we will use a new flag 'needed' to know when kthread 0 should run. The kthreads above 0 do not use mutexes, so stop them under est_mutex because their kthread data still can be destroyed if they do not serve estimators. Now all kthreads will be started by the est_reload_work to properly serialize the stop/start for kthread 0. Reduce the use of service_mutex in ip_vs_est_calc_phase() because under est_mutex we can safely walk est_kt_arr to stop the kthreads above slot 0. As ip_vs_stop_estimator() for tot_stats should be called under service_mutex, do it early in the netns exit path in ip_vs_flush() to avoid locking the mutex again later. It still should be called in ip_vs_control_net_cleanup_sysctl() when we are called during netns init error. Use -2 for ktid as indicator if estimator was already stopped. Finally, fix use-after-free for kd->est_row in ip_vs_est_calc_phase(). est->ktrow should simply switch to a delay value while estimator is linked to est_temp_list. Link: https://sashiko.dev/#/patchset/20260331165015.2777765-1-longman%40redhat.com Link: https://sashiko.dev/#/patchset/20260420171308.87192-1-ja%40ssi.bg Link: https://sashiko.dev/#/patchset/20260422125123.40658-1-ja%40ssi.bg Link: https://sashiko.dev/#/patchset/20260424175858.54752-1-ja%40ssi.bg Link: https://sashiko.dev/#/patchset/20260425103918.7447-1-ja%40ssi.bg Fixes: f0be83d54217 ("ipvs: add est_cpulist and est_nice sysctl vars") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-05ipvs: do not leak dest after get from dest trashJulian Anastasov
Sashiko warns about leaked dest if ip_vs_start_estimator() fails in ip_vs_add_dest(). Add ip_vs_trash_put_dest() to put back the dest into dest trash. Link: https://sashiko.dev/#/patchset/20260428175725.72050-1-ja%40ssi.bg Fixes: 705dd3444081 ("ipvs: use kthreads for stats estimation") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-05ipvs: fix the spin_lock usage for RT buildJulian Anastasov
syzbot reports for sleeping function called from invalid context [1]. The recently added code for resizable hash tables uses hlist_bl bit locks in combination with spin_lock for the connection fields (cp->lock). Fix the following problems: * avoid using spin_lock(&cp->lock) under locked bit lock because it sleeps on PREEMPT_RT * as the recent changes call ip_vs_conn_hash() only for newly allocated connection, the spin_lock can be removed there because the connection is still not linked to table and does not need cp->lock protection. * the lock can be removed also from ip_vs_conn_unlink() where we are the last connection user. * the last place that is fixed is ip_vs_conn_fill_cport() where now the cp->lock is locked before the other locks to ensure other packets do not modify the cp->flags in non-atomic way. Here we make sure cport and flags are changed only once if two or more packets race to fill the cport. Also, we fill cport early, so that if we race with resizing there will be valid cport key for the hashing. Add a warning if too many hash table changes occur for our RCU read-side critical section which is error condition but minor because the connection still can expire gracefully. Still, restore the cport to 0 to allow retransmitted packet to properly fill the cport. Problems reported by Sashiko. [1]: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 16, name: ktimers/0 preempt_count: 2, expected: 0 RCU nest depth: 3, expected: 3 8 locks held by ktimers/0/16: #0: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163 #1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163 #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline] #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: timer_base_lock_expiry kernel/time/timer.c:1502 [inline] #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: __run_timer_base+0x120/0x9f0 kernel/time/timer.c:2384 #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline] #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __rt_spin_lock kernel/locking/spinlock_rt.c:50 [inline] #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1e0/0x400 kernel/locking/spinlock_rt.c:57 #4: ffffc90000157a80 ((&cp->timer)){+...}-{0:0}, at: call_timer_fn+0xd4/0x5e0 kernel/time/timer.c:1745 #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline] #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:315 [inline] #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_expire+0x257/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260 #6: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163 #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline] #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline] #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260 Preemption disabled at: [<ffffffff898a6358>] bit_spin_lock include/linux/bit_spinlock.h:38 [inline] [<ffffffff898a6358>] hlist_bl_lock+0x18/0x110 include/linux/list_bl.h:149 CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Tainted: G W L syzkaller #0 PREEMPT_{RT,(full)} Tainted: [W]=WARN, [L]=SOFTLOCKUP Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 __might_resched+0x329/0x480 kernel/sched/core.c:9162 __rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline] rt_spin_lock+0xc2/0x400 kernel/locking/spinlock_rt.c:57 spin_lock include/linux/spinlock_rt.h:45 [inline] ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline] ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260 call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748 expire_timers kernel/time/timer.c:1799 [inline] __run_timers kernel/time/timer.c:2374 [inline] __run_timer_base+0x6a3/0x9f0 kernel/time/timer.c:2386 run_timer_base kernel/time/timer.c:2395 [inline] run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405 handle_softirqs+0x1de/0x6d0 kernel/softirq.c:622 __do_softirq kernel/softirq.c:656 [inline] run_ktimerd+0x69/0x100 kernel/softirq.c:1151 smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160 kthread+0x388/0x470 kernel/kthread.c:436 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Reported-by: syzbot+504e778ddaecd36fdd17@syzkaller.appspotmail.com Link: https://sashiko.dev/#/patchset/20260415200216.79699-1-ja%40ssi.bg Link: https://sashiko.dev/#/patchset/20260420165539.85174-4-ja%40ssi.bg Link: https://sashiko.dev/#/patchset/20260422135823.50489-4-ja%40ssi.bg Fixes: 2fa7cc9c7025 ("ipvs: switch to per-net connection table") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-05ipvs: fix races around the conn_lfactor and svc_lfactor sysctl varsJulian Anastasov
Sashiko warns that the new sysctls vars can be changed after the hash tables are destroyed and their respective resizing works canceled, leading to mod_delayed_work() being called for canceled works. Solve this in different ways. conn_tab can be present even without services and is destroyed only on netns exit, so use disable_delayed_work_sync() to disable the work instead of adding more synchronization mechanisms. As for the svc_table, it is destroyed when the services are deleted, so we must be sure that netns exit is not called yet (the check for 'enable') and the work is not canceled by checking all under same mutex lock. Also, use WRITE_ONCE when updating the sysctl vars as we already read them with READ_ONCE. Link: https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de Fixes: 8d7de5477e47 ("ipvs: add conn_lfactor and svc_lfactor sysctl vars") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-05ipvs: fixes for the new ip_vs_status infoJulian Anastasov
Sashiko reports some problems for the recently added /proc/net/ip_vs_status: * ip_vs_status_show() as a table reader may run long after the conn_tab and svc_table table are released. While ip_vs_conn_flush() properly changes the conn_tab_changes counter when conn_tab is removed, ip_vs_del_service() and ip_vs_flush() were missing such change for the svc_table_changes counter. As result, readers like ip_vs_dst_event() and ip_vs_status_show() may continue to use a freed table after a cond_resched_rcu() call. * While counting the buckets in ip_vs_status_show() make sure we traverse only the needed number of entries in the chain. This also prevents possible overflow of the 'count' variable. * Add check for 'loops' to prevent infinite loops while restarting the traversal on table change. * While IP_VS_CONN_TAB_MAX_BITS is 20 on 32-bit platforms and there is no risk to overflow when multiplying the number of conn_tab buckets to 100, prefer the div_u64() helper to make the following dividing safer. * Use 0440 permissions for ip_vs_status to restrict the info only to root due to the exported information for hash distribution. Link: https://sashiko.dev/#/patchset/20260410112352.23599-1-fw%40strlen.de Fixes: 9a9ccef907a7 ("ipvs: add ip_vs_status info") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-01netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe headerPablo Neira Ayuso
This adjusts the checksum, if required, after pulling the layer 2 header, either the pppoe header or the inner vlan header in the double-tagged vlan packets. Fixes: 4cd91f7c290f ("netfilter: flowtable: add vlan support") Fixes: 72efd585f714 ("netfilter: flowtable: add pppoe support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-01netfilter: flowtable: fix inline pppoe encapsulation in xmit pathPablo Neira Ayuso
Address two issues in the inline pppoe encapsulation: - Add needs_gso_segment flag to segment PPPoE packets in software given that there is no GSO support for this. - Use FLOW_OFFLOAD_XMIT_DIRECT since neighbour cache is not available in point-to-point device, use the hardware address that is obtained via flowtable path discovery (ie. fill_forward_path). Fixes: 18d27bed0880 ("netfilter: flowtable: inline pppoe encapsulation in xmit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-01netfilter: flowtable: fix inline vlan encapsulation in xmit pathPablo Neira Ayuso
Several issues in the inline vlan support: - The layer 2 encapsulation representation in the tuple takes encap[0] as the outer header and encap[1] as the inner header as seen from the ingress path. Reverse the encap loop to push first the inner then the outer vlan header. - Postpone pushing the layer 2 header once destination device is known. This allows to calculate the needed hearoom via LL_RESERVED_SPACE to accommodate the layer 2 headers. - Add and use nf_flow_vlan_push() as suggested by Eric Woudstra, this is a simplified version of skb_vlan_push() for egress path only. Fixes: c653d5a78f34 ("netfilter: flowtable: inline vlan encapsulation in xmit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: flowtable: ensure sufficient headroom in xmit pathPablo Neira Ayuso
Check for headroom and call skb_expand_head() like in the IP output path to ensure there is sufficient headroom for the mac header when forwarding this packet as suggested by sashiko. Fixes: b5964aac51e0 ("netfilter: flowtable: consolidate xmit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: xtables: fix L4 header parsing for non-first fragmentsFernando Fernandez Mancera
Multiple targets and matches relies on L4 header to operate. For fragmented packets, every fragment carries the transport protocol identifier, but only the first fragment contains the L4 header. As the 'raw' table can be configured to run at priority -450 (before defragmentation at -400), the target/match can be reached before reassembly. In this case, non-first fragments have their payload incorrectly parsed as a TCP/UDP header. This would be of course a misconfiguration scenario. In most of the cases this just lead to a unreliable behavior for fragmented traffic. Add a fragment check to ensure target/match only evaluates unfragmented packets or the first fragment in the stream. Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: nf_tables: skip L4 header parsing for non-first fragmentsFernando Fernandez Mancera
The tproxy, osf and exthdr (SCTP) expressions rely on the presence of transport layer headers to perform socket lookups, fingerprint matching, or chunk extraction. For fragmented packets, while the IP protocol remains constant across all fragments, only the first fragment contains the actual L4 header. The expressions could be attached to a chain with a priority lower than -400, bypassing defragmentation. Or could be used in stateless environments where defragmentation is not happening at all. This could result in garbage data being used for the matching. Add a check for pkt->fragoff so only unfragmented packets or the first fragment is processed. Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks") Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: nf_tables: fix netdev hook allocation memleak with dormant tablesFlorian Westphal
sashiko says: could the related code in __nf_tables_abort() leak the struct nft_hook objects when the table is dormant? In __nf_tables_abort(), when rolling back a NEWCHAIN transaction that updates hooks, the code conditionally unregisters and frees the hooks only if the table is not dormant [..] if (!(table->flags & NFT_TABLE_F_DORMANT)) { nft_netdev_unregister_hooks(net, &nft_trans_chain_hooks(trans), true); } ... nft_trans_destroy(trans); Unfortunately netdev family mixes hook registration and allocation. Push table struct down and only check for the flag to unregister. Fixes: 216e7bf7402c ("netfilter: nf_tables: skip netdev hook unregistration if table is dormant") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: xt_CT: fix usersize for v1 and v2 revisionFlorian Westphal
While resurrecting the conntrack-tool test cases I found following bug: In: iptables -I OUTPUT -t raw -p 13 -j CT --timeout test-generic Out: [0:0] -A OUTPUT -p 13 -j CT --timeout test Data after first four bytes of the timeout policy name is never copied to userspace because its treated as kernel-only. Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validatePablo Neira Ayuso
Several matches and one target check that the hook is correct from checkentry(), however, the basechain is only available from nft_table_validate(). This patch uses xt_check_hooks_{match,target}() from the nft_compat expression .validate path. This patch sets the table in the nft_ctx struct in nft_table_validate() which is required by this patch. Based on patch from Florian Westphal. Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: x_tables: add .check_hooks to matches and targetsPablo Neira Ayuso
Add a new .check_hooks interface for checking if the match/target is used from the validate hook according to its configuration. Move existing conditional hook check based on the match/target configuration from .checkentry to .check_hooks for the following matches/targets: - addrtype - devgroup - physdev - policy - set - TCPMSS - SET This is a preparation patch to fix nft_compat, not functional changes are intended. Based on patch from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: nft_fwd_netdev: use recursion counter in neigh egress pathWeiming Shi
nft_fwd_neigh can be used in egress chains (NF_NETDEV_EGRESS). When the forwarding rule targets the same device or two devices forward to each other, neigh_xmit() triggers dev_queue_xmit() which re-enters nf_hook_egress(), causing infinite recursion and stack overflow. Move the nf_get_nf_dup_skb_recursion() accessor and NF_RECURSION_LIMIT to the shared header nf_dup_netdev.h as a static inline, so that nft_fwd_netdev can use the recursion counter directly without exported function call overhead. Guard neigh_xmit() with the same recursion limit already used in nf_do_netdev_egress(). [ Updated to cache the nf_get_nf_dup_skb_recursion pointer. --pablo ] Fixes: f87b9464d152 ("netfilter: nft_fwd_netdev: Support egress hook") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: nft_fwd_netdev: add device and headroom validate with neigh ↵Pablo Neira Ayuso
forwarding The ttl field has been decremented already and evaluation of this rule would proceed, just drop this packet instead if there is no destination device to forwards this packet. This is exactly what nf_dup already does in this case. Moreover, check for headroom and call skb_expand_head() like in the IP output path to ensure there is sufficient headroom when forwarding this via neigh_xmit(). Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30netfilter: replace skb_try_make_writable() by skb_ensure_writable()Pablo Neira Ayuso
skb_try_make_writable() only works on clones and uncloned packets might have their network header in paged fragments. nft_fwd needs to work for the ingress and egress hooks, but the egress hook where skb->data points to the mac header, use skb_network_offset() to include the mac header. The flowtable is fine since it already uses the transport offset. Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer") Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-28netfilter: skip recording stale or retransmitted INITXin Long
An INIT whose init_tag matches the peer's vtag does not provide new state information. It indicates either: - a stale INIT (after INIT-ACK has already been seen on the same side), or - a retransmitted INIT (after INIT has already been recorded on the same side). In both cases, the INIT must not update ct->proto.sctp.init[] state, since it does not advance the handshake tracking and may otherwise corrupt INIT/INIT-ACK validation logic. Allow INIT processing only when the conntrack entry is newly created (SCTP_CONNTRACK_NONE), or when the init_tag differs from the stored peer vtag. Note it skips the check for the ct with old_state SCTP_CONNTRACK_NONE in nf_conntrack_sctp_packet(), as it is just created in sctp_new() where it set ct->proto.sctp.vtag[IP_CT_DIR_REPLY] = ih->init_tag. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/ee56c3e416452b2a40589a2a85245ac2ad5e9f4b.1777214801.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-24netfilter: nf_conntrack_sip: don't use simple_strtoulFlorian Westphal
Replace unsafe port parsing in epaddr_len(), ct_sip_parse_header_uri(), and ct_sip_parse_request() with a new sip_parse_port() helper that validates each digit against the buffer limit, eliminating the use of simple_strtoul() which assumes NUL-terminated strings. The previous code dereferenced pointers without bounds checks after sip_parse_addr() and relied on simple_strtoul() on non-NUL-terminated skb data. A port that reaches the buffer limit without a trailing character is also rejected as malformed. Also get rid of all simple_strtoul() usage in conntrack, prefer a stricter version instead. There are intentional changes: - Bail out if number is > UINT_MAX and indicate a failure, same for too long sequences. While we do accept 05535 as port 5535, we will not accept e.g. 'sip:10.0.0.1:005060'. While its syntactically valid under RFC 3261, we should restrict this to not waste cycles when presented with malformed packets with 64k '0' characters. - Force base 10 in ct_sip_parse_numerical_param(). This is used to fetch 'expire=' and 'rports='; both are expected to use base-10. - In nf_nat_sip.c, only accept the parsed value if its within the 1k-64k range. - epaddr_len now returns 0 if the port is invalid, as it already does for invalid ip addresses. This is intentional. nf_conntrack_sip performs lots of guesswork to find the right parts of the message to parse. Being stricter could break existing setups. Connection tracking helpers are designed to allow traffic to pass, not to block it. Based on an earlier patch from Jenny Guanni Qu <qguanni@gmail.com>. Fixes: 05e3ced297fe ("[NETFILTER]: nf_conntrack_sip: introduce SIP-URI parsing helper") Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com> Reported-by: Dawid Moczadło <dawid@vidocsecurity.com> Reported-by: Jenny Guanni Qu <qguanni@gmail.com>. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-24netfilter: reject zero shift in nft_bitwiseKai Ma
Reject zero shift operands for nft_bitwise left and right shift expressions during initialization. The carry propagation logic computes the carry from the adjacent 32-bit word using BITS_PER_TYPE(u32) - shift. A zero shift operand turns this into a 32-bit shift, which is undefined behaviour. Reject zero shift operands in the control plane, alongside the existing check for values greater than or equal to 32, so malformed rules never reach the packet path. Fixes: 567d746b55bc ("netfilter: bitwise: add support for shifts.") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Kai Ma <k4729.23098@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-24netfilter: xt_policy: fix strict mode inbound policy matchingJiexun Wang
match_policy_in() walks sec_path entries from the last transform to the first one, but strict policy matching needs to consume info->pol[] in the same forward order as the rule layout. Derive the strict-match policy position from the number of transforms already consumed so that multi-element inbound rules are matched consistently. Fixes: c4b885139203 ("[NETFILTER]: x_tables: replace IPv4/IPv6 policy match by address family independant version") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-21netfilter: nf_tables: add hook transactions for device deletionsPablo Neira Ayuso
Restore the flag that indicates that the hook is going away, ie. NFT_HOOK_REMOVE, but add a new transaction object to track deletion of hooks without altering the basechain/flowtable hook_list during the preparation phase. The existing approach that moves the hook from the basechain/flowtable hook_list to transaction hook_list breaks netlink dump path readers of this RCU-protected list. It should be possible use an array for nft_trans_hook to store the deleted hooks to compact the representation but I am not expecting many hook object, specially now that wildcard support for devices is in place. Note that the nft_trans_chain_hooks() list contains a list of struct nft_trans_hook objects for DELCHAIN and DELFLOWTABLE commands, while this list stores struct nft_hook objects for NEWCHAIN and NEWFLOWTABLE. Note that new commands can be updated to use nft_trans_hook for consistency. This patch also adapts the event notification path to deal with the list of hook transactions. Fixes: 7d937b107108 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Fixes: b6d9014a3335 ("netfilter: nf_tables: delete flowtable hooks via transaction list") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-21netfilter: nf_tables: join hook list via splice_list_rcu() in commit phasePablo Neira Ayuso
Publish new hooks in the list into the basechain/flowtable using splice_list_rcu() to ensure netlink dump list traversal via rcu is safe while concurrent ruleset update is going on. Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable") Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-21netfilter: nf_tables: use list_del_rcu for netlink hooksFlorian Westphal
nft_netdev_unregister_hooks and __nft_unregister_flowtable_net_hooks need to use list_del_rcu(), this list can be walked by concurrent dumpers. Add a new helper and use it consistently. Fixes: f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20netfilter: nfnetlink_osf: fix potential NULL dereference in ttl checkFernando Fernandez Mancera
The nf_osf_ttl() function accessed skb->dev to perform a local interface address lookup without verifying that the device pointer was valid. Additionally, the implementation utilized an in_dev_for_each_ifa_rcu loop to match the packet source address against local interface addresses. It assumed that packets from the same subnet should not see a decrement on the initial TTL. A packet might appear it is from the same subnet but it actually isn't especially in modern environments with containers and virtual switching. Remove the device dereference and interface loop. Replace the logic with a switch statement that evaluates the TTL according to the ttl_check. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Reported-by: Kito Xu (veritas501) <hxzene@gmail.com> Closes: https://lore.kernel.org/netfilter-devel/20260414074556.2512750-1-hxzene@gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20netfilter: nfnetlink_osf: fix out-of-bounds read on option matchingFernando Fernandez Mancera
In nf_osf_match(), the nf_osf_hdr_ctx structure is initialized once and passed by reference to nf_osf_match_one() for each fingerprint checked. During TCP option parsing, nf_osf_match_one() advances the shared ctx->optp pointer. If a fingerprint perfectly matches, the function returns early without restoring ctx->optp to its initial state. If the user has configured NF_OSF_LOGLEVEL_ALL, the loop continues to the next fingerprint. However, because ctx->optp was not restored, the next call to nf_osf_match_one() starts parsing from the end of the options buffer. This causes subsequent matches to read garbage data and fail immediately, making it impossible to log more than one match or logging incorrect matches. Instead of using a shared ctx->optp pointer, pass the context as a constant pointer and use a local pointer (optp) for TCP option traversal. This makes nf_osf_match_one() strictly stateless from the caller's perspective, ensuring every fingerprint check starts at the correct option offset. Fixes: 1a6a0951fc00 ("netfilter: nfnetlink_osf: add missing fmatch check") Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20ipvs: fix MTU check for GSO packets in tunnel modeYingnan Zhang
Currently, IPVS skips MTU checks for GSO packets by excluding them with the !skb_is_gso(skb) condition. This creates problems when IPVS tunnel mode encapsulates GSO packets with IPIP headers. The issue manifests in two ways: 1. MTU violation after encapsulation: When a GSO packet passes through IPVS tunnel mode, the original MTU check is bypassed. After adding the IPIP tunnel header, the packet size may exceed the outgoing interface MTU, leading to unexpected fragmentation at the IP layer. 2. Fragmentation with problematic IP IDs: When net.ipv4.vs.pmtu_disc=1 and a GSO packet with multiple segments is fragmented after encapsulation, each segment gets a sequentially incremented IP ID (0, 1, 2, ...). This happens because: a) The GSO packet bypasses MTU check and gets encapsulated b) At __ip_finish_output, the oversized GSO packet is split into separate SKBs (one per segment), with IP IDs incrementing c) Each SKB is then fragmented again based on the actual MTU This sequential IP ID allocation differs from the expected behavior and can cause issues with fragment reassembly and packet tracking. Fix this by properly validating GSO packets using skb_gso_validate_network_len(). This function correctly validates whether the GSO segments will fit within the MTU after segmentation. If validation fails, send an ICMP Fragmentation Needed message to enable proper PMTU discovery. Fixes: 4cdd34084d53 ("netfilter: nf_conntrack_ipv6: improve fragmentation handling") Signed-off-by: Yingnan Zhang <342144303@qq.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20netfilter: nat: use kfree_rcu to release opsPablo Neira Ayuso
Florian Westphal says: "Historically this is not an issue, even for normal base hooks: the data path doesn't use the original nf_hook_ops that are used to register the callbacks. However, in v5.14 I added the ability to dump the active netfilter hooks from userspace. This code will peek back into the nf_hook_ops that are available at the tail of the pointer-array blob used by the datapath. The nat hooks are special, because they are called indirectly from the central nat dispatcher hook. They are currently invisible to the nfnl hook dump subsystem though. But once that changes the nat ops structures have to be deferred too." Update nf_nat_register_fn() to deal with partial exposition of the hooks from error path which can be also an issue for nfnetlink_hook. Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20netfilter: xtables: restrict several matches to inet familyPablo Neira Ayuso
This is a partial revert of: commit ab4f21e6fb1c ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions") to allow ipv4 and ipv6 only. - xt_mac - xt_owner - xt_physdev These extensions are not used by ebtables in userspace. Moreover, xt_realm is only for ipv4, since dst->tclassid is ipv4 specific. Fixes: ab4f21e6fb1c ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions") Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20netfilter: conntrack: remove sprintf usageFlorian Westphal
Replace it with scnprintf, the buffer sizes are expected to be large enough to hold the result, no need for snprintf+overflow check. Increase buffer size in mangle_content_len() while at it. BUG: KASAN: stack-out-of-bounds in vsnprintf+0xea5/0x1270 Write of size 1 at addr [..] vsnprintf+0xea5/0x1270 sprintf+0xb1/0xe0 mangle_content_len+0x1ac/0x280 nf_nat_sdp_session+0x1cc/0x240 process_sdp+0x8f8/0xb80 process_invite_request+0x108/0x2b0 process_sip_msg+0x5da/0xf50 sip_help_tcp+0x45e/0x780 nf_confirm+0x34d/0x990 [..] Fixes: 9fafcd7b2032 ("[NETFILTER]: nf_conntrack/nf_nat: add SIP helper port") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20netfilter: nfnetlink_osf: fix divide-by-zero in OSF_WSS_MODULOXiang Mei
nf_osf_match_one() computes ctx->window % f->wss.val in the OSF_WSS_MODULO branch with no guard for f->wss.val == 0. A CAP_NET_ADMIN user can add such a fingerprint via nfnetlink; a subsequent matching TCP SYN divides by zero and panics the kernel. Reject the bogus fingerprint in nfnl_osf_add_callback() above the per-option for-loop. f->wss is per-fingerprint, not per-option, so the check must run regardless of f->opt_num (including 0). Also reject wss.wc >= OSF_WSS_MAX; nf_osf_match_one() already treats that as "should not happen". Crash: Oops: divide error: 0000 [#1] SMP KASAN NOPTI RIP: 0010:nf_osf_match_one (net/netfilter/nfnetlink_osf.c:98) Call Trace: <IRQ> nf_osf_match (net/netfilter/nfnetlink_osf.c:220) xt_osf_match_packet (net/netfilter/xt_osf.c:32) ipt_do_table (net/ipv4/netfilter/ip_tables.c:348) nf_hook_slow (net/netfilter/core.c:622) ip_local_deliver (net/ipv4/ip_input.c:265) ip_rcv (include/linux/skbuff.h:1162) __netif_receive_skb_one_core (net/core/dev.c:6181) process_backlog (net/core/dev.c:6642) __napi_poll (net/core/dev.c:7710) net_rx_action (net/core/dev.c:7945) handle_softirqs (kernel/softirq.c:622) Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Reported-by: Weiming Shi <bestswngs@gmail.com> Suggested-by: Florian Westphal <fw@strlen.de> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-20netfilter: nft_osf: restrict it to ipv4Pablo Neira Ayuso
This expression only supports for ipv4, restrict it. Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-10netfilter: require Ethernet MAC header before using eth_hdr()Zhengchuan Liang
`ip6t_eui64`, `xt_mac`, the `bitmap:ip,mac`, `hash:ip,mac`, and `hash:mac` ipset types, and `nf_log_syslog` access `eth_hdr(skb)` after either assuming that the skb is associated with an Ethernet device or checking only that the `ETH_HLEN` bytes at `skb_mac_header(skb)` lie between `skb->head` and `skb->data`. Make these paths first verify that the skb is associated with an Ethernet device, that the MAC header was set, and that it spans at least a full Ethernet header before accessing `eth_hdr(skb)`. Suggested-by: Florian Westphal <fw@strlen.de> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10netfilter: nft_fwd_netdev: check ttl/hl before forwardingFlorian Westphal
Drop packets if their ttl/hl is too small for forwarding. Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Use the TRAILING_OVERLAP() helper to fix the following warnings: 1 net/netfilter/x_tables.c:816:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 1 net/netfilter/x_tables.c:811:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] This helper creates a union between a flexible-array member (FAM) and a set of members that would otherwise follow it. This overlays the trailing members onto the FAM while preserving the original memory layout. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10netfilter: conntrack: remove UDP-Lite conntrack supportFernando Fernandez Mancera
UDP-Lite (RFC 3828) socket support was recently retired from the core networking stack. As a follow-up of that, drop the connection tracker and NAT support for UDP-Lite in Netfilter. This patch removes CONFIG_NF_CT_PROTO_UDPLITE and scrubs UDP-Lite awareness from the conntrack core, NAT core, nft_ct, and ctnetlink. Please note that stateless packet inspection, matching, ipsets or logging support for IPPROTO_UDPLITE is preserved. As conntrack no longer extracts UDP-Lite ports or tracks its L4 state, when performing NAT the UDP-Lite checksum cannot be updated anymore. That is an expected and acceptable consequence of removing UDP-Lite conntrack module. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10netfilter: xt_socket: enable defrag after all other checksFlorian Westphal
Originally this did not matter because defrag was enabled once per netns and only disabled again on netns dismantle. When this got changed I should have adjusted checkentry to not leave defrag enabled on error. Fixes: de8c12110a13 ("netfilter: disable defrag once its no longer needed") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10netfilter: xt_HL: add pr_fmt and checkentry validationMarino Dzalto
Add pr_fmt to prefix log messages with the module name for easier debugging in dmesg. Add checkentry functions for IPv4 (ttl_mt_check) and IPv6 (hl_mt6_check) to validate the match mode at rule registration time, rejecting invalid modes with -EINVAL. The evaluation function returns false in case the mode is unknown, so this is a cleanup, not a bug fix. Signed-off-by: Marino Dzalto <marino.dzalto@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10netfilter: nfnetlink: prefer skb_mac_header helpersFlorian Westphal
This adds implicit DEBUG_WARN_ON_ONCE for debug configurations. No other changes intended. Signed-off-by: Florian Westphal <fw@strlen.de>