<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/netfilter/nfnetlink_queue.c, branch v7.0</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>netfilter: nfnetlink_queue: make hash table per queue</title>
<updated>2026-04-08T11:34:51+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2026-04-07T15:00:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=936206e3f6ff411581e615e930263d6f8b78df9d'/>
<id>936206e3f6ff411581e615e930263d6f8b78df9d</id>
<content type='text'>
Sharing a global hash table among all queues is tempting, but
it can cause crash:

BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
[..]
 nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
 nfnetlink_rcv_msg+0x46a/0x930
 kmem_cache_alloc_node_noprof+0x11e/0x450

struct nf_queue_entry is freed via kfree, but parallel cpu can still
encounter such an nf_queue_entry when walking the list.

Alternative fix is to free the nf_queue_entry via kfree_rcu() instead,
but as we have to alloc/free for each skb this will cause more mem
pressure.

Cc: Scott Mitchell &lt;scott.k.mitch1@gmail.com&gt;
Fixes: e19079adcd26 ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Sharing a global hash table among all queues is tempting, but
it can cause crash:

BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
[..]
 nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
 nfnetlink_rcv_msg+0x46a/0x930
 kmem_cache_alloc_node_noprof+0x11e/0x450

struct nf_queue_entry is freed via kfree, but parallel cpu can still
encounter such an nf_queue_entry when walking the list.

Alternative fix is to free the nf_queue_entry via kfree_rcu() instead,
but as we have to alloc/free for each skb this will cause more mem
pressure.

Cc: Scott Mitchell &lt;scott.k.mitch1@gmail.com&gt;
Fixes: e19079adcd26 ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nfnetlink_queue: fix entry leak in bridge verdict error path</title>
<updated>2026-03-10T13:10:42+00:00</updated>
<author>
<name>Hyunwoo Kim</name>
<email>imv4bel@gmail.com</email>
</author>
<published>2026-03-07T17:24:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f1ba83755d81c6fc66ac7acd723d238f974091e9'/>
<id>f1ba83755d81c6fc66ac7acd723d238f974091e9</id>
<content type='text'>
nfqnl_recv_verdict() calls find_dequeue_entry() to remove the queue
entry from the queue data structures, taking ownership of the entry.
For PF_BRIDGE packets, it then calls nfqa_parse_bridge() to parse VLAN
attributes.  If nfqa_parse_bridge() returns an error (e.g. NFQA_VLAN
present but NFQA_VLAN_TCI missing), the function returns immediately
without freeing the dequeued entry or its sk_buff.

This leaks the nf_queue_entry, its associated sk_buff, and all held
references (net_device refcounts, struct net refcount).  Repeated
triggering exhausts kernel memory.

Fix this by dropping the entry via nfqnl_reinject() with NF_DROP verdict
on the error path, consistent with other error handling in this file.

Fixes: 8d45ff22f1b4 ("netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR")
Reviewed-by: David Dull &lt;monderasdor@gmail.com&gt;
Signed-off-by: Hyunwoo Kim &lt;imv4bel@gmail.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nfqnl_recv_verdict() calls find_dequeue_entry() to remove the queue
entry from the queue data structures, taking ownership of the entry.
For PF_BRIDGE packets, it then calls nfqa_parse_bridge() to parse VLAN
attributes.  If nfqa_parse_bridge() returns an error (e.g. NFQA_VLAN
present but NFQA_VLAN_TCI missing), the function returns immediately
without freeing the dequeued entry or its sk_buff.

This leaks the nf_queue_entry, its associated sk_buff, and all held
references (net_device refcounts, struct net refcount).  Repeated
triggering exhausts kernel memory.

Fix this by dropping the entry via nfqnl_reinject() with NF_DROP verdict
on the error path, consistent with other error handling in this file.

Fixes: 8d45ff22f1b4 ("netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR")
Reviewed-by: David Dull &lt;monderasdor@gmail.com&gt;
Signed-off-by: Hyunwoo Kim &lt;imv4bel@gmail.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation</title>
<updated>2026-02-06T12:34:55+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-11-20T16:17:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=207b3ebacb6113acaaec0d171d5307032c690004'/>
<id>207b3ebacb6113acaaec0d171d5307032c690004</id>
<content type='text'>
Ulrich reports a regression with nfqueue:

If an application did not set the 'F_GSO' capability flag and a gso
packet with an unconfirmed nf_conn entry is received all packets are
now dropped instead of queued, because the check happens after
skb_gso_segment().  In that case, we did have exclusive ownership
of the skb and its associated conntrack entry.  The elevated use
count is due to skb_clone happening via skb_gso_segment().

Move the check so that its peformed vs. the aggregated packet.

Then, annotate the individual segments except the first one so we
can do a 2nd check at reinject time.

For the normal case, where userspace does in-order reinjects, this avoids
packet drops: first reinjected segment continues traversal and confirms
entry, remaining segments observe the confirmed entry.

While at it, simplify nf_ct_drop_unconfirmed(): We only care about
unconfirmed entries with a refcnt &gt; 1, there is no need to special-case
dying entries.

This only happens with UDP.  With TCP, the only unconfirmed packet will
be the TCP SYN, those aren't aggregated by GRO.

Next patch adds a udpgro test case to cover this scenario.

Reported-by: Ulrich Weber &lt;ulrich.weber@gmail.com&gt;
Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ulrich reports a regression with nfqueue:

If an application did not set the 'F_GSO' capability flag and a gso
packet with an unconfirmed nf_conn entry is received all packets are
now dropped instead of queued, because the check happens after
skb_gso_segment().  In that case, we did have exclusive ownership
of the skb and its associated conntrack entry.  The elevated use
count is due to skb_clone happening via skb_gso_segment().

Move the check so that its peformed vs. the aggregated packet.

Then, annotate the individual segments except the first one so we
can do a 2nd check at reinject time.

For the normal case, where userspace does in-order reinjects, this avoids
packet drops: first reinjected segment continues traversal and confirms
entry, remaining segments observe the confirmed entry.

While at it, simplify nf_ct_drop_unconfirmed(): We only care about
unconfirmed entries with a refcnt &gt; 1, there is no need to special-case
dying entries.

This only happens with UDP.  With TCP, the only unconfirmed packet will
be the TCP SYN, those aren't aggregated by GRO.

Next patch adds a udpgro test case to cover this scenario.

Reported-by: Ulrich Weber &lt;ulrich.weber@gmail.com&gt;
Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nfnetlink_queue: optimize verdict lookup with hash table</title>
<updated>2026-01-29T08:52:07+00:00</updated>
<author>
<name>Scott Mitchell</name>
<email>scott.k.mitch1@gmail.com</email>
</author>
<published>2026-01-23T22:09:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e19079adcd26a25d7d3e586b1837493361fdf8b6'/>
<id>e19079adcd26a25d7d3e586b1837493361fdf8b6</id>
<content type='text'>
The current implementation uses a linear list to find queued packets by
ID when processing verdicts from userspace. With large queue depths and
out-of-order verdicting, this O(n) lookup becomes a significant
bottleneck, causing userspace verdict processing to dominate CPU time.

Replace the linear search with a hash table for O(1) average-case
packet lookup by ID. A global rhashtable spanning all network
namespaces attributes hash bucket memory to kernel but is subject to
fixed upper bound.

Signed-off-by: Scott Mitchell &lt;scott.k.mitch1@gmail.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current implementation uses a linear list to find queued packets by
ID when processing verdicts from userspace. With large queue depths and
out-of-order verdicting, this O(n) lookup becomes a significant
bottleneck, causing userspace verdict processing to dominate CPU time.

Replace the linear search with a hash table for O(1) average-case
packet lookup by ID. A global rhashtable spanning all network
namespaces attributes hash bucket memory to kernel but is subject to
fixed upper bound.

Signed-off-by: Scott Mitchell &lt;scott.k.mitch1@gmail.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nfnetlink_queue: nfqnl_instance GFP_ATOMIC -&gt; GFP_KERNEL_ACCOUNT allocation</title>
<updated>2026-01-20T15:23:37+00:00</updated>
<author>
<name>Scott Mitchell</name>
<email>scott.k.mitch1@gmail.com</email>
</author>
<published>2026-01-17T17:32:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a4400a5b343d1bc4aa8f685608515413238e7ee2'/>
<id>a4400a5b343d1bc4aa8f685608515413238e7ee2</id>
<content type='text'>
Currently, instance_create() uses GFP_ATOMIC because it's called while
holding instances_lock spinlock. This makes allocation more likely to
fail under memory pressure.

Refactor nfqnl_recv_config() to drop RCU lock after instance_lookup()
and peer_portid verification. A socket cannot simultaneously send a
message and close, so the queue owned by the sending socket cannot be
destroyed while processing its CONFIG message. This allows
instance_create() to allocate with GFP_KERNEL_ACCOUNT before taking
the spinlock.

Suggested-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Scott Mitchell &lt;scott.k.mitch1@gmail.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, instance_create() uses GFP_ATOMIC because it's called while
holding instances_lock spinlock. This makes allocation more likely to
fail under memory pressure.

Refactor nfqnl_recv_config() to drop RCU lock after instance_lookup()
and peer_portid verification. A socket cannot simultaneously send a
message and close, so the queue owned by the sending socket cannot be
destroyed while processing its CONFIG message. This allows
instance_create() to allocate with GFP_KERNEL_ACCOUNT before taking
the spinlock.

Suggested-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Scott Mitchell &lt;scott.k.mitch1@gmail.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nfnetlink_queue: Initialize ctx to avoid memory allocation error</title>
<updated>2025-03-23T09:20:33+00:00</updated>
<author>
<name>Chenyuan Yang</name>
<email>chenyuan0y@gmail.com</email>
</author>
<published>2025-03-13T19:54:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=778b09d91baafb13408470c721d034d6515cfa5a'/>
<id>778b09d91baafb13408470c721d034d6515cfa5a</id>
<content type='text'>
It is possible that ctx in nfqnl_build_packet_message() could be used
before it is properly initialize, which is only initialized
by nfqnl_get_sk_secctx().

This patch corrects this problem by initializing the lsmctx to a safe
value when it is declared.

This is similar to the commit 35fcac7a7c25
("audit: Initialize lsmctx to avoid memory allocation error").

Fixes: 2d470c778120 ("lsm: replace context+len with lsm_context")
Signed-off-by: Chenyuan Yang &lt;chenyuan0y@gmail.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is possible that ctx in nfqnl_build_packet_message() could be used
before it is properly initialize, which is only initialized
by nfqnl_get_sk_secctx().

This patch corrects this problem by initializing the lsmctx to a safe
value when it is declared.

This is similar to the commit 35fcac7a7c25
("audit: Initialize lsmctx to avoid memory allocation error").

Fixes: 2d470c778120 ("lsm: replace context+len with lsm_context")
Signed-off-by: Chenyuan Yang &lt;chenyuan0y@gmail.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: corrections for security_secid_to_secctx returns</title>
<updated>2025-01-05T03:11:22+00:00</updated>
<author>
<name>Casey Schaufler</name>
<email>casey@schaufler-ca.com</email>
</author>
<published>2024-12-20T22:02:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3b44cd0998678b55a0df20b514bca0e298f4ff48'/>
<id>3b44cd0998678b55a0df20b514bca0e298f4ff48</id>
<content type='text'>
security_secid_to_secctx() returns the size of the new context,
whereas previous versions provided that via a pointer parameter.
Correct the type of the value returned in nfqnl_get_sk_secctx()
and the check for error in netlbl_unlhsh_add(). Add an error
check.

Fixes: 2d470c778120 ("lsm: replace context+len with lsm_context")
Signed-off-by: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
security_secid_to_secctx() returns the size of the new context,
whereas previous versions provided that via a pointer parameter.
Correct the type of the value returned in nfqnl_get_sk_secctx()
and the check for error in netlbl_unlhsh_add(). Add an error
check.

Fixes: 2d470c778120 ("lsm: replace context+len with lsm_context")
Signed-off-by: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lsm: replace context+len with lsm_context</title>
<updated>2024-12-04T19:42:31+00:00</updated>
<author>
<name>Casey Schaufler</name>
<email>casey@schaufler-ca.com</email>
</author>
<published>2024-10-23T21:21:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2d470c778120d3cdb8d8ab250329ca85f49f12b1'/>
<id>2d470c778120d3cdb8d8ab250329ca85f49f12b1</id>
<content type='text'>
Replace the (secctx,seclen) pointer pair with a single
lsm_context pointer to allow return of the LSM identifier
along with the context and context length. This allows
security_release_secctx() to know how to release the
context. Callers have been modified to use or save the
returned data from the new structure.

security_secid_to_secctx() and security_lsmproc_to_secctx()
will now return the length value on success instead of 0.

Cc: netdev@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: Todd Kjos &lt;tkjos@google.com&gt;
Signed-off-by: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
[PM: subject tweak, kdoc fix, signedness fix from Dan Carpenter]
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace the (secctx,seclen) pointer pair with a single
lsm_context pointer to allow return of the LSM identifier
along with the context and context length. This allows
security_release_secctx() to know how to release the
context. Callers have been modified to use or save the
returned data from the new structure.

security_secid_to_secctx() and security_lsmproc_to_secctx()
will now return the length value on success instead of 0.

Cc: netdev@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: Todd Kjos &lt;tkjos@google.com&gt;
Signed-off-by: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
[PM: subject tweak, kdoc fix, signedness fix from Dan Carpenter]
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lsm: ensure the correct LSM context releaser</title>
<updated>2024-12-04T15:46:26+00:00</updated>
<author>
<name>Casey Schaufler</name>
<email>casey@schaufler-ca.com</email>
</author>
<published>2024-10-23T21:21:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6fba89813ccf333d2bc4d5caea04cd5f3c39eb50'/>
<id>6fba89813ccf333d2bc4d5caea04cd5f3c39eb50</id>
<content type='text'>
Add a new lsm_context data structure to hold all the information about a
"security context", including the string, its size and which LSM allocated
the string. The allocation information is necessary because LSMs have
different policies regarding the lifecycle of these strings. SELinux
allocates and destroys them on each use, whereas Smack provides a pointer
to an entry in a list that never goes away.

Update security_release_secctx() to use the lsm_context instead of a
(char *, len) pair. Change its callers to do likewise.  The LSMs
supporting this hook have had comments added to remind the developer
that there is more work to be done.

The BPF security module provides all LSM hooks. While there has yet to
be a known instance of a BPF configuration that uses security contexts,
the possibility is real. In the existing implementation there is
potential for multiple frees in that case.

Cc: linux-integrity@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
To: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: linux-nfs@vger.kernel.org
Cc: Todd Kjos &lt;tkjos@google.com&gt;
Signed-off-by: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
[PM: subject tweak]
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new lsm_context data structure to hold all the information about a
"security context", including the string, its size and which LSM allocated
the string. The allocation information is necessary because LSMs have
different policies regarding the lifecycle of these strings. SELinux
allocates and destroys them on each use, whereas Smack provides a pointer
to an entry in a list that never goes away.

Update security_release_secctx() to use the lsm_context instead of a
(char *, len) pair. Change its callers to do likewise.  The LSMs
supporting this hook have had comments added to remind the developer
that there is more work to be done.

The BPF security module provides all LSM hooks. While there has yet to
be a known instance of a BPF configuration that uses security contexts,
the possibility is real. In the existing implementation there is
potential for multiple frees in that case.

Cc: linux-integrity@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
To: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: linux-nfs@vger.kernel.org
Cc: Todd Kjos &lt;tkjos@google.com&gt;
Signed-off-by: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
[PM: subject tweak]
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
