linux-stable.git/net/sched, branch linux-4.9.y

net_sched: reject TCF_EM_SIMPLE case for complex ematch module

2023-01-07T11:07:30+00:00

[ Upstream commit 9cd3fd2054c3b3055163accbf2f31a4426f10317 ]

When TCF_EM_SIMPLE was introduced, it is supposed to be convenient
for ematch implementation:

https://lore.kernel.org/all/20050105110048.GO26856@postel.suug.ch/

"You don't have to, providing a 32bit data chunk without TCF_EM_SIMPLE
set will simply result in allocating & copy. It's an optimization,
nothing more."

So if an ematch module provides ops->datalen that means it wants a
complex data structure (saved in its em->data) instead of a simple u32
value. We should simply reject such a combination, otherwise this u32
could be misinterpreted as a pointer.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+4caeae4c7103813598ae@syzkaller.appspotmail.com
Reported-by: Jun Nie 
Cc: Jamal Hadi Salim 
Cc: Paolo Abeni 
Signed-off-by: Cong Wang 
Acked-by: Paolo Abeni 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin

net: sched: Fix use after free in red_enqueue()

2022-11-10T14:46:05+00:00

[ Upstream commit 8bdc2acd420c6f3dd1f1c78750ec989f02a1e2b9 ]

We can't use "skb" again after passing it to qdisc_enqueue().  This is
basically identical to commit 2f09707d0c97 ("sch_sfb: Also store skb
len before calling child enqueue").

Fixes: d7f4f332f082 ("sch_red: update backlog as well")
Signed-off-by: Dan Carpenter 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin

sch_sfb: Also store skb len before calling child enqueue

2022-09-15T10:39:46+00:00

[ Upstream commit 2f09707d0c972120bf794cfe0f0c67e2c2ddb252 ]

Cong Wang noticed that the previous fix for sch_sfb accessing the queued
skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue
function was also calling qdisc_qstats_backlog_inc() after enqueue, which
reads the pkt len from the skb cb field. Fix this by also storing the skb
len, and using the stored value to increment the backlog after enqueueing.

Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child")
Signed-off-by: Toke Høiland-Jørgensen 
Acked-by: Cong Wang 
Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk
Signed-off-by: Paolo Abeni 
Signed-off-by: Sasha Levin

sch_sfb: Don't assume the skb is still around after enqueueing to child

2022-09-15T10:39:46+00:00

[ Upstream commit 9efd23297cca530bb35e1848665805d3fcdd7889 ]

The sch_sfb enqueue() routine assumes the skb is still alive after it has
been enqueued into a child qdisc, using the data in the skb cb field in the
increment_qlen() routine after enqueue. However, the skb may in fact have
been freed, causing a use-after-free in this case. In particular, this
happens if sch_cake is used as a child of sfb, and the GSO splitting mode
of CAKE is enabled (in which case the skb will be split into segments and
the original skb freed).

Fix this by copying the sfb cb data to the stack before enqueueing the skb,
and using this stack copy in increment_qlen() instead of the skb pointer
itself.

Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler")
Signed-off-by: Toke Høiland-Jørgensen 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin

net_sched: cls_route: disallow handle of 0

2022-08-25T09:09:28+00:00

commit 02799571714dc5dd6948824b9d080b44a295f695 upstream.

Follows up on:
https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/

handle of 0 implies from/to of universe realm which is not very
sensible.

Lets see what this patch will do:
$sudo tc qdisc add dev $DEV root handle 1:0 prio

//lets manufacture a way to insert handle of 0
$sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \
route to 0 from 0 classid 1:10 action ok

//gets rejected...
Error: handle of 0 is not valid.
We have an error talking to the kernel, -1

//lets create a legit entry..
sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \
classid 1:10 action ok

//what did the kernel insert?
$sudo tc filter ls dev $DEV parent 1:0
filter protocol ip pref 100 route chain 0
filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1

//Lets try to replace that legit entry with a handle of 0
$ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \
handle 0x000a8000 route to 0 from 0 classid 1:10 action drop

Error: Replacing with handle of 0 is invalid.
We have an error talking to the kernel, -1

And last, lets run Cascardo's POC:
$ ./poc
0
0
-22
-22
-22

Signed-off-by: Jamal Hadi Salim 
Acked-by: Stephen Hemminger 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net_sched: cls_route: remove from list when handle is 0

2022-08-25T09:09:26+00:00

commit 9ad36309e2719a884f946678e0296be10f0bb4c1 upstream.

When a route filter is replaced and the old filter has a 0 handle, the old
one won't be removed from the hashtable, while it will still be freed.

The test was there since before commit 1109c00547fc ("net: sched: RCU
cls_route"), when a new filter was not allocated when there was an old one.
The old filter was reused and the reinserting would only be necessary if an
old filter was replaced. That was still wrong for the same case where the
old handle was 0.

Remove the old filter from the list independently from its handle value.

This fixes CVE-2022-2588, also reported as ZDI-CAN-17440.

Reported-by: Zhenpeng Lin 
Signed-off-by: Thadeu Lima de Souza Cascardo 
Reviewed-by: Kamal Mostafa 
Cc: 
Acked-by: Jamal Hadi Salim 
Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman

net: sched: prevent UAF on tc_ctl_tfilter when temporarily dropping rtnl_lock

2022-05-12T10:14:58+00:00

When dropping the rtnl_lock for looking up for a module, the device may be
removed, releasing the qdisc and class memory. Right after trying to load
the module, cl_ops->put is called, leading to a potential use-after-free.

Though commit e368fdb61d8e ("net: sched: use Qdisc rcu API instead of
relying on rtnl lock") fixes this, it involves a lot of refactoring of the
net/sched/ code, complicating its backport.

This fix calls cl_ops->put before dropping rtnl_lock as it will be called
either way, and zeroes it out so it won't be called again on the exit path.

This has been shown to stop the following KASAN report with the reproducer:

[  256.609111] ==================================================================
[  256.609585] BUG: KASAN: use-after-free in cbq_put+0x20/0xd0 at addr ffff880021daaba0
[  256.610078] Read of size 4 by task total_cbq/11184
[  256.610380] CPU: 0 PID: 11184 Comm: total_cbq Not tainted 4.9.311 #78
[  256.610778]  ffff8800215875a8 ffffffff96e18735 ffff880024803080 ffff880021daaa80
[  256.611274]  ffff8800215875d0 ffffffff96334841 ffffed00043b5574 ffffed00043b5574
[  256.611768]  ffff880024803080 ffff880021587658 ffffffff96334af8 0000000000000000
[  256.612186] Call Trace:
[  256.612344]  [] dump_stack+0x6d/0x8b
[  256.612632]  [] kasan_object_err+0x21/0x70
[  256.612973]  [] kasan_report.part.1+0x218/0x4f0
[  256.613349]  [] ? cbq_put+0x20/0xd0
[  256.613634]  [] ? kasan_unpoison_shadow+0x36/0x50
[  256.613993]  [] kasan_report+0x25/0x30
[  256.614288]  [] __asan_load4+0x61/0x80
[  256.614580]  [] cbq_put+0x20/0xd0
[  256.614862]  [] tc_ctl_tfilter+0x4f4/0xb80
[  256.615151]  [] ? tfilter_notify+0x140/0x140
[  256.615478]  [] ? do_syscall_64+0xef/0x190
[  256.615799]  [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[  256.616190]  [] ? sock_sendmsg+0x76/0x80
[  256.616484]  [] ? sock_write_iter+0x13f/0x1f0
[  256.616833]  [] ? __vfs_write+0x262/0x3c0
[  256.617152]  [] ? vfs_write+0xf9/0x260
[  256.617451]  [] ? SyS_write+0xc9/0x1b0
[  256.617754]  [] ? ns_capable_common+0x5a/0xa0
[  256.618067]  [] ? ns_capable+0x13/0x20
[  256.618334]  [] ? __netlink_ns_capable+0x6d/0x80
[  256.618666]  [] rtnetlink_rcv_msg+0x1af/0x410
[  256.618969]  [] ? netlink_compare+0x5b/0x70
[  256.619295]  [] ? rtnl_newlink+0xc60/0xc60
[  256.619587]  [] ? __netlink_lookup+0x1a4/0x240
[  256.619885]  [] ? netlink_broadcast+0x20/0x20
[  256.620179]  [] netlink_rcv_skb+0x155/0x190
[  256.620463]  [] ? rtnl_newlink+0xc60/0xc60
[  256.620748]  [] rtnetlink_rcv+0x28/0x30
[  256.621015]  [] netlink_unicast+0x2f1/0x3b0
[  256.621354]  [] ? netlink_attachskb+0x340/0x340
[  256.621765]  [] netlink_sendmsg+0x56e/0x6f0
[  256.622181]  [] ? netlink_unicast+0x3b0/0x3b0
[  256.622578]  [] ? netlink_unicast+0x3b0/0x3b0
[  256.622893]  [] sock_sendmsg+0x76/0x80
[  256.623157]  [] sock_write_iter+0x13f/0x1f0
[  256.623440]  [] ? sock_sendmsg+0x80/0x80
[  256.623729]  [] ? iov_iter_init+0x82/0xc0
[  256.624006]  [] __vfs_write+0x262/0x3c0
[  256.624274]  [] ? default_llseek+0x120/0x120
[  256.624566]  [] ? common_file_perm+0x92/0x170
[  256.624925]  [] ? rw_verify_area+0x78/0x140
[  256.625277]  [] vfs_write+0xf9/0x260
[  256.625593]  [] SyS_write+0xc9/0x1b0
[  256.625891]  [] ? SyS_read+0x1b0/0x1b0
[  256.626154]  [] ? SyS_read+0x1b0/0x1b0
[  256.626422]  [] do_syscall_64+0xef/0x190
[  256.626697]  [] entry_SYSCALL_64_after_swapgs+0x58/0xc6
[  256.627033] Object at ffff880021daaa80, in cache kmalloc-512 size: 512
[  256.627415] Allocated:
[  256.627563] PID = 164
[  256.627711]  save_stack_trace+0x1b/0x20
[  256.627947]  save_stack+0x46/0xd0
[  256.628151]  kasan_kmalloc+0xad/0xe0
[  256.628362]  kmem_cache_alloc_trace+0xe8/0x1e0
[  256.628637]  cbq_change_class+0x8b6/0xde0
[  256.628896]  tc_ctl_tclass+0x56a/0x5b0
[  256.629129]  rtnetlink_rcv_msg+0x1af/0x410
[  256.629380]  netlink_rcv_skb+0x155/0x190
[  256.629621]  rtnetlink_rcv+0x28/0x30
[  256.629840]  netlink_unicast+0x2f1/0x3b0
[  256.630066]  netlink_sendmsg+0x56e/0x6f0
[  256.630263]  sock_sendmsg+0x76/0x80
[  256.630456]  sock_write_iter+0x13f/0x1f0
[  256.630698]  __vfs_write+0x262/0x3c0
[  256.630918]  vfs_write+0xf9/0x260
[  256.631123]  SyS_write+0xc9/0x1b0
[  256.631327]  do_syscall_64+0xef/0x190
[  256.631553]  entry_SYSCALL_64_after_swapgs+0x58/0xc6
[  256.631827] Freed:
[  256.631931] PID = 164
[  256.632048]  save_stack_trace+0x1b/0x20
[  256.632241]  save_stack+0x46/0xd0
[  256.632408]  kasan_slab_free+0x71/0xb0
[  256.632597]  kfree+0x8c/0x1a0
[  256.632751]  cbq_destroy_class+0x85/0xa0
[  256.632948]  cbq_destroy+0xfa/0x120
[  256.633125]  qdisc_destroy+0xa1/0x140
[  256.633309]  dev_shutdown+0x12d/0x190
[  256.633497]  rollback_registered_many+0x43c/0x5b0
[  256.633753]  unregister_netdevice_many+0x2c/0x130
[  256.634041]  rtnl_delete_link+0xb3/0x100
[  256.634283]  rtnl_dellink+0x19c/0x360
[  256.634509]  rtnetlink_rcv_msg+0x1af/0x410
[  256.634760]  netlink_rcv_skb+0x155/0x190
[  256.635001]  rtnetlink_rcv+0x28/0x30
[  256.635221]  netlink_unicast+0x2f1/0x3b0
[  256.635463]  netlink_sendmsg+0x56e/0x6f0
[  256.635700]  sock_sendmsg+0x76/0x80
[  256.635915]  sock_write_iter+0x13f/0x1f0
[  256.636156]  __vfs_write+0x262/0x3c0
[  256.636376]  vfs_write+0xf9/0x260
[  256.636580]  SyS_write+0xc9/0x1b0
[  256.636787]  do_syscall_64+0xef/0x190
[  256.637013]  entry_SYSCALL_64_after_swapgs+0x58/0xc6
[  256.637316] Memory state around the buggy address:
[  256.637610]  ffff880021daaa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  256.638047]  ffff880021daab00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  256.638487] >ffff880021daab80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  256.638924]                                ^
[  256.639186]  ffff880021daac00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  256.639624]  ffff880021daac80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Signed-off-by: Thadeu Lima de Souza Cascardo

net_sched: restore "mpu xxx" handling

2022-01-27T07:47:42+00:00

commit fb80445c438c78b40b547d12b8d56596ce4ccfeb upstream.

commit 56b765b79e9a ("htb: improved accuracy at high rates") broke
"overhead X", "linklayer atm" and "mpu X" attributes.

"overhead X" and "linklayer atm" have already been fixed. This restores
the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping:

    tc class add ... htb rate X overhead 4 mpu 64

The code being fixed is used by htb, tbf and act_police. Cake has its
own mpu handling. qdisc_calculate_pkt_len still uses the size table
containing values adjusted for mpu by user space.

iproute2 tc has always passed mpu into the kernel via a tc_ratespec
structure, but the kernel never directly acted on it, merely stored it
so that it could be read back by `tc class show`.

Rather, tc would generate length-to-time tables that included the mpu
(and linklayer) in their construction, and the kernel used those tables.

Since v3.7, the tables were no longer used. Along with "mpu", this also
broke "overhead" and "linklayer" which were fixed in 01cb71d2d47b
("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84b171
("net_sched: restore "linklayer atm" handling", v3.11).

"overhead" was fixed by simply restoring use of tc_ratespec::overhead -
this had originally been used by the kernel but was initially omitted
from the new non-table-based calculations.

"linklayer" had been handled in the table like "mpu", but the mode was
not originally passed in tc_ratespec. The new implementation was made to
handle it by getting new versions of tc to pass the mode in an extended
tc_ratespec, and for older versions of tc the table contents were analysed
at load time to deduce linklayer.

As "mpu" has always been given to the kernel in tc_ratespec,
accompanying the mpu-based table, we can restore system functionality
with no userspace change by making the kernel act on the tc_ratespec
value.

Fixes: 56b765b79e9a ("htb: improved accuracy at high rates")
Signed-off-by: Kevin Bracey 
Cc: Eric Dumazet 
Cc: Jiri Pirko 
Cc: Vimalkumar 
Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman

sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc

2022-01-11T12:38:12+00:00

commit 7d18a07897d07495ee140dd319b0e9265c0f68ba upstream.

tx_queue_len can be set to ~0U, we need to be more
careful about overflows.

__fls(0) is undefined, as this report shows:

UBSAN: shift-out-of-bounds in net/sched/sch_qfq.c:1430:24
shift exponent 51770272 is too large for 32-bit type 'int'
CPU: 0 PID: 25574 Comm: syz-executor.0 Not tainted 5.16.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x201/0x2d8 lib/dump_stack.c:106
 ubsan_epilogue lib/ubsan.c:151 [inline]
 __ubsan_handle_shift_out_of_bounds+0x494/0x530 lib/ubsan.c:330
 qfq_init_qdisc+0x43f/0x450 net/sched/sch_qfq.c:1430
 qdisc_create+0x895/0x1430 net/sched/sch_api.c:1253
 tc_modify_qdisc+0x9d9/0x1e20 net/sched/sch_api.c:1660
 rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5571
 netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2496
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0xaea/0xe60 net/netlink/af_netlink.c:1921
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg net/socket.c:724 [inline]
 ____sys_sendmsg+0x5b9/0x910 net/socket.c:2409
 ___sys_sendmsg net/socket.c:2463 [inline]
 __sys_sendmsg+0x280/0x370 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net_sched: fix NULL deref in fifo_set_limit()

2021-10-17T08:05:39+00:00

[ Upstream commit 560ee196fe9e5037e5015e2cdb14b3aecb1cd7dc ]

syzbot reported another NULL deref in fifo_set_limit() [1]

I could repro the issue with :

unshare -n
tc qd add dev lo root handle 1:0 tbf limit 200000 burst 70000 rate 100Mbit
tc qd replace dev lo parent 1:0 pfifo_fast
tc qd change dev lo root handle 1:0 tbf limit 300000 burst 70000 rate 100Mbit

pfifo_fast does not have a change() operation.
Make fifo_set_limit() more robust about this.

[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 1cf99067 P4D 1cf99067 PUD 7ca49067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 14443 Comm: syz-executor959 Not tainted 5.15.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
RSP: 0018:ffffc9000e2f7310 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff8d6ecc00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff888024c27910 RDI: ffff888071e34000
RBP: ffff888071e34000 R08: 0000000000000001 R09: ffffffff8fcfb947
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888024c27910
R13: ffff888071e34018 R14: 0000000000000000 R15: ffff88801ef74800
FS:  00007f321d897700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000000722c3000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 fifo_set_limit net/sched/sch_fifo.c:242 [inline]
 fifo_set_limit+0x198/0x210 net/sched/sch_fifo.c:227
 tbf_change+0x6ec/0x16d0 net/sched/sch_tbf.c:418
 qdisc_change net/sched/sch_api.c:1332 [inline]
 tc_modify_qdisc+0xd9a/0x1a60 net/sched/sch_api.c:1634
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: fb0305ce1b03 ("net-sched: consolidate default fifo qdisc setup")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Link: https://lore.kernel.org/r/20210930212239.3430364-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Sasha Levin