<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/sched, branch v4.20</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>net/sched: cls_flower: Remove old entries from rhashtable</title>
<updated>2018-12-20T00:36:55+00:00</updated>
<author>
<name>Roi Dayan</name>
<email>roid@mellanox.com</email>
</author>
<published>2018-12-19T16:07:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=599d2570b2da7c2f5419332b42b7999d79c85959'/>
<id>599d2570b2da7c2f5419332b42b7999d79c85959</id>
<content type='text'>
When replacing a rule we add the new rule to the rhashtable
but only remove the old one if it is not in skip_sw.
This commit fixes this by always removing the old rule.

Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw")
Signed-off-by: Roi Dayan &lt;roid@mellanox.com&gt;
Reviewed-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Acked-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When replacing a rule we add the new rule to the rhashtable
but only remove the old one if it is not in skip_sw.
This commit fixes this by always removing the old rule.

Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw")
Signed-off-by: Roi Dayan &lt;roid@mellanox.com&gt;
Reviewed-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Acked-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/sched: cls_flower: Reject duplicated rules also under skip_sw</title>
<updated>2018-12-09T19:55:08+00:00</updated>
<author>
<name>Or Gerlitz</name>
<email>ogerlitz@mellanox.com</email>
</author>
<published>2018-12-09T16:10:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=35cc3cefc4de90001c9137e2d01dd9d06b11acfb'/>
<id>35cc3cefc4de90001c9137e2d01dd9d06b11acfb</id>
<content type='text'>
Currently, duplicated rules are rejected only for skip_hw or "none",
hence allowing users to push duplicates into HW for no reason.

Use the flower tables to protect against that.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Paul Blakey &lt;paulb@mellanox.com&gt;
Reported-by: Chris Mi &lt;chrism@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, duplicated rules are rejected only for skip_hw or "none",
hence allowing users to push duplicates into HW for no reason.

Use the flower tables to protect against that.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Paul Blakey &lt;paulb@mellanox.com&gt;
Reported-by: Chris Mi &lt;chrism@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/sched: act_police: fix memory leak in case of invalid control action</title>
<updated>2018-12-01T01:14:06+00:00</updated>
<author>
<name>Davide Caratti</name>
<email>dcaratti@redhat.com</email>
</author>
<published>2018-11-28T17:43:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fd6d433865a2ad1f7e018ef80408cb3dc3be1ab3'/>
<id>fd6d433865a2ad1f7e018ef80408cb3dc3be1ab3</id>
<content type='text'>
when users set an invalid control action, kmemleak complains as follows:

 # echo clear &gt;/sys/kernel/debug/kmemleak
 # ./tdc.py -e b48b
 Test b48b: Add police action with exceed goto chain control action
 All test results:

 1..1
 ok 1 - b48b # Add police action with exceed goto chain control action
 about to flush the tap output if tests need to be skipped
 done flushing skipped test tap output
 # echo scan &gt;/sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
 unreferenced object 0xffffa0fafbc3dde0 (size 96):
  comm "tc", pid 2358, jiffies 4294922738 (age 17.022s)
  hex dump (first 32 bytes):
    2a 00 00 20 00 00 00 00 00 00 7d 00 00 00 00 00  *.. ......}.....
    f8 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;00000000648803d2&gt;] tcf_action_init_1+0x384/0x4c0
    [&lt;00000000cb69382e&gt;] tcf_action_init+0x12b/0x1a0
    [&lt;00000000847ef0d4&gt;] tcf_action_add+0x73/0x170
    [&lt;0000000093656e14&gt;] tc_ctl_action+0x122/0x160
    [&lt;0000000023c98e32&gt;] rtnetlink_rcv_msg+0x263/0x2d0
    [&lt;000000003493ae9c&gt;] netlink_rcv_skb+0x4d/0x130
    [&lt;00000000de63f8ba&gt;] netlink_unicast+0x209/0x2d0
    [&lt;00000000c3da0ebe&gt;] netlink_sendmsg+0x2c1/0x3c0
    [&lt;000000007a9e0753&gt;] sock_sendmsg+0x33/0x40
    [&lt;00000000457c6d2e&gt;] ___sys_sendmsg+0x2a0/0x2f0
    [&lt;00000000c5c6a086&gt;] __sys_sendmsg+0x5e/0xa0
    [&lt;00000000446eafce&gt;] do_syscall_64+0x5b/0x180
    [&lt;000000004aa871f2&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [&lt;00000000450c38ef&gt;] 0xffffffffffffffff

change tcf_police_init() to avoid leaking 'new' in case TCA_POLICE_RESULT
contains TC_ACT_GOTO_CHAIN extended action.

Fixes: c08f5ed5d625 ("net/sched: act_police: disallow 'goto chain' on fallback control action")
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
when users set an invalid control action, kmemleak complains as follows:

 # echo clear &gt;/sys/kernel/debug/kmemleak
 # ./tdc.py -e b48b
 Test b48b: Add police action with exceed goto chain control action
 All test results:

 1..1
 ok 1 - b48b # Add police action with exceed goto chain control action
 about to flush the tap output if tests need to be skipped
 done flushing skipped test tap output
 # echo scan &gt;/sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
 unreferenced object 0xffffa0fafbc3dde0 (size 96):
  comm "tc", pid 2358, jiffies 4294922738 (age 17.022s)
  hex dump (first 32 bytes):
    2a 00 00 20 00 00 00 00 00 00 7d 00 00 00 00 00  *.. ......}.....
    f8 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;00000000648803d2&gt;] tcf_action_init_1+0x384/0x4c0
    [&lt;00000000cb69382e&gt;] tcf_action_init+0x12b/0x1a0
    [&lt;00000000847ef0d4&gt;] tcf_action_add+0x73/0x170
    [&lt;0000000093656e14&gt;] tc_ctl_action+0x122/0x160
    [&lt;0000000023c98e32&gt;] rtnetlink_rcv_msg+0x263/0x2d0
    [&lt;000000003493ae9c&gt;] netlink_rcv_skb+0x4d/0x130
    [&lt;00000000de63f8ba&gt;] netlink_unicast+0x209/0x2d0
    [&lt;00000000c3da0ebe&gt;] netlink_sendmsg+0x2c1/0x3c0
    [&lt;000000007a9e0753&gt;] sock_sendmsg+0x33/0x40
    [&lt;00000000457c6d2e&gt;] ___sys_sendmsg+0x2a0/0x2f0
    [&lt;00000000c5c6a086&gt;] __sys_sendmsg+0x5e/0xa0
    [&lt;00000000446eafce&gt;] do_syscall_64+0x5b/0x180
    [&lt;000000004aa871f2&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [&lt;00000000450c38ef&gt;] 0xffffffffffffffff

change tcf_police_init() to avoid leaking 'new' in case TCA_POLICE_RESULT
contains TC_ACT_GOTO_CHAIN extended action.

Fixes: c08f5ed5d625 ("net/sched: act_police: disallow 'goto chain' on fallback control action")
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Prevent invalid access to skb-&gt;prev in __qdisc_drop_all</title>
<updated>2018-11-30T00:27:27+00:00</updated>
<author>
<name>Christoph Paasch</name>
<email>cpaasch@apple.com</email>
</author>
<published>2018-11-30T00:01:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9410d386d0a829ace9558336263086c2fbbe8aed'/>
<id>9410d386d0a829ace9558336263086c2fbbe8aed</id>
<content type='text'>
__qdisc_drop_all() accesses skb-&gt;prev to get to the tail of the
segment-list.

With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb-&gt;next to NULL and set
the list-poison on skb-&gt;prev.

With that change, __qdisc_drop_all() will panic when it tries to
dereference skb-&gt;prev.

Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb-&gt;prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:

[   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
[   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 &lt;42&gt; 0f b6 04 30 84 c0 74 08 3c 04
[   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[   34.514593] Call Trace:
[   34.514893]  &lt;IRQ&gt;
[   34.515157]  napi_gro_receive+0x93/0x150
[   34.515632]  receive_buf+0x893/0x3700
[   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
[   34.516629]  ? virtnet_probe+0x1b40/0x1b40
[   34.517153]  ? __stable_node_chain+0x4d0/0x850
[   34.517684]  ? kfree+0x9a/0x180
[   34.518067]  ? __kasan_slab_free+0x171/0x190
[   34.518582]  ? detach_buf+0x1df/0x650
[   34.519061]  ? lapic_next_event+0x5a/0x90
[   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
[   34.520093]  virtnet_poll+0x2df/0xd60
[   34.520533]  ? receive_buf+0x3700/0x3700
[   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
[   34.521631]  ? htb_dequeue+0x1817/0x25f0
[   34.522107]  ? sch_direct_xmit+0x142/0xf30
[   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
[   34.523155]  net_rx_action+0x2f6/0xc50
[   34.523601]  ? napi_complete_done+0x2f0/0x2f0
[   34.524126]  ? kasan_check_read+0x11/0x20
[   34.524608]  ? _raw_spin_lock+0x7d/0xd0
[   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
[   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
[   34.526130]  ? apic_ack_irq+0x9e/0xe0
[   34.526567]  __do_softirq+0x188/0x4b5
[   34.527015]  irq_exit+0x151/0x180
[   34.527417]  do_IRQ+0xdb/0x150
[   34.527783]  common_interrupt+0xf/0xf
[   34.528223]  &lt;/IRQ&gt;

This patch makes sure that skb-&gt;prev is set to NULL when entering
netem_enqueue.

Cc: Prashant Bhole &lt;bhole_prashant_q7@lab.ntt.co.jp&gt;
Cc: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Christoph Paasch &lt;cpaasch@apple.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__qdisc_drop_all() accesses skb-&gt;prev to get to the tail of the
segment-list.

With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb-&gt;next to NULL and set
the list-poison on skb-&gt;prev.

With that change, __qdisc_drop_all() will panic when it tries to
dereference skb-&gt;prev.

Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb-&gt;prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:

[   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
[   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 &lt;42&gt; 0f b6 04 30 84 c0 74 08 3c 04
[   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[   34.514593] Call Trace:
[   34.514893]  &lt;IRQ&gt;
[   34.515157]  napi_gro_receive+0x93/0x150
[   34.515632]  receive_buf+0x893/0x3700
[   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
[   34.516629]  ? virtnet_probe+0x1b40/0x1b40
[   34.517153]  ? __stable_node_chain+0x4d0/0x850
[   34.517684]  ? kfree+0x9a/0x180
[   34.518067]  ? __kasan_slab_free+0x171/0x190
[   34.518582]  ? detach_buf+0x1df/0x650
[   34.519061]  ? lapic_next_event+0x5a/0x90
[   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
[   34.520093]  virtnet_poll+0x2df/0xd60
[   34.520533]  ? receive_buf+0x3700/0x3700
[   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
[   34.521631]  ? htb_dequeue+0x1817/0x25f0
[   34.522107]  ? sch_direct_xmit+0x142/0xf30
[   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
[   34.523155]  net_rx_action+0x2f6/0xc50
[   34.523601]  ? napi_complete_done+0x2f0/0x2f0
[   34.524126]  ? kasan_check_read+0x11/0x20
[   34.524608]  ? _raw_spin_lock+0x7d/0xd0
[   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
[   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
[   34.526130]  ? apic_ack_irq+0x9e/0xe0
[   34.526567]  __do_softirq+0x188/0x4b5
[   34.527015]  irq_exit+0x151/0x180
[   34.527417]  do_IRQ+0xdb/0x150
[   34.527783]  common_interrupt+0xf/0xf
[   34.528223]  &lt;/IRQ&gt;

This patch makes sure that skb-&gt;prev is set to NULL when entering
netem_enqueue.

Cc: Prashant Bhole &lt;bhole_prashant_q7@lab.ntt.co.jp&gt;
Cc: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Christoph Paasch &lt;cpaasch@apple.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/sched: act_police: add missing spinlock initialization</title>
<updated>2018-11-23T19:20:02+00:00</updated>
<author>
<name>Davide Caratti</name>
<email>dcaratti@redhat.com</email>
</author>
<published>2018-11-21T17:23:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=484afd1bd3fc6f9f5347289fc8b285aa65f67054'/>
<id>484afd1bd3fc6f9f5347289fc8b285aa65f67054</id>
<content type='text'>
commit f2cbd4852820 ("net/sched: act_police: fix race condition on state
variables") introduces a new spinlock, but forgets its initialization.
Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
action is newly created, to avoid the following lockdep splat:

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 &lt;...&gt;
 Call Trace:
  dump_stack+0x85/0xcb
  register_lock_class+0x581/0x590
  __lock_acquire+0xd4/0x1330
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? lock_acquire+0x9e/0x1a0
  lock_acquire+0x9e/0x1a0
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? tcf_police_init+0x55a/0x650 [act_police]
  _raw_spin_lock_bh+0x34/0x40
  ? tcf_police_init+0x2fa/0x650 [act_police]
  tcf_police_init+0x2fa/0x650 [act_police]
  tcf_action_init_1+0x384/0x4c0
  tcf_action_init+0xf6/0x160
  tcf_action_add+0x73/0x170
  tc_ctl_action+0x122/0x160
  rtnetlink_rcv_msg+0x2a4/0x490
  ? netlink_deliver_tap+0x99/0x400
  ? validate_linkmsg+0x370/0x370
  netlink_rcv_skb+0x4d/0x130
  netlink_unicast+0x196/0x230
  netlink_sendmsg+0x2e5/0x3e0
  sock_sendmsg+0x36/0x40
  ___sys_sendmsg+0x280/0x2f0
  ? _raw_spin_unlock+0x24/0x30
  ? handle_pte_fault+0xafe/0xf30
  ? find_held_lock+0x2d/0x90
  ? syscall_trace_enter+0x1df/0x360
  ? __sys_sendmsg+0x5e/0xa0
  __sys_sendmsg+0x5e/0xa0
  do_syscall_64+0x60/0x210
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7f1841c7cf10
 Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 &lt;48&gt; 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
 RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
 RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
 RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
 R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
 R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000

Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables")
Reported-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f2cbd4852820 ("net/sched: act_police: fix race condition on state
variables") introduces a new spinlock, but forgets its initialization.
Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
action is newly created, to avoid the following lockdep splat:

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 &lt;...&gt;
 Call Trace:
  dump_stack+0x85/0xcb
  register_lock_class+0x581/0x590
  __lock_acquire+0xd4/0x1330
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? lock_acquire+0x9e/0x1a0
  lock_acquire+0x9e/0x1a0
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? tcf_police_init+0x55a/0x650 [act_police]
  _raw_spin_lock_bh+0x34/0x40
  ? tcf_police_init+0x2fa/0x650 [act_police]
  tcf_police_init+0x2fa/0x650 [act_police]
  tcf_action_init_1+0x384/0x4c0
  tcf_action_init+0xf6/0x160
  tcf_action_add+0x73/0x170
  tc_ctl_action+0x122/0x160
  rtnetlink_rcv_msg+0x2a4/0x490
  ? netlink_deliver_tap+0x99/0x400
  ? validate_linkmsg+0x370/0x370
  netlink_rcv_skb+0x4d/0x130
  netlink_unicast+0x196/0x230
  netlink_sendmsg+0x2e5/0x3e0
  sock_sendmsg+0x36/0x40
  ___sys_sendmsg+0x280/0x2f0
  ? _raw_spin_unlock+0x24/0x30
  ? handle_pte_fault+0xafe/0xf30
  ? find_held_lock+0x2d/0x90
  ? syscall_trace_enter+0x1df/0x360
  ? __sys_sendmsg+0x5e/0xa0
  __sys_sendmsg+0x5e/0xa0
  do_syscall_64+0x60/0x210
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7f1841c7cf10
 Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 &lt;48&gt; 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
 RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
 RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
 RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
 R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
 R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000

Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables")
Reported-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/sched: act_police: fix race condition on state variables</title>
<updated>2018-11-20T22:59:58+00:00</updated>
<author>
<name>Davide Caratti</name>
<email>dcaratti@redhat.com</email>
</author>
<published>2018-11-20T21:18:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f2cbd485282014132851bf37cb2ca624a456275d'/>
<id>f2cbd485282014132851bf37cb2ca624a456275d</id>
<content type='text'>
after 'police' configuration parameters were converted to use RCU instead
of spinlock, the state variables used to compute the traffic rate (namely
'tcfp_toks', 'tcfp_ptoks' and 'tcfp_t_c') are erroneously read/updated in
the traffic path without any protection.

Use a dedicated spinlock to avoid race conditions on these variables, and
ensure proper cache-line alignment. In this way, 'police' is still faster
than what we observed when 'tcf_lock' was used in the traffic path, i.e.
reverting commit 2d550dbad83c ("net/sched: act_police: don't use spinlock
in the data path"). Moreover, we preserve the throughput improvement that
was obtained after 'police' started using per-cpu counters, when 'avrate'
is used instead of 'rate'.

Changes since v1 (thanks to Eric Dumazet):
- call ktime_get_ns() before acquiring the lock in the traffic path
- use a dedicated spinlock instead of tcf_lock
- improve cache-line usage

Fixes: 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path")
Reported-and-suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
after 'police' configuration parameters were converted to use RCU instead
of spinlock, the state variables used to compute the traffic rate (namely
'tcfp_toks', 'tcfp_ptoks' and 'tcfp_t_c') are erroneously read/updated in
the traffic path without any protection.

Use a dedicated spinlock to avoid race conditions on these variables, and
ensure proper cache-line alignment. In this way, 'police' is still faster
than what we observed when 'tcf_lock' was used in the traffic path, i.e.
reverting commit 2d550dbad83c ("net/sched: act_police: don't use spinlock
in the data path"). Moreover, we preserve the throughput improvement that
was obtained after 'police' started using per-cpu counters, when 'avrate'
is used instead of 'rate'.

Changes since v1 (thanks to Eric Dumazet):
- call ktime_get_ns() before acquiring the lock in the traffic path
- use a dedicated spinlock instead of tcf_lock
- improve cache-line usage

Fixes: 2d550dbad83c ("net/sched: act_police: don't use spinlock in the data path")
Reported-and-suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/sched: act_pedit: fix memory leak when IDR allocation fails</title>
<updated>2018-11-17T03:53:45+00:00</updated>
<author>
<name>Davide Caratti</name>
<email>dcaratti@redhat.com</email>
</author>
<published>2018-11-14T11:17:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=19ab69107d3ecfb7cd3e38ad262a881be40c01a3'/>
<id>19ab69107d3ecfb7cd3e38ad262a881be40c01a3</id>
<content type='text'>
tcf_idr_check_alloc() can return a negative value on allocation failure
(-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases.

Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tcf_idr_check_alloc() can return a negative value on allocation failure
(-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases.

Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows</title>
<updated>2018-11-15T19:42:12+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-11-13T00:17:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=08e14fe429a07475ee9f29a283945d602e4a6d92'/>
<id>08e14fe429a07475ee9f29a283945d602e4a6d92</id>
<content type='text'>
When EDT conversion happened, fq lost the ability to enforce a maxrate
for all flows. It kept it only for non-EDT flows.

This commit restores the functionality.

Tested:

tc qd replace dev eth0 root fq maxrate 500Mbit
netperf -P0 -H host -- -O THROUGHPUT
489.75

Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When EDT conversion happened, fq lost the ability to enforce a maxrate
for all flows. It kept it only for non-EDT flows.

This commit restores the functionality.

Tested:

tc qd replace dev eth0 root fq maxrate 500Mbit
netperf -P0 -H host -- -O THROUGHPUT
489.75

Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>act_mirred: clear skb-&gt;tstamp on redirect</title>
<updated>2018-11-11T18:21:31+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-11-11T00:22:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7236ead1b14923f3ba35cd29cce13246be83f451'/>
<id>7236ead1b14923f3ba35cd29cce13246be83f451</id>
<content type='text'>
If sch_fq is used at ingress, skbs that might have been
timestamped by net_timestamp_set() if a packet capture
is requesting timestamps could be delayed by an arbitrary
amount of time, since the sch_fq time base is MONOTONIC.

Fix this problem by moving code from sch_netem.c to act_mirred.c.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If sch_fq is used at ingress, skbs that might have been
timestamped by net_timestamp_set() if a packet capture
is requesting timestamps could be delayed by an arbitrary
amount of time, since the sch_fq time base is MONOTONIC.

Fix this problem by moving code from sch_netem.c to act_mirred.c.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: cls_flower: validate nested enc_opts_policy to avoid warning</title>
<updated>2018-11-10T17:55:30+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2018-11-10T05:06:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=63c82997f5c0f3e1b914af43d82f712a86bc5f3a'/>
<id>63c82997f5c0f3e1b914af43d82f712a86bc5f3a</id>
<content type='text'>
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
currently contain further nested attributes, which are parsed by
hand, so the policy is never actually used resulting in a W=1
build warning:

net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
 enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {

Add the validation anyway to avoid potential bugs when other
attributes are added and to make the attribute structure slightly
more clear.  Validation will also set extack to point to the bad
attribute on error.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Acked-by: Simon Horman &lt;simon.horman@netronome.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
currently contain further nested attributes, which are parsed by
hand, so the policy is never actually used resulting in a W=1
build warning:

net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
 enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {

Add the validation anyway to avoid potential bugs when other
attributes are added and to make the attribute structure slightly
more clear.  Validation will also set extack to point to the bad
attribute on error.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Acked-by: Simon Horman &lt;simon.horman@netronome.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
