<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/sched/act_sample.c, branch v4.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>net/sched: act_sample: fix NULL dereference in the data path</title>
<updated>2018-09-14T15:46:28+00:00</updated>
<author>
<name>Davide Caratti</name>
<email>dcaratti@redhat.com</email>
</author>
<published>2018-09-14T10:03:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=34043d250f51368f214aed7f54c2dc29c819a8c7'/>
<id>34043d250f51368f214aed7f54c2dc29c819a8c7</id>
<content type='text'>
Matteo reported the following splat, testing the datapath of TC 'sample':

 BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310
 Read of size 8 at addr 0000000000000000 by task nc/433

 CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
 Call Trace:
  kasan_report.cold.6+0x6c/0x2fa
  tcf_sample_act+0xc4/0x310
  ? dev_hard_start_xmit+0x117/0x180
  tcf_action_exec+0xa3/0x160
  tcf_classify+0xdd/0x1d0
  htb_enqueue+0x18e/0x6b0
  ? deref_stack_reg+0x7a/0xb0
  ? htb_delete+0x4b0/0x4b0
  ? unwind_next_frame+0x819/0x8f0
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  __dev_queue_xmit+0x722/0xca0
  ? unwind_get_return_address_ptr+0x50/0x50
  ? netdev_pick_tx+0xe0/0xe0
  ? save_stack+0x8c/0xb0
  ? kasan_kmalloc+0xbe/0xd0
  ? __kmalloc_track_caller+0xe4/0x1c0
  ? __kmalloc_reserve.isra.45+0x24/0x70
  ? __alloc_skb+0xdd/0x2e0
  ? sk_stream_alloc_skb+0x91/0x3b0
  ? tcp_sendmsg_locked+0x71b/0x15a0
  ? tcp_sendmsg+0x22/0x40
  ? __sys_sendto+0x1b0/0x250
  ? __x64_sys_sendto+0x6f/0x80
  ? do_syscall_64+0x5d/0x150
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ? __sys_sendto+0x1b0/0x250
  ? __x64_sys_sendto+0x6f/0x80
  ? do_syscall_64+0x5d/0x150
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ip_finish_output2+0x495/0x590
  ? ip_copy_metadata+0x2e0/0x2e0
  ? skb_gso_validate_network_len+0x6f/0x110
  ? ip_finish_output+0x174/0x280
  __tcp_transmit_skb+0xb17/0x12b0
  ? __tcp_select_window+0x380/0x380
  tcp_write_xmit+0x913/0x1de0
  ? __sk_mem_schedule+0x50/0x80
  tcp_sendmsg_locked+0x49d/0x15a0
  ? tcp_rcv_established+0x8da/0xa30
  ? tcp_set_state+0x220/0x220
  ? clear_user+0x1f/0x50
  ? iov_iter_zero+0x1ae/0x590
  ? __fget_light+0xa0/0xe0
  tcp_sendmsg+0x22/0x40
  __sys_sendto+0x1b0/0x250
  ? __ia32_sys_getpeername+0x40/0x40
  ? _copy_to_user+0x58/0x70
  ? poll_select_copy_remaining+0x176/0x200
  ? __pollwait+0x1c0/0x1c0
  ? ktime_get_ts64+0x11f/0x140
  ? kern_select+0x108/0x150
  ? core_sys_select+0x360/0x360
  ? vfs_read+0x127/0x150
  ? kernel_write+0x90/0x90
  __x64_sys_sendto+0x6f/0x80
  do_syscall_64+0x5d/0x150
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fefef2b129d
 Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
 RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d
 RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003
 RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
 R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8

tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init()
forgot to allocate them, because tcf_idr_create() was called with a wrong
value of 'cpustats'. Setting it to true proved to fix the reported crash.

Reported-by: Matteo Croce &lt;mcroce@redhat.com&gt;
Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Tested-by: Matteo Croce &lt;mcroce@redhat.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Matteo reported the following splat, testing the datapath of TC 'sample':

 BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310
 Read of size 8 at addr 0000000000000000 by task nc/433

 CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
 Call Trace:
  kasan_report.cold.6+0x6c/0x2fa
  tcf_sample_act+0xc4/0x310
  ? dev_hard_start_xmit+0x117/0x180
  tcf_action_exec+0xa3/0x160
  tcf_classify+0xdd/0x1d0
  htb_enqueue+0x18e/0x6b0
  ? deref_stack_reg+0x7a/0xb0
  ? htb_delete+0x4b0/0x4b0
  ? unwind_next_frame+0x819/0x8f0
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  __dev_queue_xmit+0x722/0xca0
  ? unwind_get_return_address_ptr+0x50/0x50
  ? netdev_pick_tx+0xe0/0xe0
  ? save_stack+0x8c/0xb0
  ? kasan_kmalloc+0xbe/0xd0
  ? __kmalloc_track_caller+0xe4/0x1c0
  ? __kmalloc_reserve.isra.45+0x24/0x70
  ? __alloc_skb+0xdd/0x2e0
  ? sk_stream_alloc_skb+0x91/0x3b0
  ? tcp_sendmsg_locked+0x71b/0x15a0
  ? tcp_sendmsg+0x22/0x40
  ? __sys_sendto+0x1b0/0x250
  ? __x64_sys_sendto+0x6f/0x80
  ? do_syscall_64+0x5d/0x150
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ? __sys_sendto+0x1b0/0x250
  ? __x64_sys_sendto+0x6f/0x80
  ? do_syscall_64+0x5d/0x150
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ip_finish_output2+0x495/0x590
  ? ip_copy_metadata+0x2e0/0x2e0
  ? skb_gso_validate_network_len+0x6f/0x110
  ? ip_finish_output+0x174/0x280
  __tcp_transmit_skb+0xb17/0x12b0
  ? __tcp_select_window+0x380/0x380
  tcp_write_xmit+0x913/0x1de0
  ? __sk_mem_schedule+0x50/0x80
  tcp_sendmsg_locked+0x49d/0x15a0
  ? tcp_rcv_established+0x8da/0xa30
  ? tcp_set_state+0x220/0x220
  ? clear_user+0x1f/0x50
  ? iov_iter_zero+0x1ae/0x590
  ? __fget_light+0xa0/0xe0
  tcp_sendmsg+0x22/0x40
  __sys_sendto+0x1b0/0x250
  ? __ia32_sys_getpeername+0x40/0x40
  ? _copy_to_user+0x58/0x70
  ? poll_select_copy_remaining+0x176/0x200
  ? __pollwait+0x1c0/0x1c0
  ? ktime_get_ts64+0x11f/0x140
  ? kern_select+0x108/0x150
  ? core_sys_select+0x360/0x360
  ? vfs_read+0x127/0x150
  ? kernel_write+0x90/0x90
  __x64_sys_sendto+0x6f/0x80
  do_syscall_64+0x5d/0x150
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fefef2b129d
 Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
 RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d
 RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003
 RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
 R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8

tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init()
forgot to allocate them, because tcf_idr_create() was called with a wrong
value of 'cpustats'. Setting it to true proved to fix the reported crash.

Reported-by: Matteo Croce &lt;mcroce@redhat.com&gt;
Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Tested-by: Matteo Croce &lt;mcroce@redhat.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net_sched: remove unnecessary ops-&gt;delete()</title>
<updated>2018-08-21T19:45:44+00:00</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2018-08-19T19:22:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=97a3f84f2c84f81b859aedd2c186df09c2ee21a6'/>
<id>97a3f84f2c84f81b859aedd2c186df09c2ee21a6</id>
<content type='text'>
All ops-&gt;delete() wants is getting the tn-&gt;idrinfo, but we already
have tc_action before calling ops-&gt;delete(), and tc_action has
a pointer -&gt;idrinfo.

More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().

So it can be just removed.

Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko &lt;jiri@mellanox.com&gt;
Cc: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All ops-&gt;delete() wants is getting the tn-&gt;idrinfo, but we already
have tc_action before calling ops-&gt;delete(), and tc_action has
a pointer -&gt;idrinfo.

More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().

So it can be just removed.

Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko &lt;jiri@mellanox.com&gt;
Cc: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: always disable bh when taking tcf_lock</title>
<updated>2018-08-19T17:46:21+00:00</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-08-14T18:46:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=653cd284a8a857ddfcf24f5bc3bd204a229f6c9f'/>
<id>653cd284a8a857ddfcf24f5bc3bd204a229f6c9f</id>
<content type='text'>
Recently, ops-&gt;init() and ops-&gt;dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.

Change ops-&gt;init() and ops-&gt;dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:

[  105.470398] ================================
[  105.475014] WARNING: inconsistent lock state
[  105.479628] 4.18.0-rc8+ #664 Not tainted
[  105.483897] --------------------------------
[  105.488511] inconsistent {SOFTIRQ-ON-W} -&gt; {IN-SOFTIRQ-W} usage.
[  105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  105.500449] 00000000f86c012e (&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[  105.509696] {SOFTIRQ-ON-W} state was registered at:
[  105.514925]   _raw_spin_lock+0x2c/0x40
[  105.519022]   tcf_bpf_init+0x579/0x820 [act_bpf]
[  105.523990]   tcf_action_init_1+0x4e4/0x660
[  105.528518]   tcf_action_init+0x1ce/0x2d0
[  105.532880]   tcf_exts_validate+0x1d8/0x200
[  105.537416]   fl_change+0x55a/0x268b [cls_flower]
[  105.542469]   tc_new_tfilter+0x748/0xa20
[  105.546738]   rtnetlink_rcv_msg+0x56a/0x6d0
[  105.551268]   netlink_rcv_skb+0x18d/0x200
[  105.555628]   netlink_unicast+0x2d0/0x370
[  105.559990]   netlink_sendmsg+0x3b9/0x6a0
[  105.564349]   sock_sendmsg+0x6b/0x80
[  105.568271]   ___sys_sendmsg+0x4a1/0x520
[  105.572547]   __sys_sendmsg+0xd7/0x150
[  105.576655]   do_syscall_64+0x72/0x2c0
[  105.580757]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  105.586243] irq event stamp: 489296
[  105.590084] hardirqs last  enabled at (489296): [&lt;ffffffffb507e639&gt;] _raw_spin_unlock_irq+0x29/0x40
[  105.599765] hardirqs last disabled at (489295): [&lt;ffffffffb507e745&gt;] _raw_spin_lock_irq+0x15/0x50
[  105.609277] softirqs last  enabled at (489292): [&lt;ffffffffb413a6a3&gt;] irq_enter+0x83/0xa0
[  105.618001] softirqs last disabled at (489293): [&lt;ffffffffb413a800&gt;] irq_exit+0x140/0x190
[  105.626813]
               other info that might help us debug this:
[  105.633976]  Possible unsafe locking scenario:

[  105.640526]        CPU0
[  105.643325]        ----
[  105.646125]   lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  105.650747]   &lt;Interrupt&gt;
[  105.653717]     lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  105.658514]
                *** DEADLOCK ***

[  105.665349] 1 lock held by swapper/16/0:
[  105.669629]  #0: 00000000a640ad99 ((&amp;est-&gt;timer)){+.-.}, at: call_timer_fn+0x10b/0x550
[  105.678200]
               stack backtrace:
[  105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[  105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  105.698626] Call Trace:
[  105.701421]  &lt;IRQ&gt;
[  105.703791]  dump_stack+0x92/0xeb
[  105.707461]  print_usage_bug+0x336/0x34c
[  105.711744]  mark_lock+0x7c9/0x980
[  105.715500]  ? print_shortest_lock_dependencies+0x2e0/0x2e0
[  105.721424]  ? check_usage_forwards+0x230/0x230
[  105.726315]  __lock_acquire+0x923/0x26f0
[  105.730597]  ? debug_show_all_locks+0x240/0x240
[  105.735478]  ? mark_lock+0x493/0x980
[  105.739412]  ? check_chain_key+0x140/0x1f0
[  105.743861]  ? __lock_acquire+0x836/0x26f0
[  105.748323]  ? lock_acquire+0x12e/0x290
[  105.752516]  lock_acquire+0x12e/0x290
[  105.756539]  ? est_fetch_counters+0x3c/0xa0
[  105.761084]  _raw_spin_lock+0x2c/0x40
[  105.765099]  ? est_fetch_counters+0x3c/0xa0
[  105.769633]  est_fetch_counters+0x3c/0xa0
[  105.773995]  est_timer+0x87/0x390
[  105.777670]  ? est_fetch_counters+0xa0/0xa0
[  105.782210]  ? lock_acquire+0x12e/0x290
[  105.786410]  call_timer_fn+0x161/0x550
[  105.790512]  ? est_fetch_counters+0xa0/0xa0
[  105.795055]  ? del_timer_sync+0xd0/0xd0
[  105.799249]  ? __lock_is_held+0x93/0x110
[  105.803531]  ? mark_held_locks+0x20/0xe0
[  105.807813]  ? _raw_spin_unlock_irq+0x29/0x40
[  105.812525]  ? est_fetch_counters+0xa0/0xa0
[  105.817069]  ? est_fetch_counters+0xa0/0xa0
[  105.821610]  run_timer_softirq+0x3c4/0x9f0
[  105.826064]  ? lock_acquire+0x12e/0x290
[  105.830257]  ? __bpf_trace_timer_class+0x10/0x10
[  105.835237]  ? __lock_is_held+0x25/0x110
[  105.839517]  __do_softirq+0x11d/0x7bf
[  105.843542]  irq_exit+0x140/0x190
[  105.847208]  smp_apic_timer_interrupt+0xac/0x3b0
[  105.852182]  apic_timer_interrupt+0xf/0x20
[  105.856628]  &lt;/IRQ&gt;
[  105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[  105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 &lt;4c&gt; 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[  105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
[  105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
[  105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
[  105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[  105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
[  105.929988]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.935232]  do_idle+0x28e/0x320
[  105.938817]  ? arch_cpu_idle_exit+0x40/0x40
[  105.943361]  ? mark_lock+0x8c1/0x980
[  105.947295]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.952619]  cpu_startup_entry+0xc2/0xd0
[  105.956900]  ? cpu_in_idle+0x20/0x20
[  105.960830]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.966146]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.971391]  start_secondary+0x2b5/0x360
[  105.975669]  ? set_cpu_sibling_map+0x1330/0x1330
[  105.980654]  secondary_startup_64+0xa5/0xb0

Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:

[  162.108959]  Possible interrupt unsafe locking scenario:

[  162.116386]        CPU0                    CPU1
[  162.121277]        ----                    ----
[  162.126162]   lock(psample_groups_lock);
[  162.130447]                                local_irq_disable();
[  162.136772]                                lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  162.143957]                                lock(psample_groups_lock);
[  162.150813]   &lt;Interrupt&gt;
[  162.153808]     lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  162.158608]
                *** DEADLOCK ***

In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.

Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recently, ops-&gt;init() and ops-&gt;dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.

Change ops-&gt;init() and ops-&gt;dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:

[  105.470398] ================================
[  105.475014] WARNING: inconsistent lock state
[  105.479628] 4.18.0-rc8+ #664 Not tainted
[  105.483897] --------------------------------
[  105.488511] inconsistent {SOFTIRQ-ON-W} -&gt; {IN-SOFTIRQ-W} usage.
[  105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  105.500449] 00000000f86c012e (&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[  105.509696] {SOFTIRQ-ON-W} state was registered at:
[  105.514925]   _raw_spin_lock+0x2c/0x40
[  105.519022]   tcf_bpf_init+0x579/0x820 [act_bpf]
[  105.523990]   tcf_action_init_1+0x4e4/0x660
[  105.528518]   tcf_action_init+0x1ce/0x2d0
[  105.532880]   tcf_exts_validate+0x1d8/0x200
[  105.537416]   fl_change+0x55a/0x268b [cls_flower]
[  105.542469]   tc_new_tfilter+0x748/0xa20
[  105.546738]   rtnetlink_rcv_msg+0x56a/0x6d0
[  105.551268]   netlink_rcv_skb+0x18d/0x200
[  105.555628]   netlink_unicast+0x2d0/0x370
[  105.559990]   netlink_sendmsg+0x3b9/0x6a0
[  105.564349]   sock_sendmsg+0x6b/0x80
[  105.568271]   ___sys_sendmsg+0x4a1/0x520
[  105.572547]   __sys_sendmsg+0xd7/0x150
[  105.576655]   do_syscall_64+0x72/0x2c0
[  105.580757]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  105.586243] irq event stamp: 489296
[  105.590084] hardirqs last  enabled at (489296): [&lt;ffffffffb507e639&gt;] _raw_spin_unlock_irq+0x29/0x40
[  105.599765] hardirqs last disabled at (489295): [&lt;ffffffffb507e745&gt;] _raw_spin_lock_irq+0x15/0x50
[  105.609277] softirqs last  enabled at (489292): [&lt;ffffffffb413a6a3&gt;] irq_enter+0x83/0xa0
[  105.618001] softirqs last disabled at (489293): [&lt;ffffffffb413a800&gt;] irq_exit+0x140/0x190
[  105.626813]
               other info that might help us debug this:
[  105.633976]  Possible unsafe locking scenario:

[  105.640526]        CPU0
[  105.643325]        ----
[  105.646125]   lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  105.650747]   &lt;Interrupt&gt;
[  105.653717]     lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  105.658514]
                *** DEADLOCK ***

[  105.665349] 1 lock held by swapper/16/0:
[  105.669629]  #0: 00000000a640ad99 ((&amp;est-&gt;timer)){+.-.}, at: call_timer_fn+0x10b/0x550
[  105.678200]
               stack backtrace:
[  105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[  105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  105.698626] Call Trace:
[  105.701421]  &lt;IRQ&gt;
[  105.703791]  dump_stack+0x92/0xeb
[  105.707461]  print_usage_bug+0x336/0x34c
[  105.711744]  mark_lock+0x7c9/0x980
[  105.715500]  ? print_shortest_lock_dependencies+0x2e0/0x2e0
[  105.721424]  ? check_usage_forwards+0x230/0x230
[  105.726315]  __lock_acquire+0x923/0x26f0
[  105.730597]  ? debug_show_all_locks+0x240/0x240
[  105.735478]  ? mark_lock+0x493/0x980
[  105.739412]  ? check_chain_key+0x140/0x1f0
[  105.743861]  ? __lock_acquire+0x836/0x26f0
[  105.748323]  ? lock_acquire+0x12e/0x290
[  105.752516]  lock_acquire+0x12e/0x290
[  105.756539]  ? est_fetch_counters+0x3c/0xa0
[  105.761084]  _raw_spin_lock+0x2c/0x40
[  105.765099]  ? est_fetch_counters+0x3c/0xa0
[  105.769633]  est_fetch_counters+0x3c/0xa0
[  105.773995]  est_timer+0x87/0x390
[  105.777670]  ? est_fetch_counters+0xa0/0xa0
[  105.782210]  ? lock_acquire+0x12e/0x290
[  105.786410]  call_timer_fn+0x161/0x550
[  105.790512]  ? est_fetch_counters+0xa0/0xa0
[  105.795055]  ? del_timer_sync+0xd0/0xd0
[  105.799249]  ? __lock_is_held+0x93/0x110
[  105.803531]  ? mark_held_locks+0x20/0xe0
[  105.807813]  ? _raw_spin_unlock_irq+0x29/0x40
[  105.812525]  ? est_fetch_counters+0xa0/0xa0
[  105.817069]  ? est_fetch_counters+0xa0/0xa0
[  105.821610]  run_timer_softirq+0x3c4/0x9f0
[  105.826064]  ? lock_acquire+0x12e/0x290
[  105.830257]  ? __bpf_trace_timer_class+0x10/0x10
[  105.835237]  ? __lock_is_held+0x25/0x110
[  105.839517]  __do_softirq+0x11d/0x7bf
[  105.843542]  irq_exit+0x140/0x190
[  105.847208]  smp_apic_timer_interrupt+0xac/0x3b0
[  105.852182]  apic_timer_interrupt+0xf/0x20
[  105.856628]  &lt;/IRQ&gt;
[  105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[  105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 &lt;4c&gt; 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[  105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
[  105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
[  105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
[  105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[  105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
[  105.929988]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.935232]  do_idle+0x28e/0x320
[  105.938817]  ? arch_cpu_idle_exit+0x40/0x40
[  105.943361]  ? mark_lock+0x8c1/0x980
[  105.947295]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.952619]  cpu_startup_entry+0xc2/0xd0
[  105.956900]  ? cpu_in_idle+0x20/0x20
[  105.960830]  ? _raw_spin_unlock_irqrestore+0x32/0x60
[  105.966146]  ? trace_hardirqs_on_caller+0x141/0x2d0
[  105.971391]  start_secondary+0x2b5/0x360
[  105.975669]  ? set_cpu_sibling_map+0x1330/0x1330
[  105.980654]  secondary_startup_64+0xa5/0xb0

Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:

[  162.108959]  Possible interrupt unsafe locking scenario:

[  162.116386]        CPU0                    CPU1
[  162.121277]        ----                    ----
[  162.126162]   lock(psample_groups_lock);
[  162.130447]                                local_irq_disable();
[  162.136772]                                lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  162.143957]                                lock(psample_groups_lock);
[  162.150813]   &lt;Interrupt&gt;
[  162.153808]     lock(&amp;(&amp;p-&gt;tcfa_lock)-&gt;rlock);
[  162.158608]
                *** DEADLOCK ***

In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.

Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: act_sample: remove dependency on rtnl lock</title>
<updated>2018-08-11T19:37:09+00:00</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-08-10T17:51:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d7728495665601658c7f94f3b5fa4e3f54d71c18'/>
<id>d7728495665601658c7f94f3b5fa4e3f54d71c18</id>
<content type='text'>
Use tcf spinlock to protect private sample action data from concurrent
modification during dump and init.

Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use tcf spinlock to protect private sample action data from concurrent
modification during dump and init.

Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tc/act: remove unneeded RCU lock in action callback</title>
<updated>2018-07-30T16:31:13+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2018-07-30T12:30:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fd4b288ea6a3e45ad8afbcd5ec39554d57f1ae0'/>
<id>7fd4b288ea6a3e45ad8afbcd5ec39554d57f1ae0</id>
<content type='text'>
Each lockless action currently does its own RCU locking in -&gt;act().
This allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with the _bh variant, cleans up a bit the surrounding code and
documents the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

v1 -&gt; v2:
 - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
   as pointed out by Daniel

v3 -&gt; v4:
 - fixed some typos in the commit message (JiriP)

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Each lockless action currently does its own RCU locking in -&gt;act().
This allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with the _bh variant, cleans up a bit the surrounding code and
documents the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

v1 -&gt; v2:
 - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
   as pointed out by Daniel

v3 -&gt; v4:
 - fixed some typos in the commit message (JiriP)

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: atomically check-allocate action</title>
<updated>2018-07-08T03:42:29+00:00</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-07-05T14:24:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0190c1d452a91c38a3462abdd81752be1b9006a8'/>
<id>0190c1d452a91c38a3462abdd81752be1b9006a8</id>
<content type='text'>
Implement function that atomically checks if action exists and either takes
reference to it, or allocates idr slot for action index to prevent
concurrent allocations of actions with same index. Use EBUSY error pointer
to indicate that idr slot is reserved.

Implement cleanup helper function that removes temporary error pointer from
idr. (in case of error between idr allocation and insertion of newly
created action to specified index)

Refactor all action init functions to insert new action to idr using this
API.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implement function that atomically checks if action exists and either takes
reference to it, or allocates idr slot for action index to prevent
concurrent allocations of actions with same index. Use EBUSY error pointer
to indicate that idr slot is reserved.

Implement cleanup helper function that removes temporary error pointer from
idr. (in case of error between idr allocation and insertion of newly
created action to specified index)

Refactor all action init functions to insert new action to idr using this
API.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: don't release reference on action overwrite</title>
<updated>2018-07-08T03:42:29+00:00</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-07-05T14:24:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e8ddd7f1758ca4ddd0c1f7cf3e66fce736241d2'/>
<id>4e8ddd7f1758ca4ddd0c1f7cf3e66fce736241d2</id>
<content type='text'>
Return from action init function with reference to action taken,
even when overwriting existing action.

Action init API initializes its fourth argument (pointer to pointer to tc
action) to either existing action with same index or newly created action.
In case of existing index(and bind argument is zero), init function returns
without incrementing action reference counter. Caller of action init then
proceeds working with action, without actually holding reference to it.
This means that action could be deleted concurrently.

Change action init behavior to always take reference to action before
returning successfully, in order to protect from concurrent deletion.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Return from action init function with reference to action taken,
even when overwriting existing action.

Action init API initializes its fourth argument (pointer to pointer to tc
action) to either existing action with same index or newly created action.
In case of existing index(and bind argument is zero), init function returns
without incrementing action reference counter. Caller of action init then
proceeds working with action, without actually holding reference to it.
This means that action could be deleted concurrently.

Change action init behavior to always take reference to action before
returning successfully, in order to protect from concurrent deletion.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: add 'delete' function to action ops</title>
<updated>2018-07-08T03:42:29+00:00</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-07-05T14:24:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b409074e6693bcdaa7abbee2a035f22a9eabda53'/>
<id>b409074e6693bcdaa7abbee2a035f22a9eabda53</id>
<content type='text'>
Extend action ops with 'delete' function. Each action type to implements
its own delete function that doesn't depend on rtnl lock.

Implement delete function that is required to delete actions without
holding rtnl lock. Use action API function that atomically deletes action
only if it is still in action idr. This implementation prevents concurrent
threads from deleting same action twice.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extend action ops with 'delete' function. Each action type to implements
its own delete function that doesn't depend on rtnl lock.

Implement delete function that is required to delete actions without
holding rtnl lock. Use action API function that atomically deletes action
only if it is still in action idr. This implementation prevents concurrent
threads from deleting same action twice.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: implement unlocked action init API</title>
<updated>2018-07-08T03:42:28+00:00</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-07-05T14:24:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=789871bb2a0381425b106d2a995bde1460d35a34'/>
<id>789871bb2a0381425b106d2a995bde1460d35a34</id>
<content type='text'>
Add additional 'rtnl_held' argument to act API init functions. It is
required to implement actions that need to release rtnl lock before loading
kernel module and reacquire if afterwards.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add additional 'rtnl_held' argument to act API init functions. It is
required to implement actions that need to release rtnl lock before loading
kernel module and reacquire if afterwards.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: change type of reference and bind counters</title>
<updated>2018-07-08T03:42:28+00:00</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-07-05T14:24:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=036bb44327f50273e85ee4a2c9b56eebce1c0838'/>
<id>036bb44327f50273e85ee4a2c9b56eebce1c0838</id>
<content type='text'>
Change type of action reference counter to refcount_t.

Change type of action bind counter to atomic_t.
This type is used to allow decrementing bind counter without testing
for 0 result.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change type of action reference counter to refcount_t.

Change type of action bind counter to atomic_t.
This type is used to allow decrementing bind counter without testing
for 0 result.

Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
