<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/net/sch_generic.h, branch v4.8.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>net_sched: generalize bulk dequeue</title>
<updated>2016-06-25T16:19:35+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-06-22T06:16:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4d202a0d31b96ab3324b21e7500d9a2da9ef57dd'/>
<id>4d202a0d31b96ab3324b21e7500d9a2da9ef57dd</id>
<content type='text'>
When qdisc bulk dequeue was added in linux-3.18 (commit
5772e9a3463b "qdisc: bulk dequeue support for qdiscs
with TCQ_F_ONETXQUEUE"), it was constrained to some
specific qdiscs.

With some extra care, we can extend this to all qdiscs,
so that typical traffic shaping solutions can benefit from
small batches (8 packets in this patch).

For example, HTB is often used on some multi queue device.
And bonding/team are multi queue devices...

Idea is to bulk-dequeue packets mapping to the same transmit queue.

This brings between 35 and 80 % performance increase in HTB setup
under pressure on a bonding setup :

1) NUMA node contention :   610,000 pps -&gt; 1,110,000 pps
2) No node contention   : 1,380,000 pps -&gt; 1,930,000 pps

Now we should work to add batches on the enqueue() side ;)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When qdisc bulk dequeue was added in linux-3.18 (commit
5772e9a3463b "qdisc: bulk dequeue support for qdiscs
with TCQ_F_ONETXQUEUE"), it was constrained to some
specific qdiscs.

With some extra care, we can extend this to all qdiscs,
so that typical traffic shaping solutions can benefit from
small batches (8 packets in this patch).

For example, HTB is often used on some multi queue device.
And bonding/team are multi queue devices...

Idea is to bulk-dequeue packets mapping to the same transmit queue.

This brings between 35 and 80 % performance increase in HTB setup
under pressure on a bonding setup :

1) NUMA node contention :   610,000 pps -&gt; 1,110,000 pps
2) No node contention   : 1,380,000 pps -&gt; 1,930,000 pps

Now we should work to add batches on the enqueue() side ;)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net_sched: drop packets after root qdisc lock is released</title>
<updated>2016-06-25T16:19:35+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-06-22T06:16:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=520ac30f45519b0a82dd92117c181d1d6144677b'/>
<id>520ac30f45519b0a82dd92117c181d1d6144677b</id>
<content type='text'>
Qdisc performance suffers when packets are dropped at enqueue()
time because drops (kfree_skb()) are done while qdisc lock is held,
delaying a dequeue() draining the queue.

Nominal throughput can be reduced by 50 % when this happens,
at a time we would like the dequeue() to proceed as fast as possible.

Even FQ is vulnerable to this problem, while one of FQ goals was
to provide some flow isolation.

This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc-&gt;enqueue(), and in qdisc_drop() helper.

I measured a performance increase of up to 12 %, but this patch
is a prereq so that future batches in enqueue() can fly.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Qdisc performance suffers when packets are dropped at enqueue()
time because drops (kfree_skb()) are done while qdisc lock is held,
delaying a dequeue() draining the queue.

Nominal throughput can be reduced by 50 % when this happens,
at a time we would like the dequeue() to proceed as fast as possible.

Even FQ is vulnerable to this problem, while one of FQ goals was
to provide some flow isolation.

This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc-&gt;enqueue(), and in qdisc_drop() helper.

I measured a performance increase of up to 12 %, but this patch
is a prereq so that future batches in enqueue() can fly.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net_sched: add the ability to defer skb freeing</title>
<updated>2016-06-15T21:08:34+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-06-14T03:21:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1b5c5493e3e68181be344cb51bf9df192d05ffc2'/>
<id>1b5c5493e3e68181be344cb51bf9df192d05ffc2</id>
<content type='text'>
qdisc are changed under RTNL protection and often
while blocking BH and root qdisc spinlock.

When lots of skbs need to be dropped, we free
them under these locks causing TX/RX freezes,
and more generally latency spikes.

This commit adds rtnl_kfree_skbs(), used to queue
skbs for deferred freeing.

Actual freeing happens right after RTNL is released,
with appropriate scheduling points.

rtnl_qdisc_drop() can also be used in place
of disc_drop() when RTNL is held.

qdisc_reset_queue() and __qdisc_reset_queue() get
the new behavior, so standard qdiscs like pfifo, pfifo_fast...
have their -&gt;reset() method automatically handled.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
qdisc are changed under RTNL protection and often
while blocking BH and root qdisc spinlock.

When lots of skbs need to be dropped, we free
them under these locks causing TX/RX freezes,
and more generally latency spikes.

This commit adds rtnl_kfree_skbs(), used to queue
skbs for deferred freeing.

Actual freeing happens right after RTNL is released,
with appropriate scheduling points.

rtnl_qdisc_drop() can also be used in place
of disc_drop() when RTNL is held.

qdisc_reset_queue() and __qdisc_reset_queue() get
the new behavior, so standard qdiscs like pfifo, pfifo_fast...
have their -&gt;reset() method automatically handled.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net_sched: remove generic throttled management</title>
<updated>2016-06-11T06:58:21+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-06-10T23:41:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=45f50bed1d808794e514e9eed0e579a8756ce2ba'/>
<id>45f50bed1d808794e514e9eed0e579a8756ce2ba</id>
<content type='text'>
__QDISC_STATE_THROTTLED bit manipulation is rather expensive
for HTB and few others.

I already removed it for sch_fq in commit f2600cf02b5b
("net: sched: avoid costly atomic operation in fq_dequeue()")
and so far nobody complained.

When one ore more packets are stuck in one or more throttled
HTB class, a htb dequeue() performs two atomic operations
to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc
lock is held.

Removing this pair of atomic operations bring me a 8 % performance
increase on 200 TCP_RR tests, in presence of throttled classes.

This patch has no side effect, since nothing actually uses
disc_is_throttled() anymore.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__QDISC_STATE_THROTTLED bit manipulation is rather expensive
for HTB and few others.

I already removed it for sch_fq in commit f2600cf02b5b
("net: sched: avoid costly atomic operation in fq_dequeue()")
and so far nobody complained.

When one ore more packets are stuck in one or more throttled
HTB class, a htb dequeue() performs two atomic operations
to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc
lock is held.

Removing this pair of atomic operations bring me a 8 % performance
increase on 200 TCP_RR tests, in presence of throttled classes.

This patch has no side effect, since nothing actually uses
disc_is_throttled() anymore.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2016-06-10T18:52:24+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-06-10T18:52:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1578b0a5e92825334760741e5c166b8873886f1b'/>
<id>1578b0a5e92825334760741e5c166b8873886f1b</id>
<content type='text'>
Conflicts:
	net/sched/act_police.c
	net/sched/sch_drr.c
	net/sched/sch_hfsc.c
	net/sched/sch_prio.c
	net/sched/sch_red.c
	net/sched/sch_tbf.c

In net-next the drop methods of the packet schedulers got removed, so
the bug fixes to them in 'net' are irrelevant.

A packet action unload crash fix conflicts with the addition of the
new firstuse timestamp.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	net/sched/act_police.c
	net/sched/sch_drr.c
	net/sched/sch_hfsc.c
	net/sched/sch_prio.c
	net/sched/sch_red.c
	net/sched/sch_tbf.c

In net-next the drop methods of the packet schedulers got removed, so
the bug fixes to them in 'net' are irrelevant.

A packet action unload crash fix conflicts with the addition of the
new firstuse timestamp.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: fix qdisc-&gt;running lockdep annotations</title>
<updated>2016-06-09T20:28:37+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-06-09T14:45:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=52fbb2907988aa0583c6d9d53a56aee090b2df7e'/>
<id>52fbb2907988aa0583c6d9d53a56aee090b2df7e</id>
<content type='text'>
1) qdisc_run_begin() is really using the equivalent of a trylock.
  Instead of using write_seqcount_begin(), use a combination of
  raw_write_seqcount_begin() and correct lockdep annotation.

2) sch_direct_xmit() should use regular spin_lock(root_lock)

Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1) qdisc_run_begin() is really using the equivalent of a trylock.
  Instead of using write_seqcount_begin(), use a combination of
  raw_write_seqcount_begin() and correct lockdep annotation.

2) sch_direct_xmit() should use regular spin_lock(root_lock)

Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: place state, next_sched and gso_skb in same cacheline again</title>
<updated>2016-06-09T06:58:52+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-06-08T22:27:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c8945043cdc687388b7a43fc6f474bddd9607e80'/>
<id>c8945043cdc687388b7a43fc6f474bddd9607e80</id>
<content type='text'>
Earlier commits removed two members from struct Qdisc which places
next_sched/gso_skb into a different cacheline than -&gt;state.

This restores the struct layout to what it was before the removal.
Move the two members, then add an annotation so they all reside in the
same cacheline.

This adds a 16 byte hole after cpu_qstats.

The hole could be closed but as it doesn't decrease total struct size just
do it this way.

Reported-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Earlier commits removed two members from struct Qdisc which places
next_sched/gso_skb into a different cacheline than -&gt;state.

This restores the struct layout to what it was before the removal.
Move the two members, then add an annotation so they all reside in the
same cacheline.

This adds a 16 byte hole after cpu_qstats.

The hole could be closed but as it doesn't decrease total struct size just
do it this way.

Reported-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: remove qdisc-&gt;drop</title>
<updated>2016-06-09T06:58:52+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-06-08T22:27:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a09ceb0e08140a1eec05b49b4c232d3481339cb0'/>
<id>a09ceb0e08140a1eec05b49b4c232d3481339cb0</id>
<content type='text'>
after removal of TCA_CBQ_OVL_STRATEGY from cbq scheduler, there are no
more callers of -&gt;drop() outside of other -&gt;drop functions, i.e.
nothing calls them.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
after removal of TCA_CBQ_OVL_STRATEGY from cbq scheduler, there are no
more callers of -&gt;drop() outside of other -&gt;drop functions, i.e.
nothing calls them.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: remove qdisc_rehape_fail</title>
<updated>2016-06-09T06:58:51+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-06-08T22:27:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c3a173d7dba2d7c74dd4ab871b8f22bf56ac10b2'/>
<id>c3a173d7dba2d7c74dd4ab871b8f22bf56ac10b2</id>
<content type='text'>
After the removal of TCA_CBQ_POLICE in cbq scheduler qdisc-&gt;reshape_fail
is always NULL, i.e. qdisc_rehape_fail is now the same as qdisc_drop.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After the removal of TCA_CBQ_POLICE in cbq scheduler qdisc-&gt;reshape_fail
is always NULL, i.e. qdisc_rehape_fail is now the same as qdisc_drop.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cbq: remove TCA_CBQ_POLICE support</title>
<updated>2016-06-09T06:58:51+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-06-08T22:27:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dd47c1fa776cda48531b651c88341e951140b0a7'/>
<id>dd47c1fa776cda48531b651c88341e951140b0a7</id>
<content type='text'>
iproute2 doesn't implement any cbq option that results in this attribute
being sent to kernel.

To make use of it, user would have to

- patch iproute2
- add a class
- attach a qdisc to the class (default pfifo doesn't work as
  q-&gt;handle is 0 and cbq_set_police() is a no-op in this case)
- re-'add' the same class (tc class change ...) again
- user must also specifiy a defmap (e.g. 'split 1:0 defmap 3f'), since
  this 'police' feature relies on its presence
- the added qdisc must be one of bfifo, pfifo or netem

If all of these conditions are met and _some_ leaf qdiscs, namely
p/bfifo, netem, plug or tbf would drop a packet, kernel calls back into
cbq, which will attempt to re-queue the skb into a different class
as indicated by the parents' defmap entry for TC_PRIO_BESTEFFORT.

[ i.e. we behave as if tc_classify returned TC_ACT_RECLASSIFY ].

This feature, which isn't documented or implemented in iproute2,
and isn't implemented consistently (most qdiscs like sfq, codel, etc
drop right away instead of attempting this reclassification) is the
sole reason for the reshape_fail and __parent member in Qdisc struct.

So remove TCA_CBQ_POLICE support from the kernel, reject it via EOPNOTSUPP
so userspace knows we don't support it, and then remove no-longer needed
infrastructure in followup commit.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
iproute2 doesn't implement any cbq option that results in this attribute
being sent to kernel.

To make use of it, user would have to

- patch iproute2
- add a class
- attach a qdisc to the class (default pfifo doesn't work as
  q-&gt;handle is 0 and cbq_set_police() is a no-op in this case)
- re-'add' the same class (tc class change ...) again
- user must also specifiy a defmap (e.g. 'split 1:0 defmap 3f'), since
  this 'police' feature relies on its presence
- the added qdisc must be one of bfifo, pfifo or netem

If all of these conditions are met and _some_ leaf qdiscs, namely
p/bfifo, netem, plug or tbf would drop a packet, kernel calls back into
cbq, which will attempt to re-queue the skb into a different class
as indicated by the parents' defmap entry for TC_PRIO_BESTEFFORT.

[ i.e. we behave as if tc_classify returned TC_ACT_RECLASSIFY ].

This feature, which isn't documented or implemented in iproute2,
and isn't implemented consistently (most qdiscs like sfq, codel, etc
drop right away instead of attempting this reclassification) is the
sole reason for the reshape_fail and __parent member in Qdisc struct.

So remove TCA_CBQ_POLICE support from the kernel, reject it via EOPNOTSUPP
so userspace knows we don't support it, and then remove no-longer needed
infrastructure in followup commit.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
