<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/net/sch_generic.h, branch v3.18.4</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE</title>
<updated>2014-10-03T19:37:06+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2014-10-01T20:35:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5772e9a3463b264cee5a4e73ef586ad482d7ba48'/>
<id>5772e9a3463b264cee5a4e73ef586ad482d7ba48</id>
<content type='text'>
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().

The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
 (1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet.  The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
 (2) Further more, dev_hard_start_xmit() will utilize the skb-&gt;xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ.  This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.

Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()).  In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit().  In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits.  In-effect bulking will only happen for BQL enabled drivers.

Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets.  They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.

Jointed work with Hannes, Daniel and Florian.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().

The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
 (1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet.  The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
 (2) Further more, dev_hard_start_xmit() will utilize the skb-&gt;xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ.  This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.

Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()).  In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit().  In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits.  In-effect bulking will only happen for BQL enabled drivers.

Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets.  They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.

Jointed work with Hannes, Daniel and Florian.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: enable per cpu qstats</title>
<updated>2014-09-30T05:02:26+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2014-09-28T18:54:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b0ab6f92752b9f9d8da980506e9df3bd9dcd7ed3'/>
<id>b0ab6f92752b9f9d8da980506e9df3bd9dcd7ed3</id>
<content type='text'>
After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: implement qstat helper routines</title>
<updated>2014-09-30T05:02:26+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2014-09-28T18:53:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=25331d6ce42bcf4b34b6705fce4da15c3fabe62f'/>
<id>25331d6ce42bcf4b34b6705fce4da15c3fabe62f</id>
<content type='text'>
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: make bstats per cpu and estimator RCU safe</title>
<updated>2014-09-30T05:02:26+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2014-09-28T18:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=22e0f8b9322cb1a48b1357e8f4ae6f5a9eca8cfa'/>
<id>22e0f8b9322cb1a48b1357e8f4ae6f5a9eca8cfa</id>
<content type='text'>
In order to run qdisc's without locking statistics and estimators
need to be handled correctly.

To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order to run qdisc's without locking statistics and estimators
need to be handled correctly.

To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2014-09-23T16:09:27+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2014-09-23T16:09:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1f6d80358dc9bbbeb56cb43384fa11fd645d9289'/>
<id>1f6d80358dc9bbbeb56cb43384fa11fd645d9289</id>
<content type='text'>
Conflicts:
	arch/mips/net/bpf_jit.c
	drivers/net/can/flexcan.c

Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	arch/mips/net/bpf_jit.c
	drivers/net/can/flexcan.c

Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: shrink struct qdisc_skb_cb to 28 bytes</title>
<updated>2014-09-22T18:21:47+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-09-18T15:02:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b'/>
<id>257117862634d89de33fec74858b1a0ba5ab444b</id>
<content type='text'>
We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
or increasing skb-&gt;cb[] size.

Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
skb_flow_dissect()") broke IPoIB.

Only current offender is sch_choke, and this one do not need an
absolutely precise flow key.

If we store 17 bytes of flow key, its more than enough. (Its the actual
size of flow_keys if it was a packed structure, but we might add new
fields at the end of it later)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
or increasing skb-&gt;cb[] size.

Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
skb_flow_dissect()") broke IPoIB.

Only current offender is sch_choke, and this one do not need an
absolutely precise flow key.

If we store 17 bytes of flow key, its more than enough. (Its the actual
size of flow_keys if it was a packed structure, but we might add new
fields at the end of it later)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: rcu-ify tcf_proto</title>
<updated>2014-09-13T16:30:25+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2014-09-13T03:05:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=25d8c0d55f241ce2d360df1bea48e23a55836ee6'/>
<id>25d8c0d55f241ce2d360df1bea48e23a55836ee6</id>
<content type='text'>
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: qdisc: use rcu prefix and silence sparse warnings</title>
<updated>2014-09-13T16:30:25+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2014-09-13T03:04:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=46e5da40aec256155cfedee96dd21a75da941f2c'/>
<id>46e5da40aec256155cfedee96dd21a75da941f2c</id>
<content type='text'>
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.

Signed-off-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>flow_keys: Record IP layer protocol in skb_flow_dissect()</title>
<updated>2014-06-23T21:32:19+00:00</updated>
<author>
<name>Govindarajulu Varadarajan</name>
<email>_govind@gmx.com</email>
</author>
<published>2014-06-23T10:37:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e0f31d8498676fda36289603a054d0d490aa2679'/>
<id>e0f31d8498676fda36289603a054d0d490aa2679</id>
<content type='text'>
skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
give any information about IPv4 or IPv6.

This patch adds new member, n_proto, to struct flow_keys. Which records the
IP layer type. i.e IPv4 or IPv6.

This can be used in netdev-&gt;ndo_rx_flow_steer driver function to dissect flow.

Adding new member to flow_keys increases the struct size by around 4 bytes.
This causes BUILD_BUG_ON(sizeof(qcb-&gt;data) &lt; sz); to fail in
qdisc_cb_private_validate()

So increase data size by 4

Signed-off-by: Govindarajulu Varadarajan &lt;_govind@gmx.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
give any information about IPv4 or IPv6.

This patch adds new member, n_proto, to struct flow_keys. Which records the
IP layer type. i.e IPv4 or IPv6.

This can be used in netdev-&gt;ndo_rx_flow_steer driver function to dissect flow.

Adding new member to flow_keys increases the struct size by around 4 bytes.
This causes BUILD_BUG_ON(sizeof(qcb-&gt;data) &lt; sz); to fail in
qdisc_cb_private_validate()

So increase data size by 4

Signed-off-by: Govindarajulu Varadarajan &lt;_govind@gmx.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched, cls: check if we could overwrite actions when changing a filter</title>
<updated>2014-04-28T03:42:39+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cwang@twopensource.com</email>
</author>
<published>2014-04-25T20:54:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2f7ef2f8790f5bf53db4fc6b2310943139285827'/>
<id>2f7ef2f8790f5bf53db4fc6b2310943139285827</id>
<content type='text'>
When actions are attached to a filter, they are a part of the filter
itself, so when changing a filter we should allow to overwrite the actions
inside as well.

In my specific case, when I tried to _append_ a new action to an existing
filter which already has an action, I got EEXIST since kernel refused
to overwrite the existing one in kernel.

This patch checks if we are changing the filter checking NLM_F_CREATE flag
(Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
to actions. This fixes the problem above.

Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Cong Wang &lt;cwang@twopensource.com&gt;
Signed-off-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When actions are attached to a filter, they are a part of the filter
itself, so when changing a filter we should allow to overwrite the actions
inside as well.

In my specific case, when I tried to _append_ a new action to an existing
filter which already has an action, I got EEXIST since kernel refused
to overwrite the existing one in kernel.

This patch checks if we are changing the filter checking NLM_F_CREATE flag
(Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
to actions. This fixes the problem above.

Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Cong Wang &lt;cwang@twopensource.com&gt;
Signed-off-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
