<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/include/net/netns/ipv4.h, branch v4.12</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>net: Add sysctl to toggle early demux for tcp and udp</title>
<updated>2017-03-24T20:17:07+00:00</updated>
<author>
<name>subashab@codeaurora.org</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2017-03-23T19:34:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dddb64bcb34615bf48a2c9cb9881eb76795cc5c5'/>
<id>dddb64bcb34615bf48a2c9cb9881eb76795cc5c5</id>
<content type='text'>
Certain system process significant unconnected UDP workload.
It would be preferrable to disable UDP early demux for those systems
and enable it for TCP only.

By disabling UDP demux, we see these slight gains on an ARM64 system-
782 -&gt; 788Mbps unconnected single stream UDPv4
633 -&gt; 654Mbps unconnected UDPv4 different sources

The performance impact can change based on CPU architecure and cache
sizes. There will not much difference seen if entire UDP hash table
is in cache.

Both sysctls are enabled by default to preserve existing behavior.

v1-&gt;v2: Change function pointer instead of adding conditional as
suggested by Stephen.

v2-&gt;v3: Read once in callers to avoid issues due to compiler
optimizations. Also update commit message with the tests.

v3-&gt;v4: Store and use read once result instead of querying pointer
again incorrectly.

v4-&gt;v5: Refactor to avoid errors due to compilation with IPV6={m,n}

Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Cc: Tom Herbert &lt;tom@herbertland.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Certain system process significant unconnected UDP workload.
It would be preferrable to disable UDP early demux for those systems
and enable it for TCP only.

By disabling UDP demux, we see these slight gains on an ARM64 system-
782 -&gt; 788Mbps unconnected single stream UDPv4
633 -&gt; 654Mbps unconnected UDPv4 different sources

The performance impact can change based on CPU architecure and cache
sizes. There will not much difference seen if entire UDP hash table
is in cache.

Both sysctls are enabled by default to preserve existing behavior.

v1-&gt;v2: Change function pointer instead of adding conditional as
suggested by Stephen.

v2-&gt;v3: Read once in callers to avoid issues due to compiler
optimizations. Also update commit message with the tests.

v3-&gt;v4: Store and use read once result instead of querying pointer
again incorrectly.

v4-&gt;v5: Refactor to avoid errors due to compilation with IPV6={m,n}

Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Cc: Tom Herbert &lt;tom@herbertland.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: ipv4: add support for ECMP hash policy choice</title>
<updated>2017-03-21T22:27:19+00:00</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@cumulusnetworks.com</email>
</author>
<published>2017-03-16T13:28:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bf4e0a3db97eb882368fd82980b3b1fa0b5b9778'/>
<id>bf4e0a3db97eb882368fd82980b3b1fa0b5b9778</id>
<content type='text'>
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
 0 - layer 3 (default)
 1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.

Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
 0 - layer 3 (default)
 1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.

Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: remove tcp_tw_recycle</title>
<updated>2017-03-17T03:33:56+00:00</updated>
<author>
<name>Soheil Hassas Yeganeh</name>
<email>soheil@google.com</email>
</author>
<published>2017-03-15T20:30:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4396e46187ca5070219b81773c4e65088dac50cc'/>
<id>4396e46187ca5070219b81773c4e65088dac50cc</id>
<content type='text'>
The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.

After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), the tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine is not monotonically increasing,
anymore.

Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.

Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Lutz Vieweg &lt;lvml@5t9.de&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.

After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), the tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine is not monotonically increasing,
anymore.

Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.

Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Lutz Vieweg &lt;lvml@5t9.de&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Avoid receiving packets with an l3mdev on unbound UDP sockets</title>
<updated>2017-01-30T20:00:58+00:00</updated>
<author>
<name>Robert Shearman</name>
<email>rshearma@brocade.com</email>
</author>
<published>2017-01-26T18:02:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=63a6fff353d01da5a22b72670c434bf12fa0e3b8'/>
<id>63a6fff353d01da5a22b72670c434bf12fa0e3b8</id>
<content type='text'>
Packets arriving in a VRF currently are delivered to UDP sockets that
aren't bound to any interface. TCP defaults to not delivering packets
arriving in a VRF to unbound sockets. IP route lookup and socket
transmit both assume that unbound means using the default table and
UDP applications that haven't been changed to be aware of VRFs may not
function correctly in this case since they may not be able to handle
overlapping IP address ranges, or be able to send packets back to the
original sender if required.

So add a sysctl, udp_l3mdev_accept, to control this behaviour with it
being analgous to the existing tcp_l3mdev_accept, namely to allow a
process to have a VRF-global listen socket. Have this default to off
as this is the behaviour that users will expect, given that there is
no explicit mechanism to set unmodified VRF-unaware application into a
default VRF.

Signed-off-by: Robert Shearman &lt;rshearma@brocade.com&gt;
Acked-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Tested-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Packets arriving in a VRF currently are delivered to UDP sockets that
aren't bound to any interface. TCP defaults to not delivering packets
arriving in a VRF to unbound sockets. IP route lookup and socket
transmit both assume that unbound means using the default table and
UDP applications that haven't been changed to be aware of VRFs may not
function correctly in this case since they may not be able to handle
overlapping IP address ranges, or be able to send packets back to the
original sender if required.

So add a sysctl, udp_l3mdev_accept, to control this behaviour with it
being analgous to the existing tcp_l3mdev_accept, namely to allow a
process to have a VRF-global listen socket. Have this default to off
as this is the behaviour that users will expect, given that there is
no explicit mechanism to set unmodified VRF-unaware application into a
default VRF.

Signed-off-by: Robert Shearman &lt;rshearma@brocade.com&gt;
Acked-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Tested-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Introduce a sysctl that modifies the value of PROT_SOCK.</title>
<updated>2017-01-24T17:10:51+00:00</updated>
<author>
<name>Krister Johansen</name>
<email>kjlx@templeofstupid.com</email>
</author>
<published>2017-01-21T01:49:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4548b683b78137f8eadeb312b94e20bb0d4a7141'/>
<id>4548b683b78137f8eadeb312b94e20bb0d4a7141</id>
<content type='text'>
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace.  To
disable all privileged ports set this to zero.  It also checks for
overlap with the local port range.  The privileged and local range may
not overlap.

The use case for this change is to allow containerized processes to bind
to priviliged ports, but prevent them from ever being allowed to modify
their container's network configuration.  The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace.  This modification was needed to allow the container manager
to disable a namespace's priviliged port restrictions without exposing
control of the network namespace to processes in the user namespace.

Signed-off-by: Krister Johansen &lt;kjlx@templeofstupid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace.  To
disable all privileged ports set this to zero.  It also checks for
overlap with the local port range.  The privileged and local range may
not overlap.

The use case for this change is to allow containerized processes to bind
to priviliged ports, but prevent them from ever being allowed to modify
their container's network configuration.  The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace.  This modification was needed to allow the container manager
to disable a namespace's priviliged port restrictions without exposing
control of the network namespace to processes in the user namespace.

Signed-off-by: Krister Johansen &lt;kjlx@templeofstupid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Namespaceify tcp_max_syn_backlog knob</title>
<updated>2016-12-29T16:38:31+00:00</updated>
<author>
<name>Haishuang Yan</name>
<email>yanhaishuang@cmss.chinamobile.com</email>
</author>
<published>2016-12-28T09:52:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fee83d097b1620530f23bf6063f4ea251ba9c8c7'/>
<id>fee83d097b1620530f23bf6063f4ea251ba9c8c7</id>
<content type='text'>
Different namespace application might require different maximal
number of remembered connection requests.

Signed-off-by: Haishuang Yan &lt;yanhaishuang@cmss.chinamobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Different namespace application might require different maximal
number of remembered connection requests.

Signed-off-by: Haishuang Yan &lt;yanhaishuang@cmss.chinamobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob</title>
<updated>2016-12-29T16:38:31+00:00</updated>
<author>
<name>Haishuang Yan</name>
<email>yanhaishuang@cmss.chinamobile.com</email>
</author>
<published>2016-12-28T09:52:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1946e672c173559155a3e210fe95dbf8b7b8ddf7'/>
<id>1946e672c173559155a3e210fe95dbf8b7b8ddf7</id>
<content type='text'>
Different namespace application might require fast recycling
TIME-WAIT sockets independently of the host.

Signed-off-by: Haishuang Yan &lt;yanhaishuang@cmss.chinamobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Different namespace application might require fast recycling
TIME-WAIT sockets independently of the host.

Signed-off-by: Haishuang Yan &lt;yanhaishuang@cmss.chinamobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Namespaceify tcp_tw_reuse knob</title>
<updated>2016-12-27T17:28:07+00:00</updated>
<author>
<name>Haishuang Yan</name>
<email>yanhaishuang@cmss.chinamobile.com</email>
</author>
<published>2016-12-25T06:33:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=56ab6b93007e5000a8812985aec1833c4a6a9ce0'/>
<id>56ab6b93007e5000a8812985aec1833c4a6a9ce0</id>
<content type='text'>
Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.

Signed-off-by: Haishuang Yan &lt;yanhaishuang@cmss.chinamobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.

Signed-off-by: Haishuang Yan &lt;yanhaishuang@cmss.chinamobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fib: Allow for consistent FIB dumping</title>
<updated>2016-12-04T00:29:35+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2016-12-03T15:45:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cacaad11f43aefbbe5fca00af3b9c16e6aee1ba4'/>
<id>cacaad11f43aefbbe5fca00af3b9c16e6aee1ba4</id>
<content type='text'>
The next patch will enable listeners of the FIB notification chain to
request a dump of the FIB tables. However, since RTNL isn't taken during
the dump, it's possible for the FIB tables to change mid-dump, which
will result in inconsistency between the listener's table and the
kernel's.

Allow listeners to know about changes that occurred mid-dump, by adding
a change sequence counter to each net namespace. The counter is
incremented just before a notification is sent in the FIB chain.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The next patch will enable listeners of the FIB notification chain to
request a dump of the FIB tables. However, since RTNL isn't taken during
the dump, it's possible for the FIB tables to change mid-dump, which
will result in inconsistency between the listener's table and the
kernel's.

Allow listeners to know about changes that occurred mid-dump, by adding
a change sequence counter to each net namespace. The counter is
incremented just before a notification is sent in the FIB chain.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Remove fib_local variable</title>
<updated>2016-08-09T21:57:39+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2016-08-09T13:51:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=631fee7d70e8eabb642b4bcc58f08bbe880c91aa'/>
<id>631fee7d70e8eabb642b4bcc58f08bbe880c91aa</id>
<content type='text'>
After commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
fib_local is set but not used. Remove it.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
fib_local is set but not used. Remove it.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
