<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/dccp, branch linux-5.17.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>dccp: Inline dccp_listen_start().</title>
<updated>2021-11-24T04:16:22+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.co.jp</email>
</author>
<published>2021-11-22T10:16:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b4a8e7493d74121093f2e1e024a9d3bb1a912912'/>
<id>b4a8e7493d74121093f2e1e024a9d3bb1a912912</id>
<content type='text'>
This patch inlines dccp_listen_start() and removes a stale comment in
inet_dccp_listen() so that it looks like inet_listen().

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.co.jp&gt;
Reviewed-by: Richard Sailer &lt;richard_siegfried@systemli.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch inlines dccp_listen_start() and removes a stale comment in
inet_dccp_listen() so that it looks like inet_listen().

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.co.jp&gt;
Reviewed-by: Richard Sailer &lt;richard_siegfried@systemli.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dccp/tcp: Remove an unused argument in inet_csk_listen_start().</title>
<updated>2021-11-24T04:16:18+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.co.jp</email>
</author>
<published>2021-11-22T10:16:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e7049395b1c3085d12b5ba16d058c65598368853'/>
<id>e7049395b1c3085d12b5ba16d058c65598368853</id>
<content type='text'>
The commit 1295e2cf3065 ("inet: minor optimization for backlog setting in
listen(2)") added change so that sk_max_ack_backlog is initialised earlier
in inet_dccp_listen() and inet_listen().  Since then, we no longer use
backlog in inet_csk_listen_start(), so let's remove it.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.co.jp&gt;
Acked-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reviewed-by: Richard Sailer &lt;richard_siegfried@systemli.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The commit 1295e2cf3065 ("inet: minor optimization for backlog setting in
listen(2)") added change so that sk_max_ack_backlog is initialised earlier
in inet_dccp_listen() and inet_listen().  Since then, we no longer use
backlog in inet_csk_listen_start(), so let's remove it.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.co.jp&gt;
Acked-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reviewed-by: Richard Sailer &lt;richard_siegfried@systemli.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: dccp: Use memset_startat() for TP zeroing</title>
<updated>2021-11-19T11:22:49+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2021-11-18T20:30:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f5455a1d9d499e4028b8ae8370758efad00b00e9'/>
<id>f5455a1d9d499e4028b8ae8370758efad00b00e9</id>
<content type='text'>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Use memset_startat() so memset() doesn't get confused about writing
beyond the destination member that is intended to be the starting point
of zeroing through the end of the struct.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Use memset_startat() so memset() doesn't get confused about writing
beyond the destination member that is intended to be the starting point
of zeroing through the end of the struct.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: switch orphan_count to bare per-cpu counters</title>
<updated>2021-10-15T10:28:34+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-10-14T13:41:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=19757cebf0c5016a1f36f7fe9810a9f0b33c0832'/>
<id>19757cebf0c5016a1f36f7fe9810a9f0b33c0832</id>
<content type='text'>
Use of percpu_counter structure to track count of orphaned
sockets is causing problems on modern hosts with 256 cpus
or more.

Stefan Bach reported a serious spinlock contention in real workloads,
that I was able to reproduce with a netfilter rule dropping
incoming FIN packets.

    53.56%  server  [kernel.kallsyms]      [k] queued_spin_lock_slowpath
            |
            ---queued_spin_lock_slowpath
               |
                --53.51%--_raw_spin_lock_irqsave
                          |
                           --53.51%--__percpu_counter_sum
                                     tcp_check_oom
                                     |
                                     |--39.03%--__tcp_close
                                     |          tcp_close
                                     |          inet_release
                                     |          inet6_release
                                     |          sock_close
                                     |          __fput
                                     |          ____fput
                                     |          task_work_run
                                     |          exit_to_usermode_loop
                                     |          do_syscall_64
                                     |          entry_SYSCALL_64_after_hwframe
                                     |          __GI___libc_close
                                     |
                                      --14.48%--tcp_out_of_resources
                                                tcp_write_timeout
                                                tcp_retransmit_timer
                                                tcp_write_timer_handler
                                                tcp_write_timer
                                                call_timer_fn
                                                expire_timers
                                                __run_timers
                                                run_timer_softirq
                                                __softirqentry_text_start

As explained in commit cf86a086a180 ("net/dst: use a smaller percpu_counter
batch for dst entries accounting"), default batch size is too big
for the default value of tcp_max_orphans (262144).

But even if we reduce batch sizes, there would still be cases
where the estimated count of orphans is beyond the limit,
and where tcp_too_many_orphans() has to call the expensive
percpu_counter_sum_positive().

One solution is to use plain per-cpu counters, and have
a timer to periodically refresh this cache.

Updating this cache every 100ms seems about right, tcp pressure
state is not radically changing over shorter periods.

percpu_counter was nice 15 years ago while hosts had less
than 16 cpus, not anymore by current standards.

v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
    reported by kernel test robot &lt;lkp@intel.com&gt;
    Remove unused socket argument from tcp_too_many_orphans()

Fixes: dd24c00191d5 ("net: Use a percpu_counter for orphan_count")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Stefan Bach &lt;sfb@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use of percpu_counter structure to track count of orphaned
sockets is causing problems on modern hosts with 256 cpus
or more.

Stefan Bach reported a serious spinlock contention in real workloads,
that I was able to reproduce with a netfilter rule dropping
incoming FIN packets.

    53.56%  server  [kernel.kallsyms]      [k] queued_spin_lock_slowpath
            |
            ---queued_spin_lock_slowpath
               |
                --53.51%--_raw_spin_lock_irqsave
                          |
                           --53.51%--__percpu_counter_sum
                                     tcp_check_oom
                                     |
                                     |--39.03%--__tcp_close
                                     |          tcp_close
                                     |          inet_release
                                     |          inet6_release
                                     |          sock_close
                                     |          __fput
                                     |          ____fput
                                     |          task_work_run
                                     |          exit_to_usermode_loop
                                     |          do_syscall_64
                                     |          entry_SYSCALL_64_after_hwframe
                                     |          __GI___libc_close
                                     |
                                      --14.48%--tcp_out_of_resources
                                                tcp_write_timeout
                                                tcp_retransmit_timer
                                                tcp_write_timer_handler
                                                tcp_write_timer
                                                call_timer_fn
                                                expire_timers
                                                __run_timers
                                                run_timer_softirq
                                                __softirqentry_text_start

As explained in commit cf86a086a180 ("net/dst: use a smaller percpu_counter
batch for dst entries accounting"), default batch size is too big
for the default value of tcp_max_orphans (262144).

But even if we reduce batch sizes, there would still be cases
where the estimated count of orphans is beyond the limit,
and where tcp_too_many_orphans() has to call the expensive
percpu_counter_sum_positive().

One solution is to use plain per-cpu counters, and have
a timer to periodically refresh this cache.

Updating this cache every 100ms seems about right, tcp pressure
state is not radically changing over shorter periods.

percpu_counter was nice 15 years ago while hosts had less
than 16 cpus, not anymore by current standards.

v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
    reported by kernel test robot &lt;lkp@intel.com&gt;
    Remove unused socket argument from tcp_too_many_orphans()

Fixes: dd24c00191d5 ("net: Use a percpu_counter for orphan_count")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Stefan Bach &lt;sfb@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dccp: don't duplicate ccid when cloning dccp sock</title>
<updated>2021-09-08T10:28:35+00:00</updated>
<author>
<name>Lin, Zhenpeng</name>
<email>zplin@psu.edu</email>
</author>
<published>2021-09-08T03:40:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d9ea761fdd197351890418acd462c51f241014a7'/>
<id>d9ea761fdd197351890418acd462c51f241014a7</id>
<content type='text'>
Commit 2677d2067731 ("dccp: don't free ccid2_hc_tx_sock ...") fixed
a UAF but reintroduced CVE-2017-6074.

When the sock is cloned, two dccps_hc_tx_ccid will reference to the
same ccid. So one can free the ccid object twice from two socks after
cloning.

This issue was found by "Hadar Manor" as well and assigned with
CVE-2020-16119, which was fixed in Ubuntu's kernel. So here I port
the patch from Ubuntu to fix it.

The patch prevents cloned socks from referencing the same ccid.

Fixes: 2677d2067731410 ("dccp: don't free ccid2_hc_tx_sock ...")
Signed-off-by: Zhenpeng Lin &lt;zplin@psu.edu&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 2677d2067731 ("dccp: don't free ccid2_hc_tx_sock ...") fixed
a UAF but reintroduced CVE-2017-6074.

When the sock is cloned, two dccps_hc_tx_ccid will reference to the
same ccid. So one can free the ccid object twice from two socks after
cloning.

This issue was found by "Hadar Manor" as well and assigned with
CVE-2020-16119, which was fixed in Ubuntu's kernel. So here I port
the patch from Ubuntu to fix it.

The patch prevents cloned socks from referencing the same ccid.

Fixes: 2677d2067731410 ("dccp: don't free ccid2_hc_tx_sock ...")
Signed-off-by: Zhenpeng Lin &lt;zplin@psu.edu&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2021-08-13T13:41:22+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-08-13T13:41:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f4083a752a3b7dc2076432129c8469d02c25318e'/>
<id>f4083a752a3b7dc2076432129c8469d02c25318e</id>
<content type='text'>
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda659d ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda659d ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dccp: add do-while-0 stubs for dccp_pr_debug macros</title>
<updated>2021-08-09T09:00:02+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2021-08-08T23:04:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=86aab09a4870bb8346c9579864588c3d7f555299'/>
<id>86aab09a4870bb8346c9579864588c3d7f555299</id>
<content type='text'>
GCC complains about empty macros in an 'if' statement, so convert
them to 'do {} while (0)' macros.

Fixes these build warnings:

net/dccp/output.c: In function 'dccp_xmit_packet':
../net/dccp/output.c:283:71: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
  283 |                 dccp_pr_debug("transmit_skb() returned err=%d\n", err);
net/dccp/ackvec.c: In function 'dccp_ackvec_update_old':
../net/dccp/ackvec.c:163:80: warning: suggest braces around empty body in an 'else' statement [-Wempty-body]
  163 |                                               (unsigned long long)seqno, state);

Fixes: dc841e30eaea ("dccp: Extend CCID packet dequeueing interface")
Fixes: 380240864451 ("dccp ccid-2: Update code for the Ack Vector input/registration routine")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: dccp@vger.kernel.org
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Cc: Gerrit Renker &lt;gerrit@erg.abdn.ac.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
GCC complains about empty macros in an 'if' statement, so convert
them to 'do {} while (0)' macros.

Fixes these build warnings:

net/dccp/output.c: In function 'dccp_xmit_packet':
../net/dccp/output.c:283:71: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
  283 |                 dccp_pr_debug("transmit_skb() returned err=%d\n", err);
net/dccp/ackvec.c: In function 'dccp_ackvec_update_old':
../net/dccp/ackvec.c:163:80: warning: suggest braces around empty body in an 'else' statement [-Wempty-body]
  163 |                                               (unsigned long long)seqno, state);

Fixes: dc841e30eaea ("dccp: Extend CCID packet dequeueing interface")
Fixes: 380240864451 ("dccp ccid-2: Update code for the Ack Vector input/registration routine")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: dccp@vger.kernel.org
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Cc: Gerrit Renker &lt;gerrit@erg.abdn.ac.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memcg: enable accounting for inet_bin_bucket cache</title>
<updated>2021-07-20T13:00:38+00:00</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@virtuozzo.com</email>
</author>
<published>2021-07-19T10:44:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=990c74e3f41d7ae9711d3fb3e8b3f0d3088e0969'/>
<id>990c74e3f41d7ae9711d3fb3e8b3f0d3088e0969</id>
<content type='text'>
net namespace can create up to 64K tcp and dccp ports and force kernel
to allocate up to several megabytes of memory per netns
for inet_bind_bucket objects.

It makes sense to account for them to restrict the host's memory
consumption from inside the memcg-limited container.

Signed-off-by: Vasily Averin &lt;vvs@virtuozzo.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
net namespace can create up to 64K tcp and dccp ports and force kernel
to allocate up to several megabytes of memory per netns
for inet_bind_bucket objects.

It makes sense to account for them to restrict the host's memory
consumption from inside the memcg-limited container.

Signed-off-by: Vasily Averin &lt;vvs@virtuozzo.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sock: introduce sk_error_report</title>
<updated>2021-06-29T18:28:21+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2021-06-27T22:48:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e3ae2365efc14269170a6326477e669332271ab3'/>
<id>e3ae2365efc14269170a6326477e669332271ab3</id>
<content type='text'>
This patch introduces a function wrapper to call the sk_error_report
callback. That will prepare to add additional handling whenever
sk_error_report is called, for example to trace socket errors.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch introduces a function wrapper to call the sk_error_report
callback. That will prepare to add additional handling whenever
sk_error_report is called, for example to trace socket errors.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dccp: tfrc: fix doc warnings in tfrc_equation.c</title>
<updated>2021-06-10T21:08:49+00:00</updated>
<author>
<name>Baokun Li</name>
<email>libaokun1@huawei.com</email>
</author>
<published>2021-06-10T13:26:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cb8e2e4300fc17e1028cce554ecf72a9e6161742'/>
<id>cb8e2e4300fc17e1028cce554ecf72a9e6161742</id>
<content type='text'>
Add description for `tfrc_invert_loss_event_rate` to fix the W=1 warnings:

 net/dccp/ccids/lib/tfrc_equation.c:695: warning: Function parameter or
  member 'loss_event_rate' not described in 'tfrc_invert_loss_event_rate'

Signed-off-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Reviewed-by: Richard Sailer &lt;richard_siegfried@systemli.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add description for `tfrc_invert_loss_event_rate` to fix the W=1 warnings:

 net/dccp/ccids/lib/tfrc_equation.c:695: warning: Function parameter or
  member 'loss_event_rate' not described in 'tfrc_invert_loss_event_rate'

Signed-off-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Reviewed-by: Richard Sailer &lt;richard_siegfried@systemli.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
