author Cosmin Ratiu <cratiu@nvidia.com> 2026-05-07 10:56:05 +0300
committer Jakub Kicinski <kuba@kernel.org> 2026-05-10 10:16:01 -0700
commit 35ce55100c61270eb8234bcc8ac87fec1d8e4ff9
tree 2b3f93cd636568777f2904f5ab5f65d8b5cf2012 /scripts/stackusage
parent 31c777be2a2efd8980a660724955ba795ef751de
ipv4: Flush the FIB once on multiple nexthop removal

When a device is going down or when a net namespace is deleted, all
nexthops on it are removed, and for each nexthop being removed the FIB
table is flushed, which does a full trie traversal looking for entries
marked RTNH_F_DEAD and removing them.

This is O(N x R), with N being number of dev nexthops and R being number
of IPv4 routes. The RTNL is held the entire time.

When there are many nexthops to be removed and many routing entries,
this can result in the RTNL being held for multiple minutes, which
causes unhappiness in other processes trying to acquire the RTNL (e.g.
systemd-networkd for DHCP renewals).

In a complicated deployment with multiple vxlan devices, each having 16K
nexthops and a total of 128K ipv4 routes, this is exactly what happens:

    nexthop_flush_dev()                  # loops over 16K nexthops
      -> remove_nexthop()
        -> __remove_nexthop()
          -> __remove_nexthop_fib()      # marks fi->fib_flags |= RTNH_F_DEAD
            -> fib_flush()               # for EACH nexthop!
              -> fib_table_flush()       # walks the ENTIRE FIB, 128K entries

This patch makes use of the previously added FIB flushing signal to only
do a single FIB flush after all nexthops to be removed are marked as
RTNH_F_DEAD:
- __remove_nexthop_fib() no longer flushes the FIB.
- nexthop_flush_dev() and flush_all_nexthops() now keep track whether
  any nexthop was removed and trigger a FIB flush at the end.
- a new wrapper is defined, remove_one_nexthop() which calls
  remove_nexthop() and flushes if necessary. This is intended for places
  which must remove a single nexthop and shouldn't worry about the need
  to trigger a FIB flush. For now, the only caller is rtm_del_nexthop().
- The two direct callers of __remove_nexthop() get a WARN_ON_ONCE, since
  the nh about to be removed should not have any FIB entries referencing
  it when replacing or inserting a new one.

This dramatically improves performance from O(N x R) to O(N + R).

Releasing a nexthop reference in remove_nexthop() now no longer frees
it. Instead, it is deleted when the last fib_info pointing to it gets
freed via free_fib_info_rcu(). All routing code is already careful not
to take into consideration routes marked with RTNH_F_DEAD.

Tested with:

    DEV=eth2
    ip link set up dev $DEV
    ip link add testnh0 link $DEV type macvlan mode bridge
    ip addr add 198.51.100.1/24 dev testnh0
    ip link set testnh0 up
    seq 1 65536 | \
      sed 's/.*/nexthop add id & via 198.51.100.2 dev testnh0/' | \
      ip -batch -
    i=1
    for a in $(seq 0 255); do
      for b in $(seq 0 255); do
        echo "route add 10.${a}.${b}.0/32 nhid $i"
        i=$((i + 1))
      done
    done | ip -batch -
    time ip link set testnh0 down
    ip link del testnh0

Without this patch:

    real 0m32.601s
    user 0m0.000s
    sys  0m32.511s

With this patch:

    real 0m0.209s
    user 0m0.000s
    sys  0m0.153s

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260507075606.322405-3-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
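
The O(N x R) versus O(N + R) argument in the commit message can be illustrated with a small toy model. This is a sketch only and not kernel code: the list-of-dicts "table" and the names `make_table`, `mark_dead`, `flush`, `remove_flush_each` and `remove_flush_once` are hypothetical and merely mimic the roles of marking routes RTNH_F_DEAD and of a full-table walk like fib_table_flush(). It counts how many table entries the flushes visit under each strategy.

```python
def make_table(n_nexthops, n_routes):
    """Build a toy route table; route i uses nexthop id i % n_nexthops."""
    routes = [{"nhid": i % n_nexthops, "dead": False} for i in range(n_routes)]
    by_nh = {}
    for route in routes:
        by_nh.setdefault(route["nhid"], []).append(route)
    return routes, by_nh


def mark_dead(by_nh, nhid):
    """Mark only the routes referencing nhid as dead (no table walk)."""
    for route in by_nh.get(nhid, []):
        route["dead"] = True


def flush(routes):
    """Walk the whole table and drop dead routes.

    Returns (surviving routes, number of entries visited).
    """
    return [r for r in routes if not r["dead"]], len(routes)


def remove_flush_each(n_nexthops, n_routes, doomed):
    """Old behaviour in this model: one full flush per removed nexthop."""
    routes, by_nh = make_table(n_nexthops, n_routes)
    visited = 0
    for nhid in doomed:
        mark_dead(by_nh, nhid)
        routes, seen = flush(routes)
        visited += seen
    return len(routes), visited


def remove_flush_once(n_nexthops, n_routes, doomed):
    """New behaviour in this model: mark everything, then flush once."""
    routes, by_nh = make_table(n_nexthops, n_routes)
    visited = 0
    removed_any = False
    for nhid in doomed:
        mark_dead(by_nh, nhid)
        removed_any = True
    if removed_any:  # flush only if something was actually removed
        routes, visited = flush(routes)
    return len(routes), visited


if __name__ == "__main__":
    doomed = list(range(32))  # remove 32 of 64 toy nexthops
    print("flush per nexthop:", remove_flush_each(64, 1024, doomed))
    print("single flush:     ", remove_flush_once(64, 1024, doomed))
```

Both strategies leave the same surviving routes; only the number of visited entries differs. In the toy model the per-nexthop variant pays one table walk for every removed nexthop, while the deferred variant pays one walk in total, which mirrors the `removed_any`-style bookkeeping the commit message describes for nexthop_flush_dev() and flush_all_nexthops().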
Diffstat (limited to 'scripts/stackusage')
0 files changed, 0 insertions, 0 deletions