| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-25 12:25:36 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-25 12:25:36 -0700 |
| commit | 805185b7c7a1069e407b6f7b3bc98e44d415f484 (patch) | |
| tree | 8e252490fc55ac4a2ef591efa06d078211fc639f | |
| parent | c75597caada080effbfbc0a7fb10dc2a3bb543ad (diff) | |
| parent | fe9f4ee6c61a1410afd73bf011de5ae618004796 (diff) | |
Merge tag 'net-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and IPsec.

  Current release - regressions:

   - do not acquire dev->tx_global_lock in netdev_watchdog_up()

   - ethtool: keep rtnl_lock for ops using ethtool_op_get_link()

   - fix deadlock in nested UP notifier events

  Current release - new code bugs:

   - eth:
      - cn20k: fix subbank free list indexing for search order
      - airoha: fix BQL underflow in shared QDMA TX ring

  Previous releases - regressions:

   - netfilter:
      - flowtable: fix offloaded ct timeout never being extended
      - nf_conncount: prevent connlimit drops for early confirmed ct

  Previous releases - always broken:

   - require CAP_NET_ADMIN in the originating netns when modifying
     cross-netns devices

   - report NAPI thread PID in the caller's pid namespace

   - mac802154: fix dirty frag in in-place crypto for IOT radios

   - sctp: hold socket lock when dumping endpoints in sctp_diag, avoid
     an overflow

   - eth: gve: fix header buffer corruption with header-split and HW-GRO

   - af_key: initialize alg_key_len for IPComp states, prevent OOB read"
* tag 'net-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (213 commits)
selftests: bonding: add a test for VLAN propagation over a bonded real device
vlan: defer real device state propagation to netdev_work
net: add the driver-facing netdev_work scheduling API
net: turn the rx_mode work into a generic netdev_work facility
net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to rotate
rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
rxrpc: Fix socket notification race
rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
rxrpc: Fix oob challenge leak in cleanup after notification failure
rxrpc: Fix the reception of a reply packet before data transmission
afs: Fix uncancelled rxrpc OOB message handler
afs: Fix further netns teardown to cancel the preallocation charger
rxrpc: Fix double unlock in rxrpc_recvmsg()
rxrpc: Fix leak of connection from OOB challenge
rxrpc: Fix ACKALL packet handling
net: hns3: differentiate autoneg default values between copper and fiber
net: hns3: fix permanent link down deadlock after reset
net: hns3: refactor MAC autoneg and speed configuration
net: hns3: unify copper port ksettings configuration path
...
239 files changed, 3040 insertions, 1289 deletions
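As a reading aid for the `airoha_dev_select_queue()` hunk further down in this diff: `skb->priority` is an unsigned 32-bit field, so the removed expression `(skb->priority - 1) % AIROHA_NUM_QOS_QUEUES` wraps around when the priority is 0. The following is only a userspace sketch of that arithmetic. The `sketch_*` names are hypothetical, and the queue count of 8 is an assumption, since the value of `AIROHA_NUM_QOS_QUEUES` is not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the real AIROHA_NUM_QOS_QUEUES is not shown here. */
#define SKETCH_NUM_QOS_QUEUES 8u

/* Removed mapping: for priority 0 the subtraction wraps to UINT32_MAX first. */
static uint32_t sketch_queue_old(uint32_t priority)
{
	return (priority - 1) % SKETCH_NUM_QOS_QUEUES;
}

/* Mapping added by the diff: plain modulo, no offset. */
static uint32_t sketch_queue_new(uint32_t priority)
{
	return priority % SKETCH_NUM_QOS_QUEUES;
}

/* Compares both mappings for the two lowest priorities. */
void sketch_queue_check(void)
{
	assert(sketch_queue_old(0) == 7);	/* UINT32_MAX % 8 */
	assert(sketch_queue_new(0) == 0);
	assert(sketch_queue_old(1) == 0);
	assert(sketch_queue_new(1) == 1);
}
```

Under the assumed queue count, the old expression sends priority 0 to the last queue of the channel, while the new one sends it to the first. The diff itself does not state which behaviour motivated the change.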
@@ -530,7 +530,8 @@ Luca Ceresoli <luca.ceresoli@bootlin.com> <luca@lucaceresoli.net> Luca Weiss <luca@lucaweiss.eu> <luca@z3ntu.xyz> Lucas De Marchi <demarchi@kernel.org> <lucas.demarchi@intel.com> Lukasz Luba <lukasz.luba@arm.com> <l.luba@partner.samsung.com> -Luo Jie <quic_luoj@quicinc.com> <luoj@codeaurora.org> +Luo Jie <jie.luo@oss.qualcomm.com> <luoj@codeaurora.org> +Luo Jie <jie.luo@oss.qualcomm.com> <quic_luoj@quicinc.com> Lance Yang <lance.yang@linux.dev> <ioworker0@gmail.com> Lance Yang <lance.yang@linux.dev> <mingzhe.yang@ly.com> Maciej W. Rozycki <macro@mips.com> <macro@imgtec.com> diff --git a/Documentation/devicetree/bindings/clock/qcom,ipq9574-cmn-pll.yaml b/Documentation/devicetree/bindings/clock/qcom,ipq9574-cmn-pll.yaml index de338c05190f..8cb86d74e489 100644 --- a/Documentation/devicetree/bindings/clock/qcom,ipq9574-cmn-pll.yaml +++ b/Documentation/devicetree/bindings/clock/qcom,ipq9574-cmn-pll.yaml @@ -8,7 +8,7 @@ title: Qualcomm CMN PLL Clock Controller on IPQ SoC maintainers: - Bjorn Andersson <andersson@kernel.org> - - Luo Jie <quic_luoj@quicinc.com> + - Luo Jie <jie.luo@oss.qualcomm.com> description: The CMN (or common) PLL clock controller expects a reference diff --git a/Documentation/devicetree/bindings/clock/qcom,qca8k-nsscc.yaml b/Documentation/devicetree/bindings/clock/qcom,qca8k-nsscc.yaml index 61473385da2d..480745349a5d 100644 --- a/Documentation/devicetree/bindings/clock/qcom,qca8k-nsscc.yaml +++ b/Documentation/devicetree/bindings/clock/qcom,qca8k-nsscc.yaml @@ -8,7 +8,7 @@ title: Qualcomm NSS Clock & Reset Controller on QCA8386/QCA8084 maintainers: - Bjorn Andersson <andersson@kernel.org> - - Luo Jie <quic_luoj@quicinc.com> + - Luo Jie <jie.luo@oss.qualcomm.com> description: | Qualcomm NSS clock control module provides the clocks and resets diff --git a/Documentation/devicetree/bindings/net/qcom,ipq9574-ppe.yaml b/Documentation/devicetree/bindings/net/qcom,ipq9574-ppe.yaml index 753f370b7605..6d0b21a10732 100644 --- 
a/Documentation/devicetree/bindings/net/qcom,ipq9574-ppe.yaml +++ b/Documentation/devicetree/bindings/net/qcom,ipq9574-ppe.yaml @@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml# title: Qualcomm IPQ packet process engine (PPE) maintainers: - - Luo Jie <quic_luoj@quicinc.com> + - Luo Jie <jie.luo@oss.qualcomm.com> - Lei Wei <quic_leiwei@quicinc.com> - Suruchi Agarwal <quic_suruchia@quicinc.com> - Pavithra R <quic_pavir@quicinc.com> diff --git a/Documentation/devicetree/bindings/net/renesas,ether.yaml b/Documentation/devicetree/bindings/net/renesas,ether.yaml index f0a52f47f95a..dd7187f12a67 100644 --- a/Documentation/devicetree/bindings/net/renesas,ether.yaml +++ b/Documentation/devicetree/bindings/net/renesas,ether.yaml @@ -121,8 +121,7 @@ examples: #size-cells = <0>; phy1: ethernet-phy@1 { - compatible = "ethernet-phy-id0022.1537", - "ethernet-phy-ieee802.3-c22"; + compatible = "ethernet-phy-id0022.1537"; reg = <1>; interrupt-parent = <&irqc0>; interrupts = <0 IRQ_TYPE_LEVEL_LOW>; diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index fde601acd1d2..d2a238f8cc8b 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -433,6 +433,8 @@ exceptions) notifiers run under the instance lock. Please extend this documentation whenever you make explicit assumption about lock being held from a notifier. +Drivers **must not** generate nested notifications of the ops-locked types. 
+ NETDEV_INTERNAL symbol namespace ================================ diff --git a/MAINTAINERS b/MAINTAINERS index bf97399d357f..15011f5752a9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22323,9 +22323,9 @@ F: Documentation/devicetree/bindings/power/supply/qcom,pmi8998-charger.yaml F: drivers/power/supply/qcom_smbx.c QUALCOMM PPE DRIVER -M: Luo Jie <quic_luoj@quicinc.com> +M: Luo Jie <jie.luo@oss.qualcomm.com> L: netdev@vger.kernel.org -S: Supported +S: Maintained F: Documentation/devicetree/bindings/net/qcom,ipq9574-ppe.yaml F: Documentation/networking/device_drivers/ethernet/qualcomm/ppe/ppe.rst F: drivers/net/ethernet/qualcomm/ppe/ @@ -26001,9 +26001,8 @@ S: Maintained F: drivers/net/ethernet/dlink/sundance.c SUNPLUS ETHERNET DRIVER -M: Wells Lu <wellslutw@gmail.com> L: netdev@vger.kernel.org -S: Maintained +S: Orphan W: https://sunplus.atlassian.net/wiki/spaces/doc/overview F: Documentation/devicetree/bindings/net/sunplus,sp7021-emac.yaml F: drivers/net/ethernet/sunplus/ diff --git a/drivers/net/dsa/mxl862xx/mxl862xx-host.c b/drivers/net/dsa/mxl862xx/mxl862xx-host.c index d55f9dff6433..4acd216f7cc0 100644 --- a/drivers/net/dsa/mxl862xx/mxl862xx-host.c +++ b/drivers/net/dsa/mxl862xx/mxl862xx-host.c @@ -12,6 +12,7 @@ #include <linux/crc16.h> #include <linux/iopoll.h> #include <linux/limits.h> +#include <linux/unaligned.h> #include <net/dsa.h> #include "mxl862xx.h" #include "mxl862xx-host.h" @@ -40,12 +41,13 @@ static void mxl862xx_crc_err_work_fn(struct work_struct *work) crc_err_work); struct dsa_port *dp; - dev_warn(&priv->mdiodev->dev, - "MDIO CRC error detected, shutting down all ports\n"); - rtnl_lock(); - dsa_switch_for_each_cpu_port(dp, priv->ds) - dev_close(dp->conduit); + if (!test_bit(MXL862XX_FLAG_WORK_STOPPED, &priv->flags)) { + dev_warn(&priv->mdiodev->dev, + "MDIO CRC error detected, shutting down all ports\n"); + dsa_switch_for_each_cpu_port(dp, priv->ds) + dev_close(dp->conduit); + } rtnl_unlock(); clear_bit(MXL862XX_FLAG_CRC_ERR, &priv->flags); 
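The remaining `mxl862xx_api_wrap()` hunks replace direct `data[i]` loads and stores with `get_unaligned_le16()` and `put_unaligned_le16()`, presumably because the caller's buffer may not be 2-byte aligned (the text shown here does not say). The sketch below is a userspace stand-in for the idea, not the kernel's implementation: the `sketch_*` helpers are hypothetical and assemble the 16-bit little-endian value byte by byte, so they work at any address and on any host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a possibly unaligned address. */
static uint16_t sketch_get_le16(const void *p)
{
	const uint8_t *b = p;

	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Store a 16-bit value in little-endian order at a possibly unaligned address. */
static void sketch_put_le16(uint16_t val, void *p)
{
	uint8_t *b = p;

	b[0] = (uint8_t)(val & 0xff);
	b[1] = (uint8_t)(val >> 8);
}

/* Round-trip a value through an odd offset into a byte buffer. */
void sketch_unaligned_check(void)
{
	uint8_t buf[5] = { 0 };

	sketch_put_le16(0x1234, buf + 1);
	assert(buf[1] == 0x34 && buf[2] == 0x12);	/* low byte first */
	assert(sketch_get_le16(buf + 1) == 0x1234);
}
```

Casting such a pointer to a 16-bit type and dereferencing it instead would rely on the address being suitably aligned, which is what byte-wise helpers of this kind avoid.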
@@ -349,7 +351,7 @@ int mxl862xx_api_wrap(struct mxl862xx_priv *priv, u16 cmd, void *_data, * zero words individually. */ for (i = 0, zeros = 0; i < size / 2 && zeros < RST_DATA_THRESHOLD; i++) - if (!data[i]) + if (!get_unaligned_le16(&data[i])) zeros++; if (zeros < RST_DATA_THRESHOLD && (size & 1) && !*(u8 *)&data[i]) @@ -395,7 +397,7 @@ int mxl862xx_api_wrap(struct mxl862xx_priv *priv, u16 cmd, void *_data, */ val = *(u8 *)&data[i] | ((crc & 0xff) << 8); } else { - val = le16_to_cpu(data[i]); + val = get_unaligned_le16(&data[i]); } /* After RST_DATA, skip zero data words as the registers @@ -453,7 +455,7 @@ int mxl862xx_api_wrap(struct mxl862xx_priv *priv, u16 cmd, void *_data, *(uint8_t *)&data[i] = ret & 0xff; crc = (ret >> 8) & 0xff; } else { - data[i] = cpu_to_le16((u16)ret); + put_unaligned_le16((u16)ret, &data[i]); } } diff --git a/drivers/net/dsa/realtek/rtl8366rb-leds.c b/drivers/net/dsa/realtek/rtl8366rb-leds.c index 509ffd3f8db5..ba50d311cb15 100644 --- a/drivers/net/dsa/realtek/rtl8366rb-leds.c +++ b/drivers/net/dsa/realtek/rtl8366rb-leds.c @@ -89,6 +89,7 @@ static int rtl8366rb_setup_led(struct realtek_priv *priv, struct dsa_port *dp, struct led_init_data init_data = { }; enum led_default_state state; struct rtl8366rb_led *led; + char name[64]; u32 led_group; int ret; @@ -129,10 +130,9 @@ static int rtl8366rb_setup_led(struct realtek_priv *priv, struct dsa_port *dp, init_data.fwnode = led_fwnode; init_data.devname_mandatory = true; - init_data.devicename = kasprintf(GFP_KERNEL, "Realtek-%d:0%d:%d", - dp->ds->index, dp->index, led_group); - if (!init_data.devicename) - return -ENOMEM; + snprintf(name, sizeof(name), "Realtek-%d:0%d:%d", + dp->ds->index, dp->index, led_group); + init_data.devicename = name; ret = devm_led_classdev_register_ext(priv->dev, &led->cdev, &init_data); if (ret) { diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c index fefe46e2a5e6..350f958dcb2a 100644 --- 
a/drivers/net/dsa/sja1105/sja1105_ptp.c +++ b/drivers/net/dsa/sja1105/sja1105_ptp.c @@ -755,7 +755,7 @@ static int sja1105_per_out_enable(struct sja1105_private *priv, * 2 edges on PTP_CLK. So check for truncation which happens * at periods larger than around 68.7 seconds. */ - pin_duration = ns_to_sja1105_ticks(pin_duration / 2); + pin_duration = max_t(u64, ns_to_sja1105_ticks(pin_duration / 2), 1); if (pin_duration > U32_MAX) { rc = -ERANGE; goto out; diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c index 64dde6464f3f..932b3a3df2e5 100644 --- a/drivers/net/ethernet/airoha/airoha_eth.c +++ b/drivers/net/ethernet/airoha/airoha_eth.c @@ -1004,6 +1004,7 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget) e = &q->entry[index]; skb = e->skb; + e->skb = NULL; dma_unmap_single(eth->dev, e->dma_addr, e->dma_len, DMA_TO_DEVICE); @@ -1147,55 +1148,76 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma) return 0; } -static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q) +static void airoha_qdma_tx_cleanup(struct airoha_qdma *qdma) { - struct airoha_qdma *qdma = q->qdma; - struct airoha_eth *eth = qdma->eth; - int i, qid = q - &qdma->q_tx[0]; - u16 index = 0; + u32 status; + int i; - spin_lock_bh(&q->lock); - for (i = 0; i < q->ndesc; i++) { - struct airoha_queue_entry *e = &q->entry[i]; - struct airoha_qdma_desc *desc = &q->desc[i]; + airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG, + GLOBAL_CFG_TX_DMA_EN_MASK); + if (read_poll_timeout(airoha_qdma_rr, status, + !(status & GLOBAL_CFG_TX_DMA_BUSY_MASK), + USEC_PER_MSEC, 50 * USEC_PER_MSEC, true, + qdma, REG_QDMA_GLOBAL_CFG)) + dev_warn(qdma->eth->dev, "QDMA TX DMA busy timeout\n"); - if (!e->dma_addr) + for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) { + struct airoha_queue *q = &qdma->q_tx[i]; + u16 index = 0; + int j; + + if (!q->ndesc) continue; - dma_unmap_single(eth->dev, e->dma_addr, e->dma_len, - DMA_TO_DEVICE); - 
dev_kfree_skb_any(e->skb); - e->dma_addr = 0; - e->skb = NULL; - list_add_tail(&e->list, &q->tx_list); + spin_lock_bh(&q->lock); - /* Reset DMA descriptor */ - WRITE_ONCE(desc->ctrl, 0); - WRITE_ONCE(desc->addr, 0); - WRITE_ONCE(desc->data, 0); - WRITE_ONCE(desc->msg0, 0); - WRITE_ONCE(desc->msg1, 0); - WRITE_ONCE(desc->msg2, 0); + q->flushing = true; + for (j = 0; j < q->ndesc; j++) { + struct airoha_queue_entry *e = &q->entry[j]; + struct airoha_qdma_desc *desc = &q->desc[j]; + struct sk_buff *skb = e->skb; - q->queued--; - } + if (!e->dma_addr) + continue; - if (!list_empty(&q->tx_list)) { - struct airoha_queue_entry *e; + dma_unmap_single(qdma->eth->dev, e->dma_addr, + e->dma_len, DMA_TO_DEVICE); + e->dma_addr = 0; + list_add_tail(&e->list, &q->tx_list); + + WRITE_ONCE(desc->ctrl, 0); + WRITE_ONCE(desc->addr, 0); + WRITE_ONCE(desc->data, 0); + WRITE_ONCE(desc->msg0, 0); + WRITE_ONCE(desc->msg1, 0); + WRITE_ONCE(desc->msg2, 0); + + if (skb) { + struct netdev_queue *txq; + + txq = skb_get_tx_queue(skb->dev, skb); + netdev_tx_completed_queue(txq, 1, skb->len); + dev_kfree_skb_any(skb); + e->skb = NULL; + } - e = list_first_entry(&q->tx_list, struct airoha_queue_entry, - list); - index = e - q->entry; - } - /* Set TX_DMA_IDX to TX_CPU_IDX to notify the hw the QDMA TX ring is - * empty. 
- */ - airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK, - FIELD_PREP(TX_RING_CPU_IDX_MASK, index)); - airoha_qdma_rmw(qdma, REG_TX_DMA_IDX(qid), TX_RING_DMA_IDX_MASK, - FIELD_PREP(TX_RING_DMA_IDX_MASK, index)); + q->queued--; + } - spin_unlock_bh(&q->lock); + if (!list_empty(&q->tx_list)) { + struct airoha_queue_entry *e; + + e = list_first_entry(&q->tx_list, + struct airoha_queue_entry, list); + index = e - q->entry; + } + airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(i), TX_RING_CPU_IDX_MASK, + FIELD_PREP(TX_RING_CPU_IDX_MASK, index)); + airoha_qdma_rmw(qdma, REG_TX_DMA_IDX(i), TX_RING_DMA_IDX_MASK, + FIELD_PREP(TX_RING_DMA_IDX_MASK, index)); + + spin_unlock_bh(&q->lock); + } } static int airoha_qdma_init_hfwd_queues(struct airoha_qdma *qdma) @@ -1523,10 +1545,23 @@ static int airoha_qdma_init(struct platform_device *pdev, return airoha_qdma_hw_init(qdma); } -static void airoha_qdma_cleanup(struct airoha_qdma *qdma) +static void airoha_qdma_cleanup(struct airoha_eth *eth, + struct airoha_qdma *qdma) { int i; + if (test_bit(DEV_STATE_INITIALIZED, &eth->state)) { + u32 status; + + airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG, + GLOBAL_CFG_RX_DMA_EN_MASK); + if (read_poll_timeout(airoha_qdma_rr, status, + !(status & GLOBAL_CFG_RX_DMA_BUSY_MASK), + USEC_PER_MSEC, 50 * USEC_PER_MSEC, true, + qdma, REG_QDMA_GLOBAL_CFG)) + dev_warn(eth->dev, "QDMA RX DMA busy timeout\n"); + } + for (i = 0; i < ARRAY_SIZE(qdma->q_rx); i++) { if (!qdma->q_rx[i].ndesc) continue; @@ -1546,12 +1581,6 @@ static void airoha_qdma_cleanup(struct airoha_qdma *qdma) netif_napi_del(&qdma->q_tx_irq[i].napi); } - for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) { - if (!qdma->q_tx[i].ndesc) - continue; - - airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]); - } } static int airoha_hw_init(struct platform_device *pdev, @@ -1593,7 +1622,7 @@ static int airoha_hw_init(struct platform_device *pdev, return 0; error: for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) - airoha_qdma_cleanup(&eth->qdma[i]); +
airoha_qdma_cleanup(eth, &eth->qdma[i]); return err; } @@ -1603,7 +1632,7 @@ static void airoha_hw_cleanup(struct airoha_eth *eth) int i; for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) - airoha_qdma_cleanup(&eth->qdma[i]); + airoha_qdma_cleanup(eth, &eth->qdma[i]); airoha_ppe_deinit(eth); } @@ -1837,11 +1866,6 @@ static int airoha_dev_open(struct net_device *netdev) } port->users++; - airoha_qdma_set(qdma, REG_QDMA_GLOBAL_CFG, - GLOBAL_CFG_TX_DMA_EN_MASK | - GLOBAL_CFG_RX_DMA_EN_MASK); - qdma->users++; - if (!airoha_is_lan_gdm_dev(dev) && airoha_ppe_is_enabled(qdma->eth, 1)) pse_port = FE_PSE_PORT_PPE2; @@ -1880,12 +1904,9 @@ static int airoha_dev_stop(struct net_device *netdev) struct airoha_gdm_dev *dev = netdev_priv(netdev); struct airoha_gdm_port *port = dev->port; struct airoha_qdma *qdma = dev->qdma; - int i; netif_tx_disable(netdev); airoha_set_vip_for_gdm_port(dev, false); - for (i = 0; i < netdev->num_tx_queues; i++) - netdev_tx_reset_subqueue(netdev, i); if (--port->users) airoha_set_port_mtu(dev->eth, port); @@ -1893,20 +1914,6 @@ static int airoha_dev_stop(struct net_device *netdev) airoha_set_gdm_port_fwd_cfg(qdma->eth, REG_GDM_FWD_CFG(port->id), FE_PSE_PORT_DROP); - - if (!--qdma->users) { - airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG, - GLOBAL_CFG_TX_DMA_EN_MASK | - GLOBAL_CFG_RX_DMA_EN_MASK); - - for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) { - if (!qdma->q_tx[i].ndesc) - continue; - - airoha_qdma_cleanup_tx_queue(&qdma->q_tx[i]); - } - } - return 0; } @@ -2110,7 +2117,7 @@ static u16 airoha_dev_select_queue(struct net_device *netdev, */ channel = netdev_uses_dsa(netdev) ? skb_get_queue_mapping(skb) : port->id; channel = channel % AIROHA_NUM_QOS_CHANNELS; - queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES; /* QoS queue */ + queue = skb->priority % AIROHA_NUM_QOS_QUEUES; queue = channel * AIROHA_NUM_QOS_QUEUES + queue; return queue < netdev->num_tx_queues ?
queue : 0; @@ -2229,6 +2236,9 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb, spin_lock_bh(&q->lock); + if (q->flushing) + goto error_unlock; + txq = skb_get_tx_queue(netdev, skb); nr_frags = 1 + skb_shinfo(skb)->nr_frags; @@ -2309,7 +2319,7 @@ error_unmap: e->dma_addr = 0; } list_splice(&tx_list, &q->tx_list); - +error_unlock: spin_unlock_bh(&q->lock); error: dev_kfree_skb_any(skb); @@ -2395,7 +2405,7 @@ static int airoha_qdma_set_chan_tx_sched(struct net_device *netdev, struct airoha_gdm_dev *dev = netdev_priv(netdev); int i; - for (i = 0; i < AIROHA_NUM_TX_RING; i++) + for (i = 0; i < AIROHA_NUM_QOS_QUEUES; i++) airoha_qdma_clear(dev->qdma, REG_QUEUE_CLOSE_CFG(channel), TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i)); @@ -2789,7 +2799,7 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev, struct tc_htb_qopt_offload *opt) { u32 channel = TC_H_MIN(opt->classid) % AIROHA_NUM_QOS_CHANNELS; - int err, num_tx_queues = netdev->real_num_tx_queues; + int err, num_tx_queues = AIROHA_NUM_TX_RING + channel + 1; struct airoha_gdm_dev *dev = netdev_priv(netdev); struct airoha_qdma *qdma = dev->qdma; @@ -2806,13 +2816,15 @@ static int airoha_tc_htb_alloc_leaf_queue(struct net_device *netdev, if (err) goto error; - err = netif_set_real_num_tx_queues(netdev, num_tx_queues + 1); - if (err) { - airoha_qdma_set_tx_rate_limit(netdev, channel, 0, - opt->quantum); - NL_SET_ERR_MSG_MOD(opt->extack, - "failed setting real_num_tx_queues"); - goto error; + if (num_tx_queues > netdev->real_num_tx_queues) { + err = netif_set_real_num_tx_queues(netdev, num_tx_queues); + if (err) { + airoha_qdma_set_tx_rate_limit(netdev, channel, 0, + opt->quantum); + NL_SET_ERR_MSG_MOD(opt->extack, + "failed setting real_num_tx_queues"); + goto error; + } } set_bit(channel, dev->qos_sq_bmap); @@ -3003,13 +3015,18 @@ static int airoha_dev_setup_tc_block(struct net_device *dev, static void airoha_tc_remove_htb_queue(struct net_device *netdev, int queue) { struct airoha_gdm_dev *dev = 
netdev_priv(netdev); + int num_tx_queues = AIROHA_NUM_TX_RING; struct airoha_qdma *qdma = dev->qdma; - netif_set_real_num_tx_queues(netdev, netdev->real_num_tx_queues - 1); - airoha_qdma_set_tx_rate_limit(netdev, queue + 1, 0, 0); + airoha_qdma_set_tx_rate_limit(netdev, queue, 0, 0); clear_bit(queue, qdma->qos_channel_map); clear_bit(queue, dev->qos_sq_bmap); + + if (!bitmap_empty(dev->qos_sq_bmap, AIROHA_NUM_QOS_CHANNELS)) + num_tx_queues += find_last_bit(dev->qos_sq_bmap, + AIROHA_NUM_QOS_CHANNELS) + 1; + netif_set_real_num_tx_queues(netdev, num_tx_queues); } static int airoha_tc_htb_delete_leaf_queue(struct net_device *netdev, @@ -3413,8 +3430,12 @@ static int airoha_probe(struct platform_device *pdev) if (err) goto error_netdev_free; - for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) + for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) { airoha_qdma_start_napi(&eth->qdma[i]); + airoha_qdma_set(&eth->qdma[i], REG_QDMA_GLOBAL_CFG, + GLOBAL_CFG_TX_DMA_EN_MASK | + GLOBAL_CFG_RX_DMA_EN_MASK); + } for_each_child_of_node(pdev->dev.of_node, np) { if (!of_device_is_compatible(np, "airoha,eth-mac")) @@ -3437,8 +3458,10 @@ static int airoha_probe(struct platform_device *pdev) return 0; error_napi_stop: - for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) + for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) { airoha_qdma_stop_napi(&eth->qdma[i]); + airoha_qdma_tx_cleanup(&eth->qdma[i]); + } for (i = 0; i < ARRAY_SIZE(eth->ports); i++) { struct airoha_gdm_port *port = eth->ports[i]; @@ -3474,8 +3497,10 @@ static void airoha_remove(struct platform_device *pdev) struct airoha_eth *eth = platform_get_drvdata(pdev); int i; - for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) + for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) { airoha_qdma_stop_napi(&eth->qdma[i]); + airoha_qdma_tx_cleanup(&eth->qdma[i]); + } for (i = 0; i < ARRAY_SIZE(eth->ports); i++) { struct airoha_gdm_port *port = eth->ports[i]; diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h index 41d2e7a1f9fb..d7ff8c5200e2
100644 --- a/drivers/net/ethernet/airoha/airoha_eth.h +++ b/drivers/net/ethernet/airoha/airoha_eth.h @@ -197,6 +197,7 @@ struct airoha_queue { int free_thr; int buf_size; bool txq_stopped; + bool flushing; struct napi_struct napi; struct page_pool *page_pool; @@ -524,8 +525,6 @@ struct airoha_qdma { struct airoha_eth *eth; void __iomem *regs; - int users; - struct airoha_irq_bank irq_banks[AIROHA_MAX_NUM_IRQ_BANKS]; struct airoha_tx_irq_queue q_tx_irq[AIROHA_NUM_TX_IRQ]; diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c index 329e7c2aae89..42f4b0f21d17 100644 --- a/drivers/net/ethernet/airoha/airoha_ppe.c +++ b/drivers/net/ethernet/airoha/airoha_ppe.c @@ -1601,7 +1601,8 @@ int airoha_ppe_init(struct airoha_eth *eth) return -ENOMEM; } - ppe->foe_check_time = devm_kzalloc(eth->dev, ppe_num_entries, + ppe->foe_check_time = devm_kzalloc(eth->dev, + ppe_num_entries * sizeof(*ppe->foe_check_time), GFP_KERNEL); if (!ppe->foe_check_time) return -ENOMEM; diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 92d149d4f091..5d05020a6d05 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -752,6 +752,18 @@ static void ena_destroy_all_tx_queues(struct ena_adapter *adapter) } } +static void ena_destroy_xdp_tx_queues(struct ena_adapter *adapter) +{ + u16 ena_qid; + int i; + + for (i = adapter->xdp_first_ring; + i < adapter->xdp_first_ring + adapter->xdp_num_queues; i++) { + ena_qid = ENA_IO_TXQ_IDX(i); + ena_com_destroy_io_queue(adapter->ena_dev, ena_qid); + } +} + static void ena_destroy_all_rx_queues(struct ena_adapter *adapter) { u16 ena_qid; @@ -2078,14 +2090,21 @@ static int create_queues_with_size_backoff(struct ena_adapter *adapter) rc = ena_setup_tx_resources_in_range(adapter, 0, adapter->num_io_queues); - if (rc) + if (rc) { + ena_destroy_xdp_tx_queues(adapter); + ena_free_all_io_tx_resources_in_range(adapter, 
+ adapter->xdp_first_ring, + adapter->xdp_num_queues); goto err_setup_tx; + } rc = ena_create_io_tx_queues_in_range(adapter, 0, adapter->num_io_queues); - if (rc) + if (rc) { + ena_destroy_xdp_tx_queues(adapter); goto err_create_tx_queues; + } rc = ena_setup_all_rx_resources(adapter); if (rc) diff --git a/drivers/net/ethernet/amd/au1000_eth.c b/drivers/net/ethernet/amd/au1000_eth.c index 9d35ac348ebe..5a04056e38fa 100644 --- a/drivers/net/ethernet/amd/au1000_eth.c +++ b/drivers/net/ethernet/amd/au1000_eth.c @@ -943,9 +943,10 @@ static int au1000_close(struct net_device *dev) /* stop the device */ netif_stop_queue(dev); + spin_unlock_irqrestore(&aup->lock, flags); + /* disable the interrupt */ free_irq(dev->irq, dev); - spin_unlock_irqrestore(&aup->lock, flags); return 0; } diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index 19e078479b0d..5b2640bd31c3 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -4748,6 +4748,7 @@ int bnx2x_alloc_mem_bp(struct bnx2x *bp) fp = kzalloc_objs(*fp, bp->fp_array_size); if (!fp) goto alloc_err; + bp->fp = fp; for (i = 0; i < bp->fp_array_size; i++) { fp[i].tpa_info = kzalloc_objs(struct bnx2x_agg_info, @@ -4756,8 +4757,6 @@ int bnx2x_alloc_mem_bp(struct bnx2x *bp) goto alloc_err; } - bp->fp = fp; - /* allocate sp objs */ bp->sp_objs = kzalloc_objs(struct bnx2x_sp_objs, bp->fp_array_size); if (!bp->sp_objs) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 055e93a417b6..7513618793da 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -10530,7 +10530,7 @@ static void bnxt_accumulate_stats(struct bnxt_stats_mem *stats) stats->hw_masks, stats->len / 8, false); } -static void bnxt_accumulate_all_stats(struct bnxt *bp) +static void bnxt_accumulate_ring_stats(struct bnxt *bp) { struct bnxt_stats_mem 
*ring0_stats; bool ignore_zero = false; @@ -10553,6 +10553,10 @@ static void bnxt_accumulate_all_stats(struct bnxt *bp) ring0_stats->hw_masks, ring0_stats->len / 8, ignore_zero); } +} + +static void bnxt_accumulate_port_stats(struct bnxt *bp) +{ if (bp->flags & BNXT_FLAG_PORT_STATS) { struct bnxt_stats_mem *stats = &bp->port_stats; __le64 *hw_stats = stats->hw_stats; @@ -10575,6 +10579,41 @@ static void bnxt_accumulate_all_stats(struct bnxt *bp) } } +static void bnxt_accumulate_all_stats(struct bnxt *bp) +{ + bnxt_accumulate_ring_stats(bp); + bnxt_accumulate_port_stats(bp); +} + +/* Re-accumulate ring stats from DMA buffers if stale. + * uAPIs for reading sw_stats should call this first. + * + * We promise user space update frequency of bp->stats_coal_ticks but + * the update is a two step process - first device updates the DMA buffer, + * then we have to update from that buffer to driver stats in the service work. + * Worst case we would be 2x off from the desired frequency. + * Sync the stats sooner, if stale. The 20% threshold was chosen arbitrarily. + * + * Ideally we would split the user-configured time into two portions, + * i.e. also lower the DMA period by the 20%. But the DMA timer seems to have + * too coarse granularity to play such tricks. 
+ */ +void bnxt_sync_ring_stats(struct bnxt *bp) +{ + unsigned long stale; + + if (!netif_running(bp->dev) || !bp->stats_coal_ticks) + return; + + spin_lock(&bp->stats_lock); + stale = usecs_to_jiffies(bp->stats_coal_ticks / 5); + if (time_after_eq(jiffies, bp->stats_updated_jiffies + stale)) { + bnxt_accumulate_ring_stats(bp); + bp->stats_updated_jiffies = jiffies; + } + spin_unlock(&bp->stats_lock); +} + static int bnxt_hwrm_port_qstats(struct bnxt *bp, u8 flags) { struct hwrm_port_qstats_input *req; @@ -13577,6 +13616,7 @@ bnxt_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) return; } + bnxt_sync_ring_stats(bp); bnxt_get_ring_stats(bp, stats); bnxt_add_prev_stats(bp, stats); @@ -14753,7 +14793,10 @@ static void bnxt_sp_task(struct work_struct *work) if (test_and_clear_bit(BNXT_PERIODIC_STATS_SP_EVENT, &bp->sp_event)) { bnxt_hwrm_port_qstats(bp, 0); bnxt_hwrm_port_qstats_ext(bp, 0); + spin_lock(&bp->stats_lock); bnxt_accumulate_all_stats(bp); + bp->stats_updated_jiffies = jiffies; + spin_unlock(&bp->stats_lock); } if (test_and_clear_bit(BNXT_LINK_CHNG_SP_EVENT, &bp->sp_event)) { @@ -15488,6 +15531,7 @@ static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev) INIT_DELAYED_WORK(&bp->fw_reset_task, bnxt_fw_reset_task); spin_lock_init(&bp->ntp_fltr_lock); + spin_lock_init(&bp->stats_lock); #if BITS_PER_LONG == 32 spin_lock_init(&bp->db_lock); #endif @@ -16056,6 +16100,7 @@ static void bnxt_get_queue_stats_rx(struct net_device *dev, int i, if (!bp->bnapi) return; + bnxt_sync_ring_stats(bp); cpr = &bp->bnapi[i]->cp_ring; sw = cpr->stats.sw_stats; @@ -16084,6 +16129,7 @@ static void bnxt_get_queue_stats_tx(struct net_device *dev, int i, if (!bp->tx_ring) return; + bnxt_sync_ring_stats(bp); bnapi = bp->tx_ring[bp->tx_ring_map[i]].bnapi; sw = bnapi->cp_ring.stats.sw_stats; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 6d312259f852..6335dfc14c98 100644 --- 
a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2620,6 +2620,10 @@ struct bnxt {
 #define BNXT_MIN_STATS_COAL_TICKS	 250000
 #define BNXT_MAX_STATS_COAL_TICKS	1000000
 
+	/* Protects stats_updated_jiffies and writes to sw_stats */
+	spinlock_t		stats_lock;
+	unsigned long		stats_updated_jiffies;
+
 	struct work_struct	sp_task;
 	unsigned long		sp_event;
 #define BNXT_RX_NTP_FLTR_SP_EVENT	1
@@ -3027,6 +3031,7 @@ void bnxt_reenable_sriov(struct bnxt *bp);
 void bnxt_close_nic(struct bnxt *, bool, bool);
 void bnxt_get_ring_drv_stats(struct bnxt *bp,
 			     struct bnxt_total_ring_drv_stats *stats);
+void bnxt_sync_ring_stats(struct bnxt *bp);
 bool bnxt_rfs_capable(struct bnxt *bp, bool new_rss_ctx);
 int bnxt_dbg_hwrm_rd_reg(struct bnxt *bp, u32 reg_off, u16 num_words,
 			 u32 *reg_buf);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 56d74a3c24b7..62bc9cae613c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -606,6 +606,7 @@ static void bnxt_get_ethtool_stats(struct net_device *dev,
 		goto skip_ring_stats;
 	}
 
+	bnxt_sync_ring_stats(bp);
 	tpa_stats = bnxt_get_num_tpa_ring_stats(bp);
 	for (i = 0; i < bp->cp_nr_rings; i++) {
 		struct bnxt_napi *bnapi = bp->bnapi[i];
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a12aa21244e8..fd282a1700fb 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4522,6 +4522,13 @@ static int macb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 	}
 }
 
+static void macb_tx_timeout(struct net_device *dev, unsigned int q)
+{
+	struct macb *bp = netdev_priv(dev);
+
+	macb_tx_restart(&bp->queues[q]);
+}
+
 static const struct net_device_ops macb_netdev_ops = {
 	.ndo_open = macb_open,
 	.ndo_stop = macb_close,
@@ -4540,6 +4547,7 @@ static const struct net_device_ops macb_netdev_ops = {
 	.ndo_hwtstamp_set = macb_hwtstamp_set,
 	.ndo_hwtstamp_get = macb_hwtstamp_get,
 	.ndo_setup_tc = macb_setup_tc,
+	.ndo_tx_timeout = macb_tx_timeout,
 };
 
 /* Configure peripheral capabilities according to device tree
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
index 45f276c2c3ec..858ba844ac51 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
@@ -2212,7 +2212,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
 	if (err) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Cannot join a bridge while VLAN uppers are present");
-		return 0;
+		return err;
 	}
 
 	netdev_for_each_lower_dev(upper_dev, other_dev, iter) {
@@ -2233,6 +2233,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
 static int dpaa2_switch_port_prechangeupper(struct net_device *netdev,
 					    struct netdev_notifier_changeupper_info *info)
 {
+	struct ethsw_port_priv *port_priv;
 	struct netlink_ext_ack *extack;
 	struct net_device *upper_dev;
 	int err;
@@ -2251,6 +2252,13 @@ static int dpaa2_switch_port_prechangeupper(struct net_device *netdev,
 
 		if (!info->linking)
 			dpaa2_switch_port_pre_bridge_leave(netdev);
+	} else if (is_vlan_dev(upper_dev)) {
+		port_priv = netdev_priv(netdev);
+		if (port_priv->fdb->bridge_dev) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Cannot accept VLAN uppers while bridged");
+			return -EOPNOTSUPP;
+		}
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index 4e771f852358..437a15bbb47b 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -322,6 +322,9 @@ static void enetc4_default_rings_allocation(struct enetc_pf *pf)
 	val = enetc4_psicfgr0_val_construct(false, num_tx_bdr, num_rx_bdr);
 	enetc_port_wr(hw, ENETC4_PSICFGR0(0), val);
 
+	if (!pf->caps.num_vsi)
+		return;
+
 	num_rx_bdr = pf->caps.num_rx_bdr - num_rx_bdr;
 	rx_rem = num_rx_bdr % pf->caps.num_vsi;
 	num_rx_bdr = num_rx_bdr / pf->caps.num_vsi;
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 7cc22916852f..8199738ba979 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -984,7 +984,8 @@ const struct ethtool_ops gve_ethtool_ops = {
 	.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
 				 ETHTOOL_RING_USE_RX_BUF_LEN,
 	.op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
-			 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
+			 ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
+			 ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo = gve_get_drvinfo,
 	.get_strings = gve_get_strings,
 	.get_sset_count = gve_get_sset_count,
diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
index 7924dce719e2..02cba280d81a 100644
--- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
@@ -21,11 +21,13 @@
 static void gve_rx_free_hdr_bufs(struct gve_priv *priv, struct gve_rx_ring *rx)
 {
 	struct device *hdev = &priv->pdev->dev;
-	int buf_count = rx->dqo.bufq.mask + 1;
 
 	if (rx->dqo.hdr_bufs.data) {
-		dma_free_coherent(hdev, priv->header_buf_size * buf_count,
-				  rx->dqo.hdr_bufs.data, rx->dqo.hdr_bufs.addr);
+		size_t size =
+			(size_t)priv->header_buf_size * rx->dqo.num_buf_states;
+
+		dma_free_coherent(hdev, size, rx->dqo.hdr_bufs.data,
+				  rx->dqo.hdr_bufs.addr);
 		rx->dqo.hdr_bufs.data = NULL;
 	}
 }
@@ -254,7 +256,7 @@ int gve_rx_alloc_ring_dqo(struct gve_priv *priv,
 
 	/* Allocate header buffers for header-split */
 	if (cfg->enable_header_split)
-		if (gve_rx_alloc_hdr_bufs(priv, rx, buffer_queue_slots))
+		if (gve_rx_alloc_hdr_bufs(priv, rx, rx->dqo.num_buf_states))
 			goto err;
 
 	/* Allocate RX completion queue */
@@ -381,10 +383,13 @@ void gve_rx_post_buffers_dqo(struct gve_rx_ring *rx)
 			break;
 		}
 
-		if (rx->dqo.hdr_bufs.data)
+		if (rx->dqo.hdr_bufs.data) {
+			u16 buf_id = le16_to_cpu(desc->buf_id);
+
 			desc->header_buf_addr =
 				cpu_to_le64(rx->dqo.hdr_bufs.addr +
-					    priv->header_buf_size * bufq->tail);
+					    (size_t)priv->header_buf_size * buf_id);
+		}
 
 		bufq->tail = (bufq->tail + 1) & bufq->mask;
 		complq->num_free_slots--;
@@ -826,10 +831,13 @@ static int gve_rx_dqo(struct napi_struct *napi, struct gve_rx_ring *rx,
 	int unsplit = 0;
 
 	if (hdr_len && !hbo) {
-		rx->ctx.skb_head = gve_rx_copy_data(priv->dev, napi,
-						    rx->dqo.hdr_bufs.data +
-						    desc_idx * priv->header_buf_size,
-						    hdr_len);
+		size_t offset =
+			(size_t)buffer_id * priv->header_buf_size;
+
+		rx->ctx.skb_head =
+			gve_rx_copy_data(priv->dev, napi,
+					 rx->dqo.hdr_bufs.data + offset,
+					 hdr_len);
 		if (unlikely(!rx->ctx.skb_head))
 			goto error;
 		rx->ctx.skb_tail = rx->ctx.skb_head;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 9cb7ce9fd311..442f15476af3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -811,12 +811,11 @@ static int hns3_get_link_ksettings(struct net_device *netdev,
 }
 
 static int hns3_check_ksettings_param(const struct net_device *netdev,
-				      const struct ethtool_link_ksettings *cmd)
+				      const struct ethtool_link_ksettings *cmd,
+				      u8 media_type)
 {
 	struct hnae3_handle *handle = hns3_get_handle(netdev);
 	const struct hnae3_ae_ops *ops = hns3_get_ops(handle);
-	u8 module_type = HNAE3_MODULE_TYPE_UNKNOWN;
-	u8 media_type = HNAE3_MEDIA_TYPE_UNKNOWN;
 	u32 lane_num;
 	u8 autoneg;
 	u32 speed;
@@ -836,9 +835,6 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
 		return 0;
 	}
 
-	if (ops->get_media_type)
-		ops->get_media_type(handle, &media_type, &module_type);
-
 	if (cmd->base.duplex == DUPLEX_HALF &&
 	    media_type != HNAE3_MEDIA_TYPE_COPPER) {
 		netdev_err(netdev,
@@ -863,6 +859,8 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 	struct hnae3_handle *handle = hns3_get_handle(netdev);
 	struct hnae3_ae_dev *ae_dev = hns3_get_ae_dev(handle);
 	const struct hnae3_ae_ops *ops = hns3_get_ops(handle);
+	u8 module_type = HNAE3_MODULE_TYPE_UNKNOWN;
+	u8 media_type = HNAE3_MEDIA_TYPE_UNKNOWN;
 	int ret;
 
 	/* Chip don't support this mode. */
@@ -878,22 +876,23 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 		  cmd->base.autoneg, cmd->base.speed, cmd->base.duplex,
 		  cmd->lanes);
 
-	/* Only support ksettings_set for netdev with phy attached for now */
-	if (netdev->phydev) {
-		if (cmd->base.speed == SPEED_1000 &&
-		    cmd->base.autoneg == AUTONEG_DISABLE)
-			return -EINVAL;
+	if (!ops->get_media_type)
+		return -EOPNOTSUPP;
+	ops->get_media_type(handle, &media_type, &module_type);
 
-		return phy_ethtool_ksettings_set(netdev->phydev, cmd);
-	} else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
-		   ops->set_phy_link_ksettings) {
-		return ops->set_phy_link_ksettings(handle, cmd);
+	if (media_type == HNAE3_MEDIA_TYPE_COPPER) {
+		if (!ops->set_phy_link_ksettings)
+			return -EOPNOTSUPP;
+		ret = ops->set_phy_link_ksettings(handle, cmd);
+		if (ret != -ENODEV)
+			return ret;
+		/* PHY_INEXISTENT, use MAC-level configuration */
 	}
 
 	if (ae_dev->dev_version < HNAE3_DEVICE_VERSION_V2)
 		return -EOPNOTSUPP;
 
-	ret = hns3_check_ksettings_param(netdev, cmd);
+	ret = hns3_check_ksettings_param(netdev, cmd, media_type);
 	if (ret)
 		return ret;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 2f1984930da2..fc8587c80813 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1504,6 +1504,11 @@ static int hclge_configure(struct hclge_dev *hdev)
 	hdev->hw.mac.req_autoneg = AUTONEG_ENABLE;
 	hdev->hw.mac.req_duplex = DUPLEX_FULL;
 
+	/* When lane_num is 0, the firmware will automatically
+	 * select the appropriate lane_num based on the speed.
+	 */
+	hdev->hw.mac.req_lane_num = 0;
+
 	hclge_parse_link_mode(hdev, cfg.speed_ability);
 
 	hdev->hw.mac.max_speed = hclge_get_max_speed(cfg.speed_ability);
@@ -2579,8 +2584,11 @@ static int hclge_cfg_mac_speed_dup_h(struct hnae3_handle *handle, int speed,
 	if (ret)
 		return ret;
 
-	hdev->hw.mac.req_speed = (u32)speed;
-	hdev->hw.mac.req_duplex = duplex;
+	hdev->hw.mac.req_lane_num = lane_num;
+	if (speed != SPEED_UNKNOWN)
+		hdev->hw.mac.req_speed = (u32)speed;
+	if (duplex != DUPLEX_UNKNOWN)
+		hdev->hw.mac.req_duplex = duplex;
 
 	return 0;
 }
@@ -2611,6 +2619,7 @@ static int hclge_set_autoneg(struct hnae3_handle *handle, bool enable)
 {
 	struct hclge_vport *vport = hclge_get_vport(handle);
 	struct hclge_dev *hdev = vport->back;
+	int ret;
 
 	if (!hdev->hw.mac.support_autoneg) {
 		if (enable) {
@@ -2622,7 +2631,10 @@ static int hclge_set_autoneg(struct hnae3_handle *handle, bool enable)
 		}
 	}
 
-	return hclge_set_autoneg_en(hdev, enable);
+	ret = hclge_set_autoneg_en(hdev, enable);
+	if (!ret)
+		hdev->hw.mac.req_autoneg = enable;
+	return ret;
 }
 
 static int hclge_get_autoneg(struct hnae3_handle *handle)
@@ -2884,20 +2896,6 @@ static int hclge_mac_init(struct hclge_dev *hdev)
 	if (!test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
 		hdev->hw.mac.duplex = HCLGE_MAC_FULL;
 
-	if (hdev->hw.mac.support_autoneg) {
-		ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
-		if (ret)
-			return ret;
-	}
-
-	if (!hdev->hw.mac.autoneg) {
-		ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
-						 hdev->hw.mac.req_duplex,
-						 hdev->hw.mac.lane_num);
-		if (ret)
-			return ret;
-	}
-
 	mac->link = 0;
 
 	if (mac->user_fec_mode & BIT(HNAE3_FEC_USER_DEF)) {
@@ -3285,8 +3283,8 @@ static int hclge_get_phy_link_ksettings(struct hnae3_handle *handle,
 }
 
 static int
-hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
-			     const struct ethtool_link_ksettings *cmd)
+hclge_ethtool_ksettings_set(struct hnae3_handle *handle,
+			    const struct ethtool_link_ksettings *cmd)
 {
 	struct hclge_desc desc[HCLGE_PHY_LINK_SETTING_BD_NUM];
 	struct hclge_vport *vport = hclge_get_vport(handle);
@@ -3327,10 +3325,34 @@ hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
 		return ret;
 	}
 
-	hdev->hw.mac.req_autoneg = cmd->base.autoneg;
-	hdev->hw.mac.req_speed = cmd->base.speed;
-	hdev->hw.mac.req_duplex = cmd->base.duplex;
 	linkmode_copy(hdev->hw.mac.advertising, cmd->link_modes.advertising);
+	return 0;
+}
+
+static int
+hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
+			     const struct ethtool_link_ksettings *cmd)
+{
+	struct hclge_vport *vport = hclge_get_vport(handle);
+	struct hclge_dev *hdev = vport->back;
+	int ret = -ENODEV;
+
+	if (hnae3_dev_phy_imp_supported(hdev)) {
+		ret = hclge_ethtool_ksettings_set(handle, cmd);
+	} else if (handle->netdev->phydev) {
+		if (cmd->base.speed == SPEED_1000 &&
+		    cmd->base.autoneg == AUTONEG_DISABLE)
+			return -EINVAL;
+		ret = phy_ethtool_ksettings_set(handle->netdev->phydev, cmd);
+	}
+	if (ret)
+		return ret;
+
+	hdev->hw.mac.req_autoneg = cmd->base.autoneg;
+	if (cmd->base.speed != SPEED_UNKNOWN)
+		hdev->hw.mac.req_speed = cmd->base.speed;
+	if (cmd->base.duplex != DUPLEX_UNKNOWN)
+		hdev->hw.mac.req_duplex = cmd->base.duplex;
 
 	return 0;
 }
@@ -9294,6 +9316,27 @@ static int hclge_set_wol(struct hnae3_handle *handle,
 	return ret;
 }
 
+static int hclge_set_autoneg_speed_dup(struct hclge_dev *hdev)
+{
+	int ret;
+
+	if (hdev->hw.mac.support_autoneg) {
+		ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.req_autoneg);
+		if (ret)
+			return ret;
+	}
+
+	if (!hdev->hw.mac.req_autoneg) {
+		ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
+						 hdev->hw.mac.req_duplex,
+						 hdev->hw.mac.req_lane_num);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 {
 	struct pci_dev *pdev = ae_dev->pdev;
@@ -9455,6 +9498,20 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	if (ret)
 		goto err_ptp_uninit;
 
+	if (hdev->hw.mac.media_type != HNAE3_MEDIA_TYPE_COPPER) {
+		hdev->hw.mac.req_autoneg = hdev->hw.mac.autoneg;
+		if (hdev->hw.mac.autoneg == AUTONEG_DISABLE &&
+		    hdev->hw.mac.speed != SPEED_UNKNOWN)
+			hdev->hw.mac.req_speed = hdev->hw.mac.speed;
+	}
+
+	ret = hclge_set_autoneg_speed_dup(hdev);
+	if (ret) {
+		dev_err(&pdev->dev,
+			"failed to set autoneg speed duplex, ret = %d\n", ret);
+		goto err_ptp_uninit;
+	}
+
 	INIT_KFIFO(hdev->mac_tnl_log);
 
 	hclge_dcb_ops_set(hdev);
@@ -9785,6 +9842,13 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
 		return ret;
 	}
 
+	ret = hclge_set_autoneg_speed_dup(hdev);
+	if (ret) {
+		dev_err(&pdev->dev,
+			"failed to set autoneg speed duplex, ret = %d\n", ret);
+		return ret;
+	}
+
 	ret = hclge_tp_port_init(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "failed to init tp port, ret = %d\n",
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 87adeb64e6ea..7419481422c3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -287,6 +287,7 @@ struct hclge_mac {
 	u8 support_autoneg;
 	u8 speed_type;	/* 0: sfp speed, 1: active speed */
 	u8 lane_num;
+	u8 req_lane_num;
 	u32 speed;
 	u32 req_speed;
 	u32 max_speed;
diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index ff67c4fd66a3..bfc8699a05b9 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -3216,6 +3216,8 @@ static int ehea_create_device_sysfs(struct platform_device *dev)
 		goto out;
 
 	ret = device_create_file(&dev->dev, &dev_attr_remove_port);
+	if (ret)
+		device_remove_file(&dev->dev, &dev_attr_probe_port);
 out:
 	return ret;
 }
diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index c0e1d85fce3f..1d46cf6c2c12 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -3044,6 +3044,12 @@ static int emac_probe(struct platform_device *ofdev)
 	if (err)
 		goto err_gone;
 
+	dev->emacp = devm_platform_ioremap_resource(ofdev, 0);
+	if (IS_ERR(dev->emacp)) {
+		err = PTR_ERR(dev->emacp);
+		goto err_gone;
+	}
+
 	/* Setup error IRQ handler */
 	dev->emac_irq = platform_get_irq(ofdev, 0);
 	if (dev->emac_irq < 0) {
@@ -3061,13 +3067,6 @@ static int emac_probe(struct platform_device *ofdev)
 
 	ndev->irq = dev->emac_irq;
 
-	dev->emacp = devm_platform_ioremap_resource(ofdev, 0);
-	if (IS_ERR(dev->emacp)) {
-		dev_err(&ofdev->dev, "can't map device registers");
-		err = PTR_ERR(dev->emacp);
-		goto err_gone;
-	}
-
 	/* Wait for dependent devices */
 	err = emac_wait_deps(dev);
 	if (err)
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index dea208db1be5..aa90e0ce8aca 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1594,6 +1594,9 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 		phy_reg &= ~I217_PLL_CLOCK_GATE_MASK;
 		if (speed == SPEED_100 || speed == SPEED_10)
 			phy_reg |= 0x3E8;
+		else if (hw->mac.type == e1000_pch_mtp ||
+			 hw->mac.type == e1000_pch_ptp)
+			phy_reg |= 0x1D5;
 		else
 			phy_reg |= 0xFA;
 		e1e_wphy_locked(hw, I217_PLL_CLOCK_GATE_REG, phy_reg);
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 808e5cddd6a9..844f31ab37ad 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -25,6 +25,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/prefetch.h>
 #include <linux/suspend.h>
+#include <linux/dmi.h>
 
 #include "e1000.h"
 #define CREATE_TRACE_POINTS
@@ -58,6 +59,17 @@ static const struct e1000_info *e1000_info_tbl[] = {
 	[board_pch_ptp] = &e1000_pch_ptp_info,
 };
 
+static const struct dmi_system_id disable_k1_list[] = {
+	{
+		.ident = "Dell Pro 16 Plus PB16250",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "Dell Pro 16 Plus PB16250"),
+		},
+	},
+	{}
+};
+
 struct e1000_reg_info {
 	u32 ofs;
 	char *name;
@@ -7670,7 +7682,8 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* init PTP hardware clock */
 	e1000e_ptp_init(adapter);
 
-	if (hw->mac.type >= e1000_pch_mtp)
+	/* disable K1 by default on known problematic systems */
+	if (hw->mac.type >= e1000_pch_mtp && dmi_check_system(disable_k1_list))
 		adapter->flags2 |= FLAG2_DISABLE_K1;
 
 	/* reset the hardware with the new settings */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debug.h b/drivers/net/ethernet/intel/i40e/i40e_debug.h
index e9871dfb32bd..01fd70db9086 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debug.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_debug.h
@@ -42,7 +42,7 @@ struct device *i40e_hw_to_dev(struct i40e_hw *hw);
 #define i40e_debug(h, m, s, ...)				\
 do {								\
 	if (((m) & (h)->debug_mask))				\
-		dev_info(i40e_hw_to_dev(hw), s, ##__VA_ARGS__);	\
+		dev_info(i40e_hw_to_dev(h), s, ##__VA_ARGS__);	\
 } while (0)
 
 #endif /* _I40E_DEBUG_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index a615d599b88e..e7cf12eaa268 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -1855,6 +1855,7 @@ static const struct ethtool_ops iavf_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE,
 	.supported_input_xfrm = RXH_XFRM_SYM_XOR,
+	.op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_GLINK,
 	.get_drvinfo = iavf_get_drvinfo,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = iavf_get_ringparam,
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 31e0de9e7f60..ef1ce106f81b 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -3882,7 +3882,6 @@ ice_set_fc(struct ice_port_info *pi, u8 *aq_failures, bool ena_auto_link_update)
 	if (!pi || !aq_failures)
 		return -EINVAL;
 
-	*aq_failures = 0;
 	hw = pi->hw;
 
 	pcaps = kzalloc_obj(*pcaps);
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index 462c69cc11e1..30c3a4db7d61 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -4645,9 +4645,13 @@ ice_dpll_init_pins_info(struct ice_pf *pf, enum ice_dpll_pin_type pin_type)
 static void ice_dpll_deinit_info(struct ice_pf *pf)
 {
 	kfree(pf->dplls.inputs);
+	pf->dplls.inputs = NULL;
 	kfree(pf->dplls.outputs);
+	pf->dplls.outputs = NULL;
 	kfree(pf->dplls.eec.input_prio);
+	pf->dplls.eec.input_prio = NULL;
 	kfree(pf->dplls.pps.input_prio);
+	pf->dplls.pps.input_prio = NULL;
 }
 
 /**
@@ -4748,12 +4752,16 @@ static int ice_dpll_init_info(struct ice_pf *pf, bool cgu)
 
 	alloc_size = sizeof(*de->input_prio) * d->num_inputs;
 	de->input_prio = kzalloc(alloc_size, GFP_KERNEL);
-	if (!de->input_prio)
-		return -ENOMEM;
+	if (!de->input_prio) {
+		ret = -ENOMEM;
+		goto deinit_info;
+	}
 
 	dp->input_prio = kzalloc(alloc_size, GFP_KERNEL);
-	if (!dp->input_prio)
-		return -ENOMEM;
+	if (!dp->input_prio) {
+		ret = -ENOMEM;
+		goto deinit_info;
+	}
 
 	ret = ice_dpll_init_pins_info(pf, ICE_DPLL_PIN_TYPE_INPUT);
 	if (ret)
@@ -4778,12 +4786,12 @@ static int ice_dpll_init_info(struct ice_pf *pf, bool cgu)
 	ret = ice_get_cgu_rclk_pin_info(&pf->hw, &d->base_rclk_idx,
 					&pf->dplls.rclk.num_parents);
 	if (ret)
-		return ret;
+		goto deinit_info;
 	for (i = 0; i < pf->dplls.rclk.num_parents; i++)
 		pf->dplls.rclk.parent_idx[i] = d->base_rclk_idx + i;
 	ret = ice_dpll_init_pins_info(pf, ICE_DPLL_PIN_TYPE_RCLK_INPUT);
 	if (ret)
-		return ret;
+		goto deinit_info;
 
 	de->mode = DPLL_MODE_AUTOMATIC;
 	dp->mode = DPLL_MODE_AUTOMATIC;
diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
index 2e4f0969035f..c30e27bbfe6e 100644
--- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
+++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
@@ -117,8 +117,6 @@ static int ice_eswitch_setup_repr(struct ice_pf *pf, struct ice_repr *repr)
 	if (!repr->dst)
 		return -ENOMEM;
 
-	netif_keep_dst(uplink_vsi->netdev);
-
 	dst = repr->dst;
 	dst->u.port_info.port_id = vsi->vsi_num;
 	dst->u.port_info.lower_dev = uplink_vsi->netdev;
@@ -312,6 +310,8 @@ static int ice_eswitch_enable_switchdev(struct ice_pf *pf)
 	if (ice_eswitch_br_offloads_init(pf))
 		goto err_br_offloads;
 
+	netif_keep_dst(uplink_vsi->netdev);
+
 	pf->eswitch.is_running = true;
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 236d293aba98..49371b065845 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3508,7 +3508,7 @@ ice_set_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause)
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_hw *hw = &pf->hw;
 	struct ice_port_info *pi;
-	u8 aq_failures;
+	u8 aq_failures = 0;
 	bool link_up;
 	u32 is_an;
 	int err;
@@ -3579,18 +3579,22 @@ ice_set_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause)
 	/* Set the FC mode and only restart AN if link is up */
 	err = ice_set_fc(pi, &aq_failures, link_up);
 
-	if (aq_failures & ICE_SET_FC_AQ_FAIL_GET) {
+	switch (aq_failures) {
+	case ICE_SET_FC_AQ_FAIL_GET:
 		netdev_info(netdev, "Set fc failed on the get_phy_capabilities call with err %d aq_err %s\n",
 			    err, libie_aq_str(hw->adminq.sq_last_status));
 		err = -EAGAIN;
-	} else if (aq_failures & ICE_SET_FC_AQ_FAIL_SET) {
+		break;
+	case ICE_SET_FC_AQ_FAIL_SET:
 		netdev_info(netdev, "Set fc failed on the set_phy_config call with err %d aq_err %s\n",
 			    err, libie_aq_str(hw->adminq.sq_last_status));
 		err = -EAGAIN;
-	} else if (aq_failures & ICE_SET_FC_AQ_FAIL_UPDATE) {
+		break;
+	case ICE_SET_FC_AQ_FAIL_UPDATE:
 		netdev_info(netdev, "Set fc failed on the get_link_info call with err %d aq_err %s\n",
 			    err, libie_aq_str(hw->adminq.sq_last_status));
 		err = -EAGAIN;
+		break;
 	}
 
 	return err;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e2fbe111f849..e2fd2dab03e3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4789,16 +4789,14 @@ static void ice_init_wakeup(struct ice_pf *pf)
 	device_set_wakeup_enable(ice_pf_to_dev(pf), false);
 }
 
-static int ice_init_link(struct ice_pf *pf)
+static void ice_init_link(struct ice_pf *pf)
 {
 	struct device *dev = ice_pf_to_dev(pf);
 	int err;
 
 	err = ice_init_link_events(pf->hw.port_info);
-	if (err) {
+	if (err)
 		dev_err(dev, "ice_init_link_events failed: %d\n", err);
-		return err;
-	}
 
 	/* not a fatal error if this fails */
 	err = ice_init_nvm_phy_type(pf->hw.port_info);
@@ -4838,8 +4836,6 @@ static int ice_init_link(struct ice_pf *pf)
 	} else {
 		set_bit(ICE_FLAG_NO_MEDIA, pf->flags);
 	}
-
-	return err;
 }
 
 static int ice_init_pf_sw(struct ice_pf *pf)
@@ -4982,13 +4978,11 @@ static int ice_init(struct ice_pf *pf)
 
 	ice_init_wakeup(pf);
 
-	err = ice_init_link(pf);
-	if (err)
-		goto err_init_link;
+	ice_init_link(pf);
 
 	err = ice_send_version(pf);
 	if (err)
-		goto err_init_link;
+		goto err_deinit_pf_sw;
 
 	ice_verify_cacheline_size(pf);
 
@@ -5007,7 +5001,7 @@ static int ice_init(struct ice_pf *pf)
 
 	return 0;
 
-err_init_link:
+err_deinit_pf_sw:
 	ice_deinit_pf_sw(pf);
 err_init_pf_sw:
 	ice_dealloc_vsis(pf);
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
index b1f46707dcc0..27e4acb1620f 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
@@ -801,7 +801,7 @@ void ice_reset_all_vfs(struct ice_pf *pf)
 		 * setup only when VF creates its first FDIR rule.
 		 */
 		if (vf->ctrl_vsi_idx != ICE_NO_VSI)
-			ice_vf_ctrl_invalidate_vsi(vf);
+			ice_vf_ctrl_vsi_release(vf);
 
 		ice_vf_pre_vsi_rebuild(vf);
 		if (ice_vf_rebuild_vsi(vf)) {
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0c061fb0ed07..744d6585a949 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -5900,6 +5900,9 @@ static int mvneta_resume(struct device *device)
 	rtnl_unlock();
 	mvneta_set_rx_mode(dev);
 
+	if (!pp->neta_armada3700)
+		on_each_cpu(mvneta_percpu_enable, pp, true);
+
 	return 0;
 }
 #endif
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index 354c4e881c6a..3070700b952b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -3423,6 +3423,36 @@ static int npc_create_srch_order(int cnt)
 	return 0;
 }
 
+static int npc_subbanks_srch_order_init(struct rvu *rvu)
+{
+	struct npc_subbank *sb;
+	int sb_idx;
+	int i, j;
+	int rc;
+
+	for (i = 0; i < npc_priv->num_subbanks; i++) {
+		sb_idx = subbank_srch_order[i];
+		sb = &npc_priv->sb[sb_idx];
+		sb->arr_idx = i;
+
+		dev_dbg(rvu->dev, "%s: sb->idx=%u sb->arr_idx=%u\n",
+			__func__, sb->idx, sb->arr_idx);
+
+		rc = xa_err(xa_store(&npc_priv->xa_sb_free, sb->arr_idx,
+				     xa_mk_value(sb->idx), GFP_KERNEL));
+		if (rc) {
+			dev_err(rvu->dev,
+				"%s: xa_store(xa_sb_free) failed at slot %d (sb=%d): %d\n",
+				__func__, i, sb_idx, rc);
+			for (j = 0; j < i; j++)
+				xa_erase(&npc_priv->xa_sb_free, j);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
 static void npc_subbank_init(struct rvu *rvu, struct npc_subbank *sb, int idx)
 {
 	mutex_init(&sb->lock);
@@ -3435,16 +3465,6 @@ static void npc_subbank_init(struct rvu *rvu, struct npc_subbank *sb, int idx)
 
 	sb->flags = NPC_SUBBANK_FLAG_FREE;
 	sb->idx = idx;
-	sb->arr_idx = subbank_srch_order[idx];
-
-	dev_dbg(rvu->dev, "%s: sb->idx=%u sb->arr_idx=%u\n",
-		__func__, sb->idx, sb->arr_idx);
-
-	/* Keep first and last subbank at end of free array; so that
-	 * it will be used at last
-	 */
-	xa_store(&npc_priv->xa_sb_free, sb->arr_idx,
-		 xa_mk_value(sb->idx), GFP_KERNEL);
 }
 
 static int npc_pcifunc_map_create(struct rvu *rvu)
@@ -3569,15 +3589,18 @@ static int npc_defrag_alloc_free_slots(struct rvu *rvu,
 		alloc_cnt2 = 0;
 
 		rc = __npc_subbank_alloc(rvu, sb,
-					 NPC_MCAM_KEY_X2, sb->b0b,
+					 f->key_type, sb->b0b,
 					 sb->b0t,
 					 NPC_MCAM_LOWER_PRIO,
 					 false, cnt, save, cnt, true,
 					 &alloc_cnt1);
 
-		if (alloc_cnt1 < cnt) {
+		/* X4 entries only occupy bank 0 (b0b..b0t); see npc_subbank_idx_2_mcam_idx().
+		 * X2 uses both halves of the subbank, so spill into bank 1 if needed.
+		 */
+		if (alloc_cnt1 < cnt && f->key_type == NPC_MCAM_KEY_X2) {
 			rc = __npc_subbank_alloc(rvu, sb,
-						 NPC_MCAM_KEY_X2, sb->b1b,
+						 f->key_type, sb->b1b,
 						 sb->b1t,
 						 NPC_MCAM_LOWER_PRIO,
 						 false, cnt - alloc_cnt1,
@@ -4635,6 +4658,7 @@ static int npc_priv_init(struct rvu *rvu)
 	int num_subbanks, subbank_depth;
 	u64 npc_const1, npc_const2 = 0;
 	struct npc_subbank *sb;
+	int ret = -ENOMEM;
 	u64 cfg;
 	int i;
 
@@ -4727,13 +4751,19 @@ static int npc_priv_init(struct rvu *rvu)
 	for (i = 0, sb = npc_priv->sb; i < num_subbanks; i++, sb++)
 		npc_subbank_init(rvu, sb, i);
 
+	ret = npc_subbanks_srch_order_init(rvu);
+	if (ret)
+		goto fail3;
+
 	/* Get number of pcifuncs in the system */
 	npc_priv->pf_cnt = npc_pcifunc_map_create(rvu);
 	npc_priv->xa_pf2idx_map = kcalloc(npc_priv->pf_cnt,
 					  sizeof(struct xarray),
 					  GFP_KERNEL);
-	if (!npc_priv->xa_pf2idx_map)
+	if (!npc_priv->xa_pf2idx_map) {
+		ret = -ENOMEM;
 		goto fail3;
+	}
 
 	for (i = 0; i < npc_priv->pf_cnt; i++)
 		xa_init_flags(&npc_priv->xa_pf2idx_map[i], XA_FLAGS_ALLOC);
@@ -4760,7 +4790,7 @@ fail2:
 fail1:
 	kfree(npc_priv);
 	npc_priv = NULL;
-	return -ENOMEM;
+	return ret;
 }
 
 void npc_cn20k_deinit(struct rvu *rvu)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mcs.c b/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
index c1775bd01c2b..a07e0b3d8d00 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mcs.c
@@ -120,13 +120,13 @@ void mcs_get_rx_secy_stats(struct mcs *mcs, struct mcs_secy_stats *stats, int id
 	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYUNTAGGEDX(id);
 	stats->pkt_untaged_cnt = mcs_reg_read(mcs, reg);
 
-	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYCTLX(id);
-	stats->pkt_ctl_cnt = mcs_reg_read(mcs, reg);
-
 	if (mcs->hw->mcs_blks > 1) {
 		reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYNOTAGX(id);
 		stats->pkt_notag_cnt = mcs_reg_read(mcs, reg);
+		return;
 	}
+	reg = MCSX_CSE_RX_MEM_SLAVE_INPKTSSECYCTLX(id);
+	stats->pkt_ctl_cnt = mcs_reg_read(mcs, reg);
 }
 
 void mcs_get_flowid_stats(struct mcs *mcs, struct mcs_flowid_stats *stats,
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cn10k.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cn10k.c
index d2163da28d18..fa4ea1258d29 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cn10k.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cn10k.c
@@ -178,6 +178,15 @@ int rvu_mbox_handler_lmtst_tbl_setup(struct rvu *rvu,
 	 * pcifunc (will be the one who is calling this mailbox).
 	 */
 	if (req->base_pcifunc) {
+		/* A VF is untrusted and must not redirect its LMTLINE to
+		 * another PF's region, so confine VF callers to their own PF.
+		 */
+		if (is_vf(req->hdr.pcifunc) &&
+		    (!is_pf_func_valid(rvu, req->base_pcifunc) ||
+		     rvu_get_pf(rvu->pdev, req->hdr.pcifunc) !=
+		     rvu_get_pf(rvu->pdev, req->base_pcifunc)))
+			return -EPERM;
+
 		/* Calculating the LMT table index equivalent to primary
 		 * pcifunc.
 		 */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
index fa461489acdd..3456313d3b3c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
@@ -482,10 +482,11 @@ static int rvu_dbg_mcs_rx_secy_stats_display(struct seq_file *filp, void *unused
 		seq_printf(filp, "secy%d: Tagged ctrl pkts: %lld\n", secy_id,
 			   stats.pkt_tagged_ctl_cnt);
 		seq_printf(filp, "secy%d: Untaged pkts: %lld\n", secy_id, stats.pkt_untaged_cnt);
-		seq_printf(filp, "secy%d: Ctrl pkts: %lld\n", secy_id, stats.pkt_ctl_cnt);
 		if (mcs->hw->mcs_blks > 1)
 			seq_printf(filp, "secy%d: pkts notag: %lld\n", secy_id,
 				   stats.pkt_notag_cnt);
+		else
+			seq_printf(filp, "secy%d: Ctrl pkts: %lld\n", secy_id, stats.pkt_ctl_cnt);
 	}
 	mutex_unlock(&mcs->stats_lock);
 	return 0;
@@ -2809,6 +2810,14 @@ static void rvu_dbg_npa_init(struct rvu *rvu)
 			    &rvu_dbg_npa_ndc_hits_miss_fops);
 }
 
+/* Per-lmac CGX debugfs files need both RVU and CGX handle; inode->i_private
+ * points here so seq_file ops avoid pci_get_device(PCI_DEVID_OCTEONTX2_RVU_AF).
+ */ +struct rvu_cgx_lmac_dbgfs_ctx { + struct rvu *rvu; + void *cgxd; +}; + #define PRINT_CGX_CUML_NIXRX_STATUS(idx, name) \ ({ \ u64 cnt; \ @@ -2831,18 +2840,14 @@ static void rvu_dbg_npa_init(struct rvu *rvu) static int cgx_print_stats(struct seq_file *s, int lmac_id) { + struct rvu_cgx_lmac_dbgfs_ctx *dctx = s->private; struct cgx_link_user_info linfo; + struct rvu *rvu = dctx->rvu; struct mac_ops *mac_ops; - void *cgxd = s->private; + void *cgxd = dctx->cgxd; u64 ucast, mcast, bcast; int stat = 0, err = 0; u64 tx_stat, rx_stat; - struct rvu *rvu; - - rvu = pci_get_drvdata(pci_get_device(PCI_VENDOR_ID_CAVIUM, - PCI_DEVID_OCTEONTX2_RVU_AF, NULL)); - if (!rvu) - return -ENODEV; mac_ops = get_mac_ops(cgxd); /* There can be no CGX devices at all */ @@ -2949,20 +2954,16 @@ RVU_DEBUG_SEQ_FOPS(cgx_stat, cgx_stat_display, NULL); static int cgx_print_dmac_flt(struct seq_file *s, int lmac_id) { + struct rvu_cgx_lmac_dbgfs_ctx *dctx = s->private; + struct rvu *rvu = dctx->rvu; struct pci_dev *pdev = NULL; - void *cgxd = s->private; + void *cgxd = dctx->cgxd; char *bcast, *mcast; u16 index, domain; u8 dmac[ETH_ALEN]; - struct rvu *rvu; u64 cfg, mac; int pf; - rvu = pci_get_drvdata(pci_get_device(PCI_VENDOR_ID_CAVIUM, - PCI_DEVID_OCTEONTX2_RVU_AF, NULL)); - if (!rvu) - return -ENODEV; - pf = cgxlmac_to_pf(rvu, cgx_get_cgxid(cgxd), lmac_id); domain = 2; @@ -3009,17 +3010,13 @@ RVU_DEBUG_SEQ_FOPS(cgx_dmac_flt, cgx_dmac_flt_display, NULL); static int cgx_print_fwdata(struct seq_file *s, int lmac_id) { + struct rvu_cgx_lmac_dbgfs_ctx *dctx = s->private; struct cgx_lmac_fwdata_s *fwdata; - void *cgxd = s->private; + struct rvu *rvu = dctx->rvu; + void *cgxd = dctx->cgxd; struct phy_s *phy; - struct rvu *rvu; int cgx_id, i; - rvu = pci_get_drvdata(pci_get_device(PCI_VENDOR_ID_CAVIUM, - PCI_DEVID_OCTEONTX2_RVU_AF, NULL)); - if (!rvu) - return -ENODEV; - if (!rvu->fwdata) return -EAGAIN; @@ -3100,6 +3097,7 @@ RVU_DEBUG_SEQ_FOPS(cgx_fwdata, cgx_fwdata_display, NULL); static void 
rvu_dbg_cgx_init(struct rvu *rvu) { + struct rvu_cgx_lmac_dbgfs_ctx *ctx; struct mac_ops *mac_ops; unsigned long lmac_bmap; int i, lmac_id; @@ -3126,6 +3124,13 @@ static void rvu_dbg_cgx_init(struct rvu *rvu) rvu->rvu_dbg.cgx = debugfs_create_dir(dname, rvu->rvu_dbg.cgx_root); + ctx = devm_kzalloc(rvu->dev, sizeof(*ctx), GFP_KERNEL); + if (!ctx) + continue; + + ctx->rvu = rvu; + ctx->cgxd = cgx; + for_each_set_bit(lmac_id, &lmac_bmap, rvu->hw->lmac_per_cgx) { /* lmac debugfs dir */ sprintf(dname, "lmac%d", lmac_id); @@ -3133,13 +3138,13 @@ static void rvu_dbg_cgx_init(struct rvu *rvu) debugfs_create_dir(dname, rvu->rvu_dbg.cgx); debugfs_create_file_aux_num("stats", 0600, rvu->rvu_dbg.lmac, - cgx, lmac_id, &rvu_dbg_cgx_stat_fops); + ctx, lmac_id, &rvu_dbg_cgx_stat_fops); debugfs_create_file_aux_num("mac_filter", 0600, - rvu->rvu_dbg.lmac, cgx, lmac_id, + rvu->rvu_dbg.lmac, ctx, lmac_id, &rvu_dbg_cgx_dmac_flt_fops); - debugfs_create_file("fwdata", 0600, - rvu->rvu_dbg.lmac, cgx, - &rvu_dbg_cgx_fwdata_fops); + debugfs_create_file_aux_num("fwdata", 0600, + rvu->rvu_dbg.lmac, ctx, + lmac_id, &rvu_dbg_cgx_fwdata_fops); } } } diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c index aa3ecab5ebd8..d63c3d33775a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c @@ -1511,7 +1511,9 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id, struct rvu_devlink *rvu_dl = devlink_priv(devlink); struct rvu *rvu = rvu_dl->rvu; u16 max_nix0_lf, max_nix1_lf; - struct npc_mcam *mcam; + struct rvu_block *block; + int blkaddr = 0; + int free_lfs; u64 cfg; cfg = rvu_read64(rvu, BLKADDR_NIX0, NIX_AF_CONST2); @@ -1519,14 +1521,23 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id, cfg = rvu_read64(rvu, BLKADDR_NIX1, NIX_AF_CONST2); max_nix1_lf = cfg & 0xFFF; - /* Do not allow user to modify 
maximum NIX LFs while mcam entries - * have already been assigned. + /* Do not allow user to modify maximum NIX LFs while NIX LFs + * have already been assigned. Note that modifying NIX LFs count + * can be done only before any LF attach requests from PFs and VFs + * and not later or concurrently. */ - mcam = &rvu->hw->mcam; - if (mcam->bmap_fcnt < mcam->bmap_entries) { - NL_SET_ERR_MSG_MOD(extack, - "mcam entries have already been assigned, can't resize"); - return -EPERM; + blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr); + while (blkaddr) { + block = &rvu->hw->block[blkaddr]; + + free_lfs = rvu_rsrc_free_count(&block->lf); + if (free_lfs != block->lf.max) { + NL_SET_ERR_MSG_MOD(extack, + "NIX LFs already assigned, can't resize"); + return -EPERM; + } + + blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr); } if (max_nix0_lf && val->vu16 > max_nix0_lf) { diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index d8989395e875..0297c7ab0614 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -528,19 +528,24 @@ static int nix_setup_bpids(struct rvu *rvu, struct nix_hw *hw, int blkaddr) bp->fn_map = devm_kcalloc(rvu->dev, bp->bpids.max, sizeof(u16), GFP_KERNEL); if (!bp->fn_map) - return -ENOMEM; + goto free_bpids; bp->intf_map = devm_kcalloc(rvu->dev, bp->bpids.max, sizeof(u8), GFP_KERNEL); if (!bp->intf_map) - return -ENOMEM; + goto free_bpids; bp->ref_cnt = devm_kcalloc(rvu->dev, bp->bpids.max, sizeof(u8), GFP_KERNEL); if (!bp->ref_cnt) - return -ENOMEM; + goto free_bpids; return 0; + +free_bpids: + rvu_free_bitmap(&bp->bpids); + bp->bpids.bmap = NULL; + return -ENOMEM; } void rvu_nix_flr_free_bpids(struct rvu *rvu, u16 pcifunc) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c index a22decbe3449..91b5947dae06 100644 --- 
a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c @@ -2225,7 +2225,7 @@ int npc_install_mcam_drop_rule(struct rvu *rvu, int mcam_idx, u16 *counter_idx, return err; } - dev_err(rvu->dev, + dev_dbg(rvu->dev, "%s: Installed single drop on non hit rule at %d, cntr=%d\n", __func__, mcam_idx, req.cntr); diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c index 2cc1bdfd9b2e..9524d38f1582 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c @@ -182,6 +182,7 @@ static void cn10k_mcs_free_rsrc(struct otx2_nic *pfvf, enum mcs_direction dir, clear_req->id = hw_rsrc_id; clear_req->type = type; clear_req->dir = dir; + clear_req->all = all; req = otx2_mbox_alloc_msg_mcs_free_resources(mbox); if (!req) @@ -1776,11 +1777,16 @@ fail: void cn10k_mcs_free(struct otx2_nic *pfvf) { + struct cn10k_mcs_cfg *cfg = pfvf->macsec_cfg; + if (!test_bit(CN10K_HW_MACSEC, &pfvf->hw.cap_flag)) return; - cn10k_mcs_free_rsrc(pfvf, MCS_TX, MCS_RSRC_TYPE_SECY, 0, true); - cn10k_mcs_free_rsrc(pfvf, MCS_RX, MCS_RSRC_TYPE_SECY, 0, true); + if (!list_empty(&cfg->txsc_list)) { + cn10k_mcs_free_rsrc(pfvf, MCS_TX, MCS_RSRC_TYPE_SECY, 0, true); + cn10k_mcs_free_rsrc(pfvf, MCS_RX, MCS_RSRC_TYPE_SECY, 0, true); + } + kfree(pfvf->macsec_cfg); pfvf->macsec_cfg = NULL; } diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index 41a0ebdf201e..b63df5737ff2 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -1575,6 +1575,7 @@ static void otx2_free_sq_res(struct otx2_nic *pf) qmem_free(pf->dev, sq->sqe_ring); qmem_free(pf->dev, sq->cpt_resp); qmem_free(pf->dev, sq->tso_hdrs); + qmem_free(pf->dev, sq->timestamps); kfree(sq->sg); kfree(sq->sqb_ptrs); } 
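Note on the nix_setup_bpids() hunk earlier in this diff: it replaces three early "return -ENOMEM" exits with a jump to a single free_bpids label that releases the bitmap and clears its pointer. A minimal userspace sketch of that unwind pattern follows. Every name in it (bp_state, alloc_zeroed, bp_setup, bp_teardown, fail_at) is hypothetical and not taken from the driver. It also assumes plain calloc()/free(), so the error path frees every array; the driver releases only the bitmap, presumably because its other arrays come from devm_kcalloc() and are released with the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-block bpid state. */
struct bp_state {
	unsigned long *bmap;	/* models bp->bpids.bmap */
	uint16_t *fn_map;
	uint8_t *intf_map;
	uint8_t *ref_cnt;
};

/* Fault injection: the fail_at-th allocation fails; 0 disables it. */
static int fail_at;
static int alloc_calls;

static void *alloc_zeroed(size_t n, size_t size)
{
	if (fail_at && ++alloc_calls == fail_at)
		return NULL;
	return calloc(n, size);
}

/* Free everything and reset the pointers so a repeated call is harmless. */
static void bp_teardown(struct bp_state *bp)
{
	assert(bp != NULL);
	free(bp->ref_cnt);
	free(bp->intf_map);
	free(bp->fn_map);
	free(bp->bmap);
	*bp = (struct bp_state){0};
}

/* Returns 0 on success, -ENOMEM on failure with nothing left allocated. */
static int bp_setup(struct bp_state *bp, size_t max)
{
	size_t words = max / (8 * sizeof(unsigned long)) + 1;

	*bp = (struct bp_state){0};

	bp->bmap = alloc_zeroed(words, sizeof(*bp->bmap));
	if (!bp->bmap)
		return -ENOMEM;

	bp->fn_map = alloc_zeroed(max, sizeof(*bp->fn_map));
	if (!bp->fn_map)
		goto free_all;

	bp->intf_map = alloc_zeroed(max, sizeof(*bp->intf_map));
	if (!bp->intf_map)
		goto free_all;

	bp->ref_cnt = alloc_zeroed(max, sizeof(*bp->ref_cnt));
	if (!bp->ref_cnt)
		goto free_all;

	return 0;

free_all:
	/* Single exit for every failure after the first allocation. */
	bp_teardown(bp);
	return -ENOMEM;
}
```

Setting fail_at to 2, 3 or 4 before calling bp_setup() exercises each of the three failure branches; in every case the caller gets -ENOMEM back and a bp_state whose pointers are all NULL, which is the property the free_bpids label restores for the bitmap in the hunk.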
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c index 41e19e9ad28d..a82e7a802985 100644 --- a/drivers/net/ethernet/marvell/prestera/prestera_main.c +++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c @@ -373,7 +373,7 @@ static int prestera_port_sfp_bind(struct prestera_port *port) struct device_node *ports, *node; struct fwnode_handle *fwnode; struct phylink *phy_link; - int err; + int err = 0; if (!sw->np) return 0; diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 7d771168b990..5d291e50a47b 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -4960,6 +4960,11 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) mac_ops = &rt5350_phylink_ops; + if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_2P5GPHY) && + id == MTK_GMAC2_ID) + __set_bit(PHY_INTERFACE_MODE_INTERNAL, + mac->phylink_config.supported_interfaces); + phylink = phylink_create(&mac->phylink_config, of_fwnode_handle(mac->of_node), phy_mode, mac_ops); @@ -4970,11 +4975,6 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) mac->phylink = phylink; - if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_2P5GPHY) && - id == MTK_GMAC2_ID) - __set_bit(PHY_INTERFACE_MODE_INTERNAL, - mac->phylink_config.supported_interfaces); - SET_NETDEV_DEV(eth->netdev[id], eth->dev); eth->netdev[id]->watchdog_timeo = 5 * HZ; eth->netdev[id]->netdev_ops = &mtk_netdev_ops; diff --git a/drivers/net/ethernet/mediatek/mtk_ppe.c b/drivers/net/ethernet/mediatek/mtk_ppe.c index 18279e2a7022..8451dc3fd00a 100644 --- a/drivers/net/ethernet/mediatek/mtk_ppe.c +++ b/drivers/net/ethernet/mediatek/mtk_ppe.c @@ -918,7 +918,7 @@ struct mtk_ppe *mtk_ppe_init(struct mtk_eth *eth, void __iomem *base, int index) mib = dmam_alloc_coherent(ppe->dev, MTK_PPE_ENTRIES * sizeof(*mib), 
&ppe->mib_phys, GFP_KERNEL); if (!mib) - return NULL; + goto err_free_l2_flows; ppe->mib_table = mib; @@ -926,7 +926,7 @@ struct mtk_ppe *mtk_ppe_init(struct mtk_eth *eth, void __iomem *base, int index) GFP_KERNEL); if (!acct) - return NULL; + goto err_free_l2_flows; ppe->acct_table = acct; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig index 3c3e84100d5a..925ee25d05b4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig @@ -143,7 +143,7 @@ config MLX5_CORE_IPOIB config MLX5_MACSEC bool "Connect-X support for MACSec offload" depends on MLX5_CORE_EN - depends on MACSEC + depends on MACSEC=y || MACSEC=MLX5_CORE default n help Build support for MACsec cryptography-offload acceleration in the NIC. diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 2f5b626ba33f..112926d07634 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -2721,7 +2721,8 @@ const struct ethtool_ops mlx5e_ethtool_ops = { .rxfh_max_num_contexts = MLX5E_MAX_NUM_RSS, .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS | ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM | - ETHTOOL_OP_NEEDS_RTNL_SPFLAGS, + ETHTOOL_OP_NEEDS_RTNL_SPFLAGS | + ETHTOOL_OP_NEEDS_RTNL_GLINK, .supported_coalesce_params = ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_MAX_FRAMES | ETHTOOL_COALESCE_USE_ADAPTIVE | diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 1a8a19f980d3..c8b76d301c92 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -419,7 +419,8 @@ static const struct ethtool_ops mlx5e_rep_ethtool_ops = { ETHTOOL_COALESCE_MAX_FRAMES | ETHTOOL_COALESCE_USE_ADAPTIVE, .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS | - 
ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM, + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM | + ETHTOOL_OP_NEEDS_RTNL_GLINK, .get_drvinfo = mlx5e_rep_get_drvinfo, .get_link = ethtool_op_get_link, .get_strings = mlx5e_rep_get_strings, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c index 9b3b32408c64..01ddc3def9ac 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c @@ -286,7 +286,8 @@ const struct ethtool_ops mlx5i_ethtool_ops = { ETHTOOL_COALESCE_MAX_FRAMES | ETHTOOL_COALESCE_USE_ADAPTIVE, .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS | - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM, + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM | + ETHTOOL_OP_NEEDS_RTNL_GLINK, .get_drvinfo = mlx5i_get_drvinfo, .get_strings = mlx5i_get_strings, .get_sset_count = mlx5i_get_sset_count, @@ -309,6 +310,7 @@ const struct ethtool_ops mlx5i_ethtool_ops = { }; const struct ethtool_ops mlx5i_pkey_ethtool_ops = { + .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_GLINK, .get_drvinfo = mlx5i_get_drvinfo, .get_link = ethtool_op_get_link, .get_ts_info = mlx5i_get_ts_info, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index 51637e58a48b..09e669f83dba 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -297,7 +297,6 @@ void mlx5_core_reps_aux_devs_remove(struct mlx5_core_dev *dev); void mlx5_fw_reporters_create(struct mlx5_core_dev *dev); int mlx5_query_mtpps(struct mlx5_core_dev *dev, u32 *mtpps, u32 mtpps_size); int mlx5_set_mtpps(struct mlx5_core_dev *mdev, u32 *mtpps, u32 mtpps_size); -int mlx5_query_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 *arm, u8 *mode); int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 arm, u8 mode); struct mlx5_dm *mlx5_dm_create(struct mlx5_core_dev *dev); diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index ee8b9765c5ba..ddbe9ca8971d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -908,25 +908,6 @@ int mlx5_set_mtpps(struct mlx5_core_dev *mdev, u32 *mtpps, u32 mtpps_size) sizeof(out), MLX5_REG_MTPPS, 0, 1); } -int mlx5_query_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 *arm, u8 *mode) -{ - u32 out[MLX5_ST_SZ_DW(mtppse_reg)] = {0}; - u32 in[MLX5_ST_SZ_DW(mtppse_reg)] = {0}; - int err = 0; - - MLX5_SET(mtppse_reg, in, pin, pin); - - err = mlx5_core_access_reg(mdev, in, sizeof(in), out, - sizeof(out), MLX5_REG_MTPPSE, 0, 0); - if (err) - return err; - - *arm = MLX5_GET(mtppse_reg, in, event_arm); - *mode = MLX5_GET(mtppse_reg, in, event_generation_mode); - - return err; -} - int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 arm, u8 mode) { u32 out[MLX5_ST_SZ_DW(mtppse_reg)] = {0}; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c index cb34fc166ef9..0e47088ec44b 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c @@ -2024,7 +2024,8 @@ static const struct ethtool_ops fbnic_ethtool_ops = { ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM | ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM | ETHTOOL_OP_NEEDS_RTNL_SCHANNELS | - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM, + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM | + ETHTOOL_OP_NEEDS_RTNL_GLINK, .get_drvinfo = fbnic_get_drvinfo, .get_regs_len = fbnic_get_regs_len, .get_regs = fbnic_get_regs, diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_fw.c b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c index 0c6812fcf185..283d25fae79e 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_fw.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_fw.c @@ -526,15 +526,10 @@ int fbnic_fw_xmit_ownership_msg(struct fbnic_dev *fbd, bool take_ownership) goto free_message; } - err = 
fbnic_mbx_map_tlv_msg(fbd, msg); - if (err) - goto free_message; - /* Initialize heartbeat, set last response to 1 second in the past * so that we will trigger a timeout if the firmware doesn't respond */ fbd->last_heartbeat_response = req_time - HZ; - fbd->last_heartbeat_request = req_time; /* Set prev_firmware_time to 0 to avoid triggering firmware crash @@ -542,6 +537,10 @@ int fbnic_fw_xmit_ownership_msg(struct fbnic_dev *fbd, bool take_ownership) */ fbd->prev_firmware_time = 0; + err = fbnic_mbx_map_tlv_msg(fbd, msg); + if (err) + goto free_message; + /* Set heartbeat detection based on if we are taking ownership */ fbd->fw_heartbeat_enabled = take_ownership; diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c index dd77ab6052c8..10bf99be3f24 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c @@ -264,8 +264,11 @@ static int fbnic_set_mac(struct net_device *netdev, void *p) eth_hw_addr_set(netdev, addr->sa_data); - if (netif_running(netdev)) + if (netif_running(netdev)) { + netif_addr_lock_bh(netdev); __fbnic_set_rx_mode(fbn->fbd, &netdev->uc, &netdev->mc); + netif_addr_unlock_bh(netdev); + } return 0; } @@ -310,8 +313,10 @@ void fbnic_clear_rx_mode(struct fbnic_dev *fbd) /* Write updates to hardware */ fbnic_write_macda(fbd); + netif_addr_lock_bh(netdev); __dev_uc_unsync(netdev, NULL); __dev_mc_unsync(netdev, NULL); + netif_addr_unlock_bh(netdev); } static int fbnic_hwtstamp_get(struct net_device *netdev, diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c index 7e85b480203c..8b9bc9e8ea56 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c @@ -135,7 +135,9 @@ void fbnic_up(struct fbnic_net *fbn) fbnic_rss_reinit_hw(fbn->fbd, fbn); + netif_addr_lock_bh(fbn->netdev); __fbnic_set_rx_mode(fbn->fbd, &fbn->netdev->uc, &fbn->netdev->mc); + 
netif_addr_unlock_bh(fbn->netdev); /* Enable Tx/Rx processing */ fbnic_napi_enable(fbn); @@ -180,7 +182,9 @@ static int fbnic_fw_config_after_crash(struct fbnic_dev *fbd) } fbnic_rpc_reset_valid_entries(fbd); + netif_addr_lock_bh(fbd->netdev); __fbnic_set_rx_mode(fbd, &fbd->netdev->uc, &fbd->netdev->mc); + netif_addr_unlock_bh(fbd->netdev); return 0; } diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c index fe95b6f69646..bc0f38b6a2b2 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c @@ -244,7 +244,9 @@ void fbnic_bmc_rpc_check(struct fbnic_dev *fbd) if (fbd->fw_cap.need_bmc_tcam_reinit) { fbnic_bmc_rpc_init(fbd); + netif_addr_lock_bh(fbd->netdev); __fbnic_set_rx_mode(fbd, &fbd->netdev->uc, &fbd->netdev->mc); + netif_addr_unlock_bh(fbd->netdev); fbd->fw_cap.need_bmc_tcam_reinit = false; } diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c b/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c index 644458108dd2..dac4dd833127 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c @@ -765,11 +765,13 @@ int sparx5_register_notifier_blocks(struct sparx5 *s5) sparx5_owq = alloc_ordered_workqueue("sparx5_order", 0); if (!sparx5_owq) { err = -ENOMEM; - goto err_switchdev_blocking_nb; + goto err_alloc_workqueue; } return 0; +err_alloc_workqueue: + unregister_switchdev_blocking_notifier(&s5->switchdev_blocking_nb); err_switchdev_blocking_nb: unregister_switchdev_notifier(&s5->switchdev_nb); err_switchdev_nb: diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c index a0fdd052d7f1..e8b7ffb47eb9 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -210,6 +210,8 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev) } else { /* If dynamic 
allocation is enabled we have already allocated * hwc msi + * Also, we make sure in this case the following is always true + * (num_msix_usable - 1 HWC) <= num_online_cpus() */ gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1); } @@ -1909,8 +1911,8 @@ void mana_gd_free_res_map(struct gdma_resource *r) * do the same thing. */ -static int irq_setup(unsigned int *irqs, unsigned int len, int node, - bool skip_first_cpu) +static int mana_irq_setup_numa_aware(unsigned int *irqs, unsigned int len, + int node, bool skip_first_cpu) { const struct cpumask *next, *prev = cpu_none_mask; cpumask_var_t cpus __free(free_cpumask_var); @@ -1946,11 +1948,24 @@ done: return 0; } +/* must be called with cpus_read_lock() held */ +static void mana_irq_setup_linear(unsigned int *irqs, unsigned int len) +{ + int cpu; + + for_each_online_cpu(cpu) { + if (len == 0) + break; + + irq_set_affinity_and_hint(*irqs++, cpumask_of(cpu)); + len--; + } +} + static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) { struct gdma_context *gc = pci_get_drvdata(pdev); struct gdma_irq_context *gic; - bool skip_first_cpu = false; int *irqs, err, i, msi; irqs = kmalloc_objs(int, nvec); @@ -1958,10 +1973,12 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) return -ENOMEM; /* + * In this function, num_msix_usable = HWC IRQ + Queue IRQ. + * nvec is only Queue IRQ (HWC already setup). * While processing the next pci irq vector, we start with index 1, * as IRQ vector at index 0 is already processed for HWC. 
* However, the population of irqs array starts with index 0, to be - * further used in irq_setup() + * further used in mana_irq_setup_numa_aware() */ for (i = 1; i <= nvec; i++) { msi = i; @@ -1975,18 +1992,51 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) } /* - * When calling irq_setup() for dynamically added IRQs, if number of - * CPUs is more than or equal to allocated MSI-X, we need to skip the - * first CPU sibling group since they are already affinitized to HWC IRQ + * When calling mana_irq_setup_numa_aware() for dynamically added IRQs, + * if number of CPUs is more than or equal to allocated MSI-X, we need to + * skip the first CPU sibling group since they are already affinitized to + * HWC IRQ */ cpus_read_lock(); - if (gc->num_msix_usable <= num_online_cpus()) - skip_first_cpu = true; + if (gc->num_msix_usable <= num_online_cpus()) { + err = mana_irq_setup_numa_aware(irqs, nvec, gc->numa_node, + true); + if (err) { + cpus_read_unlock(); + goto free_irq; + } + } else { + /* + * When num_msix_usable are more than num_online_cpus, our + * queue IRQs should be equal to num of online vCPUs. + * We try to make sure queue IRQs spread across all vCPUs. + * In such a case NUMA or CPU core affinity does not matter. + * Note: in this case the total mana IRQ should always be + * num_online_cpus + 1. The first HWC IRQ is already handled + * in HWC setup calls + * However, if CPUs went offline since num_msix_usable was + * computed, queue IRQs will be more than num_online_cpus(). + * In such cases remaining extra IRQs will retain their default + * affinity. 
+ */ + int first_unassigned = num_online_cpus(); - err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu); - if (err) { - cpus_read_unlock(); - goto free_irq; + if (nvec > first_unassigned) { + char buf[32]; + + if (first_unassigned == nvec - 1) + snprintf(buf, sizeof(buf), "%d", + first_unassigned); + else + snprintf(buf, sizeof(buf), "%d-%d", + first_unassigned, nvec - 1); + + dev_dbg(&pdev->dev, + "MANA IRQ indices #%s will retain the default CPU affinity\n", + buf); + } + + mana_irq_setup_linear(irqs, nvec); } cpus_read_unlock(); @@ -2041,7 +2091,7 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec) nvec -= 1; } - err = irq_setup(irqs, nvec, gc->numa_node, false); + err = mana_irq_setup_numa_aware(irqs, nvec, gc->numa_node, false); if (err) { cpus_read_unlock(); goto free_irq; diff --git a/drivers/net/ethernet/microsoft/mana/mana_bpf.c b/drivers/net/ethernet/microsoft/mana/mana_bpf.c index b5e9bb184a1d..53308e139cbe 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_bpf.c +++ b/drivers/net/ethernet/microsoft/mana/mana_bpf.c @@ -237,7 +237,8 @@ static int mana_xdp_set(struct net_device *ndev, struct bpf_prog *prog, bpf_prog_put(old_prog); if (prog) - ndev->max_mtu = MANA_XDP_MTU_MAX; + ndev->max_mtu = min_t(unsigned int, MANA_XDP_MTU_MAX, + gc->adapter_mtu - ETH_HLEN); else ndev->max_mtu = gc->adapter_mtu - ETH_HLEN; diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index 87862b0434c7..7438ea6b3f26 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1233,12 +1233,24 @@ int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver, *max_num_vports = resp.max_num_vports; if (resp.hdr.response.msg_version >= GDMA_MESSAGE_V2) { - if (resp.adapter_mtu < ETH_MIN_MTU + ETH_HLEN) { + if (resp.adapter_mtu == 0) { + /* + * Some older PF firmware versions report an + * adapter_mtu of 0. 
MANA hardware always supports the + * standard Ethernet MTU, so fall back to ETH_FRAME_LEN. + * Jumbo frames will not be available in this case. + */ + dev_info(dev, + "PF reported adapter_mtu of 0, falling back to %u (jumbo frames disabled)\n", + ETH_FRAME_LEN); + gc->adapter_mtu = ETH_FRAME_LEN; + } else if (resp.adapter_mtu < ETH_MIN_MTU + ETH_HLEN) { dev_err(dev, "Adapter MTU too small: %u\n", resp.adapter_mtu); return -EPROTO; + } else { + gc->adapter_mtu = resp.adapter_mtu; } - gc->adapter_mtu = resp.adapter_mtu; } else { gc->adapter_mtu = ETH_FRAME_LEN; } diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c index 94e658d07a27..881df597d7f9 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c @@ -597,7 +597,8 @@ static int mana_get_link_ksettings(struct net_device *ndev, const struct ethtool_ops mana_ethtool_ops = { .supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES, .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS | - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM, + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM | + ETHTOOL_OP_NEEDS_RTNL_GLINK, .get_ethtool_stats = mana_get_ethtool_stats, .get_sset_count = mana_get_sset_count, .get_strings = mana_get_strings, diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index 62f05f4569b1..48b94ce77490 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -1420,13 +1420,25 @@ pch_gbe_alloc_rx_buffers_pool(struct pch_gbe_adapter *adapter, return 0; } +static void pch_gbe_free_rx_buffers_pool(struct pch_gbe_adapter *adapter, + struct pch_gbe_rx_ring *rx_ring) +{ + dma_free_coherent(&adapter->pdev->dev, rx_ring->rx_buff_pool_size, + rx_ring->rx_buff_pool, rx_ring->rx_buff_pool_logic); + rx_ring->rx_buff_pool_logic = 0; + rx_ring->rx_buff_pool_size = 0; + 
rx_ring->rx_buff_pool = NULL; +} + /** * pch_gbe_alloc_tx_buffers - Allocate transmit buffers * @adapter: Board private structure * @tx_ring: Tx descriptor ring + * + * Return: 0 on success, -ENOMEM if a TX skb allocation fails. */ -static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter, - struct pch_gbe_tx_ring *tx_ring) +static int pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter, + struct pch_gbe_tx_ring *tx_ring) { struct pch_gbe_buffer *buffer_info; struct sk_buff *skb; @@ -1440,12 +1452,17 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter, for (i = 0; i < tx_ring->count; i++) { buffer_info = &tx_ring->buffer_info[i]; skb = netdev_alloc_skb(adapter->netdev, bufsz); + if (!skb) { + pch_gbe_clean_tx_ring(adapter, tx_ring); + return -ENOMEM; + } skb_reserve(skb, PCH_GBE_DMA_ALIGN); buffer_info->skb = skb; tx_desc = PCH_GBE_TX_DESC(*tx_ring, i); tx_desc->gbec_status = (DSC_INIT16); } - return; + + return 0; } /** @@ -1887,7 +1904,12 @@ int pch_gbe_up(struct pch_gbe_adapter *adapter) "Error: can't bring device up - alloc rx buffers pool failed\n"); goto freeirq; } - pch_gbe_alloc_tx_buffers(adapter, tx_ring); + err = pch_gbe_alloc_tx_buffers(adapter, tx_ring); + if (err) { + netdev_err(netdev, + "Error: can't bring device up - alloc tx buffers failed\n"); + goto freebuf; + } pch_gbe_alloc_rx_buffers(adapter, rx_ring, rx_ring->count); adapter->tx_queue_len = netdev->tx_queue_len; pch_gbe_enable_dma_rx(&adapter->hw); @@ -1901,6 +1923,8 @@ int pch_gbe_up(struct pch_gbe_adapter *adapter) return 0; +freebuf: + pch_gbe_free_rx_buffers_pool(adapter, rx_ring); freeirq: pch_gbe_free_irq(adapter); out: @@ -1936,11 +1960,7 @@ void pch_gbe_down(struct pch_gbe_adapter *adapter) pch_gbe_clean_tx_ring(adapter, adapter->tx_ring); pch_gbe_clean_rx_ring(adapter, adapter->rx_ring); - dma_free_coherent(&adapter->pdev->dev, rx_ring->rx_buff_pool_size, - rx_ring->rx_buff_pool, rx_ring->rx_buff_pool_logic); - rx_ring->rx_buff_pool_logic = 0; - 
rx_ring->rx_buff_pool_size = 0; - rx_ring->rx_buff_pool = NULL; + pch_gbe_free_rx_buffers_pool(adapter, rx_ring); } /** diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c index 66a8ae67c3ea..15d19a8a1710 100644 --- a/drivers/net/ethernet/rocker/rocker_ofdpa.c +++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c @@ -1924,6 +1924,9 @@ static int ofdpa_port_fdb(struct ofdpa_port *ofdpa_port, flags |= OFDPA_OP_FLAG_REFRESH; } + if (found && removing) + kfree(found); + return ofdpa_port_fdb_learn(ofdpa_port, flags, addr, vlan_id); } diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c index 223754cc5c79..322bdf167a4a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-spacemit.c @@ -18,10 +18,12 @@ #include "stmmac_platform.h" /* ctrl register bits */ -#define CTRL_PHY_INTF_RGMII BIT(3) -#define CTRL_PHY_INTF_MII BIT(4) -#define CTRL_WAKE_IRQ_EN BIT(9) -#define CTRL_PHY_IRQ_EN BIT(12) +#define CTRL_PHY_INTF_MODE GENMASK(4, 3) +#define CTRL_PHY_INTF_RMII FIELD_PREP(CTRL_PHY_INTF_MODE, 0) +#define CTRL_PHY_INTF_RGMII FIELD_PREP(CTRL_PHY_INTF_MODE, 1) +#define CTRL_PHY_INTF_MII FIELD_PREP(CTRL_PHY_INTF_MODE, 3) +#define CTRL_LPI_IRQ_EN BIT(9) +#define CTRL_WAKE_IRQ_EN BIT(12) /* dline register bits */ #define RGMII_RX_DLINE_EN BIT(0) @@ -118,7 +120,7 @@ static void spacemit_get_interfaces(struct stmmac_priv *priv, void *bsp_priv, static int spacemit_set_phy_intf_sel(void *bsp_priv, u8 phy_intf_sel) { - unsigned int mask = CTRL_PHY_INTF_MII | CTRL_PHY_INTF_RGMII; + unsigned int mask = CTRL_PHY_INTF_MODE; struct spacmit_dwmac *dwmac = bsp_priv; unsigned int val = 0; @@ -128,6 +130,7 @@ static int spacemit_set_phy_intf_sel(void *bsp_priv, u8 phy_intf_sel) break; case PHY_INTF_SEL_RMII: + val = CTRL_PHY_INTF_RMII; break; case PHY_INTF_SEL_RGMII: diff --git a/drivers/net/ethernet/sun/sungem.c 
b/drivers/net/ethernet/sun/sungem.c index b56a0d4cdb12..dc5638d105db 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -2978,10 +2978,10 @@ static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) dev->max_mtu = GEM_MAX_MTU; /* Register with kernel */ - if (register_netdev(dev)) { + err = register_netdev(dev); + if (err) { pr_err("Cannot register net device, aborting\n"); - err = -ENOMEM; - goto err_out_free_consistent; + goto err_out_clear_drvdata; } /* Undo the get_cell with appropriate locking (we could use @@ -2995,8 +2995,13 @@ static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) dev->dev_addr); return 0; +err_out_clear_drvdata: + pci_set_drvdata(pdev, NULL); + netif_napi_del(&gp->napi); + err_out_free_consistent: - gem_remove_one(pdev); + dma_free_coherent(&pdev->dev, sizeof(struct gem_init_block), + gp->init_block, gp->gblock_dvma); err_out_iounmap: gem_put_cell(gp); iounmap(gp->regs); diff --git a/drivers/net/ethernet/sunplus/spl2sw_phy.c b/drivers/net/ethernet/sunplus/spl2sw_phy.c index 6f899e48f51d..a4889c52e00e 100644 --- a/drivers/net/ethernet/sunplus/spl2sw_phy.c +++ b/drivers/net/ethernet/sunplus/spl2sw_phy.c @@ -79,12 +79,14 @@ int spl2sw_phy_connect(struct spl2sw_common *comm) void spl2sw_phy_remove(struct spl2sw_common *comm) { struct net_device *ndev; + struct spl2sw_mac *mac; int i; for (i = 0; i < MAX_NETDEV_NUM; i++) if (comm->ndev[i]) { ndev = comm->ndev[i]; - if (ndev) - phy_disconnect(ndev->phydev); + mac = netdev_priv(ndev); + phy_disconnect(ndev->phydev); + of_node_put(mac->phy_node); } } diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c index 82ddef9c17d5..4a7d1a6f470b 100644 --- a/drivers/net/ethernet/ti/icssg/icssg_common.c +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c @@ -93,8 +93,8 @@ void prueth_ndev_del_tx_napi(struct prueth_emac *emac, int num) } 
 EXPORT_SYMBOL_GPL(prueth_ndev_del_tx_napi);

-static int emac_xsk_xmit_zc(struct prueth_emac *emac,
-			    unsigned int q_idx)
+static void emac_xsk_xmit_zc(struct prueth_emac *emac,
+			     unsigned int q_idx)
 {
 	struct prueth_tx_chn *tx_chn = &emac->tx_chns[q_idx];
 	struct xsk_buff_pool *pool = tx_chn->xsk_pool;
@@ -115,7 +115,7 @@ static int emac_xsk_xmit_zc(struct prueth_emac *emac,
 	 * necessary
 	 */
 	if (descs_avail <= MAX_SKB_FRAGS)
-		return 0;
+		return;

 	descs_avail -= MAX_SKB_FRAGS;

@@ -170,8 +170,8 @@ static int emac_xsk_xmit_zc(struct prueth_emac *emac,
 		num_tx++;
 	}

-	xsk_tx_release(tx_chn->xsk_pool);
-	return num_tx;
+	if (num_tx)
+		xsk_tx_release(tx_chn->xsk_pool);
 }

 void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
@@ -279,9 +279,6 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
 		num_tx++;
 	}

-	if (!num_tx)
-		return 0;
-
 	netif_txq = netdev_get_tx_queue(ndev, chn);
 	netdev_tx_completed_queue(netif_txq, num_tx, total_bytes);

@@ -297,16 +294,18 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
 		__netif_tx_unlock(netif_txq);
 	}

-	if (tx_chn->xsk_pool) {
-		if (xsk_frames_done)
+	if (budget && tx_chn->xsk_pool) {
+		if (xsk_frames_done) {
 			xsk_tx_completed(tx_chn->xsk_pool, xsk_frames_done);
+			txq_trans_cond_update(netif_txq);
+		}

 		if (xsk_uses_need_wakeup(tx_chn->xsk_pool))
 			xsk_set_tx_need_wakeup(tx_chn->xsk_pool);

-		netif_txq = netdev_get_tx_queue(ndev, chn);
-		txq_trans_cond_update(netif_txq);
+		__netif_tx_lock(netif_txq, smp_processor_id());
 		emac_xsk_xmit_zc(emac, chn);
+		__netif_tx_unlock(netif_txq);
 	}

 	return num_tx;
@@ -1652,28 +1651,35 @@ void icssg_ndo_get_stats64(struct net_device *ndev,
 	stats->rx_over_errors = emac_get_stat_by_name(emac, "rx_over_errors");
 	stats->multicast = emac_get_stat_by_name(emac, "rx_multicast_frames");

-	stats->rx_errors = ndev->stats.rx_errors +
-			   emac_get_stat_by_name(emac, "FW_RX_ERROR") +
-			   emac_get_stat_by_name(emac, "FW_RX_EOF_SHORT_FRMERR") +
-			   emac_get_stat_by_name(emac, "FW_RX_B0_DROP_EARLY_EOF") +
-			   emac_get_stat_by_name(emac, "FW_RX_EXP_FRAG_Q_DROP") +
-			   emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN");
-	stats->rx_dropped = ndev->stats.rx_dropped +
-			    emac_get_stat_by_name(emac, "FW_DROPPED_PKT") +
-			    emac_get_stat_by_name(emac, "FW_INF_PORT_DISABLED") +
-			    emac_get_stat_by_name(emac, "FW_INF_SAV") +
-			    emac_get_stat_by_name(emac, "FW_INF_SA_DL") +
-			    emac_get_stat_by_name(emac, "FW_INF_PORT_BLOCKED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
-			    emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+	stats->rx_errors = ndev->stats.rx_errors;
+	stats->rx_dropped = ndev->stats.rx_dropped;
 	stats->tx_errors = ndev->stats.tx_errors;
-	stats->tx_dropped = ndev->stats.tx_dropped +
-			    emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
-			    emac_get_stat_by_name(emac, "FW_TX_DROPPED_PACKET") +
-			    emac_get_stat_by_name(emac, "FW_TX_TS_DROPPED_PACKET") +
-			    emac_get_stat_by_name(emac, "FW_TX_JUMBO_FRM_CUTOFF");
+	stats->tx_dropped = ndev->stats.tx_dropped;
+
+	if (!emac->prueth->pa_stats)
+		return;
+
+	stats->rx_errors +=
+		emac_get_stat_by_name(emac, "FW_RX_ERROR") +
+		emac_get_stat_by_name(emac, "FW_RX_EOF_SHORT_FRMERR") +
+		emac_get_stat_by_name(emac, "FW_RX_B0_DROP_EARLY_EOF") +
+		emac_get_stat_by_name(emac, "FW_RX_EXP_FRAG_Q_DROP") +
+		emac_get_stat_by_name(emac, "FW_RX_FIFO_OVERRUN");
+	stats->rx_dropped +=
+		emac_get_stat_by_name(emac, "FW_DROPPED_PKT") +
+		emac_get_stat_by_name(emac, "FW_INF_PORT_DISABLED") +
+		emac_get_stat_by_name(emac, "FW_INF_SAV") +
+		emac_get_stat_by_name(emac, "FW_INF_SA_DL") +
+		emac_get_stat_by_name(emac, "FW_INF_PORT_BLOCKED") +
+		emac_get_stat_by_name(emac, "FW_INF_DROP_TAGGED") +
+		emac_get_stat_by_name(emac, "FW_INF_DROP_PRIOTAGGED") +
+		emac_get_stat_by_name(emac, "FW_INF_DROP_NOTAG") +
+		emac_get_stat_by_name(emac, "FW_INF_DROP_NOTMEMBER");
+	stats->tx_dropped +=
+		emac_get_stat_by_name(emac, "FW_RTU_PKT_DROP") +
+		emac_get_stat_by_name(emac, "FW_TX_DROPPED_PACKET") +
+		emac_get_stat_by_name(emac, "FW_TX_TS_DROPPED_PACKET") +
+		emac_get_stat_by_name(emac, "FW_TX_JUMBO_FRM_CUTOFF");
 }
 EXPORT_SYMBOL_GPL(icssg_ndo_get_stats64);

diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
index 8678c49b892a..a16221995909 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
@@ -715,7 +715,6 @@ static int ngbe_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_GRO;

 	netdev->priv_flags |= IFF_UNICAST_FLT;
-	netdev->priv_flags |= IFF_SUPP_NOFCS;
 	netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE;

 	netdev->min_mtu = ETH_MIN_MTU;
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
index ce82e13aa8ae..20c5a295c6c2 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
@@ -796,7 +796,6 @@ static int txgbe_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_RX_UDP_TUNNEL_PORT;

 	netdev->priv_flags |= IFF_UNICAST_FLT;
-	netdev->priv_flags |= IFF_SUPP_NOFCS;
 	netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE;

 	netdev->min_mtu = ETH_MIN_MTU;
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 9afff7bcaa0b..396e1a113cd4 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -954,13 +954,27 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
 	struct genevehdr *gh;
 	struct packet_offload *ptype;
 	__be16 type;
-	int gh_len;
+	unsigned int gh_len;
 	int err = -ENOSYS;

 	gh = (struct genevehdr *)(skb->data + nhoff);
 	gh_len = geneve_hlen(gh);
 	type = gh->proto_type;
-	geneve_opt_gro_hint_off(gh, &type, &gh_len);
+	geneve_sk_gro_hint_off(sk, gh, &type, &gh_len);
+
+	/* Bail out if we are about to dispatch past the inner network header
+	 * gro_receive() validated. An inner VLAN tag only pushes
+	 * inner_network_offset out, so use a lower bound.
+	 */
+	if (skb->encapsulation) {
+		unsigned int inner_nh = nhoff + gh_len;
+
+		if (type == htons(ETH_P_TEB))
+			inner_nh += ETH_HLEN;
+
+		if (unlikely(inner_nh > NAPI_GRO_CB(skb)->inner_network_offset))
+			return -EINVAL;
+	}

 	/* since skb->encapsulation is set, eth_gro_complete() sets the inner mac header */
 	if (likely(type == htons(ETH_P_TEB)))
diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index ed4178155a5d..01af4f9cf7f2 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -595,7 +595,7 @@ static int ca8210_test_int_driver_write(
 	fifo_buffer = kmemdup(buf, len, GFP_KERNEL);
 	if (!fifo_buffer)
 		return -ENOMEM;
-	kfifo_in(&test->up_fifo, &fifo_buffer, 4);
+	kfifo_in(&test->up_fifo, &fifo_buffer, sizeof(fifo_buffer));
 	wake_up_interruptible(&priv->test.readq);

 	return 0;
@@ -919,9 +919,10 @@ static int ca8210_spi_transfer(
 	if (status < 0) {
 		dev_crit(
 			&spi->dev,
-			"status %d from spi_sync in write\n",
+			"status %d from spi_async in write\n",
 			status
 		);
+		kfree(cas_ctl);
 	}

 	return status;
@@ -2525,6 +2526,7 @@ static ssize_t ca8210_test_int_user_read(
 	struct ca8210_priv *priv = filp->private_data;
 	unsigned char *fifo_buffer;
 	unsigned long bytes_not_copied;
+	unsigned int copied;

 	if (filp->f_flags & O_NONBLOCK) {
 		/* Non-blocking mode */
@@ -2538,7 +2540,8 @@ static ssize_t ca8210_test_int_user_read(
 		);
 	}

-	if (kfifo_out(&priv->test.up_fifo, &fifo_buffer, 4) != 4) {
+	copied = kfifo_out(&priv->test.up_fifo, &fifo_buffer, sizeof(fifo_buffer));
+	if (copied != sizeof(fifo_buffer)) {
 		dev_err(
 			&priv->spi->dev,
 			"test_interface: Wrong number of elements popped from upstream fifo\n"
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index a159cb293981..862001d09aa8 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -190,8 +190,10 @@ struct netconsole_target {
 	bool extended;
 	bool release;
 	struct netpoll np;
-	/* protected by target_list_lock */
-	char buf[MAX_PRINT_CHUNK];
+	/* protected by target_list_lock; +1 gives scnprintf() room for its
+	 * NUL terminator so a full MAX_PRINT_CHUNK payload is not truncated
+	 */
+	char buf[MAX_PRINT_CHUNK + 1];
 	struct work_struct resume_wq;
 };

@@ -1938,7 +1940,7 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt,

 	if (release_len) {
 		release = init_utsname()->release;
-		scnprintf(nt->buf, MAX_PRINT_CHUNK, "%s,%.*s", release,
+		scnprintf(nt->buf, sizeof(nt->buf), "%s,%.*s", release,
 			  msg_len, msg);
 		msg_len += release_len;
 	} else {
@@ -1947,12 +1949,12 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt,

 	if (userdata)
 		msg_len += scnprintf(&nt->buf[msg_len],
-				     MAX_PRINT_CHUNK - msg_len, "%s",
+				     sizeof(nt->buf) - msg_len, "%s",
 				     userdata);

 	if (sysdata)
 		msg_len += scnprintf(&nt->buf[msg_len],
-				     MAX_PRINT_CHUNK - msg_len, "%s",
+				     sizeof(nt->buf) - msg_len, "%s",
 				     sysdata);

 	send_udp(nt, nt->buf, msg_len);
diff --git a/drivers/net/phy/realtek/realtek_main.c b/drivers/net/phy/realtek/realtek_main.c
index 27268811f564..b65d0f5fa1a0 100644
--- a/drivers/net/phy/realtek/realtek_main.c
+++ b/drivers/net/phy/realtek/realtek_main.c
@@ -1802,7 +1802,8 @@ static int rtl822x_config_aneg(struct phy_device *phydev)
 		ret = phy_modify_mmd_changed(phydev, MDIO_MMD_VEND2,
 					     RTL_MDIO_AN_10GBT_CTRL,
 					     MDIO_AN_10GBT_CTRL_ADV2_5G |
-					     MDIO_AN_10GBT_CTRL_ADV5G, adv);
+					     MDIO_AN_10GBT_CTRL_ADV5G |
+					     MDIO_AN_10GBT_CTRL_ADV10G, adv);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/drivers/net/pse-pd/pd692x0.c b/drivers/net/pse-pd/pd692x0.c
index cb377d5ba7af..209de9cec849 100644
--- a/drivers/net/pse-pd/pd692x0.c
+++ b/drivers/net/pse-pd/pd692x0.c
@@ -200,7 +200,7 @@ static const struct pd692x0_msg pd692x0_msg_template_list[PD692X0_MSG_CNT] = {
 	},
 	[PD692X0_MSG_SET_USER_BYTE] = {
 		.key = PD692X0_KEY_PRG,
-		.sub = {0x41, PD692X0_USER_BYTE},
+		.sub = {0x41, PD692X0_USER_BYTE, 0x4e},
 		.data = {0x4e, 0x4e, 0x4e, 0x4e,
 			 0x4e, 0x4e, 0x4e, 0x4e},
 	},
diff --git a/drivers/net/thunderbolt/main.c b/drivers/net/thunderbolt/main.c
index f8f97e8e2226..02a91650561a 100644
--- a/drivers/net/thunderbolt/main.c
+++ b/drivers/net/thunderbolt/main.c
@@ -783,8 +783,12 @@ static bool tbnet_check_frame(struct tbnet *net, const struct tbnet_frame *tf,
 		return true;
 	}

-	/* Start of packet, validate the frame header */
-	if (frame_count == 0 || frame_count > TBNET_RING_SIZE / 4) {
+	/* Start of packet, validate the frame header. tbnet_poll() puts the
+	 * first frame in the skb linear area and every further frame in a page
+	 * fragment, so a packet may not span more than MAX_SKB_FRAGS + 1 frames
+	 * without overflowing skb_shinfo()->frags[].
+	 */
+	if (frame_count == 0 || frame_count > MAX_SKB_FRAGS + 1) {
 		net->stats.rx_length_errors++;
 		return false;
 	}
diff --git a/drivers/net/usb/kalmia.c b/drivers/net/usb/kalmia.c
index ee9c48f7f68f..0dd0a30c3db4 100644
--- a/drivers/net/usb/kalmia.c
+++ b/drivers/net/usb/kalmia.c
@@ -276,6 +276,14 @@ kalmia_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 			"Received header: %6phC. Package length: %i\n",
 			header_start, skb->len - KALMIA_HEADER_LENGTH);

+		/* both framing headers must be present before we subtract
+		 * them, otherwise usb_packet_length underflows and the
+		 * device-supplied ether_packet_length drives an out of bounds
+		 * access below
+		 */
+		if (skb->len < 2 * KALMIA_HEADER_LENGTH)
+			return 0;
+
 		/* subtract start header and end header */
 		usb_packet_length = skb->len - (2 * KALMIA_HEADER_LENGTH);
 		ether_packet_length = get_unaligned_le16(&header_start[2]);
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index bcf293ea1bd3..c4cebacabcb5 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1452,6 +1452,15 @@ static inline u32 lan78xx_hash(char addr[ETH_ALEN])
 	return (ether_crc(ETH_ALEN, addr) >> 23) & 0x1ff;
 }

+static int lan78xx_write_mchash_table(struct lan78xx_net *dev)
+{
+	struct lan78xx_priv *pdata = (struct lan78xx_priv *)(dev->data[0]);
+
+	return lan78xx_dataport_write(dev, DP_SEL_RSEL_VLAN_DA_,
+				      DP_SEL_VHF_VLAN_LEN,
+				      DP_SEL_VHF_HASH_LEN, pdata->mchash_table);
+}
+
 static void lan78xx_deferred_multicast_write(struct work_struct *param)
 {
 	struct lan78xx_priv *pdata =
@@ -1462,9 +1471,7 @@ static void lan78xx_deferred_multicast_write(struct work_struct *param)
 	netif_dbg(dev, drv, dev->net, "deferred multicast write 0x%08x\n",
 		  pdata->rfe_ctl);

-	ret = lan78xx_dataport_write(dev, DP_SEL_RSEL_VLAN_DA_,
-				     DP_SEL_VHF_VLAN_LEN,
-				     DP_SEL_VHF_HASH_LEN, pdata->mchash_table);
+	ret = lan78xx_write_mchash_table(dev);
 	if (ret < 0)
 		goto multicast_write_done;

@@ -1557,6 +1564,7 @@ static void lan78xx_set_multicast(struct net_device *netdev)
 }

 static void lan78xx_rx_urb_submit_all(struct lan78xx_net *dev);
+static int lan78xx_write_vlan_table(struct lan78xx_net *dev);

 static int lan78xx_mac_reset(struct lan78xx_net *dev)
 {
@@ -2514,6 +2522,17 @@ static void lan78xx_mac_link_up(struct phylink_config *config,
 	if (ret < 0)
 		goto link_up_fail;

+	/* The RFE clears the VLAN/DA hash filter (VHF) on a link down/up
+	 * cycle, so reprogram both tables from their shadow copies.
+	 */
+	ret = lan78xx_write_vlan_table(dev);
+	if (ret < 0)
+		goto link_up_fail;
+
+	ret = lan78xx_write_mchash_table(dev);
+	if (ret < 0)
+		goto link_up_fail;
+
 	netif_start_queue(net);

 	return;
@@ -3065,14 +3084,20 @@ static int lan78xx_set_features(struct net_device *netdev,
 	return lan78xx_write_reg(dev, RFE_CTL, pdata->rfe_ctl);
 }

+static int lan78xx_write_vlan_table(struct lan78xx_net *dev)
+{
+	struct lan78xx_priv *pdata = (struct lan78xx_priv *)(dev->data[0]);
+
+	return lan78xx_dataport_write(dev, DP_SEL_RSEL_VLAN_DA_, 0,
+				      DP_SEL_VHF_VLAN_LEN, pdata->vlan_table);
+}
+
 static void lan78xx_deferred_vlan_write(struct work_struct *param)
 {
 	struct lan78xx_priv *pdata =
 			container_of(param, struct lan78xx_priv, set_vlan);
-	struct lan78xx_net *dev = pdata->dev;

-	lan78xx_dataport_write(dev, DP_SEL_RSEL_VLAN_DA_, 0,
-			       DP_SEL_VHF_VLAN_LEN, pdata->vlan_table);
+	lan78xx_write_vlan_table(pdata->dev);
 }

 static int lan78xx_vlan_rx_add_vid(struct net_device *netdev,
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0cfb19b760dd..1c5142149175 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1137,6 +1137,8 @@ static int veth_enable_xdp_range(struct net_device *dev, int start, int end,
 err_reg_mem:
 	xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
 err_rxq_reg:
+	if (!napi_already_on)
+		netif_napi_del(&priv->rq[i].xdp_napi);
 	for (i--; i >= start; i--) {
 		struct veth_rq *rq = &priv->rq[i];

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7d2eeb9b1226..26afa6341d16 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1999,15 +1999,18 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   struct virtnet_rq_stats *stats)
 {
 	struct page *page = buf;
+	unsigned long max_len;
 	struct sk_buff *skb;

+	max_len = (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE -
+		  sizeof(struct padded_vnet_hdr) + vi->hdr_len;
+
 	/* Make sure that len does not exceed the size allocated in
 	 * add_recvbuf_big.
 	 */
-	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
+	if (unlikely(len > max_len)) {
 		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
-			 dev->name, len,
-			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
+			 dev->name, len, max_len);
 		goto err;
 	}

diff --git a/drivers/net/wan/hdlc_ppp.c b/drivers/net/wan/hdlc_ppp.c
index 159295c4bd6d..302ed27944e7 100644
--- a/drivers/net/wan/hdlc_ppp.c
+++ b/drivers/net/wan/hdlc_ppp.c
@@ -619,7 +619,6 @@ static void ppp_start(struct net_device *dev)
 		struct proto *proto = &ppp->protos[i];

 		proto->dev = dev;
-		timer_setup(&proto->timer, ppp_timer, 0);
 		proto->state = CLOSED;
 	}
 	ppp->protos[IDX_LCP].pid = PID_LCP;
@@ -639,6 +638,15 @@ static void ppp_close(struct net_device *dev)
 	ppp_tx_flush();
 }

+static void ppp_timer_release(struct net_device *dev)
+{
+	struct ppp *ppp = get_ppp(dev);
+	int i;
+
+	for (i = 0; i < IDX_COUNT; i++)
+		timer_shutdown_sync(&ppp->protos[i].timer);
+}
+
 static struct hdlc_proto proto = {
 	.start = ppp_start,
 	.stop = ppp_stop,
@@ -647,6 +655,7 @@ static struct hdlc_proto proto = {
 	.ioctl = ppp_ioctl,
 	.netif_rx = ppp_rx,
 	.module = THIS_MODULE,
+	.detach = ppp_timer_release,
 };

 static const struct header_ops ppp_header_ops = {
@@ -657,7 +666,7 @@ static int ppp_ioctl(struct net_device *dev, struct if_settings *ifs)
 {
 	hdlc_device *hdlc = dev_to_hdlc(dev);
 	struct ppp *ppp;
-	int result;
+	int i, result;

 	switch (ifs->type) {
 	case IF_GET_PROTO:
@@ -685,6 +694,8 @@ static int ppp_ioctl(struct net_device *dev, struct if_settings *ifs)
 			return result;

 		ppp = get_ppp(dev);
+		for (i = 0; i < IDX_COUNT; i++)
+			timer_setup(&ppp->protos[i].timer, ppp_timer, 0);
 		spin_lock_init(&ppp->lock);
 		ppp->req_timeout = 2;
 		ppp->cr_retries = 10;
diff --git a/drivers/net/wan/ixp4xx_hss.c b/drivers/net/wan/ixp4xx_hss.c
index 720c5dc889ea..7f4645ff90aa 100644
--- a/drivers/net/wan/ixp4xx_hss.c
+++ b/drivers/net/wan/ixp4xx_hss.c
@@ -1487,11 +1487,11 @@ static int ixp4xx_hss_probe(struct platform_device *pdev)
 				     "unable to get CLK internal GPIO\n");

 	ndev = alloc_hdlcdev(port);
-	port->netdev = alloc_hdlcdev(port);
-	if (!port->netdev) {
+	if (!ndev) {
 		err = -ENOMEM;
 		goto err_plat;
 	}
+	port->netdev = ndev;

 	SET_NETDEV_DEV(ndev, &pdev->dev);
 	hdlc = dev_to_hdlc(ndev);
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
index e10cb4f9104e..2917cee9b802 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
@@ -1063,6 +1063,9 @@ err_free_tx_ring:
 	while (i--)
 		t7xx_cldma_ring_free(md_ctrl, &md_ctrl->tx_ring[i], DMA_TO_DEVICE);

+	dma_pool_destroy(md_ctrl->gpd_dmapool);
+	md_ctrl->gpd_dmapool = NULL;
+
 	return ret;
 }

diff --git a/fs/afs/cm_security.c b/fs/afs/cm_security.c
index edcbd249d202..103168c70dd4 100644
--- a/fs/afs/cm_security.c
+++ b/fs/afs/cm_security.c
@@ -101,7 +101,8 @@ void afs_process_oob_queue(struct work_struct *work)
 	struct sk_buff *oob;
 	enum rxrpc_oob_type type;

-	while ((oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
+	while (READ_ONCE(net->live) &&
+	       (oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
 		switch (type) {
 		case RXRPC_OOB_CHALLENGE:
 			afs_respond_to_challenge(oob);
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index d5cfd24e815b..d82916657a3d 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -128,8 +128,14 @@ void afs_close_socket(struct afs_net *net)
 	_enter("");

 	cancel_work_sync(&net->charge_preallocation_work);
+	cancel_work_sync(&net->rx_oob_work);
+	/* Future work items should now see ->live is false. */
 	kernel_listen(net->socket, 0);
+
+	/* Make sure work items are no longer running. */
 	flush_workqueue(afs_async_calls);
+	cancel_work_sync(&net->charge_preallocation_work);

 	if (net->spare_incoming_call) {
 		afs_put_call(net->spare_incoming_call);
@@ -143,6 +149,7 @@ void afs_close_socket(struct afs_net *net)

 	kernel_sock_shutdown(net->socket, SHUT_RDWR);
 	flush_workqueue(afs_async_calls);
+	cancel_work_sync(&net->rx_oob_work);
 	net->socket->sk->sk_user_data = NULL;
 	sock_release(net->socket);
 	key_put(net->fs_cm_token_key);
@@ -984,5 +991,6 @@ static void afs_rx_notify_oob(struct sock *sk, struct sk_buff *oob)
 {
 	struct afs_net *net = sk->sk_user_data;

-	schedule_work(&net->rx_oob_work);
+	if (READ_ONCE(net->live))
+		queue_work(afs_wq, &net->rx_oob_work);
 }
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 1b834e2a522e..5d491a98265e 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -942,6 +942,7 @@ struct kernel_ethtool_ts_info {
 #define ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM BIT(5)
 #define ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM BIT(6)
 #define ETHTOOL_OP_NEEDS_RTNL_RSS BIT(7)
+#define ETHTOOL_OP_NEEDS_RTNL_GLINK BIT(8)

 /**
  * struct ethtool_ops - optional netdev operations
@@ -978,6 +979,7 @@ struct kernel_ethtool_ts_info {
  *	- phylink helpers (note that phydev is currently unsupported!)
  *	- netdev_update_features()
  *	- netif_set_real_num_tx_queues()
+ *	- ethtool_op_get_link() (syncs link watch under rtnl_lock)
  *
  * @get_drvinfo: Report driver/device information. Modern drivers no
  *	longer have to implement this callback. Most fields are
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b67a12541eac..9981d637f8b5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1131,6 +1131,9 @@ struct netdev_net_notifier {
  *	netdev_hw_addr_list_for_each(ha, uc). Return 0 on success or a
  *	negative errno to request a retry via the core backoff.
  *
+ * void (*ndo_work)(struct net_device *dev, unsigned long events);
+ *	Run deferred work scheduled with netdev_work_sched(@events).
+ *
  * int (*ndo_set_mac_address)(struct net_device *dev, void *addr);
  *	This function is called when the Media Access Control address
  *	needs to be changed. If this interface is not defined, the
@@ -1460,6 +1463,8 @@ struct net_device_ops {
 						struct net_device *dev,
 						struct netdev_hw_addr_list *uc,
 						struct netdev_hw_addr_list *mc);
+	void (*ndo_work)(struct net_device *dev,
+			 unsigned long events);
 	int (*ndo_set_mac_address)(struct net_device *dev,
 				   void *addr);
 	int (*ndo_validate_addr)(struct net_device *dev);
@@ -1930,8 +1935,11 @@ enum netdev_reg_state {
  *			has been enabled due to the need to listen to
  *			additional unicast addresses in a device that
  *			does not implement ndo_set_rx_mode()
- *	@rx_mode_node: List entry for rx_mode work processing
- *	@rx_mode_tracker: Refcount tracker for rx_mode work
+ *	@work_node: List entry for async netdev_work processing
+ *	@work_tracker: Refcount tracker for async netdev_work
+ *	@work_pending: Driver-defined pending netdev_work, passed to
+ *			ndo_work() (see netdev_work_sched())
+ *	@work_core_pending: Core-defined pending netdev_work (NETDEV_WORK_*)
  *	@rx_mode_addr_cache: Recycled snapshot entries for rx_mode work
  *	@rx_mode_retry_timer: Timer that re-queues rx_mode work after failure
  *	@rx_mode_retry_count: Number of consecutive retries already scheduled
@@ -2326,8 +2334,10 @@ struct net_device {
 	unsigned int promiscuity;
 	unsigned int allmulti;
 	bool uc_promisc;
-	struct list_head rx_mode_node;
-	netdevice_tracker rx_mode_tracker;
+	struct list_head work_node;
+	netdevice_tracker work_tracker;
+	unsigned long work_pending;
+	unsigned long work_core_pending;
 	struct netdev_hw_addr_list rx_mode_addr_cache;
 	struct timer_list rx_mode_retry_timer;
 	unsigned int rx_mode_retry_count;
@@ -5176,6 +5186,9 @@ void dev_fetch_sw_netstats(struct rtnl_link_stats64 *s,
 			   const struct pcpu_sw_netstats __percpu *netstats);
 void dev_get_tstats64(struct net_device *dev, struct rtnl_link_stats64 *s);

+void netdev_work_sched(struct net_device *dev, unsigned long events);
+unsigned long netdev_work_cancel(struct net_device *dev, unsigned long mask);
+
 enum {
 	NESTED_SYNC_IMM_BIT,
 	NESTED_SYNC_TODO_BIT,
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 20d70dddbe50..25062f4a0dd5 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -18,7 +18,7 @@
  * @match: the match extension
  * @target: the target extension
  * @matchinfo: per-match data
- * @targetinfo: per-target data
+ * @targinfo: per-target data
  * @state: pointer to hook state this packet came from
  * @fragoff: packet is a fragment, this is the data offset
  * @thoff: position of transport header relative to skb->data
@@ -77,7 +77,9 @@ static inline u_int8_t xt_family(const struct xt_action_param *par)
  * @match: struct xt_match through which this function was invoked
  * @matchinfo: per-match data
  * @hook_mask: via which hooks the new rule is reachable
- * Other fields as above.
+ * @family: actual NFPROTO_* through which the function is invoked
+ *	(helpful when match->family == NFPROTO_UNSPEC)
+ * @nft_compat: running from the nft compat layer if true
  */
 struct xt_mtchk_param {
 	struct net *net;
@@ -91,8 +93,13 @@ struct xt_mtchk_param {
 };

 /**
- * struct xt_mdtor_param - match destructor parameters
- * Fields as above.
+ * struct xt_mtdtor_param - match destructor parameters
+ *
+ * @net: network namespace through which the check was invoked
+ * @match: struct xt_match through which this function was invoked
+ * @matchinfo: per-match data
+ * @family: actual NFPROTO_* through which the function is invoked
+ *	(helpful when match->family == NFPROTO_UNSPEC)
  */
 struct xt_mtdtor_param {
 	struct net *net;
@@ -105,10 +112,16 @@ struct xt_mtdtor_param {
  * struct xt_tgchk_param - parameters for target extensions'
  * checkentry functions
  *
+ * @net: network namespace through which the check was invoked
+ * @table: table the rule is tried to be inserted into
  * @entryinfo: the family-specific rule data
  *	(struct ipt_entry, ip6t_entry, arpt_entry, ebt_entry)
- *
- * Other fields see above.
+ * @target: the target extension
+ * @targinfo: per-target data
+ * @hook_mask: via which hooks the new rule is reachable
+ * @family: actual NFPROTO_* through which the function is invoked
+ *	(helpful when match->family == NFPROTO_UNSPEC)
+ * @nft_compat: running from the nft compat layer if true
  */
 struct xt_tgchk_param {
 	struct net *net;
@@ -336,9 +349,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size);
 void xt_free_table_info(struct xt_table_info *info);

 /**
- * xt_recseq - recursive seqcount for netfilter use
+ * var xt_recseq - recursive seqcount for netfilter use
  *
- * Packet processing changes the seqcount only if no recursion happened
+ * Packet processing changes the seqcount only if no recursion happened.
  * get_counters() can use read_seqcount_begin()/read_seqcount_retry(),
  * because we use the normal seqcount convention :
  * Low order bit set to 1 if a writer is active.
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 1fc2fb03ce3f..f45d1e3163f0 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -164,8 +164,11 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 	if (!new_md)
 		return ERR_PTR(-ENOMEM);

-	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
-	       sizeof(struct ip_tunnel_info) + md_size);
+	/* Copy in two stages to keep the __counted_by happy. */
+	new_md->u.tun_info = md_dst->u.tun_info;
+	memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
+	       ip_tunnel_info_opts(&md_dst->u.tun_info), md_size);
+
 #ifdef CONFIG_DST_CACHE
 	/* Unclone the dst cache if there is one */
 	if (new_md->u.tun_info.dst_cache.cache) {
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index a71a98505650..c63a3c4967ae 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -374,7 +374,7 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 			     struct fib_result *res, unsigned int flags)
 {
 	struct fib_table *tb;
-	int err = -ENETUNREACH;
+	int err = -EAGAIN;

 	flags |= FIB_LOOKUP_NOREF;
 	if (net->ipv4.fib_has_custom_rules)
@@ -388,17 +388,16 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 	if (tb)
 		err = fib_table_lookup(tb, flp, res, flags);

-	if (!err)
+	if (err != -EAGAIN)
 		goto out;

 	tb = rcu_dereference_rtnl(net->ipv4.fib_default);
 	if (tb)
 		err = fib_table_lookup(tb, flp, res, flags);

-out:
 	if (err == -EAGAIN)
 		err = -ENETUNREACH;
-
+out:
 	rcu_read_unlock();

 	return err;
diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h
index 80f50fd0f7ad..c024345c9bd8 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -26,6 +26,7 @@ struct nf_conntrack_expect {
 	possible_net_t net;

 	/* We expect this tuple, with the following mask */
+	struct nf_conntrack_tuple master_tuple;
 	struct nf_conntrack_tuple tuple;
 	struct nf_conntrack_tuple_mask mask;

@@ -54,8 +55,8 @@ struct nf_conntrack_expect {
 	/* The conntrack of the master connection */
 	struct nf_conn *master;

-	/* Timer function; deletes the expectation. */
-	struct timer_list timeout;
+	/* jiffies32 when this expectation expires */
+	u32 timeout;

 #if IS_ENABLED(CONFIG_NF_NAT)
 	union nf_inet_addr saved_addr;
@@ -69,6 +70,14 @@ struct nf_conntrack_expect {
 	struct rcu_head rcu;
 };

+static inline bool nf_ct_exp_is_expired(const struct nf_conntrack_expect *exp)
+{
+	if (READ_ONCE(exp->flags) & NF_CT_EXPECT_DEAD)
+		return true;
+
+	return (__s32)(READ_ONCE(exp->timeout) - nfct_time_stamp) <= 0;
+}
+
 static inline struct net *nf_ct_exp_net(struct nf_conntrack_expect *exp)
 {
 	return read_pnet(&exp->net);
@@ -130,7 +139,6 @@ static inline void nf_ct_unlink_expect(struct nf_conntrack_expect *exp)

 void nf_ct_remove_expectations(struct nf_conn *ct);
 void nf_ct_unexpect_related(struct nf_conntrack_expect *exp);
-bool nf_ct_remove_expect(struct nf_conntrack_expect *exp);

 void nf_ct_expect_iterate_destroy(bool (*iter)(struct nf_conntrack_expect *e, void *data), void *data);
 void nf_ct_expect_iterate_net(struct net *net,
@@ -153,5 +161,8 @@ static inline int nf_ct_expect_related(struct nf_conntrack_expect *expect,
 	return nf_ct_expect_related_report(expect, 0, 0, flags);
 }

+struct nf_conn_help;
+void nf_ct_expectation_gc(struct nf_conn_help *master_help);
+
 #endif /*_NF_CONNTRACK_EXPECT_H*/

diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 81025101f86d..c761cd8158b2 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -114,6 +114,10 @@ int nf_conntrack_helpers_register(struct nf_conntrack_helper *, unsigned int,
 void nf_conntrack_helpers_unregister(struct nf_conntrack_helper **,
 				     unsigned int);

+#define nf_conntrack_helper_deprecated(name) \
+	pr_warn("The %s conntrack helper is scheduled for removal.\n" \
+		"Please contact the netfilter-devel mailing list if you still need this.\n", name)
+
 struct nf_conn_help *nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp);

 int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 3978c3174cdb..fc3e81c07364 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -18,6 +18,7 @@ struct nf_queue_entry {
 	unsigned int id;
 	unsigned int hook_index; /* index in hook_entries->hook[] */
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	struct net_device *bridge_dev;
 	struct net_device *physin;
 	struct net_device *physout;
 #endif
diff --git a/include/net/netfilter/nft_meta.h b/include/net/netfilter/nft_meta.h
index f74e63290603..6cf1d910bbf8 100644
--- a/include/net/netfilter/nft_meta.h
+++ b/include/net/netfilter/nft_meta.h
@@ -40,6 +40,8 @@ void nft_meta_set_eval(const struct nft_expr *expr,
 void nft_meta_set_destroy(const struct nft_ctx *ctx,
 			  const struct nft_expr *expr);

+int nft_meta_get_validate(const struct nft_ctx *ctx,
+			  const struct nft_expr *expr);
 int nft_meta_set_validate(const struct nft_ctx *ctx,
 			  const struct nft_expr *expr);

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index ec65a8cebb99..2bff41aacc98 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -256,6 +256,8 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm,
 int rtnl_nla_parse_ifinfomsg(struct nlattr **tb, const struct nlattr *nla_peer,
 			     struct netlink_ext_ack *exterr);
 struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid);
+bool rtnl_dev_link_net_capable(const struct net_device *dev,
+			       const struct net *link_net);

 #define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 60b073fd3ed8..d50c27812504 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -111,7 +111,8 @@ int sctp_transport_lookup_process(sctp_callback_t cb, struct net *net,
const union sctp_addr *paddr, void *p, int dif); int sctp_transport_traverse_process(sctp_callback_t cb, sctp_callback_t cb_done, struct net *net, int *pos, void *p); -int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), void *p); +int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *), + struct net *net, int *pos, void *p); int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc, struct sctp_info *info); diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 46c1e499e955..519a0156a05c 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -953,6 +953,9 @@ static inline bool addr_match(const void *token1, const void *token2, unsigned int pdw; unsigned int pbi; + if (prefixlen > 128) + return false; + pdw = prefixlen >> 5; /* num of whole u32 in prefix */ pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */ @@ -977,6 +980,10 @@ static inline bool addr4_match(__be32 a1, __be32 a2, u8 prefixlen) /* C99 6.5.7 (3): u32 << 32 is undefined behaviour */ if (sizeof(long) == 4 && prefixlen == 0) return true; + + if (prefixlen > 32) + return false; + return !((a1 ^ a2) & htonl(~0UL << (32 - prefixlen))); } @@ -1260,8 +1267,8 @@ int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb, static inline bool __xfrm_check_nopolicy(struct net *net, struct sk_buff *skb, int dir) { - if (!net->xfrm.policy_count[dir] && !secpath_exists(skb)) - return net->xfrm.policy_default[dir] == XFRM_USERPOLICY_ACCEPT; + if (!READ_ONCE(net->xfrm.policy_count[dir]) && !secpath_exists(skb)) + return READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_ACCEPT; return false; } @@ -1361,8 +1368,8 @@ static inline int xfrm_route_forward(struct sk_buff *skb, unsigned short family) { struct net *net = dev_net(skb->dev); - if (!net->xfrm.policy_count[XFRM_POLICY_OUT] && - net->xfrm.policy_default[XFRM_POLICY_OUT] == XFRM_USERPOLICY_ACCEPT) + if (!READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]) && + 
READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]) == XFRM_USERPOLICY_ACCEPT) return true; return (skb_dst(skb)->flags & DST_NOXFRM) || diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h index 56b6b60a814f..ee51045ae1d6 100644 --- a/include/uapi/linux/netfilter/nf_conntrack_common.h +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h @@ -160,6 +160,7 @@ enum ip_conntrack_expect_events { #define NF_CT_EXPECT_USERSPACE 0x4 #ifdef __KERNEL__ +#define NF_CT_EXPECT_DEAD 0x8 #define NF_CT_EXPECT_MASK (NF_CT_EXPECT_PERMANENT | NF_CT_EXPECT_INACTIVE | \ NF_CT_EXPECT_USERSPACE) #endif diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 2b74ed56eb16..2d2efb877975 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -77,9 +77,9 @@ static int vlan_group_prealloc_vid(struct vlan_group *vg, return 0; } -static void vlan_stacked_transfer_operstate(const struct net_device *rootdev, - struct net_device *dev, - struct vlan_dev_priv *vlan) +void vlan_stacked_transfer_operstate(const struct net_device *rootdev, + struct net_device *dev, + struct vlan_dev_priv *vlan) { if (!(vlan->flags & VLAN_FLAG_BRIDGE_BINDING)) netif_stacked_transfer_operstate(rootdev, dev); @@ -316,29 +316,6 @@ out: ether_addr_copy(vlan->real_dev_addr, dev->dev_addr); } -static void vlan_transfer_features(struct net_device *dev, - struct net_device *vlandev) -{ - struct vlan_dev_priv *vlan = vlan_dev_priv(vlandev); - - netif_inherit_tso_max(vlandev, dev); - - if (vlan_hw_offload_capable(dev->features, vlan->vlan_proto)) - vlandev->hard_header_len = dev->hard_header_len; - else - vlandev->hard_header_len = dev->hard_header_len + VLAN_HLEN; - -#if IS_ENABLED(CONFIG_FCOE) - vlandev->fcoe_ddp_xid = dev->fcoe_ddp_xid; -#endif - - vlandev->priv_flags &= ~IFF_XMIT_DST_RELEASE; - vlandev->priv_flags |= (vlan->real_dev->priv_flags & IFF_XMIT_DST_RELEASE); - vlandev->hw_enc_features = vlan_tnl_features(vlan->real_dev); - - 
netdev_update_features(vlandev); -} - static int __vlan_device_event(struct net_device *dev, unsigned long event) { int err = 0; @@ -391,13 +368,11 @@ static void vlan_vid0_del(struct net_device *dev) static int vlan_device_event(struct notifier_block *unused, unsigned long event, void *ptr) { - struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct vlan_group *grp; struct vlan_info *vlan_info; int i, flgs; struct net_device *vlandev; - struct vlan_dev_priv *vlan; bool last = false; LIST_HEAD(list); int err; @@ -447,54 +422,19 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, if (vlandev->mtu <= dev->mtu) continue; - dev_set_mtu(vlandev, dev->mtu); + netdev_work_sched(vlandev, VLAN_WORK_MTU); } break; case NETDEV_FEAT_CHANGE: - /* Propagate device features to underlying device */ vlan_group_for_each_dev(grp, i, vlandev) - vlan_transfer_features(dev, vlandev); + netdev_work_sched(vlandev, VLAN_WORK_FEATURES); break; - case NETDEV_DOWN: { - struct net_device *tmp; - LIST_HEAD(close_list); - - /* Put all VLANs for this dev in the down state too. */ - vlan_group_for_each_dev(grp, i, vlandev) { - flgs = vlandev->flags; - if (!(flgs & IFF_UP)) - continue; - - vlan = vlan_dev_priv(vlandev); - if (!(vlan->flags & VLAN_FLAG_LOOSE_BINDING)) - list_add(&vlandev->close_list, &close_list); - } - - netif_close_many(&close_list, false); - - list_for_each_entry_safe(vlandev, tmp, &close_list, close_list) { - vlan_stacked_transfer_operstate(dev, vlandev, - vlan_dev_priv(vlandev)); - list_del_init(&vlandev->close_list); - } - list_del(&close_list); - break; - } + case NETDEV_DOWN: case NETDEV_UP: - /* Put all VLANs for this dev in the up state too. 
*/
-		vlan_group_for_each_dev(grp, i, vlandev) {
-			flgs = netif_get_flags(vlandev);
-			if (flgs & IFF_UP)
-				continue;
-
-			vlan = vlan_dev_priv(vlandev);
-			if (!(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
-				dev_change_flags(vlandev, flgs | IFF_UP,
-						 extack);
-			vlan_stacked_transfer_operstate(dev, vlandev, vlan);
-		}
+		vlan_group_for_each_dev(grp, i, vlandev)
+			netdev_work_sched(vlandev, VLAN_WORK_LINK_STATE);
 		break;

 	case NETDEV_UNREGISTER:
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index c7ffe591d593..c41caaf94095 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -125,6 +125,17 @@ static inline netdev_features_t vlan_tnl_features(struct net_device *real_dev)
 int vlan_filter_push_vids(struct vlan_info *vlan_info, __be16 proto);
 void vlan_filter_drop_vids(struct vlan_info *vlan_info, __be16 proto);

+/* netdev_work events propagated from the real device, see vlan_dev_work(). */
+enum {
+	VLAN_WORK_LINK_STATE = BIT(0), /* sync up/down with real_dev */
+	VLAN_WORK_MTU = BIT(1), /* clamp mtu to real_dev's */
+	VLAN_WORK_FEATURES = BIT(2), /* re-inherit real_dev features */
+};
+
+void vlan_stacked_transfer_operstate(const struct net_device *rootdev,
+				     struct net_device *dev,
+				     struct vlan_dev_priv *vlan);
+
 /* found in vlan_dev.c */
 void vlan_dev_set_ingress_priority(const struct net_device *dev,
 				   u32 skb_prio, u16 vlan_prio);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 7aa3af8b10ea..ec2569b3f8da 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -270,6 +270,9 @@ static int vlan_dev_open(struct net_device *dev)
 	    !(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
 		return -ENETDOWN;

+	/* The explicit open supersedes any deferred link-state sync */
+	netdev_work_cancel(dev, VLAN_WORK_LINK_STATE);
+
 	if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr) &&
 	    !vlan_dev_inherit_address(dev, real_dev)) {
 		err = dev_uc_add(real_dev, dev->dev_addr);
@@ -300,6 +303,9 @@ static int vlan_dev_stop(struct net_device *dev)
 	struct vlan_dev_priv *vlan =
vlan_dev_priv(dev);
 	struct net_device *real_dev = vlan->real_dev;

+	/* The explicit close supersedes any deferred link-state sync */
+	netdev_work_cancel(dev, VLAN_WORK_LINK_STATE);
+
 	dev_mc_unsync(real_dev, dev);
 	dev_uc_unsync(real_dev, dev);

@@ -1016,6 +1022,59 @@ static const struct ethtool_ops vlan_ethtool_ops = {
 	.get_ts_info = vlan_ethtool_get_ts_info,
 };

+static void vlan_transfer_features(struct net_device *dev,
+				   struct net_device *vlandev)
+{
+	struct vlan_dev_priv *vlan = vlan_dev_priv(vlandev);
+
+	netif_inherit_tso_max(vlandev, dev);
+
+	if (vlan_hw_offload_capable(dev->features, vlan->vlan_proto))
+		vlandev->hard_header_len = dev->hard_header_len;
+	else
+		vlandev->hard_header_len = dev->hard_header_len + VLAN_HLEN;
+
+#if IS_ENABLED(CONFIG_FCOE)
+	vlandev->fcoe_ddp_xid = dev->fcoe_ddp_xid;
+#endif
+
+	vlandev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+	vlandev->priv_flags |= (vlan->real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
+	vlandev->hw_enc_features = vlan_tnl_features(vlan->real_dev);
+
+	netdev_update_features(vlandev);
+}
+
+static void vlan_dev_work(struct net_device *vlandev, unsigned long events)
+{
+	struct vlan_dev_priv *vlan = vlan_dev_priv(vlandev);
+	struct net_device *real_dev = vlan->real_dev;
+	bool loose = vlan->flags & VLAN_FLAG_LOOSE_BINDING;
+	unsigned int flgs;
+
+	if (events & VLAN_WORK_LINK_STATE) {
+		flgs = netif_get_flags(vlandev);
+		if (real_dev->flags & IFF_UP) {
+			if (!(flgs & IFF_UP)) {
+				if (!loose)
+					netif_change_flags(vlandev,
+							   flgs | IFF_UP, NULL);
+				vlan_stacked_transfer_operstate(real_dev,
+								vlandev, vlan);
+			}
+		} else if ((flgs & IFF_UP) && !loose) {
+			netif_change_flags(vlandev, flgs & ~IFF_UP, NULL);
+			vlan_stacked_transfer_operstate(real_dev, vlandev, vlan);
+		}
+	}
+
+	if ((events & VLAN_WORK_MTU) && vlandev->mtu > real_dev->mtu)
+		netif_set_mtu(vlandev, real_dev->mtu);
+
+	if (events & VLAN_WORK_FEATURES)
+		vlan_transfer_features(real_dev, vlandev);
+}
+
 static const struct net_device_ops vlan_netdev_ops = {
 	.ndo_change_mtu = vlan_dev_change_mtu,
 	.ndo_init = vlan_dev_init,
@@ -1027,6 +1086,7 @@ static const struct net_device_ops vlan_netdev_ops = {
 	.ndo_set_mac_address = vlan_dev_set_mac_address,
 	.ndo_set_rx_mode = vlan_dev_set_rx_mode,
 	.ndo_change_rx_flags = vlan_dev_change_rx_flags,
+	.ndo_work = vlan_dev_work,
 	.ndo_eth_ioctl = vlan_dev_ioctl,
 	.ndo_neigh_setup = vlan_dev_neigh_setup,
 	.ndo_get_stats64 = vlan_dev_get_stats64,
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 7588e64e7ba6..bb2f012b454e 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -316,14 +316,23 @@ batadv_iv_ogm_aggr_packet(int buff_pos, int packet_len,
 			  const struct batadv_ogm_packet *ogm_packet)
 {
 	int next_buff_pos = 0;
+	u16 tvlv_len;

 	/* check if there is enough space for the header */
 	next_buff_pos += buff_pos + sizeof(*ogm_packet);
 	if (next_buff_pos > packet_len)
 		return false;

+	tvlv_len = ntohs(ogm_packet->tvlv_len);
+
+	/* the fields of an aggregated OGM are accessed assuming (at least)
+	 * 2-byte alignment, so a following OGM must start at an even offset.
+	 */
+	if (tvlv_len & 1)
+		return false;
+
 	/* check if there is enough space for the optional TVLV */
-	next_buff_pos += ntohs(ogm_packet->tvlv_len);
+	next_buff_pos += tvlv_len;

 	return next_buff_pos <= packet_len;
 }
diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c
index fe7c0113d0df..db6f5bdcaa98 100644
--- a/net/batman-adv/bat_v.c
+++ b/net/batman-adv/bat_v.c
@@ -817,6 +817,7 @@ void batadv_v_hardif_init(struct batadv_hard_iface *hard_iface)
 	hard_iface->bat_v.aggr_len = 0;
 	skb_queue_head_init(&hard_iface->bat_v.aggr_list);
+	hard_iface->bat_v.aggr_list_enabled = false;
 	INIT_DELAYED_WORK(&hard_iface->bat_v.aggr_wq, batadv_v_ogm_aggr_work);
 	/* make sure it doesn't run until interface gets enabled */
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c
index 81926ef9c02c..037921aad35d 100644
--- a/net/batman-adv/bat_v_ogm.c
+++ b/net/batman-adv/bat_v_ogm.c
@@ -254,11 +254,18 @@ static void batadv_v_ogm_queue_on_if(struct batadv_priv *bat_priv,
 	}

 	spin_lock_bh(&hard_iface->bat_v.aggr_list.lock);
+	if (!hard_iface->bat_v.aggr_list_enabled) {
+		kfree_skb(skb);
+		goto unlock;
+	}
+
 	if (!batadv_v_ogm_queue_left(skb, hard_iface))
 		batadv_v_ogm_aggr_send(bat_priv, hard_iface);

 	hard_iface->bat_v.aggr_len += batadv_v_ogm_len(skb);
 	__skb_queue_tail(&hard_iface->bat_v.aggr_list, skb);
+
+unlock:
 	spin_unlock_bh(&hard_iface->bat_v.aggr_list.lock);
 }

@@ -415,6 +422,10 @@ int batadv_v_ogm_iface_enable(struct batadv_hard_iface *hard_iface)
 {
 	struct batadv_priv *bat_priv = netdev_priv(hard_iface->mesh_iface);

+	spin_lock_bh(&hard_iface->bat_v.aggr_list.lock);
+	hard_iface->bat_v.aggr_list_enabled = true;
+	spin_unlock_bh(&hard_iface->bat_v.aggr_list.lock);
+
 	enable_delayed_work(&hard_iface->bat_v.aggr_wq);

 	batadv_v_ogm_start_queue_timer(hard_iface);
@@ -432,6 +443,7 @@ void batadv_v_ogm_iface_disable(struct batadv_hard_iface *hard_iface)
 	disable_delayed_work_sync(&hard_iface->bat_v.aggr_wq);

 	spin_lock_bh(&hard_iface->bat_v.aggr_list.lock);
+
hard_iface->bat_v.aggr_list_enabled = false;
 	batadv_v_ogm_aggr_list_free(hard_iface);
 	spin_unlock_bh(&hard_iface->bat_v.aggr_list.lock);
 }
@@ -837,14 +849,23 @@ batadv_v_ogm_aggr_packet(int buff_pos, int packet_len,
 			 const struct batadv_ogm2_packet *ogm2_packet)
 {
 	int next_buff_pos = 0;
+	u16 tvlv_len;

 	/* check if there is enough space for the header */
 	next_buff_pos += buff_pos + sizeof(*ogm2_packet);
 	if (next_buff_pos > packet_len)
 		return false;

+	tvlv_len = ntohs(ogm2_packet->tvlv_len);
+
+	/* the fields of an aggregated OGMv2 are accessed assuming (at least)
+	 * 2-byte alignment, so a following OGMv2 must start at an even offset.
+	 */
+	if (tvlv_len & 1)
+		return false;
+
 	/* check if there is enough space for the optional TVLV */
-	next_buff_pos += ntohs(ogm2_packet->tvlv_len);
+	next_buff_pos += tvlv_len;

 	return next_buff_pos <= packet_len;
 }
diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index aaea155b9403..ae39ceaa2e29 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -215,10 +215,13 @@ static void batadv_dat_purge(struct work_struct *work)
  */
 static bool batadv_compare_dat(const struct hlist_node *node, const void *data2)
 {
-	const void *data1 = container_of(node, struct batadv_dat_entry,
-					 hash_entry);
+	const struct batadv_dat_entry *entry1;
+	const struct batadv_dat_entry *entry2;

-	return memcmp(data1, data2, sizeof(__be32)) == 0;
+	entry1 = container_of(node, struct batadv_dat_entry, hash_entry);
+	entry2 = data2;
+
+	return entry1->ip == entry2->ip && entry1->vid == entry2->vid;
 }

 /**
@@ -345,6 +348,9 @@ batadv_dat_entry_hash_find(struct batadv_priv *bat_priv, __be32 ip,
 		if (dat_entry->ip != ip)
 			continue;

+		if (dat_entry->vid != vid)
+			continue;
+
 		if (!kref_get_unless_zero(&dat_entry->refcount))
 			continue;

diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c
index 1e42cf99f8b3..8a006a0473a8 100644
--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -386,6 +386,8 @@ out_err:
  * @skb: skb to forward
  * @recv_if: interface that the skb is received on
  * @orig_node_src: originator that the skb is received from
+ * @rx_result: set to NET_RX_SUCCESS when the fragment was forwarded and
+ * NET_RX_DROP when it was dropped; only valid when true is returned
  *
  * Look up the next-hop of the fragments payload and check if the merged packet
  * will exceed the MTU towards the next-hop. If so, the fragment is forwarded
@@ -395,7 +397,8 @@ out_err:
  */
 bool batadv_frag_skb_fwd(struct sk_buff *skb,
 			 struct batadv_hard_iface *recv_if,
-			 struct batadv_orig_node *orig_node_src)
+			 struct batadv_orig_node *orig_node_src,
+			 int *rx_result)
 {
 	struct batadv_priv *bat_priv = netdev_priv(recv_if->mesh_iface);
 	struct batadv_neigh_node *neigh_node = NULL;
@@ -414,12 +417,29 @@ bool batadv_frag_skb_fwd(struct sk_buff *skb,
 	 */
 	total_size = ntohs(packet->total_size);
 	if (total_size > neigh_node->if_incoming->net_dev->mtu) {
+		if (packet->ttl < 2) {
+			kfree_skb(skb);
+			*rx_result = NET_RX_DROP;
+			ret = true;
+			goto out;
+		}
+
+		if (skb_cow(skb, ETH_HLEN) < 0) {
+			kfree_skb(skb);
+			*rx_result = NET_RX_DROP;
+			ret = true;
+			goto out;
+		}
+
+		packet = (struct batadv_frag_packet *)skb->data;
+
 		batadv_inc_counter(bat_priv, BATADV_CNT_FRAG_FWD);
 		batadv_add_counter(bat_priv, BATADV_CNT_FRAG_FWD_BYTES,
 				   skb->len + ETH_HLEN);

 		packet->ttl--;
 		batadv_send_unicast_skb(skb, neigh_node);
+		*rx_result = NET_RX_SUCCESS;
 		ret = true;
 	}

diff --git a/net/batman-adv/fragmentation.h b/net/batman-adv/fragmentation.h
index dbf0871f8703..51e281027ab6 100644
--- a/net/batman-adv/fragmentation.h
+++ b/net/batman-adv/fragmentation.h
@@ -19,7 +19,8 @@ void batadv_frag_purge_orig(struct batadv_orig_node *orig,
 			    bool (*check_cb)(struct batadv_frag_table_entry *));
 bool batadv_frag_skb_fwd(struct sk_buff *skb,
 			 struct batadv_hard_iface *recv_if,
-			 struct batadv_orig_node *orig_node_src);
+			 struct batadv_orig_node *orig_node_src,
+			 int
*rx_result);
 bool batadv_frag_skb_buffer(struct sk_buff **skb,
 			    struct batadv_orig_node *orig_node);
 int batadv_frag_send_packet(struct sk_buff *skb,
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 60cee2c2f2f4..03d01c20a954 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -815,30 +815,6 @@ err_dev:
 }

 /**
- * batadv_hardif_cnt() - get number of interfaces enslaved to mesh interface
- * @mesh_iface: mesh interface to check
- *
- * This function is only using RCU for locking - the result can therefore be
- * off when another function is modifying the list at the same time. The
- * caller can use the rtnl_lock to make sure that the count is accurate.
- *
- * Return: number of connected/enslaved hard interfaces
- */
-static size_t batadv_hardif_cnt(struct net_device *mesh_iface)
-{
-	struct batadv_hard_iface *hard_iface;
-	struct list_head *iter;
-	size_t count = 0;
-
-	rcu_read_lock();
-	netdev_for_each_lower_private_rcu(mesh_iface, hard_iface, iter)
-		count++;
-	rcu_read_unlock();
-
-	return count;
-}
-
-/**
  * batadv_hardif_disable_interface() - Remove hard interface from mesh interface
  * @hard_iface: hard interface to be removed
  */
@@ -878,8 +854,8 @@ void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface)
 	netdev_upper_dev_unlink(hard_iface->net_dev, hard_iface->mesh_iface);
 	batadv_hardif_recalc_extra_skbroom(hard_iface->mesh_iface);

-	/* nobody uses this interface anymore */
-	if (batadv_hardif_cnt(hard_iface->mesh_iface) <= 1)
+	/* nobody uses this mesh interface anymore */
+	if (list_empty(&hard_iface->mesh_iface->adj_list.lower))
 		batadv_gw_check_client_stop(bat_priv);

 	hard_iface->mesh_iface = NULL;
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index cd4368b846ad..c05fcc9241ad 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -8,6 +8,7 @@
 #include "main.h"

 #include <linux/atomic.h>
+#include <linux/build_bug.h>
 #include
<linux/byteorder/generic.h>
 #include <linux/compiler.h>
 #include <linux/errno.h>
@@ -205,6 +206,59 @@ bool batadv_check_management_packet(struct sk_buff *skb,
 }

 /**
+ * batadv_skb_decrement_ttl() - decrement ttl in a batman-adv header, csum-safe
+ * @skb: the received packet with @skb->data pointing to the batman-adv header
+ *
+ * Supports the following packet types, all of which carry the TTL at offset 2:
+ *
+ * - batadv_ogm_packet
+ * - batadv_ogm2_packet
+ * - batadv_icmp_header
+ * - batadv_icmp_packet
+ * - batadv_icmp_tp_packet
+ * - batadv_icmp_packet_rr
+ * - batadv_unicast_packet
+ * - batadv_frag_packet
+ * - batadv_bcast_packet
+ * - batadv_mcast_packet
+ * - batadv_coded_packet
+ * - batadv_unicast_tvlv_packet
+ *
+ * Return: true if the packet may be forwarded (ttl decremented),
+ * false if it must be dropped (ttl would expire)
+ */
+static bool batadv_skb_decrement_ttl(struct sk_buff *skb)
+{
+	static const size_t ttl_offset = 2;
+	u8 *ttl_pos;
+
+	BUILD_BUG_ON(offsetof(struct batadv_ogm_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_ogm2_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_icmp_header, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_icmp_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_icmp_tp_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_icmp_packet_rr, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_unicast_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_frag_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_bcast_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_mcast_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_coded_packet, ttl) != ttl_offset);
+	BUILD_BUG_ON(offsetof(struct batadv_unicast_tvlv_packet, ttl) != ttl_offset);
+
+	ttl_pos = skb->data + ttl_offset;
+
+	/* would expire on this hop -> drop, leave header + csum untouched */
+	if (*ttl_pos < 2)
+
return false;
+
+	skb_postpull_rcsum(skb, ttl_pos, 1);
+	(*ttl_pos)--;
+	skb_postpush_rcsum(skb, ttl_pos, 1);
+
+	return true;
+}
+
+/**
  * batadv_recv_my_icmp_packet() - receive an icmp packet locally
  * @bat_priv: the bat priv with all the mesh interface information
  * @skb: icmp packet to process
@@ -1114,10 +1168,9 @@ int batadv_recv_frag_packet(struct sk_buff *skb,

 	/* Route the fragment if it is not for us and too big to be merged. */
 	if (!batadv_is_my_mac(bat_priv, frag_packet->dest) &&
-	    batadv_frag_skb_fwd(skb, recv_if, orig_node_src)) {
+	    batadv_frag_skb_fwd(skb, recv_if, orig_node_src, &ret)) {
 		/* skb was consumed */
 		skb = NULL;
-		ret = NET_RX_SUCCESS;
 		goto put_orig_node;
 	}

@@ -1191,7 +1244,13 @@ int batadv_recv_bcast_packet(struct sk_buff *skb,
 	if (batadv_is_my_mac(bat_priv, bcast_packet->orig))
 		goto free_skb;

-	if (bcast_packet->ttl-- < 2)
+	/* create a copy of the skb, if needed, to modify it. */
+	if (skb_cow(skb, ETH_HLEN) < 0)
+		goto free_skb;
+
+	bcast_packet = (struct batadv_bcast_packet *)skb->data;
+
+	if (!batadv_skb_decrement_ttl(skb))
 		goto free_skb;

 	orig_node = batadv_orig_hash_find(bat_priv, bcast_packet->orig);
@@ -1298,7 +1357,7 @@ int batadv_recv_mcast_packet(struct sk_buff *skb,
 		goto free_skb;

 	mcast_packet = (struct batadv_mcast_packet *)skb->data;
-	if (mcast_packet->ttl-- < 2)
+	if (!batadv_skb_decrement_ttl(skb))
 		goto free_skb;

 	tvlv_buff = (unsigned char *)(skb->data + hdr_size);
@@ -1307,6 +1366,12 @@ int batadv_recv_mcast_packet(struct sk_buff *skb,
 	if (tvlv_buff_len > skb->len - hdr_size)
 		goto free_skb;

+	/* the fields of an multicast payload are accessed assuming (at least)
+	 * 2-byte alignment, so a following packet must start at an even offset.
+	 */
+	if (tvlv_buff_len & 1)
+		goto free_skb;
+
 	ret = batadv_tvlv_containers_process(bat_priv, BATADV_MCAST, NULL, skb,
 					     tvlv_buff, tvlv_buff_len);
 	if (ret >= 0) {
diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c
index 7e98cbfbbb70..c2eea7dbc488 100644
--- a/net/batman-adv/tp_meter.c
+++ b/net/batman-adv/tp_meter.c
@@ -87,6 +87,11 @@
 #define BATADV_TP_PLEN (BATADV_TP_PACKET_LEN - ETH_HLEN - \
 			sizeof(struct batadv_unicast_packet))

+/**
+ * BATADV_TP_MAX_UNACKED - maximum number of packets a receiver didn't yet ack
+ */
+#define BATADV_TP_MAX_UNACKED 100
+
 static u8 batadv_tp_prerandom[4096] __read_mostly;

 /**
@@ -1285,7 +1290,7 @@ static void batadv_tp_receiver_shutdown(struct timer_list *t)
 	bat_priv = tp_vars->common.bat_priv;

 	/* if there is recent activity rearm the timer */
-	if (!batadv_has_timed_out(tp_vars->last_recv_time,
+	if (!batadv_has_timed_out(READ_ONCE(tp_vars->last_recv_time),
 				  BATADV_TP_RECV_TIMEOUT)) {
 		/* reset the receiver shutdown timer */
 		batadv_tp_reset_receiver_timer(tp_vars);
@@ -1303,6 +1308,7 @@ static void batadv_tp_receiver_shutdown(struct timer_list *t)
 	list_for_each_entry_safe(un, safe, &tp_vars->common.unacked_list, list) {
 		list_del(&un->list);
 		kfree(un);
+		tp_vars->common.unacked_count--;
 	}
 	spin_unlock_bh(&tp_vars->common.unacked_lock);

@@ -1386,7 +1392,8 @@ out:
 /**
  * batadv_tp_handle_out_of_order() - store an out of order packet
  * @tp_vars: the private data of the current TP meter session
- * @skb: the buffer containing the received packet
+ * @seqno: sequence number of new received packet
+ * @payload_len: length of the received packet
  *
  * Store the out of order packet in the unacked list for late processing.
This
  * packets are kept in this list so that they can be ACKed at once as soon as
@@ -1395,28 +1402,24 @@ out:
  * Return: true if the packed has been successfully processed, false otherwise
  */
 static bool batadv_tp_handle_out_of_order(struct batadv_tp_receiver *tp_vars,
-					  const struct sk_buff *skb)
+					  u32 seqno, u32 payload_len)
+	__must_hold(&tp_vars->common.unacked_lock)
 {
-	const struct batadv_icmp_tp_packet *icmp;
 	struct batadv_tp_unacked *un, *new;
-	u32 payload_len;
 	bool added = false;

 	new = kmalloc_obj(*new, GFP_ATOMIC);
 	if (unlikely(!new))
 		return false;

-	icmp = (struct batadv_icmp_tp_packet *)skb->data;
-
-	new->seqno = ntohl(icmp->seqno);
-	payload_len = skb->len - sizeof(struct batadv_unicast_packet);
+	new->seqno = seqno;
 	new->len = payload_len;

-	spin_lock_bh(&tp_vars->common.unacked_lock);
 	/* if the list is empty immediately attach this new object */
 	if (list_empty(&tp_vars->common.unacked_list)) {
 		list_add(&new->list, &tp_vars->common.unacked_list);
-		goto out;
+		tp_vars->common.unacked_count++;
+		return true;
 	}

 	/* otherwise loop over the list and either drop the packet because this
@@ -1446,15 +1449,24 @@ static bool batadv_tp_handle_out_of_order(struct batadv_tp_receiver *tp_vars,
 		 */
 		list_add(&new->list, &un->list);
 		added = true;
+		tp_vars->common.unacked_count++;
 		break;
 	}

 	/* received packet with smallest seqno out of order; add it to front */
-	if (!added)
+	if (!added) {
 		list_add(&new->list, &tp_vars->common.unacked_list);
+		tp_vars->common.unacked_count++;
+	}

-out:
-	spin_unlock_bh(&tp_vars->common.unacked_lock);
+	/* remove the last (biggest) unacked seqno when list is too large */
+	if (tp_vars->common.unacked_count > BATADV_TP_MAX_UNACKED) {
+		un = list_last_entry(&tp_vars->common.unacked_list,
+				     struct batadv_tp_unacked, list);
+		list_del(&un->list);
+		kfree(un);
+		tp_vars->common.unacked_count--;
+	}

 	return true;
 }
@@ -1465,6 +1477,7 @@ out:
  * @tp_vars: the private data of the current TP meter session
  */
 static void
batadv_tp_ack_unordered(struct batadv_tp_receiver *tp_vars)
+	__must_hold(&tp_vars->common.unacked_lock)
 {
 	struct batadv_tp_unacked *un, *safe;
 	u32 to_ack;
@@ -1472,7 +1485,6 @@ static void batadv_tp_ack_unordered(struct batadv_tp_receiver *tp_vars)
 	/* go through the unacked packet list and possibly ACK them as
 	 * well
 	 */
-	spin_lock_bh(&tp_vars->common.unacked_lock);
 	list_for_each_entry_safe(un, safe, &tp_vars->common.unacked_list, list) {
 		/* the list is ordered, therefore it is possible to stop as soon
 		 * there is a gap between the last acked seqno and the seqno of
@@ -1488,8 +1500,8 @@ static void batadv_tp_ack_unordered(struct batadv_tp_receiver *tp_vars)

 		list_del(&un->list);
 		kfree(un);
+		tp_vars->common.unacked_count--;
 	}
-	spin_unlock_bh(&tp_vars->common.unacked_lock);
 }

 /**
@@ -1512,7 +1524,7 @@ batadv_tp_init_recv(struct batadv_priv *bat_priv,
 	tp_vars = batadv_tp_list_find_receiver_session(bat_priv, icmp->orig,
 						       icmp->session);
 	if (tp_vars) {
-		tp_vars->last_recv_time = jiffies;
+		WRITE_ONCE(tp_vars->last_recv_time, jiffies);
 		goto out_unlock;
 	}

@@ -1537,11 +1549,12 @@ batadv_tp_init_recv(struct batadv_priv *bat_priv,

 	spin_lock_init(&tp_vars->common.unacked_lock);
 	INIT_LIST_HEAD(&tp_vars->common.unacked_list);
+	tp_vars->common.unacked_count = 0;

 	kref_get(&tp_vars->common.refcount);
 	timer_setup(&tp_vars->common.timer, batadv_tp_receiver_shutdown, 0);

-	tp_vars->last_recv_time = jiffies;
+	WRITE_ONCE(tp_vars->last_recv_time, jiffies);

 	kref_get(&tp_vars->common.refcount);
 	hlist_add_head_rcu(&tp_vars->common.list, &bat_priv->tp_receiver_list);
@@ -1566,7 +1579,8 @@ static void batadv_tp_recv_msg(struct batadv_priv *bat_priv,
 {
 	const struct batadv_icmp_tp_packet *icmp;
 	struct batadv_tp_receiver *tp_vars;
-	size_t packet_size;
+	u32 payload_len;
+	u32 to_ack;
 	u32 seqno;

 	icmp = (struct batadv_icmp_tp_packet *)skb->data;
@@ -1592,40 +1606,48 @@ static void batadv_tp_recv_msg(struct batadv_priv *bat_priv,
 			goto out;
 		}

-		tp_vars->last_recv_time = jiffies;
+
WRITE_ONCE(tp_vars->last_recv_time, jiffies);
 	}

+	spin_lock_bh(&tp_vars->common.unacked_lock);
+
 	/* if the packet is a duplicate, it may be the case that an ACK has been
 	 * lost. Resend the ACK
 	 */
-	if (batadv_seq_before(seqno, tp_vars->last_recv))
+	payload_len = skb->len - sizeof(struct batadv_unicast_packet);
+	to_ack = seqno + payload_len;
+	if (batadv_seq_before(to_ack, tp_vars->last_recv))
 		goto send_ack;

 	/* if the packet is out of order enqueue it */
-	if (ntohl(icmp->seqno) != tp_vars->last_recv) {
+	if (batadv_seq_before(tp_vars->last_recv, seqno)) {
 		/* exit immediately (and do not send any ACK) if the packet has
 		 * not been enqueued correctly
 		 */
-		if (!batadv_tp_handle_out_of_order(tp_vars, skb))
+		if (!batadv_tp_handle_out_of_order(tp_vars, seqno, payload_len)) {
+			spin_unlock_bh(&tp_vars->common.unacked_lock);
 			goto out;
+		}

 		/* send a duplicate ACK */
 		goto send_ack;
 	}

 	/* if everything was fine count the ACKed bytes */
-	packet_size = skb->len - sizeof(struct batadv_unicast_packet);
-	tp_vars->last_recv += packet_size;
+	tp_vars->last_recv = to_ack;

 	/* check if this ordered message filled a gap.... */
 	batadv_tp_ack_unordered(tp_vars);

 send_ack:
+	to_ack = tp_vars->last_recv;
+	spin_unlock_bh(&tp_vars->common.unacked_lock);
+
 	/* send the ACK.
If the received packet was out of order, the ACK that
 	 * is going to be sent is a duplicate (the sender will count them and
 	 * possibly enter Fast Retransmit as soon as it has reached 3)
 	 */
-	batadv_tp_send_ack(bat_priv, icmp->orig, tp_vars->last_recv,
+	batadv_tp_send_ack(bat_priv, icmp->orig, to_ack,
 			   icmp->timestamp, icmp->session, icmp->uid);
 out:
 	batadv_tp_receiver_put(tp_vars);
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 8b6c49c32c89..4bfad36a4b70 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -447,6 +447,9 @@ static void batadv_tt_local_event(struct batadv_priv *bat_priv,
 		if (!batadv_compare_eth(entry->change.addr, common->addr))
 			continue;

+		if (entry->change.vid != tt_change_node->change.vid)
+			continue;
+
 		del_op_entry = entry->change.flags & BATADV_TT_CLIENT_DEL;
 		if (del_op_requested != del_op_entry) {
 			/* DEL+ADD in the same orig interval have no effect and
@@ -3447,6 +3450,7 @@ static void batadv_tt_roam_purge(struct batadv_priv *bat_priv)
  * batadv_tt_check_roam_count() - check if a client has roamed too frequently
  * @bat_priv: the bat priv with all the mesh interface information
  * @client: mac address of the roaming client
+ * @vid: VLAN identifier
  *
  * This function checks whether the client already reached the
  * maximum number of possible roaming phases.
In this case the ROAMING_ADV
@@ -3454,7 +3458,7 @@ static void batadv_tt_roam_purge(struct batadv_priv *bat_priv)
  *
  * Return: true if the ROAMING_ADV can be sent, false otherwise
  */
-static bool batadv_tt_check_roam_count(struct batadv_priv *bat_priv, u8 *client)
+static bool batadv_tt_check_roam_count(struct batadv_priv *bat_priv, u8 *client, u16 vid)
 {
 	struct batadv_tt_roam_node *tt_roam_node;
 	bool ret = false;
@@ -3467,6 +3471,9 @@ static bool batadv_tt_check_roam_count(struct batadv_priv *bat_priv, u8 *client)
 		if (!batadv_compare_eth(tt_roam_node->addr, client))
 			continue;

+		if (tt_roam_node->vid != vid)
+			continue;
+
 		if (batadv_has_timed_out(tt_roam_node->first_time,
 					 BATADV_ROAMING_MAX_TIME))
 			continue;
@@ -3488,6 +3495,7 @@ static bool batadv_tt_check_roam_count(struct batadv_priv *bat_priv, u8 *client)
 		atomic_set(&tt_roam_node->counter,
 			   BATADV_ROAMING_MAX_COUNT - 1);
 		ether_addr_copy(tt_roam_node->addr, client);
+		tt_roam_node->vid = vid;

 		list_add(&tt_roam_node->list, &bat_priv->tt.roam_list);
 		ret = true;
@@ -3524,7 +3532,7 @@ static void batadv_send_roam_adv(struct batadv_priv *bat_priv, u8 *client,
 	/* before going on we have to check whether the client has
 	 * already roamed to us too many times
 	 */
-	if (!batadv_tt_check_roam_count(bat_priv, client))
+	if (!batadv_tt_check_roam_count(bat_priv, client, vid))
 		goto out;

 	batadv_dbg(BATADV_DBG_TT, bat_priv,
diff --git a/net/batman-adv/tvlv.c b/net/batman-adv/tvlv.c
index 403c85456870..1c9fb21985f6 100644
--- a/net/batman-adv/tvlv.c
+++ b/net/batman-adv/tvlv.c
@@ -411,7 +411,6 @@ static int batadv_tvlv_call_handler(struct batadv_priv *bat_priv,
 		tvlv_handler->ogm_handler(bat_priv, orig_node,
 					  BATADV_NO_FLAGS,
 					  tvlv_value, tvlv_value_len);
-		tvlv_handler->flags |= BATADV_TVLV_HANDLER_OGM_CALLED;
 		break;
 	case BATADV_UNICAST_TVLV:
 		if (!skb)
@@ -444,6 +443,48 @@ static int batadv_tvlv_call_handler(struct batadv_priv *bat_priv,
 }

 /**
+ * batadv_tvlv_containers_contain() - check if a tvlv buffer holds a container
+ *
@tvlv_value: tvlv content
+ * @tvlv_value_len: tvlv content length
+ * @type: tvlv container type to look for
+ * @version: tvlv container version to look for
+ *
+ * Return: true if a container of the given type and version is present in the
+ * tvlv buffer, false otherwise.
+ */
+static bool batadv_tvlv_containers_contain(void *tvlv_value,
+					   u16 tvlv_value_len, u8 type,
+					   u8 version)
+{
+	struct batadv_tvlv_hdr *tvlv_hdr;
+	u16 tvlv_value_cont_len;
+
+	while (tvlv_value_len >= sizeof(*tvlv_hdr)) {
+		tvlv_hdr = tvlv_value;
+		tvlv_value_cont_len = ntohs(tvlv_hdr->len);
+		tvlv_value = tvlv_hdr + 1;
+		tvlv_value_len -= sizeof(*tvlv_hdr);
+
+		if (tvlv_value_cont_len > tvlv_value_len)
+			break;
+
+		/* the next tvlv header is accessed assuming (at least) 2-byte
+		 * alignment, so it must start at an even offset.
+		 */
+		if (tvlv_value_cont_len & 1)
+			break;
+
+		if (tvlv_hdr->type == type && tvlv_hdr->version == version)
+			return true;
+
+		tvlv_value = (u8 *)tvlv_value + tvlv_value_cont_len;
+		tvlv_value_len -= tvlv_value_cont_len;
+	}
+
+	return false;
+}
+
+/**
  * batadv_tvlv_containers_process() - parse the given tvlv buffer to call the
  * appropriate handlers
  * @bat_priv: the bat priv with all the mesh interface information
@@ -462,7 +503,9 @@ int batadv_tvlv_containers_process(struct batadv_priv *bat_priv,
 				   struct sk_buff *skb, void *tvlv_value,
 				   u16 tvlv_value_len)
 {
+	u16 tvlv_value_start_len = tvlv_value_len;
 	struct batadv_tvlv_handler *tvlv_handler;
+	void *tvlv_value_start = tvlv_value;
 	struct batadv_tvlv_hdr *tvlv_hdr;
 	u16 tvlv_value_cont_len;
 	u8 cifnotfound = BATADV_TVLV_HANDLER_OGM_CIFNOTFND;
@@ -477,6 +520,12 @@ int batadv_tvlv_containers_process(struct batadv_priv *bat_priv,
 		if (tvlv_value_cont_len > tvlv_value_len)
 			break;

+		/* the next tvlv header is accessed assuming (at least) 2-byte
+		 * alignment, so it must start at an even offset.
+		 */
+		if (tvlv_value_cont_len & 1)
+			break;
+
 		tvlv_handler = batadv_tvlv_handler_get(bat_priv,
 						       tvlv_hdr->type,
 						       tvlv_hdr->version);
@@ -500,12 +549,20 @@ int batadv_tvlv_containers_process(struct batadv_priv *bat_priv,
 		if (!tvlv_handler->ogm_handler)
 			continue;

-		if ((tvlv_handler->flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND) &&
-		    !(tvlv_handler->flags & BATADV_TVLV_HANDLER_OGM_CALLED))
-			tvlv_handler->ogm_handler(bat_priv, orig_node,
-						  cifnotfound, NULL, 0);
+		if (!(tvlv_handler->flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND))
+			continue;
+
+		/* if the corresponding container was present then the handler
+		 * was already called from the loop above
+		 */
+		if (batadv_tvlv_containers_contain(tvlv_value_start,
+						   tvlv_value_start_len,
+						   tvlv_handler->type,
+						   tvlv_handler->version))
+			continue;

-		tvlv_handler->flags &= ~BATADV_TVLV_HANDLER_OGM_CALLED;
+		tvlv_handler->ogm_handler(bat_priv, orig_node,
+					  cifnotfound, NULL, 0);
 	}
 	rcu_read_unlock();

diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 5fd5bd358a24..b1f9f8964c3f 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -145,6 +145,12 @@ struct batadv_hard_iface_bat_v {
 	/** @aggr_list: queue for to be aggregated OGM packets */
 	struct sk_buff_head aggr_list;

+	/**
+	 * @aggr_list_enabled: aggr_list is active and new skbs can be
+	 * enqueued.
Protected by aggr_list.lock after initialization
+	 */
+	bool aggr_list_enabled:1;
+
 	/** @aggr_len: size of the OGM aggregate (excluding ethernet header) */
 	unsigned int aggr_len;

@@ -1357,9 +1363,12 @@ struct batadv_tp_vars_common {
 	/** @unacked_list: list of unacked packets (meta-info only) */
 	struct list_head unacked_list;

-	/** @unacked_lock: protect unacked_list */
+	/** @unacked_lock: protect unacked_list + &batadv_tp_receiver.last_recv */
 	spinlock_t unacked_lock;

+	/** @unacked_count: number of unacked entries */
+	size_t unacked_count;
+
 	/** @refcount: number of context where the object is used */
 	struct kref refcount;

@@ -1952,6 +1961,9 @@ struct batadv_tt_roam_node {
 	/** @addr: mac address of the client in the roaming phase */
 	u8 addr[ETH_ALEN];

+	/** @vid: VLAN identifier */
+	u16 vid;
+
 	/**
 	 * @counter: number of allowed roaming events per client within a single
 	 * OGM interval (changes are committed with each OGM)
@@ -2282,13 +2294,6 @@ enum batadv_tvlv_handler_flags {
 	 * will call this handler even if its type was not found (with no data)
 	 */
 	BATADV_TVLV_HANDLER_OGM_CIFNOTFND = BIT(1),
-
-	/**
-	 * @BATADV_TVLV_HANDLER_OGM_CALLED: interval tvlv handling flag - the
-	 * API marks a handler as being called, so it won't be called if the
-	 * BATADV_TVLV_HANDLER_OGM_CIFNOTFND flag was set
-	 */
-	BATADV_TVLV_HANDLER_OGM_CALLED = BIT(2),
 };

 #endif /* _NET_BATMAN_ADV_TYPES_H_ */
diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index 219c40680260..e4c9aa1f64e2 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -44,7 +44,9 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 		if (!br_dev || !br_vlan_enabled(br_dev))
 			goto err;

-		br_vlan_get_pvid_rcu(in, &p_pvid);
+		if (br_vlan_get_pvid_rcu(in, &p_pvid))
+			goto err;
+
 		nft_reg_store16(dest, p_pvid);
 		return;
 	}
@@ -107,12 +109,30 @@ static int nft_meta_bridge_get_init(const struct nft_ctx *ctx,
 					NULL, NFT_DATA_VALUE,
len); } +static int nft_meta_bridge_get_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr) +{ + struct nft_meta *priv = nft_expr_priv(expr); + unsigned int hooks; + + switch (priv->key) { + case NFT_META_BRI_IIFHWADDR: + hooks = 1 << NF_BR_PRE_ROUTING; + break; + default: + return nft_meta_get_validate(ctx, expr); + } + + return nft_chain_validate_hooks(ctx->chain, hooks); +} + static struct nft_expr_type nft_meta_bridge_type; static const struct nft_expr_ops nft_meta_bridge_get_ops = { .type = &nft_meta_bridge_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_meta)), .eval = nft_meta_bridge_get_eval, .init = nft_meta_bridge_get_init, + .validate = nft_meta_bridge_get_validate, .dump = nft_meta_get_dump, }; @@ -168,7 +188,6 @@ static int nft_meta_bridge_set_validate(const struct nft_ctx *ctx, switch (priv->key) { case NFT_META_BRI_BROUTE: - case NFT_META_BRI_IIFHWADDR: hooks = 1 << NF_BR_PRE_ROUTING; break; default: diff --git a/net/core/Makefile b/net/core/Makefile index dc17c5a61e9a..b3fdcb4e355f 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -13,7 +13,7 @@ obj-y += dev.o dev_api.o dev_addr_lists.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o \ sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \ fib_notifier.o xdp.o flow_offload.o gro.o \ - netdev-genl.o netdev-genl-gen.o gso.o + netdev-genl.o netdev-genl-gen.o netdev_work.o gso.o obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o diff --git a/net/core/dev.c b/net/core/dev.c index 5c01dfaa6c44..4b3d5cfdf6e0 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9822,6 +9822,7 @@ int netif_change_flags(struct net_device *dev, unsigned int flags, __dev_notify_flags(dev, old_flags, changes, 0, NULL); return ret; } +EXPORT_SYMBOL(netif_change_flags); int __netif_set_mtu(struct net_device *dev, int new_mtu) { @@ -12093,6 +12094,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(&dev->ptype_all); 
INIT_LIST_HEAD(&dev->ptype_specific); INIT_LIST_HEAD(&dev->net_notifier_list); + INIT_LIST_HEAD(&dev->work_node); #ifdef CONFIG_NET_SCHED hash_init(dev->qdisc_hash); #endif diff --git a/net/core/dev.h b/net/core/dev.h index 4121c50e7c88..5d0b0305d3ba 100644 --- a/net/core/dev.h +++ b/net/core/dev.h @@ -167,10 +167,19 @@ int dev_change_carrier(struct net_device *dev, bool new_carrier); void __dev_set_rx_mode(struct net_device *dev); int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify); void netif_rx_mode_init(struct net_device *dev); -bool netif_rx_mode_clean(struct net_device *dev); +void netif_rx_mode_run(struct net_device *dev); void netif_rx_mode_sync(struct net_device *dev); void netif_rx_mode_cancel_retry(struct net_device *dev); +/* Events for the async netdev work, tracked in netdev->work_core_pending. */ +enum netdev_work_core { + NETDEV_WORK_RX_MODE = BIT(0), /* run the rx_mode update */ +}; + +void __netdev_work_core_sched(struct net_device *dev, unsigned long event); +unsigned long +__netdev_work_core_cancel(struct net_device *dev, unsigned long mask); + void __dev_notify_flags(struct net_device *dev, unsigned int old_flags, unsigned int gchanges, u32 portid, const struct nlmsghdr *nlh); diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c index e17f64a65e17..08528ca0a8b3 100644 --- a/net/core/dev_addr_lists.c +++ b/net/core/dev_addr_lists.c @@ -12,17 +12,10 @@ #include <linux/export.h> #include <linux/list.h> #include <linux/spinlock.h> -#include <linux/workqueue.h> #include <kunit/visibility.h> #include "dev.h" -static void netdev_rx_mode_work(struct work_struct *work); - -static LIST_HEAD(rx_mode_list); -static DEFINE_SPINLOCK(rx_mode_lock); -static DECLARE_WORK(rx_mode_work, netdev_rx_mode_work); - /* * General list handling functions */ @@ -1281,7 +1274,7 @@ void netif_rx_mode_cancel_retry(struct net_device *dev) dev->rx_mode_retry_count = 0; } -static void netif_rx_mode_run(struct net_device *dev) +void 
netif_rx_mode_run(struct net_device *dev) { struct netdev_hw_addr_list uc_snap, mc_snap, uc_ref, mc_ref; const struct net_device_ops *ops = dev->netdev_ops; @@ -1339,49 +1332,9 @@ static void netif_rx_mode_run(struct net_device *dev) } } -static void netdev_rx_mode_work(struct work_struct *work) -{ - struct net_device *dev; - - rtnl_lock(); - - while (true) { - spin_lock_bh(&rx_mode_lock); - if (list_empty(&rx_mode_list)) { - spin_unlock_bh(&rx_mode_lock); - break; - } - dev = list_first_entry(&rx_mode_list, struct net_device, - rx_mode_node); - list_del_init(&dev->rx_mode_node); - /* We must free netdev tracker under - * the spinlock protection. - */ - netdev_tracker_free(dev, &dev->rx_mode_tracker); - spin_unlock_bh(&rx_mode_lock); - - netdev_lock_ops(dev); - netif_rx_mode_run(dev); - netdev_unlock_ops(dev); - /* Use __dev_put() because netdev_tracker_free() was already - * called above. Must be after netdev_unlock_ops() to prevent - * netdev_run_todo() from freeing the device while still in use. 
- */ - __dev_put(dev); - } - - rtnl_unlock(); -} - static void netif_rx_mode_queue(struct net_device *dev) { - spin_lock_bh(&rx_mode_lock); - if (list_empty(&dev->rx_mode_node)) { - list_add_tail(&dev->rx_mode_node, &rx_mode_list); - netdev_hold(dev, &dev->rx_mode_tracker, GFP_ATOMIC); - } - spin_unlock_bh(&rx_mode_lock); - schedule_work(&rx_mode_work); + __netdev_work_core_sched(dev, NETDEV_WORK_RX_MODE); } static void netif_rx_mode_retry(struct timer_list *t) @@ -1394,7 +1347,6 @@ static void netif_rx_mode_retry(struct timer_list *t) void netif_rx_mode_init(struct net_device *dev) { - INIT_LIST_HEAD(&dev->rx_mode_node); __hw_addr_init(&dev->rx_mode_addr_cache); timer_setup(&dev->rx_mode_retry_timer, netif_rx_mode_retry, 0); } @@ -1442,24 +1394,6 @@ void dev_set_rx_mode(struct net_device *dev) netif_addr_unlock_bh(dev); } -bool netif_rx_mode_clean(struct net_device *dev) -{ - bool clean = false; - - spin_lock_bh(&rx_mode_lock); - if (!list_empty(&dev->rx_mode_node)) { - list_del_init(&dev->rx_mode_node); - clean = true; - /* We must release netdev tracker under - * the spinlock protection. - */ - netdev_tracker_free(dev, &dev->rx_mode_tracker); - } - spin_unlock_bh(&rx_mode_lock); - - return clean; -} - /** * netif_rx_mode_sync() - sync rx mode inline * @dev: network device @@ -1473,11 +1407,6 @@ bool netif_rx_mode_clean(struct net_device *dev) */ void netif_rx_mode_sync(struct net_device *dev) { - if (netif_rx_mode_clean(dev)) { + if (__netdev_work_core_cancel(dev, NETDEV_WORK_RX_MODE)) netif_rx_mode_run(dev); - /* Use __dev_put() because netdev_tracker_free() was already - * called inside netif_rx_mode_clean(). 
- */ - __dev_put(dev); - } } diff --git a/net/core/filter.c b/net/core/filter.c index 2e96b4b847ce..69ec1a4c0f9d 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4472,7 +4472,7 @@ u32 xdp_master_redirect(struct xdp_buff *xdp) struct net_device *master, *slave; master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev); - if (unlikely(!(master->flags & IFF_UP))) + if (unlikely(!master || !(master->flags & IFF_UP))) return XDP_ABORTED; slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); if (slave && slave != xdp->rxq->dev) { diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 2a98f5fa74eb..8aa4f9b4df81 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -1173,13 +1173,21 @@ bool __skb_flow_dissect(const struct net *net, if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) { - struct ethhdr *eth = eth_hdr(skb); struct flow_dissector_key_eth_addrs *key_eth_addrs; key_eth_addrs = skb_flow_dissector_target(flow_dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS, target_container); - memcpy(key_eth_addrs, eth, sizeof(*key_eth_addrs)); + /* TC filter blocks can be shared across devices with + * different link types, so we cannot validate this + * when the filter is installed -- check at dissect time. + */ + if (skb && skb->dev && + skb->dev->type == ARPHRD_ETHER && + skb_mac_header_was_set(skb)) + memcpy(key_eth_addrs, eth_hdr(skb), sizeof(*key_eth_addrs)); + else + memset(key_eth_addrs, 0, sizeof(*key_eth_addrs)); } if (dissector_uses_key(flow_dissector, diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c index f9d76d85d04f..b01a395d9a96 100644 --- a/net/core/lwtunnel.c +++ b/net/core/lwtunnel.c @@ -350,6 +350,8 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb) rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->output)) { + /* Encap pushes outer headers over the metadata; drop it. 
*/ + skb_metadata_clear(skb); dev_xmit_recursion_inc(); ret = ops->output(net, sk, skb); dev_xmit_recursion_dec(); @@ -404,6 +406,8 @@ int lwtunnel_xmit(struct sk_buff *skb) rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->xmit)) { + /* Encap pushes outer headers over the metadata; drop it. */ + skb_metadata_clear(skb); dev_xmit_recursion_inc(); ret = ops->xmit(skb); dev_xmit_recursion_dec(); @@ -455,6 +459,8 @@ int lwtunnel_input(struct sk_buff *skb) rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[lwtstate->type]); if (likely(ops && ops->input)) { + /* Encap pushes outer headers over the metadata; drop it. */ + skb_metadata_clear(skb); dev_xmit_recursion_inc(); ret = ops->input(skb); dev_xmit_recursion_dec(); diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 11b0b91683d7..c15d8d4ca1f8 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -2,6 +2,7 @@ #include <linux/netdevice.h> #include <linux/notifier.h> +#include <linux/pid_namespace.h> #include <linux/rtnetlink.h> #include <net/busy_poll.h> #include <net/net_namespace.h> @@ -189,7 +190,8 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi, goto nla_put_failure; if (napi->thread) { - pid = task_pid_nr(napi->thread); + pid = task_pid_nr_ns(napi->thread, + task_active_pid_ns(current)); if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid)) goto nla_put_failure; } diff --git a/net/core/netdev_work.c b/net/core/netdev_work.c new file mode 100644 index 000000000000..3109fae132ad --- /dev/null +++ b/net/core/netdev_work.c @@ -0,0 +1,162 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include <linux/export.h> +#include <linux/list.h> +#include <linux/netdevice.h> +#include <linux/rtnetlink.h> +#include <linux/spinlock.h> +#include <linux/workqueue.h> +#include <net/netdev_lock.h> + +#include "dev.h" + +static void netdev_work_proc(struct work_struct *work); + +/* @netdev_work_lock protects: + * - @netdev_work_list + * - 
within the list entries (struct net_device fields): + * - work_node + * - work_tracker + * - work_pending + * - work_core_pending + */ +static LIST_HEAD(netdev_work_list); +static DEFINE_SPINLOCK(netdev_work_lock); +static DECLARE_WORK(netdev_work, netdev_work_proc); + +static void netdev_work_enqueue(struct net_device *dev, unsigned long events, + unsigned long core) +{ + if (!events && !core) + return; + + spin_lock_bh(&netdev_work_lock); + if (list_empty(&dev->work_node)) { + list_add_tail(&dev->work_node, &netdev_work_list); + netdev_hold(dev, &dev->work_tracker, GFP_ATOMIC); + } + dev->work_pending |= events; + dev->work_core_pending |= core; + spin_unlock_bh(&netdev_work_lock); + + schedule_work(&netdev_work); +} + +static unsigned long +netdev_work_dequeue(struct net_device *dev, unsigned long *pending, + unsigned long mask) +{ + unsigned long events; + + spin_lock_bh(&netdev_work_lock); + events = *pending & mask; + *pending &= ~events; + if (!list_empty(&dev->work_node) && + !dev->work_pending && !dev->work_core_pending) { + list_del_init(&dev->work_node); + netdev_put(dev, &dev->work_tracker); + } + spin_unlock_bh(&netdev_work_lock); + + return events; +} + +void netdev_work_sched(struct net_device *dev, unsigned long events) +{ + netdev_work_enqueue(dev, events, 0); +} +EXPORT_SYMBOL(netdev_work_sched); + +/** + * netdev_work_cancel() - cancel selected work for a netdev + * @dev: net_device + * @mask: events to cancel + * + * Clear @mask from the device's work pending mask. If no work is left pending + * the device is dequeued and its ndo_work won't be called. + * + * No expectations on locking, but also no guarantees provided. If the caller + * wants to touch @dev afterwards (e.g. call the work that got canceled) + * they have to ensure @dev does not get freed. + * + * Returns: the subset of @mask that was actually pending, so the caller can run + * those events inline. 
+ */ +unsigned long netdev_work_cancel(struct net_device *dev, unsigned long mask) +{ + return netdev_work_dequeue(dev, &dev->work_pending, mask); +} +EXPORT_SYMBOL(netdev_work_cancel); + +void __netdev_work_core_sched(struct net_device *dev, unsigned long events) +{ + netdev_work_enqueue(dev, 0, events); +} + +unsigned long +__netdev_work_core_cancel(struct net_device *dev, unsigned long mask) +{ + return netdev_work_dequeue(dev, &dev->work_core_pending, mask); +} + +static void netdev_work_run(struct net_device *dev, unsigned long events, + unsigned long core) +{ + if (!netif_device_present(dev)) + return; + + if (core & NETDEV_WORK_RX_MODE) + netif_rx_mode_run(dev); + if (events && dev->netdev_ops->ndo_work) + dev->netdev_ops->ndo_work(dev, events); +} + +static void netdev_work_proc(struct work_struct *work) +{ + rtnl_lock(); + + while (true) { + unsigned long events = 0, core = 0; + netdevice_tracker tracker; + struct net_device *dev; + + spin_lock_bh(&netdev_work_lock); + if (list_empty(&netdev_work_list)) { + spin_unlock_bh(&netdev_work_lock); + break; + } + dev = list_first_entry(&netdev_work_list, struct net_device, + work_node); + /* Take a temporary reference so @dev can't be freed while we + * drop the lock to grab its ops lock; the work reference is + * only released once we claim the work below. + * The re-locking dance is to ensure that ops lock is enough + * to ensure canceling work is not racy with dequeue. 
+ */ + netdev_hold(dev, &tracker, GFP_ATOMIC); + spin_unlock_bh(&netdev_work_lock); + + netdev_lock_ops(dev); + spin_lock_bh(&netdev_work_lock); + if (!list_empty(&dev->work_node)) { + list_del_init(&dev->work_node); + core = dev->work_core_pending; + dev->work_core_pending = 0; + events = dev->work_pending; + dev->work_pending = 0; + /* We took another ref above */ + netdev_put(dev, &dev->work_tracker); + + if (!dev_isalive(dev)) + core = events = 0; + } + spin_unlock_bh(&netdev_work_lock); + + netdev_work_run(dev, events, core); + netdev_unlock_ops(dev); + + netdev_put(dev, &tracker); + } + + rtnl_unlock(); +} diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 61d095ce1b3b..12aa3aa1688b 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2438,6 +2438,14 @@ struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid) } EXPORT_SYMBOL_GPL(rtnl_get_net_ns_capable); +bool rtnl_dev_link_net_capable(const struct net_device *dev, + const struct net *link_net) +{ + return net_eq(link_net, dev_net(dev)) || + ns_capable(link_net->user_ns, CAP_NET_ADMIN); +} +EXPORT_SYMBOL_GPL(rtnl_dev_link_net_capable); + static int rtnl_valid_dump_ifinfo_req(const struct nlmsghdr *nlh, bool strict_check, struct nlattr **tb, struct netlink_ext_ack *extack) diff --git a/net/devlink/rate.c b/net/devlink/rate.c index 41be2d6c2954..533d21b028a7 100644 --- a/net/devlink/rate.c +++ b/net/devlink/rate.c @@ -486,16 +486,19 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate, devlink_rate->tx_weight = weight; } - nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME]; - if (nla_parent) { - err = devlink_nl_rate_parent_node_set(devlink_rate, info, - nla_parent); + if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) { + err = devlink_nl_rate_tc_bw_set(devlink_rate, info); if (err) return err; } - if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) { - err = devlink_nl_rate_tc_bw_set(devlink_rate, info); + /* Keep parent setting last because it takes a reference. 
This function + * has no rollback, so failing after taking the ref would leak it. + */ + nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME]; + if (nla_parent) { + err = devlink_nl_rate_parent_node_set(devlink_rate, info, + nla_parent); if (err) return err; } @@ -725,11 +728,6 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, if (!rate_node) return ERR_PTR(-ENOMEM); - if (parent) { - rate_node->parent = parent; - refcount_inc(&rate_node->parent->refcnt); - } - rate_node->type = DEVLINK_RATE_TYPE_NODE; rate_node->devlink = devlink; rate_node->priv = priv; @@ -740,6 +738,11 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, return ERR_PTR(-ENOMEM); } + if (parent) { + rate_node->parent = parent; + refcount_inc(&rate_node->parent->refcnt); + } + refcount_set(&rate_node->refcnt, 1); list_add(&rate_node->list, &devlink->rate_list); devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); diff --git a/net/ethtool/common.h b/net/ethtool/common.h index 2b3847f00801..4e5356e26f40 100644 --- a/net/ethtool/common.h +++ b/net/ethtool/common.h @@ -113,6 +113,8 @@ ethtool_nl_msg_needs_rtnl(const struct net_device *dev, u8 cmd) return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM; case ETHTOOL_MSG_RSS_SET: return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS; + case ETHTOOL_MSG_LINKSTATE_GET: + return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK; case ETHTOOL_MSG_TSCONFIG_GET: case ETHTOOL_MSG_TSCONFIG_SET: /* tsconfig calls ndos (ndo_hwtstamp_set/get), not ethtool ops. 
@@ -159,6 +161,8 @@ ethtool_ioctl_needs_rtnl(const struct net_device *dev, u32 ethcmd) case ETHTOOL_SRXFH: case ETHTOOL_SRXFHINDIR: return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS; + case ETHTOOL_GLINK: + return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK; } return false; } diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c index 89b671b12600..c0b8712018a1 100644 --- a/net/ieee802154/core.c +++ b/net/ieee802154/core.c @@ -228,36 +228,43 @@ int cfg802154_switch_netns(struct cfg802154_registered_device *rdev, continue; wpan_dev->netdev->netns_immutable = false; err = dev_change_net_namespace(wpan_dev->netdev, net, "wpan%d"); - if (err) + if (err) { + WARN_ON(err && err != -ENOMEM); break; + } wpan_dev->netdev->netns_immutable = true; } - if (err) { - /* failed -- clean up to old netns */ - net = wpan_phy_net(&rdev->wpan_phy); - - list_for_each_entry_continue_reverse(wpan_dev, - &rdev->wpan_dev_list, - list) { - if (!wpan_dev->netdev) - continue; - wpan_dev->netdev->netns_immutable = false; - err = dev_change_net_namespace(wpan_dev->netdev, net, - "wpan%d"); - WARN_ON(err); - wpan_dev->netdev->netns_immutable = true; - } + if (err) + goto errout; - return err; - } + err = device_rename(&rdev->wpan_phy.dev, dev_name(&rdev->wpan_phy.dev)); + WARN_ON(err && err != -ENOMEM); - wpan_phy_net_set(&rdev->wpan_phy, net); + if (err) + goto errout; - err = device_rename(&rdev->wpan_phy.dev, dev_name(&rdev->wpan_phy.dev)); - WARN_ON(err); + wpan_phy_net_set(&rdev->wpan_phy, net); return 0; + +errout: + /* failed -- clean up to old netns */ + net = wpan_phy_net(&rdev->wpan_phy); + + list_for_each_entry_continue_reverse(wpan_dev, + &rdev->wpan_dev_list, + list) { + if (!wpan_dev->netdev) + continue; + wpan_dev->netdev->netns_immutable = false; + err = dev_change_net_namespace(wpan_dev->netdev, net, + "wpan%d"); + WARN_ON(err && err != -ENOMEM); + wpan_dev->netdev->netns_immutable = true; + } + + return err; } void cfg802154_dev_free(struct 
cfg802154_registered_device *rdev) @@ -351,7 +358,7 @@ static void __net_exit cfg802154_pernet_exit(struct net *net) rtnl_lock(); list_for_each_entry(rdev, &cfg802154_rdev_list, list) { if (net_eq(wpan_phy_net(&rdev->wpan_phy), net)) - WARN_ON(cfg802154_switch_netns(rdev, &init_net)); + cfg802154_switch_netns(rdev, &init_net); } rtnl_unlock(); } diff --git a/net/ieee802154/header_ops.c b/net/ieee802154/header_ops.c index 41a556be1017..a9f0c8df5ae4 100644 --- a/net/ieee802154/header_ops.c +++ b/net/ieee802154/header_ops.c @@ -173,10 +173,13 @@ ieee802154_hdr_get_addr(const u8 *buf, int mode, bool omit_pan, { int pos = 0; - addr->mode = mode; - - if (mode == IEEE802154_ADDR_NONE) + if (mode == IEEE802154_ADDR_NONE) { + memset(addr, 0, sizeof(*addr)); + addr->mode = IEEE802154_ADDR_NONE; return 0; + } + + addr->mode = mode; if (!omit_pan) { memcpy(&addr->pan_id, buf + pos, 2); diff --git a/net/ieee802154/ieee802154.h b/net/ieee802154/ieee802154.h index c5d91f78301a..e765adc4b88f 100644 --- a/net/ieee802154/ieee802154.h +++ b/net/ieee802154/ieee802154.h @@ -16,6 +16,15 @@ void ieee802154_nl_exit(void); .flags = GENL_ADMIN_PERM, \ } +#define IEEE802154_OP_RELAXED(_cmd, _func) \ + { \ + .cmd = _cmd, \ + .doit = _func, \ + .dumpit = NULL, \ + .flags = GENL_ADMIN_PERM, \ + .validate = GENL_DONT_VALIDATE_STRICT,\ + } + #define IEEE802154_DUMP(_cmd, _func, _dump) \ { \ .cmd = _cmd, \ @@ -23,6 +32,14 @@ void ieee802154_nl_exit(void); .dumpit = _dump, \ } +#define IEEE802154_DUMP_PRIV(_cmd, _func, _dump) \ + { \ + .cmd = _cmd, \ + .doit = _func, \ + .dumpit = _dump, \ + .flags = GENL_ADMIN_PERM, \ + } + struct genl_info; struct sk_buff *ieee802154_nl_create(int flags, u8 req); diff --git a/net/ieee802154/netlink.c b/net/ieee802154/netlink.c index 7d2de4ee6992..cacad21347ec 100644 --- a/net/ieee802154/netlink.c +++ b/net/ieee802154/netlink.c @@ -98,24 +98,24 @@ static const struct genl_small_ops ieee802154_ops[] = { IEEE802154_OP(IEEE802154_SET_MACPARAMS, 
ieee802154_set_macparams), IEEE802154_OP(IEEE802154_LLSEC_GETPARAMS, ieee802154_llsec_getparams), IEEE802154_OP(IEEE802154_LLSEC_SETPARAMS, ieee802154_llsec_setparams), - IEEE802154_DUMP(IEEE802154_LLSEC_LIST_KEY, NULL, - ieee802154_llsec_dump_keys), - IEEE802154_OP(IEEE802154_LLSEC_ADD_KEY, ieee802154_llsec_add_key), - IEEE802154_OP(IEEE802154_LLSEC_DEL_KEY, ieee802154_llsec_del_key), - IEEE802154_DUMP(IEEE802154_LLSEC_LIST_DEV, NULL, - ieee802154_llsec_dump_devs), - IEEE802154_OP(IEEE802154_LLSEC_ADD_DEV, ieee802154_llsec_add_dev), - IEEE802154_OP(IEEE802154_LLSEC_DEL_DEV, ieee802154_llsec_del_dev), - IEEE802154_DUMP(IEEE802154_LLSEC_LIST_DEVKEY, NULL, - ieee802154_llsec_dump_devkeys), - IEEE802154_OP(IEEE802154_LLSEC_ADD_DEVKEY, ieee802154_llsec_add_devkey), - IEEE802154_OP(IEEE802154_LLSEC_DEL_DEVKEY, ieee802154_llsec_del_devkey), - IEEE802154_DUMP(IEEE802154_LLSEC_LIST_SECLEVEL, NULL, - ieee802154_llsec_dump_seclevels), - IEEE802154_OP(IEEE802154_LLSEC_ADD_SECLEVEL, - ieee802154_llsec_add_seclevel), - IEEE802154_OP(IEEE802154_LLSEC_DEL_SECLEVEL, - ieee802154_llsec_del_seclevel), + IEEE802154_DUMP_PRIV(IEEE802154_LLSEC_LIST_KEY, NULL, + ieee802154_llsec_dump_keys), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_ADD_KEY, ieee802154_llsec_add_key), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_DEL_KEY, ieee802154_llsec_del_key), + IEEE802154_DUMP_PRIV(IEEE802154_LLSEC_LIST_DEV, NULL, + ieee802154_llsec_dump_devs), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_ADD_DEV, ieee802154_llsec_add_dev), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_DEL_DEV, ieee802154_llsec_del_dev), + IEEE802154_DUMP_PRIV(IEEE802154_LLSEC_LIST_DEVKEY, NULL, + ieee802154_llsec_dump_devkeys), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_ADD_DEVKEY, ieee802154_llsec_add_devkey), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_DEL_DEVKEY, ieee802154_llsec_del_devkey), + IEEE802154_DUMP_PRIV(IEEE802154_LLSEC_LIST_SECLEVEL, NULL, + ieee802154_llsec_dump_seclevels), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_ADD_SECLEVEL, + 
ieee802154_llsec_add_seclevel), + IEEE802154_OP_RELAXED(IEEE802154_LLSEC_DEL_SECLEVEL, + ieee802154_llsec_del_seclevel), }; static const struct genl_multicast_group ieee802154_mcgrps[] = { diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 208dd48012d9..3efdfb4ffa21 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -1457,6 +1457,9 @@ static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[], __u32 fwmark = t->fwmark; int err; + if (!rtnl_dev_link_net_capable(dev, t->net)) + return -EPERM; + err = ipgre_newlink_encap_setup(dev, data); if (err) return err; @@ -1486,6 +1489,9 @@ static int erspan_changelink(struct net_device *dev, struct nlattr *tb[], __u32 fwmark = t->fwmark; int err; + if (!rtnl_dev_link_net_capable(dev, t->net)) + return -EPERM; + err = ipgre_newlink_encap_setup(dev, data); if (err) return err; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 3b4e9b8af044..e6dd1e5b8c32 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1116,8 +1116,8 @@ alloc_new_skb: !(rt->dst.dev->features & NETIF_F_SG))) alloclen = fraglen; else { - alloclen = fragheaderlen + transhdrlen; - pagedlen = datalen - transhdrlen; + alloclen = fragheaderlen + transhdrlen + fraggap; + pagedlen = datalen - transhdrlen - fraggap; } alloclen += alloc_extra; @@ -1164,9 +1164,6 @@ alloc_new_skb: } copy = datalen - transhdrlen - fraggap - pagedlen; - /* [!] NOTE: copy will be negative if pagedlen>0 - * because then the equation reduces to -fraggap. 
- */ if (copy > 0 && INDIRECT_CALL_1(getfrag, ip_generic_getfrag, from, data + transhdrlen, offset, diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index 95b6bb78fcd2..3b80929994a0 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -596,6 +596,9 @@ static int vti_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_parm_kern p; __u32 fwmark = t->fwmark; + if (!rtnl_dev_link_net_capable(dev, t->net)) + return -EPERM; + vti_netlink_parms(data, &p, &fwmark); return ip_tunnel_changelink(dev, tb, &p, fwmark); } diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index 4f89a03e0b49..b643194f57d2 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -494,6 +494,9 @@ static int ipip_changelink(struct net_device *dev, struct nlattr *tb[], bool collect_md; __u32 fwmark = t->fwmark; + if (!rtnl_dev_link_net_capable(dev, t->net)) + return -EPERM; + if (ip_tunnel_netlink_encap_parms(data, &ipencap)) { int err = ip_tunnel_encap_setup(t, &ipencap); diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c index fecf6621f679..4626dc46808f 100644 --- a/net/ipv4/netfilter/nf_reject_ipv4.c +++ b/net/ipv4/netfilter/nf_reject_ipv4.c @@ -89,7 +89,7 @@ static bool nf_skb_is_icmp_unreach(const struct sk_buff *skb) if (iph->protocol != IPPROTO_ICMP) return false; - thoff = skb_network_offset(skb) + sizeof(*iph); + thoff = skb_network_offset(skb) + ip_hdrlen(skb); tp = skb_header_pointer(skb, thoff + offsetof(struct icmphdr, type), diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index c0e85cc171ae..ca1180dba1de 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -1058,7 +1058,9 @@ static struct ctl_table ipv4_net_table[] = { .data = &init_net.ipv4.sysctl_tcp_reordering, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ONE, + .extra2 = &init_net.ipv4.sysctl_tcp_max_reordering, }, { .procname = "tcp_retries1", @@ 
-1293,7 +1295,8 @@ static struct ctl_table ipv4_net_table[] = { .data = &init_net.ipv4.sysctl_tcp_max_reordering, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ONE, }, { .procname = "tcp_dsack", @@ -1676,6 +1679,9 @@ static __net_init int ipv4_sysctl_init_net(struct net *net) */ table[i].mode &= ~0222; } + if (table[i].extra2 >= (void *)&init_net.ipv4 && + table[i].extra2 < (void *)(&init_net.ipv4 + 1)) + table[i].extra2 += (void *)net - (void *)&init_net; } } diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c index 2f69bcecae78..a56bb79e15e0 100644 --- a/net/ipv4/tcp_ao.c +++ b/net/ipv4/tcp_ao.c @@ -1747,6 +1747,10 @@ static int tcp_ao_delete_key(struct sock *sk, struct tcp_ao_info *ao_info, * them and we can just free all resources in RCU fashion. */ if (del_async) { + if (ao_info->current_key == key) + WRITE_ONCE(ao_info->current_key, NULL); + if (ao_info->rnext_key == key) + WRITE_ONCE(ao_info->rnext_key, NULL); atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc); call_rcu(&key->rcu, tcp_ao_key_free_rcu); return 0; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 26dd751ec72a..00ec4b5900f2 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2688,7 +2688,7 @@ static int tcp_mtu_probe(struct sock *sk) struct sk_buff *skb, *nskb, *next; struct net *net = sock_net(sk); int probe_size; - int size_needed; + u64 size_needed; int copy, len; int mss_now; int interval; @@ -2712,7 +2712,7 @@ static int tcp_mtu_probe(struct sock *sk) mss_now = tcp_current_mss(sk); probe_size = tcp_mtu_to_mss(sk, (icsk->icsk_mtup.search_high + icsk->icsk_mtup.search_low) >> 1); - size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache; + size_needed = probe_size + (tp->reordering + 1) * (u64)tp->mss_cache; interval = icsk->icsk_mtup.search_high - icsk->icsk_mtup.search_low; /* When misfortune happens, we are reprobing actively, * and then reprobe timer has expired. 
We stick with current diff --git a/net/ipv4/udp_tunnel_nic.c b/net/ipv4/udp_tunnel_nic.c index 9944ed923ddf..3b32a0afa979 100644 --- a/net/ipv4/udp_tunnel_nic.c +++ b/net/ipv4/udp_tunnel_nic.c @@ -301,7 +301,7 @@ __udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn) static void udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn) { - if (!utn->need_sync) + if (!utn->need_sync || utn->work_pending) return; queue_work(udp_tunnel_nic_workqueue, &utn->work); diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c index c2eac844bcdb..f6f2a8ef3f88 100644 --- a/net/ipv4/xfrm4_input.c +++ b/net/ipv4/xfrm4_input.c @@ -76,8 +76,6 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async) NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, dev_net(dev), NULL, skb, dev, NULL, xfrm4_rcv_encap_finish); - if (async) - dev_put(dev); return 0; } diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 1f21ccb55caa..cbe681de3818 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -913,7 +913,7 @@ static int addrconf_fixup_forwarding(const struct ctl_table *table, int *p, int if (newf) rt6_purge_dflt_routers(net); - return 1; + return 0; } static void addrconf_linkdown_change(struct net *net, __s32 newf) @@ -955,11 +955,7 @@ static int addrconf_fixup_linkdown(const struct ctl_table *table, int *p, int ne NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, NETCONFA_IFINDEX_DEFAULT, net->ipv6.devconf_dflt); - rtnl_net_unlock(net); - return 0; - } - - if (p == &net->ipv6.devconf_all->ignore_routes_with_linkdown) { + } else if (p == &net->ipv6.devconf_all->ignore_routes_with_linkdown) { WRITE_ONCE(net->ipv6.devconf_dflt->ignore_routes_with_linkdown, newf); addrconf_linkdown_change(net, newf); if ((!newf) ^ (!old)) @@ -968,11 +964,21 @@ static int addrconf_fixup_linkdown(const struct ctl_table *table, int *p, int ne NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, NETCONFA_IFINDEX_ALL, net->ipv6.devconf_all); + } else { + if (!newf ^ !old) { + 
struct inet6_dev *idev = table->extra1; + + inet6_netconf_notify_devconf(net, + RTM_NEWNETCONF, + NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN, + idev->dev->ifindex, + &idev->cnf); + } } rtnl_net_unlock(net); - return 1; + return 0; } #endif @@ -6370,6 +6376,8 @@ static int addrconf_sysctl_forward(const struct ctl_table *ctl, int write, lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); + if (ret) + return ret; if (write) ret = addrconf_fixup_forwarding(ctl, valp, val); @@ -6467,6 +6475,8 @@ static int addrconf_sysctl_disable(const struct ctl_table *ctl, int write, lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); + if (ret) + return ret; if (write) ret = addrconf_disable_ipv6(ctl, valp, val); @@ -6478,20 +6488,19 @@ static int addrconf_sysctl_disable(const struct ctl_table *ctl, int write, static int addrconf_sysctl_proxy_ndp(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { + struct net *net = ctl->extra2; int *valp = ctl->data; - int ret; int old, new; + int ret; + + if (write && !rtnl_net_trylock(net)) + return restart_syscall(); old = *valp; ret = proc_dointvec(ctl, write, buffer, lenp, ppos); new = *valp; if (write && old != new) { - struct net *net = ctl->extra2; - - if (!rtnl_net_trylock(net)) - return restart_syscall(); - if (valp == &net->ipv6.devconf_dflt->proxy_ndp) { inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_PROXY_NEIGH, @@ -6510,8 +6519,9 @@ static int addrconf_sysctl_proxy_ndp(const struct ctl_table *ctl, int write, idev->dev->ifindex, &idev->cnf); } - rtnl_net_unlock(net); } + if (write) + rtnl_net_unlock(net); return ret; } @@ -6669,6 +6679,8 @@ int addrconf_sysctl_ignore_routes_with_linkdown(const struct ctl_table *ctl, lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); + if (ret) + return ret; if (write) ret = addrconf_fixup_linkdown(ctl, valp, val); @@ -6763,6 +6775,8 @@ static int addrconf_sysctl_disable_policy(const struct ctl_table 
*ctl, int write lctl = *ctl; lctl.data = &val; ret = proc_dointvec(&lctl, write, buffer, lenp, ppos); + if (ret) + return ret; if (write && (*valp != val)) ret = addrconf_disable_policy(ctl, valp, val); diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c index b9f6d892a566..cfb2c41634a0 100644 --- a/net/ipv6/ioam6_iptunnel.c +++ b/net/ipv6/ioam6_iptunnel.c @@ -35,7 +35,7 @@ struct ioam6_lwt_freq { }; struct ioam6_lwt { - struct dst_entry null_dst; + struct rt6_info null_rt; struct dst_cache cache; struct ioam6_lwt_freq freq; atomic_t pkt_cnt; @@ -176,7 +176,7 @@ static int ioam6_build_state(struct net *net, struct nlattr *nla, * it is stored in the cache. Then, +1/-1 each time we read the cache * and release it. Long story short, we're fine. */ - dst_init(&ilwt->null_dst, NULL, NULL, DST_OBSOLETE_NONE, DST_NOCOUNT); + dst_init(&ilwt->null_rt.dst, NULL, NULL, DST_OBSOLETE_NONE, DST_NOCOUNT); atomic_set(&ilwt->pkt_cnt, 0); ilwt->freq.k = freq_k; @@ -360,7 +360,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) /* This is how we notify that the destination does not change after * transformation and that we need to use orig_dst instead of the cache */ - if (dst == &ilwt->null_dst) { + if (dst == &ilwt->null_rt.dst) { dst_release(dst); dst = orig_dst; @@ -429,7 +429,7 @@ do_encap: local_bh_disable(); if (orig_dst->lwtstate == dst->lwtstate) dst_cache_set_ip6(&ilwt->cache, - &ilwt->null_dst, &fl6.saddr); + &ilwt->null_rt.dst, &fl6.saddr); else dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr); local_bh_enable(); diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 795be59946f7..7c09a269b352 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -2047,6 +2047,9 @@ static int ip6gre_changelink(struct net_device *dev, struct nlattr *tb[], struct ip6gre_net *ign = net_generic(t->net, ip6gre_net_id); struct __ip6_tnl_parm p; + if (!rtnl_dev_link_net_capable(dev, t->net)) + return -EPERM; + t = 
ip6gre_changelink_common(dev, tb, data, &p, extack); if (IS_ERR(t)) return PTR_ERR(t); @@ -2266,6 +2269,9 @@ static int ip6erspan_changelink(struct net_device *dev, struct nlattr *tb[], struct __ip6_tnl_parm p; struct ip6gre_net *ign; + if (!rtnl_dev_link_net_capable(dev, t->net)) + return -EPERM; + ign = net_generic(t->net, ip6gre_net_id); t = ip6gre_changelink_common(dev, tb, data, &p, extack); if (IS_ERR(t)) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 9f1e0e4f7464..368e4fa3b43c 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1667,8 +1667,8 @@ alloc_new_skb: !(rt->dst.dev->features & NETIF_F_SG))) alloclen = fraglen; else { - alloclen = fragheaderlen + transhdrlen; - pagedlen = datalen - transhdrlen; + alloclen = fragheaderlen + transhdrlen + fraggap; + pagedlen = datalen - transhdrlen - fraggap; } alloclen += alloc_extra; @@ -1683,10 +1683,7 @@ alloc_new_skb: fraglen = datalen + fragheaderlen; copy = datalen - transhdrlen - fraggap - pagedlen; - /* [!] NOTE: copy may be negative if pagedlen>0 - * because then the equation may reduces to -fraggap. - */ - if (copy < 0 && !(flags & MSG_SPLICE_PAGES)) { + if (copy < 0) { err = -EINVAL; goto error; } diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 73fc5a0b8203..bf8e40af60b0 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1851,6 +1851,13 @@ static int ip6_tnl_fill_forward_path(struct net_device_path_ctx *ctx, struct dst_entry *dst; int err; + if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) { + /* encaplimit option is currently not supported is + * sw-acceleration path. 
+ */ + return -EOPNOTSUPP; + } + dst = ip6_route_output(dev_net(ctx->dev), NULL, &fl6); if (!dst->error) { path->type = DEV_PATH_TUN; @@ -2103,6 +2110,9 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[], struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id); struct ip_tunnel_encap ipencap; + if (!rtnl_dev_link_net_capable(dev, net)) + return -EPERM; + if (dev == ip6n->fb_tnl_dev) { if (ip_tunnel_netlink_encap_parms(data, &ipencap)) { /* iproute2 always sets TUNNEL_ENCAP_FLAG_CSUM6, so diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index d871cab6938d..ab94b3a4ba9c 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -1046,6 +1046,9 @@ static int vti6_changelink(struct net_device *dev, struct nlattr *tb[], struct __ip6_tnl_parm p; struct vti6_net *ip6n; + if (!rtnl_dev_link_net_capable(dev, net)) + return -EPERM; + ip6n = net_generic(net, vti6_net_id); if (dev == ip6n->fb_tnl_dev) return -EINVAL; diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index e7ad13c5bd26..f867ec8d3d90 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -967,10 +967,8 @@ out: return reason; } -static int accept_untracked_na(struct net_device *dev, struct in6_addr *saddr) +static int accept_untracked_na(struct inet6_dev *idev, struct in6_addr *saddr) { - struct inet6_dev *idev = __in6_dev_get(dev); - switch (READ_ONCE(idev->cnf.accept_untracked_na)) { case 0: /* Don't accept untracked na (absent in neighbor cache) */ return 0; @@ -980,7 +978,7 @@ static int accept_untracked_na(struct net_device *dev, struct in6_addr *saddr) * same subnet as an address configured on the interface that * received the na */ - return !!ipv6_chk_prefix(saddr, dev); + return !!ipv6_chk_prefix(saddr, idev->dev); default: return 0; } @@ -1078,7 +1076,7 @@ static enum skb_drop_reason ndisc_recv_na(struct sk_buff *skb) */ new_state = msg->icmph.icmp6_solicited ? 
NUD_REACHABLE : NUD_STALE; if (!neigh && lladdr && idev && READ_ONCE(idev->cnf.forwarding)) { - if (accept_untracked_na(dev, saddr)) { + if (accept_untracked_na(idev, saddr)) { neigh = neigh_create(&nd_tbl, &msg->target, dev); new_state = NUD_STALE; } diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 6361ad2fcf77..a1301334da48 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5055,6 +5055,9 @@ static int fib6_nh_mtu_change(struct fib6_nh *nh, void *_arg) struct inet6_dev *idev = __in6_dev_get(arg->dev); u32 mtu = f6i->fib6_pmtu; + if (!idev) + return 0; + if (mtu >= arg->mtu || (mtu < arg->mtu && mtu == idev->cnf.mtu6)) fib6_metric_set(f6i, RTAX_MTU, arg->mtu); diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 64f0d1b622d3..a38b24fb8384 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1613,6 +1613,9 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[], __u32 fwmark = t->fwmark; int err; + if (!rtnl_dev_link_net_capable(dev, net)) + return -EPERM; + if (dev == sitn->fb_tunnel_dev) return -EINVAL; diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c index 699a001ac166..89d0443b5307 100644 --- a/net/ipv6/xfrm6_input.c +++ b/net/ipv6/xfrm6_input.c @@ -71,8 +71,6 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async) NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING, dev_net(dev), NULL, skb, dev, NULL, xfrm6_transport_finish2); - if (async) - dev_put(dev); return 0; } diff --git a/net/key/af_key.c b/net/key/af_key.c index a714c997aab2..1d8965d7f4f3 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1218,6 +1218,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net, goto out; } strcpy(x->calg->alg_name, a->name); + x->calg->alg_key_len = 0; x->props.calgo = sa->sadb_sa_encrypt; } else { int keysize = 0; diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c index c8d88e2508fc..15f1e5d88f20 100644 --- a/net/llc/sysctl_net_llc.c +++ b/net/llc/sysctl_net_llc.c @@ -47,7 +47,7 @@ static struct 
ctl_table_header *llc_station_header; int __init llc_sysctl_init(void) { - struct ctl_table empty[1] = {}; + static struct ctl_table empty[1] = {}; llc2_timeout_header = register_net_sysctl(&init_net, "net/llc/llc2/timeout", llc2_timeout_table); llc_station_header = register_net_sysctl_sz(&init_net, "net/llc/station", empty, 0); diff --git a/net/mac802154/llsec.c b/net/mac802154/llsec.c index e8512578398e..5e7cc11fab3a 100644 --- a/net/mac802154/llsec.c +++ b/net/mac802154/llsec.c @@ -710,6 +710,7 @@ int mac802154_llsec_encrypt(struct mac802154_llsec *sec, struct sk_buff *skb) { struct ieee802154_hdr hdr; int rc, authlen, hlen; + struct sk_buff *trailer; struct mac802154_llsec_key *key; u32 frame_ctr; @@ -769,6 +770,12 @@ int mac802154_llsec_encrypt(struct mac802154_llsec *sec, struct sk_buff *skb) skb->mac_len = ieee802154_hdr_push(skb, &hdr); skb_reset_mac_header(skb); + rc = skb_cow_data(skb, 0, &trailer); + if (rc < 0) { + llsec_key_put(key); + return rc; + } + rc = llsec_do_encrypt(skb, sec, &hdr, key); llsec_key_put(key); @@ -908,6 +915,13 @@ llsec_do_decrypt(struct sk_buff *skb, const struct mac802154_llsec *sec, const struct ieee802154_hdr *hdr, struct mac802154_llsec_key *key, __le64 dev_addr) { + struct sk_buff *trailer; + int err; + + err = skb_cow_data(skb, 0, &trailer); + if (err < 0) + return err; + if (hdr->sec.level == IEEE802154_SCF_SECLEVEL_ENC) return llsec_do_decrypt_unauth(skb, sec, hdr, key, dev_addr); else diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c index 0a31ac8d8415..300d4584533e 100644 --- a/net/mac802154/scan.c +++ b/net/mac802154/scan.c @@ -594,6 +594,7 @@ int mac802154_perform_association(struct ieee802154_sub_if_data *sdata, "Negative ASSOC RESP received from %8phC: %s\n", &ceaddr, local->assoc_status == IEEE802154_PAN_AT_CAPACITY ? 
"PAN at capacity" : "access denied"); + goto clear_assoc; } ret = 0; diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 665f8008cc4b..4c04cd8d40a2 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -256,8 +256,7 @@ config NF_CONNTRACK_H323 To compile it as a module, choose M here. If unsure, say N. config NF_CONNTRACK_IRC - tristate "IRC protocol support" - default m if NETFILTER_ADVANCED=n + tristate "IRC DCC protocol support (obsolete)" help There is a commonly-used extension to IRC called Direct Client-to-Client Protocol (DCC). This enables users to send @@ -267,6 +266,8 @@ config NF_CONNTRACK_IRC using NAT, this extension will enable you to send files and initiate chats. Note that you do NOT need this extension to get files or have others initiate chats, or everything else in IRC. + DCC tracking behind NAT requires plaintext (unencrypted) IRC, so + this helper is of limited use these days. To compile it as a module, choose M here. If unsure, say N. @@ -308,17 +309,17 @@ config NF_CONNTRACK_SNMP To compile it as a module, choose M here. If unsure, say N. config NF_CONNTRACK_PPTP - tristate "PPtP protocol support" + tristate "PPtP protocol support (deprecated)" depends on NETFILTER_ADVANCED select NF_CT_PROTO_GRE help This module adds support for PPTP (Point to Point Tunnelling Protocol, RFC2637) connection tracking and NAT. - If you are running PPTP sessions over a stateful firewall or NAT + If you are still running PPTP sessions over a stateful firewall or NAT box, you may want to enable this feature. - Please note that not all PPTP modes of operation are supported yet. + Please note that not all PPTP modes of operation are supported. Specifically these limitations exist: - Blindly assumes that control connections are always established in PNS->PAC direction. This is a violation of RFC2637. 
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h index 798c7993635e..bb9b5bed10e1 100644 --- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -165,6 +165,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, ip_set_init_skbinfo(ext_skbinfo(x, set), ext); /* Activate element */ + smp_mb__before_atomic(); set_bit(e->id, map->members); set->elements++; @@ -219,7 +220,7 @@ mtype_list(const struct ip_set *set, cond_resched_rcu(); id = cb->args[IPSET_CB_ARG0]; x = get_ext(set, map, id); - if (!test_bit(id, map->members) || + if (!test_bit_acquire(id, map->members) || (SET_WITH_TIMEOUT(set) && #ifdef IP_SET_BITMAP_STORED_TIMEOUT mtype_is_filled(x) && @@ -278,6 +279,7 @@ mtype_gc(struct timer_list *t) x = get_ext(set, map, id); if (ip_set_timeout_expired(ext_timeout(x, set))) { clear_bit(id, map->members); + smp_mb__after_atomic(); ip_set_ext_destroy(set, x); set->elements--; } diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c b/net/netfilter/ipset/ip_set_bitmap_ip.c index 5988b9bb9029..ac7febce074f 100644 --- a/net/netfilter/ipset/ip_set_bitmap_ip.c +++ b/net/netfilter/ipset/ip_set_bitmap_ip.c @@ -67,7 +67,7 @@ static int bitmap_ip_do_test(const struct bitmap_ip_adt_elem *e, struct bitmap_ip *map, size_t dsize) { - return !!test_bit(e->id, map->members); + return !!test_bit_acquire(e->id, map->members); } static int diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c index 752f59ef8744..5921fd9d2dca 100644 --- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c +++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c @@ -86,7 +86,7 @@ bitmap_ipmac_do_test(const struct bitmap_ipmac_adt_elem *e, { const struct bitmap_ipmac_elem *elem; - if (!test_bit(e->id, map->members)) + if (!test_bit_acquire(e->id, map->members)) return 0; elem = get_const_elem(map->extensions, e->id, dsize); if (e->add_mac && elem->filled == MAC_FILLED) diff 
--git a/net/netfilter/ipset/ip_set_bitmap_port.c b/net/netfilter/ipset/ip_set_bitmap_port.c index 7138e080def4..ca875c982424 100644 --- a/net/netfilter/ipset/ip_set_bitmap_port.c +++ b/net/netfilter/ipset/ip_set_bitmap_port.c @@ -58,7 +58,7 @@ static int bitmap_port_do_test(const struct bitmap_port_adt_elem *e, const struct bitmap_port *map, size_t dsize) { - return !!test_bit(e->id, map->members); + return !!test_bit_acquire(e->id, map->members); } static int diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 3706b4a85a0f..a531b654b8d9 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -351,8 +351,8 @@ ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment, if (unlikely(c)) { set->ext_size -= sizeof(*c) + strlen(c->str) + 1; - kfree_rcu(c, rcu); rcu_assign_pointer(comment->c, NULL); + kfree_rcu(c, rcu); } if (!len) return; @@ -393,8 +393,8 @@ ip_set_comment_free(struct ip_set *set, void *ptr) if (unlikely(!c)) return; set->ext_size -= sizeof(*c) + strlen(c->str) + 1; - kfree_rcu(c, rcu); rcu_assign_pointer(comment->c, NULL); + kfree_rcu(c, rcu); } typedef void (*destroyer)(struct ip_set *, void *); diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 04e4627ddfc1..dedf59b661dd 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -606,7 +606,7 @@ mtype_cancel_gc(struct ip_set *set) struct htype *h = set->data; if (SET_WITH_TIMEOUT(set)) - cancel_delayed_work_sync(&h->gc.dwork); + disable_delayed_work_sync(&h->gc.dwork); } static int @@ -689,7 +689,7 @@ retry: continue; pos = smp_load_acquire(&n->pos); for (j = 0; j < pos; j++) { - if (!test_bit(j, n->used)) + if (!test_bit_acquire(j, n->used)) continue; data = ahash_data(n, j, dsize); if (SET_ELEM_EXPIRED(set, data)) @@ -826,7 +826,7 @@ mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size) continue; pos = 
smp_load_acquire(&n->pos); for (j = 0; j < pos; j++) { - if (!test_bit(j, n->used)) + if (!test_bit_acquire(j, n->used)) continue; data = ahash_data(n, j, set->dsize); if (!SET_ELEM_EXPIRED(set, data)) @@ -1201,7 +1201,7 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d, continue; pos = smp_load_acquire(&n->pos); for (i = 0; i < pos; i++) { - if (!test_bit(i, n->used)) + if (!test_bit_acquire(i, n->used)) continue; data = ahash_data(n, i, set->dsize); if (!mtype_data_equal(data, d, &multi)) @@ -1259,7 +1259,7 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext, } pos = smp_load_acquire(&n->pos); for (i = 0; i < pos; i++) { - if (!test_bit(i, n->used)) + if (!test_bit_acquire(i, n->used)) continue; data = ahash_data(n, i, set->dsize); if (!mtype_data_equal(data, d, &multi)) @@ -1396,7 +1396,7 @@ mtype_list(const struct ip_set *set, continue; pos = smp_load_acquire(&n->pos); for (i = 0; i < pos; i++) { - if (!test_bit(i, n->used)) + if (!test_bit_acquire(i, n->used)) continue; e = ahash_data(n, i, set->dsize); if (SET_ELEM_EXPIRED(set, e)) diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c index dd67004a5cc0..91582069f6d2 100644 --- a/net/netfilter/nf_conncount.c +++ b/net/netfilter/nf_conncount.c @@ -183,17 +183,16 @@ static int __nf_conncount_add(struct net *net, return -ENOENT; if (ct && nf_ct_is_confirmed(ct)) { - /* local connections are confirmed in postrouting so confirmation - * might have happened before hitting connlimit + /* Connection is confirmed but might still be in the setup phase. + * Only skip the tracking if it is fully assured. This guarantees + * that setup packets or retransmissions are properly counted and + * deduplicated. 
*/ - if (skb->skb_iif != LOOPBACK_IFINDEX) { + if (test_bit(IPS_ASSURED_BIT, &ct->status)) { err = -EEXIST; goto out_put; } - /* this is likely a local connection, skip optimization to avoid - * adding duplicates from a 'packet train' - */ goto check_connections; } diff --git a/net/netfilter/nf_conntrack_broadcast.c b/net/netfilter/nf_conntrack_broadcast.c index 400119b6320e..bf78828c7549 100644 --- a/net/netfilter/nf_conntrack_broadcast.c +++ b/net/netfilter/nf_conntrack_broadcast.c @@ -62,6 +62,7 @@ int nf_conntrack_broadcast_help(struct sk_buff *skb, if (exp == NULL) goto out; + exp->master_tuple = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; exp->tuple = ct->tuplehash[IP_CT_DIR_REPLY].tuple; helper = rcu_dereference(help->helper); diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 4fb3a2d18631..784bd1d7a9bf 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1471,6 +1471,31 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct) return false; } +static void nf_ct_help_gc(struct nf_conn *ct) +{ + struct nf_conn_help *help; + + if (!refcount_inc_not_zero(&ct->ct_general.use)) + return; + + /* load ->status after refcount increase */ + smp_acquire__after_ctrl_dep(); + + if (!nf_ct_is_confirmed(ct) || nf_ct_is_dying(ct)) { + nf_ct_put(ct); + return; + } + + /* re-check helper due to SLAB_TYPESAFE_BY_RCU */ + if (test_bit(IPS_HELPER_BIT, &ct->status)) { + help = nfct_help(ct); + if (help) + nf_ct_expectation_gc(help); + } + + nf_ct_put(ct); +} + static void gc_worker(struct work_struct *work) { unsigned int i, hashsz, nf_conntrack_max95 = 0; @@ -1543,7 +1568,13 @@ static void gc_worker(struct work_struct *work) expires = (expires - (long)next_run) / ++count; next_run += expires; - if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp)) + if (gc_worker_skip_ct(tmp)) + continue; + + if (test_bit(IPS_HELPER_BIT, &tmp->status)) + nf_ct_help_gc(tmp); + + if (nf_conntrack_max95 == 0) 
continue; net = nf_ct_net(tmp); diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c index 5c9b17835c28..38630c5e006f 100644 --- a/net/netfilter/nf_conntrack_expect.c +++ b/net/netfilter/nf_conntrack_expect.c @@ -43,6 +43,24 @@ unsigned int nf_ct_expect_max __read_mostly; static struct kmem_cache *nf_ct_expect_cachep __read_mostly; static siphash_aligned_key_t nf_ct_expect_hashrnd; +void nf_ct_expectation_gc(struct nf_conn_help *master_help) +{ + struct nf_conntrack_expect *exp; + struct hlist_node *next; + + if (hlist_empty(&master_help->expectations)) + return; + + spin_lock_bh(&nf_conntrack_expect_lock); + hlist_for_each_entry_safe(exp, next, &master_help->expectations, lnode) { + if (!nf_ct_exp_is_expired(exp)) + continue; + + nf_ct_unlink_expect(exp); + } + spin_unlock_bh(&nf_conntrack_expect_lock); +} + /* nf_conntrack_expect helper functions */ void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp, u32 portid, int report) @@ -52,7 +70,6 @@ void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp, struct nf_conntrack_net *cnet; lockdep_nfct_expect_lock_held(); - WARN_ON_ONCE(timer_pending(&exp->timeout)); hlist_del_rcu(&exp->hnode); @@ -70,16 +87,6 @@ void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp, } EXPORT_SYMBOL_GPL(nf_ct_unlink_expect_report); -static void nf_ct_expectation_timed_out(struct timer_list *t) -{ - struct nf_conntrack_expect *exp = timer_container_of(exp, t, timeout); - - spin_lock_bh(&nf_conntrack_expect_lock); - nf_ct_unlink_expect(exp); - spin_unlock_bh(&nf_conntrack_expect_lock); - nf_ct_expect_put(exp); -} - static unsigned int nf_ct_expect_dst_hash(const struct net *n, const struct nf_conntrack_tuple *tuple) { struct { @@ -117,19 +124,6 @@ nf_ct_exp_equal(const struct nf_conntrack_tuple *tuple, nf_ct_exp_zone_equal_any(i, zone); } -bool nf_ct_remove_expect(struct nf_conntrack_expect *exp) -{ - lockdep_nfct_expect_lock_held(); - - if (timer_delete(&exp->timeout)) { - 
nf_ct_unlink_expect(exp); - nf_ct_expect_put(exp); - return true; - } - return false; -} -EXPORT_SYMBOL_GPL(nf_ct_remove_expect); - struct nf_conntrack_expect * __nf_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone, @@ -144,6 +138,8 @@ __nf_ct_expect_find(struct net *net, h = nf_ct_expect_dst_hash(net, tuple); hlist_for_each_entry_rcu(i, &nf_ct_expect_hash[h], hnode) { + if (nf_ct_exp_is_expired(i)) + continue; if (nf_ct_exp_equal(tuple, i, zone, net)) return i; } @@ -178,6 +174,7 @@ nf_ct_find_expectation(struct net *net, { struct nf_conntrack_net *cnet = nf_ct_pernet(net); struct nf_conntrack_expect *i, *exp = NULL; + struct hlist_node *next; unsigned int h; lockdep_nfct_expect_lock_held(); @@ -186,7 +183,11 @@ nf_ct_find_expectation(struct net *net, return NULL; h = nf_ct_expect_dst_hash(net, tuple); - hlist_for_each_entry(i, &nf_ct_expect_hash[h], hnode) { + hlist_for_each_entry_safe(i, next, &nf_ct_expect_hash[h], hnode) { + if (nf_ct_exp_is_expired(i)) { + nf_ct_unlink_expect(i); + continue; + } if (!(i->flags & NF_CT_EXPECT_INACTIVE) && nf_ct_exp_equal(tuple, i, zone, net)) { exp = i; @@ -196,13 +197,16 @@ nf_ct_find_expectation(struct net *net, if (!exp) return NULL; + if (!refcount_inc_not_zero(&exp->use)) + return NULL; + /* If master is not in hash table yet (ie. packet hasn't left this machine yet), how can other end know about expected? Hence these are not the droids you are looking for (if master ct never got confirmed, we'd hold a reference to it and weird things would happen to future packets). 
*/ if (!nf_ct_is_confirmed(exp->master)) - return NULL; + goto err_release_exp; /* Avoid race with other CPUs, that for exp->master ct, is * about to invoke ->destroy(), or nf_ct_delete() via timeout @@ -214,18 +218,17 @@ nf_ct_find_expectation(struct net *net, */ if (unlikely(nf_ct_is_dying(exp->master) || !refcount_inc_not_zero(&exp->master->ct_general.use))) - return NULL; + goto err_release_exp; - if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink) { - refcount_inc(&exp->use); + if (exp->flags & NF_CT_EXPECT_PERMANENT || !unlink) return exp; - } else if (timer_delete(&exp->timeout)) { - nf_ct_unlink_expect(exp); - return exp; - } - /* Undo exp->master refcnt increase, if timer_delete() failed */ - nf_ct_put(exp->master); + nf_ct_unlink_expect(exp); + + return exp; + +err_release_exp: + nf_ct_expect_put(exp); return NULL; } @@ -241,9 +244,8 @@ void nf_ct_remove_expectations(struct nf_conn *ct) return; spin_lock_bh(&nf_conntrack_expect_lock); - hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) { - nf_ct_remove_expect(exp); - } + hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) + nf_ct_unlink_expect(exp); spin_unlock_bh(&nf_conntrack_expect_lock); } EXPORT_SYMBOL_GPL(nf_ct_remove_expectations); @@ -292,7 +294,7 @@ static bool master_matches(const struct nf_conntrack_expect *a, void nf_ct_unexpect_related(struct nf_conntrack_expect *exp) { spin_lock_bh(&nf_conntrack_expect_lock); - nf_ct_remove_expect(exp); + WRITE_ONCE(exp->flags, exp->flags | NF_CT_EXPECT_DEAD); spin_unlock_bh(&nf_conntrack_expect_lock); } EXPORT_SYMBOL_GPL(nf_ct_unexpect_related); @@ -308,6 +310,7 @@ struct nf_conntrack_expect *nf_ct_expect_alloc(struct nf_conn *me) if (!new) return NULL; + new->timeout = nfct_time_stamp; new->master = me; refcount_set(&new->use, 1); return new; @@ -352,6 +355,8 @@ void nf_ct_expect_init(struct nf_conntrack_expect *exp, unsigned int class, exp->tuple.src.l3num = family; exp->tuple.dst.protonum = proto; + exp->master_tuple = 
ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; + if (saddr) { memcpy(&exp->tuple.src.u3, saddr, len); if (sizeof(exp->tuple.src.u3) > len) @@ -413,17 +418,12 @@ static void nf_ct_expect_insert(struct nf_conntrack_expect *exp, struct net *net = nf_ct_exp_net(exp); unsigned int h = nf_ct_expect_dst_hash(net, &exp->tuple); - /* two references : one for hash insert, one for the timer */ - refcount_add(2, &exp->use); + refcount_inc(&exp->use); - timer_setup(&exp->timeout, nf_ct_expectation_timed_out, 0); helper = rcu_dereference_protected(master_help->helper, lockdep_is_held(&nf_conntrack_expect_lock)); - if (helper) { - exp->timeout.expires = jiffies + - helper->expect_policy[exp->class].timeout * HZ; - } - add_timer(&exp->timeout); + if (helper) + exp->timeout += helper->expect_policy[exp->class].timeout * HZ; hlist_add_head_rcu(&exp->lnode, &master_help->expectations); master_help->expecting[exp->class]++; @@ -435,19 +435,26 @@ static void nf_ct_expect_insert(struct nf_conntrack_expect *exp, NF_CT_STAT_INC(net, expect_create); } -/* Race with expectations being used means we could have none to find; OK. */ static void evict_oldest_expect(struct nf_conn_help *master_help, - struct nf_conntrack_expect *new) + struct nf_conntrack_expect *new, + const struct nf_conntrack_expect_policy *p) { struct nf_conntrack_expect *exp, *last = NULL; + struct hlist_node *next; - hlist_for_each_entry(exp, &master_help->expectations, lnode) { + hlist_for_each_entry_safe(exp, next, &master_help->expectations, lnode) { + if (nf_ct_exp_is_expired(exp)) { + nf_ct_unlink_expect(exp); + continue; + } if (exp->class == new->class) last = exp; } - if (last) - nf_ct_remove_expect(last); + /* Still worth to evict oldest expectation after garbage collection? 
*/ + if (last && + master_help->expecting[last->class] >= p->max_expected) + nf_ct_unlink_expect(last); } static inline int __nf_ct_expect_check(struct nf_conntrack_expect *expect, @@ -467,14 +474,18 @@ static inline int __nf_ct_expect_check(struct nf_conntrack_expect *expect, h = nf_ct_expect_dst_hash(net, &expect->tuple); hlist_for_each_entry_safe(i, next, &nf_ct_expect_hash[h], hnode) { + if (nf_ct_exp_is_expired(i)) { + nf_ct_unlink_expect(i); + continue; + } if (master_matches(i, expect, flags) && expect_matches(i, expect)) { if (i->class != expect->class || i->master != expect->master) return -EALREADY; - if (nf_ct_remove_expect(i)) - break; + nf_ct_unlink_expect(i); + break; } else if (expect_clash(i, expect)) { ret = -EBUSY; goto out; @@ -485,15 +496,15 @@ static inline int __nf_ct_expect_check(struct nf_conntrack_expect *expect, lockdep_is_held(&nf_conntrack_expect_lock)); if (helper) { p = &helper->expect_policy[expect->class]; - if (p->max_expected && - master_help->expecting[expect->class] >= p->max_expected) { - evict_oldest_expect(master_help, expect); - if (master_help->expecting[expect->class] - >= p->max_expected) { - ret = -EMFILE; - goto out; - } - } + if (master_help->expecting[expect->class] >= p->max_expected) + evict_oldest_expect(master_help, expect, p); + } else { + const struct nf_conntrack_expect_policy default_exp_policy = { + .max_expected = NF_CT_EXPECT_MAX_CNT, + }; + + if (master_help->expecting[expect->class] >= default_exp_policy.max_expected) + evict_oldest_expect(master_help, expect, &default_exp_policy); } cnet = nf_ct_pernet(net); @@ -547,10 +558,8 @@ void nf_ct_expect_iterate_destroy(bool (*iter)(struct nf_conntrack_expect *e, vo hlist_for_each_entry_safe(exp, next, &nf_ct_expect_hash[i], hnode) { - if (iter(exp, data) && timer_delete(&exp->timeout)) { + if (iter(exp, data)) nf_ct_unlink_expect(exp); - nf_ct_expect_put(exp); - } } } @@ -577,10 +586,8 @@ void nf_ct_expect_iterate_net(struct net *net, if 
(!net_eq(nf_ct_exp_net(exp), net)) continue; - if (iter(exp, data) && timer_delete(&exp->timeout)) { + if (iter(exp, data)) nf_ct_unlink_expect_report(exp, portid, report); - nf_ct_expect_put(exp); - } } } @@ -657,17 +664,17 @@ static int exp_seq_show(struct seq_file *s, void *v) struct net *net = seq_file_net(s); struct hlist_node *n = v; char *delim = ""; + __s32 timeout; expect = hlist_entry(n, struct nf_conntrack_expect, hnode); if (!net_eq(nf_ct_exp_net(expect), net)) return 0; + if (nf_ct_exp_is_expired(expect)) + return 0; - if (expect->timeout.function) - seq_printf(s, "%ld ", timer_pending(&expect->timeout) - ? (long)(expect->timeout.expires - jiffies)/HZ : 0); - else - seq_puts(s, "- "); + timeout = (__s32)(READ_ONCE(expect->timeout) - nfct_time_stamp) / HZ; + seq_printf(s, "%d ", timeout > 0 ? timeout : 0); seq_printf(s, "l3proto = %u proto=%u ", expect->tuple.src.l3num, expect->tuple.dst.protonum); diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c index 7f189dceb3c4..24931e379985 100644 --- a/net/netfilter/nf_conntrack_h323_main.c +++ b/net/netfilter/nf_conntrack_h323_main.c @@ -1388,8 +1388,8 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct, "timeout to %u seconds for", info->timeout); nf_ct_dump_tuple(&exp->tuple); - mod_timer_pending(&exp->timeout, - jiffies + info->timeout * HZ); + WRITE_ONCE(exp->timeout, + nfct_time_stamp + (info->timeout * HZ)); } spin_unlock_bh(&nf_conntrack_expect_lock); } diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 2f35bdd0d7d7..500509b17663 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -181,10 +181,10 @@ nf_ct_helper_ext_add(struct nf_conn *ct, gfp_t gfp) struct nf_conn_help *help; help = nf_ct_ext_add(ct, NF_CT_EXT_HELPER, gfp); - if (help) + if (help) { + __set_bit(IPS_HELPER_BIT, &ct->status); INIT_HLIST_HEAD(&help->expectations); - else - pr_debug("failed to add 
helper extension area"); + } return help; } EXPORT_SYMBOL_GPL(nf_ct_helper_ext_add); @@ -203,10 +203,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, return 0; help = nfct_help(tmpl); - if (help != NULL) { + if (help) helper = rcu_dereference(help->helper); - set_bit(IPS_HELPER_BIT, &ct->status); - } help = nfct_help(ct); @@ -376,8 +374,13 @@ int __nf_conntrack_helper_register(struct nf_conntrack_helper *me) if (!nf_ct_helper_hash) return -ENOENT; - if (me->expect_policy->max_expected > NF_CT_EXPECT_MAX_CNT) - return -EINVAL; + for (i = 0; i <= me->expect_class_max; i++) { + if (!me->expect_policy[i].max_expected) + me->expect_policy[i].max_expected = NF_CT_EXPECT_MAX_CNT; + + if (me->expect_policy[i].max_expected > NF_CT_EXPECT_MAX_CNT) + return -EINVAL; + } mutex_lock(&nf_ct_helper_mutex); for (i = 0; i < nf_ct_helper_hsize; i++) { diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c index 0c117b8492e9..193ab34db795 100644 --- a/net/netfilter/nf_conntrack_irc.c +++ b/net/netfilter/nf_conntrack_irc.c @@ -262,6 +262,8 @@ static int __init nf_conntrack_irc_init(void) { int i, ret; + nf_conntrack_helper_deprecated(HELPER_NAME); + if (max_dcc_channels < 1) { pr_err("max_dcc_channels must not be zero\n"); return -EINVAL; diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index b429e648f06c..4217715d42dc 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1953,19 +1953,6 @@ static int ctnetlink_change_helper(struct nf_conn *ct, return err; } - if (!strcmp(helpname, "") && help) { - helper = rcu_dereference(help->helper); - if (helper) { - /* we had a helper before ... 
*/ - nf_ct_remove_expectations(ct); - RCU_INIT_POINTER(help->helper, NULL); - if (refcount_dec_and_test(&helper->ct_refcnt)) - kfree_rcu(helper, rcu); - } - rcu_read_unlock(); - return 0; - } - helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct), nf_ct_protonum(ct)); if (helper == NULL) { @@ -3014,8 +3001,7 @@ static int ctnetlink_exp_dump_expect(struct sk_buff *skb, const struct nf_conntrack_expect *exp) { - struct nf_conn *master = exp->master; - long timeout = ((long)exp->timeout.expires - (long)jiffies) / HZ; + __s32 timeout = (__s32)(READ_ONCE(exp->timeout) - nfct_time_stamp) / HZ; struct nf_conntrack_helper *helper; #if IS_ENABLED(CONFIG_NF_NAT) struct nlattr *nest_parms; @@ -3030,9 +3016,7 @@ ctnetlink_exp_dump_expect(struct sk_buff *skb, goto nla_put_failure; if (ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0) goto nla_put_failure; - if (ctnetlink_exp_dump_tuple(skb, - &master->tuplehash[IP_CT_DIR_ORIGINAL].tuple, - CTA_EXPECT_MASTER) < 0) + if (ctnetlink_exp_dump_tuple(skb, &exp->master_tuple, CTA_EXPECT_MASTER) < 0) goto nla_put_failure; #if IS_ENABLED(CONFIG_NF_NAT) @@ -3045,9 +3029,9 @@ ctnetlink_exp_dump_expect(struct sk_buff *skb, if (nla_put_be32(skb, CTA_EXPECT_NAT_DIR, htonl(exp->dir))) goto nla_put_failure; - nat_tuple.src.l3num = nf_ct_l3num(master); + nat_tuple.src.l3num = exp->master_tuple.src.l3num; nat_tuple.src.u3 = exp->saved_addr; - nat_tuple.dst.protonum = nf_ct_protonum(master); + nat_tuple.dst.protonum = exp->master_tuple.dst.protonum; nat_tuple.src.u = exp->saved_proto; if (ctnetlink_exp_dump_tuple(skb, &nat_tuple, @@ -3178,6 +3162,9 @@ ctnetlink_exp_dump_table(struct sk_buff *skb, struct netlink_callback *cb) restart: hlist_for_each_entry_rcu(exp, &nf_ct_expect_hash[cb->args[0]], hnode) { + if (nf_ct_exp_is_expired(exp)) + continue; + if (l3proto && exp->tuple.src.l3num != l3proto) continue; @@ -3456,11 +3443,8 @@ static int ctnetlink_del_expect(struct sk_buff *skb, } /* after list removal, usage count == 1 */ - 
if (timer_delete(&exp->timeout)) {
-			nf_ct_unlink_expect_report(exp, NETLINK_CB(skb).portid,
-						   nlmsg_report(info->nlh));
-			nf_ct_expect_put(exp);
-		}
+		nf_ct_unlink_expect_report(exp, NETLINK_CB(skb).portid,
+					   nlmsg_report(info->nlh));
 		spin_unlock_bh(&nf_conntrack_expect_lock);
 		/* have to put what we 'get' above.
 		 * after this line usage count == 0 */
@@ -3484,14 +3468,10 @@ static int
 ctnetlink_change_expect(struct nf_conntrack_expect *x,
 			const struct nlattr * const cda[])
 {
-	if (cda[CTA_EXPECT_TIMEOUT]) {
-		if (!timer_delete(&x->timeout))
-			return -ETIME;
+	if (cda[CTA_EXPECT_TIMEOUT])
+		WRITE_ONCE(x->timeout, nfct_time_stamp +
+			   ntohl(nla_get_be32(cda[CTA_EXPECT_TIMEOUT])) * HZ);

-		x->timeout.expires = jiffies +
-			ntohl(nla_get_be32(cda[CTA_EXPECT_TIMEOUT])) * HZ;
-		add_timer(&x->timeout);
-	}
 	return 0;
 }

@@ -3593,6 +3573,7 @@ ctnetlink_alloc_expect(const struct nlattr * const cda[], struct nf_conn *ct,
 #endif
 	rcu_assign_pointer(exp->helper, helper);
 	rcu_assign_pointer(exp->assign_helper, assign_helper);
+	exp->master_tuple = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
 	exp->tuple = *tuple;
 	exp->mask.src.u3 = mask->src.u3;
 	exp->mask.src.u.all = mask->src.u.all;
diff --git a/net/netfilter/nf_conntrack_pptp.c b/net/netfilter/nf_conntrack_pptp.c
index 776505a78e64..80fc14c87ddc 100644
--- a/net/netfilter/nf_conntrack_pptp.c
+++ b/net/netfilter/nf_conntrack_pptp.c
@@ -545,6 +545,8 @@ static int __init nf_conntrack_pptp_init(void)

 	pptp.destroy = gre_pptp_destroy_siblings;

+	nf_conntrack_helper_deprecated(pptp.name);
+
 	return nf_conntrack_helper_register(&pptp, &pptp_ptr);
 }

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index c606d1f60b58..5ec3a4a4bbd7 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -897,11 +897,10 @@ static int refresh_signalling_expectation(struct nf_conn *ct,
 		    exp->tuple.dst.protonum != proto ||
 		    exp->tuple.dst.u.udp.port != port)
 			continue;
-		if (mod_timer_pending(&exp->timeout,
jiffies + expires * HZ)) {
-			exp->flags &= ~NF_CT_EXPECT_INACTIVE;
-			found = 1;
-			break;
-		}
+		WRITE_ONCE(exp->timeout, nfct_time_stamp + (expires * HZ));
+		WRITE_ONCE(exp->flags, exp->flags & ~NF_CT_EXPECT_INACTIVE);
+		found = 1;
+		break;
 	}
 	spin_unlock_bh(&nf_conntrack_expect_lock);
 	return found;
@@ -920,8 +919,7 @@ static void flush_expectations(struct nf_conn *ct, bool media)
 	hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) {
 		if ((exp->class != SIP_EXPECT_SIGNALLING) ^ media)
 			continue;
-		if (!nf_ct_remove_expect(exp))
-			continue;
+		nf_ct_unlink_expect(exp);
 		if (!media)
 			break;
 	}
@@ -1413,7 +1411,6 @@ static int process_register_request(struct sk_buff *skb, unsigned int protoff,

 	nf_ct_expect_init(exp, SIP_EXPECT_SIGNALLING, nf_ct_l3num(ct),
 			  saddr, &daddr, proto, NULL, &port);
-	exp->timeout.expires = sip_timeout * HZ;
 	rcu_assign_pointer(exp->assign_helper, helper);
 	exp->flags = NF_CT_EXPECT_PERMANENT | NF_CT_EXPECT_INACTIVE;

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 785d8c244a77..99c5b9d671a0 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -505,8 +505,13 @@ static u32 nf_flow_table_tcp_timeout(const struct nf_conn *ct)
  */
 static void nf_flow_table_extend_ct_timeout(struct nf_conn *ct)
 {
-	static const u32 min_timeout = 5 * 60 * HZ;
-	u32 expires = nf_ct_expires(ct);
+	static const s32 min_timeout = 5 * 60 * HZ;
+	u32 ct_timeout = READ_ONCE(ct->timeout);
+	s32 expires;
+
+	expires = ct_timeout - nfct_time_stamp;
+	if (expires <= 0) /* already expired */
+		return;

 	/* normal case: large enough timeout, nothing to do.
*/
 	if (likely(expires >= min_timeout))
@@ -524,7 +529,7 @@ static void nf_flow_table_extend_ct_timeout(struct nf_conn *ct)
 	if (nf_ct_is_confirmed(ct) &&
 	    test_bit(IPS_OFFLOAD_BIT, &ct->status)) {
 		u8 l4proto = nf_ct_protonum(ct);
-		u32 new_timeout = true;
+		u32 new_timeout = 1;

 		switch (l4proto) {
 		case IPPROTO_UDP:
@@ -549,7 +554,7 @@ static void nf_flow_table_extend_ct_timeout(struct nf_conn *ct)
 		 */
 		if (new_timeout) {
 			new_timeout += nfct_time_stamp;
-			cmpxchg(&ct->timeout, expires, new_timeout);
+			cmpxchg(&ct->timeout, ct_timeout, new_timeout);
 		}
 	}

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9c05a50d6013..29e93ac1e2e4 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -326,8 +326,10 @@ static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
 		return false;

 	iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
-	size = iph->ihl << 2;
+	if (iph->ihl < 5)
+		return false;

+	size = iph->ihl << 2;
 	if (ip_is_fragment(iph) || unlikely(ip_has_options(size)))
 		return false;

@@ -335,9 +337,9 @@ static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
 		return false;

 	if (iph->protocol == IPPROTO_IPIP) {
-		ctx->tun.proto = IPPROTO_IPIP;
+		ctx->tun.proto = iph->protocol;
 		ctx->tun.hdr_size = size;
-		ctx->offset += size;
+		ctx->offset += ctx->tun.hdr_size;
 	}

 	return true;
@@ -347,29 +349,23 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
 				     struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_IPV6)
-	struct ipv6hdr *ip6h, _ip6h;
-	__be16 frag_off;
-	u8 nexthdr;
-	int hdrlen;
+	struct ipv6hdr *ip6h;

-	ip6h = skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h);
-	if (!ip6h)
+	if (!pskb_may_pull(skb, sizeof(*ip6h) + ctx->offset))
 		return false;

+	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
 	if (ip6h->hop_limit <= 1)
 		return false;

-	nexthdr = ip6h->nexthdr;
-	hdrlen = ipv6_skip_exthdr(skb, sizeof(*ip6h) + ctx->offset, &nexthdr,
-				  &frag_off);
-	if
(hdrlen < 0)
+	if (ipv6_ext_hdr(ip6h->nexthdr))
 		return false;

-	if (nexthdr == IPPROTO_IPV6) {
-		ctx->tun.hdr_size = hdrlen;
-		ctx->tun.proto = IPPROTO_IPV6;
+	if (ip6h->nexthdr == IPPROTO_IPV6) {
+		ctx->tun.proto = ip6h->nexthdr;
+		ctx->tun.hdr_size = sizeof(*ip6h);
+		ctx->offset += ctx->tun.hdr_size;
 	}
-	ctx->offset += ctx->tun.hdr_size;

 	return true;
 #else
@@ -648,25 +644,19 @@ static int nf_flow_tunnel_v4_push(struct net *net, struct sk_buff *skb,
 	return 0;
 }

-struct ipv6_tel_txoption {
-	struct ipv6_txoptions ops;
-	__u8 dst_opt[8];
-};
-
 static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
 				      struct flow_offload_tuple *tuple,
-				      struct in6_addr **ip6_daddr,
-				      int encap_limit)
+				      struct in6_addr **ip6_daddr)
 {
 	struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb);
-	u8 hop_limit = ip6h->hop_limit, proto = IPPROTO_IPV6;
 	struct rtable *rt = dst_rtable(tuple->dst_cache);
 	__u8 dsfield = ipv6_get_dsfield(ip6h);
 	struct flowi6 fl6 = {
 		.daddr = tuple->tun.src_v6,
 		.saddr = tuple->tun.dst_v6,
-		.flowi6_proto = proto,
+		.flowi6_proto = IPPROTO_IPV6,
 	};
+	u8 hop_limit = ip6h->hop_limit;
 	int err, mtu;
 	u32 headroom;

@@ -674,41 +664,18 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
 	if (err)
 		return err;

-	skb_set_inner_ipproto(skb, proto);
+	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 	headroom = sizeof(*ip6h) + LL_RESERVED_SPACE(rt->dst.dev) +
 		   rt->dst.header_len;
-	if (encap_limit)
-		headroom += 8;
 	err = skb_cow_head(skb, headroom);
 	if (err)
 		return err;

 	skb_scrub_packet(skb, true);
 	mtu = dst_mtu(&rt->dst) - sizeof(*ip6h);
-	if (encap_limit)
-		mtu -= 8;
 	mtu = max(mtu, IPV6_MIN_MTU);
 	skb_dst_update_pmtu_no_confirm(skb, mtu);

-	if (encap_limit > 0) {
-		struct ipv6_tel_txoption opt = {
-			.dst_opt[2] = IPV6_TLV_TNL_ENCAP_LIMIT,
-			.dst_opt[3] = 1,
-			.dst_opt[4] = encap_limit,
-			.dst_opt[5] = IPV6_TLV_PADN,
-			.dst_opt[6] = 1,
-		};
-		struct ipv6_opt_hdr *hopt;
-
-		opt.ops.dst1opt = (struct ipv6_opt_hdr *)opt.dst_opt;
-
opt.ops.opt_nflen = 8;
-
-		hopt = skb_push(skb, ipv6_optlen(opt.ops.dst1opt));
-		memcpy(hopt, opt.ops.dst1opt, ipv6_optlen(opt.ops.dst1opt));
-		hopt->nexthdr = IPPROTO_IPV6;
-		proto = NEXTHDR_DEST;
-	}
-
 	skb_push(skb, sizeof(*ip6h));
 	skb_reset_network_header(skb);

@@ -716,7 +683,7 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
 	ip6_flow_hdr(ip6h, dsfield,
 		     ip6_make_flowlabel(net, skb, fl6.flowlabel, true, &fl6));
 	ip6h->hop_limit = hop_limit;
-	ip6h->nexthdr = proto;
+	ip6h->nexthdr = IPPROTO_IPV6;
 	ip6h->daddr = tuple->tun.src_v6;
 	ip6h->saddr = tuple->tun.dst_v6;
 	ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(*ip6h));
@@ -729,12 +696,10 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,

 static int nf_flow_tunnel_v6_push(struct net *net, struct sk_buff *skb,
 				  struct flow_offload_tuple *tuple,
-				  struct in6_addr **ip6_daddr,
-				  int encap_limit)
+				  struct in6_addr **ip6_daddr)
 {
 	if (tuple->tun_num)
-		return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr,
-						  encap_limit);
+		return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr);

 	return 0;
 }
@@ -1089,7 +1054,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
 static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 					struct nf_flowtable *flow_table,
 					struct flow_offload_tuple_rhash *tuplehash,
-					struct sk_buff *skb, int encap_limit)
+					struct sk_buff *skb)
 {
 	enum flow_offload_tuple_dir dir;
 	struct flow_offload *flow;
@@ -1100,11 +1065,8 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);

 	mtu = flow->tuplehash[dir].tuple.mtu + ctx->offset;
-	if (flow->tuplehash[!dir].tuple.tun_num) {
+	if (flow->tuplehash[!dir].tuple.tun_num)
 		mtu -= sizeof(*ip6h);
-		if (encap_limit > 0)
-			mtu -= 8; /* encap limit option */
-	}

 	if (unlikely(nf_flow_exceeds_mtu(skb, mtu)))
 		return 0;
@@ -1158,7 +1120,6 @@ unsigned int
 nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 			  const struct nf_hook_state *state)
 {
-	int encap_limit = IPV6_DEFAULT_TNL_ENCAP_LIMIT;
 	struct flow_offload_tuple_rhash *tuplehash;
 	struct nf_flowtable *flow_table = priv;
 	struct flow_offload_tuple *other_tuple;
@@ -1177,8 +1138,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	if (tuplehash == NULL)
 		return NF_ACCEPT;

-	ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb,
-					   encap_limit);
+	ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb);
 	if (ret < 0)
 		return NF_DROP;
 	else if (ret == 0)
@@ -1198,7 +1158,7 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	ip6_daddr = &other_tuple->src_v6;

 	if (nf_flow_tunnel_v6_push(state->net, skb, other_tuple,
-				   &ip6_daddr, encap_limit) < 0)
+				   &ip6_daddr) < 0)
 		return NF_DROP;

 	switch (tuplehash->tuple.xmit_type) {
diff --git a/net/netfilter/nf_flow_table_path.c b/net/netfilter/nf_flow_table_path.c
index 1e7e216b9f89..98c03b487f52 100644
--- a/net/netfilter/nf_flow_table_path.c
+++ b/net/netfilter/nf_flow_table_path.c
@@ -53,8 +53,10 @@ static int nft_dev_fill_forward_path(const struct nf_flow_route *route,
 	struct neighbour *n;
 	u8 nud_state;

-	if (!nft_is_valid_ether_device(dev))
+	if (!nft_is_valid_ether_device(dev)) {
+		eth_zero_addr(ha);
 		goto out;
+	}

 	n = dst_neigh_lookup(dst_cache, daddr);
 	if (!n)
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 2bbf5163c0e2..63ff6b4d5d21 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -1181,6 +1181,16 @@ int nf_nat_register_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
 	struct nf_hook_ops *nat_ops;
 	int i, ret;

+#ifndef MODULE
+	/* If nf_nat_core is built-in and nf_nat_init() fails, dependent
+	 * modules like nft_chain_nat.ko may still call this function.
+	 * However, nat_net would be invalid, likely pointing to some other
+	 * per-net structure.
+	 */
+	if (WARN_ON_ONCE(!nf_nat_hook))
+		return -EOPNOTSUPP;
+#endif
+
 	if (WARN_ON_ONCE(pf >= ARRAY_SIZE(nat_net->nat_proto_net)))
 		return -EINVAL;

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 57b450024a99..73363ceedebe 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -68,6 +68,7 @@ static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 		nf_queue_sock_put(state->sk);

 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	dev_put(entry->bridge_dev);
 	dev_put(entry->physin);
 	dev_put(entry->physout);
 #endif
@@ -84,6 +85,8 @@ static void __nf_queue_entry_init_physdevs(struct nf_queue_entry *entry)
 {
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	const struct sk_buff *skb = entry->skb;
+	struct dst_entry *dst = skb_dst(skb);
+	struct net_device *dev = NULL;

 	if (nf_bridge_info_exists(skb)) {
 		entry->physin = nf_bridge_get_physindev(skb, entry->state.net);
@@ -92,6 +95,16 @@ static void __nf_queue_entry_init_physdevs(struct nf_queue_entry *entry)
 		entry->physin = NULL;
 		entry->physout = NULL;
 	}
+
+	if (entry->state.pf == NFPROTO_BRIDGE &&
+	    dst && (dst->flags & DST_FAKE_RTABLE))
+		dev = dst_dev_rcu(dst);
+
+	/* Must hold a reference on the bridge device: dst_hold() protects
+	 * the dst itself, but the fake rtable is embedded in bridge-private
+	 * storage that netdevice teardown can free independently.
+	 */
+	entry->bridge_dev = dev;
 #endif
 }

@@ -108,6 +121,7 @@ bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 	dev_hold(state->out);

 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	dev_hold(entry->bridge_dev);
 	dev_hold(entry->physin);
 	dev_hold(entry->physout);
 #endif
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c5e29fec419b..80ca077b81bd 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -1262,6 +1262,9 @@ dev_cmp(struct nf_queue_entry *entry, unsigned long ifindex)

 	if (physinif == ifindex || physoutif == ifindex)
 		return 1;
+
+	if (entry->bridge_dev && entry->bridge_dev->ifindex == ifindex)
+		return 1;
 #endif
 	if (entry->skb_dev && entry->skb_dev->ifindex == ifindex)
 		return 1;
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 0caa9304d2d0..63864b928259 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -397,6 +397,22 @@ static int nft_target_validate(const struct nft_ctx *ctx,
 	return 0;
 }

+static int nft_target_bridge_validate(const struct nft_ctx *ctx,
+				      const struct nft_expr *expr)
+{
+	struct xt_target *target = expr->ops->data;
+
+	/* Do not allow UNSPEC to stand-in for NFPROTO_BRIDGE
+	 * targets: they are incompatible. ebtables targets return
+	 * EBT_ACCEPT, DROP and so on which are not compatible with
+	 * NF_ACCEPT, NF_DROP and so on.
+	 */
+	if (target->family != NFPROTO_BRIDGE)
+		return -ENOENT;
+
+	return nft_target_validate(ctx, expr);
+}
+
 static void __nft_match_eval(const struct nft_expr *expr,
 			     struct nft_regs *regs,
 			     const struct nft_pktinfo *pkt,
@@ -932,13 +948,15 @@ nft_target_select_ops(const struct nft_ctx *ctx,
 	ops->init = nft_target_init;
 	ops->destroy = nft_target_destroy;
 	ops->dump = nft_target_dump;
-	ops->validate = nft_target_validate;
 	ops->data = target;

-	if (family == NFPROTO_BRIDGE)
+	if (family == NFPROTO_BRIDGE) {
 		ops->eval = nft_target_eval_bridge;
-	else
+		ops->validate = nft_target_bridge_validate;
+	} else {
 		ops->eval = nft_target_eval_xt;
+		ops->validate = nft_target_validate;
+	}

 	return ops;
 err:
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 25934c6f01fb..03a88c77e0f0 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1145,7 +1145,6 @@ static void nft_ct_helper_obj_eval(struct nft_object *obj,
 	help = nf_ct_helper_ext_add(ct, GFP_ATOMIC);
 	if (help && refcount_inc_not_zero(&to_assign->ct_refcnt)) {
 		rcu_assign_pointer(help->helper, to_assign);
-		set_bit(IPS_HELPER_BIT, &ct->status);

 		if ((ct->status & IPS_NAT_MASK) && !nfct_seqadj(ct))
 			if (!nfct_seqadj_ext_add(ct))
@@ -1216,11 +1215,23 @@ struct nft_ct_expect_obj {
 	u32		timeout;
 };

+static int nft_ct_expect_timeout_get(const struct nlattr *attr, u32 *val)
+{
+	unsigned long jiffies_val = msecs_to_jiffies(nla_get_u32(attr));
+
+	if (jiffies_val > UINT_MAX)
+		return -ERANGE;
+
+	*val = jiffies_val;
+	return 0;
+}
+
 static int nft_ct_expect_obj_init(const struct nft_ctx *ctx,
 				  const struct nlattr * const tb[],
 				  struct nft_object *obj)
 {
 	struct nft_ct_expect_obj *priv = nft_obj_data(obj);
+	int err;

 	if (!tb[NFTA_CT_EXPECT_L4PROTO] ||
 	    !tb[NFTA_CT_EXPECT_DPORT] ||
@@ -1255,8 +1266,11 @@ static int nft_ct_expect_obj_init(const struct nft_ctx *ctx,
 		return -EOPNOTSUPP;
 	}

+	err = nft_ct_expect_timeout_get(tb[NFTA_CT_EXPECT_TIMEOUT], &priv->timeout);
+	if (err)
+		return err;
+
 	priv->dport =
nla_get_be16(tb[NFTA_CT_EXPECT_DPORT]);
-	priv->timeout = nla_get_u32(tb[NFTA_CT_EXPECT_TIMEOUT]);
 	priv->size = nla_get_u8(tb[NFTA_CT_EXPECT_SIZE]);

 	return nf_ct_netns_get(ctx->net, ctx->family);
@@ -1276,7 +1290,7 @@ static int nft_ct_expect_obj_dump(struct sk_buff *skb,
 	if (nla_put_be16(skb, NFTA_CT_EXPECT_L3PROTO, htons(priv->l3num)) ||
 	    nla_put_u8(skb, NFTA_CT_EXPECT_L4PROTO, priv->l4proto) ||
 	    nla_put_be16(skb, NFTA_CT_EXPECT_DPORT, priv->dport) ||
-	    nla_put_u32(skb, NFTA_CT_EXPECT_TIMEOUT, priv->timeout) ||
+	    nla_put_u32(skb, NFTA_CT_EXPECT_TIMEOUT, jiffies_to_msecs(priv->timeout)) ||
 	    nla_put_u8(skb, NFTA_CT_EXPECT_SIZE, priv->size))
 		return -1;

@@ -1326,7 +1340,7 @@ static void nft_ct_expect_obj_eval(struct nft_object *obj,
 			  &ct->tuplehash[!dir].tuple.src.u3,
 			  &ct->tuplehash[!dir].tuple.dst.u3,
 			  priv->l4proto, NULL, &priv->dport);
-	exp->timeout.expires = jiffies + priv->timeout * HZ;
+	exp->timeout += priv->timeout;

 	if (nf_ct_expect_related(exp, 0) != 0)
 		regs->verdict.code = NF_DROP;
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 9b5821c64442..0a43e0787a68 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -635,8 +635,8 @@ static int nft_meta_get_validate_xfrm(const struct nft_ctx *ctx)
 #endif
 }

-static int nft_meta_get_validate(const struct nft_ctx *ctx,
-				 const struct nft_expr *expr)
+int nft_meta_get_validate(const struct nft_ctx *ctx,
+			  const struct nft_expr *expr)
 {
 	const struct nft_meta *priv = nft_expr_priv(expr);

@@ -652,6 +652,7 @@ static int nft_meta_get_validate(const struct nft_ctx *ctx,

 	return 0;
 }
+EXPORT_SYMBOL_GPL(nft_meta_get_validate);

 int nft_meta_set_validate(const struct nft_ctx *ctx,
 			  const struct nft_expr *expr)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index ef2a80dfc68f..345eff140d56 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -224,11 +224,17 @@ static int nft_payload_init(const struct nft_ctx *ctx,
 			    const struct nlattr * const tb[])
 {
 	struct nft_payload *priv = nft_expr_priv(expr);
+	u32 offset;
+	int err;

 	priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
-	priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
 	priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));

+	err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U16_MAX, &offset);
+	if (err < 0)
+		return err;
+	priv->offset = offset;
+
 	return nft_parse_register_store(ctx, tb[NFTA_PAYLOAD_DREG],
 					&priv->dreg, NULL, NFT_DATA_VALUE,
 					priv->len);
@@ -621,7 +627,8 @@ static int nft_payload_inner_init(const struct nft_ctx *ctx,
 				  const struct nlattr * const tb[])
 {
 	struct nft_payload *priv = nft_expr_priv(expr);
-	u32 base;
+	u32 base, offset;
+	int err;

 	if (!tb[NFTA_PAYLOAD_BASE] || !tb[NFTA_PAYLOAD_OFFSET] ||
 	    !tb[NFTA_PAYLOAD_LEN] || !tb[NFTA_PAYLOAD_DREG])
@@ -639,8 +646,11 @@ static int nft_payload_inner_init(const struct nft_ctx *ctx,
 	}

 	priv->base = base;
-	priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET]));
 	priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));
+	err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U16_MAX, &offset);
+	if (err < 0)
+		return err;
+	priv->offset = offset;

 	return nft_parse_register_store(ctx, tb[NFTA_PAYLOAD_DREG],
 					&priv->dreg, NULL, NFT_DATA_VALUE,
diff --git a/net/netfilter/nft_synproxy.c b/net/netfilter/nft_synproxy.c
index 7641f249614c..9ed288c9d168 100644
--- a/net/netfilter/nft_synproxy.c
+++ b/net/netfilter/nft_synproxy.c
@@ -24,14 +24,13 @@ static const struct nla_policy nft_synproxy_policy[NFTA_SYNPROXY_MAX + 1] = {
 static void nft_synproxy_tcp_options(struct synproxy_options *opts,
 				     const struct tcphdr *tcp,
 				     struct synproxy_net *snet,
-				     struct nf_synproxy_info *info,
-				     const struct nft_synproxy *priv)
+				     struct nf_synproxy_info *info)
 {
 	this_cpu_inc(snet->stats->syn_received);
 	if (tcp->ece && tcp->cwr)
 		opts->options |= NF_SYNPROXY_OPT_ECN;

-	opts->options &= priv->info.options;
+	opts->options &= info->options;
 	opts->mss_encode = opts->mss_option;
 	opts->mss_option = info->mss;
 	if
(opts->options & NF_SYNPROXY_OPT_TIMESTAMP)
@@ -56,7 +55,7 @@ static void nft_synproxy_eval_v4(const struct nft_synproxy *priv,

 	if (tcp->syn) {
 		/* Initial SYN from client */
-		nft_synproxy_tcp_options(opts, tcp, snet, &info, priv);
+		nft_synproxy_tcp_options(opts, tcp, snet, &info);
 		synproxy_send_client_synack(net, skb, tcp, opts);
 		consume_skb(skb);
 		regs->verdict.code = NF_STOLEN;
@@ -87,7 +86,7 @@ static void nft_synproxy_eval_v6(const struct nft_synproxy *priv,

 	if (tcp->syn) {
 		/* Initial SYN from client */
-		nft_synproxy_tcp_options(opts, tcp, snet, &info, priv);
+		nft_synproxy_tcp_options(opts, tcp, snet, &info);
 		synproxy_send_client_synack_ipv6(net, skb, tcp, opts);
 		consume_skb(skb);
 		regs->verdict.code = NF_STOLEN;
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
index 908fd5f2c3c8..eaf2511d63f0 100644
--- a/net/netfilter/xt_cluster.c
+++ b/net/netfilter/xt_cluster.c
@@ -107,7 +107,7 @@ xt_cluster_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	}

 	ct = nf_ct_get(skb, &ctinfo);
-	if (ct == NULL)
+	if (!ct || nf_ct_is_template(ct))
 		return false;

 	if (ct->master)
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c6fd9c424e8f..95697d4e16e6 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -883,7 +883,8 @@ static void ct_limit_set(const struct ovs_ct_limit_info *info,
 	struct hlist_head *head;

 	head = ct_limit_hash_bucket(info, new_ct_limit->zone);
-	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node,
+				 lockdep_ovsl_is_held()) {
 		if (ct_limit->zone == new_ct_limit->zone) {
 			hlist_replace_rcu(&ct_limit->hlist_node,
 					  &new_ct_limit->hlist_node);
diff --git a/net/psample/psample.c b/net/psample/psample.c
index 7763662036fb..c112e1f0ccac 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -476,15 +476,17 @@ void psample_sample_packet(struct psample_group *group,
 		goto error;

 	if (data_len) {
-		int nla_len =
nla_total_size(data_len);
+		int nla_len = nla_attr_size(data_len);
 		struct nlattr *nla;

 		nla = skb_put(nl_skb, nla_len);
 		nla->nla_type = PSAMPLE_ATTR_DATA;
-		nla->nla_len = nla_attr_size(data_len);
+		nla->nla_len = nla_len;

 		if (skb_copy_bits(skb, 0, nla_data(nla), data_len))
 			goto error;
+
+		skb_put_zero(nl_skb, nla_padlen(data_len));
 	}

 #ifdef CONFIG_INET
diff --git a/net/rds/send.c b/net/rds/send.c
index e5d58c29aabe..68be1bf0e0ad 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -967,6 +967,8 @@ static int rds_rm_size(struct msghdr *msg, int num_sgs,

 		switch (cmsg->cmsg_type) {
 		case RDS_CMSG_RDMA_ARGS:
+			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct rds_rdma_args)))
+				return -EINVAL;
 			if (vct->indx >= vct->len) {
 				vct->len += vct->incr;
 				tmp_iov =
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 5802f6f78723..ce946b0a03e2 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -669,7 +669,9 @@ enum rxrpc_call_event {
 enum rxrpc_call_state {
 	RXRPC_CALL_UNINITIALISED,
 	RXRPC_CALL_CLIENT_AWAIT_CONN,	/* - client waiting for connection to become available */
+	RXRPC_CALL_CLIENT_PRE_SEND,	/* - client is connected, but hasn't sent anything yet */
 	RXRPC_CALL_CLIENT_SEND_REQUEST,	/* - client sending request phase */
+	RXRPC_CALL_CLIENT_AWAIT_ACK,	/* - client awaiting ACKs of request */
 	RXRPC_CALL_CLIENT_AWAIT_REPLY,	/* - client awaiting reply */
 	RXRPC_CALL_CLIENT_RECV_REPLY,	/* - client receiving reply phase */
 	RXRPC_CALL_SERVER_PREALLOC,	/* - service preallocation */
@@ -1374,9 +1376,9 @@ static inline struct rxrpc_net *rxrpc_net(struct net *net)
 }

 /*
- * out_of_band.c
+ * oob.c
  */
-void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb);
+bool rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb);
 void rxrpc_add_pending_oob(struct rxrpc_sock *rx, struct sk_buff *skb);
 int rxrpc_sendmsg_oob(struct rxrpc_sock *rx, struct msghdr *msg, size_t len);

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index
fec59d9338b9..21be9c86d7a7 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -178,7 +178,7 @@ static void rxrpc_close_tx_phase(struct rxrpc_call *call)

 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
-		rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_AWAIT_REPLY);
+		rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_AWAIT_ACK);
 		break;
 	case RXRPC_CALL_SERVER_SEND_REPLY:
 		rxrpc_set_call_state(call, RXRPC_CALL_SERVER_AWAIT_ACK);
@@ -244,6 +244,8 @@ static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, unsigned int limi
 			break;
 	} while (req.n < limit && before(seq, send_top));

+	if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_PRE_SEND)
+		rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_SEND_REQUEST);
 	if (txb->flags & RXRPC_LAST_PACKET) {
 		rxrpc_close_tx_phase(call);
 		tq = NULL;
@@ -267,6 +269,7 @@ void rxrpc_transmit_some_data(struct rxrpc_call *call, unsigned int limit,
 		fallthrough;

 	case RXRPC_CALL_SERVER_SEND_REPLY:
+	case RXRPC_CALL_CLIENT_PRE_SEND:
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
 		if (!rxrpc_tx_window_space(call))
 			return;
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index fcb9d38bb521..817ed9acb91e 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -18,7 +18,9 @@
 const char *const rxrpc_call_states[NR__RXRPC_CALL_STATES] = {
 	[RXRPC_CALL_UNINITIALISED] = "Uninit ",
 	[RXRPC_CALL_CLIENT_AWAIT_CONN] = "ClWtConn",
+	[RXRPC_CALL_CLIENT_PRE_SEND] = "ClPreSnd",
 	[RXRPC_CALL_CLIENT_SEND_REQUEST] = "ClSndReq",
+	[RXRPC_CALL_CLIENT_AWAIT_ACK] = "ClAwtAck",
 	[RXRPC_CALL_CLIENT_AWAIT_REPLY] = "ClAwtRpl",
 	[RXRPC_CALL_CLIENT_RECV_REPLY] = "ClRcvRpl",
 	[RXRPC_CALL_SERVER_PREALLOC] = "SvPrealc",
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 9b757798dedd..48519f0de185 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -449,7 +449,7 @@ static void rxrpc_activate_one_channel(struct rxrpc_connection *conn,
 	trace_rxrpc_connect_call(call);
 	call->tx_last_sent =
ktime_get_real();
 	rxrpc_start_call_timer(call);
-	rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_SEND_REQUEST);
+	rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_PRE_SEND);
 	wake_up(&call->waitq);
 }

diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index c96ca615b787..611c790bc6d0 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -436,7 +436,7 @@ static bool rxrpc_post_challenge(struct rxrpc_connection *conn,
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_call *call = NULL;
 	struct rxrpc_sock *rx;
-	bool respond = false;
+	bool respond = false, queued = false;

 	sp->chall.conn =
 		rxrpc_get_connection(conn, rxrpc_conn_get_challenge_input);
@@ -472,8 +472,13 @@ static bool rxrpc_post_challenge(struct rxrpc_connection *conn,
 	}

 	if (call)
-		rxrpc_notify_socket_oob(call, skb);
+		queued = rxrpc_notify_socket_oob(call, skb);
 	rcu_read_unlock();
+	if (call && !queued) {
+		rxrpc_put_connection(conn, rxrpc_conn_put_challenge_input);
+		sp->chall.conn = NULL;
+		return false;
+	}

 	if (!call)
 		rxrpc_post_packet_to_conn(conn, skb);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index ce761466b02d..73cafe6bfa9f 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -181,7 +181,8 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call)
 	if (call->cong_ca_state != RXRPC_CA_SLOW_START &&
 	    call->cong_ca_state != RXRPC_CA_CONGEST_AVOIDANCE)
 		return;
-	if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY)
+	if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_ACK ||
+	    __rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY)
 		return;

 	rtt = ns_to_ktime(call->srtt_us * (NSEC_PER_USEC / 8));
@@ -236,6 +237,9 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 		call->acks_lowest_nak = to;
 	}

+	if (after(seq, to))
+		return false;
+
 	/* We may have a left over fully-consumed buffer at the front that we
 	 * couldn't drop before (rotate_and_keep below).
 	 */
@@ -247,7 +251,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 		tq = call->tx_queue;
 	}

-	do {
+	while (before_eq(seq, to)) {
 		unsigned int ix = seq - call->tx_qbase;

 		_debug("tq=%x seq=%x i=%d f=%x", tq->qbase, seq, ix, tq->bufs[ix]->flags);
@@ -317,8 +321,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 				break;
 			}
 		}
-
-	} while (before_eq(seq, to));
+	}

 	if (trace)
 		trace_rxrpc_rack_update(call, summary);
@@ -356,6 +359,7 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,

 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
 	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
 		if (reply_begun) {
 			rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_RECV_REPLY);
@@ -392,6 +396,14 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call)
 		trace_rxrpc_timer_can(call, rxrpc_timer_trace_delayed_ack);
 	}

+	/* Deal with an apparent reply coming in before we've got the request
+	 * queued or transmitted.
+	 */
+	if (!test_bit(RXRPC_CALL_EXPOSED, &call->flags)) {
+		rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply);
+		return false;
+	}
+
 	if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) {
 		if (!rxrpc_rotate_tx_window(call, top, &summary)) {
 			rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply);
@@ -694,6 +706,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)

 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
 	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
 		/* Received data implicitly ACKs all of the request
 		 * packets we sent when we're acting as a client.
@@ -1154,10 +1167,12 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	if (hard_ack + 1 == 0)
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero);

-	/* Ignore ACKs unless we are or have just been transmitting.
*/
+	/* Ignore ACKs unless we are transmitting or are waiting for
+	 * acknowledgement of the packets we've just been transmitting.
+	 */
 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
-	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
 	case RXRPC_CALL_SERVER_SEND_REPLY:
 	case RXRPC_CALL_SERVER_AWAIT_ACK:
 		break;
@@ -1215,7 +1230,17 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	struct rxrpc_ack_summary summary = { 0 };

-	if (rxrpc_rotate_tx_window(call, call->tx_top, &summary))
+	switch (__rxrpc_call_state(call)) {
+	case RXRPC_CALL_CLIENT_SEND_REQUEST:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
+	case RXRPC_CALL_SERVER_SEND_REPLY:
+	case RXRPC_CALL_SERVER_AWAIT_ACK:
+		break;
+	default:
+		return;
+	}
+
+	if (rxrpc_rotate_tx_window(call, call->tx_transmitted, &summary))
 		rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall);
 }

diff --git a/net/rxrpc/oob.c b/net/rxrpc/oob.c
index 05ca9c1faa57..c80ee2487d09 100644
--- a/net/rxrpc/oob.c
+++ b/net/rxrpc/oob.c
@@ -32,11 +32,12 @@ struct rxrpc_oob_params {
  * Post an out-of-band message for attention by the socket or kernel service
  * associated with a reference call.
  */
-void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
+bool rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_sock *rx;
 	struct sock *sk;
+	bool queued = false;

 	rcu_read_lock();

@@ -49,6 +50,7 @@ void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
 			skb->skb_mstamp_ns = rx->oob_id_counter++;
 			rxrpc_get_skb(skb, rxrpc_skb_get_post_oob);
 			skb_queue_tail(&rx->recvmsg_oobq, skb);
+			queued = true;

 			trace_rxrpc_notify_socket(call->debug_id, sp->hdr.serial);
 			if (rx->app_ops)
@@ -56,11 +58,12 @@ void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
 		}

 		spin_unlock_irq(&rx->recvmsg_lock);
-		if (!rx->app_ops && !sock_flag(sk, SOCK_DEAD))
+		if (queued && !rx->app_ops && !sock_flag(sk, SOCK_DEAD))
 			sk->sk_data_ready(sk);
 	}

 	rcu_read_unlock();
+	return queued;
 }

 /*
@@ -210,6 +213,11 @@ static int rxrpc_respond_to_oob(struct rxrpc_sock *rx,
 		break;
 	}

+	switch (skb->mark) {
+	case RXRPC_OOB_CHALLENGE:
+		rxrpc_put_connection(sp->chall.conn, rxrpc_conn_put_oob);
+		break;
+	}
 	rxrpc_free_skb(skb, rxrpc_skb_put_oob);
 	return ret;
 }
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 82614cbdb60f..efcba4b2e74f 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -27,8 +27,6 @@ void rxrpc_notify_socket(struct rxrpc_call *call)

 	_enter("%d", call->debug_id);

-	if (!list_empty(&call->recvmsg_link))
-		return;
 	if (test_bit(RXRPC_CALL_RELEASED, &call->flags)) {
 		rxrpc_see_call(call, rxrpc_call_see_notify_released);
 		return;
@@ -438,7 +436,8 @@ try_again:
 		return -EAGAIN;
 	}

-	if (list_empty(&rx->recvmsg_q)) {
+	if (list_empty(&rx->recvmsg_q) &&
+	    skb_queue_empty_lockless(&rx->recvmsg_oobq)) {
 		ret = -EWOULDBLOCK;
 		if (timeo == 0) {
 			call = NULL;
@@ -471,7 +470,7 @@ try_again:
 		release_sock(&rx->sk);
 		if (ret == -EAGAIN)
 			goto try_again;
-		goto error_no_call;
+		goto error_trace;
 	}

 	/* Find the next call and dequeue it if we're not just peeking.
If we @@ -530,8 +529,7 @@ try_again: if (test_bit(RXRPC_CALL_RELEASED, &call->flags)) { rxrpc_see_call(call, rxrpc_call_see_already_released); mutex_unlock(&call->user_mutex); - if (!(flags & MSG_PEEK)) - rxrpc_put_call(call, rxrpc_call_put_recvmsg); + rxrpc_put_call(call, rxrpc_call_put_recvmsg); goto try_again; } diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index c35de4fd75e3..ed2c9a51005a 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -366,7 +366,8 @@ reload: if (state >= RXRPC_CALL_COMPLETE) goto maybe_error; ret = -EPROTO; - if (state != RXRPC_CALL_CLIENT_SEND_REQUEST && + if (state != RXRPC_CALL_CLIENT_PRE_SEND && + state != RXRPC_CALL_CLIENT_SEND_REQUEST && state != RXRPC_CALL_SERVER_ACK_REQUEST && state != RXRPC_CALL_SERVER_SEND_REPLY) { /* Request phase complete for this client call */ diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 6158e13c98d3..be535a261fa0 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -844,11 +844,11 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, u8 family, u16 zone, bool *defrag) { enum ip_conntrack_info ctinfo; + struct tc_skb_cb cb; struct nf_conn *ct; int err = 0; bool frag; u8 proto; - u16 mru; /* Previously seen (loopback)? Ignore. 
*/ ct = nf_ct_get(skb, &ctinfo); @@ -862,12 +862,13 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err || !frag) return err; - err = nf_ct_handle_fragments(net, skb, zone, family, &proto, &mru); + cb = *tc_skb_cb(skb); + err = nf_ct_handle_fragments(net, skb, zone, family, &proto, &cb.mru); if (err) return err; *defrag = true; - tc_skb_cb(skb)->mru = mru; + *tc_skb_cb(skb) = cb; return 0; } @@ -1295,7 +1296,8 @@ static int tcf_ct_fill_params(struct net *net, if (tb[TCA_CT_ZONE]) { if (!IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)) { NL_SET_ERR_MSG_MOD(extack, "Conntrack zones isn't enabled."); - return -EOPNOTSUPP; + err = -EOPNOTSUPP; + goto err; } tcf_ct_set_key_val(tb, @@ -1308,7 +1310,8 @@ static int tcf_ct_fill_params(struct net *net, tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL); if (!tmpl) { NL_SET_ERR_MSG_MOD(extack, "Failed to allocate conntrack template"); - return -ENOMEM; + err = -ENOMEM; + goto err; } p->tmpl = tmpl; if (tb[TCA_CT_HELPER_NAME]) { diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 20f7f9ee0b35..3e67600a4a1a 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -4049,6 +4049,9 @@ struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, stru skb_do_redirect(skb); *ret = __NET_XMIT_STOLEN; return NULL; + case TC_ACT_CONSUMED: + *ret = __NET_XMIT_STOLEN; + return NULL; } return skb; diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c index d7c3254ef800..5434df6ca8ef 100644 --- a/net/sched/sch_dualpi2.c +++ b/net/sched/sch_dualpi2.c @@ -461,7 +461,7 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (IS_ERR_OR_NULL(nskb)) return qdisc_drop(skb, sch, to_free); - cnt = 1; + cnt = 0; byte_len = 0; orig_len = qdisc_pkt_len(skb); skb_list_walk_safe(nskb, nskb, next) { @@ -488,16 +488,15 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, byte_len += nskb->len; } } - if (cnt > 1) { + if (cnt > 0) { /* The caller will add 
the original skb stats to its * backlog, compensate this if any nskb is enqueued. */ - --cnt; - byte_len -= orig_len; + qdisc_tree_reduce_backlog(sch, 1 - cnt, + orig_len - byte_len); } - qdisc_tree_reduce_backlog(sch, -cnt, -byte_len); consume_skb(skb); - return err; + return cnt > 0 ? NET_XMIT_SUCCESS : err; } return dualpi2_enqueue_skb(skb, sch, to_free); } diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 3f1c510df850..ef2b4bf51564 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -594,9 +594,8 @@ void netdev_watchdog_up(struct net_device *dev) return; if (dev->watchdog_timeo <= 0) dev->watchdog_timeo = 5*HZ; - spin_lock_bh(&dev->tx_global_lock); - spin_lock(&dev->watchdog_lock); + spin_lock_bh(&dev->watchdog_lock); if (!mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + dev->watchdog_timeo))) { if (!dev->watchdog_ref_held) { @@ -605,9 +604,7 @@ void netdev_watchdog_up(struct net_device *dev) dev->watchdog_ref_held = true; } } - spin_unlock(&dev->watchdog_lock); - - spin_unlock_bh(&dev->tx_global_lock); + spin_unlock_bh(&dev->watchdog_lock); } EXPORT_SYMBOL_GPL(netdev_watchdog_up); diff --git a/net/sctp/diag.c b/net/sctp/diag.c index d758f5c3e06e..c2a0de2adf6f 100644 --- a/net/sctp/diag.c +++ b/net/sctp/diag.c @@ -92,6 +92,7 @@ static int inet_diag_msg_sctpladdrs_fill(struct sk_buff *skb, if (!--addrcnt) break; } + WARN_ON_ONCE(addrcnt); rcu_read_unlock(); return 0; @@ -373,42 +374,39 @@ static int sctp_ep_dump(struct sctp_endpoint *ep, void *p) struct sk_buff *skb = commp->skb; struct netlink_callback *cb = commp->cb; const struct inet_diag_req_v2 *r = commp->r; - struct net *net = sock_net(skb->sk); struct inet_sock *inet = inet_sk(sk); int err = 0; - if (!net_eq(sock_net(sk), net)) + lock_sock(sk); + if (ep->base.dead) goto out; - if (cb->args[4] < cb->args[1]) - goto next; - - if (!(r->idiag_states & TCPF_LISTEN) && !list_empty(&ep->asocs)) - goto next; + /* Skip eps with assocs if non-LISTEN states were 
requested, since + * they'll be dumped by sctp_sock_dump() during assoc traversal. + */ + if ((r->idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)) && + !list_empty(&ep->asocs)) + goto out; if (r->sdiag_family != AF_UNSPEC && sk->sk_family != r->sdiag_family) - goto next; + goto out; if (r->id.idiag_sport != inet->inet_sport && r->id.idiag_sport) - goto next; + goto out; if (r->id.idiag_dport != inet->inet_dport && r->id.idiag_dport) - goto next; - - if (inet_sctp_diag_fill(sk, NULL, skb, r, - sk_user_ns(NETLINK_CB(cb->skb).sk), - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - cb->nlh, commp->net_admin) < 0) { - err = 2; goto out; - } -next: - cb->args[4]++; + + err = inet_sctp_diag_fill(sk, NULL, skb, r, + sk_user_ns(NETLINK_CB(cb->skb).sk), + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + cb->nlh, commp->net_admin); out: + release_sock(sk); return err; } @@ -479,41 +477,40 @@ static void sctp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, .r = r, .net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN), }; - int pos = cb->args[2]; + int pos; /* eps hashtable dumps * args: * 0 : if it will traversal listen sock * 1 : to record the sock pos of this time's traversal - * 4 : to work as a temporary variable to traversal list */ if (cb->args[0] == 0) { - if (!(idiag_states & TCPF_LISTEN)) - goto skip; - if (sctp_for_each_endpoint(sctp_ep_dump, &commp)) - goto done; -skip: + if (idiag_states & TCPF_LISTEN) { + pos = cb->args[1]; + if (sctp_for_each_endpoint(sctp_ep_dump, net, &pos, + &commp)) { + cb->args[1] = pos; + return; + } + } cb->args[0] = 1; cb->args[1] = 0; - cb->args[4] = 0; } + if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE))) + return; + /* asocs by transport hashtable dump * args: * 1 : to record the assoc pos of this time's traversal * 2 : to record the transport pos of this time's traversal * 3 : to mark if we have dumped the ep info of the current asoc - * 4 : to work as a temporary variable to traversal list - 
* 5 : to save the sk we get from travelsing the tsp list. + * 4 : to track position within ep->asocs list in sctp_sock_dump() */ - if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE))) - goto done; - + pos = cb->args[2]; sctp_transport_traverse_process(sctp_sock_filter, sctp_sock_dump, net, &pos, &commp); cb->args[2] = pos; - -done: cb->args[1] = cb->args[4]; cb->args[4] = 0; } diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 9b23c11cbb9e..8e920cef0858 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -415,6 +415,8 @@ enum sctp_disposition sctp_sf_do_5_1B_init(struct net *net, /* Update socket peer label if first association. */ if (security_sctp_assoc_request(new_asoc, chunk->skb)) { sctp_association_free(new_asoc); + if (err_chunk) + sctp_chunk_free(err_chunk); return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands); } @@ -1606,6 +1608,8 @@ static enum sctp_disposition sctp_sf_do_unexpected_init( /* Update socket peer label if first association. */ if (security_sctp_assoc_request(new_asoc, chunk->skb)) { sctp_association_free(new_asoc); + if (err_chunk) + sctp_chunk_free(err_chunk); return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands); } @@ -1671,6 +1675,7 @@ static enum sctp_disposition sctp_sf_do_unexpected_init( * parameter type. 
 		 */
 		sctp_addto_chunk(repl, len, unk_param);
+		sctp_chunk_free(err_chunk);
 	}
 	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_ASOC, SCTP_ASOC(new_asoc));
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 66e12fb0c646..c8481461f7d8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5369,24 +5369,39 @@ struct sctp_transport *sctp_transport_get_idx(struct net *net,
 }
 int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
-			   void *p) {
-	int err = 0;
-	int hash = 0;
-	struct sctp_endpoint *ep;
+			   struct net *net, int *pos, void *p) {
+	int err, hash = 0, idx = 0, start;
 	struct sctp_hashbucket *head;
+	struct sctp_endpoint *ep;
 	for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
 	     hash++, head++) {
+		start = idx;
+again:
 		read_lock_bh(&head->lock);
 		sctp_for_each_hentry(ep, &head->chain) {
-			err = cb(ep, p);
-			if (err)
+			if (sock_net(ep->base.sk) != net)
+				continue;
+			if (idx++ >= *pos) {
+				sctp_endpoint_hold(ep);
 				break;
+			}
 		}
 		read_unlock_bh(&head->lock);
+
+		if (ep) {
+			err = cb(ep, p);
+			sctp_endpoint_put(ep);
+			if (err)
+				return err;
+			(*pos)++;
+
+			idx = start;
+			goto again;
+		}
 	}
-	return err;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(sctp_for_each_endpoint);
diff --git a/net/tipc/core.c b/net/tipc/core.c
index 434e70eabe08..315975c3be81 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -45,6 +45,7 @@
 #include "crypto.h"
 #include <linux/module.h>
+#include <linux/wait_bit.h>
 /* configurable TIPC parameters */
 unsigned int tipc_net_id __read_mostly;
@@ -118,8 +119,7 @@ static void __net_exit tipc_exit_net(struct net *net)
 #ifdef CONFIG_TIPC_CRYPTO
 	tipc_crypto_stop(&tipc_net(net)->crypto_tx);
 #endif
-	while (atomic_read(&tn->wq_count))
-		cond_resched();
+	wait_var_event(&tn->wq_count, atomic_read(&tn->wq_count) == 0);
 }
 static void __net_exit tipc_pernet_pre_exit(struct net *net)
@@ -218,6 +218,11 @@ static void __exit tipc_exit(void)
 	unregister_pernet_device(&tipc_net_ops);
 	tipc_unregister_sysctl();
+	/* TODO: Wait for all timers that called call_rcu() to finish before
+	 * calling rcu_barrier().
+	 */
+	rcu_barrier();
+
 	pr_info("Deactivated\n");
 }
diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
index 6d3b6b89b1d1..16f1ed1f6b1b 100644
--- a/net/tipc/crypto.c
+++ b/net/tipc/crypto.c
@@ -941,12 +941,20 @@ static int tipc_aead_decrypt(struct net *net, struct tipc_aead *aead,
 		goto exit;
 	}
+	/* Get net to avoid freed tipc_crypto when delete namespace */
+	if (!maybe_get_net(net)) {
+		tipc_bearer_put(b);
+		rc = -ENODEV;
+		goto exit;
+	}
+
 	/* Now, do decrypt */
 	rc = crypto_aead_decrypt(req);
 	if (rc == -EINPROGRESS || rc == -EBUSY)
 		return rc;
 	tipc_bearer_put(b);
+	put_net(net);
 exit:
 	kfree(ctx);
@@ -984,6 +992,7 @@ static void tipc_aead_decrypt_done(void *data, int err)
 	}
 	tipc_bearer_put(b);
+	put_net(net);
 }
 static inline int tipc_ehdr_size(struct tipc_ehdr *ehdr)
diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index 3e54d2df5683..b9d06595b067 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -58,6 +58,7 @@
  * @skb: request message to be (repeatedly) sent
  * @timer: timer governing period between requests
  * @timer_intv: current interval between requests (in ms)
+ * @rcu: RCU head for deferred freeing
  */
 struct tipc_discoverer {
 	u32 bearer_id;
@@ -69,6 +70,7 @@ struct tipc_discoverer {
 	struct sk_buff *skb;
 	struct timer_list timer;
 	unsigned long timer_intv;
+	struct rcu_head rcu;
 };
 /**
@@ -382,6 +384,15 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 	return 0;
 }
+static void tipc_disc_free_rcu(struct rcu_head *rp)
+{
+	struct tipc_discoverer *d = container_of(rp, struct tipc_discoverer,
+						 rcu);
+
+	kfree_skb(d->skb);
+	kfree(d);
+}
+
 /**
  * tipc_disc_delete - destroy object sending periodic link setup requests
  * @d: ptr to link dest structure
@@ -389,8 +400,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
 void tipc_disc_delete(struct tipc_discoverer *d)
 {
 	timer_shutdown_sync(&d->timer);
-	kfree_skb(d->skb);
-	kfree(d);
+	call_rcu(&d->rcu, tipc_disc_free_rcu);
 }
 /**
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 988b8a7f953a..62ae7f5b5840 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -40,6 +40,7 @@
 #include <linux/igmp.h>
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
+#include <linux/wait_bit.h>
 #include <linux/list.h>
 #include <net/sock.h>
 #include <net/ip.h>
@@ -803,6 +804,14 @@ err:
 	return err;
 }
+static void rcast_free_rcu(struct rcu_head *rcu)
+{
+	struct udp_replicast *rcast = container_of(rcu, struct udp_replicast, rcu);
+
+	dst_cache_destroy(&rcast->dst_cache);
+	kfree(rcast);
+}
+
 /* cleanup_bearer - break the socket/bearer association */
 static void cleanup_bearer(struct work_struct *work)
 {
@@ -811,19 +820,19 @@ static void cleanup_bearer(struct work_struct *work)
 	struct tipc_net *tn;
 	list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
-		dst_cache_destroy(&rcast->dst_cache);
 		list_del_rcu(&rcast->list);
-		kfree_rcu(rcast, rcu);
+		call_rcu_hurry(&rcast->rcu, rcast_free_rcu);
 	}
 	tn = tipc_net(sock_net(ub->sk));
-	dst_cache_destroy(&ub->rcast.dst_cache);
 	udp_tunnel_sock_release(ub->sk);
-	/* Note: could use a call_rcu() to avoid another synchronize_net() */
 	synchronize_net();
-	atomic_dec(&tn->wq_count);
+
+	dst_cache_destroy(&ub->rcast.dst_cache);
+	if (atomic_dec_and_test(&tn->wq_count))
+		wake_up_var(&tn->wq_count);
 	kfree(ub);
 }
diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index d9035546375e..374e1b964438 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -212,43 +212,23 @@ static int espintcp_sendskmsg_locked(struct sock *sk,
 	struct sk_msg *skmsg = &emsg->skmsg;
 	bool more = flags & MSG_MORE;
 	struct scatterlist *sg;
-	int done = 0;
 	int ret;
-	sg = &skmsg->sg.data[skmsg->sg.start];
 	do {
 		struct bio_vec bvec;
-		size_t size = sg->length - emsg->offset;
-		int offset = sg->offset + emsg->offset;
-		struct page *p;
-
-		emsg->offset = 0;
+		sg = &skmsg->sg.data[skmsg->sg.start];
 		if (sg_is_last(sg) && !more)
 			msghdr.msg_flags &= ~MSG_MORE;
-		p = sg_page(sg);
-retry:
-		bvec_set_page(&bvec, p, size, offset);
-		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
-		ret = tcp_sendmsg_locked(sk, &msghdr, size);
-		if (ret < 0) {
-			emsg->offset = offset - sg->offset;
-			skmsg->sg.start += done;
+		bvec_set_page(&bvec, sg_page(sg), sg->length, sg->offset);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, sg->length);
+		ret = tcp_sendmsg_locked(sk, &msghdr, sg->length);
+		if (ret < 0)
 			return ret;
-		}
-
-		if (ret != size) {
-			offset += ret;
-			size -= ret;
-			goto retry;
-		}
-		done++;
-		put_page(p);
-		sk_mem_uncharge(sk, sg->length);
-		sg = sg_next(sg);
-	} while (sg);
+		sk_msg_free_partial(sk, skmsg, ret);
+	} while (skmsg->sg.size);
 	memset(emsg, 0, sizeof(*emsg));
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index e4c2cd24936d..eecab337bd0a 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -467,6 +467,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 {
 	const struct xfrm_state_afinfo *afinfo;
 	struct net *net = dev_net(skb->dev);
+	struct net_device *dev = skb->dev;
 	int err;
 	__be32 seq;
 	__be32 seq_hi;
@@ -493,7 +494,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 					       LINUX_MIB_XFRMINSTATEINVALID);
 			if (encap_type == -1)
-				dev_put(skb->dev);
+				dev_put(dev);
 			goto drop;
 		}
@@ -655,16 +656,16 @@ process:
 		if (!crypto_done) {
 			spin_unlock(&x->lock);
-			dev_hold(skb->dev);
+			dev_hold(dev);
 			nexthdr = x->type->input(x, skb);
 			if (nexthdr == -EINPROGRESS) {
 				if (async)
-					dev_put(skb->dev);
+					dev_put(dev);
 				return 0;
 			}
-			dev_put(skb->dev);
+			dev_put(dev);
 			spin_lock(&x->lock);
 		}
 resume:
@@ -699,7 +700,7 @@ resume:
 		err = xfrm_inner_mode_input(x, skb);
 		if (err == -EINPROGRESS) {
 			if (async)
-				dev_put(skb->dev);
+				dev_put(dev);
 			return 0;
 		} else if (err) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
@@ -726,9 +727,12 @@ resume_decapped:
 		crypto_done = false;
 	} while (!err);
+	rcu_read_lock();
 	err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
-	if (err)
+	if (err) {
+		rcu_read_unlock();
 		goto drop;
+	}
 	nf_reset_ct(skb);
@@ -739,8 +743,9 @@ resume_decapped:
 		if (skb_valid_dst(skb))
 			skb_dst_drop(skb);
 		if (async)
-			dev_put(skb->dev);
+			dev_put(dev);
 		gro_cells_receive(&gro_cells, skb);
+		rcu_read_unlock();
 		return 0;
 	} else {
 		xo = xfrm_offload(skb);
@@ -748,23 +753,21 @@ resume_decapped:
 			xfrm_gro = xo->flags & XFRM_GRO;
 		err = -EAFNOSUPPORT;
-		rcu_read_lock();
 		afinfo = xfrm_state_afinfo_get_rcu(x->props.family);
 		if (likely(afinfo))
 			err = afinfo->transport_finish(skb, xfrm_gro || async);
-		rcu_read_unlock();
 		if (xfrm_gro) {
 			sp = skb_sec_path(skb);
 			if (sp)
 				sp->olen = 0;
 			if (skb_valid_dst(skb))
 				skb_dst_drop(skb);
-			if (async)
-				dev_put(skb->dev);
 			gro_cells_receive(&gro_cells, skb);
-			return err;
 		}
+		if (async)
+			dev_put(dev);
+		rcu_read_unlock();
 		return err;
 	}
@@ -772,7 +775,7 @@ drop_unlock:
 	spin_unlock(&x->lock);
 drop:
 	if (async)
-		dev_put(skb->dev);
+		dev_put(dev);
 	xfrm_rcv_cb(skb, family, x && x->type ? x->type->proto : nexthdr, -1);
 	kfree_skb(skb);
 	return 0;
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index 330a05286a56..688306bf62c5 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -869,6 +869,9 @@ static int xfrmi_changelink(struct net_device *dev, struct nlattr *tb[],
 	struct net *net = xi->net;
 	struct xfrm_if_parms p = {};
+	if (!rtnl_dev_link_net_capable(dev, net))
+		return -EPERM;
+
 	xfrmi_netlink_parms(data, &p);
 	if (!p.if_id) {
 		NL_SET_ERR_MSG(extack, "if_id must be non zero");
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 0ea015e1880b..7ef861a0e823 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -242,6 +242,9 @@ __xfrm6_selector_match(const struct xfrm_selector *sel, const struct flowi *fl)
 bool xfrm_selector_match(const struct xfrm_selector *sel, const struct flowi *fl,
 			 unsigned short family)
 {
+	if (family != sel->family && sel->family != AF_UNSPEC)
+		return false;
+
 	switch (family) {
 	case AF_INET:
 		return __xfrm4_selector_match(sel, fl);
@@ -685,7 +688,7 @@ static void xfrm_byidx_resize(struct net *net)
 static inline int xfrm_bydst_should_resize(struct net *net, int dir, int *total)
 {
-	unsigned int cnt = net->xfrm.policy_count[dir];
+	unsigned int cnt = READ_ONCE(net->xfrm.policy_count[dir]);
 	unsigned int hmask = net->xfrm.policy_bydst[dir].hmask;
 	if (total)
@@ -711,12 +714,12 @@ static inline int xfrm_byidx_should_resize(struct net *net, int total)
 void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si)
 {
-	si->incnt = net->xfrm.policy_count[XFRM_POLICY_IN];
-	si->outcnt = net->xfrm.policy_count[XFRM_POLICY_OUT];
-	si->fwdcnt = net->xfrm.policy_count[XFRM_POLICY_FWD];
-	si->inscnt = net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX];
-	si->outscnt = net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX];
-	si->fwdscnt = net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX];
+	si->incnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN]);
+	si->outcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT]);
+	si->fwdcnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD]);
+	si->inscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_IN+XFRM_POLICY_MAX]);
+	si->outscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT+XFRM_POLICY_MAX]);
+	si->fwdscnt = READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_FWD+XFRM_POLICY_MAX]);
 	si->spdhcnt = net->xfrm.policy_idx_hmask;
 	si->spdhmcnt = xfrm_policy_hashmax;
 }
@@ -2318,7 +2321,7 @@ static void __xfrm_policy_link(struct xfrm_policy *pol, int dir)
 	}
 	list_add(&pol->walk.all, &net->xfrm.policy_all);
-	net->xfrm.policy_count[dir]++;
+	WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] + 1);
 	xfrm_pol_hold(pol);
 }
@@ -2337,7 +2340,7 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
 	}
 	list_del_init(&pol->walk.all);
-	net->xfrm.policy_count[dir]--;
+	WRITE_ONCE(net->xfrm.policy_count[dir], net->xfrm.policy_count[dir] - 1);
 	return pol;
 }
@@ -3222,7 +3225,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
 		/* To accelerate a bit... */
 		if (!if_id && ((dst_orig->flags & DST_NOXFRM) ||
-			       !net->xfrm.policy_count[XFRM_POLICY_OUT]))
+			       !READ_ONCE(net->xfrm.policy_count[XFRM_POLICY_OUT])))
 			goto nopol;
 		xdst = xfrm_bundle_lookup(net, fl, family, dir, &xflo, if_id);
@@ -3296,7 +3299,7 @@ ok:
 nopol:
 	if ((!dst_orig->dev || !(dst_orig->dev->flags & IFF_LOOPBACK)) &&
-	    net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+	    READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
 		err = -EPERM;
 		goto error;
 	}
@@ -3750,7 +3753,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 		const bool is_crypto_offload = sp &&
 			(xfrm_input_state(skb)->xso.type == XFRM_DEV_OFFLOAD_CRYPTO);
-		if (net->xfrm.policy_default[dir] == XFRM_USERPOLICY_BLOCK) {
+		if (READ_ONCE(net->xfrm.policy_default[dir]) == XFRM_USERPOLICY_BLOCK) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOPOLS);
 			return 0;
 		}
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1df017871651..c58cd024e3c6 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1207,9 +1207,11 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 	struct hlist_head *state_cache_input;
 	struct xfrm_state *x = NULL;
+	/* BH is always disabled on the input path. */
+	lockdep_assert_in_softirq();
+
 	state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
-	rcu_read_lock();
 	hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
 		if (x->props.family != family ||
 		    x->id.spi != spi ||
@@ -1227,20 +1229,25 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 	xfrm_hash_ptrs_get(net, &state_ptrs);
 	x = __xfrm_state_lookup(&state_ptrs, mark, daddr, spi, proto, family);
-
-	if (x && x->km.state == XFRM_STATE_VALID) {
-		spin_lock_bh(&net->xfrm.xfrm_state_lock);
-		if (hlist_unhashed(&x->state_cache_input)) {
+	if (x) {
+		spin_lock(&net->xfrm.xfrm_state_lock);
+		if (x->km.state != XFRM_STATE_VALID) {
+			/*
+			 * The state is about to be destroyed.
+			 *
+			 * Don't add it to the cache but still
+			 * return it to the caller.
+			 */
+		} else if (hlist_unhashed(&x->state_cache_input)) {
 			hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
 		} else {
 			hlist_del_rcu(&x->state_cache_input);
 			hlist_add_head_rcu(&x->state_cache_input, state_cache_input);
 		}
-		spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+		spin_unlock(&net->xfrm.xfrm_state_lock);
 	}
 out:
-	rcu_read_unlock();
 	return x;
 }
 EXPORT_SYMBOL(xfrm_input_state_lookup);
@@ -3014,7 +3021,7 @@ int xfrm_user_policy(struct sock *sk, int optname, sockptr_t optval, int optlen)
 	if (IS_ERR(data))
 		return PTR_ERR(data);
-	if (in_compat_syscall()) {
+	if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
 		struct xfrm_translator *xtr = xfrm_get_translator();
 		if (!xtr) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 5975849e5893..6384795ee6b2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2511,9 +2511,9 @@ static int xfrm_notify_userpolicy(struct net *net)
 	}
 	up = nlmsg_data(nlh);
-	up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
-	up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
-	up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+	up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+	up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+	up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
 	nlmsg_end(skb, nlh);
@@ -2537,13 +2537,13 @@ static int xfrm_set_default(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct xfrm_userpolicy_default *up = nlmsg_data(nlh);
 	if (xfrm_userpolicy_is_valid(up->in))
-		net->xfrm.policy_default[XFRM_POLICY_IN] = up->in;
+		WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN], up->in);
 	if (xfrm_userpolicy_is_valid(up->fwd))
-		net->xfrm.policy_default[XFRM_POLICY_FWD] = up->fwd;
+		WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD], up->fwd);
 	if (xfrm_userpolicy_is_valid(up->out))
-		net->xfrm.policy_default[XFRM_POLICY_OUT] = up->out;
+		WRITE_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT], up->out);
 	rt_genid_bump_all(net);
@@ -2573,9 +2573,9 @@ static int xfrm_get_default(struct sk_buff *skb, struct nlmsghdr *nlh,
 	}
 	r_up = nlmsg_data(r_nlh);
-	r_up->in = net->xfrm.policy_default[XFRM_POLICY_IN];
-	r_up->fwd = net->xfrm.policy_default[XFRM_POLICY_FWD];
-	r_up->out = net->xfrm.policy_default[XFRM_POLICY_OUT];
+	r_up->in = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_IN]);
+	r_up->fwd = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_FWD]);
+	r_up->out = READ_ONCE(net->xfrm.policy_default[XFRM_POLICY_OUT]);
 	nlmsg_end(r_skb, r_nlh);
 	return nlmsg_unicast(xfrm_net_nlsk(net, skb), r_skb, portid);
@@ -3835,7 +3835,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (!netlink_net_capable(skb, CAP_NET_ADMIN))
 		return -EPERM;
-	if (in_compat_syscall()) {
+	if (IS_ENABLED(CONFIG_COMPAT_FOR_U64_ALIGNMENT) && in_compat_syscall()) {
 		struct xfrm_translator *xtr = xfrm_get_translator();
 		if (!xtr)
diff --git a/tools/net/ynl/Makefile b/tools/net/ynl/Makefile
index d514a48dae27..3cefe4ed96cb 100644
--- a/tools/net/ynl/Makefile
+++ b/tools/net/ynl/Makefile
@@ -22,7 +22,7 @@ tests: | lib generated libynl.a
 ynltool: | lib generated libynl.a
 libynl.a: | lib generated
 	@echo -e "\tAR $@"
-	@ar rcs $@ lib/ynl.o generated/*-user.o
+	@$(AR) rcs $@ lib/ynl.o generated/*-user.o
 $(SUBDIRS):
 	@if [ -f "$@/Makefile" ] ; then \
diff --git a/tools/net/ynl/Makefile.deps b/tools/net/ynl/Makefile.deps
index cc53b2f21c44..43d06ecbae93 100644
--- a/tools/net/ynl/Makefile.deps
+++ b/tools/net/ynl/Makefile.deps
@@ -14,10 +14,12 @@ UAPI_PATH:=../../../../include/uapi/
 get_hdr_inc=-D$(1) -include $(UAPI_PATH)/linux/$(2)
 get_hdr_inc2=-D$(1) -D$(2) -include $(UAPI_PATH)/linux/$(3)
+get_hdr_inc_drm=-D$(1) -include $(UAPI_PATH)/drm/$(2)
 CFLAGS_dev-energymodel:=$(call get_hdr_inc,_LINUX_DEV_ENERGYMODEL_H,dev_energymodel.h)
 CFLAGS_devlink:=$(call get_hdr_inc,_LINUX_DEVLINK_H_,devlink.h)
 CFLAGS_dpll:=$(call get_hdr_inc,_LINUX_DPLL_H,dpll.h)
+CFLAGS_drm_ras:=$(call get_hdr_inc_drm,_LINUX_DRM_RAS_H,drm_ras.h)
 CFLAGS_ethtool:=$(call get_hdr_inc,_LINUX_TYPELIMITS_H,typelimits.h) \
 	$(call get_hdr_inc,_LINUX_ETHTOOL_H,ethtool.h) \
 	$(call get_hdr_inc,_LINUX_ETHTOOL_NETLINK_H_,ethtool_netlink.h) \
diff --git a/tools/net/ynl/generated/Makefile b/tools/net/ynl/generated/Makefile
index 86e1e4a959a7..ea4128f612d6 100644
--- a/tools/net/ynl/generated/Makefile
+++ b/tools/net/ynl/generated/Makefile
@@ -37,7 +37,7 @@ all: protos.a $(HDRS) $(SRCS) $(KHDRS) $(KSRCS) $(UAPI) $(RSTS)
 protos.a: $(OBJS)
 	@echo -e "\tAR $@"
-	@ar rcs $@ $(OBJS)
+	@$(AR) rcs $@ $(OBJS)
 %-user.h: $(SPECS_DIR)/%.yaml $(TOOL)
 	@echo -e "\tGEN $@"
diff --git a/tools/net/ynl/lib/Makefile b/tools/net/ynl/lib/Makefile
index 4b2b98704ff9..9b98c0599600 100644
--- a/tools/net/ynl/lib/Makefile
+++ b/tools/net/ynl/lib/Makefile
@@ -15,7 +15,7 @@ all: ynl.a
 ynl.a: $(OBJS)
 	@echo -e "\tAR $@"
-	@ar rcs $@ $(OBJS)
+	@$(AR) rcs $@ $(OBJS)
 clean:
 	rm -f *.o *.d *~
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index bac60b444551..adb25146e88c 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -45,13 +45,16 @@ CONFIG_IPV6=y
 CONFIG_IPV6_FOU=y
 CONFIG_IPV6_FOU_TUNNEL=y
 CONFIG_IPV6_GRE=y
+CONFIG_IPV6_IOAM6_LWTUNNEL=y
 CONFIG_IPV6_SEG6_BPF=y
+CONFIG_IPV6_SEG6_LWTUNNEL=y
 CONFIG_IPV6_SIT=y
 CONFIG_IPV6_TUNNEL=y
 CONFIG_KEYS=y
 CONFIG_LIRC=y
 CONFIG_LIVEPATCH=y
 CONFIG_LWTUNNEL=y
+CONFIG_LWTUNNEL_BPF=y
 CONFIG_MODULE_SIG=y
 CONFIG_MODULE_SRCVERSION_ALL=y
 CONFIG_MODULE_UNLOAD=y
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 72875071d4f1..6eb9096d084c 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -7,7 +7,6 @@
 #include <linux/netdev.h>
 #include <poll.h>
 #include <pthread.h>
-#include <signal.h>
 #include <string.h>
 #include <sys/mman.h>
 #include <sys/socket.h>
@@ -65,11 +64,6 @@ static void gen_eth_hdr(struct xsk_socket_info *xsk, struct ethhdr *eth_hdr)
 	eth_hdr->h_proto = htons(ETH_P_LOOPBACK);
 }
-static bool is_umem_valid(struct xsk_socket_info *xsk)
-{
-	return !!xsk->umem->umem;
-}
-
 static u32 mode_to_xdp_flags(enum test_mode mode)
 {
 	return (mode == TEST_MODE_SKB) ? XDP_FLAGS_SKB_MODE : XDP_FLAGS_DRV_MODE;
@@ -1010,7 +1004,7 @@ static int __receive_pkts(struct test_spec *test, struct xsk_socket_info *xsk)
 			return TEST_FAILURE;
 		if (!ret) {
-			if (!is_umem_valid(test->ifobj_tx->xsk))
+			if (test->poll_tmout)
 				return TEST_PASS;
 			ksft_print_msg("ERROR: [%s] Poll timed out\n", __func__);
@@ -1149,7 +1143,7 @@ static int receive_pkts(struct test_spec *test)
 			break;
 		res = __receive_pkts(test, xsk);
-		if (!(res == TEST_PASS || res == TEST_CONTINUE))
+		if (res != TEST_CONTINUE)
 			return res;
 		ret = gettimeofday(&tv_now, NULL);
@@ -1166,7 +1160,8 @@ static int receive_pkts(struct test_spec *test)
 	return TEST_PASS;
 }
-static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, bool timeout)
+static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk,
+		       bool test_timeout)
 {
 	u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len;
 	struct pkt_stream *pkt_stream = xsk->pkt_stream;
@@ -1178,7 +1173,7 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
 	buffer_len = pkt_get_buffer_len(umem, pkt_stream->max_pkt_len);
 	/* pkts_in_flight might be negative if many invalid packets are sent */
 	if (pkts_in_flight >= (int)((umem_size(umem) - xsk->batch_size * buffer_len) /
-	    buffer_len)) {
+	    buffer_len) && !test_timeout) {
 		ret = kick_tx(xsk);
 		if (ret)
 			return TEST_FAILURE;
@@ -1191,7 +1186,7 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
 	while (xsk_ring_prod__reserve(&xsk->tx, xsk->batch_size, &idx) < xsk->batch_size) {
 		if (use_poll) {
 			ret = poll(&fds, 1, POLL_TMOUT);
-			if (timeout) {
+			if (test_timeout) {
 				if (ret < 0) {
 					ksft_print_msg("ERROR: [%s] Poll error %d\n",
 						       __func__, errno);
@@ -1271,7 +1266,7 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
 	if (use_poll) {
 		ret = poll(&fds, 1, POLL_TMOUT);
 		if (ret <= 0) {
-			if (ret == 0 && timeout)
+			if (ret == 0 && test_timeout)
 				return TEST_PASS;
 			ksft_print_msg("ERROR: [%s] Poll error %d\n", __func__, ret);
@@ -1279,14 +1274,14 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
 		}
 	}
-	if (!timeout) {
+	if (!test_timeout) {
 		if (complete_pkts(xsk, i))
 			return TEST_FAILURE;
 		usleep(10);
-		return TEST_PASS;
 	}
+	/* Loop completion is driven by send_pkts() stream progress checks. */
 	return TEST_CONTINUE;
 }
@@ -1322,7 +1317,6 @@ bool all_packets_sent(struct test_spec *test, unsigned long *bitmap)
 static int send_pkts(struct test_spec *test, struct ifobject *ifobject)
 {
-	bool timeout = !is_umem_valid(test->ifobj_rx->xsk);
 	DECLARE_BITMAP(bitmap, test->nb_sockets);
 	u32 i, ret;
@@ -1337,19 +1331,18 @@ static int send_pkts(struct test_spec *test, struct ifobject *ifobject)
 				__set_bit(i, bitmap);
 				continue;
 			}
-			ret = __send_pkts(ifobject, &ifobject->xsk_arr[i], timeout);
-			if (ret == TEST_CONTINUE && !test->fail)
-				continue;
-
-			if ((ret || test->fail) && !timeout)
-				return TEST_FAILURE;
-
-			if (ret == TEST_PASS && timeout)
+			ret = __send_pkts(ifobject, &ifobject->xsk_arr[i], test->poll_tmout);
+			if (ret != TEST_CONTINUE)
 				return ret;
-			ret = wait_for_tx_completion(&ifobject->xsk_arr[i]);
-			if (ret)
+			if (test->fail)
 				return TEST_FAILURE;
+
+			if (!test->poll_tmout) {
+				ret = wait_for_tx_completion(&ifobject->xsk_arr[i]);
+				if (ret)
+					return TEST_FAILURE;
+			}
 		}
 	}
@@ -1677,7 +1670,8 @@ void *worker_testapp_validate_rx(void *arg)
 			       strerror(-err));
 	}
-	pthread_barrier_wait(&barr);
+	if (test->use_barrier)
+		pthread_barrier_wait(&barr);
 	/* We leave only now in case of error to avoid getting stuck in the barrier */
 	if (err) {
@@ -1716,11 +1710,6 @@ static void testapp_clean_xsk_umem(struct ifobject *ifobj)
 	munmap(umem->buffer, umem->mmap_size);
 }
-static void handler(int signum)
-{
-	pthread_exit(NULL);
-}
-
 static bool xdp_prog_changed_rx(struct test_spec *test)
 {
 	struct ifobject *ifobj = test->ifobj_rx;
@@ -1825,9 +1814,18 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i
 		return TEST_FAILURE;
} - if (ifobj2) { + err = xsk_attach_xdp_progs(test, ifobj1, ifobj2); + if (err) { + ksft_print_msg("Error: failed to attach XDP programs: %d (%s)\n", + err, strerror(-err)); + return TEST_FAILURE; + } + test->use_barrier = !!ifobj2; + + if (test->use_barrier) { if (pthread_barrier_init(&barr, NULL, 2)) return TEST_FAILURE; + pkt_stream_reset(ifobj2->xsk->pkt_stream); } @@ -1835,27 +1833,26 @@ static int __testapp_validate_traffic(struct test_spec *test, struct ifobject *i pkt_stream_reset(ifobj1->xsk->pkt_stream); pkts_in_flight = 0; - signal(SIGUSR1, handler); /*Spawn RX thread */ pthread_create(&t0, NULL, ifobj1->func_ptr, test); - if (ifobj2) { + if (test->use_barrier) { pthread_barrier_wait(&barr); if (pthread_barrier_destroy(&barr)) { - pthread_kill(t0, SIGUSR1); + test->use_barrier = false; + pthread_join(t0, NULL); clean_sockets(test, ifobj1); clean_umem(test, ifobj1, NULL); return TEST_FAILURE; } + } + if (ifobj2) { /*Spawn TX thread */ pthread_create(&t1, NULL, ifobj2->func_ptr, test); - pthread_join(t1, NULL); } - if (!ifobj2) - pthread_kill(t0, SIGUSR1); pthread_join(t0, NULL); if (test->total_steps == test->current_step || test->fail) { @@ -1893,8 +1890,6 @@ static int testapp_validate_traffic(struct test_spec *test) } } - if (xsk_attach_xdp_progs(test, ifobj_rx, ifobj_tx)) - return TEST_FAILURE; return __testapp_validate_traffic(test, ifobj_rx, ifobj_tx); } @@ -2231,16 +2226,33 @@ int testapp_xdp_shared_umem(struct test_spec *test) int testapp_poll_txq_tmout(struct test_spec *test) { + bool shared_umem = test->ifobj_tx->shared_umem; + int ret; + + test->poll_tmout = true; + /* + * POLL_TXQ_FULL exercises TX timeout setup in isolation. + * Keep TX out of shared-UMEM mode here so TX setup does not require + * RX UMEM to be initialized first. 
+ */ + test->ifobj_tx->shared_umem = false; test->ifobj_tx->use_poll = true; /* create invalid frame by set umem frame_size and pkt length equal to 2048 */ test->ifobj_tx->xsk->umem->frame_size = 2048; - if (pkt_stream_replace(test, 2 * DEFAULT_PKT_CNT, 2048)) + if (pkt_stream_replace(test, 2 * DEFAULT_PKT_CNT, 2048)) { + test->ifobj_tx->shared_umem = shared_umem; return TEST_FAILURE; - return testapp_validate_traffic_single_thread(test, test->ifobj_tx); + } + + ret = testapp_validate_traffic_single_thread(test, test->ifobj_tx); + test->ifobj_tx->shared_umem = shared_umem; + + return ret; } int testapp_poll_rxq_tmout(struct test_spec *test) { + test->poll_tmout = true; test->ifobj_rx->use_poll = true; return testapp_validate_traffic_single_thread(test, test->ifobj_rx); } diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.h b/tools/testing/selftests/bpf/prog_tests/test_xsk.h index 4313d0d87235..03753ddc5dcd 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.h +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.h @@ -207,6 +207,8 @@ struct test_spec { bool set_ring; bool adjust_tail; bool adjust_tail_support; + bool poll_tmout; + bool use_barrier; enum test_mode mode; char name[MAX_TEST_NAME_SIZE]; }; diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c index 26159e0499c7..448807676176 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c @@ -1,6 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 #include <test_progs.h> #include <network_helpers.h> +#include <linux/ipv6.h> +#include <arpa/inet.h> #include "test_xdp_context_test_run.skel.h" #include "test_xdp_meta.skel.h" @@ -8,9 +10,12 @@ #define TX_NAME "veth1" #define TX_NETNS "xdp_context_tx" #define RX_NETNS "xdp_context_rx" +#define RX_MAC "02:00:00:00:00:01" +#define TX_MAC "02:00:00:00:00:02" #define TAP_NAME "tap0" #define 
DUMMY_NAME "dum0" #define TAP_NETNS "xdp_context_tuntap" +#define LWT_NETNS "xdp_context_lwt" #define TEST_PAYLOAD_LEN 32 static const __u8 test_payload[TEST_PAYLOAD_LEN] = { @@ -187,6 +192,42 @@ static int write_test_packet(int tap_fd) return 0; } +/* Inject Ethernet+IPv6+UDP frame into TAP */ +static int write_test_packet_udp(int tap_fd) +{ + __u8 pkt[sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + + sizeof(struct udphdr) + TEST_PAYLOAD_LEN] = {}; + struct ethhdr *eth = (void *)pkt; + struct ipv6hdr *ip6 = (void *)(eth + 1); + struct udphdr *udp = (void *)(ip6 + 1); + __u8 *payload = (void *)(udp + 1); + const __u8 tap_mac[ETH_ALEN] = { 0x02, 0, 0, 0, 0, 0x01 }; + int n; + + memcpy(eth->h_dest, tap_mac, ETH_ALEN); + eth->h_proto = htons(ETH_P_IPV6); + + ip6->version = 6; + ip6->hop_limit = 64; + ip6->nexthdr = IPPROTO_UDP; + ip6->payload_len = htons(sizeof(*udp) + TEST_PAYLOAD_LEN); + inet_pton(AF_INET6, "fd00::2", &ip6->saddr); + inet_pton(AF_INET6, "fd00:1::1", &ip6->daddr); + + udp->source = htons(42); + udp->dest = htons(42); + udp->len = htons(sizeof(*udp) + TEST_PAYLOAD_LEN); + /* UDP checksum is not validated on the forwarding path. */ + + memcpy(payload, test_payload, TEST_PAYLOAD_LEN); + + n = write(tap_fd, pkt, sizeof(pkt)); + if (!ASSERT_EQ(n, sizeof(pkt), "write frame")) + return -1; + + return 0; +} + static void dump_err_stream(const struct bpf_program *prog) { char buf[512]; @@ -518,3 +559,137 @@ void test_xdp_context_tuntap(void) test_xdp_meta__destroy(skel); } + +/* + * Test topology: + * + * tap0 fd00::1 + * RX: injected IPv6 UDP frame, XDP ingress sets metadata + * fwd: encap route prepends outer header(s) + * TX: TC egress validates metadata + * + * A routable IPv6 UDP frame is written into the tap fd, so it enters the RX + * path where XDP stores metadata. Routing then forwards it back out the same + * tap through an encapsulating route that prepends outer header(s). 
The TC + * egress program checks that the pushed header did not silently corrupt + * metadata. + */ +#define LWT_PIN_PATH "/sys/fs/bpf/xdp_context_lwt_xmit" + +enum lwt_encap_type { + LWT_ENCAP_BPF, + LWT_ENCAP_MPLS, + LWT_ENCAP_SEG6, + LWT_ENCAP_IOAM6, +}; + +static void test_lwt_encap(struct test_xdp_meta *skel, + enum lwt_encap_type type) +{ + LIBBPF_OPTS(bpf_tc_hook, tc_hook, .attach_point = BPF_TC_EGRESS); + LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1); + struct bpf_program *lwt_prog = NULL; + struct netns_obj *ns = NULL; + const char *encap; + bool pinned = false; + int tap_ifindex; + int tap_fd = -1; + int ret; + + skel->bss->test_pass = false; + + switch (type) { + case LWT_ENCAP_BPF: + encap = "encap bpf xmit pinned " LWT_PIN_PATH " via fd00::2"; + lwt_prog = skel->progs.dummy_lwt_xmit; + break; + case LWT_ENCAP_MPLS: + encap = "encap mpls 100 via inet6 fd00::2"; + break; + case LWT_ENCAP_SEG6: + encap = "encap seg6 mode encap segs fd00::2"; + break; + case LWT_ENCAP_IOAM6: + encap = "encap ioam6 mode encap tundst fd00::2 " + "trace prealloc type 0x800000 ns 0 size 4 via fd00::2"; + break; + default: + return; + } + + if (lwt_prog) { + unlink(LWT_PIN_PATH); + ret = bpf_program__pin(lwt_prog, LWT_PIN_PATH); + if (!ASSERT_OK(ret, "pin lwt prog")) + return; + pinned = true; + } + + ns = netns_new(LWT_NETNS, true); + if (!ASSERT_OK_PTR(ns, "netns_new")) + goto close; + + tap_fd = open_tuntap(TAP_NAME, true); + if (!ASSERT_GE(tap_fd, 0, "open_tuntap")) + goto close; + + SYS(close, "ip link set dev " TAP_NAME " address " RX_MAC); + SYS(close, "sysctl -wq net.ipv6.conf.all.forwarding=1"); + SYS(close, "ip addr add fd00::1/64 dev " TAP_NAME " nodad"); + SYS(close, "ip link set dev " TAP_NAME " up"); + SYS(close, "ip neigh add fd00::2 lladdr " TX_MAC " nud permanent dev " TAP_NAME); + SYS(close, "ip -6 route add fd00:1::/64 %s dev %s", encap, TAP_NAME); + + tap_ifindex = if_nametoindex(TAP_NAME); + if (!ASSERT_GE(tap_ifindex, 0, "if_nametoindex")) 
+ goto close; + + ret = bpf_xdp_attach(tap_ifindex, bpf_program__fd(skel->progs.ing_xdp), + 0, NULL); + if (!ASSERT_GE(ret, 0, "bpf_xdp_attach")) + goto close; + + tc_hook.ifindex = tap_ifindex; + ret = bpf_tc_hook_create(&tc_hook); + if (!ASSERT_OK(ret, "bpf_tc_hook_create")) + goto close; + + tc_opts.prog_fd = bpf_program__fd(skel->progs.tc_is_meta_empty); + ret = bpf_tc_attach(&tc_hook, &tc_opts); + if (!ASSERT_OK(ret, "bpf_tc_attach")) + goto close; + + ret = write_test_packet_udp(tap_fd); + if (!ASSERT_OK(ret, "write_test_packet_udp")) + goto close; + + if (!ASSERT_TRUE(skel->bss->test_pass, "test_pass")) + dump_err_stream(skel->progs.tc_is_meta_empty); + +close: + if (tap_fd >= 0) + close(tap_fd); + netns_free(ns); + if (pinned) + unlink(LWT_PIN_PATH); +} + +void test_xdp_context_lwt_encap(void) +{ + struct test_xdp_meta *skel; + + skel = test_xdp_meta__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open and load skeleton")) + return; + + if (test__start_subtest("bpf_encap")) + test_lwt_encap(skel, LWT_ENCAP_BPF); + if (test__start_subtest("mpls_encap")) + test_lwt_encap(skel, LWT_ENCAP_MPLS); + if (test__start_subtest("seg6_encap")) + test_lwt_encap(skel, LWT_ENCAP_SEG6); + if (test__start_subtest("ioam6_encap")) + test_lwt_encap(skel, LWT_ENCAP_IOAM6); + + test_xdp_meta__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_xdp_meta.c b/tools/testing/selftests/bpf/progs/test_xdp_meta.c index fa73b17cb999..08b03be0b891 100644 --- a/tools/testing/selftests/bpf/progs/test_xdp_meta.c +++ b/tools/testing/selftests/bpf/progs/test_xdp_meta.c @@ -21,10 +21,6 @@ bool test_pass; -static const __u8 smac_want[ETH_ALEN] = { - 0x12, 0x34, 0xDE, 0xAD, 0xBE, 0xEF, -}; - static const __u8 meta_want[META_SIZE] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, @@ -32,11 +28,6 @@ static const __u8 meta_want[META_SIZE] = { 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, }; -static bool check_smac(const struct 
ethhdr *eth) -{ - return !__builtin_memcmp(eth->h_source, smac_want, ETH_ALEN); -} - static bool check_metadata(const char *file, int line, __u8 *meta_have) { if (!__builtin_memcmp(meta_have, meta_want, META_SIZE)) @@ -280,18 +271,47 @@ fail: return TC_ACT_SHOT; } +/* Test packets carry test metadata pattern as payload. */ +static bool is_test_packet_xdp(struct xdp_md *ctx) +{ + __u8 meta_have[META_SIZE]; + __u32 len; + + len = bpf_xdp_get_buff_len(ctx); + if (len < META_SIZE) + return false; + if (bpf_xdp_load_bytes(ctx, len - META_SIZE, meta_have, META_SIZE)) + return false; + if (__builtin_memcmp(meta_have, meta_want, META_SIZE)) + return false; + + return true; +} + +/* Test packets carry test metadata pattern as payload. */ +static bool is_test_packet_tc(struct __sk_buff *ctx) +{ + __u8 meta_have[META_SIZE]; + + if (ctx->len < META_SIZE) + return false; + if (bpf_skb_load_bytes(ctx, ctx->len - META_SIZE, meta_have, META_SIZE)) + return false; + if (__builtin_memcmp(meta_have, meta_want, META_SIZE)) + return false; + + return true; +} + /* Reserve and clear space for metadata but don't populate it */ SEC("xdp") int ing_xdp_zalloc_meta(struct xdp_md *ctx) { - struct ethhdr *eth = ctx_ptr(ctx, data); __u8 *meta; int ret; /* Drop any non-test packets */ - if (eth + 1 > ctx_ptr(ctx, data_end)) - return XDP_DROP; - if (!check_smac(eth)) + if (!is_test_packet_xdp(ctx)) return XDP_DROP; ret = bpf_xdp_adjust_meta(ctx, -META_SIZE); @@ -310,33 +330,24 @@ int ing_xdp_zalloc_meta(struct xdp_md *ctx) SEC("xdp") int ing_xdp(struct xdp_md *ctx) { - __u8 *data, *data_meta, *data_end, *payload; - struct ethhdr *eth; + __u8 *data, *data_meta; int ret; + /* Drop any non-test packets */ + if (!is_test_packet_xdp(ctx)) + return XDP_DROP; + ret = bpf_xdp_adjust_meta(ctx, -META_SIZE); if (ret < 0) return XDP_DROP; data_meta = ctx_ptr(ctx, data_meta); - data_end = ctx_ptr(ctx, data_end); data = ctx_ptr(ctx, data); - eth = (struct ethhdr *)data; - payload = data + sizeof(struct 
ethhdr); - - if (payload + META_SIZE > data_end || - data_meta + META_SIZE > data) + if (data_meta + META_SIZE > data) return XDP_DROP; - /* The Linux networking stack may send other packets on the test - * interface that interfere with the test. Just drop them. - * The test packets can be recognized by their source MAC address. - */ - if (!check_smac(eth)) - return XDP_DROP; - - __builtin_memcpy(data_meta, payload, META_SIZE); + __builtin_memcpy(data_meta, meta_want, META_SIZE); return XDP_PASS; } @@ -353,7 +364,7 @@ int clone_data_meta_survives_data_write(struct __sk_buff *ctx) if (eth + 1 > ctx_ptr(ctx, data_end)) goto out; /* Ignore non-test packets */ - if (!check_smac(eth)) + if (!is_test_packet_tc(ctx)) goto out; if (meta_have + META_SIZE > eth) @@ -383,7 +394,7 @@ int clone_data_meta_survives_meta_write(struct __sk_buff *ctx) if (eth + 1 > ctx_ptr(ctx, data_end)) goto out; /* Ignore non-test packets */ - if (!check_smac(eth)) + if (!is_test_packet_tc(ctx)) goto out; if (meta_have + META_SIZE > eth) @@ -416,7 +427,7 @@ int clone_meta_dynptr_survives_data_slice_write(struct __sk_buff *ctx) if (!eth) goto out; /* Ignore non-test packets */ - if (!check_smac(eth)) + if (!is_test_packet_tc(ctx)) goto out; bpf_dynptr_from_skb_meta(ctx, 0, &meta); @@ -436,16 +447,11 @@ out: SEC("tc") int clone_meta_dynptr_survives_meta_slice_write(struct __sk_buff *ctx) { - struct bpf_dynptr data, meta; - const struct ethhdr *eth; + struct bpf_dynptr meta; __u8 *meta_have; - bpf_dynptr_from_skb(ctx, 0, &data); - eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth)); - if (!eth) - goto out; /* Ignore non-test packets */ - if (!check_smac(eth)) + if (!is_test_packet_tc(ctx)) goto out; bpf_dynptr_from_skb_meta(ctx, 0, &meta); @@ -471,15 +477,10 @@ int clone_meta_dynptr_rw_before_data_dynptr_write(struct __sk_buff *ctx) { struct bpf_dynptr data, meta; __u8 meta_have[META_SIZE]; - const struct ethhdr *eth; int err; - bpf_dynptr_from_skb(ctx, 0, &data); - eth = bpf_dynptr_slice(&data, 0, 
NULL, sizeof(*eth)); - if (!eth) - goto out; /* Ignore non-test packets */ - if (!check_smac(eth)) + if (!is_test_packet_tc(ctx)) goto out; /* Expect read-write metadata before unclone */ @@ -492,6 +493,7 @@ int clone_meta_dynptr_rw_before_data_dynptr_write(struct __sk_buff *ctx) goto out; /* Helper write to payload will unclone the packet */ + bpf_dynptr_from_skb(ctx, 0, &data); bpf_dynptr_write(&data, offsetof(struct ethhdr, h_proto), "x", 1, 0); err = bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0); @@ -511,17 +513,12 @@ out: SEC("tc") int clone_meta_dynptr_rw_before_meta_dynptr_write(struct __sk_buff *ctx) { - struct bpf_dynptr data, meta; + struct bpf_dynptr meta; __u8 meta_have[META_SIZE]; - const struct ethhdr *eth; int err; - bpf_dynptr_from_skb(ctx, 0, &data); - eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth)); - if (!eth) - goto out; /* Ignore non-test packets */ - if (!check_smac(eth)) + if (!is_test_packet_tc(ctx)) goto out; /* Expect read-write metadata before unclone */ @@ -545,6 +542,28 @@ out: return TC_ACT_SHOT; } +SEC("lwt_xmit") +int dummy_lwt_xmit(struct __sk_buff *ctx) +{ + if (bpf_skb_change_head(ctx, sizeof(struct ipv6hdr), 0)) + return BPF_DROP; + + return BPF_OK; +} + +SEC("tc") +int tc_is_meta_empty(struct __sk_buff *ctx) +{ + if (!is_test_packet_tc(ctx)) + return TC_ACT_OK; + + if (ctx->data_meta != ctx->data) + return TC_ACT_OK; + + test_pass = true; + return TC_ACT_OK; +} + SEC("tc") int helper_skb_vlan_push_pop(struct __sk_buff *ctx) { diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile index be130bf585a4..6364ca02642d 100644 --- a/tools/testing/selftests/drivers/net/bonding/Makefile +++ b/tools/testing/selftests/drivers/net/bonding/Makefile @@ -13,6 +13,7 @@ TEST_PROGS := \ bond_options.sh \ bond_passive_lacp.sh \ bond_stacked_header_parse.sh \ + bond_vlan_real_dev.sh \ dev_addr_lists.sh \ mode-1-recovery-updelay.sh \ mode-2-recovery-updelay.sh \ diff --git 
a/tools/testing/selftests/drivers/net/bonding/bond_vlan_real_dev.sh b/tools/testing/selftests/drivers/net/bonding/bond_vlan_real_dev.sh new file mode 100755 index 000000000000..542d9ffc4819 --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_vlan_real_dev.sh @@ -0,0 +1,180 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Test propagation of a real device's state to the VLANs stacked on top of it +# when the real device is (or becomes) a bond member. +# +# The kernel mirrors a real device's UP/DOWN, MTU and feature changes onto its +# VLANs. This is done asynchronously (netdev_work): doing it synchronously from +# the real device's notifier could deadlock. If the real device is brought up +# while enslaved to a bond - so its instance lock is held across NETDEV_UP - and +# a VLAN on top of it is itself a bond member, the synchronous propagation +# re-entered the stack and tried to take the same instance lock again. +# +# Cover both halves: +# - the deferred UP/DOWN, MTU and feature propagation actually lands on the +# VLAN (link state and MTU use an ops-locked dummy, i.e. the deferral path), +# - the deadlock-prone topology - a VLAN on a dummy, with the VLAN and the +# dummy each enslaved to a different bond - can be built without hanging. + +ALL_TESTS=" + vlan_link_state + vlan_mtu + vlan_features + vlan_real_dev_enslave +" + +REQUIRE_MZ=no +NUM_NETIFS=0 +lib_dir=$(dirname "$0") +source "$lib_dir"/../../../net/forwarding/lib.sh + +# Return 0 if $dev in netns $ns has flag $flag set (e.g. UP) in its <...> flags. +link_has_flag() +{ + local ns=$1 dev=$2 flag=$3 + + ip -n "$ns" link show dev "$dev" 2>/dev/null | grep -q "[<,]${flag}[,>]" +} + +link_lacks_flag() +{ + ! 
link_has_flag "$@" +} + +link_mtu_is() +{ + local ns=$1 dev=$2 want=$3 cur + + cur=$(ip -n "$ns" link show dev "$dev" 2>/dev/null | \ + sed -n 's/.* mtu \([0-9]\+\).*/\1/p') + [ "$cur" = "$want" ] +} + +vlan_feature_is() +{ + local ns=$1 dev=$2 feature=$3 value=$4 + + ip netns exec "$ns" ethtool -k "$dev" 2>/dev/null | \ + grep -q "^$feature: $value" +} + +link_has_master() +{ + local ns=$1 dev=$2 master=$3 + + ip -n "$ns" -o link show dev "$dev" 2>/dev/null | grep -q "master $master" +} + +vlan_link_state() +{ + RET=0 + + ip -n "$NS" link add ls_dummy type dummy + ip -n "$NS" link add link ls_dummy name ls_vlan type vlan id 100 + + # Bringing the real device up must propagate UP to the VLAN. + ip -n "$NS" link set ls_dummy up + busywait "$BUSYWAIT_TIMEOUT" link_has_flag "$NS" ls_vlan UP + check_err $? "VLAN did not go UP after the real device went UP" + + # ... and likewise for DOWN. + ip -n "$NS" link set ls_dummy down + busywait "$BUSYWAIT_TIMEOUT" link_lacks_flag "$NS" ls_vlan UP + check_err $? "VLAN did not go DOWN after the real device went DOWN" + + ip -n "$NS" link del ls_vlan + ip -n "$NS" link del ls_dummy + + log_test "VLAN link state follows the real device" +} + +vlan_mtu() +{ + RET=0 + + # The VLAN inherits the real device's MTU (2000) at creation time. + ip -n "$NS" link add mtu_dummy mtu 2000 type dummy + ip -n "$NS" link add link mtu_dummy name mtu_vlan type vlan id 100 + + # Shrinking the real device's MTU must clamp the VLAN's MTU. + ip -n "$NS" link set mtu_dummy mtu 1500 + busywait "$BUSYWAIT_TIMEOUT" link_mtu_is "$NS" mtu_vlan 1500 + check_err $? "VLAN MTU not clamped after the real device's MTU shrank" + + ip -n "$NS" link del mtu_vlan + ip -n "$NS" link del mtu_dummy + + log_test "VLAN MTU clamped to the real device" +} + +vlan_features() +{ + RET=0 + + # Use veth as the real device: unlike dummy it exports vlan_features, so + # the VLAN actually inherits a toggleable offload to assert on. 
+	ip -n "$NS" link add ft_veth type veth peer name ft_veth_pr
+	ip -n "$NS" link add link ft_veth name ft_vlan type vlan id 100
+
+	vlan_feature_is "$NS" ft_vlan scatter-gather on
+	check_err $? "VLAN did not inherit scatter-gather from the real device"
+
+	# Toggling the offload on the real device must propagate to the VLAN.
+	ip netns exec "$NS" ethtool -K ft_veth sg off
+	busywait "$BUSYWAIT_TIMEOUT" \
+		vlan_feature_is "$NS" ft_vlan scatter-gather off
+	check_err $? "VLAN scatter-gather still on after disabling it on real dev"
+
+	ip netns exec "$NS" ethtool -K ft_veth sg on
+	busywait "$BUSYWAIT_TIMEOUT" \
+		vlan_feature_is "$NS" ft_vlan scatter-gather on
+	check_err $? "VLAN scatter-gather still off after enabling it on real dev"
+
+	ip -n "$NS" link del ft_vlan
+	ip -n "$NS" link del ft_veth
+
+	log_test "VLAN features follow the real device"
+}
+
+vlan_real_dev_enslave()
+{
+	RET=0
+
+	# dummy <- VLAN -> bond0, then enslave the dummy itself to bond1. The
+	# last step brings the dummy up under bond1's instance lock, which used
+	# to deadlock while synchronously propagating UP to the (bond-enslaved)
+	# VLAN on top.
+	ip -n "$NS" link add dl_dummy type dummy
+	ip -n "$NS" link set dl_dummy up
+	ip -n "$NS" link add link dl_dummy name dl_vlan type vlan id 100
+
+	ip -n "$NS" link add dl_bond0 type bond mode active-backup
+	ip -n "$NS" link set dl_vlan down
+	ip -n "$NS" link set dl_vlan master dl_bond0
+	check_err $? "could not enslave the VLAN to bond0"
+
+	ip -n "$NS" link add dl_bond1 type bond mode active-backup
+	ip -n "$NS" link set dl_dummy down
+	ip -n "$NS" link set dl_dummy master dl_bond1
+	check_err $? "could not enslave the real device to bond1"
+
+	# If we got here the kernel did not deadlock; make sure it is still
+	# responsive and the enslave really took effect.
+	link_has_master "$NS" dl_dummy dl_bond1
+	check_err $? "real device not enslaved to bond1"
+
+	ip -n "$NS" link del dl_bond1
+	ip -n "$NS" link del dl_bond0
+	ip -n "$NS" link del dl_vlan
+	ip -n "$NS" link del dl_dummy
+
+	log_test "VLAN real device enslaved to a second bond"
+}
+
+setup_ns NS
+trap 'cleanup_ns $NS' EXIT
+
+tests_run
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/drivers/net/so_txtime.c b/tools/testing/selftests/drivers/net/so_txtime.c
index 75f3beef13d9..55a386f3d1b9 100644
--- a/tools/testing/selftests/drivers/net/so_txtime.c
+++ b/tools/testing/selftests/drivers/net/so_txtime.c
@@ -37,7 +37,7 @@
 
 static int cfg_clockid = CLOCK_TAI;
 static uint16_t cfg_port = 8000;
-static int cfg_variance_us = 4000;
+static int cfg_variance_us = 8000;
 static bool cfg_machine_slow;
 static uint64_t cfg_start_time_ns;
 static int cfg_mark;
diff --git a/tools/testing/selftests/net/broadcast_ether_dst.sh b/tools/testing/selftests/net/broadcast_ether_dst.sh
index 334a7eca8a80..cc571f607429 100755
--- a/tools/testing/selftests/net/broadcast_ether_dst.sh
+++ b/tools/testing/selftests/net/broadcast_ether_dst.sh
@@ -44,7 +44,7 @@ test_broadcast_ether_dst() {
 	# tcpdump will exit after receiving a single packet
 	# timeout will kill tcpdump if it is still running after 2s
 	timeout 2s ip netns exec "${CLIENT_NS}" \
-		tcpdump -i link0 -c 1 -w "${CAPFILE}" icmp &> "${OUTPUT}" &
+		tcpdump -i link0 -c 1 -w "${CAPFILE}" -Z root icmp &> "${OUTPUT}" &
 	pid=$!
 	slowwait 1 grep -qs "listening" "${OUTPUT}"
 
diff --git a/tools/testing/selftests/net/netfilter/conntrack_sctp_collision.sh b/tools/testing/selftests/net/netfilter/conntrack_sctp_collision.sh
index d860f7d9744b..7261975957ef 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_sctp_collision.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_sctp_collision.sh
@@ -2,18 +2,32 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # Testing For SCTP COLLISION SCENARIO as Below:
-#
+# 1. Stale INIT_ACK capture:
 # 14:35:47.655279 IP CLIENT_IP.PORT > SERVER_IP.PORT: sctp (1) [INIT] [init tag: 2017837359]
 # 14:35:48.353250 IP SERVER_IP.PORT > CLIENT_IP.PORT: sctp (1) [INIT] [init tag: 1187206187]
 # 14:35:48.353275 IP CLIENT_IP.PORT > SERVER_IP.PORT: sctp (1) [INIT ACK] [init tag: 2017837359]
 # 14:35:48.353283 IP SERVER_IP.PORT > CLIENT_IP.PORT: sctp (1) [COOKIE ECHO]
 # 14:35:48.353977 IP CLIENT_IP.PORT > SERVER_IP.PORT: sctp (1) [COOKIE ACK]
 # 14:35:48.855335 IP SERVER_IP.PORT > CLIENT_IP.PORT: sctp (1) [INIT ACK] [init tag: 164579970]
+# (Delayed)
+#
+# 2. Stale INIT capture:
+# 14:35:48.353250 IP SERVER_IP.PORT > CLIENT_IP.PORT: sctp (1) [INIT] [init tag: 1187206187]
+# 14:35:48.353275 IP CLIENT_IP.PORT > SERVER_IP.PORT: sctp (1) [INIT ACK] [init tag: 2017837359]
+# 14:35:48.353283 IP SERVER_IP.PORT > CLIENT_IP.PORT: sctp (1) [COOKIE ECHO]
+# 14:35:48.353977 IP CLIENT_IP.PORT > SERVER_IP.PORT: sctp (1) [COOKIE ACK]
+# 14:35:47.655279 IP CLIENT_IP.PORT > SERVER_IP.PORT: sctp (1) [INIT] [init tag: 2017837359]
+# (Delayed)
+# 14:35:48.855335 IP SERVER_IP.PORT > CLIENT_IP.PORT: sctp (1) [INIT ACK] [init tag: 164579970]
 #
 # TOPO: SERVER_NS (link0)<--->(link1) ROUTER_NS (link2)<--->(link3) CLIENT_NS
 
 source lib.sh
 
+checktool "nft --version" "run test without nft"
+checktool "tc -h" "run test without tc"
+checktool "modprobe -q sctp" "load sctp module"
+
 CLIENT_IP="198.51.200.1"
 CLIENT_PORT=1234
 
@@ -24,7 +38,8 @@ CLIENT_GW="198.51.200.2"
 SERVER_GW="198.51.100.2"
 
 # setup the topo
-setup() {
+topo_setup() {
+	# setup_ns cleans up existing net namespaces first.
 	setup_ns CLIENT_NS SERVER_NS ROUTER_NS
 	ip -n "$SERVER_NS" link add link0 type veth peer name link1 netns "$ROUTER_NS"
 	ip -n "$CLIENT_NS" link add link3 type veth peer name link2 netns "$ROUTER_NS"
@@ -38,35 +53,53 @@ setup() {
 	ip -n "$ROUTER_NS" addr add $SERVER_GW/24 dev link1
 	ip -n "$ROUTER_NS" addr add $CLIENT_GW/24 dev link2
 	ip net exec "$ROUTER_NS" sysctl -wq net.ipv4.ip_forward=1
+	sysctl -wq net.netfilter.nf_log_all_netns=1
 
 	ip -n "$CLIENT_NS" link set link3 up
 	ip -n "$CLIENT_NS" addr add $CLIENT_IP/24 dev link3
 	ip -n "$CLIENT_NS" route add $SERVER_IP dev link3 via $CLIENT_GW
+}
+
+conf_delay()
+{
+	# simulate the delay on OVS upcall by setting up a delay for INIT_ACK/INIT with
+	local ns=$1
+	local link=$2
+	local chunk_type=$3
 
-	# simulate the delay on OVS upcall by setting up a delay for INIT_ACK with
-	# tc on $SERVER_NS side
-	tc -n "$SERVER_NS" qdisc add dev link0 root handle 1: htb r2q 64
-	tc -n "$SERVER_NS" class add dev link0 parent 1: classid 1:1 htb rate 100mbit
-	tc -n "$SERVER_NS" filter add dev link0 parent 1: protocol ip u32 match ip protocol 132 \
-		0xff match u8 2 0xff at 32 flowid 1:1
-	if ! tc -n "$SERVER_NS" qdisc add dev link0 parent 1:1 handle 10: netem delay 1200ms; then
+	# use a smaller number for assoc's max_retrans to reproduce the issue
+	ip net exec "$CLIENT_NS" sysctl -wq net.sctp.association_max_retrans=3
+
+	tc -n "$ns" qdisc add dev "$link" root handle 1: htb r2q 64
+	tc -n "$ns" class add dev "$link" parent 1: classid 1:1 htb rate 100mbit
+	tc -n "$ns" filter add dev "$link" parent 1: protocol ip \
+		u32 match ip protocol 132 0xff match u8 "$chunk_type" 0xff at 32 flowid 1:1
+	if ! tc -n "$ns" qdisc add dev "$link" parent 1:1 handle 10: netem delay 1200ms; then
 		echo "SKIP: Cannot add netem qdisc"
-		exit $ksft_skip
+		return $ksft_skip
 	fi
 
 	# simulate the ctstate check on OVS nf_conntrack
-	ip net exec "$ROUTER_NS" iptables -A FORWARD -m state --state INVALID,UNTRACKED -j DROP
-	ip net exec "$ROUTER_NS" iptables -A INPUT -p sctp -j DROP
-
-	# use a smaller number for assoc's max_retrans to reproduce the issue
-	modprobe -q sctp
-	ip net exec "$CLIENT_NS" sysctl -wq net.sctp.association_max_retrans=3
+	ip net exec "$ROUTER_NS" nft -f - <<-EOF
+	table ip t {
+		chain forward {
+			type filter hook forward priority filter; policy accept;
+			meta l4proto icmp counter accept
+			ct state new counter accept
+			ct state established,related counter accept
+			ct state invalid log flags all counter drop comment \
+				"Expect to drop stale INIT/INIT_ACK chunks"
+			counter
+		}
+	}
+	EOF
+	return 0
 }
 
 cleanup() {
-	ip net exec "$CLIENT_NS" pkill sctp_collision >/dev/null 2>&1
-	ip net exec "$SERVER_NS" pkill sctp_collision >/dev/null 2>&1
+	# cleanup_all_ns terminates running processes in the namespaces.
 	cleanup_all_ns
+	sysctl -wq net.netfilter.nf_log_all_netns=0
 }
 
 do_test() {
@@ -81,7 +114,19 @@ do_test() {
 
 # run the test case
 trap cleanup EXIT
-setup && \
-echo "Test for SCTP Collision in nf_conntrack:" && \
-do_test && echo "PASS!"
-exit $?
+
+echo "Test for SCTP INIT_ACK Collision in nf_conntrack:"
+topo_setup || exit $?
+conf_delay $SERVER_NS link0 2 || exit $?
+
+if ! do_test; then
+	exit $ksft_fail
+fi
+
+echo "Test for SCTP INIT Collision in nf_conntrack:"
+topo_setup || exit $?
+conf_delay $CLIENT_NS link3 1 || exit $?
+
+if ! do_test; then
+	exit $ksft_fail
+fi
diff --git a/tools/testing/selftests/net/netfilter/nft_flowtable.sh b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
index 7a34ef468975..08ad07500e8a 100755
--- a/tools/testing/selftests/net/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
@@ -592,7 +592,7 @@ ip -net "$nsr1" link set tun0 up
 ip -net "$nsr1" addr add 192.168.100.1/24 dev tun0
 ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
 
-ip -net "$nsr1" link add name tun6 type ip6tnl local fee1:2::1 remote fee1:2::2
+ip -net "$nsr1" link add name tun6 type ip6tnl local fee1:2::1 remote fee1:2::2 encaplimit none
 ip -net "$nsr1" link set tun6 up
 ip -net "$nsr1" addr add fee1:3::1/64 dev tun6 nodad
 
@@ -601,7 +601,7 @@ ip -net "$nsr2" link set tun0 up
 ip -net "$nsr2" addr add 192.168.100.2/24 dev tun0
 ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
 
-ip -net "$nsr2" link add name tun6 type ip6tnl local fee1:2::2 remote fee1:2::1 || ret=1
+ip -net "$nsr2" link add name tun6 type ip6tnl local fee1:2::2 remote fee1:2::1 encaplimit none || ret=1
 ip -net "$nsr2" link set tun6 up
 ip -net "$nsr2" addr add fee1:3::2/64 dev tun6 nodad
 
@@ -651,7 +651,7 @@ ip -net "$nsr1" route change default via 192.168.200.2
 ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0/10.forwarding=1 > /dev/null
 ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun0.10 accept'
 
-ip -net "$nsr1" link add name tun6.10 type ip6tnl local fee1:4::1 remote fee1:4::2
+ip -net "$nsr1" link add name tun6.10 type ip6tnl local fee1:4::1 remote fee1:4::2 encaplimit none
 ip -net "$nsr1" link set tun6.10 up
 ip -net "$nsr1" addr add fee1:5::1/64 dev tun6.10 nodad
 ip -6 -net "$nsr1" route delete default
@@ -670,7 +670,7 @@ ip -net "$nsr2" addr add 192.168.200.2/24 dev tun0.10
 ip -net "$nsr2" route change default via 192.168.200.1
 ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0/10.forwarding=1 > /dev/null
 
-ip -net "$nsr2" link add name tun6.10 type ip6tnl local fee1:4::2 remote fee1:4::1 || ret=1
+ip -net "$nsr2" link add name tun6.10 type ip6tnl local fee1:4::2 remote fee1:4::1 encaplimit none || ret=1
 ip -net "$nsr2" link set tun6.10 up
 ip -net "$nsr2" addr add fee1:5::2/64 dev tun6.10 nodad
 ip -6 -net "$nsr2" route delete default
diff --git a/tools/testing/selftests/net/netfilter/nft_queue.sh b/tools/testing/selftests/net/netfilter/nft_queue.sh
index d80390848e85..7c857a2e0f34 100755
--- a/tools/testing/selftests/net/netfilter/nft_queue.sh
+++ b/tools/testing/selftests/net/netfilter/nft_queue.sh
@@ -85,11 +85,12 @@ ip -net "$ns3" route add default via 10.0.3.1
 ip -net "$ns3" route add default via dead:3::1
 
 load_ruleset() {
-	local name=$1
-	local prio=$2
+	local family=$1
+	local name=$2
+	local prio=$3
 
 ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
-table inet $name {
+table $family $name {
 	chain nfq {
 		ip protocol icmp queue bypass
 		icmpv6 type { "echo-request", "echo-reply" } queue num 1 bypass
@@ -228,6 +229,7 @@ nf_queue_wait()
 test_queue()
 {
 	local expected="$1"
+	local family="$2"
 	local last=""
 
 	# spawn nf_queue listeners
@@ -255,11 +257,13 @@ test_queue()
 		if [ x"$last" != x"$expected packets total" ]; then
 			echo "FAIL: Expected $expected packets total, but got $last" 1>&2
 			ip netns exec "$nsrouter" nft list ruleset
+			echo -n "$TMPFILE0: ";cat "$TMPFILE0"
+			echo -n "$TMPFILE1: ";cat "$TMPFILE1"
 			exit 1
 		fi
 	done
 
-	echo "PASS: Expected and received $last"
+	echo "PASS: Expected and received $last ($family)"
 }
 
 listener_ready()
@@ -400,6 +404,8 @@ EOF
 	kill "$nfqpid"
 
 	echo "PASS: icmp+nfqueue via vrf"
+	ip -net "$ns1" link del tvrf
+	ip netns exec "$ns1" nft flush ruleset
 }
 
 sctp_listener_ready()
@@ -814,12 +820,53 @@ EOF
 	check_tainted "queue program exiting while packets queued"
 }
 
+test_queue_bridge()
+{
+	ip -net "$nsrouter" addr flush dev veth0
+	ip -net "$nsrouter" addr flush dev veth1
+
+	ip -net "$nsrouter" link add br0 type bridge
+	ip -net "$nsrouter" link set veth0 master br0
+	ip -net "$nsrouter" link set veth1 master br0
+
+	ip -net "$nsrouter" link set br0 up
+
+	ip -net "$nsrouter" addr add 10.0.2.1/16 dev br0
+	ip -net "$nsrouter" addr add dead:2::1/64 dev br0 nodad
+
+	ip -net "$ns1" addr flush dev eth0
+	ip -net "$ns2" addr flush dev eth0
+
+	ip -net "$ns1" addr add 10.0.1.1/16 dev eth0
+	ip -net "$ns1" addr add dead:2::2/64 dev eth0 nodad
+
+	ip -net "$ns2" addr add 10.0.2.99/16 dev eth0
+	ip -net "$ns2" addr add dead:2::99/64 dev eth0 nodad
+
+	ip netns exec "$nsrouter" nft flush ruleset
+
+	ip netns exec "$nsrouter" sysctl net.ipv6.conf.all.forwarding=0 > /dev/null
+	ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth0.forwarding=0 > /dev/null
+	ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth1.forwarding=0 > /dev/null
+
+	if ! test_ping;then
+		echo "FAIL: netns bridge connectivity" 1>&2
+		exit $ret
+	fi
+
+	load_ruleset "bridge" "filter" 10
+	test_queue 10 "bridge"
+
+	load_ruleset "bridge" "filter2" 20
+	test_queue 20 "bridge"
+}
+
 ip netns exec "$nsrouter" sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
 ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth0.forwarding=1 > /dev/null
 ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth1.forwarding=1 > /dev/null
 ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth2.forwarding=1 > /dev/null
-load_ruleset "filter" 0
+load_ruleset "inet" "filter" 0
 if test_ping; then
 	# queue bypass works (rules were skipped, no listener)
@@ -842,11 +889,11 @@ load_counter_ruleset 10
 # 1x icmp prerouting,forward,postrouting -> 3 queue events (6 incl. reply).
 # 1x icmp prerouting,input,output postrouting -> 4 queue events incl. reply.
 # so we expect that userspace program receives 10 packets.
-test_queue 10
+test_queue 10 "inet"
 # same. We queue to a second program as well.
-load_ruleset "filter2" 20
-test_queue 20
+load_ruleset "inet" "filter2" 20
+test_queue 20 "inet"
 ip netns exec "$ns1" nft flush ruleset
 test_tcp_forward
@@ -863,4 +910,7 @@ test_queue_stress
 test_icmp_vrf
 test_queue_removal
+# turns router into a bridge
+test_queue_bridge
+
 exit $ret
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 9b9a3cb2700d..cbdd3ea28b99 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -997,6 +997,8 @@ TEST_F(tls, splice_short)
 	char sendbuf[0x100];
 	char sendchar = 'S';
 	int pipefds[2];
+	int pipe_sz;
+	int ret;
 	int i;
 	sendchar_iov.iov_base = &sendchar;
@@ -1005,7 +1007,11 @@ TEST_F(tls, splice_short)
 	memset(sendbuf, 's', sizeof(sendbuf));
 	ASSERT_GE(pipe2(pipefds, O_NONBLOCK), 0);
-	ASSERT_GE(fcntl(pipefds[0], F_SETPIPE_SZ, (MAX_FRAGS + 1) * 0x1000), 0);
+	pipe_sz = (MAX_FRAGS + 1) * getpagesize();
+	ret = fcntl(pipefds[0], F_SETPIPE_SZ, pipe_sz);
+	if (ret < 0 && errno == EPERM)
+		SKIP(return, "insufficient pipe capacity");
+	ASSERT_GE(ret, pipe_sz);
 	for (i = 0; i < MAX_FRAGS; i++)
 		ASSERT_GE(vmsplice(pipefds[1], &sendchar_iov, 1, 0), 0);
diff --git a/tools/testing/selftests/net/vlan_bridge_binding.sh b/tools/testing/selftests/net/vlan_bridge_binding.sh
index e8c02c64e03a..d04caa14202d 100755
--- a/tools/testing/selftests/net/vlan_bridge_binding.sh
+++ b/tools/testing/selftests/net/vlan_bridge_binding.sh
@@ -64,7 +64,7 @@ check_operstate()
 	local expect=$1; shift
 	local operstate
-	operstate=$(busywait 1000 \
+	operstate=$(busywait 2000 \
 		operstate_is "$dev" "$expect")
 	check_err $? "Got operstate of $operstate, expected $expect"
 }
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json b/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
index 33bb8f3ff8ed..da65f838bd52 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/ct.json
@@ -664,5 +664,43 @@
         "teardown": [
             "$TC qdisc del dev $DEV1 ingress_block 21 clsact"
         ]
+    },
+    {
+        "id": "9c2a",
+        "name": "Act_ct preserves skb cb across defrag before prio dequeue",
+        "category": [
+            "actions",
+            "ct",
+            "scapy"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$TC qdisc add dev $DUMMY root handle 1: prio",
+            "$TC qdisc add dev $DUMMY clsact",
+            "$TC qdisc add dev $DEV1 clsact",
+            "$TC filter add dev $DEV1 ingress protocol ip prio 1 matchall action mirred egress redirect dev $DUMMY"
+        ],
+        "cmdUnderTest": "$TC filter add dev $DUMMY egress protocol ip prio 1 matchall action ct zone 1 pipe",
+        "scapy": [
+            {
+                "iface": "$DEV0",
+                "count": 1,
+                "packet": "[Ether()/frag for frag in fragment(IP(src='10.0.0.10', dst='10.0.0.1', id=1)/UDP(sport=12345, dport=9)/Raw(b'A' * 4000), fragsize=1400)]"
+            }
+        ],
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 1 '^qdisc prio 1:'",
+        "matchPattern": "Sent [1-9][0-9]* bytes [1-9][0-9]* pkt",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DEV1 clsact",
+            "$TC qdisc del dev $DUMMY clsact",
+            "$TC qdisc del dev $DUMMY root handle 1:"
+        ]
     }
 ]
diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json
index cd1f2ee8f354..ed6a900bb568 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json
@@ -250,5 +250,49 @@
         "teardown": [
             "$TC qdisc del dev $DUMMY handle 1: root"
         ]
+    },
+    {
+        "id": "891f",
+        "name": "Verify DualPI2 GSO backlog accounting with QFQ parent",
+        "category": [
+            "qdisc",
+            "dualpi2",
+            "qfq",
+            "gso"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [
+            "$IP link set dev $DUMMY up || true",
+            "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: qfq",
+            "$TC class add dev $DUMMY parent 1: classid 1:1 qfq weight 1 maxpkt 4096",
+            "$TC qdisc add dev $DUMMY parent 1:1 handle 2: dualpi2",
+            "$TC filter add dev $DUMMY parent 1: matchall classid 1:1"
+        ],
+        "cmdUnderTest": "./tdc_gso.py 10.10.10.10 10.10.10.1 9000 1200 2400",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -j -s qdisc ls dev $DUMMY",
+        "matchJSON": [
+            {
+                "kind": "qfq",
+                "handle": "1:",
+                "packets": 2,
+                "backlog": 0,
+                "qlen": 0
+            },
+            {
+                "kind": "dualpi2",
+                "handle": "2:",
+                "packets": 2,
+                "backlog": 0,
+                "qlen": 0
+            }
+        ],
+        "teardown": [
+            "$TC qdisc del dev $DUMMY root",
+            "$IP addr del 10.10.10.10/24 dev $DUMMY || true"
+        ]
     }
 ]
diff --git a/tools/testing/selftests/tc-testing/tdc_gso.py b/tools/testing/selftests/tc-testing/tdc_gso.py
new file mode 100755
index 000000000000..b66528ea4b68
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tdc_gso.py
@@ -0,0 +1,43 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+tdc_gso.py - send a UDP GSO datagram
+
+Copyright (C) 2026 Xingquan Liu <b1n@b1n.io>
+"""
+
+import argparse
+import socket
+import struct
+import sys
+
+UDP_MAX_SEGMENTS = 1 << 7
+
+
+parser = argparse.ArgumentParser(description="UDP GSO datagram sender")
+parser.add_argument("src", help="source IPv4 address")
+parser.add_argument("dst", help="destination IPv4 address")
+parser.add_argument("port", type=int, help="destination UDP port")
+parser.add_argument("gso_size", type=int, help="UDP GSO segment payload size")
+parser.add_argument("payload_len", type=int, help="total UDP payload length")
+args = parser.parse_args()
+
+if args.gso_size <= 0 or args.gso_size > 0xFFFF:
+    parser.error("gso_size must fit in an unsigned 16-bit integer")
+if args.payload_len <= args.gso_size:
+    parser.error("payload_len must be larger than gso_size")
+if args.payload_len > args.gso_size * UDP_MAX_SEGMENTS:
+    parser.error("payload_len exceeds UDP_MAX_SEGMENTS")
+
+SOL_UDP = getattr(socket, "SOL_UDP", socket.IPPROTO_UDP)
+UDP_SEGMENT = getattr(socket, "UDP_SEGMENT", 103)
+
+sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+sock.bind((args.src, 0))
+
+payload = b"b" * args.payload_len
+cmsg = [(SOL_UDP, UDP_SEGMENT, struct.pack("=H", args.gso_size))]
+
+sent = sock.sendmsg([payload], cmsg, 0, (args.dst, args.port))
+sys.exit(sent != len(payload))
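
Note on the `tdc_gso.py` arguments: the DualPI2 test in this diff runs the helper with `gso_size` 1200 and `payload_len` 2400 and expects `"packets": 2` on both qdiscs. The sketch below is not part of the patch. It mirrors the script's three argument checks and works out how a payload of that length would be cut into segments. The function name `gso_segment_sizes` is hypothetical, and the idea that each segment is counted as one packet in the qdisc statistics is an inference from the expected JSON, not something the diff states.

```python
UDP_MAX_SEGMENTS = 1 << 7  # same value as the constant in tdc_gso.py


def gso_segment_sizes(payload_len: int, gso_size: int) -> list:
    """Return the payload size of each segment for one UDP GSO send.

    Hypothetical helper: it applies the same validation as tdc_gso.py,
    then splits payload_len into gso_size chunks plus an optional
    shorter tail.
    """
    if gso_size <= 0 or gso_size > 0xFFFF:
        raise ValueError("gso_size must fit in an unsigned 16-bit integer")
    if payload_len <= gso_size:
        raise ValueError("payload_len must be larger than gso_size")
    if payload_len > gso_size * UDP_MAX_SEGMENTS:
        raise ValueError("payload_len exceeds UDP_MAX_SEGMENTS")

    full, tail = divmod(payload_len, gso_size)
    return [gso_size] * full + ([tail] if tail else [])


if __name__ == "__main__":
    # Arguments used by test 891f: gso_size=1200, payload_len=2400.
    print(gso_segment_sizes(2400, 1200))
```

With the values from test 891f the payload divides evenly into two 1200-byte segments, which would be consistent with the two packets the test expects, assuming one packet is accounted per segment.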
