Age | Commit message | Author
2026-06-01 | net: txgbe: fix phylink leak on AML init failure | Chenguang Zhao
Destroy the phylink instance when fixed-link setup fails. Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20260528013258.129146-1-zhaochenguang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | net: fec_mpc52xx_phy: Add missing MODULE_DESCRIPTION() | Rosen Penev
Fixes error during modpost: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/freescale/fec_mpc52xx_phy.o Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20260527025139.10188-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | net: wwan: t7xx: Add delay between MD and SAP suspend | Jose Ignacio Tornos Martinez
SAP (Service Access Point) suspend occasionally times out with error -110 (ETIMEDOUT), followed by modem port errors and complete modem failure requiring a system reboot to recover. Error symptoms: mtk_t7xx 0000:72:00.0: [PM] SAP suspend error: -110 mtk_t7xx 0000:72:00.0: can't suspend (...returned -110) mtk_t7xx 0000:07:00.0: Failed to send skb: -22 mtk_t7xx 0000:07:00.0: Write error on MBIM port, -22 The modem firmware needs time after receiving the MD (modem) suspend request to complete internal operations before it is ready to accept the SAP suspend request. Without this delay, if runtime PM attempts to suspend while the firmware is busy, the SAP suspend command times out, leaving the modem in an unrecoverable state. Root cause and userspace interaction: ModemManager 1.24+ includes changes that reduce the likelihood of this issue by ensuring the modem is in a low-power state before the kernel attempts runtime suspend. However, the kernel driver should not depend on specific userspace behavior or ModemManager versions. Older versions (1.20-1.22) are still widely deployed, and the kernel should be robust regardless of userspace implementation details. There appears to be no hardware status register or other mechanism available to query whether the firmware is ready for SAP suspend. A delay between the two suspend requests is the most reliable solution found through testing. Add a 50ms delay between MD suspend and SAP suspend. This gives the firmware adequate time to complete internal operations without adding significant latency to the suspend path. This makes the driver robust across all ModemManager versions and system conditions. Testing: 96+ hours of continuous operation with ModemManager 1.20.2 and Fibocom FM350-GL modem. Zero SAP suspend timeouts observed across 2000+ successful suspend/resume cycles. Previously failed within 24 hours with 100% reproducibility. 
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20260527061451.12710-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | net: phy: sfp: probe for RollBall I2C-to-MDIO bridge in mdio-i2c | Petr Wozniak
The "OEM"/"SFP-10G-T" quirk entry in sfp_fixup_rollball_cc() unconditionally forces MDIO_I2C_ROLLBALL for all modules matching that vendor/part-number combination. This works for modules that genuinely implement a RollBall I2C-to-MDIO bridge, but silently breaks modules that share the same EEPROM strings without having such a bridge. The Realtek RTL8261BE-CG is one such module: a pure copper 10G SFP+ media converter with no I2C-to-MDIO bridge. Its EEPROM reports vendor="OEM", part="SFP-10G-T-I", and -- critically -- Vendor OUI 00:00:00, making OUI-based differentiation impossible. With MDIO_I2C_ROLLBALL forced, the module silently ACKs the unlock password write, the MDIO bus is created, but no PHY responds; the SFP state machine cycles through the RollBall PHY-probe retry window before reporting no PHY. Move the probe into i2c_mii_init_rollball() in mdio-i2c.c, where the RollBall protocol constants are already defined. After sending the unlock password, issue a CMD_READ and poll for CMD_DONE up to 200 ms (10 x 20 ms, matching the existing rollball poll tolerance). A genuine RollBall bridge asserts CMD_DONE within that window; modules without a bridge never do, so i2c_mii_init_rollball() returns -ENODEV. mdio_i2c_alloc() propagates -ENODEV to the caller to signal that no bridge is present and PHY probing should be skipped. sfp_sm_add_mdio_bus() catches -ENODEV and transitions sfp->mdio_protocol to MDIO_I2C_NONE so the rest of the state machine skips PHY probing for this module. Any I2C-level error (NACK, timeout) during the probe is also treated as -ENODEV: if the module does not respond at I2C address 0x51 at all, there is certainly no RollBall bridge there, and SFP initialization should not abort. The probe writes are safe with respect to SFP EEPROM integrity: only modules explicitly listed in the quirk table enter this path, and the RollBall password unlock write to 0x51 was already issued by i2c_mii_init_rollball() before the probe for all such modules. 
Any module without a device at 0x51 NACKs the transfer and is treated as -ENODEV. Add "OEM"/"SFP-10G-T-I" to the quirk table so RTL8261BE modules enter the probe path; genuine RollBall modules continue to work as before. Signed-off-by: Petr Wozniak <petr.wozniak@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260527053909.2118-1-petr.wozniak@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
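The probe described above (poll for CMD_DONE up to 10 times, 20 ms apart, and report -ENODEV when no bridge answers or the I2C transfer fails) can be sketched in plain userspace C. This is only an illustration of the control flow: `struct demo_module`, `demo_read_status()` and `demo_probe_rollball()` are hypothetical names, the module is a mock, and the sleep between polls is left as a comment.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_POLL_TRIES       10	/* 10 polls ... */
#define DEMO_POLL_INTERVAL_MS 20	/* ... 20 ms apart: a 200 ms window */

/* Hypothetical mock of an SFP module sitting behind I2C address 0x51. */
struct demo_module {
	bool i2c_responds;	/* false: every transfer is NACKed */
	int done_after_polls;	/* 0: CMD_DONE never appears */
	int polls_seen;
};

/* Returns 1 when CMD_DONE is seen, 0 while busy, -EIO on an I2C error. */
static int demo_read_status(struct demo_module *m)
{
	if (!m->i2c_responds)
		return -EIO;
	m->polls_seen++;
	return m->done_after_polls > 0 && m->polls_seen >= m->done_after_polls;
}

/* Returns 0 if a bridge answered within the window, -ENODEV otherwise. */
static int demo_probe_rollball(struct demo_module *m)
{
	int i, ret;

	for (i = 0; i < DEMO_POLL_TRIES; i++) {
		ret = demo_read_status(m);
		if (ret < 0)
			return -ENODEV;	/* I2C-level error: treat as "no bridge" */
		if (ret == 1)
			return 0;
		/* a real driver would sleep DEMO_POLL_INTERVAL_MS here */
	}
	return -ENODEV;
}
```

A caller in this sketch would treat -ENODEV as "skip PHY probing" rather than as a fatal initialization error, which mirrors the behaviour the commit message describes for sfp_sm_add_mdio_bus().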
2026-06-01 | Merge branch 'mv88e6xxx-serdes-on-mv88e6321' | Jakub Kicinski
Fidan Aliyeva says: ==================== mv88e6xxx: SERDES on mv88e6321 This patch series adds support for the SERDES feature of the mv88e6321 variant of the Marvell mv88e6xxx series. The mv88e6321 has 2 ports that support high-speed SERDES, but the driver lacks support for them. The mv88e6321 has a similar architecture to the mv88e6352, which makes it possible to reuse the latter's pcs functions. That is why the patch series consists of 2 parts: 1. Refactor the serdes functions and pcs_init of mv88e6352 to be more generic (patches 1-2). 2. Add SERDES support for mv88e6321, reusing 6352's pcs functions. The final code has been tested directly on an mv88e6321 ethernet device with ip ping tests, performance tests, and by verifying the switch's expected register values. Referred document: 88E6321/88E6320 Functional Specification ==================== Link: https://patch.msgid.link/20260528210310.1365858-1-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | mv88e6xxx: Add SERDES Support for mv88e6321 | Fidan Aliyeva
Add serdes and pcs_ops functions for mv88e6321. On the mv88e6321, 2 ports support serdes functionality: port 0 and port 1. These ports are serdes-only ports. Changes: 1. Add a function that returns the lane address for the port based on cmode. 2. Reuse mv88e6352's serdes_get_regs* and pcs_init functions for mv88e6321. Tested on mv88e6321 switch port 0. Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-4-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | mv88e6xxx: Refactor 6352's serdes functions | Fidan Aliyeva
Changes: 1. Replace the mv88e6352_g2_scratch_port_has_serdes check in mv88e6352_pcs_init with a call to mv88e6xxx_serdes_get_lane, making the function more generic. 2. Replace the serdes checks in the mv88e6352_serdes_get_* functions with mv88e6xxx_serdes_get_lane, making them more generic. 3. Add a lane argument to mv88e6352_serdes_read so it can be reused later for 6321. Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-3-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | mv88e6xxx: Add mv88e6352_serdes_get_lane | Fidan Aliyeva
Changes: 1. Add mv88e6352_serdes_get_lane function which checks if the port supports SERDES by calling mv88e6352_g2_scratch_port_has_serdes. Then returns the address of the SERDES lane. 2. Add this function as .serdes_get_lane member to all the chip versions which use mv88e6352_pcs_init. Co-developed-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Thomas Eckerman <thomas.eckerman.ext@ericsson.com> Signed-off-by: Fidan Aliyeva <fidan.aliyeva.ext@ericsson.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260528210310.1365858-2-fidan.aliyeva.ext@ericsson.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | 6lowpan: fix off-by-one in multicast context address compression | Yizhou Zhao
The second memcpy in lowpan_iphc_mcast_ctx_addr_compress() uses &data[1] as destination and &ipaddr->s6_addr[11] as source, but both should be offset by one: &data[2] and &ipaddr->s6_addr[12] respectively. This off-by-one has two consequences: 1. data[1] is overwritten with s6_addr[11], corrupting the RIID field in the compressed multicast address 2. data[5] is never written, so uninitialized kernel stack memory is transmitted over the network via lowpan_push_hc_data(), leaking kernel stack contents The correct inline data layout must match what the decompression function lowpan_uncompress_multicast_ctx_daddr() expects: data[0..1] = s6_addr[1..2] (flags/scope + RIID) data[2..5] = s6_addr[12..15] (group ID) Also zero-initialize the data array as a defensive measure against similar bugs in the future. Fixes: 5609c185f24d ("6lowpan: iphc: add support for stateful compression") Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Ao Wang <wangao@seu.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://patch.msgid.link/20260527081806.42747-1-zhaoyz24@mails.tsinghua.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
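The two layouts can be compared in a small standalone C sketch. The function names are hypothetical and the "buggy" variant is reconstructed from the description above (destination `&data[1]`, source `&s6_addr[11]`, four bytes copied), not copied from the kernel source.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fixed layout, matching what the decompression side expects:
 *   data[0..1] = addr[1..2]   (flags/scope + RIID)
 *   data[2..5] = addr[12..15] (group ID)
 */
static void demo_mcast_compress_fixed(const uint8_t addr[16], uint8_t data[6])
{
	memset(data, 0, 6);		/* defensive zero-initialization */
	memcpy(data, &addr[1], 2);
	memcpy(&data[2], &addr[12], 4);
}

/* Pre-fix behaviour as described: both offsets are one too low. */
static void demo_mcast_compress_buggy(const uint8_t addr[16], uint8_t data[6])
{
	memcpy(data, &addr[1], 2);
	memcpy(&data[1], &addr[11], 4);	/* clobbers data[1], never writes data[5] */
}
```

With an address whose byte i holds the value i, the fixed variant yields 1, 2, 12, 13, 14, 15, while the buggy variant yields 1, 11, 12, 13, 14 followed by whatever was already in `data[5]`.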
2026-06-01 | Merge branch 'net-mdio-realtek-rtl9300-soc-independent-command-runner' | Jakub Kicinski
Markus Stockhausen says: ==================== net: mdio: realtek-rtl9300: SoC independent command runner The Realtek Otto switch platform consists of four different series: - RTL838x aka maple : 28 port 1G Switches - RTL839x aka cypress : 52 port 1G Switches - RTL930x aka longan : 28 port 1G/2.5G/10G Switches - RTL931x aka mango : 56 port 1G/2.5G/10G Switches After establishing basic groundwork for multi-device support, this series harmonizes the command handling of the MDIO driver. It is the second step towards easier integration of the non-RTL930x SoCs into this driver. ==================== Link: https://patch.msgid.link/20260527163449.1294961-1-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | net: mdio: realtek-rtl9300: use command runner for read_c22() | Markus Stockhausen
Convert the final missing read_c22() path to the new read enabled command runner. Do it the same way as other implementations. - bus calls otto_emdio_read_c22() - this hands over to SoC specific otto_emdio_9300_read_c22() - finally the registers are filled and the runner issued With this cleanup remove the obsolete helper otto_emdio_wait_ready() Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260527163449.1294961-5-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | net: mdio: realtek-rtl9300: use command runner for read_c45() | Markus Stockhausen
Convert the read_c45() path to the new command runner. This needs the additional helper otto_emdio_read_cmd() that can issue the command runner and process a read operation. It is basically nothing more than - run the command - read the command result through the I/O register With this in place convert read_c45() like the already existing write C22/C45 implementation. - bus calls otto_emdio_read_c45() - this hands over to SoC specific otto_emdio_9300_read_c45() - the registers are filled - otto_emdio_read_cmd() is issued - that calls the command runner Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260527163449.1294961-4-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | net: mdio: realtek-rtl9300: use command runner for write_c22() | Markus Stockhausen
Now that the driver has a generic command runner, make use of it in the write_c22() path. To do so: - add a generic otto_emdio_write_c22() helper that will be called by the bus - convert otto_emdio_9300_write_c22() to the new command runner logic Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260527163449.1294961-3-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-01 | net: mdio: realtek-rtl9300: provide generic command runner | Markus Stockhausen
The current bus read/write commands for C22/C45 are RTL930x specific. Avoid duplicating those 200 lines of code for the RTL838x, RTL839x and RTL931x targets. Instead provide a generic command runner that is SoC independent. The implementation works as follows:

The runner takes a prepared list of values for the four MDIO registers and feeds the data into the registers. This generic write to all registers (in other words, writing "a little bit too much") is no issue. The hardware looks at the command to be executed and only takes the pieces of data that are really required. No side effects have been observed on any of the four SoCs during the time this mechanism has existed in downstream OpenWrt. The last register fed is the C22/command register. It is enriched with the proper command flags from the caller. The hardware issues the command and the runner waits for its completion.

Besides feeding all registers, the runner emulates the behaviour of the old code as closely as possible:
- check defensively for a running command in advance
- Before this commit the driver had different MMIO timeout values. 1000s for command preparation, 100us after writes and 1000us after reads. The new version uses a consistent 1000us timeout for all of these.
- return -ENXIO in case of hardware failure (fail bit)

As a first consumer of this runner, convert the write_c45() function. This is realized in a multi-stage approach:
- a generic otto_emdio_write_c45() will be called by the bus
- this forwards the request to the device-specific writer, in this case otto_emdio_9300_write_c45()
- there the command data is filled in and the additional helper otto_emdio_write_cmd() is called
- that adds the write flag and issues the generic command runner

With all of the above in place, there is not much left to do in otto_emdio_9300_write_c45(). It just fills the register fields and calls the write helper with the right command bits.
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260527163449.1294961-2-markus.stockhausen@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
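As a rough illustration of the runner flow described in this commit (defensive idle check, feed all four registers with the command register last, wait for completion, map the fail bit to -ENXIO), here is a userspace sketch against a mock device. Everything in it is an assumption made for the example: the `demo_*` names, the bit positions, the register index of the command register, and the use of a poll counter in place of the 1000us timeout. It does not reproduce the driver's actual register layout.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_NUM_REGS   4
#define DEMO_CMD_REG    3		/* assumed: command register is fed last */
#define DEMO_CMD_RUN    (1u << 0)	/* assumed bit layout */
#define DEMO_CMD_FAIL   (1u << 1)
#define DEMO_CMD_WRITE  (1u << 2)
#define DEMO_POLL_LIMIT 1000		/* stands in for the 1000us timeout */

/* Mock hardware: stays busy for busy_polls reads after a command starts. */
struct demo_hw {
	uint32_t regs[DEMO_NUM_REGS];
	int busy_polls;
	int fail;			/* raise the fail bit on completion */
};

static uint32_t demo_hw_read_cmd(struct demo_hw *hw)
{
	if ((hw->regs[DEMO_CMD_REG] & DEMO_CMD_RUN) && hw->busy_polls-- <= 0) {
		hw->regs[DEMO_CMD_REG] &= ~DEMO_CMD_RUN;
		if (hw->fail)
			hw->regs[DEMO_CMD_REG] |= DEMO_CMD_FAIL;
	}
	return hw->regs[DEMO_CMD_REG];
}

/* Poll until the run bit clears; *val receives the last value read. */
static int demo_wait_idle(struct demo_hw *hw, uint32_t *val)
{
	int i;

	for (i = 0; i < DEMO_POLL_LIMIT; i++) {
		*val = demo_hw_read_cmd(hw);
		if (!(*val & DEMO_CMD_RUN))
			return 0;
	}
	return -ETIMEDOUT;
}

static int demo_run_cmd(struct demo_hw *hw,
			const uint32_t vals[DEMO_NUM_REGS], uint32_t flags)
{
	uint32_t val;
	int i, ret;

	ret = demo_wait_idle(hw, &val);	/* defensive check for a running command */
	if (ret)
		return ret;

	for (i = 0; i < DEMO_CMD_REG; i++)	/* feed all data registers */
		hw->regs[i] = vals[i];
	hw->regs[DEMO_CMD_REG] = vals[DEMO_CMD_REG] | flags | DEMO_CMD_RUN;

	ret = demo_wait_idle(hw, &val);
	if (ret)
		return ret;
	return (val & DEMO_CMD_FAIL) ? -ENXIO : 0;
}
```

A write helper in this sketch would simply call `demo_run_cmd(hw, vals, DEMO_CMD_WRITE)`, which is the shape the commit message gives for otto_emdio_write_cmd().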
2026-06-01 | scsi: ufs: Remove redundant vops NULL check and trivial wrapper | Chanwoo Lee
ufshcd_variant_hba_init/exit() check 'if (!hba->vops)' before calling vops wrappers, but the wrappers already do NULL check internally. Remove the redundant checks. Also remove ufshcd_variant_hba_exit() entirely since it only wraps ufshcd_vops_exit() with no added value. Signed-off-by: Chanwoo Lee <cw9316.lee@samsung.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260529061623.301291-1-cw9316.lee@samsung.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-06-01 | scsi: ufs: Remove unnecessary return in void vops wrappers | Chanwoo Lee
ufshcd_vops_exit(), ufshcd_vops_setup_task_mgmt(), and ufshcd_vops_hibern8_notify() use 'return hba->vops->xxx()' while other void vops wrappers call without return. Remove the unnecessary return keywords for consistency. Signed-off-by: Chanwoo Lee <cw9316.lee@samsung.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260529061503.301182-1-cw9316.lee@samsung.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-06-01 | scsi: ufs: Fix wrong value printed in unexpected UPIU response case | Chanwoo Lee
In ufshcd_transfer_rsp_status(), the default case of the inner switch statement prints the UPIU response code when an unexpected response is received. However, the code was printing 'result' variable which is always 0 at that point, making the error message useless for debugging. Fix this by printing the actual UPIU response code returned by ufshcd_get_req_rsp(). Fixes: 08108d31129a ("scsi: ufs: Improve type safety") Signed-off-by: Chanwoo Lee <cw9316.lee@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260527092134.275887-1-cw9316.lee@samsung.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-06-01 | scsi: ufs: core: Fix NULL pointer dereference in scsi_cmd_priv() calls | Chanwoo Lee
ufshcd_tag_to_cmd() may return NULL if no command is associated with the given tag. However, several callers dereference the returned cmd pointer via scsi_cmd_priv() without checking for NULL first, leading to a potential NULL pointer dereference. Fix this by adding NULL checks for cmd before calling scsi_cmd_priv() and moving the lrbp initialization after the NULL check. Signed-off-by: Chanwoo Lee <cw9316.lee@samsung.com> Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260529010739.295391-1-cw9316.lee@samsung.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
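The pattern of the fix (check the command pointer first, derive the private data only afterwards) looks roughly like the following userspace sketch. The `demo_*` types and helpers are hypothetical stand-ins, not the UFS core's structures.

```c
#include <stddef.h>

#define DEMO_NUM_TAGS 4

struct demo_lrb { int task_tag; };
struct demo_cmd { struct demo_lrb priv; };

static struct demo_cmd *demo_tag_table[DEMO_NUM_TAGS];

/* May return NULL when no command is associated with the tag. */
static struct demo_cmd *demo_tag_to_cmd(unsigned int tag)
{
	return tag < DEMO_NUM_TAGS ? demo_tag_table[tag] : NULL;
}

/* Dereferences cmd, so callers must not pass NULL. */
static struct demo_lrb *demo_cmd_priv(struct demo_cmd *cmd)
{
	return &cmd->priv;
}

static int demo_handle_tag(unsigned int tag)
{
	struct demo_cmd *cmd = demo_tag_to_cmd(tag);
	struct demo_lrb *lrbp;

	if (!cmd)
		return -1;		/* nothing to do for an unused tag */

	lrbp = demo_cmd_priv(cmd);	/* initialized only after the NULL check */
	return lrbp->task_tag;
}
```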
2026-06-01 | scsi: megaraid_mbox: Avoid double kfree() | Arnd Bergmann
Smatch found a double-free after my recent change: drivers/scsi/megaraid/megaraid_mbox.c:3474 megaraid_cmm_register() error: double free of 'adp' (line 3468) Since the object is no longer allocated in megaraid_cmm_register(), remove the kfree() as well. Fixes: c1f7275b613b ("scsi: megaraid_mbox: Reduce stack usage in megaraid_cmm_register()") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260601210216.846809-1-arnd@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-06-01 | scsi: pm8001: Fix error code in non_fatal_log_show() | Dan Carpenter
The non_fatal_log_show() function is supposed to return negative error codes on failure. But because the error codes are saved in a u32 and then cast to signed long, they end up being high positive values instead of negative. Remove the intermediary u32 variable to fix this bug. Fixes: dba2cc03b9db ("scsi: pm80xx: sysfs attribute for non fatal dump") Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://patch.msgid.link/ahs-bEsBJH0KhnsX@stanley.mountain Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
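The effect of routing a negative error code through an unsigned 32-bit variable can be reproduced in a few lines of standalone C. The function names are hypothetical, and `int64_t` is used here as a stand-in for a 64-bit signed long.

```c
#include <errno.h>
#include <stdint.h>

/* Buggy shape: the result passes through a u32 before being widened. */
static int64_t demo_show_buggy(int fail)
{
	uint32_t count = fail ? (uint32_t)(-EINVAL) : 16;

	return (int64_t)count;		/* zero-extended: large positive value */
}

/* Fixed shape: no unsigned intermediary, the sign survives. */
static int64_t demo_show_fixed(int fail)
{
	if (fail)
		return -EINVAL;
	return 16;
}
```

In the buggy variant a failure comes back as 4294967296 - EINVAL, a large positive number, so a caller that tests for a negative return never notices the error.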
2026-06-01 | scsi: lpfc: Turn lpfc_queue q_pgs into a flexible array | Rosen Penev
The q_pgs pointer was assigned to point at the trailing memory allocated past the struct. Convert it to a proper C99 flexible array member and use struct_size() for the allocation. Assisted-by: Claude:Opus-4.7 Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Link: https://patch.msgid.link/20260523050241.190239-1-rosenp@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
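A userspace sketch of the same conversion is shown below. The struct is a heavily reduced, hypothetical stand-in for the driver's queue structure, and `demo_queue_size()` only approximates the kernel's `struct_size()` helper (header size plus n elements, saturating at SIZE_MAX on overflow).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_page { void *virt; };

struct demo_queue {
	uint16_t page_count;
	struct demo_page q_pgs[];	/* C99 flexible array member, must be last */
};

/* Approximation of struct_size(q, q_pgs, n) for this one struct. */
static size_t demo_queue_size(size_t n)
{
	if (n > (SIZE_MAX - sizeof(struct demo_queue)) / sizeof(struct demo_page))
		return SIZE_MAX;	/* saturate instead of wrapping */
	return sizeof(struct demo_queue) + n * sizeof(struct demo_page);
}

static struct demo_queue *demo_queue_alloc(uint16_t page_count)
{
	size_t sz = demo_queue_size(page_count);
	struct demo_queue *q;

	if (sz == SIZE_MAX)
		return NULL;
	q = calloc(1, sz);		/* one allocation covers header and array */
	if (q)
		q->page_count = page_count;
	return q;
}
```

Compared with a separate pointer aimed at the trailing memory, the flexible array member lets the compiler and bounds-checking tooling see that the array belongs to the allocation.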
2026-06-01 | scsi: ufs: core: Skip link param validation when lanes_per_direction is unset | Daejun Park
ufshcd_validate_link_params(), added by commit e72323f3b09f ("scsi: ufs: core: Configure only active lanes during link"), is called unconditionally from ufshcd_link_startup() and fails link startup with -ENOLINK when the connected lane count read from the device differs from hba->lanes_per_direction. lanes_per_direction is only set by ufshcd-pltfrm (default 2, or the "lanes-per-direction" devicetree property); ufshcd-pci controllers (e.g. Intel) leave it 0. As the device always reports >= 1 connected lanes, the check can never match and link startup always fails. Reproduced with QEMU's UFS device. Skip the check when lanes_per_direction is unset: with no expected value to validate against, restore the behaviour from before that commit. Fixes: e72323f3b09f ("scsi: ufs: core: Configure only active lanes during link") Signed-off-by: Daejun Park <daejun7.park@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260520070009epcms2p6542f3abb7660839e9d8140b3f2f145c3@epcms2p6 Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
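The decision logic of the fix can be reduced to a short sketch. `demo_validate_lanes()` is a hypothetical simplification that compares a single lane count; it is not the signature of the real function.

```c
#include <errno.h>

/*
 * expected:  the configured lanes_per_direction value, 0 when unset
 * connected: the lane count reported by the device (always >= 1)
 */
static int demo_validate_lanes(unsigned int expected, unsigned int connected)
{
	if (!expected)
		return 0;	/* unset: no expected value to validate against */
	if (connected != expected)
		return -ENOLINK;
	return 0;
}
```

Without the first check, an `expected` value of 0 can never equal a connected lane count of 1 or more, which is why link startup always failed on controllers that leave the field unset.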
2026-06-01 | Merge branch 'minimize-annotations-for-arena-programs' | Alexei Starovoitov
Emil Tsalapatis says: ==================== Minimize annotations for arena programs BPF programs must currently include code to address two limitations of function signatures that include arena types. First, arena arguments must be annotated with __arg_arena in the function signature in addition to __arena. Second, it is currently not allowed to return an arena pointer from a subprog, even though it is safe to do so. These limitations require extra annotations and typecasts respectively, and have proven to be sources of confusion for programmers. The patchset improves arena-related function signatures in two ways. First, it removes the need for __arg_arena in function signatures. Second, it allows subprogs to directly return arena pointers to their caller. To do this we add a new type tag to the existing __arena annotation. The annotation is currently an alias for __attribute__((address_space(1))), which is not discoverable from BTF alone and so cannot be used to determine whether a pointer variable is an arena pointer during verification. With the new type tag, we can determine whether the arguments and/or the return value of a function belong in an arena. We test the new code by modifying libarena to take advantage of these relaxed limitations. CHANGELOG ========= v2 -> v3 (https://lore.kernel.org/bpf/20260530002259.4505-1-emil@etsalapatis.com/) - Added Acks by Eduard - Complete the __arg_arena removal by removing them from htab (Alexei) - Add a test in verifier_arena_globals1.c to confirm the new __arena attribute works as expected in function argument and return types - Reject type tags on non-pointer types, currently only possible in handcrafted BTF (Eduard) - Undo inaccurate change on verifier comment (AI) - Fix error return value for invalid BTF return types during BTF parsing (Eduard) v1 -> v2 (lore.kernel.org/bpf/20260527071457.4598-1-emil@etsalapatis.com/) - Rebased to fix conflict - Removed the typedef foo * foo_t typedefs.
Those were necessary to avoid annotating each instance of the type with __arena. The new version of the patch instead removes typedefs and uses __arena everywhere directly (see patch 4/5 for more details). - Reorganized the patchset to frontload all kernel-side changes and place the libarena changes at the end. ==================== Link: https://patch.msgid.link/20260602004120.17087-1-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | selftests/bpf: Add tests for the new type-tag based __arena identifier | Emil Tsalapatis
Add selftests that combine the new type-based __arena identifier with the volatile qualifier both in functions' arguments and return values. This way we test both that they are recognized as arena arguments and that they are not sensitive to the position they are placed in the type compared to other qualifiers. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260602004120.17087-7-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | selftests/bpf: libarena: Directly return arena pointers from functions | Emil Tsalapatis
Now that the __arena annotation includes a BTF type tag, and the verifier can identify arena pointers at BTF loading time, return arena pointers as their true type instead of casting to u64. Remove the preprocessor typecast wrappers used to hide this from the caller. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260602004120.17087-6-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | selftests/bpf: Remove __arg_arena from the codebase | Emil Tsalapatis
Now that BPF __arg_arena has been subsumed by __arena, remove __arg_arena from the codebase. This way the user has one fewer annotation to worry about. To remove __arg_arena we remove the typedefs we were previously using to minimize __arena annotations. This is because __arena now also includes a BTF type tag, which is ignored for non-pointer types. As a result, we cannot capture the whole __arena annotation inside a typedef and need to directly annotate the pointer type when declaring the variable. The extra verbosity is worth it because the use of the __arena tag is intuitive to the programmer and removes the __arg_arena tag that has been a consistent source of confusion for users. The typedefs can be reintroduced later (without __arg_arena) once compilers start supporting BTF type tags for non-pointer types. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260602004120.17087-5-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | bpf: Allow subprogs to return arena pointers | Emil Tsalapatis
BPF subprogs currently only return void or scalar values. However, it is also safe to return arena pointers between subprogs in the same BPF program: Arena pointers are guaranteed to be safe for both programs at any point. Expand the verifier to permit returning an arena pointer to the caller. The main subprog is still not allowed to return an arena pointer because arena pointers are internal to the BPF program, and the return values permitted for each main subprog depend on the program type anyway. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260602004120.17087-4-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | verifier: parse BTF type tags for function arguments | Emil Tsalapatis
The BTF parsing logic for function arguments goes through the arguments' decl tags, but does not go into their type tags. Add type tag parsing for function arguments. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260602004120.17087-3-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | selftests/bpf: libarena: Add "arena" BTF type tag to __arena qualifier | Emil Tsalapatis
The arena qualifier currently designates its associated type as belonging to address space 1. This property affects code generation, but is not reflected in the BTF information of the function. This lack of information at the BTF level prevents us from returning arena pointers from global subprograms. Subprogs cannot return any data structure more complex than a scalar, so pointers to structs are rejected as a return type. We have no way of marking the return type as a pointer to an arena, which is safe provided the two subprogs have the same arena. Expand the __arena qualifier to also attach a BTF type tag to the type. This lets us determine whether a variable belongs to an arena from its type alone through BTF parsing. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260602004120.17087-2-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | scsi: sas: Skip opt_sectors when DMA reports no real optimization hint | Ionut Nechita
sas_host_setup() unconditionally sets shost->opt_sectors from dma_opt_mapping_size(). When the IOMMU is disabled or in passthrough mode and no DMA ops provide an opt_mapping_size callback, dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX) which equals dma_max_mapping_size() — a hard upper bound, not an optimization hint. On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00) and intel_iommu=off the following values are observed: dma_opt_mapping_size() = dma_max_mapping_size() (no real hint) shost->max_sectors = 32767 opt_sectors = min(32767, huge >> 9) = 32767 optimal_io_size = 32767 << 9 = 16776704 → round_down(16776704, 4096) = 16773120 The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an Optimal Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks remains 0. sd_revalidate_disk() then uses min_not_zero(0, opt_sectors) = opt_sectors, propagating the bogus value into the block device's optimal_io_size (visible as OPT-IO = 16773120 in lsblk --topology). mkfs.xfs picks up optimal_io_size and minimum_io_size and computes: swidth = 16773120 / 4096 = 4095 sunit = 8192 / 4096 = 2 Since 4095 % 2 != 0, XFS rejects the geometry: SB stripe unit sanity check failed This makes it impossible to create XFS filesystems (e.g. for /var/lib/docker) during system bootstrap. Fix this by introducing a sas_dma_setup_opt_sectors() helper that sets opt_sectors only when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(), indicating a genuine DMA optimization constraint. The helper computes min(opt_sectors, max_sectors) first, then rounds down to a power of two so that filesystem geometry calculations always produce clean results. When the two DMA values are equal, no backend provided a real hint, so opt_sectors stays at 0 ("no preference"). 
[mkp: implemented hch's suggestion] Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit") Cc: stable@vger.kernel.org Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260519135238.373784-2-ionut.nechita@windriver.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
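The arithmetic in this commit message, and the shape of the fix, can be checked with a small standalone C sketch. The helper names are hypothetical; the logic follows the description above (set the hint only when the DMA optimal size is strictly below the DMA maximum, clamp to max_sectors, then round down to a power of two) and is not the kernel helper itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest power of two that is <= v (0 for v == 0). */
static uint32_t demo_rounddown_pow_of_two(uint32_t v)
{
	uint32_t p = 1;

	if (!v)
		return 0;
	while (p <= v / 2)
		p *= 2;
	return p;
}

/* Returns the opt_sectors hint, or 0 for "no preference". */
static uint32_t demo_opt_sectors(size_t dma_opt, size_t dma_max,
				 uint32_t max_sectors)
{
	size_t sectors;

	if (dma_opt >= dma_max)
		return 0;		/* no backend provided a real hint */

	sectors = dma_opt >> 9;		/* bytes to 512-byte sectors */
	if (sectors > max_sectors)
		sectors = max_sectors;
	return demo_rounddown_pow_of_two((uint32_t)sectors);
}

/* Stripe width in 4096-byte blocks for a given optimal I/O size in bytes. */
static uint32_t demo_swidth_blocks(uint64_t optimal_io_size)
{
	return (uint32_t)((optimal_io_size / 4096 * 4096) / 4096);
}
```

With the old value of 32767 sectors, the stripe width works out to 4095 blocks, which is not a multiple of the 2-block stripe unit. A power-of-two hint such as 16384 sectors gives 2048 blocks, which divides cleanly.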
2026-06-01 | Merge branch 'more-gen_loader-fixes' | Alexei Starovoitov
Daniel Borkmann says: ==================== More gen_loader fixes Follow-up fixes for the signed loader, includes also the recent sashiko findings. v1->v2: - Fixed up verifier_map_ptr selftest - Added patch 1/2/6/7 with a new map-in-map fix and a redundant hash_buf memcpy cleanup as well as selftests ==================== Link: https://patch.msgid.link/20260601150248.394863-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01 | selftests/bpf: Test that exclusive maps are rejected in map-in-map | Daniel Borkmann
Add a subtest to map_excl that verifies an exclusive map (created with excl_prog_hash) cannot be used in a map-of-maps, covering both kernel enforcement points: i) the inner-map template at map-of-maps creation and, ii) the element inserted into an existing map-of-maps. # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t map_excl ./test_progs -t map_excl [ 1.728106] bpf_testmod: loading out-of-tree module taints kernel. [ 1.730473] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #215/1 map_excl/map_excl_allowed:OK #215/2 map_excl/map_excl_denied:OK #215/3 map_excl/map_excl_no_map_in_map:OK #215 map_excl:OK Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-8-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01selftests/bpf: Adjust verifier_map_ptr for the map's excl fieldKP Singh
Adding the u32 excl field at offset 32 of struct bpf_map right after the sha[SHA256_DIGEST_SIZE] hash shifts the ops pointer from offset 32 to 40. Therefore, fix up the test case. # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_map_ptr [...] #637/1 verifier_map_ptr/bpf_map_ptr: read with negative offset rejected:OK #637/2 verifier_map_ptr/bpf_map_ptr: read with negative offset rejected @unpriv:OK #637/3 verifier_map_ptr/bpf_map_ptr: write rejected:OK #637/4 verifier_map_ptr/bpf_map_ptr: write rejected @unpriv:OK #637/5 verifier_map_ptr/bpf_map_ptr: read non-existent field rejected:OK #637/6 verifier_map_ptr/bpf_map_ptr: read non-existent field rejected @unpriv:OK #637/7 verifier_map_ptr/bpf_map_ptr: read ops field accepted:OK #637/8 verifier_map_ptr/bpf_map_ptr: read ops field accepted @unpriv:OK [...] Summary: 2/18 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-7-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
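The offset shift can be illustrated with two stand-in structs. These are not the real struct bpf_map; the `toy_*` types are hypothetical and contain only the fields named in the message, and the 32 to 40 move assumes a target with 8-byte, 8-byte-aligned pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the change: digest, then the ops pointer. */
struct toy_map_before {
	uint8_t sha[32];
	const void *ops;
};

/* Hypothetical layout after the change: a u32 sits between the two. */
struct toy_map_after {
	uint8_t sha[32];
	uint32_t excl;
	const void *ops;
};

static_assert(sizeof(((struct toy_map_after *)0)->sha) == 32,
	      "the toy digest is 32 bytes");

/* Offset of ops in each layout, as the compiler lays the struct out. */
static size_t toy_ops_offset_before(void)
{
	return offsetof(struct toy_map_before, ops);
}

static size_t toy_ops_offset_after(void)
{
	return offsetof(struct toy_map_after, ops);
}
```

On such a target the new 4-byte field is followed by 4 bytes of padding so that the pointer stays 8-byte aligned, which is why the pointer moves by 8 rather than 4.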
2026-06-01libbpf: Skip max_entries override on signed loadersDaniel Borkmann
bpf_gen__map_create() lets the host-supplied loader ctx override a map's max_entries at runtime (map_desc[idx].max_entries, when non-zero). This is how the light skeleton sizes maps to the target machine, but it happens after emit_signature_match() and is covered by neither the signed loader instructions nor the hashed blob. For a signed loader this means an untrusted host can re-dimension the program's maps, outside what the signature attests to. Gate the override on gen_hash so signed loaders use the signer-provided max_entries baked into the blob. Fixes: ea923080c145 ("libbpf: Embed and verify the metadata hash in the loader") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-6-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01libbpf: Skip initial_value override on signed loadersDaniel Borkmann
bpf_gen__map_update_elem() emits code that, when the host-supplied loader ctx provides a non-NULL map_desc[idx].initial_value, overwrites the blob value with bytes read from the host (bpf_copy_from_user / bpf_probe_read_kernel) before the BPF_MAP_UPDATE_ELEM that populates the program's .data/.rodata/.bss maps. This override runs after emit_signature_match() has validated map->sha[], and initial_value is part of neither the signed loader instructions nor the hashed data blob. For a signed loader this lets an untrusted host substitute global-variable contents into a program whose code carries a valid signature, thus weakening what the signature attests to. The blob already contains the signer-provided value (added via add_data() and covered by the embedded, signed hash), so simply skip emitting the override for signed loaders (gen_hash). Runtime initialization stays available for the unsigned light-skeleton path as before. The jump offsets within the override block are internal to it, so guarding the whole block leaves them unchanged. Fixes: ea923080c145 ("libbpf: Embed and verify the metadata hash in the loader") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-5-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01libbpf: Reject non-exclusive metadata maps in the signed loaderKP Singh
The loader verifies map->sha against the metadata hash in its instructions. map->sha is calculated when BPF_OBJ_GET_INFO_BY_FD is called on the frozen map. While the map is frozen, the /signed loader/ must also ensure the map is exclusive, as, without exclusivity (which a hostile host could just omit when loading the loader), another BPF program with map access can mutate the contents afterwards, so the check passes on stale data. With the extra check as part of the signed loader, it now refuses to move on with map->sha validation if the host set it up wrongly. Fixes: fb2b0e290147 ("libbpf: Update light skeleton for signing") Signed-off-by: KP Singh <kpsingh@kernel.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Drop redundant hash_buf from map_get_hash operationDaniel Borkmann
bpf_map_get_info_by_fd() is the only caller of the ->map_get_hash and always invokes it with hash_buf == map->sha and hash_buf_size of SHA256_DIGEST_SIZE. array_map_get_hash() in turn lets sha256() write the digest directly into that buffer (map->sha) and then performs a trailing memcpy(), which evaluates to memcpy(map->sha, map->sha, 32): a redundant self-copy. The hash_buf_size argument was never used at all. Simplify this a bit, no functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Reject exclusive maps as inner maps in map-in-mapDaniel Borkmann
An exclusive map (created with excl_prog_hash) is bound to a single program by hash: check_map_prog_compatibility() refuses to load any program whose digest does not match map->excl_prog_sha. That check only runs for maps a program references directly, i.e. its used_maps. A map reached at runtime through a map-of-maps is never in used_maps, and bpf_map_meta_equal() does not consider excl_prog_sha, so an exclusive map can be inserted into a non-exclusive outer map and then looked up and mutated by an unrelated program, bypassing the exclusivity guarantee. For the signed loader this defeats the metadata map exclusivity check added in the signed loader: the cached map->sha[] is validated against the signed hash while another program on a hostile host rewrites the frozen map's contents through the outer map. Fixes: baefdbdf6812 ("bpf: Implement exclusive map creation") Reported-by: sashiko <sashiko@sashiko.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260601150248.394863-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01Merge branch 'refactor-verifier-object-relationship-tracking'Alexei Starovoitov
Amery Hung says: ==================== Refactor verifier object relationship tracking Hi all, This patchset cleans up dynptr handling, refactors object relationship tracking in the verifier by introducing parent_id and folding ref_obj_id into id, and fixes dynptr use-after-free bugs where file/skb dynptrs are not invalidated when the parent referenced object is freed. * Motivation * In BPF qdisc programs, an skb can be freed through kfuncs. However, since dynptr does not track the parent referenced object (e.g., skb), the verifier does not invalidate the dynptr after the skb is freed, resulting in use-after-free. The same issue also affects file dynptr. The figure below shows the current state of object tracking. The verifier tracks objects using three fields: id for nullness tracking, ref_obj_id for lifetime tracking, and dynptr_id for tracking the parent dynptr of a slice (PTR_TO_MEM only). While dynptr_id links slices to their parent dynptr, there is no field that links a dynptr back to its parent skb. When the skb is freed via release_reference(ref_obj_id=1), only objects with ref_obj_id=1 are invalidated. Since skb dynptr is non-referenced (ref_obj_id=0), the dynptr and its derived slices remain accessible. Current: object (id, ref_obj_id, dynptr_id) id = unique id of the object (for nullness tracking) ref_obj_id = id of the referenced object (for lifetime tracking) dynptr_id = id of the parent dynptr (only for PTR_TO_MEM slices) skb (0,1,0) ^^ ! No link from dynptr to skb ! |+------------------------------+ | bpf_dynptr_clone | dynptr A (2,0,0) dynptr C (4,0,0) ^ ^ bpf_dynptr_slice | | | | slice B (3,0,2) slice D (5,0,4) * Why not simply use ref_obj_id to track the parent? * A natural first approach is to link dynptr to its parent by sharing the parent's ref_obj_id and propagating it to slices. Now, releasing the skb via release_reference(ref_obj_id=1) correctly invalidates all derived objects. 
Attempted fix: share parent's ref_obj_id skb (0,1,0) ^^ || |+------------------------------+ | bpf_dynptr_clone | dynptr A (2,1,0) dynptr C (4,1,0) ^ ^ bpf_dynptr_slice | | | | slice B (3,1,2) slice D (5,1,4) However, this approach does not generalize to all dynptr types. Referenced dynptrs such as file dynptr acquire their own ref_obj_id to track the dynptr's lifetime. Since ref_obj_id is already used for the dynptr's own reference, it cannot also be used to point to the parent file object. While it is possible to add specialized handling for individual dynptr types [0], it adds complexity and does not generalize. An alternative approach is to avoid introducing a new field and instead repurpose ref_obj_id as parent_id by folding lifetime tracking into id [1]. In this design, each object is represented as (id, ref_obj_id) where id is used for both nullness and lifetime tracking, and ref_obj_id tracks the parent object's id. Attempted: object (id, ref_obj_id) id = id of the object (for nullness and lifetime tracking) ref_obj_id = id of the parent object ' = id is referenced skb (1',0) ^^ || bpf_dynptr_from_skb |+------------------------------+ | bpf_dynptr_clone(A, C) | dynptr A (2,1') dynptr C (4,1') ^ ^ bpf_dynptr_slice | | | | slice B (3,2) slice D (5,4) However, this design cannot express the relationship between referenced socket pointers and their casted counterparts. After pointer casting, the original and casted pointers need the same lifetime (same ref_obj_id in the current design) but different nullness (different id). The casted pointer may be NULL even if the original is valid. With id serving as the only field for both nullness and lifetime, and ref_obj_id repurposed as parent, there is no way to express "different identity, same lifetime." 
Referenced socket pointer (expressed using current design): C = ptr_casting_function(A) ptr A (1,1,0) ptr C (2,1,0) ^ ^ | | ptr C may be NULL even if ptr A is valid but they have the same lifetime * New Design: parent_id with branch splitting and intermediate reference * The patchset folds ref_obj_id into id and adds parent_id to bpf_reg_state (patch 5). A child object's parent_id points to the parent object's id. This replaces the PTR_TO_MEM-specific dynptr_id. Whether a register is referenced is determined by checking if its id appears in the reference array via reg_is_referenced() rather than reading a dedicated ref_obj_id field. Pointer casting: The challenge with pointer casting is that a cast result may be NULL even when the source is valid, requiring distinct identity but shared lifetime. This is solved using branch splitting: when a helper like bpf_sk_fullsock() is called with a referenced pointer, the verifier pushes an explicit NULL branch and assigns the cast result the same id as the source. Since the cast may return NULL for a non-NULL input, the NULL case is explored as a separate verifier branch. This allows releasing any of the original or cast pointers to invalidate all others, while avoiding the need for a separate tracking mechanism. Referenced dynptrs: The challenge with referenced dynptrs is that clones of a referenced dynptr have the same lifetime but different identities. When a referenced dynptr is overwritten, only slices derived from it will be invalidated. To solve this, the verifier creates an intermediate reference. This reference serves as a shared lifetime anchor for the dynptr and all its clones. All clones share the same parent_id but get unique ids for independent slice tracking. Releasing a referenced dynptr releases the intermediate reference, which in turn invalidates all clones and their derived slices. If the parent object is released while the intermediate reference still exists, it is reported as a leaked reference. 
Release cascading: When releasing an object, release_reference() performs a stack-based DFS to invalidate all descendants. It walks the object tree via parent_id links, invalidating registers and dynptr stack slots. Child references encountered during traversal are reported as leaked references. parent_id is also added to bpf_reference_state to enable intermediate reference. When acquiring a reference, a parent_id can be specified to link the new reference to an existing one (e.g., file dynptr's intermediate reference has parent_id linking to the file's reference). Final: object (id, parent_id) id = unique id of the object (for nullness and lifetime tracking) parent_id = id of the parent object (for object relationship tracking) I = intermediate reference serving as lifetime anchor in acquired_refs ' = id is referenced (appears in reference array) skb (1',0) ^^ || bpf_dynptr_from_skb |+------------------------------+ | bpf_dynptr_clone(A, C) | dynptr A (2,1') dynptr C (4,1') ^ ^ bpf_dynptr_slice | | | | slice B (3,2) slice D (5,4) * Preserving reg->id after null-check * For parent_id tracking to work, child objects need to refer to the parent's id. This requires two preparatory changes: assigning reg->id when reading referenced kptrs from program context (patch 3), and preserving reg->id of pointer objects after null-check (patch 4). Previously, null-check would clear reg->id, making it impossible for children to reference the parent afterward. The latter causes a slight increase in verified states for some programs. One selftest object sees +19 states (+5.01%). For Meta BPF objects, the increase is also minor, with the largest being +34 states (+3.63%). * Object relationship in different scenarios (for reference) * The figures below show how the final design handles all four combinations of referenced/non-referenced dynptr with referenced/non-referenced parent. 
(1) Non-referenced dynptr with referenced parent (e.g., skb in Qdisc): skb (1',0) ^^ || bpf_dynptr_from_skb |+------------------------------+ | bpf_dynptr_clone(A, C) | dynptr A (2,1') dynptr C (4,1') dynptr A and C live independently (2) Non-referenced dynptr with non-referenced parent (e.g., skb in TC, always valid): bpf_dynptr_from_skb bpf_dynptr_clone(A, C) dynptr A (1,0) dynptr C (2,0) dynptr A and C live independently (3) Referenced dynptr with referenced parent: file (1',0) ^ bpf_dynptr_from_file | I (2',1') <-- intermediate reference ^^ || |+-------------------------------+ | bpf_dynptr_clone(A, C) | dynptr A (3,2') dynptr C (4,2') dynptr A and C have the same lifetime Releasing either dynptr releases I, invalidating both. Releasing file (1') detects I as a leaked reference. (4) Referenced dynptr with non-referenced parent: bpf_ringbuf_reserve_dynptr I (1',0) <-- intermediate reference ^^ || |+--------------------------------+ | bpf_dynptr_clone(A, C) | dynptr A (2,1') dynptr C (3,1') dynptr A and C have the same lifetime [0] https://lore.kernel.org/bpf/20250414161443.1146103-2-memxor@gmail.com/ [1] https://github.com/ameryhung/bpf/commits/obj_relationship_v2_no_parent_id/ Changelog: v5 -> v6 - Squash "bpf: Fold ref_obj_id into id and introduce virtual references" (v5 patch 9) into "bpf: Refactor object relationship tracking and fix dynptr UAF bug" (now patch 5). ref_obj_id is removed in the same patch that introduces parent_id, eliminating the intermediate state where both coexist (Eduard) - Drop virtual references for pointer casting. Instead, cast results reuse the source pointer's id and use branch splitting to explore the NULL case as a separate verifier branch. 
This avoids adding virtual reference infrastructure for a case that can be handled more simply (Eduard, Andrii) - Address nit from Eduard Link: https://lore.kernel.org/bpf/20260519181314.2731658-1-ameryhung@gmail.com/ v4 -> v5 - Add patch 9 folding ref_obj_id into id and introducing virtual references for pointer casting and referenced dynptr clones (Eduard, Andrii) - Add patch 10 fixing dynptr ref counting to scan all call frames instead of only the current frame (Eduard) - Add utility function validate_ref_obj() (Eduard) Link: https://lore.kernel.org/bpf/20260506142709.2298255-1-ameryhung@gmail.com/ v3 -> v4 - Add patch 1 clean up mark_stack_slot_obj_read() and callers (to address v3 ignoring err returned from mark_dynptr_read) (Andrii) - Fix release_reference() and move the logic allowing destroying a referenced object when refcnt > 1 from destroy_if_stack_slots_dynptr() to release_reference() (Mykyta) - Add patch 7 introducing ref_obj_desc and unifying ref_obj handling (to address Eduard's concern about unclear meta->{id,ref_obj_id} initialization/use and confusing function arguments of process_dynptr_func()) - Add patch 8 unifying release_regno handling so that bpf_kptr_xchg also use release_reference() Link: https://lore.kernel.org/bpf/20260421221016.2967924-1-ameryhung@gmail.com/ v2 -> v3 - Rebase to bpf-next/master - Update veristat numbers - Update commit msg to explain multiple dropped checks (Mykyta, Andrii) - Reuse idmap as idstack in release_reference() and check for duplicate id (Mykyta, Andrii) - Change to use RUN_TEST for qdisc dynptr selftest (Eduard) Link: https://lore.kernel.org/bpf/20260307064439.3247440-1-ameryhung@gmail.com/ v1 -> v2 - Redesign: Use object (id, ref_obj_id, parent_id) instead of (id, ref_obj_id) as it cannot express ptr casting without introducing specialized code to handle the case - Use stack-based DFS to release objects to avoid recursion (Andrii) - Keep reg->id after null check - Add dynptr cleanup - Fix dynptr kfunc arg 
type determination - Add a file dynptr UAF selftest Link: https://lore.kernel.org/bpf/20260202214817.2853236-1-ameryhung@gmail.com/ --- ==================== Link: https://patch.msgid.link/20260529014936.2811085-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01selftests/bpf: Test using dynptr after freeing the underlying objectAmery Hung
Make sure the verifier invalidates the dynptr and dynptr slice derived from an skb after the skb is freed. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-14-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01selftests/bpf: Test using file dynptr after the reference on file is droppedAmery Hung
File dynptr and slice should be invalidated when the parent file's reference is dropped in the program. Without the verifier tracking the dynptr's parent referenced object, the dynptr could incorrectly continue to be used even if the underlying file is being torn down or is gone. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-13-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01selftests/bpf: Test using slice after invalidating dynptr cloneAmery Hung
The parent object of a cloned dynptr is the skb, not the original dynptr. Invalidating the original dynptr should not prevent the program from using the slice derived from the cloned dynptr. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-12-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01selftests/bpf: Test creating dynptr from dynptr data and sliceAmery Hung
The verifier currently does not allow creating dynptr from dynptr data or slice. Add a selftest to test this explicitly. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-11-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Fix dynptr ref counting to scan all call framesAmery Hung
When checking whether a referenced dynptr can be overwritten, destroy_if_dynptr_stack_slot only counted sibling dynptrs in the current call frame. If a clone sharing the same virtual ref parent existed in a different frame (e.g., passed to a subprog), it would not be counted, causing the verifier to incorrectly reject the overwrite with "cannot overwrite referenced dynptr". Fix by extracting the counting into dynptr_ref_cnt() which uses bpf_for_each_reg_in_vstate_mask() to scan dynptr stack slots across all call frames. Fixes: 017f5c4ef73c ("bpf: Allow overwriting referenced dynptr when refcnt > 1") Reported-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-10-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
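The difference between counting in one frame and counting across all frames can be shown with a toy model. Everything here is hypothetical: the `toy_*` structs stand in for verifier stack slots and call frames, and the functions are not the kernel's dynptr_ref_cnt() or its iteration macro.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SLOTS 4

/* A stack slot that may hold a dynptr tied to some parent reference. */
struct toy_slot {
	int is_dynptr;
	unsigned int parent_id;
};

/* One call frame with a few stack slots. */
struct toy_frame {
	struct toy_slot slots[TOY_MAX_SLOTS];
};

/* Count dynptr slots sharing parent_id in a single frame. */
static int toy_ref_cnt_in_frame(const struct toy_frame *f,
				unsigned int parent_id)
{
	int cnt = 0;

	assert(f != NULL);
	for (size_t i = 0; i < TOY_MAX_SLOTS; i++)
		if (f->slots[i].is_dynptr && f->slots[i].parent_id == parent_id)
			cnt++;
	return cnt;
}

/* Count dynptr slots sharing parent_id across every frame. */
static int toy_ref_cnt_all_frames(const struct toy_frame *frames,
				  size_t nframes, unsigned int parent_id)
{
	int cnt = 0;

	for (size_t i = 0; i < nframes; i++)
		cnt += toy_ref_cnt_in_frame(&frames[i], parent_id);
	return cnt;
}
```

If a clone lives in the caller's frame and the dynptr being overwritten lives in the callee's frame, the per-frame count is 1 while the all-frames count is 2, which matches the false rejection the message describes.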
2026-06-01bpf: Unify release handling for helpers and kfuncsAmery Hung
Introduce release_reg() to consolidate the release logic shared by both helpers and kfuncs: dynptr release, kptr_xchg percpu-to-RCU conversion, regular reference release, and NULL pass-through. NULL pass-through is only allowed if the prototype indicates the argument may be null. Determine release_regno from the function prototype/metadata before argument checking, rather than discovering it dynamically during argument processing. For helpers, scan the arg_type array in check_func_proto() via check_proto_release_reg(). For kfuncs, set release_regno to BPF_REG_1 in bpf_fetch_kfunc_arg_meta() when KF_RELEASE is set. In the future when we start adding decl_tag to kfunc arguments, we can just look at the function prototype instead of a release_regno. Extract ref_convert_alloc_rcu_protected() and invalidate_rcu_protected_refs() to make it clearer what the code is doing. ref_convert_alloc_rcu_protected() pre-converts MEM_ALLOC | MEM_PERCPU registers to MEM_RCU (clearing id so they survive), then calls release_reference() to invalidate the remaining registers and release the reference state. Add KF_RELEASE to bpf_dynptr_file_discard() so its release_regno is set via fetch_kfunc_meta rather than being assigned manually in the dynptr argument processing. Set arg_type to ARG_PTR_TO_DYNPTR for KF_ARG_PTR_TO_DYNPTR so that check_func_arg_reg_off() correctly allows non-zero stack offsets for dynptr release arguments, the same as for helpers. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-9-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Unify referenced object tracking in verifierAmery Hung
Helpers and kfuncs independently tracked referenced object metadata using standalone id fields in their respective arg_meta structs. This led to duplicated logic and inconsistent error handling between the two paths. Introduce struct ref_obj_desc to consolidate id and parent_id along with a count of how many arguments carry a reference. Add update_ref_obj() to populate it from a bpf_reg_state, replacing open-coded assignments in check_func_arg(), check_kfunc_args(), and process_iter_arg(). Add validate_ref_obj() to check for ambiguous ref_obj before using it. For ref_obj releasing helpers and kfuncs, keep checking it before calling update_ref_obj() for now. A later patch will make these functions not depend on ref_obj. For other users of ref_obj, move the checks to the use locations. For helpers, this means moving the checks inside helper_multiple_ref_obj_use() to the use locations. is_acquire_function() is dropped as ref_obj is never used. Pass ref_obj_desc into process_dynptr_func()/mark_stack_slots_dynptr() instead of a bare parent_id to make it less confusing. Drop the selftest introduced in 7ec899ac90a2 ("selftests/bpf: Negative test case for ref_obj_id in args") since the verifier no longer complains about ambiguous ref_obj if it is not used. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-8-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Remove redundant dynptr arg check for helperAmery Hung
unmark_stack_slots_dynptr() already makes sure that CONST_PTR_TO_DYNPTR cannot be released. process_dynptr_func() also prevents passing uninitialized dynptr to helpers expecting initialized dynptr. Now that unmark_stack_slots_dynptr() also reports error returned from release_reference(), there should be no reason to keep these redundant checks. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-7-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-06-01bpf: Refactor object relationship tracking and fix dynptr UAF bugAmery Hung
Refactor object relationship tracking in the verifier and fix a dynptr use-after-free bug where file/skb dynptrs are not invalidated when the parent referenced object is freed. Add parent_id to bpf_reg_state to precisely track child-parent relationships. A child object's parent_id points to the parent object's id. This replaces the PTR_TO_MEM-specific dynptr_id. Remove ref_obj_id from bpf_reg_state by folding its role into the existing id field. Previously, id tracked pointer identity for null checking while ref_obj_id tracked the owning reference for lifetime management. These are now unified: acquire helpers and kfuncs set id to the acquired reference id, and release paths use id directly. Add reg_is_referenced() which checks if a register is referenced by looking up its id in the reference array. This replaces all former ref_obj_id checks. For release_reference(), invalidating an object now also invalidates all descendants by traversing the object tree. This is done using stack-based DFS to avoid recursive call chains of release_reference() -> unmark_stack_slots_dynptr() -> release_reference(). Referenced objects encountered during tree traversal are reported as leaked references. Add parent_id to bpf_reference_state to enable hierarchical reference tracking. When acquiring a reference, a parent_id can be specified to link the new reference to an existing one (e.g., referenced dynptrs acquire a reference with parent_id linking to the parent object's reference). Pointer casting: For pointer casting helpers (bpf_sk_fullsock, bpf_tcp_sock), instead of propagating ref_obj_id, the cast result reuses the same reference id as the source pointer. Since the cast may return NULL for a non-NULL input, the NULL case is explored as a separate verifier branch. This allows releasing any of the original or cast pointers to invalidate all others. 
Referenced dynptrs: When constructing a referenced dynptr, acquire an intermediate reference with parent_id linking to the parent referenced object. The dynptr and all clones share the same parent_id (pointing to the intermediate ref) but get unique ids for independent slice tracking. Releasing a referenced dynptr releases the parent reference, which in turn invalidates all clones and their derived slices. Owning to non-owning reference conversion: After converting owning to non-owning by clearing id (e.g., object(id=1) -> object(id=0)), the verifier releases the reference state via release_reference_nomark(). Note that the error message "reference has not been acquired before" in the helper and kfunc release paths is removed. This message was already unreachable. The verifier only calls release_reference() after confirming the reference is valid, so the condition could never trigger in practice. Fixes: 870c28588afa ("bpf: net_sched: Add basic bpf qdisc kfuncs") Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-6-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
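The release cascade via parent_id links can be sketched as a stack-based depth-first walk over a flat array of objects. This is a toy model under stated assumptions: `struct toy_obj`, `toy_release()`, and the fixed-size stack are hypothetical and do not reflect the verifier's actual data structures; a referenced descendant is simply counted where the message says it is reported as a leaked reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_OBJS 16

struct toy_obj {
	unsigned int id;	/* non-zero identity of the object */
	unsigned int parent_id;	/* id of the parent, 0 if none */
	bool referenced;	/* id is present in the toy reference array */
	bool valid;		/* cleared once the object is invalidated */
};

/*
 * Invalidate the object(s) with root_id and every descendant reachable
 * through parent_id links, without recursion. Returns the number of
 * referenced descendants met on the way (the would-be leaks).
 */
static int toy_release(struct toy_obj *objs, size_t n, unsigned int root_id)
{
	unsigned int stack[TOY_MAX_OBJS];
	size_t top = 0;
	int leaked = 0;

	assert(n <= TOY_MAX_OBJS);
	stack[top++] = root_id;
	while (top > 0) {
		unsigned int cur = stack[--top];

		for (size_t i = 0; i < n; i++) {
			if (!objs[i].valid)
				continue;
			if (objs[i].id == cur) {
				/* The object being released itself. */
				objs[i].valid = false;
			} else if (objs[i].parent_id == cur) {
				/* A child: invalidate and visit later. */
				if (objs[i].referenced)
					leaked++;
				objs[i].valid = false;
				stack[top++] = objs[i].id;
			}
		}
	}
	return leaked;
}
```

Each object is pushed at most once because it is marked invalid before being pushed, so the explicit stack never holds more than n entries. Modelling a file (id 1), an intermediate reference (id 2, parent 1), two dynptrs (ids 3 and 4, parent 2), and a slice (id 5, parent 3), releasing id 2 invalidates everything except the file, while releasing id 1 first counts the intermediate reference as leaked.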
2026-06-01bpf: Preserve reg->id of pointer objects after null-checkAmery Hung
Preserve reg->id of pointer objects after null-checking the register so that children objects derived from it can still refer to it in the new object relationship tracking mechanism introduced in a later patch.

This change incurs a slight increase in the number of states in one selftest bpf object, rbtree_search.bpf.o. For Meta bpf objects, the increase of states is also negligible.

Selftest BPF objects with insns_diff > 0

Program                   Insns (A)  Insns (B)  Insns (DIFF)     States (A)  States (B)  States (DIFF)
------------------------  ---------  ---------  ---------------  ----------  ----------  -------------
rbtree_search             6820       7326       +506 (+7.42%)    379         398         +19 (+5.01%)

Meta BPF objects with insns_diff > 0

Program                   Insns (A)  Insns (B)  Insns (DIFF)     States (A)  States (B)  States (DIFF)
------------------------  ---------  ---------  ---------------  ----------  ----------  -------------
ned_imex_be_tclass        52         57         +5 (+9.62%)      5           6           +1 (+20.00%)
ned_imex_be_tclass        52         57         +5 (+9.62%)      5           6           +1 (+20.00%)
ned_skop_auto_flowlabel   523        526        +3 (+0.57%)      39          40          +1 (+2.56%)
ned_skop_mss              289        292        +3 (+1.04%)      20          20          +0 (+0.00%)
ned_skopt_bet_classifier  78         82         +4 (+5.13%)      8           8           +0 (+0.00%)
dctcp_update_alpha        252        320        +68 (+26.98%)    21          27          +6 (+28.57%)
dctcp_update_alpha        252        320        +68 (+26.98%)    21          27          +6 (+28.57%)
ned_ts_func               119        126        +7 (+5.88%)      6           7           +1 (+16.67%)
tw_egress                 1119       1128       +9 (+0.80%)      95          96          +1 (+1.05%)
tw_ingress                1128       1137       +9 (+0.80%)      95          96          +1 (+1.05%)
tw_tproxy_router          4380       4465       +85 (+1.94%)     114         118         +4 (+3.51%)
tw_tproxy_router4         3093       3170       +77 (+2.49%)     83          88          +5 (+6.02%)
ttls_tc_ingress           34656      35717      +1061 (+3.06%)   936         970         +34 (+3.63%)
tw_twfw_egress            222327     222338     +11 (+0.00%)     10563       10564       +1 (+0.01%)
tw_twfw_ingress           78295      78299      +4 (+0.01%)      3825        3826        +1 (+0.03%)
tw_twfw_tc_eg             222839     222859     +20 (+0.01%)     10584       10585       +1 (+0.01%)
tw_twfw_tc_in             78295      78299      +4 (+0.01%)      3825        3826        +1 (+0.03%)
tw_twfw_egress            8080       8085       +5 (+0.06%)      456         456         +0 (+0.00%)
tw_twfw_ingress           8053       8056       +3 (+0.04%)      454         454         +0 (+0.00%)
tw_twfw_tc_eg             8154       8174       +20 (+0.25%)     456         457         +1 (+0.22%)
tw_twfw_tc_in             8060       8063       +3 (+0.04%)      455         455         +0 (+0.00%)
tw_twfw_egress            222327     222338     +11 (+0.00%)     10563       10564       +1 (+0.01%)
tw_twfw_ingress           78295      78299      +4 (+0.01%)      3825        3826        +1 (+0.03%)
tw_twfw_tc_eg             222839     222859     +20 (+0.01%)     10584       10585       +1 (+0.01%)
tw_twfw_tc_in             78295      78299      +4 (+0.01%)      3825        3826        +1 (+0.03%)
tw_twfw_egress            8080       8085       +5 (+0.06%)      456         456         +0 (+0.00%)
tw_twfw_ingress           8053       8056       +3 (+0.04%)      454         454         +0 (+0.00%)
tw_twfw_tc_eg             8154       8174       +20 (+0.25%)     456         457         +1 (+0.22%)
tw_twfw_tc_in             8060       8063       +3 (+0.04%)      455         455         +0 (+0.00%)

Looking into rbtree_search, the reason for such increase is that the verifier has to explore the main loop shown below for one more iteration until state pruning decides the current state is safe.

    long rbtree_search(void *ctx)
    {
            ...
            bpf_spin_lock(&glock0);
            rb_n = bpf_rbtree_root(&groot0);
            while (can_loop) {
                    if (!rb_n) {
                            bpf_spin_unlock(&glock0);
                            return __LINE__;
                    }
                    n = rb_entry(rb_n, struct node_data, r0);
                    if (lookup_key == n->key0)
                            break;
                    if (nr_gc < NR_NODES)
                            gc_ns[nr_gc++] = rb_n;
                    if (lookup_key < n->key0)
                            rb_n = bpf_rbtree_left(&groot0, rb_n);
                    else
                            rb_n = bpf_rbtree_right(&groot0, rb_n);
            }
            ...
    }

Below is what the verifier sees at the start of each iteration (65: may_goto) after preserving id of rb_n. Without id of rb_n, the verifier stops exploring the loop at iter 16.

             rb_n  gc_ns[15]
    iter 15  257   257
    iter 16  290   257        rb_n: idmap add 257->290
                              gc_ns[15]: check 257 != 290 --> state not equal
    iter 17  325   257        rb_n: idmap add 290->325
                              gc_ns[15]: idmap add 257->257 --> state safe

Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-5-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
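The iteration 16 versus 17 outcome in the message can be replayed with a simplified id-mapping check. This is a toy, not the verifier's implementation: `toy_check_ids()` and `toy_states_equal()` are hypothetical names, and the only rule modelled is that ids in the old and current state must be related by a consistent one-to-one mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IDMAP_SIZE 8

struct toy_idpair {
	unsigned int old;
	unsigned int cur;
};

struct toy_idmap {
	struct toy_idpair map[TOY_IDMAP_SIZE];
	size_t cnt;
};

/*
 * Record or verify that old_id in the cached state corresponds to cur_id
 * in the current state. Fails if either id is already mapped differently.
 */
static bool toy_check_ids(struct toy_idmap *m, unsigned int old_id,
			  unsigned int cur_id)
{
	for (size_t i = 0; i < m->cnt; i++) {
		if (m->map[i].old == old_id)
			return m->map[i].cur == cur_id;
		if (m->map[i].cur == cur_id)
			return false;
	}
	assert(m->cnt < TOY_IDMAP_SIZE);
	m->map[m->cnt].old = old_id;
	m->map[m->cnt].cur = cur_id;
	m->cnt++;
	return true;
}

/* Compare the (rb_n, gc_ns[15]) ids of a cached state and a current one. */
static bool toy_states_equal(unsigned int old_rb_n, unsigned int old_gc,
			     unsigned int cur_rb_n, unsigned int cur_gc)
{
	struct toy_idmap m = { .cnt = 0 };

	return toy_check_ids(&m, old_rb_n, cur_rb_n) &&
	       toy_check_ids(&m, old_gc, cur_gc);
}
```

Comparing the iteration 15 ids (257, 257) with iteration 16 (290, 257) fails because 257 is first mapped to 290 and then asked to map to 257. Comparing iteration 16 (290, 257) with iteration 17 (325, 257) succeeds because the two old ids are distinct and map to distinct current ids.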
2026-06-01bpf: Assign reg->id when getting referenced kptr from ctxAmery Hung
Assign reg->id when getting referenced kptr from read program context to be consistent with R0 of KF_ACQUIRE kfunc. skb dynptr will track the referenced skb in qdisc programs using a new field reg->parent_id in a later patch. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260529014936.2811085-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>