summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
2026-03-25wifi: nl80211: add support for NAN stationsMiri Korenblit
There are 2 types of logical links with a NAN peer: - management (NMI), which is used for Tx/Rx of NAN management frames. - data (NDI), which is used for Tx/Rx of data frames, or non-NAN management frames. The NMI station has two roles: - representation of the NAN peer - for example, the peer's schedule and the HT, VHT, HE capabilities - belong to the NMI station, and not to the NDI ones. - Tx/Rx of NAN management frames to/from the peer. The NDI station is used for Tx/Rx data frames of a specific NDP that was established with the NAN peer. Note that a peer can choose to reuse its NMI address as the NDI address. In that case, it is expected that two stations will be added even though they will have the same address. - An NDI station can only be added after the corresponding NMI station was configured with capabilities. - All the NDI stations will be removed before the NDI interface is brought down. - All NMI stations will be removed before NAN is stopped. - Before NMI sta removal, all corresponding NDI stations will be removed Add support for adding, removing, and changing NMI and NDI stations. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.d280936ee832.I6d859eee759bb5824a9ffd2984410faf879ba00e@changeid Link: https://patch.msgid.link/20260318123926.206536-6-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-25wifi: cfg80211: separately store HT, VHT and HE capabilities for NANMiri Korenblit
In NAN, unlike in other modes, there is only one set of (HT, VHT, HE) capabilities that is used for all channels (and bands) used in the NAN data path. This set of capabilities will have to be a special one, for example - have the minimum of (HT-for-5 GHz, HT-for-2.4 GHz), careful handling of the bits that have a different meaning for each band, etc. While we could use the exiting sband/iftype capabilities, and require identical capabilities for all bands (makes no sense since this means that we will have VHT capabilities in the 2.4 GHz slot), or require that only one of the sbands will be set, or have logic to extract the minimum and handle the conflicting bits - it seems simpler to add a dedicated set of capabilities which is special for NAN, and is band agnostic, to be populated by the driver. That way we also let the driver decide how it wants to handle the conflicting bits. Add this special set of these capabilities to wiphy:nan_capabilities, to be populated by the driver. Send it to user space. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.4b6f3e4a81b4.I45422adc0df3ad4101d857a92e83f0de5cf241e1@changeid Link: https://patch.msgid.link/20260318123926.206536-5-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-25wifi: cfg80211: add support for NAN data interfaceMiri Korenblit
This new interface type represents a NAN data interface (NDI). It is used for data communication with NAN peers. Note that the existing NL80211_IFTYPE_NAN interface, which is the NAN Management Interface (NMI), is used for management communication. An NDI interface is started when a new NAN data path is about to be established, and is stopped after the NAN data path is terminated. - An NDI interface can only be started if the NMI is running, and NAN is started. - Before the NMI is stopped, the NDI interfaces will be stopped. Add the new interface type, handle add/remove operations for it, and makes sure of the conditions above. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.0d681335c2e2.I92973483e927820ae2297853c141842fdb262747@changeid Link: https://patch.msgid.link/20260318123926.206536-4-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-25wifi: cfg80211: Add an API to configure local NAN scheduleMiri Korenblit
Add an nl80211 API to allow user space to configure the local NAN schedule. The local schedule consists of a list of channel definitions and a schedule map, in which each element covers a time slot and indicates on what channel the device should be in that time slot. Channels can be added to schedule even without being scheduled, for reservation purposes. A schedule can be configured either immedietally or be deferred, in case there are already connected peers. When the deferred flag is set, the command is a request from the device to perform an announced schedule update: send the updated NAN Availability - as set in this command - to the peers, and do the actual switch to the new schedule on the right time (i.e. at the end of the slot after the slot in which the update was sent to the peers). In addition, a notification will be sent to indicate a deferred update completion. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.ecca178a2de0.Ic977ab08b4ed5cf9b849e55d3a59b01ad3fbd08e@changeid Link: https://patch.msgid.link/20260318123926.206536-2-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-25accel/amdxdna: Add per-process BO memory usage query supportMax Zhen
Add support for querying per-process buffer object (BO) memory usage through the amdxdna GET_ARRAY UAPI. Introduce a new query type, DRM_AMDXDNA_BO_USAGE, along with struct amdxdna_drm_bo_usage to report BO memory usage statistics, including heap, total, and internal usage. Track BO memory usage on a per-client basis by maintaining counters in GEM open/close and heap allocation/free paths. This ensures the reported statistics reflect the current memory footprint of each process. Wire the new query into the GET_ARRAY implementation to expose the usage information to userspace. Link: https://github.com/amd/xdna-driver/commit/0546f2aaadbdacf1c3556410ecd71622044cd916 Signed-off-by: Max Zhen <max.zhen@amd.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260324163159.2425461-1-lizhi.hou@amd.com
2026-03-24Merge commit 'f35dbac6942171dc4ce9398d1d216a59224590a9' into ↵Steven Rostedt
trace/ring-buffer/core The commit f35dbac69421 ("ring-buffer: Fix to update per-subbuf entries of persistent ring buffer") was a fix and merged upstream. It is needed for some other work in the ring buffer. The current branch has the remote buffer code that is shared with the Arm64 subsystem and can't be rebased. Merge in the upstream commit to allow continuing of the ring buffer work. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-24module: Move 'struct module_signature' to UAPIThomas Weißschuh
This structure definition is used outside the kernel proper. For example in kmod and the kernel build environment. To allow reuse, move it to a new UAPI header. While it is not a true UAPI, it is a common practice to have non-UAPI interface definitions in the kernel's UAPI headers. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Nicolas Schier <nsc@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2026-03-24tracing: trace_mmap.h: fix a kernel-doc warningRandy Dunlap
Add a description of struct reader to resolve a kernel-doc warning: Warning: include/uapi/linux/trace_mmap.h:43 struct member 'reader' not described in 'trace_buffer_meta' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-24drm/panthor: extend timestamp query with flagsMarcin Slusarz
Flags now control which data user space wants to query, there is more information sources, and there's ability to query duration of multiple timestamp reads. New sources: - CPU's monotonic, - CPU's monotonic raw, - GPU's cycle count These changes should make the implementation of VK_KHR_calibrated_timestamps more accurate and much simpler. Signed-off-by: Marcin Slusarz <marcin.slusarz@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patch.msgid.link/20260324132557.1707286-1-marcin.slusarz@arm.com
2026-03-24iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirementNicolin Chen
>From hardware implementation perspective, a guest tegra241-cmdqv hardware is different than the host hardware: - Host HW is backed by a VINTF (HYP_OWN=1) - Guest HW is backed by a VINTF (HYP_OWN=0) The kernel driver has an implementation requirement of the HYP_OWN bit in the VM. So, VMM must follow that to allow the same copy of Linux to work. Add this requirement to the uAPI, which is currently missing. Fixes: 4dc0d12474f9 ("iommu/tegra241-cmdqv: Add user-space use support") Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-23pidfds: add coredump_code field to pidfd_infoEmanuele Rocca
The struct pidfd_info currently exposes in a field called coredump_signal the signal number (si_signo) that triggered the dump (for example, 11 for SIGSEGV). However, it is also valuable to understand the reason why that signal was sent. This additional context is provided by the signal code (si_code), such as 2 for SEGV_ACCERR. Add a new field to struct pidfd_info called coredump_code with the value of si_code for the benefit of sysadmins who pipe core dumps to user-space programs for later analysis. The following snippet illustrates a simplified C program that consumes coredump_signal and coredump_code, and then logs core dump signals and codes to a file: int pidfd = (int)atoi(argv[1]); struct pidfd_info info = { .mask = PIDFD_INFO_EXIT | PIDFD_INFO_COREDUMP, }; if (ioctl(pidfd, PIDFD_GET_INFO, &info) == 0) if (info.mask & PIDFD_INFO_COREDUMP) fprintf(f, "PID=%d, si_signo: %d si_code: %d\n", info.pid, info.coredump_signal, info.coredump_code); Assuming the program is installed under /usr/local/bin/core-logger, core dump processing can be enabled by setting /proc/sys/kernel/core_pattern to '|/usr/local/bin/dumpstuff %F'. systemd-coredump(8) already uses pidfds to process core dumps, and it could be extended to include the values of coredump_code too. Signed-off-by: Emanuele Rocca <emanuele.rocca@arm.com> Link: https://patch.msgid.link/acE52HIFivNZN3nE@NH27D9T0LF Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-23drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimizationTejas Upadhyay
When set, starting xe3p_lpg, the L2 flush optimization feature will control whether L2 is in Persistent or Transient mode through monitoring of media activity. To enable L2 flush optimization include new feature flag GUC_CTL_ENABLE_L2FLUSH_OPT for Novalake platforms when media type is detected. Tighten UAPI validation to restrict userptr, svm and dmabuf mappings to be either 2WAY or XA+1WAY V5(Thomas): logic correction V4(MattA): Modify uapi doc and commit V3(MattA): check valid op and pat_index value V2(MattA): validate dma-buf bos and madvise pat-index Acked-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: Carl Zhang <carl.zhang@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260305121902.1892593-9-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2026-03-23Merge 7.0-rc5 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-23Merge tag 'drm-misc-next-2026-03-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v7.1: UAPI Changes: math: - provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI mode: - provide DRM_ARGB_GET*() macros for reading color components Cross-subsystem Changes: math: - implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST() Core Changes: atomic: - fix handling of colorop state in atomic updates - provide CRTC background color ttm: - improve tests and doumentation Driver Changes: amdxdna: - allow forcing DMA through IOMMU IOVA - improve debugging bridge: - Support Lontium LT8713SX DP MST bridge plus DT bindings imx: - support planes behind the primary plane - fix bus-format selection ivpu: - perform engine reset on TDR error panel: - novatek-nt36672a: Use mipi_dsi_*_multi() functions - panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW MNF307QS3-2 renesas: - rz-du: clean up rockchip: - support CRTC background color sun4i: - fix leak in init code - clean up tildc - clean up v3d: - improve handling of struct v3d_stats - improve error handling - clean up vkms: - support CRTC background color Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260320082604.GA17867@linux.fritz.box
2026-03-21um: time-travel: clean up kernel-doc warningsRandy Dunlap
Repair all kernel-doc warnings in um_timetravel.h: - add one enum description - mark "reserve" as private - use a leading '@' on current_time Warning: include/uapi/linux/um_timetravel.h:59 Enum value 'UM_TIMETRAVEL_SHARED_MAX_FDS' not described in enum 'um_timetravel_shared_mem_fds' Warning: include/uapi/linux/um_timetravel.h:245 union member 'reserve' not described in 'um_timetravel_schedshm_client' Warning: include/uapi/linux/um_timetravel.h:288 struct member 'current_time' not described in 'um_timetravel_schedshm' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260226221112.1042008-1-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-20accel/amdxdna: Refactor GEM BO handling and add helper APIs for address ↵Max Zhen
retrieval Refactor amdxdna GEM buffer object (BO) handling to simplify address management and unify BO type semantics. Introduce helper APIs to retrieve commonly used BO addresses: - User virtual address (UVA) - Kernel virtual address (KVA) - Device address (IOVA/PA) These helpers centralize address lookup logic and avoid duplicating BO-specific handling across submission and execution paths. This also improves readability and reduces the risk of inconsistent address handling in future changes. As part of the refactor: - Rename SHMEM BO type to SHARE to better reflect its usage. - Merge CMD BO handling into SHARE, removing special-case logic for command buffers. - Consolidate BO type handling paths to reduce code duplication and simplify maintenance. No functional change is intended. The refactor prepares the driver for future enhancements by providing a cleaner abstraction for BO address management. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Max Zhen <max.zhen@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260320210615.1973016-1-lizhi.hou@amd.com
2026-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc5). net/netfilter/nft_set_rbtree.c 598adea720b97 ("netfilter: revert nft_set_rbtree: validate open interval overlap") 3aea466a43998 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-19vfio: Define uAPI for re-init initial bytes during the PRE_COPY phaseYishai Hadas
As currently defined, initial_bytes is monotonically decreasing and precedes dirty_bytes when reading from the saving file descriptor. The transition from initial_bytes to dirty_bytes is unidirectional and irreversible. The initial_bytes are considered as critical data that is highly recommended to be transferred to the target as part of PRE_COPY, without this data, the PRE_COPY phase would be ineffective. We come to solve the case when a new chunk of critical data is introduced during the PRE_COPY phase and the driver would like to report an entirely new value for the initial_bytes. For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new initial_bytes value during the PRE_COPY phase. Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't assign info.flags before copy_to_user(), this effectively echoes userspace-provided flags back as output, preventing the field from being used to report new reliable data from the drivers. Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace to explicitly opt in by enabling the VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 device feature. When the caller opts in, the driver may report an entirely new value for initial_bytes. It may be larger, it may be smaller, it may include the previous unread initial_bytes, it may discard the previous unread initial_bytes, up to the driver logic and state. The presence of the VFIO_PRECOPY_INFO_REINIT output flag set by the driver indicates that new initial data is present on the stream. Once the caller sees this flag, the initial_bytes value should be re-evaluated relative to the readiness state for transition to STOP_COPY. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio: uapi: fix comment typoJoseph Salisbury
The file contains a spelling error in a source comment (succes). Typos in comments reduce readability and make text searches less reliable for developers and maintainers. Replace 'succes' with 'success' in the affected comment. This is a comment-only cleanup and does not change behavior. Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com> Link: https://lore.kernel.org/r/20260316185617.166414-1-joseph.salisbury@oracle.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19KVM: arm64: gic-v5: Add ARM_VGIC_V5 device to KVM headersSascha Bischoff
This is the base GICv5 device which is to be used with the KVM_CREATE_DEVICE ioctl to create a GICv5-based vgic. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-9-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19bsg: add bsg_uring_cmd uapi structureYang Xiuwei
Add the bsg_uring_cmd structure to the BSG UAPI header to support io_uring-based SCSI passthrough operations via IORING_OP_URING_CMD. Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260317072226.2598233-2-yangxiuwei@kylinos.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-19Merge tag 'wireless-next-2026-03-19' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Aside from various small improvements/cleanups, not much: - cfg80211/mac80211: S1G and UHR improvements - hwsim: incumbent signal report test support * tag 'wireless-next-2026-03-19' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (31 commits) qtnfmac: use alloc_netdev macro for single queue devices wifi: libertas: don't kill URBs in interrupt context wifi: libertas: use USB anchors for tracking in-flight URBs wifi: nl80211: use int for band coming from netlink wifi: rsi_91x_usb: do not pause rfkill polling when stopping mac80211 wifi: mac80211: fix STA link removal during link removal wifi: nl80211: reject S1G/60G with HT chantype wifi: ieee80211: fix definition of EHT-MCS 15 in MRU wifi: cfg80211: check non-S1G width with S1G chandef wifi: cfg80211: restrict cfg80211_chandef_create() to only HT-based bands wifi: mac80211: don't use cfg80211_chandef_create() for default chandef wifi: mac80211: Remove deleted sta links in ieee80211_ml_reconf_work() wifi: b43: use register definitions in nphy_op_software_rfkill wifi: cfg80211: split control freq check from chandef check wifi: mac80211: always use full chanctx compatible check wifi: mac80211: refactor chandef tracing macros wifi: mac80211: validate HE 6 GHz operation when EHT is used wifi: nl80211: split out UHR operation information wifi: mwifiex: drop redundant device reference wifi: rt2x00: drop redundant device reference ... ==================== Link: https://patch.msgid.link/20260319082439.79875-3-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-19Merge tag 'ovpn-net-next-20260317' of https://github.com/OpenVPN/ovpn-net-nextPaolo Abeni
Antonio Quartulli says: ==================== Included features: * use bitops.h API when possible * send netlink notification in case of client float event * implement support for asymmetric peer IDs * consolidate memory allocations during crypto operations * add netlink notification check in selftests * add FW mark check in selftest * tag 'ovpn-net-next-20260317' of https://github.com/OpenVPN/ovpn-net-next: ovpn: consolidate crypto allocations in one chunk selftests: ovpn: add test for the FW mark feature selftests: ovpn: check asymmetric peer-id ovpn: add support for asymmetric peer IDs selftests: ovpn: add notification parsing and matching ovpn: notify userspace on client float event ovpn: pktid: use bitops.h API ovpn: use correct array size to parse nested attributes in ovpn_nl_key_swap_doit selftests: ovpn: allow compiling ovpn-cli.c with mbedtls3 ==================== Link: https://patch.msgid.link/20260317104023.192548-1-antonio@openvpn.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-19Merge drm/drm-next into drm-xe-nextThomas Hellström
Bring in series "drm/{i915,xe}: sort out step enums between the drivers" that was merged through i915. Link: https://lore.kernel.org/all/cover.1772635152.git.jani.nikula@intel.com Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-03-18net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECSHaiyang Zhang
Add two parameters for drivers supporting Rx CQE coalescing / descriptor writeback. ETHTOOL_A_COALESCE_RX_CQE_FRAMES: Maximum number of frames that can be coalesced into a CQE or writeback. ETHTOOL_A_COALESCE_RX_CQE_NSECS: Max time in nanoseconds after the first packet arrival in a coalesced CQE or writeback to be sent. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260317191826.1346111-2-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-18drm: Add CRTC background color propertyCristian Ciocaltea
Some display controllers can be hardware programmed to show non-black colors for pixels that are either not covered by any plane or are exposed through transparent regions of higher planes. This feature can help reduce memory bandwidth usage, e.g. in compositors managing a UI with a solid background color while using smaller planes to render the remaining content. To support this capability, introduce the BACKGROUND_COLOR standard DRM mode property, which can be attached to a CRTC through the drm_crtc_attach_background_color_property() helper function. Additionally, define a 64-bit ARGB format value to be built with the help of a couple of dedicated DRM_ARGB64_PREP*() helpers. Individual color components can be extracted with desired precision using the corresponding DRM_ARGB64_GET*() macros. Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Diederik de Haas <diederik@cknow-tech.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260303-rk3588-bgcolor-v8-2-fee377037ad1@collabora.com Signed-off-by: Daniel Stone <daniels@collabora.com>
2026-03-18uapi: Provide DIV_ROUND_CLOSEST()Cristian Ciocaltea
Currently DIV_ROUND_CLOSEST() is only available for the kernel via include/linux/math.h. Expose it to userland as well by adding __KERNEL_DIV_ROUND_CLOSEST() as a common definition in uapi. Additionally, ensure it allows building ISO C applications by switching from the 'typeof' GNU extension to the ISO-friendly __typeof__. Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Diederik de Haas <diederik@cknow-tech.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260303-rk3588-bgcolor-v8-1-fee377037ad1@collabora.com Signed-off-by: Daniel Stone <daniels@collabora.com>
2026-03-17iommufd: Report ATS not supported status via IOMMU_GET_HW_INFOShameer Kolothum
If the IOMMU driver reports that ATS is not supported for a device, set the IOMMU_HW_CAP_PCI_ATS_NOT_SUPPORTED flag in the returned hardware capabilities. This uses a negative flag for UAPI compatibility. Existing userspace assumes ATS is supported if no flag is present. This also ensures that new userspace works correctly on both old and new kernels, where a zero value implies ATS support. When this flag is set, ATS cannot be used for the device. When it is clear, ATS may be enabled when an appropriate HWPT is attached. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17RDMA/efa: Rename alloc_ucontext comp_mask to supported_capsMichael Margolin
Following discussion [1], rename the comp_mask field in efa_ibv_alloc_ucontext_cmd to supported_caps to reflect its actual usage as a capabilities handshake mechanism rather than a standard comp_mask. Rename related constants and align function and macro names. [1] https://lore.kernel.org/linux-rdma/20260312120858.GH1448102@nvidia.com/ Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20260316180846.30273-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17ovpn: add support for asymmetric peer IDsRalf Lici
In order to support the multipeer architecture, upon connection setup each side of a tunnel advertises a unique ID that the other side must include in packets sent to them. Therefore when transmitting a packet, a peer inserts the recipient's advertised ID for that specific tunnel into the peer ID field. When receiving a packet, a peer expects to find its own unique receive ID for that specific tunnel in the peer ID field. Add support for the TX peer ID and embed it into transmitting packets. If no TX peer ID is specified, fallback to using the same peer ID both for RX and TX in order to be compatible with the non-multipeer compliant peers. Cc: horms@kernel.org Cc: donald.hunter@gmail.com Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2026-03-17ovpn: notify userspace on client float eventRalf Lici
Send a netlink notification when a client updates its remote UDP endpoint. The notification includes the new IP address, port, and scope ID (for IPv6). Cc: linux-kselftest@vger.kernel.org Cc: horms@kernel.org Cc: shuah@kernel.org Cc: donald.hunter@gmail.com Signed-off-by: Ralf Lici <ralf@mandelbit.com> Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
2026-03-17Merge tag 'v7.0-rc4' into sched/core, to pick up scheduler fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2026-03-16Merge tag 'drm-xe-next-2026-03-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - add VM_BIND DECOMPRESS support and on-demand decompression (Nitin) - Allow per queue programming of COMMON_SLICE_CHICKEN3 bit13 (Lionel) Cross-subsystem Changes: - Introduce the DRM RAS infrastructure over generic netlink (Riana, Rodrigo) Core Changes: - Two-pass MMU interval notifiers (Thomas) Driver Changes: - Merge drm/drm-next into drm-xe-next (Brost) - Fix overflow in guc_ct_snapshot_capture (Mika, Fixes) - Extract gt_pta_entry (Gustavo) - Extra enabling patches for NVL-P (Gustavo) - Add Wa_14026578760 (Varun) - Add type-specific GT loop iterator (Roper) - Refactor xe_migrate_prepare_vm (Raag) - Don't disable GuCRC in suspend path (Vinay, Fixes) - Add missing kernel docs in xe_exec_queue.c (Niranjana) - Change TEST_VRAM to work with 32-bit resource_size_t (Wajdeczko) - Fix memory leak in xe_vm_madvise_ioctl (Varun, Fixes) - Skip access counter queue init for unsupported platforms (Himal) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/abLUVfSHu8EHRF9q@lstrano-desk.jf.intel.com
2026-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones. ARM: - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start EOIcount deactivation walk *after* the last irq that made it into an LR - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only thhis isn't possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context - Address yet another syzkaller finding in the vgic initialisation, where we would end-up destroying an uninitialised vgic with nasty consequences - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults PPC: - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart RISC-V: - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access, float register access, and PMU counter access - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr() - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei() - Fix off-by-one array access in SBI PMU - Skip THP support check during dirty logging - Fix error code returned for Smstateen and Ssaia ONE_REG interface - Check host Ssaia extension when creating AIA irqchip x86: - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls - Fix a brown paper bug in add_atomic_switch_msr() - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level) - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors who want to freeze PMCs on VM-Entry - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks Generic: - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive Selftests: - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8 Documentation: kvm: fix formatting of the quirks table KVM: x86: clarify leave_smm() return value selftests: kvm: add a test that VMX validates controls on RSM selftests: kvm: extract common functionality out of smm_test.c KVM: SVM: check validity of VMCB controls when returning from SMM KVM: VMX: check validity of VMCS controls when returning from SMM KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers() KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr() KVM: x86: hyper-v: Validate all GVAs during PV TLB flush KVM: x86: synthesize CPUID bits only if CPU capability is set KVM: PPC: e500: Rip out "struct tlbe_ref" KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type KVM: selftests: Increase 'maxnode' for guest_memfd tests KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2() ...
2026-03-14devlink: support index-based lookup via bus_name/dev_name handleJiri Pirko
Devlink instances without a backing device use bus_name "devlink_index" and dev_name set to the decimal index string. When user space sends this handle, detect the pattern and perform a direct xarray lookup by index instead of iterating all instances. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-6-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14devlink: expose devlink instance index over netlinkJiri Pirko
Each devlink instance has an internally assigned index used for xarray storage. Expose it as a new DEVLINK_ATTR_INDEX uint attribute alongside the existing bus_name and dev_name handle. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14include/psp-sev.h: fix structure member in commentTycho Andersen (AMD)
The member is 'data', not 'opaque'. Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-03-13udp: Remove UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV.Kuniyuki Iwashima
UDP-Lite supports variable-length checksum and has two socket options, UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV, to control the checksum coverage. Let's remove the support. setsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was only available for UDP-Lite and returned -ENOPROTOOPT for UDP. Now, the options are handled in ip_setsockopt() and ipv6_setsockopt(), which still return the same error. getsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was available for UDP and always returned 0, meaning full checksum, but now -ENOPROTOOPT is returned. Given that getsockopt() is meaningless for UDP and even the options are not defined under include/uapi/, this should not be a problem. $ man 7 udplite ... BUGS Where glibc support is missing, the following definitions are needed: #define IPPROTO_UDPLITE 136 #define UDPLITE_SEND_CSCOV 10 #define UDPLITE_RECV_CSCOV 11 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-13drm/xe/uapi: Fix kernel-doc for DRM_XE_VM_BIND_FLAG_DECOMPRESSNitin Gote
There is kernel-doc warning for DRM_XE_VM_BIND_FLAG_DECOMPRESS: ./include/uapi/drm/xe_drm.h:1060: WARNING: Block quote ends without a blank line; unexpected unindent. Fix the warning by adding the missing '%' prefix to DRM_XE_VM_BIND_FLAG_DECOMPRESS in the kernel-doc list entry for struct drm_xe_vm_bind_op. Fixes: 2270bd7124f4 ("drm/xe: add VM_BIND DECOMPRESS uapi flag") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603121515.gEMrFlTL-lkp@intel.com/ Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260312160244.809849-2-nitin.r.gote@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2026-03-13wifi: nl80211: split out UHR operation informationJohannes Berg
The beacon doesn't contain the full UHR operation, a number of fields (such as NPCA) are only partially there. Add a new attribute to contain the full information, so it's available to the driver/mac80211. Link: https://patch.msgid.link/20260303221710.866bacf82639.Iafdf37fb0f4304bdcdb824977d61e17b38c47685@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc4). drivers/net/ethernet/mellanox/mlx5/core/en_rx.c db25c42c2e1f9 ("net/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQ") dff1c3164a692 ("net/mlx5e: SHAMPO, Always calculate page size") https://lore.kernel.org/aa7ORohmf67EKihj@sirena.org.uk drivers/net/ethernet/ti/am65-cpsw-nuss.c 840c9d13cb1ca ("net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support") a23c657e332f2 ("net: ethernet: ti: am65-cpsw: Use also port number to identify timestamps") https://lore.kernel.org/abK3EkIXuVgMyGI7@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-12KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAsDavid Woodhouse
Commit 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members") broke the userspace API for C++. These structures ending in VLAs are typically a *header*, which can be followed by an arbitrary number of entries. Userspace typically creates a larger structure with some non-zero number of entries, for example in QEMU's kvm_arch_get_supported_msr_feature(): struct { struct kvm_msrs info; struct kvm_msr_entry entries[1]; } msr_data = {}; While that works in C, it fails in C++ with an error like: flexible array member 'kvm_msrs::entries' not at end of 'struct msr_data' Fix this by using __DECLARE_FLEX_ARRAY() for the VLA, which uses [0] for C++ compilation. Fixes: 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members") Cc: stable@vger.kernel.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://patch.msgid.link/3abaf6aefd6e5efeff3b860ac38421d9dec908db.camel@infradead.org [sean: tag for stable@] Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-12Merge drm/drm-next into drm-xe-nextMatthew Brost
Backmerging to bring in 7.00-rc3. Important ahead GPU SVM merging THP support. Signed-off-by: Matthew Brost <matthew.brost@intel.com>
2026-03-12vt: add KT_CSI keysym type for modifier-aware CSI sequencesNicolas Pitre
Add a new keysym type KT_CSI that generates CSI tilde sequences with automatic modifier encoding. The keysym value encodes the CSI parameter number, producing sequences like ESC [ <value> ~ or ESC [ <value> ; <mod> ~ when Shift, Alt, or Ctrl modifiers are held. This allows navigation keys (Home, End, Insert, Delete, PgUp, PgDn) and function keys to generate modifier-aware escape sequences without consuming string table entries for each modifier combination. Define key symbols for navigation keys (K_CSI_HOME, K_CSI_END, etc.) and function keys (K_CSI_F1 through K_CSI_F20) using standard xterm CSI parameter values. The modifier encoding follows the xterm convention: mod = 1 + (shift ? 1 : 0) + (alt ? 2 : 0) + (ctrl ? 4 : 0) Allowed CSI parameter values range from 0 to 99. Note: The Linux console historically uses a non-standard double-bracket format for F1-F5 (ESC [ [ A through ESC [ [ E) rather than the xterm tilde format (ESC [ 11 ~ through ESC [ 15 ~). The K_CSI_F1 through K_CSI_F5 definitions use the xterm format. Converting F1-F5 to KT_CSI would require updating the "linux" terminfo entry to match. Navigation keys and F6-F20 already use the tilde format and are fully compatible. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Link: https://patch.msgid.link/20260203045457.1049793-3-nico@fluxnic.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12namespace: allow creating empty mount namespacesChristian Brauner
Add support for creating a mount namespace that contains only a copy of the root mount from the caller's mount namespace, with none of the child mounts. This is useful for containers and sandboxes that want to start with a minimal mount table and populate it from scratch rather than inheriting and then tearing down the full mount tree. Two new flags are introduced: - CLONE_EMPTY_MNTNS for clone3(), using the 64-bit flag space. - UNSHARE_EMPTY_MNTNS for unshare(), reusing the CLONE_PARENT_SETTID bit which has no meaning for unshare. Both flags imply CLONE_NEWNS. For the unshare path, UNSHARE_EMPTY_MNTNS is converted to CLONE_EMPTY_MNTNS in unshare_nsproxy_namespaces() before it reaches copy_mnt_ns(), so the mount namespace code only needs to handle a single flag. In copy_mnt_ns(), when CLONE_EMPTY_MNTNS is set, clone_mnt() is used instead of copy_tree() to clone only the root mount. The caller's root and working directory are both reset to the root dentry of the new mount. The cleanup variables are changed from vfsmount pointers with __free(mntput) to struct path with __free(path_put) because the empty mount namespace path needs to release both mount and dentry references when replacing the caller's root and pwd. In the normal (non-empty) path only the mount component is set, and dput(NULL) is a no-op so path_put remains correct there as well. Link: https://patch.msgid.link/20260306-work-empty-mntns-consolidated-v1-1-6eb30529bbb0@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-12mount: add FSMOUNT_NAMESPACEChristian Brauner
Add FSMOUNT_NAMESPACE flag to fsmount() that creates a new mount namespace with the newly created filesystem attached to a copy of the real rootfs. This returns a namespace file descriptor instead of an O_PATH mount fd, similar to how OPEN_TREE_NAMESPACE works for open_tree(). This allows creating a new filesystem and immediately placing it in a new mount namespace in a single operation, which is useful for container runtimes and other namespace-based isolation mechanisms. The rootfs mount is created before copying the real rootfs for the new namespace meaning that the mount namespace id for the mount of the root of the namespace is bigger than the child mounted on top of it. We've never explicitly given the guarantee for such ordering and I doubt anyone relies on it. Accepting that lets us avoid copying the mount again and also avoids having to massage may_copy_tree() to grant an exception for fsmount->mnt->mnt_ns being NULL. Link: https://patch.msgid.link/20260122-work-fsmount-namespace-v1-3-5ef0a886e646@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-12drm/xe: add VM_BIND DECOMPRESS uapi flagNitin Gote
Add a new VM_BIND flag, DRM_XE_VM_BIND_FLAG_DECOMPRESS, that lets userspace express intent for the driver to perform on-device in-place decompression for the GPU mapping created by a MAP bind operation. This flag is used by subsequent driver changes to trigger scheduling of GPU work that resolves compressed VRAM pages into an uncompressed PAT VM mapping. Behavior and semantics: - Valid only for DRM_XE_VM_BIND_OP_MAP. IOCTLs using this flag on other ops are rejected (-EINVAL). - The bind's pat_index must select the device "no-compression" PAT entry; otherwise the ioctl is rejected (-EINVAL). - Only meaningful for VRAM-backed BOs on devices that support Flat CCS and the required hardware generation (driver will return -EOPNOTSUPP if not). - On success the driver schedules a migrate/resolve and installs the returned dma_fence into the BO's kernel reservation (DMA_RESV_USAGE_KERNEL). Compute PR: https://github.com/intel/compute-runtime/pull/898 v3: Rebase on latest drm-tip and add compute pr info v2: Add kernel doc (Matt) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mrozek, Michal <michal.mrozek@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260304123758.3050386-6-nitin.r.gote@intel.com
2026-03-12Merge drm/drm-next into drm-misc-nextMaxime Ripard
Biju Das needs a patch for rz-du merged in 7.0-rc3 Signed-off-by: Maxime Ripard <mripard@kernel.org>
2026-03-11pidfd: add CLONE_PIDFD_AUTOKILLChristian Brauner
Add a new clone3() flag CLONE_PIDFD_AUTOKILL that ties a child's lifetime to the pidfd returned from clone3(). When the last reference to the struct file created by clone3() is closed the kernel sends SIGKILL to the child. A pidfd obtained via pidfd_open() for the same process does not keep the child alive and does not trigger autokill - only the specific struct file from clone3() has this property. This is useful for container runtimes, service managers, and sandboxed subprocess execution - any scenario where the child must die if the parent crashes or abandons the pidfd. CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD (the whole point is tying lifetime to the pidfd file) and CLONE_AUTOREAP (a killed child with no one to reap it would become a zombie). CLONE_THREAD is rejected because autokill targets a process not a thread. The clone3 pidfd is identified by the PIDFD_AUTOKILL file flag set on the struct file at clone3() time. The pidfs .release handler checks this flag and sends SIGKILL via do_send_sig_info(SIGKILL, SEND_SIG_PRIV, ...) only when it is set. Files from pidfd_open() or open_by_handle_at() are distinct struct files that do not carry this flag. dup()/fork() share the same struct file so they extend the child's lifetime until the last reference drops. CLONE_PIDFD_AUTOKILL uses a privilege model based on CLONE_NNP: without CLONE_NNP the child could escalate privileges via setuid/setgid exec after being spawned, so the caller must have CAP_SYS_ADMIN in its user namespace. With CLONE_NNP the child can never gain new privileges so unprivileged usage is allowed. This is a deliberate departure from the pdeath_signal model which is reset during secureexec and commit_creds() rendering it useless for container runtimes that need to deprivilege themselves. Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-3-d148b984a989@kernel.org Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-11clone: add CLONE_NNPChristian Brauner
Add a new clone3() flag CLONE_NNP that sets no_new_privs on the child process at clone time. This is analogous to prctl(PR_SET_NO_NEW_PRIVS) but applied at process creation rather than requiring a separate step after the child starts running. CLONE_NNP is rejected with CLONE_THREAD. It's conceptually a lot simpler if the whole thread-group is forced into NNP and not have single threads running around with NNP. Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-2-d148b984a989@kernel.org Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>