summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2026-01-12initrd: remove deprecated code path (linuxrc)Askar Safin
Remove linuxrc initrd code path, which was deprecated in 2020. Initramfs and (non-initial) RAM disks (i. e. brd) still work. Both built-in and bootloader-supplied initramfs still work. Non-linuxrc initrd code path (i. e. using /dev/ram as final root filesystem) still works, but I put deprecation message into it. Also I deprecate command line parameters "noinitrd" and "ramdisk_start=". Signed-off-by: Askar Safin <safinaskar@gmail.com> Link: https://patch.msgid.link/20251119222407.3333257-3-safinaskar@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12selftests: ublk: add end-to-end integrity testCaleb Sander Mateos
Add test case loop_08 to verify the ublk integrity data flow. It uses the kublk loop target to create a ublk device with integrity on top of backing data and integrity files. It then writes to the whole device with fio configured to generate integrity data. Then it reads back the whole device with fio configured to verify the integrity data. It also verifies that injected guard, reftag, and apptag corruptions are correctly detected. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12selftests: ublk: add integrity params testCaleb Sander Mateos
Add test case null_04 to exercise all the different integrity params. It creates 4 different ublk devices with different combinations of integrity arguments and verifies their integrity limits via sysfs and the metadata_size utility. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12selftests: ublk: add integrity data support to loop targetCaleb Sander Mateos
To perform and end-to-end test of integrity information through a ublk device, we need to actually store it somewhere and retrieve it. Add this support to kublk's loop target. It uses a second backing file for the integrity data corresponding to the data stored in the first file. The integrity file is initialized with byte 0xFF, which ensures the app and reference tags are set to the "escape" pattern to disable the bio-integrity-auto guard and reftag checks until the blocks are written. The integrity file is opened without O_DIRECT since it will be accessed at sub-block granularity. Each incoming read/write results in a pair of reads/writes, one to the data file, and one to the integrity file. If either backing I/O fails, the error is propagated to the ublk request. If both backing I/Os read/write some bytes, the ublk request is completed with the smaller of the number of blocks accessed by each I/O. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12selftests: ublk: support non-O_DIRECT backing filesCaleb Sander Mateos
A subsequent commit will add support for using a backing file to store integrity data. Since integrity data is accessed in intervals of metadata_size, which may be much smaller than a logical block on the backing device, direct I/O cannot be used. Add an argument to backing_file_tgt_init() to specify the number of files to open for direct I/O. The remaining files will use buffered I/O. For now, continue to request direct I/O for all the files. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12selftests: ublk: implement integrity user copy in kublkCaleb Sander Mateos
If integrity data is enabled for kublk, allocate an integrity buffer for each I/O. Extend ublk_user_copy() to copy the integrity data between the ublk request and the integrity buffer if the ublksrv_io_desc indicates that the request has integrity data. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12selftests: ublk: add kublk support for integrity paramsCaleb Sander Mateos
Add integrity param command line arguments to kublk. Plumb these to struct ublk_params for the null and fault_inject targets, as they don't need to actually read or write the integrity data. Forbid the integrity params for loop or stripe until the integrity data copy is implemented. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12selftests: ublk: add utility to get block device metadata sizeCaleb Sander Mateos
Some block device integrity parameters are available in sysfs, but others are only accessible using the FS_IOC_GETLBMD_CAP ioctl. Add a metadata_size utility program to print out the logical block metadata size, PI offset, and PI size within the metadata. Example output: $ metadata_size /dev/ublkb0 metadata_size: 64 pi_offset: 56 pi_tuple_size: 8 Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12selftests: ublk: display UBLK_F_INTEGRITY supportCaleb Sander Mateos
Add support for printing the UBLK_F_INTEGRITY feature flag in the human-readable kublk features output. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: optimize ublk_user_copy() on daemon taskCaleb Sander Mateos
ublk user copy syscalls may be issued from any task, so they take a reference count on the struct ublk_io to check whether it is owned by the ublk server and prevent a concurrent UBLK_IO_COMMIT_AND_FETCH_REQ from completing the request. However, if the user copy syscall is issued on the io's daemon task, a concurrent UBLK_IO_COMMIT_AND_FETCH_REQ isn't possible, so the atomic reference count dance is unnecessary. Check for UBLK_IO_FLAG_OWNED_BY_SRV to ensure the request is dispatched to the sever and obtain the request from ublk_io's req field instead of looking it up on the tagset. Skip the reference count increment and decrement. Commit 8a8fe42d765b ("ublk: optimize UBLK_IO_REGISTER_IO_BUF on daemon task") made an analogous optimization for ublk zero copy buffer registration. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: support UBLK_F_INTEGRITYStanley Zhang
Now that all the components of the ublk integrity feature have been implemented, add UBLK_F_INTEGRITY to UBLK_F_ALL, conditional on block layer integrity support (CONFIG_BLK_DEV_INTEGRITY). This allows ublk servers to create ublk devices with UBLK_F_INTEGRITY set and UBLK_U_CMD_GET_FEATURES to report the feature as supported. Signed-off-by: Stanley Zhang <stazhang@purestorage.com> [csander: make feature conditional on CONFIG_BLK_DEV_INTEGRITY] Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: implement integrity user copyStanley Zhang
Add a function ublk_copy_user_integrity() to copy integrity information between a request and a user iov_iter. This mirrors the existing ublk_copy_user_pages() but operates on request integrity data instead of regular data. Check UBLKSRV_IO_INTEGRITY_FLAG in iocb->ki_pos in ublk_user_copy() to choose between copying data or integrity data. [csander: change offset units from data bytes to integrity data bytes, fix CONFIG_BLK_DEV_INTEGRITY=n build, rebase on user copy refactor] Signed-off-by: Stanley Zhang <stazhang@purestorage.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: move offset check out of __ublk_check_and_get_req()Caleb Sander Mateos
__ublk_check_and_get_req() checks that the passed in offset is within the data length of the specified ublk request. However, only user copy (ublk_check_and_get_req()) supports accessing ublk request data at a nonzero offset. Zero-copy buffer registration (ublk_register_io_buf()) always passes 0 for the offset, so the check is unnecessary. Move the check from __ublk_check_and_get_req() to ublk_check_and_get_req(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: inline ublk_check_and_get_req() into ublk_user_copy()Caleb Sander Mateos
ublk_check_and_get_req() has a single callsite in ublk_user_copy(). It takes a ton of arguments in order to pass local variables from ublk_user_copy() to ublk_check_and_get_req() and vice versa. And more are about to be added. Combine the functions to reduce the argument passing noise. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: split out ublk_user_copy() helperCaleb Sander Mateos
ublk_ch_read_iter() and ublk_ch_write_iter() are nearly identical except for the iter direction. Split out a helper function ublk_user_copy() to reduce the code duplication as these functions are about to get larger. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: split out ublk_copy_user_bvec() helperCaleb Sander Mateos
Factor a helper function ublk_copy_user_bvec() out of ublk_copy_user_pages(). It will be used for copying integrity data too. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: set UBLK_IO_F_INTEGRITY in ublksrv_io_descCaleb Sander Mateos
Indicate to the ublk server when an incoming request has integrity data by setting UBLK_IO_F_INTEGRITY in the ublksrv_io_desc's op_flags field. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: support UBLK_PARAM_TYPE_INTEGRITY in device creationStanley Zhang
Add a feature flag UBLK_F_INTEGRITY for a ublk server to request integrity/metadata support when creating a ublk device. The ublk server can also check for the feature flag on the created device or the result of UBLK_U_CMD_GET_FEATURES to tell if the ublk driver supports it. UBLK_F_INTEGRITY requires UBLK_F_USER_COPY, as user copy is the only data copy mode initially supported for integrity data. Add UBLK_PARAM_TYPE_INTEGRITY and struct ublk_param_integrity to struct ublk_params to specify the integrity params of a ublk device. UBLK_PARAM_TYPE_INTEGRITY requires UBLK_F_INTEGRITY and a nonzero metadata_size. The LBMD_PI_CAP_* and LBMD_PI_CSUM_* values from the linux/fs.h UAPI header are used for the flags and csum_type fields. If the UBLK_PARAM_TYPE_INTEGRITY flag is set, validate the integrity parameters and apply them to the blk_integrity limits. The struct ublk_param_integrity validations are based on the checks in blk_validate_integrity_limits(). Any invalid parameters should be rejected before being applied to struct blk_integrity. [csander: drop redundant pi_tuple_size field, use block metadata UAPI constants, add param validation] Signed-off-by: Stanley Zhang <stazhang@purestorage.com> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12ublk: move ublk flag check functions earlierCaleb Sander Mateos
ublk_dev_support_user_copy() will be used in ublk_validate_params(). Move these functions next to ublk_{dev,queue}_is_zoned() to avoid needing to forward-declare them. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12blk-integrity: take const pointer in blk_integrity_rq()Caleb Sander Mateos
blk_integrity_rq() doesn't modify the struct request passed in, so allow a const pointer to be passed. Use a matching signature for the !CONFIG_BLK_DEV_INTEGRITY version. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-12landlock: Clarify documentation for the IOCTL access rightGünther Noack
Move the description of the LANDLOCK_ACCESS_FS_IOCTL_DEV access right together with the file access rights. This group of access rights applies to files (in this case device files), and they can be added to file or directory inodes using landlock_add_rule(2). The check for that works the same for all file access rights, including LANDLOCK_ACCESS_FS_IOCTL_DEV. Invoking ioctl(2) on directory FDs can not currently be restricted with Landlock. Having it grouped separately in the documentation is a remnant from earlier revisions of the LANDLOCK_ACCESS_FS_IOCTL_DEV patch set. Link: https://lore.kernel.org/all/20260108.Thaex5ruach2@digikod.net/ Signed-off-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20260111175203.6545-2-gnoack3000@gmail.com Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-01-12cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve()Li Ming
In __cxl_dpa_reserve(), it will check if the new resource range is included in one of paritions of the cxl memory device. cxlds->nr_paritions is used to represent how many partitions information the cxl memory device has. In the loop, if driver cannot find a partition including the new resource range, it will be an infinite loop. [ dj: Removed incorrect fixes tag ] Fixes: 991d98f17d31 ("cxl: Make cxl_dpa_alloc() DPA partition number agnostic") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260112120526.530232-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-12Merge patch series "fs: add immutable rootfs"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Currently pivot_root() doesn't work on the real rootfs because it cannot be unmounted. Userspace has to do a recursive removal of the initramfs contents manually before continuing the boot. Really all we want from the real rootfs is to serve as the parent mount for anything that is actually useful such as the tmpfs or ramfs for initramfs unpacking or the rootfs itself. There's no need for the real rootfs to actually be anything meaningful or useful. Add a immutable rootfs called "nullfs" that can be selected via the "nullfs_rootfs" kernel command line option. The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs and fire up userspace which mounts the rootfs and can then just do: chdir(rootfs); pivot_root(".", "."); umount2(".", MNT_DETACH); and be done with it. (Ofc, userspace can also choose to retain the initramfs contents by using something like pivot_root(".", "/initramfs") without unmounting it.) Technically this also means that the rootfs mount in unprivileged namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed that the immutable rootfs remains permanently empty so there cannot be anything revealed by unmounting the covering mount. In the future this will also allow us to create completely empty mount namespaces without risking to leak anything. systemd already handles this all correctly as it tries to pivot_root() first and falls back to MS_MOVE only when that fails. This goes back to various discussion in previous years and a LPC 2024 presentation about this very topic. * patches from https://patch.msgid.link/20260112-work-immutable-rootfs-v2-0-88dd1c34a204@kernel.org: docs: mention nullfs fs: add immutable rootfs fs: add init_pivot_root() fs: ensure that internal tmpfs mount gets mount id zero Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-0-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12docs: mention nullfsChristian Brauner
Add a section about nullfs and how it enables pivot_root() to work. Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-4-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: add immutable rootfsChristian Brauner
Currently pivot_root() doesn't work on the real rootfs because it cannot be unmounted. Userspace has to do a recursive removal of the initramfs contents manually before continuing the boot. Really all we want from the real rootfs is to serve as the parent mount for anything that is actually useful such as the tmpfs or ramfs for initramfs unpacking or the rootfs itself. There's no need for the real rootfs to actually be anything meaningful or useful. Add a immutable rootfs called "nullfs" that can be selected via the "nullfs_rootfs" kernel command line option. The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs and fire up userspace which mounts the rootfs and can then just do: chdir(rootfs); pivot_root(".", "."); umount2(".", MNT_DETACH); and be done with it. (Ofc, userspace can also choose to retain the initramfs contents by using something like pivot_root(".", "/initramfs") without unmounting it.) Technically this also means that the rootfs mount in unprivileged namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed that the immutable rootfs remains permanently empty so there cannot be anything revealed by unmounting the covering mount. In the future this will also allow us to create completely empty mount namespaces without risking to leak anything. systemd already handles this all correctly as it tries to pivot_root() first and falls back to MS_MOVE only when that fails. This goes back to various discussion in previous years and a LPC 2024 presentation about this very topic. Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-3-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: add init_pivot_root()Christian Brauner
We will soon be able to pivot_root() with the introduction of the immutable rootfs. Add a wrapper for kernel internal usage. Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-2-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: ensure that internal tmpfs mount gets mount id zeroChristian Brauner
and the rootfs get mount id one as it always has. Before we actually mount the rootfs we create an internal tmpfs mount which has mount id zero but is never exposed anywhere. Continue that "tradition". Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-1-88dd1c34a204@kernel.org Fixes: 7f9bfafc5f49 ("fs: use xarray for old mount id") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12x86/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch-specific variant of paravirt_steal_clock() and use the common one instead. With all archs supporting Xen now having been switched to the common variant, including paravirt.h can be dropped from drivers/xen/time.c. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-12-jgross@suse.com
2026-01-12riscv/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch specific variant of paravirt_steal_clock() and use the common one instead. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-11-jgross@suse.com
2026-01-12loongarch/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch specific variant of paravirt_steal_clock() and use the common one instead. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-10-jgross@suse.com
2026-01-12tools: jobserver: Prevent deadlock caused by incorrect jobserver ↵Changbin Du
configuration and enhance error reporting When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS propagation caused "--jobserver-auth=r,w" to reference an unintended file descriptor. This led to infinite loops in jobserver-exec's os.read() calls due to empty token. My shell opened /etc/passwd for some reason without closing it, and as a result, all child processes inherited this fd 3. $ ls -l /proc/self/fd total 0 lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1 lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1 lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1 lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd In this case, the `make` should open a new file descriptor for jobserver control, but clearly, it did not do so and instead still passed fd 3 as "--jobserver-auth=3,4" in MAKEFLAGS. (The version of my gnu make is 4.3) This update ensures robustness against invalid jobserver configurations, even when `make` incorrectly pass non-pipe file descriptors. * Rejecting empty reads to prevent infinite loops on EOF. * Clearing `self.jobs` to avoid writing to incorrect files if invalid tokens are detected. * Printing detailed error messages to stderr to inform the user. Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260108113836.2976527-1-changbin.du@huawei.com>
2026-01-12arm64/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch-specific variant of paravirt_steal_clock() and use the common one instead. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-9-jgross@suse.com
2026-01-12drm/xe: Privatize xe_ggtt_nodeMaarten Lankhorst
Nothing requires it any more, make the member private. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-16-dev@lankhorst.se
2026-01-12drm/xe: Improve xe_gt_sriov_pf_config GGTT handlingMaarten Lankhorst
Do not directly dereference xe_ggtt_node, and add a function to retrieve the allocated GGTT size. Reviewed-by: Matthew.brost@intel.com Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-15-dev@lankhorst.se
2026-01-12drm/xe: Do not dereference ggtt_node in xe_bo.cMaarten Lankhorst
A careful inspection of __xe_ggtt_insert_bo_at() shows that the ggtt_node can always be seen as inserted from xe_bo.c due to the way error handling is performed. The checks are also a little bit too paranoid, since we never create a bo with ggtt_node[id] initialised but not inserted into the GGTT, which can be seen by looking at __xe_ggtt_insert_bo_at() Additionally, the size of the GGTT is never bigger than 4 GB, so adding a check at that level is incorrect. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108101014.579906-14-dev@lankhorst.se
2026-01-12drm/xe/display: Avoid dereferencing xe_ggtt_nodeMaarten Lankhorst
Start using xe_ggtt_node_addr, and avoid comparing the base offset as vma->node is dynamically allocated. Also sneak in a xe_bo_size() for stolen, too small to put as separate commit. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260108101014.579906-13-dev@lankhorst.se
2026-01-12drm/xe: Add xe_ggtt_node_addr() to avoid dereferencing xe_ggtt_nodeMaarten Lankhorst
This function makes it possible to add an offset that is applied to all xe_ggtt_node's, and hides the internals from all its users. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260108101014.579906-12-dev@lankhorst.se
2026-01-12drm/xe: Convert xe_fb_pin to use a callback for insertion into GGTTMaarten Lankhorst
The rotation details belong in xe_fb_pin.c, while the operations involving GGTT belong to xe_ggtt.c. As directly locking xe_ggtt etc results in exposing all of xe_ggtt details anyway, create a special function that allocates a ggtt_node, and allow display to populate it using a callback as a compromise. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-11-dev@lankhorst.se
2026-01-12drm/xe: Start using ggtt->start in preparation of balloon removalMaarten Lankhorst
Instead of having ggtt->size point to the end of ggtt, have ggtt->size be the actual size of the GGTT, and introduce ggtt->start to point to the beginning of GGTT. This will allow a massive cleanup of GGTT in case of SRIOV-VF. Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260108101014.579906-10-dev@lankhorst.se
2026-01-12btrfs: fix memory leaks in create_space_info() error pathsJiasheng Jiang
In create_space_info(), the 'space_info' object is allocated at the beginning of the function. However, there are two error paths where the function returns an error code without freeing the allocated memory: 1. When create_space_info_sub_group() fails in zoned mode. 2. When btrfs_sysfs_add_space_info_type() fails. In both cases, 'space_info' has not yet been added to the fs_info->space_info list, resulting in a memory leak. Fix this by adding an error handling label to kfree(space_info) before returning. Fixes: 2be12ef79fe9 ("btrfs: Separate space_info create/update") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12btrfs: invalidate pages instead of truncate after reflinkingFilipe Manana
Qu reported that generic/164 often fails because the read operations get zeroes when it expects to either get all bytes with a value of 0x61 or 0x62. The issue stems from truncating the pages from the page cache instead of invalidating, as truncating can zero page contents. This zeroing is not just in case the range is not page sized (as it's commented in truncate_inode_pages_range()) but also in case we are using large folios, they need to be split and the splitting fails. Stealing Qu's comment in the thread linked below: "We can have the following case: 0 4K 8K 12K 16K | | | | | |<---- Extent A ----->|<----- Extent B ------>| The page size is still 4K, but the folio we got is 16K. Then if we remap the range for [8K, 16K), then truncate_inode_pages_range() will get the large folio 0 sized 16K, then call truncate_inode_partial_folio(). Which later calls folio_zero_range() for the [8K, 16K) range first, then tries to split the folio into smaller ones to properly drop them from the cache. But if splitting failed (e.g. racing with other operations holding the filemap lock), the partially zeroed large folio will be kept, resulting the range [8K, 16K) being zeroed meanwhile the folio is still a 16K sized large one." So instead of truncating, invalidate the page cache range with a call to filemap_invalidate_inode(), which besides not doing any zeroing also ensures that while it's invalidating folios, no new folios are added. This helps ensure that buffered reads that happen while a reflink operation is in progress always get either the whole old data (the one before the reflink) or the whole new data, which is what generic/164 expects. Link: https://lore.kernel.org/linux-btrfs/7fb9b44f-9680-4c22-a47f-6648cb109ddf@suse.com/ Reported-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12btrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTALQu Wenruo
The following new features are missing: - Async checksum - Shutdown ioctl and auto-degradation - Larger block size support Which is dependent on larger folios. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12arm/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch-specific variant of paravirt_steal_clock() and use the common one instead. This allows to remove paravirt.c and paravirt.h from arch/arm. Until all archs supporting Xen have been switched to the common code of paravirt_steal_clock(), drivers/xen/time.c needs to include asm/paravirt.h for those archs, while this is not necessary for arm any longer. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-8-jgross@suse.com
2026-01-12sched: Move clock related paravirt code to kernel/schedJuergen Gross
Paravirt clock related functions are available in multiple archs. In order to share the common parts, move the common static keys to kernel/sched/ and remove them from the arch specific files. Make a common paravirt_steal_clock() implementation available in kernel/sched/cputime.c, guarding it with a new config option CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch in case it wants to use that common variant. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
2026-01-12paravirt: Remove asm/paravirt_api_clock.hJuergen Gross
All architectures supporting CONFIG_PARAVIRT share the same contents of asm/paravirt_api_clock.h: #include <asm/paravirt.h> So remove all incarnations of asm/paravirt_api_clock.h and remove the only place where it is included, as there asm/paravirt.h is included anyway. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> # powerpc, scheduler bits Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-6-jgross@suse.com
2026-01-12x86/paravirt: Move thunk macros to paravirt_types.hJuergen Gross
The macros for generating PV-thunks are part of the generic paravirt infrastructure, so they should be in paravirt_types.h. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-5-jgross@suse.com
2026-01-12x86/paravirt: Remove PARAVIRT_DEBUG config optionJuergen Gross
The only effect of CONFIG_PARAVIRT_DEBUG set is that instead of doing a call using a NULL pointer a BUG() is being raised. While the BUG() will be a little bit easier to analyse, the call of NULL isn't really that difficult to find the reason for. Remove the config option to make paravirt coding a little bit less annoying. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-4-jgross@suse.com
2026-01-12x86/paravirt: Remove some unneeded struct declarationsJuergen Gross
In paravirt_types.h and paravirt.h there are some struct declarations which are not needed. Remove them. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-3-jgross@suse.com
2026-01-12Revert "gfs2: Fix use of bio_chain"Andreas Gruenbacher
This reverts commit 8a157e0a0aa5143b5d94201508c0ca1bb8cfb941. That commit incorrectly assumed that the bio_chain() arguments were swapped in gfs2. However, gfs2 intentionally constructs bio chains so that the first bio's bi_end_io callback is invoked when all bios in the chain have completed, unlike bio chains where the last bio's callback is invoked. Fixes: 8a157e0a0aa5 ("gfs2: Fix use of bio_chain") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-12gpu: nova-core: add missing newlines to several print stringsTimur Tabi
Although the dev_xx!() macro calls do not technically require terminating newlines for the format strings, they should be added anyway to maintain consistency, both within Rust code and with the C versions. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: https://patch.msgid.link/20260107201647.2490140-2-ttabi@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>