| Age | Commit message (Collapse) | Author |
|
Print a trace point after successfully inserting an extent in the
ext4_es_cache_extent() function. Additionally, similar to other extent
cache operation functions, call ext4_print_pending_tree() to display the
extent debug information of the inode when in ES_DEBUG mode.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20251129103247.686136-13-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently, ext4_es_cache_extent() is used to load extents into the
extent status tree when reading on-disk extent blocks. But it inserts
information into the extent status tree if and only if there isn't
information about the specified range already. So it only used for the
initial loading and does not support overwrit extents.
However, there are many other places in ext4 where on-disk extents are
inserted into the extent status tree, such as in ext4_map_query_blocks().
Currently, they call ext4_es_insert_extent() to perform the insertion,
but they don't modify the extents, so ext4_es_cache_extent() would be a
more appropriate choice. However, when ext4_map_query_blocks() inserts
an extent, it may overwrite a short existing extent of the same type.
Therefore, to prepare for the replacements, we need to extend
ext4_es_cache_extent() to allow it to overwrite existing extents with
the same status. So it checks the found extents before removing and
inserting. (There is one exception, a hole in the on-disk extent but a
delayed extent in the extent status tree is allowed.)
In addition, since cached extents can be more lenient than the extents
they modify and do not involve modifying reserved blocks, it is not
necessary to ensure that the insertion operation succeeds as strictly as
in the ext4_es_insert_extent() function.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20251129103247.686136-12-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently, __es_remove_extent() unconditionally removes extent status
entries within the specified range. In order to prepare for extending
the ext4_es_cache_extent() function to cache on-disk extents, which may
overwrite some existing short-length extents with the same status, allow
__es_remove_extent() to check the specified extent type before removing
it, and return error and pass out the conflicting extent if the status
does not match.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20251129103247.686136-11-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The out label in __es_remove_extent() is just return err value, we can
return it directly if something bad happens. Therefore, remove the
useless out label and rename out_get_reserved to out.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20251129103247.686136-10-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
zero_ex is a temporary variable used only for writing zeros and
inserting extent status entry, it will not be directly inserted into the
tree. Therefore, it can be assigned values from the target extent in
various scenarios, eliminating the need to explicitly assign values to
each variable individually.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20251129103247.686136-9-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When the split extent fails, we might leave some extents still being
processed and return an error directly, which will result in stale
extent entries remaining in the extent status tree. So drop all of the
remaining potentially stale extents if the splitting fails.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251129103247.686136-8-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When splitting an unwritten extent in the middle and converting it to
initialized in ext4_split_extent() with the EXT4_EXT_MAY_ZEROOUT and
EXT4_EXT_DATA_VALID2 flags set, it could leave a stale unwritten extent.
Assume we have an unwritten file and buffered write in the middle of it
without dioread_nolock enabled, it will allocate blocks as written
extent.
0 A B N
[UUUUUUUUUUUU] on-disk extent U: unwritten extent
[UUUUUUUUUUUU] extent status tree
[--DDDDDDDD--] D: valid data
|<- ->| ----> this range needs to be initialized
ext4_split_extent() first try to split this extent at B with
EXT4_EXT_DATA_PARTIAL_VALID1 and EXT4_EXT_MAY_ZEROOUT flag set, but
ext4_split_extent_at() failed to split this extent due to temporary lack
of space. It zeroout B to N and leave the entire extent as unwritten.
0 A B N
[UUUUUUUUUUUU] on-disk extent
[UUUUUUUUUUUU] extent status tree
[--DDDDDDDDZZ] Z: zeroed data
ext4_split_extent() then try to split this extent at A with
EXT4_EXT_DATA_VALID2 flag set. This time, it split successfully and
leave an written extent from A to N.
0 A B N
[UUWWWWWWWWWW] on-disk extent W: written extent
[UUUUUUUUUUUU] extent status tree
[--DDDDDDDDZZ]
Finally ext4_map_create_blocks() only insert extent A to B to the extent
status tree, and leave an stale unwritten extent in the status tree.
0 A B N
[UUWWWWWWWWWW] on-disk extent W: written extent
[UUWWWWWWWWUU] extent status tree
[--DDDDDDDDZZ]
Fix this issue by always cached extent status entry after zeroing out
the second part.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251129103247.686136-7-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Caching extents during the splitting process is risky, as it may result
in stale extents remaining in the status tree. Moreover, in most cases,
the corresponding extent block entries are likely already cached before
the split happens, making caching here not particularly useful.
Assume we have an unwritten extent, and then DIO writes the first half.
[UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent
[UUUUUUUUUUUUUUUU] extent status tree
|<- ->| ----> dio write this range
First, when ext4_split_extent_at() splits this extent, it truncates the
existing extent and then inserts a new one. During this process, this
extent status entry may be shrunk, and calls to ext4_find_extent() and
ext4_cache_extents() may occur, which could potentially insert the
truncated range as a hole into the extent status tree. After the split
is completed, this hole is not replaced with the correct status.
[UUUUUUU|UUUUUUUU] on-disk extent U: unwritten extent
[UUUUUUU|HHHHHHHH] extent status tree H: hole
Then, the outer calling functions will not correct this remaining hole
extent either. Finally, if we perform a delayed buffer write on this
latter part, it will re-insert the delayed extent and cause an error in
space accounting.
In adition, if the unwritten extent cache is not shrunk during the
splitting, ext4_cache_extents() also conflicts with existing extents
when caching extents. In the future, we will add checks when caching
extents, which will trigger a warning. Therefore, Do not cache extents
that are being split.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Message-ID: <20251129103247.686136-6-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Before submitting I/O and allocating blocks with the
EXT4_GET_BLOCKS_PRE_IO flag set, ext4_split_convert_extents() may
convert the target extent range to initialized due to ENOSPC, ENOMEM, or
EQUOTA errors. However, it still marks the mapping as incorrectly
unwritten. Although this may not seem to cause any practical problems,
it will result in an unnecessary extent conversion operation after I/O
completion. Therefore, it's better to correct the returned mapping
status.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20251129103247.686136-5-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When allocating blocks during within-EOF DIO and writeback with
dioread_nolock enabled, EXT4_GET_BLOCKS_PRE_IO was set to split an
existing large unwritten extent. However, EXT4_GET_BLOCKS_CONVERT was
set when calling ext4_split_convert_extents(), which may potentially
result in stale data issues.
Assume we have an unwritten extent, and then DIO writes the second half.
[UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent
[UUUUUUUUUUUUUUUU] extent status tree
|<- ->| ----> dio write this range
First, ext4_iomap_alloc() call ext4_map_blocks() with
EXT4_GET_BLOCKS_PRE_IO, EXT4_GET_BLOCKS_UNWRIT_EXT and
EXT4_GET_BLOCKS_CREATE flags set. ext4_map_blocks() find this extent and
call ext4_split_convert_extents() with EXT4_GET_BLOCKS_CONVERT and the
above flags set.
Then, ext4_split_convert_extents() calls ext4_split_extent() with
EXT4_EXT_MAY_ZEROOUT, EXT4_EXT_MARK_UNWRIT2 and EXT4_EXT_DATA_VALID2
flags set, and it calls ext4_split_extent_at() to split the second half
with EXT4_EXT_DATA_VALID2, EXT4_EXT_MARK_UNWRIT1, EXT4_EXT_MAY_ZEROOUT
and EXT4_EXT_MARK_UNWRIT2 flags set. However, ext4_split_extent_at()
failed to insert extent since a temporary lack -ENOSPC. It zeroes out
the first half but convert the entire on-disk extent to written since
the EXT4_EXT_DATA_VALID2 flag set, but left the second half as unwritten
in the extent status tree.
[0000000000SSSSSS] data S: stale data, 0: zeroed
[WWWWWWWWWWWWWWWW] on-disk extent W: written extent
[WWWWWWWWWWUUUUUU] extent status tree
Finally, if the DIO failed to write data to the disk, the stale data in
the second half will be exposed once the cached extent entry is gone.
Fix this issue by not passing EXT4_GET_BLOCKS_CONVERT when splitting
an unwritten extent before submitting I/O, and make
ext4_split_convert_extents() to zero out the entire extent range
to zero for this case, and also mark the extent in the extent status
tree for consistency.
Fixes: b8a8684502a0 ("ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Message-ID: <20251129103247.686136-4-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When allocating initialized blocks from a large unwritten extent, or
when splitting an unwritten extent during end I/O and converting it to
initialized, there is currently a potential issue of stale data if the
extent needs to be split in the middle.
0 A B N
[UUUUUUUUUUUU] U: unwritten extent
[--DDDDDDDD--] D: valid data
|<- ->| ----> this range needs to be initialized
ext4_split_extent() first try to split this extent at B with
EXT4_EXT_DATA_ENTIRE_VALID1 and EXT4_EXT_MAY_ZEROOUT flag set, but
ext4_split_extent_at() failed to split this extent due to temporary lack
of space. It zeroout B to N and mark the entire extent from 0 to N
as written.
0 A B N
[WWWWWWWWWWWW] W: written extent
[SSDDDDDDDDZZ] Z: zeroed, S: stale data
ext4_split_extent() then try to split this extent at A with
EXT4_EXT_DATA_VALID2 flag set. This time, it split successfully and left
a stale written extent from 0 to A.
0 A B N
[WW|WWWWWWWWWW]
[SS|DDDDDDDDZZ]
Fix this by pass EXT4_EXT_DATA_PARTIAL_VALID1 to ext4_split_extent_at()
when splitting at B, don't convert the entire extent to written and left
it as unwritten after zeroing out B to N. The remaining work is just
like the standard two-part split. ext4_split_extent() will pass the
EXT4_EXT_DATA_VALID2 flag when it calls ext4_split_extent_at() for the
second time, allowing it to properly handle the split. If the split is
successful, it will keep extent from 0 to A as unwritten.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Message-ID: <20251129103247.686136-3-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When splitting an extent, if the EXT4_GET_BLOCKS_CONVERT flag is set and
it is necessary to split the target extent in the middle,
ext4_split_extent() first handles splitting the latter half of the
extent and passes the EXT4_EXT_DATA_VALID1 flag. This flag implies that
all blocks before the split point contain valid data; however, this
assumption is incorrect.
Therefore, subdivid EXT4_EXT_DATA_VALID1 into
EXT4_EXT_DATA_ENTIRE_VALID1 and EXT4_EXT_DATA_PARTIAL_VALID1, which
indicate that the first half of the extent is either entirely valid or
only partially valid, respectively. These two flags cannot be set
simultaneously.
This patch does not use EXT4_EXT_DATA_PARTIAL_VALID1, it only replaces
EXT4_EXT_DATA_VALID1 with EXT4_EXT_DATA_ENTIRE_VALID1 at the location
where it is set, no logical changes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Message-ID: <20251129103247.686136-2-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The error branch for ext4_xattr_inode_update_ref forget to release the
refcount for iloc.bh. Find this when review code.
Fixes: 57295e835408 ("ext4: guard against EA inode refcount underflow in xattr update")
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20251213055706.3417529-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Commit 962e8a01eab9 ("ext4: introduce mext_move_extent()") attempts to
call ext4_swap_extents() on the failure path to recover the swapped
extents, but fails to acquire locks for the two inode->i_data_sem,
triggering the BUG_ON statement in ext4_swap_extents().
This issue can be fixed by calling ext4_double_down_write_data_sem()
before ext4_swap_extents().
Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
Reported-by: syzbot+4ea6bd8737669b423aae@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69368649.a70a0220.38f243.0093.GAE@google.com/
Fixes: 962e8a01eab9 ("ext4: introduce mext_move_extent()")
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://patch.msgid.link/20251208123713.1971068-1-sunjunchao@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The padding at the end of struct ext4_tune_sb_params is architecture
specific and in particular is different between x86-32 and x86-64,
since the __u64 member only enforces struct alignment on the latter.
This shows up as a new warning when test-building the headers with
-Wpadded:
include/linux/ext4.h:144:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded]
All members inside the structure are naturally aligned, so the only
difference here is the amount of padding at the end. Make the padding
explicit, to have a consistent sizeof(struct ext4_tune_sb_params) of
232 on all architectures and avoid adding compat ioctl handling for
EXT4_IOC_GET_TUNE_SB_PARAM/EXT4_IOC_SET_TUNE_SB_PARAM.
This is an ABI break on x86-32 but hopefully this can go into 6.18.y early
enough as a fixup so no actual users will be affected. Alternatively, the
kernel could handle the ioctl commands for both sizes (232 and 228 bytes)
on all architectures.
Fixes: 04a91570ac67 ("ext4: implemet new ioctls to set and get superblock parameters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20251204101914.1037148-1-arnd@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
During firmware reset in LAG mode, a race condition causes the driver
to hang indefinitely while waiting for UMR completion during device
unload. See [1].
In LAG mode the bond device is only registered on the master, so it
never sees sys_error events from the slave.
During firmware reset this causes UMR waits to hang forever on unload
as the slave is dead but the master hasn't entered error state yet, so
UMR posts succeed but completions never arrive.
Fix this by adding a sys_error notifier that gets registered before
MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device().
This ensures error events reach the bond device throughout teardown.
[1]
Call Trace:
__schedule+0x2bd/0x760
schedule+0x37/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.6+0x2b5/0x4a0
__mlx5_ib_dereg_mr+0x606/0x870 [mlx5_ib]
? __xa_erase+0x4a/0xa0
? _cond_resched+0x15/0x30
? wait_for_completion+0x31/0x100
ib_dereg_mr_user+0x48/0xc0 [ib_core]
? rdmacg_uncharge_hierarchy+0xa0/0x100
destroy_hw_idr_uobject+0x20/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x37/0x150 [ib_uverbs]
__uverbs_cleanup_ufile+0xda/0x140 [ib_uverbs]
uverbs_destroy_ufile_hw+0x3a/0xf0 [ib_uverbs]
ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs]
remove_client_context+0x8b/0xd0 [ib_core]
disable_device+0x8c/0x130 [ib_core]
__ib_unregister_device+0x10d/0x180 [ib_core]
ib_unregister_device+0x21/0x30 [ib_core]
__mlx5_ib_remove+0x1e4/0x1f0 [mlx5_ib]
auxiliary_bus_remove+0x1e/0x30
device_release_driver_internal+0x103/0x1f0
bus_remove_device+0xf7/0x170
device_del+0x181/0x410
mlx5_rescan_drivers_locked.part.10+0xa9/0x1d0 [mlx5_core]
mlx5_disable_lag+0x253/0x260 [mlx5_core]
mlx5_lag_disable_change+0x89/0xc0 [mlx5_core]
mlx5_eswitch_disable+0x67/0xa0 [mlx5_core]
mlx5_unload+0x15/0xd0 [mlx5_core]
mlx5_unload_one+0x71/0xc0 [mlx5_core]
mlx5_sync_reset_reload_work+0x83/0x100 [mlx5_core]
process_one_work+0x1a7/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x116/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x22/0x40
Fixes: ede132a5cf55 ("RDMA/mlx5: Move events notifier registration to be after device registration")
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260113-umr-hand-lag-fix-v1-1-3dc476e00cd9@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The timestamp WA does not work on a VF because it requires reading MMIO
registers, which are inaccessible on a VF. This timestamp WA confuses
LRC sampling on a VF during TDR, as the LRC timestamp would always read
as 1 for any active context. Disable the timestamp WA on VFs to avoid
this confusion.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
Link: https://patch.msgid.link/20260110012739.2888434-7-matthew.brost@intel.com
(cherry picked from commit efffd56e4bd894e0935eea00e437f233b6cebc0d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Fix kernel-doc warnings on xe_vm_validation_exec():
Warning: ../drivers/gpu/drm/xe/xe_vm.h:392 expecting prototype for
xe_vm_set_validation_exec(). Prototype was for xe_vm_validation_exec()
instead
Fixes: 0131514f9789 ("drm/xe: Pass down drm_exec context to validation")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260107155401.2379127-4-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit b3a7767989e6519127ac5e0cde682c50ad587f3b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Fix kernel-doc warnings on enum xe_late_bind_fw_id:
Warning: ../drivers/gpu/drm/xe/xe_late_bind_fw_types.h:19 cannot
understand function prototype: 'enum xe_late_bind_fw_id'
Fixes: 45832bf9c10f ("drm/xe/xe_late_bind_fw: Initialize late binding firmware")
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20260107155401.2379127-3-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit a857e6102970c7bd8f2db967fe02d76741179d14)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Fix kernel-doc warnings on struct xe_gt_sriov_vf_migration:
Warning: ../drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h:47 cannot
understand function prototype: 'struct xe_gt_sriov_vf_migration'
Fixes: e1d2e2d878bf ("drm/xe/vf: Add xe_gt_recovery_pending helper")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260107155401.2379127-2-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 44393331c79f5df14c1ff25f4a355f439a2dc8a2)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Workqueue xe-ggtt-wq has been allocated using WQ_MEM_RECLAIM, but
the flag has been passed as 3rd parameter (max_active) instead
of 2nd (flags) creating the workqueue as per-cpu with max_active = 8
(the WQ_MEM_RECLAIM value).
So change this by set WQ_MEM_RECLAIM as the 2nd parameter with a
default max_active.
Fixes: 60df57e496e4 ("drm/xe: Mark GGTT work queue with WQ_MEM_RECLAIM")
Cc: stable@vger.kernel.org
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260108180148.423062-1-marco.crivellari@suse.com
(cherry picked from commit aa39abc08e77d66ebb0c8c9ec4cc8d38ded34dc9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Page accounting can change via the shrinker without calling
xe_ttm_tt_unpopulate(), which normally updates page count tracepoints
through update_global_total_pages. Add a call to
update_global_total_pages when the shrinker successfully shrinks a BO.
v2:
- Don't adjust global accounting when pinning (Stuart)
Cc: stable@vger.kernel.org
Fixes: ce3d39fae3d3 ("drm/xe/bo: add GPU memory trace points")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20260107205732.2267541-1-matthew.brost@intel.com
(cherry picked from commit cc54eabdfbf0c5b6638edc50002cfafac1f1e18b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The msm driver implements a custom dumb_map_offset callback. This
implementation acquires the msm_gem_lock, but the underlying
drm_gem_create_mmap_offset() function is already thread-safe regarding
the VMA offset manager (it acquires the mgr->vm_lock internally).
Switching to the generic drm_gem_dumb_map_offset() helper provides
several benefits:
1. Removes the unnecessary locking overhead (locking leftovers).
2. Adds a missing check to reject mapping of imported objects, which is
invalid for dumb buffers.
3. Allows for the removal of the msm_gem_dumb_map_offset() wrapper and
the msm_gem_mmap_offset() helper function.
The logic from msm_gem_mmap_offset() has been inlined into
msm_ioctl_gem_info() to maintain functionality without the separate
helper.
This addresses the TODO:
"Documentation/gpu/todo.rst: Remove custom dumb_map_offset implementations"
Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/694727/
Message-ID: <20251215022850.12358-1-swarajgaikwad1925@gmail.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Add catalog entry for Adreno A225.6 as present on MSM8960v3. Most of the
pieces were already contributed by Jonathan Marek in commit 21af872cd8c6
("drm/msm/adreno: add a2xx"), but weren't enabled because there was no
GPU entry.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/689908/
Message-ID: <20251121-a225-v1-2-a1bab651d186@oss.qualcomm.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
A225 has a different PixelShader start address, write correct address
while initializing GPU.
Fixes: 21af872cd8c6 ("drm/msm/adreno: add a2xx")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/689906/
Message-ID: <20251121-a225-v1-1-a1bab651d186@oss.qualcomm.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
These GPUs have physically have the same regions as the base case
("main" + "cx_mem" + "cx_dbgc"). Remove the specific override.
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/696547/
Message-ID: <20251229-topic-6115_2290_gpu_dbgc-v1-1-4a24d196389c@oss.qualcomm.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
When receiving file descriptors via SCM_RIGHTS, both the socket pointer
and the socket's sk pointer can be NULL during socket setup or teardown,
causing NULL pointer dereferences in __unix_needs_revalidation().
This is a regression in AppArmor 5.0.0 (kernel 6.17+) where the new
__unix_needs_revalidation() function was added without proper NULL checks.
The crash manifests as:
BUG: kernel NULL pointer dereference, address: 0x0000000000000018
RIP: aa_file_perm+0xb7/0x3b0 (or +0xbe/0x3b0, +0xc0/0x3e0)
Call Trace:
apparmor_file_receive+0x42/0x80
security_file_receive+0x2e/0x50
receive_fd+0x1d/0xf0
scm_detach_fds+0xad/0x1c0
The function dereferences sock->sk->sk_family without checking if either
sock or sock->sk is NULL first.
Add NULL checks for both sock and sock->sk before accessing sk_family.
Fixes: 88fec3526e841 ("apparmor: make sure unix socket labeling is correctly updated.")
Reported-by: Jamin Mc <jaminmc@gmail.com>
Closes: https://bugzilla.proxmox.com/show_bug.cgi?id=7083
Closes: https://gitlab.com/apparmor/apparmor/-/issues/568
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: System Administrator <root@localhost>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Now that sparse builds enforce the usage of typeof_unqual() the casts in
__raw_cpu_read/write() cause sparse to emit tons of false postive
warnings:
warning: cast removes address space '__percpu' of expression
Address this by annotating the casts with __force.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/87v7gz0yjv.ffs@tglx
Closes: https://lore.kernel.org/oe-kbuild-all/202601181733.YZOf9XU3-lkp@intel.com/
|
|
strcpy() is deprecated; replace it with a direct '/' assignment. The
buffer is already NUL-terminated, so there is no need to copy an
additional NUL terminator as strcpy() did.
Update the comment and add the local variable 'is_root' for clarity.
Closes: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
strcpy() is deprecated and sprintf() does not perform bounds checking
either. Although an overflow is unlikely, it's better to proactively
avoid it by using the safer strscpy() and scnprintf(), respectively.
Additionally, unify memory allocation for 'hname' to simplify and
improve aa_policy_init().
Closes: https://github.com/KSPP/linux/issues/88
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
On RZ/G3E using PSCI, s2ram powers down the SoC. Add suspend/resume
callbacks to restore IRQ type for NMI, TINT and external IRQ interrupts.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113125315.359967-3-biju.das.jz@bp.renesas.com
|
|
of_io_request_and_map() returns IOMEM_ERR_PTR() on failure which is
non-NULL.
Fixes: 8a7f030df897 ("irqchip/aslint-sswi: Request IO memory resource")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@mobileye.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260118082843.2786630-1-vladimir.kondratiev@mobileye.com
Closes: https://lore.kernel.org/all/20260116124257.78357-1-clm@meta.com
|
|
All LoongArch irqchip drivers are adjusted, allow them to be built on both
32BIT and 64BIT platforms.
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113085940.3344837-8-chenhuacai@loongson.cn
|
|
irq_domain_alloc_fwnode() takes a parameter with the phys_addr_t type.
Currently we pass acpi_pchpic->address to it. This can only work on
64BIT platform because its type is u64, so cast it to phys_addr_t and
then the driver works on both 32BIT and 64BIT platforms.
Also use readl() to read vec_count because readq() is only available on
64BIT platform.
[ tglx: Make the cast explicit and use the casted address as argument for
pch_pic_init() which takes a phys_addr_t as well. Fixup coding
style. More sigh... ]
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113085940.3344837-7-chenhuacai@loongson.cn
|
|
irq_domain_alloc_fwnode() takes a parameter with the phys_addr_t type.
Currently the code passe acpi_pchmsi->msg_address to it.
This can only work on 64BIT platform because its type is u64, so cast it to
phys_addr_t and then the driver works on both 32BIT and 64BIT platform.
[ tglx: Make the cast explicit and fixup coding style. ]
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113085940.3344837-6-chenhuacai@loongson.cn
|
|
irq_domain_alloc_fwnode() takes a parameter with the phys_addr_t type.
Currently the code passes acpi_htvec->address to it.
This can only work on 64BIT platform because its type is u64, so cast it to
phys_addr_t and then the driver works on both 32BIT and 64BIT platforms.
[ tglx: Dereference _after_ the NULL pointer check, make the cast explicit
and use the casted address as argument for htvec_init() which takes
a phys_addr_t as well. Sigh... ]
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113085940.3344837-5-chenhuacai@loongson.cn
|
|
iocsr_read64()/iocsr_write64() are only available on 64BIT LoongArch
platform, so add and use a pair of helpers, i.e. read_isr()/write_isr()
instead to make the driver work on both 32BIT and 64BIT platforms.
This makes eoiintc_enable() a no-op for 32-bit as it is only required on
64-bit systems.
[ tglx: Make the helpers inline and fixup the variable declaration order ]
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113085940.3344837-4-chenhuacai@loongson.cn
|
|
irq_domain_alloc_fwnode() takes a parameter with the phys_addr_t type.
Currently the code passes acpi_liointc->address to it.
This can only work on 64BIT platforms because its type is u64, so cast it to
phys_addr_t and then the driver works on both 32BIT and 64BIT platform.
[ tglx: Make the cast explicit and use the casted address as argument for
liointc_init() which takes a phys_addr_t as well. Sigh... ]
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113085940.3344837-3-chenhuacai@loongson.cn
|
|
csr_read64() is only available on 64BIT LoongArch platform, so use the
recently added adaptive csr_read() instead to make the driver work on both
32BIT and 64BIT platforms.
This makes avecintc_enable() a no-op for 32-bit as it is only required on
64-bit systems.
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113085940.3344837-2-chenhuacai@loongson.cn
|
|
* for-7.0/blk-pvec:
types: move phys_vec definition to common header
nvme-pci: Use size_t for length fields to handle larger sizes
|
|
Currently, the error path of amd_iommu_probe_device() unconditionally
references dev_data, which may not be initialized if an early failure
occurs (like iommu_init_device() fails).
Move the out_err label to ensure the function exits immediately on
failure without accessing potentially uninitialized dev_data.
Fixes: 19e5cc156cb ("iommu/amd: Enable support for up to 2K interrupts per function")
Cc: Rakuram Eswaran <rakuram.e96@gmail.com>
Cc: Jörg Rödel <joro@8bytes.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Add devicetree binding for watchdog present on Qualcomm's Glymur SoC
Signed-off-by: Pankaj Patil <pankaj.patil@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
Convert mpc83xx-wdt.txt to YAML to enable automatic schema validation.
Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
Introduce set_dte_nested() to program guest translation settings in
the host DTE when attaches the nested domain to a device.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Introduce the amd_iommu_set_dte_v1() helper function to configure
IOMMU host (v1) page table into DTE. This will be used later
when attaching nested doamin.
Also, remove obsolete warning when SNP is enabled and domain id
is zero since this check is no longer applicable.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
amd_iommu_make_clear_dte()
To help avoid duplicate logic when programing DTE for nested translation.
Note that this commit changes behavior of when the IOMMU driver is
switching domain during attach and the blocking domain, where DTE bit
fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT, ExtInt)
and System management message could be affected. These DTE bits are
specified in the IVRS table for specific devices, and should be persistent.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
invalidation
Each nested domain is assigned guest domain ID (gDomID), which guest OS
programs into guest Device Table Entry (gDTE). For each gDomID, the driver
assigns a corresponding host domain ID (hDomID), which will be programmed
into the host Device Table Entry (hDTE).
The hDomID is allocated during amd_iommu_alloc_domain_nested(),
and free during nested_domain_free(). The gDomID-to-hDomID mapping info
(struct guest_domain_mapping_info) is stored in a per-viommu xarray
(struct amd_iommu_viommu.gdomid_array), which is indexed by gDomID.
Note also that parent domain can be shared among struct iommufd_viommu.
Therefore, when hypervisor invalidates the nest parent domain, the AMD
IOMMU command INVALIDATE_IOMMU_PAGES must be issued for each hDomID in
the gdomid_array. This is handled by the iommu_flush_pages_v1_hdom_ids(),
where it iterates through struct protection_domain.viommu_list.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store
stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
table along with guest page tables. The struct iommu_hwpt_amd_guest
contains this information, and is passed from user-space as a parameter
of the struct iommu_ops.domain_alloc_nested().
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Binding defined two if:then: blocks covering different conditions but
not fully constraining the properties per each variant:
1. "if:" to require samsung,syscon-phandle,
2. "if:" with "else:" to narrow number of clocks and require or disallow
samsung,cluster-index.
This still did not cover following cases:
1. Disallow samsung,syscon-phandle when not applicable,
2. Narrow samsung,cluster-index to [0, 1], for SoCs with only two
clusters.
Solving this in current format would lead to spaghetti code, so re-write
entire "if:then:" approach into mutually exclusive cases so each SoC
appears only in one "if:" block. This allows to forbid
samsung,syscon-phandle for S3C6410, and narrow samsung,cluster-index
to [0, 1].
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
Which stores reference to nested parent domain assigned during the call to
struct iommu_ops.viommu_init(). Information in the nest parent is needed
when setting up the nested translation.
Note that the viommu initialization will be introduced in subsequent
commit.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|