| Age | Commit message (Collapse) | Author |
|
This was reported by the KUnit tests in the later patches.
See MS-ERREF 2.3.1 STATUS_UNABLE_TO_FREE_VM. Keep it consistent with the
value in the documentation.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This was reported by the KUnit tests in the later patches.
See MS-ERREF 2.3.1 STATUS_DEVICE_DOOR_OPEN. Keep it consistent with the
value in the documentation.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This was reported by the KUnit tests in the later patches.
See MS-ERREF 2.3.1 STATUS_NO_DATA_DETECTED. Keep it consistent with the
value in the documentation.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Pull csky updates from Guo Ren:
- Remove compile warning for CONFIG_SMP
- Fix __ASSEMBLER__ typo in headers
- Fix csky_cmpxchg_fixup
* tag 'csky-for-linus-6.19' of https://github.com/c-sky/csky-linux:
csky: Remove compile warning for CONFIG_SMP
csky: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi header
csky: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
csky: fix csky_cmpxchg_fixup not working
|
|
If irq_domain_translate_twocell() sets "hwirq" to >= MCHP_EIC_NIRQ (2) then
it results in an out of bounds access.
The code checks for invalid values, but doesn't set the error code. Return
-EINVAL in that case, instead of returning success.
Fixes: 00fa3461c86d ("irqchip/mchp-eic: Add support for the Microchip EIC")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Link: https://patch.msgid.link/aTfHmOz6IBpTIPU5@stanley.mountain
|
|
Explained why FileSystemName is always set to "NTFS".
Link: https://github.com/namjaejeon/ksmbd/commit/84392651b0b740d2f59bcacd3b4cfff8ae0051a0
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
KSMBD does not use these NT error code definitions. Instead, it uses the
SMB2 status code definitions defined in common/smb2status.h.
By the way, server/nterr.h contains the following additional definitions
compared to client/nterr.h:
- NT_STATUS_PENDING
- NT_STATUS_INVALID_LOCK_RANGE
- NT_STATUS_NETWORK_SESSION_EXPIRED
- NT_STATUS_NO_PREAUTH_INTEGRITY_HASH_OVERLAP
We can add them to client/nterr.h in the next patch.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Make the include guard more descriptive to avoid conflicts with include
guards that may be used in the future.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Place the file scripts/tracepoint-update.c in the TRACING section.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Link: https://patch.msgid.link/20251208192544.5f2392a7@debian
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
No in-tree users anymore.
[ tglx: Remove the reference in the Chinese documentation as well ]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251202202327.1444693-1-andriy.shevchenko@linux.intel.com
|
|
Building the KVM intel module failed to build with UT=1:
no __tracepoint_strings in file: arch/x86/kvm/kvm-intel.o
make[3]: *** [/work/git/test-linux.git/scripts/Makefile.modfinal:62: arch/x86/kvm/kvm-intel.ko] Error 1
The reason is that the module only uses the tracepoints defined and
exported by the main kvm module. The tracepoint-update.c code fails the
build if a tracepoint is used, but there's no tracepoints defined. But
this is acceptable in modules if the tracepoints are defined in the vmlinux
proper or another module and exported.
Do not fail to build if a tracepoint is used but no tracepoints are
defined if the code is a module. This should still never happen for the
vmlinux itself.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Link: https://patch.msgid.link/20251209204023.76941824@fedora
Fixes: e30f8e61e2518 ("tracing: Add a tracepoint verification check at build time")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
setup_percpu_irq() was forgotten when the percpu_devid infrastructure was
updated to deal with CPU affinities.
In order to keep ignoring users of this legacy API, provide sensible
defaults by setting the affinity to cpu_online_mask if none was provided by
the caller.
Fixes: bdf4e2ac295fe ("genirq: Allow per-cpu interrupt sharing for non-overlapping affinities")
Reported-by: Daniel Thompson <danielt@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251205091814.3944205-1-maz@kernel.org
Closes: https://lore.kernel.org/r/aTFozefMQRg7lYxh@aspen.lan
|
|
TDM channel updates were applied to all DAIs, causing configurations
to overwrite for unrelated streams. The logic is modified to update
channels only for targeted DAI. This prevents corruption of other DAI
settings and resolves audio issues observed during system suspend and
resume cycles.
Fixes: 12229b7e50cf ("ASoC: amd: acp: Add TDM support for acp i2s stream")
Signed-off-by: Hemalatha Pinnamreddy <hemalatha.pinnamreddy2@amd.com>
Signed-off-by: Raghavendra Prasad Mallela <raghavendraprasad.mallela@amd.com>
Link: https://patch.msgid.link/20251203120136.2591395-1-raghavendraprasad.mallela@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Using min_wait, two timeouts are given:
1) The min_wait timeout, within which up to 'wait_nr' events are
waited for.
2) The overall long timeout, which is entered if no events are generated
in the min_wait window.
If the min_wait has expired, any event being posted must wake the task.
For SQPOLL, that isn't the case, as it won't trigger the io_has_work()
condition, as it will have already processed the task_work that happened
when an event was posted. This causes any event to trigger post the
min_wait to not always cause the waiting application to wakeup, and
instead it will wait until the overall timeout has expired. This can be
shown in a test case that has a 1 second min_wait, with a 5 second
overall wait, even if an event triggers after 1.5 seconds:
axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring
info: MIN_TIMEOUT supported: true, features: 0x3ffff
info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4
info: 1 cqes in 5000.2ms
where the expected result should be:
axboe@m2max-kvm /d/iouring-mre (master)> zig-out/bin/iouring
info: MIN_TIMEOUT supported: true, features: 0x3ffff
info: Testing: min_wait=1000ms, timeout=5s, wait_nr=4
info: 1 cqes in 1500.3ms
When the min_wait timeout triggers, reset the number of completions
needed to wake the task. This should ensure that any future events will
wake the task, regardless of how many events it originally wanted to
wait for.
Reported-by: Tip ten Brink <tip@tenbrinkmeijs.com>
Cc: stable@vger.kernel.org
Fixes: 1100c4a2656d ("io_uring: add support for batch wait timeout")
Link: https://github.com/axboe/liburing/issues/1477
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Merge the two ksimd scopes in the implementation of SM4-XTS to prevent
stack bloat in cases where the compiler fails to combine the stack slots
for the kernel mode FP/SIMD buffers.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-6-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
The ciphertext stealing logic in the AES-XTS implementation creates a
separate ksimd scope to call into the FP/SIMD core routines, and in some
cases (CONFIG_KASAN_STACK is one, but there might be others), the 528
byte kernel mode FP/SIMD buffer that is allocated inside this scope is
not shared with the preceding ksimd scope, resulting in unnecessary
stack bloat.
Considering that
a) the XTS ciphertext stealing logic is never called for block
encryption use cases, and XTS is rarely used for anything else,
b) in the vast majority of cases, the entire input block is processed
during the first iteration of the loop,
we can combine both ksimd scopes into a single one with no practical
impact on how often/how long FP/SIMD is en/disabled, allowing us to
reuse the same stack slot for both FP/SIMD routine calls.
Fixes: ba3c1b3b5ac9 ("crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-5-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
As we're doing in the BLAKE2b code, use unrolled_full to make the
compiler handle the loop unrolling. This simplifies the code slightly.
The generated object code is nearly the same with both gcc and clang.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251205051155.25274-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
BLAKE2b has a state of 16 64-bit words. Add the message data in and
there are 32 64-bit words. With the current code where all the rounds
are unrolled to enable constant-folding of the blake2b_sigma values,
this results in a very large code size on 32-bit kernels, including a
recurring issue where gcc uses a large amount of stack.
There's just not much benefit to this unrolling when the code is already
so large. Let's roll up the rounds when !CONFIG_64BIT.
To avoid having to duplicate the code, just write the code once using a
loop, and conditionally use 'unrolled_full' from <linux/unroll.h>.
Then, fold the now-unneeded ROUND() macro into the loop. Finally, also
remove the now-unneeded override of the stack frame size warning.
Code size improvements for blake2b_compress_generic():
Size before (bytes) Size after (bytes)
------------------- ------------------
i386, gcc 27584 3632
i386, clang 18208 3248
arm32, gcc 19912 2860
arm32, clang 21336 3344
Running the BLAKE2b benchmark on a !CONFIG_64BIT kernel on an x86_64
processor shows a 16384B throughput change of 351 => 340 MB/s (gcc) or
442 MB/s => 375 MB/s (clang). So clearly not much of a slowdown either.
But also that microbenchmark also effectively disregards cache usage,
which is important in practice and is far better in the smaller code.
Note: If we rolled up the loop on x86_64 too, the change would be
7024 bytes => 1584 bytes and 1960 MB/s => 1396 MB/s (gcc), or
6848 bytes => 1696 bytes and 1920 MB/s => 1263 MB/s (clang).
Maybe still worth it, though not quite as clearly beneficial.
Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251205050330.89704-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Replace the RISCV_ISA_V dependency of the RISC-V crypto code with
RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS, which implies RISCV_ISA_V as
well as vector unaligned accesses being efficient.
This is necessary because this code assumes that vector unaligned
accesses are supported and are efficient. (It does so to avoid having
to use lots of extra vsetvli instructions to switch the element width
back and forth between 8 and either 32 or 64.)
This was omitted from the code originally just because the RISC-V kernel
support for detecting this feature didn't exist yet. Support has now
been added, but it's fragmented into per-CPU runtime detection, a
command-line parameter, and a kconfig option. The kconfig option is the
only reasonable way to do it, though, so let's just rely on that.
Fixes: eb24af5d7a05 ("crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}")
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Fixes: 600a3853dfa0 ("crypto: riscv - add vector crypto accelerated GHASH")
Fixes: 8c8e40470ffe ("crypto: riscv - add vector crypto accelerated SHA-{256,224}")
Fixes: b3415925a08b ("crypto: riscv - add vector crypto accelerated SHA-{512,384}")
Fixes: 563a5255afa2 ("crypto: riscv - add vector crypto accelerated SM3")
Fixes: b8d06352bbf3 ("crypto: riscv - add vector crypto accelerated SM4")
Cc: stable@vger.kernel.org
Reported-by: Vivian Wang <wangruikang@iscas.ac.cn>
Closes: https://lore.kernel.org/r/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/
Reviewed-by: Jerry Shih <jerry.shih@sifive.com>
Link: https://lore.kernel.org/r/20251206213750.81474-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
In chacha_zvkb, avoid using the s0 register, which is the frame pointer,
by reallocating KEY0 to t5. This makes stack traces available if e.g. a
crash happens in chacha_zvkb.
No frame pointer maintenance is otherwise required since this is a leaf
function.
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20251202-riscv-chacha_zvkb-fp-v2-1-7bd00098c9dc@iscas.ac.cn
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
- general cleanups in bcm2835, designware, pcf8584, and stm32
- amd-mp2: fix device refcount
- designware: avoid interrupt storms caused by bad firmware
- spacemit: fix device detection failures
- new devices: Intel Diamond Rapids, Rockchip RK3506, Qualcomm
Kaanapali and MSM8953
- minor fixes to i801, core documentation, elektor Kconfig dependencies
- at24 updates: add new compatible for Belling BL24S64
* tag 'i2c-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (21 commits)
i2c: qcom-cci: Add msm8953 compatible
i2c: spacemit: fix detect issue
i2c: amd-mp2: fix reference leak in MP2 PCI device
i2c: i2c.h: fix a bad kernel-doc line
i2c: i2c-elektor: Allow building on SMP kernels
dt-bindings: i2c: qcom-cci: Document Kaanapali compatible
dt-bindings: i2c: qcom-cci: Document msm8953 compatible
dt-bindings: eeprom: at24: Add compatible for Belling BL24S64
i2c: i801: Fix the Intel Diamond Rapids features
i2c: pcf8584: Change pcf_doAdress() to pcf_send_address()
i2c: pcf8584: Make pcf_doAddress() function void
i2c: pcf8584: Move 'ret' variable inside for loop, goto out if ret < 0.
i2c: designware: Disable SMBus interrupts to prevent storms from mis-configured firmware
dt-bindings: i2c: i2c-rk3x: Add compatible string for RK3506
i2c: i801: Add support for Intel Diamond Rapids
i2c: stm32: Omit two variable reassignments in stm32_i2c_dma_request()
i2c: designware: Omit a variable reassignment in dw_i2c_plat_probe()
i2c: pcf8584: Fix do not use assignment inside if conditional
i2c: pcf8584: Remove debug macros from i2c-algo-pcf.c
i2c: busses: bcm2835: convert from round_rate() to determine_rate()
...
|
|
v2 survivability breadcrumbs introduces a new mode called
SPI Flash Descriptor Override mode (FDO). This is enabled by
PCODE when MEI itself fails and firmware cannot be updated via
MEI using igsc. This mode provides the ability to update
the firmware directly via SPI driver.
Xe KMD initializes the nvm aux driver if FDO mode is enabled.
Userspace should check FDO mode entry in survivability info sysfs before
using the SPI driver to update firmware.
/sys/bus/pci/devices/<device>/survivability_info/fdo_mode
v2 also supports survivability mode for critical boot errors.
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251208084539.3652902-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Redesign survivability mode to have only one value per file.
1) Retain the survivability_mode sysfs to indicate the type
cat /sys/bus/pci/devices/0000\:03\:00.0/survivability_mode
(Boot / Runtime)
2) Add survivability_info directory to expose boot breadcrumbs.
Entries in survivability mode sysfs are only visible when
boot breadcrumb registers are populated.
/sys/bus/pci/devices/0000:03:00.0/survivability_info
├── aux_info0
├── aux_info1
├── aux_info2
├── aux_info3
├── aux_info4
├── capability_info
├── postcode_trace
└── postcode_trace_overflow
Capability Info:
Provides data about boot status and has bits that
indicate the support for the other breadcrumbs
Postcode Trace / Postcode Trace Overflow :
Each postcode is represented as an 8-bit value and represents
a boot failure event. When a new failure event is logged by Pcode
the existing postcodes are shifted left. These entries provide a
history of 8 postcodes.
Auxiliary Info:
Some failures have additional debug information.
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251208084539.3652902-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:
- acer-wmi: Add PH16-72, PHN16-72, and PT14-51 fan control support
- acpi: platform_profile: Add max-power profile option (power draw
limited by the cooling hardware, may exceed battery power draw limit
when on AC power)
- amd/hsmp: Allow more than one data-fabric per socket
- asus-armoury: Add WMI attributes driver to expose miscellaneous WMI
functions through fw_attributes (deprecates the custom BIOS features
interface through asus-wmi)
- asus-wmi: Use brightness_set_blocking() for kbd led
- ayaneo-ec: Add Ayaneo Embedded Controller driver
- fs/nls:
- Fix utf16 to utf8 string conversion when output size restricted
- Improve error code consistency for utf8 to utf32 conversions
- ideapad-laptop: Fast (Rapid Charge) charge type support
- intel/hid: Add Dell Pro Rugged 10/12 tablet to VGBS DMI quirks
- intel/pmc:
- Arrow Lake telemetry GUID improvements
- Add support for Wildcat Lake PMC information
- intel_pmc_ipc: Fix ACPI buffer memleak
- intel/punit_ipc: Fix memory corruption
- intel/vsec: Wildcat Lake PMT telemetry support
- lenovo-wmi-gamezone: Map "Extreme" performance mode to max-power
- lg-laptop: Add support for the HDAP opregion field
- serial-multi-instantiate: Add IRQ_RESOURCE_OPT for IRQ missing
projects
- thinkpad-t14s-ec: Improve suspend/resume support (lid LEDs, keyboard
backlight)
- uniwill: Add Uniwill laptop driver
- wmi: Move under drivers/platform/wmi as non-x86 WMI support is around
the corner and other WMI features will require adding more C files as
well
- tools/power/x86/intel-speed-select: v1.24
- Check feature status to check if the feature enablement was
successful
- Reset SST-TF bucket structure to display valid bucket info
- Miscellaneous cleanups / refactoring / improvements
* tag 'platform-drivers-x86-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (73 commits)
tools/power/x86/intel-speed-select: v1.24 release
tools/power/x86/intel-speed-select: Reset isst_turbo_freq_info for invalid buckets
tools/power/x86/intel-speed-select: Check feature status
platform/x86: asus-wmi: use brightness_set_blocking() for kbd led
fs/nls: Fix inconsistency between utf8_to_utf32() and utf32_to_utf8()
platform/x86: asus-armoury: add support for GA503QR
platform/x86: intel_pmc_ipc: fix ACPI buffer memory leak
platform/x86: hp-wmi: Order DMI board name arrays
platform/x86/intel/hid: Add Dell Pro Rugged 10/12 tablet to VGBS DMI quirks
platform: surface: replace use of system_wq with system_percpu_wq
platform: x86: replace use of system_wq with system_percpu_wq
platform/surface: acpi-notify: add WQ_PERCPU to alloc_workqueue users
platform/x86: wmi-gamezone: Add Legion Go 2 Quirks
platform/x86: lenovo-wmi-gamezone Use max-power rather than balanced-performance
acpi: platform_profile - Add max-power profile option
platform/x86/amd/pmf: Use devm_mutex_init() for mutex initialization
platform/x86/amd/pmf: Add BIOS_INPUTS_MAX macro to replace hardcoded array size
platform/x86: serial-multi-instantiate: Add IRQ_RESOURCE_OPT for IRQ missing projects
platform/x86/amd/pmf: Refactor repetitive BIOS output handling
platform/x86/uniwill: Add TUXEDO devices
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"Fix a runtime PM unit test added during the 6.18 development cycle and
change the pm_runtime_barrier() return type to void (Brian Norris)"
* tag 'pm-6.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
coccinelle: Drop pm_runtime_barrier() error code checks
PM: runtime: Make pm_runtime_barrier() return void
PM: runtime: Stop checking pm_runtime_barrier() return code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
"Just cleanups and fixes"
* tag 'mips_6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Fix whitespace damage in r4k_wait from VS timer fix
mips: kvm: simplify kvm_mips_deliver_interrupts()
MIPS: alchemy: mtx1: switch to static device properties
mips: Remove __GFP_HIGHMEM masking
MIPS: ftrace: Fix memory corruption when kernel is located beyond 32 bits
MIPS: dts: Always descend vendor subdirectories
mips: configs: loongson1: Update defconfig
MIPS: Fix HOTPLUG_PARALLEL dependency
|
|
Add a cond_lock annotation for lockref_put_or_lock to make sparse
happy with using it. Note that for this the return value has to be
double-inverted as the return value convention of lockref_put_or_lock
is inverted compared to _trylock conventions expected by __cond_lock,
as lockref_put_or_lock returns true when it did not need to take the
lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-auto
Pull __auto_type to auto conversion from Peter Anvin:
"Convert '__auto_type' to 'auto', defining a macro for 'auto' unless
C23+ is in use"
* tag 'auto-type-conversion-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-auto:
tools/virtio: replace "__auto_type" with "auto"
selftests/bpf: replace "__auto_type" with "auto"
arch/x86: replace "__auto_type" with "auto"
arch/nios2: replace "__auto_type" and adjacent equivalent with "auto"
fs/proc: replace "__auto_type" with "const auto"
include/linux: change "__auto_type" to "auto"
compiler_types.h: add "auto" as a macro for "__auto_type"
|
|
When autosuspend is triggered, driver rpm_on flag is set to indicate that
a suspend/resume is already in progress. However, when a userspace
application submits a command during this narrow window,
amdxdna_pm_resume_get() may incorrectly skip the resume operation because
the rpm_on flag is still set. This results in commands being submitted
while the device has not actually resumed, causing unexpected behavior.
The set_dpm() is called by suspend/resume, it relied on rpm_on flag to
avoid calling into rpm suspend/resume recursivly. So to fix this, remove
the use of the rpm_on flag entirely. Instead, introduce aie2_pm_set_dpm()
which explicitly resumes the device before invoking set_dpm(). With this
change, set_dpm() is called directly inside the suspend or resume execution
path. Otherwise, aie2_pm_set_dpm() is called.
Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251208165356.1549237-1-lizhi.hou@amd.com
|
|
If wait for ring space started just before migration, it can delay
the recovery process, by waiting without bailout path for up to 2
seconds.
Two second wait for recovery is not acceptable, and if the ring was
completely filled even without the migration temporarily stopping
execution, then such a wait will result in up to a thousand new jobs
(assuming constant flow) being added while the wait is happening.
While this will not cause data corruption, it will lead to warning
messages getting logged due to reset being scheduled on a GT under
recovery. Also several seconds of unresponsiveness, as the backlog
of jobs gets progressively executed.
Add a bailout condition, to make sure the recovery starts without
much delay. The recovery is expected to finish in about 100 ms when
under moderate stress, so the condition verification period needs to be
below that - settling at 64 ms.
The theoretical max time which the recovery can take depends on how
many requests can be emitted to engine rings and be pending execution.
While stress testing, it was possible to reach 10k pending requests
on rings when a platform with two GTs was used. This resulted in max
recovery time of 5 seconds. But in real life situations, it is very
unlikely that the amount of pending requests will ever exceed 100,
and for that the recovery time will be around 50 ms - well within our
claimed limit of 100ms.
Fixes: a4dae94aad6a ("drm/xe/vf: Wakeup in GuC backend on VF post migration recovery")
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251204200820.2206168-1-tomasz.lis@intel.com
|
|
The newly added damos_test_commit() constructs multiple large structures
on the stack, which exceeds the warning limit in some cases:
In file included from mm/damon/core.c:2941:
mm/damon/tests/core-kunit.h: In function 'damos_test_commit':
mm/damon/tests/core-kunit.h:965:1: error: the frame size of 1520 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
Split this function up into two separate ones that are called
sequentially, so they can occupy the same stack slots.
Link: https://lkml.kernel.org/r/20251204100403.1034980-1-arnd@kernel.org
Fixes: 299a88f6ec13 ("mm/damon/tests/core-kunit: add damos_commit() test")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Quanmin Yan <yanquanmin1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When enabling vmscan tracing, it is observed that nr_requested is always
4096, which is confusing.
mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
mm_vmscan_lru_isolate: classzone=3 order=0 nr_requested=4096 ...
This is because it prints MAX_LRU_BATCH, which is meaningless as it's a
constant. To fix this, modify it to print capped valued.
Link: https://lkml.kernel.org/r/20251204122355.1822919-1-chenridong@huaweicloud.com
Fixes: 8c2214fc9a47 ("mm: multi-gen LRU: reuse some legacy trace events")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Jaewon Kim <jaewon31.kim@samsung.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Lu Jialin <lujialin4@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The files in Documentation/core-api/ are by virtue of their top-level
directory part of the Documentation section in MAINTAINERS. Each file in
Documentation/core-api/ should however also have a further section in
MAINTAINERS it belongs to, which fits to the technical area of the
documented API in that file.
idr.rst provides some explanation to the ID allocation API defined in
lib/idr.c, which itself is part of the XARRAY section.
Add this core-api document to XARRAY.
Link: https://lkml.kernel.org/r/20251105105857.156950-1-lukas.bulwahn@redhat.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The function hugetlb_reserve_pages() returns the number of pages added
to the reservation map on success and a negative error code on failure
(e.g. -EINVAL, -ENOMEM). However, in some error paths, it may return -1
directly.
For example, a failure at:
if (hugetlb_acct_memory(h, gbl_reserve) < 0)
goto out_put_pages;
results in returning -1 (since add = -1), which may be misinterpreted
in userspace as -EPERM.
Fix this by explicitly capturing and propagating the return values from
helper functions, and using -EINVAL for all other failure cases.
Link: https://lkml.kernel.org/r/20251125171350.86441-1-skolothumtho@nvidia.com
Fixes: 986f5f2b4be3 ("mm/hugetlb: make hugetlb_reserve_pages() return nr of entries updated")
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 2b6a3f061f11 ("mm: declare VMA flags by bit") significantly
refactors the header file include/linux/mm.h. In that step, it introduces
a typo in an ifdef, referring to a non-existing config option
STACK_GROWS_UP, whereas the actual config option is called STACK_GROWSUP.
Fix this typo in the mm header file.
Link: https://lkml.kernel.org/r/20251201122922.352480-1-lukas.bulwahn@redhat.com
Fixes: 2b6a3f061f11 ("mm: declare VMA flags by bit")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The "return <error code>" statements for error checks at the beginning of
__folio_split() skip necessary count_vm_event() and count_mthp_stat() at
the end of the function. Fix these by replacing them with "ret = <error
code>; goto out;".
Link: https://lkml.kernel.org/r/20251126210618.1971206-5-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
min_order_for_split() returns -EBUSY when the folio is truncated and
cannot be split. In commit 77008e1b2ef7 ("mm/huge_memory: do not change
split_huge_page*() target order silently"), memory_failure() does not
handle it and pass -EBUSY to try_to_split_thp_page() directly.
try_to_split_thp_page() returns -EINVAL since -EBUSY becomes 0xfffffff0 as
new_order is unsigned int in __folio_split() and this large new_order is
rejected as an invalid input. The code does not cause a bug.
soft_offline_in_use_page() also uses min_order_for_split() but it always
passes 0 as new_order for split.
Fix it by making min_order_for_split() always return an order. When the
given folio is truncated, namely folio->mapping == NULL, return 0 and let
a subsequent split function handle the situation and return -EBUSY.
Add kernel-doc to min_order_for_split() to clarify its use.
Link: https://lkml.kernel.org/r/20251126210618.1971206-4-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
can_split_folio() is just a refcount comparison, making sure only the
split caller holds an extra pin. Open code it with
folio_expected_ref_count() != folio_ref_count() - 1. For the extra_pins
used by folio_ref_freeze(), add folio_cache_ref_count() to calculate it.
Also replace folio_expected_ref_count() with folio_cache_ref_count() used
by folio_ref_unfreeze(), since they are returning the same values when a
folio is frozen and folio_cache_ref_count() does not have unnecessary
folio_mapcount() in its implementation.
Link: https://lkml.kernel.org/r/20251126210618.1971206-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Improve folio split related functions", v4.
This patchset improves several folio split related functions to avoid
future misuse. The changes are:
1. Consolidated folio splittable checks by moving truncated folio check,
huge zero folio check, and writeback folio check into
folio_split_supported(). Changed the function return type. Renamed it
to folio_check_splittable() for clarification.
2. Replaced can_split_folio() with open coded folio_expected_ref_count()
and folio_ref_count() and introduced folio_cache_ref_count().
3. Changed min_order_for_split() to always return an order.
4. Fixed folio split stats counting.
Motivation
==========
This is based on Wei's observation[1] and solves several potential
issues:
1. Dereferencing NULL folio->mapping in try_folio_split_to_order() if it
is called on truncated folios.
2. Not handling of negative return value of min_order_for_split() in
mm/memory-failure.c
There is no bug in the current code.
This patch (of 4):
folio_split_supported() used in try_folio_split_to_order() requires
folio->mapping to be non NULL, but current try_folio_split_to_order() does
not check it. There is no issue in the current code, since
try_folio_split_to_order() is only used in truncate_inode_partial_folio(),
where folio->mapping is not NULL.
To prevent future misuse, move folio->mapping NULL check (i.e., folio is
truncated) into folio_split_supported(). Since folio->mapping NULL check
returns -EBUSY and folio_split_supported() == false means -EINVAL, change
folio_split_supported() return type from bool to int and return error
numbers accordingly. Rename folio_split_supported() to
folio_check_splittable() to match the return type change.
While at it, move is_huge_zero_folio() check and folio_test_writeback()
check into folio_check_splittable() and add kernel-doc.
Remove all warnings inside folio_check_splittable() and give warnings
in __folio_split() instead, so that bool warns parameter can be removed.
Link: https://lkml.kernel.org/r/20251126210618.1971206-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20251126210618.1971206-2-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When CONFIG_SPARSEMEM is disabled, the macro
sparse_vmemmap_init_nid_early(_nid, _use) passes two arguments, while the
actual function accepts only nid. Drop the extra argument _use.
Link: https://lkml.kernel.org/r/20251127092512.278-1-guojinhui.liam@bytedance.com
Fixes: d65917c42373 ("mm/sparse: allow for alternate vmemmap section init at boot")
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Let's properly adjust BALLOON_MIGRATE like the other drivers.
Note that the INFLATE/DEFLATE events are triggered from the core when
enqueueing/dequeueing pages.
This was found by code inspection.
Link: https://lkml.kernel.org/r/20251021100606.148294-3-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
CONFIG_BALLOON_COMPACTION
Patch series "powerpc/pseries/cmm: two smaller fixes".
Two smaller fixes identified while doing a bigger rework.
This patch (of 2):
We always have to initialize the balloon_dev_info, even when compaction is
not configured in: otherwise the containing list and the lock are left
uninitialized.
Likely not many such configs exist in practice, but let's CC stable to
be sure.
This was found by code inspection.
Link: https://lkml.kernel.org/r/20251021100606.148294-1-david@redhat.com
Link: https://lkml.kernel.org/r/20251021100606.148294-2-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The xe3p_xpc platform supports Indirect Ring State and
it is required for the upcoming multi-queue feature.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251204063451.1180387-2-niranjana.vishwanathapura@intel.com
|
|
Initialize rzg3s_pcie_msi_irq() MSI status bitmap before use.
Fixes: 7ef502fb35b2 ("PCI: Add Renesas RZ/G3S host controller driver")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/all/202512070218.XVMUQCl7-lkp@intel.com
Closes: https://lore.kernel.org/all/202512061812.Xbqmd2Gn-lkp@intel.com
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251209125122.304129-1-claudiu.beznea.uj@bp.renesas.com
|
|
Consider a multi-link AP configuration:
MLD vif (MAC addr: aa:bb)
|-- 2.4 GHz link (BSSID: aa:bb)
|-- 5 GHz link (BSSID: cc:dd)
For AP vdevs, ath12k creates a DP peer using the arvif's BSSID and stores
it in dp_hw->dp_peers_list. During scan operations, the driver assigns an
arvif to the scan vdev and uses the vif's MAC address as its BSSID. In
the above scenario, the scan vdev MAC address (aa:bb) matches the BSSID
of the 2.4 GHz AP link, causing a duplicate entry in dp_hw->dp_peers_list
and leading to scan vdev creation failure.
Failure in vif bringup sequence:
1. Create AP vdev for 2.4 GHz link:
- Assign arvif with BSSID = aa:bb and link_id = 0.
- Create DP peer with address aa:bb and add to dp_hw->dp_peers_list.
2. Create scan vdev for 5 GHz link:
- Assign arvif with BSSID = aa:bb (same as vif MAC address) and
link_id = 15.
- Attempt to create another DP peer with address aa:bb.
- Operation fails because aa:bb already exists in dp_hw->dp_peers_list,
resulting in duplicate entry conflict.
3. Delete scan vdev for 5 GHz link.
4. Create AP vdev for 5 GHz link.
Since DP peer is not needed for scan operations, identify scan vdev using
arvif->link_id >= IEEE80211_MLD_MAX_NUM_LINKS and skip DP peer creation
and deletion.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1
Signed-off-by: Ripan Deuri <quic_rdeuri@quicinc.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20251207072717.95542-1-quic_rdeuri@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
blk_mq_{add,del}_queue_tag_set() functions add and remove queues from
tagset, the functions make sure that tagset and queues are marked as
shared when two or more queues are attached to the same tagset.
Initially a tagset starts as unshared and when the number of added
queues reaches two, blk_mq_add_queue_tag_set() marks it as shared along
with all the queues attached to it. When the number of attached queues
drops to 1 blk_mq_del_queue_tag_set() need to mark both the tagset and
the remaining queues as unshared.
Both functions need to freeze current queues in tagset before setting on
unsetting BLK_MQ_F_TAG_QUEUE_SHARED flag. While doing so, both functions
hold set->tag_list_lock mutex, which makes sense as we do not want
queues to be added or deleted in the process. This used to work fine
until commit 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
made the nvme driver quiesce tagset instead of quiscing individual
queues. blk_mq_quiesce_tagset() does the job and quiesce the queues in
set->tag_list while holding set->tag_list_lock also.
This results in deadlock between two threads with these stacktraces:
__schedule+0x47c/0xbb0
? timerqueue_add+0x66/0xb0
schedule+0x1c/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.constprop.0+0x271/0x600
blk_mq_quiesce_tagset+0x25/0xc0
nvme_dev_disable+0x9c/0x250
nvme_timeout+0x1fc/0x520
blk_mq_handle_expired+0x5c/0x90
bt_iter+0x7e/0x90
blk_mq_queue_tag_busy_iter+0x27e/0x550
? __blk_mq_complete_request_remote+0x10/0x10
? __blk_mq_complete_request_remote+0x10/0x10
? __call_rcu_common.constprop.0+0x1c0/0x210
blk_mq_timeout_work+0x12d/0x170
process_one_work+0x12e/0x2d0
worker_thread+0x288/0x3a0
? rescuer_thread+0x480/0x480
kthread+0xb8/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20
__schedule+0x47c/0xbb0
? xas_find+0x161/0x1a0
schedule+0x1c/0xa0
blk_mq_freeze_queue_wait+0x3d/0x70
? destroy_sched_domains_rcu+0x30/0x30
blk_mq_update_tag_set_shared+0x44/0x80
blk_mq_exit_queue+0x141/0x150
del_gendisk+0x25a/0x2d0
nvme_ns_remove+0xc9/0x170
nvme_remove_namespaces+0xc7/0x100
nvme_remove+0x62/0x150
pci_device_remove+0x23/0x60
device_release_driver_internal+0x159/0x200
unbind_store+0x99/0xa0
kernfs_fop_write_iter+0x112/0x1e0
vfs_write+0x2b1/0x3d0
ksys_write+0x4e/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
The top stacktrace is showing nvme_timeout() called to handle nvme
command timeout. timeout handler is trying to disable the controller and
as a first step, it needs to blk_mq_quiesce_tagset() to tell blk-mq not
to call queue callback handlers. The thread is stuck waiting for
set->tag_list_lock as it tries to walk the queues in set->tag_list.
The lock is held by the second thread in the bottom stack which is
waiting for one of queues to be frozen. The queue usage counter will
drop to zero after nvme_timeout() finishes, and this will not happen
because the thread will wait for this mutex forever.
Given that [un]quiescing queue is an operation that does not need to
sleep, update blk_mq_[un]quiesce_tagset() to use RCU instead of taking
set->tag_list_lock, update blk_mq_{add,del}_queue_tag_set() to use RCU
safe list operations. Also, delete INIT_LIST_HEAD(&q->tag_set_list)
in blk_mq_del_queue_tag_set() because we can not re-initialize it while
the list is being traversed under RCU. The deleted queue will not be
added/deleted to/from a tagset and it will be freed in blk_free_queue()
after the end of RCU grace period.
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__bio_for_each_segment() uses the returned struct bio_vec's bv_len field
to advance the struct bvec_iter at the end of each loop iteration. So
it's incorrect to modify it during the loop. Don't assign to bv_len (or
bv_offset, for that matter) in ublk_copy_user_pages().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: e87d66ab27ac ("ublk: use rq_for_each_segment() for user copy")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now that all potential callers of bio_chain_endio have been
eliminated, completely prohibit any future calls to this function.
Suggested-by: Ming Lei <ming.lei@redhat.com>
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't call bio->bi_end_io() directly. Use the bio_endio() helper
function instead, which handles completion more safely and uniformly.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
GCC notices that the 16-byte uabi_name field could theoretically be too
small for the formatted string if the instance number exceeds 100.
So grow the field to 20 bytes.
drivers/gpu/drm/i915/intel_memory_region.c: In function ‘intel_memory_region_create’:
drivers/gpu/drm/i915/intel_memory_region.c:273:61: error: ‘%u’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 3 and 11 [-Werror=format-truncation=]
273 | snprintf(mem->uabi_name, sizeof(mem->uabi_name), "%s%u",
| ^~
drivers/gpu/drm/i915/intel_memory_region.c:273:58: note: directive argument in the range [0, 65535]
273 | snprintf(mem->uabi_name, sizeof(mem->uabi_name), "%s%u",
| ^~~~~~
drivers/gpu/drm/i915/intel_memory_region.c:273:9: note: ‘snprintf’ output between 7 and 19 bytes into a destination of size 16
273 | snprintf(mem->uabi_name, sizeof(mem->uabi_name), "%s%u",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
274 | intel_memory_type_str(type), instance);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 3b38d3515753 ("drm/i915: Add stable memory region names")
Cc: <stable@vger.kernel.org> # v6.8+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20251205113500.684286-2-ardb@kernel.org
(cherry picked from commit 18476087f1a18dc279d200d934ad94fba1fb51d5)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|