| Age | Commit message (Collapse) | Author |
|
Drop 27.2 MHz PLL table as with these PLL dividers
the port clock will be incorrectly calculated to 27.0 MHz.
For 27.2 MHz rate the PLl dividers are calculated
algorithmically making PLL table for this rate redundant.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-15-mika.kahola@intel.com
|
|
Drop C20 25.175 MHz PLL table as with these
PLL dividers the port clock will be incorrectly
calculated to 25.2 MHz. For 25.175 MHz rate the
PLl dividers are calculated algorithmically making
PLL table for this rate redundant.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-14-mika.kahola@intel.com
|
|
Add verification for lt phy pll dividers during boot. The port clock
is calculated from pll dividers and compared against the requested
port clock value. If there are a difference exceeding +-1 kHz an
drm_warn() is thrown out to indicate possible pll divider mismatch.
v2:
- Move the LT_PHY_PLL_PARAMS -> LT_PHY_PLL_DP/HDMI_PARAMS change
earlier.
- Use tables[i].name != NULL as a terminating condition.
- Use state vs. params term consistently in intel_c10pll_verify_clock()
and intel_c20pll_verify_clock().
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-13-mika.kahola@intel.com
|
|
Add verification for pll table dividers. The port clock
is computed based on pll tables and, for hdmi case, the
algorithmic model is applied to calculate pll dividers.
If port clock differs more than +-1 kHz from expected value
an drm_warn() is thrown and pll divider differences are
printed out for debugging purposes.
v2:
- Move clock derivation from dividers in intel_cx0pll_enable()
earlier in the patchset.
- Keep intel_cx0_pll_power_save_wa() in intel_dpll_sanitize_state()
- Use tables[i].name != NULL as a terminating condition.
- Drop duplicate intel_cx0pll_clock_matches() declaration in header.
- Use state vs. params term consistently in intel_c10pll_verify_clock()
and intel_c20pll_verify_clock().
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-12-mika.kahola@intel.com
|
|
Since the clock rate is derived from the PLL divider values it can have
a +-1kHz difference wrt. the reference rates in the comparison
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-11-mika.kahola@intel.com
|
|
HDMI FRL clock rates are incorrectly defined. Fix these
rates.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-10-mika.kahola@intel.com
|
|
The hard coded clock rate stored in the PLL state will be removed by
a follow-up change. The clock is calculated instead of
using clock from the PLL divider values. Since this calculated clock
may vary due to fixed point rounding issues, a +-1 kHz variation is
allowed with the request clock rate against the calculated clock rate.
v2:
- Use the stricter +-1 kHz allowed difference.
- Derive the clock from PLL dividers in intel_cx0pll_enable().
- Move corresponding fuzzy check for LT PHY PLLs to this patch.
v3: Reword commit message (Suraj)
Move clock check to intel_dpll and rename it (Suraj)
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-9-mika.kahola@intel.com
|
|
Create a macro for PLL state for LT PHY similar as
for cx0 case.
v2:
- Move addition of LT_PHY_PLL_DP/HDMI_PARAMS() to this patch.
- Fix end of table checking while looking up a table.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-8-mika.kahola@intel.com
|
|
Create macro for storing PLL dividers with table name
and clock rate.
v2: Preserve the terminating {} in each PLL table.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-7-mika.kahola@intel.com
|
|
For C10 and C20 we have unused encoder parameter passed
to port clock calculation function. Remove the encoder from
being passed to the port clock calculation function.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-6-mika.kahola@intel.com
|
|
Drop crtc_state from intel_lt_phy_calc_port_clock() function call
and replace it with pll state instead. Follow-up changes will
call these functions without a crtc_state available.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-5-mika.kahola@intel.com
|
|
Drop crtc_state from HDMI TMDS calculation and replace with the
parameters that are only required. Follow-up changes will call
these functions without a crtc_state available.
v2: Keep required crtc_state param for intel_c20_pll_tables_get()
and other functions calling this one.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-4-mika.kahola@intel.com
|
|
Prepare removal of .clock member from the pll state
structure by moving intel_c20pll_calc_port_clock()
function.
No functional change.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-3-mika.kahola@intel.com
|
|
Prepare removal of .clock member from pll state
structures by moving intel_c10pll_calc_port_clock()
function.
No functional changes.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260119093757.2850233-2-mika.kahola@intel.com
|
|
On error handling paths, lineinfo_changed_notify() doesn't free the
allocated resources which results leaks. Fix it.
Cc: stable@vger.kernel.org
Fixes: d4cd0902c156 ("gpio: cdev: make sure the cdev fd is still active before emitting events")
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://lore.kernel.org/r/20260120030857.2144847-1-tzungbi@kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
-ENOMEM is a more appropriate return code for memory allocation
failures. Correct it.
Cc: stable@vger.kernel.org
Fixes: 20bddcb40b2b ("gpiolib: cdev: replace locking wrappers for gpio_device with guards")
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://lore.kernel.org/r/20260116081036.352286-6-tzungbi@kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
Add options/u-boot/boot-led property to specify to U-Boot
the LED which indicates a successful boot.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-10-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-11-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-12-50a3a9b339a8@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
The L2 power-on sequence and soft reset in Tyr previously relied on fixed
sleeps followed by a single register check, since polling helpers were not
available in Rust at the time.
Now that read_poll_timeout() is available, poll the relevant registers
until the hardware reports readiness or a timeout is reached.
This avoids unnecessary delays on start-up.
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260119202645.362457-1-deborah.brouwer@collabora.com
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
|
|
The `..IRQ..` register is printed here. Not the `..INT..` one.
Correct this.
Cc: stable@vger.kernel.org
Fixes: cf4fd52e3236 ("rust: drm: Introduce the Tyr driver for Arm Mali GPUs")
Link: https://lore.kernel.org/rust-for-linux/A04F0357-896E-4ACC-BC0E-DEE8608CE518@collabora.com/
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Link: https://patch.msgid.link/20260119070838.3219739-1-dirk.behme@de.bosch.com
[aliceryhl: update commit message prefix]
[aliceryhl: add cc stable as per Miguel's suggestion]
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
|
|
Add options/u-boot/boot-led property to specify to U-Boot
the LED which indicates a successful boot.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-1-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-2-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-3-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-4-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-5-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-6-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-7-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-8-50a3a9b339a8@foss.st.com
Link: https://lore.kernel.org/r/20251112-upstream_add_boot-led_for_stm32_boards-v1-9-50a3a9b339a8@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Add green and orange LED support on stm32mp235f-dk board.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-14-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-15-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-16-45090db9e2e5@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Add led-red node for stm32mp15xx-dkx, this LED is used as status
LED in U-Boot.
Update led-blue node by adding color property and replacing obsolete
label property by function property.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-13-45090db9e2e5@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Add led-red node for stm32mp157c-ed1.
This LED is used as status LED in U-Boot.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-12-45090db9e2e5@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Add LED red node for stm32mp135f-dk.
This LED is used as status lLED in U-Boot.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-11-45090db9e2e5@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Add green and red LEDs support for stm32h743-eval.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-9-45090db9e2e5@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Add gpio led support for LED green,orange,red and blue
in stm32h743i-disco.dts.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-8-45090db9e2e5@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Add function porperty for led nodes.
Add LED color property for LED nodes.
Reorder include dt-bindings.
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-2-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-3-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-4-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-5-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-6-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-7-45090db9e2e5@foss.st.com
Link: https://lore.kernel.org/r/20251113-upstream_update_led_nodes-v2-10-45090db9e2e5@foss.st.com
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
|
|
Intel Nova Lake is supported by the generic platform driver,
so add it to the list of supported in Kconfig.
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
|
|
The Radxa Zero2 has an FUSB302 controller on i2c3 at address 0x22 and
INT# wired to GPIOA-13; include a minimal connector.
Signed-off-by: Ricardo Pardini <ricardo@pardini.net>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260115-arm64-dts-amlogic-radxa-zero2-additions-v2-1-948bb0479a45@pardini.net
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
|
|
during boot.
Fix issue with Odroid HC4 (and all meson-sm1-odroid) DTS that causes
regulator power to momentarily glitch OFF-ON during boot. Add
regulator-boot-on to all regulator-fixed and regulator-gpio entries
that (1) define a gpio AND (2) define regulator-always-on.
U-boot powers on devices necessary for boot then hands off the DTB to
the kernel. During probe, linux drivers/regulator/fixed.c and
gpio-regulator.c both first set the regulator control gpio (that U-boot
already turned ON) to default OFF before then setting it to the defined
(ON) state. This glitches the power to the affected devices, unless
regulator-boot-on is specified with it. In fact, U-boot has the same
behavior. So, during reboot, a power glitch can actually happen twice:
once when U-boot reads the DTB and probes the gpio and again when the
kernel reads the DTB and probes the gpio.
Problem this fixes: On the Odroid HC4, power to the SATA ports glitches
during boot and causes some HDDs to do emergency head retract, which
should be avoided. On the HC4, power glitches to the SD card, USB,
SATA, and HDMI interfaces during boot. These are all boot devices.
A power glitch can potentially cause a problem for any sensitive devices
during boot.
NOTE: This is not limited to just the HC4, likely an issue with ALL DTS
with regulator-fixed or regulator-gpio entries that (1) define a gpio
AND (2) define regulator-always-on. All such entries should also
include regulator-boot-on in order to avoid potential power glitches.
At worst, adding regulator-boot-on in such cases is harmless because of
regulator-always-on, and, at best, it eliminates detrimental power
glitches during boot. So, this is best-practice.
Fixes: 164147f094ec5d0fc2c2098a888f4b50cf3096a7 ("arm64: dts: meson-sm1-odroid-hc4: add regulators controlled by GPIOH_8")
Fixes: 45d736ab17b44257e15e75e0dba364139fdb0983 ("arm64: dts: meson-sm1-odroid: add 5v regulator gpio")
Fixes: 1f80a5cf74a60997b92d2cde772edec093bec4d9 ("arm64: dts: meson-sm1-odroid: add missing enable gpio and supply for tf_io regulator")
Fixes: 88d537bc92ca035e2a9920b0abc750dd62146520 ("arm64: dts: meson: convert meson-sm1-odroid-c4 to dtsi")
Signed-off-by: Eric Neulight <Eric.Neulight@linuxdev.slmail.me>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Acked-by: Viacheslav Bocharov <v@baodeep.com>
Tested-by: Ricardo Pardini <ricardo@pardini.net> # on Odroid-HC4 5V HDD
Link: https://patch.msgid.link/20260116-odroid-hc4-dts-v1-1-459b601cd5cf@linuxdev.slmail.me
[narmstrong: fixed subject prefix]
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
|
|
Enable the on-board eMMC storage for Khadas VIM1S.
The VIM1S features a 16GB eMMC 5.1 module. This patch adds the
necessary regulators and the eMMC controller node.
Signed-off-by: Nick Xie <nick@khadas.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20260116023611.2033078-1-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
|
|
authencesn assumes an ESP/ESN-formatted AAD. When assoclen is shorter than
the minimum expected length, crypto_authenc_esn_decrypt() can advance past
the end of the destination scatterlist and trigger a NULL pointer dereference
in scatterwalk_map_and_copy(), leading to a kernel panic (DoS).
Add a minimum AAD length check to fail fast on invalid inputs.
Fixes: 104880a6b470 ("crypto: authencesn - Convert to new AEAD interface")
Reported-By: Taeyang Lee <0wn@theori.io>
Signed-off-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
During initialization of DRM buddy memory manager at drm_buddy_init,
mm->free_trees array is allocated for both clear and dirty RB trees.
During cleanup happening at drm_buddy_fini it is never freed, leading to
following memory leaks observed on xe module load & unload cycles:
kmemleak_alloc+0x4a/0x90
__kmalloc_cache_noprof+0x488/0x800
drm_buddy_init+0xc2/0x330 [drm_buddy]
__xe_ttm_vram_mgr_init+0xc3/0x190 [xe]
xe_ttm_stolen_mgr_init+0xf5/0x9d0 [xe]
xe_device_probe+0x326/0x9e0 [xe]
xe_pci_probe+0x39a/0x610 [xe]
local_pci_probe+0x47/0xb0
pci_device_probe+0xf3/0x260
really_probe+0xf1/0x3c0
__driver_probe_device+0x8c/0x180
driver_probe_device+0x24/0xd0
__driver_attach+0x10f/0x220
bus_for_each_dev+0x7f/0xe0
driver_attach+0x1e/0x30
bus_add_driver+0x151/0x290
Deallocate array for free trees when cleaning up buddy memory manager
in the same way as if going through out_free_tree label.
Fixes: d4cd665c98c1 ("drm/buddy: Separate clear and dirty free block trees")
Signed-off-by: Michał Grzelak <michal.grzelak@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20251208102714.4008260-2-michal.grzelak@intel.com
|
|
Disable CASF with joiner as it is not supported
in hardware.
v2: Replace dmesg_WARN with drm_dbg_kms. [Jani]
v3: Modify commit message. [Suraj]
Signed-off-by: Nemesa Garg <nemesa.garg@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260115113948.641822-1-nemesa.garg@intel.com
|
|
Currently we don't used mballoc optimized scanning (using max free
extent order and avg free extent order group lists) for inodes with
indirect block based format. This is confusing for users and I don't see
a good reason for that. Even with indirect block based inode format we
can spend big amount of time searching for free blocks for large
filesystems with fragmented free space. To add to the confusion before
commit 077d0c2c78df ("ext4: make mb_optimize_scan performance mount
option work with extents") optimized scanning was applied *only* to
indirect block based inodes so that commit appears as a performance
regression to some users. Just use optimized scanning whenever it is
enabled by mount options.
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://patch.msgid.link/20260114182836.14120-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
For filesystems with more than 2^32 blocks inodes using indirect block
based format cannot use blocks beyond the 32-bit limit.
ext4_mb_scan_groups_linear() takes care to not select these unsupported
groups for such inodes however other functions selecting groups for
allocation don't. So far this is harmless because the other selection
functions are used only with mb_optimize_scan and this is currently
disabled for inodes with indirect blocks however in the following patch
we want to enable mb_optimize_scan regardless of inode format.
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: stable@kernel.org
Link: https://patch.msgid.link/20260114182836.14120-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
fstests test generic/388 occasionally reproduces a warning in
ext4_put_super() associated with the dirty clusters count:
WARNING: CPU: 7 PID: 76064 at fs/ext4/super.c:1324 ext4_put_super+0x48c/0x590 [ext4]
Tracing the failure shows that the warning fires due to an
s_dirtyclusters_counter value of -1. IOW, this appears to be a
spurious decrement as opposed to some sort of leak. Further tracing
of the dirty cluster count deltas and an LLM scan of the resulting
output identified the cause as a double decrement in the error path
between ext4_mb_mark_diskspace_used() and the caller
ext4_mb_new_blocks().
First, note that generic/388 is a shutdown vs. fsstress test and so
produces a random set of operations and shutdown injections. In the
problematic case, the shutdown triggers an error return from the
ext4_handle_dirty_metadata() call(s) made from
ext4_mb_mark_context(). The changed value is non-zero at this point,
so ext4_mb_mark_diskspace_used() does not exit after the error
bubbles up from ext4_mb_mark_context(). Instead, the former
decrements both cluster counters and returns the error up to
ext4_mb_new_blocks(). The latter falls into the !ar->len out path
which decrements the dirty clusters counter a second time, creating
the inconsistency.
To avoid this problem and simplify ownership of the cluster
reservation in this codepath, lift the counter reduction to a single
place in the caller. This makes it more clear that
ext4_mb_new_blocks() is responsible for acquiring cluster
reservation (via ext4_claim_free_clusters()) in the !delalloc case
as well as releasing it, regardless of whether it ends up consumed
or returned due to failure.
Fixes: 0087d9fb3f29 ("ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20260113171905.118284-1-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
The xfstests' test-case generic/037 fails to execute
correctly:
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.15.0-rc4+ #8 SMP PREEMPT_DYNAMIC Thu May 1 16:43:22 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/037 - output mismatch (see xfstests-dev/results//generic/037.out.bad)
The goal of generic/037 test-case is to "verify that replacing
a xattr's value is an atomic operation". The test "consists of
removing the old value and then inserting the new value in a btree.
This made readers (getxattr and listxattrs) not getting neither
the old nor the new value during a short time window".
The HFS+ has the issue of executing the xattr replace operation
because __hfsplus_setxattr() method [1] implemented it as not
atomic operation [2]:
if (hfsplus_attr_exists(inode, name)) {
if (flags & XATTR_CREATE) {
pr_err("xattr exists yet\n");
err = -EOPNOTSUPP;
goto end_setxattr;
}
err = hfsplus_delete_attr(inode, name);
if (err)
goto end_setxattr;
err = hfsplus_create_attr(inode, name, value, size);
if (err)
goto end_setxattr;
}
The main issue of the logic that it implements delete and
create of xattr as independent atomic operations, but the replace
operation at whole is not atomic operation. This patch implements
a new hfsplus_replace_attr() method that makes the xattr replace
operation by atomic one. Also, it reworks hfsplus_create_attr() and
hfsplus_delete_attr() with the goal of reusing the common logic
in hfsplus_replace_attr() method.
sudo ./check generic/037
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.19.0-rc1+ #47 SMP PREEMPT_DYNAMIC Thu Jan 8 15:37:20 PST 2026
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
generic/037 37s ... 37s
Ran: generic/037
Passed all 1 tests
[1] https://elixir.bootlin.com/linux/v6.19-rc4/source/fs/hfsplus/xattr.c#L261
[2] https://elixir.bootlin.com/linux/v6.19-rc4/source/fs/hfsplus/xattr.c#L338
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20260109234213.2805400-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|
|
s_fc_lock can be acquired from inode eviction and thus is
reclaim unsafe. Since the fast commit path holds s_fc_lock while writing
the commit log, allocations under the lock can enter reclaim and invert
the lock order with fs_reclaim. Add ext4_fc_lock()/ext4_fc_unlock()
helpers which acquire s_fc_lock under memalloc_nofs_save()/restore()
context and use them everywhere so allocations under the lock cannot
recurse into filesystem reclaim.
Fixes: 6593714d67ba ("ext4: hold s_fc_lock while during fast commit")
Signed-off-by: Li Chen <me@linux.beauty>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260106120621.440126-1-me@linux.beauty
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
A bitmap inconsistency issue was observed during stress tests under
mixed huge-page workloads. Ext4 reported multiple e4b bitmap check
failures like:
ext4_mb_complex_scan_group:2508: group 350, 8179 free clusters as
per group info. But got 8192 blocks
Analysis and experimentation confirmed that the issue is caused by a
race condition between page migration and bitmap modification. Although
this timing window is extremely narrow, it is still hit in practice:
folio_lock ext4_mb_load_buddy
__migrate_folio
check ref count
folio_mc_copy __filemap_get_folio
folio_try_get(folio)
......
mb_mark_used
ext4_mb_unload_buddy
__folio_migrate_mapping
folio_ref_freeze
folio_unlock
The root cause of this issue is that the fast path of load_buddy only
increments the folio's reference count, which is insufficient to prevent
concurrent folio migration. We observed that the folio migration process
acquires the folio lock. Therefore, we can determine whether to take the
fast path in load_buddy by checking the lock status. If the folio is
locked, we opt for the slow path (which acquires the lock) to close this
concurrency window.
Additionally, this change addresses the following issues:
When the DOUBLE_CHECK macro is enabled to inspect bitmap-related
issues, the following error may be triggered:
corruption in group 324 at byte 784(6272): f in copy != ff on
disk/prealloc
Analysis reveals that this is a false positive. There is a specific race
window where the bitmap and the group descriptor become momentarily
inconsistent, leading to this error report:
ext4_mb_load_buddy ext4_mb_load_buddy
__filemap_get_folio(create|lock)
folio_lock
ext4_mb_init_cache
folio_mark_uptodate
__filemap_get_folio(no lock)
......
mb_mark_used
mb_mark_used_double
mb_cmp_bitmaps
mb_set_bits(e4b->bd_bitmap)
folio_unlock
The original logic assumed that since mb_cmp_bitmaps is called when the
bitmap is newly loaded from disk, the folio lock would be sufficient to
prevent concurrent access. However, this overlooks a specific race
condition: if another process attempts to load buddy and finds the folio
is already in an uptodate state, it will immediately begin using it without
holding folio lock.
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260106090820.836242-1-sunyongjian@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Remove redundant NULL check after kcalloc() with GFP_NOFS | __GFP_NOFAIL.
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20260106062016.154573-1-liubaolin12138@163.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We do not use EXT4_GET_BLOCKS_IO_CREATE_EXT or split extents before
submitting I/O; therefore, remove the related code.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-8-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In the write path mapping check of ext4_iomap_begin(), the return value
'ret' should never greater than orig_mlen. If 'ret' equals 'orig_mlen',
it can be returned directly without checking IOMAP_ATOMIC.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The parameter unwritten in ext4_dio_write_iter() is no longer needed,
simply remove it.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4_iomap_overwrite_ops was introduced in commit 8cd115bdda17 ("ext4:
Optimize ext4 DIO overwrites"), which can optimize pure overwrite
performance by dropping the IOMAP_WRITE flag to only query the mapped
mapping information. This avoids starting a new journal handle, thereby
improving speed. Later, commit 9faac62d4013 ("ext4: optimize file
overwrites") also optimized similar scenarios, but it performs the check
later, examining the mappings status only when the actual block mapping
is needed. Thus, it can handle the previous commit scenario. That means
in the case of an overwrite scenario, the condition
"offset + length <= i_size_read(inode)" in the write path must always be
true.
Therefore, it is acceptable to remove the ext4_iomap_overwrite_ops,
which will also clarify the write and read paths of ext4_iomap_begin.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since we have deferred the split of the unwritten extent until after I/O
completion, it is not necessary to initiate the journal handle when
submitting the I/O.
This can improve the write performance of concurrent DIO for multiple
files. The fio tests below show a ~25% performance improvement when
wirting to unwritten files on my VM with a mem disk.
[unwritten]
direct=1
ioengine=psync
numjobs=16
rw=write # write/randwrite
bs=4K
iodepth=1
directory=/mnt
size=5G
runtime=30s
overwrite=0
norandommap=1
fallocate=native
ramp_time=5s
group_reporting=1
[w/o]
w: IOPS=62.5k, BW=244MiB/s
rw: IOPS=56.7k, BW=221MiB/s
[w]
w: IOPS=79.6k, BW=311MiB/s
rw: IOPS=70.2k, BW=274MiB/s
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently, when writing back dirty pages to the filesystem with the
dioread_nolock feature enabled and when doing DIO, if the area to be
written back is part of an unwritten extent, the
EXT4_GET_BLOCKS_IO_CREATE_EXT flag is set during block allocation before
submitting I/O. The function ext4_split_convert_extents() then attempts
to split this extent in advance. This approach is designed to prevents
extent splitting and conversion to the written type from failing due to
insufficient disk space at the time of I/O completion, which could
otherwise result in data loss.
However, we already have two mechanisms to ensure successful extent
conversion. The first is the EXT4_GET_BLOCKS_METADATA_NOFAIL flag, which
is a best effort, it permits the use of 2% of the reserved space or
4,096 blocks in the file system when splitting extents. This flag covers
most scenarios where extent splitting might fail. The second is the
EXT4_EXT_MAY_ZEROOUT flag, which is also set during extent splitting. If
the reserved space is insufficient and splitting fails, it does not
retry the allocation. Instead, it directly zeros out the extra part of
the extent, thereby avoiding splitting and directly converting the
entire extent to the written type.
These two mechanisms also exist when I/Os are completed because there is
a concurrency window between write-back and fallocate, which may still
require us to split extents upon I/O completion. There is no much
difference between splitting extents before submitting I/O. Therefore,
It seems possible to defer the splitting until I/O completion, it won't
increase the risk of I/O failure and data loss. On the contrary, if some
I/Os can be merged when I/O completion, it can also reduce unnecessary
splitting operations, thereby alleviating the pressure on reserved
space.
In addition, deferring extent splitting until I/O completion can
also simplify the IO submission process and avoid initiating unnecessary
journal handles when writing unwritten extents.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When performing buffered writes, we may need to split and convert an
unwritten extent into a written one during the end I/O process. However,
we do not reserve space specifically for these metadata changes, we only
reserve 2% of space or 4096 blocks. To address this, we use
EXT4_GET_BLOCKS_PRE_IO to potentially split extents in advance and
EXT4_GET_BLOCKS_METADATA_NOFAIL to utilize reserved space if necessary.
These two approaches can reduce the likelihood of running out of space
and losing data. However, these methods are merely best efforts, we
could still run out of space, and there is not much difference between
converting an extent during the writeback process and the end I/O
process, it won't increase the risk of losing data if we postpone the
conversion.
Therefore, also use EXT4_GET_BLOCKS_METADATA_NOFAIL in
ext4_convert_unwritten_extents_endio() to prepare for the buffered I/O
iomap conversion, which may perform extent conversion during the end I/O
process.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20260105014522.1937690-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In ext4_ext_shift_extents(), if the extent is NULL in the while loop, the
function returns immediately without releasing the path obtained via
ext4_find_extent(), leading to a memory leak.
Fix this by jumping to the out label to ensure the path is properly
released.
Fixes: a18ed359bdddc ("ext4: always check ext4_ext_find_extent result")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20251225084800.905701-1-zilin@seu.edu.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
When zeroing out a written partial block, it is necessary to order the
data to prevent exposing stale data on disk. However, if the buffer is
unwritten or delayed, it is not allocated as written, so ordering the
data is not required. This can prevent strange and unnecessary ordered
writes when appending data across a region within a block.
Assume we have a 2K unwritten file on a filesystem with 4K blocksize,
and buffered write from 3K to 4K. Before this patch,
__ext4_block_zero_page_range() would add the range [2k,3k) to the
ordered range, and then the JBD2 commit process would write back this
block. However, it does nothing since the block is not mapped as
written, this folio will be redirtied and written back agian through the
normal write back process.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20251223011927.34042-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|