summaryrefslogtreecommitdiff
path: root/lib/raid/xor/alpha/git@git.tavy.me:linux.git
diff options
context:
space:
mode:
authorDouglas Anderson <dianders@chromium.org>2026-04-06 16:22:54 -0700
committerDanilo Krummrich <dakr@kernel.org>2026-04-11 14:32:07 +0200
commita2225b6e834a838ae3c93709760edc0a169eb2f2 (patch)
treece89b0b92fc1c04d0b2fdfacf69fad558b5abcc0 /lib/raid/xor/alpha/git@git.tavy.me:linux.git
parentc369299895a591d96745d6492d4888259b004a9e (diff)
driver core: Don't let a device probe until it's ready
The moment we link a "struct device" into the list of devices for the bus, it's possible probe can happen. This is because another thread can load the driver at any time and that can cause the device to probe. This has been seen in practice with a stack crawl that looks like this [1]: really_probe() __driver_probe_device() driver_probe_device() __driver_attach() bus_for_each_dev() driver_attach() bus_add_driver() driver_register() __platform_driver_register() init_module() [some module] do_one_initcall() do_init_module() load_module() __arm64_sys_finit_module() invoke_syscall() As a result of the above, it was seen that device_links_driver_bound() could be called for the device before "dev->fwnode->dev" was assigned. This prevented __fw_devlink_pickup_dangling_consumers() from being called which meant that other devices waiting on our driver's sub-nodes were stuck deferring forever. It's believed that this problem is showing up suddenly for two reasons: 1. Android has recently (last ~1 year) implemented an optimization to the order it loads modules [2]. When devices opt-in to this faster loading, modules are loaded one-after-the-other very quickly. This is unlike how other distributions do it. The reproduction of this problem has only been seen on devices that opt-in to Android's "parallel module loading". 2. Android devices typically opt-in to fw_devlink, and the most noticeable issue is the NULL "dev->fwnode->dev" in device_links_driver_bound(). fw_devlink is somewhat new code and also not in use by all Linux devices. Even though the specific symptom where "dev->fwnode->dev" wasn't assigned could be fixed by moving that assignment higher in device_add(), other parts of device_add() (like the call to device_pm_add()) are also important to run before probe. Only moving the "dev->fwnode->dev" assignment would likely fix the current symptoms but lead to difficult-to-debug problems in the future. Fix the problem by preventing probe until device_add() has run far enough that the device is ready to probe. If somehow we end up trying to probe before we're allowed, __driver_probe_device() will return -EPROBE_DEFER which will make certain the device is noticed. In the race condition that was seen with Android's faster module loading, we will temporarily add the device to the deferred list and then take it off immediately when device_add() probes the device. Instead of adding another flag to the bitfields already in "struct device", instead add a new "flags" field and use that. This allows us to freely change the bit from different thread without worrying about corrupting nearby bits (and means threads changing other bit won't corrupt us). [1] Captured on a machine running a downstream 6.6 kernel [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel Cc: stable@vger.kernel.org Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Diffstat (limited to 'lib/raid/xor/alpha/git@git.tavy.me:linux.git')
0 files changed, 0 insertions, 0 deletions