path: root/drivers
Age | Commit message | Author
8 days | macsec: introduce dedicated workqueue for SA crypto cleanup | Jinliang Zheng
Introduce a dedicated ordered workqueue, macsec_wq, which will be used by subsequent patches to defer SA crypto cleanup (crypto_free_aead and related teardown) out of softirq context. Using a dedicated workqueue instead of system_wq allows macsec_exit() to drain exactly the work items belonging to this module via destroy_workqueue(), without interfering with unrelated work items on system_wq or causing unexpected delays elsewhere. rcu_barrier() in macsec_exit() ensures all in-flight rcu_work callbacks have enqueued their work items before destroy_workqueue() drains and destroys the queue, making the two-step teardown correct and complete. The same sequence is kept in the error path of macsec_init() as a precaution, to mirror macsec_exit() and stay safe if work ever becomes queueable before this point in the future. While at it, rename the error labels in macsec_init() from the resource-named style (rtnl:, notifier:, wq:) to the err_xxx: style (err_rtnl:, err_notifier:, err_destroy_wq:) to align with the broader kernel convention. Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260511153102.2640368-2-alexjlzheng@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days | net: net_failover: Fix the deadlock in slave register | Faicker Mo
There is netdev_lock_ops() before the NETDEV_REGISTER notifier in register_netdevice(), so use the non-locking functions in net_failover_slave_register(). failover_slave_register() in failover_existing_slave_register() adds lock and unlock ops too. Call Trace: <TASK> __schedule+0x30d/0x7a0 schedule+0x27/0x90 schedule_preempt_disabled+0x15/0x30 __mutex_lock.constprop.0+0x538/0x9e0 __mutex_lock_slowpath+0x13/0x20 mutex_lock+0x3b/0x50 dev_set_mtu+0x40/0xe0 net_failover_slave_register+0x24/0x280 failover_slave_register+0x103/0x1b0 failover_event+0x15e/0x210 ? dropmon_net_event+0xac/0xe0 notifier_call_chain+0x5e/0xe0 raw_notifier_call_chain+0x16/0x30 call_netdevice_notifiers_info+0x52/0xa0 register_netdevice+0x5f4/0x7c0 register_netdev+0x1e/0x40 _mlx5e_probe+0xe2/0x370 [mlx5_core] mlx5e_probe+0x59/0x70 [mlx5_core] ? __pfx_mlx5e_probe+0x10/0x10 [mlx5_core] Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP") Signed-off-by: Faicker Mo <faicker.mo@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days | drivers/base/memory: fix memory block reference leak in poison accounting | Muchun Song
memblk_nr_poison_inc() and memblk_nr_poison_sub() look up a memory block via find_memory_block_by_id(), which acquires a reference to the memory block device. Both helpers use the returned memory block without dropping that reference, leaking the device reference on each successful lookup. Drop the reference after updating nr_hwpoison. Link: https://lore.kernel.org/20260428085219.1316047-3-songmuchun@bytedance.com Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <huang.ying.caritas@gmail.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
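The get/put pairing described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: `struct mem_block`, `find_block_by_id()`, `put_block()` and `block_poison_inc()` are hypothetical stand-ins, and a plain integer models the device reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted memory block device. */
struct mem_block {
	unsigned long id;
	int refcount;        /* models the device reference count */
	long nr_hwpoison;
};

#define NR_BLOCKS 4
static struct mem_block blocks[NR_BLOCKS] = {
	{ 0, 1, 0 }, { 1, 1, 0 }, { 2, 1, 0 }, { 3, 1, 0 },
};

/* The lookup takes a reference that the caller must drop. */
static struct mem_block *find_block_by_id(unsigned long id)
{
	if (id >= NR_BLOCKS)
		return NULL;
	blocks[id].refcount++;
	return &blocks[id];
}

static void put_block(struct mem_block *mem)
{
	assert(mem->refcount > 0);
	mem->refcount--;
}

/* Update the counter, then drop the reference taken by the lookup. */
static void block_poison_inc(unsigned long id)
{
	struct mem_block *mem = find_block_by_id(id);

	if (!mem)
		return;
	mem->nr_hwpoison++;
	put_block(mem);
}
```

After any number of calls the reference count returns to its starting value; without the `put_block()` call it would grow by one per successful lookup.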
8 days | FDDI: defza: Sanitise the reset safety timer | Maciej W. Rozycki
The reset actions of the DEFZA adapters are exceedingly slow, taking up to 30 seconds to complete by the device spec and typically in the range of 10 seconds in reality, as required for the device RTOS to boot, still quite a lot. Therefore a state machine is used that's interrupt driven, however a safety mechanism is required in case of adapter malfunction, so that if no state change interrupt has arrived in time, then the situation is taken care of. The safety mechanism depends on the origin of the reset. For regular adapter initialisation at the device probe time a sleep is requested. However a reset is also required by the device spec when the adapter has transitioned into the halted state, such as in response to a PC Trace event in the course of ring fault recovery, possibly a common network event. In that case no sleep is possible as a device halt is reported at the hardirq level. A timer is therefore set up to ensure progress in case no adapter state change interrupt has arrived in time, but as from commit 168f6b6ffbee ("timers: Use del_timer_sync() even on UP") a warning is issued as the timer is deleted in the hardirq handler upon an expected state change: defza: v.1.1.4 Oct 6 2018 Maciej W. Rozycki tc2: DEC FDDIcontroller 700 or 700-C at 0x18000000, irq 4 tc2: resetting the board... 
------------[ cut here ]------------ WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0 Modules linked in: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-dirty #2 VOLUNTARY Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468 0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458 ffffffff80897400 9800000002027b88 0000000000000000 7070617773203a6d 0000000000000000 9800000002027ba4 0000000000001000 6465746e69617420 0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009 000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000 0000000000000000 0000000000000020 0000000000000000 ffffffff80910000 ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0 0000000000000000 0000000000000000 0000000000000000 0000000000000009 000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0 ... Call Trace: [<ffffffff800502b8>] show_stack+0x28/0xf0 [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c [<ffffffff80068c98>] __warn+0xa0/0x128 [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4 [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120 [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8 [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228 [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78 [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80 [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0 [<ffffffff806e956c>] do_IRQ+0x1c/0x30 [<ffffffff8004ad4c>] handle_int+0x148/0x154 [<ffffffff800ab7c0>] do_idle+0x40/0x108 [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38 [<ffffffff806dfec8>] kernel_init+0x0/0x108 ---[ end trace 0000000000000000 ]--- tc2: OK tc2: model 700 (DEFZA-AA), MMF PMD, address 08-00-2b-xx-xx-xx tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 
1 tc2: link unavailable ------------[ cut here ]------------ WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0 Modules linked in: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 7.0.0-dirty #2 VOLUNTARY Tainted: [W]=WARN Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468 0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458 ffffffff80897400 9800000002027b88 0000000000000000 0000000000000000 0000000000000000 9800000002027ba4 0000000000001000 0000000000000000 0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009 000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000 0000000000000000 0000000000000020 0000000000000000 ffffffff80910000 ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0 0000000000000000 0000000000000000 0000000000000000 0000000000000009 000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0 ... Call Trace: [<ffffffff800502b8>] show_stack+0x28/0xf0 [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c [<ffffffff80068c98>] __warn+0xa0/0x128 [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4 [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120 [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8 [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228 [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78 [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80 [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0 [<ffffffff806e956c>] do_IRQ+0x1c/0x30 [<ffffffff8004ad4c>] handle_int+0x148/0x154 [<ffffffff806de8a4>] arch_local_irq_disable+0x4/0x28 [<ffffffff800ab7d0>] do_idle+0x50/0x108 [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38 [<ffffffff806dfec8>] kernel_init+0x0/0x108 ---[ end trace 0000000000000000 ]--- tc2: registered as fddi0 The immediate origin of the new warning is the switch away from aliasing del_timer_sync() to del_timer() (timer_delete_sync() to timer_delete() in terms of current function names) for UP configurations, which however 
is the only choice for this driver anyway as no SMP hardware supports the TURBOchannel bus this device interfaces to. Therefore there is a very remote issue only this is a sign of. Specifically if an adapter reset issued upon a transition to the halted state times out and first triggers fza_reset_timer() for another reset assertion, which then schedules fza_reset_timer() for reset deassertion and then that second call is pre-empted after poking at the hardware, but before the timer has been rearmed and owing to high system load causing exceedingly high scheduling latency control is not handed back before a transition to the uninitialised state has caused the timer to be deleted even before it has been started, then fza_reset_timer() will be called yet again and issue another reset even though by then the adapter has already recovered. Prevent this situation from happening by switching to timer_delete() for the transition to the halted state and protect the code region affected with a spinlock, also to make sure add_timer() has not been called twice in a row due to an execution race between the interrupt handler and the timer handler (though it could only happen on SMP, but let's keep the driver clean). It's a very unlikely sequence of events to happen and therefore there's no point in trying to be overly clever about it, such as by placing printk() calls outside the protection. For the transition to the uninitialised state switch to timer_delete_sync_try() instead, so that a timer isn't deleted that's just been rearmed by the timer handler and needs to watch for the device to come out of reset again (again, an SMP scenario only). Retain timer_delete_sync() invocations outside the hardirq context for a stray timer not to fire once device structures have been released. Fixes: 61414f5ec9834 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 days | vfio/pci: fix dma-buf kref underflow after revoke | Alex Williamson
vfio_pci_dma_buf_move(revoked=true) and vfio_pci_dma_buf_cleanup() ran the same drain sequence: set priv->revoked, invalidate mappings, wait for fences, drop the registered kref, wait for completion. When the VFIO device fd was closed after PCI_COMMAND_MEMORY had been cleared, both ran in turn -- the second kref_put underflowed and the subsequent wait_for_completion() blocked on a completion that the first run had already consumed: refcount_t: underflow; use-after-free. WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x59/0x90 Call Trace: vfio_pci_dma_buf_cleanup+0x163/0x168 [vfio_pci_core] vfio_pci_core_close_device+0x67/0xe0 [vfio_pci_core] vfio_df_close+0x4c/0x80 [vfio] vfio_df_group_close+0x36/0x80 [vfio] vfio_device_fops_release+0x21/0x40 [vfio] __fput+0xe6/0x2b0 __x64_sys_close+0x3d/0x80 Collapse the duplication: vfio_pci_dma_buf_cleanup() now delegates the drain to vfio_pci_dma_buf_move(true), which is idempotent for already-revoked dma-bufs. cleanup retains only list removal and the device registration drop; the dma_resv_lock that bracketed those is dropped along with the in-line drain that required it, memory_lock continues to protect them. Re-arm the kref and the completion at the end of move()'s revoke branch so post-revoke state matches post-creation (kref == 1, completion ready). This keeps cleanup's call into move() a no-op when revoke already ran, and replaces the explicit kref_init() that the un-revoke branch used to perform for the un-revoke -> remap path. 
Fixes: 1a8a5227f229 ("vfio: Wait for dma-buf invalidation to complete") Reported-by: Joonas Kylmälä <joonas.kylmala@netum.fi> Closes: https://lore.kernel.org/all/GVXPR02MB12019AA6014F27EF5D773E89BFB372@GVXPR02MB12019.eurprd02.prod.outlook.com/ Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Reviewed-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Alex Williamson <alex.williamson@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260507143548.1018405-1-alex.williamson@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
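The idempotent-revoke idea can be modelled in user space roughly as follows. This is a simplified sketch under stated assumptions, not the vfio code: `struct buf_state`, `buf_move()` and `buf_cleanup()` are hypothetical, an integer stands in for the kref, and a counter stands in for the invalidate-and-wait drain sequence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-dma-buf state discussed above. */
struct buf_state {
	bool revoked;
	int kref;      /* models the registered reference */
	int drains;    /* how many times the drain sequence ran */
};

static void buf_init(struct buf_state *b)
{
	b->revoked = false;
	b->kref = 1;
	b->drains = 0;
}

static void buf_move(struct buf_state *b, bool revoked)
{
	if (b->revoked == revoked)
		return;          /* already in the requested state */
	b->revoked = revoked;
	if (!revoked)
		return;

	assert(b->kref == 1);
	b->kref--;               /* drop the registered reference */
	b->drains++;             /* stands in for invalidate + wait */
	b->kref = 1;             /* re-arm: state matches post-creation */
}

/* Cleanup delegates the drain; it is a no-op if revoke already ran. */
static void buf_cleanup(struct buf_state *b)
{
	buf_move(b, true);
}
```

Calling `buf_move(&b, true)` followed by `buf_cleanup(&b)` runs the drain once and leaves the reference count at 1, which is the property the commit relies on to avoid the second put.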
8 days | drm/gma500/oaktrail_lvds: fix i2c adapter leaks on init | Johan Hovold
The LVDS init code looks up an I2C adapter using i2c_get_adapter() and tries to read the EDID before falling back to allocating and registering its own adapter. Make sure to drop the references taken by i2c_get_adapter() when falling back to allocating an adapter as well as on late errors to allow the looked up adapter to be deregistered. Fixes: 1b082ccf5901 ("gma500: Add Oaktrail support") Cc: stable@vger.kernel.org # 3.3 Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patch.msgid.link/20260508144446.59722-4-johan@kernel.org
8 days | drm/gma500/oaktrail_lvds: fix hang on init failure | Johan Hovold
The LVDS init code looks up an I2C adapter using i2c_get_adapter() and tries to read the EDID before falling back to allocating and registering its own adapter. The error handling does not separate these cases so on a late init failure it will try to deregister and free also an adapter that had previously been registered. Since i2c_get_adapter() takes another reference to the adapter, deregistration hangs indefinitely while waiting for the reference to be released. Fix this by only destroying adapters allocated during LVDS init on errors. Fixes: a57ebfc0b4da ("drm/gma500: Make oaktrail lvds use ddc adapter from drm_connector") Cc: stable@vger.kernel.org # 6.0 Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patch.msgid.link/20260508144446.59722-3-johan@kernel.org
8 days | drm/gma500/oaktrail_hdmi: fix i2c adapter leak on setup | Johan Hovold
Make sure to drop the reference taken to the I2C adapter (and its module) when setting up HDMI to allow the adapter to be deregistered. Fixes: 1b082ccf5901 ("gma500: Add Oaktrail support") Cc: stable@vger.kernel.org # 3.3 Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patch.msgid.link/20260508144446.59722-2-johan@kernel.org
8 days | drm/xe: Drop unused ggtt_balloon field | Michal Wajdeczko
During the recent GGTT refactoring we forgot to drop a now-unused field from xe_tile. Drop it now. Fixes: e904c56ba6e0 ("drm/xe: Rewrite GGTT VF initialization") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260510205605.642-1-michal.wajdeczko@intel.com (cherry picked from commit 21d5a871f57909dc4d8e4f5d3bf92f9ccf2597b2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
9 days | net: ethernet: ravb: Do not check URAM suspension when WoL is active | Niklas Söderlund
When the driver was updated to match the latest datasheet and suspend access to URAM when suspending DMA transfers, a corner case was missed: URAM access will not be suspended if WoL is enabled. This led to the error message (correctly) being triggered, as URAM access is not suspended even though it is requested as part of stopping DMA. Avoid checking if URAM access is suspended and printing the error message if WoL is enabled when we suspend the system, as we know it will not be. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdWnjV%3DHGE1o08zLhUfTgOSene5fYx1J5GG10mB%2BToq8qg@mail.gmail.com/ Fixes: 353d8e7989b6 ("net: ethernet: ravb: Suspend and resume the transmission flow") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days | net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled | Zoran Ilievski
The shutdown handler aq_pci_shutdown() unconditionally calls pci_wake_from_d3(pdev, false), clearing the PCI PME_En bit even when wake-on-LAN has been configured. While aq_nic_shutdown() correctly programs the NIC firmware via aq_nic_set_power() to listen for magic packets, the PCI subsystem will not propagate the resulting PME wake event from D3, so the system never wakes after poweroff. WOL from suspend (S3) is unaffected because aq_suspend_common() does not touch pci_wake_from_d3() and relies on the PM core's wake configuration via device_may_wakeup(). This affects all atlantic-supported NICs (AQC107/108/111/112/113); users have reported that WOL works if the atlantic driver is never loaded, but breaks once it has run its shutdown path. Pass the configured WOL state to pci_wake_from_d3() instead of a literal false, so the PCI PME_En bit is preserved when the user has armed WOL via ethtool. Fixes: 90869ddfefeb ("net: aquantia: Implement pci shutdown callback") Cc: stable@vger.kernel.org Signed-off-by: Zoran Ilievski <goodboy@rexbytes.com> Reviewed-by: Sukhdeep Singh <sukhdeeps@marvell.com> Link: https://patch.msgid.link/20260511064002.1857-1-goodboy@rexbytes.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days | hwmon: (asus_atk0110) Check ACPI_COMPANION() against NULL | Rafael J. Wysocki
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add a requisite ACPI_HANDLE() check against NULL to the asus_atk0110 hwmon driver. Fixes: ee1752590733 ("hwmon: (asus_atk0110) Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/2261594.irdbgypaU6@rafael.j.wysocki Signed-off-by: Guenter Roeck <linux@roeck-us.net>
9 days | hwmon: (acpi_power_meter) Check ACPI_COMPANION() against NULL | Rafael J. Wysocki
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add a requisite ACPI_COMPANION() check against NULL to the acpi_power_meter hwmon driver. Fixes: afc6c4aedea5 ("hwmon: (acpi_power_meter) Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/5068745.GXAFRqVoOG@rafael.j.wysocki Signed-off-by: Guenter Roeck <linux@roeck-us.net>
9 days | ACPI: PAD: xen: Check ACPI_COMPANION() against NULL | Rafael J. Wysocki
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add a requisite ACPI_COMPANION() check against NULL to the Xen variant of the ACPI processor aggregator device (PAD) driver. Fixes: 112b2f978afe ("ACPI: PAD: xen: Convert to a platform driver") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://patch.msgid.link/3427762.aeNJFYEL58@rafael.j.wysocki
9 days | accel/qaic: Add overflow check to remap_pfn_range during mmap | Zack McKevitt
The call to remap_pfn_range in qaic_gem_object_mmap is susceptible to (re)mapping beyond the VMA if the BO is too large. This can cause use after free issues when munmap() unmaps only the VMA region and not the additional mappings. To prevent this, check the remaining size of the VMA before remapping and truncate the remapped length if sg->length is too large. Reported-by: Lukas Maar <lukas.maar@tugraz.at> Fixes: ff13be830333 ("accel/qaic: Add datapath") Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> [jhugo: fix braces from checkpatch --strict] Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patch.msgid.link/20260430193858.1178641-1-zachary.mckevitt@oss.qualcomm.com
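The truncation described above amounts to clamping each segment length to the space still left in the mapping. A minimal user-space sketch of that arithmetic, assuming a flat array of segment lengths in place of the scatterlist (`clamp_segment_len()` and `map_segments()` are hypothetical names, not driver functions):

```c
#include <assert.h>
#include <stddef.h>

/* Clamp one segment's length to the space left in the mapping. */
static size_t clamp_segment_len(size_t seg_len, size_t remaining)
{
	return seg_len < remaining ? seg_len : remaining;
}

/*
 * Walk the segment lengths and return how many bytes would be mapped
 * into a region of vma_size bytes. The total never exceeds vma_size.
 */
static size_t map_segments(const size_t *seg_len, size_t nsegs,
			   size_t vma_size)
{
	size_t mapped = 0;

	assert(seg_len != NULL || nsegs == 0);
	for (size_t i = 0; i < nsegs && mapped < vma_size; i++)
		mapped += clamp_segment_len(seg_len[i], vma_size - mapped);
	return mapped;
}
```

With segments of 4096, 8192 and 4096 bytes and an 8192-byte region, the second segment is cut to 4096 bytes and the third is never mapped.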
9 days | HID: logitech-hidpp: Add support for newer Bluetooth keyboards | Alain Michaud
Add product IDs (PIDs) for several newer Logitech Bluetooth keyboards to the hidpp_devices matching table, enabling full HID++ support for them. The added keyboards are:
- Logitech Signature K650 & B2B
- Logitech Pebble Keys 2 K380S
- Logitech Casa Pop-Up Desk & B2B
- Logitech Wave Keys & B2B
- Logitech Signature Slim K950 & B2B
- Logitech MX Keys S & B2B
- Logitech Keys-To-Go 2
- Logitech Pop Icon Keys
- Logitech MX Keys Mini & B2B
- Logitech Signature Slim Solar+ K980 B2B
- Logitech Bluetooth Keyboard K250/K251
- Logitech Signature Comfort K880 & B2B
Signed-off-by: Alain Michaud <alainmichaud@google.com> Reviewed-by: Olivier Gay <ogay@logitech.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: pidff: Fix integer overflow in pidff_rescale | Tomasz Pakuła
Rescaling values close to the maximum (U16_MAX) temporarily creates values that exceed the s32 range. This caused an overflow when, for example, a periodic effect phase was higher than 180 degrees. In turn, the rescale function could return values outside the logical range of the HID field. Fix this by using a 64-bit signed integer to store the value during the calculation while still returning a 32-bit integer. Closes: https://github.com/JacKeTUs/universal-pidff/issues/116 Fixes: 224ee88fe395 ("Input: add force feedback driver for PID devices") Cc: stable@vger.kernel.org Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
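The widening fix can be sketched in portable C as below. This is an illustration under assumptions rather than the driver function: `struct field_range` and `rescale()` are hypothetical names, and the formula (scale `i` from `[0, max]` onto the field's logical range) is assumed from the commit text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the logical range of a HID field. */
struct field_range {
	int32_t logical_minimum;
	int32_t logical_maximum;
};

/* Map i in [0, max] onto [logical_minimum, logical_maximum]. */
static int32_t rescale(int32_t i, int32_t max, const struct field_range *f)
{
	assert(max > 0);

	/* Widen before multiplying so the product cannot wrap in 32 bits. */
	int64_t span = (int64_t)f->logical_maximum - f->logical_minimum;
	int64_t scaled = (int64_t)i * span / max;

	return (int32_t)(scaled + f->logical_minimum);
}
```

For a field spanning 0..65535, an input of 49152 yields an intermediate product of 3221176320, which does not fit in a signed 32-bit integer; with the 64-bit intermediate the result stays within the logical range.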
9 days | HID: i2c-hid: add reset quirk for BLTP7853 touchpad | Xu Rao
The BLTP7853 I2C HID touchpad may fail to probe after reboot or reprobe because reset completion is not signalled to the host. The driver then waits for the reset-complete interrupt until it times out and the device probe fails: i2c_hid i2c-BLTP7853:00: failed to reset device. i2c_hid i2c-BLTP7853:00: can't add hid device: -61 i2c_hid: probe of i2c-BLTP7853:00 failed with error -61 Add I2C_HID_QUIRK_NO_IRQ_AFTER_RESET for the device so i2c-hid does not wait for a reset interrupt that may never arrive. Signed-off-by: Xu Rao <raoxu@uniontech.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: core: introduce hid_safe_input_report() | Benjamin Tissoires
hid_input_report() is used in too many places to have a commit that doesn't cross subsystem borders. Instead of changing the API, introduce a new one where it matters, in the transport layers:
- usbhid
- i2chid
This effectively reverts to the old behavior for those two transport layers. Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: pass the buffer size to hid_report_raw_event | Benjamin Tissoires
Commit 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") enforced the provided data to be at least the size of the declared buffer in the report descriptor to prevent a buffer overflow. However, we can try to be smarter by providing both the buffer size and the data size, meaning that hid_report_raw_event() can make a better decision about whether we should plainly reject the buffer (buffer overflow attempt) or whether we can safely memset it to 0 and pass it to the rest of the stack. Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Acked-by: Johan Hovold <johan@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: google: hammer: stop hardware on devres action failure | Myeonghun Pak
hammer_probe() starts the HID hardware before registering the devres action that stops it. If devm_add_action() fails, probe returns an error with the hardware still started because the cleanup action was never registered and the driver's remove callback is not called after a failed probe. Use devm_add_action_or_reset() so the stop action runs immediately on registration failure while preserving the existing devres-managed cleanup path for later probe failures and remove. Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: appletb-kbd: run inactivity autodim from workqueues | Sangyun Kim
The autodim code in hid-appletb-kbd takes backlight_device->ops_lock via backlight_device_set_brightness() -> mutex_lock() from two different atomic contexts: * appletb_inactivity_timer() is a struct timer_list callback, so it runs in softirq context. Every expiry triggers BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591 Call Trace: <IRQ> __might_resched __mutex_lock backlight_device_set_brightness appletb_inactivity_timer call_timer_fn run_timer_softirq * reset_inactivity_timer() is called from appletb_kbd_hid_event() and appletb_kbd_inp_event(). On real USB hardware these run in softirq/IRQ context (URB completion and input-event dispatch). When the Touch Bar has already been dimmed or turned off, the reset path calls backlight_device_set_brightness() directly to restore brightness, producing the same warning. Both call sites hit the same mutex_lock()-from-atomic bug. Fix them together by moving the blocking work onto the system workqueue: * Convert the inactivity timer from struct timer_list to struct delayed_work; the callback (appletb_inactivity_work) now runs in process context where mutex_lock() is legal. * Add a dedicated struct work_struct restore_brightness_work and have reset_inactivity_timer() schedule it instead of calling backlight_device_set_brightness() directly. Cancel both works synchronously during driver tear-down alongside the existing backlight reference drop. The semantics are unchanged (same delays, same state transitions on dim, turn-off and user activity); only the execution context of the sleeping call changes. The timer field and callback are renamed to match their new type; reset_inactivity_timer() keeps its name because it is invoked from input event paths that read naturally as "reset the inactivity timer". 
Fixes: 93a0fc489481 ("HID: hid-appletb-kbd: add support for automatic brightness control while using the touchbar") Cc: stable@vger.kernel.org Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: appletb-kbd: fix UAF in inactivity-timer cleanup path | Sangyun Kim
Commit 38224c472a03 ("HID: appletb-kbd: fix slab use-after-free bug in appletb_kbd_probe") added timer_delete_sync(&kbd->inactivity_timer) to both the probe close_hw error path and appletb_kbd_remove(), but the way it was wired in left the inactivity timer reachable during driver tear-down via two distinct windows. Window A -- put_device() before timer_delete_sync(): put_device(&kbd->backlight_dev->dev); timer_delete_sync(&kbd->inactivity_timer); The inactivity_timer softirq reads kbd->backlight_dev and calls backlight_device_set_brightness() -> mutex_lock(&ops_lock). If a concurrent hid_appletb_bl unbind drops the last devm reference between these two calls, the backlight_device is freed and the mutex_lock() touches freed memory. Window B -- backlight cleanup before hid_hw_stop(): if (kbd->backlight_dev) { timer_delete_sync(...); put_device(...); } hid_hw_close(hdev); hid_hw_stop(hdev); Even after Window A is closed, hid_hw_close()/hid_hw_stop() still run afterwards, so a late ".event" callback from the HID core (USB URB completion on real Apple hardware) can arrive after timer_delete_sync() drained the softirq but before put_device() drops the reference. That callback reaches reset_inactivity_timer(), which calls mod_timer() and re-arms the timer. The freshly re-armed timer can then fire on the about-to-be-freed backlight_device. 
Both windows produce the same KASAN slab-use-after-free: BUG: KASAN: slab-use-after-free in __mutex_lock+0x1aab/0x21c0 Read of size 8 at addr ffff88803ee9a108 by task swapper/0/0 Call Trace: <IRQ> __mutex_lock backlight_device_set_brightness appletb_inactivity_timer call_timer_fn run_timer_softirq handle_softirqs Allocated by task N: devm_backlight_device_register appletb_bl_probe Freed by task M: (concurrent hid_appletb_bl unbind path) Close both windows at once by reworking the tear-down in appletb_kbd_remove() and in the probe close_hw error path so that 1) hid_hw_close()/hid_hw_stop() run before the backlight cleanup, guaranteeing no further .event callback can fire and re-arm the timer, and 2) inside the "if (kbd->backlight_dev)" block, timer_delete_sync() runs before put_device(), so the softirq is drained before the final reference is dropped. Fixes: 38224c472a03 ("HID: appletb-kbd: fix slab use-after-free bug in appletb_kbd_probe") Cc: stable@vger.kernel.org Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: playstation: Clamp num_touch_reports | T.J. Mercier
A device would never lie about the number of touch reports, would it? If it does, the loop in dualshock4_parse_report will read off the end of the touch_reports array, up to about 2 KiB for the maximum number of 256 loop iterations. The data that is read is emitted via evdev if the DS4_TOUCH_POINT_INACTIVE bit happens to be set. Protect against this by clamping the num_touch_reports value provided by the device to the maximum size of the touch_reports array. Fixes: 752038248808 ("HID: playstation: add DualShock4 touchpad support.") Cc: stable@vger.kernel.org Reported-by: Xingyu Jin <xingyuj@google.com> Signed-off-by: T.J. Mercier <tjmercier@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
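Clamping a device-supplied count to the capacity of the array it indexes can be sketched as follows. The structure layout, the capacity of 3 and the 9-byte entry size are assumptions chosen for illustration; `safe_touch_count()` is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TOUCH_REPORTS 3          /* hypothetical array capacity */

struct touch_report {
	uint8_t data[9];             /* hypothetical entry size */
};

struct input_report {
	uint8_t num_touch_reports;   /* supplied by the device, untrusted */
	struct touch_report touch_reports[MAX_TOUCH_REPORTS];
};

/* Return the number of entries that are safe to iterate over. */
static size_t safe_touch_count(const struct input_report *r)
{
	assert(r != NULL);

	size_t n = r->num_touch_reports;

	return n < MAX_TOUCH_REPORTS ? n : MAX_TOUCH_REPORTS;
}
```

A parsing loop bounded by `safe_touch_count()` never reads past `touch_reports`, whatever value the device reports.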
9 days | HID: magicmouse: Prevent out-of-bounds (OOB) read during DOUBLE_REPORT_ID | Lee Jones
It is currently possible for a malicious or misconfigured USB device to cause an out-of-bounds (OOB) read when submitting reports using DOUBLE_REPORT_ID by specifying a large report length and providing a smaller one. Let's prevent that by comparing the specified report length with the actual size of the data read in from userspace. If the actual data length ends up being smaller than specified, we'll politely warn the user and prevent any further processing. Signed-off-by: Lee Jones <lee@kernel.org> Reviewed-by: Günther Noack <gnoack@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 days | HID: mcp2221: fix OOB write in mcp2221_raw_event() | Florian Pradines
mcp2221_raw_event() copies device-supplied data into mcp->rxbuf at offset rxbuf_idx without checking that the copy fits within the destination buffer. A device responding with up to 60 bytes to a small I2C/SMBus read can overflow the buffer. Add a rxbuf_size field to struct mcp2221, set it alongside rxbuf in mcp_i2c_smbus_read(), and check rxbuf_idx + data[3] <= rxbuf_size before the memcpy. Reported-by: Benoît Sevens <bsevens@google.com> Signed-off-by: Florian Pradines <florian.pradines@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
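The bounds check before the copy can be modelled in user space as below. The field names `rxbuf`, `rxbuf_size` and `rxbuf_idx` follow the commit text; `struct rx_state` and `rx_append()` are hypothetical, and the check is written as a subtraction so that the addition itself cannot wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive state; field names follow the commit text. */
struct rx_state {
	uint8_t *rxbuf;
	size_t rxbuf_size;
	size_t rxbuf_idx;
};

/*
 * Append len bytes from src. Returns 0 on success, or -1 if the chunk
 * would not fit, in which case nothing is written.
 */
static int rx_append(struct rx_state *s, const uint8_t *src, size_t len)
{
	assert(s->rxbuf_idx <= s->rxbuf_size);

	if (len > s->rxbuf_size - s->rxbuf_idx)
		return -1;

	memcpy(s->rxbuf + s->rxbuf_idx, src, len);
	s->rxbuf_idx += len;
	return 0;
}
```

A 60-byte chunk offered to a 4-byte buffer is rejected and the write index is left unchanged.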
9 daysHID: quirks: really enable the intended work around for appledisplayLukas Bulwahn
Commit c7fabe4ad921 ("HID: quirks: work around VID/PID conflict for appledisplay") intends to add a quirk for kernels built with Apple Cinema Display support, but it refers to the non-existing config option CONFIG_APPLEDISPLAY, whereas the config option for Apple Cinema Display support is named CONFIG_USB_APPLEDISPLAY. Refer to the intended config option CONFIG_USB_APPLEDISPLAY in the ifdef directive. Fixes: c7fabe4ad921 ("HID: quirks: work around VID/PID conflict for appledisplay") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 daysHID: hid-sjoy: race between init and usageOliver Neukum
The driver uses an initial IO to set the device to a default state. That initialization is currently being done after the device node has been created. That means that the single buffer used for output can be altered while IO is in progress. Move the initialization before announcement to user space. Fixes: fac733f029251 ("HID: force feedback support for SmartJoy PLUS PS2/USB adapter") Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
9 daysnet: ethernet: cs89x0: remove stale CONFIG_MACH_MX31ADS referenceEthan Nelson-Moore
The legacy ARM board file for MACH_MX31ADS was removed in commit c93197b0041d ("ARM: imx: Remove i.MX31 board files"), but a reference to it remained in the cs89x0 driver. Drop this unused code. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Fixes: c93197b0041d ("ARM: imx: Remove i.MX31 board files") Link: https://patch.msgid.link/20260509023732.42256-1-enelsonmoore@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysnet: ethernet: cortina: Carry over frag counterLinus Walleij
The gmac_rx() NAPI poll function assembles packets in an SKB from a ring buffer. If the ring buffer gets completely emptied during a poll cycle, we exit gmac_rx(), but the packet is not yet completely assembled in the SKB, yet the fragment counter frag_nr is reset to zero on the next invocation. Solve this by making the RX fragment counter a part of the port struct, and carry it over between invocations. Reset the fragment counter only right after calling napi_gro_frags(), on error (after calling napi_free_frags()) or if stopping the port. Reset it in some place where not strictly necessary just to emphasize what is going on. This was found by Sashiko during normal patch review. Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-3-6c5d20ddc35b@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysnet: ethernet: cortina: Make RX SKB per-portLinus Walleij
The SKB used to assemble packets from fragments in gmac_rx() is static local, but the Gemini has two ethernet ports, meaning there can be races between the ports on a bad day if a device is using both. Make the RX SKB a per-port variable and carry it over between invocations in the port struct instead. Zero the pointer once we call napi_gro_frags(), on error (after calling napi_free_frags()) or if the port is stopped. Zero it in some place where not strictly necessary just to emphasize what is going on. This was found by Sashiko during normal patch review. Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-2-6c5d20ddc35b@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
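The per-port carry-over used by this commit and the neighbouring frag-counter commit can be illustrated with a toy model. This is a sketch under simplifying assumptions: struct port and rx_poll() are hypothetical, and a "packet" is reduced to a fixed number of fragments.

```c
#include <assert.h>

/* Hypothetical per-port RX state that survives across poll invocations. */
struct port {
	unsigned int frag_nr;	/* fragments collected for the packet in progress */
};

/*
 * Consume up to avail fragments from the ring. Return 1 when a packet
 * made of need fragments has been completed, 0 if the ring ran dry first.
 */
static int rx_poll(struct port *p, unsigned int avail, unsigned int need)
{
	while (avail--) {
		if (++p->frag_nr == need) {
			p->frag_nr = 0;	/* packet handed off, start over */
			return 1;
		}
	}
	return 0;	/* ring emptied mid-packet: frag_nr is kept */
}
```

Because the counter lives in the port structure rather than in a local or static variable, two ports polled alternately do not disturb each other's partially assembled packet.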
9 daysnet: ethernet: cortina: No mapping is a dropped rxLinus Walleij
Increase stats.rx_dropped++ even if this is the first fragment (skb == NULL) so we are doing proper accounting. Fixes: b266bacba796 ("net: ethernet: cortina: Drop half-assembled SKB") Link: https://sashiko.dev/#/patchset/20260505-gemini-ethernet-fix-v2-1-997c31d06079%40kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260509-gemini-ethernet-fixes-v1-1-6c5d20ddc35b@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
10 daysdrm/i915/dp: Fix VSC dynamic range signaling for RGB formatsChaitanya Kumar Borah
For RGB, set dynamic_range to CTA or VESA based on crtc_state->limited_color_range so sinks apply correct quantization. YCbCr remains limited (CTA) range. (DP v1.4, Table 5-1) v2: - Added Reported-by and Tested-by tags v3: - Add back YCbCr comment(Suraj) Cc: stable@vger.kernel.org #v5.8+ Reported-by: DeepChirp <DeepChirp@outlook.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/15874 Tested-by: DeepChirp <DeepChirp@outlook.com> Fixes: 9799c4c3b76e ("drm/i915/dp: Add compute routine for DP VSC SDP") Assisted-by: GitHub-Copilot:GPT-5.4 Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260505090920.2479112-1-chaitanya.kumar.borah@intel.com (cherry picked from commit 38e10ddae6f8d42a2e8437fcd25a1cac51106c64) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
10 daysdrm/i915: skip __i915_request_skip() for already signaled requestsSebastian Brzezinka
After a GPU reset the HWSP is zeroed, so previously completed requests appear incomplete. If such a request is picked up during reset_rewind() and marked guilty, i915_request_set_error_once() returns early (fence already signaled), leaving fence.error without a fatal error code. The subsequent __i915_request_skip() then hits: ``` GEM_BUG_ON(!fatal_error(rq->fence.error)) ``` Fixes a kernel BUG observed on Sandy Bridge (Gen6) during heartbeat-triggered engine resets. ``` kernel BUG at drivers/gpu/drm/i915/i915_request.c:556! RIP: __i915_request_skip+0x15e/0x1d0 [i915] ... __i915_request_reset+0x212/0xa70 [i915] reset_rewind+0xe4/0x280 [i915] intel_gt_reset+0x30d/0x5b0 [i915] heartbeat+0x516/0x530 [i915] ``` Guard __i915_request_skip() with i915_request_signaled(): if the fence is already signaled, the ring content is committed and there is nothing left to skip. Fixes: 36e191f0644b ("drm/i915: Apply i915_request_skip() on submission") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/13729 Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Cc: stable@vger.kernel.org # v5.7+ Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/fe76921d35b6ae85aa651822726d0d9815aa5362.1776339012.git.sebastian.brzezinka@intel.com (cherry picked from commit 5ba54393dcd7adf75a9f39f5a933b1538349cad5) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
10 daysdrm/bridge: imx8qxp-pxl2dpi: avoid ERR_PTR with device_node cleanupGuangshuo Li
imx8qxp_pxl2dpi_get_available_ep_from_port() returns ERR_PTR() on errors. imx8qxp_pxl2dpi_find_next_bridge() stores its return value in a __free(device_node) variable before checking IS_ERR(). When the function returns on the error path, the cleanup action calls of_node_put() on the ERR_PTR() value. Do not let a device_node cleanup variable hold error pointers. Change imx8qxp_pxl2dpi_get_available_ep_from_port() to return an int and pass the endpoint node through an output argument. Initialize the output argument to NULL so callers hold either NULL on error paths or a valid device_node pointer on successful path. Fixes: ceea3f7806a10 ("drm/bridge: imx8qxp-pxl2dpi: simplify put of device_node pointers") Cc: stable@vger.kernel.org Reviewed-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260507100604.667731-1-lgs201920130244@gmail.com Signed-off-by: Liu Ying <victor.liu@nxp.com>
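The "return an int, pass the node through an output argument" pattern can be sketched in userspace C with the GCC/Clang cleanup attribute, which is the mechanism behind the kernel's __free() helpers. Everything below (struct node, node_put(), get_endpoint(), find_next()) is a hypothetical stand-in, not the bridge driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct node {
	int refcount;
};

static int put_calls;	/* counts puts performed on real nodes */

static void node_put(struct node *n)
{
	if (n) {
		n->refcount--;
		put_calls++;
	}
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void node_cleanup(struct node **p)
{
	node_put(*p);
}

/* Return 0 and a referenced node in *out, or a negative errno and NULL. */
static int get_endpoint(struct node *candidate, struct node **out)
{
	*out = NULL;	/* callers never see an error pointer */
	if (!candidate)
		return -ENODEV;
	candidate->refcount++;
	*out = candidate;
	return 0;
}

static int find_next(struct node *candidate)
{
	struct node *ep __attribute__((cleanup(node_cleanup))) = NULL;
	int ret = get_endpoint(candidate, &ep);

	if (ret)
		return ret;	/* cleanup runs on NULL, which is harmless */
	return 0;		/* cleanup drops the reference taken above */
}
```

The design point is that the cleanup variable only ever holds NULL or a valid pointer, so the automatic put never dereferences an encoded error value.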
10 daysqed: fix division by zero in qed_init_wfq_param when all vports are configuredEvgenii Burenchev
In qed_init_wfq_param(), variable non_requested_count can become zero when the number of vports with the configured flag set (including the current vport being configured) equals total num_vports. This happens when configuring the last unconfigured vport or when re-configuring an already configured vport. The function then calculates left_rate_per_vp = total_left_rate / non_requested_count, which causes division by zero. Fix this by skipping the division when non_requested_count is zero. In that case, there is no remaining bandwidth to distribute, so just record the configuration for the current vport and return success. Fixes: bcd197c81f63 ("qed: Add vport WFQ configuration APIs") Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru> Link: https://patch.msgid.link/20260507145520.23106-1-evg28bur@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
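The guard can be sketched as a small standalone helper. This is an assumption-laden illustration: left_rate_per_vp() is a hypothetical function that only models the division, not the full qed_init_wfq_param() logic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: share the leftover rate among the vports that
 * have not requested a rate. Returns 0 when no such vport remains,
 * instead of dividing by zero.
 */
static uint32_t left_rate_per_vp(uint32_t total_left_rate,
				 uint32_t non_requested_count)
{
	if (non_requested_count == 0)
		return 0;	/* nothing left to distribute */

	return total_left_rate / non_requested_count;
}
```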
10 daysnet: ena: PHC: Check return code before setting timestamp outputArthur Kiyanovski
ena_phc_gettimex64() is setting the output parameter regardless of whether ena_com_phc_get_timestamp() succeeded or failed. When ena_com_phc_get_timestamp() returns an error, the timestamp parameter may contain uninitialized stack memory (e.g., when PHC is disabled or in blocked state) or invalid hardware values. Passing these to userspace via the PTP ioctl is both a security issue (information leak) and a correctness bug. Fix by checking the return code after releasing the lock and only setting the output timestamp on success. Fixes: e0ea34158ee8 ("net: ena: Add PHC support in the ENA driver") Cc: stable@vger.kernel.org Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260507003518.22554-1-akiyano@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
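A minimal userspace sketch of "check the return code before touching the output", assuming a simplified reader. read_hw_timestamp() and get_time() are hypothetical names, and the constant timestamp is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware read; fails when disabled. */
static int read_hw_timestamp(int enabled, uint64_t *ts)
{
	if (!enabled)
		return -EOPNOTSUPP;	/* *ts is deliberately left untouched */
	*ts = 123456789ULL;
	return 0;
}

static int get_time(int enabled, uint64_t *out)
{
	uint64_t ts;	/* holds garbage unless the read succeeds */
	int rc = read_hw_timestamp(enabled, &ts);

	if (rc)
		return rc;	/* never copy ts to the caller on failure */

	*out = ts;
	return 0;
}
```

On failure the caller's buffer keeps its previous contents, so no stack garbage can reach it.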
10 daysdrm/amdgpu/gfx_v12_0: set gfx.rs64_enable from PFP header on GFX12Jesse Zhang
gfx_v12_0_init_microcode() always loads RS64 CP ucode but never sets adev->gfx.rs64_enable, so it stayed false and code that branches on it (e.g. MEC pipe reset) used the legacy CP_MEC_CNTL path incorrectly. Match GFX11: derive RS64 mode from the PFP firmware header (v2.0) via amdgpu_ucode_hdr_version(). Log at debug when RS64 is enabled. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b03d53598b0d2048e8fa7303b8d0784768ec4fa6)
10 daysdrm/amd/ras: Fix CPER ring debugfs read overflowXiang Liu
The legacy CPER debugfs reader can reach the payload path without a valid pointer snapshot. The remaining user byte count is also treated as the ring occupancy in dwords, so reads past the header can copy more than requested. Take the CPER lock before sampling pointers. Resample rptr/wptr for payload reads, bound the payload copy by available dwords and the remaining user size, and advance the file position for each dword copied. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1e40ef87ffdc291e05ccdade8b9170cc9c1c4249)
10 daysdrm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLEDMikhail Gavrilov
[Why] dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with DC_FP_START()/DC_FP_END(). In x86 non-RT, DC_FP_START takes fpregs_lock(), which disables local softirqs. The DML1 path through dcn32_enable_phantom_plane() calls kvzalloc() to allocate ~335 KiB for dc_plane_state. This triggers the vmalloc path, which calls BUG_ON(in_interrupt()) because it's invoked within the FPU-enabled (softirq disabled) region, leading to a kernel crash. [How] Wrap the dc_state_create_phantom_plane() call with the DC_RUN_WITH_PREEMPTION_ENABLED() macro to allow preemption during this memory allocation. Fixes: 235c67634230 ("drm/amd/display: add DCN32/321 specific files for Display Core") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4470 Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: James Lin <pinglei.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 885ccbef7b94a8b38f69c4211c679021aa27ad11) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu: fix userq hang detection and resetChristian König
Fix lock inversions pointed out by Prike and Sunil. The hang detection timeout *CAN'T* grab locks under which we wait for fences, especially not the userq_mutex lock. Then instead of this completely broken handling with the hang_detect_fence just cancel the work when fences are processed and re-start if necessary. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)
10 daysdrm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queuesChristian König
Well the reset handling seems broken on multiple levels. As first step of fixing this remove most calls to the hang detection. That function should only be called after we run into a timeout! And *NOT* as random check spread over the code in multiple places. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)
10 daysdrm/amdgpu: rework amdgpu_userq_signal_ioctl v3Christian König
This one was fortunately not looking as bad as the wait ioctl path, but there were still a few things which could be fixed/improved: 1. Allocating with GFP_ATOMIC was quite unnecessary, we can do that before taking the userq_lock. 2. Use a new mutex as protection for the fence_drv_xa so that we can do memory allocations while holding it. 3. Starting the reset timer is unnecessary when the fence is already signaled when we create it. 4. Clean up error handling, avoid trying to free the queue when we didn't even get one. v2: fix incorrect usage of xa_find, destroy the new mutex on error v3: cleanup ref ordering Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)
10 daysdrm/amdgpu: remove deadlocks from amdgpu_userq_pre_resetChristian König
The purpose of a GPU reset is to make sure that fences can be signaled again and the signal and resume workers can make progress again. So waiting for the resume worker or any fence in the GPU reset path is just utter nonsense. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fcd5f065eab46993af43442fd77ee8d9eb9c5bdf)
10 daysdrm/xe/dma-buf: fix UAF with retry loopMatthew Auld
Retry doesn't work here, since bo will be freed on error, leading to UAF. However, now that we do the alloc & init before the attach, we can now combine this as one unit and have the init do the alloc for us. This should make the retry safe. Reported by Sashiko. v2: Fix up the error unwind (CI) Closes: https://sashiko.dev/#/patchset/20260506184332.86743-2-matthew.auld%40intel.com Fixes: eb289a5f6cc6 ("drm/xe: Convert xe_dma_buf.c for exhaustive eviction") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260508102635.149172-4-matthew.auld@intel.com (cherry picked from commit 479669418253e0f27f8cf5db01a731352ea592e7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
10 daysdrm/xe/dma-buf: handle empty bo and UAF racesMatthew Auld
There look to be some nasty races here when triggering the invalidate_mappings hook: 1) We do xe_bo_alloc() followed by the attach, before the actual full bo init step in xe_dma_buf_init_obj(). However the bo is visible on the attachments list after the attach. This is bad since exporter driver, say amdgpu, can at any time call back into our invalidate_mappings hook, with an empty/bogus bo, leading to potential bugs/crashes. 2) Similar to 1) but here we get a UAF, when the invalidate_mappings hook is triggered. For example, we get as far as xe_bo_init_locked() but this fails in some way. But here the bo will be freed on error, but we still have it attached from dma-buf pov, so if the invalidate_mappings is now triggered then the bo we access is gone and we trigger UAF and more bugs/crashes. To fix this, move the attach step until after we actually have a fully set up buffer object. Note that the bo is not published to userspace until later, so not sure what the comment "Don't publish the bo until we have a valid attachment", is referring to. We have at least two different customers reporting hitting a NULL ptr deref in evict_flags when importing something from amdgpu, followed by triggering the evict flow. Hit rate is also pretty low, which would hint at some kind of race, so something like 1) or 2) might explain this. 
v2: - Shuffle the order of the ops slightly (no functional change) - Improve the comment to better explain the ordering (Matt B) Assisted-by: Gemini:gemini-3 #debug Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7903 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/4055 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260508102635.149172-3-matthew.auld@intel.com (cherry picked from commit af1f2ad0c59fe4e2f924c526f66e968289d77971) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
10 daysdrm/xe: Make decision to use Xe2-style blitter instructions a feature flagMatt Roper
The blitter engines' MEM_COPY and MEM_SET instructions were added as part of the same hardware change that introduced service copy engines (i.e., BCS1-BCS8) which is why the driver checks for service copy engine presence when deciding whether to use these instructions or the older XY_* instructions. However when making this decision the driver should consider which engines are part of the hardware architecture, not which engines are present/usable on the current device. For graphics IP versions that architecturally include service copy engines (i.e., everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and MEM_COPY even if all of the service copy engines wind up getting fused off. I.e., we need to decide based on whether the platform's graphics descriptor contains these engines, rather than whether the usable engine mask contains them. This logic got broken when gt->info.__engine_mask was removed, although in practice that mistake has been harmless so far because there haven't been any hardware SKUs that fuse off all of the service copy engines yet. Replace the incorrect has_service_copy_support() function with a GT feature flag that tracks more accurately whether the new blitter instructions are usable. In addition to fixing incorrect logic if all service copies are fused off, the flag also makes it more obvious what the calling code is trying to do; previously it wasn't terribly obvious why "has service copy engines" was being used as the condition for using different instructions on all copy engine types. The new feature flag is named 'has_xe2_blt_instructions' because we expect this flag to be set for all Xe2 and later platforms (i.e., everything officially supported by the Xe driver). Technically there's also one Xe1-era platform (PVC) that supports these engines/instructions and will set this flag, but this still seems to be the most clear and understandable name for the flag. 
Fixes: 61549a2ee594 ("drm/xe: Drop __engine_mask") Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 09b399842907565a64e351fb22da790b4c673ffb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
10 daysdrm/xe/madvise: Track purgeability with BO-local countersArvind Yadav
xe_bo_recompute_purgeable_state() walks all VMAs of a BO to determine whether the BO can be made purgeable. This makes VMA create/destroy and madvise updates O(n) in the number of mappings. Replace the walk with BO-local counters protected by the BO dma-resv lock: - vma_count tracks the number of VMAs mapping the BO. - willneed_count tracks active WILLNEED holders, including WILLNEED VMAs and active dma-buf exports for non-imported BOs. A DONTNEED BO is promoted back to WILLNEED on a 0->1 transition of willneed_count. A BO is demoted to DONTNEED on a 1->0 transition only when it still has VMAs, preserving the previous behaviour where a BO with no mappings keeps its current madvise state. PURGED remains terminal, preserving the existing "once purged, always purged" rule. Fixes: 4f44961eab84 ("drm/xe/vm: Prevent binding of purged buffer objects") v2: - Use early return for imported BOs in all four helpers to avoid nesting (Matt B). - Group purgeability state into a purgeable sub-struct on struct xe_bo (Matt B). - Reword xe_bo_willneed_put_locked() kernel-doc to explain that a 1->0 transition means all remaining active VMAs are DONTNEED (Matt B). v3: - Move DONTNEED/PURGED reject from vma_lock_and_validate() into xe_vma_create(), gated on attr->purgeable_state == WILLNEED. Fixes vm_bind bypass and partial-unbind rejection on DONTNEED BOs (Matt B). - Drop .check_purged from MAP and REMAP; keep it for PREFETCH and add a comment why (Matt B). - Skip BO validation in vma_lock_and_validate() for non-WILLNEED VMA remnants so cleanup/remap paths do not repopulate DONTNEED/PURGED BOs. 
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260506132027.2556046-1-arvind.yadav@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> (cherry picked from commit 23fb2ea56cb4fa2587bc072b04e4e698687a48e4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
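The counter transitions described in this commit can be modelled in a few lines of userspace C. This is a sketch only: struct bo, the enum and the helper names are hypothetical simplifications, locking is omitted, and the dma-buf export and imported-BO cases are not modelled.

```c
#include <assert.h>

enum madv_state { WILLNEED, DONTNEED, PURGED };

/* Hypothetical, simplified buffer object. */
struct bo {
	enum madv_state state;
	unsigned int vma_count;		/* VMAs mapping the BO */
	unsigned int willneed_count;	/* active WILLNEED holders */
};

static void willneed_get(struct bo *bo)
{
	/* 0->1 transition promotes DONTNEED back to WILLNEED. */
	if (bo->willneed_count++ == 0 && bo->state == DONTNEED)
		bo->state = WILLNEED;
}

static void willneed_put(struct bo *bo)
{
	/* 1->0 transition demotes only while mappings still exist. */
	if (--bo->willneed_count == 0 && bo->vma_count > 0 &&
	    bo->state == WILLNEED)
		bo->state = DONTNEED;
}
```

Both helpers run in constant time, and because each transition tests the current state explicitly, PURGED is never left once entered.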
10 daysnvme: fix race condition between connected uevent and STARTED_ONCE flagMaurizio Lombardi
When a controller connects, nvme_start_ctrl() emits the "NVME_EVENT=connected" uevent and sets the NVME_CTRL_STARTED_ONCE flag. Currently, the uevent is emitted before the flag is set. This creates a race condition for userspace tools (like udev rules) that might rely on the "connected" event to configure other attributes. Swap the order of operations in nvme_start_ctrl() so that the NVME_CTRL_STARTED_ONCE flag is set before the uevent is sent. This guarantees that the admin_timeout can already be changed when userspace is notified. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
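The ordering fix amounts to "publish state, then notify". A toy single-threaded model, assuming a listener that inspects the flag from inside the event; struct ctrl, emit_connected() and start_ctrl() are hypothetical names, not the NVMe core's API.

```c
#include <assert.h>
#include <stdbool.h>

struct ctrl {
	bool started_once;
	bool seen_by_listener;	/* flag value observed while handling the event */
};

/* Stand-in for a userspace listener reacting to the "connected" event. */
static void emit_connected(struct ctrl *c)
{
	c->seen_by_listener = c->started_once;
}

static void start_ctrl(struct ctrl *c)
{
	c->started_once = true;	/* publish the state first */
	emit_connected(c);	/* then notify observers */
}
```

With the two statements in start_ctrl() swapped, the listener would record false, which is the race described above.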
10 daysACPI: driver: Check ACPI_COMPANION() against NULL during probeRafael J. Wysocki
Since every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), platform drivers that rely on the existence of a device's ACPI companion object should verify its presence. Accordingly, add requisite ACPI_COMPANION() or ACPI_HANDLE() checks against NULL to 13 platform drivers handling core ACPI devices. Also change the value returned by the ACPI thermal zone driver when the device's ACPI companion is not present to -ENODEV for consistency with the other drivers. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/4516068.ejJDZkT8p0@rafael.j.wysocki Cc: 7.0+ <stable@vger.kernel.org> # 7.0+