summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-10-29net: add ndo_fdb_del_bulkNikolay Aleksandrov
[ Upstream commit 1306d5362a591493a2d07f685ed2cc480dcda320 ] Add a new netdev op called ndo_fdb_del_bulk, it will be later used for driver-specific bulk delete implementation dispatched from rtnetlink. The first user will be the bridge, we need it to signal to rtnetlink from the driver that we support bulk delete operation (NLM_F_BULK). Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: bf29555f5bdc ("rtnetlink: Allow deleting FDB entries in user namespace") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-29iio: frequency: adf4350: Fix ADF4350_REG3_12BIT_CLKDIV_MODEMichael Hennerich
commit 1d8fdabe19267338f29b58f968499e5b55e6a3b6 upstream. The clk div bits (2 bits wide) do not start in bit 16 but in bit 15. Fix it accordingly. Fixes: e31166f0fd48 ("iio: frequency: New driver for Analog Devices ADF4350/ADF4351 Wideband Synthesizers") Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Nuno Sá <nuno.sa@analog.com> Link: https://patch.msgid.link/20250829-adf4350-fix-v2-2-0bf543ba797d@analog.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-29driver core/PM: Set power.no_callbacks along with power.no_pmRafael J. Wysocki
commit c2ce2453413d429e302659abc5ace634e873f6f5 upstream. Devices with power.no_pm set are not expected to need any power management at all, so modify device_set_pm_not_required() to set power.no_callbacks for them too in case runtime PM will be enabled for any of them (which in principle may be done for convenience if such a device participates in a dependency chain). Since device_set_pm_not_required() must be called before device_add() or it would not have any effect, it can update power.no_callbacks without locking, unlike pm_runtime_no_callbacks() that can be called after registering the target device. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable <stable@kernel.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/1950054.tdWV9SEqCh@rafael.j.wysocki Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02genirq: Provide new interfaces for affinity hintsThomas Gleixner
[ Upstream commit 65c7cdedeb3026fabcc967a7aae2f755ad4d0783 ] The discussion about removing the side effect of irq_set_affinity_hint() of actually applying the cpumask (if not NULL) as affinity to the interrupt, unearthed a few unpleasantries: 1) The modular perf drivers rely on the current behaviour for the very wrong reasons. 2) While none of the other drivers prevents user space from changing the affinity, a cursorily inspection shows that there are at least expectations in some drivers. #1 needs to be cleaned up anyway, so that's not a problem #2 might result in subtle regressions especially when irqbalanced (which nowadays ignores the affinity hint) is disabled. Provide new interfaces: irq_update_affinity_hint() - Only sets the affinity hint pointer irq_set_affinity_and_hint() - Set the pointer and apply the affinity to the interrupt Make irq_set_affinity_hint() a wrapper around irq_apply_affinity_hint() and document it to be phased out. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210501021832.743094-1-jesse.brandeburg@intel.com Link: https://lore.kernel.org/r/20210903152430.244937-2-nitesh@redhat.com Stable-dep-of: 915470e1b44e ("i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-02genirq: Export affinity setter for modulesThomas Gleixner
[ Upstream commit 4d80d6ca5d77fde9880da8466e5b64f250e5bf82 ] Perf modules abuse irq_set_affinity_hint() to set the affinity of system PMU interrupts just because irq_set_affinity() was not exported. The fact that irq_set_affinity_hint() actually sets the affinity is a non-documented side effect and the name is clearly saying it's a hint. To clean this up, export the real affinity setter. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210518093117.968251441@linutronix.de Stable-dep-of: 915470e1b44e ("i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-02genirq/affinity: Add irq_update_affinity_desc()John Garry
[ Upstream commit 1d3aec89286254487df7641c30f1b14ad1d127a5 ] Add a function to allow the affinity of an interrupt be switched to managed, such that interrupts allocated for platform devices may be managed. This new interface has certain limitations, and attempts to use it in the following circumstances will fail: - For when the kernel is configured for generic IRQ reservation mode (in config GENERIC_IRQ_RESERVATION_MODE). The reason being that it could conflict with managed vs. non-managed interrupt accounting. - The interrupt is already started, which should not be the case during init - The interrupt is already configured as managed, which means double init Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1606905417-183214-2-git-send-email-john.garry@huawei.com Stable-dep-of: 915470e1b44e ("i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04atm: atmtcp: Prevent arbitrary write in atmtcp_recv_control().Kuniyuki Iwashima
[ Upstream commit ec79003c5f9d2c7f9576fc69b8dbda80305cbe3a ] syzbot reported the splat below. [0] When atmtcp_v_open() or atmtcp_v_close() is called via connect() or close(), atmtcp_send_control() is called to send an in-kernel special message. The message has ATMTCP_HDR_MAGIC in atmtcp_control.hdr.length. Also, a pointer of struct atm_vcc is set to atmtcp_control.vcc. The notable thing is struct atmtcp_control is uAPI but has a space for an in-kernel pointer. struct atmtcp_control { struct atmtcp_hdr hdr; /* must be first */ ... atm_kptr_t vcc; /* both directions */ ... } __ATM_API_ALIGN; typedef struct { unsigned char _[8]; } __ATM_API_ALIGN atm_kptr_t; The special message is processed in atmtcp_recv_control() called from atmtcp_c_send(). atmtcp_c_send() is vcc->dev->ops->send() and called from 2 paths: 1. .ndo_start_xmit() (vcc->send() == atm_send_aal0()) 2. vcc_sendmsg() The problem is sendmsg() does not validate the message length and userspace can abuse atmtcp_recv_control() to overwrite any kptr by atmtcp_control. Let's add a new ->pre_send() hook to validate messages from sendmsg(). [0]: Oops: general protection fault, probably for non-canonical address 0xdffffc00200000ab: 0000 [#1] SMP KASAN PTI KASAN: probably user-memory-access in range [0x0000000100000558-0x000000010000055f] CPU: 0 UID: 0 PID: 5865 Comm: syz-executor331 Not tainted 6.17.0-rc1-syzkaller-00215-gbab3ce404553 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:atmtcp_recv_control drivers/atm/atmtcp.c:93 [inline] RIP: 0010:atmtcp_c_send+0x1da/0x950 drivers/atm/atmtcp.c:297 Code: 4d 8d 75 1a 4c 89 f0 48 c1 e8 03 42 0f b6 04 20 84 c0 0f 85 15 06 00 00 41 0f b7 1e 4d 8d b7 60 05 00 00 4c 89 f0 48 c1 e8 03 <42> 0f b6 04 20 84 c0 0f 85 13 06 00 00 66 41 89 1e 4d 8d 75 1c 4c RSP: 0018:ffffc90003f5f810 EFLAGS: 00010203 RAX: 00000000200000ab RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88802a510000 RSI: 00000000ffffffff RDI: ffff888030a6068c RBP: ffff88802699fb40 R08: ffff888030a606eb R09: 1ffff1100614c0dd R10: dffffc0000000000 R11: ffffffff8718fc40 R12: dffffc0000000000 R13: ffff888030a60680 R14: 000000010000055f R15: 00000000ffffffff FS: 00007f8d7e9236c0(0000) GS:ffff888125c1c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 0000000075bde000 CR4: 00000000003526f0 Call Trace: <TASK> vcc_sendmsg+0xa10/0xc60 net/atm/common.c:645 sock_sendmsg_nosec net/socket.c:714 [inline] __sock_sendmsg+0x219/0x270 net/socket.c:729 ____sys_sendmsg+0x505/0x830 net/socket.c:2614 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2668 __sys_sendmsg net/socket.c:2700 [inline] __do_sys_sendmsg net/socket.c:2705 [inline] __se_sys_sendmsg net/socket.c:2703 [inline] __x64_sys_sendmsg+0x19b/0x260 net/socket.c:2703 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f8d7e96a4a9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f8d7e923198 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f8d7e9f4308 RCX: 00007f8d7e96a4a9 RDX: 0000000000000000 RSI: 0000200000000240 RDI: 0000000000000005 RBP: 00007f8d7e9f4300 R08: 65732f636f72702f R09: 65732f636f72702f R10: 65732f636f72702f R11: 0000000000000246 R12: 00007f8d7e9c10ac R13: 00007f8d7e9231a0 R14: 0000200000000200 R15: 0000200000000250 </TASK> Modules linked in: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+1741b56d54536f4ec349@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68a6767c.050a0220.3d78fd.0011.GAE@google.com/ Tested-by: syzbot+1741b56d54536f4ec349@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250821021901.2814721-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04net/atm: remove the atmdev_ops {get, set}sockopt methodsChristoph Hellwig
[ Upstream commit a06d30ae7af492497ffbca6abf1621d508b8fcaa ] All implementations of these two methods are dummies that always return -EINVAL. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: ec79003c5f9d ("atm: atmtcp: Prevent arbitrary write in atmtcp_recv_control().") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28nfs: fix UAF in direct writesJosef Bacik
commit 17f46b803d4f23c66cacce81db35fef3adb8f2af upstream. In production we have been hitting the following warning consistently ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0 Workqueue: nfsiod nfs_direct_write_schedule_work [nfs] RIP: 0010:refcount_warn_saturate+0x9c/0xe0 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x9f/0x130 ? refcount_warn_saturate+0x9c/0xe0 ? report_bug+0xcc/0x150 ? handle_bug+0x3d/0x70 ? exc_invalid_op+0x16/0x40 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0x9c/0xe0 nfs_direct_write_schedule_work+0x237/0x250 [nfs] process_one_work+0x12f/0x4a0 worker_thread+0x14e/0x3b0 ? ZSTD_getCParams_internal+0x220/0x220 kthread+0xdc/0x120 ? __btf_name_valid+0xa0/0xa0 ret_from_fork+0x1f/0x30 This is because we're completing the nfs_direct_request twice in a row. The source of this is when we have our commit requests to submit, we process them and send them off, and then in the completion path for the commit requests we have if (nfs_commit_end(cinfo.mds)) nfs_direct_write_complete(dreq); However since we're submitting asynchronous requests we sometimes have one that completes before we submit the next one, so we end up calling complete on the nfs_direct_request twice. The only other place we use nfs_generic_commit_list() is in __nfs_commit_inode, which wraps this call in a nfs_commit_begin(); nfs_commit_end(); Which is a common pattern for this style of completion handling, one that is also repeated in the direct code with get_dreq()/put_dreq() calls around where we process events as well as in the completion paths. Fix this by using the same pattern for the commit requests. Before with my 200 node rocksdb stress running this warning would pop every 10ish minutes. With my patch the stress test has been running for several hours without popping. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> [ chanho : Backports v5.4.y, commit 133a48abf6ec (NFS: Fix up commit deadlocks) is needed to use nfs_commit_end ] Signed-off-by: Chanho Min <chanho.min@lge.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28NFS: Fix up commit deadlocksTrond Myklebust
commit 133a48abf6ecc535d7eddc6da1c3e4c972445882 upstream. If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end(). Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chanho Min <chanho.min@lge.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28mm: update memfd seal write check to include F_SEAL_WRITELorenzo Stoakes
[ Upstream commit 28464bbb2ddc199433383994bcb9600c8034afa1 ] The seal_check_future_write() function is called by shmem_mmap() or hugetlbfs_file_mmap() to disallow any future writable mappings of an memfd sealed this way. The F_SEAL_WRITE flag is not checked here, as that is handled via the mapping->i_mmap_writable mechanism and so any attempt at a mapping would fail before this could be run. However we intend to change this, meaning this check can be performed for F_SEAL_WRITE mappings also. The logic here is equally applicable to both flags, so update this function to accommodate both and rename it accordingly. Link: https://lkml.kernel.org/r/913628168ce6cce77df7d13a63970bae06a526e0.1697116581.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28mm: drop the assumption that VM_SHARED always implies writableLorenzo Stoakes
[ Upstream commit e8e17ee90eaf650c855adb0a3e5e965fd6692ff1 ] Patch series "permit write-sealed memfd read-only shared mappings", v4. The man page for fcntl() describing memfd file seals states the following about F_SEAL_WRITE:- Furthermore, trying to create new shared, writable memory-mappings via mmap(2) will also fail with EPERM. With emphasis on 'writable'. In turns out in fact that currently the kernel simply disallows all new shared memory mappings for a memfd with F_SEAL_WRITE applied, rendering this documentation inaccurate. This matters because users are therefore unable to obtain a shared mapping to a memfd after write sealing altogether, which limits their usefulness. This was reported in the discussion thread [1] originating from a bug report [2]. This is a product of both using the struct address_space->i_mmap_writable atomic counter to determine whether writing may be permitted, and the kernel adjusting this counter when any VM_SHARED mapping is performed and more generally implicitly assuming VM_SHARED implies writable. It seems sensible that we should only update this mapping if VM_MAYWRITE is specified, i.e. whether it is possible that this mapping could at any point be written to. If we do so then all we need to do to permit write seals to function as documented is to clear VM_MAYWRITE when mapping read-only. It turns out this functionality already exists for F_SEAL_FUTURE_WRITE - we can therefore simply adapt this logic to do the same for F_SEAL_WRITE. We then hit a chicken and egg situation in mmap_region() where the check for VM_MAYWRITE occurs before we are able to clear this flag. To work around this, perform this check after we invoke call_mmap(), with careful consideration of error paths. Thanks to Andy Lutomirski for the suggestion! [1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/ [2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238 This patch (of 3): There is a general assumption that VMAs with the VM_SHARED flag set are writable. If the VM_MAYWRITE flag is not set, then this is simply not the case. Update those checks which affect the struct address_space->i_mmap_writable field to explicitly test for this by introducing [vma_]is_shared_maywrite() helper functions. This remains entirely conservative, as the lack of VM_MAYWRITE guarantees that the VMA cannot be written to. Link: https://lkml.kernel.org/r/cover.1697116581.git.lstoakes@gmail.com Link: https://lkml.kernel.org/r/d978aefefa83ec42d18dfa964ad180dbcde34795.1697116581.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28net: usbnet: Avoid potential RCU stall on LINK_CHANGE eventJohn Ernberg
commit 0d9cfc9b8cb17dbc29a98792d36ec39a1cf1395f upstream. The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link up and down events when the WWAN interface is activated on the modem-side. Interrupt URBs will in consecutive polls grab: * Link Connected * Link Disconnected * Link Connected Where the last Connected is then a stable link state. When the system is under load this may cause the unlink_urbs() work in __handle_link_change() to not complete before the next usbnet_link_change() call turns the carrier on again, allowing rx_submit() to queue new SKBs. In that event the URB queue is filled faster than it can drain, ending up in a RCU stall: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/. rcu: blocking rcu_node structures (internal RCU debug): Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 Call trace: arch_local_irq_enable+0x4/0x8 local_bh_enable+0x18/0x20 __netdev_alloc_skb+0x18c/0x1cc rx_submit+0x68/0x1f8 [usbnet] rx_alloc_submit+0x4c/0x74 [usbnet] usbnet_bh+0x1d8/0x218 [usbnet] usbnet_bh_tasklet+0x10/0x18 [usbnet] tasklet_action_common+0xa8/0x110 tasklet_action+0x2c/0x34 handle_softirqs+0x2cc/0x3a0 __do_softirq+0x10/0x18 ____do_softirq+0xc/0x14 call_on_irq_stack+0x24/0x34 do_softirq_own_stack+0x18/0x20 __irq_exit_rcu+0xa8/0xb8 irq_exit_rcu+0xc/0x30 el1_interrupt+0x34/0x48 el1h_64_irq_handler+0x14/0x1c el1h_64_irq+0x68/0x6c _raw_spin_unlock_irqrestore+0x38/0x48 xhci_urb_dequeue+0x1ac/0x45c [xhci_hcd] unlink1+0xd4/0xdc [usbcore] usb_hcd_unlink_urb+0x70/0xb0 [usbcore] usb_unlink_urb+0x24/0x44 [usbcore] unlink_urbs.constprop.0.isra.0+0x64/0xa8 [usbnet] __handle_link_change+0x34/0x70 [usbnet] usbnet_deferred_kevent+0x1c0/0x320 [usbnet] process_scheduled_works+0x2d0/0x48c worker_thread+0x150/0x1dc kthread+0xd8/0xe8 ret_from_fork+0x10/0x20 Get around the problem by delaying the carrier on to the scheduled work. This needs a new flag to keep track of the necessary action. The carrier ok check cannot be removed as it remains required for the LINK_RESET event flow. Fixes: 4b49f58fff00 ("usbnet: handle link change") Cc: stable@vger.kernel.org Signed-off-by: John Ernberg <john.ernberg@actia.se> Link: https://patch.msgid.link/20250723102526.1305339-1-john.ernberg@actia.se Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ adjust context in header ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable portsLukas Wunner
commit 6cff20ce3b92ffbf2fc5eb9e5a030b3672aa414a upstream. pci_bridge_d3_possible() is called from both pcie_portdrv_probe() and pcie_portdrv_remove() to determine whether runtime power management shall be enabled (on probe) or disabled (on remove) on a PCIe port. The underlying assumption is that pci_bridge_d3_possible() always returns the same value, else a runtime PM reference imbalance would occur. That assumption is not given if the PCIe port is inaccessible on remove due to hot-unplug: pci_bridge_d3_possible() calls pciehp_is_native(), which accesses Config Space to determine whether the port is Hot-Plug Capable. An inaccessible port returns "all ones", which is converted to "all zeroes" by pcie_capability_read_dword(). Hence the port no longer seems Hot-Plug Capable on remove even though it was on probe. The resulting runtime PM ref imbalance causes warning messages such as: pcieport 0000:02:04.0: Runtime PM usage count underflow! Avoid the Config Space access (and thus the runtime PM ref imbalance) by caching the Hot-Plug Capable bit in struct pci_dev. The struct already contains an "is_hotplug_bridge" flag, which however is not only set on Hot-Plug Capable PCIe ports, but also Conventional PCI Hot-Plug bridges and ACPI slots. The flag identifies bridges which are allocated additional MMIO and bus number resources to allow for hierarchy expansion. The kernel is somewhat sloppily using "is_hotplug_bridge" in a number of places to identify Hot-Plug Capable PCIe ports, even though the flag encompasses other devices. Subsequent commits replace these occurrences with the new flag to clearly delineate Hot-Plug Capable PCIe ports from other kinds of hotplug bridges. Document the existing "is_hotplug_bridge" and the new "is_pciehp" flag and document the (non-obvious) requirement that pci_bridge_d3_possible() always returns the same value across the entire lifetime of a bridge, including its hot-removal. Fixes: 5352a44a561d ("PCI: pciehp: Make pciehp_is_native() stricter") Reported-by: Laurent Bigonville <bigon@bigon.be> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220216 Reported-by: Mario Limonciello <mario.limonciello@amd.com> Closes: https://lore.kernel.org/r/20250609020223.269407-3-superm1@kernel.org/ Link: https://lore.kernel.org/all/20250620025535.3425049-3-superm1@kernel.org/T/#u Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Cc: stable@vger.kernel.org # v4.18+ Link: https://patch.msgid.link/fe5dcc3b2e62ee1df7905d746bde161eb1b3291c.1752390101.git.lukas@wunner.de [ Adjust surrounding documentation changes ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28net: vlan: Replace BUG() with WARN_ON_ONCE() in vlan_dev_* stubsGal Pressman
[ Upstream commit 60a8b1a5d0824afda869f18dc0ecfe72f8dfda42 ] When CONFIG_VLAN_8021Q=n, a set of stub helpers are used, three of these helpers use BUG() unconditionally. This code should not be reached, as callers of these functions should always check for is_vlan_dev() first, but the usage of BUG() is not recommended, replace it with WARN_ON() instead. Reviewed-by: Alex Lazar <alazar@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250616132626.1749331-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28netmem: fix skb_frag_address_safe with unreadable skbsMina Almasry
[ Upstream commit 4672aec56d2e8edabcb74c3e2320301d106a377e ] skb_frag_address_safe() needs a check that the skb_frag_page exists check similar to skb_frag_address(). Cc: ap420073@gmail.com Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250619175239.3039329-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28ipv6: reject malicious packets in ipv6_gso_segment()Eric Dumazet
[ Upstream commit d45cf1e7d7180256e17c9ce88e32e8061a7887fe ] syzbot was able to craft a packet with very long IPv6 extension headers leading to an overflow of skb->transport_header. This 16bit field has a limited range. Add skb_reset_transport_header_careful() helper and use it from ipv6_gso_segment() WARNING: CPU: 0 PID: 5871 at ./include/linux/skbuff.h:3032 skb_reset_transport_header include/linux/skbuff.h:3032 [inline] WARNING: CPU: 0 PID: 5871 at ./include/linux/skbuff.h:3032 ipv6_gso_segment+0x15e2/0x21e0 net/ipv6/ip6_offload.c:151 Modules linked in: CPU: 0 UID: 0 PID: 5871 Comm: syz-executor211 Not tainted 6.16.0-rc6-syzkaller-g7abc678e3084 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:skb_reset_transport_header include/linux/skbuff.h:3032 [inline] RIP: 0010:ipv6_gso_segment+0x15e2/0x21e0 net/ipv6/ip6_offload.c:151 Call Trace: <TASK> skb_mac_gso_segment+0x31c/0x640 net/core/gso.c:53 nsh_gso_segment+0x54a/0xe10 net/nsh/nsh.c:110 skb_mac_gso_segment+0x31c/0x640 net/core/gso.c:53 __skb_gso_segment+0x342/0x510 net/core/gso.c:124 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x857/0x11b0 net/core/dev.c:3950 validate_xmit_skb_list+0x84/0x120 net/core/dev.c:4000 sch_direct_xmit+0xd3/0x4b0 net/sched/sch_generic.c:329 __dev_xmit_skb net/core/dev.c:4102 [inline] __dev_queue_xmit+0x17b6/0x3a70 net/core/dev.c:4679 Fixes: d1da932ed4ec ("ipv6: Separate ipv6 offload support") Reported-by: syzbot+af43e647fd835acc02df@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/688a1a05.050a0220.5d226.0008.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250730131738.3385939-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28module: Restore the moduleparam prefix length checkPetr Pavlu
[ Upstream commit bdc877ba6b7ff1b6d2ebeff11e63da4a50a54854 ] The moduleparam code allows modules to provide their own definition of MODULE_PARAM_PREFIX, instead of using the default KBUILD_MODNAME ".". Commit 730b69d22525 ("module: check kernel param length at compile time, not runtime") added a check to ensure the prefix doesn't exceed MODULE_NAME_LEN, as this is what param_sysfs_builtin() expects. Later, commit 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.") removed this check, but there is no indication this was intentional. Since the check is still useful for param_sysfs_builtin() to function properly, reintroduce it in __module_param_call(), but in a modernized form using static_assert(). While here, clean up the __module_param_call() comments. In particular, remove the comment "Default value instead of permissions?", which comes from commit 9774a1f54f17 ("[PATCH] Compile-time check re world-writeable module params"). This comment was related to the test variable __param_perm_check_##name, which was removed in the previously mentioned commit 58f86cc89c33. Fixes: 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250630143535.267745-4-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28pps: fix poll supportDenis OSTERLAND-HEIM
[ Upstream commit 12c409aa1ec2592280a2ddcc66ff8f3c7f7bb171 ] Because pps_cdev_poll() returns unconditionally EPOLLIN, a user space program that calls select/poll get always an immediate data ready-to-read response. As a result the intended use to wait until next data becomes ready does not work. User space snippet: struct pollfd pollfd = { .fd = open("/dev/pps0", O_RDONLY), .events = POLLIN|POLLERR, .revents = 0 }; while(1) { poll(&pollfd, 1, 2000/*ms*/); // returns immediate, but should wait if(revents & EPOLLIN) { // always true struct pps_fdata fdata; memset(&fdata, 0, sizeof(memdata)); ioctl(PPS_FETCH, &fdata); // currently fetches data at max speed } } Lets remember the last fetch event counter and compare this value in pps_cdev_poll() with most recent event counter and return 0 if they are equal. Signed-off-by: Denis OSTERLAND-HEIM <denis.osterland@diehl.com> Co-developed-by: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Rodolfo Giometti <giometti@enneenne.com> Fixes: eae9d2ba0cfc ("LinuxPPS: core support") Link: https://lore.kernel.org/all/f6bed779-6d59-4f0f-8a59-b6312bd83b4e@enneenne.com/ Acked-by: Rodolfo Giometti <giometti@enneenne.com> Link: https://lore.kernel.org/r/c3c50ad1eb19ef553eca8a57c17f4c006413ab70.camel@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28usb: chipidea: introduce CI_HDRC_CONTROLLER_VBUS_EVENT glue layer usePeter Chen
[ Upstream commit d755cdb1b9d7e1b645e176b97eb137194bbe8cf9 ] Some vendors glue layer need to handle some events for vbus, eg, some i.mx platforms (imx7d, imx8mm, imx8mn, etc) needs vbus event to handle charger detection, its charger detection is finished at glue layer code, but not at USB PHY driver. Signed-off-by: Peter Chen <peter.chen@nxp.com> Stable-dep-of: b7a62611fab7 ("usb: chipidea: add USB PHY event") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17regulator: gpio: Add input_supply support in gpio_regulator_configJerome Neanne
[ Upstream commit adfdfcbdbd32b356323a3db6d3a683270051a7e6 ] This is simillar as fixed-regulator. Used to extract regulator parent from the device tree. Without that property used, the parent regulator can be shut down (if not an always on). Thus leading to inappropriate behavior: On am62-SP-SK this fix is required to avoid tps65219 ldo1 (SDMMC rail) to be shut down after boot completion. Signed-off-by: Jerome Neanne <jneanne@baylibre.com> Link: https://lore.kernel.org/r/20220929132526.29427-2-jneanne@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org> Stable-dep-of: c9764fd88bc7 ("regulator: gpio: Fix the out-of-bounds access to drvdata::gpiods") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17usb: typec: altmodes/displayport: do not index invalid pin_assignmentsRD Babiera
commit af4db5a35a4ef7a68046883bfd12468007db38f1 upstream. A poorly implemented DisplayPort Alt Mode port partner can indicate that its pin assignment capabilities are greater than the maximum value, DP_PIN_ASSIGN_F. In this case, calls to pin_assignment_show will cause a BRK exception due to an out of bounds array access. Prevent for loop in pin_assignment_show from accessing invalid values in pin_assignments by adding DP_PIN_ASSIGN_MAX value in typec_dp.h and using i < DP_PIN_ASSIGN_MAX as a loop condition. Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode") Cc: stable <stable@kernel.org> Signed-off-by: RD Babiera <rdbabiera@google.com> Reviewed-by: Badhri Jagan Sridharan <badhri@google.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20250618224943.3263103-2-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17of: Add of_property_present() helperRob Herring
[ Upstream commit 9cbad37ce8122de32a1529e394b468bc101c9e7f ] Add an of_property_present() function similar to fwnode_property_present(). of_property_read_bool() could be used directly, but it is cleaner to not use it on non-boolean properties. Reviewed-by: Frank Rowand <frowand.list@gmail.com> Tested-by: Frank Rowand <frowand.list@gmail.com> Link: https://lore.kernel.org/all/20230215215547.691573-1-robh@kernel.org/ Signed-off-by: Rob Herring <robh@kernel.org> Stable-dep-of: 171eb6f71e9e ("ASoC: meson: meson-card-utils: use of_property_present() for DT parsing") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17of: property: define of_property_read_u{8,16,32,64}_array() unconditionallyMichael Walle
[ Upstream commit 2ca42c3ad9ed875b136065b010753a4caaaa1d38 ] We can get rid of all the empty stubs because all these functions call of_property_read_variable_u{8,16,32,64}_array() which already have an empty stub if CONFIG_OF is not defined. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220118173504.2867523-3-michael@walle.cc Stable-dep-of: 171eb6f71e9e ("ASoC: meson: meson-card-utils: use of_property_present() for DT parsing") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27HID: usbhid: Eliminate recurrent out-of-bounds bug in usbhid_parse()Terry Junge
commit fe7f7ac8e0c708446ff017453add769ffc15deed upstream. Update struct hid_descriptor to better reflect the mandatory and optional parts of the HID Descriptor as per USB HID 1.11 specification. Note: the kernel currently does not parse any optional HID class descriptors, only the mandatory report descriptor. Update all references to member element desc[0] to rpt_desc. Add test to verify bLength and bNumDescriptors values are valid. Replace the for loop with direct access to the mandatory HID class descriptor member for the report descriptor. This eliminates the possibility of getting an out-of-bounds fault. Add a warning message if the HID descriptor contains any unsupported optional HID class descriptors. Reported-by: syzbot+c52569baf0c843f35495@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c52569baf0c843f35495 Fixes: f043bfc98c19 ("HID: usbhid: fix out-of-bounds bug") Cc: stable@vger.kernel.org Signed-off-by: Terry Junge <linuxhid@cosmicgizmosystems.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Terry Junge <linuxhid@cosmicgizmosystems.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27atm: Revert atm_account_tx() if copy_from_iter_full() fails.Kuniyuki Iwashima
commit 7851263998d4269125fd6cb3fdbfc7c6db853859 upstream. In vcc_sendmsg(), we account skb->truesize to sk->sk_wmem_alloc by atm_account_tx(). It is expected to be reverted by atm_pop_raw() later called by vcc->dev->ops->send(vcc, skb). However, vcc_sendmsg() misses the same revert when copy_from_iter_full() fails, and then we will leak a socket. Let's factorise the revert part as atm_return_tx() and call it in the failure path. Note that the corresponding sk_wmem_alloc operation can be found in alloc_tx() as of the blamed commit. $ git blame -L:alloc_tx net/atm/common.c c55fa3cccbc2c~ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20250614161959.GR414686@horms.kernel.org/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250616182147.963333-3-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04coredump: hand a pidfd to the usermode coredump helperChristian Brauner
commit b5325b2a270fcaf7b2a9a0f23d422ca8a5a8bdea upstream. Give userspace a way to instruct the kernel to install a pidfd into the usermode helper process. This makes coredump handling a lot more reliable for userspace. In parallel with this commit we already have systemd adding support for this in [1]. We create a pidfs file for the coredumping process when we process the corename pattern. When the usermode helper process is forked we then install the pidfs file as file descriptor three into the usermode helpers file descriptor table so it's available to the exec'd program. Since usermode helpers are either children of the system_unbound_wq workqueue or kthreadd we know that the file descriptor table is empty and can thus always use three as the file descriptor number. Note, that we'll install a pidfd for the thread-group leader even if a subthread is calling do_coredump(). We know that task linkage hasn't been removed due to delay_group_leader() and even if this @current isn't the actual thread-group leader we know that the thread-group leader cannot be reaped until @current has exited. [brauner: This is a backport for the v5.4 series. Upstream has significantly changed and backporting all that infra is a non-starter. So simply backport the pidfd_prepare() helper and waste the file descriptor we allocated. Then we minimally massage the umh coredump setup code.] Link: https://github.com/systemd/systemd/pull/37125 [1] Link: https://lore.kernel.org/20250414-work-coredump-v2-3-685bf231f828@kernel.org Tested-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04pid: add pidfd_prepare()Christian Brauner
commit 6ae930d9dbf2d093157be33428538c91966d8a9f upstream. Add a new helper that allows to reserve a pidfd and allocates a new pidfd file that stashes the provided struct pid. This will allow us to remove places that either open code this function or that call pidfd_create() but then have to call close_fd() because there are still failure points after pidfd_create() has been called. Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230327-pidfd-file-api-v1-1-5c0e9a3158e4@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04pidfd: check pid has attached task in fdinfoChristian Brauner
commit 3d6d8da48d0b214d65ea0227d47228abc75d7c88 upstream. Currently, when a task is dead we still print the pid it used to use in the fdinfo files of its pidfds. This doesn't make much sense since the pid may have already been reused. So verify that the task is still alive by introducing the pid_has_task() helper which will be used by other callers in follow-up patches. If the task is not alive anymore, we will print -1. This allows us to differentiate between a task not being present in a given pid namespace - in which case we already print 0 - and a task having been reaped. Note that this uses PIDTYPE_PID for the check. Technically, we could've checked PIDTYPE_TGID since pidfds currently only refer to thread-group leaders but if they won't anymore in the future then this check becomes problematic without it being immediately obvious to non-experts imho. If a thread is created via clone(CLONE_THREAD) than struct pid has a single non-empty list pid->tasks[PIDTYPE_PID] and this pid can't be used as a PIDTYPE_TGID meaning pid->tasks[PIDTYPE_TGID] will return NULL even though the thread-group leader might still be very much alive. So checking PIDTYPE_PID is fine and is easier to maintain should we ever allow pidfds to refer to threads. Cc: Jann Horn <jannh@google.com> Cc: Christian Kellner <christian@kellner.me> Cc: linux-api@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20191017101832.5985-1-christian.brauner@ubuntu.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04rcu: fix header guard for rcu_all_qs()Ankur Arora
[ Upstream commit ad6b5b73ff565e88aca7a7d1286788d80c97ba71 ] rcu_all_qs() is defined for !CONFIG_PREEMPT_RCU but the declaration is conditioned on CONFIG_PREEMPTION. With CONFIG_PREEMPT_LAZY, CONFIG_PREEMPTION=y does not imply CONFIG_PREEMPT_RCU=y. Decouple the two. Cc: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04net/mlx4_core: Avoid impossible mlx4_db_alloc() order valueKees Cook
[ Upstream commit 4a6f18f28627e121bd1f74b5fcc9f945d6dbeb1e ] GCC can see that the value range for "order" is capped, but this leads it to consider that it might be negative, leading to a false positive warning (with GCC 15 with -Warray-bounds -fdiagnostics-details): ../drivers/net/ethernet/mellanox/mlx4/alloc.c:691:47: error: array subscript -1 is below array bounds of 'long unsigned int *[2]' [-Werror=array-bounds=] 691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ~~~~~~~~~~~^~~ 'mlx4_alloc_db_from_pgdir': events 1-2 691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | | | (2) out of array bounds here | (1) when the condition is evaluated to true In file included from ../drivers/net/ethernet/mellanox/mlx4/mlx4.h:53, from ../drivers/net/ethernet/mellanox/mlx4/alloc.c:42: ../include/linux/mlx4/device.h:664:33: note: while referencing 'bits' 664 | unsigned long *bits[2]; | ^~~~ Switch the argument to unsigned int, which removes the compiler needing to consider negative values. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250210174504.work.075-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04dma-mapping: avoid potential unused data compilation warningMarek Szyprowski
[ Upstream commit c9b19ea63036fc537a69265acea1b18dabd1cbd3 ] When CONFIG_NEED_DMA_MAP_STATE is not defined, dma-mapping clients might report unused data compilation warnings for dma_unmap_*() calls arguments. Redefine macros for those calls to let compiler to notice that it is okay when the provided arguments are not used. Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250415075659.428549-1-m.szyprowski@samsung.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04types: Complement the aligned types with signed 64-bit oneAndy Shevchenko
[ Upstream commit e4ca0e59c39442546866f3dd514a3a5956577daf ] Some user may want to use aligned signed 64-bit type. Provide it for them. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20240903180218.3640501-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Stable-dep-of: 5097eaae98e5 ("iio: adc: dln2: Use aligned_s64 for timestamp") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02PCI: Rename PCI_IRQ_LEGACY to PCI_IRQ_INTXBjorn Helgaas
[ Upstream commit 58ff9c5acb4aef58e118bbf39736cc4d6c11a3d3 ] Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more explicit about the type of IRQ being referenced as well as to match the PCI specifications terms. Redefine PCI_IRQ_LEGACY as an alias to PCI_IRQ_INTX to avoid the need for doing the renaming tree-wide. New drivers and new code should now prefer using PCI_IRQ_INTX instead of PCI_IRQ_LEGACY. Link: https://lore.kernel.org/r/20231122060406.14695-2-dlemoal@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Stable-dep-of: 919d14603dab ("misc: pci_endpoint_test: Fix displaying 'irq_type' after 'request_irq' error") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02nfs: add missing selections of CONFIG_CRC32Eric Biggers
[ Upstream commit cd35b6cb46649750b7dbd0df0e2d767415d8917b ] nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available only when CONFIG_CRC32 is enabled. But the only NFS kconfig option that selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and did not actually guard the use of crc32_le() even on the client. The code worked around this bug by only actually calling crc32_le() when CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases. This avoided randconfig build errors, and in real kernels the fallback code was unlikely to be reached since CONFIG_CRC32 is 'default y'. But, this really needs to just be done properly, especially now that I'm planning to update CONFIG_CRC32 to not be 'default y'. Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select CONFIG_CRC32. Then remove the fallback code that becomes unnecessary, as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG. Fixes: 1264a2f053a3 ("NFS: refactor code for calculating the crc32 hash of a filehandle") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02nfs: move nfs_fhandle_hash to common include fileJeff Layton
[ Upstream commit e59fb6749ed833deee5b3cfd7e89925296d41f49 ] lockd needs to be able to hash filehandles for tracepoints. Move the nfs_fhandle_hash() helper to a common nfs include file. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: cd35b6cb4664 ("nfs: add missing selections of CONFIG_CRC32") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02writeback: fix false warning in inode_to_wb()Andreas Gruenbacher
commit 9e888998ea4d22257b07ce911576509486fa0667 upstream. inode_to_wb() is used also for filesystems that don't support cgroup writeback. For these filesystems inode->i_wb is stable during the lifetime of the inode (it points to bdi->wb) and there's no need to hold locks protecting the inode->i_wb dereference. Improve the warning in inode_to_wb() to not trigger for these filesystems. Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10sched/smt: Always inline sched_smt_active()Josh Poimboeuf
[ Upstream commit 09f37f2d7b21ff35b8b533f9ab8cfad2fe8f72f6 ] sched_smt_active() can be called from noinstr code, so it should always be inlined. The CONFIG_SCHED_SMT version already has __always_inline. Do the same for its !CONFIG_SCHED_SMT counterpart. Fixes the following warning: vmlinux.o: error: objtool: intel_idle_ibrs+0x13: call to sched_smt_active() leaves .noinstr.text section Fixes: 321a874a7ef8 ("sched/smt: Expose sched_smt_present static key") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/1d03907b0a247cf7fb5c1d518de378864f603060.1743481539.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/r/202503311434.lyw2Tveh-lkp@intel.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10lockdep: Don't disable interrupts on RT in disable_irq_nosync_lockdep.*()Sebastian Andrzej Siewior
[ Upstream commit 87886b32d669abc11c7be95ef44099215e4f5788 ] disable_irq_nosync_lockdep() disables interrupts with lockdep enabled to avoid false positive reports by lockdep that a certain lock has not been acquired with disabled interrupts. The user of this macros expects that a lock can be acquried without disabling interrupts because the IRQ line triggering the interrupt is disabled. This triggers a warning on PREEMPT_RT because after disable_irq_nosync_lockdep.*() the following spinlock_t now is acquired with disabled interrupts. On PREEMPT_RT there is no difference between spin_lock() and spin_lock_irq() so avoiding disabling interrupts in this case works for the two remaining callers as of today. Don't disable interrupts on PREEMPT_RT in disable_irq_nosync_lockdep.*(). Closes: https://lore.kernel.org/760e34f9-6034-40e0-82a5-ee9becd24438@roeck-us.net Fixes: e8106b941ceab ("[PATCH] lockdep: core, add enable/disable_irq_irqsave/irqrestore() APIs") Reported-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: "Steven Rostedt (Google)" <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20250212103619.2560503-2-bigeasy@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10fuse: don't truncate cached, mutated symlinkMiklos Szeredi
[ Upstream commit b4c173dfbb6c78568578ff18f9e8822d7bd0e31b ] Fuse allows the value of a symlink to change and this property is exploited by some filesystems (e.g. CVMFS). It has been observed, that sometimes after changing the symlink contents, the value is truncated to the old size. This is caused by fuse_getattr() racing with fuse_reverse_inval_inode(). fuse_reverse_inval_inode() updates the fuse_inode's attr_version, which results in fuse_change_attributes() exiting before updating the cached attributes This is okay, as the cached attributes remain invalid and the next call to fuse_change_attributes() will likely update the inode with the correct values. The reason this causes problems is that cached symlinks will be returned through page_get_link(), which truncates the symlink to inode->i_size. This is correct for filesystems that don't mutate symlinks, but in this case it causes bad behavior. The solution is to just remove this truncation. This can cause a regression in a filesystem that relies on supplying a symlink larger than the file size, but this is unlikely. If that happens we'd need to make this behavior conditional. Reported-by: Laura Promberger <laura.promberger@cern.ch> Tested-by: Sam Lewis <samclewis@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250220100258.793363-1-mszeredi@redhat.com Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10netpoll: netpoll_send_skb() returns transmit statusEric Dumazet
[ Upstream commit 1ddabdfaf70c202b88925edd74c66f4707dbd92e ] Some callers want to know if the packet has been sent or dropped, to inform upper stacks. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 505ead7ab77f ("netpoll: hold rcu read lock in __netpoll_send_skb()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10netpoll: move netpoll_send_skb() out of lineEric Dumazet
[ Upstream commit fb1eee476b0d3be3e58dac1a3a96f726c6278bed ] There is no need to inline this helper, as we intend to add more code in this function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 505ead7ab77f ("netpoll: hold rcu read lock in __netpoll_send_skb()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10netpoll: remove dev argument from netpoll_send_skb_on_dev()Eric Dumazet
[ Upstream commit 307f660d056b5eb8f5bb2328fac3915ab75b5007 ] netpoll_send_skb_on_dev() can get the device pointer directly from np->dev Rename it to __netpoll_send_skb() Following patch will move netpoll_send_skb() out-of-line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 505ead7ab77f ("netpoll: hold rcu read lock in __netpoll_send_skb()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10clockevents/drivers/i8253: Fix stop sequence for timer 0David Woodhouse
commit 531b2ca0a940ac9db03f246c8b77c4201de72b00 upstream. According to the data sheet, writing the MODE register should stop the counter (and thus the interrupts). This appears to work on real hardware, at least modern Intel and AMD systems. It should also work on Hyper-V. However, on some buggy virtual machines the mode change doesn't have any effect until the counter is subsequently loaded (or perhaps when the IRQ next fires). So, set MODE 0 and then load the counter, to ensure that those buggy VMs do the right thing and the interrupts stop. And then write MODE 0 *again* to stop the counter on compliant implementations too. Apparently, Hyper-V keeps firing the IRQ *repeatedly* even in mode zero when it should only happen once, but the second MODE write stops that too. Userspace test program (mostly written by tglx): ===== #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <stdint.h> #include <sys/io.h> static __always_inline void __out##bwl(type value, uint16_t port) \ { \ asm volatile("out" #bwl " %" #bw "0, %w1" \ : : "a"(value), "Nd"(port)); \ } \ \ static __always_inline type __in##bwl(uint16_t port) \ { \ type value; \ asm volatile("in" #bwl " %w1, %" #bw "0" \ : "=a"(value) : "Nd"(port)); \ return value; \ } BUILDIO(b, b, uint8_t) #define inb __inb #define outb __outb #define PIT_MODE 0x43 #define PIT_CH0 0x40 #define PIT_CH2 0x42 static int is8254; static void dump_pit(void) { if (is8254) { // Latch and output counter and status outb(0xC2, PIT_MODE); printf("%02x %02x %02x\n", inb(PIT_CH0), inb(PIT_CH0), inb(PIT_CH0)); } else { // Latch and output counter outb(0x0, PIT_MODE); printf("%02x %02x\n", inb(PIT_CH0), inb(PIT_CH0)); } } int main(int argc, char* argv[]) { int nr_counts = 2; if (argc > 1) nr_counts = atoi(argv[1]); if (argc > 2) is8254 = 1; if (ioperm(0x40, 4, 1) != 0) return 1; dump_pit(); printf("Set oneshot\n"); outb(0x38, PIT_MODE); outb(0x00, PIT_CH0); outb(0x0F, PIT_CH0); dump_pit(); usleep(1000); dump_pit(); printf("Set periodic\n"); outb(0x34, PIT_MODE); outb(0x00, PIT_CH0); outb(0x0F, PIT_CH0); dump_pit(); usleep(1000); dump_pit(); dump_pit(); usleep(100000); dump_pit(); usleep(100000); dump_pit(); printf("Set stop (%d counter writes)\n", nr_counts); outb(0x30, PIT_MODE); while (nr_counts--) outb(0xFF, PIT_CH0); dump_pit(); usleep(100000); dump_pit(); usleep(100000); dump_pit(); printf("Set MODE 0\n"); outb(0x30, PIT_MODE); dump_pit(); usleep(100000); dump_pit(); usleep(100000); dump_pit(); return 0; } ===== Suggested-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhkelley@outlook.com> Link: https://lore.kernel.org/all/20240802135555.564941-2-dwmw2@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13pps: Fix a use-after-freeCalvin Owens
commit c79a39dc8d060b9e64e8b0fa9d245d44befeefbe upstream. On a board running ntpd and gpsd, I'm seeing a consistent use-after-free in sys_exit() from gpsd when rebooting: pps pps1: removed ------------[ cut here ]------------ kobject: '(null)' (00000000db4bec24): is not initialized, yet kobject_put() is being called. WARNING: CPU: 2 PID: 440 at lib/kobject.c:734 kobject_put+0x120/0x150 CPU: 2 UID: 299 PID: 440 Comm: gpsd Not tainted 6.11.0-rc6-00308-gb31c44928842 #1 Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kobject_put+0x120/0x150 lr : kobject_put+0x120/0x150 sp : ffffffc0803d3ae0 x29: ffffffc0803d3ae0 x28: ffffff8042dc9738 x27: 0000000000000001 x26: 0000000000000000 x25: ffffff8042dc9040 x24: ffffff8042dc9440 x23: ffffff80402a4620 x22: ffffff8042ef4bd0 x21: ffffff80405cb600 x20: 000000000008001b x19: ffffff8040b3b6e0 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 696e6920746f6e20 x14: 7369203a29343263 x13: 205d303434542020 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: kobject_put+0x120/0x150 cdev_put+0x20/0x3c __fput+0x2c4/0x2d8 ____fput+0x1c/0x38 task_work_run+0x70/0xfc do_exit+0x2a0/0x924 do_group_exit+0x34/0x90 get_signal+0x7fc/0x8c0 do_signal+0x128/0x13b4 do_notify_resume+0xdc/0x160 el0_svc+0xd4/0xf8 el0t_64_sync_handler+0x140/0x14c el0t_64_sync+0x190/0x194 ---[ end trace 0000000000000000 ]--- ...followed by more symptoms of corruption, with similar stacks: refcount_t: underflow; use-after-free. kernel BUG at lib/list_debug.c:62! Kernel panic - not syncing: Oops - BUG: Fatal exception This happens because pps_device_destruct() frees the pps_device with the embedded cdev immediately after calling cdev_del(), but, as the comment above cdev_del() notes, fops for previously opened cdevs are still callable even after cdev_del() returns. I think this bug has always been there: I can't explain why it suddenly started happening every time I reboot this particular board. In commit d953e0e837e6 ("pps: Fix a use-after free bug when unregistering a source."), George Spelvin suggested removing the embedded cdev. That seems like the simplest way to fix this, so I've implemented his suggestion, using __register_chrdev() with pps_idr becoming the source of truth for which minor corresponds to which device. But now that pps_idr defines userspace visibility instead of cdev_add(), we need to be sure the pps->dev refcount can't reach zero while userspace can still find it again. So, the idr_remove() call moves to pps_unregister_cdev(), and pps_idr now holds a reference to pps->dev. pps_core: source serial1 got cdev (251:1) <...> pps pps1: removed pps_core: unregistering pps1 pps_core: deallocating pps1 Fixes: d953e0e837e6 ("pps: Fix a use-after free bug when unregistering a source.") Cc: stable@vger.kernel.org Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/a17975fd5ae99385791929e563f72564edbcf28f.1731383727.git.calvin@wbinvd.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13x86/i8253: Disable PIT timer 0 when not in useDavid Woodhouse
commit 70e6b7d9ae3c63df90a7bba7700e8d5c300c3c60 upstream. Leaving the PIT interrupt running can cause noticeable steal time for virtual guests. The VMM generally has a timer which toggles the IRQ input to the PIC and I/O APIC, which takes CPU time away from the guest. Even on real hardware, running the counter may use power needlessly (albeit not much). Make sure it's turned off if it isn't going to be used. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhkelley@outlook.com> Link: https://lore.kernel.org/all/20240802135555.564941-1-dwmw2@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13can: ems_pci: move ASIX AX99100 ids to pci_ids.hJiaqing Zhao
commit 3029ad91335353a70feb42acd24d580d70ab258b upstream. Move PCI Vendor and Device ID of ASIX AX99100 PCIe to Multi I/O Controller to pci_ids.h for its serial and parallel port driver support in subsequent patches. Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20230724083933.3173513-3-jiaqing.zhao@linux.intel.com [Moeko: Drop changes in drivers/net/can/sja1000/ems_pci.c] Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13net: add dev_net_rcu() helperEric Dumazet
[ Upstream commit 482ad2a4ace2740ca0ff1cbc8f3c7f862f3ab507 ] dev->nd_net can change, readers should either use rcu_read_lock() or RTNL. We currently use a generic helper, dev_net() with no debugging support. We probably have many hidden bugs. Add dev_net_rcu() helper for callers using rcu_read_lock() protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250205155120.1676781-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: dd205fcc33d9 ("ipv4: use RCU protection in rt_is_expired()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-13KVM: Explicitly verify target vCPU is online in kvm_get_vcpu()Sean Christopherson
commit 1e7381f3617d14b3c11da80ff5f8a93ab14cfc46 upstream. Explicitly verify the target vCPU is fully online _prior_ to clamping the index in kvm_get_vcpu(). If the index is "bad", the nospec clamping will generate '0', i.e. KVM will return vCPU0 instead of NULL. In practice, the bug is unlikely to cause problems, as it will only come into play if userspace or the guest is buggy or misbehaving, e.g. KVM may send interrupts to vCPU0 instead of dropping them on the floor. However, returning vCPU0 when it shouldn't exist per online_vcpus is problematic now that KVM uses an xarray for the vCPUs array, as KVM needs to insert into the xarray before publishing the vCPU to userspace (see commit c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray")), i.e. before vCPU creation is guaranteed to succeed. As a result, incorrectly providing access to vCPU0 will trigger a use-after-free if vCPU0 is dereferenced and kvm_vm_ioctl_create_vcpu() bails out of vCPU creation due to an error and frees vCPU0. Commit afb2acb2e3a3 ("KVM: Fix vcpu_array[0] races") papered over that issue, but in doing so introduced an unsolvable teardown conundrum. Preventing accesses to vCPU0 before it's fully online will allow reverting commit afb2acb2e3a3, without re-introducing the vcpu_array[0] UAF race. Fixes: 1d487e9bf8ba ("KVM: fix spectrev1 gadgets") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Michal Luczaj <mhal@rbox.co> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241009150455.1057573-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13usb: xhci: Add timeout argument in address_device USB HCD callbackHardik Gajjar
[ Upstream commit a769154c7cac037914ba375ae88aae55b2c853e0 ] - The HCD address_device callback now accepts a user-defined timeout value in milliseconds, providing better control over command execution times. - The default timeout value for the address_device command has been set to 5000 ms, aligning with the USB 3.2 specification. However, this timeout can be adjusted as needed. - The xhci_setup_device function has been updated to accept the timeout value, allowing it to specify the maximum wait time for the command operation to complete. - The hub driver has also been updated to accommodate the newly added timeout parameter during the SET_ADDRESS request. Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com> Reviewed-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20231027152029.104363-1-hgajjar@de.adit-jv.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 1e0a19912adb ("usb: xhci: Fix NULL pointer dereference on certain command aborts") Signed-off-by: Sasha Levin <sashal@kernel.org>