summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2023-12-20cred: switch to using atomic_long_tJens Axboe
commit f8fa5d76925991976b3e7076f9d1052515ec1fca upstream. There are multiple ways to grab references to credentials, and the only protection we have against overflowing it is the memory required to do so. With memory sizes only moving in one direction, let's bump the reference count to 64-bit and move it outside the realm of feasibly overflowing. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28pwm: Fix double shift bugDan Carpenter
[ Upstream commit d27abbfd4888d79dd24baf50e774631046ac4732 ] These enums are passed to set/test_bit(). The set/test_bit() functions take a bit number instead of a shifted value. Passing a shifted value is a double shift bug like doing BIT(BIT(1)). The double shift bug doesn't cause a problem here because we are only checking 0 and 1 but if the value was 5 or above then it can lead to a buffer overflow. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-08PCI: Prevent xHCI driver from claiming AMD VanGogh USB3 DRD deviceVicki Pfau
commit 7e6f3b6d2c352b5fde37ce3fed83bdf6172eebd4 upstream. The AMD VanGogh SoC contains a DesignWare USB3 Dual-Role Device that can be operated as either a USB Host or a USB Device, similar to on the AMD Nolan platform. be6646bfbaec ("PCI: Prevent xHCI driver from claiming AMD Nolan USB3 DRD device") added a quirk to let the dwc3 driver claim the Nolan device since it provides more specific support. Extend that quirk to include the VanGogh SoC USB3 device. Link: https://lore.kernel.org/r/20230927202212.2388216-1-vi@endrift.com Signed-off-by: Vicki Pfau <vi@endrift.com> [bhelgaas: include be6646bfbaec reference, add stable tag] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08rpmsg: Fix calling device_lock() on non-initialized deviceKrzysztof Kozlowski
commit bb17d110cbf270d5247a6e261c5ad50e362d1675 upstream. driver_set_override() helper uses device_lock() so it should not be called before rpmsg_register_device() (which calls device_register()). Effect can be seen with CONFIG_DEBUG_MUTEXES: DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 3 PID: 57 at kernel/locking/mutex.c:582 __mutex_lock+0x1ec/0x430 ... Call trace: __mutex_lock+0x1ec/0x430 mutex_lock_nested+0x44/0x50 driver_set_override+0x124/0x150 qcom_glink_native_probe+0x30c/0x3b0 glink_rpm_probe+0x274/0x350 platform_probe+0x6c/0xe0 really_probe+0x17c/0x3d0 __driver_probe_device+0x114/0x190 driver_probe_device+0x3c/0xf0 ... Refactor the rpmsg_register_device() function to use two-step device registering (initialization + add) and call driver_set_override() in proper moment. This moves the code around, so while at it also NULL-ify the rpdev->driver_override in error path to be sure it won't be kfree() second time. Fixes: 42cd402b8fd4 ("rpmsg: Fix kfree() of static memory on setting driver_override") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20220429195946.1061725-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit bb17d110cbf270d5247a6e261c5ad50e362d1675) Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08rpmsg: Fix kfree() of static memory on setting driver_overrideKrzysztof Kozlowski
commit 42cd402b8fd4672b692400fe5f9eecd55d2794ac upstream. The driver_override field from platform driver should not be initialized from static memory (string literal) because the core later kfree() it, for example when driver_override is set via sysfs. Use dedicated helper to set driver_override properly. Fixes: 950a7388f02b ("rpmsg: Turn name service into a stand alone driver") Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface") Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220419113435.246203-13-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08driver: platform: Add helper for safer setting of driver_overrideKrzysztof Kozlowski
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream. Several core drivers and buses expect that driver_override is a dynamically allocated memory thus later they can kfree() it. However such assumption is not documented, there were in the past and there are already users setting it to a string literal. This leads to kfree() of static memory during device release (e.g. in error paths or during unbind): kernel BUG at ../mm/slub.c:3960! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kfree) from [<c058da50>] (platform_device_release+0x88/0xb4) (platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90) (device_release) from [<c0a69050>] (kobject_put+0xec/0x20c) (kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c) (exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4) (platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414) (really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4) (driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8) (bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c) (__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90) (bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c) (device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc) (of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc) (of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc) (of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118) (of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8) (of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404) Provide a helper which clearly documents the usage of driver_override. This will allow later to reuse the helper and reduce the amount of duplicated code. Convert the platform driver to use a new helper and make the driver_override field const char (it is not modified by the core). Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25perf: Disallow mis-matched inherited group readsPeter Zijlstra
commit 32671e3799ca2e4590773fd0e63aaa4229e50c06 upstream. Because group consistency is non-atomic between parent (filedesc) and children (inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum non-matching counter groups -- with non-sensical results. Add group_generation to distinguish the case where a parent group removes and adds an event and thus has the same number, but a different configuration of events as inherited groups. This became a problem when commit fa8c269353d5 ("perf/core: Invert perf_read_group() loops") flipped the order of child_list and sibling_list. Previously it would iterate the group (sibling_list) first, and for each sibling traverse the child_list. In this order, only the group composition of the parent is relevant. By flipping the order the group composition of the child (inherited) events becomes an issue and the mis-match in group composition becomes evident. That said; even prior to this commit, while reading of a group that is not equally inherited was not broken, it still made no sense. (Ab)use ECHILD as error return to indicate issues with child process group composition. Fixes: fa8c269353d5 ("perf/core: Invert perf_read_group() loops") Reported-by: Budimir Markovic <markovicbudimir@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231018115654.GK33217@noisy.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25mcb: remove is_added flag from mcb_device structJorge Sanjuan Garcia
commit 0f28ada1fbf0054557cddcdb93ad17f767105208 upstream. When calling mcb_bus_add_devices(), both mcb devices and the mcb bus will attempt to attach a device to a driver because they share the same bus_type. This causes an issue when trying to cast the container of the device to mcb_device struct using to_mcb_device(), leading to a wrong cast when the mcb_bus is added. A crash occurs when freing the ida resources as the bus numbering of mcb_bus gets confused with the is_added flag on the mcb_device struct. The only reason for this cast was to keep an is_added flag on the mcb_device struct that does not seem necessary. The function device_attach() handles already bound devices and the mcb subsystem does nothing special with this is_added flag so remove it completely. Fixes: 18d288198099 ("mcb: Correctly initialize the bus's device") Cc: stable <stable@kernel.org> Signed-off-by: Jorge Sanjuan Garcia <jorge.sanjuangarcia@duagon.com> Co-developed-by: Jose Javier Rodriguez Barbarin <JoseJavier.Rodriguez@duagon.com> Signed-off-by: Jose Javier Rodriguez Barbarin <JoseJavier.Rodriguez@duagon.com> Link: https://lore.kernel.org/r/20230906114901.63174-2-JoseJavier.Rodriguez@duagon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10ata: libata: disallow dev-initiated LPM transitions to unsupported statesNiklas Cassel
commit 24e0e61db3cb86a66824531989f1df80e0939f26 upstream. In AHCI 1.3.1, the register description for CAP.SSC: "When cleared to ‘0’, software must not allow the HBA to initiate transitions to the Slumber state via agressive link power management nor the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port must be programmed to disallow device initiated Slumber requests." In AHCI 1.3.1, the register description for CAP.PSC: "When cleared to ‘0’, software must not allow the HBA to initiate transitions to the Partial state via agressive link power management nor the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port must be programmed to disallow device initiated Partial requests." Ensure that we always set the corresponding bits in PxSCTL.IPM, such that a device is not allowed to initiate transitions to power states which are unsupported by the HBA. DevSleep is always initiated by the HBA, however, for completeness, set the corresponding bit in PxSCTL.IPM such that agressive link power management cannot transition to DevSleep if DevSleep is not supported. sata_link_scr_lpm() is used by libahci, ata_piix and libata-pmp. However, only libahci has the ability to read the CAP/CAP2 register to see if these features are supported. Therefore, in order to not introduce any regressions on ata_piix or libata-pmp, create flags that indicate that the respective feature is NOT supported. This way, the behavior for ata_piix and libata-pmp should remain unchanged. This change is based on a patch originally submitted by Runa Guo-oc. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Fixes: 1152b2617a6e ("libata: implement sata_link_scr_lpm() and make ata_dev_set_feature() global") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10ata: libata-sata: increase PMP SRST timeout to 10sMatthias Schiffer
commit 753a4d531bc518633ea88ac0ed02b25a16823d51 upstream. On certain SATA controllers, softreset fails after wakeup from S2RAM with the message "softreset failed (1st FIS failed)", sometimes resulting in drives not being detected again. With the increased timeout, this issue is avoided. Instead, "softreset failed (device not ready)" is now logged 1-2 times; this later failure seems to cause fewer problems however, and the drives are detected reliably once they've spun up and the probe is retried. The issue was observed with the primary SATA controller of the QNAP TS-453B, which is an "Intel Corporation Celeron/Pentium Silver Processor SATA Controller [8086:31e3] (rev 06)" integrated in the Celeron J4125 CPU, and the following drives: - Seagate IronWolf ST12000VN0008 - Seagate IronWolf ST8000NE0004 The SATA controller seems to be more relevant to this issue than the drives, as the same drives are always detected reliably on the secondary SATA controller on the same board (an ASMedia 106x) without any "softreset failed" errors even without the increased timeout. Fixes: e7d3ef13d52a ("libata: change drive ready wait after hard reset to 5s") Cc: stable@vger.kernel.org Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10team: fix null-ptr-deref when team device type is changedZiyang Xuan
[ Upstream commit 492032760127251e5540a5716a70996bacf2a3fd ] Get a null-ptr-deref bug as follows with reproducer [1]. BUG: kernel NULL pointer dereference, address: 0000000000000228 ... RIP: 0010:vlan_dev_hard_header+0x35/0x140 [8021q] ... Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x150 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x26/0x30 ? vlan_dev_hard_header+0x35/0x140 [8021q] ? vlan_dev_hard_header+0x8e/0x140 [8021q] neigh_connected_output+0xb2/0x100 ip6_finish_output2+0x1cb/0x520 ? nf_hook_slow+0x43/0xc0 ? ip6_mtu+0x46/0x80 ip6_finish_output+0x2a/0xb0 mld_sendpack+0x18f/0x250 mld_ifc_work+0x39/0x160 process_one_work+0x1e6/0x3f0 worker_thread+0x4d/0x2f0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 [1] $ teamd -t team0 -d -c '{"runner": {"name": "loadbalance"}}' $ ip link add name t-dummy type dummy $ ip link add link t-dummy name t-dummy.100 type vlan id 100 $ ip link add name t-nlmon type nlmon $ ip link set t-nlmon master team0 $ ip link set t-nlmon nomaster $ ip link set t-dummy up $ ip link set team0 up $ ip link set t-dummy.100 down $ ip link set t-dummy.100 master team0 When enslave a vlan device to team device and team device type is changed from non-ether to ether, header_ops of team device is changed to vlan_header_ops. That is incorrect and will trigger null-ptr-deref for vlan->real_dev in vlan_dev_hard_header() because team device is not a vlan device. Cache eth_header_ops in team_setup(), then assign cached header_ops to header_ops of team net device when its type is changed from non-ether to ether to fix the bug. Fixes: 1d76efe1577b ("team: add support for non-ethernet devices") Suggested-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230918123011.1884401-1-william.xuanziyang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23hw_breakpoint: fix single-stepping when using bpf_overflow_handlerTomislav Novak
[ Upstream commit d11a69873d9a7435fe6a48531e165ab80a8b1221 ] Arm platforms use is_default_overflow_handler() to determine if the hw_breakpoint code should single-step over the breakpoint trigger or let the custom handler deal with it. Since bpf_overflow_handler() currently isn't recognized as a default handler, attaching a BPF program to a PERF_TYPE_BREAKPOINT event causes it to keep firing (the instruction triggering the data abort exception is never skipped). For example: # bpftrace -e 'watchpoint:0x10000:4:w { print("hit") }' -c ./test Attaching 1 probe... hit hit [...] ^C (./test performs a single 4-byte store to 0x10000) This patch replaces the check with uses_default_overflow_handler(), which accounts for the bpf_overflow_handler() case by also testing if one of the perf_event_output functions gets invoked indirectly, via orig_default_handler. Signed-off-by: Tomislav Novak <tnovak@meta.com> Tested-by: Samuel Gosselin <sgosselin@google.com> # arm64 Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/linux-arm-kernel/20220923203644.2731604-1-tnovak@fb.com/ Link: https://lore.kernel.org/r/20230605191923.1219974-1-tnovak@meta.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23net: handle ARPHRD_PPP in dev_is_mac_header_xmit()Nicolas Dichtel
commit a4f39c9f14a634e4cd35fcd338c239d11fcc73fc upstream. The goal is to support a bpf_redirect() from an ethernet device (ingress) to a ppp device (egress). The l2 header is added automatically by the ppp driver, thus the ethernet header should be removed. CC: stable@vger.kernel.org Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Siwar Zitouni <siwar.zitouni@6wind.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-23fs/nls: make load_nls() take a const parameterWinston Wen
[ Upstream commit c1ed39ec116272935528ca9b348b8ee79b0791da ] load_nls() take a char * parameter, use it to find nls module in list or construct the module name to load it. This change make load_nls() take a const parameter, so we don't need do some cast like this: ses->local_nls = load_nls((char *)ctx->local_nls->charset); Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Winston Wen <wentao@uniontech.com> Reviewed-by: Paulo Alcantara <pc@manguebit.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30scsi: core: raid_class: Remove raid_component_add()Zhu Wang
commit 60c5fd2e8f3c42a5abc565ba9876ead1da5ad2b7 upstream. The raid_component_add() function was added to the kernel tree via patch "[SCSI] embryonic RAID class" (2005). Remove this function since it never has had any callers in the Linux kernel. And also raid_component_release() is only used in raid_component_add(), so it is also removed. Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Link: https://lore.kernel.org/r/20230822015254.184270-1-wangzhu9@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Fixes: 04b5b5cb0136 ("scsi: core: Fix possible memory leak if device_add() fails") Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30net: do not allow gso_size to be set to GSO_BY_FRAGSEric Dumazet
[ Upstream commit b616be6b97688f2f2bd7c4a47ab32f27f94fb2a9 ] One missing check in virtio_net_hdr_to_skb() allowed syzbot to crash kernels again [1] Do not allow gso_size to be set to GSO_BY_FRAGS (0xffff), because this magic value is used by the kernel. [1] general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] CPU: 0 PID: 5039 Comm: syz-executor401 Not tainted 6.5.0-rc5-next-20230809-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 RIP: 0010:skb_segment+0x1a52/0x3ef0 net/core/skbuff.c:4500 Code: 00 00 00 e9 ab eb ff ff e8 6b 96 5d f9 48 8b 84 24 00 01 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e ea 21 00 00 48 8b 84 24 00 01 RSP: 0018:ffffc90003d3f1c8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 000000000001fffe RCX: 0000000000000000 RDX: 000000000000000e RSI: ffffffff882a3115 RDI: 0000000000000070 RBP: ffffc90003d3f378 R08: 0000000000000005 R09: 000000000000ffff R10: 000000000000ffff R11: 5ee4a93e456187d6 R12: 000000000001ffc6 R13: dffffc0000000000 R14: 0000000000000008 R15: 000000000000ffff FS: 00005555563f2380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020020000 CR3: 000000001626d000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> udp6_ufo_fragment+0x9d2/0xd50 net/ipv6/udp_offload.c:109 ipv6_gso_segment+0x5c4/0x17b0 net/ipv6/ip6_offload.c:120 skb_mac_gso_segment+0x292/0x610 net/core/gso.c:53 __skb_gso_segment+0x339/0x710 net/core/gso.c:124 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x3a5/0xf10 net/core/dev.c:3625 __dev_queue_xmit+0x8f0/0x3d60 net/core/dev.c:4329 dev_queue_xmit include/linux/netdevice.h:3082 [inline] packet_xmit+0x257/0x380 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x24c7/0x5570 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:727 [inline] sock_sendmsg+0xd9/0x180 net/socket.c:750 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2496 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2550 __sys_sendmsg+0x117/0x1e0 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7ff27cdb34d9 Fixes: 3953c46c3ac7 ("sk_buff: allow segmenting based on frag sizes") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20230816142158.1779798-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11tcp: annotate data-races around fastopenq.max_qlenEric Dumazet
[ Upstream commit 70f360dd7042cb843635ece9d28335a4addff9eb ] This field can be read locklessly. Fixes: 1536e2857bd3 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11posix-timers: Ensure timer ID search-loop limit is validThomas Gleixner
[ Upstream commit 8ce8849dd1e78dadcee0ec9acbd259d239b7069f ] posix_timer_add() tries to allocate a posix timer ID by starting from the cached ID which was stored by the last successful allocation. This is done in a loop searching the ID space for a free slot one by one. The loop has to terminate when the search wrapped around to the starting point. But that's racy vs. establishing the starting point. That is read out lockless, which leads to the following problem: CPU0 CPU1 posix_timer_add() start = sig->posix_timer_id; lock(hash_lock); ... posix_timer_add() if (++sig->posix_timer_id < 0) start = sig->posix_timer_id; sig->posix_timer_id = 0; So CPU1 can observe a negative start value, i.e. -1, and the loop break never happens because the condition can never be true: if (sig->posix_timer_id == start) break; While this is unlikely to ever turn into an endless loop as the ID space is huge (INT_MAX), the racy read of the start value caught the attention of KCSAN and Dmitry unearthed that incorrectness. Rewrite it so that all id operations are under the hash lock. Reported-by: syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11Revert "8250: add support for ASIX devices with a FIFO bug"Jiaqing Zhao
commit a82d62f708545d22859584e0e0620da8e3759bbc upstream. This reverts commit eb26dfe8aa7eeb5a5aa0b7574550125f8aa4c3b3. Commit eb26dfe8aa7e ("8250: add support for ASIX devices with a FIFO bug") merged on Jul 13, 2012 adds a quirk for PCI_VENDOR_ID_ASIX (0x9710). But that ID is the same as PCI_VENDOR_ID_NETMOS defined in 1f8b061050c7 ("[PATCH] Netmos parallel/serial/combo support") merged on Mar 28, 2005. In pci_serial_quirks array, the NetMos entry always takes precedence over the ASIX entry even since it was initially merged, code in that commit is always unreachable. In my tests, adding the FIFO workaround to pci_netmos_init() makes no difference, and the vendor driver also does not have such workaround. Given that the code was never used for over a decade, it's safe to revert it. Also, the real PCI_VENDOR_ID_ASIX should be 0x125b, which is used on their newer AX99100 PCIe serial controllers released on 2016. The FIFO workaround should not be intended for these newer controllers, and it was never implemented in vendor driver. Fixes: eb26dfe8aa7e ("8250: add support for ASIX devices with a FIFO bug") Cc: stable <stable@kernel.org> Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230619155743.827859-1-jiaqing.zhao@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11workqueue: clean up WORK_* constant types, clarify maskingLinus Torvalds
commit afa4bb778e48d79e4a642ed41e3b4e0de7489a6c upstream. Dave Airlie reports that gcc-13.1.1 has started complaining about some of the workqueue code in 32-bit arm builds: kernel/workqueue.c: In function ‘get_work_pwq’: kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK); | ^ [ ... a couple of other cases ... ] and while it's not immediately clear exactly why gcc started complaining about it now, I suspect it's some C23-induced enum type handlign fixup in gcc-13 is the cause. Whatever the reason for starting to complain, the code and data types are indeed disgusting enough that the complaint is warranted. The wq code ends up creating various "helper constants" (like that WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of confused. The mask needs to be 'unsigned long', not some unspecified enum type. To make matters worse, the actual "mask and cast to a pointer" is repeated a couple of times, and the cast isn't even always done to the right pointer, but - as the error case above - to a 'void *' with then the compiler finishing the job. That's now how we roll in the kernel. So create the masks using the proper types rather than some ambiguous enumeration, and use a nice helper that actually does the type conversion in one well-defined place. Incidentally, this magically makes clang generate better code. That, admittedly, is really just a sign of clang having been seriously confused before, and cleaning up the typing unconfuses the compiler too. Reported-by: Dave Airlie <airlied@gmail.com> Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tejun Heo <tj@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11PCI: Add pci_clear_master() stub for non-CONFIG_PCISui Jingfeng
[ Upstream commit 2aa5ac633259843f656eb6ecff4cf01e8e810c5e ] Add a pci_clear_master() stub when CONFIG_PCI is not set so drivers that support both PCI and platform devices don't need #ifdefs or extra Kconfig symbols for the PCI parts. [bhelgaas: commit log] Fixes: 6a479079c072 ("PCI: Add pci_clear_master() as opposite of pci_set_master()") Link: https://lore.kernel.org/r/20230531102744.2354313-1-suijingfeng@loongson.cn Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct ↵Douglas Anderson
config [ Upstream commit 5e008df11c55228a86a1bae692cc2002503572c9 ] Patch series "watchdog/hardlockup: Add the buddy hardlockup detector", v5. This patch series adds the "buddy" hardlockup detector. In brief, the buddy hardlockup detector can detect hardlockups without arch-level support by having CPUs checkup on a "buddy" CPU periodically. Given the new design of this patch series, testing all combinations is fairly difficult. I've attempted to make sure that all combinations of CONFIG_ options are good, but it wouldn't surprise me if I missed something. I apologize in advance and I'll do my best to fix any problems that are found. This patch (of 18): The real watchdog_update_hrtimer_threshold() is defined in kernel/watchdog_hld.c. That file is included if CONFIG_HARDLOCKUP_DETECTOR_PERF and the function is defined in that file if CONFIG_HARDLOCKUP_CHECK_TIMESTAMP. The dummy version of the function in "nmi.h" didn't get that quite right. While this doesn't appear to be a huge deal, it's nice to make it consistent. It doesn't break builds because CHECK_TIMESTAMP is only defined by x86 so others don't get a double definition, and x86 uses perf lockup detector, so it gets the out of line version. Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.1.I8cbb2f4fa740528fcfade4f5439b6cdcdd059251@changeid Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Cc: Colin Cross <ccross@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-08init: Provide arch_cpu_finalize_init()Thomas Gleixner
commit 7725acaa4f0c04fbefb0e0d342635b967bb7d414 upstream check_bugs() has become a dumping ground for all sorts of activities to finalize the CPU initialization before running the rest of the init code. Most are empty, a few do actual bug checks, some do alternative patching and some cobble a CPU advertisement string together.... Aside of that the current implementation requires duplicated function declaration and mostly empty header files for them. Provide a new function arch_cpu_finalize_init(). Provide a generic declaration if CONFIG_ARCH_HAS_CPU_FINALIZE_INIT is selected and a stub inline otherwise. This requires a temporary #ifdef in start_kernel() which will be removed along with check_bugs() once the architectures are converted over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230613224544.957805717@linutronix.de Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-21Remove DECnet support from kernelStephen Hemminger
commit 1202cdd665315c525b5237e96e0bedc76d7e754f upstream. DECnet is an obsolete network protocol that receives more attention from kernel janitors than users. It belongs in computer protocol history museum not in Linux kernel. It has been "Orphaned" in kernel since 2010. The iproute2 support for DECnet was dropped in 5.0 release. The documentation link on Sourceforge says it is abandoned there as well. Leave the UAPI alone to keep userspace programs compiling. This means that there is still an empty neighbour table for AF_DECNET. The table of /proc/sys/net entries was updated to match current directories and reformatted to be alphabetical. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David Ahern <dsahern@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-14rfs: annotate lockless accesses to RFS sock flow tableEric Dumazet
[ Upstream commit 5c3b74a92aa285a3df722bf6329ba7ccf70346d6 ] Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table. This also prevents a (smart ?) compiler to remove the condition in: if (table->ents[index] != newval) table->ents[index] = newval; We need the condition to avoid dirtying a shared cache line. Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09cdc_ncm: Implement the 32-bit version of NCM Transfer BlockAlexander Bersenev
commit 0fa81b304a7973a499f844176ca031109487dd31 upstream. The NCM specification defines two formats of transfer blocks: with 16-bit fields (NTB-16) and with 32-bit fields (NTB-32). Currently only NTB-16 is implemented. This patch adds the support of NTB-32. The motivation behind this is that some devices such as E5785 or E5885 from the current generation of Huawei LTE routers do not support NTB-16. The previous generations of Huawei devices are also use NTB-32 by default. Also this patch enables NTB-32 by default for Huawei devices. During the 2019 ValdikSS made five attempts to contact Huawei to add the NTB-16 support to their router firmware, but they were unsuccessful. Signed-off-by: Alexander Bersenev <bay@hackerdom.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 7e01c7f7046e ("net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize") Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30power: supply: bq27xxx: Fix poll_interval handling and races on removeHans de Goede
commit c00bc80462afc7963f449d7f21d896d2f629cacc upstream. Before this patch bq27xxx_battery_teardown() was setting poll_interval = 0 to avoid bq27xxx_battery_update() requeuing the delayed_work item. There are 2 problems with this: 1. If the driver is unbound through sysfs, rather then the module being rmmod-ed, this changes poll_interval unexpectedly 2. This is racy, after it being set poll_interval could be changed before bq27xxx_battery_update() checks it through /sys/module/bq27xxx_battery/parameters/poll_interval Fix this by added a removed attribute to struct bq27xxx_device_info and using that instead of setting poll_interval to 0. There also is another poll_interval related race on remove(), writing /sys/module/bq27xxx_battery/parameters/poll_interval will requeue the delayed_work item for all devices on the bq27xxx_battery_devices list and the device being removed was only removed from that list after cancelling the delayed_work item. Fix this by moving the removal from the bq27xxx_battery_devices list to before cancelling the delayed_work item. Fixes: 8cfaaa811894 ("bq27x00_battery: Fix OOPS caused by unregistring bq27x00 driver") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30USB: core: Add routines for endpoint checks in old driversAlan Stern
commit 13890626501ffda22b18213ddaf7930473da5792 upstream. Many of the older USB drivers in the Linux USB stack were written based simply on a vendor's device specification. They use the endpoint information in the spec and assume these endpoints will always be present, with the properties listed, in any device matching the given vendor and product IDs. While that may have been true back then, with spoofing and fuzzing it is not true any more. More and more we are finding that those old drivers need to perform at least a minimum of checking before they try to use any endpoint other than ep0. To make this checking as simple as possible, we now add a couple of utility routines to the USB core. usb_check_bulk_endpoints() and usb_check_int_endpoints() take an interface pointer together with a list of endpoint addresses (numbers and directions). They check that the interface's current alternate setting includes endpoints with those addresses and that each of these endpoints has the right type: bulk or interrupt, respectively. Although we already have usb_find_common_endpoints() and related routines meant for a similar purpose, they are not well suited for this kind of checking. Those routines find endpoints of various kinds, but only one (either the first or the last) of each kind, and they don't verify that the endpoints' addresses agree with what the caller expects. In theory the new routines could be more general: They could take a particular altsetting as their argument instead of always using the interface's current altsetting. In practice I think this won't matter too much; multiple altsettings tend to be used for transferring media (audio or visual) over isochronous endpoints, not bulk or interrupt. Drivers for such devices will generally require more sophisticated checking than these simplistic routines provide. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/dd2c8e8c-2c87-44ea-ba17-c64b97e201c9@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30sched: Fix KCSAN noinstr violationJosh Poimboeuf
[ Upstream commit e0b081d17a9f4e5c0cbb0e5fbeb1abe3de0f7e4e ] With KCSAN enabled, end_of_stack() can get out-of-lined. Force it inline. Fixes the following warnings: vmlinux.o: warning: objtool: check_stackleak_irqoff+0x2b: call to end_of_stack() leaves .noinstr.text section Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/cc1b4d73d3a428a00d206242a68fdf99a934ca7b.1681320026.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17printk: declare printk_deferred_{enter,safe}() in include/linux/printk.hTetsuo Handa
commit 85e3e7fbbb720b9897fba9a99659e31cbd1c082e upstream. [This patch implements subset of original commit 85e3e7fbbb72 ("printk: remove NMI tracking") where commit 1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock") depends on, for commit 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") was backported to stable.] All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough. There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing: arch/arm/kernel/smp.c arch/powerpc/kexec/crash.c kernel/trace/trace.c For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective. For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk(). Signed-off-by: John Ogness <john.ogness@linutronix.de> [pmladek@suse.com: Remove special handling in ftrace.c completely. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de [penguin-kernel: Copy only printk_deferred_{enter,safe}() definition ] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17tty: Prevent writing chars during tcsetattr TCSADRAIN/FLUSHIlpo Järvinen
If userspace races tcsetattr() with a write, the drained condition might not be guaranteed by the kernel. There is a race window after checking Tx is empty before tty_set_termios() takes termios_rwsem for write. During that race window, more characters can be queued by a racing writer. Any ongoing transmission might produce garbage during HW's ->set_termios() call. The intent of TCSADRAIN/FLUSH seems to be preventing such a character corruption. If those flags are set, take tty's write lock to stop any writer before performing the lower layer Tx empty check and wait for the pending characters to be sent (if any). The initial wait for all-writers-done must be placed outside of tty's write lock to avoid deadlock which makes it impossible to use tty_wait_until_sent(). The write lock is retried if a racing write is detected. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230317113318.31327-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 094fb49a2d0d6827c86d2e0840873e6db0c491d2) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17SUNRPC: remove the maximum number of retries in call_bind_statusDai Ngo
[ Upstream commit 691d0b782066a6eeeecbfceb7910a8f6184e6105 ] Currently call_bind_status places a hard limit of 3 to the number of retries on EACCES error. This limit was done to prevent NLM unlock requests from being hang forever when the server keeps returning garbage. However this change causes problem for cases when NLM service takes longer than 9 seconds to register with the port mapper after a restart. This patch removes this hard coded limit and let the RPC handles the retry based on the standard hard/soft task semantics. Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests") Reported-by: Helen Chao <helen.chao@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17linux/vt_buffer.h: allow either builtin or modular for macrosRandy Dunlap
[ Upstream commit 2b76ffe81e32afd6d318dc4547e2ba8c46207b77 ] Fix build errors on ARCH=alpha when CONFIG_MDA_CONSOLE=m. This allows the ARCH macros to be the only ones defined. In file included from ../drivers/video/console/mdacon.c:37: ../arch/alpha/include/asm/vga.h:17:40: error: expected identifier or '(' before 'volatile' 17 | static inline void scr_writew(u16 val, volatile u16 *addr) | ^~~~~~~~ ../include/linux/vt_buffer.h:24:34: note: in definition of macro 'scr_writew' 24 | #define scr_writew(val, addr) (*(addr) = (val)) | ^~~~ ../include/linux/vt_buffer.h:24:40: error: expected ')' before '=' token 24 | #define scr_writew(val, addr) (*(addr) = (val)) | ^ ../arch/alpha/include/asm/vga.h:17:20: note: in expansion of macro 'scr_writew' 17 | static inline void scr_writew(u16 val, volatile u16 *addr) | ^~~~~~~~~~ ../arch/alpha/include/asm/vga.h:25:29: error: expected identifier or '(' before 'volatile' 25 | static inline u16 scr_readw(volatile const u16 *addr) | ^~~~~~~~ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Link: https://lore.kernel.org/r/20230329021529.16188-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20ftrace: Mark get_lock_parent_ip() __always_inlineJohn Keeping
commit ea65b41807a26495ff2a73dd8b1bab2751940887 upstream. If the compiler decides not to inline this function then preemption tracing will always show an IP inside the preemption disabling path and never the function actually calling preempt_{enable,disable}. Link: https://lore.kernel.org/linux-trace-kernel/20230327173647.1690849-1-john@metanate.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: f904f58263e1d ("sched/debug: Fix preempt_disable_ip recording for preempt_disable()") Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22HID: core: Provide new max_buffer_size attribute to over-ride the defaultLee Jones
commit b1a37ed00d7908a991c1d0f18a8cba3c2aa99bdc upstream. Presently, when a report is processed, its proposed size, provided by the user of the API (as Report Size * Report Count) is compared against the subsystem default HID_MAX_BUFFER_SIZE (16k). However, some low-level HID drivers allocate a reduced amount of memory to their buffers (e.g. UHID only allocates UHID_DATA_MAX (4k) buffers), rending this check inadequate in some cases. In these circumstances, if the received report ends up being smaller than the proposed report size, the remainder of the buffer is zeroed. That is, the space between sizeof(csize) (size of the current report) and the rsize (size proposed i.e. Report Size * Report Count), which can be handled up to HID_MAX_BUFFER_SIZE (16k). Meaning that memset() shoots straight past the end of the buffer boundary and starts zeroing out in-use values, often resulting in calamity. This patch introduces a new variable into 'struct hid_ll_driver' where individual low-level drivers can over-ride the default maximum value of HID_MAX_BUFFER_SIZE (16k) with something more sympathetic to the interface. Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> [Lee: Backported to v4.14.y] Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22sh: intc: Avoid spurious sizeof-pointer-div warningMichael Karcher
[ Upstream commit 250870824c1cf199b032b1ef889c8e8d69d9123a ] GCC warns about the pattern sizeof(void*)/sizeof(void), as it looks like the abuse of a pattern to calculate the array size. This pattern appears in the unevaluated part of the ternary operator in _INTC_ARRAY if the parameter is NULL. The replacement uses an alternate approach to return 0 in case of NULL which does not generate the pattern sizeof(void*)/sizeof(void), but still emits the warning if _INTC_ARRAY is called with a nonarray parameter. This patch is required for successful compilation with -Werror enabled. The idea to use _Generic for type distinction is taken from Comment #7 in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483 by Jakub Jelinek Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/619fa552-c988-35e5-b1d7-fe256c46a272@mkarcher.dialup.fu-berlin.de Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-22net: tunnels: annotate lockless accesses to dev->needed_headroomEric Dumazet
[ Upstream commit 4b397c06cb987935b1b097336532aa6b4210e091 ] IP tunnels can apparently update dev->needed_headroom in their xmit path. This patch takes care of three tunnels xmit, and also the core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA() helpers. More changes might be needed for completeness. BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1: ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0: ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653 process_one_work+0x3e6/0x750 kernel/workqueue.c:2390 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537 kthread+0x1ac/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x0dd4 -> 0x0e14 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 Workqueue: mld mld_ifc_work Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230310191109.2384387-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17PCI: Add SolidRun vendor IDAlvaro Karsz
[ Upstream commit db6c4dee4c104f50ed163af71c53bfdb878a8318 ] Add SolidRun vendor ID to pci_ids.h The vendor ID is used in 2 different source files, the SNET vDPA driver and PCI quirks. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Message-Id: <20230110165638.123745-2-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ima: Align ima_file_mmap() parameters with mmap_file LSM hookRoberto Sassu
commit 4971c268b85e1c7a734a61622fc0813c86e2362e upstream. Commit 98de59bfe4b2f ("take calculation of final prot in security_mmap_file() into a helper") moved the code to update prot, to be the actual protections applied to the kernel, to a new helper called mmap_prot(). However, while without the helper ima_file_mmap() was getting the updated prot, with the helper ima_file_mmap() gets the original prot, which contains the protections requested by the application. A possible consequence of this change is that, if an application calls mmap() with only PROT_READ, and the kernel applies PROT_EXEC in addition, that application would have access to executable memory without having this event recorded in the IMA measurement list. This situation would occur for example if the application, before mmap(), calls the personality() system call with READ_IMPLIES_EXEC as the first argument. Align ima_file_mmap() parameters with those of the mmap_file LSM hook, so that IMA can receive both the requested prot and the final prot. Since the requested protections are stored in a new variable, and the final protections are stored in the existing variable, this effectively restores the original behavior of the MMAP_CHECK hook. Cc: stable@vger.kernel.org Fixes: 98de59bfe4b2 ("take calculation of final prot in security_mmap_file() into a helper") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe rangeYang Jihong
commit f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6 upstream. When arch_prepare_optimized_kprobe calculating jump destination address, it copies original instructions from jmp-optimized kprobe (see __recover_optprobed_insn), and calculated based on length of original instruction. arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when checking whether jmp-optimized kprobe exists. As a result, setup_detour_execution may jump to a range that has been overwritten by jump destination address, resulting in an inval opcode error. For example, assume that register two kprobes whose addresses are <func+9> and <func+11> in "func" function. The original code of "func" function is as follows: 0xffffffff816cb5e9 <+9>: push %r12 0xffffffff816cb5eb <+11>: xor %r12d,%r12d 0xffffffff816cb5ee <+14>: test %rdi,%rdi 0xffffffff816cb5f1 <+17>: setne %r12b 0xffffffff816cb5f5 <+21>: push %rbp 1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1. After the optimization, "func" code changes to: 0xffffffff816cc079 <+9>: push %r12 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp Now op1->flags == KPROBE_FLAG_OPTIMATED; 2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2. register_kprobe(kp2) register_aggr_kprobe alloc_aggr_kprobe __prepare_optimized_kprobe arch_prepare_optimized_kprobe __recover_optprobed_insn // copy original bytes from kp1->optinsn.copied_insn, // jump address = <func+14> 3. disable kp1: disable_kprobe(kp1) __disable_kprobe ... if (p == orig_p || aggr_kprobe_disabled(orig_p)) { ret = disarm_kprobe(orig_p, true) // add op1 in unoptimizing_list, not unoptimized orig_p->flags |= KPROBE_FLAG_DISABLED; // op1->flags == KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED ... 4. unregister kp2 __unregister_kprobe_top ... if (!kprobe_disabled(ap) && !kprobes_all_disarmed) { optimize_kprobe(op) ... if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return return; p->kp.flags |= KPROBE_FLAG_OPTIMIZED; // now op2 has KPROBE_FLAG_OPTIMIZED } "func" code now is: 0xffffffff816cc079 <+9>: int3 0xffffffff816cc07a <+10>: push %rsp 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp 5. if call "func", int3 handler call setup_detour_execution: if (p->flags & KPROBE_FLAG_OPTIMIZED) { ... regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX; ... } The code for the destination address is 0xffffffffa021072c: push %r12 0xffffffffa021072e: xor %r12d,%r12d 0xffffffffa0210731: jmp 0xffffffff816cb5ee <func+14> However, <func+14> is not a valid start instruction address. As a result, an error occurs. Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11x86/kprobes: Fix __recover_optprobed_insn check optimizing logicYang Jihong
commit 868a6fc0ca2407622d2833adefe1c4d284766c4c upstream. Since the following commit: commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe may be in the optimizing or unoptimizing state when op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty. The __recover_optprobed_insn check logic is incorrect, a kprobe in the unoptimizing state may be incorrectly determined as unoptimizing. As a result, incorrect instructions are copied. The optprobe_queued_unopt function needs to be exported for invoking in arch directory. Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Cc: stable@vger.kernel.org Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11genirq: Fix the return type of kstat_cpu_irqs_sum()Zhen Lei
[ Upstream commit 47904aed898a08f028572b9b5a5cc101ddfb2d82 ] The type of member ->irqs_sum is unsigned long, but kstat_cpu_irqs_sum() returns int, which can result in truncation. Therefore, change the kstat_cpu_irqs_sum() function's return value to unsigned long to avoid truncation. Fixes: f2c66cd8eedd ("/proc/stat: scalability of irq num per cpu") Reported-by: Elliott, Robert (Servers) <elliott@hpe.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Josh Don <joshdon@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11bpf: Fix 32 bit src register truncation on div/modDaniel Borkmann
Commit e88b2c6e5a4d9ce30d75391e4d950da74bb2bd90 upstream. While reviewing a different fix, John and I noticed an oddity in one of the BPF program dumps that stood out, for example: # bpftool p d x i 13 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 [...] In line 2 we noticed that the mov32 would 32 bit truncate the original src register for the div/mod operation. While for the two operations the dst register is typically marked unknown e.g. from adjust_scalar_min_max_vals() the src register is not, and thus verifier keeps tracking original bounds, simplified: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = -1 1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=invP-1 R1_w=invP-1 R10=fp0 2: (3c) w0 /= w1 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0 3: (77) r1 >>= 32 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0 4: (bf) r0 = r1 5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0 5: (95) exit processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Runtime result of r0 at exit is 0 instead of expected -1. Remove the verifier mov32 src rewrite in div/mod and replace it with a jmp32 test instead. After the fix, we result in the following code generation when having dividend r1 and divisor r6: div, 64 bit: div, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2 3: (ac) w1 ^= w1 3: (ac) w1 ^= w1 4: (05) goto pc+1 4: (05) goto pc+1 5: (3f) r1 /= r6 5: (3c) w1 /= w6 6: (b7) r0 = 0 6: (b7) r0 = 0 7: (95) exit 7: (95) exit mod, 64 bit: mod, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1 3: (9f) r1 %= r6 3: (9c) w1 %= w6 4: (b7) r0 = 0 4: (b7) r0 = 0 5: (95) exit 5: (95) exit x86 in particular can throw a 'divide error' exception for div instruction not only for divisor being zero, but also for the case when the quotient is too large for the designated register. For the edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT since we always zero edx (rdx). Hence really the only protection needed is against divisor being zero. [Salvatore Bonaccorso: This is an earlier version of the patch provided by Daniel Borkmann which does not rely on availability of the BPF_JMP32 instruction class. This means it is not even strictly a backport of the upstream commit mentioned but based on Daniel's and John's work to address the issue.] Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero") Co-developed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Edward Liaw <edliaw@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-25uaccess: Add speculation barrier to copy_from_user()Dave Hansen
commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream. The results of "access_ok()" can be mis-speculated. The result is that you can end speculatively: if (access_ok(from, size)) // Right here even for bad from/size combinations. On first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated. But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down. "copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this: if (!copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar); If userspace passes in an evil 'uptr' that *actually* points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values. Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy. Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code. Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away. Reported-by: Jordy Zomer <jordyzomer@google.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-25random: always mix cycle counter in add_latent_entropy()Jason A. Donenfeld
[ Upstream commit d7bf7f3b813e3755226bcb5114ad2ac477514ebf ] add_latent_entropy() is called every time a process forks, in kernel_clone(). This in turn calls add_device_randomness() using the latent entropy global state. add_device_randomness() does two things: 2) Mixes into the input pool the latent entropy argument passed; and 1) Mixes in a cycle counter, a sort of measurement of when the event took place, the high precision bits of which are presumably difficult to predict. (2) is impossible without CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y. But (1) is always possible. However, currently CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n disables both (1) and (2), instead of just (2). This commit causes the CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n case to still do (1) by passing NULL (len 0) to add_device_randomness() when add_latent_ entropy() is called. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: PaX Team <pageexec@freemail.hu> Cc: Emese Revfy <re.emese@gmail.com> Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-22hugetlb: check for undefined shift on 32 bit architecturesMike Kravetz
commit ec4288fe63966b26d53907212ecd05dfa81dd2cc upstream. Users can specify the hugetlb page size in the mmap, shmget and memfd_create system calls. This is done by using 6 bits within the flags argument to encode the base-2 logarithm of the desired page size. The routine hstate_sizelog() uses the log2 value to find the corresponding hugetlb hstate structure. Converting the log2 value (page_size_log) to potential hugetlb page size is the simple statement: 1UL << page_size_log Because only 6 bits are used for page_size_log, the left shift can not be greater than 63. This is fine on 64 bit architectures where a long is 64 bits. However, if a value greater than 31 is passed on a 32 bit architecture (where long is 32 bits) the shift will result in undefined behavior. This was generally not an issue as the result of the undefined shift had to exactly match hugetlb page size to proceed. Recent improvements in runtime checking have resulted in this undefined behavior throwing errors such as reported below. Fix by comparing page_size_log to BITS_PER_LONG before doing shift. Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4MiPgVg@mail.gmail.com/ Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Sasha Levin <sashal@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smapsMike Kravetz
commit 3489dbb696d25602aea8c3e669a6d43b76bd5358 upstream. Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs". This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue. [1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/ This patch (of 2): A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table. page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added. [akpm@linux-foundation.org: simplification, per David] [akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06panic: Consolidate open-coded panic_on_warn checksKees Cook
commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream. Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll their own warnings, and each check "panic_on_warn". Consolidate this into a single function so that future instrumentation can be added in a single location. Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Gow <davidgow@google.com> Cc: tangmeng <tangmeng@uniontech.com> Cc: Jann Horn <jannh@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06exit: Add and use make_task_dead.Eric W. Biederman
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream. There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06sysctl: add a new register_sysctl_init() interfaceXiaoming Ni
commit 3ddd9a808cee7284931312f2f3e854c9617f44b2 upstream. Patch series "sysctl: first set of kernel/sysctl cleanups", v2. Finally had time to respin the series of the work we had started last year on cleaning up the kernel/sysct.c kitchen sink. People keeps stuffing their sysctls in that file and this creates a maintenance burden. So this effort is aimed at placing sysctls where they actually belong. I'm going to split patches up into series as there is quite a bit of work. This first set adds register_sysctl_init() for uses of registerting a sysctl on the init path, adds const where missing to a few places, generalizes common values so to be more easy to share, and starts the move of a few kernel/sysctl.c out where they belong. The majority of rework on v2 in this first patch set is 0-day fixes. Eric Biederman's feedback is later addressed in subsequent patch sets. I'll only post the first two patch sets for now. We can address the rest once the first two patch sets get completely reviewed / Acked. This patch (of 9): The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Today though folks heavily rely on tables on kernel/sysctl.c so they can easily just extend this table with their needed sysctls. In order to help users move their sysctls out we need to provide a helper which can be used during code initialization. We special-case the initialization use of register_sysctl() since it *is* safe to fail, given all that sysctls do is provide a dynamic interface to query or modify at runtime an existing variable. So the use case of register_sysctl() on init should *not* stop if the sysctls don't end up getting registered. It would be counter productive to stop boot if a simple sysctl registration failed. Provide a helper for init then, and document the recommended init levels to use for callers of this routine. We will later use this in subsequent patches to start slimming down kernel/sysctl.c tables and moving sysctl registration to the code which actually needs these sysctls. [mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ] Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Paul Turner <pjt@google.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Qing Wang <wangqing@vivo.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Stephen Kitt <steve@sk2.org> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>