summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2023-07-19blktrace: use inline function for blk_trace_remove() while blktrace is disabledYu Kuai
commit cbe7cff4a76bc749dd70264ca5cf924e2adf9296 upstream. If config is disabled, call blk_trace_remove() directly will trigger build warning, hence use inline function instead, prepare to fix blktrace debugfs entries leakage. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230610022003.2557284-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfsRoberto Sassu
commit 36ce9d76b0a93bae799e27e4f5ac35478c676592 upstream. As the ramfs-based tmpfs uses ramfs_init_fs_context() for the init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb() to free it and avoid a memory leak. Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweicloud.com Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19watch_queue: prevent dangling pipe pointerSiddh Raman Pant
commit 943211c87427f25bd22e0e63849fb486bb5f87fa upstream. NULL the dangling pipe reference while clearing watch_queue. If not done, a reference to a freed pipe remains in the watch_queue, as this function is called before freeing a pipe in free_pipe_info() (see line 834 of fs/pipe.c). The sole use of wqueue->defunct is for checking if the watch queue has been cleared, but wqueue->pipe is also NULLed while clearing. Thus, wqueue->defunct is superfluous, as wqueue->pipe can be checked for NULL. Hence, the former can be removed. Tested with keyutils testsuite. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Siddh Raman Pant <code@siddh.me> Acked-by: David Howells <dhowells@redhat.com> Message-Id: <20230605143616.640517-1-code@siddh.me> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19net: dsa: sja1105: always enable the send_meta optionsVladimir Oltean
[ Upstream commit a372d66af48506d9f7aaae2a474cd18f14d98cb8 ] incl_srcpt has the limitation, mentioned in commit b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"), that frames with a MAC DA of 01:80:c2:xx:yy:zz will be received as 01:80:c2:00:00:zz unless PTP RX timestamping is enabled. The incl_srcpt option was initially unconditionally enabled, then that changed with commit 42824463d38d ("net: dsa: sja1105: Limit use of incl_srcpt to bridge+vlan mode"), then again with b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"). Bottom line is that it now needs to be always enabled, otherwise the driver does not have a reliable source of information regarding source_port and switch_id for link-local traffic (tag_8021q VLANs may be imprecise since now they identify an entire bridging domain when ports are not standalone). If we accept that PTP RX timestamping (and therefore, meta frame generation) is always enabled in hardware, then that limitation could be avoided and packets with any MAC DA can be properly received, because meta frames do contain the original bytes from the MAC DA of their associated link-local packet. This change enables meta frame generation unconditionally, which also has the nice side effects of simplifying the switch control path (a switch reset is no longer required on hwtstamping settings change) and the tagger data path (it no longer needs to be informed whether to expect meta frames or not - it always does). Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19lib/bitmap: drop optimization of bitmap_{from,to}_arr64Yury Norov
[ Upstream commit c1d2ba10f594046831d14b03f194e8d05e78abad ] bitmap_{from,to}_arr64() optimization is overly optimistic on 32-bit LE architectures when it's wired to bitmap_copy_clear_tail(). bitmap_copy_clear_tail() takes care of unused bits in the bitmap up to the next word boundary. But on 32-bit machines when copying bits from bitmap to array of 64-bit words, it's expected that the unused part of a recipient array must be cleared up to 64-bit boundary, so the last 4 bytes may stay untouched when nbits % 64 <= 32. While the copying part of the optimization works correct, that clear-tail trick makes corresponding tests reasonably fail: test_bitmap: bitmap_to_arr64(nbits == 1): tail is not safely cleared: 0xa5a5a5a500000001 (must be 0x0000000000000001) Fix it by removing bitmap_{from,to}_arr64() optimization for 32-bit LE arches. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/20230225184702.GA3587246@roeck-us.net/ Fixes: 0a97953fd221 ("lib: add bitmap_{from,to}_arr64") Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19Revert "usb: common: usb-conn-gpio: Set last role to unknown before initial ↵Greg Kroah-Hartman
detection" [ Upstream commit df49f2a0ac4a34c0cb4b5c233fcfa0add644c43c ] This reverts commit edd60d24bd858cef165274e4cd6cab43bdc58d15. Heikki reports that this should not be a global flag just to work around one broken driver and should be fixed differently, so revert it. Reported-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Fixes: edd60d24bd85 ("usb: common: usb-conn-gpio: Set last role to unknown before initial detection") Link: https://lore.kernel.org/r/ZImE4L3YgABnCIsP@kuha.fi.intel.com Cc: Prashanth K <quic_prashk@quicinc.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19usb: common: usb-conn-gpio: Set last role to unknown before initial detectionPrashanth K
[ Upstream commit edd60d24bd858cef165274e4cd6cab43bdc58d15 ] Currently if we bootup a device without cable connected, then usb-conn-gpio won't call set_role() since last_role is same as current role. This happens because during probe last_role gets initialised to zero. To avoid this, added a new constant in enum usb_role, last_role is set to USB_ROLE_UNKNOWN before performing initial detection. While at it, also handle default case for the usb_role switch in cdns3, intel-xhci-usb-role-switch & musb/jz4740 to avoid build warnings. Fixes: 4602f3bff266 ("usb: common: add USB GPIO based connection detection driver") Signed-off-by: Prashanth K <quic_prashk@quicinc.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Message-ID: <1685544074-17337-1-git-send-email-quic_prashk@quicinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19sh: Avoid using IRQ0 on SH3 and SH4Sergey Shtylyov
[ Upstream commit a8ac2961148e8c720dc760f2e06627cd5c55a154 ] IRQ0 is no longer returned by platform_get_irq() and its ilk -- they now return -EINVAL instead. However, the kernel code supporting SH3/4-based SoCs still maps the IRQ #s starting at 0 -- modify that code to start the IRQ #s from 16 instead. The patch should mostly affect the AP-SH4A-3A/AP-SH4AD-0A boards as they indeed are using IRQ0 for the SMSC911x compatible Ethernet chip. Fixes: ce753ad1549c ("platform: finally disallow IRQ0 in platform_get_irq() and its ilk") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/71105dbf-cdb0-72e1-f9eb-eeda8e321696@omp.ru Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19USB: Extend pci resume function to handle PM eventsBasavaraj Natikar
[ Upstream commit 1f7d5520719dd1fed1a2947679f6cc26a55f1e6b ] Currently, the pci_resume method has only a flag indicating whether the system is resuming from hibernation. In order to handle all PM events like AUTO_RESUME (runtime resume from device in D3), RESUME (system resume from s2idle, S3 or S4 states) etc change the pci_resume method to handle all PM events. Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20230428140056.1318981-2-Basavaraj.Natikar@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 1c024241d018 ("xhci: Improve the XHCI system resume time") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19PCI: Add pci_clear_master() stub for non-CONFIG_PCISui Jingfeng
[ Upstream commit 2aa5ac633259843f656eb6ecff4cf01e8e810c5e ] Add a pci_clear_master() stub when CONFIG_PCI is not set so drivers that support both PCI and platform devices don't need #ifdefs or extra Kconfig symbols for the PCI parts. [bhelgaas: commit log] Fixes: 6a479079c072 ("PCI: Add pci_clear_master() as opposite of pci_set_master()") Link: https://lore.kernel.org/r/20230531102744.2354313-1-suijingfeng@loongson.cn Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19ARM/musb: omap2: Remove global GPIO numbers from TUSB6010Linus Walleij
[ Upstream commit 8e0285ab95a9baf374f2c13eb152221c8ecb3f28 ] The TUSB6010 (MUSB) device is picking up some GPIO lines hardcoded by number and passing on to the TUSB6010 device when registering it. Instead of nasty workarounds, provide a GPIO descriptor table and then make the TUSB6010 MUSB glue driver pick up the GPIO lines directly, convert it to an IRQ and pass down to the MUSB driver. OMAP2 is the only system using the TUSB6010. Stash the GPIO descriptors in the glue layer and use then to power up and down the TUSB6010 on-demand, instead of using boardfile callbacks. Since the OMAP2 boards are the only boards using the .set_power() and .board_set_power() callbacks, we can just delete them as the power is now handled directly in the TUSB6010 glue code. Cc: Bin Liu <b-liu@ti.com> Cc: linux-usb@vger.kernel.org Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19ARM/gpio: Push OMAP2 quirk down into TWL4030 driverLinus Walleij
[ Upstream commit d5f4fa60d63aa54ae33339895b88d8932b6037ed ] The TWL4030 GPIO driver has a custom platform data .set_up() callback to call back into the platform and do misc stuff such as hog and export a GPIO for WLAN PWR on a specific OMAP3 board. Avoid all the kludgery in the platform data and the boardfile and just put the quirks right into the driver. Make it conditional on OMAP3. I think the exported GPIO is used by some kind of userspace so ordinary DTS hogs will probably not work. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19ARM/mmc: Convert old mmci-omap to GPIO descriptorsLinus Walleij
[ Upstream commit e519f0bb64efc2c9c8b67bb2d114dda458bdc34d ] A recent change to the OMAP driver making it use a dynamic GPIO base created problems with some old OMAP1 board files, among them Nokia 770, SX1 and also the OMAP2 Nokia n8x0. Fix up all instances of GPIOs being used for the MMC driver by pushing the handling of power, slot selection and MMC "cover" into the driver as optional GPIOs. This is maybe not the most perfect solution as the MMC framework have some central handlers for some of the stuff, but it at least makes the situtation better and solves the immediate issue. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19Input: ads7846 - Convert to use software nodesLinus Walleij
[ Upstream commit 767d83361aaa6a1ecb4d5b89eeb38a267239917a ] The Nokia 770 is using GPIOs from the global numberspace on the CBUS node to pass down to the LCD controller. This regresses when we let the OMAP GPIO driver use dynamic GPIO base. The Nokia 770 now has dynamic allocation of IRQ numbers, so this needs to be fixed for it to work. As this is the only user of LCD MIPID we can easily augment the driver to use a GPIO descriptor instead and resolve the issue. The platform data .shutdown() callback wasn't even used in the code, but we encode a shutdown asserting RESET in the remove() callback for completeness sake. The CBUS also has the ADS7846 touchscreen attached. Populate the devices on the Nokia 770 CBUS I2C using software nodes instead of platform data quirks. This includes the LCD and the ADS7846 touchscreen so the conversion just brings the LCD along with it as software nodes is an all-or-nothing design pattern. The ADS7846 has some limited support for using GPIO descriptors, let's convert it over completely to using device properties and then fix all remaining boardfile users to provide all platform data using software nodes. Dump the of includes and of_match_ptr() in the ADS7846 driver as part of the job. Since we have to move ADS7846 over to obtaining the GPIOs it is using exclusively from descriptors, we provide descriptor tables for the two remaining in-kernel boardfiles using ADS7846: - PXA Spitz - MIPS Alchemy DB1000 development board It was too hard for me to include software node conversion of these two remaining users at this time: the spitz is using a hscync callback in the platform data that would require further GPIO descriptor conversion of the Spitz, and moving the hsync callback down into the driver: it will just become too big of a job, but it can be done separately. The MIPS Alchemy DB1000 is simply something I cannot test, so take the easier approach of just providing some GPIO descriptors in this case as I don't want the patch to grow too intrusive. As we see that several device trees have incorrect polarity flags and just expect to bypass the gpiolib polarity handling, fix up all device trees too, in a separate patch. Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19ARM/mfd/gpio: Fixup TPS65010 regression on OMAP1 OSK1Linus Walleij
[ Upstream commit c32c81f3dbdfd68f6ab20a29ad86f811aed36e4e ] Aaro reports problems on the OSK1 board after we altered the dynamic base for GPIO allocations. It appears this happens because the OMAP driver now allocates GPIO numbers dynamically, so all that is references by number is a bit up in the air. Let's bite the bullet and try to just move the gpio_chip in the tps65010 MFD driver over to using dynamic allocations. Alter everything in the OSK1 board file to use a GPIO descriptor table and lookups. Utilize the NULL device to define some board-specific GPIO lookups and use these to immediately look up the same GPIOs, convert to IRQ numbers and pass as resources to the devices. This is ugly but should work. The .setup() callback for tps65010 was used for some GPIO hogging, but since the OSK1 is the only user in the entire kernel we can alter the signatures to something that is helpful and make a clean transition. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: andy.shevchenko@gmail.com Cc: Andreas Kemnade <andreas@kemnade.info> Acked-by: Lee Jones <lee@kernel.org> Reviewed-by: Lee Jones <lee@kernel.org> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19can: length: fix bitstuffing countVincent Mailhol
[ Upstream commit 9fde4c557f78ee2f3626e92b4089ce9d54a2573a ] The Stuff Bit Count is always coded on 4 bits [1]. Update the Stuff Bit Count size accordingly. In addition, the CRC fields of CAN FD Frames contain stuff bits at fixed positions called fixed stuff bits [2]. The CRC field starts with a fixed stuff bit and then has another fixed stuff bit after each fourth bit [2], which allows us to derive this formula: FSB count = 1 + round_down(len(CRC field)/4) The length of the CRC field is [1]: len(CRC field) = len(Stuff Bit Count) + len(CRC) = 4 + len(CRC) with len(CRC) either 17 or 21 bits depending of the payload length. In conclusion, for CRC17: FSB count = 1 + round_down((4 + 17)/4) = 6 and for CRC 21: FSB count = 1 + round_down((4 + 21)/4) = 7 Add a Fixed Stuff bits (FSB) field with above values and update CANFD_FRAME_OVERHEAD_SFF and CANFD_FRAME_OVERHEAD_EFF accordingly. [1] ISO 11898-1:2015 section 10.4.2.6 "CRC field": The CRC field shall contain the CRC sequence followed by a recessive CRC delimiter. For FD Frames, the CRC field shall also contain the stuff count. Stuff count If FD Frames, the stuff count shall be at the beginning of the CRC field. It shall consist of the stuff bit count modulo 8 in a 3-bit gray code followed by a parity bit [...] [2] ISO 11898-1:2015 paragraph 10.5 "Frame coding": In the CRC field of FD Frames, the stuff bits shall be inserted at fixed positions; they are called fixed stuff bits. There shall be a fixed stuff bit before the first bit of the stuff count, even if the last bits of the preceding field are a sequence of five consecutive bits of identical value, there shall be only the fixed stuff bit, there shall not be two consecutive stuff bits. A further fixed stuff bit shall be inserted after each fourth bit of the CRC field [...] Fixes: 85d99c3e2a13 ("can: length: can_skb_get_frame_len(): introduce function to get data length of frame in data link layer") Suggested-by: Thomas Kopp <Thomas.Kopp@microchip.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Thomas Kopp <Thomas.Kopp@microchip.com> Link: https://lore.kernel.org/all/20230611025728.450837-2-mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindingsGilad Sever
[ Upstream commit 9a5cb79762e0eda17ca15c2a6eaca4622383c21c ] When calling bpf_sk_lookup_tcp(), bpf_sk_lookup_udp() or bpf_skc_lookup_tcp() from tc/xdp ingress, VRF socket bindings aren't respoected, i.e. unbound sockets are returned, and bound sockets aren't found. VRF binding is determined by the sdif argument to sk_lookup(), however when called from tc the IP SKB control block isn't initialized and thus inet{,6}_sdif() always returns 0. Fix by calculating sdif for the tc/xdp flows by observing the device's l3 enslaved state. The cg/sk_skb hooking points which are expected to support inet{,6}_sdif() pass sdif=-1 which makes __bpf_skc_lookup() use the existing logic. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: Gilad Sever <gilad9366@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/bpf/20230621104211.301902-4-gilad9366@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019Marek Vasut
[ Upstream commit c467c8f081859d4f4ca4eee4fba54bb5d85d6c97 ] This microSD card never clears Flush Cache bit after cache flush has been started in sd_flush_cache(). This leads e.g. to failure to mount file system. Add a quirk which disables the SD cache for this specific card from specific manufacturing date of 11/2019, since on newer dated cards from 05/2023 the cache flush works correctly. Fixes: 08ebf903af57 ("mmc: core: Fixup support for writeback-cache for eMMC and SD") Signed-off-by: Marek Vasut <marex@denx.de> Link: https://lore.kernel.org/r/20230620102713.7701-1-marex@denx.de Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe failsDouglas Anderson
[ Upstream commit 9ec272c586b07d1abf73438524bd12b1df9c5f9b ] Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews". This patch series attempts to finish resolving the feedback received from Petr Mladek on the v5 series I posted. Probably the only thing that wasn't fully as clean as Petr requested was the Kconfig stuff. I couldn't find a better way to express it without a more major overhaul. In the very least, I renamed "NON_ARCH" to "PERF_OR_BUDDY" in the hopes that will make it marginally better. Nothing in this series is terribly critical and even the bugfixes are small. However, it does cleanup a few things that were pointed out in review. This patch (of 10): The permissions for the kernel.nmi_watchdog sysctl have always been set at compile time despite the fact that a watchdog can fail to probe. Let's fix this and set the permissions based on whether the hardlockup detector actually probed. Link: https://lkml.kernel.org/r/20230527014153.2793931-1-dianders@chromium.org Link: https://lkml.kernel.org/r/20230526184139.1.I0d75971cc52a7283f495aac0bd5c3041aadc734e@changeid Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reported-by: Petr Mladek <pmladek@suse.com> Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19watchdog/perf: adapt the watchdog_perf interface for async modelLecopzer Chen
[ Upstream commit 930d8f8dbab97cb05dba30e67a2dfa0c6dbf4bc7 ] When lockup_detector_init()->watchdog_hardlockup_probe(), PMU may be not ready yet. E.g. on arm64, PMU is not ready until device_initcall(armv8_pmu_driver_init). And it is deeply integrated with the driver model and cpuhp. Hence it is hard to push this initialization before smp_init(). But it is easy to take an opposite approach and try to initialize the watchdog once again later. The delayed probe is called using workqueues. It need to allocate memory and must be proceed in a normal context. The delayed probe is able to use if watchdog_hardlockup_probe() returns non-zero which means the return code returned when PMU is not ready yet. Provide an API - lockup_detector_retry_init() for anyone who needs to delayed init lockup detector if they had ever failed at lockup_detector_init(). The original assumption is: nobody should use delayed probe after lockup_detector_check() which has __init attribute. That is, anyone uses this API must call between lockup_detector_init() and lockup_detector_check(), and the caller must have __init attribute Link: https://lkml.kernel.org/r/20230519101840.v5.16.If4ad5dd5d09fb1309cebf8bcead4b6a5a7758ca7@changeid Reviewed-by: Petr Mladek <pmladek@suse.com> Co-developed-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 9ec272c586b0 ("watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19watchdog/hardlockup: rename some "NMI watchdog" constants/functionDouglas Anderson
[ Upstream commit df95d3085caa5b99a60eb033d7ad6c2ff2b43dbf ] Do a search and replace of: - NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED - SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED - watchdog_nmi_ => watchdog_hardlockup_ - nmi_watchdog_available => watchdog_hardlockup_available - nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled - soft_watchdog_user_enabled => watchdog_softlockup_user_enabled - NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT Then update a few comments near where names were changed. This is specifically to make it less confusing when we want to introduce the buddy hardlockup detector, which isn't using NMIs. As part of this, we sanitized a few names for consistency. [trix@redhat.com: make variables static] Link: https://lkml.kernel.org/r/20230525162822.1.I0fb41d138d158c9230573eaa37dc56afa2fb14ee@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.12.I91f7277bab4bf8c0cb238732ed92e7ce7bbd71a6@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Tom Rix <trix@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 9ec272c586b0 ("watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.cDouglas Anderson
[ Upstream commit 81972551df9d168a8183b786ff4de06008469c2e ] The perf hardlockup detector works by looking at interrupt counts and seeing if they change from run to run. The interrupt counts are managed by the common watchdog code via its watchdog_timer_fn(). Currently the API between the perf detector and the common code is a function: is_hardlockup(). When the hard lockup detector sees that function return true then it handles printing out debug info and inducing a panic if necessary. Let's change the API a little bit in preparation for the buddy hardlockup detector. The buddy hardlockup detector wants to print nearly the same debug info and have nearly the same panic behavior. That means we want to move all that code to the common file. For now, the code in the common file will only be there if the perf hardlockup detector is enabled, but eventually it will be selected by a common config. Right now, this _just_ moves the code from the perf detector file to the common file and changes the names. It doesn't make the changes that the buddy hardlockup detector will need and doesn't do any style cleanups. A future patch will do cleanup to make it more obvious what changed. With the above, we no longer have any callers of is_hardlockup() outside of the "watchdog.c" file, so we can remove it from the header, make it static, and move it to the same "#ifdef" block as our new watchdog_hardlockup_check(). While doing this, it can be noted that even if no hardlockup detectors were configured the existing code used to still have the code for counting/checking "hrtimer_interrupts" even if the perf hardlockup detector wasn't configured. We didn't need to do that, so move all the "hrtimer_interrupts" counting to only be there if the perf hardlockup detector is configured as well. This change is expected to be a no-op. Link: https://lkml.kernel.org/r/20230519101840.v5.8.Id4133d3183e798122dc3b6205e7852601f289071@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 9ec272c586b0 ("watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19watchdog/hardlockup: change watchdog_nmi_enable() to voidLecopzer Chen
[ Upstream commit 730211182ed083898fa5feb4b28459ffac4c9615 ] Nobody cares about the return value of watchdog_nmi_enable(), changing its prototype to void. Link: https://lkml.kernel.org/r/20230519101840.v5.4.Ic3a19b592eb1ac4c6f6eade44ffd943e8637b6e5@changeid Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 9ec272c586b0 ("watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19mm: move mm_count into its own cache lineMathieu Desnoyers
[ Upstream commit c1753fd02a0058ea43cbb31ab26d25be2f6cfe08 ] The mm_struct mm_count field is frequently updated by mmgrab/mmdrop performed by context switch. This causes false-sharing for surrounding mm_struct fields which are read-mostly. This has been observed on a 2sockets/112core/224cpu Intel Sapphire Rapids server running hackbench, and by the kernel test robot will-it-scale testcase. Move the mm_count field into its own cache line to prevent false-sharing with other mm_struct fields. Move mm_count to the first field of mm_struct to minimize the amount of padding required: rather than adding padding before and after the mm_count field, padding is only added after mm_count. Note that I noticed this odd comment in mm_struct: commit 2e3025434a6b ("mm: relocate 'write_protect_seq' in struct mm_struct") /* * With some kernel config, the current mmap_lock's offset * inside 'mm_struct' is at 0x120, which is very optimal, as * its two hot fields 'count' and 'owner' sit in 2 different * cachelines, and when mmap_lock is highly contended, both * of the 2 fields will be accessed frequently, current layout * will help to reduce cache bouncing. * * So please be careful with adding new fields before * mmap_lock, which can easily push the 2 fields into one * cacheline. */ struct rw_semaphore mmap_lock; This comment is rather odd for a few reasons: - It requires addition/removal of mm_struct fields to carefully consider field alignment of _other_ fields, - It expresses the wish to keep an "optimal" alignment for a specific kernel config. I suspect that the author of this comment may want to revisit this topic and perhaps introduce a split-struct approach for struct rw_semaphore, if the need is to place various fields of this structure in different cache lines. Link: https://lkml.kernel.org/r/20230515143536.114960-1-mathieu.desnoyers@efficios.com Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID") Link: https://lore.kernel.org/lkml/7a0c1db1-103d-d518-ed96-1584a28fbf32@efficios.com Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/oe-lkp/202305151017.27581d75-yujie.liu@intel.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Olivier Dion <odion@efficios.com> Cc: <michael.christie@oracle.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: ieee80211: Fix the common size calculation for reconfiguration MLIlan Peer
[ Upstream commit ce6e1f600b0cfc563a7d607de702262a58cd835d ] The common information length is found in the first octet of the common information. Fixes: 0f48b8b88aa9 ("wifi: ieee80211: add definitions for multi-link element") Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214435.3c7ed4817338.I42ef706cb827b4dade6e4ffbb6e7f341eaccd398@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19wifi: iwlwifi: mvm: add support for Extra EHT LTFGregory Greenman
[ Upstream commit 18c0ffb404db2093b6afdc8ae15f18ba3975e1ed ] Add support for Extra EHT LTF defined in 9.4.2.313 EHT Capabilities element. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230613155501.de019d7cc174.I806f0f6042b89274192701a60b4f7900822db666@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Stable-dep-of: f91295987576 ("wifi: iwlwifi: mvm: correctly access HE/EHT sband capa") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: Verify scalar ids mapping in regsafe() using check_ids()Eduard Zingerman
[ Upstream commit 1ffc85d9298e0ca0137ba65c93a786143fe167b8 ] Make sure that the following unsafe example is rejected by verifier: 1: r9 = ... some pointer with range X ... 2: r6 = ... unbound scalar ID=a ... 3: r7 = ... unbound scalar ID=b ... 4: if (r6 > r7) goto +1 5: r6 = r7 6: if (r6 > X) goto ...
2023-07-19bpf: Use scalar ids in mark_chain_precision()Eduard Zingerman
[ Upstream commit 904e6ddf4133c52fdb9654c2cd2ad90f320d48b9 ] Change mark_chain_precision() to track precision in situations like below: r2 = unknown value ... --- state #0 --- ... r1 = r2 // r1 and r2 now share the same ID ... --- state #1 {r1.id = A, r2.id = A} --- ... if (r2 > 10) goto exit; // find_equal_scalars() assigns range to r1 ... --- state #2 {r1.id = A, r2.id = A} --- r3 = r10 r3 += r1 // need to mark both r1 and r2 At the beginning of the processing of each state, ensure that if a register with a scalar ID is marked as precise, all registers sharing this ID are also marked as precise. This property would be used by a follow-up change in regsafe(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230613153824.3324830-2-eddyz87@gmail.com Stable-dep-of: 1ffc85d9298e ("bpf: Verify scalar ids mapping in regsafe() using check_ids()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct ↵Douglas Anderson
config [ Upstream commit 5e008df11c55228a86a1bae692cc2002503572c9 ] Patch series "watchdog/hardlockup: Add the buddy hardlockup detector", v5. This patch series adds the "buddy" hardlockup detector. In brief, the buddy hardlockup detector can detect hardlockups without arch-level support by having CPUs checkup on a "buddy" CPU periodically. Given the new design of this patch series, testing all combinations is fairly difficult. I've attempted to make sure that all combinations of CONFIG_ options are good, but it wouldn't surprise me if I missed something. I apologize in advance and I'll do my best to fix any problems that are found. This patch (of 18): The real watchdog_update_hrtimer_threshold() is defined in kernel/watchdog_hld.c. That file is included if CONFIG_HARDLOCKUP_DETECTOR_PERF and the function is defined in that file if CONFIG_HARDLOCKUP_CHECK_TIMESTAMP. The dummy version of the function in "nmi.h" didn't get that quite right. While this doesn't appear to be a huge deal, it's nice to make it consistent. It doesn't break builds because CHECK_TIMESTAMP is only defined by x86 so others don't get a double definition, and x86 uses perf lockup detector, so it gets the out of line version. Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.1.I8cbb2f4fa740528fcfade4f5439b6cdcdd059251@changeid Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Cc: Colin Cross <ccross@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19soc: qcom: geni-se: Add interfaces geni_se_tx_init_dma() and ↵Vijaya Krishna Nivarthi
geni_se_rx_init_dma() [ Upstream commit 6d6e57594957ee9131bc3802dfc8657ca6f78fee ] The geni_se_xx_dma_prep() interfaces necessarily do DMA mapping before initiating DMA transfers. This is not suitable for spi where framework is expected to handle map/unmap. Expose new interfaces geni_se_xx_init_dma() which do only DMA transfer. Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/1684325894-30252-2-git-send-email-quic_vnivarth@quicinc.com Signed-off-by: Mark Brown <broonie@kernel.org> Stable-dep-of: 3a76c7ca9e77 ("spi: spi-geni-qcom: Do not do DMA map/unmap inside driver, use framework instead") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: Remove bpf trampoline selectorYafang Shao
[ Upstream commit 47e79cbeea4b3891ad476047f4c68543eb51c8e0 ] After commit e21aa341785c ("bpf: Fix fexit trampoline."), the selector is only used to indicate how many times the bpf trampoline image are updated and been displayed in the trampoline ksym name. After the trampoline is freed, the selector will start from 0 again. So the selector is a useless value to the user. We can remove it. If the user want to check whether the bpf trampoline image has been updated or not, the user can compare the address. Each time the trampoline image is updated, the address will change consequently. Jiri also pointed out another issue that perf is still using the old name "bpf_trampoline_%lu", so this change can fix the issue in perf. Fixes: e21aa341785c ("bpf: Fix fexit trampoline.") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Cc: Jiri Olsa <olsajiri@gmail.com> Link: https://lore.kernel.org/bpf/ZFvOOlrmHiY9AgXE@krava Link: https://lore.kernel.org/bpf/20230515130849.57502-3-laoar.shao@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: improve precision backtrack loggingAndrii Nakryiko
[ Upstream commit d9439c21a9e4769bfd83a03ab39056164d44ac31 ] Add helper to format register and stack masks in more human-readable format. Adjust logging a bit during backtrack propagation and especially during forcing precision fallback logic to make it clearer what's going on (with log_level=2, of course), and also start reporting affected frame depth. This is in preparation for having more than one active frame later when precision propagation between subprog calls is added. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Stable-dep-of: f655badf2a8f ("bpf: fix propagate_precision() logic for inner frames") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19bpf: encapsulate precision backtracking bookkeepingAndrii Nakryiko
[ Upstream commit 407958a0e980b9e1842ab87b5a1040521e1e24e9 ] Add struct backtrack_state and straightforward API around it to keep track of register and stack masks used and maintained during precision backtracking process. Having this logic separately allow to keep high-level backtracking algorithm cleaner, but also it sets us up to cleanly keep track of register and stack masks per frame, allowing (with some further logic adjustments) to perform precision backpropagation across multiple frames (i.e., subprog calls). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Stable-dep-of: f655badf2a8f ("bpf: fix propagate_precision() logic for inner frames") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19drivers/perf: apple_m1: Force 63bit counters for M2 CPUsMarc Zyngier
[ Upstream commit 8be3593b9efa8903d2ee7bb9cdf57a8e56c66f36 ] Sidharth reports that on M2, the PMU never generates any interrupt when using 'perf record', which is a annoying as you get no sample. I'm temped to say "no sample, no problem", but others may have a different opinion. Upon investigation, it appears that the counters on M2 are significantly different from the ones on M1, as they count on 64 bits instead of 48. Which of course, in the fine M1 tradition, means that we can only use 63 bits, as the top bit is used to signal the interrupt... This results in having to introduce yet another flag to indicate yet another odd counter width. Who knows what the next crazy implementation will do... With this, perf can work out the correct offset, and 'perf record' works as intended. Tested on M2 and M2-Pro CPUs. Cc: Janne Grunau <j@jannau.net> Cc: Hector Martin <marcan@marcan.st> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Fixes: 7d0bfb7c9977 ("drivers/perf: apple_m1: Add Apple M2 support") Reported-by: Sidharth Kshatriya <sid.kshatriya@gmail.com> Tested-by: Sidharth Kshatriya <sid.kshatriya@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230528080205.288446-1-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19blk-mq: fix potential io hang by wrong 'wake_batch'Yu Kuai
[ Upstream commit 4f1731df60f9033669f024d06ae26a6301260b55 ] In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating 'wake_batch' is not atomic: t1: t2: _blk_mq_tag_busy blk_mq_tag_busy inc active_queues // assume 1->2 inc active_queues // 2 -> 3 blk_mq_update_wake_batch // calculate based on 3 blk_mq_update_wake_batch /* calculate based on 2, while active_queues is actually 3. */ Fix this problem by protecting them wih 'tags->lock', this is not a hot path, so performance should not be concerned. And now that all writers are inside the lock, switch 'actives_queues' from atomic to unsigned int. Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19block/rq_qos: protect rq_qos apis with a new lockYu Kuai
[ Upstream commit a13bd91be22318768d55470cbc0b0f4488ef9edf ] commit 50e34d78815e ("block: disable the elevator int del_gendisk") move rq_qos_exit() from disk_release() to del_gendisk(), this will introduce some problems: 1) If rq_qos_add() is triggered by enabling iocost/iolatency through cgroupfs, then it can concurrent with del_gendisk(), it's not safe to write 'q->rq_qos' concurrently. 2) Activate cgroup policy that is relied on rq_qos will call rq_qos_add() and blkcg_activate_policy(), and if rq_qos_exit() is called in the middle, null-ptr-dereference will be triggered in blkcg_activate_policy(). 3) blkg_conf_open_bdev() can call blkdev_get_no_open() first to find the disk, then if rq_qos_exit() from del_gendisk() is done before rq_qos_add(), then memory will be leaked. This patch add a new disk level mutex 'rq_qos_mutex': 1) The lock will protect rq_qos_exit() directly. 2) For wbt that doesn't relied on blk-cgroup, rq_qos_add() can only be called from disk initialization for now because wbt can't be destructed until rq_qos_exit(), so it's safe not to protect wbt for now. Hoever, in case that rq_qos dynamically destruction is supported in the furture, this patch also protect rq_qos_add() from wbt_init() directly, this is enough because blk-sysfs already synchronize writers with disk removal. 3) For iocost and iolatency, in order to synchronize disk removal and cgroup configuration, the lock is held after blkdev_get_no_open() from blkg_conf_open_bdev(), and is released in blkg_conf_exit(). In order to fix the above memory leak, disk_live() is checked after holding the new lock. Fixes: 50e34d78815e ("block: disable the elevator int del_gendisk") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230414084008.2085155-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19block: Fix the type of the second bdev_op_is_zoned_write() argumentBart Van Assche
[ Upstream commit 3ddbe2a7e0d4a155a805f69c906c9beed30d4cc4 ] Change the type of the second argument of bdev_op_is_zoned_write() from blk_opf_t into enum req_op because this function expects an operation without flags as second argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: 8cafdb5ab94c ("block: adapt blk_mq_plug() to not plug for writes that require a zone lock") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230517174230.897144-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19fs: pipe: reveal missing function protoypesArnd Bergmann
[ Upstream commit 247c8d2f9837a3e29e3b6b7a4aa9c36c37659dd4 ] A couple of functions from fs/pipe.c are used both internally and for the watch queue code, but the declaration is only visible when the latter is enabled: fs/pipe.c:1254:5: error: no previous prototype for 'pipe_resize_ring' fs/pipe.c:758:15: error: no previous prototype for 'account_pipe_buffers' fs/pipe.c:764:6: error: no previous prototype for 'too_many_pipe_buffers_soft' fs/pipe.c:771:6: error: no previous prototype for 'too_many_pipe_buffers_hard' fs/pipe.c:777:6: error: no previous prototype for 'pipe_is_unprivileged_user' Make the visible unconditionally to avoid these warnings. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Message-Id: <20230516195629.551602-1-arnd@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19start_kernel: Add __no_stack_protector function attributendesaulniers@google.com
commit 514ca14ed5444b911de59ed3381dfd195d99fe4b upstream. Back during the discussion of commit a9a3ed1eff36 ("x86: Fix early boot crash on gcc-10, third try") we discussed the need for a function attribute to control the omission of stack protectors on a per-function basis; at the time Clang had support for no_stack_protector but GCC did not. This was fixed in gcc-11. Now that the function attribute is available, let's start using it. Callers of boot_init_stack_canary need to use this function attribute unless they're compiled with -fno-stack-protector, otherwise the canary stored in the stack slot of the caller will differ upon the call to boot_init_stack_canary. This will lead to a call to __stack_chk_fail() then panic. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722 Link: https://lore.kernel.org/all/20200316130414.GC12561@hirez.programming.kicks-ass.net/ Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230412-no_stackp-v2-1-116f9fe4bbe7@google.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: ndesaulniers@google.com <ndesaulniers@google.com>
2023-07-11bootmem: remove the vmemmap pages from kmemleak in free_bootmem_pageLiu Shixin
commit 028725e73375a1ff080bbdf9fb503306d0116f28 upstream. commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem") fix an overlaps existing problem of kmemleak. But the problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in this case, free_bootmem_page() will call free_reserved_page() directly. Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when HAVE_BOOTMEM_INFO_NODE is disabled. Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05execve: always mark stack as growing down during early stack setupLinus Torvalds
commit f66066bc5136f25e36a2daff4896c768f18c211e upstream. While our user stacks can grow either down (all common architectures) or up (parisc and the ia64 register stack), the initial stack setup when we copy the argument and environment strings to the new stack at execve() time is always done by extending the stack downwards. But it turns out that in commit 8d7071af8907 ("mm: always expand the stack with the mmap write lock held"), as part of making the stack growing code more robust, 'expand_downwards()' was now made to actually check the vma flags: if (!(vma->vm_flags & VM_GROWSDOWN)) return -EFAULT; and that meant that this execve-time stack expansion started failing on parisc, because on that architecture, the stack flags do not contain the VM_GROWSDOWN bit. At the same time the new check in expand_downwards() is clearly correct, and simplified the callers, so let's not remove it. The solution is instead to just codify the fact that yes, during execve(), the stack grows down. This not only matches reality, it ends up being particularly simple: we already have special execve-time flags for the stack (VM_STACK_INCOMPLETE_SETUP) and use those flags to avoid page migration during this setup time (see vma_is_temporary_stack() and invalid_migration_vma()). So just add VM_GROWSDOWN to that set of temporary flags, and now our stack flags automatically match reality, and the parisc stack expansion works again. Note that the VM_STACK_INCOMPLETE_SETUP bits will be cleared when the stack is finalized, so we only add the extra VM_GROWSDOWN bit on CONFIG_STACK_GROWSUP architectures (ie parisc) rather than adding it in general. Link: https://lore.kernel.org/all/612eaa53-6904-6e16-67fc-394f4faa0e16@bell.net/ Link: https://lore.kernel.org/all/5fd98a09-4792-1433-752d-029ae3545168@gmx.de/ Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") Reported-by: John David Anglin <dave.anglin@bell.net> Reported-and-tested-by: Helge Deller <deller@gmx.de> Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01xtensa: fix NOMMU build with lock_mm_and_find_vma() conversionLinus Torvalds
commit d85a143b69abb4d7544227e26d12c4c7735ab27d upstream. It turns out that xtensa has a really odd configuration situation: you can do a no-MMU config, but still have the page fault code enabled. Which doesn't sound all that sensible, but it turns out that xtensa can have protection faults even without the MMU, and we have this: config PFAULT bool "Handle protection faults" if EXPERT && !MMU default y help Handle protection faults. MMU configurations must enable it. noMMU configurations may disable it if used memory map never generates protection faults or faults are always fatal. If unsure, say Y. which completely violated my expectations of the page fault handling. End result: Guenter reports that the xtensa no-MMU builds all fail with arch/xtensa/mm/fault.c: In function ‘do_page_fault’: arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’ because I never exposed the new lock_mm_and_find_vma() function for the no-MMU case. Doing so is simple enough, and fixes the problem. Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm: always expand the stack with the mmap write lock heldLinus Torvalds
commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream. This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64 Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm: make find_extend_vma() fail if write lock not heldLiam R. Howlett
commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream. Make calls to extend_vma() and find_extend_vma() fail if the write lock is required. To avoid making this a flag-day event, this still allows the old read-locking case for the trivial situations, and passes in a flag to say "is it write-locked". That way write-lockers can say "yes, I'm being careful", and legacy users will continue to work in all the common cases until they have been fully converted to the new world order. Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm: introduce new 'lock_mm_and_find_vma()' page fault helperLinus Torvalds
commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream. .. and make x86 use it. This basically extracts the existing x86 "find and expand faulting vma" code, but extends it to also take the mmap lock for writing in case we actually do need to expand the vma. We've historically short-circuited that case, and have some rather ugly special logic to serialize the stack segment expansion (since we only hold the mmap lock for reading) that doesn't match the normal VM locking. That slight violation of locking worked well, right up until it didn't: the maple tree code really does want proper locking even for simple extension of an existing vma. So extract the code for "look up the vma of the fault" from x86, fix it up to do the necessary write locking, and make it available as a helper function for other architectures that can use the common helper. Note: I say "common helper", but it really only handles the normal stack-grows-down case. Which is all architectures except for PA-RISC and IA64. So some rare architectures can't use the helper, but if they care they'll just need to open-code this logic. It's also worth pointing out that this code really would like to have an optimistic "mmap_upgrade_trylock()" to make it quicker to go from a read-lock (for the common case) to taking the write lock (for having to extend the vma) in the normal single-threaded situation where there is no other locking activity. But that _is_ all the very uncommon special case, so while it would be nice to have such an operation, it probably doesn't matter in reality. I did put in the skeleton code for such a possible future expansion, even if it only acts as pseudo-documentation for what we're doing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-25Merge tag 'perf_urgent_for_v6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Drop the __weak attribute from a function prototype as it otherwise leads to the function getting replaced by a dummy stub - Fix the umask value setup of the frontend event as former is different on two Intel cores * tag 'perf_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype
2023-06-23Merge tag 'gpio-fixes-for-v6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix IRQ initialization in gpiochip_irqchip_add_domain() - add a missing return value check for platform_get_irq() in gpio-sifive - don't free irq_domains which GPIOLIB does not manage * tag 'gpio-fixes-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain() gpio: sifive: add missing check for platform_get_irq gpiolib: Fix GPIO chip IRQ initialization restriction
2023-06-23workqueue: clean up WORK_* constant types, clarify maskingLinus Torvalds
Dave Airlie reports that gcc-13.1.1 has started complaining about some of the workqueue code in 32-bit arm builds: kernel/workqueue.c: In function ‘get_work_pwq’: kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK); | ^ [ ... a couple of other cases ... ] and while it's not immediately clear exactly why gcc started complaining about it now, I suspect it's some C23-induced enum type handlign fixup in gcc-13 is the cause. Whatever the reason for starting to complain, the code and data types are indeed disgusting enough that the complaint is warranted. The wq code ends up creating various "helper constants" (like that WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of confused. The mask needs to be 'unsigned long', not some unspecified enum type. To make matters worse, the actual "mask and cast to a pointer" is repeated a couple of times, and the cast isn't even always done to the right pointer, but - as the error case above - to a 'void *' with then the compiler finishing the job. That's now how we roll in the kernel. So create the masks using the proper types rather than some ambiguous enumeration, and use a nice helper that actually does the type conversion in one well-defined place. Incidentally, this magically makes clang generate better code. That, admittedly, is really just a sign of clang having been seriously confused before, and cleaning up the typing unconfuses the compiler too. Reported-by: Dave Airlie <airlied@gmail.com> Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tejun Heo <tj@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-21Merge tag 'regulator-fix-v6.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "One simple fix for v6.4, some incorrectly specified bitfield masks in the PCA9450 driver" * tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
2023-06-19Merge tag 'hyperv-fixes-signed-20230619' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix races in Hyper-V PCI controller (Dexuan Cui) - Fix handling of hyperv_pcpu_input_arg (Michael Kelley) - Fix vmbus_wait_for_unload to scan present CPUs (Michael Kelley) - Call hv_synic_free in the failure path of hv_synic_alloc (Dexuan Cui) - Add noop for real mode handlers for virtual trust level code (Saurabh Sengar) * tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: PCI: hv: Add a per-bus mutex state_lock Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally" PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic PCI: hv: Fix a race condition bug in hv_pci_query_relations() arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails x86/hyperv/vtl: Add noop for realmode pointers