summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2023-01-12efi: random: combine bootloader provided RNG seed with RNG protocol outputArd Biesheuvel
commit 196dff2712ca5a2e651977bb2fe6b05474111a83 upstream. Instead of blindly creating the EFI random seed configuration table if the RNG protocol is implemented and works, check whether such a EFI configuration table was provided by an earlier boot stage and if so, concatenate the existing and the new seeds, leaving it up to the core code to mix it in and credit it the way it sees fit. This can be used for, e.g., systemd-boot, to pass an additional seed to Linux in a way that can be consumed by the kernel very early. In that case, the following definitions should be used to pass the seed to the EFI stub: struct linux_efi_random_seed { u32 size; // of the 'seed' array in bytes u8 seed[]; }; The memory for the struct must be allocated as EFI_ACPI_RECLAIM_MEMORY pool memory, and the address of the struct in memory should be installed as a EFI configuration table using the following GUID: LINUX_EFI_RANDOM_SEED_TABLE_GUID 1ce1e5bc-7ceb-42f2-81e5-8aadf180f57b Note that doing so is safe even on kernels that were built without this patch applied, but the seed will simply be overwritten with a seed derived from the EFI RNG protocol, if available. The recommended seed size is 32 bytes, and seeds larger than 512 bytes are considered corrupted and ignored entirely. In order to preserve forward secrecy, seeds from previous bootloaders are memzero'd out, and in order to preserve memory, those older seeds are also freed from memory. Freeing from memory without first memzeroing is not safe to do, as it's possible that nothing else will ever overwrite those pages used by EFI. Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> [ardb: incorporate Jason's followup changes to extend the maximum seed size on the consumer end, memzero() it and drop a needless printk] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-12net: dsa: tag_qca: fix wrong MGMT_DATA2 sizeChristian Marangi
commit d9dba91be71f03cc75bcf39fc0d5d99ff33f1ae0 upstream. It was discovered that MGMT_DATA2 can contain up to 28 bytes of data instead of the 12 bytes written in the Documentation by accounting the limit of 16 bytes declared in Documentation subtracting the first 4 byte in the packet header. Update the define with the real world value. Tested-by: Ronald Wahl <ronald.wahl@raritan.com> Fixes: c2ee8181fddb ("net: dsa: tag_qca: add define for handling mgmt Ethernet packet") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Cc: stable@vger.kernel.org # v5.18+ Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-12netfilter: ipset: Rework long task execution when adding/deleting entriesJozsef Kadlecsik
[ Upstream commit 5e29dc36bd5e2166b834ceb19990d9e68a734d7d ] When adding/deleting large number of elements in one step in ipset, it can take a reasonable amount of time and can result in soft lockup errors. The patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") tried to fix it by limiting the max elements to process at all. However it was not enough, it is still possible that we get hung tasks. Lowering the limit is not reasonable, so the approach in this patch is as follows: rely on the method used at resizing sets and save the state when we reach a smaller internal batch limit, unlock/lock and proceed from the saved state. Thus we can avoid long continuous tasks and at the same time removed the limit to add/delete large number of elements in one step. The nfnl mutex is held during the whole operation which prevents one to issue other ipset commands in parallel. Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12filelock: new helper: vfs_inode_has_locksJeff Layton
[ Upstream commit ab1ddef98a715eddb65309ffa83267e4e84a571e ] Ceph has a need to know whether a particular inode has any locks set on it. It's currently tracking that by a num_locks field in its filp->private_data, but that's problematic as it tries to decrement this field when releasing locks and that can race with the file being torn down. Add a new vfs_inode_has_locks helper that just returns whether any locks are currently held on the inode. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Stable-dep-of: 461ab10ef7e6 ("ceph: switch to vfs_inode_has_locks() to fix file lock bug") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12net/mlx5: E-Switch, properly handle ingress tagged packets on VSTMoshe Shemesh
[ Upstream commit 1f0ae22ab470946143485a02cc1cd7e05c0f9120 ] Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already present in the frame. Previous VST mode behavior was to drop packets or override existing tag, depending on the device version. In this patch we fix this behavior by correctly building the HW steering rule with a push vlan action, or for older devices we ask the FW to stack the vlan when a vlan is already present. Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") Fixes: dfcb1ed3c331 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12SUNRPC: ensure the matching upcall is in-flight upon downcallminoura makoto
[ Upstream commit b18cba09e374637a0a3759d856a6bca94c133952 ] Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid but different gss service") introduced `auth` argument to __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL since it (and auth->service) was not (yet) determined. When multiple upcalls with the same uid and different service are ongoing, it could happen that __gss_find_upcall(), which returns the first match found in the pipe->in_downcall list, could not find the correct gss_msg corresponding to the downcall we are looking for. Moreover, it might return a msg which is not sent to rpc.gssd yet. We could see mount.nfs process hung in D state with multiple mount.nfs are executed in parallel. The call trace below is of CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/ elrepo kernel-ml-6.0.7-1.el7. PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs" #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss] #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc] #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss] #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc] #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc] #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc] #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc] #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc] The scenario is like this. Let's say there are two upcalls for services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe. When rpc.gssd reads pipe to get the upcall msg corresponding to service B from pipe->pipe and then writes the response, in gss_pipe_downcall the msg corresponding to service A will be picked because only uid is used to find the msg and it is before the one for B in pipe->in_downcall. And the process waiting for the msg corresponding to service A will be woken up. Actual scheduing of that process might be after rpc.gssd processes the next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A). The process is scheduled to see gss_msg->ctx == NULL and gss_msg->msg.errno == 0, therefore it cannot break the loop in gss_create_upcall and is never woken up after that. This patch adds a simple check to ensure that a msg which is not sent to rpc.gssd yet is not chosen as the matching upcall upon receiving a downcall. Signed-off-by: minoura makoto <minoura@valinux.co.jp> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@nec.com> Tested-by: Hiroshi Shimamoto <h-shimamoto@nec.com> Cc: Trond Myklebust <trondmy@hammerspace.com> Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-07ext4: fix deadlock due to mbcache entry corruptionJan Kara
commit a44e84a9b7764c72896f7241a0ec9ac7e7ef38dd upstream. When manipulating xattr blocks, we can deadlock infinitely looping inside ext4_xattr_block_set() where we constantly keep finding xattr block for reuse in mbcache but we are unable to reuse it because its reference count is too big. This happens because cache entry for the xattr block is marked as reusable (e_reusable set) although its reference count is too big. When this inconsistency happens, this inconsistent state is kept indefinitely and so ext4_xattr_block_set() keeps retrying indefinitely. The inconsistent state is caused by non-atomic update of e_reusable bit. e_reusable is part of a bitfield and e_reusable update can race with update of e_referenced bit in the same bitfield resulting in loss of one of the updates. Fix the problem by using atomic bitops instead. This bug has been around for many years, but it became *much* easier to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr blocks"). Cc: stable@vger.kernel.org Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries") Fixes: 65f8b80053a1 ("ext4: fix race when reusing xattr blocks") Reported-and-tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: Thilo Fromm <t-lo@linux.microsoft.com> Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microsoft.com/ Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-07ext4: journal_path mount options should follow linksLukas Czerner
commit e3ea75ee651daf5e434afbfdb7dbf75e200ea1f6 upstream. Before the commit 461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter") ext4 mount option journal_path did follow links in the provided path. Bring this behavior back by allowing to pass pathwalk flags to fs_lookup_param(). Fixes: 461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter") Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20221004135803.32283-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-07bpf: Resolve fext program type when checking map compatibilityToke Høiland-Jørgensen
[ Upstream commit 1c123c567fb138ebd187480b7fc0610fcb0851f5 ] The bpf_prog_map_compatible() check makes sure that BPF program types are not mixed inside BPF map types that can contain programs (tail call maps, cpumaps and devmaps). It does this by setting the fields of the map->owner struct to the values of the first program being checked against, and rejecting any subsequent programs if the values don't match. One of the values being set in the map owner struct is the program type, and since the code did not resolve the prog type for fext programs, the map owner type would be set to PROG_TYPE_EXT and subsequent loading of programs of the target type into the map would fail. This bug is seen in particular for XDP programs that are loaded as PROG_TYPE_EXT using libxdp; these cannot insert programs into devmaps and cpumaps because the check fails as described above. Fix the bug by resolving the fext program type to its target program type as elsewhere in the verifier. v3: - Add Yonghong's ACK Fixes: f45d5b6ce2e8 ("bpf: generalise tail call map compatibility check") Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221214230254.790066-1-toke@redhat.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-07PM/devfreq: governor: Add a private governor_data for governorKant Fan
commit 5fdded8448924e3631d466eea499b11606c43640 upstream. The member void *data in the structure devfreq can be overwrite by governor_userspace. For example: 1. The device driver assigned the devfreq governor to simple_ondemand by the function devfreq_add_device() and init the devfreq member void *data to a pointer of a static structure devfreq_simple_ondemand_data by the function devfreq_add_device(). 2. The user changed the devfreq governor to userspace by the command "echo userspace > /sys/class/devfreq/.../governor". 3. The governor userspace alloced a dynamic memory for the struct userspace_data and assigend the member void *data of devfreq to this memory by the function userspace_init(). 4. The user changed the devfreq governor back to simple_ondemand by the command "echo simple_ondemand > /sys/class/devfreq/.../governor". 5. The governor userspace exited and assigned the member void *data in the structure devfreq to NULL by the function userspace_exit(). 6. The governor simple_ondemand fetched the static information of devfreq_simple_ondemand_data in the function devfreq_simple_ondemand_func() but the member void *data of devfreq was assigned to NULL by the function userspace_exit(). 7. The information of upthreshold and downdifferential is lost and the governor simple_ondemand can't work correctly. The member void *data in the structure devfreq is designed for a static pointer used in a governor and inited by the function devfreq_add_device(). This patch add an element named governor_data in the devfreq structure which can be used by a governor(E.g userspace) who want to assign a private data to do some private things. Fixes: ce26c5bb9569 ("PM / devfreq: Add basic governors") Cc: stable@vger.kernel.org # 5.10+ Reviewed-by: Chanwoo Choi <cwchoi00@gmail.com> Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kant Fan <kant@allwinnertech.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-04eventfd: provide a eventfd_signal_mask() helperJens Axboe
commit 03e02acda8e267a8183e1e0ed289ff1ef9cd7ed8 upstream. This is identical to eventfd_signal(), but it allows the caller to pass in a mask to be used for the poll wakeup key. The use case is avoiding repeated multishot triggers if we have a dependency between eventfd and io_uring. If we setup an eventfd context and register that as the io_uring eventfd, and at the same time queue a multishot poll request for the eventfd context, then any CQE posted will repeatedly trigger the multishot request until it terminates when the CQ ring overflows. In preparation for io_uring detecting this circular dependency, add the mentioned helper so that io_uring can pass in EPOLL_URING as part of the poll wakeup key. Cc: stable@vger.kernel.org # 6.0 [axboe: fold in !CONFIG_EVENTFD fix from Zhang Qilong] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-04nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definitionChristoph Hellwig
[ Upstream commit 685e6311637e46f3212439ce2789f8a300e5050f ] 3 << 16 does not generate the correct mask for bits 16, 17 and 18. Use the GENMASK macro to generate the correct mask instead. Fixes: 84fef62d135b ("nvme: check admin passthru command effects") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31regulator: core: Use different devices for resource allocation and DT lookupChiYuan Huang
[ Upstream commit 8f3cbcd6b440032ebc7f7d48a1689dcc70a4eb98 ] Following by the below discussion, there's the potential UAF issue between regulator and mfd. https://lore.kernel.org/all/20221128143601.1698148-1-yangyingliang@huawei.com/ From the analysis of Yingliang CPU A |CPU B mt6370_probe() | devm_mfd_add_devices() | |mt6370_regulator_probe() | regulator_register() | //allocate init_data and add it to devres | regulator_of_get_init_data() i2c_unregister_device() | device_del() | devres_release_all() | // init_data is freed | release_nodes() | | // using init_data causes UAF | regulator_register() It's common to use mfd core to create child device for the regulator. In order to do the DT lookup for init data, the child that registered the regulator would pass its parent as the parameter. And this causes init data resource allocated to its parent, not itself. The issue happen when parent device is going to release and regulator core is still doing some operation of init data constraint for the regulator of child device. To fix it, this patch expand 'regulator_register' API to use the different devices for init data allocation and DT lookup. Reported-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: ChiYuan Huang <cy_huang@richtek.com> Link: https://lore.kernel.org/r/1670311341-32664-1-git-send-email-u0084500@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31bpf: Fix a BTF_ID_LIST bug with CONFIG_DEBUG_INFO_BTF not setYonghong Song
[ Upstream commit beb3d47d1d3d7185bb401af628ad32ee204a9526 ] With CONFIG_DEBUG_INFO_BTF not set, we hit the following compilation error, /.../kernel/bpf/verifier.c:8196:23: error: array index 6 is past the end of the array (that has type 'u32[5]' (aka 'unsigned int[5]')) [-Werror,-Warray-bounds] if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) ^ ~~~~~~~~~~~~~~~~~~~~~~~ /.../kernel/bpf/verifier.c:8174:1: note: array 'special_kfunc_list' declared here BTF_ID_LIST(special_kfunc_list) ^ /.../include/linux/btf_ids.h:207:27: note: expanded from macro 'BTF_ID_LIST' #define BTF_ID_LIST(name) static u32 __maybe_unused name[5]; ^ /.../kernel/bpf/verifier.c:8443:19: error: array index 5 is past the end of the array (that has type 'u32[5]' (aka 'unsigned int[5]')) [-Werror,-Warray-bounds] btf_id == special_kfunc_list[KF_bpf_list_pop_back]; ^ ~~~~~~~~~~~~~~~~~~~~ /.../kernel/bpf/verifier.c:8174:1: note: array 'special_kfunc_list' declared here BTF_ID_LIST(special_kfunc_list) ^ /.../include/linux/btf_ids.h:207:27: note: expanded from macro 'BTF_ID_LIST' #define BTF_ID_LIST(name) static u32 __maybe_unused name[5]; ... Fix the problem by increase the size of BTF_ID_LIST to 16 to avoid compilation error and also prevent potentially unintended issue due to out-of-bound access. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221123155759.2669749-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31net: add atomic_long_t to net_device_stats fieldsEric Dumazet
[ Upstream commit 6c1c5097781f563b70a81683ea6fdac21637573b ] Long standing KCSAN issues are caused by data-race around some dev->stats changes. Most performance critical paths already use per-cpu variables, or per-queue ones. It is reasonable (and more correct) to use atomic operations for the slow paths. This patch adds an union for each field of net_device_stats, so that we can convert paths that are not yet protected by a spinlock or a mutex. netdev_stats_to_stats64() no longer has an #if BITS_PER_LONG==64 Note that the memcpy() we were using on 64bit arches had no provision to avoid load-tearing, while atomic_long_read() is providing the needed protection at no cost. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31video: hyperv_fb: Avoid taking busy spinlock on panic pathGuilherme G. Piccoli
[ Upstream commit 1d044ca035dc22df0d3b39e56f2881071d9118bd ] The Hyper-V framebuffer code registers a panic notifier in order to try updating its fbdev if the kernel crashed. The notifier callback is straightforward, but it calls the vmbus_sendpacket() routine eventually, and such function takes a spinlock for the ring buffer operations. Panic path runs in atomic context, with local interrupts and preemption disabled, and all secondary CPUs shutdown. That said, taking a spinlock might cause a lockup if a secondary CPU was disabled with such lock taken. Fix it here by checking if the ring buffer spinlock is busy on Hyper-V framebuffer panic notifier; if so, bail-out avoiding the potential lockup scenario. Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Tianyu Lan <Tianyu.Lan@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Tested-by: Fabio A M Martins <fabiomirmar@gmail.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220819221731.480795-10-gpiccoli@igalia.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31iio: adis: add '__adis_enable_irq()' implementationRamona Bolboaca
[ Upstream commit 99c05e4283a19a02a256f14100ca4ec3b2da3f62 ] Add '__adis_enable_irq()' implementation which is the unlocked version of 'adis_enable_irq()'. Call '__adis_enable_irq()' instead of 'adis_enable_irq()' from '__adis_intial_startup()' to keep the expected unlocked functionality. This fix is needed to remove a deadlock for all devices which are using 'adis_initial_startup()'. The deadlock occurs because the same mutex is acquired twice, without releasing it. The mutex is acquired once inside 'adis_initial_startup()', before calling '__adis_initial_startup()', and once inside 'adis_enable_irq()', which is called by '__adis_initial_startup()'. The deadlock is removed by calling '__adis_enable_irq()', instead of 'adis_enable_irq()' from within '__adis_initial_startup()'. Fixes: b600bd7eb3335 ("iio: adis: do not disabe IRQs in 'adis_init()'") Signed-off-by: Ramona Bolboaca <ramona.bolboaca@analog.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20221122082757.449452-2-ramona.bolboaca@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31crypto: hisilicon/qm - add missing pci_dev_put() in q_num_set()Xiongfeng Wang
[ Upstream commit cc7710d0d4ebc6998f04035cde4f32c5ddbe9d7f ] pci_get_device() will increase the reference count for the returned pci_dev. We need to use pci_dev_put() to decrease the reference count before q_num_set() returns. Fixes: c8b4b477079d ("crypto: hisilicon - add HiSilicon HPRE accelerator") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31fortify: Do not cast to "unsigned char"Kees Cook
[ Upstream commit e9a40e1585d792751d3a122392695e5a53032809 ] Do not cast to "unsigned char", as this needlessly creates type problems when attempting builds without -Wno-pointer-sign[1]. The intent of the cast is to drop possible "const" types. [1] https://lore.kernel.org/lkml/CAHk-=wgz3Uba8w7kdXhsqR1qvfemYL+OFQdefJnkeqXG8qZ_pA@mail.gmail.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 3009f891bb9f ("fortify: Allow strlen() and strnlen() to pass compile-time known lengths") Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31fortify: Use SIZE_MAX instead of (size_t)-1Kees Cook
[ Upstream commit 311fb40aa0569abacc430b0d66ee41470803111f ] Clean up uses of "(size_t)-1" in favor of SIZE_MAX. Cc: linux-hardening@vger.kernel.org Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: e9a40e1585d7 ("fortify: Do not cast to "unsigned char"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31crypto: hisilicon/qm - get hardware features from hardware registersWeili Qian
[ Upstream commit 82f00b24f532557fb0e15a6a2747859e4b70c4bd ] Before hardware V3, hardwares do not provide the feature registers, driver resolves hardware differences based on the hardware version. As a result, the driver does not support the new hardware. Hardware V3 and later versions support to obtain hardware features, such as power-gating management and doorbell isolation, through the hardware registers. To be compatible with later hardware versions, the features of the current device is obtained by reading the hardware registers instead of the hardware version. Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Stable-dep-of: ee1537fe3dd8 ("crypto: hisilicon/qm - re-enable communicate interrupt before notifying PF") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytesPengcheng Yang
[ Upstream commit a351d6087bf7d3d8440d58d3bf244ec64b89394a ] When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS flag from the msg->flags. If apply_bytes is used and it is larger than the current data being processed, sk_psock_msg_verdict() will not be called when sendmsg() is called again. At this time, the msg->flags is 0, and we lost the BPF_F_INGRESS flag. So we need to save the BPF_F_INGRESS flag in sk_psock and use it when redirection. Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-3-git-send-email-yangpc@wangsu.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write()David Howells
[ Upstream commit c3d96f690a790074b508fe183a41e36a00cd7ddd ] Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write(). Also provide a fallback for proc_create_net_data_write(). Fixes: 564def71765c ("proc: Add a way to make network proc files writable") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31wifi: fix multi-link element subelement iterationJohannes Berg
[ Upstream commit 1177aaa7fe9373c762cd5bf5f5de8517bac989d5 ] The subelements obviously start after the common data, including the common multi-link element structure definition itself. This bug was possibly just hidden by the higher bits of the control being set to 0, so the iteration just found one bogus element and most of the code could continue anyway. Fixes: 0f48b8b88aa9 ("wifi: ieee80211: add definitions for multi-link element") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFDZhang Qilong
[ Upstream commit fd4e60bf0ef8eb9edcfa12dda39e8b6ee9060492 ] Commit ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()") forgot to change int to __u64 in the CONFIG_EVENTFD=n stub function. Link: https://lkml.kernel.org/r/20221124140154.104680-1-zhangqilong3@huawei.com Fixes: ee62c6b2dc93 ("eventfd: change int to __u64 in eventfd_signal()") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Cc: Dylan Yudaken <dylany@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sha Zhengju <handai.szj@taobao.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31debugfs: fix error when writing negative value to atomic_t debugfs fileAkinobu Mita
[ Upstream commit d472cf797c4e268613dbce5ec9b95d0bcae19ecb ] The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), so we have to use a 64-bit value to write a negative value for a debugfs file created by debugfs_create_atomic_t(). This restores the previous behaviour by introducing DEFINE_DEBUGFS_ATTRIBUTE_SIGNED for a signed value. Link: https://lkml.kernel.org/r/20220919172418.45257-4-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed valueAkinobu Mita
[ Upstream commit 2e41f274f9aa71cdcc69dc1f26a3f9304a651804 ] Patch series "fix error when writing negative value to simple attribute files". The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), but some attribute files want to accept a negative value. This patch (of 3): The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), so we have to use a 64-bit value to write a negative value. This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value. Link: https://lkml.kernel.org/r/20220919172418.45257-1-akinobu.mita@gmail.com Link: https://lkml.kernel.org/r/20220919172418.45257-2-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31timerqueue: Use rb_entry_safe() in timerqueue_getnext()Barnabás Pőcze
[ Upstream commit 2f117484329b233455ee278f2d9b0a4356835060 ] When `timerqueue_getnext()` is called on an empty timer queue, it will use `rb_entry()` on a NULL pointer, which is invalid. Fix that by using `rb_entry_safe()` which handles NULL pointers. This has not caused any issues so far because the offset of the `rb_node` member in `timerqueue_node` is 0, so `rb_entry()` is essentially a no-op. Fixes: 511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next timer") Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221114195421.342929-1-pobrn@protonmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-21kallsyms: Make module_kallsyms_on_each_symbol generally availableJiri Olsa
commit 73feb8d5fa3b755bb51077c0aabfb6aa556fd498 upstream. Making module_kallsyms_on_each_symbol generally available, so it can be used outside CONFIG_LIVEPATCH option in following changes. Rather than adding another ifdef option let's make the function generally available (when CONFIG_KALLSYMS and CONFIG_MODULES options are defined). Cc: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221025134148.3300700-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-19can: sja1000: fix size of OCR_MODE_MASK defineHeiko Schocher
[ Upstream commit 26e8f6a75248247982458e8237b98c9fb2ffcf9d ] bitfield mode in ocr register has only 2 bits not 3, so correct the OCR_MODE_MASK define. Signed-off-by: Heiko Schocher <hs@denx.de> Link: https://lore.kernel.org/all/20221123071636.2407823-1-hs@denx.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-14memcg: fix possible use-after-free in memcg_write_event_control()Tejun Heo
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream. memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> [3.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-14platform/x86: asus-wmi: Add support for ROG X13 tablet modeLuke D. Jones
[ Upstream commit e397c3c460bf3849384f2f55516d1887617cfca9 ] Add quirk for ASUS ROG X13 Flow 2-in-1 to enable tablet mode with lid flip (all screen rotations). Signed-off-by: Luke D. Jones <luke@ljones.dev> Link: https://lore.kernel.org/r/20220813092753.6635-2-luke@ljones.dev Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-14hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processingMike Kravetz
commit 04ada095dcfc4ae359418053c0be94453bdf1e84 upstream. madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page tables associated with the address range. For hugetlb vmas, zap_page_range will call __unmap_hugepage_range_final. However, __unmap_hugepage_range_final assumes the passed vma is about to be removed and deletes the vma_lock to prevent pmd sharing as the vma is on the way out. In the case of madvise(MADV_DONTNEED) the vma remains, but the missing vma_lock prevents pmd sharing and could potentially lead to issues with truncation/fault races. This issue was originally reported here [1] as a BUG triggered in page_try_dup_anon_rmap. Prior to the introduction of the hugetlb vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to prevent pmd sharing. Subsequent faults on this vma were confused as VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping was not set in new pages added to the page table. This resulted in pages that appeared anonymous in a VM_SHARED vma and triggered the BUG. Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap call from unmap_vmas(). This is used to indicate the 'final' unmapping of a hugetlb vma. When called via MADV_DONTNEED, this flag is not set and the vm_lock is not deleted. NOTE - Prior to the introduction of the huegtlb vma_lock in v6.1, this issue is addressed by not clearing the VM_MAYSHARE flag when __unmap_hugepage_range_final is called in the MADV_DONTNEED case. [1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Wei Chen <harperchen1110@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-14madvise: use zap_page_range_single for madvise dontneedMike Kravetz
[ Upstream commit 21b85b09527c28e242db55c1b751f7f7549b830c ] This series addresses the issue first reported in [1], and fully described in patch 2. Patches 1 and 2 address the user visible issue and are tagged for stable backports. While exploring solutions to this issue, related problems with mmu notification calls were discovered. This is addressed in the patch "hugetlb: remove duplicate mmu notifications:". Since there are no user visible effects, this third is not tagged for stable backports. Previous discussions suggested further cleanup by removing the routine zap_page_range. This is possible because zap_page_range_single is now exported, and all callers of zap_page_range pass ranges entirely within a single vma. This work will be done in a later patch so as not to distract from this bug fix. [1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/ This patch (of 2): Expose the routine zap_page_range_single to zap a range within a single vma. The madvise routine madvise_dontneed_single_vma can use this routine as it explicitly operates on a single vma. Also, update the mmu notification range in zap_page_range_single to take hugetlb pmd sharing into account. This is required as MADV_DONTNEED supports hugetlb vmas. Link: https://lkml.kernel.org/r/20221114235507.294320-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20221114235507.294320-2-mike.kravetz@oracle.com Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Wei Chen <harperchen1110@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08mm/damon: introduce struct damos_access_patternYajun Deng
[ Upstream commit f5a79d7c0c87c8d88bb5e3f3c898258fdf1b3b05 ] damon_new_scheme() has too many parameters, so introduce struct damos_access_pattern to simplify it. In additon, we can't use a bpf trace kprobe that has more than 5 parameters. Link: https://lkml.kernel.org/r/20220908191443.129534-1-sj@kernel.org Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 95bc35f9bee5 ("mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08mmc: core: Fix ambiguous TRIM and DISCARD argChristian Löhle
commit 489d144563f23911262a652234b80c70c89c978b upstream. Clean up the MMC_TRIM_ARGS define that became ambiguous with DISCARD introduction. While at it, let's fix one usage where MMC_TRIM_ARGS falsely included DISCARD too. Fixes: b3bf915308ca ("mmc: core: new discard feature support at eMMC v4.5") Signed-off-by: Christian Loehle <cloehle@hyperstone.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/11376b5714964345908f3990f17e0701@hyperstone.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08vfs: fix copy_file_range() averts filesystem freeze protectionAmir Goldstein
[ Upstream commit 10bc8e4af65946b727728d7479c028742321b60a ] Commit 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") removed fallback to generic_copy_file_range() for cross-fs cases inside vfs_copy_file_range(). To preserve behavior of nfsd and ksmbd server-side-copy, the fallback to generic_copy_file_range() was added in nfsd and ksmbd code, but that call is missing sb_start_write(), fsnotify hooks and more. Ideally, nfsd and ksmbd would pass a flag to vfs_copy_file_range() that will take care of the fallback, but that code would be subtle and we got vfs_copy_file_range() logic wrong too many times already. Instead, add a flag to explicitly request vfs_copy_file_range() to perform only generic_copy_file_range() and let nfsd and ksmbd use this flag only in the fallback path. This choise keeps the logic changes to minimum in the non-nfsd/ksmbd code paths to reduce the risk of further regressions. Fixes: 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") Tested-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02block: make blk_set_default_limits() privateKeith Busch
[ Upstream commit b3228254bb6e91e57f920227f72a1a7d81925d81 ] There are no external users of this function. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221110184501.2451620-4-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)Peter Zijlstra
[ Upstream commit c86df29d11dfba27c0a1f5039cd6fe387fbf4239 ] The dispatcher function is currently abusing the ftrace __fentry__ call location for its own purposes -- this obviously gives trouble when the dispatcher and ftrace are both in use. A previous solution tried using __attribute__((patchable_function_entry())) which works, except it is GCC-8+ only, breaking the build on the earlier still supported compilers. Instead use static_call() -- which has its own annotations and does not conflict with ftrace -- to rewrite the dispatch function. By using: return static_call()(ctx, insni, bpf_func) you get a perfect forwarding tail call as function body (iow a single jmp instruction). By having the default static_call() target be bpf_dispatcher_nop_func() it retains the default behaviour (an indirect call to the argument function). Only once a dispatcher program is attached is the target rewritten to directly call the JIT'ed image. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/Y1/oBlK0yFk5c/Im@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/bpf/20221103120647.796772565@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02mm: fix unexpected changes to {failslab|fail_page_alloc}.attrQi Zheng
commit ea4452de2ae987342fadbdd2c044034e6480daad upstream. When we specify __GFP_NOWARN, we only expect that no warnings will be issued for current caller. But in the __should_failslab() and __should_fail_alloc_page(), the local GFP flags alter the global {failslab|fail_page_alloc}.attr, which is persistent and shared by all tasks. This is not what we expected, let's fix it. [akpm@linux-foundation.org: unexport should_fail_ex()] Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com Fixes: 3f913fc5f974 ("mm: fix missing handler for __GFP_NOWARN") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-02fscache: fix OOB Read in __fscache_acquire_volumeDavid Howells
[ Upstream commit 9f0933ac026f7e54fe096797af9de20724e79097 ] The type of a->key[0] is char in fscache_volume_same(). If the length of cache volume key is greater than 127, the value of a->key[0] is less than 0. In this case, klen becomes much larger than 255 after type conversion, because the type of klen is size_t. As a result, memcmp() is read out of bounds. This causes a slab-out-of-bounds Read in __fscache_acquire_volume(), as reported by Syzbot. Fix this by changing the type of the stored key to "u8 *" rather than "char *" (it isn't a simple string anyway). Also put in a check that the volume name doesn't exceed NAME_MAX. BUG: KASAN: slab-out-of-bounds in memcmp+0x16f/0x1c0 lib/string.c:757 Read of size 8 at addr ffff888016f3aa90 by task syz-executor344/3613 Call Trace: memcmp+0x16f/0x1c0 lib/string.c:757 memcmp include/linux/fortify-string.h:420 [inline] fscache_volume_same fs/fscache/volume.c:133 [inline] fscache_hash_volume fs/fscache/volume.c:171 [inline] __fscache_acquire_volume+0x76c/0x1080 fs/fscache/volume.c:328 fscache_acquire_volume include/linux/fscache.h:204 [inline] v9fs_cache_session_get_cookie+0x143/0x240 fs/9p/cache.c:34 v9fs_session_init+0x1166/0x1810 fs/9p/v9fs.c:473 v9fs_mount+0xba/0xc90 fs/9p/vfs_super.c:126 legacy_get_tree+0x105/0x220 fs/fs_context.c:610 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 Fixes: 62ab63352350 ("fscache: Implement volume registration") Reported-by: syzbot+a76f6a6e524cf2080aa3@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Zhang Peng <zhangpeng362@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Jeff Layton <jlayton@kernel.org> cc: v9fs-developer@lists.sourceforge.net cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/Y3OH+Dmi0QIOK18n@codewreck.org/ # Zhang Peng's v1 fix Link: https://lore.kernel.org/r/20221115140447.2971680-1-zhangpeng362@huawei.com/ # Zhang Peng's v2 fix Link: https://lore.kernel.org/r/166869954095.3793579.8500020902371015443.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02net/mlx5: cmdif, Print info on any firmware cmd failure to tracepointMoshe Shemesh
[ Upstream commit 870c2481174b839e7159555127bc8b5a5d0699ba ] While moving to new CMD API (quiet API), some pre-existing flows may call the new API function that in case of error, returns the error instead of printing it as previously done. For such flows we bring back the print but to tracepoint this time for sys admins to have the ability to check for errors especially for commands using the new quiet API. Tracepoint output example: devlink-1333 [001] ..... 822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22) Fixes: f23519e542e5 ("net/mlx5: cmdif, Add new api for command execution") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26bpf: Prevent bpf program recursion for raw tracepoint probesJiri Olsa
commit 05b24ff9b2cfabfcfd951daaa915a036ab53c9e1 upstream. We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: <TASK> ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by: Stanislav Fomichev <sdf@google.com> Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26wifi: wext: use flex array destination for memcpy()Hawkins Jiawei
commit e3e6e1d16a4cf7b63159ec71774e822194071954 upstream. Syzkaller reports buffer overflow false positive as follows: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 8) of single field "&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4) WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623 wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623 Modules linked in: CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted 6.0.0-rc6-next-20220921-syzkaller #0 [...] Call Trace: <TASK> ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022 wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955 wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline] wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline] wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049 sock_ioctl+0x285/0x640 net/socket.c:1220 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> Wireless events will be sent on the appropriate channels in wireless_send_event(). Different wireless events may have different payload structure and size, so kernel uses **len** and **cmd** field in struct __compat_iw_event as wireless event common LCP part, uses **pointer** as a label to mark the position of remaining different part. Yet the problem is that, **pointer** is a compat_caddr_t type, which may be smaller than the relative structure at the same position. So during wireless_send_event() tries to parse the wireless events payload, it may trigger the memcpy() run-time destination buffer bounds checking when the relative structure's data is copied to the position marked by **pointer**. This patch solves it by introducing flexible-array field **ptr_bytes**, to mark the position of the wireless events remaining part next to LCP part. What's more, this patch also adds **ptr_len** variable in wireless_send_event() to improve its maintainability. Reported-and-tested-by: syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/ Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: Fix warning on variable 'struct trace_array'Aashish Sharma
[ Upstream commit bedf06833b1f63c2627bd5634602e05592129d7a ] Move the declaration of 'struct trace_array' out of #ifdef CONFIG_TRACING block, to fix the following warning when CONFIG_TRACING is not set: >> include/linux/trace.h:63:45: warning: 'struct trace_array' declared inside parameter list will not be visible outside of this definition or declaration Link: https://lkml.kernel.org/r/20221107160556.2139463-1-shraash@google.com Fixes: 1a77dd1c2bb5 ("scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Arun Easi <aeasi@marvell.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Aashish Sharma <shraash@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26io_uring: fix multishot accept request leaksPavel Begunkov
commit 91482864768a874c4290ef93b84a78f4f1dac51b upstream. Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing/ring-buffer: Have polling block on watermarkSteven Rostedt (Google)
commit 42fb0a1e84ff525ebe560e2baf9451ab69127e2b upstream. Currently the way polling works on the ring buffer is broken. It will return immediately if there's any data in the ring buffer whereas a read will block until the watermark (defined by the tracefs buffer_percent file) is hit. That is, a select() or poll() will return as if there's data available, but then the following read will block. This is broken for the way select()s and poll()s are supposed to work. Have the polling on the ring buffer also block the same way reads and splice does on the ring buffer. Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Primiano Tucci <primiano@google.com> Cc: stable@vger.kernel.org Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26block: make dma_alignment a stacking queue_limitKeith Busch
[ Upstream commit c964d62f5cab7b43dd0534f22a96eab386c6ec5d ] Device mappers had always been getting the default 511 dma mask, but the underlying device might have a larger alignment requirement. Since this value is used to determine alloweable direct-io alignment, this needs to be a stackable limit. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221110184501.2451620-2-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 86e4d3e8d183 ("dm-crypt: provide dma_alignment limit in io_hints") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cacheMike Kravetz
[ Upstream commit 7e1813d48dd30e6c6f235f6661d1bc108fcab528 ] remove_huge_page removes a hugetlb page from the page cache. Change to hugetlb_delete_from_page_cache as it is a more descriptive name. huge_add_to_page_cache is global in scope, but only deals with hugetlb pages. For consistency and clarity, rename to hugetlb_add_to_page_cache. Link: https://lkml.kernel.org/r/20220914221810.95771-4-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: James Houghton <jthoughton@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Prakash Sangappa <prakash.sangappa@oracle.com> Cc: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 8625147cafaa ("hugetlbfs: don't delete error page from pagecache") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-16can: dev: fix skb drop checkOliver Hartkopp
commit ae64438be1923e3c1102d90fd41db7afcfaf54cc upstream. In commit a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") the priv->ctrlmode element is read even on virtual CAN interfaces that do not create the struct can_priv at startup. This out-of-bounds read may lead to CAN frame drops for virtual CAN interfaces like vcan and vxcan. This patch mainly reverts the original commit and adds a new helper for CAN interface drivers that provide the required information in struct can_priv. Fixes: a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") Reported-by: Dariusz Stojaczyk <Dariusz.Stojaczyk@opensynergy.com> Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Cc: Max Staudt <max@enpas.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/all/20221102095431.36831-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org # 6.0.x [mkl: patch pch_can, too] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>