summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-11-21clockevents/drivers/i8253: Add support for PIT shutdown quirkMichael Kelley
commit 35b69a420bfb56b7b74cb635ea903db05e357bec upstream. Add support for platforms where pit_shutdown() doesn't work because of a quirk in the PIT emulation. On these platforms setting the counter register to zero causes the PIT to start running again, negating the shutdown. Provide a global variable that controls whether the counter register is zero'ed, which platform specific code can override. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org> Cc: "jgross@suse.com" <jgross@suse.com> Cc: "akataria@vmware.com" <akataria@vmware.com> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: vkuznets <vkuznets@redhat.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1541303219-11142-2-git-send-email-mikelley@microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21watchdog/core: Add missing prototypes for weak functionsMathieu Malaterre
commit 81bd415c91eb966118d773dddf254aebf3022411 upstream. The split out of the hard lockup detector exposed two new weak functions, but no prototypes for them, which triggers the build warning: kernel/watchdog.c:109:12: warning: no previous prototype for ‘watchdog_nmi_enable’ [-Wmissing-prototypes] kernel/watchdog.c:115:13: warning: no previous prototype for ‘watchdog_nmi_disable’ [-Wmissing-prototypes] Add the prototypes. Fixes: 73ce0511c436 ("kernel/watchdog.c: move hardlockup detector to separate file") Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Babu Moger <babu.moger@oracle.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180606194232.17653-1-malat@debian.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21mtd: nand: Fix nanddev_neraseblocks()Boris Brezillon
commit d098093ba06eb032057d1aca1c2e45889e099d00 upstream. nanddev_neraseblocks() currently returns the number pages per LUN instead of the total number of eraseblocks. Fixes: 9c3736a3de21 ("mtd: nand: Add core infrastructure to deal with NAND devices") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21libceph: bump CEPH_MSG_MAX_DATA_LENIlya Dryomov
commit 94e6992bb560be8bffb47f287194adf070b57695 upstream. If the read is large enough, we end up spinning in the messenger: libceph: osd0 192.168.122.1:6801 io error libceph: osd0 192.168.122.1:6801 io error libceph: osd0 192.168.122.1:6801 io error This is a receive side limit, so only reads were affected. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13media: hdmi.h: rename ADOBE_RGB to OPRGB and ADOBE_YCC to OPYCCHans Verkuil
commit 463659a08d7999d5461fa45b35b17686189a70ca upstream. These names have been renamed in the CTA-861 standard due to trademark issues. Replace them here as well so they are in sync with the standard. Signed-off-by: Hans Verkuil <hansverk@cisco.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13TC: Set DMA masks for devicesMaciej W. Rozycki
commit 3f2aa244ee1a0d17ed5b6c86564d2c1b24d1c96b upstream. Fix a TURBOchannel support regression with commit 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask") that caused coherent DMA allocations to produce a warning such as: defxx: v1.11 2014/07/01 Lawrence V. Stefani and others tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dfx_dev_register+0x670/0x678 Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #2 Stack : ffffffff8009ffc0 fffffffffffffec0 0000000000000000 ffffffff80647650 0000000000000000 0000000000000000 ffffffff806f5f80 ffffffffffffffff 0000000000000000 0000000000000000 0000000000000001 ffffffff8065d4e8 98000000031b6300 ffffffff80563478 ffffffff805685b0 ffffffffffffffff 0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8 0000000000000000 0000000000000009 ffffffff8053efd0 ffffffff806657d0 0000000000000000 ffffffff803177f8 0000000000000000 ffffffff806d0000 9800000003078000 980000000307b9e0 000000001e900000 ffffffff80067940 0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8 ffffffff805176c0 ffffffff8004dc78 0000000000000000 ffffffff80067940 ... Call Trace: [<ffffffff8004dc78>] show_stack+0xa0/0x130 [<ffffffff80067940>] __warn+0x128/0x170 ---[ end trace b1d1e094f67f3bb2 ]--- This is because the TURBOchannel bus driver fails to set the coherent DMA mask for devices enumerated. Set the regular and coherent DMA masks for TURBOchannel devices then, observing that the bus protocol supports a 34-bit (16GiB) DMA address space, by interpreting the value presented in the address cycle across the 32 `ad' lines as a 32-bit word rather than byte address[1]. The architectural size of the TURBOchannel DMA address space exceeds the maximum amount of RAM any actual TURBOchannel system in existence may have, hence both masks are the same. This removes the warning shown above. References: [1] "TURBOchannel Hardware Specification", EK-369AA-OD-007B, Digital Equipment Corporation, January 1993, Section "DMA", pp. 1-15 -- 1-17 Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20835/ Fixes: 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask") Cc: stable@vger.kernel.org # 4.16+ Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13fsnotify: Fix busy inodes during unmountJan Kara
commit 721fb6fbfd2132164c2e8777cc837f9b2c1794dc upstream. Detaching of mark connector from fsnotify_put_mark() can race with unmounting of the filesystem like: CPU1 CPU2 fsnotify_put_mark() spin_lock(&conn->lock); ... inode = fsnotify_detach_connector_from_object(conn) spin_unlock(&conn->lock); generic_shutdown_super() fsnotify_unmount_inodes() sees connector detached for inode -> nothing to do evict_inode() barfs on pending inode reference iput(inode); Resulting in "Busy inodes after unmount" message and possible kernel oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode references from detached connectors. Note that the accounting of outstanding inode references in the superblock can cause some cacheline contention on the counter. OTOH it happens only during deletion of the last notification mark from an inode (or during unlinking of watched inode) and that is not too bad. I have measured time to create & delete inotify watch 100000 times from 64 processes in parallel (each process having its own inotify group and its own file on a shared superblock) on a 64 CPU machine. Average and standard deviation of 15 runs look like: Avg Stddev Vanilla 9.817400 0.276165 Fixed 9.710467 0.228294 So there's no statistically significant difference. Fixes: 6b3f05d24d35 ("fsnotify: Detach mark from object list when last reference is dropped") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13signal: Guard against negative signal numbers in copy_siginfo_from_user32Eric W. Biederman
commit a36700589b85443e28170be59fa11c8a104130a5 upstream. While fixing an out of bounds array access in known_siginfo_layout reported by the kernel test robot it became apparent that the same bug exists in siginfo_layout and affects copy_siginfo_from_user32. The straight forward fix that makes guards against making this mistake in the future and should keep the code size small is to just take an unsigned signal number instead of a signed signal number, as I did to fix known_siginfo_layout. Cc: stable@vger.kernel.org Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13signal: Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstackWill Deacon
[ Upstream commit 22839869f21ab3850fbbac9b425ccc4c0023926f ] The sigaltstack(2) system call fails with -ENOMEM if the new alternative signal stack is found to be smaller than SIGMINSTKSZ. On architectures such as arm64, where the native value for SIGMINSTKSZ is larger than the compat value, this can result in an unexpected error being reported to a compat task. See, for example: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904385 This patch fixes the problem by extending do_sigaltstack to take the minimum signal stack size as an additional parameter, allowing the native and compat system call entry code to pass in their respective values. COMPAT_SIGMINSTKSZ is just defined as SIGMINSTKSZ if it has not been defined by the architecture. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Reported-by: Steve McIntyre <steve.mcintyre@arm.com> Tested-by: Steve McIntyre <93sam@debian.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-10bpf: fix partial copy of map_ptr when dst is scalarDaniel Borkmann
commit 0962590e553331db2cc0aef2dc35c57f6300dbbe upstream. ALU operations on pointers such as scalar_reg += map_value_ptr are handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr and range in the register state share a union, so transferring state through dst_reg->range = ptr_reg->range is just buggy as any new map_ptr in the dst_reg is then truncated (or null) for subsequent checks. Fix this by adding a raw member and use it for copying state over to dst_reg. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10vfs: swap names of {do,vfs}_clone_file_range()Amir Goldstein
commit a725356b6659469d182d662f22d770d83d3bc7b5 upstream. Commit 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze protection") created a wrapper do_clone_file_range() around vfs_clone_file_range() moving the freeze protection to former, so overlayfs could call the latter. The more common vfs practice is to call do_xxx helpers from vfs_xxx helpers, where freeze protecction is taken in the vfs_xxx helper, so this anomality could be a source of confusion. It seems that commit 8ede205541ff ("ovl: add reflink/copyfile/dedup support") may have fallen a victim to this confusion - ovl_clone_file_range() calls the vfs_clone_file_range() helper in the hope of getting freeze protection on upper fs, but in fact results in overlayfs allowing to bypass upper fs freeze protection. Swap the names of the two helpers to conform to common vfs practice and call the correct helpers from overlayfs and nfsd. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-04net/mlx5: WQ, fixes for fragmented WQ buffers APITariq Toukan
[ Upstream commit 37fdffb217a45609edccbb8b407d031143f551c0 ] mlx5e netdevice used to calculate fragment edges by a call to mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct indication for queues smaller than a PAGE_SIZE, (broken by default on PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new calls/API. Since (TX/RX) Work Queues buffers are fragmented, here we introduce changes to the API in core driver, so that it gets a stride index and returns the index of last stride on same fragment, and an additional wrapping function that returns the number of physically contiguous strides that can be written contiguously to the work queue. This obsoletes the following API functions, and their buggy usage in EN driver: * mlx5_wq_cyc_get_frag_size() * mlx5_wq_cyc_ctr2fragix() The new API improves modularity and hides the details of such calculation for mlx5e netdevice and mlx5_ib rdma drivers. New calculation is also more efficient, and improves performance as follows: Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size. Before: 16,477,619 pps After: 17,085,793 pps 3.7% improvement Fixes: 3a2f70331226 ("net/mlx5: Use order-0 allocations for all WQ types") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-04gpio: Assign gpio_irq_chip::parents to non-stack pointerStephen Boyd
[ Upstream commit 3e779a2e7f909015f21428b66834127496110b6d ] gpiochip_set_cascaded_irqchip() is passed 'parent_irq' as an argument and then the address of that argument is assigned to the gpio chips gpio_irq_chip 'parents' pointer shortly thereafter. This can't ever work, because we've just assigned some stack address to a pointer that we plan to dereference later in gpiochip_irq_map(). I ran into this issue with the KASAN report below when gpiochip_irq_map() tried to setup the parent irq with a total junk pointer for the 'parents' array. BUG: KASAN: stack-out-of-bounds in gpiochip_irq_map+0x228/0x248 Read of size 4 at addr ffffffc0dde472e0 by task swapper/0/1 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 4.14.72 #34 Call trace: [<ffffff9008093638>] dump_backtrace+0x0/0x718 [<ffffff9008093da4>] show_stack+0x20/0x2c [<ffffff90096b9224>] __dump_stack+0x20/0x28 [<ffffff90096b91c8>] dump_stack+0x80/0xbc [<ffffff900845a350>] print_address_description+0x70/0x238 [<ffffff900845a8e4>] kasan_report+0x1cc/0x260 [<ffffff900845aa14>] __asan_report_load4_noabort+0x2c/0x38 [<ffffff900897e098>] gpiochip_irq_map+0x228/0x248 [<ffffff900820cc08>] irq_domain_associate+0x114/0x2ec [<ffffff900820d13c>] irq_create_mapping+0x120/0x234 [<ffffff900820da78>] irq_create_fwspec_mapping+0x4c8/0x88c [<ffffff900820e2d8>] irq_create_of_mapping+0x180/0x210 [<ffffff900917114c>] of_irq_get+0x138/0x198 [<ffffff9008dc70ac>] spi_drv_probe+0x94/0x178 [<ffffff9008ca5168>] driver_probe_device+0x51c/0x824 [<ffffff9008ca6538>] __device_attach_driver+0x148/0x20c [<ffffff9008ca14cc>] bus_for_each_drv+0x120/0x188 [<ffffff9008ca570c>] __device_attach+0x19c/0x2dc [<ffffff9008ca586c>] device_initial_probe+0x20/0x2c [<ffffff9008ca18bc>] bus_probe_device+0x80/0x154 [<ffffff9008c9b9b4>] device_add+0x9b8/0xbdc [<ffffff9008dc7640>] spi_add_device+0x1b8/0x380 [<ffffff9008dcbaf0>] spi_register_controller+0x111c/0x1378 [<ffffff9008dd6b10>] spi_geni_probe+0x4dc/0x6f8 [<ffffff9008cab058>] platform_drv_probe+0xdc/0x130 [<ffffff9008ca5168>] driver_probe_device+0x51c/0x824 [<ffffff9008ca59cc>] __driver_attach+0x100/0x194 [<ffffff9008ca0ea8>] bus_for_each_dev+0x104/0x16c [<ffffff9008ca58c0>] driver_attach+0x48/0x54 [<ffffff9008ca1edc>] bus_add_driver+0x274/0x498 [<ffffff9008ca8448>] driver_register+0x1ac/0x230 [<ffffff9008caaf6c>] __platform_driver_register+0xcc/0xdc [<ffffff9009c4b33c>] spi_geni_driver_init+0x1c/0x24 [<ffffff9008084cb8>] do_one_initcall+0x240/0x3dc [<ffffff9009c017d0>] kernel_init_freeable+0x378/0x468 [<ffffff90096e8240>] kernel_init+0x14/0x110 [<ffffff9008086fcc>] ret_from_fork+0x10/0x18 The buggy address belongs to the page: page:ffffffbf037791c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff raw: ffffffbf037791e0 ffffffbf037791e0 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffc0dde47180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47200: f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 >ffffffc0dde47280: f2 f2 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 ^ ffffffc0dde47300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Let's leave around one unsigned int in the gpio_irq_chip struct for the single parent irq case and repoint the 'parents' array at it. This way code is left mostly intact to setup parents and we waste an extra few bytes per structure of which there should be only a handful in a system. Cc: Evan Green <evgreen@chromium.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Fixes: e0d897289813 ("gpio: Implement tighter IRQ chip integration") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-04compiler.h: Allow arch-specific asm/compiler.hPaul Burton
[ Upstream commit 04f264d3a8b0eb25d378127bd78c3c9a0261c828 ] We have a need to override the definition of barrier_before_unreachable() for MIPS, which means we either need to add architecture-specific code into linux/compiler-gcc.h or we need to allow the architecture to provide a header that can define the macro before the generic definition. The latter seems like the better approach. A straightforward approach to the per-arch header is to make use of asm-generic to provide a default empty header & adjust architectures which don't need anything specific to make use of that by adding the header to generic-y. Unfortunately this doesn't work so well due to commit 28128c61e08e ("kconfig.h: Include compiler types to avoid missed struct attributes") which caused linux/compiler_types.h to be included in the compilation of every C file via the -include linux/kconfig.h flag in c_flags. Because the -include flag is present for all C files we compile, we need the architecture-provided header to be present before any C files are compiled. If any C files can be compiled prior to the asm-generic header wrappers being generated then we hit a build failure due to missing header. Such cases do exist - one pointed out by the kbuild test robot is the compilation of arch/ia64/kernel/nr-irqs.c, which occurs as part of the archprepare target [1]. This leaves us with a few options: 1) Use generic-y & fix any build failures we find by enforcing ordering such that the asm-generic target occurs before any C compilation, such that linux/compiler_types.h can always include the generated asm-generic wrapper which in turn includes the empty asm-generic header. This would rely on us finding all the problematic cases - I don't know for sure that the ia64 issue is the only one. 2) Add an actual empty header to each architecture, so that we don't need the generated asm-generic wrapper. This seems messy. 3) Give up & add #ifdef CONFIG_MIPS or similar to linux/compiler_types.h. This seems messy too. 4) Include the arch header only when it's actually needed, removing the need for the asm-generic wrapper for all other architectures. This patch allows us to use approach 4, by including an asm/compiler.h header from linux/compiler_types.h after the inclusion of the compiler-specific linux/compiler-*.h header(s). We do this conditionally, only when CONFIG_HAVE_ARCH_COMPILER_H is selected, in order to avoid the need for asm-generic wrappers & the associated build ordering issue described above. The asm/compiler.h header is included after the generic linux/compiler-*.h header(s) for consistency with the way linux/compiler-intel.h & linux/compiler-clang.h are included after the linux/compiler-gcc.h header that they override. [1] https://lists.01.org/pipermail/kbuild-all/2018-August/051175.html Signed-off-by: Paul Burton <paul.burton@mips.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Patchwork: https://patchwork.linux-mips.org/patch/20269/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: James Hogan <jhogan@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-arch@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-04netfilter: avoid erronous array bounds warningFlorian Westphal
[ Upstream commit 421c119f558761556afca6a62ad183bc2d8659e0 ] Unfortunately some versions of gcc emit following warning: $ make net/xfrm/xfrm_output.o linux/compiler.h:252:20: warning: array subscript is above array bounds [-Warray-bounds] hook_head = rcu_dereference(net->nf.hooks_arp[hook]); ^~~~~~~~~~~~~~~~~~~~~ xfrm_output_resume passes skb_dst(skb)->ops->family as its 'pf' arg so compiler can't know that we'll never access hooks_arp[]. (NFPROTO_IPV4 or NFPROTO_IPV6 are only possible cases). Avoid this by adding an explicit WARN_ON_ONCE() check. This patch has no effect if the family is a compile-time constant as gcc will remove the switch() construct entirely. Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-20mremap: properly flush TLB before releasing the pageLinus Torvalds
commit eb66ae030829605d61fbef1909ce310e29f78821 upstream. Jann Horn points out that our TLB flushing was subtly wrong for the mremap() case. What makes mremap() special is that we don't follow the usual "add page to list of pages to be freed, then flush tlb, and then free pages". No, mremap() obviously just _moves_ the page from one page table location to another. That matters, because mremap() thus doesn't directly control the lifetime of the moved page with a freelist: instead, the lifetime of the page is controlled by the page table locking, that serializes access to the entry. As a result, we need to flush the TLB not just before releasing the lock for the source location (to avoid any concurrent accesses to the entry), but also before we release the destination page table lock (to avoid the TLB being flushed after somebody else has already done something to that page). This also makes the whole "need_flush" logic unnecessary, since we now always end up flushing the TLB for every valid entry. Reported-and-tested-by: Jann Horn <jannh@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18arm64: perf: Reject stand-alone CHAIN events for PMUv3Will Deacon
commit ca2b497253ad01c80061a1f3ee9eb91b5d54a849 upstream. It doesn't make sense for a perf event to be configured as a CHAIN event in isolation, so extend the arm_pmu structure with a ->filter_match() function to allow the backend PMU implementation to reject CHAIN events early. Cc: <stable@vger.kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18cgroup: Fix dom_cgrp propagation when enabling threaded modeTejun Heo
commit 479adb89a97b0a33e5a9d702119872cc82ca21aa upstream. A cgroup which is already a threaded domain may be converted into a threaded cgroup if the prerequisite conditions are met. When this happens, all threaded descendant should also have their ->dom_cgrp updated to the new threaded domain cgroup. Unfortunately, this propagation was missing leading to the following failure. # cd /sys/fs/cgroup/unified # cat cgroup.subtree_control # show that no controllers are enabled # mkdir -p mycgrp/a/b/c # echo threaded > mycgrp/a/b/cgroup.type At this point, the hierarchy looks as follows: mycgrp [d] a [dt] b [t] c [inv] Now let's make node "a" threaded (and thus "mycgrp" s made "domain threaded"): # echo threaded > mycgrp/a/cgroup.type By this point, we now have a hierarchy that looks as follows: mycgrp [dt] a [t] b [t] c [inv] But, when we try to convert the node "c" from "domain invalid" to "threaded", we get ENOTSUP on the write(): # echo threaded > mycgrp/a/b/c/cgroup.type sh: echo: write error: Operation not supported This patch fixes the problem by * Moving the opencoded ->dom_cgrp save and restoration in cgroup_enable_threaded() into cgroup_{save|restore}_control() so that mulitple cgroups can be handled. * Updating all threaded descendants' ->dom_cgrp to point to the new dom_cgrp when enabling threaded mode. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Reported-by: Amin Jamali <ajamali@pivotal.io> Reported-by: Joao De Almeida Pereira <jpereira@pivotal.io> Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com Fixes: 454000adaa2a ("cgroup: introduce cgroup->dom_cgrp and threaded css_set handling") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18net: stmmac: Rework coalesce timer and fix multi-queue racesJose Abreu
[ Upstream commit 8fce3331702316d4bcfeb0771c09ac75d2192bbc ] This follows David Miller advice and tries to fix coalesce timer in multi-queue scenarios. We are now using per-queue coalesce values and per-queue TX timer. Coalesce timer default values was changed to 1ms and the coalesce frames to 25. Tested in B2B setup between XGMAC2 and GMAC5. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Fixes: ce736788e8a ("net: stmmac: adding multiple buffers for TX") Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18net/packet: fix packet drop as of virtio gsoJianfeng Tan
[ Upstream commit 9d2f67e43b73e8af7438be219b66a5de0cfa8bd9 ] When we use raw socket as the vhost backend, a packet from virito with gso offloading information, cannot be sent out in later validaton at xmit path, as we did not set correct skb->protocol which is further used for looking up the gso function. To fix this, we set this field according to virito hdr information. Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18net: ipv4: update fnhe_pmtu when first hop's MTU changesSabrina Dubroca
[ Upstream commit af7d6cce53694a88d6a1bb60c9a239a6a5144459 ] Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13mm: migration: fix migration of huge PMD shared pagesMike Kravetz
commit 017b1660df89f5fb4bfe66c34e35f7d2031100c7 upstream. The page migration code employs try_to_unmap() to try and unmap the source page. This is accomplished by using rmap_walk to find all vmas where the page is mapped. This search stops when page mapcount is zero. For shared PMD huge pages, the page map count is always 1 no matter the number of mappings. Shared mappings are tracked via the reference count of the PMD page. Therefore, try_to_unmap stops prematurely and does not completely unmap all mappings of the source page. This problem can result is data corruption as writes to the original source page can happen after contents of the page are copied to the target page. Hence, data is lost. This problem was originally seen as DB corruption of shared global areas after a huge page was soft offlined due to ECC memory errors. DB developers noticed they could reproduce the issue by (hotplug) offlining memory used to back huge pages. A simple testcase can reproduce the problem by creating a shared PMD mapping (note that this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using migrate_pages() to migrate process pages between nodes while continually writing to the huge pages being migrated. To fix, have the try_to_unmap_one routine check for huge PMD sharing by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB. mmu notifiers are called before locking page tables, but we can not be sure of PMD sharing until page tables are locked. Therefore, check for the possibility of PMD sharing before locking so that notifiers can prepare for the worst possible case. Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com [mike.kravetz@oracle.com: make _range_in_vma() a static inline] Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10new primitive: discard_new_inode()Al Viro
commit c2b6d621c4ffe9936adf7a55c8b1c769672c306f upstream. We don't want open-by-handle picking half-set-up in-core struct inode from e.g. mkdir() having failed halfway through. In other words, we don't want such inodes returned by iget_locked() on their way to extinction. However, we can't just have them unhashed - otherwise open-by-handle immediately *after* that would've ended up creating a new in-core inode over the on-disk one that is in process of being freed right under us. Solution: new flag (I_CREATING) set by insert_inode_locked() and removed by unlock_new_inode() and a new primitive (discard_new_inode()) to be used by such halfway-through-setup failure exits instead of unlock_new_inode() / iput() combinations. That primitive unlocks new inode, but leaves I_CREATING in place. iget_locked() treats finding an I_CREATING inode as failure (-ESTALE, once we sort out the error propagation). insert_inode_locked() treats the same as instant -EBUSY. ilookup() treats those as icache miss. [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10Revert "blk-throttle: fix race between blkcg_bio_issue_check() and ↵Dennis Zhou (Facebook)
cgroup_rmdir()" [ Upstream commit 6b06546206868f723f2061d703a3c3c378dcbf4c ] This reverts commit 4c6994806f708559c2812b73501406e21ae5dcd0. Destroying blkgs is tricky because of the nature of the relationship. A blkg should go away when either a blkcg or a request_queue goes away. However, blkg's pin the blkcg to ensure they remain valid. To break this cycle, when a blkcg is offlined, blkgs put back their css ref. This eventually lets css_free() get called which frees the blkcg. The above commit (4c6994806f70) breaks this order of events by trying to destroy blkgs in css_free(). As the blkgs still hold references to the blkcg, css_free() is never called. The race between blkcg_bio_issue_check() and cgroup_rmdir() will be addressed in the following patch by delaying destruction of a blkg until all writeback associated with the blkcg has been finished. Fixes: 4c6994806f70 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03arm/arm64: smccc-1.1: Handle function result as parametersMarc Zyngier
[ Upstream commit 755a8bf5579d22eb5636685c516d8dede799e27b ] If someone has the silly idea to write something along those lines: extern u64 foo(void); void bar(struct arm_smccc_res *res) { arm_smccc_1_1_smc(0xbad, foo(), res); } they are in for a surprise, as this gets compiled as: 0000000000000588 <bar>: 588: a9be7bfd stp x29, x30, [sp, #-32]! 58c: 910003fd mov x29, sp 590: f9000bf3 str x19, [sp, #16] 594: aa0003f3 mov x19, x0 598: aa1e03e0 mov x0, x30 59c: 94000000 bl 0 <_mcount> 5a0: 94000000 bl 0 <foo> 5a4: aa0003e1 mov x1, x0 5a8: d4000003 smc #0x0 5ac: b4000073 cbz x19, 5b8 <bar+0x30> 5b0: a9000660 stp x0, x1, [x19] 5b4: a9010e62 stp x2, x3, [x19, #16] 5b8: f9400bf3 ldr x19, [sp, #16] 5bc: a8c27bfd ldp x29, x30, [sp], #32 5c0: d65f03c0 ret 5c4: d503201f nop The call to foo "overwrites" the x0 register for the return value, and we end up calling the wrong secure service. A solution is to evaluate all the parameters before assigning anything to specific registers, leading to the expected result: 0000000000000588 <bar>: 588: a9be7bfd stp x29, x30, [sp, #-32]! 58c: 910003fd mov x29, sp 590: f9000bf3 str x19, [sp, #16] 594: aa0003f3 mov x19, x0 598: aa1e03e0 mov x0, x30 59c: 94000000 bl 0 <_mcount> 5a0: 94000000 bl 0 <foo> 5a4: aa0003e1 mov x1, x0 5a8: d28175a0 mov x0, #0xbad 5ac: d4000003 smc #0x0 5b0: b4000073 cbz x19, 5bc <bar+0x34> 5b4: a9000660 stp x0, x1, [x19] 5b8: a9010e62 stp x2, x3, [x19, #16] 5bc: f9400bf3 ldr x19, [sp, #16] 5c0: a8c27bfd ldp x29, x30, [sp], #32 5c4: d65f03c0 ret Reported-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03arm/arm64: smccc-1.1: Make return values unsigned longMarc Zyngier
[ Upstream commit 1d8f574708a3fb6f18c85486d0c5217df893c0cf ] An unfortunate consequence of having a strong typing for the input values to the SMC call is that it also affects the type of the return values, limiting r0 to 32 bits and r{1,2,3} to whatever was passed as an input. Let's turn everything into "unsigned long", which satisfies the requirements of both architectures, and allows for the full range of return values. Reported-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03hwmon: (ina2xx) fix sysfs shunt resistor read accessLothar Felten
[ Upstream commit 3ad867001c91657c46dcf6656d52eb6080286fd5 ] fix the sysfs shunt resistor read access: return the shunt resistor value, not the calibration register contents. update email address Signed-off-by: Lothar Felten <lothar.felten@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()Dave Jiang
commit dfb06cba8c73c0704710b2e3fbe2c35ac66a01b4 upstream. copy_to_iter_mcsafe() is passing in the is_source parameter as "false" to check_copy_size(). This is different than what copy_to_iter() does. Also, the addr parameter passed to check_copy_size() is the source so therefore we should be passing in "true" instead. Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()") Cc: <stable@vger.kernel.org> Reported-by: Fan Du <fan.du@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03regulator: Fix 'do-nothing' value for regulators without suspend stateMarek Szyprowski
commit 3edd79cf5a44b12dbb13bc320f5788aed6562b36 upstream. Some regulators don't have all states defined and in such cases regulator core should not assume anything. However in current implementation of of_get_regulation_constraints() DO_NOTHING_IN_SUSPEND enable value was set only for regulators which had suspend node defined, otherwise the default 0 value was used, what means DISABLE_IN_SUSPEND. This lead to broken system suspend/resume on boards, which had simple regulator constraints definition (without suspend state nodes). To avoid further mismatches between the default and uninitialized values of the suspend enabled/disabled states, change the values of the them, so default '0' means DO_NOTHING_IN_SUSPEND. Fixes: 72069f9957a1: regulator: leave one item to record whether regulator is enabled Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03bitfield: fix *_encode_bits()Johannes Berg
[ Upstream commit e7d4a95da86e0b048702765bbdcdc968aaf312e7 ] There's a bug in *_encode_bits() in using ~field_multiplier() for the check whether or not the constant value fits into the field, this is wrong and clearly ~field_mask() was intended. This was triggering for me for both constant and non-constant values. Additionally, make this case actually into an compile error. Declaring the extern function that will never exist with just a warning is pointless as then later we'll just get a link error. While at it, also fix the indentation in those lines I'm touching. Finally, as suggested by Andy Shevchenko, add some tests and for that introduce also u8 helpers. The tests don't compile without the fix, showing that it's necessary. Fixes: 00b0c9b82663 ("Add primitives for manipulating bitfields both in host- and fixed-endian.") Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03posix-timers: Sanitize overrun handlingThomas Gleixner
[ Upstream commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 ] The posix timer overrun handling is broken because the forwarding functions can return a huge number of overruns which does not fit in an int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn into random number generators. The k_clock::timer_forward() callbacks return a 64 bit value now. Make k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal accounting is correct. 3Remove the temporary (int) casts. Add a helper function which clamps the overrun value returned to user space via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value between 0 and INT_MAX. INT_MAX is an indicator for user space that the overrun value has been clamped. Reported-by: Team OWL337 <icytxw@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03power: remove possible deadlock when unregistering power_supplyBenjamin Tissoires
[ Upstream commit 3ffa6583e24e1ad1abab836d24bfc9d2308074e5 ] If a device gets removed right after having registered a power_supply node, we might enter in a deadlock between the remove call (that has a lock on the parent device) and the deferred register work. Allow the deferred register work to exit without taking the lock when we are in the remove state. Stack trace on a Ubuntu 16.04: [16072.109121] INFO: task kworker/u16:2:1180 blocked for more than 120 seconds. [16072.109127] Not tainted 4.13.0-41-generic #46~16.04.1-Ubuntu [16072.109129] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [16072.109132] kworker/u16:2 D 0 1180 2 0x80000000 [16072.109142] Workqueue: events_power_efficient power_supply_deferred_register_work [16072.109144] Call Trace: [16072.109152] __schedule+0x3d6/0x8b0 [16072.109155] schedule+0x36/0x80 [16072.109158] schedule_preempt_disabled+0xe/0x10 [16072.109161] __mutex_lock.isra.2+0x2ab/0x4e0 [16072.109166] __mutex_lock_slowpath+0x13/0x20 [16072.109168] ? __mutex_lock_slowpath+0x13/0x20 [16072.109171] mutex_lock+0x2f/0x40 [16072.109174] power_supply_deferred_register_work+0x2b/0x50 [16072.109179] process_one_work+0x15b/0x410 [16072.109182] worker_thread+0x4b/0x460 [16072.109186] kthread+0x10c/0x140 [16072.109189] ? process_one_work+0x410/0x410 [16072.109191] ? kthread_create_on_node+0x70/0x70 [16072.109194] ret_from_fork+0x35/0x40 [16072.109199] INFO: task test:2257 blocked for more than 120 seconds. [16072.109202] Not tainted 4.13.0-41-generic #46~16.04.1-Ubuntu [16072.109204] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [16072.109206] test D 0 2257 2256 0x00000004 [16072.109208] Call Trace: [16072.109211] __schedule+0x3d6/0x8b0 [16072.109215] schedule+0x36/0x80 [16072.109218] schedule_timeout+0x1f3/0x360 [16072.109221] ? check_preempt_curr+0x5a/0xa0 [16072.109224] ? ttwu_do_wakeup+0x1e/0x150 [16072.109227] wait_for_completion+0xb4/0x140 [16072.109230] ? wait_for_completion+0xb4/0x140 [16072.109233] ? wake_up_q+0x70/0x70 [16072.109236] flush_work+0x129/0x1e0 [16072.109240] ? worker_detach_from_pool+0xb0/0xb0 [16072.109243] __cancel_work_timer+0x10f/0x190 [16072.109247] ? device_del+0x264/0x310 [16072.109250] ? __wake_up+0x44/0x50 [16072.109253] cancel_delayed_work_sync+0x13/0x20 [16072.109257] power_supply_unregister+0x37/0xb0 [16072.109260] devm_power_supply_release+0x11/0x20 [16072.109263] release_nodes+0x110/0x200 [16072.109266] devres_release_group+0x7c/0xb0 [16072.109274] wacom_remove+0xc2/0x110 [wacom] [16072.109279] hid_device_remove+0x6e/0xd0 [hid] [16072.109284] device_release_driver_internal+0x158/0x210 [16072.109288] device_release_driver+0x12/0x20 [16072.109291] bus_remove_device+0xec/0x160 [16072.109293] device_del+0x1de/0x310 [16072.109298] hid_destroy_device+0x27/0x60 [hid] [16072.109303] usbhid_disconnect+0x51/0x70 [usbhid] [16072.109308] usb_unbind_interface+0x77/0x270 [16072.109311] device_release_driver_internal+0x158/0x210 [16072.109315] device_release_driver+0x12/0x20 [16072.109318] usb_driver_release_interface+0x77/0x80 [16072.109321] proc_ioctl+0x20f/0x250 [16072.109325] usbdev_do_ioctl+0x57f/0x1140 [16072.109327] ? __wake_up+0x44/0x50 [16072.109331] usbdev_ioctl+0xe/0x20 [16072.109336] do_vfs_ioctl+0xa4/0x600 [16072.109339] ? vfs_write+0x15a/0x1b0 [16072.109343] SyS_ioctl+0x79/0x90 [16072.109347] entry_SYSCALL_64_fastpath+0x24/0xab [16072.109349] RIP: 0033:0x7f20da807f47 [16072.109351] RSP: 002b:00007ffc422ae398 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [16072.109353] RAX: ffffffffffffffda RBX: 00000000010b8560 RCX: 00007f20da807f47 [16072.109355] RDX: 00007ffc422ae3a0 RSI: 00000000c0105512 RDI: 0000000000000009 [16072.109356] RBP: 0000000000000000 R08: 00007ffc422ae3e0 R09: 0000000000000010 [16072.109357] R10: 00000000000000a6 R11: 0000000000000246 R12: 0000000000000000 [16072.109359] R13: 00000000010b8560 R14: 00007ffc422ae2e0 R15: 0000000000000000 Reported-and-tested-by: Richard Hughes <rhughes@redhat.com> Tested-by: Aaron Skomra <Aaron.Skomra@wacom.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Fixes: 7f1a57fdd6cb ("power_supply: Fix possible NULL pointer dereference on early uevent") Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26evm: Don't deadlock if a crypto algorithm is unavailableMatthew Garrett
[ Upstream commit e2861fa71641c6414831d628a1f4f793b6562580 ] When EVM attempts to appraise a file signed with a crypto algorithm the kernel doesn't have support for, it will cause the kernel to trigger a module load. If the EVM policy includes appraisal of kernel modules this will in turn call back into EVM - since EVM is holding a lock until the crypto initialisation is complete, this triggers a deadlock. Add a CRYPTO_NOLOAD flag and skip module loading if it's set, and add that flag in the EVM case in order to fail gracefully with an error message instead of deadlocking. Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26of: add helper to lookup compatible child nodeJohan Hovold
commit 36156f9241cb0f9e37d998052873ca7501ad4b36 upstream. Add of_get_compatible_child() helper that can be used to lookup compatible child nodes. Several drivers currently use of_find_compatible_node() to lookup child nodes while failing to notice that the of_find_ functions search the entire tree depth-first (from a given start node) and therefore can match unrelated nodes. The fact that these functions also drop a reference to the node they start searching from (e.g. the parent node) is typically also overlooked, something which can lead to use-after-free bugs. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26net/mlx5: Use u16 for Work Queue buffer fragment sizeTariq Toukan
[ Upstream commit 8d71e818506718e8d7032ce824b5c74a17d4f7a5 ] Minimal stride size is 16. Hence, the number of strides in a fragment (of PAGE_SIZE) is <= PAGE_SIZE / 16 <= 4K. u16 is sufficient to represent this. Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26net/mlx5: Fix use-after-free in self-healing flowJack Morgenstein
[ Upstream commit 76d5581c870454be5f1f1a106c57985902e7ea20 ] When the mlx5 health mechanism detects a problem while the driver is in the middle of init_one or remove_one, the driver needs to prevent the health mechanism from scheduling future work; if future work is scheduled, there is a problem with use-after-free: the system WQ tries to run the work item (which has been freed) at the scheduled future time. Prevent this by disabling work item scheduling in the health mechanism when the driver is in the middle of init_one() or remove_one(). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19mm: get rid of vmacache_flush_all() entirelyLinus Torvalds
commit 7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream. Jann Horn points out that the vmacache_flush_all() function is not only potentially expensive, it's buggy too. It also happens to be entirely unnecessary, because the sequence number overflow case can be avoided by simply making the sequence number be 64-bit. That doesn't even grow the data structures in question, because the other adjacent fields are already 64-bit. So simplify the whole thing by just making the sequence number overflow case go away entirely, which gets rid of all the complications and makes the code faster too. Win-win. [ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics also just goes away entirely with this ] Reported-by: Jann Horn <jannh@google.com> Suggested-by: Will Deacon <will.deacon@arm.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19mtd: rawnand: make subop helpers return unsigned valuesMiquel Raynal
[ Upstream commit 760c435e0f85ed19e48a90d746ce1de2cd02def7 ] A report from Colin Ian King pointed a CoverityScan issue where error values on these helpers where not checked in the drivers. These helpers can error out only in case of a software bug in driver code, not because of a runtime/hardware error. Hence, let's WARN_ON() in this case and return 0 which is harmless anyway. Fixes: 8878b126df76 ("mtd: nand: add ->exec_op() implementation") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19HID: core: fix grouping by applicationBenjamin Tissoires
commit 0d6c3011409135ea84e2a231b013a22017ff999a upstream. commit f07b3c1da92d ("HID: generic: create one input report per application type") was effectively the same as MULTI_INPUT: hidinput->report was never set, so hidinput_match_application() always returned null. Fix that by testing against the real application. Note that this breaks some old eGalax touchscreens that expect MULTI_INPUT instead of HID_QUIRK_INPUT_PER_APP. Enable this quirk for backward compatibility on all non-Win8 touchscreens. link: https://bugzilla.kernel.org/show_bug.cgi?id=200847 link: https://bugzilla.kernel.org/show_bug.cgi?id=200849 link: https://bugs.archlinux.org/task/59699 link: https://github.com/NixOS/nixpkgs/issues/45165 Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15r8169: add support for NCube 8168 network cardAnthony Wong
[ Upstream commit 9fd0e09a4e86499639653243edfcb417a05c5c46 ] This card identifies itself as: Ethernet controller [0200]: NCube Device [10ff:8168] (rev 06) Subsystem: TP-LINK Technologies Co., Ltd. Device [7470:3468] Adding a new entry to rtl8169_pci_tbl makes the card work. Link: http://launchpad.net/bugs/1788730 Signed-off-by: Anthony Wong <anthony.wong@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09iommu/vt-d: Fix dev iotlb pfsid useJacob Pan
commit 1c48db44924298ad0cb5a6386b88017539be8822 upstream. PFSID should be used in the invalidation descriptor for flushing device IOTLBs on SRIOV VFs. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: stable@vger.kernel.org Cc: "Ashok Raj" <ashok.raj@intel.com> Cc: "Lu Baolu" <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09iommu/vt-d: Add definitions for PFSIDJacob Pan
commit 0f725561e168485eff7277d683405c05b192f537 upstream. When SRIOV VF device IOTLB is invalidated, we need to provide the PF source ID such that IOMMU hardware can gauge the depth of invalidation queue which is shared among VFs. This is needed when device invalidation throttle (DIT) capability is supported. This patch adds bit definitions for checking and tracking PFSID. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: stable@vger.kernel.org Cc: "Ashok Raj" <ashok.raj@intel.com> Cc: "Lu Baolu" <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09NFSv4 client live hangs after live data migration recoveryBill Baker
commit 0f90be132cbf1537d87a6a8b9e80867adac892f6 upstream. After a live data migration event at the NFS server, the client may send I/O requests to the wrong server, causing a live hang due to repeated recovery events. On the wire, this will appear as an I/O request failing with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly. NFS4ERR_BADSSESSION is returned because the session ID being used was issued by the other server and is not valid at the old server. The failure is caused by async worker threads having cached the transport (xprt) in the rpc_task structure. After the migration recovery completes, the task is redispatched and the task resends the request to the wrong server based on the old value still present in tk_xprt. The solution is to recompute the tk_xprt field of the rpc_task structure so that the request goes to the correct server. Signed-off-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09nfsd: fix leaked file lock with nfs exported overlayfsAmir Goldstein
commit 64bed6cbe38bc95689fb9399872d9ce250192f90 upstream. nfsd and lockd call vfs_lock_file() to lock/unlock the inode returned by locks_inode(file). Many places in nfsd/lockd code use the inode returned by file_inode(file) for lock manipulation. With Overlayfs, file_inode() (the underlying inode) is not the same object as locks_inode() (the overlay inode). This can result in "Leaked POSIX lock" messages and eventually to a kernel crash as reported by Eddie Horng: https://marc.info/?l=linux-unionfs&m=153086643202072&w=2 Fix all the call sites in nfsd/lockd that should use locks_inode(). This is a correctness bug that manifested when overlayfs gained NFS export support in v4.16. Reported-by: Eddie Horng <eddiehorng.tw@gmail.com> Tested-by: Eddie Horng <eddiehorng.tw@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Fixes: 8383f1748829 ("ovl: wire up NFS export operations") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()Dexuan Cui
commit d3b26dd7cb0e3433bfd3c1d4dcf74c6039bb49fb upstream. Before setting channel->rescind in vmbus_rescind_cleanup(), we should make sure the channel callback won't run any more, otherwise a high-level driver like pci_hyperv, which may be infinitely waiting for the host VSP's response and notices the channel has been rescinded, can't safely give up: e.g., in hv_pci_protocol_negotiation() -> wait_for_response(), it's unsafe to exit from wait_for_response() and proceed with the on-stack variable "comp_pkt" popped. The issue was originally spotted by Michael Kelley <mikelley@microsoft.com>. In vmbus_close_internal(), the patch also minimizes the range protected by disabling/enabling channel->callback_event: we don't really need that for the whole function. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: stable@vger.kernel.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Michael Kelley <mikelley@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09overflow.h: Add arithmetic shift helperJason Gunthorpe
commit 0c66847793d1982d1083dc6f7adad60fa265ce9c upstream. Add shift_overflow() helper to assist driver authors in ensuring that shift operations don't cause overflows or other odd conditions. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> [kees: tweaked comments and commit log, dropped unneeded assignment] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09powerpc/64s: Fix page table fragment refcount race vs speculative referencesNicholas Piggin
commit 4231aba000f5a4583dd9f67057aadb68c3eca99d upstream. The page table fragment allocator uses the main page refcount racily with respect to speculative references. A customer observed a BUG due to page table page refcount underflow in the fragment allocator. This can be caused by the fragment allocator set_page_count stomping on a speculative reference, and then the speculative failure handler decrements the new reference, and the underflow eventually pops when the page tables are freed. Fix this by using a dedicated field in the struct page for the page table fragment allocator. Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage") Cc: stable@vger.kernel.org # v3.10+ Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09Replace magic for trusting the secondary keyring with #defineYannik Sembritzki
commit 817aef260037f33ee0f44c17fe341323d3aebd6d upstream. Replace the use of a magic number that indicates that verify_*_signature() should use the secondary keyring with a symbol. Signed-off-by: Yannik Sembritzki <yannik@sembritzki.me> Signed-off-by: David Howells <dhowells@redhat.com> Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09blkcg: Introduce blkg_root_lookup()Bart Van Assche
commit 6bad9b210a228d2fe0e0efe26d9b115348529cee upstream. This new function will be used in a later patch to verify whether a queue has been dissociated from the cgroup controller before being released. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05scsi: sysfs: Introduce sysfs_{un,}break_active_protection()Bart Van Assche
commit 2afc9166f79b8f6da5f347f48515215ceee4ae37 upstream. Introduce these two functions and export them such that the next patch can add calls to these functions from the SCSI core. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>