summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-11-21clockevents/drivers/i8253: Add support for PIT shutdown quirkMichael Kelley
commit 35b69a420bfb56b7b74cb635ea903db05e357bec upstream. Add support for platforms where pit_shutdown() doesn't work because of a quirk in the PIT emulation. On these platforms setting the counter register to zero causes the PIT to start running again, negating the shutdown. Provide a global variable that controls whether the counter register is zero'ed, which platform specific code can override. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org> Cc: "jgross@suse.com" <jgross@suse.com> Cc: "akataria@vmware.com" <akataria@vmware.com> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: vkuznets <vkuznets@redhat.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1541303219-11142-2-git-send-email-mikelley@microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21watchdog/core: Add missing prototypes for weak functionsMathieu Malaterre
commit 81bd415c91eb966118d773dddf254aebf3022411 upstream. The split out of the hard lockup detector exposed two new weak functions, but no prototypes for them, which triggers the build warning: kernel/watchdog.c:109:12: warning: no previous prototype for ‘watchdog_nmi_enable’ [-Wmissing-prototypes] kernel/watchdog.c:115:13: warning: no previous prototype for ‘watchdog_nmi_disable’ [-Wmissing-prototypes] Add the prototypes. Fixes: 73ce0511c436 ("kernel/watchdog.c: move hardlockup detector to separate file") Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Babu Moger <babu.moger@oracle.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180606194232.17653-1-malat@debian.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21mtd: nand: Fix nanddev_neraseblocks()Boris Brezillon
commit d098093ba06eb032057d1aca1c2e45889e099d00 upstream. nanddev_neraseblocks() currently returns the number pages per LUN instead of the total number of eraseblocks. Fixes: 9c3736a3de21 ("mtd: nand: Add core infrastructure to deal with NAND devices") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21libceph: bump CEPH_MSG_MAX_DATA_LENIlya Dryomov
commit 94e6992bb560be8bffb47f287194adf070b57695 upstream. If the read is large enough, we end up spinning in the messenger: libceph: osd0 192.168.122.1:6801 io error libceph: osd0 192.168.122.1:6801 io error libceph: osd0 192.168.122.1:6801 io error This is a receive side limit, so only reads were affected. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13media: hdmi.h: rename ADOBE_RGB to OPRGB and ADOBE_YCC to OPYCCHans Verkuil
commit 463659a08d7999d5461fa45b35b17686189a70ca upstream. These names have been renamed in the CTA-861 standard due to trademark issues. Replace them here as well so they are in sync with the standard. Signed-off-by: Hans Verkuil <hansverk@cisco.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13TC: Set DMA masks for devicesMaciej W. Rozycki
commit 3f2aa244ee1a0d17ed5b6c86564d2c1b24d1c96b upstream. Fix a TURBOchannel support regression with commit 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask") that caused coherent DMA allocations to produce a warning such as: defxx: v1.11 2014/07/01 Lawrence V. Stefani and others tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dfx_dev_register+0x670/0x678 Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #2 Stack : ffffffff8009ffc0 fffffffffffffec0 0000000000000000 ffffffff80647650 0000000000000000 0000000000000000 ffffffff806f5f80 ffffffffffffffff 0000000000000000 0000000000000000 0000000000000001 ffffffff8065d4e8 98000000031b6300 ffffffff80563478 ffffffff805685b0 ffffffffffffffff 0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8 0000000000000000 0000000000000009 ffffffff8053efd0 ffffffff806657d0 0000000000000000 ffffffff803177f8 0000000000000000 ffffffff806d0000 9800000003078000 980000000307b9e0 000000001e900000 ffffffff80067940 0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8 ffffffff805176c0 ffffffff8004dc78 0000000000000000 ffffffff80067940 ... Call Trace: [<ffffffff8004dc78>] show_stack+0xa0/0x130 [<ffffffff80067940>] __warn+0x128/0x170 ---[ end trace b1d1e094f67f3bb2 ]--- This is because the TURBOchannel bus driver fails to set the coherent DMA mask for devices enumerated. Set the regular and coherent DMA masks for TURBOchannel devices then, observing that the bus protocol supports a 34-bit (16GiB) DMA address space, by interpreting the value presented in the address cycle across the 32 `ad' lines as a 32-bit word rather than byte address[1]. The architectural size of the TURBOchannel DMA address space exceeds the maximum amount of RAM any actual TURBOchannel system in existence may have, hence both masks are the same. This removes the warning shown above. References: [1] "TURBOchannel Hardware Specification", EK-369AA-OD-007B, Digital Equipment Corporation, January 1993, Section "DMA", pp. 1-15 -- 1-17 Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20835/ Fixes: 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask") Cc: stable@vger.kernel.org # 4.16+ Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13fsnotify: Fix busy inodes during unmountJan Kara
commit 721fb6fbfd2132164c2e8777cc837f9b2c1794dc upstream. Detaching of mark connector from fsnotify_put_mark() can race with unmounting of the filesystem like: CPU1 CPU2 fsnotify_put_mark() spin_lock(&conn->lock); ... inode = fsnotify_detach_connector_from_object(conn) spin_unlock(&conn->lock); generic_shutdown_super() fsnotify_unmount_inodes() sees connector detached for inode -> nothing to do evict_inode() barfs on pending inode reference iput(inode); Resulting in "Busy inodes after unmount" message and possible kernel oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode references from detached connectors. Note that the accounting of outstanding inode references in the superblock can cause some cacheline contention on the counter. OTOH it happens only during deletion of the last notification mark from an inode (or during unlinking of watched inode) and that is not too bad. I have measured time to create & delete inotify watch 100000 times from 64 processes in parallel (each process having its own inotify group and its own file on a shared superblock) on a 64 CPU machine. Average and standard deviation of 15 runs look like: Avg Stddev Vanilla 9.817400 0.276165 Fixed 9.710467 0.228294 So there's no statistically significant difference. Fixes: 6b3f05d24d35 ("fsnotify: Detach mark from object list when last reference is dropped") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13signal: Guard against negative signal numbers in copy_siginfo_from_user32Eric W. Biederman
commit a36700589b85443e28170be59fa11c8a104130a5 upstream. While fixing an out of bounds array access in known_siginfo_layout reported by the kernel test robot it became apparent that the same bug exists in siginfo_layout and affects copy_siginfo_from_user32. The straight forward fix that makes guards against making this mistake in the future and should keep the code size small is to just take an unsigned signal number instead of a signed signal number, as I did to fix known_siginfo_layout. Cc: stable@vger.kernel.org Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13scsi: sched/wait: Add wait_event_lock_irq_timeout for TASK_UNINTERRUPTIBLE usageNicholas Bellinger
commit 25ab0bc334b43bbbe4eabc255006ce42a9424da2 upstream. Short of reverting commit 00d909a10710 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted") for v4.19, target-core needs a wait_event_t macro can be executed using TASK_UNINTERRUPTIBLE to function correctly with existing fabric drivers that expect to run with signals pending during session shutdown and active se_cmd I/O quiesce. The most notable is iscsi-target/iser-target, while ibmvscsi_tgt invokes session shutdown logic from userspace via configfs attribute that could also potentially have signals pending. So go ahead and introduce wait_event_lock_irq_timeout() to achieve this, and update + rename __wait_event_lock_irq_timeout() to make it accept 'state' as a parameter. Fixes: 00d909a10710 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted") Cc: <stable@vger.kernel.org> # v4.19+ Cc: Bart Van Assche <bvanassche@acm.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Bryant G. Ly <bly@catalogicsoftware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13signal: Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstackWill Deacon
[ Upstream commit 22839869f21ab3850fbbac9b425ccc4c0023926f ] The sigaltstack(2) system call fails with -ENOMEM if the new alternative signal stack is found to be smaller than SIGMINSTKSZ. On architectures such as arm64, where the native value for SIGMINSTKSZ is larger than the compat value, this can result in an unexpected error being reported to a compat task. See, for example: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904385 This patch fixes the problem by extending do_sigaltstack to take the minimum signal stack size as an additional parameter, allowing the native and compat system call entry code to pass in their respective values. COMPAT_SIGMINSTKSZ is just defined as SIGMINSTKSZ if it has not been defined by the architecture. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Reported-by: Steve McIntyre <steve.mcintyre@arm.com> Tested-by: Steve McIntyre <93sam@debian.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/OJames Smart
[ Upstream commit 783f4a4408e1251d17f333ad56abac24dde988b9 ] When an io is rejected by nvmf_check_ready() due to validation of the controller state, the nvmf_fail_nonready_command() will normally return BLK_STS_RESOURCE to requeue and retry. However, if the controller is dying or the I/O is marked for NVMe multipath, the I/O is failed so that the controller can terminate or so that the io can be issued on a different path. Unfortunately, as this reject point is before the transport has accepted the command, blk-mq ends up completing the I/O and never calls nvme_complete_rq(), which is where multipath may preserve or re-route the I/O. The end result is, the device user ends up seeing an EIO error. Example: single path connectivity, controller is under load, and a reset is induced. An I/O is received: a) while the reset state has been set but the queues have yet to be stopped; or b) after queues are started (at end of reset) but before the reconnect has completed. The I/O finishes with an EIO status. This patch makes the following changes: - Adds the HOST_PATH_ERROR pathing status from TP4028 - Modifies the reject point such that it appears to queue successfully, but actually completes the io with the new pathing status and calls nvme_complete_rq(). - nvme_complete_rq() recognizes the new status, avoids resetting the controller (likely was already done in order to get this new status), and calls the multipather to clear the current path that errored. This allows the next command (retry or new command) to select a new path if there is one. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13bpf: fix partial copy of map_ptr when dst is scalarDaniel Borkmann
commit 0962590e553331db2cc0aef2dc35c57f6300dbbe upstream. ALU operations on pointers such as scalar_reg += map_value_ptr are handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr and range in the register state share a union, so transferring state through dst_reg->range = ptr_reg->range is just buggy as any new map_ptr in the dst_reg is then truncated (or null) for subsequent checks. Fix this by adding a raw member and use it for copying state over to dst_reg. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
David writes: "Networking 1) Fix gro_cells leak in xfrm layer, from Li RongQing. 2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that. From Eric Dumazet. 3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn Töpel. 4) Out of bounds packet access in _decode_session6(), from Alexei Starovoitov. 5) Several ethtool bugs, where we copy a struct into the kernel twice and our validations of the values in the first copy can be invalidated by the second copy due to asynchronous updates to the memory by the user. From Wenwen Wang. 6) Missing netlink attribute validation in cls_api, from Davide Caratti. 7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang. 8) rxrpc operates on wrong kvec, from Yue Haibing. 9) A regression was introduced by the disassosciation of route neighbour references in rt6_probe(), causing probe for neighbourless routes to not be properly rate limited. Fix from Sabrina Dubroca. 10) Unsafe RCU locking in tipc, from Tung Nguyen. 11) Use after free in inet6_mc_check(), from Eric Dumazet. 12) PMTU from icmp packets should update the SCTP transport pathmtu, from Xin Long. 13) Missing peer put on error in rxrpc, from David Howells. 14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren. 15) Fix overflowing shift statement in qla3xxx driver, from Nathan Chancellor. 16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva. 17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return value in an inverted manner, fix from Paolo Abeni. 18) Fix missed unresolved entries in ipmr dumps, from Nikolay Aleksandrov. 19) Fix NAPI handling under high load, we can completely miss events when NAPI has to loop more than one time in a cycle. From Heiner Kallweit." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) ip6_tunnel: Fix encapsulation layout tipc: fix info leak from kernel tipc_event net: socket: fix a missing-check bug net: sched: Fix for duplicate class dump r8169: fix NAPI handling under high load net: ipmr: fix unresolved entry dumps net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion() sctp: fix the data size calculation in sctp_data_size virtio_net: avoid using netif_tx_disable() for serializing tx routine udp6: fix encap return code for resubmitting mlxsw: core: Fix use-after-free when flashing firmware during init sctp: not free the new asoc when sctp_wait_for_connect returns err sctp: fix race on sctp_id2asoc r8169: re-enable MSI-X on RTL8168g net: bpfilter: use get_pid_task instead of pid_task ptp: fix Spectre v1 vulnerability net: qla3xxx: Remove overflowing shift statement geneve, vxlan: Don't set exceptions if skb->len < mtu geneve, vxlan: Don't check skb_dst() twice sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead ...
2018-10-18mremap: properly flush TLB before releasing the pageLinus Torvalds
Jann Horn points out that our TLB flushing was subtly wrong for the mremap() case. What makes mremap() special is that we don't follow the usual "add page to list of pages to be freed, then flush tlb, and then free pages". No, mremap() obviously just _moves_ the page from one page table location to another. That matters, because mremap() thus doesn't directly control the lifetime of the moved page with a freelist: instead, the lifetime of the page is controlled by the page table locking, that serializes access to the entry. As a result, we need to flush the TLB not just before releasing the lock for the source location (to avoid any concurrent accesses to the entry), but also before we release the destination page table lock (to avoid the TLB being flushed after somebody else has already done something to that page). This also makes the whole "need_flush" logic unnecessary, since we now always end up flushing the TLB for every valid entry. Reported-and-tested-by: Jann Horn <jannh@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18Merge tag 'trace-v4.19-rc8' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: Two fixes for 4.19 This fixes two bugs: - Fix size mismatch of tracepoint array - Have preemptirq test module use same clock source of the selftest" * tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c tracepoint: Fix tracepoint array element size mismatch
2018-10-17tracepoint: Fix tracepoint array element size mismatchMathieu Desnoyers
commit 46e0c9be206f ("kernel: tracepoints: add support for relative references") changes the layout of the __tracepoint_ptrs section on architectures supporting relative references. However, it does so without turning struct tracepoint * const into const int elsewhere in the tracepoint code, which has the following side-effect: Setting mod->num_tracepoints is done in by module.c: mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs", sizeof(*mod->tracepoints_ptrs), &mod->num_tracepoints); Basically, since sizeof(*mod->tracepoints_ptrs) is a pointer size (rather than sizeof(int)), num_tracepoints is erroneously set to half the size it should be on 64-bit arch. So a module with an odd number of tracepoints misses the last tracepoint due to effect of integer division. So in the module going notifier: for_each_tracepoint_range(mod->tracepoints_ptrs, mod->tracepoints_ptrs + mod->num_tracepoints, tp_module_going_check_quiescent, NULL); the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually evaluates to something within the bounds of the array, but miss the last tracepoint if the number of tracepoints is odd on 64-bit arch. Fix this by introducing a new typedef: tracepoint_ptr_t, which is either "const int" on architectures that have PREL32 relocations, or "struct tracepoint * const" on architectures that does not have this feature. Also provide a new tracepoint_ptr_defer() static inline to encapsulate deferencing this type rather than duplicate code and ugly idefs within the for_each_tracepoint_range() implementation. This issue appears in 4.19-rc kernels, and should ideally be fixed before the end of the rc cycle. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jessica Yu <jeyu@kernel.org> Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <james.morris@microsoft.com> Cc: James Morris <jmorris@namei.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-15Merge tag 'mlx5-fixes-2018-10-10' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-10-10 This pull request includes some fixes to mlx5 driver, Please pull and let me know if there's any problem. For -stable v4.11: ('net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type') For -stable v4.17: ('net/mlx5: Fix memory leak when setting fpga ipsec caps') For -stable v4.18: ('net/mlx5: WQ, fixes for fragmented WQ buffers API') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-13Merge tag 'arm64-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Will writes: "More arm64 fixes - Reject CHAIN PMU events when they are not part of a 64-bit counter - Fix WARN_ON_ONCE() that triggers for reserved regions that don't correspond to mapped memory" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: perf: Reject stand-alone CHAIN events for PMUv3 arm64: Fix /proc/iomem for reserved but not memory regions
2018-10-12arm64: perf: Reject stand-alone CHAIN events for PMUv3Will Deacon
It doesn't make sense for a perf event to be configured as a CHAIN event in isolation, so extend the arm_pmu structure with a ->filter_match() function to allow the backend PMU implementation to reject CHAIN events early. Cc: <stable@vger.kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-10-12Merge tag 'gpio-v4.19-4' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Linus writes: "GPIO fix for the v4.19 series: - Fix up the interrupt parent for the irqdomains." * tag 'gpio-v4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: Assign gpio_irq_chip::parents to non-stack pointer
2018-10-12Merge branch 'for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Dmitry writes: "Input updates for v4.19-rc7 - we added a few scheduling points into various input interfaces to ensure that large writes will not cause RCU stalls - fixed configuring PS/2 keyboards as wakeup devices on newer platforms - added a new Xbox gamepad ID." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: uinput - add a schedule point in uinput_inject_events() Input: evdev - add a schedule point in evdev_write() Input: mousedev - add a schedule point in mousedev_write() Input: i8042 - enable keyboard wakeups by default when s2idle is used Input: xpad - add support for Xbox1 PDP Camo series gamepad
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
David writes: "Networking 1) RXRPC receive path fixes from David Howells. 2) Re-export __skb_recv_udp(), from Jiri Kosina. 3) Fix refcounting in u32 classificer, from Al Viro. 4) Userspace netlink ABI fixes from Eugene Syromiatnikov. 5) Don't double iounmap on rmmod in ena driver, from Arthur Kiyanovski. 6) Fix devlink string attribute handling, we must pull a copy into a kernel buffer if the lifetime extends past the netlink request. From Moshe Shemesh. 7) Fix hangs in RDS, from Ka-Cheong Poon. 8) Fix recursive locking lockdep warnings in tipc, from Ying Xue. 9) Clear RX irq correctly in socionext, from Ilias Apalodimas. 10) bcm_sf2 fixes from Florian Fainelli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: dsa: bcm_sf2: Call setup during switch resume net: dsa: bcm_sf2: Fix unbind ordering net: phy: sfp: remove sfp_mutex's definition r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips net: socionext: clear rx irq correctly net/mlx4_core: Fix warnings during boot on driverinit param set failures tipc: eliminate possible recursive locking detected by LOCKDEP selftests: udpgso_bench.sh explicitly requires bash selftests: rtnetlink.sh explicitly requires bash. qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface tipc: queue socket protocol error messages into socket receive buffer tipc: set link tolerance correctly in broadcast link net: ipv4: don't let PMTU updates increase route MTU net: ipv4: update fnhe_pmtu when first hop's MTU changes net/ipv6: stop leaking percpu memory in fib6 info rds: RDS (tcp) hangs on sendto() to unresponding address net: make skb_partial_csum_set() more robust against overflows devlink: Add helper function for safely copy string param devlink: Fix param cmode driverinit for string type devlink: Fix param set handling for string type ...
2018-10-11Merge branch 'for-4.19-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Tejun writes: "cgroup fixes for v4.19-rc7 One cgroup2 threaded mode fix for v4.19-rc7. While threaded mode isn't used widely (yet) and the bug requires somewhat convoluted sequence of operations, it causes a userland visible malfunction - EINVAL on a valid attempt to enable threaded mode. This pull request contains the fix" * 'for-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Fix dom_cgrp propagation when enabling threaded mode
2018-10-10net: ipv4: update fnhe_pmtu when first hop's MTU changesSabrina Dubroca
Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/mlx5: WQ, fixes for fragmented WQ buffers APITariq Toukan
mlx5e netdevice used to calculate fragment edges by a call to mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct indication for queues smaller than a PAGE_SIZE, (broken by default on PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new calls/API. Since (TX/RX) Work Queues buffers are fragmented, here we introduce changes to the API in core driver, so that it gets a stride index and returns the index of last stride on same fragment, and an additional wrapping function that returns the number of physically contiguous strides that can be written contiguously to the work queue. This obsoletes the following API functions, and their buggy usage in EN driver: * mlx5_wq_cyc_get_frag_size() * mlx5_wq_cyc_ctr2fragix() The new API improves modularity and hides the details of such calculation for mlx5e netdevice and mlx5_ib rdma drivers. New calculation is also more efficient, and improves performance as follows: Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size. Before: 16,477,619 pps After: 17,085,793 pps 3.7% improvement Fixes: 3a2f70331226 ("net/mlx5: Use order-0 allocations for all WQ types") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10gpio: Assign gpio_irq_chip::parents to non-stack pointerStephen Boyd
gpiochip_set_cascaded_irqchip() is passed 'parent_irq' as an argument and then the address of that argument is assigned to the gpio chips gpio_irq_chip 'parents' pointer shortly thereafter. This can't ever work, because we've just assigned some stack address to a pointer that we plan to dereference later in gpiochip_irq_map(). I ran into this issue with the KASAN report below when gpiochip_irq_map() tried to setup the parent irq with a total junk pointer for the 'parents' array. BUG: KASAN: stack-out-of-bounds in gpiochip_irq_map+0x228/0x248 Read of size 4 at addr ffffffc0dde472e0 by task swapper/0/1 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 4.14.72 #34 Call trace: [<ffffff9008093638>] dump_backtrace+0x0/0x718 [<ffffff9008093da4>] show_stack+0x20/0x2c [<ffffff90096b9224>] __dump_stack+0x20/0x28 [<ffffff90096b91c8>] dump_stack+0x80/0xbc [<ffffff900845a350>] print_address_description+0x70/0x238 [<ffffff900845a8e4>] kasan_report+0x1cc/0x260 [<ffffff900845aa14>] __asan_report_load4_noabort+0x2c/0x38 [<ffffff900897e098>] gpiochip_irq_map+0x228/0x248 [<ffffff900820cc08>] irq_domain_associate+0x114/0x2ec [<ffffff900820d13c>] irq_create_mapping+0x120/0x234 [<ffffff900820da78>] irq_create_fwspec_mapping+0x4c8/0x88c [<ffffff900820e2d8>] irq_create_of_mapping+0x180/0x210 [<ffffff900917114c>] of_irq_get+0x138/0x198 [<ffffff9008dc70ac>] spi_drv_probe+0x94/0x178 [<ffffff9008ca5168>] driver_probe_device+0x51c/0x824 [<ffffff9008ca6538>] __device_attach_driver+0x148/0x20c [<ffffff9008ca14cc>] bus_for_each_drv+0x120/0x188 [<ffffff9008ca570c>] __device_attach+0x19c/0x2dc [<ffffff9008ca586c>] device_initial_probe+0x20/0x2c [<ffffff9008ca18bc>] bus_probe_device+0x80/0x154 [<ffffff9008c9b9b4>] device_add+0x9b8/0xbdc [<ffffff9008dc7640>] spi_add_device+0x1b8/0x380 [<ffffff9008dcbaf0>] spi_register_controller+0x111c/0x1378 [<ffffff9008dd6b10>] spi_geni_probe+0x4dc/0x6f8 [<ffffff9008cab058>] platform_drv_probe+0xdc/0x130 [<ffffff9008ca5168>] driver_probe_device+0x51c/0x824 [<ffffff9008ca59cc>] __driver_attach+0x100/0x194 [<ffffff9008ca0ea8>] bus_for_each_dev+0x104/0x16c [<ffffff9008ca58c0>] driver_attach+0x48/0x54 [<ffffff9008ca1edc>] bus_add_driver+0x274/0x498 [<ffffff9008ca8448>] driver_register+0x1ac/0x230 [<ffffff9008caaf6c>] __platform_driver_register+0xcc/0xdc [<ffffff9009c4b33c>] spi_geni_driver_init+0x1c/0x24 [<ffffff9008084cb8>] do_one_initcall+0x240/0x3dc [<ffffff9009c017d0>] kernel_init_freeable+0x378/0x468 [<ffffff90096e8240>] kernel_init+0x14/0x110 [<ffffff9008086fcc>] ret_from_fork+0x10/0x18 The buggy address belongs to the page: page:ffffffbf037791c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff raw: ffffffbf037791e0 ffffffbf037791e0 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffc0dde47180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47200: f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 >ffffffc0dde47280: f2 f2 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 ^ ffffffc0dde47300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Let's leave around one unsigned int in the gpio_irq_chip struct for the single parent irq case and repoint the 'parents' array at it. This way code is left mostly intact to setup parents and we waste an extra few bytes per structure of which there should be only a handful in a system. Cc: Evan Green <evgreen@chromium.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Fixes: e0d897289813 ("gpio: Implement tighter IRQ chip integration") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-10-09mm, sched/numa: Remove remaining traces of NUMA rate-limitingSrikar Dronamraju
Remove the leftover pglist_data::numabalancing_migrate_lock and its initialization, we stopped using this lock with: efaffc5e40ae ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration") [ mingo: Rewrote the changelog. ] Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-MM <linux-mm@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1538824999-31230-1-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-07Merge tag 'char-misc-4.19-rc7' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc I wrote: "Char/Misc fixes for 4.19-rc7 Here are 8 small fixes for some char/misc driver issues Included here are: - fpga driver fixes - thunderbolt bugfixes - firmware core revert/fix - hv core fix - hv tool fix All of these have been in linux-next with no reported issues." * tag 'char-misc-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: thunderbolt: Initialize after IOMMUs thunderbolt: Do not handle ICM events after domain is stopped firmware: Always initialize the fw_priv list object docs: fpga: document fpga manager flags fpga: bridge: fix obvious function documentation error tools: hv: fcopy: set 'error' in case an unknown operation was requested fpga: do not access region struct after fpga_region_unregister Drivers: hv: vmbus: Use get/put_cpu() in vmbus_connect()
2018-10-07Merge tag 'tty-4.19-rc7' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty I wrote: "Serial driver fixes for 4.19-rc7 Here are 3 small serial driver fixes for 4.19-rc7 - 2 sh-sci bugfixes for reported issues - a revert of the PM handling for the 8250_dw code All of these have been in linux-next with no reported issues." * tag 'tty-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "serial: sh-sci: Allow for compressed SCIF address" Revert "serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE" Revert "serial: 8250_dw: Fix runtime PM handling"
2018-10-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
Dave writes: "Networking fixes: 1) Fix truncation of 32-bit right shift in bpf, from Jann Horn. 2) Fix memory leak in wireless wext compat, from Stefan Seyfried. 3) Use after free in cfg80211's reg_process_hint(), from Yu Zhao. 4) Need to cancel pending work when unbinding in smsc75xx otherwise we oops, also from Yu Zhao. 5) Don't allow enslaving a team device to itself, from Ido Schimmel. 6) Fix backwards compat with older userspace for rtnetlink FDB dumps. From Mauricio Faria. 7) Add validation of tc policy netlink attributes, from David Ahern. 8) Fix RCU locking in rawv6_send_hdrinc(), from Wei Wang." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) net: mvpp2: Extract the correct ethtype from the skb for tx csum offload ipv6: take rcu lock in rawv6_send_hdrinc() net: sched: Add policy validation for tc attributes rtnetlink: fix rtnl_fdb_dump() for ndmsg header yam: fix a missing-check bug net: bpfilter: Fix type cast and pointer warnings net: cxgb3_main: fix a missing-check bug bpf: 32-bit RSH verification must truncate input before the ALU op net: phy: phylink: fix SFP interface autodetection be2net: don't flip hw_features when VXLANs are added/deleted net/packet: fix packet drop as of virtio gso net: dsa: b53: Keep CPU port as tagged in all VLANs openvswitch: load NAT helper bnxt_en: get the reduced max_irqs by the ones used by RDMA bnxt_en: free hwrm resources, if driver probe fails. bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request bnxt_en: Fix VNIC reservations on the PF. team: Forbid enslaving team device to itself net/usb: cancel pending work when unbinding smsc75xx mlxsw: spectrum: Delete RIF when VLAN device is removed ...
2018-10-05Merge branch 'akpm'Greg Kroah-Hartman
* akpm: mm: madvise(MADV_DODUMP): allow hugetlbfs pages ocfs2: fix locking for res->tracking and dlm->tracking_list mm/vmscan.c: fix int overflow in callers of do_shrink_slab() mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly mm/vmstat.c: fix outdated vmstat_text proc: restrict kernel stack dumps to root mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes mm/migrate.c: split only transparent huge pages when allocation fails ipc/shm.c: use ERR_CAST() for shm_lock() error return mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl mm, thp: fix mlocking THP page with migration enabled ocfs2: fix crash in ocfs2_duplicate_clusters_by_page() hugetlb: take PMD sharing into account when flushing tlb/caches mm: migration: fix migration of huge PMD shared pages
2018-10-05mm: migration: fix migration of huge PMD shared pagesMike Kravetz
The page migration code employs try_to_unmap() to try and unmap the source page. This is accomplished by using rmap_walk to find all vmas where the page is mapped. This search stops when page mapcount is zero. For shared PMD huge pages, the page map count is always 1 no matter the number of mappings. Shared mappings are tracked via the reference count of the PMD page. Therefore, try_to_unmap stops prematurely and does not completely unmap all mappings of the source page. This problem can result is data corruption as writes to the original source page can happen after contents of the page are copied to the target page. Hence, data is lost. This problem was originally seen as DB corruption of shared global areas after a huge page was soft offlined due to ECC memory errors. DB developers noticed they could reproduce the issue by (hotplug) offlining memory used to back huge pages. A simple testcase can reproduce the problem by creating a shared PMD mapping (note that this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using migrate_pages() to migrate process pages between nodes while continually writing to the huge pages being migrated. To fix, have the try_to_unmap_one routine check for huge PMD sharing by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB. mmu notifiers are called before locking page tables, but we can not be sure of PMD sharing until page tables are locked. Therefore, check for the possibility of PMD sharing before locking so that notifiers can prepare for the worst possible case. Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com [mike.kravetz@oracle.com: make _range_in_vma() a static inline] Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05Merge branch 'sched-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fixes: These fixes address a rather involved performance regression between v4.17->v4.19 in the sched/numa auto-balancing code. Since distros really need this fix we accelerated it to sched/urgent for a faster upstream merge. NUMA scheduling and balancing performance is now largely back to v4.17 levels, without reintroducing the NUMA placement bugs that v4.18 and v4.19 fixed. Many thanks to Srikar Dronamraju, Mel Gorman and Jirka Hladky, for reporting, testing, re-testing and solving this rather complex set of bugs." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration sched/numa: Avoid task migration for small NUMA improvement mm/migrate: Use spin_trylock() while resetting rate limit sched/numa: Limit the conditions where scan period is reset sched/numa: Reset scan rate whenever task moves across nodes sched/numa: Pass destination CPU as a parameter to migrate_task_rq sched/numa: Stop multiple tasks from moving to the CPU at the same time
2018-10-04net/packet: fix packet drop as of virtio gsoJianfeng Tan
When we use raw socket as the vhost backend, a packet from virito with gso offloading information, cannot be sent out in later validaton at xmit path, as we did not set correct skb->protocol which is further used for looking up the gso function. To fix this, we set this field according to virito hdr information. Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04cgroup: Fix dom_cgrp propagation when enabling threaded modeTejun Heo
A cgroup which is already a threaded domain may be converted into a threaded cgroup if the prerequisite conditions are met. When this happens, all threaded descendant should also have their ->dom_cgrp updated to the new threaded domain cgroup. Unfortunately, this propagation was missing leading to the following failure. # cd /sys/fs/cgroup/unified # cat cgroup.subtree_control # show that no controllers are enabled # mkdir -p mycgrp/a/b/c # echo threaded > mycgrp/a/b/cgroup.type At this point, the hierarchy looks as follows: mycgrp [d] a [dt] b [t] c [inv] Now let's make node "a" threaded (and thus "mycgrp" s made "domain threaded"): # echo threaded > mycgrp/a/cgroup.type By this point, we now have a hierarchy that looks as follows: mycgrp [dt] a [t] b [t] c [inv] But, when we try to convert the node "c" from "domain invalid" to "threaded", we get ENOTSUP on the write(): # echo threaded > mycgrp/a/b/c/cgroup.type sh: echo: write error: Operation not supported This patch fixes the problem by * Moving the opencoded ->dom_cgrp save and restoration in cgroup_enable_threaded() into cgroup_{save|restore}_control() so that mulitple cgroups can be handled. * Updating all threaded descendants' ->dom_cgrp to point to the new dom_cgrp when enabling threaded mode. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Reported-by: Amin Jamali <ajamali@pivotal.io> Reported-by: Joao De Almeida Pereira <jpereira@pivotal.io> Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com Fixes: 454000adaa2a ("cgroup: introduce cgroup->dom_cgrp and threaded css_set handling") Cc: stable@vger.kernel.org # v4.14+
2018-10-04Merge tag 'ovl-fixes-4.19-rc7' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Miklos writes: "overlayfs fixes for 4.19-rc7 This update fixes a couple of regressions in the stacked file update added in this cycle, as well as some older bugs uncovered by syzkaller. There's also one trivial naming change that touches other parts of the fs subsystem." * tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix format of setxattr debug ovl: fix access beyond unterminated strings ovl: make symbol 'ovl_aops' static vfs: swap names of {do,vfs}_clone_file_range() ovl: fix freeze protection bypass in ovl_clone_file_range() ovl: fix freeze protection bypass in ovl_write_iter() ovl: fix memory leak on unlink of indexed file
2018-10-03Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
David writes: "Networking fixes: 1) Prefix length validation in xfrm layer, from Steffen Klassert. 2) TX status reporting fix in mac80211, from Andrei Otcheretianski. 3) Fix hangs due to TX_DROP in mac80211, from Bob Copeland. 4) Fix DMA error regression in b43, from Larry Finger. 5) Add input validation to xenvif_set_hash_mapping(), from Jan Beulich. 6) SMMU unmapping fix in hns driver, from Yunsheng Lin. 7) Bluetooh crash in unpairing on SMP, from Matias Karhumaa. 8) WoL handling fixes in the phy layer, from Heiner Kallweit. 9) Fix deadlock in bonding, from Mahesh Bandewar. 10) Fill ttl inherit infor in vxlan driver, from Hangbin Liu. 11) Fix TX timeouts during netpoll, from Michael Chan. 12) RXRPC layer fixes from David Howells. 13) Another batch of ndo_poll_controller() removals to deal with excessive resource consumption during load. From Eric Dumazet. 14) Fix a specific TIPC failure secnario, from LUU Duc Canh. 15) Really disable clocks in r8169 during suspend so that low power states can actually be reached. 16) Fix SYN backlog lockdep issue in tcp and dccp, from Eric Dumazet. 17) Fix RCU locking in netpoll SKB send, which shows up in bonding, from Dave Jones. 18) Fix TX stalls in r8169, from Heiner Kallweit. 19) Fix locksup in nfp due to control message storms, from Jakub Kicinski. 20) Various rmnet bug fixes from Subash Abhinov Kasiviswanathan and Sean Tranchetti. 21) Fix use after free in ip_cmsg_recv_dstaddr(), from Eric Dumazet." * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (122 commits) ixgbe: check return value of napi_complete_done() sctp: fix fall-through annotation r8169: always autoneg on resume ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() net: qualcomm: rmnet: Fix incorrect allocation flag in receive path net: qualcomm: rmnet: Fix incorrect allocation flag in transmit net: qualcomm: rmnet: Skip processing loopback packets net: systemport: Fix wake-up interrupt race during resume rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 bonding: fix warning message inet: make sure to grab rcu_read_lock before using ireq->ireq_opt nfp: avoid soft lockups under control message storm declance: Fix continuation with the adapter identification message net: fec: fix rare tx timeout r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO tun: napi flags belong to tfile tun: initialize napi_mutex unconditionally tun: remove unused parameters bond: take rcu lock in netpoll_send_skb_on_dev rtnetlink: Fail dump if target netnsid is invalid ...
2018-10-02Merge tag 'mlx5-fixes-2018-10-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-10-01 This pull request includes some fixes to mlx5 driver, Please pull and let me know if there's any problem. For -stable v4.11: "6e0a4a23c59a ('net/mlx5: E-Switch, Fix out of bound access when setting vport rate')" For -stable v4.18: "98d6627c372a ('net/mlx5e: Set vlan masks for all offloaded TC rules')" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02Revert "serial: sh-sci: Remove SCIx_RZ_SCIFA_REGTYPE"Geert Uytterhoeven
This reverts commit 7acece71a517cad83a0842a94d94c13f271b680c. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Chris Brandt <chris.brandt@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-02mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migrationMel Gorman
Rate limiting of page migrations due to automatic NUMA balancing was introduced to mitigate the worst-case scenario of migrating at high frequency due to false sharing or slowly ping-ponging between nodes. Since then, a lot of effort was spent on correctly identifying these pages and avoiding unnecessary migrations and the safety net may no longer be required. Jirka Hladky reported a regression in 4.17 due to a scheduler patch that avoids spreading STREAM tasks wide prematurely. However, once the task was properly placed, it delayed migrating the memory due to rate limiting. Increasing the limit fixed the problem for him. Currently, the limit is hard-coded and does not account for the real capabilities of the hardware. Even if an estimate was attempted, it would not properly account for the number of memory controllers and it could not account for the amount of bandwidth used for normal accesses. Rather than fudging, this patch simply eliminates the rate limiting. However, Jirka reports that a STREAM configuration using multiple processes achieved similar performance to 4.16. In local tests, this patch improved performance of STREAM relative to the baseline but it is somewhat machine-dependent. Most workloads show little or not performance difference implying that there is not a heavily reliance on the throttling mechanism and it is safe to remove. STREAM on 2-socket machine 4.19.0-rc5 4.19.0-rc5 numab-v1r1 noratelimit-v1r1 MB/sec copy 43298.52 ( 0.00%) 44673.38 ( 3.18%) MB/sec scale 30115.06 ( 0.00%) 31293.06 ( 3.91%) MB/sec add 32825.12 ( 0.00%) 34883.62 ( 6.27%) MB/sec triad 32549.52 ( 0.00%) 34906.60 ( 7.24% Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jirka Hladky <jhladky@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-MM <linux-mm@kvack.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-01Input: i8042 - enable keyboard wakeups by default when s2idle is usedDaniel Drake
Previously, on typical consumer laptops, pressing a key on the keyboard when the system is in suspend would cause it to wake up (default or unconditional behaviour). This happens because the EC generates a SCI interrupt in this scenario. That is no longer true on modern laptops based on Intel WhiskeyLake, including Acer Swift SF314-55G, Asus UX333FA, Asus UX433FN and Asus UX533FD. We confirmed with Asus EC engineers that the "Modern Standby" design has been modified so that the EC no longer generates a SCI in this case; the keyboard controller itself should be used for wakeup. In order to retain the standard behaviour of being able to use the keyboard to wake up the system, enable serio wakeups by default on platforms that are using s2idle. Link: https://lkml.kernel.org/r/CAB4CAwfQ0mPMqCLp95TVjw4J0r5zKPWkSvvkK4cpZUGE--w8bQ@mail.gmail.com Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) Skip ip_sabotage_in() for packet making into the VRF driver, otherwise packets are dropped, from David Ahern. 2) Clang compilation warning uncovering typo in the nft_validate_register_store() call from nft_osf, from Stefan Agner. 3) Double sizeof netlink message length calculations in ctnetlink, from zhong jiang. 4) Missing rb_erase() on batch full in rbtree garbage collector, from Taehee Yoo. 5) Calm down compilation warning in nf_hook(), from Florian Westphal. 6) Missing check for non-null sk in xt_socket before validating netns procedence, from Flavio Leitner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rulesAlaa Hleihel
If the peer device was already unbound, then do not attempt to modify it's resources, otherwise we will crash on dereferencing non-existing device. Fixes: 5c65c564c962 ("net/mlx5e: Support offloading TC NIC hairpin flows") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-30docs: fpga: document fpga manager flagsAlan Tull
Add flags #defines to kerneldoc documentation in a useful place. Signed-off-by: Alan Tull <atull@kernel.org> Acked-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-28Merge tag 'spi-fix-v4.19-rc5' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Mark writes: "spi: Fixes for v4.19 Quite a few fixes for the Renesas drivers in here, plus a fix for the Tegra driver and some documentation fixes for the recently added spi-mem code. The Tegra fix is relatively large but fairly straightforward and mechanical, it runs on probe so it's been reasonably well covered in -next testing." * tag 'spi-fix-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-mem: Move the DMA-able constraint doc to the kerneldoc header spi: spi-mem: Add missing description for data.nbytes field spi: rspi: Fix interrupted DMA transfers spi: rspi: Fix invalid SPI use during system suspend spi: sh-msiof: Fix handling of write value for SISTR register spi: sh-msiof: Fix invalid SPI use during system suspend spi: gpio: Fix copy-and-paste error spi: tegra20-slink: explicitly enable/disable clock
2018-09-28Merge tag 'regulator-v4.19-rc5' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Mark writes: "regulator: Fixes for 4.19 A collection of fairly minor bug fixes here, a couple of driver specific ones plus two core fixes. There's one fix for the new suspend state code which fixes some confusion with constant values that are supposed to indicate noop operation and another fixing a race condition with the creation of sysfs files on new regulators." * tag 'regulator-v4.19-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: fix crash caused by null driver data regulator: Fix 'do-nothing' value for regulators without suspend state regulator: da9063: fix DT probing with constraints regulator: bd71837: Disable voltage monitoring for LDO3/4
2018-09-28netfilter: avoid erronous array bounds warningFlorian Westphal
Unfortunately some versions of gcc emit following warning: $ make net/xfrm/xfrm_output.o linux/compiler.h:252:20: warning: array subscript is above array bounds [-Warray-bounds] hook_head = rcu_dereference(net->nf.hooks_arp[hook]); ^~~~~~~~~~~~~~~~~~~~~ xfrm_output_resume passes skb_dst(skb)->ops->family as its 'pf' arg so compiler can't know that we'll never access hooks_arp[]. (NFPROTO_IPV4 or NFPROTO_IPV6 are only possible cases). Avoid this by adding an explicit WARN_ON_ONCE() check. This patch has no effect if the family is a compile-time constant as gcc will remove the switch() construct entirely. Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-26net: core: add member wol_enabled to struct net_deviceHeiner Kallweit
Add flag wol_enabled to struct net_device indicating whether Wake-on-LAN is enabled. As first user phy_suspend() will use it to decide whether PHY can be suspended or not. Fixes: f1e911d5d0df ("r8169: add basic phylib support") Fixes: e8cfd9d6c772 ("net: phy: call state machine synchronously in phy_stop") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25erge tag 'libnvdimm-fixes-4.19-rc6' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Dan writes: "libnvdimm/dax for 4.19-rc6 * (2) fixes for the dax error handling updates that were merged for v4.19-rc1. My mails to Al have been bouncing recently, so I do not have his ack but the uaccess change is of the trivial / obviously correct variety. The address_space_operations fixes a regression. * A filesystem-dax fix to correct the zero page lookup to be compatible with non-x86 (mips and s390) architectures." * tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: Add missing address_space_operations uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe() filesystem-dax: Fix use of zero page
2018-09-25Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
Dave writes: "Networking fixes: 1) Fix multiqueue handling of coalesce timer in stmmac, from Jose Abreu. 2) Fix memory corruption in NFC, from Suren Baghdasaryan. 3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi. 4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun. 5) Fix TX done race in mvpp2, from Antoine Tenart. 6) ipv6 metrics leak, from Wei Wang. 7) Adjust firmware version requirements in mlxsw, from Petr Machata. 8) Fix autonegotiation on resume in r8169, from Heiner Kallweit. 9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff Barnhill. 10) Fix double free in devlink, from Dan Carpenter. 11) Fix ethtool regression from UFO feature removal, from Maciej Żenczykowski. 12) Fix drivers that have a ndo_poll_controller() that captures the cpu entirely on loaded hosts by trying to drain all rx and tx queues, from Eric Dumazet. 13) Fix memory corruption with jumbo frames in aquantia driver, from Friedemann Gerold." * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits) net: mvneta: fix the remaining Rx descriptor unmapping issues ip_tunnel: be careful when accessing the inner header mpls: allow routes on ip6gre devices net: aquantia: memory corruption on jumbo frames tun: remove ndo_poll_controller nfp: remove ndo_poll_controller bnxt: remove ndo_poll_controller bnx2x: remove ndo_poll_controller mlx5: remove ndo_poll_controller mlx4: remove ndo_poll_controller i40evf: remove ndo_poll_controller ice: remove ndo_poll_controller igb: remove ndo_poll_controller ixgb: remove ndo_poll_controller fm10k: remove ndo_poll_controller ixgbevf: remove ndo_poll_controller ixgbe: remove ndo_poll_controller bonding: use netpoll_poll_dev() helper netpoll: make ndo_poll_controller() optional rds: Fix build regression. ...