summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2019-05-07Merge tag 'for-5.2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This time the majority of changes are cleanups, though there's still a number of changes of user interest. User visible changes: - better read time and write checks to catch errors early and before writing data to disk (to catch potential memory corruption on data that get checksummed) - qgroups + metadata relocation: last speed up patch int the series to address the slowness, there should be no overhead comparing balance with and without qgroups - FIEMAP ioctl does not start a transaction unnecessarily, this can result in a speed up and less blocking due to IO - LOGICAL_INO (v1, v2) does not start transaction unnecessarily, this can speed up the mentioned ioctl and scrub as well - fsync on files with many (but not too many) hardlinks is faster, finer decision if the links should be fsynced individually or completely - send tries harder to find ranges to clone - trim/discard will skip unallocated chunks that haven't been touched since the last mount Fixes: - send flushes delayed allocation before start, otherwise it could miss some changes in case of a very recent rw->ro switch of a subvolume - fix fallocate with qgroups that could lead to space accounting underflow, reported as a warning - trim/discard ioctl honours the requested range - starting send and dedupe on a subvolume at the same time will let only one of them succeed, this is to prevent changes that send could miss due to dedupe; both operations are restartable Core changes: - more tree-checker validations, errors reported by fuzzing tools: - device item - inode item - block group profiles - tracepoints for extent buffer locking - async cow preallocates memory to avoid errors happening too deep in the call chain - metadata reservations for delalloc reworked to better adapt in many-writers/low-space scenarios - improved space flushing logic for intense DIO vs buffered workloads - lots of cleanups - removed unused struct members - redundant argument removal - properties and xattrs - extent buffer locking - selftests - use common file type conversions - many-argument functions reduction" * tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (227 commits) btrfs: Use kvmalloc for allocating compressed path context btrfs: Factor out common extent locking code in submit_compressed_extents btrfs: Set io_tree only once in submit_compressed_extents btrfs: Replace clear_extent_bit with unlock_extent btrfs: Make compress_file_range take only struct async_chunk btrfs: Remove fs_info from struct async_chunk btrfs: Rename async_cow to async_chunk btrfs: Preallocate chunks in cow_file_range_async btrfs: reserve delalloc metadata differently btrfs: track DIO bytes in flight btrfs: merge calls of btrfs_setxattr and btrfs_setxattr_trans in btrfs_set_prop btrfs: delete unused function btrfs_set_prop_trans btrfs: start transaction in xattr_handler_set_prop btrfs: drop local copy of inode i_mode btrfs: drop old_fsflags in btrfs_ioctl_setflags btrfs: modify local copy of btrfs_inode flags btrfs: drop useless inode i_flags copy and restore btrfs: start transaction in btrfs_ioctl_setflags() btrfs: export btrfs_set_prop btrfs: refactor btrfs_set_props to validate externally ...
2019-05-07Merge tag 'spi-v5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "One small feature was added this release but the bulk of the diffstat and the changelog comes from the fact that several older drivers got some fairly hefty reworks and a couple of new drivers were added: - Support for detailed control of timing around chip selects from Sowjanya Komatineni. - A big set of fixes and imrovements for the Tegra114 driver from Sowjanya Komatineni. - A big simplification of the GPIO driver from Andrey Smirnov. - DMA support and fixes for the Freescale LPSPI driver from Clark Wang. - Fixes and optimizations for the bcm2835aux from Martin Sparl. - New drivers for Mediatek MT7621 (graduated from staging) and Zynq QSPI" [ This is a so-called "evil merge" that additionally removes a warning due to an unused variable 'i' introduced by commit 1dfbf334f123 ("spi: ep93xx: Convert to use CS GPIO descriptors") - Linus ] * tag 'spi-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (127 commits) spi: rspi: Fix handling of QSPI code when transmit and receive spi: atmel-quadspi: fix crash while suspending spi: stm32: return the get_irq error spi: tegra114: fix PIO transfer spi: pxa2xx: fix SCR (divisor) calculation spi: Clear SPI_CS_HIGH flag from bad_bits for GPIO chip-select spi: ep93xx: Convert to use CS GPIO descriptors spi: AD ASoC: declare missing of table spi: spi-mem: zynq-qspi: Fix build error on architectures missing readsl/writesl spi: stm32-qspi: manage the get_irq error case spi/spi-bcm2835: Split transfers that exceed DLEN spi: expand mode support dt-bindings: spi: spi-mt65xx: add support for MT8516 spi: pxa2xx: Add support for Intel Comet Lake spi/trace: Cap buffer contents at 64 bytes spi: Release spi_res after finalizing message spi: Remove warning in spi_split_transfers_maxsize() spi: Remove one needless transfer speed fall back case spi: sh-msiof: Document r8a77470 bindings spi: pxa2xx: use a module softdep for dw_dmac ...
2019-05-07clone: add CLONE_PIDFDChristian Brauner
This patchset makes it possible to retrieve pid file descriptors at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. As spotted by Linus, there is exactly one bit for clone() left. CLONE_PIDFD creates file descriptors based on the anonymous inode implementation in the kernel that will also be used to implement the new mount api. They serve as a simple opaque handle on pids. Logically, this makes it possible to interpret a pidfd differently, narrowing or widening the scope of various operations (e.g. signal sending). Thus, a pidfd cannot just refer to a tgid, but also a tid, or in theory - given appropriate flag arguments in relevant syscalls - a process group or session. A pidfd does not represent a privilege. This does not imply it cannot ever be that way but for now this is not the case. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the parent_tidptr argument of clone. This has the advantage that we can give back the associated pid and the pidfd at the same time. To remove worries about missing metadata access this patchset comes with a sample program that illustrates how a combination of CLONE_PIDFD, and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. The sample program can easily be translated into a helper that would be suitable for inclusion in libc so that users don't have to worry about writing it themselves. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
2019-05-06Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add support for AEAD in simd - Add fuzz testing to testmgr - Add panic_on_fail module parameter to testmgr - Use per-CPU struct instead multiple variables in scompress - Change verify API for akcipher Algorithms: - Convert x86 AEAD algorithms over to simd - Forbid 2-key 3DES in FIPS mode - Add EC-RDSA (GOST 34.10) algorithm Drivers: - Set output IV with ctr-aes in crypto4xx - Set output IV in rockchip - Fix potential length overflow with hashing in sun4i-ss - Fix computation error with ctr in vmx - Add SM4 protected keys support in ccree - Remove long-broken mxc-scc driver - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits) crypto: ccree - use a proper le32 type for le32 val crypto: ccree - remove set but not used variable 'du_size' crypto: ccree - Make cc_sec_disable static crypto: ccree - fix spelling mistake "protedcted" -> "protected" crypto: caam/qi2 - generate hash keys in-place crypto: caam/qi2 - fix DMA mapping of stack memory crypto: caam/qi2 - fix zero-length buffer DMA mapping crypto: stm32/cryp - update to return iv_out crypto: stm32/cryp - remove request mutex protection crypto: stm32/cryp - add weak key check for DES crypto: atmel - remove set but not used variable 'alg_name' crypto: picoxcell - Use dev_get_drvdata() crypto: crypto4xx - get rid of redundant using_sd variable crypto: crypto4xx - use sync skcipher for fallback crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues crypto: crypto4xx - fix ctr-aes missing output IV crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o crypto: ccree - handle tee fips error during power management resume crypto: ccree - add function to handle cryptocell tee fips error ...
2019-05-06Merge tag 'pm-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These fix the (Intel-specific) Performance and Energy Bias Hint (EPB) handling and expose it to user space via sysfs, fix and clean up several cpufreq drivers, add support for two new chips to the qoriq cpufreq driver, fix, simplify and clean up the cpufreq core and the schedutil governor, add support for "CPU" domains to the generic power domains (genpd) framework and provide low-level PSCI firmware support for that feature, fix the exynos cpuidle driver and fix a couple of issues in the devfreq subsystem and clean it up. Specifics: - Fix the handling of Performance and Energy Bias Hint (EPB) on Intel processors and expose it to user space via sysfs to avoid having to access it through the generic MSR I/F (Rafael Wysocki). - Improve the handling of global turbo changes made by the platform firmware in the intel_pstate driver (Rafael Wysocki). - Convert some slow-path static_cpu_has() callers to boot_cpu_has() in cpufreq (Borislav Petkov). - Fix the frequency calculation loop in the armada-37xx cpufreq driver (Gregory CLEMENT). - Fix possible object reference leaks in multuple cpufreq drivers (Wen Yang). - Fix kerneldoc comment in the centrino cpufreq driver (dongjian). - Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan Kumar). - Add support for lx2160a and ls1028a to the qoriq cpufreq driver (Vabhav Sharma, Yuantian Tang). - Fix kobject memory leak in the cpufreq core (Viresh Kumar). - Simplify the IOwait boosting in the schedutil cpufreq governor and rework the TSC cpufreq notifier on x86 (Rafael Wysocki). - Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin). - Improve the cpufreq documentation, add SPDX license tags to some PM documentation files and unify copyright notices in them (Rafael Wysocki). - Add support for "CPU" domains to the generic power domains (genpd) framework and provide low-level PSCI firmware support for that feature (Ulf Hansson). - Rearrange the PSCI firmware support code and add support for SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla). - Improve genpd support for devices in multiple power domains (Ulf Hansson). - Unify target residency for the AFTR and coupled AFTR states in the exynos cpuidle driver (Marek Szyprowski). - Introduce new helper routine in the operating performance points (OPP) framework (Andrew-sh.Cheng). - Add support for passing on-die termination (ODT) and auto power down parameters from the kernel to Trusted Firmware-A (TF-A) to the rk3399_dmc devfreq driver (Enric Balletbo i Serra). - Add tracing to devfreq (Lukasz Luba). - Make the exynos-bus devfreq driver suspend all devices on system shutdown (Marek Szyprowski). - Fix a few minor issues in the devfreq subsystem and clean it up somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring, Saravana Kannan, Yangtao Li). - Improve system wakeup diagnostics (Stephen Boyd). - Rework filesystem sync messages emitted during system suspend and hibernation (Harry Pan)" * tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits) cpufreq: Fix kobject memleak cpufreq: armada-37xx: fix frequency calculation for opp cpufreq: centrino: Fix centrino_setpolicy() kerneldoc comment cpufreq: qoriq: add support for lx2160a x86: tsc: Rework time_cpufreq_notifier() PM / Domains: Allow to attach a CPU via genpd_dev_pm_attach_by_id|name() PM / Domains: Search for the CPU device outside the genpd lock PM / Domains: Drop unused in-parameter to some genpd functions PM / Domains: Use the base device for driver_deferred_probe_check_state() cpufreq: qoriq: Add ls1028a chip support PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain PM / Domains: Allow OF lookup for multi PM domain case from ->attach_dev() PM / Domains: Don't kfree() the virtual device in the error path cpufreq: Move ->get callback check outside of __cpufreq_get() PM / Domains: remove unnecessary unlikely() cpufreq: Remove needless bios_limit check in show_bios_limit() drivers/cpufreq/acpi-cpufreq.c: This fixes the following checkpatch warning firmware/psci: add support for SYSTEM_RESET2 PM / devfreq: add tracing for scheduling work trace: events: add devfreq trace event file ...
2019-05-06Merge tag 's390-5.2-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - Support for kernel address space layout randomization - Add support for kernel image signature verification - Convert s390 to the generic get_user_pages_fast code - Convert s390 to the stack unwind API analog to x86 - Add support for CPU directed interrupts for PCI devices - Provide support for MIO instructions to the PCI base layer, this will allow the use of direct PCI mappings in user space code - Add the basic KVM guest ultravisor interface for protected VMs - Add AT_HWCAP bits for several new hardware capabilities - Update the CPU measurement facility counter definitions to SVN 6 - Arnds cleanup patches for his quest to get LLVM compiles working - A vfio-ccw update with bug fixes and support for halt and clear - Improvements for the hardware TRNG code - Another round of cleanup for the QDIO layer - Numerous cleanups and bug fixes * tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (98 commits) s390/vdso: drop unnecessary cc-ldoption s390: fix clang -Wpointer-sign warnigns in boot code s390: drop CONFIG_VIRT_TO_BUS s390: boot, purgatory: pass $(CLANG_FLAGS) where needed s390: only build for new CPUs with clang s390: simplify disabled_wait s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR s390/unwind: introduce stack unwind API s390/opcodes: add missing instructions to the disassembler s390/bug: add entry size to the __bug_table section s390: use proper expoline sections for .dma code s390/nospec: rename assembler generated expoline thunks s390: add missing ENDPROC statements to assembler functions locking/lockdep: check for freed initmem in static_obj() s390/kernel: add support for kernel address space layout randomization (KASLR) s390/kernel: introduce .dma sections s390/sclp: do not use static sccbs s390/kprobes: use static buffer for insn_page s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad s390/kernel: build a relocatable kernel ...
2019-05-06Merge branches 'pm-docs' and 'pm-misc'Rafael J. Wysocki
* pm-docs: Documentation: PM: Unify copyright notices Documentation: PM: Add SPDX license tags to multiple files cpufreq: intel_pstate: Documentation: Add references sections * pm-misc: firmware/psci: add support for SYSTEM_RESET2 drivers: firmware: psci: Announce support for OS initiated suspend mode drivers: firmware: psci: Simplify error path of psci_dt_init() drivers: firmware: psci: Split psci_dt_cpu_init_idle() MAINTAINERS: Update files for PSCI drivers: firmware: psci: Move psci to separate directory
2019-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: =================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next, they are: 1) Move nft_expr_clone() to nft_dynset, from Paul Gortmaker. 2) Do not include module.h from net/netfilter/nf_tables.h, also from Paul. 3) Restrict conntrack sysctl entries to boolean, from Tonghao Zhang. 4) Several patches to add infrastructure to autoload NAT helper modules from their respective conntrack helper, this also includes the first client of this code in OVS, patches from Flavio Leitner. 5) Add support to match for conntrack ID, from Brett Mastbergen. 6) Spelling fix in connlabel, from Colin Ian King. 7) Use struct_size() from hashlimit, from Gustavo A. R. Silva. 8) Add optimized version of nf_inet_addr_mask(), from Li RongQing. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04Merge tag 'mlx5-updates-2019-04-30' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-04-30 mlx5 misc updates: 1) Bodong Wang and Parav Pandit (6): - Remove unused mlx5_query_nic_vport_vlans - vport macros refactoring - Fix vport access in E-Switch - Use atomic rep state to serialize state change 2) Eli Britstein (2): - prio tag mode support, added ACLs and replace TC vlan pop with vlan 0 rewrite when prio tag mode is enabled. 3) Erez Alfasi (2): - ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions - mlx5e: ethtool, Add support for EEPROM high pages query 4) Masahiro Yamada (1): - remove meaningless CFLAGS_tracepoint.o 5) Maxim Mikityanskiy (1): - Put the common XDP code into a function 6) Tariq Toukan (2): - Turn on HW tunnel offload in all TIRs 7) Vlad Buslov (1): - Return error when trying to insert existing flower filter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03ether: Add dedicated Ethertype for pseudo-802.1Q DSA taggingVladimir Oltean
There are two possible utilizations so far: - Switch devices that don't support a native insertion/extraction header on the CPU port may still enjoy the benefits of port isolation with a custom VLAN tag. For this, they need to have a customizable TPID in hardware and a new Ethertype to distinguish between real 802.1Q traffic and the private tags used for port separation. - Switches that don't support the deactivation of VLAN awareness, but still want to have a mode in which they accept all traffic, including frames that are tagged with a VLAN not configured on their ports, may use this as a fake to trick the hardware into thinking that the TPID for VLAN is something other than 0x8100. What follows after the ETH_P_DSA_8021Q EtherType is a regular VLAN header (TCI), however there is no other EtherType that can be used for this purpose and doesn't already have a well-defined meaning. ETH_P_8021AD, ETH_P_QINQ1, ETH_P_QINQ2 and ETH_P_QINQ3 expect that another follow-up VLAN tag is present, which is not the case here. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02io_uring: add support for eventfd notificationsJens Axboe
Allow registration of an eventfd, which will trigger an event every time a completion event happens for this io_uring instance. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for IORING_OP_SYNC_FILE_RANGEJens Axboe
This behaves just like sync_file_range(2) does. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for marking commands as drainingJens Axboe
There are no ordering constraints between the submission and completion side of io_uring. But sometimes that would be useful to have. One common example is doing an fsync, for instance, and have it ordered with previous writes. Without support for that, the application must do this tracking itself. This adds a general SQE flag, IOSQE_IO_DRAIN. If a command is marked with this flag, then it will not be issued before previous commands have completed, and subsequent commands submitted after the drain will not be issued before the drain is started.. If there are no pending commands, setting this flag will not change the behavior of the issue of the command. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02Merge branch 'spi-5.2' into spi-nextMark Brown
2019-05-01ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitionsErez Alfasi
Added max EEPROM length defines for ethtool usage: #define ETH_MODULE_SFF_8636_MAX_LEN 640 #define ETH_MODULE_SFF_8436_MAX_LEN 640 These definitions are used to determine the EEPROM data length when reading high eeprom pages. For example, SFF-8636 EEPROM data from page 03h needs to be stored at data[512] - data[639]. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01taprio: Add support for cycle-time-extensionVinicius Costa Gomes
IEEE 802.1Q-2018 defines the concept of a cycle-time-extension, so the last entry of a schedule before the start of a new schedule can be extended, so "too-short" entries can be avoided. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01taprio: Add support for setting the cycle-time manuallyVinicius Costa Gomes
IEEE 802.1Q-2018 defines that a the cycle-time of a schedule may be overridden, so the schedule is truncated to a determined "width". Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01taprio: Add support adding an admin scheduleVinicius Costa Gomes
The IEEE 802.1Q-2018 defines two "types" of schedules, the "Oper" (from operational?) and "Admin" ones. Up until now, 'taprio' only had support for the "Oper" one, added when the qdisc is created. This adds support for the "Admin" one, which allows the .change() operation to be supported. Just for clarification, some quick (and dirty) definitions, the "Oper" schedule is the currently (as in this instant) running one, and it's read-only. The "Admin" one is the one that the system configurator has installed, it can be changed, and it will be "promoted" to "Oper" when it's 'base-time' is reached. The idea behing this patch is that calling something like the below, (after taprio is already configured with an initial schedule): $ tc qdisc change taprio dev IFACE parent root \ base-time X \ sched-entry <CMD> <GATES> <INTERVAL> \ ... Will cause a new admin schedule to be created and programmed to be "promoted" to "Oper" at instant X. If an "Admin" schedule already exists, it will be overwritten with the new parameters. Up until now, there was some code that was added to ease the support of changing a single entry of a schedule, but was ultimately unused. Now, that we have support for "change" with more well thought semantics, updating a single entry seems to be less useful. So we remove what is in practice dead code, and return a "not supported" error if the user tries to use it. If changing a single entry would make the user's life easier we may ressurrect this idea, but at this point, removing it simplifies the code. For now, only the schedule specific bits are allowed to be added for a new schedule, that means that 'clockid', 'num_tc', 'map' and 'queues' cannot be modified. Example: $ tc qdisc change dev IFACE parent root handle 100 taprio \ base-time $BASE_TIME \ sched-entry S 00 500000 \ sched-entry S 0f 500000 \ clockid CLOCK_TAI The only change in the netlink API introduced by this change is the introduction of an "admin" type in the response to a dump request, that type allows userspace to separate the "oper" schedule from the "admin" schedule. If userspace doesn't support the "admin" type, it will only display the "oper" schedule. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-30sed-opal.h: remove redundant licence boilerplateChristoph Hellwig
The file already has the correct SPDX header. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30netfilter: nft_ct: Add ct id supportBrett Mastbergen
The 'id' key returns the unique id of the conntrack entry as returned by nf_ct_get_id(). Signed-off-by: Brett Mastbergen <bmastbergen@untangle.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-30KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVECédric Le Goater
The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to let QEMU connect the vCPU presenters to the XIVE KVM device if required. The capability is not advertised for now as the full support for the XIVE native exploitation mode is not yet available. When this is case, the capability will be advertised on PowerNV Hypervisors only. Nested guests (pseries KVM Hypervisor) are not supported. Internally, the interface to the new KVM device is protected with a new interrupt mode: KVMPPC_IRQ_XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-30KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation modeCédric Le Goater
This is the basic framework for the new KVM device supporting the XIVE native exploitation mode. The user interface exposes a new KVM device to be created by QEMU, only available when running on a L0 hypervisor. Support for nested guests is not available yet. The XIVE device reuses the device structure of the XICS-on-XIVE device as they have a lot in common. That could possibly change in the future if the need arise. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-29btrfs: use common file type conversionPhillip Potter
Deduplicate the btrfs file type conversion implementation - file systems that use the same file types as defined by POSIX do not need to define their own versions and can use the common helper functions decared in fs_types.h and implemented in fs_types.c Common implementation can be found via commit: bbe7449e2599 "fs: common implementation of file type" Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29tty: serial: add driver for the SiFive UARTPaul Walmsley
Add a serial driver for the SiFive UART, found on SiFive FU540 devices (among others). The underlying serial IP block is relatively basic, and currently does not support serial break detection. Further information on the IP block can be found in the documentation and Chisel sources: https://static.dev.sifive.com/FU540-C000-v1.0.pdf https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/uart This driver was written in collaboration with Wesley Terpstra <wesley@sifive.com>. Tested on a SiFive HiFive Unleashed A00 board, using BBL and the open- source FSBL (using a DT file based on what's targeted for mainline). This revision incorporates changes based on comments by Julia Lawall <julia.lawall@lip6.fr>, Emil Renner Berthing <kernel@esmil.dk>, and Andreas Schwab <schwab@suse.de>. Thanks also to Andreas for testing the driver with his userspace and reporting a bug with the set_termios implementation. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Paul Walmsley <paul@pwsan.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Wesley Terpstra <wesley@sifive.com> Cc: linux-serial@vger.kernel.org Cc: linux-riscv@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Andreas Schwab <schwab@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-04-28 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Introduce BPF socket local storage map so that BPF programs can store private data they associate with a socket (instead of e.g. separate hash table), from Martin. 2) Add support for bpftool to dump BTF types. This is done through a new `bpftool btf dump` sub-command, from Andrii. 3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which was currently not supported since skb was used to lookup netns, from Stanislav. 4) Add an opt-in interface for tracepoints to expose a writable context for attached BPF programs, used here for NBD sockets, from Matt. 5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel. 6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to support tunnels such as sit. Add selftests as well, from Willem. 7) Various smaller misc fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27bpf: Introduce bpf sk local storageMartin KaFai Lau
After allowing a bpf prog to - directly read the skb->sk ptr - get the fullsock bpf_sock by "bpf_sk_fullsock()" - get the bpf_tcp_sock by "bpf_tcp_sock()" - get the listener sock by "bpf_get_listener_sock()" - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running context. this patch is another effort to make bpf's network programming more intuitive to do (together with memory and performance benefit). When bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuples (src/dst ip/port) as the key. If multiple bpf progs require to store different sk data, multiple maps have to be defined. Hence, wasting memory to store the duplicated keys (i.e. 4 tuples here) in each of the bpf map. [ The smallest key could be the sk pointer itself which requires some enhancement in the verifier and it is a separate topic. ] Also, the bpf prog needs to clean up the elem when sk is freed. Otherwise, the bpf map will become full and un-usable quickly. The sk-free tracking currently could be done during sk state transition (e.g. BPF_SOCK_OPS_STATE_CB). The size of the map needs to be predefined which then usually ended-up with an over-provisioned map in production. Even the map was re-sizable, while the sk naturally come and go away already, this potential re-size operation is arguably redundant if the data can be directly connected to the sk itself instead of proxy-ing through a bpf map. This patch introduces sk->sk_bpf_storage to provide local storage space at sk for bpf prog to use. The space will be allocated when the first bpf prog has created data for this particular sk. The design optimizes the bpf prog's lookup (and then optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected. BPF_MAP_TYPE_SK_STORAGE: ----------------------- To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wise sk dump (e.g. "ss -ta") in the future. The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete a "sk-local-storage" data from a particular sk. Think of the map as a meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk. The main purposes of this map are mostly: 1. Define the size of a "sk-local-storage" type. 2. Provide a similar syscall userspace API as the map (e.g. lookup/update, map-id, map-btf...etc.) 3. Keep track of all sk's storages of this "type" and clean them up when the map is freed. sk->sk_bpf_storage: ------------------ The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search on the sk_storage->list. The "map" pointer is actually serving as the "type" of the "sk-local-storage" that is being requested. To allow very fast lookup, it should be as fast as looking up an array at a stable-offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "type" that the system can have. Hence, this patch takes a cache approach. The last search result from sk_storage->list is cached in sk_storage->cache[] which is a stable sized array. Each "sk-local-storage" type has a stable offset to the cache[] array. In the future, a map's flag could be introduced to do cache opt-out/enforcement if it became necessary. The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot. The bpf_prog should have already consolidated the existing sock-key-ed map usage to minimize the map lookup penalty. 16 has enough runway to grow. All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction. bpf_sk_storage_get() and bpf_sk_storage_delete(): ------------------------------------------------ Instead of using bpf_map_(lookup|update|delete)_elem(), the bpf prog needs to use the new helper bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to "create" new elem if one does not exist in the sk. It is done by the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE. The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, it has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch. Misc notes: ---------- 1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned-file or map-id. Since btf is enforced, the existing "ss" could be enhanced to pretty print the local-storage. Supporting a kernel defined btf with 4 tuples as the return key could be explored later also. 2. The sk->sk_lock cannot be acquired. Atomic operations is used instead. e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for the details in synchronization cases and considerations. 3. The mem is charged to the sk->sk_omem_alloc as the sk filter does. Benchmark: --------- Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested: One bpf prog with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key. (verifier is modified to support sk ptr as the key That should have shortened the key lookup time.) Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE. Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb and then bump the cnt. netperf is used to drive data with 4096 connected UDP sockets. BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run) 27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633 loaded_at 2019-04-15T13:46:39-0700 uid 0 xlated 344B jited 258B memlock 4096B map_ids 16 btf_id 5 BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run) 30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739 loaded_at 2019-04-15T13:47:54-0700 uid 0 xlated 168B jited 156B memlock 4096B map_ids 17 btf_id 6 Here is a high-level picture on how are the objects organized: sk ┌──────┐ │ │ │ │ │ │ │*sk_bpf_storage─────▶ bpf_sk_storage └──────┘ ┌───────┐ ┌───────────┤ list │ │ │ │ │ │ │ │ │ │ │ └───────┘ │ │ elem │ ┌────────┐ ├─▶│ snode │ │ ├────────┤ │ │ data │ bpf_map │ ├────────┤ ┌─────────┐ │ │map_node│◀─┬─────┤ list │ │ └────────┘ │ │ │ │ │ │ │ │ elem │ │ │ │ ┌────────┐ │ └─────────┘ └─▶│ snode │ │ ├────────┤ │ bpf_map │ data │ │ ┌─────────┐ ├────────┤ │ │ list ├───────▶│map_node│ │ │ │ └────────┘ │ │ │ │ │ │ elem │ └─────────┘ ┌────────┐ │ ┌─▶│ snode │ │ │ ├────────┤ │ │ │ data │ │ │ ├────────┤ │ │ │map_node│◀─┘ │ └────────┘ │ │ │ ┌───────┐ sk └──────────│ list │ ┌──────┐ │ │ │ │ │ │ │ │ │ │ │ │ └───────┘ │*sk_bpf_storage───────▶bpf_sk_storage └──────┘ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26bpf: add writable context for raw tracepointsMatt Mullins
This is an opt-in interface that allows a tracepoint to provide a safe buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program. The size of the buffer must be a compile-time constant, and is checked before allowing a BPF program to attach to a tracepoint that uses this feature. The pointer to this buffer will be the first argument of tracepoints that opt in; the pointer is valid and can be bpf_probe_read() by both BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a tracepoint, but the buffer to which it points may only be written by the latter. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26Input: add KEY_KBD_LAYOUT_NEXTDmitry Torokhov
The HID usage tables define a key to cycle through a set of keyboard layouts, let's add corresponding keycode. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-04-26Merge tag 'mac80211-next-for-davem-2019-04-26' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Various updates, notably: * extended key ID support (from 802.11-2016) * per-STA TX power control support * mac80211 TX performance improvements * HE (802.11ax) updates * mesh link probing support * enhancements of multi-BSSID support (also related to HE) * OWE userspace processing support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26cfg80211: add support to probe unexercised mesh linkRajkumar Manoharan
Adding support to allow mesh HWMP to measure link metrics on unexercised direct mesh path by sending some data frames to other mesh points which are not currently selected as a primary traffic path but only 1 hop away. The absence of the primary path to the chosen node makes it necessary to apply some form of marking on a chosen packet stream so that the packets can be properly steered to the selected node for testing, and not by the regular mesh path lookup. Tested-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Rajkumar Manoharan <rmanohar@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26cfg80211: Add support to set tx power for a station associatedAshok Raj Nagarajan
This patch adds support to set transmit power setting type and transmit power level attributes to NL80211_CMD_SET_STATION in order to facilitate adjusting the transmit power level of a station associated to the AP. The added attributes allow selection of automatic and limited transmit power level, with the level defined in dBm format. Co-developed-by: Balaji Pothunoori <bpothuno@codeaurora.org> Signed-off-by: Ashok Raj Nagarajan <arnagara@codeaurora.org> Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26nl80211/cfg80211: Extended Key ID supportAlexander Wetzel
Add support for IEEE 802.11-2016 "Extended Key ID for Individually Addressed Frames". Extend cfg80211 and nl80211 to allow pairwise keys to be installed for Rx only, enable Tx separately and allow Key ID 1 for pairwise keys. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> [use NLA_POLICY_RANGE() for NL80211_KEY_MODE] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26nl80211: increase NL80211_MAX_SUPP_REG_RULESShaul Triebitz
The iwlwifi driver creates one rule per channel, thus it needs more rules than normal. To solve this, increase NL80211_MAX_SUPP_REG_RULES so iwlwifi can also fit UHB (ultra high band) channels. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25NFS: Move internal constants out of uapi/linux/nfs_mount.hTrond Myklebust
When the label says "for internal use only", then it doesn't belong in the 'uapi' subtree. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25drivers/misc: Add Aspeed P2A control driverPatrick Venture
The ASPEED AST2400, and AST2500 in some configurations include a PCI-to-AHB MMIO bridge. This bridge allows a server to read and write in the BMC's physical address space. This feature is especially useful when using this bridge to send large files to the BMC. The host may use this to send down a firmware image by staging data at a specific memory address, and in a coordinated effort with the BMC's software stack and kernel, transmit the bytes. This driver enables the BMC to unlock the PCI bridge on demand, and configure it via ioctl to allow the host to write bytes to an agreed upon location. In the primary use-case, the region to use is known apriori on the BMC, and the host requests this information. Once this request is received, the BMC's software stack will enable the bridge and the region and then using some software flow control (possibly via IPMI packets), copy the bytes down. Once the process is complete, the BMC will disable the bridge and unset any region involved. The default behavior of this bridge when present is: enabled and all regions marked read-write. This driver will fix the regions to be read-only and then disable the bridge entirely. The memory regions protected are: * BMC flash MMIO window * System flash MMIO windows * SOC IO (peripheral MMIO) * DRAM The DRAM region itself is all of DRAM and cannot be further specified. Once the PCI bridge is enabled, the host can read all of DRAM, and if the DRAM section is write-enabled, then it can write to all of it. Signed-off-by: Patrick Venture <venture@google.com> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25media: v4l: Add definitions for missing 16-bit RGB555 formatsLaurent Pinchart
The V4L2 API is missing the 16-bit RGB555 formats for the RGBA, RGBX, ABGR, XBGR, BGRA and BGRX component orders. Add them, using the same 4CCs as DRM. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Jacopo Mondi <jacopo@jmondi.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-04-25media: v4l: Add definitions for missing 16-bit RGB4444 formatsLaurent Pinchart
The V4L2 API is missing the 16-bit RGB4444 formats for the RGBA, RGBX, ABGR, XBGR, BGRA and BGRX component orders. Add them, using the same 4CCs as DRM. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Jacopo Mondi <jacopo@jmondi.org> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-04-25media: v4l: Add definitions for missing 32-bit RGB formatsLaurent Pinchart
The V4L2 API is missing the 32-bit RGB formats for the ABGR, XBGR, RGBA and RGBX component orders. Add them, using the same 4CCs as DRM. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Jacopo Mondi <jacopo@jmondi.org> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-04-24fuse: Add ioctl flag for x32 compat ioctlIan Abbott
Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl request comes from a process running a 32-bit ABI, but cannot tell whether the requesting process is using legacy IA32 emulation or x32 ABI. In particular, the server does not know the size of the client process's `time_t` type. For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags are currently set in the ioctl input request (`struct fuse_ioctl_in` member `flags`) for a 32-bit requesting process. This patch defines a new flag `FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is using the x32 ABI. This allows the server process to distinguish between requests coming from client processes using IA32 emulation or the x32 ABI and so infer the size of the client process's `time_t` type and any other IA32/x32 differences. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: fix changelog entry for protocol 7.9Alan Somers
Retroactively add changelog entry for the atime and mtime "now" flags. This was an oversight in commit 17637cbaba59 ("fuse: improve utimes support"). Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: fix changelog entry for protocol 7.12Alan Somers
This was a mistake in the comment in commit e0a43ddcc08c ("fuse: allow umask processing in userspace"). Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: document fuse_fsync_in.fsync_flagsAlan Somers
The FUSE_FSYNC_DATASYNC flag was introduced by commit b6aeadeda22a ("[PATCH] FUSE - file operations") as a magic number. No new values have been added to fsync_flags since. Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: Add FOPEN_STREAM to use stream_open()Kirill Smelkov
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24fuse: allow filesystems to have precise control over data cacheKirill Smelkov
On networked filesystems file data can be changed externally. FUSE provides notification messages for filesystem to inform kernel that metadata or data region of a file needs to be invalidated in local page cache. That provides the basis for filesystem implementations to invalidate kernel cache explicitly based on observed filesystem-specific events. FUSE has also "automatic" invalidation mode(*) when the kernel automatically invalidates data cache of a file if it sees mtime change. It also automatically invalidates whole data cache of a file if it sees file size being changed. The automatic mode has corresponding capability - FUSE_AUTO_INVAL_DATA. However, due to probably historical reason, that capability controls only whether mtime change should be resulting in automatic invalidation or not. A change in file size always results in invalidating whole data cache of a file irregardless of whether FUSE_AUTO_INVAL_DATA was negotiated(+). The filesystem I write[1] represents data arrays stored in networked database as local files suitable for mmap. It is read-only filesystem - changes to data are committed externally via database interfaces and the filesystem only glues data into contiguous file streams suitable for mmap and traditional array processing. The files are big - starting from hundreds gigabytes and more. The files change regularly, and frequently by data being appended to their end. The size of files thus changes frequently. If a file was accessed locally and some part of its data got into page cache, we want that data to stay cached unless there is memory pressure, or unless corresponding part of the file was actually changed. However current FUSE behaviour - when it sees file size change - is to invalidate the whole file. The data cache of the file is thus completely lost even on small size change, and despite that the filesystem server is careful to accurately translate database changes into FUSE invalidation messages to kernel. Let's fix it: if a filesystem, through new FUSE_EXPLICIT_INVAL_DATA capability, indicates to kernel that it is fully responsible for data cache invalidation, then the kernel won't invalidate files data cache on size change and only truncate that cache to new size in case the size decreased. (*) see 72d0d248ca "fuse: add FUSE_AUTO_INVAL_DATA init flag", eed2179efe "fuse: invalidate inode mapping if mtime changes" (+) in writeback mode the kernel does not invalidate data cache on file size change, but neither it allows the filesystem to set the size due to external event (see 8373200b12 "fuse: Trust kernel i_size only") [1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20 Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-04-24KVM: arm64: Add capability to advertise ptrauth for guestAmit Daniel Kachhap
This patch advertises the capability of two cpu feature called address pointer authentication and generic pointer authentication. These capabilities depend upon system support for pointer authentication and VHE mode. The current arm64 KVM partially implements pointer authentication and support of address/generic authentication are tied together. However, separate ABI requirements for both of them is added so that any future isolated implementation will not require any ABI changes. Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24nfsd: un-deprecate nfsdcldScott Mayhew
When nfsdcld was released, it was quickly deprecated in favor of the nfsdcltrack usermodehelper, so as to not require another running daemon. That prevents NFSv4 clients from reclaiming locks from nfsd's running in containers, since neither nfsdcltrack nor the legacy client tracking code work in containers. This commit un-deprecates the use of nfsdcld, with one twist: we will populate the reclaim_str_hashtbl on startup. During client tracking initialization, do an upcall ("GraceStart") to nfsdcld to get a list of clients from the database. nfsdcld will do one downcall with a status of -EINPROGRESS for each client record in the database, which in turn will cause an nfs4_client_reclaim to be added to the reclaim_str_hashtbl. When complete, nfsdcld will do a final downcall with a status of 0. This will save nfsd from having to do an upcall to the daemon during nfs4_check_open_reclaim() processing. Even though nfsdcld was quickly deprecated, there is a very small chance of old nfsdcld daemons running in the wild. These will respond to the new "GraceStart" upcall with -EOPNOTSUPP, in which case we will log a message and fall back to the original nfsdcld tracking ops (now called nfsd4_cld_tracking_ops_v0). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24vfio-ccw: add handling for async channel instructionsCornelia Huck
Add a region to the vfio-ccw device that can be used to submit asynchronous I/O instructions. ssch continues to be handled by the existing I/O region; the new region handles hsch and csch. Interrupt status continues to be reported through the same channels as for ssch. Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-24vfio-ccw: add capabilities chainCornelia Huck
Allow to extend the regions used by vfio-ccw. The first user will be handling of halt and clear subchannel. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-24Merge tag 'drm-misc-next-2019-04-18' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.2: UAPI Changes: - Document which feature flags belong to which command in virtio_gpu.h - Make the FB_DAMAGE_CLIPS available for atomic userspace only, it's useless for legacy. Cross-subsystem Changes: - Add device tree bindings for lg,acx467akm-7 panel and ST-Ericsson Multi Channel Display Engine MCDE - Add parameters to the device tree bindings for tfp410 - iommu/io-pgtable: Add ARM Mali midgard MMU page table format - dma-buf: Only do a 64-bits seqno compare when driver explicitly asks for it, else wraparound. - Use the 64-bits compare for dma-fence-chains Core Changes: - Make the fb conversion functions use __iomem dst. - Rename drm_client_add to drm_client_register - Move intel_fb_initial_config to core. - Add a drm_gem_objects_lookup helper - Add drm_gem_fence_array helpers, and use it in lima. - Add drm_format_helper.c to kerneldoc. Driver Changes: - Add panfrost driver for mali midgard/bitfrost. - Converts bochs to use the simple display type. - Small fixes to sun4i, tinydrm, ti-fp410. - Fid aspeed's Kconfig options. - Make some symbols/functions static in lima, sun4i and meson. - Add a driver for the lg,acx467akm-7 panel. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/737ad994-213d-45b5-207a-b99d795acd21@linux.intel.com