summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
2019-09-16ethtool: implement Energy Detect Powerdown support via phy-tunableAlexandru Ardelean
The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like this feature is common across other PHYs (like EEE), and defining `ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long. The way EDPD works, is that the RX block is put to a lower power mode, except for link-pulse detection circuits. The TX block is also put to low power mode, but the PHY wakes-up periodically to send link pulses, to avoid lock-ups in case the other side is also in EDPD mode. Currently, there are 2 PHY drivers that look like they could use this new PHY tunable feature: the `adin` && `micrel` PHYs. The ADIN's datasheet mentions that TX pulses are at intervals of 1 second default each, and they can be disabled. For the Micrel KSZ9031 PHY, the datasheet does not mention whether they can be disabled, but mentions that they can modified. The way this change is structured, is similar to the PHY tunable downshift control: * a `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default TX interval; some PHYs could specify a certain value that makes sense * `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled * `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD As noted by the `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` the interval unit is 1 millisecond, which should cover a reasonable range of intervals: - from 1 millisecond, which does not sound like much of a power-saver - to ~65 seconds which is quite a lot to wait for a link to come up when plugging a cable Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16taprio: Add support for hardware offloadingVinicius Costa Gomes
This allows taprio to offload the schedule enforcement to capable network cards, resulting in more precise windows and less CPU usage. The gate mask acts on traffic classes (groups of queues of same priority), as specified in IEEE 802.1Q-2018, and following the existing taprio and mqprio semantics. It is up to the driver to perform conversion between tc and individual netdev queues if for some reason it needs to make that distinction. Full offload is requested from the network interface by specifying "flags 2" in the tc qdisc creation command, which in turn corresponds to the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit. The important detail here is the clockid which is implicitly /dev/ptpN for full offload, and hence not configurable. A reference counting API is added to support the use case where Ethernet drivers need to keep the taprio offload structure locally (i.e. they are a multi-port switch driver, and configuring a port depends on the settings of other ports as well). The refcount_t variable is kept in a private structure (__tc_taprio_qopt_offload) and not exposed to drivers. In the future, the private structure might also be expanded with a backpointer to taprio_sched *q, to implement the notification system described in the patch (of when admin became oper, or an error occurred, etc, so the offload can be monitored with 'tc qdisc show'). Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge tag 'core-process-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd/waitid updates from Christian Brauner: "This contains two features and various tests. First, it adds support for waiting on process through pidfds by adding the P_PIDFD type to the waitid() syscall. This completes the basic functionality of the pidfd api (cf. [1]). In the meantime we also have a new adition to the userspace projects that make use of the pidfd api. The qt project was nice enough to send a mail pointing out that they have a pr up to switch to the pidfd api (cf. [2]). Second, this tag contains an extension to the waitid() syscall to make it possible to wait on the current process group in a race free manner (even though the actual problem is very unlikely) by specifing 0 together with the P_PGID type. This extension traces back to a discussion on the glibc development mailing list. There are also a range of tests for the features above. Additionally, the test-suite which detected the pidfd-polling race we fixed in [3] is included in this tag" [1] https://lwn.net/Articles/794707/ [2] https://codereview.qt-project.org/c/qt/qtbase/+/108456 [3] commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state") * tag 'core-process-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: waitid: Add support for waiting for the current process group tests: add pidfd poll tests tests: move common definitions and functions into pidfd.h pidfd: add pidfd_wait tests pidfd: add P_PIDFD to waitid()
2019-09-16tcp: Add snd_wnd to TCP_INFOThomas Higdon
Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP performance problems -- > (1) Usually when we're diagnosing TCP performance problems, we do so > from the sender, since the sender makes most of the > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc). > From the sender-side the thing that would be most useful is to see > tp->snd_wnd, the receive window that the receiver has advertised to > the sender. This serves the purpose of adding an additional __u32 to avoid the would-be hole caused by the addition of the tcpi_rcvi_ooopack field. Signed-off-by: Thomas Higdon <tph@fb.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16tcp: Add TCP_INFO counter for packets received out-of-orderThomas Higdon
For receive-heavy cases on the server-side, we want to track the connection quality for individual client IPs. This counter, similar to the existing system-wide TCPOFOQueue counter in /proc/net/netstat, tracks out-of-order packet reception. By providing this counter in TCP_INFO, it will allow understanding to what degree receive-heavy sockets are experiencing out-of-order delivery and packet drops indicating congestion. Please note that this is similar to the counter in NetBSD TCP_INFO, and has the same name. Also note that we avoid increasing the size of the tcp_sock struct by taking advantage of a hole. Signed-off-by: Thomas Higdon <tph@fb.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16dm: introduce DM_GET_TARGET_VERSIONMikulas Patocka
This commit introduces a new ioctl DM_GET_TARGET_VERSION. It will load a target that is specified in the "name" entry in the parameter structure and return its version. This functionality is intended to be used by cryptsetup, so that it can query kernel capabilities before activating the device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't corrupt xfrm_interface parms before validation, from Nicolas Dichtel. 2) Revert use of usb-wakeup in btusb, from Mario Limonciello. 3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from Leonardo Bras. 4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso. 5) Missing ULP check in sock_map, from John Fastabend. 6) Fix receive statistic handling in forcedeth, from Zhu Yanjun. 7) Fix length of SKB allocated in 6pack driver, from Christophe JAILLET. 8) ip6_route_info_create() returns an error pointer, not NULL. From Maciej Żenczykowski. 9) Only add RDS sock to the hashes after rs_transport is set, from Ka-Cheong Poon. 10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets. 11) Presence of transmit IPSEC offload in an SKB is not tested for correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff Kirsher. 12) Need rcu_barrier() when register_netdevice() takes one of the notifier based failure paths, from Subash Abhinov Kasiviswanathan. 13) Fix leak in sctp_do_bind(), from Mao Wenan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) cdc_ether: fix rndis support for Mediatek based smartphones sctp: destroy bucket if failed to bind addr sctp: remove redundant assignment when call sctp_get_port_local sctp: change return type of sctp_get_port_local ixgbevf: Fix secpath usage for IPsec Tx offload sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' ixgbe: Fix secpath usage for IPsec TX offload. net: qrtr: fix memort leak in qrtr_tun_write_iter net: Fix null de-reference of device refcount ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' tun: fix use-after-free when register netdev failed tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR ixgbe: fix double clean of Tx descriptors with xdp ixgbe: Prevent u8 wrapping of ITR value to something less than 10us mlx4: fix spelling mistake "veify" -> "verify" net: hns3: fix spelling mistake "undeflow" -> "underflow" net: lmc: fix spelling mistake "runnin" -> "running" NFC: st95hf: fix spelling mistake "receieve" -> "receive" net/rds: An rds_sock is added too early to the hash table mac80211: Do not send Layer 2 Update frame before authorization ...
2019-09-13drm/amdgpu: Extends amdgpu vm definitions (v2)Oak Zeng
Add RW mtype introduced for arcturus. v2: * Don't add probe-invalidation bit from UAPI * Don't add unused AMDGPU_MTYPE_ definitions Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13net: devlink: move reload fail indication to devlink core and expose to userJiri Pirko
Currently the fact that devlink reload failed is stored in drivers. Move this flag into devlink core. Also, expose it to the user. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13md: add feature flag MD_FEATURE_RAID0_LAYOUTNeilBrown
Due to a bug introduced in Linux 3.14 we cannot determine the correctly layout for a multi-zone RAID0 array - there are two possibilities. It is possible to tell the kernel which to chose using a module parameter, but this can be clumsy to use. It would be best if the choice were recorded in the metadata. So add a feature flag for this purpose. If it is set, then the 'layout' field of the superblock is used to determine which layout to use. If this flag is not set, then mddev->layout gets set to -1, which causes the module parameter to be required. Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-09-13Merge tag 'v5.3-rc8' into rdma.git for-nextJason Gunthorpe
To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13PTP: add support for one-shot outputFelipe Balbi
Some controllers allow for a one-shot output pulse, in contrast to periodic output. Now that we have extensible versions of our IOCTLs, we can finally make use of the 'flags' field to pass a bit telling driver that if we want one-shot pulse output. Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13PTP: introduce new versions of IOCTLsFelipe Balbi
The current version of the IOCTL have a small problem which prevents us from extending the API by making use of reserved fields. In these new IOCTLs, we are now making sure that flags and rsv fields are zero which will allow us to extend the API in the future. Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Fix error path of nf_tables_updobj(), from Dan Carpenter. 2) Move large structure away from stack in the nf_tables offload infrastructure, from Arnd Bergmann. 3) Move indirect flow_block logic to nf_tables_offload. 4) Support for synproxy objects, from Fernando Fernandez Mancera. 5) Support for fwd and dup offload. 6) Add __nft_offload_get_chain() helper, this implicitly fixes missing mutex and check for offload flags in the indirect block support, patch from wenxu. 7) Remove rules on device unregistration, from wenxu. This includes two preparation patches to reuse nft_flow_offload_chain() and nft_flow_offload_rule(). Large batch from Jeremy Sowden to make a second pass to the CONFIG_HEADER_TEST support and a bit of housekeeping: 8) Missing include guard in conntrack label header, from Jeremy Sowden. 9) A few coding style errors: trailing whitespace, incorrect indent in Kconfig, and semicolons at the end of function definitions. 10) Remove unused ipt_init() and ip6t_init() declarations. 11) Inline xt_hashlimit, ebt_802_3 and xt_physdev headers. They are only used once. 12) Update include directive in several netfilter files. 13) Remove unused include/net/netfilter/ipv6/nf_conntrack_icmpv6.h. 14) Move nf_ip6_ext_hdr() to include/linux/netfilter_ipv6.h 15) Move several synproxy structure definitions to nf_synproxy.h 16) Move nf_bridge_frag_data structure to include/linux/netfilter_bridge.h 17) Clean up static inline definitions in nf_conntrack_ecache.h. 18) Replace defined(CONFIG...) || defined(CONFIG...MODULE) with IS_ENABLED(CONFIG...). 19) Missing inline function conditional definitions based on Kconfig preferences in synproxy and nf_conntrack_timeout. 20) Update br_nf_pre_routing_ipv6() definition. 21) Move conntrack code in linux/skbuff.h to nf_conntrack headers. 22) Several patches to remove superfluous CONFIG_NETFILTER and CONFIG_NF_CONNTRACK checks in headers, coming from the initial batch support for CONFIG_HEADER_TEST for netfilter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-12fuse: reserve byteswapped init opcodesMichael S. Tsirkin
virtio fs tunnels fuse over a virtio channel. One issue is two sides might be speaking different endian-ness. To detects this, host side looks at the opcode value in the FUSE_INIT command. Works fine at the moment but might fail if a future version of fuse will use such an opcode for initialization. Let's reserve this opcode so we remember and don't do this. Same for CUSE_INIT. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: reserve values for mapping protocolDr. David Alan Gilbert
SETUPMAPPING is a command for use with 'virtiofsd', a fuse-over-virtio implementation; it may find use in other fuse impelementations as well in which the kernel does not have access to the address space of the daemon directly. A SETUPMAPPING operation causes a section of a file to be mapped into a memory window visible to the kernel. The offsets in the file and the window are defined by the kernel performing the operation. The daemon may reject the request, for reasons including permissions and limited resources. When a request perfectly overlaps a previous mapping, the previous mapping is replaced. When a mapping partially overlaps a previous mapping, the previous mapping is split into one or two smaller mappings. REMOVEMAPPING is the complement to SETUPMAPPING; it unmaps a range of mapped files from the window visible to the kernel. The map_alignment field communicates the alignment constraint for FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING and allows the daemon to constrain the addresses and file offsets chosen by the kernel. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-11devlink: add unknown 'fw_load_policy' valueDirk van der Merwe
Similar to the 'reset_dev_on_drv_probe' devlink parameter, it is useful to have an unknown value which can be used by drivers to report that the hardware value isn't recognized or is otherwise invalid instead of failing the operation. This is especially useful for u8/enum parameters. Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11Merge tag 'mac80211-next-for-davem-2019-09-11' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a number of changes, but things are settling down: * a fix in the new 6 GHz channel support * a fix for recent minstrel (rate control) updates for an infinite loop * handle interface type changes better wrt. management frame registrations (for management frames sent to userspace) * add in-BSS RX time to survey information * handle HW rfkill properly if !CONFIG_RFKILL * send deauth on IBSS station expiry, to avoid state mismatches * handle deferred crypto tailroom updates in mac80211 better when device restart happens * fix a spectre-v1 - really a continuation of a previous patch * advertise NL80211_CMD_UPDATE_FT_IES as supported if so * add some missing parsing in VHT extended NSS support * support HE in mac80211_hwsim * let mac80211 drivers determine the max MTU themselves along with the usual cleanups etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11KVM: x86: Return to userspace with internal error on unexpected exit reasonLiran Alon
Receiving an unexpected exit reason from hardware should be considered as a severe bug in KVM. Therefore, instead of just injecting #UD to guest and ignore it, exit to userspace on internal error so that it could handle it properly (probably by terminating guest). In addition, prefer to use vcpu_unimpl() instead of WARN_ONCE() as handling unexpected exit reason should be a rare unexpected event (that was expected to never happen) and we prefer to print a message on it every time it occurs to guest. Furthermore, dump VMCS/VMCB to dmesg to assist diagnosing such cases. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10netfilter: nft_synproxy: add synproxy stateful object supportFernando Fernandez Mancera
Register a new synproxy stateful object type into the stateful object infrastructure. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-10devlink: add 'reset_dev_on_drv_probe' paramDirk van der Merwe
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the device reset policy on driver probe. This parameter is useful in conjunction with the existing 'fw_load_policy' parameter. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10devlink: extend 'fw_load_policy' valuesDirk van der Merwe
Add the 'disk' value to the generic 'fw_load_policy' devlink parameter. This value indicates that firmware should always be loaded from disk only. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfsd: add support for upcall version 2Scott Mayhew
Version 2 upcalls will allow the nfsd to include a hash of the kerberos principal string in the Cld_Create upcall. If a principal is present in the svc_cred, then the hash will be included in the Cld_Create upcall. We attempt to use the svc_cred.cr_raw_principal (which is returned by gssproxy) first, and then fall back to using the svc_cred.cr_principal (which is returned by both gssproxy and rpc.svcgssd). Upon a subsequent restart, the hash will be returned in the Cld_Gracestart downcall and stored in the reclaim_str_hashtbl so it can be used when handling reclaim opens. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-09-10nfsd: add a "GetVersion" upcall for nfsdcldScott Mayhew
Add a "GetVersion" upcall to allow nfsd to determine the maximum upcall version that the nfsdcld userspace daemon supports. If the daemon responds with -EOPNOTSUPP, then we know it only supports v1. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-09-09Merge branch 'asoc-5.4' into asoc-nextMark Brown
2019-09-09btrfs: turn checksum type define into an enumJohannes Thumshirn
Turn the checksum type definition into a enum. This eases later addition of new checksums. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: replace: BTRFS_DEV_REPLACE_ITEM_STATE_x defines should goAnand Jain
The BTRFS_DEV_REPLACE_ITEM_STATE_x defines, as shown in [1], are unused in both kernel and btrfs-progs (except for one instance of BTRFS_DEV_REPLACE_ITEM_STATE_NEVER_STARTED in kernel). [1] btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED 2 btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED 3 btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED 4 Further these define-values are different form its counterpart BTRFS_IOCTL_DEV_REPLACE_STATE_x series as shown in [2]. [2] btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_SUSPENDED 2 btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_FINISHED 3 btrfs_tree.h:#define BTRFS_DEV_REPLACE_ITEM_STATE_CANCELED 4 So this patch deletes the BTRFS_DEV_REPLACE_ITEM_STATE_x altogether, and one instance of BTRFS_DEV_REPLACE_ITEM_STATE_NEVER_STARTED is replaced with BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED in the kernel. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: clarify btrfs_ioctl_get_dev_stats paddingHans van Kranenburg
In commit c11d2c236cc26 ("Btrfs: add ioctl to get and reset the device stats") the get_dev_stats ioctl was added. Shortly thereafter, in commit b27f7c0c150f7 ("btrfs: join DEV_STATS ioctls to one") , the flags field was added. However, the calculation for unused padding space was not updated, which also invalidated the comment. Clarify what happened to reduce confusion and wasted time for anyone implementing this. Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: use common vfs LABEL ioctl definitionsEric Sandeen
I lifted the btrfs label get/set ioctls to the vfs some time ago, but never followed up to use those common definitions directly in btrfs. This patch does that. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINEMarc Zyngier
While parts of the VGIC support a large number of vcpus (we bravely allow up to 512), other parts are more limited. One of these limits is visible in the KVM_IRQ_LINE ioctl, which only allows 256 vcpus to be signalled when using the CPU or PPI types. Unfortunately, we've cornered ourselves badly by allocating all the bits in the irq field. Since the irq_type subfield (8 bit wide) is currently only taking the values 0, 1 and 2 (and we have been careful not to allow anything else), let's reduce this field to only 4 bits, and allocate the remaining 4 bits to a vcpu2_index, which acts as a multiplier: vcpu_id = 256 * vcpu2_index + vcpu_index With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) allowing this to be discovered, it becomes possible to inject PPIs to up to 4096 vcpus. But please just don't. Whilst we're there, add a clarification about the use of KVM_IRQ_LINE on arm, which is not completely conditionned by KVM_CAP_IRQCHIP. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-09-08parisc: add kexec syscall supportSven Schnelle
Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de>
2019-09-07isdn/capi: check message length in capi_write()Eric Biggers
syzbot reported: BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313 capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 do_loop_readv_writev fs/read_write.c:703 [inline] do_iter_write+0x83e/0xd80 fs/read_write.c:961 vfs_writev fs/read_write.c:1004 [inline] do_writev+0x397/0x840 fs/read_write.c:1039 __do_sys_writev fs/read_write.c:1112 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 [...] The problem is that capi_write() is reading past the end of the message. Fix it by checking the message's length in the needed places. Reported-and-tested-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add nft_reg_store64() and nft_reg_load64() helpers, from Ander Juaristi. 2) Time matching support, also from Ander Juaristi. 3) VLAN support for nfnetlink_log, from Michael Braun. 4) Support for set element deletions from the packet path, also from Ander. 5) Remove __read_mostly from conntrack spinlock, from Li RongQing. 6) Support for updating stateful objects, this also includes the initial client for this infrastructure: the quota extension. A follow up fix for the control plane also comes in this batch. Patches from Fernando Fernandez Mancera. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06ipc: fix semtimedop for generic 32-bit architecturesArnd Bergmann
As Vincent noticed, the y2038 conversion of semtimedop in linux-5.1 broke when commit 00bf25d693e7 ("y2038: use time32 syscall names on 32-bit") changed all system calls on all architectures that take a 32-bit time_t to point to the _time32 implementation, but left out semtimedop in the asm-generic header. This affects all 32-bit architectures using asm-generic/unistd.h: h8300, unicore32, openrisc, nios2, hexagon, c6x, arc, nds32 and csky. The notable exception is riscv32, which has dropped support for the time32 system calls entirely. Reported-by: Vincent Chen <deanbo422@gmail.com> Cc: stable@vger.kernel.org Cc: Vincent Chen <deanbo422@gmail.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Stafford Horne <shorne@gmail.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Ley Foon Tan <lftan@altera.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com> Cc: Guo Ren <guoren@kernel.org> Fixes: 00bf25d693e7 ("y2038: use time32 syscall names on 32-bit") Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-09-06io_uring: expose single mmap capabilityJens Axboe
After commit 75b28affdd6a we can get by with just a single mmap to map both the sq and cq ring. However, userspace doesn't know that. Add a features variable to io_uring_params, and notify userspace that the kernel has this ability. This can then be used in liburing (or in applications directly) to avoid the second mmap. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows to use an arbitrary buffer size and place whenever there is a free address in the umem. Helps more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing which avoids busy-spinning of the latter, useful when app and driver is located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06net_sched: act_police: add 2 new attributes to support police 64bit rate and ↵David Dai
peakrate For high speed adapter like Mellanox CX-5 card, it can reach upto 100 Gbits per second bandwidth. Currently htb already supports 64bit rate in tc utility. However police action rate and peakrate are still limited to 32bit value (upto 32 Gbits per second). Add 2 new attributes TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support so that tc utility can use them for 64bit rate and peakrate value to break the 32bit limit, and still keep the backward binary compatibility. Tested-by: David Dai <zdai@linux.vnet.ibm.com> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06net: openvswitch: Set OvS recirc_id from tc chain indexPaul Blakey
Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel though tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the proccessing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06Merge branch 'x86/cleanups' into x86/cpu, to pick up dependent changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-05PCI: pciehp: Remove pciehp_set_attention_status()Denis Efremov
Remove pciehp_set_attention_status() and use pciehp_set_indicators() instead, since the code is mostly the same. Link: https://lore.kernel.org/r/20190903111021.1559-4-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2019-09-05habanalabs: stop using the acronym KMDOded Gabbay
We want to stop using the acronym KMD. Therefore, replace all locations (except for register names we can't modify) where KMD is written to other terms such as "Linux kernel driver" or "Host kernel driver", etc. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05habanalabs: add uapi to retrieve aggregate H/W eventsOded Gabbay
Add a new opcode to INFO IOCTL to retrieve aggregate H/W events. i.e. the events counters are NOT cleared upon device reset, but count from the loading of the driver. Add the code to support it in the device event handling function. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05habanalabs: add uapi to retrieve device utilizationOded Gabbay
Users and sysadmins usually want to know what is the device utilization as a level 0 indication if they are efficiently using the device. Add a new opcode to the INFO IOCTL that will return the device utilization over the last period of 100-1000ms. The return value is 0-100, representing as percentage the total utilization rate. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-05habanalabs: Make the Coresight timestamp perpetualTomer Tayar
The Coresight timestamp is enabled for a specific debug session using the HL_DEBUG_OP_TIMESTAMP opcode of the debug IOCTL. In order to have a perpetual timestamp that would be comparable between various debug sessions, this patch moves the timestamp enablement to be part of the HW initialization. The HL_DEBUG_OP_TIMESTAMP opcode turns to be deprecated and shouldn't be used. Old user-space that will call it won't see any change in the behavior of the debug session. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05habanalabs: explicitly set the queue-id enumerated numbersDotan Barak
When looking at kernel log messages and when debugging user applications, we only see the queue id. This patch explicitly set the queue id in the queue enumeration which will be helpful for finding the queue name when we have its id. Signed-off-by: Dotan Barak <dbarak@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2019-09-05habanalabs: add comments on INFO IOCTLOded Gabbay
This patch adds some in-code documentation on the different opcodes of the INFO IOCTL. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2019-09-04can: add support of SAE J1939 protocolThe j1939 authors
SAE J1939 is the vehicle bus recommended practice used for communication and diagnostics among vehicle components. Originating in the car and heavy-duty truck industry in the United States, it is now widely used in other parts of the world. J1939, ISO 11783 and NMEA 2000 all share the same high level protocol. SAE J1939 can be considered the replacement for the older SAE J1708 and SAE J1587 specifications. Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Bastian Stender <bst@pengutronix.de> Signed-off-by: Elenita Hinds <ecathinds@gmail.com> Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr> Signed-off-by: Robin van der Gracht <robin@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-09-04can: extend sockaddr_can to include j1939 membersKurt Van Dijck
This patch prepares struct sockaddr_can for SAE J1939. Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-09-04can: add socket type for CAN_J1939Kurt Van Dijck
This patch is a preparation for SAE J1939 and adds CAN_J1939 socket type. Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>