summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
2023-10-30Merge tag 'for-6.7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "New features: - raid-stripe-tree New tree for logical file extent mapping where the physical mapping may not match on multiple devices. This is now used in zoned mode to implement RAID0/RAID1* profiles, but can be used in non-zoned mode as well. The support for RAID56 is in development and will eventually fix the problems with the current implementation. This is a backward incompatible feature and has to be enabled at mkfs time. - simple quota accounting (squota) A simplified mode of qgroup that accounts all space on the initial extent owners (a subvolume), the snapshots are then cheap to create and delete. The deletion of snapshots in fully accounting qgroups is a known CPU/IO performance bottleneck. The squota is not suitable for the general use case but works well for containers where the original subvolume exists for the whole time. This is a backward incompatible feature as it needs extending some structures, but can be enabled on an existing filesystem. - temporary filesystem fsid (temp_fsid) The fsid identifies a filesystem and is hard coded in the structures, which disallows mounting the same fsid found on different devices. For a single device filesystem this is not strictly necessary, a new temporary fsid can be generated on mount e.g. after a device is cloned. This will be used by Steam Deck for root partition A/B testing, or can be used for VM root images. Other user visible changes: - filesystems with partially finished metadata_uuid conversion cannot be mounted anymore and the uuid fixup has to be done by btrfs-progs (btrfstune). Performance improvements: - reduce reservations for checksum deletions (with enabled free space tree by factor of 4), on a sample workload on file with many extents the deletion time decreased by 12% - make extent state merges more efficient during insertions, reduce rb-tree iterations (run time of critical functions reduced by 5%) Core changes: - the integrity check functionality has been removed, this was a debugging feature and removal does not affect other integrity checks like checksums or tree-checker - space reservation changes: - more efficient delayed ref reservations, this avoids building up too much work or overusing or exhausting the global block reserve in some situations - move delayed refs reservation to the transaction start time, this prevents some ENOSPC corner cases related to exhaustion of global reserve - improvements in reducing excessive reservations for block group items - adjust overcommit logic in near full situations, account for one more chunk to eventually allocate metadata chunk, this is mostly relevant for small filesystems (<10GiB) - single device filesystems are scanned but not registered (except seed devices), this allows temp_fsid to work - qgroup iterations do not need GFP_ATOMIC allocations anymore - cleanups, refactoring, reduced data structure size, function parameter simplifications, error handling fixes" * tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (156 commits) btrfs: open code timespec64 in struct btrfs_inode btrfs: remove redundant log root tree index assignment during log sync btrfs: remove redundant initialization of variable dirty in btrfs_update_time() btrfs: sysfs: show temp_fsid feature btrfs: disable the device add feature for temp-fsid btrfs: disable the seed feature for temp-fsid btrfs: update comment for temp-fsid, fsid, and metadata_uuid btrfs: remove pointless empty log context list check when syncing log btrfs: update comment for struct btrfs_inode::lock btrfs: remove pointless barrier from btrfs_sync_file() btrfs: add and use helpers for reading and writing last_trans_committed btrfs: add and use helpers for reading and writing fs_info->generation btrfs: add and use helpers for reading and writing log_transid btrfs: add and use helpers for reading and writing last_log_commit btrfs: support cloned-device mount capability btrfs: add helper function find_fsid_by_disk btrfs: stop reserving excessive space for block group item insertions btrfs: stop reserving excessive space for block group item updates btrfs: reorder btrfs_inode to fill gaps btrfs: open code btrfs_ordered_inode_tree in btrfs_inode ...
2023-10-30Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds
Pull fscrypt updates from Eric Biggers: "This update adds support for configuring the crypto data unit size (i.e. the granularity of file contents encryption) to be less than the filesystem block size. This can allow users to use inline encryption hardware in some cases when it wouldn't otherwise be possible. In addition, there are two commits that are prerequisites for the extent-based encryption support that the btrfs folks are working on" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: track master key presence separately from secret fscrypt: rename fscrypt_info => fscrypt_inode_info fscrypt: support crypto data unit size less than filesystem block size fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes fscrypt: compute max_lblk_bits from s_maxbytes and block size fscrypt: make the bounce page pool opt-in instead of opt-out fscrypt: make it clearer that key_prefix is deprecated
2023-10-30Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers" * tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits) svcrdma: Fix tracepoint printk format svcrdma: Drop connection after an RDMA Read error NFSD: clean up alloc_init_deleg() NFSD: Fix frame size warning in svc_export_parse() NFSD: Rewrite synopsis of nfsd_percpu_counters_init() nfsd: Clean up errors in nfs3proc.c nfsd: Clean up errors in nfs4state.c NFSD: Clean up errors in stats.c NFSD: simplify error paths in nfsd_svc() NFSD: Clean up nfsd4_encode_seek() NFSD: Clean up nfsd4_encode_offset_status() NFSD: Clean up nfsd4_encode_copy_notify() NFSD: Clean up nfsd4_encode_copy() NFSD: Clean up nfsd4_encode_test_stateid() NFSD: Clean up nfsd4_encode_exchange_id() NFSD: Clean up nfsd4_do_encode_secinfo() NFSD: Clean up nfsd4_encode_access() NFSD: Clean up nfsd4_encode_readdir() NFSD: Clean up nfsd4_encode_entry4() NFSD: Add an nfsd4_encode_nfs_cookie4() helper ...
2023-10-28ubi: fastmap: Add control in 'UBI_IOCATT' ioctl to reserve PEBs for filling ↵Zhihao Cheng
pools This patch imports a new field 'need_resv_pool' in struct 'ubi_attach_req' to control whether or not reserving free PEBs for filling pool/wl_pool. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217787 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2023-10-28Merge branch 'pci/field-get'Bjorn Helgaas
- Use FIELD_GET()/FIELD_PREP() when possible throughout drivers/pci/ (Ilpo Järvinen, Bjorn Helgaas) - Rework DPC control programming for clarity (Ilpo Järvinen) * pci/field-get: PCI/portdrv: Use FIELD_GET() PCI/VC: Use FIELD_GET() PCI/PTM: Use FIELD_GET() PCI/PME: Use FIELD_GET() PCI/ATS: Use FIELD_GET() PCI/ATS: Show PASID Capability register width in bitmasks PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk PCI: Use FIELD_GET() PCI/MSI: Use FIELD_GET/PREP() PCI/DPC: Use defines with DPC reason fields PCI/DPC: Use defined fields with DPC_CTL register PCI/DPC: Use FIELD_GET() PCI: hotplug: Use FIELD_GET/PREP() PCI: dwc: Use FIELD_GET/PREP() PCI: cadence: Use FIELD_GET() PCI: Use FIELD_GET() to extract Link Width PCI: mvebu: Use FIELD_PREP() with Link Width PCI: tegra194: Use FIELD_GET()/FIELD_PREP() with Link Width fields # Conflicts: # drivers/pci/controller/dwc/pcie-tegra194.c
2023-10-28Merge branch 'pci/enumeration'Bjorn Helgaas
- Add and use pci_get_base_class() to search for all PCI_BASE_CLASS_DISPLAY devices (Sui Jingfeng) - Fix a vmd check for multi-function devices (Ilpo Järvinen) - Add PCI_HEADER_TYPE_MFD and use it to replace literals (Ilpo Järvinen) - Use acpi_evaluate_dsm_typed() instead of open-coding it (Andy Shevchenko) - Keep .remove() and .probe() callbacks (previously marked __init) in case they're used via sysfs (Uwe Kleine-König) * pci/enumeration: PCI: keystone: Don't discard .probe() callback PCI: keystone: Don't discard .remove() callback PCI: kirin: Don't discard .remove() callback PCI: exynos: Don't discard .remove() callback PCI/ACPI: Use acpi_evaluate_dsm_typed() PCI: Use PCI_HEADER_TYPE_* instead of literals PCI: Add PCI_HEADER_TYPE_MFD definition PCI: vmd: Correct PCI Header Type Register's multi-function check drm/radeon: Use pci_get_base_class() to reduce duplicated code drm/amdgpu: Use pci_get_base_class() to reduce duplicated code drm/nouveau: Use pci_get_base_class() to reduce duplicated code ALSA: hda: Use pci_get_base_class() to reduce duplicated code PCI: Add pci_get_base_class() helper
2023-10-27doc/netlink: Update schema to support cmd-cnt-name and cmd-max-nameDavide Caratti
allow specifying cmd-cnt-name and cmd-max-name in netlink specs, in accordance with Documentation/userspace-api/netlink/c-code-gen.rst. Use cmd-cnt-name and attr-cnt-name in the mptcp yaml spec and in the corresponding uAPI headers, to preserve the #defines we had in the past and avoid adding new ones. v2: - squash modification in mptcp.yaml and MPTCP uAPI headers Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/12d4ed0116d8883cf4b533b856f3125a34e56749.1698415310.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27perf/x86/intel: Support branch counters loggingKan Liang
The branch counters logging (A.K.A LBR event logging) introduces a per-counter indication of precise event occurrences in LBRs. It can provide a means to attribute exposed retirement latency to combinations of events across a block of instructions. It also provides a means of attributing Timed LBR latencies to events. The feature is first introduced on SRF/GRR. It is an enhancement of the ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences of events on the GP counters. The information is displayed by the order of counters. The design proposed in this patch requires that the events which are logged must be in a group with the event that has LBR. If there are more than one LBR group, the counters logging information only from the current group (overflowed) are stored for the perf tool, otherwise the perf tool cannot know which and when other groups are scheduled especially when multiplexing is triggered. The user can ensure it uses the maximum number of counters that support LBR info (4 by now) by making the group large enough. The HW only logs events by the order of counters. The order may be different from the order of enabling which the perf tool can understand. When parsing the information of each branch entry, convert the counter order to the enabled order, and store the enabled order in the extension space. Unconditionally reset LBRs for an LBR event group when it's deleted. The logged counter information is only valid for the current LBR group. If another LBR group is scheduled later, the information from the stale LBRs would be otherwise wrongly interpreted. Add a sanity check in intel_pmu_hw_config(). Disable the feature if other counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode is enabled. (For the LBR call stack mode, we cannot simply flush the LBR, since it will break the call stack. Also, there is no obvious usage with the call stack mode for now.) Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch stack setup. Expose the maximum number of supported counters and the width of the counters into the sysfs. The perf tool can use the information to parse the logged counters in each branch. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2023-10-27perf: Add branch stack countersKan Liang
Currently, the additional information of a branch entry is stored in a u64 space. With more and more information added, the space is running out. For example, the information of occurrences of events will be added for each branch. Two places were suggested to append the counters. https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/ One place is right after the flags of each branch entry. It changes the existing struct perf_branch_entry. The later ARCH specific implementation has to be really careful to consistently pick the right struct. The other place is right after the entire struct perf_branch_stack. The disadvantage is that the pointer of the extra space has to be recorded. The common interface perf_sample_save_brstack() has to be updated. The latter is much straightforward, and should be easily understood and maintained. It is implemented in the patch. Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate the event which is recorded in the branch info. The "u64 counters" may store the occurrences of several events. The information regarding the number of events/counters and the width of each counter should be exposed via sysfs as a reference for the perf tool. Define the branch_counter_nr and branch_counter_width ABI here. The support will be implemented later in the Intel-specific patch. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
2023-10-27drm: introduce CLOSEFB IOCTLSimon Ser
This new IOCTL allows callers to close a framebuffer without disabling planes or CRTCs. This takes inspiration from Rob Clark's unref_fb IOCTL [1] and DRM_MODE_FB_PERSIST [2]. User-space patch for wlroots available at [3]. IGT test available at [4]. v2: add an extra pad field just in case we want to extend this IOCTL in the future (Pekka, Sima). [1]: https://lore.kernel.org/dri-devel/20170509153654.23464-1-robdclark@gmail.com/ [2]: https://lore.kernel.org/dri-devel/20211006151921.312714-1-contact@emersion.fr/ [3]: https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4394 [4]: https://lists.freedesktop.org/archives/igt-dev/2023-October/063294.html Signed-off-by: Simon Ser <contact@emersion.fr> Reviewed-by: Daniel Stone <daniels@collabora.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Dennis Filder <d.filder@web.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20231020101926.145327-2-contact@emersion.fr
2023-10-27usb: raw-gadget: report suspend, resume, reset, and disconnect eventsAndrey Konovalov
Update USB_RAW_IOCTL_EVENT_FETCH to also report suspend, resume, reset, and disconnect events. This allows the code that emulates a USB device via Raw Gadget to handle these events. For example, the device can restart enumeration when it gets reset. Also do not print a WARNING when the event queue overflows. With these new events being queued, the queue might overflow if the device emulation code stops fetching events. Also print debug messages when a non-control event is received. Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/d610b629a5f32fb76c24012180743f7f0f1872c0.1698350424.git.andreyknvl@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27crypto: FIPS 202 SHA-3 register in hash info for IMADimitri John Ledkov
Register FIPS 202 SHA-3 hashes in hash info for IMA and other users. Sizes 256 and up, as 224 is too weak for any practical purposes. Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27bridge: add MDB get uAPI attributesIdo Schimmel
Add MDB get attributes that correspond to the MDB set attributes used in RTM_NEWMDB messages. Specifically, add 'MDBA_GET_ENTRY' which will hold a 'struct br_mdb_entry' and 'MDBA_GET_ENTRY_ATTRS' which will hold 'MDBE_ATTR_*' attributes that are used as indexes (source IP and source VNI). An example request will look as follows: [ struct nlmsghdr ] [ struct br_port_msg ] [ MDBA_GET_ENTRY ] struct br_mdb_entry [ MDBA_GET_ENTRY_ATTRS ] [ MDBE_ATTR_SOURCE ] struct in_addr / struct in6_addr [ MDBE_ATTR_SRC_VNI ] u32 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP_AO_REPAIRDmitry Safonov
Add TCP_AO_REPAIR setsockopt(), getsockopt(). They let a user to repair TCP-AO ISNs/SNEs. Also let the user hack around when (tp->repair) is on and add ao_info on a socket in any supported state. As SNEs now can be read/written at any moment, use WRITE_ONCE()/READ_ONCE() to set/read them. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)Dmitry Safonov
Delete becomes very, very fast - almost free, but after setsockopt() syscall returns, the key is still alive until next RCU grace period. Which is fine for listen sockets as userspace needs to be aware of setsockopt(TCP_AO) and accept() race and resolve it with verification by getsockopt() after TCP connection was accepted. The benchmark results (on non-loaded box, worse with more RCU work pending): > ok 33 Worst case delete 16384 keys: min=5ms max=10ms mean=6.93904ms stddev=0.263421 > ok 34 Add a new key 16384 keys: min=1ms max=4ms mean=2.17751ms stddev=0.147564 > ok 35 Remove random-search 16384 keys: min=5ms max=10ms mean=6.50243ms stddev=0.254999 > ok 36 Remove async 16384 keys: min=0ms max=0ms mean=0.0296107ms stddev=0.0172078 Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO getsockopt()sDmitry Safonov
Introduce getsockopt(TCP_AO_GET_KEYS) that lets a user get TCP-AO keys and their properties from a socket. The user can provide a filter to match the specific key to be dumped or ::get_all = 1 may be used to dump all keys in one syscall. Add another getsockopt(TCP_AO_INFO) for providing per-socket/per-ao_info stats: packet counters, Current_key/RNext_key and flags like ::ao_required and ::accept_icmps. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add option for TCP-AO to (not) hash headerDmitry Safonov
Provide setsockopt() key flag that makes TCP-AO exclude hashing TCP header for peers that match the key. This is needed for interraction with middleboxes that may change TCP options, see RFC5925 (9.2). Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Ignore specific ICMPs for TCP-AO connectionsDmitry Safonov
Similarly to IPsec, RFC5925 prescribes: ">> A TCP-AO implementation MUST default to ignore incoming ICMPv4 messages of Type 3 (destination unreachable), Codes 2-4 (protocol unreachable, port unreachable, and fragmentation needed -- ’hard errors’), and ICMPv6 Type 1 (destination unreachable), Code 1 (administratively prohibited) and Code 4 (port unreachable) intended for connections in synchronized states (ESTABLISHED, FIN-WAIT-1, FIN- WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT) that match MKTs." A selftest (later in patch series) verifies that this attack is not possible in this TCP-AO implementation. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO segments countersDmitry Safonov
Introduce segment counters that are useful for troubleshooting/debugging as well as for writing tests. Now there are global snmp counters as well as per-socket and per-key. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Introduce TCP_AO setsockopt()sDmitry Safonov
Add 3 setsockopt()s: 1. TCP_AO_ADD_KEY to add a new Master Key Tuple (MKT) on a socket 2. TCP_AO_DEL_KEY to delete present MKT from a socket 3. TCP_AO_INFO to change flags, Current_key/RNext_key on a TCP-AO sk Userspace has to introduce keys on every socket it wants to use TCP-AO option on, similarly to TCP_MD5SIG/TCP_MD5SIG_EXT. RFC5925 prohibits definition of MKTs that would match the same peer, so do sanity checks on the data provided by userspace. Be as conservative as possible, including refusal of defining MKT on an established connection with no AO, removing the key in-use and etc. (1) and (2) are to be used by userspace key manager to add/remove keys. (3) main purpose is to set RNext_key, which (as prescribed by RFC5925) is the KeyID that will be requested in TCP-AO header from the peer to sign their segments with. At this moment the life of ao_info ends in tcp_v4_destroy_sock(). Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO config and structuresDmitry Safonov
Introduce new kernel config option and common structures as well as helpers to be used by TCP-AO code. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27tty: n_gsm: add copyright Siemens Mobility GmbHDaniel Starke
More than 1/3 of the n_gsm code has been contributed by us in the last 1.5 years, completing conformance with the standard and stabilizing the driver: - added UI (unnumbered information) frame support - added PN (parameter negotiation) message handling and function support - added optional keep-alive control link supervision via test messages - added TIOCM_OUT1 and TIOCM_OUT2 to allow responder to operate as modem - added TIOCMIWAIT support on virtual ttys - added additional ioctls and parameters to configure the new functions - added overall locking mechanism to avoid data race conditions - added outgoing data flow to decouple physical from virtual tty handling for better performance and to avoid dead-locks - fixed advanced option mode implementation - fixed convergence layer type 2 implementation - fixed handling of CLD (multiplexer close down) messages - fixed broken muxer close down procedure - and many more bug fixes With this most of our initial RFC has been implemented. It gives the driver a quality boost unseen in the decade before. Add a copyright notice to the n_gsm files to highlight this contribution. Link: https://lore.kernel.org/all/20220225080758.2869-1-daniel.starke@siemens.com/ Signed-off-by: Daniel Starke <daniel.starke@siemens.com> Link: https://lore.kernel.org/r/20231027053903.1886-1-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-26Merge tag 'wireless-next-2023-10-26' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.7 The third, and most likely the last, features pull request for v6.7. Fixes all over and only few small new features. Major changes: iwlwifi - more Multi-Link Operation (MLO) work ath12k - QCN9274: mesh support ath11k - firmware-2.bin container file format support * tag 'wireless-next-2023-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (155 commits) wifi: ray_cs: Remove unnecessary (void*) conversions Revert "wifi: ath11k: call ath11k_mac_fils_discovery() without condition" wifi: ath12k: Introduce and use ath12k_sta_to_arsta() wifi: ath12k: fix htt mlo-offset event locking wifi: ath12k: fix dfs-radar and temperature event locking wifi: ath11k: fix gtk offload status event locking wifi: ath11k: fix htt pktlog locking wifi: ath11k: fix dfs radar event locking wifi: ath11k: fix temperature event locking wifi: ath12k: rename the sc naming convention to ab wifi: ath12k: rename the wmi_sc naming convention to wmi_ab wifi: ath11k: add firmware-2.bin support wifi: ath11k: qmi: refactor ath11k_qmi_m3_load() wifi: rtw89: cleanup firmware elements parsing wifi: rt2x00: rework MT7620 PA/LNA RF calibration wifi: rt2x00: rework MT7620 channel config function wifi: rt2x00: improve MT7620 register initialization MAINTAINERS: wifi: rt2x00: drop Helmut Schaa wifi: wlcore: main: replace deprecated strncpy with strscpy wifi: wlcore: boot: replace deprecated strncpy with strscpy ... ==================== Link: https://lore.kernel.org/r/20231026090411.B2426C433CB@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge tag 'for-netdev' of ↵Jakub Kicinski
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-26 We've added 51 non-merge commits during the last 10 day(s) which contain a total of 75 files changed, 5037 insertions(+), 200 deletions(-). The main changes are: 1) Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF, from Chuyi Zhou. 2) Fix BPF verifier's iterator convergence logic to use exact states comparison for convergence checks, from Eduard Zingerman, Andrii Nakryiko and Alexei Starovoitov. 3) Add BPF programmable net device where bpf_mprog defines the logic of its xmit routine. It can operate in L3 and L2 mode, from Daniel Borkmann and Nikolay Aleksandrov. 4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking for global per-CPU allocator, from Hou Tao. 5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section was going to be present whenever a binary has SHT_GNU_versym section, from Andrii Nakryiko. 6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into atomic_set_release(), from Paul E. McKenney. 7) Add a warning if NAPI callback missed xdp_do_flush() under CONFIG_DEBUG_NET which helps checking if drivers were missing the former, from Sebastian Andrzej Siewior. 8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing a warning under sleepable programs, from Yafang Shao. 9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before checking map_locked, from Song Liu. 10) Make BPF CI linked_list failure test more robust, from Kumar Kartikeya Dwivedi. 11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik. 12) Fix xsk starving when multiple xsk sockets were associated with a single xsk_buff_pool, from Albert Huang. 13) Clarify the signed modulo implementation for the BPF ISA standardization document that it uses truncated division, from Dave Thaler. 14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider signed bounds knowledge, from Andrii Nakryiko. 15) Add an option to XDP selftests to use multi-buffer AF_XDP xdp_hw_metadata and mark used XDP programs as capable to use frags, from Larysa Zaremba. 16) Fix bpftool's BTF dumper wrt printing a pointer value and another one to fix struct_ops dump in an array, from Manu Bretelle. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits) netkit: Remove explicit active/peer ptr initialization selftests/bpf: Fix selftests broken by mitigations=off samples/bpf: Allow building with custom bpftool samples/bpf: Fix passing LDFLAGS to libbpf samples/bpf: Allow building with custom CFLAGS/LDFLAGS bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free selftests/bpf: Add selftests for netkit selftests/bpf: Add netlink helper library bpftool: Extend net dump with netkit progs bpftool: Implement link show support for netkit libbpf: Add link-based API for netkit tools: Sync if_link uapi header netkit, bpf: Add bpf programmable net device bpf: Improve JEQ/JNE branch taken logic bpf: Fold smp_mb__before_atomic() into atomic_set_release() bpf: Fix unnecessary -EBUSY from htab_lock_bucket xsk: Avoid starving the xsk further down the list bpf: print full verifier states on infinite loop detection selftests/bpf: test if state loops are detected in a tricky case bpf: correct loop detection for iterators convergence ... ==================== Link: https://lore.kernel.org/r/20231026150509.2824-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26landlock: Support network rules with TCP bind and connectKonstantin Meskhidze
Add network rules support in the ruleset management helpers and the landlock_create_ruleset() syscall. Extend user space API to support network actions: * Add new network access rights: LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP. * Add a new network rule type: LANDLOCK_RULE_NET_PORT tied to struct landlock_net_port_attr. The allowed_access field contains the network access rights, and the port field contains the port value according to the controlled protocol. This field can take up to a 64-bit value but the maximum value depends on the related protocol (e.g. 16-bit value for TCP). Network port is in host endianness [1]. * Add a new handled_access_net field to struct landlock_ruleset_attr that contains network access rights. * Increment the Landlock ABI version to 4. Implement socket_bind() and socket_connect() LSM hooks, which enable to control TCP socket binding and connection to specific ports. Expand access_masks_t from u16 to u32 to be able to store network access rights alongside filesystem access rights for rulesets' handled access rights. Access rights are not tied to socket file descriptors but checked at bind() or connect() call time against the caller's Landlock domain. For the filesystem, a file descriptor is a direct access to a file/data. However, for network sockets, we cannot identify for which data or peer a newly created socket will give access to. Indeed, we need to wait for a connect or bind request to identify the use case for this socket. Likewise a directory file descriptor may enable to open another file (i.e. a new data item), but this opening is also restricted by the caller's domain, not the file descriptor's access rights [2]. [1] https://lore.kernel.org/r/278ab07f-7583-a4e0-3d37-1bacd091531d@digikod.net [2] https://lore.kernel.org/r/263c1eb3-602f-57fe-8450-3f138581bee7@digikod.net Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20231026014751.414649-9-konstantin.meskhidze@huawei.com [mic: Extend commit message, fix typo in comments, and specify endianness in the documentation] Co-developed-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26Merge tag 'net-6.6-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and netfilter. Most regressions addressed here come from quite old versions, with the exceptions of the iavf one and the WiFi fixes. No known outstanding reports or investigation. Fixes to fixes: - eth: iavf: in iavf_down, disable queues when removing the driver Previous releases - regressions: - sched: act_ct: additional checks for outdated flows - tcp: do not leave an empty skb in write queue - tcp: fix wrong RTO timeout when received SACK reneging - wifi: cfg80211: pass correct pointer to rdev_inform_bss() - eth: i40e: sync next_to_clean and next_to_process for programming status desc - eth: iavf: initialize waitqueues before starting watchdog_task Previous releases - always broken: - eth: r8169: fix data-races - eth: igb: fix potential memory leak in igb_add_ethtool_nfc_entry - eth: r8152: avoid writing garbage to the adapter's registers - eth: gtp: fix fragmentation needed check with gso" * tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) iavf: in iavf_down, disable queues when removing the driver vsock/virtio: initialize the_virtio_vsock before using VQs net: ipv6: fix typo in comments net: ipv4: fix typo in comments net/sched: act_ct: additional checks for outdated flows netfilter: flowtable: GC pushes back packets to classic path i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR gtp: fix fragmentation needed check with gso gtp: uapi: fix GTPA_MAX Fix NULL pointer dereference in cn_filter() sfc: cleanup and reduce netlink error messages net/handshake: fix file ref count in handshake_nl_accept_doit() wifi: mac80211: don't drop all unprotected public action frames wifi: cfg80211: fix assoc response warning on failed links wifi: cfg80211: pass correct pointer to rdev_inform_bss() isdn: mISDN: hfcsusb: Spelling fix in comment tcp: fix wrong RTO timeout when received SACK reneging r8152: Block future register access if register access fails r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en() ...
2023-10-26iommu/vt-d: Disallow read-only mappings to nest parent domainLu Baolu
When remapping hardware is configured by system software in scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled) in first-stage page-table entries even when second-stage mappings indicate that corresponding first-stage page-table is Read-Only. As the result, contents of pages designated by VMM as Read-Only can be modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of address translation process due to DMAs issued by Guest. This disallows read-only mappings in the domain that is supposed to be used as nested parent. Reference from Sapphire Rapids Specification Update [1], errata details, SPR17. Userspace should know this limitation by checking the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO ioctl. [1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html Link: https://lore.kernel.org/r/20231026044216.64964-9-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommufd: Add data structure for Intel VT-d stage-1 domain allocationYi Liu
This adds IOMMU_HWPT_DATA_VTD_S1 for stage-1 hw_pagetable of Intel VT-d and the corressponding data structure for userspace specified parameter for the domain allocation. Link: https://lore.kernel.org/r/20231026044216.64964-2-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommufd: Add a nested HW pagetable objectNicolin Chen
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce. But it can only allocate a hw_pagetable that associates to a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMUFD_OBJ_HWPT_PAGING type. IOMMU drivers can now support user-managed hw_pagetables, for two-stage translation use cases that require user data input from the user space. Add a new IOMMUFD_OBJ_HWPT_NESTED type with its abort/destroy(). Pair it with a new iommufd_hwpt_nested structure and its to_hwpt_nested() helper. Update the to_hwpt_paging() helper, so a NESTED-type hw_pagetable can be handled in the callers, for example iommufd_hw_pagetable_enforce_rr(). Screen the inputs including the parent PAGING-type hw_pagetable that has a need of a new nest_parent flag in the iommufd_hwpt_paging structure. Extend the IOMMU_HWPT_ALLOC ioctl to accept an IOMMU driver specific data input which is tagged by the enum iommu_hwpt_data_type. Also, update the @pt_id to accept hwpt_id too besides an ioas_id. Then, use them to allocate a hw_pagetable of IOMMUFD_OBJ_HWPT_NESTED type using the iommufd_hw_pagetable_alloc_nested() allocator. Link: https://lore.kernel.org/r/20231026043938.63898-8-yi.l.liu@intel.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26ALSA: seq: Replace with __packed attributeTakashi Iwai
Replace the old __attribute__((packed)) with the new __packed. Only cleanup, no functional changes. Link: https://lore.kernel.org/r/20231025132314.5878-12-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-10-25ipv6: drop feature RTAX_FEATURE_ALLFRAGYan Zhai
RTAX_FEATURE_ALLFRAG was added before the first git commit: https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html The feature would send packets to the fragmentation path if a box receives a PMTU value with less than 1280 byte. However, since commit 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such message would be simply discarded. The feature flag is neither supported in iproute2 utility. In theory one can still manipulate it with direct netlink message, but it is not ideal because it was based on obsoleted guidance of RFC-2460 (replaced by RFC-8200). The feature would always test false at the moment, so remove related code or mark them as unused. Signed-off-by: Yan Zhai <yan@cloudflare.com> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/d78e44dcd9968a252143ffe78460446476a472a1.1698156966.git.yan@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-25mempolicy: remove confusing MPOL_MF_LAZY dead codeHugh Dickins
v3.8 commit b24f53a0bea3 ("mm: mempolicy: Add MPOL_MF_LAZY") introduced MPOL_MF_LAZY, and included it in the MPOL_MF_VALID flags; but a720094ded8 ("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now") immediately removed it from MPOL_MF_VALID flags, pending further review. "This will need to be revisited", but it has not been reinstated. The present state is confusing: there is dead code in mm/mempolicy.c to handle MPOL_MF_LAZY cases which can never occur. Remove that: it can be resurrected later if necessary. But keep the definition of MPOL_MF_LAZY, which must remain in the UAPI, even though it always fails with EINVAL. https://lore.kernel.org/linux-mm/1553041659-46787-1-git-send-email-yang.shi@linux.alibaba.com/ links to a previous request to remove MPOL_MF_LAZY. Link: https://lkml.kernel.org/r/80c9665c-1c3f-17ba-21a3-f6115cebf7d@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25drm/doc: document DRM_IOCTL_MODE_CREATE_DUMBSimon Ser
The main motivation is to repeat that dumb buffers should not be abused for anything else than basic software rendering with KMS. User-space devs are more likely to look at the IOCTL docs than to actively search for the driver-oriented "Dumb Buffer Objects" section. v2: reference DRM_CAP_DUMB_BUFFER, DRM_CAP_DUMB_PREFERRED_DEPTH and DRM_CAP_DUMB_PREFER_SHADOW (Pekka) Signed-off-by: Simon Ser <contact@emersion.fr> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821131548.269204-1-contact@emersion.fr
2023-10-24netkit, bpf: Add bpf programmable net deviceDaniel Borkmann
This work adds a new, minimal BPF-programmable device called "netkit" (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The core idea is that BPF programs are executed within the drivers xmit routine and therefore e.g. in case of containers/Pods moving BPF processing closer to the source. One of the goals was that in case of Pod egress traffic, this allows to move BPF programs from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms, for example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance (see measurements in the slides). In this initial version, the netkit device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon the device creation, the user can set the default policy to 'pass' or 'drop' for the case when no BPF program is attached. Additionally, the device can be operated in L3 (default) or L2 mode. The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API and dependency controls as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated. Going forward, we plan to use netkit devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management and the peer will operate in default drop mode, so that no traffic is leaving between the time when a Pod is brought up by the CNI plugin and programs attached by the agent. Additionally, the programs we attach via tcx on the physical devices are using bpf_redirect_peer() for inbound traffic into netkit device, hence the latter is also supporting the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device directly. Also, BIG TCP is supported on netkit device. For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one. An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://github.com/borkmann/iproute2/tree/pr/netkit Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.) Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24PCI/PME: Use FIELD_GET()Bjorn Helgaas
Use FIELD_GET() to remove dependences on the field position, i.e., the shift value. No functional change intended. Link: https://lore.kernel.org/r/20231010204436.1000644-8-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2023-10-24PCI/ATS: Use FIELD_GET()Bjorn Helgaas
Use FIELD_GET() to remove dependences on the field position, i.e., the shift value. No functional change intended. Link: https://lore.kernel.org/r/20231010204436.1000644-6-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2023-10-24PCI/ATS: Show PASID Capability register width in bitmasksBjorn Helgaas
The PASID Capability and Control registers are both 16 bits wide. Use 16-bit wide constants in field names to match the register width. No functional change intended. Link: https://lore.kernel.org/r/20231010204436.1000644-5-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2023-10-24net: dsa: Rename IFLA_DSA_MASTER to IFLA_DSA_CONDUITFlorian Fainelli
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI and creates an alias named IFLA_DSA_CONDUIT. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24uapi: mptcp: use header file generated from YAML specDavide Caratti
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode uapi \ > --spec Documentation/netlink/specs/mptcp.yaml \ > --header -o include/uapi/linux/mptcp_pm.h Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-5-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: mptcp: convert netlink from small_ops to opsDavide Caratti
in the current MPTCP control plane, all operations use a netlink attribute of the same type "MPTCP_PM_ATTR". However, add/del/get/flush operations only parse the first element in the message _ the one that describes MPTCP endpoints (that was named MPTCP_PM_ATTR_ADDR and mostly used in ADD_ADDR operations _ probably the similarity of "attr", "addr" and "add" might cause some confusion to human readers). Convert MPTCP from 'small_ops' to 'ops', thus allowing different attributes for each single operation, hopefully makes all this clearer to human readers. - use a separate attribute set for add/del/get/flush address operation, binary compatible with the existing one, to store the endpoint address. MPTCP_PM_ENDPOINT_ADDR is added to the uAPI (with the same value as MPTCP_PM_ATTR_ADDR) for these operations. - convert mptcp_pm_ops[] and add policy files accordingly. this prepares MPTCP control plane to be described as YAML spec. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-3-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24drm/fourcc: Add NV20 and NV30 YUV formatsJonas Karlman
DRM_FORMAT_NV20 and DRM_FORMAT_NV30 formats is the 2x1 and non-subsampled variant of NV15, a 10-bit 2-plane YUV format that has no padding between components. Instead, luminance and chrominance samples are grouped into 4s so that each group is packed into an integer number of bytes: YYYY = UVUV = 4 * 10 bits = 40 bits = 5 bytes The '20' and '30' suffix refers to the optimum effective bits per pixel which is achieved when the total number of luminance samples is a multiple of 4. V2: Added NV30 format Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Sandy Huang <hjc@rock-chips.com> Reviewed-by: Christopher Obbard <chris.obbard@collabora.com> Tested-by: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231023173718.188102-2-jonas@kwiboo.se
2023-10-24PCI/DPC: Use defines with DPC reason fieldsIlpo Järvinen
Add new defines for DPC reason fields and use them instead of literals. Link: https://lore.kernel.org/r/20231018113254.17616-7-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: shorten comments] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-10-24PCI/DPC: Use FIELD_GET()Bjorn Helgaas
Use FIELD_GET() to remove dependencies on the field position, i.e., the shift value. No functional change intended. Link: https://lore.kernel.org/r/20231018113254.17616-5-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-10-24PCI: dwc: Use FIELD_GET/PREP()Ilpo Järvinen
Convert open-coded variants of PCI field access into FIELD_GET/PREP() to make the code easier to understand. Add two missing defines into pci_regs.h. Logically, the Max No-Snoop Latency Register is a separate word sized register in the PCIe spec, but the pre-existing LTR defines in pci_regs.h with dword long values seem to consider the registers together (the same goes for the only user). Thus, follow the custom and make the new values also take both word long LTR registers as a joint dword register. Link: https://lore.kernel.org/r/20231024110336.26264-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-10-24iommufd: Add a flag to skip clearing of IOPTE dirtyJoao Martins
VFIO has an operation where it unmaps an IOVA while returning a bitmap with the dirty data. In reality the operation doesn't quite query the IO pagetables that the PTE was dirty or not. Instead it marks as dirty on anything that was mapped, and doing so in one syscall. In IOMMUFD the equivalent is done in two operations by querying with GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB flushes given that after clearing dirty bits IOMMU implementations require invalidating their IOTLB, plus another invalidation needed for the UNMAP. To allow dirty bits to be queried faster, add a flag (IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR) that requests to not clear the dirty bits from the PTE (but just reading them), under the expectation that the next operation is the unmap. An alternative is to unmap and just perpectually mark as dirty as that's the same behaviour as today. So here equivalent functionally can be provided with unmap alone, and if real dirty info is required it will amortize the cost while querying. There's still a race against DMA where in theory the unmap of the IOVA (when the guest invalidates the IOTLB via emulated iommu) would race against the VF performing DMA on the same IOVA. As discussed in [0], we are accepting to resolve this race as throwing away the DMA and it doesn't matter if it hit physical DRAM or not, the VM can't tell if we threw it away because the DMA was blocked or because we failed to copy the DRAM. [0] https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/ Link: https://lore.kernel.org/r/20231024135109.73787-10-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24iommufd: Add capabilities to IOMMU_GET_HW_INFOJoao Martins
Extend IOMMUFD_CMD_GET_HW_INFO op to query generic iommu capabilities for a given device. Capabilities are IOMMU agnostic and use device_iommu_capable() API passing one of the IOMMU_CAP_*. Enumerate IOMMU_CAP_DIRTY_TRACKING for now in the out_capabilities field returned back to userspace. Link: https://lore.kernel.org/r/20231024135109.73787-9-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24iommufd: Add IOMMU_HWPT_GET_DIRTY_BITMAPJoao Martins
Connect a hw_pagetable to the IOMMU core dirty tracking read_and_clear_dirty iommu domain op. It exposes all of the functionality for the UAPI that read the dirtied IOVAs while clearing the Dirty bits from the PTEs. In doing so, add an IO pagetable API iopt_read_and_clear_dirty_data() that performs the reading of dirty IOPTEs for a given IOVA range and then copying back to userspace bitmap. Underneath it uses the IOMMU domain kernel API which will read the dirty bits, as well as atomically clearing the IOPTE dirty bit and flushing the IOTLB at the end. The IOVA bitmaps usage takes care of the iteration of the bitmaps user pages efficiently and without copies. Within the iterator function we iterate over io-pagetable contigous areas that have been mapped. Contrary to past incantation of a similar interface in VFIO the IOVA range to be scanned is tied in to the bitmap size, thus the application needs to pass a appropriately sized bitmap address taking into account the iova range being passed *and* page size ... as opposed to allowing bitmap-iova != iova. Link: https://lore.kernel.org/r/20231024135109.73787-8-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24iommufd: Add IOMMU_HWPT_SET_DIRTY_TRACKINGJoao Martins
Every IOMMU driver should be able to implement the needed iommu domain ops to control dirty tracking. Connect a hw_pagetable to the IOMMU core dirty tracking ops, specifically the ability to enable/disable dirty tracking on an IOMMU domain (hw_pagetable id). To that end add an io_pagetable kernel API to toggle dirty tracking: * iopt_set_dirty_tracking(iopt, [domain], state) The intended caller of this is via the hw_pagetable object that is created. Internally it will ensure the leftover dirty state is cleared /right before/ dirty tracking starts. This is also useful for iommu drivers which may decide that dirty tracking is always-enabled at boot without wanting to toggle dynamically via corresponding iommu domain op. Link: https://lore.kernel.org/r/20231024135109.73787-7-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24iommufd: Add a flag to enforce dirty tracking on attachJoao Martins
Throughout IOMMU domain lifetime that wants to use dirty tracking, some guarantees are needed such that any device attached to the iommu_domain supports dirty tracking. The idea is to handle a case where IOMMU in the system are assymetric feature-wise and thus the capability may not be supported for all devices. The enforcement is done by adding a flag into HWPT_ALLOC namely: IOMMU_HWPT_ALLOC_DIRTY_TRACKING .. Passed in HWPT_ALLOC ioctl() flags. The enforcement is done by creating a iommu_domain via domain_alloc_user() and validating the requested flags with what the device IOMMU supports (and failing accordingly) advertised). Advertising the new IOMMU domain feature flag requires that the individual iommu driver capability is supported when a future device attachment happens. Link: https://lore.kernel.org/r/20231024135109.73787-6-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>