| Age | Commit message (Collapse) | Author |
|
Not all USB4/TB implementations are based on a PCIe-attached
controller. In order to make way for these, start off with moving the
pci_device reference out of the main tb_nhi structure.
Encapsulate the existing struct in a new tb_nhi_pci, that shall also
house all properties that relate to the parent bus. Similarly, any
other type of controller will be expected to contain tb_nhi as a
member.
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
This function is entirely unused, remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260511072239.2456725-3-hch@lst.de
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|
|
The SPI controller API is asymmetric in that a controller is allocated
and registered in two step, while it is freed as part of deregistration.
[1]
This is especially unfortunate as any driver data is freed along with
the controller, something which has lead to use-after-free bugs during
deregistration when drivers tear down resources after deregistering the
controller (or tear down resources that may still be in use before
deregistering the controller in an attempt to work around the API).
To reduce the risk of such bugs being introduced a device managed
allocation interface was added, but this arguably made things even less
consistent as now whether the controller gets freed as part of
deregistration depends on how it was allocated. [2][3]
With most drivers converted to use managed allocation in preparation for
fixing the API, the remaining 16 drivers can be converted in one
tree-wide change. Ten of those drivers use the bitbang interface and can
be converted by simply removing the extra reference already taken by
spi_bitbang_start() (and updating the two bitbang drivers that use
managed allocation). [4]
Fix the API inconsistency by no longer dropping a reference when
deregistering non-devres allocated controllers.
[1] 68b892f1fdc4 ("spi: document odd controller reference handling")
[2] 5e844cc37a5c ("spi: Introduce device-managed SPI controller allocation")
[3] 3f174274d224 ("spi: fix misleading controller deregistration kernel-doc")
[4] 702a4879ec33 ("spi: bitbang: Let spi_bitbang_start() take a reference to master")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260521073816.766596-1-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Each user port of the NETC switch supports 802.3 basic and mandatory
managed objects statistic counters and IETF Management Information
Database (MIB) package (RFC2665) and Remote Network Monitoring (RMON)
counters. And all of these counters are 64-bit registers. In addition,
some user ports support preemption, so these ports have two MACs, MAC
0 is the express MAC (eMAC), MAC 1 is the preemptible MAC (pMAC). So
for ports that support preemption, the statistics are the sum of the
pMAC and eMAC statistics.
Note that the current switch driver does not support preemption, all
frames are sent and received via the eMAC by default. The statistics
read from the pMAC should be zero.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260518082506.1318236-15-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The NXP NETC switch tag is a proprietary header added to frames after the
source MAC address. The switch tag has 3 types, and each type has 1 ~ 4
subtypes, the details are as follows.
Forward NXP switch tag (Type=0): Represents forwarded frames.
- SubType = 0 - Normal frame processing.
To_Port NXP switch tag (Type=1): Represents frames that are to be sent
to a specific switch port.
- SubType = 0. No request to perform timestamping.
- SubType = 1. Request to perform one-step timestamping.
- SubType = 2. Request to perform two-step timestamping.
- SubType = 3. Request to perform both one-step timestamping and
two-step timestamping.
To_Host NXP switch tag (Type=2): Represents frames redirected or copied
to the switch management port.
- SubType = 0. Received frames redirected or copied to the switch
management port.
- SubType = 1. Received frames redirected or copied to the switch
management port with captured timestamp at the switch port where
the frame was received.
- SubType = 2. Transmit timestamp response (two-step timestamping).
In addition, the length of different type switch tag is different, the
minimum length is 6 bytes, the maximum length is 14 bytes. Currently,
Forward tag, SubType 0 of To_Port tag and Subtype 0 of To_Host tag are
supported. More tags will be supported in the future.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260518082506.1318236-10-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ingress port filter table (IPFT )contains a set of filters each
capable of classifying incoming traffic using a mix of L2, L3, and L4
parsed and arbitrary field data. As a result of a filter match, several
actions can be specified such as on whether to deny or allow a frame,
overriding internal QoS attributes associated with the frame and setting
parameters for the subsequent frame processing functions, such as stream
identification, policing, ingress mirroring. Each entry corresponds to a
filter. The ingress port filter entries are added using a precedence
value. If a frame matches multiple entries, the entry with the higher
precedence is used. Currently, this patch only adds "Add" and "Delete"
operations to the ingress port filter table. These two interfaces will
be used by both ENETC driver and NETC switch driver.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260518082506.1318236-8-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The buffer pool table contains buffer pool configuration and operational
information. Each entry corresponds to a buffer pool. The Entry ID value
represents the buffer pool ID to access.
The buffer pool table is a static bounded index table, buffer pools are
always present and enabled. It only supports Update and Query operations,
This patch only adds ntmp_bpt_update_entry() helper to support updating
the specified entry of the buffer pool table. Query action to the table
will be added in the future.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260518082506.1318236-7-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The VLAN filter table contains configuration and control information for
each VLAN configured on the switch. Each VLAN entry includes the VLAN
port membership, which FID to use in the FDB lookup, which spanning tree
group to use, the egress frame modification actions to apply to a frame
exiting form this VLAN, and various configuration and control parameters
for this VLAN.
The VLAN filter table can only be managed by the command BD ring using
table management protocol version 2.0. The table supports Add, Delete,
Update and Query operations. And the table supports 3 access methods:
Entry ID, Exact Match Key Element and Search. But currently we only add
the ntmp_vft_add_entry() helper to support the upcoming switch driver to
add an entry to the VLAN filter table. Other interfaces will be added in
the future.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260518082506.1318236-6-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The FDB table is used for MAC learning lookups and MAC forwarding lookups.
Each table entry includes information such as a FID and MAC address that
may be unicast or multicast and a forwarding destination field containing
a port bitmap identifying the associated port(s) with the MAC address.
FDB table entries can be static or dynamic. Static entries are added from
software whereby dynamic entries are added either by software or by the
hardware as MAC addresses are learned in the datapath.
The FDB table can only be managed by the command BD ring using table
management protocol version 2.0. Table management command operations Add,
Delete, Update and Query are supported. And the FDB table supports three
access methods: Entry ID, Exact Match Key Element and Search. This patch
adds the following basic supports to the FDB table.
ntmp_fdbt_update_entry() - update the configuration element data of a
specified FDB entry
ntmp_fdbt_delete_entry() - delete a specified FDB entry
ntmp_fdbt_add_entry() - add an entry into the FDB table
ntmp_fdbt_search_port_entry() - Search the FDB entry on the specified
port based on RESUME_ENTRY_ID.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260518082506.1318236-5-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Several DRM drivers already define their own constants for minimum and
maximum TMDS character rates.
By defining common rate constants in a shared header, drivers can just use
them instead of having driver local define macros or use magic numbers.
The values defined in the <linux/hdmi.h> header correspond to maximum TMDS
character rates defined by each HDMI specification version:
- HDMI_TMDS_CHAR_RATE_MIN_HZ: 25 MHz (minimum for all versions)
- HDMI_1_0_TMDS_CHAR_RATE_MAX_HZ: 165 MHz (HDMI 1.0 maximum)
- HDMI_1_3_TMDS_CHAR_RATE_MAX_HZ: 340 MHz (HDMI 1.3 maximum)
- HDMI_2_0_TMDS_CHAR_RATE_MAX_HZ: 600 MHz (HDMI 2.0 maximum)
Suggested-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20260520144424.1633354-2-javierm@redhat.com
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
|
|
To get an operable version of an O_PATH file descriptor, it is possible
to use openat(fd, ".", O_DIRECTORY) for directories, but other files
currently require going through open("/proc/<pid>/fd/<nr>"), which
depends on a functioning procfs.
This patch adds the O_EMPTYPATH flag to openat(2)/openat2(2). If passed,
LOOKUP_EMPTY is set at path resolution time.
Note: This implies that you cannot rely anymore on disabling procfs from
being mounted (e.g. inside a container without procfs mounted and with
CAP_SYS_ADMIN dropped) to prevent O_PATH fds from being re-opened
read-write.
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20260424114611.1678641-2-jkoolstra@xs4all.nl
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|
|
There is legacy 'extern' keyword for the exported simple_strtox()
function which are the artefact that can be removed. So drop it.
While at it, tweak the declaration to provide parameter names.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260331070519.5974-7-ddiss@suse.de
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
No users anymore and none should be in the first place.
This reverts commit fcc155008a20fa31b01569e105250490750f0687.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260331070519.5974-6-ddiss@suse.de
Acked-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The mdacon driver supports using ISA MDA or Hercules-compatible display
adapters as a secondary text console. This was commonly used in the
1990s and earlier for debugging software which took over the primary
display. It is highly unlikely anyone is doing so nowadays because
serial consoles and much better methods of debugging exist.
The driver is not enabled by any defconfig, nor any of the
dozens of distro configs collected at [1]. It has been relegated to VTs
13-16 since commit 0b9cf3aa6b1e ("mdacon messing up default vc's - set
default to vc13-16 again") in Linux 2.6.27 (and before Linux 2.5.53 -
see the link in the message of the above commit). The change in 2.6.27
was done because it was incorrectly detecting non-MDA adapters as MDA
and taking over all VTs, rendering them unusable.
Furthermore, vgacon supports using MDA/Hercules-compatible adapters as
the primary text console, so any systems with only one of these
adapters were already using vgacon and will not experience any loss in
functionality from the removal of this driver.
Given all of these factors, the mdacon driver is likely entirely
unused. Remove it.
[1] https://github.com/nyrahul/linux-kernel-configs/tree/f0bee86a135a0406ea427855f52702dd00d770f9
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the
resource request and iomap for the BARs was performed early, and
vfio_pci_core_setup_barmap() just checks those actions succeeded.
Move this logic to a new helper that checks success and returns the
iomap address, replacing the various bare vdev->barmap[] lookups.
This maintains the error behaviour of the previous on-demand
vfio_pci_core_setup_barmap() scheme.
Signed-off-by: Matt Evans <mattev@meta.com>
Link: https://lore.kernel.org/r/20260511145829.2993601-4-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
The ldo_vcn33_[12]_wifi and ldo_vcn33_[12]_bt are just two regulator
outputs instead of four. The wifi and bt parts refer to separate enable
bits that are OR-ed together to affect the actual regulator output. The
separate bits allow the wifi and bt stacks to enable their power without
coordination between them. These have been deprecated in favor of proper
nodes matching the output.
Add proper ldo_vcn33_[12] regulators to replace the existing ones. The
enable status is synced to just one of the two enable bits, and the
other is forced off. This makes the handling in other bits simpler.
The existing *_(bt|wifi) regulators are converted to no-op regulators
that are fed from their new respective ldo_vcn33_[12] regulator. This
allows existing device trees to continue to work.
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://patch.msgid.link/20260514091520.2718987-7-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
__bf_shf() is currently based on built-in ffsll. It's more
straightforward to wire it to __builtin_ctzll, which makes it a pure
rename.
Worth to notice that __builtin_ffsll() is buggy on GCC before 14.1:
int main() {
sizeof(struct {
int t : !(__builtin_ffsll(~0ULL) + 1 < 0);
});
}
test.c: In function 'main':
test.c:3:21: error: bit-field 't' width not an integer constant
3 | int t : !(__builtin_ffsll(~0ULL) + 1 < 0);
| ^
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124699
Reported-by: Matt Coster <matt.coster@imgtec.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603222211.A2XiR1YU-lkp@intel.com/
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
The bitfields are designed in assumption that fields contain unsigned
integer values, thus extracting the values from the field implies
zero-extending.
Some drivers need to sign-extend their fields, and currently do it like:
dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
It's error-prone because it relies on user to provide the correct
index of the most significant bit and proper 32 vs 64 function flavor.
Thus, introduce a FIELD_GET_SIGNED(). With the new API, the above
snippet turns into the more convenient:
dc_re += FIELD_GET_SIGNED(0xfff000, tmp);
dc_im += FIELD_GET_SIGNED(0xfff, tmp);
It compiles (on x86_64) into just a couple instructions: shl and sar.
When the mask includes MSB, the '<< __builtin_clzll(mask)' part becomes
a NOP, and the compiler only emits a single sar:
long long foo(long long reg)
{
10: f3 0f 1e fa endbr64
return FIELD_GET_SIGNED(GENMASK_ULL(63, 60), reg);
14: 48 89 f8 mov %rdi,%rax
17: 48 c1 f8 3c sar $0x3c,%rax
}
32-bit code generation is equally well. On arm32:
long long foo(long long reg)
{
return FIELD_GET_SIGNED(0x00f00000ULL, reg);
}
generates:
foo(long long):
lsls r1, r0, #8
asrs r0, r1, #28
asrs r1, r1, #31
bx lr
Signed-off-by: Yury Norov <ynorov@nvidia.com>
|
|
Commit 5f9c23abc477 ("firmware: smccc: Support optional Arm SMCCC SOC_ID
name") introduced the SOC_ID name string call, which reports a human
readable string describing the SoC, as returned by firmware.
The SMCCC spec v1.6 describes this feature as AArch64 only, since we rely
on 8 characters to be transmitted per register. Consequently the SMCCC
call must use the AArch64 calling convention, which requires bit 30 of
the FID to be set. The spec is a bit confusing here, since it mentions
that in the parameter description ("2: SoC name (optionally implemented for
SMC64 calls, ..."), but still prints the FID explicitly as 0x80000002.
But as this FID is using the SMC32 calling convention (correct for the
other two calls), it will not match what any SMCCC conformant firmware is
expecting, so any call would return NOT_SUPPORTED.
Add a 64-bit version of the ARCH_SOC_ID FID macro, and use that for the
SoC name version of the call to fix the issue.
Fixes: 5f9c23abc477 ("firmware: smccc: Support optional Arm SMCCC SOC_ID name")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Link: https://patch.msgid.link/20250902172053.304911-1-andre.przywara@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
|
|
drivers-for-7.2
Merge the initial set of UBWC rework through a topic branch, to allow it
being shared with the DRM/MSM branch for the continuation.
|
|
Adreno and MDSS drivers need to know whether to enable AMSBC. Add
separate helper, describing that feature.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260507-ubwc-rework-v4-4-c19593d20c1d@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Define special helper returning version setting for MDSS and A8xx
drivers.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260507-ubwc-rework-v4-3-c19593d20c1d@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Follow the comment for the macrotile_mode and introduce separate
revision for UBWC 3.0 + 8-channel macrotiling mode. It is not used by
the database (since the drivers are not yet changed to handle it yet).
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260507-ubwc-rework-v4-2-c19593d20c1d@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
In an internal review from Airoha, it was notice that the RX DMA descriptor
bits and mask are wrong. These values probably refer to an old NPU firmware
never published. The previous value works correctly but it was reported
that in some specific condition in mixed scenario with both Ethernet and
WiFi offload it's possible that RX DMA descriptor signal wrong value with
the problem to the RX ring or packets getting dropped.
To handle these specific scenario, apply the new suggested bits mask from
Airoha.
Correct functionality of both AN7581 NPU and MT7996 variant were verified
and confirmed working.
Fixes: a7fc8c641cab ("net: airoha: Fix npu rx DMA definitions")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260518134530.3683-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Make sure that the issuing of a deferred non-NCQ command via
workqueue feature is only used when mixing NCQ and non-NCQ commands
to the same link (i.e. return value ATA_DEFER_LINK), and nothing
else. This way we will not incorrectly try to use the feature for
e.g. PATA drivers
- The deferred non-NCQ command was stored in a per-port struct. When
using Port Multipliers with FIS-Based Switching, we would thus
needlessly defer commands to all other links. Store the deferred QC
in a per-link struct, such that Port Multipliers with FBS will get
the same performance as before
- The issuing of a deferred non-NCQ command via workqueue feature broke
support for Port Multipliers using Command-Based Switching. The
issuing of a deferred non-NCQ command via workqueue feature is not
compatible with the use of ap->excl_link, which PMPs with CBS use for
fairness (using implicit round robin)
* tag 'ata-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-scsi: do not needlessly defer commands when using PMP with FBS
ata: libata-scsi: do not use the deferred QC feature on PMPs with CBS
ata: libata-scsi: do not use the deferred QC feature for ATA_DEFER_PORT
ata: libata-scsi: improve readability of ata_scsi_qc_issue()
|
|
Remove extra semicolons from comments.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"14 hotfixes. 9 are for MM. 10 are cc:stable and the remainder are for
post-7.1 issues or aren't deemed suitable for backporting.
There's a two-patch MAINTAINERS series from Mike Rapoport which
updates us for the new KEXEC/KDUMP/crash/LUO/etc arrangements. And
another two-patch series from Muchun Song to fix a couple of
memory-hotplug issues. Otherwise singletons, please see the changelogs
for details"
* tag 'mm-hotfixes-stable-2026-05-18-21-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/memory: fix spurious warning when unmapping device-private/exclusive pages
mm: fix __vm_normal_page() to handle missing support for pmd_special()/pud_special()
drivers/base/memory: fix memory block reference leak in poison accounting
mm/memory_hotplug: fix memory block reference leak on remove
lib: kunit_iov_iter: fix test fail on powerpc
mm/page_alloc: fix initialization of tags of the huge zero folio with init_on_free
MAINTAINERS: add kexec@ list to LIVE UPDATE ENTRY
MAINTAINERS: add tree for KDUMP and KEXEC
selftests/mm: run_vmtests.sh: fix destructive tests invocation
scripts/gdb: slab: update field names of struct kmem_cache
scripts/gdb: mm: cast untyped symbols in x86_page_ops
mm/damon: fix damos_stat tracepoint format for sz_applied
mm/damon/sysfs-schemes: call missing mem_cgroup_iter_break()
mm/migrate_device: fix spinlock leak in migrate_vma_insert_huge_pmd_page
|
|
This adds ConfigFS support to USB4/Thunderbolt bus. By itself this just
creates the subsystem but it exposes functions that can be used to
register groups under it.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
This allows the caller to wait for the ring to be empty. We are going to
need this in the upcoming userspace tunneling support.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Add to common header a function that returns size of the ring. This can
be used in the drivers instead of rolling own version.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Instead of the core driver programming fixed value for throttling let
the service drivers to specify the interval if they need this.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
This function can be used outside of thunderbolt networking driver so
move it to the common header.
No functional changes.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
The XDomain properties can be useful for service drivers, for example to
implement a registry for the services they expose. So far there has been
no need for service drivers to specify these but with the USB4STREAM
driver that we are going to use them.
This adds remote and local side properties that the service drivers have
access to. Remote side is read-only but the local side can be changed by
a service driver. Also provide a mechanism to notify the remote side
that there are changes.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
This allows merging one XDomain property directory into another. We are
going to use this in the subsequent patch.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Enable context analysis for struct rt_mutex and annotate all functions that
accept a struct rt_mutex pointer. In the __rt_mutex_lock_common() callers,
instead of adding the __no_context_analysis annotation, emit a runtime
warning if the __rt_mutex_lock_common() return value is not zero and add an
__acquire() statement.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260508174520.1416285-1-bvanassche@acm.org
|
|
Recent optimizations of sd->shared assignment moved to allocating a
single instance of per-CPU sched_domain_shared objects per s_data.
Recent optimizations to select_idle_capacity() moved the sd->shared
assignments to "sd_asym" domain when ASYM_CPUCAPACITY is detected but
cache-aware scheduling mandates the presence of "sd_llc_shared" to
compute and cache per-LLC statistics.
Use an "alloc_flags" union in sched_domain_shared to claim a
sched_domain_shared object per sched_domain. Allocation starts searching
for an available / matching sched_domain_shared instance from the first
CPU of sched_domain_span(sd) (sd can be sd_llc, or sd_asym). If the
shared object is claimed by another domain, the instance corresponding
to next CPU in the domain span is explored until a matching / available
instance is found.
In case of a single CPU in sched_domain_span(), the domain will be
degenerated and a temporary overlap of ->shared objects across different
domains is acceptable.
"alloc_flags" forms a union with "nr_idle_scan" and the stale flags are
left as is when the sd->shared is published. The expectation is for the
first load balancing instance to correct the value just like the current
behavior, except the initial value is no longer 0.
Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Andrea Righi <arighi@nvidia.com>
|
|
Merge the cache aware balancer topic branch.
# Conflicts:
# kernel/sched/topology.c
|
|
Now, that cpu_smt_mask is defined as cpumask_of(cpu) for
CONFIG_SCHED_SMT=n, it is possible to get rid of the ifdeffery.
Effectively,
- This makes sched_smt_present is defined always
- cpumask_weight(cpumask_of(cpu)) == 1. So sched_smt_present_inc/dec
will never enable the sched_smt_present. Which is expected.
- Paths that were compile-time eliminated become runtime guarded
using static keys.
- Defines set_idle_cores, test_idle_cores, etc which could likely benefit
the CONFIG_SCHED_SMT=n systems to use the same optimizations within the
LLC at wakeups.
- This will expose sched_smt_present symbol for CONFIG_SCHED_SMT=n.
Likely not a concern.
- There is a bloat of code CONFIG_SCHED_SMT=n. (NR_CPUS=2048)
add/remove: 24/18 grow/shrink: 26/28 up/down: 6396/-3188 (3208)
Total: Before=30629880, After=30633088, chg +0.01%
- No code bloat for CONFIG_SCHED_SMT=y, which is expected.
- Add comments around stop_core_cpuslocked on why ifdefs are not
removed.
- This leaves the remaining uses of CONFIG_SCHED_SMT mainly for
topology building bits which has a policy based decision.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260515172456.542799-3-sshegde@linux.ibm.com
|
|
Define cpu_smt_mask in case of CONFIG_SCHED_SMT=n as cpumask_of that
CPU. With that config, it is expected that kernel treats each CPU
as individual core. Using cpumask_of(cpu) reflects that.
This would help to get rid of the ifdeffery that is spread across
the codebase since cpu_smt_mask is defined only in case of
CONFIG_SCHED_SMT=y.
Note: There is no arch today which defines cpu_smt_mask unconditionally.
So likely defining the cpu_smt_mask shouldn't lead redefinition errors.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260515172456.542799-2-sshegde@linux.ibm.com
|
|
When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is disabled, sched_clock() is
already assumed to provide stable semantics, but the public header
doesn't provide a sched_clock_stable() stub for that case.
Add a header stub that always returns true and clean up the duplicate
local stub in ring_buffer.c, so callers can use sched_clock_stable()
unconditionally.
Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://patch.msgid.link/56e45338858946cd9581b75c8bd45dd37dba52c5.1778773587.git.cyyzero16@gmail.com
|
|
Reasons:
- lenovo-wmi-* feature work
- an important WMI core fix
|
|
Generating the ARM SMMUv3 and RISC-V invalidation commands optimally
requires some additional details from iommupt:
- leaf_levels_bitmap is used to compute the ARM Range Invalidation
Table Top Level hint
- leaf_levels_bitmap is also used to compute the stride when
generating single invalidations to invalidate once per leaf
- table_levels_bitmap also computes the ARM TTL for future cases when
there are no leaves
Put these under a feature since only two drivers need to calculate
them.
This is also useful for the coming kunit iotlb invalidation test to
know more about what invalidation is happening.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Use in-line member documentation and add some small clarifications to
the members. This is preparation to add more members.
- Note that pgsize is only used by arm-smmuv3
- Note that freelist is only used by iommupt
- Reword queued to emphasize the flush-all behavior
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Some virtual devices like netkit (or ifb) never DMA and never touch frag
contents, they just forward the skb to another device. They are unable
to forward unreadable skbs, however, because they fail to pass TX
validation checks on dev->netmem_tx. The existing two-state
NETMEM_TX_NONE / NETMEM_TX_DMA doesn't give the TX validator enough
information to differentiate devices that will attempt DMA on the
unreadable skb from those that will simply route it untouched.
Add a third mode to the enum so drivers can indicate 1) if they have
netmem TX support, and 2) if they do, whether they are DMA-capable:
NETMEM_TX_NO_DMA - pass-through, device never DMAs
Widen dev->netmem_tx from a 1-bit field to 2 bits to fit the new value,
and declare netkit as NETMEM_TX_NO_DMA. Devmem TX support over these
devices comes in a follow-up patch.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-2-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Devices that support netmem TX previously set dev->netmem_tx = true.
This was checked in validate_xmit_unreadable_skb() to drop unreadable
skbs (skbs with dmabuf-backed frags) before they reach drivers that
would mishandle them or devices that would not have the iommu mappings
for them.
A subsequent patch will introduce a third state for virtual devices
that forward unreadable skbs without ever performing DMA on them. To
prepare for that, convert the boolean dev->netmem_tx into an enum:
NETMEM_TX_NONE - no netmem TX support (drop unreadable skbs)
NETMEM_TX_DMA - full support, device does DMA
Update the existing NIC drivers (bnxt, gve, mlx5, fbnic) and the
validators in net/core to use the new enum. No functional change.
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-1-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The last user was removed in commit 72628da6d634 ("net: ks8851:
Remove ks8851_mll.c") which consolidated the driver into a
common implementation. No file includes this header.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://patch.msgid.link/20260515184531.1515418-1-costa.shul@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a debugfs file exposing per-node DMA pool usage for mlx5_frag_buf
allocations.
# cat /sys/kernel/debug/mlx5/<dev>/frag_buf_dma_pools
node block_size used_blocks allocated_blocks
0 4096 0 0
0 8192 0 0
0 16384 0 0
0 32768 0 0
0 65536 0 0
1 4096 0 0
1 8192 0 0
1 16384 0 0
1 32768 0 0
1 65536 0 0
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260514104925.337570-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce generic helper functions to parse Root Port device tree nodes and
extract common properties like reset GPIOs. This allows multiple PCI host
controller drivers to share the same parsing logic.
Define struct pci_host_port to hold common Root Port properties (currently
only list of PERST# GPIO descriptors) and add pci_host_common_parse_ports()
to parse Root Port nodes from device tree.
Also add the 'ports' list to struct pci_host_bridge to better maintain
parsed Root Port information.
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260422093549.407022-3-sherry.sun@nxp.com
|
|
There is a race condition that, after a task is enqueued
on a runqueue, task_llc(p) may change due to CPU hotplug,
because the llc_id is dynamically allocated and adjusted
at runtime.
Therefore, checking task_llc(p) to determine whether the
task is being dequeued from its preferred LLC is unreliable
and can cause inconsistent values.
To fix this problem, record whether p is enqueued on its
preferred LLC, in order to pair with account_llc_dequeue()
to maintain a consistent nr_pref_llc_running per runqueue.
This bug was reported by sashiko, and the solution was once
suggested by Prateek.
Fixes: 46afe3af7ead ("sched/cache: Track LLC-preferred tasks per runqueue")
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/0c8c6a1571d66792a4d2ff0103ba3cc13e059046.1778703694.git.tim.c.chen@linux.intel.com
|
|
Prateek and Tingyin reported that memory-intensive workloads (such as
stream) can saturate memory bandwidth and caches on the preferred LLC
when sched_cache aggregates too many threads.
To mitigate this, estimate a process's memory footprint by comparing
its NUMA balancing fault statistics to the size of the LLC. If the
footprint exceeds the LLC size, skip cache-aware scheduling.
Note that footprint is only an approximation of the memory footprint,
since the kernel lacks suitable metrics to estimate the real working
set. If a user-provided hint is available in the future, it would be
more accurate. A later patch will allow users to provide a hint to
adjust this threshold.
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Vern Hao <vernhao@tencent.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingyin Duan <tingyin.duan@gmail.com>
Link: https://patch.msgid.link/95cf64a385bcc12f18dcebe9d59e8d3ba8bb318f.1778703694.git.tim.c.chen@linux.intel.com
|