summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2026-03-30RDMA/umem: Use consistent DMA attributes when unmapping entriesLeon Romanovsky
The DMA API expects that mapping and unmapping use the same DMA attributes. The RDMA umem code did not meet this requirement, so fix the mismatch. Fixes: f03d9fadfe13 ("RDMA/core: Add weak ordering dma attr to dma mapping") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30Merge branch 'master' into rdma-nextLeon Romanovsky
Let's bring v7.0-rc6 to the -next branch, so we can merge the DMA attributes fix [1] without merge conflicts. [1] https://lore.kernel.org/all/20260323-umem-dma-attrs-v1-1-d6890f2e6a1e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org> * master: (1688 commits) Linux 7.0-rc6 ...
2026-03-30RDMA: Properly propagate the number of CQEs as unsigned intLeon Romanovsky
Instead of checking whether the number of CQEs is negative or zero, fix the .resize_user_cq() declaration to use unsigned int. This better reflects the expected value range. The sanity check is then handled correctly in ib_uvbers. Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA: Clarify that CQ resize is a user‑space verbLeon Romanovsky
The CQ resize operation is used only by uverbs. Make this explicit. Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA/core: Remove unused ib_resize_cq() implementationLeon Romanovsky
There are no in-kernel users of the CQ resize functionality, so drop it. Link: https://patch.msgid.link/20260318-resize_cq-type-v1-1-b2846ed18846@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA/nldev: Add dellink function pointerZhu Yanjun
Add a dellink function pointer to rdma_link_ops to allow drivers to clean up resources created during newlink. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20260313023058.13020-2-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-30Merge branch '20260125-iris-ubwc-v4-1-1ff30644ac81@oss.qualcomm.com' into ↵Bjorn Andersson
drivers-for-7.1 Merge the new helpers in UBWC driver through a topic branch, to allow them to be shared with display and video branches as well.
2026-03-30soc: qcom: ubwc: add helpers to get programmable valuesDmitry Baryshkov
Currently the database stores macrotile_mode in the data. However it can be derived from the rest of the data: it should be used for UBWC encoding >= 3.0 except for several corner cases (SM8150 and SC8180X). The ubwc_bank_spread field seems to be based on the impreside data we had for the MDSS and DPU programming. In some cases UBWC engine inside the display controller doesn't need to program it, although bank spread is to be enabled. Bank swizzle is also currently stored as is, but it is almost standard (banks 1-3 for UBWC 1.0 and 2-3 for other versions), the only exception being Lemans (it uses only bank 3). Add helpers returning values from the config for now. They will be rewritten later, in a separate series, but having the helper now simplifies refacroring the code later. Tested-by: Wangao Wang <wangao.wang@oss.qualcomm.com> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260125-iris-ubwc-v4-2-1ff30644ac81@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-30soc: qcom: ubwc: add helper to get min_acc lengthDmitry Baryshkov
MDSS and GPU drivers use different approaches to get min_acc length. Add helper function that can be used by all the drivers. The helper reflects our current best guess, it blindly copies the approach adopted by the MDSS drivers and it matches current values selected by the GPU driver. Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Acked-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dikshita Agarwal <dikshita.agarwal@oss.qualcomm.com> Tested-by: Wangao Wang <wangao.wang@oss.qualcomm.com> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260125-iris-ubwc-v4-1-1ff30644ac81@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-30ASoC: Merge up fixesMark Brown
Merge branch 'for-7.0' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into asoc-7.1 for both ASoC and general bug fixes to support testing.
2026-03-30KVM: arm64: Allow userspace to create protected VMs when pKVM is enabledWill Deacon
Introduce a new VM type for KVM/arm64 to allow userspace to request the creation of a "protected VM" when the host has booted with pKVM enabled. For now, this feature results in a taint on first use as many aspects of a protected VM are not yet protected! Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-32-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-30hvc/xen: Check console connection flagJason Andryuk
When the console out buffer is filled, __write_console() will return 0 as it cannot send any data. domU_write_console() will then spin in `while (len)` as len doesn't decrement until xenconsoled attaches. This would block a domU and nullify the parallelism of Hyperlaunch until dom0 userspace starts xenconsoled, which empties the buffer. Xen 4.21 added a connection field to the xen console page. This is set to XENCONSOLE_DISCONNECTED (1) when a domain is built, and xenconsoled will set it to XENCONSOLE_CONNECTED (0) when it connects. Update the hvc_xen driver to check the field. When the field is disconnected, drop the write with -ENOTCONN. We only drop the write when the field is XENCONSOLE_DISCONNECTED (1) to try for maximum compatibility. The Xen toolstack has historically zero initialized the console, so it should see XENCONSOLE_CONNECTED (0) by default. If an implemenation used uninitialized memory, only checking for XENCONSOLE_DISCONNECTED could have the lowest chance of not connecting. This lets the hyperlaunched domU boot without stalling. Once dom0 starts xenconsoled, xl console can be used to access the domU's hvc0. Paritally sync console.h from xen.git to bring in the new field. Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Link: https://patch.msgid.link/20260318235326.14568-1-jason.andryuk@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-30lib/linear_ranges: Add linear_range_get_selector_high_arrayAmit Sunil Dhamne
Add a helper function to find the selector for a given value in a linear range array. The selector should be such that the value it represents should be higher or equal to the given value. Signed-off-by: Amit Sunil Dhamne <amitsd@google.com> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20260325-max77759-charger-v9-4-4486dd297adc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-30mfd: max77759: add register bitmasks and modify irq configs for chargerAmit Sunil Dhamne
Add register bitmasks for charger function. In addition split the charger IRQs further such that each bit represents an IRQ downstream of charger regmap irq chip. In addition populate the ack_base to offload irq ack to the regmap irq chip framework. Signed-off-by: Amit Sunil Dhamne <amitsd@google.com> Reviewed-by: André Draszik <andre.draszik@linaro.org> Link: https://patch.msgid.link/20260325-max77759-charger-v9-3-4486dd297adc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-30usb: translate ENOSPC for user spaceOliver Neukum
In case of insufficient bandwidth usb_submit_urb() returns -ENOSPC. Translating this to -EIO is not optimal. There are insufficient resources not an error. EBUSY is a better fit. Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/20260325145537.372993-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-30USB: uapi: add BULK_MAX_PACKET_UPDATEOliver Neukum
The spec for Embedded USB2 Version 2.0 adds a new feature request. This needs to be added to uapi for monitoring. Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/20260319144715.2957358-2-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-30usb: uapi: add usb 3.0 authentication declarationsOliver Neukum
This adds the USB authentication extensions to the uapi chapter 9 declarations, so that user space tools correctly operate on the descriptor and commands. This is necessary for sniffing and debugging in gadget mode to correctly work, even though the kernel does not use these requests in host mode. Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://patch.msgid.link/20260319144715.2957358-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-30dt-bindings: clock: qcom: Add SM8750 GPU clocksKonrad Dybcio
The SM8750 features a "traditional" GPU_CC block, much of which is controlled through the GMU microcontroller. GPU_CC block requires the MX and CX rail control and thus add the corresponding power-domains and require-opps. Additionally, there's an separate GX_CC block, where the GX GDSC is moved. Update the bindings to accommodate for SM8750 SoC. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260305-gpucc_sm8750_v2-v5-1-78292b40b053@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-30dt-bindings: clock: qcom: Add CMN PLL support for IPQ8074John Crispin
The CMN PLL block in the IPQ8074 SoC takes 48 MHz as the reference input clock. Its output clocks are the bias_pll_cc_clk (300 MHz) and bias_pll_nss_noc_clk (416.5 MHz) clocks used by the networking subsystem. Add the related compatible for IPQ8074 to the ipq9574-cmn-pll generic schema. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260311183942.10134-4-ansuelsmth@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-30dt-bindings: clock: qcom: Add CMN PLL support for IPQ6018John Crispin
The CMN PLL block in the IPQ6018 SoC takes 48 MHz as the reference input clock. Its output clocks are the bias_pll_cc_clk (300 MHz) and bias_pll_nss_noc_clk (416.5 MHz) clocks used by the networking subsystem. Add the related compatible for IPQ6018 to the ipq9574-cmn-pll generic schema. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260311183942.10134-2-ansuelsmth@gmail.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-30dt-bindings: arm: qcom,ids: Add SoC ID for SA8650PLei wang
Add unique ID for Qualcomm SA8650P SoC. Signed-off-by: Lei wang <quic_leiwan@quicinc.com> Signed-off-by: Radu Rendec <rrendec@redhat.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260321152307.9131-2-rrendec@redhat.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-30dax: export dax_dev_get()John Groves
famfs needs to look up a dax_device by dev_t when resolving fmap entries that reference character dax devices. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: John Groves <john@groves.net> Link: https://patch.msgid.link/0100019d311daab5-bb212f0b-4e05-4668-bf53-d76fab56be68-000000@email.amazonses.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2026-03-30dax: Add fs_dax_get() func to prepare dax for fs-dax usageJohn Groves
The fs_dax_get() function should be called by fs-dax file systems after opening a fsdev dax device. This adds holder_operations, which provides a memory failure callback path and effects exclusivity between callers of fs_dax_get(). fs_dax_get() is specific to fsdev_dax, so it checks the driver type (which required touching bus.[ch]). fs_dax_get() fails if fsdev_dax is not bound to the memory. This function serves the same role as fs_dax_get_by_bdev(), which dax file systems call after opening the pmem block device. This can't be located in fsdev.c because struct dax_device is opaque there. This will be called by fs/fuse/famfs.c in a subsequent commit. Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: John Groves <john@groves.net> Link: https://patch.msgid.link/0100019d311d8750-75395c22-031b-4d5f-aebe-790dca656b87-000000@email.amazonses.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2026-03-30dax: Add dax_set_ops() for setting dax_operations at bind timeJohn Groves
Add a new dax_set_ops() function that allows drivers to set the dax_operations after the dax_device has been allocated. This is needed for fsdev_dax where the operations need to be set during probe and cleared during unbind. The fsdev driver uses devm_add_action_or_reset() for cleanup consistency, avoiding the complexity of mixing devm-managed resources with manual cleanup in a remove() callback. This ensures cleanup happens automatically in the correct reverse order when the device is unbound. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: John Groves <john@groves.net> Link: https://patch.msgid.link/0100019d311d65a0-b9c1419e-f3a0-4afd-b0bd-848f18ff5950-000000@email.amazonses.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2026-03-30dax: Factor out dax_folio_reset_order() helperJohn Groves
Both fs/dax.c:dax_folio_put() and drivers/dax/fsdev.c: fsdev_clear_folio_state() (the latter coming in the next commit after this one) contain nearly identical code to reset a compound DAX folio back to order-0 pages. Factor this out into a shared helper function. The new dax_folio_reset_order() function: - Clears the folio's mapping and share count - Resets compound folio state via folio_reset_order() - Clears PageHead and compound_head for each sub-page - Restores the pgmap pointer for each resulting order-0 folio - Returns the original folio order (for callers that need to advance by that many pages) Two intentional differences from the original dax_folio_put() logic: 1. folio->share is cleared unconditionally. This is correct because the DAX subsystem maintains the invariant that share != 0 only when mapping == NULL (enforced by dax_folio_make_shared()). dax_folio_put() ensures share has reached zero before calling this helper, so the unconditional clear is safe. 2. folio->pgmap is now explicitly restored for order-0 folios. For the dax_folio_put() caller this is a no-op (reads and writes back the same field). It is intentional for the upcoming fsdev_clear_folio_state() caller, which converts previously-compound folios and needs pgmap re-established for all pages regardless of order. This simplifies fsdev_clear_folio_state() from ~50 lines to ~15 lines. Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: John Groves <john@groves.net> Link: https://patch.msgid.link/0100019d311cc6b9-5be7428a-7f16-4774-8f90-a44b88ac5660-000000@email.amazonses.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2026-03-30powercap: intel_rapl: Remove unused AVERAGE_POWER primitiveKuppuswamy Sathyanarayanan
The AVERAGE_POWER primitive and RAPL_PRIMITIVE_DERIVED flag are not used anywhere in the code. Remove them to simplify the primitive handling logic. No functional changes. Co-developed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20260313185333.2370733-2-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-03-30powercap: correct kernel-doc function parameter namesRandy Dunlap
Use the correct function parameter names in kernel-doc comments to avoid these warnings: Warning: include/linux/powercap.h:254 function parameter 'name' not described in 'powercap_register_control_type' Warning: include/linux/powercap.h:298 function parameter 'nr_constraints' not described in 'powercap_register_zone' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260312051444.685136-1-rdunlap@infradead.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-03-30crypto/ccp: Implement SNP x86 shutdownTycho Andersen (AMD)
The SEV firmware has support to disable SNP during an SNP_SHUTDOWN_EX command. Verify that this support is available and set the flag so that SNP is disabled when it is not being used. In cases where SNP is disabled, skip the call to amd_iommu_snp_disable(), as all of the IOMMU pages have already been made shared. Also skip the panic case, since snp_shutdown() does IPIs. Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://patch.msgid.link/20260324161301.1353976-7-tycho@kernel.org
2026-03-30Merge drm/drm-fixes into drm-misc-next-fixesMaxime Ripard
Boris needs 7.0-rc6 for a shmem helper fix. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2026-03-29SUNRPC: xdr.h: fix all kernel-doc warningsRandy Dunlap
Correct a function parameter name (s/page/folio/) and add function return value sections for multiple functions to eliminate kernel-doc warnings: Warning: include/linux/sunrpc/xdr.h:298 function parameter 'folio' not described in 'xdr_set_scratch_folio' Warning: include/linux/sunrpc/xdr.h:337 No description found for return value of 'xdr_stream_remaining' Warning: include/linux/sunrpc/xdr.h:357 No description found for return value of 'xdr_align_size' Warning: include/linux/sunrpc/xdr.h:374 No description found for return value of 'xdr_pad_size' Warning: include/linux/sunrpc/xdr.h:387 No description found for return value of 'xdr_stream_encode_item_present' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Add Write chunk WRs to the RPC's Send WR chainChuck Lever
Previously, Write chunk RDMA Writes were posted via a separate ib_post_send() call with their own completion handler. Each Write chunk incurred a doorbell and generated a completion event. Link Write chunk WRs onto the RPC Reply's Send WR chain so that a single ib_post_send() call posts both the RDMA Writes and the Send WR. A single completion event signals that all operations have finished. This reduces both doorbell rate and completion rate, as well as eliminating the latency of a round-trip between the Write chunk completion and the subsequent Send WR posting. The lifecycle of Write chunk resources changes: previously, the svc_rdma_write_done() completion handler released Write chunk resources when RDMA Writes completed. With WR chaining, resources remain live until the Send completion. A new sc_write_info_list tracks Write chunk metadata attached to each Send context, and svc_rdma_write_chunk_release() frees these resources when the Send context is released. The svc_rdma_write_done() handler now handles only error cases. On success it returns immediately since the Send completion handles resource release. On failure (WR flush), it closes the connection to signal to the client that the RPC Reply is incomplete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Add fair queuing for Send Queue accessChuck Lever
When the Send Queue fills, multiple threads may wait for SQ slots. The previous implementation had no ordering guarantee, allowing starvation when one thread repeatedly acquires slots while others wait indefinitely. Introduce a ticket-based fair queuing system. Each waiter takes a ticket number and is served in FIFO order. This ensures forward progress for all waiters when SQ capacity is constrained. The implementation has two phases: 1. Fast path: attempt to reserve SQ slots without waiting 2. Slow path: take a ticket, wait for turn, then wait for slots The ticket system adds two atomic counters to the transport: - sc_sq_ticket_head: next ticket to issue - sc_sq_ticket_tail: ticket currently being served A dedicated wait queue (sc_sq_ticket_wait) handles ticket ordering, separate from sc_send_wait which handles SQ capacity. This separation ensures that send completions (the high-frequency wake source) wake only the current ticket holder rather than all queued waiters. Ticket handoff wakes only the ticket wait queue, and each ticket holder that exits via connection close propagates the wake to the next waiter in line. When a waiter successfully reserves slots, it advances the tail counter and wakes the next waiter. This creates an orderly handoff that prevents starvation while maintaining good throughput on the fast path when contention is low. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Optimize rq_respages allocation in svc_alloc_argChuck Lever
svc_alloc_arg() invokes alloc_pages_bulk() with the full rq_maxpages count (~259 for 1MB messages) for the rq_respages array, causing a full-array scan despite most slots holding valid pages. svc_rqst_release_pages() NULLs only the range [rq_respages, rq_next_page) after each RPC, so only that range contains NULL entries. Limit the rq_respages fill in svc_alloc_arg() to that range instead of scanning the full array. svc_init_buffer() initializes rq_next_page to span the entire rq_respages array, so the first svc_alloc_arg() call fills all slots. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Track consumed rq_pages entriesChuck Lever
The rq_pages array holds pages allocated for incoming RPC requests. Two transport receive paths NULL entries in rq_pages to prevent svc_rqst_release_pages() from freeing pages that the transport has taken ownership of: - svc_tcp_save_pages() moves partial request data pages to svsk->sk_pages during multi-fragment TCP reassembly. - svc_rdma_clear_rqst_pages() moves request data pages to head->rc_pages because they are targets of active RDMA Read WRs. A new rq_pages_nfree field in struct svc_rqst records how many entries were NULLed. svc_alloc_arg() uses it to refill only those entries rather than scanning the full rq_pages array. In steady state, the transport NULLs a handful of entries per RPC, so the allocator visits only those entries instead of the full ~259 slots (for 1MB messages). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Allocate a separate Reply page arrayChuck Lever
struct svc_rqst uses a single dynamically-allocated page array (rq_pages) for both the incoming RPC Call message and the outgoing RPC Reply message. rq_respages is a sliding pointer into rq_pages that each transport receive path must compute based on how many pages the Call consumed. This boundary tracking is a source of confusion and bugs, and prevents an RPC transaction from having both a large Call and a large Reply simultaneously. Allocate rq_respages as its own page array, eliminating the boundary arithmetic. This decouples Call and Reply buffer lifetimes, following the precedent set by rq_bvec (a separate dynamically- allocated array for I/O vectors). Each svc_rqst now pins twice as many pages as before. For a server running 16 threads with a 1MB maximum payload, the additional cost is roughly 16MB of pinned memory. The new dynamic svc thread count facility keeps this overhead minimal on an idle server. A subsequent patch in this series limits per-request repopulation to only the pages released during the previous RPC, avoiding a full-array scan on each call to svc_alloc_arg(). Note: We've considered several alternatives to maintaining a full second array. Each alternative reintroduces either boundary logic complexity or I/O-path allocation pressure. rq_next_page is initialized in svc_alloc_arg() and svc_process() during Reply construction, and in svc_rdma_recvfrom() as a precaution on error paths. Transport receive paths no longer compute it from the Call size. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29NFSD/export: Add sign_fh export optionBenjamin Coddington
In order to signal that filehandles on this export should be signed, add a "sign_fh" export option. Filehandle signing can help the server defend against certain filehandle guessing attacks. Setting the "sign_fh" export option sets NFSEXP_SIGN_FH. In a future patch NFSD uses this signal to append a MAC onto filehandles for that export. While we're in here, tidy a few stray expflags to more closely align to the export flag order. Link: https://lore.kernel.org/linux-nfs/cover.1772022373.git.bcodding@hammerspace.com Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29NFSD: Add a key for signing filehandlesBenjamin Coddington
A future patch will enable NFSD to sign filehandles by appending a Message Authentication Code(MAC). To do this, NFSD requires a secret 128-bit key that can persist across reboots. A persisted key allows the server to accept filehandles after a restart. Enable NFSD to be configured with this key via the netlink interface. Link: https://lore.kernel.org/linux-nfs/cover.1772022373.git.bcodding@hammerspace.com Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: split cache_detail queue into request and reader listsJeff Layton
Replace the single interleaved queue (which mixed cache_request and cache_reader entries distinguished by a ->reader flag) with two dedicated lists: cd->requests for upcall requests and cd->readers for open file handles. Readers now track their position via a monotonically increasing sequence number (next_seqno) rather than by their position in the shared list. Each cache_request is assigned a seqno when enqueued, and a new cache_next_request() helper finds the next request at or after a given seqno. This eliminates the cache_queue wrapper struct entirely, simplifies the reader-skipping loops in cache_read/cache_poll/cache_ioctl/ cache_release, and makes the data flow easier to reason about. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: convert queue_wait from global to per-cache-detail waitqueueJeff Layton
The queue_wait waitqueue is currently a file-scoped global, so a wake_up for one cache_detail wakes pollers on all caches. Convert it to a per-cache-detail field so that only pollers on the relevant cache are woken. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: convert queue_lock from global spinlock to per-cache-detail lockJeff Layton
The global queue_lock serializes all upcall queue operations across every cache_detail instance. Convert it to a per-cache-detail spinlock so that different caches (e.g. auth.unix.ip vs nfsd.fh) no longer contend with each other on queue operations. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: Add XPT flags missing from SVC_XPRT_FLAG_LISTChuck Lever
Commit eccbbc7c00a5 ("nfsd: don't use sv_nrthreads in connection limiting calculations.") and commit 898374fdd7f0 ("nfsd: unregister with rpcbind when deleting a transport") added new XPT flags but neglected to update the show_svc_xprt_flags() macro. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29Documentation: Add the RPC language description of NLM version 4Chuck Lever
In order to generate source code to encode and decode NLMv4 protocol elements, include a copy of the RPC language description of NLMv4 for xdrgen to process. The language description is an amalgam of RFC 1813 and the Open Group's XNFS specification: https://pubs.opengroup.org/onlinepubs/9629799/chap10.htm The C code committed here was generated from the new nlm4.x file using tools/net/sunrpc/xdrgen/xdrgen. The goals of replacing hand-written XDR functions with ones that are tool-generated are to improve memory safety and make XDR encoding and decoding less brittle to maintain. The xdrgen utility derives both the type definitions and the encode/decode functions directly from protocol specifications, using names and symbols familiar to anyone who knows those specs. Unlike hand-written code that can inadvertently diverge from the specification, xdrgen guarantees that the generated code matches the specification exactly. We would eventually like xdrgen to generate Rust code as well, making the conversion of the kernel's NFS stacks to use Rust just a little easier for us. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29NFSD: Enforce timeout on layout recall and integrate lease manager fencingDai Ngo
When a layout conflict triggers a recall, enforcing a timeout is necessary to prevent excessive nfsd threads from being blocked in __break_lease ensuring the server continues servicing incoming requests efficiently. This patch introduces a new function to lease_manager_operations: lm_breaker_timedout: Invoked when a lease recall times out and is about to be disposed of. This function enables the lease manager to inform the caller whether the file_lease should remain on the flc_list or be disposed of. For the NFSD lease manager, this function now handles layout recall timeouts. If the layout type supports fencing and the client has not been fenced, a fence operation is triggered to prevent the client from accessing the block device. While the fencing operation is in progress, the conflicting file_lease remains on the flc_list until fencing is complete. This guarantees that no other clients can access the file, and the client with exclusive access is properly blocked before disposal. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: Fix compilation error (`make W=1`) when dprintk() is no-opAndy Shevchenko
Clang compiler is not happy about set but unused variables: .../flexfilelayout/flexfilelayoutdev.c:56:9: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] .../flexfilelayout/flexfilelayout.c:1505:6: error: variable 'err' set but not used [-Werror,-Wunused-but-set-variable] .../nfs4proc.c:9244:12: error: variable 'ptr' set but not used [-Werror,-Wunused-but-set-variable] Fix these by forwarding parameters of dprintk() to no_printk(). The positive side-effect is a format-string checker enabled even for the cases when dprintk() is no-op. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Fixes: fc931582c260 ("nfs41: create_session operation") Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: Kill RPC_IFDEBUG()Andy Shevchenko
RPC_IFDEBUG() is used in only two places. In one the user of the definition is guarded by ifdeffery, in the second one it's implied due to dprintk() usage. Kill the macro and move the ifdeffery to the regular condition with the variable defined inside, while in the second case add the same conditional and move the respective code there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29lockd: Make linux/lockd/nlm.h an internal headerChuck Lever
The NLM protocol constants and status codes in nlm.h are needed only by lockd's internal implementation. NFS client code and NFSD interact with lockd through the stable API in bind.h and have no direct use for protocol-level definitions. Exposing these definitions globally via bind.h creates unnecessary coupling between lockd internals and its consumers. Moving nlm.h from include/linux/lockd/ to fs/lockd/ clarifies the API boundary: bind.h provides the lockd service interface, while nlm.h remains available only to code within fs/lockd/ that implements the protocol. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29lockd: Move xdr.h from include/linux/lockd/ to fs/lockd/Chuck Lever
The lockd subsystem unnecessarily exposes internal NLM XDR type definitions through the global include path. These definitions are not used by any code outside fs/lockd/, making them inappropriate for include/linux/lockd/. Moving xdr.h to fs/lockd/ narrows the API surface and clarifies that these types are internal implementation details. The comment in linux/lockd/bind.h stating xdr.h was needed for "xdr-encoded error codes" is stale: no lockd API consumers use those codes. Forward declarations for struct nfs_fh and struct file_lock are added to bind.h because their definitions were previously pulled in transitively through xdr.h. Additionally, nfs3proc.c and proc.c need explicit includes of filelock.h for FL_CLOSE and for accessing struct file_lock members, respectively. Built and tested with lockd client/server operations. No functional change. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29lockd: Remove lockd/debug.hChuck Lever
The lockd include structure has unnecessary indirection. The header include/linux/lockd/debug.h is consumed only by fs/lockd/lockd.h, creating an extra compilation dependency and making the code harder to navigate. Fold the debug.h definitions directly into lockd.h and remove the now-redundant header. This reduces the include tree depth and makes the debug-related definitions easier to find when working on lockd internals. Build-tested with lockd built as module and built-in. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29lockd: Relocate include/linux/lockd/lockd.hChuck Lever
Headers placed in include/linux/ form part of the kernel's internal API and signal to subsystem maintainers that other parts of the kernel may depend on them. By moving lockd.h into fs/lockd/, lockd becomes a more self-contained module whose internal interfaces are clearly distinguished from its public contract with the rest of the kernel. This relocation addresses a long-standing XXX comment in the header itself that acknowledged the file's misplacement. Future changes to lockd internals can now proceed with confidence that external consumers are not inadvertently coupled to implementation details. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29lockd: Move share.h from include/linux/lockd/ to fs/lockd/Chuck Lever
The share.h header defines struct nlm_share and declares the DOS share management functions used by the NLM server to implement NLM_SHARE and NLM_UNSHARE operations. These interfaces are used exclusively within the lockd subsystem. A git grep search confirms no external code references them. Relocating this header from include/linux/lockd/ to fs/lockd/ narrows the public API surface of the lockd module. Out-of-tree code cannot depend on these internal interfaces after this change. Future refactoring of the share management implementation thus requires no consideration of external consumers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>