summaryrefslogtreecommitdiff
path: root/net/sunrpc
AgeCommit message (Collapse)Author
2026-05-26Merge tag 'nfsd-7.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Regressions: - Tighten bounds checking for sunrpc cache hash tables - Don't report key material in the ftrace log Stable fix: - Fix lockd's implementation of the NLM TEST procedure" * tag 'nfsd-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: lockd: fix TEST handling when not all permissions are available. NFSD: Report whether fh_key was actually updated sunrpc: prevent out-of-bounds read in __cache_seq_start()
2026-05-21sunrpc: prevent out-of-bounds read in __cache_seq_start()Chuck Lever
Commit 7b546bd89975 ("sunrpc/cache: improve RCU safety in cache_list walking.") changed the tail of __cache_seq_start() to unconditionally store *pos = ((long long)hash << 32) + 1 before returning, dropping a prior "hash >= cd->hash_size" guard. When the while loop exits because every remaining bucket was empty, hash equals cd->hash_size, so the stored *pos is one position past the table's last valid bucket. seq_read_iter() caches that index in m->index. A subsequent pread(2) at the same file offset skips traverse() and hands the stored index back to __cache_seq_start(), which decodes hash = cd->hash_size and dereferences cd->hash_table[cd->hash_size] -- one hlist_head past the end of the kzalloc'd table. KASAN reports an eight-byte slab-out-of-bounds read at the tail of the 2048-byte hash_table allocation for the NFSD export cache (EXPORT_HASHMAX * sizeof(struct hlist_head) == 256 * 8). Reject an input hash that is out of range before touching the hash table. cache_seq_next() already bounds-checks its own loop; the start routine needs to be symmetric. Reported-by: syzbot+60cfa08822470bbebe44@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=60cfa08822470bbebe44 Fixes: 7b546bd89975 ("sunrpc/cache: improve RCU safety in cache_list walking.") Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-05-15Merge tag 'nfsd-7.1-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Fixes for this release: - Correctness fix for the new sunrpc cache netlink protocol Marked for stable: - Correctness fixes for delegated attributes - Prevent an infinite loop when revoking layouts" * tag 'nfsd-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix infinite loop in layout state revocation sunrpc: start cache request seqno at 1 to fix netlink GET_REQS nfsd: update mtime/ctime on COPY in presence of delegated attributes nfsd: update mtime/ctime on CLONE in presense of delegated attributes nfsd: fix file change detection in CB_GETATTR nfsd: fix GET_DIR_DELEGATION when VFS leases are disabled
2026-05-10sunrpc: start cache request seqno at 1 to fix netlink GET_REQSJeff Layton
sunrpc_cache_requests_snapshot() filters requests with crq->seqno <= min_seqno. The min_seqno for the first netlink dump call is cb->args[0] which is 0. Since next_seqno was initialized to 0, the very first cache request got seqno=0 and was silently skipped by the snapshot (0 <= 0 is true). This caused netlink-based GET_REQS to return 0 pending requests even when a request was queued, preventing mountd from resolving cache entries (particularly expkey/nfsd.fh). The unresolved CACHE_PENDING state blocked all further notifications for the entry, leading to permanent NFS4ERR_DELAY hangs. Start next_seqno at 1 so all requests have seqno >= 1 and pass the snapshot filter when min_seqno is 0. Fixes: facc4e3c8042 ("sunrpc: split cache_detail queue into request and reader lists") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-24Merge tag 'nfs-for-7.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Bugfixes: - Fix handling of ENOSPC so that if we have to resend writes, they are written synchronously - SUNRPC RDMA transport fixes from Chuck - Several fixes for delegated timestamps in NFSv4.2 - Failure to obtain a directory delegation should not cause stat() to fail with NFSv4 - Rename was failing to update timestamps when a directory delegation is held on NFSv4 - Ensure we check rsize/wsize after crossing a NFSv4 filesystem boundary - NFSv4/pnfs: - If the server is down, retry the layout returns on reboot - Fallback to MDS could result in a short write being incorrectly logged Cleanups: - Use memcpy_and_pad in decode_fh" * tag 'nfs-for-7.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (21 commits) NFS: Fix RCU dereference of cl_xprt in nfs_compare_super_address NFS: remove redundant __private attribute from nfs_page_class NFSv4.2: fix CLONE/COPY attrs in presence of delegated attributes NFS: fix writeback in presence of errors nfs: use memcpy_and_pad in decode_fh NFSv4.1: Apply session size limits on clone path NFSv4: retry GETATTR if GET_DIR_DELEGATION failed NFS: fix RENAME attr in presence of directory delegations pnfs/flexfiles: validate ds_versions_cnt is non-zero NFS/blocklayout: print each device used for SCSI layouts xprtrdma: Post receive buffers after RPC completion xprtrdma: Scale receive batch size with credit window xprtrdma: Replace rpcrdma_mr_seg with xdr_buf cursor xprtrdma: Decouple frwr_wp_create from frwr_map xprtrdma: Close lost-wakeup race in xprt_rdma_alloc_slot xprtrdma: Avoid 250 ms delay on backlog wakeup xprtrdma: Close sendctx get/put race that can block a transport nfs: update inode ctime after removexattr operation nfs: fix utimensat() for atime with delegated timestamps NFS: improve "Server wrote zero bytes" error ...
2026-04-20Merge tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: - filehandle signing to defend against filehandle-guessing attacks (Benjamin Coddington) The server now appends a SipHash-2-4 MAC to each filehandle when the new "sign_fh" export option is enabled. NFSD then verifies filehandles received from clients against the expected MAC; mismatches return NFS error STALE - convert the entire NLMv4 server-side XDR layer from hand-written C to xdrgen-generated code, spanning roughly thirty patches (Chuck Lever) XDR functions are generally boilerplate code and are easy to get wrong. The goals of this conversion are improved memory safety, lower maintenance burden, and groundwork for eventual Rust code generation for these functions. - improve pNFS block/SCSI layout robustness with two related changes (Dai Ngo) SCSI persistent reservation fencing is now tracked per client and per device via an xarray, to avoid both redundant preempt operations on devices already fenced and a potential NFSD deadlock when all nfsd threads are waiting for a layout return. - scalability and infrastructure improvements Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.1 NFSD development cycle. * tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (83 commits) NFSD: Docs: clean up pnfs server timeout docs nfsd: fix comment typo in nfsxdr nfsd: fix comment typo in nfs3xdr NFSD: convert callback RPC program to per-net namespace NFSD: use per-operation statidx for callback procedures svcrdma: Use contiguous pages for RDMA Read sink buffers SUNRPC: Add svc_rqst_page_release() helper SUNRPC: xdr.h: fix all kernel-doc warnings svcrdma: Factor out WR chain linking into helper svcrdma: Add Write chunk WRs to the RPC's Send WR chain svcrdma: Clean up use of rdma->sc_pd->device svcrdma: Clean up use of rdma->sc_pd->device in Receive paths svcrdma: Add fair queuing for Send Queue access SUNRPC: Optimize rq_respages allocation in svc_alloc_arg SUNRPC: Track consumed rq_pages entries svcrdma: preserve rq_next_page in svc_rdma_save_io_pages SUNRPC: Handle NULL entries in svc_rqst_release_pages SUNRPC: Allocate a separate Reply page array SUNRPC: Tighten bounds checking in svc_rqst_replace_page NFSD: Sign filehandles ...
2026-04-13xprtrdma: Post receive buffers after RPC completionChuck Lever
rpcrdma_post_recvs() runs in CQ poll context and its cost falls on the latency-critical path between polling a Receive completion and waking the RPC consumer. Every cycle spent refilling the Receive Queue delays delivery of the reply to the NFS layer. Move the rpcrdma_post_recvs() call in rpcrdma_reply_handler() to after the RPC has been decoded and completed. The larger batch size from the preceding patch provides sufficient Receive Queue headroom to absorb the brief delay before buffers are replenished. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-04-13xprtrdma: Scale receive batch size with credit windowChuck Lever
The fixed RPCRDMA_MAX_RECV_BATCH of 7 results in frequent small ib_post_recv batches during high-rate workloads. With a 128-slot credit window, receives are reposted every 7th completion, each batch incurring atomic serialization and a doorbell write. Replace the fixed batch constant with a per-endpoint value scaled to 25% of the negotiated credit window. For a typical 128-credit connection this raises the batch from 7 to 32, reducing doorbell frequency by roughly 4x and amortizing the per-batch atomic and MMIO costs over a larger group of receive WRs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-04-13xprtrdma: Replace rpcrdma_mr_seg with xdr_buf cursorChuck Lever
The FRWR registration path converts data through three representations: xdr_buf -> rpcrdma_mr_seg[] -> scatterlist[] -> ib_map_mr_sg(). The rpcrdma_mr_seg intermediate is a relic of when multiple registration strategies existed (FMR, physical, FRWR). Only FRWR remains, so this indirection and the 6240-byte rl_segments[260] array embedded in each rpcrdma_req serve no purpose. Introduce struct rpcrdma_xdr_cursor to track position within an xdr_buf during iterative MR registration. Rewrite frwr_map to populate scatterlist entries directly from the xdr_buf regions (head kvec, page list, tail kvec). The boundary logic for non-SG_GAPS devices is simpler because the xdr_buf structure guarantees that page-region entries after the first start at offset 0, and that head/tail kvecs are separate regions that naturally break at MR boundaries. Fix a pre-existing bug in rpcrdma_encode_write_list where the write-pad statistics accumulator added mr->mr_length from the last data MR rather than the write-pad MR. The refactored code uses ep->re_write_pad_mr->mr_length. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-04-13xprtrdma: Decouple frwr_wp_create from frwr_mapChuck Lever
frwr_wp_create is the only caller of frwr_map outside the encode path. It registers a single 4-byte write-pad region from a stack- local rpcrdma_mr_seg. Inlining the registration logic directly (sg_init_table + sg_set_page + ib_dma_map_sg + ib_map_mr_sg + IOVA mangle + reg_wr setup) eliminates the coupling that would otherwise complicate the removal of rpcrdma_mr_seg from frwr_map's interface. The inlined version adds a proper error-unwind ladder: on failure, the DMA mapping (if established) is released, ep->re_write_pad_mr is cleared, and the MR is returned to the transport free list. The old frwr_map-based code relied on rpcrdma_mrs_destroy at teardown to reclaim partially-initialized MRs. This is a one-time setup path; duplicating ~20 lines is a reasonable tradeoff for decoupling the write-pad registration from the data- path MR registration. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-04-13xprtrdma: Close lost-wakeup race in xprt_rdma_alloc_slotChuck Lever
xprt_rdma_alloc_slot() and xprt_rdma_free_slot() lack serialization between the buffer pool and the backlog queue. A buffer freed after rpcrdma_buffer_get() finds the pool empty but before rpc_sleep_on() places the task on the backlog is returned to the pool with no waiter to wake, leaving the task stuck on the backlog indefinitely. After joining the backlog, re-check the pool and route any recovered buffer through xprt_wake_up_backlog(), whose queue lock serializes with concurrent wakeups and avoids double-assignment of slots. Because xprt_rdma_free_slot() does not hold reserve_lock, the XPRT_CONGESTED double-check in xprt_throttle_congested() is ineffective: a task can join the backlog through that path after free_slot has already found it empty and cleared the bit. Avoid this by using xprt_add_backlog_noncongested(), which queues the task without setting XPRT_CONGESTED, so every allocation reaches xprt_rdma_alloc_slot() and its post-sleep re-check. Fixes: edb41e61a54e ("xprtrdma: Make rpc_rqst part of rpcrdma_req") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-04-13xprtrdma: Avoid 250 ms delay on backlog wakeupChuck Lever
Commit a721035477fb ("SUNRPC/xprt: async tasks mustn't block waiting for memory") changed xprt_rdma_alloc_slot() to set tk_status to -ENOMEM so that call_reserveresult() would sleep HZ/4 before retrying. That rationale applies to xprt_dynamic_alloc_slot(), where an immediate retry under memory pressure wastes CPU, but not to the RDMA backlog path: a task woken from the backlog has a slot waiting for it, so the 250 ms rpc_delay adds latency without benefit. This also aligns the code with the existing kernel-doc for xprt_rdma_alloc_slot(), which already documented %-EAGAIN. Fixes: a721035477fb ("SUNRPC/xprt: async tasks mustn't block waiting for memory") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-04-13xprtrdma: Close sendctx get/put race that can block a transportChuck Lever
rpcrdma_sendctx_get_locked() and rpcrdma_sendctx_put_locked() can race in a way that leaves XPRT_WRITE_SPACE set permanently, blocking all further sends on the transport: get_locked put_locked (Send completion) ---------- -------------------------- read rb_sc_tail -> ring full advance rb_sc_tail xprt_write_space(): test_bit(WRITE_SPACE) -> not set, return set_bit(WRITE_SPACE) return NULL (-EAGAIN) After the sender releases XPRT_LOCKED, the release path refuses to wake the next task because XPRT_WRITE_SPACE is set. The sender retries, finds XPRT_WRITE_SPACE still set, and sleeps on xprt_sending. No further Send completions arrive to clear the flag because no new Sends can be posted. With nconnect, the stalled transport's share of congestion credits are never returned, starving the remaining transports as well. Fixes: 05eb06d86685 ("xprtrdma: Fix occasional transport deadlock") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-04-09kernfs: pass struct ns_common instead of const void * for namespace tagsChristian Brauner
kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void *. Replace all const void * namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-03svcrdma: Use contiguous pages for RDMA Read sink buffersChuck Lever
svc_rdma_build_read_segment() constructs RDMA Read sink buffers by consuming pages one-at-a-time from rq_pages[] and building one bvec per page. A 64KB NFS READ payload produces 16 separate bvecs, 16 DMA mappings, and potentially multiple RDMA Read WRs (on platforms with 4KB pages). A single higher-order allocation followed by split_page() yields physically contiguous memory while preserving per-page refcounts. A single bvec spanning the contiguous range causes rdma_rw_ctx_init_bvec() to take the rdma_rw_init_single_wr_bvec() fast path: one DMA mapping, one SGE, one WR. The split sub-pages replace the original rq_pages[] entries, so all downstream page tracking, completion handling, and xdr_buf assembly remain unchanged. Allocation uses __GFP_NORETRY | __GFP_NOWARN and falls back through decreasing orders. If even order-1 fails, the existing per-page path handles the segment. When nr_pages is not a power of two, get_order() rounds up and the allocation yields more pages than needed. The extra split pages replace existing rq_pages[] entries (freed via put_page() first), so there is no net increase in per- request page consumption. Successive segments reuse the same padding slots, preventing accumulation. The rq_maxpages guard rejects any allocation that would overrun the array, falling back to the per-page path. Under memory pressure, __GFP_NORETRY causes the higher- order allocation to fail without stalling. The contiguous path is attempted when the segment starts page-aligned (rc_pageoff == 0) and spans at least two pages. NFS WRITE segments carry application-modified byte ranges of arbitrary length, so the optimization is not restricted to power-of-two page counts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03SUNRPC: Add svc_rqst_page_release() helperChuck Lever
svc_rqst_replace_page() releases displaced pages through a per-rqst folio batch, but exposes the add-or-flush sequence directly. svc_tcp_restore_pages() releases displaced pages individually with put_page(). Introduce svc_rqst_page_release() to encapsulate the batched release mechanism. Convert svc_rqst_replace_page() and svc_tcp_restore_pages() to use it. The latter now benefits from the same batched release that svc_rqst_replace_page() already uses. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Factor out WR chain linking into helperChuck Lever
svc_rdma_prepare_write_chunk() and svc_rdma_prepare_reply_chunk() contain identical code for linking RDMA R/W work requests onto a Send context's WR chain. This duplication increases maintenance burden and risks divergent bug fixes. Introduce svc_rdma_cc_link_wrs() to consolidate the WR chain linking logic. The helper walks the chunk context's rwctxts list, chains each WR via rdma_rw_ctx_wrs(), and updates the Send context's chain head and SQE count. Completion signaling is requested only for the tail WR (posted first). No functional change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Add Write chunk WRs to the RPC's Send WR chainChuck Lever
Previously, Write chunk RDMA Writes were posted via a separate ib_post_send() call with their own completion handler. Each Write chunk incurred a doorbell and generated a completion event. Link Write chunk WRs onto the RPC Reply's Send WR chain so that a single ib_post_send() call posts both the RDMA Writes and the Send WR. A single completion event signals that all operations have finished. This reduces both doorbell rate and completion rate, as well as eliminating the latency of a round-trip between the Write chunk completion and the subsequent Send WR posting. The lifecycle of Write chunk resources changes: previously, the svc_rdma_write_done() completion handler released Write chunk resources when RDMA Writes completed. With WR chaining, resources remain live until the Send completion. A new sc_write_info_list tracks Write chunk metadata attached to each Send context, and svc_rdma_write_chunk_release() frees these resources when the Send context is released. The svc_rdma_write_done() handler now handles only error cases. On success it returns immediately since the Send completion handles resource release. On failure (WR flush), it closes the connection to signal to the client that the RPC Reply is incomplete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Clean up use of rdma->sc_pd->deviceChuck Lever
I can't think of a reason why svcrdma is using the PD's device. Most other consumers of the IB DMA API use the ib_device pointer from the connection's rdma_cm_id. I don't think there's any functional difference between the two, but it is a little confusing to see some uses of rdma_cm_id and some of ib_pd. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Clean up use of rdma->sc_pd->device in Receive pathsChuck Lever
I can't think of a reason why svcrdma is using the PD's device. Most other consumers of the IB DMA API use the ib_device pointer from the connection's rdma_cm_id. I don't believe there's any functional difference between the two, but it is a little confusing to see some uses of rdma_cm_id->device and some of ib_pd->device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: Add fair queuing for Send Queue accessChuck Lever
When the Send Queue fills, multiple threads may wait for SQ slots. The previous implementation had no ordering guarantee, allowing starvation when one thread repeatedly acquires slots while others wait indefinitely. Introduce a ticket-based fair queuing system. Each waiter takes a ticket number and is served in FIFO order. This ensures forward progress for all waiters when SQ capacity is constrained. The implementation has two phases: 1. Fast path: attempt to reserve SQ slots without waiting 2. Slow path: take a ticket, wait for turn, then wait for slots The ticket system adds two atomic counters to the transport: - sc_sq_ticket_head: next ticket to issue - sc_sq_ticket_tail: ticket currently being served A dedicated wait queue (sc_sq_ticket_wait) handles ticket ordering, separate from sc_send_wait which handles SQ capacity. This separation ensures that send completions (the high-frequency wake source) wake only the current ticket holder rather than all queued waiters. Ticket handoff wakes only the ticket wait queue, and each ticket holder that exits via connection close propagates the wake to the next waiter in line. When a waiter successfully reserves slots, it advances the tail counter and wakes the next waiter. This creates an orderly handoff that prevents starvation while maintaining good throughput on the fast path when contention is low. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Optimize rq_respages allocation in svc_alloc_argChuck Lever
svc_alloc_arg() invokes alloc_pages_bulk() with the full rq_maxpages count (~259 for 1MB messages) for the rq_respages array, causing a full-array scan despite most slots holding valid pages. svc_rqst_release_pages() NULLs only the range [rq_respages, rq_next_page) after each RPC, so only that range contains NULL entries. Limit the rq_respages fill in svc_alloc_arg() to that range instead of scanning the full array. svc_init_buffer() initializes rq_next_page to span the entire rq_respages array, so the first svc_alloc_arg() call fills all slots. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Track consumed rq_pages entriesChuck Lever
The rq_pages array holds pages allocated for incoming RPC requests. Two transport receive paths NULL entries in rq_pages to prevent svc_rqst_release_pages() from freeing pages that the transport has taken ownership of: - svc_tcp_save_pages() moves partial request data pages to svsk->sk_pages during multi-fragment TCP reassembly. - svc_rdma_clear_rqst_pages() moves request data pages to head->rc_pages because they are targets of active RDMA Read WRs. A new rq_pages_nfree field in struct svc_rqst records how many entries were NULLed. svc_alloc_arg() uses it to refill only those entries rather than scanning the full rq_pages array. In steady state, the transport NULLs a handful of entries per RPC, so the allocator visits only those entries instead of the full ~259 slots (for 1MB messages). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29svcrdma: preserve rq_next_page in svc_rdma_save_io_pagesChuck Lever
svc_rdma_save_io_pages() transfers response pages to the send context and sets those slots to NULL. It then resets rq_next_page to equal rq_respages, hiding the NULL region from svc_rqst_release_pages(). Now that svc_rqst_release_pages() handles NULL entries, this reset is no longer necessary. Removing it preserves the invariant that the range [rq_respages, rq_next_page) accurately describes how many response pages were consumed, enabling a subsequent optimization in svc_alloc_arg() that refills only the consumed range. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Handle NULL entries in svc_rqst_release_pagesChuck Lever
svc_rqst_release_pages() releases response pages between rq_respages and rq_next_page. It currently passes the entire range to release_pages(), which does not expect NULL entries. A subsequent patch preserves the rq_next_page pointer in svc_rdma_save_io_pages() so that it accurately records how many response pages were consumed. After that change, the range [rq_respages, rq_next_page) can contain NULL entries where pages have already been transferred to a send context. Iterate through the range entry by entry, skipping NULLs, to handle this case correctly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Allocate a separate Reply page arrayChuck Lever
struct svc_rqst uses a single dynamically-allocated page array (rq_pages) for both the incoming RPC Call message and the outgoing RPC Reply message. rq_respages is a sliding pointer into rq_pages that each transport receive path must compute based on how many pages the Call consumed. This boundary tracking is a source of confusion and bugs, and prevents an RPC transaction from having both a large Call and a large Reply simultaneously. Allocate rq_respages as its own page array, eliminating the boundary arithmetic. This decouples Call and Reply buffer lifetimes, following the precedent set by rq_bvec (a separate dynamically- allocated array for I/O vectors). Each svc_rqst now pins twice as many pages as before. For a server running 16 threads with a 1MB maximum payload, the additional cost is roughly 16MB of pinned memory. The new dynamic svc thread count facility keeps this overhead minimal on an idle server. A subsequent patch in this series limits per-request repopulation to only the pages released during the previous RPC, avoiding a full-array scan on each call to svc_alloc_arg(). Note: We've considered several alternatives to maintaining a full second array. Each alternative reintroduces either boundary logic complexity or I/O-path allocation pressure. rq_next_page is initialized in svc_alloc_arg() and svc_process() during Reply construction, and in svc_rdma_recvfrom() as a precaution on error paths. Transport receive paths no longer compute it from the Call size. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Tighten bounds checking in svc_rqst_replace_pageChuck Lever
svc_rqst_replace_page() builds the Reply buffer by advancing rq_next_page through the response page range. The bounds check validates rq_next_page against the full rq_pages array, but the valid range for rq_next_page is [rq_respages, rq_page_end]. Use those bounds instead. This is correct today because rq_respages and rq_page_end both point into rq_pages, and it prepares for a subsequent change that separates the Reply page array from rq_pages. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: split cache_detail queue into request and reader listsJeff Layton
Replace the single interleaved queue (which mixed cache_request and cache_reader entries distinguished by a ->reader flag) with two dedicated lists: cd->requests for upcall requests and cd->readers for open file handles. Readers now track their position via a monotonically increasing sequence number (next_seqno) rather than by their position in the shared list. Each cache_request is assigned a seqno when enqueued, and a new cache_next_request() helper finds the next request at or after a given seqno. This eliminates the cache_queue wrapper struct entirely, simplifies the reader-skipping loops in cache_read/cache_poll/cache_ioctl/ cache_release, and makes the data flow easier to reason about. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: convert queue_wait from global to per-cache-detail waitqueueJeff Layton
The queue_wait waitqueue is currently a file-scoped global, so a wake_up for one cache_detail wakes pollers on all caches. Convert it to a per-cache-detail field so that only pollers on the relevant cache are woken. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: convert queue_lock from global spinlock to per-cache-detail lockJeff Layton
The global queue_lock serializes all upcall queue operations across every cache_detail instance. Convert it to a per-cache-detail spinlock so that different caches (e.g. auth.unix.ip vs nfsd.fh) no longer contend with each other on queue operations. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc: Kill RPC_IFDEBUG()Andy Shevchenko
RPC_IFDEBUG() is used in only two places. In one the user of the definition is guarded by ifdeffery, in the second one it's implied due to dprintk() usage. Kill the macro and move the ifdeffery to the regular condition with the variable defined inside, while in the second case add the same conditional and move the respective code there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29SUNRPC: Replace KUnit tests for memcmp() with KUNIT_EXPECT_MEMEQ_MSG()Ryota Sakamoto
Replace KUnit tests for memcmp() with KUNIT_EXPECT_MEMEQ_MSG() to improve debugging that prints the hex dump of the buffers when the assertion fails, whereas memcmp() only returns an integer difference. Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29sunrpc/cache: improve RCU safety in cache_list walking.NeilBrown
1/ consistently use hlist_add_head_rcu() when adding to the cachelist to reflect the fact that it can be concurrently walked using RCU. In fact hlist_add_head() has all the needed barriers so this is no safety issue, primarily a clarity issue. 2/ call cache_get() *before* adding the list with hlist_add_head_rcu(). It is generally safest to inc the refcount before publishing a reference. In this case it doesn't have any behavioural effect as code which does an RCU walk does not depend on precision of the refcount, and it will always be at least one. But it looks more correct to use this order. 3/ avoid possible races between NULL tests and hlist_entry_safe() calls. It is possible that a test will find that .next or .head is not NULL, but hlist_entry_safe() will find that it is NULL. This can lead to incorrect behaviour with the list-walk terminating early. It is safest to always call hlist_entry_safe() and test the result. Also simplify the *ppos calculation by simply assigning the hash shifted 32, rather than masking out low bits and incrementing high bits. Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-18Merge tag 'nfsd-7.0-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix cache_request leak in cache_release() - Fix heap overflow in the NFSv4.0 LOCK replay cache - Hold net reference for the lifetime of /proc/fs/nfs/exports fd - Defer sub-object cleanup in export "put" callbacks * tag 'nfsd-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix heap overflow in NFSv4.0 LOCK replay cache sunrpc: fix cache_request leak in cache_release NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd NFSD: Defer sub-object cleanup in export put callbacks
2026-03-14sunrpc: fix cache_request leak in cache_releaseJeff Layton
When a reader's file descriptor is closed while in the middle of reading a cache_request (rp->offset != 0), cache_release() decrements the request's readers count but never checks whether it should free the request. In cache_read(), when readers drops to 0 and CACHE_PENDING is clear, the cache_request is removed from the queue and freed along with its buffer and cache_head reference. cache_release() lacks this cleanup. The only other path that frees requests with readers == 0 is cache_dequeue(), but it runs only when CACHE_PENDING transitions from set to clear. If that transition already happened while readers was still non-zero, cache_dequeue() will have skipped the request, and no subsequent call will clean it up. Add the same cleanup logic from cache_read() to cache_release(): after decrementing readers, check if it reached 0 with CACHE_PENDING clear, and if so, dequeue and free the cache_request. Reported-by: NeilBrown <neilb@ownmail.net> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-02-27xprtrdma: Decrement re_receiving on the early exit pathsEric Badger
In the event that rpcrdma_post_recvs() fails to create a work request (due to memory allocation failure, say) or otherwise exits early, we should decrement ep->re_receiving before returning. Otherwise we will hang in rpcrdma_xprt_drain() as re_receiving will never reach zero and the completion will never be triggered. On a system with high memory pressure, this can appear as the following hung task: INFO: task kworker/u385:17:8393 blocked for more than 122 seconds. Tainted: G S E 6.19.0 #3 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u385:17 state:D stack:0 pid:8393 tgid:8393 ppid:2 task_flags:0x4248060 flags:0x00080000 Workqueue: xprtiod xprt_autoclose [sunrpc] Call Trace: <TASK> __schedule+0x48b/0x18b0 ? ib_post_send_mad+0x247/0xae0 [ib_core] schedule+0x27/0xf0 schedule_timeout+0x104/0x110 __wait_for_common+0x98/0x180 ? __pfx_schedule_timeout+0x10/0x10 wait_for_completion+0x24/0x40 rpcrdma_xprt_disconnect+0x444/0x460 [rpcrdma] xprt_rdma_close+0x12/0x40 [rpcrdma] xprt_autoclose+0x5f/0x120 [sunrpc] process_one_work+0x191/0x3e0 worker_thread+0x2e3/0x420 ? __pfx_worker_thread+0x10/0x10 kthread+0x10d/0x230 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x273/0x2b0 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Fixes: 15788d1d1077 ("xprtrdma: Do not refresh Receive Queue while it is draining") Signed-off-by: Eric Badger <ebadger@purestorage.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Use an LRU list for returning unused delegations - Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default Bugfixes: - NFS/localio: - Handle short writes by retrying - Prevent direct reclaim recursion into NFS via nfs_writepages - Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit - Remove -EAGAIN handling in nfs_local_doio() - pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN - fs/nfs: Fix a readdir slow-start regression - SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path Other cleanups and improvements: - A few other NFS/localio cleanups - Various other delegation handling cleanups from Christoph - Unify security_inode_listsecurity() calls - Improvements to NFSv4 lease handling - Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set" * tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits) SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path nfs: nfs4proc: Convert comma to semicolon SUNRPC: Change list definition method sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset NFSv4: limit lease period in nfs4_set_lease_period() NFSv4: pass lease period in seconds to nfs4_set_lease_period() nfs: unify security_inode_listsecurity() calls fs/nfs: Fix readdir slow-start regression pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN NFS: fix delayed delegation return handling NFS: simplify error handling in nfs_end_delegation_return NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return NFS: remove the delegation == NULL check in nfs_end_delegation_return NFS: use bool for the issync argument to nfs_end_delegation_return NFS: return void from ->return_delegation NFS: return void from nfs4_inode_make_writeable NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4 NFS: Add a way to disable NFS v4.0 via KConfig NFS: Move sequence slot operations into minorversion operations NFS: Pass a struct nfs_client to nfs4_init_sequence() ...
2026-02-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...
2026-02-12Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
2026-02-09SUNRPC: fix gss_auth kref leak in gss_alloc_msg error pathDaniel Hodges
Commit 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c") added a kref_get(&gss_auth->kref) call to balance the gss_put_auth() done in gss_release_msg(), but forgot to add a corresponding kref_put() on the error path when kstrdup_const() fails. If service_name is non-NULL and kstrdup_const() fails, the function jumps to err_put_pipe_version which calls put_pipe_version() and kfree(gss_msg), but never releases the gss_auth reference. This leads to a kref leak where the gss_auth structure is never freed. Add a forward declaration for gss_free_callback() and call kref_put() in the err_put_pipe_version error path to properly release the reference taken earlier. Fixes: 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c") Cc: stable@vger.kernel.org Signed-off-by: Daniel Hodges <git@danielhodges.dev> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09SUNRPC: Change list definition methodChenguang Zhao
The LIST_HEAD macro can both define a linked list and initialize it in one step. To simplify code, we replace the separate operations of linked list definition and manual initialization with the LIST_HEAD macro. Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-28sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSYJeff Layton
To dynamically adjust the thread count, nfsd requires some information about how busy things are. Change svc_recv() to take a timeout value, and then allow the wait for work to time out if it's set. If a timeout is not defined, then the schedule will be set to MAX_SCHEDULE_TIMEOUT. If the task waits for the full timeout, then have it return -ETIMEDOUT to the caller. If it wakes up, finds that there is more work and that no threads are available, then attempt to set SP_TASK_STARTING. If wasn't already set, have the task return -EBUSY to cue to the caller that the service could use more threads. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: split new thread creation into a separate functionJeff Layton
Break out the part of svc_start_kthreads() that creates a thread into svc_new_thread(), as a new exported helper function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: introduce the concept of a minimum number of threads per poolJeff Layton
Add a new pool->sp_nrthrmin field to track the minimum number of threads in a pool. Add min_threads parameters to both svc_set_num_threads() and svc_set_pool_threads(). If min_threads is non-zero and less than the max, svc_set_num_threads() will ensure that the number of running threads is between the min and the max. If the min is 0 or greater than the max, then it is ignored, and the maximum number of threads will be started, and never spun down. For now, the min_threads is always 0, but a later patch will pass the proper value through from nfsd. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: track the max number of requested threads in a poolJeff Layton
The kernel currently tracks the number of threads running in a pool in the "sp_nrthreads" field. In the future, where threads are dynamically spun up and down, it'll be necessary to keep track of the maximum number of requested threads separately from the actual number running. Add a pool->sp_nrthrmax parameter to track this. When userland changes the number of threads in a pool, update that value accordingly. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: remove special handling of NULL pool from svc_start/stop_kthreads()Jeff Layton
Now that svc_set_num_threads() handles distributing the threads among the available pools, remove the special handling of a NULL pool pointer from svc_start_kthreads() and svc_stop_kthreads(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>