summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
12 daysnetfilter: flowtable: Validate iph->ihl in nf_flow_ip4_tunnel_proto()Lorenzo Bianconi
Add sanity check for iph->ihl field in nf_flow_ip4_tunnel_proto() before using it to compute the header size, avoiding out-of-bounds access with malformed IP headers. While at it, use iph->protocol instead of the hardcoded IPPROTO_IPIP constant when setting ctx->tun.proto and reference ctx->tun.hdr_size when updating ctx->offset. Fixes: ab427db178858 ("netfilter: flowtable: Add IPIP rx sw acceleration") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 daysnetfilter: nf_conncount: prevent connlimit drops for early confirmed ctFernando Fernandez Mancera
Commit 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped") introduced a regression where packets for valid connections are dropped when using connlimit for soft-limiting scenarios. The issue occurs when a new connection reuses a socket currently in the TIME_WAIT state. In this scenario, the connection tracking entry is evaluated as already confirmed. Previously, __nf_conncount_add() assumed that if a connection was confirmed and did not originate from the loopback interface, it should skip the addition and return -EEXIST. Skipping the addition triggers a garbage collection run that cleans up the TIME_WAIT connection. Consequently, the active connection count drops to 0, which xt_connlimit mishandles, leading to the false rejection of the perfectly valid new connection. Fix this by replacing the interface check with protocol-agnostic state checks. We now skip the tree insertion and preserve the lockless garbage collection optimization only if the connection is IPS_ASSURED. This allows early-confirmed setup packets (such as reused TIME_WAIT sockets or locally generated SYN-ACKs) to be properly evaluated and counted without falsely dropping. The goto check_connections path is maintained to ensure these setup packets are deduplicated correctly. This has been tested with slowhttptest and HTTP server configured locally to ensure we are not breaking soft-limiting scenarios for local or external connections. In addition, it was tested with a OVS zone limit too. Fixes: 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped") Reported-by: Alejandro Olivan Alvarez <alejandro.olivan.alvarez@gmail.com> Closes: https://lore.kernel.org/netfilter-devel/177349610461.3071718.4083978280323144323@eldamar.lan/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 daysnetfilter: nf_nat: avoid invalid nat_net pointer use on failed nf_nat_init()Mathias Krause
We ran into below KASAN splat, which is mostly uninteresting, beside for having nf_nat_register_fn() in the call chain as a cause for the offending access: ================================================================== BUG: KASAN: slab-out-of-bounds in nf_nat_register_fn+0x5f9/0x640 Read of size 8 at addr ffff890031e54c20 by task iptables/9510 CPU: 0 UID: 0 PID: 9510 Comm: iptables Not tainted 6.18.18-grsec-full-20260320181326 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Call Trace: <TASK> […] dump_stack_lvl+0xee/0x160 ffff88004117eeb8 […] print_report+0x6e/0x640 ffff88004117eee0 […] ? __phys_addr+0x8e/0x140 ffff88004117eef0 […] ? kasan_addr_to_slab+0x51/0xe0 ffff88004117ef08 […] ? complete_report_info+0xec/0x1c0 ffff88004117ef20 […] ? nf_nat_register_fn+0x5f9/0x640 ffff88004117ef48 […] kasan_report+0xbc/0x140 ffff88004117ef50 […] ? nf_nat_register_fn+0x5f9/0x640 ffff88004117ef90 […] nf_nat_register_fn+0x5f9/0x640 ffff88004117eff8 […] ? nf_nat_icmp_reply_translation+0x6e0/0x6e0 ffff88004117f070 […] nf_tables_register_hook.part.0+0xa0/0x220 ffff88004117f080 […] nf_tables_addchain.constprop.0+0x1054/0x1fc0 ffff88004117f0b8 […] ? nft_chain_lookup.part.0+0x4ce/0xac0 ffff88004117f130 […] ? nf_tables_abort+0x3d80/0x3d80 ffff88004117f190 […] ? nf_tables_dumpreset_obj+0x100/0x100 ffff88004117f1c8 […] ? nft_table_lookup.part.0+0x255/0x300 ffff88004117f310 […] ? nf_tables_newchain+0x21a4/0x2fa0 ffff88004117f358 […] nf_tables_newchain+0x21a4/0x2fa0 ffff88004117f360 […] ? nf_tables_addchain.constprop.0+0x1fc0/0x1fc0 ffff88004117f458 […] ? nla_get_range_signed+0x4a0/0x4a0 ffff88004117f488 […] ? lock_acquire+0x16f/0x320 ffff88004117f490 […] ? find_held_lock+0x3b/0xe0 ffff88004117f4b0 […] ? __nla_parse+0x45/0x80 ffff88004117f500 […] nfnetlink_rcv_batch+0xbca/0x19a0 ffff88004117f550 […] ? nfnetlink_net_exit_batch+0x120/0x120 ffff88004117f618 […] ? __sanitizer_cov_trace_switch+0x63/0xe0 ffff88004117f720 […] ? gr_acl_handle_mmap+0x1c4/0x320 ffff88004117f7c0 […] ? nla_get_range_signed+0x4a0/0x4a0 ffff88004117f7e8 […] ? gr_is_capable+0x6f/0xe0 ffff88004117f830 […] ? __nla_parse+0x45/0x80 ffff88004117f860 […] ? skb_pull+0x103/0x1a0 ffff88004117f880 […] nfnetlink_rcv+0x3db/0x4a0 ffff88004117f8b0 […] ? nfnetlink_rcv_batch+0x19a0/0x19a0 ffff88004117f8d8 […] ? netlink_lookup+0xe2/0x240 ffff88004117f900 […] netlink_unicast+0x74b/0xb00 ffff88004117f930 […] ? netlink_attachskb+0xb20/0xb20 ffff88004117f980 […] ? __check_object_size+0x3e/0xaa0 ffff88004117f998 […] ? security_netlink_send+0x51/0x160 ffff88004117f9c8 […] netlink_sendmsg+0xa03/0x1200 ffff88004117f9f8 […] ? netlink_unicast+0xb00/0xb00 ffff88004117fa70 […] ? netlink_unicast+0xb00/0xb00 ffff88004117fac8 […] ? ____sys_sendmsg+0xe2a/0x1040 ffff88004117faf8 […] ____sys_sendmsg+0xe2a/0x1040 ffff88004117fb00 […] ? kernel_recvmsg+0x300/0x300 ffff88004117fb60 […] ? reacquire_held_locks+0xe9/0x260 ffff88004117fbc8 […] ___sys_sendmsg+0x138/0x200 ffff88004117fbf8 […] ? do_recvmmsg+0x7e0/0x7e0 ffff88004117fc30 […] ? lockdep_hardirqs_on_prepare+0x101/0x1e0 ffff88004117fc50 […] ? lock_acquire+0x16f/0x320 ffff88004117fd20 […] ? lock_acquire+0x16f/0x320 ffff88004117fd58 […] ? find_held_lock+0x3b/0xe0 ffff88004117fd70 […] __sys_sendmsg+0x17a/0x260 ffff88004117fdc8 […] ? __sys_sendmsg_sock+0x80/0x80 ffff88004117fdf0 […] ? syscall_trace_enter+0x15e/0x2c0 ffff88004117fe98 […] do_syscall_64+0x7d/0x400 ffff88004117fec8 […] entry_SYSCALL_64_safe_stack+0x4a/0x60 ffff88004117fef8 </TASK> ================================================================== The out-of-bounds report, though, is a red herring as it is for an access that shouldn't have happened in the first place. When nf_nat_init() fails to register its BPF kfuncs, it'll unwind and, among others, call unregister_pernet_subsys() to deregister its per-net ops. This makes the previously allocated net id available for reuse by the next caller of register_pernet_subsys(), in our case, synproxy. However, 'nat_net_id' will still hold the previously allocated value. If nf_nat.o gets build as a module, all this doesn't matter. A failed initialization routine makes the module fail to load and any dependent module won't be able to load either. However, if nf_nat.o is built-in, a failing init won't /completely/ make its functionality unavailable to dependent modules, namely the code and static data is still there, free to be called by modules like nft_chain_nat.ko. Case in point, nft_chain_nat registers hooks that'll call into nf_nat which, in our case, failed to initialize and therefore won't have a valid net id nor related net_nat object any more. Code in nf_nat, namely nf_nat_register_fn() and nf_nat_unregister_fn(), still making use of the reallocated net id, lead to a type confusion as the call to net_generic() will no longer return memory belonging to an object suited to fit 'struct nat_net' but 'struct synproxy_net' instead. The latter is only 24 bytes on 64-bit systems, much smaller than struct nat_net which is 176 bytes, perfectly explaining the OOB KASAN report. Detect and handle a failed nf_nat_init() by testing the 'nf_nat_hook' pointer which will be reset to NULL on initialization errors to prevent the usage of an invalid nat_net pointer. As this check is only needed when nf_nat.o is built-in, guard it by '#ifndef MODULE...'. Fixes: cbc1dd5b659f ("netfilter: nf_nat: Fix possible memory leak in nf_nat_init()") Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
12 daysMerge branch 'next' into for-linusDmitry Torokhov
Prepare input updates for 7.2 merge window.
12 daysInput: mms114 - fix touch indexing for MMS134S and MMS136Dmitry Torokhov
The MMS134S and MMS136 touch controllers have an event size of 6 bytes rather than 8 bytes. When __mms114_read_reg() reads the touch data packet from the device into the touch buffer, the events are packed tightly at 6-byte intervals. However, the driver iterates through the events using standard C array indexing (touch[index]), where each element is sizeof(struct mms114_touch) (8 bytes) apart. As a result, any touch events beyond the first one are read from incorrect offsets and parsed improperly. Fix this by explicitly calculating the byte offset for each touch event based on the device's specific event size. Fixes: 53fefdd1d3a3 ("Input: mms114 - support MMS136") Fixes: ab108678195f ("Input: mms114 - support MMS134S") Reported-by: sashiko-bot@kernel.org Assisted-by: Antigravity:gemini-3.5-flash Reviewed-by: Bryam Vargas <hexlabsecurity@proton.me> Link: https://patch.msgid.link/20260616050912.1531241-1-dmitry.torokhov@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
12 daysInput: elan_i2c - prevent division by zero and arithmetic underflowRanjan Kumar
The Elan I2C touchpad driver queries the device for its physical dimensions and trace counts to calculate the device resolution and width. However, if the device firmware or device tree provides invalid zero values for x_traces or y_traces, it results in a fatal division-by-zero exception leading to a kernel panic during device probe. Add checks to ensure these parameters are non-zero before performing the division. If invalid trace values are detected, fall back to a safe default of 1. Additionally, prevent an arithmetic underflow in the touch reporting logic. Previously, if the calculated or fallback width was smaller than ETP_FWIDTH_REDUCE (90), the subtraction would underflow, resulting in a massive unsigned integer being reported to userspace. Clamp the adjusted width to a minimum of 0 to safely handle small physical dimensions and fallback scenarios. Completing the probe with safe fallback values ensures the sysfs nodes are created, keeping the firmware update path intact so a recovery firmware can be flashed to the device. Fixes: 6696777c6506 ("Input: add driver for Elan I2C/SMbus touchpad") Fixes: e3a9a1290688 ("Input: elan_i2c - do not query the info if they are provided") Signed-off-by: Ranjan Kumar <kumarranja@chromium.org> Link: https://patch.msgid.link/20260612060339.3829666-1-kumarranja@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
12 daysInput: stop force-feedback timer when unregistering input devicesDmitry Torokhov
Memoryless force-feedback devices use a timer to manage playback of effects. When a driver for such a device is unbound (or the device is unregistered for other reasons), the driver typically frees its private data synchronously. However, the input_dev structure (and its associated force-feedback structures, including the timer) is only freed when the last user closes the corresponding device node. If userspace keeps the device node open while the device is unregistered (e.g., during driver unbind), the force-feedback timer can still fire after the driver's private data has been freed. Introduce a new 'stop' callback to struct ff_device, and call it from input_unregister_device() before the device is deleted. Implement this callback for memoryless devices and synchronously shut down the timer to ensure it is stopped and cannot be rearmed once unregistration happens. Assisted-by: Gemini:gemini-3.1-pro Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
12 daysInput: iforce - bound the device-reported force-feedback effect indexBryam Vargas
iforce_process_packet() handles a status report (packet id 0x02) by taking a force-feedback effect index straight from the device wire and using it to address the per-effect state array: i = data[1] & 0x7f; if (data[1] & 0x80) { if (!test_and_set_bit(FF_CORE_IS_PLAYED, iforce->core_effects[i].flags)) ... } else if (test_and_clear_bit(FF_CORE_IS_PLAYED, iforce->core_effects[i].flags)) { ... } The index is masked only with 0x7f, so it ranges 0..127, but core_effects[] holds only IFORCE_EFFECTS_MAX (32) entries. For an index of 32..127 the test_and_set_bit()/test_and_clear_bit() is an out-of-bounds single-bit read-modify-write past the array. core_effects[] is the second-to-last member of struct iforce, so the write lands in the trailing members and beyond the embedding kzalloc()'d iforce_serio / iforce_usb object. data[1] is unvalidated device payload on both transports (the USB interrupt endpoint and serio), and the status path is not gated on force feedback being present, so a malicious or counterfeit device can set or clear a bit at an attacker-chosen offset past the object. Reject an out-of-range index instead of indexing with it. Bound against the array dimension IFORCE_EFFECTS_MAX rather than dev->ff->max_effects so the check guarantees memory safety regardless of how many effects the device registered. A legitimate "effect started/stopped" status always carries an index below IFORCE_EFFECTS_MAX, so well-formed devices are unaffected; the neighbouring mark_core_as_ready() loop is already bounded and is left untouched. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260613-b4-disp-4828d263-v1-1-02320e1a89dd@proton.me Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
12 daysInput: goodix - clamp the device-reported contact countBryam Vargas
goodix_ts_read_input_report() copies the number of touch points reported by the device into an on-stack buffer u8 point_data[2 + GOODIX_MAX_CONTACT_SIZE * GOODIX_MAX_CONTACTS]; which is sized for at most GOODIX_MAX_CONTACTS (10) contacts. The only runtime check bounds the per-interrupt count against ts->max_touch_num, but that value is taken verbatim from a 4-bit field of the device configuration block and is never clamped: ts->max_touch_num = ts->config[MAX_CONTACTS_LOC] & 0x0f; The nibble can be 0..15, so a malfunctioning, malicious or counterfeit controller (or an attacker tampering with the I2C bus) can advertise up to 15 contacts. goodix_ts_read_input_report() then accepts a touch_num of up to 15 and the second goodix_i2c_read() writes ts->contact_size * (touch_num - 1) bytes past the one-contact header into point_data - up to 30 bytes (45 with the 9-byte report format) beyond the 92-byte buffer: a stack out-of-bounds write. Clamp max_touch_num to GOODIX_MAX_CONTACTS, the number of contacts point_data[] is sized for, when reading it from the configuration. Fixes: a7ac7c95d468 ("Input: goodix - use max touch number from device config") Cc: stable@vger.kernel.org Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Link: https://patch.msgid.link/20260612-b4-disp-6844625d-v1-1-df0aed080c9d@proton.me Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
12 daysbpf: Disable xfrm_decode_session hook attachmentBradley Morgan
BPF LSM programs can currently attach to xfrm_decode_session(). That hook may return an error, but security_skb_classify_flow() calls it from a void path and triggers BUG_ON() if an error is returned. Disable BPF attachment to the hook to prevent a BPF LSM program from turning packet classification into a full panic. Fixes: 9e4e01dfd325 ("bpf: lsm: Implement attach, detach and execution") Signed-off-by: Bradley Morgan <include@grrlz.net> Link: https://lore.kernel.org/r/20260619130305.27779-1-include@grrlz.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
12 daysMerge tag 'erofs-for-7.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "The most notable change is the removal of the fscache backend: it has been deprecated for almost two years, mainly because EROFS file-backed mounts and fanotify pre-content hooks (together with erofs-utils) now provide better functionality and simpler codebase. In addition, fscache has depended on netfslib for years, which is undesirable for EROFS since it is a local filesystem. More details in [1]. In addition, sparse support has been added to the pcluster layout, which is helpful for large sparse AI datasets, and map requests for chunk-based inodes have been optimized to be more efficient as well. There are also the usual fixes and cleanups. Summary: - Report more consecutive chunks of the same type for each iomap request - Add sparse support for the pcluster layout - Update the EROFS documentation overview - Remove the deprecated fscache backend - Various fixes and cleanups" Link: https://lore.kernel.org/r/20260622013622.934174-1-hsiangkao@linux.alibaba.com [1] * tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: handle 48-bit blocks_hi for compressed inodes erofs: remove fscache backend entirely erofs: simplify RCU read critical sections erofs: add sparse support to pcluster layout erofs: add folio order to trace_erofs_read_folio erofs: introduce erofs_map_chunks() erofs: call erofs_exit_ishare() before rcu_barrier() erofs: update the overview of the documentation erofs: clean up erofs_ishare_fill_inode()
12 daysmd/raid5: avoid R5_Overlap races while breaking stripe batchesChen Cheng
KCSAN report a race in break_stripe_batch_list() vs. raid5_make_request() on sh->dev[i].flags (plain word write vs. atomic bit op).. and .. one possible scenario is: CPU1 CPU2 break_stripe_batch_list(sh1) -> handle sh2 -> lock(sh2) -> sh2->batch_head = NULL -> unlock(sh2) -> test_and_clear_bit(R5_Overlap, sh2->dev[i].flags) -> wake_up_bit(sh2->dev[i].flags) raid5_make_request() -> add_all_stripe_bios(sh2) -> lock(sh2) -> stripe_bio_overlaps(sh2) returns true batch_head is NULL, so new bio overlap exist bio on sh2 -> true -> set_bit(R5_Overlap, sh2->dev[i].flags) -> unlock(sh2) -> wait_on_bit(sh2->dev[i].flags) -> sh2->dev[i].flags = sh1->dev[i].flags & ~R5_Overlap No wait_up_bit(), CPU2 could be wait_on_bit() forever... Fix by : - Expand the protect zone. - Use batch_head's device flag's snaphot when no held head_sh->stripe_lock. - Move sh/head_sh->batch_head = NULL to the end of protected zone , and , any concurrent add_all_stripe_bios() grabs sh->stripe_lock now either: - see batch_head != null, and , is rejected by stripe_bio_overlaps() under the lock (no R5_Overlap wait ) , or , - sees batch_head == NULL, only after dev[i].flags has already been set and the prior R5_Overlap waiters worken. KCSAN report: ================================================ BUG: KCSAN: data-race in break_stripe_batch_list / raid5_make_request write (marked) to 0xffff8e89c8117548 of 8 bytes by task 4042 on cpu 0: raid5_make_request+0xea0/0x2930 md_handle_request+0x4a2/0xa40 md_submit_bio+0x109/0x1a0 __submit_bio+0x2ec/0x390 submit_bio_noacct_nocheck+0x457/0x710 submit_bio_noacct+0x2a7/0xc20 submit_bio+0x56/0x250 blkdev_direct_IO+0x54c/0xda0 blkdev_write_iter+0x38f/0x570 aio_write+0x22b/0x490 io_submit_one+0xa51/0xf70 __x64_sys_io_submit+0xf7/0x220 x64_sys_call+0x1907/0x1c60 do_syscall_64+0x130/0x570 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff8e89c8117548 of 8 bytes by task 4010 on cpu 5: break_stripe_batch_list+0x249/0x480 handle_stripe_clean_event+0x720/0x9b0 handle_stripe+0x32fb/0x4500 handle_active_stripes.isra.0+0x6e0/0xa50 raid5d+0x7e0/0xba0 md_thread+0x15a/0x2d0 kthread+0x1e3/0x220 ret_from_fork+0x37a/0x410 ret_from_fork_asm+0x1a/0x30 value changed: 0x0000000000000019 -> 0x0000000000000099 --> R5_Overlap Fixes: fb642b92c267 ("md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list") Signed-off-by: Chen Cheng <chencheng@fnnas.com> Link: https://patch.msgid.link/20260619041013.1207148-1-chencheng@fnnas.com Signed-off-by: Yu Kuai <yukuai@fygo.io>
12 daysmd/raid5: use stripe state snapshot in break_stripe_batch_list()Chen Cheng
The patch just suppress KCSAN noise. No functional change. RAID-5 can group multi full-stripe-write aka stripe_head into a batch aka batch_list, with one head_sh leading them. Call break_stripe_batch_list() when the batch is finished, or, a stripe has to be dropped out of the batch. break_stripe_batch_list() reads stripe state several times while request paths can update thost state words concurrently with lockless bitops, which reported by KCSAN. Use a snapshot to guarantees that the value used for warning, copying, and handle checks is internally consistent at current read moment. KCSAN report: ============================================== BUG: KCSAN: data-race in __add_stripe_bio / break_stripe_batch_list write (marked) to 0xffff8e89d4f0b988 of 8 bytes by task 4323 on cpu 3: __add_stripe_bio+0x35e/0x400 raid5_make_request+0x6ac/0x2930 md_handle_request+0x4a2/0xa40 md_submit_bio+0x109/0x1a0 __submit_bio+0x2ec/0x390 submit_bio_noacct_nocheck+0x457/0x710 submit_bio_noacct+0x2a7/0xc20 submit_bio+0x56/0x250 blkdev_direct_IO+0x54c/0xda0 blkdev_write_iter+0x38f/0x570 aio_write+0x22b/0x490 io_submit_one+0xa51/0xf70 read to 0xffff8e89d4f0b988 of 8 bytes by task 4290 on cpu 4: break_stripe_batch_list+0x3ce/0x480 handle_stripe_clean_event+0x720/0x9b0 handle_stripe+0x32fb/0x4500 handle_active_stripes.isra.0+0x6e0/0xa50 raid5d+0x7e0/0xba0 Signed-off-by: Chen Cheng <chencheng@fnnas.com> Link: https://patch.msgid.link/20260618134748.1168360-1-chencheng@fnnas.com Signed-off-by: Yu Kuai <yukuai@fygo.io>
12 daysselftests: drv-net: so_txtime: relax variance boundsWillem de Bruijn
The net-next-hw spinners on netdev.bots.linux.dev observe failing so-txtime-py tests. A review of stdout shows most failures to be due to exceeding the 4ms grace period. All I saw were within 8ms. So increase to that. Double the bounds from 4 to 8ms. This is still is small enough to differentiate the delays programmed by the test, 10 and 20ms. Fixes: 5c6baef3885c ("selftests: drv-net: convert so_txtime to drv-net") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20260610170651.1b644001@kernel.org/ Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260621200137.1564776-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysnet: airoha: Fix TX scheduler queue mask loop upper boundWayen Yan
In airoha_qdma_set_chan_tx_sched(), the loop clearing queue mask was using AIROHA_NUM_TX_RING (32) instead of AIROHA_NUM_QOS_QUEUES (8). Each channel has 8 queues, and TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i) computes BIT(i + (channel * 8)). With i ranging 0..31, this causes: - channel 0: clears bit 0..31 (all 4 channels) instead of 0..7 - channel 1: clears bit 8..31 (channels 1-3) instead of 8..15 - channel 2: clears bit 16..31 (channels 2-3) instead of 16..23 - channel 3: clears bit 24..31 (channel 3 only) - correct by accident While BIT(32+) on arm64 produces 64-bit values truncated to 0 in u32 mask parameter, the loop still incorrectly clears queues within the same channel beyond queue 7. Even though this is functionally harmless (the register resets to 0 and is only ever cleared, never set — so clearing extra bits is a no-op), the loop bound is semantically wrong and should be fixed for correctness and clarity. Fix by using AIROHA_NUM_QOS_QUEUES (8) as the loop upper bound. Fixes: ef1ca9271313 ("net: airoha: Add sched HTB offload support") Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Wayen Yan <win847@gmail.com> Link: https://patch.msgid.link/178187479434.2400840.1312143943526335838@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysbnx2x: fix potential memory leak in bnx2x_alloc_mem_bp()Abdun Nihaal
If the allocation of fp[i].tpa_info fails, the error path will not free the struct bnx2x_fastpath allocated earlier, as it is not linked to the bp structure yet. Fix that by linking it immediately after allocation. Cc: stable@vger.kernel.org Fixes: 15192a8cf8a8 ("bnx2x: Split the FP structure") Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260620062402.89549-1-nihaal@cse.iitm.ac.in Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysipv4: fib: Don't ignore error route in local/main tables.Kuniyuki Iwashima
When CONFIG_IP_MULTIPLE_TABLES is enabled but no rule is added, fib_lookup() performs route lookup directly on two tables. Since the first lookup does not properly bail out, the result of an error route in the merged local/main table could be overwritten by another route in the default table: # unshare -n # ip link set lo up # ip route add 192.168.0.0/24 dev lo table 253 # ip route add unreachable 192.168.0.0/24 # ip route get 192.168.0.1 192.168.0.1 dev lo table default uid 0 cache <local> Once a random rule is added, the error route is respected: # ip rule add table 0 # ip rule del table 0 # ip route get 192.168.0.1 RTNETLINK answers: No route to host Let's fix the inconsistent behaviour. Fixes: f4530fa574df ("ipv4: Avoid overhead when no custom FIB rules are installed.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260619212753.3367244-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 dayseth: bnxt: improve the timing of statsJakub Kicinski
Kernel selftests wait 1.25x of the promised stats refresh time (as read from ethtool -c). bnxt reports 1sec by default, but the stats update process has two steps. First device DMAs the new values, then the service task performs update in full-width SW counters. So the worst case delay is actually 2x. Note that the behavior is different for ring stats and port stats. Port stats are fetched synchronously by the service worker, so there's no risk of doubling up the delay there. The problem of stale stats impacts not only tests but real workloads which monitor egress bandwidth of a NIC. The inaccuracy causes double counting in the next cycle and spurious overload alarms. Try to read from the DMA buffer more aggressively, to mitigate timing issues between DMA and service task. The SW update should be cheap. Fixes: 51f307856b60 ("bnxt_en: Allow statistics DMA to be configurable using ethtool -C.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260619191538.104165-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysipv6: Fix null-ptr-deref in fib6_nh_mtu_change().Xiang Mei
fib6_nh_mtu_change() re-fetches idev via __in6_dev_get(arg->dev) and dereferences idev->cnf.mtu6 without a NULL check. addrconf_ifdown() clears dev->ip6_ptr with RCU_INIT_POINTER() after rt6_disable_ip() has released tb6_lock, so the RA-driven MTU walk can observe a NULL idev and oops. The caller rt6_mtu_change_route() guards its own __in6_dev_get(), but this re-fetch is unguarded; nexthop-backed routes survive addrconf_ifdown()'s flush, so the walk still reaches it after ip6_ptr is nulled. Return 0 when idev is NULL, matching rt6_mtu_change_route() and the fib6_mtu() fix in commit 5ad509c1fdad ("ipv6: Fix null-ptr-deref in fib6_mtu()."). Oops: general protection fault, ... KASAN: null-ptr-deref in range [0x00000000000002a8-0x00000000000002af] RIP: 0010:fib6_nh_mtu_change+0x203/0x990 rt6_mtu_change_route+0x141/0x1d0 __fib6_clean_all+0xd0/0x160 rt6_mtu_change+0xb4/0x100 ndisc_router_discovery+0x24b5/0x2cb0 icmpv6_rcv+0x12e9/0x1710 ipv6_rcv+0x39b/0x410 Fixes: c0b220cf7d80 ("ipv6: Refactor exception functions") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260619045334.2427073-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 dayshdlc_ppp: sync per-proto timers before freeing hdlc stateFan Wu
Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp registers a timer via timer_setup(). That struct ppp is the hdlc->state allocation, which detach_hdlc_protocol() frees with kfree() in both teardown paths: unregister_hdlc_device() and the re-attach inside attach_hdlc_protocol(). The ppp proto never registered a .detach callback, so detach_hdlc_protocol() performs no timer synchronization before the kfree(). The only cancel, timer_delete(&proto->timer) in ppp_cp_event(), is partial (it does not wait for a running callback) and only runs on the ->CLOSED transition; ppp_stop()/ppp_close() do not sync either. A ppp_timer callback already executing (blocked on ppp->lock) survives the kfree and then dereferences proto->state / ppp->lock in freed memory, leading to a use-after-free. Fix this by adding a .detach helper that calls timer_shutdown_sync() on every per-proto timer. detach_hdlc_protocol() invokes proto->detach(dev) before kfree(hdlc->state), so timer_shutdown_sync() now runs on both free paths. timer_shutdown_sync() is used instead of timer_delete_sync() because the keepalive path re-arms the timer through add_timer()/mod_timer() and shutdown blocks any re-activation during teardown. Initialize the per-protocol timers in ppp_ioctl() when the protocol is attached, and remove the now-redundant timer_setup() from ppp_start(), so that the timers are initialized exactly once at attach time and ppp_timer_release() never operates on uninitialized timer_list structures. attach_hdlc_protocol() uses kmalloc() (not kzalloc), so struct ppp's protos[i].timer is uninitialized garbage until the first timer_setup(); without this init-at-attach, attaching the PPP protocol without ever bringing the device up would leave timer_shutdown_sync() operating on uninitialized memory in .detach. Moving the init out of ppp_start() (which only runs on NETDEV_UP) into the attach path makes the initialization unconditional and avoids initializing the same timer_list twice. This bug was found by static analysis. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Fan Wu <fanwu01@zju.edu.cn> Link: https://patch.msgid.link/20260617020518.116319-1-fanwu01@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysksmbd: increase SMB3_DEFAULT_TRANS_SIZE from 1MB to 4MBNamjae Jeon
This patch raises `SMB3_DEFAULT_TRANS_SIZE` to 4MB to align it with `smb2 max read/write`. This allows better I/O negotiation with modern clients and improves sequential read/write performance on high-speed networks. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: fix UBSAN array-index-out-of-bounds in decode_compress_ctxt()Namjae Jeon
decode_compress_ctxt() walks CompressionAlgorithms[] using the client supplied CompressionAlgorithmCount. That field is declared in struct smb2_compression_capabilities_context as a fixed 4-element array, but the number of algorithms is actually variable and clients such as Windows advertise more than four (e.g. LZ77, LZ77+Huffman, LZNT1, Pattern_V1 and LZ4). The on-wire context length is already validated, so the access is within the received buffer, but indexing the statically sized [4] array makes UBSAN report an out-of-bounds access: UBSAN: array-index-out-of-bounds in smb2pdu.c:1122:48 index 4 is out of range for type '__le16 [4]' Call Trace: smb2_handle_negotiate+0xda7/0xde0 [ksmbd] ksmbd_smb_negotiate_common+0x27b/0x3e0 [ksmbd] smb2_negotiate_request+0x14/0x20 [ksmbd] handle_ksmbd_work+0x181/0x500 [ksmbd] Walk the algorithms through a pointer so the fixed-array bounds check is not applied, while keeping the existing length validation that bounds the loop to the data actually received. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: sleep interruptibly in the durable handle scavengerNamjae Jeon
The durable handle scavenger kthread waits up to DURABLE_HANDLE_MAX_TIMEOUT (300 seconds) between scans using wait_event_timeout(), which sleeps in TASK_UNINTERRUPTIBLE. When there are no durable handles pending expiry the task stays in D state far longer than 120 seconds, so the hung task detector prints a bogus "task ksmbd-durable-s blocked for more than 120 seconds" warning with a backtrace, even though the thread is only idle. Use wait_event_interruptible_timeout() so the thread sleeps in TASK_INTERRUPTIBLE, which the hung task detector ignores. This also suits the already-freezable kthread. Treat a negative return (e.g. -ERESTARTSYS) like a timeout when recomputing the next wake interval. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: start file id allocation at 1Namjae Jeon
ksmbd allocates both the volatile id (per-session file table) and the persistent id (global file table) with idr_alloc_cyclic() starting at 0. The first open after the module loads therefore gets volatile id 0 and persistent id 0, and ksmbd returns an SMB2 FileId of {0, 0} in the create response. Clients treat an all-zero FileId as a null handle. smbtorture's smb2_util_handle_empty() considers {0, 0} empty, so tests that guard the close with it (e.g. smb2.oplock.statopen1, smb2.lease.statopen*) never close that first handle. The leaked open keeps the inode's oplock count non-zero, so a later batch oplock request on the same file is downgraded to level II and the test fails. Start the id allocation at 1 (KSMBD_START_FID) so no handle is ever assigned a {0, 0} FileId, matching the behaviour of other SMB servers. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: treat read-control opens as stat opens only for leasesNamjae Jeon
A second open that requests only metadata-level access must not break the existing caching state. ksmbd already skips the break for such opens via fp->attrib_only (FILE_READ_ATTRIBUTES, FILE_WRITE_ATTRIBUTES and FILE_SYNCHRONIZE). An open requesting only READ_CONTROL (reading the security descriptor) must be treated differently depending on the existing caching state. smbtorture smb2.lease.statopen4 expects a read-control open NOT to break a caching lease, while smb2.oplock.statopen1 expects the same open to break a batch oplock. So READ_CONTROL is a stat open for leases but not for oplocks. Extend the stat-open break-skip in smb_grant_oplock() to also cover a read-control-only open, but only when the existing holder is a lease. The global fp->attrib_only flag (used for share-mode, rename and truncate decisions) is left unchanged so oplock behaviour is preserved. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: validate :: stream type against directory createNamjae Jeon
smb2.streams.dir opens <dir>::$DATA with FILE_DIRECTORY_FILE and expects STATUS_NOT_A_DIRECTORY, then opens <dir>::$DATA without it and expects STATUS_FILE_IS_A_DIRECTORY. Commit "treat unnamed DATA stream as base file" canonicalizes the ::$DATA suffix to a NULL stream name so the open continues through the base-file path. That skipped the stream/directory type validation, which was guarded by "if (stream_name)", so opening a directory's ::$DATA stream with FILE_DIRECTORY_FILE incorrectly returned STATUS_OK and a plain open of it no longer reported STATUS_FILE_IS_A_DIRECTORY. parse_stream_name() still records the explicit $DATA type in s_type even when it clears stream_name. Run the data-stream vs directory validation whenever s_type is DATA_STREAM, not only when stream_name is set, so the canonicalized ::$DATA open is rejected with the correct status. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: break conflicting-open leases only as far as neededNamjae Jeon
smb2.lease.oplock and smb2.lease.breaking1 hold a lease and then issue a single conflicting open on the same file. The held lease must break one step to drop write caching (RWH->RH, RW->R) and then stop, so lease_break_info.count is 1 and the lease keeps its read/handle caching. ksmbd instead cascaded the break all the way down to none (e.g. RWH->RH->R->none), so the break count was 2 or 3 and the reported lease state ended at 0. Commit "chain pending lease breaks before waking waiters" forces break_level to SMB2_OPLOCK_LEVEL_NONE for any non-lease open against a handle-caching lease, which drives oplock_break()'s retry loop down to none even when only one open is contending. Drop that break_level override so a conflicting open breaks a lease only to its own compatible level (level II, i.e. RH/R). A deeper break is still required when a truncating open is also waiting behind the same lease break. smb2.lease.breaking3 keeps a normal open pending through RWH->RH and an overwrite open pending behind it, and expects the lease to continue RH->R->none before either open completes. The overwrite waiter sets open_trunc on the lease while it blocks on the pending break, so extend the retry loop to chain another break while that truncating waiter still needs the lease at none. The per-break open_trunc snapshot stays cleared, so the cascade steps down (RH->R->none) instead of collapsing straight to none, and the normal open stays pending until the lease is fully broken. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: break handle caching for share conflictsNamjae Jeon
smb2.lease.break_twice first opens a file with an RHW lease and then tries a second open with restrictive sharing. That open must fail with a sharing violation, but the existing lease should be broken from RHW to RW because only handle caching conflicts with the requested sharing. ksmbd used the normal write-cache break calculation for this path, so RHW was broken to RH. The following successful open then did not generate the expected second break from RW to R. Pass share-conflict context into the lease break helper and, for lease breaks caused by sharing, drop only SMB2_LEASE_HANDLE_CACHING from the current lease state. Other break paths keep the existing write/truncate break behavior. A share-conflict break must also remain a single break. The triggering open fails with a sharing violation and is never granted, so there is no target oplock level to converge on. The lease break retry loop, however, keeps breaking while the lease level is still above req_op_level, which broke RHW all the way down to R in one open (lease_break_info.count became 2 instead of 1). Skip the again loop for share-conflict breaks so the sharing open produces exactly one RHW->RW break and the later successful open produces the separate RW->R break the test expects. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: normalize ungrantable lease statesNamjae Jeon
smb2.lease.request verifies which SMB2 lease state combinations are granted by the server. Requests for H-only, W-only, and HW leases are valid lease state bitmasks, but they are not grantable combinations and should be returned as lease state none. ksmbd only checked that the requested bits were inside the SMB2 lease state mask. As a result it could grant H-only, W-only, or HW requests and return non-zero lease states where the client expects no lease. Keep the bitmask validation, but normalize ungrantable combinations to zero before allocating or looking up the lease. The grantable combinations remain unchanged: R, RH, RW, and RHW. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: return oplock protocol error for level II ackNamjae Jeon
SMB2 level II to none oplock breaks do not require an acknowledgment from the client. smb2.oplock.levelii500 intentionally acknowledges such a break and expects the server to reject it with STATUS_INVALID_OPLOCK_PROTOCOL. ksmbd drops the local level II oplock to none immediately after sending the break notification because it does not wait for an ACK. When the client then sends the invalid ACK, smb20_oplock_break_ack() sees that the oplock is not in OPLOCK_ACK_WAIT state and returns STATUS_INVALID_DEVICE_STATE before checking the current oplock level. If the oplock is already none when an unexpected SMB2 oplock break ACK arrives, report STATUS_INVALID_OPLOCK_PROTOCOL. Keep the existing STATUS_INVALID_DEVICE_STATE response for other unexpected non-wait states. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: avoid level II oplock break notification on unlinkNamjae Jeon
smb2_util_unlink() opens the target with FILE_DELETE_ON_CLOSE and then closes that handle. Other clients can also mark a file for delete with SMB2 SET_INFO FileDispositionInformation. When these unlink paths break existing SMB2 level II oplocks, ksmbd sends an unsolicited SMB2_OPLOCK_BREAK notification to none. This races with the synchronous CREATE or SET_INFO response expected by the client, and smbtorture reports NT_STATUS_INVALID_NETWORK_RESPONSE while running smb2.oplock.exclusive2. SMB2 level II oplock breaks do not require an acknowledgment in the delete path. Keep lease handling unchanged, but drop plain SMB2 level II oplocks locally for unlink requests without sending a break notification. Normal write/truncate paths still send the level II to none notification, preserving the behavior covered by smb2.oplock.levelII500. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: downgrade oplock after break timeoutNamjae Jeon
smb2.oplock.batch22a opens a file with a batch oplock and then issues a second open that waits for the oplock break timeout. After the timeout the second open should succeed, but the granted oplock level must be level II. When the break times out, oplock_break() returns -ENOENT after invalidating the previous opener. smb_grant_oplock() went straight to set_lev with the original requested oplock level, so the second open could be granted a new batch oplock. Downgrade the requested oplock to level II on the -ENOENT break-timeout path before granting the oplock to the new open. A break that completes because the previous owner closed its handle from the oplock break handler must be distinguished from a real timeout. smb2.oplock.batch7 closes the first handle during the break wait, and the second open is then expected to be granted the originally requested batch oplock. Return -EAGAIN from the non-lease break path when the previous opener closed during the break wait, recheck sharing in smb_grant_oplock(), and grant the requested oplock if the close removed the conflict. Real break timeouts still return -ENOENT and keep the downgrade to level II. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: apply create security descriptor firstNamjae Jeon
smb2.create.aclfile creates files with an SMB2_CREATE_SD_BUFFER create context and expects the resulting security descriptor to match the descriptor supplied by the client. ksmbd currently tries to inherit the parent DACL first and only parses the SMB2_CREATE_SD_BUFFER context when DACL inheritance fails. If inheritance succeeds, the explicit security descriptor supplied on create is ignored. This breaks create requests that include owner/group information in the security descriptor. Apply the create security descriptor first when the context is present. Fall back to the existing inherited/default ACL path only when no create security descriptor was supplied. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: return requested create allocation sizeNamjae Jeon
smb2.create.blob sends an SMB2_CREATE_ALLOCATION_SIZE create context with a 1MiB allocation size and expects the create response AllocationSize field to match the requested size. smb2.create.open additionally compares the AllocationSize returned in the CREATE response with the AllocationSize returned by FILE_ALL_INFORMATION on the same handle. ksmbd applies the allocation with fallocate(), but then fills both the create response and handle-based information from stat.blocks << 9. On filesystems such as ext4 this can include filesystem allocation rounding and metadata effects, causing a response larger than the SMB2 allocation size context and a disagreement between the two queries. Remember the requested allocation size while processing the create context, store the reported allocation size in struct ksmbd_file, and use it for both the create response and handle-based allocation size responses. Update the stored value when FILE_ALLOCATION_INFORMATION changes it, and fall back to stat.blocks << 9 when no allocation size context was provided. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: tighten create file attribute validationNamjae Jeon
smb2.create.gentest checks each create FileAttributes bit independently and expects FILE_ATTRIBUTE_INTEGRITY_STREAM and FILE_ATTRIBUTE_NO_SCRUB_DATA to be rejected with STATUS_INVALID_PARAMETER. ksmbd validates create FileAttributes against FILE_ATTRIBUTE_MASK, which includes those bits. It also rejects only requests that have no known attribute bit at all, so a request containing both known and unknown bits can pass validation. Use a create-specific attribute mask that excludes INTEGRITY_STREAM and NO_SCRUB_DATA, and reject any bit outside that mask. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: reject empty-attribute synchronize-only createNamjae Jeon
smb2.create.gentest checks each desired access bit independently and expects an open that requests only SYNCHRONIZE with CreateDisposition OPEN_IF and FileAttributes 0 to fail with STATUS_ACCESS_DENIED. Rejecting all SYNCHRONIZE-only opens is too broad: SYNCHRONIZE does not imply read, write, or delete data access, and smb2.sharemode.sharemode-access expects a SYNCHRONIZE-only open to succeed when it does not conflict with the existing share mode. Limit the rejection to the gentest create shape: SYNCHRONIZE-only access, OPEN_IF disposition, and no file attributes. Other synchronize-only opens are handled by the normal permission and share-mode checks. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: honor stream delete sharing for base fileNamjae Jeon
smb2.streams.delete opens an alternate data stream without FILE_SHARE_DELETE and then tries to delete the base file. Windows rejects the base-file delete with STATUS_SHARING_VIOLATION while the stream handle is open. ksmbd tracks stream opens on the same ksmbd_inode as the base file, but the delete-on-close path only checked delete access on the base handle before marking the inode delete-pending. As a result, deleting the base file succeeded even though an open stream handle denied delete sharing. Add a helper to detect open stream handles on the same inode that do not allow FILE_SHARE_DELETE, and reject base-file delete pending and DELETE opens with a sharing violation in that case. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: send pending interim for last compound I/ONamjae Jeon
smb2.compound_async.write_write and smb2.compound_async.read_read expect the last I/O request in a compound request to become cancellable before its final response is received. smb clients mark a request cancellable after receiving an interim STATUS_PENDING response. ksmbd handled the last READ/WRITE synchronously and returned the final response directly, so the client never observed STATUS_PENDING and req->cancel.can_cancel remained false. For the last READ or WRITE in a compound request, register the work briefly as async and send a STATUS_PENDING interim response before continuing with the normal synchronous completion. The final READ/WRITE response remains unchanged. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: return success for deferred final closeNamjae Jeon
ksmbd_close_fd() marks an open file as FP_CLOSED and drops the file table reference. If another in-flight request still holds a reference, the final close is deferred until that request drops its reference. The function currently returns -EINVAL in that deferred-final-close case because fp is cleared when the reference count does not reach zero. That turns a valid close into STATUS_FILE_CLOSED. smb2.compound_find.compound_find_close sends QUERY_DIRECTORY and then closes the same directory handle before receiving the find response. The query holds a reference while it builds the response, so close must mark the handle closed and return success even though final teardown is delayed. Track whether the handle was successfully transitioned to FP_CLOSED and return success when only the final close is deferred. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: preserve compound responses for chained errorsNamjae Jeon
set_smb2_rsp_status() resets the response iov and compound offsets before building an error response. That is fine for a single request, but it corrupts a compound response when an error is detected after an earlier compound element has already been completed. smb2.compound.invalid4 sends a READ as the first compound element and a bogus command as the second one. The READ response must remain in the compound response with STATUS_END_OF_FILE, followed by the bogus command response with STATUS_INVALID_PARAMETER. Resetting the response state for the second command breaks the compound framing and the client reports NT_STATUS_INVALID_NETWORK_RESPONSE. When setting an error for a chained command, update and pin only the current compound response slot instead of resetting the whole response. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: validate handle for create or get object idNamjae Jeon
FSCTL_CREATE_OR_GET_OBJECT_ID returned a dummy successful response without checking whether the request handle was valid. That let an invalid related compound handle succeed in smb2.compound.related5, although the client expected STATUS_FILE_CLOSED. Look up the file handle before building the object id response and fail with STATUS_FILE_CLOSED when the handle is invalid or already closed. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: propagate failed command status in related compoundsNamjae Jeon
In a related compound request, later commands can refer to the file handle from an earlier command using the related FID value. If the earlier command fails without producing a valid compound FID, the later related commands must fail with the same status instead of operating on an invalid or stale handle. smb2.compound.related4 sends CREATE followed by IOCTL, CLOSE and SET_INFO. The CREATE is expected to fail with STATUS_ACCESS_DENIED, and the remaining related commands are expected to return STATUS_ACCESS_DENIED as well. ksmbd only stored the compound FID on successful CREATE and did not remember failed compound statuses. Store the failed status in the work item and make related handle-based requests fail immediately with that status only when the compound FID is invalid. Also preserve and consume the related FID across successful FLUSH, READ and WRITE requests whose responses do not carry a file id. Keep a valid compound FID across non-close failures so later related commands can continue to use the handle. When extracting the FID from a successful READ, WRITE or FLUSH request, use the request structure matching the SMB2 command: READ and WRITE place PersistentFileId and VolatileFileId at a different offset than FLUSH, so a single smb2_flush_req cast can save the wrong value as compound_fid and make the following related request fail with STATUS_FILE_CLOSED (smb2.compound_async.write_write after smb2.compound_async.flush_flush). Only update the saved compound FID when the request carries a valid volatile FID. otherwise an all-ones related FID would overwrite the CREATE FID and break smb2.compound.related6. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: deny renaming directory with open childrenNamjae Jeon
Windows denies renaming a directory while a file below that directory is still open. smb2.rename.rename_dir_openfile checks this by keeping a file handle open under the directory and then attempting to rename the directory handle. ksmbd did not check open children before calling vfs_rename(), so the rename incorrectly succeeded. For non-POSIX clients, scan the global open file table for active handles whose dentries are below the directory being renamed. If any child is open, fail the rename with -EACCES so the client receives STATUS_ACCESS_DENIED. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: check parent directory sharing conflicts on renameNamjae Jeon
When renaming a file, some existing opens on the parent directory must block the rename with STATUS_SHARING_VIOLATION. This includes parent directory handles opened with DELETE access and handles opened without FILE_SHARE_DELETE. ksmbd checked only the parent's desired access for FILE_DELETE. That handled smb2.rename.share_delete_and_delete_access, but missed the case where the parent directory was opened without delete access and without delete sharing, so smb2.rename.no_share_delete_no_delete_access incorrectly succeeded. Attribute-only parent opens, however, must not block the rename. smb2.rename.msword opens the parent directory with only SYNCHRONIZE and FILE_READ_ATTRIBUTES, no share access, and then renames an already-open child file. Windows allows this pattern. Reject parent directory handles that request DELETE access, and reject non-attribute-only parent opens that deny FILE_SHARE_DELETE, while allowing attribute-only parent opens to coexist with child rename. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: preserve open change time across renameNamjae Jeon
inode ctime is updated when a file is renamed. ksmbd returned that ctime directly as SMB2 ChangeTime for handle-based query information. This makes ChangeTime change after a rename through an already-open handle, while Windows keeps the handle's ChangeTime stable for this case. Store the SMB ChangeTime in struct ksmbd_file when the handle is opened and use that value for create, close, and handle-based query information responses. If a client explicitly sets FILE_BASIC_INFORMATION ChangeTime, update the stored value as well. This fixes smbtorture smb2.rename.simple_modtime, which expects ChangeTime and LastWriteTime to remain unchanged after renaming an already-open file. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: handle durable v2 app instance idNamjae Jeon
The SMB2_CREATE_APP_INSTANCE_ID create context is used with durable v2 opens to identify another open from the same application instance. When a new durable v2 open arrives with the same AppInstanceId as an existing open, the server should close the previous open without sending an oplock break notification. ksmbd ignored this create context. A second durable v2 batch oplock open with the same AppInstanceId therefore went through the normal competing open path and sent an oplock break to the first opener. smbtorture smb2.durable-v2-open.app-instance expects no oplock break and then expects the old handle to be closed. Parse and store AppInstanceId for durable v2 opens. Before creating the new open, find an existing file with the same AppInstanceId and close it through the normal close teardown path without issuing an oplock break. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: fix durable reconnect context parsingNamjae Jeon
SMB2 create context DataLength describes only the create context data payload. It does not include the create context header, name field, or any local padding that exists in ksmbd's helper structures. ksmbd validated durable reconnect contexts by comparing DataOffset + DataLength against sizeof the whole helper structure. This rejects a valid durable v2 reconnect context because the wire DH2C data is 36 bytes while struct create_durable_handle_reconnect_v2 contains an extra four byte pad. Validate the durable context payload length against the corresponding payload member instead. Also keep the reconnect context authoritative when a later durable request context is present, matching the existing durable v1 reconnect behavior. This fixes smbtorture smb2.durable-v2-open.durable-v2-setinfo, where the durable v2 reconnect after SET_INFO was rejected with STATUS_INVALID_PARAMETER. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: invalidate durable handles on oplock breakNamjae Jeon
When a durable handle is preserved after a disconnect, its oplock state can still block later opens. If another client opens the same file and the preserved oplock or lease has to be broken, the old durable handle must no longer be reconnectable after the break cannot be acknowledged. ksmbd was treating a missing connection, or an oplock break timeout, as a successful break only by downgrading the oplock state. The old durable handle remained reconnectable, so a later durable reconnect for that stale handle could succeed. The open path can also see a detached durable handle before the break notification helpers fully dispose of it. Invalidate such a preserved durable handle directly when a competing open has to break its batch or exclusive oplock, while leaving ordinary durable reconnects without a competing open untouched. If the old handle still reaches the reconnect path, reject it when the same inode already has another active open. This matches the smb2.durable-open.open2-lease/open2-oplock sequence where a later open replaces the disconnected durable owner and the stale first handle must not be reclaimed. Also, reconnect lookup used only the persistent id. A new durable open can get a persistent id that matches the stale reconnect request after the old durable state is invalidated. Preserve the disconnected handle's old volatile id and require durable reconnect contexts to match it, so a stale reconnect cannot attach to a different durable open. Windows allows the later open to proceed and rejects the old reconnect with STATUS_OBJECT_NAME_NOT_FOUND. The smbtorture smb2.durable-open.oplock test covers this case. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: supersede disconnected delete-on-close durable handleNamjae Jeon
A durable handle opened with FILE_DELETE_ON_CLOSE is preserved across a disconnect so it can be reclaimed by a durable reconnect. smb2.durable-open.delete_on_close2 disconnects such a handle and then reconnects it, expecting the reconnect to succeed. When the client does not reconnect but instead opens the same name with a new delete-on-close create, the preserved handle keeps the file present with delete-on-close set. ksmbd then rejects the new open with STATUS_ACCESS_DENIED on the file_present + FILE_DELETE_ON_CLOSE + OPEN_IF/OVERWRITE_IF path. smb2.durable-open.delete_on_close1 expects this open to create a fresh, empty file instead, i.e. the disconnected handle's delete-on-close must take effect first. Add ksmbd_close_disconnected_durable_delete_on_close(), which closes disconnected (conn == NULL) durable handles that keep a delete-on-close file present. The final close promotes S_DEL_ON_CLS to S_DEL_PENDING and unlinks the file, so a re-resolved path is absent and the new open creates it fresh. Call it from smb2_open() before the delete-on-close conflict check, only for the conflicting open shapes. A live (connected) handle still keeps the file and blocks the open as before. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysksmbd: handle missing create contexts for lease opensNamjae Jeon
smb2_find_context_vals() assumes that callers only search create contexts when the SMB2 CREATE request contains a non-empty create context area. That is not always true. a client can send RequestedOplockLevel set to SMB2_OPLOCK_LEVEL_LEASE without a lease create context. In that case parse_lease_state() searches for a lease context and smb2_find_context_vals() starts parsing from offset 0 with length 0, returning -EINVAL. This makes the open fail with STATUS_INVALID_PARAMETER. The smbtorture smb2.lease.duplicate_open test hits this while creating a second file without a lease request. Return NULL when the request has no create context area so the missing context is treated the same as any other absent create context. The open then continues without granting a lease. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>