| Age | Commit message (Collapse) | Author |
|
Add sanity check for iph->ihl field in nf_flow_ip4_tunnel_proto() before
using it to compute the header size, avoiding out-of-bounds access with
malformed IP headers.
While at it, use iph->protocol instead of the hardcoded IPPROTO_IPIP
constant when setting ctx->tun.proto and reference ctx->tun.hdr_size
when updating ctx->offset.
Fixes: ab427db178858 ("netfilter: flowtable: Add IPIP rx sw acceleration")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add
was skipped") introduced a regression where packets for valid
connections are dropped when using connlimit for soft-limiting
scenarios.
The issue occurs when a new connection reuses a socket currently in
the TIME_WAIT state. In this scenario, the connection tracking entry
is evaluated as already confirmed. Previously, __nf_conncount_add()
assumed that if a connection was confirmed and did not originate from
the loopback interface, it should skip the addition and return -EEXIST.
Skipping the addition triggers a garbage collection run that cleans up
the TIME_WAIT connection. Consequently, the active connection count
drops to 0, which xt_connlimit mishandles, leading to the false rejection
of the perfectly valid new connection.
Fix this by replacing the interface check with protocol-agnostic state
checks. We now skip the tree insertion and preserve the lockless garbage
collection optimization only if the connection is IPS_ASSURED. This
allows early-confirmed setup packets (such as reused TIME_WAIT sockets
or locally generated SYN-ACKs) to be properly evaluated and counted
without falsely dropping. The goto check_connections path is maintained
to ensure these setup packets are deduplicated correctly.
This has been tested with slowhttptest and HTTP server configured
locally to ensure we are not breaking soft-limiting scenarios for local
or external connections. In addition, it was tested with a OVS zone
limit too.
Fixes: 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped")
Reported-by: Alejandro Olivan Alvarez <alejandro.olivan.alvarez@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/177349610461.3071718.4083978280323144323@eldamar.lan/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We ran into below KASAN splat, which is mostly uninteresting, beside
for having nf_nat_register_fn() in the call chain as a cause for the
offending access:
==================================================================
BUG: KASAN: slab-out-of-bounds in nf_nat_register_fn+0x5f9/0x640
Read of size 8 at addr ffff890031e54c20 by task iptables/9510
CPU: 0 UID: 0 PID: 9510 Comm: iptables Not tainted 6.18.18-grsec-full-20260320181326 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
[…] dump_stack_lvl+0xee/0x160 ffff88004117eeb8
[…] print_report+0x6e/0x640 ffff88004117eee0
[…] ? __phys_addr+0x8e/0x140 ffff88004117eef0
[…] ? kasan_addr_to_slab+0x51/0xe0 ffff88004117ef08
[…] ? complete_report_info+0xec/0x1c0 ffff88004117ef20
[…] ? nf_nat_register_fn+0x5f9/0x640 ffff88004117ef48
[…] kasan_report+0xbc/0x140 ffff88004117ef50
[…] ? nf_nat_register_fn+0x5f9/0x640 ffff88004117ef90
[…] nf_nat_register_fn+0x5f9/0x640 ffff88004117eff8
[…] ? nf_nat_icmp_reply_translation+0x6e0/0x6e0 ffff88004117f070
[…] nf_tables_register_hook.part.0+0xa0/0x220 ffff88004117f080
[…] nf_tables_addchain.constprop.0+0x1054/0x1fc0 ffff88004117f0b8
[…] ? nft_chain_lookup.part.0+0x4ce/0xac0 ffff88004117f130
[…] ? nf_tables_abort+0x3d80/0x3d80 ffff88004117f190
[…] ? nf_tables_dumpreset_obj+0x100/0x100 ffff88004117f1c8
[…] ? nft_table_lookup.part.0+0x255/0x300 ffff88004117f310
[…] ? nf_tables_newchain+0x21a4/0x2fa0 ffff88004117f358
[…] nf_tables_newchain+0x21a4/0x2fa0 ffff88004117f360
[…] ? nf_tables_addchain.constprop.0+0x1fc0/0x1fc0 ffff88004117f458
[…] ? nla_get_range_signed+0x4a0/0x4a0 ffff88004117f488
[…] ? lock_acquire+0x16f/0x320 ffff88004117f490
[…] ? find_held_lock+0x3b/0xe0 ffff88004117f4b0
[…] ? __nla_parse+0x45/0x80 ffff88004117f500
[…] nfnetlink_rcv_batch+0xbca/0x19a0 ffff88004117f550
[…] ? nfnetlink_net_exit_batch+0x120/0x120 ffff88004117f618
[…] ? __sanitizer_cov_trace_switch+0x63/0xe0 ffff88004117f720
[…] ? gr_acl_handle_mmap+0x1c4/0x320 ffff88004117f7c0
[…] ? nla_get_range_signed+0x4a0/0x4a0 ffff88004117f7e8
[…] ? gr_is_capable+0x6f/0xe0 ffff88004117f830
[…] ? __nla_parse+0x45/0x80 ffff88004117f860
[…] ? skb_pull+0x103/0x1a0 ffff88004117f880
[…] nfnetlink_rcv+0x3db/0x4a0 ffff88004117f8b0
[…] ? nfnetlink_rcv_batch+0x19a0/0x19a0 ffff88004117f8d8
[…] ? netlink_lookup+0xe2/0x240 ffff88004117f900
[…] netlink_unicast+0x74b/0xb00 ffff88004117f930
[…] ? netlink_attachskb+0xb20/0xb20 ffff88004117f980
[…] ? __check_object_size+0x3e/0xaa0 ffff88004117f998
[…] ? security_netlink_send+0x51/0x160 ffff88004117f9c8
[…] netlink_sendmsg+0xa03/0x1200 ffff88004117f9f8
[…] ? netlink_unicast+0xb00/0xb00 ffff88004117fa70
[…] ? netlink_unicast+0xb00/0xb00 ffff88004117fac8
[…] ? ____sys_sendmsg+0xe2a/0x1040 ffff88004117faf8
[…] ____sys_sendmsg+0xe2a/0x1040 ffff88004117fb00
[…] ? kernel_recvmsg+0x300/0x300 ffff88004117fb60
[…] ? reacquire_held_locks+0xe9/0x260 ffff88004117fbc8
[…] ___sys_sendmsg+0x138/0x200 ffff88004117fbf8
[…] ? do_recvmmsg+0x7e0/0x7e0 ffff88004117fc30
[…] ? lockdep_hardirqs_on_prepare+0x101/0x1e0 ffff88004117fc50
[…] ? lock_acquire+0x16f/0x320 ffff88004117fd20
[…] ? lock_acquire+0x16f/0x320 ffff88004117fd58
[…] ? find_held_lock+0x3b/0xe0 ffff88004117fd70
[…] __sys_sendmsg+0x17a/0x260 ffff88004117fdc8
[…] ? __sys_sendmsg_sock+0x80/0x80 ffff88004117fdf0
[…] ? syscall_trace_enter+0x15e/0x2c0 ffff88004117fe98
[…] do_syscall_64+0x7d/0x400 ffff88004117fec8
[…] entry_SYSCALL_64_safe_stack+0x4a/0x60 ffff88004117fef8
</TASK>
==================================================================
The out-of-bounds report, though, is a red herring as it is for an
access that shouldn't have happened in the first place.
When nf_nat_init() fails to register its BPF kfuncs, it'll unwind and,
among others, call unregister_pernet_subsys() to deregister its per-net
ops. This makes the previously allocated net id available for reuse by
the next caller of register_pernet_subsys(), in our case, synproxy.
However, 'nat_net_id' will still hold the previously allocated value.
If nf_nat.o gets build as a module, all this doesn't matter. A failed
initialization routine makes the module fail to load and any dependent
module won't be able to load either. However, if nf_nat.o is built-in,
a failing init won't /completely/ make its functionality unavailable to
dependent modules, namely the code and static data is still there, free
to be called by modules like nft_chain_nat.ko.
Case in point, nft_chain_nat registers hooks that'll call into nf_nat
which, in our case, failed to initialize and therefore won't have a
valid net id nor related net_nat object any more.
Code in nf_nat, namely nf_nat_register_fn() and nf_nat_unregister_fn(),
still making use of the reallocated net id, lead to a type confusion as
the call to net_generic() will no longer return memory belonging to an
object suited to fit 'struct nat_net' but 'struct synproxy_net' instead.
The latter is only 24 bytes on 64-bit systems, much smaller than struct
nat_net which is 176 bytes, perfectly explaining the OOB KASAN report.
Detect and handle a failed nf_nat_init() by testing the 'nf_nat_hook'
pointer which will be reset to NULL on initialization errors to prevent
the usage of an invalid nat_net pointer.
As this check is only needed when nf_nat.o is built-in, guard it by
'#ifndef MODULE...'.
Fixes: cbc1dd5b659f ("netfilter: nf_nat: Fix possible memory leak in nf_nat_init()")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Prepare input updates for 7.2 merge window.
|
|
The MMS134S and MMS136 touch controllers have an event size of 6 bytes
rather than 8 bytes. When __mms114_read_reg() reads the touch data
packet from the device into the touch buffer, the events are packed
tightly at 6-byte intervals. However, the driver iterates through the
events using standard C array indexing (touch[index]), where each
element is sizeof(struct mms114_touch) (8 bytes) apart. As a result, any
touch events beyond the first one are read from incorrect offsets and
parsed improperly.
Fix this by explicitly calculating the byte offset for each touch event
based on the device's specific event size.
Fixes: 53fefdd1d3a3 ("Input: mms114 - support MMS136")
Fixes: ab108678195f ("Input: mms114 - support MMS134S")
Reported-by: sashiko-bot@kernel.org
Assisted-by: Antigravity:gemini-3.5-flash
Reviewed-by: Bryam Vargas <hexlabsecurity@proton.me>
Link: https://patch.msgid.link/20260616050912.1531241-1-dmitry.torokhov@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The Elan I2C touchpad driver queries the device for its physical
dimensions and trace counts to calculate the device resolution and width.
However, if the device firmware or device tree provides invalid zero
values for x_traces or y_traces, it results in a fatal division-by-zero
exception leading to a kernel panic during device probe.
Add checks to ensure these parameters are non-zero before performing
the division. If invalid trace values are detected, fall back to a safe
default of 1.
Additionally, prevent an arithmetic underflow in the touch reporting
logic. Previously, if the calculated or fallback width was smaller than
ETP_FWIDTH_REDUCE (90), the subtraction would underflow, resulting in a
massive unsigned integer being reported to userspace. Clamp the adjusted
width to a minimum of 0 to safely handle small physical dimensions and
fallback scenarios.
Completing the probe with safe fallback values ensures the sysfs nodes
are created, keeping the firmware update path intact so a recovery
firmware can be flashed to the device.
Fixes: 6696777c6506 ("Input: add driver for Elan I2C/SMbus touchpad")
Fixes: e3a9a1290688 ("Input: elan_i2c - do not query the info if they are provided")
Signed-off-by: Ranjan Kumar <kumarranja@chromium.org>
Link: https://patch.msgid.link/20260612060339.3829666-1-kumarranja@chromium.org
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Memoryless force-feedback devices use a timer to manage playback of
effects. When a driver for such a device is unbound (or the device is
unregistered for other reasons), the driver typically frees its private
data synchronously. However, the input_dev structure (and its associated
force-feedback structures, including the timer) is only freed when the
last user closes the corresponding device node.
If userspace keeps the device node open while the device is unregistered
(e.g., during driver unbind), the force-feedback timer can still fire
after the driver's private data has been freed.
Introduce a new 'stop' callback to struct ff_device, and call it from
input_unregister_device() before the device is deleted. Implement this
callback for memoryless devices and synchronously shut down the timer to
ensure it is stopped and cannot be rearmed once unregistration happens.
Assisted-by: Gemini:gemini-3.1-pro
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
iforce_process_packet() handles a status report (packet id 0x02) by
taking a force-feedback effect index straight from the device wire and
using it to address the per-effect state array:
i = data[1] & 0x7f;
if (data[1] & 0x80) {
if (!test_and_set_bit(FF_CORE_IS_PLAYED,
iforce->core_effects[i].flags))
...
} else if (test_and_clear_bit(FF_CORE_IS_PLAYED,
iforce->core_effects[i].flags)) {
...
}
The index is masked only with 0x7f, so it ranges 0..127, but
core_effects[] holds only IFORCE_EFFECTS_MAX (32) entries. For an index
of 32..127 the test_and_set_bit()/test_and_clear_bit() is an
out-of-bounds single-bit read-modify-write past the array. core_effects[]
is the second-to-last member of struct iforce, so the write lands in the
trailing members and beyond the embedding kzalloc()'d iforce_serio /
iforce_usb object.
data[1] is unvalidated device payload on both transports (the USB
interrupt endpoint and serio), and the status path is not gated on force
feedback being present, so a malicious or counterfeit device can set or
clear a bit at an attacker-chosen offset past the object.
Reject an out-of-range index instead of indexing with it. Bound against
the array dimension IFORCE_EFFECTS_MAX rather than dev->ff->max_effects so
the check guarantees memory safety regardless of how many effects the
device registered. A legitimate "effect started/stopped" status always
carries an index below IFORCE_EFFECTS_MAX, so well-formed devices are
unaffected; the neighbouring mark_core_as_ready() loop is already bounded
and is left untouched.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260613-b4-disp-4828d263-v1-1-02320e1a89dd@proton.me
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
goodix_ts_read_input_report() copies the number of touch points reported
by the device into an on-stack buffer
u8 point_data[2 + GOODIX_MAX_CONTACT_SIZE * GOODIX_MAX_CONTACTS];
which is sized for at most GOODIX_MAX_CONTACTS (10) contacts. The only
runtime check bounds the per-interrupt count against ts->max_touch_num,
but that value is taken verbatim from a 4-bit field of the device
configuration block and is never clamped:
ts->max_touch_num = ts->config[MAX_CONTACTS_LOC] & 0x0f;
The nibble can be 0..15, so a malfunctioning, malicious or counterfeit
controller (or an attacker tampering with the I2C bus) can advertise up
to 15 contacts. goodix_ts_read_input_report() then accepts a touch_num
of up to 15 and the second goodix_i2c_read() writes
ts->contact_size * (touch_num - 1) bytes past the one-contact header into
point_data - up to 30 bytes (45 with the 9-byte report format) beyond the
92-byte buffer: a stack out-of-bounds write.
Clamp max_touch_num to GOODIX_MAX_CONTACTS, the number of contacts
point_data[] is sized for, when reading it from the configuration.
Fixes: a7ac7c95d468 ("Input: goodix - use max touch number from device config")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Link: https://patch.msgid.link/20260612-b4-disp-6844625d-v1-1-df0aed080c9d@proton.me
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
BPF LSM programs can currently attach to xfrm_decode_session(). That
hook may return an error, but security_skb_classify_flow() calls it
from a void path and triggers BUG_ON() if an error is returned.
Disable BPF attachment to the hook to prevent a BPF LSM program from
turning packet classification into a full panic.
Fixes: 9e4e01dfd325 ("bpf: lsm: Implement attach, detach and execution")
Signed-off-by: Bradley Morgan <include@grrlz.net>
Link: https://lore.kernel.org/r/20260619130305.27779-1-include@grrlz.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"The most notable change is the removal of the fscache backend: it has
been deprecated for almost two years, mainly because EROFS file-backed
mounts and fanotify pre-content hooks (together with erofs-utils) now
provide better functionality and simpler codebase. In addition,
fscache has depended on netfslib for years, which is undesirable for
EROFS since it is a local filesystem. More details in [1].
In addition, sparse support has been added to the pcluster layout,
which is helpful for large sparse AI datasets, and map requests for
chunk-based inodes have been optimized to be more efficient as well.
There are also the usual fixes and cleanups.
Summary:
- Report more consecutive chunks of the same type for
each iomap request
- Add sparse support for the pcluster layout
- Update the EROFS documentation overview
- Remove the deprecated fscache backend
- Various fixes and cleanups"
Link: https://lore.kernel.org/r/20260622013622.934174-1-hsiangkao@linux.alibaba.com [1]
* tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: handle 48-bit blocks_hi for compressed inodes
erofs: remove fscache backend entirely
erofs: simplify RCU read critical sections
erofs: add sparse support to pcluster layout
erofs: add folio order to trace_erofs_read_folio
erofs: introduce erofs_map_chunks()
erofs: call erofs_exit_ishare() before rcu_barrier()
erofs: update the overview of the documentation
erofs: clean up erofs_ishare_fill_inode()
|
|
KCSAN report a race in break_stripe_batch_list() vs. raid5_make_request()
on sh->dev[i].flags (plain word write vs. atomic bit op)..
and .. one possible scenario is:
CPU1 CPU2
break_stripe_batch_list(sh1)
-> handle sh2
-> lock(sh2)
-> sh2->batch_head = NULL
-> unlock(sh2)
-> test_and_clear_bit(R5_Overlap, sh2->dev[i].flags)
-> wake_up_bit(sh2->dev[i].flags)
raid5_make_request()
-> add_all_stripe_bios(sh2)
-> lock(sh2)
-> stripe_bio_overlaps(sh2) returns true
batch_head is NULL, so new bio overlap
exist bio on sh2 -> true
-> set_bit(R5_Overlap, sh2->dev[i].flags)
-> unlock(sh2)
-> wait_on_bit(sh2->dev[i].flags)
-> sh2->dev[i].flags = sh1->dev[i].flags & ~R5_Overlap
No wait_up_bit(), CPU2 could be wait_on_bit() forever...
Fix by :
- Expand the protect zone.
- Use batch_head's device flag's snaphot when no held head_sh->stripe_lock.
- Move sh/head_sh->batch_head = NULL to the end of protected zone , and ,
any concurrent add_all_stripe_bios() grabs sh->stripe_lock now either:
- see batch_head != null, and , is rejected by stripe_bio_overlaps()
under the lock (no R5_Overlap wait ) , or ,
- sees batch_head == NULL, only after dev[i].flags has already been
set and the prior R5_Overlap waiters worken.
KCSAN report:
================================================
BUG: KCSAN: data-race in break_stripe_batch_list / raid5_make_request
write (marked) to 0xffff8e89c8117548 of 8 bytes by task 4042 on cpu 0:
raid5_make_request+0xea0/0x2930
md_handle_request+0x4a2/0xa40
md_submit_bio+0x109/0x1a0
__submit_bio+0x2ec/0x390
submit_bio_noacct_nocheck+0x457/0x710
submit_bio_noacct+0x2a7/0xc20
submit_bio+0x56/0x250
blkdev_direct_IO+0x54c/0xda0
blkdev_write_iter+0x38f/0x570
aio_write+0x22b/0x490
io_submit_one+0xa51/0xf70
__x64_sys_io_submit+0xf7/0x220
x64_sys_call+0x1907/0x1c60
do_syscall_64+0x130/0x570
entry_SYSCALL_64_after_hwframe+0x76/0x7e
read to 0xffff8e89c8117548 of 8 bytes by task 4010 on cpu 5:
break_stripe_batch_list+0x249/0x480
handle_stripe_clean_event+0x720/0x9b0
handle_stripe+0x32fb/0x4500
handle_active_stripes.isra.0+0x6e0/0xa50
raid5d+0x7e0/0xba0
md_thread+0x15a/0x2d0
kthread+0x1e3/0x220
ret_from_fork+0x37a/0x410
ret_from_fork_asm+0x1a/0x30
value changed: 0x0000000000000019 -> 0x0000000000000099 --> R5_Overlap
Fixes: fb642b92c267 ("md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list")
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Link: https://patch.msgid.link/20260619041013.1207148-1-chencheng@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
The patch just suppress KCSAN noise. No functional change.
RAID-5 can group multi full-stripe-write aka stripe_head into a
batch aka batch_list, with one head_sh leading them. Call
break_stripe_batch_list() when the batch is finished, or,
a stripe has to be dropped out of the batch.
break_stripe_batch_list() reads stripe state several times while
request paths can update thost state words concurrently with
lockless bitops, which reported by KCSAN.
Use a snapshot to guarantees that the value used for
warning, copying, and handle checks is internally consistent
at current read moment.
KCSAN report:
==============================================
BUG: KCSAN: data-race in __add_stripe_bio / break_stripe_batch_list
write (marked) to 0xffff8e89d4f0b988 of 8 bytes by task 4323 on cpu 3:
__add_stripe_bio+0x35e/0x400
raid5_make_request+0x6ac/0x2930
md_handle_request+0x4a2/0xa40
md_submit_bio+0x109/0x1a0
__submit_bio+0x2ec/0x390
submit_bio_noacct_nocheck+0x457/0x710
submit_bio_noacct+0x2a7/0xc20
submit_bio+0x56/0x250
blkdev_direct_IO+0x54c/0xda0
blkdev_write_iter+0x38f/0x570
aio_write+0x22b/0x490
io_submit_one+0xa51/0xf70
read to 0xffff8e89d4f0b988 of 8 bytes by task 4290 on cpu 4:
break_stripe_batch_list+0x3ce/0x480
handle_stripe_clean_event+0x720/0x9b0
handle_stripe+0x32fb/0x4500
handle_active_stripes.isra.0+0x6e0/0xa50
raid5d+0x7e0/0xba0
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Link: https://patch.msgid.link/20260618134748.1168360-1-chencheng@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
The net-next-hw spinners on netdev.bots.linux.dev observe failing
so-txtime-py tests. A review of stdout shows most failures to be
due to exceeding the 4ms grace period. All I saw were within 8ms.
So increase to that.
Double the bounds from 4 to 8ms. This is still is small enough to
differentiate the delays programmed by the test, 10 and 20ms.
Fixes: 5c6baef3885c ("selftests: drv-net: convert so_txtime to drv-net")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260610170651.1b644001@kernel.org/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260621200137.1564776-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In airoha_qdma_set_chan_tx_sched(), the loop clearing queue mask was
using AIROHA_NUM_TX_RING (32) instead of AIROHA_NUM_QOS_QUEUES (8).
Each channel has 8 queues, and TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i)
computes BIT(i + (channel * 8)). With i ranging 0..31, this causes:
- channel 0: clears bit 0..31 (all 4 channels) instead of 0..7
- channel 1: clears bit 8..31 (channels 1-3) instead of 8..15
- channel 2: clears bit 16..31 (channels 2-3) instead of 16..23
- channel 3: clears bit 24..31 (channel 3 only) - correct by accident
While BIT(32+) on arm64 produces 64-bit values truncated to 0 in u32
mask parameter, the loop still incorrectly clears queues within the
same channel beyond queue 7.
Even though this is functionally harmless (the register resets to 0
and is only ever cleared, never set — so clearing extra bits is a
no-op), the loop bound is semantically wrong and should be fixed for
correctness and clarity.
Fix by using AIROHA_NUM_QOS_QUEUES (8) as the loop upper bound.
Fixes: ef1ca9271313 ("net: airoha: Add sched HTB offload support")
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Wayen Yan <win847@gmail.com>
Link: https://patch.msgid.link/178187479434.2400840.1312143943526335838@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the allocation of fp[i].tpa_info fails, the error path will not free
the struct bnx2x_fastpath allocated earlier, as it is not linked to the
bp structure yet. Fix that by linking it immediately after allocation.
Cc: stable@vger.kernel.org
Fixes: 15192a8cf8a8 ("bnx2x: Split the FP structure")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260620062402.89549-1-nihaal@cse.iitm.ac.in
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When CONFIG_IP_MULTIPLE_TABLES is enabled but no rule is added,
fib_lookup() performs route lookup directly on two tables.
Since the first lookup does not properly bail out, the result
of an error route in the merged local/main table could be
overwritten by another route in the default table:
# unshare -n
# ip link set lo up
# ip route add 192.168.0.0/24 dev lo table 253
# ip route add unreachable 192.168.0.0/24
# ip route get 192.168.0.1
192.168.0.1 dev lo table default uid 0
cache <local>
Once a random rule is added, the error route is respected:
# ip rule add table 0
# ip rule del table 0
# ip route get 192.168.0.1
RTNETLINK answers: No route to host
Let's fix the inconsistent behaviour.
Fixes: f4530fa574df ("ipv4: Avoid overhead when no custom FIB rules are installed.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260619212753.3367244-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kernel selftests wait 1.25x of the promised stats refresh time
(as read from ethtool -c). bnxt reports 1sec by default, but
the stats update process has two steps. First device DMAs the
new values, then the service task performs update in full-width
SW counters. So the worst case delay is actually 2x.
Note that the behavior is different for ring stats and port stats.
Port stats are fetched synchronously by the service worker, so
there's no risk of doubling up the delay there.
The problem of stale stats impacts not only tests but real workloads
which monitor egress bandwidth of a NIC. The inaccuracy causes double
counting in the next cycle and spurious overload alarms.
Try to read from the DMA buffer more aggressively, to mitigate
timing issues between DMA and service task. The SW update should
be cheap.
Fixes: 51f307856b60 ("bnxt_en: Allow statistics DMA to be configurable using ethtool -C.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260619191538.104165-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
fib6_nh_mtu_change() re-fetches idev via __in6_dev_get(arg->dev) and
dereferences idev->cnf.mtu6 without a NULL check. addrconf_ifdown()
clears dev->ip6_ptr with RCU_INIT_POINTER() after rt6_disable_ip() has
released tb6_lock, so the RA-driven MTU walk can observe a NULL idev and
oops. The caller rt6_mtu_change_route() guards its own __in6_dev_get(),
but this re-fetch is unguarded; nexthop-backed routes survive
addrconf_ifdown()'s flush, so the walk still reaches it after ip6_ptr is
nulled.
Return 0 when idev is NULL, matching rt6_mtu_change_route() and the
fib6_mtu() fix in commit 5ad509c1fdad ("ipv6: Fix null-ptr-deref in
fib6_mtu().").
Oops: general protection fault, ... KASAN: null-ptr-deref in range
[0x00000000000002a8-0x00000000000002af]
RIP: 0010:fib6_nh_mtu_change+0x203/0x990
rt6_mtu_change_route+0x141/0x1d0
__fib6_clean_all+0xd0/0x160
rt6_mtu_change+0xb4/0x100
ndisc_router_discovery+0x24b5/0x2cb0
icmpv6_rcv+0x12e9/0x1710
ipv6_rcv+0x39b/0x410
Fixes: c0b220cf7d80 ("ipv6: Refactor exception functions")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260619045334.2427073-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp
registers a timer via timer_setup(). That struct ppp is the
hdlc->state allocation, which detach_hdlc_protocol() frees with kfree()
in both teardown paths: unregister_hdlc_device() and the re-attach inside
attach_hdlc_protocol().
The ppp proto never registered a .detach callback, so
detach_hdlc_protocol() performs no timer synchronization before the
kfree(). The only cancel, timer_delete(&proto->timer) in ppp_cp_event(),
is partial (it does not wait for a running callback) and only runs on the
->CLOSED transition; ppp_stop()/ppp_close() do not sync either. A
ppp_timer callback already executing (blocked on ppp->lock) survives the
kfree and then dereferences proto->state / ppp->lock in freed memory,
leading to a use-after-free.
Fix this by adding a .detach helper that calls timer_shutdown_sync() on
every per-proto timer. detach_hdlc_protocol() invokes proto->detach(dev)
before kfree(hdlc->state), so timer_shutdown_sync()
now runs on both free paths.
timer_shutdown_sync() is used instead of timer_delete_sync() because the
keepalive path re-arms the timer through add_timer()/mod_timer() and
shutdown blocks any re-activation during teardown.
Initialize the per-protocol timers in ppp_ioctl() when the protocol is
attached, and remove the now-redundant timer_setup() from ppp_start(), so
that the timers are initialized exactly once at attach time and
ppp_timer_release() never operates on uninitialized timer_list
structures. attach_hdlc_protocol() uses kmalloc() (not kzalloc), so
struct ppp's protos[i].timer is uninitialized garbage until the first
timer_setup(); without this init-at-attach, attaching the PPP protocol
without ever bringing the device up would leave timer_shutdown_sync()
operating on uninitialized memory in .detach. Moving the init out of
ppp_start() (which only runs on NETDEV_UP) into the attach path makes the
initialization unconditional and avoids initializing the same timer_list
twice.
This bug was found by static analysis.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Fan Wu <fanwu01@zju.edu.cn>
Link: https://patch.msgid.link/20260617020518.116319-1-fanwu01@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch raises `SMB3_DEFAULT_TRANS_SIZE` to 4MB to align it with
`smb2 max read/write`. This allows better I/O negotiation with modern
clients and improves sequential read/write performance on high-speed
networks.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
decode_compress_ctxt() walks CompressionAlgorithms[] using the client
supplied CompressionAlgorithmCount. That field is declared in
struct smb2_compression_capabilities_context as a fixed 4-element array,
but the number of algorithms is actually variable and clients such as
Windows advertise more than four (e.g. LZ77, LZ77+Huffman, LZNT1,
Pattern_V1 and LZ4).
The on-wire context length is already validated, so the access is within
the received buffer, but indexing the statically sized [4] array makes
UBSAN report an out-of-bounds access:
UBSAN: array-index-out-of-bounds in smb2pdu.c:1122:48
index 4 is out of range for type '__le16 [4]'
Call Trace:
smb2_handle_negotiate+0xda7/0xde0 [ksmbd]
ksmbd_smb_negotiate_common+0x27b/0x3e0 [ksmbd]
smb2_negotiate_request+0x14/0x20 [ksmbd]
handle_ksmbd_work+0x181/0x500 [ksmbd]
Walk the algorithms through a pointer so the fixed-array bounds check is
not applied, while keeping the existing length validation that bounds the
loop to the data actually received.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The durable handle scavenger kthread waits up to DURABLE_HANDLE_MAX_TIMEOUT
(300 seconds) between scans using wait_event_timeout(), which sleeps in
TASK_UNINTERRUPTIBLE. When there are no durable handles pending expiry the
task stays in D state far longer than 120 seconds, so the hung task
detector prints a bogus "task ksmbd-durable-s blocked for more than 120
seconds" warning with a backtrace, even though the thread is only idle.
Use wait_event_interruptible_timeout() so the thread sleeps in
TASK_INTERRUPTIBLE, which the hung task detector ignores. This also suits
the already-freezable kthread. Treat a negative return (e.g. -ERESTARTSYS)
like a timeout when recomputing the next wake interval.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ksmbd allocates both the volatile id (per-session file table) and the
persistent id (global file table) with idr_alloc_cyclic() starting at 0.
The first open after the module loads therefore gets volatile id 0 and
persistent id 0, and ksmbd returns an SMB2 FileId of {0, 0} in the create
response.
Clients treat an all-zero FileId as a null handle. smbtorture's
smb2_util_handle_empty() considers {0, 0} empty, so tests that guard the
close with it (e.g. smb2.oplock.statopen1, smb2.lease.statopen*) never
close that first handle. The leaked open keeps the inode's oplock count
non-zero, so a later batch oplock request on the same file is downgraded
to level II and the test fails.
Start the id allocation at 1 (KSMBD_START_FID) so no handle is ever
assigned a {0, 0} FileId, matching the behaviour of other SMB servers.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
A second open that requests only metadata-level access must not break
the existing caching state. ksmbd already skips the break for such opens
via fp->attrib_only (FILE_READ_ATTRIBUTES,
FILE_WRITE_ATTRIBUTES and FILE_SYNCHRONIZE).
An open requesting only READ_CONTROL (reading the security descriptor)
must be treated differently depending on the existing caching state.
smbtorture smb2.lease.statopen4 expects a read-control open NOT to break
a caching lease, while smb2.oplock.statopen1 expects the same open to
break a batch oplock. So READ_CONTROL is a stat open for leases but not
for oplocks.
Extend the stat-open break-skip in smb_grant_oplock() to also cover a
read-control-only open, but only when the existing holder is a lease.
The global fp->attrib_only flag (used for share-mode, rename and truncate
decisions) is left unchanged so oplock behaviour is preserved.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.streams.dir opens <dir>::$DATA with FILE_DIRECTORY_FILE and expects
STATUS_NOT_A_DIRECTORY, then opens <dir>::$DATA without it and expects
STATUS_FILE_IS_A_DIRECTORY.
Commit "treat unnamed DATA stream as base file" canonicalizes the ::$DATA
suffix to a NULL stream name so the open continues through the base-file
path. That skipped the stream/directory type validation, which was
guarded by "if (stream_name)", so opening a directory's ::$DATA stream
with FILE_DIRECTORY_FILE incorrectly returned STATUS_OK and a plain open
of it no longer reported STATUS_FILE_IS_A_DIRECTORY.
parse_stream_name() still records the explicit $DATA type in s_type even
when it clears stream_name. Run the data-stream vs directory validation
whenever s_type is DATA_STREAM, not only when stream_name is set, so the
canonicalized ::$DATA open is rejected with the correct status.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.lease.oplock and smb2.lease.breaking1 hold a lease and then issue a
single conflicting open on the same file. The held lease must break one
step to drop write caching (RWH->RH, RW->R) and then stop, so
lease_break_info.count is 1 and the lease keeps its read/handle caching.
ksmbd instead cascaded the break all the way down to none
(e.g. RWH->RH->R->none), so the break count was 2 or 3 and the reported
lease state ended at 0. Commit "chain pending lease breaks before waking
waiters" forces break_level to SMB2_OPLOCK_LEVEL_NONE for any non-lease
open against a handle-caching lease, which drives oplock_break()'s retry
loop down to none even when only one open is contending.
Drop that break_level override so a conflicting open breaks a lease only
to its own compatible level (level II, i.e. RH/R).
A deeper break is still required when a truncating open is also waiting
behind the same lease break. smb2.lease.breaking3 keeps a normal open
pending through RWH->RH and an overwrite open pending behind it, and
expects the lease to continue RH->R->none before either open completes.
The overwrite waiter sets open_trunc on the lease while it blocks on the
pending break, so extend the retry loop to chain another break while that
truncating waiter still needs the lease at none. The per-break open_trunc
snapshot stays cleared, so the cascade steps down (RH->R->none) instead of
collapsing straight to none, and the normal open stays pending until the
lease is fully broken.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.lease.break_twice first opens a file with an RHW lease and then tries
a second open with restrictive sharing. That open must fail with a sharing
violation, but the existing lease should be broken from RHW to RW because
only handle caching conflicts with the requested sharing.
ksmbd used the normal write-cache break calculation for this path, so RHW
was broken to RH. The following successful open then did not generate the
expected second break from RW to R.
Pass share-conflict context into the lease break helper and, for lease
breaks caused by sharing, drop only SMB2_LEASE_HANDLE_CACHING from the
current lease state. Other break paths keep the existing write/truncate
break behavior.
A share-conflict break must also remain a single break. The triggering
open fails with a sharing violation and is never granted, so there is no
target oplock level to converge on. The lease break retry loop, however,
keeps breaking while the lease level is still above req_op_level, which
broke RHW all the way down to R in one open (lease_break_info.count became
2 instead of 1). Skip the again loop for share-conflict breaks so the
sharing open produces exactly one RHW->RW break and the later successful
open produces the separate RW->R break the test expects.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.lease.request verifies which SMB2 lease state combinations are granted
by the server. Requests for H-only, W-only, and HW leases are valid lease
state bitmasks, but they are not grantable combinations and should be
returned as lease state none.
ksmbd only checked that the requested bits were inside the SMB2 lease state
mask. As a result it could grant H-only, W-only, or HW requests and return
non-zero lease states where the client expects no lease.
Keep the bitmask validation, but normalize ungrantable combinations to zero
before allocating or looking up the lease. The grantable combinations
remain unchanged: R, RH, RW, and RHW.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SMB2 level II to none oplock breaks do not require an acknowledgment from
the client. smb2.oplock.levelii500 intentionally acknowledges such a break
and expects the server to reject it with STATUS_INVALID_OPLOCK_PROTOCOL.
ksmbd drops the local level II oplock to none immediately after sending the
break notification because it does not wait for an ACK. When the client
then sends the invalid ACK, smb20_oplock_break_ack() sees that the oplock
is not in OPLOCK_ACK_WAIT state and returns STATUS_INVALID_DEVICE_STATE
before checking the current oplock level.
If the oplock is already none when an unexpected SMB2 oplock break ACK
arrives, report STATUS_INVALID_OPLOCK_PROTOCOL. Keep the existing
STATUS_INVALID_DEVICE_STATE response for other unexpected non-wait states.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2_util_unlink() opens the target with FILE_DELETE_ON_CLOSE and then
closes that handle. Other clients can also mark a file for delete with
SMB2 SET_INFO FileDispositionInformation.
When these unlink paths break existing SMB2 level II oplocks, ksmbd sends
an unsolicited SMB2_OPLOCK_BREAK notification to none. This races with the
synchronous CREATE or SET_INFO response expected by the client, and
smbtorture reports NT_STATUS_INVALID_NETWORK_RESPONSE while running
smb2.oplock.exclusive2.
SMB2 level II oplock breaks do not require an acknowledgment in the delete
path. Keep lease handling unchanged, but drop plain SMB2 level II oplocks
locally for unlink requests without sending a break notification. Normal
write/truncate paths still send the level II to none notification,
preserving the behavior covered by smb2.oplock.levelII500.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.oplock.batch22a opens a file with a batch oplock and then issues a
second open that waits for the oplock break timeout. After the timeout the
second open should succeed, but the granted oplock level must be level II.
When the break times out, oplock_break() returns -ENOENT after invalidating
the previous opener. smb_grant_oplock() went straight to set_lev with the
original requested oplock level, so the second open could be granted a new
batch oplock. Downgrade the requested oplock to level II on the -ENOENT
break-timeout path before granting the oplock to the new open.
A break that completes because the previous owner closed its handle from
the oplock break handler must be distinguished from a real timeout.
smb2.oplock.batch7 closes the first handle during the break wait, and the
second open is then expected to be granted the originally requested batch
oplock. Return -EAGAIN from the non-lease break path when the previous
opener closed during the break wait, recheck sharing in smb_grant_oplock(),
and grant the requested oplock if the close removed the conflict. Real
break timeouts still return -ENOENT and keep the downgrade to level II.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.create.aclfile creates files with an SMB2_CREATE_SD_BUFFER create
context and expects the resulting security descriptor to match
the descriptor supplied by the client.
ksmbd currently tries to inherit the parent DACL first and only parses
the SMB2_CREATE_SD_BUFFER context when DACL inheritance fails.
If inheritance succeeds, the explicit security descriptor supplied on
create is ignored. This breaks create requests that include owner/group
information in the security descriptor.
Apply the create security descriptor first when the context is present.
Fall back to the existing inherited/default ACL path only when no create
security descriptor was supplied.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.create.blob sends an SMB2_CREATE_ALLOCATION_SIZE create context with
a 1MiB allocation size and expects the create response AllocationSize field
to match the requested size. smb2.create.open additionally compares the
AllocationSize returned in the CREATE response with the AllocationSize
returned by FILE_ALL_INFORMATION on the same handle.
ksmbd applies the allocation with fallocate(), but then fills both the
create response and handle-based information from stat.blocks << 9. On
filesystems such as ext4 this can include filesystem allocation rounding
and metadata effects, causing a response larger than the SMB2 allocation
size context and a disagreement between the two queries.
Remember the requested allocation size while processing the create context,
store the reported allocation size in struct ksmbd_file, and use it for
both the create response and handle-based allocation size responses. Update
the stored value when FILE_ALLOCATION_INFORMATION changes it, and fall back
to stat.blocks << 9 when no allocation size context was provided.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.create.gentest checks each create FileAttributes bit independently and
expects FILE_ATTRIBUTE_INTEGRITY_STREAM and FILE_ATTRIBUTE_NO_SCRUB_DATA to
be rejected with STATUS_INVALID_PARAMETER.
ksmbd validates create FileAttributes against FILE_ATTRIBUTE_MASK, which
includes those bits. It also rejects only requests that have no known
attribute bit at all, so a request containing both known and unknown bits
can pass validation.
Use a create-specific attribute mask that excludes INTEGRITY_STREAM and
NO_SCRUB_DATA, and reject any bit outside that mask.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.create.gentest checks each desired access bit independently and
expects an open that requests only SYNCHRONIZE with CreateDisposition
OPEN_IF and FileAttributes 0 to fail with STATUS_ACCESS_DENIED.
Rejecting all SYNCHRONIZE-only opens is too broad: SYNCHRONIZE does not
imply read, write, or delete data access, and
smb2.sharemode.sharemode-access expects a SYNCHRONIZE-only open to succeed
when it does not conflict with the existing share mode.
Limit the rejection to the gentest create shape: SYNCHRONIZE-only access,
OPEN_IF disposition, and no file attributes. Other synchronize-only opens
are handled by the normal permission and share-mode checks.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.streams.delete opens an alternate data stream without
FILE_SHARE_DELETE and then tries to delete the base file. Windows rejects
the base-file delete with STATUS_SHARING_VIOLATION while the stream handle
is open. ksmbd tracks stream opens on the same ksmbd_inode as the base
file, but the delete-on-close path only checked delete access on the base
handle before marking the inode delete-pending. As a result, deleting
the base file succeeded even though an open stream handle denied delete
sharing. Add a helper to detect open stream handles on the same inode that
do not allow FILE_SHARE_DELETE, and reject base-file delete pending and
DELETE opens with a sharing violation in that case.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2.compound_async.write_write and smb2.compound_async.read_read expect
the last I/O request in a compound request to become cancellable before
its final response is received. smb clients mark a request cancellable
after receiving an interim STATUS_PENDING response.
ksmbd handled the last READ/WRITE synchronously and returned the final
response directly, so the client never observed STATUS_PENDING and
req->cancel.can_cancel remained false.
For the last READ or WRITE in a compound request, register the work briefly
as async and send a STATUS_PENDING interim response before continuing with
the normal synchronous completion. The final READ/WRITE response remains
unchanged.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ksmbd_close_fd() marks an open file as FP_CLOSED and drops the file table
reference. If another in-flight request still holds a reference, the final
close is deferred until that request drops its reference.
The function currently returns -EINVAL in that deferred-final-close case
because fp is cleared when the reference count does not reach zero. That
turns a valid close into STATUS_FILE_CLOSED.
smb2.compound_find.compound_find_close sends QUERY_DIRECTORY and then
closes the same directory handle before receiving the find response.
The query holds a reference while it builds the response, so close must
mark the handle closed and return success even though final teardown is
delayed. Track whether the handle was successfully transitioned to
FP_CLOSED and return success when only the final close is deferred.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
set_smb2_rsp_status() resets the response iov and compound offsets before
building an error response. That is fine for a single request, but it
corrupts a compound response when an error is detected after an earlier
compound element has already been completed.
smb2.compound.invalid4 sends a READ as the first compound element and a
bogus command as the second one. The READ response must remain in
the compound response with STATUS_END_OF_FILE, followed by the bogus
command response with STATUS_INVALID_PARAMETER. Resetting the response
state for the second command breaks the compound framing and the client
reports NT_STATUS_INVALID_NETWORK_RESPONSE.
When setting an error for a chained command, update and pin only
the current compound response slot instead of resetting the whole response.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
FSCTL_CREATE_OR_GET_OBJECT_ID returned a dummy successful response without
checking whether the request handle was valid. That let an invalid related
compound handle succeed in smb2.compound.related5, although the client
expected STATUS_FILE_CLOSED.
Look up the file handle before building the object id response and fail
with STATUS_FILE_CLOSED when the handle is invalid or already closed.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In a related compound request, later commands can refer to the file handle
from an earlier command using the related FID value. If the earlier
command fails without producing a valid compound FID, the later related
commands must fail with the same status instead of operating on an invalid
or stale handle.
smb2.compound.related4 sends CREATE followed by IOCTL, CLOSE and SET_INFO.
The CREATE is expected to fail with STATUS_ACCESS_DENIED, and the remaining
related commands are expected to return STATUS_ACCESS_DENIED as well. ksmbd
only stored the compound FID on successful CREATE and did not remember
failed compound statuses.
Store the failed status in the work item and make related handle-based
requests fail immediately with that status only when the compound FID is
invalid. Also preserve and consume the related FID across successful
FLUSH, READ and WRITE requests whose responses do not carry a file id. Keep
a valid compound FID across non-close failures so later related commands
can continue to use the handle.
When extracting the FID from a successful READ, WRITE or FLUSH request, use
the request structure matching the SMB2 command: READ and WRITE place
PersistentFileId and VolatileFileId at a different offset than FLUSH, so a
single smb2_flush_req cast can save the wrong value as compound_fid and
make the following related request fail with STATUS_FILE_CLOSED
(smb2.compound_async.write_write after smb2.compound_async.flush_flush).
Only update the saved compound FID when the request carries a valid
volatile FID. otherwise an all-ones related FID would overwrite the CREATE
FID and break smb2.compound.related6.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Windows denies renaming a directory while a file below that directory is
still open. smb2.rename.rename_dir_openfile checks this by keeping a file
handle open under the directory and then attempting to rename the directory
handle. ksmbd did not check open children before calling vfs_rename(), so
the rename incorrectly succeeded.
For non-POSIX clients, scan the global open file table for active handles
whose dentries are below the directory being renamed. If any child is
open, fail the rename with -EACCES so the client receives
STATUS_ACCESS_DENIED.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When renaming a file, some existing opens on the parent directory must
block the rename with STATUS_SHARING_VIOLATION. This includes parent
directory handles opened with DELETE access and handles opened without
FILE_SHARE_DELETE.
ksmbd checked only the parent's desired access for FILE_DELETE. That
handled smb2.rename.share_delete_and_delete_access, but missed the case
where the parent directory was opened without delete access and without
delete sharing, so smb2.rename.no_share_delete_no_delete_access incorrectly
succeeded.
Attribute-only parent opens, however, must not block the rename.
smb2.rename.msword opens the parent directory with only SYNCHRONIZE and
FILE_READ_ATTRIBUTES, no share access, and then renames an already-open
child file. Windows allows this pattern.
Reject parent directory handles that request DELETE access, and reject
non-attribute-only parent opens that deny FILE_SHARE_DELETE, while allowing
attribute-only parent opens to coexist with child rename.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
inode ctime is updated when a file is renamed. ksmbd returned that
ctime directly as SMB2 ChangeTime for handle-based query information.
This makes ChangeTime change after a rename through an already-open
handle, while Windows keeps the handle's ChangeTime stable for this
case.
Store the SMB ChangeTime in struct ksmbd_file when the handle is opened
and use that value for create, close, and handle-based query information
responses. If a client explicitly sets FILE_BASIC_INFORMATION
ChangeTime, update the stored value as well.
This fixes smbtorture smb2.rename.simple_modtime, which expects
ChangeTime and LastWriteTime to remain unchanged after renaming an
already-open file.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The SMB2_CREATE_APP_INSTANCE_ID create context is used with durable v2
opens to identify another open from the same application instance. When
a new durable v2 open arrives with the same AppInstanceId as an existing
open, the server should close the previous open without sending an
oplock break notification.
ksmbd ignored this create context. A second durable v2 batch oplock open
with the same AppInstanceId therefore went through the normal competing
open path and sent an oplock break to the first opener. smbtorture
smb2.durable-v2-open.app-instance expects no oplock break and then
expects the old handle to be closed.
Parse and store AppInstanceId for durable v2 opens. Before creating the
new open, find an existing file with the same AppInstanceId and close it
through the normal close teardown path without issuing an oplock break.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
SMB2 create context DataLength describes only the create context data
payload. It does not include the create context header, name field, or
any local padding that exists in ksmbd's helper structures.
ksmbd validated durable reconnect contexts by comparing
DataOffset + DataLength against sizeof the whole helper structure. This
rejects a valid durable v2 reconnect context because the wire DH2C data
is 36 bytes while struct create_durable_handle_reconnect_v2 contains an
extra four byte pad.
Validate the durable context payload length against the corresponding
payload member instead. Also keep the reconnect context authoritative
when a later durable request context is present, matching the existing
durable v1 reconnect behavior.
This fixes smbtorture smb2.durable-v2-open.durable-v2-setinfo, where
the durable v2 reconnect after SET_INFO was rejected with
STATUS_INVALID_PARAMETER.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When a durable handle is preserved after a disconnect, its oplock state
can still block later opens. If another client opens the same file and
the preserved oplock or lease has to be broken, the old durable handle
must no longer be reconnectable after the break cannot be acknowledged.
ksmbd was treating a missing connection, or an oplock break timeout, as a
successful break only by downgrading the oplock state. The old durable
handle remained reconnectable, so a later durable reconnect for that
stale handle could succeed.
The open path can also see a detached durable handle before the break
notification helpers fully dispose of it. Invalidate such a preserved
durable handle directly when a competing open has to break its batch or
exclusive oplock, while leaving ordinary durable reconnects without a
competing open untouched.
If the old handle still reaches the reconnect path, reject it when the
same inode already has another active open. This matches the
smb2.durable-open.open2-lease/open2-oplock sequence where a later open
replaces the disconnected durable owner and the stale first handle must
not be reclaimed.
Also, reconnect lookup used only the persistent id. A new durable open
can get a persistent id that matches the stale reconnect request after
the old durable state is invalidated. Preserve the disconnected
handle's old volatile id and require durable reconnect contexts to match
it, so a stale reconnect cannot attach to a different durable open.
Windows allows the later open to proceed and rejects the old reconnect
with STATUS_OBJECT_NAME_NOT_FOUND. The smbtorture
smb2.durable-open.oplock test covers this case.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
A durable handle opened with FILE_DELETE_ON_CLOSE is preserved across a
disconnect so it can be reclaimed by a durable reconnect.
smb2.durable-open.delete_on_close2 disconnects such a handle and then
reconnects it, expecting the reconnect to succeed.
When the client does not reconnect but instead opens the same name with a
new delete-on-close create, the preserved handle keeps the file present
with delete-on-close set. ksmbd then rejects the new open with
STATUS_ACCESS_DENIED on the file_present + FILE_DELETE_ON_CLOSE +
OPEN_IF/OVERWRITE_IF path. smb2.durable-open.delete_on_close1 expects this
open to create a fresh, empty file instead, i.e. the disconnected handle's
delete-on-close must take effect first.
Add ksmbd_close_disconnected_durable_delete_on_close(), which closes
disconnected (conn == NULL) durable handles that keep a delete-on-close
file present. The final close promotes S_DEL_ON_CLS to S_DEL_PENDING and
unlinks the file, so a re-resolved path is absent and the new open creates
it fresh. Call it from smb2_open() before the delete-on-close conflict
check, only for the conflicting open shapes. A live (connected) handle
still keeps the file and blocks the open as before.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
smb2_find_context_vals() assumes that callers only search create
contexts when the SMB2 CREATE request contains a non-empty create context
area. That is not always true. a client can send RequestedOplockLevel set
to SMB2_OPLOCK_LEVEL_LEASE without a lease create context.
In that case parse_lease_state() searches for a lease context and
smb2_find_context_vals() starts parsing from offset 0 with length 0,
returning -EINVAL. This makes the open fail with STATUS_INVALID_PARAMETER.
The smbtorture smb2.lease.duplicate_open test hits this while creating
a second file without a lease request.
Return NULL when the request has no create context area so the missing
context is treated the same as any other absent create context. The open
then continues without granting a lease.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|