| Age | Commit message | Author |
|
This needs to test for nonzero retval.
Fixes: c54c7c685494 ("netfilter: nft_meta_bridge: add NFT_META_BRI_IIFPVID support")
Closes: https://sashiko.dev/#/patchset/20260618061631.21919-1-fw%40strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch replaces the timer API with a GC worker approach for
expectations, as has already happened in many other subsystems.
Use the existing conntrack GC worker to iterate over the local list of
expectations in the master conntrack to reap expired expectations.
Check IPS_HELPER_BIT to run GC for expectations, and set it for the nft_ct
expectation, which never sets it. Hold the expectation spinlock while
iterating over the master conntrack expectation list to synchronize with
nf_ct_remove_expectations(). This also performs runtime packet path
garbage collection through the expectation insertion and lookup
functions while walking over one of the chains of the global expectation
hashtables. Unconfirmed conntrack entries are skipped since ct->ext can
be reallocated, and dying entries are skipped since those will be gone soon.
Set IPS_HELPER_BIT when the helper ct extension is added, so the new
GC worker does not need to bump the ct refcount to check whether the
ct->ext helper is available.
This removes the extra refcount bump for expectation timers, which
allows removing several nf_ct_expect_put() calls after the unlink;
after this update the refcount remains at 1 while the expectation is on
the hashes.
This patch implicitly addresses a race with the existing timer API that
allows an expectation to access a stale exp->master pointer which has
already been released when expectation removal loses a race with an
expiring timer, i.e. timer_del() reporting false.
Add a new NF_CT_EXPECT_DEAD flag to reap this expectation via GC. This
is needed by nf_conntrack_unexpect_related(), which is called in error
paths to invalidate newly created expectations that have been added into
the hashes. These expectations cannot be immediately released, as GC or
nf_ct_remove_expectations() could race to release them. On expectation
insert, the runtime GC reaps stale expectations before checking the
expectation limit set by policy.
Set the current timestamp in nf_ct_expect_alloc(), then add the expectation
policy timeout (or a custom timeout, if one is specified) to define the
expectation lifetime.
Fixes: bffcaad9afdf ("netfilter: ctnetlink: ensure safe access to master conntrack")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
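A minimal userspace sketch of the timestamp-plus-GC idea described above,
assuming a singly linked list and a 32-bit tick counter. The structure,
function names, and EXP_DEAD flag are hypothetical stand-ins, not the
kernel's definitions, and the spinlock the message mentions is omitted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define EXP_DEAD 0x1u	/* hypothetical stand-in for NF_CT_EXPECT_DEAD */

/* Hypothetical, simplified expectation; not the kernel structure. */
struct exp_sketch {
	struct exp_sketch *next;
	uint32_t expires;	/* creation timestamp + timeout, in ticks */
	unsigned int flags;
};

/* Stamp the lifetime at allocation time and link the entry into the list. */
static struct exp_sketch *exp_add(struct exp_sketch **head, uint32_t now,
				  uint32_t timeout)
{
	struct exp_sketch *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->expires = now + timeout;
	e->next = *head;
	*head = e;
	return e;
}

/* An entry is reapable if flagged dead or if its lifetime has elapsed. */
static bool exp_reapable(const struct exp_sketch *e, uint32_t now)
{
	return (e->flags & EXP_DEAD) || (int32_t)(now - e->expires) >= 0;
}

/*
 * One GC pass: unlink and free every reapable entry, return the count.
 * The real code holds a spinlock around the walk; locking is omitted here.
 */
static unsigned int exp_gc(struct exp_sketch **head, uint32_t now)
{
	struct exp_sketch **pp = head;
	unsigned int reaped = 0;

	while (*pp) {
		struct exp_sketch *e = *pp;

		if (exp_reapable(e, now)) {
			*pp = e->next;
			free(e);
			reaped++;
		} else {
			pp = &e->next;
		}
	}
	return reaped;
}
```

Because no per-entry timer exists in this model, an entry flagged dead is
simply collected on the next pass instead of racing with a timer callback.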
|
|
Not a big deal, but this should have used the real IP header length and not
the base header size. As-is, if there are options, then the result of
nf_skb_is_icmp_unreach() will be random.
Fixes: db99b2f2b3e2 ("netfilter: nf_reject: don't reply to icmp error messages")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
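A minimal userspace sketch of the difference, assuming a raw IPv4 packet in
a byte buffer. The helper names are hypothetical and are not the kernel's
functions; the point is only that the ICMP header starts at IHL * 4 bytes,
not at the fixed 20-byte base size.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of an IPv4 header without options. */
#define IPV4_BASE_HDR_LEN 20u

/* Real header length: the IHL field counts 32-bit words. */
static size_t ipv4_hdr_len(const uint8_t *pkt)
{
	return (size_t)(pkt[0] & 0x0f) * 4;
}

/*
 * Return the ICMP type byte, or -1 if the packet is too short or malformed.
 * Hypothetical helper for illustration only.
 */
static int icmp_type_at_real_offset(const uint8_t *pkt, size_t len)
{
	size_t hlen;

	if (len < IPV4_BASE_HDR_LEN)
		return -1;
	hlen = ipv4_hdr_len(pkt);
	if (hlen < IPV4_BASE_HDR_LEN || len <= hlen)
		return -1;
	return pkt[hlen];
}
```

With a 24-byte header (IHL = 6), reading at offset 20 would return an option
byte instead of the ICMP type.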
|
|
An LLM points out that the skip causes an uninitialised stack array to
propagate down into dev_fill_forward_path(). It's not clear to me that
there is a guarantee that a later ctx.dev->netdev_ops->ndo_fill_forward_path()
would always fix this up.
Cc: Felix Fietkau <nbd@nbd.name>
Fixes: 45ca3e61999e ("netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Blamed commit added NFT_META_BRI_IIFHWADDR to the set validate callback,
yet this is a get operation.
Add a get validate callback and move the NFT_META_BRI_IIFHWADDR key
there.
AFAICS this is harmless: NFT_META_BRI_IIFHWADDR can deal with a NULL
input device and the set handler ignores an NFT_META_BRI_IIFHWADDR
operation, but it allows reading 4 bytes off the bridge skb->cb[].
Fixes: cbd2257dc96e ("netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Large offsets were rejected based on netlink policy, but blamed commit
removed the policy without updating nft_payload_inner_init() to use the
truncation-check helper.
Silent truncation is not a problem, but not wanted either, so add a
check.
Fixes: 077dc4a27579 ("netfilter: nft_payload: extend offset to 65535 bytes")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Sashiko noticed that when destroying a set,
cancel_delayed_work_sync() was called while gc
calls queue_delayed_work() unconditionally, which
can lead to the gc not being shut down properly.
Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Sashiko pointed out that kfree_rcu() was called before
rcu_assign_pointer() in handling the comment extension.
Fix the order so that rcu_assign_pointer() is called first.
Fixes: b57b2d1fa53f ("netfilter: ipset: Prepare the ipset core to use RCU at set level")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This is the counterpart of the patch "netfilter: ipset: Don't use test_bit()
in lockless RCU readers in hash types" for the bitmap types.
Fixes: 02a3231b6d82 ("netfilter: nf_conntrack_expect: store netns and zone in expectation")
Fixes: b0da3905bb1e ("netfilter: ipset: Bitmap types using the unified code base")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Sashiko pointed out that there are a few lockless RCU readers
using test_bit(), which is a relaxed atomic operation and
provides no memory barrier guarantees. Use test_bit_acquire()
instead where the operation may run in parallel with add/del/gc,
i.e. where it is not one of the following cases:
- protected by region lock
- in a set destroy phase
- in a new/temporary set creation phase
Fixes: 18f84d41d34f ("netfilter: ipset: Introduce RCU locking in hash:* types")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
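A userspace analogy of the ordering concern, written with C11 atomics rather
than the kernel bitops. The struct and function names are hypothetical; the
sketch only shows why the reader's bit test needs acquire semantics when it
is followed by a read of data published by a concurrent writer.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical slot: 'used' plays the role of the bit, 'value' of the entry. */
struct slot {
	atomic_ulong used;
	unsigned long value;
};

/* Writer: fill the entry first, then publish it with release ordering. */
static void slot_publish(struct slot *s, unsigned long value)
{
	s->value = value;
	atomic_fetch_or_explicit(&s->used, 1UL, memory_order_release);
}

/*
 * Lockless reader: the acquire load orders the later read of s->value
 * after the bit test. A relaxed load would not give that ordering.
 */
static bool slot_read(struct slot *s, unsigned long *out)
{
	if (!(atomic_load_explicit(&s->used, memory_order_acquire) & 1UL))
		return false;
	*out = s->value;
	return true;
}
```

Paths that already hold a lock, or that run while no other thread can reach
the set, do not need the stronger load, which matches the exceptions listed
in the message.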
|
|
Once the 32-bit seq wraps, a newer bm_seq can look smaller
than an older one, so convert to a wrap-safe comparison.
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Link: https://patch.msgid.link/20260618025735.915113-1-chencheng@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
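A minimal sketch of a wrap-safe comparison for 32-bit sequence numbers,
using the signed-difference idiom familiar from the kernel's time_after().
The function names are hypothetical, and whether the patch uses exactly this
form is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if sequence number a is newer than b, even across a 32-bit wrap.
 * Valid as long as the two values are less than 2^31 apart.
 */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Naive comparison, shown only for contrast: breaks once the counter wraps. */
static bool seq_after_naive(uint32_t a, uint32_t b)
{
	return a > b;
}
```

For example, with b = 0xfffffffe and a = 2 (four increments later), the
naive form reports that a is older, while the signed difference is 4 and
correctly reports that a is newer.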
|
|
KCSAN reports a data race between raid1_end_read_request() and
raid1_read_request().
The completion path updates conf->mirrors[disk].head_position in
update_head_pos() without a lock, while the read-balance heuristic reads
the same field locklessly in is_sequential() and choose_best_rdev().
KCSAN report:
=========================
BUG: KCSAN: data-race in raid1_end_read_request / raid1_read_request
write to 0xffff8f0306ba7868 of 8 bytes by interrupt on cpu 9:
raid1_end_read_request+0xb5/0x440
bio_endio+0x3c9/0x3e0
blk_update_request+0x257/0x770
scsi_end_request+0x4d/0x520
scsi_io_completion+0x6f/0x990
scsi_finish_command+0x188/0x280
scsi_complete+0xac/0x160
blk_complete_reqs+0x8e/0xb0
blk_done_softirq+0x1d/0x30
[...]
read to 0xffff8f0306ba7868 of 8 bytes by task 667002 on cpu 11:
raid1_read_request+0x497/0x1a10
raid1_make_request+0xdf/0x1950
md_handle_request+0x2c5/0x700
md_submit_bio+0x126/0x320
__submit_bio+0x2ec/0x3a0
submit_bio_noacct_nocheck+0x572/0x890
[...]
value changed: 0x0000000000000078 -> 0x00000000005fe448
Signed-off-by: Chen Cheng <chencheng@fnnas.com>
Link: https://patch.msgid.link/20260619044114.1208456-1-chencheng@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
When a read is retried, raid1_read_request() may be called with a
pre-allocated r1_bio. If wait_read_barrier() fails for a REQ_NOWAIT
read, the bio is completed and the function returns immediately. In this
case the existing r1_bio is leaked.
This fixes a leak of pre-allocated r1_bio structures for retried reads.
Fixes: 5aa705039c4f ("md: raid1 add nowait support")
Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260611083514.754922-1-abd.masalkhi@gmail.com?part=1
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260611101350.759154-1-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
raid1 supports REQ_NOWAIT reads by avoiding waits in the barrier path
through wait_read_barrier(). However, a read can still block on a
WriteMostly device when the array uses a bitmap and there are
outstanding behind writes.
In that case raid1 unconditionally calls wait_behind_writes(), which
may sleep until all behind writes complete. As a result, a REQ_NOWAIT
read can block despite the caller explicitly requesting non-blocking
behavior.
This ensures that raid1 consistently honors REQ_NOWAIT reads across all
paths that may otherwise wait for behind writes.
Fixes: 5aa705039c4f ("md: raid1 add nowait support")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260611083514.754922-1-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
llbitmap discard is useful even when no underlying member device supports
it. The discard still converts the llbitmap range to unwritten, so later
reads and recovery do not rely on stale parity for that range.
Let llbitmap discard bypass the raid5 lower discard support check. If lower
discard is not safe or not supported, complete the accounted clone after
md_account_bio() so the llbitmap conversion callbacks run without member
discard bios.
Link: https://patch.msgid.link/20260605072639.2434847-4-yukuai@kernel.org
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
Raid5 used to disable discard limits when devices_handle_discard_safely
was not set or when stacked member limits could not support a full-stripe
discard. That hides discard from userspace before raid5 can decide whether
a request can be handled safely.
Follow other virtual drivers and advertise a UINT_MAX discard limit for the
md device. Cache lower discard support in r5conf when setting queue limits,
and reject unsupported discard bios before queuing stripe work.
Link: https://patch.msgid.link/20260605072639.2434847-3-yukuai@kernel.org
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
Raid5 handles discard bios internally through make_discard_request() and
never passes them through md_account_bio(). As a result, discard IO is
missing the md-device iostat accounting that normal raid5 IO and discard
IO in other raid levels get from md_account_bio().
Before accounting the bio, trim the request to the full data stripes that
raid5 will actually discard. The first full stripe is the ceiling of the
bio start divided by data-stripe sectors, and the last full stripe is the
floor of the bio end divided by data-stripe sectors. Account that exact
MD logical full-stripe range, then restore the original iterator so bio
completion and iostat still cover the original request.
Link: https://patch.msgid.link/20260605072639.2434847-2-yukuai@kernel.org
Signed-off-by: Yu Kuai <yukuai@fygo.io>
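The ceiling/floor trimming described above can be sketched as follows. The
type and function names are hypothetical, and treating the result as a
half-open range [first, last) of full data stripes is an assumption of this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: full data stripes [first, last) covered by a bio. */
struct stripe_range {
	uint64_t first;
	uint64_t last;
};

/*
 * start/end give the bio's sector range [start, end); stripe_sectors is the
 * number of data sectors in one full stripe. Returns false if the request
 * does not cover any full stripe.
 */
static bool full_stripe_range(uint64_t start, uint64_t end,
			      uint64_t stripe_sectors, struct stripe_range *r)
{
	if (stripe_sectors == 0 || end < start)
		return false;
	/* Ceiling of start / stripe_sectors, written to avoid overflow. */
	r->first = start / stripe_sectors + (start % stripe_sectors != 0);
	/* Floor of end / stripe_sectors. */
	r->last = end / stripe_sectors;
	return r->first < r->last;
}
```

With 256 data sectors per stripe, a discard of sectors [100, 1000) trims to
stripes [1, 3), i.e. sectors [256, 768); the partial head and tail are left
alone.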
|
|
raid1_write_request() increments rdev->nr_pending before checking the
badblocks and then immediately decrements it again when a device is
skipped. Move the increment until after the checks succeed so the
reference accounting is easier to follow.
Consolidate the failure paths so that each error label releases exactly
the resources acquired up to that point. err_dec_pending drops pending
references and frees the r1bio, while err_allow_barrier handles the
barrier release before returning.
When a REQ_ATOMIC write cannot be satisfied due to a badblock range,
complete the bio with BLK_STS_NOTSUPP rather than reporting an I/O
error, since the operation is unsupported rather than having failed
during I/O.
Rename max_write_sectors to max_sectors and remove the redundant local
copy.
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-5-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
raid10_make_request() acquires a writes_pending reference with
md_write_start() before calling raid10_handle_discard(). Several failure
paths in raid10_handle_discard() complete the bio and return without
releasing the corresponding reference, causing md_write_end() to be
skipped.
Call md_write_end() before returning from these failure paths to keep
writes_pending accounting balanced.
Additionally, discard split allocation failures can occur after
wait_barrier() succeeds. Those paths return without calling
allow_barrier(), leaking the associated barrier reference.
Release the barrier before returning from those paths.
Fixes: c9aa889b035f ("md: raid10 add nowait support")
Fixes: 4cf58d952909 ("md/raid10: Handle bio_split() errors")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-4-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
raid10_make_request() acquires a writes_pending reference with
md_write_start() before dispatching write requests. Several failure
paths in raid10_write_request() complete the bio and return without
reaching the normal write completion path, causing the corresponding
md_write_end() to be skipped.
Make raid10_write_request() return a status indicating whether the write
request was successfully queued. This allows raid10_make_request() to
release the writes_pending reference with md_write_end() when a write
request fails.
Fixes: 4cf58d952909 ("md/raid10: Handle bio_split() errors")
Fixes: c9aa889b035f ("md: raid10 add nowait support")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-3-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
|
|
raid1_make_request() acquires a writes_pending reference with
md_write_start() before calling raid1_write_request(). Several failure
paths in raid1_write_request() complete the bio and return without
reaching the normal write completion path, causing the corresponding
md_write_end() to be skipped.
Make raid1_write_request() return a status indicating whether the write
request was successfully queued. This allows raid1_make_request() to
call md_write_end() when raid1_write_request() fails.
Additionally, if wait_blocked_rdev() fails after wait_barrier()
succeeds, the associated barrier reference is not released.
Call allow_barrier() before returning from that path to keep the barrier
accounting balanced.
Fixes: b1a7ad8b5c4f ("md/raid1: Handle bio_split() errors")
Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
Fixes: 5aa705039c4f ("md: raid1 add nowait support")
Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260611083514.754922-1-abd.masalkhi@gmail.com?part=1
Closes: https://sashiko.dev/#/patchset/20260611132500.763528-1-abd.masalkhi@gmail.com?part=1
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
Link: https://patch.msgid.link/20260613182810.1317258-2-abd.masalkhi@gmail.com
Signed-off-by: Yu Kuai <yukuai@fygo.io>
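A simplified model of the accounting rule, assuming a plain counter in place
of the md writes_pending reference. All names here are hypothetical
stand-ins; the sketch only shows the caller dropping the reference itself
when the write path reports that nothing was queued.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the md writes_pending accounting. */
struct mddev_sketch {
	int writes_pending;
};

static void write_start(struct mddev_sketch *m)
{
	m->writes_pending++;
}

static void write_end(struct mddev_sketch *m)
{
	m->writes_pending--;
}

/*
 * Stand-in for the write path: returns true if the request was queued
 * (the completion path will later call write_end()), false if it failed
 * before queuing and the caller still owns the reference.
 */
static bool write_request(bool can_queue)
{
	return can_queue;
}

/* Caller takes the reference and drops it itself when queuing fails. */
static bool make_request(struct mddev_sketch *m, bool can_queue)
{
	write_start(m);
	if (!write_request(can_queue)) {
		write_end(m);
		return false;
	}
	return true;
}
```

Returning a status keeps the release in one place in the caller, rather than
repeating it in every failure path of the write function.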
|
|
The correct path of the "hotmod" module parameter should be
/sys/module/ipmi_si/parameters/hotmod. Fix it.
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Message-ID: <20260620122747.7902-1-zenghui.yu@linux.dev>
Signed-off-by: Corey Minyard <corey@minyard.net>
|
|
Refactor the control flow in id_mode_to_cifs_acl() to reduce nesting and
prevent error code overwriting.
Instead of wrapping the call to ops->set_acl() in a conditional block,
introduce early exits (goto id_mode_to_cifs_acl_exit) when build_sec_desc()
fails or ops->set_acl is NULL. This ensures that any actual error returned
by build_sec_desc() is not overwritten with -EOPNOTSUPP.
Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
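A minimal sketch of the early-exit shape, assuming a hypothetical operations
table and an integer standing in for the result of the descriptor-building
step. None of these names are the real CIFS ones.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operations table; only set_acl matters for this sketch. */
struct ops_sketch {
	int (*set_acl)(void);
};

/*
 * build_rc stands in for the return value of the descriptor-building step.
 * Early exits keep a real build error from being replaced by -EOPNOTSUPP.
 */
static int apply_acl(const struct ops_sketch *ops, int build_rc)
{
	int rc = build_rc;

	if (rc)
		goto out;		/* preserve the real error */
	if (!ops->set_acl) {
		rc = -EOPNOTSUPP;	/* only when nothing failed earlier */
		goto out;
	}
	rc = ops->set_acl();
out:
	/* cleanup shared by all paths would go here */
	return rc;
}

/* Trivial callback used to exercise the success path. */
static int set_acl_ok(void)
{
	return 0;
}
```

With a NULL set_acl and a build failure of -ENOMEM, the caller sees -ENOMEM
rather than -EOPNOTSUPP.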
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply drivers:
- New EC driver providing battery info for Microsoft Surface RT
- New driver for battery charger in Samsung S2M PMICs
- Rework max17042 driver
- sysfs control for bd71828 auto input current limitation
All over:
- Use named fields for struct platform_device_id and of_device_id
entries
- Misc small cleanups and fixes"
* tag 'for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (33 commits)
Documentation: ABI: sysfs-class-reboot-mode-reboot_modes: fix doc warnings
power: supply: charger-manager: fix refcount leak in is_full_charged()
power: supply: core: fix supplied_from allocations
power: supply: max17042_battery: Use modern PM ops to clear up warning
power: supply: add support for Samsung S2M series PMIC charger device
power: supply: Add support for Surface RT battery and charger
dt-bindings: embedded-controller: Document Surface RT EC
power: supply: bd71828: sysfs for auto input current limitation
power: supply: cpcap-charger: include missing <linux/property.h>
power: supply: cros_charge-control: Move MODULE_DEVICE_TABLE next to the table itself
power: supply: ab8500_fg: Fix typos in comments
power: supply: Use named initializers for arrays of i2c_device_data
power: supply: Remove unused jz4740-battery.h
power: reset: st-poweroff: Use of_device_get_match_data()
power: supply: bq257xx: Add fields for 'charging' and 'overvoltage' states
power: supply: bq257xx: Consistently use indirect get/set helpers
power: supply: bq257xx: Make the default current limit a per-chip attribute
power: supply: bq257xx: Fix VSYSMIN clamping logic
power: supply: cpcap-battery: Fix missing nvmem_device_put() causing reference leak
power: supply: max17042: fix OF node reference imbalance
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull strncpy removal from Kees Cook:
- Remove the per-arch strncpy implementations in alpha, m68k, powerpc,
x86, and xtensa
- Remove strncpy API
Over the last 6 years working on strncpy removal there were 362
commits by 70 contributors. Folks with more than 1 commit were:
211 Justin Stitt <justinstitt@google.com>
22 Xu Panda <xu.panda@zte.com.cn>
21 Kees Cook <kees@kernel.org>
17 Thorsten Blum <thorsten.blum@linux.dev>
12 Arnd Bergmann <arnd@arndb.de>
4 Pranav Tyagi <pranav.tyagi03@gmail.com>
4 Lee Jones <lee@kernel.org>
2 Steven Rostedt <rostedt@goodmis.org>
2 Sam Ravnborg <sam@ravnborg.org>
2 Marcelo Moreira <marcelomoreira1905@gmail.com>
2 Krzysztof Kozlowski <krzk@kernel.org>
2 Kalle Valo <kvalo@kernel.org>
2 Jaroslav Kysela <perex@perex.cz>
2 Daniel Thompson <danielt@kernel.org>
2 Andrew Lunn <andrew@lunn.ch>
* tag 'strncpy-removal-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
string: Remove strncpy() from the kernel
xtensa: Remove arch-specific strncpy() implementation
x86: Remove arch-specific strncpy() implementation
powerpc: Remove arch-specific strncpy() implementation
m68k: Remove arch-specific strncpy() implementation
alpha: Remove arch-specific strncpy() implementation
|
|
The LLSEC ADD/DEL doit handlers under the legacy IEEE802154_NL family
consume IEEE802154_ATTR_LLSEC_KEY_BYTES and
IEEE802154_ATTR_LLSEC_KEY_USAGE_COMMANDS, both declared in
net/ieee802154/nl_policy.c as bare length entries with no .type
(defaulting to NLA_UNSPEC). Generic netlink strict validation rejects
all NLA_UNSPEC attributes via validate_nla(), so every LLSEC_ADD_KEY,
LLSEC_DEL_KEY, LLSEC_ADD_DEV, LLSEC_DEL_DEV, LLSEC_ADD_DEVKEY,
LLSEC_DEL_DEVKEY, LLSEC_ADD_SECLEVEL, and LLSEC_DEL_SECLEVEL request
fails at the dispatcher with "Unsupported attribute" before reaching
the handler.
The doit path has been silently dead since strict validation became
the default for genl families that do not opt out. The dump path is
unaffected because dump requests carry no LLSEC attributes to
validate, which is why the LLSEC_LIST_KEY read remained reachable
(patch 1/2). Introduce IEEE802154_OP_RELAXED() mirroring
IEEE802154_OP() but with .validate = GENL_DONT_VALIDATE_STRICT, and
use it for the eight legacy LLSEC mutate ops so admin-driven LLSEC
configuration via the legacy interface works again.
Fixes: 3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://lore.kernel.org/20260520141640.1149513-3-michael.bommarito@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
In net/ieee802154/netlink.c, the legacy IEEE802154_NL family ops table
builds the LLSEC dump entries (LLSEC_LIST_KEY, LLSEC_LIST_DEV,
LLSEC_LIST_DEVKEY, LLSEC_LIST_SECLEVEL) with IEEE802154_DUMP() which
sets no .flags, so generic netlink runs them ungated. The modern
nl802154 family admin-gates the equivalent reads via
NL802154_CMD_GET_SEC_KEY and friends with .flags = GENL_ADMIN_PERM.
Any local uid that can open AF_NETLINK / NETLINK_GENERIC can resolve
the "802.15.4 MAC" family and dump LLSEC_LIST_KEY on any wpan netdev
that has an LLSEC key installed; the dump handler writes the raw
16-byte AES-128 key bytes (IEEE802154_ATTR_LLSEC_KEY_BYTES, copied
verbatim from struct ieee802154_llsec_key.key) into the reply.
Recovering the AES key compromises 802.15.4 LLSEC link confidentiality
and authenticity, since LLSEC uses CCM* and the same key authenticates
and encrypts frames.
Impact: any local uid with no capabilities can read the raw 16-byte
AES-128 LLSEC key from the kernel keytable on any wpan netdev that has
an administrator-installed LLSEC key, by issuing an LLSEC_LIST_KEY
dump on the legacy IEEE802154_NL generic-netlink family.
Introduce IEEE802154_DUMP_PRIV() mirroring IEEE802154_DUMP() but
setting .flags = GENL_ADMIN_PERM, and use it for the four LLSEC dump
entries. LIST_PHY and LIST_IFACE retain IEEE802154_DUMP() because the
modern nl802154 family exposes their equivalents to unprivileged
readers by design (NL802154_CMD_GET_WPAN_PHY and
NL802154_CMD_GET_INTERFACE carry "can be retrieved by unprivileged
users" annotations).
Fixes: 3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Link: https://lore.kernel.org/20260520141640.1149513-2-michael.bommarito@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
When assoc_status is not equal to IEEE802154_ASSOCIATION_SUCCESSFUL, the
return value is set to either "-ERANGE" or "-EPERM", but this return
value is overwritten with 0 after exiting the conditional scope.
So, jump to the clear_assoc label to preserve the return value when
assoc_status is not equal to IEEE802154_ASSOCIATION_SUCCESSFUL.
This was reported by Coverity Scan as "Unused value".
Fixes: fefd19807fe9 ("mac802154: Handle associating")
Signed-off-by: Robertus Diawan Chris <robertusdchris@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20260602054133.470293-1-robertusdchris@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
KMSAN reported a kernel-infoleak in move_addr_to_user():
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user
include/linux/instrumented.h:131 [inline]
BUG: KMSAN: kernel-infoleak in _inline_copy_to_user
include/linux/uaccess.h:205 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0xcc/0x120
lib/usercopy.c:26
instrument_copy_to_user include/linux/instrumented.h:131 [inline]
_inline_copy_to_user include/linux/uaccess.h:205 [inline]
_copy_to_user+0xcc/0x120 lib/usercopy.c:26
copy_to_user include/linux/uaccess.h:236 [inline]
move_addr_to_user+0x2e7/0x440 net/socket.c:302
____sys_recvmsg+0x232/0x610 net/socket.c:2925
...
Uninit was stored to memory at:
ieee802154_addr_to_sa include/net/ieee802154_netdev.h:369 [inline]
dgram_recvmsg+0xa09/0xbe0 net/ieee802154/socket.c:739
The issue occurs because the `pan_id` field of `struct ieee802154_addr`
is left uninitialized when the address mode is `IEEE802154_ADDR_NONE`.
The execution flow is as follows:
1. `__ieee802154_rx_handle_packet()` declares a local `struct
ieee802154_hdr hdr` on the stack.
2. `ieee802154_hdr_pull()` calls `ieee802154_hdr_get_addr()` to parse
the source and destination addresses into this structure.
3. If the address mode is `IEEE802154_ADDR_NONE`,
`ieee802154_hdr_get_addr()` previously only set the `mode` field,
leaving the `pan_id` field containing uninitialized stack memory.
4. This uninitialized `pan_id` is later copied into a `struct
sockaddr_ieee802154` in `dgram_recvmsg()` via `ieee802154_addr_to_sa()`.
5. Finally, `move_addr_to_user()` copies the socket address structure to
user space, leaking the uninitialized bytes.
Fix this by using `memset` to zero out the address structure in
`ieee802154_hdr_get_addr()` when the mode is `IEEE802154_ADDR_NONE`.
Fixes: 94b4f6c21cf5 ("ieee802154: add header structs with endiannes and operations")
Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot
Reported-by: syzbot+346474e3bf0b26bd3090@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=346474e3bf0b26bd3090
Link: https://syzkaller.appspot.com/ai_job?id=a507a109-d683-4a2c-bc03-93394f491b17
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/62795fd9-fc0c-48eb-bb82-05ffc5a57104@mail.kernel.org
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
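A simplified sketch of the fix pattern, assuming a reduced address structure
with only three fields. The types and the parse function are hypothetical
stand-ins, not the kernel's struct ieee802154_addr or
ieee802154_hdr_get_addr().

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; not the kernel's definitions. */
enum addr_mode { ADDR_NONE = 0, ADDR_SHORT = 2 };

struct addr_sketch {
	uint8_t mode;
	uint16_t pan_id;
	uint16_t short_addr;
};

/*
 * Parse an address into *addr. For ADDR_NONE the whole structure is zeroed
 * so that no stale stack bytes (including padding) survive in fields the
 * parser does not otherwise write.
 */
static void parse_addr(struct addr_sketch *addr, enum addr_mode mode,
		       uint16_t pan_id, uint16_t short_addr)
{
	if (mode == ADDR_NONE) {
		memset(addr, 0, sizeof(*addr));
		addr->mode = ADDR_NONE;
		return;
	}
	addr->mode = (uint8_t)mode;
	addr->pan_id = pan_id;
	addr->short_addr = short_addr;
}
```

Zeroing the whole structure, rather than assigning pan_id alone, also covers
any padding bytes that a later copy to user space would otherwise expose.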
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Convert exfat buffered and direct I/O to the iomap infrastructure
- Add the supporting block mapping changes needed for that conversion,
including multi-cluster allocation and byte-based cluster mapping
helpers
- Support SEEK_HOLE/SEEK_DATA and swapfile activation through iomap
- Fix damaged upcase-table handling so a zero-sized table does not lead
to an infinite loop
- Fix a potential use-after-free in exfat_find_dir_entry()
- Bound filename-entry advancement in exfat_find_dir_entry()
- Preserve benign secondary entries during rename and move
- Serialize truncate against in-flight direct I/O
- Simplify exfat_lookup()
- Replace unsafe arithmetic macros with static inline helpers
* tag 'exfat-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: bound uniname advance in exfat_find_dir_entry()
exfat: add swap_activate support
exfat: preserve benign secondary entries during rename and move
exfat: serialize truncate against in-flight DIO
exfat: add support for SEEK_HOLE and SEEK_DATA in llseek
exfat: add iomap direct I/O support
exfat: add iomap buffered I/O support
exfat: fix implicit declaration of brelse()
exfat: add data_start_bytes and exfat_cluster_to_phys_bytes() helper
exfat: add support for multi-cluster allocation
exfat: add exfat_file_open()
exfat: add balloc parameter to exfat_map_cluster() for iomap support
exfat: replace unsafe macros with static inline functions
exfat: simplify exfat_lookup()
exfat: fix potential use-after-free in exfat_find_dir_entry()
exfat: fix handling of damaged volume in exfat_create_upcase_table()
|
|
llsec_do_encrypt_unauth(), llsec_do_encrypt_auth(),
llsec_do_decrypt_unauth(), and llsec_do_decrypt_auth() all perform
in-place cryptographic transformations on skb data. They build a
scatterlist with sg_init_one() pointing into the skb's linear data area
and then pass the same scatterlist as both src and dst to the crypto API
(e.g. crypto_skcipher_encrypt/decrypt, crypto_aead_encrypt/decrypt).
On the RX path, __ieee802154_rx_handle_packet() clones the received skb
before handing it to each subscriber via ieee802154_subif_frame(). The
cloned skb shares the same underlying data buffer via reference
counting. When llsec_do_decrypt() subsequently modifies this shared
buffer in place, it corrupts data that other clones -- potentially
belonging to other sockets or subsystems -- still reference.
On the TX path, similar data sharing can occur when an skb's head has
been cloned (skb_cloned() returns true).
The fix is to call skb_cow_data() before performing any in-place crypto
operation. skb_cow_data() ensures that the skb's data area is not
shared: if the skb head is cloned or the data spans multiple fragments,
it copies the data into a private buffer that can be safely modified in
place. This is the same pattern used by:
- ESP (net/ipv4/esp4.c, net/ipv6/esp6.c)
- MACsec (drivers/net/macsec.c)
- WireGuard (drivers/net/wireguard/receive.c)
- TIPC (net/tipc/crypto.c)
Without this guard, in-place crypto on shared skb data leads to:
- Silent data corruption of other skb clones
- Use-after-free when the crypto API scatterwalk writes through a
page that has already been freed by another clone's kfree_skb()
- Kernel crashes under concurrent 802.15.4 traffic with security
enabled (KASAN/KMSAN reports slab-use-after-free)
Found by 0sec (https://0sec.ai) using automated source analysis.
Fixes: 4c14a2fb5d14 ("mac802154: add llsec decryption method")
Fixes: 03556e4d0dbb ("mac802154: add llsec encryption method")
Cc: stable@vger.kernel.org
Reported-by: Doruk Tan Ozturk <doruk@0sec.ai>
Closes: https://lore.kernel.org/linux-wpan/20260525161806.96158-1-doruk@0sec.ai/
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Link: https://lore.kernel.org/20260526183726.56100-1-doruk@0sec.ai
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
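A userspace model of the copy-before-modify rule, assuming a hand-rolled
refcounted buffer in place of skb data. The types and functions are
hypothetical, the XOR is a toy transform rather than real crypto, and only
the shared-buffer case is modelled (not the multi-fragment case that
skb_cow_data() also handles).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted buffer standing in for shared skb data. */
struct shared_buf {
	int refs;
	size_t len;
	unsigned char *data;
};

/* Hypothetical packet handle; several handles may point at one buffer. */
struct pkt {
	struct shared_buf *buf;
};

/*
 * Give p a private copy of its data if the buffer is shared.
 * Returns 0 on success, -1 on allocation failure.
 */
static int pkt_make_private(struct pkt *p)
{
	struct shared_buf *old = p->buf, *copy;

	if (old->refs == 1)
		return 0;	/* already exclusive, safe to modify in place */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;
	copy->data = malloc(old->len);
	if (!copy->data) {
		free(copy);
		return -1;
	}
	memcpy(copy->data, old->data, old->len);
	copy->len = old->len;
	copy->refs = 1;
	old->refs--;
	p->buf = copy;
	return 0;
}

/* In-place transform (a toy XOR), done only after the data is private. */
static int pkt_transform_in_place(struct pkt *p, unsigned char key)
{
	size_t i;

	if (pkt_make_private(p))
		return -1;
	for (i = 0; i < p->buf->len; i++)
		p->buf->data[i] ^= key;
	return 0;
}
```

After the transform, the other handle still sees the original bytes, which
is the property the in-place crypto paths were missing.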
|
|
ca8210_test_int_driver_write() and ca8210_test_int_user_read() exchange
a kmalloc'd buffer pointer through a struct kfifo, but pass a literal
'4' as the byte count to kfifo_in()/kfifo_out().
This is correct on 32-bit (pointer = 4 bytes), but on 64-bit only the
low 4 bytes of the 8-byte pointer are written into the FIFO. The reader
then reads back 4 bytes into an 8-byte local pointer variable, leaving
the upper 4 bytes uninitialized stack data. The first dereference of
the reconstructed pointer (fifo_buffer[1]) accesses an arbitrary kernel
address and generally results in an oops.
Use sizeof(fifo_buffer) so the byte count matches pointer width on every
architecture.
The driver has no architecture restriction in Kconfig, so any 64-bit
build with CONFIG_IEEE802154_CA8210_DEBUGFS=y is exposed. The issue has
been latent since the driver was added in 2017 because it is most
commonly deployed on 32-bit MCUs.
Found via a custom Coccinelle semantic patch hunting for short-byte
kfifo I/O on byte-mode kfifos used to shuttle pointers.
Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/20260520105750.30144-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
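A userspace sketch of passing a pointer through a byte-oriented FIFO,
assuming a one-slot buffer in place of struct kfifo. The fifo_in() and
fifo_out() helpers are hypothetical and only mimic the byte-count semantics
of kfifo_in()/kfifo_out().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte FIFO slot, big enough for one pointer. */
struct byte_fifo {
	unsigned char bytes[sizeof(void *)];
	size_t used;
};

/* Copy n bytes in; returns the number of bytes stored. */
static size_t fifo_in(struct byte_fifo *f, const void *src, size_t n)
{
	if (n > sizeof(f->bytes))
		n = sizeof(f->bytes);
	memcpy(f->bytes, src, n);
	f->used = n;
	return n;
}

/* Copy up to n bytes out; returns the number of bytes copied. */
static size_t fifo_out(struct byte_fifo *f, void *dst, size_t n)
{
	if (n > f->used)
		n = f->used;
	memcpy(dst, f->bytes, n);
	f->used = 0;
	return n;
}
```

Calling both helpers with sizeof(ptr) round-trips the pointer on any
architecture; a literal 4 would move only half of an 8-byte pointer and
leave the rest of the destination variable untouched.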
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs
Pull ntfs updates from Namjae Jeon:
- Harden handling of malformed on-disk metadata.
This adds stricter validation for attributes, attribute lists, index
roots and entries, EA entries, mapping pairs, and $LogFile restart
areas. These changes fix several out-of-bounds access, integer
overflow, and inconsistent metadata handling issues.
- Prevent a writeback deadlock involving extent MFT records
- Fix resource leaks in fill_super() failure paths and the name cache
- Serialize volume label access and improve its error handling
- Fix mapping-pairs decoding bounds and LCN overflow checks
- Keep resident index root metadata consistent during resize
- Fix the reported size of symbolic links
- Avoid an unnecessary allocation for resident inline data
- Add support for following and creating Windows native symbolic links.
Relative links, absolute links, and junctions are handled, with new
mount options controlling native symlink creation and absolute target
translation. The existing WSL symlink behavior remains the default.
- The unsupported quota code is removed, along with several smaller
cleanups
* tag 'ntfs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs: (39 commits)
docs/fs/ntfs: add mount options to support Windows native symbolic links
ntfs: support creating Windows native symlinks
ntfs: clean up target name conversion for WSL symlinks
ntfs: add native_symlink mount option
ntfs: support following Windows native symlink with absolute paths
ntfs: support following Windows native symlink with relative paths
ntfs: fix incorrect size of symbolic link
ntfs: use direct pointer for inline data to avoid redundant allocation
ntfs: validate resident index root values on lookup
ntfs: update index root allocated size before shrink
ntfs: grow index root value before reparent header update
ntfs: reject non-resident records for resident-only attributes
ntfs: fix u16 truncation of restart-area length check
ntfs: bound the attribute-list entry in ntfs_read_inode_mount()
ntfs: bound the look-ahead attribute-list entry in ntfs_external_attr_find()
ntfs: validate resident attribute lists and harden the validator
ntfs: validate resident volume name values on lookup
ntfs: reinit search context before volume information lookup
ntfs: do not replace volume name after lookup errors
ntfs: validate attribute values on lookup
...
|
|
ca8210_spi_transfer() allocates cas_ctl with kzalloc_obj(GFP_ATOMIC)
and relies entirely on the SPI completion callback
ca8210_spi_transfer_complete() to free it.
The spi_async() API only invokes the completion callback on successful
submission. On failure it returns a negative error code without ever
queuing the callback, which leaves cas_ctl and its embedded spi_message
and spi_transfer orphaned. Every kfree(cas_ctl) in the driver is
inside the completion callback, so there is no other reclamation path.
ca8210_spi_transfer() is called from ca8210_spi_exchange(), the
interrupt handler ca8210_interrupt_handler(), and from the retry path
inside the completion callback itself. The exchange and interrupt
handler paths loop on -EBUSY, so under sustained SPI bus contention
every retry iteration leaks a fresh cas_ctl (~600 bytes per
occurrence).
Fix it by freeing cas_ctl on the spi_async() error path. While here,
correct the misleading error string: the function calls spi_async(),
not spi_sync().
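The ownership rule behind the fix can be sketched in user space as follows. This is a simplified model, not the driver's code: demo_ctl, demo_submit_async(), demo_complete() and the live_allocs counter are hypothetical stand-ins, and the completion is modelled as firing immediately on success.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_ctl {
	char payload[64];
};

static int live_allocs;	/* contexts allocated and not yet freed */

/*
 * Stand-in for an asynchronous submit. On success the completion callback
 * owns ctl and frees it later; on failure the callback never runs.
 */
static int demo_submit_async(struct demo_ctl *ctl, int fail)
{
	(void)ctl;
	return fail ? -EBUSY : 0;
}

/* Completion callback: the only place that frees ctl on the success path. */
static void demo_complete(struct demo_ctl *ctl)
{
	free(ctl);
	live_allocs--;
}

static int demo_transfer(int fail)
{
	struct demo_ctl *ctl = calloc(1, sizeof(*ctl));
	int ret;

	if (!ctl)
		return -ENOMEM;
	live_allocs++;

	ret = demo_submit_async(ctl, fail);
	if (ret) {
		/* Submission failed, so no callback will reclaim ctl. */
		free(ctl);
		live_allocs--;
		return ret;
	}

	demo_complete(ctl);	/* model the completion firing */
	return 0;
}
```

Without the free in the error branch, every failed submission in a retry loop would leave one more context allocated.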
Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20260421073259.2259783-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
mm_cid.active is set, the CID is checked with cid_in_transit() before
setting the transition bit. In per-CPU mode a newly forked or exec'd
task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
assigned lazily on schedule-in. With cid_in_transit() the guard passes
for MM_CID_UNSET (no transit bit), converts it to MM_CID_UNSET |
MM_CID_TRANSIT and stores it back; later mm_cid_schedout() feeds this
to clear_bit() with MM_CID_UNSET as the bit number, triggering an
out-of-bounds write.
Symptoms: this is genuine memory corruption, but a bounded out-of-bounds
write, not an arbitrary one. MM_CID_UNSET is the fixed sentinel BIT(31),
so once the bad value reaches mm_cid_schedout() the cid_from_transit_cid()
strip leaves MM_CID_UNSET, which fails the "cid < max_cids" convergence
test and falls into mm_drop_cid() -> clear_bit(MM_CID_UNSET,
mm_cidmask(mm)). The cid bitmap is embedded in the mm_struct slab object
(after cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus()
bits wide, so clearing bit 31 is a deterministic OOB bit-clear at a
fixed offset of 2^31 / 8 == 256 MiB past the bitmap base. The address is
not attacker-influenced (fixed sentinel -> fixed offset) and the op only
clears a single bit; what sits 256 MiB further along the direct map is
whatever kernel object happens to live there, so this corrupts one bit of
unpredictable kernel memory -- it is not an arbitrary-address or
arbitrary-value write.
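The 256 MiB figure follows directly from the bit number. A minimal check, with bit_to_byte_offset() as a hypothetical helper and byte granularity as a simplifying assumption (the real clear_bit() operates on unsigned long words):

```c
#include <limits.h>
#include <stdint.h>

#define MM_CID_UNSET	(UINT32_C(1) << 31)	/* BIT(31), as stated above */

/* Byte offset, from the bitmap base, of the byte that holds bit nr. */
static uint64_t bit_to_byte_offset(uint64_t nr)
{
	return nr / CHAR_BIT;
}
```

bit_to_byte_offset(MM_CID_UNSET) is 2^31 / 8 = 268435456 bytes, that is 256 MiB.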
It triggers only in per-CPU CID mode, when a CPU is running an active
task of the target mm whose cid is still MM_CID_UNSET -- the
fork()/execve() window before that task's next schedule-in assigns it a
real CID -- and a per-CPU -> per-task fixup walks over it (the mode
fallback driven by a thread exit, sched_mm_cid_exit(), or by the deferred
max_cids recompute in mm_cid_work_fn()).
In practice syzkaller surfaced it as a KASAN use-after-free reported in
__schedule -> mm_cid_switch_to, where the offending clear_bit() is inlined
via mm_cid_schedout() -> mm_drop_cid().
Guard the transition-bit assignment against MM_CID_UNSET, in addition to
the existing cid_in_transit() check, so the bit is only set on a genuine
task-owned CID. A CPU-owned (MM_CID_ONCPU) CID of a running active task
is handled by the cid_on_cpu(pcp->cid) branch above and never reaches
this path, so excluding MM_CID_UNSET (and the already-transitioning case)
is sufficient.
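The guard can be sketched as below. This is an illustration, not the scheduler code: the bit position chosen for MM_CID_TRANSIT is an assumption, and mark_transit_buggy() / mark_transit_fixed() are hypothetical names for the old and new behaviour.

```c
#include <stdbool.h>

#define MM_CID_UNSET	(1u << 31)	/* sentinel, BIT(31) as stated above */
#define MM_CID_TRANSIT	(1u << 29)	/* assumed bit position, illustrative */

static bool cid_in_transit(unsigned int cid)
{
	return (cid & MM_CID_TRANSIT) != 0;
}

/* Old behaviour: only the transit bit is checked, so UNSET gets tagged. */
static unsigned int mark_transit_buggy(unsigned int cid)
{
	if (!cid_in_transit(cid))
		cid |= MM_CID_TRANSIT;
	return cid;
}

/* New behaviour: the sentinel is left alone, real CIDs are tagged once. */
static unsigned int mark_transit_fixed(unsigned int cid)
{
	if (cid != MM_CID_UNSET && !cid_in_transit(cid))
		cid |= MM_CID_TRANSIT;
	return cid;
}
```

With the extra comparison, MM_CID_UNSET passes through unchanged and can no longer reach clear_bit() as a bit number.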
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Claude:claude-opus-4-8 syzkaller
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260616203818.1516263-1-riel@surriel.com
|
|
There's no need to call WARN_ON() in cfg802154_pernet_exit(), since
every point of failure in cfg802154_switch_netns() is covered with
WARN_ON(), so remove it.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 66e5c2672cd1 ("ieee802154: add netns support")
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Link: https://lore.kernel.org/20250403101935.991385-4-i.abramov@mt-integration.ru
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
It's pointless to call WARN_ON() in case of an allocation failure in
dev_change_net_namespace() and device_rename(), since it only leads to
useless splats caused by deliberate fault injections, so avoid it.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 66e5c2672cd1 ("ieee802154: add netns support")
Reported-by: syzbot+e0bd4e4815a910c0daa8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/000000000000f4a1b7061f9421de@google.com/#t
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Link: https://lore.kernel.org/20250403101935.991385-3-i.abramov@mt-integration.ru
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
cfg802154_switch_netns()
Currently, the return value of device_rename() is not acted upon.
To avoid an inconsistent state in case of failure, roll back the changes
made before the device_rename() call.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 66e5c2672cd1 ("ieee802154: add netns support")
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Link: https://lore.kernel.org/20250403101935.991385-2-i.abramov@mt-integration.ru
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This adds new Landlock access rights to control UDP bind and
connect/send operations, and a new "quiet" feature to mute specific
audit logs (and other future observability events).
A few commits also fix Landlock issues"
* tag 'landlock-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (24 commits)
selftests/landlock: Add tests for invalid use of quiet flag
selftests/landlock: Add tests for quiet flag with scope
selftests/landlock: Add tests for quiet flag with net rules
selftests/landlock: Add tests for quiet flag with fs rules
selftests/landlock: Replace hard-coded 16 with a constant
samples/landlock: Add quiet flag support to sandboxer
landlock: Suppress logging when quiet flag is present
landlock: Add API support and docs for the quiet flags
landlock: Add a place for flags to layer rules
landlock: Add documentation for UDP support
samples/landlock: Add sandboxer UDP access control
selftests/landlock: Add tests for UDP send
selftests/landlock: Add tests for UDP bind/connect
landlock: Add UDP send+connect access control
landlock: Add UDP bind() access control
landlock: Fix unmarked concurrent access to socket family
selftests/landlock: Explicitly disable audit in teardowns
selftests/landlock: Test SCOPE_SIGNAL on the SIGIO/fowner pgid path
landlock: Fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path
landlock: Demonstrate best-effort allowed_access filtering
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys update from Jarkko Sakkinen:
"This contains only bug fixes"
* tag 'for-next-keys-7.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
keys: keyctl_pkey: replace BUG with return -EOPNOTSUPP
keys: request_key: replace BUG with return -EINVAL
keys: Pin request_key_auth payload in instantiate paths
keys: prevent slab cache merging for key_jar
keys: Replace strcpy(derived_buf, "AUTH_KEY") with strscpy(..., HASH_SIZE)
KEYS: Use acquire when reading state in keyring search
keys/trusted_keys: mark 'migratable' as __ro_after_init
keys: use kmalloc_flex in user_preparse
KEYS: trusted: Debugging as a feature
KEYS: encrypted: Remove unnecessary selection of CRYPTO_RNG
KEYS: fix overflow in keyctl_pkey_params_get_2()
|
|
Before commit a1a69b297e47 ("ACPI / IPMI: Fix race caused by the
unprotected ACPI IPMI user"), ipmi_bmc_gone() skipped entries whose
interface number did not match the SMI being removed, then killed the
matching entry:
    if (ipmi_device->ipmi_ifnum != iface)
        continue;
    __ipmi_dev_kill(ipmi_device);
That commit folded the removal block into the existing non-match test
while converting the object lifetime handling, but left the comparison
unchanged. The old != meant "continue past this entry"; after the
refactor it meant "kill this entry".
As a result, a single ACPI IPMI interface is never removed when its SMI
disappears. If multiple interfaces are tracked, the first interface
whose number differs from iface is removed instead, while the interface
that actually disappeared remains on driver_data.ipmi_devices. The
stale entry is not marked dead and can continue to be selected for ACPI
IPMI transactions. It can also prevent the same ACPI handle from being
registered again.
Change the comparison to == so ipmi_bmc_gone() removes exactly the
interface reported as gone by the SMI watcher. This restores the
pre-a1a69b297e47 behavior and is the correct interface matching logic.
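The difference between the two comparisons can be shown with a small model. The struct and function below are hypothetical and use a plain array rather than the driver's list; the fixed flag selects between the inverted and the corrected test.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_ipmi_dev {
	int ifnum;
	bool dead;
};

/*
 * Mark the first selected entry dead and return its index, or -1 if no
 * entry is selected. fixed == false models the inverted != comparison.
 */
static int demo_bmc_gone(struct demo_ipmi_dev *devs, size_t n, int iface,
			 bool fixed)
{
	for (size_t i = 0; i < n; i++) {
		bool selected = fixed ? devs[i].ifnum == iface
				      : devs[i].ifnum != iface;

		if (selected) {
			devs[i].dead = true;
			return (int)i;
		}
	}
	return -1;
}
```

With a single tracked interface the inverted test selects nothing. With two, it selects the interface that is still present.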
Fixes: a1a69b297e47 ("ACPI / IPMI: Fix race caused by the unprotected ACPI IPMI user")
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Link: https://patch.msgid.link/B486593E06E6F6E0+20260616093621.1039943-1-raoxu@uniontech.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull IMA updates from Mimi Zohar:
- Introduce IMA and EVM post-quantum ML-DSA signature support
ML-DSA signature support for IMA and EVM is limited to sigv3
signatures, which calculate and verify a hash of a compact
structure containing the file data/metadata hash, hash type, and hash
algorithm. IMA and EVM still calculate the file data/metadata hashes
respectively.
- Introduce support for removing IMA measurement list records stored in
kernel memory
The IMA measurement list can grow large depending on policy, but
removing records breaks remote attestation, unless they are safely
preserved and made available for attestation requests. Until
environments are prepared to preserve the measurement records, a new
CONFIG_IMA_STAGING Kconfig option is introduced to guard against
deletion.
Several approaches for removing measurement list records were
evaluated but rejected due to filesystem constraints, the
introduction of a new critical data record, and locking concerns. Two
methods are being upstreamed: staged deletion with confirmation, and
staged deletion of N records without confirmation. Both methods
minimize the period during which new measurements are blocked from
being appended to the measurement list by staging the measurement
list.
A comparison of the two methods is included in the documentation.
- Some code cleanup, and a couple of bug fixes
* tag 'integrity-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
doc: security: Add documentation of exporting and deleting IMA measurements
ima: Support staging and deleting N measurements records
ima: Add support for flushing the hash table when staging measurements
ima: Add support for staging measurements with prompt
ima: Introduce ima_dump_measurement()
ima: Use snprintf() in create_securityfs_measurement_lists
ima: Mediate open/release method of the measurements list
ima: Introduce _ima_measurements_start() and _ima_measurements_next()
ima: Introduce per binary measurements list type binary_runtime_size value
ima: Introduce per binary measurements list type ima_num_records counter
ima: Replace static htable queue with dynamically allocated array
ima: Remove ima_h_table structure
evm: terminate and bound the evm_xattrs read buffer
integrity: Add support for sigv3 verification using ML-DSA keys
integrity: Refactor asymmetric_verify for reusability
integrity: Check that algo parameter is within valid range
integrity: Check for NULL returned by asymmetric_key_public_key
ima: return error early if file xattr cannot be changed
ima: Fix sigv3 signature handling for EVM_IMA_XATTR_DIGSIG
|
|
In kernel-doc, functions are referred to as func(), and the % (percent)
character marks constants for rendering, as described in the respective
documentation. Amend all such references.
Fixes: 8e345c991c8c ("ACPI: Centralized processing of ACPI device resources")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260617090555.2648709-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The correct path of module parameters should be
/sys/module/acpi/parameters/xxx. Fix them.
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260611142518.77343-1-zenghui.yu@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The function thermal_throttle_add_dev() may fail and abort a CPU hotplug
online operation. Since the failure occurs within the online callback,
thermal_throttle_online(), the CPU hotplug framework does not invoke the
corresponding offline callback. As a result, the hardware and software
resources set up during the failed operation are not torn down.
Since only thermal_throttle_add_dev() can fail, call it before setting up
the rest of the resources.
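The reordering can be sketched as follows. The names are hypothetical stand-ins for the callback and its steps, and a single flag models the hardware and software state that used to be set up before the fallible call.

```c
#include <errno.h>

static int demo_hw_enabled;	/* models state set up by the online callback */

/* The only step that can fail (stand-in for adding the sysfs device). */
static int demo_add_dev(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Online callback with the fallible step first: nothing to undo on error. */
static int demo_online(int fail)
{
	int ret = demo_add_dev(fail);

	if (ret)
		return ret;

	demo_hw_enabled = 1;	/* infallible setup runs only after success */
	return 0;
}
```

Because the failure happens before any state is touched, no offline callback is needed to clean up after it.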
Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://patch.msgid.link/20260613-rneri-directed-therm-intr-v3-1-3a26d1e47fc8@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "selftests/mm: clean up build output and verbosity" (Li Wang)
Remove some noise from the MM selftests build
- "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts)
Speed up the freeing of a batch of 0-order pages by first scanning
them for coalescing opportunities. This is applicable to vfree() and
to the releasing of frozen pages
- "mm/damon: introduce DAMOS failed region quota charge ratio"
(SeongJae Park)
Address a DAMOS usability issue: The DAMOS quota often exhausts
prematurely because it charges for all memory attempted, causing slow
and inconsistent performance when actions fail on unreclaimable
memory.
To fix this, a new feature lets users set a smaller, flexible quota
charge ratio (via a numerator and denominator) for failed regions.
Since failed actions cause less overhead, reducing their quota cost
ensures more predictable and efficient DAMOS processing
- "selftests/cgroup: improve zswap tests robustness and support large
page sizes" (Li Wang)
Fix various spurious failures and improve the overall robustness of
the cgroup zswap selftests
- "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga)
Fix an issue in the mlock selftests on arm32
- "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao)
Some maintenance work in the huge_memory code
- "treewide: fixup gfp_t printks" (Brendan Jackman)
Use the special vprintf() gfp_t conversion in various places
- "mm: Fix vmemmap optimization accounting and initialization" (Muchun
Song)
Fix several bugs in the vmemmap optimization, mainly around incorrect
page accounting and memmap initialization in the DAX and memory
hotplug paths. It also fixes pageblock migratetype initialization and
struct page initialization for ZONE_DEVICE compound pages
- "mm/damon: repost non-hotfix reviewed patches in damon/next tree"
A sprinkle of unrelated minor bugfixes for DAMON
- "mm: remove page_mapped()" (David Hildenbrand)
Remove this function from the tree, replacing it with folio_mapped()
- "mm/damon: let DAMON be paused and resumed" (SeongJae Park)
Allow DAMON to be paused and resumed without losing its current state
- "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad
Usama Anjum)
Simplify and speed up kasan by removing its ineffective tagging of
stacks and page tables
- "mm/damon/reclaim,lru_sort: monitor all system rams by default"
(SeongJae Park)
Simplify deployment on diverse hardware like NUMA systems by updating
DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the
physical address range covering all System RAM areas by default,
replacing the overly restrictive behavior that only targeted the
single largest memory block to save on negligible overhead
- "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae
Park)
Update some DAMON docs
- "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin)
Switch zone->lock handling over to using the guard() mechanisms
- "mm/filemap: tighten mmap_miss hit accounting" (fujunjie)
Fix a flaw where the mmap_miss counter over-credited page cache hits
during fault-arounds and page-fault retries. This results in
significant reduction of redundant synchronous mmap readahead I/O,
drastically cutting down execution time and gigabytes read for sparse
random or strided memory access workloads
- "selftests/cgroup: Fix false positive failures in test_percpu_basic"
(Li Wang)
Fix a couple of false-positives in the cgroup kmem selftests
- "mm/damon/reclaim: support monitoring intervals auto-tuning"
(SeongJae Park)
Add a new parameter to DAMON permitting DAMON_RECLAIM to
automatically tune DAMON's sampling and aggregation intervals
- "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park)
Change DAMON_STAT to provide the pid of its kdamond
- "mm/kmemleak: dedupe verbose scan output" (Breno Leitao)
Remove large amounts of duplicated backtraces from the verbose-mode
kmemleak output
- "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David
Hildenbrand)
Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to
removing it entirely in a later series
- "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan)
Prevent users from passing a non-power-of-2 value of `addr_unit', as
this later results in undesirable behavior
- "mm: document read_pages and simplify usage" (Frederick Mayle)
- "tools/mm/page-types: Fix misc bugs" (Ye Liu)
Fix three issues in tools/mm/page-types.c
- "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman)
Implement several cleanups in the page allocator and related code
- "mm, swap: swap table phase IV: unify allocation" (Kairui Song)
Unify the allocation and charging of anon and shmem swap in folios,
provide better synchronization, consolidate the metadata management
(hence dropping the static array and map), and improve
performance
- "mm/damon: introduce data attributes monitoring" (SeongJae Park)
Extend DAMON to monitor general data attributes other than accesses
- "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra)
Implement the TODO in vrealloc() to unmap and free unused pages when
shrinking across a page boundary
- "mm/damon: documentation and comment fixes" (niecheng)
- "remove mmap_action success, error hooks" (Lorenzo Stoakes)
Eliminate custom hooks from mmap_action by removing the problematic
success_hook which allowed drivers to improperly access uninitialized
VMAs. It replaces the error_hook with a simple error-code field and
updates the memory char driver accordingly
- "mm/damon: minor improvements for code readability and tests"
(SeongJae Park)
- "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym
Shcherba)
- "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike
Rapoport)
- "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and
others)
Clean up and slightly improve MGLRU's reclaim loop and dirty
writeback handling. Large performance improvements are measured
- "use vma locks for proc/pid/{smaps|numa_maps} reads" (Suren
Baghdasaryan)
Use per-vma locks when reading /proc/pid/smaps and numa_maps to
reduce contention on the central mmap_lock
- "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()"
(Ran Xiaokai)
Some cleanup work in the THP code
- "selftests/memfd: fix compilation warnings" (Konstantin Khorenko)
Fix a few build glitches in the memfd selftest code.
- "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel
Butt)
Resolve a 68% performance regression caused by NUMA-node cache
thrashing around struct obj_stock_pcp by shrinking its existing
fields and expanding it into a multi-slot array that caches up to
five obj_cgroup pointers per CPU, allowing per-node variants of the
same memcg to coexist within a single 64-byte cache line.
- "zram: writeback fixes" (Sergey Senozhatsky)
Address a couple of unrelated zram writeback issues
- "mm: switch THP shrinker to list_lru" (Johannes Weiner)
Resolve NUMA-awareness issues and streamline callsite interaction by
refactoring and extending the list_lru API to completely replace the
complex, open-coded deferred split queue for Transparent Huge Pages
- "mm: improve large folio readahead for exec memory" (Usama Arif)
Improve large-folio readahead on systems like 64K-page arm64 by
preventing the mmap_miss check from permanently disabling
target-oriented VM_EXEC readahead, and by generalizing the
force_thp_readahead gate to support mappings with any usefully large
maximum folio order under the cache cap.
- "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau)
Fix a bunch of minor issues in the userfaultfd/pagemap, all of which
were flagged by Sashiko review of proposed new material
- "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and
vmemmap_check_pmd()" (Muchun Song)
Provide generic versions of these two functions so the four
arch-specific implementations can be removed.
- "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device" (Youngjun Park)
Address a uswsusp-vs-swapoff race and reduce the swap device
reference taking/releasing frequency.
- "mm/hmm: A fix and a selftest" (Dev Jain)
* tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries
fs/proc/task_mmu: do not warn on seeing non-migration pmd entry
lib/test_hmm: check alloc_page_vma() return value and handle OOM
mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX
mm/swap: remove redundant swap device reference in alloc/free
mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
mm/filemap: use folio_next_index() for start
vmalloc: fix NULL pointer dereference in is_vm_area_hugepages()
sparc/mm: drop vmemmap_check_pmd helper and use generic code
loongarch/mm: drop vmemmap_check_pmd helper and use generic code
riscv/mm: drop vmemmap_pmd helpers and use generic code
arm64/mm: drop vmemmap_pmd helpers and use generic code
mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd()
rust: page: mark Page::nid as inline
userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks
userfaultfd: gate must_wait writability check on pte_present()
mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade
fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole()
fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry()
fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race
...
|
|
The PMIC PCA9451A and PCA9452 have a default power-off debounce time of
2ms according to their datasheet, while PCA9450A and PCA9450BC use 120us.
Add default_t_off_deb field to struct pca9450 to support per-variant
default configuration when the device tree property is not specified.
Datasheet reference links:
- PCA9451A Rev.2.1: https://www.nxp.com/docs/en/data-sheet/PCA9451A.pdf
- PCA9452 Rev.1.0: https://www.nxp.com/docs/en/data-sheet/PCA9452.pdf
Signed-off-by: Joy Zou <joy.zou@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260618-b4-regulator-opt-v1-1-c43b1f62aaf6@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add a new compatible entry "snps,dwc-ssi-2.00a" for the Synopsys
DesignWare SSI controller version 2.00a. This variant uses the same
initialization routine as snps,dwc-ssi-1.01a (dw_spi_hssi_init).
Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
Link: https://patch.msgid.link/20260619143443.22267-3-changhuang.liang@starfivetech.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add a new compatible string "starfive,jhb100-spi" for the StarFive
JHB100 SPI. It is based on the Synopsys DesignWare SSI version 2.00a
and uses snps,dwc-ssi-2.00a as the primary fallback and
snps,dwc-ssi-1.01a as the secondary fallback.
Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260619143443.22267-2-changhuang.liang@starfivetech.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
hw_breakpoint_arch_parse() positions the BAS bit pattern in
hw->ctrl.len with
    offset = hw->address & alignment_mask;  /* 0..7 */
    hw->ctrl.len <<= offset;
ctrl.len is an 8-bit bitfield (struct arch_hw_breakpoint_ctrl::len is
u32 :8), so the shift silently drops any bits past bit 7. For
non-compat AArch64 watchpoints the offset is unbounded relative to
ctrl.len: a perf_event_open(PERF_TYPE_BREAKPOINT) caller asking for
HW_BREAKPOINT_W with bp_addr=page+1 and bp_len=HW_BREAKPOINT_LEN_8
ends up with 0xff << 1 = 0x1fe, stored as 0xfe. The kernel programs
WCR.BAS=0xfe and the hardware watches bytes [1..7] instead of the
requested [1..8] -- the eighth byte is silently dropped. The
syscall still returns success, leaving userspace to discover the
gap by empirical probing.
The same class affects HW_BREAKPOINT_LEN_{2,4} when offset pushes the
high BAS bit past bit 7 (e.g. LEN_4 with offset=5 yields 0xe0
instead of 0x1e0). No memory-safety impact -- the value is masked
into 8 bits before encoding -- but debuggers and perf users observe
missed events on bytes they thought they were watching.
The AArch32 branch immediately above already rejects unrepresentable
(offset, len) combinations via an explicit switch. Mirror that for
the non-compat branch by checking that the shifted pattern fits in
the BAS field, returning -EINVAL when it does not.
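The check can be illustrated outside the kernel. The helpers below are hypothetical and only model the arithmetic described above: an 8-bit BAS field, a length mask of 0x01, 0x03, 0x0f or 0xff, and a byte offset of 0..7.

```c
#include <stdbool.h>
#include <stdint.h>

#define BAS_FIELD_MASK	0xffu	/* the BAS field is 8 bits wide */

/*
 * Shift the length mask by the byte offset and report whether the result
 * still fits in the BAS field. On success the pattern is stored in *out;
 * on failure a caller would return -EINVAL.
 */
static bool bas_shift_fits(uint32_t len_mask, unsigned int offset,
			   uint8_t *out)
{
	uint32_t shifted = len_mask << offset;

	if (shifted & ~BAS_FIELD_MASK)
		return false;	/* high bits would be silently dropped */

	*out = (uint8_t)shifted;
	return true;
}

/* What an unchecked store into an 8-bit field keeps, for comparison. */
static uint8_t bas_shift_truncated(uint32_t len_mask, unsigned int offset)
{
	return (uint8_t)(len_mask << offset);
}
```

For the two cases quoted above, bas_shift_truncated(0xff, 1) yields 0xfe and bas_shift_truncated(0x0f, 5) yields 0xe0, while bas_shift_fits() rejects both.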
GDB and similar debuggers are unaffected by the stricter check.
aarch64_linux_set_debug_regs() already treats EINVAL on
NT_ARM_HW_WATCH as a downgrade signal: it clears
kernel_supports_any_contiguous_range, calls aarch64_downgrade_regs()
to round the BAS up to a legacy 0x01/03/0f/ff mask with an aligned
base, and retries -- the same fallback path that PR-20207 introduced.
The new -EINVAL is therefore reachable only from a raw
perf_event_open() that pairs an unaligned base with an oversized
bp_len, which is precisely the bug.
Reproducer:
    struct perf_event_attr a = {
        .type = PERF_TYPE_BREAKPOINT, .size = sizeof(a),
        .bp_type = HW_BREAKPOINT_W,
        .bp_addr = (uintptr_t)(buf + 1),
        .bp_len = HW_BREAKPOINT_LEN_8,
        .exclude_kernel = 1, .exclude_hv = 1,
    };
    int fd = perf_event_open(&a, 0, -1, -1, 0);
    /* before this fix: succeeds, watches 7 bytes (buf+1..buf+7) */
    /* after this fix: fails with EINVAL */
Fixes: b08fb180bb88 ("arm64: Allow hw watchpoint at varied offset from base address")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Will Deacon <will@kernel.org>
|