| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
- Fix ARM64-specific rseq regressions (Mark Rutland)
* tag 'sched-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/entry: Fix arm64-specific rseq brokenness
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull IRQ fixes from Ingo Molnar:
- Fix use-after-free in irq_work_single() on PREEMPT_RT (Jiayuan Chen)
- Don't call add_interrupt_randomness() for NMIs in
handle_percpu_devid_irq() (Mark Rutland)
- Remove unused function in the ath79-cpu irqchip driver causing LKP
CI build warnings (Rosen Penev)
- Fix IRQ allocation/teardown leakage regressions in the GICv5 irqchip
driver (Sascha Bischoff)
- Fix an IRQ trigger type regression in the Meson S4 SoC irqchip driver
(Xianwei Zhao)
- Fix CPU offlining regression in the RiscV IMSIC irqchip driver
(Yong-Xuan Wang)
* tag 'irq-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT
irqchip/riscv-imsic: Clear interrupt move state during CPU offlining
irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type()
irqchip/ath79-cpu: Remove unused function
genirq/chip: Don't call add_interrupt_randomness() for NMIs
irqchip/gic-v5: Allocate ITS parent LPIs as a range
irqchip/gic-v5: Support range allocation for LPIs
irqchip/gic-v5: Move LPI allocation into the LPI domain
|
|
Pull VFIO fixes from Alex Williamson:
- Convert vfio-pci BAR resource requests and iomaps initialization
from a lazy, on-demand model to an eager pre-allocation model to
avoid races while preserving legacy error behavior. Fix unchecked
barmap access in dma-buf export path (Matt Evans)
- Introduce an implicit unsigned cast in converting vfio-pci device
offsets to region indexes, closing a potential out-of-bounds
access through the vfio_pci_ioeventfd() interface (Matt Evans)
- Fix a dma-buf kref underflow and stuck wait_for_completion() when
closing a previously revoked dma-buf (Alex Williamson)
* tag 'vfio-v7.1-rc4' of https://github.com/awilliam/linux-vfio:
vfio/pci: Check BAR resources before exporting a DMABUF
vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
vfio/pci: Make VFIO_PCI_OFFSET_TO_INDEX() return unsigned
vfio/pci: fix dma-buf kref underflow after revoke
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe merge request via Keith:
- Fix memory leak on a passthrough integrity mapping failure (Keith)
- Hide secrets behind debug option (Hannes)
- Fix pci use-after-free for host memory buffer (Chia-Lin Kao)
- Fix tcp taregt use-after-free for data digest (Sagi)
- Revert a mistaken quirk (Alan Cui)
- Fix uevent and controller state race condition (Maurizio)
- Fix apple submission queue re-initialization (Nick Chan)
- Three fixes for blk-integrity, fixing an issue with the user data
mapping and two problems with recomputing number of segments
- Two fixes for the iov_iter bounce buffering
- Fix for the handling of dead zoned write plugs
- ublk max_sectors validation fix, with associated selftest addition
* tag 'block-7.1-20260515' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
nvme-apple: Reset q->sq_tail during queue init
block: align down bounces bios
block: pass a minsize argument to bio_iov_iter_bounce
selftests: ublk: cap nthreads to kernel's actual nr_hw_queues
block: fix handling of dead zone write plugs
block: bio-integrity: Fix null-ptr-deref in bio_integrity_map_user()
block: recompute nr_integrity_segments in blk_insert_cloned_request
block: don't overwrite bip_vcnt in bio_integrity_copy_user()
nvme: fix race condition between connected uevent and STARTED_ONCE flag
Revert "nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808"
nvmet-tcp: Fix potential UAF when ddgst mismatch
nvme-pci: fix use-after-free in nvme_free_host_mem()
nvmet-auth: Do not print DH-HMAC-CHAP secrets
nvme: fix bio leak on mapping failure
nvme: make prp passthrough usage less scary
ublk: reject max_sectors smaller than PAGE_SECTORS in parameter validation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- asus-nb-wmi:
- Use existing keyboard quirk for ASUS Zenbook Duo UX8407AA
- hp-wmi:
- Add support for Victus 16-r0xxx (8BC2)
- intel/vsec_tpmi:
- Move debugfs register before creating devices
- Prevent fault during unbind
- lenovo-wmi-*:
- Fix memory leak in lwmi_dev_evaluate_int()
- Balance IDA id allocation and free
- Balance component bind and unbind
- Prevent sending uninitialized WMI arguments to the device
- Decouple lenovo-wmi-gamezone and lenovo-wmi-other to simplify
module dependency graph
- Limit adding attributes to supported devices
- samsung-galaxybook:
- Handle kbd backlight, mic mute and camera block hotkeys
* tag 'platform-drivers-x86-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8407AA
platform/x86: lenovo-wmi-other: Limit adding attributes to supported devices
platform/x86: lenovo-wmi-other: Add Attribute ID helper functions
platform/x86: lenovo-wmi-helpers: Move gamezone enums to wmi-helpers
platform/x86: lenovo: Decouple lenovo-wmi-gamezone and lenovo-wmi-other
platform/x86: lenovo-wmi-other: Fix tunable_attr_01 struct members
platform/x86: lenovo-wmi-other: Zero initialize WMI arguments
platform/x86: lenovo-wmi-other: Balance component bind and unbind
platform/x86: lenovo-wmi-other: Balance IDA id allocation and free
platform/x86: lenovo-wmi-helpers: Fix memory leak in lwmi_dev_evaluate_int()
platform/x86: hp-wmi: Add support for Victus 16-r0xxx (8BC2)
platform/x86/intel/tpmi/plr: Prevent fault during unbind
platform/x86: intel: Add notifiers support
platform/x86: intel: Move debugfs register before creating devices
platform/x86: samsung-galaxybook: Handle ACPI hotkey notifications
platform/x86: samsung-galaxybook: Refactor camera lens cover input device
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Fix potential dead-lock in rhashtable when used by xattr
- Avoid calling kvfree on atomic path in rhashtable
* tag 'v7.1-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
rhashtable: Add bucket_table_free_atomic() helper
mm/slab: Add kvfree_atomic() helper
rhashtable: drop ht->mutex in rhashtable_free_and_destroy()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fixes for a few OOB/UAF in several HID drivers (Florian Pradines, Lee
Jones, Michael Zaidman, Rosalie Wanders, Sangyun Kim and Tomasz
Pakuła)
- more general sanitation of input data, dealing with potentially
malicious hardware in hid-core (Benjamin Tissoires)
- a few device-specific quirks and fixups
* tag 'hid-for-linus-2026051401' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (22 commits)
HID: logitech-hidpp: Add support for newer Bluetooth keyboards
HID: pidff: Fix integer overflow in pidff_rescale
HID: i2c-hid: add reset quirk for BLTP7853 touchpad
HID: core: introduce hid_safe_input_report()
HID: pass the buffer size to hid_report_raw_event
HID: google: hammer: stop hardware on devres action failure
HID: appletb-kbd: run inactivity autodim from workqueues
HID: appletb-kbd: fix UAF in inactivity-timer cleanup path
HID: playstation: Clamp num_touch_reports
HID: magicmouse: Prevent out-of-bounds (OOB) read during DOUBLE_REPORT_ID
HID: mcp2221: fix OOB write in mcp2221_raw_event()
HID: quirks: really enable the intended work around for appledisplay
HID: hid-sjoy: race between init and usage
HID: uclogic: Fix regression of input name assignment
HID: intel-thc-hid: Intel-quickspi: Fix some error codes
HID: hid-lenovo-go-s: restore OS_TYPE after resume from s2idle
HID: elan: Add support for ELAN SB974D touchpad
HID: sony: add missing size validation for Rock Band 3 Pro instruments
HID: sony: add missing size validation for SMK-Link remotes
HID: sony: remove unneeded WARN_ON() in sony_leds_init()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Previous releases - regressions:
- ethtool: fix NULL pointer dereference in phy_reply_size
- netfilter:
- allocate hook ops while under mutex
- close dangling table module init race
- restore nf_conntrack helper propagation via expectation
- tcp:
- fix potential UAF in reqsk_timer_handler().
- fix out-of-bounds access for twsk in tcp_ao_established_key().
- vsock: fix empty payload in tap skb for non-linear buffers
- hsr: fix NULL pointer dereference in hsr_get_node_data()
- eth:
- cortina: fix RX drop accounting
- ice: fix locking in ice_dcb_rebuild()
Previous releases - always broken:
- napi: avoid gro timer misfiring at end of busypoll
- sched:
- dualpi2: initialize timer earlier in dualpi2_init()
- sch_cbs: Call qdisc_reset for child qdisc
- shaper:
- fix ordering issue in net_shaper_commit()
- reject handle IDs exceeding internal bit-width
- ipv6: flowlabel: enforce per-netns limit for unprivileged callers
- tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
- smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
- sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL
- batman-adv:
- reject new tp_meter sessions during teardown
- purge non-released claims
- eth:
- i40e: cleanup PTP registration on probe failure
- idpf: fix double free and use-after-free in aux device error paths
- ena: fix potential use-after-free in get_timestamp"
* tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
net: phy: DP83TC811: add reading of abilities
net: tls: prevent chain-after-chain in plain text SG
net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
net/smc: reject CHID-0 ACCEPT that matches an empty ism_dev slot
macsec: use rcu_work to defer TX SA crypto cleanup out of softirq
macsec: use rcu_work to defer RX SA crypto cleanup out of softirq
macsec: introduce dedicated workqueue for SA crypto cleanup
net: net_failover: Fix the deadlock in slave register
MAINTAINERS: update atlantic driver maintainer
selftests/tc-testing: Add QFQ/CBS qlen underflow test
net/sched: sch_cbs: Call qdisc_reset for child qdisc
FDDI: defza: Sanitise the reset safety timer
net: ethernet: ravb: Do not check URAM suspension when WoL is active
ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
net: atm: fix skb leak in sigd_send() default branch
net: ethtool: phy: avoid NULL deref when PHY driver is unbound
net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
net: shaper: reject QUEUE scope handle with missing id
...
|
|
The 'dumpability' of a task is fundamentally about the memory image of
the task - the concept comes from whether it can core dump or not - and
makes no sense when you don't have an associated mm.
And almost all users do in fact use it only for the case where the task
has a mm pointer.
But we have one odd special case: ptrace_may_access() uses 'dumpable' to
check various other things entirely independently of the MM (typically
explicitly using flags like PTRACE_MODE_READ_FSCREDS). Including for
threads that no longer have a VM (and maybe never did, like most kernel
threads).
It's not what this flag was designed for, but it is what it is.
The ptrace code does check that the uid/gid matches, so you do have to
be uid-0 to see kernel thread details, but this means that the
traditional "drop capabilities" model doesn't make any difference for
this all.
Make it all make a *bit* more sense by saying that if you don't have a
MM pointer, we'll use a cached "last dumpability" flag if the thread
ever had a MM (it will be zero for kernel threads since it is never
set), and require a proper CAP_SYS_PTRACE capability to override.
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
"The bulk of this is hardening of the new sub-scheduler infrastructure.
- UAFs and lifecycle bugs on the sub-sched attach/detach paths:
parent sub_kset freed under a racing child, list_del_rcu on an
uninitialized list head, ops->priv stomped by concurrent
attach/detach, and a UAF in the init-failure error path
- Task state-machine reorg closing concurrent enable-vs-dead races: a
task exiting during the unlocked init window could trip NULL ops
derefs or skip exit_task() cleanup
- A scx_link_sched() self-deadlock on scx_sched_lock
- isolcpus: stop dereferencing the now-RCU-protected HK_TYPE_DOMAIN
cpumask without RCU, and stop rejecting BPF schedulers when only
cpuset isolated partitions are active
- PREEMPT_RT: disable irq_work runs in hardirq context so dumps show
the failing task rather than the irq_work kthread
- Assorted !CONFIG_EXT_SUB_SCHED, randconfig, and selftest build
fixes"
* tag 'sched_ext-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Use HK_TYPE_DOMAIN_BOOT to detect isolcpus= domain isolation
sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
sched_ext: INIT_LIST_HEAD() &sch->all in scx_alloc_and_add_sched()
sched_ext: Drop NONE early return in scx_disable_and_exit_task()
sched_ext: Avoid UAF in scx_root_enable_workfn() init failure path
sched_ext: Clear ops->priv on scx_alloc_and_add_sched() error paths
sched_ext: Fix ops->priv clobber on concurrent attach/detach
selftests/sched_ext: Fix build error in dequeue selftest
sched_ext: Handle SCX_TASK_NONE in disable/switched_from paths
sched_ext: Close sub-sched init race with post-init DEAD recheck
sched_ext: Close root-enable vs sched_ext_dead() race with SCX_TASK_INIT_BEGIN
sched_ext: Replace SCX_TASK_OFF_TASKS flag with SCX_TASK_DEAD state
sched_ext: Inline scx_init_task() and move RESET_RUNNABLE_AT into scx_set_task_state()
sched_ext: Cleanups in preparation for the SCX_TASK_INIT_BEGIN/DEAD work
sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work
sched_ext: Fix !CONFIG_EXT_SUB_SCHED build warnings
sched_ext: Drop unused scx_find_sub_sched() stub
sched_ext: Move scx_error() out of scx_link_sched()'s lock region
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- cpuset fixes:
- Partition invalidation could return CPUs still in use by sibling
partitions, producing overlapping effective_cpus
- cpuset_can_attach() over-reserved DL bandwidth on moves that
stayed within the same root domain
- Pending DL migration state leaked into later attaches when a
later can_attach() check failed
- Reorder PF_EXITING and __GFP_HARDWALL checks so dying tasks can
allocate from any node and exit quickly
- dmem: propagate -ENOMEM instead of spinning forever when the fallback
pool allocation also fails
- selftests/cgroup: percpu test error-path leak, bogus numeric
comparison of cpuset strings, and a zero-length read() that silently
passed OOM-kill tests
* tag 'cgroup-for-7.1-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Return only actually allocated CPUs during partition invalidation
selftests/cgroup: Fix error path leaks in test_percpu_basic
cgroup/cpuset: Reserve DL bandwidth only for root-domain moves
cgroup/cpuset: Reset DL migration state on can_attach() failure
selftests/cgroup: Fix string comparison in write_test
selftests/cgroup: Fix cg_read_strcmp() empty string comparison
cgroup/dmem: Return -ENOMEM on failed pool preallocation
cgroup/cpuset: move PF_EXITING check before __GFP_HARDWALL in cpuset_current_node_allowed()
|
|
VFIO_PCI_OFFSET_TO_INDEX() is used in several places with a signed
parameter (e.g. loff_t). Because it makes no sense for a BAR/resource
index to be negative, enforce this in the macro.
This fixes at least one current issue, where vfio_pci_ioeventfd() uses
this macro with an unvalidated signed loff_t returned into a signed
type, leading to a possible negative array access. This instance does
test against an out-of-bounds positive value, so treating the index as
unsigned fixes this issue.
Fixes: 89e1f7d4c66d8 ("vfio: Add PCI device driver")
Signed-off-by: Matt Evans <mattev@meta.com>
Link: https://lore.kernel.org/r/20260511144642.2926799-1-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
When bouncing for block size > PAGE_SIZE file systems that require
file system block size alignment (e.g. zoned XFS), the bio needs to
be big enough to fit an entire block.
Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Link: https://patch.msgid.link/20260507050153.1298375-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Add the pKVM side of the workaround for ARM's erratum 4193714,
provided that the EL3 firmware does its part of the job. KVM will
refuse to initialise otherwise
- Correctly handle 52bit VAs for guest EL2 stage-1 translations when
running under NV with E2H==0
- Correctly deal with permission faults in guest_memfd memslots
- Fix the steal-time selftest after the infrastructure was reworked
- Make sure the host cannot pass a non-sensical clock update to the
EL2 tracing infrastructure
- Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
ability to run arm64 guests, which will inevitably lead to arm64
code being directly used on s390
- Make sure that EL2 is configured with both exception entry and exit
being Context Synchronization Events
- Handle the current vcpu being NULL on EL2 panic
- Fix the selftest_vcpu memcache being empty at the point of donation
or sharing
- Check that the memcache has enough capacity before engaging on the
share/donate path
- Fix __deactivate_fgt() to use its parameter rather than a variable
in the macro context
s390:
- Fix array overrun with large amounts of PCI devices
x86:
- Never use L0's PAUSE loop exiting while L2 is running, since it's
unlikely that a nested guest will help solving the hypervisor's
spinlock contention
- Fix emulation of MOVNTDQA
- Fix typo in Xen hypercall tracepoint
- Add back an optimization that was left behind when recently fixing
a bug
- Add module parameter to disable CET, whose implementation seems to
have issues. For now it remains enabled by default
Generic:
- Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn()
Documentation:
- Update stale links
Selftests:
- Fix guest_memfd_test with host page size > guest page size"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: VMX: introduce module parameter to disable CET
KVM: x86: Swap the dst and src operand for MOVNTDQA
KVM: x86: use again the flush argument of __link_shadow_page()
KVM: selftests: Ensure gmem file sizes are multiple of host page size
Documentation: kvm: update links in the references section of AMD Memory Encryption
KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
KVM: x86: Fix Xen hypercall tracepoint argument assignment
KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
KVM: arm64: Pre-check vcpu memcache for host->guest donate
KVM: arm64: Pre-check vcpu memcache for host->guest share
KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
KVM: arm64: Fix __deactivate_fgt macro parameter typo
KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
KVM: arm64: Make EL2 exception entry and exit context-synchronization events
MAINTAINERS: Add Steffen as reviewer for KVM/arm64
KVM: arm64: Remove potential UB on nvhe tracing clock update
KVM: selftests: arm64: Fix steal_time test after UAPI refactoring
KVM: arm64: Handle permission faults with guest_memfd
KVM: arm64: nv: Consider the DS bit when translating TCR_EL2
KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
Since the ftrace adds its NOPs at .kprobes.text section (which stores
an array), a wrong entry is added when loading a module which uses
"__kprobes" attribute.
To solve this, add "notrace" to __kprobes functions
- test_kprobes: clear kprobes between test runs
Clear all kprobes in the test program after running a test set,
because Kunit test can run several times
- fprobe: Fix unregister_fprobe() to wait for RCU grace period
Since the fprobe data structure is removed with hlist_del_rcu(), it
should wait for the RCU grace period. If the caller waits for RCU, we
can use the async variant (e.g. eBPF)
* tag 'probes-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fprobe: Fix unregister_fprobe() to wait for RCU grace period
test_kprobes: clear kprobes between test runs
kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
|
|
hid_input_report() is used in too many places to have a commit that
doesn't cross subsystem borders. Instead of changing the API, introduce
a new one when things matters in the transport layers:
- usbhid
- i2chid
This effectively revert to the old behavior for those two transport
layers.
Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
commit 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing
bogus memset()") enforced the provided data to be at least the size of
the declared buffer in the report descriptor to prevent a buffer
overflow. However, we can try to be smarter by providing both the buffer
size and the data size, meaning that hid_report_raw_event() can make
better decision whether we should plaining reject the buffer (buffer
overflow attempt) or if we can safely memset it to 0 and pass it to the
rest of the stack.
Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Acked-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
cpuset_can_attach() currently adds the bandwidth of all migrating
SCHED_DEADLINE tasks to sum_migrate_dl_bw. If the source and destination
cpuset effective CPU masks do not overlap, the whole sum is then
reserved in the destination root domain.
set_cpus_allowed_dl(), however, subtracts bandwidth from the source
root domain only when the affinity change really moves the task between
root domains. A DL task can move between cpusets that are still in the
same root domain, so including that task in sum_migrate_dl_bw can reserve
destination bandwidth without a matching source-side subtraction.
Share the root-domain move test with set_cpus_allowed_dl(). Keep
nr_migrate_dl_tasks counting all migrating deadline tasks for cpuset DL
task accounting, but add to sum_migrate_dl_bw only for tasks that need a
root-domain bandwidth move. Keep using the destination cpuset effective
CPU mask and leave the broader can_attach()/attach() transaction model
unchanged.
Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Cc: stable@vger.kernel.org # v6.10+
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The IPI and ITS MSI domains currently allocate and release LPIs
directly, then pass the selected LPI ID to the parent LPI domain. This
leaks the LPI domain's allocation policy into its child domains and
forces each child to duplicate part of the parent domain's teardown.
Make the LPI domain allocate LPIs in its .alloc() callback and release
them in a matching .free() callback. Child domains can then request a
parent interrupt without passing an implementation-specific LPI ID,
and the LPI lifetime is tied to the domain that owns the LPI
namespace.
Remove the gicv5_alloc_lpi() and gicv5_free_lpi() wrappers now that no
external caller needs to manage LPIs directly.
This is a preparatory change for an actual leakage problem in the
allocation code and therefore tagged with the same Fixes tag.
Fixes: 0f0101325876 ("irqchip/gic-v5: Add GICv5 LPI/IPI support")
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260506093634.382062-2-sascha.bischoff@arm.com
|
|
Commit 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer")
changed fprobe to register struct fprobe to an rcu-hlist, but it forgot
to wait for RCU GP. Thus there can be use-after-free if the fprobe is
released right after unregistering. This can be happened on fprobe
event and sample module code.
To fix this issue, add synchronize_rcu() in unregister_fprobe().
Note that BPF is OK because fprobe is used as a part of
bpf_kprobe_multi_link. This unregisters its fprobe in
bpf_kprobe_multi_link_release() and it is deallocated via
bpf_kprobe_multi_link_dealloc(), which is invoked from
bpf_link_defer_dealloc_rcu_gp() RCU callback.
For BPF, this also introduced unregister_fprobe_async() which does
NOT wait for RCU grace priod.
Link: https://lore.kernel.org/all/177813998919.256460.2809243930741138224.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
scx_root_enable_workfn() drops the iter rq lock for ops.init_task() and a
TASK_DEAD @p can fall through sched_ext_dead() in that window. The race hits
when sched_ext_dead() observes SCX_TASK_INIT (the intermediate state before
@p->scx.sched is published) and dereferences NULL via SCX_HAS_OP(NULL,
exit_task), or observes SCX_TASK_NONE during the unlocked init window and
skips cleanup so exit_task() never runs.
Add SCX_TASK_INIT_BEGIN. The enable path writes NONE -> INIT_BEGIN under the
iter rq lock, then takes the rq lock again after init to walk INIT_BEGIN ->
INIT -> READY. sched_ext_dead() that wins the rq-lock race observes
INIT_BEGIN and sets DEAD without calling into ops; the post-init recheck
unwinds via scx_sub_init_cancel_task().
scx_fork() runs single-threaded against sched_ext_dead() (the task is not on
scx_tasks until scx_post_fork() adds it) so its INIT_BEGIN -> INIT walk
needs no rq-lock pairing; it rolls back to NONE on ops.init_task() failure.
The validation matrix grows the INIT_BEGIN row and the INIT_BEGIN -> DEAD
edge; INIT now requires INIT_BEGIN as the predecessor. scx_sub_disable()'s
migration writes INIT_BEGIN as a synthetic predecessor to satisfy the
tightened verification.
The sub-sched paths still race with sched_ext_dead() during the unlocked
init window. This will be fixed by the next patch.
Reported-by: zhidao su <suzhidao@xiaomi.com>
Link: https://lore.kernel.org/all/20260429133155.3825247-1-suzhidao@xiaomi.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
|
|
SCX_TASK_OFF_TASKS marked tasks already through sched_ext_dead() so cgroup
task iteration would skip them. This can be expressed better with a task
state. Replace the flag with SCX_TASK_DEAD.
scx_disable_and_exit_task() resets state to NONE on its way out, so
sched_ext_dead() now sets DEAD after the wrapper returns. The validation
matrix grows NONE -> DEAD, warns on DEAD -> NONE, and tightens READY's
predecessor to INIT or ENABLED so the new DEAD value cannot silently
transition to READY.
Prepares for the following enable vs dead race fix.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix sk_local_storage diag dump via netlink (Amery Hung)
- Fix off-by-one in arena direct-value access (Junyoung Jang)
- Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)
- Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)
- Reject TX-only AF_XDP sockets (Linpu Yu)
- Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)
- Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
(Weiming Shi)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix off-by-one boundary validation in arena direct-value access
xskmap: reject TX-only AF_XDP sockets
bpf: Don't run arg-tracking analysis twice on main subprog
bpf: Free reuseport cBPF prog after RCU grace period.
bpf: tcp: Fix type confusion in sol_tcp_sockopt().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
bpf: tcp: Fix type confusion in bpf_tcp_sock().
tools/headers: Regenerate stddef.h to fix BPF selftests
bpf: Fix sk_local_storage diag dumping uninitialized special fields
bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
bpf: Reject TCP_NODELAY in bpf-tcp-cc
bpf: Reject TCP_NODELAY in TCP header option callbacks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix spurious failures in rseq self-tests (Mark Brown)
- Fix rseq rseq::cpu_id_start ABI regression due to TCMalloc's creative
use of the supposedly read-only field
The fix is to introduce a new ABI variant based on a new (larger)
rseq area registration size, to keep the TCMalloc use of rseq
backwards compatible on new kernels (Thomas Gleixner)
- Fix wakeup_preempt_fair() for not waking up task (Vincent Guittot)
- Fix s64 mult overflow in vruntime_eligible() (Zhan Xusheng)
* tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix wakeup_preempt_fair() for not waking up task
sched/fair: Fix overflow in vruntime_eligible()
selftests/rseq: Expand for optimized RSEQ ABI v2
rseq: Reenable performance optimizations conditionally
rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode
selftests/rseq: Validate legacy behavior
selftests/rseq: Make registration flexible for legacy and optimized mode
selftests/rseq: Skip tests if time slice extensions are not available
rseq: Revert to historical performance killing behaviour
rseq: Don't advertise time slice extensions if disabled
rseq: Protect rseq_reset() against interrupts
rseq: Set rseq::cpu_id_start to 0 on unregistration
selftests/rseq: Don't run tests with runner scripts outside of the scripts
|
|
Mathias Stearn reports that since v6.19, there are two big issues
affecting rseq:
(1) On arm64 specifically, rseq critical sections aren't aborted when
they should be.
(2) The 'cpu_id_start' field is no longer written by the kernel in all
cases it used to be, including some cases where TCMalloc depends on
the kernel clobbering the field.
This patch fixes issue #1. This patch DOES NOT fix issue #2, which will
need to be addressed by other patches.
The arm64-specific brokenness is a result of commits:
2fc0e4b4126c ("rseq: Record interrupt from user space")
39a167560a61 ("rseq: Optimize event setting")
The first commit failed to add a call to rseq_note_user_irq_entry() on
arm64. Thus arm64 never sets rseq_event::user_irq to record that it may
be necessary to abort an active rseq critical section upon return to
userspace. On its own, this commit had no functional impact as the value
of rseq_event::user_irq was not consumed.
The second commit relied upon rseq_event::user_irq to determine whether
or not to bother to perform rseq work when returning to userspace. As
rseq_event::user_irq wasn't set on arm64, this work would be skipped,
and consequently an active rseq critical section would not be aborted.
Fix this by giving arm64 syscall-specific entry/exit paths, and
performing the relevant logic in syscall and non-syscall paths,
including calling rseq_note_user_irq_entry() for non-syscall entry.
Currently arm64 cannot use syscall_enter_from_user_mode(),
syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to
ordering constraints with exception masking, and risk of ABI breakage
for syscall tracing/audit/etc. For the moment the entry/exit logic is
left as arm64-specific, directly using enter_from_user_mode() and
exit_to_user_mode(), but mirroring the generic code.
I intend to follow up with refactoring/cleanup, as we did for kernel
mode entry paths in commit:
041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()")
... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly.
Fixes: 39a167560a61 ("rseq: Optimize event setting")
Reported-by: Mathias Stearn <mathias@mongodb.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/
Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com
|
|
Previous change added xtables_unregister_table_pre_exit to detach the
table from the packetpath and to unlink it from the active table list.
In case of rmmod, userspace that is doing set/getsockopt for this table
will not be able to re-instantiate the table:
1. The larval table has been removed already
2. existing instantiated table is no longer on the xt pernet table list.
This adds the second stage helper:
unlink the table from the dying list, free the hook ops (if any) and do
the audit notification. It replaces xt_unregister_table().
Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default")
Reported-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Tristan Madani <tristan@talencesecurity.com>
Closes: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Remove the copypasted variants of _pre_exit and add one single
function in the xtables core. ebtables is not compatible with
x_tables and therefore unchanged.
This is a preparation patch to reduce noise in the followup
bug fixes.
Reviewed-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
arp/ip(6)t_register_table() add the table to the per-netns list via
xt_register_table() before allocating the per-netns hook ops copy
via kmemdup_array(). This leaves a window where the table is
visible in the list with ops=NULL.
If the pernet exit happens runs concurrently the pre_exit callback finds
the table via xt_find_table() and passes the NULL ops pointer to
nf_unregister_net_hooks(), causing a NULL dereference:
general protection fault in nf_unregister_net_hooks+0xbc/0x150
RIP: nf_unregister_net_hooks (net/netfilter/core.c:613)
Call Trace:
ipt_unregister_table_pre_exit
iptable_mangle_net_pre_exit
ops_pre_exit_list
cleanup_net
Fix by moving the ops allocation into the xtables core so the table is
never in the list without valid ops. Also ensure the table is no longer
processing packets before its torn down on error unwind.
nf_register_net_hooks might have published at least one hook; call
synchronize_rcu() if there was an error.
audit log register message gets deferred until all operations have
passed, this avoids need to emit another ureg message in case of
error unwinding.
Based on earlier patch by Tristan Madani.
Fixes: f9006acc8dfe5 ("netfilter: arp_tables: pass table pointer via nf_hook_ops")
Fixes: ee177a54413a ("netfilter: ip6_tables: pass table pointer via nf_hook_ops")
Fixes: ae689334225f ("netfilter: ip_tables: pass table pointer via nf_hook_ops")
Link: https://lore.kernel.org/netfilter-devel/20260429175613.1459342-1-tristmd@gmail.com/
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter, IPsec, Bluetooth and WiFi.
Current release - fix to a fix:
- ipmr: add __rcu to netns_ipv4.mrt, make sure we hold the RCU lock
in all relevant places
Current release - new code bugs:
- fixes for the recently added resizable hash tables
- ipv6: make sure we default IPv6 tunnel drivers to =m now that IPv6
itself is built in
- drv: octeontx2-af: fixes for parser/CAM fixes
Previous releases - regressions:
- phy: micrel: fix LAN8814 QSGMII soft reset
- wifi:
- cw1200: revert "Fix locking in error paths"
- ath12k: fix crash on WCN7850, due to adding the same queue
buffer to a list multiple times
Previous releases - always broken:
- number of info leak fixes
- ipv6: implement limits on extension header parsing
- wifi: number of fixes for missing bound checks in the drivers
- Bluetooth: fixes for races and locking issues
- af_unix:
- fix an issue between garbage collection and PEEK
- fix yet another issue with OOB data
- xfrm: esp: avoid in-place decrypt on shared skb frags
- netfilter: replace skb_try_make_writable() by skb_ensure_writable()
- openvswitch: vport: fix race between tunnel creation and linking
leading to invalid memory accesses (type confusion)
- drv: amd-xgbe: fix PTP addend overflow causing frozen clock
Misc:
- sched/isolation: make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
(for relevant IPVS change)"
* tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (190 commits)
net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
net: sparx5: fix wrong chip ids for TSN SKUs
net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
tcp: Fix dst leak in tcp_v6_connect().
ipmr: Call ipmr_fib_lookup() under RCU.
net: phy: broadcom: Save PHY counters during suspend
net/smc: fix missing sk_err when TCP handshake fails
af_unix: Reject SIOCATMARK on non-stream sockets
veth: fix OOB txq access in veth_poll() with asymmetric queue counts
eth: fbnic: fix double-free of PCS on phylink creation failure
net: ethernet: cortina: Drop half-assembled SKB
selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl
selftests: mptcp: check output: catch cmd errors
mptcp: pm: prio: skip closed subflows
mptcp: pm: ADD_ADDR rtx: return early if no retrans
mptcp: pm: ADD_ADDR rtx: skip inactive subflows
mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
mptcp: pm: ADD_ADDR rtx: free sk if last
mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
mptcp: pm: ADD_ADDR rtx: fix potential data-race
...
|
|
In some cases a driver using services of vsec_tpmi driver requires some
processing before vsec_tpmi exits. For example a children using debugfs
can't use debugfs as this will be deleted by the vsec_tpmi driver.
This is the case when unbind using PCI driver interface. In this case
the remove callback of vsec_tpmi driver is called first, then remove
callback of its children.
Add support of blocking chain notifiers support. Notify on successful probe
and before clean up in the remove callback.
Fixes: 811f67c51636 ("platform/x86/intel/tpmi: Add new auxiliary driver for performance limits")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stable@vger.kernel.org
Link: https://patch.msgid.link/20260430151103.1549733-3-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Pull smb server fixes from Steve French:
- Fix memory leak in connection free
- Fix inherited ACL ACE validation
- Minor cleanup
- Fix for share config
- Fix durable handle cleanup race
- Fix close_file_table_ids in session teardown
- smbdirect fixes:
- Fix memory region registration
- Two fixes for out-of-tree builds
* tag 'v7.1-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate inherited ACE SID length
ksmbd: fix kernel-doc warnings from ksmbd_conn_get/put()
ksmbd: fail share config requests when path allocation fails
ksmbd: close durable scavenger races against m_fp_list lookups
ksmbd: harden file lifetime during session teardown
ksmbd: centralize ksmbd_conn final release to plug transport leak
smb: smbdirect: fix MR registration for coalesced SG lists
smb: smbdirect: introduce and use include/linux/smbdirect.h
smb: smbdirect: make use of DEFAULT_SYMBOL_NAMESPACE and EXPORT_SYMBOL_GPL
|
|
C1-Pro cores with SME have an erratum where TLBI+DSB does not complete
all outstanding SME accesses. Instead a DSB needs to be executed on the
affected CPUs. The implication is that pages cannot be unmapped from the
host Stage 2 and then provided to a protected guest or to the
hypervisor. Host SME accesses may still complete after this point.
This erratum breaks pKVM's guarantees, and the workaround is hard to
implement as EL2 and EL1 share a security state meaning EL1 can mask
IPIs sent by EL2, leading to interrupt blackouts.
Instead, do this in EL3. This has the advantage of a separate security
state, meaning lower EL cannot mask the IPI. It is also simpler for EL3
to know about CPUs that are off or in PSCI's CPU_SUSPEND.
Add the needed hook to host_stage2_set_owner_metadata_locked(). This
covers the cases where the host loses access to a page:
__pkvm_host_donate_guest()
__pkvm_guest_unshare_host()
host_stage2_set_owner_locked() when owner_id == PKVM_ID_HYP
Since pKVM relies on the firmware call for correctness, check for the
firmware counterpart during protected KVM initialisation and fail the
pKVM initialisation if it is missing.
Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Sudeep Holla <sudeep.holla@kernel.org>
Link: https://patch.msgid.link/20260505165205.2690919-1-catalin.marinas@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The optimized RSEQ V2 mode requires that user space adheres to the ABI
specification and does not modify the read-only fields cpu_id_start,
cpu_id, node_id and mm_cid behind the kernel's back.
While the kernel does not rely on these fields, the adherence to this is a
fundamental prerequisite to allow multiple entities, e.g. libraries, in an
application to utilize the full potential of RSEQ without stepping on each
other toes.
Validate this adherence on every update of these fields. If the kernel
detects that user space modified the fields, the application is force
terminated.
Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.845230956%40kernel.org
Cc: stable@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
- Fix devm_alloc_workqueue() passing a va_list as a positional arg to
the variadic alloc_workqueue() macro, which garbled wq->name and
skipped lockdep init on the devm path. Fold both noprof entry points
onto a va_list helper.
Also, annotate it using __printf(1, 0)
* tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
workqueue: fix devm_alloc_workqueue() va_list misuse
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- During v6.19, cgroup task unlink was moved from do_exit() to after the
final task switch to satisfy a controller invariant. That left the kernel
seeing tasks past exit_signals() longer than userspace expected, and
several v7.0 follow-ups tried to bridge the gap by making rmdir wait for
the kernel side. None held up.
The latest is an A-A deadlock when rmdir is invoked by the reaper of
zombies whose pidns teardown the rmdir itself is waiting on, which
points at the synchronizing approach being fundamentally wrong.
Take a different approach: drop the wait, leave rmdir's user-visible
side returning as soon as cgroup.procs is empty, and defer the css
percpu_ref kill that drives ->css_offline() until the cgroup is fully
depopulated.
Tagged for stable. Somewhat invasive but contained. The hope is that
fixing forward sticks. If not, the fallback is to revert the entire
chain and rework on the development branch.
Note that this doesn't plug a pre-existing analogous race in
cgroup_apply_control_disable() (controller disable via
subtree_control). Not a regression. The development branch will do
the more invasive restructuring needed for that.
- Documentation update for cgroup-v1 charge-commit section that still
referenced functions removed when the memcg hugetlb try-commit-cancel
protocol was retired.
* tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: Update charge-commit section
cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix idle CPU selection returning prev_cpu outside the task's cpus_ptr
when the BPF caller's allowed mask was wider. Stable backport.
- Two opposite-direction gaps in scx_task_iter's cgroup-scoped mode
versus the global mode:
- Tasks past exit_signals() are filtered by the cgroup walk but kept
by global. Sub-scheduler enable abort leaked __scx_init_task()
state. Add a CSS_TASK_ITER_WITH_DEAD flag to cgroup's task
iterator (scx_task_iter is its only user) and use it.
- Tasks past sched_ext_dead() are still returned, tripping
WARN_ON_ONCE() in callers or making them touch torn-down state.
Mark and skip under the per-task rq lock.
* tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: idle: Recheck prev_cpu after narrowing allowed mask
sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()
cgroup, sched_ext: Include exiting tasks in cgroup iter
|
|
The recent RSEQ optimization work broke the TCMalloc abuse of the RSEQ ABI
as it not longer unconditionally updates the CPU, node, mm_cid fields,
which are documented as read only for user space. Due to the observed
behavior of the kernel it was possible for TCMalloc to overwrite the
cpu_id_start field for their own purposes and rely on the kernel to update
it unconditionally after each context switch and before signal delivery.
The RSEQ ABI only guarantees that these fields are updated when the data
changes, i.e. the task is migrated or the MMCID of the task changes due to
switching from or to per CPU ownership mode.
The optimization work eliminated the unconditional updates and reduced them
to the documented ABI guarantees, which results in a massive performance
win for syscall, scheduling heavy work loads, which in turn breaks the
TCMalloc expectations.
There have been several options discussed to restore the TCMalloc
functionality while preserving the optimization benefits. They all end up
in a series of hard to maintain workarounds, which in the worst case
introduce overhead for everyone, e.g. in the scheduler.
The requirements of TCMalloc and the optimization work are diametral and
the required work arounds are a maintainence burden. They end up as fragile
constructs, which are blocking further optimization work and are pretty
much guaranteed to cause more subtle issues down the road.
The optimization work heavily depends on the generic entry code, which is
not used by all architectures yet. So the rework preserved the original
mechanism moslty unmodified to keep the support for architectures, which
handle rseq in their own exit to user space loop. That code is currently
optimized out by the compiler on architectures which use the generic entry
code.
This allows to revert back to the original behaviour by replacing the
compile time constant conditions with a runtime condition where required,
which disables the optimization and the dependend time slice extension
feature until the run-time condition can be enabled in the RSEQ
registration code on a per task basis again.
The following changes are required to restore the original behavior, which
makes TCMalloc work again:
1) Replace the compile time constant conditionals with runtime
conditionals where appropriate to prevent the compiler from optimizing
the legacy mode out
2) Enforce unconditional update of IDs on context switch for the
non-optimized v1 mode
3) Enforce update of IDs in the pre signal delivery path for the
non-optimized v1 mode
4) Enforce update of IDs in the membarrier(RSEQ) IPI for the
non-optimized v1 mode
5) Make time slice and future extensions depend on optimized v2 mode
This brings back the full performance problems, but preserves the v2
optimization code and for generic entry code using architectures also the
TIF_RSEQ optimization which avoids a full evaluation of the exit to user
mode loop in many cases.
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending")
Reported-by: Mathias Stearn <mathias@mongodb.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Closes: https://lore.kernel.org/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com
Link: https://patch.msgid.link/20260428224427.517051752%40kernel.org
Cc: stable@vger.kernel.org
|
|
kvmalloc() now supports non-sleeping GFP flags, including
the vmalloc fallback path. This means it may return vmalloc
memory even for GFP_ATOMIC and GFP_NOWAIT allocations.
Freeing such memory with kvfree() may then end up calling
vfree(), which is not safe for non-sleeping contexts.
Introduce kvfree_atomic() helper for such cases. It mirrors
kvfree(), but uses vfree_atomic() for vmalloced memory.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
affinity management"), kthreads default to use the HK_TYPE_DOMAIN
cpumask. IOW, it is no longer affected by the setting of the nohz_full
boot kernel parameter.
That means HK_TYPE_KTHREAD should now be an alias of HK_TYPE_DOMAIN
instead of HK_TYPE_KERNEL_NOISE to correctly reflect the current kthread
behavior. Make the change as HK_TYPE_KTHREAD is still being used in
some networking code.
Fixes: 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
scx_task_iter's cgroup-scoped mode can return tasks whose
sched_ext_dead() has already completed: cgroup_task_dead() removes
from cset->tasks after sched_ext_dead() in finish_task_switch() and is
irq-work deferred on PREEMPT_RT. The global mode is fine -
sched_ext_dead() removes from scx_tasks via list_del_init() first.
Callers (sub-sched enable prep/abort/apply, scx_sub_disable(),
scx_fail_parent()) assume returned tasks are still on @sch and trip
WARN_ON_ONCE() or operate on torn-down state otherwise.
Set %SCX_TASK_OFF_TASKS in sched_ext_dead() under @p's rq lock and
have scx_task_iter_next_locked() skip flagged tasks under the same
lock. Setter and reader serialize on the per-task rq lock - no race.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks so cgroup.procs stays consistent
with waitpid() visibility. Unfortunately, this broke scx_task_iter.
scx_task_iter walks either scx_tasks (global) or a cgroup subtree via
css_task_iter() and the two modes are expected to cover the same set of
tasks. After the above change the cgroup-scoped mode silently skips tasks
past exit_signals() that are still on scx_tasks.
scx_sub_enable_workfn()'s abort path is one of the symptoms: an exiting
SCX_TASK_SUB_INIT task can race past the cgroup iter leaking
__scx_init_task() state. Other iterations share the same gap.
Add CSS_TASK_ITER_WITH_DEAD to opt out of the skip and use it from
scx_task_iter().
Fixes: b0e4c2f8a0f0 ("sched_ext: Implement cgroup subtree iteration for scx_task_iter")
Reported-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
A chain of commits going back to v7.0 reworked rmdir to satisfy the
controller invariant that a subsystem's ->css_offline() must not run while
tasks are still doing kernel-side work in the cgroup.
[1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
[2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
[3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
[4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition")
[5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context")
[1] moved task cset unlink from do_exit() to finish_task_switch() so a
task's cset link drops only after the task has fully stopped scheduling.
That made tasks past exit_signals() linger on cset->tasks until their final
context switch, which led to a series of problems as what userspace expected
to see after rmdir diverged from what the kernel needs to wait for. [2]-[5]
tried to bridge that divergence: [2] filtered the exiting tasks from
cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4]
fixed the wait's condition; [5] made nr_dying_subsys_* visible
synchronously.
The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the
rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g.
host PID 1 systemd reaping orphan pids that were re-parented to it during
the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those
pids to free, the pids can't free because PID 1 is the reaper and it's stuck
in rmdir, and the system A-A deadlocks. No internal lock ordering breaks
this; the wait itself is the bug.
The css killing side that drove the original reorder, however, can be made
cleanly asynchronous: ->css_offline() is already async, run from
css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to
make that chain start only after all tasks have left the cgroup. rmdir's
user-visible side then returns as soon as cgroup.procs and friends are
empty, while ->css_offline() still runs only after the cgroup is fully
drained.
Verified by the original reproducer (pidns teardown + zombie reaper, runs
under vng) which hangs vanilla and succeeds here, and by per-commit
deterministic repros for [2], [3], [4], [5] with a boot parameter that
widens the post-exit_signals() window so each state is reliably reachable.
Some stress tests on top of that.
cgroup_apply_control_disable() has the same shape of pre-existing race:
when a controller is disabled via subtree_control, kill_css() ran
synchronously while tasks past exit_signals() could still be linked to
the cgroup's csets, and ->css_offline() could fire before they drained.
This patch preserves the existing synchronous behavior at that call site
(kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch
will defer kill_css_finish() there using a per-css trigger.
This seems like the right approach and I don't see problems with it. The
changes are somewhat invasive but not excessively so, so backporting to
-stable should be okay. If something does turn out to be wrong, the fallback
is to revert the entire chain ([1]-[5]) and rework in the development branch
instead.
v2: Pin cgrp across the deferred destroy work with explicit
cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1
wasn't actually broken (ordered cgroup_offline_wq + queue_work order
in cgroup_task_dead() saved it) but the explicit ref removes the
dependency on those non-obvious invariants. Also note the
pre-existing cgroup_apply_control_disable() race in the description;
a follow-up will defer kill_css_finish() there.
Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
Cc: stable@vger.kernel.org # v7.0+
Reported-and-tested-by: Martin Pitt <martin@piware.de>
Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/
Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Pull drm fixes from Dave Airlie:
"Fixes for rc2, the usual amdgpu/xe double header, I think xe had a
couple of weeks combined due to some maintainer access issues,
otherwise there's just a few misc fixes and documentation fixups.
core and helpers:
- calculate framebuffer geometry with format helpers
- fix docs
amdgpu:
- GFX12 fix for CONFIG_DRM_DEBUG_MM configs
- Fix DC analog support
- Userq fixes
- GART placement fix
- Aldebaran SMU fixes
- AMDGPU_INFO_READ_MMR_REG fix
- UVD 3.1 fix
- GC 6 TCC fix
- Fix root reservation in amdgpu_vm_handle_fault()
- RAS fix
- Module reload fix for APUs
- Fix build for CONFIG_DRM_FBDEV_EMULATION=n
- IGT DWB regression fix
- GC 11.5.4 fix
- VCN user fence fixes
- JPEG user fence fixes
- SMU 13.0.6 fix
- VCN 3/4 IB parser fixes
- NV3x+ dGPU vblank fix
- DCE6/8 fixes for LVDS/eDP panels without an EDID
amdkfd:
- Fix for when CONFIG_HSA_AMD is not set
- SVM fixes
xe:
- uapi: Add missing pad and extensions check
- uapi: Reject unsafe PAT indices for CPU cached memory
- Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge
- Xe3p tuning and workaround fixes
- USE drm mm instead of drm SA for CCS read/write
- Fix leaks and null derefs
- Fix Wa_18022495364
appletbdrm:
- allocate protocol buffers with kvzalloc()
dma-buf:
- fix docs
imagination:
- avoid segfault in debugfs
ofdrm:
- put PCI device reference on errors
udl:
- increase USB timeout"
* tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernel: (77 commits)
drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR
drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise
drm/xe/xelp: Fix Wa_18022495364
drm/xe/gsc: Fix BO leak on error in query_compatibility_version()
drm/xe/eustall: Fix drm_dev_put called before stream disable in close
drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()
drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import()
drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure
drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked()
drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked()
drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked
drm/xe/vf: Use drm mm instead of drm sa for CCS read/write
drm/xe: Add memory pool with shadow support
drm/xe/debugfs: Correct printing of register whitelist ranges
drm/xe: Mark ROW_CHICKEN5 as a masked register
drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL
drm/xe/xe3p_lpg: Add missing indirect ring state feature flag
drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138
drm/xe/vm: Add missing pad and extensions check
drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Replace skb_try_make_writable() by skb_ensure_writable() in
nft_fwd_netdev and the flowtable to deal with uncloned packets
having their network header in paged fragments.
2) Drop packet if output device does not exist and ensure sufficient
headroom in nft_fwd_netdev before transmitting the skb.
3) Use the existing dup recursion counter in nft_fwd_netdev for the
neigh_xmit variant, from Weiming Shi.
4) Add .check_hooks interface to x_tables to detach the control plane
hook check based on the match/target configuration. Then, update
nft_compat to use .check_hooks from .validate path, this fixes a
lack of hook validation for several match/targets.
5) Fix incorrect .usersize in xt_CT, from Florian Westphal.
6) Fix a memleak with netdev tables in dormant state,
from Florian Westphal.
7) Several patches to check if the packet is a fragment, then skip
layer 4 inspection, for x_tables and nf_tables; as well as common
nf_socket infrastructure. The xt_hashlimit match drops fragments
to stay consistent with the existing approach when failing to parse
the layer 4 protocol header.
8) Ensure sufficient headroom in the flowtable before transmitting
the skb.
9) Fix the flowtable inline vlan approach for double-tagged vlan:
Reverse the iteration over .encap[] since it represents the
encapsulation as seen from the ingress path. Postpone pushing
layer 2 header so output device is available to calculate needed
headroom. Finally, add and use nf_flow_vlan_push() to fix it.
10) Fix flowtable inline pppoe with GSO packets. Moreover, use
FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware
address since neighbour cache does not exist in pppoe.
11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for
double-tagged vlan in particular this should provide some benefits
in certain scenarios.
More notes regarding 9-11):
- sashiko is also signalling to use it for IPIP headers, but that needs
more adjustments such setting skb->protocol after removing the IPIP
header, will follow up in a separated patch.
- I plan to submit selftests to cover double-tagged-vlan. As for pppoe,
it should be possible but that would mandate a few userspace dependencies.
This has been semi-automatically tested by me and reporters describing
broken double-vlan-tagged and pppoe currently in the flowtable.
* tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header
netfilter: flowtable: fix inline pppoe encapsulation in xmit path
netfilter: flowtable: fix inline vlan encapsulation in xmit path
netfilter: flowtable: ensure sufficient headroom in xmit path
netfilter: xtables: fix L4 header parsing for non-first fragments
netfilter: nf_tables: skip L4 header parsing for non-first fragments
netfilter: nf_socket: skip socket lookup for non-first fragments
netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
netfilter: xt_CT: fix usersize for v1 and v2 revision
netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
netfilter: x_tables: add .check_hooks to matches and targets
netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
netfilter: replace skb_try_make_writable() by skb_ensure_writable()
====================
Link: https://patch.msgid.link/20260501122237.296262-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This makes it easier to rebuild cifs.ko and ksmbd.ko against
a running kernel.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
rseq_reset() uses memset() to clear the tasks rseq data. That's racy
against membarrier() and preemption.
Guard it with irqsave to cure this.
Fixes: faba9d250eae ("rseq: Introduce struct rseq_data")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.353887714%40kernel.org
Cc: stable@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- MD pull request via Yu:
- Fix a raid5 UAF on IO across the reshape position
- Avoid failing RAID1/RAID10 devices for invalid IO errors
- Fix RAID10 divide-by-zero when far_copies is zero
- Restore bitmap grow through sysfs
- Use mddev_is_dm() instead of open-coding gendisk checks
- Use ATTRIBUTE_GROUPS() for md default sysfs attributes
- Replace open-coded wait loops with wait_event helpers
- NVMe pull request via Keith:
- Target data transfer size configuation (Aurelien)
- Enable P2P for RDMA (Shivaji Kant)
- TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar)
- TCP host updates (Alistair, Chaitanya)
- Authentication updates (Alistair, Daniel, Chris Leech)
- Multipath fixes (John Garry)
- New quirks (Alan Cui, Tao Jiang)
- Apple driver fix (Fedor Pchelkin)
- PCI admin doorbell update fix (Keith)
- Properly propagate CDROM read-only state to the block layer
* tag 'block-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (35 commits)
md: use ATTRIBUTE_GROUPS() for md default sysfs attributes
md: use mddev_is_dm() instead of open-coding gendisk checks
md/raid1: replace wait loop with wait_event_idle() in raid1_write_request()
md/md-bitmap: add a none backend for bitmap grow
md/md-bitmap: split bitmap sysfs groups
md: factor bitmap creation away from sysfs handling
md: use mddev_lock_nointr() in mddev_suspend_and_lock_nointr()
md: replace wait loop with wait_event() in md_handle_request()
md/raid10: fix divide-by-zero in setup_geo() with zero far_copies
md/raid1,raid10: don't fail devices for invalid IO errors
MAINTAINERS: Add Xiao Ni as md/raid reviewer
md/raid5: Fix UAF on IO across the reshape position
cdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro()
nvme-auth: Hash DH shared secret to create session key
nvme-pci: fix missed admin queue sq doorbell write
nvme-auth: Include SC_C in RVAL controller hash
nvme-tcp: teardown circular locking fixes
nvmet-tcp: Don't clear tls_key when freeing sq
Revert "nvmet-tcp: Don't free SQ on authentication success"
nvme: skip trace completion for host path errors
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"20 hotfixes. All are for MM (and for MMish maintainers). 9 are
cc:stable and the remainder are for post-7.0 issues or aren't deemed
suitable for backporting.
There are two DAMON series from SeongJae Park which address races
which could lead to use-after-free errors, and avoid the possibility
of presenting stale parameter values to users"
* tag 'mm-hotfixes-stable-2026-04-30-15-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: memcontrol: fix rcu unbalance in get_non_dying_memcg_end()
mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()
MAINTAINERS: remove stale kdump project URL
mm/damon/stat: detect and use fresh enabled value
mm/damon/lru_sort: detect and use fresh enabled and kdamond_pid values
mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
selftests/mm: specify requirement for PROC_MEM_ALWAYS_FORCE=y
mm/damon/sysfs-schemes: protect path kfree() with damon_sysfs_lock
mm/damon/sysfs-schemes: protect memcg_path kfree() with damon_sysfs_lock
MAINTAINERS: update Li Wang's email address
MAINTAINERS, mailmap: update email address for Qi Zheng
MAINTAINERS: update Liam's email address
mm/hugetlb_cma: round up per_node before logging it
MAINTAINERS: fix regex pattern in CORE MM category
mm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap()
mm: start background writeback based on per-wb threshold for strictlimit BDIs
kho: fix error handling in kho_add_subtree()
liveupdate: fix return value on session allocation failure
mailmap: update entry for Dan Carpenter
vmalloc: fix buffer overflow in vrealloc_node_align()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Besides an out-of-bound bug, this is about properly supporting Winbond
octal SPI NAND chips which use a specific pattern for stuffing more
address bits in some operations. This uses the spi-mem flag in SPI
NAND that was added to the spi-mem layer just before the merge window
through the spi tree"
* tag 'mtd/fixes-for-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spinand: winbond: Fix ODTR write VCR on W35NxxJW
mtd: spinand: winbond: Set the packed page read flag to W35N02/04JW
mtd: spinand: Add support for packed read data ODTR commands
mtd: spi-nor: debugfs: fix out-of-bounds read in spi_nor_params_show()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- ipmr: free mr_table after RCU grace period.
Previous releases - regressions:
- core: add net_iov_init() and use it to initialize ->page_type
- sched: taprio: fix NULL pointer dereference in class dump
- netfilter: nf_tables:
- use list_del_rcu for netlink hooks
- fix strict mode inbound policy matching
- tcp: make probe0 timer handle expired user timeout
- vrf: fix a potential NPD when removing a port from a VRF
- eth: ice:
- fix NULL pointer dereference in ice_reset_all_vfs()
- fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
Previous releases - always broken:
- page_pool: fix memory-provider leak in error path
- sched: sch_cake: annotate data-races in cake_dump_stats()
- mptcp: fix scheduling with atomic in timestamp sockopt
- psp: check for device unregister when creating assoc
- tls: fix strparser anchor skb leak on offload RX setup failure
- eth:
- stmmac: prevent NULL deref when RX memory exhausted
- airoha: do not read uninitialized fragment address
- rtl8150: fix use-after-free in rtl8150_start_xmit()
Misc:
- add Ido Schimmel as IPv4/IPv6 maintainer
- add David Heidelberg as NFC subsystem maintainer"
* tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net/sched: cls_flower: revert unintended changes
sfc: fix error code in efx_devlink_info_running_versions()
net: tls: fix strparser anchor skb leak on offload RX setup failure
ice: add dpll peer notification for paired SMA and U.FL pins
ice: fix missing dpll notifications for SW pins
dpll: export __dpll_pin_change_ntf() for use under dpll_lock
ice: fix SMA and U.FL pin state changes affecting paired pin
ice: fix missing SMA pin initialization in DPLL subsystem
ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
ice: fix NULL pointer dereference in ice_reset_all_vfs()
iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
iavf: wait for PF confirmation before removing VLAN filters
iavf: stop removing VLAN filters from PF on interface down
iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
page_pool: fix memory-provider leak in page_pool_create_percpu() error path
bonding: 3ad: implement proper RCU rules for port->aggregator
net: airoha: Do not return err in ndo_stop() callback
hv_sock: fix ARM64 support
MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
selftests: drv-net: clarify linters and frameworks in README
...
|