| Age | Commit message (Collapse) | Author |
|
State that the object charge is zero if object->cred == NULL, or equal
to the ptoa(object->size) otherwise.
Besides being much simpler, the transition to use object->size corrects
the architectural issue with the use of object->charge. The split
operations effectively carve the holes in the charged regions, but
single counter cannot properly express it. As result, coalescing
anonymous mappings cannot calculate correctly if the extended mapping
already backed by the existing object is already accounted or not [1].
To properly solve the issue, either we need to start tracking exact
charged regions in the anonymous objects, which has the significant
overhead and complications. Or give up on the slight over-accounting
and charge the whole object unconditionally, as it is done in the patch.
Reported by: mmel, pho [1]
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D54572
|
|
Remove twice included but unneeded explicit sys/param.h. Sort.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
|
|
Walk through the disk driver entries chained off of INT13.
MEMDISK is part of the Syslinux project; it loads disk images into
memory, sets an int 13h hook and then does a BIOS boot from the image;
this can be used as part of a PXE boot environment to load installer
disks, however the disks are not accessible from inside the FreeBSD
kernel because it doesn't access disks through BIOS APIs.
This patch detects the disk images in the loader, and passes their
address and length as a driver hint. When the md driver sees the hint,
it maps the image, and presents it to the system.
(rebased and reworked from https://reviews.freebsd.org/D27349)
Feedback from: kib, bapt, olce
Differential Revision: https://reviews.freebsd.org/D45404
|
|
We've done the same in the past to the vnconfig.8->mdconfig.8 link in:
eb5f4569819 Remove ancient vnconfig symlink
Reviewed by: bcr, markj, ziaee
Approved by: markj (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27122
|
|
mddestroy() may be invoked on a partially constructed md device.
Restore the guards that handled this prior to commit e91022168101.
Reported by: syzbot+a0ff73f664de8757cfaa@syzkaller.appspotmail.com
Reported by: syzbot+7b4a4824bf81548283ab@syzkaller.appspotmail.com
Reviewed by: kib
Fixes: e91022168101 ("md(4): move type-specific data under union")
Differential Revision: https://reviews.freebsd.org/D51145
|
|
This way it is clear which type uses which members.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D51127
|
|
With the old size, the string could easily be truncated, resulting in
non-unique identifiers.
PR: 287679
Reported by: Phil Krylov <phil@krylov.eu>
Reviewed by: kib
MFC after: 2 weeks
|
|
embedfs.S needs the right aarch64 features for BTI and/or PAC.
Obtained from: CheriBSD
Fixes: c2e0d56f5e49 ("arm64: Support BTI checking in most of the kernel")
Sponsored by: AFRL, DARPA
|
|
Do it also for the preloaded disk, in addition to the dynamically
configured device. This is needed to avoid geom checking alignment and
panicing on read of the last sector, e.g. for partition schemes and
label tasting.
PR: 281978
Reported by: bz
Reviewed by: bz, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47102
|
|
If those options are requested when the device is created, ensure that
they will be reported by MDIOCQUERY.
MFC after: 2 weeks
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1270
|
|
While here, use bp->bio_cmd instead of auio.uio_rw to drive read vs
write behavior.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45155
|
|
In cd8537910406, kib made maxphys a load-time tunable. This made
the #define MAXPHYS in sys/param.h almost entirely obsolete, as
it could now be overridden by kern.maxphys at boot time, or by
opt_maxphys.h.
However, decades of tradition have led to several new, incorrect, uses
of MAXPHYS in other parts of the kernel, mostly by seasoned
developers. I've corrected those uses here in a mechanical fashion,
and verified that it fixes a bug in the md driver that I was
experiencing.
Since using MAXPHYS is such an easy mistake to make, it is best to
hide it from the kernel namespace. So I've moved its definition to
_maxphys.h, which is now included in param.h only for userspace.
That brings up the fact that lots of userspace programs use MAXPHYS
for different reasons, most of them probably wrong. Userspace consumers
that really need to know the value of maxphys should probably be
changed to use the kern.maxphys sysctl. But that's outside the scope
of this change.
Reviewed by: imp, jkim, kib, markj
Fixes: 30038a8b4efc ("md: Get rid of the pbuf zone")
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44986
|
|
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
|
|
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
|
Because the 32-bit md_ioctl structure contains 64-bit members, arm
and powerpc add padding to a multiple of 8. i386 doesn't do this.
The md_ioctl32 definition was correct for amd64/i386 without padding,
but wrong for arm64 and powerpc64. Make __packed__ conditional on
__amd64__, and test for the expected size on non-amd64. Note that
mdconfig is used in the ATF test suite. Note, I verified the
structure size for powerpc, but was unable to test.
MFC after: 1 week
Reviewed by: jrtc27
Differential Revision: https://reviews.freebsd.org/D41339
Discussed with: jhibbits
|
|
Currently for the MFS, firmware and VDSO template assembly files we pass
the path to include with .incbin unquoted and use __XSTRING within the
assembly file to stringify it. However, __XSTRING doesn't just perform a
single level of expansion, it performs the normal full expansion of the
macro, and so if the path itself happens to tokenise to something that
includes a defined macro in it that will itself be substituted. For
example, with #define MACRO 1, a path like /path/containing/MACRO/in/it
will expand to /path/containing/1/in/it and then, when stringified, end
up as "/path/containing/1/in/it", not the intended string. Normally,
macros have names that start or end witih underscores and are unlikely
to appear in a tokenised path (even if technically they could), but now
that we've switched to GNU C as of commit ec41a96daaa6 ("sys: Switch the
kernel's C standard from C99 to GNU99.") there are a few new macros
defined which don't start or end with underscores: unix, which is always
defined to 1, and i386, which is defined to 1 on i386. The former
probably doesn't appear in user paths in practice, but the latter has
been seen to and is likely quite common in the wild.
Fix this by defining the macro pre-quoted instead of using __XSTRING.
Note that technically we don't need to do this for vdso_wrap.S today as
all the paths passed to it are safe file names with no user-controlled
prefix but we should do it anyway for consistency and robustness against
future changes.
This allows make tinderbox to pass when built with source and object
directories inside ~/path-with-unix, which would otherwise expand to
~/path-with-1 and break.
PR: 272744
Fixes: ec41a96daaa6 ("sys: Switch the kernel's C standard from C99 to GNU99.")
|
|
The zone is used solely to provide KVA for mapping BIOs so that we can
pass mapped buffers to VOP_READ and VOP_WRITE. Currently we preallocate
nswbuf/10 bufs for this purpose during boot.
The intent was to limit KVA usage on 32-bit systems, but the
preallocation means that we in fact consumed more KVA than needed unless
one has more than nswbuf/10 (typically 25) vnode-backed MD devices
in existence, which I would argue is the uncommon case.
Meanwhile, all I/O to an MD is handled by a dedicated thread, so we can
instead simply preallocate the KVA region at MD device creation time.
Event: BSDCan 2023
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40215
|
|
Noted by: jkim
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
|
|
We need to repeat the operation if the vnode was relocked.
Reported and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38114
|
|
|
|
backend.
PR: 260200
Reported by: editor@callfortesting.org
Reviewed by: vmaffione (mentor), markj
Approved by: vmaffione (mentor), markj
Differential Revision: https://reviews.freebsd.org/D34260
|
|
This was missed in 74d6c131cbe2 where other geom modules were annotated
with MODULE_VERSION. Again, the problem is the same: we can't detect
that geom_md is loaded into the kernel without it.
This was noticed in release builds on the cluster; mdconfig attempts to
load geom_md because it can't detect it in the kernel, but the cluster
config includes md(4) and does not build the kmod. This problem would
have been masked on hosts with the kmod built, as the kmod attempts to
register the g_md module and fails. With this commit, mdconfig would
not even try to load it again.
Reported by: re (cperciva)
MFC after: 3 days
|
|
See b4a58fbf640409a1 ("vfs: remove cn_thread")
Bump __FreeBSD_version to 1400043.
|
|
This adds an option to detect if hole-punching is implemented by the
underlying file system. If this flag is set, and if the underlying file
system does not support hole-punching, md(4) fails BIO_DELETE requests
with EOPNOTSUPP.
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D31883
|
|
We do this when creating md(4) devices, in kern_mdattach_locked(), but
not when resizing the provider. Apply the same policy when resizing, as
many GEOM classes do not expect to deal with providers for which
pp->mediasize % pp->sectorsize != 0.
Reported by: syzkaller
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
|
|
Both zero-filling and/or deallocation can be done with vn_deallocate(9).
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D28899
|
|
LLVM12 complains if you change the symbol binding:
error: mfs_root_end changed binding to STB_WEAK [-Werror,-Winline-asm]
error: mfs_root changed binding to STB_WEAK [-Werror,-Winline-asm]
|
|
Release a grabbed page's busy state only after marking it as referenced.
Otherwise there exists a narrow window where the page could be freed
before the update. Before r356902 this was not a problem since the
object lock was held.
Discussed with: kib
Sponsored by: The FreeBSD Foundation
|
|
Account for any residual bytes. This is only relevant for vnode-backed
md(4) devices.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27738
|
|
g_handleattr_int() consumes the bio if the attribute matches, so when we
check bp->bio_cmd bp may have been freed.
Move GETATTR handling to a separate function to avoid the problem. We
do not need to set bio_completed for such bios, g_handleattr_int() will
handle it. Also remove the setting of bio_resid before the
devstat_end_transaction_bio() call. All of the md(4) bio handlers set
bio_resid already.
Reported by: KASAN
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27724
|
|
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
Notes:
svn path=/head/; revision=368124
|
|
Approved by: kaktus (src)
Notes:
svn path=/head/; revision=367618
|
|
This uses the .incbin directive to pull in the MFS image contents.
Using assembly directly ensures that symbols can be defined with the
name and properties (such as .size) desired without having to rename
symbols, etc. via a second objcopy invocation. Since it is compiled
by the C compiler driver, it also avoids the need for all of the
EMBEDFS* make variables.
Suggested by: jrtc27
Reviewed by: kib, markj
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26781
Notes:
svn path=/head/; revision=366897
|
|
Notes:
svn path=/head/; revision=365211
|
|
Reported by: alc
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25400
Notes:
svn path=/head/; revision=362739
|
|
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object. The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25329
Notes:
svn path=/head/; revision=362613
|
|
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23847
Notes:
svn path=/head/; revision=358447
|
|
The vnode pager does not want the object lock held. Moving this out allows
further object lock scope reduction in callers. While here add some missing
paging in progress calls and an assert. The object handle is now protected
explicitly with pip.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23033
Notes:
svn path=/head/; revision=356902
|
|
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
Notes:
svn path=/head/; revision=356337
|
|
r356147 removed a vm_page_activate() call, but this is required to
ensure that pages end up in the page queues in the first place.
Restore the pre-r356157 logic. Now, without the page lock, the
vm_page_active() check is racy, but this race is harmless.
Reviewed by: alc, kib
Reported and tested by: pho
Differential Revision: https://reviews.freebsd.org/D23024
Notes:
svn path=/head/; revision=356326
|
|
Alike to geom_disk free the provider statistics structure and point GEOM
toward local statistics. It allows to save some CPU time.
MFC after: 2 weeks
Notes:
svn path=/head/; revision=356315
|
|
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places. It would be cool if we had
more SMP-friendly statistics, but this helps too.
Sponsored by: iXsystems, Inc.
Notes:
svn path=/head/; revision=356200
|
|
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page. It is also no longer needed in
the page queue scans. This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22885
Notes:
svn path=/head/; revision=356157
|
|
an exclusive object lock.
Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy. This may be
done inconsistently and requires the object lock which is not always
convenient.
Instead, track when swap space is present. The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.
Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.
Discussed with: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22654
Notes:
svn path=/head/; revision=355765
|
|
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.
v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
Notes:
svn path=/head/; revision=355537
|
|
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22611
Notes:
svn path=/head/; revision=355314
|
|
Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
Notes:
svn path=/head/; revision=353539
|
|
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
Notes:
svn path=/head/; revision=352110
|
|
It is unused, the ABI was broken in r322969, and it is broken by design
(more than MDNPAD md devices can exist and there is no way to retreive
them with this interface).
mdconfig(8) was converted to use libgeom to obtain this information
in r157160 and any other consumers of MDIOCLIST should likewise be
converted.
Reviewed by: emaste
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18936
Notes:
svn path=/head/; revision=351132
|
|
I/O operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the device driver
needs to wait for the draining to happen. The waiting is done by
adding a g_md_providergone() function to wait for the I/O operations
to finish up.
It is likely that every GEOM provider that implements orphaning
attached GEOM consumers needs to use the "providergone" mechanism
for this same reason, but some of them do not do so. Apparently
Kenneth Merry (ken@) added the drain for just such races, but he
missed adding it to some of the device drivers that needed it.
Submitted by: Chuck Silvers
Reviewed by: imp
Tested by: Chuck Silvers
MFC after: 1 week
Sponsored by: Netflix
Notes:
svn path=/head/; revision=345758
|