| author | Mike Marshall <hubcap@omnibond.com> | 2026-04-13 11:18:23 -0400 |
|---|---|---|
| committer | Mike Marshall <hubcap@omnibond.com> | 2026-04-13 12:14:17 -0400 |
| commit | e61bc5e4d87433c8759e7dc92bb640ef71a8970c | |
| tree | c4e0e50b5691128f37b6413e444f3ae5f466f11e | |
| parent | 092e0d0e964279feb9f43f81e8d1c52ef080d085 | |
bufmap: manage as folios, V2.
Thanks for the feedback from Dan Carpenter and Arnd Bergmann.
Dan suggested making the rollback loop in orangefs_bufmap_map
more robust.
Arnd caught a %ld format specifier used for a size_t in
orangefs_bufmap_copy_to_iovec. He suggested %zd; I used %zu,
which I think is also fine since size_t is unsigned.
Orangefs userspace allocates 40 megabytes at a page-aligned
address.
With this folio modification the allocation is aligned on a multiple of
2 megabytes:
posix_memalign(&ptr, 2097152, 41943040);
Then userspace tries to enable Huge Pages for the range:
madvise(ptr, 41943040, MADV_HUGEPAGE);
Userspace provides the address of the 40 megabyte allocation to
the Orangefs kernel module with an ioctl.
The kernel module initializes the memory as a "bufmap" with ten
4 megabyte "slots".
Traditionally, the slots are manipulated a page at a time.
This folio/bufmap modification manages the slots as folios, with
two 2 megabyte folios per slot, so data can be read into
and out of each slot a folio at a time.
This modification also works when orangefs userspace lacks
the THP-focused posix_memalign and madvise settings listed above;
in that case each slot can end up being made of page-sized folios.
It also works if some, but fewer than 20, hugepages are available.
A message describing the folio/page ratio is printed to the kernel
ring buffer (dmesg) when userspace starts. As an example, I started
orangefs and saw "Grouped 2575 folios from 10240 pages" in the ring
buffer.
To get the optimum ratio, 20/10240, I use these settings before
starting the orangefs userspace:
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag
echo 30 > /proc/sys/vm/nr_hugepages
https://docs.kernel.org/admin-guide/mm/hugetlbpage.html discusses
hugepages and manipulating the /proc/sys/vm settings.
Comparing the performance between the page/bufmap and the folio/bufmap
is a mixed bag.
- The folio/bufmap version is about 8% faster at running through the
xfstest suite on my VMs.
- It is easy to construct an fio test that brings the page/bufmap
version to its knees on my dinky VM test system, with all bufmap
slots used and I/O timeouts cascading.
- Some smaller tests I did with fio that didn't overwhelm the
page/bufmap version showed no performance gain with the
folio/bufmap version on my VM.
I suspect this change will improve performance only in some use-cases.
I think it will be a gain when there are many concurrent IOs that
mostly fill the bufmap. I'm working up a gcloud test for that.
Reported-by: Dan Carpenter <error27@gmail.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
