linux-stable.git/include/linux/dax.h, branch v5.8.2

mm: don't include asm/pgtable.h if linux/mm.h is already included

2020-06-09T16:39:13+00:00

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definitions of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version, there is always the
possibility to override the generic version with the usual ifdef magic.
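
The override pattern mentioned above can be sketched in userspace roughly
as follows. This is an illustration, not the kernel header itself: the
values PMD_SHIFT = 21 and PTRS_PER_PMD = 512 are assumptions (they match
x86-64 with 4 KiB pages), and real values come from the architecture.

```c
/* Assumed values for illustration only (x86-64, 4 KiB pages). */
#define PMD_SHIFT     21
#define PTRS_PER_PMD  512

/*
 * An architecture that needs a custom version would provide its own
 * pmd_index() plus "#define pmd_index pmd_index" before this point,
 * so that the generic definition below is compiled out.
 */
#ifndef pmd_index
static inline unsigned long pmd_index(unsigned long address)
{
	/* Select the PMD-level index bits of the virtual address. */
	return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}
#define pmd_index pmd_index
#endif
```

With these assumed constants, an address of 0x200000 falls into PMD slot 1,
and the index wraps back to 0 every 1 GiB.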

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are removed with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport 
Signed-off-by: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Borislav Petkov 
Cc: Brian Cain 
Cc: Catalin Marinas 
Cc: Chris Zankel 
Cc: "David S. Miller" 
Cc: Geert Uytterhoeven 
Cc: Greentime Hu 
Cc: Greg Ungerer 
Cc: Guan Xuetao 
Cc: Guo Ren 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Ingo Molnar 
Cc: Ley Foon Tan 
Cc: Mark Salter 
Cc: Matthew Wilcox 
Cc: Matt Turner 
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Michal Simek 
Cc: Mike Rapoport 
Cc: Nick Hu 
Cc: Paul Walmsley 
Cc: Richard Weinberger 
Cc: Rich Felker 
Cc: Russell King 
Cc: Stafford Horne 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: Vincent Chen 
Cc: Vineet Gupta 
Cc: Will Deacon 
Cc: Yoshinori Sato 
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds

dax,iomap: Add helper dax_iomap_zero() to zero a range

2020-04-03T02:15:03+00:00

Add a helper dax_iomap_zero() to zero a range. This patch basically
merges __dax_zero_page_range() and iomap_dax_zero().

Suggested-by: Christoph Hellwig 
Signed-off-by: Vivek Goyal 
Reviewed-by: Christoph Hellwig 
Link: https://lore.kernel.org/r/20200228163456.1587-7-vgoyal@redhat.com
Signed-off-by: Dan Williams

dax, pmem: Add a dax operation zero_page_range

2020-04-03T02:15:03+00:00

Add a dax operation zero_page_range, to zero a page. This will also clear any
known poison in the page being zeroed.

As of now, only one page can be zeroed in a single call. There are no
callers that try to zero more than a page in a single call. Once such
callers appear, that support can be added. The primary reason for not
doing so yet is that it would add some complexity to the dm implementation,
where a range might span multiple underlying targets and would have to be
split into multiple sub-ranges, with zero_page_range() called on the
individual targets.
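
Given the one-page limit described above, a caller that wanted to zero
several pages would have to loop. A minimal userspace sketch of that
idea follows; the callback type and function names are hypothetical and
are not the kernel's zero_page_range() interface.

```c
#include <stddef.h>

/* Hypothetical stand-in for an operation that zeroes exactly one page. */
typedef int (*zero_one_page_fn)(unsigned long pgoff, void *ctx);

/* Example callback that only counts how many pages it was asked to zero. */
static int count_zeroed_page(unsigned long pgoff, void *ctx)
{
	(void)pgoff;
	(*(size_t *)ctx)++;
	return 0;
}

/*
 * Zero nr_pages pages starting at pgoff, one page per call.
 * Returns 0 on success, or the first non-zero value the callback returns.
 */
static int zero_pages_one_at_a_time(unsigned long pgoff, size_t nr_pages,
				    zero_one_page_fn zero_one, void *ctx)
{
	for (size_t i = 0; i < nr_pages; i++) {
		int rc = zero_one(pgoff + i, ctx);

		if (rc)
			return rc;
	}
	return 0;
}
```

Keeping the per-call unit at one page means each call lands on exactly
one underlying target, which is what avoids the range-splitting problem.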

Suggested-by: Christoph Hellwig 
Signed-off-by: Vivek Goyal 
Reviewed-by: Pankaj Gupta 
Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com
Signed-off-by: Dan Williams

dax: Get rid of fs_dax_get_by_host() helper

2020-01-16T17:52:27+00:00

Looks like nobody is using fs_dax_get_by_host() except fs_dax_get_by_bdev()
and it can easily use dax_get_by_host() instead.

IIUC, fs_dax_get_by_host() was only introduced so that one could compile
with CONFIG_FS_DAX=n and CONFIG_DAX=m. fs_dax_get_by_bdev() achieves
the same purpose and hence it looks like fs_dax_get_by_host() is not
needed anymore.

Signed-off-by: Vivek Goyal 
Reviewed-by: Christoph Hellwig 
Link: https://lore.kernel.org/r/20200106181117.GA16248@redhat.com
Signed-off-by: Dan Williams

dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()

2020-01-03T19:13:12+00:00

As of now dax_writeback_mapping_range() takes "struct block_device" as a
parameter and dax_dev is searched from bdev name. This also involves taking
a fresh reference on dax_dev and putting that reference at the end of
function.

We are developing a new filesystem virtio-fs and using dax to access host
page cache directly. But there is no block device. IOW, we want to make
use of dax but want to get rid of this assumption that there is always
a block device associated with dax_dev.

So pass in "struct dax_device" as parameter instead of bdev.

ext2/ext4/xfs are current users and they already have a reference on
dax_device. So there is no need to take reference and drop reference to
dax_device on each call of this function.

Suggested-by: Christoph Hellwig 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Signed-off-by: Vivek Goyal 
Link: https://lore.kernel.org/r/20200103183307.GB13350@redhat.com
Signed-off-by: Dan Williams

dax: check synchronous mapping is supported

2019-07-05T22:19:10+00:00

This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. The virtio pmem device is asynchronous and
does not support VM_SYNC.
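
The check described above can be modelled in userspace along these
lines. The struct layouts, the MOCK_VM_SYNC bit value and the function
name are assumptions made for illustration; the kernel helper operates
on the real vm_area_struct and dax_device.

```c
#include <stdbool.h>

#define MOCK_VM_SYNC 0x1UL	/* placeholder bit, not the kernel's value */

/* Simplified stand-ins for the kernel structures. */
struct mock_dax_device {
	bool synchronous;	/* device flushes synchronously */
};

struct mock_vma {
	unsigned long vm_flags;
	bool file_is_dax;	/* stands in for a DAX-enabled inode */
};

static bool mock_daxdev_mapping_supported(const struct mock_vma *vma,
					  const struct mock_dax_device *dax_dev)
{
	/* Mappings that do not request sync semantics are always allowed. */
	if (!(vma->vm_flags & MOCK_VM_SYNC))
		return true;
	/* A synchronous mapping needs a DAX file ... */
	if (!vma->file_is_dax)
		return false;
	/* ... backed by a device that is itself synchronous. */
	return dax_dev->synchronous;
}
```

In this model an asynchronous device, such as the virtio pmem case, only
rejects mappings that actually ask for synchronous behaviour.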

Suggested-by: Jan Kara 
Signed-off-by: Pankaj Gupta 
Reviewed-by: Jan Kara 
Signed-off-by: Dan Williams

libnvdimm: add dax_dev sync flag

2019-07-05T22:19:10+00:00

This patch adds the 'DAXDEV_SYNC' flag, which is set
for an nd_region doing synchronous flush. It is later
used to disable MAP_SYNC functionality on ext4 & xfs
filesystems for devices that don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
Signed-off-by: Dan Williams

dax: Arrange for dax_supported check to span multiple devices

2019-05-20T22:02:08+00:00

Pankaj reports that starting with commit ad428cdb525a "dax: Check the
end of the block-device capacity with dax_direct_access()" device-mapper
no longer allows dax operation. This results from the stricter checks in
__bdev_dax_supported() that validate that the start and end of a
block-device map to the same 'pagemap' instance.

Teach the dax-core and device-mapper to validate the 'pagemap' on a
per-target basis. This is accomplished by refactoring the
bdev_dax_supported() internals into generic_fsdax_supported() which
takes a sector range to validate. Consequently generic_fsdax_supported()
is suitable to be used in a device-mapper ->iterate_devices() callback.
A new ->dax_supported() operation is added to allow composite devices to
split and route upper-level bdev_dax_supported() requests.
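
The per-target validation idea can be sketched as below. This is loosely
modelled on the callback-per-device shape described above, not on the
actual device-mapper API; every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one target of a composite device. */
struct mock_target {
	unsigned long long start;	/* first sector of the target */
	unsigned long long len;		/* length in sectors */
	bool dax_capable;		/* result of validating this range */
};

/* Callback invoked once per target. */
typedef bool (*target_check_fn)(const struct mock_target *t);

static bool mock_target_dax_supported(const struct mock_target *t)
{
	return t->dax_capable;
}

/* The composite device supports dax only if every target does. */
static bool mock_all_targets_supported(const struct mock_target *targets,
				       size_t nr, target_check_fn check)
{
	for (size_t i = 0; i < nr; i++) {
		if (!check(&targets[i]))
			return false;
	}
	return true;
}
```

Validating each sector range on its own is what lets the start and end
of the composite device map to different 'pagemap' instances.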

Fixes: ad428cdb525a ("dax: Check the end of the block-device...")
Cc: 
Cc: Ira Weiny 
Cc: Dave Jiang 
Cc: Keith Busch 
Cc: Matthew Wilcox 
Cc: Vishal Verma 
Cc: Heiko Carstens 
Cc: Martin Schwidefsky 
Reviewed-by: Jan Kara 
Reported-by: Pankaj Gupta 
Reviewed-by: Pankaj Gupta 
Tested-by: Pankaj Gupta 
Tested-by: Vaibhav Jain 
Reviewed-by: Mike Snitzer 
Signed-off-by: Dan Williams

dax: Fix unlock mismatch with updated API

2018-12-05T05:32:00+00:00

Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.

In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry it is possible that the pfn
used to do the lookup is misaligned with the value retrieved from the
Xarray.

Change the API to return the unlock cookie from dax_lock_page() and pass
it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was
assuming that the page was PMD-aligned if the entry was a PMD entry with
signatures like:

 WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
 RIP: 0010:dax_insert_entry+0x2b2/0x2d0
 [..]
 Call Trace:
  dax_iomap_pte_fault.isra.41+0x791/0xde0
  ext4_dax_huge_fault+0x16f/0x1f0
  ? up_read+0x1c/0xa0
  __do_fault+0x1f/0x160
  __handle_mm_fault+0x1033/0x1490
  handle_mm_fault+0x18b/0x3d0
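
The cookie-based shape of the fix can be illustrated with a small
userspace model. The slot structure, the MOCK_LOCKED bit and the function
names are assumptions for illustration and do not reproduce the fs/dax.c
implementation.

```c
#define MOCK_LOCKED 0x1UL	/* placeholder lock bit */

/* One slot of a mock array; the low bit marks the entry as locked. */
struct mock_slot {
	unsigned long entry;
};

/* Lock the slot and hand back the unlocked value as a cookie. */
static unsigned long mock_lock_entry(struct mock_slot *slot)
{
	unsigned long cookie = slot->entry & ~MOCK_LOCKED;

	slot->entry = cookie | MOCK_LOCKED;
	return cookie;
}

/*
 * Restore exactly the value the lock call saw. Nothing is recomputed
 * from a pfn, so a page in the middle of a PMD-sized entry cannot
 * produce a mismatched replacement value.
 */
static void mock_unlock_entry(struct mock_slot *slot, unsigned long cookie)
{
	slot->entry = cookie;
}
```

The caller simply carries the cookie from the lock call to the matching
unlock call.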

Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
Reported-by: Dan Williams 
Signed-off-by: Matthew Wilcox 
Tested-by: Dan Williams 
Reviewed-by: Jan Kara 
Signed-off-by: Dan Williams

filesystem-dax: Introduce dax_lock_mapping_entry()

2018-07-23T17:38:06+00:00

In preparation for implementing support for memory poison (media error)
handling via dax mappings, implement a lock_page() equivalent. Poison
error handling requires rmap and needs guarantees that the page->mapping
association is maintained / valid (inode not freed) for the duration of
the lookup.

In the device-dax case it is sufficient to simply hold a dev_pagemap
reference. In the filesystem-dax case we need to use the entry lock.

Export the entry lock via dax_lock_mapping_entry() that uses
rcu_read_lock() to protect against the inode being freed, and
revalidates the page->mapping association under xa_lock().

Cc: Christoph Hellwig 
Cc: Matthew Wilcox 
Cc: Ross Zwisler 
Cc: Jan Kara 
Signed-off-by: Dan Williams 
Signed-off-by: Dave Jiang