linux.git/mm/readahead.c, branch v4.19

vfs: implement readahead(2) using POSIX_FADV_WILLNEED

2018-08-30T18:01:32+00:00

The implementation of the readahead(2) syscall is identical to that of
fadvise64(POSIX_FADV_WILLNEED), with a few exceptions:
1. readahead(2) returns -EINVAL for !mapping->a_ops, whereas fadvise64()
   ignores the request and returns 0.
2. fadvise64() checks for an integer overflow corner case.
3. fadvise64() calls the optional filesystem fadvise() file operation.

Unite the two implementations by calling vfs_fadvise() from the
readahead(2) syscall. Check !mapping->a_ops in the readahead(2) syscall
to preserve the documented syscall ABI behaviour.
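
Seen from userspace, the two entry points discussed above can be exercised
side by side. The following is a minimal sketch, not kernel code: the helper
names prefetch_readahead() and prefetch_fadvise() are hypothetical, and both
helpers normalise their result to 0 or a negative errno value so that the
two calls can be compared directly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: prefetch via readahead(2).
 * readahead() returns -1 and sets errno on failure. */
static int prefetch_readahead(int fd, off_t offset, size_t count)
{
	if (readahead(fd, offset, count) == -1)
		return -errno;
	return 0;
}

/* Hypothetical helper: prefetch via posix_fadvise(2).
 * posix_fadvise() returns the error number directly, not via errno. */
static int prefetch_fadvise(int fd, off_t offset, off_t len)
{
	return -posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}
```

Both helpers report -EBADF for an invalid descriptor; the cases where the
two syscalls differ are the ones listed in the commit message.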

Suggested-by: Miklos Szeredi 
Fixes: d1d04ef8572b ("ovl: stack file ops")
Signed-off-by: Amir Goldstein 
Signed-off-by: Miklos Szeredi

readahead: stricter check for bdi io_pages

2018-07-27T15:09:53+00:00

ondemand_readahead() checks bdi->io_pages to cap the maximum number of
pages that need to be processed. This works until the readit section. If
we do an async-only readahead (async size = sync size) and the target is
at the beginning of the window, we expand the window by another
get_next_ra_size() pages. Btrace for large reads shows that the kernel
always issues a double-sized read at the beginning of processing.
Add an additional check for io_pages in the lower part of the function.
The fix helps devices that hard-limit bio pages and rely on proper
handling of max_hw_read_sectors (e.g. older FusionIO cards). For
that reason it could qualify for stable.
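
The capped window merge described above can be modelled in userspace roughly
as follows. This is a sketch under stated assumptions, not the kernel source:
struct ra_model, next_ra_size() and merge_next_window() are hypothetical
names, and the ramp-up factors in next_ra_size() are an assumption modelled
on get_next_ra_size().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the readahead window state. */
struct ra_model {
	unsigned long size;       /* pages in the current window */
	unsigned long async_size; /* pages remaining when async readahead starts */
};

/* Assumed ramp-up: x4 while the window is small, x2 otherwise, capped at max. */
static unsigned long next_ra_size(const struct ra_model *ra, unsigned long max)
{
	unsigned long cur = ra->size;
	unsigned long newsize = (cur < max / 16) ? 4 * cur : 2 * cur;

	return newsize < max ? newsize : max;
}

/*
 * If the whole window is async and the target sits at the window start,
 * merge the next window into the current one, but never let the combined
 * size exceed max_pages.
 */
static void merge_next_window(struct ra_model *ra, bool at_window_start,
			      unsigned long max_pages)
{
	unsigned long add_pages;

	assert(max_pages > 0);
	if (!at_window_start || ra->size != ra->async_size)
		return;

	add_pages = next_ra_size(ra, max_pages);
	if (ra->size + add_pages <= max_pages) {
		ra->async_size = add_pages;
		ra->size += add_pages;
	} else {
		/* The extra check: clamp instead of doubling past the limit. */
		ra->size = max_pages;
		ra->async_size = max_pages >> 1;
	}
}
```

With a 32-page limit, a window of size 32 and async size 32 stays at 32 pages
instead of growing to 64.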

Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Stockhausen stockhausen@collogia.de
Signed-off-by: Jens Axboe

mm: skip readahead if the cgroup is congested

2018-07-09T15:07:54+00:00

We noticed in testing that we'd get pretty bad latency stalls under heavy
pressure because readahead would try to do its thing while the cgroup
was under severe pressure.  Under this much pressure, a throttled cgroup
should do as little IO as possible so that it can still make progress on
real work, so just skip readahead if our group is under pressure.
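
The early exit described above amounts to a guard at the top of the readahead
entry points. The following userspace sketch illustrates the idea only:
struct cgroup_model, cgroup_congested() and sync_readahead_model() are
hypothetical names, and treating a congested ancestor as congesting its
children is an assumption of this model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup with a congestion counter. */
struct cgroup_model {
	const struct cgroup_model *parent;
	int congestion_count;
};

/* Assumption: a group counts as congested if it or any ancestor is. */
static bool cgroup_congested(const struct cgroup_model *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->congestion_count > 0)
			return true;
	return false;
}

/* Returns the number of pages that would be read ahead. */
static unsigned long sync_readahead_model(const struct cgroup_model *cg,
					  unsigned long req_pages)
{
	if (cgroup_congested(cg))
		return 0; /* skip readahead entirely under pressure */
	return req_pages;
}
```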

Signed-off-by: Josef Bacik 
Acked-by: Tejun Heo 
Acked-by: Andrew Morton 
Signed-off-by: Jens Axboe

mm: split ->readpages calls to avoid non-contiguous pages lists

2018-06-02T01:37:32+00:00

That way file systems don't have to go looking for non-contiguous pages
and work around them.  It also kicks off I/O earlier, allowing it to
finish earlier and reducing latency.
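
The splitting can be pictured as follows: walk the requested range and flush
the pending batch whenever a page that is already cached interrupts it. This
is a userspace sketch with hypothetical names (struct batch,
split_batches()); each recorded batch stands in for one ->readpages call.

```c
#include <stdbool.h>
#include <stddef.h>

/* One contiguous run of pages handed to the filesystem in a single call. */
struct batch {
	size_t first;    /* index of the first page in the run */
	size_t nr_pages; /* number of pages in the run */
};

/*
 * cached[i] is true if page i is already in the page cache. Records up to
 * max_out contiguous runs of missing pages and returns how many were
 * recorded.
 */
static size_t split_batches(const bool *cached, size_t nr_to_read,
			    struct batch *out, size_t max_out)
{
	size_t n = 0, nr_pages = 0, i;

	for (i = 0; i < nr_to_read; i++) {
		if (!cached[i]) {
			nr_pages++;
			continue;
		}
		/* A cached page breaks the run: submit what we have so far. */
		if (nr_pages && n < max_out) {
			out[n].first = i - nr_pages;
			out[n].nr_pages = nr_pages;
			n++;
		}
		nr_pages = 0;
	}
	/* Submit the trailing run, if any. */
	if (nr_pages && n < max_out) {
		out[n].first = nr_to_read - nr_pages;
		out[n].nr_pages = nr_pages;
		n++;
	}
	return n;
}
```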

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong

mm: return an unsigned int from __do_page_cache_readahead

2018-06-02T01:37:32+00:00

We never return an error, so switch to returning an unsigned int.  Most
callers already did implicit casts to an unsigned type, and the one that
didn't can be simplified now.

Suggested-by: Matthew Wilcox 
Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong

mm: give the 'ret' variable a better name __do_page_cache_readahead

2018-06-02T01:37:32+00:00

It counts the number of pages acted on, so name it nr_pages to make that
obvious.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong

page cache: use xa_lock

2018-04-11T17:28:39+00:00

Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox 
Acked-by: Jeff Layton 
Cc: Darrick J. Wong 
Cc: Dave Chinner 
Cc: Ryusuke Konishi 
Cc: Will Deacon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()

2018-04-02T18:16:12+00:00

Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton 
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski

mm: don't cap request size based on read-ahead setting

2016-12-13T02:55:08+00:00

We ran into a funky issue, where someone doing 256K buffered reads saw
128K requests at the device level.  Turns out it is read-ahead capping
the request size, since we use 128K as the default setting.  This
doesn't make a lot of sense - if someone is issuing 256K reads, they
should see 256K reads, regardless of the read-ahead setting, if the
underlying device can support a 256K read in a single command.

This patch introduces a bdi hint, io_pages.  This is the soft max IO
size for the lower level, I've hooked it up to the bdev settings here.
Read-ahead is modified to issue the maximum of the user request size,
and the read-ahead max size, but capped to the max request size on the
device side.  The latter is done to avoid reading ahead too much, if the
application asks for a huge read.  With this patch, the kernel behaves
like the application expects.
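
The sizing rule described above can be written down as a small function. This
is a sketch with a hypothetical name (effective_max_pages()); all sizes are
in pages, so with 4 KiB pages (an assumption) the 128K default corresponds
to 32 pages and a 256K read to 64 pages.

```c
/*
 * ra_pages:  the configured read-ahead maximum
 * req_size:  the size of the user's read request
 * io_pages:  the soft maximum IO size of the underlying device
 */
static unsigned long effective_max_pages(unsigned long ra_pages,
					 unsigned long req_size,
					 unsigned long io_pages)
{
	unsigned long max_pages = ra_pages;

	/*
	 * A request larger than the read-ahead setting may exceed it, but
	 * only up to what the device can take in a single request.
	 */
	if (req_size > max_pages && io_pages > max_pages)
		max_pages = req_size < io_pages ? req_size : io_pages;

	return max_pages;
}
```

For example, a 64-page request with a 32-page read-ahead setting and a
320-page device limit yields 64 pages, while a 1000-page request on the same
device is capped at 320.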

Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.com
Signed-off-by: Jens Axboe 
Acked-by: Johannes Weiner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: silently skip readahead for DAX inodes

2016-08-27T00:39:35+00:00

For DAX inodes we need to be careful to never have page cache pages in
the mapping->page_tree.  This radix tree should be composed only of DAX
exceptional entries and zero pages.

ltp's readahead02 test was triggering a warning because we were trying
to insert a DAX exceptional entry but found that a page cache page had
already been inserted into the tree.  This page was being inserted into
the radix tree in response to a readahead(2) call.

Readahead doesn't make sense for DAX inodes, but we don't want it to
report a failure either.  Instead, we just return success and don't do
any work.

Link: http://lkml.kernel.org/r/20160824221429.21158-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler 
Reported-by: Jeff Moyer 
Cc: Dan Williams 
Cc: Dave Chinner 
Cc: Dave Hansen 
Cc: Jan Kara 
Cc: stable@vger.kernel.org [4.5+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds