<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/readahead.c, branch v4.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>vfs: implement readahead(2) using POSIX_FADV_WILLNEED</title>
<updated>2018-08-30T18:01:32+00:00</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2018-08-29T05:41:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d8f7615319b2bca87a4815e13787439e3339a93'/>
<id>3d8f7615319b2bca87a4815e13787439e3339a93</id>
<content type='text'>
The implementation of readahead(2) syscall is identical to that of
fadvise64(POSIX_FADV_WILLNEED) with a few exceptions:
1. readahead(2) returns -EINVAL for !mapping-&gt;a_ops and fadvise64()
   ignores the request and returns 0.
2. fadvise64() checks for integer overflow corner case
3. fadvise64() calls the optional filesystem fadvise() file operation

Unite the two implementations by calling vfs_fadvise() from readahead(2)
syscall. Check the !mapping-&gt;a_ops in readahead(2) syscall to preserve
documented syscall ABI behaviour.

Suggested-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Fixes: d1d04ef8572b ("ovl: stack file ops")
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The implementation of readahead(2) syscall is identical to that of
fadvise64(POSIX_FADV_WILLNEED) with a few exceptions:
1. readahead(2) returns -EINVAL for !mapping-&gt;a_ops and fadvise64()
   ignores the request and returns 0.
2. fadvise64() checks for integer overflow corner case
3. fadvise64() calls the optional filesystem fadvise() file operation

Unite the two implementations by calling vfs_fadvise() from readahead(2)
syscall. Check the !mapping-&gt;a_ops in readahead(2) syscall to preserve
documented syscall ABI behaviour.

Suggested-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Fixes: d1d04ef8572b ("ovl: stack file ops")
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>readahead: stricter check for bdi io_pages</title>
<updated>2018-07-27T15:09:53+00:00</updated>
<author>
<name>Markus Stockhausen</name>
<email>stockhausen@collogia.de</email>
</author>
<published>2018-07-27T15:09:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dc30b96ab6d569060741572cf30517d3179429a8'/>
<id>dc30b96ab6d569060741572cf30517d3179429a8</id>
<content type='text'>
ondemand_readahead() checks bdi-&gt;io_pages to cap the maximum pages
that need to be processed. This works until the readit section. If
we would do an async only readahead (async size = sync size) and
target is at beginning of window we expand the pages by another
get_next_ra_size() pages. Btrace for large reads shows that kernel
always issues a doubled size read at the beginning of processing.
Add an additional check for io_pages in the lower part of the func.
The fix helps devices that hard limit bio pages and rely on proper
handling of max_hw_read_sectors (e.g. older FusionIO cards). For
that reason it could qualify for stable.

Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Stockhausen stockhausen@collogia.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ondemand_readahead() checks bdi-&gt;io_pages to cap the maximum pages
that need to be processed. This works until the readit section. If
we would do an async only readahead (async size = sync size) and
target is at beginning of window we expand the pages by another
get_next_ra_size() pages. Btrace for large reads shows that kernel
always issues a doubled size read at the beginning of processing.
Add an additional check for io_pages in the lower part of the func.
The fix helps devices that hard limit bio pages and rely on proper
handling of max_hw_read_sectors (e.g. older FusionIO cards). For
that reason it could qualify for stable.

Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Stockhausen stockhausen@collogia.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: skip readahead if the cgroup is congested</title>
<updated>2018-07-09T15:07:54+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2018-07-03T15:15:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ca47e8c72ae141587cabf3dab693f6754d8c416b'/>
<id>ca47e8c72ae141587cabf3dab693f6754d8c416b</id>
<content type='text'>
We noticed in testing we'd get pretty bad latency stalls under heavy
pressure because read ahead would try to do its thing while the cgroup
was under severe pressure.  If we're under this much pressure we want to
do as little IO as possible so we can still make progress on real work
if we're a throttled cgroup, so just skip readahead if our group is
under pressure.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We noticed in testing we'd get pretty bad latency stalls under heavy
pressure because read ahead would try to do its thing while the cgroup
was under severe pressure.  If we're under this much pressure we want to
do as little IO as possible so we can still make progress on real work
if we're a throttled cgroup, so just skip readahead if our group is
under pressure.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: split -&gt;readpages calls to avoid non-contiguous pages lists</title>
<updated>2018-06-02T01:37:32+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2018-06-01T16:03:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b3751e6ab45a3b92da1e4acd42ada7b6a4122f2b'/>
<id>b3751e6ab45a3b92da1e4acd42ada7b6a4122f2b</id>
<content type='text'>
That way file systems don't have to go spotting for non-contiguous pages
and work around them.  It also kicks off I/O earlier, allowing it to
finish earlier and reduce latency.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
That way file systems don't have to go spotting for non-contiguous pages
and work around them.  It also kicks off I/O earlier, allowing it to
finish earlier and reduce latency.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: return an unsigned int from __do_page_cache_readahead</title>
<updated>2018-06-02T01:37:32+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2018-06-01T16:03:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c534aa3fdd149fab18b094375f334b4bb3635cbf'/>
<id>c534aa3fdd149fab18b094375f334b4bb3635cbf</id>
<content type='text'>
We never return an error, so switch to returning an unsigned int.  Most
callers already did implicit casts to an unsigned type, and the one that
didn't can be simplified now.

Suggested-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We never return an error, so switch to returning an unsigned int.  Most
callers already did implicit casts to an unsigned type, and the one that
didn't can be simplified now.

Suggested-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: give the 'ret' variable a better name __do_page_cache_readahead</title>
<updated>2018-06-02T01:37:32+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2018-06-01T16:03:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=836978b35fcd402a31323a16ac0b4c8242890a84'/>
<id>836978b35fcd402a31323a16ac0b4c8242890a84</id>
<content type='text'>
It counts the number of pages acted on, so name it nr_pages to make that
obvious.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It counts the number of pages acted on, so name it nr_pages to make that
obvious.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>page cache: use xa_lock</title>
<updated>2018-04-11T17:28:39+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2018-04-10T23:36:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b93b016313b3ba8003c3b8bb71f569af91f19fc7'/>
<id>b93b016313b3ba8003c3b8bb71f569af91f19fc7</id>
<content type='text'>
Remove the address_space -&gt;tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space -&gt;page_tree to -&gt;i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Cc: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the address_space -&gt;tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space -&gt;page_tree to -&gt;i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Cc: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()</title>
<updated>2018-04-02T18:16:12+00:00</updated>
<author>
<name>Dominik Brodowski</name>
<email>linux@dominikbrodowski.net</email>
</author>
<published>2018-03-19T16:51:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c7b95d5156a9ee70f800bd2e47a9eba677be73e1'/>
<id>c7b95d5156a9ee70f800bd2e47a9eba677be73e1</id>
<content type='text'>
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: don't cap request size based on read-ahead setting</title>
<updated>2016-12-13T02:55:08+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2016-12-13T00:43:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9491ae4aade6814afcfa67f4eb3e3342c2b39750'/>
<id>9491ae4aade6814afcfa67f4eb3e3342c2b39750</id>
<content type='text'>
We ran into a funky issue, where someone doing 256K buffered reads saw
128K requests at the device level.  Turns out it is read-ahead capping
the request size, since we use 128K as the default setting.  This
doesn't make a lot of sense - if someone is issuing 256K reads, they
should see 256K reads, regardless of the read-ahead setting, if the
underlying device can support a 256K read in a single command.

This patch introduces a bdi hint, io_pages.  This is the soft max IO
size for the lower level, I've hooked it up to the bdev settings here.
Read-ahead is modified to issue the maximum of the user request size,
and the read-ahead max size, but capped to the max request size on the
device side.  The latter is done to avoid reading ahead too much, if the
application asks for a huge read.  With this patch, the kernel behaves
like the application expects.

Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.com
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We ran into a funky issue, where someone doing 256K buffered reads saw
128K requests at the device level.  Turns out it is read-ahead capping
the request size, since we use 128K as the default setting.  This
doesn't make a lot of sense - if someone is issuing 256K reads, they
should see 256K reads, regardless of the read-ahead setting, if the
underlying device can support a 256K read in a single command.

This patch introduces a bdi hint, io_pages.  This is the soft max IO
size for the lower level, I've hooked it up to the bdev settings here.
Read-ahead is modified to issue the maximum of the user request size,
and the read-ahead max size, but capped to the max request size on the
device side.  The latter is done to avoid reading ahead too much, if the
application asks for a huge read.  With this patch, the kernel behaves
like the application expects.

Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.com
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: silently skip readahead for DAX inodes</title>
<updated>2016-08-27T00:39:35+00:00</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2016-08-25T22:17:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=11bd969fdefea3ac0cb9791224f1e09784e21e58'/>
<id>11bd969fdefea3ac0cb9791224f1e09784e21e58</id>
<content type='text'>
For DAX inodes we need to be careful to never have page cache pages in
the mapping-&gt;page_tree.  This radix tree should be composed only of DAX
exceptional entries and zero pages.

ltp's readahead02 test was triggering a warning because we were trying
to insert a DAX exceptional entry but found that a page cache page had
already been inserted into the tree.  This page was being inserted into
the radix tree in response to a readahead(2) call.

Readahead doesn't make sense for DAX inodes, but we don't want it to
report a failure either.  Instead, we just return success and don't do
any work.

Link: http://lkml.kernel.org/r/20160824221429.21158-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reported-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.5+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For DAX inodes we need to be careful to never have page cache pages in
the mapping-&gt;page_tree.  This radix tree should be composed only of DAX
exceptional entries and zero pages.

ltp's readahead02 test was triggering a warning because we were trying
to insert a DAX exceptional entry but found that a page cache page had
already been inserted into the tree.  This page was being inserted into
the radix tree in response to a readahead(2) call.

Readahead doesn't make sense for DAX inodes, but we don't want it to
report a failure either.  Instead, we just return success and don't do
any work.

Link: http://lkml.kernel.org/r/20160824221429.21158-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reported-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.5+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
