<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/filemap.c, branch v5.4.166</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm/filemap: fix storing to a THP shadow entry</title>
<updated>2021-06-10T11:37:15+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2021-06-07T20:08:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3b7f3cab1d4736cd9748c89bfeca8a980f587c12'/>
<id>3b7f3cab1d4736cd9748c89bfeca8a980f587c12</id>
<content type='text'>
commit 198b62f83eef1d605d70eca32759c92cdcc14175 upstream

When a THP is removed from the page cache by reclaim, we replace it with a
shadow entry that occupies all slots of the XArray previously occupied by
the THP.  If the user then accesses that page again, we only allocate a
single page, but storing it into the shadow entry replaces all entries
with that one page.  That leads to bugs like

page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
------------[ cut here ]------------
kernel BUG at mm/filemap.c:2529!

https://bugzilla.kernel.org/show_bug.cgi?id=206569

This is hard to reproduce with mainline, but happens regularly with the
THP patchset (as so many more THPs are created).  This solution is take
from the THP patchset.  It splits the shadow entry into order-0 pieces at
the time that we bring a new page into cache.

Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: "Kirill A . Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Link: https://lkml.kernel.org/r/20200903183029.14930-4-willy@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 198b62f83eef1d605d70eca32759c92cdcc14175 upstream

When a THP is removed from the page cache by reclaim, we replace it with a
shadow entry that occupies all slots of the XArray previously occupied by
the THP.  If the user then accesses that page again, we only allocate a
single page, but storing it into the shadow entry replaces all entries
with that one page.  That leads to bugs like

page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
------------[ cut here ]------------
kernel BUG at mm/filemap.c:2529!

https://bugzilla.kernel.org/show_bug.cgi?id=206569

This is hard to reproduce with mainline, but happens regularly with the
THP patchset (as so many more THPs are created).  This solution is take
from the THP patchset.  It splits the shadow entry into order-0 pieces at
the time that we bring a new page into cache.

Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: "Kirill A . Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Link: https://lkml.kernel.org/r/20200903183029.14930-4-willy@infradead.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/error_inject: Fix allow_error_inject function signatures.</title>
<updated>2020-10-29T08:57:37+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2020-08-27T22:01:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d911c0e9fcf070688691bc65e32e7290c053ca45'/>
<id>d911c0e9fcf070688691bc65e32e7290c053ca45</id>
<content type='text'>
[ Upstream commit 76cd61739fd107a7f7ec4c24a045e98d8ee150f0 ]

'static' and 'static noinline' function attributes make no guarantees that
gcc/clang won't optimize them. The compiler may decide to inline 'static'
function and in such case ALLOW_ERROR_INJECT becomes meaningless. The compiler
could have inlined __add_to_page_cache_locked() in one callsite and didn't
inline in another. In such case injecting errors into it would cause
unpredictable behavior. It's worse with 'static noinline' which won't be
inlined, but it still can be optimized. Like the compiler may decide to remove
one argument or constant propagate the value depending on the callsite.

To avoid such issues make sure that these functions are global noinline.

Fixes: af3b854492f3 ("mm/page_alloc.c: allow error injection")
Fixes: cfcbfb1382db ("mm/filemap.c: enable error injection at add_to_page_cache()")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Link: https://lore.kernel.org/bpf/20200827220114.69225-2-alexei.starovoitov@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 76cd61739fd107a7f7ec4c24a045e98d8ee150f0 ]

'static' and 'static noinline' function attributes make no guarantees that
gcc/clang won't optimize them. The compiler may decide to inline 'static'
function and in such case ALLOW_ERROR_INJECT becomes meaningless. The compiler
could have inlined __add_to_page_cache_locked() in one callsite and didn't
inline in another. In such case injecting errors into it would cause
unpredictable behavior. It's worse with 'static noinline' which won't be
inlined, but it still can be optimized. Like the compiler may decide to remove
one argument or constant propagate the value depending on the callsite.

To avoid such issues make sure that these functions are global noinline.

Fixes: af3b854492f3 ("mm/page_alloc.c: allow error injection")
Fixes: cfcbfb1382db ("mm/filemap.c: enable error injection at add_to_page_cache()")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Link: https://lore.kernel.org/bpf/20200827220114.69225-2-alexei.starovoitov@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/filemap.c: clear page error before actual read</title>
<updated>2020-10-01T11:17:53+00:00</updated>
<author>
<name>Xianting Tian</name>
<email>xianting_tian@126.com</email>
</author>
<published>2020-04-02T04:04:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6eed4b3392c666d17e5412c203ddc0f917da4db0'/>
<id>6eed4b3392c666d17e5412c203ddc0f917da4db0</id>
<content type='text'>
[ Upstream commit faffdfa04fa11ccf048cebdde73db41ede0679e0 ]

Mount failure issue happens under the scenario: Application forked dozens
of threads to mount the same number of cramfs images separately in docker,
but several mounts failed with high probability.  Mount failed due to the
checking result of the page(read from the superblock of loop dev) is not
uptodate after wait_on_page_locked(page) returned in function cramfs_read:

   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The reason of the checking result of the page not uptodate: systemd-udevd
read the loopX dev before mount, because the status of loopX is Lo_unbound
at this time, so loop_make_request directly trigger the calling of io_end
handler end_buffer_async_read, which called SetPageError(page).  So It
caused the page can't be set to uptodate in function
end_buffer_async_read:

   if(page_uptodate &amp;&amp; !PageError(page)) {
      SetPageUptodate(page);
   }

Then mount operation is performed, it used the same page which is just
accessed by systemd-udevd above, Because this page is not uptodate, it
will launch a actual read via submit_bh, then wait on this page by calling
wait_on_page_locked(page).  When the I/O of the page done, io_end handler
end_buffer_async_read is called, because no one cleared the page
error(during the whole read path of mount), which is caused by
systemd-udevd reading, so this page is still in "PageError" status, which
can't be set to uptodate in function end_buffer_async_read, then caused
mount failure.

But sometimes mount succeed even through systemd-udeved read loopX dev
just before, The reason is systemd-udevd launched other loopX read just
between step 3.1 and 3.2, the steps as below:

1, loopX dev default status is Lo_unbound;
2, systemd-udved read loopX dev (page is set to PageError);
3, mount operation
   1) set loopX status to Lo_bound;
   ==&gt;systemd-udevd read loopX dev&lt;==
   2) read loopX dev(page has no error)
   3) mount succeed

As the loopX dev status is set to Lo_bound after step 3.1, so the other
loopX dev read by systemd-udevd will go through the whole I/O stack, part
of the call trace as below:

   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                        ClearPageError(page);
                        mapping-&gt;a_ops-&gt;readpage(filp, page);

here, mapping-&gt;a_ops-&gt;readpage() is blkdev_readpage.  In latest kernel,
some function name changed, the call trace as below:

   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. mutipath errors.
             * Pg_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page*/
            error=mapping-&gt;a_ops-&gt;readpage(flip, page);

We can see ClearPageError(page) is called before the actual read,
then the read in step 3.2 succeed.

This patch is to add the calling of ClearPageError just before the actual
read of read path of cramfs mount.  Without the patch, the call trace as
below when performing cramfs mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page
               do_read_cache_page:
                  filler(data, page);
                  or
                  mapping-&gt;a_ops-&gt;readpage(data, page);

With the patch, the call trace as below when performing mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page:
               do_read_cache_page:
                  ClearPageError(page); &lt;== new add
                  filler(data, page);
                  or
                  mapping-&gt;a_ops-&gt;readpage(data, page);

With the patch, mount operation trigger the calling of
ClearPageError(page) before the actual read, the page has no error if no
additional page error happen when I/O done.

Signed-off-by: Xianting Tian &lt;xianting_tian@126.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: &lt;yubin@h3c.com&gt;
Link: http://lkml.kernel.org/r/1583318844-22971-1-git-send-email-xianting_tian@126.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit faffdfa04fa11ccf048cebdde73db41ede0679e0 ]

Mount failure issue happens under the scenario: Application forked dozens
of threads to mount the same number of cramfs images separately in docker,
but several mounts failed with high probability.  Mount failed due to the
checking result of the page(read from the superblock of loop dev) is not
uptodate after wait_on_page_locked(page) returned in function cramfs_read:

   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The reason of the checking result of the page not uptodate: systemd-udevd
read the loopX dev before mount, because the status of loopX is Lo_unbound
at this time, so loop_make_request directly trigger the calling of io_end
handler end_buffer_async_read, which called SetPageError(page).  So It
caused the page can't be set to uptodate in function
end_buffer_async_read:

   if(page_uptodate &amp;&amp; !PageError(page)) {
      SetPageUptodate(page);
   }

Then mount operation is performed, it used the same page which is just
accessed by systemd-udevd above, Because this page is not uptodate, it
will launch a actual read via submit_bh, then wait on this page by calling
wait_on_page_locked(page).  When the I/O of the page done, io_end handler
end_buffer_async_read is called, because no one cleared the page
error(during the whole read path of mount), which is caused by
systemd-udevd reading, so this page is still in "PageError" status, which
can't be set to uptodate in function end_buffer_async_read, then caused
mount failure.

But sometimes mount succeed even through systemd-udeved read loopX dev
just before, The reason is systemd-udevd launched other loopX read just
between step 3.1 and 3.2, the steps as below:

1, loopX dev default status is Lo_unbound;
2, systemd-udved read loopX dev (page is set to PageError);
3, mount operation
   1) set loopX status to Lo_bound;
   ==&gt;systemd-udevd read loopX dev&lt;==
   2) read loopX dev(page has no error)
   3) mount succeed

As the loopX dev status is set to Lo_bound after step 3.1, so the other
loopX dev read by systemd-udevd will go through the whole I/O stack, part
of the call trace as below:

   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                        ClearPageError(page);
                        mapping-&gt;a_ops-&gt;readpage(filp, page);

here, mapping-&gt;a_ops-&gt;readpage() is blkdev_readpage.  In latest kernel,
some function name changed, the call trace as below:

   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. mutipath errors.
             * Pg_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page*/
            error=mapping-&gt;a_ops-&gt;readpage(flip, page);

We can see ClearPageError(page) is called before the actual read,
then the read in step 3.2 succeed.

This patch is to add the calling of ClearPageError just before the actual
read of read path of cramfs mount.  Without the patch, the call trace as
below when performing cramfs mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page
               do_read_cache_page:
                  filler(data, page);
                  or
                  mapping-&gt;a_ops-&gt;readpage(data, page);

With the patch, the call trace as below when performing mount:

   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page:
               do_read_cache_page:
                  ClearPageError(page); &lt;== new add
                  filler(data, page);
                  or
                  mapping-&gt;a_ops-&gt;readpage(data, page);

With the patch, mount operation trigger the calling of
ClearPageError(page) before the actual read, the page has no error if no
additional page error happen when I/O done.

Signed-off-by: Xianting Tian &lt;xianting_tian@126.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: &lt;yubin@h3c.com&gt;
Link: http://lkml.kernel.org/r/1583318844-22971-1-git-send-email-xianting_tian@126.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/filemap.c: don't bother dropping mmap_sem for zero size readahead</title>
<updated>2020-08-05T07:59:41+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2020-04-02T04:04:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=47e20933814feeb75543eb514437989cea53ec52'/>
<id>47e20933814feeb75543eb514437989cea53ec52</id>
<content type='text'>
commit 5c72feee3e45b40a3c96c7145ec422899d0e8964 upstream.

When handling a page fault, we drop mmap_sem to start async readahead so
that we don't block on IO submission with mmap_sem held.  However there's
no point to drop mmap_sem in case readahead is disabled.  Handle that case
to avoid pointless dropping of mmap_sem and retrying the fault.  This was
actually reported to block mlockall(MCL_CURRENT) indefinitely.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reported-by: Minchan Kim &lt;minchan@kernel.org&gt;
Reported-by: Robert Stupp &lt;snazy@gmx.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Minchan Kim &lt;minchan@kernel.org&gt;
Link: http://lkml.kernel.org/r/20200212101356.30759-1-jack@suse.cz
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: SeongJae Park &lt;sjpark@amazon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5c72feee3e45b40a3c96c7145ec422899d0e8964 upstream.

When handling a page fault, we drop mmap_sem to start async readahead so
that we don't block on IO submission with mmap_sem held.  However there's
no point to drop mmap_sem in case readahead is disabled.  Handle that case
to avoid pointless dropping of mmap_sem and retrying the fault.  This was
actually reported to block mlockall(MCL_CURRENT) indefinitely.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reported-by: Minchan Kim &lt;minchan@kernel.org&gt;
Reported-by: Robert Stupp &lt;snazy@gmx.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Minchan Kim &lt;minchan@kernel.org&gt;
Link: http://lkml.kernel.org/r/20200212101356.30759-1-jack@suse.cz
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: SeongJae Park &lt;sjpark@amazon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: drop mmap_sem before calling balance_dirty_pages() in write fault</title>
<updated>2020-01-09T09:19:55+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2019-12-01T01:50:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=173fa52f7fd25519deb286173d80ed742007b28e'/>
<id>173fa52f7fd25519deb286173d80ed742007b28e</id>
<content type='text'>
[ Upstream commit 89b15332af7c0312a41e50846819ca6613b58b4c ]

One of our services is observing hanging ps/top/etc under heavy write
IO, and the task states show this is an mmap_sem priority inversion:

A write fault is holding the mmap_sem in read-mode and waiting for
(heavily cgroup-limited) IO in balance_dirty_pages():

    balance_dirty_pages+0x724/0x905
    balance_dirty_pages_ratelimited+0x254/0x390
    fault_dirty_shared_page.isra.96+0x4a/0x90
    do_wp_page+0x33e/0x400
    __handle_mm_fault+0x6f0/0xfa0
    handle_mm_fault+0xe4/0x200
    __do_page_fault+0x22b/0x4a0
    page_fault+0x45/0x50

Somebody tries to change the address space, contending for the mmap_sem in
write-mode:

    call_rwsem_down_write_failed_killable+0x13/0x20
    do_mprotect_pkey+0xa8/0x330
    SyS_mprotect+0xf/0x20
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The waiting writer locks out all subsequent readers to avoid lock
starvation, and several threads can be seen hanging like this:

    call_rwsem_down_read_failed+0x14/0x30
    proc_pid_cmdline_read+0xa0/0x480
    __vfs_read+0x23/0x140
    vfs_read+0x87/0x130
    SyS_read+0x42/0x90
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

To fix this, do what we do for cache read faults already: drop the
mmap_sem before calling into anything IO bound, in this case the
balance_dirty_pages() function, and return VM_FAULT_RETRY.

Link: http://lkml.kernel.org/r/20190924194238.GA29030@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 89b15332af7c0312a41e50846819ca6613b58b4c ]

One of our services is observing hanging ps/top/etc under heavy write
IO, and the task states show this is an mmap_sem priority inversion:

A write fault is holding the mmap_sem in read-mode and waiting for
(heavily cgroup-limited) IO in balance_dirty_pages():

    balance_dirty_pages+0x724/0x905
    balance_dirty_pages_ratelimited+0x254/0x390
    fault_dirty_shared_page.isra.96+0x4a/0x90
    do_wp_page+0x33e/0x400
    __handle_mm_fault+0x6f0/0xfa0
    handle_mm_fault+0xe4/0x200
    __do_page_fault+0x22b/0x4a0
    page_fault+0x45/0x50

Somebody tries to change the address space, contending for the mmap_sem in
write-mode:

    call_rwsem_down_write_failed_killable+0x13/0x20
    do_mprotect_pkey+0xa8/0x330
    SyS_mprotect+0xf/0x20
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The waiting writer locks out all subsequent readers to avoid lock
starvation, and several threads can be seen hanging like this:

    call_rwsem_down_read_failed+0x14/0x30
    proc_pid_cmdline_read+0xa0/0x480
    __vfs_read+0x23/0x140
    vfs_read+0x87/0x130
    SyS_read+0x42/0x90
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

To fix this, do what we do for cache read faults already: drop the
mmap_sem before calling into anything IO bound, in this case the
balance_dirty_pages() function, and return VM_FAULT_RETRY.

Link: http://lkml.kernel.org/r/20190924194238.GA29030@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/filemap.c: include &lt;linux/ramfs.h&gt; for generic_file_vm_ops definition</title>
<updated>2019-10-19T10:32:32+00:00</updated>
<author>
<name>Ben Dooks</name>
<email>ben.dooks@codethink.co.uk</email>
</author>
<published>2019-10-19T03:20:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d0e6a5821cdf08eea91d8597dce1ad695ebeb8bc'/>
<id>d0e6a5821cdf08eea91d8597dce1ad695ebeb8bc</id>
<content type='text'>
The generic_file_vm_ops is defined in &lt;linux/ramfs.h&gt; so include it to
fix the following warning:

  mm/filemap.c:2717:35: warning: symbol 'generic_file_vm_ops' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191008102311.25432-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks &lt;ben.dooks@codethink.co.uk&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The generic_file_vm_ops is defined in &lt;linux/ramfs.h&gt; so include it to
fix the following warning:

  mm/filemap.c:2717:35: warning: symbol 'generic_file_vm_ops' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191008102311.25432-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks &lt;ben.dooks@codethink.co.uk&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,thp: avoid writes to file with THP in pagecache</title>
<updated>2019-09-24T22:54:11+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2019-09-23T22:38:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=09d91cda0e8207c1f14ee0d572f61a53dbcdaf85'/>
<id>09d91cda0e8207c1f14ee0d572f61a53dbcdaf85</id>
<content type='text'>
In previous patch, an application could put part of its text section in
THP via madvise().  These THPs will be protected from writes when the
application is still running (TXTBSY).  However, after the application
exits, the file is available for writes.

This patch avoids writes to file THP by dropping page cache for the file
when the file is open for write.  A new counter nr_thps is added to struct
address_space.  In do_dentry_open(), if the file is open for write and
nr_thps is non-zero, we drop page cache for the whole file.

Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Acked-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In previous patch, an application could put part of its text section in
THP via madvise().  These THPs will be protected from writes when the
application is still running (TXTBSY).  However, after the application
exits, the file is available for writes.

This patch avoids writes to file THP by dropping page cache for the file
when the file is open for write.  A new counter nr_thps is added to struct
address_space.  In do_dentry_open(), if the file is open for write and
nr_thps is non-zero, we drop page cache for the whole file.

Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Acked-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,thp: add read-only THP support for (non-shmem) FS</title>
<updated>2019-09-24T22:54:11+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2019-09-23T22:38:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0'/>
<id>99cb0dbd47a15d395bf3faa78dc122bc5efe3fc0</id>
<content type='text'>
This patch is (hopefully) the first step to enable THP for non-shmem
filesystems.

This patch enables an application to put part of its text sections to THP
via madvise, for example:

    madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

We tried to reuse the logic for THP on tmpfs.

Currently, write is not supported for non-shmem THP.  khugepaged will only
process vma with VM_DENYWRITE.  sys_mmap() ignores VM_DENYWRITE requests
(see ksys_mmap_pgoff).  The only way to create vma with VM_DENYWRITE is
execve().  This requirement limits non-shmem THP to text sections.

The next patch will handle writes, which would only happen when the all
the vmas with VM_DENYWRITE are unmapped.

An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
feature.

[songliubraving@fb.com: fix build without CONFIG_SHMEM]
  Link: http://lkml.kernel.org/r/F53407FB-96CC-42E8-9862-105C92CC2B98@fb.com
[songliubraving@fb.com: fix double unlock in collapse_file()]
  Link: http://lkml.kernel.org/r/B960CBFA-8EFC-4DA4-ABC5-1977FFF2CA57@fb.com
Link: http://lkml.kernel.org/r/20190801184244.3169074-7-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch is (hopefully) the first step to enable THP for non-shmem
filesystems.

This patch enables an application to put part of its text sections to THP
via madvise, for example:

    madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);

We tried to reuse the logic for THP on tmpfs.

Currently, write is not supported for non-shmem THP.  khugepaged will only
process vma with VM_DENYWRITE.  sys_mmap() ignores VM_DENYWRITE requests
(see ksys_mmap_pgoff).  The only way to create vma with VM_DENYWRITE is
execve().  This requirement limits non-shmem THP to text sections.

The next patch will handle writes, which would only happen when the all
the vmas with VM_DENYWRITE are unmapped.

An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
feature.

[songliubraving@fb.com: fix build without CONFIG_SHMEM]
  Link: http://lkml.kernel.org/r/F53407FB-96CC-42E8-9862-105C92CC2B98@fb.com
[songliubraving@fb.com: fix double unlock in collapse_file()]
  Link: http://lkml.kernel.org/r/B960CBFA-8EFC-4DA4-ABC5-1977FFF2CA57@fb.com
Link: http://lkml.kernel.org/r/20190801184244.3169074-7-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filemap: update offset check in filemap_fault()</title>
<updated>2019-09-24T22:54:11+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2019-09-23T22:37:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=520e5ba415906373186bcd3c7cffa3535bfdbdde'/>
<id>520e5ba415906373186bcd3c7cffa3535bfdbdde</id>
<content type='text'>
With THP, current check of offset:

    VM_BUG_ON_PAGE(page-&gt;index != offset, page);

is no longer accurate. Update it to:

    VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);

Link: http://lkml.kernel.org/r/20190801184244.3169074-4-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With THP, current check of offset:

    VM_BUG_ON_PAGE(page-&gt;index != offset, page);

is no longer accurate. Update it to:

    VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);

Link: http://lkml.kernel.org/r/20190801184244.3169074-4-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Acked-by: Rik van Riel &lt;riel@surriel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filemap: check compound_head(page)-&gt;mapping in pagecache_get_page()</title>
<updated>2019-09-24T22:54:11+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2019-09-23T22:37:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=31895438e702f48e25b7aa6d88f9c97c795c79c7'/>
<id>31895438e702f48e25b7aa6d88f9c97c795c79c7</id>
<content type='text'>
Similar to previous patch, pagecache_get_page() avoids race condition with
truncate by checking page-&gt;mapping == mapping.  This does not work for
compound pages.  This patch let it check compound_head(page)-&gt;mapping
instead.

Link: http://lkml.kernel.org/r/20190801184244.3169074-3-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Suggested-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similar to previous patch, pagecache_get_page() avoids race condition with
truncate by checking page-&gt;mapping == mapping.  This does not work for
compound pages.  This patch let it check compound_head(page)-&gt;mapping
instead.

Link: http://lkml.kernel.org/r/20190801184244.3169074-3-songliubraving@fb.com
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Suggested-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
