linux.git/mm/filemap.c, branch v4.5-rc2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2016-01-23T20:24:56+00:00

Pull final vfs updates from Al Viro:

 - The ->i_mutex wrappers (with small prereq in lustre)

 - a fix for too early freeing of symlink bodies on shmem (they need to
   be RCU-delayed) (-stable fodder)

 - followup to dedupe stuff merged this cycle

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: abort dedupe loop if fatal signals are pending
  make sure that freeing shmem fast symlinks is RCU-delayed
  wrappers for ->i_mutex access
  lustre: remove unused declaration

dax: add support for fsync/sync

2016-01-23T01:02:18+00:00

To handle fsync/msync efficiently, DAX needs to track dirty pages so
that it can flush them durably to media on demand.

The tracking of dirty pages is done via the radix tree in struct
address_space.  This radix tree is already used by the page writeback
infrastructure for tracking dirty pages associated with an open file,
and it already has support for exceptional (non struct page*) entries.
We build upon these features to add exceptional entries to the radix
tree for DAX dirty PMD or PTE pages at fault time.
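The flow can be pictured with a small userspace model. It assumes a flat array stands in for the address_space radix tree and a per-slot bitmask stands in for radix tree tags. `struct model_mapping`, `model_dax_fault()` and `model_dax_fsync()` are hypothetical names, not kernel APIs, and "flushing" here only records the sector that a real implementation would write back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_SLOTS 64          /* hypothetical size of the toy mapping */
#define TAG_DIRTY   (1u << 0)      /* stands in for PAGECACHE_TAG_DIRTY */
#define TAG_TOWRITE (1u << 1)      /* stands in for PAGECACHE_TAG_TOWRITE */

struct model_mapping {
    unsigned long entry[MODEL_NR_SLOTS]; /* 0 = empty, else a stored sector */
    unsigned int tags[MODEL_NR_SLOTS];   /* tag bitmask per slot */
};

/* Fault: record the entry; a write fault also marks it dirty. */
static void model_dax_fault(struct model_mapping *m, size_t index,
                            unsigned long sector, bool write)
{
    assert(index < MODEL_NR_SLOTS);
    m->entry[index] = sector;
    if (write)
        m->tags[index] |= TAG_DIRTY;
}

/* fsync: snapshot DIRTY into TOWRITE, then "flush" each TOWRITE entry by
 * recording its sector in out[] and clearing TOWRITE.  DIRTY is left set;
 * this toy does not model when it would be safe to clear it.
 * Returns the number of entries flushed. */
static size_t model_dax_fsync(struct model_mapping *m,
                              unsigned long *out, size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < MODEL_NR_SLOTS; i++)
        if (m->tags[i] & TAG_DIRTY)
            m->tags[i] |= TAG_TOWRITE;

    for (i = 0; i < MODEL_NR_SLOTS && n < max; i++) {
        if (!(m->tags[i] & TAG_TOWRITE))
            continue;
        out[n++] = m->entry[i];   /* real code would flush CPU caches here */
        m->tags[i] &= ~TAG_TOWRITE;
    }
    return n;
}
```

The two-pass DIRTY-to-TOWRITE snapshot mirrors what tag_pages_for_writeback() does for ordinary page cache writeback: entries dirtied while the flush loop runs are not picked up by it, so the loop cannot livelock.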

[dan.j.williams@intel.com: fix dax_pmd_dbg build warning]
Signed-off-by: Ross Zwisler 
Cc: "H. Peter Anvin" 
Cc: "J. Bruce Fields" 
Cc: "Theodore Ts'o" 
Cc: Alexander Viro 
Cc: Andreas Dilger 
Cc: Dave Chinner 
Cc: Ingo Molnar 
Cc: Jan Kara 
Cc: Jeff Layton 
Cc: Matthew Wilcox 
Cc: Thomas Gleixner 
Cc: Dave Hansen 
Signed-off-by: Dan Williams 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: add find_get_entries_tag()

2016-01-23T01:02:18+00:00

Add find_get_entries_tag() to the family of functions that includes
find_get_entries(), find_get_pages() and find_get_pages_tag().  This is
needed for DAX dirty page handling because we need a list of both page
offsets and radix tree entries ('indices' and 'entries' in this
function) that are marked with the PAGECACHE_TAG_TOWRITE tag.
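The calling convention can be illustrated with a userspace model. It assumes a flat array in place of the radix tree and a tag bitmask in place of a tag index. Starting at `start`, it collects up to `nr_entries` tagged slots, returning their offsets in `indices[]` and their entries in `entries[]`. `model_find_get_entries_tag()` and `struct toy_tree` are hypothetical. The kernel function walks the radix tree under RCU and takes page references, and this sketch omits both.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 32                 /* hypothetical size of the toy tree */

struct toy_tree {
    void *slot[NR_SLOTS];           /* NULL = empty slot */
    unsigned int tags[NR_SLOTS];    /* bitmask of tags per slot */
};

/* Collect up to nr_entries entries carrying tag_mask, at or after start.
 * Returns how many were found; indices[i] is the offset of entries[i]. */
static unsigned int model_find_get_entries_tag(const struct toy_tree *t,
                                               size_t start,
                                               unsigned int tag_mask,
                                               unsigned int nr_entries,
                                               void **entries, size_t *indices)
{
    unsigned int n = 0;
    size_t i;

    for (i = start; i < NR_SLOTS && n < nr_entries; i++) {
        if (!t->slot[i] || !(t->tags[i] & tag_mask))
            continue;
        entries[n] = t->slot[i];
        indices[n] = i;
        n++;
    }
    return n;
}
```

A caller typically processes one batch, then calls again starting at `indices[n - 1] + 1` until the function returns 0.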

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
Cc: "H. Peter Anvin" 
Cc: "J. Bruce Fields" 
Cc: "Theodore Ts'o" 
Cc: Alexander Viro 
Cc: Andreas Dilger 
Cc: Dave Chinner 
Cc: Ingo Molnar 
Cc: Jeff Layton 
Cc: Matthew Wilcox 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Dave Hansen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

dax: support dirty DAX entries in radix tree

2016-01-23T01:02:18+00:00

Add support for tracking dirty DAX entries in the struct address_space
radix tree.  This tree is already used for dirty page writeback, and it
already supports the use of exceptional (non struct page*) entries.

In order to properly track dirty DAX pages we will insert new
exceptional entries into the radix tree that represent dirty DAX PTE or
PMD pages.  These exceptional entries will also contain the writeback
addresses for the PTE or PMD faults that we can use at fsync/msync time.
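Exceptional entries work because real `struct page *` pointers are aligned, so their low bits are zero and a set low bit can mark a non-pointer value. The sketch below packs a sector number and a PTE/PMD type into such an entry. The bit layout and all `MODEL_*` names are assumptions for illustration, not copied from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout (assumed): bit 1 marks an exceptional entry,
 * bits 2-3 give the DAX entry type, the sector sits above bit 3. */
#define MODEL_EXCEPTIONAL  0x2UL
#define MODEL_DAX_PTE      (0x4UL | MODEL_EXCEPTIONAL)
#define MODEL_DAX_PMD      (0x8UL | MODEL_EXCEPTIONAL)
#define MODEL_DAX_MASK     0xfUL
#define MODEL_DAX_SHIFT    4

/* Build an entry holding the writeback sector and the mapping size. */
static uintptr_t model_dax_entry(uint64_t sector, bool pmd)
{
    /* the sector must survive the shift without losing high bits */
    assert(sector <= (UINTPTR_MAX >> MODEL_DAX_SHIFT));
    return ((uintptr_t)sector << MODEL_DAX_SHIFT) |
           (pmd ? MODEL_DAX_PMD : MODEL_DAX_PTE);
}

static bool model_is_exceptional(uintptr_t entry)
{
    return (entry & MODEL_EXCEPTIONAL) != 0;
}

static bool model_dax_is_pmd(uintptr_t entry)
{
    return (entry & MODEL_DAX_MASK) == MODEL_DAX_PMD;
}

/* Recover the sector to flush at fsync/msync time. */
static uint64_t model_dax_sector(uintptr_t entry)
{
    return (uint64_t)(entry >> MODEL_DAX_SHIFT);
}
```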

There are currently two types of exceptional entries (shmem and shadow)
that can be placed into the radix tree, and this adds a third.  We rely
on the fact that, given how a radix tree is used, only one type of
exceptional entry can be found in it.  This separation comes for free
between DAX and shmem, but we explicitly prevent shadow entries from
being added to the radix trees of DAX mappings.

The only shadow entries that would be generated for DAX radix trees
would be to track zero page mappings that were created for holes.  These
pages would receive minimal benefit from having shadow entries, and the
choice to have only one type of exceptional entry in a given radix tree
makes the logic simpler both in clear_exceptional_entry() and in the
rest of DAX.
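One way to see why a single exceptional-entry type per tree simplifies the code: cleanup can decide what an exceptional entry means from the mapping alone, without decoding the entry. The sketch below is hypothetical. The enum, struct and function names are invented, and it does not reproduce clear_exceptional_entry() itself.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mapping_kind { TOY_REGULAR, TOY_SHMEM, TOY_DAX };
enum toy_entry_meaning { MEANS_SHADOW, MEANS_SWAP, MEANS_DAX };

struct toy_mapping {
    enum toy_mapping_kind kind;
    unsigned long nrexceptional;    /* exceptional entries currently stored */
};

/* Eviction path: only regular mappings leave shadow entries behind,
 * so a DAX tree never mixes shadow entries with DAX entries. */
static bool toy_may_store_shadow(const struct toy_mapping *m)
{
    return m->kind == TOY_REGULAR;
}

/* Clearing an exceptional entry: its meaning follows from the mapping kind. */
static enum toy_entry_meaning toy_clear_exceptional(struct toy_mapping *m)
{
    assert(m->nrexceptional > 0);
    m->nrexceptional--;
    switch (m->kind) {
    case TOY_DAX:
        return MEANS_DAX;       /* would drop a DAX PTE/PMD entry */
    case TOY_SHMEM:
        return MEANS_SWAP;      /* shmem entries encode swap slots */
    default:
        return MEANS_SHADOW;    /* workingset shadow entry */
    }
}
```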

Signed-off-by: Ross Zwisler 
Cc: "H. Peter Anvin" 
Cc: "J. Bruce Fields" 
Cc: "Theodore Ts'o" 
Cc: Alexander Viro 
Cc: Andreas Dilger 
Cc: Dave Chinner 
Cc: Ingo Molnar 
Cc: Jan Kara 
Cc: Jeff Layton 
Cc: Matthew Wilcox 
Cc: Thomas Gleixner 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

wrappers for ->i_mutex access

2016-01-22T23:04:28+00:00

These are parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}:
inode_foo(inode) is mutex_foo(&inode->i_mutex).

Please use them for access to ->i_mutex; over the coming cycle
->i_mutex will become an rwsem, with ->lookup() done with it held
only shared.
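A userspace sketch of the wrapper pattern follows. A pthread mutex stands in for the kernel mutex, which is an assumption. The toy `struct inode` holds only the lock. inode_is_locked() and inode_lock_nested() are omitted because pthreads has no direct equivalent of mutex_is_locked() or lockdep subclasses.

```c
#include <assert.h>
#include <pthread.h>

/* Toy inode: only the lock matters for this sketch. */
struct inode {
    pthread_mutex_t i_mutex;
};

/* Callers use these wrappers instead of touching ->i_mutex directly. */
static inline void inode_lock(struct inode *inode)
{
    pthread_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
    pthread_mutex_unlock(&inode->i_mutex);
}

/* Returns 1 if the lock was taken, 0 otherwise (mutex_trylock convention). */
static inline int inode_trylock(struct inode *inode)
{
    return pthread_mutex_trylock(&inode->i_mutex) == 0;
}
```

Because callers only see these functions, turning the lock into a reader/writer lock later means editing the wrappers rather than every call site.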

Signed-off-by: Al Viro

mm: differentiate page_mapped() from page_mapcount() for compound pages

2016-01-16T01:56:32+00:00

Let's define page_mapped() to be true for compound pages if any
sub-page of the compound page is mapped (with PMD or PTE).

On the other hand, page_mapcount() returns the mapcount for this
particular small page.

This will make cases like page_get_anon_vma() behave correctly once we
allow huge pages to be mapped with PTE.

Most users outside core-mm should use page_mapcount() instead of
page_mapped().
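The distinction can be modeled with plain counters. This sketch makes simplifying assumptions: counts start at 0 (the kernel's start at -1), only 4 sub-pages are used, and PageDoubleMap accounting is ignored. All `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SUBPAGES 4   /* hypothetical; a real x86-64 THP has 512 */

struct toy_compound {
    int compound_mapcount;              /* PMD mappings of the whole page */
    int mapcount[TOY_NR_SUBPAGES];      /* PTE mappings of each sub-page */
};

/* page_mapped()-like: true if any part of the compound page is mapped. */
static bool toy_page_mapped(const struct toy_compound *c)
{
    int i;

    if (c->compound_mapcount > 0)
        return true;
    for (i = 0; i < TOY_NR_SUBPAGES; i++)
        if (c->mapcount[i] > 0)
            return true;
    return false;
}

/* page_mapcount()-like: mappings covering one small page, i.e. its own
 * PTE mappings plus PMD mappings of the whole compound page. */
static int toy_page_mapcount(const struct toy_compound *c, int subpage)
{
    assert(subpage >= 0 && subpage < TOY_NR_SUBPAGES);
    return c->mapcount[subpage] + c->compound_mapcount;
}
```

With only sub-page 2 mapped by PTE, the compound page counts as mapped while sub-page 0 still reports a mapcount of 0.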

Signed-off-by: Kirill A. Shutemov 
Tested-by: Sasha Levin 
Tested-by: Aneesh Kumar K.V 
Acked-by: Jerome Marchand 
Cc: Vlastimil Babka 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Naoya Horiguchi 
Cc: Steve Capper 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Christoph Lameter 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memcg: adjust to support new THP refcounting

2016-01-16T01:56:32+00:00

As with rmap, with new refcounting we cannot rely on PageTransHuge() to
check whether we need to charge the size of a huge page to the cgroup.
We need to get information from the caller to know whether it was mapped
with PMD or PTE.

We uncharge when the last reference on the page is gone.  At that point,
if we see PageTransHuge(), it means we need to uncharge the whole huge
page.

The tricky part is partial unmap: when we try to unmap part of a huge
page.  We don't do any special handling of this situation, meaning we
don't uncharge the part of the huge page unless the last user is gone or
split_huge_page() is triggered.  If cgroup memory pressure occurs, the
partially unmapped page will be split through the shrinker.  This
should be good enough.
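The charge/uncharge policy can be modeled with a counter. The caller passes a `compound` hint at charge time, and the uncharge happens only when the last reference is dropped. `struct toy_memcg`, `toy_charge()` and `toy_put_page()` are hypothetical, and 512 assumes 4 KiB pages with 2 MiB PMD-sized huge pages as on x86-64.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HPAGE_NR 512    /* small pages per PMD-sized huge page (x86-64) */

struct toy_memcg {
    long charged;           /* pages charged to this cgroup */
};

struct toy_page {
    bool trans_huge;        /* stands in for PageTransHuge() on the head */
    int refcount;
};

/* Charge: the caller says whether the page is mapped as a compound (PMD)
 * page instead of the charge path inferring it from the page. */
static void toy_charge(struct toy_memcg *cg, bool compound)
{
    cg->charged += compound ? TOY_HPAGE_NR : 1;
}

/* Drop a reference; uncharge only when the last one goes away, so a
 * partial unmap leaves the full charge in place. */
static void toy_put_page(struct toy_memcg *cg, struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        cg->charged -= p->trans_huge ? TOY_HPAGE_NR : 1;
}
```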

Signed-off-by: Kirill A. Shutemov 
Tested-by: Sasha Levin 
Tested-by: Aneesh Kumar K.V 
Acked-by: Vlastimil Babka 
Acked-by: Jerome Marchand 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Naoya Horiguchi 
Cc: Steve Capper 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Christoph Lameter 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

page-flags: define PG_locked behavior on compound pages

2016-01-16T01:56:32+00:00

lock_page() must operate on the whole compound page.  It doesn't make
much sense to lock part of a compound page.  Change the code to use the
head page's PG_locked if a tail page is passed.

This patch also gets rid of custom helper functions:
__set_page_locked() and __clear_page_locked().  They are replaced with
helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG.  Passing a tail page
to these helpers triggers VM_BUG_ON().

SLUB uses PG_locked as a bit spinlock.  IIUC, tail pages should never
appear there.  VM_BUG_ON() is added to make sure that this assumption is
correct.
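A single-threaded model of the head-page redirection follows. Any sub-page locks the whole compound page, and the non-atomic helper rejects tail pages, with assert() playing the role of VM_BUG_ON(). The `toy_` names are hypothetical. The real lock_page() uses atomic bit operations and sleeps instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PG_LOCKED (1UL << 0)

struct toy_page {
    unsigned long flags;
    struct toy_page *head;   /* NULL for head or small pages, else the head */
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->head ? page->head : page;
}

/* Try to lock: the bit always lives on the head page. */
static bool toy_trylock_page(struct toy_page *page)
{
    struct toy_page *head = toy_compound_head(page);

    if (head->flags & TOY_PG_LOCKED)
        return false;
    head->flags |= TOY_PG_LOCKED;
    return true;
}

static void toy_unlock_page(struct toy_page *page)
{
    struct toy_page *head = toy_compound_head(page);

    assert(head->flags & TOY_PG_LOCKED);
    head->flags &= ~TOY_PG_LOCKED;
}

/* __SetPageLocked()-style non-atomic helper: a tail page is a bug. */
static void toy_set_page_locked_nonatomic(struct toy_page *page)
{
    assert(page->head == NULL);
    page->flags |= TOY_PG_LOCKED;
}
```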

[akpm@linux-foundation.org: fix fs/cifs/file.c]
Signed-off-by: Kirill A. Shutemov 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Vlastimil Babka 
Cc: Christoph Lameter 
Cc: Naoya Horiguchi 
Cc: Steve Capper 
Cc: "Aneesh Kumar K.V" 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Jerome Marchand 
Cc: Jérôme Glisse 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: allow GFP_{FS,IO} for page_cache_read page cache allocation

2016-01-15T00:00:49+00:00

page_cache_read has historically used page_cache_alloc_cold to
allocate a new page.  This means that mapping_gfp_mask is used as the
base for the gfp_mask.  Many filesystems set this mask to GFP_NOFS to
prevent fs recursion issues.  page_cache_read is called from the
vm_operations_struct::fault() context during the page fault.  This
context doesn't normally need the reclaim protection.

ceph and ocfs2, which call filemap_fault from their fault handlers, seem
to be OK because they do not take any fs lock before invoking the
generic implementation.  xfs, which takes XFS_MMAPLOCK_SHARED, is safe
from the reclaim recursion point of view because this lock serializes
truncate and punch hole with the page faults and it doesn't get involved
in the reclaim.

There is simply no reason to deliberately use a weaker allocation
context when __GFP_FS | __GFP_IO can be used.  The GFP_NOFS protection
might even be harmful.  There is a push to fail GFP_NOFS allocations
rather than loop within the allocator indefinitely with a very limited
reclaim ability.  Once we start failing those requests, the OOM killer
might be triggered prematurely because the page cache allocation failure
is propagated up the page fault path and ends up in
pagefault_out_of_memory.

We cannot play with mapping_gfp_mask directly because that would be
racy with respect to parallel page faults, and it might interfere with
other users who really rely on the NOFS semantics of the stored
gfp_mask.  The mask is also a property of the inode, so changing it
would even be a layering violation.  What we can do instead is push the
gfp_mask into struct vm_fault and allow the fs layer to overwrite it
should the callback need to be called with a different allocation
context.

Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
because this should normally be safe from the page fault path.  Why do
we care about mapping_gfp_mask at all, then?  Because it holds not only
reclaim protection flags but may also contain zone and movability
restrictions (GFP_DMA32, __GFP_MOVABLE and others), so we have to
respect those.
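The mask arithmetic can be checked with made-up flag values. The sketch ORs FS and IO back onto the mapping's mask, which keeps zone and movability bits such as DMA32. It then shows a filesystem narrowing the mask again in its own fault handler. The `TOY_GFP_*` values and all `toy_` names are hypothetical and do not match the kernel's real __GFP_* bits.

```c
#include <assert.h>

typedef unsigned int toy_gfp_t;

/* Illustrative flag values only; the kernel's __GFP_* values differ. */
#define TOY_GFP_IO       (1u << 0)
#define TOY_GFP_FS       (1u << 1)
#define TOY_GFP_RECLAIM  (1u << 2)
#define TOY_GFP_DMA32    (1u << 3)
#define TOY_GFP_MOVABLE  (1u << 4)

#define TOY_GFP_NOFS     (TOY_GFP_RECLAIM | TOY_GFP_IO)
#define TOY_GFP_KERNEL   (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)

struct toy_vm_fault {
    toy_gfp_t gfp_mask;     /* allocation context used for the page cache read */
};

/* Default: keep the mapping's zone/movability bits, re-enable FS and IO. */
static void toy_fault_init(struct toy_vm_fault *vmf, toy_gfp_t mapping_gfp)
{
    vmf->gfp_mask = mapping_gfp | TOY_GFP_FS | TOY_GFP_IO;
}

/* A filesystem that really needs NOFS can narrow the mask before
 * calling into the generic fault code. */
static void toy_fs_fault_needs_nofs(struct toy_vm_fault *vmf)
{
    vmf->gfp_mask &= ~TOY_GFP_FS;
}
```

For a GFP_NOFS | DMA32 mapping, the default fault mask becomes GFP_KERNEL | DMA32. FS reclaim is allowed again and the zone restriction is preserved.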

Signed-off-by: Michal Hocko 
Reported-by: Tetsuo Handa 
Acked-by: Jan Kara 
Acked-by: Vlastimil Babka 
Cc: Tetsuo Handa 
Cc: Mel Gorman 
Cc: Dave Chinner 
Cc: Mark Fasheh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, fs: introduce mapping_gfp_constraint()

2015-11-07T01:50:42+00:00

There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask used for allocations that are not directly related to
the page cache but are performed in the same context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.
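The helper amounts to a bitwise AND of the mapping's mask with the caller's mask: a flag survives only if both allow it. The sketch uses hypothetical flag values and a toy `struct toy_address_space` whose `gfp_mask` field stands in for what mapping_gfp_mask() would return.

```c
#include <assert.h>

typedef unsigned int toy_gfp_t;

/* Illustrative flag values only; the kernel's __GFP_* values differ. */
#define TOY_GFP_IO       (1u << 0)
#define TOY_GFP_FS       (1u << 1)
#define TOY_GFP_RECLAIM  (1u << 2)
#define TOY_GFP_NOFS     (TOY_GFP_RECLAIM | TOY_GFP_IO)
#define TOY_GFP_KERNEL   (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)

struct toy_address_space {
    toy_gfp_t gfp_mask;          /* what mapping_gfp_mask() would return */
};

/* Restrict a caller's generic mask by the mapping's mask.  Call sites
 * previously open-coded this AND; the helper names the intent. */
static toy_gfp_t toy_mapping_gfp_constraint(const struct toy_address_space *mapping,
                                            toy_gfp_t gfp_mask)
{
    return mapping->gfp_mask & gfp_mask;
}
```

A GFP_KERNEL-style request against a NOFS mapping loses FS, while the same request against an unrestricted mapping passes through unchanged.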

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko 
Suggested-by: Andrew Morton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds