linux.git/include/linux/memcontrol.h, branch v4.7-rc4

mm: vmscan: reduce size of inactive file list

2016-05-21T00:58:30+00:00

The inactive file list should still be large enough to contain readahead
windows and freshly written file data, but it no longer is the only
source for detecting multiple accesses to file pages.  The workingset
refault measurement code causes recently evicted file pages that get
accessed again after a shorter interval to be promoted directly to the
active list.

With that mechanism in place, we can afford to (on a larger system)
dedicate more memory to the active file list, so we can actually cache
more of the frequently used file pages in memory, and not have them
pushed out by streaming writes, once-used streaming file reads, etc.

This can help things like database workloads, where only half the page
cache can currently be used to cache the database working set.  This
patch automatically increases that fraction on larger systems, using the
same ratio that has already been used for anonymous memory.

[hannes@cmpxchg.org: cgroup-awareness]
Signed-off-by: Rik van Riel 
Signed-off-by: Johannes Weiner 
Reported-by: Andres Freund 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: update_lru_size do the __mod_zone_page_state

2016-05-20T02:12:14+00:00

Konstantin Khlebnikov pointed out (nearly four years ago, when lumpy
reclaim was removed) that lru_size can be updated by -nr_taken once per
call to isolate_lru_pages(), instead of page by page.

Update it inside isolate_lru_pages(), or at its two callsites? I chose
to update it at the callsites, rearranging and grouping the updates by
nr_taken and nr_scanned together in both.

With one exception, mem_cgroup_update_lru_size(,lru,) is then used where
__mod_zone_page_state(,NR_LRU_BASE+lru,) is used; and we shall be adding
some more calls in a future commit.  Make the code a little smaller and
simpler by incorporating stat update in lru_size update.

The exception was move_active_pages_to_lru(), which aggregated the
pgmoved stat update separately from the individual lru_size updates; but
I still think this a simplification worth making.

However, the __mod_zone_page_state is not peculiar to mem_cgroups: so
better use the name update_lru_size, calls mem_cgroup_update_lru_size
when CONFIG_MEMCG.

Signed-off-by: Hugh Dickins 
Cc: "Kirill A. Shutemov" 
Cc: Andrea Arcangeli 
Cc: Andres Lagar-Cavilla 
Cc: Yang Shi 
Cc: Ning Qu 
Cc: Mel Gorman 
Cc: Konstantin Khlebnikov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: workingset: make shadow node shrinker memcg aware

2016-03-17T22:09:34+00:00

Workingset code was recently made memcg aware, but shadow node shrinker
is still global.  As a result, one small cgroup can consume all memory
available for shadow nodes, possibly hurting other cgroups by reclaiming
their shadow nodes, even though reclaim distances stored in its shadow
nodes have no effect.  To avoid this, we need to make shadow node
shrinker memcg aware.

Signed-off-by: Vladimir Davydov 
Acked-by: Johannes Weiner 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcontrol: zap memcg_kmem_online helper

2016-03-17T22:09:34+00:00

As kmem accounting is now either enabled for all cgroups or disabled
system-wide, there's no point in having memcg_kmem_online() helper -
instead one can use memcg_kmem_enabled() and mem_cgroup_online(), as
shrink_slab() now does.

There are only two places left where this helper is used -
__memcg_kmem_charge() and memcg_create_kmem_cache().  The former can
only be called if memcg_kmem_enabled() returned true.  Since the cgroup
it operates on is online, mem_cgroup_is_root() check will be enough.

memcg_create_kmem_cache() can't use mem_cgroup_online() helper instead
of memcg_kmem_online(), because it relies on the fact that in
memcg_offline_kmem() memcg->kmem_state is changed before
memcg_deactivate_kmem_caches() is called, but there we can just
open-code the check.

Signed-off-by: Vladimir Davydov 
Acked-by: Johannes Weiner 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcontrol: report kernel stack usage in cgroup2 memory.stat

2016-03-17T22:09:34+00:00

Show how much memory is allocated to kernel stacks.

Signed-off-by: Vladimir Davydov 
Acked-by: Johannes Weiner 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcontrol: report slab usage in cgroup2 memory.stat

2016-03-17T22:09:34+00:00

Show how much memory is used for storing reclaimable and unreclaimable
in-kernel data structures allocated from slab caches.

Signed-off-by: Vladimir Davydov 
Acked-by: Johannes Weiner 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove unnecessary uses of lock_page_memcg()

2016-03-15T23:55:16+00:00

There are several users that nest lock_page_memcg() inside lock_page()
to prevent page->mem_cgroup from changing.  But the page lock prevents
pages from moving between cgroups, so that is unnecessary overhead.

Remove lock_page_memcg() in contexts with locked contexts and fix the
debug code in the page stat functions to be okay with the page lock.

Signed-off-by: Johannes Weiner 
Acked-by: Vladimir Davydov 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: simplify lock_page_memcg()

2016-03-15T23:55:16+00:00

Now that migration doesn't clear page->mem_cgroup of live pages anymore,
it's safe to make lock_page_memcg() and the memcg stat functions take
pages, and spare the callers from memcg objects.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Johannes Weiner 
Suggested-by: Vladimir Davydov 
Acked-by: Vladimir Davydov 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: migrate: do not touch page->mem_cgroup of live pages

2016-03-15T23:55:16+00:00

Changing a page's memcg association complicates dealing with the page,
so we want to limit this as much as possible.  Page migration e.g.  does
not have to do that.  Just like page cache replacement, it can forcibly
charge a replacement page, and then uncharge the old page when it gets
freed.  Temporarily overcharging the cgroup by a single page is not an
issue in practice, and charging is so cheap nowadays that this is much
preferrable to the headache of messing with live pages.

The only place that still changes the page->mem_cgroup binding of live
pages is when pages move along with a task to another cgroup.  But that
path isolates the page from the LRU, takes the page lock, and the move
lock (lock_page_memcg()).  That means page->mem_cgroup is always stable
in callers that have the page isolated from the LRU or locked.  Lighter
unlocked paths, like writeback accounting, can use lock_page_memcg().

[akpm@linux-foundation.org: fix build]
[vdavydov@virtuozzo.com: fix lockdep splat]
Signed-off-by: Johannes Weiner 
Acked-by: Vladimir Davydov 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Greg Thelen 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: workingset: per-cgroup cache thrash detection

2016-03-15T23:55:16+00:00

Cache thrash detection (see a528910e12ec "mm: thrash detection-based
file cache sizing" for details) currently only works on the system
level, not inside cgroups.  Worse, as the refaults are compared to the
global number of active cache, cgroups might wrongfully get all their
refaults activated when their pages are hotter than those of others.

Move the refault machinery from the zone to the lruvec, and then tag
eviction entries with the memcg ID.  This makes the thrash detection
work correctly inside cgroups.

[sergey.senozhatsky@gmail.com: do not return from workingset_activation() with locked rcu and page]
Signed-off-by: Johannes Weiner 
Signed-off-by: Sergey Senozhatsky 
Reviewed-by: Vladimir Davydov 
Cc: Michal Hocko 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds