<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/memory_hotplug.c, branch v4.20.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined</title>
<updated>2019-01-13T08:24:02+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2018-12-28T08:38:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a2b977e3d9e4298d28ebe5cfff9e0859b74a7ac7'/>
<id>a2b977e3d9e4298d28ebe5cfff9e0859b74a7ac7</id>
<content type='text'>
commit b15c87263a69272423771118c653e9a1d0672caa upstream.

We have received a bug report that an injected MCE about faulty memory
prevents memory offline to succeed on 4.4 base kernel.  The underlying
reason was that the HWPoison page has an elevated reference count and the
migration keeps failing.  There are two problems with that.  First of all
it is dubious to migrate the poisoned page because we know that accessing
that memory is possible to fail.  Secondly it doesn't make any sense to
migrate a potentially broken content and preserve the memory corruption
over to a new location.

Oscar has found out that 4.4 and the current upstream kernels behave
slightly differently with his simply testcase

===

int main(void)
{
        int ret;
        int i;
        int fd;
        char *array = malloc(4096);
        char *array_locked = malloc(4096);

        fd = open("/tmp/data", O_RDONLY);
        read(fd, array, 4095);

        for (i = 0; i &lt; 4096; i++)
                array_locked[i] = 'd';

        ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
        if (ret)
                perror("mlock");

        sleep (20);

        ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
        if (ret)
                perror("madvise");

        for (i = 0; i &lt; 4096; i++)
                array_locked[i] = 'd';

        return 0;
}
===

+ offline this memory.

In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU
list
kernel:  [&lt;ffffffff81019ac9&gt;] dump_trace+0x59/0x340
kernel:  [&lt;ffffffff81019e9a&gt;] show_stack_log_lvl+0xea/0x170
kernel:  [&lt;ffffffff8101ac71&gt;] show_stack+0x21/0x40
kernel:  [&lt;ffffffff8132bb90&gt;] dump_stack+0x5c/0x7c
kernel:  [&lt;ffffffff810815a1&gt;] warn_slowpath_common+0x81/0xb0
kernel:  [&lt;ffffffff811a275c&gt;] __pagevec_lru_add_fn+0x14c/0x160
kernel:  [&lt;ffffffff811a2eed&gt;] pagevec_lru_move_fn+0xad/0x100
kernel:  [&lt;ffffffff811a334c&gt;] __lru_cache_add+0x6c/0xb0
kernel:  [&lt;ffffffff81195236&gt;] add_to_page_cache_lru+0x46/0x70
kernel:  [&lt;ffffffffa02b4373&gt;] extent_readpages+0xc3/0x1a0 [btrfs]
kernel:  [&lt;ffffffff811a16d7&gt;] __do_page_cache_readahead+0x177/0x200
kernel:  [&lt;ffffffff811a18c8&gt;] ondemand_readahead+0x168/0x2a0
kernel:  [&lt;ffffffff8119673f&gt;] generic_file_read_iter+0x41f/0x660
kernel:  [&lt;ffffffff8120e50d&gt;] __vfs_read+0xcd/0x140
kernel:  [&lt;ffffffff8120e9ea&gt;] vfs_read+0x7a/0x120
kernel:  [&lt;ffffffff8121404b&gt;] kernel_read+0x3b/0x50
kernel:  [&lt;ffffffff81215c80&gt;] do_execveat_common.isra.29+0x490/0x6f0
kernel:  [&lt;ffffffff81215f08&gt;] do_execve+0x28/0x30
kernel:  [&lt;ffffffff81095ddb&gt;] call_usermodehelper_exec_async+0xfb/0x130
kernel:  [&lt;ffffffff8161c045&gt;] ret_from_fork+0x55/0x80

And that latter confuses the hotremove path because an LRU page is
attempted to be migrated and that fails due to an elevated reference
count.  It is quite possible that the reuse of the HWPoisoned page is some
kind of fixed race condition but I am not really sure about that.

With the upstream kernel the failure is slightly different.  The page
doesn't seem to have LRU bit set but isolate_movable_page simply fails and
do_migrate_range simply puts all the isolated pages back to LRU and
therefore no progress is made and scan_movable_pages finds same set of
pages over and over again.

Fix both cases by explicitly checking HWPoisoned pages before we even try
to get reference on the page, try to unmap it if it is still mapped.  As
explained by Naoya:

: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.

Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
HWPoison pages which shouldn't happen but I couldn't convince myself about
that.  Naoya has noted the following:

: Theoretically no such gurantee, because try_to_unmap() doesn't have a
: guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also impli=
: es
: that the target page can still have PageLRU flag.
:
:         /*
:          * Torn down by someone else?
:          */
:         if (PageLRU(p) &amp;&amp; !PageSwapCache(p) &amp;&amp; p-&gt;mapping =3D=3D NULL) {
:                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
:                 res =3D -EBUSY;
:                 goto out;
:         }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.

Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Debugged-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Tested-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b15c87263a69272423771118c653e9a1d0672caa upstream.

We have received a bug report that an injected MCE about faulty memory
prevents memory offline to succeed on 4.4 base kernel.  The underlying
reason was that the HWPoison page has an elevated reference count and the
migration keeps failing.  There are two problems with that.  First of all
it is dubious to migrate the poisoned page because we know that accessing
that memory is possible to fail.  Secondly it doesn't make any sense to
migrate a potentially broken content and preserve the memory corruption
over to a new location.

Oscar has found out that 4.4 and the current upstream kernels behave
slightly differently with his simply testcase

===

int main(void)
{
        int ret;
        int i;
        int fd;
        char *array = malloc(4096);
        char *array_locked = malloc(4096);

        fd = open("/tmp/data", O_RDONLY);
        read(fd, array, 4095);

        for (i = 0; i &lt; 4096; i++)
                array_locked[i] = 'd';

        ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
        if (ret)
                perror("mlock");

        sleep (20);

        ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
        if (ret)
                perror("madvise");

        for (i = 0; i &lt; 4096; i++)
                array_locked[i] = 'd';

        return 0;
}
===

+ offline this memory.

In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU
list
kernel:  [&lt;ffffffff81019ac9&gt;] dump_trace+0x59/0x340
kernel:  [&lt;ffffffff81019e9a&gt;] show_stack_log_lvl+0xea/0x170
kernel:  [&lt;ffffffff8101ac71&gt;] show_stack+0x21/0x40
kernel:  [&lt;ffffffff8132bb90&gt;] dump_stack+0x5c/0x7c
kernel:  [&lt;ffffffff810815a1&gt;] warn_slowpath_common+0x81/0xb0
kernel:  [&lt;ffffffff811a275c&gt;] __pagevec_lru_add_fn+0x14c/0x160
kernel:  [&lt;ffffffff811a2eed&gt;] pagevec_lru_move_fn+0xad/0x100
kernel:  [&lt;ffffffff811a334c&gt;] __lru_cache_add+0x6c/0xb0
kernel:  [&lt;ffffffff81195236&gt;] add_to_page_cache_lru+0x46/0x70
kernel:  [&lt;ffffffffa02b4373&gt;] extent_readpages+0xc3/0x1a0 [btrfs]
kernel:  [&lt;ffffffff811a16d7&gt;] __do_page_cache_readahead+0x177/0x200
kernel:  [&lt;ffffffff811a18c8&gt;] ondemand_readahead+0x168/0x2a0
kernel:  [&lt;ffffffff8119673f&gt;] generic_file_read_iter+0x41f/0x660
kernel:  [&lt;ffffffff8120e50d&gt;] __vfs_read+0xcd/0x140
kernel:  [&lt;ffffffff8120e9ea&gt;] vfs_read+0x7a/0x120
kernel:  [&lt;ffffffff8121404b&gt;] kernel_read+0x3b/0x50
kernel:  [&lt;ffffffff81215c80&gt;] do_execveat_common.isra.29+0x490/0x6f0
kernel:  [&lt;ffffffff81215f08&gt;] do_execve+0x28/0x30
kernel:  [&lt;ffffffff81095ddb&gt;] call_usermodehelper_exec_async+0xfb/0x130
kernel:  [&lt;ffffffff8161c045&gt;] ret_from_fork+0x55/0x80

And that latter confuses the hotremove path because an LRU page is
attempted to be migrated and that fails due to an elevated reference
count.  It is quite possible that the reuse of the HWPoisoned page is some
kind of fixed race condition but I am not really sure about that.

With the upstream kernel the failure is slightly different.  The page
doesn't seem to have LRU bit set but isolate_movable_page simply fails and
do_migrate_range simply puts all the isolated pages back to LRU and
therefore no progress is made and scan_movable_pages finds same set of
pages over and over again.

Fix both cases by explicitly checking HWPoisoned pages before we even try
to get reference on the page, try to unmap it if it is still mapped.  As
explained by Naoya:

: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.

Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
HWPoison pages which shouldn't happen but I couldn't convince myself about
that.  Naoya has noted the following:

: Theoretically no such gurantee, because try_to_unmap() doesn't have a
: guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also impli=
: es
: that the target page can still have PageLRU flag.
:
:         /*
:          * Torn down by someone else?
:          */
:         if (PageLRU(p) &amp;&amp; !PageSwapCache(p) &amp;&amp; p-&gt;mapping =3D=3D NULL) {
:                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
:                 res =3D -EBUSY;
:                 goto out;
:         }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.

Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Debugged-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Tested-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>memory_hotplug: cond_resched in __remove_pages</title>
<updated>2018-11-03T17:09:38+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2018-11-02T22:48:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dd33ad7b251f900481701b2a82d25de583867708'/>
<id>dd33ad7b251f900481701b2a82d25de583867708</id>
<content type='text'>
We have received a bug report that unbinding a large pmem (&gt;1TB) can
result in a soft lockup:

  NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365]
  [...]
  Supported: Yes
  CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4
  Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
  task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000
  RIP: 0010:__put_page+0x62/0x80
  Call Trace:
   devm_memremap_pages_release+0x152/0x260
   release_nodes+0x18d/0x1d0
   device_release_driver_internal+0x160/0x210
   unbind_store+0xb3/0xe0
   kernfs_fop_write+0x102/0x180
   __vfs_write+0x26/0x150
   vfs_write+0xad/0x1a0
   SyS_write+0x42/0x90
   do_syscall_64+0x74/0x150
   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
  RIP: 0033:0x7fd13166b3d0

It has been reported on an older (4.12) kernel but the current upstream
code doesn't cond_resched in the hot remove code at all and the given
range to remove might be really large.  Fix the issue by calling
cond_resched once per memory section.

Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Cc: Dan Williams &lt;dan.j.williams@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have received a bug report that unbinding a large pmem (&gt;1TB) can
result in a soft lockup:

  NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365]
  [...]
  Supported: Yes
  CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4
  Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
  task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000
  RIP: 0010:__put_page+0x62/0x80
  Call Trace:
   devm_memremap_pages_release+0x152/0x260
   release_nodes+0x18d/0x1d0
   device_release_driver_internal+0x160/0x210
   unbind_store+0xb3/0xe0
   kernfs_fop_write+0x102/0x180
   __vfs_write+0x26/0x150
   vfs_write+0xad/0x1a0
   SyS_write+0x42/0x90
   do_syscall_64+0x74/0x150
   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
  RIP: 0033:0x7fd13166b3d0

It has been reported on an older (4.12) kernel but the current upstream
code doesn't cond_resched in the hot remove code at all and the given
range to remove might be really large.  Fix the issue by calling
cond_resched once per memory section.

Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Cc: Dan Williams &lt;dan.j.williams@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock</title>
<updated>2018-10-31T15:54:17+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2018-10-30T22:10:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=381eab4a6ee81266f8dddc62e57376c7e584e5b8'/>
<id>381eab4a6ee81266f8dddc62e57376c7e584e5b8</id>
<content type='text'>
There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
	a) device_lock()
	b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took a),
followed by b), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and the calls device_online() - which takes
   device_lock() - .online does no longer call mem_hotplug_begin(), so
   effectively calls online_pages() without mem_hotplug_lock.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that.

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   without locks. This was introduced after 30467e0b3be. And skimming over
   the code, I assume it could need some more care in regards to locking
   (e.g. device_online() called without device_hotplug_lock. This will
   be addressed in the following patches.

Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()

We can move mem_hotplug_lock usage back into
online_pages()/offline_pages().

Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
    2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Reviewed-by: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: YASUAKI ISHIMATSU &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Nathan Fontenot &lt;nfont@linux.vnet.ibm.com&gt;
Cc: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
	a) device_lock()
	b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took a),
followed by b), exposing a possible deadlock.

In [1], and it was decided to not make use of device_hotplug_lock, but
rather to enforce a locking order.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and the calls device_online() - which takes
   device_lock() - .online does no longer call mem_hotplug_begin(), so
   effectively calls online_pages() without mem_hotplug_lock.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that.

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   without locks. This was introduced after 30467e0b3be. And skimming over
   the code, I assume it could need some more care in regards to locking
   (e.g. device_online() called without device_hotplug_lock. This will
   be addressed in the following patches.

Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()

We can move mem_hotplug_lock usage back into
online_pages()/offline_pages().

Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).

[1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
    2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Reviewed-by: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: YASUAKI ISHIMATSU &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Nathan Fontenot &lt;nfont@linux.vnet.ibm.com&gt;
Cc: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug: make add_memory() take the device_hotplug_lock</title>
<updated>2018-10-31T15:54:17+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2018-10-30T22:10:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89'/>
<id>8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89</id>
<content type='text'>
add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Nathan Fontenot &lt;nfont@linux.vnet.ibm.com&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: YASUAKI ISHIMATSU &lt;yasu.isimatu@gmail.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Nathan Fontenot &lt;nfont@linux.vnet.ibm.com&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: YASUAKI ISHIMATSU &lt;yasu.isimatu@gmail.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug: make remove_memory() take the device_hotplug_lock</title>
<updated>2018-10-31T15:54:17+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2018-10-30T22:10:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d15e59260f62bd5e0f625cf5f5240f6ffac78ab6'/>
<id>d15e59260f62bd5e0f625cf5f5240f6ffac78ab6</id>
<content type='text'>
Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock.  And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" &gt; /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 &gt; /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g.  via memory_probe_store(), we can end up calling
add_memory()-&gt;online_pages() without the device_hotplug_lock.  So we can
have concurrent callers in online_pages().  We e.g.  touch in
online_pages() basically unprotected zone-&gt;present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible.  We
would e.g.  have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify).  And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.

This patch (of 6):

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported.  So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
	arch/powerpc/platforms/powernv/memtrace.c

Apart from that, there are not other users in the tree.

Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Nathan Fontenot &lt;nfont@linux.vnet.ibm.com&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: YASUAKI ISHIMATSU &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock.  And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" &gt; /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 &gt; /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g.  via memory_probe_store(), we can end up calling
add_memory()-&gt;online_pages() without the device_hotplug_lock.  So we can
have concurrent callers in online_pages().  We e.g.  touch in
online_pages() basically unprotected zone-&gt;present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible.  We
would e.g.  have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify).  And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.

This patch (of 6):

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported.  So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
	arch/powerpc/platforms/powernv/memtrace.c

Apart from that, there are not other users in the tree.

Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Nathan Fontenot &lt;nfont@linux.vnet.ibm.com&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: YASUAKI ISHIMATSU &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove include/linux/bootmem.h</title>
<updated>2018-10-31T15:54:16+00:00</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.vnet.ibm.com</email>
</author>
<published>2018-10-30T22:09:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=57c8a661d95dff48dd9c2f2496139082bbaf241a'/>
<id>57c8a661d95dff48dd9c2f2496139082bbaf241a</id>
<content type='text'>
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include &lt;linux/memblock.h&gt;

@@
@@
- #include &lt;linux/bootmem.h&gt;
+ #include &lt;linux/memblock.h&gt;

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Guan Xuetao &lt;gxt@pku.edu.cn&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "James E.J. Bottomley" &lt;jejb@parisc-linux.org&gt;
Cc: Jonas Bonn &lt;jonas@southpole.se&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Ley Foon Tan &lt;lftan@altera.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Palmer Dabbelt &lt;palmer@sifive.com&gt;
Cc: Paul Burton &lt;paul.burton@mips.com&gt;
Cc: Richard Kuo &lt;rkuo@codeaurora.org&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Serge Semin &lt;fancer.lancer@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include &lt;linux/memblock.h&gt;

@@
@@
- #include &lt;linux/bootmem.h&gt;
+ #include &lt;linux/memblock.h&gt;

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Guan Xuetao &lt;gxt@pku.edu.cn&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "James E.J. Bottomley" &lt;jejb@parisc-linux.org&gt;
Cc: Jonas Bonn &lt;jonas@southpole.se&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Ley Foon Tan &lt;lftan@altera.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Palmer Dabbelt &lt;palmer@sifive.com&gt;
Cc: Paul Burton &lt;paul.burton@mips.com&gt;
Cc: Richard Kuo &lt;rkuo@codeaurora.org&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Serge Semin &lt;fancer.lancer@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: clean up node_states_check_changes_offline()</title>
<updated>2018-10-26T23:26:33+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2018-10-26T22:07:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=86b27beae59685a42f81bcda9d502b5aebddfab8'/>
<id>86b27beae59685a42f81bcda9d502b5aebddfab8</id>
<content type='text'>
This patch, as the previous one, gets rid of the wrong if statements.
While at it, I realized that the comments are sometimes very confusing,
to say the least, and wrong.
For example:

___
zone_last = ZONE_MOVABLE;

/*
 * check whether node_states[N_HIGH_MEMORY] will be changed
 * If we try to offline the last present @nr_pages from the node,
 * we can determind we will need to clear the node from
 * node_states[N_HIGH_MEMORY].
 */

for (; zt &lt;= zone_last; zt++)
        present_pages += pgdat-&gt;node_zones[zt].present_pages;
if (nr_pages &gt;= present_pages)
        arg-&gt;status_change_nid = zone_to_nid(zone);
else
        arg-&gt;status_change_nid = -1;
___

In case the node gets empry, it must be removed from N_MEMORY.  We already
check N_HIGH_MEMORY a bit above within the CONFIG_HIGHMEM ifdef code.  Not
to say that status_change_nid is for N_MEMORY, and not for N_HIGH_MEMORY.

So I re-wrote some of the comments to what I think is better.

[osalvador@suse.de: address feedback from Pavel]
  Link: http://lkml.kernel.org/r/20180921132634.10103-5-osalvador@techadventures.net
Link: http://lkml.kernel.org/r/20180919100819.25518-6-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch, as the previous one, gets rid of the wrong if statements.
While at it, I realized that the comments are sometimes very confusing,
to say the least, and wrong.
For example:

___
zone_last = ZONE_MOVABLE;

/*
 * check whether node_states[N_HIGH_MEMORY] will be changed
 * If we try to offline the last present @nr_pages from the node,
 * we can determind we will need to clear the node from
 * node_states[N_HIGH_MEMORY].
 */

for (; zt &lt;= zone_last; zt++)
        present_pages += pgdat-&gt;node_zones[zt].present_pages;
if (nr_pages &gt;= present_pages)
        arg-&gt;status_change_nid = zone_to_nid(zone);
else
        arg-&gt;status_change_nid = -1;
___

In case the node gets empry, it must be removed from N_MEMORY.  We already
check N_HIGH_MEMORY a bit above within the CONFIG_HIGHMEM ifdef code.  Not
to say that status_change_nid is for N_MEMORY, and not for N_HIGH_MEMORY.

So I re-wrote some of the comments to what I think is better.

[osalvador@suse.de: address feedback from Pavel]
  Link: http://lkml.kernel.org/r/20180921132634.10103-5-osalvador@techadventures.net
Link: http://lkml.kernel.org/r/20180919100819.25518-6-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: simplify node_states_check_changes_online</title>
<updated>2018-10-26T23:26:33+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2018-10-26T22:07:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8efe33f40f3e69ac6069ed46666d2d527ddc2c04'/>
<id>8efe33f40f3e69ac6069ed46666d2d527ddc2c04</id>
<content type='text'>
While looking at node_states_check_changes_online, I stumbled upon some
confusing things.

Right after entering the function, we find this:

if (N_MEMORY == N_NORMAL_MEMORY)
        zone_last = ZONE_MOVABLE;

This is wrong.
N_MEMORY cannot really be equal to N_NORMAL_MEMORY.
My guess is that this wanted to be something like:

if (N_NORMAL_MEMORY == N_HIGH_MEMORY)

to check if we have CONFIG_HIGHMEM.

Later on, in the CONFIG_HIGHMEM block, we have:

if (N_MEMORY == N_HIGH_MEMORY)
        zone_last = ZONE_MOVABLE;

Again, this is wrong, and will never be evaluated to true.

Besides removing these wrong if statements, I simplified the function a
bit.

[osalvador@suse.de: address feedback from Pavel]
  Link: http://lkml.kernel.org/r/20180921132634.10103-4-osalvador@techadventures.net
Link: http://lkml.kernel.org/r/20180919100819.25518-5-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While looking at node_states_check_changes_online, I stumbled upon some
confusing things.

Right after entering the function, we find this:

if (N_MEMORY == N_NORMAL_MEMORY)
        zone_last = ZONE_MOVABLE;

This is wrong.
N_MEMORY cannot really be equal to N_NORMAL_MEMORY.
My guess is that this wanted to be something like:

if (N_NORMAL_MEMORY == N_HIGH_MEMORY)

to check if we have CONFIG_HIGHMEM.

Later on, in the CONFIG_HIGHMEM block, we have:

if (N_MEMORY == N_HIGH_MEMORY)
        zone_last = ZONE_MOVABLE;

Again, this is wrong, and will never be evaluated to true.

Besides removing these wrong if statements, I simplified the function a
bit.

[osalvador@suse.de: address feedback from Pavel]
  Link: http://lkml.kernel.org/r/20180921132634.10103-4-osalvador@techadventures.net
Link: http://lkml.kernel.org/r/20180919100819.25518-5-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: tidy up node_states_clear_node()</title>
<updated>2018-10-26T23:26:33+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2018-10-26T22:07:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cf01f6f5e398a74f00fa9ac490ec98c12e63e4b1'/>
<id>cf01f6f5e398a74f00fa9ac490ec98c12e63e4b1</id>
<content type='text'>
node_states_clear has the following if statements:

if ((N_MEMORY != N_NORMAL_MEMORY) &amp;&amp;
    (arg-&gt;status_change_nid_high &gt;= 0))
        ...

if ((N_MEMORY != N_HIGH_MEMORY) &amp;&amp;
    (arg-&gt;status_change_nid &gt;= 0))
        ...

N_MEMORY can never be equal to neither N_NORMAL_MEMORY nor
N_HIGH_MEMORY.

Similar problem was found in [1].
Since this is wrong, let us get rid of it.

[1] https://patchwork.kernel.org/patch/10579155/

Link: http://lkml.kernel.org/r/20180919100819.25518-4-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
node_states_clear has the following if statements:

if ((N_MEMORY != N_NORMAL_MEMORY) &amp;&amp;
    (arg-&gt;status_change_nid_high &gt;= 0))
        ...

if ((N_MEMORY != N_HIGH_MEMORY) &amp;&amp;
    (arg-&gt;status_change_nid &gt;= 0))
        ...

N_MEMORY can never be equal to neither N_NORMAL_MEMORY nor
N_HIGH_MEMORY.

Similar problem was found in [1].
Since this is wrong, let us get rid of it.

[1] https://patchwork.kernel.org/patch/10579155/

Link: http://lkml.kernel.org/r/20180919100819.25518-4-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: spare unnecessary calls to node_set_state</title>
<updated>2018-10-26T23:26:33+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2018-10-26T22:07:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=83d83612d707c3709e030c745e3df8d4e17bbfa2'/>
<id>83d83612d707c3709e030c745e3df8d4e17bbfa2</id>
<content type='text'>
In node_states_check_changes_online, we check if the node will have to be
set for any of the N_*_MEMORY states after the pages have been onlined.

Later on, we perform the activation in node_states_set_node.  Currently,
in node_states_set_node we set the node to N_MEMORY unconditionally.

This means that we call node_set_state for N_MEMORY every time pages go
online, but we only need to do it if the node has not yet been set for
N_MEMORY.

Fix this by checking status_change_nid.

Link: http://lkml.kernel.org/r/20180919100819.25518-2-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In node_states_check_changes_online, we check if the node will have to be
set for any of the N_*_MEMORY states after the pages have been onlined.

Later on, we perform the activation in node_states_set_node.  Currently,
in node_states_set_node we set the node to N_MEMORY unconditionally.

This means that we call node_set_state for N_MEMORY every time pages go
online, but we only need to do it if the node has not yet been set for
N_MEMORY.

Fix this by checking status_change_nid.

Link: http://lkml.kernel.org/r/20180919100819.25518-2-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: &lt;yasu.isimatu@gmail.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
