<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm, branch v5.11-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: slub: call account_slab_page() after slab page initialization</title>
<updated>2020-12-29T23:36:49+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-29T23:15:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1f3147b49d75b47b6be54a1e6dfa87a4921e1e51'/>
<id>1f3147b49d75b47b6be54a1e6dfa87a4921e1e51</id>
<content type='text'>
It's convenient to have page-&gt;objects initialized before calling into
account_slab_page().  In particular, this information can be used to
pre-alloc the obj_cgroup vector.

Let's call account_slab_page() a bit later, after the initialization of
page-&gt;objects.

This commit doesn't bring any functional change, but is required for
further optimizations.

[akpm@linux-foundation.org: undo changes needed by forthcoming mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch]

Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's convenient to have page-&gt;objects initialized before calling into
account_slab_page().  In particular, this information can be used to
pre-alloc the obj_cgroup vector.

Let's call account_slab_page() a bit later, after the initialization of
page-&gt;objects.

This commit doesn't bring any functional change, but is required for
further optimizations.

[akpm@linux-foundation.org: undo changes needed by forthcoming mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch]

Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com
Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kasan: fix null pointer dereference in kasan_record_aux_stack</title>
<updated>2020-12-29T23:36:49+00:00</updated>
<author>
<name>Walter Wu</name>
<email>walter-zh.wu@mediatek.com</email>
</author>
<published>2020-12-29T23:14:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=13384f6125ad7ebdcc8914fe1e03ded48ce76581'/>
<id>13384f6125ad7ebdcc8914fe1e03ded48ce76581</id>
<content type='text'>
Syzbot reported the following [1]:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 2d993067 P4D 2d993067 PUD 19a3c067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 3852 Comm: kworker/1:2 Not tainted 5.10.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events free_ipc
  RIP: 0010:kasan_record_aux_stack+0x77/0xb0

Add null checking slab object from kasan_get_alloc_meta() in order to
avoid null pointer dereference.

[1] https://syzkaller.appspot.com/x/log.txt?x=10a82a50d00000

Link: https://lkml.kernel.org/r/20201228080018.23041-1-walter-zh.wu@mediatek.com
Signed-off-by: Walter Wu &lt;walter-zh.wu@mediatek.com&gt;
Suggested-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Syzbot reported the following [1]:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 2d993067 P4D 2d993067 PUD 19a3c067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 3852 Comm: kworker/1:2 Not tainted 5.10.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events free_ipc
  RIP: 0010:kasan_record_aux_stack+0x77/0xb0

Add null checking slab object from kasan_get_alloc_meta() in order to
avoid null pointer dereference.

[1] https://syzkaller.appspot.com/x/log.txt?x=10a82a50d00000

Link: https://lkml.kernel.org/r/20201228080018.23041-1-walter-zh.wu@mediatek.com
Signed-off-by: Walter Wu &lt;walter-zh.wu@mediatek.com&gt;
Suggested-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: generalise COW SMC TLB flushing race comment</title>
<updated>2020-12-29T23:36:49+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2020-12-29T23:14:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=111fe7186b29d172729db5e294875b9fc7a0ec1d'/>
<id>111fe7186b29d172729db5e294875b9fc7a0ec1d</id>
<content type='text'>
I'm not sure if I'm completely missing something here, but AFAIKS the
reference to the mysterious "COW SMC race" confuses the issue.  The
original changelog and mailing list thread didn't help me either.

This SMC race is where the problem was detected, but isn't the general
problem bigger and more obvious: that the new PTE could be picked up at
any time by any TLB while entries for the old PTE exist in other TLBs
before the TLB flush takes effect?

The case where the iTLB and dTLB of a CPU are pointing at different pages
is an interesting one but follows from the general problem.

The other (minor) thing with the comment I think it makes it a bit clearer
to say what the old code was doing (i.e., it avoids the race as opposed to
what?).

References: 4ce072f1faf29 ("mm: fix a race condition under SMC + COW")
Link: https://lkml.kernel.org/r/20201215121119.351650-1-npiggin@gmail.com
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Suresh Siddha &lt;sbsiddha@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I'm not sure if I'm completely missing something here, but AFAIKS the
reference to the mysterious "COW SMC race" confuses the issue.  The
original changelog and mailing list thread didn't help me either.

This SMC race is where the problem was detected, but isn't the general
problem bigger and more obvious: that the new PTE could be picked up at
any time by any TLB while entries for the old PTE exist in other TLBs
before the TLB flush takes effect?

The case where the iTLB and dTLB of a CPU are pointing at different pages
is an interesting one but follows from the general problem.

The other (minor) thing with the comment I think it makes it a bit clearer
to say what the old code was doing (i.e., it avoids the race as opposed to
what?).

References: 4ce072f1faf29 ("mm: fix a race condition under SMC + COW")
Link: https://lkml.kernel.org/r/20201215121119.351650-1-npiggin@gmail.com
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Suresh Siddha &lt;sbsiddha@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/mremap.c: fix extent calculation</title>
<updated>2020-12-29T23:36:49+00:00</updated>
<author>
<name>Kalesh Singh</name>
<email>kaleshsingh@google.com</email>
</author>
<published>2020-12-29T23:14:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e05986ee7a5814bec0e0075d813daca3d46e4a9e'/>
<id>e05986ee7a5814bec0e0075d813daca3d46e4a9e</id>
<content type='text'>
When `next &lt; old_addr`, `next - old_addr` arithmetic underflows causing
`extent` to be incorrect.

Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`.

Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com
Fixes: c49dd34018026 ("mm: speedup mremap on 1GB or larger regions")
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Lokesh Gidra &lt;lokeshgidra@google.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When `next &lt; old_addr`, `next - old_addr` arithmetic underflows causing
`extent` to be incorrect.

Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`.

Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com
Fixes: c49dd34018026 ("mm: speedup mremap on 1GB or larger regions")
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Lokesh Gidra &lt;lokeshgidra@google.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memmap defer init doesn't work as expected</title>
<updated>2020-12-29T23:36:49+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2020-12-29T23:14:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2'/>
<id>dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2</id>
<content type='text'>
VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb376 ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.

Before the commit:

  [0.033176] Normal zone: 1445888 pages used for memmap
  [0.033176] Normal zone: 89391104 pages, LIFO batch:63
  [0.035851] ACPI: PM-Timer IO Port: 0x448

With commit

  [0.026874] Normal zone: 1445888 pages used for memmap
  [0.026875] Normal zone: 89391104 pages, LIFO batch:63
  [2.028450] ACPI: PM-Timer IO Port: 0x448

The root cause is the current memmap defer init doesn't work as expected.

Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node.  However, since commit 73a6e474cb376,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.

E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2.  The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff].  In fact, we only expect to see that there's
only one memory section's memmap initialized.  That's why more time is
costed at the time.

[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]

Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.

Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Reported-by: Rahul Gopakumar &lt;gopakumarr@vmware.com&gt;
Reviewed-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb376 ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.

Before the commit:

  [0.033176] Normal zone: 1445888 pages used for memmap
  [0.033176] Normal zone: 89391104 pages, LIFO batch:63
  [0.035851] ACPI: PM-Timer IO Port: 0x448

With commit

  [0.026874] Normal zone: 1445888 pages used for memmap
  [0.026875] Normal zone: 89391104 pages, LIFO batch:63
  [2.028450] ACPI: PM-Timer IO Port: 0x448

The root cause is the current memmap defer init doesn't work as expected.

Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node.  However, since commit 73a6e474cb376,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.

E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2.  The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff].  In fact, we only expect to see that there's
only one memory section's memmap initialized.  That's why more time is
costed at the time.

[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]

Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.

Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Reported-by: Rahul Gopakumar &lt;gopakumarr@vmware.com&gt;
Reviewed-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/hugetlb: fix deadlock in hugetlb_cow error path</title>
<updated>2020-12-29T23:36:49+00:00</updated>
<author>
<name>Mike Kravetz</name>
<email>mike.kravetz@oracle.com</email>
</author>
<published>2020-12-29T23:14:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e7dd91c456a8cdbcd7066997d15e36d14276a949'/>
<id>e7dd91c456a8cdbcd7066997d15e36d14276a949</id>
<content type='text'>
syzbot reported the deadlock here [1].  The issue is in hugetlb cow
error handling when there are not enough huge pages for the faulting
task which took the original reservation.  It is possible that other
(child) tasks could have consumed pages associated with the reservation.
In this case, we want the task which took the original reservation to
succeed.  So, we unmap any associated pages in children so that they can
be used by the faulting task that owns the reservation.

The unmapping code needs to hold i_mmap_rwsem in write mode.  However,
due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd
sharing synchronization") we are already holding i_mmap_rwsem in read
mode when hugetlb_cow is called.

Technically, i_mmap_rwsem does not need to be held in read mode for COW
mappings as they can not share pmd's.  Modifying the fault code to not
take i_mmap_rwsem in read mode for COW (and other non-sharable) mappings
is too involved for a stable fix.

Instead, we simply drop the hugetlb_fault_mutex and i_mmap_rwsem before
unmapping.  This is OK as it is technically not needed.  They are
reacquired after unmapping as expected by calling code.  Since this is
done in an uncommon error path, the overhead of dropping and reacquiring
mutexes is acceptable.

While making changes, remove redundant BUG_ON after unmap_ref_private.

[1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@google.com

Link: https://lkml.kernel.org/r/4c5781b8-3b00-761e-c0c7-c5edebb6ec1a@oracle.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Reported-by: syzbot+5eee4145df3c15e96625@syzkaller.appspotmail.com
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: "Aneesh Kumar K . V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
syzbot reported the deadlock here [1].  The issue is in hugetlb cow
error handling when there are not enough huge pages for the faulting
task which took the original reservation.  It is possible that other
(child) tasks could have consumed pages associated with the reservation.
In this case, we want the task which took the original reservation to
succeed.  So, we unmap any associated pages in children so that they can
be used by the faulting task that owns the reservation.

The unmapping code needs to hold i_mmap_rwsem in write mode.  However,
due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd
sharing synchronization") we are already holding i_mmap_rwsem in read
mode when hugetlb_cow is called.

Technically, i_mmap_rwsem does not need to be held in read mode for COW
mappings as they can not share pmd's.  Modifying the fault code to not
take i_mmap_rwsem in read mode for COW (and other non-sharable) mappings
is too involved for a stable fix.

Instead, we simply drop the hugetlb_fault_mutex and i_mmap_rwsem before
unmapping.  This is OK as it is technically not needed.  They are
reacquired after unmapping as expected by calling code.  Since this is
done in an uncommon error path, the overhead of dropping and reacquiring
mutexes is acceptable.

While making changes, remove redundant BUG_ON after unmap_ref_private.

[1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@google.com

Link: https://lkml.kernel.org/r/4c5781b8-3b00-761e-c0c7-c5edebb6ec1a@oracle.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Reported-by: syzbot+5eee4145df3c15e96625@syzkaller.appspotmail.com
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: "Aneesh Kumar K . V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost</title>
<updated>2020-12-24T20:06:46+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-12-24T20:06:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=64145482d3339d71f58857591d021588040543f4'/>
<id>64145482d3339d71f58857591d021588040543f4</id>
<content type='text'>
Pull virtio updates from Michael Tsirkin:

 - vdpa sim refactoring

 - virtio mem: Big Block Mode support

 - misc cleanus, fixes

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (61 commits)
  vdpa: Use simpler version of ida allocation
  vdpa: Add missing comment for virtqueue count
  uapi: virtio_ids: add missing device type IDs from OASIS spec
  uapi: virtio_ids.h: consistent indentions
  vhost scsi: fix error return code in vhost_scsi_set_endpoint()
  virtio_ring: Fix two use after free bugs
  virtio_net: Fix error code in probe()
  virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
  tools/virtio: add barrier for aarch64
  tools/virtio: add krealloc_array
  tools/virtio: include asm/bug.h
  vdpa/mlx5: Use write memory barrier after updating CQ index
  vdpa: split vdpasim to core and net modules
  vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov
  vdpa_sim: make vdpasim-&gt;buffer size configurable
  vdpa_sim: use kvmalloc to allocate vdpasim-&gt;buffer
  vdpa_sim: set vringh notify callback
  vdpa_sim: add set_config callback in vdpasim_dev_attr
  vdpa_sim: add get_config callback in vdpasim_dev_attr
  vdpa_sim: make 'config' generic and usable for any device type
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull virtio updates from Michael Tsirkin:

 - vdpa sim refactoring

 - virtio mem: Big Block Mode support

 - misc cleanus, fixes

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (61 commits)
  vdpa: Use simpler version of ida allocation
  vdpa: Add missing comment for virtqueue count
  uapi: virtio_ids: add missing device type IDs from OASIS spec
  uapi: virtio_ids.h: consistent indentions
  vhost scsi: fix error return code in vhost_scsi_set_endpoint()
  virtio_ring: Fix two use after free bugs
  virtio_net: Fix error code in probe()
  virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
  tools/virtio: add barrier for aarch64
  tools/virtio: add krealloc_array
  tools/virtio: include asm/bug.h
  vdpa/mlx5: Use write memory barrier after updating CQ index
  vdpa: split vdpasim to core and net modules
  vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov
  vdpa_sim: make vdpasim-&gt;buffer size configurable
  vdpa_sim: use kvmalloc to allocate vdpasim-&gt;buffer
  vdpa_sim: set vringh notify callback
  vdpa_sim: add set_config callback in vdpasim_dev_attr
  vdpa_sim: add get_config callback in vdpasim_dev_attr
  vdpa_sim: make 'config' generic and usable for any device type
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>kasan, mm: allow cache merging with no metadata</title>
<updated>2020-12-22T20:55:09+00:00</updated>
<author>
<name>Andrey Konovalov</name>
<email>andreyknvl@google.com</email>
</author>
<published>2020-12-22T20:03:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e86f8b09f215e3755cd2d56930487dec2de02433'/>
<id>e86f8b09f215e3755cd2d56930487dec2de02433</id>
<content type='text'>
The reason cache merging is disabled with KASAN is because KASAN puts its
metadata right after the allocated object. When the merged caches have
slightly different sizes, the metadata ends up in different places, which
KASAN doesn't support.

It might be possible to adjust the metadata allocation algorithm and make
it friendly to the cache merging code. Instead this change takes a simpler
approach and allows merging caches when no metadata is present. Which is
the case for hardware tag-based KASAN with kasan.mode=prod.

Link: https://lkml.kernel.org/r/37497e940bfd4b32c0a93a702a9ae4cf061d5392.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ia114847dfb2244f297d2cb82d592bf6a07455dba
Co-developed-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Branislav Rankov &lt;Branislav.Rankov@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The reason cache merging is disabled with KASAN is because KASAN puts its
metadata right after the allocated object. When the merged caches have
slightly different sizes, the metadata ends up in different places, which
KASAN doesn't support.

It might be possible to adjust the metadata allocation algorithm and make
it friendly to the cache merging code. Instead this change takes a simpler
approach and allows merging caches when no metadata is present. Which is
the case for hardware tag-based KASAN with kasan.mode=prod.

Link: https://lkml.kernel.org/r/37497e940bfd4b32c0a93a702a9ae4cf061d5392.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Ia114847dfb2244f297d2cb82d592bf6a07455dba
Co-developed-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Branislav Rankov &lt;Branislav.Rankov@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kasan: sanitize objects when metadata doesn't fit</title>
<updated>2020-12-22T20:55:09+00:00</updated>
<author>
<name>Andrey Konovalov</name>
<email>andreyknvl@google.com</email>
</author>
<published>2020-12-22T20:03:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=97593cad003c668e2532cb2939a24a031f8de52d'/>
<id>97593cad003c668e2532cb2939a24a031f8de52d</id>
<content type='text'>
KASAN marks caches that are sanitized with the SLAB_KASAN cache flag.
Currently if the metadata that is appended after the object (stores e.g.
stack trace ids) doesn't fit into KMALLOC_MAX_SIZE (can only happen with
SLAB, see the comment in the patch), KASAN turns off sanitization
completely.

With this change sanitization of the object data is always enabled.
However the metadata is only stored when it fits.  Instead of checking for
SLAB_KASAN flag accross the code to find out whether the metadata is
there, use cache-&gt;kasan_info.alloc/free_meta_offset.  As 0 can be a valid
value for free_meta_offset, introduce KASAN_NO_FREE_META as an indicator
that the free metadata is missing.

Without this change all sanitized KASAN objects would be put into
quarantine with generic KASAN.  With this change, only the objects that
have metadata (i.e.  when it fits) are put into quarantine, the rest is
freed right away.

Along the way rework __kasan_cache_create() and add claryfying comments.

Link: https://lkml.kernel.org/r/aee34b87a5e4afe586c2ac6a0b32db8dc4dcc2dc.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Icd947e2bea054cb5cfbdc6cf6652227d97032dcb
Co-developed-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Branislav Rankov &lt;Branislav.Rankov@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
KASAN marks caches that are sanitized with the SLAB_KASAN cache flag.
Currently if the metadata that is appended after the object (stores e.g.
stack trace ids) doesn't fit into KMALLOC_MAX_SIZE (can only happen with
SLAB, see the comment in the patch), KASAN turns off sanitization
completely.

With this change sanitization of the object data is always enabled.
However the metadata is only stored when it fits.  Instead of checking for
SLAB_KASAN flag accross the code to find out whether the metadata is
there, use cache-&gt;kasan_info.alloc/free_meta_offset.  As 0 can be a valid
value for free_meta_offset, introduce KASAN_NO_FREE_META as an indicator
that the free metadata is missing.

Without this change all sanitized KASAN objects would be put into
quarantine with generic KASAN.  With this change, only the objects that
have metadata (i.e.  when it fits) are put into quarantine, the rest is
freed right away.

Along the way rework __kasan_cache_create() and add claryfying comments.

Link: https://lkml.kernel.org/r/aee34b87a5e4afe586c2ac6a0b32db8dc4dcc2dc.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Icd947e2bea054cb5cfbdc6cf6652227d97032dcb
Co-developed-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Vincenzo Frascino &lt;Vincenzo.Frascino@arm.com&gt;
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Branislav Rankov &lt;Branislav.Rankov@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kasan: clarify comment in __kasan_kfree_large</title>
<updated>2020-12-22T20:55:09+00:00</updated>
<author>
<name>Andrey Konovalov</name>
<email>andreyknvl@google.com</email>
</author>
<published>2020-12-22T20:03:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3933c1757163e8fb471a2d306ba769a04a698900'/>
<id>3933c1757163e8fb471a2d306ba769a04a698900</id>
<content type='text'>
Currently it says that the memory gets poisoned by page_alloc code.
Clarify this by mentioning the specific callback that poisons the memory.

Link: https://lkml.kernel.org/r/1c8380fe0332a3bcc720fe29f1e0bef2e2974416.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I1334dffb69b87d7986fab88a1a039cc3ea764725
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Branislav Rankov &lt;Branislav.Rankov@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently it says that the memory gets poisoned by page_alloc code.
Clarify this by mentioning the specific callback that poisons the memory.

Link: https://lkml.kernel.org/r/1c8380fe0332a3bcc720fe29f1e0bef2e2974416.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/I1334dffb69b87d7986fab88a1a039cc3ea764725
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Branislav Rankov &lt;Branislav.Rankov@arm.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Evgenii Stepanov &lt;eugenis@google.com&gt;
Cc: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
