<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/binfmt_elf.c, branch v5.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'binfmt_elf-v5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux</title>
<updated>2022-03-01T19:31:37+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-03-01T19:31:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=575115360652e9920cc56a028a286ebe9bf82694'/>
<id>575115360652e9920cc56a028a286ebe9bf82694</id>
<content type='text'>
Pull binfmt_elf fix from Kees Cook:
 "This addresses a regression[1] under ia64 where some ET_EXEC binaries
  were not loading"

Link: https://linux-regtracking.leemhuis.info/regzbot/regression/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info/ [1]

- Fix ia64 ET_EXEC loading

* tag 'binfmt_elf-v5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_elf: Avoid total_mapping_size for ET_EXEC
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull binfmt_elf fix from Kees Cook:
 "This addresses a regression[1] under ia64 where some ET_EXEC binaries
  were not loading"

Link: https://linux-regtracking.leemhuis.info/regzbot/regression/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info/ [1]

- Fix ia64 ET_EXEC loading

* tag 'binfmt_elf-v5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_elf: Avoid total_mapping_size for ET_EXEC
</pre>
</div>
</content>
</entry>
<entry>
<title>binfmt_elf: Avoid total_mapping_size for ET_EXEC</title>
<updated>2022-03-01T18:29:20+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2022-02-28T18:59:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=439a8468242b313486e69b8cc3b45ddcfa898fbf'/>
<id>439a8468242b313486e69b8cc3b45ddcfa898fbf</id>
<content type='text'>
Partially revert commit 5f501d555653 ("binfmt_elf: reintroduce using
MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size"
logic also to ET_EXEC.

At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address
contiguous (but _are_ file-offset contiguous). This would result in a
giant mapping attempting to cover the entire span, including the virtual
address range hole, and well beyond the size of the ELF file itself,
causing the kernel to refuse to load it. For example:

$ readelf -lW /usr/bin/gcc
...
Program Headers:
  Type Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   ...
...
  LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ...
  LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ...
...
       ^^^^^^^^ ^^^^^^^^^^^^^^^^^^                    ^^^^^^^^ ^^^^^^^^

File offset range     : 0x000000-0x00bb4c
			0x00bb4c bytes

Virtual address range : 0x4000000000000000-0x600000000000bcb0
			0x200000000000bcb0 bytes

Remove the total_mapping_size logic for ET_EXEC, which reduces the
ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better
than nothing), and retains it for ET_DYN.

Ironically, this is the reverse of the problem that originally caused
problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future
work could restore full coverage if load_elf_binary() were to perform
mappings in a separate phase from the loading (where it could resolve
both overlaps and holes).

Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Reported-by: matoro &lt;matoro_bugzilla_kernel@matoro.tk&gt;
Fixes: 5f501d555653 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE")
Link: https://lore.kernel.org/r/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info
Tested-by: matoro &lt;matoro_mailinglist_kernel@matoro.tk&gt;
Link: https://lore.kernel.org/lkml/ce8af9c13bcea9230c7689f3c1e0e2cd@matoro.tk
Tested-By: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Link: https://lore.kernel.org/lkml/49182d0d-708b-4029-da5f-bc18603440a6@physik.fu-berlin.de
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Partially revert commit 5f501d555653 ("binfmt_elf: reintroduce using
MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size"
logic also to ET_EXEC.

At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address
contiguous (but _are_ file-offset contiguous). This would result in a
giant mapping attempting to cover the entire span, including the virtual
address range hole, and well beyond the size of the ELF file itself,
causing the kernel to refuse to load it. For example:

$ readelf -lW /usr/bin/gcc
...
Program Headers:
  Type Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   ...
...
  LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ...
  LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ...
...
       ^^^^^^^^ ^^^^^^^^^^^^^^^^^^                    ^^^^^^^^ ^^^^^^^^

File offset range     : 0x000000-0x00bb4c
			0x00bb4c bytes

Virtual address range : 0x4000000000000000-0x600000000000bcb0
			0x200000000000bcb0 bytes

Remove the total_mapping_size logic for ET_EXEC, which reduces the
ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better
than nothing), and retains it for ET_DYN.

Ironically, this is the reverse of the problem that originally caused
problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future
work could restore full coverage if load_elf_binary() were to perform
mappings in a separate phase from the loading (where it could resolve
both overlaps and holes).

Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Reported-by: matoro &lt;matoro_bugzilla_kernel@matoro.tk&gt;
Fixes: 5f501d555653 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE")
Link: https://lore.kernel.org/r/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info
Tested-by: matoro &lt;matoro_mailinglist_kernel@matoro.tk&gt;
Link: https://lore.kernel.org/lkml/ce8af9c13bcea9230c7689f3c1e0e2cd@matoro.tk
Tested-By: John Paul Adrian Glaubitz &lt;glaubitz@physik.fu-berlin.de&gt;
Link: https://lore.kernel.org/lkml/49182d0d-708b-4029-da5f-bc18603440a6@physik.fu-berlin.de
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/binfmt_elf: fix PT_LOAD p_align values for loaders</title>
<updated>2022-02-12T01:55:00+00:00</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2022-02-12T00:32:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=925346c129da1171222a9cdb11fa2b734d9955da'/>
<id>925346c129da1171222a9cdb11fa2b734d9955da</id>
<content type='text'>
Rui Salvaterra reported that Aisleroit solitaire crashes with "Wrong
__data_start/_end pair" assertion from libgc after update to v5.17-rc1.

Bisection pointed to commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD
p_align values for static PIE") that fixed handling of static PIEs, but
made the condition that guards load_bias calculation to exclude loader
binaries.

Restoring the check for presence of interpreter fixes the problem.

Link: https://lkml.kernel.org/r/20220202121433.3697146-1-rppt@kernel.org
Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Reported-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Tested-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: "H.J. Lu" &lt;hjl.tools@gmail.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rui Salvaterra reported that Aisleroit solitaire crashes with "Wrong
__data_start/_end pair" assertion from libgc after update to v5.17-rc1.

Bisection pointed to commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD
p_align values for static PIE") that fixed handling of static PIEs, but
made the condition that guards load_bias calculation to exclude loader
binaries.

Restoring the check for presence of interpreter fixes the problem.

Link: https://lkml.kernel.org/r/20220202121433.3697146-1-rppt@kernel.org
Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Reported-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Tested-by: Rui Salvaterra &lt;rsalvaterra@gmail.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: "H.J. Lu" &lt;hjl.tools@gmail.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/binfmt_elf: use PT_LOAD p_align values for static PIE</title>
<updated>2022-01-20T06:52:54+00:00</updated>
<author>
<name>H.J. Lu</name>
<email>hjl.tools@gmail.com</email>
</author>
<published>2022-01-20T02:09:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9630f0d60fec5fbcaa4435a66f75df1dc9704b66'/>
<id>9630f0d60fec5fbcaa4435a66f75df1dc9704b66</id>
<content type='text'>
Extend commit ce81bb256a22 ("fs/binfmt_elf: use PT_LOAD p_align values
for suitable start address") which fixed PIE binaries built with
-Wl,-z,max-page-size=0x200000, to cover static PIE binaries.  This
fixes:

    https://bugzilla.kernel.org/show_bug.cgi?id=215275

Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading.

Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.com
Signed-off-by: H.J. Lu &lt;hjl.tools@gmail.com&gt;
Cc: Chris Kennelly &lt;ckennelly@google.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Sandeep Patil &lt;sspatil@google.com&gt;
Cc: Fangrui Song &lt;maskray@google.com&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extend commit ce81bb256a22 ("fs/binfmt_elf: use PT_LOAD p_align values
for suitable start address") which fixed PIE binaries built with
-Wl,-z,max-page-size=0x200000, to cover static PIE binaries.  This
fixes:

    https://bugzilla.kernel.org/show_bug.cgi?id=215275

Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading.

Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.com
Signed-off-by: H.J. Lu &lt;hjl.tools@gmail.com&gt;
Cc: Chris Kennelly &lt;ckennelly@google.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Sandeep Patil &lt;sspatil@google.com&gt;
Cc: Fangrui Song &lt;maskray@google.com&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/binfmt_elf: replace open-coded string copy with get_task_comm</title>
<updated>2022-01-20T06:52:53+00:00</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2022-01-20T02:08:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=95af469c4f609de011debc08e7a35b45201623a8'/>
<id>95af469c4f609de011debc08e7a35b45201623a8</id>
<content type='text'>
It is better to use get_task_comm() instead of the open coded string
copy as we do in other places.

struct elf_prpsinfo is used to dump the task information in userspace
coredump or kernel vmcore.  Below is the verification of vmcore,

  crash&gt; ps
     PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
        0      0   0  ffffffff9d21a940  RU   0.0       0      0  [swapper/0]
  &gt;     0      0   1  ffffa09e40f85e80  RU   0.0       0      0  [swapper/1]
  &gt;     0      0   2  ffffa09e40f81f80  RU   0.0       0      0  [swapper/2]
  &gt;     0      0   3  ffffa09e40f83f00  RU   0.0       0      0  [swapper/3]
  &gt;     0      0   4  ffffa09e40f80000  RU   0.0       0      0  [swapper/4]
  &gt;     0      0   5  ffffa09e40f89f80  RU   0.0       0      0  [swapper/5]
        0      0   6  ffffa09e40f8bf00  RU   0.0       0      0  [swapper/6]
  &gt;     0      0   7  ffffa09e40f88000  RU   0.0       0      0  [swapper/7]
  &gt;     0      0   8  ffffa09e40f8de80  RU   0.0       0      0  [swapper/8]
  &gt;     0      0   9  ffffa09e40f95e80  RU   0.0       0      0  [swapper/9]
  &gt;     0      0  10  ffffa09e40f91f80  RU   0.0       0      0  [swapper/10]
  &gt;     0      0  11  ffffa09e40f93f00  RU   0.0       0      0  [swapper/11]
  &gt;     0      0  12  ffffa09e40f90000  RU   0.0       0      0  [swapper/12]
  &gt;     0      0  13  ffffa09e40f9bf00  RU   0.0       0      0  [swapper/13]
  &gt;     0      0  14  ffffa09e40f98000  RU   0.0       0      0  [swapper/14]
  &gt;     0      0  15  ffffa09e40f9de80  RU   0.0       0      0  [swapper/15]

It works well as expected.

Some comments are added to explain why we use the hard-coded 16.

Link: https://lkml.kernel.org/r/20211120112738.45980-5-laoar.shao@gmail.com
Suggested-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;arnaldo.melo@gmail.com&gt;
Cc: Andrii Nakryiko &lt;andrii.nakryiko@gmail.com&gt;
Cc: Michal Miroslaw &lt;mirq-linux@rere.qmqm.pl&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Dennis Dalessandro &lt;dennis.dalessandro@cornelisnetworks.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is better to use get_task_comm() instead of the open coded string
copy as we do in other places.

struct elf_prpsinfo is used to dump the task information in userspace
coredump or kernel vmcore.  Below is the verification of vmcore,

  crash&gt; ps
     PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
        0      0   0  ffffffff9d21a940  RU   0.0       0      0  [swapper/0]
  &gt;     0      0   1  ffffa09e40f85e80  RU   0.0       0      0  [swapper/1]
  &gt;     0      0   2  ffffa09e40f81f80  RU   0.0       0      0  [swapper/2]
  &gt;     0      0   3  ffffa09e40f83f00  RU   0.0       0      0  [swapper/3]
  &gt;     0      0   4  ffffa09e40f80000  RU   0.0       0      0  [swapper/4]
  &gt;     0      0   5  ffffa09e40f89f80  RU   0.0       0      0  [swapper/5]
        0      0   6  ffffa09e40f8bf00  RU   0.0       0      0  [swapper/6]
  &gt;     0      0   7  ffffa09e40f88000  RU   0.0       0      0  [swapper/7]
  &gt;     0      0   8  ffffa09e40f8de80  RU   0.0       0      0  [swapper/8]
  &gt;     0      0   9  ffffa09e40f95e80  RU   0.0       0      0  [swapper/9]
  &gt;     0      0  10  ffffa09e40f91f80  RU   0.0       0      0  [swapper/10]
  &gt;     0      0  11  ffffa09e40f93f00  RU   0.0       0      0  [swapper/11]
  &gt;     0      0  12  ffffa09e40f90000  RU   0.0       0      0  [swapper/12]
  &gt;     0      0  13  ffffa09e40f9bf00  RU   0.0       0      0  [swapper/13]
  &gt;     0      0  14  ffffa09e40f98000  RU   0.0       0      0  [swapper/14]
  &gt;     0      0  15  ffffa09e40f9de80  RU   0.0       0      0  [swapper/15]

It works well as expected.

Some comments are added to explain why we use the hard-coded 16.

Link: https://lkml.kernel.org/r/20211120112738.45980-5-laoar.shao@gmail.com
Suggested-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;arnaldo.melo@gmail.com&gt;
Cc: Andrii Nakryiko &lt;andrii.nakryiko@gmail.com&gt;
Cc: Michal Miroslaw &lt;mirq-linux@rere.qmqm.pl&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Dennis Dalessandro &lt;dennis.dalessandro@cornelisnetworks.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2021-11-09T18:11:53+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-11-09T18:11:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=59a2ceeef6d6bb8f68550fdbd84246b74a99f06b'/>
<id>59a2ceeef6d6bb8f68550fdbd84246b74a99f06b</id>
<content type='text'>
Merge more updates from Andrew Morton:
 "87 patches.

  Subsystems affected by this patch series: mm (pagecache and hugetlb),
  procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs,
  init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork,
  sysvfs, kcov, gdb, resource, selftests, and ipc"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (87 commits)
  ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
  ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
  selftests/kselftest/runner/run_one(): allow running non-executable files
  virtio-mem: disallow mapping virtio-mem memory via /dev/mem
  kernel/resource: disallow access to exclusive system RAM regions
  kernel/resource: clean up and optimize iomem_is_exclusive()
  scripts/gdb: handle split debug for vmlinux
  kcov: replace local_irq_save() with a local_lock_t
  kcov: avoid enable+disable interrupts if !in_task()
  kcov: allocate per-CPU memory on the relevant node
  Documentation/kcov: define `ip' in the example
  Documentation/kcov: include types.h in the example
  sysv: use BUILD_BUG_ON instead of runtime check
  kernel/fork.c: unshare(): use swap() to make code cleaner
  seq_file: fix passing wrong private data
  seq_file: move seq_escape() to a header
  signal: remove duplicate include in signal.h
  crash_dump: remove duplicate include in crash_dump.h
  crash_dump: fix boolreturn.cocci warning
  hfs/hfsplus: use WARN_ON for sanity check
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge more updates from Andrew Morton:
 "87 patches.

  Subsystems affected by this patch series: mm (pagecache and hugetlb),
  procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs,
  init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork,
  sysvfs, kcov, gdb, resource, selftests, and ipc"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (87 commits)
  ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
  ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
  selftests/kselftest/runner/run_one(): allow running non-executable files
  virtio-mem: disallow mapping virtio-mem memory via /dev/mem
  kernel/resource: disallow access to exclusive system RAM regions
  kernel/resource: clean up and optimize iomem_is_exclusive()
  scripts/gdb: handle split debug for vmlinux
  kcov: replace local_irq_save() with a local_lock_t
  kcov: avoid enable+disable interrupts if !in_task()
  kcov: allocate per-CPU memory on the relevant node
  Documentation/kcov: define `ip' in the example
  Documentation/kcov: include types.h in the example
  sysv: use BUILD_BUG_ON instead of runtime check
  kernel/fork.c: unshare(): use swap() to make code cleaner
  seq_file: fix passing wrong private data
  seq_file: move seq_escape() to a header
  signal: remove duplicate include in signal.h
  crash_dump: remove duplicate include in crash_dump.h
  crash_dump: fix boolreturn.cocci warning
  hfs/hfsplus: use WARN_ON for sanity check
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>ELF: simplify STACK_ALLOC macro</title>
<updated>2021-11-09T18:02:50+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2021-11-09T02:33:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a43e5e3a0227435e580a9e58f57c5face82368ee'/>
<id>a43e5e3a0227435e580a9e58f57c5face82368ee</id>
<content type='text'>
"A -= B; A" is equivalent to "A -= B".

Link: https://lkml.kernel.org/r/YVmcP256fRMqCwgK@localhost.localdomain
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
"A -= B; A" is equivalent to "A -= B".

Link: https://lkml.kernel.org/r/YVmcP256fRMqCwgK@localhost.localdomain
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE</title>
<updated>2021-11-09T18:02:50+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2021-11-09T02:33:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f501d555653f8968011a1e65ebb121c8b43c144'/>
<id>5f501d555653f8968011a1e65ebb121c8b43c144</id>
<content type='text'>
Commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") reverted back to using MAP_FIXED to map ELF LOAD
segments because it was found that the segments in some binaries overlap
and can cause MAP_FIXED_NOREPLACE to fail.

The original intent of MAP_FIXED_NOREPLACE in the ELF loader was to
prevent the silent clobbering of an existing mapping (e.g.  stack) by
the ELF image, which could lead to exploitable conditions.  Quoting
commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map"),
which originally introduced the use of MAP_FIXED_NOREPLACE in the
loader:

    Both load_elf_interp and load_elf_binary rely on elf_map to map
    segments [to a specific] address and they use MAP_FIXED to enforce
    that. This is however [a] dangerous thing prone to silent data
    corruption which can be even exploitable.
    ...
    Let's take CVE-2017-1000253 as an example ... we could end up mapping
    [the executable] over the existing stack ... The [stack layout] issue
    has been fixed since then ... So we should be safe and any [similar]
    attack should be impractical. On the other hand this is just too
    subtle [an] assumption ... it can break quite easily and [be] hard to
    spot.
    ...
    Address this [weakness] by changing MAP_FIXED to the newly added
    MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is
    an existing mapping clashing with the requested one [instead of
    silently] clobbering it.

Then processing ET_DYN binaries the loader already calculates a total
size for the image when the first segment is mapped, maps the entire
image, and then unmaps the remainder before the remaining segments are
then individually mapped.

To avoid the earlier problems (legitimate overlapping LOAD segments
specified in the ELF), apply the same logic to ET_EXEC binaries as well.

For both ET_EXEC and ET_DYN+INTERP use MAP_FIXED_NOREPLACE for the
initial total size mapping and then use MAP_FIXED to build the final
(possibly legitimately overlapping) mappings.  For ET_DYN w/out INTERP,
continue to map at a system-selected address in the mmap region.

Link: https://lkml.kernel.org/r/20210916215947.3993776-1-keescook@chromium.org
Link: https://lore.kernel.org/lkml/1595869887-23307-2-git-send-email-anthony.yznaga@oracle.com
Co-developed-by: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Signed-off-by: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Chen Jingwen &lt;chenjingwen6@huawei.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrei Vagin &lt;avagin@openvz.org&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") reverted back to using MAP_FIXED to map ELF LOAD
segments because it was found that the segments in some binaries overlap
and can cause MAP_FIXED_NOREPLACE to fail.

The original intent of MAP_FIXED_NOREPLACE in the ELF loader was to
prevent the silent clobbering of an existing mapping (e.g.  stack) by
the ELF image, which could lead to exploitable conditions.  Quoting
commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map"),
which originally introduced the use of MAP_FIXED_NOREPLACE in the
loader:

    Both load_elf_interp and load_elf_binary rely on elf_map to map
    segments [to a specific] address and they use MAP_FIXED to enforce
    that. This is however [a] dangerous thing prone to silent data
    corruption which can be even exploitable.
    ...
    Let's take CVE-2017-1000253 as an example ... we could end up mapping
    [the executable] over the existing stack ... The [stack layout] issue
    has been fixed since then ... So we should be safe and any [similar]
    attack should be impractical. On the other hand this is just too
    subtle [an] assumption ... it can break quite easily and [be] hard to
    spot.
    ...
    Address this [weakness] by changing MAP_FIXED to the newly added
    MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is
    an existing mapping clashing with the requested one [instead of
    silently] clobbering it.

Then processing ET_DYN binaries the loader already calculates a total
size for the image when the first segment is mapped, maps the entire
image, and then unmaps the remainder before the remaining segments are
then individually mapped.

To avoid the earlier problems (legitimate overlapping LOAD segments
specified in the ELF), apply the same logic to ET_EXEC binaries as well.

For both ET_EXEC and ET_DYN+INTERP use MAP_FIXED_NOREPLACE for the
initial total size mapping and then use MAP_FIXED to build the final
(possibly legitimately overlapping) mappings.  For ET_DYN w/out INTERP,
continue to map at a system-selected address in the mmap region.

Link: https://lkml.kernel.org/r/20210916215947.3993776-1-keescook@chromium.org
Link: https://lore.kernel.org/lkml/1595869887-23307-2-git-send-email-anthony.yznaga@oracle.com
Co-developed-by: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Signed-off-by: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Chen Jingwen &lt;chenjingwen6@huawei.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrei Vagin &lt;avagin@openvz.org&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2021-11-03T19:15:29+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-11-03T19:15:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a602285ac11b019e9ce7c3907328e9f95f4967f0'/>
<id>a602285ac11b019e9ce7c3907328e9f95f4967f0</id>
<content type='text'>
Pull per signal_struct coredumps from Eric Biederman:
 "Current coredumps are mixed up with the exit code, the signal handling
  code, and the ptrace code making coredumps much more complicated than
  necessary and difficult to follow.

  This series of changes starts with ptrace_stop and cleans it up,
  making it easier to follow what is happening in ptrace_stop. Then
  cleans up the exec interactions with coredumps. Then cleans up the
  coredump interactions with exit. Finally the coredump interactions
  with the signal handling code is cleaned up.

  The first and last changes are bug fixes for minor bugs.

  I believe the fact that vfork followed by execve can kill the process
  the called vfork if exec fails is sufficient justification to change
  the userspace visible behavior.

  In previous discussions some of these changes were organized
  differently and individually appeared to make the code base worse. As
  currently written I believe they all stand on their own as cleanups
  and bug fixes.

  Which means that even if the worst should happen and the last change
  needs to be reverted for some unimaginable reason, the code base will
  still be improved.

  If the worst does not happen there are a more cleanups that can be
  made. Signals that generate coredumps can easily become eligible for
  short circuit delivery in complete_signal. The entire rendezvous for
  generating a coredump can move into get_signal. The function
  force_sig_info_to_task be written in a way that does not modify the
  signal handling state of the target task (because coredumps are
  eligible for short circuit delivery). Many of these future cleanups
  can be done another way but nothing so cleanly as if coredumps become
  per signal_struct"

* 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  coredump: Limit coredumps to a single thread group
  coredump:  Don't perform any cleanups before dumping core
  exit: Factor coredump_exit_mm out of exit_mm
  exec: Check for a pending fatal signal instead of core_state
  ptrace: Remove the unnecessary arguments from arch_ptrace_stop
  signal: Remove the bogus sigkill_pending in ptrace_stop
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull per signal_struct coredumps from Eric Biederman:
 "Current coredumps are mixed up with the exit code, the signal handling
  code, and the ptrace code making coredumps much more complicated than
  necessary and difficult to follow.

  This series of changes starts with ptrace_stop and cleans it up,
  making it easier to follow what is happening in ptrace_stop. Then
  cleans up the exec interactions with coredumps. Then cleans up the
  coredump interactions with exit. Finally the coredump interactions
  with the signal handling code is cleaned up.

  The first and last changes are bug fixes for minor bugs.

  I believe the fact that vfork followed by execve can kill the process
  the called vfork if exec fails is sufficient justification to change
  the userspace visible behavior.

  In previous discussions some of these changes were organized
  differently and individually appeared to make the code base worse. As
  currently written I believe they all stand on their own as cleanups
  and bug fixes.

  Which means that even if the worst should happen and the last change
  needs to be reverted for some unimaginable reason, the code base will
  still be improved.

  If the worst does not happen there are a more cleanups that can be
  made. Signals that generate coredumps can easily become eligible for
  short circuit delivery in complete_signal. The entire rendezvous for
  generating a coredump can move into get_signal. The function
  force_sig_info_to_task be written in a way that does not modify the
  signal handling state of the target task (because coredumps are
  eligible for short circuit delivery). Many of these future cleanups
  can be done another way but nothing so cleanly as if coredumps become
  per signal_struct"

* 'per_signal_struct_coredumps-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  coredump: Limit coredumps to a single thread group
  coredump:  Don't perform any cleanups before dumping core
  exit: Factor coredump_exit_mm out of exit_mm
  exec: Check for a pending fatal signal instead of core_state
  ptrace: Remove the unnecessary arguments from arch_ptrace_stop
  signal: Remove the bogus sigkill_pending in ptrace_stop
</pre>
</div>
</content>
</entry>
<entry>
<title>coredump: Limit coredumps to a single thread group</title>
<updated>2021-10-08T17:06:02+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2021-09-22T16:24:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0258b5fd7c7124b87e185a1a9322d2c66b1876b7'/>
<id>0258b5fd7c7124b87e185a1a9322d2c66b1876b7</id>
<content type='text'>
Today when a signal is delivered with a handler of SIG_DFL whose
default behavior is to generate a core dump not only that process but
every process that shares the mm is killed.

In the case of vfork this looks like a real world problem.  Consider
the following well defined sequence.

	if (vfork() == 0) {
		execve(...);
		_exit(EXIT_FAILURE);
	}

If a signal that generates a core dump is received after vfork but
before the execve changes the mm the process that called vfork will
also be killed (as the mm is shared).

Similarly if the execve fails after the point of no return the kernel
delivers SIGSEGV which will kill both the exec'ing process and because
the mm is shared the process that called vfork as well.

As far as I can tell this behavior is a violation of people's
reasonable expectations, POSIX, and is unnecessarily fragile when the
system is low on memory.

Solve this by making a userspace visible change to only kill a single
process/thread group.  This is possible because Jann Horn recently
modified[1] the coredump code so that the mm can safely be modified
while the coredump is happening.  With LinuxThreads long gone I don't
expect anyone to have a notice this behavior change in practice.

To accomplish this move the core_state pointer from mm_struct to
signal_struct, which allows different thread groups to coredump
simultatenously.

In zap_threads remove the work to kill anything except for the current
thread group.

v2: Remove core_state from the VM_BUG_ON_MM print to fix
    compile failure when CONFIG_DEBUG_VM is enabled.
    Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;

[1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133
Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Today when a signal is delivered with a handler of SIG_DFL whose
default behavior is to generate a core dump not only that process but
every process that shares the mm is killed.

In the case of vfork this looks like a real world problem.  Consider
the following well defined sequence.

	if (vfork() == 0) {
		execve(...);
		_exit(EXIT_FAILURE);
	}

If a signal that generates a core dump is received after vfork but
before the execve changes the mm the process that called vfork will
also be killed (as the mm is shared).

Similarly if the execve fails after the point of no return the kernel
delivers SIGSEGV which will kill both the exec'ing process and because
the mm is shared the process that called vfork as well.

As far as I can tell this behavior is a violation of people's
reasonable expectations, POSIX, and is unnecessarily fragile when the
system is low on memory.

Solve this by making a userspace visible change to only kill a single
process/thread group.  This is possible because Jann Horn recently
modified[1] the coredump code so that the mm can safely be modified
while the coredump is happening.  With LinuxThreads long gone I don't
expect anyone to have a notice this behavior change in practice.

To accomplish this move the core_state pointer from mm_struct to
signal_struct, which allows different thread groups to coredump
simultatenously.

In zap_threads remove the work to kill anything except for the current
thread group.

v2: Remove core_state from the VM_BUG_ON_MM print to fix
    compile failure when CONFIG_DEBUG_VM is enabled.
    Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;

[1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133
Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
