<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/binfmt_misc.c, branch v6.3</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>binfmt_misc: fix shift-out-of-bounds in check_special_flags</title>
<updated>2022-12-02T21:57:04+00:00</updated>
<author>
<name>Liu Shixin</name>
<email>liushixin2@huawei.com</email>
</author>
<published>2022-11-02T02:51:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6a46bf558803dd2b959ca7435a5c143efe837217'/>
<id>6a46bf558803dd2b959ca7435a5c143efe837217</id>
<content type='text'>
UBSAN reported a shift-out-of-bounds warning:

 left shift of 1 by 31 places cannot be represented in type 'int'
 Call Trace:
  &lt;TASK&gt;
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
  ubsan_epilogue+0xa/0x44 lib/ubsan.c:151
  __ubsan_handle_shift_out_of_bounds+0x1e7/0x208 lib/ubsan.c:322
  check_special_flags fs/binfmt_misc.c:241 [inline]
  create_entry fs/binfmt_misc.c:456 [inline]
  bm_register_write+0x9d3/0xa20 fs/binfmt_misc.c:654
  vfs_write+0x11e/0x580 fs/read_write.c:582
  ksys_write+0xcf/0x120 fs/read_write.c:637
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x34/0x80 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0033:0x4194e1

Since the type of Node's flags is unsigned long, we should define these
macros with same type too.

Signed-off-by: Liu Shixin &lt;liushixin2@huawei.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20221102025123.1117184-1-liushixin2@huawei.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
UBSAN reported a shift-out-of-bounds warning:

 left shift of 1 by 31 places cannot be represented in type 'int'
 Call Trace:
  &lt;TASK&gt;
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
  ubsan_epilogue+0xa/0x44 lib/ubsan.c:151
  __ubsan_handle_shift_out_of_bounds+0x1e7/0x208 lib/ubsan.c:322
  check_special_flags fs/binfmt_misc.c:241 [inline]
  create_entry fs/binfmt_misc.c:456 [inline]
  bm_register_write+0x9d3/0xa20 fs/binfmt_misc.c:654
  vfs_write+0x11e/0x580 fs/read_write.c:582
  ksys_write+0xcf/0x120 fs/read_write.c:637
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x34/0x80 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0033:0x4194e1

Since the type of Node's flags is unsigned long, we should define these
macros with same type too.

Signed-off-by: Liu Shixin &lt;liushixin2@huawei.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20221102025123.1117184-1-liushixin2@huawei.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix regression due to "fs: move binfmt_misc sysctl to its own file"</title>
<updated>2022-02-09T17:50:02+00:00</updated>
<author>
<name>Domenico Andreoli</name>
<email>domenico.andreoli@linux.com</email>
</author>
<published>2022-02-09T07:49:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b42bc9a3c5115c3102a4923776bbeed3b191f2db'/>
<id>b42bc9a3c5115c3102a4923776bbeed3b191f2db</id>
<content type='text'>
Commit 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") did
not go unnoticed, binfmt-support stopped to work on my Debian system
since v5.17-rc2 (did not check with -rc1).

The existance of the /proc/sys/fs/binfmt_misc is a precondition for
attempting to mount the binfmt_misc fs, which in turn triggers the
autoload of the binfmt_misc module.  Without it, no module is loaded and
no binfmt is available at boot.

Building as built-in or manually loading the module and mounting the fs
works fine, it's therefore only a matter of interaction with user-space.
I could try to improve the Debian systemd configuration but I can't say
anything about the other distributions.

This patch restores a working system right after boot.

Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Domenico Andreoli &lt;domenico.andreoli@linux.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Reviewed-by: Tong Zhang &lt;ztong0001@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") did
not go unnoticed, binfmt-support stopped to work on my Debian system
since v5.17-rc2 (did not check with -rc1).

The existance of the /proc/sys/fs/binfmt_misc is a precondition for
attempting to mount the binfmt_misc fs, which in turn triggers the
autoload of the binfmt_misc module.  Without it, no module is loaded and
no binfmt is available at boot.

Building as built-in or manually loading the module and mounting the fs
works fine, it's therefore only a matter of interaction with user-space.
I could try to improve the Debian systemd configuration but I can't say
anything about the other distributions.

This patch restores a working system right after boot.

Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Domenico Andreoli &lt;domenico.andreoli@linux.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Reviewed-by: Tong Zhang &lt;ztong0001@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>binfmt_misc: fix crash when load/unload module</title>
<updated>2022-01-30T07:56:58+00:00</updated>
<author>
<name>Tong Zhang</name>
<email>ztong0001@gmail.com</email>
</author>
<published>2022-01-29T21:40:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e7f1e8834b2b2144dfbe0b2235d05e4f6f95882e'/>
<id>e7f1e8834b2b2144dfbe0b2235d05e4f6f95882e</id>
<content type='text'>
We should unregister the table upon module unload otherwise something
horrible will happen when we load binfmt_misc module again.  Also note
that we should keep value returned by register_sysctl_mount_point() and
release it later, otherwise it will leak.

Also, per Christian's comment, to fully restore the old behavior that
won't break userspace the check(binfmt_misc_header) should be
eliminated.

To reproduce:
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc

resulting in

  modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory

and an unhappy kernel:

  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  BUG: unable to handle page fault for address: fffffbfff8004802
  Call Trace:
    init_misc_binfmt+0x2d/0x1000 [binfmt_misc]

Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com
Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Tong Zhang &lt;ztong0001@gmail.com&gt;
Co-developed-by: Christian Brauner&lt;brauner@kernel.org&gt;
Acked-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We should unregister the table upon module unload otherwise something
horrible will happen when we load binfmt_misc module again.  Also note
that we should keep value returned by register_sysctl_mount_point() and
release it later, otherwise it will leak.

Also, per Christian's comment, to fully restore the old behavior that
won't break userspace the check(binfmt_misc_header) should be
eliminated.

To reproduce:
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc
  modprobe -r binfmt_misc
  modprobe binfmt_misc

resulting in

  modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory

and an unhappy kernel:

  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
  BUG: unable to handle page fault for address: fffffbfff8004802
  Call Trace:
    init_misc_binfmt+0x2d/0x1000 [binfmt_misc]

Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com
Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Tong Zhang &lt;ztong0001@gmail.com&gt;
Co-developed-by: Christian Brauner&lt;brauner@kernel.org&gt;
Acked-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: move binfmt_misc sysctl to its own file</title>
<updated>2022-01-22T06:33:35+00:00</updated>
<author>
<name>Luis Chamberlain</name>
<email>mcgrof@kernel.org</email>
</author>
<published>2022-01-22T06:12:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3ba442d5331ff4d4339faf4986d0841386196f92'/>
<id>3ba442d5331ff4d4339faf4986d0841386196f92</id>
<content type='text'>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

This moves the binfmt_misc sysctl to its own file to help remove clutter
from kernel/sysctl.c.

Link: https://lkml.kernel.org/r/20211124231435.1445213-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Antti Palosaari &lt;crope@iki.fi&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Douglas Gilbert &lt;dgilbert@interlog.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: James E.J. Bottomley &lt;jejb@linux.ibm.com&gt;
Cc: Jani Nikula &lt;jani.nikula@intel.com&gt;
Cc: Jani Nikula &lt;jani.nikula@linux.intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: John Ogness &lt;john.ogness@linutronix.de&gt;
Cc: Joonas Lahtinen &lt;joonas.lahtinen@linux.intel.com&gt;
Cc: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Cc: Julia Lawall &lt;julia.lawall@inria.fr&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Lukas Middendorf &lt;kernel@tuxforce.de&gt;
Cc: Mark Fasheh &lt;mark@fasheh.com&gt;
Cc: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Phillip Potter &lt;phil@philpotter.co.uk&gt;
Cc: Qing Wang &lt;wangqing@vivo.com&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: Sebastian Reichel &lt;sre@kernel.org&gt;
Cc: Sergey Senozhatsky &lt;senozhatsky@chromium.org&gt;
Cc: Stephen Kitt &lt;steve@sk2.org&gt;
Cc: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Xiaoming Ni &lt;nixiaoming@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

This moves the binfmt_misc sysctl to its own file to help remove clutter
from kernel/sysctl.c.

Link: https://lkml.kernel.org/r/20211124231435.1445213-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Antti Palosaari &lt;crope@iki.fi&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Douglas Gilbert &lt;dgilbert@interlog.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: James E.J. Bottomley &lt;jejb@linux.ibm.com&gt;
Cc: Jani Nikula &lt;jani.nikula@intel.com&gt;
Cc: Jani Nikula &lt;jani.nikula@linux.intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: John Ogness &lt;john.ogness@linutronix.de&gt;
Cc: Joonas Lahtinen &lt;joonas.lahtinen@linux.intel.com&gt;
Cc: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Cc: Julia Lawall &lt;julia.lawall@inria.fr&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Lukas Middendorf &lt;kernel@tuxforce.de&gt;
Cc: Mark Fasheh &lt;mark@fasheh.com&gt;
Cc: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Phillip Potter &lt;phil@philpotter.co.uk&gt;
Cc: Qing Wang &lt;wangqing@vivo.com&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: Sebastian Reichel &lt;sre@kernel.org&gt;
Cc: Sergey Senozhatsky &lt;senozhatsky@chromium.org&gt;
Cc: Stephen Kitt &lt;steve@sk2.org&gt;
Cc: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Xiaoming Ni &lt;nixiaoming@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>binfmt_misc: fix possible deadlock in bm_register_write</title>
<updated>2021-03-13T19:27:30+00:00</updated>
<author>
<name>Lior Ribak</name>
<email>liorribak@gmail.com</email>
</author>
<published>2021-03-13T05:07:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e7850f4d844e0acfac7e570af611d89deade3146'/>
<id>e7850f4d844e0acfac7e570af611d89deade3146</id>
<content type='text'>
There is a deadlock in bm_register_write:

First, in the begining of the function, a lock is taken on the binfmt_misc
root inode with inode_lock(d_inode(root)).

Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call
open_exec on the user-provided interpreter.

open_exec will call a path lookup, and if the path lookup process includes
the root of binfmt_misc, it will try to take a shared lock on its inode
again, but it is already locked, and the code will get stuck in a deadlock

To reproduce the bug:
$ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" &gt; /proc/sys/fs/binfmt_misc/register

backtrace of where the lock occurs (#5):
0  schedule () at ./arch/x86/include/asm/current.h:15
1  0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=&lt;optimized out&gt;, state=state@entry=2) at kernel/locking/rwsem.c:992
2  0xffffffff81b5150a in __down_read_common (state=2, sem=&lt;optimized out&gt;) at kernel/locking/rwsem.c:1213
3  __down_read (sem=&lt;optimized out&gt;) at kernel/locking/rwsem.c:1222
4  down_read (sem=&lt;optimized out&gt;) at kernel/locking/rwsem.c:1355
5  0xffffffff811ee22a in inode_lock_shared (inode=&lt;optimized out&gt;) at ./include/linux/fs.h:783
6  open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177
7  path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366
8  0xffffffff811efe1c in do_filp_open (dfd=&lt;optimized out&gt;, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396
9  0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=&lt;optimized out&gt;, flags@entry=0) at fs/exec.c:913
10 0xffffffff811e4a92 in open_exec (name=&lt;optimized out&gt;) at fs/exec.c:948
11 0xffffffff8124aa84 in bm_register_write (file=&lt;optimized out&gt;, buffer=&lt;optimized out&gt;, count=19, ppos=&lt;optimized out&gt;) at fs/binfmt_misc.c:682
12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF
", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603
13 0xffffffff811defda in ksys_write (fd=&lt;optimized out&gt;, buf=0xa758d0 ":iiiii:E::ii::i:CF
", count=19) at fs/read_write.c:658
14 0xffffffff81b49813 in do_syscall_64 (nr=&lt;optimized out&gt;, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46
15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120

To solve the issue, the open_exec call is moved to before the write
lock is taken by bm_register_write

Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com
Fixes: 948b701a607f1 ("binfmt_misc: add persistent opened binary handler for containers")
Signed-off-by: Lior Ribak &lt;liorribak@gmail.com&gt;
Acked-by: Helge Deller &lt;deller@gmx.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a deadlock in bm_register_write:

First, in the begining of the function, a lock is taken on the binfmt_misc
root inode with inode_lock(d_inode(root)).

Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call
open_exec on the user-provided interpreter.

open_exec will call a path lookup, and if the path lookup process includes
the root of binfmt_misc, it will try to take a shared lock on its inode
again, but it is already locked, and the code will get stuck in a deadlock

To reproduce the bug:
$ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" &gt; /proc/sys/fs/binfmt_misc/register

backtrace of where the lock occurs (#5):
0  schedule () at ./arch/x86/include/asm/current.h:15
1  0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=&lt;optimized out&gt;, state=state@entry=2) at kernel/locking/rwsem.c:992
2  0xffffffff81b5150a in __down_read_common (state=2, sem=&lt;optimized out&gt;) at kernel/locking/rwsem.c:1213
3  __down_read (sem=&lt;optimized out&gt;) at kernel/locking/rwsem.c:1222
4  down_read (sem=&lt;optimized out&gt;) at kernel/locking/rwsem.c:1355
5  0xffffffff811ee22a in inode_lock_shared (inode=&lt;optimized out&gt;) at ./include/linux/fs.h:783
6  open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177
7  path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366
8  0xffffffff811efe1c in do_filp_open (dfd=&lt;optimized out&gt;, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396
9  0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=&lt;optimized out&gt;, flags@entry=0) at fs/exec.c:913
10 0xffffffff811e4a92 in open_exec (name=&lt;optimized out&gt;) at fs/exec.c:948
11 0xffffffff8124aa84 in bm_register_write (file=&lt;optimized out&gt;, buffer=&lt;optimized out&gt;, count=19, ppos=&lt;optimized out&gt;) at fs/binfmt_misc.c:682
12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF
", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603
13 0xffffffff811defda in ksys_write (fd=&lt;optimized out&gt;, buf=0xa758d0 ":iiiii:E::ii::i:CF
", count=19) at fs/read_write.c:658
14 0xffffffff81b49813 in do_syscall_64 (nr=&lt;optimized out&gt;, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46
15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120

To solve the issue, the open_exec call is moved to before the write
lock is taken by bm_register_write

Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com
Fixes: 948b701a607f1 ("binfmt_misc: add persistent opened binary handler for containers")
Signed-off-by: Lior Ribak &lt;liorribak@gmail.com&gt;
Acked-by: Helge Deller &lt;deller@gmx.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>binfmt_misc: pass binfmt_misc flags to the interpreter</title>
<updated>2021-02-15T17:28:30+00:00</updated>
<author>
<name>Laurent Vivier</name>
<email>laurent@vivier.eu</email>
</author>
<published>2020-01-28T13:25:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2347961b11d4079deace3c81dceed460c08a8fc1'/>
<id>2347961b11d4079deace3c81dceed460c08a8fc1</id>
<content type='text'>
It can be useful to the interpreter to know which flags are in use.

For instance, knowing if the preserve-argv[0] is in use would
allow to skip the pathname argument.

This patch uses an unused auxiliary vector, AT_FLAGS, to add a
flag to inform interpreter if the preserve-argv[0] is enabled.

Note by Helge Deller:
The real-world user of this patch is qemu-user, which needs to know
if it has to preserve the argv[0]. See Debian bug #970460.

Signed-off-by: Laurent Vivier &lt;laurent@vivier.eu&gt;
Reviewed-by: YunQiang Su &lt;ysu@wavecomp.com&gt;
URL: http://bugs.debian.org/970460
Signed-off-by: Helge Deller &lt;deller@gmx.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It can be useful to the interpreter to know which flags are in use.

For instance, knowing if the preserve-argv[0] is in use would
allow to skip the pathname argument.

This patch uses an unused auxiliary vector, AT_FLAGS, to add a
flag to inform interpreter if the preserve-argv[0] is enabled.

Note by Helge Deller:
The real-world user of this patch is qemu-user, which needs to know
if it has to preserve the argv[0]. See Debian bug #970460.

Signed-off-by: Laurent Vivier &lt;laurent@vivier.eu&gt;
Reviewed-by: YunQiang Su &lt;ysu@wavecomp.com&gt;
URL: http://bugs.debian.org/970460
Signed-off-by: Helge Deller &lt;deller@gmx.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2020-06-05T02:18:29+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-05T02:18:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=886d7de631da71e30909980fdbf318f7caade262'/>
<id>886d7de631da71e30909980fdbf318f7caade262</id>
<content type='text'>
Merge yet more updates from Andrew Morton:

 - More MM work. 100ish more to go. Mike Rapoport's "mm: remove
   __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

 - Various other little subsystems

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (127 commits)
  lib/ubsan.c: fix gcc-10 warnings
  tools/testing/selftests/vm: remove duplicate headers
  selftests: vm: pkeys: fix multilib builds for x86
  selftests: vm: pkeys: use the correct page size on powerpc
  selftests/vm/pkeys: override access right definitions on powerpc
  selftests/vm/pkeys: test correct behaviour of pkey-0
  selftests/vm/pkeys: introduce a sub-page allocator
  selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
  selftests/vm/pkeys: associate key on a mapped page and detect write violation
  selftests/vm/pkeys: associate key on a mapped page and detect access violation
  selftests/vm/pkeys: improve checks to determine pkey support
  selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
  selftests/vm/pkeys: fix number of reserved powerpc pkeys
  selftests/vm/pkeys: introduce powerpc support
  selftests/vm/pkeys: introduce generic pkey abstractions
  selftests: vm: pkeys: use the correct huge page size
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
  selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
  selftests/vm/pkeys: fix pkey_disable_clear()
  selftests: vm: pkeys: add helpers for pkey bits
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge yet more updates from Andrew Morton:

 - More MM work. 100ish more to go. Mike Rapoport's "mm: remove
   __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

 - Various other little subsystems

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (127 commits)
  lib/ubsan.c: fix gcc-10 warnings
  tools/testing/selftests/vm: remove duplicate headers
  selftests: vm: pkeys: fix multilib builds for x86
  selftests: vm: pkeys: use the correct page size on powerpc
  selftests/vm/pkeys: override access right definitions on powerpc
  selftests/vm/pkeys: test correct behaviour of pkey-0
  selftests/vm/pkeys: introduce a sub-page allocator
  selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
  selftests/vm/pkeys: associate key on a mapped page and detect write violation
  selftests/vm/pkeys: associate key on a mapped page and detect access violation
  selftests/vm/pkeys: improve checks to determine pkey support
  selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
  selftests/vm/pkeys: fix number of reserved powerpc pkeys
  selftests/vm/pkeys: introduce powerpc support
  selftests/vm/pkeys: introduce generic pkey abstractions
  selftests: vm: pkeys: use the correct huge page size
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
  selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
  selftests/vm/pkeys: fix pkey_disable_clear()
  selftests: vm: pkeys: add helpers for pkey bits
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>exec: simplify the copy_strings_kernel calling convention</title>
<updated>2020-06-05T02:06:26+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-06-04T23:51:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=986db2d14a6dca6456b63b4f5c410ae2aab4ec9d'/>
<id>986db2d14a6dca6456b63b4f5c410ae2aab4ec9d</id>
<content type='text'>
copy_strings_kernel is always used with a single argument,
adjust the calling convention to that.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: http://lkml.kernel.org/r/20200501104105.2621149-2-hch@lst.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
copy_strings_kernel is always used with a single argument,
adjust the calling convention to that.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: http://lkml.kernel.org/r/20200501104105.2621149-2-hch@lst.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exec: Compute file based creds only once</title>
<updated>2020-05-30T03:00:54+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-05-30T03:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=56305aa9b6fab91a5555a45796b79c1b0a6353d1'/>
<id>56305aa9b6fab91a5555a45796b79c1b0a6353d1</id>
<content type='text'>
Move the computation of creds from prepare_binfmt into begin_new_exec
so that the creds need only be computed once.  This is just code
reorganization no semantic changes of any kind are made.

Moving the computation is safe.  I have looked through the kernel and
verified none of the binfmts look at bprm-&gt;cred directly, and that
there are no helpers that look at bprm-&gt;cred indirectly.  Which means
that it is not a problem to compute the bprm-&gt;cred later in the
execution flow as it is not used until it becomes current-&gt;cred.

A new function bprm_creds_from_file is added to contain the work that
needs to be done.  bprm_creds_from_file first computes which file
bprm-&gt;executable or most likely bprm-&gt;file that the bprm-&gt;creds
will be computed from.

The funciton bprm_fill_uid is updated to receive the file instead of
accessing bprm-&gt;file.  The now unnecessary work needed to reset the
bprm-&gt;cred-&gt;euid, and bprm-&gt;cred-&gt;egid is removed from brpm_fill_uid.
A small comment to document that bprm_fill_uid now only deals with the
work to handle suid and sgid files.  The default case is already
heandled by prepare_exec_creds.

The function security_bprm_repopulate_creds is renamed
security_bprm_creds_from_file and now is explicitly passed the file
from which to compute the creds.  The documentation of the
bprm_creds_from_file security hook is updated to explain when the hook
is called and what it needs to do.  The file is passed from
cap_bprm_creds_from_file into get_file_caps so that the caps are
computed for the appropriate file.  The now unnecessary work in
cap_bprm_creds_from_file to reset the ambient capabilites has been
removed.  A small comment to document that the work of
cap_bprm_creds_from_file is to read capabilities from the files
secureity attribute and derive capabilities from the fact the
user had uid 0 has been added.

Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the computation of creds from prepare_binfmt into begin_new_exec
so that the creds need only be computed once.  This is just code
reorganization no semantic changes of any kind are made.

Moving the computation is safe.  I have looked through the kernel and
verified none of the binfmts look at bprm-&gt;cred directly, and that
there are no helpers that look at bprm-&gt;cred indirectly.  Which means
that it is not a problem to compute the bprm-&gt;cred later in the
execution flow as it is not used until it becomes current-&gt;cred.

A new function bprm_creds_from_file is added to contain the work that
needs to be done.  bprm_creds_from_file first computes which file
bprm-&gt;executable or most likely bprm-&gt;file that the bprm-&gt;creds
will be computed from.

The funciton bprm_fill_uid is updated to receive the file instead of
accessing bprm-&gt;file.  The now unnecessary work needed to reset the
bprm-&gt;cred-&gt;euid, and bprm-&gt;cred-&gt;egid is removed from brpm_fill_uid.
A small comment to document that bprm_fill_uid now only deals with the
work to handle suid and sgid files.  The default case is already
heandled by prepare_exec_creds.

The function security_bprm_repopulate_creds is renamed
security_bprm_creds_from_file and now is explicitly passed the file
from which to compute the creds.  The documentation of the
bprm_creds_from_file security hook is updated to explain when the hook
is called and what it needs to do.  The file is passed from
cap_bprm_creds_from_file into get_file_caps so that the caps are
computed for the appropriate file.  The now unnecessary work in
cap_bprm_creds_from_file to reset the ambient capabilites has been
removed.  A small comment to document that the work of
cap_bprm_creds_from_file is to read capabilities from the files
secureity attribute and derive capabilities from the fact the
user had uid 0 has been added.

Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exec: Remove recursion from search_binary_handler</title>
<updated>2020-05-21T15:16:57+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-05-18T23:43:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bc2bf338d54b7aadaed49bb45b9e10d4592b2a46'/>
<id>bc2bf338d54b7aadaed49bb45b9e10d4592b2a46</id>
<content type='text'>
Recursion in kernel code is generally a bad idea as it can overflow
the kernel stack.  Recursion in exec also hides that the code is
looping and that the loop changes bprm-&gt;file.

Instead of recursing in search_binary_handler have the methods that
would recurse set bprm-&gt;interpreter and return 0.  Modify exec_binprm
to loop when bprm-&gt;interpreter is set.  Consolidate all of the
reassignments of bprm-&gt;file in that loop to make it clear what is
going on.

The structure of the new loop in exec_binprm is that all errors return
immediately, while successful completion (ret == 0 &amp;&amp;
!bprm-&gt;interpreter) just breaks out of the loop and runs what
exec_bprm has always run upon successful completion.

Fail if the an interpreter is being call after execfd has been set.
The code has never properly handled an interpreter being called with
execfd being set and with reassignments of bprm-&gt;file and the
assignment of bprm-&gt;executable in generic code it has finally become
possible to test and fail when if this problematic condition happens.

With the reassignments of bprm-&gt;file and the assignment of
bprm-&gt;executable moved into the generic code add a test to see if
bprm-&gt;executable is being reassigned.

In search_binary_handler remove the test for !bprm-&gt;file.  With all
reassignments of bprm-&gt;file moved to exec_binprm bprm-&gt;file can never
be NULL in search_binary_handler.

Link: https://lkml.kernel.org/r/87sgfwyd84.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recursion in kernel code is generally a bad idea as it can overflow
the kernel stack.  Recursion in exec also hides that the code is
looping and that the loop changes bprm-&gt;file.

Instead of recursing in search_binary_handler have the methods that
would recurse set bprm-&gt;interpreter and return 0.  Modify exec_binprm
to loop when bprm-&gt;interpreter is set.  Consolidate all of the
reassignments of bprm-&gt;file in that loop to make it clear what is
going on.

The structure of the new loop in exec_binprm is that all errors return
immediately, while successful completion (ret == 0 &amp;&amp;
!bprm-&gt;interpreter) just breaks out of the loop and runs what
exec_bprm has always run upon successful completion.

Fail if the an interpreter is being call after execfd has been set.
The code has never properly handled an interpreter being called with
execfd being set and with reassignments of bprm-&gt;file and the
assignment of bprm-&gt;executable in generic code it has finally become
possible to test and fail when if this problematic condition happens.

With the reassignments of bprm-&gt;file and the assignment of
bprm-&gt;executable moved into the generic code add a test to see if
bprm-&gt;executable is being reassigned.

In search_binary_handler remove the test for !bprm-&gt;file.  With all
reassignments of bprm-&gt;file moved to exec_binprm bprm-&gt;file can never
be NULL in search_binary_handler.

Link: https://lkml.kernel.org/r/87sgfwyd84.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
