<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ceph/super.c, branch v3.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dcache: d_obtain_alias callers don't all want DISCONNECTED</title>
<updated>2014-08-07T18:40:10+00:00</updated>
<author>
<name>J. Bruce Fields</name>
<email>bfields@redhat.com</email>
</author>
<published>2014-02-14T22:35:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1a0a397e41cb1bf70cfe45fd0eeff08c7c501ec0'/>
<id>1a0a397e41cb1bf70cfe45fd0eeff08c7c501ec0</id>
<content type='text'>
There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.

This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.

It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.

In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry.  Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a12e "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.

Cc: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.

This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.

It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.

In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry.  Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a12e "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.

Cc: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: use fl-&gt;fl_file as owner identifier of flock and posix lock</title>
<updated>2014-04-05T04:07:11+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2014-03-09T15:16:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=eb13e832f823f6c110ea53e3067bafe22b87de63'/>
<id>eb13e832f823f6c110ea53e3067bafe22b87de63</id>
<content type='text'>
flock and posix lock should use fl-&gt;fl_file instead of process ID
as owner identifier. (posix lock uses fl-&gt;fl_owner. fl-&gt;fl_owner
is usually equal to fl-&gt;fl_file, but it also can be a customized
value). The process ID of who holds the lock is just for F_GETLK
fcntl(2).

The fix is rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
to 'pid'. Assign fl-&gt;fl_file to the 'owner' field of lock messages.
We also set the most significant bit of the 'owner' field. MDS can
use that bit to distinguish between old and new clients.

The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking conflict
locks.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
flock and posix lock should use fl-&gt;fl_file instead of process ID
as owner identifier. (posix lock uses fl-&gt;fl_owner. fl-&gt;fl_owner
is usually equal to fl-&gt;fl_file, but it also can be a customized
value). The process ID of who holds the lock is just for F_GETLK
fcntl(2).

The fix is rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
to 'pid'. Assign fl-&gt;fl_file to the 'owner' field of lock messages.
We also set the most significant bit of the 'owner' field. MDS can
use that bit to distinguish between old and new clients.

The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking conflict
locks.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: add acl, noacl options for cephfs mount</title>
<updated>2014-02-17T20:37:12+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2014-02-16T18:05:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=45195e42c78ea91135108207dbcaf75e5556a309'/>
<id>45195e42c78ea91135108207dbcaf75e5556a309</id>
<content type='text'>
Make the 'acl' option dependent on having ACL support compiled in.  Make
the 'noacl' option work even without it so that one can always ask it to
be off and not error out on mount when it is not supported.

Signed-off-by: Guangliang Zhao &lt;lucienchao@gmail.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make the 'acl' option dependent on having ACL support compiled in.  Make
the 'noacl' option work even without it so that one can always ask it to
be off and not error out on mount when it is not supported.

Signed-off-by: Guangliang Zhao &lt;lucienchao@gmail.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: all features fields must be u64</title>
<updated>2013-12-31T18:32:08+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=12b4629a9fb80fecaebadc217b13b8776ed8dbef'/>
<id>12b4629a9fb80fecaebadc217b13b8776ed8dbef</id>
<content type='text'>
In preparation for ceph_features.h update, change all features fields
from unsigned int/u32 to u64.  (ceph.git has ~40 feature bits at this
point.)

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for ceph_features.h update, change all features fields
from unsigned int/u32 to u64.  (ceph.git has ~40 feature bits at this
point.)

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: add acl for cephfs</title>
<updated>2013-12-31T18:32:01+00:00</updated>
<author>
<name>Guangliang Zhao</name>
<email>lucienchao@gmail.com</email>
</author>
<published>2013-11-11T07:18:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7221fe4c2ed72804b28633c8e0217d65abb0023f'/>
<id>7221fe4c2ed72804b28633c8e0217d65abb0023f</id>
<content type='text'>
Signed-off-by: Guangliang Zhao &lt;lucienchao@gmail.com&gt;
Reviewed-by: Li Wang &lt;li.wang@ubuntykylin.com&gt;
Reviewed-by: Zheng Yan &lt;zheng.z.yan@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Guangliang Zhao &lt;lucienchao@gmail.com&gt;
Reviewed-by: Li Wang &lt;li.wang@ubuntykylin.com&gt;
Reviewed-by: Zheng Yan &lt;zheng.z.yan@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: drop unconnected inodes</title>
<updated>2013-12-13T17:13:16+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2013-09-20T11:55:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9f12bd119e408388233e7aeb1152f372a8b5dcad'/>
<id>9f12bd119e408388233e7aeb1152f372a8b5dcad</id>
<content type='text'>
Positve dentry and corresponding inode are always accompanied in MDS reply.
So no need to keep inode in the cache after dropping all its aliases.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Positve dentry and corresponding inode are always accompanied in MDS reply.
So no need to keep inode in the cache after dropping all its aliases.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: use fscache as a local presisent cache</title>
<updated>2013-09-06T16:50:11+00:00</updated>
<author>
<name>Milosz Tanski</name>
<email>milosz@adfin.com</email>
</author>
<published>2013-08-21T21:29:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=99ccbd229cf7453206bc858e795ec1f0345ff258'/>
<id>99ccbd229cf7453206bc858e795ec1f0345ff258</id>
<content type='text'>
Adding support for fscache to the Ceph filesystem. This would bring it to on
par with some of the other network filesystems in Linux (like NFS, AFS, etc...)

In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.

Signed-off-by: Milosz Tanski &lt;milosz@adfin.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adding support for fscache to the Ceph filesystem. This would bring it to on
par with some of the other network filesystems in Linux (like NFS, AFS, etc...)

In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.

Signed-off-by: Milosz Tanski &lt;milosz@adfin.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: avoid accessing invalid memory</title>
<updated>2013-07-03T22:32:55+00:00</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2013-07-01T22:33:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5446429630257f4723829409337a26c076907d5d'/>
<id>5446429630257f4723829409337a26c076907d5d</id>
<content type='text'>
when mounting ceph with a dev name that starts with a slash, ceph
would attempt to access the character before that slash. Since we
don't actually own that byte of memory, we would trigger an
invalid access:

[   43.499934] BUG: unable to handle kernel paging request at ffff880fa3a97fff
[   43.500984] IP: [&lt;ffffffff818f3884&gt;] parse_mount_options+0x1a4/0x300
[   43.501491] PGD 743b067 PUD 10283c4067 PMD 10282a6067 PTE 8000000fa3a97060
[   43.502301] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   43.503006] Dumping ftrace buffer:
[   43.503596]    (ftrace buffer empty)
[   43.504046] CPU: 0 PID: 10879 Comm: mount Tainted: G        W    3.10.0-sasha #1129
[   43.504851] task: ffff880fa625b000 ti: ffff880fa3412000 task.ti: ffff880fa3412000
[   43.505608] RIP: 0010:[&lt;ffffffff818f3884&gt;]  [&lt;ffffffff818f3884&gt;] parse_mount_options$
[   43.506552] RSP: 0018:ffff880fa3413d08  EFLAGS: 00010286
[   43.507133] RAX: ffff880fa3a98000 RBX: ffff880fa3a98000 RCX: 0000000000000000
[   43.507893] RDX: ffff880fa3a98001 RSI: 000000000000002f RDI: ffff880fa3a98000
[   43.508610] RBP: ffff880fa3413d58 R08: 0000000000001f99 R09: ffff880fa3fe64c0
[   43.509426] R10: ffff880fa3413d98 R11: ffff880fa38710d8 R12: ffff880fa3413da0
[   43.509792] R13: ffff880fa3a97fff R14: 0000000000000000 R15: ffff880fa3413d90
[   43.509792] FS:  00007fa9c48757e0(0000) GS:ffff880fd2600000(0000) knlGS:000000000000$
[   43.509792] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   43.509792] CR2: ffff880fa3a97fff CR3: 0000000fa3bb9000 CR4: 00000000000006b0
[   43.509792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.509792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   43.509792] Stack:
[   43.509792]  0000e5180000000e ffffffff85ca1900 ffff880fa38710d8 ffff880fa3413d98
[   43.509792]  0000000000000120 0000000000000000 ffff880fa3a98000 0000000000000000
[   43.509792]  ffffffff85cf32a0 0000000000000000 ffff880fa3413dc8 ffffffff818f3c72
[   43.509792] Call Trace:
[   43.509792]  [&lt;ffffffff818f3c72&gt;] ceph_mount+0xa2/0x390
[   43.509792]  [&lt;ffffffff81226314&gt;] ? pcpu_alloc+0x334/0x3c0
[   43.509792]  [&lt;ffffffff81282f8d&gt;] mount_fs+0x8d/0x1a0
[   43.509792]  [&lt;ffffffff812263d0&gt;] ? __alloc_percpu+0x10/0x20
[   43.509792]  [&lt;ffffffff8129f799&gt;] vfs_kern_mount+0x79/0x100
[   43.509792]  [&lt;ffffffff812a224d&gt;] do_new_mount+0xcd/0x1c0
[   43.509792]  [&lt;ffffffff812a2e8d&gt;] do_mount+0x15d/0x210
[   43.509792]  [&lt;ffffffff81220e55&gt;] ? strndup_user+0x45/0x60
[   43.509792]  [&lt;ffffffff812a2fdd&gt;] SyS_mount+0x9d/0xe0
[   43.509792]  [&lt;ffffffff83fd816c&gt;] tracesys+0xdd/0xe2
[   43.509792] Code: 4c 8b 5d c0 74 0a 48 8d 50 01 49 89 14 24 eb 17 31 c0 48 83 c9 ff $
[   43.509792] RIP  [&lt;ffffffff818f3884&gt;] parse_mount_options+0x1a4/0x300
[   43.509792]  RSP &lt;ffff880fa3413d08&gt;
[   43.509792] CR2: ffff880fa3a97fff
[   43.509792] ---[ end trace 22469cd81e93af51 ]---

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktan.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
when mounting ceph with a dev name that starts with a slash, ceph
would attempt to access the character before that slash. Since we
don't actually own that byte of memory, we would trigger an
invalid access:

[   43.499934] BUG: unable to handle kernel paging request at ffff880fa3a97fff
[   43.500984] IP: [&lt;ffffffff818f3884&gt;] parse_mount_options+0x1a4/0x300
[   43.501491] PGD 743b067 PUD 10283c4067 PMD 10282a6067 PTE 8000000fa3a97060
[   43.502301] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   43.503006] Dumping ftrace buffer:
[   43.503596]    (ftrace buffer empty)
[   43.504046] CPU: 0 PID: 10879 Comm: mount Tainted: G        W    3.10.0-sasha #1129
[   43.504851] task: ffff880fa625b000 ti: ffff880fa3412000 task.ti: ffff880fa3412000
[   43.505608] RIP: 0010:[&lt;ffffffff818f3884&gt;]  [&lt;ffffffff818f3884&gt;] parse_mount_options$
[   43.506552] RSP: 0018:ffff880fa3413d08  EFLAGS: 00010286
[   43.507133] RAX: ffff880fa3a98000 RBX: ffff880fa3a98000 RCX: 0000000000000000
[   43.507893] RDX: ffff880fa3a98001 RSI: 000000000000002f RDI: ffff880fa3a98000
[   43.508610] RBP: ffff880fa3413d58 R08: 0000000000001f99 R09: ffff880fa3fe64c0
[   43.509426] R10: ffff880fa3413d98 R11: ffff880fa38710d8 R12: ffff880fa3413da0
[   43.509792] R13: ffff880fa3a97fff R14: 0000000000000000 R15: ffff880fa3413d90
[   43.509792] FS:  00007fa9c48757e0(0000) GS:ffff880fd2600000(0000) knlGS:000000000000$
[   43.509792] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   43.509792] CR2: ffff880fa3a97fff CR3: 0000000fa3bb9000 CR4: 00000000000006b0
[   43.509792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.509792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   43.509792] Stack:
[   43.509792]  0000e5180000000e ffffffff85ca1900 ffff880fa38710d8 ffff880fa3413d98
[   43.509792]  0000000000000120 0000000000000000 ffff880fa3a98000 0000000000000000
[   43.509792]  ffffffff85cf32a0 0000000000000000 ffff880fa3413dc8 ffffffff818f3c72
[   43.509792] Call Trace:
[   43.509792]  [&lt;ffffffff818f3c72&gt;] ceph_mount+0xa2/0x390
[   43.509792]  [&lt;ffffffff81226314&gt;] ? pcpu_alloc+0x334/0x3c0
[   43.509792]  [&lt;ffffffff81282f8d&gt;] mount_fs+0x8d/0x1a0
[   43.509792]  [&lt;ffffffff812263d0&gt;] ? __alloc_percpu+0x10/0x20
[   43.509792]  [&lt;ffffffff8129f799&gt;] vfs_kern_mount+0x79/0x100
[   43.509792]  [&lt;ffffffff812a224d&gt;] do_new_mount+0xcd/0x1c0
[   43.509792]  [&lt;ffffffff812a2e8d&gt;] do_mount+0x15d/0x210
[   43.509792]  [&lt;ffffffff81220e55&gt;] ? strndup_user+0x45/0x60
[   43.509792]  [&lt;ffffffff812a2fdd&gt;] SyS_mount+0x9d/0xe0
[   43.509792]  [&lt;ffffffff83fd816c&gt;] tracesys+0xdd/0xe2
[   43.509792] Code: 4c 8b 5d c0 74 0a 48 8d 50 01 49 89 14 24 eb 17 31 c0 48 83 c9 ff $
[   43.509792] RIP  [&lt;ffffffff818f3884&gt;] parse_mount_options+0x1a4/0x300
[   43.509792]  RSP &lt;ffff880fa3413d08&gt;
[   43.509792] CR2: ffff880fa3a97fff
[   43.509792] ---[ end trace 22469cd81e93af51 ]---

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktan.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: set up page array mempool with correct size</title>
<updated>2013-05-02T04:17:50+00:00</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-01T15:48:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3bf53337af27a3ccc6e0f433b081063cdf0a2bf6'/>
<id>3bf53337af27a3ccc6e0f433b081063cdf0a2bf6</id>
<content type='text'>
In create_fs_client() a memory pool is set up be used for arrays of
pages that might be needed in ceph_writepages_start() if memory is
tight.  There are two problems with the way it's initialized:
    - The size provided is the number of pages we want in the
      array, but it should be the number of bytes required for
      that many page pointers.
    - The number of pages computed can end up being 0, while we
      will always need at least one page.

This patch fixes both of these problems.

This resolves the two simple problems defined in:
    http://tracker.ceph.com/issues/4603

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In create_fs_client() a memory pool is set up be used for arrays of
pages that might be needed in ceph_writepages_start() if memory is
tight.  There are two problems with the way it's initialized:
    - The size provided is the number of pages we want in the
      array, but it should be the number of bytes required for
      that many page pointers.
    - The number of pages computed can end up being 0, while we
      will always need at least one page.

This patch fixes both of these problems.

This resolves the two simple problems defined in:
    http://tracker.ceph.com/issues/4603

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: Limit sys_mount to only request filesystem modules.</title>
<updated>2013-03-04T03:36:31+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2013-03-03T03:39:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7f78e0351394052e1a6293e175825eb5c7869507'/>
<id>7f78e0351394052e1a6293e175825eb5c7869507</id>
<content type='text'>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Reported-by: Kees Cook &lt;keescook@google.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Reported-by: Kees Cook &lt;keescook@google.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
