<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/inode.c, branch v2.6.34</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dquot: move dquot initialization responsibility into the filesystem</title>
<updated>2010-03-04T23:20:30+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2010-03-03T14:05:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=907f4554e2521cb28b0009d17167760650a9561c'/>
<id>907f4554e2521cb28b0009d17167760650a9561c</id>
<content type='text'>
Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.  For most metadata operations
this is a straightforward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those, which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in -&gt;open.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.  For most metadata operations
this is a straightforward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those, which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in -&gt;open.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dquot: move dquot drop responsibility into the filesystem</title>
<updated>2010-03-04T23:20:29+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2010-03-03T14:05:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=257ba15cedf1288f0c96118d7e63947231d27278'/>
<id>257ba15cedf1288f0c96118d7e63947231d27278</id>
<content type='text'>
Currently clear_inode calls vfs_dq_drop directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the drop inside the -&gt;clear_inode
superblock operation.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently clear_inode calls vfs_dq_drop directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the drop inside the -&gt;clear_inode
superblock operation.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kill I_LOCK</title>
<updated>2009-12-17T16:03:25+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2009-12-17T13:25:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=eaff8079d4f1016a12e34ab323737314f24127dd'/>
<id>eaff8079d4f1016a12e34ab323737314f24127dd</id>
<content type='text'>
After I_SYNC was split from I_LOCK the leftover is always used together with
I_NEW and thus superfluous.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After I_SYNC was split from I_LOCK the leftover is always used together with
I_NEW and thus superfluous.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LSM: imbed ima calls in the security hooks</title>
<updated>2009-10-25T04:22:48+00:00</updated>
<author>
<name>Mimi Zohar</name>
<email>zohar@linux.vnet.ibm.com</email>
</author>
<published>2009-10-22T21:30:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6c21a7fb492bf7e2c4985937082ce58ddeca84bd'/>
<id>6c21a7fb492bf7e2c4985937082ce58ddeca84bd</id>
<content type='text'>
Based on discussions on LKML and LSM, where there are consecutive
security_ and ima_ calls in the vfs layer, move the ima_ calls to
the existing security_ hooks.

Signed-off-by: Mimi Zohar &lt;zohar@us.ibm.com&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Based on discussions on LKML and LSM, where there are consecutive
security_ and ima_ calls in the vfs layer, move the ima_ calls to
the existing security_ hooks.

Signed-off-by: Mimi Zohar &lt;zohar@us.ibm.com&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: optimize touch_time() too</title>
<updated>2009-09-24T11:47:27+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>andi@firstfloor.org</email>
</author>
<published>2009-09-18T20:05:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ce06e0b21d6732a2bab10a585a3ec6909499be28'/>
<id>ce06e0b21d6732a2bab10a585a3ec6909499be28</id>
<content type='text'>
Do a similar optimization as earlier for touch_atime.  Getting the lock in
mnt_want_write() is relatively costly, so try all avenues to avoid it first.

This patch is careful to still only update inode fields inside the lock
region.

This didn't show up in benchmarks, but it's easy enough to do.

[akpm@linux-foundation.org: fix typo in comment]
[hugh.dickins@tiscali.co.uk: fix inverted test of mnt_want_write_file()]
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Valerie Aurora &lt;vaurora@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hugh.dickins@tiscali.co.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do a similar optimization as earlier for touch_atime.  Getting the lock in
mnt_want_write() is relatively costly, so try all avenues to avoid it first.

This patch is careful to still only update inode fields inside the lock
region.

This didn't show up in benchmarks, but it's easy enough to do.

[akpm@linux-foundation.org: fix typo in comment]
[hugh.dickins@tiscali.co.uk: fix inverted test of mnt_want_write_file()]
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Valerie Aurora &lt;vaurora@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hugh.dickins@tiscali.co.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: optimization for touch_atime()</title>
<updated>2009-09-24T11:47:26+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>andi@firstfloor.org</email>
</author>
<published>2009-09-18T20:05:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b12536c27043f1c21195e587eb59950428326e22'/>
<id>b12536c27043f1c21195e587eb59950428326e22</id>
<content type='text'>
Some benchmark testing shows touch_atime to be high up in profile logs for
IO intensive workloads.  Most likely that's due to the lock in
mnt_want_write().  Unfortunately touch_atime first takes the lock, and
then does all the other tests that could avoid atime updates (like noatime
or relatime).

Do it the other way round -- first try to avoid the update and only then
if that didn't succeed take the lock.  That works because none of the
atime avoidance tests rely on locking.

This also eliminates a goto.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Reviewed-by: Valerie Aurora &lt;vaurora@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
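
Illustrative note, not part of the original patch: a minimal sketch of the
reordering described above, written in Python rather than kernel C.  The
names Mount, Inode and touch_atime_sketch and all of their fields are
hypothetical stand-ins, not the kernel's API.  The sketch only shows the
ordering: the cheap checks that need no lock run first, and the lock is
taken only when an update is really required.

```python
import threading


class Mount:
    """Hypothetical stand-in for a mount with a write-access lock."""

    def __init__(self, noatime=False):
        self.noatime = noatime
        self.lock = threading.Lock()
        self.lock_acquisitions = 0  # counts how often the costly path ran


class Inode:
    """Hypothetical stand-in for an inode carrying an access time."""

    def __init__(self, atime=0, noatime=False):
        self.atime = atime
        self.noatime = noatime


def touch_atime_sketch(mnt, inode, now):
    """Update inode.atime to `now`; return True if an update happened."""
    # Cheap avoidance checks first: none of them relies on the lock.
    if inode.noatime or mnt.noatime:
        return False
    if inode.atime == now:
        return False
    # Only now pay for the lock, and change the field inside it.
    with mnt.lock:
        mnt.lock_acquisitions += 1
        inode.atime = now
    return True
```

With a noatime mount the function returns before touching the lock, so
lock_acquisitions stays at zero for that case.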
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some benchmark testing shows touch_atime to be high up in profile logs for
IO intensive workloads.  Most likely that's due to the lock in
mnt_want_write().  Unfortunately touch_atime first takes the lock, and
then does all the other tests that could avoid atime updates (like noatime
or relatime).

Do it the other way round -- first try to avoid the update and only then
if that didn't succeed take the lock.  That works because none of the
atime avoidance tests rely on locking.

This also eliminates a goto.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Reviewed-by: Valerie Aurora &lt;vaurora@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Hansen &lt;haveblue@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it</title>
<updated>2009-09-24T11:47:25+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2009-09-18T20:05:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=22fe404218156328a27e66349b1175cd0baa4990'/>
<id>22fe404218156328a27e66349b1175cd0baa4990</id>
<content type='text'>
Hugetlbfs needs to do special things instead of truncate_inode_pages().
Currently, it copies generic_forget_inode() except for the
truncate_inode_pages() call, which is asking for trouble (the code there
isn't trivial).  So create a separate function generic_detach_inode()
which does all the list magic done in generic_forget_inode() and call
it from hugetlbfs_forget_inode().

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hugetlbfs needs to do special things instead of truncate_inode_pages().
Currently, it copies generic_forget_inode() except for the
truncate_inode_pages() call, which is asking for trouble (the code there
isn't trivial).  So create a separate function generic_detach_inode()
which does all the list magic done in generic_forget_inode() and call
it from hugetlbfs_forget_inode().

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/inode.c: add dev-id and inode number for debugging in init_special_inode()</title>
<updated>2009-09-24T11:47:24+00:00</updated>
<author>
<name>Manish Katiyar</name>
<email>mkatiyar@gmail.com</email>
</author>
<published>2009-09-18T20:05:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=af0d9ae811d11de8a01d6bc922c5e062be01bd7f'/>
<id>af0d9ae811d11de8a01d6bc922c5e062be01bd7f</id>
<content type='text'>
Add device-id and inode number for better debugging.  This was suggested
by Andreas in one of the threads
http://article.gmane.org/gmane.comp.file-systems.ext4/12062 .

"If anyone has a chance, fixing this error message to be not-useless would
be good...  Including the device name and the inode number would help
track down the source of the problem."

Signed-off-by: Manish Katiyar &lt;mkatiyar@gmail.com&gt;
Cc: Andreas Dilger &lt;adilger@sun.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add device-id and inode number for better debugging.  This was suggested
by Andreas in one of the threads
http://article.gmane.org/gmane.comp.file-systems.ext4/12062 .

"If anyone has a chance, fixing this error message to be not-useless would
be good...  Including the device name and the inode number would help
track down the source of the problem."

Signed-off-by: Manish Katiyar &lt;mkatiyar@gmail.com&gt;
Cc: Andreas Dilger &lt;adilger@sun.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: turn iprune_mutex into rwsem</title>
<updated>2009-09-23T14:39:29+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-09-22T23:43:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=88e0fbc452ed94393bf89585c2b90edb94749b45'/>
<id>88e0fbc452ed94393bf89585c2b90edb94749b45</id>
<content type='text'>
We have had a report of bad memory allocation latency during DVD-RAM (UDF)
writing.  This is causing the user's desktop session to become unusable.

Jan tracked the cause of this down to UDF inode reclaim blocking:

gnome-screens D ffff810006d1d598     0 20686      1
 ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
 ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
 ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
Call Trace:
 [&lt;ffffffff804477f3&gt;] io_schedule+0x63/0xa5
 [&lt;ffffffff802c2587&gt;] sync_buffer+0x3b/0x3f
 [&lt;ffffffff80447d2a&gt;] __wait_on_bit+0x47/0x79
 [&lt;ffffffff80447dc6&gt;] out_of_line_wait_on_bit+0x6a/0x77
 [&lt;ffffffff802c24f6&gt;] __wait_on_buffer+0x1f/0x21
 [&lt;ffffffff802c442a&gt;] __bread+0x70/0x86
 [&lt;ffffffff88de9ec7&gt;] :udf:udf_tread+0x38/0x3a
 [&lt;ffffffff88de0fcf&gt;] :udf:udf_update_inode+0x4d/0x68c
 [&lt;ffffffff88de26e1&gt;] :udf:udf_write_inode+0x1d/0x2b
 [&lt;ffffffff802bcf85&gt;] __writeback_single_inode+0x1c0/0x394
 [&lt;ffffffff802bd205&gt;] write_inode_now+0x7d/0xc4
 [&lt;ffffffff88de2e76&gt;] :udf:udf_clear_inode+0x3d/0x53
 [&lt;ffffffff802b39ae&gt;] clear_inode+0xc2/0x11b
 [&lt;ffffffff802b3ab1&gt;] dispose_list+0x5b/0x102
 [&lt;ffffffff802b3d35&gt;] shrink_icache_memory+0x1dd/0x213
 [&lt;ffffffff8027ede3&gt;] shrink_slab+0xe3/0x158
 [&lt;ffffffff8027fbab&gt;] try_to_free_pages+0x177/0x232
 [&lt;ffffffff8027a578&gt;] __alloc_pages+0x1fa/0x392
 [&lt;ffffffff802951fa&gt;] alloc_page_vma+0x176/0x189
 [&lt;ffffffff802822d8&gt;] __do_fault+0x10c/0x417
 [&lt;ffffffff80284232&gt;] handle_mm_fault+0x466/0x940
 [&lt;ffffffff8044b922&gt;] do_page_fault+0x676/0xabf

This blocks with iprune_mutex held, which then blocks other reclaimers:

X             D ffff81009d47c400     0 17285  14831
 ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
 ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
 ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
Call Trace:
 [&lt;ffffffff80447f8c&gt;] __mutex_lock_slowpath+0x72/0xa9
 [&lt;ffffffff80447e1a&gt;] mutex_lock+0x1e/0x22
 [&lt;ffffffff802b3ba1&gt;] shrink_icache_memory+0x49/0x213
 [&lt;ffffffff8027ede3&gt;] shrink_slab+0xe3/0x158
 [&lt;ffffffff8027fbab&gt;] try_to_free_pages+0x177/0x232
 [&lt;ffffffff8027a578&gt;] __alloc_pages+0x1fa/0x392
 [&lt;ffffffff8029507f&gt;] alloc_pages_current+0xd1/0xd6
 [&lt;ffffffff80279ac0&gt;] __get_free_pages+0xe/0x4d
 [&lt;ffffffff802ae1b7&gt;] __pollwait+0x5e/0xdf
 [&lt;ffffffff8860f2b4&gt;] :nvidia:nv_kern_poll+0x2e/0x73
 [&lt;ffffffff802ad949&gt;] do_select+0x308/0x506
 [&lt;ffffffff802adced&gt;] core_sys_select+0x1a6/0x254
 [&lt;ffffffff802ae0b7&gt;] sys_select+0xb5/0x157

Now I think the main problem is having the filesystem block (and do IO) in
inode reclaim.  The problem is that this doesn't get accounted well and
penalizes a random allocator with a big latency spike caused by work
generated from elsewhere.

I think the best idea would be to avoid this.  By design if possible, or
by deferring the hard work to an asynchronous context.  If the latter,
then the fs would probably want to throttle creation of new work with
queue size of the deferred work, but let's not get into those details.

Anyway, the other obvious thing we looked at is the iprune_mutex which is
causing the cascading blocking.  We could turn this into an rwsem to
improve concurrency.  It is unreasonable to totally ban all potentially
slow or blocking operations in inode reclaim, so I think this is a cheap
way to get a small improvement.

This doesn't solve the whole problem of course.  The process doing inode
reclaim will still take the latency hit, and concurrent processes may end
up contending on filesystem locks.  So fs developers should keep these
problems in mind.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
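
Illustrative note, not part of the original patch: a rough sketch, in
Python rather than kernel C, of why a readers-writer lock can improve
concurrency over a mutex.  RWLock and concurrent_readers are hypothetical
names, and the sketch assumes that reclaimers take the shared (read) side;
the message above does not say which paths take which side.  Several
threads hold the read side at once, which a plain mutex would not allow.

```python
import threading


class RWLock:
    """Small readers-writer lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers != 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def concurrent_readers(lock, count):
    """Return True if `count` threads can all hold the read side at once."""
    barrier = threading.Barrier(count)
    passed = []

    def reclaimer():
        lock.acquire_read()
        try:
            # Every thread must reach the barrier while still holding the
            # read side; with an exclusive lock this wait would time out.
            barrier.wait(timeout=5)
            passed.append(True)
        except threading.BrokenBarrierError:
            pass
        finally:
            lock.release_read()

    threads = [threading.Thread(target=reclaimer) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(passed) == count
```

A blocked reader still stalls, as the message notes; the sketch only shows
that readers no longer queue behind one another.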
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have had a report of bad memory allocation latency during DVD-RAM (UDF)
writing.  This is causing the user's desktop session to become unusable.

Jan tracked the cause of this down to UDF inode reclaim blocking:

gnome-screens D ffff810006d1d598     0 20686      1
 ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
 ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
 ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
Call Trace:
 [&lt;ffffffff804477f3&gt;] io_schedule+0x63/0xa5
 [&lt;ffffffff802c2587&gt;] sync_buffer+0x3b/0x3f
 [&lt;ffffffff80447d2a&gt;] __wait_on_bit+0x47/0x79
 [&lt;ffffffff80447dc6&gt;] out_of_line_wait_on_bit+0x6a/0x77
 [&lt;ffffffff802c24f6&gt;] __wait_on_buffer+0x1f/0x21
 [&lt;ffffffff802c442a&gt;] __bread+0x70/0x86
 [&lt;ffffffff88de9ec7&gt;] :udf:udf_tread+0x38/0x3a
 [&lt;ffffffff88de0fcf&gt;] :udf:udf_update_inode+0x4d/0x68c
 [&lt;ffffffff88de26e1&gt;] :udf:udf_write_inode+0x1d/0x2b
 [&lt;ffffffff802bcf85&gt;] __writeback_single_inode+0x1c0/0x394
 [&lt;ffffffff802bd205&gt;] write_inode_now+0x7d/0xc4
 [&lt;ffffffff88de2e76&gt;] :udf:udf_clear_inode+0x3d/0x53
 [&lt;ffffffff802b39ae&gt;] clear_inode+0xc2/0x11b
 [&lt;ffffffff802b3ab1&gt;] dispose_list+0x5b/0x102
 [&lt;ffffffff802b3d35&gt;] shrink_icache_memory+0x1dd/0x213
 [&lt;ffffffff8027ede3&gt;] shrink_slab+0xe3/0x158
 [&lt;ffffffff8027fbab&gt;] try_to_free_pages+0x177/0x232
 [&lt;ffffffff8027a578&gt;] __alloc_pages+0x1fa/0x392
 [&lt;ffffffff802951fa&gt;] alloc_page_vma+0x176/0x189
 [&lt;ffffffff802822d8&gt;] __do_fault+0x10c/0x417
 [&lt;ffffffff80284232&gt;] handle_mm_fault+0x466/0x940
 [&lt;ffffffff8044b922&gt;] do_page_fault+0x676/0xabf

This blocks with iprune_mutex held, which then blocks other reclaimers:

X             D ffff81009d47c400     0 17285  14831
 ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
 ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
 ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
Call Trace:
 [&lt;ffffffff80447f8c&gt;] __mutex_lock_slowpath+0x72/0xa9
 [&lt;ffffffff80447e1a&gt;] mutex_lock+0x1e/0x22
 [&lt;ffffffff802b3ba1&gt;] shrink_icache_memory+0x49/0x213
 [&lt;ffffffff8027ede3&gt;] shrink_slab+0xe3/0x158
 [&lt;ffffffff8027fbab&gt;] try_to_free_pages+0x177/0x232
 [&lt;ffffffff8027a578&gt;] __alloc_pages+0x1fa/0x392
 [&lt;ffffffff8029507f&gt;] alloc_pages_current+0xd1/0xd6
 [&lt;ffffffff80279ac0&gt;] __get_free_pages+0xe/0x4d
 [&lt;ffffffff802ae1b7&gt;] __pollwait+0x5e/0xdf
 [&lt;ffffffff8860f2b4&gt;] :nvidia:nv_kern_poll+0x2e/0x73
 [&lt;ffffffff802ad949&gt;] do_select+0x308/0x506
 [&lt;ffffffff802adced&gt;] core_sys_select+0x1a6/0x254
 [&lt;ffffffff802ae0b7&gt;] sys_select+0xb5/0x157

Now I think the main problem is having the filesystem block (and do IO) in
inode reclaim.  The problem is that this doesn't get accounted well and
penalizes a random allocator with a big latency spike caused by work
generated from elsewhere.

I think the best idea would be to avoid this.  By design if possible, or
by deferring the hard work to an asynchronous context.  If the latter,
then the fs would probably want to throttle creation of new work with
queue size of the deferred work, but let's not get into those details.

Anyway, the other obvious thing we looked at is the iprune_mutex which is
causing the cascading blocking.  We could turn this into an rwsem to
improve concurrency.  It is unreasonable to totally ban all potentially
slow or blocking operations in inode reclaim, so I think this is a cheap
way to get a small improvement.

This doesn't solve the whole problem of course.  The process doing inode
reclaim will still take the latency hit, and concurrent processes may end
up contending on filesystem locks.  So fs developers should keep these
problems in mind.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>const: mark remaining inode_operations as const</title>
<updated>2009-09-22T14:17:24+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2009-09-22T00:01:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6e1d5dcc2bbbe71dbf010c747e15739bef6b7218'/>
<id>6e1d5dcc2bbbe71dbf010c747e15739bef6b7218</id>
<content type='text'>
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
