<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/nfs/write.c, branch v3.13</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>nfs: use %p[dD] instead of open-coded (and often racy) equivalents</title>
<updated>2013-10-25T03:34:50+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2013-09-16T14:53:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6de1472f1a4a3bd912f515f29d3cf52a65a4c718'/>
<id>6de1472f1a4a3bd912f515f29d3cf52a65a4c718</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: Don't check lock owner compatibility in writes unless file is locked</title>
<updated>2013-09-05T22:11:42+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2013-09-05T19:52:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0f1d26055068bbc66751d1974ecc6f0398b3ac67'/>
<id>0f1d26055068bbc66751d1974ecc6f0398b3ac67</id>
<content type='text'>
If we're doing buffered writes, and there is no file locking involved,
then we don't have to worry about whether or not the lock owner information
is identical.
By relaxing this check, we ensure that fork()ed child processes can write
to a page without having to first sync dirty data that was written
by the parent to disk.
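
As a rough illustration of the relaxed check described above, the decision
can be sketched as follows. This is a simplified model, not the kernel code:
the function name and its three boolean parameters are hypothetical.

```python
def needs_flush(same_context, file_has_locks, same_lock_owner):
    """Decide whether dirty data on a page must be flushed before a new write.

    Hypothetical model: all three parameters are illustrative booleans,
    not fields of any kernel structure.
    """
    # A different open context always forces a flush first.
    must_flush = not same_context
    # The lock owner is only compared when the file actually has locks.
    if file_has_locks:
        must_flush = must_flush or not same_lock_owner
    return must_flush
```

With no locks on the file, a fork()ed child that shares the open context
but has a different lock owner gets needs_flush(True, False, False) == False.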

Reported-by: Quentin Barnes &lt;qbarnes@gmail.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Tested-by: Quentin Barnes &lt;qbarnes@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we're doing buffered writes, and there is no file locking involved,
then we don't have to worry about whether or not the lock owner information
is identical.
By relaxing this check, we ensure that fork()ed child processes can write
to a page without having to first sync dirty data that was written
by the parent to disk.

Reported-by: Quentin Barnes &lt;qbarnes@gmail.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Tested-by: Quentin Barnes &lt;qbarnes@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nfs4.1: Add SP4_MACH_CRED write and commit support</title>
<updated>2013-09-05T14:50:45+00:00</updated>
<author>
<name>Weston Andros Adamson</name>
<email>dros@netapp.com</email>
</author>
<published>2013-08-13T20:37:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8c21c62c4452f4e66c3dac9b3f6b74474fad3e08'/>
<id>8c21c62c4452f4e66c3dac9b3f6b74474fad3e08</id>
<content type='text'>
WRITE and COMMIT can use the machine credential.

If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4.
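
A minimal sketch of that rule, assuming two hypothetical boolean flags that
say whether the machine credential may be used for WRITE and for COMMIT.
The numeric values are the stable_how4 constants from the NFSv4 protocol;
the function and parameter names are illustrative only.

```python
# stable_how4 values from the NFSv4 protocol
UNSTABLE4 = 0
DATA_SYNC4 = 1
FILE_SYNC4 = 2

def write_stability(mach_cred_write, mach_cred_commit, requested):
    """Pick the stability level for a WRITE sent with the machine credential.

    Hypothetical helper: if WRITE may use the machine credential but COMMIT
    may not, no later COMMIT can be sent, so the WRITE must be FILE_SYNC4.
    """
    if mach_cred_write and not mach_cred_commit:
        return FILE_SYNC4
    return requested
```

In every other case the requested stability level is left unchanged.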

Signed-off-by: Weston Andros Adamson &lt;dros@netapp.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
WRITE and COMMIT can use the machine credential.

If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4.

Signed-off-by: Weston Andros Adamson &lt;dros@netapp.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFSv4: Don't try to recover NFSv4 locks when they are lost.</title>
<updated>2013-09-04T16:26:32+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2013-09-04T07:04:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d'/>
<id>ef1820f9be27b6ad158f433ab38002ab8131db4d</id>
<content type='text'>
When an NFSv4 client loses contact with the server it can lose any
locks that it holds.

Currently when it reconnects to the server it simply tries to reclaim
those locks.  This might succeed even though some other client has
held and released a lock in the meantime.  So the first client might
think the file is unchanged, but it isn't.  This isn't good.

If, when recovery happens, the locks cannot be claimed because some
other client still holds the lock, then we get a message in the kernel
logs, but the client can still write.  So two clients can both think
they have a lock and can both write at the same time.  This is equally
not good.

There was a patch a while ago
  http://comments.gmane.org/gmane.linux.nfs/41917

which tried to address some of this, but it didn't seem to go
anywhere.  That patch would also send a signal to the process.  That
might be useful but for now this patch just causes writes to fail.

For NFSv4 (unlike v2/v3) there is a strong link between the lock and
the write request so we can fairly easily fail any IO if the lock is
gone.  While some applications might not expect this, it is still
safer than allowing the write to succeed.

Because this is a fairly big change in behaviour a module parameter,
"recover_locks", is introduced which defaults to true (the current
behaviour) but can be set to "false" to tell the client not to try to
recover things that were lost.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an NFSv4 client loses contact with the server it can lose any
locks that it holds.

Currently when it reconnects to the server it simply tries to reclaim
those locks.  This might succeed even though some other client has
held and released a lock in the meantime.  So the first client might
think the file is unchanged, but it isn't.  This isn't good.

If, when recovery happens, the locks cannot be claimed because some
other client still holds the lock, then we get a message in the kernel
logs, but the client can still write.  So two clients can both think
they have a lock and can both write at the same time.  This is equally
not good.

There was a patch a while ago
  http://comments.gmane.org/gmane.linux.nfs/41917

which tried to address some of this, but it didn't seem to go
anywhere.  That patch would also send a signal to the process.  That
might be useful but for now this patch just causes writes to fail.

For NFSv4 (unlike v2/v3) there is a strong link between the lock and
the write request so we can fairly easily fail any IO if the lock is
gone.  While some applications might not expect this, it is still
safer than allowing the write to succeed.

Because this is a fairly big change in behaviour a module parameter,
"recover_locks", is introduced which defaults to true (the current
behaviour) but can be set to "false" to tell the client not to try to
recover things that were lost.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFS avoid expired credential keys for buffered writes</title>
<updated>2013-09-03T19:25:09+00:00</updated>
<author>
<name>Andy Adamson</name>
<email>andros@netapp.com</email>
</author>
<published>2013-08-14T15:59:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dc24826bfca8d788d05f625208f06d90be5560b3'/>
<id>dc24826bfca8d788d05f625208f06d90be5560b3</id>
<content type='text'>
We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
context key) that is about to expire or has expired.  We currently will
paint ourselves into a corner by returning success to the application
for such a buffered WRITE, only to discover that we do not have permission when
we attempt to flush the WRITE (and potentially associated COMMIT) to disk.

Use the RPC layer credential key timeout and expire routines which use
a watermark, gss_key_expire_timeo. We test the key in nfs_file_write.

If a WRITE is using a credential with a key that will expire within
watermark seconds, flush the inode in nfs_write_end and send only
NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write.
Note that this results in single page NFS_FILE_SYNC WRITEs.
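
The watermark test can be sketched as below. This is an illustration under
stated assumptions, not the RPC layer code: the function names are
hypothetical and times are plain seconds. Only the two stable_how values
come from the NFS protocol.

```python
NFS_UNSTABLE = 0   # stable_how value from the NFS protocol
NFS_FILE_SYNC = 2  # stable_how value from the NFS protocol

def key_expires_soon(now, key_expiry, watermark):
    """True if the key has expired or will expire within `watermark` seconds."""
    # Seconds of validity left beyond the watermark, clamped at zero.
    slack = max(0, key_expiry - now - watermark)
    return slack == 0

def choose_stable_how(now, key_expiry, watermark):
    """Send synchronous WRITEs once the key is inside the watermark window."""
    if key_expires_soon(now, key_expiry, watermark):
        return NFS_FILE_SYNC
    return NFS_UNSTABLE
```

For example, with an arbitrary watermark of 240 seconds and a key expiring
at t=500, a write at t=100 may still be buffered, while a write at t=300 is
sent NFS_FILE_SYNC.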

Signed-off-by: Andy Adamson &lt;andros@netapp.com&gt;
[Trond: removed a pr_warn_ratelimited() for now]
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
context key) that is about to expire or has expired.  We currently will
paint ourselves into a corner by returning success to the applciation
for such a buffered WRITE, only to discover that we do not have permission when
we attempt to flush the WRITE (and potentially associated COMMIT) to disk.

Use the RPC layer credential key timeout and expire routines which use
a watermark, gss_key_expire_timeo. We test the key in nfs_file_write.

If a WRITE is using a credential with a key that will expire within
watermark seconds, flush the inode in nfs_write_end and send only
NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write.
Note that this results in single page NFS_FILE_SYNC WRITEs.

Signed-off-by: Andy Adamson &lt;andros@netapp.com&gt;
[Trond: removed a pr_warn_ratelimited() for now]
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: Add event tracing for generic NFS events</title>
<updated>2013-08-22T12:58:17+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2013-08-19T22:59:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f4ce1299b329e96bb247c95c4fee8809827d6931'/>
<id>f4ce1299b329e96bb247c95c4fee8809827d6931</id>
<content type='text'>
Add tracepoints for inode attribute updates, attribute revalidation,
writeback start/end, fsync start/end, attribute change start/end,
permission check start/end.

The intention is to enable performance tracing using 'perf' as well as
improving debugging.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add tracepoints for inode attribute updates, attribute revalidation,
writeback start/end, fsync start/end, attribute change start/end,
permission check start/end.

The intention is to enable performance tracing using 'perf' as well as
improving debugging.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: Allow nfs_updatepage to extend a write under additional circumstances</title>
<updated>2013-07-09T23:32:50+00:00</updated>
<author>
<name>Scott Mayhew</name>
<email>smayhew@redhat.com</email>
</author>
<published>2013-07-05T21:33:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c7559663e42f4294ffe31fe159da6b6a66b35d61'/>
<id>c7559663e42f4294ffe31fe159da6b6a66b35d61</id>
<content type='text'>
Currently nfs_updatepage allows a write to be extended to cover a full
page only if we don't have a byte range lock on the file... but if
we have a write delegation on the file or if we have the whole file
locked for writing then we should be allowed to extend the write as
well.
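
The widened condition can be sketched as a simple predicate. This models
only the three cases named above; the function and parameter names are
hypothetical, and any other checks the real nfs_updatepage performs are
left out.

```python
def can_extend_write(has_byte_range_lock, has_write_delegation,
                     whole_file_write_locked):
    """Decide whether a partial-page write may be extended to the full page.

    Hypothetical model of the three conditions in the description above.
    """
    # A write delegation or a whole-file write lock means no other client
    # can be modifying the rest of the page.
    if has_write_delegation or whole_file_write_locked:
        return True
    # Otherwise keep the earlier rule: extend only without byte range locks.
    return not has_byte_range_lock
```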

Signed-off-by: Scott Mayhew &lt;smayhew@redhat.com&gt;
[Trond: fix up call to nfs_have_delegation()]
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently nfs_updatepage allows a write to be extended to cover a full
page only if we don't have a byte range lock on the file... but if
we have a write delegation on the file or if we have the whole file
locked for writing then we should be allowed to extend the write as
well.

Signed-off-by: Scott Mayhew &lt;smayhew@redhat.com&gt;
[Trond: fix up call to nfs_have_delegation()]
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: Don't accept more reads/writes if the open context recovery failed</title>
<updated>2013-03-25T16:04:10+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2013-03-18T23:45:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c58c844187df61ef7cc103d0abb5dd6198bcfcd6'/>
<id>c58c844187df61ef7cc103d0abb5dd6198bcfcd6</id>
<content type='text'>
If the state recovery failed, we want to ensure that the application
doesn't try to use the same file descriptor for more reads or writes.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If the state recovery failed, we want to ensure that the application
doesn't try to use the same file descriptor for more reads or writes.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: Ensure that we free the rpc_task after read and write cleanups are done</title>
<updated>2013-01-04T17:59:10+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2013-01-04T17:47:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6db6dd7d3fd8f7c765dabc376493d6791ab28bd6'/>
<id>6db6dd7d3fd8f7c765dabc376493d6791ab28bd6</id>
<content type='text'>
This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: Weston Andros Adamson &lt;dros@netapp.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Bruce Fields &lt;bfields@fieldses.org&gt;
Cc: stable@vger.kernel.org [&gt;= 3.5]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: Weston Andros Adamson &lt;dros@netapp.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Bruce Fields &lt;bfields@fieldses.org&gt;
Cc: stable@vger.kernel.org [&gt;= 3.5]
</pre>
</div>
</content>
</entry>
<entry>
<title>NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page</title>
<updated>2012-12-20T22:12:03+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2012-12-05T13:34:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8c209ce721444a61b61d9e772746c721e4d8d1e8'/>
<id>8c209ce721444a61b61d9e772746c721e4d8d1e8</id>
<content type='text'>
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

 BUG: Bad page state in process python-bin  pfn:17d39b
 page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G    B      ---------------- )
 Pid: 31053, comm: python-bin Tainted: G    B      ----------------
2.6.32-71.24.1.el6.x86_64 #1
 Call Trace:
 [&lt;ffffffff8111bfe7&gt;] bad_page+0x107/0x160
 [&lt;ffffffff8111ee69&gt;] free_hot_cold_page+0x1c9/0x220
 [&lt;ffffffff8111ef19&gt;] __pagevec_free+0x59/0xb0
 [&lt;ffffffff8104b988&gt;] ? flush_tlb_others_ipi+0x128/0x130
 [&lt;ffffffff8112230c&gt;] release_pages+0x21c/0x250
 [&lt;ffffffff8115b92a&gt;] ? remove_migration_pte+0x28a/0x2b0
 [&lt;ffffffff8115f3f8&gt;] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
 [&lt;ffffffff81122687&gt;] ____pagevec_lru_add+0x167/0x180
 [&lt;ffffffff811226f8&gt;] __lru_cache_add+0x58/0x70
 [&lt;ffffffff81122731&gt;] lru_cache_add_lru+0x21/0x40
 [&lt;ffffffff81123f49&gt;] putback_lru_page+0x69/0x100
 [&lt;ffffffff8115c0bd&gt;] migrate_pages+0x13d/0x5d0
 [&lt;ffffffff81122687&gt;] ? ____pagevec_lru_add+0x167/0x180
 [&lt;ffffffff81152ab0&gt;] ? compaction_alloc+0x0/0x370
 [&lt;ffffffff8115255c&gt;] compact_zone+0x4cc/0x600
 [&lt;ffffffff8111cfac&gt;] ? get_page_from_freelist+0x15c/0x820
 [&lt;ffffffff810672f4&gt;] ? check_preempt_wakeup+0x1c4/0x3c0
 [&lt;ffffffff8115290e&gt;] compact_zone_order+0x7e/0xb0
 [&lt;ffffffff81152a49&gt;] try_to_compact_pages+0x109/0x170
 [&lt;ffffffff8111e94d&gt;] __alloc_pages_nodemask+0x5ed/0x850
 [&lt;ffffffff814c9136&gt;] ? thread_return+0x4e/0x778
 [&lt;ffffffff81150d43&gt;] alloc_pages_vma+0x93/0x150
 [&lt;ffffffff81167ea5&gt;] do_huge_pmd_anonymous_page+0x135/0x340
 [&lt;ffffffff814cb6f6&gt;] ? rwsem_down_read_failed+0x26/0x30
 [&lt;ffffffff81136755&gt;] handle_mm_fault+0x245/0x2b0
 [&lt;ffffffff814ce383&gt;] do_page_fault+0x123/0x3a0
 [&lt;ffffffff814cbdf5&gt;] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set.  The reason it doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

 (1) Make nfs_migrate_page() wait.

 (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

 (3) Set a timeout around the wait.

 (4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

 BUG: Bad page state in process python-bin  pfn:17d39b
 page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G    B      ---------------- )
 Pid: 31053, comm: python-bin Tainted: G    B      ----------------
2.6.32-71.24.1.el6.x86_64 #1
 Call Trace:
 [&lt;ffffffff8111bfe7&gt;] bad_page+0x107/0x160
 [&lt;ffffffff8111ee69&gt;] free_hot_cold_page+0x1c9/0x220
 [&lt;ffffffff8111ef19&gt;] __pagevec_free+0x59/0xb0
 [&lt;ffffffff8104b988&gt;] ? flush_tlb_others_ipi+0x128/0x130
 [&lt;ffffffff8112230c&gt;] release_pages+0x21c/0x250
 [&lt;ffffffff8115b92a&gt;] ? remove_migration_pte+0x28a/0x2b0
 [&lt;ffffffff8115f3f8&gt;] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
 [&lt;ffffffff81122687&gt;] ____pagevec_lru_add+0x167/0x180
 [&lt;ffffffff811226f8&gt;] __lru_cache_add+0x58/0x70
 [&lt;ffffffff81122731&gt;] lru_cache_add_lru+0x21/0x40
 [&lt;ffffffff81123f49&gt;] putback_lru_page+0x69/0x100
 [&lt;ffffffff8115c0bd&gt;] migrate_pages+0x13d/0x5d0
 [&lt;ffffffff81122687&gt;] ? ____pagevec_lru_add+0x167/0x180
 [&lt;ffffffff81152ab0&gt;] ? compaction_alloc+0x0/0x370
 [&lt;ffffffff8115255c&gt;] compact_zone+0x4cc/0x600
 [&lt;ffffffff8111cfac&gt;] ? get_page_from_freelist+0x15c/0x820
 [&lt;ffffffff810672f4&gt;] ? check_preempt_wakeup+0x1c4/0x3c0
 [&lt;ffffffff8115290e&gt;] compact_zone_order+0x7e/0xb0
 [&lt;ffffffff81152a49&gt;] try_to_compact_pages+0x109/0x170
 [&lt;ffffffff8111e94d&gt;] __alloc_pages_nodemask+0x5ed/0x850
 [&lt;ffffffff814c9136&gt;] ? thread_return+0x4e/0x778
 [&lt;ffffffff81150d43&gt;] alloc_pages_vma+0x93/0x150
 [&lt;ffffffff81167ea5&gt;] do_huge_pmd_anonymous_page+0x135/0x340
 [&lt;ffffffff814cb6f6&gt;] ? rwsem_down_read_failed+0x26/0x30
 [&lt;ffffffff81136755&gt;] handle_mm_fault+0x245/0x2b0
 [&lt;ffffffff814ce383&gt;] do_page_fault+0x123/0x3a0
 [&lt;ffffffff814cbdf5&gt;] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set.  The reason it doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

 (1) Make nfs_migrate_page() wait.

 (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

 (3) Set a timeout around the wait.

 (4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
