linux.git/fs/nfs/write.c, branch v3.13

nfs: use %p[dD] instead of open-coded (and often racy) equivalents

2013-10-25T03:34:50+00:00

Signed-off-by: Al Viro

NFS: Don't check lock owner compatibility in writes unless file is locked

2013-09-05T22:11:42+00:00

If we're doing buffered writes, and there is no file locking involved,
then we don't have to worry about whether or not the lock owner information
is identical.
By relaxing this check, we ensure that fork()ed child processes can write
to a page without having to first sync dirty data that was written
by the parent to disk.

Reported-by: Quentin Barnes 
Signed-off-by: Trond Myklebust 
Tested-by: Quentin Barnes
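
The relaxed check described in this commit can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the struct and function names (sketch_lock_owner, sketch_pending_write, sketch_must_flush_first) are hypothetical, and the file_has_locks flag stands in for whatever test the kernel uses to decide that file locking is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the identity of a lock owner. */
struct sketch_lock_owner {
    const void *files;  /* models the owner's file table */
    int pid;            /* models the owner's process id */
};

/* Hypothetical stand-in for a dirty, not yet flushed write request. */
struct sketch_pending_write {
    const void *page;
    const void *open_ctx;
    struct sketch_lock_owner owner;
};

/*
 * Decide whether the pending request must be synced before a new
 * write to the same page may proceed.  The lock owner is compared
 * only when the file actually has locks.
 */
static bool sketch_must_flush_first(const struct sketch_pending_write *req,
                                    const void *page, const void *open_ctx,
                                    const struct sketch_lock_owner *writer,
                                    bool file_has_locks)
{
    assert(req != NULL && writer != NULL);

    bool flush = req->page != page || req->open_ctx != open_ctx;

    if (file_has_locks)
        flush = flush ||
                req->owner.files != writer->files ||
                req->owner.pid != writer->pid;
    return flush;
}
```

In this model a fork()ed child (different files pointer and pid) writing to a page dirtied by its parent is forced to flush only when file_has_locks is true.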

nfs4.1: Add SP4_MACH_CRED write and commit support

2013-09-05T14:50:45+00:00

WRITE and COMMIT can use the machine credential.

If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4.

Signed-off-by: Weston Andros Adamson 
Signed-off-by: Trond Myklebust
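
The WRITE/COMMIT rule above can be sketched as a small decision function. This is an illustrative user-space model, not the kernel implementation: the flag bits SKETCH_SP4_WRITE and SKETCH_SP4_COMMIT and the function name are hypothetical, while the stable_how4 values follow the NFSv4.1 protocol definition.

```c
#include <assert.h>

/* stable_how4 values as defined by the NFSv4.1 protocol. */
enum sketch_stable_how4 {
    SKETCH_UNSTABLE4  = 0,
    SKETCH_DATA_SYNC4 = 1,
    SKETCH_FILE_SYNC4 = 2
};

/* Hypothetical bits recording which operations may use the machine cred. */
#define SKETCH_SP4_WRITE  (1u << 0)
#define SKETCH_SP4_COMMIT (1u << 1)

/*
 * If WRITE may use the machine credential but COMMIT may not, a later
 * COMMIT could not be sent with that credential, so the WRITE itself is
 * made FILE_SYNC4.  Otherwise the requested stability is kept.
 */
static enum sketch_stable_how4
sketch_pick_stable(unsigned int sp4_flags, enum sketch_stable_how4 requested)
{
    assert(requested <= SKETCH_FILE_SYNC4);

    if ((sp4_flags & SKETCH_SP4_WRITE) && !(sp4_flags & SKETCH_SP4_COMMIT))
        return SKETCH_FILE_SYNC4;
    return requested;
}
```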

NFSv4: Don't try to recover NFSv4 locks when they are lost.

2013-09-04T16:26:32+00:00

When an NFSv4 client loses contact with the server it can lose any
locks that it holds.

Currently when it reconnects to the server it simply tries to reclaim
those locks.  This might succeed even though some other client has
held and released a lock in the meantime.  So the first client might
think the file is unchanged, but it isn't.  This isn't good.

If, when recovery happens, the locks cannot be claimed because some
other client still holds the lock, then we get a message in the kernel
logs, but the client can still write.  So two clients can both think
they have a lock and can both write at the same time.  This is equally
not good.

There was a patch a while ago
  http://comments.gmane.org/gmane.linux.nfs/41917

which tried to address some of this, but it didn't seem to go
anywhere.  That patch would also send a signal to the process.  That
might be useful but for now this patch just causes writes to fail.

For NFSv4 (unlike v2/v3) there is a strong link between the lock and
the write request so we can fairly easily fail any IO if the lock is
gone.  While some applications might not expect this, it is still
safer than allowing the write to succeed.

Because this is a fairly big change in behaviour, a module parameter,
"recover_locks", is introduced which defaults to true (the current
behaviour) but can be set to "false" to tell the client not to try to
recover things that were lost.

Signed-off-by: NeilBrown 
Signed-off-by: Trond Myklebust
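
A minimal user-space model of the behaviour described above, under stated assumptions: the struct and function names are hypothetical, the recover_locks value is passed as an argument rather than read from a module parameter, the reclaim path itself is not modelled, and the choice of -EIO as the error returned to the writer is an assumption (the message only says that writes fail).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-lock state; not the kernel's structure. */
struct sketch_lock_state {
    bool lost;  /* set once the lock is known to be gone */
};

/* Called when contact with the server was lost while the lock was held. */
static void sketch_lease_expired(struct sketch_lock_state *ls,
                                 bool recover_locks)
{
    assert(ls != NULL);
    if (!recover_locks)
        ls->lost = true;  /* do not try to reclaim; remember the loss */
    /* With recover_locks set, a reclaim would be attempted (not modelled). */
}

/* Checked before a write that depends on the lock is sent. */
static int sketch_write_check(const struct sketch_lock_state *ls)
{
    assert(ls != NULL);
    return ls->lost ? -EIO : 0;  /* -EIO is an assumed error code */
}
```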

NFS avoid expired credential keys for buffered writes

2013-09-03T19:25:09+00:00

We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
context key) that is about to expire or has expired.  We currently will
paint ourselves into a corner by returning success to the application
for such a buffered WRITE, only to discover that we do not have permission when
we attempt to flush the WRITE (and potentially associated COMMIT) to disk.

Use the RPC layer credential key timeout and expire routines, which use a
watermark, gss_key_expire_timeo. We test the key in nfs_file_write.

If a WRITE is using a credential with a key that will expire within
watermark seconds, flush the inode in nfs_write_end and send only
NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write.
Note that this results in single page NFS_FILE_SYNC WRITEs.

Signed-off-by: Andy Adamson 
[Trond: removed a pr_warn_ratelimited() for now]
Signed-off-by: Trond Myklebust
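
The watermark test can be sketched in user space as follows. This is an assumption-laden illustration: times are plain seconds, the function names are hypothetical, and treating "exactly at the watermark" as expiring is a choice made here, not something the message specifies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A key counts as "about to expire" when its remaining lifetime is no
 * longer than the watermark.  An already expired key has a negative
 * remaining lifetime and therefore also matches.
 */
static bool sketch_key_to_expire(long now, long key_expiry,
                                 long watermark_secs)
{
    assert(watermark_secs >= 0);
    return key_expiry - now <= watermark_secs;
}

/*
 * Models the extended sync-write decision: a write is sent
 * synchronously if the file was opened for synchronous I/O or if the
 * credential key is about to expire.
 */
static bool sketch_need_sync_write(bool opened_sync, bool key_to_expire)
{
    return opened_sync || key_to_expire;
}
```

With a watermark of 240 seconds, a key expiring in 100 seconds forces synchronous writes in this model, while a key with 900 seconds left still allows buffering.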

NFS: Add event tracing for generic NFS events

2013-08-22T12:58:17+00:00

Add tracepoints for inode attribute updates, attribute revalidation,
writeback start/end, fsync start/end, attribute change start/end,
permission check start/end.

The intention is to enable performance tracing using 'perf' as well as
improving debugging.

Signed-off-by: Trond Myklebust

NFS: Allow nfs_updatepage to extend a write under additional circumstances

2013-07-09T23:32:50+00:00

Currently nfs_updatepage allows a write to be extended to cover a full
page only if we don't have a byte range lock on the file... but if
we have a write delegation on the file or if we have the whole file
locked for writing then we should be allowed to extend the write as
well.

Signed-off-by: Scott Mayhew 
[Trond: fix up call to nfs_have_delegation()]
Signed-off-by: Trond Myklebust
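
The three cases named above can be written down as a small predicate. This is a user-space sketch with hypothetical names; it models only the conditions stated in this message and leaves out any other checks the real function may perform.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a byte range lock held on the file. */
struct sketch_range_lock {
    long long start;
    long long end;       /* LLONG_MAX models "to end of file" */
    bool for_writing;
};

/*
 * A write may be extended to cover the full page when:
 *  - we hold a write delegation, or
 *  - there is no byte range lock at all (lock == NULL), or
 *  - the lock covers the whole file and was taken for writing.
 */
static bool sketch_can_extend_write(bool have_write_delegation,
                                    const struct sketch_range_lock *lock)
{
    if (have_write_delegation)
        return true;
    if (lock == NULL)
        return true;

    assert(lock->start <= lock->end);
    return lock->start == 0 && lock->end == LLONG_MAX && lock->for_writing;
}
```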

NFS: Don't accept more reads/writes if the open context recovery failed

2013-03-25T16:04:10+00:00

If the state recovery failed, we want to ensure that the application
doesn't try to use the same file descriptor for more reads or writes.

Signed-off-by: Trond Myklebust
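
One way to picture this is a sticky "bad" flag on the open context that every new read or write checks first. The sketch below is a user-space illustration only: the flag bit, struct, and function names are hypothetical, and returning -EIO is an assumption about the error seen by the application.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_CTX_BAD (1u << 0)  /* hypothetical flag bit */

/* Hypothetical open context; not the kernel's structure. */
struct sketch_open_ctx {
    unsigned int flags;
};

/* Called once state recovery for this open context has failed. */
static void sketch_mark_recovery_failed(struct sketch_open_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->flags |= SKETCH_CTX_BAD;
}

/* Checked before accepting a new read or write on the context. */
static int sketch_io_check(const struct sketch_open_ctx *ctx)
{
    assert(ctx != NULL);
    return (ctx->flags & SKETCH_CTX_BAD) ? -EIO : 0;  /* assumed errno */
}
```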

NFS: Ensure that we free the rpc_task after read and write cleanups are done

2013-01-04T17:59:10+00:00

This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.

Signed-off-by: Trond Myklebust 
Cc: Weston Andros Adamson 
Cc: Tejun Heo 
Cc: Bruce Fields 
Cc: stable@vger.kernel.org [>= 3.5]

NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page

2012-12-20T22:12:03+00:00

nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

 BUG: Bad page state in process python-bin  pfn:17d39b
 page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G    B      ---------------- )
 Pid: 31053, comm: python-bin Tainted: G    B      ----------------
2.6.32-71.24.1.el6.x86_64 #1
 Call Trace:
 [] bad_page+0x107/0x160
 [] free_hot_cold_page+0x1c9/0x220
 [] __pagevec_free+0x59/0xb0
 [] ? flush_tlb_others_ipi+0x128/0x130
 [] release_pages+0x21c/0x250
 [] ? remove_migration_pte+0x28a/0x2b0
 [] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
 [] ____pagevec_lru_add+0x167/0x180
 [] __lru_cache_add+0x58/0x70
 [] lru_cache_add_lru+0x21/0x40
 [] putback_lru_page+0x69/0x100
 [] migrate_pages+0x13d/0x5d0
 [] ? ____pagevec_lru_add+0x167/0x180
 [] ? compaction_alloc+0x0/0x370
 [] compact_zone+0x4cc/0x600
 [] ? get_page_from_freelist+0x15c/0x820
 [] ? check_preempt_wakeup+0x1c4/0x3c0
 [] compact_zone_order+0x7e/0xb0
 [] try_to_compact_pages+0x109/0x170
 [] __alloc_pages_nodemask+0x5ed/0x850
 [] ? thread_return+0x4e/0x778
 [] alloc_pages_vma+0x93/0x150
 [] do_huge_pmd_anonymous_page+0x135/0x340
 [] ? rwsem_down_read_failed+0x26/0x30
 [] handle_mm_fault+0x245/0x2b0
 [] do_page_fault+0x123/0x3a0
 [] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set.  The reason it doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

 (1) Make nfs_migrate_page() wait.

 (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

 (3) Set a timeout around the wait.

 (4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).

Signed-off-by: David Howells 
Acked-by: Jeff Layton
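
Options (2) and (4) can be modelled together in user space. This is a sketch under assumptions: SKETCH_GFP_WAIT stands in for __GFP_WAIT, the "wait" is modelled by simply clearing the busy flag, all names are hypothetical, and -EBUSY is an assumed choice for the error in option (4).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_GFP_WAIT (1u << 0)  /* stands in for __GFP_WAIT */

/* Hypothetical page whose contents may still be in use by the cache. */
struct sketch_page {
    bool cache_busy;
};

/*
 * Option (2): honour the wait flag.  Returns true once the cache no
 * longer uses the page, false if it is busy and waiting is not allowed.
 */
static bool sketch_maybe_release_page(struct sketch_page *page,
                                      unsigned int gfp)
{
    assert(page != NULL);
    if (!page->cache_busy)
        return true;
    if (!(gfp & SKETCH_GFP_WAIT))
        return false;
    page->cache_busy = false;  /* models sleeping until the cache is done */
    return true;
}

/* Option (4): refuse to migrate a page that is still busy. */
static int sketch_migrate_page(struct sketch_page *page, unsigned int gfp)
{
    if (!sketch_maybe_release_page(page, gfp))
        return -EBUSY;  /* assumed errno */
    return 0;           /* the migration itself is not modelled */
}
```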