<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/cachefiles/namei.c, branch v3.11</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>FS-Cache: Fix object state machine to have separate work and wait states</title>
<updated>2013-06-19T13:16:47+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2013-05-10T18:50:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=caaef6900befb45689b1d1831ce3c7e7fb5b504f'/>
<id>caaef6900befb45689b1d1831ce3c7e7fb5b504f</id>
<content type='text'>
Fix object state machine to have separate work and wait states as that makes
it easier to envision.

There are now three kinds of state:

 (1) Work state.  This is an execution state.  No event processing is performed
     by a work state.  The function attached to a work state returns a pointer
     indicating the next state to which the OSM should transition.  Returning
     NO_TRANSIT repeats the current state, but goes back to the scheduler
     first.

 (2) Wait state.  This is an event processing state.  No execution is
     performed by a wait state.  Wait states are just tables of "if event X
     occurs, clear it and transition to state Y".  The dispatcher returns to
     the scheduler if none of the events in which the wait state has an
     interest are currently pending.

 (3) Out-of-band state.  This is a special work state.  Transitions to normal
     states can be overridden when an unexpected event occurs (eg. I/O error).
     Instead the dispatcher disables and clears the OOB event and transits to
     the specified work state.  This then acts as an ordinary work state,
     though object-&gt;state points to the overridden destination.  Returning
     NO_TRANSIT resumes the overridden transition.

In addition, the states have names in their definitions, so there's no need for
tables of state names.  Further, the EV_REQUEUE event is no longer necessary as
that is automatic for work states.

Since the states are now separate structs rather than values in an enum, it's
not possible to use comparisons other than (non-)equality between them, so use
some object-&gt;flags to indicate what phase an object is in.

The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
(EV_KILL).  An object flag now carries the information about retirement.

Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
into an KILL_OBJECT state and additional states have been added for handling
waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

A state has also been added for synchronising with parent object initialisation
(WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-By: Milosz Tanski &lt;milosz@adfin.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix object state machine to have separate work and wait states as that makes
it easier to envision.

There are now three kinds of state:

 (1) Work state.  This is an execution state.  No event processing is performed
     by a work state.  The function attached to a work state returns a pointer
     indicating the next state to which the OSM should transition.  Returning
     NO_TRANSIT repeats the current state, but goes back to the scheduler
     first.

 (2) Wait state.  This is an event processing state.  No execution is
     performed by a wait state.  Wait states are just tables of "if event X
     occurs, clear it and transition to state Y".  The dispatcher returns to
     the scheduler if none of the events in which the wait state has an
     interest are currently pending.

 (3) Out-of-band state.  This is a special work state.  Transitions to normal
     states can be overridden when an unexpected event occurs (eg. I/O error).
     Instead the dispatcher disables and clears the OOB event and transits to
     the specified work state.  This then acts as an ordinary work state,
     though object-&gt;state points to the overridden destination.  Returning
     NO_TRANSIT resumes the overridden transition.

In addition, the states have names in their definitions, so there's no need for
tables of state names.  Further, the EV_REQUEUE event is no longer necessary as
that is automatic for work states.

Since the states are now separate structs rather than values in an enum, it's
not possible to use comparisons other than (non-)equality between them, so use
some object-&gt;flags to indicate what phase an object is in.

The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
(EV_KILL).  An object flag now carries the information about retirement.

Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
into an KILL_OBJECT state and additional states have been added for handling
waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).

A state has also been added for synchronising with parent object initialisation
(WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-By: Milosz Tanski &lt;milosz@adfin.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Wrap checks on object state</title>
<updated>2013-06-19T13:16:47+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2013-05-10T18:50:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=493f7bc11457bc1f6fbf25a4b2bdf215ebaf050f'/>
<id>493f7bc11457bc1f6fbf25a4b2bdf215ebaf050f</id>
<content type='text'>
Wrap checks on object state (mostly outside of fs/fscache/object.c) with
inline functions so that the mechanism can be replaced.

Some of the state checks within object.c are left as-is as they will be
replaced.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-By: Milosz Tanski &lt;milosz@adfin.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Wrap checks on object state (mostly outside of fs/fscache/object.c) with
inline functions so that the mechanism can be replaced.

Some of the state checks within object.c are left as-is as they will be
replaced.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-By: Milosz Tanski &lt;milosz@adfin.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>CacheFiles: name i_mutex lock class explicitly</title>
<updated>2013-06-19T13:16:47+00:00</updated>
<author>
<name>J. Bruce Fields</name>
<email>bfields@redhat.com</email>
</author>
<published>2013-05-10T18:50:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6bd5e82b099c3d633905098b3c0af808ced73d85'/>
<id>6bd5e82b099c3d633905098b3c0af808ced73d85</id>
<content type='text'>
Just some cleanup.

(And note the caller of this function may, for example, call vfs_unlink
on a child, so the "1" (I_MUTEX_PARENT) really was what was intended
here.)

Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-By: Milosz Tanski &lt;milosz@adfin.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Just some cleanup.

(And note the caller of this function may, for example, call vfs_unlink
on a child, so the "1" (I_MUTEX_PARENT) really was what was intended
here.)

Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-By: Milosz Tanski &lt;milosz@adfin.com&gt;
Acked-by: Jeff Layton &lt;jlayton@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>FS-Cache: Don't mask off the object event mask when printing it</title>
<updated>2012-12-20T22:08:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2012-12-05T13:34:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c2d35bfe4b508451b75b5b6bc60a08dbdc44f952'/>
<id>c2d35bfe4b508451b75b5b6bc60a08dbdc44f952</id>
<content type='text'>
Don't mask off the object event mask when printing it.  That way it can be seen
if threre are bits set that shouldn't be.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Don't mask off the object event mask when printing it.  That way it can be seen
if threre are bits set that shouldn't be.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>don't pass nameidata * to vfs_create()</title>
<updated>2012-07-14T12:34:50+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2012-06-10T22:09:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=312b63fba9e88a0dcf800834b8ede8716bcc1e17'/>
<id>312b63fba9e88a0dcf800834b8ede8716bcc1e17</id>
<content type='text'>
all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>switch touch_atime to struct path</title>
<updated>2012-03-21T01:29:41+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2012-03-15T12:21:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=68ac1234fb949b66941d94dce4157742799fc581'/>
<id>68ac1234fb949b66941d94dce4157742799fc581</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>CacheFiles: Add calls to path-based security hooks</title>
<updated>2011-01-23T23:49:45+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2010-12-24T14:48:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=821404434f3324bf23f545050ff64055a149766e'/>
<id>821404434f3324bf23f545050ff64055a149766e</id>
<content type='text'>
Add calls to path-based security hooks into CacheFiles as, unlike inode-based
security, these aren't implicit in the vfs_mkdir() and similar calls.

Reported-by: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add calls to path-based security hooks into CacheFiles as, unlike inode-based
security, these aren't implicit in the vfs_mkdir() and similar calls.

Reported-by: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fscache: convert object to use workqueue instead of slow-work</title>
<updated>2010-07-22T20:58:34+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-07-20T20:09:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8b8edefa2fffbff97f9eec8b70e78ae23abad1a0'/>
<id>8b8edefa2fffbff97f9eec8b70e78ae23abad1a0</id>
<content type='text'>
Make fscache object state transition callbacks use workqueue instead
of slow-work.  New dedicated unbound CPU workqueue fscache_object_wq
is created.  get/put callbacks are renamed and modified to take
@object and called directly from the enqueue wrapper and the work
function.  While at it, make all open coded instances of get/put to
use fscache_get/put_object().

* Unbound workqueue is used.

* work_busy() output is printed instead of slow-work flags in object
  debugging outputs.  They mean basically the same thing bit-for-bit.

* sysctl fscache.object_max_active added to control concurrency.  The
  default value is nr_cpus clamped between 4 and
  WQ_UNBOUND_MAX_ACTIVE.

* slow_work_sleep_till_thread_needed() is replaced with fscache
  private implementation fscache_object_sleep_till_congested() which
  waits on fscache_object_wq congestion.

* debugfs support is dropped for now.  Tracing API based debug
  facility is planned to be added.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make fscache object state transition callbacks use workqueue instead
of slow-work.  New dedicated unbound CPU workqueue fscache_object_wq
is created.  get/put callbacks are renamed and modified to take
@object and called directly from the enqueue wrapper and the work
function.  While at it, make all open coded instances of get/put to
use fscache_get/put_object().

* Unbound workqueue is used.

* work_busy() output is printed instead of slow-work flags in object
  debugging outputs.  They mean basically the same thing bit-for-bit.

* sysctl fscache.object_max_active added to control concurrency.  The
  default value is nr_cpus clamped between 4 and
  WQ_UNBOUND_MAX_ACTIVE.

* slow_work_sleep_till_thread_needed() is replaced with fscache
  private implementation fscache_object_sleep_till_congested() which
  waits on fscache_object_wq congestion.

* debugfs support is dropped for now.  Tracing API based debug
  facility is planned to be added.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>CacheFiles: Fix occasional EIO on call to vfs_unlink()</title>
<updated>2010-05-11T17:07:53+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2010-05-11T15:51:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c61ea31dac0319ec64b33725917bda81fc293a25'/>
<id>c61ea31dac0319ec64b33725917bda81fc293a25</id>
<content type='text'>
Fix an occasional EIO returned by a call to vfs_unlink():

	[ 4868.465413] CacheFiles: I/O Error: Unlink failed
	[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
	[ 4947.320011] CacheFiles: File cache on md3 unregistering
	[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
	[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
	[ 5127.348716] CacheFiles: File cache on md3 registered
	[ 7076.871081] CacheFiles: I/O Error: Unlink failed
	[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
	[ 7116.780891] CacheFiles: File cache on md3 unregistering
	[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
	[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
	[ 7296.813432] CacheFiles: File cache on md3 registered

What happens is this:

 (1) A cached NFS file is seen to have become out of date, so NFS retires the
     object and immediately acquires a new object with the same key.

 (2) Retirement of the old object is done asynchronously - so the lookup/create
     to generate the new object may be done first.

     This can be a problem as the old object and the new object must exist at
     the same point in the backing filesystem (i.e. they must have the same
     pathname).

 (3) The lookup for the new object sees that a backing file already exists,
     checks to see whether it is valid and sees that it isn't.  It then deletes
     that file and creates a new one on disk.

 (4) The retirement phase for the old file is then performed.  It tries to
     delete the dentry it has, but ext4_unlink() returns -EIO because the inode
     attached to that dentry no longer matches the inode number associated with
     the filename in the parent directory.

The trace below shows this quite well.

	[md5sum] ==&gt; __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1)
	[md5sum] ==&gt; __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100)

NFS has retired the old cookie and asked for a new one.

	[kslowd] ==&gt; fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_DYING]
	[kslowd] ==&gt; fscache_object_state_machine({OBJ53,OBJECT_INIT,0})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_LOOKING_UP]
	[kslowd] ==&gt; fscache_object_state_machine({OBJ52,OBJECT_DYING,24})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_RECYCLING]

The old object (OBJ52) is going through the terminal states to get rid of it,
whilst the new object - (OBJ53) - is coming into being.

	[kslowd] ==&gt; fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0})
	[kslowd] ==&gt; cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,)
	[kslowd] lookup '@68'
	[kslowd] next -&gt; ffff88002ce41bd0 positive
	[kslowd] advance
	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] next -&gt; ffff8800369faac8 positive

The new object has looked up the subdir in which the file would be in (getting
dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry
ffff8800369faac8).

	[kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] ==&gt; cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
	[kslowd] unlink stale object
	[kslowd] &lt;== cachefiles_bury_object() = 0

It then checks the file's xattrs to see if it's valid.  NFS says that the
auxiliary data indicate the file is out of date (obvious to us - that's why NFS
ditched the old version and got a new one).  CacheFiles then deletes the old
file (dentry ffff8800369faac8).

	[kslowd] redo lookup
	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] next -&gt; ffff88002cd94288 negative
	[kslowd] create -&gt; ffff88002cd94288{ffff88002cdaf238{ino=148247}}

CacheFiles then redoes the lookup and gets a negative result in a new dentry
(ffff88002cd94288) which it then creates a file for.

	[kslowd] ==&gt; cachefiles_mark_object_active(,OBJ53)
	[kslowd] &lt;== cachefiles_mark_object_active() = 0
	[kslowd] === OBTAINED_OBJECT ===
	[kslowd] &lt;== cachefiles_walk_to_object() = 0 [148247]
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_AVAILABLE]

The new object is then marked active and the state machine moves to the
available state - at which point NFS can start filling the object.

	[kslowd] ==&gt; fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20})
	[kslowd] ==&gt; fscache_release_object()
	[kslowd] ==&gt; cachefiles_drop_object({OBJ52,2})
	[kslowd] ==&gt; cachefiles_delete_object(,OBJ52{ffff8800369faac8})

The old object, meanwhile, goes on with being retired.  If allocation occurs
first, cachefiles_delete_object() has to wait for dir-&gt;d_inode-&gt;i_mutex to
become available before it can continue.

	[kslowd] ==&gt; cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
	[kslowd] unlink stale object
	EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193)
	CacheFiles: I/O Error: Unlink failed
	FS-Cache: Cache cachefiles stopped due to I/O error

CacheFiles then tries to delete the file for the old object, but the dentry it
has (ffff8800369faac8) no longer points to a valid inode for that directory
entry, and so ext4_unlink() returns -EIO when de-&gt;inode does not match i_ino.

	[kslowd] &lt;== cachefiles_bury_object() = -5
	[kslowd] &lt;== cachefiles_delete_object() = -5
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_DEAD]
	[kslowd] ==&gt; fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_ACTIVE]

(Note that the above trace includes extra information beyond that produced by
the upstream code).

The fix is to note when an object that is being retired has had its object
deleted preemptively by a replacement object that is being created, and to
skip the second removal attempt in such a case.

Reported-by: Greg M &lt;gregm@servu.net.au&gt;
Reported-by: Mark Moseley &lt;moseleymark@gmail.com&gt;
Reported-by: Romain DEGEZ &lt;romain.degez@smartjog.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix an occasional EIO returned by a call to vfs_unlink():

	[ 4868.465413] CacheFiles: I/O Error: Unlink failed
	[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
	[ 4947.320011] CacheFiles: File cache on md3 unregistering
	[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
	[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
	[ 5127.348716] CacheFiles: File cache on md3 registered
	[ 7076.871081] CacheFiles: I/O Error: Unlink failed
	[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
	[ 7116.780891] CacheFiles: File cache on md3 unregistering
	[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
	[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
	[ 7296.813432] CacheFiles: File cache on md3 registered

What happens is this:

 (1) A cached NFS file is seen to have become out of date, so NFS retires the
     object and immediately acquires a new object with the same key.

 (2) Retirement of the old object is done asynchronously - so the lookup/create
     to generate the new object may be done first.

     This can be a problem as the old object and the new object must exist at
     the same point in the backing filesystem (i.e. they must have the same
     pathname).

 (3) The lookup for the new object sees that a backing file already exists,
     checks to see whether it is valid and sees that it isn't.  It then deletes
     that file and creates a new one on disk.

 (4) The retirement phase for the old file is then performed.  It tries to
     delete the dentry it has, but ext4_unlink() returns -EIO because the inode
     attached to that dentry no longer matches the inode number associated with
     the filename in the parent directory.

The trace below shows this quite well.

	[md5sum] ==&gt; __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1)
	[md5sum] ==&gt; __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100)

NFS has retired the old cookie and asked for a new one.

	[kslowd] ==&gt; fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_DYING]
	[kslowd] ==&gt; fscache_object_state_machine({OBJ53,OBJECT_INIT,0})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_LOOKING_UP]
	[kslowd] ==&gt; fscache_object_state_machine({OBJ52,OBJECT_DYING,24})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_RECYCLING]

The old object (OBJ52) is going through the terminal states to get rid of it,
whilst the new object - (OBJ53) - is coming into being.

	[kslowd] ==&gt; fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0})
	[kslowd] ==&gt; cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,)
	[kslowd] lookup '@68'
	[kslowd] next -&gt; ffff88002ce41bd0 positive
	[kslowd] advance
	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] next -&gt; ffff8800369faac8 positive

The new object has looked up the subdir in which the file would be in (getting
dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry
ffff8800369faac8).

	[kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] ==&gt; cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
	[kslowd] unlink stale object
	[kslowd] &lt;== cachefiles_bury_object() = 0

It then checks the file's xattrs to see if it's valid.  NFS says that the
auxiliary data indicate the file is out of date (obvious to us - that's why NFS
ditched the old version and got a new one).  CacheFiles then deletes the old
file (dentry ffff8800369faac8).

	[kslowd] redo lookup
	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
	[kslowd] next -&gt; ffff88002cd94288 negative
	[kslowd] create -&gt; ffff88002cd94288{ffff88002cdaf238{ino=148247}}

CacheFiles then redoes the lookup and gets a negative result in a new dentry
(ffff88002cd94288) which it then creates a file for.

	[kslowd] ==&gt; cachefiles_mark_object_active(,OBJ53)
	[kslowd] &lt;== cachefiles_mark_object_active() = 0
	[kslowd] === OBTAINED_OBJECT ===
	[kslowd] &lt;== cachefiles_walk_to_object() = 0 [148247]
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_AVAILABLE]

The new object is then marked active and the state machine moves to the
available state - at which point NFS can start filling the object.

	[kslowd] ==&gt; fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20})
	[kslowd] ==&gt; fscache_release_object()
	[kslowd] ==&gt; cachefiles_drop_object({OBJ52,2})
	[kslowd] ==&gt; cachefiles_delete_object(,OBJ52{ffff8800369faac8})

The old object, meanwhile, goes on with being retired.  If allocation occurs
first, cachefiles_delete_object() has to wait for dir-&gt;d_inode-&gt;i_mutex to
become available before it can continue.

	[kslowd] ==&gt; cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
	[kslowd] unlink stale object
	EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193)
	CacheFiles: I/O Error: Unlink failed
	FS-Cache: Cache cachefiles stopped due to I/O error

CacheFiles then tries to delete the file for the old object, but the dentry it
has (ffff8800369faac8) no longer points to a valid inode for that directory
entry, and so ext4_unlink() returns -EIO when de-&gt;inode does not match i_ino.

	[kslowd] &lt;== cachefiles_bury_object() = -5
	[kslowd] &lt;== cachefiles_delete_object() = -5
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_DEAD]
	[kslowd] ==&gt; fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0})
	[kslowd] &lt;== fscache_object_state_machine() [-&gt;OBJECT_ACTIVE]

(Note that the above trace includes extra information beyond that produced by
the upstream code).

The fix is to note when an object that is being retired has had its object
deleted preemptively by a replacement object that is being created, and to
skip the second removal attempt in such a case.

Reported-by: Greg M &lt;gregm@servu.net.au&gt;
Reported-by: Mark Moseley &lt;moseleymark@gmail.com&gt;
Reported-by: Romain DEGEZ &lt;romain.degez@smartjog.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h</title>
<updated>2010-03-30T13:02:32+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-03-24T08:04:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5a0e3ad6af8660be21ca98a971cd00f331318c05'/>
<id>5a0e3ad6af8660be21ca98a971cd00f331318c05</id>
<content type='text'>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -&gt; slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Guess-its-ok-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -&gt; slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Guess-its-ok-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
