<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/fs.h, branch linux-2.6.35.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: prevent concurrent unmap_mapping_range() on the same inode</title>
<updated>2011-08-01T20:54:59+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@suse.cz</email>
</author>
<published>2011-02-23T12:49:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2d32f62fbbad8f663c26df23242113ef9464f790'/>
<id>2d32f62fbbad8f663c26df23242113ef9464f790</id>
<content type='text'>
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular -&gt;d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Reported-by: Michael Leun &lt;lkml20101129@newton.leun.net&gt;
Reported-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Tested-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular -&gt;d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Reported-by: Michael Leun &lt;lkml20101129@newton.leun.net&gt;
Reported-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Tested-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>direct-io: move aio_complete into -&gt;end_io</title>
<updated>2010-09-20T20:36:10+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2010-07-18T21:17:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1ee3a461df139a0dc8642d951a41507fd80df30c'/>
<id>1ee3a461df139a0dc8642d951a41507fd80df30c</id>
<content type='text'>
commit 40e2e97316af6e62affab7a392e792494b8d9dde upstream.

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the -&gt;end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the -&gt;end_io callback
prototype even more complicated.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
Cc: Chuck Ebbert &lt;cebbert@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 40e2e97316af6e62affab7a392e792494b8d9dde upstream.

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the -&gt;end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the -&gt;end_io callback
prototype even more complicated.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Alex Elder &lt;aelder@sgi.com&gt;
Cc: Chuck Ebbert &lt;cebbert@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>bio, fs: update RWA_MASK, READA and SWRITE to match the corresponding BIO_RW_* bits</title>
<updated>2010-08-13T20:30:53+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-08-03T11:14:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bd388a58b5243cce7fad0c5f8e8586c6f9a02081'/>
<id>bd388a58b5243cce7fad0c5f8e8586c6f9a02081</id>
<content type='text'>
commit aca27ba9618276dd2f777bcd5a1419589ccf1ca8 upstream.

Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits.  READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.

This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again.  A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.

Neil also spotted missing RWA_MASK conversion.

Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-bisected-by: Vladislav Bolkhovitin &lt;vst@vlnb.net&gt;
Root-caused-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit aca27ba9618276dd2f777bcd5a1419589ccf1ca8 upstream.

Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits.  READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.

This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again.  A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.

Neil also spotted missing RWA_MASK conversion.

Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-bisected-by: Vladislav Bolkhovitin &lt;vst@vlnb.net&gt;
Root-caused-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Fix sget() race with failing mount</title>
<updated>2010-08-13T20:30:50+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2010-08-09T16:05:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=752638f5e4285a42b2df0394c060054f7154c85e'/>
<id>752638f5e4285a42b2df0394c060054f7154c85e</id>
<content type='text'>
commit 7a4dec53897ecd3367efb1e12fe8a4edc47dc0e9 upstream.

If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount.  That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock.  deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.

What we need is a way to tell if superblock has been successfully
set up.  Unfortunately, neither -&gt;s_root nor the check for
MS_ACTIVE quite fit.  Cheap and easy way, suitable for backport:
new flag set by the (only) caller of -&gt;get_sb().  If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).

Longer term we want to set that flag in -&gt;get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking -&gt;s_root as we do now).

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7a4dec53897ecd3367efb1e12fe8a4edc47dc0e9 upstream.

If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount.  That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock.  deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.

What we need is a way to tell if superblock has been successfully
set up.  Unfortunately, neither -&gt;s_root nor the check for
MS_ACTIVE quite fit.  Cheap and easy way, suitable for backport:
new flag set by the (only) caller of -&gt;get_sb().  If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).

Longer term we want to set that flag in -&gt;get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking -&gt;s_root as we do now).

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>VFS: introduce s_dirty accessors</title>
<updated>2010-07-07T00:32:07+00:00</updated>
<author>
<name>Artem Bityutskiy</name>
<email>Artem.Bityutskiy@nokia.com</email>
</author>
<published>2010-06-10T10:56:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=140236b4b1c749c9b795ea3d11558a0eb5a3a080'/>
<id>140236b4b1c749c9b795ea3d11558a0eb5a3a080</id>
<content type='text'>
This patch introduces 3 VFS accessors: 'sb_mark_dirty()',
'sb_mark_clean()', and 'sb_is_dirty()'. They simply
set 'sb-&gt;s_dirt' or test 'sb-&gt;s_dirt'. The plan is to make
every FS use these accessors later instead of manipulating
the 'sb-&gt;s_dirt' flag directly.

Ultimately, this change is a preparation for the periodic
superblock synchronization optimization which is about
preventing the "sync_supers" kernel thread from waking up
even if there is nothing to synchronize.

This patch does not do any functional change, just adds
accessor functions.

Signed-off-by: Artem Bityutskiy &lt;Artem.Bityutskiy@nokia.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch introduces 3 VFS accessors: 'sb_mark_dirty()',
'sb_mark_clean()', and 'sb_is_dirty()'. They simply
set 'sb-&gt;s_dirt' or test 'sb-&gt;s_dirt'. The plan is to make
every FS use these accessors later instead of manipulating
the 'sb-&gt;s_dirt' flag directly.

Ultimately, this change is a preparation for the periodic
superblock synchronization optimization which is about
preventing the "sync_supers" kernel thread from waking up
even if there is nothing to synchronize.

This patch does not do any functional change, just adds
accessor functions.

Signed-off-by: Artem Bityutskiy &lt;Artem.Bityutskiy@nokia.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>wrong type for 'magic' argument in simple_fill_super()</title>
<updated>2010-06-04T21:16:28+00:00</updated>
<author>
<name>Roberto Sassu</name>
<email>roberto.sassu@polito.it</email>
</author>
<published>2010-06-03T09:58:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7d683a09990ff095a91b6e724ecee0ff8733274a'/>
<id>7d683a09990ff095a91b6e724ecee0ff8733274a</id>
<content type='text'>
It's used to superblock -&gt;s_magic, which is unsigned long.

Signed-off-by: Roberto Sassu &lt;roberto.sassu@polito.it&gt;
Reviewed-by: Mimi Zohar &lt;zohar@us.ibm.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
CC: stable@kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's used to superblock -&gt;s_magic, which is unsigned long.

Signed-off-by: Roberto Sassu &lt;roberto.sassu@polito.it&gt;
Reviewed-by: Mimi Zohar &lt;zohar@us.ibm.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
CC: stable@kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: introduce new truncate sequence</title>
<updated>2010-05-28T02:15:33+00:00</updated>
<author>
<name>npiggin@suse.de</name>
<email>npiggin@suse.de</email>
</author>
<published>2010-05-26T15:05:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7bb46a6734a7e1ad4beaecc11cae7ed3ff81d30f'/>
<id>7bb46a6734a7e1ad4beaecc11cae7ed3ff81d30f</id>
<content type='text'>
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr &gt; vmtruncate &gt; truncate, have filesystems call their truncate sequence
from -&gt;setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than -&gt;truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr &gt; vmtruncate &gt; truncate, have filesystems call their truncate sequence
from -&gt;setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than -&gt;truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rename the generic fsync implementations</title>
<updated>2010-05-28T02:06:06+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-05-26T15:53:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1b061d9247f71cd15edc4c4c4600191a903642c0'/>
<id>1b061d9247f71cd15edc4c4c4600191a903642c0</id>
<content type='text'>
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drop unused dentry argument to -&gt;fsync</title>
<updated>2010-05-28T02:05:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-05-26T15:53:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7ea8085910ef3dd4f3cad6845aaa2b580d39b115'/>
<id>7ea8085910ef3dd4f3cad6845aaa2b580d39b115</id>
<content type='text'>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>get rid of the magic around f_count in aio</title>
<updated>2010-05-28T02:03:07+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2010-05-26T19:13:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d7065da038227a4d09a244e6014e0186a6bd21d0'/>
<id>d7065da038227a4d09a244e6014e0186a6bd21d0</id>
<content type='text'>
__aio_put_req() plays sick games with file refcount.  What
it wants is fput() from atomic context; it's almost always
done with f_count &gt; 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one.  Current code decrements f_count and if it hasn't
hit 0, everything is fine.  Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test().  IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was.  And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic().  Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that.  aio.c converted to it, __fput()
use is gone.  req-&gt;ki_file *always* contributes to refcount
now.  And __fput() became static.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__aio_put_req() plays sick games with file refcount.  What
it wants is fput() from atomic context; it's almost always
done with f_count &gt; 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one.  Current code decrements f_count and if it hasn't
hit 0, everything is fine.  Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test().  IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was.  And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic().  Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that.  aio.c converted to it, __fput()
use is gone.  req-&gt;ki_file *always* contributes to refcount
now.  And __fput() became static.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
