<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/buffer.c, branch v2.6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>[PATCH] gfp flags annotations - part 1</title>
<updated>2005-10-08T22:00:57+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@ftp.linux.org.uk</email>
</author>
<published>2005-10-07T06:46:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7'/>
<id>dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7</id>
<content type='text'>
 - added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] spinlock consolidation</title>
<updated>2005-09-10T17:06:21+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2005-09-10T07:25:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fb1c8f93d869b34cacb8b8932e2b83d96a19d720'/>
<id>fb1c8f93d869b34cacb8b8932e2b83d96a19d720</id>
<content type='text'>
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code.  It does the following
things:

 - consolidates and enhances the spinlock/rwlock debugging code

 - simplifies the asm/spinlock.h files

 - encapsulates the raw spinlock type and moves generic spinlock
   features (such as -&gt;break_lock) into the generic code.

 - cleans up the spinlock code hierarchy to get rid of the spaghetti.

Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c.  (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)

Also, i've enhanced the rwlock debugging facility, it will now track
write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.

The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:

 include/asm-i386/spinlock_types.h       |   16
 include/asm-x86_64/spinlock_types.h     |   16

I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:

   SMP                         |  UP
   ----------------------------|-----------------------------------
   asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
   linux/spinlock_types.h      |  linux/spinlock_types.h
   asm/spinlock_smp.h          |  linux/spinlock_up.h
   linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
   linux/spinlock.h            |  linux/spinlock.h

/*
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *   (also included on UP-debug builds:)
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the __raw_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.

From: Grant Grundler &lt;grundler@parisc-linux.org&gt;

  Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
  Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
  non-SMP kernels.  That should be trivial to fix up later if necessary.

  I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
  some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
  are well tested and contained entirely inside arch specific code.  I do NOT
  expect any new issues to arise with them.

 If someone does ever need to use debug/metrics with them, then they will
  need to unravel this hairball between spinlocks, atomic ops, and bit ops
  that exist only because parisc has exactly one atomic instruction: LDCW
  (load and clear word).

From: "Luck, Tony" &lt;tony.luck@intel.com&gt;

   ia64 fix

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Arjan van de Ven &lt;arjanv@infradead.org&gt;
Signed-off-by: Grant Grundler &lt;grundler@parisc-linux.org&gt;
Cc: Matthew Wilcox &lt;willy@debian.org&gt;
Signed-off-by: Hirokazu Takata &lt;takata@linux-m32r.org&gt;
Signed-off-by: Mikael Pettersson &lt;mikpe@csd.uu.se&gt;
Signed-off-by: Benoit Boissinot &lt;benoit.boissinot@ens-lyon.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code.  It does the following
things:

 - consolidates and enhances the spinlock/rwlock debugging code

 - simplifies the asm/spinlock.h files

 - encapsulates the raw spinlock type and moves generic spinlock
   features (such as -&gt;break_lock) into the generic code.

 - cleans up the spinlock code hierarchy to get rid of the spaghetti.

Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c.  (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)

Also, i've enhanced the rwlock debugging facility, it will now track
write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.

The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:

 include/asm-i386/spinlock_types.h       |   16
 include/asm-x86_64/spinlock_types.h     |   16

I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:

   SMP                         |  UP
   ----------------------------|-----------------------------------
   asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
   linux/spinlock_types.h      |  linux/spinlock_types.h
   asm/spinlock_smp.h          |  linux/spinlock_up.h
   linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
   linux/spinlock.h            |  linux/spinlock.h

/*
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *   (also included on UP-debug builds:)
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the __raw_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.

From: Grant Grundler &lt;grundler@parisc-linux.org&gt;

  Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
  Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
  non-SMP kernels.  That should be trivial to fix up later if necessary.

  I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
  some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
  are well tested and contained entirely inside arch specific code.  I do NOT
  expect any new issues to arise with them.

 If someone does ever need to use debug/metrics with them, then they will
  need to unravel this hairball between spinlocks, atomic ops, and bit ops
  that exist only because parisc has exactly one atomic instruction: LDCW
  (load and clear word).

From: "Luck, Tony" &lt;tony.luck@intel.com&gt;

   ia64 fix

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Arjan van de Ven &lt;arjanv@infradead.org&gt;
Signed-off-by: Grant Grundler &lt;grundler@parisc-linux.org&gt;
Cc: Matthew Wilcox &lt;willy@debian.org&gt;
Signed-off-by: Hirokazu Takata &lt;takata@linux-m32r.org&gt;
Signed-off-by: Mikael Pettersson &lt;mikpe@csd.uu.se&gt;
Signed-off-by: Benoit Boissinot &lt;benoit.boissinot@ens-lyon.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Make ll_rw_block() wait for buffer lock</title>
<updated>2005-09-07T23:57:55+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2005-09-06T22:19:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a7662236253374012d364106b6dc9161bd929e2e'/>
<id>a7662236253374012d364106b6dc9161bd929e2e</id>
<content type='text'>
Introduce new ll_rw_block() operation SWRITE meaning that block layer should
wait for the buffer lock and write-out afterwards.  Hence data in buffers at
the time of call are guaranteed to be submitted to the disk.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce new ll_rw_block() operation SWRITE meaning that block layer should
wait for the buffer lock and write-out afterwards.  Hence data in buffers at
the time of call are guaranteed to be submitted to the disk.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] alloc_buffer_head() and free_buffer_head() cleanup</title>
<updated>2005-09-07T23:57:41+00:00</updated>
<author>
<name>Coywolf Qi Hunt</name>
<email>qiyong@fc-cn.com</email>
</author>
<published>2005-09-06T22:18:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=736c7b808f38f3bb72941345e11e236ec65dec3d'/>
<id>736c7b808f38f3bb72941345e11e236ec65dec3d</id>
<content type='text'>
Signed-off-by: Coywolf Qi Hunt &lt;qiyong@fc-cn.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Coywolf Qi Hunt &lt;qiyong@fc-cn.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] page_uptodate locking scalability</title>
<updated>2005-07-08T01:23:45+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>nickpiggin@yahoo.com.au</email>
</author>
<published>2005-07-08T00:56:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a39722034ae37f80a1803bf781fe3fe1b03e20bc'/>
<id>a39722034ae37f80a1803bf781fe3fe1b03e20bc</id>
<content type='text'>
Use a bit spin lock in the first buffer of the page to synchronise asynch
IO buffer completions, instead of the global page_uptodate_lock, which is
showing some scalabilty problems.

Signed-off-by: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use a bit spin lock in the first buffer of the page to synchronise asynch
IO buffer completions, instead of the global page_uptodate_lock, which is
showing some scalabilty problems.

Signed-off-by: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] rename wakeup_bdflush to wakeup_pdflush</title>
<updated>2005-06-29T04:20:31+00:00</updated>
<author>
<name>Pekka J Enberg</name>
<email>penberg@cs.Helsinki.FI</email>
</author>
<published>2005-06-29T03:44:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=687a21cee17000177b1935896b9b475acf136678'/>
<id>687a21cee17000177b1935896b9b475acf136678</id>
<content type='text'>
Signed-off-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Bug in error recovery in fs/buffer.c::__block_prepare_write()</title>
<updated>2005-06-23T16:45:34+00:00</updated>
<author>
<name>Anton Altaparmakov</name>
<email>aia21@cam.ac.uk</email>
</author>
<published>2005-06-23T07:10:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=152becd26e0563aefdbc4fd1fe491928efe92d1f'/>
<id>152becd26e0563aefdbc4fd1fe491928efe92d1f</id>
<content type='text'>
fs/buffer.c::__block_prepare_write() has broken error recovery.  It calls
the get_block() callback with "create = 1" and if that succeeds it
immediately clears buffer_new on the just allocated buffer (which has
buffer_new set).

The bug is that if an error occurs and get_block() returns != 0, we break
from this loop and go into recovery code.  This code has this comment:

/* Error case: */
/*
 * Zero out any newly allocated blocks to avoid exposing stale
 * data.  If BH_New is set, we know that the block was newly
 * allocated in the above loop.
 */

So the intent is obviously good in that it wants to clear just allocated
and hence not zeroed buffers.  However the code recognises allocated
buffers by checking for buffer_new being set.

Unfortunately __block_prepare_write() as discussed above already cleared
buffer_new on all allocated buffers thus no buffers will be cleared during
error recovery and old data will be leaked.

The simplest way I can see to fix this is to make the current recovery code
work by _not_ clearing buffer_new after calling get_block() in
__block_prepare_write().

We cannot safely allow buffer_new buffers to "leak out" of
__block_prepare_write(), thus we simply do a quick loop over the buffers
clearing buffer_new on each of them if it is set just before returning
"success" from __block_prepare_write().

Signed-off-by: Anton Altaparmakov &lt;aia21@cantab.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fs/buffer.c::__block_prepare_write() has broken error recovery.  It calls
the get_block() callback with "create = 1" and if that succeeds it
immediately clears buffer_new on the just allocated buffer (which has
buffer_new set).

The bug is that if an error occurs and get_block() returns != 0, we break
from this loop and go into recovery code.  This code has this comment:

/* Error case: */
/*
 * Zero out any newly allocated blocks to avoid exposing stale
 * data.  If BH_New is set, we know that the block was newly
 * allocated in the above loop.
 */

So the intent is obviously good in that it wants to clear just allocated
and hence not zeroed buffers.  However the code recognises allocated
buffers by checking for buffer_new being set.

Unfortunately __block_prepare_write() as discussed above already cleared
buffer_new on all allocated buffers thus no buffers will be cleared during
error recovery and old data will be leaked.

The simplest way I can see to fix this is to make the current recovery code
work by _not_ clearing buffer_new after calling get_block() in
__block_prepare_write().

We cannot safely allow buffer_new buffers to "leak out" of
__block_prepare_write(), thus we simply do a quick loop over the buffers
clearing buffer_new on each of them if it is set just before returning
"success" from __block_prepare_write().

Signed-off-by: Anton Altaparmakov &lt;aia21@cantab.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] factor out common code in sys_fsync/sys_fdatasync</title>
<updated>2005-06-23T16:45:29+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2005-06-23T07:10:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dfb388bf8a328f206bba33933dd97230f412238b'/>
<id>dfb388bf8a328f206bba33933dd97230f412238b</id>
<content type='text'>
This patch consolidates sys_fsync and sys_fdatasync.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch consolidates sys_fsync and sys_fdatasync.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] vm: try_to_free_pages unused argument</title>
<updated>2005-06-22T01:46:17+00:00</updated>
<author>
<name>Darren Hart</name>
<email>dvhltc@us.ibm.com</email>
</author>
<published>2005-06-22T00:14:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1ad539b2bd89bf2e129123eb24d5bcc4484a35de'/>
<id>1ad539b2bd89bf2e129123eb24d5bcc4484a35de</id>
<content type='text'>
try_to_free_pages accepts a third argument, order, but hasn't used it since
before 2.6.0.  The following patch removes the argument and updates all the
calls to try_to_free_pages.

Signed-off-by: Darren Hart &lt;dvhltc@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
try_to_free_pages accepts a third argument, order, but hasn't used it since
before 2.6.0.  The following patch removes the argument and updates all the
calls to try_to_free_pages.

Signed-off-by: Darren Hart &lt;dvhltc@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] block_read_full_page() get_block() error handling fix</title>
<updated>2005-05-17T14:59:20+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2005-05-17T04:53:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c64610ba585fabb36be78782868277f3d9741a2e'/>
<id>c64610ba585fabb36be78782868277f3d9741a2e</id>
<content type='text'>
If block_read_full_page() detects an error when running get_block() it will
run SetPageError(), then it will zero out the block in pagecache and will mark
the buffer_head uptodate.

So at the end of readahead we end up with a non-uptodate pagecache page which
is marked PageError.  But it has uptodate buffers.

The pagefault code will run ClearPageError, will launch readpage a second time
and block_read_full_page() will notice the uptodate buffers and will mark the
page uptodate as well.  We end up with an uptodate, !PageError page full of
zeros and the error is lost.

(It seems a little odd that filemap_nopage() runs ClearPageError().  I guess
all of this adds up to meaning that for each attempted access to the page, the
pagefault handler will retry the I/O.  Which is good and bad.  If the app is
ignoring SIGBUS for some reason we could get a lot of back-to-back I/O
errors.)

Fix it by not marking the pagecache buffer_head as uptodate if the attempt to
map that buffer to a disk block failed.

Credit-to: Qu Fuping &lt;fs@ercist.iscas.ac.cn&gt;

  For reporting the bug and identifying its source.

Signed-off-by: Qu Fuping &lt;fs@ercist.iscas.ac.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If block_read_full_page() detects an error when running get_block() it will
run SetPageError(), then it will zero out the block in pagecache and will mark
the buffer_head uptodate.

So at the end of readahead we end up with a non-uptodate pagecache page which
is marked PageError.  But it has uptodate buffers.

The pagefault code will run ClearPageError, will launch readpage a second time
and block_read_full_page() will notice the uptodate buffers and will mark the
page uptodate as well.  We end up with an uptodate, !PageError page full of
zeros and the error is lost.

(It seems a little odd that filemap_nopage() runs ClearPageError().  I guess
all of this adds up to meaning that for each attempted access to the page, the
pagefault handler will retry the I/O.  Which is good and bad.  If the app is
ignoring SIGBUS for some reason we could get a lot of back-to-back I/O
errors.)

Fix it by not marking the pagecache buffer_head as uptodate if the attempt to
map that buffer to a disk block failed.

Credit-to: Qu Fuping &lt;fs@ercist.iscas.ac.cn&gt;

  For reporting the bug and identifying its source.

Signed-off-by: Qu Fuping &lt;fs@ercist.iscas.ac.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
