<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/buffer.c, branch v2.6.29</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Fix race in create_empty_buffers() vs __set_page_dirty_buffers()</title>
<updated>2009-03-19T18:32:05+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-03-19T18:32:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a8e7d49aa7be728c4ae241a75a2a124cdcabc0c5'/>
<id>a8e7d49aa7be728c4ae241a75a2a124cdcabc0c5</id>
<content type='text'>
Nick Piggin noticed this (very unlikely) race between setting a page
dirty and creating the buffers for it - we need to hold the mapping
private_lock until we've set the page dirty bit in order to make sure
that create_empty_buffers() might not build up a set of buffers without
the dirty bits set when the page is dirty.

I doubt anybody has ever hit this race (and it didn't solve the issue
Nick was looking at), but as Nick says: "Still, it does appear to solve
a real race, which we should close."

Acked-by: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Nick Piggin noticed this (very unlikely) race between setting a page
dirty and creating the buffers for it - we need to hold the mapping
private_lock until we've set the page dirty bit in order to make sure
that create_empty_buffers() might not build up a set of buffers without
the dirty bits set when the page is dirty.

I doubt anybody has ever hit this race (and it didn't solve the issue
Nick was looking at), but as Nick says: "Still, it does appear to solve
a real race, which we should close."

Acked-by: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block</title>
<updated>2009-02-19T02:33:04+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-02-19T02:33:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ba95fd47d177d46743ad94055908d22840370e06'/>
<id>ba95fd47d177d46743ad94055908d22840370e06</id>
<content type='text'>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
  block: fix booting from partitioned md array
  block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb
  cciss: PCI power management reset for kexec
  paride/pg.c: xs(): &amp;&amp;/|| confusion
  fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
  block: fix bad definition of BIO_RW_SYNC
  bsg: Fix sense buffer bug in SG_IO
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
  block: fix booting from partitioned md array
  block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb
  cciss: PCI power management reset for kexec
  paride/pg.c: xs(): &amp;&amp;/|| confusion
  fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
  block: fix bad definition of BIO_RW_SYNC
  bsg: Fix sense buffer bug in SG_IO
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: task dirty accounting fix</title>
<updated>2009-02-18T23:37:54+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-02-18T22:48:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1cf6e7d83bf334cc5916137862c920a97aabc018'/>
<id>1cf6e7d83bf334cc5916137862c920a97aabc018</id>
<content type='text'>
YAMAMOTO-san noticed that task_dirty_inc doesn't seem to be called properly for
cases where set_page_dirty is not used to dirty a page (eg. mark_buffer_dirty).

Additionally, there is some inconsistency about when task_dirty_inc is
called.  It is used for dirty balancing, however it even gets called for
__set_page_dirty_no_writeback.

So rather than increment it in a set_page_dirty wrapper, move it down to
exactly where the dirty page accounting stats are incremented.

Cc: YAMAMOTO Takashi &lt;yamamoto@valinux.co.jp&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
YAMAMOTO-san noticed that task_dirty_inc doesn't seem to be called properly for
cases where set_page_dirty is not used to dirty a page (eg. mark_buffer_dirty).

Additionally, there is some inconsistency about when task_dirty_inc is
called.  It is used for dirty balancing, however it even gets called for
__set_page_dirty_no_writeback.

So rather than increment it in a set_page_dirty wrapper, move it down to
exactly where the dirty page accounting stats are incremented.

Cc: YAMAMOTO Takashi &lt;yamamoto@valinux.co.jp&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb</title>
<updated>2009-02-18T09:32:01+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-02-17T12:59:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=78f707bfc723552e8309b7c38a8d0cc51012e813'/>
<id>78f707bfc723552e8309b7c38a8d0cc51012e813</id>
<content type='text'>
The above commit added WRITE_SYNC and switched various places to using
that for committing writes that will be waited upon immediately after
submission. However, this causes a performance regression with AS and CFQ
for ext3 at least, since sync_dirty_buffer() will submit some writes with
WRITE_SYNC while ext3 has sumitted others dependent writes without the sync
flag set. This causes excessive anticipation/idling in the IO scheduler
because sync and async writes get interleaved, causing a big performance
regression for the below test case (which is meant to simulate sqlite
like behaviour).

---- test case ----

int main(int argc, char **argv)
{

	int fdes, i;
	FILE *fp;
	struct timeval start;
	struct timeval end;
	struct timeval res;

	gettimeofday(&amp;start, NULL);
	for (i=0; i&lt;ROWS; i++) {
		fp = fopen("test_file", "a");
		fprintf(fp, "Some Text Data\n");
		fdes = fileno(fp);
		fsync(fdes);
		fclose(fp);
	}
	gettimeofday(&amp;end, NULL);

	timersub(&amp;end, &amp;start, &amp;res);
	fprintf(stdout, "time to write %d lines is %ld(msec)\n", ROWS,
			(res.tv_sec*1000000 + res.tv_usec)/1000);

	return 0;
}

-------------------

Thanks to Sean.White@APCC.com for tracking down this performance
regression and providing a test case.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The above commit added WRITE_SYNC and switched various places to using
that for committing writes that will be waited upon immediately after
submission. However, this causes a performance regression with AS and CFQ
for ext3 at least, since sync_dirty_buffer() will submit some writes with
WRITE_SYNC while ext3 has sumitted others dependent writes without the sync
flag set. This causes excessive anticipation/idling in the IO scheduler
because sync and async writes get interleaved, causing a big performance
regression for the below test case (which is meant to simulate sqlite
like behaviour).

---- test case ----

int main(int argc, char **argv)
{

	int fdes, i;
	FILE *fp;
	struct timeval start;
	struct timeval end;
	struct timeval res;

	gettimeofday(&amp;start, NULL);
	for (i=0; i&lt;ROWS; i++) {
		fp = fopen("test_file", "a");
		fprintf(fp, "Some Text Data\n");
		fdes = fileno(fp);
		fsync(fdes);
		fclose(fp);
	}
	gettimeofday(&amp;end, NULL);

	timersub(&amp;end, &amp;start, &amp;res);
	fprintf(stdout, "time to write %d lines is %ld(msec)\n", ROWS,
			(res.tv_sec*1000000 + res.tv_usec)/1000);

	return 0;
}

-------------------

Thanks to Sean.White@APCC.com for tracking down this performance
regression and providing a test case.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: Don't call attach_nobh_buffers() with an empty list</title>
<updated>2009-02-06T21:34:22+00:00</updated>
<author>
<name>Dave Kleikamp</name>
<email>shaggy@linux.vnet.ibm.com</email>
</author>
<published>2009-02-06T20:59:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d4cf109f05ff04c6f5065c3e14165ef01a57dd53'/>
<id>d4cf109f05ff04c6f5065c3e14165ef01a57dd53</id>
<content type='text'>
This is a modification of a patch by Bill Pemberton &lt;wfp5p@virginia.edu&gt;

nobh_write_end() could call attach_nobh_buffers() with head == NULL.
This would result in a trap when attach_nobh_buffers() attempted to
access bh-&gt;b_this_page.

This can be illustrated by running the writev01 testcase from LTP on jfs.

This error was introduced by commit 5b41e74a "vfs: fix data leak in
nobh_write_end()".  That patch did not take into account that if
PageMappedToDisk() is true upon entry to nobh_write_begin(), then no
buffers will be allocated for the page.  In that case, we won't have to
worry about a failed write leaving unitialized data in the page.

Of course, head != NULL implies !page_has_buffers(page), so no need to
test both.

Signed-off-by: Dave Kleikamp &lt;shaggy@linux.vnet.ibm.com&gt;
Cc: Bill Pemberton &lt;wfp5p@virginia.edu&gt;
Cc: Dmitri Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a modification of a patch by Bill Pemberton &lt;wfp5p@virginia.edu&gt;

nobh_write_end() could call attach_nobh_buffers() with head == NULL.
This would result in a trap when attach_nobh_buffers() attempted to
access bh-&gt;b_this_page.

This can be illustrated by running the writev01 testcase from LTP on jfs.

This error was introduced by commit 5b41e74a "vfs: fix data leak in
nobh_write_end()".  That patch did not take into account that if
PageMappedToDisk() is true upon entry to nobh_write_begin(), then no
buffers will be allocated for the page.  In that case, we won't have to
worry about a failed write leaving unitialized data in the page.

Of course, head != NULL implies !page_has_buffers(page), so no need to
test both.

Signed-off-by: Dave Kleikamp &lt;shaggy@linux.vnet.ibm.com&gt;
Cc: Bill Pemberton &lt;wfp5p@virginia.edu&gt;
Cc: Dmitri Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[CVE-2009-0029] System call wrappers part 10</title>
<updated>2009-01-14T13:15:22+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2009-01-14T13:14:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bdc480e3bef6eb0e7071770834cbdda7e30a5436'/>
<id>bdc480e3bef6eb0e7071770834cbdda7e30a5436</id>
<content type='text'>
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filesystem freeze: implement generic freeze feature</title>
<updated>2009-01-10T00:54:42+00:00</updated>
<author>
<name>Takashi Sato</name>
<email>t-sato@yk.jp.nec.com</email>
</author>
<published>2009-01-10T00:40:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fcccf502540e3d752d33b2d8e976034dee81f9f7'/>
<id>fcccf502540e3d752d33b2d8e976034dee81f9f7</id>
<content type='text'>
The ioctls for the generic freeze feature are below.
o Freeze the filesystem
  int ioctl(int fd, int FIFREEZE, arg)
    fd: The file descriptor of the mountpoint
    FIFREEZE: request code for the freeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1

o Unfreeze the filesystem
  int ioctl(int fd, int FITHAW, arg)
    fd: The file descriptor of the mountpoint
    FITHAW: request code for unfreeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1
    Error number: If the filesystem has already been unfrozen,
                  errno is set to EINVAL.

[akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
Signed-off-by: Takashi Sato &lt;t-sato@yk.jp.nec.com&gt;
Signed-off-by: Masayuki Hamaguchi &lt;m-hamaguchi@ys.jp.nec.com&gt;
Cc: &lt;xfs-masters@oss.sgi.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dave Kleikamp &lt;shaggy@austin.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Alasdair G Kergon &lt;agk@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ioctls for the generic freeze feature are below.
o Freeze the filesystem
  int ioctl(int fd, int FIFREEZE, arg)
    fd: The file descriptor of the mountpoint
    FIFREEZE: request code for the freeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1

o Unfreeze the filesystem
  int ioctl(int fd, int FITHAW, arg)
    fd: The file descriptor of the mountpoint
    FITHAW: request code for unfreeze
    arg: Ignored
    Return value: 0 if the operation succeeds. Otherwise, -1
    Error number: If the filesystem has already been unfrozen,
                  errno is set to EINVAL.

[akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
Signed-off-by: Takashi Sato &lt;t-sato@yk.jp.nec.com&gt;
Signed-off-by: Masayuki Hamaguchi &lt;m-hamaguchi@ys.jp.nec.com&gt;
Cc: &lt;xfs-masters@oss.sgi.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dave Kleikamp &lt;shaggy@austin.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Alasdair G Kergon &lt;agk@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filesystem freeze: add error handling of write_super_lockfs/unlockfs</title>
<updated>2009-01-10T00:54:42+00:00</updated>
<author>
<name>Takashi Sato</name>
<email>t-sato@yk.jp.nec.com</email>
</author>
<published>2009-01-10T00:40:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c4be0c1dc4cdc37b175579be1460f15ac6495e9a'/>
<id>c4be0c1dc4cdc37b175579be1460f15ac6495e9a</id>
<content type='text'>
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.

Signed-off-by: Takashi Sato &lt;t-sato@yk.jp.nec.com&gt;
Signed-off-by: Masayuki Hamaguchi &lt;m-hamaguchi@ys.jp.nec.com&gt;
Cc: &lt;xfs-masters@oss.sgi.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dave Kleikamp &lt;shaggy@austin.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Alasdair G Kergon &lt;agk@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.

Signed-off-by: Takashi Sato &lt;t-sato@yk.jp.nec.com&gt;
Signed-off-by: Masayuki Hamaguchi &lt;m-hamaguchi@ys.jp.nec.com&gt;
Cc: &lt;xfs-masters@oss.sgi.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dave Kleikamp &lt;shaggy@austin.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Alasdair G Kergon &lt;agk@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block_write_begin(): remove useless goto</title>
<updated>2009-01-06T23:59:08+00:00</updated>
<author>
<name>Franck Bui-Huu</name>
<email>fbuihuu@gmail.com</email>
</author>
<published>2009-01-06T22:40:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=69e9930993cfd70d82c8d9dd96fc3a88854d06fc'/>
<id>69e9930993cfd70d82c8d9dd96fc3a88854d06fc</id>
<content type='text'>
Signed-off-by: Franck Bui-Huu &lt;fbuihuu@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Franck Bui-Huu &lt;fbuihuu@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: symlink write_begin allocation context fix</title>
<updated>2009-01-04T21:33:20+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-01-04T20:00:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=54566b2c1594c2326a645a3551f9d989f7ba3c5e'/>
<id>54566b2c1594c2326a645a3551f9d989f7ba3c5e</id>
<content type='text'>
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
