<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/Documentation/filesystems/Locking, branch v2.6.30</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: close page_mkwrite races</title>
<updated>2009-05-02T22:36:09+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-04-30T22:08:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b827e496c893de0c0f142abfaeb8730a2fd6b37f'/>
<id>b827e496c893de0c0f142abfaeb8730a2fd6b37f</id>
<content type='text'>
Change page_mkwrite to allow implementations to return with the page
locked, and also change it's callers (in page fault paths) to hold the
lock until the page is marked dirty.  This allows the filesystem to have
full control of page dirtying events coming from the VM.

Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.

The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its -&gt;page_mkwrite.  It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).

In this window, the VM could write out the page, clearing page-dirty.  The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.

It is not always possible to perform the required metadata manipulation in
-&gt;set_page_dirty, because that function cannot block or fail.  The
filesystem may need to allocate some data structure, for example.

And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.

This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window.  This provides the filesystem with a strong
synchronisation against the VM here.

- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
  under dirty pages might be susceptible to i_size changing under partial page
  at the end of file (we also have a buffer.c-side problem here, but it cannot
  be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
  page_mkwrite functions themselves.

[ This also moves page_mkwrite another step closer to fault, which should
  eventually allow page_mkwrite to be moved into -&gt;fault, and thus avoiding a
  filesystem calldown and page lock/unlock cycle in __do_fault. ]

[akpm@linux-foundation.org: fix derefs of NULL -&gt;mapping]
Cc: Sage Weil &lt;sage@newdream.net&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Valdis Kletnieks &lt;Valdis.Kletnieks@vt.edu&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change page_mkwrite to allow implementations to return with the page
locked, and also change it's callers (in page fault paths) to hold the
lock until the page is marked dirty.  This allows the filesystem to have
full control of page dirtying events coming from the VM.

Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.

The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its -&gt;page_mkwrite.  It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).

In this window, the VM could write out the page, clearing page-dirty.  The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.

It is not always possible to perform the required metadata manipulation in
-&gt;set_page_dirty, because that function cannot block or fail.  The
filesystem may need to allocate some data structure, for example.

And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.

This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window.  This provides the filesystem with a strong
synchronisation against the VM here.

- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
  under dirty pages might be susceptible to i_size changing under partial page
  at the end of file (we also have a buffer.c-side problem here, but it cannot
  be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
  page_mkwrite functions themselves.

[ This also moves page_mkwrite another step closer to fault, which should
  eventually allow page_mkwrite to be moved into -&gt;fault, and thus avoiding a
  filesystem calldown and page lock/unlock cycle in __do_fault. ]

[akpm@linux-foundation.org: fix derefs of NULL -&gt;mapping]
Cc: Sage Weil &lt;sage@newdream.net&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Valdis Kletnieks &lt;Valdis.Kletnieks@vt.edu&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: page_mkwrite change prototype to match fault</title>
<updated>2009-04-01T15:59:14+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-03-31T22:23:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c2ec175c39f62949438354f603f4aa170846aabb'/>
<id>c2ec175c39f62949438354f603f4aa170846aabb</id>
<content type='text'>
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg.  virtual_address to the
driver, which might be important in some special cases).

This is required for a subsequent fix.  And will also make it easier to
merge page_mkwrite() with fault() in future.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;joel.becker@oracle.com&gt;
Cc: Artem Bityutskiy &lt;dedekind@infradead.org&gt;
Cc: Felix Blyakher &lt;felixb@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg.  virtual_address to the
driver, which might be important in some special cases).

This is required for a subsequent fix.  And will also make it easier to
merge page_mkwrite() with fault() in future.

Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Cc: Joel Becker &lt;joel.becker@oracle.com&gt;
Cc: Artem Bityutskiy &lt;dedekind@infradead.org&gt;
Cc: Felix Blyakher &lt;felixb@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Move FASYNC bit handling to f_op-&gt;fasync()</title>
<updated>2009-03-16T14:32:27+00:00</updated>
<author>
<name>Jonathan Corbet</name>
<email>corbet@lwn.net</email>
</author>
<published>2009-02-01T21:26:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=76398425bb06b07cc3a3b1ce169c67dc9d6874ed'/>
<id>76398425bb06b07cc3a3b1ce169c67dc9d6874ed</id>
<content type='text'>
Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp-&gt;f_flags atomic with regard to calls to
the underlying fasync() function.  Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that.  As a result, we
have a couple of internal API changes: fasync() must now manage the FASYNC
bit, and it will be called without the BKL held.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper().  So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct.  The BKL had
already been pushed down into the rest.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp-&gt;f_flags atomic with regard to calls to
the underlying fasync() function.  Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that.  As a result, we
have a couple of internal API changes: fasync() must now manage the FASYNC
bit, and it will be called without the BKL held.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper().  So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct.  The BKL had
already been pushed down into the rest.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filesystem freeze: add error handling of write_super_lockfs/unlockfs</title>
<updated>2009-01-10T00:54:42+00:00</updated>
<author>
<name>Takashi Sato</name>
<email>t-sato@yk.jp.nec.com</email>
</author>
<published>2009-01-10T00:40:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c4be0c1dc4cdc37b175579be1460f15ac6495e9a'/>
<id>c4be0c1dc4cdc37b175579be1460f15ac6495e9a</id>
<content type='text'>
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.

Signed-off-by: Takashi Sato &lt;t-sato@yk.jp.nec.com&gt;
Signed-off-by: Masayuki Hamaguchi &lt;m-hamaguchi@ys.jp.nec.com&gt;
Cc: &lt;xfs-masters@oss.sgi.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dave Kleikamp &lt;shaggy@austin.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Alasdair G Kergon &lt;agk@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.

Signed-off-by: Takashi Sato &lt;t-sato@yk.jp.nec.com&gt;
Signed-off-by: Masayuki Hamaguchi &lt;m-hamaguchi@ys.jp.nec.com&gt;
Cc: &lt;xfs-masters@oss.sgi.com&gt;
Cc: &lt;linux-ext4@vger.kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dave Kleikamp &lt;shaggy@austin.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Alasdair G Kergon &lt;agk@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>poll: allow f_op-&gt;poll to sleep</title>
<updated>2009-01-06T23:59:12+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>htejun@gmail.com</email>
</author>
<published>2009-01-06T22:40:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f820f648c92a5ecc771a96b3c29aa6e90013bba'/>
<id>5f820f648c92a5ecc771a96b3c29aa6e90013bba</id>
<content type='text'>
f_op-&gt;poll is the only vfs operation which is not allowed to sleep.  It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions.  The non-sleep restriction
can be a bit tricky because -&gt;poll is not called from an atomic context
and the result of accidentally sleeping in -&gt;poll only shows up as
temporary busy looping when the timing is right or rather wrong.

This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events.  The
only added overhead is an extra function call during wake up and
negligible.

This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.

While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.

* s/type * symbol/type *symbol/		   : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.c

Oleg: spotted missing barrier in poll_schedule_timeout()
Davide: spotted missing write barrier in pollwake()

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
Cc: Ron Minnich &lt;rminnich@sandia.gov&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: Brad Boyer &lt;flar@allandria.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
f_op-&gt;poll is the only vfs operation which is not allowed to sleep.  It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions.  The non-sleep restriction
can be a bit tricky because -&gt;poll is not called from an atomic context
and the result of accidentally sleeping in -&gt;poll only shows up as
temporary busy looping when the timing is right or rather wrong.

This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events.  The
only added overhead is an extra function call during wake up and
negligible.

This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.

While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.

* s/type * symbol/type *symbol/		   : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.c

Oleg: spotted missing barrier in poll_schedule_timeout()
Davide: spotted missing write barrier in pollwake()

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
Cc: Ron Minnich &lt;rminnich@sandia.gov&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: Brad Boyer &lt;flar@allandria.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kill -&gt;dir_notify()</title>
<updated>2008-12-31T23:07:43+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2008-12-26T05:57:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6badd79bd002788aaec27b50a74ab69ef65ab8ee'/>
<id>6badd79bd002788aaec27b50a74ab69ef65ab8ee</id>
<content type='text'>
Remove the hopelessly misguided -&gt;dir_notify().  The only instance (cifs)
has been broken by design from the very beginning; the objects it creates
are never destroyed, keep references to struct file they can outlive, nothing
that could possibly evict them exists on close(2) path *and* no locking
whatsoever is done to prevent races with close(), should the previous, er,
deficiencies someday be dealt with.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the hopelessly misguided -&gt;dir_notify().  The only instance (cifs)
has been broken by design from the very beginning; the objects it creates
are never destroyed, keep references to struct file they can outlive, nothing
that could possibly evict them exists on close(2) path *and* no locking
whatsoever is done to prevent races with close(), should the previous, er,
deficiencies someday be dealt with.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: remove prepare_write/commit_write</title>
<updated>2008-10-30T18:38:45+00:00</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2008-10-29T21:00:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e02ed4b4a2fae34aae766a5bb93ae235f60adb8'/>
<id>4e02ed4b4a2fae34aae766a5bb93ae235f60adb8</id>
<content type='text'>
Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>update Documentation/filesystems/Locking for 2.6.27 changes</title>
<updated>2008-09-09T18:51:15+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2008-09-09T18:02:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=adaae7215e5130e5ce1ac3ee390e5a23101b09b2'/>
<id>adaae7215e5130e5ce1ac3ee390e5a23101b09b2</id>
<content type='text'>
In the 2.6.27 circle -&gt;fasync lost the BKL, and the last remaining
-&gt;open variant that takes the BKL is also gone.  -&gt;get_sb and -&gt;kill_sb
didn't have BKL forever, so updated the entries while we're at that.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the 2.6.27 circle -&gt;fasync lost the BKL, and the last remaining
-&gt;open variant that takes the BKL is also gone.  -&gt;get_sb and -&gt;kill_sb
didn't have BKL forever, so updated the entries while we're at that.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>access_process_vm device memory infrastructure</title>
<updated>2008-07-24T17:47:15+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2008-07-24T04:27:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28b2ee20c7cba812b6f2ccf6d722cf86d00a84dc'/>
<id>28b2ee20c7cba812b6f2ccf6d722cf86d00a84dc</id>
<content type='text'>
In order to be able to debug things like the X server and programs using
the PPC Cell SPUs, the debugger needs to be able to access device memory
through ptrace and /proc/pid/mem.

This patch:

Add the generic_access_phys access function and put the hooks in place
to allow access_process_vm to access device or PPC Cell SPU memory.

[riel@redhat.com: Add documentation for the vm_ops-&gt;access function]
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Benjamin Herrensmidt &lt;benh@kernel.crashing.org&gt;
Cc: Dave Airlie &lt;airlied@linux.ie&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order to be able to debug things like the X server and programs using
the PPC Cell SPUs, the debugger needs to be able to access device memory
through ptrace and /proc/pid/mem.

This patch:

Add the generic_access_phys access function and put the hooks in place
to allow access_process_vm to access device or PPC Cell SPU memory.

[riel@redhat.com: Add documentation for the vm_ops-&gt;access function]
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Benjamin Herrensmidt &lt;benh@kernel.crashing.org&gt;
Cc: Dave Airlie &lt;airlied@linux.ie&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] kill -&gt;put_inode</title>
<updated>2008-05-06T17:45:34+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2008-04-29T15:46:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=33dcdac2df54e66c447ae03f58c95c7251aa5649'/>
<id>33dcdac2df54e66c447ae03f58c95c7251aa5649</id>
<content type='text'>
And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.

(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)

Also remove a very misleading comment above the defintion of
struct super_operations.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.

(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)

Also remove a very misleading comment above the defintion of
struct super_operations.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
