<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/filemap.c, branch v3.14.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>fix O_SYNC|O_APPEND syncing the wrong range on write()</title>
<updated>2014-02-09T20:18:09+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2014-02-09T20:18:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d311d79de305f1ada47cadd672e6ed1b28a949eb'/>
<id>d311d79de305f1ada47cadd672e6ed1b28a949eb</id>
<content type='text'>
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
	pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
	pos_before_write .. pos_before_write + written - 1
instead.  Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().

All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of -&gt;aio_write().

The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
	pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
	pos_before_write .. pos_before_write + written - 1
instead.  Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().

All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of -&gt;aio_write().

The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2014-01-28T16:38:04+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-01-28T16:38:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bf3d846b783327359ddc4bd4f52627b36abb4d1d'/>
<id>bf3d846b783327359ddc4bd4f52627b36abb4d1d</id>
<content type='text'>
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and -&gt;set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and -&gt;set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix race when checking i_size on direct i/o read</title>
<updated>2014-01-26T13:26:42+00:00</updated>
<author>
<name>Steven Whitehouse</name>
<email>swhiteho@redhat.com</email>
</author>
<published>2014-01-24T14:42:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9fe55eea7e4b444bafc42fa0000cc2d1d2847275'/>
<id>9fe55eea7e4b444bafc42fa0000cc2d1d2847275</id>
<content type='text'>
So far I've had one ACK for this, and no other comments. So I think it
is probably time to send this via some suitable tree. I'm guessing that
the vfs tree would be the most appropriate route, but not sure that
there is one at the moment (don't see anything recent at kernel.org)
so in that case I think -mm is the "back up plan". Al, please let me
know if you will take this?

Steve.

---------------------

Following on from the "Re: [PATCH v3] vfs: fix a bug when we do some dio
reads with append dio writes" thread on linux-fsdevel, this patch is my
current version of the fix proposed as option (b) in that thread.

Removing the i_size test from the direct i/o read path at vfs level
means that filesystems now have to deal with requests which are beyond
i_size themselves. These I've divided into three sets:

 a) Those with "no op" -&gt;direct_IO (9p, cifs, ceph)
These are obviously not going to be an issue

 b) Those with "home brew" -&gt;direct_IO (nfs, fuse)
I've been told that NFS should not have any problem with the larger
i_size, however I've added an extra test to FUSE to duplicate the
original behaviour just to be on the safe side.

 c) Those using __blockdev_direct_IO()
These call through to -&gt;get_block() which should deal with the EOF
condition correctly. I've verified that with GFS2 and I believe that
Zheng has verified it for ext4. I've also run the test on XFS and it
passes both before and after this change.

The part of the patch in filemap.c looks a lot larger than it really is
- there are only two lines of real change. The rest is just indentation
of the contained code.

There remains a test of i_size though, which was added for btrfs. It
doesn't cause the other filesystems a problem as the test is performed
after -&gt;direct_IO has been called. It is possible that there is a race
that does matter to btrfs, however this patch doesn't change that, so
its still an overall improvement.

Signed-off-by: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Reported-by: Zheng Liu &lt;gnehzuil.liu@gmail.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Acked-by: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Josef Bacik &lt;jbacik@fb.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
So far I've had one ACK for this, and no other comments. So I think it
is probably time to send this via some suitable tree. I'm guessing that
the vfs tree would be the most appropriate route, but not sure that
there is one at the moment (don't see anything recent at kernel.org)
so in that case I think -mm is the "back up plan". Al, please let me
know if you will take this?

Steve.

---------------------

Following on from the "Re: [PATCH v3] vfs: fix a bug when we do some dio
reads with append dio writes" thread on linux-fsdevel, this patch is my
current version of the fix proposed as option (b) in that thread.

Removing the i_size test from the direct i/o read path at vfs level
means that filesystems now have to deal with requests which are beyond
i_size themselves. These I've divided into three sets:

 a) Those with "no op" -&gt;direct_IO (9p, cifs, ceph)
These are obviously not going to be an issue

 b) Those with "home brew" -&gt;direct_IO (nfs, fuse)
I've been told that NFS should not have any problem with the larger
i_size, however I've added an extra test to FUSE to duplicate the
original behaviour just to be on the safe side.

 c) Those using __blockdev_direct_IO()
These call through to -&gt;get_block() which should deal with the EOF
condition correctly. I've verified that with GFS2 and I believe that
Zheng has verified it for ext4. I've also run the test on XFS and it
passes both before and after this change.

The part of the patch in filemap.c looks a lot larger than it really is
- there are only two lines of real change. The rest is just indentation
of the contained code.

There remains a test of i_size though, which was added for btrfs. It
doesn't cause the other filesystems a problem as the test is performed
after -&gt;direct_IO has been called. It is possible that there is a race
that does matter to btrfs, however this patch doesn't change that, so
its still an overall improvement.

Signed-off-by: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Reported-by: Zheng Liu &lt;gnehzuil.liu@gmail.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Acked-by: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Josef Bacik &lt;jbacik@fb.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE</title>
<updated>2014-01-24T00:36:50+00:00</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2014-01-23T23:52:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=309381feaee564281c3d9e90fbca8963bb7428ad'/>
<id>309381feaee564281c3d9e90fbca8963bb7428ad</id>
<content type='text'>
Most of the VM_BUG_ON assertions are performed on a page.  Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.

I've recently noticed based on the requests to add a small piece of code
that dumps the page to various VM_BUG_ON sites that the page dump is
quite useful to people debugging issues in mm.

This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.

[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Most of the VM_BUG_ON assertions are performed on a page.  Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.

I've recently noticed based on the requests to add a small piece of code
that dumps the page to various VM_BUG_ON sites that the page dump is
quite useful to people debugging issues in mm.

This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.

[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: drop actor argument of do_generic_file_read()</title>
<updated>2013-11-15T00:32:13+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-11-14T22:30:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b77d88d493b8fc7a4c2dadd3bb86d1dee2f53a56'/>
<id>b77d88d493b8fc7a4c2dadd3bb86d1dee2f53a56</id>
<content type='text'>
There's only one caller of do_generic_file_read() and the only actor is
file_read_actor().  No reason to have a callback parameter.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's only one caller of do_generic_file_read() and the only actor is
file_read_actor().  No reason to have a callback parameter.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcg: handle non-error OOM situations more gracefully</title>
<updated>2013-10-17T04:35:53+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2013-10-16T20:46:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4942642080ea82d99ab5b653abb9a12b7ba31f4a'/>
<id>4942642080ea82d99ab5b653abb9a12b7ba31f4a</id>
<content type='text'>
Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead.  But there are many more and it's impractical to annotate
them all.

First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well.  This simplifies the code quite a bit for
added bonus.

Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts.  If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.

Reported-by: azurIt &lt;azurit@pobox.sk&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead.  But there are many more and it's impractical to annotate
them all.

First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well.  This simplifies the code quite a bit for
added bonus.

Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts.  If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.

Reported-by: azurIt &lt;azurit@pobox.sk&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: cleanup add_to_page_cache_locked()</title>
<updated>2013-09-12T22:38:03+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-09-12T22:13:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=66a0c8ee3dce78362d59f00a8efbd752fbeddfb1'/>
<id>66a0c8ee3dce78362d59f00a8efbd752fbeddfb1</id>
<content type='text'>
Make add_to_page_cache_locked() cleaner:

 - unindent most code of the function by inverting one condition;
 - streamline code no-error path;
 - move insert error path outside normal code path;
 - call radix_tree_preload_end() earlier;

No functional changes.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make add_to_page_cache_locked() cleaner:

 - unindent most code of the function by inverting one condition;
 - streamline code no-error path;
 - move insert error path outside normal code path;
 - call radix_tree_preload_end() earlier;

No functional changes.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcg: enable memcg OOM killer only for user faults</title>
<updated>2013-09-12T22:38:01+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2013-09-12T22:13:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=519e52473ebe9db5cdef44670d5a97f1fd53d721'/>
<id>519e52473ebe9db5cdef44670d5a97f1fd53d721</id>
<content type='text'>
System calls and kernel faults (uaccess, gup) can handle an out of memory
situation gracefully and just return -ENOMEM.

Enable the memcg OOM killer only for user faults, where it's really the
only option available.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: azurIt &lt;azurit@pobox.sk&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
System calls and kernel faults (uaccess, gup) can handle an out of memory
situation gracefully and just return -ENOMEM.

Enable the memcg OOM killer only for user faults, where it's really the
only option available.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: azurIt &lt;azurit@pobox.sk&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt</title>
<updated>2013-09-11T22:59:36+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2013-09-11T21:26:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5e4c0d974139a98741b829b27cf38dc8f9284490'/>
<id>5e4c0d974139a98741b829b27cf38dc8f9284490</id>
<content type='text'>
With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
one such possible user), the following race can happen:

radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp-&gt;nr) {
      ret = rtp-&gt;nodes[rtp-&gt;nr - 1];
&lt;interrupt&gt;
...
radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp-&gt;nr) {
      ret = rtp-&gt;nodes[rtp-&gt;nr - 1];

And we give out one radix tree node twice.  That clearly results in radix
tree corruption with different results (usually OOPS) depending on which
two users of radix tree race.

We fix the problem by making radix_tree_node_alloc() always allocate fresh
radix tree nodes when in interrupt.  Using preloading when in interrupt
doesn't make sense since all the allocations have to be atomic anyway and
we cannot steal nodes from process-context users because some users rely
on radix_tree_insert() succeeding after radix_tree_preload().
in_interrupt() check is somewhat ugly but we cannot simply key off passed
gfp_mask as that is acquired from root_gfp_mask() and thus the same for
all preload users.

Another part of the fix is to avoid node preallocation in
radix_tree_preload() when passed gfp_mask doesn't allow waiting.  Again,
preallocation in such case doesn't make sense and when preallocation would
happen in interrupt we could possibly leak some allocated nodes.  However,
some users of radix_tree_preload() require following radix_tree_insert()
to succeed.  To avoid unexpected effects for these users,
radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
and we provide a new function radix_tree_maybe_preload() for those users
which get different gfp mask from different call sites and which are
prepared to handle radix_tree_insert() failure.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
one such possible user), the following race can happen:

radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp-&gt;nr) {
      ret = rtp-&gt;nodes[rtp-&gt;nr - 1];
&lt;interrupt&gt;
...
radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp-&gt;nr) {
      ret = rtp-&gt;nodes[rtp-&gt;nr - 1];

And we give out one radix tree node twice.  That clearly results in radix
tree corruption with different results (usually OOPS) depending on which
two users of radix tree race.

We fix the problem by making radix_tree_node_alloc() always allocate fresh
radix tree nodes when in interrupt.  Using preloading when in interrupt
doesn't make sense since all the allocations have to be atomic anyway and
we cannot steal nodes from process-context users because some users rely
on radix_tree_insert() succeeding after radix_tree_preload().
in_interrupt() check is somewhat ugly but we cannot simply key off passed
gfp_mask as that is acquired from root_gfp_mask() and thus the same for
all preload users.

Another part of the fix is to avoid node preallocation in
radix_tree_preload() when passed gfp_mask doesn't allow waiting.  Again,
preallocation in such case doesn't make sense and when preallocation would
happen in interrupt we could possibly leak some allocated nodes.  However,
some users of radix_tree_preload() require following radix_tree_insert()
to succeed.  To avoid unexpected effects for these users,
radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
and we provide a new function radix_tree_maybe_preload() for those users
which get different gfp mask from different call sites and which are
prepared to handle radix_tree_insert() failure.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>direct-io: Handle O_(D)SYNC AIO</title>
<updated>2013-09-04T13:23:46+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2013-09-04T13:04:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=02afc27faec94c9e068517a22acf55400976c698'/>
<id>02afc27faec94c9e068517a22acf55400976c698</id>
<content type='text'>
Call generic_write_sync() from the deferred I/O completion handler if
O_DSYNC is set for a write request.  Also make sure various callers
don't call generic_write_sync if the direct I/O code returns
-EIOCBQUEUED.

Based on an earlier patch from Jan Kara &lt;jack@suse.cz&gt; with updates from
Jeff Moyer &lt;jmoyer@redhat.com&gt; and Darrick J. Wong &lt;darrick.wong@oracle.com&gt;.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Call generic_write_sync() from the deferred I/O completion handler if
O_DSYNC is set for a write request.  Also make sure various callers
don't call generic_write_sync if the direct I/O code returns
-EIOCBQUEUED.

Based on an earlier patch from Jan Kara &lt;jack@suse.cz&gt; with updates from
Jeff Moyer &lt;jmoyer@redhat.com&gt; and Darrick J. Wong &lt;darrick.wong@oracle.com&gt;.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
