<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs, branch linux-3.4.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>ocfs2: fix BUG when calculate new backup super</title>
<updated>2016-10-26T15:15:41+00:00</updated>
<author>
<name>Joseph Qi</name>
<email>joseph.qi@huawei.com</email>
</author>
<published>2015-12-29T22:54:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c173fefcd46726e3a1f0d0623c1f9008eaa00a4b'/>
<id>c173fefcd46726e3a1f0d0623c1f9008eaa00a4b</id>
<content type='text'>
commit 5c9ee4cbf2a945271f25b89b137f2c03bbc3be33 upstream.

When resizing, it firstly extends the last gd.  Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.

But it currently doesn't consider the situation that the backup super is
already done.  And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:

    BUG_ON(le16_to_cpu(bg-&gt;bg_free_bits_count) &lt; num_bits);

So check whether the backup super is done and then do the updates.

Signed-off-by: Joseph Qi &lt;joseph.qi@huawei.com&gt;
Reviewed-by: Jiufei Xue &lt;xuejiufei@huawei.com&gt;
Reviewed-by: Yiwen Jiang &lt;jiangyiwen@huawei.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.de&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5c9ee4cbf2a945271f25b89b137f2c03bbc3be33 upstream.

When resizing, it firstly extends the last gd.  Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.

But it currently doesn't consider the situation that the backup super is
already done.  And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:

    BUG_ON(le16_to_cpu(bg-&gt;bg_free_bits_count) &lt; num_bits);

So check whether the backup super is done and then do the updates.

Signed-off-by: Joseph Qi &lt;joseph.qi@huawei.com&gt;
Reviewed-by: Jiufei Xue &lt;xuejiufei@huawei.com&gt;
Reviewed-by: Yiwen Jiang &lt;jiangyiwen@huawei.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.de&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>9p: -&gt;evict_inode() should kick out -&gt;i_data, not -&gt;i_mapping</title>
<updated>2016-10-26T15:15:35+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-12-08T08:07:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=938188912dc13ff051362774f840721d86a9d7e3'/>
<id>938188912dc13ff051362774f840721d86a9d7e3</id>
<content type='text'>
commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3 upstream.

For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have its own -&gt;i_data empty and -&gt;i_mapping pointing
to the (unique per major/minor) bdevfs inode.  That guarantees
cache coherence between all block device inodes with the same
device number.

Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, -&gt;i_mapping is only
safe to access when the thing is opened.  At the time of
-&gt;evict_inode() the victim is definitely *not* opened.  We are
about to kill the address space embedded into struct inode
(inode-&gt;i_data) and that's what we need to empty of any pages.

9p instance tries to empty inode-&gt;i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.

Fortunately, other instances in the tree are OK; they are
evicting from &amp;inode-&gt;i_data instead, as 9p one should.

Reported-by: "Suzuki K. Poulose" &lt;Suzuki.Poulose@arm.com&gt;
Tested-by: "Suzuki K. Poulose" &lt;Suzuki.Poulose@arm.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3 upstream.

For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have its own -&gt;i_data empty and -&gt;i_mapping pointing
to the (unique per major/minor) bdevfs inode.  That guarantees
cache coherence between all block device inodes with the same
device number.

Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, -&gt;i_mapping is only
safe to access when the thing is opened.  At the time of
-&gt;evict_inode() the victim is definitely *not* opened.  We are
about to kill the address space embedded into struct inode
(inode-&gt;i_data) and that's what we need to empty of any pages.

9p instance tries to empty inode-&gt;i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.

Fortunately, other instances in the tree are OK; they are
evicting from &amp;inode-&gt;i_data instead, as 9p one should.

Reported-by: "Suzuki K. Poulose" &lt;Suzuki.Poulose@arm.com&gt;
Tested-by: "Suzuki K. Poulose" &lt;Suzuki.Poulose@arm.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>jbd2: Fix unreclaimed pages after truncate in data=journal mode</title>
<updated>2016-10-26T15:15:34+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2015-11-24T20:34:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a164637644572e50547569cb05fe2d958c7061ec'/>
<id>a164637644572e50547569cb05fe2d958c7061ec</id>
<content type='text'>
commit bc23f0c8d7ccd8d924c4e70ce311288cb3e61ea8 upstream.

Ted and Namjae have reported that truncated pages don't get timely
reclaimed after being truncated in data=journal mode. The following test
triggers the issue easily:

for (i = 0; i &lt; 1000; i++) {
	pwrite(fd, buf, 1024*1024, 0);
	fsync(fd);
	fsync(fd);
	ftruncate(fd, 0);
}

The reason is that journal_unmap_buffer() finds that truncated buffers
are not journalled (jh-&gt;b_transaction == NULL), they are part of
checkpoint list of a transaction (jh-&gt;b_cp_transaction != NULL) and have
been already written out (!buffer_dirty(bh)). We clean such buffers but
we leave them in the checkpoint list. Since checkpoint transaction holds
a reference to the journal head, these buffers cannot be released until
the checkpoint transaction is cleaned up. And at that point we don't
call release_buffer_page() anymore so pages detached from mapping are
lingering in the system waiting for reclaim to find them and free them.

Fix the problem by removing buffers from transaction checkpoint lists
when journal_unmap_buffer() finds out they don't have to be there
anymore.

Reported-and-tested-by: Namjae Jeon &lt;namjae.jeon@samsung.com&gt;
Fixes: de1b794130b130e77ffa975bb58cb843744f9ae5
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bc23f0c8d7ccd8d924c4e70ce311288cb3e61ea8 upstream.

Ted and Namjae have reported that truncated pages don't get timely
reclaimed after being truncated in data=journal mode. The following test
triggers the issue easily:

for (i = 0; i &lt; 1000; i++) {
	pwrite(fd, buf, 1024*1024, 0);
	fsync(fd);
	fsync(fd);
	ftruncate(fd, 0);
}

The reason is that journal_unmap_buffer() finds that truncated buffers
are not journalled (jh-&gt;b_transaction == NULL), they are part of
checkpoint list of a transaction (jh-&gt;b_cp_transaction != NULL) and have
been already written out (!buffer_dirty(bh)). We clean such buffers but
we leave them in the checkpoint list. Since checkpoint transaction holds
a reference to the journal head, these buffers cannot be released until
the checkpoint transaction is cleaned up. And at that point we don't
call release_buffer_page() anymore so pages detached from mapping are
lingering in the system waiting for reclaim to find them and free them.

Fix the problem by removing buffers from transaction checkpoint lists
when journal_unmap_buffer() finds out they don't have to be there
anymore.

Reported-and-tested-by: Namjae Jeon &lt;namjae.jeon@samsung.com&gt;
Fixes: de1b794130b130e77ffa975bb58cb843744f9ae5
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Fix handling of extended tv_sec</title>
<updated>2016-10-26T15:15:34+00:00</updated>
<author>
<name>David Turner</name>
<email>novalis@novalis.org</email>
</author>
<published>2015-11-24T19:34:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0a36982a2abdaf4b20401bc0a0ba3881c7071f4a'/>
<id>0a36982a2abdaf4b20401bc0a0ba3881c7071f4a</id>
<content type='text'>
commit a4dad1ae24f850410c4e60f22823cba1289b8d52 upstream.

In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.

When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended.  This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).

Some older kernels might have written pre-1970 dates with 1,1 in the
extra bits.  This patch treats those incorrectly-encoded dates as
pre-1970, instead of post-2311, until kernel 4.20 is released.
Hopefully by then e2fsck will have fixed up the bad data.

Also add a comment explaining the encoding of ext4's extra {a,c,m}time
bits.

Signed-off-by: David Turner &lt;novalis@novalis.org&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reported-by: Mark Harris &lt;mh8928@yahoo.com&gt;
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a4dad1ae24f850410c4e60f22823cba1289b8d52 upstream.

In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.

When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended.  This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).

Some older kernels might have written pre-1970 dates with 1,1 in the
extra bits.  This patch treats those incorrectly-encoded dates as
pre-1970, instead of post-2311, until kernel 4.20 is released.
Hopefully by then e2fsck will have fixed up the bad data.

Also add a comment explaining the encoding of ext4's extra {a,c,m}time
bits.

Signed-off-by: David Turner &lt;novalis@novalis.org&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reported-by: Mark Harris &lt;mh8928@yahoo.com&gt;
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: break infinite loop in fuse_fill_write_pages()</title>
<updated>2016-10-26T15:15:34+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>klamm@yandex-team.ru</email>
</author>
<published>2015-10-12T13:33:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0dfa2ec52321e4d6fb0ef02a58e86b3fb734a284'/>
<id>0dfa2ec52321e4d6fb0ef02a58e86b3fb734a284</id>
<content type='text'>
commit 3ca8138f014a913f98e6ef40e939868e1e9ea876 upstream.

I got a report about unkillable task eating CPU. Further
investigation shows, that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach iov_iter_advance() call.

Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.

A similar problem is described in 124d3b7041f ("fix writev regression:
pan hanging unkillable and un-straceable"). If zero-length segmend
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -&gt; goto again without skipping zero-length segment.

Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteraction and iov_iter_fault_in_readable() will detect
invalid address.

Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Maxim Patlasov &lt;mpatlasov@parallels.com&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Roman Gushchin &lt;klamm@yandex-team.ru&gt;
Signed-off-by: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3ca8138f014a913f98e6ef40e939868e1e9ea876 upstream.

I got a report about unkillable task eating CPU. Further
investigation shows, that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach iov_iter_advance() call.

Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.

A similar problem is described in 124d3b7041f ("fix writev regression:
pan hanging unkillable and un-straceable"). If zero-length segmend
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -&gt; goto again without skipping zero-length segment.

Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteraction and iov_iter_fault_in_readable() will detect
invalid address.

Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Maxim Patlasov &lt;mpatlasov@parallels.com&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Roman Gushchin &lt;klamm@yandex-team.ru&gt;
Signed-off-by: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fix sysvfs symlinks</title>
<updated>2016-10-26T15:15:33+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-11-24T02:11:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b411ae9afe4ddc0d7f31c023b09d50cd9559ffbd'/>
<id>b411ae9afe4ddc0d7f31c023b09d50cd9559ffbd</id>
<content type='text'>
commit 0ebf7f10d67a70e120f365018f1c5fce9ddc567d upstream.

The thing got broken back in 2002 - sysvfs does *not* have inline
symlinks; even short ones have bodies stored in the first block
of file.  sysv_symlink() handles that correctly; unfortunately,
attempting to look an existing symlink up will end up confusing
them for inline symlinks, and interpret the block number containing
the body as the body itself.

Nobody has noticed until now, which says something about the level
of testing sysvfs gets ;-/

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0ebf7f10d67a70e120f365018f1c5fce9ddc567d upstream.

The thing got broken back in 2002 - sysvfs does *not* have inline
symlinks; even short ones have bodies stored in the first block
of file.  sysv_symlink() handles that correctly; unfortunately,
attempting to look an existing symlink up will end up confusing
them for inline symlinks, and interpret the block number containing
the body as the body itself.

Nobody has noticed until now, which says something about the level
of testing sysvfs gets ;-/

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nfs: if we have no valid attrs, then don't declare the attribute cache valid</title>
<updated>2016-10-26T15:15:33+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2015-11-25T18:50:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5cd25ba84bd375bb62ff8a8de68d3788591ecb6b'/>
<id>5cd25ba84bd375bb62ff8a8de68d3788591ecb6b</id>
<content type='text'>
commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream.

If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.

Reviewed-by: Steve French &lt;steve.french@primarydata.com&gt;
Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream.

If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.

Reviewed-by: Steve French &lt;steve.french@primarydata.com&gt;
Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: Avoid softlockups with sendfile(2)</title>
<updated>2016-10-26T15:15:32+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2015-11-23T12:09:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a8d3a5b24da10c5d6b7c2bc2ee26def66f9f46ec'/>
<id>a8d3a5b24da10c5d6b7c2bc2ee26def66f9f46ec</id>
<content type='text'>
commit c2489e07c0a71a56fb2c84bc0ee66cddfca7d068 upstream.

The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1&lt;&lt;30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

Add cond_resched() into __splice_from_pipe() to fix the problem.

CC: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c2489e07c0a71a56fb2c84bc0ee66cddfca7d068 upstream.

The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1&lt;&lt;30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

Add cond_resched() into __splice_from_pipe() to fix the problem.

CC: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: Make sendfile(2) killable even better</title>
<updated>2016-10-26T15:15:32+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2015-11-23T12:09:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9c6caab0351a09777714505db6a355a4479932b8'/>
<id>9c6caab0351a09777714505db6a355a4479932b8</id>
<content type='text'>
commit c725bfce7968009756ed2836a8cd7ba4dc163011 upstream.

Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1&lt;&lt;30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

There are actually quite a few tests for pending signals in sendfile
code however we data to write is always available none of them seems to
trigger. So fix the problem by adding a test for pending signal into
splice_from_pipe_next() also before the loop waiting for pipe buffers to
be available. This should fix all the lockup issues with sendfile of the
do-ton-of-tiny-writes nature.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c725bfce7968009756ed2836a8cd7ba4dc163011 upstream.

Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1&lt;&lt;30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

There are actually quite a few tests for pending signals in sendfile
code however we data to write is always available none of them seems to
trigger. So fix the problem by adding a test for pending signal into
splice_from_pipe_next() also before the loop waiting for pipe buffers to
be available. This should fix all the lockup issues with sendfile of the
do-ton-of-tiny-writes nature.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>binfmt_elf: Don't clobber passed executable's file header</title>
<updated>2016-10-26T15:15:28+00:00</updated>
<author>
<name>Maciej W. Rozycki</name>
<email>macro@imgtec.com</email>
</author>
<published>2015-10-26T15:48:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bf4d43e3d2eb1a772e6ac915ffbb88e50f006cbf'/>
<id>bf4d43e3d2eb1a772e6ac915ffbb88e50f006cbf</id>
<content type='text'>
commit b582ef5c53040c5feef4c96a8f9585b6831e2441 upstream.

Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header.  Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.

This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.

With loadable module support supported and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'.  With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.

Signed-off-by: Maciej W. Rozycki &lt;macro@imgtec.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b582ef5c53040c5feef4c96a8f9585b6831e2441 upstream.

Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header.  Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.

This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.

With loadable module support supported and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'.  With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.

Signed-off-by: Maciej W. Rozycki &lt;macro@imgtec.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
