<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/compat.c, branch v2.6.30</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>do_execve() must not clear fs-&gt;in_exec if it was set by another thread</title>
<updated>2009-04-24T14:39:45+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2009-04-23T23:01:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8c652f96d3852b97a49c331cd0bb02d22f3cb31b'/>
<id>8c652f96d3852b97a49c331cd0bb02d22f3cb31b</id>
<content type='text'>
If do_execve() fails after check_unsafe_exec(), it clears fs-&gt;in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:

	Two threads T1 and T2 and another process P, all share the same
	-&gt;fs.

	T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
	-&gt;fs is shared, we set LSM_UNSAFE but not -&gt;in_exec.

	P exits and decrements fs-&gt;users.

	T2 starts do_execve(), calls check_unsafe_exec(), now -&gt;fs is not
	shared, we set fs-&gt;in_exec.

	T1 continues, open_exec(BAD_FILE) fails, we clear -&gt;in_exec and
	return to the user-space.

	T1 does clone(CLONE_FS /* without CLONE_THREAD */).

	T2 continues without LSM_UNSAFE_SHARE while -&gt;fs is shared with
	another process.

Change check_unsafe_exec() to return res = 1 if we set -&gt;in_exec, and change
do_execve() to clear -&gt;in_exec depending on res.

When do_execve() suceeds, it is safe to clear -&gt;in_exec unconditionally.
It can be set only if we don't share -&gt;fs with another process, and since
we already killed all sub-threads either -&gt;in_exec == 0 or we are the
only user of this -&gt;fs.

Also, we do not need fs-&gt;lock to clear fs-&gt;in_exec.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If do_execve() fails after check_unsafe_exec(), it clears fs-&gt;in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:

	Two threads T1 and T2 and another process P, all share the same
	-&gt;fs.

	T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
	-&gt;fs is shared, we set LSM_UNSAFE but not -&gt;in_exec.

	P exits and decrements fs-&gt;users.

	T2 starts do_execve(), calls check_unsafe_exec(), now -&gt;fs is not
	shared, we set fs-&gt;in_exec.

	T1 continues, open_exec(BAD_FILE) fails, we clear -&gt;in_exec and
	return to the user-space.

	T1 does clone(CLONE_FS /* without CLONE_THREAD */).

	T2 continues without LSM_UNSAFE_SHARE while -&gt;fs is shared with
	another process.

Change check_unsafe_exec() to return res = 1 if we set -&gt;in_exec, and change
do_execve() to clear -&gt;in_exec depending on res.

When do_execve() suceeds, it is safe to clear -&gt;in_exec unconditionally.
It can be set only if we don't share -&gt;fs with another process, and since
we already killed all sub-threads either -&gt;in_exec == 0 or we are the
only user of this -&gt;fs.

Also, we do not need fs-&gt;lock to clear fs-&gt;in_exec.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kill vfs_stat_fd / vfs_lstat_fd</title>
<updated>2009-04-21T03:02:52+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2009-04-08T20:34:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2eae7a1874ca5be3232765d89e0250a449f1bc90'/>
<id>2eae7a1874ca5be3232765d89e0250a449f1bc90</id>
<content type='text'>
There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat.  Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat.  Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Separate out common fstatat code into vfs_fstatat</title>
<updated>2009-04-21T03:02:51+00:00</updated>
<author>
<name>Oleg Drokin</name>
<email>green@linuxhacker.ru</email>
</author>
<published>2009-04-08T16:05:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0112fc2229847feb6c4eb011e6833d8f1742a375'/>
<id>0112fc2229847feb6c4eb011e6833d8f1742a375</id>
<content type='text'>
This is a version incorporating Christoph's suggestion.

Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.

Signed-off-by: Oleg Drokin &lt;green@linuxhacker.ru&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a version incorporating Christoph's suggestion.

Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.

Signed-off-by: Oleg Drokin &lt;green@linuxhacker.ru&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Make non-compat preadv/pwritev use native register size</title>
<updated>2009-04-04T21:20:34+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-03T15:03:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=601cc11d054ae4b5e9b5babec3d8e4667a2cb9b5'/>
<id>601cc11d054ae4b5e9b5babec3d8e4667a2cb9b5</id>
<content type='text'>
Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.

This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).

This also changes the order of 'high' and 'low' to be "low first".  Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.

This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code.  On x86-64, we now
generate

        testq   %rcx, %rcx      # pos_l
        js      .L122   #,
        movq    %rcx, -48(%rbp) # pos_l, pos

from the C source

        loff_t pos = pos_from_hilo(pos_h, pos_l);
	...
        if (pos &lt; 0)
                return -EINVAL;

and the 'pos_h' register isn't even touched.  It used to generate code
like

        mov     %r8d, %r8d      # pos_low, pos_low
        salq    $32, %rcx       #, tmp71
        movq    %r8, %rax       # pos_low, pos.386
        orq     %rcx, %rax      # tmp71, pos.386
        js      .L122   #,
        movq    %rax, -48(%rbp) # pos.386, pos

which isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)

Note: in all cases the user code wrapper can again be the same. You can
just do

	#define HALF_BITS (sizeof(unsigned long)*4)
	__syscall(PWRITEV, fd, iov, count, offset, (offset &gt;&gt; HALF_BITS) &gt;&gt; HALF_BITS);

or something like that.  That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.

And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit lloff_t, this particular system call might
be left alone.  Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.

[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]

Acked-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.

This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).

This also changes the order of 'high' and 'low' to be "low first".  Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.

This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code.  On x86-64, we now
generate

        testq   %rcx, %rcx      # pos_l
        js      .L122   #,
        movq    %rcx, -48(%rbp) # pos_l, pos

from the C source

        loff_t pos = pos_from_hilo(pos_h, pos_l);
	...
        if (pos &lt; 0)
                return -EINVAL;

and the 'pos_h' register isn't even touched.  It used to generate code
like

        mov     %r8d, %r8d      # pos_low, pos_low
        salq    $32, %rcx       #, tmp71
        movq    %r8, %rax       # pos_low, pos.386
        orq     %rcx, %rax      # tmp71, pos.386
        js      .L122   #,
        movq    %rax, -48(%rbp) # pos.386, pos

which isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)

Note: in all cases the user code wrapper can again be the same. You can
just do

	#define HALF_BITS (sizeof(unsigned long)*4)
	__syscall(PWRITEV, fd, iov, count, offset, (offset &gt;&gt; HALF_BITS) &gt;&gt; HALF_BITS);

or something like that.  That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.

And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit lloff_t, this particular system call might
be left alone.  Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.

[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]

Acked-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6</title>
<updated>2009-04-03T04:09:10+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-03T04:09:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8fe74cf053de7ad2124a894996f84fa890a81093'/>
<id>8fe74cf053de7ad2124a894996f84fa890a81093</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Remove two unneeded exports and make two symbols static in fs/mpage.c
  Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
  Trim includes of fdtable.h
  Don't crap into descriptor table in binfmt_som
  Trim includes in binfmt_elf
  Don't mess with descriptor table in load_elf_binary()
  Get rid of indirect include of fs_struct.h
  New helper - current_umask()
  check_unsafe_exec() doesn't care about signal handlers sharing
  New locking/refcounting for fs_struct
  Take fs_struct handling to new file (fs/fs_struct.c)
  Get rid of bumping fs_struct refcount in pivot_root(2)
  Kill unsharing fs_struct in __set_personality()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Remove two unneeded exports and make two symbols static in fs/mpage.c
  Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
  Trim includes of fdtable.h
  Don't crap into descriptor table in binfmt_som
  Trim includes in binfmt_elf
  Don't mess with descriptor table in load_elf_binary()
  Get rid of indirect include of fs_struct.h
  New helper - current_umask()
  check_unsafe_exec() doesn't care about signal handlers sharing
  New locking/refcounting for fs_struct
  Take fs_struct handling to new file (fs/fs_struct.c)
  Get rid of bumping fs_struct refcount in pivot_root(2)
  Kill unsharing fs_struct in __set_personality()
</pre>
</div>
</content>
</entry>
<entry>
<title>preadv/pwritev: switch compat readv/preadv/writev/pwritev from fget to fget_light</title>
<updated>2009-04-03T02:05:08+00:00</updated>
<author>
<name>Gerd Hoffmann</name>
<email>kraxel@redhat.com</email>
</author>
<published>2009-04-02T23:59:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=10c7db279218eda4b19d29ee17db8a815b18d564'/>
<id>10c7db279218eda4b19d29ee17db8a815b18d564</id>
<content type='text'>
Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>preadv/pwritev: Add preadv and pwritev system calls.</title>
<updated>2009-04-03T02:05:08+00:00</updated>
<author>
<name>Gerd Hoffmann</name>
<email>kraxel@redhat.com</email>
</author>
<published>2009-04-02T23:59:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f3554f4bc69803ac2baaf7cf2aa4339e1f4b693e'/>
<id>f3554f4bc69803ac2baaf7cf2aa4339e1f4b693e</id>
<content type='text'>
This patch adds preadv and pwritev system calls.  These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible to the existing implementations in the *BSD family:

  ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
  ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: On 32bit archs is the (64bit)
offset argument unaligned, which the syscall ABI of several archs doesn't
allow to do.  At least s390 needs a wrapper in glibc to handle this.  As
we'll need a wrappers in glibc anyway I've decided to push problem to
glibc entriely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: The offset argument is
explicitly splitted into two 32bit values.

The patch sports the actual system call implementation and the windup in
the x86 system call tables.  Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds preadv and pwritev system calls.  These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible to the existing implementations in the *BSD family:

  ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
  ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: On 32bit archs is the (64bit)
offset argument unaligned, which the syscall ABI of several archs doesn't
allow to do.  At least s390 needs a wrapper in glibc to handle this.  As
we'll need a wrappers in glibc anyway I've decided to push problem to
glibc entriely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: The offset argument is
explicitly splitted into two 32bit values.

The patch sports the actual system call implementation and the windup in
the x86 system call tables.  Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>preadv/pwritev: create compat_writev()</title>
<updated>2009-04-03T02:05:07+00:00</updated>
<author>
<name>Gerd Hoffmann</name>
<email>kraxel@redhat.com</email>
</author>
<published>2009-04-02T23:59:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6949a6318e60aeb9c755679ac7f978aefe8c1722'/>
<id>6949a6318e60aeb9c755679ac7f978aefe8c1722</id>
<content type='text'>
Factor out some code from compat_sys_writev() which can be shared with the
upcoming compat_sys_pwritev().

Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor out some code from compat_sys_writev() which can be shared with the
upcoming compat_sys_pwritev().

Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>preadv/pwritev: create compat_readv()</title>
<updated>2009-04-03T02:05:07+00:00</updated>
<author>
<name>Gerd Hoffmann</name>
<email>kraxel@redhat.com</email>
</author>
<published>2009-04-02T23:59:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dac1213842f5caf081804a11d87d2a39a21d6f55'/>
<id>dac1213842f5caf081804a11d87d2a39a21d6f55</id>
<content type='text'>
This patch series:

Implement the preadv() and pwritev() syscalls.  *BSD has this syscall for
quite some time.

Test code:

#if 0
set -x
gcc -Wall -O2 -o preadv $0
exit 0
#endif
/*
 * preadv demo / test
 *
 * (c) 2008 Gerd Hoffmann &lt;kraxel@redhat.com&gt;
 *
 * build with "sh $thisfile"
 */

#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;unistd.h&gt;
#include &lt;errno.h&gt;
#include &lt;inttypes.h&gt;
#include &lt;sys/uio.h&gt;

/* ----------------------------------------------------------------- */
/* syscall windup                                                    */

#include &lt;sys/syscall.h&gt;
#if 0
/* WARNING: Be sure you know what you are doing if you enable this.
 * linux syscall code isn't upstream yet, syscall numbers are subject
 * to change */
# ifndef __NR_preadv
#  ifdef __i386__
#   define __NR_preadv  333
#   define __NR_pwritev 334
#  endif
#  ifdef __x86_64__
#   define __NR_preadv  295
#   define __NR_pwritev 296
#  endif
# endif
#endif
#ifndef __NR_preadv
# error preadv/pwritev syscall numbers are unknown
#endif

static ssize_t preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    uint32_t pos_high = (offset &gt;&gt; 32) &amp; 0xffffffff;
    uint32_t pos_low  =  offset        &amp; 0xffffffff;

    return syscall(__NR_preadv, fd, iov, iovcnt, pos_high, pos_low);
}

static ssize_t pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    uint32_t pos_high = (offset &gt;&gt; 32) &amp; 0xffffffff;
    uint32_t pos_low  =  offset        &amp; 0xffffffff;

    return syscall(__NR_pwritev, fd, iov, iovcnt, pos_high, pos_low);
}

/* ----------------------------------------------------------------- */
/* demo/test app                                                     */

static char filename[] = "/tmp/preadv-XXXXXX";
static char outbuf[11] = "0123456789";
static char inbuf[11]  = "----------";

static struct iovec ovec[2] = {{
        .iov_base = outbuf + 5,
        .iov_len  = 5,
    },{
        .iov_base = outbuf + 0,
        .iov_len  = 5,
    }};

static struct iovec ivec[3] = {{
        .iov_base = inbuf + 6,
        .iov_len  = 2,
    },{
        .iov_base = inbuf + 4,
        .iov_len  = 2,
    },{
        .iov_base = inbuf + 2,
        .iov_len  = 2,
    }};

void cleanup(void)
{
    unlink(filename);
}

int main(int argc, char **argv)
{
    int fd, rc;

    fd = mkstemp(filename);
    if (-1 == fd) {
        perror("mkstemp");
        exit(1);
    }
    atexit(cleanup);

    /* write to file: "56789-01234" */
    rc = pwritev(fd, ovec, 2, 0);
    if (rc &lt; 0) {
        perror("pwritev");
        exit(1);
    }

    /* read from file: "78-90-12" */
    rc = preadv(fd, ivec, 3, 2);
    if (rc &lt; 0) {
        perror("preadv");
        exit(1);
    }

    printf("result  : %s\n", inbuf);
    printf("expected: %s\n", "--129078--");
    exit(0);
}

This patch:

Factor out some code from compat_sys_readv() which can be shared with the
upcoming compat_sys_preadv().

Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch series:

Implement the preadv() and pwritev() syscalls.  *BSD has this syscall for
quite some time.

Test code:

#if 0
set -x
gcc -Wall -O2 -o preadv $0
exit 0
#endif
/*
 * preadv demo / test
 *
 * (c) 2008 Gerd Hoffmann &lt;kraxel@redhat.com&gt;
 *
 * build with "sh $thisfile"
 */

#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;unistd.h&gt;
#include &lt;errno.h&gt;
#include &lt;inttypes.h&gt;
#include &lt;sys/uio.h&gt;

/* ----------------------------------------------------------------- */
/* syscall windup                                                    */

#include &lt;sys/syscall.h&gt;
#if 0
/* WARNING: Be sure you know what you are doing if you enable this.
 * linux syscall code isn't upstream yet, syscall numbers are subject
 * to change */
# ifndef __NR_preadv
#  ifdef __i386__
#   define __NR_preadv  333
#   define __NR_pwritev 334
#  endif
#  ifdef __x86_64__
#   define __NR_preadv  295
#   define __NR_pwritev 296
#  endif
# endif
#endif
#ifndef __NR_preadv
# error preadv/pwritev syscall numbers are unknown
#endif

static ssize_t preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    uint32_t pos_high = (offset &gt;&gt; 32) &amp; 0xffffffff;
    uint32_t pos_low  =  offset        &amp; 0xffffffff;

    return syscall(__NR_preadv, fd, iov, iovcnt, pos_high, pos_low);
}

static ssize_t pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    uint32_t pos_high = (offset &gt;&gt; 32) &amp; 0xffffffff;
    uint32_t pos_low  =  offset        &amp; 0xffffffff;

    return syscall(__NR_pwritev, fd, iov, iovcnt, pos_high, pos_low);
}

/* ----------------------------------------------------------------- */
/* demo/test app                                                     */

static char filename[] = "/tmp/preadv-XXXXXX";
static char outbuf[11] = "0123456789";
static char inbuf[11]  = "----------";

static struct iovec ovec[2] = {{
        .iov_base = outbuf + 5,
        .iov_len  = 5,
    },{
        .iov_base = outbuf + 0,
        .iov_len  = 5,
    }};

static struct iovec ivec[3] = {{
        .iov_base = inbuf + 6,
        .iov_len  = 2,
    },{
        .iov_base = inbuf + 4,
        .iov_len  = 2,
    },{
        .iov_base = inbuf + 2,
        .iov_len  = 2,
    }};

void cleanup(void)
{
    unlink(filename);
}

int main(int argc, char **argv)
{
    int fd, rc;

    fd = mkstemp(filename);
    if (-1 == fd) {
        perror("mkstemp");
        exit(1);
    }
    atexit(cleanup);

    /* write to file: "56789-01234" */
    rc = pwritev(fd, ovec, 2, 0);
    if (rc &lt; 0) {
        perror("pwritev");
        exit(1);
    }

    /* read from file: "78-90-12" */
    rc = preadv(fd, ivec, 3, 2);
    if (rc &lt; 0) {
        perror("preadv");
        exit(1);
    }

    printf("result  : %s\n", inbuf);
    printf("expected: %s\n", "--129078--");
    exit(0);
}

This patch:

Factor out some code from compat_sys_readv() which can be shared with the
upcoming compat_sys_preadv().

Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>New locking/refcounting for fs_struct</title>
<updated>2009-04-01T03:00:26+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2009-03-30T11:20:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=498052bba55ecaff58db6a1436b0e25bfd75a7ff'/>
<id>498052bba55ecaff58db6a1436b0e25bfd75a7ff</id>
<content type='text'>
* all changes of current-&gt;fs are done under task_lock and write_lock of
  old fs-&gt;lock
* refcount is not atomic anymore (same protection)
* its decrements are done when removing reference from current; at the
  same time we decide whether to free it.
* put_fs_struct() is gone
* new field - -&gt;in_exec.  Set by check_unsafe_exec() if we are trying to do
  execve() and only subthreads share fs_struct.  Cleared when finishing exec
  (success and failure alike).  Makes CLONE_FS fail with -EAGAIN if set.
* check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
  is in progress.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* all changes of current-&gt;fs are done under task_lock and write_lock of
  old fs-&gt;lock
* refcount is not atomic anymore (same protection)
* its decrements are done when removing reference from current; at the
  same time we decide whether to free it.
* put_fs_struct() is gone
* new field - -&gt;in_exec.  Set by check_unsafe_exec() if we are trying to do
  execve() and only subthreads share fs_struct.  Cleared when finishing exec
  (success and failure alike).  Makes CLONE_FS fail with -EAGAIN if set.
* check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
  is in progress.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
