<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/open.c, branch v2.6.23</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>VFS: fix a race in lease-breaking during truncate</title>
<updated>2007-07-31T22:39:42+00:00</updated>
<author>
<name>david m. richter</name>
<email>richterd@citi.umich.edu</email>
</author>
<published>2007-07-31T07:39:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9700382c3c9ff3e673e587084d76eedb3ba88668'/>
<id>9700382c3c9ff3e673e587084d76eedb3ba88668</id>
<content type='text'>
It is possible that another process could acquire a new file lease right
after break_lease() is called during a truncate, but before lease-granting
is disabled by the subsequent get_write_access().  Merely switching the
order of the break_lease() and get_write_access() calls prevents this race.

Signed-off-by: David M. Richter &lt;richterd@citi.umich.edu&gt;
Signed-off-by: "J. Bruce Fields" &lt;bfields@citi.umich.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is possible that another process could acquire a new file lease right
after break_lease() is called during a truncate, but before lease-granting
is disabled by the subsequent get_write_access().  Merely switching the
order of the break_lease() and get_write_access() calls prevents this race.

Signed-off-by: David M. Richter &lt;richterd@citi.umich.edu&gt;
Signed-off-by: "J. Bruce Fields" &lt;bfields@citi.umich.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fallocate syscall interface deficiency</title>
<updated>2007-07-24T19:24:58+00:00</updated>
<author>
<name>Ulrich Drepper</name>
<email>drepper@redhat.com</email>
</author>
<published>2007-07-24T01:43:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0d786d4a2773f06a791e8c3730d049077fb81df6'/>
<id>0d786d4a2773f06a791e8c3730d049077fb81df6</id>
<content type='text'>
The fallocate syscall returns ENOSYS in case the filesystem does not support
the operation and expects the userlevel code to fill in.  This is good in
concept.

The problem is that the libc code for old kernels should be able to
distinguish the case where the syscall is not at all available vs not
functioning for a specific mount point.  As is this is not possible and we
always have to invoke the syscall even if the kernel doesn't support it.

I suggest the following patch.  Using EOPNOTSUPP is IMO the right thing to do.

Cc: Amit Arora &lt;aarora@in.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The fallocate syscall returns ENOSYS in case the filesystem does not support
the operation and expects the userlevel code to fill in.  This is good in
concept.

The problem is that the libc code for old kernels should be able to
distinguish the case where the syscall is not at all available vs not
functioning for a specific mount point.  As is this is not possible and we
always have to invoke the syscall even if the kernel doesn't support it.

I suggest the following patch.  Using EOPNOTSUPP is IMO the right thing to do.

Cc: Amit Arora &lt;aarora@in.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sys_fallocate() implementation on i386, x86_64 and powerpc</title>
<updated>2007-07-18T01:42:44+00:00</updated>
<author>
<name>Amit Arora</name>
<email>aarora@in.ibm.com</email>
</author>
<published>2007-07-18T01:42:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=97ac73506c0ba93f30239bb57b4cfc5d73e68a62'/>
<id>97ac73506c0ba93f30239bb57b4cfc5d73e68a62</id>
<content type='text'>
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called -&gt;fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()

Signed-off-by: Amit Arora &lt;aarora@in.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called -&gt;fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()

Signed-off-by: Amit Arora &lt;aarora@in.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>O_CLOEXEC for SCM_RIGHTS</title>
<updated>2007-07-16T16:05:45+00:00</updated>
<author>
<name>Ulrich Drepper</name>
<email>drepper@redhat.com</email>
</author>
<published>2007-07-16T06:40:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4a19542e5f694cd408a32c3d9dc593ba9366e2d7'/>
<id>4a19542e5f694cd408a32c3d9dc593ba9366e2d7</id>
<content type='text'>
Part two in the O_CLOEXEC saga: adding support for file descriptors received
through Unix domain sockets.

The patch is once again pretty minimal, it introduces a new flag for recvmsg
and passes it just like the existing MSG_CMSG_COMPAT flag.  I think this bit
is not used otherwise but the networking people will know better.

This new flag is not recognized by recvfrom and recv.  These functions cannot
be used for that purpose and the asymmetry this introduces is not worse than
the already existing MSG_CMSG_COMPAT situations.

The patch must be applied on the patch which introduced O_CLOEXEC.  It has to
remove static from the new get_unused_fd_flags function but since scm.c cannot
live in a module the function still hasn't to be exported.

Here's a test program to make sure the code works.  It's so much longer than
the actual patch...

#include &lt;errno.h&gt;
#include &lt;error.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;sys/un.h&gt;

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0x40000000
#endif

int
main (int argc, char *argv[])
{
  if (argc &gt; 1)
    {
      int fd = atol (argv[1]);
      printf ("child: fd = %d\n", fd);
      if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
        {
          puts ("file descriptor valid in child");
          return 1;
        }
      return 0;

    }

  struct sockaddr_un sun;
  strcpy (sun.sun_path, "./testsocket");
  sun.sun_family = AF_UNIX;

  char databuf[] = "hello";
  struct iovec iov[1];
  iov[0].iov_base = databuf;
  iov[0].iov_len = sizeof (databuf);

  union
  {
    struct cmsghdr hdr;
    char bytes[CMSG_SPACE (sizeof (int))];
  } buf;
  struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                        .msg_control = buf.bytes,
                        .msg_controllen = sizeof (buf) };
  struct cmsghdr *cmsg = CMSG_FIRSTHDR (&amp;msg);

  cmsg-&gt;cmsg_level = SOL_SOCKET;
  cmsg-&gt;cmsg_type = SCM_RIGHTS;
  cmsg-&gt;cmsg_len = CMSG_LEN (sizeof (int));

  msg.msg_controllen = cmsg-&gt;cmsg_len;

  pid_t child = fork ();
  if (child == -1)
    error (1, errno, "fork");
  if (child == 0)
    {
      int sock = socket (PF_UNIX, SOCK_STREAM, 0);
      if (sock &lt; 0)
        error (1, errno, "socket");

      if (bind (sock, (struct sockaddr *) &amp;sun, sizeof (sun)) &lt; 0)
        error (1, errno, "bind");
      if (listen (sock, SOMAXCONN) &lt; 0)
        error (1, errno, "listen");

      int conn = accept (sock, NULL, NULL);
      if (conn == -1)
        error (1, errno, "accept");

      *(int *) CMSG_DATA (cmsg) = sock;
      if (sendmsg (conn, &amp;msg, MSG_NOSIGNAL) &lt; 0)
        error (1, errno, "sendmsg");

      return 0;
    }

  /* For a test suite this should be more robust like a
     barrier in shared memory.  */
  sleep (1);

  int sock = socket (PF_UNIX, SOCK_STREAM, 0);
  if (sock &lt; 0)
    error (1, errno, "socket");

  if (connect (sock, (struct sockaddr *) &amp;sun, sizeof (sun)) &lt; 0)
    error (1, errno, "connect");
  unlink (sun.sun_path);

  *(int *) CMSG_DATA (cmsg) = -1;

  if (recvmsg (sock, &amp;msg, MSG_CMSG_CLOEXEC) &lt; 0)
    error (1, errno, "recvmsg");

  int fd = *(int *) CMSG_DATA (cmsg);
  if (fd == -1)
    error (1, 0, "no descriptor received");

  char fdname[20];
  snprintf (fdname, sizeof (fdname), "%d", fd);
  execl ("/proc/self/exe", argv[0], fdname, NULL);
  puts ("execl failed");
  return 1;
}

[akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Michael Buesch &lt;mb@bu3sch.de&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Part two in the O_CLOEXEC saga: adding support for file descriptors received
through Unix domain sockets.

The patch is once again pretty minimal, it introduces a new flag for recvmsg
and passes it just like the existing MSG_CMSG_COMPAT flag.  I think this bit
is not used otherwise but the networking people will know better.

This new flag is not recognized by recvfrom and recv.  These functions cannot
be used for that purpose and the asymmetry this introduces is not worse than
the already existing MSG_CMSG_COMPAT situations.

The patch must be applied on the patch which introduced O_CLOEXEC.  It has to
remove static from the new get_unused_fd_flags function but since scm.c cannot
live in a module the function still hasn't to be exported.

Here's a test program to make sure the code works.  It's so much longer than
the actual patch...

#include &lt;errno.h&gt;
#include &lt;error.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;sys/un.h&gt;

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0x40000000
#endif

int
main (int argc, char *argv[])
{
  if (argc &gt; 1)
    {
      int fd = atol (argv[1]);
      printf ("child: fd = %d\n", fd);
      if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
        {
          puts ("file descriptor valid in child");
          return 1;
        }
      return 0;

    }

  struct sockaddr_un sun;
  strcpy (sun.sun_path, "./testsocket");
  sun.sun_family = AF_UNIX;

  char databuf[] = "hello";
  struct iovec iov[1];
  iov[0].iov_base = databuf;
  iov[0].iov_len = sizeof (databuf);

  union
  {
    struct cmsghdr hdr;
    char bytes[CMSG_SPACE (sizeof (int))];
  } buf;
  struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                        .msg_control = buf.bytes,
                        .msg_controllen = sizeof (buf) };
  struct cmsghdr *cmsg = CMSG_FIRSTHDR (&amp;msg);

  cmsg-&gt;cmsg_level = SOL_SOCKET;
  cmsg-&gt;cmsg_type = SCM_RIGHTS;
  cmsg-&gt;cmsg_len = CMSG_LEN (sizeof (int));

  msg.msg_controllen = cmsg-&gt;cmsg_len;

  pid_t child = fork ();
  if (child == -1)
    error (1, errno, "fork");
  if (child == 0)
    {
      int sock = socket (PF_UNIX, SOCK_STREAM, 0);
      if (sock &lt; 0)
        error (1, errno, "socket");

      if (bind (sock, (struct sockaddr *) &amp;sun, sizeof (sun)) &lt; 0)
        error (1, errno, "bind");
      if (listen (sock, SOMAXCONN) &lt; 0)
        error (1, errno, "listen");

      int conn = accept (sock, NULL, NULL);
      if (conn == -1)
        error (1, errno, "accept");

      *(int *) CMSG_DATA (cmsg) = sock;
      if (sendmsg (conn, &amp;msg, MSG_NOSIGNAL) &lt; 0)
        error (1, errno, "sendmsg");

      return 0;
    }

  /* For a test suite this should be more robust like a
     barrier in shared memory.  */
  sleep (1);

  int sock = socket (PF_UNIX, SOCK_STREAM, 0);
  if (sock &lt; 0)
    error (1, errno, "socket");

  if (connect (sock, (struct sockaddr *) &amp;sun, sizeof (sun)) &lt; 0)
    error (1, errno, "connect");
  unlink (sun.sun_path);

  *(int *) CMSG_DATA (cmsg) = -1;

  if (recvmsg (sock, &amp;msg, MSG_CMSG_CLOEXEC) &lt; 0)
    error (1, errno, "recvmsg");

  int fd = *(int *) CMSG_DATA (cmsg);
  if (fd == -1)
    error (1, 0, "no descriptor received");

  char fdname[20];
  snprintf (fdname, sizeof (fdname), "%d", fd);
  execl ("/proc/self/exe", argv[0], fdname, NULL);
  puts ("execl failed");
  return 1;
}

[akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Michael Buesch &lt;mb@bu3sch.de&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Introduce O_CLOEXEC</title>
<updated>2007-07-16T16:05:45+00:00</updated>
<author>
<name>Ulrich Drepper</name>
<email>drepper@redhat.com</email>
</author>
<published>2007-07-16T06:40:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f23513e8d96cf5e6cf8d2ff0cb5dd6bbc33995e4'/>
<id>f23513e8d96cf5e6cf8d2ff0cb5dd6bbc33995e4</id>
<content type='text'>
The problem is as follows: in multi-threaded code (or more correctly: all
code using clone() with CLONE_FILES) we have a race when exec'ing.

   thread #1                       thread #2

   fd=open()

                                   fork + exec

  fcntl(fd,F_SETFD,FD_CLOEXEC)

In some applications this can happen frequently.  Take a web browser.  One
thread opens a file and another thread starts, say, an external PDF viewer.
 The result can even be a security issue if that open file descriptor
refers to a sensitive file and the external program can somehow be tricked
into using that descriptor.

Just adding O_CLOEXEC support to open() doesn't solve the whole set of
problems.  There are other ways to create file descriptors (socket,
epoll_create, Unix domain socket transfer, etc).  These can and should be
addressed separately though.  open() is such an easy case that it makes not
much sense putting the fix off.

The test program:

#include &lt;errno.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif

int
main (int argc, char *argv[])
{
  int fd;
  if (argc &gt; 1)
    {
      fd = atol (argv[1]);
      printf ("child: fd = %d\n", fd);
      if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
        {
          puts ("file descriptor valid in child");
          return 1;
        }
      return 0;
    }

  fd = open ("/proc/self/exe", O_RDONLY | O_CLOEXEC);
  printf ("in parent: new fd = %d\n", fd);
  char buf[20];
  snprintf (buf, sizeof (buf), "%d", fd);
  execl ("/proc/self/exe", argv[0], buf, NULL);
  puts ("execl failed");
  return 1;
}

[kyle@parisc-linux.org: parisc fix]
Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Signed-off-by: Kyle McMartin &lt;kyle@parisc-linux.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The problem is as follows: in multi-threaded code (or more correctly: all
code using clone() with CLONE_FILES) we have a race when exec'ing.

   thread #1                       thread #2

   fd=open()

                                   fork + exec

  fcntl(fd,F_SETFD,FD_CLOEXEC)

In some applications this can happen frequently.  Take a web browser.  One
thread opens a file and another thread starts, say, an external PDF viewer.
 The result can even be a security issue if that open file descriptor
refers to a sensitive file and the external program can somehow be tricked
into using that descriptor.

Just adding O_CLOEXEC support to open() doesn't solve the whole set of
problems.  There are other ways to create file descriptors (socket,
epoll_create, Unix domain socket transfer, etc).  These can and should be
addressed separately though.  open() is such an easy case that it makes not
much sense putting the fix off.

The test program:

#include &lt;errno.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif

int
main (int argc, char *argv[])
{
  int fd;
  if (argc &gt; 1)
    {
      fd = atol (argv[1]);
      printf ("child: fd = %d\n", fd);
      if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
        {
          puts ("file descriptor valid in child");
          return 1;
        }
      return 0;
    }

  fd = open ("/proc/self/exe", O_RDONLY | O_CLOEXEC);
  printf ("in parent: new fd = %d\n", fd);
  char buf[20];
  snprintf (buf, sizeof (buf), "%d", fd);
  execl ("/proc/self/exe", argv[0], buf, NULL);
  puts ("execl failed");
  return 1;
}

[kyle@parisc-linux.org: parisc fix]
Signed-off-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: Michael Kerrisk &lt;mtk-manpages@gmx.net&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Signed-off-by: Kyle McMartin &lt;kyle@parisc-linux.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove suid/sgid bits on [f]truncate()</title>
<updated>2007-05-09T03:10:00+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@woody.linux-foundation.org</email>
</author>
<published>2007-05-09T03:10:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7b82dc0e64e93f430182f36b46b79fcee87d3532'/>
<id>7b82dc0e64e93f430182f36b46b79fcee87d3532</id>
<content type='text'>
.. to match what we do on write().  This way, people who write to files
by using [f]truncate + writable mmap have the same semantics as if they
were using the write() family of system calls.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
.. to match what we do on write().  This way, people who write to files
by using [f]truncate + writable mmap have the same semantics as if they
were using the write() family of system calls.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>header cleaning: don't include smp_lock.h when not used</title>
<updated>2007-05-08T18:15:07+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>randy.dunlap@oracle.com</email>
</author>
<published>2007-05-08T07:28:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e63340ae6b6205fef26b40a75673d1c9c0c8bb90'/>
<id>e63340ae6b6205fef26b40a75673d1c9c0c8bb90</id>
<content type='text'>
Remove includes of &lt;linux/smp_lock.h&gt; where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove includes of &lt;linux/smp_lock.h&gt; where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] fdtable: Make fdarray and fdsets equal in size</title>
<updated>2006-12-10T17:57:22+00:00</updated>
<author>
<name>Vadim Lobanov</name>
<email>vlobanov@speakeasy.net</email>
</author>
<published>2006-12-10T10:21:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bbea9f69668a3d0cf9feba15a724cd02896f8675'/>
<id>bbea9f69668a3d0cf9feba15a724cd02896f8675</id>
<content type='text'>
Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable-&gt;max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable-&gt;max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable-&gt;max_fdset.  As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov &lt;vlobanov@speakeasy.net&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dipankar Sarma &lt;dipankar@in.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable-&gt;max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable-&gt;max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable-&gt;max_fdset.  As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov &lt;vlobanov@speakeasy.net&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dipankar Sarma &lt;dipankar@in.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] VFS: change struct file to use struct path</title>
<updated>2006-12-08T16:28:41+00:00</updated>
<author>
<name>Josef "Jeff" Sipek</name>
<email>jsipek@cs.sunysb.edu</email>
</author>
<published>2006-12-08T10:36:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0f7fc9e4d03987fe29f6dd4aa67e4c56eb7ecb05'/>
<id>0f7fc9e4d03987fe29f6dd4aa67e4c56eb7ecb05</id>
<content type='text'>
This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek &lt;jsipek@cs.sunysb.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek &lt;jsipek@cs.sunysb.edu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] tty: -&gt;signal-&gt;tty locking</title>
<updated>2006-12-08T16:28:38+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2006-12-08T10:36:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=24ec839c431eb79bb8f6abc00c4e1eb3b8c4d517'/>
<id>24ec839c431eb79bb8f6abc00c4e1eb3b8c4d517</id>
<content type='text'>
Fix the locking of signal-&gt;tty.

Use -&gt;sighand-&gt;siglock to protect -&gt;signal-&gt;tty; this lock is already used
by most other members of -&gt;signal/-&gt;sighand.  And unless we are 'current'
or the tasklist_lock is held we need -&gt;siglock to access -&gt;signal anyway.

(NOTE: sys_unshare() is broken wrt -&gt;sighand locking rules)

Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid.  Otherwise the lifetime of ttys
are governed by their open file handles.  This leaves some holes for tty
access from signal-&gt;tty (or any other non file related tty access).

It solves the tty SLAB scribbles we were seeing.

(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
       be examined by someone familiar with the security framework, I think
       it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
       invocations)

[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Alan Cox &lt;alan@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Cc: Chris Wright &lt;chrisw@sous-sol.org&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix the locking of signal-&gt;tty.

Use -&gt;sighand-&gt;siglock to protect -&gt;signal-&gt;tty; this lock is already used
by most other members of -&gt;signal/-&gt;sighand.  And unless we are 'current'
or the tasklist_lock is held we need -&gt;siglock to access -&gt;signal anyway.

(NOTE: sys_unshare() is broken wrt -&gt;sighand locking rules)

Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid.  Otherwise the lifetime of ttys
are governed by their open file handles.  This leaves some holes for tty
access from signal-&gt;tty (or any other non file related tty access).

It solves the tty SLAB scribbles we were seeing.

(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
       be examined by someone familiar with the security framework, I think
       it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
       invocations)

[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Alan Cox &lt;alan@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Cc: Chris Wright &lt;chrisw@sous-sol.org&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
