<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/proc_fs.h, branch linux-2.6.24.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>proc: fix proc_dir_entry refcounting</title>
<updated>2007-12-05T17:21:20+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2007-12-05T07:45:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5a622f2d0f86b316b07b55a4866ecb5518dd1cf7'/>
<id>5a622f2d0f86b316b07b55a4866ecb5518dd1cf7</id>
<content type='text'>
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&amp;de-&gt;count) == 0)
					if (atomic_dec_and_test(&amp;de-&gt;count))
						if (de-&gt;deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de-&gt;deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&amp;de-&gt;count))
if (atomic_read(&amp;de-&gt;count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de-&gt;deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[&lt;c10acdda&gt;] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [&lt;c10ac4f0&gt;] vsnprintf+0x2ad/0x49b
 [&lt;c10ac779&gt;] vscnprintf+0x14/0x1f
 [&lt;c1018e6b&gt;] vprintk+0xc5/0x2f9
 [&lt;c10379f1&gt;] handle_fasteoi_irq+0x0/0xab
 [&lt;c1004f44&gt;] do_IRQ+0x9f/0xb7
 [&lt;c117db3b&gt;] preempt_schedule_irq+0x3f/0x5b
 [&lt;c100264e&gt;] need_resched+0x1f/0x21
 [&lt;c10190ba&gt;] printk+0x1b/0x1f
 [&lt;c107c8ad&gt;] de_put+0x3d/0x50
 [&lt;c107c8f8&gt;] proc_delete_inode+0x38/0x41
 [&lt;c107c8c0&gt;] proc_delete_inode+0x0/0x41
 [&lt;c1066298&gt;] generic_delete_inode+0x5e/0xc6
 [&lt;c1065aa9&gt;] iput+0x60/0x62
 [&lt;c1063c8e&gt;] d_kill+0x2d/0x46
 [&lt;c1063fa9&gt;] dput+0xdc/0xe4
 [&lt;c10571a1&gt;] __fput+0xb0/0xcd
 [&lt;c1054e49&gt;] filp_close+0x48/0x4f
 [&lt;c1055ee9&gt;] sys_close+0x67/0xa5
 [&lt;c10026b6&gt;] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 &lt;80&gt; 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [&lt;c10acdda&gt;] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of -&gt;deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen =&gt; nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&amp;de-&gt;count) == 0)
					if (atomic_dec_and_test(&amp;de-&gt;count))
						if (de-&gt;deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de-&gt;deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&amp;de-&gt;count))
if (atomic_read(&amp;de-&gt;count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de-&gt;deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[&lt;c10acdda&gt;] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [&lt;c10ac4f0&gt;] vsnprintf+0x2ad/0x49b
 [&lt;c10ac779&gt;] vscnprintf+0x14/0x1f
 [&lt;c1018e6b&gt;] vprintk+0xc5/0x2f9
 [&lt;c10379f1&gt;] handle_fasteoi_irq+0x0/0xab
 [&lt;c1004f44&gt;] do_IRQ+0x9f/0xb7
 [&lt;c117db3b&gt;] preempt_schedule_irq+0x3f/0x5b
 [&lt;c100264e&gt;] need_resched+0x1f/0x21
 [&lt;c10190ba&gt;] printk+0x1b/0x1f
 [&lt;c107c8ad&gt;] de_put+0x3d/0x50
 [&lt;c107c8f8&gt;] proc_delete_inode+0x38/0x41
 [&lt;c107c8c0&gt;] proc_delete_inode+0x0/0x41
 [&lt;c1066298&gt;] generic_delete_inode+0x5e/0xc6
 [&lt;c1065aa9&gt;] iput+0x60/0x62
 [&lt;c1063c8e&gt;] d_kill+0x2d/0x46
 [&lt;c1063fa9&gt;] dput+0xdc/0xe4
 [&lt;c10571a1&gt;] __fput+0xb0/0xcd
 [&lt;c1054e49&gt;] filp_close+0x48/0x4f
 [&lt;c1055ee9&gt;] sys_close+0x67/0xa5
 [&lt;c10026b6&gt;] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 &lt;80&gt; 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [&lt;c10acdda&gt;] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of -&gt;deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen =&gt; nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NETNS]: Fix /proc/net breakage</title>
<updated>2007-12-01T13:33:17+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-12-01T13:33:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2b1e300a9dfc3196ccddf6f1d74b91b7af55e416'/>
<id>2b1e300a9dfc3196ccddf6f1d74b91b7af55e416</id>
<content type='text'>
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify -&gt;lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify -&gt;lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Kill proc_net_create()</title>
<updated>2007-11-07T12:10:52+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@sunset.davemloft.net</email>
</author>
<published>2007-11-07T12:10:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=44656ba1286d82b5a5f8817eb2e4ea744143c3ca'/>
<id>44656ba1286d82b5a5f8817eb2e4ea744143c3ca</id>
<content type='text'>
There are no more users.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are no more users.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pid namespaces: initialize the namespace's proc_mnt</title>
<updated>2007-10-19T18:53:40+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6f4e643353aea52d80f33960bd88954a7c074f0f'/>
<id>6f4e643353aea52d80f33960bd88954a7c074f0f</id>
<content type='text'>
The namespace's proc_mnt must be kern_mount-ed to make this pointer always
valid, independently of whether the user space mounted the proc or not.  This
solves raced in proc_flush_task, etc.  with the proc_mnt switching from NULL
to not-NULL.

The initialization is done after the init's pid is created and hashed to make
proc_get_sb() finr it and get for root inode.

Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
superblock holds the namespace we must explicitly break this circle to destroy
all the stuff.  This is done after the init of the namespace dies.  Running a
few steps forward - when init exits it will kill all its children, so no
proc_mnt will be needed after its death.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The namespace's proc_mnt must be kern_mount-ed to make this pointer always
valid, independently of whether the user space mounted the proc or not.  This
solves raced in proc_flush_task, etc.  with the proc_mnt switching from NULL
to not-NULL.

The initialization is done after the init's pid is created and hashed to make
proc_get_sb() finr it and get for root inode.

Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
superblock holds the namespace we must explicitly break this circle to destroy
all the stuff.  This is done after the init of the namespace dies.  Running a
few steps forward - when init exits it will kill all its children, so no
proc_mnt will be needed after its death.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pid namespaces: make proc have multiple superblocks - one for each namespace</title>
<updated>2007-10-19T18:53:39+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=07543f5c75cee744b791cf7716c69571486fe753'/>
<id>07543f5c75cee744b791cf7716c69571486fe753</id>
<content type='text'>
Each pid namespace have to be visible through its own proc mount.  Thus we
need to have per-namespace proc trees with their own superblocks.

We cannot easily show different pid namespace via one global proc tree, since
each pid refers to different tasks in different namespaces.  E.g.  pid 1
refers to the init task in the initial namespace and to some other task when
seeing from another namespace.  Moreover - pid, exisintg in one namespace may
not exist in the other.

This approach has one move advantage is that the tasks from the init namespace
can see what tasks live in another namespace by reading entries from another
proc tree.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Each pid namespace have to be visible through its own proc mount.  Thus we
need to have per-namespace proc trees with their own superblocks.

We cannot easily show different pid namespace via one global proc tree, since
each pid refers to different tasks in different namespaces.  E.g.  pid 1
refers to the init task in the initial namespace and to some other task when
seeing from another namespace.  Moreover - pid, exisintg in one namespace may
not exist in the other.

This approach has one move advantage is that the tasks from the init namespace
can see what tasks live in another namespace by reading entries from another
proc tree.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pid namespaces: prepare proc_flust_task() to flush entries from multiple proc trees</title>
<updated>2007-10-19T18:53:38+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=60347f6716aa49831ac311e04d77ccdc50dc024a'/>
<id>60347f6716aa49831ac311e04d77ccdc50dc024a</id>
<content type='text'>
The first part is trivial - we just make the proc_flush_task() to operate on
arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to
it.

The other change is more tricky: I moved the proc_flush_task() call in
release_task() higher to address the following problem.

When flushing task from many proc trees we need to know the set of ids (not
just one pid) to find the dentries' names to flush.  Thus we need to pass the
task's pid to proc_flush_task() as struct pid is the only object that can
provide all the pid numbers.  But after __exit_signal() task has detached all
his pids and this information is lost.

This creates a tiny gap for proc_pid_lookup() to bring some dentries back to
tree and keep them in hash (since pids are still alive before __exit_signal())
till the next shrink, but since proc_flush_task() does not provide a 100%
guarantee that the dentries will be flushed, this is OK to do so.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The first part is trivial - we just make the proc_flush_task() to operate on
arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to
it.

The other change is more tricky: I moved the proc_flush_task() call in
release_task() higher to address the following problem.

When flushing task from many proc trees we need to know the set of ids (not
just one pid) to find the dentries' names to flush.  Thus we need to pass the
task's pid to proc_flush_task() as struct pid is the only object that can
provide all the pid numbers.  But after __exit_signal() task has detached all
his pids and this information is lost.

This creates a tiny gap for proc_pid_lookup() to bring some dentries back to
tree and keep them in hash (since pids are still alive before __exit_signal())
till the next shrink, but since proc_flush_task() does not provide a 100%
guarantee that the dentries will be flushed, this is OK to do so.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Fix race when opening a proc file while a network namespace is exiting.</title>
<updated>2007-10-10T23:49:22+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-09-13T07:18:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=077130c0cf7d5ba1992f5b51b96136d7b1c8aad5'/>
<id>077130c0cf7d5ba1992f5b51b96136d7b1c8aad5</id>
<content type='text'>
The problem:  proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace).   So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.

To fix this introduce maybe_get_net and get_proc_net.

maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.

get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.

PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.

Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL.  In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@us.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The problem:  proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace).   So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.

To fix this introduce maybe_get_net and get_proc_net.

maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.

get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.

PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.

Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL.  In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@us.ibm.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Make /proc/net per network namespace</title>
<updated>2007-10-10T23:49:06+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-09-12T10:01:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=457c4cbc5a3dde259d2a1f15d5f9785290397267'/>
<id>457c4cbc5a3dde259d2a1f15d5f9785290397267</id>
<content type='text'>
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &amp;init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &amp;init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove unused struct proc_dir_entry::set</title>
<updated>2007-08-11T22:47:40+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2007-08-10T20:00:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=76ceb2f90f6efb6d1f3d88f855428bff947a3483'/>
<id>76ceb2f90f6efb6d1f3d88f855428bff947a3483</id>
<content type='text'>
After /proc/sys rewrite it was left unused.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After /proc/sys rewrite it was left unused.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix rmmod/read/write races in /proc entries</title>
<updated>2007-07-16T16:05:39+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2007-07-16T06:39:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=786d7e1612f0b0adb6046f19b906609e4fe8b1ba'/>
<id>786d7e1612f0b0adb6046f19b906609e4fe8b1ba</id>
<content type='text'>
Fix following races:
===========================================
1. Write via -&gt;write_proc sleeps in copy_from_user(). Module disappears
   meanwhile. Or, more generically, system call done on /proc file, method
   supplied by module is called, module dissapeares meanwhile.

   pde = create_proc_entry()
   if (!pde)
	return -ENOMEM;
   pde-&gt;write_proc = ...
				open
				write
				copy_from_user
   pde = create_proc_entry();
   if (!pde) {
	remove_proc_entry();
	return -ENOMEM;
	/* module unloaded */
   }
				*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()

  remove_proc_entry		vfs_read
  proc_kill_inodes		[check -&gt;f_op validness]
				[check -&gt;f_op-&gt;read validness]
				[verify_area, security permissions checks]
	-&gt;f_op = NULL;
				if (file-&gt;f_op-&gt;read)
					/* -&gt;f_op dereference, boom */

NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set -&gt;owner for them, so proxying for
directories may be unneeded.

NOTE, NOTE, NOTE: methods being proxied are -&gt;llseek, -&gt;read, -&gt;write,
-&gt;poll, -&gt;unlocked_ioctl, -&gt;ioctl, -&gt;compat_ioctl, -&gt;open, -&gt;release.
If your in-tree module uses something else, yell on me. Full audit pending.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix following races:
===========================================
1. Write via -&gt;write_proc sleeps in copy_from_user(). Module disappears
   meanwhile. Or, more generically, system call done on /proc file, method
   supplied by module is called, module dissapeares meanwhile.

   pde = create_proc_entry()
   if (!pde)
	return -ENOMEM;
   pde-&gt;write_proc = ...
				open
				write
				copy_from_user
   pde = create_proc_entry();
   if (!pde) {
	remove_proc_entry();
	return -ENOMEM;
	/* module unloaded */
   }
				*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()

  remove_proc_entry		vfs_read
  proc_kill_inodes		[check -&gt;f_op validness]
				[check -&gt;f_op-&gt;read validness]
				[verify_area, security permissions checks]
	-&gt;f_op = NULL;
				if (file-&gt;f_op-&gt;read)
					/* -&gt;f_op dereference, boom */

NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set -&gt;owner for them, so proxying for
directories may be unneeded.

NOTE, NOTE, NOTE: methods being proxied are -&gt;llseek, -&gt;read, -&gt;write,
-&gt;poll, -&gt;unlocked_ioctl, -&gt;ioctl, -&gt;compat_ioctl, -&gt;open, -&gt;release.
If your in-tree module uses something else, yell on me. Full audit pending.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
