<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/proc/generic.c, branch v2.6.25</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>[NET]: Make /proc/net a symlink on /proc/self/net (v3)</title>
<updated>2008-03-07T19:08:40+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2008-03-07T19:08:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e9720acd728a46cb40daa52c99a979f7c4ff195c'/>
<id>e9720acd728a46cb40daa52c99a979f7c4ff195c</id>
<content type='text'>
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.

The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.

The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.

# ls -l /proc/net
lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -&gt; self/net

In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.

Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
  screwup pointed out by Stephen.

  To get the correct nlink count the -&gt;getattr callback for /proc/net
  is overridden to read one from the net-&gt;proc_net entry.

  To make selinux still work the net-&gt;proc_net entry is initialized
  properly, i.e. with the "net" name and the proc_net parent.

Selinux fixes are
Acked-by:  Stephen Smalley &lt;sds@tycho.nsa.gov&gt;

Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.

The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.

The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.

# ls -l /proc/net
lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -&gt; self/net

In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.

Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
  screwup pointed out by Stephen.

  To get the correct nlink count the -&gt;getattr callback for /proc/net
  is overridden to read one from the net-&gt;proc_net entry.

  To make selinux still work the net-&gt;proc_net entry is initialized
  properly, i.e. with the "net" name and the proc_net parent.

Selinux fixes are
Acked-by:  Stephen Smalley &lt;sds@tycho.nsa.gov&gt;

Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: fix -&gt;open'less usage due to -&gt;proc_fops flip</title>
<updated>2008-02-08T17:22:24+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2008-02-08T12:18:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2d3a4e3666325a9709cc8ea2e88151394e8f20fc'/>
<id>2d3a4e3666325a9709cc8ea2e88151394e8f20fc</id>
<content type='text'>
Typical PDE creation code looks like:

	pde = create_proc_entry("foo", 0, NULL);
	if (pde)
		pde-&gt;proc_fops = &amp;foo_proc_fops;

Notice that PDE is first created, only then -&gt;proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) -&gt;proc_fops are proc_file_operations which do not have -&gt;open callback. So, it's
   possible to -&gt;read without -&gt;open (see one class of oopses below).

The fix is new API called proc_create() which makes sure -&gt;proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:

	pde = proc_create("foo", 0, NULL, &amp;foo_proc_fops);
	if (!pde)
		return -ENOMEM;

Fix most networking users for a start.

In the long run, create_proc_entry() for regular files will go.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom

Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[&lt;c1188c1b&gt;] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
       00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
       00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
 [&lt;c106f7ce&gt;] seq_read+0x24/0x28a
 [&lt;c106f7aa&gt;] seq_read+0x0/0x28a
 [&lt;c106f7ce&gt;] seq_read+0x24/0x28a
 [&lt;c106f7aa&gt;] seq_read+0x0/0x28a
 [&lt;c10818b8&gt;] proc_reg_read+0x60/0x73
 [&lt;c1081858&gt;] proc_reg_read+0x0/0x73
 [&lt;c105a34f&gt;] vfs_read+0x6c/0x8b
 [&lt;c105a6f3&gt;] sys_read+0x3c/0x63
 [&lt;c10025f2&gt;] sysenter_past_esp+0x5f/0xa5
 [&lt;c10697a7&gt;] destroy_inode+0x24/0x33
 =======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff &lt;f0&gt; fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [&lt;c1188c1b&gt;] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Typical PDE creation code looks like:

	pde = create_proc_entry("foo", 0, NULL);
	if (pde)
		pde-&gt;proc_fops = &amp;foo_proc_fops;

Notice that PDE is first created, only then -&gt;proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) -&gt;proc_fops are proc_file_operations which do not have -&gt;open callback. So, it's
   possible to -&gt;read without -&gt;open (see one class of oopses below).

The fix is new API called proc_create() which makes sure -&gt;proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:

	pde = proc_create("foo", 0, NULL, &amp;foo_proc_fops);
	if (!pde)
		return -ENOMEM;

Fix most networking users for a start.

In the long run, create_proc_entry() for regular files will go.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom

Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[&lt;c1188c1b&gt;] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
       00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
       00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
 [&lt;c106f7ce&gt;] seq_read+0x24/0x28a
 [&lt;c106f7aa&gt;] seq_read+0x0/0x28a
 [&lt;c106f7ce&gt;] seq_read+0x24/0x28a
 [&lt;c106f7aa&gt;] seq_read+0x0/0x28a
 [&lt;c10818b8&gt;] proc_reg_read+0x60/0x73
 [&lt;c1081858&gt;] proc_reg_read+0x0/0x73
 [&lt;c105a34f&gt;] vfs_read+0x6c/0x8b
 [&lt;c105a6f3&gt;] sys_read+0x3c/0x63
 [&lt;c10025f2&gt;] sysenter_past_esp+0x5f/0xa5
 [&lt;c10697a7&gt;] destroy_inode+0x24/0x33
 =======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff &lt;f0&gt; fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [&lt;c1188c1b&gt;] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: detect duplicate names on registration</title>
<updated>2008-02-08T17:22:23+00:00</updated>
<author>
<name>Zhang Rui</name>
<email>rui.zhang@intel.com</email>
</author>
<published>2008-02-08T12:18:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=94413d8807a3c511a3675be4ce27a4d16d6408ee'/>
<id>94413d8807a3c511a3675be4ce27a4d16d6408ee</id>
<content type='text'>
Print a warning if PDE is registered with a name which already exists in
target directory.

Bug report and a simple fix can be found here:
http://bugzilla.kernel.org/show_bug.cgi?id=8798

[\n fixlet and no undescriptive variable usage --adobriyan]
[akpm@linux-foundation.org: make printk comprehensible]
Signed-off-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Print a warning if PDE is registered with a name which already exists in
target directory.

Bug report and a simple fix can be found here:
http://bugzilla.kernel.org/show_bug.cgi?id=8798

[\n fixlet and no undescriptive variable usage --adobriyan]
[akpm@linux-foundation.org: make printk comprehensible]
Signed-off-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: remove useless check on symlink removal</title>
<updated>2008-02-08T17:22:23+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2008-02-08T12:18:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fd2cbe48883a01f710c2a639877e3b3e4eba6e59'/>
<id>fd2cbe48883a01f710c2a639877e3b3e4eba6e59</id>
<content type='text'>
proc symlinks always have valid -&gt;data containing destination of symlink.  No
need to check it on removal -- proc_symlink() already done it.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
proc symlinks always have valid -&gt;data containing destination of symlink.  No
need to check it on removal -- proc_symlink() already done it.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: simplify function prototypes</title>
<updated>2008-02-08T17:22:23+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2008-02-08T12:18:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=76df0c25d0c34eba9fbb8a44106ed096553ba0e8'/>
<id>76df0c25d0c34eba9fbb8a44106ed096553ba0e8</id>
<content type='text'>
Move code around so as to reduce the number of forward-declarations.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move code around so as to reduce the number of forward-declarations.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: less LOCK operations during lookup</title>
<updated>2008-02-08T17:22:23+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2008-02-08T12:18:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4237e0d3de38da640d7c977d68f5f7f1d207a631'/>
<id>4237e0d3de38da640d7c977d68f5f7f1d207a631</id>
<content type='text'>
Pseudo-code for lookup effectively is:

	LOCK kernel
	LOCK proc_subdir_lock
		find PDE
		UNLOCK proc_subdir_lock

		get inode

		LOCK proc_subdir_lock
		goto unlock
	UNLOCK proc_subdir_lock
	UNLOCK kernel

We can get rid of LOCK/UNLOCK pair after getting inode simply by jumping
to unlock_kernel() directly.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pseudo-code for lookup effectively is:

	LOCK kernel
	LOCK proc_subdir_lock
		find PDE
		UNLOCK proc_subdir_lock

		get inode

		LOCK proc_subdir_lock
		goto unlock
	UNLOCK proc_subdir_lock
	UNLOCK kernel

We can get rid of LOCK/UNLOCK pair after getting inode simply by jumping
to unlock_kernel() directly.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: remove/Fix proc generic d_revalidate</title>
<updated>2007-12-11T03:43:55+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-12-10T23:49:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3790ee4bd86396558eedd86faac1052cb782e4e1'/>
<id>3790ee4bd86396558eedd86faac1052cb782e4e1</id>
<content type='text'>
Ultimately to implement /proc perfectly we need an implementation of
d_revalidate because files and directories can be removed behind the back
of the VFS, and d_revalidate is the only way we can let the VFS know that
this has happened.

Unfortunately the linux VFS can not cope with anything in the path to a
mount point going away.  So a proper d_revalidate method that calls d_drop
also needs to call have_submounts which is moderately expensive, so you
really don't want a d_revalidate method that unconditionally calls it, but
instead only calls it when the backing object has really gone away.

proc generic entries only disappear on module_unload (when not counting the
fledgling network namespace) so it is quite rare that we actually encounter
that case and has not actually caused us real world trouble yet.

So until we get a proper test for keeping dentries in the dcache fix the
current d_revalidate method by completely removing it.  This returns us to
the current status quo.

So with CONFIG_NETNS=n things should look as they have always looked.

For CONFIG_NETNS=y things work most of the time but there are a few rare
corner cases that don't behave properly.  As the network namespace is
barely present in 2.6.24 this should not be a problem.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: "Denis V. Lunev" &lt;den@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ultimately to implement /proc perfectly we need an implementation of
d_revalidate because files and directories can be removed behind the back
of the VFS, and d_revalidate is the only way we can let the VFS know that
this has happened.

Unfortunately the linux VFS can not cope with anything in the path to a
mount point going away.  So a proper d_revalidate method that calls d_drop
also needs to call have_submounts which is moderately expensive, so you
really don't want a d_revalidate method that unconditionally calls it, but
instead only calls it when the backing object has really gone away.

proc generic entries only disappear on module_unload (when not counting the
fledgling network namespace) so it is quite rare that we actually encounter
that case and has not actually caused us real world trouble yet.

So until we get a proper test for keeping dentries in the dcache fix the
current d_revalidate method by completely removing it.  This returns us to
the current status quo.

So with CONFIG_NETNS=n things should look as they have always looked.

For CONFIG_NETNS=y things work most of the time but there are a few rare
corner cases that don't behave properly.  As the network namespace is
barely present in 2.6.24 this should not be a problem.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: "Denis V. Lunev" &lt;den@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: fix proc_dir_entry refcounting</title>
<updated>2007-12-05T17:21:20+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2007-12-05T07:45:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5a622f2d0f86b316b07b55a4866ecb5518dd1cf7'/>
<id>5a622f2d0f86b316b07b55a4866ecb5518dd1cf7</id>
<content type='text'>
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&amp;de-&gt;count) == 0)
					if (atomic_dec_and_test(&amp;de-&gt;count))
						if (de-&gt;deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de-&gt;deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&amp;de-&gt;count))
if (atomic_read(&amp;de-&gt;count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de-&gt;deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[&lt;c10acdda&gt;] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [&lt;c10ac4f0&gt;] vsnprintf+0x2ad/0x49b
 [&lt;c10ac779&gt;] vscnprintf+0x14/0x1f
 [&lt;c1018e6b&gt;] vprintk+0xc5/0x2f9
 [&lt;c10379f1&gt;] handle_fasteoi_irq+0x0/0xab
 [&lt;c1004f44&gt;] do_IRQ+0x9f/0xb7
 [&lt;c117db3b&gt;] preempt_schedule_irq+0x3f/0x5b
 [&lt;c100264e&gt;] need_resched+0x1f/0x21
 [&lt;c10190ba&gt;] printk+0x1b/0x1f
 [&lt;c107c8ad&gt;] de_put+0x3d/0x50
 [&lt;c107c8f8&gt;] proc_delete_inode+0x38/0x41
 [&lt;c107c8c0&gt;] proc_delete_inode+0x0/0x41
 [&lt;c1066298&gt;] generic_delete_inode+0x5e/0xc6
 [&lt;c1065aa9&gt;] iput+0x60/0x62
 [&lt;c1063c8e&gt;] d_kill+0x2d/0x46
 [&lt;c1063fa9&gt;] dput+0xdc/0xe4
 [&lt;c10571a1&gt;] __fput+0xb0/0xcd
 [&lt;c1054e49&gt;] filp_close+0x48/0x4f
 [&lt;c1055ee9&gt;] sys_close+0x67/0xa5
 [&lt;c10026b6&gt;] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 &lt;80&gt; 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [&lt;c10acdda&gt;] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of -&gt;deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen =&gt; nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&amp;de-&gt;count) == 0)
					if (atomic_dec_and_test(&amp;de-&gt;count))
						if (de-&gt;deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de-&gt;deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&amp;de-&gt;count))
if (atomic_read(&amp;de-&gt;count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de-&gt;deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[&lt;c10acdda&gt;] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [&lt;c10ac4f0&gt;] vsnprintf+0x2ad/0x49b
 [&lt;c10ac779&gt;] vscnprintf+0x14/0x1f
 [&lt;c1018e6b&gt;] vprintk+0xc5/0x2f9
 [&lt;c10379f1&gt;] handle_fasteoi_irq+0x0/0xab
 [&lt;c1004f44&gt;] do_IRQ+0x9f/0xb7
 [&lt;c117db3b&gt;] preempt_schedule_irq+0x3f/0x5b
 [&lt;c100264e&gt;] need_resched+0x1f/0x21
 [&lt;c10190ba&gt;] printk+0x1b/0x1f
 [&lt;c107c8ad&gt;] de_put+0x3d/0x50
 [&lt;c107c8f8&gt;] proc_delete_inode+0x38/0x41
 [&lt;c107c8c0&gt;] proc_delete_inode+0x0/0x41
 [&lt;c1066298&gt;] generic_delete_inode+0x5e/0xc6
 [&lt;c1065aa9&gt;] iput+0x60/0x62
 [&lt;c1063c8e&gt;] d_kill+0x2d/0x46
 [&lt;c1063fa9&gt;] dput+0xdc/0xe4
 [&lt;c10571a1&gt;] __fput+0xb0/0xcd
 [&lt;c1054e49&gt;] filp_close+0x48/0x4f
 [&lt;c1055ee9&gt;] sys_close+0x67/0xa5
 [&lt;c10026b6&gt;] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 &lt;80&gt; 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [&lt;c10acdda&gt;] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of -&gt;deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen =&gt; nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6</title>
<updated>2007-12-03T16:15:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@woody.linux-foundation.org</email>
</author>
<published>2007-12-03T16:15:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8002cedc1adbf51e2d56091534ef7551b88329b4'/>
<id>8002cedc1adbf51e2d56091534ef7551b88329b4</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
  [INET]: Fix inet_diag dead-lock regression
  [NETNS]: Fix /proc/net breakage
  [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
  [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
  [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
  [DECNET]: dn_nl_deladdr() almost always returns no error
  [IPV6]: Restore IPv6 when MTU is big enough
  [RXRPC]: Add missing select on CRYPTO
  mac80211: rate limit wep decrypt failed messages
  rfkill: fix double-mutex-locking
  mac80211: drop unencrypted frames if encryption is expected
  mac80211: Fix behavior of ieee80211_open and ieee80211_close
  ieee80211: fix unaligned access in ieee80211_copy_snap
  mac80211: free ifsta-&gt;extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
  SCTP: Fix build issues with SCTP AUTH.
  SCTP: Fix chunk acceptance when no authenticated chunks were listed.
  SCTP: Fix the supported extensions paramter
  SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
  SCTP: Fix the number of HB transmissions.
  [TCP] illinois: Incorrect beta usage
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
  [INET]: Fix inet_diag dead-lock regression
  [NETNS]: Fix /proc/net breakage
  [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
  [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
  [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
  [DECNET]: dn_nl_deladdr() almost always returns no error
  [IPV6]: Restore IPv6 when MTU is big enough
  [RXRPC]: Add missing select on CRYPTO
  mac80211: rate limit wep decrypt failed messages
  rfkill: fix double-mutex-locking
  mac80211: drop unencrypted frames if encryption is expected
  mac80211: Fix behavior of ieee80211_open and ieee80211_close
  ieee80211: fix unaligned access in ieee80211_copy_snap
  mac80211: free ifsta-&gt;extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
  SCTP: Fix build issues with SCTP AUTH.
  SCTP: Fix chunk acceptance when no authenticated chunks were listed.
  SCTP: Fix the supported extensions paramter
  SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
  SCTP: Fix the number of HB transmissions.
  [TCP] illinois: Incorrect beta usage
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>[NETNS]: Fix /proc/net breakage</title>
<updated>2007-12-01T13:33:17+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-12-01T13:33:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2b1e300a9dfc3196ccddf6f1d74b91b7af55e416'/>
<id>2b1e300a9dfc3196ccddf6f1d74b91b7af55e416</id>
<content type='text'>
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify -&gt;lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify -&gt;lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
</feed>
