<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/proc/generic.c, branch v2.6.24</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>proc: remove/Fix proc generic d_revalidate</title>
<updated>2007-12-11T03:43:55+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-12-10T23:49:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3790ee4bd86396558eedd86faac1052cb782e4e1'/>
<id>3790ee4bd86396558eedd86faac1052cb782e4e1</id>
<content type='text'>
Ultimately to implement /proc perfectly we need an implementation of
d_revalidate because files and directories can be removed behind the back
of the VFS, and d_revalidate is the only way we can let the VFS know that
this has happened.

Unfortunately the linux VFS can not cope with anything in the path to a
mount point going away.  So a proper d_revalidate method that calls d_drop
also needs to call have_submounts which is moderately expensive, so you
really don't want a d_revalidate method that unconditionally calls it, but
instead only calls it when the backing object has really gone away.

proc generic entries only disappear on module_unload (when not counting the
fledgling network namespace) so it is quite rare that we actually encounter
that case and has not actually caused us real world trouble yet.

So until we get a proper test for keeping dentries in the dcache fix the
current d_revalidate method by completely removing it.  This returns us to
the current status quo.

So with CONFIG_NETNS=n things should look as they have always looked.

For CONFIG_NETNS=y things work most of the time but there are a few rare
corner cases that don't behave properly.  As the network namespace is
barely present in 2.6.24 this should not be a problem.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: "Denis V. Lunev" &lt;den@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ultimately to implement /proc perfectly we need an implementation of
d_revalidate because files and directories can be removed behind the back
of the VFS, and d_revalidate is the only way we can let the VFS know that
this has happened.

Unfortunately the linux VFS can not cope with anything in the path to a
mount point going away.  So a proper d_revalidate method that calls d_drop
also needs to call have_submounts which is moderately expensive, so you
really don't want a d_revalidate method that unconditionally calls it, but
instead only calls it when the backing object has really gone away.

proc generic entries only disappear on module_unload (when not counting the
fledgling network namespace) so it is quite rare that we actually encounter
that case and has not actually caused us real world trouble yet.

So until we get a proper test for keeping dentries in the dcache fix the
current d_revalidate method by completely removing it.  This returns us to
the current status quo.

So with CONFIG_NETNS=n things should look as they have always looked.

For CONFIG_NETNS=y things work most of the time but there are a few rare
corner cases that don't behave properly.  As the network namespace is
barely present in 2.6.24 this should not be a problem.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Cc: "Denis V. Lunev" &lt;den@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: fix proc_dir_entry refcounting</title>
<updated>2007-12-05T17:21:20+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2007-12-05T07:45:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5a622f2d0f86b316b07b55a4866ecb5518dd1cf7'/>
<id>5a622f2d0f86b316b07b55a4866ecb5518dd1cf7</id>
<content type='text'>
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&amp;de-&gt;count) == 0)
					if (atomic_dec_and_test(&amp;de-&gt;count))
						if (de-&gt;deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de-&gt;deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&amp;de-&gt;count))
if (atomic_read(&amp;de-&gt;count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de-&gt;deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[&lt;c10acdda&gt;] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [&lt;c10ac4f0&gt;] vsnprintf+0x2ad/0x49b
 [&lt;c10ac779&gt;] vscnprintf+0x14/0x1f
 [&lt;c1018e6b&gt;] vprintk+0xc5/0x2f9
 [&lt;c10379f1&gt;] handle_fasteoi_irq+0x0/0xab
 [&lt;c1004f44&gt;] do_IRQ+0x9f/0xb7
 [&lt;c117db3b&gt;] preempt_schedule_irq+0x3f/0x5b
 [&lt;c100264e&gt;] need_resched+0x1f/0x21
 [&lt;c10190ba&gt;] printk+0x1b/0x1f
 [&lt;c107c8ad&gt;] de_put+0x3d/0x50
 [&lt;c107c8f8&gt;] proc_delete_inode+0x38/0x41
 [&lt;c107c8c0&gt;] proc_delete_inode+0x0/0x41
 [&lt;c1066298&gt;] generic_delete_inode+0x5e/0xc6
 [&lt;c1065aa9&gt;] iput+0x60/0x62
 [&lt;c1063c8e&gt;] d_kill+0x2d/0x46
 [&lt;c1063fa9&gt;] dput+0xdc/0xe4
 [&lt;c10571a1&gt;] __fput+0xb0/0xcd
 [&lt;c1054e49&gt;] filp_close+0x48/0x4f
 [&lt;c1055ee9&gt;] sys_close+0x67/0xa5
 [&lt;c10026b6&gt;] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 &lt;80&gt; 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [&lt;c10acdda&gt;] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of -&gt;deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen =&gt; nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&amp;de-&gt;count) == 0)
					if (atomic_dec_and_test(&amp;de-&gt;count))
						if (de-&gt;deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de-&gt;deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&amp;de-&gt;count))
if (atomic_read(&amp;de-&gt;count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de-&gt;deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[&lt;c10acdda&gt;] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [&lt;c10ac4f0&gt;] vsnprintf+0x2ad/0x49b
 [&lt;c10ac779&gt;] vscnprintf+0x14/0x1f
 [&lt;c1018e6b&gt;] vprintk+0xc5/0x2f9
 [&lt;c10379f1&gt;] handle_fasteoi_irq+0x0/0xab
 [&lt;c1004f44&gt;] do_IRQ+0x9f/0xb7
 [&lt;c117db3b&gt;] preempt_schedule_irq+0x3f/0x5b
 [&lt;c100264e&gt;] need_resched+0x1f/0x21
 [&lt;c10190ba&gt;] printk+0x1b/0x1f
 [&lt;c107c8ad&gt;] de_put+0x3d/0x50
 [&lt;c107c8f8&gt;] proc_delete_inode+0x38/0x41
 [&lt;c107c8c0&gt;] proc_delete_inode+0x0/0x41
 [&lt;c1066298&gt;] generic_delete_inode+0x5e/0xc6
 [&lt;c1065aa9&gt;] iput+0x60/0x62
 [&lt;c1063c8e&gt;] d_kill+0x2d/0x46
 [&lt;c1063fa9&gt;] dput+0xdc/0xe4
 [&lt;c10571a1&gt;] __fput+0xb0/0xcd
 [&lt;c1054e49&gt;] filp_close+0x48/0x4f
 [&lt;c1055ee9&gt;] sys_close+0x67/0xa5
 [&lt;c10026b6&gt;] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 &lt;80&gt; 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [&lt;c10acdda&gt;] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of -&gt;deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen =&gt; nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6</title>
<updated>2007-12-03T16:15:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@woody.linux-foundation.org</email>
</author>
<published>2007-12-03T16:15:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8002cedc1adbf51e2d56091534ef7551b88329b4'/>
<id>8002cedc1adbf51e2d56091534ef7551b88329b4</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
  [INET]: Fix inet_diag dead-lock regression
  [NETNS]: Fix /proc/net breakage
  [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
  [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
  [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
  [DECNET]: dn_nl_deladdr() almost always returns no error
  [IPV6]: Restore IPv6 when MTU is big enough
  [RXRPC]: Add missing select on CRYPTO
  mac80211: rate limit wep decrypt failed messages
  rfkill: fix double-mutex-locking
  mac80211: drop unencrypted frames if encryption is expected
  mac80211: Fix behavior of ieee80211_open and ieee80211_close
  ieee80211: fix unaligned access in ieee80211_copy_snap
  mac80211: free ifsta-&gt;extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
  SCTP: Fix build issues with SCTP AUTH.
  SCTP: Fix chunk acceptance when no authenticated chunks were listed.
  SCTP: Fix the supported extensions paramter
  SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
  SCTP: Fix the number of HB transmissions.
  [TCP] illinois: Incorrect beta usage
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
  [INET]: Fix inet_diag dead-lock regression
  [NETNS]: Fix /proc/net breakage
  [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
  [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
  [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
  [DECNET]: dn_nl_deladdr() almost always returns no error
  [IPV6]: Restore IPv6 when MTU is big enough
  [RXRPC]: Add missing select on CRYPTO
  mac80211: rate limit wep decrypt failed messages
  rfkill: fix double-mutex-locking
  mac80211: drop unencrypted frames if encryption is expected
  mac80211: Fix behavior of ieee80211_open and ieee80211_close
  ieee80211: fix unaligned access in ieee80211_copy_snap
  mac80211: free ifsta-&gt;extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
  SCTP: Fix build issues with SCTP AUTH.
  SCTP: Fix chunk acceptance when no authenticated chunks were listed.
  SCTP: Fix the supported extensions paramter
  SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
  SCTP: Fix the number of HB transmissions.
  [TCP] illinois: Incorrect beta usage
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>[NETNS]: Fix /proc/net breakage</title>
<updated>2007-12-01T13:33:17+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-12-01T13:33:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2b1e300a9dfc3196ccddf6f1d74b91b7af55e416'/>
<id>2b1e300a9dfc3196ccddf6f1d74b91b7af55e416</id>
<content type='text'>
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify -&gt;lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify -&gt;lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: fix NULL -&gt;i_fop oops</title>
<updated>2007-11-29T17:24:52+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2007-11-29T00:21:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c2319540cd7330fa9066e5b9b84d357a2c8631a2'/>
<id>c2319540cd7330fa9066e5b9b84d357a2c8631a2</id>
<content type='text'>
proc_kill_inodes() can clear -&gt;i_fop in the middle of vfs_readdir resulting in
NULL dereference during "file-&gt;f_op-&gt;readdir(file, buf, filler)".

The solution is to remove proc_kill_inodes() completely:

a) we don't have tricky modules implementing their tricky readdir hooks which
   could keeping this revoke from hell.

b) In a situation when module is gone but PDE still alive, standard
   readdir will return only "." and "..", because pde-&gt;next was cleared by
   remove_proc_entry().

c) the race proc_kill_inode() destined to prevent is not completely
   fixed, just race window made smaller, because vfs_readdir() is run
   without sb_lock held and without file_list_lock held.  Effectively,
   -&gt;i_fop is cleared at random moment, which can't fix properly anything.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
EIP: 0060:[&lt;c1061205&gt;] EFLAGS: 00010246 CPU: 0
EIP is at vfs_readdir+0x47/0x74
EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
       00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
       00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
Call Trace:
 [&lt;c1061040&gt;] filldir64+0x0/0xc5
 [&lt;c1061295&gt;] sys_getdents64+0x63/0xa5
 [&lt;c10026ba&gt;] sysenter_past_esp+0x5f/0x85
 =======================
Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 &lt;ff&gt; 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
EIP: [&lt;c1061205&gt;] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78

hch: "Nice, getting rid of this is a very good step formwards.
      Unfortunately we have another copy of this junk in
      security/selinux/selinuxfs.c:sel_remove_entries() which would need the
      same treatment."

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Acked-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
proc_kill_inodes() can clear -&gt;i_fop in the middle of vfs_readdir resulting in
NULL dereference during "file-&gt;f_op-&gt;readdir(file, buf, filler)".

The solution is to remove proc_kill_inodes() completely:

a) we don't have tricky modules implementing their tricky readdir hooks which
   could keeping this revoke from hell.

b) In a situation when module is gone but PDE still alive, standard
   readdir will return only "." and "..", because pde-&gt;next was cleared by
   remove_proc_entry().

c) the race proc_kill_inode() destined to prevent is not completely
   fixed, just race window made smaller, because vfs_readdir() is run
   without sb_lock held and without file_list_lock held.  Effectively,
   -&gt;i_fop is cleared at random moment, which can't fix properly anything.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
EIP: 0060:[&lt;c1061205&gt;] EFLAGS: 00010246 CPU: 0
EIP is at vfs_readdir+0x47/0x74
EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
       00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
       00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
Call Trace:
 [&lt;c1061040&gt;] filldir64+0x0/0xc5
 [&lt;c1061295&gt;] sys_getdents64+0x63/0xa5
 [&lt;c10026ba&gt;] sysenter_past_esp+0x5f/0x85
 =======================
Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 &lt;ff&gt; 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
EIP: [&lt;c1061205&gt;] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78

hch: "Nice, getting rid of this is a very good step formwards.
      Unfortunately we have another copy of this junk in
      security/selinux/selinuxfs.c:sel_remove_entries() which would need the
      same treatment."

Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Acked-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: fix proc_kill_inodes to kill dentries on all proc superblocks</title>
<updated>2007-11-15T02:45:38+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-11-15T00:59:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e1a1c997afe907e6ec4799e4be0f38cffd8b418c'/>
<id>e1a1c997afe907e6ec4799e4be0f38cffd8b418c</id>
<content type='text'>
It appears we overlooked support for removing generic proc files
when we added support for multiple proc super blocks.  Handle
that now.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Acked-by: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It appears we overlooked support for removing generic proc files
when we added support for multiple proc super blocks.  Handle
that now.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Acked-by: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Group short-lived and reclaimable kernel allocations</title>
<updated>2007-10-16T16:43:00+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mel@csn.ul.ie</email>
</author>
<published>2007-10-16T08:25:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e12ba74d8ff3e2f73a583500d7095e406df4d093'/>
<id>e12ba74d8ff3e2f73a583500d7095e406df4d093</id>
<content type='text'>
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations.  When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations.  When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Andy Whitcroft &lt;apw@shadowen.org&gt;
Cc: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>procfs directory entry cleanup</title>
<updated>2007-07-16T16:05:43+00:00</updated>
<author>
<name>Changli Gao</name>
<email>xiaosuo@gmail.com</email>
</author>
<published>2007-07-16T06:40:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=99fc06df72fe1c9ad3ec274720dcb5658c40bfd2'/>
<id>99fc06df72fe1c9ad3ec274720dcb5658c40bfd2</id>
<content type='text'>
Function proc_register() will assign proc_dir_operations and
proc_dir_inode_operations to ent's members proc_fops and proc_iops
correctly if ent is a directory. So the early assignment isn't
necessary.

Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Function proc_register() will assign proc_dir_operations and
proc_dir_inode_operations to ent's members proc_fops and proc_iops
correctly if ent is a directory. So the early assignment isn't
necessary.

Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix rmmod/read/write races in /proc entries</title>
<updated>2007-07-16T16:05:39+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@sw.ru</email>
</author>
<published>2007-07-16T06:39:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=786d7e1612f0b0adb6046f19b906609e4fe8b1ba'/>
<id>786d7e1612f0b0adb6046f19b906609e4fe8b1ba</id>
<content type='text'>
Fix following races:
===========================================
1. Write via -&gt;write_proc sleeps in copy_from_user(). Module disappears
   meanwhile. Or, more generically, system call done on /proc file, method
   supplied by module is called, module dissapeares meanwhile.

   pde = create_proc_entry()
   if (!pde)
	return -ENOMEM;
   pde-&gt;write_proc = ...
				open
				write
				copy_from_user
   pde = create_proc_entry();
   if (!pde) {
	remove_proc_entry();
	return -ENOMEM;
	/* module unloaded */
   }
				*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()

  remove_proc_entry		vfs_read
  proc_kill_inodes		[check -&gt;f_op validness]
				[check -&gt;f_op-&gt;read validness]
				[verify_area, security permissions checks]
	-&gt;f_op = NULL;
				if (file-&gt;f_op-&gt;read)
					/* -&gt;f_op dereference, boom */

NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set -&gt;owner for them, so proxying for
directories may be unneeded.

NOTE, NOTE, NOTE: methods being proxied are -&gt;llseek, -&gt;read, -&gt;write,
-&gt;poll, -&gt;unlocked_ioctl, -&gt;ioctl, -&gt;compat_ioctl, -&gt;open, -&gt;release.
If your in-tree module uses something else, yell on me. Full audit pending.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix following races:
===========================================
1. Write via -&gt;write_proc sleeps in copy_from_user(). Module disappears
   meanwhile. Or, more generically, system call done on /proc file, method
   supplied by module is called, module dissapeares meanwhile.

   pde = create_proc_entry()
   if (!pde)
	return -ENOMEM;
   pde-&gt;write_proc = ...
				open
				write
				copy_from_user
   pde = create_proc_entry();
   if (!pde) {
	remove_proc_entry();
	return -ENOMEM;
	/* module unloaded */
   }
				*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()

  remove_proc_entry		vfs_read
  proc_kill_inodes		[check -&gt;f_op validness]
				[check -&gt;f_op-&gt;read validness]
				[verify_area, security permissions checks]
	-&gt;f_op = NULL;
				if (file-&gt;f_op-&gt;read)
					/* -&gt;f_op dereference, boom */

NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set -&gt;owner for them, so proxying for
directories may be unneeded.

NOTE, NOTE, NOTE: methods being proxied are -&gt;llseek, -&gt;read, -&gt;write,
-&gt;poll, -&gt;unlocked_ioctl, -&gt;ioctl, -&gt;compat_ioctl, -&gt;open, -&gt;release.
If your in-tree module uses something else, yell on me. Full audit pending.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix race between proc_readdir and remove_proc_entry</title>
<updated>2007-05-08T18:15:02+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@us.ibm.com</email>
</author>
<published>2007-05-08T07:25:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=59cd0cbc75367b82f704f63b104117462275060d'/>
<id>59cd0cbc75367b82f704f63b104117462275060d</id>
<content type='text'>
Fix the following race:

proc_readdir				remove_proc_entry
============				=================

spin_lock(&amp;proc_subdir_lock);
[choose PDE to start filldir from]
spin_unlock(&amp;proc_subdir_lock);
					spin_lock(&amp;proc_subdir_lock);
					[find PDE]
					[free PDE, refcount is 0]
					spin_unlock(&amp;proc_subdir_lock);
		    /* boom */
if (filldir(dirent, de-&gt;name, ...

[de_put on error path --adobriyan]
Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix the following race:

proc_readdir				remove_proc_entry
============				=================

spin_lock(&amp;proc_subdir_lock);
[choose PDE to start filldir from]
spin_unlock(&amp;proc_subdir_lock);
					spin_lock(&amp;proc_subdir_lock);
					[find PDE]
					[free PDE, refcount is 0]
					spin_unlock(&amp;proc_subdir_lock);
		    /* boom */
if (filldir(dirent, de-&gt;name, ...

[de_put on error path --adobriyan]
Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@sw.ru&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
