linux-stable.git/kernel/sys.c, branch linux-2.6.20.y

[PATCH] setpgid(child) fails if the child was forked by sub-thread

2007-10-17T19:30:29+00:00

commit b07e35f94a7b6a059f889b904529ee907dc0634d in mainline tree

Spotted by Marcin Kowalczyk .

sys_setpgid(child) fails if the child was forked by sub-thread.

Fix the "is it our child" check. The previous commit
ee0acf90d320c29916ba8c5c1b2e908d81f5057d was not complete.

(this patch asks for the new same_thread_group() helper, but mainline doesn't
 have it yet).

Signed-off-by: Oleg Nesterov 
Acked-by: Roland McGrath 
Tested-by: "Marcin 'Qrczak' Kowalczyk" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

CPU time limit patch / setrlimit(RLIMIT_CPU, 0) cheat fix

2007-09-23T09:22:24+00:00

CPU time limit patch / setrlimit(RLIMIT_CPU, 0) cheat fix

As discovered here today, the change in Kernel 2.6.17 intended to inhibit
users from setting RLIMIT_CPU to 0 (as that is equivalent to unlimited) by
"cheating" and setting it to 1 in such a case, does not make a difference,
as the check is done in the wrong place (too late), and only applies to the
profiling code.

On all systems I checked running kernels above 2.6.17, no matter what the
hard and soft CPU time limits were before, a user could escape them by
issuing in the shell (sh/bash/zsh) "ulimit -t 0", and then the user's
process was not ever killed.

Attached is a trivial patch to fix that.  Simply moving the check to a
slightly earlier location (specifically, before the line that actually
assigns the limit - *old_rlim = new_rlim), does the trick.

Do note that at least the zsh (but not ash, dash, or bash) shell has the
problem of "caching" the limits set by the ulimit command, so when running
zsh the fix will not immediately be evident - after entering "ulimit -t 0",
"ulimit -a" will show "-t: cpu time (seconds) 0", even though the actual
limit as returned by getrlimit(...) will be 1.  It can be verified by
opening a subshell (which will not have the values of the parent shell in
cache) and checking in it, or just by running a CPU intensive command like
"echo '65536^1048576' | bc" and verifying that it dumps core after one
second.

Regardless of whether that is a misfeature in the shell, perhaps it would
be better to return -EINVAL from setrlimit in such a case instead of
cheating and setting to 1, as that does not really reflect the actual state
of the process anymore.  I do not however know what the ground for that
decision was in the original 2.6.17 change, and whether there would be any
"backward" compatibility issues, so I preferred not to touch that right
now.

Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] notifiers: fix blocking_notifier_call_chain() scalability

2007-01-23T19:08:03+00:00

while lock-profiling the -rt kernel i noticed weird contention during
mmap-intense workloads, and the tracer showed the following gem, in one
of our MM hotpaths:

 threaded-2771  1....   65us : sys_munmap (sysenter_do_call)
 threaded-2771  1....   66us : profile_munmap (sys_munmap)
 threaded-2771  1....   66us : blocking_notifier_call_chain (profile_munmap)
 threaded-2771  1....   66us : rt_down_read (blocking_notifier_call_chain)

ouch! a global rw-semaphore taken in one of the most performance-
sensitive codepaths of the kernel.  And i dont even have oprofile
enabled! All distro kernels have CONFIG_PROFILING enabled, so this
scalability problem affects the majority of Linux users.

The fix is to enhance blocking_notifier_call_chain() to only take the
lock if there appears to be work on the call-chain.

With this patch applied i get nicely saturated system, and much higher
munmap performance, on SMP systems.

And as a bonus this also fixes a similar scalability bottleneck in the
thread-exit codepath: profile_task_exit() ...

Signed-off-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Nick Piggin 
Signed-off-by: Linus Torvalds

[PATCH] sys_setpgid: eliminate unnecessary do_each_task_pid(PIDTYPE_PGID)

2006-12-08T16:28:52+00:00

All tasks in the process group have the same sid, we don't need to iterate
them all to check that the caller of sys_setpgid() doesn't change its
session.

Signed-off-by: Oleg Nesterov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] add process_session() helper routine

2006-12-08T16:28:51+00:00

Replace occurences of task->signal->session by a new process_session() helper
routine.

It will be useful for pid namespaces to abstract the session pid number.

Signed-off-by: Cedric Le Goater 
Cc: Kirill Korotaev 
Cc: Eric W. Biederman 
Cc: Herbert Poetzl 
Cc: Sukadev Bhattiprolu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] tty: ->signal->tty locking

2006-12-08T16:28:38+00:00

Fix the locking of signal->tty.

Use ->sighand->siglock to protect ->signal->tty; this lock is already used
by most other members of ->signal/->sighand.  And unless we are 'current'
or the tasklist_lock is held we need ->siglock to access ->signal anyway.

(NOTE: sys_unshare() is broken wrt ->sighand locking rules)

Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid.  Otherwise the lifetime of ttys
are governed by their open file handles.  This leaves some holes for tty
access from signal->tty (or any other non file related tty access).

It solves the tty SLAB scribbles we were seeing.

(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
       be examined by someone familiar with the security framework, I think
       it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
       invocations)

[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra 
Acked-by: Alan Cox 
Cc: Oleg Nesterov 
Cc: Prarit Bhargava 
Cc: Chris Wright 
Cc: Roland McGrath 
Cc: Stephen Smalley 
Cc: James Morris 
Cc: "David S. Miller" 
Cc: Jeff Dike 
Cc: Martin Schwidefsky 
Cc: Jan Kara 
Signed-off-by: Martin Schwidefsky 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] sys: remove unused variable

2006-12-07T16:39:44+00:00

Remove unused 'new_ruid' variable.

Reported by David Binderman .

Signed-off-by: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

WorkStruct: Pass the work_struct pointer instead of context data

2006-11-22T14:55:48+00:00

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct.  This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function.  This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated..  This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems.  But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default.  Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells

[PATCH] SRCU: report out-of-memory errors

2006-10-04T14:55:30+00:00

Currently the init_srcu_struct() routine has no way to report out-of-memory
errors.  This patch (as761) makes it return -ENOMEM when the per-cpu data
allocation fails.

The patch also makes srcu_init_notifier_head() report a BUG if a notifier
head can't be initialized.  Perhaps it should return -ENOMEM instead, but
in the most likely cases where this might occur I don't think any recovery
is possible.  Notifier chains generally are not created dynamically.

[akpm@osdl.org: avoid statement-with-side-effect in macro]
Signed-off-by: Alan Stern 
Acked-by: Paul E. McKenney 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Add SRCU-based notifier chains

2006-10-04T14:55:30+00:00

This patch (as751) adds a new type of notifier chain, based on the SRCU
(Sleepable Read-Copy Update) primitives recently added to the kernel.  An
SRCU notifier chain is much like a blocking notifier chain, in that it must
be called in process context and its callout routines are allowed to sleep.
 The difference is that the chain's links are protected by the SRCU
mechanism rather than by an rw-semaphore, so calling the chain has
extremely low overhead: no memory barriers and no cache-line bouncing.  On
the other hand, unregistering from the chain is expensive and the chain
head requires special runtime initialization (plus cleanup if it is to be
deallocated).

SRCU notifiers are appropriate for notifiers that will be called very
frequently and for which unregistration occurs very seldom.  The proposed
"task notifier" scheme qualifies, as may some of the network notifiers.

Signed-off-by: Alan Stern 
Acked-by: Paul E. McKenney 
Acked-by: Chandra Seetharaman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds