<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/sched.h, branch linux-2.6.14.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>[PATCH] Fix signal sending in usbdevio on async URB completion</title>
<updated>2005-10-10T23:16:33+00:00</updated>
<author>
<name>Harald Welte</name>
<email>laforge@gnumonks.org</email>
</author>
<published>2005-10-10T17:44:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=46113830a18847cff8da73005e57bc49c2f95a56'/>
<id>46113830a18847cff8da73005e57bc49c2f95a56</id>
<content type='text'>
If a process issues an URB from userspace and (starts to) terminate
before the URB comes back, we run into the issue described below.  This
is because the URB saves a pointer to "current" when it is posted to the
device, but there's no guarantee that this pointer is still valid
afterwards.

In fact, there are three separate issues:

1) the pointer to "current" can become invalid, since the task could be
   completely gone when the URB completion comes back from the device.

2) Even if the saved task pointer is still pointing to a valid task_struct,
   task_struct-&gt;sighand could have gone meanwhile.

3) Even if the process is perfectly fine, permissions may have changed,
   and we can no longer send it a signal.

So what we do instead is to save the PID and UIDs of the process, and
introduce a new kill_proc_info_as_uid() function.
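
[ A minimal sketch of the idea, assuming a per-URB bookkeeping
  struct "as" and this helper signature; both are illustrative,
  not necessarily the exact driver code: ]

	/* at submit time, while "current" is certainly valid */
	as-&gt;pid  = current-&gt;pid;
	as-&gt;uid  = current-&gt;uid;
	as-&gt;euid = current-&gt;euid;

	/* at completion, the task may be gone: signal by PID, and
	   recheck permissions against the saved uid/euid */
	kill_proc_info_as_uid(as-&gt;signr, &amp;sinfo, as-&gt;pid,
			      as-&gt;uid, as-&gt;euid);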

Signed-off-by: Harald Welte &lt;laforge@gnumonks.org&gt;
[ Fixed up types and added symbol exports ]
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>Revert task flag re-ordering, add comments</title>
<updated>2005-09-29T22:18:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@g5.osdl.org</email>
</author>
<published>2005-09-29T22:18:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4a8342d233a39ee582e9f7260e12d2f5fd194a05'/>
<id>4a8342d233a39ee582e9f7260e12d2f5fd194a05</id>
<content type='text'>
Roland points out that the flags end up having non-obvious dependencies
elsewhere, so revert aa55a08687059aa169d10a313c41f238c2070488 and add
some comments about why things are as they are.

We'll just have to fix up the broken comparisons. Roland has a patch.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction</title>
<updated>2005-09-29T16:05:52+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2005-09-29T15:58:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aa55a08687059aa169d10a313c41f238c2070488'/>
<id>aa55a08687059aa169d10a313c41f238c2070488</id>
<content type='text'>
do_signal_stop:

	for_each_thread(t) {
		if (t-&gt;state &lt; TASK_STOPPED)
			++sig-&gt;group_stop_count;
	}

However, TASK_NONINTERACTIVE &gt; TASK_STOPPED, so this loop will not
count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.

See also wait_task_stopped(), which checks -&gt;state &gt; TASK_STOPPED.
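
[ A worked example, assuming the task state values of this era
  (TASK_INTERRUPTIBLE == 1, TASK_STOPPED == 4,
  TASK_NONINTERACTIVE == 64): a thread sleeping in pipe_wait() has
  state 1 | 64 == 65, so "t-&gt;state &lt; TASK_STOPPED" is false and
  the thread is wrongly left uncounted. ]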

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;

[ We really probably should always use the appropriate bitmasks to test
  task states, not do it like this. Using something like

	#define TASK_RUNNABLE (TASK_RUNNING | TASK_INTERRUPTIBLE | \
				TASK_UNINTERRUPTIBLE | TASK_NONINTERACTIVE)

  and then doing "if (task-&gt;state &amp; TASK_RUNNABLE)" or similar. But the
  ordering of the task states is historical, and keeping the ordering
  does make sense regardless. ]

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] set_current_state() commentary</title>
<updated>2005-09-13T15:22:29+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2005-09-13T08:25:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=498d0c5711094b0e1fd93f5355d270ccebdec706'/>
<id>498d0c5711094b0e1fd93f5355d270ccebdec706</id>
<content type='text'>
Explain the mysteries of set_current_state().

Quoth Linus:

 The scheduler itself never needs the memory barrier at all.

 The barrier is needed only if the user itself ends up testing some other
 thing afterwards, ie if you have

 	set_process_state(TASK_INTERRUPTIBLE);
 	if (still_need_to_sleep())
 		schedule();

 then the "still_need_to_sleep()" thing may test flags and wakeup events,
 and then you _may_ want to (and often do) make sure that the write of
 TASK_INTERRUPTIBLE is serialized wrt the reads of any wakeup data (since
 the wakeup may have happened on another CPU).

 So the comment is somewhat wrong. We don't really _care_ whether the state
 propagates out to other CPUs, since all of our actions are purely local,
 and there is nothing we do that is conditional on any other CPU: we're
 going to sleep unconditionally, and the scheduler only cares about _our_
 state, not about somebody else's state.
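
[ The canonical sleep loop this distinction matters for, sketched: ]

	set_current_state(TASK_INTERRUPTIBLE);	/* implies a barrier */
	while (!condition) {			/* may read wakeup data */
		schedule();
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);	/* purely local, no barrier */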

Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] cpuset semaphore depth check optimize</title>
<updated>2005-09-12T16:16:27+00:00</updated>
<author>
<name>Paul Jackson</name>
<email>pj@sgi.com</email>
</author>
<published>2005-09-12T11:30:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b3426599af9524104be6938bcb1fcaab314781c7'/>
<id>b3426599af9524104be6938bcb1fcaab314781c7</id>
<content type='text'>
Optimize the deadlock avoidance check on the global cpuset
semaphore cpuset_sem.  Instead of adding a depth counter to the
task struct of each task, just two words are enough: one to
store the depth and the other the current cpuset_sem holder.
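
[ A minimal sketch of the two words, with hypothetical names; the
  down()/up() wrappers then test the owner rather than a per-task
  counter: ]

	static struct task_struct *cpuset_sem_owner;	/* current holder */
	static int cpuset_sem_depth;			/* holder's nesting */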

Thanks to Nikita Danilov for the idea.

Signed-off-by: Paul Jackson &lt;pj@sgi.com&gt;

[ We may want to change this further, but at least it's now
  a totally internal decision to the cpusets code ]

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] MCA/INIT: scheduler hooks</title>
<updated>2005-09-11T21:01:30+00:00</updated>
<author>
<name>Keith Owens</name>
<email>kaos@sgi.com</email>
</author>
<published>2005-09-11T07:19:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a2a979821b6ab75a4f143cfaa1c4672cc259ec10'/>
<id>a2a979821b6ab75a4f143cfaa1c4672cc259ec10</id>
<content type='text'>
Scheduler hooks to see/change which process is deemed to be on a cpu.
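
[ The hooks are along these lines; sketched declarations, assumed
  to live behind an arch ifdef: ]

	extern struct task_struct *curr_task(int cpu);
	extern void set_curr_task(int cpu, struct task_struct *p);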

Signed-off-by: Keith Owens &lt;kaos@sgi.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
</content>
</entry>
<entry>
<title>[PATCH] add schedule_timeout_{,un}interruptible() interfaces</title>
<updated>2005-09-10T17:06:36+00:00</updated>
<author>
<name>Nishanth Aravamudan</name>
<email>nacc@us.ibm.com</email>
</author>
<published>2005-09-10T07:27:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=64ed93a268bc18fa6f72f61420d0e0022c5e38d1'/>
<id>64ed93a268bc18fa6f72f61420d0e0022c5e38d1</id>
<content type='text'>
Add schedule_timeout_{,un}interruptible() interfaces so that
schedule_timeout() callers don't have to worry about forgetting to add the
set_current_state() call beforehand.
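
[ A sketch of the obvious implementation; the real version may
  differ in decoration (fastcall, __sched): ]

	signed long schedule_timeout_interruptible(signed long timeout)
	{
		__set_current_state(TASK_INTERRUPTIBLE);
		return schedule_timeout(timeout);
	}

	signed long schedule_timeout_uninterruptible(signed long timeout)
	{
		__set_current_state(TASK_UNINTERRUPTIBLE);
		return schedule_timeout(timeout);
	}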

Signed-off-by: Nishanth Aravamudan &lt;nacc@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] sched: TASK_NONINTERACTIVE</title>
<updated>2005-09-10T17:06:22+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2005-09-10T07:26:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d79fc0fc6645b0cf5cd980da76942ca6d6300fa4'/>
<id>d79fc0fc6645b0cf5cd980da76942ca6d6300fa4</id>
<content type='text'>
This patch implements a task state bit (TASK_NONINTERACTIVE), which can be
used by blocking points to mark the task's wait as "non-interactive".  This
does not mean the task will be considered a CPU-hog - the wait will simply
not have an effect on the waiting task's priority - positive or negative
alike.  Right now only pipe_wait() will make use of it, because it's a
common source of not-so-interactive waits (kernel compilation jobs, etc.).
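
[ Usage sketch: a blocking point opts in by OR-ing the bit into its
  sleep state; the waitqueue name here is illustrative: ]

	prepare_to_wait(&amp;pipe_waitqueue, &amp;wait,
			TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE);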

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] cpuset semaphore depth check deadlock fix</title>
<updated>2005-09-10T17:06:21+00:00</updated>
<author>
<name>Paul Jackson</name>
<email>pj@sgi.com</email>
</author>
<published>2005-09-10T07:26:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4247bdc60048018b98f71228b45cfbc5f5270c86'/>
<id>4247bdc60048018b98f71228b45cfbc5f5270c86</id>
<content type='text'>
The cpusets-formalize-intermediate-gfp_kernel-containment patch
has a deadlock problem.

This patch was part of a set of four patches to make more
extensive use of the cpuset 'mem_exclusive' attribute to
manage kernel GFP_KERNEL memory allocations and to constrain
the out-of-memory (oom) killer.

A task that is changing cpusets in particular ways on a system
when it is very short of free memory could double trip over
the global cpuset_sem semaphore (get the lock and then deadlock
trying to get it again).

The second attempt to get cpuset_sem would be in the routine
cpuset_zone_allowed().  This was discovered by code inspection.
I cannot reproduce the problem except with an artificially
hacked kernel and a specialized stress test.

In real life you cannot hit this unless you are manipulating
cpusets, and are very unlikely to hit it unless you are rapidly
modifying cpusets on a memory-tight system.  Even then it would
be a rare occurrence.

If you did hit it, the task double tripping over cpuset_sem
would deadlock in the kernel, and any other task also trying
to manipulate cpusets would deadlock there too, on cpuset_sem.
Your batch manager would be wedged solid (if it was cpuset
savvy), but classic Unix shells and utilities would work well
enough to reboot the system.

The unusual condition that led to this bug is that unlike most
semaphores, cpuset_sem _can_ be acquired while in the page
allocation code, when __alloc_pages() calls cpuset_zone_allowed.
So it is easy to mistakenly perform the following sequence:
  1) task makes system call to alter a cpuset
  2) take cpuset_sem
  3) try to allocate memory
  4) memory allocator, via cpuset_zone_allowed, tries to take cpuset_sem
  5) deadlock

The reason that this is not a serious bug for most users
is that almost all calls to allocate memory don't require
taking cpuset_sem.  Only some code paths off the beaten
track require taking cpuset_sem -- which is good.  Taking
a global semaphore on the main code path for allocating
memory would not scale well.

This patch fixes this deadlock by wrapping the up() and down()
calls on cpuset_sem in kernel/cpuset.c with code that tracks
the nesting depth of the current task on that semaphore, and
only does the real down() if the task doesn't hold the lock
already, and only does the real up() if the nesting depth
(number of unmatched downs) is exactly one.
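
[ A minimal sketch of the wrappers; the names and the location of
  the depth counter are illustrative: ]

	static void cpuset_down(struct semaphore *psem)
	{
		if (current-&gt;cpuset_sem_nest_depth == 0)
			down(psem);	/* real down only on first entry */
		current-&gt;cpuset_sem_nest_depth++;
	}

	static void cpuset_up(struct semaphore *psem)
	{
		current-&gt;cpuset_sem_nest_depth--;
		if (current-&gt;cpuset_sem_nest_depth == 0)
			up(psem);	/* real up on the last unmatched down */
	}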

The previously required use of refresh_mems(), any time that
the cpuset_sem semaphore was acquired and the code executed
while holding that semaphore might try to allocate memory, is
no longer required.  Two refresh_mems() calls were removed
thanks to this.  This is a good change, as failing to place
all the necessary refresh_mems() calls was a primary
source of bugs in this cpuset code.  The only remaining call
to refresh_mems() is made while doing a memory allocation,
if certain task memory placement data needs to be updated
from its cpuset, due to the cpuset having been changed behind
the task's back.

Signed-off-by: Paul Jackson &lt;pj@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Prefetch kernel stacks to speed up context switch</title>
<updated>2005-09-09T20:57:31+00:00</updated>
<author>
<name>Chen, Kenneth W</name>
<email>kenneth.w.chen@intel.com</email>
</author>
<published>2005-09-09T20:02:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=383f2835eb9afb723af71850037b2f074ac9db60'/>
<id>383f2835eb9afb723af71850037b2f074ac9db60</id>
<content type='text'>
For architectures like ia64, the switch stack structure is fairly large
(currently 528 bytes).  For context-switch-intensive applications, we
found that a significant number of cache misses occurs in the
switch_to() function.  The following patch adds a hook in the schedule()
function to prefetch the switch stack structure as soon as the 'next'
task is determined.  This allows maximum overlap in prefetching the
cache lines for that structure.
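
[ The shape of the hook, sketched; prefetch_stack() is assumed to
  be the arch-provided name, a no-op on architectures without it: ]

	/* in schedule(), right after "next" has been chosen,
	   well before the eventual switch_to(): */
	prefetch_stack(next);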

Signed-off-by: Ken Chen &lt;kenneth.w.chen@intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
</feed>
