<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs/proc/stat.c, branch linux-3.4.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>genirq: Prevent proc race against freeing of irq descriptors</title>
<updated>2015-04-14T09:33:46+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-12-11T22:01:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0d030473658c7760c4cdd4fd0cc61e287b6023b3'/>
<id>0d030473658c7760c4cdd4fd0cc61e287b6023b3</id>
<content type='text'>
commit c291ee622165cb2c8d4e7af63fffd499354a23be upstream.

Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.

CPU0				CPU1
				show_interrupts()
				  desc = irq_to_desc(X);
free_desc(desc)
  remove_from_radix_tree();
  kfree(desc);
				  raw_spinlock_irq(&amp;desc-&gt;lock);

/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.

The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.

For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.

Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.

Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.

Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.

Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[lizf: Backported to 3.4:
 - define kstat_irqs() for CONFIG_GENERIC_HARDIRQS
 - add ifdef/endif CONFIG_SPARSE_IRQ]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c291ee622165cb2c8d4e7af63fffd499354a23be upstream.

Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.

CPU0				CPU1
				show_interrupts()
				  desc = irq_to_desc(X);
free_desc(desc)
  remove_from_radix_tree();
  kfree(desc);
				  raw_spinlock_irq(&amp;desc-&gt;lock);

/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.

The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.

For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.

Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.

Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.

Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.

Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[lizf: Backported to 3.4:
 - define kstat_irqs() for CONFIG_GENERIC_HARDIRQS
 - add ifdef/endif CONFIG_SPARSE_IRQ]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nohz: Fix idle ticks in cpu summary line of /proc/stat</title>
<updated>2012-10-28T17:14:12+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2012-10-10T06:21:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d39bd2701f0f414aba1376ae61202b526ec7e128'/>
<id>d39bd2701f0f414aba1376ae61202b526ec7e128</id>
<content type='text'>
commit 7386cdbf2f57ea8cff3c9fde93f206e58b9fe13f upstream.

Git commit 09a1d34f8535ecf9 "nohz: Make idle/iowait counter update
conditional" introduced a bug in regard to cpu hotplug. The effect is
that the number of idle ticks in the cpu summary line in /proc/stat is
still counting ticks for offline cpus.

Reproduction is easy, just start a workload that keeps all cpus busy,
switch off one or more cpus and then watch the idle field in top.
On a dual-core with one cpu 100% busy and one offline cpu you will get
something like this:

%Cpu(s): 48.7 us,  1.3 sy,  0.0 ni, 50.0 id,  0.0 wa,  0.0 hi,  0.0 si,
%0.0 st

The problem is that an offline cpu still has ts-&gt;idle_active == 1.
To fix this we should make sure that the cpu is online when calling
get_cpu_idle_time_us and get_cpu_iowait_time_us.

[Srivatsa: Rebased to current mainline]

Reported-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Reviewed-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Signed-off-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/20121010061820.8999.57245.stgit@srivatsabhat.in.ibm.com
Cc: deepthi@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7386cdbf2f57ea8cff3c9fde93f206e58b9fe13f upstream.

Git commit 09a1d34f8535ecf9 "nohz: Make idle/iowait counter update
conditional" introduced a bug in regard to cpu hotplug. The effect is
that the number of idle ticks in the cpu summary line in /proc/stat is
still counting ticks for offline cpus.

Reproduction is easy, just start a workload that keeps all cpus busy,
switch off one or more cpus and then watch the idle field in top.
On a dual-core with one cpu 100% busy and one offline cpu you will get
something like this:

%Cpu(s): 48.7 us,  1.3 sy,  0.0 ni, 50.0 id,  0.0 wa,  0.0 hi,  0.0 si,
%0.0 st

The problem is that an offline cpu still has ts-&gt;idle_active == 1.
To fix this we should make sure that the cpu is online when calling
get_cpu_idle_time_us and get_cpu_iowait_time_us.

[Srivatsa: Rebased to current mainline]

Reported-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Reviewed-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Signed-off-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/20121010061820.8999.57245.stgit@srivatsabhat.in.ibm.com
Cc: deepthi@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>proc: stats: Use arch_idle_time for idle and iowait times if available</title>
<updated>2012-03-30T13:43:33+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2012-03-30T10:23:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cb85a6ed67e979c59a29b7b4e8217e755b951cf4'/>
<id>cb85a6ed67e979c59a29b7b4e8217e755b951cf4</id>
<content type='text'>
Git commit a25cac5198d4ff28 "proc: Consider NO_HZ when printing idle and
iowait times" changes the code for /proc/stat to use get_cpu_idle_time_us
and get_cpu_iowait_time_us if the system is running with nohz enabled.
For architectures which define arch_idle_time (currently s390 only)
this is a change for the worse. The result of arch_idle_time is supposed
to be the exact sleep time of the target cpu and should be used instead
of the value kept by the scheduler.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Reviewed-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/20120330122308.18720283@de.ibm.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Git commit a25cac5198d4ff28 "proc: Consider NO_HZ when printing idle and
iowait times" changes the code for /proc/stat to use get_cpu_idle_time_us
and get_cpu_iowait_time_us if the system is running with nohz enabled.
For architectures which define arch_idle_time (currently s390 only)
this is a change for the worse. The result of arch_idle_time is supposed
to be the exact sleep time of the target cpu and should be used instead
of the value kept by the scheduler.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Reviewed-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/r/20120330122308.18720283@de.ibm.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>procfs: add num_to_str() to speed up /proc/stat</title>
<updated>2012-03-23T23:58:42+00:00</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2012-03-23T22:02:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1ac101a5d675aca2426c5cd460c73fb95acb8391'/>
<id>1ac101a5d675aca2426c5cd460c73fb95acb8391</id>
<content type='text'>
== stat_check.py
num = 0
with open("/proc/stat") as f:
        while num &lt; 1000 :
                data = f.read()
                f.seek(0, 0)
                num = num + 1
==

perf shows

    20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
    13.41%  stat_check.py  [kernel.kallsyms]    [k] number
    12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
    10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
     4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
     4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.150s
user    0m0.026s
sys     0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.055s
user    0m0.022s
sys     0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrea Righi &lt;andrea@betterlinux.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
== stat_check.py
num = 0
with open("/proc/stat") as f:
        while num &lt; 1000 :
                data = f.read()
                f.seek(0, 0)
                num = num + 1
==

perf shows

    20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
    13.41%  stat_check.py  [kernel.kallsyms]    [k] number
    12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
    10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
     4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
     4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.150s
user    0m0.026s
sys     0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.055s
user    0m0.022s
sys     0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrea Righi &lt;andrea@betterlinux.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: speed up /proc/stat handling</title>
<updated>2012-03-23T23:58:42+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-03-23T22:02:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=59a32e2ce5eb809967cac4e718bc527beca83c59'/>
<id>59a32e2ce5eb809967cac4e718bc527beca83c59</id>
<content type='text'>
On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
and is slow :

  # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
  5826
  # grep "cpu " /tmp/STRACE
  read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 &lt;0.001504&gt;

Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)

Fix this by :

 1) Taking into account nr_irqs in the initial buffer sizing.

 2) Using ksize() to allow better filling of initial buffer.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Russell King - ARM Linux &lt;linux@arm.linux.org.uk&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
and is slow :

  # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
  5826
  # grep "cpu " /tmp/STRACE
  read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 &lt;0.001504&gt;

Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)

Fix this by :

 1) Taking into account nr_irqs in the initial buffer sizing.

 2) Using ksize() to allow better filling of initial buffer.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Russell King - ARM Linux &lt;linux@arm.linux.org.uk&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/accounting, proc: Fix /proc/stat interrupts sum</title>
<updated>2012-01-16T07:13:27+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk+kernel@arm.linux.org.uk</email>
</author>
<published>2012-01-14T00:01:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f7e6746ebae984ea67b0a1a1e23c7e6698240631'/>
<id>f7e6746ebae984ea67b0a1a1e23c7e6698240631</id>
<content type='text'>
Commit 3292beb340c7688 ("sched/accounting: Change cpustat fields to an array")
deleted the code which provides us with the sum of all interrupts in the
system, causing vmstat to report zero interrupts occuring in the system.

Fix this by restoring the code.

Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
Tested-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt; # [on ARM]
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Paul Tuner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 3292beb340c7688 ("sched/accounting: Change cpustat fields to an array")
deleted the code which provides us with the sum of all interrupts in the
system, causing vmstat to report zero interrupts occuring in the system.

Fix this by restoring the code.

Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
Tested-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt; # [on ARM]
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Paul Tuner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2012-01-06T16:44:54+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-01-06T16:33:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0db49b72bce26341274b74fd968501489a361ae3'/>
<id>0db49b72bce26341274b74fd968501489a361ae3</id>
<content type='text'>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  sched/tracing: Add a new tracepoint for sleeptime
  sched: Disable scheduler warnings during oopses
  sched: Fix cgroup movement of waking process
  sched: Fix cgroup movement of newly created process
  sched: Fix cgroup movement of forking process
  sched: Remove cfs bandwidth period check in tg_set_cfs_period()
  sched: Fix load-balance lock-breaking
  sched: Replace all_pinned with a generic flags field
  sched: Only queue remote wakeups when crossing cache boundaries
  sched: Add missing rcu_dereference() around -&gt;real_parent usage
  [S390] fix cputime overflow in uptime_proc_show
  [S390] cputime: add sparse checking and cleanup
  sched: Mark parent and real_parent as __rcu
  sched, nohz: Fix missing RCU read lock
  sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
  sched, nohz: Fix the idle cpu check in nohz_idle_balance
  sched: Use jump_labels for sched_feat
  sched/accounting: Fix parameter passing in task_group_account_field
  sched/accounting: Fix user/system tick double accounting
  sched/accounting: Re-use scheduler statistics for the root cgroup
  ...

Fix up conflicts in
 - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
	usecs_to_cputime64() vs the sparse cleanups
 - kernel/sched/fair.c, kernel/time/tick-sched.c
	scheduler changes in multiple branches
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  sched/tracing: Add a new tracepoint for sleeptime
  sched: Disable scheduler warnings during oopses
  sched: Fix cgroup movement of waking process
  sched: Fix cgroup movement of newly created process
  sched: Fix cgroup movement of forking process
  sched: Remove cfs bandwidth period check in tg_set_cfs_period()
  sched: Fix load-balance lock-breaking
  sched: Replace all_pinned with a generic flags field
  sched: Only queue remote wakeups when crossing cache boundaries
  sched: Add missing rcu_dereference() around -&gt;real_parent usage
  [S390] fix cputime overflow in uptime_proc_show
  [S390] cputime: add sparse checking and cleanup
  sched: Mark parent and real_parent as __rcu
  sched, nohz: Fix missing RCU read lock
  sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
  sched, nohz: Fix the idle cpu check in nohz_idle_balance
  sched: Use jump_labels for sched_feat
  sched/accounting: Fix parameter passing in task_group_account_field
  sched/accounting: Fix user/system tick double accounting
  sched/accounting: Re-use scheduler statistics for the root cgroup
  ...

Fix up conflicts in
 - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
	usecs_to_cputime64() vs the sparse cleanups
 - kernel/sched/fair.c, kernel/time/tick-sched.c
	scheduler changes in multiple branches
</pre>
</div>
</content>
</entry>
<entry>
<title>procfs: do not confuse jiffies with cputime64_t</title>
<updated>2011-12-30T00:31:57+00:00</updated>
<author>
<name>Andreas Schwab</name>
<email>schwab@linux-m68k.org</email>
</author>
<published>2011-12-28T23:57:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=34845636a184f3be91a531098192592cbe6db587'/>
<id>34845636a184f3be91a531098192592cbe6db587</id>
<content type='text'>
Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that one some architectures jiffies
and cputime use different units.

This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

Signed-off-by: Andreas Schwab &lt;schwab@linux-m68k.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Artem S. Tashkinov" &lt;t.artem@mailcity.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that one some architectures jiffies
and cputime use different units.

This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

Signed-off-by: Andreas Schwab &lt;schwab@linux-m68k.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "Artem S. Tashkinov" &lt;t.artem@mailcity.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge commit 'v3.2-rc5' into sched/core</title>
<updated>2011-12-15T07:21:30+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2011-12-15T07:21:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6a54aebf6978e9f296a4d3da3e40af425163c22e'/>
<id>6a54aebf6978e9f296a4d3da3e40af425163c22e</id>
<content type='text'>
Merge reason: Pick up the latest fixes.

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge reason: Pick up the latest fixes.

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>procfs: do not overflow get_{idle,iowait}_time for nohz</title>
<updated>2011-12-09T15:50:29+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2011-12-08T22:34:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2a95ea6c0d129b4568fb64e1deda16ceb20e6636'/>
<id>2a95ea6c0d129b4568fb64e1deda16ceb20e6636</id>
<content type='text'>
Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless.  We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t.  This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
&gt;3000s! until we overflow unsigned int.  Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load.  The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Tested-by: Artem S. Tashkinov &lt;t.artem@mailcity.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless.  We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t.  This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
&gt;3000s! until we overflow unsigned int.  Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load.  The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Tested-by: Artem S. Tashkinov &lt;t.artem@mailcity.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
