<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/sched/cputime.c, branch v4.4.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sched/cputime: Fix invalid gtime in proc</title>
<updated>2015-12-04T09:18:49+00:00</updated>
<author>
<name>Hiroshi Shimamoto</name>
<email>h-shimamoto@ct.jp.nec.com</email>
</author>
<published>2015-11-19T15:47:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2541117b0cf79977fa11a0d6e17d61010677bd7b'/>
<id>2541117b0cf79977fa11a0d6e17d61010677bd7b</id>
<content type='text'>
/proc/stats shows invalid gtime when the thread is running in guest.
When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated with now - tsk-&gt;vtime_snap, but tsk-&gt;vtime_snap
is only updated when vtime accounting is runtime enabled.

This patch makes task_gtime() just return gtime without computing the
buggy non-existing tickless delta when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check if vtime is accounting on
some cpu, in which case only we need to check the tickless delta. This
way we fix the gtime value regression on machines not running nohz full.

The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.

I ran and stop a busy loop in VM and see the gtime in host.
Dump the 43rd field which shows the gtime in every second:

	 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
	S 4348
	R 7064566
	R 7064766
	R 7064967
	R 7065168
	S 4759
	S 4759

During running busy loop, it returns large value.

After applying this patch, we can see right gtime.

	 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
	S 5338
	R 5365
	R 5465
	R 5566
	R 5666
	S 5726
	S 5726

Signed-off-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Chris Metcalf &lt;cmetcalf@ezchip.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul E . McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
/proc/stats shows invalid gtime when the thread is running in guest.
When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated with now - tsk-&gt;vtime_snap, but tsk-&gt;vtime_snap
is only updated when vtime accounting is runtime enabled.

This patch makes task_gtime() just return gtime without computing the
buggy non-existing tickless delta when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check if vtime is accounting on
some cpu, in which case only we need to check the tickless delta. This
way we fix the gtime value regression on machines not running nohz full.

The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.

I ran and stop a busy loop in VM and see the gtime in host.
Dump the 43rd field which shows the gtime in every second:

	 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
	S 4348
	R 7064566
	R 7064766
	R 7064967
	R 7065168
	S 4759
	S 4759

During running busy loop, it returns large value.

After applying this patch, we can see right gtime.

	 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
	S 5338
	R 5365
	R 5465
	R 5566
	R 5666
	S 5726
	S 5726

Signed-off-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Chris Metcalf &lt;cmetcalf@ezchip.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul E . McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support</title>
<updated>2015-10-01T13:06:33+00:00</updated>
<author>
<name>Andrey Smetanin</name>
<email>asmetanin@virtuozzo.com</email>
</author>
<published>2015-09-16T09:29:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9eec50b8bbe1535c440a1ee88c1958f78fc55957'/>
<id>9eec50b8bbe1535c440a1ee88c1958f78fc55957</id>
<content type='text'>
HV_X64_MSR_VP_RUNTIME msr used by guest to get
"the time the virtual processor consumes running guest code,
and the time the associated logical processor spends running
hypervisor code on behalf of that guest."

Calculation of this time is performed by task_cputime_adjusted()
for vcpu task.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.

Signed-off-by: Andrey Smetanin &lt;asmetanin@virtuozzo.com&gt;
Reviewed-by: Roman Kagan &lt;rkagan@virtuozzo.com&gt;
Signed-off-by: Denis V. Lunev &lt;den@openvz.org&gt;
CC: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
CC: Gleb Natapov &lt;gleb@kernel.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
HV_X64_MSR_VP_RUNTIME msr used by guest to get
"the time the virtual processor consumes running guest code,
and the time the associated logical processor spends running
hypervisor code on behalf of that guest."

Calculation of this time is performed by task_cputime_adjusted()
for vcpu task.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.

Signed-off-by: Andrey Smetanin &lt;asmetanin@virtuozzo.com&gt;
Reviewed-by: Roman Kagan &lt;rkagan@virtuozzo.com&gt;
Signed-off-by: Denis V. Lunev &lt;den@openvz.org&gt;
CC: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
CC: Gleb Natapov &lt;gleb@kernel.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/cputime: Guarantee stime + utime == rtime</title>
<updated>2015-08-03T10:21:21+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-06-30T09:30:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9d7fb04276481c59610983362d8e023d262b58ca'/>
<id>9d7fb04276481c59610983362d8e023d262b58ca</id>
<content type='text'>
While the current code guarantees monotonicity for stime and utime
independently of one another, it does not guarantee that the sum of
both is equal to the total time we started out with.

This confuses things (and peoples) who look at this sum, like top, and
will report &gt;100% usage followed by a matching period of 0%.

Rework the code to provide both individual monotonicity and a coherent
sum.

Suggested-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Reported-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Tested-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: jason.low2@hp.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While the current code guarantees monotonicity for stime and utime
independently of one another, it does not guarantee that the sum of
both is equal to the total time we started out with.

This confuses things (and peoples) who look at this sum, like top, and
will report &gt;100% usage followed by a matching period of 0%.

Rework the code to provide both individual monotonicity and a coherent
sum.

Suggested-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Reported-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Tested-by: Fredrik Markstrom &lt;fredrik.markstrom@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: jason.low2@hp.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE()</title>
<updated>2015-05-08T10:11:32+00:00</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2015-04-28T20:00:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=316c1608d15c736439d4065ed12f306db554b3da'/>
<id>316c1608d15c736439d4065ed12f306db554b3da</id>
<content type='text'>
ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes
the rest of the existing usages of ACCESS_ONCE() in the scheduler, and use
the new READ_ONCE() and WRITE_ONCE() APIs as appropriate.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Cc: Scott J Norton &lt;scott.norton@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes
the rest of the existing usages of ACCESS_ONCE() in the scheduler, and use
the new READ_ONCE() and WRITE_ONCE() APIs as appropriate.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Waiman Long &lt;Waiman.Long@hp.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Cc: Scott J Norton &lt;scott.norton@hp.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched, time: Fix build error with 64 bit cputime_t on 32 bit systems</title>
<updated>2014-10-03T03:46:55+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2014-09-30T19:59:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=347abad981c1ef815ea5ba861adba6a8c6aa1580'/>
<id>347abad981c1ef815ea5ba861adba6a8c6aa1580</id>
<content type='text'>
On 32 bit systems cmpxchg cannot handle 64 bit values, so
some additional magic is required to allow a 32 bit system
with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y enabled to build.

Make sure the correct cmpxchg function is used when doing
an atomic swap of a cputime_t.

Reported-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: linux390@de.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20140930155947.070cdb1f@annuminas.surriel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On 32 bit systems cmpxchg cannot handle 64 bit values, so
some additional magic is required to allow a 32 bit system
with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y enabled to build.

Make sure the correct cmpxchg function is used when doing
an atomic swap of a cputime_t.

Reported-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: linux390@de.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20140930155947.070cdb1f@annuminas.surriel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched, time: Fix lock inversion in thread_group_cputime()</title>
<updated>2014-09-19T10:35:17+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2014-09-12T13:12:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9c368b5b6eccce1cbd7f68142106b3b4ddb1c5b5'/>
<id>9c368b5b6eccce1cbd7f68142106b3b4ddb1c5b5</id>
<content type='text'>
The sig-&gt;stats_lock nests inside the tasklist_lock and the
sighand-&gt;siglock in __exit_signal and wait_task_zombie.

However, both of those locks can be taken from irq context,
which means we need to use the interrupt safe variant of
read_seqbegin_or_lock. This blocks interrupts when the "lock"
branch is taken (seq is odd), preventing the lock inversion.

On the first (lockless) pass through the loop, irqs are not
blocked.

Reported-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: prarit@redhat.com
Cc: oleg@redhat.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1410527535-9814-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The sig-&gt;stats_lock nests inside the tasklist_lock and the
sighand-&gt;siglock in __exit_signal and wait_task_zombie.

However, both of those locks can be taken from irq context,
which means we need to use the interrupt safe variant of
read_seqbegin_or_lock. This blocks interrupts when the "lock"
branch is taken (seq is odd), preventing the lock inversion.

On the first (lockless) pass through the loop, irqs are not
blocked.

Reported-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: prarit@redhat.com
Cc: oleg@redhat.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1410527535-9814-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched, time: Atomically increment stime &amp; utime</title>
<updated>2014-09-08T06:17:02+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2014-08-15T20:05:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=eb1b4af0a64ac7bb0ee36f579c1c7cefcbc3ac2c'/>
<id>eb1b4af0a64ac7bb0ee36f579c1c7cefcbc3ac2c</id>
<content type='text'>
The functions task_cputime_adjusted and thread_group_cputime_adjusted()
can be called locklessly, as well as concurrently on many different CPUs.

This can occasionally lead to the utime and stime reported by times(), and
other syscalls like it, going backward. The cause for this appears to be
multiple threads racing in cputime_adjust(), both with values for utime or
stime that is larger than the original, but each with a different value.

Sometimes the larger value gets saved first, only to be immediately
overwritten with a smaller value by another thread.

Using atomic exchange prevents that problem, and ensures time
progresses monotonically.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: akpm@linux-foundation.org
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1408133138-22048-4-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The functions task_cputime_adjusted and thread_group_cputime_adjusted()
can be called locklessly, as well as concurrently on many different CPUs.

This can occasionally lead to the utime and stime reported by times(), and
other syscalls like it, going backward. The cause for this appears to be
multiple threads racing in cputime_adjust(), both with values for utime or
stime that is larger than the original, but each with a different value.

Sometimes the larger value gets saved first, only to be immediately
overwritten with a smaller value by another thread.

Using atomic exchange prevents that problem, and ensures time
progresses monotonically.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: akpm@linux-foundation.org
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1408133138-22048-4-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>time, signal: Protect resource use statistics with seqlock</title>
<updated>2014-09-08T06:17:01+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2014-08-16T17:40:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e78c3496790ee8a36522a838b59b388e8a709e65'/>
<id>e78c3496790ee8a36522a838b59b388e8a709e65</id>
<content type='text'>
Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability
issues on large systems, due to both functions being serialized with a
lock.

The lock protects against reporting a wrong value, due to a thread in the
task group exiting, its statistics reporting up to the signal struct, and
that exited task's statistics being counted twice (or not at all).

Protecting that with a lock results in times() and clock_gettime() being
completely serialized on large systems.

This can be fixed by using a seqlock around the events that gather and
propagate statistics. As an additional benefit, the protection code can
be moved into thread_group_cputime(), slightly simplifying the calling
functions.

In the case of posix_cpu_clock_get_task() things can be simplified a
lot, because the calling function already ensures that the task sticks
around, and the rest is now taken care of in thread_group_cputime().

This way the statistics reporting code can run lockless.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Daeseok Youn &lt;daeseok.youn@gmail.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dongsheng Yang &lt;yangds.fnst@cn.fujitsu.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Guillaume Morin &lt;guillaume@morinfr.org&gt;
Cc: Ionut Alexa &lt;ionut.m.alexa@gmail.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Link: http://lkml.kernel.org/r/20140816134010.26a9b572@annuminas.surriel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability
issues on large systems, due to both functions being serialized with a
lock.

The lock protects against reporting a wrong value, due to a thread in the
task group exiting, its statistics reporting up to the signal struct, and
that exited task's statistics being counted twice (or not at all).

Protecting that with a lock results in times() and clock_gettime() being
completely serialized on large systems.

This can be fixed by using a seqlock around the events that gather and
propagate statistics. As an additional benefit, the protection code can
be moved into thread_group_cputime(), slightly simplifying the calling
functions.

In the case of posix_cpu_clock_get_task() things can be simplified a
lot, because the calling function already ensures that the task sticks
around, and the rest is now taken care of in thread_group_cputime().

This way the statistics reporting code can run lockless.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Daeseok Youn &lt;daeseok.youn@gmail.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dongsheng Yang &lt;yangds.fnst@cn.fujitsu.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Guillaume Morin &lt;guillaume@morinfr.org&gt;
Cc: Ionut Alexa &lt;ionut.m.alexa@gmail.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Link: http://lkml.kernel.org/r/20140816134010.26a9b572@annuminas.surriel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Change thread_group_cputime() to use for_each_thread()</title>
<updated>2014-08-20T07:47:18+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-08-13T19:20:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1e4dda08b4c39b3d8f4a3ee7269d49e0200c8af8'/>
<id>1e4dda08b4c39b3d8f4a3ee7269d49e0200c8af8</id>
<content type='text'>
Change thread_group_cputime() to use for_each_thread() instead of
buggy while_each_thread(). This also makes the pid_alive() check
unnecessary.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Cc: Frank Mayhar &lt;fmayhar@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Sanjay Rao &lt;srao@redhat.com&gt;
Cc: Larry Woodman &lt;lwoodman@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20140813192000.GA19327@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change thread_group_cputime() to use for_each_thread() instead of
buggy while_each_thread(). This also makes the pid_alive() check
unnecessary.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Cc: Frank Mayhar &lt;fmayhar@google.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Sanjay Rao &lt;srao@redhat.com&gt;
Cc: Larry Woodman &lt;lwoodman@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20140813192000.GA19327@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Sanitize irq accounting madness</title>
<updated>2014-05-07T09:51:30+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-05-02T21:26:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2d513868e2a33e1d5315490ef4c861ee65babd65'/>
<id>2d513868e2a33e1d5315490ef4c861ee65babd65</id>
<content type='text'>
Russell reported, that irqtime_account_idle_ticks() takes ages due to:

       for (i = 0; i &lt; ticks; i++)
               irqtime_account_process_tick(current, 0, rq);

It's sad, that this code was written way _AFTER_ the NOHZ idle
functionality was available. I charge myself guitly for not paying
attention when that crap got merged with commit abb74cefa ("sched:
Export ns irqtimes through /proc/stat")

So instead of looping nr_ticks times just apply the whole thing at
once.

As a side note: The whole cputime_t vs. u64 business in that context
wants to be cleaned up as well. There is no point in having all these
back and forth conversions. Lets standardise on u64 nsec for all
kernel internal accounting and be done with it. Everything else does
not make sense at all for fine grained accounting. Frederic, can you
please take care of that?

Reported-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Cc: Shaun Ruffell &lt;sruffell@digium.com&gt;
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1405022307000.6261@ionos.tec.linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Russell reported, that irqtime_account_idle_ticks() takes ages due to:

       for (i = 0; i &lt; ticks; i++)
               irqtime_account_process_tick(current, 0, rq);

It's sad, that this code was written way _AFTER_ the NOHZ idle
functionality was available. I charge myself guitly for not paying
attention when that crap got merged with commit abb74cefa ("sched:
Export ns irqtimes through /proc/stat")

So instead of looping nr_ticks times just apply the whole thing at
once.

As a side note: The whole cputime_t vs. u64 business in that context
wants to be cleaned up as well. There is no point in having all these
back and forth conversions. Lets standardise on u64 nsec for all
kernel internal accounting and be done with it. Everything else does
not make sense at all for fine grained accounting. Frederic, can you
please take care of that?

Reported-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Cc: Shaun Ruffell &lt;sruffell@digium.com&gt;
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1405022307000.6261@ionos.tec.linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
