linux-stable.git/arch/powerpc/kernel/time.c, branch v3.2.73

powerpc/pseries: Duplicate dtl entries sometimes sent to userspace

2014-01-03T04:33:23+00:00

commit 84b073868b9d9e754ae48b828337633d1b386482 upstream.

When reading from the dispatch trace log (dtl) userspace interface, I
sometimes see duplicate entries. One example:

# hexdump -C dtl.out

00000000  07 04 00 0c 00 00 48 44  00 00 00 00 00 00 00 00
00000010  00 0c a0 b4 16 83 6d 68  00 00 00 00 00 00 00 00
00000020  00 00 00 00 10 00 13 50  80 00 00 00 00 00 d0 32

00000030  07 04 00 0c 00 00 48 44  00 00 00 00 00 00 00 00
00000040  00 0c a0 b4 16 83 6d 68  00 00 00 00 00 00 00 00
00000050  00 00 00 00 10 00 13 50  80 00 00 00 00 00 d0 32

The problem is in scan_dispatch_log() where we call dtl_consumer()
but bail out before incrementing the index.

To fix this I moved dtl_consumer() after the timebase comparison.

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

powerpc/vdso: Remove redundant locking in update_vsyscall_tz()

2013-01-16T01:13:16+00:00

commit ce73ec6db47af84d1466402781ae0872a9e7873c upstream.

The locking in update_vsyscall_tz() is not only unnecessary because the vdso
code copies the data unproteced in __kernel_gettimeofday() but also
introduces a hard to reproduce race condition between update_vsyscall()
and update_vsyscall_tz(), which causes user space process to loop
forever in vdso code.

The following patch removes the locking from update_vsyscall_tz().

Locking is not only unnecessary because the vdso code copies the data
unprotected in __kernel_gettimeofday() but also erroneous because updating
the tb_update_count is not atomic and introduces a hard to reproduce race
condition between update_vsyscall() and update_vsyscall_tz(), which further
causes user space process to loop forever in vdso code.

The below scenario describes the race condition,
x==0	Boot CPU			other CPU
	proc_P: x==0
	    timer interrupt
		update_vsyscall
x==1		    x++;sync		settimeofday
					    update_vsyscall_tz
x==2						x++;sync
x==3		    sync;x++
						sync;x++
	proc_P: x==3 (loops until x becomes even)

Because the ++ operator would be implemented as three instructions and not
atomic on powerpc.

A similar change was made for x86 in commit 6c260d58634
("x86: vdso: Remove bogus locking in update_vsyscall_tz")

Signed-off-by: Shan Hai 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Ben Hutchings

powerpc: Fix wrong divisor in usecs_to_cputime

2012-07-25T03:11:38+00:00

commit 9f5072d4f63f28d30d343573830ac6c85fc0deff upstream.

Commit d57af9b (taskstats: use real microsecond granularity for CPU times)
renamed msecs_to_cputime to usecs_to_cputime, but failed to update all
numbers on the way.  This causes nonsensical cpu idle/iowait values to be
displayed in /proc/stat (the only user of usecs_to_cputime so far).

This also renames __cputime_msec_factor to __cputime_usec_factor, adapting
its value and using it directly in cputime_to_usecs instead of doing two
multiplications.

Signed-off-by: Andreas Schwab 
Acked-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Ben Hutchings

powerpc/time: Handle wrapping of decrementer

2012-01-12T19:29:22+00:00

commit 37fb9a0231ee43d42d069863bdfd567fca2b61af upstream.

When re-enabling interrupts we have code to handle edge sensitive
decrementers by resetting the decrementer to 1 whenever it is negative.
If interrupts were disabled long enough that the decrementer wrapped to
positive we do nothing. This means interrupts can be delayed for a long
time until it finally goes negative again.

While we hope interrupts are never be disabled long enough for the
decrementer to go positive, we have a very good test team that can
drive any kernel into the ground. The softlockup data we get back
from these fails could be seconds in the future, completely missing
the cause of the lockup.

We already keep track of the timebase of the next event so use that
to work out if we should trigger a decrementer exception.

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman

powerpc: various straight conversions from module.h --> export.h

2011-10-31T23:30:44+00:00

All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure.  We can shift them off to the
export.h header which is a way smaller footprint and thus
realize some compile time gains.

Signed-off-by: Paul Gortmaker

irq_work, ppc: Fix up arch hooks

2011-07-01T09:02:22+00:00

Commit e360adbe29 ("irq_work: Add generic hardirq context
callbacks") fouled up the ppc bit, not properly naming the
arch specific function that raises the 'self-IPI'.

Cc: Huang Ying 
Cc: Benjamin Herrenschmidt 
Cc: Anton Blanchard 
Cc: Eric B Munson 
Cc: stable@kernel.org # 37+
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-eg0aqien8p1aqvzu9dft6dtv@git.kernel.org
Signed-off-by: Ingo Molnar

powerpc: Fix oops if scan_dispatch_log is called too early

2011-04-18T03:08:19+00:00

We currently enable interrupts before the dispatch log for the boot
cpu is setup. If a timer interrupt comes in early enough we oops in
scan_dispatch_log:

Unable to handle kernel paging request for data at address 0x00000010

...

.scan_dispatch_log+0xb0/0x170
.account_system_vtime+0xa0/0x220
.irq_enter+0x88/0xc0
.do_IRQ+0x48/0x230

The patch below adds a check to scan_dispatch_log to ensure the
dispatch log has been allocated.

Signed-off-by: Anton Blanchard 
Cc: 
Signed-off-by: Benjamin Herrenschmidt

powerpc: Make decrementer interrupt robust against offlined CPUs

2011-04-01T04:37:07+00:00

With some implementations, it is possible that a timer interrupt
occurs every few seconds on an offline CPU. In this case, just
re-arm the decrementer and return immediately

Signed-off-by: Benjamin Herrenschmidt

powerpc: Fix accounting of softirq time when idle

2011-03-29T23:44:18+00:00

commit cf9efce0ce31 (powerpc: Account time using timebase rather
than PURR) used in_irq() to detect if the time was spent in
interrupt processing. This only catches hardirq context so if we
are in softirq context and in the idle loop we end up accounting it
as idle time. If we instead use in_interrupt() we catch both softirq
and hardirq time.

The issue was found when running a network intensive workload. top
showed the following:

0.0%us,  1.1%sy,  0.0%ni, 85.7%id,  0.0%wa,  9.9%hi,  3.3%si,  0.0%st

85.7% idle. But this was wildly different to the perf events data.
To confirm the suspicion I ran something to keep the core busy:

# yes > /dev/null &

8.2%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa, 10.3%hi, 81.4%si,  0.0%st

We only got 8.2% of the CPU for the userspace task and softirq has
shot up to 81.4%.

With the patch below top shows the correct stats:

0.0%us,  0.0%sy,  0.0%ni,  5.3%id,  0.0%wa, 13.3%hi, 81.3%si,  0.0%st

Signed-off-by: Anton Blanchard 
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt

powerpc/cell: Use system_wq in cpufreq_spudemand

2011-01-21T03:08:34+00:00

With cmwq, there's no reason to use a separate workqueue in
cpufreq_spudemand.  Use system_wq instead.  The work items are already
sync canceled on stop, so it's already guaranteed that no work is
running when spu_gov_exit() is entered.

Signed-off-by: Tejun Heo 
Cc: Arnd Bergmann 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Dave Jones 
Cc: cpufreq@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt