<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/locking, branch v3.16.40</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>locking/rtmutex: Prevent dequeue vs. unlock race</title>
<updated>2017-02-23T03:54:41+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-11-30T21:04:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b94195752fb8ac766a5907cd025539f4aa1cef05'/>
<id>b94195752fb8ac766a5907cd025539f4aa1cef05</id>
<content type='text'>
commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l-&gt;owner=T1
		rt_mutex_lock(l)
		lock(l-&gt;wait_lock)
		l-&gt;owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l-&gt;wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l-&gt;wait_lock)
				l-&gt;owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l-&gt;wait_lock)
				schedule()
		signal(-&gt;T2)	signal(-&gt;T3)
		lock(l-&gt;wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l-&gt;wait_lock)
				lock(l-&gt;wait_lock)
				dequeue(T3)
				  ===&gt; wait list is now empty
				deboost()
				 unlock(l-&gt;wait_lock)
		lock(l-&gt;wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l-&gt;owner &amp; ~HAS_WAITERS;
		    l-&gt;owner = owner
		     ==&gt; l-&gt;owner = T1
		  }

				lock(l-&gt;wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l-&gt;owner &amp; ~HAS_WAITERS;
cmpxchg(l-&gt;owner, T1, NULL)
 ===&gt; Success (l-&gt;owner = NULL)
				    l-&gt;owner = owner
				     ==&gt; l-&gt;owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutexes rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex is set as the owner again when the write takes place after the
successfull cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain that this got not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which allowed to decode this subtle problem.

Reported-by: David Daney &lt;ddaney@caviumnetworks.com&gt;
Tested-by: David Daney &lt;david.daney@cavium.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[bwh: Backported to 3.16: use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l-&gt;owner=T1
		rt_mutex_lock(l)
		lock(l-&gt;wait_lock)
		l-&gt;owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l-&gt;wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l-&gt;wait_lock)
				l-&gt;owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l-&gt;wait_lock)
				schedule()
		signal(-&gt;T2)	signal(-&gt;T3)
		lock(l-&gt;wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l-&gt;wait_lock)
				lock(l-&gt;wait_lock)
				dequeue(T3)
				  ===&gt; wait list is now empty
				deboost()
				 unlock(l-&gt;wait_lock)
		lock(l-&gt;wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l-&gt;owner &amp; ~HAS_WAITERS;
		    l-&gt;owner = owner
		     ==&gt; l-&gt;owner = T1
		  }

				lock(l-&gt;wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l-&gt;owner &amp; ~HAS_WAITERS;
cmpxchg(l-&gt;owner, T1, NULL)
 ===&gt; Success (l-&gt;owner = NULL)
				    l-&gt;owner = owner
				     ==&gt; l-&gt;owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutexes rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex is set as the owner again when the write takes place after the
successfull cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain that this got not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which allowed to decode this subtle problem.

Reported-by: David Daney &lt;ddaney@caviumnetworks.com&gt;
Tested-by: David Daney &lt;david.daney@cavium.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[bwh: Backported to 3.16: use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Handle priority boosted tasks proper in setscheduler()</title>
<updated>2015-05-28T09:00:00+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2015-05-05T17:49:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=daf639f6ed118436322db96a1af2aa341d6cd37c'/>
<id>daf639f6ed118436322db96a1af2aa341d6cd37c</id>
<content type='text'>
commit 0782e63bc6fe7e2d3408d250df11d388b7799c6b upstream.

Ronny reported that the following scenario is not handled correctly:

	T1 (prio = 10)
	   lock(rtmutex);

	T2 (prio = 20)
	   lock(rtmutex)
	      boost T1

	T1 (prio = 20)
	   sys_set_scheduler(prio = 30)
	   T1 prio = 30
	   ....
	   sys_set_scheduler(prio = 10)
	   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus &lt;ronny.meeus@gmail.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0782e63bc6fe7e2d3408d250df11d388b7799c6b upstream.

Ronny reported that the following scenario is not handled correctly:

	T1 (prio = 10)
	   lock(rtmutex);

	T2 (prio = 20)
	   lock(rtmutex)
	      boost T1

	T1 (prio = 20)
	   sys_set_scheduler(prio = 30)
	   T1 prio = 30
	   ....
	   sys_set_scheduler(prio = 10)
	   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus &lt;ronny.meeus@gmail.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/rtmutex: Avoid a NULL pointer dereference on deadlock</title>
<updated>2015-03-02T15:04:30+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2015-02-17T15:43:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=eb9abcff096d2b1d89cbf9d968714c339b6f94c3'/>
<id>eb9abcff096d2b1d89cbf9d968714c339b6f94c3</id>
<content type='text'>
commit 8d1e5a1a1ccf5ae9d8a5a0ee7960202ccb0c5429 upstream.

With task_blocks_on_rt_mutex() returning early -EDEADLK we never
add the waiter to the waitqueue. Later, we try to remove it via
remove_waiter() and go boom in rt_mutex_top_waiter() because
rb_entry() gives a NULL pointer.

( Tested on v3.18-RT where rtmutex is used for regular mutex and I
  tried to get one twice in a row. )

Not sure when this started but I guess 397335f004f4 ("rtmutex: Fix
deadlock detector for real") or commit 3d5c9340d194 ("rtmutex:
Handle deadlock detection smarter").

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8d1e5a1a1ccf5ae9d8a5a0ee7960202ccb0c5429 upstream.

With task_blocks_on_rt_mutex() returning early -EDEADLK we never
add the waiter to the waitqueue. Later, we try to remove it via
remove_waiter() and go boom in rt_mutex_top_waiter() because
rb_entry() gives a NULL pointer.

( Tested on v3.18-RT where rtmutex is used for regular mutex and I
  tried to get one twice in a row. )

Not sure when this started but I guess 397335f004f4 ("rtmutex: Fix
deadlock detector for real") or commit 3d5c9340d194 ("rtmutex:
Handle deadlock detection smarter").

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER</title>
<updated>2014-07-16T12:57:13+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-07-11T21:00:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5db6c6fefb1ca0e81e3bd6dd8998bf51c453d823'/>
<id>5db6c6fefb1ca0e81e3bd6dd8998bf51c453d823</id>
<content type='text'>
Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
encapsulate the dependencies for rwsem optimistic spinning.
No logical changes here as it continues to depend on both
SMP and the XADD algorithm variant.

Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Acked-by: Jason Low &lt;jason.low2@hp.com&gt;
[ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
Cc: aswin@hp.com
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
encapsulate the dependencies for rwsem optimistic spinning.
No logical changes here as it continues to depend on both
SMP and the XADD algorithm variant.

Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Acked-by: Jason Low &lt;jason.low2@hp.com&gt;
[ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
Cc: aswin@hp.com
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Waiman Long &lt;Waiman.Long@hp.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/rwsem: Rename 'activity' to 'count'</title>
<updated>2014-07-16T12:56:55+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-07-16T12:54:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=13b9a962a2594ee880c5d50d7f70964da1d4fe5a'/>
<id>13b9a962a2594ee880c5d50d7f70964da1d4fe5a</id>
<content type='text'>
There are two definitions of struct rw_semaphore, one in linux/rwsem.h
and one in linux/rwsem-spinlock.h.

For some reason they have different names for the initial field. This
makes it impossible to use C99 named initialization for
__RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
along with the structure definitions.

The simpler patch is renaming the rwsem-spinlock variant to match the
regular rwsem.

This allows us to switch to C99 named initialization.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are two definitions of struct rw_semaphore, one in linux/rwsem.h
and one in linux/rwsem-spinlock.h.

For some reason they have different names for the initial field. This
makes it impossible to use C99 named initialization for
__RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
along with the structure definitions.

The simpler patch is renaming the rwsem-spinlock variant to match the
regular rwsem.

This allows us to switch to C99 named initialization.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/spinlocks/mcs: Micro-optimize osq_unlock()</title>
<updated>2014-07-16T11:28:06+00:00</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2014-07-14T17:27:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=33ecd2083a9560fbc1ef1b1279ef3ecb4c012a4f'/>
<id>33ecd2083a9560fbc1ef1b1279ef3ecb4c012a4f</id>
<content type='text'>
In the unlock function of the cancellable MCS spinlock, the first
thing we do is to retrive the current CPU's osq node. However, due to
the changes made in the previous patch, in the common case where the
lock is not contended, we wouldn't need to access the current CPU's
osq node anymore.

This patch optimizes this by only retriving this CPU's osq node
after we attempt the initial cmpxchg to unlock the osq and found
that its contended.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-5-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the unlock function of the cancellable MCS spinlock, the first
thing we do is to retrive the current CPU's osq node. However, due to
the changes made in the previous patch, in the common case where the
lock is not contended, we wouldn't need to access the current CPU's
osq node anymore.

This patch optimizes this by only retriving this CPU's osq node
after we attempt the initial cmpxchg to unlock the osq and found
that its contended.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-5-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/spinlocks/mcs: Introduce and use init macro and function for osq locks</title>
<updated>2014-07-16T11:28:05+00:00</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2014-07-14T17:27:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4d9d951e6b5df85ccfca2c5bd8b4f5c71d256b65'/>
<id>4d9d951e6b5df85ccfca2c5bd8b4f5c71d256b65</id>
<content type='text'>
Currently, we initialize the osq lock by directly setting the lock's values. It
would be preferable if we use an init macro to do the initialization like we do
with other locks.

This patch introduces and uses a macro and function for initializing the osq lock.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, we initialize the osq lock by directly setting the lock's values. It
would be preferable if we use an init macro to do the initialization like we do
with other locks.

This patch introduces and uses a macro and function for initializing the osq lock.

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead</title>
<updated>2014-07-16T11:28:04+00:00</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2014-07-14T17:27:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=90631822c5d307b5410500806e8ac3e63928aa3e'/>
<id>90631822c5d307b5410500806e8ac3e63928aa3e</id>
<content type='text'>
The cancellable MCS spinlock is currently used to queue threads that are
doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
the lock would access and queue the local node corresponding to the CPU that
it's running on. Currently, the cancellable MCS lock is implemented by using
pointers to these nodes.

In this patch, instead of operating on pointers to the per-cpu nodes, we
store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
A similar concept is used with the qspinlock.

By operating on the CPU # of the nodes using atomic_t instead of pointers
to those nodes, this can reduce the overhead of the cancellable MCS spinlock
by 32 bits (on 64 bit systems).

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cancellable MCS spinlock is currently used to queue threads that are
doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
the lock would access and queue the local node corresponding to the CPU that
it's running on. Currently, the cancellable MCS lock is implemented by using
pointers to these nodes.

In this patch, instead of operating on pointers to the per-cpu nodes, we
store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
A similar concept is used with the qspinlock.

By operating on the CPU # of the nodes using atomic_t instead of pointers
to those nodes, this can reduce the overhead of the cancellable MCS spinlock
by 32 bits (on 64 bit systems).

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node()</title>
<updated>2014-07-16T11:28:03+00:00</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2014-07-14T17:27:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=046a619d8e9746fa4c0e29e8c6b78e16efc008a8'/>
<id>046a619d8e9746fa4c0e29e8c6b78e16efc008a8</id>
<content type='text'>
Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
named "optimistic_spin_queue". However, in a follow up patch in the series
we will be introducing a new structure that serves as the new "handle" for
the lock. It would make more sense if that structure is named
"optimistic_spin_queue". Additionally, since the current use of the
"optimistic_spin_queue" structure are  "nodes", it might be better if we
rename them to "node" anyway.

This preparatory patch renames all current "optimistic_spin_queue"
to "optimistic_spin_node".

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
named "optimistic_spin_queue". However, in a follow up patch in the series
we will be introducing a new structure that serves as the new "handle" for
the lock. It would make more sense if that structure is named
"optimistic_spin_queue". Additionally, since the current use of the
"optimistic_spin_queue" structure are  "nodes", it might be better if we
rename them to "node" anyway.

This preparatory patch renames all current "optimistic_spin_queue"
to "optimistic_spin_node".

Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Aswin Chandramouleeswaran &lt;aswin@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Josef Bacik &lt;jbacik@fusionio.com&gt;
Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/rwsem: Allow conservative optimistic spinning when readers have lock</title>
<updated>2014-07-16T11:28:02+00:00</updated>
<author>
<name>Jason Low</name>
<email>jason.low2@hp.com</email>
</author>
<published>2014-07-05T03:49:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=37e9562453b813d2ea527bd9531fef2c3c592847'/>
<id>37e9562453b813d2ea527bd9531fef2c3c592847</id>
<content type='text'>
Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning")
introduced a major performance regression for workloads such as
xfs_repair which mix read and write locking of the mmap_sem across
many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
than on 3.15 and using 20x more system CPU time.

Perf profiles indicate in some workloads that significant time can
be spent spinning on !owner. This is because we don't set the lock
owner when readers(s) obtain the rwsem.

In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
return false if there is no lock owner. The rationale is that if we
just entered the slowpath, yet there is no lock owner, then there is
a possibility that a reader has the lock. To be conservative, we'll
avoid spinning in these situations.

This patch reduced the total run time of the xfs_repair workload from
about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
back to close to the same performance as on 3.15.

Retesting of AIM7, which were some of the workloads used to test the
original optimistic spinning code, confirmed that we still get big
performance gains with optimistic spinning, even with this additional
regression fix. Davidlohr found that while the 'custom' workload took
a performance hit of ~-14% to throughput for &gt;300 users with this
additional patch, the overall gain with optimistic spinning is
still ~+45%. The 'disk' workload even improved by ~+15% at &gt;1000 users.

Tested-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning")
introduced a major performance regression for workloads such as
xfs_repair which mix read and write locking of the mmap_sem across
many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
than on 3.15 and using 20x more system CPU time.

Perf profiles indicate in some workloads that significant time can
be spent spinning on !owner. This is because we don't set the lock
owner when readers(s) obtain the rwsem.

In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
return false if there is no lock owner. The rationale is that if we
just entered the slowpath, yet there is no lock owner, then there is
a possibility that a reader has the lock. To be conservative, we'll
avoid spinning in these situations.

This patch reduced the total run time of the xfs_repair workload from
about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
back to close to the same performance as on 3.15.

Retesting of AIM7, which were some of the workloads used to test the
original optimistic spinning code, confirmed that we still get big
performance gains with optimistic spinning, even with this additional
regression fix. Davidlohr found that while the 'custom' workload took
a performance hit of ~-14% to throughput for &gt;300 users with this
additional patch, the overall gain with optimistic spinning is
still ~+45%. The 'disk' workload even improved by ~+15% at &gt;1000 users.

Tested-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Jason Low &lt;jason.low2@hp.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
