<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/lib/list_sort.c, branch v7.1-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>lib/list_sort: remove dummy cmp() calls to speed up merge_final()</title>
<updated>2026-04-03T06:36:22+00:00</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2026-03-20T18:09:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=86bda539fbcf17c077b7ff4899968a2dc7c31e2d'/>
<id>86bda539fbcf17c077b7ff4899968a2dc7c31e2d</id>
<content type='text'>
Historically, list_sort() implemented a hack in merge_final():
    if (unlikely(!++count))
        cmp(priv, b, b);

This was introduced 16 years ago in commit 835cc0c8477f ("lib: more
scalable list_sort()") so that callers could periodically invoke
cond_resched() within their comparison functions when merging highly
unbalanced lists.

An audit of the kernel tree reveals that fs/ubifs/ was the sole user of
this mechanism.  Recent discussions and inspections by Richard Weinberger
confirm that UBIFS lists are strictly bounded in size (a few thousand
elements at most), meaning it does not strictly rely on these dummy
callbacks to prevent soft lockups.

For the vast majority of list_sort() users (such as block layer IO
schedulers and file systems), this hack results in completely wasted
function calls.  In the worst-case scenario (merging an already sorted
list where 'a' is exhausted quickly), it results in approximately
(N/2)/256 unnecessary cmp() invocations.

Remove the dummy cmp(priv, b, b) fallback from merge_final().  This saves
unnecessary function calls, avoids branching overhead in the tight loop,
and slightly speeds up the final merge step for all generic list_sort()
users.

[akpm@linux-foundation.org: remove now-unused local]
Link: https://lkml.kernel.org/r/20260320180938.1827148-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: Mars Cheng &lt;marscheng@google.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Yu-Chun Lin &lt;eleanor15x@gmail.com&gt;
Cc: Zhihao Cheng &lt;chengzhihao1@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Historically, list_sort() implemented a hack in merge_final():
    if (unlikely(!++count))
        cmp(priv, b, b);

This was introduced 16 years ago in commit 835cc0c8477f ("lib: more
scalable list_sort()") so that callers could periodically invoke
cond_resched() within their comparison functions when merging highly
unbalanced lists.

An audit of the kernel tree reveals that fs/ubifs/ was the sole user of
this mechanism.  Recent discussions and inspections by Richard Weinberger
confirm that UBIFS lists are strictly bounded in size (a few thousand
elements at most), meaning it does not strictly rely on these dummy
callbacks to prevent soft lockups.

For the vast majority of list_sort() users (such as block layer IO
schedulers and file systems), this hack results in completely wasted
function calls.  In the worst-case scenario (merging an already sorted
list where 'a' is exhausted quickly), it results in approximately
(N/2)/256 unnecessary cmp() invocations.

Remove the dummy cmp(priv, b, b) fallback from merge_final().  This saves
unnecessary function calls, avoids branching overhead in the tight loop,
and slightly speeds up the final merge step for all generic list_sort()
users.

[akpm@linux-foundation.org: remove now-unused local]
Link: https://lkml.kernel.org/r/20260320180938.1827148-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: Mars Cheng &lt;marscheng@google.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Yu-Chun Lin &lt;eleanor15x@gmail.com&gt;
Cc: Zhihao Cheng &lt;chengzhihao1@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/list_sort: clarify comparison function requirements in list_sort()</title>
<updated>2025-01-25T06:47:23+00:00</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2025-01-06T17:01:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e420460ba443e8ad1cd71568c50b6e09d13fb106'/>
<id>e420460ba443e8ad1cd71568c50b6e09d13fb106</id>
<content type='text'>
Add a detailed explanation in the list_sort() kernel doc comment
specifying that the comparison function must satisfy antisymmetry and
transitivity.  These properties are essential for the sorting algorithm to
produce correct results.

Issues have arisen in the past [1][2][3][4] where comparison functions
violated the transitivity property, causing sorting algorithms to fail to
correctly order elements.  While these requirements may seem
straightforward, they are commonly misunderstood or overlooked, leading to
bugs.  Highlighting these properties in the documentation will help
prevent such mistakes in the future.

Link: https://lore.kernel.org/lkml/20240701205639.117194-1-visitorckw@gmail.com [1]
Link: https://lore.kernel.org/lkml/20241203202228.1274403-1-visitorckw@gmail.com [2]
Link: https://lore.kernel.org/lkml/20241209134226.1939163-1-visitorckw@gmail.com [3]
Link: https://lore.kernel.org/lkml/20241209145728.1975311-1-visitorckw@gmail.com [4]
Link: https://lkml.kernel.org/r/20250106170104.3137845-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: &lt;chuang@cs.nycu.edu.tw&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a detailed explanation in the list_sort() kernel doc comment
specifying that the comparison function must satisfy antisymmetry and
transitivity.  These properties are essential for the sorting algorithm to
produce correct results.

Issues have arisen in the past [1][2][3][4] where comparison functions
violated the transitivity property, causing sorting algorithms to fail to
correctly order elements.  While these requirements may seem
straightforward, they are commonly misunderstood or overlooked, leading to
bugs.  Highlighting these properties in the documentation will help
prevent such mistakes in the future.

Link: https://lore.kernel.org/lkml/20240701205639.117194-1-visitorckw@gmail.com [1]
Link: https://lore.kernel.org/lkml/20241203202228.1274403-1-visitorckw@gmail.com [2]
Link: https://lore.kernel.org/lkml/20241209134226.1939163-1-visitorckw@gmail.com [3]
Link: https://lore.kernel.org/lkml/20241209145728.1975311-1-visitorckw@gmail.com [4]
Link: https://lkml.kernel.org/r/20250106170104.3137845-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: &lt;chuang@cs.nycu.edu.tw&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/list_sort: remove unnecessary header includes</title>
<updated>2024-11-06T01:12:33+00:00</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2024-10-12T04:28:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=908ef9bb4bd36837c3619109bdcf58f6ab00bfc7'/>
<id>908ef9bb4bd36837c3619109bdcf58f6ab00bfc7</id>
<content type='text'>
Patch series "Remove unnecessary header includes from
{tools/}lib/list_sort.c".

Remove outdated and unnecessary header includes from lib/list_sort.c and
tools/lib/list_sort.c.  Additionally, update the hunk exceptions checked
by check_headers.sh to reflect these changes.


This patch (of 3):

After commit 043b3f7b6388 ("lib/list_sort: simplify and remove
MAX_LIST_LENGTH_BITS"), list_sort.c no longer uses ARRAY_SIZE() (which
required kernel.h and bug.h for BUILD_BUG_ON_ZERO via __must_be_array) or
memset() (which required string.h).  As these headers are no longer
needed, removes them.

There are no changes to the generated code, as confirmed by 'objdump -d'. 
Additionally, 'wc -l' shows that the size of lib/.list_sort.o.cmd is
reduced from 259 lines to 101 lines.

Link: https://lkml.kernel.org/r/20241012042828.471614-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20241012042828.471614-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: "Liang, Kan" &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "Remove unnecessary header includes from
{tools/}lib/list_sort.c".

Remove outdated and unnecessary header includes from lib/list_sort.c and
tools/lib/list_sort.c.  Additionally, update the hunk exceptions checked
by check_headers.sh to reflect these changes.


This patch (of 3):

After commit 043b3f7b6388 ("lib/list_sort: simplify and remove
MAX_LIST_LENGTH_BITS"), list_sort.c no longer uses ARRAY_SIZE() (which
required kernel.h and bug.h for BUILD_BUG_ON_ZERO via __must_be_array) or
memset() (which required string.h).  As these headers are no longer
needed, removes them.

There are no changes to the generated code, as confirmed by 'objdump -d'. 
Additionally, 'wc -l' shows that the size of lib/.list_sort.o.cmd is
reduced from 259 lines to 101 lines.

Link: https://lkml.kernel.org/r/20241012042828.471614-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20241012042828.471614-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: "Liang, Kan" &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib: fix spelling mistakes</title>
<updated>2021-07-08T18:48:20+00:00</updated>
<author>
<name>Zhen Lei</name>
<email>thunder.leizhen@huawei.com</email>
</author>
<published>2021-07-08T01:07:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9dbbc3b9d09d6deba9f3b9e1d5b355032ed46a75'/>
<id>9dbbc3b9d09d6deba9f3b9e1d5b355032ed46a75</id>
<content type='text'>
Fix some spelling mistakes in comments:
permanentely ==&gt; permanently
wont ==&gt; won't
remaning ==&gt; remaining
succed ==&gt; succeed
shouldnt ==&gt; shouldn't
alpha-numeric ==&gt; alphanumeric
storeing ==&gt; storing
funtion ==&gt; function
documenation ==&gt; documentation
Determin ==&gt; Determine
intepreted ==&gt; interpreted
ammount ==&gt; amount
obious ==&gt; obvious
interupts ==&gt; interrupts
occured ==&gt; occurred
asssociated ==&gt; associated
taking into acount ==&gt; taking into account
squence ==&gt; sequence
stil ==&gt; still
contiguos ==&gt; contiguous
matchs ==&gt; matches

Link: https://lkml.kernel.org/r/20210607072555.12416-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Reviewed-by: Jacob Keller &lt;jacob.e.keller@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix some spelling mistakes in comments:
permanentely ==&gt; permanently
wont ==&gt; won't
remaning ==&gt; remaining
succed ==&gt; succeed
shouldnt ==&gt; shouldn't
alpha-numeric ==&gt; alphanumeric
storeing ==&gt; storing
funtion ==&gt; function
documenation ==&gt; documentation
Determin ==&gt; Determine
intepreted ==&gt; interpreted
ammount ==&gt; amount
obious ==&gt; obvious
interupts ==&gt; interrupts
occured ==&gt; occurred
asssociated ==&gt; associated
taking into acount ==&gt; taking into account
squence ==&gt; sequence
stil ==&gt; still
contiguos ==&gt; contiguous
matchs ==&gt; matches

Link: https://lkml.kernel.org/r/20210607072555.12416-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Reviewed-by: Jacob Keller &lt;jacob.e.keller@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/list_sort.c: fix typo in function description</title>
<updated>2021-05-07T02:24:12+00:00</updated>
<author>
<name>ToastC</name>
<email>mrtoastcheng@gmail.com</email>
</author>
<published>2021-05-07T01:03:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e89b6358052de202e53e47623f50b6d28182ccdf'/>
<id>e89b6358052de202e53e47623f50b6d28182ccdf</id>
<content type='text'>
Replace beautiully with beautifully

Link: https://lkml.kernel.org/r/20210315090633.9759-1-mrtoastcheng@gmail.com
Signed-off-by: ShihCheng Tu &lt;mrtoastcheng@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace beautiully with beautifully

Link: https://lkml.kernel.org/r/20210315090633.9759-1-mrtoastcheng@gmail.com
Signed-off-by: ShihCheng Tu &lt;mrtoastcheng@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Change list_sort to use const pointers</title>
<updated>2021-04-08T23:04:22+00:00</updated>
<author>
<name>Sami Tolvanen</name>
<email>samitolvanen@google.com</email>
</author>
<published>2021-04-08T18:28:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4f0f586bf0c898233d8f316f471a21db2abd522d'/>
<id>4f0f586bf0c898233d8f316f471a21db2abd522d</id>
<content type='text'>
list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Tested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Sami Tolvanen &lt;samitolvanen@google.com&gt;
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Tested-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
</pre>
</div>
</content>
</entry>
<entry>
<title>lib: list_sort.c: add a blank line to avoid kernel-doc warnings</title>
<updated>2019-06-20T20:07:34+00:00</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab+samsung@kernel.org</email>
</author>
<published>2019-06-18T18:51:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4ae5b8f2140d18788979153370c83cf925092b5c'/>
<id>4ae5b8f2140d18788979153370c83cf925092b5c</id>
<content type='text'>
In order for a list to be recognized as such, blank lines
are required.

Solve those Sphinx warnings:

./lib/list_sort.c:162: WARNING: Unexpected indentation.
./lib/list_sort.c:163: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order for a list to be recognized as such, blank lines
are required.

Solve those Sphinx warnings:

./lib/list_sort.c:162: WARNING: Unexpected indentation.
./lib/list_sort.c:163: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/list_sort: fix kerneldoc build error</title>
<updated>2019-05-23T15:27:39+00:00</updated>
<author>
<name>Jonathan Corbet</name>
<email>corbet@lwn.net</email>
</author>
<published>2019-05-22T19:41:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f35a1abd9e7a0d37a1bcc1149eaf2aa737d7ea21'/>
<id>f35a1abd9e7a0d37a1bcc1149eaf2aa737d7ea21</id>
<content type='text'>
Commit 043b3f7b6388 ("lib/list_sort: simplify and remove
MAX_LIST_LENGTH_BITS") added some useful kerneldoc info, but also broke the
docs build:

  ./lib/list_sort.c:128: WARNING: Definition list ends without a blank line; unexpected unindent.
  ./lib/list_sort.c:161: WARNING: Unexpected indentation.
  ./lib/list_sort.c:162: WARNING: Block quote ends without a blank line; unexpected unindent.

Fix the offending literal block and make the error go away.

Fixes: 043b3f7b6388 ("lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS")
Cc: George Spelvin &lt;lkml@sdf.org&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 043b3f7b6388 ("lib/list_sort: simplify and remove
MAX_LIST_LENGTH_BITS") added some useful kerneldoc info, but also broke the
docs build:

  ./lib/list_sort.c:128: WARNING: Definition list ends without a blank line; unexpected unindent.
  ./lib/list_sort.c:161: WARNING: Unexpected indentation.
  ./lib/list_sort.c:162: WARNING: Block quote ends without a blank line; unexpected unindent.

Fix the offending literal block and make the error go away.

Fixes: 043b3f7b6388 ("lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS")
Cc: George Spelvin &lt;lkml@sdf.org&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/list_sort: optimize number of calls to comparison function</title>
<updated>2019-05-15T02:52:49+00:00</updated>
<author>
<name>George Spelvin</name>
<email>lkml@sdf.org</email>
</author>
<published>2019-05-14T22:43:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b5c56e0cdd62979dd538e5363b06be5bdf735a09'/>
<id>b5c56e0cdd62979dd538e5363b06be5bdf735a09</id>
<content type='text'>
CONFIG_RETPOLINE has severely degraded indirect function call
performance, so it's worth putting some effort into reducing the number
of times cmp() is called.

This patch avoids badly unbalanced merges on unlucky input sizes.  It
slightly increases the code size, but saves an average of 0.2*n calls to
cmp().

x86-64 code size 739 -&gt; 803 bytes (+64)

Unfortunately, there's not a lot of low-hanging fruit in a merge sort;
it already performs only n*log2(n) - K*n + O(1) compares.  The leading
coefficient is already at the theoretical limit (log2(n!) corresponds to
K=1.4427), so we're fighting over the linear term, and the best
mergesort can do is K=1.2645, achieved when n is a power of 2.

The differences between mergesort variants appear when n is *not* a
power of 2; K is a function of the fractional part of log2(n).  Top-down
mergesort does best of all, achieving a minimum K=1.2408, and an average
(over all sizes) K=1.248.  However, that requires knowing the number of
entries to be sorted ahead of time, and making a full pass over the
input to count it conflicts with a second performance goal, which is
cache blocking.

Obviously, we have to read the entire list into L1 cache at some point,
and performance is best if it fits.  But if it doesn't fit, each full
pass over the input causes a cache miss per element, which is
undesirable.

While textbooks explain bottom-up mergesort as a succession of merging
passes, practical implementations do merging in depth-first order: as
soon as two lists of the same size are available, they are merged.  This
allows as many merge passes as possible to fit into L1; only the final
few merges force cache misses.

This cache-friendly depth-first merge order depends on us merging the
beginning of the input as much as possible before we've even seen the
end of the input (and thus know its size).

The simple eager merge pattern causes bad performance when n is just
over a power of 2.  If n=1028, the final merge is between 1024- and
4-element lists, which is wasteful of comparisons.  (This is actually
worse on average than n=1025, because a 1204:1 merge will, on average,
end after 512 compares, while 1024:4 will walk 4/5 of the list.)

Because of this, bottom-up mergesort achieves K &lt; 0.5 for such sizes,
and has an average (over all sizes) K of around 1.  (My experiments show
K=1.01, while theory predicts K=0.965.)

There are "worst-case optimal" variants of bottom-up mergesort which
avoid this bad performance, but the algorithms given in the literature,
such as queue-mergesort and boustrodephonic mergesort, depend on the
breadth-first multi-pass structure that we are trying to avoid.

This implementation is as eager as possible while ensuring that all
merge passes are at worst 1:2 unbalanced.  This achieves the same
average K=1.207 as queue-mergesort, which is 0.2*n better then
bottom-up, and only 0.04*n behind top-down mergesort.

Specifically, defers merging two lists of size 2^k until it is known
that there are 2^k additional inputs following.  This ensures that the
final uneven merges triggered by reaching the end of the input will be
at worst 2:1.  This will avoid cache misses as long as 3*2^k elements
fit into the cache.

(I confess to being more than a little bit proud of how clean this code
turned out.  It took a lot of thinking, but the resultant inner loop is
very simple and efficient.)

Refs:
  Bottom-up Mergesort: A Detailed Analysis
  Wolfgang Panny, Helmut Prodinger
  Algorithmica 14(4):340--354, October 1995
  https://doi.org/10.1007/BF01294131
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260

  The cost distribution of queue-mergesort, optimal mergesorts, and
  power-of-two rules
  Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
  Journal of Algorithms 30(2); Pages 423--448, February 1999
  https://doi.org/10.1006/jagm.1998.0986
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380

  Queue-Mergesort
  Mordecai J. Golin, Robert Sedgewick
  Information Processing Letters, 48(5):253--259, 10 December 1993
  https://doi.org/10.1016/0020-0190(93)90088-q
  https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

Feedback from Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;.

Link: http://lkml.kernel.org/r/fd560853cc4dca0d0f02184ffa888b4c1be89abc.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin &lt;lkml@sdf.org&gt;
Acked-by: Andrey Abramov &lt;st5pub@yandex.ru&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Daniel Wagner &lt;daniel.wagner@siemens.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Don Mullis &lt;don.mullis@gmail.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
CONFIG_RETPOLINE has severely degraded indirect function call
performance, so it's worth putting some effort into reducing the number
of times cmp() is called.

This patch avoids badly unbalanced merges on unlucky input sizes.  It
slightly increases the code size, but saves an average of 0.2*n calls to
cmp().

x86-64 code size 739 -&gt; 803 bytes (+64)

Unfortunately, there's not a lot of low-hanging fruit in a merge sort;
it already performs only n*log2(n) - K*n + O(1) compares.  The leading
coefficient is already at the theoretical limit (log2(n!) corresponds to
K=1.4427), so we're fighting over the linear term, and the best
mergesort can do is K=1.2645, achieved when n is a power of 2.

The differences between mergesort variants appear when n is *not* a
power of 2; K is a function of the fractional part of log2(n).  Top-down
mergesort does best of all, achieving a minimum K=1.2408, and an average
(over all sizes) K=1.248.  However, that requires knowing the number of
entries to be sorted ahead of time, and making a full pass over the
input to count it conflicts with a second performance goal, which is
cache blocking.

Obviously, we have to read the entire list into L1 cache at some point,
and performance is best if it fits.  But if it doesn't fit, each full
pass over the input causes a cache miss per element, which is
undesirable.

While textbooks explain bottom-up mergesort as a succession of merging
passes, practical implementations do merging in depth-first order: as
soon as two lists of the same size are available, they are merged.  This
allows as many merge passes as possible to fit into L1; only the final
few merges force cache misses.

This cache-friendly depth-first merge order depends on us merging the
beginning of the input as much as possible before we've even seen the
end of the input (and thus know its size).

The simple eager merge pattern causes bad performance when n is just
over a power of 2.  If n=1028, the final merge is between 1024- and
4-element lists, which is wasteful of comparisons.  (This is actually
worse on average than n=1025, because a 1204:1 merge will, on average,
end after 512 compares, while 1024:4 will walk 4/5 of the list.)

Because of this, bottom-up mergesort achieves K &lt; 0.5 for such sizes,
and has an average (over all sizes) K of around 1.  (My experiments show
K=1.01, while theory predicts K=0.965.)

There are "worst-case optimal" variants of bottom-up mergesort which
avoid this bad performance, but the algorithms given in the literature,
such as queue-mergesort and boustrodephonic mergesort, depend on the
breadth-first multi-pass structure that we are trying to avoid.

This implementation is as eager as possible while ensuring that all
merge passes are at worst 1:2 unbalanced.  This achieves the same
average K=1.207 as queue-mergesort, which is 0.2*n better then
bottom-up, and only 0.04*n behind top-down mergesort.

Specifically, defers merging two lists of size 2^k until it is known
that there are 2^k additional inputs following.  This ensures that the
final uneven merges triggered by reaching the end of the input will be
at worst 2:1.  This will avoid cache misses as long as 3*2^k elements
fit into the cache.

(I confess to being more than a little bit proud of how clean this code
turned out.  It took a lot of thinking, but the resultant inner loop is
very simple and efficient.)

Refs:
  Bottom-up Mergesort: A Detailed Analysis
  Wolfgang Panny, Helmut Prodinger
  Algorithmica 14(4):340--354, October 1995
  https://doi.org/10.1007/BF01294131
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260

  The cost distribution of queue-mergesort, optimal mergesorts, and
  power-of-two rules
  Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
  Journal of Algorithms 30(2); Pages 423--448, February 1999
  https://doi.org/10.1006/jagm.1998.0986
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380

  Queue-Mergesort
  Mordecai J. Golin, Robert Sedgewick
  Information Processing Letters, 48(5):253--259, 10 December 1993
  https://doi.org/10.1016/0020-0190(93)90088-q
  https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

Feedback from Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;.

Link: http://lkml.kernel.org/r/fd560853cc4dca0d0f02184ffa888b4c1be89abc.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin &lt;lkml@sdf.org&gt;
Acked-by: Andrey Abramov &lt;st5pub@yandex.ru&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Daniel Wagner &lt;daniel.wagner@siemens.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Don Mullis &lt;don.mullis@gmail.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS</title>
<updated>2019-05-15T02:52:49+00:00</updated>
<author>
<name>George Spelvin</name>
<email>lkml@sdf.org</email>
</author>
<published>2019-05-14T22:42:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=043b3f7b6388fca6be86ca82979f66c5723a0d10'/>
<id>043b3f7b6388fca6be86ca82979f66c5723a0d10</id>
<content type='text'>
Rather than a fixed-size array of pending sorted runs, use the -&gt;prev
links to keep track of things.  This reduces stack usage, eliminates
some ugly overflow handling, and reduces the code size.

Also:
* merge() no longer needs to handle NULL inputs, so simplify.
* The same applies to merge_and_restore_back_links(), which is renamed
  to the less ponderous merge_final().  (It's a static helper function,
  so we don't need a super-descriptive name; comments will do.)
* Document the actual return value requirements on the (*cmp)()
  function; some callers are already using this feature.

x86-64 code size 1086 -&gt; 739 bytes (-347)

(Yes, I see checkpatch complaining about no space after comma in
"__attribute__((nonnull(2,3,4,5)))".  Checkpatch is wrong.)

Feedback from Rasmus Villemoes, Andy Shevchenko and Geert Uytterhoeven.

[akpm@linux-foundation.org: remove __pure usage due to mysterious warning]
Link: http://lkml.kernel.org/r/f63c410e0ff76009c9b58e01027e751ff7fdb749.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin &lt;lkml@sdf.org&gt;
Acked-by: Andrey Abramov &lt;st5pub@yandex.ru&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Daniel Wagner &lt;daniel.wagner@siemens.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Don Mullis &lt;don.mullis@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rather than a fixed-size array of pending sorted runs, use the -&gt;prev
links to keep track of things.  This reduces stack usage, eliminates
some ugly overflow handling, and reduces the code size.

Also:
* merge() no longer needs to handle NULL inputs, so simplify.
* The same applies to merge_and_restore_back_links(), which is renamed
  to the less ponderous merge_final().  (It's a static helper function,
  so we don't need a super-descriptive name; comments will do.)
* Document the actual return value requirements on the (*cmp)()
  function; some callers are already using this feature.

x86-64 code size 1086 -&gt; 739 bytes (-347)

(Yes, I see checkpatch complaining about no space after comma in
"__attribute__((nonnull(2,3,4,5)))".  Checkpatch is wrong.)

Feedback from Rasmus Villemoes, Andy Shevchenko and Geert Uytterhoeven.

[akpm@linux-foundation.org: remove __pure usage due to mysterious warning]
Link: http://lkml.kernel.org/r/f63c410e0ff76009c9b58e01027e751ff7fdb749.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin &lt;lkml@sdf.org&gt;
Acked-by: Andrey Abramov &lt;st5pub@yandex.ru&gt;
Acked-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Reviewed-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Daniel Wagner &lt;daniel.wagner@siemens.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Don Mullis &lt;don.mullis@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
