<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/openrisc/include/asm/string.h, branch v4.11</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>openrisc: Add optimized memcpy routine</title>
<updated>2017-02-24T19:14:36+00:00</updated>
<author>
<name>Stafford Horne</name>
<email>shorne@gmail.com</email>
</author>
<published>2016-03-21T07:16:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f5d45dc9116b17ee830d3425ece1e9485c9bab88'/>
<id>f5d45dc9116b17ee830d3425ece1e9485c9bab88</id>
<content type='text'>
The generic memcpy routine provided in kernel does only byte copies.
Using word copies we can lower boot time and cycles spend in memcpy
quite significantly.

Booting on my de0 nano I see boot times go from 7.2 to 5.6 seconds.
The avg cycles in memcpy during boot go from 6467 to 1887.

I tested several algorithms (see code in previous patch mails)

The implementations I tested and avg cycles:
  - Word Copies + Loop Unrolls + Non Aligned    1882
  - Word Copies + Loop Unrolls                  1887
  - Word Copies                                 2441
  - Byte Copies + Loop Unrolls                  6467
  - Byte Copies                                 7600

In the end I ended up going with Word Copies + Loop Unrolls as it
provides best tradeoff between simplicity and boot speedups.

Signed-off-by: Stafford Horne &lt;shorne@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The generic memcpy routine provided in kernel does only byte copies.
Using word copies we can lower boot time and cycles spend in memcpy
quite significantly.

Booting on my de0 nano I see boot times go from 7.2 to 5.6 seconds.
The avg cycles in memcpy during boot go from 6467 to 1887.

I tested several algorithms (see code in previous patch mails)

The implementations I tested and avg cycles:
  - Word Copies + Loop Unrolls + Non Aligned    1882
  - Word Copies + Loop Unrolls                  1887
  - Word Copies                                 2441
  - Byte Copies + Loop Unrolls                  6467
  - Byte Copies                                 7600

In the end I ended up going with Word Copies + Loop Unrolls as it
provides best tradeoff between simplicity and boot speedups.

Signed-off-by: Stafford Horne &lt;shorne@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>openrisc: Add optimized memset</title>
<updated>2017-02-24T19:14:35+00:00</updated>
<author>
<name>Olof Kindgren</name>
<email>olof.kindgren@gmail.com</email>
</author>
<published>2015-02-06T12:41:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d857a1e253498feb231173218df26f5562c70f09'/>
<id>d857a1e253498feb231173218df26f5562c70f09</id>
<content type='text'>
This adds a hand-optimized assembler version of memset and sets
__HAVE_ARCH_MEMSET to use this version instead of the generic C
routine

Signed-off-by: Olof Kindgren &lt;olof.kindgren@gmail.com&gt;
Signed-off-by: Stafford Horne &lt;shorne@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds a hand-optimized assembler version of memset and sets
__HAVE_ARCH_MEMSET to use this version instead of the generic C
routine

Signed-off-by: Olof Kindgren &lt;olof.kindgren@gmail.com&gt;
Signed-off-by: Stafford Horne &lt;shorne@gmail.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
