<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arm/lib, branch v2.6.33</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge branch 'for-rmk' of git://linux-arm.org/linux-2.6</title>
<updated>2009-09-19T12:47:57+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk+kernel@arm.linux.org.uk</email>
</author>
<published>2009-09-19T12:47:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=40d743b8c16a8cf6e30c1d941aa6147f9550ea75'/>
<id>40d743b8c16a8cf6e30c1d941aa6147f9550ea75</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 5701/1: ARM: copy_page.S: take into account the size of the cache line</title>
<updated>2009-09-15T21:07:02+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill@shutemov.name</email>
</author>
<published>2009-09-15T09:26:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dca230f00d737353e2dffae489c916b41971921f'/>
<id>dca230f00d737353e2dffae489c916b41971921f</id>
<content type='text'>
Optimized version of copy_page() was written with assumption that cache
line size is 32 bytes. On Cortex-A8 cache line size is 64 bytes.

This patch tries to generalize copy_page() to work with any cache line
size if cache line size is multiple of 16 and page size is multiple of
two cache line size.

After this optimization we've got ~25% speedup on OMAP3(tested in
userspace).

There is test for kernelspace which trigger copy-on-write after fork():

 #include &lt;stdlib.h&gt;
 #include &lt;string.h&gt;
 #include &lt;unistd.h&gt;

 #define BUF_SIZE (10000*4096)
 #define NFORK 200

 int main(int argc, char **argv)
 {
         char *buf = malloc(BUF_SIZE);
         int i;

         memset(buf, 0, BUF_SIZE);

         for(i = 0; i &lt; NFORK; i++) {
                 if (fork()) {
                         wait(NULL);
                 } else {
                         int j;

                         for(j = 0; j &lt; BUF_SIZE; j+= 4096)
                                 buf[j] = (j &amp; 0xFF) + 1;
                         break;
                 }
         }

         free(buf);
         return 0;
 }

Before optimization this test takes ~66 seconds, after optimization
takes ~56 seconds.

Signed-off-by: Siarhei Siamashka &lt;siarhei.siamashka@nokia.com&gt;
Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Optimized version of copy_page() was written with assumption that cache
line size is 32 bytes. On Cortex-A8 cache line size is 64 bytes.

This patch tries to generalize copy_page() to work with any cache line
size if cache line size is multiple of 16 and page size is multiple of
two cache line size.

After this optimization we've got ~25% speedup on OMAP3(tested in
userspace).

There is test for kernelspace which trigger copy-on-write after fork():

 #include &lt;stdlib.h&gt;
 #include &lt;string.h&gt;
 #include &lt;unistd.h&gt;

 #define BUF_SIZE (10000*4096)
 #define NFORK 200

 int main(int argc, char **argv)
 {
         char *buf = malloc(BUF_SIZE);
         int i;

         memset(buf, 0, BUF_SIZE);

         for(i = 0; i &lt; NFORK; i++) {
                 if (fork()) {
                         wait(NULL);
                 } else {
                         int j;

                         for(j = 0; j &lt; BUF_SIZE; j+= 4096)
                                 buf[j] = (j &amp; 0xFF) + 1;
                         break;
                 }
         }

         free(buf);
         return 0;
 }

Before optimization this test takes ~66 seconds, after optimization
takes ~56 seconds.

Signed-off-by: Siarhei Siamashka &lt;siarhei.siamashka@nokia.com&gt;
Signed-off-by: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Nicolas Pitre has a new email address</title>
<updated>2009-09-15T16:37:12+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@fluxnic.net</email>
</author>
<published>2009-09-14T07:25:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2f82af08fcc7dc01a7e98a49a5995a77e32a2925'/>
<id>2f82af08fcc7dc01a7e98a49a5995a77e32a2925</id>
<content type='text'>
Due to problems at cam.org, my nico@cam.org email address is no longer
valid.  FRom now on, nico@fluxnic.net should be used instead.

Signed-off-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Due to problems at cam.org, my nico@cam.org email address is no longer
valid.  FRom now on, nico@fluxnic.net should be used instead.

Signed-off-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-rmk-2.6.32' of git://git.pengutronix.de/git/ukl/linux-2.6 into devel-stable</title>
<updated>2009-08-15T15:51:48+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk@dyn-67.arm.linux.org.uk</email>
</author>
<published>2009-08-15T15:51:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9b2616c2e8cc98ca98bbb40cad83a8d3d859e840'/>
<id>9b2616c2e8cc98ca98bbb40cad83a8d3d859e840</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Complete irq tracing support for ARM</title>
<updated>2009-08-13T18:34:37+00:00</updated>
<author>
<name>Uwe Kleine-König</name>
<email>u.kleine-koenig@pengutronix.de</email>
</author>
<published>2009-08-13T18:38:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0d928b0b616d1c5c5fe76019a87cba171ca91633'/>
<id>0d928b0b616d1c5c5fe76019a87cba171ca91633</id>
<content type='text'>
Before this patch enabling and disabling irqs in assembler code and by
the hardware wasn't tracked completly.

I had to transpose two instructions in arch/arm/lib/bitops.h because
restore_irqs doesn't preserve the flags with CONFIG_TRACE_IRQFLAGS=y

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before this patch enabling and disabling irqs in assembler code and by
the hardware wasn't tracked completly.

I had to transpose two instructions in arch/arm/lib/bitops.h because
restore_irqs doesn't preserve the flags with CONFIG_TRACE_IRQFLAGS=y

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Thumb-2: Implement the unified arch/arm/lib functions</title>
<updated>2009-07-24T11:32:57+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2009-07-24T11:32:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8b592783a2e8b7721a99730bd549aab5208f36af'/>
<id>8b592783a2e8b7721a99730bd549aab5208f36af</id>
<content type='text'>
This patch adds the ARM/Thumb-2 unified support for the arch/arm/lib/*
files.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;



</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds the ARM/Thumb-2 unified support for the arch/arm/lib/*
files.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;



</pre>
</div>
</content>
</entry>
<entry>
<title>Thumb-2: Add some .align statements to the .S files</title>
<updated>2009-07-24T11:32:52+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2009-07-24T11:32:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=88987ef91b99cf99bc5d167caeb31d4958fbf931'/>
<id>88987ef91b99cf99bc5d167caeb31d4958fbf931</id>
<content type='text'>
Since the Thumb-2 instructions can be 16-bit wide, data in the .text
sections may not be aligned to a 32-bit word and this leads to unaligned
exceptions. This patch does not affect the ARM code generation.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the Thumb-2 instructions can be 16-bit wide, data in the .text
sections may not be aligned to a 32-bit word and this leads to unaligned
exceptions. This patch does not affect the ARM code generation.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'copy_user' of git://git.marvell.com/orion into devel</title>
<updated>2009-06-14T09:59:32+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk@dyn-67.arm.linux.org.uk</email>
</author>
<published>2009-06-14T09:59:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=98797a241e28b787b84d308b867ec4c5fe7bbdf8'/>
<id>98797a241e28b787b84d308b867ec4c5fe7bbdf8</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[ARM] alternative copy_to_user: more precise fallback threshold</title>
<updated>2009-05-30T05:10:15+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@cam.org</email>
</author>
<published>2009-05-30T01:55:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c626e3f5ca1d95ad2204d3128c26e7678714eb55'/>
<id>c626e3f5ca1d95ad2204d3128c26e7678714eb55</id>
<content type='text'>
Previous size thresholds were guessed from various user space benchmarks
using a kernel with and without the alternative uaccess option.  This
is however not as precise as a kernel based test to measure the real
speed of each method.

This adds a simple test bench to show the time needed for each method.
With this, the optimal size treshold for the alternative implementation
can be determined with more confidence.  It appears that the optimal
threshold for both copy_to_user and clear_user is around 64 bytes. This
is not a surprise knowing that the memcpy and memset implementations
need at least 64 bytes to achieve maximum throughput.

One might suggest that such test be used to determine the optimal
threshold at run time instead, but results are near enough to 64 on
tested targets concerned by this alternative copy_to_user implementation,
so adding some overhead associated with a variable threshold is probably
not worth it for now.

Signed-off-by: Nicolas Pitre &lt;nico@marvell.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previous size thresholds were guessed from various user space benchmarks
using a kernel with and without the alternative uaccess option.  This
is however not as precise as a kernel based test to measure the real
speed of each method.

This adds a simple test bench to show the time needed for each method.
With this, the optimal size treshold for the alternative implementation
can be determined with more confidence.  It appears that the optimal
threshold for both copy_to_user and clear_user is around 64 bytes. This
is not a surprise knowing that the memcpy and memset implementations
need at least 64 bytes to achieve maximum throughput.

One might suggest that such test be used to determine the optimal
threshold at run time instead, but results are near enough to 64 on
tested targets concerned by this alternative copy_to_user implementation,
so adding some overhead associated with a variable threshold is probably
not worth it for now.

Signed-off-by: Nicolas Pitre &lt;nico@marvell.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[ARM] lower overhead with alternative copy_to_user for small copies</title>
<updated>2009-05-30T02:38:33+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@cam.org</email>
</author>
<published>2009-05-22T02:17:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cb9dc92c0a1b76165c8c334402e27191084b2047'/>
<id>cb9dc92c0a1b76165c8c334402e27191084b2047</id>
<content type='text'>
Because the alternate copy_to_user implementation has a higher setup cost
than the standard implementation, the size of the memory area to copy
is tested and the standard implementation invoked instead when that size
is too small.  Still, that test is made after the processor has preserved
a bunch of registers on the stack which have to be reloaded right away
needlessly in that case, causing a measurable performance regression
compared to plain usage of the standard implementation only.

To make the size test overhead negligible, let's factorize it out of
the alternate copy_to_user function where it is clear to the compiler
that no stack frame is needed.  Thanks to CONFIG_ARM_UNWIND allowing
for frame pointers to be disabled and tail call optimization to kick in,
the overhead in the small copy case becomes only 3 assembly instructions.

A similar trick is applied to clear_user as well.

Signed-off-by: Nicolas Pitre &lt;nico@marvell.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Because the alternate copy_to_user implementation has a higher setup cost
than the standard implementation, the size of the memory area to copy
is tested and the standard implementation invoked instead when that size
is too small.  Still, that test is made after the processor has preserved
a bunch of registers on the stack which have to be reloaded right away
needlessly in that case, causing a measurable performance regression
compared to plain usage of the standard implementation only.

To make the size test overhead negligible, let's factorize it out of
the alternate copy_to_user function where it is clear to the compiler
that no stack frame is needed.  Thanks to CONFIG_ARM_UNWIND allowing
for frame pointers to be disabled and tail call optimization to kick in,
the overhead in the small copy case becomes only 3 assembly instructions.

A similar trick is applied to clear_user as well.

Signed-off-by: Nicolas Pitre &lt;nico@marvell.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
