linux.git/arch/arm/lib, branch v2.6.33

Merge branch 'for-rmk' of git://linux-arm.org/linux-2.6

2009-09-19T12:47:57+00:00

ARM: 5701/1: ARM: copy_page.S: take into account the size of the cache line

2009-09-15T21:07:02+00:00

Optimized version of copy_page() was written with assumption that cache
line size is 32 bytes. On Cortex-A8 cache line size is 64 bytes.

This patch tries to generalize copy_page() to work with any cache line
size if cache line size is multiple of 16 and page size is multiple of
two cache line size.

After this optimization we've got ~25% speedup on OMAP3(tested in
userspace).

There is test for kernelspace which trigger copy-on-write after fork():

 #include 
 #include 
 #include 

 #define BUF_SIZE (10000*4096)
 #define NFORK 200

 int main(int argc, char **argv)
 {
         char *buf = malloc(BUF_SIZE);
         int i;

         memset(buf, 0, BUF_SIZE);

         for(i = 0; i < NFORK; i++) {
                 if (fork()) {
                         wait(NULL);
                 } else {
                         int j;

                         for(j = 0; j < BUF_SIZE; j+= 4096)
                                 buf[j] = (j & 0xFF) + 1;
                         break;
                 }
         }

         free(buf);
         return 0;
 }

Before optimization this test takes ~66 seconds, after optimization
takes ~56 seconds.

Signed-off-by: Siarhei Siamashka 
Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Russell King

Nicolas Pitre has a new email address

2009-09-15T16:37:12+00:00

Due to problems at cam.org, my nico@cam.org email address is no longer
valid.  FRom now on, nico@fluxnic.net should be used instead.

Signed-off-by: Nicolas Pitre 
Signed-off-by: Linus Torvalds

Merge branch 'for-rmk-2.6.32' of git://git.pengutronix.de/git/ukl/linux-2.6 into devel-stable

2009-08-15T15:51:48+00:00

Complete irq tracing support for ARM

2009-08-13T18:34:37+00:00

Before this patch enabling and disabling irqs in assembler code and by
the hardware wasn't tracked completly.

I had to transpose two instructions in arch/arm/lib/bitops.h because
restore_irqs doesn't preserve the flags with CONFIG_TRACE_IRQFLAGS=y

Signed-off-by: Uwe Kleine-König 
Cc: Russell King 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 

Signed-off-by: Uwe Kleine-König

Thumb-2: Implement the unified arch/arm/lib functions

2009-07-24T11:32:57+00:00

This patch adds the ARM/Thumb-2 unified support for the arch/arm/lib/*
files.

Signed-off-by: Catalin Marinas

Thumb-2: Add some .align statements to the .S files

2009-07-24T11:32:52+00:00

Since the Thumb-2 instructions can be 16-bit wide, data in the .text
sections may not be aligned to a 32-bit word and this leads to unaligned
exceptions. This patch does not affect the ARM code generation.

Signed-off-by: Catalin Marinas

Merge branch 'copy_user' of git://git.marvell.com/orion into devel

2009-06-14T09:59:32+00:00

[ARM] alternative copy_to_user: more precise fallback threshold

2009-05-30T05:10:15+00:00

Previous size thresholds were guessed from various user space benchmarks
using a kernel with and without the alternative uaccess option.  This
is however not as precise as a kernel based test to measure the real
speed of each method.

This adds a simple test bench to show the time needed for each method.
With this, the optimal size treshold for the alternative implementation
can be determined with more confidence.  It appears that the optimal
threshold for both copy_to_user and clear_user is around 64 bytes. This
is not a surprise knowing that the memcpy and memset implementations
need at least 64 bytes to achieve maximum throughput.

One might suggest that such test be used to determine the optimal
threshold at run time instead, but results are near enough to 64 on
tested targets concerned by this alternative copy_to_user implementation,
so adding some overhead associated with a variable threshold is probably
not worth it for now.

Signed-off-by: Nicolas Pitre

[ARM] lower overhead with alternative copy_to_user for small copies

2009-05-30T02:38:33+00:00

Because the alternate copy_to_user implementation has a higher setup cost
than the standard implementation, the size of the memory area to copy
is tested and the standard implementation invoked instead when that size
is too small.  Still, that test is made after the processor has preserved
a bunch of registers on the stack which have to be reloaded right away
needlessly in that case, causing a measurable performance regression
compared to plain usage of the standard implementation only.

To make the size test overhead negligible, let's factorize it out of
the alternate copy_to_user function where it is clear to the compiler
that no stack frame is needed.  Thanks to CONFIG_ARM_UNWIND allowing
for frame pointers to be disabled and tail call optimization to kick in,
the overhead in the small copy case becomes only 3 assembly instructions.

A similar trick is applied to clear_user as well.

Signed-off-by: Nicolas Pitre