linux.git/arch/arm/lib, branch v3.17

ARM: 8137/1: fix get_user BE behavior for target variable with size of 8 bytes

2014-09-12T16:38:59+00:00

e38361d 'ARM: 8091/2: add get_user() support for 8 byte types' commit
broke V7 BE get_user call when target var size is 64 bit, but '*ptr' size
is 32 bit or smaller. e38361d changed type of __r2 from 'register
unsigned long' to 'register typeof(x) __r2 asm("r2")' i.e before the change
even when target variable size was 64 bit, __r2 was still 32 bit.
But after e38361d commit, for target var of 64 bit size, __r2 became 64
bit and now it should occupy 2 registers r2, and r3. The issue in BE case
that r3 register is least significant word of __r2 and r2 register is most
significant word of __r2. But __get_user_4 still copies result into r2 (most
significant word of __r2). Subsequent code copies from __r2 into x, but
for situation described it will pick up only garbage from r3 register.

Special __get_user_64t_(124) functions are introduced. They are similar to
corresponding __get_user_(124) function but result stored in r3 register
(lsw in case of 64 bit __r2 in BE image). Those function are used by
get_user macro in case of BE and target var size is 64bit.

Also changed __get_user_lo8 name into __get_user_32t_8 to get consistent
naming accross all cases.

Signed-off-by: Victor Kamensky 
Suggested-by: Daniel Thompson 
Reviewed-by: Daniel Thompson 
Signed-off-by: Russell King

Merge tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

2014-08-08T18:14:29+00:00

Pull ARM SoC platform changes from Olof Johansson:
 "This is the bulk of new SoC enablement and other platform changes for
  3.17:

   - Samsung S5PV210 has been converted to DT and multiplatform
   - Clock drivers and bindings for some of the lower-end i.MX 1/2
     platforms
   - Kirkwood, one of the popular Marvell platforms, is folded into the
     mvebu platform code, removing mach-kirkwood
   - Hwmod data for TI AM43xx and DRA7 platforms
   - More additions of Renesas shmobile platform support
   - Removal of plat-samsung contents that can be removed with S5PV210
     being multiplatform/DT-enabled and the other two old platforms
     being removed

  New platforms (most with only basic support right now):

   - Hisilicon X5HD2 settop box chipset is introduced
   - Mediatek MT6589 (mobile chipset) is introduced
   - Broadcom BCM7xxx settop box chipset is introduced

  + as usual a lot other pieces all over the platform code"

* tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits)
  ARM: hisi: remove smp from machine descriptor
  power: reset: move hisilicon reboot code
  ARM: dts: Add hix5hd2-dkb dts file.
  ARM: debug: Rename Hi3716 to HIX5HD2
  ARM: hisi: enable hix5hd2 SoC
  ARM: hisi: add ARCH_HISI
  MAINTAINERS: add entry for Broadcom ARM STB architecture
  ARM: brcmstb: select GISB arbiter and interrupt drivers
  ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs
  ARM: configs: enable SMP in bcm_defconfig
  ARM: add SMP support for Broadcom mobile SoCs
  Documentation: arm: misc updates to Marvell EBU SoC status
  Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC
  ARM: mvebu: fix build without platforms selected
  ARM: mvebu: add cpuidle support for Armada 38x
  ARM: mvebu: add cpuidle support for Armada 370
  cpuidle: mvebu: add Armada 38x support
  cpuidle: mvebu: add Armada 370 support
  cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7
  ARM: mvebu: export the SCU address
  ...

ARM: 8091/2: add get_user() support for 8 byte types

2014-07-18T11:29:34+00:00

Recent contributions, including to DRM and binder, introduce 64-bit
values in their interfaces. A common motivation for this is to allow
the same ABI for 32- and 64-bit userspaces (and therefore also a shared
ABI for 32/64 hybrid userspaces). Anyhow, the developers would like to
avoid gotchas like having to use copy_from_user().

This feature is already implemented on x86-32 and the majority of other
32-bit architectures. The current list of get_user_8 hold out
architectures are: arm, avr32, blackfin, m32r, metag, microblaze,
mn10300, sh.

Credit:

    My name sits rather uneasily at the top of this patch. The v1 and
    v2 versions of the patch were written by Rob Clark and to produce v4
    I mostly copied code from Russell King and H. Peter Anvin. However I
    have mangled the patch sufficiently that *blame* is rightfully mine
    even if credit should more widely shared.

Changelog:

v5: updated to use the ret macro (requested by Russell King)
v4: remove an inlined add on big endian systems (spotted by Russell King),
    used __ARMEB__ rather than BIG_ENDIAN (to match rest of file),
    cleared r3 on EFAULT during __get_user_8.
v3: fix a couple of checkpatch issues
v2: pass correct size to check_uaccess, and better handling of narrowing
    double word read with __get_user_xb() (Russell King's suggestion)
v1: original

Signed-off-by: Rob Clark 
Signed-off-by: Daniel Thompson 
Signed-off-by: Russell King

ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+

2014-07-18T11:29:04+00:00

ARMv6 and greater introduced a new instruction ("bx") which can be used
to return from function calls.  Recent CPUs perform better when the
"bx lr" instruction is used rather than the "mov pc, lr" instruction,
and this sequence is strongly recommended to be used by the ARM
architecture manual (section A.4.1.1).

We provide a new macro "ret" with all its variants for the condition
code which will resolve to the appropriate instruction.

Rather than doing this piecemeal, and miss some instances, change all
the "mov pc" instances to use the new macro, with the exception of
the "movs" instruction and the kprobes code.  This allows us to detect
the "mov pc, lr" case and fix it up - and also gives us the possibility
of deploying this for other registers depending on the CPU selection.

Reported-by: Will Deacon 
Tested-by: Stephen Warren  # Tegra Jetson TK1
Tested-by: Robert Jarzmik  # mioa701_bootresume.S
Tested-by: Andrew Lunn  # Kirkwood
Tested-by: Shawn Guo 
Tested-by: Tony Lindgren  # OMAPs
Tested-by: Gregory CLEMENT  # Armada XP, 375, 385
Acked-by: Sekhar Nori  # DaVinci
Acked-by: Christoffer Dall  # kvm/hyp
Acked-by: Haojian Zhuang  # PXA3xx
Acked-by: Stefano Stabellini  # Xen
Tested-by: Uwe Kleine-König  # ARMv7M
Tested-by: Simon Horman  # Shmobile
Signed-off-by: Russell King

ARM: choose highest resolution delay timer

2014-06-16T18:48:07+00:00

In case there are several possible delay timers, choose the one with the
highest resolution. This code relies on the fact secondary CPUs have not yet
been brought online when register_current_timer_delay() is called. This is
ensured by implementing calibration_delay_done(),

Signed-off-by: Peter De Schrijver 
Acked-by: Russell King 
Signed-off-by: Stephen Warren

ARM: 7990/1: asm: rename logical shift macros push pull into lspush lspull

2014-02-25T11:33:57+00:00

Renames logical shift macros, 'push' and 'pull', defined in
arch/arm/include/asm/assembler.h, into 'lspush' and 'lspull'.
That eliminates name conflict between 'push' logical shift macro
and 'push' instruction mnemonic. That allows assembler.h to be
included in .S files that use 'push' instruction.

Suggested-by: Will Deacon 
Signed-off-by: Victor Kamensky 
Acked-by: Nicolas Pitre 
Signed-off-by: Russell King

ARM: 7984/1: prefetch: add prefetchw invocations for barriered atomics

2014-02-25T11:30:20+00:00

After a bunch of benchmarking on the interaction between dmb and pldw,
it turns out that issuing the pldw *after* the dmb instruction can
give modest performance gains (~3% atomic_add_return improvement on a
dual A15).

This patch adds prefetchw invocations to our barriered atomic operations
including cmpxchg, test_and_xxx and futexes.

Signed-off-by: Will Deacon 
Signed-off-by: Russell King

ARM: 7877/1: use built-in byte swap function

2013-12-29T12:32:45+00:00

Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, and has a very modest benefit on vmlinux size (Linaro gcc
4.8):

text data bss dec hex filename
2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap

6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap

7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap

Signed-off-by: Kim Phillips 
Reviewed-by: Nicolas Pitre 
Acked-by: David Woodhouse 
Signed-off-by: Russell King

ARM: make kernel oops easier to read

2013-12-29T12:32:30+00:00

We don't need the offset for the first function name in each backtrace
entry; this needlessly consumes screen space.  This is virtually always
the first or second instruction in the called function.

Also, recognise stmfd instructions which include r10 as a valid stack
saving instruction, and when dumping the registers, dump six registers
per line rather than five, and fix the wrapping.

Signed-off-by: Russell King

ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation

2013-11-30T22:21:03+00:00

Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop)

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr

0000003c <__loop_delay>:
  3c:	e2500001 	subs	r0, r0, #1
  40:	8afffffe 	bhi	3c <__loop_delay>
  44:	e1a0f00e 	mov	pc, lr

After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr
  3c:	e320f000 	nop	{0}

00000040 <__loop_delay>:
  40:	e2500001 	subs	r0, r0, #1
  44:	8afffffe 	bhi	40 <__loop_delay>
  48:	e1a0f00e 	mov	pc, lr
  4c:	e320f000 	nop	{0}

, which now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same
BogoMIPS value before and after this patch.

Reported-by: Tom Evans 
Suggested-by: Tom Evans 
Signed-off-by: Fabio Estevam 
Signed-off-by: Russell King