<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arm/include/asm/memory.h, branch v6.0</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma by default</title>
<updated>2022-07-07T16:18:57+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-04-19T08:00:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=af6f23b88e9508f1cb8d0af008bb175019428f73'/>
<id>af6f23b88e9508f1cb8d0af008bb175019428f73</id>
<content type='text'>
Only the footbridge platforms provide their own DMA address translation
helpers, so switch to the generic version for all other platforms, and
consolidate the footbridge implementation to remove two levels of
indirection.
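
Roughly speaking, the generic helpers just apply a per-device offset
(often zero) between CPU physical addresses and bus addresses. A hedged,
kernel-independent sketch of that idea (not the code from
include/linux/dma-direct.h; the offset below is made up):

	#include &lt;stdio.h&gt;
	#include &lt;stdint.h&gt;

	static const uint64_t dev_bus_offset = 0x80000000ULL;	/* hypothetical */

	static uint64_t phys_to_dma_sketch(uint64_t paddr)
	{
		return paddr - dev_bus_offset;
	}

	static uint64_t dma_to_phys_sketch(uint64_t daddr)
	{
		return daddr + dev_bus_offset;
	}

	int main(void)
	{
		uint64_t paddr = 0x90000000ULL;
		uint64_t daddr = phys_to_dma_sketch(paddr);

		printf("phys %#llx, dma %#llx, back to phys %#llx\n",
		       (unsigned long long)paddr,
		       (unsigned long long)daddr,
		       (unsigned long long)dma_to_phys_sketch(daddr));
		return 0;
	}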

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Tested-by: Marc Zyngier &lt;maz@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Only the footbridge platforms provide their own DMA address translation
helpers, so switch to the generic version for all other platforms, and
consolidate the footbridge implementation to remove two levels of
indirection.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Tested-by: Marc Zyngier &lt;maz@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 9104/2: Fix Keystone 2 kernel mapping regression</title>
<updated>2021-08-10T11:17:25+00:00</updated>
<author>
<name>Linus Walleij</name>
<email>linus.walleij@linaro.org</email>
</author>
<published>2021-08-09T11:57:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=463dbba4d189750c2f576449d0bbb11c5413712e'/>
<id>463dbba4d189750c2f576449d0bbb11c5413712e</id>
<content type='text'>
This fixes a Keystone 2 regression discovered as a side effect of
defining and passing the physical start/end sections of the kernel
to the MMU remapping code.

As the Keystone applies an offset to all physical addresses,
including those identified and patched by phys2virt, we fail to
account for this offset in the kernel_sec_start and kernel_sec_end
variables.

Further, these offsets can extend into the 64-bit range on LPAE
systems such as the Keystone 2.

Fix it like this:
- Extend kernel_sec_start and kernel_sec_end to be 64-bit
- Also add the offset to kernel_sec_start and kernel_sec_end

As passing kernel_sec_start and kernel_sec_end as 64-bit invariably
incurs BE8 endianness issues, I have attempted to dry-code around
these.

Tested on the Vexpress QEMU model both with and without LPAE
enabled.
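
For illustration, a minimal stand-alone sketch of why both halves of the
fix are needed (this is not code from the patch, and the offset value is
hypothetical):

	#include &lt;stdio.h&gt;
	#include &lt;stdint.h&gt;

	int main(void)
	{
		uint64_t kernel_sec_start = 0x80200000ULL;	/* fits in 32 bits       */
		uint64_t keystone_offset  = 0x780000000ULL;	/* pushes it above 4 GiB */

		kernel_sec_start += keystone_offset;		/* the offset must be added */
		printf("kernel_sec_start   = %#llx\n",
		       (unsigned long long)kernel_sec_start);
		printf("truncated to 32bit = %#x\n", (uint32_t)kernel_sec_start);
		return 0;
	}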

Fixes: 6e121df14ccd ("ARM: 9090/1: Map the lowmem and kernel separately")
Reported-by: Nishanth Menon &lt;nmenon@kernel.org&gt;
Suggested-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
Tested-by: Grygorii Strashko &lt;grygorii.strashko@ti.com&gt;
Tested-by: Nishanth Menon &lt;nmenon@kernel.org&gt;
Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes a Keystone 2 regression discovered as a side effect of
defining and passing the physical start/end sections of the kernel
to the MMU remapping code.

As the Keystone applies an offset to all physical addresses,
including those identified and patched by phys2virt, we fail to
account for this offset in the kernel_sec_start and kernel_sec_end
variables.

Further, these offsets can extend into the 64-bit range on LPAE
systems such as the Keystone 2.

Fix it like this:
- Extend kernel_sec_start and kernel_sec_end to be 64-bit
- Also add the offset to kernel_sec_start and kernel_sec_end

As passing kernel_sec_start and kernel_sec_end as 64-bit invariably
incurs BE8 endianness issues, I have attempted to dry-code around
these.

Tested on the Vexpress QEMU model both with and without LPAE
enabled.

Fixes: 6e121df14ccd ("ARM: 9090/1: Map the lowmem and kernel separately")
Reported-by: Nishanth Menon &lt;nmenon@kernel.org&gt;
Suggested-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
Tested-by: Grygorii Strashko &lt;grygorii.strashko@ti.com&gt;
Tested-by: Nishanth Menon &lt;nmenon@kernel.org&gt;
Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 9097/1: mmu: Declare section start/end correctly</title>
<updated>2021-06-21T10:39:39+00:00</updated>
<author>
<name>Linus Walleij</name>
<email>linus.walleij@linaro.org</email>
</author>
<published>2021-06-17T10:50:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e17362d683fb6bcda0e419ec0ad7cabb8252c509'/>
<id>e17362d683fb6bcda0e419ec0ad7cabb8252c509</id>
<content type='text'>
The kernel test robot reported an interesting bug:

A debug print was using %08x with kernel_sec_start and kernel_sec_end
being phys_addr_t, which can be either u32 or u64 (possibly more).

Actually these should just be declared as u32 to begin with: they are
declared as such in the assembly in head.S, and the kernel definitely
boots in a 32-bit physical address space. Redeclare kernel_sec_start
and kernel_sec_end as u32 to get rid of the bug.
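
As a minimal userspace illustration of the same class of bug (the prints
are made up and this is not code from the kernel):

	#include &lt;stdio.h&gt;
	#include &lt;stdint.h&gt;

	int main(void)
	{
		uint32_t sec_start32 = 0x80200000U;	/* what the commit declares    */
		uint64_t sec_start64 = 0x80200000ULL;	/* what LPAE phys_addr_t holds */

		printf("%08x\n", sec_start32);		/* fine: format matches a u32  */
		printf("%08llx\n", (unsigned long long)sec_start64);
		/* printf("%08x\n", sec_start64) would mismatch the format string,
		 * which is the warning the test robot reported. */
		return 0;
	}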

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Fixes: 6e121df14ccd ("ARM: 9090/1: Map the lowmem and kernel separately")
Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kernel test robot reported an interesting bug:

A debug print was using %08x with kernel_sec_start and kernel_sec_end
being phys_addr_t, which can be either u32 or u64 (possibly more).

Actually these should just be declared as u32 to begin with: they are
declared as such in the assembly in head.S, and the kernel definitely
boots in a 32-bit physical address space. Redeclare kernel_sec_start
and kernel_sec_end as u32 to get rid of the bug.

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Fixes: 6e121df14ccd ("ARM: 9090/1: Map the lowmem and kernel separately")
Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 9089/1: Define kernel physical section start and end</title>
<updated>2021-06-13T17:16:41+00:00</updated>
<author>
<name>Linus Walleij</name>
<email>linus.walleij@linaro.org</email>
</author>
<published>2021-06-03T08:51:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a91da54570856e3d3af4ba2884db71fbce06f70b'/>
<id>a91da54570856e3d3af4ba2884db71fbce06f70b</id>
<content type='text'>
When we are mapping the initial sections in head.S we
know very well where the start and end of the kernel image
in physical memory are placed. Later on it gets hard
to determine this.

Save the information into two variables named
kernel_sec_start and kernel_sec_end, for convenience
in later work involving the physical start and end
of the kernel. These variables are section-aligned,
corresponding to the early section mappings set up
in head.S.

Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we are mapping the initial sections in head.S we
know very well where the start and end of the kernel image
in physical memory are placed. Later on it gets hard
to determine this.

Save the information into two variables named
kernel_sec_start and kernel_sec_end, for convenience
in later work involving the physical start and end
of the kernel. These variables are section-aligned,
corresponding to the early section mappings set up
in head.S.

Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 9088/1: Split KERNEL_OFFSET from PAGE_OFFSET</title>
<updated>2021-06-13T17:16:40+00:00</updated>
<author>
<name>Linus Walleij</name>
<email>linus.walleij@linaro.org</email>
</author>
<published>2021-06-03T08:50:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b78f63f4439bbfd02bfc628114ed0f63460e5570'/>
<id>b78f63f4439bbfd02bfc628114ed0f63460e5570</id>
<content type='text'>
We want to be able to compile the kernel at an address different
from PAGE_OFFSET (start of lowmem) + TEXT_OFFSET, so start to pry
apart the address where the kernel is located from the address
where lowmem is located by defining and using KERNEL_OFFSET in
a few key places.
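
A hedged sketch of the shape of the split (the numeric values are
illustrative, not the kernel's; initially the two stay equal, so nothing
actually moves yet):

	#include &lt;stdio.h&gt;

	#define PAGE_OFFSET	0xC0000000UL	/* start of lowmem               */
	#define KERNEL_OFFSET	(PAGE_OFFSET)	/* kernel placement, may diverge */

	int main(void)
	{
		printf("lowmem starts at %#lx\n", PAGE_OFFSET);
		printf("kernel linked at %#lx\n", KERNEL_OFFSET);
		return 0;
	}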

Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We want to be able to compile the kernel at an address different
from PAGE_OFFSET (start of lowmem) + TEXT_OFFSET, so start to pry
apart the address where the kernel is located from the address
where lowmem is located by defining and using KERNEL_OFFSET in
a few key places.

Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: 9059/1: cache-v7: get rid of mini-stack</title>
<updated>2021-03-09T10:25:18+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2021-02-11T08:25:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=95731b8ee63ec9419822a51cd9878fa32582fdd2'/>
<id>95731b8ee63ec9419822a51cd9878fa32582fdd2</id>
<content type='text'>
Now that we have reduced the number of registers that we need to
preserve when calling v7_invalidate_l1 from the boot code, we can use
scratch registers to preserve the remaining ones, and get rid of the
mini stack entirely. This works around any issues regarding cache
behavior in relation to the uncached accesses to this memory, which is
hard to get right in the general case (i.e., both bare metal and under
virtualization).

While at it, switch v7_invalidate_l1 to using ip as a scratch register
instead of r4. This makes the function AAPCS compliant, and removes the
need to stash r4 in ip across the call.

Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we have reduced the number of registers that we need to
preserve when calling v7_invalidate_l1 from the boot code, we can use
scratch registers to preserve the remaining ones, and get rid of the
mini stack entirely. This works around any issues regarding cache
behavior in relation to the uncached accesses to this memory, which is
hard to get right in the general case (i.e., both bare metal and under
virtualization).

While at it, switch v7_invalidate_l1 to using ip as a scratch register
instead of r4. This makes the function AAPCS compliant, and removes the
need to stash r4 in ip across the call.

Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'devel-stable' into for-next</title>
<updated>2020-12-21T11:19:26+00:00</updated>
<author>
<name>Russell King</name>
<email>rmk+kernel@armlinux.org.uk</email>
</author>
<published>2020-12-21T11:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ecbbb88727aee7880527d4b320b4d06dde75d46d'/>
<id>ecbbb88727aee7880527d4b320b4d06dde75d46d</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: p2v: reduce p2v alignment requirement to 2 MiB</title>
<updated>2020-10-28T15:59:43+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2020-09-18T08:55:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9443076e4330a14ae2c6114307668b98a8293b77'/>
<id>9443076e4330a14ae2c6114307668b98a8293b77</id>
<content type='text'>
The ARM kernel's linear map starts at PAGE_OFFSET, which maps to a
physical address (PHYS_OFFSET) that is platform specific, and is
discovered at boot. Since we don't want to slow down translations
between physical and virtual addresses by keeping the offset in a
variable in memory, we implement this by patching the code performing
the translation, and putting the offset between PAGE_OFFSET and the
start of physical RAM directly into the instruction opcodes.

As we only patch up to 8 bits of offset, yielding 4 GiB &gt;&gt; 8 == 16 MiB
of granularity, we have to round up PHYS_OFFSET to the next multiple if
the start of physical RAM is not a multiple of 16 MiB. This wastes some
physical RAM, since the memory that was skipped will now live below
PAGE_OFFSET, making it inaccessible to the kernel.

We can improve this by changing the patchable sequences and the patching
logic to carry more bits of offset: 11 bits gives us 4 GiB &gt;&gt; 11 == 2 MiB
of granularity, and so we will never waste more than that amount by
rounding up the physical start of DRAM to the next multiple of 2 MiB.
(Note that 2 MiB granularity guarantees that the linear mapping can be
created efficiently, whereas less than 2 MiB may result in the linear
mapping needing another level of page tables.)
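
A stand-alone version of the arithmetic above (illustrative only): with N
patchable offset bits covering a 4 GiB window, the p2v granularity is
4 GiB &gt;&gt; N.

	#include &lt;stdio.h&gt;
	#include &lt;stdint.h&gt;

	int main(void)
	{
		uint64_t four_gib = 1ULL &lt;&lt; 32;

		printf(" 8 bits: %llu MiB\n",
		       (unsigned long long)((four_gib &gt;&gt; 8) &gt;&gt; 20));	/* 16 MiB */
		printf("11 bits: %llu MiB\n",
		       (unsigned long long)((four_gib &gt;&gt; 11) &gt;&gt; 20));	/* 2 MiB */
		return 0;
	}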

This helps Zhen Lei's scenario, where the start of DRAM is known to be
occupied. It also helps EFI boot, which relies on the firmware's page
allocator to allocate space for the decompressed kernel as low as
possible. And if the KASLR patches ever land for 32-bit, it will give
us 3 more bits of randomization of the placement of the kernel inside
the linear region.

For the ARM code path, it simply comes down to using two add/sub
instructions instead of one for the carryless version, and patching
each of them with the correct immediate depending on the rotation
field. For the LPAE calculation, which has to deal with a carry, it
patches the MOVW instruction with up to 12 bits of offset (but we only
need 11 bits anyway).

For the Thumb2 code path, patching more than 11 bits of displacement
would be somewhat cumbersome, but the 11 bits we need fit nicely into
the second word of the u16[2] opcode, so we simply update the immediate
assignment and the left shift to create an addend of the right magnitude.

Suggested-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Acked-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ARM kernel's linear map starts at PAGE_OFFSET, which maps to a
physical address (PHYS_OFFSET) that is platform specific, and is
discovered at boot. Since we don't want to slow down translations
between physical and virtual addresses by keeping the offset in a
variable in memory, we implement this by patching the code performing
the translation, and putting the offset between PAGE_OFFSET and the
start of physical RAM directly into the instruction opcodes.

As we only patch up to 8 bits of offset, yielding 4 GiB &gt;&gt; 8 == 16 MiB
of granularity, we have to round up PHYS_OFFSET to the next multiple if
the start of physical RAM is not a multiple of 16 MiB. This wastes some
physical RAM, since the memory that was skipped will now live below
PAGE_OFFSET, making it inaccessible to the kernel.

We can improve this by changing the patchable sequences and the patching
logic to carry more bits of offset: 11 bits gives us 4 GiB &gt;&gt; 11 == 2 MiB
of granularity, and so we will never waste more than that amount by
rounding up the physical start of DRAM to the next multiple of 2 MiB.
(Note that 2 MiB granularity guarantees that the linear mapping can be
created efficiently, whereas less than 2 MiB may result in the linear
mapping needing another level of page tables.)

This helps Zhen Lei's scenario, where the start of DRAM is known to be
occupied. It also helps EFI boot, which relies on the firmware's page
allocator to allocate space for the decompressed kernel as low as
possible. And if the KASLR patches ever land for 32-bit, it will give
us 3 more bits of randomization of the placement of the kernel inside
the linear region.

For the ARM code path, it simply comes down to using two add/sub
instructions instead of one for the carryless version, and patching
each of them with the correct immediate depending on the rotation
field. For the LPAE calculation, which has to deal with a carry, it
patches the MOVW instruction with up to 12 bits of offset (but we only
need 11 bits anyway).

For the Thumb2 code path, patching more than 11 bits of displacement
would be somewhat cumbersome, but the 11 bits we need fit nicely into
the second word of the u16[2] opcode, so we simply update the immediate
assignment and the left shift to create an addend of the right magnitude.

Suggested-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Acked-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE</title>
<updated>2020-10-28T15:59:43+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2020-09-20T21:02:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e8e00f5afb087912fb3edb225ee373aa6499bb79'/>
<id>e8e00f5afb087912fb3edb225ee373aa6499bb79</id>
<content type='text'>
In preparation for reducing the phys-to-virt minimum relative alignment
from 16 MiB to 2 MiB, switch to patchable sequences involving MOVW
instructions that can more easily be manipulated to carry a 12-bit
immediate. Note that the non-LPAE ARM sequence is not updated: MOVW
may not be supported on non-LPAE platforms, and the sequence itself
can be updated more easily to apply the 12 bits of displacement.

For Thumb2, which has many more versions of opcodes, switch to a sequence
that can be patched by the same patching code for both versions. Note
that the Thumb2 opcodes for MOVW and MVN are unambiguous, and have no
rotation bits in their immediate fields, so there is no need to use
placeholder constants in the asm blocks.

While at it, drop the 'volatile' qualifiers from the asm blocks: the
code does not have any side effects that are invisible to the compiler,
so it is free to omit these sequences if the outputs are not used.
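
A hedged illustration of that last point (ARM inline asm, so build with an
ARM compiler; this is not the kernel's actual __pv_stub sequence): without
'volatile', GCC treats an asm with outputs and no side effects as pure and
may drop it entirely when the result is unused.

	#include &lt;stdio.h&gt;

	static unsigned long add_const(unsigned long x)
	{
		unsigned long out;

		/* no 'volatile': may be omitted if 'out' is never consumed */
		asm("add %0, %1, #0x100" : "=r" (out) : "r" (x));
		return out;
	}

	int main(void)
	{
		unsigned long used = add_const(0x1000);	/* kept: result is printed */
		unsigned long dead = add_const(0x2000);	/* may be optimised away   */

		(void)dead;
		printf("%lx\n", used);
		return 0;
	}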

Suggested-by: Russell King &lt;linux@armlinux.org.uk&gt;
Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Reviewed-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for reducing the phys-to-virt minimum relative alignment
from 16 MiB to 2 MiB, switch to patchable sequences involving MOVW
instructions that can more easily be manipulated to carry a 12-bit
immediate. Note that the non-LPAE ARM sequence is not updated: MOVW
may not be supported on non-LPAE platforms, and the sequence itself
can be updated more easily to apply the 12 bits of displacement.

For Thumb2, which has many more versions of opcodes, switch to a sequence
that can be patched by the same patching code for both versions. Note
that the Thumb2 opcodes for MOVW and MVN are unambiguous, and have no
rotation bits in their immediate fields, so there is no need to use
placeholder constants in the asm blocks.

While at it, drop the 'volatile' qualifiers from the asm blocks: the
code does not have any side effects that are invisible to the compiler,
so it is free to omit these sequences if the outputs are not used.

Suggested-by: Russell King &lt;linux@armlinux.org.uk&gt;
Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Reviewed-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ARM: p2v: use relative references in patch site arrays</title>
<updated>2020-10-28T15:59:43+00:00</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2020-09-20T16:23:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2730e8eaa4f2baccc03296e0c5ee109c0673fe5f'/>
<id>2730e8eaa4f2baccc03296e0c5ee109c0673fe5f</id>
<content type='text'>
Free up a register in the p2v patching code by switching to relative
references, which don't require keeping the phys-to-virt displacement
live in a register.

Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Reviewed-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Free up a register in the p2v patching code by switching to relative
references, which don't require keeping the phys-to-virt displacement
live in a register.

Acked-by: Nicolas Pitre &lt;nico@fluxnic.net&gt;
Reviewed-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
