<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/powerpc/net, branch v6.5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>powerpc/bpf: populate extable entries only during the last pass</title>
<updated>2023-05-15T05:40:31+00:00</updated>
<author>
<name>Hari Bathini</name>
<email>hbathini@linux.ibm.com</email>
</author>
<published>2023-04-25T06:58:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=35a4b8ce4ac00e940b46b1034916ccb22ce9bdef'/>
<id>35a4b8ce4ac00e940b46b1034916ccb22ce9bdef</id>
<content type='text'>
Since commit 85e031154c7c ("powerpc/bpf: Perform complete extra passes
to update addresses"), two additional passes are performed to avoid
space and CPU time wastage on powerpc. But these extra passes led to
WARN_ON_ONCE() hits in bpf_add_extable_entry() as extable entries are
populated again, during the extra pass, without resetting the index.
Fix it by resetting entry index before repopulating extable entries,
if and when there is an additional pass.

Fixes: 85e031154c7c ("powerpc/bpf: Perform complete extra passes to update addresses")
Cc: stable@vger.kernel.org # v6.3+
Signed-off-by: Hari Bathini &lt;hbathini@linux.ibm.com&gt;
Reviewed-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20230425065829.18189-1-hbathini@linux.ibm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 85e031154c7c ("powerpc/bpf: Perform complete extra passes
to update addresses"), two additional passes are performed to avoid
space and CPU time wastage on powerpc. But these extra passes led to
WARN_ON_ONCE() hits in bpf_add_extable_entry() as extable entries are
populated again, during the extra pass, without resetting the index.
Fix it by resetting entry index before repopulating extable entries,
if and when there is an additional pass.

Fixes: 85e031154c7c ("powerpc/bpf: Perform complete extra passes to update addresses")
Cc: stable@vger.kernel.org # v6.3+
Signed-off-by: Hari Bathini &lt;hbathini@linux.ibm.com&gt;
Reviewed-by: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20230425065829.18189-1-hbathini@linux.ibm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/64: vmlinux support building with PCREL addresing</title>
<updated>2023-04-20T02:59:21+00:00</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2023-04-08T02:17:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7e3a68be42e10f5fa5890e97afc0afd992355bc3'/>
<id>7e3a68be42e10f5fa5890e97afc0afd992355bc3</id>
<content type='text'>
PC-Relative or PCREL addressing is an extension to the ELF ABI which
uses Power ISA v3.1 PC-relative instructions to calculate addresses,
rather than the traditional TOC scheme.

Add an option to build vmlinux using pcrel addressing. Modules continue
to use TOC addressing.

- TOC address helpers and r2 are poisoned with -1 when running vmlinux.
  r2 could be used for something useful once things are ironed out.

- Assembly must call C functions with @notoc annotation, or the linker
  complains aobut a missing nop after the call. This is done with the
  CFUNC macro introduced earlier.

- Boot: with the exception of prom_init, the execution branches to the
  kernel virtual address early in boot, before any addresses are
  generated, which ensures 34-bit pcrel addressing does not miss the
  high PAGE_OFFSET bits. TOC relative addressing has a similar
  requirement. prom_init does not go to the virtual address and its
  addresses should not carry over to the post-prom kernel.

- Ftrace trampolines are converted from TOC addressing to pcrel
  addressing, including module ftrace trampolines that currently use the
  kernel TOC to find ftrace target functions.

- BPF function prologue and function calling generation are converted
  from TOC to pcrel.

- copypage_64.S has an interesting problem, prefixed instructions have
  alignment restrictions so the linker can add padding, which makes the
  assembler treat the difference between two local labels as
  non-constant even if alignment is arranged so padding is not required.
  This may need toolchain help to solve nicely, for now move the prefix
  instruction out of the alternate patch section to work around it.

This reduces kernel text size by about 6%.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PC-Relative or PCREL addressing is an extension to the ELF ABI which
uses Power ISA v3.1 PC-relative instructions to calculate addresses,
rather than the traditional TOC scheme.

Add an option to build vmlinux using pcrel addressing. Modules continue
to use TOC addressing.

- TOC address helpers and r2 are poisoned with -1 when running vmlinux.
  r2 could be used for something useful once things are ironed out.

- Assembly must call C functions with @notoc annotation, or the linker
  complains aobut a missing nop after the call. This is done with the
  CFUNC macro introduced earlier.

- Boot: with the exception of prom_init, the execution branches to the
  kernel virtual address early in boot, before any addresses are
  generated, which ensures 34-bit pcrel addressing does not miss the
  high PAGE_OFFSET bits. TOC relative addressing has a similar
  requirement. prom_init does not go to the virtual address and its
  addresses should not carry over to the post-prom kernel.

- Ftrace trampolines are converted from TOC addressing to pcrel
  addressing, including module ftrace trampolines that currently use the
  kernel TOC to find ftrace target functions.

- BPF function prologue and function calling generation are converted
  from TOC to pcrel.

- copypage_64.S has an interesting problem, prefixed instructions have
  alignment restrictions so the linker can add padding, which makes the
  assembler treat the difference between two local labels as
  non-constant even if alignment is arranged so padding is not required.
  This may need toolchain help to solve nicely, for now move the prefix
  instruction out of the alternate patch section to work around it.

This reduces kernel text size by about 6%.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf/32: perform three operands ALU operations</title>
<updated>2023-02-10T11:17:35+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=19daf0aef84f33bde9c742ed41b4ded567b8dfbf'/>
<id>19daf0aef84f33bde9c742ed41b4ded567b8dfbf</id>
<content type='text'>
When an ALU instruction is preceded by a MOV instruction
that just moves a source register into the destination register of
the ALU, replace that MOV+ALU instructions by an ALU operation
taking the source of the MOV as second source instead of using its
destination.

Before the change, code could look like the following, with
superfluous separate register move (mr) instructions.

  70:	7f c6 f3 78 	mr      r6,r30
  74:	7f a5 eb 78 	mr      r5,r29
  78:	30 c6 ff f4 	addic   r6,r6,-12
  7c:	7c a5 01 d4 	addme   r5,r5

With this commit, addition instructions take r30 and r29 directly.

  70:	30 de ff f4 	addic   r6,r30,-12
  74:	7c bd 01 d4 	addme   r5,r29

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/b6719beaf01f9dcbcdbb787ef67c4a2f8e3a4cb6.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an ALU instruction is preceded by a MOV instruction
that just moves a source register into the destination register of
the ALU, replace that MOV+ALU instructions by an ALU operation
taking the source of the MOV as second source instead of using its
destination.

Before the change, code could look like the following, with
superfluous separate register move (mr) instructions.

  70:	7f c6 f3 78 	mr      r6,r30
  74:	7f a5 eb 78 	mr      r5,r29
  78:	30 c6 ff f4 	addic   r6,r6,-12
  7c:	7c a5 01 d4 	addme   r5,r5

With this commit, addition instructions take r30 and r29 directly.

  70:	30 de ff f4 	addic   r6,r30,-12
  74:	7c bd 01 d4 	addme   r5,r29

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/b6719beaf01f9dcbcdbb787ef67c4a2f8e3a4cb6.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf/32: introduce a second source register for ALU operations</title>
<updated>2023-02-10T11:17:35+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c88da29b4d2ce8d0070646b8f99729e9b355a4bf'/>
<id>c88da29b4d2ce8d0070646b8f99729e9b355a4bf</id>
<content type='text'>
At the time being, all ALU operation are performed with same L-source
and destination, requiring the L-source to be moved into destination via
a separate register move, like:

  70:	7f c6 f3 78 	mr      r6,r30
  74:	7f a5 eb 78 	mr      r5,r29
  78:	30 c6 ff f4 	addic   r6,r6,-12
  7c:	7c a5 01 d4 	addme   r5,r5

Introduce a second source register to all ALU operations. For the time
being that second source register is made equal to the destination
register.

That change will allow, via following patch, to optimise the generated
code as:

  70:	30 de ff f4 	addic   r6,r30,-12
  74:	7c bd 01 d4 	addme   r5,r29

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/d5aaaba50d9d6b4a0e9f0cd4a5e34101aca1e247.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At the time being, all ALU operation are performed with same L-source
and destination, requiring the L-source to be moved into destination via
a separate register move, like:

  70:	7f c6 f3 78 	mr      r6,r30
  74:	7f a5 eb 78 	mr      r5,r29
  78:	30 c6 ff f4 	addic   r6,r6,-12
  7c:	7c a5 01 d4 	addme   r5,r5

Introduce a second source register to all ALU operations. For the time
being that second source register is made equal to the destination
register.

That change will allow, via following patch, to optimise the generated
code as:

  70:	30 de ff f4 	addic   r6,r30,-12
  74:	7c bd 01 d4 	addme   r5,r29

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/d5aaaba50d9d6b4a0e9f0cd4a5e34101aca1e247.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf/32: Optimise some particular const operations</title>
<updated>2023-02-10T11:17:35+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8616045fe785229b53a24b8698631826298d1500'/>
<id>8616045fe785229b53a24b8698631826298d1500</id>
<content type='text'>
Simplify multiplications and divisions with constants when the
constant is 1 or -1.

When the constant is a power of 2, replace them by bit shits.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/e53b1f4a4150ec6cabcaeeef82bf9c361b5f9204.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Simplify multiplications and divisions with constants when the
constant is 1 or -1.

When the constant is a power of 2, replace them by bit shits.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/e53b1f4a4150ec6cabcaeeef82bf9c361b5f9204.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf: Only pad length-variable code at initial pass</title>
<updated>2023-02-10T11:17:35+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d3921cbb6cd663193cecf04f0b170a30c6d0e390'/>
<id>d3921cbb6cd663193cecf04f0b170a30c6d0e390</id>
<content type='text'>
Now that two real additional passes are performed in case of extra pass
requested by BPF core, padding is not needed anymore except during
initial pass done before memory allocation to count maximum possible
program size.

So, only do the padding when 'image' is NULL.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/921851d6577badc1e6b08b270a0ced80a6a26d03.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that two real additional passes are performed in case of extra pass
requested by BPF core, padding is not needed anymore except during
initial pass done before memory allocation to count maximum possible
program size.

So, only do the padding when 'image' is NULL.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/921851d6577badc1e6b08b270a0ced80a6a26d03.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf: Perform complete extra passes to update addresses</title>
<updated>2023-02-10T11:17:35+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=85e031154c7c14edee0705532a9ffc8a2fe591d0'/>
<id>85e031154c7c14edee0705532a9ffc8a2fe591d0</id>
<content type='text'>
BPF core calls the jit compiler again for an extra pass in order
to properly set subprog addresses.

Unlike other architectures, powerpc only updates the addresses
during that extra pass. It means that holes must have been left
in the code in order to enable the maximum possible instruction
size.

In order to avoid waste of space, and waste of CPU time on powerpc
processors on which the NOP instruction is not 0-cycle, perform
two real additional passes.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/d484a4ac95949ff55fc4344b674e7c0d3ddbfcd5.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BPF core calls the jit compiler again for an extra pass in order
to properly set subprog addresses.

Unlike other architectures, powerpc only updates the addresses
during that extra pass. It means that holes must have been left
in the code in order to enable the maximum possible instruction
size.

In order to avoid waste of space, and waste of CPU time on powerpc
processors on which the NOP instruction is not 0-cycle, perform
two real additional passes.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/d484a4ac95949ff55fc4344b674e7c0d3ddbfcd5.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf/32: BPF prog is never called with more than one arg</title>
<updated>2023-02-10T11:17:35+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7dd0e2848764306d7a70943b97584ffdc7754708'/>
<id>7dd0e2848764306d7a70943b97584ffdc7754708</id>
<content type='text'>
BPF progs are never called with more than one argument, plus the
tail call count as a second argument when needed.

So, no need to retrieve 9th and 10th argument (5th 64 bits argument)
from the stack in prologue.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/89a200fb45048601475c092c5775294dee3886de.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BPF progs are never called with more than one argument, plus the
tail call count as a second argument when needed.

So, no need to retrieve 9th and 10th argument (5th 64 bits argument)
from the stack in prologue.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/89a200fb45048601475c092c5775294dee3886de.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf/32: Only set a stack frame when necessary</title>
<updated>2023-02-10T11:17:34+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d084dcf256bc4565b4b1af9b00297ac7b51c7049'/>
<id>d084dcf256bc4565b4b1af9b00297ac7b51c7049</id>
<content type='text'>
Until now a stack frame was set at all time due to the need
to keep tail call counter in the stack.

But since commit 89d21e259a94 ("powerpc/bpf/32: Fix Oops on tail call
tests") the tail call counter is passed via register r4. It is therefore
not necessary anymore to have a stack frame for that.

Just like PPC64, implement bpf_has_stack_frame() and only sets the frame
when needed.

The difference with PPC64 is that PPC32 doesn't have a redzone, so
the stack is required as soon as non volatile registers are used or
when tail call count is set up.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
[mpe: Fix commit reference in change log]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/62d7b654a3cfe73d998697cb29bbc5ffd89bfdb1.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now a stack frame was set at all time due to the need
to keep tail call counter in the stack.

But since commit 89d21e259a94 ("powerpc/bpf/32: Fix Oops on tail call
tests") the tail call counter is passed via register r4. It is therefore
not necessary anymore to have a stack frame for that.

Just like PPC64, implement bpf_has_stack_frame() and only sets the frame
when needed.

The difference with PPC64 is that PPC32 doesn't have a redzone, so
the stack is required as soon as non volatile registers are used or
when tail call count is set up.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
[mpe: Fix commit reference in change log]
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/62d7b654a3cfe73d998697cb29bbc5ffd89bfdb1.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
<entry>
<title>powerpc/bpf/32: No need to zeroise r4 when not doing tail call</title>
<updated>2023-02-10T11:17:34+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2023-02-01T10:04:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6376ed8feca829039d31a208216b958f0e439d87'/>
<id>6376ed8feca829039d31a208216b958f0e439d87</id>
<content type='text'>
r4 is cleared at function entry and used as tail call count.

But when the function does not perform tail call, r4 is
ignored, so no need to clear it.

Replace it by a NOP in that case.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/9c5440b2b6d90a78600257433ac499b5c5101fbb.1675245773.git.christophe.leroy@csgroup.eu

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
r4 is cleared at function entry and used as tail call count.

But when the function does not perform tail call, r4 is
ignored, so no need to clear it.

Replace it by a NOP in that case.

Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/9c5440b2b6d90a78600257433ac499b5c5101fbb.1675245773.git.christophe.leroy@csgroup.eu

</pre>
</div>
</content>
</entry>
</feed>
