<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arm64/kernel/module-plts.c, branch v6.6</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>arm64: module: Use module_init_layout_section() to spot init sections</title>
<updated>2023-08-03T20:42:02+00:00</updated>
<author>
<name>James Morse</name>
<email>james.morse@arm.com</email>
</author>
<published>2023-08-01T14:54:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f928f8b1a2496e7af95b860f9acf553f20f68f16'/>
<id>f928f8b1a2496e7af95b860f9acf553f20f68f16</id>
<content type='text'>
Today module_frob_arch_sections() spots init sections from their
'init' prefix, and uses this to keep the init PLTs separate from the rest.

module_emit_plt_entry() uses within_module_init() to determine if a
location is in the init text or not, but this depends on whether
core code thought this was an init section.

Naturally the logic is different.

module_init_layout_section() groups the init and exit text together if
module unloading is disabled, as the exit code will never run. The result
is kernels with this configuration can't load all their modules because
there are not enough PLTs for the combined init+exit section.

This results in the following:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
|  module_emit_plt_entry+0x184/0x1cc
|  apply_relocate_add+0x2bc/0x8e4
|  load_module+0xe34/0x1bd4
|  init_module_from_file+0x84/0xc0
|  __arm64_sys_finit_module+0x1b8/0x27c
|  invoke_syscall.constprop.0+0x5c/0x104
|  do_el0_svc+0x58/0x160
|  el0_svc+0x38/0x110
|  el0t_64_sync_handler+0xc0/0xc4
|  el0t_64_sync+0x190/0x194

A previous patch exposed module_init_layout_section(), use that so the
logic is the same.

Reported-by: Adam Johnston &lt;adam.johnston@arm.com&gt;
Tested-by: Adam Johnston &lt;adam.johnston@arm.com&gt;
Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: &lt;stable@vger.kernel.org&gt; # 5.15.x: 60a0aab7463ee69 arm64: module-plts: inline linux/moduleloader.h
Cc: &lt;stable@vger.kernel.org&gt; # 5.15.x
Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Today module_frob_arch_sections() spots init sections from their
'init' prefix, and uses this to keep the init PLTs separate from the rest.

module_emit_plt_entry() uses within_module_init() to determine if a
location is in the init text or not, but this depends on whether
core code thought this was an init section.

Naturally the logic is different.

module_init_layout_section() groups the init and exit text together if
module unloading is disabled, as the exit code will never run. The result
is kernels with this configuration can't load all their modules because
there are not enough PLTs for the combined init+exit section.

This results in the following:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
|  module_emit_plt_entry+0x184/0x1cc
|  apply_relocate_add+0x2bc/0x8e4
|  load_module+0xe34/0x1bd4
|  init_module_from_file+0x84/0xc0
|  __arm64_sys_finit_module+0x1b8/0x27c
|  invoke_syscall.constprop.0+0x5c/0x104
|  do_el0_svc+0x58/0x160
|  el0_svc+0x38/0x110
|  el0t_64_sync_handler+0xc0/0xc4
|  el0t_64_sync+0x190/0x194

A previous patch exposed module_init_layout_section(), use that so the
logic is the same.

Reported-by: Adam Johnston &lt;adam.johnston@arm.com&gt;
Tested-by: Adam Johnston &lt;adam.johnston@arm.com&gt;
Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: &lt;stable@vger.kernel.org&gt; # 5.15.x: 60a0aab7463ee69 arm64: module-plts: inline linux/moduleloader.h
Cc: &lt;stable@vger.kernel.org&gt; # 5.15.x
Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Acked-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: module-plts: inline linux/moduleloader.h</title>
<updated>2023-05-25T16:44:02+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2023-05-16T16:06:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=60a0aab7463ee69296692d980b96510ccce3934e'/>
<id>60a0aab7463ee69296692d980b96510ccce3934e</id>
<content type='text'>
module_frob_arch_sections() is declared in moduleloader.h, but
that is not included before the definition:

arch/arm64/kernel/module-plts.c:286:5: error: no previous prototype for 'module_frob_arch_sections' [-Werror=missing-prototypes]

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20230516160642.523862-11-arnd@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
module_frob_arch_sections() is declared in moduleloader.h, but
that is not included before the definition:

arch/arm64/kernel/module-plts.c:286:5: error: no previous prototype for 'module_frob_arch_sections' [-Werror=missing-prototypes]

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: https://lore.kernel.org/r/20230516160642.523862-11-arnd@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>module: replace module_layout with module_memory</title>
<updated>2023-03-09T20:55:15+00:00</updated>
<author>
<name>Song Liu</name>
<email>song@kernel.org</email>
</author>
<published>2023-02-07T00:28:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ac3b43283923440900b4f36ca5f9f0b1ca43b70e'/>
<id>ac3b43283923440900b4f36ca5f9f0b1ca43b70e</id>
<content type='text'>
module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for some reasons:

1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (and not break strict rwx).
3. Many archs uses module_layout for arch-specific data, but it is not
   obvious how these data are used (are they RO, RX, or RW?)

Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:

        MOD_TEXT,
        MOD_DATA,
        MOD_RODATA,
        MOD_RO_AFTER_INIT,
        MOD_INIT_TEXT,
        MOD_INIT_DATA,
        MOD_INIT_RODATA,

and allocating them separately. This adds slightly more entries to
mod_tree (from up to 3 entries per module, to up to 7 entries per
module). However, this at most adds a small constant overhead to
__module_address(), which is expected to be fast.

Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.

module_memory simplifies quite some of the module code. For example,
ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
different allocator for the data. kernel/module/strict_rwx.c is also
much cleaner with module_memory.

Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Guenter Roeck &lt;linux@roeck-us.net&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for some reasons:

1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (and not break strict rwx).
3. Many archs uses module_layout for arch-specific data, but it is not
   obvious how these data are used (are they RO, RX, or RW?)

Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:

        MOD_TEXT,
        MOD_DATA,
        MOD_RODATA,
        MOD_RO_AFTER_INIT,
        MOD_INIT_TEXT,
        MOD_INIT_DATA,
        MOD_INIT_RODATA,

and allocating them separately. This adds slightly more entries to
mod_tree (from up to 3 entries per module, to up to 7 entries per
module). However, this at most adds a small constant overhead to
__module_address(), which is expected to be fast.

Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.

module_memory simplifies quite some of the module code. For example,
ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
different allocator for the data. kernel/module/strict_rwx.c is also
much cleaner with module_memory.

Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Guenter Roeck &lt;linux@roeck-us.net&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: module: Make plt_equals_entry() static</title>
<updated>2022-09-29T16:47:18+00:00</updated>
<author>
<name>Li Huafei</name>
<email>lihuafei1@huawei.com</email>
</author>
<published>2022-09-29T09:41:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3fb420f56cbf06b6ef8c39b886fed8f8f9aedee3'/>
<id>3fb420f56cbf06b6ef8c39b886fed8f8f9aedee3</id>
<content type='text'>
Since commit 4e69ecf4da1e ("arm64/module: ftrace: deal with place
relative nature of PLTs"), plt_equals_entry() is not used outside of
module-plts.c, so make it static.

Signed-off-by: Li Huafei &lt;lihuafei1@huawei.com&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20220929094134.99512-2-lihuafei1@huawei.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 4e69ecf4da1e ("arm64/module: ftrace: deal with place
relative nature of PLTs"), plt_equals_entry() is not used outside of
module-plts.c, so make it static.

Signed-off-by: Li Huafei &lt;lihuafei1@huawei.com&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lore.kernel.org/r/20220929094134.99512-2-lihuafei1@huawei.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: fix typos in comments</title>
<updated>2022-04-04T09:32:50+00:00</updated>
<author>
<name>Julia Lawall</name>
<email>Julia.Lawall@inria.fr</email>
</author>
<published>2022-03-18T10:37:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dd671f16b1cdb188aa64d740a408f7d00e281444'/>
<id>dd671f16b1cdb188aa64d740a408f7d00e281444</id>
<content type='text'>
Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall &lt;Julia.Lawall@inria.fr&gt;
Link: https://lore.kernel.org/r/20220318103729.157574-10-Julia.Lawall@inria.fr
[will: Squashed in 20220318103729.157574-28-Julia.Lawall@inria.fr]
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall &lt;Julia.Lawall@inria.fr&gt;
Link: https://lore.kernel.org/r/20220318103729.157574-10-Julia.Lawall@inria.fr
[will: Squashed in 20220318103729.157574-28-Julia.Lawall@inria.fr]
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: improve whitespace</title>
<updated>2021-02-04T13:59:49+00:00</updated>
<author>
<name>Zhiyuan Dai</name>
<email>daizhiyuan@phytium.com.cn</email>
</author>
<published>2021-02-04T01:43:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d9f1b52afa4012974b3c726ca89ae311f194e83f'/>
<id>d9f1b52afa4012974b3c726ca89ae311f194e83f</id>
<content type='text'>
In a few places we don't have whitespace between macro parameters,
which makes them hard to read. This patch adds whitespace to clearly
separate the parameters.

In a few places we have unnecessary whitespace around unary operators,
which is confusing, This patch removes the unnecessary whitespace.

Signed-off-by: Zhiyuan Dai &lt;daizhiyuan@phytium.com.cn&gt;
Link: https://lore.kernel.org/r/1612403029-5011-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a few places we don't have whitespace between macro parameters,
which makes them hard to read. This patch adds whitespace to clearly
separate the parameters.

In a few places we have unnecessary whitespace around unary operators,
which is confusing, This patch removes the unnecessary whitespace.

Signed-off-by: Zhiyuan Dai &lt;daizhiyuan@phytium.com.cn&gt;
Link: https://lore.kernel.org/r/1612403029-5011-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE</title>
<updated>2020-09-02T07:35:33+00:00</updated>
<author>
<name>Jessica Yu</name>
<email>jeyu@kernel.org</email>
</author>
<published>2020-09-01T16:00:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e0328feda79d9681b3e3245e6e180295550c8ee9'/>
<id>e0328feda79d9681b3e3245e6e180295550c8ee9</id>
<content type='text'>
In the arm64 module linker script, the section .text.ftrace_trampoline
is specified unconditionally regardless of whether CONFIG_DYNAMIC_FTRACE
is enabled (this is simply due to the limitation that module linker
scripts are not preprocessed like the vmlinux one).

Normally, for .plt and .text.ftrace_trampoline, the section flags
present in the module binary wouldn't matter since module_frob_arch_sections()
would assign them manually anyway. However, the arm64 module loader only
sets the section flags for .text.ftrace_trampoline when CONFIG_DYNAMIC_FTRACE=y.
That's only become problematic recently due to a recent change in
binutils-2.35, where the .text.ftrace_trampoline section (along with the
.plt section) is now marked writable and executable (WAX).

We no longer allow writable and executable sections to be loaded due to
commit 5c3a7db0c7ec ("module: Harden STRICT_MODULE_RWX"), so this is
causing all modules linked with binutils-2.35 to be rejected under arm64.
Drop the IS_ENABLED(CONFIG_DYNAMIC_FTRACE) check in module_frob_arch_sections()
so that the section flags for .text.ftrace_trampoline get properly set to
SHF_EXECINSTR|SHF_ALLOC, without SHF_WRITE.

Signed-off-by: Jessica Yu &lt;jeyu@kernel.org&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: http://lore.kernel.org/r/20200831094651.GA16385@linux-8ccs
Link: https://lore.kernel.org/r/20200901160016.3646-1-jeyu@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the arm64 module linker script, the section .text.ftrace_trampoline
is specified unconditionally regardless of whether CONFIG_DYNAMIC_FTRACE
is enabled (this is simply due to the limitation that module linker
scripts are not preprocessed like the vmlinux one).

Normally, for .plt and .text.ftrace_trampoline, the section flags
present in the module binary wouldn't matter since module_frob_arch_sections()
would assign them manually anyway. However, the arm64 module loader only
sets the section flags for .text.ftrace_trampoline when CONFIG_DYNAMIC_FTRACE=y.
That's only become problematic recently due to a recent change in
binutils-2.35, where the .text.ftrace_trampoline section (along with the
.plt section) is now marked writable and executable (WAX).

We no longer allow writable and executable sections to be loaded due to
commit 5c3a7db0c7ec ("module: Harden STRICT_MODULE_RWX"), so this is
causing all modules linked with binutils-2.35 to be rejected under arm64.
Drop the IS_ENABLED(CONFIG_DYNAMIC_FTRACE) check in module_frob_arch_sections()
so that the section flags for .text.ftrace_trampoline get properly set to
SHF_EXECINSTR|SHF_ALLOC, without SHF_WRITE.

Signed-off-by: Jessica Yu &lt;jeyu@kernel.org&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Link: http://lore.kernel.org/r/20200831094651.GA16385@linux-8ccs
Link: https://lore.kernel.org/r/20200901160016.3646-1-jeyu@kernel.org
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64/module: Optimize module load time by optimizing PLT counting</title>
<updated>2020-07-02T11:17:13+00:00</updated>
<author>
<name>Saravana Kannan</name>
<email>saravanak@google.com</email>
</author>
<published>2020-06-23T01:18:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d4e0340919fb9190a57e879fb3125c4acce0d9b2'/>
<id>d4e0340919fb9190a57e879fb3125c4acce0d9b2</id>
<content type='text'>
When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs that'll be needed to handle all the RELAs. While
doing this, it tries to dedupe PLT allocations for multiple
R_AARCH64_CALL26 relocations to the same symbol. It does the same for
R_AARCH64_JUMP26 relocations.

To make checks for duplicates easier/faster, it sorts the relocation
list by type, symbol and addend. That way, to check for a duplicate
relocation, it just needs to compare with the previous entry.

However, sorting the entire relocation array is unnecessary and
expensive (O(n log n)) because there are a lot of other relocation types
that don't need deduping or can't be deduped.

So this commit partitions the array into entries that need deduping and
those that don't. And then sorts just the part that needs deduping. And
when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
if it's disabled.

This gives significant reduction in module load time for modules with
large number of relocations with no measurable impact on modules with a
small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
enabled, these were the results for a few downstream modules:

Module		Size (MB)
wlan		14
video codec	3.8
drm		1.8
IPA		2.5
audio		1.2
gpu		1.8

Without this patch:
Module		Number of entries sorted	Module load time (ms)
wlan		243739				283
video codec	74029				138
drm		53837				67
IPA		42800				90
audio		21326				27
gpu		20967				32

Total time to load all these module: 637 ms

With this patch:
Module		Number of entries sorted	Module load time (ms)
wlan		22454				61
video codec	10150				47
drm		13014				40
IPA		8097				63
audio		4606				16
gpu		6527				20

Total time to load all these modules: 247

Time saved during boot for just these 6 modules: 390 ms

Signed-off-by: Saravana Kannan &lt;saravanak@google.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Link: https://lore.kernel.org/r/20200623011803.91232-1-saravanak@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs that'll be needed to handle all the RELAs. While
doing this, it tries to dedupe PLT allocations for multiple
R_AARCH64_CALL26 relocations to the same symbol. It does the same for
R_AARCH64_JUMP26 relocations.

To make checks for duplicates easier/faster, it sorts the relocation
list by type, symbol and addend. That way, to check for a duplicate
relocation, it just needs to compare with the previous entry.

However, sorting the entire relocation array is unnecessary and
expensive (O(n log n)) because there are a lot of other relocation types
that don't need deduping or can't be deduped.

So this commit partitions the array into entries that need deduping and
those that don't. And then sorts just the part that needs deduping. And
when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
if it's disabled.

This gives significant reduction in module load time for modules with
large number of relocations with no measurable impact on modules with a
small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
enabled, these were the results for a few downstream modules:

Module		Size (MB)
wlan		14
video codec	3.8
drm		1.8
IPA		2.5
audio		1.2
gpu		1.8

Without this patch:
Module		Number of entries sorted	Module load time (ms)
wlan		243739				283
video codec	74029				138
drm		53837				67
IPA		42800				90
audio		21326				27
gpu		20967				32

Total time to load all these module: 637 ms

With this patch:
Module		Number of entries sorted	Module load time (ms)
wlan		22454				61
video codec	10150				47
drm		13014				40
IPA		8097				63
audio		4606				16
gpu		6527				20

Total time to load all these modules: 247

Time saved during boot for just these 6 modules: 390 ms

Signed-off-by: Saravana Kannan &lt;saravanak@google.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Link: https://lore.kernel.org/r/20200623011803.91232-1-saravanak@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: implement ftrace with regs</title>
<updated>2019-11-06T14:17:35+00:00</updated>
<author>
<name>Torsten Duwe</name>
<email>duwe@lst.de</email>
</author>
<published>2019-02-08T15:10:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3b23e4991fb66f6d152f9055ede271a726ef9f21'/>
<id>3b23e4991fb66f6d152f9055ede271a726ef9f21</id>
<content type='text'>
This patch implements FTRACE_WITH_REGS for arm64, which allows a traced
function's arguments (and some other registers) to be captured into a
struct pt_regs, allowing these to be inspected and/or modified. This is
a building block for live-patching, where a function's arguments may be
forwarded to another function. This is also necessary to enable ftrace
and in-kernel pointer authentication at the same time, as it allows the
LR value to be captured and adjusted prior to signing.

Using GCC's -fpatchable-function-entry=N option, we can have the
compiler insert a configurable number of NOPs between the function entry
point and the usual prologue. This also ensures functions are AAPCS
compliant (e.g. disabling inter-procedural register allocation).

For example, with -fpatchable-function-entry=2, GCC 8.1.0 compiles the
following:

| unsigned long bar(void);
|
| unsigned long foo(void)
| {
|         return bar() + 1;
| }

... to:

| &lt;foo&gt;:
|         nop
|         nop
|         stp     x29, x30, [sp, #-16]!
|         mov     x29, sp
|         bl      0 &lt;bar&gt;
|         add     x0, x0, #0x1
|         ldp     x29, x30, [sp], #16
|         ret

This patch builds the kernel with -fpatchable-function-entry=2,
prefixing each function with two NOPs. To trace a function, we replace
these NOPs with a sequence that saves the LR into a GPR, then calls an
ftrace entry assembly function which saves this and other relevant
registers:

| mov	x9, x30
| bl	&lt;ftrace-entry&gt;

Since patchable functions are AAPCS compliant (and the kernel does not
use x18 as a platform register), x9-x18 can be safely clobbered in the
patched sequence and the ftrace entry code.

There are now two ftrace entry functions, ftrace_regs_entry (which saves
all GPRs), and ftrace_entry (which saves the bare minimum). A PLT is
allocated for each within modules.

Signed-off-by: Torsten Duwe &lt;duwe@suse.de&gt;
[Mark: rework asm, comments, PLTs, initialization, commit message]
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Reviewed-by: Amit Daniel Kachhap &lt;amit.kachhap@arm.com&gt;
Reviewed-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Reviewed-by: Torsten Duwe &lt;duwe@suse.de&gt;
Tested-by: Amit Daniel Kachhap &lt;amit.kachhap@arm.com&gt;
Tested-by: Torsten Duwe &lt;duwe@suse.de&gt;
Cc: AKASHI Takahiro &lt;takahiro.akashi@linaro.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Julien Thierry &lt;jthierry@redhat.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch implements FTRACE_WITH_REGS for arm64, which allows a traced
function's arguments (and some other registers) to be captured into a
struct pt_regs, allowing these to be inspected and/or modified. This is
a building block for live-patching, where a function's arguments may be
forwarded to another function. This is also necessary to enable ftrace
and in-kernel pointer authentication at the same time, as it allows the
LR value to be captured and adjusted prior to signing.

Using GCC's -fpatchable-function-entry=N option, we can have the
compiler insert a configurable number of NOPs between the function entry
point and the usual prologue. This also ensures functions are AAPCS
compliant (e.g. disabling inter-procedural register allocation).

For example, with -fpatchable-function-entry=2, GCC 8.1.0 compiles the
following:

| unsigned long bar(void);
|
| unsigned long foo(void)
| {
|         return bar() + 1;
| }

... to:

| &lt;foo&gt;:
|         nop
|         nop
|         stp     x29, x30, [sp, #-16]!
|         mov     x29, sp
|         bl      0 &lt;bar&gt;
|         add     x0, x0, #0x1
|         ldp     x29, x30, [sp], #16
|         ret

This patch builds the kernel with -fpatchable-function-entry=2,
prefixing each function with two NOPs. To trace a function, we replace
these NOPs with a sequence that saves the LR into a GPR, then calls an
ftrace entry assembly function which saves this and other relevant
registers:

| mov	x9, x30
| bl	&lt;ftrace-entry&gt;

Since patchable functions are AAPCS compliant (and the kernel does not
use x18 as a platform register), x9-x18 can be safely clobbered in the
patched sequence and the ftrace entry code.

There are now two ftrace entry functions, ftrace_regs_entry (which saves
all GPRs), and ftrace_entry (which saves the bare minimum). A PLT is
allocated for each within modules.

Signed-off-by: Torsten Duwe &lt;duwe@suse.de&gt;
[Mark: rework asm, comments, PLTs, initialization, commit message]
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Reviewed-by: Amit Daniel Kachhap &lt;amit.kachhap@arm.com&gt;
Reviewed-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Reviewed-by: Torsten Duwe &lt;duwe@suse.de&gt;
Tested-by: Amit Daniel Kachhap &lt;amit.kachhap@arm.com&gt;
Tested-by: Torsten Duwe &lt;duwe@suse.de&gt;
Cc: AKASHI Takahiro &lt;takahiro.akashi@linaro.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Julien Thierry &lt;jthierry@redhat.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: Replace strncmp with str_has_prefix</title>
<updated>2019-08-05T10:06:34+00:00</updated>
<author>
<name>Chuhong Yuan</name>
<email>hslester96@gmail.com</email>
</author>
<published>2019-07-30T02:44:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b3e089cd446b26bb1e12860e1afb9da314453fd6'/>
<id>b3e089cd446b26bb1e12860e1afb9da314453fd6</id>
<content type='text'>
In commit b6b2735514bc
("tracing: Use str_has_prefix() instead of using fixed sizes")
the newly introduced str_has_prefix() was used
to replace error-prone strncmp(str, const, len).
Here fix codes with the same pattern.

Signed-off-by: Chuhong Yuan &lt;hslester96@gmail.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In commit b6b2735514bc
("tracing: Use str_has_prefix() instead of using fixed sizes")
the newly introduced str_has_prefix() was used
to replace error-prone strncmp(str, const, len).
Here fix codes with the same pattern.

Signed-off-by: Chuhong Yuan &lt;hslester96@gmail.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
