<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/arch/mips/include/asm, branch v4.19.232</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>MIPS: Octeon: Fix build errors using clang</title>
<updated>2022-01-27T08:04:29+00:00</updated>
<author>
<name>Tianjia Zhang</name>
<email>tianjia.zhang@linux.alibaba.com</email>
</author>
<published>2021-12-16T09:50:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0734d421e01e9cc81d0c768b80b20500539dfc9a'/>
<id>0734d421e01e9cc81d0c768b80b20500539dfc9a</id>
<content type='text'>
[ Upstream commit 95339b70677dc6f9a2d669c4716058e71b8dc1c7 ]

A large number of the following errors is reported when compiling
with clang:

  cvmx-bootinfo.h:326:3: error: adding 'int' to a string does not append to the string [-Werror,-Wstring-plus-int]
                  ENUM_BRD_TYPE_CASE(CVMX_BOARD_TYPE_NULL)
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cvmx-bootinfo.h:321:20: note: expanded from macro 'ENUM_BRD_TYPE_CASE'
          case x: return(#x + 16);        /* Skip CVMX_BOARD_TYPE_ */
                         ~~~^~~~
  cvmx-bootinfo.h:326:3: note: use array indexing to silence this warning
  cvmx-bootinfo.h:321:20: note: expanded from macro 'ENUM_BRD_TYPE_CASE'
          case x: return(#x + 16);        /* Skip CVMX_BOARD_TYPE_ */
                          ^

Follow the prompts to use the address operator '&amp;' to fix this error.

Signed-off-by: Tianjia Zhang &lt;tianjia.zhang@linux.alibaba.com&gt;
Reviewed-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Reviewed-by: Philippe Mathieu-Daudé &lt;f4bug@amsat.org&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 95339b70677dc6f9a2d669c4716058e71b8dc1c7 ]

A large number of the following errors is reported when compiling
with clang:

  cvmx-bootinfo.h:326:3: error: adding 'int' to a string does not append to the string [-Werror,-Wstring-plus-int]
                  ENUM_BRD_TYPE_CASE(CVMX_BOARD_TYPE_NULL)
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cvmx-bootinfo.h:321:20: note: expanded from macro 'ENUM_BRD_TYPE_CASE'
          case x: return(#x + 16);        /* Skip CVMX_BOARD_TYPE_ */
                         ~~~^~~~
  cvmx-bootinfo.h:326:3: note: use array indexing to silence this warning
  cvmx-bootinfo.h:321:20: note: expanded from macro 'ENUM_BRD_TYPE_CASE'
          case x: return(#x + 16);        /* Skip CVMX_BOARD_TYPE_ */
                          ^

Follow the prompts to use the address operator '&amp;' to fix this error.

Signed-off-by: Tianjia Zhang &lt;tianjia.zhang@linux.alibaba.com&gt;
Reviewed-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Reviewed-by: Philippe Mathieu-Daudé &lt;f4bug@amsat.org&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mips: cm: Convert to bitfield API to fix out-of-bounds access</title>
<updated>2021-11-26T10:36:14+00:00</updated>
<author>
<name>Geert Uytterhoeven</name>
<email>geert+renesas@glider.be</email>
</author>
<published>2021-10-29T09:58:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=96be0f7cadae2478dcd4ffa851a98928d45347bf'/>
<id>96be0f7cadae2478dcd4ffa851a98928d45347bf</id>
<content type='text'>
[ Upstream commit 18b8f5b6fc53d097cadb94a93d8d6566ba88e389 ]

mips_cm_error_report() extracts the cause and other cause from the error
register using shifts.  This works fine for the former, as it is stored
in the top bits, and the shift will thus remove all non-related bits.
However, the latter is stored in the bottom bits, hence thus needs masking
to get rid of non-related bits.  Without such masking, using it as an
index into the cm2_causes[] array will lead to an out-of-bounds access,
probably causing a crash.

Fix this by using FIELD_GET() instead.  Bite the bullet and convert all
MIPS CM handling to the bitfield API, to improve readability and safety.

Fixes: 3885c2b463f6a236 ("MIPS: CM: Add support for reporting CM cache errors")
Signed-off-by: Geert Uytterhoeven &lt;geert+renesas@glider.be&gt;
Reviewed-by: Jiaxun Yang &lt;jiaxun.yang@flygoat.com&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 18b8f5b6fc53d097cadb94a93d8d6566ba88e389 ]

mips_cm_error_report() extracts the cause and other cause from the error
register using shifts.  This works fine for the former, as it is stored
in the top bits, and the shift will thus remove all non-related bits.
However, the latter is stored in the bottom bits, hence thus needs masking
to get rid of non-related bits.  Without such masking, using it as an
index into the cm2_causes[] array will lead to an out-of-bounds access,
probably causing a crash.

Fix this by using FIELD_GET() instead.  Bite the bullet and convert all
MIPS CM handling to the bitfield API, to improve readability and safety.

Fixes: 3885c2b463f6a236 ("MIPS: CM: Add support for reporting CM cache errors")
Signed-off-by: Geert Uytterhoeven &lt;geert+renesas@glider.be&gt;
Reviewed-by: Jiaxun Yang &lt;jiaxun.yang@flygoat.com&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed</title>
<updated>2021-11-06T12:58:45+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2020-11-11T16:52:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d4fe42d646f277dfbc8a6cbc82bc4c8a12dd7798'/>
<id>d4fe42d646f277dfbc8a6cbc82bc4c8a12dd7798</id>
<content type='text'>
commit cef397038167ac15d085914493d6c86385773709 upstream.

Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = a27bd01c
  [00000000] *pgd=236a0003, *pmd=1ffa64003
  Internal error: Oops: 207 [#1] SMP ARM
  Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
  CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
  Hardware name: BCM2711
  PC is at zs_map_object+0x94/0x338
  LR is at zram_bvec_rw.constprop.0+0x330/0xa64
  pc : [&lt;c0602b38&gt;]    lr : [&lt;c0bda6a0&gt;]    psr: 60000013
  sp : e376bbe0  ip : 00000000  fp : c1e2921c
  r10: 00000002  r9 : c1dda730  r8 : 00000000
  r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
  r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
  Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
  Stack: (0xe376bbe0 to 0xe376c000)

As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.

After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.

I looked up the possible number of bits in source code and
datasheets, here is what I found:

 - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
 - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
   support more than 32 bits, even though supersections in theory allow
   up to 40 bits as well.
 - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
   XPA supports up to 60 bits in theory, but 40 bits are more than
   anyone will ever ship
 - On PowerPC, there are three different implementations of 36 bit
   addressing, but 32-bit is used without CONFIG_PTE_64BIT
 - On RISC-V, the normal page table format can support 34 bit
   addressing. There is no highmem support on RISC-V, so anything
   above 2GB is unused, but it might be useful to eventually support
   CONFIG_ZRAM for high pages.

Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Reviewed-by: Stefan Agner &lt;stefan@agner.ch&gt;
Tested-by: Stefan Agner &lt;stefan@agner.ch&gt;
Acked-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
[florian: patch arch/powerpc/include/asm/pte-common.h for 4.19.y]
Signed-off-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit cef397038167ac15d085914493d6c86385773709 upstream.

Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = a27bd01c
  [00000000] *pgd=236a0003, *pmd=1ffa64003
  Internal error: Oops: 207 [#1] SMP ARM
  Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
  CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
  Hardware name: BCM2711
  PC is at zs_map_object+0x94/0x338
  LR is at zram_bvec_rw.constprop.0+0x330/0xa64
  pc : [&lt;c0602b38&gt;]    lr : [&lt;c0bda6a0&gt;]    psr: 60000013
  sp : e376bbe0  ip : 00000000  fp : c1e2921c
  r10: 00000002  r9 : c1dda730  r8 : 00000000
  r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
  r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
  Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
  Stack: (0xe376bbe0 to 0xe376c000)

As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.

After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.

I looked up the possible number of bits in source code and
datasheets, here is what I found:

 - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
 - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
   support more than 32 bits, even though supersections in theory allow
   up to 40 bits as well.
 - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
   XPA supports up to 60 bits in theory, but 40 bits are more than
   anyone will ever ship
 - On PowerPC, there are three different implementations of 36 bit
   addressing, but 32-bit is used without CONFIG_PTE_64BIT
 - On RISC-V, the normal page table format can support 34 bit
   addressing. There is no highmem support on RISC-V, so anything
   above 2GB is unused, but it might be useful to eventually support
   CONFIG_ZRAM for high pages.

Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Reviewed-by: Stefan Agner &lt;stefan@agner.ch&gt;
Tested-by: Stefan Agner &lt;stefan@agner.ch&gt;
Acked-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
[florian: patch arch/powerpc/include/asm/pte-common.h for 4.19.y]
Signed-off-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "MIPS: add PMD table accounting into MIPS'pmd_alloc_one"</title>
<updated>2021-07-28T09:13:49+00:00</updated>
<author>
<name>Huang Pei</name>
<email>huangpei@loongson.cn</email>
</author>
<published>2021-07-26T07:26:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2e207001ef2446dc282a14f4fabc10dd135fc1d6'/>
<id>2e207001ef2446dc282a14f4fabc10dd135fc1d6</id>
<content type='text'>
This reverts commit 68046cc531577b8f0ebe67ccf18b9c70106d7937 which is
commit ed914d48b6a1040d1039d371b56273d422c0081e upstream.

Commit b2b29d6d011944 (mm: account PMD tables like PTE tables) is
introduced between v5.9 and v5.10, so this fix (commit 002d8b395fa1)
should NOT apply to any pre-5.10 branch.

Signed-off-by: Huang Pei &lt;huangpei@loongson.cn&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 68046cc531577b8f0ebe67ccf18b9c70106d7937 which is
commit ed914d48b6a1040d1039d371b56273d422c0081e upstream.

Commit b2b29d6d011944 (mm: account PMD tables like PTE tables) is
introduced between v5.9 and v5.10, so this fix (commit 002d8b395fa1)
should NOT apply to any pre-5.10 branch.

Signed-off-by: Huang Pei &lt;huangpei@loongson.cn&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>MIPS: set mips32r5 for virt extensions</title>
<updated>2021-07-20T14:16:03+00:00</updated>
<author>
<name>Nick Desaulniers</name>
<email>ndesaulniers@google.com</email>
</author>
<published>2021-06-28T21:50:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ebfe50a690080e7fcccff503f4fdc8af120d3368'/>
<id>ebfe50a690080e7fcccff503f4fdc8af120d3368</id>
<content type='text'>
[ Upstream commit c994a3ec7ecc8bd2a837b2061e8a76eb8efc082b ]

Clang's integrated assembler only accepts these instructions when the
cpu is set to mips32r5. With this change, we can assemble
malta_defconfig with Clang via `make LLVM_IAS=1`.

Link: https://github.com/ClangBuiltLinux/linux/issues/763
Reported-by: Dmitry Golovin &lt;dima@golovin.in&gt;
Signed-off-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit c994a3ec7ecc8bd2a837b2061e8a76eb8efc082b ]

Clang's integrated assembler only accepts these instructions when the
cpu is set to mips32r5. With this change, we can assemble
malta_defconfig with Clang via `make LLVM_IAS=1`.

Link: https://github.com/ClangBuiltLinux/linux/issues/763
Reported-by: Dmitry Golovin &lt;dima@golovin.in&gt;
Signed-off-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>MIPS: add PMD table accounting into MIPS'pmd_alloc_one</title>
<updated>2021-07-20T14:16:01+00:00</updated>
<author>
<name>Huang Pei</name>
<email>huangpei@loongson.cn</email>
</author>
<published>2021-06-11T07:09:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=68046cc531577b8f0ebe67ccf18b9c70106d7937'/>
<id>68046cc531577b8f0ebe67ccf18b9c70106d7937</id>
<content type='text'>
[ Upstream commit ed914d48b6a1040d1039d371b56273d422c0081e ]

This fixes Page Table accounting bug.

MIPS is the ONLY arch just defining __HAVE_ARCH_PMD_ALLOC_ONE alone.
Since commit b2b29d6d011944 (mm: account PMD tables like PTE tables),
"pmd_free" in asm-generic with PMD table accounting and "pmd_alloc_one"
in MIPS without PMD table accounting causes PageTable accounting number
negative, which read by global_zone_page_state(), always returns 0.

Signed-off-by: Huang Pei &lt;huangpei@loongson.cn&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ed914d48b6a1040d1039d371b56273d422c0081e ]

This fixes Page Table accounting bug.

MIPS is the ONLY arch just defining __HAVE_ARCH_PMD_ALLOC_ONE alone.
Since commit b2b29d6d011944 (mm: account PMD tables like PTE tables),
"pmd_free" in asm-generic with PMD table accounting and "pmd_alloc_one"
in MIPS without PMD table accounting causes PageTable accounting number
negative, which read by global_zone_page_state(), always returns 0.

Signed-off-by: Huang Pei &lt;huangpei@loongson.cn&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hugetlb: clear huge pte during flush function on mips platform</title>
<updated>2021-07-20T14:15:59+00:00</updated>
<author>
<name>Bibo Mao</name>
<email>maobibo@loongson.cn</email>
</author>
<published>2020-06-29T13:15:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=909bfdecf39843d1e42a9480a9c3ed49f7579707'/>
<id>909bfdecf39843d1e42a9480a9c3ed49f7579707</id>
<content type='text'>
[ Upstream commit 33ae8f801ad8bec48e886d368739feb2816478f2 ]

If multiple threads are accessing the same huge page at the same
time, hugetlb_cow will be called if one thread write the COW huge
page. And function huge_ptep_clear_flush is called to notify other
threads to clear the huge pte tlb entry. The other threads clear
the huge pte tlb entry and reload it from page table, the reload
huge pte entry may be old.

This patch fixes this issue on mips platform, and it clears huge
pte entry before notifying other threads to flush current huge
page entry, it is similar with other architectures.

Signed-off-by: Bibo Mao &lt;maobibo@loongson.cn&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 33ae8f801ad8bec48e886d368739feb2816478f2 ]

If multiple threads are accessing the same huge page at the same
time, hugetlb_cow will be called if one thread write the COW huge
page. And function huge_ptep_clear_flush is called to notify other
threads to clear the huge pte tlb entry. The other threads clear
the huge pte tlb entry and reload it from page table, the reload
huge pte entry may be old.

This patch fixes this issue on mips platform, and it clears huge
pte entry before notifying other threads to flush current huge
page entry, it is similar with other architectures.

Signed-off-by: Bibo Mao &lt;maobibo@loongson.cn&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>MIPS: Avoid handcoded DIVU in `__div64_32' altogether</title>
<updated>2021-05-22T08:59:47+00:00</updated>
<author>
<name>Maciej W. Rozycki</name>
<email>macro@orcam.me.uk</email>
</author>
<published>2021-04-22T20:36:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a5e67c081d953b75a037802ec04a04c6a205efeb'/>
<id>a5e67c081d953b75a037802ec04a04c6a205efeb</id>
<content type='text'>
commit 25ab14cbe9d1b66fda44c71a2db7582a31b6f5cd upstream.

Remove the inline asm with a DIVU instruction from `__div64_32' and use
plain C code for the intended DIVMOD calculation instead.  GCC is smart
enough to know that both the quotient and the remainder are calculated
with single DIVU, so with ISAs up to R5 the same instruction is actually
produced with overall similar code.

For R6 compiled code will work, but separate DIVU and MODU instructions
will be produced, which are also interlocked, so scalar implementations
will likely not perform as well as older ISAs with their asynchronous MD
unit.  Likely still faster then the generic algorithm though.

This removes a compilation error for R6 however where the original DIVU
instruction is not supported anymore and the MDU accumulator registers
have been removed and consequently GCC complains as to a constraint it
cannot find a register for:

In file included from ./include/linux/math.h:5,
                 from ./include/linux/kernel.h:13,
                 from mm/page-writeback.c:15:
./include/linux/math64.h: In function 'div_u64_rem':
./arch/mips/include/asm/div64.h:76:17: error: inconsistent operand constraints in an 'asm'
   76 |                 __asm__("divu   $0, %z1, %z2"                           \
      |                 ^~~~~~~
./include/asm-generic/div64.h:245:25: note: in expansion of macro '__div64_32'
  245 |                 __rem = __div64_32(&amp;(n), __base);       \
      |                         ^~~~~~~~~~
./include/linux/math64.h:91:22: note: in expansion of macro 'do_div'
   91 |         *remainder = do_div(dividend, divisor);
      |                      ^~~~~~

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0404s from 1.0445s with R3400
@40MHz.  The module's MIPS I machine code has also shrunk by 12 bytes or
3 instructions.

Signed-off-by: Maciej W. Rozycki &lt;macro@orcam.me.uk&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 25ab14cbe9d1b66fda44c71a2db7582a31b6f5cd upstream.

Remove the inline asm with a DIVU instruction from `__div64_32' and use
plain C code for the intended DIVMOD calculation instead.  GCC is smart
enough to know that both the quotient and the remainder are calculated
with single DIVU, so with ISAs up to R5 the same instruction is actually
produced with overall similar code.

For R6 compiled code will work, but separate DIVU and MODU instructions
will be produced, which are also interlocked, so scalar implementations
will likely not perform as well as older ISAs with their asynchronous MD
unit.  Likely still faster then the generic algorithm though.

This removes a compilation error for R6 however where the original DIVU
instruction is not supported anymore and the MDU accumulator registers
have been removed and consequently GCC complains as to a constraint it
cannot find a register for:

In file included from ./include/linux/math.h:5,
                 from ./include/linux/kernel.h:13,
                 from mm/page-writeback.c:15:
./include/linux/math64.h: In function 'div_u64_rem':
./arch/mips/include/asm/div64.h:76:17: error: inconsistent operand constraints in an 'asm'
   76 |                 __asm__("divu   $0, %z1, %z2"                           \
      |                 ^~~~~~~
./include/asm-generic/div64.h:245:25: note: in expansion of macro '__div64_32'
  245 |                 __rem = __div64_32(&amp;(n), __base);       \
      |                         ^~~~~~~~~~
./include/linux/math64.h:91:22: note: in expansion of macro 'do_div'
   91 |         *remainder = do_div(dividend, divisor);
      |                      ^~~~~~

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0404s from 1.0445s with R3400
@40MHz.  The module's MIPS I machine code has also shrunk by 12 bytes or
3 instructions.

Signed-off-by: Maciej W. Rozycki &lt;macro@orcam.me.uk&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>MIPS: Avoid DIVU in `__div64_32' is result would be zero</title>
<updated>2021-05-22T08:59:47+00:00</updated>
<author>
<name>Maciej W. Rozycki</name>
<email>macro@orcam.me.uk</email>
</author>
<published>2021-04-20T02:50:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c70877c0fa032cc7efebe013d67827cdf9032dd8'/>
<id>c70877c0fa032cc7efebe013d67827cdf9032dd8</id>
<content type='text'>
commit c1d337d45ec0a802299688e17d568c4e3a585895 upstream.

We already check the high part of the divident against zero to avoid the
costly DIVU instruction in that case, needed to reduce the high part of
the divident, so we may well check against the divisor instead and set
the high part of the quotient to zero right away.  We need to treat the
high part the divident in that case though as the remainder that would
be calculated by the DIVU instruction we avoided.

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0445s and 0.2619s from 1.0668s
and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.

Signed-off-by: Maciej W. Rozycki &lt;macro@orcam.me.uk&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c1d337d45ec0a802299688e17d568c4e3a585895 upstream.

We already check the high part of the divident against zero to avoid the
costly DIVU instruction in that case, needed to reduce the high part of
the divident, so we may well check against the divisor instead and set
the high part of the quotient to zero right away.  We need to treat the
high part the divident in that case though as the remainder that would
be calculated by the DIVU instruction we avoided.

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0445s and 0.2619s from 1.0668s
and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.

Signed-off-by: Maciej W. Rozycki &lt;macro@orcam.me.uk&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>MIPS: Reinstate platform `__div64_32' handler</title>
<updated>2021-05-22T08:59:47+00:00</updated>
<author>
<name>Maciej W. Rozycki</name>
<email>macro@orcam.me.uk</email>
</author>
<published>2021-04-20T02:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1506af3076817306f61c64fd67b180746b14c7fe'/>
<id>1506af3076817306f61c64fd67b180746b14c7fe</id>
<content type='text'>
commit c49f71f60754acbff37505e1d16ca796bf8a8140 upstream.

Our current MIPS platform `__div64_32' handler is inactive, because it
is incorrectly only enabled for 64-bit configurations, for which generic
`do_div' code does not call it anyway.

The handler is not suitable for being called from there though as it
only calculates 32 bits of the quotient under the assumption the 64-bit
divident has been suitably reduced.  Code for such reduction used to be
there, however it has been incorrectly removed with commit c21004cd5b4c
("MIPS: Rewrite &lt;asm/div64.h&gt; to work with gcc 4.4.0."), which should
have only updated an obsoleted constraint for an inline asm involving
$hi and $lo register outputs, while possibly wiring the original MIPS
variant of the `do_div' macro as `__div64_32' handler for the generic
`do_div' implementation

Correct the handler as follows then:

- Revert most of the commit referred, however retaining the current
  formatting, except for the final two instructions of the inline asm
  sequence, which the original commit missed.  Omit the original 64-bit
  parts though.

- Rename the original `do_div' macro to `__div64_32'.  Use the combined
  `x' constraint referring to the MD accumulator as a whole, replacing
  the original individual `h' and `l' constraints used for $hi and $lo
  registers respectively, of which `h' has been obsoleted with GCC 4.4.
  Update surrounding code accordingly.

  We have since removed support for GCC versions before 4.9, so no need
  for a special arrangement here; GCC has supported the `x' constraint
  since forever anyway, or at least going back to 1991.

- Rename the `__base' local variable in `__div64_32' to `__radix' to
  avoid a conflict with a local variable in `do_div'.

- Actually enable this code for 32-bit rather than 64-bit configurations
  by qualifying it with BITS_PER_LONG being 32 instead of 64.  Include
  &lt;asm/bitsperlong.h&gt; for this macro rather than &lt;linux/types.h&gt; as we
  don't need anything else.

- Finally include &lt;asm-generic/div64.h&gt; last rather than first.

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0668s and 0.2629s from 2.1529s
and 0.5647s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.
For a reference 64-bit `do_div' code where we have the DDIVU instruction
available to do the whole calculation right away averages at 0.0660s for
the latter CPU.

Fixes: c21004cd5b4c ("MIPS: Rewrite &lt;asm/div64.h&gt; to work with gcc 4.4.0.")
Reported-by: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Signed-off-by: Maciej W. Rozycki &lt;macro@orcam.me.uk&gt;
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c49f71f60754acbff37505e1d16ca796bf8a8140 upstream.

Our current MIPS platform `__div64_32' handler is inactive, because it
is incorrectly only enabled for 64-bit configurations, for which generic
`do_div' code does not call it anyway.

The handler is not suitable for being called from there though as it
only calculates 32 bits of the quotient under the assumption the 64-bit
divident has been suitably reduced.  Code for such reduction used to be
there, however it has been incorrectly removed with commit c21004cd5b4c
("MIPS: Rewrite &lt;asm/div64.h&gt; to work with gcc 4.4.0."), which should
have only updated an obsoleted constraint for an inline asm involving
$hi and $lo register outputs, while possibly wiring the original MIPS
variant of the `do_div' macro as `__div64_32' handler for the generic
`do_div' implementation

Correct the handler as follows then:

- Revert most of the commit referred, however retaining the current
  formatting, except for the final two instructions of the inline asm
  sequence, which the original commit missed.  Omit the original 64-bit
  parts though.

- Rename the original `do_div' macro to `__div64_32'.  Use the combined
  `x' constraint referring to the MD accumulator as a whole, replacing
  the original individual `h' and `l' constraints used for $hi and $lo
  registers respectively, of which `h' has been obsoleted with GCC 4.4.
  Update surrounding code accordingly.

  We have since removed support for GCC versions before 4.9, so no need
  for a special arrangement here; GCC has supported the `x' constraint
  since forever anyway, or at least going back to 1991.

- Rename the `__base' local variable in `__div64_32' to `__radix' to
  avoid a conflict with a local variable in `do_div'.

- Actually enable this code for 32-bit rather than 64-bit configurations
  by qualifying it with BITS_PER_LONG being 32 instead of 64.  Include
  &lt;asm/bitsperlong.h&gt; for this macro rather than &lt;linux/types.h&gt; as we
  don't need anything else.

- Finally include &lt;asm-generic/div64.h&gt; last rather than first.

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0668s and 0.2629s from 2.1529s
and 0.5647s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.
For a reference 64-bit `do_div' code where we have the DDIVU instruction
available to do the whole calculation right away averages at 0.0660s for
the latter CPU.

Fixes: c21004cd5b4c ("MIPS: Rewrite &lt;asm/div64.h&gt; to work with gcc 4.4.0.")
Reported-by: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Signed-off-by: Maciej W. Rozycki &lt;macro@orcam.me.uk&gt;
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
