<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/arc/include/asm/cache.h, branch v4.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ARC: dma [non-IOC] setup SMP_CACHE_BYTES and cache_line_size</title>
<updated>2018-07-27T20:12:45+00:00</updated>
<author>
<name>Eugeniy Paltsev</name>
<email>Eugeniy.Paltsev@synopsys.com</email>
</author>
<published>2018-07-26T13:15:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=eb2777397fd83a4a7eaa26984d09d3babb845d2a'/>
<id>eb2777397fd83a4a7eaa26984d09d3babb845d2a</id>
<content type='text'>
As of today we don't set up SMP_CACHE_BYTES and cache_line_size for
ARC, so they are set to L1_CACHE_BYTES by default. The L1 line length
(L1_CACHE_BYTES) can easily be smaller than the L2 line (which is
usually the case, BTW). This breaks code.

For example, this breaks the ethernet infrastructure on HSDK/AXS103
boards with IOC disabled, which involves manual cache flushes.
Functions which allocate and manage the sk_buff packet data area rely
on the SMP_CACHE_BYTES define. As a result, the last L2 cache line of
the sk_buff linear packet data area can be shared between the DMA
buffer and useful data in another structure, so that data can be lost
when we invalidate the DMA buffer.

   sk_buff linear packet data area
                |
                |
                |         skb-&gt;end        skb-&gt;tail
                V            |                |
                             V                V
----------------------------------------------.
      packet data            | &lt;tail padding&gt; |  &lt;useful data in other struct&gt;
----------------------------------------------.

---------------------.--------------------------------------------------.
     SLC line        |             SLC (L2 cache) line (128B)           |
---------------------.--------------------------------------------------.
        ^                                     ^
        |                                     |
     These cache lines will be invalidated when we invalidate the skb
     linear packet data area before the DMA transaction starts.

This leads to issues which are painful to debug, as they reproduce
only if (sk_buff-&gt;end - sk_buff-&gt;tail) &lt; SLC_LINE_SIZE and
there is some useful data right after sk_buff-&gt;end.

Fix that by hardcoding SMP_CACHE_BYTES to the max line length we may have.
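
A minimal sketch of the idea in arch/arc/include/asm/cache.h (assuming
128 bytes is the largest line we may see, per the diagram above; not
the verbatim hunk):

	#define SMP_CACHE_BYTES		128		/* max line: 128B SLC */
	#define cache_line_size()	SMP_CACHE_BYTES	/* was L1_CACHE_BYTES */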

Signed-off-by: Eugeniy Paltsev &lt;Eugeniy.Paltsev@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARCv2: SLC: provide a line based flush routine for debugging</title>
<updated>2017-08-30T16:21:34+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2017-08-01T04:53:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ae0b63d97d8efc377cc5b161abccc6e3586b206f'/>
<id>ae0b63d97d8efc377cc5b161abccc6e3586b206f</id>
<content type='text'>
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARC: Hardcode ARCH_DMA_MINALIGN to max line length we may have</title>
<updated>2017-08-30T16:21:25+00:00</updated>
<author>
<name>Alexey Brodkin</name>
<email>alexey.brodkin@gmail.com</email>
</author>
<published>2017-07-18T14:31:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9f82e90a6668e522c7fd0e0322c52d86f29b624d'/>
<id>9f82e90a6668e522c7fd0e0322c52d86f29b624d</id>
<content type='text'>
The current implementation relies on the L1 line length, which might
easily be smaller than the L2 line (which is usually the case, BTW).

Imagine this typical case: the L2 line is 128 bytes while the L1 line
is 64 bytes. Now we want to allocate a small buffer and later use it
for DMA (consider that IOC is not available).

kmalloc() allocates a small KMALLOC_MIN_SIZE-sized, KMALLOC_MIN_SIZE-aligned
buffer. That way, if the buffer happens to be aligned to an L1 line but
not an L2 line, we'll be flushing and invalidating extra portions of
data from L2, which will cause cache coherency issues.

And since KMALLOC_MIN_SIZE is bound to ARCH_DMA_MINALIGN the fix could
be simple - set ARCH_DMA_MINALIGN to the largest cache line we may ever
get. As of today neither the L1 of ARC700 / ARC HS38 nor the SLC can be
longer than 128 bytes.
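
A sketch of what that amounts to in arch/arc/include/asm/cache.h
(assuming 128 bytes is indeed the largest possible line; not the exact
hunk):

	#define ARCH_DMA_MINALIGN	128	/* largest L1/SLC line we may get */

Since KMALLOC_MIN_SIZE is bound to ARCH_DMA_MINALIGN, kmalloc()ed DMA
buffers then never share an L2 line with unrelated data.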

Signed-off-by: Alexey Brodkin &lt;abrodkin@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses</title>
<updated>2017-08-04T08:26:34+00:00</updated>
<author>
<name>Alexey Brodkin</name>
<email>Alexey.Brodkin@synopsys.com</email>
</author>
<published>2017-08-01T09:58:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7d79cee2c6540ea64dd917a14e2fd63d4ac3d3c0'/>
<id>7d79cee2c6540ea64dd917a14e2fd63d4ac3d3c0</id>
<content type='text'>
It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1,
which hold the MSB bits of the physical addresses of the region start and
end respectively, otherwise the SLC region operation is executed in an
unpredictable manner.

Without this patch, SLC flushes on HSDK (IOC disabled) were taking
seconds.
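
A sketch of the idea in the SLC region op (register names follow the
message; the rest, including the non-*1 counterparts, is illustrative
rather than the exact hunk):

	/* set the MSB halves of the bounds too, not just the lower 32 bits */
	write_aux_reg(SLC_AUX_RGN_END1,   upper_32_bits(end));
	write_aux_reg(SLC_AUX_RGN_END,    lower_32_bits(end));
	write_aux_reg(SLC_AUX_RGN_START1, upper_32_bits(start));
	write_aux_reg(SLC_AUX_RGN_START,  lower_32_bits(start));

Without the *1 writes the MSBs keep whatever was programmed last, hence
the unpredictable behaviour on PAE40 systems.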

Cc: stable@vger.kernel.org   #4.4+
Reported-by: Vladimir Kondratiev &lt;vladimir.kondratiev@intel.com&gt;
Signed-off-by: Alexey Brodkin &lt;abrodkin@synopsys.com&gt;
Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
[vgupta: PAR40 regs only written if PAE40 exists]
</content>
</entry>
<entry>
<title>ARCv2: mm: micro-optimize region flush generated code</title>
<updated>2017-05-02T23:40:29+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2017-05-02T23:23:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f734a31083324b8f4f24b2c5cba178c7459db309'/>
<id>f734a31083324b8f4f24b2c5cba178c7459db309</id>
<content type='text'>
DC_CTRL.RGN_OP is 3 bits wide, however only 1 bit is used in the current
programming model (0: flush, 1: invalidate)

The current code, targeting all 3 bits, leads to an additional 8-byte AND
instruction which can be elided given that only 1 bit is ever set by
software and/or looked at by hardware
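
At the C level the change amounts to roughly this (a sketch; the
DC_CTRL_RGN_OP_* macro names are illustrative):

	unsigned int ctrl = read_aux_reg(ARC_REG_DC_CTRL);

	/* before: clearing the whole 3-bit field forces a long-immediate AND insn */
	/*	ctrl &amp;= ~DC_CTRL_RGN_OP_MSK;	*/

	/* after: only toggle the single bit the hardware looks at */
	if (op &amp; OP_INV)
		ctrl |= DC_CTRL_RGN_OP_INV;	/* 2-byte bset_s */
	else
		ctrl &amp;= ~DC_CTRL_RGN_OP_INV;	/* 2-byte bclr_s */

	write_aux_reg(ARC_REG_DC_CTRL, ctrl);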

before
------

| 80b63324 &lt;__dma_cache_wback_inv_l1&gt;:
| 80b63324:	clri	r3
| 80b63328:	lr	r2,[dc_ctrl]
| 80b6332c:	and	r2,r2,0xfffff1ff	&lt;--- 8 bytes insn
| 80b63334:	or	r2,r2,576
| 80b63338:	sr	r2,[dc_ctrl]
| ...
| ...
| 80b63360 &lt;__dma_cache_inv_l1&gt;:
| 80b63360:	clri	r3
| 80b63364:	lr	r2,[dc_ctrl]
| 80b63368:	and	r2,r2,0xfffff1ff	&lt;--- 8 bytes insn
| 80b63370:	bset_s	r2,r2,0x9
| 80b63372:	sr	r2,[dc_ctrl]
| ...
| ...
| 80b6338c &lt;__dma_cache_wback_l1&gt;:
| 80b6338c:	clri	r3
| 80b63390:	lr	r2,[dc_ctrl]
| 80b63394:	and	r2,r2,0xfffff1ff	&lt;--- 8 bytes insn
| 80b6339c:	sr	r2,[dc_ctrl]

after (AND elided totally in 2 cases, replaced with 2 byte BCLR in 3rd)
-----

| 80b63324 &lt;__dma_cache_wback_inv_l1&gt;:
| 80b63324:	clri	r3
| 80b63328:	lr	r2,[dc_ctrl]
| 80b6332c:	or	r2,r2,576
| 80b63330:	sr	r2,[dc_ctrl]
| ...
| ...
| 80b63358 &lt;__dma_cache_inv_l1&gt;:
| 80b63358:	clri	r3
| 80b6335c:	lr	r2,[dc_ctrl]
| 80b63360:	bset_s	r2,r2,0x9
| 80b63362:	sr	r2,[dc_ctrl]
| ...
| ...
| 80b6337c &lt;__dma_cache_wback_l1&gt;:
| 80b6337c:	clri	r3
| 80b63380:	lr	r2,[dc_ctrl]
| 80b63384:	bclr_s	r2,r2,0x9
| 80b63386:	sr	r2,[dc_ctrl]

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARCv2: mm: Implement cache region flush operations</title>
<updated>2017-05-02T22:57:22+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2014-08-29T05:25:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0d77117fc5c0333d024a183d6790167bb90c3b62'/>
<id>0d77117fc5c0333d024a183d6790167bb90c3b62</id>
<content type='text'>
These are more efficient than the per-line ops

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARCv2: IOC: Adhere to programming model guidelines to avoid DMA corruption</title>
<updated>2017-01-18T22:48:33+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2016-06-22T10:31:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8c47f83ba45928ce9495fcf1b29e828c28e3c839'/>
<id>8c47f83ba45928ce9495fcf1b29e828c28e3c839</id>
<content type='text'>
On AXS103 release bitfiles, DMA data corruptions were seen because the IOC
setup was not following the recommended way in the documentation.

Flipping IOC on while caches are enabled or coherency transactions are in
flight might cause some of the memory operations to not observe
coherency as expected.

So strictly follow the programming model recommendations as documented
in the comment header above arc_ioc_setup().
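
The recommended ordering boils down to roughly this (a sketch using
helpers from arch/arc/mm/cache.c; treat names and details as
illustrative, not the exact function body):

	__dc_disable();				/* flush-n-inv + disable L1 dcache first */
	slc_entire_op(OP_FLUSH_N_INV);		/* flush-n-inv the SLC as well */
	/* program the IOC aperture registers here */
	write_aux_reg(ARC_REG_IO_COH_ENABLE, 1);	/* only now flip IOC on */
	__dc_enable();				/* finally re-enable the L1 dcache */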

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARCv2: IOC: refactor the IOC and SLC operations into own functions</title>
<updated>2017-01-18T22:35:10+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2016-06-22T10:13:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d4911cdd3270da45d3a1c55bf28e88a932bbba7b'/>
<id>d4911cdd3270da45d3a1c55bf28e88a932bbba7b</id>
<content type='text'>
 - Move IOC setup into arc_ioc_setup()
 - Move SLC disabling into arc_slc_disable()

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARCv2: IOC: use @ioc_enable not @ioc_exist where intended</title>
<updated>2016-10-24T16:24:47+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2016-10-13T22:58:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cf986d470208fbdd68b6934a86ccd81c04408484'/>
<id>cf986d470208fbdd68b6934a86ccd81c04408484</id>
<content type='text'>
If the user disables IOC from the debugger at startup (by clearing
@ioc_enable), @ioc_exists is cleared too. This means the boot prints
don't capture the fact that IOC was present but disabled, which could
be misleading.

So invert how we use @ioc_enable and @ioc_exists and make it more
canonical. @ioc_exists represents whether the hardware is present or not
and stays the same whether enabled or not. @ioc_enable is still user
driven, but will be auto-disabled if the IOC hardware is not present,
i.e. if @ioc_exists=0. This is the opposite of what we were doing
before, but much clearer.
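
In pseudo-C the new relationship is roughly (illustrative only; the
ioc_hw_present name is made up for the example):

	ioc_exists = ioc_hw_present;	/* reflects hardware only, never user-cleared */
	if (!ioc_exists)
		ioc_enable = 0;		/* user toggle forced off when hw is absent */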

This means @ioc_enable is now the "exported" toggle in the rest of the
code, such as the DMA mapping API.

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
<entry>
<title>ARCv2: Support dynamic peripheral address space in HS38 rel 3.0 cores</title>
<updated>2016-09-30T21:48:17+00:00</updated>
<author>
<name>Vineet Gupta</name>
<email>vgupta@synopsys.com</email>
</author>
<published>2016-08-26T22:41:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=26c01c49d559268527d78f45a6818fae0c204a45'/>
<id>26c01c49d559268527d78f45a6818fae0c204a45</id>
<content type='text'>
HS release 3.0 provides for even more flexibility in specifying the
volatile address space for mapping peripherals.

With HS 2.1 @start was made flexible / programmable - with HS 3.0 even
@end can be set up (vs. fixed to 0xFFFF_FFFF before).

So add code to reflect that, and while at it remove an unused struct
definition.
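
A sketch of the resulting layout (the helper name is hypothetical;
perip_base already existed, a programmable perip_end is the new part):

	static unsigned long perip_base;	/* @start: programmable since HS 2.1 */
	static unsigned long perip_end;		/* @end: fixed at 0xFFFF_FFFF before HS 3.0 */

	static inline int arc_addr_is_peripheral(unsigned long paddr)
	{
		return paddr &gt;= perip_base &amp;&amp; paddr &lt;= perip_end;
	}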

Signed-off-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
</content>
</entry>
</feed>
