<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/x86/lib, branch v2.6.37</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>x86, mem: Optimize memmove for small size and unaligned cases</title>
<updated>2010-09-25T01:57:11+00:00</updated>
<author>
<name>Ma Ling</name>
<email>ling.ma@intel.com</email>
</author>
<published>2010-09-16T19:12:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3b4b682becdfa9f42321aa024d5cc84f71f06d8c'/>
<id>3b4b682becdfa9f42321aa024d5cc84f71f06d8c</id>
<content type='text'>
The movs instruction combines data to accelerate moving it,
but two cases need attention.

1. The movs instruction has a long startup latency,
   so for small sizes we copy data with general mov instructions.
2. The movs instruction performs poorly in the unaligned case,
   even when the src offset is 0x10 and the dest offset is 0x0,
   so we avoid it and handle that case with general mov instructions.

Signed-off-by: Ma Ling &lt;ling.ma@intel.com&gt;
LKML-Reference: &lt;1284664360-6138-1-git-send-email-ling.ma@intel.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The movs instruction combines data to accelerate moving it,
but two cases need attention.

1. The movs instruction has a long startup latency,
   so for small sizes we copy data with general mov instructions.
2. The movs instruction performs poorly in the unaligned case,
   even when the src offset is 0x10 and the dest offset is 0x0,
   so we avoid it and handle that case with general mov instructions.

Signed-off-by: Ma Ling &lt;ling.ma@intel.com&gt;
LKML-Reference: &lt;1284664360-6138-1-git-send-email-ling.ma@intel.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, mem: Optimize memcpy by avoiding memory false dependence</title>
<updated>2010-08-23T21:56:41+00:00</updated>
<author>
<name>Ma Ling</name>
<email>ling.ma@intel.com</email>
</author>
<published>2010-06-28T19:24:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=59daa706fbec745684702741b9f5373142dd9fdc'/>
<id>59daa706fbec745684702741b9f5373142dd9fdc</id>
<content type='text'>
After the allocation stage, all read operations can run speculatively,
while all write operations run in program order. If the addresses
differ, a read may run before an older write; otherwise it waits
until the write commits. However, the CPU does not check every address
bit, so a read can fail to recognize that the addresses differ even
when they are in different pages. For example, if rsi is 0xf004 and rdi
is 0xe008, the following sequence incurs a large performance penalty.
1. movq (%rsi),	%rax
2. movq %rax,	(%rdi)
3. movq 8(%rsi), %rax
4. movq %rax,	8(%rdi)

If %rsi and %rdi were really in the same memory page, there would be a
TRUE read-after-write dependence, because instruction 2 writes 0x008 and
instruction 3 reads 0x00c, so the two addresses partially overlap.
In fact they are in different pages and there is no issue,
but without checking every address bit the CPU may think they are
in the same page. Instruction 3 then has to wait for instruction 2
to write its data from the write buffer into the cache before loading
the data from the cache; the time the read costs is equal to that of an
mfence instruction. We can avoid this by reordering the operations as
follows.

1. movq 8(%rsi), %rax
2. movq %rax,	8(%rdi)
3. movq (%rsi),	%rax
4. movq %rax,	(%rdi)

Instruction 3 reads 0x004 and instruction 2 writes address 0x010, so
there is no dependence.  On Core2 we gain a 1.83x speedup compared with
the original instruction sequence.  In this patch we first handle small
sizes (less than 20 bytes), then jump to different copy modes. In our
micro-benchmark we got up to a 2X improvement for small sizes from 1 to
127 bytes, and up to a 1.5X improvement for 1024 bytes on Core i7.  (We
use our own micro-benchmark, and will do further tests according to your
requirements.)
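
As a rough illustration (not part of this patch), the two orderings can
be compared with a small Python model. It assumes the CPU compares only
the low 12 address bits, i.e. the offset within a 4 KiB page, which is
an assumption made here for the sketch; the helper names are
hypothetical, and wraparound at the page boundary is ignored.

```python
# Hypothetical model: assume a load is checked against an older store
# using only the offset within a 4 KiB page (the low 12 address bits).
PAGE = 0x1000   # assumed page size in bytes
WIDTH = 8       # bytes moved by one movq


def page_offset(addr):
    """Return the offset of addr within its 4 KiB page."""
    return addr % PAGE


def falsely_aliases(load_addr, store_addr):
    """True if the 8-byte load and store overlap when only page offsets are compared."""
    distance = abs(page_offset(load_addr) - page_offset(store_addr))
    return distance in range(WIDTH)


rsi, rdi = 0xf004, 0xe008

# Original order: the load from 8(%rsi) follows the store to (%rdi).
print(falsely_aliases(rsi + 8, rdi))   # page offsets 0x00c and 0x008
# Reordered: the load from (%rsi) follows the store to 8(%rdi).
print(falsely_aliases(rsi, rdi + 8))   # page offsets 0x004 and 0x010
```

Under this model the original order reports an overlap and the
reordered one does not, matching the offsets quoted above.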

Signed-off-by: Ma Ling &lt;ling.ma@intel.com&gt;
LKML-Reference: &lt;1277753065-18610-1-git-send-email-ling.ma@intel.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After the allocation stage, all read operations can run speculatively,
while all write operations run in program order. If the addresses
differ, a read may run before an older write; otherwise it waits
until the write commits. However, the CPU does not check every address
bit, so a read can fail to recognize that the addresses differ even
when they are in different pages. For example, if rsi is 0xf004 and rdi
is 0xe008, the following sequence incurs a large performance penalty.
1. movq (%rsi),	%rax
2. movq %rax,	(%rdi)
3. movq 8(%rsi), %rax
4. movq %rax,	8(%rdi)

If %rsi and %rdi were really in the same memory page, there would be a
TRUE read-after-write dependence, because instruction 2 writes 0x008 and
instruction 3 reads 0x00c, so the two addresses partially overlap.
In fact they are in different pages and there is no issue,
but without checking every address bit the CPU may think they are
in the same page. Instruction 3 then has to wait for instruction 2
to write its data from the write buffer into the cache before loading
the data from the cache; the time the read costs is equal to that of an
mfence instruction. We can avoid this by reordering the operations as
follows.

1. movq 8(%rsi), %rax
2. movq %rax,	8(%rdi)
3. movq (%rsi),	%rax
4. movq %rax,	(%rdi)

Instruction 3 reads 0x004 and instruction 2 writes address 0x010, so
there is no dependence.  On Core2 we gain a 1.83x speedup compared with
the original instruction sequence.  In this patch we first handle small
sizes (less than 20 bytes), then jump to different copy modes. In our
micro-benchmark we got up to a 2X improvement for small sizes from 1 to
127 bytes, and up to a 1.5X improvement for 1024 bytes on Core i7.  (We
use our own micro-benchmark, and will do further tests according to your
requirements.)

Signed-off-by: Ma Ling &lt;ling.ma@intel.com&gt;
LKML-Reference: &lt;1277753065-18610-1-git-send-email-ling.ma@intel.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, mem: Don't implement forward memmove() as memcpy()</title>
<updated>2010-08-23T21:14:27+00:00</updated>
<author>
<name>Ma, Ling</name>
<email>ling.ma@intel.com</email>
</author>
<published>2010-08-23T21:11:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fdf4289679fd41d76553ce224750e9737cd80eea'/>
<id>fdf4289679fd41d76553ce224750e9737cd80eea</id>
<content type='text'>
memmove() allows the source and destination regions to overlap, but
memcpy() has no such requirement to handle overlap.  Therefore,
explicitly implement memmove() in both the forward and backward
directions, to give us the ability to optimize memcpy().
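
As a minimal sketch of the idea (in Python rather than the assembly of
this patch, with a hypothetical function name), a copy within one
buffer only needs to run backward when the destination starts inside
the source region; otherwise a forward copy is safe.

```python
def memmove_sketch(buf, dst, src, n):
    """Copy n bytes within buf from offset src to offset dst, allowing overlap."""
    # A backward copy is needed only when dst starts inside the source
    # region; a forward copy would overwrite bytes not yet read.
    if dst - src in range(1, n):
        order = reversed(range(n))
    else:
        order = range(n)
    for i in order:
        buf[dst + i] = buf[src + i]
    return buf


buf = bytearray(b"abcdef")
memmove_sketch(buf, 2, 0, 4)   # overlapping, dst after src: copies backward
print(buf)
```

A memcpy() that never has to consider this case is free to pick
whatever access order is fastest.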

Signed-off-by: Ma Ling &lt;ling.ma@intel.com&gt;
LKML-Reference: &lt;C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
memmove() allows the source and destination regions to overlap, but
memcpy() has no such requirement to handle overlap.  Therefore,
explicitly implement memmove() in both the forward and backward
directions, to give us the ability to optimize memcpy().

Signed-off-by: Ma Ling &lt;ling.ma@intel.com&gt;
LKML-Reference: &lt;C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2010-08-13T17:35:48+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-13T17:35:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c029b55af7d6b02b993e8a5add78d062da7a3940'/>
<id>c029b55af7d6b02b993e8a5add78d062da7a3940</id>
<content type='text'>
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, asm: Use a lower case name for the end macro in atomic64_386_32.S
  x86, asm: Refactor atomic64_386_32.S to support old binutils and be cleaner
  x86: Document __phys_reloc_hide() usage in __pa_symbol()
  x86, apic: Map the local apic when parsing the MP table.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, asm: Use a lower case name for the end macro in atomic64_386_32.S
  x86, asm: Refactor atomic64_386_32.S to support old binutils and be cleaner
  x86: Document __phys_reloc_hide() usage in __pa_symbol()
  x86, apic: Map the local apic when parsing the MP table.
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, asm: Use a lower case name for the end macro in atomic64_386_32.S</title>
<updated>2010-08-12T14:04:16+00:00</updated>
<author>
<name>Luca Barbieri</name>
<email>luca@luca-barbieri.com</email>
</author>
<published>2010-08-12T14:00:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=417484d47e115774745ef025bce712a102b6f86f'/>
<id>417484d47e115774745ef025bce712a102b6f86f</id>
<content type='text'>
Use a lowercase name for the end macro, which somehow fixes a binutils 2.16
problem.

Signed-off-by: Luca Barbieri &lt;luca@luca-barbieri.com&gt;
LKML-Reference: &lt;tip-30246557a06bb20618bed906a06d1e1e0faa8bb4@git.kernel.org&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use a lowercase name for the end macro, which somehow fixes a binutils 2.16
problem.

Signed-off-by: Luca Barbieri &lt;luca@luca-barbieri.com&gt;
LKML-Reference: &lt;tip-30246557a06bb20618bed906a06d1e1e0faa8bb4@git.kernel.org&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, asm: Refactor atomic64_386_32.S to support old binutils and be cleaner</title>
<updated>2010-08-12T04:03:28+00:00</updated>
<author>
<name>Luca Barbieri</name>
<email>luca@luca-barbieri.com</email>
</author>
<published>2010-08-06T02:04:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=30246557a06bb20618bed906a06d1e1e0faa8bb4'/>
<id>30246557a06bb20618bed906a06d1e1e0faa8bb4</id>
<content type='text'>
The old code didn't work on binutils 2.12 because setting a symbol to
a register apparently requires a fairly recent version.

This commit refactors the code to use the C preprocessor instead, and
in the process makes the whole code a bit easier to understand.

The object code produced is unchanged as expected.

This fixes kernel bugzilla 16506.

Reported-by: Dieter Stussy &lt;kd6lvw+software@kd6lvw.ampr.org&gt;
Signed-off-by: Luca Barbieri &lt;luca@luca-barbieri.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: &lt;stable@kernel.org&gt; 2.6.35
LKML-Reference: &lt;tip-*@git.kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The old code didn't work on binutils 2.12 because setting a symbol to
a register apparently requires a fairly recent version.

This commit refactors the code to use the C preprocessor instead, and
in the process makes the whole code a bit easier to understand.

The object code produced is unchanged as expected.

This fixes kernel bugzilla 16506.

Reported-by: Dieter Stussy &lt;kd6lvw+software@kd6lvw.ampr.org&gt;
Signed-off-by: Luca Barbieri &lt;luca@luca-barbieri.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: &lt;stable@kernel.org&gt; 2.6.35
LKML-Reference: &lt;tip-*@git.kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2010-08-06T23:24:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-08-06T23:24:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=66cd55d2b903643cbd019ef97a5305d9428d3865'/>
<id>66cd55d2b903643cbd019ef97a5305d9428d3865</id>
<content type='text'>
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, alternatives: BUG on encountering an invalid CPU feature number
  x86, alternatives: Fix one more open-coded 8-bit alternative number
  x86, alternatives: Use 16-bit numbers for cpufeature index
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, alternatives: BUG on encountering an invalid CPU feature number
  x86, alternatives: Fix one more open-coded 8-bit alternative number
  x86, alternatives: Use 16-bit numbers for cpufeature index
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, asm: Merge cmpxchg_486_u64() and cmpxchg8b_emu()</title>
<updated>2010-07-29T00:05:11+00:00</updated>
<author>
<name>H. Peter Anvin</name>
<email>hpa@linux.intel.com</email>
</author>
<published>2010-07-29T00:05:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a378d9338e8dde78314b3a6ae003de351936c729'/>
<id>a378d9338e8dde78314b3a6ae003de351936c729</id>
<content type='text'>
We have two functions for doing exactly the same thing -- emulating
cmpxchg8b on 486 and older hardware -- with different calling
conventions, and yet doing the same thing.  Drop the C version and use
the assembly version, via alternatives, for both the local and
non-local versions of cmpxchg8b.

Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
LKML-Reference: &lt;AANLkTikAmaDPji-TVDarmG1yD=fwbffcsmEU=YEuP+8r@mail.gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have two functions for doing exactly the same thing -- emulating
cmpxchg8b on 486 and older hardware -- with different calling
conventions, and yet doing the same thing.  Drop the C version and use
the assembly version, via alternatives, for both the local and
non-local versions of cmpxchg8b.

Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
LKML-Reference: &lt;AANLkTikAmaDPji-TVDarmG1yD=fwbffcsmEU=YEuP+8r@mail.gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, asm: Move cmpxchg emulation code to arch/x86/lib</title>
<updated>2010-07-28T23:53:49+00:00</updated>
<author>
<name>H. Peter Anvin</name>
<email>hpa@linux.intel.com</email>
</author>
<published>2010-07-28T23:53:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=90c8f92f5c807807ca74d5f2f313794925174e6b'/>
<id>90c8f92f5c807807ca74d5f2f313794925174e6b</id>
<content type='text'>
Move cmpxchg emulation code from arch/x86/kernel/cpu (which is
otherwise CPU identification) to arch/x86/lib, where other emulation
code lives already.

Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
LKML-Reference: &lt;AANLkTikAmaDPji-TVDarmG1yD=fwbffcsmEU=YEuP+8r@mail.gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move cmpxchg emulation code from arch/x86/kernel/cpu (which is
otherwise CPU identification) to arch/x86/lib, where other emulation
code lives already.

Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
LKML-Reference: &lt;AANLkTikAmaDPji-TVDarmG1yD=fwbffcsmEU=YEuP+8r@mail.gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, alternatives: Fix one more open-coded 8-bit alternative number</title>
<updated>2010-07-13T21:56:16+00:00</updated>
<author>
<name>H. Peter Anvin</name>
<email>hpa@linux.intel.com</email>
</author>
<published>2010-07-13T21:55:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=df378ccfc4dd04e263426ad805516915874774aa'/>
<id>df378ccfc4dd04e263426ad805516915874774aa</id>
<content type='text'>
Fix a missing case of an 8-bit alternative number, buried inside an
assembly macro.

Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
Reported-by: Yinghai Lu &lt;yinhai@kernel.org&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
LKML-Reference: &lt;4C3BDDA3.2060900@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix a missing case of an 8-bit alternative number, buried inside an
assembly macro.

Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
Reported-by: Yinghai Lu &lt;yinhai@kernel.org&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
LKML-Reference: &lt;4C3BDDA3.2060900@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
