<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c, branch v5.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>drm/amdgpu: Add kernel parameter support for ignoring bad page threshold</title>
<updated>2021-10-28T18:26:12+00:00</updated>
<author>
<name>Kent Russell</name>
<email>kent.russell@amd.com</email>
</author>
<published>2021-10-19T14:05:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=68daadf3d673568bb7122b1683fd8b0e27c55d9b'/>
<id>68daadf3d673568bb7122b1683fd8b0e27c55d9b</id>
<content type='text'>
When a GPU hits the bad_page_threshold, it will not be initialized by
the amdgpu driver. This means that the table cannot be cleared, nor can
information gathering be performed (getting serial number, BDF, etc).

If the bad_page_threshold kernel parameter is set to -2,
continue to initialize the GPU, while printing a warning to dmesg that
this action has been done

v2: squash in Luben's fix to restore RAS info reporting

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mukul Joshi &lt;Mukul.Joshi@amd.com&gt;
Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a GPU hits the bad_page_threshold, it will not be initialized by
the amdgpu driver. This means that the table cannot be cleared, nor can
information gathering be performed (getting serial number, BDF, etc).

If the bad_page_threshold kernel parameter is set to -2,
continue to initialize the GPU, while printing a warning to dmesg that
this action has been done

v2: squash in Luben's fix to restore RAS info reporting

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mukul Joshi &lt;Mukul.Joshi@amd.com&gt;
Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Warn when bad pages approaches 90% threshold</title>
<updated>2021-10-28T18:26:12+00:00</updated>
<author>
<name>Kent Russell</name>
<email>kent.russell@amd.com</email>
</author>
<published>2021-10-12T20:49:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8483fdfea778aedded76c74659692dee3756b12b'/>
<id>8483fdfea778aedded76c74659692dee3756b12b</id>
<content type='text'>
dmesg doesn't warn when the number of bad pages approaches the
threshold for page retirement. WARN when the number of bad pages
is at 90% or greater for easier checks and planning, instead of waiting
until the GPU is full of bad pages.

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mukul Joshi &lt;Mukul.Joshi@amd.com&gt;
Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dmesg doesn't warn when the number of bad pages approaches the
threshold for page retirement. WARN when the number of bad pages
is at 90% or greater for easier checks and planning, instead of waiting
until the GPU is full of bad pages.

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mukul Joshi &lt;Mukul.Joshi@amd.com&gt;
Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Clarify error when hitting bad page threshold</title>
<updated>2021-10-20T15:43:57+00:00</updated>
<author>
<name>Kent Russell</name>
<email>kent.russell@amd.com</email>
</author>
<published>2021-10-19T13:53:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dcd5ea9f9428d1c95b59416cf1d7af92fd5d0b45'/>
<id>dcd5ea9f9428d1c95b59416cf1d7af92fd5d0b45</id>
<content type='text'>
Change the error message when the bad_page_threshold is reached,
explicitly stating that the GPU will not be initialized.

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mukul Joshi &lt;Mukul.Joshi@amd.com&gt;
Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change the error message when the bad_page_threshold is reached,
explicitly stating that the GPU will not be initialized.

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Cc: Mukul Joshi &lt;Mukul.Joshi@amd.com&gt;
Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count</title>
<updated>2021-09-16T13:56:24+00:00</updated>
<author>
<name>Michel Dänzer</name>
<email>mdaenzer@redhat.com</email>
</author>
<published>2021-09-09T16:56:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=114518ff3b30a3f0611f384fb58e0a968fdf7f5e'/>
<id>114518ff3b30a3f0611f384fb58e0a968fdf7f5e</id>
<content type='text'>
This was unusual; normally, inline functions are declared static as
well, and defined in a header file if used by multiple compilation
units. The latter would be more involved in this case, so just drop
the inline declaration for now.

Fixes compile failure building for ppc64le on RHEL 8:

In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
                 from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call
 to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available
   90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
 1985 |         max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count();
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: c84d46707ebb "drm/amdgpu: validate bad page threshold in ras(v3)"
Reviewed-by: Lyude Paul &lt;lyude@redhat.com&gt;
Signed-off-by: Michel Dänzer &lt;mdaenzer@redhat.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This was unusual; normally, inline functions are declared static as
well, and defined in a header file if used by multiple compilation
units. The latter would be more involved in this case, so just drop
the inline declaration for now.

Fixes compile failure building for ppc64le on RHEL 8:

In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
                 from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call
 to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available
   90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
 1985 |         max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count();
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: c84d46707ebb "drm/amdgpu: validate bad page threshold in ras(v3)"
Reviewed-by: Lyude Paul &lt;lyude@redhat.com&gt;
Signed-off-by: Michel Dänzer &lt;mdaenzer@redhat.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Process any VBIOS RAS EEPROM address</title>
<updated>2021-08-30T18:59:33+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-08-25T17:50:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cc947bf91bad65d4f0ef85a3cd7272a1cf26f53d'/>
<id>cc947bf91bad65d4f0ef85a3cd7272a1cf26f53d</id>
<content type='text'>
We can now process any RAS EEPROM address from
VBIOS. Generalize so as to compute the top three
bits of the 19-bit EEPROM address, from any byte
returned as the "i2c address" from VBIOS.

Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Cc: Alex Deucher &lt;Alexander.Deucher@amd.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Reviewed-by: Alex Deucher &lt;Alexander.Deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We can now process any RAS EEPROM address from
VBIOS. Generalize so as to compute the top three
bits of the 19-bit EEPROM address, from any byte
returned as the "i2c address" from VBIOS.

Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Cc: Alex Deucher &lt;Alexander.Deucher@amd.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Reviewed-by: Alex Deucher &lt;Alexander.Deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: set RAS EEPROM address from VBIOS</title>
<updated>2021-08-06T01:17:59+00:00</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2021-08-04T09:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=14fb496a84f15c1e462c8b7ff5563154174a6c5e'/>
<id>14fb496a84f15c1e462c8b7ff5563154174a6c5e</id>
<content type='text'>
update to latest atombios fw table

Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;.
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
update to latest atombios fw table

Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;.
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: return -EFAULT if copy_to_user() fails</title>
<updated>2021-07-08T21:47:28+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2021-07-03T09:46:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=64598e23de7873b9d47cd9b9a02daa2bb4ded343'/>
<id>64598e23de7873b9d47cd9b9a02daa2bb4ded343</id>
<content type='text'>
If copy_to_user() fails then this should return -EFAULT instead of
-EINVAL.

Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If copy_to_user() fails then this should return -EFAULT instead of
-EINVAL.

Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: unlock on error in amdgpu_ras_debugfs_table_read()</title>
<updated>2021-07-08T21:47:16+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2021-07-03T09:45:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b8badd507a5b76a8e58c864b01116f3de43464cb'/>
<id>b8badd507a5b76a8e58c864b01116f3de43464cb</id>
<content type='text'>
This error path needs to unlock before returning.  While we're at it,
the correct error code from copy_to_user() failure is -EFAULT, not
-EINVAL.

Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This error path needs to unlock before returning.  While we're at it,
the correct error code from copy_to_user() failure is -EFAULT, not
-EINVAL.

Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix a signedness bug in __verify_ras_table_checksum()</title>
<updated>2021-07-08T19:17:02+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2021-07-03T09:44:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3006c9245542609d3a11b856b6d17cfce747ca88'/>
<id>3006c9245542609d3a11b856b6d17cfce747ca88</id>
<content type='text'>
If amdgpu_eeprom_read() returns a negative error code then the error
handling checks:

	if (res &lt; buf_size) {

The problem is that "buf_size" is a u32 so negative values are type
promoted to a high positive values and the condition is false.  Fix
this by changing the type of "buf_size" to int.

Fixes: 63d4c081a556a1 ("drm/amdgpu: Optimize EEPROM RAS table I/O")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If amdgpu_eeprom_read() returns a negative error code then the error
handling checks:

	if (res &lt; buf_size) {

The problem is that "buf_size" is a u32 so negative values are type
promoted to a high positive values and the condition is false.  Fix
this by changing the type of "buf_size" to int.

Fixes: 63d4c081a556a1 ("drm/amdgpu: Optimize EEPROM RAS table I/O")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix 64 bit divide in eeprom code</title>
<updated>2021-07-01T04:24:41+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2021-06-30T15:19:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d456f3875af2eb5bf5a9cbd526622801ffc51037'/>
<id>d456f3875af2eb5bf5a9cbd526622801ffc51037</id>
<content type='text'>
pos is 64 bits.

Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Cc: luben.tuikov@amd.com
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
pos is 64 bits.

Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Cc: luben.tuikov@amd.com
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
