linux-stable.git/drivers, branch linux-6.3.y

md/raid1-10: fix casting from randomized structure in raid1_submit_write()

2023-07-11T17:39:51+00:00

commit b5a99602b74bbfa655be509c615181dd95b0719e upstream.

Following build error triggered while build with clang version 17.0.0
with W=1(this can't be reporduced with gcc 13.1.0):

drivers/md/raid1-10.c:117:25: error: casting from randomized structure
pointer type 'struct block_device *' to 'struct md_rdev *'
     117 |         struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev;
         |                                ^

Fix this by casting 'bio->bi_bdev' to 'void *', as it used to be.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202306142042.fmjfmTF8-lkp@intel.com/
Fixes: 8295efbe68c0 ("md/raid1-10: factor out a helper to submit normal write")
Signed-off-by: Yu Kuai 
Signed-off-by: Song Liu 
Link: https://lore.kernel.org/r/20230616012136.3047071-1-yukuai1@huaweicloud.com
Signed-off-by: Greg Kroah-Hartman

efi/libstub: Disable PCI DMA before grabbing the EFI memory map

2023-07-11T17:39:51+00:00

[ Upstream commit 2e28a798c3092ea42b968fa16ac835969d124898 ]

Currently, the EFI stub will disable PCI DMA as the very last thing it
does before calling ExitBootServices(), to avoid interfering with the
firmware's normal operation as much as possible.

However, the stub will invoke DisconnectController() on all endpoints
downstream of the PCI bridges it disables, and this may affect the
layout of the EFI memory map, making it substantially more likely that
ExitBootServices() will fail the first time around, and that the EFI
memory map needs to be reloaded.

This, in turn, increases the likelihood that the slack space we
allocated is insufficient (and we can no longer allocate memory via boot
services after having called ExitBootServices() once), causing the
second call to GetMemoryMap (and therefore the boot) to fail. This makes
the PCI DMA disable feature a bit more fragile than it already is, so
let's make it more robust, by allocating the space for the EFI memory
map after disabling PCI DMA.

Fixes: 4444f8541dad16fe ("efi: Allow disabling PCI busmastering on bridges during boot")
Reported-by: Glenn Washburn 
Acked-by: Matthew Garrett 
Signed-off-by: Ard Biesheuvel 
Signed-off-by: Sasha Levin

cxl/region: Fix state transitions after reset failure

2023-07-11T17:39:50+00:00

[ Upstream commit adfe19738b71a893da62cb2e30bd6bdb4299ea67 ]

Jonathan reports that failed attempts to reset a region (teardown its
HDM decoder configuration) mistakenly advance the state of the region
to "not committed". Revert to the previous state of the region on reset
failure so that the reset can be re-attempted.

Reported-by: Jonathan Cameron 
Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron 
Reviewed-by: Dave Jiang 
Link: https://lore.kernel.org/r/168696507968.3590522.14484000711718573626.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin

cxl/region: Flag partially torn down regions as unusable

2023-07-11T17:39:50+00:00

[ Upstream commit 2ab47045ac96a605e3037d479a7d5854570ee5bf ]

cxl_region_decode_reset() walks all the decoders associated with a given
region and disables them. Due to decoder ordering rules it is possible
that a switch in the topology notices that a given decoder can not be
shutdown before another region with a higher HPA is shutdown first. That
can leave the region in a partially committed state.

Capture that state in a new CXL_REGION_F_NEEDS_RESET flag and require
that a successful cxl_region_decode_reset() attempt must be completed
before cxl_region_probe() accepts the region.

This is a corollary for the bug that Jonathan identified in "CXL/region
:  commit reset of out of order region appears to succeed." [1].

Cc: Jonathan Cameron 
Link: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com [1]
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Dave Jiang 
Link: https://lore.kernel.org/r/168696507423.3590522.16254212607926684429.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin

cxl/region: Move cache invalidation before region teardown, and before setup

2023-07-11T17:39:50+00:00

[ Upstream commit d1257d098a5a38753a0736a50db0a26a62377ad7 ]

Vikram raised a concern with the theoretical case of a CPU sending
MemClnEvict to a device that is not prepared to receive. MemClnEvict is
a message that is sent after a CPU has taken ownership of a cacheline
from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder
reconfiguration it is possible that the CPU is holding old contents for
a new device that has taken over the physical address range being cached
by the CPU.

To avoid this scenario, invalidate caches prior to tearing down an HDM
decoder configuration.

Now, this poses another problem that it is possible for something to
speculate into that space while the decode configuration is still up, so
to close that gap also invalidate prior to establish new contents behind
a given physical address range.

With this change the cache invalidation is now explicit and need not be
checked in cxl_region_probe(), and that obviates the need for
CXL_REGION_F_INCOHERENT.

Cc: Jonathan Cameron 
Fixes: d18bc74aced6 ("cxl/region: Manage CPU caches relative to DPA invalidation events")
Reported-by: Vikram Sethi 
Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com
Reviewed-by: Jonathan Cameron 
Reviewed-by: Dave Jiang 
Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams 
Signed-off-by: Sasha Levin

hwrng: st - keep clock enabled while hwrng is registered

2023-07-11T17:39:50+00:00

[ Upstream commit 501e197a02d4aef157f53ba3a0b9049c3e52fedc ]

The st-rng driver uses devres to register itself with the hwrng core,
the driver will be unregistered from hwrng when its device goes out of
scope. This happens after the driver's remove function is called.

However, st-rng's clock is disabled in the remove function. There's a
short timeframe where st-rng is still registered with the hwrng core
although its clock is disabled. I suppose the clock must be active to
access the hardware and serve requests from the hwrng core.

Switch to devm_clk_get_enabled and let devres disable the clock and
unregister the hwrng. This avoids the race condition.

Fixes: 3e75241be808 ("hwrng: drivers - Use device-managed registration API")
Signed-off-by: Martin Kaiser 
Signed-off-by: Herbert Xu 
Signed-off-by: Sasha Levin

dax/kmem: Pass valid argument to memory_group_register_static

2023-07-11T17:39:50+00:00

[ Upstream commit 46e66dab8565f742374e9cc4ff7d35f344d774e2 ]

memory_group_register_static takes maximum number of pages as the argument
while dev_dax_kmem_probe passes total_len (in bytes) as the argument.

IIUC, I don't see any crash/panic impact as such. As,
memory_group_register_static just set the max_pages limit which is used in
auto_movable_zone_for_pfn to determine the zone.

which might cause these condition to behave differently,

This will be true always so jump will happen to kernel_zone
    ...
    if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
        goto kernel_zone;

    ...
    kernel_zone:
        return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

Here, In below, zone_intersects compare range will be larger as nr_pages
will be higher (derived from total_len passed in dev_dax_kmem_probe).

    ...
    static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
    		unsigned long nr_pages)
    {
    	struct pglist_data *pgdat = NODE_DATA(nid);
    	int zid;

    	for (zid = 0; zid < ZONE_NORMAL; zid++) {
    		struct zone *zone = &pgdat->node_zones[zid];

    		if (zone_intersects(zone, start_pfn, nr_pages))
    			return zone;
    	}

    	return &pgdat->node_zones[ZONE_NORMAL];
    }

Incorrect zone will be returned here, which in later time might cause bigger
problem.

Fixes: eedf634aac3b ("dax/kmem: use a single static memory group for a single probed unit")
Signed-off-by: Tarun Sahu 
Link: https://lore.kernel.org/r/20230621155025.370672-1-tsahu@linux.ibm.com
Reviewed-by: Vishal Verma 
Signed-off-by: Vishal Verma 
Signed-off-by: Sasha Levin

dax: Introduce alloc_dev_dax_id()

2023-07-11T17:39:50+00:00

[ Upstream commit 70aab281e18c68a1284bc387de127c2fc0bed3f8 ]

The reference counting of dax_region objects is needlessly complicated,
has lead to confusion [1], and has hidden a bug [2]. Towards cleaning up
that mess introduce alloc_dev_dax_id() to minimize the holding of a
dax_region reference to only what dev_dax_release() needs, the
dax_region->ida.

Part of the reason for the mess was the design to dereference a
dax_region in all cases in free_dev_dax_id() even if the id was
statically assigned by the upper level dax_region driver. Remove the
need to call "is_static(dax_region)" by tracking whether the id is
dynamic directly in the dev_dax instance itself.

With that flag the dax_region pinning and release per dev_dax instance
can move to alloc_dev_dax_id() and free_dev_dax_id() respectively.

A follow-on cleanup address the unnecessary references in the dax_region
setup and drivers.

Fixes: 0f3da14a4f05 ("device-dax: introduce 'seed' devices")
Link: http://lore.kernel.org/r/20221203095858.612027-1-liuyongqiang13@huawei.com [1]
Link: http://lore.kernel.org/r/3cf0890b-4eb0-e70e-cd9c-2ecc3d496263@hpe.com [2]
Reported-by: Yongqiang Liu 
Reported-by: Paul Cassella 
Reported-by: Ira Weiny 
Signed-off-by: Dan Williams 
Link: https://lore.kernel.org/r/168577284563.1672036.13493034988900989554.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Ira Weiny 
Signed-off-by: Vishal Verma 
Signed-off-by: Sasha Levin

dax: Fix dax_mapping_release() use after free

2023-07-11T17:39:50+00:00

[ Upstream commit 6d24b170a9db0456f577b1ab01226a2254c016a8 ]

A CONFIG_DEBUG_KOBJECT_RELEASE test of removing a device-dax region
provider (like modprobe -r dax_hmem) yields:

 kobject: 'mapping0' (ffff93eb460e8800): kobject_release, parent 0000000000000000 (delayed 2000)
 [..]
 DEBUG_LOCKS_WARN_ON(1)
 WARNING: CPU: 23 PID: 282 at kernel/locking/lockdep.c:232 __lock_acquire+0x9fc/0x2260
 [..]
 RIP: 0010:__lock_acquire+0x9fc/0x2260
 [..]
 Call Trace:
  
 [..]
  lock_acquire+0xd4/0x2c0
  ? ida_free+0x62/0x130
  _raw_spin_lock_irqsave+0x47/0x70
  ? ida_free+0x62/0x130
  ida_free+0x62/0x130
  dax_mapping_release+0x1f/0x30
  device_release+0x36/0x90
  kobject_delayed_cleanup+0x46/0x150

Due to attempting ida_free() on an ida object that has already been
freed. Devices typically only hold a reference on their parent while
registered. If a child needs a parent object to complete its release it
needs to hold a reference that it drops from its release callback.
Arrange for a dax_mapping to pin its parent dev_dax instance until
dax_mapping_release().

Fixes: 0b07ce872a9e ("device-dax: introduce 'mapping' devices")
Signed-off-by: Dan Williams 
Link: https://lore.kernel.org/r/168577283412.1672036.16111545266174261446.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang 
Reviewed-by: Fan Ni 
Reviewed-by: Ira Weiny 
Signed-off-by: Vishal Verma 
Signed-off-by: Sasha Levin

crypto: qat - unmap buffers before free for RSA

2023-07-11T17:39:50+00:00

[ Upstream commit d776b25495f2c71b9dbf1f5e53b642215ba72f3c ]

The callback function for RSA frees the memory allocated for the source
and destination buffers before unmapping them.
This sequence is wrong.

Change the cleanup sequence to unmap the buffers before freeing them.

Fixes: 3dfaf0071ed7 ("crypto: qat - remove dma_free_coherent() for RSA")
Signed-off-by: Hareshx Sankar Raj 
Co-developed-by: Bolemx Sivanagaleela 
Signed-off-by: Bolemx Sivanagaleela 
Reviewed-by: Giovanni Cabiddu 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Giovanni Cabiddu 
Signed-off-by: Herbert Xu 
Signed-off-by: Sasha Levin