linux-stable.git/drivers/base, branch v5.3.18

mm/memory_hotplug: fix try_offline_node()

2019-11-20T15:49:15+00:00

commit 2c91f8fc6c999fe10185d8ad99fda1759f662f70 upstream.

try_offline_node() is pretty much broken right now:

 - The node span is updated when onlining memory, not when adding it. We
   ignore memory that was mever onlined. Bad.

 - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
   trigger a kernel panic. Bad for memory that is offline but also bad
   for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
   first PFN of a section might contain garbage.

 - Sections belonging to mixed nodes are not properly considered.

As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE).  This makes things
more complicated.

Luckily, we can piggy pack on the node span and the nid stored in memory
blocks.  Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node().  Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.

If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline.  As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).

Introduce for_each_memory_block() to efficiently walk all memory blocks.

Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized.  This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.

Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
assoziated with a zone/node (memmap not initialized).  The introducing
commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.

I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node.  The node is properly onlined when adding the DIMMs.  When
removing the DIMMs, the node is properly offlined.

Masayoshi Mizuma reported:

: Without this patch, memory hotplug fails as panic:
:
:  BUG: kernel NULL pointer dereference, address: 0000000000000000
:  ...
:  Call Trace:
:   remove_memory_block_devices+0x81/0xc0
:   try_remove_memory+0xb4/0x130
:   __remove_memory+0xa/0x20
:   acpi_memory_device_remove+0x84/0x100
:   acpi_bus_trim+0x57/0x90
:   acpi_bus_trim+0x2e/0x90
:   acpi_device_hotplug+0x2b2/0x4d0
:   acpi_hotplug_work_fn+0x1a/0x30
:   process_one_work+0x171/0x380
:   worker_thread+0x49/0x3f0
:   kthread+0xf8/0x130
:   ret_from_fork+0x35/0x40

[david@redhat.com: v3]
  Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b319
Signed-off-by: David Hildenbrand 
Tested-by: Masayoshi Mizuma 
Cc: Tang Chen 
Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: Keith Busch 
Cc: Jiri Olsa 
Cc: "Peter Zijlstra (Intel)" 
Cc: Jani Nikula 
Cc: Nayna Jain 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Stephen Rothwell 
Cc: Dan Williams 
Cc: Pavel Tatashin 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

x86/bugs: Add ITLB_MULTIHIT bug infrastructure

2019-11-12T18:28:28+00:00

commit db4d30fbb71b47e4ecb11c4efa5d8aad4b03dfae upstream.

Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:

   https://bugzilla.kernel.org/show_bug.cgi?id=205195

There are other processors affected for which the erratum does not fully
disclose the impact.

This issue affects both bare-metal x86 page tables and EPT.

It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.

Signed-off-by: Vineela Tummalapalli 
Co-developed-by: Pawan Gupta 
Signed-off-by: Pawan Gupta 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

x86/speculation/taa: Add sysfs reporting for TSX Async Abort

2019-11-12T18:28:27+00:00

commit 6608b45ac5ecb56f9e171252229c39580cc85f0f upstream.

Add the sysfs reporting file for TSX Async Abort. It exposes the
vulnerability and the mitigation state similar to the existing files for
the other hardware vulnerabilities.

Sysfs file path is:
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort

Signed-off-by: Pawan Gupta 
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
Tested-by: Neelima Krishnan 
Reviewed-by: Mark Gross 
Reviewed-by: Tony Luck 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Josh Poimboeuf 
Signed-off-by: Greg Kroah-Hartman

cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown

2019-10-29T08:22:45+00:00

commit 65650b35133ff20f0c9ef0abd5c3c66dbce3ae57 upstream.

It is incorrect to set the cpufreq syscore shutdown callback pointer
to cpufreq_suspend(), because that function cannot be run in the
syscore stage of system shutdown for two reasons: (a) it may attempt
to carry out actions depending on devices that have already been shut
down at that point and (b) the RCU synchronization carried out by it
may not be able to make progress then.

The latter issue has been present since commit 45975c7d21a1 ("rcu:
Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds"),
but the former one has been there since commit 90de2a4aa9f3 ("cpufreq:
suspend cpufreq governors on shutdown") regardless.

Fix that by dropping cpufreq_syscore_ops altogether and making
device_shutdown() call cpufreq_suspend() directly before shutting
down devices, which is along the lines of what system-wide power
management does.

Fixes: 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")
Fixes: 90de2a4aa9f3 ("cpufreq: suspend cpufreq governors on shutdown")
Reported-by: Ville Syrjälä 
Tested-by: Ville Syrjälä 
Signed-off-by: Rafael J. Wysocki 
Acked-by: Viresh Kumar 
Cc: 4.0+  # 4.0+
Signed-off-by: Greg Kroah-Hartman

drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()

2019-10-29T08:22:28+00:00

commit 641fe2e9387a36f9ee01d7c69382d1fe147a5e98 upstream.

Uninitialized memmaps contain garbage and in the worst case trigger kernel
BUGs, especially with CONFIG_PAGE_POISONING.  They should not get touched.

Right now, when trying to soft-offline a PFN that resides on a memory
block that was never onlined, one gets a misleading error with
CONFIG_PAGE_POISONING:

  :/# echo 5637144576 > /sys/devices/system/memory/soft_offline_page
  [   23.097167] soft offline: 0x150000 page already poisoned

But the actual result depends on the garbage in the memmap.

soft_offline_page() can only work with online pages, it returns -EIO in
case of ZONE_DEVICE.  Make sure to only forward pages that are online
(iow, managed by the buddy) and, therefore, have an initialized memmap.

Add a check against pfn_to_online_page() and similarly return -EIO.

Link: http://lkml.kernel.org/r/20191010141200.8985-1-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand 
Acked-by: Naoya Horiguchi 
Acked-by: Michal Hocko 
Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: 	[4.13+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

base: soc: Export soc_device_register/unregister APIs

2019-10-05T13:11:34+00:00

[ Upstream commit f7ccc7a397cf2ef64aebb2f726970b93203858d2 ]

Qcom Socinfo driver can be built as a module, so
export these two APIs.

Tested-by: Vinod Koul 
Signed-off-by: Vinod Koul 
Signed-off-by: Vaishali Thakkar 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Stephen Boyd 
Reviewed-by: Bjorn Andersson 
Signed-off-by: Bjorn Andersson 
Signed-off-by: Sasha Levin

Merge tag 'soundwire-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire into char-misc-linus

2019-08-16T10:35:56+00:00

Vinod writes:

soundwire fixes for v5.3-rc5

Pierre sent fixes which are queued now for v5.3-rc5 are:
 - regmap dependecy
 - cadence register definitions

* tag 'soundwire-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: fix regmap dependencies and align with other serial links
  soundwire: cadence_master: fix definitions for INTSTAT0/1
  soundwire: cadence_master: fix register definition for SLAVE_STATE

Merge tag 'driver-core-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

2019-08-10T19:20:02+00:00

Pull driver core fixes from Greg KH:
 "Here are two small fixes for some driver core issues that have been
  reported. There is also a kernfs "fix" here, which was then reverted
  because it was found to cause problems in linux-next.

  The driver core fixes both resolve reported issues, one with gpioint
  stuff that showed up in 5.3-rc1, and the other finally (and hopefully)
  resolves a very long standing race when removing glue directories.
  It's nice to get that issue finally resolved and the developers
  involved should be applauded for the persistence it took to get this
  patch finally accepted.

  All of these have been in linux-next for a while with no reported
  issues. Well, the one reported issue, hence the revert :)"

* tag 'driver-core-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Revert "kernfs: fix memleak in kernel_ops_readdir()"
  kernfs: fix memleak in kernel_ops_readdir()
  driver core: Fix use-after-free and double free on glue directory
  driver core: platform: return -ENXIO for missing GpioInt

soundwire: fix regmap dependencies and align with other serial links

2019-08-09T04:50:40+00:00

The existing code has a mixed select/depend usage which makes no sense.

config SOUNDWIRE_BUS
       tristate
       select REGMAP_SOUNDWIRE

config REGMAP_SOUNDWIRE
        tristate
        depends on SOUNDWIRE_BUS

Let's remove one layer of Kconfig definitions and align with the
solutions used by all other serial links.

Signed-off-by: Pierre-Louis Bossart 
Link: https://lore.kernel.org/r/20190718230215.18675-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Vinod Koul

driver core: Fix use-after-free and double free on glue directory

2019-07-30T16:44:47+00:00

There is a race condition between removing glue directory and adding a new
device under the glue dir. It can be reproduced in following test:

CPU1:                                         CPU2:

device_add()
  get_device_parent()
    class_dir_create_and_add()
      kobject_add_internal()
        create_dir()    // create glue_dir

                                              device_add()
                                                get_device_parent()
                                                  kobject_get() // get glue_dir

device_del()
  cleanup_glue_dir()
    kobject_del(glue_dir)

                                                kobject_add()
                                                  kobject_add_internal()
                                                    create_dir() // in glue_dir
                                                      sysfs_create_dir_ns()
                                                        kernfs_create_dir_ns(sd)

      sysfs_remove_dir() // glue_dir->sd=NULL
      sysfs_put()        // free glue_dir->sd

                                                          // sd is freed
                                                          kernfs_new_node(sd)
                                                            kernfs_get(glue_dir)
                                                            kernfs_add_one()
                                                            kernfs_put()

Before CPU1 remove last child device under glue dir, if CPU2 add a new
device under glue dir, the glue_dir kobject reference count will be
increase to 2 via kobject_get() in get_device_parent(). And CPU2 has
been called kernfs_create_dir_ns(), but not call kernfs_new_node().
Meanwhile, CPU1 call sysfs_remove_dir() and sysfs_put(). This result in
glue_dir->sd is freed and it's reference count will be 0. Then CPU2 call
kernfs_get(glue_dir) will trigger a warning in kernfs_get() and increase
it's reference count to 1. Because glue_dir->sd is freed by CPU1, the next
call kernfs_add_one() by CPU2 will fail(This is also use-after-free)
and call kernfs_put() to decrease reference count. Because the reference
count is decremented to 0, it will also call kmem_cache_free() to free
the glue_dir->sd again. This will result in double free.

In order to avoid this happening, we also should make sure that kernfs_node
for glue_dir is released in CPU1 only when refcount for glue_dir kobj is
1 to fix this race.

The following calltrace is captured in kernel 4.14 with the following patch
applied:

commit 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier")

--------------------------------------------------------------------------
[    3.633703] WARNING: CPU: 4 PID: 513 at .../fs/kernfs/dir.c:494
                Here is WARN_ON(!atomic_read(&kn->count) in kernfs_get().
....
[    3.633986] Call trace:
[    3.633991]  kernfs_create_dir_ns+0xa8/0xb0
[    3.633994]  sysfs_create_dir_ns+0x54/0xe8
[    3.634001]  kobject_add_internal+0x22c/0x3f0
[    3.634005]  kobject_add+0xe4/0x118
[    3.634011]  device_add+0x200/0x870
[    3.634017]  _request_firmware+0x958/0xc38
[    3.634020]  request_firmware_into_buf+0x4c/0x70
....
[    3.634064] kernel BUG at .../mm/slub.c:294!
                Here is BUG_ON(object == fp) in set_freepointer().
....
[    3.634346] Call trace:
[    3.634351]  kmem_cache_free+0x504/0x6b8
[    3.634355]  kernfs_put+0x14c/0x1d8
[    3.634359]  kernfs_create_dir_ns+0x88/0xb0
[    3.634362]  sysfs_create_dir_ns+0x54/0xe8
[    3.634366]  kobject_add_internal+0x22c/0x3f0
[    3.634370]  kobject_add+0xe4/0x118
[    3.634374]  device_add+0x200/0x870
[    3.634378]  _request_firmware+0x958/0xc38
[    3.634381]  request_firmware_into_buf+0x4c/0x70
--------------------------------------------------------------------------

Fixes: 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier")
Signed-off-by: Muchun Song 
Reviewed-by: Mukesh Ojha 
Signed-off-by: Prateek Sood 
Link: https://lore.kernel.org/r/20190727032122.24639-1-smuchun@gmail.com
Signed-off-by: Greg Kroah-Hartman