<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/vfio, branch v6.1.78</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>vfio/mdev: Fix a null-ptr-deref bug for mdev_unregister_parent()</title>
<updated>2023-10-06T12:56:45+00:00</updated>
<author>
<name>Jinjie Ruan</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2023-09-18T11:55:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c01b2e0ee22ef8b4dd7509a93aecc0ac0826bae4'/>
<id>c01b2e0ee22ef8b4dd7509a93aecc0ac0826bae4</id>
<content type='text'>
[ Upstream commit c777b11d34e0f47dbbc4b018ef65ad030f2b283a ]

Inject fault while probing mdpy.ko, if kstrdup() of create_dir() fails in
kobject_add_internal() in kobject_init_and_add() in mdev_type_add()
in parent_create_sysfs_files(), it will return 0 and probe successfully.
And when rmmod mdpy.ko, the mdpy_dev_exit() will call
mdev_unregister_parent(), the mdev_type_remove() may traverse uninitialized
parent-&gt;types[i] in parent_remove_sysfs_files(), and it will cause
below null-ptr-deref.

If mdev_type_add() fails, return the error code and kset_unregister()
to fix the issue.

 general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
 CPU: 2 PID: 10215 Comm: rmmod Tainted: G        W        N 6.6.0-rc2+ #20
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 RIP: 0010:__kobject_del+0x62/0x1c0
 Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 51 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6b 28 48 8d 7d 10 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 24 01 00 00 48 8b 75 10 48 89 df 48 8d 6b 3c e8
 RSP: 0018:ffff88810695fd30 EFLAGS: 00010202
 RAX: dffffc0000000000 RBX: ffffffffa0270268 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000010
 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed10233a4ef1
 R10: ffff888119d2778b R11: 0000000063666572 R12: 0000000000000000
 R13: fffffbfff404e2d4 R14: dffffc0000000000 R15: ffffffffa0271660
 FS:  00007fbc81981540(0000) GS:ffff888119d00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc14a142dc0 CR3: 0000000110a62003 CR4: 0000000000770ee0
 DR0: ffffffff8fb0bce8 DR1: ffffffff8fb0bce9 DR2: ffffffff8fb0bcea
 DR3: ffffffff8fb0bceb DR6: 00000000fffe0ff0 DR7: 0000000000000600
 PKRU: 55555554
 Call Trace:
  &lt;TASK&gt;
  ? die_addr+0x3d/0xa0
  ? exc_general_protection+0x144/0x220
  ? asm_exc_general_protection+0x22/0x30
  ? __kobject_del+0x62/0x1c0
  kobject_del+0x32/0x50
  parent_remove_sysfs_files+0xd6/0x170 [mdev]
  mdev_unregister_parent+0xfb/0x190 [mdev]
  ? mdev_register_parent+0x270/0x270 [mdev]
  ? find_module_all+0x9d/0xe0
  mdpy_dev_exit+0x17/0x63 [mdpy]
  __do_sys_delete_module.constprop.0+0x2fa/0x4b0
  ? module_flags+0x300/0x300
  ? __fput+0x4e7/0xa00
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
 RIP: 0033:0x7fbc813221b7
 Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 &lt;48&gt; 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe780e0648 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00007ffe780e06a8 RCX: 00007fbc813221b7
 RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055e214df9b58
 RBP: 000055e214df9af0 R08: 00007ffe780df5c1 R09: 0000000000000000
 R10: 00007fbc8139ecc0 R11: 0000000000000206 R12: 00007ffe780e0870
 R13: 00007ffe780e0ed0 R14: 000055e214df9260 R15: 000055e214df9af0
  &lt;/TASK&gt;
 Modules linked in: mdpy(-) mdev vfio_iommu_type1 vfio [last unloaded: mdpy]
 Dumping ftrace buffer:
    (ftrace buffer empty)
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:__kobject_del+0x62/0x1c0
 Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 51 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6b 28 48 8d 7d 10 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 24 01 00 00 48 8b 75 10 48 89 df 48 8d 6b 3c e8
 RSP: 0018:ffff88810695fd30 EFLAGS: 00010202
 RAX: dffffc0000000000 RBX: ffffffffa0270268 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000010
 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed10233a4ef1
 R10: ffff888119d2778b R11: 0000000063666572 R12: 0000000000000000
 R13: fffffbfff404e2d4 R14: dffffc0000000000 R15: ffffffffa0271660
 FS:  00007fbc81981540(0000) GS:ffff888119d00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc14a142dc0 CR3: 0000000110a62003 CR4: 0000000000770ee0
 DR0: ffffffff8fb0bce8 DR1: ffffffff8fb0bce9 DR2: ffffffff8fb0bcea
 DR3: ffffffff8fb0bceb DR6: 00000000fffe0ff0 DR7: 0000000000000600
 PKRU: 55555554
 Kernel panic - not syncing: Fatal exception
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Kernel Offset: disabled
 Rebooting in 1 seconds..

Fixes: da44c340c4fe ("vfio/mdev: simplify mdev_type handling")
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Reviewed-by: Eric Farman &lt;farman@linux.ibm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230918115551.1423193-1-ruanjinjie@huawei.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit c777b11d34e0f47dbbc4b018ef65ad030f2b283a ]

Inject fault while probing mdpy.ko, if kstrdup() of create_dir() fails in
kobject_add_internal() in kobject_init_and_add() in mdev_type_add()
in parent_create_sysfs_files(), it will return 0 and probe successfully.
And when rmmod mdpy.ko, the mdpy_dev_exit() will call
mdev_unregister_parent(), the mdev_type_remove() may traverse uninitialized
parent-&gt;types[i] in parent_remove_sysfs_files(), and it will cause
below null-ptr-deref.

If mdev_type_add() fails, return the error code and kset_unregister()
to fix the issue.

 general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
 CPU: 2 PID: 10215 Comm: rmmod Tainted: G        W        N 6.6.0-rc2+ #20
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 RIP: 0010:__kobject_del+0x62/0x1c0
 Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 51 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6b 28 48 8d 7d 10 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 24 01 00 00 48 8b 75 10 48 89 df 48 8d 6b 3c e8
 RSP: 0018:ffff88810695fd30 EFLAGS: 00010202
 RAX: dffffc0000000000 RBX: ffffffffa0270268 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000010
 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed10233a4ef1
 R10: ffff888119d2778b R11: 0000000063666572 R12: 0000000000000000
 R13: fffffbfff404e2d4 R14: dffffc0000000000 R15: ffffffffa0271660
 FS:  00007fbc81981540(0000) GS:ffff888119d00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc14a142dc0 CR3: 0000000110a62003 CR4: 0000000000770ee0
 DR0: ffffffff8fb0bce8 DR1: ffffffff8fb0bce9 DR2: ffffffff8fb0bcea
 DR3: ffffffff8fb0bceb DR6: 00000000fffe0ff0 DR7: 0000000000000600
 PKRU: 55555554
 Call Trace:
  &lt;TASK&gt;
  ? die_addr+0x3d/0xa0
  ? exc_general_protection+0x144/0x220
  ? asm_exc_general_protection+0x22/0x30
  ? __kobject_del+0x62/0x1c0
  kobject_del+0x32/0x50
  parent_remove_sysfs_files+0xd6/0x170 [mdev]
  mdev_unregister_parent+0xfb/0x190 [mdev]
  ? mdev_register_parent+0x270/0x270 [mdev]
  ? find_module_all+0x9d/0xe0
  mdpy_dev_exit+0x17/0x63 [mdpy]
  __do_sys_delete_module.constprop.0+0x2fa/0x4b0
  ? module_flags+0x300/0x300
  ? __fput+0x4e7/0xa00
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
 RIP: 0033:0x7fbc813221b7
 Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 &lt;48&gt; 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe780e0648 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00007ffe780e06a8 RCX: 00007fbc813221b7
 RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055e214df9b58
 RBP: 000055e214df9af0 R08: 00007ffe780df5c1 R09: 0000000000000000
 R10: 00007fbc8139ecc0 R11: 0000000000000206 R12: 00007ffe780e0870
 R13: 00007ffe780e0ed0 R14: 000055e214df9260 R15: 000055e214df9af0
  &lt;/TASK&gt;
 Modules linked in: mdpy(-) mdev vfio_iommu_type1 vfio [last unloaded: mdpy]
 Dumping ftrace buffer:
    (ftrace buffer empty)
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:__kobject_del+0x62/0x1c0
 Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 51 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6b 28 48 8d 7d 10 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 24 01 00 00 48 8b 75 10 48 89 df 48 8d 6b 3c e8
 RSP: 0018:ffff88810695fd30 EFLAGS: 00010202
 RAX: dffffc0000000000 RBX: ffffffffa0270268 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000010
 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed10233a4ef1
 R10: ffff888119d2778b R11: 0000000063666572 R12: 0000000000000000
 R13: fffffbfff404e2d4 R14: dffffc0000000000 R15: ffffffffa0271660
 FS:  00007fbc81981540(0000) GS:ffff888119d00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc14a142dc0 CR3: 0000000110a62003 CR4: 0000000000770ee0
 DR0: ffffffff8fb0bce8 DR1: ffffffff8fb0bce9 DR2: ffffffff8fb0bcea
 DR3: ffffffff8fb0bceb DR6: 00000000fffe0ff0 DR7: 0000000000000600
 PKRU: 55555554
 Kernel panic - not syncing: Fatal exception
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Kernel Offset: disabled
 Rebooting in 1 seconds..

Fixes: da44c340c4fe ("vfio/mdev: simplify mdev_type handling")
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Reviewed-by: Eric Farman &lt;farman@linux.ibm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230918115551.1423193-1-ruanjinjie@huawei.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: fix cap_migration information leak</title>
<updated>2023-09-13T07:42:47+00:00</updated>
<author>
<name>Stefan Hajnoczi</name>
<email>stefanha@redhat.com</email>
</author>
<published>2023-08-01T15:53:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f6f300ecc196d243c02adeb9ee0c62c677c24bfb'/>
<id>f6f300ecc196d243c02adeb9ee0c62c677c24bfb</id>
<content type='text'>
[ Upstream commit cd24e2a60af633f157d7e59c0a6dba64f131c0b1 ]

Fix an information leak where an uninitialized hole in struct
vfio_iommu_type1_info_cap_migration on the stack is exposed to userspace.

The definition of struct vfio_iommu_type1_info_cap_migration contains a hole as
shown in this pahole(1) output:

  struct vfio_iommu_type1_info_cap_migration {
          struct vfio_info_cap_header header;              /*     0     8 */
          __u32                      flags;                /*     8     4 */

          /* XXX 4 bytes hole, try to pack */

          __u64                      pgsize_bitmap;        /*    16     8 */
          __u64                      max_dirty_bitmap_size; /*    24     8 */

          /* size: 32, cachelines: 1, members: 4 */
          /* sum members: 28, holes: 1, sum holes: 4 */
          /* last cacheline: 32 bytes */
  };

The cap_mig variable is filled in without initializing the hole:

  static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
                         struct vfio_info_cap *caps)
  {
      struct vfio_iommu_type1_info_cap_migration cap_mig;

      cap_mig.header.id = VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION;
      cap_mig.header.version = 1;

      cap_mig.flags = 0;
      /* support minimum pgsize */
      cap_mig.pgsize_bitmap = (size_t)1 &lt;&lt; __ffs(iommu-&gt;pgsize_bitmap);
      cap_mig.max_dirty_bitmap_size = DIRTY_BITMAP_SIZE_MAX;

      return vfio_info_add_capability(caps, &amp;cap_mig.header, sizeof(cap_mig));
  }

The structure is then copied to a temporary location on the heap. At this point
it's already too late and ioctl(VFIO_IOMMU_GET_INFO) copies it to userspace
later:

  int vfio_info_add_capability(struct vfio_info_cap *caps,
                   struct vfio_info_cap_header *cap, size_t size)
  {
      struct vfio_info_cap_header *header;

      header = vfio_info_cap_add(caps, size, cap-&gt;id, cap-&gt;version);
      if (IS_ERR(header))
          return PTR_ERR(header);

      memcpy(header + 1, cap + 1, size - sizeof(*header));

      return 0;
  }

This issue was found by code inspection.

Signed-off-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Fixes: ad721705d09c ("vfio iommu: Add migration capability to report supported features")
Link: https://lore.kernel.org/r/20230801155352.1391945-1-stefanha@redhat.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit cd24e2a60af633f157d7e59c0a6dba64f131c0b1 ]

Fix an information leak where an uninitialized hole in struct
vfio_iommu_type1_info_cap_migration on the stack is exposed to userspace.

The definition of struct vfio_iommu_type1_info_cap_migration contains a hole as
shown in this pahole(1) output:

  struct vfio_iommu_type1_info_cap_migration {
          struct vfio_info_cap_header header;              /*     0     8 */
          __u32                      flags;                /*     8     4 */

          /* XXX 4 bytes hole, try to pack */

          __u64                      pgsize_bitmap;        /*    16     8 */
          __u64                      max_dirty_bitmap_size; /*    24     8 */

          /* size: 32, cachelines: 1, members: 4 */
          /* sum members: 28, holes: 1, sum holes: 4 */
          /* last cacheline: 32 bytes */
  };

The cap_mig variable is filled in without initializing the hole:

  static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
                         struct vfio_info_cap *caps)
  {
      struct vfio_iommu_type1_info_cap_migration cap_mig;

      cap_mig.header.id = VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION;
      cap_mig.header.version = 1;

      cap_mig.flags = 0;
      /* support minimum pgsize */
      cap_mig.pgsize_bitmap = (size_t)1 &lt;&lt; __ffs(iommu-&gt;pgsize_bitmap);
      cap_mig.max_dirty_bitmap_size = DIRTY_BITMAP_SIZE_MAX;

      return vfio_info_add_capability(caps, &amp;cap_mig.header, sizeof(cap_mig));
  }

The structure is then copied to a temporary location on the heap. At this point
it's already too late and ioctl(VFIO_IOMMU_GET_INFO) copies it to userspace
later:

  int vfio_info_add_capability(struct vfio_info_cap *caps,
                   struct vfio_info_cap_header *cap, size_t size)
  {
      struct vfio_info_cap_header *header;

      header = vfio_info_cap_add(caps, size, cap-&gt;id, cap-&gt;version);
      if (IS_ERR(header))
          return PTR_ERR(header);

      memcpy(header + 1, cap + 1, size - sizeof(*header));

      return 0;
  }

This issue was found by code inspection.

Signed-off-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Fixes: ad721705d09c ("vfio iommu: Add migration capability to report supported features")
Link: https://lore.kernel.org/r/20230801155352.1391945-1-stefanha@redhat.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/mdev: Move the compat_class initialization to module init</title>
<updated>2023-07-19T14:21:41+00:00</updated>
<author>
<name>Eric Farman</name>
<email>farman@linux.ibm.com</email>
</author>
<published>2023-06-26T13:36:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8f98749d536d099bfdeaa01e646f0e9eace2191f'/>
<id>8f98749d536d099bfdeaa01e646f0e9eace2191f</id>
<content type='text'>
[ Upstream commit ff598081e5b9d0bdd6874bfe340811bbb75b35e4 ]

The pointer to mdev_bus_compat_class is statically defined at the top
of mdev_core, and was originally (commit 7b96953bc640 ("vfio: Mediated
device Core driver") serialized by the parent_list_lock. The blamed
commit removed this mutex, leaving the pointer initialization
unserialized. As a result, the creation of multiple MDEVs in parallel
(such as during boot) can encounter errors during the creation of the
sysfs entries, such as:

  [    8.337509] sysfs: cannot create duplicate filename '/class/mdev_bus'
  [    8.337514] vfio_ccw 0.0.01d8: MDEV: Registered
  [    8.337516] CPU: 13 PID: 946 Comm: driverctl Not tainted 6.4.0-rc7 #20
  [    8.337522] Hardware name: IBM 3906 M05 780 (LPAR)
  [    8.337525] Call Trace:
  [    8.337528]  [&lt;0000000162b0145a&gt;] dump_stack_lvl+0x62/0x80
  [    8.337540]  [&lt;00000001622aeb30&gt;] sysfs_warn_dup+0x78/0x88
  [    8.337549]  [&lt;00000001622aeca6&gt;] sysfs_create_dir_ns+0xe6/0xf8
  [    8.337552]  [&lt;0000000162b04504&gt;] kobject_add_internal+0xf4/0x340
  [    8.337557]  [&lt;0000000162b04d48&gt;] kobject_add+0x78/0xd0
  [    8.337561]  [&lt;0000000162b04e0a&gt;] kobject_create_and_add+0x6a/0xb8
  [    8.337565]  [&lt;00000001627a110e&gt;] class_compat_register+0x5e/0x90
  [    8.337572]  [&lt;000003ff7fd815da&gt;] mdev_register_parent+0x102/0x130 [mdev]
  [    8.337581]  [&lt;000003ff7fdc7f2c&gt;] vfio_ccw_sch_probe+0xe4/0x178 [vfio_ccw]
  [    8.337588]  [&lt;0000000162a7833c&gt;] css_probe+0x44/0x80
  [    8.337599]  [&lt;000000016279f4da&gt;] really_probe+0xd2/0x460
  [    8.337603]  [&lt;000000016279fa08&gt;] driver_probe_device+0x40/0xf0
  [    8.337606]  [&lt;000000016279fb78&gt;] __device_attach_driver+0xc0/0x140
  [    8.337610]  [&lt;000000016279cbe0&gt;] bus_for_each_drv+0x90/0xd8
  [    8.337618]  [&lt;00000001627a00b0&gt;] __device_attach+0x110/0x190
  [    8.337621]  [&lt;000000016279c7c8&gt;] bus_rescan_devices_helper+0x60/0xb0
  [    8.337626]  [&lt;000000016279cd48&gt;] drivers_probe_store+0x48/0x80
  [    8.337632]  [&lt;00000001622ac9b0&gt;] kernfs_fop_write_iter+0x138/0x1f0
  [    8.337635]  [&lt;00000001621e5e14&gt;] vfs_write+0x1ac/0x2f8
  [    8.337645]  [&lt;00000001621e61d8&gt;] ksys_write+0x70/0x100
  [    8.337650]  [&lt;0000000162b2bdc4&gt;] __do_syscall+0x1d4/0x200
  [    8.337656]  [&lt;0000000162b3c828&gt;] system_call+0x70/0x98
  [    8.337664] kobject: kobject_add_internal failed for mdev_bus with -EEXIST, don't try to register things with the same name in the same directory.
  [    8.337668] kobject: kobject_create_and_add: kobject_add error: -17
  [    8.337674] vfio_ccw: probe of 0.0.01d9 failed with error -12
  [    8.342941] vfio_ccw_mdev aeb9ca91-10c6-42bc-a168-320023570aea: Adding to iommu group 2

Move the initialization of the mdev_bus_compat_class pointer to the
init path, to match the cleanup in module exit. This way the code
in mdev_register_parent() can simply link the new parent to it,
rather than determining whether initialization is required first.

Fixes: 89345d5177aa ("vfio/mdev: embedd struct mdev_parent in the parent data structure")
Reported-by: Alexander Egorenkov &lt;egorenar@linux.ibm.com&gt;
Signed-off-by: Eric Farman &lt;farman@linux.ibm.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Tony Krowiak &lt;akrowiak@linux.ibm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230626133642.2939168-1-farman@linux.ibm.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ff598081e5b9d0bdd6874bfe340811bbb75b35e4 ]

The pointer to mdev_bus_compat_class is statically defined at the top
of mdev_core, and was originally (commit 7b96953bc640 ("vfio: Mediated
device Core driver") serialized by the parent_list_lock. The blamed
commit removed this mutex, leaving the pointer initialization
unserialized. As a result, the creation of multiple MDEVs in parallel
(such as during boot) can encounter errors during the creation of the
sysfs entries, such as:

  [    8.337509] sysfs: cannot create duplicate filename '/class/mdev_bus'
  [    8.337514] vfio_ccw 0.0.01d8: MDEV: Registered
  [    8.337516] CPU: 13 PID: 946 Comm: driverctl Not tainted 6.4.0-rc7 #20
  [    8.337522] Hardware name: IBM 3906 M05 780 (LPAR)
  [    8.337525] Call Trace:
  [    8.337528]  [&lt;0000000162b0145a&gt;] dump_stack_lvl+0x62/0x80
  [    8.337540]  [&lt;00000001622aeb30&gt;] sysfs_warn_dup+0x78/0x88
  [    8.337549]  [&lt;00000001622aeca6&gt;] sysfs_create_dir_ns+0xe6/0xf8
  [    8.337552]  [&lt;0000000162b04504&gt;] kobject_add_internal+0xf4/0x340
  [    8.337557]  [&lt;0000000162b04d48&gt;] kobject_add+0x78/0xd0
  [    8.337561]  [&lt;0000000162b04e0a&gt;] kobject_create_and_add+0x6a/0xb8
  [    8.337565]  [&lt;00000001627a110e&gt;] class_compat_register+0x5e/0x90
  [    8.337572]  [&lt;000003ff7fd815da&gt;] mdev_register_parent+0x102/0x130 [mdev]
  [    8.337581]  [&lt;000003ff7fdc7f2c&gt;] vfio_ccw_sch_probe+0xe4/0x178 [vfio_ccw]
  [    8.337588]  [&lt;0000000162a7833c&gt;] css_probe+0x44/0x80
  [    8.337599]  [&lt;000000016279f4da&gt;] really_probe+0xd2/0x460
  [    8.337603]  [&lt;000000016279fa08&gt;] driver_probe_device+0x40/0xf0
  [    8.337606]  [&lt;000000016279fb78&gt;] __device_attach_driver+0xc0/0x140
  [    8.337610]  [&lt;000000016279cbe0&gt;] bus_for_each_drv+0x90/0xd8
  [    8.337618]  [&lt;00000001627a00b0&gt;] __device_attach+0x110/0x190
  [    8.337621]  [&lt;000000016279c7c8&gt;] bus_rescan_devices_helper+0x60/0xb0
  [    8.337626]  [&lt;000000016279cd48&gt;] drivers_probe_store+0x48/0x80
  [    8.337632]  [&lt;00000001622ac9b0&gt;] kernfs_fop_write_iter+0x138/0x1f0
  [    8.337635]  [&lt;00000001621e5e14&gt;] vfs_write+0x1ac/0x2f8
  [    8.337645]  [&lt;00000001621e61d8&gt;] ksys_write+0x70/0x100
  [    8.337650]  [&lt;0000000162b2bdc4&gt;] __do_syscall+0x1d4/0x200
  [    8.337656]  [&lt;0000000162b3c828&gt;] system_call+0x70/0x98
  [    8.337664] kobject: kobject_add_internal failed for mdev_bus with -EEXIST, don't try to register things with the same name in the same directory.
  [    8.337668] kobject: kobject_create_and_add: kobject_add error: -17
  [    8.337674] vfio_ccw: probe of 0.0.01d9 failed with error -12
  [    8.342941] vfio_ccw_mdev aeb9ca91-10c6-42bc-a168-320023570aea: Adding to iommu group 2

Move the initialization of the mdev_bus_compat_class pointer to the
init path, to match the cleanup in module exit. This way the code
in mdev_register_parent() can simply link the new parent to it,
rather than determining whether initialization is required first.

Fixes: 89345d5177aa ("vfio/mdev: embedd struct mdev_parent in the parent data structure")
Reported-by: Alexander Egorenkov &lt;egorenar@linux.ibm.com&gt;
Signed-off-by: Eric Farman &lt;farman@linux.ibm.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Tony Krowiak &lt;akrowiak@linux.ibm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230626133642.2939168-1-farman@linux.ibm.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: check pfn valid before converting to struct page</title>
<updated>2023-06-05T07:26:19+00:00</updated>
<author>
<name>Yan Zhao</name>
<email>yan.y.zhao@intel.com</email>
</author>
<published>2023-05-19T06:58:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cd3c5e4e0d60630c5aae4480204a4a083a3d2503'/>
<id>cd3c5e4e0d60630c5aae4480204a4a083a3d2503</id>
<content type='text'>
[ Upstream commit 4752354af71043e6fd72ef5490ed6da39e6cab4a ]

Check physical PFN is valid before converting the PFN to a struct page
pointer to be returned to caller of vfio_pin_pages().

vfio_pin_pages() pins user pages with contiguous IOVA.
If the IOVA of a user page to be pinned belongs to vma of vm_flags
VM_PFNMAP, pin_user_pages_remote() will return -EFAULT without returning
struct page address for this PFN. This is because usually this kind of PFN
(e.g. MMIO PFN) has no valid struct page address associated.
Upon this error, vaddr_get_pfns() will obtain the physical PFN directly.

While previously vfio_pin_pages() returns to caller PFN arrays directly,
after commit
34a255e67615 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()"),
PFNs will be converted to "struct page *" unconditionally and therefore
the returned "struct page *" array may contain invalid struct page
addresses.

Given current in-tree users of vfio_pin_pages() only expect "struct page *
returned, check PFN validity and return -EINVAL to let the caller be
aware of IOVAs to be pinned containing PFN not able to be returned in
"struct page *" array. So that, the caller will not consume the returned
pointer (e.g. test PageReserved()) and avoid error like "supervisor read
access in kernel mode".

Fixes: 34a255e67615 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()")
Cc: Sean Christopherson &lt;seanjc@google.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Link: https://lore.kernel.org/r/20230519065843.10653-1-yan.y.zhao@intel.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 4752354af71043e6fd72ef5490ed6da39e6cab4a ]

Check physical PFN is valid before converting the PFN to a struct page
pointer to be returned to caller of vfio_pin_pages().

vfio_pin_pages() pins user pages with contiguous IOVA.
If the IOVA of a user page to be pinned belongs to vma of vm_flags
VM_PFNMAP, pin_user_pages_remote() will return -EFAULT without returning
struct page address for this PFN. This is because usually this kind of PFN
(e.g. MMIO PFN) has no valid struct page address associated.
Upon this error, vaddr_get_pfns() will obtain the physical PFN directly.

While previously vfio_pin_pages() returns to caller PFN arrays directly,
after commit
34a255e67615 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()"),
PFNs will be converted to "struct page *" unconditionally and therefore
the returned "struct page *" array may contain invalid struct page
addresses.

Given current in-tree users of vfio_pin_pages() only expect "struct page *
returned, check PFN validity and return -EINVAL to let the caller be
aware of IOVAs to be pinned containing PFN not able to be returned in
"struct page *" array. So that, the caller will not consume the returned
pointer (e.g. test PageReserved()) and avoid error like "supervisor read
access in kernel mode".

Fixes: 34a255e67615 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()")
Cc: Sean Christopherson &lt;seanjc@google.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Yan Zhao &lt;yan.y.zhao@intel.com&gt;
Reviewed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Link: https://lore.kernel.org/r/20230519065843.10653-1-yan.y.zhao@intel.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: restore locked_vm</title>
<updated>2023-03-10T08:34:32+00:00</updated>
<author>
<name>Steve Sistare</name>
<email>steven.sistare@oracle.com</email>
</author>
<published>2023-01-31T16:58:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bf119a3c9c9f1b01930e8798b59c791080bce90f'/>
<id>bf119a3c9c9f1b01930e8798b59c791080bce90f</id>
<content type='text'>
commit 90fdd158a695d70403163f9a0e4efc5b20f3fd3e upstream.

When a vfio container is preserved across exec or fork-exec, the new
task's mm has a locked_vm count of 0.  After a dma vaddr is updated using
VFIO_DMA_MAP_FLAG_VADDR, locked_vm remains 0, and the pinned memory does
not count against the task's RLIMIT_MEMLOCK.

To restore the correct locked_vm count, when VFIO_DMA_MAP_FLAG_VADDR is
used and the dma's mm has changed, add the dma's locked_vm count to
the new mm-&gt;locked_vm, subject to the rlimit, and subtract it from the
old mm-&gt;locked_vm.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 90fdd158a695d70403163f9a0e4efc5b20f3fd3e upstream.

When a vfio container is preserved across exec or fork-exec, the new
task's mm has a locked_vm count of 0.  After a dma vaddr is updated using
VFIO_DMA_MAP_FLAG_VADDR, locked_vm remains 0, and the pinned memory does
not count against the task's RLIMIT_MEMLOCK.

To restore the correct locked_vm count, when VFIO_DMA_MAP_FLAG_VADDR is
used and the dma's mm has changed, add the dma's locked_vm count to
the new mm-&gt;locked_vm, subject to the rlimit, and subtract it from the
old mm-&gt;locked_vm.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: track locked_vm per dma</title>
<updated>2023-03-10T08:34:32+00:00</updated>
<author>
<name>Steve Sistare</name>
<email>steven.sistare@oracle.com</email>
</author>
<published>2023-01-31T16:58:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9f0fd4f8ec0c551ef0a819b30833873718eaef10'/>
<id>9f0fd4f8ec0c551ef0a819b30833873718eaef10</id>
<content type='text'>
commit 18e292705ba21cc9b3227b9ad5b1c28973605ee5 upstream.

Track locked_vm per dma struct, and create a new subroutine, both for use
in a subsequent patch.  No functional change.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 18e292705ba21cc9b3227b9ad5b1c28973605ee5 upstream.

Track locked_vm per dma struct, and create a new subroutine, both for use
in a subsequent patch.  No functional change.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: prevent underflow of locked_vm via exec()</title>
<updated>2023-03-10T08:34:32+00:00</updated>
<author>
<name>Steve Sistare</name>
<email>steven.sistare@oracle.com</email>
</author>
<published>2023-01-31T16:58:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a6b2aabe664098d5cf877ae0fd96459464a30e17'/>
<id>a6b2aabe664098d5cf877ae0fd96459464a30e17</id>
<content type='text'>
commit 046eca5018f8a5dd1dc2cedf87fb5843b9ea3026 upstream.

When a vfio container is preserved across exec, the task does not change,
but it gets a new mm with locked_vm=0, and loses the count from existing
dma mappings.  If the user later unmaps a dma mapping, locked_vm underflows
to a large unsigned value, and a subsequent dma map request fails with
ENOMEM in __account_locked_vm.

To avoid underflow, grab and save the mm at the time a dma is mapped.
Use that mm when adjusting locked_vm, rather than re-acquiring the saved
task's mm, which may have changed.  If the saved mm is dead, do nothing.

locked_vm is incremented for existing mappings in a subsequent patch.

Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 046eca5018f8a5dd1dc2cedf87fb5843b9ea3026 upstream.

When a vfio container is preserved across exec, the task does not change,
but it gets a new mm with locked_vm=0, and loses the count from existing
dma mappings.  If the user later unmaps a dma mapping, locked_vm underflows
to a large unsigned value, and a subsequent dma map request fails with
ENOMEM in __account_locked_vm.

To avoid underflow, grab and save the mm at the time a dma is mapped.
Use that mm when adjusting locked_vm, rather than re-acquiring the saved
task's mm, which may have changed.  If the saved mm is dead, do nothing.

locked_vm is incremented for existing mappings in a subsequent patch.

Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR</title>
<updated>2023-03-10T08:34:32+00:00</updated>
<author>
<name>Steve Sistare</name>
<email>steven.sistare@oracle.com</email>
</author>
<published>2023-01-31T16:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e8c21b19c2d0c4f5ad0e5b504c18e4e25e737e5f'/>
<id>e8c21b19c2d0c4f5ad0e5b504c18e4e25e737e5f</id>
<content type='text'>
commit ef3a3f6a294ba65fd906a291553935881796f8a5 upstream.

Disable the VFIO_UPDATE_VADDR capability if mediated devices are present.
Their kernel threads could be blocked indefinitely by a misbehaving
userland while trying to pin/unpin pages while vaddrs are being updated.

Do not allow groups to be added to the container while vaddr's are invalid,
so we never need to block user threads from pinning, and can delete the
vaddr-waiting code in a subsequent patch.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ef3a3f6a294ba65fd906a291553935881796f8a5 upstream.

Disable the VFIO_UPDATE_VADDR capability if mediated devices are present.
Their kernel threads could be blocked indefinitely by a misbehaving
userland while trying to pin/unpin pages while vaddrs are being updated.

Do not allow groups to be added to the container while vaddr's are invalid,
so we never need to block user threads from pinning, and can delete the
vaddr-waiting code in a subsequent patch.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare &lt;steven.sistare@oracle.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/1675184289-267876-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()</title>
<updated>2023-02-01T07:34:36+00:00</updated>
<author>
<name>Niklas Schnelle</name>
<email>schnelle@linux.ibm.com</email>
</author>
<published>2023-01-10T16:44:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4ba7d17f2b3b60556e8e5832b55064d6bc800f0c'/>
<id>4ba7d17f2b3b60556e8e5832b55064d6bc800f0c</id>
<content type='text'>
[ Upstream commit 895c0747f726bb50c9b7a805613a61d1b6f9fa06 ]

Since commit cbf7827bc5dc ("iommu/s390: Fix potential s390_domain
aperture shrinking") the s390 IOMMU driver uses reserved regions for the
system provided DMA ranges of PCI devices. Previously it reduced the
size of the IOMMU aperture and checked it on each mapping operation.
On current machines the system denies use of DMA addresses below 2^32 for
all PCI devices.

Usually mapping IOVAs in a reserved regions is harmless until a DMA
actually tries to utilize the mapping. However on s390 there is
a virtual PCI device called ISM which is implemented in firmware and
used for cross LPAR communication. Unlike real PCI devices this device
does not use the hardware IOMMU but inspects IOMMU translation tables
directly on IOTLB flush (s390 RPCIT instruction). If it detects IOVA
mappings outside the allowed ranges it goes into an error state. This
error state then causes the device to be unavailable to the KVM guest.

Analysing this we found that vfio_test_domain_fgsp() maps 2 pages at DMA
address 0 irrespective of the IOMMUs reserved regions. Even if usually
harmless this seems wrong in the general case so instead go through the
freshly updated IOVA list and try to find a range that isn't reserved,
and fits 2 pages, is PAGE_SIZE * 2 aligned. If found use that for
testing for fine grained super pages.

Fixes: af029169b8fd ("vfio/type1: Check reserved region conflict and update iova list")
Signed-off-by: Niklas Schnelle &lt;schnelle@linux.ibm.com&gt;
Reviewed-by: Matthew Rosato &lt;mjrosato@linux.ibm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230110164427.4051938-2-schnelle@linux.ibm.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 895c0747f726bb50c9b7a805613a61d1b6f9fa06 ]

Since commit cbf7827bc5dc ("iommu/s390: Fix potential s390_domain
aperture shrinking") the s390 IOMMU driver uses reserved regions for the
system provided DMA ranges of PCI devices. Previously it reduced the
size of the IOMMU aperture and checked it on each mapping operation.
On current machines the system denies use of DMA addresses below 2^32 for
all PCI devices.

Usually mapping IOVAs in a reserved regions is harmless until a DMA
actually tries to utilize the mapping. However on s390 there is
a virtual PCI device called ISM which is implemented in firmware and
used for cross LPAR communication. Unlike real PCI devices this device
does not use the hardware IOMMU but inspects IOMMU translation tables
directly on IOTLB flush (s390 RPCIT instruction). If it detects IOVA
mappings outside the allowed ranges it goes into an error state. This
error state then causes the device to be unavailable to the KVM guest.

Analysing this we found that vfio_test_domain_fgsp() maps 2 pages at DMA
address 0 irrespective of the IOMMUs reserved regions. Even if usually
harmless this seems wrong in the general case so instead go through the
freshly updated IOVA list and try to find a range that isn't reserved,
and fits 2 pages, is PAGE_SIZE * 2 aligned. If found use that for
testing for fine grained super pages.

Fixes: af029169b8fd ("vfio/type1: Check reserved region conflict and update iova list")
Signed-off-by: Niklas Schnelle &lt;schnelle@linux.ibm.com&gt;
Reviewed-by: Matthew Rosato &lt;mjrosato@linux.ibm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Link: https://lore.kernel.org/r/20230110164427.4051938-2-schnelle@linux.ibm.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries</title>
<updated>2022-12-31T12:32:41+00:00</updated>
<author>
<name>Joao Martins</name>
<email>joao.m.martins@oracle.com</email>
</author>
<published>2022-11-29T13:12:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ba090c6debb5de85ef1289ce23e9adda05e59f3f'/>
<id>ba090c6debb5de85ef1289ce23e9adda05e59f3f</id>
<content type='text'>
[ Upstream commit b058ea3ab5afea873ab8d976277539ca9e43869a ]

Commit f38044e5ef58 ("vfio/iova_bitmap: Fix PAGE_SIZE unaligned bitmaps")
had fixed the unaligned bitmaps by capping the remaining iterable set at
the start of the bitmap. Although, that mistakenly worked around
iova_bitmap_set() incorrectly setting bits across page boundary.

Fix this by reworking the loop inside iova_bitmap_set() to iterate over a
range of bits to set (cur_bit .. last_bit) which may span different pinned
pages, thus updating @page_idx and @offset as it sets the bits. The
previous cap to the first page is now adjusted to be always accounted
rather than when there's only a non-zero pgoff.

While at it, make @page_idx , @offset and @nbits to be unsigned int given
that it won't be more than 512 and 4096 respectively (even a bigger
PAGE_SIZE or a smaller struct page size won't make this bigger than the
above 32-bit max). Also, delete the stale kdoc on Return type.

Cc: Avihai Horon &lt;avihaih@nvidia.com&gt;
Fixes: f38044e5ef58 ("vfio/iova_bitmap: Fix PAGE_SIZE unaligned bitmaps")
Co-developed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Tested-by: Avihai Horon &lt;avihaih@nvidia.com&gt;
Link: https://lore.kernel.org/r/20221129131235.38880-1-joao.m.martins@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit b058ea3ab5afea873ab8d976277539ca9e43869a ]

Commit f38044e5ef58 ("vfio/iova_bitmap: Fix PAGE_SIZE unaligned bitmaps")
had fixed the unaligned bitmaps by capping the remaining iterable set at
the start of the bitmap. Although, that mistakenly worked around
iova_bitmap_set() incorrectly setting bits across page boundary.

Fix this by reworking the loop inside iova_bitmap_set() to iterate over a
range of bits to set (cur_bit .. last_bit) which may span different pinned
pages, thus updating @page_idx and @offset as it sets the bits. The
previous cap to the first page is now adjusted to be always accounted
rather than when there's only a non-zero pgoff.

While at it, make @page_idx , @offset and @nbits to be unsigned int given
that it won't be more than 512 and 4096 respectively (even a bigger
PAGE_SIZE or a smaller struct page size won't make this bigger than the
above 32-bit max). Also, delete the stale kdoc on Return type.

Cc: Avihai Horon &lt;avihaih@nvidia.com&gt;
Fixes: f38044e5ef58 ("vfio/iova_bitmap: Fix PAGE_SIZE unaligned bitmaps")
Co-developed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Tested-by: Avihai Horon &lt;avihaih@nvidia.com&gt;
Link: https://lore.kernel.org/r/20221129131235.38880-1-joao.m.martins@oracle.com
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
