linux-stable.git/include/linux, branch linux-5.0.y

inet: switch IP ID generator to siphash

2019-06-04T06:01:26+00:00

[ Upstream commit df453700e8d81b1bdafdf684365ee2b9431fb702 ]

According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.

Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.

It is time to switch to siphash and its 128bit keys.

Signed-off-by: Eric Dumazet 
Reported-by: Amit Klein 
Reported-by: Benny Pinkas 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

rcu: Do a single rhp->func read in rcu_head_after_call_rcu()

2019-05-31T13:45:16+00:00

[ Upstream commit b699cce1604e828f19c39845252626eb78cdf38a ]

The rcu_head_after_call_rcu() function reads the rhp->func pointer twice,
which can result in a false-positive WARN_ON_ONCE() if the callback
were passed to call_rcu() between the two reads.  Although racing
rcu_head_after_call_rcu() with call_rcu() is to be a dubious use case
(the return value is not reliable in that case), intermittent and
irreproducible warnings are also quite dubious.  This commit therefore
uses a single READ_ONCE() to pick up the value of rhp->func once, then
tests that value twice, thus guaranteeing consistent processing within
rcu_head_after_call_rcu()().

Neverthless, racing rcu_head_after_call_rcu() with call_rcu() is still
a dubious use case.

Signed-off-by: Neeraj Upadhyay 
[ paulmck: Add blank line after declaration per checkpatch.pl. ]
Signed-off-by: Paul E. McKenney 
Signed-off-by: Sasha Levin

overflow: Fix -Wtype-limits compilation warnings

2019-05-31T13:45:15+00:00

[ Upstream commit dc7fe518b0493faa0af0568d6d8c2a33c00f58d0 ]

Attempt to use check_shl_overflow() with inputs of unsigned type
produces the following compilation warnings.

drivers/infiniband/hw/mlx5/qp.c: In function _set_user_rq_size_:
./include/linux/overflow.h:230:6: warning: comparison of unsigned
expression >= 0 is always true [-Wtype-limits]
   _s >= 0 && _s < 8 * sizeof(*d) ? _s : 0;  \
      ^~
drivers/infiniband/hw/mlx5/qp.c:5820:6: note: in expansion of macro _check_shl_overflow_
  if (check_shl_overflow(rwq->wqe_count, rwq->wqe_shift,
&rwq->buf_size))
      ^~~~~~~~~~~~~~~~~~
./include/linux/overflow.h:232:26: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
  (_to_shift != _s || *_d < 0 || _a < 0 ||   \
                          ^
drivers/infiniband/hw/mlx5/qp.c:5820:6: note: in expansion of macro _check_shl_overflow_
  if (check_shl_overflow(rwq->wqe_count, rwq->wqe_shift, &rwq->buf_size))
      ^~~~~~~~~~~~~~~~~~
./include/linux/overflow.h:232:36: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
  (_to_shift != _s || *_d < 0 || _a < 0 ||   \
                                    ^
drivers/infiniband/hw/mlx5/qp.c:5820:6: note: in expansion of macro _check_shl_overflow_
  if (check_shl_overflow(rwq->wqe_count, rwq->wqe_shift,&rwq->buf_size))
      ^~~~~~~~~~~~~~~~~~

Fixes: 0c66847793d1 ("overflow.h: Add arithmetic shift helper")
Reviewed-by: Bart Van Assche 
Acked-by: Kees Cook 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Sasha Levin

timekeeping: Force upper bound for setting CLOCK_REALTIME

2019-05-31T13:45:15+00:00

[ Upstream commit 7a8e61f8478639072d402a26789055a4a4de8f77 ]

Several people reported testing failures after setting CLOCK_REALTIME close
to the limits of the kernel internal representation in nanoseconds,
i.e. year 2262.

The failures are exposed in subsequent operations, i.e. when arming timers
or when the advancing CLOCK_MONOTONIC makes the calculation of
CLOCK_REALTIME overflow into negative space.

Now people start to paper over the underlying problem by clamping
calculations to the valid range, but that's just wrong because such
workarounds will prevent detection of real issues as well.

It is reasonable to force an upper bound for the various methods of setting
CLOCK_REALTIME. Year 2262 is the absolute upper bound. Assume a maximum
uptime of 30 years which is plenty enough even for esoteric embedded
systems. That results in an upper bound of year 2232 for setting the time.

Once that limit is reached in reality this limit is only a small part of
the problem space. But until then this stops people from trying to paper
over the problem at the wrong places.

Reported-by: Xiongfeng Wang 
Reported-by: Hongbo Yao 
Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Cc: Stephen Boyd 
Cc: Miroslav Lichvar 
Cc: Arnd Bergmann 
Cc: Richard Cochran 
Cc: Peter Zijlstra 
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903231125480.2157@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin

HID: core: move Usage Page concatenation to Main item

2019-05-31T13:45:13+00:00

[ Upstream commit 58e75155009cc800005629955d3482f36a1e0eec ]

As seen on some USB wireless keyboards manufactured by Primax, the HID
parser was using some assumptions that are not always true. In this case
it's s the fact that, inside the scope of a main item, an Usage Page
will always precede an Usage.

The spec is not pretty clear as 6.2.2.7 states "Any usage that follows
is interpreted as a Usage ID and concatenated with the Usage Page".
While 6.2.2.8 states "When the parser encounters a main item it
concatenates the last declared Usage Page with a Usage to form a
complete usage value." Being somewhat contradictory it was decided to
match Window's implementation, which follows 6.2.2.8.

In summary, the patch moves the Usage Page concatenation from the local
item parsing function to the main item parsing function.

Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Terry Junge 
Signed-off-by: Benjamin Tissoires 
Signed-off-by: Sasha Levin

iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion

2019-05-31T13:45:09+00:00

[ Upstream commit df1d80aee963480c5c2938c64ec0ac3e4a0df2e0 ]

For devices from the SigmaDelta family we need to keep CS low when doing a
conversion, since the device will use the MISO line as a interrupt to
indicate that the conversion is complete.

This is why the driver locks the SPI bus and when the SPI bus is locked
keeps as long as a conversion is going on. The current implementation gets
one small detail wrong though. CS is only de-asserted after the SPI bus is
unlocked. This means it is possible for a different SPI device on the same
bus to send a message which would be wrongfully be addressed to the
SigmaDelta device as well. Make sure that the last SPI transfer that is
done while holding the SPI bus lock de-asserts the CS signal.

Signed-off-by: Lars-Peter Clausen 
Signed-off-by: Alexandru Ardelean 
Signed-off-by: Jonathan Cameron 
Signed-off-by: Sasha Levin

cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock

2019-05-31T13:45:03+00:00

[ Upstream commit 4dcabece4c3a9f9522127be12cc12cc120399b2f ]

The number of descendant cgroups and the number of dying
descendant cgroups are currently synchronized using the cgroup_mutex.

The number of descendant cgroups will be required by the cgroup v2
freezer, which will use it to determine if a cgroup is frozen
(depending on total number of descendants and number of frozen
descendants). It's not always acceptable to grab the cgroup_mutex,
especially from quite hot paths (e.g. exit()).

To avoid this, let's additionally synchronize these counters using
the css_set_lock.

So, it's safe to read these counters with either cgroup_mutex or
css_set_lock locked, and for changing both locks should be acquired.

Signed-off-by: Roman Gushchin 
Signed-off-by: Tejun Heo 
Cc: kernel-team@fb.com
Signed-off-by: Sasha Levin

block: fix use-after-free on gendisk

2019-05-31T13:45:02+00:00

[ Upstream commit 2c88e3c7ec32d7a40cc7c9b4a487cf90e4671bdd ]

commit 2da78092dda "block: Fix dev_t minor allocation lifetime"
specifically moved blk_free_devt(dev->devt) call to part_release()
to avoid reallocating device number before the device is fully
shutdown.

However, it can cause use-after-free on gendisk in get_gendisk().
We use md device as example to show the race scenes:

Process1		Worker			Process2
md_free
						blkdev_open
del_gendisk
  add delete_partition_work_fn() to wq
  						__blkdev_get
						get_gendisk
put_disk
  disk_release
    kfree(disk)
    						find part from ext_devt_idr
						get_disk_and_module(disk)
    					  	cause use after free

    			delete_partition_work_fn
			put_device(part)
    		  	part_release
		    	remove part from ext_devt_idr

Before  is removed from ext_devt_idr by
delete_partition_work_fn(), we can find the devt and then access
gendisk by hd_struct pointer. But, if we access the gendisk after
it have been freed, it can cause in use-after-freeon gendisk in
get_gendisk().

We fix this by adding a new helper blk_invalidate_devt() in
delete_partition() and del_gendisk(). It replaces hd_struct
pointer in idr with value 'NULL', and deletes the entry from
idr in part_release() as we do now.

Thanks to Jan Kara for providing the solution and more clear comments
for the code.

Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime")
Cc: Al Viro 
Reviewed-by: Bart Van Assche 
Reviewed-by: Keith Busch 
Reviewed-by: Jan Kara 
Suggested-by: Jan Kara 
Signed-off-by: Yufen Yu 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

smpboot: Place the __percpu annotation correctly

2019-05-31T13:45:00+00:00

[ Upstream commit d4645d30b50d1691c26ff0f8fa4e718b08f8d3bb ]

The test robot reported a wrong assignment of a per-CPU variable which
it detected by using sparse and sent a report. The assignment itself is
correct. The annotation for sparse was wrong and hence the report.
The first pointer is a "normal" pointer and points to the per-CPU memory
area. That means that the __percpu annotation has to be moved.

Move the __percpu annotation to pointer which points to the per-CPU
area. This change affects only the sparse tool (and is ignored by the
compiler).

Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: f97f8f06a49fe ("smpboot: Provide infrastructure for percpu hotplug threads")
Link: http://lkml.kernel.org/r/20190424085253.12178-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin

x86/modules: Avoid breaking W^X while loading modules

2019-05-31T13:44:57+00:00

[ Upstream commit f2c65fb3221adc6b73b0549fc7ba892022db9797 ]

When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker that has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. Prevent having
writable executable PTEs in this stage.

In addition, avoiding having W+X mappings can also slightly simplify the
patching of modules code on initialization (e.g., by alternatives and
static-key), as would be done in the next patch. This was actually the
main motivation for this patch.

To avoid having W+X mappings, set them initially as RW (NX) and after
they are set as RO set them as X as well. Setting them as executable is
done as a separate step to avoid one core in which the old PTE is cached
(hence writable), and another which sees the updated PTE (executable),
which would break the W^X protection.

Suggested-by: Thomas Gleixner 
Suggested-by: Andy Lutomirski 
Signed-off-by: Nadav Amit 
Signed-off-by: Rick Edgecombe 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Jessica Yu 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Rik van Riel 
Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin