linux-stable.git/include/linux, branch v3.2.56

jiffies: Avoid undefined behavior from signed overflow

2014-04-01T23:59:01+00:00

commit 5a581b367b5df0531265311fc681c2abd377e5e6 upstream.

According to the C standard 3.4.3p3, overflow of a signed integer results
in undefined behavior.  This commit therefore changes the definitions
of time_after(), time_after_eq(), time_after64(), and time_after_eq64()
to avoid this undefined behavior.  The trick is that the subtraction
is done using unsigned arithmetic, which according to 6.2.5p9 cannot
overflow because it is defined as modulo arithmetic.  This has the added
(though admittedly quite small) benefit of shortening four lines of code
by four characters each.

Note that the C standard considers the cast from unsigned to
signed to be implementation-defined, see 6.3.1.3p3.  However, on a
two's-complement system, an implementation that defines anything other
than a reinterpretation of the bits is free to come to me, and I will be
happy to act as a witness for its being committed to an insane asylum.
(Although I have nothing against saturating arithmetic or signals in some
cases, these things really should not be the default when compiling an
operating-system kernel.)

Signed-off-by: Paul E. McKenney 
Cc: John Stultz 
Cc: "David S. Miller" 
Cc: Arnd Bergmann 
Cc: Ingo Molnar 
Cc: Linus Torvalds 
Cc: Eric Dumazet 
Cc: Kevin Easton 
[ paulmck: Included time_after64() and time_after_eq64(), as suggested
  by Eric Dumazet, also fixed commit message.]
Reviewed-by: Josh Triplett 
Signed-off-by: Ben Hutchings

tracing: Do not add event files for modules that fail tracepoints

2014-04-01T23:58:57+00:00

commit 45ab2813d40d88fc575e753c38478de242d03f88 upstream.

If a module fails to add its tracepoints due to module tainting, do not
create the module event infrastructure in the debugfs directory. As the events
will not work and worse yet, they will silently fail, making the user wonder
why the events they enable do not display anything.

Having a warning on module load and the events not visible to the users
will make the cause of the problem much clearer.

Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org

Fixes: 6d723736e472 "tracing/events: add support for modules to TRACE_EVENT"
Acked-by: Mathieu Desnoyers 
Cc: Rusty Russell 
Signed-off-by: Steven Rostedt 
Signed-off-by: Ben Hutchings

compiler/gcc4: Make quirk for asm_volatile_goto() unconditional

2014-04-01T23:58:51+00:00

commit a9f180345f5378ac87d80ed0bea55ba421d83859 upstream.

I started noticing problems with KVM guest destruction on Linux
3.12+, where guest memory wasn't being cleaned up. I bisected it
down to the commit introducing the new 'asm goto'-based atomics,
and found this quirk was later applied to those.

Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the
known 'asm goto' bug) I am still getting some kind of
miscompilation. If I enable the asm_volatile_goto quirk for my
compiler, KVM guests are destroyed correctly and the memory is
cleaned up.

So make the quirk unconditional for now, until bug is found
and fixed.

Suggested-by: Linus Torvalds 
Signed-off-by: Steven Noonan 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Jakub Jelinek 
Cc: Richard Henderson 
Cc: Andrew Morton 
Cc: Oleg Nesterov 
Link: http://lkml.kernel.org/r/1392274867-15236-1-git-send-email-steven@uplinklabs.net
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

fuse: fix pipe_buf_operations

2014-04-01T23:58:44+00:00

commit 28a625cbc2a14f17b83e47ef907b2658576a32aa upstream.

Having this struct in module memory could Oops when if the module is
unloaded while the buffer still persists in a pipe.

Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal
merge them into nosteal_pipe_buf_ops (this is the same as
default_pipe_buf_ops except stealing the page from the buffer is not
allowed).

Reported-by: Al Viro 
Signed-off-by: Miklos Szeredi 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

libata: disable LPM for some WD SATA-I devices

2014-04-01T23:58:43+00:00

commit ecd75ad514d73efc1bbcc5f10a13566c3ace5f53 upstream.

For some reason, some early WD drives spin up and down drives
erratically when the link is put into slumber mode which can reduce
the life expectancy of the device significantly.  Unfortunately, we
don't have full list of devices and given the nature of the issue it'd
be better to err on the side of false positives than the other way
around.  Let's disable LPM on all WD devices which match one of the
known problematic model prefixes and are SATA-I.

As horkage list doesn't support matching SATA capabilities, this is
implemented as two horkages - WD_BROKEN_LPM and NOLPM.  The former is
set for the known prefixes and sets the latter if the matched device
is SATA-I.

Note that this isn't optimal as this disables all LPM operations and
partial link power state reportedly works fine on these; however, the
way LPM is implemented in libata makes it difficult to precisely map
libata LPM setting to specific link power state.  Well, these devices
are already fairly outdated.  Let's just disable whole LPM for now.

Signed-off-by: Tejun Heo 
Reported-and-tested-by: Nikos Barkas 
Reported-and-tested-by: Ioannis Barkas 
References: https://bugzilla.kernel.org/show_bug.cgi?id=57211
[bwh: Backported to 3.2:
 - Adjust context
 - Use literal 76 instead of ATA_ID_SATA_CAPABILITY]
Signed-off-by: Ben Hutchings

sched/rt: Avoid updating RT entry timeout twice within one tick period

2014-02-15T19:20:18+00:00

commit 57d2aa00dcec67afa52478730f2b524521af14fb upstream.

The issue below was found in 2.6.34-rt rather than mainline rt
kernel, but the issue still exists upstream as well.

So please let me describe how it was noticed on 2.6.34-rt:

On this version, each softirq has its own thread, it means there
is at least one RT FIFO task per cpu. The priority of these
tasks is set to 49 by default. If user launches an RT FIFO task
with priority lower than 49 of softirq RT tasks, it's possible
there are two RT FIFO tasks enqueued one cpu runqueue at one
moment. By current strategy of balancing RT tasks, when it comes
to RT tasks, we really need to put them off to a CPU that they
can run on as soon as possible. Even if it means a bit of cache
line flushing, we want RT tasks to be run with the least latency.

When the user RT FIFO task which just launched before is
running, the sched timer tick of the current cpu happens. In this
tick period, the timeout value of the user RT task will be
updated once. Subsequently, we try to wake up one softirq RT
task on its local cpu. As the priority of current user RT task
is lower than the softirq RT task, the current task will be
preempted by the higher priority softirq RT task. Before
preemption, we check to see if current can readily move to a
different cpu. If so, we will reschedule to allow the RT push logic
to try to move current somewhere else. Whenever the woken
softirq RT task runs, it first tries to migrate the user FIFO RT
task over to a cpu that is running a task of lesser priority. If
migration is done, it will send a reschedule request to the found
cpu by IPI interrupt. Once the target cpu responds the IPI
interrupt, it will pick the migrated user RT task to preempt its
current task. When the user RT task is running on the new cpu,
the sched timer tick of the cpu fires. So it will tick the user
RT task again. This also means the RT task timeout value will be
updated again. As the migration may be done in one tick period,
it means the user RT task timeout value will be updated twice
within one tick.

If we set a limit on the amount of cpu time for the user RT task
by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
upon reaching the soft limit.

But exactly when the SIGXCPU signal should be sent depends on the
RT task timeout value. In fact the timeout mechanism of sending
the SIGXCPU signal assumes the RT task timeout is increased once
every tick.

However, currently the timeout value may be added twice per
tick. So it results in the SIGXCPU signal being sent earlier
than expected.

To solve this issue, we prevent the timeout value from increasing
twice within one tick time by remembering the jiffies value of
last updating the timeout. As long as the RT task's jiffies is
different with the global jiffies value, we allow its timeout to
be updated.

Signed-off-by: Ying Xue 
Signed-off-by: Fan Du 
Reviewed-by: Yong Zhang 
Acked-by: Steven Rostedt 
Cc: 
Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.com
Signed-off-by: Ingo Molnar 
[ lizf: backported to 3.4: adjust context ]
Signed-off-by: Li Zefan 
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings

mm: hugetlbfs: fix hugetlbfs optimization

2014-02-15T19:20:17+00:00

commit 27c73ae759774e63313c1fbfeb17ba076cea64c5 upstream.

Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database
caused by THP") can cause dereference of a dangling pointer if
split_huge_page runs during PageHuge() if there are updates to the
tail_page->private field.

Also it is repeating compound_head twice for hugetlbfs and it is running
compound_head+compound_trans_head for THP when a single one is needed in
both cases.

The new code within the PageSlab() check doesn't need to verify that the
THP page size is never bigger than the smallest hugetlbfs page size, to
avoid memory corruption.

A longstanding theoretical race condition was found while fixing the
above (see the change right after the skip_unlock label, that is
relevant for the compound_lock path too).

By re-establishing the _mapcount tail refcounting for all compound
pages, this also fixes the below problem:

  echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

  BUG: Bad page state in process bash  pfn:59a01
  page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
  page flags: 0x1c00000000008000(tail)
  Modules linked in:
  CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x55/0x76
    bad_page+0xd5/0x130
    free_pages_prepare+0x213/0x280
    __free_pages+0x36/0x80
    update_and_free_page+0xc1/0xd0
    free_pool_huge_page+0xc2/0xe0
    set_max_huge_pages.part.58+0x14c/0x220
    nr_hugepages_store_common.isra.60+0xd0/0xf0
    nr_hugepages_store+0x13/0x20
    kobj_attr_store+0xf/0x20
    sysfs_write_file+0x189/0x1e0
    vfs_write+0xc5/0x1f0
    SyS_write+0x55/0xb0
    system_call_fastpath+0x16/0x1b

Signed-off-by: Khalid Aziz 
Signed-off-by: Andrea Arcangeli 
Tested-by: Khalid Aziz 
Cc: Pravin Shelar 
Cc: Greg Kroah-Hartman 
Cc: Ben Hutchings 
Cc: Christoph Lameter 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Andi Kleen 
Cc: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[Khalid Aziz: Backported to 3.4]
Signed-off-by: Ben Hutchings

pci: Add PCI_DEVICE_SUB() macro

2014-02-15T19:20:16+00:00

This was added as part of commit 3d567e0e291c ('tg3: Set 10_100_ONLY
flag for additional 10/100 Mbps devices') upstream and is needed by
the following patch to ahci.

Signed-off-by: Ben Hutchings

vlan: Fix header ops passthru when doing TX VLAN offload.

2014-02-15T19:20:09+00:00

[ Upstream commit 2205369a314e12fcec4781cc73ac9c08fc2b47de ]

When the vlan code detects that the real device can do TX VLAN offloads
in hardware, it tries to arrange for the real device's header_ops to
be invoked directly.

But it does so illegally, by simply hooking the real device's
header_ops up to the VLAN device.

This doesn't work because we will end up invoking a set of header_ops
routines which expect a device type which matches the real device, but
will see a VLAN device instead.

Fix this by providing a pass-thru set of header_ops which will arrange
to pass the proper real device instead.

To facilitate this add a dev_rebuild_header().  There are
implementations which provide a ->cache and ->create but not a
->rebuild (f.e. PLIP).  So we need a helper function just like
dev_hard_header() to avoid crashes.

Use this helper in the one existing place where the
header_ops->rebuild was being invoked, the neighbour code.

With lots of help from Florian Westphal.

Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

net: rework recvmsg handler msg_name and msg_namelen logic

2014-01-03T04:33:33+00:00

[ Upstream commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c ]

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.

This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.

Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.

Also document these changes in include/linux/net.h as suggested by David
Miller.

Changes since RFC:

Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.

With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
	msg->msg_name = NULL
".

This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.

Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.

Cc: David Miller 
Suggested-by: Eric Dumazet 
Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings