<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/ipc/sem.c, branch v3.12.64</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sysv, ipc: fix security-layer leaking</title>
<updated>2016-08-19T07:51:07+00:00</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2016-08-02T21:03:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a19e67e83ab64e5b6572c654342d3ab1764ed89f'/>
<id>a19e67e83ab64e5b6572c654342d3ab1764ed89f</id>
<content type='text'>
commit 9b24fef9f0410fb5364245d6cc2bd044cc064007 upstream.

Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.

Running LTP msgsnd06 with kmemleak gives the following:

  cat /sys/kernel/debug/kmemleak

  unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
      1b 00 00 00 01 00 00 00                          ........
    backtrace:
      kmemleak_alloc+0x23/0x40
      kmem_cache_alloc_trace+0xe1/0x180
      selinux_msg_queue_alloc_security+0x3f/0xd0
      security_msg_queue_alloc+0x2e/0x40
      newque+0x4e/0x150
      ipcget+0x159/0x1b0
      SyS_msgget+0x39/0x40
      entry_SYSCALL_64_fastpath+0x13/0x8f

Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()

Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9b24fef9f0410fb5364245d6cc2bd044cc064007 upstream.

Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.

Running LTP msgsnd06 with kmemleak gives the following:

  cat /sys/kernel/debug/kmemleak

  unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
      1b 00 00 00 01 00 00 00                          ........
    backtrace:
      kmemleak_alloc+0x23/0x40
      kmem_cache_alloc_trace+0xe1/0x180
      selinux_msg_queue_alloc_security+0x3f/0xd0
      security_msg_queue_alloc+0x2e/0x40
      newque+0x4e/0x150
      ipcget+0x159/0x1b0
      SyS_msgget+0x39/0x40
      entry_SYSCALL_64_fastpath+0x13/0x8f

Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()

Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits</title>
<updated>2015-08-25T14:57:06+00:00</updated>
<author>
<name>Herton R. Krzesinski</name>
<email>herton@redhat.com</email>
</author>
<published>2015-08-14T22:35:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9ae388fbe87e9203f3ed81c4229d48795a68057d'/>
<id>9ae388fbe87e9203f3ed81c4229d48795a68057d</id>
<content type='text'>
commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).&lt;....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e &lt;45&gt; 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [&lt;ffffffff810d6053&gt;] spin_dump+0x53/0xc0
    RSP &lt;ffff88003750fd68&gt;
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 &lt;55&gt; 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un-&gt;semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;
    #include &lt;sys/ipc.h&gt;
    #include &lt;sys/sem.h&gt;
    #include &lt;sys/wait.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;time.h&gt;
    #include &lt;unistd.h&gt;
    #include &lt;errno.h&gt;

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &amp;op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i &lt; NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] &lt; 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i &lt; NSET; i++) {
                    for (j = 0; j &lt; NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) &lt; 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i &lt; NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p &lt; 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i &lt; NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p &lt; 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski &lt;herton@redhat.com&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Rafael Aquini &lt;aquini@redhat.com&gt;
CC: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).&lt;....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e &lt;45&gt; 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [&lt;ffffffff810d6053&gt;] spin_dump+0x53/0xc0
    RSP &lt;ffff88003750fd68&gt;
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 &lt;55&gt; 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un-&gt;semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;
    #include &lt;sys/ipc.h&gt;
    #include &lt;sys/sem.h&gt;
    #include &lt;sys/wait.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;time.h&gt;
    #include &lt;unistd.h&gt;
    #include &lt;errno.h&gt;

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &amp;op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i &lt; NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] &lt; 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i &lt; NSET; i++) {
                    for (j = 0; j &lt; NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) &lt; 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i &lt; NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p &lt; 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i &lt; NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p &lt; 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski &lt;herton@redhat.com&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Rafael Aquini &lt;aquini@redhat.com&gt;
CC: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: update/correct memory barriers</title>
<updated>2015-08-25T14:57:06+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2015-08-14T22:35:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e50758a0d7fe5d403876fa4f0631c5c66bb78b6b'/>
<id>e50758a0d7fe5d403876fa4f0631c5c66bb78b6b</id>
<content type='text'>
commit 3ed1f8a99d70ea1cd1508910eb107d0edcae5009 upstream.

sem_lock() did not properly pair memory barriers:

!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.

As no primitive exists inside &lt;include/spinlock.h&gt; and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.

With regards to -stable:

The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability).  The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@parallels.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3ed1f8a99d70ea1cd1508910eb107d0edcae5009 upstream.

sem_lock() did not properly pair memory barriers:

!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.

As no primitive exists inside &lt;include/spinlock.h&gt; and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.

With regards to -stable:

The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability).  The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@parallels.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: fully initialize sem_array before making it visible</title>
<updated>2015-04-09T11:14:24+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2014-12-02T23:59:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dd06a9623b7e811b362f248bfc1d3886b0e8add1'/>
<id>dd06a9623b7e811b362f248bfc1d3886b0e8add1</id>
<content type='text'>
commit e8577d1f0329d4842e8302e289fb2c22156abef4 upstream.

ipc_addid() makes a new ipc identifier visible to everyone.  New objects
start as locked, so that the caller can complete the initialization
after the call.  Within struct sem_array, at least sma-&gt;sem_base and
sma-&gt;sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Acked-by: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e8577d1f0329d4842e8302e289fb2c22156abef4 upstream.

ipc_addid() makes a new ipc identifier visible to everyone.  New objects
start as locked, so that the caller can complete the initialization
after the call.  Within struct sem_array, at least sma-&gt;sem_base and
sma-&gt;sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Acked-by: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()</title>
<updated>2015-02-16T13:56:32+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2014-12-13T00:58:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b116a0436c29bd83196d1988d75521044dc36e5d'/>
<id>b116a0436c29bd83196d1988d75521044dc36e5d</id>
<content type='text'>
commit 2e094abfd1f29a08a60523b42d4508281b8dee0e upstream.

When I fixed bugs in the sem_lock() logic, I was more conservative than
necessary.  Therefore it is safe to replace the smp_mb() with smp_rmb().
And: With smp_rmb(), semop() syscalls are up to 10% faster.

The race we must protect against is:

	sem-&gt;lock is free
	sma-&gt;complex_count = 0
	sma-&gt;sem_perm.lock held by thread B

thread A:

A: spin_lock(&amp;sem-&gt;lock)

			B: sma-&gt;complex_count++; (now 1)
			B: spin_unlock(&amp;sma-&gt;sem_perm.lock);

A: spin_is_locked(&amp;sma-&gt;sem_perm.lock);
A: XXXXX memory barrier
A: if (sma-&gt;complex_count == 0)

Thread A must read the increased complex_count value, i.e. the read must
not be reordered with the read of sem_perm.lock done by spin_is_locked().

Since it's about ordering of reads, smp_rmb() is sufficient.

[akpm@linux-foundation.org: update sem_lock() comment, from Davidlohr]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Acked-by: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2e094abfd1f29a08a60523b42d4508281b8dee0e upstream.

When I fixed bugs in the sem_lock() logic, I was more conservative than
necessary.  Therefore it is safe to replace the smp_mb() with smp_rmb().
And: With smp_rmb(), semop() syscalls are up to 10% faster.

The race we must protect against is:

	sem-&gt;lock is free
	sma-&gt;complex_count = 0
	sma-&gt;sem_perm.lock held by thread B

thread A:

A: spin_lock(&amp;sem-&gt;lock)

			B: sma-&gt;complex_count++; (now 1)
			B: spin_unlock(&amp;sma-&gt;sem_perm.lock);

A: spin_is_locked(&amp;sma-&gt;sem_perm.lock);
A: XXXXX memory barrier
A: if (sma-&gt;complex_count == 0)

Thread A must read the increased complex_count value, i.e. the read must
not be reordered with the read of sem_perm.lock done by spin_is_locked().

Since it's about ordering of reads, smp_rmb() is sufficient.

[akpm@linux-foundation.org: update sem_lock() comment, from Davidlohr]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Acked-by: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;

Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: synchronize semop and semctl with IPC_RMID</title>
<updated>2013-10-17T04:35:52+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-10-16T20:46:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6e224f94597842c5eb17f1fc2208d20b6f7f7d49'/>
<id>6e224f94597842c5eb17f1fc2208d20b6f7f7d49</id>
<content type='text'>
After acquiring the semlock spinlock, operations must test that the
array is still valid.

 - semctl() and exit_sem() would walk stale linked lists (ugly, but
   should be ok: all lists are empty)

 - semtimedop() would sleep forever - and if woken up due to a signal -
   access memory after free.

The patch also:
 - standardizes the tests for .deleted, so that all tests in one
   function leave the function with the same approach.
 - unconditionally tests for .deleted immediately after every call to
   sem_lock - even it it means that for semctl(GETALL), .deleted will be
   tested twice.

Both changes make the review simpler: After every sem_lock, there must
be a test of .deleted, followed by a goto to the cleanup code (if the
function uses "goto cleanup").

The only exception is semctl_down(): If sem_ids().rwsem is locked, then
the presence in ids-&gt;ipcs_idr is equivalent to !.deleted, thus no
additional test is required.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After acquiring the semlock spinlock, operations must test that the
array is still valid.

 - semctl() and exit_sem() would walk stale linked lists (ugly, but
   should be ok: all lists are empty)

 - semtimedop() would sleep forever - and if woken up due to a signal -
   access memory after free.

The patch also:
 - standardizes the tests for .deleted, so that all tests in one
   function leave the function with the same approach.
 - unconditionally tests for .deleted immediately after every call to
   sem_lock - even it it means that for semctl(GETALL), .deleted will be
   tested twice.

Both changes make the review simpler: After every sem_lock, there must
be a test of .deleted, followed by a goto to the cleanup code (if the
function uses "goto cleanup").

The only exception is semctl_down(): If sem_ids().rwsem is locked, then
the presence in ids-&gt;ipcs_idr is equivalent to !.deleted, thus no
additional test is required.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: update sem_otime for all operations</title>
<updated>2013-09-30T21:31:03+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-09-30T20:45:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0e8c665699e953fa58dc1b0b0d09e5dce7343cc7'/>
<id>0e8c665699e953fa58dc1b0b0d09e5dce7343cc7</id>
<content type='text'>
In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).

But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.

The fix is simple:
Non-alter operations must update sem_otime.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Jia He &lt;jiakernel@gmail.com&gt;
Tested-by: Jia He &lt;jiakernel@gmail.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).

But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.

The fix is simple:
Non-alter operations must update sem_otime.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Jia He &lt;jiakernel@gmail.com&gt;
Tested-by: Jia He &lt;jiakernel@gmail.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: synchronize the proc interface</title>
<updated>2013-09-30T21:31:01+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-09-30T20:45:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d8c633766ad88527f25d9f81a5c2f083d78a2b39'/>
<id>d8c633766ad88527f25d9f81a5c2f083d78a2b39</id>
<content type='text'>
The proc interface is not aware of sem_lock(), it instead calls
ipc_lock_object() directly.  This means that simple semop() operations
can run in parallel with the proc interface.  Right now, this is
uncritical, because the implementation doesn't do anything that requires
a proper synchronization.

But it is dangerous and therefore should be fixed.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The proc interface is not aware of sem_lock(), it instead calls
ipc_lock_object() directly.  This means that simple semop() operations
can run in parallel with the proc interface.  Right now, this is
uncritical, because the implementation doesn't do anything that requires
a proper synchronization.

But it is dangerous and therefore should be fixed.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: optimize sem_lock()</title>
<updated>2013-09-30T21:31:01+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-09-30T20:45:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6d07b68ce16ae9535955ba2059dedba5309c3ca1'/>
<id>6d07b68ce16ae9535955ba2059dedba5309c3ca1</id>
<content type='text'>
Operations that need access to the whole array must guarantee that there
are no simple operations ongoing.  Right now this is achieved by
spin_unlock_wait(sem-&gt;lock) on all semaphores.

If complex_count is nonzero, then this spin_unlock_wait() is not
necessary, because it was already performed in the past by the thread
that increased complex_count and even though sem_perm.lock was dropped
inbetween, no simple operation could have started, because simple
operations cannot start when complex_count is non-zero.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Operations that need access to the whole array must guarantee that there
are no simple operations ongoing.  Right now this is achieved by
spin_unlock_wait(sem-&gt;lock) on all semaphores.

If complex_count is nonzero, then this spin_unlock_wait() is not
necessary, because it was already performed in the past by the thread
that increased complex_count and even though sem_perm.lock was dropped
inbetween, no simple operation could have started, because simple
operations cannot start when complex_count is non-zero.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: fix race in sem_lock()</title>
<updated>2013-09-30T21:31:01+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-09-30T20:45:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5e9d527591421ccdb16acb8c23662231135d8686'/>
<id>5e9d527591421ccdb16acb8c23662231135d8686</id>
<content type='text'>
The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count.  The current code does it the other way around - and that
creates a race.  Details are below.

The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith.  It removes all gotos and all loops and thus the
risk of livelocks.

I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.

The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma-&gt;complex_count after
the spin_is_locked() test.

Details of the bug:

Assume:
 - sma-&gt;complex_count = 0.
 - Thread 1: semtimedop(complex op that must sleep)
 - Thread 2: semtimedop(simple op).

Pseudo-Trace:

Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
			Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
	&lt;&lt;&lt; preempted.

Thread 2: sem_lock():
        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                      int nsops)
        {
                int locknum;
         again:
                if (nsops == 1 &amp;&amp; !sma-&gt;complex_count) {
                        struct sem *sem = sma-&gt;sem_base + sops-&gt;sem_num;

                        /* Lock just the semaphore we are interested in. */
                        spin_lock(&amp;sem-&gt;lock);

                        /*
                         * If sma-&gt;complex_count was set while we were spinning,
                         * we may need to look at things we did not lock here.
                         */
                        if (unlikely(sma-&gt;complex_count)) {
                                spin_unlock(&amp;sem-&gt;lock);
                                goto lock_array;
                        }
        &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;
	&lt;&lt;&lt; complex_count is still 0.
	&lt;&lt;&lt;
        &lt;&lt;&lt; Here it is preempted
        &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;

Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma-&gt;complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&amp;sma-&gt;sem_perm.lock))) {
                        spin_unlock(&amp;sem-&gt;lock);
                        spin_unlock_wait(&amp;sma-&gt;sem_perm.lock);
                        goto again;
                }
	&lt;&lt;&lt; sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops-&gt;sem_num;

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count.  The current code does it the other way around - and that
creates a race.  Details are below.

The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith.  It removes all gotos and all loops and thus the
risk of livelocks.

I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.

The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma-&gt;complex_count after
the spin_is_locked() test.

Details of the bug:

Assume:
 - sma-&gt;complex_count = 0.
 - Thread 1: semtimedop(complex op that must sleep)
 - Thread 2: semtimedop(simple op).

Pseudo-Trace:

Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
			Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
	&lt;&lt;&lt; preempted.

Thread 2: sem_lock():
        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                      int nsops)
        {
                int locknum;
         again:
                if (nsops == 1 &amp;&amp; !sma-&gt;complex_count) {
                        struct sem *sem = sma-&gt;sem_base + sops-&gt;sem_num;

                        /* Lock just the semaphore we are interested in. */
                        spin_lock(&amp;sem-&gt;lock);

                        /*
                         * If sma-&gt;complex_count was set while we were spinning,
                         * we may need to look at things we did not lock here.
                         */
                        if (unlikely(sma-&gt;complex_count)) {
                                spin_unlock(&amp;sem-&gt;lock);
                                goto lock_array;
                        }
        &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;
	&lt;&lt;&lt; complex_count is still 0.
	&lt;&lt;&lt;
        &lt;&lt;&lt; Here it is preempted
        &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;

Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma-&gt;complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&amp;sma-&gt;sem_perm.lock))) {
                        spin_unlock(&amp;sem-&gt;lock);
                        spin_unlock_wait(&amp;sma-&gt;sem_perm.lock);
                        goto again;
                }
	&lt;&lt;&lt; sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops-&gt;sem_num;

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
