linux-stable.git/fs, branch v3.12.52

ocfs2: fix umask ignored issue

2016-01-05T16:48:50+00:00

commit 8f1eb48758aacf6c1ffce18179295adbf3bd7640 upstream.

New created file's mode is not masked with umask, and this makes umask not
work for ocfs2 volume.

Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Junxiao Bi 
Cc: Gang He 
Cc: Mark Fasheh 
Cc: Joel Becker 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

nfs: if we have no valid attrs, then don't declare the attribute cache valid

2016-01-05T16:48:50+00:00

commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream.

If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.

Reviewed-by: Steve French 
Signed-off-by: Jeff Layton 
Signed-off-by: Trond Myklebust 
Signed-off-by: Jiri Slaby

nfs4: start callback_ident at idr 1

2016-01-05T16:48:50+00:00

commit c68a027c05709330fe5b2f50c50d5fa02124b5d8 upstream.

If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
it when the nfs_client is freed.  A decoding or server bug can then find
and try to put that first nfs_client which would lead to a crash.

Signed-off-by: Benjamin Coddington 
Fixes: d6870312659d ("nfs4client: convert to idr_alloc()")
Signed-off-by: Trond Myklebust 
Signed-off-by: Jiri Slaby

ext4, jbd2: ensure entering into panic after recording an error in superblock

2016-01-05T16:48:50+00:00

commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream.

If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option.  But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()

Tested-by: Hobin Woo 
Signed-off-by: Daeho Jeong 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jiri Slaby

ext4: fix potential use after free in __ext4_journal_stop

2016-01-05T16:48:49+00:00

commit 6934da9238da947628be83635e365df41064b09b upstream.

There is a use-after-free possibility in __ext4_journal_stop() in the
case that we free the handle in the first jbd2_journal_stop() because
we're referencing handle->h_err afterwards. This was introduced in
9705acd63b125dee8b15c705216d7186daea4625 and it is wrong. Fix it by
storing the handle->h_err value beforehand and avoid referencing
potentially freed handle.

Fixes: 9705acd63b125dee8b15c705216d7186daea4625
Signed-off-by: Lukas Czerner 
Reviewed-by: Andreas Dilger 
Signed-off-by: Jiri Slaby

Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow

2016-01-05T16:48:49+00:00

commit 1d512cb77bdbda80f0dd0620a3b260d697fd581d upstream.

If we are using the NO_HOLES feature, we have a tiny time window when
running delalloc for a nodatacow inode where we can race with a concurrent
link or xattr add operation leading to a BUG_ON.

This happens because at run_delalloc_nocow() we end up casting a leaf item
of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
file extent item (struct btrfs_file_extent_item) and then analyse its
extent type field, which won't match any of the expected extent types
(values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
explicit BUG_ON(1).

The following sequence diagram shows how the race happens when running a
no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
neighbour leafs:

             Leaf X (has N items)                    Leaf Y

 [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
              slot N - 2         slot N - 1              slot 0

 (Note the implicit hole for inode 257 regarding the [0, 8K[ range)

       CPU 1                                         CPU 2

 run_dealloc_nocow()
   btrfs_lookup_file_extent()
     --> searches for a key with value
         (257 EXTENT_DATA 4096) in the
         fs/subvol tree
     --> returns us a path with
         path->nodes[0] == leaf X and
         path->slots[0] == N

   because path->slots[0] is >=
   btrfs_header_nritems(leaf X), it
   calls btrfs_next_leaf()

   btrfs_next_leaf()
     --> releases the path

                                              hard link added to our inode,
                                              with key (257 INODE_REF 500)
                                              added to the end of leaf X,
                                              so leaf X now has N + 1 keys

     --> searches for the key
         (257 INODE_REF 256), because
         it was the last key in leaf X
         before it released the path,
         with path->keep_locks set to 1

     --> ends up at leaf X again and
         it verifies that the key
         (257 INODE_REF 256) is no longer
         the last key in the leaf, so it
         returns with path->nodes[0] ==
         leaf X and path->slots[0] == N,
         pointing to the new item with
         key (257 INODE_REF 500)

   the loop iteration of run_dealloc_nocow()
   does not break out the loop and continues
   because the key referenced in the path
   at path->nodes[0] and path->slots[0] is
   for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
   and its offset (500) is less then our delalloc
   range's end (8192)

   the item pointed by the path, an inode reference item,
   is (incorrectly) interpreted as a file extent item and
   we get an invalid extent type, leading to the BUG_ON(1):

   if (extent_type == BTRFS_FILE_EXTENT_REG ||
      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
       (...)
   } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
       (...)
   } else {
       BUG_ON(1)
   }

The same can happen if a xattr is added concurrently and ends up having
a key with an offset smaller then the delalloc's range end.

So fix this by skipping keys with a type smaller than
BTRFS_EXTENT_DATA_KEY.

Signed-off-by: Filipe Manana 
Signed-off-by: Jiri Slaby

Btrfs: fix race leading to incorrect item deletion when dropping extents

2016-01-05T16:48:49+00:00

commit aeafbf8486c9e2bd53f5cc3c10c0b7fd7149d69c upstream.

While running a stress test I got the following warning triggered:

  [191627.672810] ------------[ cut here ]------------
  [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
  (...)
  [191627.701485] Call Trace:
  [191627.702037]  [] dump_stack+0x4f/0x7b
  [191627.702992]  [] ? console_unlock+0x356/0x3a2
  [191627.704091]  [] warn_slowpath_common+0xa1/0xbb
  [191627.705380]  [] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.706637]  [] warn_slowpath_null+0x1a/0x1c
  [191627.707789]  [] __btrfs_drop_extents+0x391/0xa50 [btrfs]
  [191627.709155]  [] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
  [191627.712444]  [] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
  [191627.714162]  [] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
  [191627.715887]  [] ? start_transaction+0x3bb/0x610 [btrfs]
  [191627.717287]  [] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
  [191627.728865]  [] finish_ordered_fn+0x15/0x17 [btrfs]
  [191627.730045]  [] normal_work_helper+0x14c/0x32c [btrfs]
  [191627.731256]  [] btrfs_endio_write_helper+0x12/0x14 [btrfs]
  [191627.732661]  [] process_one_work+0x24c/0x4ae
  [191627.733822]  [] worker_thread+0x206/0x2c2
  [191627.734857]  [] ? process_scheduled_works+0x2f/0x2f
  [191627.736052]  [] ? process_scheduled_works+0x2f/0x2f
  [191627.737349]  [] kthread+0xef/0xf7
  [191627.738267]  [] ? time_hardirqs_on+0x15/0x28
  [191627.739330]  [] ? __kthread_parkme+0xad/0xad
  [191627.741976]  [] ret_from_fork+0x42/0x70
  [191627.743080]  [] ? __kthread_parkme+0xad/0xad
  [191627.744206] ---[ end trace bbfddacb7aaada8d ]---

  $ cat -n fs/btrfs/file.c
  691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
  (...)
  758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
  759                  if (key.objectid > ino ||
  760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
  761                          break;
  762
  763                  fi = btrfs_item_ptr(leaf, path->slots[0],
  764                                      struct btrfs_file_extent_item);
  765                  extent_type = btrfs_file_extent_type(leaf, fi);
  766
  767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
  768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
  (...)
  774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
  (...)
  778                  } else {
  779                          WARN_ON(1);
  780                          extent_end = search_start;
  781                  }
  (...)

This happened because the item we were processing did not match a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
case we cast the item to a struct btrfs_file_extent_item pointer and
then find a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window where a race can happen as exemplified below.
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbour leafs:

               Leaf X (has N items)                    Leaf Y

[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
          slot N - 2         slot N - 1              slot 0

Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:

          CPU 1                                       CPU 2

  btrfs_finish_ordered_io()
    insert_reserved_file_extent()
      __btrfs_drop_extents()
         Searches for the key
          (257 EXTENT_DATA 4096) through
          btrfs_lookup_file_extent()

         Key not found and we get a path where
         path->nodes[0] == leaf X and
         path->slots[0] == N

         Because path->slots[0] is >=
         btrfs_header_nritems(leaf X), we call
         btrfs_next_leaf()

         btrfs_next_leaf() releases the path

                                                  inserts key
                                                  (257 INODE_REF 4096)
                                                  at the end of leaf X,
                                                  leaf X now has N + 1 keys,
                                                  and the new key is at
                                                  slot N

         btrfs_next_leaf() searches for
         key (257 INODE_REF 256), with
         path->keep_locks set to 1,
         because it was the last key it
         saw in leaf X

           finds it in leaf X again and
           notices it's no longer the last
           key of the leaf, so it returns 0
           with path->nodes[0] == leaf X and
           path->slots[0] == N (which is now
           < btrfs_header_nritems(leaf X)),
           pointing to the new key
           (257 INODE_REF 4096)

         __btrfs_drop_extents() casts the
         item at path->nodes[0], slot
         path->slots[0], to a struct
         btrfs_file_extent_item - it does
         not skip keys for the target
         inode with a type less than
         BTRFS_EXTENT_DATA_KEY
         (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)

         sees a bogus value for the type
         field triggering the WARN_ON in
         the trace shown above, and sets
         extent_end = search_start (4096)

         does the if-then-else logic to
         fixup 0 length extent items created
         by a past bug from hole punching:

           if (extent_end == key.offset &&
               extent_end >= search_start)
               goto delete_extent_item;

         that evaluates to true and it ends
         up deleting the key pointed to by
         path->slots[0], (257 INODE_REF 4096),
         from leaf X

The same could happen for example for a xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).

So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never casted to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.

Signed-off-by: Filipe Manana 
Signed-off-by: Jiri Slaby

ceph: fix kick_requests()

2015-11-14T17:57:13+00:00

commit 282c105225ec3229f344c5fced795b9e1e634440 upstream.

__do_request() may unregister the request. So we should update
iterator 'p' before calling __do_request()

Signed-off-by: "Yan, Zheng" 
Signed-off-by: Jiri Slaby

ceph: protect kick_requests() with mdsc->mutex

2015-11-14T17:57:08+00:00

commit 656e4382948d4b2c81bdaf707f1400f53eff2625 upstream.

Signed-off-by: "Yan, Zheng" 
Reviewed-by: Sage Weil 
Signed-off-by: Jiri Slaby

ceph: make sure request isn't in any waiting list when kicking request.

2015-11-14T17:56:56+00:00

commit 03974e8177b36d672eb59658f976f03cb77c1129 upstream.

we may corrupt waiting list if a request in the waiting list is kicked.

Signed-off-by: "Yan, Zheng" 
Reviewed-by: Sage Weil 
Signed-off-by: Jiri Slaby