linux-stable.git/fs/jbd2, branch v3.2.41

jbd2: fix assertion failure in jbd2_journal_flush()

2013-01-16T01:13:11+00:00

commit d7961c7fa4d2e3c3f12be67e21ba8799b5a7238a upstream.

The following race is possible between start_this_handle() and someone
calling jbd2_journal_flush().

Process A                              Process B
start_this_handle()
  if (journal->j_barrier_count) # false
  if (!journal->j_running_transaction) { # true
    read_unlock(&journal->j_state_lock);
                                       jbd2_journal_lock_updates()
                                       jbd2_journal_flush()
                                         write_lock(&journal->j_state_lock);
                                         if (journal->j_running_transaction) {
                                           # false
                                         ... wait for committing trans ...
                                         write_unlock(&journal->j_state_lock);
    ...
    write_lock(&journal->j_state_lock);
    if (!journal->j_running_transaction) { # true
      jbd2_get_transaction(journal, new_transaction);
    write_unlock(&journal->j_state_lock);
    goto repeat; # eventually blocks on j_barrier_count > 0
                                         ...
                                         J_ASSERT(!journal->j_running_transaction);
                                           # fails

We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
in exclusive mode.
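
As an illustration only, the recheck can be modelled in user space roughly as follows. This is a minimal sketch, not the kernel patch: struct model_journal, the model_write_lock()/model_write_unlock() stubs and model_install_transaction() are hypothetical names, and the lock is a simple state variable rather than a real rwlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the journal and its rwlock. */
enum lock_state { UNLOCKED, READ_LOCKED, WRITE_LOCKED };

struct model_journal {
    enum lock_state state_lock;   /* models j_state_lock */
    int barrier_count;            /* models j_barrier_count */
    void *running_transaction;    /* models j_running_transaction */
};

static void model_write_lock(struct model_journal *j)
{
    assert(j->state_lock == UNLOCKED);
    j->state_lock = WRITE_LOCKED;
}

static void model_write_unlock(struct model_journal *j)
{
    assert(j->state_lock == WRITE_LOCKED);
    j->state_lock = UNLOCKED;
}

/*
 * Install new_transaction unless a barrier was raised, or another task
 * installed a transaction, while the lock was dropped.
 * Returns true if installed, false if the caller must retry
 * (the "goto repeat" in the diagram above).
 */
static bool model_install_transaction(struct model_journal *j,
                                      void *new_transaction)
{
    bool installed = false;

    model_write_lock(j);
    /* Recheck both conditions: either may have changed while unlocked. */
    if (j->barrier_count == 0 && j->running_transaction == NULL) {
        j->running_transaction = new_transaction;
        installed = true;
    }
    model_write_unlock(j);
    return installed;
}
```

With barrier_count raised by the flushing side, the call returns false and leaves running_transaction NULL, which is the state the J_ASSERT in jbd2_journal_flush() expects.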

Reported-by: yjwsignal@empal.com
Signed-off-by: Jan Kara 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Ben Hutchings

jbd2: use GFP_NOFS for blkdev_issue_flush

2012-05-11T12:13:59+00:00

commit 99aa78466777083255b876293e9e83dec7cd809a upstream.

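The flush request is issued in the transaction commit code path, so using
GFP_KERNEL to allocate memory for the flush request bio looks like it falls
into the classic deadlock issue: the allocation may recurse into filesystem
reclaim, which can in turn wait on the commit that issued it.  I saw btrfs
and dm get it right, but ext4, xfs and md are using GFP_KERNEL.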

Signed-off-by: Shaohua Li 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
Signed-off-by: Ben Hutchings

jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

2012-04-02T16:53:03+00:00

commit 15291164b22a357cb211b618adfef4fa82fc0de3 upstream.

journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.

This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one.  Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.

Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb
and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go
away, because buried within that large change is some more flag
clearing.  I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.
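
A rough user-space model of the idea, for illustration: the bit names and positions below are hypothetical (the kernel's enum bh_state_bits is different), and model_zap_buffer() only imitates the flag clearing, not the rest of zap_buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this sketch. */
enum model_bh_bits {
    MODEL_BH_DIRTY     = 1 << 0,
    MODEL_BH_MAPPED    = 1 << 1,
    MODEL_BH_NEW       = 1 << 2,
    MODEL_BH_REQ       = 1 << 3,
    MODEL_BH_DELAY     = 1 << 4,
    MODEL_BH_UNWRITTEN = 1 << 5,
    MODEL_BH_UPTODATE  = 1 << 6   /* assumed to be left alone */
};

struct model_buffer_head {
    uint32_t state;
};

/* Tear the buffer down completely, including _Delay and _Unwritten. */
static void model_zap_buffer(struct model_buffer_head *bh)
{
    const uint32_t torn_down = MODEL_BH_DIRTY | MODEL_BH_MAPPED |
                               MODEL_BH_NEW | MODEL_BH_REQ |
                               MODEL_BH_DELAY | MODEL_BH_UNWRITTEN;

    assert(bh != NULL);
    bh->state &= ~torn_down;
}
```

After the call, a caller that tests only the delay or unwritten bit can no longer mistake the zapped buffer for a live one.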

Signed-off-by: Eric Sandeen 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

jbd2: Unify log messages in jbd2 code

2011-11-01T23:09:18+00:00

Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the
same time other jbd2 code prints with "JBD: " prefix. Unify the prefix
to "JBD2: ".

Signed-off-by: Eryu Guan 
Signed-off-by: "Theodore Ts'o"

jbd/jbd2: validate sb->s_first in journal_get_superblock()

2011-11-01T23:04:59+00:00

I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
image has s_first = 0 in the journal superblock. The 0 is assigned to
journal->j_head in journal_reset() and then passed as blocknr to
cleanup_journal_tail(), where the J_ASSERT fails.

So validate s_first after reading journal superblock from disk in
journal_get_superblock() to ensure s_first is valid.

The following script could reproduce it:

fstype=ext3
blocksize=1024
img=$fstype.img
offset=0
found=0
magic="c0 3b 39 98"

dd if=/dev/zero of=$img bs=1M count=8
mkfs -t $fstype -b $blocksize -F $img
filesize=`stat -c %s $img`
while [ $offset -lt $filesize ]
do
        if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
                echo "Found journal: $offset"
                found=1
                break
        fi
        offset=`echo "$offset+$blocksize" | bc`
done

if [ $found -ne 1 ];then
        echo "Magic \"$magic\" not found"
        exit 1
fi

dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1

mkdir -p ./mnt
mount -o loop $img ./mnt
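
The kind of check involved can be sketched in user space as below. This is an assumption-laden model, not the patch itself: it assumes s_first is the big-endian 32-bit field at bytes 20-23 of the journal superblock (consistent with the script zeroing byte 23), and the upper bound against maxlen is an assumed extra sanity check. model_be32_at() and model_s_first_valid() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed byte offset of s_first within the journal superblock. */
#define MODEL_S_FIRST_OFFSET 20

/* Decode a big-endian 32-bit value at byte offset off. */
static uint32_t model_be32_at(const unsigned char *buf, size_t off)
{
    return ((uint32_t)buf[off] << 24) |
           ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] << 8) |
           (uint32_t)buf[off + 3];
}

/*
 * Return true if s_first looks usable: non-zero and (assumed bound)
 * smaller than the journal length in blocks.
 */
static bool model_s_first_valid(const unsigned char *sb, uint32_t maxlen)
{
    uint32_t first;

    assert(sb != NULL);
    first = model_be32_at(sb, MODEL_S_FIRST_OFFSET);
    return first != 0 && first < maxlen;
}
```

Rejecting the superblock at read time means the zero never reaches journal->j_head, so the mount fails with an error instead of tripping the assertion later.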

Cc: Jan Kara 
Signed-off-by: Eryu Guan 
Signed-off-by: "Theodore Ts'o"

jbd2: fix build when CONFIG_BUG is not enabled

2011-10-27T08:05:13+00:00

Fix build error when CONFIG_BUG is not enabled:

fs/jbd2/transaction.c:1175:3: error: implicit declaration of function '__WARN'

by changing __WARN() to WARN_ON(), as suggested by
Arnaud Lacombe.

Signed-off-by: Randy Dunlap 
Signed-off-by: "Theodore Ts'o" 
Cc: Arnd Bergmann 
Cc: Arnaud Lacombe

jbd2: use gfp_t instead of int

2011-09-04T14:20:14+00:00

This silences some Sparse warnings:
fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types)
fs/jbd2/transaction.c:135:69:    expected restricted gfp_t [usertype] flags
fs/jbd2/transaction.c:135:69:    got int [signed] gfp_mask

Signed-off-by: Dan Carpenter 
Signed-off-by: "Theodore Ts'o"

jbd2: add debugging information to jbd2_journal_dirty_metadata()

2011-09-04T14:18:14+00:00

Add debugging information in case jbd2_journal_dirty_metadata() is
called with a buffer_head which didn't have
jbd2_journal_get_write_access() called on it, or if the journal_head
has the wrong transaction in it.  In addition, return an error code.
This won't change anything for ocfs2, which will BUG_ON() the non-zero
exit code.

For ext4, the caller of this function is ext4_handle_dirty_metadata(),
which, on seeing a non-zero return code, will call __ext4_journal_stop().
That will print the function and line number of the (buggy) calling
function and abort the journal.  This will allow us to recover instead
of halting with a BUG, which is better from a robustness and reliability
point of view.
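
The control flow described above might look roughly like this in a user-space model. Everything here is illustrative: the struct and function names are hypothetical, the message text is invented for the sketch, and -EINVAL is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct model_transaction {
    int tid;
};

struct model_journal_head {
    struct model_transaction *transaction;      /* models b_transaction */
    struct model_transaction *next_transaction; /* models b_next_transaction */
};

/*
 * Report the problem and return an error instead of halting.
 * jh == NULL models a buffer that never went through get_write_access.
 */
static int model_dirty_metadata(const struct model_transaction *handle_trans,
                                const struct model_journal_head *jh)
{
    assert(handle_trans != NULL);

    if (jh == NULL) {
        fprintf(stderr, "model: buffer has no journal head\n");
        return -EINVAL;
    }
    if (jh->transaction != handle_trans &&
        jh->next_transaction != handle_trans) {
        fprintf(stderr, "model: journal head belongs to another transaction\n");
        return -EINVAL;
    }
    return 0;
}
```

A caller can then decide what to do with the non-zero result, for example abort the journal and log its own location, rather than having the whole machine stop.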

Signed-off-by: "Theodore Ts'o"

jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints

2011-07-11T02:05:08+00:00

Using function calls in TP_printk causes perf heartburn, so print the
MAJOR/MINOR device numbers instead.
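
For illustration, splitting a device number into its major and minor parts is plain arithmetic, which is why it is safe to do inside a print format. The sketch below is a user-space model: the MODEL_* macros mirror the kernel's 20-bit minor layout, and model_format_dev() and its "dev %u,%u" output format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* User-space copies of the kernel-internal dev_t layout (20 minor bits). */
#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1U << MODEL_MINORBITS) - 1)
#define MODEL_MAJOR(dev) ((unsigned int)((dev) >> MODEL_MINORBITS))
#define MODEL_MINOR(dev) ((unsigned int)((dev) & MODEL_MINORMASK))
#define MODEL_MKDEV(ma, mi) (((unsigned int)(ma) << MODEL_MINORBITS) | (mi))

/* Format a device number as "dev MAJOR,MINOR" without any name lookup. */
static int model_format_dev(char *buf, size_t len, unsigned int dev)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "dev %u,%u",
                    MODEL_MAJOR(dev), MODEL_MINOR(dev));
}
```

Because only shifts and masks are involved, a tool reading the raw trace record can reproduce the same output from the stored device number on its own.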

Signed-off-by: "Theodore Ts'o"

jbd2: use WRITE_SYNC in journal checkpoint

2011-06-27T16:36:29+00:00

In journal checkpoint, we write the buffer and wait for the write to finish.
But in cfq, the async queue has a very low priority. In our test, when
there are too many sync queues and every queue is filled up with
requests, the write request is delayed for quite a long time, and all
the tasks waiting for journal space end up with errors like:

INFO: task attr_set:3816 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
attr_set      D ffff880028393480     0  3816      1 0x00000000
 ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
 ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
 ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
Call Trace:
 [] ? __dequeue_entity+0x33/0x38
 [] ? need_resched+0x23/0x2d
 [] ? thread_return+0xa2/0xbc
 [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [] __mutex_lock_common+0x14e/0x1a9
 [] ? brelse+0x13/0x15 [ext4]
 [] __mutex_lock_slowpath+0x19/0x1b
 [] mutex_lock+0x1b/0x32
 [] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
 [] start_this_handle+0x438/0x527 [jbd2]
 [] ? autoremove_wake_function+0x0/0x3e
 [] jbd2_journal_start+0xa1/0xcc [jbd2]
 [] ext4_journal_start_sb+0x57/0x81 [ext4]
 [] ext4_xattr_set+0x6c/0xe3 [ext4]
 [] ext4_xattr_user_set+0x42/0x4b [ext4]
 [] generic_setxattr+0x6b/0x76
 [] __vfs_setxattr_noperm+0x47/0xc0
 [] vfs_setxattr+0x7f/0x9a
 [] setxattr+0xb5/0xe8
 [] ? do_filp_open+0x571/0xa6e
 [] sys_fsetxattr+0x6b/0x91
 [] system_call_fastpath+0x16/0x1b

So this patch uses WRITE_SYNC in __flush_batch so that the request is
moved into the sync queue and handled by cfq in a timely manner. We also use
the new plug, so that all the WRITE_SYNC requests can be submitted as a whole
when we unplug it.

Signed-off-by: Tao Ma 
Signed-off-by: "Theodore Ts'o" 
Cc: Jan Kara 
Reported-by: Robin Dong