linux.git/fs/jbd/commit.c, branch v2.6.28

ext3: add an option to control error handling on file data

2008-10-20T15:52:37+00:00

If the journal doesn't abort when it gets an IO error in file data blocks,
the file data corruption will spread silently.  Because most of
applications and commands do buffered writes without fsync(), they don't
notice the IO error.  It's scary for mission critical systems.  On the
other hand, if the journal aborts whenever it gets an IO error in file
data blocks, the system will easily become inoperable.  So this patch
introduces a filesystem option to determine whether it aborts the journal
or just call printk() when it gets an IO error in file data.

If you mount a ext3 fs with data_err=abort option, it aborts on file data
write error.  If you mount it with data_err=ignore, it doesn't abort, just
call printk().  data_err=ignore is the default.

Signed-off-by: Hidehiro Kawai 
Cc: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

jbd: don't dirty original metadata buffer on abort

2008-10-20T15:52:37+00:00

Currently, original metadata buffers are dirtied when they are unfiled
whether the journal has aborted or not.  Eventually these buffers will be
written-back to the filesystem by pdflush.  This means some metadata
buffers are written to the filesystem without journaling if the journal
aborts.  So if both journal abort and system crash happen at the same
time, the filesystem would become inconsistent state.  Additionally,
replaying journaled metadata can overwrite the latest metadata on the
filesystem partly.  Because, if the journal aborts, journaled metadata are
preserved and replayed during the next mount not to lose uncheckpointed
metadata.  This would also break the consistency of the filesystem.

This patch prevents original metadata buffers from being dirtied on abort
by clearing BH_JBDDirty flag from those buffers.  Thus, no metadata
buffers are written to the filesystem without journaling.

Signed-off-by: Hidehiro Kawai 
Acked-by: Jan Kara 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

jbd: abort when failed to log metadata buffers

2008-10-20T15:52:36+00:00

If we failed to write metadata buffers to the journal space and succeeded
to write the commit record, stale data can be written back to the
filesystem as metadata in the recovery phase.

To avoid this, when we failed to write out metadata buffers, abort the
journal before writing the commit record.

Signed-off-by: Hidehiro Kawai 
Acked-by: Jan Kara 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs: rename buffer trylock

2008-08-05T04:56:09+00:00

Like the page lock change, this also requires name change, so convert the
raw test_and_set bitop to a trylock.

Signed-off-by: Nick Piggin 
Signed-off-by: Linus Torvalds

mm: rename page trylock

2008-08-05T04:31:34+00:00

Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

This also facilitates lockdeping of page lock.

Signed-off-by: Nick Piggin 
Acked-by: KOSAKI Motohiro 
Acked-by: Peter Zijlstra 
Acked-by: Andrew Morton 
Acked-by: Benjamin Herrenschmidt 
Signed-off-by: Linus Torvalds

jbd: don't abort if flushing file data failed

2008-07-25T17:53:32+00:00

In ordered mode, the current jbd aborts the journal if a file data buffer
has an error.  But this behavior is unintended, and we found that it has
been adopted accidentally.

This patch undoes it and just calls printk() instead of aborting the
journal.  Additionally, set AS_EIO into the address_space object of the
failed buffer which is submitted by journal_do_submit_data() so that
fsync() can get -EIO.

Missing error checkings are also added to inform errors on file data
buffers to the user.  The following buffers are targeted.

  (a) the buffer which has already been written out by pdflush
  (b) the buffer which has been unlocked before scanned in the
      t_locked_list loop

[akpm@linux-foundation.org: improve grammar in a printk]
Signed-off-by: Hidehiro Kawai 
Acked-by: Jan Kara 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

jbd: positively dispose the unmapped data buffers in journal_commit_transaction()

2008-07-25T17:53:32+00:00

After ext3-ordered files are truncated, there is a possibility that the
pages which cannot be estimated still remain.  Remaining pages can be
released when the system has really few memory.  So, it is not memory
leakage.  But the resource management software etc.  may not work
correctly.

It is possible that journal_unmap_buffer() cannot release the buffers, and
the pages to which they belong because they are attached to a commiting
transaction and journal_unmap_buffer() cannot release them.  To release
such the buffers and the pages later, journal_unmap_buffer() leaves it to
journal_commit_transaction().  (journal_unmap_buffer() puts the mark
'BH_Freed' to the buffers so that journal_commit_transaction() can
identify whether they can be released or not.)

In the journalled mode and the writeback mode, jbd does with only metadata
buffers.  But in the ordered mode, jbd does with metadata buffers and also
data buffers.

Actually, journal_commit_transaction() releases only the metadata buffers
of which release is demanded by journal_unmap_buffer(), and also releases
the pages to which they belong if possible.

As a result, the data buffers of which release is demanded by
journal_unmap_buffer() remain after a transaction commits.  And also the
pages to which they belong remain.

Such the remained pages don't have mapping any longer.  Due to this fact,
there is a possibility that the pages which cannot be estimated remain.

The metadata buffers marked 'BH_Freed' and the pages to which
they belong can be released at 'JBD: commit phase 7'.

Therefore, by applying the same code into 'JBD: commit phase 2' (where the
data buffers are done with), journal_commit_transaction() can also release
the data buffers marked 'BH_Freed' and the pages to which they belong.

As a result, all the buffers marked 'BH_Freed' can be released, and also
all the pages to which these buffers belong can be released at
journal_commit_transaction().  So, the page which cannot be estimated is
lost.

<>
 >         spin_lock(&journal->j_list_lock);
 >         while (commit_transaction->t_forget) {
 >                 transaction_t *cp_transaction;
 >                 struct buffer_head *bh;
 >
 >                 jh = commit_transaction->t_forget;
 >...
 >                 if (buffer_freed(bh)) {
 >                 ^^^^^^^^^^^^^^^^^^^^^^^^
 >                         clear_buffer_freed(bh);
 >                        ^^^^^^^^^^^^^^^^^^^^^^^^
 >                         clear_buffer_jbddirty(bh);
 >                 }
 >
 >                 if (buffer_jbddirty(bh)) {
 >                         JBUFFER_TRACE(jh, "add to new checkpointing trans");
 >                         __journal_insert_checkpoint(jh, commit_transaction);
 >                         JBUFFER_TRACE(jh, "refile for checkpoint writeback");
 >                         __journal_refile_buffer(jh);
 >                         jbd_unlock_bh_state(bh);
 >                 } else {
 >                         J_ASSERT_BH(bh, !buffer_dirty(bh));
 > ...
 >                         JBUFFER_TRACE(jh, "refile or unfile freed buffer");
 >                         __journal_refile_buffer(jh);
 >                         if (!jh->b_transaction) {
 >                                 jbd_unlock_bh_state(bh);
 >                                  /* needs a brelse */
 >                                 journal_remove_journal_head(bh);
 >                                 release_buffer_page(bh);
 >                                 ^^^^^^^^^^^^^^^^^^^^^^^^
 >                         } else
 >                 }
****************************************************************
* Apply the code of "^^^^^^" lines into 'JBD: commit phase 2' *
****************************************************************

At journal_commit_transaction() code, there is one extra message in the
series of jbd debug messages.  ("JBD: commit phase 2") This patch fixes
it, too.

Signed-off-by: Toshiyuki Okajima 
Acked-by: Jan Kara 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

jbd: need to hold j_state_lock to updates to transaction t_state to T_COMMIT

2008-05-15T02:11:14+00:00

Updating the current transaction's t_state is protected by j_state_lock.  We
need to do the same when updating the t_state to T_COMMIT.

Signed-off-by: Mingming Cao 
Acked-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

jbd: fix possible journal overflow issues

2008-04-28T15:58:44+00:00

There are several cases where the running transaction can get buffers added to
its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers
counter end up larger than its t_outstanding_credits counter.

This will cause issues when starting new transactions as while we are logging
buffers we decrement t_outstanding_buffers, so when t_outstanding_buffers goes
negative, we will report that we need less space in the journal than we
actually need, so transactions will be started even though there may not be
enough room for them.  In the worst case scenario (which admittedly is almost
impossible to reproduce) this will result in the journal running out of space.

The fix is to only
refile buffers from the committing transaction to the running transactions
BJ_Modified list when b_modified is set on that journal, which is the only way
to be sure if the running transaction has modified that buffer.

This patch also fixes an accounting error in journal_forget, it is possible
that we can call journal_forget on a buffer without having modified it, only
gotten write access to it, so instead of freeing a credit, we only do so if
the buffer was modified.  The assert will help catch if this problem occurs.
Without these two patches I could hit this assert within minutes of running
postmark, with them this issue no longer arises.  Thank you,

Signed-off-by: Josef Bacik 
Cc: 
Acked-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

jbd: fix the way the b_modified flag is cleared

2008-04-28T15:58:44+00:00

Currently at the start of a journal commit we loop through all of the buffers
on the committing transaction and clear the b_modified flag (the flag that is
set when a transaction modifies the buffer) under the j_list_lock.

The problem is that everywhere else this flag is modified only under the jbd
lock buffer flag, so it will race with a running transaction who could
potentially set it, and have it unset by the committing transaction.

This is also a big waste, you can have several thousands of buffers that you
are clearing the modified flag on when you may not need to.  This patch
removes this code and instead clears the b_modified flag upon entering
do_get_write_access/journal_get_create_access, so if that transaction does
indeed use the buffer then it will be accounted for properly, and if it does
not then we know we didn't use it.

That will be important for the next patch in this series.  Tested thoroughly
by myself using postmark/iozone/bonnie++.

Signed-off-by: Josef Bacik 
Cc: 
Acked-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds