linux-stable.git/include/linux/jbd2.h, branch v5.15.208

jbd2: make jbd2_journal_get_max_txn_bufs() internal

2024-08-19T03:45:14+00:00

commit 4aa99c71e42ad60178c1154ec24e3df9c684fb67 upstream.

There's no reason to have jbd2_journal_get_max_txn_bufs() public
function. Currently all users are internal and can use
journal->j_max_transaction_buffers instead. This saves some unnecessary
recomputations of the limit as a bonus which becomes important as this
function gets more complex in the following patch.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara 
Reviewed-by: Zhang Yi 
Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz
Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman

jbd2: rename jbd_debug() to jbd2_debug()

2023-09-23T09:10:00+00:00

[ Upstream commit cb3b3bf22cf33707d684e74207908ba0ef3b6467 ]

The name of jbd_debug() is confusing as all functions inside jbd2 have
jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes.

Signed-off-by: Jan Kara 
Reviewed-by: Lukas Czerner 
Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz
Signed-off-by: Theodore Ts'o 
Stable-dep-of: 2dfba3bb40ad ("jbd2: correct the end of the journal recovery scan range")
Signed-off-by: Sasha Levin

jbd2: kill t_handle_lock transaction spinlock

2023-09-23T09:10:00+00:00

[ Upstream commit f7f497cb702462e8505ff3d8d4e7722ad95626a1 ]

This patch kills t_handle_lock transaction spinlock completely from
jbd2.

To explain the reasoning, currently there were three sites at which
this spinlock was used.

1. jbd2_journal_wait_updates()
   a. Based on careful code review it can be seen that, we don't need this
      lock here. This is since we wait for any currently ongoing updates
      based on a atomic variable t_updates. And we anyway don't take any
      t_handle_lock while in stop_this_handle().
      i.e.

	write_lock(&journal->j_state_lock()
	jbd2_journal_wait_updates() 			stop_this_handle()
		while (atomic_read(txn->t_updates) { 		|
		DEFINE_WAIT(wait); 				|
		prepare_to_wait(); 				|
		if (atomic_read(txn->t_updates) 		if (atomic_dec_and_test(txn->t_updates))
			write_unlock(&journal->j_state_lock);
			schedule();					wake_up()
			write_lock(&journal->j_state_lock);
		finish_wait();
	   }
	txn->t_state = T_COMMIT
	write_unlock(&journal->j_state_lock);

   b.  Also note that between atomic_inc(&txn->t_updates) in
       start_this_handle() and jbd2_journal_wait_updates(), the
       synchronization happens via read_lock(journal->j_state_lock) in
       start_this_handle();

2. jbd2_journal_extend()
   a. jbd2_journal_extend() is called with the handle of each process from
      task_struct. So no lock required in updating member fields of handle_t

   b. For member fields of h_transaction, all updates happens only via
      atomic APIs (which is also within read_lock()).
      So, no need of this transaction spinlock.

3. update_t_max_wait()
   Based on Jan suggestion, this can be carefully removed using atomic
   cmpxchg API.
   Note that there can be several processes which are waiting for a new
   transaction to be allocated and started. For doing this only one
   process will succeed in taking write_lock() and allocating a new txn.
   After that all of the process will be updating the t_max_wait (max
   transaction wait time). This can be done via below method w/o taking
   any locks using atomic cmpxchg.
   For more details refer [1]

	   new = get_new_val();
	   old = READ_ONCE(ptr->max_val);
	   while (old < new)
		old = cmpxchg(&ptr->max_val, old, new);

[1]: https://lwn.net/Articles/849237/

Suggested-by: Jan Kara 
Signed-off-by: Ritesh Harjani 
Reviewed-by: Jan Kara 
Link: https://lore.kernel.org/r/d89e599658b4a1f3893a48c6feded200073037fc.1644992076.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o 
Stable-dep-of: 2dfba3bb40ad ("jbd2: correct the end of the journal recovery scan range")
Signed-off-by: Sasha Levin

jbd2: refactor wait logic for transaction updates into a common function

2023-09-23T09:09:59+00:00

[ Upstream commit 4f98186848707f530669238d90e0562d92a78aab ]

No functionality change as such in this patch. This only refactors the
common piece of code which waits for t_updates to finish into a common
function named as jbd2_journal_wait_updates(journal_t *)

Signed-off-by: Ritesh Harjani 
Reviewed-by: Jan Kara 
Link: https://lore.kernel.org/r/8c564f70f4b2591171677a2a74fccb22a7b6c3a4.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o 
Stable-dep-of: 2dfba3bb40ad ("jbd2: correct the end of the journal recovery scan range")
Signed-off-by: Sasha Levin

jbd2: fix a race when checking checkpoint buffer busy

2023-08-30T14:18:12+00:00

[ Upstream commit 46f881b5b1758dc4a35fba4a643c10717d0cf427 ]

Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.

jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara 
Signed-off-by: Zhang Yi 
Reviewed-by: Jan Kara 
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o 
Signed-off-by: Sasha Levin

jbd2: remove t_checkpoint_io_list

2023-08-30T14:18:12+00:00

[ Upstream commit be22255360f80d3af789daad00025171a65424a5 ]

Since t_checkpoint_io_list was stop using in jbd2_log_do_checkpoint()
now, it's time to remove the whole t_checkpoint_io_list logic.

Signed-off-by: Zhang Yi 
Reviewed-by: Jan Kara 
Link: https://lore.kernel.org/r/20230606135928.434610-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o 
Stable-dep-of: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
Signed-off-by: Sasha Levin

ext4: fast commit may not fallback for ineligible commit

2022-03-08T18:12:32+00:00

[ Upstream commit e85c81ba8859a4c839bcd69c5d83b32954133a5b ]

For the follow scenario:
1. jbd start commit transaction n
2. task A get new handle for transaction n+1
3. task A do some ineligible actions and mark FC_INELIGIBLE
4. jbd complete transaction n and clean FC_INELIGIBLE
5. task A call fsync

In this case fast commit will not fallback to full commit and
transaction n+1 also not handled by jbd.

Make ext4_fc_mark_ineligible() also record transaction tid for
latest ineligible case, when call ext4_fc_cleanup() check
current transaction tid, if small than latest ineligible tid
do not clear the EXT4_MF_FC_INELIGIBLE.

Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
Reported-by: Ritesh Harjani 
Suggested-by: Harshad Shirwadkar 
Signed-off-by: Xin Yin 
Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o 
Cc: stable@kernel.org
Signed-off-by: Sasha Levin

ext4: inline jbd2_journal_[un]register_shrinker()

2021-07-08T12:37:31+00:00

The function jbd2_journal_unregister_shrinker() was getting called
twice when the file system was getting unmounted.  On Power and ARM
platforms this was causing kernel crash when unmounting the file
system, when a percpu_counter was destroyed twice.

Fix this by removing jbd2_journal_[un]register_shrinker() functions,
and inlining the shrinker setup and teardown into
journal_init_common() and jbd2_journal_destroy().  This means that
ext4 and ocfs2 now no longer need to know about registering and
unregistering jbd2's shrinker.

Also, while we're at it, rename the percpu counter from
j_jh_shrink_count to j_checkpoint_jh_count, since this makes it
clearer what this counter is intended to track.

Link: https://lore.kernel.org/r/20210705145025.3363130-1-tytso@mit.edu
Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
Reported-by: Jon Hunter 
Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Tested-by: Jon Hunter 
Reviewed-by: Jan Kara 
Signed-off-by: Theodore Ts'o

jbd2,ext4: add a shrinker to release checkpointed buffers

2021-06-24T14:54:49+00:00

Current metadata buffer release logic in bdev_try_to_free_page() have
a lot of use-after-free issues when umount filesystem concurrently, and
it is difficult to fix directly because ext4 is the only user of
s_op->bdev_try_to_free_page callback and we may have to add more special
refcount or lock that is only used by ext4 into the common vfs layer,
which is unacceptable.

One better solution is remove the bdev_try_to_free_page callback, but
the real problem is we cannot easily release journal_head on the
checkpointed buffer, so try_to_free_buffers() cannot release buffers and
page under memory pressure, which is more likely to trigger
out-of-memory. So we cannot remove the callback directly before we find
another way to release journal_head.

This patch introduce a shrinker to free journal_head on the checkpointed
transaction. After the journal_head got freed, try_to_free_buffers()
could free buffer properly.

Signed-off-by: Zhang Yi 
Suggested-by: Jan Kara 
Reviewed-by: Jan Kara 
Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o

jbd2: ensure abort the journal if detect IO error when writing original buffer back

2021-06-24T14:33:49+00:00

Although we merged c044f3d8360 ("jbd2: abort journal if free a async
write error metadata buffer"), there is a race between
jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the
jbd2_log_do_checkpoint() may still fail to detect the buffer write
io error flag which may lead to filesystem inconsistency.

jbd2_journal_try_to_free_buffers()     ext4_put_super()
                                        jbd2_journal_destroy()
  __jbd2_journal_remove_checkpoint()
  detect buffer write error              jbd2_log_do_checkpoint()
                                         jbd2_cleanup_journal_tail()
                                           <--- lead to inconsistency
  jbd2_journal_abort()

Fix this issue by introducing a new atomic flag which only have one
JBD2_CHECKPOINT_IO_ERROR bit now, and set it in
__jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer
which has write_io_error flag. Then jbd2_journal_destroy() will detect
this mark and abort the journal to prevent updating log tail.

Signed-off-by: Zhang Yi 
Reviewed-by: Jan Kara 
Link: https://lore.kernel.org/r/20210610112440.3438139-3-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o