Skip to content

Regular for-next build test #1157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10,000 commits into
base: build
Choose a base branch
from
Open

Regular for-next build test #1157

wants to merge 10,000 commits into from

Conversation

kdave
Copy link
Member

@kdave kdave commented Feb 21, 2024

Keep this open, the build tests are hosted on github CI.

@kdave kdave changed the title Post rc5 build test Regular for-next build test Feb 22, 2024
@kdave kdave force-pushed the for-next branch 6 times, most recently from 2d4aefb to c9e380a Compare February 28, 2024 14:37
@kdave kdave force-pushed the for-next branch 6 times, most recently from c56343b to 1cab137 Compare March 5, 2024 17:23
@kdave kdave force-pushed the for-next branch 2 times, most recently from 6613f3c to b30a0ce Compare March 15, 2024 01:05
@kdave kdave force-pushed the for-next branch 6 times, most recently from d205ebd to c0bd9d9 Compare March 25, 2024 17:48
@kdave kdave force-pushed the for-next branch 4 times, most recently from 15022b1 to c22750c Compare March 28, 2024 02:04
@kdave kdave force-pushed the for-next branch 3 times, most recently from 28d9855 to e18d8ce Compare April 4, 2024 19:30
kdave and others added 8 commits June 18, 2025 13:56
The RCU protection is now done in the plain helpers, we can remove the
"_in_rcu" and "_rl_in_rcu".

Reviewed-by: Daniel Vacek <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Remove the critical level message helpers as they're not used, the RCU
protection is provided by the plain helpers.

Reviewed-by: Daniel Vacek <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The btrfs_debug() helpers depend on various config options. In case of
"no_printk" we don't need to define a special helper that in the end
does nothing but validates the parameters. As the default build config
is to validate the parameters we can simplify it to let the debug
helpers expand to nothing and remove btrfs_no_printk_in_rcu().

To avoid warnings use fs_info and inline one variable in
extent_from_logical().

Reviewed-by: Daniel Vacek <[email protected]>
Signed-off-by: David Sterba <[email protected]>
There's only one caller of btrfs_printk_ratelimited(), merge it there.

Reviewed-by: Daniel Vacek <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Unlike qgroup rescan, which always shows whether it cleared the
inconsistent flag, we do not have a proper way to show if qgroup is
marked inconsistent.

This was not a big deal before as there aren't that many locations that
can mark qgroup inconsistent.

But with the introduction of drop_subtree_threshold, qgroup can be
marked inconsistent very frequently, especially when dropping
subvolumes.

Although most user space tools relying on qgroup should do their own
checks and queue a rescan if needed, we have no idea when qgroup is
marked inconsistent, and will be much harder to debug.

So this patch will add an extra warning (btrfs_warn_rl()) when the
qgroup flag is flipped into inconsistent for the first time.
And add extra reason why qgroup flips inconsistent.

This means we can move the error message immediately before
qgroup_inconsistent_warning() into that function.

For call sites without an obvious reason, or is a shared error handling,
output the function that failed and the error code instead.

Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…ions

If we are rebuilding a free space tree, while modifying the free space
tree we may need to allocate a new metadata block group.
If we end up using multiple transactions for the rebuild, when we call
btrfs_end_transaction() we enter btrfs_create_pending_block_groups()
which calls add_block_group_free_space() to add items to the free space
tree for the block group.

Then later during the free space tree rebuild, at
btrfs_rebuild_free_space_tree(), we may find such new block groups
and call populate_free_space_tree() for them, which fails with -EEXIST
because there are already items in the free space tree. Then we abort the
transaction with -EEXIST at btrfs_rebuild_free_space_tree().
Notice that we say "may find" the new block groups because a new block
group may be inserted in the block groups rbtree, which is being iterated
by the rebuild process, before or after the current node where the rebuild
process is currently at.

Syzbot recently reported such case which produces a trace like the
following:

  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -17)
  WARNING: CPU: 1 PID: 7626 at fs/btrfs/free-space-tree.c:1341 btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
  Modules linked in:
  CPU: 1 UID: 0 PID: 7626 Comm: syz.2.25 Not tainted 6.15.0-rc7-syzkaller-00085-gd7fa1af5b33e-dirty #0 PREEMPT
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
  lr : btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
  sp : ffff80009c4f7740
  x29: ffff80009c4f77b0 x28: ffff0000d4c3f400 x27: 0000000000000000
  x26: dfff800000000000 x25: ffff70001389eee8 x24: 0000000000000003
  x23: 1fffe000182b6e7b x22: 0000000000000000 x21: ffff0000c15b73d8
  x20: 00000000ffffffef x19: ffff0000c15b7378 x18: 1fffe0003386f276
  x17: ffff80008f31e000 x16: ffff80008adbe98c x15: 0000000000000001
  x14: 1fffe0001b281550 x13: 0000000000000000 x12: 0000000000000000
  x11: ffff60001b281551 x10: 0000000000000003 x9 : 1c8922000a902c00
  x8 : 1c8922000a902c00 x7 : ffff800080485878 x6 : 0000000000000000
  x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008047843c
  x2 : 0000000000000001 x1 : ffff80008b3ebc40 x0 : 0000000000000001
  Call trace:
   btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341 (P)
   btrfs_start_pre_rw_mount+0xa78/0xe10 fs/btrfs/disk-io.c:3074
   btrfs_remount_rw fs/btrfs/super.c:1319 [inline]
   btrfs_reconfigure+0x828/0x2418 fs/btrfs/super.c:1543
   reconfigure_super+0x1d4/0x6f0 fs/super.c:1083
   do_remount fs/namespace.c:3365 [inline]
   path_mount+0xb34/0xde0 fs/namespace.c:4200
   do_mount fs/namespace.c:4221 [inline]
   __do_sys_mount fs/namespace.c:4432 [inline]
   __se_sys_mount fs/namespace.c:4409 [inline]
   __arm64_sys_mount+0x3e8/0x468 fs/namespace.c:4409
   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
   invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
   el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
   do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
   el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
   el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
   el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
  irq event stamp: 330
  hardirqs last  enabled at (329): [<ffff80008048590c>] raw_spin_rq_unlock_irq kernel/sched/sched.h:1525 [inline]
  hardirqs last  enabled at (329): [<ffff80008048590c>] finish_lock_switch+0xb0/0x1c0 kernel/sched/core.c:5130
  hardirqs last disabled at (330): [<ffff80008adb9e60>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:511
  softirqs last  enabled at (10): [<ffff8000801fbf10>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
  softirqs last disabled at (8): [<ffff8000801fbedc>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
  ---[ end trace 0000000000000000 ]---

Fix this by flagging new block groups which had their free space tree
entries already added and then skip them in the rebuild process. Also,
since the rebuild may be triggered when doing a remount, make sure that
when we clear an existing free space tree that we clear such flag from
every existing block group, otherwise we would skip those block groups
during the rebuild.

Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Fixes: 882af9f ("btrfs: handle free space tree rebuild in multiple transactions")
Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
… space tree

Only one of the callers of __add_block_group_free_space() aborts the
transaction if the call fails, while the others don't do it and it's
either never done up the call chain or much higher in the call chain.

So make sure we abort the transaction at __add_block_group_free_space()
if it fails, which brings a couple benefits:

1) If some call chain never aborts the transaction, we avoid having some
   metadata inconsistency because BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE is
   cleared when we enter __add_block_group_free_space() and therefore
   __add_block_group_free_space() is never called again to add the block
   group items to the free space tree, since the function is only called
   when that flag is set in a block group;

2) If the call chain already aborts the transaction, then we get a better
   trace that points to the exact step from __add_block_group_free_space()
   which failed, which is better for analysis.

So abort the transaction at __add_block_group_free_space() if any of its
steps fails.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…ree_space()

Every caller of __add_block_group_free_space() is checking if the flag
BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE is set before calling it. This is
duplicate code and it's prone to some mistake in case we add more callers
in the future. So move the check for that flag into the start of
__add_block_group_free_space(), and, as a consequence, the path allocation
from add_block_group_free_space() is moved into
__add_block_group_free_space(), to preserve the behaviour of allocating
a path only if the flag BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE is set.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
kdave and others added 21 commits June 18, 2025 14:00
…eof()

The way zero_end is calculated and used does a -1 and +1 that
effectively cancel out, so this can be simplified. This is also
preparatory patch for using a helper for folio_pos + folio_size with the
semantics of exclusive end of the range.

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>
In preparation to use a helper for folio_pos + folio_size, rename the
variables for the locked range so they don't use the 'folio_' prefix. As
the locking ranges take inclusive end of the range (hence the "-1") this
would be confusing as the folio helpers typically use exclusive end of
the range.

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>
There are several cases of folio_pos + folio_size, add a convenience
helper for that. This is a local helper and not proposed as folio API
because it does not seem to be heavily used elsewhere:

A quick grep (folio_size + folio_end) in fs/ shows

     24 btrfs
      4 iomap
      4 ext4
      2 xfs
      2 netfs
      1 gfs2
      1 f2fs
      1 bcachefs
      1 buffer.c

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Simplify folio_pos() + folio_size() and use the new helper.

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The dirty_log_pages tree is used for tree logging and marks extents
based on log_transid. The bits could be renamed to resemble the
LOG1/LOG2 naming used for the BTRFS_FS_LOG1_ERR bits.

The DIRTY bit is renamed to LOG1 and NEW to LOG2.

Signed-off-by: David Sterba <[email protected]>
We can just return directly if btrfs_insert_empty_item() fails, there is
no need to release the path before returning, as all callers (or upper
in the call chain) will free the path if they get an error from the call
to add_new_free_space_info(), which is also a common pattern everywhere
in btrfs. Finally there's no need to set 'ret' to 0 if the call to
btrfs_insert_empty_item() didn't fail, since 'ret' is already 0.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Just return directly, we don't need the label since all we do under it is
to return.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
All the callers want is to determine if a bit is set and all of them call
the function and do a double negation (!!) on its result to get a boolean.
So change it to return a boolean and simplify callers.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
The function returns the result of another function that returns a boolean
(extent_buffer_test_bit()), and all the callers need is a boolean an not
an integer. So change its return type from int to bool, and modify the
callers to store results in booleans instead of integers, which also makes
them simpler.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
All we do under the label is to return, so there's no point in having it,
just return directly whenever we get an error.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
All we do under the label is to return, so there's no point in having it,
just return directly whenever we get an error.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
All we do under the label is to return, so there's no point in having it,
just return directly whenever we get an error.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
All we do under the label is to return, so there's no point in having it,
just return directly whenever we get an error.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
All we do under the label is to return, so there's no point in having it,
just return directly whenever we get an error.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
A few of the free space tree exported functions have a 'btrfs_' prefix in
their name, but most don't. Not only is this inconsistent, the preferred
style is to have such a prefix, to avoid potential collisions in the
future with other kernel code and offer a somewhat better readibility by
making it obvious in calls sites that we are calling btrfs specific code.

So add the 'btrfs_' prefix to all free space tree functions that are
missing it.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
The free_space_set_bits() is used both to set a range of bits or to clear
range of bits, depending on the 'bit' argument value. So the name is very
misleading since it's not used only to set bits. Furthermore the 'bit'
argument is an integer when a boolean is all that is needed plus its name
is singular, which gives the idea the function operates on a single bit
and not on a range of bits.

So rename the function to free_space_modify_bits(), rename the 'bit'
argument to 'set_bits' and turn the argument to bool instead of int.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
The argument is used as a boolean, so switch its type from int to bool.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
…ents()

There's no need to subtract 1 from path->slots[0] and then decrement the
slot, we can reduce the generated assembly code by decrementing the slot
and then use it.

Module size before:

  $ size fs/btrfs/btrfs.ko
     text	   data	    bss	    dec	    hex	filename
  1846220	 162998	  16136	2025354	 1ee78a	fs/btrfs/btrfs.ko

Module size after:

  $ size fs/btrfs/btrfs.ko
     text	   data	    bss	    dec	    hex	filename
  1846204	 162998	  16136	2025338	 1ee77a	fs/btrfs/btrfs.ko

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
…_extents()

There's no need to dereference the block group to extract fs_info as we
have already stored fs_info in a local variable. So use the local variable
instead.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
… tree

When adding and removing free space to the free space tree, we need to
lookup the respective block group's free info item in the free space tree,
check its flags for the BTRFS_FREE_SPACE_USING_BITMAPS bit and then
release the path.

Move these steps into a helper function and use it in both sites.
This will also help avoiding duplicate code in a subsequent change.

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Every time we add free space to the free space tree or we remove free
space from the free space tree, we do a lookup for the block group's free
space info item in the free space tree. This takes time, navigating the
btree and we may block either on IO when reading extent buffers from disk
or on extent buffer lock contention due to concurrency.

Instead of doing this lookup everytime, cache the result in the block
structure and use it after the first lookup. This adds two boolean members
to the block group structure but doesn't increase the structure's size.

The following script that runs fs_mark was used to measure the time spent
on run_delayed_tree_ref(), since down that call chain we have calls to
add and remove free space to/from the free space tree (calls to
btrfs_add_to_free_space_tree() and btrfs_remove_from_free_space_tree()):

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nullb0
  MNT=/mnt
  FILES=100000
  THREADS=$(nproc --all)

  echo "performance" | \
      tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  umount $DEV &> /dev/null
  mkfs.btrfs -f $DEV
  mount -o ssd $DEV $MNT

  OPTS="-S 0 -L 5 -n $FILES -s 0 -t $THREADS -k"
  for ((i = 1; i <= $THREADS; i++)); do
      OPTS="$OPTS -d $MNT/d$i"
  done

  fs_mark $OPTS

  umount $MNT

This is a heavy metadata test as it's exercising only file creation, so a
lot of allocations of metadata extents, creating delayed refs for adding
new metadata extents and dropping existing ones due to COW. The results
of the times it took to execute run_delayed_tree_ref(), in nanoseconds,
are the following.

Before this change:

  Range: 1868.000 - 6482857.000; Mean: 10231.430; Median: 7005.000; Stddev: 27993.173
  Percentiles:  90th: 13342.000; 95th: 23279.000; 99th: 82448.000
  1868.000 - 4222.038: 270696 ############
  4222.038 - 9541.029: 1201327 #####################################################
  9541.029 - 21559.383: 385436 #################
  21559.383 - 48715.063: 64942 ###
  48715.063 - 110073.800: 31454 #
  110073.800 - 248714.944:  8218 |
  248714.944 - 561977.042:  1030 |
  561977.042 - 1269798.254:   295 |
  1269798.254 - 2869132.711:   116 |
  2869132.711 - 6482857.000:    28 |

After this change:

  Range: 1554.000 - 4557014.000; Mean: 9168.164; Median: 6391.000; Stddev: 21467.060
  Percentiles:  90th: 12478.000; 95th: 20964.000; 99th: 72234.000
  1554.000 - 3453.820: 219004 ############
  3453.820 - 7674.743: 980645 #####################################################
  7674.743 - 17052.574: 552486 ##############################
  17052.574 - 37887.762: 68558 ####
  37887.762 - 84178.322: 31557 ##
  84178.322 - 187024.331: 12102 #
  187024.331 - 415522.355:  1364 |
  415522.355 - 923187.626:   256 |
  923187.626 - 2051092.468:   125 |
  2051092.468 - 4557014.000:    21 |

Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet