Tags

  • fix-asciici-bugs-6.4_2023-04-12

    xfs: fix ascii-ci problems, then kill it [v2]
    
    Last week, I was fiddling around with the metadump name obfuscation code
    while writing a debugger command to generate directories full of names
    that all have the same hash value.  I had a few questions about how well
    all that worked with ascii-ci mode, and discovered a nasty discrepancy
    between the kernel and glibc's implementations of the tolower()
    function.
    
    I found that I could create a directory that is large enough to
    require separate leaf index blocks.  The hashes stored in the dabtree
    use the ascii-ci specific hash function, which uses a library function
    to convert the name to lowercase before hashing.  If the kernel's and
    the C library's versions of tolower do not behave identically,
    xfs_ascii_ci_hashname will not produce the same results for the same
    inputs.  xfs_repair will then deem the leaf information corrupt and
    rebuild the directory, after which lookups in the kernel will fail
    because the hash index doesn't work.
    
    The kernel's tolower function will convert extended ascii uppercase
    letters (e.g. A-with-umlaut) to extended ascii lowercase letters (e.g.
    a-with-umlaut), whereas glibc's will only do that if you force LANG to
    ascii.  Tiny embedded libc implementations just plain won't do it at
    all, and the result is a mess.  Stabilize the behavior of the hash
    function by encoding the name transformation function in libxfs, add it
    to the selftest, and fix all the userspace tools, none of which handle
    this transformation correctly.
    
    The v1 series generated a /lot/ of discussion, in which several things
    became very clear: (1) Linus is not enamored of case folding of any
    kind; (2) Dave and Christoph don't seem to agree on whether the feature
    is supposed to work for 7-bit ascii or latin1; (3) it trashes UTF8
    encoded names if those happen to show up; and (4) I don't want to
    maintain this mess any longer than I have to.  Kill it in 2030.
    
    v2: rename the functions to make it clear we're moving away from the
    letters t, o, l, o, w, e, and r; and deprecate the whole feature once
    we've fixed the bugs and added tests.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • repair-bitmap-rework-6.4_2023-04-12

    xfs: rework online fsck incore bitmap [v24.5]
    
    In this series, we make some changes to the incore bitmap code: First,
    we shorten the prefix to 'xbitmap'.  Then, we rework some utility
    functions for later use by online repair and clarify how the walk
    functions are supposed to be used.
    
    Finally, we use all these new pieces to convert the incore bitmap to use
    an interval tree instead of linked lists.  This lifts the limitation
    that callers had to be careful not to set a range that was already set,
    and it prepares for the btree rebuilder functions, which need to be able
    to set bits in a bitmap and generate maximal contiguous extents for the
    set ranges.
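    
    A rough sketch of the semantics described above: setting a range that
    overlaps or abuts existing set ranges merges them, so the stored
    extents are always maximal.  For brevity this uses a fixed-size sorted
    array instead of an interval tree, and the xb_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XB_MAX 64	/* hypothetical capacity */

struct xb_extent { uint64_t start, len; };

/* Sorted, disjoint, non-adjacent extents: each one is maximal. */
struct xbitmap_sketch {
	struct xb_extent ext[XB_MAX];
	unsigned int nr;
};

/* Set [start, start + len); overlapping or touching extents are merged. */
static int xb_set(struct xbitmap_sketch *b, uint64_t start, uint64_t len)
{
	uint64_t end = start + len;
	unsigned int i = 0, j;

	assert(len > 0);
	/* Skip extents that end strictly before the new range begins. */
	while (i < b->nr && b->ext[i].start + b->ext[i].len < start)
		i++;
	/* Absorb every extent that overlaps or abuts [start, end). */
	for (j = i; j < b->nr && b->ext[j].start <= end; j++) {
		if (b->ext[j].start < start)
			start = b->ext[j].start;
		if (b->ext[j].start + b->ext[j].len > end)
			end = b->ext[j].start + b->ext[j].len;
	}
	if (i == j && b->nr == XB_MAX)
		return -1;	/* no room for a new extent */
	/* Replace ext[i..j) with the single merged extent. */
	memmove(&b->ext[i + 1], &b->ext[j], (b->nr - j) * sizeof(b->ext[0]));
	b->ext[i].start = start;
	b->ext[i].len = end - start;
	b->nr = b->nr - (j - i) + 1;
	return 0;
}
```

    Walking the set ranges is then just iterating over ext[0..nr).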
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-strengthen-rmap-checking-6.4_2023-04-12

    xfs: strengthen rmapbt scrubbing [v24.5]
    
    This series strengthens space allocation record cross referencing by
    using AG block bitmaps to compute the difference between space used
    according to the rmap records and the primary metadata, and reports
    cross-referencing errors for any discrepancies.
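    
    One way to picture the bitmap comparison, as a hypothetical sketch:
    mark the blocks that the rmap records assign to an owner in one
    bitmap, mark the blocks found by walking that owner's metadata in
    another, and count the bits that differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG block bitmap: one bit per AG block. */
static void bm_set_range(uint8_t *bm, uint32_t start, uint32_t len)
{
	for (uint32_t b = start; b < start + len; b++)
		bm[b / 8] |= (uint8_t)(1u << (b % 8));
}

/*
 * Count blocks where the bitmaps disagree: blocks the rmap assigns to an
 * owner that its metadata does not use, or vice versa.  A nonzero result
 * would be reported as a cross-referencing error.
 */
static size_t bm_count_mismatches(const uint8_t *from_rmap,
				  const uint8_t *from_metadata, size_t nbytes)
{
	size_t mismatches = 0;

	assert(from_rmap != NULL && from_metadata != NULL);
	for (size_t i = 0; i < nbytes; i++) {
		uint8_t diff = from_rmap[i] ^ from_metadata[i];

		while (diff) {
			mismatches += diff & 1;
			diff >>= 1;
		}
	}
	return mismatches;
}
```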
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-fix-xattr-memory-mgmt-6.4_2023-04-12

    xfs: clean up memory management in xattr scrub [v24.5]
    
    Currently, the extended attribute scrubber uses a single VLA to store
    all the context information needed in various parts of the scrubber
    code.  This includes xattr leaf block space usage bitmaps, and the value
    buffer used to check the correctness of remote xattr value block
    headers.  We try to minimize the insanity through the use of helper
    functions, but this is a memory management nightmare.  Clean this up by
    making the bitmap and value pointers explicit members of struct
    xchk_xattr_buf.
    
    Second, strengthen the xattr checking by teaching it to look for overlapping
    data structures in the shortform attr data.
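    
    A hypothetical sketch of an overlap check for shortform entries: mark
    every byte each entry claims in a usage map, and fail if any byte is
    claimed twice or lies outside the area.  The region type and the
    256-byte cap are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte range inside the shortform attr area. */
struct sf_region { size_t offset, length; };

static bool sf_regions_ok(const struct sf_region *r, size_t nr,
			  size_t area_size)
{
	unsigned char used[256];

	assert(area_size <= sizeof(used));
	memset(used, 0, sizeof(used));
	for (size_t i = 0; i < nr; i++) {
		if (r[i].offset > area_size ||
		    r[i].length > area_size - r[i].offset)
			return false;		/* runs past the area */
		for (size_t b = r[i].offset; b < r[i].offset + r[i].length; b++) {
			if (used[b])
				return false;	/* overlaps an earlier region */
			used[b] = 1;
		}
	}
	return true;
}
```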
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-detect-mergeable-records-6.4_2023-04-12

    xfs: detect mergeable and overlapping btree records [v24.5]
    
    While I was doing differential fuzz analysis between xfs_scrub and
    xfs_repair, I noticed that xfs_repair was only partially effective at
    detecting btree records that can be merged, and xfs_scrub didn't
    notice them at all.
    
    For every interval btree type except for the bmbt, there should never
    exist two adjacent records with adjacent keyspaces because the
    blockcount field is always large enough to span the entire keyspace of
    the domain.  This is because the free space, rmap, and refcount btrees
    have a blockcount field large enough to store the maximum AG length, and
    there can never be an allocation larger than an AG.
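    
    Under the rule above, a mergeability test for two neighboring records
    might look like this sketch.  The record type is hypothetical and
    collapses the per-btree attributes that must match (owner, refcount,
    and so on) into a single owner field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified space record. */
struct space_rec {
	uint32_t startblock;
	uint32_t blockcount;
	uint64_t owner;
};

/*
 * Two records could have been one if they are physically adjacent,
 * share the same attributes, and their combined length still fits in
 * the blockcount field (max_len).
 */
static bool recs_mergeable(const struct space_rec *left,
			   const struct space_rec *right, uint64_t max_len)
{
	assert(left->blockcount > 0 && right->blockcount > 0);
	if ((uint64_t)left->startblock + left->blockcount != right->startblock)
		return false;
	if (left->owner != right->owner)
		return false;
	return (uint64_t)left->blockcount + right->blockcount <= max_len;
}
```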
    
    The bmbt is a different story due to its ondisk encoding, in which the
    blockcount is only 21 bits wide, so a single record can map at most
    2^21 - 1 blocks.  Because AGs can span up to 2^31 blocks and the RT
    volume can span up to 2^52 blocks, a preallocation of 2^22 blocks must
    be expressed as several records (three, at maximum record length).  We
    don't opportunistically combine records when doing bmbt operations,
    which is why the fsck tools have never complained about this scenario.
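    
    The record count follows from the field width: if each bmbt record can
    describe at most 2^21 - 1 blocks, a 2^22-block preallocation needs
    three records.  A small helper (hypothetical name) makes the
    arithmetic explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Records needed to map len blocks with a bits-wide blockcount field. */
static uint64_t records_needed(uint64_t len, unsigned int bits)
{
	uint64_t max_per_rec;

	assert(bits > 0 && bits < 64);
	max_per_rec = (UINT64_C(1) << bits) - 1;	/* largest encodable length */
	return (len + max_per_rec - 1) / max_per_rec;	/* round up */
}
```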
    
    Offline repair is partially effective at detecting mergeable records
    because I taught it to do that for the rmap and refcount btrees.  This
    series enhances the free space, rmap, and refcount scrubbers to detect
    mergeable records.  For the bmbt, it will flag the file as being
    eligible for an optimization to shrink the size of the data structure.
    
    The last patch in this set also enhances the rmap scrubber to detect
    records that overlap incorrectly.  This check is done automatically for
    non-overlapping btree types, but we have to do it separately for the
    rmapbt because there are constraints on which allocation types are
    allowed to overlap.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-merge-bmap-records-6.4_2023-04-12

    xfs: merge bmap records for faster scrubs [v24.5]
    
    I started looking into performance problems with the data fork scrubber
    in generic/333, and noticed a few things that needed improving.  First,
    for design reasons, it's possible for file fork btrees to contain
    multiple contiguous mappings to the same physical space.  Instead of
    checking each ondisk mapping individually, it's much faster to combine
    them when possible and check the combined mapping, because that means
    fewer trips through the rmap btree.  It also lets us drop the
    check-around behavior the scrubber needs when an rmapbt lookup produces
    a record that starts before or ends after a particular bmbt mapping.
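    
    A sketch of the combining step, assuming mappings sorted by file offset
    and a hypothetical simplified mapping type: neighbors are folded
    together when they are contiguous both in the file and on disk and
    have the same written/unwritten state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified file mapping. */
struct fmap {
	uint64_t startoff;	/* file offset, in blocks */
	uint64_t startblock;	/* physical block */
	uint64_t blockcount;
	bool unwritten;
};

/* Coalesce mappings in place; returns the new number of mappings. */
static size_t merge_mappings(struct fmap *m, size_t nr)
{
	size_t out = 0;

	assert(m != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (out > 0) {
			struct fmap *prev = &m[out - 1];

			if (prev->startoff + prev->blockcount == m[i].startoff &&
			    prev->startblock + prev->blockcount == m[i].startblock &&
			    prev->unwritten == m[i].unwritten) {
				prev->blockcount += m[i].blockcount;
				continue;
			}
		}
		m[out++] = m[i];
	}
	return out;
}
```

    Each merged mapping then needs only one pass through the rmap btree.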
    
    Second, I noticed that the bmbt scrubber decides to walk every reverse
    mapping in the filesystem if the file fork is in btree format.  This is
    very costly, and only necessary if the inode repair code had to zap a
    fork to convince iget to work.  Constraining the full-rmap scan to this
    one case means we can skip it for normal files, which cuts the runtime
    of this test from 8 hours down to 45 minutes (observed with realtime
    reflink and rebuild-all mode).
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-iget-fixes-6.4_2023-04-12

    xfs: fix iget/irele usage in online fsck [v24.5]
    
    This patchset fixes a handful of problems relating to how we get and
    release incore inodes in the online scrub code.  The first patch fixes
    how we handle DONTCACHE -- our reasons for setting (or clearing) it
    depend entirely on the runtime environment at irele time.  Hence we can
    refactor iget and irele to use our own wrappers that set that context
    appropriately.
    
    The second patch fixes a race between the iget call in the inode core
    scrubber and other writer threads that are allocating or freeing inodes
    in the same AG by changing the behavior of xchk_iget (and the inode core
    scrub setup function) to return either an incore inode or the AGI buffer
    so that we can be sure that the inode cannot disappear on us.
    
    The final patch elides MMAPLOCK from scrub paths when possible.  It did
    not fit anywhere else.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-parent-fixes-6.4_2023-04-12

    xfs: fix bugs in parent pointer checking [v24.5]
    
    Jan Kara pointed out that the VFS doesn't take i_rwsem of a child
    subdirectory that is being moved from one parent to another.  Upon
    deeper analysis, I realized that this was the source of a very
    hard-to-trigger false corruption report in the parent pointer checking
    code.
    
    Now that we've refactored how directory walks work in scrub, we can also
    get rid of all the unnecessary and broken locking to make parent pointer
    scrubbing work properly.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-dir-iget-fixes-6.4_2023-04-12

    xfs: fix iget usage in directory scrub [v24.5]
    
    In this series, we fix some problems with how the directory scrubber
    grabs child inodes.  First, we want to reduce EDEADLOCK returns by
    replacing fixed-iteration loops with interruptible trylock loops.
    Second, we add UNTRUSTED to the child iget call so that we can detect a
    dirent that points to an unallocated inode.  Third, we fix a bug where
    we weren't checking the inode pointed to by dotdot entries at all.
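    
    The interruptible trylock loop can be sketched in userspace C11.  The
    atomic_flag lock and the stop flag (standing in for a pending fatal
    signal) are assumptions; the point is that the loop keeps retrying
    until it either gets the lock or is told to stop, instead of giving
    up after a fixed number of tries.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in trylock: test_and_set returns the old state, false = acquired. */
static bool sketch_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void sketch_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/* Returns 0 with the lock held, or -EINTR if asked to stop first. */
static int lock_interruptible(atomic_flag *lock, atomic_bool *stop)
{
	assert(lock != NULL && stop != NULL);
	for (;;) {
		if (sketch_trylock(lock))
			return 0;
		if (atomic_load(stop))
			return -EINTR;
		/* A real implementation would back off (sleep or yield) here. */
	}
}
```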
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-detect-rmapbt-gaps-6.4_2023-04-12

    xfs: detect incorrect gaps in rmap btree [v24.5]
    
    Following the theme of the last two patchsets, this one strengthens
    the rmap btree record checking so that scrub can count the number of
    space records that map to a given owner and that do not map to a given
    owner.  This enables us to determine exclusive ownership of space that
    can't be shared.
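    
    A hypothetical sketch of the owner count: for a block range, tally the
    overlapping rmap records that belong to a given owner and those that
    belong to anyone else.  If the second count is zero and the first is
    not, the owner has that space to itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified reverse-mapping record. */
struct rmap_rec { uint32_t start, len; uint64_t owner; };

struct owner_counts { size_t owned; size_t other; };

static struct owner_counts count_owners(const struct rmap_rec *recs,
					size_t nr, uint32_t start,
					uint32_t len, uint64_t owner)
{
	struct owner_counts c = { 0, 0 };
	uint64_t end = (uint64_t)start + len;

	assert(len > 0);
	for (size_t i = 0; i < nr; i++) {
		uint64_t rend = (uint64_t)recs[i].start + recs[i].len;

		if (rend <= start || recs[i].start >= end)
			continue;	/* no overlap with the range */
		if (recs[i].owner == owner)
			c.owned++;
		else
			c.other++;
	}
	return c;
}
```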
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-detect-inobt-gaps-6.4_2023-04-12

    xfs: detect incorrect gaps in inode btree [v24.5]
    
    This series continues the corrections for a couple of problems I found
    in the inode btree scrubber.  The first problem is that we don't
    check that the inobt records correspond directly to the finobt
    records, and vice versa.  The second problem occurs on
    filesystems with sparse inode chunks -- the cross-referencing we do
    detects sparseness, but it doesn't actually check the consistency
    between the inobt hole records and the rmap data.
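    
    The inobt/finobt correspondence can be illustrated with a hypothetical
    simplified record (a starting inode number and a 64-bit free mask):
    every inobt chunk with free inodes must appear in the finobt with the
    same mask, fully allocated chunks must not, and the finobt must
    contain nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inode chunk record: bit set in freemask = inode free. */
struct ino_rec {
	uint32_t startino;
	uint64_t freemask;
};

static bool chunk_in(const struct ino_rec *recs, size_t nr,
		     uint32_t startino, uint64_t *freemask)
{
	for (size_t i = 0; i < nr; i++) {
		if (recs[i].startino == startino) {
			*freemask = recs[i].freemask;
			return true;
		}
	}
	return false;
}

static bool inobt_finobt_consistent(const struct ino_rec *inobt, size_t ni,
				    const struct ino_rec *finobt, size_t nf)
{
	uint64_t mask;

	assert(ni == 0 || inobt != NULL);
	for (size_t i = 0; i < ni; i++) {
		bool found = chunk_in(finobt, nf, inobt[i].startino, &mask);

		if (inobt[i].freemask != 0 && (!found || mask != inobt[i].freemask))
			return false;	/* free inodes missing from finobt */
		if (inobt[i].freemask == 0 && found)
			return false;	/* full chunk must not be in finobt */
	}
	for (size_t i = 0; i < nf; i++) {
		if (!chunk_in(inobt, ni, finobt[i].startino, &mask) ||
		    mask != finobt[i].freemask)
			return false;	/* finobt record with no inobt match */
	}
	return true;
}
```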
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-detect-refcount-gaps-6.4_2023-04-12

    xfs: detect incorrect gaps in refcount btree [v24.5]
    
    The next few patchsets address a deficiency in scrub that I found while
    QAing the refcount btree scrubber.  If there's a gap between refcount
    records, we need to cross-reference that gap with the reverse mappings
    to ensure that there are no overlapping records in the rmap btree.  If
    we find any, then the refcount btree is not consistent.  This is not a
    property that is specific to the refcount btree; they all need to have
    this sort of keyspace scanning logic to detect inconsistencies.
    
    To do this accurately, we need to be able to scan the keyspace of a
    btree (which we already do) and tell the caller whether the keyspace
    is empty, sparse, or fully covered by records.  The first few
    patches add the keyspace scanner to the generic btree code, along with
    the ability to mask off parts of btree keys because when we scan the
    rmapbt, we only care about space usage, not the owners.
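    
    A minimal sketch of such a keyspace scan, assuming records sorted by
    start key and already reduced to the key ranges they cover (which is
    the effect of masking off the owner).  Overlapping records are fine
    because only the furthest covered key is tracked.  The enum and
    function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum keyfill { KEYFILL_EMPTY, KEYFILL_SPARSE, KEYFILL_FULL };

/* Hypothetical record reduced to the key range [start, start + len). */
struct keyrec {
	uint64_t start;
	uint64_t len;
};

static enum keyfill scan_keyfill(const struct keyrec *r, size_t nr,
				 uint64_t qstart, uint64_t qend)
{
	uint64_t next = qstart;	/* first key not yet known to be covered */
	bool any = false, gap = false;

	assert(qstart < qend);
	for (size_t i = 0; i < nr && next < qend; i++) {
		uint64_t rend = r[i].start + r[i].len;

		if (rend <= qstart)
			continue;	/* entirely before the query */
		if (r[i].start >= qend)
			break;		/* sorted, so nothing later matters */
		any = true;
		if (r[i].start > next)
			gap = true;	/* uncovered keys before this record */
		if (rend > next)
			next = rend;
	}
	if (!any)
		return KEYFILL_EMPTY;
	if (gap || next < qend)
		return KEYFILL_SPARSE;
	return KEYFILL_FULL;
}
```

    A SPARSE result over a refcount gap is what would prompt the scrubber
    to look for overlapping rmap records there.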
    
    The final patch closes the scanning gap in the refcountbt scanner.
    
    v23.1: create helpers for the key extraction and comparison functions,
           improve documentation, and eliminate the ->mask_key indirect
           calls
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-btree-key-enhancements-6.4_2023-04-12

    xfs: enhance btree key scrubbing [v24.5]
    
    This series fixes the scrub btree block checker to ensure that the keys
    in the parent block accurately represent the child block, and to check
    the ordering of all interior key records.
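    
    For a simple non-overlapping btree, the two checks could look like
    this sketch, with keys reduced to plain integers (an assumption; real
    keys are structures compared by per-btree functions).  The function
    names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Keys within a block must be strictly increasing. */
static bool block_keys_ok(const uint64_t *keys, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (keys[i - 1] >= keys[i])
			return false;
	return true;
}

/* The parent's key for a child must equal the child's lowest key. */
static bool parent_key_ok(uint64_t parent_key, const uint64_t *child_keys,
			  size_t nr)
{
	assert(nr > 0);
	return parent_key == child_keys[0] && block_keys_ok(child_keys, nr);
}
```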
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • rmap-btree-fix-key-handling-6.4_2023-04-12

    xfs: fix rmap btree key flag handling [v24.5]
    
    This series fixes numerous flag handling bugs in the rmapbt key code.
    The most serious transgression is that key comparisons completely strip
    out all flag bits from rm_offset, including the ones that participate in
    record lookups.  The second problem is that for years we've been letting
    the unwritten flag (which is an attribute of a specific record and not
    part of the record key) escape from leaf records into key records.
    
    The solution to the second problem is to filter attribute flags when
    creating keys from records, and the solution to the first problem is to
    preserve *only* the flags used for key lookups.  The ATTR and BMBT flags
    are a part of the lookup key, and the UNWRITTEN flag is a record
    attribute.
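    
    The key construction rule can be sketched with a hypothetical flag
    layout in the high bits of the offset field (the bit positions below
    are assumptions for illustration): building a key strips only the
    UNWRITTEN attribute, and comparisons then keep the ATTR and BMBT bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits in the high end of an rmap offset. */
#define RMAP_OFF_ATTR_FORK	(UINT64_C(1) << 63)
#define RMAP_OFF_BMBT_BLOCK	(UINT64_C(1) << 62)
#define RMAP_OFF_UNWRITTEN	(UINT64_C(1) << 61)

/* Key offset from a record offset: keep lookup flags, drop attributes. */
static uint64_t rmap_key_offset(uint64_t rec_offset)
{
	return rec_offset & ~RMAP_OFF_UNWRITTEN;
}

/* Compare key offsets without stripping the lookup flags. */
static int rmap_cmp_offset(uint64_t a, uint64_t b)
{
	assert(!(a & RMAP_OFF_UNWRITTEN) && !(b & RMAP_OFF_UNWRITTEN));
	return (a > b) - (a < b);
}
```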
    
    This has worked for years without generating user complaints because
    ATTR and BMBT extents cannot be shared, so key comparisons succeed
    solely on rm_startblock.  Only file data fork extents can be shared, and
    those records never set any of the three flag bits, so comparisons that
    dig into rm_owner and rm_offset work just fine.
    
    A filesystem written with an unpatched kernel and mounted on a patched
    kernel will work correctly because the ATTR/BMBT flags have been
    conveyed into keys correctly all along, and we still ignore the
    UNWRITTEN flag in any key record.  This was what doomed my previous
    attempt to correct this problem in 2019.
    
    A filesystem written with a patched kernel and mounted on an unpatched
    kernel will also work correctly because unpatched kernels ignore all
    flags.
    
    With this patchset applied, the scrub code gains the ability to detect
    rmap btrees with incorrectly set attr and bmbt flags in the key records.
    After three years of testing, I haven't encountered any problems.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • btree-complain-bad-records-6.4_2023-04-12

    xfs: standardize btree record checking code [v24.5]
    
    While I was cleaning things up for 6.1, I noticed that the btree
    _query_range and _query_all functions don't perform the same checking
    that the _get_rec functions perform.  In fact, they don't perform /any/
    sanity checking, which means that callers aren't warned about impossible
    records.
    
    Therefore, hoist the record validation and complaint logging code into
    separate functions, and call them from any place where we convert an
    ondisk record into an incore record.  For online scrub, we can replace
    its own checking code with calls to the record checking functions in
    libxfs, thereby reducing the size of the codebase.
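    
    A hypothetical sketch of the hoisted pattern, using a simplified
    refcount record: one validation function returns a reason string on
    failure, one helper logs the complaint, and every decoding path calls
    both instead of carrying its own checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical simplified refcount record. */
struct refc_rec { uint32_t startblock, blockcount, refcount; };

/* Returns NULL if the record is sane, else why it is not. */
static const char *refc_check_rec(const struct refc_rec *r, uint32_t agblocks)
{
	if (r->blockcount == 0)
		return "zero-length extent";
	if (r->startblock >= agblocks ||
	    r->blockcount > agblocks - r->startblock)
		return "extent runs past end of AG";
	if (r->refcount == 0)
		return "refcount of zero";
	return NULL;
}

/* Shared complaint logging, so every caller reports the same way. */
static int refc_complain(const struct refc_rec *r, const char *why)
{
	fprintf(stderr, "corrupt refcount rec [%u, %u, %u]: %s\n",
		(unsigned)r->startblock, (unsigned)r->blockcount,
		(unsigned)r->refcount, why);
	return -1;
}

/* Any get_rec or query_range style path would funnel through here. */
static int refc_decode(const struct refc_rec *ondisk, uint32_t agblocks,
		       struct refc_rec *incore)
{
	const char *why;

	assert(ondisk != NULL && incore != NULL);
	why = refc_check_rec(ondisk, agblocks);
	if (why)
		return refc_complain(ondisk, why);
	*incore = *ondisk;
	return 0;
}
```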
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • btree-hoist-scrub-checks-6.4_2023-04-12

    xfs: hoist scrub record checks into libxfs [v24.5]
    
    There are a few things about btree records that scrub checked but the
    libxfs _get_rec functions didn't.  Move these bits into libxfs so that
    everyone can benefit.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-drain-intents-6.4_2023-04-12

    xfs: drain deferred work items when scrubbing [v24.5]
    
    The design doc for XFS online fsck contains a long discussion of the
    eventual consistency models in use for XFS metadata.  In that chapter,
    we note that it is possible for scrub to collide with a chain of
    deferred space metadata updates, and propose a lightweight solution:
    The use of a pending-intents counter so that scrub can wait for the
    system to drain all chains.
    
    This patchset implements that scrub drain.  The first patch implements
    the basic mechanism, and the subsequent patches reduce the runtime
    overhead by converting the implementation to use sloppy counters and
    introducing jump labels to avoid walking into scrub hooks when it isn't
    running.  This last paradigm repeats elsewhere in this megaseries.
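    
    The pending-intents counter at the heart of the drain can be sketched
    with C11 atomics.  This is a simplification with hypothetical names: a
    real implementation would let scrub sleep until the count reaches zero
    rather than just test it, and the series later moves to sloppy
    counters to cut overhead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-AG count of intent chains still in flight. */
struct intent_drain {
	atomic_int pending;
};

static void drain_init(struct intent_drain *d)
{
	atomic_init(&d->pending, 0);
}

/* Called when an intent item is created. */
static void drain_grab(struct intent_drain *d)
{
	atomic_fetch_add(&d->pending, 1);
}

/* Called when the chain finishes; releasing more than grabbed is a bug. */
static void drain_release(struct intent_drain *d)
{
	int old = atomic_fetch_sub(&d->pending, 1);

	assert(old > 0);
	(void)old;
}

/* Scrub proceeds only when this returns false. */
static bool drain_busy(struct intent_drain *d)
{
	return atomic_load(&d->pending) > 0;
}
```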
    
    v23.1: make intent items take an active ref to the perag structure and
           document why we bump and drop the intent counts when we do
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • scrub-fix-legalese-6.4_2023-04-12

    xfs_scrub: fix licensing and copyright notices [v24.5]
    
    Fix various attribution problems in the xfs_scrub source code, such as
    the author's contact information, out-of-date SPDX tags, and a rough
    estimate of when the feature was under heavy development.  The most
    egregious parts are the files that are missing license information
    completely.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • intents-perag-refs-6.4_2023-04-12

    xfs: make intent items take a perag reference [v24.5]
    
    Now that we've cleaned up some code warts in the deferred work item
    processing code, let's make intent items take an active perag reference
    from their creation until they are finally freed by the defer ops
    machinery.  This change facilitates the scrub drain in the next patchset
    and will make it easier for the future AG removal code to detect a busy
    AG in need of quiescing.
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    
  • pass-perag-refs-6.4_2023-04-12

    xfs: pass perag references around when possible [v24.5]
    
    Avoid the cost of perag radix tree lookups by passing around active perag
    references when possible.
    
    v24.2: rework some of the naming and whatnot so there's less opencoding
    
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>