path: root/fs/ntfs
Age  Commit message  Author
2026-05-11ntfs: restore $MFT mirror contents checkDaeMyung Kang
check_mft_mirror() still computes the number of bytes to validate in each mirrored MFT record, but the actual comparison against $MFTMirr was dropped when the superblock code was updated. As a result, mount misses a stale or inconsistent $MFTMirr as long as both records pass the structural baad-record checks. Restore the comparison and log an error when the primary $MFT record differs from its mirror copy. Returning false lets the existing mount error handling mark the volume as having NTFS errors and, with on_errors=remount-ro, continue read-only. The default on_errors=continue mount policy still allows the mount to proceed. Fixes: 6251f0b0de7d ("ntfs: update super block operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
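A minimal userspace sketch of the restored comparison, assuming the number of bytes to validate has already been computed for the record. The function name and parameters are hypothetical and do not reflect the kernel signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model: mft_rec and mirr_rec point at the same record
 * in $MFT and $MFTMirr; bytes is the length that must be identical.
 */
static bool mft_record_matches_mirror(const uint8_t *mft_rec,
				      const uint8_t *mirr_rec, size_t bytes)
{
	return memcmp(mft_rec, mirr_rec, bytes) == 0;
}
```

A caller in this model would treat a false return as a mirror mismatch, for example `assert(mft_record_matches_mirror(a, b, n));` in a test harness.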
2026-05-10ntfs: fix empty_buf and ra lifetime bugs in ntfs_empty_logfile()DaeMyung Kang
ntfs_empty_logfile() has three related allocator bugs around the @empty_buf and @ra buffers it uses inside the per-cluster loop. When the loop encounters a runlist entry with LCN_RL_NOT_MAPPED, the function kvfrees @empty_buf and goes to map_vcn to remap. @empty_buf is not cleared. If ntfs_map_runlist_nolock() fails on re-entry, control jumps to the err label which kvfrees @empty_buf a second time. In the same branch, @ra is left allocated. When the remap succeeds the function falls through the @empty_buf re-allocation and the @ra re-allocation, overwriting the previous @ra pointer and leaking it. The success path frees @empty_buf with kfree() instead of kvfree(). kvzalloc() may fall back to vmalloc(), in which case kfree() does not correctly release the memory. A KASAN-enabled QEMU harness mirroring this control flow reports "BUG: KASAN: double-free" when the second ntfs_map_runlist_nolock() fails. Clear both @empty_buf and @ra after the in-loop releases so the err path is a no-op when the buffers have already been freed and so the remap-success path does not leak the previous @ra. Switch the success path to kvfree() to match the @empty_buf allocator. Fixes: 5218cd102aec ("ntfs: update misc operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
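The pointer-clearing rule described above can be sketched in userspace as follows. This is a simplified model, not the kernel function: calloc()/free() stand in for the kernel allocators, and release_buf(), empty_logfile_model() and remap_fails are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Free a buffer and clear the caller's pointer so a later free is a no-op. */
static void release_buf(void **bufp)
{
	free(*bufp);	/* free(NULL) is defined to do nothing */
	*bufp = NULL;
}

/*
 * remap_fails models the second ntfs_map_runlist_nolock() call failing
 * after the in-loop release. Returns 0 on success, -1 on the error path.
 */
static int empty_logfile_model(int remap_fails)
{
	void *empty_buf = calloc(1, 4096);
	void *ra = calloc(1, 64);
	int err = -1;

	if (!empty_buf || !ra)
		goto out;

	/* In-loop release before remapping: both pointers are cleared. */
	release_buf(&empty_buf);
	release_buf(&ra);

	if (remap_fails)
		goto out;

	/* Remap succeeded: re-allocate; no live pointer is overwritten. */
	empty_buf = calloc(1, 4096);
	ra = calloc(1, 64);
	if (!empty_buf || !ra)
		goto out;
	err = 0;
out:
	/* Safe on every path because released pointers are NULL. */
	release_buf(&empty_buf);
	release_buf(&ra);
	return err;
}
```

Running the model under AddressSanitizer with `remap_fails` set should report neither a double free nor a leak, which is the property the fix aims for.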
2026-05-10ntfs: validate attribute name bounds before returning itDaeMyung Kang
ntfs_attr_find() validates a named attribute before comparing it with the requested name, but that check is currently after the AT_UNUSED handling. When callers enumerate attributes with AT_UNUSED, ntfs_attr_find() can return a malformed named attribute before checking whether name_offset and name_length stay within the attribute record. Some enumeration callers use the returned attribute name pointer directly. For example, one path passes (attr + name_offset, name_length) to ntfs_attr_iget(), where the name can later be copied according to name_length. A malformed on-disk name_offset/name_length pair should not be exposed to those callers. Move the existing name bounds validation before returning attributes during AT_UNUSED enumeration, and write it as an offset/remaining-size check so the subtraction cannot underflow. Extract the converted values into local variables (name_offset, attr_len, name_size) to make the intent explicit and avoid repeating the endian conversions inside the bounds check. This keeps matching attributes on the same checked path while also covering attribute enumeration. A small userspace ASAN model with attr length=32, name_offset=124 and name_length=8 reproduces a heap-buffer-overflow read in the old enumeration path. With this change the same malformed attribute is rejected before the name pointer is returned to the caller. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
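The offset/remaining-size form of the check can be sketched as a small userspace model. The function name is hypothetical, the parameters stand in for the already endian-converted on-disk fields, and name_length is assumed to count 16-bit characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the bounds check on a named attribute. */
static bool attr_name_in_bounds(uint32_t attr_len, uint16_t name_offset,
				uint8_t name_length)
{
	size_t name_size = (size_t)name_length * sizeof(uint16_t);

	/* Check the offset first so the subtraction below cannot underflow. */
	if (name_offset > attr_len)
		return false;
	/* Then compare the name size against the space that remains. */
	return name_size <= attr_len - name_offset;
}
```

With the values from the ASAN model, `assert(!attr_name_in_bounds(32, 124, 8));` holds: the offset alone already lies outside the 32-byte attribute.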
2026-05-10ntfs: fix MFT bitmap scan 2^32 boundary checkDaeMyung Kang
NTFS MFT record numbers are limited to the 32-bit range, and ntfs_mft_record_layout() rejects mft_no >= 2^32. The free-MFT-record bitmap scan in ntfs_mft_bitmap_find_and_alloc_free_rec_nolock() also guards against this overflow but uses a strict greater than comparison, allowing record number 2^32 itself through this earlier check. Every other 2^32 boundary check in fs/ntfs/mft.c uses '>=', so the strict greater than here is both a real off-by-one and an internal inconsistency. A model with ll == 2^32 confirms the current check accepts the value while the corrected check rejects it. Use '>=' so the boundary matches the layout-time rejection and the surrounding bitmap-scan checks. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
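The off-by-one can be shown with a two-function model. Both function names are hypothetical; ll models the candidate record number from the bitmap scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old comparison: 2^32 itself slips through. */
static bool rejected_by_old_check(int64_t ll)
{
	return ll > (1LL << 32);
}

/* Corrected comparison: matches the layout-time rejection of >= 2^32. */
static bool rejected_by_fixed_check(int64_t ll)
{
	return ll >= (1LL << 32);
}
```

For `ll == 1LL << 32` the old check returns false and the fixed check returns true, so `assert(rejected_by_fixed_check(1LL << 32));` passes while the same assertion on the old check would fail.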
2026-05-10ntfs: validate MFT attrs_offset against bytes_in_useDaeMyung Kang
ntfs_mft_record_check() verifies that attrs_offset is aligned and that the resulting pointer stays within the allocated MFT record buffer, but it does not check that the first attribute header starts within the bytes_in_use area. A malformed record with attrs_offset greater than bytes_in_use can pass this check as long as attrs_offset is still within bytes_allocated. The attribute parser then computes the remaining record space by subtracting the attribute pointer from bytes_in_use. Because that value is unsigned, the subtraction can underflow and allow bytes after bytes_in_use to be interpreted as an attribute. Reject records where attrs_offset is outside bytes_in_use or where the used area does not even contain the four-byte attribute type/AT_END terminator at attrs_offset. A small userspace model with attrs_offset=128 and bytes_in_use=64 shows the current check accepts the record and the parser space calculation underflows to 0xffffffc0. With this change the same malformed record is rejected before the attribute walker is entered. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
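The underflow and the added rejection can be reproduced with a small userspace model. The function names are hypothetical, and the exact form of the kernel check may differ from this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Remaining space as the parser computes it: unsigned, so it can wrap. */
static uint32_t parser_space(uint32_t bytes_in_use, uint32_t attrs_offset)
{
	return bytes_in_use - attrs_offset;
}

/*
 * Assumed form of the added check: the used area must contain at least
 * the four-byte attribute type (or AT_END terminator) at attrs_offset.
 */
static bool attrs_offset_ok(uint32_t bytes_in_use, uint32_t attrs_offset)
{
	if (attrs_offset > bytes_in_use)
		return false;
	return bytes_in_use - attrs_offset >= sizeof(uint32_t);
}
```

With attrs_offset=128 and bytes_in_use=64, `parser_space(64, 128)` wraps to 0xffffffc0 and `attrs_offset_ok(64, 128)` returns false.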
2026-05-09ntfs: fix missing kstrdup() error check in ntfs_write_volume_label()Zhan Xusheng
ntfs_write_volume_label() does not check the return value of kstrdup(). If the allocation fails, vol->volume_label is set to NULL while the function returns success. A subsequent FS_IOC_GETFSLABEL then returns an empty string even though the on-disk label was updated correctly. Fix by allocating the new label before taking vol_ni->mrec_lock and updating any on-disk metadata, so an -ENOMEM from kstrdup() leaves both the in-memory and on-disk labels untouched and consistent. On success the preallocated copy replaces the old vol->volume_label. Also move mark_inode_dirty_sync() into the success path so that it is not called when no metadata was actually modified. Fixes: 6251f0b0de7d ("ntfs: update super block operations") Suggested-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
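The allocate-before-modify ordering can be sketched in userspace. Everything here is a hypothetical model: struct vol_model, dup_label() and the simulate_enomem switch are illustration only, and a fixed array stands in for the on-disk label.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct vol_model {
	char *volume_label;	/* in-memory copy */
	char on_disk[64];	/* stand-in for the on-disk label */
};

/* malloc-based duplicate; simulate_enomem forces an allocation failure. */
static char *dup_label(const char *s, int simulate_enomem)
{
	size_t n = strlen(s) + 1;
	char *p = simulate_enomem ? NULL : malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

static int write_label_model(struct vol_model *vol, const char *label,
			     int simulate_enomem)
{
	char *new_label;

	if (strlen(label) >= sizeof(vol->on_disk))
		return -EINVAL;
	/* Allocate first: a failure here leaves both copies untouched. */
	new_label = dup_label(label, simulate_enomem);
	if (!new_label)
		return -ENOMEM;

	strcpy(vol->on_disk, label);	/* "on-disk" update */
	free(vol->volume_label);
	vol->volume_label = new_label;	/* preallocated copy replaces the old */
	return 0;
}
```

The design choice is that the only step that can fail runs before any state changes, so no rollback path is needed.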
2026-05-08ntfs: avoid leaking uninitialised bytes in new security descriptorsDaeMyung Kang
ntfs_sd_add_everyone() builds the on-disk security descriptor for a newly created file by kmalloc()'ing a buffer and then partially filling it in: sd = kmalloc(sd_len, GFP_NOFS); ... sd->revision = 1; sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE; ... The buffer is then handed to ntfs_attr_add() and persisted as the SECURITY_DESCRIPTOR attribute of the new MFT record. The descriptor covers a relative security descriptor header, two SIDs (owner and group), an ACL header, and a single ACE, but several fields inside those structures are never written before the buffer is committed to disk: - struct security_descriptor_relative @alignment (1 byte) @sacl (4 bytes; SE_SACL_PRESENT is not set but the offset still reaches disk) - struct ntfs_sid (3 instances: owner, group, ACE.sid) identifier_authority.value[0..4] (5 bytes per SID, 15 total - only value[5] is set) - struct ntfs_acl @alignment1 (1 byte) @alignment2 (2 bytes) That is 23 bytes of uninitialised slab memory persisted to disk for every new file or directory the legacy ntfs driver creates. The "+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0], which the existing code does explicitly write to zero, so it is not part of the leak. Anything later able to read the SECURITY_DESCRIPTOR attribute - the same NTFS volume mounted on Windows or by another NTFS reader, an offline forensics tool, an unprivileged user that ends up with read access to the volume - can recover those bytes. The leak persists for the lifetime of the file on disk, not just the lifetime of the kernel that wrote it. Switch the allocation to kzalloc() so every byte the on-disk descriptor covers is zero before the explicit initialisations run. While there, replace the bare "return -1" allocation-failure path with a proper -ENOMEM so the error reaches userspace as a meaningful errno instead of an unrelated -EPERM. Found by inspection while auditing fs/ntfs new-inode paths. 
Fixes: af0db57d4293 ("ntfs: update inode operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-08ntfs: fix out-of-bounds write in ntfs_index_walk_down()DaeMyung Kang
ntfs_index_walk_down() used to update the index traversal depth directly before writing parent_pos[] and parent_vcn[]. A malformed directory index with too many child-node levels can therefore advance pindex past MAX_PARENT_VCN and write past the fixed arrays in struct ntfs_index_context, corrupting context state used by later index traversal. Use ntfs_icx_parent_inc() for walk-down transitions so the existing depth limit is enforced before the arrays are updated. Make the helper check the limit before incrementing pindex so failed callers do not leave the context at an out-of-range depth. This is reachable by iterating a crafted NTFS directory after the volume has been mounted, including read-only mounts. The reproducer uses getdents64() on an index root that points to an excessively deep chain of child index blocks. A crafted directory index with a chain of child-node entries reproduced UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and subsequent KASAN reports in ntfs_index_walk_up(). With this change, the same image is rejected with "Index is over 32 level deep" and no KASAN or UBSAN report is emitted. Fixes: 0a8ac0c1fa0b ("ntfs: update directory operations") Suggested-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
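A userspace sketch of the check-before-increment rule. The struct and function names are simplified stand-ins, MAX_PARENT_VCN is assumed to be 32 based on the "32 level deep" message, and the error value is arbitrary.

```c
#include <assert.h>
#include <errno.h>

#define MAX_PARENT_VCN 32	/* assumed from the "32 level deep" message */

struct icx_model {
	int pindex;
	int parent_pos[MAX_PARENT_VCN];
	long long parent_vcn[MAX_PARENT_VCN];
};

/* Check the limit first so a failed call leaves pindex unchanged. */
static int icx_parent_inc(struct icx_model *icx)
{
	if (icx->pindex + 1 >= MAX_PARENT_VCN)
		return -EINVAL;	/* arbitrary error value for the model */
	icx->pindex++;
	return 0;
}

/* Descend one level; the arrays are only written after the limit check. */
static int walk_down(struct icx_model *icx, long long child_vcn)
{
	int err = icx_parent_inc(icx);

	if (err)
		return err;
	icx->parent_pos[icx->pindex] = 0;
	icx->parent_vcn[icx->pindex] = child_vcn;
	return 0;
}
```

However many levels a crafted index chains together, pindex in this model never leaves the range of the fixed arrays.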
2026-05-08ntfs: fix out-of-bounds write in ntfs_rl_collapse_range() merge pathDaeMyung Kang
ntfs_rl_collapse_range() merges the run on the left of the collapsed region with the run on its right when they are contiguous. The contiguous check chooses a clamped index when @new_1st_cnt is 0: i = new_1st_cnt == 0 ? 1 : new_1st_cnt; if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) { but the merge itself uses the unclamped value: s_rl = &new_rl[new_1st_cnt - 1]; s_rl->length += s_rl[1].length; When @new_1st_cnt is 0 this computes &new_rl[-1] and writes 8 bytes before the kvcalloc() runlist buffer. The path is reachable through fallocate(FALLOC_FL_COLLAPSE_RANGE) starting at vcn 0 against an attribute whose first run after the collapsed region and the following run are holes. In that case ntfs_rle_lcn_contiguous() returns true because both checked entries are LCN_HOLE, so the merge path is entered with @new_1st_cnt still 0. Such consecutive holes do not occur on a well-formed runlist (NTFS keeps runlists coalesced in memory), so this OOB path is only reachable from a crafted volume. A normal runlist has no element to the left of vcn 0, so the left/right merge is not valid when @new_1st_cnt is 0. Require @new_1st_cnt to be positive before checking or performing the merge. This skips the merge entirely in that case instead of clamping the merge target. The out-of-bounds write can corrupt an adjacent slab object. On a non-KASAN kernel, it is reachable after a crafted NTFS volume has been mounted read-write with the legacy fs/ntfs driver, by a local user that has write access to the crafted file. Fixes: 11ccc9107dc4 ("ntfs: update runlist handling and cluster allocator") Suggested-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
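The guarded merge can be sketched as a userspace model. The contiguity test is a simplified stand-in for ntfs_rle_lcn_contiguous(), the struct is reduced to three fields, and the model only extends the left run without removing the merged element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LCN_HOLE (-1)	/* model value for a sparse run */

struct rl_element {
	int64_t vcn, lcn, length;
};

/* Simplified contiguity test: two holes, or back-to-back clusters. */
static bool lcn_contiguous(const struct rl_element *a,
			   const struct rl_element *b)
{
	if (a->lcn == LCN_HOLE && b->lcn == LCN_HOLE)
		return true;
	return a->lcn >= 0 && b->lcn >= 0 && a->lcn + a->length == b->lcn;
}

/* Returns true if the run at new_1st_cnt - 1 absorbed its right neighbour. */
static bool merge_after_collapse(struct rl_element *new_rl, int new_1st_cnt)
{
	struct rl_element *s_rl;

	/* No run exists to the left of index 0, so there is nothing to merge. */
	if (new_1st_cnt <= 0)
		return false;
	if (!lcn_contiguous(&new_rl[new_1st_cnt - 1], &new_rl[new_1st_cnt]))
		return false;
	s_rl = &new_rl[new_1st_cnt - 1];
	s_rl->length += s_rl[1].length;
	return true;
}
```

With two leading hole runs and `new_1st_cnt == 0`, the model returns false without touching `new_rl[-1]`.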
2026-05-08ntfs: fix variable dereferenced before check ni in ntfs_attr_open()Namjae Jeon
Smatch warnings: ntfs_attr_open() warn: variable dereferenced before check 'ni' Move the ntfs_debug() call after the NULL pointer checks to ensure safe access to the structure members. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-08ntfs: fix default_upcase refcount underflow and UAF on fs_context teardownDaeMyung Kang
ntfs_init_fs_context() allocates a fresh ntfs_volume with vol->upcase left as NULL. ntfs_free_fs_context() unconditionally calls ntfs_volume_free() during fs_context teardown, even when ntfs_fill_super() never ran or already cleaned up. ntfs_volume_free() then executes: mutex_lock(&ntfs_lock); if (vol->upcase == default_upcase) { ntfs_nr_upcase_users--; vol->upcase = NULL; } When the global default_upcase is also NULL (very first mount attempt, or all prior mounts have released the table), the comparison is NULL == NULL, and ntfs_nr_upcase_users is decremented even though this volume never claimed a reference. ntfs_nr_upcase_users is unsigned long, so the decrement wraps to ULONG_MAX. A subsequent successful mount can then free the shared table while the mounted volume still points at it: 1. ntfs_fill_super() does the temporary ntfs_nr_upcase_users++ at the "Generate the global default upcase table if necessary" block. With the prior wraparound this brings the counter back to 0. 2. If the volume's $UpCase matches the default, the match path does ntfs_nr_upcase_users++ and sets vol->upcase = default_upcase. The counter is now 1. 3. On the success path, !--ntfs_nr_upcase_users evaluates true and default_upcase is kvfree()'d while vol->upcase still points at it. Subsequent upcase comparisons through that mount touch freed memory. This was reproduced with KASAN by closing a fresh fsopen("ntfs") context, then mounting an NTFS image whose $UpCase table matches generate_default_upcase(), and finally doing a case-insensitive lookup. KASAN reports the dangling vol->upcase access: BUG: KASAN: use-after-free in ntfs_collate_names+0x3b4/0x420 Read of size 2 at addr ffff888008d40048 by task init/1 ntfs_collate_names+0x3b4/0x420 ntfs_lookup_inode_by_name+0x1921/0x3130 ntfs_lookup+0x193/0xc40 vfs_statx+0xc7/0x190 vfs_fstatat+0x4b/0xa0 __do_sys_newfstatat+0x92/0xf0 The same QEMU reproducer was rerun after this change with KASAN enabled. 
It reached "reproducer finished", and the log contained no KASAN, use-after-free, Oops, or panic signatures. Guard each comparison with an explicit vol->upcase non-NULL check so a volume that never took a reference cannot decrement the global users counter. Apply the same guard to the other default_upcase release sites so all cleanup paths follow the same ownership rule: only volumes that actually hold a default_upcase reference may drop one. Fixes: 1e9ea7e04472 ("Revert "fs: Remove NTFS classic"") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
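The ownership rule can be modelled in userspace with a global pointer and counter. The names mirror the commit message but the structures are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

static unsigned short *default_upcase;	/* shared table, NULL until generated */
static unsigned long nr_upcase_users;

struct vol_model {
	unsigned short *upcase;
};

/* Old behaviour: NULL == NULL matches, so the counter is decremented. */
static void volume_free_old(struct vol_model *vol)
{
	if (vol->upcase == default_upcase) {
		nr_upcase_users--;
		vol->upcase = NULL;
	}
}

/* Guarded: only a volume that holds a reference may drop one. */
static void volume_free_fixed(struct vol_model *vol)
{
	if (vol->upcase && vol->upcase == default_upcase) {
		nr_upcase_users--;
		vol->upcase = NULL;
	}
}
```

Calling `volume_free_old()` on a fresh volume while `default_upcase` is NULL wraps the counter to the maximum unsigned long value; `volume_free_fixed()` leaves it at 0.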
2026-05-08ntfs: match ntfs_resident_attr_min_value_length with $AttrDefHyunchul Lee
Update ntfs_resident_attr_min_value_length() to align with $AttrDef. The $VOLUME_NAME is allowed to have the size of 0. The Windows 11 $AttrDef values are as follows:

Attribute Name (ID)            Size (Min-Max)  Flags
$STANDARD_INFORMATION (16)     48-72           Resident
$ATTRIBUTE_LIST (32)           No Limit        Non-resident
$FILE_NAME (48)                68-578          Resident, Index
$OBJECT_ID (64)                0-256           Resident
$SECURITY_DESCRIPTOR (80)      No Limit        Non-resident
$VOLUME_NAME (96)              2-256           Resident
$VOLUME_INFORMATION (112)      12-12           Resident
$DATA (128)                    No Limit        (None)
$INDEX_ROOT (144)              No Limit        Resident
$INDEX_ALLOCATION (160)        No Limit        Non-resident
$BITMAP (176)                  No Limit        Non-resident
$REPARSE_POINT (192)           0-16384         Non-resident
$EA_INFORMATION (208)          8-8             Resident
$EA (224)                      0-65536         (None)
$LOGGED_UTILITY_STREAM (256)   0-65536         Non-resident

Reported-by: woot000 <woot000@woot000.com> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-08ntfs: avoid use-after-free of index inode in ntfs_inode_sync_filename()DaeMyung Kang
ntfs_inode_sync_filename() walks every FILE_NAME attribute and, for each one that points at a different parent, opens the parent index inode with ntfs_iget() and locks index_ni->mrec_lock. All three error branches (NInoBeingDeleted, ntfs_index_ctx_get failure, ntfs_index_lookup failure) drop the parent reference before unlocking: iput(index_vi); mutex_unlock(&index_ni->mrec_lock); continue; index_ni is NTFS_I(index_vi), so the ntfs_inode (and its mrec_lock) is embedded in the inode allocation. If the parent directory is not held outside the icache - no open dentry, recently evicted from dcache, no other concurrent lookup - ntfs_iget() returns with i_count == 1 and our iput() drops the last reference. evict_inode() then runs and destroy_inode() schedules the slab object for RCU free, while mutex_unlock() on the next line is still touching index_ni->mrec_lock. Swap the order so the mutex is dropped while index_vi is still alive, matching the success path at the bottom of the loop which already unlocks before iput(). Reproduced under KASAN with a debug build that forces ntfs_index_ctx_get() to fail when the parent index inode has been opened with i_count == 1. 
KASAN reports a slab-use-after-free read on the parent's mrec_lock from mutex_unlock() on the writeback worker: BUG: KASAN: slab-use-after-free in __mutex_unlock_slowpath+0xb5/0x970 Read of size 8 at addr ffff8880014b7598 by task kworker/u8:0/12 Workqueue: writeback wb_workfn (flush-253:0) Call Trace: mutex_unlock ntfs_inode_sync_filename __ntfs_write_inode ntfs_write_inode __writeback_single_inode Allocated by task 103: ntfs_alloc_big_inode ntfs_iget ntfs_lookup __x64_sys_mkdir Freed by task 12: ntfs_free_big_inode i_callback rcu_do_batch Last potentially related work creation: call_rcu destroy_inode evict dispose_list evict_inodes ntfs_inode_sync_filename __ntfs_write_inode The buggy address belongs to the object at ffff8880014b7440 which belongs to the cache ntfs_big_inode_cache of size 1800 The freed object is the parent directory inode itself: allocated by mkdir(2) via ntfs_iget(), then released through call_rcu(i_callback) that destroy_inode() scheduled when evict_inodes() ran from inside ntfs_inode_sync_filename(). Re-running the same workload with mutex_unlock() moved before iput() runs cleanly under KASAN. Fixes: af0db57d4293 ("ntfs: update inode operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-08ntfs: fix copy length in ntfs_bdev_write() for non-page-aligned startDaeMyung Kang
This is not a normal data I/O hot path. The single in-tree caller is the $LogFile emptying path used during read-write mount/remount, and the bug only becomes visible on NTFS volumes whose cluster_size is strictly smaller than the kernel's PAGE_SIZE (typically 4 KiB on x86_64). Per Microsoft's format command documentation, NTFS supports allocation unit sizes starting at 512 bytes, so 512 B, 1 KiB and 2 KiB clusters are uncommon but valid on-disk configurations. When cluster_size >= PAGE_SIZE every "start" passed in is page-aligned and the buggy "from != 0" path is never taken. ntfs_bdev_write() splits the write across one or more block-device folios. Inside the loop, "to" is computed as the *end byte offset* within the current page (0..PAGE_SIZE), and "from" is the start byte offset within the page (reset to 0 from the second iteration onward). The copy length should therefore be "to - from", but the current code uses "to" directly: to = min_t(u32, end - offset, PAGE_SIZE); memcpy_to_folio(folio, from, buf + buf_off, to); buf_off += to; When "from != 0" (i.e. "start" is not page-aligned) memcpy_to_folio() copies "from" extra bytes: - it reads "from" bytes past the source buffer into kernel heap; - it writes "from" bytes past the requested range into the next part of the block-device page (or, if "from + to > PAGE_SIZE", past the folio boundary entirely, which trips the VM_BUG_ON inside memcpy_to_folio() on CONFIG_DEBUG_VM=y kernels). "buf_off" is then advanced by the wrong amount, so every subsequent iteration also reads the source buffer at the wrong offset and writes the wrong content to disk. ntfs_empty_logfile() calls ntfs_bdev_write(sb, empty_buf, NTFS_CLU_TO_B(vol, lcn), vol->cluster_size); with empty_buf sized to vol->cluster_size. On a sub-PAGE_SIZE-cluster volume, any $LogFile run whose LCN is not aligned to PAGE_SIZE / cluster_size reaches the non-page-aligned path. 
The over-copy can read beyond empty_buf and overwrite the sectors following the requested cluster in the block-device page with unrelated kernel heap contents while $LogFile is being emptied. A userspace reducer of the same arithmetic and copy loop confirms the bug under AddressSanitizer: ASan reports a heap-buffer-overflow read past the source buffer for the buggy length, and the fixed version is ASan-clean. Compute the copy length as "to - from" and advance buf_off by the same amount. Fixes: 5218cd102aec ("ntfs: update misc operations") Link: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/format Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
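A userspace reduction of the copy loop with the corrected length. This is a sketch, not the kernel function: a flat byte array stands in for the block device, memcpy() replaces memcpy_to_folio(), and MODEL_PAGE_SIZE is an assumed 4 KiB page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/*
 * Copy size bytes from buf to dev starting at byte offset start,
 * one page-sized chunk at a time. Returns the bytes consumed from buf.
 */
static size_t bdev_write_model(uint8_t *dev, const uint8_t *buf,
			       uint64_t start, size_t size)
{
	uint64_t end = start + size;
	uint64_t offset = start & ~(uint64_t)(MODEL_PAGE_SIZE - 1);
	uint32_t from = (uint32_t)(start - offset);
	size_t buf_off = 0;

	for (; offset < end; offset += MODEL_PAGE_SIZE) {
		uint64_t left = end - offset;
		uint32_t to = left < MODEL_PAGE_SIZE ?
			      (uint32_t)left : MODEL_PAGE_SIZE;
		uint32_t len;

		assert(to > from);
		len = to - from;	/* the fix: copy "to - from", not "to" */
		memcpy(dev + offset + from, buf + buf_off, len);
		buf_off += len;
		from = 0;	/* later pages start at offset 0 */
	}
	return buf_off;
}
```

For a 1 KiB write starting at byte 3584, the model copies 512 bytes into the first page and 512 into the second, consuming exactly the 1024 bytes of the source buffer.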
2026-05-08ntfs: wait for sync mft writes to completeDaeMyung Kang
ntfs_sync_mft_mirror() and write_mft_record_nolock() with @sync set are both documented as synchronous, but neither actually waits for the bio they submit nor inspects bi_status. write_inode() can return success while dirty mft record bytes are still in flight, and bio errors are silently dropped: the volume is not marked with errors and the inode is not redirtied. This breaks fsync()/sync metadata durability. Switch ntfs_sync_mft_mirror() and the @sync path of write_mft_record_nolock() to submit_bio_wait() and propagate the returned error to the caller. Capture ntfs_sync_mft_mirror()'s return value at its call sites in write_mft_record_nolock() so a mirror write failure surfaces too. The @sync parameter only controls the main MFT bio. The !@sync main submission is therefore unchanged and still uses ntfs_bio_end_io() to drop the folio reference taken before submission. The mirror call has always been documented as performing synchronous I/O regardless of @sync, so making it actually block restores the originally intended contract for both @sync and !@sync callers. Note this only fixes the synchronous mirror/main paths reachable from write_mft_record_nolock(). The main MFT write submitted from ntfs_write_mft_block() (the .writepages path) still does not wait for completion or check bi_status; that requires a larger restructuring and is left to a follow-up patch. Fixes: 115380f9a2f9 ("ntfs: update mft operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-08ntfs: capture mft mirror sync errors in ntfs_write_mft_block()DaeMyung Kang
After ntfs_sync_mft_mirror() became able to return real I/O errors, ntfs_write_mft_block() still discards its return value at the call site inside the per-record loop. A failed $MFTMirr write therefore leaves the volume looking clean from the writeback path even though the on-disk mirror is now stale. Capture the return value and feed it into the function's existing @err variable using the same "first error wins" pattern already used on other failure paths. The error is propagated to the caller and, via the existing tail of the function, sets NVolErrors so umount and chkdsk see the volume as inconsistent. Fixes: 115380f9a2f9 ("ntfs: update mft operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
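The "first error wins" pattern can be sketched as follows. The function and its array argument are hypothetical; each array entry models the return value of one mirror sync inside the per-record loop.

```c
#include <assert.h>
#include <stddef.h>

/* Keep processing every record, but remember only the first failure. */
static int write_block_model(const int *mirror_results, size_t n)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = mirror_results[i];

		if (ret && !err)
			err = ret;
	}
	return err;
}
```

With results `{0, -5, 0, -28}` the model returns -5: the loop still visits every record, and the later failure does not overwrite the first one.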
2026-05-08ntfs: redirty folio when ntfs_write_mft_block() runs out of memoryDaeMyung Kang
ntfs_write_mft_block() is called by writeback_iter() with the folio locked. When the per-call allocations for @locked_nis or @ref_inos fail, the function returns -ENOMEM directly without unlocking the folio. Any later task that needs the folio's lock then stalls, and the folio's dirty state is silently lost from the writeback iterator's point of view. Use folio_redirty_for_writepage() so the folio remains dirty for a subsequent writeback pass, unlock it, and only then return -ENOMEM so the caller can propagate the error to fsync()/sync_filesystem(). Fixes: f462fdf3d6a4 ("ntfs: reduce stack usage in ntfs_write_mft_block()") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-08ntfs: use base mft_no when looking up base inode for extent recordDaeMyung Kang
When the mft record is an extent record, ntfs_may_write_mft_record() looks up its base inode in the icache. The hash key passed to find_inode_nowait() must be the base inode's mft number (na.mft_no, set just above to MREF_LE(m->base_mft_record)), but the code passes @mft_no, the extent record's own number. find_inode_nowait() uses its second argument as the hashval, so the lookup lands in the wrong bucket and almost always returns NULL. ntfs_may_write_mft_record() then returns false and the writeback path (ntfs_write_mft_block()) skips that extent record, leaving the on-disk copy permanently out of sync with the in-memory one. The original ilookup5_nowait() call this conversion replaced used na.mft_no. Restore that. Fixes: 115380f9a2f9 ("ntfs: update mft operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-03ntfs: fix variable dereferenced before check ni and attr in ntfs_attrlist_entry_add()Namjae Jeon
Smatch warnings: ntfs_attrlist_entry_add() warn: variable dereferenced before check 'ni' ntfs_attrlist_entry_add() warn: variable dereferenced before check 'attr' Move the ntfs_debug() call after the NULL pointer checks to ensure safe access to the structure members. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-30ntfs: Use return instead of goto in ntfs_mapping_pairs_decompress()Nathan Chancellor
Clang warns (or errors with CONFIG_WERROR=y / W=e): fs/ntfs/runlist.c:755:6: error: variable 'rl' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] 755 | if (overflows_type(lowest_vcn, vcn)) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... fs/ntfs/runlist.c:971:9: note: uninitialized use occurs here 971 | kvfree(rl); | ^~ ... rl has not been allocated at this point so the 'goto err_out' should really just be a return of the error pointer -EIO. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-30ntfs: drop nlink once for WIN32/DOS aliasesHyunchul Lee
NTFS can store a directory's filename as paired WIN32 and DOS $FILE_NAME attributes. When unlinking such a directory, ntfs_delete() deleted both attributes but also called drop_nlink() once for each attribute. This could trigger warnings when unlinking directories. Drop the link count only once for a WIN32/DOS alias pair. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-28ntfs: fix invalid PTR_ERR() usage in __ntfs_bitmap_set_bits_in_run()Namjae Jeon
Smatch reported a warning in __ntfs_bitmap_set_bits_in_run(): "warn: passing a valid pointer to 'PTR_ERR'" This occurs because the 'folio' variable might contain a valid pointer when jumping to the 'rollback' label, specifically when 'cnt <= 0' is detected during the subsequent page mapping loop. In such cases, calling PTR_ERR(folio) is incorrect as it does not contain an error code. Fix this by introducing an explicit 'err' variable to track the error status. This ensures that the rollback logic and the return value consistently use a proper error code regardless of the state of the folio pointer. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-27ntfs: fix error handling in ntfs_write_iomap_end_resident()Namjae Jeon
When ntfs_attr_get_search_ctx() fails and returns NULL, the function returned early without calling put_page(ipage). Fix this by jumping to err_out label on error. The err_out path now properly releases the page and the mutex, with a NULL check for the search context. Reported-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-27ntfs: fix VCN overflow in ntfs_mapping_pairs_decompress()Zhan Xusheng
In ntfs_mapping_pairs_decompress(), lowest_vcn is read from on-disk metadata and used as the initial vcn without validation. A malformed value can introduce an invalid (e.g. negative) vcn, corrupting the runlist from the start. Additionally, the accumulation vcn += deltaxcn does not check for s64 overflow. A crafted mapping pairs array can wrap vcn to a negative value, breaking the monotonically-increasing invariant relied upon by ntfs_rl_vcn_to_lcn() and related helpers. Fix this by validating lowest_vcn and using check_add_overflow() for vcn accumulation. Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
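A userspace sketch of the two checks. The GCC/Clang builtin __builtin_add_overflow() is used here in place of the kernel's check_add_overflow(); the function names are hypothetical, and the non-negative test on lowest_vcn is an assumption about what the validation covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed validation: the starting VCN read from disk must not be negative. */
static bool lowest_vcn_valid(int64_t lowest_vcn)
{
	return lowest_vcn >= 0;
}

/* Advance vcn by a run length, rejecting signed 64-bit overflow. */
static bool vcn_advance(int64_t *vcn, int64_t deltaxcn)
{
	int64_t next;

	if (__builtin_add_overflow(*vcn, deltaxcn, &next))
		return false;	/* vcn is left unchanged on overflow */
	*vcn = next;
	return true;
}
```

Starting from `INT64_MAX - 1`, advancing by 2 is rejected and vcn keeps its previous value, so the runlist never sees a wrapped, negative VCN.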
2026-04-27ntfs: fix WSL symlink target leak on reparse failureDaeMyung Kang
ntfs_reparse_set_wsl_symlink() converts the symlink target into an allocated NLS string and transfers ownership to ni->target only after ntfs_set_ntfs_reparse_data() succeeds. If setting the reparse data fails, the converted target is left unreferenced and leaks. Free the converted target on the reparse update failure path. Use kfree() for the other local failure path as well, matching the ntfs_ucstonls() allocation contract. Fixes: fc053f05ca28 ("ntfs: add reparse and ea operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-27ntfs: fix NULL dereference in ntfs_index_walk_down()DaeMyung Kang
ntfs_index_walk_down() allocates ictx->ib when descending from the root into an index allocation block. If that allocation fails, the old code still passes the NULL buffer to ntfs_ib_read(), which can write through it via ntfs_inode_attr_pread(). Allocate the index block into a temporary pointer and return -ENOMEM before changing the index context on allocation failure. Also propagate ERR_PTR() through ntfs_index_next() and ntfs_readdir() so walk-down allocation or index block read failures are not mistaken for normal index iteration inside the filesystem. ntfs_readdir() keeps the existing userspace-visible behavior of suppressing readdir errors after marking end_in_iterate; this change only prevents the walk-down failure path from dereferencing NULL internally. The failure was reproduced with failslab fail-nth injection on getdents64; the original module hits a NULL pointer dereference in memcpy_orig through ntfs_ib_read(), while the patched module reaches the same ntfs_index_walk_down() allocation failure without crashing. Fixes: 0a8ac0c1fa0b ("ntfs: update directory operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-22ntfs: use page allocation for resident attribute inline dataNamjae Jeon
The current kmemdup() based allocation for IOMAP_INLINE can result in the inline_data pointer having a non-zero page offset. This causes iomap_inline_data_valid() to fail the check: iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data) and triggers the kernel BUG at fs/iomap/buffered-io.c:1061. This particularly affects workloads with frequent small file access (e.g. Firefox Nightly profile on NTFS with bind mount) when using the new ntfs. Fix this by allocating a full page with alloc_page() so that page_address() always returns a page-aligned address. Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
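The check quoted above can be modelled in user space to see why a non-zero page offset matters. This is a sketch assuming a 4096-byte page; sk_offset_in_page() and sk_inline_data_valid() are hypothetical re-creations working on plain addresses, not the kernel helpers themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SK_PAGE_SIZE ((size_t)4096)

/* Offset of an address within its page. */
static size_t sk_offset_in_page(uintptr_t addr)
{
	return (size_t)(addr & (SK_PAGE_SIZE - 1));
}

/* Mirrors the quoted condition: the data must fit in the rest of the page. */
static bool sk_inline_data_valid(uintptr_t inline_data, size_t length)
{
	return length <= SK_PAGE_SIZE - sk_offset_in_page(inline_data);
}
```

A page-aligned address accepts any length up to the page size, while an address 4000 bytes into a page only leaves room for 96 bytes, so a 200-byte resident attribute placed there fails the check.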
2026-04-22ntfs: fix mmap_prepare writable check for shared mappingsNamjae Jeon
Linus pointed out that checking only VMA_WRITE_BIT is incorrect. Private writable mappings (MAP_PRIVATE) set VM_WRITE but do not write back to the filesystem. Also, mappings that can become writable via mprotect() (VM_MAYWRITE) must be handled. Use vma_desc_test_all(VMA_SHARED_BIT, VMA_MAYWRITE_BIT) instead, which matches what other filesystems do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
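The difference between the two checks can be illustrated with plain bit flags. The flag names and values below are defined locally for this sketch and both helpers are hypothetical; this is not the kernel's vma_desc API.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch-local flag bits, chosen only for illustration. */
enum {
	SK_WRITE    = 1u << 1, /* mapping is currently writable */
	SK_SHARED   = 1u << 3, /* changes are written back to the file */
	SK_MAYWRITE = 1u << 5, /* mprotect() may make the mapping writable */
};

/* Old idea: look at the write bit alone. */
static bool sk_check_write_only(unsigned int flags)
{
	return (flags & SK_WRITE) != 0;
}

/* New idea: both shared and may-write must be set. */
static bool sk_check_shared_maywrite(unsigned int flags)
{
	const unsigned int need = SK_SHARED | SK_MAYWRITE;

	return (flags & need) == need;
}
```

A private writable mapping (SK_WRITE | SK_MAYWRITE) is flagged by the first helper although it never writes back to the file, and a shared mapping that is read-only for now (SK_SHARED | SK_MAYWRITE) is missed by it; the second helper gets both cases right.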
2026-04-18ntfs: fix potential 32-bit truncation in ntfs_write_cb()Dan Carpenter
Smatch warned that the bitwise negation in ntfs_write_cb() might lead to unintended truncation. Casting the block size to loff_t before bitwise negation prevents the upper 32 bits of pos from being incorrectly zeroed out during the calculation of new_vcn. Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
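The truncation is easy to reproduce outside the kernel. The sketch below assumes a 32-bit unsigned block size and a 64-bit signed file position; both helper names are hypothetical and this is not the ntfs_write_cb() code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Problematic form: ~(block_size - 1) is evaluated in 32 bits and then
 * zero-extended, so the mask also clears the upper 32 bits of pos.
 */
static int64_t sk_round_down_truncating(int64_t pos, uint32_t block_size)
{
	assert(block_size && (block_size & (block_size - 1)) == 0);
	return pos & ~(block_size - 1);
}

/*
 * Fixed form: widen the block size first, so the negated mask has all
 * upper bits set and only the low bits of pos are cleared.
 */
static int64_t sk_round_down_fixed(int64_t pos, uint32_t block_size)
{
	assert(block_size && (block_size & (block_size - 1)) == 0);
	return pos & ~((int64_t)block_size - 1);
}
```

For pos = 0x100001234 and a 4096-byte block, the first helper yields 0x1000 while the second yields 0x100001000.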
2026-04-18ntfs: fix uninitialized variable in ntfs_map_runlist_nolockNamjae Jeon
Smatch reported that ctx_needs_reset could be used uninitialized if ntfs_map_runlist_nolock() fails early when a search context is provided. Specifically, if the function returns -EIO because the attribute is resident, the code jumps to err_out. This initializes ctx_needs_reset to false to satisfy the static checker. Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: delete dead codeDan Carpenter
We know "ret2" is zero so there is no need to check. Delete the if statement. Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: add missing error code in ntfs_mft_record_alloc()Dan Carpenter
Return -ENOMEM if the kmalloc() fails. Don't return success. Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: fix uninitialized variables in ntfs_ea_set_wsl_inode()Namjae Jeon
Smatch reported uninitialized symbol warnings in ntfs_ea_set_wsl_inode() and __ntfs_create(). In ntfs_ea_set_wsl_inode(), the err variable could be returned without initialization if no flags are set and rdev is zero. Additionally, ea_size might remain uninitialized from the caller's perspective if no EA operations are performed. While these cases might not be triggered under current logic, we initialize them to zero to satisfy the static checker. Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: fix uninitialized pointer in ntfs_write_mft_blockNamjae Jeon
Smatch reported that the variable rl could be used uninitialized in ntfs_write_mft_block(). After analyzing the code, when vol->cluster_size == NTFS_BLOCK_SIZE (512), it is smaller than folio_size, so rl is guaranteed to be initialized. If vol->cluster_size is larger, the condition to access rl becomes false, so a runtime error is not expected to occur. However, to make the static checker happy, this patch initializes rl to NULL and adds an explicit check before its usage. Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: fix uninitialized variable in ntfs_write_simple_iomap_begin_non_residentNamjae Jeon
Smatch reported that err could be used uninitialized if the code path does not enter the first ntfs_zero_range() block. Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: remove noop_direct_IO from address_space_operationsHyunchul Lee
Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag"), noop_direct_IO is not required. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: limit memory allocation in ntfs_attr_readallHyunchul Lee
Check the attribute size before allocating memory, and reject the attribute if its size exceeds the maximum. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: not zero out range beyond init in punch_holeHyunchul Lee
The area beyond initialized_size is read as zeros, so there is no need to zero out that region. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-18ntfs: zero out stale data in straddle block beyond initialized_sizeHyunchul Lee
ntfs_read_iomap_begin_non_resident() rounds up MAPPED extents to the block boundary of initialized_size. This ensures that any subsequent blocks are treated as IOMAP_UNWRITTEN, but it also causes the "straddle block" containing initialized_size to be read from disk. The disk data beyond initialized_size in this block is stale and must be zeroed to prevent data leakage. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
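Zeroing the tail of the straddle block can be sketched on a plain byte buffer. The helper below is hypothetical and assumes the caller has already read one block from disk at file offset block_start; it is not the driver's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero every byte of buf that lies at or beyond initialized_size.
 * buf holds block_size bytes read from file offset block_start.
 * Returns the number of bytes that were zeroed.
 */
static size_t sk_zero_beyond_init(unsigned char *buf, uint64_t block_start,
				  size_t block_size, uint64_t initialized_size)
{
	size_t keep;

	assert(buf != NULL);

	/* Whole block lies beyond initialized_size. */
	if (initialized_size <= block_start) {
		memset(buf, 0, block_size);
		return block_size;
	}

	/* Whole block is initialized, nothing to do. */
	if (initialized_size >= block_start + block_size)
		return 0;

	/* Straddle block: keep the initialized head, zero the stale tail. */
	keep = (size_t)(initialized_size - block_start);
	memset(buf + keep, 0, block_size - keep);
	return block_size - keep;
}
```

With 512-byte blocks and initialized_size = 700, the block starting at offset 512 keeps its first 188 bytes and has the remaining 324 bytes zeroed.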
2026-04-17Merge tag 'ntfs-for-7.1-rc1-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs

Pull ntfs resurrection from Namjae Jeon:

"Ever since Kari Argillander’s 2022 report [1] regarding the state of the ntfs3 driver, I have spent the last 4 years working to provide full write support and current trends (iomap, no buffer head, folio), enhanced performance, stable maintenance, utility support including fsck for NTFS in Linux.

This new implementation is built upon the clean foundation of the original read-only NTFS driver, adding:

- Write support: Implemented full write support based on the classic read-only NTFS driver. Added delayed allocation to improve write performance through multi-cluster allocation and reduced fragmentation of the cluster bitmap.
- iomap conversion: Switched buffered IO (reads/writes), direct IO, file extent mapping, readpages, and writepages to use iomap.
- Remove buffer_head: Completely removed buffer_head usage by converting to folios. As a result, the dependency on CONFIG_BUFFER_HEAD has been removed from Kconfig.
- Stability improvements: The new ntfs driver passes 326 xfstests, compared to 273 for ntfs3. All tests passed by ntfs3 are a complete subset of the tests passed by this implementation. Added support for fallocate, idmapped mounts, permissions, and more.

xfstests Results report:

Total tests run: 787
Passed : 326
Failed : 38
Skipped : 423

Failed tests breakdown:
- 34 tests require metadata journaling
- 4 other tests:
  094: No unwritten extent concept in NTFS on-disk format
  563: cgroup v2 aware writeback accounting not supported
  631: RENAME_WHITEOUT support required
  787: NFS delegation test"

Link: https://lore.kernel.org/all/da20d32b-5185-f40b-48b8-2986922d8b25@stargateuniverse.net/ [1]

[ Let's see if this undead filesystem ends up being of the "Easter miracle" kind, or the "Nosferatu of filesystems" kind... ]

* tag 'ntfs-for-7.1-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs: (46 commits)
  ntfs: remove redundant out-of-bound checks
  ntfs: add bound checking to ntfs_external_attr_find
  ntfs: add bound checking to ntfs_attr_find
  ntfs: fix ignoring unreachable code warnings
  ntfs: fix inconsistent indenting warnings
  ntfs: fix variable dereferenced before check warnings
  ntfs: prefer IS_ERR_OR_NULL() over manual NULL check
  ntfs: harden ntfs_listxattr against EA entries
  ntfs: harden ntfs_ea_lookup against malformed EA entries
  ntfs: check $EA query-length in ntfs_ea_get
  ntfs: validate WSL EA payload sizes
  ntfs: fix WSL ea restore condition
  ntfs: add missing newlines to pr_err() messages
  ntfs: fix pointer/integer casting warnings
  ntfs: use ->mft_no instead of ->i_ino in prints
  ntfs: change mft_no type to u64
  ntfs: select FS_IOMAP in Kconfig
  ntfs: add MODULE_ALIAS_FS
  ntfs: reduce stack usage in ntfs_write_mft_block()
  ntfs: fix sysctl table registration and path
  ...
2026-04-07ntfs: remove redundant out-of-bound checksHyunchul Lee
Remove redundant out-of-bounds validations. Since ntfs_attr_find and ntfs_external_attr_find now validate the attribute value offsets and lengths against the bounds of the MFT record block, performing subsequent bounds checking in caller functions like ntfs_attr_lookup is no longer necessary. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-07ntfs: add bound checking to ntfs_external_attr_findHyunchul Lee
Add bound validation in ntfs_external_attr_find to prevent out-of-bounds memory accesses. This ensures that the attribute record's length, name offset, and both resident and non-resident value offsets strictly fall within the safe boundaries of the MFT record. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
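The kind of validation described here can be sketched with overflow-safe comparisons. The struct below is a simplified, hypothetical view of an attribute record (host-endian fields, name length in 16-bit units, offsets relative to the start of the attribute) and not the on-disk layout used by the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical attribute header for this sketch. */
struct sk_attr {
	uint32_t length;       /* total size of the attribute record */
	uint16_t name_offset;  /* offset of the name within the attribute */
	uint8_t  name_length;  /* name length in 16-bit characters */
	uint16_t value_offset; /* offset of the value within the attribute */
	uint32_t value_length; /* value size in bytes */
};

/*
 * attr_off is the offset of the attribute inside the MFT record.
 * Each test is written as "len > limit - off" after checking "off <= limit",
 * so no addition can wrap around.
 */
static bool sk_attr_in_bounds(size_t attr_off, const struct sk_attr *a,
			      size_t record_size)
{
	size_t name_bytes;

	assert(a != NULL);

	/* The attribute record must lie inside the MFT record. */
	if (attr_off > record_size || a->length > record_size - attr_off)
		return false;

	/* The name must lie inside the attribute record. */
	name_bytes = (size_t)a->name_length * 2;
	if (a->name_offset > a->length ||
	    name_bytes > a->length - a->name_offset)
		return false;

	/* The value must lie inside the attribute record. */
	if (a->value_offset > a->length ||
	    a->value_length > a->length - a->value_offset)
		return false;

	return true;
}
```

Callers that only ever see attributes accepted by such a check do not need to repeat the same bounds tests themselves.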
2026-04-07ntfs: add bound checking to ntfs_attr_findHyunchul Lee
Add bound validations in ntfs_attr_find to ensure attribute value offsets and lengths are safe to access. It verifies that resident attributes meet type-specific minimum length requirements and checks the mapping_pairs_offset boundaries for non-resident attributes. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-16ntfs: fix ignoring unreachable code warningsHyunchul Lee
Detected by Smatch. inode.c:1796 load_attribute_list_mount() warn: ignoring unreachable code. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-16ntfs: fix inconsistent indenting warningsHyunchul Lee
Detected by Smatch. index.c:2041 ntfs_index_walk_up() warn: inconsistent indenting mft.c:2462 ntfs_mft_record_alloc() warn: inconsistent indenting Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-16ntfs: fix variable dereferenced before check warningsHyunchul Lee
Detected by Smatch. lcnalloc.c:736 ntfs_cluster_alloc() error: we previously assumed 'rl' could be null (see line 719) inode.c:3275 ntfs_inode_close() warn: variable dereferenced before check 'tmp_nis' (see line 3255) attrib.c:4952 ntfs_attr_remove() warn: variable dereferenced before check 'ni' (see line 4951) dir.c:1035 ntfs_readdir() error: we previously assumed 'private' could be null (see line 850) Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-16ntfs: prefer IS_ERR_OR_NULL() over manual NULL checkHyunchul Lee
Use IS_ERR_OR_NULL() instead of manual NULL and IS_ERR() checks. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
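For readers unfamiliar with the error-pointer convention, the combined check can be modelled in user space. The helpers below are hypothetical re-creations written for this sketch (with 4095 assumed as the largest errno value encoded in a pointer); kernel code should use the real macros from linux/err.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define SK_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sk_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Decode the errno value from an error pointer. */
static long sk_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if p falls in the top SK_MAX_ERRNO addresses. */
static bool sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/* One call instead of the manual "!p || sk_is_err(p)" pair. */
static bool sk_is_err_or_null(const void *p)
{
	return !p || sk_is_err(p);
}
```

The combined helper returns true for NULL and for any encoded error, and false for an ordinary valid pointer.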
2026-03-16ntfs: harden ntfs_listxattr against EA entriesHyunchul Lee
Validate every EA entry only if the buffer length is required to prevent large memory allocation. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-16ntfs: harden ntfs_ea_lookup against malformed EA entriesHyunchul Lee
Validate p_ea->ea_name_length tightly, and the used entry size for every EA. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-16ntfs: check $EA query-length in ntfs_ea_getHyunchul Lee
If ea_info_qlen exceeds all_ea_size, an out-of-bounds access can happen. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>