From 1b6444331bebb4e586dab30b1fa0612ef482bab4 Mon Sep 17 00:00:00 2001
From: Zi Yan
Date: Sun, 17 May 2026 09:54:15 -0400
Subject: mm/khugepaged: enable clean pagecache folio collapse for writable files

collapse_file() is capable of collapsing pagecache folios from writable
files to PMD folios. Now enable clean pagecache folio collapse in
addition to read-only pagecache folio collapse by removing the
inode_is_open_for_write() from file_thp_enabled() and only performing
filemap_flush() if the file is read-only.

This means userspace needs to explicitly flush the content of pagecache
folios before khugepaged can collapse the folios, or use
madvise(MADV_COLLAPSE), which does the flush in the retry. The reason is
that blindly enabling dirty pagecache folio from writable files collapse
makes khugepaged flush these folios all the time. It is undesirable to
cause system level pagecache flushes.

To properly support dirty pagecache folio collapse, filemap_flush() needs
to be avoided. Potentially, merging associated buffer instead of dropping
it with filemap_release_folio() might be needed.

NOTE: this breaks khugepaged selftests for writable file pagecache
collapse, which is set to fail all the time. The next commit fixes it.

Link: https://lore.kernel.org/20260517135416.1434539-14-ziy@nvidia.com
Signed-off-by: Zi Yan
Reviewed-by: Lance Yang
Cc: Al Viro
Cc: Baolin Wang
Cc: Barry Song
Cc: Chris Mason
Cc: Christian Brauner
Cc: David Hildenbrand (Arm)
Cc: David Sterba
Cc: Dev Jain
Cc: Jan Kara
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Mike Rapoport
Cc: Nico Pache
Cc: Ryan Roberts
Cc: Shuah Khan
Cc: Song Liu
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
---
 mm/huge_memory.c |  2 +-
 mm/khugepaged.c  | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 74bdf6ecd0cd..cbc0f0d3bf02 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -100,7 +100,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	if (!mapping_pmd_folio_support(vma->vm_file->f_mapping))
 		return false;
 
-	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
+	return S_ISREG(inode->i_mode);
 }
 
 /* If returns true, we are unable to access the VMA's folios. */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d9de48423a2e..e69a0fb4c7cc 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2345,18 +2345,21 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			} else if (folio_test_dirty(folio)) {
 				/*
 				 * This page is dirty because it hasn't
-				 * been flushed since first write. There
-				 * won't be new dirty pages.
+				 * been flushed since first write.
 				 *
-				 * Trigger async flush here and hope the
-				 * writeback is done when khugepaged
-				 * revisits this page.
+				 * Trigger async flush for read-only files and
+				 * hope the writeback is done when khugepaged
+				 * revisits this page. Writable files can have
+				 * their folios dirty at any time; blindly
+				 * flushing them would cause undesirable
+				 * system-wide writeback.
 				 *
 				 * This is a one-off situation. We are not
 				 * forcing writeback in loop.
 				 */
 				xas_unlock_irq(&xas);
-				filemap_flush(mapping);
+				if (!inode_is_open_for_write(mapping->host))
+					filemap_flush(mapping);
 				result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
 				goto xa_unlocked;
 			} else if (folio_test_writeback(folio)) {
-- 
cgit v1.2.3
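
Illustrative userspace note (not part of the patch): the commit message names two ways for userspace to get a writable file's pagecache collapsed, either flushing it explicitly so khugepaged finds clean folios, or calling madvise(MADV_COLLAPSE). The following is a minimal sketch of both, under stated assumptions: the helper names flush_for_khugepaged(), collapse_now() and pmd_round_up() are hypothetical, the 2 MiB PMD size holds only for configurations such as x86-64 with 4 KiB base pages, and the fallback value of MADV_COLLAPSE is taken from the Linux uapi headers for toolchains whose libc does not define it yet. Whether a collapse actually succeeds depends on the kernel, the THP settings, and the filesystem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* Linux uapi value; older libc headers lack it */
#endif

/* Assumption: 2 MiB PMD size (e.g. x86-64 with 4 KiB base pages). */
#define ASSUMED_PMD_SIZE ((size_t)2 << 20)
static_assert((ASSUMED_PMD_SIZE & (ASSUMED_PMD_SIZE - 1)) == 0,
	      "PMD size must be a power of two");

/* Hypothetical helper: round len up to a multiple of the assumed PMD size. */
static inline size_t pmd_round_up(size_t len)
{
	return (len + ASSUMED_PMD_SIZE - 1) & ~(ASSUMED_PMD_SIZE - 1);
}

/*
 * Option 1 (hypothetical helper): write back the file's dirty pagecache so
 * that khugepaged can later find clean folios. Returns 0 or -errno.
 */
static inline int flush_for_khugepaged(int fd)
{
	return fsync(fd) < 0 ? -errno : 0;
}

/*
 * Option 2 (hypothetical helper): request a synchronous collapse of
 * [addr, addr + len). Returns 0 or -errno.
 */
static inline int collapse_now(void *addr, size_t len)
{
	return madvise(addr, len, MADV_COLLAPSE) < 0 ? -errno : 0;
}
```

A caller would typically mmap() the file, then either call flush_for_khugepaged(fd) after its writes and leave the collapse to khugepaged, or call collapse_now() on a PMD-aligned range. Per the commit message, the MADV_COLLAPSE path does the flush itself on retry, so the explicit fsync() is only needed for the background khugepaged path.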