linux.git/fs/splice.c, branch v2.6.29

[CVE-2009-0029] System call wrappers part 31

2009-01-14T13:15:31+00:00

Signed-off-by: Heiko Carstens

memcg: synchronized LRU

2009-01-08T16:31:05+00:00

A big patch for changing memcg's LRU semantics.

Now,
  - page_cgroup is linked to mem_cgroup's its own LRU (per zone).

  - LRU of page_cgroup is not synchronous with global LRU.

  - page and page_cgroup is one-to-one and statically allocated.

  - To find page_cgroup is on what LRU, you have to check pc->mem_cgroup as
    - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

  - SwapCache is handled.

And, when we handle LRU list of page_cgroup, we do following.

	pc = lookup_page_cgroup(page);
	lock_page_cgroup(pc); .....................(1)
	mz = page_cgroup_zoneinfo(pc);
	spin_lock(&mz->lru_lock);
	.....add to LRU
	spin_unlock(&mz->lru_lock);
	unlock_page_cgroup(pc);

But (1) is spin_lock and we have to be afraid of dead-lock with zone->lru_lock.
So, trylock() is used at (1), now. Without (1), we can't trust "mz" is correct.

This is a trial to remove this dirty nesting of locks.
This patch changes mz->lru_lock to be zone->lru_lock.
Then, above sequence will be written as

        spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
	mem_cgroup_add/remove/etc_lru() {
		pc = lookup_page_cgroup(page);
		mz = page_cgroup_zoneinfo(pc);
		if (PageCgroupUsed(pc)) {
			....add to LRU
		}
        spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

This is much simpler.
(*) We're safe even if we don't take lock_page_cgroup(pc). Because..
    1. When pc->mem_cgroup can be modified.
       - at charge.
       - at account_move().
    2. at charge
       the PCG_USED bit is not set before pc->mem_cgroup is fixed.
    3. at account_move()
       the page is isolated and not on LRU.

Pros.
  - easy for maintenance.
  - memcg can make use of laziness of pagevec.
  - we don't have to duplicated LRU/Active/Unevictable bit in page_cgroup.
  - LRU status of memcg will be synchronized with global LRU's one.
  - # of locks are reduced.
  - account_move() is simplified very much.
Cons.
  - may increase cost of LRU rotation.
    (no impact if memcg is not configured.)

Signed-off-by: KAMEZAWA Hiroyuki 
Cc: Li Zefan 
Cc: Balbir Singh 
Cc: Pavel Emelyanov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs: remove prepare_write/commit_write

2008-10-30T18:38:45+00:00

Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin 
Cc: Christoph Hellwig 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Don't allow splice() to files opened with O_APPEND

2008-10-09T21:26:38+00:00

This is debatable, but while we're debating it, let's disallow the
combination of splice and an O_APPEND destination.

It's not entirely clear what the semantics of O_APPEND should be, and
POSIX apparently expects pwrite() to ignore O_APPEND, for example.  So
we could make up any semantics we want, including the old ones.

But Miklos convinced me that we should at least give it some thought,
and that accepting writes at arbitrary offsets is wrong at least for
IS_APPEND() files (which always have O_APPEND set, even if the reverse
isn't true: you can obviously have O_APPEND set on a regular file).

So disallow O_APPEND entirely for now.  I doubt anybody cares, and this
way we have one less gray area to worry about.

Reported-and-argued-for-by: Miklos Szeredi 
Acked-by: Jens Axboe 
Signed-off-by: Linus Torvalds

mm: rename page trylock

2008-08-05T04:31:34+00:00

Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

This also facilitates lockdeping of page lock.

Signed-off-by: Nick Piggin 
Acked-by: KOSAKI Motohiro 
Acked-by: Peter Zijlstra 
Acked-by: Andrew Morton 
Acked-by: Benjamin Herrenschmidt 
Signed-off-by: Linus Torvalds

[patch 3/5] vfs: change remove_suid() to file_remove_suid()

2008-07-27T00:53:16+00:00

All calls to remove_suid() are made with a file pointer, because
(similarly to file_update_time) it is called when the file is written.

Clean up callers by passing in a file instead of a dentry.

Signed-off-by: Miklos Szeredi

splice: use get_user_pages_fast

2008-07-26T19:00:06+00:00

Use get_user_pages_fast in splice.  This reverts some mmap_sem batching
there, however the biggest problem with mmap_sem tends to be hold times
blocking out other threads rather than cacheline bouncing.  Further: on
architectures that implement get_user_pages_fast without locks, mmap_sem
can be avoided completely anyway.

Signed-off-by: Nick Piggin 
Cc: Dave Kleikamp 
Cc: Andy Whitcroft 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Andi Kleen 
Cc: Dave Kleikamp 
Cc: Badari Pulavarty 
Cc: Zach Brown 
Cc: Jens Axboe 
Reviewed-by: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

splice: fix generic_file_splice_read() race with page invalidation

2008-07-04T07:52:14+00:00

If a page was invalidated during splicing from file to a pipe, then
generic_file_splice_read() could return a short or zero count.

This manifested itself in rare I/O errors seen on nfs exported fuse
filesystems.  This is because nfsd uses splice_direct_to_actor() to read
files, and fuse uses invalidate_inode_pages2() to invalidate stale data on
open.

Fix by redoing the page find/create if it was found to be truncated
(invalidated).

Signed-off-by: Miklos Szeredi 
Signed-off-by: Andrew Morton 
Signed-off-by: Jens Axboe

splice: handle try_to_release_page() failure

2008-05-28T12:49:27+00:00

splice currently assumes that try_to_release_page() always suceeds,
but it can return failure. If it does, we cannot steal the page.

Acked-by: Mingming Cao

splice: fix sendfile() issue with relay

2008-05-28T12:49:27+00:00

Splice isn't always incrementing the ppos correctly, which broke
relay splice.

Signed-off-by: Tom Zanussi 
Tested-by: Dan Williams 
Signed-off-by: Jens Axboe