path: root/lib/libc/stdlib/malloc.c
Date        Author: Commit message
2012-04-17  Jason Evans: Import jemalloc 9ef7f5dc34ff02f50d401e41c8d9a4a928e7c2aa (dev branch,
prior to 3.0.0 release) as contrib/jemalloc, and integrate it into libc. The code being imported by this commit diverged from lib/libc/stdlib/malloc.c in March 2010, which means that a portion of the jemalloc 1.0.0 ChangeLog entries are relevant, as are the entries for all subsequent releases. Notes: svn path=/head/; revision=234370
2012-01-09  Ed Schouten: Add aligned_alloc(3).
The C11 folks reinvented the wheel by introducing an aligned version of malloc(3) called aligned_alloc(3), instead of posix_memalign(3). Instead of returning the allocation by reference, it returns the address, just like malloc(3). Reviewed by: jasone@ Notes: svn path=/head/; revision=229848
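For reference, a minimal sketch of the two interfaces side by side (not part of the commit itself):

    #include <stdlib.h>

    int
    main(void)
    {
        void *a, *b = NULL;
        int err;

        /* C11: the address is returned directly, like malloc(3); the
         * size must be a multiple of the alignment. */
        a = aligned_alloc(64, 1024);

        /* POSIX: the address is returned by reference and the return
         * value is an error code. */
        err = posix_memalign(&b, 64, 1024);

        free(a);                /* free(NULL) is a no-op */
        if (err == 0)
            free(b);
        return (0);
    }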
2011-12-15  Dimitry Andric: Since clang does not support the tls_model attribute used in malloc.c
yet (see LLVM PR 9788), and warns about it, rub it out for now. When clang grows support for this attribute, I will revert this again. MFC after: 1 week Notes: svn path=/head/; revision=228540
2011-06-21  Marius Strobl: Change sparc64 to use the initial exec TLS model, too. This avoids random
assertion failures in _malloc_thread_cleanup(). Notes: svn path=/head/; revision=223369
2011-03-11  Marius Strobl: Now that TLS generally is available on sparc64 since r219534 turn on
support for it. Note that while sparc64 also supports the static TLS model and thus tls_model("initial-exec"), using the default model turned out to yield slightly better buildstone performance. Notes: svn path=/head/; revision=219535
2010-08-17  Konstantin Belousov: Use aux vector to get values for SSP canary, pagesize, pagesizes array,
number of host CPUs and osreldate. This eliminates the last sysctl(2) calls from the dynamically linked image startup. No objections from: kan Tested by: marius (sparc64) MFC after: 1 month Notes: svn path=/head/; revision=211416
2010-07-10  Nathan Whitehorn: Provide 64-bit PowerPC support in libc.
Obtained from: projects/ppc64 Notes: svn path=/head/; revision=209878
2010-02-28  Jason Evans: Rewrite red-black trees to do lazy balance fixup. This improves
insert/remove speed by ~30%. Notes: svn path=/head/; revision=204493
2010-02-16  Marcel Moolenaar: Define TLS_MODEL for PowerPC as well. Since PowerPC uses variant I,
like ia64, leave it empty (default model). Notes: svn path=/head/; revision=203969
2010-02-16  Marcel Moolenaar: Unbreak ia64: tls_model("initial-exec") is invalid, because it assumes
the static TLS model, which is fundamentally different from the dynamic TLS model. The consequence was data corruption. Limit the attribute to i386 and amd64. Notes: svn path=/head/; revision=203950
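A minimal sketch of the kind of conditional attribute described above; the macro and variable names are illustrative, not necessarily the ones used in malloc.c:

    /* Use the initial-exec TLS model only where the static TLS model
     * is safe for libc; fall back to the default (dynamic) model. */
    #if defined(__i386__) || defined(__amd64__)
    #define TLS_MODEL __attribute__((tls_model("initial-exec")))
    #else
    #define TLS_MODEL /* default model */
    #endif

    static __thread void *thread_arena TLS_MODEL;   /* illustrative TLS variable */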
2010-01-31  Jason Evans: Fix bugs:
* Fix a race in chunk_dealloc_dss().
* Check for allocation failure before zeroing memory in base_calloc().

Merge enhancements from a divergent version of jemalloc:
* Convert thread-specific caching from magazines to an algorithm that is more tunable, and implement incremental GC.
* Add support for medium size classes, [4KiB..32KiB], 2KiB apart by default.
* Add dirty page tracking for pages within active small/medium object runs. This allows malloc to track precisely which pages are in active use, which makes dirty page purging more effective.
* Base maximum dirty page count on proportion of active memory.
* Use optional zeroing in arena_chunk_alloc() to avoid needless zeroing of chunks. This is useful in the context of DSS allocation, since a long-lived application may commonly recycle chunks.
* Increase the default chunk size from 1MiB to 4MiB.

Remove feature:
* Remove the dynamic rebalancing code, since thread caching reduces its utility.

Notes: svn path=/head/; revision=203329
2010-01-27  Ed Maste: Add missing return, in a rare case where we can't allocate memory in
deallocate. Submitted by: Ryan Stone (rysto32 at gmail dot com) Approved by: jasone Notes: svn path=/head/; revision=203077
2009-12-10  Jason Evans: Simplify arena_run_reg_dalloc(), and remove a bug that was due to incorrect
initialization of ssize_invs. Notes: svn path=/head/; revision=200345
2009-12-10  Jason Evans: Fix the posix_memalign() changes in r196861 to actually return a NULL pointer
as intended. PR: standards/138307 Notes: svn path=/head/; revision=200340
2009-11-14  Colin Percival: Change the utrace log entry for malloc_init from (0, 0, 0) to (-1, 0, 0)
in order to distinguish it from free(NULL), which is logged as (0, 0, 0). Reviewed by: jhb Notes: svn path=/head/; revision=199264
2009-09-26  Alan Cox: Make malloc(3) superpage aware. Specifically, if getpagesizes(3) returns
a large page size that is greater than malloc(3)'s default chunk size but less than or equal to 4 MB, then increase the chunk size to match the large page size.

Most often, using a chunk size that is less than the large page size is not a problem. However, consider a long-running application that allocates and frees significant amounts of memory. In particular, it frees enough memory at times that some of that memory is munmap()ed. Up until the first munmap(), a 1MB chunk size is just fine; it's not a problem for the virtual memory system. Two adjacent 1MB chunks that are aligned on a 2MB boundary will be promoted automatically to a superpage even though they were allocated at different times.

The trouble begins with the munmap(): releasing a 1MB chunk will trigger the demotion of the containing superpage, leaving behind a half-used 2MB reservation. Now comes the real problem. When the application needs to allocate more memory and recycles the previously munmap()ed address range, the implementation of mmap() won't be able to reuse the reservation, because the coalescing rules in the virtual memory system don't allow this new range to combine with its neighbor. The effect is that superpage promotion will not reoccur for this range of addresses until both 1MB chunks are freed at some point in the future.

Reviewed by: jasone
MFC after: 3 weeks
Notes: svn path=/head/; revision=197524
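A sketch of the chunk-size policy described above, assuming FreeBSD's getpagesizes(3) and the MAXPAGESIZES constant; the helper name is hypothetical:

    #include <sys/param.h>  /* MAXPAGESIZES */
    #include <sys/mman.h>   /* getpagesizes(3) */
    #include <stddef.h>

    /* Grow the chunk size to the largest supported page size that is
     * greater than the default but no larger than 4 MB. */
    static size_t
    pick_chunk_size(size_t chunk)
    {
        size_t sizes[MAXPAGESIZES];
        int i, n;

        n = getpagesizes(sizes, MAXPAGESIZES);
        for (i = 0; i < n; i++) {
            if (sizes[i] > chunk && sizes[i] <= (size_t)4 * 1024 * 1024)
                chunk = sizes[i];
        }
        return (chunk);
    }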
2009-09-05  Konstantin Belousov: Handle zero size for posix_memalign. Return NULL or unique address
according to the 'V' option. PR: standards/138307 MFC after: 1 week Notes: svn path=/head/; revision=196861
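Usage sketch of the zero-size case; whether NULL or a unique pointer comes back depends on the 'V' option, and free() is safe either way:

    #include <stdlib.h>

    static void
    zero_size_example(void)
    {
        void *p = NULL;

        if (posix_memalign(&p, 16, 0) == 0) {
            /* p is either NULL or a unique, free()-able address,
             * depending on the 'V' MALLOC_OPTIONS flag. */
            free(p);    /* free(NULL) is a no-op */
        }
    }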
2008-12-01  Jason Evans: Fix a lock order reversal bug that could cause deadlock during fork(2).
Reported by: kib Notes: svn path=/head/; revision=185514
2008-11-30  Jason Evans: Adjust an assertion to handle the case where a lock is contested, but
spinning is avoided due to running on a single-CPU system. Reported by: stefanf Notes: svn path=/head/; revision=185483
2008-11-30  Jason Evans: Do not spin when trying to lock on a single-CPU system.
Reported by: davidxu Notes: svn path=/head/; revision=185468
2008-11-03  Jason Evans: Revert to preferring mmap(2) over sbrk(2) when mapping memory, due to
potential extreme contention in the kernel for multi-threaded applications on SMP systems. Reported by: kris Notes: svn path=/head/; revision=184602
2008-09-10  Jason Evans: Use PAGE_{SIZE,MASK,SHIFT} from machine/param.h rather than hard-coding
page size and using sysconf(3). Suggested by: marcel Notes: svn path=/head/; revision=182906
2008-09-06  Marcel Moolenaar: Unbreak ia64: pages are 8KB.
Notes: svn path=/head/; revision=182809
2008-08-27  Jason Evans: Add thread-specific caching for small size classes, based on magazines.
This caching allows for completely lock-free allocation/deallocation in the steady state, at the expense of likely increased memory use and fragmentation.

Reduce the default number of arenas to 2*ncpus, since thread-specific caching typically reduces arena contention.

Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced, cacheline-spaced, and subpage-spaced size classes. The advantages are: fewer size classes, reduced false cacheline sharing, and reduced internal fragmentation for allocations that are slightly over 512, 1024, etc.

Increase RUN_MAX_SMALL, in order to limit fragmentation for the subpage-spaced size classes.

Add a size-->bin lookup table for small sizes to simplify translating sizes to size classes. Include a hard-coded constant table that is used unless custom size class spacing is specified at run time.

Add the ability to disable tiny size classes at compile time via MALLOC_TINY.

Notes: svn path=/head/; revision=182225
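An illustrative sketch of a size-to-bin lookup table for small requests; the spacing and bin indexes here are made up, not jemalloc's actual size classes:

    #include <stddef.h>

    #define QUANTUM     16
    #define SMALL_MAX   512

    /* Indexed by (size - 1) / QUANTUM; values are bin indexes. */
    static const unsigned char size2bin[SMALL_MAX / QUANTUM] = {
         0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23
    };

    static inline unsigned
    size_to_bin(size_t size)        /* valid for 1 <= size <= SMALL_MAX */
    {
        return (size2bin[(size - 1) / QUANTUM]);
    }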
2008-08-14  Jason Evans: Move CPU_SPINWAIT into the innermost spin loop, in order to allow faster
preemption while busy-waiting. Submitted by: Mike Schuster <schuster@adobe.com> Notes: svn path=/head/; revision=181733
2008-08-14  Jason Evans: Re-order the terms of an expression in arena_run_reg_dalloc() to correctly
detect whether the integer division table is large enough to handle the divisor. Before this change, the last two table elements were never used, thus causing the slow path to be used for those divisors. Notes: svn path=/head/; revision=181732
2008-08-08  Colin Percival: Remove variables which are assigned values and never used thereafter.
Found by: LLVM/Clang Static Checker Approved by: jasone Notes: svn path=/head/; revision=181438
2008-07-18  Jason Evans: Enhance arena_chunk_map_t to directly support run coalescing, and use
the chunk map instead of red-black trees where possible. Remove the red-black trees and node objects that are obsoleted by this change. The net result is a ~1-2% memory savings, and a substantial allocation speed improvement. Notes: svn path=/head/; revision=180599
2008-06-10  Jason Evans: In the error path through base_alloc(), release base_mtx [1].
Fix bit vector initialization for run headers. Submitted by: [1] Mike Schuster <schuster@adobe.com> Notes: svn path=/head/; revision=179704
2008-05-01  Jason Evans: Add a separate tree to track arena chunks that contain dirty pages.
This substantially improves worst case allocation performance, since O(lg n) tree search can be used instead of O(n) tree iteration. Use rb_wrap() instead of directly calling rb_*() macros. Notes: svn path=/head/; revision=178709
2008-04-29  Oleksandr Tymoshenko: Set QUANTUM_2POW_MIN and SIZEOF_PTR_2POW parameters for MIPS
Approved by: imp Notes: svn path=/head/; revision=178683
2008-04-29  Jason Evans: Check for integer overflow before calling sbrk(2), since it uses a
signed increment argument, but the size is an unsigned integer. Notes: svn path=/head/; revision=178645
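A sketch of the kind of check described above; the wrapper name is hypothetical:

    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>

    /* sbrk(2) takes a signed increment, so a size above INTPTR_MAX
     * would wrap to a negative value and shrink the data segment
     * instead of growing it. */
    static void *
    sbrk_checked(size_t size)
    {
        if (size > (size_t)INTPTR_MAX)
            return ((void *)-1);    /* mimic sbrk's failure return */
        return (sbrk((intptr_t)size));
    }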
2008-04-23  Jason Evans: Implement red-black trees without using parent pointers, and store the
color bit in the least significant bit of the right child pointer, in order to reduce red-black tree linkage overhead by ~2X as compared to sys/tree.h. Use the new red-black tree implementation in malloc, which drops memory usage by ~0.5 or ~1%, for 32- and 64-bit systems, respectively. Notes: svn path=/head/; revision=178440
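A sketch of the pointer-tagging idea: node alignment leaves the low bit of the right-child pointer free to hold the node's color (names are illustrative, not the actual rb.h macros):

    #include <stdint.h>

    struct node {
        struct node *left;
        struct node *right_red;     /* right child pointer, low bit = red */
    };

    #define RED_BIT ((uintptr_t)1)

    static inline struct node *
    right_get(const struct node *n)
    {
        return ((struct node *)((uintptr_t)n->right_red & ~RED_BIT));
    }

    static inline int
    is_red(const struct node *n)
    {
        return ((int)((uintptr_t)n->right_red & RED_BIT));
    }

    static inline void
    set_red(struct node *n)
    {
        n->right_red = (struct node *)((uintptr_t)n->right_red | RED_BIT);
    }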
2008-03-07  Jason Evans: Remove stale #include <machine/atomic.h>, which was needed by lazy
deallocation. Notes: svn path=/head/; revision=176909
2008-02-17  Jason Evans: Fix a race condition in arena_ralloc() for shrinking in-place large
reallocation, when junk filling is enabled. Junk filling must occur prior to shrinking, since any deallocated trailing pages are immediately available for use by other threads. Reported by: Mats Palmgren <mats.palmgren@bredband.net> Notes: svn path=/head/; revision=176369
2008-02-17  Jason Evans: Remove support for lazy deallocation. Benchmarks across a wide range of
allocation patterns, number of CPUs, and MALLOC_OPTIONS settings indicate that lazy deallocation has the potential to worsen throughput dramatically. Performance degradation occurs when multiple threads try to clear the lazy free cache simultaneously. Various experiments to avoid this bottleneck failed to completely solve this problem, while adding yet more complexity. Notes: svn path=/head/; revision=176368
2008-02-08  Jason Evans: Fix a bug in lazy deallocation that was introduced when
arena_dalloc_lazy_hard() was split out of arena_dalloc_lazy() in revision 1.162. Reduce thundering herd problems in lazy deallocation by randomly varying how many probes a thread does before taking the slow path. Notes: svn path=/head/; revision=176103
2008-02-08  Jason Evans: Clean up manipulation of chunk page map elements to remove some tenuous
assumptions about whether bits are set at various times. This makes adding other flags safe. Reorganize functions in order to inline i{m,c,p,s,re}alloc(). This allows the entire fast-path call chains for malloc() and free() to be inlined. [1] Suggested by: [1] Stuart Parmenter <stuart@mozilla.com> Notes: svn path=/head/; revision=176100
2008-02-06  Jason Evans: Track dirty unused pages so that they can be purged if they exceed a
threshold, according to the 'F' MALLOC_OPTIONS flag. This obsoletes the 'H' flag.

Try to realloc() large objects in place. This substantially speeds up incremental large reallocations in the common case.

Fix a bug in arena_ralloc() that caused relocation of sub-page objects even if the old and new sizes were in the same size class.

Maintain trees of runs and simplify the per-chunk page map. This allows logarithmic-time searching for sufficiently large runs in arena_run_alloc(), whereas the previous algorithm required linear time in the worst case.

Break various large functions into smaller sub-functions, and inline only the functions that are in the fast path for small object allocation/deallocation.

Remove an unnecessary check in base_pages_alloc_mmap().

Avoid integer division in choose_arena() for the NO_TLS case on single-CPU systems.

Notes: svn path=/head/; revision=176022
2008-01-03  Jason Evans: Enable both sbrk(2)- and mmap(2)-based memory acquisition methods by
default. This has the disadvantage of rendering the datasize resource limit irrelevant, but without this change, legitimate uses of more memory than will fit in the data segment are thwarted by default. Fix chunk_alloc_mmap() to work correctly if initial mapping is not chunk-aligned and mapping extension fails. Notes: svn path=/head/; revision=175075
2007-12-31  Jason Evans: Fix a major chunk-related memory leak in chunk_dealloc_dss_record(). [1]
Clean up DSS-related locking and protect all pertinent variables with dss_mtx (remove dss_chunks_mtx). This fixes race conditions that could cause chunk leaks. Reported by: [1] kris Notes: svn path=/head/; revision=175011
2007-12-31  Jason Evans: Fix a bug related to sbrk() calls that could cause address space leaks.
This is a long-standing bug, but until recent changes it was difficult to trigger, and even then its impact was non-catastrophic, with the exception of revision 1.157.

Optimize chunk_alloc_mmap() to avoid the need for unmapping pages in the common case. Thanks go to Kris Kennaway for a patch that inspired this change.

Do not maintain a record of previously mmap'ed chunk address ranges. The original intent was to avoid the extra system call overhead in chunk_alloc_mmap(), which is no longer a concern. This also allows some simplifications for the tree of unused DSS chunks.

Introduce huge_mtx and dss_chunks_mtx to replace chunks_mtx. There was no compelling reason to use the same mutex for these disjoint purposes.

Avoid memset() for huge allocations when possible.

Maintain two trees instead of one for tracking unused DSS address ranges. This allows scalable allocation of multi-chunk huge objects in the DSS. Previously, multi-chunk huge allocation requests failed if the DSS could not be extended.

Notes: svn path=/head/; revision=175004
2007-12-28  Jason Evans: Back out premature commit of previous version.
Notes: svn path=/head/; revision=174957
2007-12-28  Jason Evans: Maintain two trees instead of one (old_chunks --> old_chunks_{ad,szad}) in
order to support re-use of multi-chunk unused regions within the DSS for huge allocations. This generalization is important to correct function when mmap-based allocation is disabled. Avoid zeroing re-used memory in the DSS unless it really needs to be zeroed. Notes: svn path=/head/; revision=174956
2007-12-28  Jason Evans: Release chunks_mtx for all paths through chunk_dealloc().
Reported by: kris Notes: svn path=/head/; revision=174953
2007-12-27  Jason Evans: Add the 'D' and 'M' run time options, and use them to control whether
memory is acquired from the system via sbrk(2) and/or mmap(2). By default, use sbrk(2) only, in order to support traditional use of resource limits. Additionally, when both options are enabled, prefer the data segment to anonymous mappings, in order to coexist better with large file mappings in applications on 32-bit platforms. This change has the potential to increase memory fragmentation due to the linear nature of the data segment, but from a performance perspective this is mitigated by the use of madvise(2). [1]

Add the ability to interpret integer prefixes in MALLOC_OPTIONS processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as MALLOC_OPTIONS=9l.

Reported by: [1] rwatson
Design review: [1] alc, peter, rwatson
Notes: svn path=/head/; revision=174950
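A sketch of the integer-prefix handling, so that "9l" behaves like "lllllllll"; the per-flag handler is hypothetical:

    #include <ctype.h>
    #include <stdlib.h>

    static void
    parse_opts(const char *opts, void (*handle)(char flag))
    {
        const char *p = opts;

        while (*p != '\0') {
            unsigned long repeat = 1;

            if (isdigit((unsigned char)*p)) {
                char *end;

                repeat = strtoul(p, &end, 10);
                p = end;
                if (*p == '\0')
                    break;          /* trailing digits, no flag letter */
            }
            while (repeat-- > 0)
                handle(*p);         /* e.g. 'D', 'M', 'l', ... */
            p++;
        }
    }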
2007-12-18  Jason Evans: Use fixed point integer math instead of floating point math when
calculating run sizes. Use of the floating point unit was a potential pessimization to context switching for applications that do not otherwise use floating point math. [1] Reformat cpp macro-related comments to improve consistency. Submitted by: das Notes: svn path=/head/; revision=174745
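A sketch of the fixed-point replacement, with 12 fractional bits; the constants shown illustrate the approach rather than the exact values used:

    #include <stddef.h>

    #define RUN_BFP         12              /* binary fixed point, 12 fractional bits */
    #define RUN_MAX_OVRHD   0x0000003dU     /* ~1.5%, i.e. 0x3d / 2^12 */

    /* Was roughly: (double)header_size / run_size <= 0.015 */
    static int
    overhead_ok(size_t header_size, size_t run_size)
    {
        return (((header_size << RUN_BFP) / run_size) <= RUN_MAX_OVRHD);
    }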
2007-12-17  Jason Evans: Refactor features a bit in order to make it possible to disable lazy
deallocation and dynamic load balancing via the MALLOC_LAZY_FREE and MALLOC_BALANCE knobs. This is a non-functional change, since these features are still enabled when possible. Clean up a few things that more pedantic compiler settings would cause complaints over. Notes: svn path=/head/; revision=174695
2007-11-28  Jason Evans: Only zero large allocations when necessary (for calloc()).
Notes: svn path=/head/; revision=174002
2007-11-27  Jason Evans: Implement dynamic load balancing of thread-->arena mapping, based on lock
contention. The intent is to dynamically adjust to load imbalances, which can cause severe contention.

Use pthread mutexes where possible instead of libc "spinlocks" (they aren't actually spin locks). Conceptually, this change is meant only to support the dynamic load balancing code by enabling the use of spin locks, but it has the added apparent benefit of substantially improving performance due to reduced context switches when there is moderate arena lock contention.

Proper tuning parameter configuration for this change is a finicky business, and it is very much machine-dependent. One seemingly promising solution would be to run a tuning program during operating system installation that computes appropriate settings for load balancing. (The pthreads adaptive spin locks should probably be similarly tuned.)

Notes: svn path=/head/; revision=173968