jemalloc

Commit Graph

Author	SHA1	Message	Date
Dan Horák	17881ebbfd	Add configure check for gettid() presence The gettid() function is available on Linux in glibc only since version 2.30. There are supported distributions that still use older glibc version. Thus add a configure check if the gettid() function is available and extend the check in src/prof_stack_range.c so it's skipped also when gettid() isn't available. Fixes: https://github.com/jemalloc/jemalloc/issues/2740	2024-12-17 12:40:54 -08:00
appujee	4b88bddbca	Conditionally remove unreachable for C23+	2024-12-17 12:39:00 -08:00
appujee	d8486b2653	Remove unreachable() macro as c23 already defines it. Taken from https://android-review.git.corp.google.com/c/platform/external/jemalloc_new/+/3316478 This might need more cleanups to remove the definition of JEMALLOC_INTERNAL_UNREACHABLE.	2024-12-17 12:39:00 -08:00
Guangli Dai	587676fee8	Disable psset test when hugepage size is too large.	2024-12-17 12:35:35 -08:00
Guangli Dai	a17385a882	Enable large hugepage tests for arm64 on Travis	2024-12-17 12:35:35 -08:00
Guangli Dai	6786934280	Fix ehooks assertion for arena creation	2024-12-11 13:33:32 -08:00
Dmitry Ilvokhin	46690c9ec0	Fix `test_retained` on boxes with a lot of CPUs We are trying to create `ncpus * 2` threads for this test and place them into `VARIABLE_ARRAY`, but `VARIABLE_ARRAY` can not be more than `VARIABLE_ARRAY_SIZE_MAX` bytes. When there are a lot of threads on the box, the test always fails. ``` $ nproc 176 $ make -j`nproc` tests_unit && ./test/unit/retained <jemalloc>: ../test/unit/retained.c:123: Failed assertion: "sizeof(thd_t) * (nthreads) <= VARIABLE_ARRAY_SIZE_MAX" Aborted (core dumped) ``` There is no need for high concurrency for this test as we are only checking stats there and its behaviour is quite stable regarding number of allocating threads. Limited number of threads to 16 to save compute resources (on CI for example) and reduce tests running time. Before the change (`nproc` is 80 on this box). ``` $ make -j`nproc` tests_unit && time ./test/unit/retained <...> real 0m0.372s user 0m14.236s sys 0m12.338s ``` After the change (same box). ``` $ make -j`nproc` tests_unit && time ./test/unit/retained <...> real 0m0.018s user 0m0.108s sys 0m0.068s ```	2024-12-02 14:12:26 -08:00
Dmitry Ilvokhin	6092c980a6	Expose `psset` state stats When evaluating changes in HPA logic, it is useful to know internal `hpa_shard` state. A great deal of this state is `psset`. Some of the `psset` stats were available, but in disaggregated form, which is not very convenient. This commit exposed `psset` counters to `mallctl` and malloc stats dumps. Example of what the malloc stats dump will look like after the change. HPA shard stats: Pageslabs: 14899 (4354 huge, 10545 nonhuge) Active pages: 6708166 (2228917 huge, 4479249 nonhuge) Dirty pages: 233816 (331 huge, 233485 nonhuge) Retained pages: 686306 Purge passes: 8730 (10 / sec) Purges: 127501 (146 / sec) Hugeifies: 4358 (5 / sec) Dehugifies: 4 (0 / sec) Pageslabs, active pages, dirty pages and retained pages are rows added by this change.	2024-11-21 09:23:32 -08:00
Dmitry Ilvokhin	3820e38dc1	Remove validation for HPA ratios Config validation was introduced at `3aae792b` with the main intention of fixing an infinite purging loop, but it didn't actually fix the underlying problem, just masked it. Later `47d69b4ea` was merged to address the same problem. Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different application dimensions: `hpa_dirty_mult` is applied to active memory on the shard, but `hpa_hugification_threshold` is a threshold for a single pageslab (hugepage). It doesn't make much sense to sum them up together. While it is true that too high a value of `hpa_dirty_mult` and too low a value of `hpa_hugification_threshold` can lead to pathological behaviour, it is true for other options as well. Poor configurations might lead to suboptimal and sometimes completely unacceptable behaviour and that's OK, that is exactly the reason why they are called poor. Other mechanisms exist to prevent extreme behaviour, such as hugifying and then immediately purging a page; see the `hpa_hugify_blocked_by_ndirty` function, which exists to prevent exactly this case. Lastly, the `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is too tight and prevents a lot of valid configurations.	2024-11-20 18:59:07 -08:00
Dmitry Ilvokhin	0ce13c6fb5	Add opt `hpa_hugify_sync` to hugify synchronously Linux 6.1 introduced the `MADV_COLLAPSE` flag to perform a best-effort synchronous collapse of the native pages mapped by the memory range into transparent huge pages. Synchronous hugification might be beneficial for at least two reasons: we are not relying on khugepaged anymore and we get instant feedback if the range wasn't hugified. If the `hpa_hugify_sync` option is on, we'll try to perform a synchronous collapse and, if it wasn't successful, we'll fall back to asynchronous behaviour.	2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin	a361e886e2	Move `je_cv_thp` logic closer to definition	2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin	b82333fdec	Split `stats_arena_hpa_shard_print` function Make multiple functions from `stats_arena_hpa_shard_print` for readability and ease of change in the future.	2024-11-08 12:18:15 -08:00
Dmitry Ilvokhin	b9758afff0	Add `nstime_ms_since` to get time since in ms Milliseconds are used a lot in hpa, so it is convenient to have `nstime_ms_since` function instead of dividing by `MILLION` constantly. For consistency renamed `nstime_msec` to `nstime_ms` as `ms` abbreviation is used much more commonly across codebase than `msec`. ``` $ grep -Rn '_msec' include src | wc -l 2 $ grep -RPn '_ms( |,|:)' include src | wc -l 72 ``` Function `nstime_msec` wasn't used anywhere in the code yet.	2024-11-08 10:37:28 -08:00
Qi Wang	2a693b83d2	Fix the sized-dealloc safety check abort msg.	2024-10-14 10:34:15 -07:00
Qi Wang	6d625d5e5e	Add support for clock_gettime_nsec_np() Prefer clock_gettime_nsec_np(CLOCK_UPTIME_RAW) to mach_absolute_time().	2024-10-14 10:33:27 -07:00
Guangli Dai	397827a27d	Updated jeprof with more symbols to filter.	2024-10-14 10:31:58 -07:00
Qi Wang	02251c0070	Update the configure cache file example in INSTALL.md	2024-10-10 16:41:48 -07:00
Qi Wang	8c2b8bcf24	Update doc to reflect muzzy decay is disabled by default. It has been disabled since 5.2.0 (in #1421).	2024-10-10 16:41:23 -07:00
Nathan Slingerland	edc1576f03	Add safe frame-pointer backtrace unwinder	2024-10-01 11:01:56 -07:00
Ben Niu	3a0d9cdadb	Use MSVC __declspec(thread) for TSD on Windows	2024-09-30 11:33:44 -07:00
Guangli Dai	1c900088c3	Do not support hpa if HUGEPAGE is too large.	2024-09-27 15:34:13 -07:00
Dmitry Ilvokhin	4f4fd42447	Remove `strict_min_purge_interval` option Option `experimental_hpa_strict_min_purge_interval` was expected to be temporary to simplify rollout of a bugfix. Now that the bugfix rollout is complete, it is safe to remove this option.	2024-09-25 11:49:18 -07:00
Qi Wang	6cc42173cb	Assert the mutex is locked within malloc_mutex_assert_owner().	2024-09-23 18:06:07 -07:00
Qi Wang	44db479fad	Fix the lock owner sanity checking during background thread boot. During boot, some mutexes are not initialized yet, plus there's no point taking many mutexes while everything is covered by the global init lock, so the locking assumptions in some functions (e.g. background_thread_enabled_set()) can't be enforced. Skip the lock owner check in this case.	2024-09-23 18:06:07 -07:00
Guangli Dai	0181aaa495	Optimize edata_cmp_summary_compare when __uint128_t is available	2024-09-23 16:23:42 -07:00
roblabla	734f29ce56	Fix compilation with MSVC 2022 On MSVC, log is an intrinsic that doesn't require libm. However, AC_SEARCH_LIBS does not successfully detect this, as it will try to compile a program using the wrong signature for log. Newer versions of MSVC CL detect this and reject the program with the following messages: conftest.c(40): warning C4391: 'char log()': incorrect return type for intrinsic function, expected 'double' conftest.c(44): error C2168: 'log': too few actual parameters for intrinsic function Since log is always available on MSVC (it's been around since the dawn of time), we simply always assume it's there if MSVC is detected.	2024-09-23 10:42:31 -07:00
Qi Wang	de5606d0d8	Fix a missing init value warning caught by static analysis.	2024-09-20 16:56:07 -07:00
Qi Wang	1960536b61	Add malloc_mutex_is_locked() sanity checks.	2024-09-20 16:56:07 -07:00
Qi Wang	3eb7a4b53d	Fix mutex state tracking around pthread_cond_wait(). pthread_cond_wait drops and re-acquires the mutex internally, w/o going through our wrapper. Update the locked state explicitly.	2024-09-20 16:56:07 -07:00
Qi Wang	661fb1e672	Fix the locked flag for malloc_mutex_trylock().	2024-09-20 16:56:07 -07:00
Guangli Dai	db4f0e7182	Add travis tests for arm64.	2024-09-12 15:40:04 -07:00
Nathan Slingerland	8c2e15d1a5	Add malloc_open() / malloc_close() reentrancy safe helpers	2024-09-12 15:38:08 -07:00
Nathan Slingerland	60f472f367	Fix initialization of pop_attempt_results in bin_batching test	2024-09-12 11:36:17 -07:00
Qi Wang	323ed2e3a8	Optimize fast path to allow static size class computation. After inlining at LTO time, many callsites have input size known which means the index and usable size can be translated at compile time. However the size-index lookup table prevents it -- this commit solves that by switching to the compute approach when the size is detected to be a known const.	2024-09-12 11:34:09 -07:00
Qi Wang	c1a3ca3755	Adjust the value width in stats output. Some of the values are accumulative and can reach high after running for long periods.	2024-09-11 14:29:32 -07:00
Qi Wang	3383b98f1b	Check if the huge page size is expected when enabling HPA.	2024-09-04 15:43:59 -07:00
Qi Wang	cd05b19f10	Fix the VM over-reservation on aarch64 w/ larger pages. HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages), in which case it would cause grow_retained / exp_grow to over-reserve VMs. Similarly, make sure the base alloc has a const 2M alignment.	2024-09-04 15:43:59 -07:00
Shirui Cheng	baa5a90cc6	fix nstime_update_mock in arena_decay unit test	2024-08-29 10:50:33 -07:00
Shirui Cheng	7c99686165	Better handle burst allocation on tcache_alloc_small_hard	2024-08-29 10:50:33 -07:00
Shirui Cheng	0c88be9e0a	Regulate GC frequency by requiring a time interval between two consecutive GCs	2024-08-29 10:50:33 -07:00
Shirui Cheng	e2c9f3a9ce	Take locality into consideration when doing GC flush	2024-08-29 10:50:33 -07:00
Shirui Cheng	14d5dc136a	Allow a range for the nfill passed to arena_cache_bin_fill_small	2024-08-29 10:50:33 -07:00
Shirui Cheng	f68effe4ac	Add a runtime option opt_experimental_tcache_gc to guard the new design	2024-08-29 10:50:33 -07:00
Ben Niu	9e123a833c	Leverage new Windows API TlsGetValue2 for performance	2024-08-28 16:50:33 -07:00
Qi Wang	e29ac61987	Limit Cirrus CI to freebsd 15 and 14	2024-08-28 16:33:36 -07:00
Qi Wang	bd0a5b0f3b	Fix static analysis warnings. Newly reported warnings included several reserved macro identifiers, and false-positive used-uninitialized.	2024-08-28 16:03:53 -07:00
Guangli Dai	5b72ac098a	Remove tests for ppc64 on Travis CI.	2024-08-26 09:53:00 -07:00
Shirui Cheng	8c54637f8c	Better trigger race condition in bin_batching unit test	2024-08-23 14:10:04 -07:00
Dmitry Ilvokhin	c7ccb8d7e9	Add `experimental` prefix to `hpa_strict_min_purge_interval` Goal is to make it obvious this option is experimental.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	aaa29003ab	Limit maximum number of purged slabs with option Option `experimental_hpa_max_purge_nhp` introduced for backward compatibility reasons: to make it possible to have behaviour similar to buggy `hpa_strict_min_purge_interval` implementation. When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit to the number of slabs we'll purge on each iteration. Otherwise, we'll purge no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in turn means we might not purge enough dirty pages to satisfy the `hpa_dirty_mult` requirement. Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and `hpa_strict_min_purge_interval` options allows us to have a steady rate of pages returned back to the system. This provides stricter latency guarantees, as the number of `madvise` calls is bounded (and hence the number of TLB shootdowns is limited), in exchange for weaker memory usage guarantees.	2024-08-20 10:02:38 -07:00
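
The purge cap described in the `aaa29003ab` message can be illustrated with a small sketch. This is not jemalloc's implementation: `toy_shard_t`, `toy_should_purge`, `toy_purge_pass`, the percentage form of the dirty multiplier, and the fixed number of dirty pages released per hugepage are all assumptions made for illustration. It only shows how a per-pass limit (with -1 meaning "no limit") can stop a purge pass before the dirty-page target is met.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the purge-relevant state of one shard. */
typedef struct {
    size_t nactive;      /* active pages on the shard */
    size_t ndirty;       /* dirty pages on the shard */
    size_t dirty_per_hp; /* dirty pages released per purged hugepage (simplification) */
} toy_shard_t;

/* Assumed policy: purge while dirty pages exceed dirty_mult_pct percent of active pages. */
static bool
toy_should_purge(const toy_shard_t *s, unsigned dirty_mult_pct) {
    return s->ndirty * 100 > s->nactive * dirty_mult_pct;
}

/*
 * Run one purge pass and return the number of hugepages purged.
 * max_purge_nhp == -1 means no limit; otherwise at most max_purge_nhp
 * hugepages are purged, even if the dirty target is still exceeded.
 */
static size_t
toy_purge_pass(toy_shard_t *s, unsigned dirty_mult_pct, long max_purge_nhp) {
    assert(max_purge_nhp >= -1);
    size_t npurged = 0;
    while (toy_should_purge(s, dirty_mult_pct)) {
        if (max_purge_nhp != -1 && npurged >= (size_t)max_purge_nhp) {
            break; /* cap reached: bounded work, possibly unmet target */
        }
        size_t n = s->dirty_per_hp < s->ndirty ? s->dirty_per_hp : s->ndirty;
        if (n == 0) {
            break; /* nothing left to release; avoid looping forever */
        }
        s->ndirty -= n; /* models one madvise call on one hugepage */
        npurged++;
    }
    return npurged;
}
```

With 1000 active pages, 500 dirty pages, 100 dirty pages released per hugepage and a 25% target, an uncapped pass in this sketch purges three hugepages and reaches the target, while a cap of 2 stops after two and leaves the shard above it. That is the trade-off the message names: bounded `madvise` work per pass in exchange for weaker memory usage guarantees.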


3530 Commits
