The gettid() function is available on Linux in glibc only since version
2.30. There are supported distributions that still use older glibc
version. Thus add a configure check if the gettid() function is
available and extend the check in src/prof_stack_range.c so it's skipped
also when gettid() isn't available.
Fixes: https://github.com/jemalloc/jemalloc/issues/2740
We are trying to create `ncpus * 2` threads for this test and place them
into `VARIABLE_ARRAY`, but `VARIABLE_ARRAY` can not be more than
`VARIABLE_ARRAY_SIZE_MAX` bytes. When there are a lot of threads on the
box test always fails.
```
$ nproc
176
$ make -j`nproc` tests_unit && ./test/unit/retained
<jemalloc>: ../test/unit/retained.c:123: Failed assertion:
"sizeof(thd_t) * (nthreads) <= VARIABLE_ARRAY_SIZE_MAX"
Aborted (core dumped)
```
There is no need for high concurrency for this test as we are only
checking stats there and it's behaviour is quite stable regarding number
of allocating threads.
Limited number of threads to 16 to save compute resources (on CI for
example) and reduce tests running time.
Before the change (`nproc` is 80 on this box).
```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real 0m0.372s
user 0m14.236s
sys 0m12.338s
```
After the change (same box).
```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real 0m0.018s
user 0m0.108s
sys 0m0.068s
```
When evaluating changes in HPA logic, it is useful to know internal
`hpa_shard` state. Great deal of this state is `psset`. Some of the
`psset` stats was available, but in disaggregated form, which is not
very convenient. This commit exposed `psset` counters to `mallctl`
and malloc stats dumps.
Example of how malloc stats dump will look like after the change.
HPA shard stats:
Pageslabs: 14899 (4354 huge, 10545 nonhuge)
Active pages: 6708166 (2228917 huge, 4479249 nonhuge)
Dirty pages: 233816 (331 huge, 233485 nonhuge)
Retained pages: 686306
Purge passes: 8730 (10 / sec)
Purges: 127501 (146 / sec)
Hugeifies: 4358 (5 / sec)
Dehugifies: 4 (0 / sec)
Pageslabs, active pages, dirty pages and retained pages are rows added
by this change.
Config validation was introduced at 3aae792b with main intention to fix
infinite purging loop, but it didn't actually fix the underlying
problem, just masked it. Later 47d69b4ea was merged to address the same
problem.
Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different
application dimensions: `hpa_dirty_mult` applied to active memory on the
shard, but `hpa_hugification_threshold` is a threshold for single
pageslab (hugepage). It doesn't make much sense to sum them up together.
While it is true that too high value of `hpa_dirty_mult` and too low
value of `hpa_hugification_threshold` can lead to pathological
behaviour, it is true for other options as well. Poor configurations
might lead to suboptimal and sometimes completely unacceptable
behaviour and that's OK, that is exactly the reason why they are called
poor.
There are other mechanism exist to prevent extreme behaviour, when we
hugified and then immediately purged page, see
`hpa_hugify_blocked_by_ndirty` function, which exist to prevent exactly
this case.
Lastly, `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is
too tight and prevents a lot of valid configurations.
Linux 6.1 introduced `MADV_COLLAPSE` flag to perform a best-effort
synchronous collapse of the native pages mapped by the memory range into
transparent huge pages.
Synchronous hugification might be beneficial for at least two reasons:
we are not relying on khugepaged anymore and get an instant feedback if
range wasn't hugified.
If `hpa_hugify_sync` option is on, we'll try to perform synchronously
collapse and if it wasn't successful, we'll fallback to asynchronous
behaviour.
Milliseconds are used a lot in hpa, so it is convenient to have
`nstime_ms_since` function instead of dividing to `MILLION` constantly.
For consistency renamed `nstime_msec` to `nstime_ms` as `ms` abbreviation
is used much more commonly across codebase than `msec`.
```
$ grep -Rn '_msec' include src | wc -l
2
$ grep -RPn '_ms( |,|:)' include src | wc -l
72
```
Function `nstime_msec` wasn't used anywhere in the code yet.
Option `experimental_hpa_strict_min_purge_interval` was expected to be
temporary to simplify rollout of a bugfix. Now, when bugfix rollout is
complete it is safe to remove this option.
During boot, some mutexes are not initialized yet, plus there's no point taking
many mutexes while everything is covered by the global init lock, so the locking
assumptions in some functions (e.g. background_thread_enabled_set()) can't be
enforced. Skip the lock owner check in this case.
On MSVC, log is an intrinsic that doesn't require libm. However,
AC_SEARCH_LIBS does not successfully detect this, as it will try to
compile a program using the wrong signature for log. Newer versions of
MSVC CL detects this and rejects the program with the following
messages:
conftest.c(40): warning C4391: 'char log()': incorrect return type for intrinsic function, expected 'double'
conftest.c(44): error C2168: 'log': too few actual parameters for intrinsic function
Since log is always available on MSVC (it's been around since the dawn
of time), we simply always assume it's there if MSVC is detected.
After inlining at LTO time, many callsites have input size known which means the
index and usable size can be translated at compile time. However the size-index
lookup table prevents it -- this commit solves that by switching to the compute
approach when the size is detected to be a known const.
HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages),
in which case it would cause grow_retained / exp_grow to over-reserve VMs.
Similarly, make sure the base alloc has a const 2M alignment.
Option `experimental_hpa_max_purge_nhp` introduced for backward
compatibility reasons: to make it possible to have behaviour similar
to buggy `hpa_strict_min_purge_interval` implementation.
When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit
to number of slabs we'll purge on each iteration. Otherwise, we'll purge
no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in
turn means we might not purge enough dirty pages to satisfy
`hpa_dirty_mult` requirement.
Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and
`hpa_strict_min_purge_interval` options allows us to have steady rate of
pages returned back to the system. This provides a strickier latency
guarantees as number of `madvise` calls is bounded (and hence number of
TLB shootdowns is limited) in exchange to weaker memory usage
guarantees.