[OpenMP][libomp] avoid spin wait and yield on arm64 macOS

This patch changes the default behavior on arm64 macOS to avoid spin
waiting and yielding. (See the “Don’t Keep Threads Active And Idle”
section here:
https://developer.apple.com/documentation/apple-silicon/tuning-your-code-s-performance-for-apple-silicon)

We verified using Instruments traces that the changes improve scheduling
behavior on macOS.

We also collected results using EPCC schedbench
(https://github.com/LangdalP/EPCC-OpenMP-micro-benchmarks), attached
here. They show a reduction in standard deviation and maximum test run
time across all scheduling types. Static scheduling sees dramatic
improvements with these changes: a 2-4x average runtime improvement in
the benchmark.

Differential Revision: https://reviews.llvm.org/D126510
Daniel Douglas 2022-06-24 11:59:22 -05:00 committed by Jonathan Peyton
parent 42bb88e2aa
commit d4a7b8de52
3 changed files with 9 additions and 1 deletions

@@ -3061,6 +3061,8 @@ extern int __kmp_storage_map_verbose_specified;
#if KMP_ARCH_X86 || KMP_ARCH_X86_64
extern kmp_cpuinfo_t __kmp_cpuinfo;
static inline bool __kmp_is_hybrid_cpu() { return __kmp_cpuinfo.flags.hybrid; }
#elif KMP_OS_DARWIN && KMP_ARCH_AARCH64
static inline bool __kmp_is_hybrid_cpu() { return true; }
#else
static inline bool __kmp_is_hybrid_cpu() { return false; }
#endif
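
The hunk above picks the hybrid-CPU answer at compile time on arm64 macOS instead of reading a CPU flag at run time. A minimal standalone sketch of the same selection pattern follows. It assumes the compiler-predefined macros `__APPLE__` and `__aarch64__` as stand-ins for `KMP_OS_DARWIN` and `KMP_ARCH_AARCH64`, and `is_hybrid_cpu_sketch` is a hypothetical name that is not part of libomp.

```cpp
// Sketch only: mirrors the compile-time selection in the hunk above.
// __APPLE__ and __aarch64__ are compiler-predefined macros used here as
// assumed stand-ins for KMP_OS_DARWIN and KMP_ARCH_AARCH64.
#if defined(__APPLE__) && defined(__aarch64__)
// Treated as hybrid unconditionally; no runtime detection is performed.
static inline bool is_hybrid_cpu_sketch() { return true; }
#else
// Every other target in this sketch reports a non-hybrid CPU.
static inline bool is_hybrid_cpu_sketch() { return false; }
#endif
```

Because the branch is resolved by the preprocessor, the function costs nothing at run time and the result cannot be overridden without rebuilding.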

@@ -425,7 +425,13 @@ int __kmp_env_consistency_check = FALSE; /* KMP_CONSISTENCY_CHECK specified? */
// 0 = never yield;
// 1 = always yield (default);
// 2 = yield only if oversubscribed
#if KMP_OS_DARWIN && KMP_ARCH_AARCH64
// Set to 0 for environments where yield is slower
kmp_int32 __kmp_use_yield = 0;
#else
kmp_int32 __kmp_use_yield = 1;
#endif
// This will be 1 if KMP_USE_YIELD environment variable was set explicitly
kmp_int32 __kmp_use_yield_exp_set = 0;
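
The three yield settings described in the comment above (0, 1, 2) can be read as a small decision function. The sketch below is illustrative only and is not the runtime's actual wait code. `YieldPolicy`, `default_yield_policy`, and `should_yield` are hypothetical names, and "oversubscribed" is assumed here to mean more runnable threads than available processors.

```cpp
#include <cstdint>

// Hypothetical names; only the 0/1/2 meanings come from the diff's comment.
enum class YieldPolicy : std::int32_t {
  Never = 0,            // never yield
  Always = 1,           // always yield
  IfOversubscribed = 2  // yield only if oversubscribed
};

// Mirrors the new default: 0 on arm64 macOS, 1 elsewhere.
// __APPLE__ / __aarch64__ are assumed stand-ins for the KMP_* macros.
inline YieldPolicy default_yield_policy() {
#if defined(__APPLE__) && defined(__aarch64__)
  return YieldPolicy::Never;
#else
  return YieldPolicy::Always;
#endif
}

// Assumption: oversubscription means more runnable threads than processors.
inline bool should_yield(YieldPolicy policy, int runnable_threads,
                         int processors) {
  switch (policy) {
  case YieldPolicy::Never:
    return false;
  case YieldPolicy::Always:
    return true;
  case YieldPolicy::IfOversubscribed:
    return runnable_threads > processors;
  }
  return false; // unreachable for valid enumerators
}
```

For example, `should_yield(YieldPolicy::IfOversubscribed, 16, 8)` is true under this model, while the same call with 8 threads on 8 processors is false.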

@@ -8300,7 +8300,7 @@ void __kmp_aux_set_library(enum library_type arg) {
break;
case library_throughput:
if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME)
-__kmp_dflt_blocktime = 200;
+__kmp_dflt_blocktime = KMP_DEFAULT_BLOCKTIME;
break;
default:
KMP_FATAL(UnknownLibraryType, arg);
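
The blocktime changed in this hunk bounds how long an idle thread keeps waiting actively before it goes to sleep. The sketch below illustrates that spin-then-sleep idea with standard C++ primitives. It is not libomp's wait implementation: `SpinThenSleepFlag` and `WaitStats` are hypothetical names, and the blocktime is passed in explicitly rather than read from `KMP_DEFAULT_BLOCKTIME`.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical illustration types; libomp's real wait code is more involved.
struct WaitStats {
  long spins = 0;     // number of busy-wait iterations performed
  bool slept = false; // true if the waiter blocked on the condition variable
};

class SpinThenSleepFlag {
public:
  // Publish the flag and wake any sleeping waiter.
  void set() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ready_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

  // Spin for at most `blocktime`, then block until set() is called.
  // A blocktime of zero skips the spin phase entirely.
  WaitStats wait(std::chrono::milliseconds blocktime) {
    WaitStats stats;
    const auto deadline = std::chrono::steady_clock::now() + blocktime;
    while (std::chrono::steady_clock::now() < deadline) {
      if (ready_.load(std::memory_order_acquire))
        return stats;
      ++stats.spins;
    }
    std::unique_lock<std::mutex> lock(mutex_);
    if (!ready_.load(std::memory_order_acquire)) {
      stats.slept = true;
      cv_.wait(lock,
               [this] { return ready_.load(std::memory_order_acquire); });
    }
    return stats;
  }

private:
  std::atomic<bool> ready_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

With a zero blocktime the waiter never burns cycles spinning, which is the behavior the commit message is aiming for. With a large blocktime such as the 200 that this hunk removes, a waiter may stay active for that long before sleeping.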