llvm-project

Commit Graph

Author	SHA1	Message	Date
Johannes Doerfert	d150152615	[OpenMP] Introduce more fine-grained control over the thread state use We can help optimizations by making sure we use the team state whenever it is clear there is no thread state. To this end we introduce a new state flag (`state::HasThreadState`) and explicit control for the `state::ValueRAII` helpers, including a dedicated "assert equal". Differential Revision: https://reviews.llvm.org/D130113	2022-07-21 12:30:38 -05:00
Johannes Doerfert	7472b42b78	[OpenMP] Use Undef instead of null as pointer for inactive lanes Our conditional writes in the runtime look like this: ``` if (active) *ptr = value; ``` In the RAII we need to assign `ptr` which comes from a lookup call. If a thread that is not the main thread calls lookup with the intention to write the pointer, we'll create a new thread state. As such, we need to avoid calling lookup for inactive threads. We used to use `nullptr` as their `ptr` value but that can cause pessimistic reasoning. We now use `undef` instead. Differential Revision: https://reviews.llvm.org/D130114	2022-07-21 12:28:45 -05:00
Johannes Doerfert	a42361dc1c	[OpenMP] Expose the state in the header to allow non-lto optimizations We used to inline the `lookup` calls such that the runtime had "known" access offsets when it was shipped. With the new static library build it doesn't as the lookup is an indirection we cannot look through. This should help us optimize the code better until we can do LTO for the runtime again. Differential Revision: https://reviews.llvm.org/D130111	2022-07-21 12:28:44 -05:00
Joseph Huber	e01ce4e88a	[Libomptarget] Add checks for CUDA subarchitecture using new info This patch extends the `is_valid_binary` routine to also check if the binary's architecture string matches the one parsed from the runtime. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for CUDA. Depends on D127432 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D127505	2022-07-21 13:20:06 -04:00
Joseph Huber	fbcb1ee7f3	[Libomptarget] Add support for offloading binaries in libomptarget The previous patch changed the linker wrapper to embed the offloading binary format inside the target image instead. This will allow us to more generically bundle metadata with these images, such as requires clauses or the target architecture it was compiled for. I wasn't sure how to handle this best, so I introduced a new type that replaces the old `__tgt_device_image` struct that we can expand inside the runtime library. I made the new `__tgt_device_binary` struct pretty much the same for now. In the future we could change this struct to pretty much be the `OffloadBinary` class. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127432	2022-07-21 13:20:04 -04:00
Joseph Huber	5d8a76feb0	[Libomptarget] Build the device library even if the sm list is empty We previously had some logic that stopped us from building the device runtime if there were no NVPTX architectures provided. This is incorrect because we could have AMDGPU libraries. Even if the lists are empty we should be able to attempt to build these and get dummy output. This will make it much easier for our tooling which expects certain libraries. If the user wishes to disable the library entirely they should use `-DLIBOMPTARGET_BUILD_DEVICERTL_BCLIB=OFF`. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D130266	2022-07-21 10:57:47 -04:00
Joseph Huber	dc52712a06	[Libomptarget] Make libomptarget an LLVM library This patch makes libomptarget depend on LLVM libraries to be built. The reason is that we already have an implicit dependency on LLVM headers for ELF identification and extraction as well as an optional dependency on the LLVMSupport library for time tracing information. Furthermore, there are changes in the future that require using more LLVM libraries, and will heavily simplify some future code as well as open up the large amount of useful LLVM libraries to libomptarget. This will make "standalone" builds of `libomptarget` more difficult for vendors wishing to ship their own. This will require a sufficiently new version of LLVM to be installed on the system that should be picked up by the existing handling for the implicit headers. The things this patch changes are as follows: - `libomptarget.so` links against LLVMSupport and LLVMObject - `libomptarget.so` is a symbolic link to `libomptarget.so.15` - If using a shared library build, user applications will depend on LLVM libraries as well - We can now use LLVM resources in Libomptarget. Note that this patch only changes this to apply to libomptarget itself, not the plugins. Additional patches will be necessary for that. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D129875	2022-07-20 15:58:06 -04:00
Joseph Huber	b5b20164d2	Revert "[Libomptarget] Make libomptarget an LLVM library" This reverts commit `643dfd97d5`. This patch still makes the AMDGPU buildbots unhappy. Reverting for now until the AMD folks figure it out.	2022-07-20 10:18:55 -04:00
Joseph Huber	6b0db92bbd	[Libomptarget] Fix LTO command line in test Summary: The test passed -offload-lto instead of -foffload-lto.	2022-07-20 10:18:55 -04:00
Joseph Huber	643dfd97d5	[Libomptarget] Make libomptarget an LLVM library This patch makes libomptarget depend on LLVM libraries to be built. The reason is that we already have an implicit dependency on LLVM headers for ELF identification and extraction as well as an optional dependency on the LLVMSupport library for time tracing information. Furthermore, there are changes in the future that require using more LLVM libraries, and will heavily simplify some future code as well as open up the large amount of useful LLVM libraries to libomptarget. This will make "standalone" builds of `libomptarget` more difficult for vendors wishing to ship their own. This will require a sufficiently new version of LLVM to be installed on the system that should be picked up by the existing handling for the implicit headers. The things this patch changes are as follows: - `libomptarget.so` links against LLVMSupport and LLVMObject - `libomptarget.so` is a symbolic link to `libomptarget.so.15` - If using a shared library build, user applications will depend on LLVM libraries as well - We can now use LLVM resources in Libomptarget. Note that this patch only changes this to apply to libomptarget itself, not the plugins. Additional patches will be necessary for that. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D129875	2022-07-20 09:52:09 -04:00
Jonathan Peyton	40ce65b5b2	[OpenMP][libomp] Fix affinity warnings and unify under one macro Warnings that occur during affinity initialization are supposed to be guarded by KMP_AFFINITY=nowarnings,noverbose, but some had been missed by this logic. Create one macro for affinity warnings that takes these settings into account. Differential Revision: https://reviews.llvm.org/D125991	2022-07-19 13:10:25 -05:00
AndreyChurbanov	17dcde5f1b	[OpenMP][libomp] Allow reset affinity mask after parallel Added control to reset affinity of primary thread after outermost parallel region to initial affinity encountered before OpenMP runtime was initialized. KMP_AFFINITY environment variable reset/noreset modifier introduced. Default behavior is unchanged. Differential Revision: https://reviews.llvm.org/D125993	2022-07-19 13:05:05 -05:00
Jonathan Peyton	28c8da2965	[OpenMP][libomp] Fix fallthrough attribute detection for Intel compilers icc does not properly detect lack of fallthrough attribute since it defines __GNUC__ > 7 and also icc's __has_cpp_attribute/__has_attribute feature detectors do not properly detect the lack of fallthrough attribute. Differential Revision: https://reviews.llvm.org/D126001	2022-07-19 13:04:25 -05:00
AndreyChurbanov	a01d274fbd	[OpenMP][libomp] Fix /dev/shm pollution after forked child process terminates Made library registration conditional and skipped it in the __kmp_atfork_child handler, postponing it till middle initialization in the child. This fixes the problem for applications that use e.g. popen/pclose, which terminate the forked child process. Differential Revision: https://reviews.llvm.org/D125996	2022-07-19 12:59:58 -05:00
Jon Chesterfield	e46f727b38	Revert "[Libomptarget] Make libomptarget an LLVM library" This reverts commit `70039be627`.	2022-07-19 17:59:45 +01:00
Joseph Huber	70039be627	[Libomptarget] Make libomptarget an LLVM library This patch makes libomptarget depend on LLVM libraries to be built. The reason is that we already have an implicit dependency on LLVM headers for ELF identification and extraction as well as an optional dependency on the LLVMSupport library for time tracing information. Furthermore, there are changes in the future that require using more LLVM libraries, and will heavily simplify some future code as well as open up the large amount of useful LLVM libraries to libomptarget. This will make "standalone" builds of `libomptarget` more difficult for vendors wishing to ship their own. This will require a sufficiently new version of LLVM to be installed on the system that should be picked up by the existing handling for the implicit headers. The things this patch changes are as follows: - `libomptarget.so` links against LLVMSupport and LLVMObject - `libomptarget.so` is a symbolic link to `libomptarget.so.15` - If using a shared library build, user applications will depend on LLVM libraries as well - We can now use LLVM resources in Libomptarget. Note that this patch only changes this to apply to libomptarget itself, not the plugins. Additional patches will be necessary for that. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D129875	2022-07-19 12:33:31 -04:00
Joseph Huber	cdea437057	[Libomptarget] Fix warnings on address space attributes The device runtime uses the address space attribute to control the placement of important constants on the GPU. The changes made in D126061 caused these to start emitting errors as they were not applied to the type. This patch fixes the issues to make the warnings go away. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D129896	2022-07-15 17:21:30 -04:00
Joseph Huber	1f940b69c3	[Libomptarget][NFC] Fix signed comparison warnings Summary: Non-functional change, just fixing some sign comparison warnings by making both match.	2022-07-15 13:22:55 -04:00
Shilei Tian	65ebcee197	[OpenMP] Ignore .eggs file in OpenMP The OMPD patches introduce a GDB plugin. When it is built, it will create a couple of temp files in `.eggs`. This patch adds it to `.gitignore` in case it messes up the git tracking. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D129711	2022-07-14 12:06:50 -04:00
Joseph Huber	b1d574867d	[Libomptarget] Allow static assert to work on 32-bit systems Summary: We use a static assert to make sure that someone doesn't change the size of an argument struct without properly updating all the other logic. This originally only checked the size on a 64-bit system with 8-byte pointers, causing builds on 32-bit systems to fail. This patch allows either pointer size to work. Fixes #56486	2022-07-12 08:05:01 -04:00
Vignesh Balasubramanian	9dc0d6aaa1	Fixing build bot failure due to python-pip unavailability. Commit `51d3f421f4` failed due to missing python-pip on the machine. Now the ompd gdb-plugin code will be skipped with a warning if pip is not available on the machine.	2022-07-12 16:01:59 +05:30
Vignesh Balasubramanian	51d3f421f4	[OpenMP][OMPD] GDB plugin code to leverage libompd to provide debugging support for OpenMP programs. This is 5th of 6 patches started from https://reviews.llvm.org/D100181 This plugin code, when loaded in gdb, adds a few commands like ompd icv, ompd bt, ompd parallel. These commands create an interface for GDB to read the OpenMP runtime through libompd. Reviewed By: @dreachem Differential Revision: https://reviews.llvm.org/D100185	2022-07-12 14:38:41 +05:30
Shilei Tian	e7d998e51e	[NFC][OpenMP][Offloading] Fix compilation warning caused by misuse of `static_cast`	2022-07-08 20:59:37 -04:00
Joseph Huber	269d5c16bc	[Libomptarget][NFC] Move legacy functions to a separate file This patch moves the old legacy interfaces into `libomptarget` to a separate file. These do not need to be included anywhere and are simply provided for backwards compatibility with the ABI. This cleans up the interface greatly. Depends on D128817 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D128818	2022-07-08 14:44:21 -04:00
Joseph Huber	c9353eb4bc	[Libomptarget] Use new tripcount argument in the runtime. The previous patch added an argument to the `__tgt_target_kernel` runtime function which includes the tripcount used for the loop clause. This was originally passed in via the `__kmpc_push_target_tripcount` function. Now we move this logic to the kernel launch itself and remove the need for the push function. Depends on D128816 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D128817	2022-07-08 14:44:19 -04:00
Joseph Huber	ad23e4d85f	[Libomptarget] Implement a unified kernel entry function This patch implements a unified kernel entry function that will be targeted from both teams and non-teams clauses. We introduce a new interface and make the old functions call in using the new one. A following patch will include the necessary changes to Clang to call these new functions instead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D128549	2022-07-08 14:44:06 -04:00
Ye Luo	fca79b78c4	[libomptarget] compile DeviceRTL bc files with -O3 bc files of DeviceRTL are compiled with -O3, the same as the static library. Differential Revision: https://reviews.llvm.org/D129344	2022-07-08 10:00:26 -05:00
Vadim Paretsky	43d5c4d539	[OpenMP] add 4 custom APIs supporting MSVC OMP codegen This check-in adds 4 APIs to support MSVC, specifically: * 3 APIs (__kmpc_sections_init, __kmpc_next_section, __kmpc_end_sections) to support the dynamic scheduling of OMP sections. * 1 API (__kmpc_copyprivate_light, a light-weight version of __kmpc_copyprivate) to support the OMP single copyprivate clause. Differential Revision: https://reviews.llvm.org/D128403	2022-07-05 17:26:18 -05:00
Joseph Huber	d27d0a673c	[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly large clash between the naming conventions used. This patch fixes most of the variable names that did not conform to the LLVM standard, that is `VariableName` for variables and `functionName` for functions. This patch was primarily done using my editor's linting messages; if there are any issues I missed arising from the automation, let me know. Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D128997	2022-07-05 14:53:38 -04:00
Shilei Tian	696bca9bb2	[NFC][OpenMP][CUDA] Remove unnecessary default label	2022-07-01 09:50:29 -04:00
Jose M Monsalve Diaz	616dd9ae14	[OpenMP] Implementing omp_get_device_num() This patch implements omp_get_device_num() in the host and the device. It uses the already existing getDeviceNum in the device config for the device. And in the host it uses the omp_get_num_devices(). Two simple tests added Differential Revision: https://reviews.llvm.org/D128347	2022-06-29 02:18:21 -05:00
Shilei Tian	2695e23ad9	[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122764	2022-06-28 15:32:03 -04:00
Daniel Douglas	d4a7b8de52	[OpenMP][libomp] avoid spin wait and yield on arm64 macOS This patch changes the default behavior to avoid spin waiting and yielding. (See “Don’t Keep Threads Active And Idle” section here: https://developer.apple.com/documentation/apple-silicon/tuning-your-code-s-performance-for-apple-silicon) We verified using Instruments traces that the changes improve scheduling behavior on macOS. We also collected results using EPCC schedbench (https://github.com/LangdalP/EPCC-OpenMP-micro-benchmarks) that are attached here that show a reduction in standard deviation and max test run time across all scheduling types. Static scheduling sees dramatic improvements with these changes: we see a 2-4x average runtime improvement in the benchmark. Differential Revision: https://reviews.llvm.org/D126510	2022-06-24 12:02:16 -05:00
Jonathan Peyton	b7b4986576	[OpenMP][libomp] Hold old __kmp_threads arrays until library shutdown When many nested teams are formed, __kmp_threads may be reallocated to accommodate new threads. This reallocation causes a data race when another existing team's thread simultaneously references __kmp_threads. This patch keeps the old thread arrays around until library shutdown so these lingering references can complete without issue and access to __kmp_threads remains a simple array reference. Fixes: https://github.com/llvm/llvm-project/issues/54708 Differential Revision: https://reviews.llvm.org/D125013	2022-06-22 10:30:35 -05:00
Joseph Huber	3351ae61d9	[Libomptarget] Remove duplicate data environment exit Summary: This patch removes a duplicated exit from the OpenMP data environment. We already have an RAII method that guards this environment so it is unnecessary.	2022-06-21 22:35:32 -04:00
Ye Luo	4d9499e8cc	[libomptarget] Make libomptarget.devicertl.a built in all cases. Make libomptarget.devicertl.a built when using -DLLVM_ENABLE_PROJECTS=openmp. Use add_custom_command. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D128130	2022-06-20 08:29:16 -05:00
Ye Luo	54b45afb59	[libomptarget]Add a trap for external omptarget from LLVM Old LLVM installation may expose its internal omptarget CMake target when being used by find_package(LLVM) and caused issues in the CMake of libomptarget that is being built. Trap the issue early. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D128129	2022-06-18 21:08:53 -05:00
Joseph Huber	d87ca519c9	[Libomptarget] Use binutils archive executable to address failing tests Summary: The static linking test ensures that we can statically link offloading programs. To create the test we used `llvm-ar`. However, this may not exist in the user's environment. This patch changes it to use the binutils `ar` which should exist on every system running these tests currently. In the future we should set up the dependencies properly.	2022-06-14 22:14:17 -04:00
Joseph Huber	d5d836635c	[Libomptarget] Add test config for compiling in LTO-mode We are planning on making LTO the default compilation mode for offloading. In order to make sure it works we should run these tests on the test suite. AMDGPU already uses the LTO compilation path for its linking, but in LTO mode it also links the static library late. Performing LTO requires the static library to be built, if we make the change this will be a hard requirement and the old bitcode library will go away. This means users will need to use either a two-step build or a runtimes build for libomptarget. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127512	2022-06-14 10:16:03 -04:00
John Ericson	0bb317b7bf	Revert "[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore" This reverts commit `d5daa5c5b0`.	2022-06-10 19:26:12 +00:00
John Ericson	d5daa5c5b0	[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as `CMAKE_INSTALL_BINDIR` becomes an absolute path, and then when downstream projects try to install there too this breaks because our builds always install to fresh directories for isolation's sake. Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the other specially crafted `LLVM_CONFIG_*` variables substituted in `llvm/cmake/modules/LLVMConfig.cmake.in`. @beanz added it in `d0e1c2a550` to fix a dangling reference in `AddLLVM`, but I am suspicious of how this variable doesn't follow the pattern. Those other ones are carefully made to be build-time vs install-time variables depending on which `LLVMConfig.cmake` is being generated, are carefully made relative as appropriate, etc. etc. For my NixOS use-case they are also fine because they are never used as downstream install variables, only for reading not writing. To avoid the problems I face, and restore symmetry, I deleted the exported and arranged to have many `${project}_TOOLS_INSTALL_DIR`s. `AddLLVM` now instead expects each project to define its own, and they do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports `LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in the usual way, matching the other remaining exported variables. For the `AddLLVM` changes, I tried to copy the existing pattern of internal vs non-internal or for LLVM vs for downstream function/macro names, but it would good to confirm I did that correctly. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117977	2022-06-10 14:35:18 +00:00
Yuki Okushi	074f12e467	[OpenMP] Fix the build on Windows The code expanded from kmp_barrier.h uses some `KMP_INTERNAL_*`s, so the definitions have to be placed before it. Fixes #55815 Differential Revision: https://reviews.llvm.org/D126873	2022-06-09 22:12:42 +09:00
Jose Manuel Monsalve Diaz	15ed5c0a07	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added. Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-09 11:58:39 +00:00
Jose Manuel Monsalve Diaz	84e020a061	Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info" This reverts commit `d16a0877d8`.	2022-06-09 10:46:03 +00:00
Jose Manuel Monsalve Diaz	d16a0877d8	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added. Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-08 16:31:12 +00:00
Joseph Huber	86a4c78047	[Libomptarget] Add missing include to define `printf` Summary: This test was failing because of an implicit declaration of `printf` which isn't legal with newer C, causing it to fail. This patch just adds the necessary header.	2022-06-08 09:56:51 -04:00
Joseph Huber	421b1f55c6	[Libomptarget] Do not use retaining attributes for the static library When we build the libomptarget device runtime library targeting bitcode, we need special care to make sure that certain functions are not optimized out. This is because we manually internalize and optimize these definitions, ignoring their standard linkage semantics. When we build with the static library, we can maintain these semantics and we do not need these to be kept-alive. Furthermore, if they are kept-alive it prevents them from being removed during LTO. This prevents us from completely internalizing `IsSPMDMode` and removing several other functions. This patch removes these for the static library target by using a macro definition to enable them. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D126701	2022-06-07 12:16:34 -04:00
Vadim Paretsky	f58fe2e186	[OpenMP] allow loc to be NULL in __kmp_determine_reduction_method for MSVC MSVC may not supply source location information to kmpc_reduce passing NULL for the value. The patch adds a check for the loc value being NULL in kmp_determine_reduction_method. Differential Revision: https://reviews.llvm.org/D126564	2022-06-03 14:11:39 -05:00
Daniel Douglas	5d25dbff67	[OpenMP][libomp] do not try to dlopen libmemkind on macOS The memkind library is only available for linux. Calling dlopen here can also be problematic in a client app that fork'ed. Differential Revision: https://reviews.llvm.org/D126579	2022-06-02 14:28:09 -05:00
David CARLIER	2ba5d820e2	[OpenMP] omp_get_proc_id uses sched_getcpu fallback on FreeBSD 13.1 and above. Reviewers: jlpeyton, jdoerfert Reviewed-By: jlpeyton Differential-Revision: https://reviews.llvm.org/D126408	2022-06-02 17:10:29 +01:00
Mikael Simberg	e27ce28139	[OpenMP][libomp] Make LIBOMP_CONFIGURED_LIBFLAGS a list instead of string When configuring llvm with the openmp subproject, the build for the omp target fails if LIBOMP_CONFIGURED_LIBFLAGS contains more than one item. LIBOMP_CONFIGURED_LIBFLAGS should be a semicolon-separated list instead of a string with items separated by spaces. Differential Revision: https://reviews.llvm.org/D125370	2022-06-02 10:50:21 -05:00
Joseph Huber	f4f23de1a4	[Libomptarget] Add basic support for dynamic shared memory on AMDGPU This patch adds the arguments necessary to allocate the size of the dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable. This patch only allocates the memory; AMDGPU has a limitation that shared memory can only be accessed from the kernel directly. So this will currently only work with optimizations to inline the accessor function. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D125252	2022-06-01 13:32:50 -04:00
Joseph Huber	ae76652677	Revert "[Libomptarget] Add `leaf` attribute to `vprintf` declaration" This is preventing users from calling `printf` on NVPTX code. Revert for now until there is a fix. This reverts commit `eda4ef3add`.	2022-05-31 10:24:04 -04:00
Joel E. Denny	d2e3cb7374	[OpenMP][Clang] Fix atomic compare for signed vs. unsigned Without this patch, arguments to the `llvm::OpenMPIRBuilder::AtomicOpValue` initializer are reversed. Reviewed By: ABataev, tianshilei1992 Differential Revision: https://reviews.llvm.org/D126619	2022-05-30 11:02:20 -04:00
Joel E. Denny	4a36813669	[OpenACC][OpenMP] Document atomic-in-teams extension That is, put D126323 in the status doc and explain its relationship to OpenACC support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D126547	2022-05-27 18:53:19 -04:00
Joel E. Denny	48ca3a5ebb	[OpenMP] Extend omp teams to permit nested omp atomic OpenMP 5.2, sec. 10.2 "teams Construct", p. 232, L9-12 restricts what regions can be strictly nested within a `teams` construct. This patch relaxes Clang's enforcement of this restriction in the case of nested `atomic` constructs unless `-fno-openmp-extensions` is specified. Cases like the following then seem to work fine with no additional implementation changes: ``` #pragma omp target teams map(tofrom:x) #pragma omp atomic update x++; ``` Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D126323	2022-05-26 14:59:16 -04:00
Joseph Huber	20ec4161d7	[Libomptarget] Add branch prediction intrinsic to state check Summary: We usually used the `OMP_LIKELY` and `OMP_UNLIKELY` macros to add branch prediction intrinsics to help the optimizer ignore unlikely loops. This wasn't applied to this one loop so add that in.	2022-05-20 15:38:54 -04:00
Jonathan Peyton	f613e6d19d	[OpenMP][libomp] Fix accidental removal of else for core attributes	2022-05-19 14:00:27 -05:00
Joseph Huber	eda4ef3add	[Libomptarget] Add `leaf` attribute to `vprintf` declaration Summary: This patch adds the `leaf` attribute to the `vprintf` declaration in the OpenMP runtime. This attribute allows us to determine that the `vprintf` function will not call any functions within the translation unit, allowing us to deduce `norecurse` attributes on the caller.	2022-05-19 14:22:53 -04:00
AndreyChurbanov	c44ba01de7	[OpenMP] libomp: honor passive wait policy requested with tasking Currently the library ignores the requested wait policy in the presence of tasking. Threads always actively spin. The patch fixes this problem by making the wait policy passive if this is explicitly requested by the user. Differential Revision: https://reviews.llvm.org/D123044	2022-05-18 10:04:30 -05:00
Joseph Huber	5ffecd28c9	[Libomptarget] Don't build the device runtime without a new Clang The OpenMP device offloading library is a bitcode library and thus is only expected to be built and linked with the same version of Clang that was used to create it. This somewhat complicates the build process as we require the Clang that was just built to be used to create the library. This is either done with a two-step build, where OpenMP is built with the Clang that was just installed, or through the `-DLLVM_ENABLE_RUNTIMES=openmp` option. This has always been the case, but recent changes have made it difficult to build the rest of OpenMP. This patch adds a check to not build the OpenMP device runtime if the current compiler is not Clang with the same version as the LLVM installation. This should allow users to build OpenMP as a project using any compiler without it erroring out due to the bitcode library, but if users require it they will need to use the above methods to compile it. Reviewed By: jdoerfert, tianshilei1992, ye-luo Differential Revision: https://reviews.llvm.org/D125698	2022-05-16 18:18:32 -04:00
Joseph Huber	54e02179b3	[Libomptarget] Build the static library without CUDA installed Summary: This patch allows users to compile the static library without CUDA installed on the system. This requires the new flag `--cuda-feature` to indicate that we need `+ptx61` in order to compile the runtime.	2022-05-13 16:30:58 -04:00
Joseph Huber	16b7a0b43b	[Libomptarget] Build the device runtime as a static library This patch adds the necessary CMake configuration to build a static library version of the device runtime, `libomptarget.devicertl.a`. Various improvements in how we handle static libraries and generating offloading code should allow us to treat the device library as a regular project without needing to invoke the clang front-end directly. Here we generate a job for each offloading architecture supported. Each offloading architecture will be embedded into the static library and used as-needed by the host. This library will primarily be used to replace the bitcode library when performing LTO. Currently, we need to manually pass in the bitcode library which requires foreknowledge of the offloading architecture. This approach lets us handle that in the linker wrapper instead. Furthermore this should improve our interface to the device runtime. We can now build it fully under a release build and have all the expected entry points, as well as supporting debug builds. Depends on D125265 D125256 D125260 D125314 D125563 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D125315	2022-05-13 14:38:51 -04:00
Joseph Huber	9ffa945c40	[Libomptarget] Remove global include directory from libomptarget We used to globally include the libomptarget include directory for all projects. This caused some conflicts with the other files named "Debug.h". This patch changes the cmake to include these files via the target include instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D125563	2022-05-13 14:38:47 -04:00
Joseph Huber	ce0caf41bd	[Libomptarget] Address existing warnings in the device runtime library This patch attempts to address the current warnings in the OpenMP offloading device runtime. Previously we did not see these because we compiled the runtime without the standard warning flags enabled. However, these warnings are used when we now build the static library version of this runtime. This became extremely noisy when coupled with the fact that we compile each file roughly 32 times when all the architectures are considered. So it would be ideal to not have all these warnings show up when building. Most of these errors were simply implicit switch-case fallthroughs, which can be addressed using C++17's fallthrough attribute. Additionally there was a volatile variable that was being cast away. This is most likely safe to remove because we cast it away before it's even used and it didn't seem to affect anything in testing. Depends on D125260 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D125339	2022-05-13 14:38:31 -04:00
Joseph Huber	b4f8443d97	[Libomptarget] Allow the device runtime to be compiled for the host Currently the OpenMP offloading device runtime is only expected to be compiled for the specific architecture it's targeting. This is problematic if we want to make compiling the device runtime more general via the standard `clang` driver rather than invoking the clang front-end directly. This patch addresses this by primarily changing the declare type to `nohost` so the host will not contain any of this code. Additionally we forward declare the functions that are defined via variants, otherwise these would cause problems on the host. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D125260	2022-05-13 14:38:27 -04:00
serge-sans-paille	40d3a0ba4d	[openmp] Fix strict aliasing issue in cmpxchg routine Avoid warning under -fstrict-aliasing by using a call to memcpy to perform type punning. Differential Revision: https://reviews.llvm.org/D125467	2022-05-12 16:14:48 +02:00
AndreyChurbanov	52d0ef3c00	[OpenMP] libomp: Add itt notifications to sync dependent tasks. Intel Inspector uses itt notifications to analyze code execution, and it reports race conditions in dependent tasks. This patch fixes the issue notifying Inspector on tasks dependency synchronizations. Differential Revision: https://reviews.llvm.org/D123042	2022-05-05 11:30:59 -05:00
AndreyChurbanov	4a64bed216	[OpenMP] libomp: cleanup - remove duplicate check The identical check remains 20 lines above in the code. Differential Revision: https://reviews.llvm.org/D123046	2022-05-05 11:01:20 -05:00
AndreyChurbanov	eed0d85152	[OpenMP] libomp: cleanup dead code Differential Revision: https://reviews.llvm.org/D123047	2022-05-05 10:56:49 -05:00
Hansang Bae	7e23b46ab8	[OpenMP] Possible fix for sporadic test failure from loop_dispatch.c This patch tries to fix sporadic test failure after the change https://reviews.llvm.org/D122107. Made the test wait until every thread has at least one loop iteration. Differential Revision: https://reviews.llvm.org/D124812	2022-05-03 14:46:32 -05:00
Joseph Huber	5ad07ac400	[Libomptarget] Use entry name for global info Currently, globals on the device will have an infinite reference count and an unknown name when using `LIBOMPTARGET_INFO` to print the mapping table. We already store the name of the global in the offloading entry so we should be able to use it, although there will be no source location. To do this we need to create a valid `ident_t` string from a name only. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D124381	2022-04-25 09:56:43 -04:00
Ye Luo	8a880db519	[libomptarget] Make omp_target_is_present check storage instead of zero length array. Checking whether a pointer has been mapped can be achieved via omp_get_mapped_ptr. omp_target_is_present is more useful for checking whether the storage being pointed to is mapped. This restores the old behavior of omp_target_is_present before D123093. Fixes https://github.com/llvm/llvm-project/issues/54899 Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D123891	2022-04-22 17:37:06 -05:00
Ye Luo	91ccd8248c	[Clang][OpenMP] libompd: get libomp hwloc includedir by target_link_libraries When hwloc is used and is installed outside of the default paths, the omp CMake target needs to provide the needed include path through the CMake target by adding it with target_include_directories, so libompd gets it as well when it defines its CMake target using target_link_libraries. As suggested in D122667. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D123888	2022-04-22 17:33:41 -05:00
Joseph Huber	f557bb8733	[OpenMP][Docs] Remove usage of deprecated flag in documentation Summary: This documentation used the `-fopenmp-target-new-runtime` flag which is deprecated and has no effect. Remove it.	2022-04-21 18:50:25 -04:00
Atmn Patel	c44420e90d	[Libomptarget][remote] Add OpenMP linker flag to the plugin The remote offloading server and plugin rely on OpenMP, so this needs to be added as a linker flag. Without this, applications segfault. Differential Revision: https://reviews.llvm.org/D124200	2022-04-21 15:45:29 -04:00
Atmn Patel	489894f363	[Libomptarget][remote] Fix compile-time error This fixes a compile-time error recently introduced within the remote offloading plugin. This patch also removes some extra linker flags that are unnecessary, and adds an explicit abseil linker flag without which we occasionally get problems. Differential Revision: https://reviews.llvm.org/D119984	2022-04-19 16:46:01 -04:00
Joseph Huber	80787213ea	[Libomptarget] Fix test using old unsupported lit string Summary: One test had an old "unsupported" string that used the old `newDriver` string which was removed. This test should be updated to use the `oldDriver` one instead.	2022-04-18 23:08:12 -04:00
Joseph Huber	ae23be84cb	[OpenMP] Make the new offloading driver the default Previously an opt-in flag `-fopenmp-new-driver` was used to enable the new offloading driver. After passing tests for a few months it should be sufficiently mature to flip the switch and make it the default. The new offloading driver is now enabled if there is OpenMP and OpenMP offloading present and the new `-fno-openmp-new-driver` is not present. The new offloading driver has three main benefits over the old method: - Static library support - Device-side LTO - Unified clang driver stages Depends on D122683 Differential Revision: https://reviews.llvm.org/D122831	2022-04-18 15:05:09 -04:00
Joseph Huber	ba01306009	[Libomptarget] Fix LIBOMPTARGET_INFO test Summary: A patch added a new line to one of the info outputs without updating this test. This patch adds the new text to the existing test.	2022-04-18 14:09:02 -04:00
Dhruva Chakrabarti	7086a1db80	[libomptarget] [amdgpu] Hostcall offset check should consider implicit args Fixed hostcall offset check to compare against kernarg segment size and implicit arguments. Improved the corresponding debug print. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D123827	2022-04-15 00:53:47 +00:00
Saiyedul Islam	54a6cc3405	[libomptarget][amdgpu] Add hidden_heap_v1 kernarg metadata Code object version 5 adds support of hidden_heap_v1 kernarg metadata field [1]. It is a global address space pointer to an initialized memory buffer that conforms to the requirements of the malloc/free device library V1 version implementation. [1] https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v5 Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D123527	2022-04-13 03:57:29 +00:00
Johannes Doerfert	a3a42c3ca2	[OpenMP][FIX] Ensure to set the context for wait events if necessary Differential Revision: https://reviews.llvm.org/D123445	2022-04-12 16:42:50 -05:00
Jonathan Peyton	d49ce7c356	[OpenMP][libomp] Replace global variable references with local object Remove references to global __kmp_topology within a kmp_topology_t object method. There should just be implicit references to the private object.	2022-04-12 12:50:41 -05:00
Jonathan Peyton	747a490612	[OpenMP][libomp] Fix some Doxygen issues Fix spelling of variable names and remove accidental references (#) in Doxygen comments.	2022-04-12 11:05:30 -05:00
Joseph Huber	2e0cb61570	[OpenMP] Fix linker error when building info tool Summary: The changes made in D123177 added new targets to the `LIBOMPTARGET_TESTED_PLUGINS` variable which was linked against when building the `llvm-omp-target-info` tool. This caused linker errors on the export scripts. This patch removes that dependency, it still builds and runs as expected so I will assume it's correct.	2022-04-08 10:50:31 -04:00
Ye Luo	c1a6fe196d	[libomptarget] Implement pointer lookup as 5.1 spec. As described in 5.1 spec 2.21.7.2 Pointer Initialization for Device Data Environments Reviewed By: RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D123093	2022-04-07 23:01:25 -05:00
Joseph Huber	a3f423cf57	[OpenMP] Add dynamic memory function to omp.h and add documentation This patch adds the `llvm_omp_target_dynamic_shared_alloc` function to the `omp.h` header file so users can access it by default. Also changed the name to keep it consistent with the other target allocators. Added some documentation so users know how to use it. Didn't add the interface for Fortran since there's no way to test it right now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D123246	2022-04-07 14:23:23 -04:00
Joseph Huber	840c040498	[OpenMP] Change target memory tests to use allocators The target allocators have been supported for NVPTX offloading for a while. The tests should use the allocators instead of calling the functions manually. Also the comments indicating these being a preview should be removed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D123242	2022-04-07 14:23:14 -04:00
Michael Kruse	7fa7b0cbd8	[libomptarget] Add device RTL to regression test dependencies. In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin rtl targets `omptarget.rtl.cuda` and `omptarget.rtl.amdgpu`, respectively) are dependencies of the regression tests. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D123177	2022-04-06 20:01:47 -05:00
Hansang Bae	090309d316	[OpenMP] Fix warnings Silenced compiler warnings after pushing the following change. https://reviews.llvm.org/D122107 Differential Revision: https://reviews.llvm.org/D123233	2022-04-06 12:35:05 -05:00
Hansang Bae	e4ac11beb7	[OpenMP] Add support for ompt_callback_dispatch This change adds support for ompt_callback_dispatch with the new dispatch chunk type introduced in 5.2. Definitions of the new ompt_work_loop types were also added in the header file. Differential Revision: https://reviews.llvm.org/D122107	2022-04-06 08:01:02 -05:00
Jonathan Peyton	d345fe7c22	[OpenMP][libomp] NFC: Move omp_* functions out of kmp_* section	2022-03-31 13:39:30 -05:00
Joachim Protze	7641e42def	[OpenMP][Tools] Fix handling of initial-task-end Latest OpenMP spec says parallel_data is NULL for initial/implicit-task-end. We nevertheless need to cleanup the ParallelData here, as there is no other callback for the end of the implicit parallel region. We can use the reference stored in the TaskData. Reviewed By: dreachem Differential Revision: https://reviews.llvm.org/D114005	2022-03-31 12:33:40 -05:00
Ron Lieberman	95eac47260	[libomptarget] x86 offloading fails map_back_race.cpp intermittently Differential Revision: https://reviews.llvm.org/D122658	2022-03-29 16:01:17 +00:00
Johannes Doerfert	b803f06901	[OpenMP] The test does not have check lines	2022-03-29 00:02:55 -05:00
Johannes Doerfert	b309bdb970	[OpenMP][FIX] Use clang++ for the C++ test case	2022-03-28 23:14:24 -05:00
Johannes Doerfert	b316126887	[OpenMP][FIX] Avoid races in the handling of to be deleted mapping entries If we decided to delete a mapping entry we did not act on it right away but first issued and waited for memory copies. In the meantime some other thread might reuse the entry. While there was some logic to avoid colliding on the actual "deletion" part, there were two races happening: 1) The data transfer back of the thread deleting the entry and the data transfer back of the thread taking over the entry raced. 2) The update to the shadow map happened regardless of whether the entry was actually reused by another thread, which left the shadow map in an inconsistent state. To fix both issues we will now update the shadow map and delete the entry only if we are sure the thread is responsible for deletion, hence no other thread took over the entry and reused it. We also wait for a potential former data transfer from the device to finish before we issue another one that would race with it. Fixes https://github.com/llvm/llvm-project/issues/54216 Differential Revision: https://reviews.llvm.org/D121058	2022-03-28 22:33:18 -05:00
Johannes Doerfert	ba93e4e33e	[OpenMP][NFC] Add missing virtual destructor to silence warning	2022-03-28 22:33:18 -05:00
Johannes Doerfert	7df2eba7fa	[Attributor][OpenMP] Add assumption for non-call assembly instructions Inline assembly is scary but we need to support it for the OpenMP GPU device runtime. The new assumption expresses the fact that it may not have call semantics, that is, it will not call another function but simply perform an operation or side-effect. This is important for reachability in the presence of inline assembly. Differential Revision: https://reviews.llvm.org/D109986	2022-03-28 20:57:52 -05:00
Shilei Tian	545fcc3d84	[OpenMP][CUDA] Fix potential program crash caused by double free resources As we mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical resources on the two sides of `Next` mark. It is usually not an issue, except in the following case: 1. Some resources are not returned. 2. We need to iterate the pool and free the element. That will cause double free, which is the case for event pool. Since we don't release events held by the data map, it can happen that the `Next` mark is not reset, and we have two identical items in the pool. When the pool is destroyed, we will call `cuEventDestroy` twice on the same event. In the best case, we can only observe CUDA errors. In the worst case, it can cause internal failures in CUDART and a further crash. This patch fixes the issue by tracking all resources that have been given using an `unordered_set`. We don't remove it when a resource is returned. When the pool is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make sure that the set contains all resources allocated from the device. We just need to iterate the set and free the resource accordingly. For now, only event pool is set to use it. Stream pool is not because we can make sure all streams are returned when the plugin is destroyed. Someone might wonder why we don't release all events held in the data map. That is because plugins are determined to be destroyed before `libomptarget`. If we can somehow make the plugin outlast `libomptarget`, life will be much easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122014	2022-03-25 22:49:32 -04:00
Joseph Huber	9d3550c517	[OpenMP] Add AMDGPU calling convention to ctor / dtor functions This patch adds the necessary AMDGPU calling convention to the ctor / dtor kernels. These are fundamentally device kernels called by the host on image load. Without this calling convention information the AMDGPU plugin is unable to identify them. Depends on D122504 Fixes #54091 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122515	2022-03-25 22:44:20 -04:00
Johannes Doerfert	6c2be885ff	Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning" This reverts commit `b9fd8f34ae` as it accidentally contained a unit test change that is not finished (and unrelated).	2022-03-25 16:07:11 -05:00
Johannes Doerfert	7dfad948f1	[OpenMP][FIX] Repair ExclusiveAccess move semantic snafu	2022-03-25 16:00:53 -05:00
Johannes Doerfert	b9fd8f34ae	[OpenMP][NFC] Add missing virtual destructor to silence warning	2022-03-25 16:00:53 -05:00
Johannes Doerfert	4e34f061d6	[OpenMP][FIX] Ensure exclusive access to the HDTT map This patch solves two problems with the `HostDataToTargetMap` (HDTT map) which caused races and crashes before: 1) Any access to the HDTT map needs to be exclusive access. This was not the case for the "dump table" traversals that could collide with updates by other threads. The new `Accessor` and `ProtectedObject` wrappers will ensure we have a hard time introducing similar races in the future. Note that we could allow multiple concurrent read-accesses but that feature can be added to the `Accessor` API later. 2) The elements of the HDTT map were `HostDataToTargetTy` objects which meant that they could be copied/moved/deleted as the map was changed. However, we sometimes kept pointers to these elements around after we gave up the map lock which caused potential races again. The new indirection through `HostDataToTargetMapKeyTy` will allow us to modify the map while keeping the (interesting part of the) entries valid. To offset potential cost we duplicate the ordering key of the entry which avoids an additional indirect lookup. We should replace more objects with "protected objects" as we go. Differential Revision: https://reviews.llvm.org/D121057	2022-03-25 11:38:54 -05:00
Joseph Huber	a619072c61	[OpenMP] Manually unroll the argument copy loop The unroll pragma did not properly work as the loop bound was not known when we optimize the runtime and we then added a "unroll disable" metadata which prevented unrolling later when the bounds were known. For now we manually unroll to make sure up to 16 elements are handled nicely. This helps optimizations to look through the argument passing. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109164	2022-03-21 20:54:11 -04:00
Stanislav Mekhanoshin	e0b9364b5c	[AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp Differential Revision: https://reviews.llvm.org/D120849	2022-03-17 13:05:07 -07:00
Jon Chesterfield	75779435f3	[nfc][openmp] Swap arguments to remove noise from upcoming diff	2022-03-11 23:08:37 +00:00
Shilei Tian	f6639a424b	[OpenMP][CUDA] Fix the check of `setContext`	2022-03-09 18:48:44 -05:00
Shilei Tian	39d3283a08	[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly Currently we set the context everywhere accordingly, but that causes many unnecessary function calls. For example, in the resource pool, if we need to resize the pool, we need to get resources from the allocator. Each call to allocate sets the current context once, which is unnecessary. In this patch, we set the context only in the entry interface functions, if needed. Ideally this would be implemented via RAII, but since `cuCtxSetCurrent` can return an error and we don't use exceptions, we can't stop the execution if RAII fails. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121322	2022-03-09 16:32:47 -05:00
Shilei Tian	5105c7cd78	[OpenMP][CUDA] Fix an issue that multiple `CUmodule` could be overwritten This patch fixes the issue introduced in `14de0820e8` and D120089, that if dynamic libraries are used, the `CUmodule` array could be overwritten. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121308	2022-03-09 14:55:20 -05:00
Mark Dewing	7e0b0e05af	[OpenMP][doc]Minor doc fixes In SupportAndFAQ.rst, add blank lines before and after a bullet list and sublist. This avoids an "Unexpected indentation" warning. In Runtimes.rst, adjust the suggestion for setting LIBOMPTARGET_INFO. The right shifts are not necessary as the bit mask values are already correct. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119595	2022-03-09 14:54:42 -05:00
Johannes Doerfert	14de0820e8	[OpenMP][FIX] Ensure the modules vector is filled as others are The modules vector was for some reason special which could lead to it not being of the same size (=num devices). Easiest solution is to treat it like we do all the other vectors.	2022-03-08 23:45:43 -06:00
Joseph Huber	eae306f52c	[OpenMP][Docs] Make copy pasting remarks easier	2022-03-08 16:54:12 -05:00
Johannes Doerfert	1660288b28	[OpenMP][CUDA] Use one event pool per device An event pool, similar to the stream pool, needs to be kept per device. For one, events are associated with cuda contexts which means we cannot destroy the former after the latter. Also, CUDA documentation states streams and events need to be associated with the same context, which we did not ensure at all. Differential Revision: https://reviews.llvm.org/D120142	2022-03-07 23:43:05 -06:00
Johannes Doerfert	10aa83ff74	[OpenMP] Allow to explicitly deinitialize device resources There are two problems this patch tries to address: 1) We currently free resources in a random order wrt. plugin and libomptarget destruction. This patch should ensure the CUDA plugin is less fragile if something during the deinitialization goes wrong. 2) We need to support (hard) pause runtime calls eventually. This patch allows us to free all associated resources, though we cannot reinitialize the device yet. Follow up patch will associate one event pool per device/context. Differential Revision: https://reviews.llvm.org/D120089	2022-03-07 23:43:04 -06:00
Johannes Doerfert	307bbd3c82	[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible Differential Revision: https://reviews.llvm.org/D121060	2022-03-07 23:43:04 -06:00
Jonathan Peyton	6564a70415	[OpenMP][libomp] Fix register constraint for tpause and umwait Register constraint switched to "=q" which means very specifically (from https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints) > Any register accessible as rl. In 32-bit mode, a, b, c, and d; in 64-bit mode, any integer register. Older gcc versions (8.x and below) were trying to use esi or edi for the 8 bit flag variable, but it wound up displaying this error in the end: kmp_lock.cpp: In function ‘void __kmp_spin_backoff(kmp_backoff_t*)’: kmp_lock.cpp:2684:1: error: unsupported size for integer register Hence the correct restriction is "=q" instead of "=r". Fixes: https://github.com/llvm/llvm-project/issues/53309 Differential Revision: https://reviews.llvm.org/D120519	2022-03-07 14:55:49 -06:00
AndreyChurbanov	6d9eb7e7ec	[OpenMP] libomp: implemented task priorities. Before this patch task priorities were ignored, that was a valid implementation as the task priority is a hint according to OpenMP specification. Implemented shared list of sorted (high -> low) task deques one per task priority value. Tasks execution changed to first check if priority tasks ready for execution exist, and these tasks executed before others; otherwise usual tasks execution mechanics work. Differential Revision: https://reviews.llvm.org/D119676	2022-03-07 22:24:18 +03:00
Johannes Doerfert	7ead7e90fc	Revert "[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible" This reverts commit `ff50e81b50` as it broke the buildbots, see https://reviews.llvm.org/D121060#3362737.	2022-03-06 21:27:41 -06:00
Johannes Doerfert	ff50e81b50	[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible Differential Revision: https://reviews.llvm.org/D121060	2022-03-06 19:59:23 -06:00
James Beddek	2d0c9b64a0	[OpenMP][CMake] Ensure linking against libm for Linux Do the same as is done for NetBSD. Some compiler-rt/lib/builtins files call libm functions (e.g. fmaxl, fabs). Linking libomp with --rtlib=compiler-rt references these functions. Downstream report: https://bugs.gentoo.org/816831 Fixes: https://github.com/llvm/llvm-project/issues/51457	2022-03-05 20:20:28 -08:00
Shilei Tian	7f7c2c34b6	[OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS` `LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included multiple times redundantly. This patch is simply a cleanup. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121055	2022-03-05 22:37:59 -05:00
AndreyChurbanov	1fbdb03b1d	[OpenMP] libomp: omp_in_explicit_task() implemented. Differential Revision: https://reviews.llvm.org/D120671	2022-03-05 21:46:39 +03:00
Joseph Huber	e2dcc2218c	[Libomptarget] Work around bug in initialization of libomptarget Libomptarget uses some shared variables to track certain internal state in the runtime. This causes problems when we have code that contains no OpenMP kernels. These variables are normally initialized upon kernel entry, but if there are no kernels we will see no initialization. Currently we load the runtime into each source file when not running in LTO mode, so these variables will be erroneously considered undefined or dead and removed, causing miscompiles. This patch temporarily works around the most obvious case, but others still exhibit this problem. We will need to fix this more soundly later. Fixes #54208. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121007	2022-03-04 13:13:31 -05:00
Aakanksha	840695814a	[AMDGPU] Add gfx1036 target Differential Revision: https://reviews.llvm.org/D120846	2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
Malhar Jajoo	6d658f37a4	[Openmp]: Missing import statement in openmp interface for Fortran Essentially removed the "use omp_lib_kinds" statement and replaced it with import to maintain consistency (and avoid compilation error in case the omp_lib_kinds.mod file is not accessible) in header file. The import is required to access entities in host scoping unit. Differential Revision: https://reviews.llvm.org/D120707	2022-03-01 17:33:06 +00:00
Shilei Tian	75812e7704	[OpenMP][Offloading] Change N back to 256 in bug49334.cpp	2022-02-23 16:10:35 -05:00
Joseph Huber	5dd0c39638	[Libomptarget][NFC] Fix missing newline in error message	2022-02-23 08:10:16 -05:00
Carlo Bertolli	7b731f4d0b	[OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed When using asynchronous plugin calls, shadow pointer restore could happen before the D2H copy for the entire struct has completed, effectively leaving a device pointer in a host struct. This patch fixes the problem by delaying restore's to after a synchronization happens (target regions) and by calling early synchronization (target update). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119968	2022-02-18 10:09:10 -06:00
Joseph Huber	0870a4f59a	[OpenMP] Add flag for disabling thread state in runtime The runtime uses thread state values to indicate when we use an ICV or are in nested parallelism. This is done for OpenMP correctness, but it is not needed in the majority of cases. The new flag added is `-fopenmp-assume-no-thread-state`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120106	2022-02-18 08:35:05 -05:00
Shilei Tian	092a5bb72b	[OpenMP][Offloading] Fix test case issues in bug49334.cpp `bug49334.cpp` has one issue that causes flaky result reported in #53730. The root cause is `BlockedC` is never initialized but in `BlockMatMul_TargetNowait` it is directly read and written (via `+=`). Fixes #53730. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D119988	2022-02-17 10:22:48 -05:00
Johannes Doerfert	57b4c5267b	[OpenMP][FIX] Eliminate race on the IsSPMD global The `IsSPMD` global can only be read by threads other than the main thread after initialization is complete. To allow usage of `mapping::getBlockSize` before initialization is done, we can pass the `IsSPMD` state explicitly. This is similar to other APIs that take `IsSPMD` explicitly to avoid such a race, e.g., `mapping::isInitialThreadInLevel0(IsSPMD)` Fixes https://github.com/llvm/llvm-project/issues/53857	2022-02-16 14:44:20 -06:00
Joseph Huber	777039a51c	[Libomptarget] Run CPU offloading tests using the new driver This patch adds a new target to the OpenMP CPU offloading tests. This tests the usage of the new driver for CPU offloading. If this all works then we can move to transition to the new driver as the default. Depends on D119613 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119736	2022-02-15 15:05:32 -05:00
Joseph Huber	48e3dcecc4	[Libomptarget][NFC] Remove constexpr to hide warnings Currently whenever we compile the device runtime we get the following 'Mapping.cpp:32:32: warning: inline function '_OMP::impl::getGridValue' is not defined [-Wundefined-inline]' warning. This can be silenced by removing the constexpr attribute for this function. Doing this doesn't change the generated bitcode at all but prevents the screen from getting filled with warnings whenever we build the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119747	2022-02-14 15:34:18 -05:00
Jonathan Peyton	1234011b80	[OpenMP][libomp] Introduce oneAPI compiler support Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI compiler. Fixup flag detection and compiler ID detection in CMake. Older CMake's detect IntelLLVM as Clang. Fix compiler warnings. Fixup many of the tests to have non-empty parallel regions as they are elided by oneAPI compiler.	2022-02-14 14:10:33 -06:00
Shilei Tian	c27f530d4c	[OpenMP][Offloading] Fix infinite loop in applyToShadowMapEntries This patch fixes the issue that the for loop in `applyToShadowMapEntries` is infinite because `Itr` is not incremented in `CB`. Fixes #53727. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119471	2022-02-12 22:02:53 -05:00
AndreyChurbanov	cb1bee4725	[OpenMP] libomp: fix UB when LIBOMP_NUM_HIDDEN_HELPER_THREADS=1. The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads. Thus number of worker hidden helper threads corresponds to user request, main thread of helper team excluded as it does not participate in actual work. This also fixes divide-by-0 issue in the code. Fixes #48656 Differential Revision: https://reviews.llvm.org/D119586	2022-02-12 03:00:38 +03:00
AndreyChurbanov	d84dedc7d3	[OpenMP] libomp: fix bug in implementation of distribute construct. Fixed mistaken iterations distribution between different target regions. Differential Revision: https://reviews.llvm.org/D118393	2022-02-11 17:34:26 +03:00
Shilei Tian	702a976c12	[OpenMP][Offloading] Change the way to compare floating point values in bug49334.cpp `bug49334.cpp` directly uses `!=` to compare two floating point values, which is almost always wrong. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D119485	2022-02-10 18:20:36 -05:00
Shilei Tian	aca33b0b37	[OpenMP][CUDA] Remove the hard team limit Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as it is larger than that hard limit, the final number to launch the kernel will always be that hard limit. It is way less than the actual hardware limit. For example, my workstation has GTX2080, and the hardware limit of grid size is 2147483647, which is exactly the largest number a `int32_t` can represent. There is no limitation mentioned in the spec. This patch simply removes it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119313	2022-02-10 18:07:46 -05:00
Ye Luo	59ad9650cf	[Libomptarget][AMDGCN] add gfx90c target Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119478	2022-02-10 15:55:44 -06:00
Shilei Tian	f6685f7746	[OpenMP][CUDA] Refine the logic to determine grid size This patch refines the logic to determine grid size as previous method can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual hardware limit. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119311	2022-02-10 14:13:32 -05:00
Joseph Huber	9582f09690	[Libomptarget] Increase stack size for bug49779 test The 'bug49779.cpp' test has been failing recently. This is because the runtime is sufficiently complex when using nested parallelism without optimizations that the CUDA tools cannot statically determine the stack size. Because of this the kernel can exceed the thread stack size and crash. Work around this using the 'LIBOMPTARGET_STACK_SIZE' environment variable and add an FAQ entry for this situation. Fixes #53670 Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119357	2022-02-09 15:37:23 -05:00
Jonathan Peyton	6be7c21b57	[OpenMP][libomp] Replace accidental VLA with KMP_ALLOCA MSVC does not support variable length arrays. Replace with KMP_ALLOCA which is already used in the same file for stack-allocated variables.	2022-02-09 08:09:27 -06:00
Joseph Huber	99d72ebddf	[Libomptarget] Add header files as a dependency to CMake target This patch manually adds the runtime include files to the list of dependencies when we build the bitcode runtime library. Previously if only the header was changed we would not recompile the source files. The solution used here isn't optimal because every source file now has a dependency on each header file regardless of whether it was actually used by that file. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D119254	2022-02-08 12:09:59 -05:00
Joseph Huber	f8ffac5987	[OpenMP] Enable new driver tests for AMDGPU This patch enables running the new driver tests for AMDGPU. Previously this was disabled because some tests failed. This was only because the new driver tests hadn't been listed as unsupported or expected to fail. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119240	2022-02-08 09:55:29 -05:00
Joseph Huber	d28051c4ab	[Libomptarget] Replace Value RAII with default value This patch replaces the ValueRAII pointer with a default 'nullptr' value. Previously this was initialized as a reference to an existing variable. The use of this variable caused overhead as the compiler could not look through the uses and determine that it was unused if 'Active' was not set. Because of this accesses to the variable would be left in the runtime once compiled. Fixes #53641 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119187	2022-02-07 17:12:00 -05:00
Igor Kirillov	4ae885b1e3	[OpenMP] kmp_atomic_float10_max_min.c test should only be executed on x86 platform Differential Revision: https://reviews.llvm.org/D118988	2022-02-07 10:07:54 +00:00
Joseph Huber	034adaf5be	[OpenMP] Completely remove old device runtime This patch completely removes the old OpenMP device runtime. Previously, the new runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The entire project has been deleted, and all references to the `libomptarget-new` runtime have been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934	2022-02-04 15:31:33 -05:00
Joseph Huber	b4be18219e	[Libomptarget] Remove AMDGPU XFAIL from test Summary: This test should pass now with AMDGPU. Previously the symbols were hidden and would fail when read.	2022-02-04 13:40:03 -05:00
Tom Stellard	a2601c9887	Bump the trunk major version to 15	2022-02-01 23:54:52 -08:00
Jon Chesterfield	f52927c122	Revert "[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned" This seems to be the root cause of hangs on amdgpu. Reverting while investigating. This reverts commit `7b9844cc8d`.	2022-02-01 14:56:59 +00:00
Jon Chesterfield	8b7e99c41d	[openmp] Disable tests that presently hang on CI	2022-02-01 13:01:35 +00:00
Johannes Doerfert	7b9844cc8d	[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned Due to num_threads (probably also other reasons) we cannot assume explicit barriers are always executed by all threads in an aligned fashion. We can optimize them if that property can be proven but that is different.	2022-02-01 01:10:52 -06:00
Johannes Doerfert	3c8a4c6f47	[OpenMP] Eliminate redundant barriers in the same block Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and bugs introduced later by me. This patch allows us to remove redundant barriers if they are part of a "consecutive" pair of barriers in a basic block with no impacted memory effect (read or write) in-between them. Memory accesses to local (=thread private) or constant memory are allowed to appear. Technically we could also allow any other memory that is not used to share information between threads, e.g., the result of a malloc that is also not captured. However, it will be easier to do more reasoning once the code is put into an AA. That will also allow us to look through phis/selects reasonably. At that point we should also deal with calls, barriers in different blocks, and other complexities. Differential Revision: https://reviews.llvm.org/D118002	2022-02-01 01:07:50 -06:00
Joseph Huber	4d4587d5b0	[OpenMP] Remove new driver tests for AMDGPU Some of the new driver tests are flaky on AMDGPU, remove for now.	2022-01-31 23:32:33 -05:00
Joseph Huber	0ac799b5c9	[Libomptarget] Run GPU offloading tests using the new driver This patch adds a new target to the tests to run using the new driver as the method for generating offloading code. Depends on D116541 Differential Revision: https://reviews.llvm.org/D118637	2022-01-31 23:11:43 -05:00
Joachim Protze	0fd5f6964d	[OpenMP][tests][NFC] Pin debug info to DWARF v4 for libarcher tests Temporary solution for #53467, since debian test machines do not support DWARF v5.	2022-01-31 22:55:29 +01:00
Joseph Huber	ad0a306a38	[OpenMP][NFC] Change error message on offloading failure to mention documentation This patch changes the error message to instead mention the documentation page for the debugging options provided by libomptarget and the bitcode runtimes. Add some extra information to the documentation to help users more quickly identify debugging resources. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118626	2022-01-31 15:19:52 -05:00
Joseph Huber	fd5853dae6	[Libomptarget] Reduce shared memory stack size to 512 and a message when it is exceeded Reduces the shared memory size used for globalization to 512 bytes from 2048 to reduce the pressure on shared memory. This patch also adds a debug message to indicate when the shared memory was insufficient. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118625	2022-01-31 15:19:48 -05:00
Jon Chesterfield	9b9d08111b	Set rpath on openmp executables Openmp executables need to find libomp and libomptarget at runtime. This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change that to set the expected location of the openmp libraries in the install tree. Whether rpath means rpath or runpath is system dependent. The attached test shows that the Wl,--disable-new-dtags control interacts correctly with this feature. The implicit rpath field is appended to any user specified ones which is ideal. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118493	2022-01-31 16:35:00 +00:00
Jon Chesterfield	a841a3a579	Revert "Set rpath on openmp executables" Failed some buildbots, bad assumptions about structure of install path This reverts commit `a80d5c34e4`.	2022-01-31 16:18:03 +00:00
Jon Chesterfield	a80d5c34e4	Set rpath on openmp executables Openmp executables need to find libomp and libomptarget at runtime. This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change that to set the expected location of the openmp libraries in the install tree. Whether rpath means rpath or runpath is system dependent. The attached test shows that the Wl,--disable-new-dtags control interacts correctly with this feature. The implicit rpath field is appended to any user specified ones which is ideal. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118493	2022-01-31 16:01:08 +00:00
John Ericson	368c54b81a	[openmp][cmake] `CMAKE_INSTALL_BINDIR` usage should not be quoted As @mstorsjo wrote in https://reviews.llvm.org/D117945#inline-1132920 : > This change seems to have broken one aspect: When doing `ninja install` I now get a warning saying `Error copying file "libomp.dll" to "libiomp5md.dll".`, and `libiomp5md.dll` isn't installed. > > I believe the reason is that the inline cmake snippet is written to `runtime/src/cmake_install.cmake` and then executed on install, but on install, `${CMAKE_INSTALL_BINDIR}` isn't set (as `GNUInstallDirs` isn't included there). Should this maybe expand `${CMAKE_INSTALL_BINDIR}` right here instead of deferring it to the install cmake, or what's the right course of action? I agree that is the right course of action. We also agreed to restore the `CMAKE_INSTALL_PREFIX` that was there before, too. Reviewed By: mstorsjo Differential Revision: https://reviews.llvm.org/D118528	2022-01-29 23:52:50 +00:00
Shilei Tian	d4d0ae628c	[OpenMP] Fix link error on Windows caused by `interop` functions This patch fixes the link error on Windows caused by `interop` functions. Reviewed By: mstorsjo Differential Revision: https://reviews.llvm.org/D118524	2022-01-29 11:29:29 -05:00
Shilei Tian	184f94a8a8	[OpenMP] Fix wrong number in openmp/runtime/src/dllexports This patch fixes the issue that numbers assigned to `interop` functions were already taken in `openmp/runtime/src/dllexports`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118523	2022-01-29 00:23:08 -05:00
Ye Luo	bafb6f3e9c	[OpenMP] disable build of old nvptx device runtime Fully respect LIBOMPTARGET_BUILD_NVPTX_BCLIB. There is no CUDA toolchain dependency. Complement D118268. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118522	2022-01-28 21:25:48 -06:00
Ron Lieberman	619f44b0ed	Revert "[OpenMP] Ensure broken assumptions print once, not thousands of times." This reverts commit `27c799ecc9`.	2022-01-28 01:41:10 +00:00
Joseph Huber	27c799ecc9	[OpenMP] Ensure broken assumptions print once, not thousands of times. If we have a broken assumption we want to print a message to the user. If the assumption is broken by many threads in many teams this can become a problem. To avoid it we use a hash that tracks if a broken assumption has (likely) been printed and avoid printing it again. This is not fool proof and has some caveats that might cause problems in the future (see comment) but it should improve the situation considerably for now. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112156	2022-01-27 18:43:45 -05:00
Johannes Doerfert	1e12156896	[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions IdentTy objects are useful for debugging and profiling so we want to keep them around in more places, especially those that have a large impact on performance, e.g., everything related to state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112494	2022-01-27 15:36:55 -05:00
Sri Hari Krishna Narayanan	f44e41af41	Runtime for Interop directive This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works only for GPUs and has several TODOs that should be addressed going forward. Reviewed By: RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D106674	2022-01-27 15:16:24 -05:00
Jon Chesterfield	e08f3bfe58	[openmp] Disable build of old runtimes by default The old runtime is not tested by CI. Disable the build prior to the llvm-14 branch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118268	2022-01-26 19:17:31 +00:00
Malhar Jajoo	c1988dbf2d	[openmp] Allow x87 fp functions only in Openmp runtime for x86. This patch allows Openmp runtime atomic functions operating on x87 high-precision to be present only in Openmp runtime for x86 architectures The functions affected are: __kmpc_atomic_10 __kmpc_atomic_20 __kmpc_atomic_cmplx10_add __kmpc_atomic_cmplx10_div __kmpc_atomic_cmplx10_mul __kmpc_atomic_cmplx10_sub __kmpc_atomic_float10_add __kmpc_atomic_float10_div __kmpc_atomic_float10_mul __kmpc_atomic_float10_sub __kmpc_atomic_float10_add_fp __kmpc_atomic_float10_div_fp __kmpc_atomic_float10_mul_fp __kmpc_atomic_float10_sub_fp __kmpc_atomic_float10_max __kmpc_atomic_float10_min Differential Revision: https://reviews.llvm.org/D117473	2022-01-22 22:09:44 +00:00
John Ericson	0a6b4258ab	[openmp][cmake] Use `GNUInstallDirs` to support custom installation dirs I am breaking apart D99484 so the cause of build failures is easier to understand. Differential Revision: https://reviews.llvm.org/D117945	2022-01-22 18:05:36 +00:00
Joseph Huber	26feef0846	[Libomptarget] Change visibility to hidden for device RTL This patch changes the visibility for all constructs in the new device RTL to be hidden by default. This is done after the changes introduced in D117806 changed the visibility from being hidden by default for all device compilations. This asserts that the visibility for the device runtime library will be hidden except for the internal environment variable. This is done to aid optimization and linking of the device library. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117807	2022-01-20 21:06:28 -05:00
Johannes Doerfert	b0789a1b12	[OpenMP] Avoid costly shadow map traversals whenever possible In the OpenMC app we saw `omp target update` spending an awful lot of time in the shadow map traversal without ever doing any update there. There are two cases that allow us to avoid the traversal completely. The simplest thing is that small updates cannot (reasonably) contain an attached pointer part. The other case requires to track in the mapping table if an entry might contain an attached pointer as part. Given that we have a single location shadow map entries are created, the latter is actually fairly easy as well. Differential Revision: https://reviews.llvm.org/D113124	2022-01-19 22:14:41 -06:00
Johannes Doerfert	1e447d03e2	[OpenMP] Introduce an environment variable to disable atomic map clauses Atomic handling of map clauses was introduced to comply with the OpenMP standard (see D104418). However, many apps won't need this feature which can be costly in certain situations. To allow for applications to opt-out we now introduce the `LIBOMPTARGET_MAP_FORCE_ATOMIC` environment flag that voids the atomicity guarantee of the standard for map clauses again, shifting the burden to the user. This patch also de-duplicates the code that introduces the events used to enforce atomicity as a cleanup. Differential Revision: https://reviews.llvm.org/D117627	2022-01-19 22:14:41 -06:00
Joseph Huber	28d718602a	[OpenMP] Expand short versions of OpenMP offloading triples The OpenMP offloading libraries are built with fixed triples and linked in during compile time. This would cause unhelpful errors if the user passed in the wrong expansion of the triple used for the bitcode library. Because we only support these triples for OpenMP offloading we can normalize them to the full version used in the bitcode library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D117634	2022-01-19 20:26:37 -05:00
Jon Chesterfield	ce8f365884	[openmp] Always pass valid triple to openmp-targets when using newRTL Previously, we sometimes pass fopenmp-targets=nvptx64-nvidia-cuda-newRTL Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D117715	2022-01-19 22:07:22 +00:00
Jon Chesterfield	8baf4ba890	[openmp][amdgpu] Remove xfail from test using declare target variable	2022-01-19 15:55:37 +00:00
Jon Chesterfield	ca84c43d69	[openmp][amdgpu] Disable tests on old runtime, enable tests on new one	2022-01-19 15:49:47 +00:00
Jon Chesterfield	e35c8f541c	[openmp][amdgpu] Temporarily disable tests on old runtime	2022-01-19 15:39:00 +00:00
Joseph Huber	4863fed933	[Libomptarget] Fix external visibility for internal variables After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visibility that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these variables static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526	2022-01-18 18:19:57 -05:00
Joseph Huber	138cc5a001	Revert "[Libomptarget] Fix external visibility for internal variables" Reverting to investigate break on AMDGPU. This reverts commit `0203ff1960`.	2022-01-18 14:44:11 -05:00
Joseph Huber	0203ff1960	[Libomptarget] Fix external visibility for internal variables After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visibility that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these variables static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526	2022-01-18 12:53:24 -05:00
Terry Wilmarth	2e02579a76	[OpenMP] Add use of TPAUSE Add use of TPAUSE (from WAITPKG) to the runtime for Intel hardware, with an envirable to turn it on in a particular C-state. Always uses TPAUSE if it is selected and enabled by Intel hardware and presence of WAITPKG, and if not, falls back to old way of checking __kmp_use_yield, etc. Differential Revision: https://reviews.llvm.org/D115758	2022-01-18 10:14:32 -06:00
Joseph Huber	4869a22d1d	[Libomptarget] Add `cold` to KeepAlive attributes This patch adds the `cold` attribute to the keepAlive functions in the RTL. This dummy function exists to keep certain RTL calls alive without them being optimized out, but it is never called and can be declared cold. This also helps some erroneous remarks being given on this function because it has weak linkage and cannot be made internal. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D117513	2022-01-17 17:29:26 -05:00
John Ericson	da77db58d7	Revert "[cmake] Use `GNUInstallDirs` to support custom installation dirs." https://lab.llvm.org/buildbot/#/builders/46/builds/21146 Still have this odd error, not sure how to reproduce, so I will just try breaking up my patch. This reverts commit `4a678f8072`.	2022-01-16 05:48:30 +00:00
John Ericson	4a678f8072	[cmake] Use `GNUInstallDirs` to support custom installation dirs. This is the original patch in my GNUInstallDirs series, now last to merge as the final piece! It arose as a new draft of D28234. I initially did the unorthodox thing of pushing to that when I wasn't the original author, but since I ended up - Using `GNUInstallDirs`, rather than mimicking it, as the original author was hesitant to do but others requested. - Converting all the packages, not just LLVM, affecting many more projects than LLVM itself. I figured it was time to make a new revision. I have used this patch series (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS), which was merged last spring (2021). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful. --- As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things. Variables like `COMPILER_RT_INSTALL_PATH` have already been dealt with. Variables like `LLVM_LIBDIR_SUFFIX` however, will require further work, so that we may use `CMAKE_INSTALL_LIBDIR`. These remaining items will be addressed in further patches. What is here is now rote and so we should get it out of the way before dealing more intricately with the remainder. Reviewed By: #libunwind, #libc, #libc_abi, compnerd Differential Revision: https://reviews.llvm.org/D99484	2022-01-16 05:33:07 +00:00
John Ericson	6e52bfe09d	Revert "[cmake] Use `GNUInstallDirs` to support custom installation dirs." Sorry for the disruption, I will try again later. This reverts commit `efeb501970`.	2022-01-15 07:35:02 +00:00
John Ericson	efeb501970	[cmake] Use `GNUInstallDirs` to support custom installation dirs. This is the original patch in my GNUInstallDirs series, now last to merge as the final piece! It arose as a new draft of D28234. I initially did the unorthodox thing of pushing to that when I wasn't the original author, but since I ended up - Using `GNUInstallDirs`, rather than mimicking it, as the original author was hesitant to do but others requested. - Converting all the packages, not just LLVM, affecting many more projects than LLVM itself. I figured it was time to make a new revision. I have used this patch series (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS), which was merged last spring (2021). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful. --- As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things. Variables like `COMPILER_RT_INSTALL_PATH` have already been dealt with. Variables like `LLVM_LIBDIR_SUFFIX` however, will require further work, so that we may use `CMAKE_INSTALL_LIBDIR`. These remaining items will be addressed in further patches. What is here is now rote and so we should get it out of the way before dealing more intricately with the remainder. Reviewed By: #libunwind, #libc, #libc_abi, compnerd Differential Revision: https://reviews.llvm.org/D99484	2022-01-15 01:08:35 +00:00
Jon Chesterfield	d53b979596	[openmp][devicertl] Handle missing clang_tool Fixes github issues/52910 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117230	2022-01-13 22:43:26 +00:00
Joseph Huber	4746e38f67	[Libomptarget] Fix multiply defined symbol during linking This patch adds the `weak` identifier to the openmp device environment variable. The changes introduced in https://reviews.llvm.org/D117211 result in multiply defined symbols. Because the symbol is potentially included multiple times for each offloading file we will get symbol collisions, and because it needs to have external visibility it should be weak. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117231	2022-01-13 11:57:33 -05:00
Jon Chesterfield	4395608939	[openmp] Mark used variables as retain as well D97446 changed the behaviour of 'used'. Compensate. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D117211	2022-01-13 13:57:32 +00:00
Jon Chesterfield	a74826d30a	[openmp][amdgpu] Replace unsigned long with uint64_t Some types need to be 64 bit. Unsigned long is a hazard there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116963	2022-01-10 22:19:30 +00:00
Shilei Tian	aab62aab04	[OpenMP][Offloading] Fixed a crash caused by dereferencing nullptr In function `DeviceTy::getTargetPointer`, `Entry` could be `nullptr` because of zero length array section. We need to check if it is a valid iterator before using it. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D116716	2022-01-05 23:04:29 -05:00
Shilei Tian	9584c6fa2f	[OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement The async data movement can cause a data race if the target supports it. Details can be found in [1]. This patch tries to fix this problem by attaching an event to the entry of the data mapping table. Here are the details. For each issued data movement, a new event is generated and returned to `libomptarget` by calling `createEvent`. The event will be attached to the corresponding mapping table entry. For each data mapping lookup, if there is no need for a data movement, the attached event has to be inserted into the queue to guarantee that all following operations in the queue can only be executed if the event is fulfilled. This design is to avoid synchronization on the host side. Note that we are using CUDA terminology here. A similar mechanism is assumed to be supported by other targets. Even if the target doesn't support it, it can be easily implemented in the following fallback way: - `Event` can be any kind of flag that has at least two states, 0 and 1. - `waitEvent` can directly busy loop if `Event` is still 0. My local test shows that `bug49334.cpp` can pass. Reference: [1] https://bugs.llvm.org/show_bug.cgi?id=49940 Reviewed By: grokos, JonChesterfield, ye-luo Differential Revision: https://reviews.llvm.org/D104418	2022-01-05 20:20:04 -05:00
RitanyaB	378b0ac179	SIGSEGV in ompt_tsan_dependences with for-ordered Segmentation fault in ompt_tsan_dependences function due to an unchecked NULL pointer dereference is as follows: ``` ThreadSanitizer:DEADLYSIGNAL ==140865==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000000050 (pc 0x7f217c2d3652 bp 0x7ffe8cfc7e00 sp 0x7ffe8cfc7d90 T140865) ==140865==The signal is caused by a READ memory access. ==140865==Hint: address points to the zero page. /usr/bin/addr2line: DWARF error: could not find variable specification at offset 1012a /usr/bin/addr2line: DWARF error: could not find variable specification at offset 133b5 /usr/bin/addr2line: DWARF error: could not find variable specification at offset 1371a /usr/bin/addr2line: DWARF error: could not find variable specification at offset 13a58 #0 ompt_tsan_dependences(ompt_data_t, ompt_dependence_t const, int) /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 (libarcher.so+0x15652) #1 __kmpc_doacross_post /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:4280 (libomp.so+0x74d98) #2 .omp_outlined. for_ordered_01.c:? (for_ordered_01.exe+0x5186cb) #3 __kmp_invoke_microtask /ptmp/bhararit/llvm-project/openmp/runtime/src/z_Linux_asm.S:1166 (libomp.so+0x14e592) #4 __kmp_invoke_task_func /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:7556 (libomp.so+0x909ad) #5 __kmp_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:2284 (libomp.so+0x8461a) #6 __kmpc_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:308 (libomp.so+0x6db55) #7 main ??:? (for_ordered_01.exe+0x51828f) #8 __libc_start_main ??:? (libc.so.6+0x24349) #9 _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120 (for_ordered_01.exe+0x4214e9) ThreadSanitizer can not provide additional info. 
SUMMARY: ThreadSanitizer: SEGV /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 in ompt_tsan_dependences(ompt_data_t, ompt_dependence_t const, int) ==140865==ABORTING ``` To reproduce the error, use the following openmp code snippet: ``` /* initialise testMatrixInt Matrix, cols, r and c */ #pragma omp parallel private(r,c) shared(testMatrixInt) { #pragma omp for ordered(2) for (r=1; r < rows; r++) { for (c=1; c < cols; c++) { #pragma omp ordered depend(sink:r-1, c+1) depend(sink:r-1,c-1) testMatrixInt[r][c] = (testMatrixInt[r-1][c] + testMatrixInt[r-1][c-1]) % cols ; #pragma omp ordered depend (source) } } } ``` Compilation: ``` clang -g -stdlib=libc++ -fsanitize=thread -fopenmp -larcher test_case.c ``` It seems like the changes introduced by the commit https://reviews.llvm.org/D114005 causes this particular SEGV while using Archer. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D115328	2022-01-03 11:23:57 -06:00
Shilei Tian	458db51c10	[OpenMP] Add missing `tt_hidden_helper_task_encountered` along with `tt_found_proxy_tasks` In most cases, hidden helper tasks behave similarly to detached tasks. That means, for example, if we have to wait for detached tasks, we have to do the same thing for hidden helper tasks as well. This patch adds the missing condition for hidden helper tasks accordingly, along with detached tasks. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D107316	2021-12-29 23:22:53 -05:00
Johannes Doerfert	73104ad65b	[OpenMP][NFC] Move headers into include folder	2021-12-28 23:53:28 -06:00
Shilei Tian	943d1d83dd	[OpenMP][CUDA] Add resource pool for CUevent Following D111954, this patch adds the resource pool for CUevent. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D116315	2021-12-28 17:42:38 -05:00
Shilei Tian	357c8031ff	[OpenMP][Plugin] Minor adjustments to ResourcePool This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0, which avoids an assertion. - Add a new interface function `clear` to release all held resources. - If initial size is 0, resize to 1 when the first request is encountered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116340	2021-12-28 16:11:03 -05:00
Joseph Huber	7cdaa5a94e	[OpenMP][FIX] Change globalization alignment to 16 This patch changes the default alignment from 8 to 16, and encodes this information in the `__kmpc_alloc_shared` runtime call to communicate it to the HeapToStack pass. The previous alignment of 8 was not sufficient for the maximum size of primitive types on 64-bit systems, and needs to be increased. This reduces the amount of space available in the data sharing stack, so this implementation will need to be improved later to include the alignment requirements in the allocation call, and use it properly in the data sharing stack in the runtime. Depends on D115888 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115971	2021-12-27 16:58:25 -05:00
Shilei Tian	a697a0a4b6	[OpenMP][Plugin] Introduce generic resource pool Currently CUDA streams are managed by `StreamManagerTy`. It works very well. Now we have the need that some resources, such as CUDA streams and events, will be held by `libomptarget`. It is always good to buffer those resources. What's more important, given the way that `libomptarget` and plugins are connected, we cannot make sure whether plugins are still alive when `libomptarget` is destroyed. That leads to an issue that those resources held by `libomptarget` might not be released correctly. As a result, we need a unified management of all the resources that can be shared between `libomptarget` and plugins. `ResourcePoolTy` is designed to manage the type of resource for one device. It has to work with an allocator which is supposed to provide `create` and `destroy`. In this way, when the plugin is destroyed, we can make sure that all resources allocated from the native runtime library will be released correctly, no matter whether `libomptarget` has started its destruction. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111954	2021-12-27 11:32:14 -05:00
Jonathan Peyton	6a556ecaf4	[OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET This patch allows the user to request all resources of a particular layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified so the number of units requested is optional or can be replaced with an '*' character. e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3 e.g., KMP_HW_SUBSET=c:intel_core will use all the big cores e.g., KMP_HW_SUBSET=s,c,1t will use all the sockets, all cores per each socket and 1 thread per core. Differential Revision: https://reviews.llvm.org/D115826	2021-12-20 13:45:21 -06:00
Jon Chesterfield	38af5b4fd1	[libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches	2021-12-17 22:28:31 +00:00
Jon Chesterfield	91dfb32f2f	[openmp][amdgpu][nfc] Mark all external functions extern C to get type checking	2021-12-17 18:46:43 +00:00
Carlo Bertolli	d3abb04e14	[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter I missed the async info parameter in the first version of this API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115887	2021-12-17 15:58:18 +00:00
Carlo Bertolli	d83dc4c648	[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115771	2021-12-15 15:33:17 +00:00
Jonathan Peyton	9769340905	[OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes Add missing guards around x86-specific code. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115664	2021-12-14 08:33:05 +01:00
John Ericson	ddcc02dbcc	Quote some more destination paths with variables Just defensive CMake-ing. I pulled this from D115544 and D99484 which are blocked on some lldb CI failures I don't yet understand. Hoping to land something smaller in the meantime. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D115566	2021-12-13 17:29:08 +00:00
Michael Kruse	77e019c233	[OpenMP] Add "not" to test dependencies. The `not` program is used to test executions prefixed with `%libomptarget-run-fail-`. Currently `not` is not used for libomp tests, but might be used in the future and its dependency does not add any additional burden over the already established `FileCheck` dependency. Required to add libomptarget testing to the Phabricator pre-merge check (see https://github.com/google/llvm-premerge-checks/issues/368) Reviewed By: jdenny, JonChesterfield Differential Revision: https://reviews.llvm.org/D115454	2021-12-12 10:52:53 -06:00
Med Ismail Bennani	30fc88bf1d	Revert "Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM""" This reverts commit `492de35df4`. I tried to apply John's changes in `8d897ec915` that were expected to fix his patch but that didn't work unfortunately. Reverting this again to fix the macOS bots and leave him more time to investigate the issue.	2021-12-10 17:33:54 -08:00
John Ericson	492de35df4	Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM"" This reverts commit `797b50d4be`. See the original D99484. @mib who noticed the original problem could no longer reproduce it, after I tried and also failed. We are therefore hoping it went away on its own! Reviewed By: mib Differential Revision: https://reviews.llvm.org/D115544	2021-12-10 20:59:43 +00:00
Joseph Huber	8425bde82d	Revert "[OpenMP] Avoid costly shadow map traversals whenever possible" This reverts commit `7c8f4e7b85`. Fails a few OpenMP tests, causes a few updates to segfault.	2021-12-10 15:57:58 -05:00
Jonathan Peyton	df20599597	[OpenMP][libomp] Add core attributes to KMP_HW_SUBSET Allow filtering of resources based on core attributes. There are two new attributes added: 1) Core Type (intel_atom, intel_core) 2) Core Efficiency (integer) where the higher the efficiency, the more performant the core. On hybrid architectures, e.g., Alder Lake, users can specify KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom and first four Big cores. They can also use the efficiency syntax. e.g., KMP_HW_SUBSET=2c:eff0,2c:eff1 Differential Revision: https://reviews.llvm.org/D114901	2021-12-10 14:34:33 -06:00
Joseph Huber	7c8f4e7b85	[OpenMP] Avoid costly shadow map traversals whenever possible In the OpenMC app we saw `omp target update` spending an awful lot of time in the shadow map traversal without ever doing any update there. There are two cases that allow us to avoid the traversal completely. The simplest thing is that small updates cannot (reasonably) contain an attached pointer part. The other case requires to track in the mapping table if an entry might contain an attached pointer as part. Given that we have a single location shadow map entries are created, the latter is actually fairly easy as well. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D113124	2021-12-10 14:33:18 -05:00
Carlo Bertolli	28309c5436	[OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for asynchronous implementation by: - Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function (submitted at D115267) - Separating the control flow path of asynchronous and synchronous kernel launch functions (this diff) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115273	2021-12-10 19:21:05 +00:00
Joel E. Denny	51168ce8d5	[OpenMP] Add test for custom state machine if have reduction D113602 broke the custom state machine when a reduction is present, as revealed by the reproducer this patch adds to the test suite. In that case, openmp-opts changes the return value to undef in `__kmpc_get_warp_size` (which the custom state machine calls as of D113602). Later optimizations then optimize away the custom state machine code as if all threads are outside the thread block, so the target region does not execute. D114802 fixed that but didn't add a reproducer. This patch also adds a `__OMP_RTL_ATTRS` entry for `__kmpc_get_warp_size` to OMPKinds.def, which D113602 missed. This change does not seem to have any impact on the reduction problem. Reviewed By: JonChesterfield, jdoerfert Differential Revision: https://reviews.llvm.org/D113824	2021-12-10 12:53:54 -05:00
AndreyChurbanov	1031e43052	[OpenMP] libomp: fix Fortran header: lines exceeded 72-char length Added line continuation to two long lines in Fortran header. Differential Revision: https://reviews.llvm.org/D114537	2021-12-10 16:23:21 +03:00
Joseph Huber	bc9c4d7216	[OpenMP][FIX] Pass the num_threads value directly to parallel_51 The problem with the old scheme is that we would need to keep track of the "next region" and reset the num_threads value after it. The new RT doesn't do it and an assertion is triggered. The old RT doesn't do it either, I haven't tested it but I assume a num_threads clause might impact multiple parallel regions "accidentally". Further, in SPMD mode num_threads was simply ignored, for some reason beyond me. In any case, parallel_51 is designed to take the clause value directly, so let's do that instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D113623	2021-12-09 16:30:29 -05:00
Carlo Bertolli	cc8dc5e28b	[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279	2021-12-08 23:02:39 +00:00
AndreyChurbanov	4dd8fccb71	[OpenMP] libomp: Fix crash if application sends us negative thread_limit value Although the specification requires thread_limit to be positive, it is better to warn the user instead of crashing in case the value is negative. Differential Revision: https://reviews.llvm.org/D115340	2021-12-08 19:02:57 +03:00
Jon Chesterfield	14ff611fe1	Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version" This reverts commit `6de698bf10`. It didn't build in the dynamic_hsa configuration	2021-12-08 08:23:12 +00:00
Carlo Bertolli	6de698bf10	[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279	2021-12-07 23:05:23 +00:00
Carlo Bertolli	d9b1d827d2	[NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for asynchronous implementation by: - Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function Actual separation of kernel launch code (async vs sync) is a following patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115267	2021-12-07 21:02:45 +00:00
Martin Storsjö	db32c4f456	[OpenMP] Disable libomptarget profiling by default if built via the "runtimes" setup In the "runtimes" setup, the runtime (e.g. OpenMP) can be built for a target entirely different from the current host build (where LLVM and Clang are built). If profiling is enabled, libomptarget links against LLVMSupport (which only has been built for the host). Thus, don't enable profiling by default in this setup. This should allow relanding D113253. Differential Revision: https://reviews.llvm.org/D114083	2021-12-07 22:23:50 +02:00
Ye Luo	21a51cebf1	[OpenMP][libomptarget] amdgpu plugin adds runpath for dependencies amdgpu plugin depends on libhsa-runtime64 library. Add runpath in case it is not on the LD_LIBRARY_PATH. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115198	2021-12-06 18:19:18 -06:00
Jon Chesterfield	a05a0c3c2f	[libomptarget] Add cmake variables to disable building the amdgpu or cuda plugins Analogous to the controls on building device runtimes Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115148	2021-12-06 16:42:26 +00:00
Jon Chesterfield	a2b3b4dadc	[openmp] Run tests on both runtimes, independent of the default Minor fix to the lit.cfg. Currently, nvptx runs the tests twice on the new runtime. Soon, amdgpu will run them on the new runtime as well as the old. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115150	2021-12-06 16:41:23 +00:00
Jon Chesterfield	9e08c2054a	[openmp] Enable tests on new devicertl on amdgpu Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D114891	2021-12-06 15:26:18 +00:00
Jon Chesterfield	1a87a18955	[openmp][amdgpu] Disable tests requiring USM on amdgcn These tests tend to hang or crash on hardware that doesn't support USM. Disabling them helps diagnose other issues. To safely enable we require a means of testing whether USM is expected to work. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115144	2021-12-06 13:25:23 +00:00
Matt Arsenault	90f914c870	OpenMP: Un-xfail tests that pass now `729bf9b26b` should have fixed these	2021-12-04 11:25:22 -05:00
Ron Lieberman	8f4013ad46	Restrict xfail on openmp/libomptarget/test/mapping/reduction_implicit_map.cpp to amdgcn-amd-amdhsa	2021-12-02 20:58:26 +00:00
Ron Lieberman	f87c2c637e	xfail: libomptarget reduction_implicit_map.cpp after reapply of Start calling setTargetAttributes	2021-12-02 20:38:25 +00:00
Jon Chesterfield	fb9fc3c951	[openmp][amdgpu] Disable three tests in preparation for new runtime	2021-12-02 07:57:14 +00:00
Kazushi (Jam) Marukawa	5e2358c781	[runtimes][openmp] Change to not treat ARCH-unknown-linux-gnu as errors When OpenMP is compiled as a part of runtimes for multiple targets, openmp is compiled under the build/runtimes/runtimes-arch-unknown-linux-gnu-bins directory. The old implementation treats this directory name as an error. This patch adds a guard like "[Uu]known[^-]". Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D114346	2021-12-01 08:33:37 +09:00
Jonathan Peyton	618f8dc5e5	[OpenMP][libomp][doc] Add environment variables documentation Add documentation for the environment variables for libomp Differential Revision: https://reviews.llvm.org/D114269	2021-11-30 16:29:31 -06:00
Jon Chesterfield	3ab150f6e4	[openmp][devicertl] Add a missing loader_uninitialized attribute	2021-11-29 23:54:37 +00:00
Matt Arsenault	935abeaace	OpenMP: Correctly query location for amdgpu-arch This was trying to figure out the build path for amdgpu-arch, and making assumptions about where it is which were not working on my system. Whether a standalone build or not, we should have a proper imported target to get the location from.	2021-11-29 16:31:32 -05:00
Jon Chesterfield	ae5348a38e	[openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments OpenMP (compiler) does not currently request any implicit kernel arguments. OpenMP (runtime) allocates and initialises a reasonable guess at the implicit kernel arguments anyway. This change makes the plugin check the number of explicit arguments, instead of all arguments, and puts the pointer to hostcall buffer in both the current location and at the offset expected when implicit arguments are added to the metadata by D113538. This is intended to keep things running while fixing the oversight in the compiler (in D113538). Once that patch lands, and a following one marks openmp kernels that use printf such that the backend emits an args element with the right type (instead of hidden_node), the over-allocation can be removed and the hardcoded 8*e+3 offset replaced with one read from the .offset of the corresponding metadata element. Reviewed By: estewart08 Differential Revision: https://reviews.llvm.org/D114274	2021-11-22 23:00:20 +00:00
Joseph Huber	fbfe8fcbc3	[Libomptarget] Remove undefined symbol in old runtime A function with no definition was left in the old runtime, causing linker errors when trying to compile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D114264	2021-11-20 08:26:57 -05:00
Jon Chesterfield	04954824ee	[openmp][amdgpu][nfc] Simplify implicit args handling Removes a +x/-x pair on the only store/load of a variable and deletes some nearby dead code. Also reduces the size of the implicit struct to reflect the code currently emitted by clang. Differential Revision: https://reviews.llvm.org/D114270	2021-11-19 20:18:23 +00:00
Jon Chesterfield	9cdaf0b01b	[openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller	2021-11-19 18:45:17 +00:00
Alexey Bataev	80256605f8	[OpenMP] support depend clause for taskwait directive, by Deepak Eachempati. This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D113540	2021-11-19 06:30:17 -08:00
Peyton, Jonathan L	a733b18bdb	[OpenMP][libomp] Enable HWLOC topology detection of multiple CPU kinds Teach the HWLOC topology method how to detect Atom and Core types so hybrid CPUs are properly detected and represented when using the HWLOC topology method. Differential Revision: https://reviews.llvm.org/D112270	2021-11-17 16:30:18 -06:00
Peyton, Jonathan L	286094af9b	[OpenMP][libomp] Improve Windows Processor Group handling within topology The current implementation of Windows Processor Groups has a separate topology method to handle them. This patch deprecates that specific method and uses the regular CPUID topology method by default and inserts the Windows Processor Group objects in the topology manually. Notes: * The preference for processor groups is lowered to a value less than socket so that the user will see sockets in the KMP_AFFINITY=verbose output instead of processor groups when sockets=processor groups. * The topology's capacity is modified to handle additional topology layers without the need for reallocation. * If a user asks for a granularity setting that is "above" the processor group layer, then the granularity is adjusted "down" to the processor group since this is the coarsest layer available for threads. Differential Revision: https://reviews.llvm.org/D112273	2021-11-17 16:29:01 -06:00
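
Note on commit 9584c6fa2f above: its message describes a fallback for targets without native events, where the event is a flag with the values 0 and 1 and `waitEvent` busy loops while the flag is still 0. The following is a minimal sketch of that idea only, not the code from the patch. `FallbackEvent`, `fulfillEvent`, the `waitEvent` signature, and `runFallbackDemo` are hypothetical names chosen for illustration, and the use of `std::atomic` with release/acquire ordering is an assumption about how such a flag could be made safe across host threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a target event: a flag with two values, 0 and 1.
struct FallbackEvent {
  std::atomic<int> Flag{0};
};

// Hypothetical helper: mark the event as fulfilled, e.g. once the data
// movement it guards has completed. The release store publishes all writes
// made before it to any thread that later observes Flag == 1.
inline void fulfillEvent(FallbackEvent &E) {
  E.Flag.store(1, std::memory_order_release);
}

// Busy loop while the flag is still 0, as the commit message suggests.
// The acquire load pairs with the release store in fulfillEvent.
inline void waitEvent(const FallbackEvent &E) {
  while (E.Flag.load(std::memory_order_acquire) == 0)
    std::this_thread::yield(); // be polite to other threads while spinning
}

// Small demonstration: a producer thread writes a payload and then fulfills
// the event; the caller waits on the event before reading the payload.
inline int runFallbackDemo() {
  FallbackEvent E;
  int Payload = 0;
  std::thread Producer([&] {
    Payload = 42;    // stands in for the asynchronous data movement
    fulfillEvent(E); // signal completion
  });
  waitEvent(E);      // returns only after the flag becomes 1
  assert(E.Flag.load() == 1);
  int Seen = Payload;
  Producer.join();
  return Seen;
}
```

A spin loop like this trades CPU time for simplicity; a real plugin would presumably prefer a native event or a blocking primitive where the target offers one.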


2498 Commits