Commit Graph

2498 Commits

Author SHA1 Message Date
Joseph Huber fdbb15355e [Libomptarget][CUDA] Check CUDA compatibilty correctly
We recently added support for multi-architecture binaries in
libomptarget. This is done by extracting the architecture from the
embedded image and comparing it with the major and minor version
supported by the current CUDA installation. Previously we just compared
these directly, which was not correct for binary compatibility. The CUDA
documentation states that we can consider any image with an equivalent
major or a greater or equal to minor compatible with the current image.
Change the check to use this new logic in the CUDA plugin.

Fixes #57049

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D131567
2022-08-10 11:15:27 -04:00
Ron Lieberman 9ff0cc7e0f [openmp] Fix enumeration build issue for openmp library
integer value 40962 is outside the valid range of values [0, 31] for this enumeration type [-Wenum-constexpr-conversion]` (Issue #57022)

turn on -Wno-enum-constexpr-conversion to buy some time to fix the more egregious issue in hsa_agent_into_t and hsa_amd_agent_info_t interfaces.

relates to https://reviews.llvm.org/D131307/new/

Differential Revision: https://reviews.llvm.org/D131477
2022-08-09 10:25:03 +00:00
Fangrui Song 0972a390b9 LLVM_FALLTHROUGH => [[fallthrough]]. NFC 2022-08-09 04:06:52 +00:00
Jon Chesterfield 521a5c11ac Rename OPENMP_HAVE_STD_CPP14_FLAG to match c++17 2022-08-08 17:07:45 +01:00
Ron Lieberman af28b27d31 Move openmp from -std=c++14 to -std=c++17 2022-08-08 16:04:57 +00:00
Jon Chesterfield 104f11630a [nfc][openmp] clang-format system.cpp prior to D131401 2022-08-08 16:24:34 +01:00
Shilei Tian 294bbdc0b8 [NFC] Fix wrong header in `LibC.cpp` 2022-08-04 23:54:07 -04:00
Shilei Tian 459e3c5184 [OpenMP] Fix the test case issue that printf cannot be used in target region for AMDGPU 2022-08-04 14:48:48 -04:00
Shilei Tian db5a2afa62 [OpenMP][DeviceRTL] Implement libc function `memcmp`
We will add some simple implementation of libc functions starting from
this patch, and the first one is `memcmp`, which is reported in #56929. Note that
`malloc` and `free` are not included in this patch because of the use of
`declare variant`. In the near future we will implement the two functions w/o
using any vendor provided function.

This fixes #56929.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D131182
2022-08-04 14:37:54 -04:00
Joseph Huber b3335e8ed7 [Libomptarget][NFC] Clang format the AMDGPU plugin
Summary:
A previous patch did not format the plugin again after making changes.
Ensure that libomptarget stays formatted.
2022-08-03 15:18:16 -04:00
Joseph Huber 2b7203a359 [Libomptarget] Deinitialize AMDGPU global state more intentionally
A previous patch made the destruction of the HSA plugin more
deterministic. However, there were still other global values that are not
handled this way. When attempting to call a destructor kernel, the
device would have already been uninitialized and we could not find the
appropriate kernel to call. This is because they were stored in global
containers that had their destructors called already. Merges this global
state into the rest of the info state by putting those global values
inside of the global pointer already allocated and deallocated by the
constructor and destructor. This should allow the AMDGPU plugin to
correctly identify the destructors if we were to run them.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131011
2022-08-02 18:24:39 -04:00
Jonathan Peyton 9cf6511bff [OpenMP][libomp] Detect if test compiler has omp.h
omp50_taskdep_depobj.c relies on the test compiler's omp.h file.
If the test compiler does not have an omp.h file, then use the one
within the build tree.

Fixes: https://github.com/llvm/llvm-project/issues/56820
Differential Revision: https://reviews.llvm.org/D131000
2022-08-02 17:05:56 -05:00
Martin Storsjö 3f25ad335b [OpenMP] Fix warnings about unused expressions when OMPT_LOOP_DISPATCH is a no-op. NFC.
This fixes warnings like these:

../runtime/src/kmp_dispatch.cpp:2159:24: warning: left operand of comma operator has no effect [-Wunused-value]
    OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
                       ^~~~~
../runtime/src/kmp_dispatch.cpp:2159:31: warning: left operand of comma operator has no effect [-Wunused-value]
    OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
                              ^~~~~
../runtime/src/kmp_dispatch.cpp:2159:46: warning: left operand of comma operator has no effect [-Wunused-value]
    OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
                                     ~~~~~~~ ^~
../runtime/src/kmp_dispatch.cpp:2159:50: warning: expression result unused [-Wunused-value]
    OMPT_LOOP_DISPATCH(*p_lb, *p_ub, pr->u.p.st, status);
                                                 ^~~~~~
2022-08-02 11:16:23 +03:00
Martin Storsjö 7f24fd26a8 [OpenMP] Only include CMAKE_DL_LIBS on unix platforms
CMAKE_DL_LIBS is documented as "Name of library containing dlopen
and dlclose".

On Windows platforms, there's no system provided dlopen/dlclose, but
it can be argued that if you really intend to call dlopen/dlclose,
you're going to be using a third party compat library like
https://github.com/dlfcn-win32/dlfcn-win32, and CMAKE_DL_LIBS should
expand to its name.

This has been argued upstream in CMake in
https://gitlab.kitware.com/cmake/cmake/-/issues/17600 and
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1642, that
CMAKE_DL_LIBS should expand to "dl" on mingw platforms.

The merge request wasn't merged though, as it caused some amount of
breakage, but in practice, Fedora still carries a custom CMake patch
with the same effect.

Thus, this patch fixes cross compiling OpenMP for mingw targets
on Fedora with their custom-patched CMake.

Differential Revision: https://reviews.llvm.org/D130892
2022-08-02 10:56:30 +03:00
Joseph Huber 5afb5312a0 [Libomptarget][NFC] Remove unused CMake file
Summary:
This file is no longer used, get rid of it.
2022-08-01 16:21:53 -04:00
Joseph Huber 51bda3a0e7 [Libomptarget] Replace std::vector with llvm::SmallVector
The runtime makes some use of `std::vector` data structures. We should
be able to replace these trivially with `llvm::SmallVector` instead.
This should allow us to avoid heap allocations in the majority of cases
now.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130927
2022-08-01 15:59:15 -04:00
Michał Górny eb4612ca23 [openmp] [test] Fix prepending config.library_dir to LD_LIBRARY_PATH
Fix the LD_LIBRARY_PATH prepending order to make sure that
config.library_path ends up before any potentially-system directories
(e.g. config.hwloc_library_dir).  This makes sure that we are testing
against the just-built openmp libraries rather than the version that is
already installed.

Also rename the function to `prepend_*` to make it clearer what it
actually does.

https://github.com/llvm/llvm-project/issues/56821
Differential Revision: https://reviews.llvm.org/D130825
2022-08-01 18:54:06 +02:00
Joseph Huber 1d03b2efcd [Libomptarget] Disable testing map_back_race.cpp
This test hasn't been fixed and causes spurious failures when testing.
This patch sets it as unsupported until we have a reliable fix.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D130789
2022-07-30 15:01:47 -04:00
tlattner a140f43431 Update references to mailing lists that have moved to Discourse. 2022-07-29 15:55:38 -07:00
tlattner 520d29f381 Update references to mailing lists that have moved to Discourse. 2022-07-28 16:54:58 -07:00
Jon Chesterfield ed0f218115 [openmp][amdgpu] Tear down amdgpu plugin accurately
Moves DeviceInfo global to heap to accurately control lifetime.
Moves calls from libomptarget to deinit_plugin later, plugins need to stay
alive until very shortly before libomptarget is destructed.

Leaving the deinit_plugin calls where initially inserted hits use after
free from the dynamic_module.c offloading test (verified with valgrind
 that the new location is sound with respect to this)

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130714
2022-07-28 20:00:03 +01:00
Jon Chesterfield c214cb6a68 [amdgpu][openmp][nfc] Restore stb_local on DeviceInfo symbol 2022-07-28 16:50:46 +01:00
Jon Chesterfield 75aa521064 [openmp][amdgpu] Move global DeviceInfo behind call syntax prior to using D130712 2022-07-28 16:40:42 +01:00
Jon Chesterfield 1f9d3974e4 [openmp] Introduce optional plugin init/deinit functions
Will allow plugins to migrate away from using global variables to
manage lifetime, which will fix a segfault discovered in relation to D127432

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D130712
2022-07-28 16:21:38 +01:00
Sebastian Neubauer 50716ba2b3 [CMake][OpenMP] Remove wrong backslash
outdir is defined in the line above, it will not exist in the install
command, so it should not be escaped.
2022-07-28 14:35:04 +02:00
Joseph Huber b08369f7f2 Revert "[OpenMP] Remove noinline attributes in the device runtime"
The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regression in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but
will try to fix it more correctly on main.

This reverts commit d61d72dae6.

Fixes #56752
2022-07-27 11:09:18 -04:00
Tom Stellard 809855b56f Bump the trunk major version to 16 2022-07-26 21:34:45 -07:00
John Ericson 28e665fa05 [cmake] Slight fix ups to make robust to the full range of GNUInstallDirs
See https://cmake.org/cmake/help/v3.14/module/GNUInstallDirs.html#result-variables for `CMAKE_INSTALL_FULL_*`

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D130545
2022-07-26 14:48:49 +00:00
Saiyedul Islam 4075a811ad
[Libomptarget] Add checks for AMDGPU TargetID using new image info
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.

Depends on D127432

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127769
2022-07-26 02:44:31 -05:00
Joseph Huber 8c626fc0c8 [Libomptarget] Reintroduce host architecture checks for device RTL
A previous patch removed the need to set the auxiliary architecture as
it was no longer needed for the clang invocation after moving to using
the clang frontend. However, this had a second use of preventing
unsupported host architectures from building the device runtime. This
caused failures when trying to build on 32-bit hosts for example.

Fixes #56699

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130509
2022-07-25 17:01:12 -04:00
Joseph Huber d61d72dae6 [OpenMP] Remove noinline attributes in the device runtime
We previously used the `noinline` attributes to specify some defintions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, Since the changes in
D130298, we not explicitly state which functions will have external
visiblity in the bitcode library. Additionally the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spend on scanning funciton names for noinline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130368
2022-07-25 15:44:50 -04:00
Saiyedul Islam 4cf30c5157
Revert "Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"""
This reverts commit 281eb9223c.
2022-07-25 11:35:37 -05:00
Saiyedul Islam 281eb9223c
Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info""
This reverts commit 8cbf4a386b.
2022-07-25 08:32:26 -05:00
Saiyedul Islam 8cbf4a386b
Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"
This reverts commit 471f2abc62.
2022-07-25 05:32:59 -05:00
Saiyedul Islam 471f2abc62
[Libomptarget] Add checks for AMDGPU TargetID using new image info
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.

Depends on D127432

Differential Revision: https://reviews.llvm.org/D127769
2022-07-25 04:44:36 -05:00
Shilei Tian b95d31a849 [OpenMP][Offloading] Enlarge the work size of `wtime.c` in case of any noise 2022-07-22 16:03:39 -04:00
Joel E. Denny cfa6e79df3 [Libomptarget] Don't report lack of CUDA devices
Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics
about a lack of CUDA devices before an application runs:

```
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c
$ ./a.out
CUDA error: Error returned from cuInit
CUDA error: no CUDA-capable device is detected
Hello World: 4
```

This can happen when the CUDA plugin was built but all CUDA devices
are currently disabled in some manner, perhaps because
`CUDA_VISIBLE_DEVICES` is set to the empty string.  As shown in the
above example, it can even happen when we haven't compiled the
application for offloading to CUDA.

The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp`
appears to be intended to handle this case, and it chooses not to
write a diagnostic to stderr unless debugging is enabled:

```
if (NumberOfDevices == 0) {
  DP("There are no devices supporting CUDA.\n");
  return;
}
```

The problem is that the above code is never reached because the
earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`.  This patch handles
that `cuInit` case in the same manner as the above code handles the
`NumberOfDevices == 0` case.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130371
2022-07-22 14:46:45 -04:00
Shilei Tian 0c86c4f50c [OpenMP] Fix test error introduced in D130179 2022-07-22 14:16:47 -04:00
Shilei Tian 602e0eb9f0 [OpenMP][DeviceRTL] Fix the issue that multiple calls to `omp_get_wtime` is optimized out by mistake
Multiple calls to `omp_get_wtime` could be optimized out due to the function
is mistakenly marked as `readnone`. This patch fixes the issue, and also add the
support to run optimization on `libomptarget` tests.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130179
2022-07-22 13:46:45 -04:00
Shilei Tian 77cb30e3a6 Revert "[OpenMP][DeviceRTL] Fix the issue that multiple calls to `omp_get_wtime` is optimized out by mistake"
This reverts commit ad34f1dba8.
2022-07-22 11:45:13 -04:00
Shilei Tian ad34f1dba8 [OpenMP][DeviceRTL] Fix the issue that multiple calls to `omp_get_wtime` is optimized out by mistake
Multiple calls to `omp_get_wtime` could be optimized out due to the function
is mistakenly marked as `readnone`. This patch fixes the issue, and also add the
support to run optimization on `libomptarget` tests.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130179
2022-07-22 11:43:30 -04:00
Joseph Huber a3804a3145 [Libomptarget] Make the plugins link as LLVM libraries
Previously we made `libomptarget` link as an LLVM library so we have
access to the LLVM core libraries. After the initial patch stuck we can
now apply the same changes to the plugins. This will allow us to use
LLVM in all of `libomptarget` when we have uses for them. In the future
this should allow us to remove the dependencies on `libelf`, `libffi`,
and `dl`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D130262
2022-07-22 09:34:12 -04:00
Joseph Huber 908054df4f [Libomptarget] Only export needed definitions in the BC library
This patch adds the use of the `-internalize-public-api-file` option in
the internalization pass to internalize any definition that isn't
explicitly needed for the interface. This will allow us to perform more
optimizations on the file that normally would not have been possible
with functions internal to the library not being internal.

Depends on D130293

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D130298
2022-07-22 08:24:35 -04:00
Joseph Huber e82e07d74a [Libomptarget] Build the DeviceRTL BC using clang directly
Currently the bitcode library is build using the clang front-end
manually. This was originally done because we did not support device
only compilation. Now we support device only compilation, at least for a
single offloading toolchain, so we can instead use clang directly rather
than using the front-end. This saves us needing to define things like
`aux_triple`.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130293
2022-07-22 08:24:29 -04:00
Ron Lieberman 45a379ce2f Revert "[Libomptarget] Stop testing CPU offloading with LTO"
This reverts commit 3e8d46921f.
2022-07-22 12:10:06 +00:00
Ye Luo 4794bbffb2 Revert "[OpenMP][OMPD] GDB plugin code to leverage libompd to provide debugging"
This reverts commit 51d3f421f4.
2022-07-21 22:00:33 -05:00
Ye Luo ee95be3c46 Revert "Fixing build bot failure due to python-pip unavailability."
This reverts commit 9dc0d6aaa1.
2022-07-21 22:00:32 -05:00
Johannes Doerfert 1da6ae4b54 [OpenMP][FIX] Ensure thread and team state are defined properly
The namespaces were missing causing the symbols to have "C" mangling.
To avoid this in the future we qualify the names now fully.
2022-07-21 21:57:14 -05:00
Joseph Huber 3e8d46921f [Libomptarget] Stop testing CPU offloading with LTO
Summary:
Some of the buildbots don't find the libraries because they don't build
for the GPU. Although it should always be there it's unclear why these
buildbots are having problemsd. LTO is only interesting on the GPU and
these tests take extra time anyway so I'm just going to disable them for
now.
2022-07-21 16:47:41 -04:00
John Ericson 07b749800c [cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.

Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.

@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.

Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.

To avoid the problems I face, and restore symmetry, I deleted the
exported and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.

For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would good to confirm I did that correctly.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117977
2022-07-21 19:04:00 +00:00
Johannes Doerfert d150152615 [OpenMP] Introduce more fine-grained control over the thread state use
We can help optimizations by making sure we use the team state whenever
it is clear there is no thread state. To this end we introduce a new
state flag (`state::HasThreadState`) and explicit control for the
`state::ValueRAII` helpers, including a dedicated "assert equal".

Differential Revision: https://reviews.llvm.org/D130113
2022-07-21 12:30:38 -05:00
Johannes Doerfert 7472b42b78 [OpenMP] Use Undef instead of null as pointer for inactive lanes
Our conditional writes in the runtime look like this:
```
  if (active)
    *ptr = value;
```
In the RAII we need to assign `ptr` which comes from a lookup call.
If a thread that is not the main thread calls lookup with the intention
to write the pointer, we'll create a new thread state. As such, we need
to avoid calling lookup for inactive threads. We used to use `nullptr`
as their `ptr` value but that can cause pessimistic reasoning. We now
use `undef` instead.

Differential Revision: https://reviews.llvm.org/D130114
2022-07-21 12:28:45 -05:00
Johannes Doerfert a42361dc1c [OpenMP] Expose the state in the header to allow non-lto optimizations
We used to inline the `lookup` calls such that the runtime had "known"
access offsets when it was shipped. With the new static library build it
doesn't as the lookup is an indirection we cannot look through. This
should help us optimize the code better until we can do LTO for the
runtime again.

Differential Revision: https://reviews.llvm.org/D130111
2022-07-21 12:28:44 -05:00
Joseph Huber e01ce4e88a [Libomptarget] Add checks for CUDA subarchitecture using new info
This patch extends the `is_valid_binary` routine to also check if the
binary's architecture string matches the one parsed from the runtime.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
CUDA.

Depends on D127432

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D127505
2022-07-21 13:20:06 -04:00
Joseph Huber fbcb1ee7f3 [Libomptarget] Add support for offloading binaries in libomptarget
The previous path changed the linker wrapper to embed the offloading
binary format inside the target image instead. This will allow us to
more generically bundle metadata with these images, such as requires
clauses or the target architecture it was compiled for.

I wasn't sure how to handle this best, so I introduced a new type that
replaces the old `__tgt_device_image` struct that we can expand inside
the runtime library. I made the new `__tgt_device_binary` struct pretty
much the same for now. In the future we could change this struct to
pretty much be the `OffloadBinary` class in the future.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127432
2022-07-21 13:20:04 -04:00
Joseph Huber 5d8a76feb0 [Libomptarget] Build the device library even if the sm list is empty
We previously had some logic that stopped us from building the device runtime if
there were no NVPTX architectures provided. This is incorrect because we could
have AMDGPU libraries. Even if the lists are empty we should be able to attempt
to build these and get dummy output. THis wilil make it much easier for our
tooling which expects certain libraries. If the user wishes to disable the
library entirely they should use `-DLIBOMPTARGET_BUILD_DEVICERTL_BCLIB=OFF"

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130266
2022-07-21 10:57:47 -04:00
Joseph Huber dc52712a06 [Libomptarget] Make libomptarget an LLVM library
This patch makes libomptarget depend on LLVM libraries to be built. The
reason for this is because we already have an implicit dependency on
LLVM headers for ELF identification and extraction as well as an
optional dependenly on the LLVMSupport library for time tracing
information. Furthermore, there are changes in the future that require
using more LLVM libraries, and will heavily simplify some future code as
well as open up the large amount of useful LLVM libraries to
libomptarget.

This will make "standalone" builds of `libomptarget' more difficult for
vendors wishing to ship their own. This will require a sufficiently new
version of LLVM to be installed on the system that should be picked up
by the existing handling for the implicit headers.

The things this patch changes are as follows:
  - `libomptarget.so` links against LLVMSupport and LLVMObject
  - `libomptarget.so` is a symbolic link to `libomptarget.so.15`
  - If using a shared library build, user applications will depend on LLVM
    libraries as well
  - We can now use LLVM resources in Libomptarget.

Note that this patch only changes this to apply to libomptarget itself,
not the plugins. Additional patches will be necessary for that.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D129875
2022-07-20 15:58:06 -04:00
Joseph Huber b5b20164d2 Revert "[Libomptarget] Make libomptarget an LLVM library"
This reverts commit 643dfd97d5.

This patch still makes the AMDGPU buildbots unhappy. Reverting for now
until the AMD folks figure it out.
2022-07-20 10:18:55 -04:00
Joseph Huber 6b0db92bbd [Libomptarget] Fix LTO command line in test
Summary:
The test passed -offload-lto instead of -foffload-lto.
2022-07-20 10:18:55 -04:00
Joseph Huber 643dfd97d5 [Libomptarget] Make libomptarget an LLVM library
This patch makes libomptarget depend on LLVM libraries to be built. The
reason for this is because we already have an implicit dependency on
LLVM headers for ELF identification and extraction as well as an
optional dependenly on the LLVMSupport library for time tracing
information. Furthermore, there are changes in the future that require
using more LLVM libraries, and will heavily simplify some future code as
well as open up the large amount of useful LLVM libraries to
libomptarget.

This will make "standalone" builds of `libomptarget' more difficult for
vendors wishing to ship their own. This will require a sufficiently new
version of LLVM to be installed on the system that should be picked up
by the existing handling for the implicit headers.

The things this patch changes are as follows:
  - `libomptarget.so` links against LLVMSupport and LLVMObject
  - `libomptarget.so` is a symbolic link to `libomptarget.so.15`
  - If using a shared library build, user applications will depend on LLVM
    libraries as well
  - We can now use LLVM resources in Libomptarget.

Note that this patch only changes this to apply to libomptarget itself,
not the plugins. Additional patches will be necessary for that.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D129875
2022-07-20 09:52:09 -04:00
Jonathan Peyton 40ce65b5b2 [OpenMP][libomp] Fix affinity warnings and unify under one macro
Warnings that occur during affinity initialization are supposed
to be guarded by KMP_AFFINITY=nowarnings,noverbose, but some had been
missed by this logic. Create one macro for affinity warnings that takes
these settings into account.

Differential Revision: https://reviews.llvm.org/D125991
2022-07-19 13:10:25 -05:00
AndreyChurbanov 17dcde5f1b [OpenMP][libomp] Allow reset affinity mask after parallel
Added control to reset affinity of primary thread after outermost parallel
region to initial affinity encountered before OpenMP runtime was initialized.
KMP_AFFINITY environment variable reset/noreset modifier introduced.
Default behavior is unchanged.

Differential Revision: https://reviews.llvm.org/D125993
2022-07-19 13:05:05 -05:00
Jonathan Peyton 28c8da2965 [OpenMP][libomp] Fix fallthrough attribute detection for Intel compilers
icc does not properly detect lack of fallthrough attribute since it
defines __GNU__ > 7 and also icc's __has_cpp_attribute/__has_attribute
feature detectors do not properly detect the lack of fallthrough attribute.

Differential Revision: https://reviews.llvm.org/D126001
2022-07-19 13:04:25 -05:00
AndreyChurbanov a01d274fbd [OpenMP][libomp] Fix /dev/shm pollution after forked child process terminates
Made library registration conditional and skip it in the __kmp_atfork_child
handler, postponed it till middle initialization in the child.
This fixes the problem of applications those use e.g. popen/pclose
which terminate the forked child process.

Differential Revision: https://reviews.llvm.org/D125996
2022-07-19 12:59:58 -05:00
Jon Chesterfield e46f727b38 Revert "[Libomptarget] Make libomptarget an LLVM library"
This reverts commit 70039be627.
2022-07-19 17:59:45 +01:00
Joseph Huber 70039be627 [Libomptarget] Make libomptarget an LLVM library
This patch makes libomptarget depend on LLVM libraries to be built. The
reason for this is because we already have an implicit dependency on
LLVM headers for ELF identification and extraction as well as an
optional dependenly on the LLVMSupport library for time tracing
information. Furthermore, there are changes in the future that require
using more LLVM libraries, and will heavily simplify some future code as
well as open up the large amount of useful LLVM libraries to
libomptarget.

This will make "standalone" builds of `libomptarget' more difficult for
vendors wishing to ship their own. This will require a sufficiently new
version of LLVM to be installed on the system that should be picked up
by the existing handling for the implicit headers.

The things this patch changes are as follows:
  - `libomptarget.so` links against LLVMSupport and LLVMObject
  - `libomptarget.so` is a symbolic link to `libomptarget.so.15`
  - If using a shared library build, user applications will depend on LLVM
    libraries as well
  - We can now use LLVM resources in Libomptarget.

Note that this patch only changes this to apply to libomptarget itself,
not the plugins. Additional patches will be necessary for that.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D129875
2022-07-19 12:33:31 -04:00
Joseph Huber cdea437057 [Libomptarget] Fix warnings on address space attributes
The device runtime uses the address space attribute to control the
placement of important constants on the GPU. The changes made in D126061
caused these to start emitting errors as they were not applied to the
type. This patch fixes the issues to make the warnings go away.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D129896
2022-07-15 17:21:30 -04:00
Joseph Huber 1f940b69c3 [Libomptarget][NFC] Fix signed comparison warnings
Summary:
Non-functional change, just fixing some sign comparison warnings by
making both match.
2022-07-15 13:22:55 -04:00
Shilei Tian 65ebcee197 [OpenMP] Ignore .eggs file in OpenMP
The OMPD patches introduces GDB plugin. When it is built, it will create a
coulple of temp files in `.eggs`. This patch add it into `.gitignore` in case it
messed up the git tracking.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D129711
2022-07-14 12:06:50 -04:00
Joseph Huber b1d574867d [Libomptarget] Allow static assert to work on 32-bit systems
Summary:
We use a static assert to make sure that someone doesn't change the size
of an argument struct without properly updating all the other logic.
This originally only checked the size on a 64-bit system with 8-byte
pointers, causing builds on 32-bit systems to fail. This patch allows
either pointer size to work.

Fixes #56486
2022-07-12 08:05:01 -04:00
Vignesh Balasubramanian 9dc0d6aaa1 Fixing build bot failure due to python-pip unavailability.
commit: 51d3f421f4
failed due to missing python-pip om machine.
Now the ompd gdb-plugin code will be skipped with a warning
if pip is not available in the machine.
2022-07-12 16:01:59 +05:30
Vignesh Balasubramanian 51d3f421f4 [OpenMP][OMPD] GDB plugin code to leverage libompd to provide debugging
support for OpenMP programs.

This is 5th of 6 patches started from https://reviews.llvm.org/D100181
This plugin code, when loaded in gdb, adds a few commands like
ompd icv, ompd bt, ompd parallel.
These commands create an interface for GDB to read the OpenMP
runtime through libompd.

Reviewed By: @dreachem
Differential Revision: https://reviews.llvm.org/D100185
2022-07-12 14:38:41 +05:30
Shilei Tian e7d998e51e [NFC][OpenMP][Offloading] Fix compilation warning caused by misuse of `static_cast` 2022-07-08 20:59:37 -04:00
Joseph Huber 269d5c16bc [Libomptarget][NFC] Move legacy functions to a separate file
This patch moves the old legacy interfaces into `libomptarget` to a
separate file. These do not need to be included anywhere and are simply
provided for backwards compatibility with the ABI. This cleans up the
interface greatly.

Depends on D128817

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D128818
2022-07-08 14:44:21 -04:00
Joseph Huber c9353eb4bc [Libomptarget] Use new tripcount argument in the runtime.
The previous patch added an argument to the `__tgt_target_kernel`
runtime function which includes the tripcount used for the loop clause.
This was originally passed in via the `__kmpc_push_target_tripcount`
function. Now we move this logic to the kernel launch itself and remove
the need for the push function.

Depends on D128816

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D128817
2022-07-08 14:44:19 -04:00
Joseph Huber ad23e4d85f [Libomptarget] Implement a unified kernel entry function
This patch implements a unified kernel entry function that will be
targeted from both teams and non-teams clauses. We introduce a new
interface and make the old functions call in using the new one. A
following patch will include the necessary changes to Clang to call
these new functions instead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D128549
2022-07-08 14:44:06 -04:00
Ye Luo fca79b78c4 [libomptarget] compile DeviceRTL bc files with -O3
bc files of DeviceRTL are compiled with -O3, the same as the static library.

Differential Revision: https://reviews.llvm.org/D129344
2022-07-08 10:00:26 -05:00
Vadim Paretsky 43d5c4d539 [OpenMP] add 4 custom APIs supporting MSVC OMP codegen
This check-in adds 4 APIs to support MSVC, specifically:

* 3 APIs (__kmpc_sections_init, __kmpc_next_section,
   __kmpc_end_sections) to support the dynamic scheduling of OMP sections.
* 1 API (__kmpc_copyprivate_light, a light-weight version of
  __kmpc_copyrprivate) to support the OMP single copyprivate clause.

Differential Revision: https://reviews.llvm.org/D128403
2022-07-05 17:26:18 -05:00
Joseph Huber d27d0a673c [Libomptarget][NFC] Make Libomptarget use the LLVM naming convention
Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly large clash
between the naming conventions used. This patch fixes most of the
variable names that did not confrom to the LLVM standard, that is
`VariableName` for variables and `functionName` for functions.

This patch was primarily done using my editor's linting messages, if
there are any issues I missed arising from the automation let me know.

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D128997
2022-07-05 14:53:38 -04:00
Shilei Tian 696bca9bb2 [NFC][OpenMP][CUDA] Remove unnecessary default label 2022-07-01 09:50:29 -04:00
Jose M Monsalve Diaz 616dd9ae14 [OpenMP] Implementing omp_get_device_num()
This patch implements omp_get_device_num() in the host and the device.

It uses the already existing getDeviceNum in the device config for the device.
And in the host it uses the omp_get_num_devices().

Two simple tests added

Differential Revision: https://reviews.llvm.org/D128347
2022-06-29 02:18:21 -05:00
Shilei Tian 2695e23ad9 [OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work
This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122764
2022-06-28 15:32:03 -04:00
Daniel Douglas d4a7b8de52 [OpenMP][libomp] avoid spin wait and yield on arm64 macOS
This patch changes the default behavior to avoid spin waiting and
yielding. (See “Don’t Keep Threads Active And Idle” section here:
https://developer.apple.com/documentation/apple-silicon/tuning-your-code-s-performance-for-apple-silicon)

We verified using instruments traces that the changes improve scheduling
behavior on macOS.

We also collected results using EPCC schedbench
(https://github.com/LangdalP/EPCC-OpenMP-micro-benchmarks) that are
attached here that show a reduction in standard deviation and max test
run time across all scheduling types. Static scheduling sees dramatic
improvements with these changes, we see a 2-4x average runtime
improvement in the benchmark.

Differential Revision: https://reviews.llvm.org/D126510
2022-06-24 12:02:16 -05:00
Jonathan Peyton b7b4986576 [OpenMP][libomp] Hold old __kmp_threads arrays until library shutdown
When many nested teams are formed, __kmp_threads may be reallocated
to accommodate new threads. This reallocation causes a data
race when another existing team's thread simultaneously references
__kmp_threads. This patch keeps the old thread arrays around until library
shutdown so these lingering references can complete without issue and
access to __kmp_threads remains a simple array reference.

Fixes: https://github.com/llvm/llvm-project/issues/54708
Differential Revision: https://reviews.llvm.org/D125013
2022-06-22 10:30:35 -05:00
Joseph Huber 3351ae61d9 [Libomptarget] Remove duplicate data environment exit
Summary:
This patch removes a duplicated exit from the OpenMP data envrionment.
We already have an RAII method that guards this environment so it is
unnecessary.
2022-06-21 22:35:32 -04:00
Ye Luo 4d9499e8cc [libomptarget] Make libomptarget.devicertl.a built in all cases.
Make libomptarget.device.a built when using -DLLVM_ENABLE_PROJECTS=openmp
Use add_custom_command.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D128130
2022-06-20 08:29:16 -05:00
Ye Luo 54b45afb59 [libomptarget]Add a trap for external omptarget from LLVM
Old LLVM installation may expose its internal omptarget CMake target when being used by find_package(LLVM) and caused issues in the CMake of libomptarget that is being built. Trap the issue early.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D128129
2022-06-18 21:08:53 -05:00
Joseph Huber d87ca519c9 [Libomptarget] Use binutils archive executable to address failing tests
Summary:
The static linking test ensures that we can statically link offloading
programs. To create the test we used `llvm-ar`. However, this may not
exist in the user's environment. This patch changes it to use the
binutils `ar` which should exist on every system running these tests
currently. In the future we should set up the dependencies properly.
2022-06-14 22:14:17 -04:00
Joseph Huber d5d836635c [Libomptarget] Add test config for compiling in LTO-mode
We are planning on making LTO the default compilation mode for
offloading. In order to make sure it works we should run these tests on
the test suite. AMDGPU already uses the LTO compilation path for its
linking, but in LTO mode it also links the static library late.

Performing LTO requires the static library to be built, if we make the
change this will be a hard requirement and the old bitcode library will
go away. This means users will need to use either a two-step build or a
runtimes build for libomptarget.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127512
2022-06-14 10:16:03 -04:00
John Ericson 0bb317b7bf Revert "[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore"
This reverts commit d5daa5c5b0.
2022-06-10 19:26:12 +00:00
John Ericson d5daa5c5b0 [cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.

Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.

@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.

Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.

To avoid the problems I face, and restore symmetry, I deleted the
exported and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.

For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would good to confirm I did that correctly.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117977
2022-06-10 14:35:18 +00:00
Yuki Okushi 074f12e467
[OpenMP] Fix the build on Windows
The code expanded from kmp_barrier.h uses some `KMP_INTERNAL_*`s,
so the definitions have to be placed before it.

Fixes #55815

Differential Revision: https://reviews.llvm.org/D126873
2022-06-09 22:12:42 +09:00
Jose Manuel Monsalve Diaz 15ed5c0a07 [LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device

Device (1):
    This is a generic-elf-64bit device

Device (2):
    This is a generic-elf-64bit device

Device (3):
    This is a generic-elf-64bit device

Device (4):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           0
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (5):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           1
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (6):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           2
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (7):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           3
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE
```

Differential Revision: https://reviews.llvm.org/D126836
2022-06-09 11:58:39 +00:00
Jose Manuel Monsalve Diaz 84e020a061 Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"
This reverts commit d16a0877d8.
2022-06-09 10:46:03 +00:00
Jose Manuel Monsalve Diaz d16a0877d8 [LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device

Device (1):
    This is a generic-elf-64bit device

Device (2):
    This is a generic-elf-64bit device

Device (3):
    This is a generic-elf-64bit device

Device (4):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           0
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (5):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           1
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (6):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           2
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (7):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           3
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE
```

Differential Revision: https://reviews.llvm.org/D126836
2022-06-08 16:31:12 +00:00
Joseph Huber 86a4c78047 [Libomptarget] Add missing include to define `printf`
Summary:
This test was failing because of an implicit declaration of `printf`
which isn't legal with newer C, causing it to fail. This patch just adds
the necessary header.
2022-06-08 09:56:51 -04:00
Joseph Huber 421b1f55c6 [Libomptarget] Do not use retaining attributes for the static library
When we build the libomptarget device runtime library targeting bitcode,
we need special care to make sure that certain functions are not
optimized out. This is because we manually internalize and optimize
these definitions, ignoring their standard linkage semantics. When we
build with the static library, we can maintain these semantics and we do
not need these to be kept-alive. Furthermore, if they are kept-alive it
prevents them from being removed during LTO. This prevents us from
completely internalizing `IsSPMDMode` and removing several other
functions. This patch removes these for the static library target by
using a macro definition to enable them.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D126701
2022-06-07 12:16:34 -04:00
Vadim Paretsky f58fe2e186 [OpenMP] allow loc to be NULL in __kmp_determine_reduction_method for MSVC
MSVC may not supply source location information to kmpc_reduce passing
NULL for the value. The patch adds a check for the loc value being NULL
in kmp_determine_reduction_method.

Differential Revision: https://reviews.llvm.org/D126564
2022-06-03 14:11:39 -05:00
Daniel Douglas 5d25dbff67 [OpenMP][libomp] do not try to dlopen libmemkind on macOS
The memkind library is only available for linux. Calling dlopen here
can also be problematic in a client app that fork'ed.

Differential Revision: https://reviews.llvm.org/D126579
2022-06-02 14:28:09 -05:00
David CARLIER 2ba5d820e2 [OpenMP] omp_get_proc_id uses sched_getcpu fallback on FreeBSD 13.1 and above.
Reviewers: jlpeyton, jdoerfert

Reviewed-By: jlpeyton

Differential-Revision: https://reviews.llvm.org/D126408
2022-06-02 17:10:29 +01:00
Mikael Simberg e27ce28139 [OpenMP][libomp] Make LIBOMP_CONFIGURED_LIBFLAGS a list instead of string
When configuring llvm with the openmp subproject, the build for the omp
target fails if LIBOMP_CONFIGURED_LIBFLAGS contains more than one item.
LIBOMP_CONFIGURED_LIBFLAGS should be a semicolon-separated list instead
of a string with items separated by spaces.

Differential Revision: https://reviews.llvm.org/D125370
2022-06-02 10:50:21 -05:00
Joseph Huber f4f23de1a4 [Libomptarget] Add basic support for dynamic shared memory on AMDGPU
This patchs adds the arguments necessary to allocate the size of the
dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SIZE`
environment variable. This patch only allocates the memory, AMDGPU has a
limitation that shared memory can only be accessed from the kernel
directly. So this will currently only work with optimizations to inline
the accessor function.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D125252
2022-06-01 13:32:50 -04:00
Joseph Huber ae76652677 Revert "[Libomptarget] Add `leaf` attribute to `vprintf` declaration"
This is preventing users from calling `printf` on NVPTX code. Revert for
now until there is a fix.

This reverts commit eda4ef3add.
2022-05-31 10:24:04 -04:00
Joel E. Denny d2e3cb7374 [OpenMP][Clang] Fix atomic compare for signed vs. unsigned
Without this patch, arguments to the
`llvm::OpenMPIRBuilder::AtomicOpValue` initializer are reversed.

Reviewed By: ABataev, tianshilei1992

Differential Revision: https://reviews.llvm.org/D126619
2022-05-30 11:02:20 -04:00
Joel E. Denny 4a36813669 [OpenACC][OpenMP] Document atomic-in-teams extension
That is, put D126323 in the status doc and explain its relationship to
OpenACC support.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D126547
2022-05-27 18:53:19 -04:00
Joel E. Denny 48ca3a5ebb [OpenMP] Extend omp teams to permit nested omp atomic
OpenMP 5.2, sec. 10.2 "teams Construct", p. 232, L9-12 restricts what
regions can be strictly nested within a `teams` construct.  This patch
relaxes Clang's enforcement of this restriction in the case of nested
`atomic` constructs unless `-fno-openmp-extensions` is specified.
Cases like the following then seem to work fine with no additional
implementation changes:

```
 #pragma omp target teams map(tofrom:x)
 #pragma omp atomic update
 x++;
```

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D126323
2022-05-26 14:59:16 -04:00
Joseph Huber 20ec4161d7 [Libomptarget] Add branch prediction intrinsic to state check
Summary:
We usually used the `OMP_LIKELY` and `OMP_UNLIKELY` macros to add branch
prediction intrinsics to help the optimizer ignore unlikely loops. This
wasn't applied to this one loop so add that in.
2022-05-20 15:38:54 -04:00
Jonathan Peyton f613e6d19d [OpenMP][libomp] Fix accidental removal of else for core attributes 2022-05-19 14:00:27 -05:00
Joseph Huber eda4ef3add [Libomptarget] Add `leaf` attribute to `vprintf` declaration
Summary:
This patch adds the `leaf` attribute to the `vprintf` declaration in the
OpenMP runtime. This attribute allows us to determine that the `vprintf`
function will not call any functions within the translation unit,
allowing us to deduce `norecurse` attributes on the caller.
2022-05-19 14:22:53 -04:00
AndreyChurbanov c44ba01de7 [OpenMP] libomp: honor passive wait policy requested with tasking
Currently the library ignores requested wait policy in the presence
of tasking. Threads always actively spin. The patch fixes this problem
making the wait policy passive if this explicitly requested by user.

Differential Revision: https://reviews.llvm.org/D123044
2022-05-18 10:04:30 -05:00
Joseph Huber 5ffecd28c9 [Libomptarget] Don't build the device runtime without a new Clang
The OpenMP device offloading library is a bitcode library and thus only
expect to build and linked with the same version of clang that was used
to create it. This somewhat copmlicates the building process as we
require the Clang that was just built to be used to create the library.
This is either done with a two-step build, where OpenMP is built with
the Clang that was just installed, or through the
`-DLLLVM_ENABLE_RUNTIMES=openmp` option. This has always been the case,
but recent changes have caused this to make it difficult to build the
rest of OpenMP. This patchs adds a check to not build the OpenMP device
runtime if the current compiler is not Clang with the same version as
the LLVM installation. This should allow users to build OpenMP as a
project using any compiler without it erroring out due to the bitcode
library, but if users require it they will need to use the above methods
to compile it.

Reviewed By: jdoerfert, tianshilei1992, ye-luo

Differential Revision: https://reviews.llvm.org/D125698
2022-05-16 18:18:32 -04:00
Joseph Huber 54e02179b3 [Libomptarget] Build the static library without CUDA installed
Summary:
This patch allows users to compile the static library without CUDA
installed on the system. This requires the new flag `--cuda-feature` to
indicate that we need `+ptx61` in order to compile the runtime.
2022-05-13 16:30:58 -04:00
Joseph Huber 16b7a0b43b [Libomptarget] Build the device runtime as a static library
This patch adds the necessary CMake configuration to build a static
library version of the device runtime, `libomptarget.devicertl.a`.
Various improvements in how we handle static libraries and generating
offloading code should allow us to treat the device library as a regular
project without needing to invoke the clang front-end directly. Here we
generate a job for each offloading architecture supported. Each
offloading architecture will be embedded into the static library and
used as-needed by the host.

This library will primarily be used to replace the bitcode library when
performing LTO. Currently, we need to manually pass in the bitcode
library which requires foreknowledge of the offloading architecture.
This approach lets us handle that in the linker wrapper instead.
Furthermore this should improve our interface to the device runtime. We
can now build it fully under a release build and have all the expected
entry points, as well as supporting debug builds.

Depends on D125265 D125256 D125260 D125314 D125563

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D125315
2022-05-13 14:38:51 -04:00
Joseph Huber 9ffa945c40 [Libomptarget] Remove global include directory from libomptarget
We used to globally include the libomptarget include directory for all
projects. This caused some conflicts with the other files named
"Debug.h". This patch changes the cmake to include these files via the
target include instead.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D125563
2022-05-13 14:38:47 -04:00
Joseph Huber ce0caf41bd [Libomptarget] Address existing warnings in the device runtime library
This patche attemps to address the current warnings in the OpenMP
offloading device runtime. Previously we did not see these because we
compiled the runtime without the standard warning flags enabled.
However, these warnings are used when we now build the static library
version of this runtime. This became extremely noisy when coupled with
the fact the we compile each file roughly 32 times when all the
architectures are considered. So it would be ideal to not have all these
warnings show up when building.

Most of these errors were simply implicit switch-case fallthroughs,
which can be addressed using C++17's fallthrough attribute. Additionally
there was a volatile variable that was being casted away. This is most
likely safe to remove because we cast it away before its even used and
didn't seem to affect anything in testing.

Depends on D125260

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D125339
2022-05-13 14:38:31 -04:00
Joseph Huber b4f8443d97 [Libomptarget] Allow the device runtime to be compiled for the host
Currently the OpenMP offloading device runtime is only expected to be
compiled for the specific architecture it's targeting. This is
problematic if we want to make compiling the device runtime more general
via the standar `clang` driver rather than invoking the clang front-end
directly. This patch addresses this by primarily changing the declare
type to `nohost` so the host will not contain any of this code.
Additionally we forward declare the functions that are defined via
variants, otherwise these would cause problems on the host.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D125260
2022-05-13 14:38:27 -04:00
serge-sans-paille 40d3a0ba4d [openmp] Fix strict aliasing issue in cmpxchg routine
Avoid warning under -fstrict-aliasing by using a call to memcpy to perform type
punning.

Differential Revision: https://reviews.llvm.org/D125467
2022-05-12 16:14:48 +02:00
AndreyChurbanov 52d0ef3c00 [OpenMP] libomp: Add itt notifications to sync dependent tasks.
Intel Inspector uses itt notifications to analyze code execution, and it
reports race conditions in dependent tasks.
This patch fixes the issue notifying Inspector on tasks dependency
synchronizations.

Differential Revision: https://reviews.llvm.org/D123042
2022-05-05 11:30:59 -05:00
AndreyChurbanov 4a64bed216 [OpenMP] libomp: cleanup - remove duplicate check
The identical check remains 20 lines above in the code.

Differential Revision: https://reviews.llvm.org/D123046
2022-05-05 11:01:20 -05:00
AndreyChurbanov eed0d85152 [OpenMP] libomp: cleanup dead code
Differential Revision: https://reviews.llvm.org/D123047
2022-05-05 10:56:49 -05:00
Hansang Bae 7e23b46ab8 [OpenMP] Possible fix for sporadic test failure from loop_dispatch.c
This patch tries to fix sporadic test failure after the change
https://reviews.llvm.org/D122107.
Made the test wait until every thread has at least one loop iteration.

Differential Revision: https://reviews.llvm.org/D124812
2022-05-03 14:46:32 -05:00
Joseph Huber 5ad07ac400 [Libomptarget] Use entry name for global info
Currently, globals on the device will have an infinite reference count
and an unknown name when using `LIBOMPTARGET_INFO` to print the mapping
table. We already store the name of the global in the offloading entry
so we should be able to use it, although there will be no source
location. To do this we need to create a valid `ident_t` string from a
name only.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D124381
2022-04-25 09:56:43 -04:00
Ye Luo 8a880db519 [libomptarget] Make omp_target_is_present checks storage instead of zero length array.
Consider checking whether a pointer has been mapped can be achieved via omp_get_mapped_ptr.
omp_target_is_present is more needed to check whether the storage being pointed is mapped.
This restore the old behavior of omp_target_is_present before D123093
Fixes https://github.com/llvm/llvm-project/issues/54899

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D123891
2022-04-22 17:37:06 -05:00
Ye Luo 91ccd8248c [Clang][OpenMP] libompd: get libomp hwloc includedir by target_link_libraries
When hwloc is used and is installed outside of the default paths, the omp CMake target
needs to provide the needed include path thru the CMake target by adding it with
target_include_directories to it, so libompd gets it as well when it defines it's cmake
target using target_link_libraries.

As suggested in D122667

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D123888
2022-04-22 17:33:41 -05:00
Joseph Huber f557bb8733 [OpenMP][Docs] Remove usage of deprecated flag in documentation
Summary:
This documentation used the `-fopenmp-target-new-runtime` flag which is
deprecated and has no effect. Remove it.
2022-04-21 18:50:25 -04:00
Atmn Patel c44420e90d [Libomptarget][remote] Add OpenMP linker flag to the plugin
The remote offloading server and plugin rely on OpenMP, so this needs to be added as a linker flag. Without this, applications segfault.

Differential Revision: https://reviews.llvm.org/D124200
2022-04-21 15:45:29 -04:00
Atmn Patel 489894f363 [Libomptarget][remote] Fix compile-time error
This fixes a compile-time error recently introduced within the remote
offloading plugin. This patch also removes some extra linker flags that are unnecessary, and adds an explicit abseil linker flag without which we occasionally get problems.

Differential Revision: https://reviews.llvm.org/D119984
2022-04-19 16:46:01 -04:00
Joseph Huber 80787213ea [Libomptarget] Fix test using old unsupported lit string
Summary:
One test had an old "unsupported" string that used the old `newDriver`
string which was removed. This test should be updated to use the
`oldDriver` one instead.
2022-04-18 23:08:12 -04:00
Joseph Huber ae23be84cb [OpenMP] Make the new offloading driver the default
Previously an opt-in flag `-fopenmp-new-driver` was used to enable the
new offloading driver. After passing tests for a few months it should be
sufficiently mature to flip the switch and make it the default. The new
offloading driver is now enabled if there is OpenMP and OpenMP
offloading present and the new `-fno-openmp-new-driver` is not present.

The new offloading driver has three main benefits over the old method:
- Static library support
- Device-side LTO
- Unified clang driver stages

Depends on D122683

Differential Revision: https://reviews.llvm.org/D122831
2022-04-18 15:05:09 -04:00
Joseph Huber ba01306009 [Libomptarget] Fix LIBOMPTARGET_INFO test
Summary:
A patch added a new line to one of the info outputs without updating
this test. This patch adds the new text to the existing test.
2022-04-18 14:09:02 -04:00
Dhruva Chakrabarti 7086a1db80 [libomptarget] [amdgpu] Hostcall offset check should consider implicit args
Fixed hostcall offset check to compare against kernarg segment size
and implicit arguments. Improved the corresponding debug print.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123827
2022-04-15 00:53:47 +00:00
Saiyedul Islam 54a6cc3405
[libomptarget][amdgpu] Add hidden_heap_v1 kernarg metadata
Code object version 5 adds support of hidden_heap_v1 kernarg
metadata field [1]. It is a global address space pointer to an
initialized memory buffer that conforms to the requirements of the
malloc/free device library V1 version implementation.

[1] https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v5

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D123527
2022-04-13 03:57:29 +00:00
Johannes Doerfert a3a42c3ca2 [OpenMP][FIX] Ensure to set the context for wait events if necessary
Differential Revision: https://reviews.llvm.org/D123445
2022-04-12 16:42:50 -05:00
Jonathan Peyton d49ce7c356 [OpenMP][libomp] Replace global variable references with local object
Remove references to global __kmp_topology within a kmp_topology_t
object method. There should just be implicit references to the
private object.
2022-04-12 12:50:41 -05:00
Jonathan Peyton 747a490612 [OpenMP][libomp] Fix some Doxygen issues
Fix spelling of variable names and remove accidental references (#)
in Doxygen comments.
2022-04-12 11:05:30 -05:00
Joseph Huber 2e0cb61570 [OpenMP] Fix linker error when building info tool
Summary:
The changes made in D123177 added new targets to the
`LIBOMPTARGET_TESTED_PLUGINS` variable which was linked against when
building the `llvm-omp-target-info` tool. This caused linker errors on
the export scripts. This patch removes that dependency, it still builds
and runs as expected so I will assume it's correct.
2022-04-08 10:50:31 -04:00
Ye Luo c1a6fe196d [libomptarget] Implement pointer lookup as 5.1 spec.
As described in 5.1 spec
2.21.7.2 Pointer Initialization for Device Data Environments

Reviewed By: RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D123093
2022-04-07 23:01:25 -05:00
Joseph Huber a3f423cf57 [OpenMP] Add dynamic memory function to omp.h and add documentation
This patch adds the `llvm_omp_target_dynamic_shared_alloc` function to
the `omp.h` header file so users can access it by default. Also changed
the name to keep it consistent with the other target allocators. Added
some documentation so users know how to use it. Didn't add the interface
for Fortran since there's no way to test it right now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D123246
2022-04-07 14:23:23 -04:00
Joseph Huber 840c040498 [OpenMP] Change target memory tests to use allocators
The target allocators have been supported for NVPTX offloading for
awhile. The tests should use the allocators instead of calling the
functions manually. Also the comments indicating these being a preview
should be removed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D123242
2022-04-07 14:23:14 -04:00
Michael Kruse 7fa7b0cbd8 [libomptarget] Add device RTL to regression test dependencies.
In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new targets new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin rtl targets `omptarget.rtl.cuda`, respectively `omptarget.rlt.amdgpu` ) are dependencies of the regression tests.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123177
2022-04-06 20:01:47 -05:00
Hansang Bae 090309d316 [OpenMP] Fix warnings
Silenced compiler warnings after pushing the following change.
https://reviews.llvm.org/D122107

Differential Revision: https://reviews.llvm.org/D123233
2022-04-06 12:35:05 -05:00
Hansang Bae e4ac11beb7 [OpenMP] Add support for ompt_callback_dispatch
This change adds support for ompt_callback_dispatch with the new
dispatch chunk type introduced in 5.2. Definitions of the new
ompt_work_loop types were also added in the header file.

Differential Revision: https://reviews.llvm.org/D122107
2022-04-06 08:01:02 -05:00
Jonathan Peyton d345fe7c22 [OpenMP][libomp] NFC: Move omp_* functions out of kmp_* section 2022-03-31 13:39:30 -05:00
Joachim Protze 7641e42def [OpenMP][Tools] Fix handling of initial-task-end
Latest OpenMP spec says parallel_data is NULL for initial/implicit-task-end.
We nevertheless need to cleanup the ParallelData here, as there is no other
callback for the end of the implicit parallel region. We can use the reference
stored in the TaskData.

Reviewed By: dreachem

Differential Revision: https://reviews.llvm.org/D114005
2022-03-31 12:33:40 -05:00
Ron Lieberman 95eac47260 [libomptarget] x86 offloading fails map_back_race.cpp intermittently
Differential Revision: https://reviews.llvm.org/D122658
2022-03-29 16:01:17 +00:00
Johannes Doerfert b803f06901 [OpenMP] The test does not have check lines 2022-03-29 00:02:55 -05:00
Johannes Doerfert b309bdb970 [OpenMP][FIX] Use clang++ for the C++ test case 2022-03-28 23:14:24 -05:00
Johannes Doerfert b316126887 [OpenMP][FIX] Avoid races in the handling of to be deleted mapping entries
If we decided to delete a mapping entry we did not act on it right away
but first issued and waited for memory copies. In the meantime some
other thread might reuse the entry. While there was some logic to avoid
colliding on the actual "deletion" part, there were two races happening:

1) The data transfer back of the thread deleting the entry and
   the data transfer back of the thread taking over the entry raced.
2) The update to the shadow map happened regardless if the entry was
   actually reused by another thread which left the shadow map in a
   inconsistent state.

To fix both issues we will now update the shadow map and delete the
entry only if we are sure the thread is responsible for deletion, hence
no other thread took over the entry and reused it. We also wait for a
potential former data transfer from the device to finish before we issue
another one that would race with it.

Fixes https://github.com/llvm/llvm-project/issues/54216

Differential Revision: https://reviews.llvm.org/D121058
2022-03-28 22:33:18 -05:00
Johannes Doerfert ba93e4e33e [OpenMP][NFC] Add missing virtual destructor to silence warning 2022-03-28 22:33:18 -05:00
Johannes Doerfert 7df2eba7fa [Attributor][OpenMP] Add assumption for non-call assembly instructions
Inline assembly is scary but we need to support it for the OpenMP GPU
device runtime. The new assumption expresses the fact that it may not
have call semantics, that is, it will not call another function but
simply perform an operation or side-effect. This is important for
reachability in the presence of inline assembly.

Differential Revision: https://reviews.llvm.org/D109986
2022-03-28 20:57:52 -05:00
Shilei Tian 545fcc3d84 [OpenMP][CUDA] Fix potential program crash caused by double free resources
As we mentioned in the code comments for function `ResourcePoolTy::release`,
at some point there could be two identical resources on the two sides of `Next`
mark. It is usually not an issue, unless the following case:
1. Some resources are not returned.
2. We need to iterate the pool and free the element.

That will cause double free, which is the case for event pool. Since we don't release
events hold by the data map, it can happen that the `Next` mark is not reset, and
we have two identical items in the pool. When the pool is destroyed, we will call
`cuEventDestroy` twice on the same event. In the best case, we can only observe
CUDA errors. In the worst case, it can cause internal failures in CUDART and further
crash.

This patch fixes the issue by tracking all resources that have been given using
an `unordered_set`. We don't remove it when a resource is returned. When the pool
is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make
sure that the set contains all resources allocated from the device. We just need
to iterate the set and free the resource accordingly.

For now, only event pool is set to use it. Stream pool is not because we can make
sure all streams are returned when the plugin is destroyed.

Someone might be wondering, why don't we release all events hold in the data map.
That is because, plugins are determined to be destroyed *before* `libomptarget`.
If we can somehow make the plugin outlast `libomptarget`, life will be much
easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122014
2022-03-25 22:49:32 -04:00
Joseph Huber 9d3550c517 [OpenMP] Add AMDGPU calling convention to ctor / dtor functions
This patch adds the necessary AMDGPU calling convention to the ctor /
dtor kernels. These are fundamentally device kenels called by the host
on image load. Without this calling convention information the AMDGPU
plugin is unable to identify them.

Depends on D122504

Fixes #54091

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122515
2022-03-25 22:44:20 -04:00
Johannes Doerfert 6c2be885ff Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning"
This reverts commit b9fd8f34ae as it
accidentally contained a unit test change that is not finished (and
unrelated).
2022-03-25 16:07:11 -05:00
Johannes Doerfert 7dfad948f1 [OpenMP][FIX] Repair ExclusiveAccess move semantic snafu 2022-03-25 16:00:53 -05:00
Johannes Doerfert b9fd8f34ae [OpenMP][NFC] Add missing virtual destructor to silence warning 2022-03-25 16:00:53 -05:00
Johannes Doerfert 4e34f061d6 [OpenMP][FIX] Ensure exclusive access to the HDTT map
This patch solves two problems with the `HostDataToTargetMap` (HDTT
map) which caused races and crashes before:

1) Any access to the HDTT map needs to be exclusive access. This was not
   the case for the "dump table" traversals that could collide with
   updates by other threads. The new `Accessor` and `ProtectedObject`
   wrappers will ensure we have a hard time introducing similar races in
   the future. Note that we could allow multiple concurrent
   read-accesses but that feature can be added to the `Accessor` API
   later.
2) The elements of the HDTT map were `HostDataToTargetTy` objects which
   meant that they could be copied/moved/deleted as the map was changed.
   However, we sometimes kept pointers to these elements around after we
   gave up the map lock which caused potential races again. The new
   indirection through `HostDataToTargetMapKeyTy` will allows us to
   modify the map while keeping the (interesting part of the) entries
   valid. To offset potential cost we duplicate the ordering key of the
   entry which avoids an additional indirect lookup.

We should replace more objects with "protected objects" as we go.

Differential Revision: https://reviews.llvm.org/D121057
2022-03-25 11:38:54 -05:00
Joseph Huber a619072c61 [OpenMP] Manually unroll the argument copy loop
The unroll pragma did not properly work as the loop bound was not known
when we optimize the runtime and we then added a "unroll disable"
metadata which prevented unrolling later when the bounds were known.
For now we manually unroll to make sure up to 16 elements are handled
nicely. This helps optimizations to look through the argument passing.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109164
2022-03-21 20:54:11 -04:00
Stanislav Mekhanoshin e0b9364b5c [AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp
Differential Revision: https://reviews.llvm.org/D120849
2022-03-17 13:05:07 -07:00
Jon Chesterfield 75779435f3 [nfc][openmp] Swap arguments to remove noise from upcoming diff 2022-03-11 23:08:37 +00:00
Shilei Tian f6639a424b [OpenMP][CUDA] Fix the check of `setContext` 2022-03-09 18:48:44 -05:00
Shilei Tian 39d3283a08 [OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly
Currently we set ccontext everywhere accordingly, but that causes many
unnecessary function calls. For example, in the resource pool, if we need to
resize the pool, we need to get from allocator. Each call to allocate sets the
current context once, which is unnecessary. In this patch, we set the context
only in the entry interface functions, if needed. Actually in the best way this
should be implemented via RAII, but since `cuCtxSetCurrent` could return error,
and we don't use exception, we can't stop the execution if RAII fails.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121322
2022-03-09 16:32:47 -05:00
Shilei Tian 5105c7cd78 [OpenMP][CUDA] Fix an issue that multiple `CUmodule` are could be overwritten
This patch fixes the issue introduced in 14de0820e8 and D120089, that
if dynamic libraries are used, the `CUmodule` array could be overwritten.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121308
2022-03-09 14:55:20 -05:00
Mark Dewing 7e0b0e05af [OpenMP][doc]Minor doc fixes
In SupportAndFAQ.rst, add blank lines before and after a bullet list and
sublist.  This avoids an "Unepxected indentation" warning.

In Runtimes.rst, adjust the suggestion for setting LIBOMPTARGET_INFO.
The right shifts are not necessary as the bit mask values are already
correct.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119595
2022-03-09 14:54:42 -05:00
Johannes Doerfert 14de0820e8 [OpenMP][FIX] Ensure the modules vector is filled as others are
The modules vector was for some reason special which could lead to it
not being of the same size (=num devices). Easiest solution is to treat
it like we do all the other vectors.
2022-03-08 23:45:43 -06:00
Joseph Huber eae306f52c [OpenMP][Docs] Make copy pasting remarks easier 2022-03-08 16:54:12 -05:00
Johannes Doerfert 1660288b28 [OpenMP][CUDA] Use one event pool per device
An event pool, similar to the stream pool, needs to be kept per device.
For one, events are associated with cuda contexts which means we cannot
destroy the former after the latter. Also, CUDA documentation states
streams and events need to be associated with the same context, which
we did not ensure at all.

Differential Revision: https://reviews.llvm.org/D120142
2022-03-07 23:43:05 -06:00
Johannes Doerfert 10aa83ff74 [OpenMP] Allow to explicitly deinitialize device resources
There are two problems this patch tries to address:
1) We currently free resources in a random order wrt. plugin and
   libomptarget destruction. This patch should ensure the CUDA plugin
   is less fragile if something during the deinitialization goes wrong.
2) We need to support (hard) pause runtime calls eventually. This patch
   allows us to free all associated resources, though we cannot
   reinitialize the device yet.

Follow up patch will associate one event pool per device/context.

Differential Revision: https://reviews.llvm.org/D120089
2022-03-07 23:43:04 -06:00
Johannes Doerfert 307bbd3c82 [OpenMP][NFCI] Use RAII lock guards in libomptarget where possible
Differential Revision: https://reviews.llvm.org/D121060
2022-03-07 23:43:04 -06:00
Jonathan Peyton 6564a70415 [OpenMP][libomp] Fix register constraint for tpause and umwait
Register constraint switched to "=q" which means very specifically (from
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints)

> Any register accessible as rl. In 32-bit mode, a, b, c, and d; in 64-bit
mode, any integer register.

Older gcc versions (8.x and below) were trying to use esi or edi for the
8 bit flag variable, but it wound up displaying this error in the end:

kmp_lock.cpp: In function ‘void __kmp_spin_backoff(kmp_backoff_t*)’:
kmp_lock.cpp:2684:1: error: unsupported size for integer register
Hence the correct restriction is "=q" instead of "=r".

Fixes: https://github.com/llvm/llvm-project/issues/53309
Differential Revision: https://reviews.llvm.org/D120519
2022-03-07 14:55:49 -06:00
AndreyChurbanov 6d9eb7e7ec [OpenMP] libomp: implemented task priorities.
Before this patch task priorities were ignored, that was a valid implementation
as the task priority is a hint according to OpenMP specification.

Implemented shared list of sorted (high -> low) task deques one per task
priority value. Tasks execution changed to first check if priority tasks ready
for execution exist, and these tasks executed before others;
otherwise usual tasks execution mechanics work.

Differential Revision: https://reviews.llvm.org/D119676
2022-03-07 22:24:18 +03:00
Johannes Doerfert 7ead7e90fc Revert "[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible"
This reverts commit ff50e81b50 as it broke
the buildbots, see https://reviews.llvm.org/D121060#3362737.
2022-03-06 21:27:41 -06:00
Johannes Doerfert ff50e81b50 [OpenMP][NFCI] Use RAII lock guards in libomptarget where possible
Differential Revision: https://reviews.llvm.org/D121060
2022-03-06 19:59:23 -06:00
James Beddek 2d0c9b64a0 [OpenMP][CMake] Ensure linking against libm for Linux
Do the same as is done for NetBSD. Some compiler-rt/lib/builtins files call
libm functions (e.g. fmaxl, fabs). Linking libomp with --rtlib=compiler-rt
references these functions.
Downstream report: https://bugs.gentoo.org/816831

Fixes: https://github.com/llvm/llvm-project/issues/51457
2022-03-05 20:20:28 -08:00
Shilei Tian 7f7c2c34b6 [OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS`
`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included for
multiple times redundantly. This patch is simply a clean up.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D121055
2022-03-05 22:37:59 -05:00
AndreyChurbanov 1fbdb03b1d [OpenMP] libomp: omp_in_explicit_task() implemented.
Differential Revision: https://reviews.llvm.org/D120671
2022-03-05 21:46:39 +03:00
Joseph Huber e2dcc2218c [Libomptarget] Work around bug in initialization of libomptarget
Libomptarget uses some shared variables to track certain internal stated
in the runtime. This causes problems when we have code that contains no
OpenMP kernels. These variables are normally initialized upon kernel
entry, but if there are no kernels we will see no initialization.
Currently we load the runtime into each source file when not running in
LTO mode, so these variables will be erroneously considered undefined or
dead and removed, causing miscompiles. This patch temporarily works
around the most obvious case, but others still exhibit this problem. We
will need to fix this more soundly later.

Fixes #54208.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121007
2022-03-04 13:13:31 -05:00
Aakanksha 840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin 2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Malhar Jajoo 6d658f37a4 [Openmp]: Missing import statement in openmp interface for Fortran
Essentially removed the "use omp_lib_kinds" statement and replaced it
with import to maintain consistency (and avoid compilation error
in case the omp_lib_kinds.mod file is not accessible) in header file.

The import is required to access entities in host scoping unit.

Differential Revision: https://reviews.llvm.org/D120707
2022-03-01 17:33:06 +00:00
Shilei Tian 75812e7704 [OpenMP][Offloading] Change N back to 256 in bug49334.cpp 2022-02-23 16:10:35 -05:00
Joseph Huber 5dd0c39638 [Libomptarget][NFC} Fix missing newline in error message 2022-02-23 08:10:16 -05:00
Carlo Bertolli 7b731f4d0b [OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed
When using asynchronous plugin calls, shadow pointer restore could happen before the D2H copy for the entire struct has completed, effectively leaving a device pointer in a host struct.
This patch fixes the problem by delaying restore's to after a synchronization happens (target regions) and by calling early synchronization (target update).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119968
2022-02-18 10:09:10 -06:00
Joseph Huber 0870a4f59a [OpenMP] Add flag for disabling thread state in runtime
The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it
not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120106
2022-02-18 08:35:05 -05:00
Shilei Tian 092a5bb72b [OpenMP][Offloading] Fix test case issues in bug49334.cpp
`bug49334.cpp` has one issue that causes flaky result reported in #53730.
The root cause is `BlockedC` is never initialized but in `BlockMatMul_TargetNowait`
it is directly read and written (via `+=`). Fixes #53730.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D119988
2022-02-17 10:22:48 -05:00
Johannes Doerfert 57b4c5267b [OpenMP][FIX] Eliminate race on the IsSPMD global
The `IsSPMD` global can only be read by threads other than the main
thread *after* initialization is complete. To allow usage of
`mapping::getBlockSize` before initialization is done, we can pass the
`IsSPMD` state explicitly. This is similar to other APIs that take
`IsSPMD` explicitly to avoid such a race, e.g.,
`mapping::isInitialThreadInLevel0(IsSPMD)`

Fixes https://github.com/llvm/llvm-project/issues/53857
2022-02-16 14:44:20 -06:00
Joseph Huber 777039a51c [Libomptarget] Run CPU offloading tests using the new driver
This patch adds a new target to the OpenMP CPU offloading tests. This
tests the usage of the new driver for CPU offloading. If this all works
then we can move to transition to the new driver as the default.

Depends on D119613

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119736
2022-02-15 15:05:32 -05:00
Joseph Huber 48e3dcecc4 [Libomptarget][NFC] Remove constexpr to hide warnings
Currently whenever we compile the device runtime we get the following
'Mapping.cpp:32:32: warning: inline function '_OMP::impl::getGridValue'
is not defined [-Wundefined-inline]' warning. This can be silenced by
removing the constexpr attribute for this function. Doing this doesn't
change the generated bitcode at all but prevents the screen from getting
filled with warnings whenver we build the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119747
2022-02-14 15:34:18 -05:00
Jonathan Peyton 1234011b80 [OpenMP][libomp] Introduce oneAPI compiler support
Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI
compiler.

Fixup flag detection and compiler ID detection in CMake. Older CMake's
detect IntelLLVM as Clang.

Fix compiler warnings.

Fixup many of the tests to have non-empty parallel regions as they are
elided by oneAPI compiler.
2022-02-14 14:10:33 -06:00
Shilei Tian c27f530d4c [OpenMP][Offloading] Fix infinite loop in applyToShadowMapEntries
This patch fixes the issue that the for loop in `applyToShadowMapEntries`
is infinite because `Itr` is not incremented in `CB`. Fixes #53727.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119471
2022-02-12 22:02:53 -05:00
AndreyChurbanov cb1bee4725 [OpenMP] libomp: fix UB when LIBOMP_NUM_HIDDEN_HELPER_THREADS=1.
The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads.
Thus number of worker hidden helper threads corresponds to user request,
main thread of helper team excluded as it does not participate in actual work.
This also fixes divide-by-0 issue in the code.

Fixes #48656

Differential Revision: https://reviews.llvm.org/D119586
2022-02-12 03:00:38 +03:00
AndreyChurbanov d84dedc7d3 [OpenMP] libomp: fix bug in implementation of distribute construct.
Fixed mistaken iterations distribution between different target regions.

Differential Revision: https://reviews.llvm.org/D118393
2022-02-11 17:34:26 +03:00
Shilei Tian 702a976c12 [OpenMP][Offloading] Change the way to compare floating point values in bug49334.cpp
`bug49334.cpp` directly uses `!=` to compare two floating point values,
which is almost wrong.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D119485
2022-02-10 18:20:36 -05:00
Shilei Tian aca33b0b37 [OpenMP][CUDA] Remove the hard team limit
Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as it is larger than that hard limit, the final number to launch the kernel will always be that hard limit. It is way less than the actual hardware limit. For example, my workstation has GTX2080, and the hardware limit of grid size is 2147483647, which is exactly the largest number a `int32_t` can represent. There is no limitation mentioned in the spec. This patch simply removes it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119313
2022-02-10 18:07:46 -05:00
Ye Luo 59ad9650cf [Libomptarget][AMDGCN] add gfx90c target
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D119478
2022-02-10 15:55:44 -06:00
Shilei Tian f6685f7746 [OpenMP][CUDA] Refine the logic to determine grid size
This patch refines the logic to determine grid size as previous method
can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual
hardware limit.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119311
2022-02-10 14:13:32 -05:00
Joseph Huber 9582f09690 [Libomptarget] Increase stack size for bug49779 test
The 'bug49779.cpp' test has been failing recently. This is because the
runtime is sufficiently complex when using nested parallelism without
optimizations that the CUDA tools cannot statically determine the stack
size. Because of this the kernel can exceed the thread stack size and
crash. Work around this using the 'LIBOMPTARGET_STACK_SIZE' environment
variable and add an FAQ entry for this situation.

Fixes #53670

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D119357
2022-02-09 15:37:23 -05:00
Jonathan Peyton 6be7c21b57 [OpenMP][libomp] Replace accidental VLA with KMP_ALLOCA
MSVC does not support variable length arrays. Replace with KMP_ALLOCA
which is already used in the same file for stack-allocated variables.
2022-02-09 08:09:27 -06:00
Joseph Huber 99d72ebddf [Libomptarget] Add header files as a dependency to CMake target
This patch manually adds the runtime include files to the list of
dependencies when we build the bitcode runtime library. Previously if
only the header was changed we would not recompile the source files.
The solution used here isn't optimal because every source file not has a
dependency on each header file regardless of if it was actually used by
that file.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D119254
2022-02-08 12:09:59 -05:00
Joseph Huber f8ffac5987 [OpenMP] Enable new driver tests for AMDGPU
This patch enables running the new driver tests for AMDGPU. Previously
this was disabled because some tests failed. This was only because the
new driver tests hadn't been listed as unsupported or expected to fail.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D119240
2022-02-08 09:55:29 -05:00
Joseph Huber d28051c4ab [Libomptarget] Replace Value RAII with default value
This patch replaces the ValueRAII pointer with a default 'nullptr'
value. Previously this was initialized as a reference to an existing
variable. The use of this variable caused overhead as the compiler could
not look through the uses and determine that it was unused if 'Active'
was not set. Because of this accesses to the variable would be left in
the runtime once compiled.

Fixes #53641

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119187
2022-02-07 17:12:00 -05:00