Commit Graph

318 Commits

Author SHA1 Message Date
Ron Lieberman b09a5e5cb3 Revert "Add mean_anyway to hpc config"
my bad, wrong repo ,so sorry.

This reverts commit 0b9350f3da.
2022-11-29 15:20:23 -06:00
Ron Lieberman 0b9350f3da Add mean_anyway to hpc config 2022-11-29 15:11:57 -06:00
Joseph Huber 3458a2b737 [Libomptarget][NFC] Add missing LLVM header 2022-11-29 09:46:51 -06:00
Vitaly Buka 98441fc9e4 [NFC][OpenMP] Remove unused label 2022-11-17 23:35:08 -08:00
Vitaly Buka a35ad711d9 [NFC][OpenMP] Fix const cast warning 2022-11-17 23:24:40 -08:00
Vitaly Buka e42080ae3f [NFC][OpenMP] Remove extra ";" 2022-11-17 23:24:40 -08:00
Kevin Sala 846904195b [OpenMP][libomptarget] New plugin infrastructure and new CUDA plugin
This patch adds a new infrastructure for OpenMP target plugins. It also implements the CUDA and GenericELF64bit plugins under this new infrastructure. We place the sources in a separate directory named plugins-nextgen, and we build the new plugins as different plugin libraries. The original plugins, which remain untouched, will be used by default. However, the user can change this behavior at run-time through the boolean envar LIBOMPTARGET_NEXTGEN_PLUGINS. If enabled, the libomptarget will try to load the NextGen version of each plugin, falling back to the original if they are not present or valid.

The idea of this new plugin infrastructure is to implement the common parts of target plugins in generic classes (defined in files inside plugins-next/common/PluginInterface folder), and then, each specific plugin defines its own specific classes inheriting from the common ones. In this way, most logic remains on the common interface while reducing the plugin-specific source code. It is also beneficial in the sense that now most code and behavior are the same across the different plugins. As an example, we define classes for a plugin, a device, a device image, a stream manager, etc. The plugin object (a single instance per plugin library) holds different device objects (i.e., one per available device), while these latter are the responsible for managing its own resources.

Most code on this patch is based on the changes made by @jdoerfert (Johannes Doerfert)

Reviewed By: jhuber6, jdoerfert

Differential Revision: https://reviews.llvm.org/D134396
2022-10-27 18:10:14 +00:00
Joseph Huber 429d3d4e9d [Libomptarget] Build plugins with protected visibility by default
The plugins all define the same interface symbols. This is generally not
a problem when calling the plugin directly from the dynamic library's
handle. However, when calling from within the plugin itself it is
possible for another plugin's symbols to preempt the symbols. This was
observed with the `__tgt_rtl_is_valid_binary` call in the
`__tgt_rtl_is_valid_binary_info` function being mapped to the x86_64
plugin.

This patch changes the default visibility to `protected` intead. This
visibility ensures that these symbols are all externally visible from
the plugin, but ensures their definitions are fixed within the shared
library. Having protected visiiblity makes such symbol preemption
impossible.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D136365
2022-10-20 11:12:18 -05:00
Joseph Huber 1af7541741 [Libomptarget] Fix missing semicolon in exports 2022-10-14 09:02:42 -05:00
Joseph Huber 619dced0fc [Libomptarget] Don't use full names for exported plugin symbols
Summary:
This patch changes the `exports` file to export all `__tgt_rtl`
functions. This is a better option as not each plugin implements all of
these functions, furthermore any new functions added will be
automatically included.
2022-10-14 08:57:57 -05:00
Slava Zakharin 88da0de14f Revert "[Libomp] Do not error on undefined version script symbols"
This reverts commit 096f93e73d.

Revert "[Libomptarget] Make the plugins ingore undefined exported symbols"

This reverts commit 3f62314c23.

Revert "[LLD] Enable --no-undefined-version by default."

This reverts commit 7ec8b0d162.

Three commits are reverted because of the current omp build fail
with GNU ld. See discussion here: https://reviews.llvm.org/rG096f93e73dc3
2022-10-13 14:12:07 -07:00
Joseph Huber 3f62314c23 [Libomptarget] Make the plugins ingore undefined exported symbols
Summary:
Recent changes made the default behaviour to error when given an
undefined symbol in a version script. A previous patch fixed this for
`libomptarget` by removing the single undefined symbol. However, the
plguins are expected to only define a subset of the availible functions
so we shouldn't treat it as an error. This patch updates the build flags
to work appropriately.
2022-10-13 08:13:03 -05:00
Dan Palermo db021abf33 [OpenMP][AMDGPU] Enable OpenMP device runtime build for gfx110[0123]
Add OpenMP device runtime build support for the gfx1100, gfx1101,
gfx1102, and gfx1103 targets.

Differential Revision: https://reviews.llvm.org/D134465
2022-09-23 01:49:51 +00:00
Joseph Huber 292cb114b0 [Libomptarget] Revert changes to AMDGPU plugin destructors
These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
state. This will prevent us from using device destructors but should
remove some new bugs. In the future this interface should be changed
once these problems are addressed more correctly.

This reverts commit ed0f218115.

This reverts commit 2b7203a359.

Fixes #57536

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133997
2022-09-16 06:55:51 -05:00
Joseph Huber 23bc343855 [Libomptarget] Change device free routines to accept the allocation kind
Previous support for device memory allocators used a single free
routine and did not provide the original kind of the allocation. This is
problematic as some of these memory types required different handling.
Previously this was worked around using a map in runtime to record the
original kind of each pointer. Instead, this patch introduces new free
routines similar to the existing allocation routines. This allows us to
avoid a map traversal every time we free a device pointer.

The only interfaces defined by the standard are `omp_target_alloc` and
`omp_target_free`, these do not take a kind as `omp_alloc` does. The
standard dictates the following:

"The omp_target_alloc routine returns a device pointer that references
the device address of a storage location of size bytes. The storage
location is dynamically allocated in the device data environment of the
device specified by device_num."

Which suggests that these routines only allocate the default device
memory for the kind. So this has been changed to reflect this. This
change is somewhat breaking if users were using `omp_target_free` as
previously shown in the tests.

Reviewed By: JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D133053
2022-09-14 12:14:07 -05:00
Joseph Huber c2acb1e5d3 [Libomptarget][NFC] Remove unused variable 2022-09-09 15:26:02 -05:00
Joseph Huber 83fcba82cc [Libomptarget] Add proper LLVM libraries now that the AMDGPU plugin uses them
Summary:
The AMDGPU and CUDA plugins now relies on the Object and Support
libraries. This patch adds them explicitly rather than hoping that they
share the symbols loaded from the standard `libomptarget`.
2022-09-09 10:33:26 -05:00
Joseph Huber 8d2a447bf9 [Libomptarget] Remove leftover ELF header from x86 plugin
Summary:
We removed the linking support for `gelf.h` in a previous patch. This
header was incorrectly leftover causing build problems on some systems.
2022-09-07 13:41:40 -05:00
Joseph Huber 300155911a [Libomptarget] Replace libelf with LLVM's Elf libraries
This patch replaces the dependency on `libelf` with LLVM's ELF support.
With this patch the user no-longer needs to have `libelf` on their
system to build and configure OpenMP offloading. The replacement is
mostly mechanical, with the exception of the hash table support which
was added in D131309.

Depends on D131309

Reviewed By: JonChesterfield, saiislam

Differential Revision: https://reviews.llvm.org/D131401
2022-09-07 12:38:51 -05:00
Joseph Huber 894531f59b [Libomptarget] Add utility functions for loading an ELF symbol by name
The `SHT_HASH` sections in an ELF are used to look up a symbol in the
symbol table using a symbol's name. This is done by obtaining the
`SHT_HASH` section and using its `sh_link` attribute to access the
associated symbol table, from which we can access the string table
containing the associated name. We can then search for the symbol using
the hash of the name and the buckets and chains in the hash table
itself

This patch adds utility functions that allow us to look up a symbol in
an ELF file by name. It will first attempt to look through the hash
tables, and then search the section tables manually if failed. This
allows us to pull out constants necessary for setting up offloading
without first loading the object.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131309
2022-09-07 12:38:50 -05:00
Joseph Huber 31f434ee3b [Libomptarget][NFC] Clean up CUDA plugin and address warnings 2022-09-06 15:28:57 -05:00
Joseph Huber f8b1f93f26 [libomptarget] Enable the device allocator for AMDGPU
This patch adds support for the device memory type, this is currently equivalent
to the default type so it should be treated as the same.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D133128
2022-09-01 12:40:59 -05:00
Jon Chesterfield ffabe997a5 [openmp][amdgpu] Implement target_alloc_host as fine grain HSA memory
The cuda plugin maps TARGET_ALLOC_HOST onto cuMemAllocHost
which is page locked host memory. Fine grain HSA memory is not
necessarily page locked but has the same read/write from host or
device semantics.

The cuda plugin does this per-gpu and this patch makes it accessible
from any gpu, but it can be locked down to match the cuda behaviour
if preferred.

Enabling tests requires an equivalent to
// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda
for amdgpu which doesn't seem to be in use yet.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D132660
2022-08-25 16:27:52 +01:00
Ye Luo 322ea53144 [libomptarget][amdgpu] enable tests whenever possible.
if(TARGET amdgpu-arch) doesn't work when ENABLE_LLVM_PROJECTS=openmp because openmp subdirectory is processed before clang subdirectory. Adopt the same logic of enabling tests like the CUDA plugin.

Differential Revision: https://reviews.llvm.org/D132579
2022-08-24 14:33:28 -05:00
Joseph Huber 540a13652f [Libomptarget] Replace use of `dlopen` with LLVM's dynamic library support
This patch replaces uses of `dlopen` and `dlsym` with LLVM's support
with `loadPermanentLibrary` and `getSymbolAddress`. This allows us to
remove the explicit dependency on the `dl` libraries in the CMake. This
removes another explicit dependency and solves an issue encountered
while building on Windows platforms. The one downside to this is that
the LLVM library does not currently support `dlclose` functionality, but
this could be added in the future.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131507
2022-08-24 10:46:21 -05:00
Joseph Huber 30efb459e0 [Libomptarget] Remove use of ELF link_address in x86_64 plugin
We use the offloading entires array to determine the relative names and
addressed of device-side kernel functions. The x86_64 plugin previously
derived the device-side entry table by first identifying the
`omp_offloading_entries` section offset in the loaded elf. Then we would
use the base offset of the loaded dyanmic library to identify the
entries array within the loaded image. This relied on some more
unconventional methods which prevented us from using the LLVM dynamic
library loader for this plugin. This patch simplifies this by instead
copying the host-side entry and replacing its address with the
device-side address looked up through `dlsym`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131516
2022-08-24 10:46:20 -05:00
Joseph Huber fdbb15355e [Libomptarget][CUDA] Check CUDA compatibilty correctly
We recently added support for multi-architecture binaries in
libomptarget. This is done by extracting the architecture from the
embedded image and comparing it with the major and minor version
supported by the current CUDA installation. Previously we just compared
these directly, which was not correct for binary compatibility. The CUDA
documentation states that we can consider any image with an equivalent
major or a greater or equal to minor compatible with the current image.
Change the check to use this new logic in the CUDA plugin.

Fixes #57049

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D131567
2022-08-10 11:15:27 -04:00
Fangrui Song 0972a390b9 LLVM_FALLTHROUGH => [[fallthrough]]. NFC 2022-08-09 04:06:52 +00:00
Jon Chesterfield 104f11630a [nfc][openmp] clang-format system.cpp prior to D131401 2022-08-08 16:24:34 +01:00
Joseph Huber b3335e8ed7 [Libomptarget][NFC] Clang format the AMDGPU plugin
Summary:
A previous patch did not format the plugin again after making changes.
Ensure that libomptarget stays formatted.
2022-08-03 15:18:16 -04:00
Joseph Huber 2b7203a359 [Libomptarget] Deinitialize AMDGPU global state more intentionally
A previous patch made the destruction of the HSA plugin more
deterministic. However, there were still other global values that are not
handled this way. When attempting to call a destructor kernel, the
device would have already been uninitialized and we could not find the
appropriate kernel to call. This is because they were stored in global
containers that had their destructors called already. Merges this global
state into the rest of the info state by putting those global values
inside of the global pointer already allocated and deallocated by the
constructor and destructor. This should allow the AMDGPU plugin to
correctly identify the destructors if we were to run them.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131011
2022-08-02 18:24:39 -04:00
Jon Chesterfield ed0f218115 [openmp][amdgpu] Tear down amdgpu plugin accurately
Moves DeviceInfo global to heap to accurately control lifetime.
Moves calls from libomptarget to deinit_plugin later, plugins need to stay
alive until very shortly before libomptarget is destructed.

Leaving the deinit_plugin calls where initially inserted hits use after
free from the dynamic_module.c offloading test (verified with valgrind
 that the new location is sound with respect to this)

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130714
2022-07-28 20:00:03 +01:00
Jon Chesterfield c214cb6a68 [amdgpu][openmp][nfc] Restore stb_local on DeviceInfo symbol 2022-07-28 16:50:46 +01:00
Jon Chesterfield 75aa521064 [openmp][amdgpu] Move global DeviceInfo behind call syntax prior to using D130712 2022-07-28 16:40:42 +01:00
Jon Chesterfield 1f9d3974e4 [openmp] Introduce optional plugin init/deinit functions
Will allow plugins to migrate away from using global variables to
manage lifetime, which will fix a segfault discovered in relation to D127432

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D130712
2022-07-28 16:21:38 +01:00
Saiyedul Islam 4075a811ad
[Libomptarget] Add checks for AMDGPU TargetID using new image info
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.

Depends on D127432

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127769
2022-07-26 02:44:31 -05:00
Saiyedul Islam 4cf30c5157
Revert "Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"""
This reverts commit 281eb9223c.
2022-07-25 11:35:37 -05:00
Saiyedul Islam 281eb9223c
Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info""
This reverts commit 8cbf4a386b.
2022-07-25 08:32:26 -05:00
Saiyedul Islam 8cbf4a386b
Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"
This reverts commit 471f2abc62.
2022-07-25 05:32:59 -05:00
Saiyedul Islam 471f2abc62
[Libomptarget] Add checks for AMDGPU TargetID using new image info
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.

Depends on D127432

Differential Revision: https://reviews.llvm.org/D127769
2022-07-25 04:44:36 -05:00
Joel E. Denny cfa6e79df3 [Libomptarget] Don't report lack of CUDA devices
Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics
about a lack of CUDA devices before an application runs:

```
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c
$ ./a.out
CUDA error: Error returned from cuInit
CUDA error: no CUDA-capable device is detected
Hello World: 4
```

This can happen when the CUDA plugin was built but all CUDA devices
are currently disabled in some manner, perhaps because
`CUDA_VISIBLE_DEVICES` is set to the empty string.  As shown in the
above example, it can even happen when we haven't compiled the
application for offloading to CUDA.

The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp`
appears to be intended to handle this case, and it chooses not to
write a diagnostic to stderr unless debugging is enabled:

```
if (NumberOfDevices == 0) {
  DP("There are no devices supporting CUDA.\n");
  return;
}
```

The problem is that the above code is never reached because the
earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`.  This patch handles
that `cuInit` case in the same manner as the above code handles the
`NumberOfDevices == 0` case.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D130371
2022-07-22 14:46:45 -04:00
Joseph Huber a3804a3145 [Libomptarget] Make the plugins link as LLVM libraries
Previously we made `libomptarget` link as an LLVM library so we have
access to the LLVM core libraries. After the initial patch stuck we can
now apply the same changes to the plugins. This will allow us to use
LLVM in all of `libomptarget` when we have uses for them. In the future
this should allow us to remove the dependencies on `libelf`, `libffi`,
and `dl`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D130262
2022-07-22 09:34:12 -04:00
Joseph Huber e01ce4e88a [Libomptarget] Add checks for CUDA subarchitecture using new info
This patch extends the `is_valid_binary` routine to also check if the
binary's architecture string matches the one parsed from the runtime.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
CUDA.

Depends on D127432

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D127505
2022-07-21 13:20:06 -04:00
Joseph Huber 1f940b69c3 [Libomptarget][NFC] Fix signed comparison warnings
Summary:
Non-functional change, just fixing some sign comparison warnings by
making both match.
2022-07-15 13:22:55 -04:00
Joseph Huber d27d0a673c [Libomptarget][NFC] Make Libomptarget use the LLVM naming convention
Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly large clash
between the naming conventions used. This patch fixes most of the
variable names that did not confrom to the LLVM standard, that is
`VariableName` for variables and `functionName` for functions.

This patch was primarily done using my editor's linting messages, if
there are any issues I missed arising from the automation let me know.

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D128997
2022-07-05 14:53:38 -04:00
Shilei Tian 696bca9bb2 [NFC][OpenMP][CUDA] Remove unnecessary default label 2022-07-01 09:50:29 -04:00
Shilei Tian 2695e23ad9 [OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work
This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122764
2022-06-28 15:32:03 -04:00
Joseph Huber d5d836635c [Libomptarget] Add test config for compiling in LTO-mode
We are planning on making LTO the default compilation mode for
offloading. In order to make sure it works we should run these tests on
the test suite. AMDGPU already uses the LTO compilation path for its
linking, but in LTO mode it also links the static library late.

Performing LTO requires the static library to be built, if we make the
change this will be a hard requirement and the old bitcode library will
go away. This means users will need to use either a two-step build or a
runtimes build for libomptarget.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127512
2022-06-14 10:16:03 -04:00
Jose Manuel Monsalve Diaz 15ed5c0a07 [LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device

Device (1):
    This is a generic-elf-64bit device

Device (2):
    This is a generic-elf-64bit device

Device (3):
    This is a generic-elf-64bit device

Device (4):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           0
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (5):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           1
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (6):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           2
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (7):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           3
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE
```

Differential Revision: https://reviews.llvm.org/D126836
2022-06-09 11:58:39 +00:00
Jose Manuel Monsalve Diaz 84e020a061 Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"
This reverts commit d16a0877d8.
2022-06-09 10:46:03 +00:00