[OpenMP][Docs] Add documentation for linking OpenMP with CUDA/HIP

Summary:
This patch adds an entry to the FAQ that shows how to link CUDA with
OpenMP.
Joseph Huber 2022-10-11 12:00:06 -05:00
parent 4b76a80459
commit 316eaa3008
1 changed files with 34 additions and 14 deletions

@ -333,28 +333,28 @@ occurs.
Q: Can OpenMP offloading compile for multiple architectures?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since LLVM version 15.0, OpenMP offloading supports offloading to multiple
architectures at once. This allows executables to run on different targets,
such as offloading to AMD and NVIDIA GPUs simultaneously, as well as multiple
sub-architectures for the same target. Additionally, archive members are only
extracted from static libraries if their architecture is used, allowing users
to create generic libraries.
The architecture can be specified manually using ``--offload-arch=``. If
``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is present,
then the targets will be inferred from the architectures. Conversely, if
``-fopenmp-targets=`` is present with no ``--offload-arch=``, then the target
architecture will be set to a default value, usually the architecture supported
by the system LLVM was built on.
For example, an executable can be built that runs on AMDGPU and NVIDIA hardware
given that the necessary build tools are installed for both.
.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80
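When the list of architectures varies between builds, the same command line can be assembled from that list. The following is a minimal sketch, not part of the patch: the helper name ``offload_cmd`` is hypothetical, and it only prints the command rather than running it. It emits nothing but ``--offload-arch=`` flags, so the target triples are left for clang to infer.

```shell
# Assemble a clang command that offloads to every architecture given.
# Only --offload-arch= is emitted, so the triples are left for clang to infer.
# offload_cmd is a hypothetical helper name, not a clang or LLVM tool.
offload_cmd() {
  src=$1
  shift
  cmd="clang $src -fopenmp"
  for arch in "$@"; do
    cmd="$cmd --offload-arch=$arch"
  done
  printf '%s\n' "$cmd"
}

# Print the command for the two architectures used in the example above.
offload_cmd example.c gfx90a sm_80
```

Printing the command instead of executing it makes it easy to inspect the flags before running a build that needs both toolchains installed.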
If only the architectures are given, the triples can be inferred from them;
otherwise we can specify them manually.
.. code-block:: shell
@ -363,7 +363,7 @@ otherwise we can specify them manually.
-Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
-Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80
When linking against a static library that contains device code for multiple
architectures, only the images used by the executable will be extracted.
.. code-block:: shell
@ -372,7 +372,7 @@ architectures, only the images used by the executable will be extracted.
llvm-ar rcs libexample.a example.o
clang app.c -fopenmp --offload-arch=gfx90a -o app
The supported device images can be viewed using the ``--offloading`` option with
``llvm-objdump``.
.. code-block:: shell
@ -393,3 +393,23 @@ The supported device images can be viewed using the ``--offloading`` option with
arch sm_80
triple nvptx64-nvidia-cuda
producer openmp
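To check quickly which architectures a file carries images for, the ``arch`` fields of this listing can be filtered out. The sketch below assumes the output keeps the ``arch <name>`` layout shown above; the function name ``list_offload_archs`` and the file name in the usage comment are hypothetical.

```shell
# Print each distinct architecture that has an embedded device image.
# Reads the text produced by `llvm-objdump --offloading` on standard input
# and assumes every image is described by a line of the form "arch <name>".
list_offload_archs() {
  awk '$1 == "arch" { print $2 }' | LC_ALL=C sort -u
}

# Hypothetical usage; the file name app is an assumption:
#   llvm-objdump --offloading app | list_offload_archs

# Demonstration on the fields shown in the listing above.
list_offload_archs <<'EOF'
arch sm_80
triple nvptx64-nvidia-cuda
producer openmp
EOF
```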
Q: Can I link OpenMP offloading with CUDA or HIP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Linking OpenMP offloading files with CUDA and HIP files is currently supported
experimentally. This allows OpenMP to call a CUDA device function, or vice
versa. However, global state is distinct between the two images at runtime, so
a global variable may hold different values when queried from OpenMP and from
CUDA.
Linking with CUDA or HIP currently requires compiling the CUDA / HIP files in a
different compilation mode, enabled with ``--offload-new-driver``, and linking
with ``--offload-link``. Additionally, ``-fgpu-rdc`` must be used to create a
linkable device image.
.. code-block:: shell

  clang++ openmp.cpp -fopenmp --offload-arch=sm_80 -c
  clang++ cuda.cu --offload-new-driver --offload-arch=sm_80 -fgpu-rdc -c
  clang++ openmp.o cuda.o --offload-link -o app
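The three steps can be wrapped in a small script so that both compile steps always receive the same architecture, as they do in the example above. This is a sketch under assumptions rather than a recommended build setup: the helper names ``run`` and ``build_mixed`` and the ``RUN`` variable are hypothetical, and by default the commands are only printed.

```shell
# Print each build step; set RUN=1 to execute the commands instead.
# run, build_mixed and RUN are hypothetical names used for illustration.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Compile the OpenMP and CUDA files for one architecture, then link them.
# The flags are the ones from the example above; only the architecture varies.
build_mixed() {
  arch=$1
  run clang++ openmp.cpp -fopenmp --offload-arch="$arch" -c &&
  run clang++ cuda.cu --offload-new-driver --offload-arch="$arch" -fgpu-rdc -c &&
  run clang++ openmp.o cuda.o --offload-link -o app
}

build_mixed sm_80
```

With ``RUN=1`` the steps are chained with ``&&``, so the link is attempted only if both compiles succeed.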