[OpenMP][Docs] Add documentation for linking OpenMP with CUDA/HIP
Summary: This patch adds an entry to the FAQ that shows how to link CUDA with OpenMP.
commit 316eaa3008
parent 4b76a80459
@ -333,28 +333,28 @@ occurs.
Q: Can OpenMP offloading compile for multiple architectures?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since LLVM version 15.0, OpenMP offloading supports offloading to multiple
architectures at once. This allows executables to run on different targets,
such as offloading to AMD and NVIDIA GPUs simultaneously, as well as on
multiple sub-architectures of the same target. Additionally, archive members
are extracted from static libraries only if their architecture is used,
allowing users to create generic libraries.

The architecture can be specified manually using ``--offload-arch=``. If
``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is given, the
targets will be inferred from the architectures. Conversely, if
``-fopenmp-targets=`` is present with no ``--offload-arch=``, the target
architecture will be set to a default value, usually the architecture supported
by the system LLVM was built on.

For example, an executable can be built that runs on both AMD and NVIDIA GPUs,
provided that the necessary build tools are installed for both.

.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80

Given only the architectures, the compiler can infer the target triples;
otherwise, they can be specified manually.

.. code-block:: shell

@ -363,7 +363,7 @@ otherwise we can specify them manually.
    -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
    -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80

When linking against a static library that contains device code for multiple
architectures, only the images used by the executable will be extracted.

.. code-block:: shell

@ -372,7 +372,7 @@ architectures, only the images used by the executable will be extracted.
  llvm-ar rcs libexample.a example.o
  clang app.c -L. -lexample -fopenmp --offload-arch=gfx90a -o app

The supported device images can be viewed using the ``--offloading`` option with
``llvm-objdump``.

.. code-block:: shell

@ -393,3 +393,23 @@ The supported device images can be viewed using the ``--offloading`` option with
  arch            sm_80
  triple          nvptx64-nvidia-cuda
  producer        openmp

Q: Can I link OpenMP offloading with CUDA or HIP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Linking OpenMP offloading files with CUDA and HIP files is currently supported
experimentally. This allows OpenMP code to call a CUDA device function, or vice
versa. However, the global state is distinct between the two images at runtime,
so a global variable may have different values when queried from OpenMP and
from CUDA.

Linking with CUDA or HIP currently requires compiling the CUDA / HIP files in a
different compilation mode, enabled with ``--offload-new-driver``, and linking
with ``--offload-link``. Additionally, ``-fgpu-rdc`` must be used to create a
linkable device image.

.. code-block:: shell

  clang++ openmp.cpp -fopenmp --offload-arch=sm_80 -c
  clang++ cuda.cu --offload-new-driver --offload-arch=sm_80 -fgpu-rdc -c
  clang++ openmp.o cuda.o --offload-link -o app