Mixing LLVM and Clang address spaces can result in subtle bugs, and there
is no need for this hook to use the LLVM IR level address spaces.
Most of this change is just replacing zero with LangAS::Default,
but it also allows us to remove a few calls to getTargetAddressSpace().
This also removes a stale comment+workaround in
CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does
return the expected size for ReferenceType (and handles address spaces).
Differential Revision: https://reviews.llvm.org/D138295
Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and
that makes those types visible furing GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.
Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.
Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.
Differential Revision: https://reviews.llvm.org/D136311
Currently we define the `__CUDA_ARCH__` macro only in CUDA mode. This
patch allows us to use this macro in OpenMP-offloading mode when
targeting NVPTX.
Reviewed By: tra, tianshilei1992
Differential Revision: https://reviews.llvm.org/D125256
This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.
‘--offload’ option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override default device target
(amdgcn-amd-amdhsa) for HIP compilation for emitting device code as
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select HIPSPVToolChain when HIP offload
target is ‘spirv64’.
The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over them. HIPSPV TC is also
responsible for emitting the SPIR-V binary.
A Cuda GPU architecture ‘generic’ is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted to the bundle entry ID as target ID. Target ID is
expected to be always present so a component in the target triple
is not mistaken as target ID.
Tests are added for checking the HIPSPV tool chain.
[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader
Differential Revision: https://reviews.llvm.org/D110622
Remove redundant fields and replace pointer with virtual function
Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.
This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108380
Remove redundant fields and replace pointer with virtual function
Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.
This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108380
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.
Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108339
The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o
adding any new functionality.
Differential Revision: https://reviews.llvm.org/D95974
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.
This causes issue for https://reviews.llvm.org/D88377 since
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.
The bound arch of offload action for fat binary is not really
used, therefore set it to CudaArch::UNUSED.
Differential Revision: https://reviews.llvm.org/D88524
Summary:
New file include to support platform dependent grid constants. It will be
used by clang, libomptarget plugins, and deviceRTLs to access constant
values consistently and with fast access in the deviceRTLs.
Originally authored by Greg Rodgers (@gregrodgers).
Reviewers: arsenm, sameerds, jdoerfert, yaxunl, b-sumner, scchan, JonChesterfield
Reviewed By: arsenm
Subscribers: llvm-commits, pdhaliwal, jholewinski, jvesely, wdng, nhaehnle, guansong, kerbowa, sstefan1, cfe-commits, ronlieb, gregrodgers
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80917
Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.
Differential Revision: https://reviews.llvm.org/D77670
According to alignment section in below ARM64 ABI document, MSVC could increase
alignment of global data based on its total size. Clang doesn't do this. Compile
the same symbol into different alignments by Clang and MSVC could cause link
error because some instruction encodings, like 64-bit LDR/STR with immediate,
require the target to be 8 bytes aligned, and linker could choose code stream
with such LDR/STR instruction from MSVC and 4 bytes aligned data from Clang into
final image, which actually cannot be linked together
(see https://bugs.llvm.org/show_bug.cgi?id=41506 for more details).
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#alignment
Differential Revision: https://reviews.llvm.org/D61225
llvm-svn: 359744
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.
Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)
Differential Revision: https://reviews.llvm.org/D60279
llvm-svn: 359248
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
The option enables use of 32-bit pointers for accessing
const/local/shared memory. The feature is disabled by default.
Differential Revision: https://reviews.llvm.org/D46148
llvm-svn: 331938
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329829
amdgcn targets only support HIP, which does not define __CUDA_ARCH__.
this is a partial unroll of r329232 / D45277.
Differential Revision: https://reviews.llvm.org/D45387
llvm-svn: 329584
Clang can use CUDA-9.1 now, though new APIs (are not implemented yet.
The major change is that headers in CUDA-9.1 went through substantial
changes that started in CUDA-9.0 which required substantial changes
in the cuda compatibility headers provided by clang.
There are two major issues:
* CUDA SDK no longer provides declarations for libdevice functions.
* A lot of device-side functions have become nvcc's builtins and
CUDA headers no longer contain their implementations.
This patch changes the way CUDA headers are handled if we compile
with CUDA 9.x. Both 9.0 and 9.1 are affected.
* Clang provides its own declarations of libdevice functions.
* For CUDA-9.x clang now provides implementation of device-side
'standard library' functions using libdevice.
This patch should not affect compilation with CUDA-8. There may be
some observable differences for CUDA-9.0, though they are not expected
to affect functionality.
Tested: CUDA test-suite tests for all supported combinations of:
CUDA: 7.0,7.5,8.0,9.0,9.1
GPU: sm_20, sm_35, sm_60, sm_70
Differential Revision: https://reviews.llvm.org/D42513
llvm-svn: 323713
Some target devices (e.g. Nvidia GPUs) don't support dynamic stack
allocation and hence no VLAs. Print errors with description instead
of failing in the backend or generating code that doesn't work.
This patch handles explicit uses of VLAs (local variable in target
or declare target region) or implicitly generated (private) VLAs
for reductions on VLAs or on array sections with non-constant size.
Differential Revision: https://reviews.llvm.org/D39505
llvm-svn: 318601
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.
On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.
Differential Revision: https://reviews.llvm.org/D37576
llvm-svn: 312734
Targets.cpp is getting unwieldy, and even minor changes cause the entire thing
to cause recompilation for everyone. This patch bites the bullet and breaks
it up into a number of files.
I tended to keep function definitions in the class declaration unless it
caused additional includes to be necessary. In those cases, I pulled it
over into the .cpp file. Content is copy/paste for the most part,
besides includes/format/etc.
Differential Revision: https://reviews.llvm.org/D35701
llvm-svn: 308791