Commit Graph

621 Commits

Author SHA1 Message Date
zhoujing 9b11eb8feb [NFC][fix] Fix test cases failure 2024-03-07 10:57:30 +08:00
Mifuns e4c88939fe
[#60][feat] Support barrier with memory scope parameter (#63)
Co-authored-by: qinfan <qinfan.wang@terapines.com>
2023-11-22 16:02:16 +08:00
zhoujing 967cb725c8 [VENTUS][RISCV][feat] Set ventus kernel for OpenCL kernel functions 2023-06-05 13:10:35 +08:00
zhoujingya 553e65dcf7 Change barrier and work_group_barrier into builtin functions 2023-03-14 10:38:22 +08:00
zhoujing a92723f212 Update barrier intrinsics' name and modify barrier's encoding 2023-02-10 14:50:40 +08:00
zhoujing 18810a86c0 Update barrier&barriersub instructions, codegen test cases for barrier builtins and intrinsics 2023-02-10 10:27:46 +08:00
Ron Lieberman ca856fff1c Revert "enable code-object-version=5"
very sorry wrong repo.

This reverts commit d882ba7aea.
2022-11-29 15:21:09 -06:00
Ron Lieberman d882ba7aea enable code-object-version=5 2022-11-29 15:11:57 -06:00
Sven van Haastregt b0e4897a1b [OpenCL] Remove arm-integer-dot-product extension pragmas
This extension only adds builtin functions and thus doesn't need to be
included as an extension.  Instead of a pragma, the builtin functions
of the extension can be exposed through enabling preprocessor defines.
2022-11-29 13:26:50 +00:00
David Stuttard 7940888c59 [AMDGPU] Intrinsic to expose s_wait_event for export ready
Differential Revision: https://reviews.llvm.org/D138216
2022-11-28 11:26:15 +00:00
Roman Lebedev 25f01d593c
Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)"
TableGen is still getting miscompiled on PPC buildbots.
Sent a mail with request for help.

This reverts commit 3c4d2a0396.
2022-11-27 00:00:06 +03:00
Roman Lebedev 3c4d2a0396
[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)
This is a recommit of cf624b23bc,
which was reverted in 5cfc22cafe,
because the cut-off on the number of vector elements was not low enough,
and it triggered both SDAG SDNode operand number assertions,
and caused compile time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it (to 64/128) later.

FIXME: should this respect TTI reg width * num vec regs?

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-11-26 23:19:15 +03:00
Alex Richardson 54ad4d2dd1 Drop redundant pipe to opt -instnamer in clang tests
This used to be required, but the difference between asserts/!asserts
builds no longer exists for %clang_cc1 (only for %clang), so they pass
just fine without this flag.
2022-11-25 11:34:55 +00:00
Benjamin Kramer 5cfc22cafe Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes"
This reverts commit cf624b23bc. It
triggers crashes in clang, see the comments on github on the original
change.
2022-11-23 13:11:16 +01:00
Roman Lebedev cf624b23bc
[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes
Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-11-23 02:38:25 +03:00
Arthur Eubanks e564f5153f [clang][test] Avoid UB in overload.cl 2022-11-13 14:02:24 -08:00
Zahira Ammarguellat 91628f0616 The handling of 'funsafe-math-optimizations' doesn't update the 'MathErrno'
flag. But the driver checks for 'fno-math-errno' before passing
'funsafe-math-optimizations' to the FE. In GCC, the option
'funsafe-math-optimizations' doesn't affect the 'fmath-errno' flag.
This patch aligns clang with GCC.

'-ffast-math' sets the FPContract to 'fast'. But 'funsafe-math-optimizations'
the driver doesn't consider the FPContract when handling the option.
Unfortunately there are places in the BE that interpret unsafe math
mode as allowing FMA. This patch makes -ffast-math' and
'funsafe-math-optimizations' behave similarly in regard to the setting of the
FPContract.

Differential Revision: https://reviews.llvm.org/D137578
2022-11-11 10:24:12 -05:00
Bjorn Pettersson 5f9a82683d [clang][test] Use opt -passes=<name> instead of opt -name
Updated the RUN line in several test cases to use the new PM syntax
  opt -passes=<pipeline>
instead of the deprecated syntax
  opt -pass1 -pass2

This was not a complete cleanup in clang/test. But just a swipe using
some simple search-and-replace. Mainly for RUN lines involving
-mem2reg, -instnamer and -early-cse.
2022-11-08 12:15:42 +01:00
Nikita Popov 304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Matt Arsenault f59f116bd5 AMDGPU: Add __builtin_amdgcn_permlane64 2022-10-13 21:12:11 -07:00
Petar Avramovic dcc756d03e [AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr
Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl
test_flat_add_local_f64 caused by D130579
Revert a3becb333d.

Differential Revision: https://reviews.llvm.org/D134568
2022-09-25 13:25:41 +02:00
Petar Avramovic a3becb333d [clang][AMDGPU] Temporarily disable clang atomic fadd test for gfx90a
Test is broken by D130579. Temporarily disable to silence builbot failures.
2022-09-23 21:49:16 +02:00
Stanislav Mekhanoshin e540965915 [AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn
Differential Revision: https://reviews.llvm.org/D133966
2022-09-16 02:42:09 -07:00
Fangrui Song 74742147ee [test] Change cc1 -fvisibility to -fvisibility= 2022-09-02 12:36:44 -07:00
Muhammad Omair Javaid 18de7c6a3b Revert "[InstCombine] Treat passing undef to noundef params as UB"
This reverts commit c911befaec.

It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand
the underlying reason. Reverting for now make buildbot green.

https://reviews.llvm.org/D133036
2022-09-02 16:09:50 +05:00
Arthur Eubanks c911befaec [InstCombine] Treat passing undef to noundef params as UB
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D133036
2022-09-01 15:16:45 -07:00
Anastasia Stulova 6b1a04529c [OpenCL][SPIR-V] Test extern functions with a pointer arg.
Added a test case that enhances coverage of opaque pointers
particularly for the problematic case with extern functions
for which there is no solution found for type recovery.

Differential Revision: https://reviews.llvm.org/D130768
2022-09-01 10:22:47 +01:00
Yaxun (Sam) Liu 9f6cb3e9fd [AMDGPU] Add builtin s_sendmsg_rtn
Reviewed by: Brian Sumner, Artem Belevich

Differential Revision: https://reviews.llvm.org/D132140

Fixes: SWDEV-352017
2022-08-22 18:29:23 -04:00
Austin Kerbow b0f4678b90 [AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D132079
2022-08-19 15:38:36 -07:00
Austin Kerbow f5b21680d1 [AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.

The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.

As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D128158
2022-07-28 10:43:14 -07:00
Aaron Ballman 7068aa9841 Strengthen -Wint-conversion to default to an error
Clang has traditionally allowed C programs to implicitly convert
integers to pointers and pointers to integers, despite it not being
valid to do so except under special circumstances (like converting the
integer 0, which is the null pointer constant, to a pointer). In C89,
this would result in undefined behavior per 3.3.4, and in C99 this rule
was strengthened to be a constraint violation instead. Constraint
violations are most often handled as an error.

This patch changes the warning to default to an error in all C modes
(it is already an error in C++). This gives us better security posture
by calling out potential programmer mistakes in code but still allows
users who need this behavior to use -Wno-error=int-conversion to retain
the warning behavior, or -Wno-int-conversion to silence the diagnostic
entirely.

Differential Revision: https://reviews.llvm.org/D129881
2022-07-22 15:24:54 -04:00
Stanislav Mekhanoshin 523a99c0eb [AMDGPU] Support for gfx940 fp8 smfmac
Differential Revision: https://reviews.llvm.org/D129908
2022-07-18 12:12:41 -07:00
Stanislav Mekhanoshin 2695f0a688 [AMDGPU] Support for gfx940 fp8 mfma
Differential Revision: https://reviews.llvm.org/D129906
2022-07-18 11:49:56 -07:00
Stanislav Mekhanoshin 9fa5a6b7e8 [AMDGPU] Support for gfx940 fp8 conversions
Differential Revision: https://reviews.llvm.org/D129902
2022-07-18 11:48:43 -07:00
Piotr Sobczak 4a78225212 [AMDGPU] Add WMMA clang builtins
Add WMMA clang builtins and tests. Extra changes in code
are needed to handle function overloads.

WavefrontSize 32:
__builtin_amdgcn_wmma_f32_16x16x16_f16_w32
__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32
__builtin_amdgcn_wmma_f16_16x16x16_f16_w32
__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32
__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32
__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32

WavefrontSize 64:
__builtin_amdgcn_wmma_f32_16x16x16_f16_w64
__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64
__builtin_amdgcn_wmma_f16_16x16x16_f16_w64
__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64
__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64
__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128952
2022-07-01 08:55:25 +02:00
Joe Nash 2d43de13df [AMDGPU] gfx11 new dot instruction codegen support
Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904
2022-06-16 14:19:34 -04:00
Austin Kerbow 2db700215a [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.

This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.

Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D124700
2022-05-11 13:22:51 -07:00
Joe Nash 8bdfc73f63 [AMDGPU][clang] Definition of gfx11 subtarget
Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537
2022-04-29 13:55:56 -04:00
Nikita Popov b16a3b4f3b [Clang] Add -no-opaque-pointers to more tests (NFC)
This adds the flag to more tests that were not caught by the
mass-migration in 532dc62b90.
2022-04-07 12:53:29 +02:00
Nikita Popov 532dc62b90 [OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC)
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.

The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.

Differential Revision: https://reviews.llvm.org/D123115
2022-04-07 12:09:47 +02:00
Scott Linder 09f33a430b [AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic
The diagnostic is unreliable, and triggers even for dead uses of
hostcall that may exist when linking the device-libs at lower
optimization levels.

Eliminate the diagnostic, and directly document the limitation for
OpenCL before code object V5.

Make some NFC changes to clarify the related code in the
MetadataStreamer.

Add a clang test to tie OCL sources containing printf to the backend IR
tests for this situation.

Reviewed By: sameerds, arsenm, yaxunl

Differential Revision: https://reviews.llvm.org/D121951
2022-04-05 19:10:23 +00:00
Nikita Popov f348ca51c7 [Tests] Use %clang_cc1 instead of %clang -cc1 in codegen tests (NFC) 2022-04-05 13:21:44 +02:00
Stanislav Mekhanoshin 6e3e14f600 [AMDGPU] Support gfx940 smfmac instructions
Differential Revision: https://reviews.llvm.org/D122191
2022-03-24 12:40:42 -07:00
Stanislav Mekhanoshin 27439a7642 [AMDGPU] New gfx940 mfma instructions
Differential Revision: https://reviews.llvm.org/D122044
2022-03-24 12:12:52 -07:00
Austin Kerbow 62bcfcb5a5 [AMDGPU] Add llvm.amdgcn.s.setprio intrinsic
Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D120976
2022-03-12 22:15:42 -08:00
Stanislav Mekhanoshin 932f628121 [AMDGPU] new gfx940 fp atomics
Differential Revision: https://reviews.llvm.org/D121028
2022-03-07 12:32:02 -08:00
Aakanksha 840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin 2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
hyeongyukim b529744c29 [Clang] Rename `disable-noundef-analysis` flag to `-[no-]enable-noundef-analysis`
This flag was previously renamed `enable_noundef_analysis` to
`disable-noundef-analysis,` which is not a conventional name. (Driver and
CC1's boolean options are using [no-] prefix)
As discussed at https://reviews.llvm.org/D105169, this patch reverts its
name to `[no-]enable_noundef_analysis` and enables noundef-analysis as
default.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D119998
2022-02-18 17:02:41 +09:00
Aaron Ballman 1ea584377e A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,

  void func();

becomes

  void func(void);

This is the ninth batch of tests being updated (there are a
significant number of other tests left to be updated).
2022-02-13 08:03:40 -05:00