Currently if an OpenMP program uses `linear` clause, and is compiled with
optimization, `llvm.lifetime.end` for variables listed in `linear` clause are
emitted too early such that there could still be uses after that. Let's take the
following code as example:
```
// loop.c
int j;
int *u;
void loop(int n) {
int i;
for (i = 0; i < n; ++i) {
++j;
u = &j;
}
}
```
We compile using the command:
```
clang -cc1 -fopenmp-simd -O3 -x c -triple x86_64-apple-darwin10 -emit-llvm loop.c -o loop.ll
```
The following IR (simplified) will be generated:
```
@j = local_unnamed_addr global i32 0, align 4
@u = local_unnamed_addr global ptr null, align 8
define void @loop(i32 noundef %n) local_unnamed_addr {
entry:
%j = alloca i32, align 4
%cmp = icmp sgt i32 %n, 0
br i1 %cmp, label %simd.if.then, label %simd.if.end
simd.if.then: ; preds = %entry
call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %j)
store ptr %j, ptr @u, align 8
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %j)
%0 = load i32, ptr %j, align 4
store i32 %0, ptr @j, align 4
br label %simd.if.end
simd.if.end: ; preds = %simd.if.then, %entry
ret void
}
```
The most important part is:
```
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %j)
%0 = load i32, ptr %j, align 4
store i32 %0, ptr @j, align 4
```
`%j` is still loaded after `@llvm.lifetime.end.p0(i64 4, ptr nonnull %j)`. This
could cause the backend incorrectly optimizes the code and further generates
incorrect code. The root cause is, when we emit a construct that could have
`linear` clause, it usually has the following pattern:
```
EmitOMPLinearClauseInit(S)
{
OMPPrivateScope LoopScope(*this);
...
EmitOMPLinearClause(S, LoopScope);
...
(void)LoopScope.Privatize();
...
}
EmitOMPLinearClauseFinal(S, [](CodeGenFunction &) { return nullptr; });
```
Variables that need to be privatized are added into `LoopScope`, which also
serves as a RAII object. When `LoopScope` is destructed and if optimization is
enabled, a `@llvm.lifetime.end` is also emitted for each privatized variable.
However, the writing back to original variables in `linear` clause happens after
the scope in `EmitOMPLinearClauseFinal`, causing the issue we see above.
A quick "fix" seems to be, moving `EmitOMPLinearClauseFinal` inside the scope.
However, it doesn't work. That's because the local variable map has been updated
by `LoopScope` such that a variable declaration is mapped to the privatized
variable, instead of the actual one. In that way, the following code will be
generated:
```
%0 = load i32, ptr %j, align 4
store i32 %0, ptr %j, align 4
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %j)
```
Well, now the life time is correct, but apparently the writing back is broken.
In this patch, a new function `OMPPrivateScope::restoreMap` is added and called
before calling `EmitOMPLinearClauseFinal`. This can make sure that
`EmitOMPLinearClauseFinal` can find the orignal varaibls to write back.
Fixes#56913.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D131272
Scope of changes:
1) Added new function to generate loop versioning
2) Added support for if clause to applySimd function
2) Added tests which confirm that lowering is successful
If ifCond is specified, then collapsed loop is duplicated and if branch
is added. Duplicated loop is executed if simd ifCond is evaluated to false.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D129368
Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
This patch makes use of OMPIRBuilder support for codegen of taskgroup
construct in clang.
Depends on D128203
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D129992
This patch adds OMPIRBuilder support for the simdlen clause for the
simd directive. It uses the simdlen support in OpenMPIRBuilder when
it is enabled in Clang. Simdlen is lowered by OpenMPIRBuilder by
generating the loop.vectorize.width metadata.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D129149
D127041 introduced the support for `fmax` and `fmin` such that we can also reprent
`atomic compare` and `atomic compare capture` with `atomicrmw` instruction. This
patch simply lifts the limitation we set before.
Depend on D127041.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D127042
Implementing target in_reduction by wrapping target task with host task with in_reduction and if clause. This is in compliance with OpenMP 5.0 section: 2.19.5.6.
So, this
```
for (int i=0; i<N; i++) {
res = res+i
}
```
will become
```
#pragma omp task in_reduction(+:res) if(0)
#pragma omp target map(res)
for (int i=0; i<N; i++) {
res = res+i
}
```
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D125669
This patch adds the codegen support for `atomic compare capture` in clang.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D120290
This patch removes all `IgnoreImpCasts` in Sema, and only uses it if necessary. If the expression is not of the same type as the pointer value, a cast is inserted.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D126602
Without this patch, arguments to the
`llvm::OpenMPIRBuilder::AtomicOpValue` initializer are reversed.
Reviewed By: ABataev, tianshilei1992
Differential Revision: https://reviews.llvm.org/D126619
This creates an entry with address=nullptr and flag=0x80.
When an 'omp_all_memory' entry is specified any other 'out' or
'inout' entries are not needed and are not passed to the runtime.
Differential Revision: https://reviews.llvm.org/D126321
Currently when using `atomic update` with floating-point variables, if
the operation is add or sub, `cmpxchg`, instead of `atomicrmw` is emitted, as
shown in [1]. In fact, about three years ago, llvm-svn: 351850 added the
support for FP operations. This patch adds the support in OpenMP as well.
[1] https://godbolt.org/z/M7b4ba9na
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D124724
Need to emit final update of the inscan reduction variables. For
worksharing loops, the reduction values are stored in the temp array,
need to copy the last element to the original var at the end of the
construct.
Differential Revision: https://reviews.llvm.org/D121156
The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems:
1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block.
2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator.
3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again.
4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch.
With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback.
Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block.
Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D118409
In case of OpenMP programs, thread local variables can be present in
any clause pertaining to OpenMP constructs, as we know that compiler
generates artificial functions and in some cases values are passed to
those artificial functions thru parameters. For an example, if thread
local variable is present in copyin clause (testcase attached with the
patch), parameter with same name is generated as parameter to artificial
function. When user inquires the thread Local variable, its debug info
is hidden by the parameter. User never gets the actual TLS variable
when inquires it, instead gets the artificial parameter.
Current patch suppresses the debug info for such artificial parameter to
enable correct debugging of TLS variables.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D123787
This reverts commit af0285122f.
The test "libomp::loop_dispatch.c" on builder
openmp-gcc-x86_64-linux-debian fails from time-to-time.
See #54969. This patch is unrelated.
The OMPScheduleType enum stores the constants from libomp's internal sched_type in kmp.h and are used by several kmp API functions. The enum values have an internal structure, namely each scheduling algorithm (e.g.) exists in four variants: unordered, orderend, normerge unordered, and nomerge ordered.
This patch (basically a followup to D114940) splits the "ordered" and "nomerge" bits into separate flags, as was already done for the "monotonic" and "nonmonotonic", so we can apply bit flags operations on them. It also now contains all possible combinations according to kmp's sched_type. Deriving of the OMPScheduleType enum from clause parameters has been moved form MLIR's OpenMPToLLVMIRTranslation.cpp to OpenMPIRBuilder to make available for clang as well. Since the primary purpose of the flag is the binary interface to libomp, it has been made more private to LLVMFrontend. The primary interface for generating worksharing-loop using OpenMPIRBuilder code becomes `applyWorkshareLoop` which derives the OMPScheduleType automatically and calls the appropriate emitter function.
While this is mostly a NFC refactor, it still applies the following functional changes:
* The logic from OpenMPToLLVMIRTranslation to derive the OMPScheduleType also applies to clang. Most notably, it now applies the nonmonotonic flag for non-static schedules by default.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was previously not applied if the simd modifier was used. I assume this was a bug, since the effect was due to `loop.schedule_modifier()` returning `mlir::omp::ScheduleModifier::none` instead of `llvm::Optional::None`.
* In OpenMPToLLVMIRTranslation, the nonmonotonic default flag was set even if ordered was specified, in breach to what the comment before citing the OpenMP specification says. I assume this was an oversight.
The ordered flag with parameter was not considered in this patch. Changes will need to be made (e.g. adding/modifying function parameters) when support for it is added. The lengthy names of the enum values can be discussed, for the moment this is avoiding reusing previously existing enum value names such as `StaticChunked` to avoid confusion.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D123403
* Use default ref capture for non-escaping lambdas (this makes
maintenance easier by allowing new uses, removing uses, having
conditional uses (such as in assertions) not require updates to an
explicit capture list)
* Simplify addPrivate API not to take a lambda, since it calls it
unconditionally/immediately anyway - most callers are simply passing
in a named value or short expression anyway and the lambda syntax just
adds noise/overhead
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D121077
Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.
This patch includes the following related changes:
* Adapt applyWorkshareLoop to triage between the schedule types, now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
* Remove the chunk parameter from applyStaticWorkshareLoop, it is ignored by the runtime. Change the value for the value passed to the init function to 0, as without OpenMPIRBuilder.
* Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar as used by both, applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
* Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.
Differential Revision: https://reviews.llvm.org/D114413
Currently when we generate OpenMP offloading code we always make
fallback code for the CPU. This is necessary for implementing features
like conditional offloading and ensuring that unhandled pragmas don't
result in missing symbols. However, this is problematic for a few cases.
For offloading tests we can silently fail to the host without realizing
that offloading failed. Additionally, this makes it impossible to
provide interoperabiility to other offloading schemes like HIP or CUDA
because those methods do not provide any such host fallback guaruntee.
this patch adds the `-fopenmp-offload-mandatory` flag to prevent
generating the fallback symbol on the CPU and instead replaces the
function with a dummy global and the failed branch with 'unreachable'.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D120353
This patch adds the support for `atomic compare capture` in parser and part of
sema. We don't create an AST node for this because the spec doesn't say `compare`
and `capture` clauses should be used tightly, so we cannot look one more token
ahead in the parser.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D116261
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().
While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest change below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k lines less to process is no that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
This patch adds OMPIRBuilder support for the simd directive (without any clause). This will be a first step towards lowering simd directive in LLVM_Flang. The patch uses existing CanonicalLoop infrastructure of IRBuilder to add the support. Also adds necessary code to add llvm.access.group and llvm.loop metadata wherever needed.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D114379
The `EmitDeclareOfAutoVariable` introduced in D114504 and D115510 has a
precondition that cannot be violated. It is unclear if we should call it
directly given the sparse usage in clang but for now we should at least
not crash if the debug info kind is too low.
Fixes#52938.
Differential Revision: https://reviews.llvm.org/D116865
This patch adds the support for `atomic compare` in parser. The support
in Sema and CodeGen will come soon. For now, it simply eimits an error when it
is encountered.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D115561
The reduction initialization code creates a "naturally aligned null
pointer to void lvalue", which I found somewhat odd, even though it
works out in the end because it is not actually used. It doesn't
look like this code actually needs an LValue for anything though,
and we can use an invalid Address to represent this case instead.
Differential Revision: https://reviews.llvm.org/D116214
MakeNaturalAlignAddrLValue() expects the pointee type, but the
pointer type was passed. As a result, the natural alignment of
the pointer (usually 8) was always used in place of the natural
alignment of the value type.
Differential Revision: https://reviews.llvm.org/D116171
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.
In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D113623
Update `OpenMPIRBuilder::collapseLoops()` to call `resize()` instead of
`set_size()`. The latter asserts on capacity limits and cannot grow,
which seems likely to be unintentional here (if it is, I think a local
assertion would be good for clarity).
Also update `CodeGenFunction::EmitOMPCollapsedCanonicalLoopNest()` to
use `pop_back_n()` instead of `set_size()`.
Differential Revision: https://reviews.llvm.org/D115378
Currently variables appearing inside private/firstprivate/lastprivate
clause of openmp task construct are not visible inside lldb debugger.
This is because compiler does not generate debug info for it.
Please consider the testcase debug_private.c attached with patch.
```
28 #pragma omp task shared(res) private(priv1, priv2) firstprivate(fpriv)
29 {
30 priv1 = n;
31 priv2 = n + 2;
32 printf("Task n=%d,priv1=%d,priv2=%d,fpriv=%d\n",n,priv1,priv2,fpriv);
33
-> 34 res = priv1 + priv2 + fpriv + foo(n - 1);
35 }
36 #pragma omp taskwait
37 return res;
(lldb) p priv1
error: <user expression 0>:1:1: use of undeclared identifier 'priv1'
priv1
^
(lldb) p priv2
error: <user expression 1>:1:1: use of undeclared identifier 'priv2'
priv2
^
(lldb) p fpriv
error: <user expression 2>:1:1: use of undeclared identifier 'fpriv'
fpriv
^
```
After the current patch, lldb is able to show the variables
```
(lldb) p priv1
(int) $0 = 10
(lldb) p priv2
(int) $1 = 12
(lldb) p fpriv
(int) $2 = 14
```
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D114504
Eachempati.
This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D113540