Commit Graph

244 Commits

Author SHA1 Message Date
ziliangzl 5b1f7d5875 [VENTUS][fix]Fix double add and mul operation
Passed test_geometrics in CTS.
2024-07-24 10:32:18 +08:00
Jonas Paulsson 122efef8ee Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330.

Some more bots got broken - need to investigate.
2022-12-05 00:52:00 +01:00
Fangrui Song 4e62072ca1 [Passes] llvm::Optional => std::optional 2022-12-04 20:44:52 +00:00
Jonas Paulsson 17db0de330 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).

RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
2022-12-03 14:15:15 -06:00
Kazu Hirata 998960ee1f [CodeGen] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:08 -08:00
Jan Svoboda abf0c6c0c0 Use CTAD on llvm::SaveAndRestore
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D139229
2022-12-02 15:36:12 -08:00
Jonas Paulsson 8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4.
2022-12-01 13:29:24 -05:00
Jonas Paulsson 6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Freddy Ye 89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Marco Elver b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Dmitry Vyukov d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Dmitry Vyukov 0aedf9d714 Revert "Use-after-return sanitizer binary metadata"
This reverts commit e6aea4a5db.

Broke tests:
https://lab.llvm.org/buildbot/#/builders/16/builds/38992
2022-11-30 09:38:56 +01:00
Dmitry Vyukov e6aea4a5db Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 09:14:19 +01:00
Kazu Hirata dbb1130966 Revert "Use-after-return sanitizer binary metadata"
This reverts commit a1255dc467.

This patch results in:

  llvm/lib/CodeGen/SanitizerBinaryMetadata.cpp:57:17: error: no member
  named 'size' in 'llvm::MDTuple'
2022-11-29 09:04:00 -08:00
Dmitry Vyukov a1255dc467 Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-29 17:37:36 +01:00
Kazu Hirata d6f0ab47a7 [CodeGen] Use std::optional in TargetPassConfig.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 15:12:11 -08:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Luo, Yuanke 5cb0979870 [X86][AMX] Split greedy RA for tile register
When we fill the shape to tile configure memory, the shape is gotten
from AMX pseudo instruction. However the register for the shape may be
split or spilled by greedy RA. That cause we fill the shape to config
memory after ldtilecfg is executed, so that the shape configuration
would be wrong.
This patch is to split the tile register allocation from greedy register
allocation, so that after tile registers are allocated the shape
registers are still virtual register. The shape register only may be
redefined or multi-defined by phi elimination pass, two address pass.
That doesn't affect tile register configuration.

Differential Revision: https://reviews.llvm.org/D128584
2022-06-29 10:35:43 +08:00
Fangrui Song 95a134254a Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 01:07:51 -07:00
Rahman Lavaee 08cc058518 Reland "[Propeller] Promote functions with propeller profiles to .text.hot."
This relands commit 4d8d2580c5.

The major change here is using 'addUsedIfAvailable<BasicBlockSectionsProfileReader>()` to make sure we don't change the pipeline tests.

Differential Revision: https://reviews.llvm.org/D126518
2022-05-26 19:53:14 -07:00
Rahman Lavaee 3aa249329f Revert "[Propeller] Promote functions with propeller profiles to .text.hot."
This reverts commit 4d8d2580c5.
2022-05-26 18:45:40 -07:00
Rahman Lavaee 4d8d2580c5 [Propeller] Promote functions with propeller profiles to .text.hot.
Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).

This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.

The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.

Differential Revision: https://reviews.llvm.org/D122930
2022-05-26 16:23:21 -07:00
Sotiris Apostolakis ca7c307d18 [SelectOpti][1/5] Setup new select-optimize pass
This is the first commit for the cmov-vs-branch optimization pass.
The goal is to develop a new profile-guided and target-independent cost/benefit analysis
for selecting conditional moves over branches when optimizing for performance.

Initially, this new pass is expected to be enabled only for instrumentation-based PGO.

RFC: https://discourse.llvm.org/t/rfc-cmov-vs-branch-optimization/6040

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D120230
2022-05-19 16:31:10 +00:00
Momchil Velikov b4ad28da19 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CGA
aware) nature of the unwind tables.

Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.

This pass takes advantage of the constraints taht LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, or
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-11 13:27:26 +01:00
Muhammad Omair Javaid 0320115c16 Revert "[CodeGen] Async unwind - add a pass to fix CFI information"
This reverts commit 980c3e6dd2.

This commit had failing tests with clang crashing across various
AArch64/Linux buildots.

https://lab.llvm.org/buildbot/#/builders/179/builds/3346

Differential Revision: https://reviews.llvm.org/D114545
2022-04-05 13:12:30 +05:00
Momchil Velikov 980c3e6dd2 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CFG
aware) nature of the unwind tables.

Unlike the `CFIInstrInserer` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.

This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, or
     has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

In order to accommodate backends which do not generate unwind info in
epilogues we compute an additional property "strong no call frame on
entry" which is set for the entry point of the function and for every
block reachable from the entry along a path that does not execute the
prologue. If this property holds, it takes precedence over the "has a
call frame" property.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-04 14:38:22 +01:00
Julian Lettner 64902d335c Reland "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-23 18:36:55 -07:00
Zequan Wu 581dc3c729 Revert "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
This reverts commit 22570bac69.
2022-03-23 16:11:54 -07:00
Julian Lettner 22570bac69 Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-17 10:47:13 -07:00
Shengchen Kan ac64d0d230 [NFC][CodeGen] Remove redundant if clause in TargetPassConfig::addPass 2022-03-16 22:14:23 +08:00
serge-sans-paille 989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Simon Pilgrim 7262eacd41 Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
Mane of the build bots are complaining: Unknown command line argument '-lower-global-dtors'
2022-03-15 13:01:35 +00:00
Julian Lettner 9c542a5a4e Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121327
2022-03-14 17:51:18 -07:00
Nico Weber a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeea.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille 7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Xiang1 Zhang c31014322c TLS loads opimization (hoist)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120000
2022-03-10 09:29:06 +08:00
Xiang1 Zhang 65588a0776 Revert "TLS loads opimization (hoist)"
Revert for more reviews

This reverts commit 30e612ebdf.
2022-03-02 14:10:11 +08:00
Xiang1 Zhang 30e612ebdf TLS loads opimization (hoist)
Reviewed By: Wang Pheobe, Topper Craig

Differential Revision: https://reviews.llvm.org/D120000
2022-03-02 10:37:24 +08:00
Rong Xu 52d981a4c1 [SampleFDO] Enable FSAFDO loading passes if --enable-fs-discriminator is enabled
FSAFDO profile loader is currently disabled even --enable-fs-discriminator is enabled.
They need to be turned on by options which makes it cumbersome for experiments.

This patch changes the FSAFDO profile loader enabled by default.  Since they are
guarded by EnableFSDiscriminator, they will only be turned on if
--enable-fs-discriminator is enabled. Note that --enable-fs-discriminator is
still disabled by default.

Differential Revision: https://reviews.llvm.org/D119033
2022-02-05 22:37:09 -08:00
Mircea Trofin e67430cca4 [MLGO] ML Regalloc Eviction Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.

This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.

Differential Revision: https://reviews.llvm.org/D117147
2022-01-19 11:00:32 -08:00
Kazu Hirata d09a284dfb [CodeGen] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-12-28 00:38:11 -08:00
Jay Foad 012248b0bc Remove the verifyAfter mechanism that was replaced by D111397
Differential Revision: https://reviews.llvm.org/D111872
2021-10-18 10:26:46 +01:00
Jay Foad b84d9d299e [TargetPassConfig] Remove an obsolete FIXME comment
The "coloring with register" functionality it refers to was removed ten
years ago in svn r144481 "Remove the -color-ss-with-regs option".
2021-10-08 07:34:25 +01:00
Jay Foad 097339b1ca [TargetPassConfig] Enable machine verification after miscellaneous passes
In a couple of places machine verification was disabled for no apparent
reason, probably just because an "addPass(..., false)" line was cut and
pasted from elsewhere.

After this patch the only remaining place where machine verification is
disabled in the generic TargetPassConfig code, is after addPreEmitPass.
2021-10-07 21:24:50 +01:00
Jay Foad 27c57e791a [TwoAddressInstruction] Enable machine verification after this pass
Differential Revision: https://reviews.llvm.org/D111007
2021-10-07 20:04:51 +01:00
Jay Foad 3ff0a5747d [PHIElimination] Enable machine verification after this pass
Differential Revision: https://reviews.llvm.org/D111006
2021-10-07 20:04:51 +01:00
Jay Foad 0bd4365445 [LiveIntervals] Fix verification of early-clobbered segments
Enable verification of live intervals immediately after computing them
(when -early-live-intervals is used) and fix a problem that that
provokes: currently the verifier insists that a segment that ends at an
early-clobber slot must be followed by another segment starting at the
same slot. But before TwoAddressInstruction runs, the equivalent
condition is: a segment that ends at an early-clobber slot must have its
last use tied to an early-clobber def. That condition is harder to check
here, so for now just disable this check until tied operands have been
rewritten.

Differential Revision: https://reviews.llvm.org/D111065
2021-10-05 08:17:56 +01:00
Jay Foad 31c92d515d [MachineLoopInfo] Enable machine verification after this pass
Enabling this does not show any problems in check-llvm in an
LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110703
2021-10-01 18:15:57 +01:00
Jay Foad 04787239c9 [LiveVariables] Skip verification of kills inside bundles
LiveVariables does not examine the contents of bundles, so
MachineVerifier should not expect it to know about kill flags on
operands of instructions inside a bundle.

With this fix we can enable machine verification after running the
LiveVariables analysis. Doing this does not show any problems in
check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build.

Differential Revision: https://reviews.llvm.org/D110700
2021-10-01 18:15:57 +01:00