Commit Graph

419 Commits

Author SHA1 Message Date
Fangrui Song bac974278c CodeGen/CommandFlags: Convert Optional to std::optional 2022-12-03 18:38:12 +00:00
Krzysztof Parzyszek 8c7c20f033 Convert Optional<CodeModel> to std::optional<CodeModel> 2022-12-03 12:08:47 -06:00
Nicholas Guy d52e2839f3 [ARM][CodeGen] Add support for complex deinterleaving
Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic.

Differential Revision: https://reviews.llvm.org/D114174
2022-11-14 14:02:27 +00:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Archibald Elliott 3a24df992c [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum
This adds a late Machine Pass to work around a Cortex CPU Erratum
affecting Cortex-A57 and Cortex-A72:
- Cortex-A57 Erratum 1742098
- Cortex-A72 Erratum 1655431

The pass inserts instructions to make the inputs to the fused AES
instruction pairs no longer trigger the erratum. Here the pass errs on
the side of caution, inserting the instructions wherever we cannot prove
that the inputs came from a safe instruction.

The pass is used:
- for Cortex-A57 and Cortex-A72,
- for "generic" cores (which are used when using `-march=`),
- when the user specifies `-mfix-cortex-a57-aes-1742098` or
  `mfix-cortex-a72-aes-1655431` in the command-line arguments to clang.

Reviewed By: dmgreen, simon_tatham

Differential Revision: https://reviews.llvm.org/D119720
2022-05-13 10:47:33 +01:00
David Penry dcb77643e3 Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM
Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3
reverted in: fa49021c68

------

This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-29 10:54:39 -07:00
David Penry fa49021c68 Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3
while I investigate a buildbot failure.
2022-04-28 13:29:27 -07:00
David Penry 28d09bbbc3 [CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-28 13:01:18 -07:00
serge-sans-paille ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Mircea Trofin cb2160760e [nfc][codegen] Move RegisterBank[Info].h under CodeGen
This wraps up from D119053. The 2 headers are moved as described,
fixed file headers and include guards, updated all files where the old
paths were detected (simple grep through the repo), and `clang-format`-ed it all.

Differential Revision: https://reviews.llvm.org/D119876
2022-03-01 21:53:25 -08:00
Jameson Nash c4b1a63a1b mark getTargetTransformInfo and getTargetIRAnalysis as const
Seems like this can be const, since Passes shouldn't modify it.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D120518
2022-02-25 14:30:44 -05:00
Yuanfang Chen f927021410 Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de0.

Restrict the tests to Windows only since the flag symbol hash depends on
system-dependent path normalization.
2022-02-10 15:16:17 -08:00
Yuanfang Chen b380a31de0 Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683.

Break bots:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview
2022-02-10 14:17:37 -08:00
Yuanfang Chen bd3a1de683 [clang-cl] Support the /JMC flag
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/

The `/JMC` flag enables these instrumentations:
- Insert at the beginning of every function immediately after the prologue with
  a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
  The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
  global variable (the global variable is initialized to 1) with the name
  convention `__<hash>_<filename>`. All such global variables are placed in
  the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
  with a directory path. MSVC uses some unknown hashing function. Here I
  used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
  option via ".drectve" section. This is to prevent failure in
  case `__CheckForDebuggerJustMyCode` is not provided during linking.

Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D118428
2022-02-10 10:26:30 -08:00
serge-sans-paille 75e164f61d [llvm] Cleanup header dependencies in ADT and Support
The cleanup was manual, but assisted by "include-what-you-use". It consists in

1. Removing unused forward declaration. No impact expected.
2. Removing unused headers in .cpp files. No impact expected.
3. Removing unused headers in .h files. This removes implicit dependencies and
   is generally considered a good thing, but this may break downstream builds.
   I've updated llvm, clang, lld, lldb and mlir deps, and included a list of the
   modification in the second part of the commit.
4. Replacing header inclusion by forward declaration. This has the same impact
   as 3.

Notable changes:

- llvm/Support/TargetParser.h no longer includes llvm/Support/AArch64TargetParser.h nor llvm/Support/ARMTargetParser.h
- llvm/Support/TypeSize.h no longer includes llvm/Support/WithColor.h
- llvm/Support/YAMLTraits.h no longer includes llvm/Support/Regex.h
- llvm/ADT/SmallVector.h no longer includes llvm/Support/MemAlloc.h nor llvm/Support/ErrorHandling.h

You may need to add some of these headers in your compilation units, if needs be.

As an hint to the impact of the cleanup, running

clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Support/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l

before: 8000919 lines
after:  7917500 lines

Reduced dependencies also helps incremental rebuilds and is more ccache
friendly, something not shown by the above metric :-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
2022-01-21 13:54:49 +01:00
Ties Stuij f5f28d5b0c [ARM] Implement BTI placement pass for PACBTI-M
This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.

BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
  linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads

Each BTI instructions is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).

Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.

The cost mode of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.

The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.

PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).

The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.

Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need be marked with BTI or PACBTI.

The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112426
2021-12-01 12:54:05 +00:00
David Green 091244023a [ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form. This has the unfortunate side effect of severely hampering post-ra
scheduling at times as the instructions are already stuck in vpt blocks,
not allowed to be independently ordered.

This patch addresses that by just moving the creation of VPT blocks
later in the pipeline, after post-ra scheduling has been performed. This
allows more optimal scheduling post-ra before the vpt blocks are
created, leading to more optimal tail predicated loops.

Differential Revision: https://reviews.llvm.org/D113094
2021-11-04 18:42:12 +00:00
Reid Kleckner 89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Arthur Eubanks 8815ce03e8 Remove "Rewrite Symbols" from codegen pipeline
It breaks up the function pass manager in the codegen pipeline.

With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.

Add a check that we only have one function pass manager in the codegen
pipeline.

Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.

addr-label.ll crashes on ARM due to this change. This is because a
ARMConstantPoolConstant containing a BasicBlock to represent a
blockaddress may hold an invalid pointer to a BasicBlock if the
blockaddress is invalidated by its BasicBlock getting removed. In that
case all referencing blockaddresses are RAUW a constant int. Making
ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure
that's the right fix. As a workaround, create a barrier right before
ISel so that IR optimizations can't happen while a
ARMConstantPoolConstant has been created.

Reviewed By: rnk, MaskRay, compnerd

Differential Revision: https://reviews.llvm.org/D99707
2021-05-31 08:32:36 -07:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
David Green 7b6f760fcd [ARM] MVE vector lane interleaving
MVE does not have a single sext/zext or trunc instruction that takes the
bottom half of a vector and extends to a full width, like NEON has with
MOVL. Instead it is expected that this happens through top/bottom
instructions. So the MVE equivalent VMOVLT/B instructions take either
the even or odd elements of the input and extend them to the larger
type, producing a vector with half the number of elements each of double
the bitwidth. As there is no simple instruction for a normal extend, we
often have to expand sext/zext/trunc into a series of lane moves (or
stack loads/stores, which we do not do yet).

This pass takes vector code that starts at truncs, looks for
interconnected blobs of operations that end with sext/zext and
transforms them by adding shuffles so that the lanes are interleaved and
the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so
that it can work across basic blocks.

This initial version of the pass just handles a limited set of
instructions, not handling constants or splats or FP, which can all come
as extensions to this base.

Differential Revision: https://reviews.llvm.org/D95804
2021-03-28 19:34:58 +01:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
David Green e880f8b88a [ARM] Rename pass to MVETPAndVPTOptimisationsPass
This pass has for a while performed Tail predication as well as VPT
block optimizations. Rename the pass to make that clear.
2021-03-01 21:57:19 +00:00
Arlo Siemsen 080866470d Add ehcont section support
In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.

This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.

This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.

The change includes a test that when the module flag is enabled the section is correctly generated.

The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).

In order to collect catchret we:
1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and
3) Combines these targets with the other EHCont targets that were already being collected.

Change originally authored by Daniel Frampton <dframpto@microsoft.com>

For more details, see MSVC documentation for `/guard:ehcont`
  https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D94835
2021-02-15 14:27:12 +08:00
Sam Tebbs 5e4480b6c0 [ARM] Don't run the block placement pass at O0
The block placement pass shouldn't run unless optimisations are enabled.

Differential Revision: https://reviews.llvm.org/D94691
2021-01-15 13:59:29 +00:00
Sam Tebbs 60fda8ebb6 [ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch
Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken.

Differential Revision: https://reviews.llvm.org/D92385
2021-01-13 17:23:00 +00:00
Kristof Beyls a4c1f5160e [ARM] Harden indirect calls against SLS
To make sure that no barrier gets placed on the architectural execution
path, each indirect call calling the function in register rN, it gets
transformed to a direct call to __llvm_slsblr_thunk_mode_rN.  mode is
either arm or thumb, depending on the mode of where the indirect call
happens.

The llvm_slsblr_thunk_mode_rN thunk contains:

bx rN
<speculation barrier>

Therefore, the indirect call gets split into 2; one direct call and one
indirect jump.
This transformation results in not inserting a speculation barrier on
the architectural execution path.

The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.

As a linker is allowed to clobber r12 on function calls, the
above code transformation is not correct in case a linker does so.
Similarly, the transformation is not correct when register lr is used.
Avoiding r12/lr being used is done in a follow-on patch to make
reviewing this code easier.

Differential Revision: https://reviews.llvm.org/D92468
2020-12-19 12:33:42 +00:00
Kristof Beyls 195f44278c [ARM] Implement harden-sls-retbr for ARM mode
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.

To avoid a potential miss-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every indirect control flow where
control flow doesn't return to the next instruction, such as returns and
indirect jumps, but not indirect function calls.

Hardening of indirect function calls will be done in a later,
independent patch.

This patch is implementing the same functionality as the AArch64 counter
part implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change.  I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.

This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.

The inserted barriers are never on the correct, architectural execution
path, and therefore performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.

On targets that implement that Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by a ISB is emitted to act as a
speculation barrier.

These speculation barriers are implemented as pseudo instructions to
avoid later passes to analyze them and potentially remove them.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

Differential Revision: https://reviews.llvm.org/D92395
2020-12-19 11:42:39 +00:00
Amara Emerson e5784ef8f6 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

This is enabled only with optimizations enabled like SelectionDAG.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Sam Parker 03141aa04a [ARM] Enable outliner at -Oz for M-class
Enable default outlining when the function has the minsize attribute
and we're targeting an m-class core.

Differential Revision: https://reviews.llvm.org/D82951
2020-08-27 08:02:56 +01:00
Roman Lebedev 503deec218
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d8.
2020-08-22 00:33:22 +03:00
Roman Lebedev 1d51dc38d8
[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108
2020-07-29 20:05:30 +03:00
Roman Lebedev fb432a51f4
Reland "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions"
This reverts commit 1067d3e176,
which reverted commit b2018198c3,
because it introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils.

So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding
the cycle.
2020-07-16 13:40:01 +03:00
Adrian Kuegel 1067d3e176 Revert "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions"
This reverts commit b2018198c3.
This commit introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils,
so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still
depends on it, we have a cycle.
2020-07-16 10:54:10 +02:00
Roman Lebedev b2018198c3
[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions
Taking so many parameters is simply unmaintainable.

We don't want to include the entire llvm/Transforms/Utils/Local.h into
llvm/Transforms/Scalar.h so i've split SimplifyCFGOptions into
it's own header.
2020-07-16 01:27:54 +03:00
Nicholas Guy dc8e4d8566 [ARM] Rearrange SizeReduction when using -Oz
Move the Thumb2SizeReduce pass to before IfConversion when optimising
for minimal code size.

Running the Thumb2SizeReduction pass before IfConversionallows T1
instructions to propagate to the final output, rather than the
ifConverter modifying T2 instructions and preventing them from being
reduced later.

This change does introduce a regression regarding execution time, so
it's only applied when optimising for size.

Running the LLVM Test Suite with this change produces a geomean
difference of -0.1% for the size..text metric.

Differential Revision: https://reviews.llvm.org/D82439
2020-07-02 09:19:38 +01:00
Yvan Roux 0e4827aa4e [ARM][MachineOutliner] Add Machine Outliner support for ARM.
Enables Machine Outlining for ARM and Thumb2 modes.  This is the first
patch of the series which adds all the basic logic for the support, and
only handles tail-calls and thunks.

The outliner can be turned on by using clang -moutline option or -mllvm
-enable-machine-outliner one (like AArch64).

Differential Revision: https://reviews.llvm.org/D76066
2020-05-15 08:44:23 +02:00
Pierre-vh 4563024356 [Target][ARM] Adding MVE VPT Optimisation Pass
Differential Revision: https://reviews.llvm.org/D76709
2020-04-14 15:16:27 +01:00
Yvan Roux bd069ad39c [ARM] Move ConstantIsland and LowOverheadLoops Passes.
Move ARM ConstantIsland and LowOverheadLopps passes later in the pipeline
such that they will be run after the upcoming Machine Outlining pass.

Differential Revision: https://reviews.llvm.org/D76065
2020-03-25 16:49:21 +01:00
Djordje Todorovic d9b9621009 Reland D73534: [DebugInfo] Enable the debug entry values feature by default
The issue that was causing the build failures was fixed with the D76164.
2020-03-19 13:57:30 +01:00
Nico Weber f82b32a51e Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit 5aa5c943f7.
Causes clang to assert, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1061533#c4
for a repro.
2020-03-13 15:37:44 -04:00
Djordje Todorovic 5aa5c943f7 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-03-10 09:15:06 +01:00
Djordje Todorovic 2f215cf36a Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure found on an ARM 2-stage buildbot.
The investigation is needed.
2020-02-20 14:41:39 +01:00
Djordje Todorovic faff707db8 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
Djordje Todorovic 2bf44d11cb Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic a82d3e8a6e Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00