Commit Graph

885 Commits

Author SHA1 Message Date
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
Ruobing Han f756f06cc4 [SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops
With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.

Reviewed By: aeubanks, asbirlea

Differential Revision: https://reviews.llvm.org/D129599
2022-08-08 18:12:04 +00:00
Arthur Eubanks 81c4e58e2a [StandardInstrumentations] Handle case where block order changes
Previously we'd go off the end of the BI iterator because we expected
that the relative positions of common blocks before and after were
consistent. That's not always true though, for example with
jump-threading.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D130596
2022-08-08 07:41:39 -07:00
Congzhe Cao 76be554931 [DependenceAnalysis][PR56275] Normalize negative dependence analysis results
This patch is the first of the two-patch series (D130188, D130179) that
resolve PR56275 (https://github.com/llvm/llvm-project/issues/56275)
which is a missed opportunity, where a perfectly valid case for loop
interchange failed interchange legality.

If the distance/direction vector produced by dependence analysis (DA) is
negative, it needs to be normalized (reversed). This patch provides helper
functions `isDirectionNegative()` and `normalize()` in DA that do the
normalization, and clients can query DA to do normalization if needed.
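
The normalization described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code added to DA: the direction symbols (`<`, `=`, `>`, `*`), the list-based encoding, and the choice to treat a vector as negative when its first non-`=` entry is `>` are illustrative. The function names merely echo the helpers named in the commit message.

```python
from typing import List, Optional, Tuple

# Illustrative encoding only: one symbol per loop level, outermost first.
# This mapping is an assumption for the sketch, not LLVM's data structure.
REVERSED = {"<": ">", ">": "<", "=": "=", "*": "*"}


def is_direction_negative(directions: List[str]) -> bool:
    """Return True if the first entry that is not '=' is '>'."""
    for d in directions:
        if d == "=":
            continue
        return d == ">"
    return False  # all entries are '=', so the vector is not negative


def normalize(
    directions: List[str], distances: List[Optional[int]]
) -> Tuple[List[str], List[Optional[int]]]:
    """Reverse a negative vector; leave any other vector unchanged."""
    if not is_direction_negative(directions):
        return list(directions), list(distances)
    flipped = [REVERSED[d] for d in directions]
    # An unknown distance (None) stays unknown; known distances change sign.
    negated = [None if x is None else -x for x in distances]
    return flipped, negated
```

For example, `normalize(["=", ">", "<"], [0, -1, 2])` yields `(["=", "<", ">"], [0, 1, -2])` in this model, while a vector whose first non-`=` entry is `<` is returned as is.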

A pass option `<normalized-results>` is added to DependenceAnalysisPrinterPass,
and we leverage it to update DA test cases to make sure of test coverage. The
test cases added in `Banerjee.ll` show that negative vectors are normalized
with `print<da><normalized-results>`.

Reviewed By: bmahjour, Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D130188
2022-08-03 19:59:00 -04:00
Arthur Eubanks 43aa4ac70b [StandardInstrumentations] Assign names to basic blocks without names
Fixes code in OrderedChangedData<T>::report which assumes that a string will only appear once in Before/After.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D130587
2022-08-02 11:04:01 -07:00
Fangrui Song 2b70bebc6d [MachineFunctionPass] Support -print-changed={,c}diff{,-quiet}
Follow-up to D130434.
Move doSystemDiff to PrintPasses.cpp and call it in MachineFunctionPass.cpp.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D130833
2022-08-01 12:56:15 -07:00
Fangrui Song f106525de2 [MachineFunctionPass] Support -print-changed and -print-changed=quiet
-print-changed for new pass manager is handy beside -print-after-all.
Port it to MachineFunctionPass.

Note: lib/Passes/StandardInstrumentations.cpp implements a number of
misc features. If we want to use them for codegen, we may need to lift
some functionality to LLVMIR.

Reviewed By: aeubanks, jamieschmeiser

Differential Revision: https://reviews.llvm.org/D130434
2022-07-26 10:16:49 -07:00
Sanjay Patel bfb9b8e075 [Passes] add a tail-call-elim pass near the end of the opt pipeline
We call tail-call-elim near the beginning of the pipeline,
but that is too early to annotate calls that get added later.

In the motivating case from issue #47852, the missing 'tail'
on memset leads to sub-optimal codegen.

I experimented with removing the early instance of
tail-call-elim instead of just adding another pass, but that
appears to be slightly worse for compile-time:
+0.15% vs. +0.08% time.
"tailcall" shows adding the pass; "tailcall2" shows moving
the pass to later, then adding the original early pass back
(so 1596886802 is functionally equivalent to 180b0439dc ):
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright

Note that there was an effort to split the tail call functionality
into 2 passes - that could help reduce compile-time if we find
that this change costs more in compile-time than expected based
on the preliminary testing:
D60031

Differential Revision: https://reviews.llvm.org/D130374
2022-07-25 15:25:47 -04:00
Fangrui Song 89357f0cb9 [Passes] Simplify ChangePrinter names. NFC 2022-07-23 19:32:13 -07:00
Arthur Eubanks 8c6305b8b4 [NewPM] Print function/SCC size with -debug-pass-manager
This is helpful for debugging issues with very large functions or SCC.
Also helpful when function names are very large and it's hard to tell the number of nodes in an SCC.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D128003
2022-07-19 09:00:37 -07:00
Alina Sbirlea 846d10f16a Turn on flag to not re-run simplification pipeline.
This patch turns on the flag `-enable-no-rerun-simplification-pipeline`, which means the simplification pipeline will not be rerun on unchanged functions in the CGSCCPass Manager.

Compile time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=17457be1c393ff691cca032b04ea1698fedf0301&to=882301ebb893c8ef9f09fe1ea871f7995426fa07&stat=instructions

No meaningful run time regressions observed in the llvm test suite and
in additional internal workloads at this time.

The example test in `test/Other/no-rerun-function-simplification-pipeline.ll` is a good means to understand the effect of this change:
```
define void @f1(void()* %p) alwaysinline {
  call void %p()
  ret void
}

define void @f2() #0 {
  call void @f1(void()* @f2)
  call void @f3()
  ret void
}

define void @f3() #0 {
  call void @f2()
  ret void
}
```

There are two SCCs formed by the ModuleToPostOrderCGSCCAdaptor: (f1) and (f2, f3).

The pass manager runs on the first SCC, leading to running the simplification pipeline (function and loop passes) on f1. With the flag on, after this, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f1`.

Next, the pass manager runs on the second SCC: (f2, f3). Since f1() was inlined, f2() now calls itself, and also calls f3(), while f3() only calls f2().
So the pass manager for the SCC first runs the Inliner on (f2, f3), then the simplification pipeline on f2.
With the flag on, the output will have `Running analysis: ShouldNotRunFunctionPassesAnalysis on f2`; unless the inliner makes a change, this analysis remains preserved, which means there's no reason to rerun the simplification pipeline. With the flag off, the simplification pipeline is run a second time on f2.

Next, the same flow occurs for f3. The simplification pipeline is run on f3 a single time with the flag on, along with `ShouldNotRunFunctionPassesAnalysis on f3`, and twice with the flag off.
The reruns occur only on f2 and f3 due to the additional ref edges.
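
The run counts in this walkthrough can be reproduced with a small toy model. It is a sketch only: `count_simplification_runs`, the `visits` list, and the `unchanged` set (which stands in for the preserved `ShouldNotRunFunctionPassesAnalysis` result) are hypothetical names, and the real pass manager is considerably more involved.

```python
from typing import Dict, List, Tuple


def count_simplification_runs(
    visits: List[Tuple[str, bool]], no_rerun: bool
) -> Dict[str, int]:
    """Count simplification-pipeline runs per function.

    Each visit is (function name, whether the inliner changed it just before).
    """
    runs: Dict[str, int] = {}
    unchanged = set()  # functions whose marker analysis is still preserved
    for fn, inliner_changed in visits:
        if inliner_changed:
            unchanged.discard(fn)  # a change invalidates the marker
        if no_rerun and fn in unchanged:
            continue  # flag on and nothing changed: skip the rerun
        runs[fn] = runs.get(fn, 0) + 1
        unchanged.add(fn)  # running the pipeline (re)establishes the marker
    return runs


# f1 is visited once; f2 and f3 are visited twice because of the extra ref edges.
VISITS = [("f1", False), ("f2", True), ("f2", False), ("f3", False), ("f3", False)]
```

With `no_rerun=True` the model reports one run each for f1, f2 and f3; with `no_rerun=False` it reports one run for f1 and two each for f2 and f3, matching the description above.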
2022-07-14 06:23:55 -07:00
Kazu Hirata ec9a0e36d9 [IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC)
The last uses were removed on Apr 15, 2022 in commit
2e6ac54cf4.

Differential Revision: https://reviews.llvm.org/D129460
2022-07-11 20:15:24 -07:00
Nicolai Hähnle ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a5880 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f06245.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Ben Dunbobbin 325e7e8b87 [LLVM][LTO][LLD] Enable Profile Guided Layout (--call-graph-profile-sort) for FullLTO
The CGProfilePass needs to be run during FullLTO compilation at link
time to emit the .llvm.call-graph-profile section to the compiled LTO
object file. Currently, it is being run only during the initial
LTO-prelink compilation stage (to produce the bitcode files to be
consumed by the linker) and so the section is not produced.

ThinLTO is not affected because:
- For ThinLTO-prelink compilation the CGProfilePass pass is not run
  because ThinLTO-prelink passes are added via
  buildThinLTOPreLinkDefaultPipeline. Normal and FullLTO-prelink
  passes are both added via buildPerModuleDefaultPipeline which uses
  the LTOPreLink parameter to customize its behavior for the
  FullLTO-prelink pass differences.
- ThinLTO backend compilation phase adds the CGProfilePass (see:
  buildModuleOptimizationPipeline).

Adjust when the pass is run so that the .llvm.call-graph-profile
section is produced correctly for FullLTO.

Fixes #56185 (https://github.com/llvm/llvm-project/issues/56185)
2022-07-01 13:57:36 +01:00
Nicolai Hähnle 8de6d4b712 StandardInstrumentation: print verifier output to errs
Enabling the verifiers is not very helpful if their output is
suppressed beyond the fatal error.

Differential Revision: https://reviews.llvm.org/D128743
2022-06-29 12:11:55 +02:00
Mitch Phillips dacfa24f75 Delete 'llvm.asan.globals' for global metadata.
Now that we have the sanitizer metadata that is actually on the global
variable, and now that we use debuginfo in order to do symbolization of
globals, we can delete the 'llvm.asan.globals' IR synthesis.

This patch deletes the 'location' part of the __asan_global that's
embedded in the binary as well, because it's unnecessary. This saves
about 1.7% of the optimised non-debug with-asserts clang binary.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D127911
2022-06-27 14:40:40 -07:00
Chuanqi Xu 24e53b01d5 Revert "[Coroutines] Only do symmetric transfer if optimization is on"
This reverts commit 7782e080e8. According
to the discussion of WG21, symmetric transfer is a desired feature.
2022-06-27 10:54:56 +08:00
Mingming Liu e0d069598b [Inline] Annotate inline pass name with link phase information for analysis.
The annotation is flag gated; flag is turned off by default.

Differential Revision: https://reviews.llvm.org/D125495
2022-06-24 10:06:43 -07:00
Chuanqi Xu 7782e080e8 [Coroutines] Only do symmetric transfer if optimization is on
Symmetric transfer is not part of the C++ standard, so vendors are not
forced to implement it. Given that symmetric transfer is nowadays
an optimization, it makes more sense to enable it only when
optimization is enabled. It also helps compilation speed at
O0.
2022-06-20 16:20:36 +08:00
Jin Xin Ng aaff3fb6d5 [mlgo] Fix accounting for SCC splits
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127693
2022-06-15 10:53:23 -07:00
Fangrui Song d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Arthur Eubanks 36096c2b38 [NFC][JumpThreading] Remove InsertFreezeWhenUnfoldingSelect pass parameter
All callers pass true.

select-unfold-freeze.ll is now a subset of select.ll so delete it.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126501
2022-05-26 16:13:34 -07:00
Jamie Schmeiser 24239e246c Add new hidden option -print-on-crash that prints out IR that caused opt pipeline to crash
A new hidden option -print-on-crash that prints the IR as it was upon entering
the last pass when there is a crash.

The IR is saved in its print form before each pass is started and a
signal handler is registered.  If the compilation crashes, the signal
handler will print the saved IR to dbgs().  This option
can be modified using -print-module-scope to get the IR for the complete
module.  Note that this option only works with the new pass manager.
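
The save-then-dump idea can be sketched in a few lines. This is a toy model under stated assumptions, not the LLVM implementation: the IR is a plain string, a Python exception stands in for the crash signal, and `run_passes` and its banner text are hypothetical.

```python
import sys
from typing import Callable, List, TextIO, Tuple

Pass = Tuple[str, Callable[[str], str]]


def run_passes(ir: str, passes: List[Pass], out: TextIO = sys.stderr) -> str:
    """Run passes in order, dumping the pre-pass IR if one of them fails."""
    for name, transform in passes:
        saved = str(ir)  # snapshot in print form before the pass starts
        try:
            ir = transform(ir)
        except Exception:
            # Stand-in for the signal handler: print the saved IR, then
            # let the failure propagate so the caller still sees it.
            out.write(f"*** IR before crashing pass '{name}' ***\n{saved}\n")
            raise
    return ir
```

If the second of two passes raises, the dump shows the IR as produced by the first pass, that is, the IR as it was upon entering the pass that failed.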

Reviewed By: yrouban

Differential Revision: https://reviews.llvm.org/D86657
2022-05-23 15:38:38 -07:00
Yang Keao 7dce9eb6e5 [DomPrinter] Migrate -dot-dom to the new pass manager.
In D123677, @YangKeao provided an implementation of `DOTGraphTraits{Viewer,Printer}` in the new pass manager. This commit migrates the `DomPrinter` and `DomViewer` to the new pass manager.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D124904
2022-05-16 15:07:16 -05:00
Chuanqi Xu 02d6845234 [NFC] [Coroutines] Remove EnableReuseStorageInFrame option
The EnableReuseStorageInFrame option is designed for testing only.
But it is better to use *_PASS_WITH_PARAMS macro to keep consistent with
other passes.
2022-05-10 17:28:43 +08:00
Chuanqi Xu 405bf90235 [NFC] [Pipelines] Hoist CoroCleanup as Module Pass
This is similar to previous patch https://reviews.llvm.org/D123925. It
could also reduce the time we call declaresCoroCleanupIntrinsics. And it
is helpful for further changes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D124362
2022-05-05 15:15:09 +08:00
Chuanqi Xu 7d40f562e7 [Pipelines] Hoist CoroCleanup to avoid blocking optimizations
CoroCleanup is designed to lower all the remaining coroutine
intrinsics. It is only required to run after CoroSplit. However, the
position of CoroCleanup is currently far too late. The downside is that
the unlowered coroutine intrinsics might block other optimizations
too. So it should be a pure win to hoist the position of CoroCleanup.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D124360
2022-05-05 15:13:27 +08:00
Mingming Liu 408bb9a375 Add a regression test to guard the 0 hot-caller threshold in SamplePGO + ThinLTO. - Add a comment near where the threshold is set. 2022-04-25 18:29:56 +00:00
Chuanqi Xu f9bee35689 [Pipelines] Hoist CoroEarly as a module pass
This change could reduce the time we call `declaresCoroEarlyIntrinsics`.
And it is helpful for future changes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D123925
2022-04-19 11:04:24 +08:00
Arthur Eubanks a7e20a8a7a [CallPrinter] Port CallPrinter passes to new pass manager
Port the legacy CallGraphViewer and CallGraphDOTPrinter to work with the new pass manager.

Addresses issue https://github.com/llvm/llvm-project/issues/54323
Adds back related tests that were removed in commits d53a4e7b4a and 9e9d9aba14

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D122989
2022-04-18 10:02:18 -07:00
Evgeny Mandrikov 443b6ec169 [NFC] Fix build failure with GCC 11 in C++20 mode
This was already fixed in
2ccf0b76bc
but then regressed in
79a1f3e7c6

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D123589
2022-04-13 09:42:41 -07:00
Matt Arsenault 39f1568633 Transforms: Split LowerAtomics into separate Utils and pass
This will allow code sharing from AtomicExpandPass. Not entirely sure
why these exist as separate passes though.
2022-04-06 20:54:45 -04:00
Wenju He 0bda12b5bc [NewPM] Add OptimizerEarly module extension point
The VectorizerStart extension is a module callback in the old PM, but a function
callback in the new PM. We lack a module extension point between the end of
buildModuleSimplificationPipeline and the function optimization
(including vectorizer) pipeline. So this patch adds a new module
extension point before the function optimization pipeline.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D122296
2022-03-31 08:22:27 -07:00
Julian Lettner 64902d335c Reland "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-23 18:36:55 -07:00
Zequan Wu 581dc3c729 Revert "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
This reverts commit 22570bac69.
2022-03-23 16:11:54 -07:00
Arthur Eubanks 9bd66b312c [PassManager][Coroutine] Run passes under -O0 conditionally and run GlobalDCE
CoroSplit lowers various coroutine intrinsics. It's a CGSCC pass and
CGSCC passes don't run on unreachable functions. Normally GlobalDCE will
come along and delete unreachable functions, but we don't run GlobalDCE
under -O0, so an unreachable function with coroutine intrinsics may
never have CoroSplit run on it.

This patch adds GlobalDCE when coroutine intrinsics are present. It
also now runs all coroutine passes conditionally, only when coroutine
intrinsics are present. This should also solve the -O0 regression reported in
D105877 due to LazyCallGraph construction.
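
The reachability problem described above can be modelled with a toy call graph. This is a sketch only: the dictionary encoding, the `roots` parameter (which functions count as entry points), and both function names are assumptions for illustration and do not reflect how LazyCallGraph or GlobalDCE are implemented.

```python
from typing import Dict, Iterable, List, Set


def reachable(call_graph: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Collect every function reachable from the roots by following calls."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, ()))
    return seen


def unlowered_coroutines(
    call_graph: Dict[str, List[str]],
    roots: Iterable[str],
    coro_fns: Set[str],
    run_global_dce: bool,
) -> List[str]:
    """Return functions that still hold coroutine intrinsics afterwards."""
    live = reachable(call_graph, roots)
    # GlobalDCE-like step: drop functions that nothing reaches.
    remaining = set(call_graph) & live if run_global_dce else set(call_graph)
    # The CGSCC-like walk only visits reachable functions, so only those
    # get their intrinsics lowered.
    return sorted(fn for fn in remaining if fn in coro_fns and fn not in live)
```

With a graph such as `{"main": ["g"], "g": [], "dead_coro": []}`, roots `["main"]` and `coro_fns={"dead_coro"}`, the model leaves `dead_coro` unlowered without the DCE step and returns an empty list with it.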

Fixes https://github.com/llvm/llvm-project/issues/54117

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D122275
2022-03-23 11:03:26 -07:00
Florian Hahn 5ab421fb4e [LICM] Add allowspeculation pass options.
This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.

This allows reproducing #54023 using opt.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D121944
2022-03-18 16:51:57 +00:00
Julian Lettner 22570bac69 Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-17 10:47:13 -07:00
Simon Pilgrim 7262eacd41 Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO"
Many of the build bots are complaining: Unknown command line argument '-lower-global-dtors'
2022-03-15 13:01:35 +00:00
Julian Lettner 9c542a5a4e Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121327
2022-03-14 17:51:18 -07:00
Arthur Eubanks 4fc7c55fff [NewPM] Actually recompute GlobalsAA before module optimization pipeline
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.

Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.

Fixes #53131.

Differential Revision: https://reviews.llvm.org/D121167
2022-03-14 09:42:34 -07:00
Xiang1 Zhang c31014322c TLS loads optimization (hoist)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120000
2022-03-10 09:29:06 +08:00
Florian Hahn f98125abb2 Revert "[PassManager] Add pretty stack entries before P->run() call."
This reverts commit 128745cc26.

This increased compile-time unnecessarily. Revert this change and follow
ups 2c7afadb47 & add0c5856d.

http://llvm-compile-time-tracker.com/compare.php?from=338dfcd60f843082bb589b287d890dbd9394eb82&to=128745cc2681c284bc6d0150a319673a6d6e8424&stat=instructions
2022-03-09 18:46:32 +00:00
Florian Hahn 128745cc26 [PassManager] Add pretty stack entries before P->run() call.
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.

The information is used to print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.

The improved stack trace now includes:

Stack dump:
0.	Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1.	Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2.	Running pass 'LoopVectorizePass' on function '@a'

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D120993
2022-03-09 13:01:09 +00:00
Arthur Eubanks 79a1f3e7c6 [NFC] Cleanup StandardInstrumentations 2022-03-07 16:24:36 -08:00
serge-sans-paille 59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Xiang1 Zhang 65588a0776 Revert "TLS loads optimization (hoist)"
Revert for more reviews

This reverts commit 30e612ebdf.
2022-03-02 14:10:11 +08:00