Commit Graph

356 Commits

Author SHA1 Message Date
Amir Ayupov 556efdba85 [BOLT][NFC] Extend debug logging in analyzeJumpTable
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D131918
2022-08-15 20:34:40 -07:00
Kazu Hirata 2febc32c9c Use llvm::erase_if (NFC) 2022-08-13 12:55:48 -07:00
Fangrui Song 53113515cd [BOLT] Use Optional::emplace to avoid move assignment. NFC 2022-08-12 12:51:50 -07:00
Fangrui Song 0972a390b9 LLVM_FALLTHROUGH => [[fallthrough]]. NFC 2022-08-09 04:06:52 +00:00
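Illustration for 0972a390b9 (not part of the commit): a minimal standalone sketch of the standard C++17 `[[fallthrough]]` attribute that replaces the `LLVM_FALLTHROUGH` macro. The function `stepsFrom` and its cases are hypothetical and are not taken from the LLVM sources.

```cpp
#include <cassert>

// Hypothetical helper: counts how many stages run when starting at Stage.
// Each case deliberately falls through into the next one.
int stepsFrom(int Stage) {
  assert(Stage >= 0 && "stages are non-negative in this sketch");
  int Steps = 0;
  switch (Stage) {
  case 0:
    ++Steps;
    [[fallthrough]]; // standard attribute, no macro needed
  case 1:
    ++Steps;
    [[fallthrough]];
  case 2:
    ++Steps;
    break;
  default:
    break;
  }
  return Steps;
}
```

The attribute documents intent and silences implicit-fallthrough warnings; it does not change control flow.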
Thorsten Schütt 0c9258612b [bolt] silence unused variables warnings 2022-08-06 20:52:45 +02:00
Rafael Auler 19eb908e61 [BOLT] Remove always true if statement
Got a warning from GCC when building this.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D131092
2022-08-03 13:11:33 -07:00
Nicolai Hähnle f7872cdce1 CommandLine: add and use cl::SubCommand::get{All,TopLevel}
Prefer using these accessors to access the special sub-commands
corresponding to the top-level (no subcommand) and all sub-commands.

This is a preparatory step towards removing the use of ManagedStatic:
with a subsequent change, these global instances will be moved to
be regular function-scope statics.

It is split up to give downstream projects a (albeit short) window in
which they can switch to using the accessors in a forward-compatible
way.

Differential Revision: https://reviews.llvm.org/D129118
2022-08-02 23:49:16 +02:00
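Illustration for f7872cdce1 (not part of the commit): a minimal sketch of the accessor pattern the message describes, where global instances become function-scope statics reached through accessors. `SubCommandSketch` is a hypothetical stand-in, not the real `cl::SubCommand`; only the accessor names mirror the commit title.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a sub-command object.
class SubCommandSketch {
public:
  explicit SubCommandSketch(std::string N) : Name(std::move(N)) {
    assert(!Name.empty() && "a sub-command needs a name in this sketch");
  }
  const std::string &getName() const { return Name; }

  // Function-scope statics are constructed on first use, and that
  // initialization is thread-safe since C++11.
  static SubCommandSketch &getTopLevel() {
    static SubCommandSketch TopLevel("<top-level>");
    return TopLevel;
  }
  static SubCommandSketch &getAll() {
    static SubCommandSketch All("<all>");
    return All;
  }

private:
  std::string Name;
};
```

Callers that go through the accessors do not care whether the instance behind them is a global or a function-scope static, which is what makes the switch forward-compatible.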
David Blaikie 7651522b78 Fold assert-used variable into assert
Fixes #56724
2022-08-01 21:57:11 +00:00
Alexander Yermolovich dd29b3c542 [BOLT][DWARF] Fix handling of multiple DW_OP_addrx in an expression
We were not correctly handling multiple DW_OP_addrx in the location expression.
This was exposed by clang-15 build in release mode with debug information.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D130812
2022-08-01 14:38:47 -07:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Kazu Hirata ce3b687b88 [BOLT] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2022-07-31 15:17:05 -07:00
Kazu Hirata 1bf531a5d0 [BOLT] Use boolean literals (NFC)
Identified with modernize-use-bool-literals.
2022-07-31 15:17:02 -07:00
Amir Ayupov 468d4f6d18 Revert "[BOLT] Ignore functions accessing false positive jump tables"
This diff uncovers an ASAN leak in getOrCreateJumpTable:
```
Indirect leak of 264 byte(s) in 1 object(s) allocated from:
    #1 0x4f6e48c in llvm::bolt::BinaryContext::getOrCreateJumpTable ...
```
The removal of an assertion needs to be accompanied by proper deallocation of
a `JumpTable` object for which `analyzeJumpTable` was unsuccessful.

This reverts commit 52cd00cabf.
2022-07-30 10:39:46 -07:00
Kazu Hirata 12b29900a1 Use any_of (NFC) 2022-07-30 10:35:56 -07:00
Kazu Hirata f081ec20b5 [bolt] Remove redundant virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-30 10:35:51 -07:00
Kazu Hirata b498a8991e [bolt] Remove redundant control-flow statements (NFC)
Identified with readability-redundant-control-flow.
2022-07-30 10:35:49 -07:00
Kazu Hirata 60db8d9b4e Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2022-07-30 10:35:48 -07:00
Rafael Auler fc0ced73dc Add BAT testing framework
This patch refactors BAT to be testable as a library, so we
can have open-source tests on it. This further fixes an issue with
basic blocks that lack a valid input offset, making BAT omit those
when writing translation tables.

Test Plan: new testcases added, new testing tool added (llvm-bat-dump)

Differential Revision: https://reviews.llvm.org/D129382
2022-07-29 14:55:04 -07:00
Fangrui Song 7430894a65 Replace Optional::hasValue with has_value or operator bool. NFC 2022-07-29 10:57:25 -07:00
Fangrui Song 999514bb9a [bolt] Replace Optional::getValue with value or operator*. NFC 2022-07-29 01:15:24 -07:00
Huan Nguyen 52cd00cabf [BOLT] Ignore functions accessing false positive jump tables
Disassembly and branch target analysis are not decoupled, so any
analysis that depends on disassembly may not operate properly.

Specifically, analyzeJumpTable relies on the instruction bounds check property.
A jump table was analyzed twice: (a) during disassembly, and (b) after
disassembly, so there are potentially some mismatched results.

In this update, functions that access JTs which fail the second check
will be marked as ignored.

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130431
2022-07-28 23:22:17 -07:00
Huan Nguyen ccabbfff86 [BOLT] Remove --allow-stripped option
AllowStripped has not been used in BOLT.
This option is replaced by actively detecting stripped binary.

Test Plan:

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130036
2022-07-28 23:15:53 -07:00
Huan Nguyen 986362d4a3 [BOLT] Add BinaryContext::IsStripped
Determine stripped status of a binary based on .symtab

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130034
2022-07-28 23:11:03 -07:00
Amir Ayupov 77c1977384 [BOLT] Support files with no symbols
`LastSymbol` handling in `discoverFileObjects` assumes a non-zero number of
symbols in an object file. It's not the case for broken_dynsym.test added in
D130073, and potentially other stripped binaries.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D130544
2022-07-26 00:07:59 -07:00
Fabian Parzefall 83882606db [BOLT] Process each block only once in fixCFGForPIC
Rather than iterating over the whole function from the start until no
internal calls are found, process each block only once and continue
processing after splitting. This version of the function also does not
seemingly invalidate iterators from within the loop.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D130436
2022-07-25 15:06:24 -07:00
Huan Nguyen 8eb68d92d4 [BOLT] Handle broken .dynsym in stripped binaries
Strip tools cause a few symbols in .dynsym to have bad section index.
This update safely keeps such broken symbols intact.

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130073
2022-07-22 11:24:09 -07:00
Maksim Panchenko 661577b5f4 [BOLT] Add support for the latest perf tool
The latest perf tool can return non-empty buffer when executing
buildid-list command, even when perf.data was recorded with -B flag.
Some binaries will be listed without the ID, while others may have a
recorded ID. Allow invalid entries on the input, while checking the
valid ones for the match.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D130223
2022-07-22 07:56:15 -07:00
Sriraman Tallam 116ee23f4c [bolt] std::atomic_uint64_t to std::atomic<uint64_t>
Differential Revision: https://reviews.llvm.org/D129903
2022-07-19 16:09:11 -07:00
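Illustration for 116ee23f4c (not part of the commit): a minimal sketch of a counter declared with the primary template spelling `std::atomic<uint64_t>`. The `HitCounter` type is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter using std::atomic<uint64_t> directly instead of
// the std::atomic_uint64_t typedef.
struct HitCounter {
  std::atomic<uint64_t> Hits{0};

  // Adds N and returns the value after the increment.
  uint64_t record(uint64_t N = 1) {
    assert(N > 0 && "recording zero hits is a no-op in this sketch");
    return Hits.fetch_add(N, std::memory_order_relaxed) + N;
  }
};
```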
Fabian Parzefall 8477bc6761 [BOLT] Add function layout class
This patch adds a dedicated class to keep track of each function's
layout. It also lays the groundwork for splitting functions into
multiple fragments (as opposed to a strict hot/cold split).

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D129518
2022-07-16 17:23:24 -07:00
Huan Nguyen ae563c9146 [BOLT] Support split landing pad
We previously supported split jump tables, where some jump table entries
target different fragments of same function. In this fix, we provide
support for another type of intra-indirect transfer: landing pad.

When C++ exception handling is used, compiler emits .gcc_except_table
that describes the location of catch block (landing pad) for specific
range that potentially invokes a throw(). Normally landing pads reside
in the function, but with -fsplit-machine-functions, landing pads can
be moved to another fragment. The intuition is, landing pads are rarely
executed, so compiler can move them to .cold section.

This update will mark all fragments that have landing pad to another
fragment as non-simple, and later propagate non-simple to all related
fragments.

This update also includes one manual test case: split-landing-pad.s

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D128561
2022-07-14 18:10:22 -07:00
Fabian Parzefall d55dfeaf32 [BOLT] Replace uses of layout with basic block list
As we are moving towards support for multiple fragments, loops that
iterate over all basic blocks of a function, but do not depend on the
order of basic blocks in the final layout, should iterate over binary
functions directly, rather than the layout.

Eventually, all loops using the layout list should either iterate over
the function, or be aware of multiple layouts. This patch replaces
references to binary function's block layout with the binary function
itself where only little code changes are necessary.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D129585
2022-07-14 13:07:05 -07:00
Huan Nguyen 05523dc32d [BOLT] Support multiple parents for split jump table
There are two assumptions regarding jump table:
(a) It is accessed by only one fragment, say, Parent
(b) All entries target instructions in Parent

For (a), BOLT stores jump table entries as relative offset to Parent.
For (b), BOLT treats jump table entries that target somewhere outside Parent
as INVALID_OFFSET, including fragments of the same split function.

In this update, we extend (a) and (b) to include fragments of the same split
function. For (a), we store jump table entries in absolute offset
instead. In addition, jump table will store all fragments that access
it. A fragment uses this information to only create label for jump table
entries that target to that fragment.

For (b), using absolute offset allows jump table entries to target
fragments of same split function, i.e., extend support for split jump
table. This can be done using relocation (fragment start/size) and
fragment detection heuristics (e.g., using symbol name pattern for
non-stripped binaries).

For jump table targets that can only be reached by one fragment, we
mark them as local label; otherwise, they would be the secondary
function entry to the target fragment.

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D128474
2022-07-13 23:37:31 -07:00
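Illustration for 05523dc32d (not part of the commit): a minimal sketch of a jump table that stores absolute addresses and remembers every fragment that accesses it, so that each fragment creates labels only for the entries that land inside it. All types and names here (`Fragment`, `JumpTableSketch`, `labelsFor`) are hypothetical and do not reflect BOLT's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical address range of one function fragment.
struct Fragment {
  uint64_t Start;
  uint64_t Size;
  bool contains(uint64_t Addr) const {
    return Addr >= Start && Addr - Start < Size;
  }
};

// Hypothetical jump table: entries are absolute addresses, and the table
// records all fragments that access it.
struct JumpTableSketch {
  std::vector<uint64_t> Entries;
  std::vector<const Fragment *> Parents;
};

// Returns the entries for which fragment F would create labels: only the
// targets that fall inside F itself.
std::vector<uint64_t> labelsFor(const JumpTableSketch &JT, const Fragment &F) {
  assert(F.Size > 0 && "empty fragments cannot contain targets");
  std::vector<uint64_t> Labels;
  for (uint64_t Addr : JT.Entries)
    if (F.contains(Addr))
      Labels.push_back(Addr);
  return Labels;
}
```

Because the entries are absolute, the same table can be shared by several fragments without re-basing it for each one.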
Vladislav Khmelevsky 35efe1d806 [BOLT][AArch64] Handle gold linker veneers
The gold linker veneers are written between functions without symbols,
so we need to handle them specially in BOLT.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D129260
2022-07-13 14:47:22 +03:00
Denis Revunov 7564167885 [BOLT][AArch64] Use all supported CPU features on AArch64
Since we now have +all feature for AArch64 disassembler, we can use it
in BOLT and allow it to disassemble all ARM instructions supported by LLVM.

Reviewed by: rafauler

Differential Revision: https://reviews.llvm.org/D129139
2022-07-12 03:56:04 -04:00
Rafael Auler a3cfdd746e [BOLT] Increase coverage of shrink wrapping [5/5]
Add -experimental-shrink-wrapping flag to control when we
want to move callee-saved registers even when addresses of the stack
frame are captured and used in pointer arithmetic, making it more
challenging to do alias analysis to prove that we do not access
optimized stack positions. This alias analysis is not yet implemented,
hence, it is experimental. In practice, though, no compiler would emit
code to do pointer arithmetic to access a saved callee-saved register
unless there is a memory bug or we are failing to identify a
callee-saved reg, so I'm not sure how useful it would be to formally
prove that.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126115
2022-07-11 17:30:13 -07:00
Rafael Auler 3e5f67f356 [BOLT] Increase coverage of shrink wrapping [4/5]
Change shrink-wrapping to try a priority list of save
positions, instead of trying the best one and giving up if it doesn't
work. This also increases coverage.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126114
2022-07-11 17:30:05 -07:00
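Illustration for 3e5f67f356 (not part of the commit): a minimal sketch of trying a priority list of candidates instead of giving up after the best one fails. The `SaveCandidate` type, the score, and the `IsLegal` callback are hypothetical; the real pass's legality checks are not shown in the message.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical candidate save position: a block index plus a score
// (higher is better).
struct SaveCandidate {
  int BlockIndex;
  double Score;
};

// Tries candidates from best to worst and returns the first block index
// that IsLegal accepts, or std::nullopt if none is acceptable.
std::optional<int>
pickSavePosition(std::vector<SaveCandidate> Candidates,
                 const std::function<bool(int)> &IsLegal) {
  assert(IsLegal && "a legality callback is required");
  std::stable_sort(Candidates.begin(), Candidates.end(),
                   [](const SaveCandidate &A, const SaveCandidate &B) {
                     return A.Score > B.Score;
                   });
  for (const SaveCandidate &C : Candidates)
    if (IsLegal(C.BlockIndex))
      return C.BlockIndex;
  return std::nullopt;
}
```

Falling back to the next-best candidate is what increases coverage: a function is skipped only when every candidate is rejected.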
Rafael Auler 3332904ad6 [BOLT] Increase coverage of shrink wrapping [3/5]
Add the option to run -equalize-bb-counts before shrink
wrapping to avoid unnecessarily optimizing some CFGs where profile is
inaccurate but we can prove two blocks have the same frequency.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126113
2022-07-11 17:30:00 -07:00
Rafael Auler 3508ced6ea [BOLT] Increase coverage of shrink wrapping [2/5]
Refactor isStackAccess() to reflect updates by D126116. Now we only
handle simple stack accesses and delegate the rest of the cases to
getMemDataSize.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126112
2022-07-11 17:29:54 -07:00
Rafael Auler 42465efd17 [BOLT] Increase coverage of shrink wrapping [1/5]
Change how function score is calculated and provide more
detailed statistics when reporting back frame optimizer and shrink
wrapping results. In these new statistics, we provide dynamic coverage
numbers. The main metric for shrink wrapping is the number of executed
stores that were saved because of shrink wrapping (push instructions
that were either entirely moved away from the hot block or converted
to a stack adjustment instruction). There is still a number of reduced
load instructions (pop) that we are not counting at the moment. Also
update alloc combiner to report dynamic numbers, as well as frame
optimizer.

For debugging purposes, we also include a list of top 10 functions
optimized by shrink wrapping. These changes are aimed at better
understanding the impact of shrink wrapping in a given binary.

We also remove an assertion in dataflow analysis so that it does not choke on
empty functions (which makes no sense).

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126111
2022-07-11 17:29:22 -07:00
spupyrev 228970f612 Revert "Rebase: [Facebook] Revert "[BOLT] Update dynamic relocations from section relocations""
This reverts commit 76029cc53e.
2022-07-11 09:50:47 -07:00
spupyrev eecd41aa09 Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636a.
2022-07-11 09:50:47 -07:00
spupyrev 7228371054 [BOLT] Do not merge cold and hot chains of basic blocks
There is a post-processing in ext-tsp block reordering that merges some blocks
into chains. This allows maintaining the original block order in the absence of
profile data and can be beneficial for code size (when fallthroughs are merged).
In the earlier version we could merge hot and cold (with zero execution count)
chains, that later were split by SplitFunction.cpp (when split-all-cold=1). The
diff eliminates the redundant merging.

It is unlikely the change will affect the performance of a binary in a
measurable way, as it mostly operates on cold basic blocks. However, after
the diff the impact of split-all-cold is almost negligible and we can avoid the
extra function splitting.

Measuring on the clang binary (negative is good, positive is a regression):
**clang12**
benchmark1: `0.0253`
benchmark2: `-0.1843`
benchmark3: `0.3234`
benchmark4: `0.0333`

**clang10**
benchmark1: `-0.2517`
benchmark2: `-0.3703`
benchmark3: `-0.1186`
benchmark4: `-0.3822`

**clang7**
benchmark1: `0.2526`
benchmark2: `0.0500`
benchmark3: `0.3024`
benchmark4: `-0.0489`

**Overall**: `-0.0671 ± 0.1172` (insignificant)

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D129397
2022-07-11 09:31:52 -07:00
Maksim Panchenko 76029cc53e Rebase: [Facebook] Revert "[BOLT] Update dynamic relocations from section relocations"
Summary:
This reverts commit 729d29e167.

Needed as a workaround for T112872562.

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D35230076
https://phabricator.intern.facebook.com/D35681740

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: spupyrev

Differential Revision: https://phabricator.intern.facebook.com/D37098481
2022-07-11 09:31:52 -07:00
Rafael Auler 6d0528636a Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-07-11 09:31:52 -07:00
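Illustration for 6d0528636a (not part of the commit): a minimal sketch of the padding decision the message describes, where a minimum-size nop is emitted only if the next instruction would otherwise end exactly on the alignment boundary. The function name and the 1-byte default nop size are assumptions for illustration, not the MC layer's actual interface.

```cpp
#include <cassert>
#include <cstdint>

// Returns the number of padding bytes (0 or NopSize) to emit at Offset so
// that an instruction of InstSize bytes placed right after the padding does
// not end exactly on a multiple of Alignment.
uint64_t neverAlignPadding(uint64_t Offset, uint64_t InstSize,
                           uint64_t Alignment, uint64_t NopSize = 1) {
  assert(Alignment > 1 && NopSize >= 1 && NopSize < Alignment);
  const uint64_t End = Offset + InstSize;
  // If the instruction would end on the boundary, shift it by one nop;
  // since 0 < NopSize < Alignment, the new end cannot be on the boundary.
  return End % Alignment == 0 ? NopSize : 0;
}
```

With a 64-byte boundary, a 4-byte instruction at offset 60 would end at 64, so one nop is inserted; at offset 59 no padding is needed. At most one nop is ever added per pair, which is the overhead argument made against bundling.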
Alexander Yermolovich e159abdb04 [BOLT][DWARF] Support mix mode DWARF
Added support for mixing monolithic DWARF5 with legacy DWARF, and monolithic legacy DWARF with DWARF5 split DWARF.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D128232
2022-06-30 16:53:15 -07:00
Amir Ayupov 66b01a8934 [BOLT] Fix getDynoStats to handle BCs with no functions
Address fuzzer crash

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D120696
2022-06-30 01:18:45 -07:00
Amir Ayupov cb75faf40c [X86][BOLT] Use getOperandType to determine memory access size
Generate INSTRINFO_OPERAND_TYPE table in X86GenInstrInfo.inc.

This diff adds support for instructions that were previously reported as having
memory access size 0. It replaces the heuristic of looking at instruction
register width to determine memory access width by instead checking the memory
operand type using tablegen-provided tables.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D126116
2022-06-30 00:25:32 -07:00
Amir Ayupov 798e92c6c4 [BOLT] Respect shouldPrint in dump-dot-all
Don't dump dot CFG graph for functions that should not be printed.

Reviewed By: rafauler, maksfb

Differential Revision: https://reviews.llvm.org/D128699
2022-06-29 17:01:17 -07:00
Maksim Panchenko ed74304506 [BOLT] Fix EH trampoline backout code
When SplitFunctions pass adds a trampoline code for exception landing
pads (limited to shared objects), it may increase the size of the hot
fragment making it larger than the whole function pre-split. When this
happens, the pass reverts the splitting action by restoring the original
block order and marking all blocks hot.

However, if createEHTrampolines() added new blocks to the CFG and
modified invoke instructions, simply restoring the original block layout
will not suffice as the new CFG has more blocks.

For proper backout of the split, modify the original layout by merging
in trampoline blocks immediately before their matching targets. As a
result, the number of blocks increases, but the number of instructions
and the function size remains the same as pre-split.

Add an assertion for the number of blocks when updating a function
layout.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128696
2022-06-29 14:35:57 -07:00
Fabian Parzefall e341e9f094 [BOLT] Add option to randomize function split point
For test purposes, we want to split functions at a random split point
to be able to test different layouts without relying on the profile.
This patch introduces an option, that randomly chooses a split point
to partition blocks of a function into hot and cold regions.

Reviewed By: Amir, yota9

Differential Revision: https://reviews.llvm.org/D128773
2022-06-29 13:02:05 -07:00
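Illustration for e341e9f094 (not part of the commit): a minimal sketch of choosing a random split point that partitions a function's blocks into a hot prefix and a cold suffix. The function name and the assumption that at least one block stays on each side are hypothetical; the real option's constraints are not spelled out in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Returns an index P in [1, NumBlocks - 1]: blocks [0, P) are treated as
// hot and blocks [P, NumBlocks) as cold. Assumes the entry block stays hot
// and at least one block becomes cold.
std::size_t pickSplitPoint(std::size_t NumBlocks, std::mt19937 &Rng) {
  assert(NumBlocks >= 2 && "need at least two blocks to split");
  std::uniform_int_distribution<std::size_t> Dist(1, NumBlocks - 1);
  return Dist(Rng);
}
```

Seeding the generator explicitly keeps a test run reproducible while still exercising layouts that do not depend on the profile.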
Rafael Auler fc2d96c334 Revert "[BOLT][AArch64] Handle gold linker veneers"
This reverts commit 425dda76e9.

This commit is currently causing BOLT to crash in one of our
binaries and needs a bit more checking to make sure it is safe
to land.
2022-06-28 19:23:28 -07:00
Vladislav Khmelevsky 425dda76e9 [BOLT][AArch64] Handle gold linker veneers
The gold linker veneers are written between functions without symbols,
so we need to handle them specially in BOLT.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D128082
2022-06-28 16:14:05 +03:00
Amir Ayupov d58b5a0614 [BOLT] Restrict icp-inline to callsites
ICP peel for inline mode only makes sense for calls, not jump tables.
Plus, add a check that the Target BinaryFunction is found.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128404
2022-06-27 11:08:55 -07:00
Amir Ayupov 0d477f63b0 [BOLT][NFC] Add aliases for ICP flags
- `indirect-call-promotion` -> `icp`
- `indirect-call-promotion-mispredict-threshold` -> `icp-mp-threshold`
- `indirect-call-promotion-use-mispredicts` -> `icp-use-mp`
- `indirect-call-promotion-topn` -> `icp-topn`
- `indirect-call-promotion-calls-topn` -> `icp-calls-topn`
- `indirect-call-promotion-jump-tables-topn` -> `icp-jt-topn`
- `icp-jump-table-targets` -> `icp-jt-targets`

This also fixes an inconsistency in ICP flag names that some start with
`indirect-call-promotion` while others start with `icp`.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128375
2022-06-27 10:29:26 -07:00
Amir Ayupov c4302e4fc2 [BOLT][NFC] Use llvm::less_first
Follow the case of https://reviews.llvm.org/D126068 and simplify call sites
with `llvm::less_first`.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128242
2022-06-27 10:27:17 -07:00
Fabian Parzefall 96f6ec5090 [BOLT] Mark option values of --split-functions deprecated
The SplitFunctions pass does not distinguish between various splitting
modes anymore. This change updates the command line interface to
reflect this behavior by deprecating values passed to the
--split-functions option.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128558
2022-06-24 17:01:13 -07:00
Alexander Yermolovich 11a8dd65ec [BOLT][DWARF] Add support for DW_AT_call_pc/DW_AT_call_return_pc
DWARF 5 added two new attributes DW_AT_call_pc and DW_AT_call_return_pc.
Adding support for them.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D128526
2022-06-24 12:37:58 -07:00
Amir Ayupov d2c8769936 [BOLT][NFC] Use range-based STL wrappers
Replace `std::` algorithms taking begin/end iterators with `llvm::` counterparts
accepting ranges.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D128154
2022-06-23 22:16:27 -07:00
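Illustration for d2c8769936 (not part of the commit): a minimal sketch of what a range-based wrapper around an iterator-pair algorithm looks like. The `sketch` namespace and its two functions are hypothetical stand-ins written against the standard library only; they are not the LLVM implementations.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {

// Range-based wrapper: forwards to std::sort with begin/end of the range.
template <typename Range> void sort(Range &&R) {
  std::sort(std::begin(R), std::end(R));
}

// Range-based wrapper: forwards to std::any_of with begin/end of the range.
template <typename Range, typename Pred> bool any_of(Range &&R, Pred P) {
  return std::any_of(std::begin(R), std::end(R), P);
}

} // namespace sketch
```

A call site then shrinks from `std::sort(V.begin(), V.end())` to `sketch::sort(V)`, which removes the chance of passing mismatched iterators.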
Maksim Panchenko f263a66ba0 [BOLT] Split functions with exceptions in shared objects and PIEs
Add functionality to allow splitting code with C++ exceptions in shared
libraries and PIEs. To overcome a limitation in exception ranges format,
for functions with fragments spanning multiple sections, add trampoline
landing pads in the same section as the corresponding throwing range.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D127936
2022-06-19 16:48:48 -07:00
Amir Ayupov 445bc88501 [BOLT] Use 32-bit MOV to zero 64-bit register in instrumentation code
Instead of `movabsq $0x0, %rax` emit shorter equivalent `movl $0x0, %eax`.
Intel SDM, 3.4.1.1 General-Purpose Registers in 64-Bit Mode:
> 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in
> the destination general-purpose register.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D127045
2022-06-19 11:34:32 -07:00
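Illustration for 445bc88501 (not part of the commit): a minimal sketch that builds the byte encodings of the two instructions to show why the 32-bit form is shorter. The helper names are hypothetical; the encodings used are the standard x86-64 ones (REX.W + B8 with a 64-bit immediate versus B8 with a 32-bit immediate).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends Value as NumBytes little-endian bytes.
static void appendLE(std::vector<uint8_t> &Out, uint64_t Value,
                     unsigned NumBytes) {
  assert(NumBytes <= 8);
  for (unsigned I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<uint8_t>(Value >> (8 * I)));
}

// movabsq $Imm, %rax: REX.W prefix (0x48), opcode 0xB8, 8-byte immediate.
std::vector<uint8_t> encodeMovabsqRax(uint64_t Imm) {
  std::vector<uint8_t> Bytes{0x48, 0xB8};
  appendLE(Bytes, Imm, 8);
  return Bytes;
}

// movl $Imm, %eax: opcode 0xB8, 4-byte immediate. Writing EAX zeroes the
// upper 32 bits of RAX, so for Imm == 0 the effect on RAX is the same.
std::vector<uint8_t> encodeMovlEax(uint32_t Imm) {
  std::vector<uint8_t> Bytes{0xB8};
  appendLE(Bytes, Imm, 4);
  return Bytes;
}
```

The 64-bit form takes 10 bytes and the 32-bit form 5, so each zeroing in instrumentation code saves 5 bytes.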
Huan Nguyen 543f13c99b [BOLT] Allow function entry to be a cold fragment
Allow cold fragment to get new address.

Our previous assumption is that a fragment (.cold) is only reached
through the main fragment of same function. In addition, .cold fragment
must be reached through either (a) direct transfer, or (b) split jump
table. For (a), we perform a simple fix-up. For (b), we currently mark
all relevant fragments as non-simple. Therefore, there is no need to
get new address for .cold fragment.

This is not always the case, as a function entry can be rarely executed
and placed in the .text.cold segment. Essentially, we cannot tell which
fragment is the source-level function entry based on hot and cold segments,
so we must treat each fragment as a function on its own. Therefore, we
remove the assertion that a function entry cannot be a cold fragment.

Test Plan:
```
ninja check-bolt
```

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D128111
2022-06-18 11:39:51 -07:00
Huan Nguyen 28b1dcb122 [BOLT] Allow function fragments to point to one jump table
Resolve a crash related to split functions

Due to split function optimization, a function can be divided into two
fragments, and both fragments can access the same jump table. This
violates the assumption that a jump table can only have one parent
function, which causes a crash during instrumentation.

We want to support the case: different functions cannot access same
jump tables, but different fragments of same function can!

As all fragments are from same function, we point JT::Parent to one
specific fragment. Right now it is the first disassembled fragment, but
we can point it to the function's main fragment later.

Functions are disassembled sequentially. Previously, at the end of
processing a function, JT::OffsetEntries is cleared, so other fragment
can no longer reuse JT::OffsetEntries. To extend the support for split
function, we only clear JT::OffsetEntries after all functions are
disassembled.

Let's say A.hot and A.cold access a JT with three targets {X, Y, Z}, where
X and Y are in A.hot, and Z is in A.cold. Suppose that A.hot is
disassembled first, JT::OffsetEntries = {X',Y',INVALID_OFFSET}. When
A.cold is disassembled, it cannot reuse JT::OffsetEntries above due to
different fragment start. A simple solution:
A.hot  = {X',Y',INVALID_OFFSET}
A.cold = {INVALID_OFFSET, INVALID_OFFSET, INVALID_OFFSET}

We update the assertion to allow different fragments of same function
to get the same JumpTable object.

Potential improvements:
A.hot  = {X',Y',INVALID_OFFSET}
A.cold = {INVALID_OFFSET, INVALID_OFFSET, Z'}
The main issue is A.hot and A.cold have separate CFGs, thus jump table
targets are still constrained within fragment bounds.

Future improvements:
A.hot  = {X, Y, Z}
A.cold = {X, Y, Z}

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D127924
2022-06-17 16:22:30 -07:00
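Illustration for 28b1dcb122 (not part of the commit): a minimal sketch of computing fragment-relative offset entries, assuming the primed values (X', Y', Z') in the message denote offsets from the fragment start. Targets outside the fragment become a sentinel. The names `InvalidOffset` and `offsetEntries` and the sentinel's value are hypothetical. Applied to each fragment in turn, this produces the layout listed under "Potential improvements", not the simpler one the commit implements for A.cold.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sentinel for "target is not inside this fragment".
constexpr uint64_t InvalidOffset = std::numeric_limits<uint64_t>::max();

// Converts absolute jump table targets into offsets relative to the
// fragment [FragStart, FragStart + FragSize); outside targets are invalid.
std::vector<uint64_t> offsetEntries(const std::vector<uint64_t> &Targets,
                                    uint64_t FragStart, uint64_t FragSize) {
  assert(FragSize > 0 && "empty fragments cannot contain targets");
  std::vector<uint64_t> Offsets;
  Offsets.reserve(Targets.size());
  for (uint64_t T : Targets) {
    const bool Inside = T >= FragStart && T - FragStart < FragSize;
    Offsets.push_back(Inside ? T - FragStart : InvalidOffset);
  }
  return Offsets;
}
```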
Rafael Auler 9d5e6ccd9b [BOLT] Fix for missing entry offset
Temporary fix for missing entry offset when creating address
translation tables (BAT) after D127935 landed. Will later work on
assigning a more reasonable offset different than zero.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D128092
2022-06-17 13:14:42 -07:00
Maksim Panchenko 8228c70358 [BOLT][NFCI] Refactor interface for adding basic blocks
Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D127935
2022-06-16 11:51:57 -07:00
Amir Ayupov 02d510b416 [BOLT][NFC] Pass Function to BC.printInstructions in BinaryBasicBlock::dump
BC::printInstruction(s) has many uses of Function ptr if it's available:
# printing CFI instructions (unconditional)
# printing debug line information (-print-debug-info)
# printing instruction relocations (-print-relocations)

Enable these uses by passing Function ptr from the primary printing entry point:
BinaryBasicBlock::dump.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D126916
2022-06-13 14:26:51 -07:00
Amir Ayupov a2c4d6d332 [BOLT][NFC] Forward declare ReorderBlocks for MSVC19
Fix bolt-x86_64-wine-msvc builder:
https://lab.llvm.org/buildbot/#/builders/222/builds/1154

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D127612
2022-06-13 10:26:58 -07:00
Vladislav Khmelevsky 6e26ffa064 [BOLT][AARCH64] Skip R_AARCH64_LD_PREL_LO19 relocation
Suppress the "failed to analyze relocations" warning for the R_AARCH64_LD_PREL_LO19
relocation. This relocation is mostly used to get a value stored in CI, and
we don't process it since we calculate the target address using the
instruction value in evaluateMemOperandTarget().

Differential Revision: https://reviews.llvm.org/D127413
2022-06-13 15:40:06 +03:00
Amir Ayupov 7dee646b28 [BOLT][NFC] Move printDebugInfo out of BC::printInstruction
Simplify `BinaryContext::printInstruction`.

Reviewed By: ayermolo

Differential Revision: https://reviews.llvm.org/D127561
2022-06-11 11:58:36 -07:00
Fangrui Song adf4142f76 [MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
2022-06-10 22:50:55 -07:00
Huan Nguyen 82095bd5ed [BOLT] Mark fragments related to split jump table as non-simple
Mark fragments related to split jump table as non-simple.

A function can be split into hot and cold fragments. A split jump table makes it
challenging to reconstruct control flow graphs correctly, so such fragments were marked
as ignored. This update marks those fragments as non-simple, allowing them
to be printed and their control flow graphs to be partially constructed.

Test Plan:
```
llvm-lit -a tools/bolt/test/X86/split-func-icf.s
```
This test has two functions (main, main2), each has a jump table target to the
same cold portion main2.cold.1(*2). We try to print out only this cold portion.
If it is ignored, it cannot be printed. If it is non-simple, it can be printed. We
verify that it can be printed.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D127464
2022-06-10 15:49:32 -07:00
Denis Revunov 0b7e8baf83 [BOLT][AArch64] Handle data at the beginning of a function when disassembling and building CFG.
This patch adds getFirstInstructionOffset method for BinaryFunction
which is used to properly handle cases where data is at zero offset in
a function. The main change is that we add basic block at first
instruction offset when disassembling, which prevents assertion
failures in buildCFG.

Reviewed By: yota9, rafauler

Differential Revision: https://reviews.llvm.org/D127111
2022-06-09 15:26:32 -07:00
Maksim Panchenko 1817642684 [BOLT] Add support for GOTPCRELX relocations
The linker can convert instructions with GOTPCRELX relocations into a
form that uses an absolute addressing with an immediate. BOLT needs to
recognize such conversions and symbolize the immediates.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126747
2022-06-09 13:37:04 -07:00
Alexander Yermolovich 1c6dc43de9 [BOLT][DWARF] Eagerly write out loclists
Take advantage of being able to rewrite .debug_info to reduce the memory
footprint of loclists: write out each loclist as it is added, similar to how
we handle ranges.

Collected on clang-14
trunk
4:41.20 real,   389.50 user,    59.50 sys,      0 amem, 38412532 mmem
4:30.08 real,   376.10 user,    63.75 sys,      0 amem, 38477844 mmem
4:25.58 real,   373.76 user,    54.71 sys,      0 amem, 38439660 mmem
diff
4:34.66 real,   392.83 user,    57.73 sys,      0 amem, 38382560 mmem
4:35.96 real,   377.70 user,    58.62 sys,      0 amem, 38255840 mmem
4:27.61 real,   390.18 user,    57.02 sys,      0 amem, 38223224 mmem

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D126999
2022-06-08 16:52:59 -07:00
Vladislav Khmelevsky fd9604952d [BOLT] Set valid index for functions with profiles
Some of the passes that calculate a tentative layout, such as LongJmp and
Golang, expect that only functions with a valid index will be located in
the hot text section. Currently, functions that have a valid profile but
no index set break this logic. To fix this, move the hasValidProfile()
condition from the AssignSections pass to ReorderFunctions.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D127223
2022-06-08 14:13:12 +03:00
Fangrui Song 15d82c62dc [MC] De-capitalize MCStreamer functions
Follow-up to c031378ce0 .
The class is mostly consistent now.
2022-06-07 00:31:02 -07:00
Fangrui Song b92436efcb [bolt] Remove unneeded cl::ZeroOrMore for cl::opt options 2022-06-05 13:29:49 -07:00
Fangrui Song 36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Fangrui Song 72f9c69421 [Hexagon][bolt] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Similar to 557efc9a8b
2022-06-03 22:04:57 -07:00
Huan Nguyen 5ac26156fe [BOLT][NFC] Warning for deprecated option '-reorder-blocks=cache+'
Emit warning when using deprecated option '-reorder-blocks=cache+'.
Auto switch to option '-reorder-blocks=ext-tsp'.
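
A minimal sketch of the warn-and-switch behaviour described above, not the actual BOLT implementation. The table and function names are hypothetical; only the two option values come from this commit message.

```python
import warnings

# Hypothetical table of deprecated -reorder-blocks values and their replacements.
DEPRECATED_REORDER_BLOCKS = {"cache+": "ext-tsp"}


def normalize_reorder_blocks(value: str) -> str:
    """Return the effective -reorder-blocks value, warning on deprecated ones."""
    replacement = DEPRECATED_REORDER_BLOCKS.get(value)
    if replacement is None:
        # Not deprecated: use the value as given.
        return value
    warnings.warn(
        f"'-reorder-blocks={value}' is deprecated, "
        f"switching to '-reorder-blocks={replacement}'",
        DeprecationWarning,
    )
    return replacement
```

Keeping the mapping in a table means further deprecated spellings could be added without touching the control flow.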

Test Plan:
```
ninja check-bolt
```
Added a new test cache+-deprecated.test.
Ran and verified that the upstream tests pass.

Reviewed By: rafauler, Amir, maksfb

Differential Revision: https://reviews.llvm.org/D126722
2022-06-03 14:16:55 -07:00
spupyrev 5904836b8a [BOLT] Cache-Aware Tail Duplication
A new "cache-aware" strategy for tail duplication.

Differential Revision: https://reviews.llvm.org/D123050
2022-06-03 09:08:45 -07:00
Amir Ayupov e2142ff47c [BOLT][NFC] Make ICP::verifyProfile static
Follow LLVM style guide suggestion to avoid function definitions in anonymous
namespaces: https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124896
2022-06-02 19:09:29 -07:00
Maksim Panchenko 986e5dedf2 [BOLT][NFC] Fix braces in BinaryEmitter
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126844
2022-06-02 12:45:25 -07:00
Amir Ayupov 6333e5dde9 [BOLT][NFC] Use colors in CFG dumps
Use color coding to distinguish nodes:
- Entry nodes have bold border
- Scalar (non-loopy) code is milk white
- Outer loops are light yellow
- Innermost loops are light blue

`-print-loops` needs to be enabled to provide BinaryLoopInfo.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126248
2022-06-02 00:27:12 -07:00
Amir Ayupov cc23c64ff1 [BOLT][NFC] Print block instructions in dumpGraph as part of node label
Reuse the option `-dot-tooltip-code` to put block instructions into the label.
This way, the instructions are displayed by default when used with dot viewer.

When the .dot file is used with dot2html, instructions are hidden by default,
and are shown by clicking on a node.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126237
2022-06-01 23:41:41 -07:00
Alexander Yermolovich ab9a175990 [BOLT][DWARF] Fix TU Index handling for DWARF4/5
When we generate split dwarf with -fdebug-types-section we will have
.debug_types.dwo sections. These go into TU Index when we run llvm-dwp. BOLT was
not handling DWP input correctly with this section.

Added support for handling DWP with TU Index as an input and output for DWARF4.
Added support for handling DWP with TU Index as an input for DWARF5

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D126087
2022-06-01 18:16:12 -07:00
Maksim Panchenko 0426100ff4 [BOLT][NFC] Remove unused variable
Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126808
2022-06-01 13:43:10 -07:00
Maksim Panchenko e290133c76 [BOLT] Add new class for symbolizing X86 instructions
Summary:
While disassembling instructions, we need to replace certain immediate
operands with symbols. This symbolizing process relies on reading
relocations against instructions. However, some X86 instructions can
have multiple immediate operands and up to two relocations against
them. Thus, correctly matching a relocation to an operand is not
always possible without knowing the operand offset within the
instruction.

Luckily, LLVM provides an interface for passing the required info from
the disassembler via a virtual MCSymbolizer class. Creating a
target-specific version allows a precise matching of relocations to
operands.

This diff adds X86MCSymbolizer class that performs X86-specific
symbolizing (currently limited to non-branch instructions).

Reviewers: yota9, Amir, ayermolo, rafauler, zr33

Differential Revision: https://reviews.llvm.org/D120928
2022-05-31 17:48:19 -07:00
Denis Revunov 8579db96e8 [BOLT] [AArch64] Handle constant islands spanning multiple functions
Fix BOLT's constant island mapping when a constant island marked by $d
spans multiple functions. Currently, because BOLT only marks the
constant island in the first function where $d is located, if the next
function contains data at its start, BOLT will miss the data and try
to disassemble it. This patch adds code to explicitly go through all
symbols between $d and $x markers and mark their respective offsets as
data, which stops BOLT from trying to disassemble data. It also adds
MarkerType enum and refactors related functions.
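
A simplified sketch of the walk described above, assuming a symbol list sorted by address and plain `$d`/`$x` marker names. The function name and the sample layout are hypothetical and do not mirror BOLT's data structures.

```python
def mark_data_starts(symbols):
    """symbols: list of (address, name) pairs sorted by address.

    Returns the addresses of non-marker symbols that lie between a $d
    marker and the next $x marker, i.e. symbols that begin with data.
    """
    in_data = False
    data_starts = []
    for address, name in symbols:
        if name == "$d":
            in_data = True       # a constant island begins here
        elif name == "$x":
            in_data = False      # code resumes here
        elif in_data:
            data_starts.append(address)
    return data_starts


# Hypothetical layout: the island that starts in foo extends into bar.
symbols = [
    (0x1000, "foo"),
    (0x1000, "$x"),
    (0x1040, "$d"),
    (0x1050, "bar"),
    (0x1058, "$x"),
    (0x1080, "baz"),
]
data_starts = mark_data_starts(symbols)
```

Here only `bar` is reported, because it is the only function symbol that falls inside the island.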

Reviewed By: yota9, rafauler

Differential Revision: https://reviews.llvm.org/D126177
2022-05-31 13:51:35 -07:00
Balazs Benics a73b50ad06 Revert "[llvm][clang][bolt][NFC] Use llvm::less_first() when applicable"
This reverts commit 3988bd1398.

Did not build on this bot:
https://lab.llvm.org/buildbot#builders/215/builds/6372

/usr/include/c++/9/bits/predefined_ops.h:177:11: error: no match for call to
‘(llvm::less_first) (std::pair<long unsigned int, llvm::bolt::BinaryBasicBlock*>&, const std::pair<long unsigned int, std::nullptr_t>&)’
  177 |  { return bool(_M_comp(*__it, __val)); }
2022-05-27 11:19:18 +02:00
Balazs Benics 3988bd1398 [llvm][clang][bolt][NFC] Use llvm::less_first() when applicable
One could reuse this functor instead of rolling out your own version.
There were a couple other cases where the code was similar, but not
quite the same, such as it might have an assertion in the lambda or other
constructs. Thus, I've not touched any of those, as it might change the
behavior in some way.

As per https://discourse.llvm.org/t/submitting-simple-nfc-patches/62640/3?u=steakhal
Chris Lattner
> LLVM intentionally has a “yes, you can apply common sense judgement to
> things” policy when it comes to code review. If you are doing mechanical
> patches (e.g. adopting less_first) that apply to the entire monorepo,
> then you don’t need everyone in the monorepo to sign off on it. Having
> some +1 validation from someone is useful, but you don’t need everyone
> whose code you touch to weigh in.

Differential Revision: https://reviews.llvm.org/D126068
2022-05-27 11:15:23 +02:00
Rafael Auler c09cd64e5c [BOLT] Fix AND evaluation bug in shrink wrapping
Fix a bug where shrink-wrapping would use wrong stack offsets
because the stack was being aligned with an AND instruction, which
makes its true offsets available only at runtime (we can't
statically determine where the stack elements are, so we must give up
on this case).
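
A small arithmetic illustration of why the offsets cannot be known statically, assuming a 16-byte alignment and made-up stack pointer values. It models the effect of the AND on the stack pointer only and is not BOLT code.

```python
def aligned_sp(sp: int, alignment: int = 16) -> int:
    """Model `and rsp, -alignment`: clear the low bits of the stack pointer."""
    return sp & ~(alignment - 1)


# Two hypothetical incoming stack pointer values for the same function.
incoming = (0x7FFC1008, 0x7FFC1000)

# How far the AND moved the stack pointer in each case.
adjustments = [sp - aligned_sp(sp) for sp in incoming]
```

The adjustment differs between the two calls, so any offset computed relative to the pre-alignment stack pointer depends on a value that exists only at runtime.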

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D126110
2022-05-26 14:59:28 -07:00
Amir Ayupov f7581a3969 [BOLT][NFC] Use ListSeparator in BinaryFunction print methods
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126243
2022-05-24 18:29:24 -07:00
Amir Ayupov 69f87b6c29 [BOLT][NFC] Customize endline character for printInstruction(s)
This would be used in `BF::dumpGraph` to dump left-justified text.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126232
2022-05-24 18:26:12 -07:00
Amir Ayupov 5d8247d4c7 [BOLT][NFC] Use for_each to simplify printLoopInfo
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126242
2022-05-24 18:05:43 -07:00
Amir Ayupov c907d6e0e9 [BOLT][NFC] Suppress unused variable warnings
Addresses the warnings emitted by Apple Clang 13.1.6 (Xcode 13.3.1).
Tip @tschuett issue #55404.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D125733
2022-05-17 14:30:23 -07:00
Amir Ayupov a7b69dbdd1 [BOLT][NFC] Move BinaryDominatorTree out of BinaryLoop header
Split up the BinaryLoop header and move BinaryDominatorTree into its own header,
preparing it for a standalone use.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D125664
2022-05-17 14:20:11 -07:00
Amir Ayupov bdba3d091c [BOLT][CMAKE] Fix DYLIB build
Move BOLT libraries out of `LLVM_LINK_COMPONENTS` to `target_link_libraries`.
Addresses issue #55432.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D125568
2022-05-13 13:27:21 -07:00
Amir Ayupov 253b8f0abd [BOLT][NFC] Use refs for loop variables to avoid copies
Addresses warnings when built with Apple Clang.

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D125483
2022-05-13 20:18:29 +01:00
Amir Ayupov 139744ac53 [BOLT][NFC] Suppress unused variable warnings
Address warnings in Release build without assertions.
Tip @tschuett for reporting the issue #55404.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D125475
2022-05-13 20:10:19 +01:00
Amir Ayupov d63c5a38fe [BOLT][NFC] Use BitVector::set_bits
Refactor and use `set_bits` BitVector interface.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D125374
2022-05-11 16:23:44 -07:00
Amir Ayupov 8cb7a873ab [BOLT][NFC] Add MCPlus::primeOperands iterator_range
Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D125397
2022-05-11 09:34:51 -07:00
Amir Ayupov c2d40f1dfb [BOLT] Add icp-inline option
Add an option to only peel ICP targets that can be subsequently inlined.
Yet there's no guarantee that they will be inlined.

The mode is independent from the heuristic used to choose ICP targets: by exec
count, mispredictions, or memory profile.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124900
2022-05-11 03:21:24 -07:00
Alexander Yermolovich 3abb68a626 [BOLT][DWARF] Fix assert for split dwarf.
Fixing a small bug where it would assert if CU does not modify .debug_addr section.

Differential Revision: https://reviews.llvm.org/D125181
2022-05-08 19:18:17 -07:00
Alexander Yermolovich ba1ac98c62 [BOLT][DWARF] Add version 5 split dwarf support
Added support for DWARF5 Split Dwarf.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D122988
2022-05-05 14:59:05 -07:00
Rahman Lavaee 733dc3e50b [BOLT] Report per-section hotness in bolt-heatmap.
This patch adds a new feature to bolt heatmap to print the hotness of each section in terms of the percentage of samples within that section.

Sample output generated for the clang binary:

Section Name, Begin Address, End Address, Percentage Hotness
.text, 0x1a7b9b0, 0x20a2cc0, 1.4709
.init, 0x20a2cc0, 0x20a2ce1, 0.0001
.fini, 0x20a2ce4, 0x20a2cf2, 0.0000
.text.unlikely, 0x20a2d00, 0x431990c, 0.3061
.text.hot, 0x4319910, 0x4bc6927, 97.2197
.text.startup, 0x4bc6930, 0x4c10c89, 0.0058
.plt, 0x4c10c90, 0x4c12010, 0.9974
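
A rough sketch of how such percentages could be computed, assuming half-open address ranges and a denominator equal to the total number of samples. Both are assumptions, as are the function and parameter names.

```python
def section_hotness(sections, samples):
    """Compute the percentage of samples that fall into each section.

    sections: list of (name, begin, end) with end treated as exclusive.
    samples:  dict mapping sampled address -> sample count.
    """
    total = sum(samples.values())
    result = {}
    for name, begin, end in sections:
        hits = sum(count for addr, count in samples.items() if begin <= addr < end)
        result[name] = 100.0 * hits / total if total else 0.0
    return result
```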

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124412
2022-05-05 11:37:46 -07:00
Amir Ayupov f8d2d8b587 [BOLT][NFC] Move getInliningInfo out of Inliner class
`getInliningInfo` is useful in other passes that need to check inlining
eligibility for some function. Move the declaration and InliningInfo definition
out of Inliner class. Prepare for subsequent use in ICP.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124899
2022-05-04 14:08:06 -07:00
Amir Ayupov 2ad1c7540e [BOLT][NFC] Minor cleanup in ICP getCallTargets and canPromoteCallsite
Minor refactoring. NFC.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124898
2022-05-04 14:06:53 -07:00
Amir Ayupov 68c7299f16 [BOLT][NFC] Fix MCPlusBuilder::getAliases caching behavior
Caching behavior of `getAliases` causes a failure in unit tests where two
MCPlusBuilder objects are created corresponding to AArch64 and X86:
the alias cache is created for AArch64 but then used for X86.

https://lab.llvm.org/staging/#/builders/211/builds/126

The issue only affects unit tests as we only construct one MCPlusBuilder
for ELF binary.

Resolve the issue by moving alias bitvectors to MCPlusBuilder object.
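
The effect of moving the cache into the object can be sketched as follows. `Builder` and its alias tables are hypothetical stand-ins, not the MCPlusBuilder API.

```python
class Builder:
    """Hypothetical stand-in for a per-target builder object."""

    def __init__(self, target, alias_table):
        self.target = target
        self._alias_table = alias_table
        # The cache lives on the instance, so each target gets its own copy.
        self._alias_cache = {}

    def get_aliases(self, reg):
        """Return the register together with its aliases, computed once per object."""
        if reg not in self._alias_cache:
            aliases = frozenset(self._alias_table.get(reg, ())) | {reg}
            self._alias_cache[reg] = aliases
        return self._alias_cache[reg]
```

With a cache shared across all objects, the first builder to populate it would decide the answers for every later builder, which matches the unit test failure described above.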

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D124942
2022-05-04 12:53:26 -07:00
Amir Ayupov 60957a5a08 [BOLT] Fix ICPJumpTablesTopN option use
Fix nonsensical `opts::ICPJumpTablesTopN != 0 ? opts::ICPTopN : opts::ICPTopN`.
Refactor/simplify another similar assignment.
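
The quoted expression yields `opts::ICPTopN` on both branches. A sketch of what the selection presumably should do, assuming the jump-table-specific limit takes precedence when it is non-zero (the function name is hypothetical):

```python
def jump_table_top_n(icp_top_n: int, icp_jump_tables_top_n: int) -> int:
    """Prefer the jump-table-specific limit when it is set (non-zero)."""
    return icp_jump_tables_top_n if icp_jump_tables_top_n != 0 else icp_top_n
```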

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124880
2022-05-03 19:34:10 -07:00
Amir Ayupov c3d5372093 [BOLT][NFC] Make ICP options naming uniform
Rename `opts::IndirectCallPromotion*` to `opts::ICP*`, making option naming
uniform and easier to follow.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124879
2022-05-03 19:32:45 -07:00
Amir Ayupov d0b1c98c96 [BOLT][NFC] ICP: simplify findTargetsIndex
Unnest lambda and use `llvm::is_contained`.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124877
2022-05-03 19:31:20 -07:00
Amir Ayupov ec02227bf7 [BOLT][NFC] Refactor ICP::findCallTargetSymbols
Reduce nesting making it easier to read.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124876
2022-05-03 19:29:22 -07:00
Paul Kirth 625e0e611b [BOLT] [NFC] Remove unused variable
This patch fixes a warning from -Wunused-but-set-variable:
MismatchedBranches are counted, but are never reported.
Since evaluateProfileData() should already identify and report
these cases, we can safely remove the unused variable.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124588
2022-05-03 15:15:56 +00:00
Amir Ayupov 64421e191b [BOLT][NFC] Reduce Target/{AArch64,X86} dependencies
We don't actually depend on entire X86/AArch64 components that pull in CodeGen,
SelectionDAG etc., just the Desc part with opcode and other definitions.

Note that it doesn't decouple BOLT from these components - we still pull in X86
and AArch64 from top-level llvm-bolt dependencies as we use assembler and
disassembler. It's difficult to reduce these as this requires non-trivial
changes to X86/AArch64 components themselves (e.g. moving out AsmPrinter).

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D124206
2022-04-29 20:37:53 -07:00
Paul Kirth a0b8ab1ba3 [BOLT][NFC] Fix warning for unqualified call to std::move
Fixes warning from RetpolineInsertion.cpp:171:44:
warning: unqualified call to std::move [-Wunqualified-std-cast-call]

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D124482
2022-04-26 23:18:20 +00:00
Rahman Lavaee e59e580116 [BOLT] Refactor DataAggregator::printLBRHeatMap.
This also fixes some logs that were impacted by D123067.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D124281
2022-04-25 11:39:44 -07:00
Alexander Yermolovich 014cd37f51 [BOLT][DWARF] Implement monolithic DWARF5
Added an implementation to support DWARF5 in monolithic mode.
The next step is DWARF5 split DWARF support.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D121876
2022-04-21 16:02:23 -07:00
Alexey Moksyakov 48e894a536 [BOLT] Add R_AARCH64_PREL16/32/64 relocations support
Reviewed By: yota9, rafauler

Differential Revision: https://reviews.llvm.org/D122294
2022-04-21 13:52:47 +03:00
Vladislav Khmelevsky 63686af1e1 [BOLT] Fix build with GCC 7.3.0
GCC 7.3.0 raises a "could not convert" error unless std::move is
used explicitly.

Differential Revision: https://reviews.llvm.org/D124009
2022-04-21 13:47:58 +03:00
Maksim Panchenko 76981fbcf6 [BOLT] Add fuzzy function name matching for LLVM LTO
LLVM with LTO can generate function names in the form
func.llvm.<number>, where <number> could vary based on the compilation
environment. As a result, if a profiled binary originated from a
different build than a corresponding binary used for BOLT optimization,
then profiles for such LTO functions will be ignored.

To fix the problem, use "fuzzy" matching with "func.llvm.*" form.
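
A minimal sketch of the normalization idea, assuming the suffix is a run of decimal digits at the end of the name. The helper name is hypothetical and this is not the matching code from the patch.

```python
import re

# Matches a trailing ".llvm.<number>" suffix (decimal digits assumed).
_LTO_SUFFIX = re.compile(r"\.llvm\.\d+$")


def fuzzy_lto_name(name: str) -> str:
    """Map 'func.llvm.<number>' to the common form 'func.llvm.*'."""
    return _LTO_SUFFIX.sub(".llvm.*", name)
```

Two builds that produce `foo.llvm.123` and `foo.llvm.456` then share the key `foo.llvm.*`, so a profile recorded against one can be looked up for the other.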

Reviewed By: yota9, Amir

Differential Revision: https://reviews.llvm.org/D124117
2022-04-20 17:00:21 -07:00
Alexander Yermolovich 7d6716786f [BOLT][DWARF] Handle Error returned by visitLocationList
It looks like the implementation in LLVM changed, and we now need to handle
the error being returned.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D124133
2022-04-20 16:40:46 -07:00
Amir Ayupov 4f277f28ab [BOLT] Check if LLVM_REVISION is defined
Handle the case where LLVM_REVISION is undefined (due to LLVM_APPEND_VC_REV=OFF
or otherwise) by setting "<unknown>" value as before D123549.

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D123852
2022-04-15 06:33:14 -07:00
Amir Ayupov 2a9386726b [BOLT][NFC] Use LLVM_REVISION instead of BOLT_VERSION_STRING
Remove duplicate version string identification

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D123549
2022-04-14 19:16:35 -07:00
Maksim Panchenko 77b75ca53f [BOLT][perf2bolt] Fix base address calculation for shared objects
When processing profile data for a shared object or PIE, perf2bolt needs
to calculate the base address of the binary based on the map info reported
by the perf tool. When the mapping data provided is for the second
(or any other than the first) segment and the segment's file offset
does not match its memory offset, perf2bolt makes a wrong assumption
about the binary base address.

Add a function to calculate binary base address using the reported
memory mapping and use the returned base for further address
adjustments.
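
One way such a calculation could look, as a sketch under stated assumptions: segments are given as (virtual address, file offset, alignment) triples, and the mapping is matched to a segment by its aligned file offset. All names and sample values are hypothetical.

```python
def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of alignment (alignment > 0)."""
    return value - (value % alignment)


def base_address_for_mapping(segments, mmap_address, file_offset):
    """Return the load base implied by one reported mapping, or None.

    segments: iterable of (vaddr, file_offset, alignment) for loadable segments.
    """
    for vaddr, seg_offset, alignment in segments:
        if align_down(seg_offset, alignment) == file_offset:
            # Subtract the segment's aligned memory offset, not its file offset.
            return mmap_address - align_down(vaddr, alignment)
    return None


# Second segment: file offset 0x2000 but memory offset 0x3000.
segments = [(0x0, 0x0, 0x1000), (0x3000, 0x2000, 0x1000)]
base = base_address_for_mapping(segments, 0x7F0000003000, 0x2000)
```

Subtracting the file offset instead would give 0x7F0000001000 for this mapping, which is the kind of wrong base described above.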

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D123755
2022-04-14 10:29:53 -07:00
Vladislav Khmelevsky 2f98c5febc [BOLT] Update skipRelocation for aarch64
ld might relax ADRP+ADD or ADRP+LDR sequences to ADR+NOP. Add
the new case to skipRelocation for aarch64.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D123334
2022-04-13 22:54:06 +03:00
Maksim Panchenko 36cb736665 [BOLT] Ignore PC-relative relocations from data to data
BOLT expects PC-relative relocations in data sections to reference code
and the relocated data to form a jump table. However, there are cases
where PC-relative addressing is used for data-to-data references
(e.g. clang-15 can generate such code). BOLT should recognize and ignore
such relocations. Otherwise, they will be considered relocations not
claimed by any jump table and cause a failure in the strict mode.

Reviewed By: yota9, Amir

Differential Revision: https://reviews.llvm.org/D123650
2022-04-13 11:13:51 -07:00
Amir Ayupov bad3798113 [BOLT] Fix data race in shortenInstructions
Address ThreadSanitizer warning

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D121338
2022-04-13 11:10:36 -07:00
Rahman Lavaee 0c13d97e2b Allow building heatmaps from basic sampled events with `-nl`.
I find that this is useful for finding event hotspots.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D123067
2022-04-11 15:04:44 -07:00
Amir Ayupov 9b02dc631d [BOLT] Check MCContext errors
Abort on emission errors to prevent a malformed binary being written.
Example:
```
<unknown>:0: error: Undefined temporary symbol .Ltmp26310
<unknown>:0: error: Undefined temporary symbol .Ltmp26311
<unknown>:0: error: Undefined temporary symbol .Ltmp26312
<unknown>:0: error: Undefined temporary symbol .Ltmp26313
<unknown>:0: error: Undefined temporary symbol .Ltmp26314
<unknown>:0: error: Undefined temporary symbol .Ltmp26315
BOLT-ERROR: Emission failed.
```

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D123263
2022-04-08 21:08:39 -07:00
Argyrios Kyrtzidis 330268ba34 [Support/Hash functions] Change the `final()` and `result()` of the hashing functions to return an array of bytes
Returning `std::array<uint8_t, N>` is better ergonomics for the hashing functions usage, instead of a `StringRef`:

* When returning `StringRef`, client code is "jumping through hoops" to do string manipulations instead of dealing with fixed array of bytes directly, which is more natural
* Returning `std::array<uint8_t, N>` avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a `StringRef`

As part of this patch also:

* Introduce `TruncatedBLAKE3` which is useful for using BLAKE3 as the hasher type for `HashBuilder` with non-default hash sizes.
* Make `MD5Result` inherit from `std::array<uint8_t, 16>` which improves & simplifies its API.

Differential Revision: https://reviews.llvm.org/D123100
2022-04-05 21:38:06 -07:00
Amir Ayupov f99398fe0e [BOLT][NFC] Move isADD64rr and isADDri out of MCPlusBuilder class
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D123077
2022-04-05 14:32:07 -07:00
Vladislav Khmelevsky 2e51a32219 [BOLT] Check for !isTailCall in isUnconditionalBranch
Add a !isTailCall check to isUnconditionalBranch in order to sync x86
and aarch64 and to fix the fixDoubleJumps pass on aarch64.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D122929
2022-04-05 23:39:34 +03:00
Vladislav Khmelevsky 4956e0e197 [BOLT] Fix plt relocations symbol match
The bfd linker adds the symbol versioning string to the symbol name in symtab.
Skip the versioning part in order to find the registered PLT function.
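
A sketch of the name handling, assuming the version string is everything from the first `@` onwards. The helper name and sample symbols are illustrative only.

```python
def strip_symbol_version(name: str) -> str:
    """Drop an ELF version suffix such as '@GLIBC_2.2.5' or '@@GLIBC_2.14'."""
    return name.split("@", 1)[0]
```

The stripped name can then be used as the key when looking up the registered PLT function.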

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D122039
2022-04-05 15:57:26 +03:00
Amir Ayupov 686406a006 [BOLT][NFC] Use X86 mnemonic checks
Remove switches in X86MCPlusBuilder.cpp, use mnemonic checks instead

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D122853
2022-04-04 14:05:46 -07:00
Vladislav Khmelevsky 3b1314f4de [BOLT] AArch64: Read all static relocations
Read static relocations located at the same address as dynamic ones in order
to update constant island data addresses properly.

Differential Revision: https://reviews.llvm.org/D122100
2022-04-03 19:03:35 +03:00
Vladislav Khmelevsky 4c14519ecb [BOLT] LongJmp: Check for shouldEmit
Check that the function will be emitted in the final binary. Preserving
the old function address is needed in case it is a PLT trampoline, which is
currently not moved by BOLT.

Differential Revision: https://reviews.llvm.org/D122098
2022-03-31 22:33:09 +03:00
Vladislav Khmelevsky fed958c6cc [BOLT] AArch64: Emit text objects
BOLT treats aarch64 objects located in text as empty functions with
constant islands. Emit them with at least 8-byte alignment to the new
text section.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D122097
2022-03-31 22:28:50 +03:00
Amir Ayupov c31af7cfe3 [MC][BOLT] Add setter for AllowAtInName
Use the setter in BOLT to allow printing names with variant kind in the name
(e.g. "func@PLT").
Fixes BOLT buildbot tests that broke after D122516:
https://lab.llvm.org/buildbot/#/builders/215/builds/3595

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D122694
2022-03-30 13:04:28 -07:00
Vladislav Khmelevsky af9bdcfc46 [BOLT] Align constant islands to 8 bytes
AArch64 requires constant islands (CI) to be aligned to 8 bytes due to
restrictions of the access instructions, e.g. ldr with an immediate, where the
immediate must be aligned to 8 bytes.
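
For illustration, rounding an offset up to the next 8-byte boundary can be written as below. This is generic alignment arithmetic, not code from the patch, and the function name is hypothetical.

```python
def align_to(offset: int, alignment: int = 8) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment
```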

Differential Revision: https://reviews.llvm.org/D122065
2022-03-27 22:30:42 +03:00
spupyrev 4609f60ebc [BOLT] Avoid pointless loop rotation
It seems the earlier implementation does not follow the description
in LoopRotationPass.h: It rotates loops even if they are already laid out
correctly. The diff adjusts the behaviour.

Given that the impact of LoopInversionPass is minor, this change won't
yield significant perf differences. Tested on clang-10: there seems to be a
0.1%-0.3% cpu win and a small reduction of branch misses.

**Before:**
BOLT-INFO: 120 Functions were reordered by LoopInversionPass

**After:**
BOLT-INFO: 79 Functions were reordered by LoopInversionPass

Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D121921
2022-03-22 12:42:42 -07:00
Vladislav Khmelevsky 5be5d0f56e [BOLT] LongJmp speedup refactoring
Run tentativeLayoutRelocMode twice only if the UseOldText option was passed.
Refactor the BF loop to break once the condition is met.

Differential Revision: https://reviews.llvm.org/D121825
2022-03-18 16:16:47 +03:00
Amir Ayupov 42e8e00189 [BOLT][NFC] Use X86 mnemonic tables
Remove tables from X86MCPlusBuilder, make use of llvm::X86 mnemonic tables.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D121573
2022-03-18 01:52:11 -07:00
Amir Ayupov dc1cf838a5 [BOLT] Strip redundant AdSize override prefix
Since LLVM MC now preserves redundant AdSize override prefix (0x67), remove it
in BOLT explicitly (-x86-strip-redundant-adsize, on by default).

Test Plan:
`bin/llvm-lit -a bolt/test/X86/addr32.s`

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D120975
2022-03-16 09:38:17 -07:00
Amir Ayupov 698127df51 [BOLT][NFC] Move isMOVSX64rm32 out of MCPlusBuilder
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D121669
2022-03-16 08:18:56 -07:00
Vladislav Khmelevsky 62a289d85c [BOLT] LongJmp: Fix hot text section alignment
The BinaryEmitter uses the opts::AlignText value to align the hot text
section. Also check that opts::AlignText is at least equal to
opts::AlignFunctions, for the same reason as described in D121392.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Differential Revision: https://reviews.llvm.org/D121728
2022-03-16 15:57:46 +03:00
Maksim Panchenko 57f03db195 [BOLT][NFC] Remove unused function
Reviewed By: yota9

Differential Revision: https://reviews.llvm.org/D121729
2022-03-15 12:39:14 -07:00
Vladislav Khmelevsky 8ab69baad5 [BOLT] Set cold sections alignment explicitly
The cold text section alignment is set using the maximum alignment value
passed to emitCodeAlignment. In order to calculate the tentative layout
correctly, set the minimum alignment of such sections to the maximum
possible function alignment explicitly.

Differential Revision: https://reviews.llvm.org/D121392
2022-03-15 22:12:17 +03:00
Amir Ayupov 5790441c45 [BOLT][NFC] Use getShortOpcodeArith in X86MCPlusBuilder
Unify `llvm::X86::getRelaxedOpcodeArith` and `getShortArithOpcode` in
X86MCPlusBuilder.cpp.

Addresses https://lists.llvm.org/pipermail/llvm-dev/2022-January/154526.html

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D121404
2022-03-12 09:07:28 -08:00
Elvina Yakubova db65429db5 [BOLT] Divide RegularPageSize for X86 and AArch64 cases
For AArch64 in some cases/some distributions ld uses 64K alignment of LOAD segments by default.

Reviewed By: yota9, maksfb

Differential Revision: https://reviews.llvm.org/D119267
2022-03-10 23:09:50 +03:00
Vladislav Khmelevsky 04b87cf0e7 [BOLT] LongJmp: Use per-function alignment values
The per-function alignment values must be used in order to create
tentative layout.

Differential Revision: https://reviews.llvm.org/D121298
2022-03-10 19:48:48 +03:00