Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
When using counter relocations, two instructions are emitted to compute
the address of the counter variable.
```
%BiasAdd = add i64 ptrtoint <__profc_>, <__llvm_profile_counter_bias>
%Addr = inttoptr i64 %BiasAdd to i64*
```
When promoting a counter, these instructions might not be available in
the block, so we need to copy these instructions.
This fixes https://github.com/llvm/llvm-project/issues/55125
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D125710
Add a map from functions to load instructions that compute the profile bias. Previously we assumed that if the first instruction in the function was a load instruction, then it must be computing the bias. This was likely to work out because functions usually start with the `llvm.instrprof.increment` instruction, but optimizations could change this. For example, inlining into a non-profiled function.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D114319
This patch switches the PGO implementation on AIX from using the runtime
registration-based section tracking to the __start_SECNAME/__stop_SECNAME
based. In order to enable the recognition of __start_SECNAME/__stop_SECNAME
symbols in the AIX linker, the -bdbg:namedsects:ss needs to be used.
Reviewed By: jsji, MaskRay, davidxl
Differential Revision: https://reviews.llvm.org/D124857
CoverageMappingModuleGen generates a coverage mapping record
even for unused functions with internal linkage, e.g.
static int foo() { return 100; }
Clang frontend eliminates such functions, but InstrProfiling pass
still pulls in profile runtime since there is a coverage record.
Fuchsia uses runtime counter relocation, and pulling in profile
runtime for unused functions causes a linker error:
undefined hidden symbol: __llvm_profile_counter_bias.
Since 389dc94d4b, we do not hook profile runtime for the binaries
that none of its translation units have been instrumented in Fuchsia.
This patch extends that for the instrumented binaries that
consist of only unused functions.
Differential Revision: https://reviews.llvm.org/D122336
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
The `InstrProfInstBase` class is for all `llvm.instrprof.*` intrinsics. In a
later diff we will add new instrinsic of this type. Also refactor some
logic in `InstrProfiling.cpp`.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D117261
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
In debug info correlation mode we do not emit the data globals so we
need to explicitly mark the counter globals as used so they don't get
stripped.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115981
Extend `llvm-profdata` to read in a `.proflite` file and also a debug info file to generate a normal `.profdata` profile. This reduces the binary size by 8.4% when building an instrumented Clang binary without value profiling (164 MB vs 179 MB).
This work is part of the "lightweight instrumentation" RFC: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
This was first landed in https://reviews.llvm.org/D114566 but had to be reverted due to build errors.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115915
Extend `llvm-profdata` to read in a `.proflite` file and also a debug info file to generate a normal `.profdata` profile. This reduces the binary size by 8.4% when building an instrumented Clang binary without value profiling (164 MB vs 179 MB).
This work is part of the "lightweight instrumentation" RFC: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114566
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115693
This reverts commit 800bf8ed29.
The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.
Differential Revision: https://reviews.llvm.org/D115689
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114565
Instead of using `DenseMap::find()` and `DenseMap::insert()`, use
`DenseMap::operator[]` to get a reference to the profile data and update
the reference. This simplifies the changes in D114565.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114828
The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114268
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
We generate symbols like `profc`/`profd` for each function, and put them into csects.
When there are weak functions, we generate weak symbols for the functions as well,
with ELF (and some others), linker (binder) will discard and only keep one copy of the weak symbols.
However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect,
as binder can NOT discard a subset of a csect.
This creates a unique challenge for using those symbols to calculate some relative offsets.
This patch changed the linkage of `profc`/`profd` symbols to be private, so that all the profc/profd for each weak symbol will be *local* to objects, and all kept in the csect, so we won't have problem. Although only one of the counters will be used, all the pointer in the profd is correct.
The downside is that we won't be able to discard the duplicated counters and profile data,
but those can not be discarded even if we keep the weak linkage,
due to the binder limitation of not discarding a subsect of the csect either .
Reviewed By: Whitney, MaskRay
Differential Revision: https://reviews.llvm.org/D110422
Similar to D97585.
D25456 used `S_ATTR_LIVE_SUPPORT` to ensure the data variable will be retained
or discarded as a unit with the counter variable, so llvm.compiler.used is
sufficient. It allows ld to dead strip unneeded profc and profd variables.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D105445
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).
If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.
The fix is straightforward: just keep non-private profd in such a `DataReferencedByCode` case.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D108432
This reverts commit f653beea88.
It broke Windows coverage-inline.cpp because link.exe has a limitation
that external symbols in IMAGE_COMDAT_SELECT_ASSOCIATIVE don't work.
It essentially dropped the previous size optimization for coverage
because coverage doesn't rename comdat by default.
Needs more investigation what we should do.
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).
If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.
The fix is straightforward: just keep non-private profd in this case.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D108432
The COFF specific `DataReferencedByCode` complexity (D103372 D103717) is due to
a link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE is
not really dropped, so it can cause duplicate definition error.
When none of the translation units in the binary have been instrumented
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.
This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed by declaring the __llvm_profile_runtime symbol
in the translation unit only when needed. For now we restrict this only
for Fuchsia, but this can be later expanded to other platforms. This
approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation,
but that limitation may no longer apply, and it certainly doesn't apply
on platforms like Fuchsia.
Differential Revision: https://reviews.llvm.org/D98061
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.
Differential Revision: https://reviews.llvm.org/D107377
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.
```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo
# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo
# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```
(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
Differential Revision: https://reviews.llvm.org/D104556
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.
In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics. The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D106319
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted in which case the symbol selected
by the linker is going to depend on the order of inputs which can be
fragile.
This change replaces the use of weak definition inside the runtime with
a weak alias. We place the compiler generated symbol inside a COMDAT
group so dead definition can be garbage collected by the linker.
We also disable the use of runtime counter relocation on Darwin since
Mach-O doesn't support weak external references, but Darwin already uses
a different continous mode that relies on overmapping so runtime counter
relocation isn't needed there.
Differential Revision: https://reviews.llvm.org/D105176
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted which leads to an issue where the
linker might choose either of the weak symbols potentially disabling the
runtime counter relocation.
This change replaces the use of weak definition inside the runtime with
an external weak reference to address the issue. We also place the
compiler generated symbol inside a COMDAT group so dead definition can
be garbage collected by the linker.
Differential Revision: https://reviews.llvm.org/D105176
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:
* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code
The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.
Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
This reverts commit 76d0747e08.
If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.
Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.
```
group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
[Index] Name
[ 67] __llvm_prf_cnts
[ 68] __llvm_prf_vals
[ 69] __llvm_prf_data
[ 70] .rela__llvm_prf_data
```
For ELF, since all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisified, we can make __profd_
unconditionally private.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.
(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_*` symbols.)
On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.
Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:
#BEFORE
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867
$ du -cksh base_unittests.exe
82M base_unittests.exe
82M total
# AFTER
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499
$ du -cksh base_unittests.exe
78M base_unittests.exe
78M total
```
The change is NFC for Mach-O.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103372