Implicit cast between char* and StringRef when writing sections.
Reproduce:
```
$> llvm-objcopy --dump-section=name=name.data out.wasm
$> llvm-objcopy --remove-section=name out.wasm out_no_name.wasm
$> llvm-objcopy --add-section=name=name.data out_no_name.wasm out_new_name.wasm
# With wasm-objdump -h we can see that the name section is not totally copied in the new wasm file (if it actually contain empty bytes)
```
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D139210
We need to flatten the SampleFDO profile in profile supplementation
because the InstrFDO profile does not have inlined callsite counters.
Without flattening profile, FDO optimizations are not stable:
we will not supplement the second generation profile when the modified
functions are all inlined.
This patch fixes this issue: we will flatten the profile for functions
that appears in FDO profile.
Note that we only need to find the hot/warm functions in SampleFDO
profile, so we will not perform a full flatten. We will use
a DFS traversal to compute the accumulated entry count and max bodycount.
This is much cheaper than full flattening.
Differential Revision: https://reviews.llvm.org/D138893
If we don't want to set PointerToRawData, for an empty section,
we do must set it to zero explicitly. Some object file generators
do set it to zero for empty sections, while others set a nonzero
value pointing at the end of the previous section.
If the value was nonzero on input, we need to update it - either
setting it to zero, or to a valid offset in the output file (not
out of bounds)
This fixes https://github.com/mstorsjo/llvm-mingw/issues/313.
Testing this is tricky, because we can't use yaml2obj, since that
doesn't produce object files with nonzero PointerToRawData for
empty sections. We can use llvm-mc to assemble a small file
(assuming that LLVM's MC layer keeps this behaviour), or bundle
a small binary object file. I opted for using llvm-mc for now here
(with a test that it actually does keep this property), but I don't
mind changing it to a canned object file to make the test less brittle.
Differential Revision: https://reviews.llvm.org/D138783
The znver2 override already matched the WriteBlendY class exactly, and the znver1 override wasn't accounting for ymm double-pumping.
Found with the help of D138359
The exports trie used to be pointed by the information in LC_DYLD_INFO,
but when chained fixups are present, the exports trie is pointed by
LC_DYLD_EXPORTS_TRIE instead.
Modify ObjCopy code to calculate the right offset and size needed
depending on the existence of LC_DYLD_INFO or LC_DYLD_EXPORTS_TRIE, read
the exports from either of those places, and write the export
information as pointed to either of those places.
Depends on D134571.
Reviewed By: alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D137879
znver1/znver2 has barely any difference in behaviour between the AVX1/2 variants of these instructions - it looks like it was a copy+paste mistake to miss the AVX2 integer domain instructions in the overrides.
Having said that the override numbers don't appear to match the numbers in the AMD 17h SoGs very well - for instance vperm2f128/vperm2i128 might be microcoded from the AMD sense of >3 uops, but it doesn't have a 100cy latency..... These will need to be further addressed.
This appears to be a slow down vs Skylake (which the model was copied off) - confirmed with uops.info / instlatx64
Noticed as D138359 was reporting that many of the PACKS overrides were redundant, but were in fact incorrect
Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded
It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake
D138359 was reporting that this override was superfluous, but it had never been setup - I took the numbers from uops.info (I couldn't find an estimate in Intel docs).
These were missed for some reason - only noticed this while investigating a FIXME in the SandyBridge model
Also sync the znver2/znver3 tests which had been missed when LOCK test coverage was added
D138359 was reporting that the EXTRACTPSrr override was unnecessary, however the AMD SoG and Agner both confirm that both the rr and rm versions take 2uops (matching znver1)
NOTE: For IceLakeServer we actually test TigerLake as that's the only target that supports it (we do something similar for F16C on IvyBridge in the SandyBridge tests).
Unconditionally removing landing pads results in invalid IR,
if there is a different `invoke` that uses it. Update the code
to only remove the landing pad if the current invoke is the only
user. Also carefully avoid creating plain branches to bbs with
landing pads we couldn't remove.
Reviewed By: arsenm, aeubanks
Differential Revision: https://reviews.llvm.org/D138072
Fix a assertion in dsymutil coming from the Reproducer/FileCollector.
When TMPDIR is empty, the root becomes a relative path, triggering an
assertion when adding a relative path to the VFS mapping. This patch
fixes the issue by resolving the relative path and also moves the
assertion up to make it easier to diagnose these issues in the future.
rdar://102170986
Differential revision: https://reviews.llvm.org/D137959
Fix a assertion in dsymutil coming from the Reproducer/FileCollector.
When TMPDIR is empty, the root becomes a relative path, triggering an
assertion when adding a relative path to the VFS mapping. This patch
fixes the issue by resolving the relative path and also moves the
assertion up to make it easier to diagnose these issues in the future.
rdar://102170986
Differential revision: https://reviews.llvm.org/D137959
AMD 15h SoG + Agner both indicate there's no difference between MPSADBWrri + VMPSADBWrri - I can't find any data on the folded variant so I've kept the existing numbers
Removes the last X86 override for WriteMPSAD/WritePSADBW classes - removing a further 3 entries from every sched class table
The znver1/znver2 schedules for CVTPD2PS were incorrectly double pumping the xmm-load variant instead of the ymm variants (znver1 only)
Also, the xmm-load variant was incorrectly using FP03 instead of just FP3
Confirmed by the AMD SoG 17h tables, Agner + uops.info
Another step towards removing a lot of unnecessary overrides from all the x86 scheduler models - these should hopefully be convertible into regular WriteCvtPD2I classes soon.
Fixes a lot of throughput mismatches - the more complicated conversion instructions use ICXPort5+ICXPort01, not ICXPort5+ICXPort015 (ICXPort015 is mainly used for basic Logic + blend ops)
Fixing this should allow us to remove a lot of unnecessary scheduler overrides from IceLakeModel
Confirmed by both Agner + uops.info