Previously, this unconditionally emitted text IR. I ran
into a bug that manifested in broken disassembly, so the
desired output was the bitcode format. If the input format
was binary bitcode, the requested output file ends in .bc,
or an explicit -output-bitcode option was used, emit bitcode.
In a testcase I'm working on, the old write out and count IR lines
was taking about 200-300ms per iteration. This drops it out of the
profile.
This doesn't account for everything, but it doesn't seem to matter.
We should probably try to account for metadata and constantexpr tree
depths.
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
I have a register allocator failure that only reproduces with IPRA
enabled, and requires the specific regmask if I want to only run the
one relevant pass. The printed custom regmask is enormous and I would
like to reduce it.
This reduces each individual bit in the mask, but it would probably be
better to start at register units and clear all aliasing fields at a
time. This would require stricter verification that all aliasing bits
are set in regmasks (although I would prefer to switch regmasks to use
register units in the first place).
Adds support for reading and writing LTO bitcode files.
- Emit a summary if the original bitcode file had a summary
- Use split LTO units if the original bitcode file used them.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127168
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
I'm a bit confused by what's actually stored for the allocation
hints. The MIR parser only handles the "simple" case where there's a
single hint. I don't really understand the assertion in
clearSimpleHint, or under what circumstances there are multiple hint
registers.
Many MIR reductions benefit from or require increasing the instruction
count. For example, unlike in the IR, you may need to insert a new
instruction to represent an undef. The current instruction reduction
pass works around this by sticking implicit defs on whatever
instruction happens to be first in the entry block block.
Other strategies I've applied manually include breaking instructions
with multiple defs into separate instructions, or breaking large
register defs into multiple subregister defs.
Make up a simple scoring system based on what I generally try to get
rid of first when manually reducing. Counts implicit defs as free
since reduction passes will be introducing them, although they
probably should count for something. It also might make more sense to
have a comparison the two functions, rather than having to compute a
contextless number. This isn't particularly well tested since overall
the MIR support isn't in a place where it is useful on the kinds of
testcases I want to throw at it.
There were two problems with directly copying the MMOs from the old
function. The MMOs are owned by the function's Allocator, so need to
be reallocated anyways (surprisingly I didn't notice breakage on
this). Second, the PseudoSourceValues are also allocated per function
and need to be reallocated.
The current testcase I'm trying to reduce only reproduces with IPRA
enabled and requires handling multiple functions.
The only real difference vs. the IR is the extra indirect to look for
the underlying MachineFunction, so treat the ReduceWorkItem as the
module instead of the function.
The ugliest piece of this is really the ugliness of
MachineModuleInfo. It not only tracks actual module state, but has a
number of transient fields used for isel and/or the asm printer. These
shouldn't do any harm for the use here, though they should be
separated out.
Just clone all the virtual registers instead of looking for def
operands. This preserves the register values used, simplifying the
rest of the code. This avoids needing to expose the register map to
target code.
Previously the specific values used for fixed frame indexes was in
reverse order in the cloned function from the original, and a map was
used to adjust all frame indexes to the potentially new values. Insert
the fixed objects in reverse to avoid this. This simplifies other
code, since now we don't need to track down all frame indexes. This
will allow targets that store frame indexes in MachineFunctionInfo to
simply copy the values.
Note this isn't directly observable in the test since the resulting
MIR print/parse can shuffle the IDs around (in particular the final
serialization implicitly strips out dead objects).
getSuccProbability was private for some reason, saying to go through
MachineBranchProbabilityInfo. There doesn't seem to be much point to
that, as that wrapper directly calls this.
Like other areas, some of these fields aren't handled by the MIR
printer/parser so aren't tested.
This didn't work at all before, and would assert on any frame
index. Also copy the other fields, which I believe should cover
everything. There are a few that are untested since MIR serialization
is apparently still missing them (isStatepointSpillSlot,
ObjectSSPLayout, and ObjectSExt/ObjectZExt).
When exporting textual IR during reduction the ShouldPreserveUseListOrder
parameter of the IR printer should be set to get predictable results.
Differential Revision: https://reviews.llvm.org/D118585
(Second try. Need to link against CodeGen and MC libs.)
The llvm-reduce tool has been extended to operate on MIR (import, clone and
export). Current limitation is that only a single machine function is
supported. A single reducer pass that operates on machine instructions (while
on SSA-form) has been added. Additional MIR specific reducer passes can be
added later as needed.
Differential Revision: https://reviews.llvm.org/D110527
The llvm-reduce tool has been extended to operate on MIR (import, clone and
export). Current limitation is that only a single machine function is
supported. A single reducer pass that operates on machine instructions (while
on SSA-form) has been added. Additional MIR specific reducer passes can be
added later as needed.
Differential Revision: https://reviews.llvm.org/D110527