Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.
This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.
llvm-svn: 307451
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
llvm-svn: 307292
Summary:
I noticed that passing known bits across these intrinsics isn't great at capturing the information we really know. Turning known bits of the input into known bits of a count output isn't able to convey a lot of what we really know.
This patch adds range metadata to these intrinsics based on the known bits.
Currently the patch punts if we already have range metadata present.
Reviewers: spatel, RKSimon, davide, majnemer
Reviewed By: RKSimon
Subscribers: sanjoy, hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D32582
llvm-svn: 305927
Summary:
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.
Reviewers: reames, sanjoy, efriedma
Reviewed By: reames
Subscribers: mzolotukhin, anna, llvm-commits, skatkov
Differential Revision: https://reviews.llvm.org/D33240
llvm-svn: 305558
Summary: This matches the behavior we already had for compares and makes us consistent everywhere.
Reviewers: dberlin, hfinkel, spatel
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33604
llvm-svn: 305049
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Summary:
Fairly straightforward patch to fill in some of the holes in the
attributes API with respect to accessing parameter/argument attributes.
The patch aims to step further towards encapsulating the
idx+FirstArgIndex pattern to access these attributes to within the
AttributeList.
Patch by Daniel Neilson!
Reviewers: rnk, chandlerc, pete, javed.absar, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33355
llvm-svn: 304329
Every other place in InstCombine that uses these methods in ValueTracking already pass this information. This makes the remaining sites consistent.
Differential Revision: https://reviews.llvm.org/D33567
llvm-svn: 304018
This continues the changes started when computeSignBit was replaced with this new version of computeKnowBits.
Differential Revision: https://reviews.llvm.org/D33431
llvm-svn: 303773
This patch adds min/max population count, leading/trailing zero/one bit counting methods.
The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
Differential Revision: https://reviews.llvm.org/D32931
llvm-svn: 302925
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.
Differential Revision: https://reviews.llvm.org/D32637
llvm-svn: 302262
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
Differential Revision: https://reviews.llvm.org/D32376
llvm-svn: 301432
This patch uses various APInt methods to reduce temporary APInt creation.
This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.
I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.
Differential Revision: https://reviews.llvm.org/D32495
llvm-svn: 301338
Summary:
The return value of these intrinsics should always have 0 bits for
inactive threads. This means that when all arguments are constant
and the comparison evaluates to true, the intrinsic should return
the current exec mask.
Fixes some GL_ARB_shader_ballot tests.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D32344
llvm-svn: 301195
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
PR32754.
llvm-svn: 301111
This change is correct because the verifier requires that at most one
argument be marked 'sret'.
NFC, removes a use of AttributeList slot APIs.
llvm-svn: 300784
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
Differential Revision: https://reviews.llvm.org/D32155
llvm-svn: 300566
This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.
Differential Revision: https://reviews.llvm.org/D31398
llvm-svn: 300422
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.
llvm-svn: 300367
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.
The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.
NFC
llvm-svn: 300272
Summary:
Bug noticed by inspection.
Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.
Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32041
llvm-svn: 300251
This seems like a much more natural API, based on Derek Schuff's
comments on r300015. It further hides the implementation detail of
AttributeList that function attributes come last and appear at index
~0U, which is easy for the user to screw up. git diff says it saves code
as well: 97 insertions(+), 137 deletions(-)
This also makes it easier to change the implementation, which I want to
do next.
llvm-svn: 300153
Summary:
For now, it just wraps AttributeSetNode*. Eventually, it will hold
AvailableAttrs as an inline bitset, and adding and removing enum
attributes will be super cheap.
This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.
Reviewers: pete, chandlerc
Subscribers: llvm-commits, jfb
Differential Revision: https://reviews.llvm.org/D31940
llvm-svn: 300014
This re-lands r299875.
I introduced a bug in Clang code responsible for replacing K&R, no
prototype declarations with a real function definition with a prototype.
The bug was here:
// Collect any return attributes from the call.
- if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
- newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
- oldAttrs.getRetAttributes()));
+ newAttrs.push_back(oldAttrs.getRetAttributes());
Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)
That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.
llvm-svn: 299899
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.
I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector already that
these passes already construct.
My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.
I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.
Reviewers: pete, chandlerc, sanjoy, hfinkel
Reviewed By: pete
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31198
llvm-svn: 299875
A common way to implement nearbyint is by fiddling with the floating
point environment and calling rint. This is used at least by the BSD
libm and musl. As such, canonicalizing the latter to the former will
create infinite loops for libm and generally pessimize performance, at
least when the generic C versions are used.
This change preserves the rint in the libcall translation and also
handles the domain truncation logic, so that rint with float argument
will be reduced to rintf etc.
llvm-svn: 299247
Summary: Currently the VP metadata was dropped when InstCombine converts a call to direct call. This patch converts the VP metadata to branch_weights so that its hotness is recorded.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31344
llvm-svn: 299228
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
Summary:
Solves PR 31990.
The bad rewrite could replace a memcpy of one word with
store i4 -1
while it should actually be
store i8 -1
Hopefully opt and llc has improved enough so the original optimization
done by the code isn't needed anymore.
One already existing testcase is affected. It originally tested that
the memcpy was replaced with
load double
but since we now remove that rewrite it will be
load i64
instead.
Patch suggestion by Eli Friedman.
Reviewers: eli.friedman, majnemer, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D30254
llvm-svn: 296585
Summary:
If there are two adjacent guards with different conditions, we can
remove one of them and include its condition into the condition of
another one. This patch allows InstCombine to merge them by the
following pattern:
guard(a); guard(b) -> guard(a & b).
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29378
llvm-svn: 293778
Summary:
There are many NVVM intrinsics that we can't entirely get rid of, but
that nonetheless often correspond to target-generic LLVM intrinsics.
For example, if flush denormals to zero (ftz) is enabled, we can convert
@llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is
disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
non-ftz PTX instruction. In this case, we can, however, simplify the
non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.
These transformations are particularly useful because they let us
constant fold instructions that appear in libdevice, the bitcode library
that ships with CUDA and essentially functions as its libm.
Reviewers: tra
Subscribers: hfinkel, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D28794
llvm-svn: 293244
This change reverts:
r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
r293058: "[InstCombine] Canonicalize guards for AND condition"
They miscompile cases like:
```
declare void @llvm.experimental.guard(i1, ...)
define void @test_guard_not_or(i1 %A, i1 %B) {
%C = or i1 %A, %B
%D = xor i1 %C, true
call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
ret void
}
```
because they do transfer the `i32 20, i32 30` parameters to newly
created guard instructions.
llvm-svn: 293227
This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors.
Differential Revision: https://reviews.llvm.org/D28979
llvm-svn: 293151
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29075
Patch by Maxim Kazantsev.
llvm-svn: 293061
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29074
Patch by Maxim Kazantsev.
llvm-svn: 293058
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: majnemer, apilipenko
Differential Revision: https://reviews.llvm.org/D29071
Patch by Maxim Kazantsev.
llvm-svn: 293056
Added early out for single undef input - we were already supporting (and testing) this in the constant folding code, we just do it quicker now
Drop undef handling from demanded elts code now that we handle it fully in InstCombiner::visitCallInst
llvm-svn: 292913
As discussed on D28777 - we don't need to handle 'all element' shuffles inside InstCombiner::visitCallInst as InstCombiner::SimplifyDemandedVectorElts will do everything we need.
llvm-svn: 292365
Add missing fabs(fpext) optimzation that worked with the call,
and also fixes it creating a second fpext when there were multiple
uses.
llvm-svn: 292172
Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded.
Differential Revision: https://reviews.llvm.org/D28745
llvm-svn: 292101
Here's my second try at making @llvm.assume processing more efficient. My
previous attempt, which leveraged operand bundles, r289755, didn't end up
working: it did make assume processing more efficient but eliminating the
assumption cache made ephemeral value computation too expensive. This is a
more-targeted change. We'll keep the assumption cache, but extend it to keep a
map of affected values (i.e. values about which an assumption might provide
some information) to the corresponding assumption intrinsics. This allows
ValueTracking and LVI to find assumptions relevant to the value being queried
without scanning all assumptions in the function. The fact that ValueTracking
started doing O(number of assumptions in the function) work, for every
known-bits query, has become prohibitively expensive in some cases.
As discussed during the review, this is a pragmatic fix that, longer term, will
likely be replaced by a more-principled solution (perhaps based on an extended
SSA form).
Differential Revision: https://reviews.llvm.org/D28459
llvm-svn: 291671
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.
So just in case we decide this is the right way, we might as well
have a cleaner implementation.
llvm-svn: 290912
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
This builds on r290554 which added supported for 128 and 256-bit.
llvm-svn: 290582
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
llvm-svn: 290566
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
Differential Revision: https://reviews.llvm.org/D28119
llvm-svn: 290554
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.
I'll do the same thing for packed add/sub/mul/div in a future patch.
Reviewers: delena, RKSimon, zvi, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27879
llvm-svn: 290535
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
Reviewers: zvi, delena, spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27825
llvm-svn: 290530
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.
llvm-svn: 290214
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).
Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.
At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.
Differential Revision: https://reviews.llvm.org/D27259
llvm-svn: 289755
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.
llvm-svn: 289629
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.
llvm-svn: 289628
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.
Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.
llvm-svn: 289523
These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead.
llvm-svn: 289371
These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well.
llvm-svn: 289370
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.
llvm-svn: 287316
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
llvm-svn: 287083
This fixes a similar issue to the one already fixed by r280804
(revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.
llvm-svn: 280807
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi
The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElemet may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens for example if a constexpr value is passed in input to an extrq/extrqi
intrinsic call.
This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.
Added reproducible test cases to x86-sse4a.ll.
Differential Revision: https://reviews.llvm.org/D24256
llvm-svn: 280804
memcpy with ld/st.
When InstCombine replaces a memcpy with loads+stores it does not copy over the
llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes
that.
Differential Revision: https://reviews.llvm.org/D23499
llvm-svn: 280617
Note that this fold really belongs in InstSimplify.
Refactoring here anyway as an intermediate step because
there's a planned addition to this function in D23134.
Differential Revision: https://reviews.llvm.org/D23223
llvm-svn: 277883
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277072
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277068
We were able to fold masked loads with an all-ones mask to a normal
load. However, we couldn't turn a masked load with a mask with mixed
ones and undefs into a normal load.
llvm-svn: 275380
This actually uncovered a surprisingly large chain of ultimately unused
TLI args.
From what I can gather, this argument is a remnant of when
isKnownNonNull would look at the TLI directly.
The current approach seems to be that InferFunctionAttrs runs early in
the pipeline and uses TLI to annotate the TLI-dependent non-null
information as return attributes.
This also removes the dependence of functionattrs on TLI altogether.
llvm-svn: 274455
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126
Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).
If the shift amount is constant we can sometimes convert these instructions to native shifts:
1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.
In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.
Differential Revision: http://reviews.llvm.org/D19675
llvm-svn: 271996
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.
The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.
Differential Revision: http://reviews.llvm.org/D20998
llvm-svn: 271990
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 271131
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
A companion patch (D20684) removes/auto-upgrade the clang intrinsics.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 270973
When a va_start or va_copy is immediately followed by a va_end (ignoring
debug information or other start/end in between), then it is safe to
remove the pair. As this code shares some commonalities with the lifetime
markers, this has been factored to helper functions.
This InstCombine pattern kicks-in 3 times when running the LLVM test
suite.
llvm-svn: 269033
Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements.
llvm-svn: 268158
Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements.
llvm-svn: 268115
We neglected to transfer operand bundles for some transforms. These
were found via inspection, I'll try to come up with some test cases.
llvm-svn: 268010