Commit Graph

442 Commits

Patrick Walton 01859da84b [AliasAnalysis] Introduce getModRefInfoMask() as a generalization of pointsToConstantMemory().
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.

It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.
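
As a rough illustration of the intended usage (a hedged sketch assuming the post-D136659 `AAResults` API and the `ModRefInfo` bit operators from `llvm/Support/ModRef.h`; `AA`, `Call`, and `Loc` stand for surrounding context):

```
// Clamp a call's Mod/Ref effect on Loc by the invariance mask: the mask is
// NoModRef for globally-constant memory and Ref for locally-invariant
// memory such as readonly noalias arguments.
ModRefInfo MRI = AA.getModRefInfo(Call, Loc);
MRI &= AA.getModRefInfoMask(Loc);
if (isNoModRef(MRI)) {
  // Safe to ignore the call when reasoning about Loc.
}
```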

The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.

Differential Revision: https://reviews.llvm.org/D136659
2022-10-31 13:03:41 -07:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Nikita Popov f96ea53e89 [AA] Do not track Must in ModRefInfo
getModRefInfo() queries currently track whether the result is a
MustAlias on a best-effort basis. The only user of this functionality
is the optimized memory access type in MemorySSA -- which in turn
has no users. Given that this functionality has not found a user
since it was introduced five years ago (in D38862), I think we
should drop it again.

The context is that I'm working to separate FunctionModRefBehavior
to track mod/ref for different location kinds (like argmem or
inaccessiblemem) separately, and the fact that ModRefInfo also has
an unrelated Must flag makes this quite awkward, especially as this
means that NoModRef is not a zero value. If we want to retain the
functionality, I would probably split getModRefInfo() results into
a part that just contains the ModRef information, and a separate
part containing a (best-effort) AliasResult.

Differential Revision: https://reviews.llvm.org/D130713
2022-08-01 07:14:31 +02:00
Nikita Popov c81dff3c30 [MemoryBuiltins] Add getFreedOperand() function (NFCI)
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.

To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.
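
A minimal sketch of the new helper's use (hedged; assumes the `MemoryBuiltins.h` declaration `Value *getFreedOperand(const CallBase *, const TargetLibraryInfo *)`; `CB`, `TLI`, and `handleFree` are hypothetical surrounding context):

```
// Ask *which* operand is freed rather than whether *some* operand is
// freed; nullptr means this call does not free memory.
if (Value *FreedOp = getFreedOperand(&CB, &TLI))
  handleFree(CB, FreedOp); // hypothetical caller-side handling
```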

This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
2022-07-21 12:39:35 +02:00
Chuanqi Xu 0e10f12844 [NFC] Remove commented cerr debugging loggings
There are some unused, commented-out cerr debug logging statements in the
code. It is odd to keep such commented-out debug helpers in the product.
2022-06-08 15:58:06 +08:00
Philip Reames 8a0cd23326 Revert "[MemDep][NFCI] Remove redundant dyn_cast, replace with cast"
This reverts commit 180d3f251d.  This commit is simply wrong.  IsLoad is set within the same file based on modref state, not whether the instruction is a LoadInst.

This went uncaught because cast<Ty>(X) has been broken.  See https://discourse.llvm.org/t/cast-x-is-broken-implications-and-proposal-to-address/63033 for context.
2022-06-07 13:21:31 -07:00
Max Kazantsev 8555e59a71 [NFC][MemDep] Remove unnecessary Worklist.clear
This execution path leads to a 'return false' where the Worklist
will be deallocated anyway. There is no need to clear it separately.
2022-06-03 12:31:44 +07:00
Max Kazantsev 7e5a730473 [MemDep][NFC] Remove duplicated check in `if` and `else` branches
The same check is done whether the condition is true or false. Just hoist
it out of the conditional.
2022-05-30 17:43:00 +07:00
Max Kazantsev 180d3f251d [MemDep][NFCI] Remove redundant dyn_cast, replace with cast
When `IsLoad` is `true`, we don't need to check if the instruction
is actually a load with dyn_cast. This saves a small amount of compile time.
2022-05-30 17:17:55 +07:00
serge-sans-paille 71c3a5519d Cleanup includes: LLVMAnalysis
Number of lines output by preprocessor:
before: 1065940348
after:  1065307662

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120659
2022-03-01 18:01:54 +01:00
Serguei Katkov b45d0b3e8e [MemoryDependency] Simplify re-ordering condition. Cleanup. NFC.
Make the condition that restricts re-ordering easier to read.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D120005
2022-02-18 12:40:26 +07:00
Serguei Katkov 194899caef [MemoryDependency] Relax the re-ordering of atomic store and unordered load/store
An atomic store with release semantics allows unordered loads/stores to be
re-ordered before it. Implement this relaxation.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119844
2022-02-17 10:53:25 +07:00
Serguei Katkov 15f1cffb3a [MemoryDependency] Relax the re-ordering with volatile store.
A volatile store does not impose any special rules for reordering with
atomics; the usual must-alias analysis is enough here.

This makes the behavior similar to how volatile loads are handled.

Reviewers: reames, nikic
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119818
2022-02-16 10:58:48 +07:00
Nikita Popov 92d55e7336 [MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall()
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.
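
For reference, the attribute-only check is conceptually just (a sketch of what `isNoAliasCall()` does, not a verbatim copy of the in-tree code):

```
// A value is a "noalias call" iff it is a call whose return value carries
// the noalias attribute.
static bool isNoAliasCallSketch(const Value *V) {
  if (const auto *Call = dyn_cast<CallBase>(V))
    return Call->hasRetAttr(Attribute::NoAlias);
  return false;
}
```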

We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For `new`, clang is responsible for setting the attribute
if -fno-assume-sane-operator-new is not passed.

Differential Revision: https://reviews.llvm.org/D116800
2022-01-10 09:18:15 +01:00
Kazu Hirata f6bce30cf9 [llvm] Use range-based for loops (NFC) 2021-11-20 18:42:10 -08:00
Daniil Fukalov e853d3b274 [NFC] MemoryDependenceAnalysis cleanup.
1. Removed redundant includes.
2. Removed the never-defined and never-used `releaseMemory()`.
3. Fixed the capitalization of member function names.
4. Renamed the duplicate name `NonLocalDeps` (in the nested struct
   `NonLocalPointerInfo`) to `NonLocalDepsMap`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D102358
2021-05-31 18:07:55 +03:00
Nikita Popov c4fb2a1fc2 [MemDep] Use BatchAA in more places (NFCI)
Previously, we already used BatchAA for individual simple pointer
dependency queries. This extends BatchAA usage for the non-local
case, so that only one BatchAA instance is used for all blocks,
instead of one instance per block.

Use of BatchAA is safe as IR cannot be modified during a MemDep
query.
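
Schematically (a hedged sketch; `BatchAAResults` and `MemoryLocation::get` are real APIs, while `AA`, `LoadA`, and `LoadB` stand for the query's context):

```
// A single BatchAAResults wraps the underlying AAResults and memoizes its
// alias() answers across the whole non-local query; this is sound because
// MemDep never mutates IR mid-query.
BatchAAResults BatchAA(AA);
AliasResult AR = BatchAA.alias(MemoryLocation::get(LoadA),
                               MemoryLocation::get(LoadB));
```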
2021-05-14 22:54:40 +02:00
Nikita Popov 5e289cc597 [AA] Support callCapturesBefore() on BatchAA (NFCI)
This is not expected to have any practical compile-time effect,
as the alias() calls inside callCapturesBefore() are rare. This
should still be supported for API completeness, and might be
useful for reachability caching.
2021-05-14 21:48:08 +02:00
dfukalov fdae3fc8b3 [GVN] Clobber partially aliased loads.
Use the offsets stored in `AliasResult`, as implemented in D98718.

Updated with a fix for the issue reported in https://reviews.llvm.org/D95543#2745161

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-05-14 11:17:14 +03:00
Jordan Rupprecht fec2945998 Revert "[GVN] Clobber partially aliased loads."
This reverts commit 6c57044231.

It causes assertion errors due to widening atomic loads, and potentially causes miscompiles elsewhere too. Repro, also posted to D95543:

```
$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
```
2021-05-11 16:08:53 -07:00
dfukalov 6c57044231 [GVN] Clobber partially aliased loads.
Use the offsets stored in `AliasResult`, as implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
The main reason is preparation for transforming AliasResult into a class
that carries an offset for the PartialAlias case.
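
A hedged sketch of the interface this enables (post-D98718 `AliasResult`; `AA`, `LocA`, and `LocB` are surrounding context):

```
// PartialAlias results may now carry the byte offset between the two
// locations, when BasicAA could compute it.
AliasResult AR = AA.alias(LocA, LocB);
if (AR == AliasResult::PartialAlias && AR.hasOffset()) {
  int32_t Offset = AR.getOffset(); // relative displacement of the accesses
  (void)Offset;
}
```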

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Philip Reames ffa15e9463 Extract isVolatile helper on Instruction [NFCI]
We have this logic duplicated in several cases, none of which were exhaustive.  Consolidate it in one place.

I don't believe this actually impacts behavior of the callers.  I think they all filter their inputs such that their partial implementations were correct.  If not, this might be fixing a corner-case bug.
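
Conceptually the consolidated helper looks like this (a sketch; the in-tree `Instruction::isVolatile()` also covers cmpxchg and atomicrmw):

```
// One exhaustive place to ask "is this instruction volatile?".
static bool isVolatileSketch(const Instruction *I) {
  if (const auto *LI = dyn_cast<LoadInst>(I))
    return LI->isVolatile();
  if (const auto *SI = dyn_cast<StoreInst>(I))
    return SI->isVolatile();
  if (const auto *MI = dyn_cast<MemIntrinsic>(I))
    return MI->isVolatile();
  return false;
}
```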
2021-04-01 11:24:02 -07:00
Thomas Preud'homme f12433f127 [MemDepAnalysis] Remove redundant comment.
The exact same comment is found two lines above.
2021-03-16 15:51:17 +00:00
William S. Moses 875891a10d [MemoryDependence] Fix invariant group store
Fix bug in MemoryDependence [and thus GVN] for invariant group.

Previously, MemDep didn't verify that the store was storing *to* the
pointer, as opposed to merely using the pointer as the stored value.
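
A hedged sketch of the missing check (`U` and `Ptr` stand for the use and the invariant-group pointer in the surrounding walk):

```
// Only a store whose *pointer operand* is Ptr defines the invariant group;
// a store that merely stores Ptr as a value does not.
if (auto *SI = dyn_cast<StoreInst>(U)) {
  if (SI->getPointerOperand() != Ptr)
    continue; // Ptr is the stored value here, not the destination
  // ... treat SI as a candidate dependency ...
}
```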

Differential Revision: https://reviews.llvm.org/D98267
2021-03-09 19:03:39 -05:00
Kazu Hirata 16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Kazu Hirata a3254904b2 [Analysis] Use llvm::append_range (NFC) 2021-01-22 23:25:01 -08:00
Kazu Hirata cd088ba7e6 [llvm] Use llvm::lower_bound and llvm::upper_bound (NFC) 2021-01-05 21:15:59 -08:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
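
A hedged illustration of the two new states (`Ptr` is a stand-in pointer value):

```
// afterPointer(): unknown size, accesses only at or after the base pointer.
// beforeOrAfterPointer(): unknown size, accesses may also extend before
// the pointer (but stay within the underlying object).
MemoryLocation AfterOnly(Ptr, LocationSize::afterPointer());
MemoryLocation Anywhere(Ptr, LocationSize::beforeOrAfterPointer());
```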

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Krzysztof Parzyszek 055d209589 Handle masked loads and stores in MemoryLocation/Dependence
Differential Revision: https://reviews.llvm.org/D87061
2020-09-08 19:08:44 -05:00
Nikita Popov 3a54b6a4b7 [MemDep] Use BatchAA when computing pointer dependencies
We're not changing IR while running a single MemDep query, so it's
safe to cache alias analysis results using BatchAA. This adds BatchAA
usage to getSimplePointerDependencyFrom(), which is non-intrusive --
covering larger parts (like a whole processNonLocalLoad query) is
also possible, but requires threading BatchAA through a bunch of APIs.

For the ThinLTO configuration, this is a 1% geomean improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D85583
2020-08-25 21:34:34 +02:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject parameter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Florian Hahn e4b58ea8c1 [MemDep] Also remove load instructions from NonLocalDefsCache.
Currently load instructions are added to the cache for invariant pointer
group dependencies, but only pointer values are removed from it. That
leads to dangling AssertingVHs in the test case added by this patch,
where we delete a load from an invariant pointer group. We should also
remove the load entries from the cache.

Fixes PR46054.

Reviewers: efriedma, hfinkel, asbirlea

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D81726
2020-06-17 09:36:53 +01:00
Evgeniy Brevnov 5a63813dc7 [DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses (PR42151).
Summary:
This is the second attempt to fix the problem of incorrect dependencies being reported in the presence of invariant loads. The initial fix (https://reviews.llvm.org/D64405) was reverted due to a regression reported in https://reviews.llvm.org/D70516.

The original fix changed the caching behavior for invariant loads: such loads are not put into the second-level cache (NonLocalDepInfo). The problem with that fix is that the first-level cache (CachedNonLocalPointerInfo) still behaves as if invariant loads were in the second-level cache. The solution is, in addition to not putting dependence results into the second-level cache, to also avoid putting info about invariant loads into the first-level cache.
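
A hedged sketch of the fix's shape (names simplified; the real code lives in MemoryDependenceAnalysis):

```
// Results for invariant loads now bypass *both* cache levels.
bool IsInvariantLoad =
    QueryInst->hasMetadata(LLVMContext::MD_invariant_load);
if (!IsInvariantLoad)
  Cache.push_back(Entry); // hypothetical insertion into either cache level
```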

Reviewers: jdoerfert, reames, hfinkel, efriedma

Reviewed By: jdoerfert

Subscribers: DaniilSuchkov, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73027
2020-03-04 18:40:02 +07:00
Evgeniy Brevnov b0761bbc76 [DependenceAnalysis] Memory dependence analysis internal caching mechanism is broken in presence of TBAA (PR42733).
Summary:
There is a flaw in memory dependence analysis caching mechanism when memory accesses with TBAA are involved. Assume we first analysed and cached results for access with TBAA. Later we request dependence for the same memory but without TBAA (or different TBAA). By design these two queries should share one entry in the internal cache which corresponds to a general access (without TBAA).  Thus upon second request internal cached is cleared and we continue analysis for access as if there is no TBAA.

The problem is that even though internal cache is cleared the set of visited nodes is not. That means we won't traverse visited nodes again and populate internal cache with the corresponding dependence results. So we end up  with internal cache in an incomplete state. Current implementation tries to signal that situation by resetting CacheInfo->Pair at line 1104. But that doesn't actually help since later code ignores this invalidation and relies on 'Cache->empty()' property to decide on cache completeness.

Reviewers: reames, hfinkel, chandlerc, fedor.sergeev, asbirlea, fhahn, john.brawn, Prazek, sunfish

Reviewed By: john.brawn

Subscribers: DaniilSuchkov, kosarev, jfb, dantrushin, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73032
2020-02-21 20:20:36 +07:00
Reid Kleckner 0c2b09a9b6 [IR] Lazily number instructions for local dominance queries
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly deletes dead
instructions rather than transforming them.
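
The query this enables (hedged; `Instruction::comesBefore` was added alongside this change, and `A`/`B` are hypothetical instructions in one block):

```
// Amortized O(1) local dominance: the block numbers its instructions
// lazily, and insertion auto-invalidates the numbering.
if (A->comesBefore(B)) {
  // A precedes B within their common basic block.
}
```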

The downside is that Instruction grows from 56 bytes to 64 bytes.  The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.

The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.

We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.

Reviewed By: lattner, vsk, fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D51664
2020-02-18 14:44:24 -08:00
Evgeniy Brevnov cae643d596 Reverting D73027 [DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses (PR42151). 2020-02-14 22:57:23 +07:00
Evgeniy Brevnov 5573abceab [DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses (PR42151).
Summary:
This is the second attempt to fix the problem of incorrect dependencies being reported in the presence of invariant loads. The initial fix (https://reviews.llvm.org/D64405) was reverted due to a regression reported in https://reviews.llvm.org/D70516.

The original fix changed the caching behavior for invariant loads: such loads are not put into the second-level cache (NonLocalDepInfo). The problem with that fix is that the first-level cache (CachedNonLocalPointerInfo) still behaves as if invariant loads were in the second-level cache. The solution is, in addition to not putting dependence results into the second-level cache, to also avoid putting info about invariant loads into the first-level cache.

Reviewers: jdoerfert, reames, hfinkel, efriedma

Reviewed By: jdoerfert

Subscribers: DaniilSuchkov, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73027
2020-02-14 12:18:31 +07:00
Alina Sbirlea 9f6c6ee6b9 [MemDepAnalysis/VNCoercion] Move static method to its only use. [NFCI]
Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize
does not have or use any info specific to MemoryDependenceResults.
Move it to its only user: VNCoercion.
2020-01-17 15:18:42 -08:00
Mark de Wever 8dc7b982b4 [NFC] Fixes -Wrange-loop-analysis warnings
This avoids new warnings after D68912 added -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D71857
2020-01-01 20:01:37 +01:00
Benjamin Kramer 446acafb82 Revert "[DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151"
Summary:
Revert "[DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151"

 This reverts commit 5f026b6d9e.

We (the tensorflow.org/xla team) are seeing some miscompiles with the new change, only at -O3, with fast math disabled.

I'm still trying to come up with a useful/small/external example, but for now, the following IR:

```
; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [4 x i8] c"\DB\0F\C9@"
@1 = private unnamed_addr constant [4 x i8] c"\00\00\00?"

; Function Attrs: uwtable
define void @jit_wrapped_fun.31(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %buffer_table, i64* noalias %prof_counters) #0 {
entry:
  %fusion.invar_address.dim.2 = alloca i64
  %fusion.invar_address.dim.1 = alloca i64
  %fusion.invar_address.dim.0 = alloca i64
  %fusion.1.invar_address.dim.2 = alloca i64
  %fusion.1.invar_address.dim.1 = alloca i64
  %fusion.1.invar_address.dim.0 = alloca i64
  %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
  %1 = load i8*, i8** %0, !invariant.load !0, !dereferenceable !1, !align !2
  %parameter.3 = bitcast i8* %1 to [2 x [1 x [4 x float]]]*
  %2 = getelementptr inbounds i8*, i8** %buffer_table, i64 5
  %3 = load i8*, i8** %2, !invariant.load !0, !dereferenceable !1, !align !2
  %fusion.1 = bitcast i8* %3 to [2 x [1 x [4 x float]]]*
  store i64 0, i64* %fusion.1.invar_address.dim.0
  br label %fusion.1.loop_header.dim.0

fusion.1.loop_header.dim.0:                       ; preds = %fusion.1.loop_exit.dim.1, %entry
  %fusion.1.indvar.dim.0 = load i64, i64* %fusion.1.invar_address.dim.0
  %4 = icmp uge i64 %fusion.1.indvar.dim.0, 2
  br i1 %4, label %fusion.1.loop_exit.dim.0, label %fusion.1.loop_body.dim.0

fusion.1.loop_body.dim.0:                         ; preds = %fusion.1.loop_header.dim.0
  store i64 0, i64* %fusion.1.invar_address.dim.1
  br label %fusion.1.loop_header.dim.1

fusion.1.loop_header.dim.1:                       ; preds = %fusion.1.loop_exit.dim.2, %fusion.1.loop_body.dim.0
  %fusion.1.indvar.dim.1 = load i64, i64* %fusion.1.invar_address.dim.1
  %5 = icmp uge i64 %fusion.1.indvar.dim.1, 1
  br i1 %5, label %fusion.1.loop_exit.dim.1, label %fusion.1.loop_body.dim.1

fusion.1.loop_body.dim.1:                         ; preds = %fusion.1.loop_header.dim.1
  store i64 0, i64* %fusion.1.invar_address.dim.2
  br label %fusion.1.loop_header.dim.2

fusion.1.loop_header.dim.2:                       ; preds = %fusion.1.loop_body.dim.2, %fusion.1.loop_body.dim.1
  %fusion.1.indvar.dim.2 = load i64, i64* %fusion.1.invar_address.dim.2
  %6 = icmp uge i64 %fusion.1.indvar.dim.2, 4
  br i1 %6, label %fusion.1.loop_exit.dim.2, label %fusion.1.loop_body.dim.2

fusion.1.loop_body.dim.2:                         ; preds = %fusion.1.loop_header.dim.2
  %7 = load float, float* bitcast ([4 x i8]* @0 to float*)
  %8 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2
  %9 = load float, float* %8, !invariant.load !0, !noalias !3
  %10 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2
  %11 = load float, float* %10, !invariant.load !0, !noalias !3
  %12 = fmul float %9, %11
  %13 = fmul float %7, %12
  %14 = call float @llvm.log.f32(float %13)
  %15 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2
  store float %14, float* %15, !alias.scope !7, !noalias !8
  %invar.inc2 = add nuw nsw i64 %fusion.1.indvar.dim.2, 1
  store i64 %invar.inc2, i64* %fusion.1.invar_address.dim.2
  br label %fusion.1.loop_header.dim.2

fusion.1.loop_exit.dim.2:                         ; preds = %fusion.1.loop_header.dim.2
  %invar.inc1 = add nuw nsw i64 %fusion.1.indvar.dim.1, 1
  store i64 %invar.inc1, i64* %fusion.1.invar_address.dim.1
  br label %fusion.1.loop_header.dim.1

fusion.1.loop_exit.dim.1:                         ; preds = %fusion.1.loop_header.dim.1
  %invar.inc = add nuw nsw i64 %fusion.1.indvar.dim.0, 1
  store i64 %invar.inc, i64* %fusion.1.invar_address.dim.0
  br label %fusion.1.loop_header.dim.0

fusion.1.loop_exit.dim.0:                         ; preds = %fusion.1.loop_header.dim.0
  %16 = getelementptr inbounds i8*, i8** %buffer_table, i64 4
  %17 = load i8*, i8** %16, !invariant.load !0, !dereferenceable !9, !align !2
  %parameter.1 = bitcast i8* %17 to float*
  %18 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
  %19 = load i8*, i8** %18, !invariant.load !0, !dereferenceable !10, !align !2
  %parameter.2 = bitcast i8* %19 to [3 x [1 x float]]*
  %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 0
  %21 = load i8*, i8** %20, !invariant.load !0, !dereferenceable !11, !align !2
  %fusion = bitcast i8* %21 to [2 x [3 x [4 x float]]]*
  store i64 0, i64* %fusion.invar_address.dim.0
  br label %fusion.loop_header.dim.0

fusion.loop_header.dim.0:                         ; preds = %fusion.loop_exit.dim.1, %fusion.1.loop_exit.dim.0
  %fusion.indvar.dim.0 = load i64, i64* %fusion.invar_address.dim.0
  %22 = icmp uge i64 %fusion.indvar.dim.0, 2
  br i1 %22, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0

fusion.loop_body.dim.0:                           ; preds = %fusion.loop_header.dim.0
  store i64 0, i64* %fusion.invar_address.dim.1
  br label %fusion.loop_header.dim.1

fusion.loop_header.dim.1:                         ; preds = %fusion.loop_exit.dim.2, %fusion.loop_body.dim.0
  %fusion.indvar.dim.1 = load i64, i64* %fusion.invar_address.dim.1
  %23 = icmp uge i64 %fusion.indvar.dim.1, 3
  br i1 %23, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1

fusion.loop_body.dim.1:                           ; preds = %fusion.loop_header.dim.1
  store i64 0, i64* %fusion.invar_address.dim.2
  br label %fusion.loop_header.dim.2

fusion.loop_header.dim.2:                         ; preds = %fusion.loop_body.dim.2, %fusion.loop_body.dim.1
  %fusion.indvar.dim.2 = load i64, i64* %fusion.invar_address.dim.2
  %24 = icmp uge i64 %fusion.indvar.dim.2, 4
  br i1 %24, label %fusion.loop_exit.dim.2, label %fusion.loop_body.dim.2

fusion.loop_body.dim.2:                           ; preds = %fusion.loop_header.dim.2
  %25 = mul nuw nsw i64 %fusion.indvar.dim.2, 1
  %26 = add nuw nsw i64 0, %25
  %27 = udiv i64 %26, 4
  %28 = mul nuw nsw i64 %fusion.indvar.dim.0, 1
  %29 = add nuw nsw i64 0, %28
  %30 = udiv i64 %29, 2
  %31 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %29, i64 0, i64 %26
  %32 = load float, float* %31, !alias.scope !7, !noalias !8
  %33 = mul nuw nsw i64 %fusion.indvar.dim.1, 1
  %34 = add nuw nsw i64 0, %33
  %35 = udiv i64 %34, 3
  %36 = load float, float* %parameter.1, !invariant.load !0, !noalias !3
  %37 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %parameter.2, i64 0, i64 %34, i64 0
  %38 = load float, float* %37, !invariant.load !0, !noalias !3
  %39 = fsub float %36, %38
  %40 = fmul float %39, %39
  %41 = mul nuw nsw i64 %fusion.indvar.dim.2, 1
  %42 = add nuw nsw i64 0, %41
  %43 = udiv i64 %42, 4
  %44 = mul nuw nsw i64 %fusion.indvar.dim.0, 1
  %45 = add nuw nsw i64 0, %44
  %46 = udiv i64 %45, 2
  %47 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42
  %48 = load float, float* %47, !invariant.load !0, !noalias !3
  %49 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42
  %50 = load float, float* %49, !invariant.load !0, !noalias !3
  %51 = fmul float %48, %50
  %52 = fdiv float %40, %51
  %53 = fadd float %32, %52
  %54 = fneg float %53
  %55 = load float, float* bitcast ([4 x i8]* @1 to float*)
  %56 = fmul float %54, %55
  %57 = getelementptr inbounds [2 x [3 x [4 x float]]], [2 x [3 x [4 x float]]]* %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 %fusion.indvar.dim.1, i64 %fusion.indvar.dim.2
  store float %56, float* %57, !alias.scope !8, !noalias !12
  %invar.inc5 = add nuw nsw i64 %fusion.indvar.dim.2, 1
  store i64 %invar.inc5, i64* %fusion.invar_address.dim.2
  br label %fusion.loop_header.dim.2

fusion.loop_exit.dim.2:                           ; preds = %fusion.loop_header.dim.2
  %invar.inc4 = add nuw nsw i64 %fusion.indvar.dim.1, 1
  store i64 %invar.inc4, i64* %fusion.invar_address.dim.1
  br label %fusion.loop_header.dim.1

fusion.loop_exit.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %invar.inc3 = add nuw nsw i64 %fusion.indvar.dim.0, 1
  store i64 %invar.inc3, i64* %fusion.invar_address.dim.0
  br label %fusion.loop_header.dim.0

fusion.loop_exit.dim.0:                           ; preds = %fusion.loop_header.dim.0
  %58 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
  %59 = load i8*, i8** %58, !invariant.load !0, !dereferenceable !2, !align !2
  %tuple.30 = bitcast i8* %59 to [1 x i8*]*
  %60 = bitcast [2 x [3 x [4 x float]]]* %fusion to i8*
  %61 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.30, i64 0, i64 0
  store i8* %60, i8** %61, !alias.scope !14, !noalias !8
  ret void
}

; Function Attrs: nounwind readnone speculatable willreturn
declare float @llvm.log.f32(float) #1

attributes #0 = { uwtable "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 32}
!2 = !{i64 8}
!3 = !{!4, !6}
!4 = !{!"buffer: {index:0, offset:0, size:96}", !5}
!5 = !{!"XLA global AA domain"}
!6 = !{!"buffer: {index:5, offset:0, size:32}", !5}
!7 = !{!6}
!8 = !{!4}
!9 = !{i64 4}
!10 = !{i64 12}
!11 = !{i64 96}
!12 = !{!13, !6}
!13 = !{!"buffer: {index:3, offset:0, size:8}", !5}
!14 = !{!13}
```

gets (correctly) optimized to the one below without the change:

```
; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

; Function Attrs: nofree nounwind uwtable
define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 {
entry:
  %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
  %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]**
  %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5
  %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]**
  %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>*
  %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3
  %8 = fmul <4 x float> %7, %7
  %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9)
  %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>*
  store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8
  %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %14 = bitcast float* %12 to <4 x float>*
  %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3
  %16 = fmul <4 x float> %15, %15
  %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17)
  %19 = bitcast float* %13 to <4 x float>*
  store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8
  %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4
  %21 = bitcast i8** %20 to float**
  %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2
  %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
  %24 = bitcast i8** %23 to [3 x [1 x float]]**
  %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2
  %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2
  %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3
  %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0
  %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3
  %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>*
  %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3
  %30 = insertelement <2 x float> undef, float %27, i32 0
  %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer
  %32 = fsub <2 x float> %31, %29
  %33 = fmul <2 x float> %32, %32
  %shuffle30 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
  %34 = fsub float %27, %.pre29
  %35 = fmul float %34, %34
  %36 = insertelement <4 x float> undef, float %35, i32 0
  %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer
  %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %38 = fmul <4 x float> %7, %7
  %shuffle31 = shufflevector <4 x float> %38, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %39 = fdiv <8 x float> %shuffle30, %shuffle31
  %40 = fadd <8 x float> %shuffle, %39
  %41 = fmul <8 x float> %40, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %42 = bitcast i8* %26 to <8 x float>*
  store <8 x float> %41, <8 x float>* %42, align 8, !alias.scope !8, !noalias !12
  %43 = getelementptr inbounds i8, i8* %26, i64 32
  %44 = fdiv <4 x float> %37, %38
  %45 = fadd <4 x float> %10, %44
  %46 = fmul <4 x float> %45, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %47 = bitcast i8* %43 to <4 x float>*
  store <4 x float> %46, <4 x float>* %47, align 8, !alias.scope !8, !noalias !12
  %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %48 = bitcast float* %.phi.trans.insert to <4 x float>*
  %49 = load <4 x float>, <4 x float>* %48, align 8, !alias.scope !7, !noalias !8
  %50 = bitcast float* %.phi.trans.insert12 to <4 x float>*
  %51 = load <4 x float>, <4 x float>* %50, align 8, !invariant.load !0, !noalias !3
  %shuffle.1 = shufflevector <4 x float> %49, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %52 = getelementptr inbounds i8, i8* %26, i64 48
  %53 = fmul <4 x float> %51, %51
  %shuffle31.1 = shufflevector <4 x float> %53, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %54 = fdiv <8 x float> %shuffle30, %shuffle31.1
  %55 = fadd <8 x float> %shuffle.1, %54
  %56 = fmul <8 x float> %55, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %57 = bitcast i8* %52 to <8 x float>*
  store <8 x float> %56, <8 x float>* %57, align 8, !alias.scope !8, !noalias !12
  %58 = getelementptr inbounds i8, i8* %26, i64 80
  %59 = fdiv <4 x float> %37, %53
  %60 = fadd <4 x float> %49, %59
  %61 = fmul <4 x float> %60, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %62 = bitcast i8* %58 to <4 x float>*
  store <4 x float> %61, <4 x float>* %62, align 8, !alias.scope !8, !noalias !12
  %63 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
  %64 = bitcast i8** %63 to [1 x i8*]**
  %65 = load [1 x i8*]*, [1 x i8*]** %64, align 8, !invariant.load !0, !dereferenceable !2, !align !2
  %66 = getelementptr inbounds [1 x i8*], [1 x i8*]* %65, i64 0, i64 0
  store i8* %26, i8** %66, align 8, !alias.scope !14, !noalias !8
  ret void
}

; Function Attrs: nounwind readnone speculatable willreturn
declare <4 x float> @llvm.log.v4f32(<4 x float>) #1

attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 32}
!2 = !{i64 8}
!3 = !{!4, !6}
!4 = !{!"buffer: {index:0, offset:0, size:96}", !5}
!5 = !{!"XLA global AA domain"}
!6 = !{!"buffer: {index:5, offset:0, size:32}", !5}
!7 = !{!6}
!8 = !{!4}
!9 = !{i64 4}
!10 = !{i64 12}
!11 = !{i64 96}
!12 = !{!13, !6}
!13 = !{!"buffer: {index:3, offset:0, size:8}", !5}
!14 = !{!13}

```

and (incorrectly) optimized to the one below with the change:

```
; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

; Function Attrs: nofree nounwind uwtable
define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 {
entry:
  %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1
  %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]**
  %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5
  %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]**
  %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>*
  %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3
  %8 = fmul <4 x float> %7, %7
  %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9)
  %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>*
  store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8
  %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %14 = bitcast float* %12 to <4 x float>*
  %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3
  %16 = fmul <4 x float> %15, %15
  %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000>
  %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17)
  %19 = bitcast float* %13 to <4 x float>*
  store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8
  %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4
  %21 = bitcast i8** %20 to float**
  %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2
  %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2
  %24 = bitcast i8** %23 to [3 x [1 x float]]**
  %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2
  %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2
  %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3
  %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0
  %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3
  %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>*
  %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3
  %30 = insertelement <2 x float> undef, float %27, i32 0
  %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer
  %32 = fsub <2 x float> %31, %29
  %33 = fmul <2 x float> %32, %32
  %shuffle32 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
  %34 = fsub float %27, %.pre29
  %35 = fmul float %34, %34
  %36 = insertelement <4 x float> undef, float %35, i32 0
  %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer
  %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %38 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 0, i64 0, i64 3
  %39 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 0, i64 0, i64 3
  %40 = fmul <4 x float> %7, %7
  %41 = shufflevector <4 x float> %40, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
  %42 = fdiv <8 x float> %shuffle32, %41
  %43 = fadd <8 x float> %shuffle, %42
  %44 = fmul <8 x float> %43, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %45 = bitcast i8* %26 to <8 x float>*
  store <8 x float> %44, <8 x float>* %45, align 8, !alias.scope !8, !noalias !12
  %46 = extractelement <4 x float> %10, i32 0
  %47 = getelementptr inbounds i8, i8* %26, i64 32
  %48 = extractelement <4 x float> %10, i32 1
  %49 = extractelement <4 x float> %10, i32 2
  %50 = load float, float* %38, align 4, !alias.scope !7, !noalias !8
  %51 = load float, float* %39, align 4, !invariant.load !0, !noalias !3
  %52 = fmul float %51, %51
  %53 = insertelement <4 x float> undef, float %52, i32 3
  %54 = fdiv <4 x float> %37, %53
  %55 = insertelement <4 x float> undef, float %46, i32 0
  %56 = insertelement <4 x float> %55, float %48, i32 1
  %57 = insertelement <4 x float> %56, float %49, i32 2
  %58 = insertelement <4 x float> %57, float %50, i32 3
  %59 = fadd <4 x float> %58, %54
  %60 = fmul <4 x float> %59, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %61 = bitcast i8* %47 to <4 x float>*
  store <4 x float> %60, <4 x float>* %61, align 8, !alias.scope !8, !noalias !12
  %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0
  %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0
  %62 = bitcast float* %.phi.trans.insert to <4 x float>*
  %63 = load <4 x float>, <4 x float>* %62, align 8, !alias.scope !7, !noalias !8
  %64 = bitcast float* %.phi.trans.insert12 to <4 x float>*
  %65 = load <4 x float>, <4 x float>* %64, align 8, !invariant.load !0, !noalias !3
  %shuffle.1 = shufflevector <4 x float> %63, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %66 = getelementptr inbounds i8, i8* %26, i64 48
  %67 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 3
  %68 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 3
  %69 = fmul <4 x float> %65, %65
  %70 = shufflevector <4 x float> %69, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  %71 = fdiv <8 x float> %shuffle32, %70
  %72 = fadd <8 x float> %shuffle.1, %71
  %73 = fmul <8 x float> %72, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %74 = bitcast i8* %66 to <8 x float>*
  store <8 x float> %73, <8 x float>* %74, align 8, !alias.scope !8, !noalias !12
  %75 = extractelement <4 x float> %69, i32 0
  %76 = extractelement <4 x float> %63, i32 0
  %77 = getelementptr inbounds i8, i8* %26, i64 80
  %78 = extractelement <4 x float> %69, i32 1
  %79 = extractelement <4 x float> %63, i32 1
  %80 = extractelement <4 x float> %69, i32 2
  %81 = extractelement <4 x float> %63, i32 2
  %82 = load float, float* %67, align 4, !alias.scope !7, !noalias !8
  %83 = load float, float* %68, align 4, !invariant.load !0, !noalias !3
  %84 = fmul float %83, %83
  %85 = insertelement <4 x float> undef, float %75, i32 0
  %86 = insertelement <4 x float> %85, float %78, i32 1
  %87 = insertelement <4 x float> %86, float %80, i32 2
  %88 = insertelement <4 x float> %87, float %84, i32 3
  %89 = fdiv <4 x float> %37, %88
  %90 = insertelement <4 x float> undef, float %76, i32 0
  %91 = insertelement <4 x float> %90, float %79, i32 1
  %92 = insertelement <4 x float> %91, float %81, i32 2
  %93 = insertelement <4 x float> %92, float %82, i32 3
  %94 = fadd <4 x float> %93, %89
  %95 = fmul <4 x float> %94, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
  %96 = bitcast i8* %77 to <4 x float>*
  store <4 x float> %95, <4 x float>* %96, align 8, !alias.scope !8, !noalias !12
  %97 = getelementptr inbounds i8*, i8** %buffer_table, i64 3
  %98 = bitcast i8** %97 to [1 x i8*]**
  %99 = load [1 x i8*]*, [1 x i8*]** %98, align 8, !invariant.load !0, !dereferenceable !2, !align !2
  %100 = getelementptr inbounds [1 x i8*], [1 x i8*]* %99, i64 0, i64 0
  store i8* %26, i8** %100, align 8, !alias.scope !14, !noalias !8
  ret void
}

; Function Attrs: nounwind readnone speculatable willreturn
declare <4 x float> @llvm.log.v4f32(<4 x float>) #1

attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 32}
!2 = !{i64 8}
!3 = !{!4, !6}
!4 = !{!"buffer: {index:0, offset:0, size:96}", !5}
!5 = !{!"XLA global AA domain"}
!6 = !{!"buffer: {index:5, offset:0, size:32}", !5}
!7 = !{!6}
!8 = !{!4}
!9 = !{i64 4}
!10 = !{i64 12}
!11 = !{i64 96}
!12 = !{!13, !6}
!13 = !{!"buffer: {index:3, offset:0, size:8}", !5}
!14 = !{!13}

```

This results in bad numerical answers when used through XLA.
Again, it's not that easy to give a small, fully-reproducible example, but the miscompare is:

```
Expected literal:
(
f32[2,3,4] {
{
  { nan, -inf, -3181.35, -inf },
  { nan, -inf, -28.2577019, -inf },
  { nan, -inf, -28.2577019, -inf }
},
{
  { -inf, -inf, -inf, -inf },
  { -6.60753046e+28, -1.47314833e+23, -inf, -inf },
  { -2.43504347e+30, -5.42892693e+24, -inf, -inf }
}
}
)

Actual literal:
(
f32[2,3,4] {
{
  { nan, -inf, -3181.35, -inf },
  { nan, -inf, -inf, -inf },
  { inf, -inf, -28.2577019, -inf }
},
{
  { -inf, -inf, -inf, -inf },
  { -6.60753046e+28, -1.47314833e+23, -inf, -inf },
  { -2.43504347e+30, -5.42892693e+24, -inf, -inf }
}
}
)
```

Reviewers: sanjoy.google, sanjoy, ebrevnov, jdoerfert, reames, chandlerc

Subscribers: hiraditya, Charusso, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70516
2019-11-21 11:40:15 +01:00
Evgeniy Brevnov 5f026b6d9e [DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151
Summary:
Dependence analysis has a mechanism to cache results: for a particular memory access, the cache keeps track of side effects in basic blocks. The problem is that for invariant loads the dependence analysis legally ignores many dependencies, due to the special semantic rules for such loads. But later, the results calculated for an invariant load are retrieved from the cache for general-case accesses. As a result we have wrong dependence information, causing GVN to perform an illegal transformation. Fixes PR42151.

The proposed solution is to disable caching of invariant loads. I think such loads are pretty rare, and it doesn't make sense to extend the caching mechanism for them.

Reviewers: reames, chandlerc, skatkov, morisset, jdoerfert

Reviewed By: reames

Subscribers: hiraditya, test, jdoerfert, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64405
2019-11-19 17:30:02 +07:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
causes lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
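
For legacy passes the mechanical change looks roughly like this (a hedged sketch of the post-change idiom; `F` is the function being processed):

```
// The wrapper pass now hands back a per-function TLI view.
const TargetLibraryInfo &TLI =
    getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
```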

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Philip Reames 27820f9909 [Instruction] Add hasMetadata(Kind) helper [NFC]
It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans.
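
The wrapped idiom, sketched (`I` is any instruction; `MD_invariant_load` is one such boolean-like kind):

```
// Before: I.getMetadata(LLVMContext::MD_invariant_load) != nullptr
// After:
if (I.hasMetadata(LLVMContext::MD_invariant_load)) {
  // treat the load as invariant
}
```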

llvm-svn: 370933
2019-09-04 17:28:48 +00:00
Fedor Sergeev 92e160abab [MemDep] allow to select block-scan-limit when constructing MemoryDependenceAnalysis
Introducing non-global control for the default block-scan-limit in MemDep analysis.
This is useful when there are many compilations per initialized LLVM instance (e.g. a JIT).
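
A hedged sketch of the new knob (constructor parameter name assumed from this patch; the value is illustrative):

```
// A JIT embedder can lower the scan limit for its own MemDep instances
// without touching the global cl::opt default.
MemoryDependenceAnalysis MDA(/*DefaultBlockScanLimit=*/100);
```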

Reviewed By: asbirlea
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65806

llvm-svn: 368502
2019-08-10 01:23:38 +00:00
Florian Hahn 6c3024368c [MemDepAnalysis] Allow caller to pass in an OrderedBasicBlock.
If the caller can preserve the OBB, we can avoid recomputing the order
for each getDependency call.

Reviewers: efriedma, rnk, hfinkel

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D59788

llvm-svn: 357206
2019-03-28 19:17:31 +00:00
Jordan Rupprecht 6387fa2715 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00