14950 Commits

Author SHA1 Message Date
Matt Arsenault
27fe841aa6 AMDGPU: Refine rcp/rsq intrinsic folding for modern FP rules
We have to assume undef could be an snan, which would need quieting so
returning qnan is safer than undef. Also consider strictfp, and don't
care if the result rounded.
2020-05-23 13:28:36 -04:00
Michal Paszkowski
335de55fa3 Revert "Added a new IRCanonicalizer pass."
This reverts commit 14d358537f124a732adad1ec6edf3981dc9baece.
2020-05-23 13:51:43 +02:00
Michal Paszkowski
14d358537f Added a new IRCanonicalizer pass.
Summary:
Added a new IRCanonicalizer pass which aims to transform LLVM modules into
a canonical form by reordering and renaming instructions while preserving the
same semantics. The canonicalizer makes it easier to spot semantic differences
when diffing two modules which have undergone different passes.

Presentation: https://www.youtube.com/watch?v=c9WMijSOEUg

Reviewed by: plotfi

Differential Revision: https://reviews.llvm.org/D66029
2020-05-23 12:45:53 +02:00
Florian Hahn
7a325c14b4 [DSE,MSSA] Add additional multiblock tests. 2020-05-22 18:24:43 +01:00
Sanjay Patel
6438ea45e0 [VectorCombine] position pass after SLP in the optimization pipeline rather than before
There are 2 known problem patterns shown in the test diffs here:
vector horizontal ops (an x86 specialization) and vector reductions.

SLP has greater ability to match and fold those than vector-combine,
so let SLP have first chance at that.

This is a quick fix while we continue to improve vector-combine and
possibly canonicalize to reduction intrinsics.

In the longer term, we should improve matching of these patterns
because if they were created in the "bad" forms shown here, then we
would miss optimizing them.

I'm not sure what is happening with alias analysis on the addsub test.
The old pass manager now shows an extra line for that, and we see an
improvement that comes from SLP vectorizing a store. I don't know
what's missing with the new pass manager to make that happen.
Strangely, I can't reproduce the behavior if I compile from C++ with
clang and invoke the new PM with "-fexperimental-new-pass-manager".

Differential Revision: https://reviews.llvm.org/D80236
2020-05-22 12:22:44 -04:00
Sanjay Patel
2f7c24fe30 [InstCombine] (A + B) + B --> A + (B << 1)
This eliminates a use of 'B', so it can enable follow-on transforms
as well as improve analysis/codegen.

The PhaseOrdering test was added for D61726, and that shows
the limits of instcombine vs. real reassociation. We would
need to run some form of CSE to collapse that further.

The intermediate variable naming here is intentional because
there's a test at llvm/test/Bitcode/value-with-long-name.ll
that would break with the usual nameless value. I'm not sure
how to improve that test to be more robust.

The naming may also be helpful to debug regressions if this
change exposes weaknesses in the reassociation pass for example.
2020-05-22 11:46:59 -04:00
Sanjay Patel
b603794061 [InstCombine] add tests for adds with common operand; NFC 2020-05-22 11:46:59 -04:00
Sanjay Patel
5a230a19ad [PhaseOrdering] regenerate test checks; NFC
Remove some redundant/unnecessary bits too.
2020-05-22 10:10:47 -04:00
Anh Tuyen Tran
13bf6039c9 Title: [LV] Handle Fold-Tail of loops with vectorizarion factor equal to 1
Summary:
When handling loops whose VF is 1, fold-tail vectorization sets the
backedge taken count of the original loop with a vector of a single
element. This causes type-mismatch during instruction generartion.

The purpose of this patch is toto address the case of VF==1.

Reviewer: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn), gilr (Gil Rapaport), rengolin (Renato Golin)

Reviewed By: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn)

Subscribers: Ayal (Ayal Zaks), rkruppe (Hanna Kruppe), bmahjour (Bardia Mahjour), rogfer01 (Roger Ferrer Ibanez), vkmr (Vineet Kumar), bollu (Siddharth Bhat), hiraditya (Aditya Kumar), llvm-commits (Mailing List llvm-commits)

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D79976
2020-05-22 13:30:56 +00:00
Sanjay Patel
880df559f9 [SLP] fix test to have valid IR; NFC
This test was failing verification because the
metadata is ill-formed. This commit is split
from D80401 because it is an independent fix
(although the test would break with that change).
2020-05-22 09:06:02 -04:00
Matt Arsenault
88c20fa3d2 InstCombine: Add constant folding/simplify for amdgcn.ldexp intrinsic
This really belongs in InstructionSimplify since it doesn't introduce
new instructions. Put it in instcombine to avoid increasing the number
of passes considering target intrinsics.

I also noticed that we seem to now be interpreting strictfp attributes
on call sites, so try to handle that.
2020-05-22 08:21:38 -04:00
Jon Roelofs
5a8db275f8 Revert "[llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC"
This reverts commit 183d6af081899973f00fc24aeafcfc32de732f02.

Revert pending further consensus building: https://reviews.llvm.org/D79963#2050521
2020-05-22 05:36:15 -06:00
Max Kazantsev
403810557b [InstCombine] Sink pure instructions down to return and unreachable blocks
If the only user of `Instr` is in a return or unreachable block, we can
sink `Instr` to the`User` safely (unless it reads/writes memory).
Return or unreachable blocks are guaranteed to execute zero
or one time, and `Instr` always dominates `User`, so they either will
be executed together (execution of `User` always implies execution
of `Instr`) or not executed at all.

Differential Revision: https://reviews.llvm.org/D80120
Reviewed By: asbirlea, jdoerfert
2020-05-22 14:33:42 +07:00
Vedant Kumar
77ffce6954 [Instruction] Set metadata uses to undef on deletion
Summary:
Replace any extant metadata uses of a dying instruction with undef to
preserve debug info accuracy. Some alternatives include:

- Treat Instruction like any other Value, and point its extant metadata
  uses to an empty ValueAsMetadata node. This makes extant dbg.value uses
  trivially dead (i.e. fair game for deletion in many passes), leading to
  stale dbg.values being in effect for too long.

- Call salvageDebugInfoOrMarkUndef. Not needed to make instruction removal
  correct. OTOH results in wasted work in some common cases (e.g. when all
  instructions in a BasicBlock are deleted).

This came up while discussing some basic cases in
https://reviews.llvm.org/D80052.

Reviewers: jmorse, TWeaver, aprantl, dexonsmith, jdoerfert

Subscribers: jholewinski, qcolombet, hiraditya, jfb, sstefan1, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80264
2020-05-21 15:58:12 -07:00
Hiroshi Yamauchi
b5c59d77c3 [ProfileSummary] Add the PartialProfileRatio field in ProfileSummary metadata.
Summary:
PartialProfileRatio approximately represents the ratio of the number of profile
counters of the program being built to the number of profile counters in the
partial sample profile. It is used to scale the working set size under the
partial sample profile to reflect the size of the program being built and to
improve the working set size heuristics.

This is a split from D79831.

Reviewers: davidxl

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79951
2020-05-21 09:12:23 -07:00
Jon Roelofs
5fb979dd06 [llvm][test] Add missing FileCheck colons. NFC 2020-05-21 09:29:27 -06:00
Jon Roelofs
183d6af081 [llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC
Differential Revision: https://reviews.llvm.org/D79963
2020-05-21 09:29:27 -06:00
Ehud Katz
111ddc57d3 [FlattenCFG] Fix MergeIfRegion in case then-path is empty
In case the then-path of an if-region is empty, then merging with the
else-path should be handled with the inverse of the condition (leading
to that path).

Fix PR37662

Differential Revision: https://reviews.llvm.org/D78881
2020-05-21 14:06:44 +03:00
Roman Lebedev
b2df961231
[IndVarSimplify][LoopUtils] Avoid TOCTOU/ordering issues (PR45835)
Summary:
Currently, `rewriteLoopExitValues()`'s logic is roughly as following:
> Loop over each incoming value in each PHI node.
> Query whether the SCEV for that incoming value is high-cost.
> Expand the SCEV.
> Perform sanity check (`isValidRewrite()`, D51582)
> Record the info
> Afterwards, see if we can drop the loop given replacements.
> Maybe perform replacements.

The problem is that we interleave SCEV cost checking and expansion.
This is A Problem, because `isHighCostExpansion()` takes special care
to not bill for the expansions that were already expanded, and we can reuse.

While it makes sense in general - if we know that we will expand some SCEV,
all the other SCEV's costs should account for that, which might cause
some of them to become non-high-cost too, and cause chain reaction.

But that isn't what we are doing here. We expand *all* SCEV's, unconditionally.
So every next SCEV's cost will be affected by the already-performed expansions
for previous SCEV's. Even if we are not planning on keeping
some of the expansions we performed.

Worse yet, this current "bonus" depends on the exact PHI node
incoming value processing order. This is completely wrong.

As an example of an issue, see @dmajor's `pr45835.ll` - if we happen to have
a PHI node with two(!) identical high-cost incoming values for the same basic blocks,
we would decide first time around that it is high-cost, expand it,
and immediately decide that it is not high-cost because we have an expansion
that we could reuse (because we expanded it right before, temporarily),
and replace the second incoming value but not the first one;
thus resulting in a broken PHI.

What we instead should do for now, is not perform any expansions
until after we've queried all the costs.

Later, in particular after `isValidRewrite()` is an assertion (D51582)
we could improve upon that, but in a more coherent fashion.

See [[ https://bugs.llvm.org/show_bug.cgi?id=45835 | PR45835 ]]

Reviewers: dmajor, reames, mkazantsev, fhahn, efriedma

Reviewed By: dmajor, mkazantsev

Subscribers: smeenai, nikic, hiraditya, javed.absar, llvm-commits, dmajor

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79787
2020-05-21 13:05:55 +03:00
Sjoerd Meijer
b0614509a0 [HardwareLoops] llvm.loop.decrement.reg definition
This is split off from D80316, slightly tightening the definition of overloaded
hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands
its result have the same type.
2020-05-21 10:48:16 +01:00
Juneyoung Lee
58f7c938a1 add a test for D77524 2020-05-21 10:03:27 +09:00
Juneyoung Lee
d9a4a24413 Add CanonicalizeFreezeInLoops pass
Summary:
If an induction variable is frozen and used, SCEV yields imprecise result
because it doesn't say anything about frozen variables.

Due to this reason, performance degradation happened after
https://reviews.llvm.org/D76483 is merged, causing
SCEV yield imprecise result and preventing LSR to optimize a loop.

The suggested solution here is to add a pass which canonicalizes frozen variables
inside a loop. To be specific, it pushes freezes out of the loop by freezing
the initial value and step values instead & dropping nsw/nuw flags from instructions used by freeze.
This solution was also mentioned at https://reviews.llvm.org/D70623 .

Reviewers: spatel, efriedma, lebedev.ri, fhahn, jdoerfert

Reviewed By: fhahn

Subscribers: nikic, mgorny, hiraditya, javed.absar, llvm-commits, sanwou01, nlopes

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77523
2020-05-21 09:29:29 +09:00
Eli Friedman
f26bdb539e Make Value::getPointerAlignment() return an Align, not a MaybeAlign.
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.

Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer.  This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated.  I
updated clang's handling of C++ references, and added a release note for
this.

Differential Revision: https://reviews.llvm.org/D80072
2020-05-20 16:37:20 -07:00
Roman Lebedev
55430f53f3
[InstCombine] insertelement is negatible if both sources are negatible
----------------------------------------
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %src
  %t1 = sub i4 0, %a
  %t2 = insertelement <2 x i4> %t0, i4 %t1, i32 %x
  %t3 = sub <2 x i4> %b, %t2
  ret <2 x i4> %t3
}
=>
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t2.neg = insertelement <2 x i4> %src, i4 %a, i32 %x
  %t3 = add <2 x i4> %t2.neg, %b
  ret <2 x i4> %t3
}
Transformation seems to be correct!
2020-05-20 21:44:31 +03:00
Roman Lebedev
a6097cebe9
[NFC][InstCombine] Negator: tests for insertelement negation 2020-05-20 21:44:31 +03:00
Roman Lebedev
ebed96fdbf
[InstCombine] Negator: extractelement is negatible if src is negatible
----------------------------------------
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1 = extractelement <2 x i4> %t0, i32 %y
  %t2 = sub i4 %z, %t1
  ret i4 %t2
}
=>
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1.neg = extractelement <2 x i4> %x, i32 %y
  %t2 = add i4 %t1.neg, %z
  ret i4 %t2
}
Transformation seems to be correct!
2020-05-20 21:44:31 +03:00
Roman Lebedev
952e7106b3
[NFC][InstCombine] Negator: tests for extractelement negation 2020-05-20 21:44:30 +03:00
Arthur Eubanks
8a88755610 Reland [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 11:25:44 -07:00
Arthur Eubanks
b8cbff51d3 Revert "[X86] Codegen for preallocated"
This reverts commit 810567dc691a57c8c13fef06368d7549f7d9c064.

Some tests are unexpectedly passing
2020-05-20 10:04:55 -07:00
Sanjay Patel
ad953a1ae1 [InstCombine] add tests for reassociative fsub/fadd expressions; NFC 2020-05-20 12:45:27 -04:00
Arthur Eubanks
810567dc69 [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 09:20:38 -07:00
Max Kazantsev
20de2323a0 [Test] Add missing auto-generated checks into tests 2020-05-20 12:03:37 +07:00
Sanjay Patel
67ecd8cbf5 [PGOProfile] make test less brittle; NFC
This test may fail just from cosmetic diffs because the values change names.
This is a minimal diff to work-around that, but more may be needed.
2020-05-19 15:56:29 -04:00
Sanjay Patel
348da7eec3 [PhaseOrdering] add tests for x86 horizontal math ops (PR41813); NFC 2020-05-19 14:54:19 -04:00
Sanjay Patel
6d953693fe [PhaseOrdering] make different pass manager runs equivalent; NFC
I don't see any difference from the 'avx' setting, so leaving that
off until there's a need for it.
2020-05-19 14:52:57 -04:00
Jay Foad
9bc989a48d [InstCombine] Remove hasNoInfs check for pow(C,y) -> exp2(log2(C)*y)
We already check hasNoNaNs and that x is finite and strictly positive.
That only leaves the following special cases (taken from the Linux man
page for pow):

If x is +1, the result is 1.0 (even if y is a NaN).
If the absolute value of x is less than 1, and y is negative infinity, the result is positive infinity.
If the absolute value of x is greater than 1, and y is negative infinity, the result is +0.
If the absolute value of x is less than 1, and y is positive infinity, the result is +0.
If the absolute value of x is greater than 1, and y is positive infinity, the result is positive infinity.

The first case is handled elsewhere, and this transformation preserves
all the others, so there is no need to limit it to hasNoInfs.

Differential Revision: https://reviews.llvm.org/D79409
2020-05-19 17:06:05 +01:00
Sameer Sahasrabuddhe
6c84884366 [LoopSimplify] don't separate nested loops with convergent calls
Summary:
When a loop has multiple backedges, loop simplification attempts to
separate them out into nested loops. This results in incorrect control
flow in the presence of some functions like a GPU barrier. This change
skips the transformation when such "convergent" function calls are
present in the loop body.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D80078
2020-05-19 09:22:39 +05:30
Ayal Zaks
682e739638 [LV] Fix FoldTail under user VF and UF
LV considers an internally computed MaxVF to decide if a constant trip-count is
a multiple of any subsequently chosen VF, and conclude that no scalar remainder
iterations (tail) will be left for Fold Tail to handle. If an external VF is
provided via -force-vector-width, it must be considered instead of the internal
MaxVF.
If an external UF is provided via -force-vector-interleave, it too must be
considered in addition to MaxVF or user VF.

Fixes PR45679.

Differential Revision: https://reviews.llvm.org/D80085
2020-05-19 01:32:25 +03:00
Volkan Keles
63081dc6f6 LoadStoreVectorizer: Match nested adds to prove vectorization is safe
If both OpA and OpB is an add with NSW/NUW and with the same LHS operand,
we can guarantee that the transformation is safe if we can prove that OpA
won't overflow when IdxDiff added to the RHS of OpA.

Review: https://reviews.llvm.org/D79817
2020-05-18 12:13:01 -07:00
Vedant Kumar
623b254244 [Local] Do not ignore zexts in salvageDebugInfo, PR45923
Summary:
When salvaging a dead zext instruction, append a convert operation to
the DIExpressions of the debug uses of the instruction, to prevent the
salvaged value from being sign-extended.

I confirmed that lldb prints out the correct unsigned result for "f" in
the example from PR45923 with this changed applied.

rdar://63246143

Reviewers: aprantl, jmorse, chrisjackson, davide

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80034
2020-05-18 09:52:02 -07:00
Max Kazantsev
a2a4e5aae8 [Test] Opportunity for sinking to unreachable in InstCombine 2020-05-18 16:27:16 +07:00
Roman Lebedev
fde8eb00e1
[InstCombine] visitMaskedMerge(): when unfolding, sanitize undef constants (PR45955)
We can't leave undef vector element constants as-is,
it is a miscompile, so we need to sanitize them.

We have two vectors (C and ~C):
* We can't replace undef with 0 in both of them
* We can't replace undef with 0 in only one of them
* We could replace undef with -1 in both of them
* We could replace undef with -1 in only one(!) of them
* We could replace undef with -1 in one and 0 in another one of them.

Therefore, it seems best to go with the last option, since otherwise
we'd loose knowledge that C and ~C have no common bits set,
which seems more important than preserving partial undef knowledge.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45955
2020-05-17 22:53:03 +03:00
Sanjay Patel
130a2356ae [InstCombine] add tests for FP cast of cast; NFC
A fold of casts is proposed as a backend transform in D79187,
but we can also do that in IR (and that may obsolete the need
for a backend transform).
2020-05-17 11:42:07 -04:00
Sanjay Patel
bfd512160f [InstCombine] improve analysis of FP->int->FP to eliminate fpextend
This was originally in D79116.
Converting from a narrow-enough FP source value to integer and
back to FP guarantees that the conversion to FP is exact because
of UB/poison-on-overflow.

This was suggested in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
2020-05-17 09:06:57 -04:00
Florian Hahn
b54a663312 [LoopUnroll] Extend test case with additional loop with larger TC. 2020-05-17 13:55:11 +01:00
Florian Hahn
9e2a99e5b7 [LoopUnroll] Precommit test for PR459393. 2020-05-17 13:29:36 +01:00
Eli Friedman
4f04db4b54 AllocaInst should store Align instead of MaybeAlign.
Along the lines of D77454 and D79968.  Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.

Differential Revision: https://reviews.llvm.org/D80044
2020-05-16 14:53:16 -07:00
Sanjay Patel
81e9ede3a2 [VectorCombine] forward walk through instructions to improve chaining of transforms
This is split off from D79799 - where I was proposing to fully iterate
over a function until there are no more transforms. I suspect we are
still going to want to do something like that eventually.

But we can achieve the same gains much more efficiently on the current
set of regression tests just by reversing the order that we visit the
instructions.

This may also reduce the motivation for D79078, but we are still not
getting the optimal pattern for a reduction.
2020-05-16 13:08:01 -04:00
Sanjay Patel
43017ceb78 [PhaseOrdering] add vector reduction tests; NFC
These are based on tests originally included in:
D79078
2020-05-16 12:51:10 -04:00
Sanjay Patel
6211830fba [VectorCombine] add reduction-like patterns; NFC
These are based on tests originally included in:
D79078
2020-05-16 12:45:01 -04:00