34040 Commits

Author SHA1 Message Date
Simon Pilgrim
1603106725 [TargetLowering] Improve expandFunnelShift shift amount masking
For the 'inverse shift', we currently always perform a subtraction of the original (masked) shift amount.

But for the case where we are handling power-of-2 type widths, we can replace:

(sub bw-1, (and amt, bw-1) ) -> (and (xor amt, bw-1), bw-1) -> (and ~amt, bw-1)

This allows x86 shifts to fold away the and-mask.
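The chain of rewrites above can be checked by brute force. Below is a minimal C++ sketch using plain unsigned arithmetic, not SelectionDAG code. The function names are made up for illustration. Because `(and amt, bw-1)` only has bits that are also set in `bw-1`, subtracting it from `bw-1` is the same as XOR-ing it in.

```cpp
#include <cassert>
#include <cstdint>

// Original inverse shift amount: (sub bw-1, (and amt, bw-1)).
uint32_t invShiftSub(uint32_t amt, uint32_t bw) {
  return (bw - 1) - (amt & (bw - 1));
}

// Rewritten form: (and ~amt, bw-1).
uint32_t invShiftNot(uint32_t amt, uint32_t bw) {
  // The and-mask only equals "amt mod bw" when bw is a power of 2, which is
  // the precondition for using the masked form in the first place.
  assert(bw != 0 && (bw & (bw - 1)) == 0 && "bw must be a power of 2");
  return ~amt & (bw - 1);
}

// Exhaustively compare both forms for all 16-bit amounts and a few
// power-of-2 widths.
bool formsAgree() {
  const uint32_t widths[] = {8, 16, 32, 64};
  for (uint32_t bw : widths)
    for (uint32_t amt = 0; amt <= 0xFFFF; ++amt)
      if (invShiftSub(amt, bw) != invShiftNot(amt, bw))
        return false;
  return true;
}
```

The `~amt` form needs no subtraction, and the final `and` with `bw-1` is exactly what x86 shift instructions apply implicitly to their count operand.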

Followup to D77301 + D80466.

http://volta.cs.utah.edu:8080/z/Nod0Gr

Differential Revision: https://reviews.llvm.org/D80489
2020-05-24 11:25:09 +01:00
Simon Pilgrim
8310c9b741 [X86][AVX] Call SimplifyDemandedBits on MaskedLoadSDNode with non-boolean masks
On X86 (AVX1/AVX2), non-boolean masked loads only demand the sign bit of the mask; we already do the equivalent for masked stores.

Annoyingly I can't easily handle this inside TargetLowering::SimplifyDemandedBits as this is an x86 specific case for a generic node.
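Here is a scalar C++ model of why only the sign bit matters. It assumes VMASKMOV-style semantics: lane i is loaded when bit 31 of its mask element is set, and is zeroed otherwise. `maskedLoad4` and `signBitIsEnough` are hypothetical helpers, not LLVM or intrinsic APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane i is loaded iff the sign bit of mask[i] is set; otherwise it reads 0.
// No other mask bit is inspected, so only the sign bits are "demanded".
void maskedLoad4(const int32_t* src, const int32_t mask[4], int32_t out[4]) {
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = (mask[i] < 0) ? src[i] : 0;
}

// Clearing every non-sign bit of the mask (what a demanded-bits
// simplification is allowed to do) leaves the loaded values unchanged.
bool signBitIsEnough(const int32_t* src, const int32_t mask[4]) {
  int32_t full[4], reduced[4], signOnly[4];
  for (std::size_t i = 0; i < 4; ++i)
    signOnly[i] = (mask[i] < 0) ? INT32_MIN : 0;
  maskedLoad4(src, mask, full);
  maskedLoad4(src, signOnly, reduced);
  for (std::size_t i = 0; i < 4; ++i)
    if (full[i] != reduced[i])
      return false;
  return true;
}
```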

Differential Revision: https://reviews.llvm.org/D80478
2020-05-24 09:51:21 +01:00
Simon Pilgrim
cc65a7a5ea [X86] Improve i8 + 'slow' i16 funnel shift codegen
This is a preliminary patch before I deal with the xor+and issue raised in D77301.

We get much better code for i8/i16 funnel shifts by concatenating the operands together and performing the shift as a double-width type; this avoids repeated use of the shift amount and of partial registers.

fshl(x,y,z) -> trunc((((zext(x) << bw) | zext(y)) << (z & (bw-1))) >> bw).
fshr(x,y,z) -> trunc(((zext(x) << bw) | zext(y)) >> (z & (bw-1))).

Alive2: http://volta.cs.utah.edu:8080/z/CZx7Cn
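To sanity-check the i8 case, the sketch below compares a straightforward funnel shift against the form that concatenates the operands and shifts once. It is plain C++ with bw = 8, an illustration rather than the X86 lowering code. The fshl form keeps the upper half of the shifted concatenation, and the fshr form keeps the lower half.

```cpp
#include <cassert>
#include <cstdint>

// Reference funnel shifts on 8-bit values; the amount is taken modulo 8.
uint8_t fshlRef(uint8_t x, uint8_t y, uint8_t z) {
  const unsigned s = z % 8;
  if (s == 0) return x;
  return static_cast<uint8_t>((x << s) | (y >> (8 - s)));
}
uint8_t fshrRef(uint8_t x, uint8_t y, uint8_t z) {
  const unsigned s = z % 8;
  if (s == 0) return y;
  return static_cast<uint8_t>((y >> s) | (x << (8 - s)));
}

// Double-width versions: build x:y as (zext(x) << bw) | zext(y), shift once
// by (z & (bw-1)), then truncate back to 8 bits.
uint8_t fshlWide(uint8_t x, uint8_t y, uint8_t z) {
  const uint32_t cat = (uint32_t(x) << 8) | y;
  return static_cast<uint8_t>((cat << (z & 7)) >> 8);
}
uint8_t fshrWide(uint8_t x, uint8_t y, uint8_t z) {
  const uint32_t cat = (uint32_t(x) << 8) | y;
  return static_cast<uint8_t>(cat >> (z & 7));
}
```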

This doesn't do as well for i32 cases on x86_64 (the xor+and followup patch is much better) so I haven't bothered with that.

Cases with constant amounts are more dubious as well so I haven't currently bothered with those - it's these kinds of 'edge' cases that put me off trying to put this in TargetLowering::expandFunnelShift.

Differential Revision: https://reviews.llvm.org/D80466
2020-05-24 08:08:53 +01:00
Amara Emerson
99660217e9 [AArch64][GlobalISel] When generating SUBS for compares, don't write to wzr/xzr.
Although writing to wzr/xzr is correct, since we only care about the flags and
not the result of the sub, doing so causes tail merging of blocks to fail.

Writing to an unused virtual register instead allows the optimization to fire,
improving performance significantly on 256.bzip2.

Differential Revision: https://reviews.llvm.org/D80460
2020-05-23 22:59:49 -07:00
Amy Kwan
b631f86ac5 [TLI][PowerPC] Introduce TLI query to check if MULH is cheaper than MUL + SHIFT
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.

Currently, DAG Combine transforms mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.

This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.
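The following C++ sketch shows the combine's expansion, written out for i32. It illustrates only the arithmetic. The TLI hook decides which of the two equivalent forms a target keeps.

```cpp
#include <cassert>
#include <cstdint>

// MULHU on i32 as "wider multiply + shift": zero-extend to i64, multiply,
// shift right by 32, truncate.
uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((uint64_t(a) * uint64_t(b)) >> 32);
}

// MULHS on i32: sign-extend, multiply, arithmetic shift right by 32, truncate.
// (>> on a negative int64_t is arithmetic on mainstream compilers and is
// guaranteed to be so since C++20.)
int32_t mulhs32(int32_t a, int32_t b) {
  return static_cast<int32_t>((int64_t(a) * int64_t(b)) >> 32);
}
```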

Differential Revision: https://reviews.llvm.org/D78271
2020-05-23 16:47:12 -05:00
Nikita Popov
2833c46f75 [DwarfEHPrepare] Don't prune unreachable resumes at optnone
Disable pruning of unreachable resumes in the DwarfEHPrepare pass
at optnone. While I expect the pruning itself to be essentially free,
this does require a dominator tree calculation that is not used for
anything else. Saving this DT construction makes for a 0.4% O0
compile-time improvement.

Differential Revision: https://reviews.llvm.org/D80400
2020-05-23 20:58:01 +02:00
Nikita Popov
0c6bba71e3 [TargetPassConfig] Don't add alias analysis at optnone
When performing codegen at optnone, don't add alias analysis to
the pipeline. We don't need it, but it causes an unnecessary
dominator tree calculation.

I've also moved the module verifier call to the top so that a bunch
of disabled-at-optnone passes group more nicely.

Differential Revision: https://reviews.llvm.org/D80378
2020-05-23 10:35:03 +02:00
Stanislav Mekhanoshin
62fb3fa6d9 [AMDGPU] Define 6 dword subregs
This prevents autogeneration of degenerate names for these.

Differential Revision: https://reviews.llvm.org/D80451
2020-05-22 13:53:29 -07:00
Sanjay Patel
024098ae53 [VectorCombine] set preserve alias analysis
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
2020-05-22 16:25:16 -04:00
Jean-Michel Gorius
65cd2c7a80 Revert "[CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias"
This temporarily reverts commit 7019cea26dfef5882c96f278c32d0f9c49a5e516.

It seems that, for some targets, there are instructions with a lot of memory operands (probably more than would be expected). This causes a lot of buildbots to time out and report failed builds. While investigations to find out why this happens are ongoing, revert the changes.
2020-05-22 21:26:46 +02:00
Sanjay Patel
6438ea45e0 [VectorCombine] position pass after SLP in the optimization pipeline rather than before
There are 2 known problem patterns shown in the test diffs here:
vector horizontal ops (an x86 specialization) and vector reductions.

SLP has greater ability to match and fold those than vector-combine,
so let SLP have first chance at that.

This is a quick fix while we continue to improve vector-combine and
possibly canonicalize to reduction intrinsics.

In the longer term, we should improve matching of these patterns
because if they were created in the "bad" forms shown here, then we
would miss optimizing them.

I'm not sure what is happening with alias analysis on the addsub test.
The old pass manager now shows an extra line for that, and we see an
improvement that comes from SLP vectorizing a store. I don't know
what's missing with the new pass manager to make that happen.
Strangely, I can't reproduce the behavior if I compile from C++ with
clang and invoke the new PM with "-fexperimental-new-pass-manager".

Differential Revision: https://reviews.llvm.org/D80236
2020-05-22 12:22:44 -04:00
Simon Pilgrim
c479052a74 [CGP] Ensure address offset is representable as int64_t
AddressingModeMatcher::matchAddr was calling getSExtValue for a constant before ensuring that the value can actually be represented as an int64_t.

Fixes PR46004 / OSSFuzz#22357
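The check that must come before getSExtValue can be modelled without LLVM. In this sketch, a constant wider than 64 bits is stored as two 64-bit words. `Wide128` and `asInt64` are hypothetical stand-ins for a wide constant and the guarded conversion. The value fits in int64_t only if the high word is the sign extension of the low word.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical 128-bit two's-complement constant split into two words.
struct Wide128 {
  uint64_t hi;
  uint64_t lo;
};

// Returns the value as int64_t only if it is representable; otherwise the
// caller should give up on the addressing-mode match instead of truncating.
std::optional<int64_t> asInt64(const Wide128& v) {
  const uint64_t signExt = (v.lo >> 63) ? ~uint64_t{0} : uint64_t{0};
  if (v.hi != signExt)
    return std::nullopt;
  return static_cast<int64_t>(v.lo);  // two's-complement reinterpretation
}
```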
2020-05-22 17:00:22 +01:00
Xiangling_Liao
2419dce5d1 [NFC][AIX] Remove spaces after the comma for '.csect' directive
To be consistent with other directives like '.comm', '.lcomm', we remove
the spaces after the comma for '.csect' on AIX.

Differential Revision: https://reviews.llvm.org/D80247
2020-05-22 11:10:32 -04:00
Matt Arsenault
66fe60220c AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks
Unlike SelectionDAGBuilder, IRTranslator omits the unconditional
branch in fallthrough cases. Confusingly, the control flow pseudos
function in the opposite way the intrinsics are used, and the branch
targets always need to be swapped. We're inverting the target blocks,
so we need to figure out the old fallthrough block and insert a branch
to the original unconditional branch target.
2020-05-22 10:31:44 -04:00
Nemanja Ivanovic
1a493b0fa5 [PowerPC] Add missing handling for half precision
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776

Differential revision: https://reviews.llvm.org/D79283
2020-05-22 07:50:11 -05:00
Jon Roelofs
5a8db275f8 Revert "[llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC"
This reverts commit 183d6af081899973f00fc24aeafcfc32de732f02.

Revert pending further consensus building: https://reviews.llvm.org/D79963#2050521
2020-05-22 05:36:15 -06:00
Dmitry Preobrazhensky
933ebc4078 [AMDGPU][MC][GFX8+] Enabled clamp for v_mul_i32_i24_e64 and v_mul_u32_u24_e64
See bug 45925: https://bugs.llvm.org/show_bug.cgi?id=45925

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D80287
2020-05-22 14:11:31 +03:00
QingShan Zhang
d1076d729a [NFC][Test] Add test coverage for fsqrt on PowerPC
2020-05-22 10:59:27 +00:00
Victor Campos
872ee78f65 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 8a12553223180246eeafaa0fa7bfa11e834d34b6.

A bug has been found when generating code for Thumb2. In some very
specific cases, the prologue/epilogue emitter generates erroneous stack
offsets for the new LDRD instructions that access the stack.

This bug does not seem to be caused by the reverted patch though. Likely
the latter has made an undiscovered issue emerge in the
prologue/epilogue emission pass. Nevertheless, this reversion is
necessary since it is blocking users of the ARM backend.
2020-05-22 11:01:57 +01:00
Jessica Paquette
49a4f3f7d8 [AArch64][GlobalISel] Add a post-legalizer combiner with a very simple combine.
(This patch is by Jessica, I'm just committing it on her behalf because I need
a post-legalizer combiner for something else).

This supersedes D77250, which did equivalent work in the selector. This can be
done pre-legalization or post-legalization. Post-legalization is more likely to
hit, since G_IMPLICIT_DEFs tend to appear during legalization. There's no reason
not to do it pre-legalization, though; if it can be caught earlier, great.

(I also think that it might be worth reimplementing D78769 using a
target-specific post-legalization combine too after thinking about it for a
while.)

Differential Revision: https://reviews.llvm.org/D78852
2020-05-21 18:47:32 -07:00
Alexey Lapshin
bf242c067e [AARCH64][NEON] Allow to sink operands of aarch64_neon_pmull64.
Summary:
This patch fixes a problem where the pmull2 instruction is not
generated for the vmull_high_p64 intrinsic.

ISel has a pattern for the int_aarch64_neon_pmull64 intrinsic to generate
the PMULL2 instruction. That pattern assumes that the extraction operations
are located in the same basic block, so we need to sink them
if they are not. Handle the operands of int_aarch64_neon_pmull64
in AArch64TargetLowering::shouldSinkOperands.

Reviewed by: efriedma

Differential Revision: https://reviews.llvm.org/D80320
2020-05-22 01:35:24 +03:00
Arthur Eubanks
fc937806ef Don't jump to landing pads in Control Flow Optimizer
Summary: Likely fixes https://bugs.llvm.org/show_bug.cgi?id=45858.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80047
2020-05-21 15:19:10 -07:00
Tim Renouf
d13a508820 [AMDGPU] Fixed incorrect PAL metadata register naming
This only affects assembly and -filetype=asm codegen of PAL metadata.

Differential Revision: https://reviews.llvm.org/D78860

Change-Id: I7b822e1917bf7b403486820d31afc483be207652
2020-05-21 22:13:19 +01:00
Jean-Michel Gorius
7019cea26d [CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias
Summary:
To support all targets, the mayAlias member function needs to support instructions with multiple memory operands.

This revision also changes the order of the emitted instructions in some test cases.
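Here is a simplified model of the conservative rule for instructions that carry several memory operands: report a possible alias if any pair of operands may overlap. `MemOp` and `mayAliasAll` are hypothetical stand-ins that track byte ranges only, with no IR values or AA queries. They are not the MachineInstr API. Note that the pairwise walk performs |A| * |B| checks, so its cost grows with the number of memory operands on both instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified memory operand: a byte range [offset, offset+size).
struct MemOp {
  int64_t offset;
  int64_t size;
};

bool rangesOverlap(const MemOp& a, const MemOp& b) {
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// Two instructions may alias if any pair of their memory operands may
// overlap; with no operand information at all, conservatively say yes.
bool mayAliasAll(const std::vector<MemOp>& a, const std::vector<MemOp>& b) {
  if (a.empty() || b.empty())
    return true;
  for (const MemOp& x : a)
    for (const MemOp& y : b)
      if (rangesOverlap(x, y))
        return true;
  return false;
}
```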

Reviewers: efriedma, hfinkel, craig.topper, dmgreen

Reviewed By: efriedma

Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80161
2020-05-21 23:02:54 +02:00
Stanislav Mekhanoshin
689e616ed0 [AMDGPU] Promote alloca to vector in opt
Promote alloca to vector before SROA and loop unroll. If we manage
to eliminate allocas before unroll we may choose to unroll less.

Differential Revision: https://reviews.llvm.org/D80386
2020-05-21 13:49:51 -07:00
Eli Friedman
f09d220c71 [AArch64][SVE] Fill out missing unpredicated load/store patterns.
The set of patterns for unpredicated load/store was incomplete: it only
included non-extending stores.  Fill out the remaining patterns for
extending stores, and add the corresponding support to frame offset
lowering.

Differential Revision: https://reviews.llvm.org/D80349
2020-05-21 13:29:30 -07:00
Stanislav Mekhanoshin
71bbe5d799 [AMDGPU] Added opt pipeline test. NFC.
2020-05-21 11:58:35 -07:00
Stanislav Mekhanoshin
1dfd1b3e4b [AMDGPU] Tune threshold for cmp/select vector lowering
The threshold was expressed as a total vector size, while the intent was to limit
the number of instructions. Now that the lowering also works with doubles, the
threshold needs to be updated.

Differential Revision: https://reviews.llvm.org/D80322
2020-05-21 08:59:35 -07:00
Jon Roelofs
5fb979dd06 [llvm][test] Add missing FileCheck colons. NFC
2020-05-21 09:29:27 -06:00
Jon Roelofs
183d6af081 [llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC
Differential Revision: https://reviews.llvm.org/D79963
2020-05-21 09:29:27 -06:00
Sjoerd Meijer
b0614509a0 [HardwareLoops] llvm.loop.decrement.reg definition
This is split off from D80316. It slightly tightens the definition of the overloaded
hardware-loop intrinsic llvm.loop.decrement.reg, specifying that both of its operands
and its result have the same type.
2020-05-21 10:48:16 +01:00
Denis Antrushin
dedcefe09d [Statepoint] Constant fold FP deopt args.
We do not have any special handling for constant FP deopt arguments.
They are just spilled to the stack or materialized in a register by a MOVS
instruction. This is inefficient and, when we have too many such
constant arguments, may result in register allocation failure.
Instead, we can bitcast such constant FP operands to an appropriately
sized integer and record them as constants in the statepoint and, later, in the
StackMap.
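The following host C++ sketch shows the bitcast idea. The constant's bit pattern is reinterpreted as a same-sized integer, and whoever reads the stack map can recover the exact value from those bits. The helper names are hypothetical. The patch itself operates on SelectionDAG constants, not host doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8 && sizeof(float) == 4, "IEEE sizes assumed");

// Reinterpret an FP constant as a same-sized integer, so it can be recorded
// as a plain integer constant instead of occupying a register or stack slot.
uint64_t fpConstantBits(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}
uint32_t fpConstantBits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// The consumer recovers the original value from the recorded bits.
double bitsToDouble(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}
```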

Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D80318
2020-05-21 11:02:54 +03:00
Chen Zheng
8086cdd1b0 [PowerPC] add more high latency opcodes for machine combiner pass
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D80097
2020-05-21 02:39:20 -04:00
Craig Topper
ae5ab2f40a [LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian instead of the pointers.
Will make it easier to pass the pointer info and alignment
correctly to the loads/stores.

While there also make the i32 stores independent and use a token
factor to join before the load.
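The endianness change can be modelled in host C++. This sketch assumes the i32 expansion is the classic 2^52 "magic number" construction. That assumption comes from the general technique, not from this commit message. In the sketch, two independent 32-bit stores build the double 2^52 + lo in an 8-byte slot, the slot is reloaded as a double, and a bias is subtracted. `int32ToDouble` and `hostIsLittleEndian` are hypothetical helpers. The model also assumes doubles use the same byte order as integers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "64-bit IEEE double assumed");

// True when the host stores the least significant byte first.
bool hostIsLittleEndian() {
  const uint16_t probe = 1;
  unsigned char firstByte;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Endianness is handled by choosing which *value* goes to offset 0 and which
// goes to offset 4; the two store addresses never change.
double int32ToDouble(uint32_t bits, bool isSigned) {
  uint32_t lo = bits;
  if (isSigned)
    lo ^= 0x80000000u;              // maps signed v to v + 2^31 as unsigned
  const uint32_t hi = 0x43300000u;  // sign 0, exponent for 2^52

  const bool little = hostIsLittleEndian();
  const uint32_t atOffset0 = little ? lo : hi;
  const uint32_t atOffset4 = little ? hi : lo;
  unsigned char slot[8];
  std::memcpy(slot, &atOffset0, 4);
  std::memcpy(slot + 4, &atOffset4, 4);

  double built;  // bit pattern (hi << 32) | lo, i.e. 2^52 + lo
  std::memcpy(&built, slot, sizeof built);
  // Bias is 2^52 for unsigned inputs and 2^52 + 2^31 for signed ones.
  const double bias = isSigned ? 4503601774854144.0 : 4503599627370496.0;
  return built - bias;
}
```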
2020-05-20 22:29:59 -07:00
Eli Friedman
b4f9b34701 [AArch64] Fix unwind info generated by outliner.
The offsets were wrong. The result is now the same as what the compiler
would generate for a function that spills lr normally.

Differential Revision: https://reviews.llvm.org/D80238
2020-05-20 16:39:00 -07:00
Francis Visoiu Mistrih
770ba4f051 [AArch64] Fix GlobalISel tests on non-darwin platforms
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/6998
2020-05-20 16:26:58 -07:00
Francis Visoiu Mistrih
161122ea1c [AArch64] Provide Darwin variants of most calling conventions
With the new SVE stack layout, we now need to provide a Darwin variant
for all the calling conventions based on the main AAPCS CSR save order.

This also changes APCS_SwiftError to have a Darwin and a non-Darwin
version, assuming it could be used on other platforms these days, and
restricts the AArch64_CXX_TLS calling convention to Darwin.

Differential Revision: https://reviews.llvm.org/D73805
2020-05-20 16:03:48 -07:00
Stanislav Mekhanoshin
4eecf17164 [AMDGPU] Always expand ext/insertelement with divergent idx
Even though a series of cmp/cndmask instructions can produce quite a lot of
code, that is still better than a loop. In the case of doubles we
would even produce two loops.
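The expansion preferred here can be sketched as a chain of compare-and-select steps, one per element, so that a divergent index never forces a loop. This is plain C++ with hypothetical helper names, not the AMDGPU lowering itself. Each step stands in for one compare + select pair.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// extractelement with a non-uniform index: every element is compared against
// idx and conditionally selected. An out-of-range idx yields v[0] here (in
// LLVM IR the result would be poison).
template <typename T, std::size_t N>
T extractBySelects(const std::array<T, N>& v, unsigned idx) {
  T result = v[0];
  for (std::size_t i = 1; i < N; ++i)
    result = (idx == i) ? v[i] : result;
  return result;
}

// insertelement analogue: every lane is rewritten through a select.
template <typename T, std::size_t N>
std::array<T, N> insertBySelects(std::array<T, N> v, T value, unsigned idx) {
  for (std::size_t i = 0; i < N; ++i)
    v[i] = (idx == i) ? value : v[i];
  return v;
}
```

The code size grows linearly with the element count, but there is no data-dependent control flow.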

Differential Revision: https://reviews.llvm.org/D80032
2020-05-20 15:51:29 -07:00
aartbik
645bba8d3d [llvm] [CodeGen] [X86] Fix issues with v4i1 instruction selection
Summary:
Fixes issue
https://bugs.llvm.org/show_bug.cgi?id=45995

Reviewers: mehdi_amini, nicolasvasilache, reidtatge, craig.topper, ftynse, bkramer

Reviewed By: craig.topper

Subscribers: RKSimon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80231
2020-05-20 11:34:56 -07:00
Arthur Eubanks
8a88755610 Reland [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer to the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
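The bookkeeping described above amounts to a table keyed by an integer index per preallocated call. The following is a minimal sketch. `PreallocatedTable` and its members are hypothetical stand-ins for the data kept in X86MachineFunctionInfo, not its real interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-function record for one preallocated call.
struct PreallocatedCallInfo {
  std::vector<int64_t> argOffsets;  // each argument's offset from the stack pointer
  int64_t stackSize = 0;            // total stack adjustment for the call
};

// Hypothetical table keyed by the integer index assigned to each call.
class PreallocatedTable {
  std::map<unsigned, PreallocatedCallInfo> calls_;
  unsigned nextId_ = 0;

public:
  // Call lowering: record the layout and hand back the integer key.
  unsigned record(std::vector<int64_t> argOffsets, int64_t stackSize) {
    PreallocatedCallInfo info;
    info.argOffsets = std::move(argOffsets);
    info.stackSize = stackSize;
    const unsigned id = nextId_++;
    calls_[id] = std::move(info);
    return id;
  }

  // "setup": how far the stack pointer is adjusted for this call.
  int64_t setupAdjustment(unsigned id) const { return calls_.at(id).stackSize; }

  // "arg": address of argument argIdx, i.e. the stack pointer plus its offset.
  uint64_t argAddress(unsigned id, unsigned argIdx, uint64_t sp) const {
    return sp + static_cast<uint64_t>(calls_.at(id).argOffsets.at(argIdx));
  }
};
```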

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 11:25:44 -07:00
Arthur Eubanks
b8cbff51d3 Revert "[X86] Codegen for preallocated"
This reverts commit 810567dc691a57c8c13fef06368d7549f7d9c064.

Some tests are unexpectedly passing
2020-05-20 10:04:55 -07:00
Arthur Eubanks
810567dc69 [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer to the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 09:20:38 -07:00
Matt Arsenault
e8f6b0e583 AMDGPU/GlobalISel: Fix splitting 64-bit extensions
This was replicating the low bits into the high bits for G_ZEXT,
rather than using 0.
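For reference, here is the correct split of a 32-to-64-bit extension into two 32-bit halves, as plain C++. The helper names are illustrative, not GlobalISel APIs. Only the sign extension derives the high half from the source; the zero extension always uses 0.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {lo, hi} halves of the 64-bit result.
std::pair<uint32_t, uint32_t> splitZExt(uint32_t src) {
  return {src, 0u};  // zext: the high half is always zero
}

std::pair<uint32_t, uint32_t> splitSExt(uint32_t src) {
  // sext: the high half replicates the sign bit (like an ashr by 31).
  return {src, (src & 0x80000000u) ? 0xFFFFFFFFu : 0u};
}
```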
2020-05-20 11:13:32 -04:00
Jay Foad
3c84353804 [AMDGPU] Add the test from D49097. 2020-05-20 14:34:51 +01:00
Pierre-vh
835251f7d9 [Target][ARM] Make Low Overhead Loops coexist with VPT blocks.
Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.

It also adds support for VCMPs before the VCTP.

Differential Revision: https://reviews.llvm.org/D78206
2020-05-20 12:24:55 +01:00
Kang Zhang
58684fbb6f [NFC][PowerPC] Add 2 new cases to test livevars pass
2020-05-20 05:32:09 +00:00
Stanislav Mekhanoshin
677929e352 [AMDGPU] Process V_MOV_B32_indirect in SET_GPR_IDX optimization
Differential Revision: https://reviews.llvm.org/D80256
2020-05-19 21:37:14 -07:00
Matt Arsenault
77f05e5b53 AMDGPU/GlobalISel: Fix bug in test register bank
The intent wasn't cases with illegal VGPR to SGPR copies.
2020-05-19 22:52:59 -04:00
QingShan Zhang
2b59e9f1bd [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression
We have getNegatibleCost/getNegatedExpression to evaluate the cost of negating an expression and to negate it.
However, while negating the expression the cost might change, because we are changing the DAG,
and we then hit the assertion if we negated the wrong expression because the cost can no longer be trusted.

This patch removes getNegatibleCost to avoid getting out of sync with getNegatedExpression,
and checks the cost while negating the expression instead. It also reduces the duplicated code between
getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D77319
2020-05-20 02:12:16 +00:00
Matt Arsenault
21d2884a9c AMDGPU: Annotate functions that have stack objects
Relying on any MachineFunction state in the MachineFunctionInfo
constructor is hazardous, because the construction time is unclear and
determined by the first use. The function may be only partially
constructed, which is part of why we have many of these hacky string
attributes to track what we need for ABI lowering.

For SelectionDAG, all stack objects are created up-front before
calling convention lowering so stack objects are visible at
construction time. For GlobalISel, none of the IR function has been
visited yet and the allocas haven't been added to the MachineFrameInfo
yet. This should fix failing to set flat_scratch_init in GlobalISel
when needed.

This pass really needs to be turned into some kind of analysis, but I
haven't found a nice way to use one here.
2020-05-19 18:51:00 -04:00