36153 Commits

Author SHA1 Message Date
Sven van Haastregt
bfc961aeb2 [TargetLowering] Check boolean content when folding bit compare
Updates an optimization that relies on boolean contents being either 0
or 1 to properly check for this before triggering.

The following:
  (X & 8) != 0 --> (X & 8) >> 3
Produces unexpected results when a boolean 'true' value is represented
by negative one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D89390
2020-10-21 11:46:55 +01:00
Sven van Haastregt
1af51f077b [TargetLowering] Add test for bit comparison fold
This adds a test covering an issue in bit comparison folding.  The
issue will be addressed in the subsequent commit.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D89390
2020-10-21 11:46:45 +01:00
Florian Hahn
88241ffb56 [Passes] Move ADCE before DSE & LICM.
The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset and the size there actually decreases by -0.1%, with
not significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly less instructions. It
should also be unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322
2020-10-21 10:30:56 +01:00
Esme-Yi
9fbb060418 [NFC][PowerPC]Add tests for folding RLWINM before and after RA. 2020-10-21 06:38:22 +00:00
Austin Kerbow
ebdcef20ce [AMDGPU] Avoid inserting noops during scheduling
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.

Depends on D89753.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89754
2020-10-20 17:11:36 -07:00
Austin Kerbow
37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Tony
1bc7bfffdb [AMDGPU] Optimize waitcnt insertion for flat memory operations
Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.

In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.

This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.

Differential Revision: https://reviews.llvm.org/D89618
2020-10-20 22:55:12 +00:00
Michael Liao
2a0e4d1c01 [amdgpu] Enhance AMDGPU AA.
- In general, a generic point may alias to pointers in all other address
  spaces. However, for certain cases enforced by the programming model,
  we may found a generic point won't alias to pointers to local objects.
  * When a generic pointer is loaded from the constant address space, it
    could only be a pointer to the GLOBAL or CONSTANT address space.
    Thus, it won't alias to pointers to the PRIVATE or LOCAL address
    space.
  * When a generic pointer is passed as a kernel argument, it also could
    only be a pointer to the GLOBAL or CONSTANT address space. Thus, it
    also won't alias to pointers to the PRIVATE or LOCAL address space.

Differential Revision: https://reviews.llvm.org/D89525
2020-10-20 09:54:12 -04:00
Carl Ritson
be2afbd019 [AMDGPU] Remove fix up operand from SI_ELSE
Remove immediate operand from SI_ELSE which indicates if EXEC has
been modified.  Instead always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.

This facilitates passes (i.e. SIWholeQuadMode) adding exec mask
manipulation post control flow lowering, and pre control flow
lower passes do not need to be aware of SI_ELSE handling.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D89644
2020-10-20 19:15:21 +09:00
sstefan1
fbfb1c7909 [IR] Make nosync, nofree and willreturn default for intrinsics.
D70365 allows us to make attributes default. This is a follow up to
actually make nosync, nofree and willreturn default. The approach we
chose, for now, is to opt-in to default attributes to avoid introducing
problems to target specific intrinsics. Intrinsics with default
attributes can be created using `DefaultAttrsIntrinsic` class.
2020-10-20 11:57:19 +02:00
David Green
6dcbc323fd Revert "[ARM][LowOverheadLoops] Adjust Start insertion."
This reverts commit 38f625d0d1360b035271422bab922d22ed04d79a.

This commit contains some holes in its logic and has been causing
issues since it was commited. The idea sounds OK but some cases were not
handled correctly. Instead of trying to fix that up later it is probably
simpler to revert it and work to reimplement it in a more reliable way.
2020-10-20 08:55:21 +01:00
Kai Luo
638fee625d [PowerPC] Add test case for missing nsw flag. NFC. 2020-10-20 03:47:49 +00:00
Qiu Chaofan
1b2fe71ecf [DAGCombiner] Tighten reasscociation of visitFMA
From LangRef, FMF contract should not enable reassociating to form
arbitrary contractions. So it should not help rearrange nodes like
(fma (fmul x, c1), c2, y) into (fma x, c1*c2, y).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89527
2020-10-20 10:13:01 +08:00
Wang, Pengfei
3a85472af2 [X86] Fix assert fail when element type is i1.
extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail.
Since we don't really handle vxi1 cases in below code, we can just return from here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89096
2020-10-20 09:26:32 +08:00
Evgenii Stepanov
188a7d6710 Add alloca size threshold for StackTagging initializer merging.
Summary:
Initializer merging generates pretty inefficient code for large allocas
that also happens to trigger an exponential algorithm somewhere in
Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867.

This change adds an upper limit for the alloca size. The default limit
is selected such that worst case size of memtag-generated code is
similar to non-memtag (but because of the ISA quirks, this case is
realized at the different value of alloca size, ex. memset inlining
triggers at sizes below 512, but stack tagging instructions are 2x
shorter, so limit is approx. 256).

We could try harder to emit more compact code with initializer merging,
but that would only affect large, sparsely initialized allocas, and
those are doing fine already.

Reviewers: vitalybuka, pcc

Subscribers: llvm-commits
2020-10-19 13:44:07 -07:00
Craig Topper
edd0cb11bd [SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats
This enables these transforms for vectors:
(ctpop x) u< 2 -> (x & x-1) == 0
(ctpop x) u> 1 -> (x & x-1) != 0
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

All enabled if CTPOP isn't Legal. This differs from the scalar
behavior where the first two are done unconditionally and the
last two are done if CTPOP isn't Legal or Custom. The Legal
check produced better results for vectors based on X86's
custom handling. Might be worth re-visiting scalars here.

I disabled the looking through truncate for vectors. The
code that creates new setcc can use the same result VT as the
original setcc even if we truncated the input. That may work
work for most scalars, but definitely wouldn't work for vectors
unless it was a vector of i1.

Fixes or at least improves PR47825

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89346
2020-10-19 12:56:59 -07:00
Craig Topper
e28376ec28 [X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table.
We have pseudo instructions we use for bitcasts between these types.
We have them in the load folding table, but not the store folding
table. This adds them there so they can be used for stack spills.

I added an exact size check so that we don't fold when the stack slot
is larger than the GPR. Otherwise the upper bits in the stack slot
would be garbage. That would be fine for Eli's test case in PR47874,
but I'm not sure its safe in general.

A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int
pseudo instructions to use FR32 as the second source register class
instead of VR128. That will keep the coalescer from promoting the
register class of the bitcast instruction which will make the stack
slot 4 bytes instead of 16 bytes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89656
2020-10-19 12:53:14 -07:00
Cameron McInally
629d1d117a [SVE] Update vector reduction intrinsics in new tests.
Remove `experimental` from the intrinsic names.
2020-10-19 13:27:46 -05:00
Amy Kwan
6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
Piotr Sobczak
c872faf6e0 [AMDGPU] Do not generate S_CMP_LG_U64 on gfx7
S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64().

Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that
do not support 64-bit scalar compare.

Differential Revision: https://reviews.llvm.org/D89536
2020-10-19 14:44:31 +02:00
Kazushi (Jam) Marukawa
6bb60d3e26 [VE] Add setcc for fp128
Add setcc for fp128 and clean existing ISel patterns.  Also add
a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89683
2020-10-19 21:36:57 +09:00
Kazushi (Jam) Marukawa
fb2bb6fad4 [VE] Add cast to/from fp128 patterns
Add cast to/from fp128 patterns.  Clean other cast patterns too.
Update a regression test by adding missing tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89682
2020-10-19 21:35:27 +09:00
Hans Wennborg
0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d649a0222fbde03c9a41e41913751b4.
2020-10-19 12:31:14 +02:00
Kazushi (Jam) Marukawa
8796746b2a [VE] Support select_cc
Add missing ISel patterns related to select_cc DAG nodes.
Add regression test of all combination of possible scalar types.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89672
2020-10-19 18:54:25 +09:00
Kazushi (Jam) Marukawa
25955cbae4 [VE] Support br_cc comparing fp128
Support br_cc instruction comparing fp128 values.  Add a br_cc.ll
regression test for all kind of br_cc instructions.  And, clean
existing branch regression tests, this time.  Clean a brcond.ll
regression test for brcond instruction.  Remove mixed branch1.ll
regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89627
2020-10-19 18:29:39 +09:00
Kazushi (Jam) Marukawa
af8b444de3 [VE] Update ISel patterns for select instruction
Add an ISel pattern for fp128 select instruction and optimize generated
code for other types' select. instructions.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89509
2020-10-19 18:28:21 +09:00
Kai Luo
354d3106c6 [PowerPC] Skip combining (uint_to_fp x) if x is not simple type
Current powerpc64le backend hits
```
Combining: t7: f64 = uint_to_fp t6
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291: llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.
```
This patch fixes it by skipping combination if `t6` is not simple type.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47660.

Reviewed By: #powerpc, steven.zhang

Differential Revision: https://reviews.llvm.org/D88388
2020-10-19 05:23:46 +00:00
Fangrui Song
2819631914 [PrologEpilogInserter] Reduce PR16393 test and fix a prologue parameter in a debuginfo test 2020-10-18 22:18:42 -07:00
Craig Topper
9d23224bf6 [X86] Add test cases for PR47874. NFC 2020-10-18 12:45:01 -07:00
Fangrui Song
98797a5fc0 [PrologEpilogInserter][test] Improve SpilledToReg test
D39386 made CalleeSavedInfo possible to spill a register to another register
(vector register for POWER9) but did not actually test live-in.
2020-10-17 20:36:22 -07:00
Amara Emerson
4ad459997e [AArch64][GlobalISel] Select csinc if a select has a 1 on RHS.
Differential Revision: https://reviews.llvm.org/D89513
2020-10-16 16:49:52 -07:00
Albion Fung
d30155feaa [PowerPC] Implementation of 128-bit Binary Vector Rotate builtins
This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10.

Differential Revision: https://reviews.llvm.org/D86819
2020-10-16 18:03:22 -04:00
Austin Kerbow
978fbd8268 [AMDGPU] Run hazard recognizer pass later
If instructions were removed in peephole passes after the hazard recognizer was
run it is possible that new hazards could be introduced.

Fixes: SWDEV-253090

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D89077
2020-10-16 12:15:51 -07:00
Amara Emerson
39c05a1a71 [AArch64][GlobalISel] Add selection support for v2s32 and v2s64 reductions for FADD/ADD.
We'll need legalizer lower() support for the other types to work.

Differential Revision: https://reviews.llvm.org/D89159
2020-10-16 11:41:57 -07:00
Amara Emerson
32f77eea2d [AArch64][GlobalISel] Regbankselect reductions to use FPR bank for scalars.
Differential Revision: https://reviews.llvm.org/D89075
2020-10-16 10:42:15 -07:00
Amara Emerson
9190411fcf [AArch64][GlobalISel] Add basic legalizer rules for supported add/fadd reductions.
NEON is pretty limited in it's reduction support. As a first step add some
basic rules for the legal types we can select.

Differential Revision: https://reviews.llvm.org/D89070
2020-10-16 10:35:46 -07:00
Amara Emerson
6042c25b0a [GlobalISel] Add translation support for vector reduction intrinsics.
In order to prevent the ExpandReductions pass from expanding some intrinsics
before they get to codegen, I had to add a -disable-expand-reductions flag
for testing purposes.

Differential Revision: https://reviews.llvm.org/D89028
2020-10-16 10:17:53 -07:00
Jay Foad
1417abe54c [AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic
Differential Revision: https://reviews.llvm.org/D89558
2020-10-16 17:10:21 +01:00
Matt Arsenault
ce16b6835b AMDGPU: Don't kill super-register with overlapping copy
This would end up killing part of the result super-register, resulting
in a verifier error on a later use of the overlapping registers.  We
could add kills of any non-aliasing registers, but we should be moving
away from relying on kill flags.
2020-10-16 09:34:35 -04:00
Florian Hahn
51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14cbac58a81fd60e0fa9dc8d341108e2.
2020-10-16 09:02:53 +01:00
Vedant Kumar
273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Amara Emerson
c2551c1f40 [GlobalISel] Remove scalar src from non-sequential fadd/fmul reductions.
It's probably better to split these into separate G_FADD/G_FMUL + G_VECREDUCE
operations in the translator rather than carrying the scalar around. The
majority of the time it'll get simplified away as the scalars are probably
identity values.

Differential Revision: https://reviews.llvm.org/D89150
2020-10-15 15:51:44 -07:00
Kazushi (Jam) Marukawa
410e5b17cf [VE] Support fabs/fcos/fsin/fsqrt math functions
VE doesn't have instruction for fabs/fcos/fsin/fsqrt, so expand them.
Add regression tests also.  Update fcopysign regression test, also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89457
2020-10-16 06:27:38 +09:00
Thomas Lively
1992e30c2d [WebAssembly] Prototype i8x16.popcnt
As proposed at https://github.com/WebAssembly/simd/pull/379. Use a target
builtin and intrinsic rather than normal codegen patterns to make the
instruction opt-in until it is merged to the proposal and stabilized in engines.

Differential Revision: https://reviews.llvm.org/D89446
2020-10-15 21:18:22 +00:00
Jameson Nash
122d92dfc3 fix symbol printing on windows
Similar to MCSymbol::print in 3d6c8ebb584375d01b1acead4c2056b3f0c501fc
(llvm-svn: 81682, PR4966), these symbols may need to be quoted to be handled by
the linker correctly.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D87099
2020-10-15 17:14:55 -04:00
alex-t
42ed388120 [AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough
removeMBBifRedundant normally tries to keep predecessors fallthrough when removing redundant MBB.
         It has to change MBBs layout to keep the new successor to immediately follow the predecessor of removed MBB.
         It only may be allowed in case the new successor itself has no successors to which it fall through.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89397
2020-10-15 23:20:54 +03:00
Evgenii Stepanov
2e794a46b5 [AArch64] Stack frame reordering.
Implement stack frame reordering in the AArch64 backend.

Unlike the X86 implementation, AArch64 does not seem to benefit from
"access density" based frame reordering, mainly because it has a much
smaller variety of addressing modes, and the fact that all instructions
are 4 bytes so each frame object is either in range of an instruction
(and then the access is "free") or not (and that has a code size cost
of 4 bytes).

This change improves Memory Tagging codegen by
* Placing an object that has been chosen as the base tagged pointer of
the function at SP + 0. This saves one instruction to setup the pointer
(IRG does not have an offset immediate), and more because that object
can now be referenced without materializing its tagged address in a
scratch register.
* Placing objects that go out of scope simultaneously together. This
exposes opportunities for instruction merging in tryMergeAdjacentSTG.

Differential Revision: https://reviews.llvm.org/D72366
2020-10-15 12:50:16 -07:00
Evgenii Stepanov
2f63e57fa5 [MTE] Pin the tagged base pointer to one of the stack slots.
Summary:
Pin the tagged base pointer to one of the stack slots, and (if
necessary) rewrite tag offsets so that an object that occupies that
slot has both address and tag offsets of 0. This allows ADDG
instructions for that object to be eliminated and their uses replaced
with the tagged base pointer itself.

This optimization must be done in machine instructions and not in the IR
instrumentation pass, because referring to a stack slot through an IRG
pointer would confuse the stack coloring pass.

The optimization makes a (pretty naive) attempt to find the slot that
would benefit the most by counting the uses of stack slots in the
function.

Reviewers: ostannard, pcc

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72365
2020-10-15 12:50:16 -07:00
Stanislav Mekhanoshin
d1beb95d12 [AMDGPU] gfx1032 target
Differential Revision: https://reviews.llvm.org/D89487
2020-10-15 12:41:18 -07:00
Thomas Lively
3f738d1f5e Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions"
This reverts commit 7c8385a352ba21cb388046290d93b53dc273cd9f with a typing fix
to an instruction selection pattern.
2020-10-15 19:32:34 +00:00