43145 Commits

Author SHA1 Message Date
David Penry
fa49021c68 Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
while I investigate a buildbot failure.
2022-04-28 13:29:27 -07:00
Simon Pilgrim
ab17ed0723 [X86] Don't fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) on BMI2 targets
With BMI2 we have SHRX which is a lot quicker than regular x86 shifts.

Fixes #55138
2022-04-28 21:28:16 +01:00
Craig Topper
181dcbd36d [RISCV] Add riscv32 RUN lines to bittest.ll. NFC
Add extra check-prefixes to merge common results.
2022-04-28 13:02:13 -07:00
David Penry
28d09bbbc3 [CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-28 13:01:18 -07:00
Simon Pilgrim
a546b9b06b [X86] setcc.ll - add "NOTBM" check-prefix for expected common code 2022-04-28 20:41:47 +01:00
David Tenty
8042699a30 [LLVM] Add exported visibility style for XCOFF
For the AIX linker, under default options, global or weak symbols which
have no visibility bits set to zero (i.e. no visibility, similar to ELF
default) are only exported if specified on an export list provided to
the linker. So AIX has an additional visibility style called
"exported" which indicates to the linker that the symbol should
be explicitly globally exported.

This change maps "dllexport" in the LLVM IR to correspond to XCOFF
exported as we feel this best models the intended semantic (discussion
on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853)
and allows us to enable writing this visibility for the AIX target
in the assembly path.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D123951
2022-04-28 14:56:00 -04:00
Craig Topper
ec11fbb1d6 [RISCV] Use default promotion for (i32 (shl 1, X)) on RV64 when Zbs is enabled.
This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instrcutions so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materializationg and
logic op, but might need a mask on the shift amount and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124096
2022-04-28 09:58:30 -07:00
Bjorn Pettersson
3a39bb96ca [SelectionDAG] Use correct boolean representation in FoldConstantArithmetic
The description of SETCC says
  /// SetCC operator - This evaluates to a true value iff the condition is
  /// true.  If the result value type is not i1 then the high bits conform
  /// to getBooleanContents.

Without this patch, we sign extended the i1 to the used larger type
regardless of getBooleanContents. This resulted in miscompiles, as
shown in the attached testcase that ended up returning -1 instead of
1 when using -mattr=+v.

Fixes https://github.com/llvm/llvm-project/issues/55168

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124618
2022-04-28 18:42:16 +02:00
Craig Topper
8631a5e712 [RISCV] Fix alias printing for vmnot.m
By clearing the HasDummyMask flag from mask register binary operations
and mask load/store.

HasDummyMask was causing an extra operand to get appended when
converting from MachineInstr to MCInst. This extra operand doesn't
appear in the assembly string so was mostly ignored, but it prevented
the alias instruction printing from working correctly.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D124424
2022-04-28 08:33:52 -07:00
Simon Pilgrim
de7cee24b6 [X86] getBT - attempt to peek through aext(and(trunc(x),c)) mask/modulo
Ideally we'd fold this with generic DAGCombiner, but that only works for !isTruncateFree cases - we might be able to adapt IsDesirableToPromoteOp to find truncated src ops in the future, but for now just use this peephole.

Noticed in Issue #55138
2022-04-28 16:10:26 +01:00
Simon Pilgrim
72959f7714 [X86] Add test case for Issue #55158 2022-04-28 13:06:58 +01:00
Andrew Savonichev
0f1b5f115a [NVPTX] Integrate ptxas to LIT tests
ptxas is a proprietary compiler from Nvidia that can compile PTX to
machine code (SASS). It has a lot of diagnostics to catch errors
in PTX, which can be used to verify PTX output from llc.

Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it.
If this option is not set, then ptxas is substituted to true which
effectively disables all ptxas RUN lines.

LLVM_PTXAS_EXECUTABLE environment variable takes precedence over
the CMake option, and allows to override ptxas executable that is used for LIT
without complete re-configuration.

Differential Revision: https://reviews.llvm.org/D121727
2022-04-28 14:59:45 +03:00
Simon Pilgrim
ed8dffef4c [X86] getFauxShuffle - don't assume an UNDEF src element for AND/ANDNP results in an UNDEF shuffle mask index
The other src element might be zero, guaranteeing zero.

Fixes #55157
2022-04-28 12:32:58 +01:00
Simon Pilgrim
ae8143547a Revert rG8680dd5117b0c36f807fecc4360122ae1dd73b6d "[X86] getFauxShuffle - don't assume an UNDEF src element for AND/ANDNP results in an UNDEF shuffle mask index"
I screwed up the merge somehow.
2022-04-28 12:25:14 +01:00
Simon Pilgrim
8680dd5117 [X86] getFauxShuffle - don't assume an UNDEF src element for AND/ANDNP results in an UNDEF shuffle mask index
The other src element might be zero, guaranteeing zero.

Fixes #55157
2022-04-28 11:54:31 +01:00
Simon Pilgrim
e7435e61e9 [X86] Add test case for Issue #55157 2022-04-28 11:34:45 +01:00
Lian Wang
dc0ae8ce18 [RISCV] Support VP_SETCC mask operations
Support VP_SETCC mask operations, turn it to logical operation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124438
2022-04-28 08:52:29 +00:00
Luo, Yuanke
942ec5c36d [X86][AMX] combine tile cast and load/store instruction.
The `llvm.x86.cast.tile.to.vector` intrinsic is lowered to
`llvm.x86.tilestored64.internal` and `load <256 x i32>`. The
`llvm.x86.cast.vector.to.tile` is lowered to `store <256 x i32>` and
`llvm.x86.tileloadd64.internal`. When `llvm.x86.cast.tile.to.vector` is
used by `store <256 x i32>` or `load <256 x i32>` is used by
`llvm.x86.cast.vector.to.tile`, they can be combined by
`llvm.x86.tilestored64.internal` and `llvm.x86.tileloadd64.internal`.

Differential Revision: https://reviews.llvm.org/D124378
2022-04-28 14:55:21 +08:00
Shengchen Kan
6a6b0e4a63 [X86] Check the address in machine verifier
1. The scale factor must be 1, 2, 4, 8
2. The displacement must fit in 32-bit signed integer

Noticed by: https://github.com/llvm/llvm-project/issues/55091

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D124455
2022-04-28 10:05:39 +08:00
Simon Pilgrim
8032c5f68c [X86] setcc.ll - add bmi2 + tbm test coverage
As discussed on Issue #55138 - BMI2 (fast shrx) shouldn't always fold to BT
2022-04-27 21:09:57 +01:00
Simon Pilgrim
edc80e7d43 [X86] setcc.ll - remove unnecessary cpu attributes 2022-04-27 21:09:57 +01:00
Simon Pilgrim
81b38668ff [X86] Add test case for Issue #55138 2022-04-27 21:09:57 +01:00
Craig Topper
c2614b31d9 [RISCV] Add isCommutable to scalar FMA instructions.
The default implementation of findCommutedOpIndices picks the
first two source operands. That's exactly what we want for the
scalar FMA instructions.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124463
2022-04-27 11:07:18 -07:00
Simon Pilgrim
03482bccad [X86] collectConcatOps - add ability to collect from vector 'widening' patterns
Recognise insert_subvector(undef, x, lo/hi) patterns where we double the width of a vector - creating an UNDEF subvector on the fly.
2022-04-27 15:38:58 +01:00
Ivan Kosarev
6ddf2a824d [AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.
As older waves execute long sequences of VALU instructions, this may
prevent younger waves from address calculation and then issuing their
VMEM loads, which in turn leads the VALU unit to idle. This patch tries
to prevent this by temporarily raising the wave's priority.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D124246
2022-04-27 14:37:18 +01:00
Denis Antrushin
4059770af5 [StatepointLowering] Only export STATEPOINT results if used in nonlocal blocks.
Cuurently we always export STATEPOINT results (GC pointers lowered via VRegs)
to virtual registers. When processing gc.relocate instructions we have to
generate CopyFromRegs node and then export it to VReg again if gc.relocate
is used in other basic blocks. This results in generation of extra COPY MIR
instruction if statepoint and its gc.relocate are in the same BB, but gc.relocate
result is used in other blocks.

This patch changes this behavior to export statepoint results only if used
in other basic blocks. For local uses StatepointLoweringState.(get|set)Location()
API is used to communicate appropriate statepoint result from `LowerStatepoint()`
to `visitGCRelocate()`

This is NFC and is purely compile time optimization. On big methids it can improve
codegen compile time up to 10%.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124444
2022-04-27 15:53:24 +03:00
Liqin.Weng
ca3cd345a0 [MIPS][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI.
Reviewed By: sdardis

Differential Revision: https://reviews.llvm.org/D123577
2022-04-27 09:03:14 +00:00
ShihPo Hung
6b55f133fb [RISCV][RVV] Select unmasked TU RVV pseudos in a DAG post-process
Following D118810 that reduced the size of ISel table,
this patch optimizes allone-masked RVV pseudos with TU policy and
swap them out to their unmasked TU pseudos.

Since the UNDEF merge operand is not preserved, we turn it into TA
pseudo regardless of the policy operand.

Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D121881
2022-04-26 20:14:54 -07:00
ShihPo Hung
bcb2b86df6 [RISCV] Precommit test for D121881
Differential Revision: https://reviews.llvm.org/D123385
2022-04-26 20:14:54 -07:00
David Tenty
f6d209b3ec [AIX][XCOFF] error on emit symbol visibility for XCOFF object file
This is a follow on to the revert of D84265 to add an error if we'd need
to write a non-zero visibility type in the xcoff object file. We can't
currently do that because we lack the auxilary header to interpret the
bits in XCOFF32. This is important because visibility is being enabled
in the assembly writing path, and without this error the visibility
could be silently ignored.

Differential Revision: https://reviews.llvm.org/D124392
2022-04-26 19:22:44 -04:00
Jay Foad
6753bb2c41 [AMDGPU] Precommit a test case for D124450 2022-04-26 18:44:50 +01:00
Igor Chebykin
541cbeeddb [NVPTX][tests] add "XFAIL: nvptx" for some tests
Some tests failed for NVPTX target, but it seems that NVPTX will be
fixed and the tests will pass. So, just mark the tests as XFAIL

Differential Revision: https://reviews.llvm.org/D124125
2022-04-26 17:26:56 +03:00
Igor Chebykin
84cf290c84 [NVPTX][tests] Do not run the tests which are not supported by nvptx
Some generic tests are not supported by the nvptx now.  Moreover, they
are no plans to fix the tested features in nvptx. So, suggest to mark
them as UNSUPPORTED

Differential Revision: https://reviews.llvm.org/D123928
2022-04-26 17:26:56 +03:00
Xiang1 Zhang
c430f0f532 [X86] Add use condition for combineSetCCMOVMSK
Reviewed by RKSimon, LuoYuanke

Differential Revision: https://reviews.llvm.org/D123652
2022-04-26 16:42:50 +08:00
Christudasan Devadasan
8f9dd5e608 [AMDGPU] Vector register spill test cleanup (NFC)
The vector register spills have no dependency with
SILowerSGPRSpills pass anymore. The entire handling
has been moved to PrologEpilogInserter with D55301.
2022-04-26 13:17:16 +05:30
Luo, Yuanke
f3ad7ea03a [X86][AMX] Report error when shapes are not pre-defined.
Instead of report fatal error, this patch emit error message and exit
when shapes are not pre-defined. This would cause the compiling fail but
not crash.

Differential Revision: https://reviews.llvm.org/D124342
2022-04-26 14:57:25 +08:00
Serge Pavlov
170a903144 Intrinsic for checking floating point class
This change introduces a new intrinsic, `llvm.is.fpclass`, which checks
if the provided floating-point number belongs to any of the the specified
value classes. The intrinsic implements the checks made by C standard
library functions `isnan`, `isinf`, `isfinite`, `isnormal`, `issubnormal`,
`issignaling` and corresponding IEEE-754 operations.

The primary motivation for this intrinsic is the support of strict FP
mode. In this mode using compare instructions or other FP operations is
not possible, because if the value is a signaling NaN, floating-point
exception `Invalid` is raised, but the aforementioned functions must
never raise exceptions.

Currently there are two solutions for this problem, both are
implemented partially. One of them is using integer operations to
implement the check. It was implemented in https://reviews.llvm.org/D95948
for `isnan`. It solves the problem of exceptions, but offers one
solution for all targets, although some can do the check in more
efficient way.

The other, implemented in https://reviews.llvm.org/D96568, introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects a target
specific code into IR to implement `isnan` and some other functions. It is
convenient for targets that have dedicated instruction to determine FP data
class. However using target-specific intrinsic complicates analysis and can
prevent some optimizations.

A special intrinsic for value class checks allows representing data class
tests with enough flexibility. During IR transformations it represents the
check in target-independent way and saves it from undesired transformations.
In the instruction selector it allows efficient lowering depending on the
used target and mode.

This implementation is an extended variant of `llvm.isnan` introduced
in https://reviews.llvm.org/D104854. It is limited to minimal intrinsic
support. Target-specific treatment will be implemented in separate
patches.

Differential Revision: https://reviews.llvm.org/D112025
2022-04-26 13:09:16 +07:00
Lian Wang
9980148305 [RISCV][SelectionDAG] Support VP_ADD/VP_MUL/VP_SUB mask operations
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124144
2022-04-26 02:30:22 +00:00
Craig Topper
494d86d45b [RISCV] Pre-commit test for D122769. NFC 2022-04-25 16:13:57 -07:00
Jakub Chlanda
76d1f5eaa8 [NVPTX] Support float <-> 2 x half bitcasts
Make sure NVPTX backend can handle bitcasting between `float` and `<2 x half>` types.

This was discovered through: https://github.com/intel/llvm/issues/5969
I'm not suggesting that such bitcasts make much sense, but it feels like the compiler should not hard crash on them.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124171
2022-04-25 14:37:41 -07:00
Artem Belevich
993054c1c9 Change NVPTX/f16x2-instructions.ll to use unix EOL. NFC 2022-04-25 14:30:23 -07:00
Matt Arsenault
7714e03175 RegAllocGreedy: Allow last chance recolor to retry overlapping tuples
Last chance recoloring didn't try recoloring a done register with the
same class since it believed there was no point. This doesn't
necessarily apply if the members in that class overlap. Allow the
recoloring to proceed if the assigned interfering physical register
overlaps with the candidate register.

This avoids an allocation failure with overlapping tuples. This
testcase could be handled better, and I don't believe should reach
last chance recoloring. The failure only manifests with the mutually
unsatisfiable register hints to overlapping tuples. The earlier
assignment decisions probably should have figured out that using these
hints was a bad idea.
2022-04-25 17:07:17 -04:00
Craig Topper
40f1af4760 [RISCV] Add isCommutable to ADD/ADDW/MUL/AND/OR/XOR/MIN/MAX/CLMUL
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123970
2022-04-25 10:53:41 -07:00
David Green
9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Arthur Eubanks
6f73bd7813 [test] Remove legacy PM pipeline test
The legacy PM for the optimization pipeline is deprecated and in the process of being cleaned up.
2022-04-25 10:02:38 -07:00
Zakk Chen
ffe03ff75c [RISCV] Fix incorrect policy implement for unmasked vslidedown and vslideup.
vslideup works by leaving elements 0<i<OFFSET undisturbed.
so it need the destination operand as input for correctness
regardless of policy. Add a operand to indicate policy.

We also add policy operand for unmaksed vslidedown to keep the interface consistent with vslideup
because vslidedown have only undisturbed at 0<i<vstart but user have no way to control of vstart.

Reviewed By: rogfer01, craig.topper

Differential Revision: https://reviews.llvm.org/D124186
2022-04-25 09:18:41 -07:00
Simon Pilgrim
e8305c0b8f [X86] combineX86ShuffleChain - don't fold to truncate(concat(V1,V2)) if it was already a PACK op
Fixes #55050
2022-04-25 17:13:44 +01:00
Christudasan Devadasan
16d87efc2a [AMDGPU] Lit test pre-commit changes (NFC)
Run line change needed for an upcoming patch.
2022-04-25 21:22:33 +05:30
Christudasan Devadasan
9f631cf7c6 [AMDGPU] Regenerate lit test pattern (NFC). 2022-04-25 21:12:33 +05:30
Luo, Yuanke
c712bf3ce4 [X86][AMX] Add test case for D124378. 2022-04-25 20:03:27 +08:00