52796 Commits

Author SHA1 Message Date
Piyou Chen
a70aa5ea7c [RISCV] precommit for removing useless copy from undef subreg
testcase from https://github.com/llvm/llvm-project/issues/63554

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D155039
2023-07-20 20:38:24 -07:00
Philip Reames
4f057f5296 [RISCV] Expand memset.inline test coverage [nfc]
Add coverage for unaligned overlap cases, and for vector stores.

Note that the vector memset here is coming from store combining, not memset lowering.
2023-07-20 17:37:36 -07:00
Matt Arsenault
d33ab05467 AMDGPU: Add flag to disable fdiv processing in IR pass
We kind of have to have multiple implementations of fdiv split between
the two selectors with some pre-processing. Add yet another test to
check for consistency of interpretation of flag combinations. We have
quite a bit of test redundancy here already, but there are so many
possible interesting permutations it's unwieldy to cover every detail
in any one of them. We have a number of overlapping fdiv tests but
it's hard to follow everything going on as it is.
2023-07-20 19:51:15 -04:00
Matt Arsenault
b2d58b596c AMDGPU: Expand rsq testing to cover contract flag
The 1.0/sqrt(x) -> rsq(x) fold increases precision and probably needs
a contract flag.
2023-07-20 19:51:15 -04:00
Matt Arsenault
fb54afd1b7 AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers
This isn't always folded to fneg for a freestanding fsub depending on
the denormal mode. When matching source modifiers, we're implicitly
canonicalizing the input so we can fold it here.
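A host-side C++ sketch of why `fsub -0.0, x` can stand in for `fneg x`. With IEEE semantics and no denormal flushing, the subtraction yields exactly the bits of a sign flip for every non-NaN input, including signed zeros and denormals. Under DAZ/FTZ the subtraction would flush a denormal input while fneg would not, which is why the freestanding fold depends on the denormal mode. This assumes the host FPU runs without FTZ/DAZ and the code is not built with -ffast-math. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw bit pattern of a float, so signed zeros and denormals compare exactly.
inline std::uint32_t bitsOf(float F) {
  std::uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

// fneg: flip only the sign bit (no rounding, no denormal flushing).
inline float fneg(float X) {
  std::uint32_t U = bitsOf(X) ^ 0x80000000u;
  float R;
  std::memcpy(&R, &U, sizeof R);
  return R;
}

// fsub -0.0, x: a real arithmetic subtraction, subject to the FP mode.
inline float fsubNegZero(float X) { return -0.0f - X; }

// True when the subtraction produced exactly the same bits as fneg.
// NaN inputs are out of scope for this sketch.
inline bool fsubMatchesFneg(float X) {
  return bitsOf(fsubNegZero(X)) == bitsOf(fneg(X));
}
```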

Doesn't bother handling the VOP3P case since it's only relevant with
DAZ, which nobody really uses with f16.

For f64, tests show an existing bug where DAGCombiner tries to respect
the denormal mode for fsub -0, x, but not after it's lowered to fadd
-0, (fneg x). Either the fold is wrong or we shouldn't restrict the
fsub case based on the denormal mode.

https://reviews.llvm.org/D155652
2023-07-20 19:29:40 -04:00
Matt Arsenault
881e9f2934 AMDGPU: Regenerate test checks
Mostly a workaround for recent reverts in update_test_checks
2023-07-20 19:26:35 -04:00
Matt Arsenault
ca34f1bdcd AMDGPU: Add baseline test for folding fsub into fneg modifiers 2023-07-20 18:29:35 -04:00
Matt Arsenault
0295513238 AMDGPU: Filter out contract flags when lowering exp
It is unsafe to contract the fsub into the fmul. It also increases
code size by duplicating a constant.
2023-07-20 18:14:24 -04:00
Matt Arsenault
076bc374fc AMDGPU: Add some new baseline tests for exp lowering 2023-07-20 18:14:24 -04:00
Philip Reames
34c01a6044 [RISCV] Add memset.inline test coverage with and without V [nfc] 2023-07-20 15:03:53 -07:00
Philip Reames
eb3f2fe467 [RISCV] Revise check names for unaligned memory op tests [nfc]
This has come up a few times in review; the current names seem to be universally confusing. Even I, as the original author of most of these, get confused. Switch to the SLOW/FAST naming used by x86; hopefully that's a bit clearer.
2023-07-20 13:36:53 -07:00
Jingu Kang
351b4c17dd Revert "[MachineLICM] Handle Subloops"
This reverts commit 50dd383d08670960540fecb4b48c0f0429fbfba3.
2023-07-20 17:12:25 +01:00
Nikita Popov
9dc391e89c Revert "[IR] Mark add constant expressions as undesirable"
This reverts commit f8a36d8c3e264c4fccf8058e699201a452ea7bb7.

I believe this is causing an assertion failure on the
sanitizer-x86_64-linux buildbot:

clang++: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From *) [To = llvm::BinaryOperator, From = llvm::Value]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

  #10 0x000055bdd7e82408 canonicalizeLogicFirst(llvm::BinaryOperator&, llvm::IRBuilder<llvm::TargetFolder, llvm::IRBuilderCallbackInserter>&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2131:5
  #11 0x000055bdd7e80183 llvm::InstCombinerImpl::visitAnd(llvm::BinaryOperator&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2661:20

Likely the code is encountering a constant expression in a case it
didn't before.
2023-07-20 18:09:17 +02:00
Jingu Kang
50dd383d08 [MachineLICM] Handle Subloops
Following discussion on https://reviews.llvm.org/D154205, make the MachineLICM pass
handle subloops, visiting the outermost loop's blocks only once.

Differential Revision: https://reviews.llvm.org/D154205
2023-07-20 16:39:13 +01:00
Jingu Kang
8bad7ad6d6 [AArch64] Reuse larger DUPLANE if available
When combining DUP, try to reuse a larger DUPLANE if one is available.

Differential Revision: https://reviews.llvm.org/D155592
2023-07-20 15:49:33 +01:00
Kevin P. Neal
95c2d01dfe [FPEnv][RISCV] Correct strictfp tests.
Correct RISC-V strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function definitions.
I've also removed the strictfp attribute from uses of the constrained
intrinsics because it comes by default since D154991, but I only did this
in tests I was changing anyway.

Test changes verified with D146845.
2023-07-20 10:16:56 -04:00
Jake Egan
311abf5fc0 Implement -frecord-command-line for XCOFF integrated assembler path
The patch D153600 implemented `-frecord-command-line` for the XCOFF direct assembly path. This patch adds support for the XCOFF integrated assembly path.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D154921
2023-07-20 09:45:37 -04:00
Nikita Popov
f8a36d8c3e [IR] Mark add constant expressions as undesirable
In preparation for removing support for add expressions, mark them
as undesirable. As such, we will no longer implicitly create such
expressions, but they still exist.
2023-07-20 15:24:19 +02:00
Danila Malyutin
e1aa4e7b38 [Statepoint] Use correct RegisterClass for spilling
Copy propagation might have changed the class of the register being spilled, so the spill must use its actual register class.

Differential Revision: https://reviews.llvm.org/D155792
2023-07-20 16:00:00 +03:00
Simon Pilgrim
cc77da5020 [X86] LowerTRUNCATE - use LowerTruncateVecPackWithSignBits for prefer-256 bit AVX512 cases during type legalization
If the AVX512 target will split the 512-bit vector truncation, then try to use PACKSS/PACKUS first.
2023-07-20 13:55:28 +01:00
Thorsten Schütt
9d138baeb5 [GIsel][AArch64] extend legalization of G_INSERT_VECTOR_ELT
Fixes https://github.com/llvm/llvm-project/issues/63826

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D155274
2023-07-20 13:40:00 +02:00
Simon Pilgrim
7567b72f4d [DAG] ShrinkDemandedConstant - early-out for empty DemandedBits/Elts
Leave this to constant folding in SimplifyDemandedBits

Fixes #63975
2023-07-20 12:18:10 +01:00
Simon Pilgrim
697f60598e [DAG] hoistLogicOpWithSameOpcodeHands - ensure SIGN_EXTEND_INREG nodes have the same extension value type
Fix bug in the check for matching SIGN_EXTEND_INREG types
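A worked counterexample in C++, using a hypothetical `sextInReg` helper that models SIGN_EXTEND_INREG on i32. Hoisting the `and` through two SIGN_EXTEND_INREG nodes is only valid when both extend from the same type: with v0 = v1 = 3, the i2/i5 mix evaluates to 3 before the fold and -1 after it, while matching widths always agree.

```cpp
#include <cassert>
#include <cstdint>

// Model of SIGN_EXTEND_INREG on an i32: keep the low `Bits` bits and
// sign-extend from bit (Bits - 1). Hypothetical helper for illustration.
inline std::int32_t sextInReg(std::int32_t X, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 32);
  unsigned Shift = 32 - Bits;
  std::uint32_t U = static_cast<std::uint32_t>(X) << Shift;
  return static_cast<std::int32_t>(U) >> Shift; // arithmetic shift back down
}

// Original expression: and(sextinreg(v0, i2), sextinreg(v1, i5)).
inline std::int32_t beforeFold(std::int32_t V0, std::int32_t V1) {
  return sextInReg(V0, 2) & sextInReg(V1, 5);
}

// Incorrectly hoisted form: sextinreg(and(v0, v1), i2).
inline std::int32_t afterBadFold(std::int32_t V0, std::int32_t V1) {
  return sextInReg(V0 & V1, 2);
}
```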
2023-07-20 10:44:46 +01:00
Simon Pilgrim
f1cc7913f3 [X86] Add test case showing incorrect and(sextinreg(v0,i2),sextinreg(v1,i5)) -> sextinreg(and(v0,v1),i2) fold 2023-07-20 10:44:46 +01:00
David Green
0c41c59dee [DAG][AArch64] Fix truncated vscale constant types
It appears that vscale values truncated to i1 cause mismatches in the constant
types when created in getNode. https://godbolt.org/z/TaaTo86ne.

Differential Revision: https://reviews.llvm.org/D155626
2023-07-20 09:12:05 +01:00
Fangrui Song
82b4368f7f [llvm-readobj] Print <null> for relocation target with an empty name
For a relocation, we don't differentiate the two cases:

* the symbol index is 0
* the symbol index is non-zero, the type is not STT_SECTION, and the name is empty. Clang generates such local symbols for RISC-V linker relaxation.

So we may print
```
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
000000000000001c  0000000100000039 R_RISCV_32_PCREL       0000000000000000 0

// llvm-readobj
0x1C R_RISCV_32_PCREL - 0x0
```

while GNU readelf prints "<null>", which is clearer. Let's match the GNU behavior.
Related to https://reviews.llvm.org/D81842

```
000000000000001c  0000000100000039 R_RISCV_32_PCREL       0000000000000000 <null> + 0

// llvm-readobj
0x1C R_RISCV_32_PCREL <null> 0x0
```

Reviewed By: jhenderson, kito-cheng

Differential Revision: https://reviews.llvm.org/D155353
2023-07-20 00:42:38 -07:00
Fangrui Song
94830bf56c [WebAssembly] Use SetVector to stabilize iteration order after D120365
StringMap iteration order is not guaranteed to be deterministic
(https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h).
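The fix relies on SetVector pairing a set (for uniqueness) with a vector (for order). A minimal C++ stand-in for that idea, not LLVM's actual class, shows that iteration then follows insertion order regardless of how the elements hash:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Simplified analogue of the idea behind llvm::SetVector (not LLVM's class):
// the vector fixes iteration order, the set rejects duplicates.
template <typename T>
class InsertionOrderedSet {
public:
  // Returns true if Value was newly inserted.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }

  // Iteration follows insertion order, independent of hashing.
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }

private:
  std::unordered_set<T> Seen;
  std::vector<T> Order;
};
```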
2023-07-20 00:02:06 -07:00
Freddy Ye
1c154bd755 [X86] Add AVX-VNNI-INT16 instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D155145
2023-07-20 14:31:16 +08:00
Danila Malyutin
76fd79b9d5 [X86] Recognize standalone (1 << nbits) - 1 pattern as bzhi
This can be thought of as a subcase of `x & ((1 << nbits) - 1)` where x == -1
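To make the equivalence concrete: BZHI copies its source and clears every bit at or above index n, so with an all-ones source it produces exactly `(1 << nbits) - 1`. A C++ model of the 32-bit form (helper names are hypothetical) also shows one edge case: the C++ shift form is undefined for nbits = 32, while BZHI simply returns the source.

```cpp
#include <cassert>
#include <cstdint>

// Software model of x86 BZHI (32-bit form): copy Src, zeroing every bit at
// position >= N, where N is bits [7:0] of Index; if N >= 32, Src is unchanged.
inline std::uint32_t bzhi32(std::uint32_t Src, std::uint32_t Index) {
  std::uint32_t N = Index & 0xFFu;
  if (N >= 32)
    return Src;
  return Src & ((std::uint32_t{1} << N) - 1);
}

// The standalone pattern: (1 << nbits) - 1, valid for nbits < 32.
inline std::uint32_t lowBitsMask(std::uint32_t NBits) {
  assert(NBits < 32 && "1 << 32 is undefined for a 32-bit shift");
  return (std::uint32_t{1} << NBits) - 1;
}
```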

Differential Revision: https://reviews.llvm.org/D155622
2023-07-20 09:18:23 +03:00
Danila Malyutin
c1013a6eee [X86][AArch64] Add additional extract_lowbits test
Check that vreg_width-1 mask is only removed for shifts

Differential Revision: https://reviews.llvm.org/D155734
2023-07-20 09:18:19 +03:00
Freddy Ye
049d6a3f42 [X86] Add SM4 instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D155148
2023-07-20 13:35:15 +08:00
Freddy Ye
c6f66de21a [X86] Add SM3 instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D155147
2023-07-20 10:24:16 +08:00
Freddy Ye
fc3b7874b6 [X86] Add SHA512 instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D155146
2023-07-20 09:44:44 +08:00
Amara Emerson
ccffc27050 [AArch64][GlobalISel] Widen (<2 x s16> = G_BUILD_VECTOR) to <2 x s32>.
We don't support this as an argument or return type; it's always promoted to <2 x s32>.

Performing the widening prevents us from having selection failures due to unsupported
extends.

Fixes https://github.com/llvm/llvm-project/issues/58274
2023-07-19 16:50:54 -07:00
Craig Topper
7dfe62327d [RISCV] Add a DAG combine for (czero_eq X, (xor Y, 1)) -> (czero_ne X, Y) if Y is 0 or 1.
This is an alternative to D155288 that can handle other sources of
xori like FP compares. Unfortunately, it misses the i64 setge case
on RV32 in condops.ll.
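Why the combine needs Y to be 0 or 1, shown as a C++ model of the Zicond semantics: czero.eqz zeroes the result when the condition is zero, and czero.nez zeroes it when the condition is nonzero. For Y in {0, 1}, `Y ^ 1` is zero exactly when Y is nonzero, so both forms agree; for any other Y they can differ. The helper names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Zicond semantics on XLEN-wide integers:
//   czero.eqz rd, rs1, rs2 : rd = (rs2 == 0) ? 0 : rs1
//   czero.nez rd, rs1, rs2 : rd = (rs2 != 0) ? 0 : rs1
inline std::uint64_t czeroEqz(std::uint64_t X, std::uint64_t Cond) {
  return Cond == 0 ? 0 : X;
}
inline std::uint64_t czeroNez(std::uint64_t X, std::uint64_t Cond) {
  return Cond != 0 ? 0 : X;
}

// (czero_eq X, (xor Y, 1)) vs (czero_ne X, Y): equal whenever Y is 0 or 1.
inline bool combineAgrees(std::uint64_t X, std::uint64_t Y) {
  return czeroEqz(X, Y ^ 1) == czeroNez(X, Y);
}
```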

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D155328
2023-07-19 12:33:08 -07:00
Simon Pilgrim
310a9a4f28 [X86] matchBinaryShuffle - relax PACKSS for v2i64 -> v4i32 shuffle truncation pattern match.
Similar to combineVectorSignBitsTruncation, we don't require all-signbits source inputs, just enough signbits to reach into the lowest i16 to safely use PACKSSDW.
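A scalar C++ model of the PACKSSDW trick, assuming "enough signbits to reach into the lowest i16" means more than 48 sign bits in the i64 lane. Each i64 lane is treated as two i32 halves, each half is saturated to i16, and the two i16 results are read back as one i32. When the high half is pure sign and the low half fits in i16, the saturation is exact and the result equals plain truncation to i32. Helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Signed saturation of an i32 to i16: the per-element operation of PACKSSDW.
inline std::int16_t saturateToI16(std::int32_t V) {
  if (V > INT16_MAX) return INT16_MAX;
  if (V < INT16_MIN) return INT16_MIN;
  return static_cast<std::int16_t>(V);
}

// Number of leading bits equal to the sign bit (like ComputeNumSignBits).
inline unsigned numSignBits(std::int64_t X) {
  std::uint64_t U = static_cast<std::uint64_t>(X);
  std::uint64_t Sign = U >> 63;
  unsigned Count = 0;
  for (int Bit = 63; Bit >= 0 && ((U >> Bit) & 1) == Sign; --Bit)
    ++Count;
  return Count;
}

// Pack one i64 lane as two i32 halves (lo, hi) and reread the i16 pair as i32.
inline std::int32_t packLaneAsI32(std::int64_t X) {
  std::uint64_t U = static_cast<std::uint64_t>(X);
  std::int32_t Lo = static_cast<std::int32_t>(static_cast<std::uint32_t>(U));
  std::int32_t Hi = static_cast<std::int32_t>(static_cast<std::uint32_t>(U >> 32));
  std::uint32_t LoBits = static_cast<std::uint16_t>(saturateToI16(Lo));
  std::uint32_t HiBits = static_cast<std::uint16_t>(saturateToI16(Hi));
  return static_cast<std::int32_t>(LoBits | (HiBits << 16));
}
```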
2023-07-19 18:58:21 +01:00
Momchil Velikov
4c95f79cce [CodeGenPrepare] Refactor optimizeSelectInst (NFC)
Refactor to use BasicBlockUtils functions and make life easier for
a subsequent patch for updating the dominator tree.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D154053
2023-07-19 18:56:44 +01:00
Johannes Doerfert
d015018cb7 [AMDGPUAttributor][FIX] No endless recursion for recursive initializers
Fixes: https://github.com/llvm/llvm-project/issues/63956
2023-07-19 10:27:01 -07:00
Craig Topper
3055c5815a [RISCV] Upgrade Zvfh version to 1.0 and move out of experimental state.
This has been ratified according to https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions

Differential Revision: https://reviews.llvm.org/D155668
2023-07-19 10:03:57 -07:00
Luke Lau
efedcbeeb8 [RISCV] Fold ops into vmv.v.v as vmerge with all-ones mask
A vmv.v.v shares the same encoding as a vmerge that isn't masked, so we can
also fold it into its operands if we treat it as a vmerge with an all-ones
mask. We take care here not to actually transform the existing vmv into a
vmerge; otherwise, things like True.hasOneUse() become inaccurate. Instead, this
just returns an equivalent list of operands.

This is an alternative to D153351.
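An element-wise C++ model of that observation, assuming hypothetical 8-element registers and tail-undisturbed semantics: a vmerge whose mask is all ones always selects the "true" operand, which is exactly what vmv.v.v copies.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumElts = 8; // hypothetical elements per register

using Vec = std::array<int, kNumElts>;
using Mask = std::array<bool, kNumElts>;

// vmerge.vvm: for i < VL pick True[i] where the mask is set, else False[i];
// tail elements (i >= VL) are kept from Passthru.
inline Vec vmerge(const Vec &Passthru, const Vec &False, const Vec &True,
                  const Mask &M, std::size_t VL) {
  assert(VL <= kNumElts);
  Vec Out = Passthru;
  for (std::size_t I = 0; I < VL; ++I)
    Out[I] = M[I] ? True[I] : False[I];
  return Out;
}

// vmv.v.v: copy Src for i < VL; tail elements are kept from Passthru.
inline Vec vmv_v_v(const Vec &Passthru, const Vec &Src, std::size_t VL) {
  assert(VL <= kNumElts);
  Vec Out = Passthru;
  for (std::size_t I = 0; I < VL; ++I)
    Out[I] = Src[I];
  return Out;
}
```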

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D155101
2023-07-19 17:24:42 +01:00
Luke Lau
0f277ab361 [RISCV] Fold vmerge into its ops with smaller VL if known
Currently, when folding vmerge into its operands, we stop if the VLs aren't
identical. However, since the body of (vmerge (vop)) is the intersection of
vmerge's and vop's bodies, we can use the smaller of the two VLs if we know it
ahead of time. This patch relaxes the constraint on VL if both are
constants, or if either of them is VLMAX.
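The relaxed rule can be sketched as a small C++ function over a hypothetical VL representation (a constant, the VLMAX sentinel, or an opaque register). Identical VLs and the two cases named above resolve to the smaller VL; anything else still gives up. This is an illustration, not the actual RISC-V ISel code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical VL operand: a known constant, the VLMAX sentinel, or an
// unknown value held in a register (identified by a register id).
struct VLOperand {
  enum Kind { Constant, VLMax, Register } K;
  std::uint64_t Value; // constant VL or register id; unused for VLMax
};

// Smaller of two VLs when it can be decided statically, else nullopt.
inline std::optional<VLOperand> smallestVL(VLOperand A, VLOperand B) {
  if (A.K == B.K && A.Value == B.Value) // identical: the old requirement
    return A;
  if (A.K == VLOperand::VLMax) // per the commit: VLMAX defers to the other VL
    return B;
  if (B.K == VLOperand::VLMax)
    return A;
  if (A.K == VLOperand::Constant && B.K == VLOperand::Constant)
    return VLOperand{VLOperand::Constant, std::min(A.Value, B.Value)};
  return std::nullopt; // e.g. two different registers: cannot fold
}
```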

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D155071
2023-07-19 17:24:40 +01:00
Luke Lau
66dc29a82a [RISCV] Add tests for merges with differing VLs that could be folded
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D155069
2023-07-19 17:24:38 +01:00
Simon Pilgrim
db50b77ed4 [X86] matchBinaryShuffle - match PACKSS for v2i64 -> v4i32 all-signbits shuffle truncation patterns.
Ideally matchShuffleWithPACK should be able to handle this, but it needs a major rewrite to handle illegal types.
2023-07-19 17:02:11 +01:00
Djordje Todorovic
80e20c8a8d [RISCV] Add DAG combine for CTTZ/CTLZ in the case of input 0
Within the AggressiveInstCombine pass we have an analysis/optimization that
matches the table-based CTZ pattern. Some targets do not support or define
ctz(0), but since AggressiveInstCombine is just an extension of InstCombine,
it should be a target-independent canonicalization pass. We therefore decided
to introduce several instructions, such as a select and a compare, that
produce canonical IR even if the input is 0. Targets that do support that
input are then responsible for handling such a case and producing optimal
assembly.

This patch optimizes the CTTZ/CTLZ instructions for an input of 0 by
performing a `DAG combine` that generates the cttz(x) & 0x1f pattern (and
likewise for ctlz).
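A C++ check of why `cttz(x) & 0x1f` is a valid replacement. It assumes the canonical IR has the shape `select (x == 0), 0, cttz(x)` (a table that maps input 0 to 0) and that cttz(0) is defined as the bit width, as Zbb's ctzw returns 32. Since 32 & 0x1f is 0 and every other count is below 32, the mask reproduces the select without a branch. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// cttz for i32 with the result for 0 defined as the bit width (32).
inline std::uint32_t cttz32(std::uint32_t X) {
  if (X == 0)
    return 32;
  std::uint32_t N = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++N;
  }
  return N;
}

// Assumed canonical IR shape: select (x == 0), 0, cttz(x).
inline std::uint32_t selectForm(std::uint32_t X) {
  return X == 0 ? 0 : cttz32(X);
}

// Branch-free form produced by the combine: cttz(x) & 0x1f.
inline std::uint32_t maskedForm(std::uint32_t X) { return cttz32(X) & 0x1f; }
```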

Differential Revision: https://reviews.llvm.org/D151449
2023-07-19 16:22:04 +02:00
Simon Pilgrim
6cf8bde056 [X86] getFauxShuffleMask - add SIGN_EXTEND_VECTOR_INREG handling for all-signbits sources
Add support for shuffle combines (via combineEXTEND_VECTOR_INREG) to begin from SIGN_EXTEND_VECTOR_INREG nodes
2023-07-19 14:32:34 +01:00
Simon Pilgrim
32ed3031fa [X86] Add test coverage for Issue #63946 2023-07-19 14:05:13 +01:00
Simon Pilgrim
70893b62cf [X86] matchUnaryShuffle - match SIGN_EXTEND_VECTOR_INREG patterns for 'all-signbits' sources
Adapt the existing ANY/ZERO_EXTEND_VECTOR_INREG shuffle matching to also recognise SIGN_EXTEND_VECTOR_INREG patterns to handle cases where we're effectively "splatting" all-signbits sources.
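For an all-signbits source every element is 0 or -1, so sign-extending it just repeats the element into the upper half. A C++ model of SIGN_EXTEND_VECTOR_INREG from v4i32 to v2i64, viewed as i32 lanes (little-endian assumed), matches the shuffle mask <0, 0, 1, 1> in that case but not otherwise. Function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4I32 = std::array<std::int32_t, 4>;

// SIGN_EXTEND_VECTOR_INREG v4i32 -> v2i64, written back as i32 lanes:
// lane 2*J is the low half and lane 2*J+1 the high half of i64 element J.
// Only source elements 0 and 1 are used.
inline V4I32 sextInRegLanes(const V4I32 &Src) {
  V4I32 Out{};
  for (int J = 0; J < 2; ++J) {
    std::uint64_t U = static_cast<std::uint64_t>(std::int64_t{Src[J]});
    Out[2 * J] = static_cast<std::int32_t>(static_cast<std::uint32_t>(U));
    Out[2 * J + 1] = static_cast<std::int32_t>(static_cast<std::uint32_t>(U >> 32));
  }
  return Out;
}

// Equivalent shuffle when elements 0 and 1 are each 0 or -1.
inline V4I32 splatShuffle(const V4I32 &Src) {
  return {Src[0], Src[0], Src[1], Src[1]};
}
```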
2023-07-19 14:05:13 +01:00
John Brawn
cee7e7b245 [ARM] Correctly handle execute-only in EmitStructByval
Currently, when compiling for an execute-only target without movt,
EmitStructByval generates a constant pool load, which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.

Differential Revision: https://reviews.llvm.org/D154944
2023-07-19 13:56:36 +01:00
John Brawn
1b12b1a335 [ARM] Restructure MOVi32imm expansion to not do pointless instructions
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.

Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.
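A C++ sketch of the restructured Thumb1 expansion as described: split the constant into bytes, drop leading zero bytes, skip the `add` for an interior zero byte, and fold that byte's 8-bit shift into the next `lsl`. The `Op` type and function names are hypothetical, real Thumb1 code would use the flag-setting movs/lsls/adds, and LLVM's actual implementation may order things differently. A tiny interpreter confirms that each sequence rebuilds the constant.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical Thumb1-style ops (not LLVM's MachineInstr API).
struct Op {
  enum Kind { Mov, Lsl, Add } K;
  std::uint32_t Imm; // byte immediate for Mov/Add, shift amount for Lsl
};

// Emit mov/lsl/add per byte, skipping zero bytes but keeping their shift.
inline std::vector<Op> expandThumb1(std::uint32_t Value) {
  std::vector<Op> Ops;
  int Top = 3;
  while (Top > 0 && ((Value >> (8 * Top)) & 0xFF) == 0)
    --Top; // leading zero bytes need no instructions at all
  Ops.push_back({Op::Mov, (Value >> (8 * Top)) & 0xFF});
  std::uint32_t PendingShift = 0;
  for (int B = Top - 1; B >= 0; --B) {
    PendingShift += 8;
    std::uint32_t Byte = (Value >> (8 * B)) & 0xFF;
    if (Byte == 0)
      continue; // no pointless "add #0"; its shift joins the next lsl
    Ops.push_back({Op::Lsl, PendingShift});
    Ops.push_back({Op::Add, Byte});
    PendingShift = 0;
  }
  if (PendingShift != 0)
    Ops.push_back({Op::Lsl, PendingShift});
  return Ops;
}

// Execute a sequence to check that it rebuilds the original constant.
inline std::uint32_t interpret(const std::vector<Op> &Ops) {
  std::uint32_t R = 0;
  for (const Op &O : Ops) {
    switch (O.K) {
    case Op::Mov: R = O.Imm; break;
    case Op::Lsl: R <<= O.Imm; break;
    case Op::Add: R += O.Imm; break;
    }
  }
  return R;
}

// movw/movt form: the movt is only needed when the high halfword is nonzero.
inline unsigned movwMovtCount(std::uint32_t Value) {
  return (Value >> 16) != 0 ? 2 : 1;
}
```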

Differential Revision: https://reviews.llvm.org/D154943
2023-07-19 13:56:36 +01:00
Ties Stuij
84f888ca82 [ARM] don't emit constant pool for Thumb1 XO/stack guard combo
Currently for armv6-m and armv8-m.baseline, we emit constant pool code when we
use execute-only (XO) in combination with stack guards.

XO is a new feature for armv6-m, and this patch is part of a series of patches
that substitutes constant pool generation with the tMOVi32imm equivalent.

However, XO for armv8-m.baseline has been available for about 6 years, so
for armv8-m.baseline this is a bug fix.

Reviewed By: simonwallis2, olista01

Differential Revision: https://reviews.llvm.org/D155170
2023-07-19 13:51:43 +01:00