41974 Commits

Author SHA1 Message Date
Sanjay Patel
267400c9b0 [x86] add tests for fmul/fdiv with identity constant in select arm; NFC 2022-02-01 15:43:28 -05:00
Sanjay Patel
8191472246 [x86] add more tests for select with identity constant; NFC
D118644
2022-02-01 15:43:27 -05:00
Stanislav Mekhanoshin
79606ee85c [AMDGPU] Check atomics aliasing in the clobbering annotation
MemorySSA considers any atomic a def to any operation it dominates
just like a barrier or fence. That is correct from memory state
perspective, but not required for the no-clobber metadata since
we are not using it for reordering. Skip such atomics during the
scan just like a barrier if it does not alias with the load.

Differential Revision: https://reviews.llvm.org/D118661
2022-02-01 12:33:25 -08:00
David Green
c89cfbd4dd Revert "[DAG] Extend SearchForAndLoads with any_extend handling"
This reverts commit 100763a88fe97b22cd5e3f69d203669aac3ed48f as it was
making incorrect assumptions about implicit zero_extends.
2022-02-01 20:18:40 +00:00
Stanislav Mekhanoshin
c2b18a3cc5 [AMDGPU] Allow scalar loads after barrier
Currently we cannot convert a vector load into scalar if there
is dominating barrier or fence. It is considered a clobbering
memory access to prevent memory operations reordering. While
reordering is not possible the actual memory is not being clobbered
by a barrier or fence and we can still use a scalar load for a
uniform pointer.

The solution is not to bail on a first clobbering access but
traverse MemorySSA to the root excluding barriers and fences.

Differential Revision: https://reviews.llvm.org/D118419
2022-02-01 11:43:17 -08:00
Chris Bieneman
7a0cbe11fb [NFC] These tests require a default target
These test cases all rely on a default target being specified. Adding
the requirement gets the tests properly skipped when
LLVM_DEFAULT_TARGET_TRIPLE is unset.
2022-02-01 13:18:39 -06:00
Fangrui Song
1494d064fa [AMDGPU][test] Add dso_local to prevent preemptible alias resolution 2022-02-01 10:23:45 -08:00
David Green
c40744d4d6 [AArch64] Add some CCMP testing. NFC 2022-02-01 18:15:34 +00:00
Krzysztof Parzyszek
c935f6e048 [Hexagon] Punt on registers without reaching defs in addr mode opt
This fixes https://github.com/llvm/llvm-project/issues/52636.
2022-02-01 09:52:59 -08:00
Craig Topper
7eb7810727 [RISCV] Fix a vsetvli insertion bug involving loads/stores.
The first phase of the analysis can avoid a vsetvli if an earlier
instruction in the block used an SEW and LMUL that when combined with
the EEW of the load/store would produce the desired EMUL. If we
avoided a vsetvli this will affect the global analysis we do in the
second phase.

The third phase where we really insert the vsetvlis needs to agree
with the first phase. If it doesn't we can insert vsetvlis that
invalidate the global analysis.

In the test case there is a VSETVLI in the preheader that sets
SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1.
This VADD is followed by a store that wants wants SEW=32 LMUL=1/2.
Because it has EEW=32 as part of the opcode the SEW=64 LMUL=1 from the
VADD can be become EMUL=1 for the store. So the first phase determines no
vsetvli is needed.

The third phase manages CurInfo differently than BBInfo.Change from the
first phase. CurInfo is only updated when we see a vsetvli or insert
a vsetvli. This was done to allow predecessor block information from
the global analysis to be applied to multiple instructions. Since the
loop body has no vsetvli we won't update CurInfo for either the VADD
or the VSE. This prevented us from checking the store vsetvli elision
for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which
invalidated the global analysis.

To mitigate this, I've added a BBLocalInfo variable that more closely
matches the first phase propagation. This gets updated based on the
VADD and prevents emitting a vsetvli for the store like we did in the
first phase.

I wonder if we should do an earlier phase to handle the load/store case
by adding more pseudo opcodes and changing the SEW/LMUL for those
instructions before the insertion analysis. That might be more robust
than trying to guarantee two phases make the same decision.

Fixes the test from D118629.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D118667
2022-02-01 07:29:01 -08:00
David Green
d9b4577c45 [AArch64] Add signed version of uaddlv test. NFC 2022-02-01 14:51:23 +00:00
Amy Kwan
0d6e64755a [PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert.
This patch updates the P10 patterns with a load feeding into an insertelt to
utilize the refactored load and store infrastructure, as well as updating any
tests that exhibit any codegen changes.

Furthermore, custom legalization is added for v4f32 on Power9 and above to not
only assist with adjusting the refactored load/stores for P10 vector insert,
but also it enables the utilization of direct moves.

Differential Revision: https://reviews.llvm.org/D115691
2022-02-01 08:48:37 -06:00
Nikita Popov
a1dc6d4b83 [AArch64] Do not use ABI alignment for mops.memset.tag
Pointer element types do not imply that the pointer is ABI aligned.
We should be using either an explicit align attribute here, or fall
back to an alignment of 1. This fixes a new element type access
introduced in D117764.

I don't think this makes any practical difference though, as the
lowering does not depend on alignment.

Differential Revision: https://reviews.llvm.org/D118681
2022-02-01 14:37:53 +01:00
Alexander Shaposhnikov
d03076223b [CodeGen][AArch64] Fix typo in legalizer-info-validation.mir 2022-02-01 12:14:25 +00:00
Alexander Shaposhnikov
80c27fbf94 [CodeGen][AArch64] Fix typo in arm64-zero-cycle-zeroing.ll 2022-02-01 12:08:06 +00:00
Simon Pilgrim
d83a96f59f [DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only
As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.
2022-02-01 11:32:14 +00:00
Fraser Cormack
e9ceeedf30 [RISCV][3/3] Switch undef -> poison in scalable-vector RVV tests 2022-02-01 11:06:56 +00:00
Fraser Cormack
8d1169cf74 [RISCV][2/3] Switch undef -> poison in fixed-vector RVV tests 2022-02-01 11:06:56 +00:00
Fraser Cormack
414f21ed23 [RISCV][1/3] Switch undef -> poison in VP RVV tests
Inspired by a recent Discourse post on undef vs. poison usage, this
series of patches should reduce the number of undefs in LLVM tests by
around 10%.

Only undef vector operands to insertelement/shufflevector have been
handled, which are by far the most common we've got.

The switchover is split into 3 fairly arbitrary clusters to make it
slightly more manageable: vector predication, fixed-length vectors,
scalable vectors.
2022-02-01 11:06:55 +00:00
Nikita Popov
ccda3d4ec1 [AArch64] Regenerate test checks (NFC)
The check lines were in the wrong order.
2022-02-01 11:55:02 +01:00
Fraser Cormack
b00bce2a93 [RISCV] Add a test showing an incorrect VSETVLI insertion
This test shows a loop, whose preheader uses a SEW=64, LMUL=1 vector
operation. The loop body starts off with another SEW=64, LMUL=1 VADD
vector operation, before switching to a SEW=32, LMUL=1/2 vector store
instruction.

We can see that the VSETVLI insertion pass omits a VSETVLI before the
VADD (thinking it inherits its configuration from the preheader) but
does place a SEW=32, LMUL=1/2 VSETVLI before the store. This results in
a miscompilation as when the loop comes back around, the VADD is
incorrectly configured with SEW=32, LMUL=1/2.

It appears to be a bad load/store optimization, as replacing the vector
store with an SEW=32, LMUL=1/2 VADD does correctly insert a VSETVLI. The
issue is therefore possibly arising from canSkipVSETVLIForLoadStore.

Differential Revision: https://reviews.llvm.org/D118629
2022-02-01 10:21:29 +00:00
Bjorn Pettersson
3885879046 [DAGCombine] Add simple folds for SSHLSAT/USHLSAT
Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT
and USHLSAT DAG nodes.

This includes folds such as:
  (shlsat undef/poison, x) -> 0
  (shlsat x, undef/poison) -> undef
  (shlsat x, too_large_shamt) -> undef
  (shlsat 0, x) -> 0
  (shlsat x, 0) -> x
  (shlsat c1, c2) -> c3

Differential Revision: https://reviews.llvm.org/D118603
2022-02-01 10:51:35 +01:00
Bjorn Pettersson
06105f2ef1 Pre-commit test cases missing SSHLSAT/USHLSAT folds. NFC 2022-02-01 10:51:35 +01:00
David Sherwood
daa80339df [CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.

Differential Revision: https://reviews.llvm.org/D117210
2022-02-01 09:50:00 +00:00
Jay Foad
d2e5d3512b [StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert
predicates. Add a quick loop to clean these up afterwards if we can get
away with modifying an existing compare instruction instead.
(StructurizeCFG is generally run late in the pipeline so instcombine
does not clean them up for us.)

Differential Revision: https://reviews.llvm.org/D118623
2022-02-01 09:35:37 +00:00
Changpeng Fang
1194b9cdda AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary:
  Add code object v5 support (deafult is still v4)
  Generate metadata for implicit kernel args for the new ABI
  Set the metadata version to be 1.2

Reviewers:
  t-tye, b-sumner, arsenm, and bcahoon

Fixes:
  SWDEV-307188, SWDEV-307189

Differential Revision:
  https://reviews.llvm.org/D118272
2022-01-31 18:07:47 -08:00
Mircea Trofin
9aa2c914b9 [mlgo][regalloc] Factor live interval feature calculation
Factoring it out so we can subsequently cache it. This should be a NFC,
however, for the float quantities, we see small errors in the least
significant digits. This is because, before, we were summing up one by
one. Now, we sum up results of sums.

This shouldn't matter for ML, and will require rework when we do
quantization (avoiding floats altogether), but meanwhile, it did require
an update to the reference file used for testing.

The patch also bumps the precision of the variables involved in this, to
reduce the error (note they are casted back to float at the end by the
SET macro, since we only work with float and not double in TF)

Differential Revision: https://reviews.llvm.org/D118659
2022-01-31 15:19:15 -08:00
tyb0807
5aa08bf708 [AArch64][SelectionDAG] CodeGen for Armv8.8/9.3 MOPS
New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc.
Each intrinsic is translated to one of these in SelectionDAGBuilder
via EmitTargetCodeForMOPS.

A custom lowering routine for INTRINSIC_W_CHAIN is added to handle
llvm.aarch64.mops.memset.tag. This takes a separate path from the common
intrinsics but ultimately ends up in the same EmitMOPS().

This is part 4/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.

Patch by Tomas Matheson, Lucas Prates and Son Tuan Vu.

Differential Revision: https://reviews.llvm.org/D117764
2022-01-31 20:56:27 +00:00
tyb0807
78fd413cf7 [AArch64][GlobalISel] CodeGen for Armv8.8/9.3 MOPS
This implements codegen for Armv8.8/9.3 Memory Operations extension (MOPS).
Any memcpy/memset/memmov intrinsics will always be emitted as a series
of three consecutive instructions P, M and E which perform the
operation. The SelectionDAG implementation is split into a separate
patch.

AArch64LegalizerInfo will now consider the following generic opcodes
if +mops is available, instead of legalising by expanding them to
libcalls: G_BZERO, G_MEMCPY_INLINE, G_MEMCPY, G_MEMMOVE, G_MEMSET
The s8 value of memset is legalised to s64 to match the pseudos.

AArch64O0PreLegalizerCombinerInfo will still be able to combine
G_MEMCPY_INLINE even if +mops is present, as it is unclear whether it is
better to generate fixed length copies or MOPS instructions for the
inline code of small or zero-sized memory operations, so we choose to be
conservative for now.

AArch64InstructionSelector will select the above as new pseudo
instructions: AArch64::MOPSMemory{Copy/Move/Set/SetTagging} These are
each expanded to a series of three instructions (e.g. SETP/SETM/SETE)
which must be emitted together during code emission to avoid scheduler
reordering.

This is part 3/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.

Patch by Tomas Matheson and Son Tuan Vu

Differential Revision: https://reviews.llvm.org/D117763
2022-01-31 20:54:41 +00:00
Mircea Trofin
afbc7bdf98 [mlgo][regalloc][test] Add comprehensive log output testing 2022-01-31 12:46:18 -08:00
Sam Clegg
3e230d15eb Revert "[WebAssembly] Refactor and fix emission of external IR global decls"
This reverts commit 00bf4755e90c89963a135739218ef49c2417109f.

This change broke the emscripten builder (among other things):

https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8823500584349280721/overview

Sample failure:

```
test_unistd_unlink (test_core.core0) ...
wasm-ld: error: symbol type mismatch: __stdio_write
>>> defined as WASM_SYMBOL_TYPE_FUNCTION in /usr/local/google/home/sbc/dev/wasm/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a(__stdio_write.o)
>>> defined as WASM_SYMBOL_TYPE_DATA in /usr/local/google/home/sbc/dev/wasm/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a(stderr.o)
```
2022-01-31 12:20:56 -08:00
Sanjay Patel
06fd721fe7 [x86] add tests for binop of select with identity constant; NFC 2022-01-31 15:08:00 -05:00
Paul Walker
bcda4c48c8 [SVE] By using SEL when orring predicates we forgo the need for a PTRUE.
Differential Revision: https://reviews.llvm.org/D118463
2022-01-31 19:39:23 +00:00
Paul Walker
804915f5dc [SVE] Extend isel pattern coverage for INCP & DECP.
Adds patterns for:
    add(x, cntp(p, p)) -> incp(x, p)
    sub(x, cntp(p, p)) -> decp(x, p)

Differential Revision: https://reviews.llvm.org/D118567
2022-01-31 19:05:05 +00:00
Florian Hahn
23091f7d50
[AArch64] Bail out for float operands in SetCC optimization.
The optimization added in D118139 causes a crash on the added test case
while trying to zero extend an vector of floats.

Fix the crash by bailing out for floating point operands.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D118615
2022-01-31 18:20:47 +00:00
Craig Topper
aae947e860 [RISCV] Separate the Zfhmin and Zfh extensions.
The spec doesn't seem to be written as if Zfh implies Zfhmin. They
seem to be separate extensions.

This patch moves the instructions from Zfhmin to be enabled with
either the Zfh or Zfhmin extensions.

Reviewed By: achieveartificialintelligence

Differential Revision: https://reviews.llvm.org/D118581
2022-01-31 09:06:43 -08:00
Jay Foad
8faad29634 Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.

Apparently it is not safe to modify the condition even if it passes the
hasOneUse test, because StructurizeCFG might have other references to
the condition that are not manifest in the IR use-def chains.
2022-01-31 14:55:36 +00:00
Kerry McLaughlin
002b944dfa [SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca()
Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca()
when we call this function for a scalable alloca instruction, caused
by the implicit conversion of TySize to uint64_t.
This patch changes TySize to a TypeSize as returned by getTypeAllocSize()
and ensures the allocation size is multiplied by vscale for scalable vectors.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D118372
2022-01-31 14:37:23 +00:00
Dávid Bolvanský
ae990a3cbd [Analysis] Attribute noundef should not prevent tail call optimization
Very similar to https://reviews.llvm.org/D101230
Fixes https://github.com/llvm/llvm-project/issues/53501
2022-01-31 15:13:52 +01:00
Simon Pilgrim
7ec8fc2932 [X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.

This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
2022-01-31 13:58:00 +00:00
Simon Pilgrim
2d1390efbe [DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero 2022-01-31 12:00:51 +00:00
Simon Pilgrim
48f45f6b25 [X86] Limit mul(x,x) knownbits tests with not undef/poison check
We can only assume bit[1] == zero if its the only demanded bit or the source is not undef/poison
2022-01-31 11:55:10 +00:00
Jay Foad
ae68b3a457 [AMDGPU] Add test for a problem with noclobber metadata
If AMDGPUAnnotateUniformValues finds a load from a uniform pointer with
no potentially clobbering stores between the kernel entry point and the
load instruction, it adds noclobber metadata to the *address*. This is
unsafe because it can get applied to other loads in the same which do
have aliasing stores.

Differential Revision: https://reviews.llvm.org/D118458
2022-01-31 11:09:34 +00:00
Simon Pilgrim
ffd0e464b4 [X86] Add mul(x,x) tests showing miscompile
As raised by @efriedma on D117995 - the source must not be undef/poison to demand any bits in mul(x,x) other than bit[1]

https://alive2.llvm.org/ce/z/Cxkjen
2022-01-31 11:07:03 +00:00
Jay Foad
a6b54ddaba [Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an
xor i1 instruction, and it since it generally runs late in the pipeline,
instcombine does not clean up the xor-of-cmp pattern.

Differential Revision: https://reviews.llvm.org/D118478
2022-01-31 10:44:17 +00:00
Paulo Matos
00bf4755e9 [WebAssembly] Refactor and fix emission of external IR global decls
This patches fixes the visibility and linkage information of symbols
referring to IR globals.

Emission of external declarations is now done in the first execution
of emitConstantPool rather than in emitLinkage (and a few other
places). This is the point where we have already gathered information
about used symbols (by running the MC Lower PrePass) and not yet
started emitting any functions so that any declarations that need to
be emitted are done so at the top of the file before any functions.

This changes the order of a few directives in the final asm file which
required an update to a few tests.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D118122
2022-01-31 11:42:21 +01:00
Craig Topper
e1075186a6 [RISCV] Custom lower brev8 intrinsic to RISCVISD::GREV.
We can use the RISCVISD::GREV encoding that swaps the bits in
each byte.  This allows it to use the existing computeKnownBits
support for RISCVISD::GREV.
2022-01-30 12:41:09 -08:00
Simon Pilgrim
156f83adc2 [X86] combineVectorTruncation - use PACKUSDW(BLENDW(X,0),BLENDW(Y,0)) for v8i32->v8i16 truncation
Limit this to SSE41 - AVX1 targets to avoid UNPCKL(PSHUFB,PSHUFB), pre-SSE41 we don't have PACKUSDW/BLENDW and with AVX2 we can perform this as PERMQ(PSHUFB()).
2022-01-30 20:07:04 +00:00
Simon Pilgrim
2cdbaca394 [X86] Attempt to fold MOVMSK(CMPEQ(AND(X,C1),0)) -> MOVMSK(NOT(SHL(X,C2)))
Allows pow2 mask tests to avoid an unnecessary constant load.

Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.
2022-01-30 15:53:21 +00:00
Simon Pilgrim
4e3ba526bf [X86] Add tests showing failure to fold MOVMSK(CMPEQ(AND(X,C1),0)) -> MOVMSK(NOT(SHL(X,C2)))
This would allow pow2 mask tests to avoid an unnecessary constant load.

Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.
2022-01-30 15:42:59 +00:00