52796 Commits

Author SHA1 Message Date
Joseph Huber
ffabcbcf8f [NVVMReflect][Reland] Force dead branch elimination in NVVMReflect (#81189)
Summary:
The `__nvvm_reflect` function is used to guard invalid code that varies
between architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The `__nvvm_reflect` pass is
already mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not related to the `__nvvm_reflect` call in
order to preserve `O0` semantics as much as possible. This should allow
the following to work on NVPTX targets

```c
int foo() {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm("valid;\n");
}
```

Relanding after fixing a bug.
2024-02-08 20:09:44 -06:00
Joseph Huber
0800a36053 Revert "[NVVMReflect] Force dead branch elimination in NVVMReflect (#81189)"
This reverts commit 9211e67da36782db44a46ccb9ac06734ccf2570f.

Summary:
This seemed to crash on one of the CUDA math tests. Revert until it can
be fixed.
2024-02-08 17:32:04 -06:00
Joseph Huber
9211e67da3
[NVVMReflect] Force dead branch elimination in NVVMReflect (#81189)
Summary:
The `__nvvm_reflect` function is used to guard invalid code that varies
between architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The `__nvvm_reflect` pass is
already mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not related to the `__nvvm_reflect` call in
order to preserve `O0` semantics as much as possible. This should allow
the following to work on NVPTX targets

```c
int foo() {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm("valid;\n");
}
```
2024-02-08 17:16:31 -06:00
Alex MacLean
9affa177b5
[NVPTX] Add support for calling aliases (#81170)
The current implementation of aliases tries to remove all the aliases in
the module to prevent the generic version of `AsmPrinter` from emitting
them incorrectly. Unfortunately, if the aliases are used this will fail.
Instead let's override the function to print aliases directly.

In addition, the declarations of the alias functions must occur before
the uses. To fix this we emit alias declarations as part of
`emitDeclarations` and only emit the `.alias` directives at the end
(where we can assume the aliasee has also already been declared).
2024-02-08 17:14:13 -06:00
Luke Lau
06c89bd59c
[RISCV] Check type is legal before combining mgather to vlse intrinsic (#81107)
Otherwise we will crash since target intrinsics don't have their types
legalized. Let the mgather get legalized first, then do the combine on
the legal type.
Fixes #81088

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2024-02-09 06:51:11 +08:00
Philip Reames
b8545e1ece
[RISCV] Consider all subvector extracts within a single VREG cheap (#81032)
This adjusts the isSubVectorExtractCheap callback to consider any
extract which fits entirely within the first VLEN bits of the src vector
(and uses a 5 bit immediate for the slide) as cheap. These can be done
via a single m1 vslidedown.vi instruction.

This allows our generic DAG combine logic to kick in and recognize a few
more cases where shuffle source is longer than the dest, but that using
a wider shuffle is still profitable. (Or as shown in the test diff, we
can split the wider source and do two narrower shuffles.)
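
The cheapness condition described above can be modelled as a small predicate. This is only a sketch, not the actual `isSubVectorExtractCheap` hook: the function name and parameters are hypothetical, and `vlenBits` stands for whatever minimum VLEN the target guarantees.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the rule in the commit message:
// the extracted elements must lie entirely within the first VLEN bits
// of the source, and the start index must fit a 5-bit immediate.
bool extractLooksCheap(unsigned index, unsigned numElts, unsigned eltBits,
                       unsigned vlenBits) {
  bool fitsInFirstVReg = (index + numElts) * eltBits <= vlenBits;
  bool fitsImm5 = index < 32; // 5-bit unsigned immediate: 0..31
  return fitsInFirstVReg && fitsImm5;
}
```

For example, with an assumed VLEN of 256, extracting 4 x i32 at index 4 would qualify, while the same extract at index 8 would not.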
2024-02-08 12:15:33 -08:00
Philip Reames
d0f72f8860
[RISCV] Consider truncate semantics in performBUILD_VECTORCombine (#81168)
Fixes https://github.com/llvm/llvm-project/issues/80910.

Per the documentation in ISDOpcodes.h, for BUILD_VECTOR "The types of
the operands must match the vector element type, except that integer
types are allowed to be larger than the element type, in which case the
operands are implicitly truncated."

This transform was assuming that the scalar operand type matched the
result type. This resulted in essentially performing a truncate before a
binop, instead of after. As demonstrated by the test case changes, this
is often not legal.
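
To see why the order of truncation and binop matters, here is a minimal scalar sketch. The choice of a logical right shift as the binop and the 32-bit to 8-bit widths are assumptions made for illustration; the commit message does not say which operations the test changes cover.

```cpp
#include <cassert>
#include <cstdint>

// Implicit truncation of a wide BUILD_VECTOR operand to an 8-bit element.
uint8_t truncToElt(uint32_t v) { return static_cast<uint8_t>(v); }

// Binop on the wide operand, truncated afterwards (the correct order).
uint8_t shiftThenTrunc(uint32_t v, unsigned amt) {
  return truncToElt(v >> amt);
}

// Truncate first, then apply the binop on the narrow value (the buggy order).
uint8_t truncThenShift(uint32_t v, unsigned amt) {
  return static_cast<uint8_t>(truncToElt(v) >> amt);
}
```

With `v = 0x100` and a shift by 1, the first order yields `0x80` and the second yields `0x00`, because the set bit is discarded before it can be shifted into the low byte.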
2024-02-08 11:28:06 -08:00
alex-t
88e52511ca
[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)
This change implements synthesizing the private buffer resource
descriptor in the kernel prolog instead of using the preloaded kernel
argument.
2024-02-08 20:27:36 +01:00
Philip Reames
c8d431e0ed [riscv] Add test coverage in advance of an upcoming fix
This is a reduced test case for a fix for the issue identified in
https://github.com/llvm/llvm-project/issues/80910.
2024-02-08 09:45:57 -08:00
Simon Pilgrim
bef25ae297 [X86] X86FixupVectorConstants - use explicit register bitwidth for the loaded vector instead of using constant pool bitwidth
Fixes #81136 - we might be loading from a constant pool entry wider than the destination register bitwidth, affecting the vextload scale calculation.

ConvertToBroadcastAVX512 doesn't yet set an explicit bitwidth (it will default to the constant pool bitwidth) due to difficulties in looking up the original register width through the fold tables, but as we only use rebuildSplatCst this shouldn't cause any miscompilations, although it might prevent folding to broadcast if only the lower bits match a splatable pattern.
2024-02-08 17:39:19 +00:00
Simon Pilgrim
eb85c8edf5 [X86] Add test case for #81136 2024-02-08 16:35:13 +00:00
Ivan Kosarev
7d19dc50de
[AMDGPU][True16] Support VOP3 source DPP operands. (#80892) 2024-02-08 16:23:00 +00:00
ostannard
5452cbc4a6
[AArch64] Indirect tail-calls cannot use x16 with pac-ret+pc (#81020)
When using -mbranch-protection=pac-ret+pc, x16 is used in the function
epilogue to hold the address of the signing instruction. This is used by
a HINT instruction which can only use x16, so we can't change this. This
means that we can't use it to hold the function pointer for an indirect
tail-call.

There is existing code to force indirect tail-calls to use x16 or x17
when BTI is enabled, so there are now 4 combinations:

bti  pac-ret+pc  Valid function pointer registers
off  off         Any non callee-saved register
on   off         x16 or x17
off  on          Any non callee-saved register except x16
on   on          x17
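
The table can be read as a simple lookup. The helper below is a hypothetical illustration that only restates the four rows; it is not how the backend represents register classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the table row for a given combination of
// BTI and pac-ret+pc, as described in the commit message.
std::string validTailCallPointerRegs(bool bti, bool pacRetPC) {
  if (bti && pacRetPC)
    return "x17";
  if (bti)
    return "x16 or x17";
  if (pacRetPC)
    return "any non callee-saved register except x16";
  return "any non callee-saved register";
}
```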
2024-02-08 15:31:54 +00:00
Evgeniy
49ee2ffc65
[X86][GlobalISel] Reorganize br/brcond tests (NFC) (#80204)
Remove duplicate tests under GlobalISel and consolidate them so that the
checks run with all three selectors.
2024-02-08 15:36:22 +05:30
Pierre van Houtryve
9ff3b82948
[AMDGPU] Revert Metadata Version Upgrade (#80995)
Metadata is still 1.2, not 1.3 after V6.
I thought that amdhsa.version mapped to the COV version but it's
separate, and there are no MD changes in V6, hence it doesn't need to be
updated.
2024-02-08 08:30:59 +01:00
David Green
7da1dda01e [AArch64][GlobalISel] Update GISel check line and regenerate tests. NFC 2024-02-08 02:58:09 +00:00
Luke Lau
ece66dbc60
[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR (#80452)
This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.

The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from https://reviews.llvm.org/D128159,
and even though the original patch didn't end up landing this case due to it
not having any test difference, I've included it here for completeness's sake.

The second case handles the case when we have an upper bound on vscale_range.
We can use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which hi bits
are zero.

On its own, computing the known hi bits results in some small improvements for
RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping
to be able to use this later to reduce the LMUL in index calculations for
vrgather/indexed accesses.
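
Both cases can be sketched on plain 64-bit integers. This is a simplified model, not the `computeKnownBits` code: the function name is hypothetical, elements are taken to be `i * step` for `0 <= i < maxElts`, the multiplication is assumed not to overflow, and the low-bit case is written in terms of trailing zeros of the step (which coincides with the power-of-two case in the message).

```cpp
#include <cassert>
#include <cstdint>

// Returns a mask of bits that are zero in every element i * step,
// for 0 <= i < maxElts. Assumes (maxElts - 1) * step does not overflow.
uint64_t stepVectorKnownZero(uint64_t step, uint64_t maxElts) {
  if (step == 0 || maxElts <= 1)
    return ~0ULL; // every element is 0

  uint64_t known = 0;

  // Case 1: the trailing zero bits of the step are zero in every multiple.
  for (unsigned b = 0; b < 64 && ((step >> b) & 1) == 0; ++b)
    known |= 1ULL << b;

  // Case 2: bits above the highest set bit of the largest element are zero.
  uint64_t maxVal = (maxElts - 1) * step;
  for (int b = 63; b >= 0 && ((maxVal >> b) & 1) == 0; --b)
    known |= 1ULL << b;

  return known;
}
```

With a step of 4 and at most 8 elements, the largest element is 28 (`0b11100`), so only bits 2 to 4 remain unknown.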

---------

Co-authored-by: Philip Reames <preames@rivosinc.com>
2024-02-08 10:04:55 +08:00
Visoiu Mistrih Francis
514686acfd
[RISCV] Add correct Uses, Defs, isReturn to Zcmp (#81039)
* they all do stack adjustments, so they all use and def x2.
* popret and popretz also return
* popretz also defines x10

This adds that to the TD file and updates the PushPopOptimizer to
preserve the extra implicit operands added during frame lowering when
converting to popret(z).
2024-02-07 14:30:45 -08:00
Ilya Leoshkevich
9c75a98155
[SystemZ] Implement A, O and R inline assembly format flags (#80685)
Implement the following assembly format flags, which are already
supported by GCC:

	'A': On z14 or higher: If operand is a mem print the alignment
         hint usable with vl/vst prefixed by a comma.
	'O': print only the displacement of a memory reference or address.
	'R': print only the base register of a memory reference or address.

Implement 'A' conservatively, since the memory operand alignment
information is not available for INLINEASM at the moment.
2024-02-07 20:41:40 +01:00
Jeffrey Byrnes
3115ad8980
[AMDGPU] Accept arbitrary sized sources in CalculateByteProvider (#70240)
Reland the original patch with additional commit containing fix for two
issues:

1. Attempting to bitcast using MVTs with no corresponding LLVM type.
getDWordFromOffset now works directly with the original vector to get
the corresponding elements given the DWordOffset.
2. Improper bit tracking in CalculateByteProvider for vector types using
certain ops. Previously, bit tracking for certain ops (e.g.
ISD::TRUNCATE) assumed operands were scalar types, which is not correct
since these ops have different semantics depending on vector / scalar.
CalculateByteProvider / CalculateSrcByte now exit on vector types,
handling which is a TODO.
2024-02-07 11:34:50 -08:00
Arthur Eubanks
5a83bccb35
[X86] Fix lowering TLS under darwin large code model (#80907)
OpFlag and WrapperKind should be chosen consistently with each other in
regards to PIC, otherwise we hit asserts later on.

Broken by c04a05d8.

Fixes #80831.
2024-02-07 09:16:36 -08:00
Jeremy Morse
d109f94f29 [DebugInfo][RemoveDIs] Re-enable some test coverage
We disabled these extra-special RUNlines due to unexpected interactions
between the various things we've been fixing. Re-enable them (they'll run
on the llvm-new-debug-iterators buildbot) as they all now pass.
2024-02-07 12:41:32 +00:00
Simon Pilgrim
c2a91d4a33 [X86] combine-movmsk-avx.ll - add full AVX1/AVX2 VTEST/MOVMSK test coverage
Test all combos of avx1/avx2 and prefer-movmsk-over-vtest
2024-02-07 11:12:29 +00:00
Carl Ritson
7d508eb5d3 Revert "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)"
This reverts commit d6c7253d32e4bdff619c39708170f1c1fa01ff95.

Change causing CTS failures due to incomplete metadata.
2024-02-07 17:09:56 +09:00
Carl Ritson
9bda1de0b6
[TwoAddressInstruction] Propagate undef flags for partial defs (#79286)
If part of a register (lowered from REG_SEQUENCE) is undefined then we
should propagate undef flags to uses of those lanes. This is only
performed when live intervals are present as it requires live intervals
to correctly match uses to defs, and the primary goal is to allow
precise computation of subrange intervals.
2024-02-07 16:46:00 +09:00
Serge Pavlov
b0785cd1cb
[GlobalISel][ARM] Support missing case for G_CONSTANT (#80555)
Global Instruction Selector could not select the code:

    %0:gprb(s32) = G_CONSTANT i32 -1

In DAG selector the similar code is selected to the instruction MVNi
using custom operand `mod_imm_not`. Changing its definition from
`PatLeaf` to `ImmLeaf` and providing counterpart for `imm_not_XFORM`
make the relevant rule available for GlobalISel too.
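
For context, an A32 modified immediate is an 8-bit value rotated right by an even amount, and MVN materializes the bitwise NOT of such an immediate. The sketch below models that check; the function names are hypothetical and are not the TableGen definitions mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by n bits.
uint32_t rotl32(uint32_t v, unsigned n) {
  n &= 31;
  return n == 0 ? v : (v << n) | (v >> (32 - n));
}

// True if v is an 8-bit value rotated right by an even amount.
// Rotating v left by that amount must recover a value that fits in 8 bits.
bool isModImm(uint32_t v) {
  for (unsigned rot = 0; rot < 32; rot += 2)
    if (rotl32(v, rot) <= 0xFFu)
      return true;
  return false;
}

// True if the bitwise NOT of v is a modified immediate (the MVN case).
bool isModImmNot(uint32_t v) { return isModImm(~v); }
```

The constant -1 (`0xFFFFFFFF`) is not a modified immediate itself, but its complement 0 is, which is why it can be selected as an MVN.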
2024-02-07 12:53:20 +07:00
Visoiu Mistrih Francis
69a661cbae
[RISCV] Remove CalleeSavedInfo for Zcmp/save-restore-libcalls registers (#79535)
Registers that are pushed/popped by Zcmp or libcalls have pre-defined
frame indices that are never allocated in MachineFrameInfo. They're
being used throughout PEI, but the rest of codegen doesn't work that way
and expects each frame index to be a valid index in MFI.

This patch keeps it local to PEI and removes them from the
CalleeSavedInfo list at the end of the pass.

Before this patch, any MIR testing post-PEI is broken and asserts (see
issue #79491).
2024-02-06 18:18:49 -08:00
Fangrui Song
1c22d3f55d [ARC] Convert tests to opaque pointers (NFC) 2024-02-06 12:55:16 -08:00
Fangrui Song
423ac3d9ee [CSKY] Convert tests to opaque pointers (NFC) 2024-02-06 12:54:21 -08:00
Fangrui Song
cd0d11be7a [M68k] Convert tests to opaque pointers (NFC) 2024-02-06 12:53:16 -08:00
stephenpeckham
90e8dc0f7c
Fix failing testcases (#80902) 2024-02-06 15:35:21 -05:00
Jeremy Morse
5ce2f73b2e [DebugInfo][RemoveDIs] Add some missing test coverage
In github PR #78731 it looks like I added test coverage for RemoveDIs to
either the wrong test, or not enough. Adding
--try-experimental-debuginfo-iterators to this particular test is enough to
restore some coverage it seems.
2024-02-06 19:34:53 +00:00
Craig Topper
2faeea313f
[RISCV] Add Ssqosid support to -march. (#80747) 2024-02-06 10:06:01 -08:00
Craig Topper
cca49663a5
[FastISel][X86] Use getTypeForExtReturn in GetReturnInfo. (#80803)
The comment and code here seem to match getTypeForExtReturn. The
history shows that at the time this code was added, similar code existed
in SelectionDAGBuilder. SelectionDAGBuilder code has since been
refactored into getTypeForExtReturn.

This patch makes FastISel match SelectionDAGBuilder.

The test changes are because X86 has customization of
getTypeForExtReturn. So now we only extend returns to i8.

Stumbled onto this difference by accident.
2024-02-06 09:38:25 -08:00
Fangrui Song
6b2fd7aed6
[MIPS] Use generic isBlockOnlyReachableByFallthrough (#80799)
FastISel may create a redundant BGTZ terminator that falls through.
```
  BGTZ %2:gpr32, %bb.1, implicit-def $at

bb.1.bb1:
; predecessors: %bb.0
```

The `!I->isBarrier()` check in
MipsAsmPrinter::isBlockOnlyReachableByFallthrough will incorrectly not print
a label, leading to an `Undefined temporary symbol` error when we try
assembling the output assembly file. See the updated `Fast-ISel/pr40325.ll`
and https://github.com/rust-lang/rust/issues/108835

In addition, the `SwitchInst` condition is too conservative and prints
many unneeded labels (see the updated tests).

Just use the generic isBlockOnlyReachableByFallthrough, updated by
commit 1995b9fead62f2f6c0ad217bd00ce3184f741fdb for SPARC, which also
handles MIPS.
2024-02-06 09:23:33 -08:00
choikwa
e5638c5a00
[AMDGPU] Use correct number of bits needed for div/rem shrinking (#80622)
There was an error where a dividend of type i64 that actually uses 32
bits fell into the path that assumes only 24 bits are used. Check that
the AtLeast field is used correctly when using computeNumSignBits and
add the necessary extend/trunc for the 32-bit path.

Regolden and update testcases.

@jrbyrnes @bcahoon @arsenm @rampitec
2024-02-06 21:32:28 +05:30
David Stuttard
d6c7253d32
[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104)
PAL Metadata 3.0 introduces an explicit structure in metadata for the
programmable registers written out by the compiler backend.
The previous approach used opaque registers which can change between different
architectures and required encoding the bitfield information in the backend,
which may change between versions.

This change is an extension of the previously added support, which only handled
entry functions. This adds support for all functions.

The change also includes some re-factoring to separate common code.
2024-02-06 15:34:36 +00:00
stephenpeckham
b1acb7a315
[XCOFF] Add compiler version to an auxiliary symbol table entry (#80162)
C_FILE symbols. To match the behavior of the assembler and the legacy
compiler, this includes using the generic ".file" name for the C_FILE
symbol and generating the actual file name in an auxiliary entry.
2024-02-06 09:08:18 -06:00
Thorsten Schütt
364f781344
[GlobalIsel] Combine logic of icmps (#77855)
Inspired by InstCombinerImpl::foldAndOrOfICmpsUsingRanges with some
adaptations to MIR.
2024-02-06 15:58:02 +01:00
David Green
2e3de997ab [DAG] Generalize setcc(setcc) fold to use known bits.
If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove
the outer setcc as it will produce the same value as the inner. This can be
generalized to anything where the top bits are known to be 0, as the value will
remain as 1 or 0.
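
The reasoning can be checked on scalars. This is an illustrative sketch under the assumption of zero-or-one boolean contents on a 32-bit value; the helper names are hypothetical and the known-bits mask is a plain integer rather than LLVM's `KnownBits`.

```cpp
#include <cassert>
#include <cstdint>

// Models `SETCC x, 0, NE` with zero-or-one boolean contents.
uint32_t setccNeZero(uint32_t x) { return x != 0 ? 1u : 0u; }

// The outer setcc is redundant when every bit above bit 0 is known zero,
// because x can then only be 0 or 1.
bool outerSetccIsRedundant(uint32_t knownZeroMask) {
  return (knownZeroMask | 1u) == 0xFFFFFFFFu;
}
```

When the predicate holds, `setccNeZero(x) == x` for every value `x` can take.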
2024-02-06 12:39:48 +00:00
Simon Pilgrim
b8cdc2638e
[DAG] visitCTPOP - if only the upper half of the ctpop operand is zero then see if it's profitable to only count the lower half. (#80473) 2024-02-06 12:19:31 +00:00
Rin Dobrescu
7f292b8fb1
[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d)) (#80674)
We can convert concat(v4i16 uhadd(a,b), v4i16 uhadd(c,d)) to v8i16
uhadd(concat(a,c), concat(b,d)), which can lead to further
simplifications.
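
The equivalence holds because uhadd is a lane-wise operation. A minimal model, assuming uhadd is the truncating unsigned halving add `(a + b) >> 1` computed without overflow; the `Vec` alias and helper names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t N> using Vec = std::array<uint16_t, N>;

// Lane-wise unsigned halving add, computed in 32 bits to avoid overflow.
template <std::size_t N> Vec<N> uhadd(const Vec<N> &a, const Vec<N> &b) {
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<uint16_t>((static_cast<uint32_t>(a[i]) + b[i]) >> 1);
  return r;
}

// Concatenate two v4i16 values into one v8i16 value.
Vec<8> concat(const Vec<4> &lo, const Vec<4> &hi) {
  Vec<8> r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = lo[i];
    r[i + 4] = hi[i];
  }
  return r;
}
```

For any inputs, `concat(uhadd(a, b), uhadd(c, d))` and `uhadd(concat(a, c), concat(b, d))` produce the same eight lanes.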
2024-02-06 11:02:06 +00:00
Qiu Chaofan
292d9e869f
[PowerPC] Mask constant operands in ValueBit tracking (#67653)
In IR or C code, shift amount larger than value size is undefined
behavior. But in practice, backend lowering for shift_parts produces
add/sub of shift amounts, thus constant shift amounts might be
negative or larger than value size, which depends on ISA definition.

PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction)
will be taken, and if the highest among them is 1, result will be
zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift
amount.

This commit emulates the behavior and avoids array overflow in bit
permutation's value bits calculator.
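
The shift-amount rule above can be emulated as follows. This is a sketch of the 64-bit left-shift case only, written from the description in this message; the function name is hypothetical and it is not the code added to the value bits calculator.

```cpp
#include <cassert>
#include <cstdint>

// Emulates a 64-bit left shift as described: only the lowest 7 bits of
// the amount are considered; if the highest of them is set the result is
// zero, otherwise the low 6 bits are the shift amount.
uint64_t ppcShiftLeft64(uint64_t value, int64_t amount) {
  uint64_t amt = static_cast<uint64_t>(amount) & 0x7F;
  if (amt & 0x40)
    return 0;
  return value << (amt & 0x3F);
}
```

Under this model a constant amount of 64 or -1 yields zero, while 128 wraps around to a shift by 0, so none of these indexes past the 64 tracked bits.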
2024-02-06 18:37:31 +08:00
Luke Lau
bc569f6eb3 [RISCV] Add test case for shufflevector that gets scalarized. NFC
This shufflevector gets scalarized into a build_vector of extract_vector_elts
because the output type doesn't match the input vector type.

Normally this is combined back into a vector_shuffle in DAGCombine, but this
one fails because we don't consider an extract_subvector to be cheap,
specifically because it's at an index > 31.

This should be canonicalized back into a vector_shuffle at some point so we can
lower it as a vrgather.vv.
2024-02-06 18:35:18 +08:00
Sjoerd Meijer
35904ec4e1
[AArch64] MI Scheduler STP combine (#80188)
Add opcodes for different store instructions to the target hook that can
enable more STP pairs. This is split off from the patch that does the
same for some load instructions (#79003).

Patch co-authored by Cameron McInally.
2024-02-06 10:29:42 +00:00
paperchalice
c9fd738388
[CodeGen] Port DeadMachineInstructionElim to new pass manager (#80582)
A simple enough op pass so we can test standard instrumentations in
future.
2024-02-06 17:56:56 +08:00
Matt Arsenault
42b5b720ca AMDGPU/GlobalISel: Fix not running -global-isel in global isel test 2024-02-06 14:55:48 +05:30
Derek Schuff
c0cb0be85c
Mark llvm/test/CodeGen/WebAssembly/immediates.ll as passing on MIPS (#80771)
Fixes #80533
2024-02-05 17:38:54 -08:00
Congcong Cai
a71147dd28
[WebAssembly] improve getRegForPromotedValue to avoid meaningless value copy (#80469)
When promoting a value, it is meaningless to copy the value from one reg
to another reg with the same type.
This PR adds an additional check for these cases to reduce the code size.
Fixes: #80053.
2024-02-06 09:07:58 +08:00
Philip Reames
e722d9662d [DAG] Avoid a crash when checking size of scalable type in visitANDLike
Fixes https://github.com/llvm/llvm-project/issues/80744. This transform
doesn't handle vectors at all. The fixed-length ones pass the first
check, but would fail the constant operand checks which immediately follow.
This patch takes the simplest approach, and just guards the transform
for scalar integers.
2024-02-05 14:30:10 -08:00