56130 Commits

Author SHA1 Message Date
Paul Walker
71b87d1267
[LLVM][SVE] Ensure all fixed length mask bits are defined. (#116819)
convertFixedMaskToScalableVector expects the mask input to honour the
BoolContents scheme employed by the target. For AArch64 this means a
mask should be zero or all ones, and thus when promoting a mask we must
use a sign extend.
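
To illustrate why the kind of extension matters, here is a minimal Python sketch (not LLVM code) that models a single mask lane as an unsigned integer of a given width. The helper names `zero_extend`, `sign_extend` and `is_valid_mask_lane` are hypothetical. Under a zero-or-all-ones convention, only sign extension keeps a true lane all ones after widening.

```python
def zero_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen an unsigned lane by filling the new high bits with zeros."""
    assert 0 <= value < (1 << from_bits) and to_bits >= from_bits
    return value


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a lane by replicating its top bit into the new high bits."""
    assert 0 <= value < (1 << from_bits) and to_bits >= from_bits
    if value & (1 << (from_bits - 1)):
        # Set every bit between from_bits and to_bits.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def is_valid_mask_lane(value: int, bits: int) -> bool:
    """A lane is well formed only if it is zero or all ones."""
    return value in (0, (1 << bits) - 1)


TRUE_I8 = 0xFF                                # an all-ones 8-bit lane
widened_zext = zero_extend(TRUE_I8, 8, 16)    # upper byte stays zero
widened_sext = sign_extend(TRUE_I8, 8, 16)    # upper byte becomes ones
```

In this model the zero-extended lane is neither zero nor all ones, so a consumer that relies on the convention could misread it.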
2024-11-20 13:54:50 +00:00
Sergei Barannikov
8c56dd3040
[ARM] Stop gluing FP comparisons to FMSTAT (#116676)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

There might be a case when a copy can't be avoided (although not found
in existing tests). If a copy is necessary, the virtual register will be
created with `cl_FPSCR_NZCV` register class. If this register class is
inappropriate, `TRI::getCrossCopyRegClass` should be modified to return
the correct class.

Pull Request: https://github.com/llvm/llvm-project/pull/116676
2024-11-20 16:07:05 +03:00
Sam Elliott
408659c5b5
[RISCV] Merge GPRPair and GPRF64Pair (#116094)
As suggested by Craig, this tries to merge the two sets of register
classes created in #112983, GPRPair* and GPRF64Pair*.

- I added some explicit annotations to `RISCVInstrInfoD.td` which fixed
the type inference issues I was seeing from tablegen for select
patterns.
- I've had to make the behaviour of `splitValueIntoRegisterParts` and
`joinRegisterPartsIntoValue` cover more cases, because you cannot
bitcast to/from untyped (the bitcast would otherwise have been inserted
automatically by TargetLowering code).
- I apparently didn't need to change `getNumRegisters` again, which
continues to tell me there's a bug in the code for tied inputs. I added
some more test coverage of this case but it didn't seem to help find the
asserts I was finding before - I think the difference is between the
default behaviour for integers which doesn't apply to floats.
- There's still a difference between BuildGPRPair and BuildPairF64 (and
the same for SplitGPRPair and SplitF64). I'm not happy with this, I
think it's quite confusing, as they're very similar, just differing in
whether they give an `untyped` or an `f64`. I haven't really worked out
how the DAGCombiner copes if one meets the other, I know we have some of
this for the f64 variants already, but they're a lot more complex than
the GPRPair variants anyway.
2024-11-20 10:08:55 +00:00
Diana Picus
09c41246ed
[AMDGPU] Fix restores in chain functions (#116193)
When spilling a VGPR in `emitPrologue`, chain functions prefer to use
offsets to access the stack instead of the SP.

This patch fixes `emitEpilogue` to do the same. It also brings back some
test coverage that was lost in #93526, when WWM registers started being
shifted to the lowest available range (which meant that tests that were
originally spilling v8 would shift to spill v0, which is a scratch
register for chain functions and didn't get spilled).

Change-Id: Icb07fccd859b563cd45f74c25ae578ecb38bdeeb
2024-11-20 10:43:59 +01:00
David Green
bca846d462
[AArch64] Improve mull generation (#114997)
This attempts to clean up and improve where we generate smull/umull
using known-bits. For v2i64 types (where no mul is present), we try to
create mull more aggressively to avoid scalarization.
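
The known-bits idea can be sketched in Python as follows. This is a rough model, not the AArch64 backend code: it assumes each operand comes with a hypothetical `known_zero` bit mask, and it only covers the unsigned case. If the upper 32 bits of both 64-bit operands are known to be zero, the 64-bit product equals a widening 32x32->64 multiply of the low halves, which is the operation a umull performs per lane.

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFFFFFF
HIGH32 = 0xFFFFFFFF00000000


def mul64(a: int, b: int) -> int:
    """Plain 64-bit multiply with wraparound."""
    return (a * b) & MASK64


def umull32(a_lo: int, b_lo: int) -> int:
    """Widening unsigned multiply: 32-bit x 32-bit -> 64-bit."""
    assert 0 <= a_lo <= LOW32 and 0 <= b_lo <= LOW32
    return a_lo * b_lo


def can_use_umull(known_zero_a: int, known_zero_b: int) -> bool:
    """True when the top 32 bits of both operands are known to be zero."""
    return (known_zero_a & HIGH32) == HIGH32 and (known_zero_b & HIGH32) == HIGH32


def lower_mul(a: int, b: int, known_zero_a: int, known_zero_b: int):
    """Pick the widening multiply when known bits allow it."""
    if can_use_umull(known_zero_a, known_zero_b):
        return "umull", umull32(a & LOW32, b & LOW32)
    return "mul", mul64(a, b)
```

The signed case would check for known sign bits instead of known zero bits; it is left out to keep the sketch short.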
2024-11-20 09:12:22 +00:00
Simon Pilgrim
3a5cf6d99b
[X86] Rename AVX512 VEXTRACT/INSERT??x? to VEXTRACT/INSERT??X? (#116826)
Use uppercase in the subvector description ("32x2" -> "32X2" etc.) - matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).
2024-11-20 08:25:01 +00:00
Craig Topper
2187738508
[RISCV] Add additional CHECK prefixes to fixed-vectors-strided-load-store-asm.ll. NFC
We had 2 RUN lines with conflicting output sharing prefixes. The
script unfortunately did not report the error.
2024-11-19 16:54:29 -08:00
David Green
5b79152937
[AArch64] Make sure there is test coverage for ptr phis. NFC
2024-11-19 21:01:53 +00:00
Craig Topper
eff60d83b0
[RISCV][GISel] Make extended loads and truncating stores with s16 register type and s8 memory type legal.
This addresses some failures I've seen in testing on real code.
2024-11-19 11:57:35 -08:00
Yashas Andaluri
b28eebf926
[RDF] Fix cover check when linking refs to defs (#113888)
During RDF graph construction, linkRefUp method links a register ref to
its upward reaching defs until all RegUnits of the ref have been covered
by defs.
However, when a sub-register def covers some, but not all, of the
RegUnits of a previous super-register def, a super-register ref is not
linked to the super-register def.
This can result in certain super register defs being dead code
eliminated.

This patch fixes the cover check for a register ref. A def must be
skipped only when all RegUnits of that def have already been covered by
a previously seen def.
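
The corrected rule can be sketched with plain Python sets. This is a simplified model rather than the RDF implementation: register units are strings, defs are visited nearest first, and the function name `link_ref_up` and the unit names are chosen for illustration only.

```python
def link_ref_up(ref_units, defs_nearest_first):
    """Return the names of the defs a ref is linked to.

    ref_units: set of register units read by the ref.
    defs_nearest_first: list of (name, units) pairs, closest def first.
    """
    covered = set()
    linked = []
    for name, def_units in defs_nearest_first:
        if not (def_units & ref_units):
            continue  # def does not touch the ref at all
        if def_units <= covered:
            continue  # skip only when every unit of this def is already covered
        linked.append(name)
        covered |= def_units
        if ref_units <= covered:
            break  # every unit of the ref now has a reaching def
    return linked


# A sub-register def (unit u0) sits between a super-register def (u0, u1)
# and a super-register ref. The super-register def must still be linked,
# because its unit u1 has not been covered yet.
example = link_ref_up(
    {"u0", "u1"},
    [("sub_def", {"u0"}), ("super_def", {"u0", "u1"}), ("older_def", {"u0", "u1"})],
)
```

In this model the ref stays linked to `super_def`, so that def is still visibly used.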
2024-11-19 12:38:36 -06:00
Jay Foad
b3995aa338
[AMDGPU] Decrease default NSA threshold from 3 to 2 (#116624)
In graphics shaders it is better overall to use NSA encoding for IMAGE
instructions, because the benefit of less constrained register
allocation outweighs the cost of larger encoding. In particular NSA form
often avoids the need for extra V_MOV_B32 instructions between IMAGE
instructions, which can allow the IMAGE instructions to be claused.

Note that in GFX12 there is no longer a bit in the encoding to choose
between NSA and non-NSA forms, so this only affects GFX10 and GFX11.
2024-11-19 15:54:27 +00:00
Zaara Syeda
8e4423eb08
[AsmPrinter] Fix handling in emitGlobalConstantImpl for AIX (#116255)
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.

---------

Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
2024-11-19 09:58:25 -05:00
Sergei Barannikov
aff98e4be0
[ARM] Stop gluing 1-bit shifts (#116547)
1. When two (or more) nodes are glued, DAG scheduler will always
schedule them as one piece, i.e. it will not allow any instructions to
be scheduled between them. It does so because if nodes are glued this
usually means that there is an implicit register dependency between
them, and an intervening node could clobber this physical register. When
emitting such nodes into machine IR, they will also be stuck together,
e.g.:
```
    %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr
    %10:gpr = RRX %3, implicit $cpsr
```

2. If a node has Glue result, SelectionDAG will not try to CSE this
node. If it did, it would break the implicit physical register
dependency. In practice this means that if a node with Glue result has
multiple uses, it has to be duplicated before each use. This is the reason
for `ARMTargetLowering::duplicateCmp` to exist.
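
The CSE restriction in point 2 can be pictured with a toy node table in Python. This is only a model of the behaviour described above, not SelectionDAG code; the class `ToyDAG`, its `get_node` method and the `CMP` opcode are hypothetical.

```python
class ToyDAG:
    """Tiny node table that deduplicates nodes unless they produce glue."""

    def __init__(self):
        self.nodes = []   # list of (opcode, operands, produces_glue)
        self._cse = {}    # (opcode, operands) -> node id

    def get_node(self, opcode, operands, produces_glue=False):
        key = (opcode, tuple(operands))
        # Nodes with a glue result are never looked up in the CSE map.
        if not produces_glue and key in self._cse:
            return self._cse[key]
        node_id = len(self.nodes)
        self.nodes.append((opcode, tuple(operands), produces_glue))
        if not produces_glue:
            self._cse[key] = node_id
        return node_id


dag = ToyDAG()
glued_a = dag.get_node("CMP", ("x", "y"), produces_glue=True)
glued_b = dag.get_node("CMP", ("x", "y"), produces_glue=True)   # duplicated
plain_a = dag.get_node("CMP", ("x", "y"))
plain_b = dag.get_node("CMP", ("x", "y"))                       # reused
```

Two identical glued comparisons end up as two nodes, while the two identical unglued comparisons share one.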

When using normal data dependency, dependent nodes can freely be
scheduled around. If there is a physical register dependency between
nodes, the physical register will be copied to/from a virtual register,
allowing other nodes to intervene between them. The resulting machine IR
might look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = COPY $cpsr
    %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg
    $cpsr = COPY %10
    %13:gpr = RRX %3, implicit $cpsr
```

The two copies are likely to be eliminated by register coalescer, given
that there are no instructions between them that clobber this physical
register. If the copies are unwanted in the first place (they could be
expensive or impossible), DAG scheduler will try to avoid inserting them
wherever possible, and the resulting machine IR will look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = RRX %3, implicit $cpsr
```

On ARM, arithmetic operations and LSLS already use the new data flow
approach. This patch extends it to include 1-bit shifts.

Pull Request: https://github.com/llvm/llvm-project/pull/116547
2024-11-19 17:46:48 +03:00
Yingwei Zheng
c727b48287
[SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075)
On loongarch64 with lsx extension, we select `VBITREV_W` for `v4i32 (xor
X, (shl splat(1), Y))`:

8e66303916/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td (L1583-L1584)

And `vsplat_imm_eq_1` is defined as:

8e66303916/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td (L77-L87)

For the `(bitconvert (v4i32 (build_vector)))` case, the pattern is
expected to be:
```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (bitconvert:{ *:[v4i32] } (build_vector:{ *:[v4i32] }))<<P:Predicate_vsplat_imm_eq_1>>, v4i32:{ *:[v4i32] }:$vk))
RESULT:  (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```

However, `simplifyTree` drops the `bitconvert` node and its predicates:

8e66303916/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp (L3036-L3062)

Then llvm will match `vsplat_imm_eq_1` for any v4i32 splats and cause a
miscompilation:
```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (build_vector:{ *:[v4i32] }), v4i32:{ *:[v4i32] }:$vk))
RESULT:  (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```

This patch adds additional checks for predicates associated with the
trivial bitconvert node. Unused patterns in the LoongArch target are
also removed.

Fixes https://github.com/llvm/llvm-project/issues/116008.
2024-11-19 21:24:40 +08:00
Hari Limaye
4f0403fe96
[CodeGen][AArch64] Sink splat operands of FMul instructions (#116222)
Sink shuffle operands of FMul instructions if these are splats, as we
can generate lane-indexed variants for these.
2024-11-19 12:59:22 +00:00
Sam Elliott
c4030c896d
[RISCV] Fix FP64 DinX R Regclass (#116688)
This was a typo in llvm/llvm-project#112983 that didn't cause build
failures but is still wrong.
2024-11-19 12:42:27 +00:00
Yingwei Zheng
42ed775783
[InstSimplify] Generalize simplifyAndOrOfFCmps to handle fabs (#116590)
This patch generalizes https://github.com/llvm/llvm-project/issues/81027
to handle pattern `and/or (fcmp ord/uno X, 0), (fcmp pred fabs(X), Y)`.
Alive2: https://alive2.llvm.org/ce/z/tsgUrz
The correctness is straightforward because `fcmp ord/uno X, 0.0` is
equivalent to `fcmp ord/uno fabs(X), 0.0`. We may generalize it to
handle fneg as well.
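
The equivalence can be spot-checked with a small Python model of the predicates. This is a sanity check on sample values, not a proof (the Alive2 link above is the proof), and the helper names are made up for this sketch. It also checks one instance of the fold, with `olt` as the ordered predicate.

```python
import math


def fcmp_ord(a: float, b: float) -> bool:
    """True when neither operand is NaN."""
    return not (math.isnan(a) or math.isnan(b))


def fcmp_uno(a: float, b: float) -> bool:
    """True when at least one operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def fcmp_olt(a: float, b: float) -> bool:
    """Ordered less-than: false whenever an operand is NaN."""
    return fcmp_ord(a, b) and a < b


SAMPLES = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf, math.nan, 5e-324, -5e-324]


def ord_uno_ignore_fabs() -> bool:
    """fcmp ord/uno X, 0.0 gives the same answer as on fabs(X)."""
    return all(
        fcmp_ord(x, 0.0) == fcmp_ord(math.fabs(x), 0.0)
        and fcmp_uno(x, 0.0) == fcmp_uno(math.fabs(x), 0.0)
        for x in SAMPLES
    )


def and_ord_olt_folds() -> bool:
    """and (fcmp ord X, 0), (fcmp olt fabs(X), Y) == fcmp olt fabs(X), Y."""
    return all(
        (fcmp_ord(x, 0.0) and fcmp_olt(math.fabs(x), y)) == fcmp_olt(math.fabs(x), y)
        for x in SAMPLES
        for y in SAMPLES
    )
```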

Address comment
https://github.com/llvm/llvm-project/pull/116065#pullrequestreview-2434796846
2024-11-19 20:10:40 +08:00
Simon Pilgrim
95ab42661e
[X86] Attempt to canonicalize vXf64 SHUFPD shuffle masks with undef elts to improve further folding (#116419)
Currently when creating a SHUFPD immediate mask, any undef shuffle elements are set to 0, which can limit options for further shuffle combining.

This patch attempts to canonicalize the mask to improve folding: first by detecting a per-lane broadcast style mask (which can allow us to fold to UNPCK instead), and second ensure any undef elements are set to an 'inplace' value to improve chances of the SHUFPD later folding to a BLENDPD (or be bypassed in a SimplifyMultipleUseDemandedVectorElts call).
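
The second step (choosing an in-place value for undef elements) might look roughly like the Python sketch below. It is not the X86 backend code and it leaves out the broadcast detection. It assumes that bit `i` of the SHUFPD immediate selects the low (0) or high (1) element of the 128-bit lane for result element `i`, so the in-place choice for element `i` is `i & 1`; undef is written as `-1`, and both function names are hypothetical.

```python
UNDEF = -1


def shufpd_imm_zero_undef(mask):
    """Earlier behaviour as described above: undef elements become 0."""
    imm = 0
    for i, m in enumerate(mask):
        bit = 0 if m == UNDEF else (m & 1)
        imm |= bit << i
    return imm


def shufpd_imm_inplace_undef(mask):
    """Undef elements pick the element already sitting in that position."""
    imm = 0
    for i, m in enumerate(mask):
        bit = (i & 1) if m == UNDEF else (m & 1)
        imm |= bit << i
    return imm


# v4f64 shuffle mask: indices 0-3 name the first source, 4-7 the second.
mask = [0, UNDEF, 2, 7]
old_imm = shufpd_imm_zero_undef(mask)      # undef lane forced to bit 0
new_imm = shufpd_imm_inplace_undef(mask)   # undef lane keeps its position
```

With the in-place choice every result element stays in its own position, which is the shape a blend can express.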

This is very similar to canonicalization we already attempt in getV4X86ShuffleImm for vXi32/vXf32 SHUFPS/SHUFD shuffles.
2024-11-19 10:45:07 +00:00
Sergei Barannikov
6f53ae6e61
[X86] Properly chain PROBED_ALLOCA / SEG_ALLOCA (#116508)
These nodes should appear between CALLSEQ_START / CALLSEQ_END.
Previously, they could be scheduled after CALLSEQ_END because the nodes
didn't update the chain.

The change in a test is due to X86 call frame optimizer pass bailing out
for a particular call when CALLSEQ_START / CALLSEQ_END are not in the
same basic block. This happens because SEG_ALLOCA is expanded into a
sequence of basic blocks early. It didn't bail out before because the
closing CALLSEQ_END was scheduled before SEG_ALLOCA, in the same basic
block as CALLSEQ_START.

While here, simplify creation of these nodes: allocating a virtual
register and copying `Size` into it were unnecessary.
2024-11-19 13:29:58 +03:00
Lukacma
61726add1b
[AArch64] Update predicate for FEXPA (#116613)
This patch updates predicate and backend tests for FEXPA instructions to
match [latest
spec](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FEXPA--Floating-point-exponential-accelerator-).
2024-11-19 10:29:19 +00:00
Mikhail Goncharov
f77126c549
Revert "[FunctionAttrs] Add the "initializes" attribute inference (#97373)"
This reverts commit 661c593850715881d2805a59e90e6d87d8b9fbb8.

Multiple buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/108/builds/6096
2024-11-19 10:29:36 +01:00
Davide
8cd348c96a
[MIPS] Updated MIPS N calling conventions so that fp16 arguments no longer cause a crash (#116569)
This PR fixes a bug introduced by #110199, which causes any half float
argument to crash the compiler on MIPS64.

Currently compiling this bit of code with `llc -mtriple=mips64`: 
```
define void @half_args(half %a) nounwind {
entry:
        ret void
}
```

Crashes with the following log:
```
LLVM ERROR: unable to allocate function argument #0
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: llc -mtriple=mips64
1.	Running pass 'Function Pass Manager' on module '<stdin>'.
2.	Running pass 'MIPS DAG->DAG Pattern Instruction Selection' on function '@half_args'
 #0 0x000055a3a4013df8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x32d0df8)
 #1 0x000055a3a401199e llvm::sys::RunSignalHandlers() (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x32ce99e)
 #2 0x000055a3a40144a8 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f00bde558c0 __restore_rt libc_sigaction.c:0:0
 #4 0x00007f00bdea462c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f00bde55822 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f00bde3e4af abort ./stdlib/abort.c:81:7
 #7 0x000055a3a3f80e3c llvm::report_fatal_error(llvm::Twine const&, bool) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x323de3c)
 #8 0x000055a3a2e20dfa (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x20dddfa)
 #9 0x000055a3a2a34e20 llvm::MipsTargetLowering::LowerFormalArguments(llvm::SDValue, unsigned int, bool, llvm::SmallVectorImpl<llvm::ISD::InputArg> const&, llvm::SDLoc const&, llvm::SelectionDAG&, llvm::SmallVectorImpl<llvm::SDValue>&) const MipsISelLowering.cpp:0:0
#10 0x000055a3a3d896a9 llvm::SelectionDAGISel::LowerArguments(llvm::Function const&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30466a9)
#11 0x000055a3a3e0b3ec llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30c83ec)
#12 0x000055a3a3e09e21 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30c6e21)
#13 0x000055a3a2aae1ca llvm::MipsDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) MipsISelDAGToDAG.cpp:0:0
#14 0x000055a3a3e07706 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30c4706)
#15 0x000055a3a3051ed6 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x230eed6)
#16 0x000055a3a35a3ec9 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x2860ec9)
#17 0x000055a3a35ac3b2 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x28693b2)
#18 0x000055a3a35a499c llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x286199c)
#19 0x000055a3a262abbb main (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x18e7bbb)
#20 0x00007f00bde3fc4c __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#21 0x00007f00bde3fd05 call_init ./csu/../csu/libc-start.c:128:20
#22 0x00007f00bde3fd05 __libc_start_main@GLIBC_2.2.5 ./csu/../csu/libc-start.c:347:5
#23 0x000055a3a2624921 _start /builddir/glibc-2.39/csu/../sysdeps/x86_64/start.S:117:0
```

This is caused by the fact that after the change, `f16`s are no longer
lowered as `f32`s in calls.

Two possible fixes are available:
- Update calling conventions to properly support passing `f16` as
integers.
- Update `useFPRegsForHalfType()` to return `true` so that `f16` are
still kept in `f32` registers, as before #110199.

This PR implements the first solution, so as not to introduce any ABI
changes beyond those #110199 already made.

As for what the correct ABI for halfs is, I don't think there is a
correct answer. GCC doesn't support halfs on MIPS, and I couldn't find
any information in old MIPS ABI manuals either.
2024-11-19 10:23:32 +01:00
Sander de Smalen
3093b29b59
[RegisterCoalescer] Fix up subreg lanemasks after rematerializing. (#116191)
In a situation like the following:

```
   undef %2.subreg = INST %1         ; DefMI (rematerializable),
                                     ; DefSubIdx = subreg
   %3 = COPY %2                      ; SrcIdx = DstIdx = 0
   .... = SOMEINSTR %3, %2
```
there are no subranges for `%3` because the entire register is copied,
but after rematerialization the subrange of the rematerialized value
must be fixed up with the appropriate subranges for `.subreg`.

(To me this issue seemed a bit similar to the issue fixed by #96839, but
then related to rematerialization)
2024-11-19 08:46:55 +00:00
Matt Arsenault
927032807d
AMDGPU: Handle gfx950 96/128-bit buffer_load_lds (#116681)
Enforcing this limit in the clang builtin will come later.
2024-11-18 22:01:56 -08:00
Matt Arsenault
50224bd5ba
AMDGPU: Handle gfx950 global_load_lds_* instructions (#116680)
Define global_load_lds_dwordx3 and global_load_lds_dwordx4.
Oddly it seems dwordx2 was skipped.
2024-11-18 21:58:02 -08:00
Matt Arsenault
130a3150ec
AMDGPU: Define v_mfma_f32_32x32x16_bf16 for gfx950 (#116679)
Unlike the existing gfx940 intrinsics using short/i16 in place of
bfloat, this uses the natural bfloat type.
2024-11-18 21:53:56 -08:00
Matt Arsenault
738bdd4969
AMDGPU: Add V_CVT_PK_BF16_F32 for gfx950 (#116678)
2024-11-18 21:50:54 -08:00
Haopeng Liu
661c593850
[FunctionAttrs] Add the "initializes" attribute inference (#97373)
Add the "initializes" attribute inference.

This change is expected to have ~0.09% compile time regression, which
seems acceptable for interprocedural DSE.

https://llvm-compile-time-tracker.com/compare.php?from=9f10252c4ad7cffbbcf692fa9c953698f82ac4f5&to=56345c1cee4375eb5c28b8e7abf4803d20216b3b&stat=instructions%3Au
2024-11-18 21:36:05 -08:00
Jim Lin
cd418030de
[RISCV] Remove +a from the attribute test for zacas and zabha. NFC.
zacas and zabha don't require the 'a' or 'zaamo' extension after
https://github.com/llvm/llvm-project/pull/115694.
2024-11-19 10:18:40 +08:00
David Green
36d47f8878
[AArch64][GlobalISel] Legalize ptr vector freeze and implicit defs.
They can be treated the same as other s64 operations.
2024-11-18 22:35:58 +00:00
David Green
50209e9942
[AArch64][GlobalISel] Move and update freeze.ll test. NFC
This adds a number of extra vector cases, notably the ptr vectors.
2024-11-18 22:31:13 +00:00
Youngsuk Kim
b083340cb6
[llvm][NVPTX] Don't reorder MIs that construct a PTX function call (#116522)
With "-enable-misched", MachineScheduler can reorder MIs that must stick
together (in initially set order) to generate legal PTX code for a
function call.

When generating PTX code for the attached test (using LLVM before this
revision), the following invalid PTX code is generated:

```
  { // callseq 0, 0
  .param .b64 param0;
  st.param.f64  [param0], 0d0000000000000000;
  .param .b64 retval0;
  call.uni (retval0),
  mul.lo.s32  %r7, %r10, %r3;
  or.b32    %r8, %r4, %r7;
  mul.lo.s32  %r9, %r2, %r8;
  cvt.rn.f64.s32  %fd3, %r9;
  quux,
  (
  param0
  );
  ld.param.f64  %fd1, [retval0];
  } // callseq 0
```
2024-11-18 17:12:19 -05:00
Justin Bogner
e0b522dd94
[DirectX] Fix crash in DXILFlattenArrays for function declarations (#116690)
We were skipping intrinsics here, but really we need to skip all
function declarations - if the function doesn't have a body there's
nothing to walk.
2024-11-18 13:56:33 -08:00
Alex MacLean
55876278d3
[NVPTX] Add support for f16 fabs (#116107)
Add f16 and f16x2 support for abs. See PTX ISA 9.7.4.6. Half
Precision Floating Point Instructions: abs
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-abs
2024-11-18 13:47:08 -08:00
Thorsten Schütt
f8d1905a24
[GlobalISel] Combine [S,U]SUBO (#116489)
We import the llvm.ssub.with.overflow.* Intrinsics, but the Legalizer
also builds them while legalizing other opcodes, see narrowScalarAddSub.
2024-11-18 22:39:23 +01:00
Matt Arsenault
0c421687f8
AMDGPU: Add first gfx950 mfma instructions (#116312)
Scheduling info and hazards are wrong and TBD.
2024-11-18 13:38:07 -08:00
Matt Arsenault
ca1b35a6c8
AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)
Random number instruction for stochastic rounding.
2024-11-18 10:54:54 -08:00
Matt Arsenault
5a556d55fb
AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309)
2024-11-18 10:48:56 -08:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Fraser Cormack
18be88e20a
[NVPTX][NFC] Regenerate some tests checks (#116605)
Use update_llc_test_checks.py to automate the test checks in some files
I was observing changes in locally.
2024-11-18 18:35:44 +00:00
Lei Huang
ed8ebad6eb
[SelectionDAG] Support integer promotion for VP_LOAD and VP_STORE (#81299)
Add integer promotion support for VP_LOAD and VP_STORE via legalization of extend
and truncate of each form.

Patch commandeered from: https://reviews.llvm.org/D109377
2024-11-18 13:32:58 -05:00
Sam Elliott
4615cc38f3
[RISCV] Inline Assembly Support for GPR Pairs ('R') (#112983)
This patch adds support for getting even-odd general purpose register
pairs into and out of inline assembly using the `R` constraint as
proposed in riscv-non-isa/riscv-c-api-doc#92

There are a few different pieces to this patch, each of which need their
own explanation.

- Renames the Register Class used for f64 values on rv32i_zdinx from
  `GPRPair*` to `GPRF64Pair*`. These register classes are kept broadly
  unmodified, as their primary value type is used for type inference
  over selection patterns. This rename affects quite a lot of files.

- Adds new `GPRPair*` register classes which will be used for `R`
  constraints and for instructions that need an even-odd GPR pair. This
  new type is used for `amocas.d.*`(rv32) and `amocas.q.*`(rv64) in
  Zacas, instead of the `GPRF64Pair` class being used before.

- Marks the new `GPRPair` class as legal for holding an `MVT::Untyped`.
  Two new RISCVISD node types are added for creating and destructing a
  pair - `BuildGPRPair` and `SplitGPRPair`, and are introduced when
  bitcasting to/from the pair type and `untyped`.

- Adds functionality to `splitValueIntoRegisterParts` and
  `joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into
  `untyped` pairs.

- Adds an override for `getNumRegisters` to ensure that `i<2*xlen>`
  values, when going to/from inline assembly, only allocate one (pair)
  register (they would otherwise allocate two). This is due to a bug in
  SelectionDAGBuilder.cpp which other backends also work around.

- Ensures that Clang understands that `R` is a valid inline assembly
  constraint.

- This also allows `R` to be used for `f64` types on `rv32_zdinx`
  architectures, where doubles are stored in a GPR pair.
2024-11-18 17:45:58 +00:00
Hugh Delaney
8f8016fe66
[NVPTX] Add patterns for fma.relu.{f16|f16x2|bf16|bf16x2} (#114977)
Add patterns to lower `fmaxnum(fma(a, b, c), 0)` to `fma.rn{.ftz}.relu`
for `f16`, `f16x2`, `bf16`, `bf16x2` types, when `nnan` is used.

`fma_relu` honours `NaN`, so the substitution is only made if the `fma`
is `nnan`, since `fmaxnum` returns the non NaN argument when passed a
NaN value.
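
The NaN difference can be shown with a small Python model. It assumes "honours NaN" means that a NaN input comes out as NaN; the function names `fmaxnum` and `relu_nan_propagating` are chosen for this sketch and do not come from the patch.

```python
import math


def fmaxnum(a: float, b: float) -> float:
    """Maximum that returns the non-NaN operand when one operand is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def relu_nan_propagating(x: float) -> float:
    """ReLU model that passes NaN through instead of clamping it to zero."""
    if math.isnan(x):
        return x
    return max(x, 0.0)


nan_via_fmaxnum = fmaxnum(math.nan, 0.0)          # NaN is replaced by 0.0
nan_via_relu = relu_nan_propagating(math.nan)     # NaN survives
agree_without_nan = all(
    fmaxnum(x, 0.0) == relu_nan_propagating(x)
    for x in (-2.0, -0.0, 0.0, 3.5, math.inf, -math.inf)
)
```

The two forms agree on every non-NaN input in the list and differ only on NaN, which is why the rewrite needs the `nnan` guarantee.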

This patch also removes some `bf16` ftz instructions since `FTZ` is not
supported with the `bf16` type, according to the PTX ISA docs.
2024-11-18 15:29:17 +00:00
Steven Perron
756fe54dc7
[SPIRV] Add write to image buffer for shaders. (#115927)
This commit adds an intrinsic that will write to an image buffer. We
chose to match the name of the DXIL intrinsic for simplicity in clang.

We cannot reuse the existing openCL write_image function because that is
not a reserved name in HLSL. There is not much common code to factor
out.
2024-11-18 09:06:05 -05:00
Akshat Oke
3f9d02aae8
[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326)
With this, all machine SSA optimization passes are available in the new codegen pipeline.
2024-11-18 11:02:01 +05:30
Aiden Grossman
d9eda6b2f3
[MLGO] Remove extranous check lines from test input
This patch removes check lines from a test input. It was originally
copied from a test that had assertions automatically generated, but
given we only use it as an input, the check lines do absolutely nothing.
Remove them to improve readability of the test/prevent confusion.
2024-11-18 03:52:13 +00:00
Freddy Ye
97836bed63
Reland "[X86] Support -march=diamondrapids (#113881)" (#116564)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-18 10:40:32 +08:00
Freddy Ye
90e92239bd
Revert "[X86] Support -march=diamondrapids (#113881)" (#116563)
This reverts commit 826b845c9e97448395431be3e4e5da585bd98c5e.
2024-11-18 08:45:28 +08:00
Freddy Ye
826b845c9e
[X86] Support -march=diamondrapids (#113881)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
2024-11-18 08:31:17 +08:00
Craig Topper
eed9af95e6
[RISCV][GISel] Make loads/stores with s16 register type and s16 memory type legal.
This is needed to support Zfh loads/stores.

This requires supporting sext/zext extends from i16 and s16
G_FREEZE to support the current tests we have.
2024-11-17 11:39:59 -08:00