52796 Commits

Author SHA1 Message Date
Shilei Tian
e9c1dbb408 Revert "[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345)"
This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks
downstream.
2024-03-06 08:42:54 -05:00
Yuta Mukai
ea23761429
[AArch64] Verify ldp/stp alignment stricter (#84124)
When ldp-aligned-only/stp-aligned-only is specified, modified to cancel
ldp/stp transformation if MachineMemOperand is not present or the access
size is unknown.
In the previous implementation, the test passed when there was no
MachineMemOperand. Also, if the size was unknown, an incorrect value was
used or an assertion failed. (But actually, if there is no
MachineMemOperand, it will be excluded from the target by
isCandidateToMergeOrPair() before reaching the part.)

A statistic NumFailedAlignmentCheck is added. NumPairCreated is modified
so that it only counts if it is not canceled.
2024-03-06 20:19:56 +09:00
Pierre van Houtryve
52d5b8e02d
[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843)
gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants.

Fixes SWDEV-447468
2024-03-06 10:38:20 +01:00
Luke Lau
bec7ad9fd6 [RISCV] Add tests for vw{add,sub,mul} with nested extend. NFC
These test cases show (op (ext a), (ext b)) patterns where the dest EEW is
more than 2 * source EEW. These could be lowered into widening ops where we
still have extend the operands, but at a smaller EEW.
2024-03-06 15:56:02 +08:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
David Majnemer
9e759f3523 [AArch64] Fix fptoi/itofp for bf16
There were a number of issues that needed to be addressed:
- i64 to bf16 did not correctly round
- strict rounding needed to yield a chain
- fastisel did not have logic to bail on bf16
2024-03-06 06:17:39 +00:00
AtariDreams
2a13422b8b
Convert many LivePhysRegs uses to LiveRegUnits (#83905) 2024-03-06 10:38:14 +05:30
Heejin Ahn
403b9cf1bb
[WebAssembly] Use RefTypeMem2Local instead of Mem2Reg (#83196)
When reference-types feature is enabled, forcing mem2reg unconditionally
even in `-O0` has some problems described in #81575. This uses
RefTypeMem2Local pass added in #81965 instead. This also removes
`IsForced` parameter added in

890146b192
given that we don't need it anymore.

This may still hurt debug info related to reference type variables a
little during the backend transformation given that they are not stored
in memory anymore, but reference type variables are presumably rare and
it would be still a lot less damage than forcing mem2reg on the whole
program. Also this fixes the EH problem described in #81575.

Fixes #81575.
2024-03-05 19:54:41 -08:00
Luke Lau
0207270494
[RISCV] Don't remove extends for i1 indices in mgather/mscatter (#83951) 2024-03-06 08:52:27 +08:00
Benjamin Kramer
55c466da2f [X86][AVX512BF16] Add a few missing insert/extract patterns
These are really the same as the f16 (and i16) instructions, but we need
them for any type that can occur.
2024-03-06 00:52:29 +01:00
Florian Mayer
6f11c95d06
Revert "[AArch64] Verify ldp/stp alignment stricter" (#84096)
Reverts llvm/llvm-project#83948

This broke the ASan buildbot:
https://lab.llvm.org/buildbot/#/builders/168/builds/19054/steps/10/logs/stdio
2024-03-05 15:52:09 -08:00
Craig Topper
58d8805ff9
[RISCV] Always use signed APSInt in getExactInteger. (#84070)
We were setting based on whether the FP value is positive/negative, but
we really want to know whether the resulting integer will be treated as
a signed or unsigned value. Since we use SINT_TO_FP to convert the
integer to FP, we should always used signed here.

Without this we convert +2147483648.0 to an integer 0x80000000 and
convert it using sint_to_fp which produces -2147483648.0.
2024-03-05 14:36:37 -08:00
Fangrui Song
201572e34b
[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel
Clang sets the nonlazybind attribute for certain ObjC features. The
AArch64 SelectionDAG implementation for non-intrinsic calls
(46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option.

GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also
sets the nonlazybind attribute. For SelectionDAG, make the cl option not
affect ELF so that non-intrinsic calls to a dso_preemptable function use
GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.

For FastISel, change `fastLowerCall` to bail out when a call is due to
-fno-plt.

For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall
and intrinsic calls in AArch64CallLowering::lowerCall (where the
target-independent CallLowering::lowerCall is not called).
The GlobalISel test in `call-rv-marker.ll` is therefore updated.

Note: the current -fno-plt -fpic implementation does not use GOT for a
preemptable function.

Link: #78275

Pull Request: https://github.com/llvm/llvm-project/pull/78890
2024-03-05 13:55:29 -08:00
Craig Topper
a5095b9892 [RISCV] Add test for incorrect FP build vector lowering. NFC
The lowering is not distinquishing -2147483648.0 and 2147483648.0.
2024-03-05 13:10:14 -08:00
Noah Goldstein
17162b61c2 [KnownBits] Make nuw and nsw support in computeForAddSub optimal
Just some improvements that should hopefully strengthen analysis.

Closes #83580
2024-03-05 12:59:58 -06:00
Farzon Lotfi
b2ca23aed8
[HLSL] implement exp intrinsic (#83832)
This change implements: #70072

- `hlsl_intrinsics.h` - add the `exp` api
- `DXIL.td` - add the llvm intrinsic to DXIL opcode lowering mapping.
- This change reuses llvm's existing intrinsic
`__builtin_elementwise_exp` \ `int_exp` & `__builtin_elementwise_exp2` \
`int_exp2`
- This PR is part 1 of 2.
- Part 2 requires an intrinsic to instructions lowering.
Part2 will expand `int_exp` to 
```
A = Builder.CreateFMul(log2eConst, val);
int_exp2(A)
```
just like we do in
[TranslateExp](https://github.com/microsoft/DirectXShaderCompiler/blob/main/lib/HLSL/HLOperationLower.cpp#L2220C1-L2236C2)
2024-03-05 12:42:33 -05:00
Farzon Lotfi
643b31dbe8
[HLSL] implement mad intrinsic (#83826)
This change implements #83736
The dot product lowering needs a tertiary multipy add operation. DXIL
has three mad opcodes for `fmad`(46), `imad`(48), and `umad`(49). Dot
product in DXIL only uses `imad`\ `umad`, but for completeness and
because the hlsl `mad` intrinsic requires it `fmad` was also included.
Two new intrinsics were needed to be created to complete this change.
the `fmad` case already supported by llvm via `fmuladd` intrinsic.

- `hlsl_intrinsics.h` - exposed mad api call.
- `Builtins.td` - exposed a `mad` builtin.
- `Sema.h` - make `tertiary` calls check for float types optional. 
- `CGBuiltin.cpp` - pick the intrinsic for singed\unsigned & float also
reuse `int_fmuladd`.
- `SemaChecking.cpp` - type checks for `__builtin_hlsl_mad`. 
- `IntrinsicsDirectX.td` create the two new intrinsics for
`imad`\`umad`/
- `DXIL.td` - create the llvm intrinsic to  `DXIL` opcode mapping.

---------

Co-authored-by: Farzon Lotfi <farzon@farzon.com>
2024-03-05 12:23:26 -05:00
Yuta Mukai
6b5888c27f
[AArch64] Verify ldp/stp alignment stricter (#83948)
When ldp-aligned-only/stp-aligned-only is specified, modified to cancel
ldp/stp transformation if MachineMemOperand is not present or the access
size is unknown.
In the previous implementation, the test passed when there was no
MachineMemOperand. Also, if the size was unknown, an incorrect value was
used or an assertion failed. (But actually, if there is no
MachineMemOperand, it will be excluded from the target by
isCandidateToMergeOrPair() before reaching the part.)

A statistic NumFailedAlignmentCheck is added. NumPairCreated is modified
so that it only counts if it is not cancelled.
2024-03-06 01:47:28 +09:00
elhewaty
26058e68ea
[DAG] select (sext m), (add X, C), X --> (add X, (and C, (sext m)))) (#83640)
- [DAG][X86] Add tests for Folding select m, add(X, C), X --> add (X, and(C, m))(NFC)
- [DAG][X86] Fold select (sext m), (add X, C), X --> (add X, (and C, (sext m))))
- Fixes: https://github.com/llvm/llvm-project/issues/66101
2024-03-05 16:41:41 +00:00
James Westwood
b2c16e7ff4
Revert "[ARM] R11 not pushed adjacent to link register with PAC-M and… (#84019)
… AAPCS frame chain fix (#82801)"

This reverts commit 00e4a4197137410129d4725ffb82bae9ce44bdde. This patch
was found to cause miscompilations and compilation failures.
2024-03-05 14:34:43 +00:00
bcahoon
4cf8b298cf
[AMDGPU][PromoteAlloca] Correctly handle a variable vector index (#83597)
The promote alloca to vector transformation assumes that the
vector index is a constant value. If it is not a constant, then
either an assert occurs or the tranformation generates an
incorrect index.
2024-03-05 08:18:17 -06:00
Yeting Kuo
d95a0d7c0f
[DAG] Teach SelectionDAGBuilder to read parameter alignment of compressstore/expandload. (#83763)
Previously SelectionDAGBuilder used ABI alignment for
compressstore/expandload. This patch allows SelectionDAGBuilder to use
parameter alignment like vp intrinsics. This does not follow the
original code to default use vector type alignment, since it is possible
implemented to unaligned vector alignment.
2024-03-05 20:48:37 +08:00
Paul Walker
341d674b6f
[LLVM][AArch64][CodeGen] Mark FFR as a reserved register. (#83437)
This allows the removal of FFR related psuedo nodes that only existed to
work round machine verifier failures.
2024-03-05 12:34:15 +00:00
Benjamin Kramer
20895965b2 [NVPTX] Remove sub.s16x2 instruction
According to the PTX ISA this doesn't exist (and ptxas rejects it)

See https://github.com/pytorch/pytorch/issues/118589
2024-03-05 12:43:02 +01:00
Simon Pilgrim
49f95052c8 [X86] pr59305.ll - replace "X86-64" check prefix with "X64" 2024-03-05 11:33:34 +00:00
Simon Pilgrim
191f7678f7 [X86] 2007-03-15-GEP-Idx-Sink.ll - regenerate test checks 2024-03-05 11:33:34 +00:00
Luke Lau
a668846202
[DAGCombiner] Handle extending EXTRACT_VECTOR_ELTs in calculateByteProvider (#83963)
An EXTRACT_VECTOR_ELT can extend the element to the width of its result
type, leaving the high bits undefined. Previously if we attempted to
query the bytes in these high bits we would recurse and hit an
assertion. This fixes it by bailing if the index is outside of the
vector element size.

I think the assertion Index < ByteWidth may still be incorrect, since
ByteWidth is calculated from Op.getValueSizeInBits(). I believe this
should be Op.getScalarValueSizeInBits() whenever VectorIndex is set
since we're querying the element now, not the vector. But I couldn't
think of a test case to trigger it. It can be addressed in a follow-up
patch.

Fixes #83920
2024-03-05 18:31:33 +08:00
Felix (Ting Wang)
ed6275868b
[PowerPC][NFC] Update aix-tls-xcoff-reloc.ll (#83764)
Update test case changed by #66316
2024-03-05 14:07:47 +08:00
Wang Pengcheng
0fbe45bdb9
[RISCV] Add support of Sscofpmf (#83831)
This is used in profile, but somehow we missed it.
2024-03-05 10:45:13 +08:00
MalaySanghiIntel
564b81db85
Add support for x87 registers on GISel register selection (#83528)
We handle 3 register classes for x87 - rfp32, rfp64 and rfp80.
1. X87 registers are assigned a pseudo register class. We need a new
register bank for these classes.
2. We add bank information and enums for these.
3. Legalizer is updated to allow 32b and 64b even in absence of SSE.
This is required because with SSE enabled, x86 doesn't use fp stack and
instead uses SSE registers.
4. Functions in X86RegisterBankInfo need to decide whether to use the
pseudo classes or SSE-enabled classes. I add MachineInstr as an argument
to static helper function getPartialMappingIdx to this end.

Add/Update tests.
2024-03-05 03:00:14 +01:00
wanglei
a5c90e48b6
[LoongArch] Switch to the Machine Scheduler (#83759)
The SelectionDAG scheduling preference now becomes source order
scheduling (machine scheduler generates better code -- even without
there being a machine model defined for LoongArch yet).

Most of the test changes are trivial instruction reorderings and
differing register allocations, without any obvious performance impact.

This is similar to commit: 3d0fbafd0bce43bb9106230a45d1130f7a40e5ec
2024-03-05 09:15:44 +08:00
AtariDreams
3e40c96d89
[X86] Resolve FIXME: Add FPCW as a rounding control register (#82452)
To prevent tests from breaking, another fix had to be made: Now, we
check if the instruction after a waiting instruction is a call, and if
so, we insert the wait.
2024-03-05 08:47:05 +08:00
Craig Topper
bdfebc310e [X86] Use update_mir_test_checks.py to generate CHECK lines in masked_compressstore_isel.ll. NFC 2024-03-04 16:17:26 -08:00
Benjamin Kramer
8cc8fdaf5c [AArch64] Also promote vector bf16 INT_TP_FP to f32
This mirrors the scalar version.
2024-03-04 23:34:56 +01:00
Natalie Chouinard
6325dd5731
[HLSL][SPIR-V] Add SV_DispatchThreadID semantic support (#82536)
Add SPIR-V backend support for the HLSL SV_DispatchThreadID semantic
attribute, which is lowered to a @llvm.dx.thread.id intrinsic in LLVM
IR. In the SPIR-V backend, this is now correctly translated to a
`GlobalInvocationId` builtin variable.

Fixes #82534
2024-03-04 16:45:23 -05:00
David Majnemer
930e7ff9ae [AArch64] Optimize abs, neg and copysign for fp16/bf16
We can use bitwise arithmetic to implement these, making them
considerably faster than legalization via promotion.
2024-03-04 20:05:05 +00:00
Noah Goldstein
a4951eca40 Recommit "[X86] Don't always separate conditions in (br (and/or cond0, cond1)) into separate branches" (2nd Try)
Changes in Recommit:
    1) Fix non-determanism by using `SmallMapVector` instead of
       `SmallPtrSet`.
    2) Fix bug in dependency pruning where we discounted the actual
       `and/or` combining the two conditions. This lead to over pruning.

Closes #81689
2024-03-04 13:23:56 -06:00
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00
Tuan Chuong Goh
13a78fd1ac [AArch64][GlobalISel] Re-commit Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors (#83038)
Legalize smaller/larger than legal vectors with i8 and i16 element sizes.
Vectors with elements smaller than i8 will get widened to i8 elements.
2024-03-04 15:03:55 +00:00
Mirko Brkušanin
27ce5121ee
[AMDGPU] Fix setting nontemporal in memory legalizer (#83815)
Iterator MI can advance in insertWait() but we need original instruction
to set temporal hint. Just move it before handling volatile.
2024-03-04 15:05:31 +01:00
Shilei Tian
530f0e64ec
[AMDGPU] Replace isInlinableLiteral16 with specific version (#81345) 2024-03-04 08:40:42 -05:00
Qiu Chaofan
906580bad3
[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968)
These builtins are already there in Clang, however current codegen may
produce suboptimal results due to their complex behavior. Implement them
as intrinsics to ensure expected instructions are emitted.
2024-03-04 21:13:59 +08:00
James Westwood
00e4a41971
[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix (#82801)
When code for M class architecture was compiled with AAPCS and PAC
enabled, the frame pointer, r11, was not pushed to the stack adjacent to
the link register. Due to PAC being enabled, r12 was placed between r11
and lr. This patch fixes this by adding an extra case to the already
existing code that splits the GPR push in two when R11 is the frame
pointer and certain paremeters are met. The differential revision for
this previous change can be found here:
https://reviews.llvm.org/D125649. This now ensures that r11 and lr are
pushed in a separate push instruction to the other GPRs when PAC and
AAPCS are enabled, meaning the frame pointer and link register are now
pushed onto the stack adjacent to each other.
2024-03-04 12:00:36 +00:00
Vyacheslav Levytskyy
8f30b62395
[SPIR-V] Add support for the SPIR-V extension SPV_INTEL_bfloat16_conversion (#83443)
This PR is to add support for the SPIR-V extension
SPV_INTEL_bfloat16_conversion
(https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_bfloat16_conversion.asciidoc)
and OpenCL extension cl_intel_bfloat16_conversions
(https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_bfloat16_conversions.html).
2024-03-04 12:55:09 +01:00
Vyacheslav Levytskyy
67d5ba9077
[SPIR-V] Add support for SPV_KHR_float_controls (#83418)
This PR is to add explicit support for SPV_KHR_float_controls
(https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_float_controls.asciidoc).
This extension is included into SPIR-V after version 1.4, but in case of
lower versions it is to be included explicitly and OpExtension must be
present in the module with `OpExtension "SPV_KHR_float_controls"`.

This PR fixes this issue and fixes the test case
test/CodeGen/SPIRV/exec_mode_float_control_khr.ll to account for a
version lower than 1.4.
2024-03-04 12:15:59 +01:00
Vyacheslav Levytskyy
ecc3bdaae1
[SPIR-V] Fix bitcast legalization/instruction selection in SPIR-V Backend (#83139)
This PR is to fix a way how SPIR-V Backend describes legality of
OpBitcast instruction and how it is validated on a step of instruction
selection. Instead of checking a size of virtual registers (that makes
no sense due to lack of guarantee of direct relations between size of
virtual register and bit width associated with the type size), this PR
allows to legalize OpBitcast without size check and postpones validation
to the instruction selection step.

As an example, let's consider the next example that was copied as is
from a bigger test suite:

```
  %355:id(s16) = G_BITCAST %301:id(s32)
  %303:id(s16) = ASSIGN_TYPE %355:id(s16), %349:type(s32)
  %644:fid(s32) = G_FMUL %645:fid, %646:fid
  %301:id(s32) = ASSIGN_TYPE %644:fid(s32), %40:type(s32)
```

Without the PR this leads to a crash with complains to an illegal
bitcast, because %355 is s16 and %301 is s32. However, we must check not
virtual registers in this case, but types of %355 and %301, i.e.,
%349:type(s32) and %40:type(s32), which are perfectly well compatible in
a sense of OpBitcast in this case.

In a test case that is a part of this PR OpBitcast is legal, being
applied for `OpTypeInt 16` and `OpTypeFloat 16`, but would not be
legalized without this PR due to virtual registers defined as having
size 16 and 32.
2024-03-04 12:15:30 +01:00
Vyacheslav Levytskyy
540d255167
[SPIRV] Add vector reduction instructions (#82786)
This PR is to add vector reduction instructions according to
https://llvm.org/docs/GlobalISel/GenericOpcode.html#vector-reduction-operations
and widen in such a way a range of successful supported conversions,
covering new cases of vector reduction instructions which IRTranslator
is unable to resolve.

By legalizing vector reduction instructions we introduce a new
instruction patterns that should be addressed, including patterns that
are delegated to pre-legalize step. To address this problem, a new pass
is added that is to bring newly generated instructions after
legalization to an aspect required by instruction selection.

Expected overheads for existing cases is minimal, because a new pass is
working only with newly introduced instructions, otherwise it's just a
additional code traverse without any actions.
2024-03-04 12:14:58 +01:00
Mirko Brkušanin
982e9022ca
[AMDGPU] Add GFX12 memory legalizer tests (#83814) 2024-03-04 11:22:04 +01:00
Sameer Sahasrabuddhe
c7fdd8c11e Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-04 13:28:04 +05:30
Luke Lau
63725ab119 [RISCV] Add test for aliasing miscompile fixed by #83017. NFC
Previously we incorrectly removed the scalar load store pair here assuming it
was dead, when it actually aliased with the memset.  This showed up as a
miscompile on SPEC CPU 2017 when compiling with -mrvv-vector-bits, and was only
triggered by the changes in #75531.  This was fixed in #83017, but this patch
adds a test case for this specific miscompile.

For reference, the incorrect codegen was:

	vsetvli	a1, zero, e8, m4, ta, ma
	vmv.v.i	v8, 0
	vs4r.v	v8, (a0)
	addi	a1, a0, 80
	vsetivli	zero, 16, e8, m1, ta, ma
	vmv.v.i	v8, 0
	vs1r.v	v8, (a1)
	addi	a0, a0, 64
	vs1r.v	v8, (a0)
2024-03-04 15:45:26 +08:00