llvm-project

Author	SHA1	Message	Date
Shilei Tian	e9c1dbb408	Revert "[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )" This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks downstream.	2024-03-06 08:42:54 -05:00
Yuta Mukai	ea23761429	[AArch64] Verify ldp/stp alignment stricter (#84124 ) When ldp-aligned-only/stp-aligned-only is specified, modified to cancel ldp/stp transformation if MachineMemOperand is not present or the access size is unknown. In the previous implementation, the test passed when there was no MachineMemOperand. Also, if the size was unknown, an incorrect value was used or an assertion failed. (But actually, if there is no MachineMemOperand, it will be excluded from the target by isCandidateToMergeOrPair() before reaching the part.) A statistic NumFailedAlignmentCheck is added. NumPairCreated is modified so that it only counts if it is not canceled.	2024-03-06 20:19:56 +09:00
Pierre van Houtryve	52d5b8e02d	[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843 ) gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants. Fixes SWDEV-447468	2024-03-06 10:38:20 +01:00
Luke Lau	bec7ad9fd6	[RISCV] Add tests for vw{add,sub,mul} with nested extend. NFC These test cases show (op (ext a), (ext b)) patterns where the dest EEW is more than 2 * source EEW. These could be lowered into widening ops where we still have to extend the operands, but at a smaller EEW.	2024-03-06 15:56:02 +08:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
David Majnemer	9e759f3523	[AArch64] Fix fptoi/itofp for bf16 There were a number of issues that needed to be addressed: - i64 to bf16 did not correctly round - strict rounding needed to yield a chain - fastisel did not have logic to bail on bf16	2024-03-06 06:17:39 +00:00
AtariDreams	2a13422b8b	Convert many LivePhysRegs uses to LiveRegUnits (#83905 )	2024-03-06 10:38:14 +05:30
Heejin Ahn	403b9cf1bb	[WebAssembly] Use RefTypeMem2Local instead of Mem2Reg (#83196 ) When reference-types feature is enabled, forcing mem2reg unconditionally even in `-O0` has some problems described in #81575. This uses RefTypeMem2Local pass added in #81965 instead. This also removes `IsForced` parameter added in `890146b192` given that we don't need it anymore. This may still hurt debug info related to reference type variables a little during the backend transformation given that they are not stored in memory anymore, but reference type variables are presumably rare and it would be still a lot less damage than forcing mem2reg on the whole program. Also this fixes the EH problem described in #81575. Fixes #81575.	2024-03-05 19:54:41 -08:00
Luke Lau	0207270494	[RISCV] Don't remove extends for i1 indices in mgather/mscatter (#83951 )	2024-03-06 08:52:27 +08:00
Benjamin Kramer	55c466da2f	[X86][AVX512BF16] Add a few missing insert/extract patterns These are really the same as the f16 (and i16) instructions, but we need them for any type that can occur.	2024-03-06 00:52:29 +01:00
Florian Mayer	6f11c95d06	Revert "[AArch64] Verify ldp/stp alignment stricter" (#84096 ) Reverts llvm/llvm-project#83948 This broke the ASan buildbot: https://lab.llvm.org/buildbot/#/builders/168/builds/19054/steps/10/logs/stdio	2024-03-05 15:52:09 -08:00
Craig Topper	58d8805ff9	[RISCV] Always use signed APSInt in getExactInteger. (#84070) We were setting based on whether the FP value is positive/negative, but we really want to know whether the resulting integer will be treated as a signed or unsigned value. Since we use SINT_TO_FP to convert the integer to FP, we should always use signed here. Without this we convert +2147483648.0 to an integer 0x80000000 and convert it using sint_to_fp which produces -2147483648.0.	2024-03-05 14:36:37 -08:00
Fangrui Song	201572e34b	[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel Clang sets the nonlazybind attribute for certain ObjC features. The AArch64 SelectionDAG implementation for non-intrinsic calls (46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option. GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also sets the nonlazybind attribute. For SelectionDAG, make the cl option not affect ELF so that non-intrinsic calls to a dso_preemptable function use GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls. For FastISel, change `fastLowerCall` to bail out when a call is due to -fno-plt. For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall and intrinsic calls in AArch64CallLowering::lowerCall (where the target-independent CallLowering::lowerCall is not called). The GlobalISel test in `call-rv-marker.ll` is therefore updated. Note: the current -fno-plt -fpic implementation does not use GOT for a preemptable function. Link: #78275 Pull Request: https://github.com/llvm/llvm-project/pull/78890	2024-03-05 13:55:29 -08:00
Craig Topper	a5095b9892	[RISCV] Add test for incorrect FP build vector lowering. NFC The lowering is not distinguishing -2147483648.0 and 2147483648.0.	2024-03-05 13:10:14 -08:00
Noah Goldstein	17162b61c2	[KnownBits] Make `nuw` and `nsw` support in `computeForAddSub` optimal Just some improvements that should hopefully strengthen analysis. Closes #83580	2024-03-05 12:59:58 -06:00
Farzon Lotfi	b2ca23aed8	[HLSL] implement exp intrinsic (#83832 ) This change implements: #70072 - `hlsl_intrinsics.h` - add the `exp` api - `DXIL.td` - add the llvm intrinsic to DXIL opcode lowering mapping. - This change reuses llvm's existing intrinsic `__builtin_elementwise_exp` \ `int_exp` & `__builtin_elementwise_exp2` \ `int_exp2` - This PR is part 1 of 2. - Part 2 requires an intrinsic to instructions lowering. Part2 will expand `int_exp` to ``` A = Builder.CreateFMul(log2eConst, val); int_exp2(A) ``` just like we do in [TranslateExp](https://github.com/microsoft/DirectXShaderCompiler/blob/main/lib/HLSL/HLOperationLower.cpp#L2220C1-L2236C2)	2024-03-05 12:42:33 -05:00
Farzon Lotfi	643b31dbe8	[HLSL] implement `mad` intrinsic (#83826) This change implements #83736 The dot product lowering needs a ternary multiply-add operation. DXIL has three mad opcodes for `fmad`(46), `imad`(48), and `umad`(49). Dot product in DXIL only uses `imad`\ `umad`, but for completeness and because the hlsl `mad` intrinsic requires it `fmad` was also included. Two new intrinsics had to be created to complete this change. The `fmad` case is already supported by llvm via the `fmuladd` intrinsic. - `hlsl_intrinsics.h` - exposed mad api call. - `Builtins.td` - exposed a `mad` builtin. - `Sema.h` - make `tertiary` calls check for float types optional. - `CGBuiltin.cpp` - pick the intrinsic for signed\unsigned & float also reuse `int_fmuladd`. - `SemaChecking.cpp` - type checks for `__builtin_hlsl_mad`. - `IntrinsicsDirectX.td` create the two new intrinsics for `imad`\`umad`. - `DXIL.td` - create the llvm intrinsic to `DXIL` opcode mapping. --------- Co-authored-by: Farzon Lotfi <farzon@farzon.com>	2024-03-05 12:23:26 -05:00
Yuta Mukai	6b5888c27f	[AArch64] Verify ldp/stp alignment stricter (#83948 ) When ldp-aligned-only/stp-aligned-only is specified, modified to cancel ldp/stp transformation if MachineMemOperand is not present or the access size is unknown. In the previous implementation, the test passed when there was no MachineMemOperand. Also, if the size was unknown, an incorrect value was used or an assertion failed. (But actually, if there is no MachineMemOperand, it will be excluded from the target by isCandidateToMergeOrPair() before reaching the part.) A statistic NumFailedAlignmentCheck is added. NumPairCreated is modified so that it only counts if it is not cancelled.	2024-03-06 01:47:28 +09:00
elhewaty	26058e68ea	[DAG] select (sext m), (add X, C), X --> (add X, (and C, (sext m)))) (#83640 ) - [DAG][X86] Add tests for Folding select m, add(X, C), X --> add (X, and(C, m))(NFC) - [DAG][X86] Fold select (sext m), (add X, C), X --> (add X, (and C, (sext m)))) - Fixes: https://github.com/llvm/llvm-project/issues/66101	2024-03-05 16:41:41 +00:00
James Westwood	b2c16e7ff4	Revert "[ARM] R11 not pushed adjacent to link register with PAC-M and… (#84019 ) … AAPCS frame chain fix (#82801)" This reverts commit 00e4a4197137410129d4725ffb82bae9ce44bdde. This patch was found to cause miscompilations and compilation failures.	2024-03-05 14:34:43 +00:00
bcahoon	4cf8b298cf	[AMDGPU][PromoteAlloca] Correctly handle a variable vector index (#83597) The promote alloca to vector transformation assumes that the vector index is a constant value. If it is not a constant, then either an assert occurs or the transformation generates an incorrect index.	2024-03-05 08:18:17 -06:00
Yeting Kuo	d95a0d7c0f	[DAG] Teach SelectionDAGBuilder to read parameter alignment of compressstore/expandload. (#83763) Previously SelectionDAGBuilder used ABI alignment for compressstore/expandload. This patch allows SelectionDAGBuilder to use parameter alignment like vp intrinsics. Unlike the original code, this does not default to the vector type alignment, since the access may be implemented with less than vector alignment.	2024-03-05 20:48:37 +08:00
Paul Walker	341d674b6f	[LLVM][AArch64][CodeGen] Mark FFR as a reserved register. (#83437) This allows the removal of FFR related pseudo nodes that only existed to work round machine verifier failures.	2024-03-05 12:34:15 +00:00
Benjamin Kramer	20895965b2	[NVPTX] Remove sub.s16x2 instruction According to the PTX ISA this doesn't exist (and ptxas rejects it) See https://github.com/pytorch/pytorch/issues/118589	2024-03-05 12:43:02 +01:00
Simon Pilgrim	49f95052c8	[X86] pr59305.ll - replace "X86-64" check prefix with "X64"	2024-03-05 11:33:34 +00:00
Simon Pilgrim	191f7678f7	[X86] 2007-03-15-GEP-Idx-Sink.ll - regenerate test checks	2024-03-05 11:33:34 +00:00
Luke Lau	a668846202	[DAGCombiner] Handle extending EXTRACT_VECTOR_ELTs in calculateByteProvider (#83963 ) An EXTRACT_VECTOR_ELT can extend the element to the width of its result type, leaving the high bits undefined. Previously if we attempted to query the bytes in these high bits we would recurse and hit an assertion. This fixes it by bailing if the index is outside of the vector element size. I think the assertion Index < ByteWidth may still be incorrect, since ByteWidth is calculated from Op.getValueSizeInBits(). I believe this should be Op.getScalarValueSizeInBits() whenever VectorIndex is set since we're querying the element now, not the vector. But I couldn't think of a test case to trigger it. It can be addressed in a follow-up patch. Fixes #83920	2024-03-05 18:31:33 +08:00
Felix (Ting Wang)	ed6275868b	[PowerPC][NFC] Update aix-tls-xcoff-reloc.ll (#83764 ) Update test case changed by #66316	2024-03-05 14:07:47 +08:00
Wang Pengcheng	0fbe45bdb9	[RISCV] Add support of Sscofpmf (#83831 ) This is used in profile, but somehow we missed it.	2024-03-05 10:45:13 +08:00
MalaySanghiIntel	564b81db85	Add support for x87 registers on GISel register selection (#83528 ) We handle 3 register classes for x87 - rfp32, rfp64 and rfp80. 1. X87 registers are assigned a pseudo register class. We need a new register bank for these classes. 2. We add bank information and enums for these. 3. Legalizer is updated to allow 32b and 64b even in absence of SSE. This is required because with SSE enabled, x86 doesn't use fp stack and instead uses SSE registers. 4. Functions in X86RegisterBankInfo need to decide whether to use the pseudo classes or SSE-enabled classes. I add MachineInstr as an argument to static helper function getPartialMappingIdx to this end. Add/Update tests.	2024-03-05 03:00:14 +01:00
wanglei	a5c90e48b6	[LoongArch] Switch to the Machine Scheduler (#83759 ) The SelectionDAG scheduling preference now becomes source order scheduling (machine scheduler generates better code -- even without there being a machine model defined for LoongArch yet). Most of the test changes are trivial instruction reorderings and differing register allocations, without any obvious performance impact. This is similar to commit: 3d0fbafd0bce43bb9106230a45d1130f7a40e5ec	2024-03-05 09:15:44 +08:00
AtariDreams	3e40c96d89	[X86] Resolve FIXME: Add FPCW as a rounding control register (#82452 ) To prevent tests from breaking, another fix had to be made: Now, we check if the instruction after a waiting instruction is a call, and if so, we insert the wait.	2024-03-05 08:47:05 +08:00
Craig Topper	bdfebc310e	[X86] Use update_mir_test_checks.py to generate CHECK lines in masked_compressstore_isel.ll. NFC	2024-03-04 16:17:26 -08:00
Benjamin Kramer	8cc8fdaf5c	[AArch64] Also promote vector bf16 INT_TO_FP to f32 This mirrors the scalar version.	2024-03-04 23:34:56 +01:00
Natalie Chouinard	6325dd5731	[HLSL][SPIR-V] Add SV_DispatchThreadID semantic support (#82536 ) Add SPIR-V backend support for the HLSL SV_DispatchThreadID semantic attribute, which is lowered to a @llvm.dx.thread.id intrinsic in LLVM IR. In the SPIR-V backend, this is now correctly translated to a `GlobalInvocationId` builtin variable. Fixes #82534	2024-03-04 16:45:23 -05:00
David Majnemer	930e7ff9ae	[AArch64] Optimize abs, neg and copysign for fp16/bf16 We can use bitwise arithmetic to implement these, making them considerably faster than legalization via promotion.	2024-03-04 20:05:05 +00:00
Noah Goldstein	a4951eca40	Recommit "[X86] Don't always separate conditions in `(br (and/or cond0, cond1))` into separate branches" (2nd Try) Changes in Recommit: 1) Fix non-determinism by using `SmallMapVector` instead of `SmallPtrSet`. 2) Fix bug in dependency pruning where we discounted the actual `and/or` combining the two conditions. This led to over-pruning. Closes #81689	2024-03-04 13:23:56 -06:00
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Tuan Chuong Goh	13a78fd1ac	[AArch64][GlobalISel] Re-commit Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors (#83038 ) Legalize smaller/larger than legal vectors with i8 and i16 element sizes. Vectors with elements smaller than i8 will get widened to i8 elements.	2024-03-04 15:03:55 +00:00
Mirko Brkušanin	27ce5121ee	[AMDGPU] Fix setting nontemporal in memory legalizer (#83815 ) Iterator MI can advance in insertWait() but we need original instruction to set temporal hint. Just move it before handling volatile.	2024-03-04 15:05:31 +01:00
Shilei Tian	530f0e64ec	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )	2024-03-04 08:40:42 -05:00
Qiu Chaofan	906580bad3	[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968 ) These builtins are already there in Clang, however current codegen may produce suboptimal results due to their complex behavior. Implement them as intrinsics to ensure expected instructions are emitted.	2024-03-04 21:13:59 +08:00
James Westwood	00e4a41971	[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix (#82801) When code for M class architecture was compiled with AAPCS and PAC enabled, the frame pointer, r11, was not pushed to the stack adjacent to the link register. Due to PAC being enabled, r12 was placed between r11 and lr. This patch fixes this by adding an extra case to the already existing code that splits the GPR push in two when R11 is the frame pointer and certain parameters are met. The differential revision for this previous change can be found here: https://reviews.llvm.org/D125649. This now ensures that r11 and lr are pushed in a separate push instruction to the other GPRs when PAC and AAPCS are enabled, meaning the frame pointer and link register are now pushed onto the stack adjacent to each other.	2024-03-04 12:00:36 +00:00
Vyacheslav Levytskyy	8f30b62395	[SPIR-V] Add support for the SPIR-V extension SPV_INTEL_bfloat16_conversion (#83443 ) This PR is to add support for the SPIR-V extension SPV_INTEL_bfloat16_conversion (https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_bfloat16_conversion.asciidoc) and OpenCL extension cl_intel_bfloat16_conversions (https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_bfloat16_conversions.html).	2024-03-04 12:55:09 +01:00
Vyacheslav Levytskyy	67d5ba9077	[SPIR-V] Add support for SPV_KHR_float_controls (#83418 ) This PR is to add explicit support for SPV_KHR_float_controls (https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_float_controls.asciidoc). This extension is included into SPIR-V after version 1.4, but in case of lower versions it is to be included explicitly and OpExtension must be present in the module with `OpExtension "SPV_KHR_float_controls"`. This PR fixes this issue and fixes the test case test/CodeGen/SPIRV/exec_mode_float_control_khr.ll to account for a version lower than 1.4.	2024-03-04 12:15:59 +01:00
Vyacheslav Levytskyy	ecc3bdaae1	[SPIR-V] Fix bitcast legalization/instruction selection in SPIR-V Backend (#83139) This PR fixes how the SPIR-V Backend describes the legality of the OpBitcast instruction and how it is validated at the instruction selection step. Instead of checking the size of virtual registers (which makes no sense, because there is no guaranteed direct relation between the size of a virtual register and the bit width associated with the type size), this PR allows OpBitcast to be legalized without a size check and postpones validation to the instruction selection step. As an example, consider the following snippet, copied as is from a bigger test suite: ``` %355:id(s16) = G_BITCAST %301:id(s32) %303:id(s16) = ASSIGN_TYPE %355:id(s16), %349:type(s32) %644:fid(s32) = G_FMUL %645:fid, %646:fid %301:id(s32) = ASSIGN_TYPE %644:fid(s32), %40:type(s32) ``` Without the PR this leads to a crash with a complaint about an illegal bitcast, because %355 is s16 and %301 is s32. However, in this case we must check not the virtual registers but the types of %355 and %301, i.e., %349:type(s32) and %40:type(s32), which are perfectly compatible in the sense of OpBitcast. In a test case that is a part of this PR OpBitcast is legal, being applied to `OpTypeInt 16` and `OpTypeFloat 16`, but would not be legalized without this PR because the virtual registers are defined as having sizes 16 and 32.	2024-03-04 12:15:30 +01:00
Vyacheslav Levytskyy	540d255167	[SPIRV] Add vector reduction instructions (#82786) This PR adds vector reduction instructions according to https://llvm.org/docs/GlobalISel/GenericOpcode.html#vector-reduction-operations and thereby widens the range of successfully supported conversions, covering new cases of vector reduction instructions which IRTranslator is unable to resolve. By legalizing vector reduction instructions we introduce new instruction patterns that should be addressed, including patterns that are delegated to the pre-legalize step. To address this problem, a new pass is added that brings instructions newly generated after legalization into the form required by instruction selection. The expected overhead for existing cases is minimal, because the new pass works only with newly introduced instructions; otherwise it's just an additional code traversal without any actions.	2024-03-04 12:14:58 +01:00
Mirko Brkušanin	982e9022ca	[AMDGPU] Add GFX12 memory legalizer tests (#83814 )	2024-03-04 11:22:04 +01:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785)" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
Luke Lau	63725ab119	[RISCV] Add test for aliasing miscompile fixed by #83017 . NFC Previously we incorrectly removed the scalar load store pair here assuming it was dead, when it actually aliased with the memset. This showed up as a miscompile on SPEC CPU 2017 when compiling with -mrvv-vector-bits, and was only triggered by the changes in #75531. This was fixed in #83017, but this patch adds a test case for this specific miscompile. For reference, the incorrect codegen was: vsetvli a1, zero, e8, m4, ta, ma vmv.v.i v8, 0 vs4r.v v8, (a0) addi a1, a0, 80 vsetivli zero, 16, e8, m1, ta, ma vmv.v.i v8, 0 vs1r.v v8, (a1) addi a0, a0, 64 vs1r.v v8, (a0)	2024-03-04 15:45:26 +08:00
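
Note on 58d8805ff9 / a5095b9892 above: the signedness problem described in those two commits can be reproduced with plain integer arithmetic. The following is a minimal sketch in Python, not the LLVM code; the helper names (`to_i32_bits`, `as_signed_i32`, `fits_signed_i32`) are hypothetical and exist only for this illustration. It shows why +2147483648.0 cannot be treated as an exact 32-bit integer when the value is later converted back with a signed int-to-FP conversion.

```python
# Illustration only: models a 32-bit element with Python integers.
# All helper names here are hypothetical, not LLVM APIs.

def to_i32_bits(value: float) -> int:
    """Keep the low 32 bits of an integral float (the raw bit pattern)."""
    return int(value) & 0xFFFFFFFF

def as_signed_i32(bits: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed integer,
    which is how a signed int-to-FP conversion reads its operand."""
    return bits - (1 << 32) if bits & 0x80000000 else bits

def fits_signed_i32(value: float) -> bool:
    """Range check that always treats the integer as signed."""
    return -(1 << 31) <= value <= (1 << 31) - 1

# +2147483648.0 fits in 32 bits only as an *unsigned* value (0x80000000).
bits = to_i32_bits(2147483648.0)

# Reading that pattern back as signed flips the sign of the result.
round_trip = float(as_signed_i32(bits))

print(hex(bits), round_trip)
print(fits_signed_i32(2147483648.0), fits_signed_i32(-2147483648.0))
```

With a signed range check, +2147483648.0 is rejected while -2147483648.0 is still accepted, which matches the commit's reasoning that the integer should always be treated as signed when the conversion back to FP is signed.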


52796 Commits