llvm-project

Author	SHA1	Message	Date
wanglei	edd4c6c6dc	[LoongArch] Make sure that the LoongArchISD::BSTRINS node uses the correct `MSB` value (#84454 ) The `MSB` must not be greater than `GRLen`. Without this patch, newly added test cases will crash with LoongArch32, resulting in a 'cannot select' error.	2024-03-11 08:59:17 +08:00
Simon Pilgrim	862c7e0218	[X86] combineAndShuffleNot - ensure the type is legal before create X86ISD::ANDNP target nodes Fixes #84660	2024-03-10 16:23:51 +00:00
Simon Pilgrim	92d7aca441	[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496 ) Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on	2024-03-09 16:21:25 +00:00
Jay Foad	fd3eaf76ba	[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352 )	2024-03-09 09:07:22 +00:00
Craig Topper	6b270358c7	[SelectionDAG] Allow FREEZE to be hoisted before FP SETCC. (#84358 ) No nans/infs in SelectionDAG is complicated. Hopefully I've captured all of the cases. I've only applied ConsiderFlags to the SDNodeFlags since those are the only ones that will be dropped by hoisting. The condition code and TargetOptions would still be in effect. Recovers some regression from #84232.	2024-03-08 17:21:21 -08:00
yingopq	755b439694	[Mips] Fix missing sign extension in expansion of sub-word atomic max (#77072 ) Add sign extension "SEB/SEH" before compare. Fix #61881	2024-03-08 15:41:31 -05:00
David Majnemer	edc1c3d24e	[AArch64] Make more vector f16 operations legal v8f16 is a legal type but promoting to v16f16 would result in an illegal type. Let's legalize these by a combination of splitting+promoting resulting in a pair of v4f16. Also, we were being overly cautious with different v4f16 nodes. Mark more of them safe to promote to v4f32.	2024-03-08 19:52:54 +00:00
David Majnemer	5f935e9181	[AArch64] Optimize fp64 <-> fp16 SIMD conversions Legalization would result in needless scalarization. Add some DAGCombines to fix this up.	2024-03-08 19:52:53 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes that a 16-bit inlinable literal is either an `i16` or an `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, for `i16`, `fp16`, and `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Craig Topper	a456885efc	[SelectionDAG] Allow FREEZE to be hoisted before integer SETCC. (#84241 ) Teach canCreateUndefOrPoison that ISD::SETCC with integer operands can never create undef/poison. FP SETCC is more complicated and will be handled in a future patch. Teach isGuaranteedNotToBeUndefOrPoison that ISD::CONDCODE is not poison/undef. It's a special constant only used by setcc/select_cc like nodes. This is needed since the hoisting will only hoist if exactly one operand might be poison. setcc has 3 operands including the condition code. Recovers some regression from #84232.	2024-03-08 10:17:54 -08:00
Lukacma	2b4d8188b2	[Clang][LLVM][SVE2.1] Created intrinsics for DUPQ instr. (#83260 ) This patch adds clang and llvm support for the following intrinsic and maps it to the DUPQ instruction: ``` // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 // _bf16, _f16, _f32, _f64 svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx); ```	2024-03-08 15:35:48 +00:00
Paul Walker	bd6eb54886	[LLVM][CodeGen] Teach SelectionDAG how to expand FREM to a vector math call. (#83859 ) This removes, at least when a vector library is available, a failure case for scalable vectors. Doing so means we can confidently cost vector FREM instructions without making an assumption that later passes will transform the IR before it gets to the code generator. NOTE: Whilst only FREM has been implemented the same mechanism can be used for the other libm related ISD nodes.	2024-03-08 12:09:05 +00:00
zhongyunde 00443407	a110a1c0ed	[AArch64] MachineCombiner msub matching for i64	2024-03-08 18:14:26 +08:00
zhongyunde 00443407	3a62edcf52	[AArch64] MachineCombiner msub matching Patterns should be sorted in priority order since the pattern evaluator stops checking as soon as it finds a faster sequence. So for a * b - c * d, we prefer to match the 2nd operand of sub, which can use msub to fold them. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm Fix https://github.com/llvm/llvm-project/issues/84152	2024-03-08 18:14:25 +08:00
Sizov Nikita	ef1eb0315e	[AArch64] Add neon bici test for haddu and shadd (#84073 ) Add neon bici test for haddu and shadd, prerequisite for #76644	2024-03-08 09:45:58 +00:00
Pierre van Houtryve	4b1910b11d	[GlobalISel][AMDGPU] Import patterns with multiple defs (#84171 ) Fixes #63216	2024-03-08 09:39:10 +01:00
Vyacheslav Levytskyy	fb1be9b33c	[SPIR-V] Insert a bitcast before load/store instruction to keep SPIR-V code valid (#84069 ) This PR introduces a step after instruction selection where instructions can be traversed and checked for validity against the specification. The PR also adds a way to correct load/store when there is a type mismatch contradicting the specification: an additional bitcast is inserted to keep types consistent. Corresponding test cases are added and existing test cases are corrected. This PR helps to successfully validate with the `spirv-val` tool (https://github.com/KhronosGroup/SPIRV-Tools) some output that previously led to validation errors and to crashes of back translation from SPIRV to LLVM IR in the SPIRV Translator project (https://github.com/KhronosGroup/SPIRV-LLVM-Translator). The added step, which brings instructions into the type correspondence required by the specification, can be (should be and will be) extended beyond load/store instructions to ensure validity rules of other SPIRV instructions related to type inference.	2024-03-08 08:31:56 +01:00
Amara Emerson	f6b825f51e	Revert "Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load."" Attempt 2. The first one was trying to call isa<> on an MI reference that was freed. This reverts commit ee24409c40ff35c3221892d9723331c233ca9f0e.	2024-03-07 23:28:33 -08:00
Fangrui Song	66bd3cd75b	[AMDGPU,test] Change llc -march= to -mtriple= PR #75982 had been created before these tests were added, therefore some tests were not updated.	2024-03-07 19:09:18 -08:00
Chen Zheng	cc34e56b86	[PPC][NFC] add an option to expose the bug in 74951	2024-03-07 20:52:44 -05:00
Chen Zheng	e7a22e72de	[PPC] precommit cases for issue 74915	2024-03-07 20:22:26 -05:00
Igor Kudrin	0cd7942c7f	[llvm-dwarfdump] Fix parsing DW_CFA_AARCH64_negate_ra_state (#84128 ) The saved state of the AARCH64_DWARF_PAUTH_RA_STATE register was not updated, so `llvm-dwarfdump` continued to dump it as `reg34=1` even if the correct value is `0`: ``` > llvm-dwarfdump -v test.o ... 0000002c 00000024 00000030 FDE cie=00000000 pc=00000030...00000064 Format: DWARF32 DW_CFA_advance_loc: 4 DW_CFA_AARCH64_negate_ra_state: DW_CFA_advance_loc: 4 DW_CFA_def_cfa_offset: +16 DW_CFA_offset: W30 -16 DW_CFA_remember_state: DW_CFA_advance_loc: 16 DW_CFA_def_cfa_offset: +0 DW_CFA_advance_loc: 4 DW_CFA_AARCH64_negate_ra_state: DW_CFA_restore: W30 DW_CFA_advance_loc: 4 DW_CFA_restore_state: DW_CFA_advance_loc: 12 DW_CFA_def_cfa_offset: +0 DW_CFA_advance_loc: 4 DW_CFA_AARCH64_negate_ra_state: DW_CFA_restore: W30 DW_CFA_nop: 0x30: CFA=WSP 0x34: CFA=WSP: reg34=1 0x38: CFA=WSP+16: W30=[CFA-16], reg34=1 0x48: CFA=WSP: W30=[CFA-16], reg34=1 0x4c: CFA=WSP: reg34=1 <--- should be '=0' 0x50: CFA=WSP+16: W30=[CFA-16], reg34=1 0x5c: CFA=WSP: W30=[CFA-16], reg34=1 0x60: CFA=WSP: reg34=1 <--- should be '=0' ```	2024-03-08 07:34:20 +07:00
Craig Topper	0d4978f3cf	[RISCV] Update some tests I missed in 909ab0e0d1903ad2329ca9fdf248d21330f9437f. NFC	2024-03-07 16:21:41 -08:00
Amara Emerson	26fa440957	[GlobalISel] Fix yet another pointer type invalid combining issue, this time in tryFoldSelectOfConstants()	2024-03-07 15:58:28 -08:00
Amara Emerson	a01e9ce86f	[AArch64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT We should moreElements <3 x s1> to <4 x s1> before we try to widen the element, otherwise we end up with a <3 x s21> nonsense type.	2024-03-07 15:40:19 -08:00
Evgenii Kudriashov	10edabbcf3	[X86][GlobalISel] Enable G_SDIV/G_UDIV/G_SREM/G_UREM (#81615 ) * Create a libcall for s64 type for 32 bit targets. * Fix a bug in REM selection: SUBREG_TO_REG is not intended to produce a value from super registers. * Replace selector tests by end-to-end tests. Other passes check the selected MIR better.	2024-03-08 00:10:53 +01:00
Craig Topper	909ab0e0d1	[RISCV] Insert a freeze before converting select to AND/OR. (#84232 ) Select blocks poison, but AND/OR do not. We need to insert a freeze to block poison propagation. This creates suboptimal codegen which I will try to fix with other patches. I'm prioritizing the correctness fix since we have 2 bug reports. Fixes #84200 and #84350	2024-03-07 15:03:51 -08:00
Amara Emerson	641b98a0d1	[GlobalISel] Fix crash in tryFoldAndOrOrICmpsUsingRanges() with pointer types.	2024-03-07 12:56:40 -08:00
Noah Goldstein	9f96db8e31	[X86] Fold `(icmp ult (add x,-C),2)` -> `(or (icmp eq X,C), (icmp eq X,C+1))` for Vectors This is undoing a middle-end transform which does the opposite. Since X86 doesn't have unsigned vector comparison instructions pre-AVX512, the simplified form gets worse codegen. Fixes #66479 Proofs: https://alive2.llvm.org/ce/z/UCz3wt Closes #84104 Closes #66479	2024-03-07 13:12:09 -06:00
Noah Goldstein	3e73a080fa	[X86] Add tests for folding `(icmp ult (add x,-C),2)` -> `(or (icmp eq X,C), (icmp eq X,C+1))`; NFC	2024-03-07 13:12:09 -06:00
Florian Mayer	ee24409c40	Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load." This reverts commit 7524ad9aa7b1b5003fe554a6ac8e434d50027dfb. Broke sanitizer build bots, e.g. https://lab.llvm.org/buildbot/#/builders/5/builds/41588/steps/9/logs/stdio	2024-03-07 09:43:21 -08:00
Michael Maitland	96049fcf4e	[GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378 ) Recommits llvm/llvm-project#80378 which was reverted in llvm/llvm-project#84330. The problem was that the change in llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used 217 as an opcode instead of a regex.	2024-03-07 09:10:03 -08:00
Jay Foad	8f79cdd8da	[AArch64] Add -verify-machineinstrs to a test This would have helped identify problems with #83905 which only showed up in an LLVM_ENABLE_EXPENSIVE_CHECKS build.	2024-03-07 17:06:16 +00:00
Michael Maitland	552da24843	Revert "[GISEL] Add IRTranslation for shufflevector on scalable vector types" (#84330 ) Reverts llvm/llvm-project#80378 causing Buildbot failures that did not show up with check-llvm or CI.	2024-03-07 10:16:31 -05:00
SahilPatidar	9e0f5909d0	[DAG] Fix Failure to reassociate SMAX/SMIN/UMAX/UMIN (#82175 ) Resolve #58110	2024-03-07 15:15:17 +00:00
Michael Maitland	2b8aaef09e	[GISEL] Add IRTranslation for shufflevector on scalable vector types (#80378 ) This patch is stacked on https://github.com/llvm/llvm-project/pull/80372, https://github.com/llvm/llvm-project/pull/80307, and https://github.com/llvm/llvm-project/pull/80306. ShuffleVector on scalable vector types gets IRTranslate'd to G_SPLAT_VECTOR since a ShuffleVector that operates on scalable vectors is a splat vector where the value of the splat vector is the 0th element of the first operand, because the index mask operand is the zeroinitializer (undef and poison are treated as zeroinitializer here). This is analogous to what happens in SelectionDAG for ShuffleVector. `buildSplatVector` is renamed to `buildBuildVectorSplatVector`. I did not make this a separate patch because it would cause problems to revert that change without reverting this change too.	2024-03-07 09:50:29 -05:00
ostannard	503c55e170	[AArch64] Move SLS later in pass pipeline (#84210 ) Currently, the SLS hardening pass is run before the machine outliner, which means that the outliner creates new functions and calls which do not have the SLS hardening applied. The fix for this is to move the SLS passes to after the outliner, as has recently been done for the return address signing pass. This also avoids a bug where the SLS outliner emits code with instructions after a return, which the outliner doesn't correctly handle.	2024-03-07 09:28:49 +00:00
Luke Lau	c59129a7c7	[RISCV] Recursively split concat_vector into smaller LMULs (#83035 ) This is the concat_vector equivalent of #81312, in that we recursively split concat_vectors with more than two operands into smaller concat_vectors. This allows us to break up the chain of vslideups, as well as perform the vslideups at a smaller LMUL, which in turn reduces register pressure as the previous lowering performed N vslideups at the highest result LMUL. For now, it stops splitting past MF2. This is done as a DAG combine so that any undef operands are combined away: If we do this during lowering then we end up with unnecessary vslideups of undefs.	2024-03-07 16:50:26 +08:00
Jay Foad	7a0e222a17	Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905 )" This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c. It was causing test failures on the expensive check builders.	2024-03-07 08:20:26 +00:00
Amara Emerson	7524ad9aa7	[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load. This load isn't selected by tablegen due to the anyext, but wasn't generating a subreg_to_reg. Maybe it shouldn't be formed at all during the combiner but to stop crashes later in codegen select it manually for now.	2024-03-07 00:12:17 -08:00
Fangrui Song	e63ea9d6f7	[CommandFlags] Rename option -relax-elf-relocations to -x86-relax-relocations relax-elf-relocations is misleading and there were AMDGPU/SystemZ tests misusing this x86-specific option.	2024-03-06 23:03:11 -08:00
Amara Emerson	00efb34352	[AArch64][GlobalISel] Fix crash during G_SHUFFLE_VECTOR legalization. A new widening rule was running before the shuffle was canonicalized into a homogenous form. Moving the rules around to ensure it's done before the widening fixes the crash, although this particular test still falls back.	2024-03-06 22:43:00 -08:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on its own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
Simon Pilgrim	0bd9255f8a	[X86] Improve KnownBits for X86ISD::PSADBW nodes (#83830 ) Don't just return the known zero upperbits, compute the absdiff Knownbits and perform the horizontal sum. Add implementations that handle both the X86ISD::PSADBW nodes and the INTRINSIC_WO_CHAIN intrinsics (pre-legalization).	2024-03-06 17:23:15 +00:00
Craig Topper	c161720ab4	[RISCV] Slightly improve expanded multiply emulation in getVLENFactoredAmount. (#84113 ) Instead of initializing the accumulator to 0, initialize it on first assignment with a mv from the register that holds VLENB << ShiftAmount. Fix a missing kill flag on the final Add. I have no real interest in this case, just an easy optimization I noticed.	2024-03-06 08:56:37 -08:00
Krzysztof Drewniak	6540f1635a	[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952 ) This commit adds the -lower-buffer-fat-pointers pass, which is applicable to all AMDGCN compilations. The purpose of this pass is to remove the type `ptr addrspace(7)` from incoming IR. This must be done at the LLVM IR level because `ptr addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled by SelectionDAG. The detailed operation of the pass is described in comments, but, in summary, the removal proceeds by: 1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of i160 (including vectors and aggregates). This is needed because the in-register representation of these pointers will stop matching their in-memory representation in step 2, and so ptrtoint/inttoptr operations are used to preserve the expected memory layout 2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two parts of a buffer fat pointer (the 128-bit address space 8 resource and the 32-bit address space 6 offset) visible in the IR. This also impacts the argument and return types of functions. 3. Splitting the resource and offset parts. All instructions that produce or consume buffer fat pointers (like GEP or load) are rewritten to produce or consume the resource and offset parts separately. For example, GEP updates the offset part of the result and a load uses the resource and offset parts to populate the relevant llvm.amdgcn.raw.ptr.buffer.load intrinsic call. At the end of this process, the original mutated instructions are replaced by their new split counterparts, ensuring no invalidly-typed IR escapes this pass. (For operations like call, where the struct form is needed, insertelement operations are inserted). 
Compared to LGC's PatchBufferOp ( `32cda89776/lgc/patch/PatchBufferOp.cpp` ): this pass - Also handles vectors of ptr addrspace(7)s - Also handles function boundaries - Includes the same uniform buffer optimization for loops and conditionals - Does not handle memcpy() and friends (this is future work) - Does not break up large loads and stores into smaller parts. This should be handled by extending the legalization of .buffer.{load,store} to handle larger types by producing multiple instructions (the same way ordinary LOAD and STORE are legalized). That work is planned for a followup commit. - Does not have special logic for handling divergent buffer descriptors. The logic in LGC is, as far as I can tell, incorrect in general, and, per discussions with @nhaehnle, isn't widely used. Therefore, divergent descriptors are handled with waterfall loops later in legalization. As a final matter, this commit updates atomic expansion to treat buffer operations analogously to global ones. (One question for reviewers: is the new pass in the right place? Should it be later in the pipeline?) Differential Revision: https://reviews.llvm.org/D158463	2024-03-06 09:49:58 -06:00
Mirko Brkušanin	1fd1f4c0e1	[AMDGPU] Handle amdgpu.last.use metadata (#83816 ) Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.	2024-03-06 16:33:52 +01:00
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name, 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
yandalur	f7d354af57	[Hexagon] Fix shift value when folding shl DAG node (#83853 ) When folding (or (shl xx, s), (zext y)) to (COMBINE (shl xx, s-32), y), fix resulting shift value in HexagonISD::COMBINE node to not generate negative values. --------- Co-authored-by: Yashas Andaluri <yandalur@qti.qualcomm.com>	2024-03-06 08:17:02 -06:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
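
The fold described in commit 9f96db8e31, `(icmp ult (add x,-C),2)` -> `(or (icmp eq X,C), (icmp eq X,C+1))`, can be sanity-checked by brute force. The sketch below is only an illustration, not part of the patch: it assumes 8-bit lanes with wrapping arithmetic, and the names `BITS`, `original_form`, `folded_form`, and `forms_agree` are hypothetical helpers. The Alive2 proof linked in the commit message remains the authoritative check.

```python
# Illustrative brute-force check of the fold from commit 9f96db8e31.
# Assumption: 8-bit lanes with two's-complement wraparound.
BITS = 8
MASK = (1 << BITS) - 1


def original_form(x, c):
    # (icmp ult (add x, -C), 2): wrapping subtract, then unsigned compare.
    return ((x - c) & MASK) < 2


def folded_form(x, c):
    # (or (icmp eq x, C), (icmp eq x, C+1)); C+1 wraps like the hardware would.
    return x == c or x == ((c + 1) & MASK)


def forms_agree():
    # Exhaustively compare both forms over every (x, C) pair of 8-bit values.
    return all(
        original_form(x, c) == folded_form(x, c)
        for x in range(1 << BITS)
        for c in range(1 << BITS)
    )


print(forms_agree())
```

The folded form needs only equality comparisons, which is the point of the commit: X86 vector code before AVX512 has no unsigned compare instruction, so two `icmp eq` plus an `or` is cheaper than emulating `ult`.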

52796 Commits