llvm-project

Author	SHA1	Message	Date
Michael Maitland	2f400a2fd7	[GISEL] Add G_VSCALE instruction (#84542 )	2024-03-12 20:22:49 -04:00
S. Bharadwaj Yadavalli	54f631d116	[DirectX][NFC] Model precise overload type specification of DXIL Ops (#83917 ) Implement an abstraction to specify precise overload types supported by DXIL ops. These overload types are typically a subset of LLVM intrinsics. Implement the corresponding changes in DXILEmitter backend. Add tests to verify expected errors for unsupported overload types at code generation time. Add tests to check for correct overload error output.	2024-03-12 16:51:18 -04:00
Arthur Eubanks	6bbb73b4cb	[X86] Fix determining if globals with size <8 bits are large (#84975 ) Previously any global under 8 bits would accidentally be considered 0 sized, which is considered a large global.	2024-03-12 12:43:29 -07:00
Arthur Eubanks	45219702e7	[test][X86] Precommit test for large data threshold and i1 global	2024-03-12 19:08:40 +00:00
Simon Pilgrim	c1af6ab505	[X86] getFauxShuffleMask - recognise CONCAT(SUB0, SUB1) style patterns Handles the INSERT_SUBVECTOR(INSERT_SUBVECTOR(UNDEF,SUB0,0),SUB1,N) pattern Currently limited to v8i64/v8f64 cases as only AVX512 has decent cross lane 2-input shuffles, the plan is to relax this as I deal with some regressions	2024-03-12 17:40:19 +00:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Matt Arsenault	bd72ebd8d1	AMDGPU: Add some more mfma hazard recognizer tests (#84727 )	2024-03-12 22:05:47 +05:30
Jake Egan	fa1d13590c	[AIX][tests] Disable failing tests on AIX These new tests are failing on the AIX bot because the -I option isn't supported. Disable these tests for now until they can be fixed.	2024-03-12 12:11:18 -04:00
Nemanja Ivanovic	08dd645c15	[RISC-V] Bad immediate value for Zcmp instructions with E extension (#84925 ) When we are using the Zcmp extension together with the E extension in 32-bit mode and we need to spill both callee-saved registers as well as needing a couple of 32-bit stack slots, we emit a meaningless stack adjustment with cm.push/cm.popret. Furthermore this leads to the stack slot for the ra being clobbered so control returns to a random location. This is just a pre-commit test so that the PR for the fix shows the difference in code generation.	2024-03-12 16:26:49 +01:00
Bjorn Pettersson	4d0f79e346	Pre-commit test cases for SRL/SRA support in canCreateUndefOrPoison. NFC Add test cases to show that we can't push freeze through SRA/SRL with 'exact' flag when there are multiple uses.	2024-03-12 16:03:18 +01:00
Danial Klimkin	afd4758703	Revert "[NVPTX] Add support for atomic add for f16 type" (#84918 ) Reverts llvm/llvm-project#84295 due to breakages.	2024-03-12 15:01:18 +01:00
Adrian Kuegel	8e0f4b943f	[NVPTX] Add support for atomic add for f16 type (#84295 ) atom.add.noftz.f16 is supported since SM 7.0	2024-03-12 09:12:44 +01:00
Dhruv Chawla (work)	1d900e2984	[AArch64][GlobalISel] Avoid generating inserts for undefs when selecting G_BUILD_VECTOR (#84452 ) It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef values choose random registers for copying values from.	2024-03-12 11:57:07 +05:30
Phoebe Wang	e89b4bcf32	[X86] Remove SlowDivide tuning from GRTTuning (#84676 ) DIV32/64 throughput has improved since Goldmont in the Atom architecture, and Alder Lake-E shows similar numbers. So we shouldn't add such tunings to Gracemont and later products. Checked against Agner Fog's tables and uops.info.	2024-03-12 13:41:49 +08:00
Craig Topper	884b051a42	Recommit "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690 )" With a special case for when the Add constant is 0. Original message: We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. Reframing the logic in terms of how the unsigned ranges are affected. More comments in the patch. The new cases check isLegalAddImmediate to avoid some regressions in lit tests.	2024-03-11 12:39:38 -07:00
Michael Maitland	034cc2f5d0	[GISEL] Add G_INSERT_SUBVECTOR and G_EXTRACT_SUBVECTOR (#84538 ) G_INSERT and G_EXTRACT are not sufficient to use to represent both INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector. We would like to be able to INSERT/EXTRACT on vectors in cases that INSERT/EXTRACT on vector subregisters are not sufficient, so we add these opcodes. I tried to do a patch where we treated G_EXTRACT as both G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop at this [point](`8b5b294ec2/llvm/lib/Target/RISCV/RISCVISelLowering.cpp (L9932)`) in the SDAG equivalent code.	2024-03-11 13:47:30 -04:00
Simon Pilgrim	6cd68c2f87	[X86] Add base SSE2 coverage to SRL/SRA combines tests	2024-03-11 16:25:05 +00:00
Simon Pilgrim	7dc4d5f6a0	[X86] Add AVX512 (x86-64-v4) coverage to generic shift combines tests	2024-03-11 16:22:47 +00:00
Sivan Shani	5e688f0dbd	[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83. T1 allows an optional register list; the register list must be {d0-d15}. T2 defines a mandatory register list; the register list must be {d0-d31}. The requirements for T1/T2 are as follows. Require: T1 needs v8-M.Main and secure state; T2 needs v8.1-M.Main and secure state. 16 D Regs: valid for T1, valid for T2. 32 D Regs: UNDEFINED for T1, valid for T2. No D Regs: NOP for T1, NOP for T2.	2024-03-11 14:27:28 +00:00
Pierre van Houtryve	d4569d42b5	[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729 ) If all variables in the module are absolute, this means we're running the pass again on an already lowered module, and that works. If none of them are absolute, lowering can proceed as usual. Only diagnose cases where we have a mix of absolute/non-absolute GVs, which means we added LDS GVs after lowering, which is broken. See #81491 Split from #75333	2024-03-11 09:20:01 +01:00
Craig Topper	561ddb1687	Revert "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690 )" This reverts commit 0813b90ff5d195d8a40c280f6b745f1cc43e087a. Fixes miscompile reported in #84718.	2024-03-11 00:51:21 -07:00
AtariDreams	4e0e9b17c6	[SelectionDAG] Switch to LiveRegUnits (#84197 )	2024-03-11 12:47:39 +05:30
Carl Ritson	4a21e3afa2	[LiveIntervals] repairIntervalsInRange: recompute width changes (#78564 ) Extend repairIntervalsInRange to completely recompute the interval for a register if subregister defs exist without precise subrange matches (LaneMask exactly matching subregister). This occurs when register sequences are lowered to copies such that the size of the copies does not match any uses of the subregisters formed (i.e. during twoaddressinstruction). The subranges without this change are probably legal, but do not match those generated by live interval computation. This creates problems with other code that assumes subranges precisely cover all subregisters defined, e.g. shrinkToUses().	2024-03-11 15:24:17 +09:00
Kito Cheng	b7f97d3661	[RISCV] Place mergeable small read only data into srodata section (#82214 ) Small mergeable read only data was previously placed in sdata, which meant it lost the mergeable property and therefore some code size optimization opportunities at link time.	2024-03-11 13:57:06 +08:00
Carl Ritson	d9e6aa7048	[AMDGPU] Update LiveInterval def index for early-clobber (#79285 ) On converting an instruction to an early-clobber definition in convertToThreeAddress, we must also update live intervals for the register to start at the early-clobber index.	2024-03-11 14:54:11 +09:00
Craig Topper	d8d2dea7fc	[RISCV] Handle FP riscv_masked_strided_load with 0 stride. (#84576 ) Previously, we tried to create an integer extending load. We need a non-extending FP load instead. Fixes #84541.	2024-03-10 21:22:37 -07:00
wanglei	edd4c6c6dc	[LoongArch] Make sure that the LoongArchISD::BSTRINS node uses the correct `MSB` value (#84454 ) The `MSB` must not be greater than `GRLen`. Without this patch, newly added test cases will crash with LoongArch32, resulting in a 'cannot select' error.	2024-03-11 08:59:17 +08:00
Simon Pilgrim	862c7e0218	[X86] combineAndShuffleNot - ensure the type is legal before create X86ISD::ANDNP target nodes Fixes #84660	2024-03-10 16:23:51 +00:00
Simon Pilgrim	92d7aca441	[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496 ) Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on	2024-03-09 16:21:25 +00:00
Jay Foad	fd3eaf76ba	[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352 )	2024-03-09 09:07:22 +00:00
Craig Topper	6b270358c7	[SelectionDAG] Allow FREEZE to be hoisted before FP SETCC. (#84358 ) No nans/infs in SelectionDAG is complicated. Hopefully I've captured all of the cases. I've only applied ConsiderFlags to the SDNodeFlags since those are the only ones that will be dropped by hoisting. The condition code and TargetOptions would still be in effect. Recovers some regression from #84232.	2024-03-08 17:21:21 -08:00
yingopq	755b439694	[Mips] Fix missing sign extension in expansion of sub-word atomic max (#77072 ) Add sign extension "SEB/SEH" before compare. Fix #61881	2024-03-08 15:41:31 -05:00
David Majnemer	edc1c3d24e	[AArch64] Make more vector f16 operations legal v8f16 is a legal type but promoting to v16f16 would result in an illegal type. Let's legalize these by a combination of splitting+promoting resulting in a pair of v4f16. Also, we were being overly cautious with different v4f16 nodes. Mark more of them safe to promote to v4f32.	2024-03-08 19:52:54 +00:00
David Majnemer	5f935e9181	[AArch64] Optimize fp64 <-> fp16 SIMD conversions Legalization would result in needless scalarization. Add some DAGCombines to fix this up.	2024-03-08 19:52:53 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes that a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, for `i16`, `fp16`, and `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Craig Topper	a456885efc	[SelectionDAG] Allow FREEZE to be hoisted before integer SETCC. (#84241 ) Teach canCreateUndefOrPoison that ISD::SETCC with integer operands can never create undef/poison. FP SETCC is more complicated and will be handled in a future patch. Teach isGuaranteedNotToBeUndefOrPoison that ISD::CONDCODE is not poison/undef. It's a special constant only used by setcc/select_cc like nodes. This is needed since the hoisting will only hoist if exactly one operand might be poison. setcc has 3 operands including the condition code. Recovers some regression from #84232.	2024-03-08 10:17:54 -08:00
Lukacma	2b4d8188b2	[Clang][LLVM][SVE2.1] Created intrinsics for DUPQ instr. (#83260 ) This patch adds clang and llvm support for following intrinsic and maps it to DUPQ instruction: ``` // Variants are also available for: // _s8, _u16, _s16, _u32, _s32, _u64, _s64 // _bf16, _f16, _f32, _f64 svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx); ```	2024-03-08 15:35:48 +00:00
Paul Walker	bd6eb54886	[LLVM][CodeGen] Teach SelectionDAG how to expand FREM to a vector math call. (#83859 ) This removes, at least when a vector library is available, a failure case for scalable vectors. Doing so means we can confidently cost vector FREM instructions without making an assumption that later passes will transform the IR before it gets to the code generator. NOTE: Whilst only FREM has been implemented the same mechanism can be used for the other libm related ISD nodes.	2024-03-08 12:09:05 +00:00
zhongyunde 00443407	a110a1c0ed	[AArch64] MachineCombiner msub matching for i64	2024-03-08 18:14:26 +08:00
zhongyunde 00443407	3a62edcf52	[AArch64] MachineCombiner msub matching Patterns should be sorted in priority order since the pattern evaluator stops checking as soon as it finds a faster sequence. So for a * b - c * d, we prefer to match the 2nd operand of the sub, which can use msub to fold them. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm Fix https://github.com/llvm/llvm-project/issues/84152	2024-03-08 18:14:25 +08:00
Sizov Nikita	ef1eb0315e	[AArch64] Add neon bici test for haddu and shadd (#84073 ) Add neon bici test for haddu and shadd, prerequisite for #76644	2024-03-08 09:45:58 +00:00
Pierre van Houtryve	4b1910b11d	[GlobalISel][AMDGPU] Import patterns with multiple defs (#84171 ) Fixes #63216	2024-03-08 09:39:10 +01:00
Vyacheslav Levytskyy	fb1be9b33c	[SPIR-V] Insert a bitcast before load/store instruction to keep SPIR-V code valid (#84069 ) This PR introduces a step after instruction selection where instructions can be traversed from the perspective of their validity from the specification point of view. The PR also adds a way to correct load/store when there is a type mismatch contradicting the specification: an additional bitcast is inserted to keep types consistent. Corresponding test cases are added and existing test cases are corrected. This PR helps to successfully validate with the `spirv-val` tool (https://github.com/KhronosGroup/SPIRV-Tools) some output that previously led to validation errors and crashes of back translation from SPIRV to LLVM IR from the side of SPIRV Translator project (https://github.com/KhronosGroup/SPIRV-LLVM-Translator). The added step of bringing instructions into the type correspondence required by the specification can be (should be and will be) extended beyond load/store instructions to ensure validity rules of other SPIRV instructions related to type inference.	2024-03-08 08:31:56 +01:00
Amara Emerson	f6b825f51e	Revert "Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load."" Attempt 2. The first one was trying to call isa<> on an MI reference that was freed. This reverts commit ee24409c40ff35c3221892d9723331c233ca9f0e.	2024-03-07 23:28:33 -08:00
Fangrui Song	66bd3cd75b	[AMDGPU,test] Change llc -march= to -mtriple= PR #75982 had been created before these tests were added, therefore some tests were not updated.	2024-03-07 19:09:18 -08:00
Chen Zheng	cc34e56b86	[PPC][NFC] add an option to expose the bug in 74951	2024-03-07 20:52:44 -05:00
Chen Zheng	e7a22e72de	[PPC] precommit cases for issue 74915	2024-03-07 20:22:26 -05:00
Igor Kudrin	0cd7942c7f	[llvm-dwarfdump] Fix parsing DW_CFA_AARCH64_negate_ra_state (#84128 ) The saved state of the AARCH64_DWARF_PAUTH_RA_STATE register was not updated, so `llvm-dwarfdump` continued to dump it as `reg34=1` even if the correct value is `0`: ``` > llvm-dwarfdump -v test.o ... 0000002c 00000024 00000030 FDE cie=00000000 pc=00000030...00000064 Format: DWARF32 DW_CFA_advance_loc: 4 DW_CFA_AARCH64_negate_ra_state: DW_CFA_advance_loc: 4 DW_CFA_def_cfa_offset: +16 DW_CFA_offset: W30 -16 DW_CFA_remember_state: DW_CFA_advance_loc: 16 DW_CFA_def_cfa_offset: +0 DW_CFA_advance_loc: 4 DW_CFA_AARCH64_negate_ra_state: DW_CFA_restore: W30 DW_CFA_advance_loc: 4 DW_CFA_restore_state: DW_CFA_advance_loc: 12 DW_CFA_def_cfa_offset: +0 DW_CFA_advance_loc: 4 DW_CFA_AARCH64_negate_ra_state: DW_CFA_restore: W30 DW_CFA_nop: 0x30: CFA=WSP 0x34: CFA=WSP: reg34=1 0x38: CFA=WSP+16: W30=[CFA-16], reg34=1 0x48: CFA=WSP: W30=[CFA-16], reg34=1 0x4c: CFA=WSP: reg34=1 <--- should be '=0' 0x50: CFA=WSP+16: W30=[CFA-16], reg34=1 0x5c: CFA=WSP: W30=[CFA-16], reg34=1 0x60: CFA=WSP: reg34=1 <--- should be '=0' ```	2024-03-08 07:34:20 +07:00
Craig Topper	0d4978f3cf	[RISCV] Update some tests I missed in 909ab0e0d1903ad2329ca9fdf248d21330f9437f. NFC	2024-03-07 16:21:41 -08:00
Amara Emerson	26fa440957	[GlobalISel] Fix yet another pointer type invalid combining issue, this time in tryFoldSelectOfConstants()	2024-03-07 15:58:28 -08:00
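
The failure mode described in 6bbb73b4cb (a global under 8 bits being treated as 0 sized, and therefore as large) can be illustrated with a minimal sketch. The helper names, the threshold value, and the "size 0 means large" policy as written here are assumptions made for illustration, not LLVM's actual code; the sketch only shows how a truncating bits-to-bytes conversion produces a size of 0 for an `i1`-sized object, while a rounding-up conversion does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from LLVM.

// Truncating conversion: any size below 8 bits becomes 0 bytes.
constexpr uint64_t bytesTruncating(uint64_t Bits) { return Bits / 8; }

// Rounding-up conversion: 1..8 bits become 1 byte, 9..16 bits become 2 bytes.
constexpr uint64_t bytesRoundedUp(uint64_t Bits) { return (Bits + 7) / 8; }

// Assumed policy: a size of 0 is treated as "unknown" and hence large;
// otherwise the size is compared against a threshold in bytes.
constexpr bool isLarge(uint64_t Bytes, uint64_t ThresholdBytes) {
  return Bytes == 0 || Bytes >= ThresholdBytes;
}

// A 1-bit global is misclassified as large when the conversion truncates,
// and classified as small when the conversion rounds up.
static_assert(isLarge(bytesTruncating(1), 65536), "truncation yields size 0");
static_assert(!isLarge(bytesRoundedUp(1), 65536), "rounding up yields 1 byte");
```

The `static_assert` lines are checked at compile time, so compiling the snippet as a translation unit is enough to confirm both cases.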

52422 Commits