llvm-project

Author	SHA1	Message	Date
Mircea Trofin	cda388c440	[mlgo] Fix test post PR #76697 Opcode values changed, trivial fix.	2024-01-03 19:38:03 -08:00
Micah Weston	7df28fd61a	[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Implements PGOAnalysisMap emitting in AsmPrinter with tests. (#75202 ) Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF section. Implements filecheck tests to verify emitting new fields. This patch emits optional PGO related analyses into the bb-addr-map ELF section during AsmPrinter. This currently supports Function Entry Count, Machine Block Frequencies, and Machine Branch Probabilities. Each is independently enabled via the `feature` byte of `bb-addr-map` for the given function. A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).	2024-01-03 19:17:44 -05:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522 ) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Ahmed Bougacha	155d5849da	[AArch64] Avoid jump tables in swiftasync clobber-live-reg test. NFC. The upstream test relies on jump-tables, which are lowered in dramatically different ways with later arm64e/ptrauth patches. Concretely, it's failing for at least two reasons: - ptrauth removes x16/x17 from tcGPR64 to prevent indirect tail-calls from using either register as the callee, conflicting with their usage as scratch for the tail-call LR auth checking sequence. In the 1/2_available_regs_left tests, this causes the MI scheduler to move the load up across some of the inlineasm register clobbers. - ptrauth adds an x16/x17-using pseudo for jump-table dispatch, which looks somewhat different from the regular jump-table dispatch codegen by itself, but also prevents compression currently. They seem like sensible changes. But they mean the tests aren't really testing what they're intended to, because there's always an implicit x16/x17 clobber when using jump-tables. This updates the test in a way that should work identically regardless of ptrauth support, with one exception, #1 above, which merely reorders the load/inlineasm w.r.t. each other. I verified the tests still fail the live-reg assertions when applicable.	2024-01-03 13:51:46 -08:00
Craig Topper	bdcd7c0ba0	[DAGCombiner][RISCV] Preserve disjoint flag in folding (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) (#76860 ) Since we are shifting both inputs to the original Or by the same amount and inserting zeros in the LSBs, the result should still be disjoint.	2024-01-03 13:14:13 -08:00
Craig Topper	f64d1c810a	[RISCV] Add test cases for folding disjoint Or into a scalar load address. NFC After 47a1704ac94c8aeb1aa7e0fc438ff99d36b632c6 we are able to reassociate a disjoint Or used as a GEP index to get the constant closer to a load to fold it. This is shown by the first test. We are not able to do this if the GEP created a shift left to scale the index as the second test shows. To make this work, we need to preserve the disjoint flag when pulling the Or through the shift.	2024-01-03 12:17:57 -08:00
Craig Topper	47a1704ac9	[SelectionDAG][X86] Use disjoint flag in SelectionDAG::isADDLike. (#76847 ) Keep the haveNoCommonBitsSet check because we haven't started inferring the flag yet. I've added tests for two transforms, but these are not the only transforms that use isADDLike.	2024-01-03 11:54:29 -08:00
Arthur Eubanks	c4146121e9	Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae. Causes crashes, see repro on `0e46b49de4`.	2024-01-03 17:09:46 +00:00
Arthur Eubanks	ece1359857	Revert "[PowerPC] Add test after #75271 on PPC. NFC. (#75616 )" This reverts commit 5cfc7b3342ce4de0bbe182b38baa8a71fc83f8f8. This depends on 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae which is being reverted.	2024-01-03 17:09:45 +00:00
Mirko Brkušanin	82e33d6203	[AMDGPU] Add VDSDIR instructions for GFX12 (#75197 )	2024-01-03 16:32:00 +01:00
Simon Pilgrim	1d27669e8a	[X86] combineConcatVectorOps - fold 512-bit concat(blendi(x,y,c0),blendi(z,w,c1)) to AVX512BW mask select Yet another yak shaving regression fix for #73509	2024-01-03 12:38:26 +00:00
Saiyedul Islam	df1b5ae31d	[AMDGPU][GlobalISel] Update tests to check for COV5 (#76257 ) Update GlobalISel tests to assume ABI to be code object version 5.	2024-01-03 17:53:47 +05:30
Simon Pilgrim	39be138cb7	[X86] combineTargetShuffle - fold SHUF128(CONCAT(),CONCAT()) to peek through upper subvectors If SHUF128 is accessing only the upper half of a vector source that is a concatenation/insert_subvector then try to access the subvector directly and adjust the element mask accordingly.	2024-01-03 11:09:14 +00:00
Simon Pilgrim	58a335a3f1	[X86] Fold concat_vectors(permq(x),permq(x)) -> permq(concat_vectors(x,x)) Handle a common subvector shuffle pattern in combineConcatVectorOps	2024-01-03 11:09:14 +00:00
David Green	771fd1ad2a	[DAG] Extend input types if needed in combineShiftToAVG. (#76791 ) This attempts to fix #76734 which is a crash on invalid TRUNC node types from unoptimized input code in combineShiftToAVG. The NVT can be VT if the larger type was legal and the adds will not overflow, in which case the inputs should be extended. From what I can tell this appears to be valid (if not optimal for this case): https://alive2.llvm.org/ce/z/fRieHR The result has also been changed to getExtOrTrunc in case that VT==NVT, which is not handled by SEXT/ZEXT.	2024-01-03 10:52:01 +00:00
Stanislav Mekhanoshin	3bcee8568a	[AMDGPU] Global ISel for llvm.prefetch (#76183 )	2024-01-03 01:02:55 -08:00
David Green	d659bd1635	[GlobalISel][AArch64] Tail call libcalls. (#74929 ) This tries to allow libcalls to be tail called, using a similar method to DAG where the type is checked to make sure they match, and if so the backend, through lowerCall checks that the tailcall is valid for all arguments.	2024-01-03 07:59:36 +00:00
David Green	5b5614c92f	[AArch64][GlobalISel] Add legalization for vecreduce.fmul (#73309 ) There are no native operations that we can use for floating point mul, so lower by splitting the vector into chunks multiple times. There is still a missing fold for fmul_indexed, that could help the gisel test cases a bit.	2024-01-03 07:49:20 +00:00
brendaso1	923f6ac018	[FastISel][AArch64] Compare Instruction Miscompilation Fix (#75993 ) When shl is folded in compare instruction, a miscompilation occurs when the CMP instruction is also sign-extended. For the following IR: %op3 = shl i8 %op2, 3 %tmp3 = icmp eq i8 %tmp2, %op3 It used to generate cmp w8, w9, sxtb #3 which means sign extend w9, shift left by 3, and then compare with the value in w8. However, the original intention of the IR would require `%op2` to first shift left before extending the operands in the comparison operation. Moreover, if sign extension is used instead of zero extension, the sample test would miscompile. This PR creates a fix for the issue, more specifically to not fold the left shift into the CMP instruction, and to create a zero-extended value rather than a sign-extended value.	2024-01-03 13:49:05 +08:00
Craig Topper	4e347b4e38	Revert "[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340 )" This reverts most of commit 5b155aea0e529b7b5c807e189fef6ea5cd5faec9. I have left the new test file, but regenerated the checks. This causes failures in our downstream testing. The input types to the extends need to be checked so we don't create RISCVISD::VZEXT_VL with illegal or unsupported input type.	2024-01-02 19:49:42 -08:00
Kai Luo	8ae73fea3a	[PowerPC] Precommit test for #72845 . NFC.	2024-01-03 03:03:48 +00:00
Craig Topper	bf684a97f3	[RISCV] Don't emit vxrm writes for vnclip(u).wi with shift of 0. (#76578 ) If there's no shift being performed, the rounding mode doesn't matter. We could do the same for vssra and vssrl, but they are no-ops with a shift of 0 so would be better off being removed earlier.	2024-01-02 09:50:06 -08:00
Thorsten Schütt	4b9194952d	[GlobalIsel] Combine selects with constants (#76089 ) A first small step at combining selects.	2024-01-02 17:26:39 +01:00
Pierre van Houtryve	33565750e4	[AMDGPU] Fix moveToValu for copy to phys SGPRs (#76715 ) Fixes #76031	2024-01-02 14:45:33 +01:00
Jay Foad	cf025c767e	[AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (#76149 )	2024-01-02 13:02:20 +00:00
David Green	d714be978c	[AArch64] Check for exact size when finding 1's for CMLE. (#76452 ) This is a fix for the second half of #75822, where smaller constants can also be bitcast to larger types. We should be checking the size is what we expect it to be when matching ones.	2024-01-02 09:24:08 +00:00
Jim Lin	9e1ad3cff6	[RISCV] Remove blank lines at the end of testcases. NFC.	2024-01-02 13:13:04 +08:00
David Green	d4a6995e94	[AArch64][GlobalISel] Legalize large G_SEXT_INREG These come from the legalization of other operations, but it makes sense to split the operations into legal sizes before lowering them.	2024-01-01 15:28:08 +00:00
Matt Arsenault	25cd249355	AMDGPU: Don't assert on select of v32i16/v32f16	2024-01-01 21:24:41 +07:00
Matt Arsenault	459270934b	AMDGPU: Add more select bf16 vector tests	2024-01-01 21:24:41 +07:00
David Green	90c397fc56	[AArch64] Add icmp and fcmp tests for GlobalISel. NFC	2023-12-31 18:45:01 +00:00
Yingwei Zheng	1228becf7d	[FuncAttrs] Deduce `noundef` attributes for return values (#76553 ) This patch deduces `noundef` attributes for return values. IIUC, a function returns `noundef` values iff all of its return values are guaranteed not to be `undef` or `poison`. Definition of `noundef` from LangRef: ``` noundef This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation. ``` Alive2: https://alive2.llvm.org/ce/z/g8Eis6 Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|+0.01%\|+0.01%\|-0.01%\|+0.01%\|+0.03%\|-0.04%\|+0.01%\| The motivation of this patch is to reduce the number of `freeze` insts and enable more optimizations.	2023-12-31 20:44:48 +08:00
Phoebe Wang	a384cd5012	[X86][BF16] Add subvec_zero_lowering patterns (#76507 )	2023-12-31 11:14:41 +08:00
Mikhail Gudim	69bc371835	[RISCV][GlobalISel] Zbkb support for G_ROTL and G_ROTR (#76599 ) These instructions are legal in the presence of Zbkb extension.	2023-12-30 00:45:18 -05:00
Min-Yih Hsu	4bd79ea3fe	[M68k] Add pc-relative displacement (PCD) addressing mode for MOVSX And disable offset folding altogether since we cannot always gain the precise offset there to see if that fits into a certain size of displacement.	2023-12-29 11:52:49 -08:00
Ivan Kosarev	b6daac023a	[AMDGPU][True16] Remove the VGPR_LO/HI16 register classes. (#76500 )	2023-12-29 12:13:24 +00:00
yingopq	e13e95bc44	[Mips] Optimize (shift x (and y, BitWidth - 1)) to (shift x, y) (#73889 ) Turn x >> (shift & 31/63) into a single srlv instead of andi + srlv, since the MIPS variable shift instruction already implicitly masks the shift amount, as on x86, wasm and AMDGPU. Copy the X86DAGToDAGISel::isUnneededShiftMask() function to MIPS to check whether the two instructions can be combined into one.	2023-12-29 14:53:55 +05:30
Chia	87779fd823	[RISCV][ISel] Remove redundant min/max in saturating truncation (#75145 ) This patch closed #73424, which is also a missed-optimization case similar to #68466 on X86. ## Source Code ``` define void @trunc_sat_i8i16(ptr %x, ptr %y) { %1 = load <8 x i16>, ptr %x, align 16 %2 = tail call <8 x i16> @llvm.smax.v8i16(<8 x i16> %1, <8 x i16> <i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128, i16 -128>) %3 = tail call <8 x i16> @llvm.smin.v8i16(<8 x i16> %2, <8 x i16> <i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127, i16 127>) %4 = trunc <8 x i16> %3 to <8 x i8> store <8 x i8> %4, ptr %y, align 8 ret void } ``` ## Before this patch: ``` trunc_sat_i8i16: # @trunc_maxmin_id_i8i16 vsetivli zero, 8, e16, m1, ta, ma vle16.v v8, (a0) li a0, -128 vmax.vx v8, v8, a0 li a0, 127 vmin.vx v8, v8, a0 vsetvli zero, zero, e8, mf2, ta, ma vnsrl.wi v8, v8, 0 vse8.v v8, (a1) ret ``` ## After this patch: ``` trunc_sat_i8i16: # @trunc_maxmin_id_i8i16 vsetivli zero, 8, e8, mf2, ta, ma vle16.v v8, (a0) csrwi vxrm, 0 vnclip.wi v8, v8, 0 vse8.v v8, (a1) ret ```	2023-12-29 16:15:47 +08:00
wanglei	da5378e87e	[LoongArch] Fix incorrect pattern [X]VBITSELI_B instructions Adjusted the operand order of [X]VBITSELI_B to correctly match vselect.	2023-12-29 14:44:29 +08:00
Chia	5b155aea0e	[RISCV][ISel] Combine scalable vector add/sub/mul with zero/sign extension (#72340 ) This PR mainly aims at resolving the below missed-optimization case, while it could also be considered as an extension of the previous patch https://reviews.llvm.org/D133739?id= ## Missed-Optimization Case Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh ### Source Code: ``` define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) { %a = load <vscale x 2 x i8>, ptr %x %b = load <vscale x 2 x i8>, ptr %y %b2 = load <vscale x 2 x i8>, ptr %z %c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16> %d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16> %d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16> %e = mul <vscale x 2 x i16> %c, %d %f = add <vscale x 2 x i16> %c, %d2 %g = sub <vscale x 2 x i16> %c, %d2 %h = or <vscale x 2 x i16> %e, %f %i = or <vscale x 2 x i16> %h, %g ret <vscale x 2 x i16> %i } ``` ### Before This Patch ``` # %bb.0: vsetvli a3, zero, e16, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vsext.vf2 v11, v8 vsext.vf2 v8, v9 vsext.vf2 v9, v10 vmul.vv v8, v11, v8 vadd.vv v10, v11, v9 vsub.vv v9, v11, v9 vor.vv v8, v8, v10 vor.vv v8, v8, v9 ret ``` ### After This Patch ``` # %bb.0: vsetvli a3, zero, e8, mf4, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vle8.v v10, (a2) vwmul.vv v11, v8, v9 vwadd.vv v9, v8, v10 vwsub.vv v12, v8, v10 vsetvli zero, zero, e16, mf2, ta, ma vor.vv v8, v11, v9 vor.vv v8, v8, v12 ret ``` We can see Add/Sub/Mul are combined with the Sign Extension. ## Relation to the Patch D133739 The patch D133739 introduced an optimization for folding `ADD_VL`/ `SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did not consider the non-fixed-length vector case, thus this PR could also be considered as an extension for the D133739. Furthermore, in the current `SelectionDAG`, we represent scalable vector add (or any binary operator) as a normal `ADD` operation. It might be better to use an Opcode like `ADD_VL`, which needs further conversation and decision.	2023-12-29 14:36:38 +08:00
wanglei	c7367f985e	[LoongArch] Fix incorrect pattern XVREPL128VEI_{W/D} instructions Remove the incorrect patterns for `XVREPL128VEI_{W/D}` instructions, and add correct patterns for XVREPLVE0_{W/D} instructions	2023-12-29 14:03:53 +08:00
wanglei	47c88bcd5d	[LoongArch] Fix LASX vector_extract codegen Custom lowering `ISD::EXTRACT_VECTOR_ELT` with lasx.	2023-12-29 13:48:53 +08:00
Qiu Chaofan	c97a7675ee	[PowerPC] Expand FSINCOS of fp128 (#76494 )	2023-12-29 11:27:06 +08:00
Phoebe Wang	6c87f46795	[X86][NFC] Remove meaningless FIXME Solved by #76485.	2023-12-29 10:45:42 +08:00
David Green	1c87d5c4fc	[AArch64][GlobalISel] Lower fminnm/fmaxnm through Global ISel Whilst this might technically not be correct if a combine treats signed zeroes differently, where the neon operations are more defined than the minnum/maxnum nodes, it mirrors what SDAG does, which allows us to lower aarch64_neon_fminnm and aarch64_neon_fmaxnm through the existing selection patterns.	2023-12-28 20:02:30 +00:00
Mircea Trofin	dc1931a8c5	[mlgo] Fix post PR #76319 Some opcodes changed.	2023-12-28 08:28:51 -08:00
Craig Topper	2dccf11b92	[RISCV] Remove gp and tp from callee saved register lists. (#76483 ) This appears to match gcc behavior. Resolves https://discourse.llvm.org/t/risc-v-calling-convention-implementation-in-clang-tp-and-gp-registers/75757	2023-12-27 21:36:03 -08:00
Phoebe Wang	e499ae53b3	[X86][BF16] Support INSERT_SUBVECTOR and CONCAT_VECTORS (#76485 )	2023-12-28 13:29:01 +08:00
Wang Pengcheng	13cdee9047	[RISCV][MC] Add support for experimental Zcmop extension (#76395 ) This implements experimental support for the Zcmop extension as specified here: https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc. This change adds only MC support.	2023-12-28 13:03:16 +08:00
Phoebe Wang	3081bacb60	[X86][BF16] Add X86SubVBroadcastld patterns (#76479 )	2023-12-28 10:08:27 +08:00
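
The disjoint-Or commits above (47a1704ac9, f64d1c810a, bdcd7c0ba0) rest on a small arithmetic fact: if `x` and `c1` share no set bits, then `x << c2` and `c1 << c2` share none either, so the folded Or can keep its disjoint flag and still behaves like an add. The sketch below is only a model of that reasoning in Python, not LLVM code; the 32-bit width and the helper names `is_disjoint`, `fold_shl_of_or`, and `check` are assumptions made for illustration.

```python
# Illustrative model only: fixed-width integers are simulated with a mask.
# BITS and all function names here are hypothetical, not LLVM APIs.
BITS = 32
MASK = (1 << BITS) - 1


def is_disjoint(a, b):
    """True when a and b share no set bits, so (a | b) == (a + b)."""
    return (a & b) == 0


def fold_shl_of_or(x, c1, c2):
    """Operands of the folded Or: (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)."""
    return (x << c2) & MASK, (c1 << c2) & MASK


def check(x, c1, c2):
    """Check that the fold keeps the value and, for disjoint inputs, disjointness."""
    before = ((x | c1) << c2) & MASK
    lhs, rhs = fold_shl_of_or(x, c1, c2)
    same_value = before == (lhs | rhs)
    # Shifting both operands by the same amount moves every bit by the same
    # distance, so two bits that did not overlap before cannot overlap after.
    still_disjoint = is_disjoint(lhs, rhs) if is_disjoint(x, c1) else True
    return same_value and still_disjoint


if __name__ == "__main__":
    ok = all(check(x, c1, c2)
             for x in range(64) for c1 in range(64) for c2 in range(BITS))
    print("fold preserves value and disjointness:", ok)
```

Exhaustively checking small operands is of course not a proof, but it matches the argument in bdcd7c0ba0: both inputs are shifted by the same amount and zeros are inserted in the low bits.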

52796 Commits