llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	d2856ff457	[X86] Enable v32f16 FNEG custom lowering on AVX512 targets	2023-11-30 10:07:01 +00:00
Simon Pilgrim	8851e411ed	[X86] Enable v16f16 FNEG custom lowering on AVX targets	2023-11-30 10:07:01 +00:00
Simon Pilgrim	06b9d92e0e	[X86] Enable v8f16 FABS custom lowering on SSE2 targets	2023-11-30 10:07:00 +00:00
Simon Pilgrim	abc60e9808	[X86] vec_fabs.ll - add SSE test coverage	2023-11-30 10:07:00 +00:00
Simon Pilgrim	9d4c3e9035	[X86] Enable v8f16 FNEG custom lowering	2023-11-30 10:07:00 +00:00
Shengchen Kan	eb64697a7b	[X86][Codegen] Correct the domain of VP2INTERSECT. GenericDomain -> SSEPackedInt. Found by #73654.	2023-11-30 17:56:21 +08:00
wanglei	b72456120f	[LoongArch] Add codegen support for extractelement (#73759) Add codegen support for extractelement when the `lsx` or `lasx` feature is enabled.	2023-11-30 17:29:18 +08:00
Shengchen Kan	511ba45a47	[X86][MC][CodeGen] Support EGPR for KMOV (#73781) KMOV is essential for copying between k-registers and GPRs. R16-R31 were added to the GPRs in #70958, so we extend KMOV to these new registers first. This patch 1. promotes KMOV instructions from VEX space to EVEX space, 2. emits the {evex} prefix for the EVEX variants, and 3. prefers the EVEX variants over the VEX variants in ISel and optimizations for better RA. EVEX variants will be compressed to VEX variants by the existing EVEX2VEX pass if no EGPR is used. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4 TAG: llvm-test-suite && CPU2017 can be built with feature egpr successfully.	2023-11-30 16:13:51 +08:00
Pierre van Houtryve	8a66510fa7	[AMDGPU] Don't create mulhi_24 in CGP (#72983) Instead, create a mul24 with a 64-bit result and let ISel take care of it. This allows patterns to simply match mul24 even for 64-bit muls instead of having to match both mul/mulhi and a buildvector/bitconvert/etc.	2023-11-30 08:26:45 +01:00
Craig Topper	ce570d1a22	[RISCV] Remove old FIXMEs from test. NFC	2023-11-29 21:53:32 -08:00
Kai Luo	afd9582b36	[PowerPC] Enhance test for PR #73609. NFC.	2023-11-30 05:06:29 +00:00
Philip Reames	e947f95337	[LSR][TTI][RISCV] Enable terminator folding for RISC-V. If looking for a miscompile revert candidate, look here! The transform being enabled prefers comparing to a loop-invariant exit value for a secondary IV over using an otherwise dead primary IV. This increases register pressure (by requiring the exit value to be live through the loop), but reduces the number of instructions within the loop by one. On RISC-V, which has a large number of scalar registers, this is generally a profitable transform. We lose the ability to use a beqz on what is typically a count-down IV, and pay the cost of computing the exit value of the secondary IV in the loop preheader, but save an add or sub in the loop body. For anything except an extremely short-running loop, or one with extreme register pressure, this is profitable. On spec2017, we see a 0.42% geomean improvement in dynamic icount, with no individual workload regressing by more than 0.25%. Code-size-wise, we trade a (possibly compressible) beqz and a (possibly compressible) addi for an uncompressible beq. We also add instructions in the preheader. The net result is a slight regression overall, but neutral or better inside the loop. Previous versions of this transform had numerous corner-case correctness bugs. All of the ones I can spot by inspection have been fixed, and I have run this through all of spec2017, but there may be further issues lurking. Adding uses to an IV is a fraught thing to do given poison semantics, so this transform is somewhat inherently risky. This patch is a reworked version of D134893 by @eop. That patch had been abandoned since May, so I picked it up, reworked it a bit, and am landing it.	2023-11-29 12:04:06 -08:00
David Li	f688e09012	Enable custom lowering of fabs_v16f16 with AVX and fabs_v32f16 with A… (#73565) This is the last patch for fabs lowering. v32f16 works for AVX as well with the patch (with type legalization).	2023-11-29 09:39:53 -08:00
Nick Desaulniers	b053359892	[X86InstrInfo] support memfold on spillable inline asm (#70832) This enables -regalloc=greedy to memfold spillable inline asm MachineOperands. Because no instruction selection framework marks MachineOperands as spillable, no language frontend can observe functional changes from this patch. That will change once instruction selection frameworks are updated. Link: https://github.com/llvm/llvm-project/issues/20571	2023-11-29 08:18:51 -08:00
Simon Pilgrim	244389ad17	[X86] Add fneg vector test coverage	2023-11-29 15:03:26 +00:00
Simon Pilgrim	53f3e59e59	[X86] Rename vec_fneg.ll to combine-fneg.ll. These tests are for fneg canonicalization combines, not codegen coverage tests like most of the other vec_* test files.	2023-11-29 15:03:25 +00:00
Paschalis Mpeis	1bfb84b477	[NFC][TLI] Improve tests for ArmPL and SLEEF Intrinsics. (#73352) Auto-generate test `armpl-intrinsics.ll` and simplify tests: - Eliminate scalar tail with no tail-folding flag. - Use active lane mask for shorter check lines (no long `shufflevectors`). - Eliminate scalar loops by providing `noalias` to relevant arguments and running `simplifycfg` to drop them. - The update script now uses `@llvm.compiler.used` instead of a longer regex.	2023-11-29 11:19:10 +00:00
Simon Pilgrim	0fac9da734	[DAG] getNode() - relax (zext (trunc x)) -> x fold iff the upper bits are known zero. Just leave the (zext (trunc (and x, c))) pattern which is still being used to create some zext_inreg patterns.	2023-11-29 10:38:11 +00:00
Simon Pilgrim	621183cb45	[X86] Add test case showing failure to remove unnecessary zext from address math. Thanks to @yubingex007-a11y for the original test case.	2023-11-29 10:38:11 +00:00
Alex Bradbury	85c9c16895	[RISCV] Support load clustering in the MachineScheduler (off by default) (#73754) This adds minimal support for load clustering, but disables it by default. The intent is to iterate on the precise heuristic and the question of turning this on by default in a separate PR. Although previous discussion indicates hope that the MachineScheduler would replace most uses of the SelectionDAG scheduler, it does seem most targets aren't using MachineScheduler load clustering right now: PPC+AArch64 seem to just use it to help with paired load/store formation and although AMDGPU uses it for general clustering it also implements ShouldScheduleLoadsNear for the SelectionDAG scheduler's clustering.	2023-11-29 10:01:55 +00:00
Qiu Chaofan	403ab9ac74	[NFC] Update X86 frem CodeGen case	2023-11-29 16:53:34 +08:00
David Green	b6ee831b59	[AArch64] Load/store optimizer fixes and cleanup. This includes a couple of fixes after #71908 for bundles and some cleanup for the debug output. One was an iterator type that asserted on bundles; the second was a rather subtle issue where forAllMIsUntilDef would hit the LdStLimit when renaming registers, meaning the last instruction was not updated, leaving an invalid `ldp x6, x6` instruction.	2023-11-29 07:41:15 +00:00
Craig Topper	d345cfb55c	[RISCV][GISel] Support s64 G_SELECT on RV32 with D extension. We have to force the register bank to FPRB if the type is s64 and the GPR is 32 bits.	2023-11-28 23:36:51 -08:00
wanglei	5e7e0d6032	[LoongArch] Fix pattern for FNMSUB_{S/D} instructions (#73742) When a=c=-0.0 and b=0.0: `-(a * b + (-c)) = -0.0` but `-a * b + c = 0.0`, so `(fneg (fma a, b, (-c))) != (fma (fneg a), b, c)`. See https://reviews.llvm.org/D90901 for a similar discussion on X86.	2023-11-29 15:21:21 +08:00
Craig Topper	35db35b7cf	[RISCV][GISel] Support G_FCOPYSIGN with F and D extension.	2023-11-28 21:50:04 -08:00
Yeting Kuo	f35c0f2f23	[RISCV] Refine pattern (select_cc seteq (and x, C), 0, 0, A) with Zbs. (#73746) PR #72978 disabled the transformation (select_cc seteq (and x, C), 0, 0, A) -> (and (sra (shl x)), A) for better Zicond codegen. It still enables the combine when C does not fit in 12 bits. This patch disables the combine when Zbs is enabled.	2023-11-29 13:09:47 +08:00
Ruiling, Song	c1511a65d5	[AMDGPU] Folding imm offset in more cases for scratch access (#70634) For scratch load/store, our hardware only accepts non-negative values in SGPR/VGPR. Besides the case that we can prove from known bits, we can also prove that the value in `base` will be non-negative: 1.) when the ADD for the address calculation has the NoUnsignedWrap flag; 2.) when the immediate offset is already negative.	2023-11-29 12:46:45 +08:00
Yeting Kuo	f73844d92b	[RISCV] Generate bexti for (select(setcc eq (and x, c))) where c is power of 2. (#73649) Currently, LLVM can transform (setcc ne (and x, c)) to (bexti x, log2(c)) where c is a power of 2. This patch transforms (select (setcc ne (and x, c)), T, F) into (select (setcc eq (and x, c)), F, T). This benefits the case where c does not fit in 12 bits.	2023-11-29 11:56:48 +08:00
paperchalice	1debbae96b	[CodeGen] Port CallBrPrepare to new pass manager (#73630) IIUC in the new pass manager infrastructure, the analysis result is always computed lazily. So just use `getResult` here.	2023-11-29 10:33:14 +09:00
Arthur Eubanks	d8d9394cb0	Revert "[X86] With large code model, put functions into .ltext with large section flag (#73037 )" This reverts commit 38e435895779c6f0e6c47a171f3b300ad99828b3. May be culprit for https://lab.llvm.org/buildbot/#/builders/37/builds/28079/steps/9/logs/stdio.	2023-11-28 14:14:40 -08:00
Arthur Eubanks	38e4358957	[X86] With large code model, put functions into .ltext with large section flag (#73037) So that when mixing small and large text, large text stays out of the way of the rest of the binary. This is useful for mixing precompiled small code model object files and built-from-source large code model binaries so that the text sections don't get merged.	2023-11-28 12:55:17 -08:00
Philip Reames	02cbae4fe0	[RISCV] Work on subreg for insert_vector_elt when vlen is known (#72666) (#73680) If we have a constant index and a known vlen, then we can identify which register out of a register group is being accessed. Given this, we can reuse the (slightly generalized) existing handling for working on sub-register groups. This results in all constant index extracts with known vlen becoming m1 operations. One bit of weirdness to highlight and explain: the existing code uses the VL from the original vector type, not the inner vector type. This is correct because the inner register group must be smaller than the original (possibly fixed length) vector type. Overall, this seems to be a reasonable codegen tradeoff as it biases us towards immediate AVLs, which avoids needing the vsetvli form which clobbers a GPR for no real purpose. The downside is that for large fixed length vectors, we end up materializing an immediate in a register for little value. We should probably generalize this idea and try to optimize the large fixed length vector case, but that can be done in separate work.	2023-11-28 10:45:22 -08:00
Stanislav Mekhanoshin	87d884b5c8	[AMDGPU] Fix folding of v2i16/v2f16 splat imms (#72709) We can use inline constants with packed 16-bit operands, but these should use op_sel. Currently splat of inlinable constants is considered legal, which is not really true if we fail to fold it with op_sel and drop the high half. It may be legal as a literal but not as inline constant, but then usual literal checks must be performed. This patch makes these splat literals illegal but adds additional logic to the operand folding to keep current folds. This logic is somewhat heavy though. This has fixed constant bus violation in the fdot2 test.	2023-11-28 09:07:26 -08:00
Philip Reames	3e5acc78f7	[RISCV] Precommit test coverage for insert_vector_elt with exact VLEN	2023-11-28 08:49:04 -08:00
Philip Reames	f3a9dbe7fc	[RISCV] Split build_vector into vreg sized pieces when exact VLEN is known (#73606) If we have a high LMUL build_vector and a known exact VLEN, we can decompose the build_vector into one build_vector per register in the register group. Doing so requires exact knowledge of which elements correspond to each register in the register group, and thus an exact VLEN must be known. Since we no longer have operations which are linear (or worse) in LMUL, this also allows us to lower all build_vectors without resorting to going through the stack.	2023-11-28 07:39:58 -08:00
Jay Foad	0d40831765	[AMDGPU] Allow folding to FMAAK with SGPR and immediate operand on GFX10+ (#72266) Allow foldImmediate to create instructions like `v_fmaak_f32 v0, s0, v0, 0x42000000`. This instruction has two "scalar values": s0 and 0x42000000. On GFX10+ this is allowed. This fold was originally implemented before the compiler supported GFX10, when all ASICs were limited to one scalar value.	2023-11-28 14:36:37 +00:00
Uday Bondhugula	b5d132010d	[NFC][NVPTX] Add a simpler test case for 0b80288e9e0b (#73379) While 0b80288e9e0b allowed more efficient lowering for 16xi8 loads, its test case was closer to an "integration" one. Add a much simpler unit test case that exercises it.	2023-11-28 19:28:51 +05:30
David Green	ab7110bcd6	[AArch64][SVE] Remove pseudo from LD1_IMM (#73631) The LD1 immediate offset instructions have both a pseudo and a real instruction, mostly because the instructions share a tablegen class with the FFR version of the instructions. As far as I can tell, the pseudo for the non-FFR versions does not serve any useful purpose, and we can rejig the classes to only define the pseudo for FFR instructions, similar to the existing sve_mem_cld_ss instructions. The end result is that we don't have a SideEffects flag on the LD1_IMM instructions whilst scheduling them, and have a few fewer pseudo instructions, which is usually a good thing.	2023-11-28 12:13:26 +00:00
Mariusz Sikora	facead618b	[AMDGPU] PromoteAlloca - bail always if load/store is volatile (#73228) This change addresses the case where the alloca size is the same as the load/store size.	2023-11-28 12:01:35 +01:00
Simon Pilgrim	eba50929b8	[X86] X86DAGToDAGISel - fix typo in #73126. We were casting the LoadSDNode from the wrong node in the base pointer uses list, meaning the ptr/chain comparisons were comparing against themselves.	2023-11-28 10:17:57 +00:00
paperchalice	61e58c4dc1	[CodeGen] Port DwarfEHPrepare to new pass manager (#72500) Co-authored-by: PaperChalice <example@example.com>	2023-11-28 17:53:25 +09:00
Stanislav Mekhanoshin	82d22a1bb4	[AMDGPU] Fixed folding of inline imm into dot w/o opsel (#73589) A splat packed constant can be folded as an inline immediate, but it must use opsel. On gfx940 this code path can be skipped due to a HW bug workaround, and then it may be folded without opsel, which is a bug. Fixed.	2023-11-28 00:50:41 -08:00
Craig Topper	ffcc5c7796	[RISCV][GISel] Select G_FENCE. (#73184) Using IR test to make it easier to compare with the SelectionDAG test output. The constant operands otherwise make it harder to understand.	2023-11-27 20:24:03 -08:00
Shengchen Kan	a3b7b2d635	[X86][CodeGen] Not compress EVEX into VEX when R16-R31 is used (#73604) The VEX prefix cannot encode R16-R31.	2023-11-28 11:40:48 +08:00
Kai Luo	00f9946680	[PowerPC] Precommit test of building vector via load and zeros. NFC.	2023-11-28 03:32:57 +00:00
Shengchen Kan	d9221da72b	[X86][MC] Keep backward compatibility in inline asm for constraints (#73529) Do not use r16-r31 with the 'q', 'r', 'l' constraints, for backward compatibility.	2023-11-28 09:42:03 +08:00
Philip Reames	52b413f25a	[RISCV] Precommit tests for buildvector lowering with exact VLEN	2023-11-27 16:48:20 -08:00
Philip Reames	93e156833b	[DAG] Fix a miscompile in insert_subvector undef (insert_subvector undef, ..), idx combine (#73587) The combine was implicitly assuming that the index on the outer insert_subvector meant the same thing when the source was switched to be the index of the inner insert_subvector. This is not true if the innermost sub-vector is fixed, and the outer subvector is scalable. I could do a less restrictive fix here - i.e. allow the case where the scalability of the subvectors are the same - but there's no test coverage which shows this transform actually has profit. Given that, go for the simplest fix.	2023-11-27 16:45:29 -08:00
Craig Topper	b4cf014991	[RISCV][GISel] Select trap and debugtrap. (#73171)	2023-11-27 15:52:15 -08:00
Philip Reames	cf17a24a4b	[RISCV] Use subreg extract for extract_vector_elt when vlen is known (#72666) This is the first in a planned patch series to teach our vector lowering how to exploit register boundaries in LMUL>1 types when VLEN is known to be an exact constant. This corresponds to code compiled by clang with the -mrvv-vector-bits=zvl option. For extract_vector_elt, if we have a constant index and a known vlen, then we can identify which register out of a register group is being accessed. Given this, we can do a sub-register extract for that register, and then shift any remaining index. This results in all constant index extracts becoming m1 operations, and thus eliminates the complexity concern for explode-vector idioms at high lmul.	2023-11-27 14:33:16 -08:00


52796 Commits
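The LSR terminator folding enabled in e947f95337 changes the loop's exit test. Instead of testing a count-down primary IV, the loop compares a secondary IV against an exit value computed once in the preheader. The Python sketch below models the shape of that rewrite at source level. It is not the pass itself, and the two loop functions are hypothetical. It assumes a non-negative trip count and a nonzero stride, and it ignores wraparound because Python integers do not overflow. A real compiler has to prove conditions like these, which is part of why the commit message calls the transform risky.

```python
def loop_with_primary_iv(base: int, n: int, stride: int) -> list[int]:
    assert n >= 0, "n is the trip count"
    visited = []
    i, p = n, base
    while i != 0:          # beqz on the count-down primary IV
        visited.append(p)
        p += stride        # secondary IV (e.g. an address)
        i -= 1             # the add/sub that terminator folding removes
    return visited


def loop_with_folded_terminator(base: int, n: int, stride: int) -> list[int]:
    # Assumption of this sketch: a zero stride would make the exit test
    # fire immediately, so the rewrite is only valid for a nonzero step.
    assert n >= 0 and stride != 0
    visited = []
    p = base
    end = base + n * stride  # exit value, computed once in the preheader
    while p != end:          # beq against a loop-invariant value kept live in a register
        visited.append(p)
        p += stride
    return visited


print(loop_with_primary_iv(100, 4, 8))         # [100, 108, 116, 124]
print(loop_with_folded_terminator(100, 4, 8))  # [100, 108, 116, 124]
```

Both versions visit the same addresses. The folded loop drops the decrement but keeps `end` live across the loop, which is the register-pressure trade-off the commit describes.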
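The getNode() change in 0fac9da734 folds (zext (trunc x)) to x whenever the bits removed by the truncate are already known to be zero. The sketch below models an i64 -> i32 -> i64 round trip with Python integers to show why that condition is needed. The helper names are our own. In the DAG the condition comes from known-bits analysis rather than from a runtime check, so here concrete values merely stand in for what the analysis would prove.

```python
MASK32 = (1 << 32) - 1


def zext_trunc_i64(x: int) -> int:
    # trunc i64 -> i32 keeps the low 32 bits; zext back to i64 fills the top with zeros.
    return x & MASK32


def fold_is_valid(x: int) -> bool:
    # Folding to plain x is sound only if bits 32..63 of x are zero.
    return (x >> 32) == 0


for x in (0x1234, (1 << 40) | 5):
    print(hex(x), fold_is_valid(x), zext_trunc_i64(x) == x)
# 0x1234 True True
# 0x10000000005 False False
```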
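The signed-zero counterexample in the LoongArch FNMSUB fix (5e7e0d6032) can be checked directly. The Python sketch below evaluates both expressions from the commit message with a=c=-0.0 and b=0.0. Plain float arithmetic stands in for `fma` here. That substitution is safe for these operands because the product of two zeros is exact, so a separate multiply and add yields the same result a fused operation would. The helper `is_negative_zero` is our own.

```python
import math


def is_negative_zero(x: float) -> bool:
    # -0.0 == 0.0 is True, so inspect the sign bit via copysign.
    return x == 0.0 and math.copysign(1.0, x) < 0


a, b, c = -0.0, 0.0, -0.0

# (fneg (fma a, b, (-c)))
lhs = -(a * b + (-c))
# (fma (fneg a), b, c): algebraically similar, but not bit-identical
rhs = (-a) * b + c

print(lhs, rhs)                                      # -0.0 0.0
print(lhs == rhs)                                    # True
print(is_negative_zero(lhs), is_negative_zero(rhs))  # True False
```

The two results compare equal as numbers but differ in their sign bits. That difference is why the two DAG patterns cannot be matched to the same instruction.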
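The bexti rewrite in f73844d92b relies on two identities:

1. For a power-of-two c, `(x & c) != 0` equals bit log2(c) of x, which is the single bit the Zbs `bexti` instruction extracts.
2. A select on `ne` equals the same select on `eq` with its two arms swapped.

The Python sketch below checks both identities on 64-bit values. It models instructions as plain functions; `bexti`, `select` and `check` are our own names, not LLVM APIs. A mask such as 1 << 20 does not fit in the signed 12-bit immediate of `andi`, so extracting the bit by its index avoids materializing the mask.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def bexti(x: int, shamt: int) -> int:
    # Zbs bexti: extract the single bit at position shamt (result is 0 or 1).
    return (x >> shamt) & 1


def select(cond: bool, t: int, f: int) -> int:
    return t if cond else f


def check(x: int, c: int, t: int, f: int) -> bool:
    assert c > 0 and c & (c - 1) == 0, "c must be a power of two"
    x &= MASK
    shamt = c.bit_length() - 1  # log2(c)
    # (setcc ne (and x, c), 0) agrees with (bexti x, log2(c)).
    same_bit = ((x & c) != 0) == (bexti(x, shamt) == 1)
    # (select (setcc ne ...), T, F) agrees with (select (setcc eq ...), F, T).
    same_select = select((x & c) != 0, t, f) == select((x & c) == 0, f, t)
    return same_bit and same_select


c = 1 << 20  # outside andi's signed 12-bit immediate range [-2048, 2047]
print(all(check(x, c, 7, 9) for x in (0, c, c | 1, MASK, 0x12345678)))  # True
```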
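Several RISC-V commits above (cf17a24a4b, 02cbae4fe0, f3a9dbe7fc) exploit an exactly known VLEN. Each vector register then holds VLEN/SEW elements. A constant index therefore maps to a fixed register within the LMUL>1 group plus an index inside that register, and a build_vector splits into one piece per register. The Python sketch below shows only that arithmetic, using assumed values VLEN=128 and SEW=32; the function names are hypothetical. If only a minimum VLEN is known, VLEN/SEW is not a compile-time constant and the mapping cannot be fixed. That is why these commits require an exact VLEN.

```python
def locate_element(idx: int, vlen_bits: int, sew_bits: int) -> tuple[int, int]:
    """Return (register within the group, index within that register)."""
    elems_per_reg = vlen_bits // sew_bits
    return idx // elems_per_reg, idx % elems_per_reg


def split_build_vector(elems: list, vlen_bits: int, sew_bits: int) -> list:
    """Split a build_vector's operands into one chunk per vector register."""
    elems_per_reg = vlen_bits // sew_bits
    return [elems[i:i + elems_per_reg] for i in range(0, len(elems), elems_per_reg)]


# VLEN=128, SEW=32: 4 elements per register, so an LMUL=4 group holds 16.
print(locate_element(6, 128, 32))                   # (1, 2)
print(split_build_vector(list(range(8)), 128, 32))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```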