llvm-project

Author	SHA1	Message	Date
Shengchen Kan	60dbb2cec1	[X86][test] Update CHECK prefixes in CodeGen/X86/vector-interleaved-store-*.ll to suppress warnings Suppress warnings like WARNING: Prefix AVX had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX1 had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX2 had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX2-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX512 had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX512F had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX512F-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX512-FAST had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll WARNING: Prefix AVX512DQ-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll	2024-01-29 00:11:17 +08:00
David Green	f297d0bc6d	[AArch64][GlobalISel] More FCmp legalization. (#78734) This fills out the fcmp handling to be more like the other instructions, adding better support for fp16 and some larger vectors. Select of f16 values is still not handled optimally in places, as the select is only legal for s32 values, not s16. This would be correct for integers but not necessarily for fp. It is as if we need to do legalization -> regbankselect -> extra legalization -> selection.	2024-01-28 15:42:36 +00:00
Shengchen Kan	5abbb7b5d0	[X86][test] Update CHECK prefixes in CodeGen/X86/vector-interleaved-load-*.ll to suppress warnings Suppress warnings like WARNING: Prefix AVX had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX1 had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX2 had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX2-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX512 had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX512F had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX512F-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX512-FAST had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll WARNING: Prefix AVX512DQ-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll	2024-01-28 14:41:59 +08:00
Chia	3855757f98	[RISCV][ISel] Remove redundant vmerge for the vwadd. (#78403) This patch resolves the missed-optimization case below. ### Code ``` define <8 x i64> @vwadd_mask_v8i32(<8 x i32> %x, <8 x i64> %y) { %mask = icmp slt <8 x i32> %x, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42> %a = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer %sa = sext <8 x i32> %a to <8 x i64> %ret = add <8 x i64> %sa, %y ret <8 x i64> %ret } ``` ### Before this patch [Compiler Explorer](https://godbolt.org/z/cd1bKTrx6) ``` vwadd_mask_v8i32: li a0, 42 vsetivli zero, 8, e32, m2, ta, ma vmslt.vx v0, v8, a0 vmv.v.i v10, 0 vmerge.vvm v16, v10, v8, v0 vwadd.wv v8, v12, v16 ret ``` ### After this patch ``` vwadd_mask_v8i32: li a0, 42 vsetivli zero, 8, e32, m2, ta, ma vmslt.vx v0, v8, a0 vsetvli zero, zero, e32, m2, tu, mu vwadd.wv v12, v12, v8, v0.t vmv4r.v v8, v12 ret ``` This pattern can be found in a reduction with a widening destination. Specifically, we first do a fold like `(vwadd.wv y, (vmerge cond, x, 0)) -> (vwadd.wv y, x, y, cond)`, then do pattern matching on it.	2024-01-27 20:03:32 +09:00
Evgenii Kudriashov	cfd91199ca	[X86] Skip unused VRegs traverse (#78229) Almost all loops over getNumVirtRegs skip unused registers by means of reg_nodbg_empty or an empty live interval, except for these two cases, which are revealed by GlobalISel since it can skip RegClass assignment for unused registers. Closes #64452, closes #71926	2024-01-26 23:57:14 +01:00
Alex MacLean	1d5820aafe	[NVPTX] improve identifier renaming for PTX (#79459) Update `NVPTXAssignValidGlobalNames` to convert all characters which are illegal in PTX identifiers to `_$_`. ([PTX ISA: 4.4 Identifiers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#identifiers)).	2024-01-26 13:49:00 -08:00
Nikita Popov	07a1925b8b	Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)" This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b. Introduces a major compile-time regression.	2024-01-26 22:33:17 +01:00
Oskar Wirga	59bf60519f	Refactor recomputeLiveIns to operate on whole CFG (#79498) Currently, recomputeLiveIns recomputes the live-in registers for a single MachineBasicBlock, but the result depends on the order in which it is called on the blocks, which can result in incorrect register allocations down the line. This PR fixes that by simply recomputing the live-ins for the entire CFG until convergence is achieved. This makes it harder to introduce subtle bugs which alter liveness.	2024-01-26 11:25:36 -08:00
Adhemerval Zanella	a58c62fa82	[X86] Do not end 'note.gnu.property' section with -fcf-protection (#79360) glibc now adds the required minimum ISA level for libc-nonshared.a (linked into all programs), and this is done with inline asm along with .note.gnu.property and .pushsection/.popsection. However, the x86 backend always ends the 'note.gnu.property' section when building with -fcf-protection, leading to an assertion failure: llvm/llvm-project-git/llvm/lib/MC/MCStreamer.cpp:1251: virtual void llvm::MCStreamer::switchSection(llvm::MCSection *, const llvm::MCExpr *): Assertion `!Section->hasEnded() && "Section already ended"' failed. [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/isa-level.c;h=3f1b269848a52f994275bab6f60dded3ded6b144;hb=HEAD	2024-01-26 10:33:47 -08:00
David Green	7f518ee9ea	[DAG] Add a one-use check to concat -> scalar_to_vector fold. (#79510) Without this we can end up with multiple copies from gpr->fpr.	2024-01-26 18:17:17 +00:00
Amy Kwan	d5fe1bd081	[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252) This patch disallows the use of the -maix-small-local-exec-tls and -fno-data-sections options within clang, and also disallows the use of the aix-small-local-exec-tls attribute with the -data-sections=false option in llc. This is because having data sections off when using the aix-small-local-exec-tls feature is not ideal for performance. As the small-local-exec-tls region is a limited resource, this space should not be used for variables that may be replaced. Note that on AIX, data sections are turned on by default, so this patch makes it so that a diagnostic is emitted when users explicitly turn off data sections while using the aix-small-local-exec-tls feature.	2024-01-26 12:39:25 -05:00
dyung	45f883ed06	Change check for embedded llvm version number to a regex to make test more flexible. (#79528) This test started to fail when LLVM created the release/18.x branch and the main branch subsequently had the version number increased from 18 to 19. I investigated this failure (it was blocking our internal automation) and discovered that the CHECK statement on line 27 seemed to have the compiler version number (1800) encoded in octal that it was checking for. I don't know if this is something that explicitly needs to be checked, so I am leaving it in, but it should be more flexible so the test doesn't fail anytime the version number is changed. To accomplish that, I changed the check for the 4-digit version number to be a regex. I originally updated this test for the 18->19 transition in a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK line more flexible so it doesn't need to be continually updated.	2024-01-26 09:36:20 -08:00
Nemanja Ivanovic	67c1c1dbb6	[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919) Make __builtin_cpu_{init|supports|is} target independent and provide an opt-in query for targets that want to support it. Each target is still responsible for its specific lowering/code-gen. Also provide code-gen for PowerPC. I originally proposed this in https://reviews.llvm.org/D152914 and this addresses the comments I received there. --------- Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn> Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2024-01-26 11:24:50 -05:00
Krzysztof Drewniak	63fe80fb18	[SeparateConstOffsetFromGEP] Handle `or disjoint` flags (#76997) This commit extends separate-const-offset-from-gep to look at the newly-added `disjoint` flag on `or` instructions so as to preserve additional opportunities for optimization. The tests were pre-committed in #76972.	2024-01-26 09:56:06 -06:00
Evgenii Kudriashov	a437347562	[X86][GlobalISel] Remove G_OR/G_AND/G_XOR test duplication (NFC) (#79088)	2024-01-26 16:48:51 +01:00
Simon Pilgrim	1f930cf894	[X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2) (REAPPLIED) Reapply b9483d30a7d7a0650a0e83c75fcb9ab4932f475a with fix (typo - wasn't ensuring icmp vs zero) Fixes #78888	2024-01-26 15:13:59 +00:00
Shengchen Kan	d9245e8b47	[X86][ISEL] Add NDD entries in X86ISelDAGToDAG.cpp	2024-01-26 23:02:53 +08:00
Shimin Cui	e278c67096	Add support to merge strings used by metadata (#77364) Currently, if a merged string is used by metadata, its metadata uses are not replaced. This patch adds support for replacing those metadata uses.	2024-01-26 09:22:37 -05:00
Shengchen Kan	035f33bf41	[X86][CodeGen] Add NDD entries for X86InstrInfo::foldImmediate	2024-01-26 22:11:57 +08:00
Luke Lau	5cf9f2cd98	[RISCV] Fix M1 shuffle on wrong SrcVec in lowerShuffleViaVRegSplitting This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended up taking it from V1 instead of V2.	2024-01-26 20:25:05 +07:00
Luke Lau	d407e6ca61	[RISCV] Add test to showcase miscompile from #79072	2024-01-26 20:25:05 +07:00
Diana Picus	46dd8acf36	[AMDGPU] Fix typos. NFC	2024-01-26 12:04:58 +01:00
Shengchen Kan	821dee9852	[X86][CodeGen] Add NDD entries for isAssociativeAndCommutative	2024-01-26 18:39:52 +08:00
Shengchen Kan	14a027b2b7	[X86][CodeGen] Support flags copy lowering for NDD ADC/SBB/RCL/RCR (#79280)	2024-01-26 16:49:44 +08:00
David Green	f0012dcce4	[AArch64] Add a couple more csinc tests with disjoint ors. NFC	2024-01-26 08:30:35 +00:00
XinWang10	02d56801ee	[X86] Support APX promoted RAO-INT and MOVBE instructions (#77431) R16-R31 were added to the GPRs in https://github.com/llvm/llvm-project/pull/70958. This patch supports the promoted RAO-INT and MOVBE instructions in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2024-01-26 14:33:45 +08:00
XinWang10	6d0080b5de	[X86] Support promoted ENQCMD, KEYLOCKER and USERMSR (#77293) R16-R31 were added to the GPRs in https://github.com/llvm/llvm-project/pull/70958. This patch supports the promoted ENQCMD, KEYLOCKER and USER-MSR instructions in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2024-01-26 14:24:43 +08:00
Brandon Wu	fb94c6491a	[RISCV][SiFive] Reduce intrinsics of SiFive VCIX extension (#79407) This patch models LMUL and SEW as inputs in sf_vc_x_se and sf_vc_i_se, which removes 42 intrinsics from the lookup table.	2024-01-26 11:15:53 +08:00
Michael Maitland	594b92a7b9	[RISCV] Add Tune to DontSinkSplatOperands (#79199) A CPU may prefer not to sink splat operands, one reason being that it could require an S2V transfer buffer to move scalars into buffers.	2024-01-25 14:44:36 -05:00
Florian Hahn	eb678d8993	[AArch64] Combine store (trunc X to <3 x i8>) to sequence of ST1.b. (#78637) Improve codegen for (trunc X to <3 x i8>) by converting it to a sequence of 3 ST1.b, first converting the truncate operand to either v8i8 or v16i8, then extracting the lanes for the truncate results and storing them. At the moment, there are almost no cases in which such vector operations will be generated automatically. The motivating case is non-power-of-2 SLP vectorization: https://github.com/llvm/llvm-project/pull/77790 PR: https://github.com/llvm/llvm-project/pull/78637	2024-01-25 18:28:44 +00:00
David Green	30279dcf51	[AArch64] Add a test from #79100, showing extra unnecessary movs. NFC	2024-01-25 18:15:36 +00:00
Philip Reames	5aa5a2f1b7	[RISCV] Disable exact VLEN splitting for bitrotate shuffles (#79468) If we have a bitrotate shuffle, this is also by definition a vreg-splittable shuffle when exact VLEN is known. However, there's no profit to be had from splitting the wider bitrotate lowering into individual m1 pieces. We'd rather leave it at the higher LMUL to reduce code size. This is a general problem for any linear-in-LMUL shuffle expansions when the vreg splitting still has to do linear work per piece. On first reflection it seems like element rotation might have the same interaction, but in that case, splitting can be done via a set of whole register moves (which may get folded into the consumer depending) which is at least as good as a pair of slideup/slidedown. I think that bitrotate is the only shuffle expansion we have that actually needs to be handled here.	2024-01-25 10:06:14 -08:00
Douglas Yung	b9483d30a7	Revert "[X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2)" This reverts commit 72f10f7eb536da58cb79e13974895cd97d4e1a5f. This change was causing a miscompile on an internal test and is being reverted at the author's request until it can be fixed.	2024-01-25 09:40:16 -08:00
Jay Foad	c5d59fe1b2	[AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX11.5 (#79460) The hardware bug only affects GFX11.0.x.	2024-01-25 16:28:49 +00:00
Wang Pengcheng	1a14c446dd	[RISCV][MC] Add experimental support of Zaamo and Zalrsc `A` extension has been split into two parts: Zaamo (Atomic Memory Operations) and Zalrsc (Load-Reserved/Store-Conditional). See also https://github.com/riscv/riscv-zaamo-zalrsc. This patch adds the MC support. Reviewers: dtcxzyw, topperc, kito-cheng Reviewed By: topperc Pull Request: https://github.com/llvm/llvm-project/pull/78970	2024-01-25 17:03:25 +08:00
David Green	2c49586e1b	[ARM] Fix MVEFloatOps check on creating VCVTN (#79291) In the past PerformSplittingToNarrowingStores handled both int and float ops, but since the introduction of MVETRUNC it only operates on float operations, creating VCVTN nodes. It should be guarded by hasMVEFloatOps to prevent a failure to select.	2024-01-25 08:12:51 +00:00
paperchalice	e390c229a4	[Pass] Add hyphen to some pass names (#74287) Here is the list of the renamed passes: - `callbrprepare` -> `callbr-prepare` - `dwarfehprepare` -> `dwarf-eh-prepare` - `flattencfg` -> `flatten-cfg` - `loweratomic` -> `lower-atomic` - `lowerinvoke` -> `lower-invoke` - `lowerswitch` -> `lower-switch` - `winehprepare` -> `win-eh-prepare` - `targetir` -> `target-ir` - `targetlibinfo` -> `target-lib-info` Legacy passes are not affected.	2024-01-25 16:05:54 +08:00
Jay Foad	45d2d7757f	[AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) This is only valid on targets with architected SGPRs.	2024-01-25 07:48:06 +00:00
Yeting Kuo	df08350dcf	[RISCV] Implement forward inserting save/restore FRM instructions. (#77744) Previously, RISCVInsertReadWriteCSR inserted an FRM swap for any value other than 7 and restored the original value right after the vector instruction. This is inefficient when consecutive vector instructions use the same rounding mode, or when the next vector instruction uses a different explicit rounding mode. This patch implements a local optimization to solve the above problem. We assume the starting rounding mode of the basic block is "dynamic." When iterating through a basic block and encountering an instruction whose rounding mode differs from the current rounding mode, we change the current rounding mode and save the current rounding mode if needed. We may also need to restore FRM when encountering function calls, inline asm, and some uses of FRM. A more advanced version would perform cross-basic-block analysis to determine the starting rounding mode of each basic block.	2024-01-25 14:41:52 +08:00
Craig Topper	5446902cf2	[RISCV] Add IsSignExtendingOpW to amocas.w. (#79351)	2024-01-24 20:15:41 -08:00
Craig Topper	65e0dc68f5	[RISCV] Add test cases showing missed opportunity to remove sext.w after amocas.w. NFC	2024-01-24 20:15:33 -08:00
Philip Reames	28db4017b0	[RISCV] Add test coverage for bad interaction of exact vlen and rotate shuffles	2024-01-24 18:00:41 -08:00
Philip Reames	795090739c	[RISCV] Fix a bug accidentally introduced in e9311f9 If we're lowering an e8 m8 shuffle and we have an index value greater than 255, we have no available space to generate an e16 index vector. The code had originally handled this correctly, but in a recent refactoring I had moved the single-source code above the check, and thus broke the single-source case by accident. I have a change on review to rework this (https://github.com/llvm/llvm-project/pull/79330), but for now, go with the most obvious fix.	2024-01-24 17:10:59 -08:00
Philip Reames	7386aa02ef	[RISCV] Add test coverage for shuffle index > i8 cornercase Triggered by discussion on https://github.com/llvm/llvm-project/pull/79330. In the process of writing this, realized one of my recent refactorings appears to have broken the legalization for the single source case here. Fix to follow in separate patch.	2024-01-24 17:03:29 -08:00
Michael Maitland	3967510032	[RISCV][GISel] First mask argument placed in v0 according to RISCV Vector CC. (#79343)	2024-01-24 16:03:38 -05:00
Jonas Paulsson	84dcf3d35b	[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221) Machines with vector support handle i128 in vector registers and therefore only have the small displacement available for memory accesses. Update isLegalAddressingMode() to reflect this.	2024-01-24 20:16:05 +01:00
Alex MacLean	3b8539c9dc	[NVPTX] use incomplete aggregate initializers (#79062) The PTX ISA specifies that initializers may be incomplete ([5.4.4. Initializers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#initializers)) > As in C, array initializers may be incomplete, i.e., the number of initializer elements may be less than the extent of the corresponding array dimension, with remaining array locations initialized to the default value for the specified array type. Emitting initializers in this form is preferable because it reduces the size of the PTX, in some cases significantly, and can improve compile time of ptxas as a result.	2024-01-24 09:24:28 -08:00
Philip Reames	396b6bbc5e	[RISCV] Recurse on second operand of two operand shuffles (#79197) This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3. This change completes the migration to a recursive shuffle lowering strategy where, when we encounter an unknown two-argument shuffle, we lower each operand as a single-source permute, and then use a vselect (i.e. a vmerge) to combine the results. This relies for code quality on the post-isel combine which will aggressively fold that vmerge back into the materialization of the second operand if possible. Note: The change includes only the most immediately obvious of the stylistic cleanup. There's a bunch of code movement that this enables that I'll do as a separate patch as rolling it into this creates an unreadable diff.	2024-01-24 08:29:28 -08:00
quic-asaravan	dc5b4daae7	[HEXAGON] Inlining Division (#79021) This patch inlines float division function calls for Hexagon. Co-authored-by: Awanish Pandey <awanpand@codeaurora.org>	2024-01-24 09:30:33 -06:00
Jay Foad	70fc970378	[AMDGPU] Move architected SGPR implementation into isel (#79120)	2024-01-24 15:06:20 +00:00
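The NVPTX renaming change (1d5820aafe, #79459) converts every character that is illegal in a PTX identifier to `_$_`. Below is a minimal Python sketch of that mapping, assuming the legal characters inside an identifier are `[A-Za-z0-9_$]` as described in PTX ISA section 4.4 (the leading-`%` form is ignored). The function name `sanitize_ptx_name` is hypothetical; this is an illustration, not the C++ code in `NVPTXAssignValidGlobalNames`.

```python
import re

# Assumption: characters allowed inside a PTX identifier are letters,
# digits, '_' and '$' (PTX ISA 4.4). Anything else counts as illegal here.
_ILLEGAL_PTX_CHAR = re.compile(r"[^A-Za-z0-9_$]")


def sanitize_ptx_name(name: str) -> str:
    """Replace each character that is illegal in a PTX identifier with '_$_'."""
    # '$' has no special meaning in a Python replacement string,
    # so "_$_" is inserted literally for every illegal character.
    return _ILLEGAL_PTX_CHAR.sub("_$_", name)


print(sanitize_ptx_name("foo.bar"))  # foo_$_bar
print(sanitize_ptx_name("a-b@c"))    # a_$_b_$_c
```

The mapping is not injective (for example, `a.b` and `a-b` both become `a_$_b`), so two distinct names can end up identical; this sketch does nothing about such collisions.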


52796 Commits