llvm-project

Author	SHA1	Message	Date
Amara Emerson	eaab3245d4	[GlobalISel] Add constant folding support for G_FMA/G_FMAD in the combiner. (#65659 )	2023-09-09 16:32:02 +08:00
liqin.weng	1b622fff44	[VP] IR expansion for inttoptr/ptrtoint Add basic handling for VP ops that can expand to cast intrinsics Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D159478	2023-09-09 15:34:46 +08:00
Thomas	0a7a926007	[NVPTX] Make i16x2 a native type and add supported vec instructions (#65799 ) recommit https://github.com/llvm/llvm-project/pull/65432 with minor bug fix for bitcasts	2023-09-08 13:44:58 -07:00
Philip Reames	0a298277d3	[RISCV] Add a negative case for strided load matching This covers the bug identified in review of pr 65777.	2023-09-08 13:33:52 -07:00
Dmitri Gribenko	b3a14cac4f	Revert "[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432 )" This reverts commit db5d845c73ee2d64f1a5bab3fc72edece9e3a7ba. As per PR discussion "Looks like we've missed lowering of bitcasts between v2f16 and v2i16 and it breaks XLA."	2023-09-08 19:28:15 +02:00
Philip Reames	5ee7dc04de	[RISCV] Match gather(splat(ptr)) as zero strided load (#65769 ) We were already handling the case where the broadcast was being done via a GEP, but we hadn't handled the case of a broadcast via a shuffle.	2023-09-08 09:37:52 -07:00
Philip Reames	8dd87a5f57	[RISCV] Add gather test coverage for non-intptr index widths Note that these are non-canonical. At IR, we will generally canonicalize to the intptrty width form if knownbits allows.	2023-09-08 08:53:19 -07:00
David Stuttard	8c03239934	[AMDGPU] New intrinsic void llvm.amdgcn.s.nop(i16) (#65757 ) This allows front ends to insert s_nops - this is most often when a delay less than s_sleep 1 is required.	2023-09-08 16:24:10 +01:00
Jay Foad	8669a9f93a	[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65765 ) SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD instruction, before we delete the old IMAGE_LOAD instruction. But in UpdateNodeOperands can do CSE on the fly and return a different EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still exist and would refer to the old deleted IMAGE_LOAD instruction. This caused errors like: t31: v3i32,ch = <<Deleted Node!>> # D:1 This target-independent node should have been selected! UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209! Fix it by detecting the CSE case and replacing all uses of the original EXTRACT_SUBREG node with the CSE'd one. Recommit with a fix for a use-after-free bug in the first version of this patch (#65340) which was caught by asan.	2023-09-08 16:16:02 +01:00
Phoebe Wang	24194090e1	[X86][RFC] Add new option `-m[no-]evex512` to disable ZMM and 64-bit mask instructions for AVX512 features This is an alternative of D157485 and a pre-feature to support AVX10. AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267 AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343 RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661 Based on the feedbacks from LLVM and GCC community, we have agreed to start from supporting `-m[no-]evex512` on existing AVX512 features. The option `-mno-evex512` can be used with `-mavx512xxx` to build binaries that can run on both legacy AVX512 targets and AVX10-256. There're still arguments about what's the expected behavior when this option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We decided to defer the support of `-mavx10.1` after we made consensus. Or furthermore, we start from supporting AVX10.2 and not providing any AVX10.1 options. Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D159250	2023-09-08 22:47:22 +08:00
Amara Emerson	730e8f659d	[AArch64][GlobalISel] Fix global offset folding combine inserting MIs into wrong place. Was causing use-before-def issues. Not sure how it remained undetected for so long.	2023-09-08 06:28:12 -07:00
Jay Foad	dd5af895bb	[AMDGPU] Mark S_NOP as having side effects (#65745 ) This prevents S_NOP from being rescheduled past other (side-effecting) instructions, which is useful because it is generally used to introduce a short delay or to avoid hazards. Currently this only affects MIR tests because the compiler itself only inserts nops in PostRAHazardRecognizer which runs after all scheduling.	2023-09-08 14:05:56 +01:00
Phoebe Wang	da1eb886c4	[X86] Do not check alignment for VINSERTPS (#65721 ) We don't have alignment constraint in AVX instructions.	2023-09-08 19:23:43 +08:00
Qiu Chaofan	d4d0b5eaab	Fix MIR failure after b922a362	2023-09-08 16:33:45 +08:00
David Green	a82c106e57	[ARM] Change CRC predicate to just HasCRC This removes the backend requirement for crc instructions on HasV8, relying on just HasCRC instead. This should allow them to be selected with ArmV7 + crc, making them more usable whilst hopefully not making them incorrectly generated (they only come from intrinsics, and HasCRC usually requires HasV8). This is how most other instructions are specified.	2023-09-08 09:02:15 +01:00
Qiu Chaofan	b922a36211	[PowerPC] Define SchedModel for Power8 PowerPC subtargets prior to Power9 use the 'legacy' itinerary way to provide scheduling information. This patch re-writes the tablegen file to define the scheduling information in the new SchedModel way, which can bring improvements to some benchmarks. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D154488	2023-09-08 15:43:21 +08:00
Phoebe Wang	2e44b07e24	[X86] Do not directly fold for VINSERTPS (#65718 ) We have already customized folding for VINSERTPS by 7e6606f4f1, which do the folding when alignment >= 4 bytes. We cannot arbitrarily fold it like others because we need to calculate the source offset.	2023-09-08 15:35:44 +08:00
bzEq	d9efcb54c9	[PEI][PowerPC] Fix false alarm of stack size limit (#65559 ) PPC64 allows stack size up to ((2^63)-1) bytes. Currently llc reports ``` warning: stack frame size (4294967568) exceeds limit (4294967295) in function 'main' ``` if the stack allocated is larger than 4G.	2023-09-08 15:16:00 +08:00
Phoebe Wang	11c3b979e6	[X86][NFC] Add a test case to show wrong memory folding for vinsertps	2023-09-08 14:30:33 +08:00
Nicolai Hähnle	2eb767c9e1	AMDGPU: Scratch instructions are trivially disjoint from SMEM and buffer instructions (#65287 ) Scratch instructions are always in addrspace(5), which can only alias with flat (and itself). SMEM and buffer instructions can never reference those address spaces, so they are trivially disjoint.	2023-09-08 07:43:36 +02:00
Amy Kwan	3f46e5453d	[AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0) This patch utilizes the -maix-small-local-exec-tls option added in D155544 to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided. The patch either produces an addi/la with a displacement off of r13 (the thread pointer) when the address is calculated, or it produces an addi/la followed by a load/store when the address is calculated and used for further accesses. This patch also optimizes this sequence a bit more where we can remove the addi/la when the load/store offset is 0. A follow up patch will be posted to account for when the load/store offset is non-zero, and currently in these situations we keep the addi/la that precedes the load/store. Furthermore, this access sequence is only performed for TLS variables that are less than ~32KB in size. Differential Revision: https://reviews.llvm.org/D155600	2023-09-07 20:05:29 -05:00
Amy Kwan	8bdbee8aaa	[AIX][TLS] Add target attribute for -maix-small-local-exec-tls option. This patch adds a target attribute for an AIX-specific option that informs the compiler that it can use a faster access sequence for the local-exec TLS model (formally named aix-small-local-exec-tls). The Clang portion of this option is in D155544. The initial implementation to generate the faster access sequence is in D155600. Differential Revision: https://reviews.llvm.org/D156203	2023-09-07 20:05:29 -05:00
Philip Reames	b4a99f1cd6	[RISCV] Lower constant build_vectors with few non-sign bits via vsext (#65648 ) If we have a build_vector such as [i64 0, i64 3, i64 1, i64 2], we instead lower this as vsext([i8 0, i8 3, i8 1, i8 2]). For vectors with 4 or fewer elements, the resulting narrow vector can be generated via scalar materialization. For shuffles which get lowered to vrgathers, constant build_vectors of small constants are idiomatic. As such, this change covers all shuffles with an output type of 4 or less. I deliberately started narrow here. I think it makes sense to expand this to longer vectors, but we need a more robust profit model on the recursive expansion. It's questionable if we want to do the zsext if we're going to generate a constant pool load for the narrower type anyways. One possibility for future exploration is to allow the narrower VT to be less than 8 bits. We can't use vsext for that, but we could use something analogous to our widening interleave lowering with some extra shifts and ands.	2023-09-07 16:01:16 -07:00
Jeffrey Byrnes	5044531afd	[AMDGPU] Teach CalculateByteProvider about AMDGPUISD::PERM (#65547 ) As a standalone patch, it has limited effect. However, it is necessary as it supports upcoming commits.	2023-09-07 15:13:42 -07:00
stefanp-ibm	0a4a8bec34	[PowerPC] Turn string pooling on by default. (#65628 ) This patch turns the string pooling pass on by default. Some tests are updated as required.	2023-09-07 16:49:31 -04:00
Jeffrey Byrnes	7fda1b74be	[AMDGPU]: Allow combining into v_dot4 Differential Revision: https://reviews.llvm.org/D155995 Change-Id: I794f540217f0f84141338757b41b1be0493c7207	2023-09-07 12:58:48 -07:00
Philip Reames	063524e35a	[RISCV] Add coverage for missing gather/scatter combines	2023-09-07 11:57:30 -07:00
Wael Yehia	11d5c7bd28	[AIX] Add threadId and use nanosecond timestamp in sinit/sterm symbols With ThinLTO, when compiling SPEC 2017 omnetpp_r with -threads=4, two small modules can end up with the same timestamp in their sinit symbols when calculating time in seconds, creating duplicate definitions. This patch uses a timestamp in nanoseconds. Because the race can be between threads, embed the thread ID as well. Reviewed By: xingxue, daltenty Differential Revision: https://reviews.llvm.org/D159319	2023-09-07 17:46:41 +00:00
Amy Kwan	f94f85348d	Revert "[AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses." This reverts commit f0b2f6954101c9052763a99a1e7ac135770e779a. The implementation is incorrect and breaks compiling local-exec programs.	2023-09-07 12:10:37 -05:00
esmeyi	b85a9b3093	[PowerPC] Try to use less instructions to materialize 64-bit constant when High32=Low32. Summary: Materialization a 64-bit constant with High32=Low32 only requires 2 instructions instead of 3 when Low32 can be materialized in 1 instruction. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D158495	2023-09-07 13:03:17 -04:00
Amara Emerson	1cc9f626cb	[GlobalISel] Add constant-folding of FP binops to combiner. (#65230 )	2023-09-07 19:33:35 +03:00
Vladislav Dzhidzhoev	c6ae7df999	[AArch64][GlobalISel] Regenerate arm64-ld1.ll	2023-09-07 17:48:15 +02:00
Stefan Pintilie	84e2fd7ee4	[PowerPC] Add a pass to merge all of the constant global arrays into one pool. On PowerPC the number of TOC entries must be kept low for large applications. In order to reduce the number of constant global arrays we can pool them into one structure and then access them as the base address of that structure plus some offset. The constant global arrays may be arrays of `i8` which are constant strings but they may also be arrays of `i32, i64, etc...`. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D155730	2023-09-07 11:14:56 -04:00
Phoebe Wang	0856efbf88	Revert "[X86][RFC] Add new option `-m[no-]evex512` to disable ZMM and 64-bit mask instructions for AVX512 features" This reverts commit 7dd48cc24de2d54d40527432cbee8a9d97a8a4f7. Causing buildbot failure.	2023-09-07 21:59:01 +08:00
Phoebe Wang	7dd48cc24d	[X86][RFC] Add new option `-m[no-]evex512` to disable ZMM and 64-bit mask instructions for AVX512 features This is an alternative of D157485 and a pre-feature to support AVX10. AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267 AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343 RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661 Based on the feedbacks from LLVM and GCC community, we have agreed to start from supporting `-m[no-]evex512` on existing AVX512 features. The option `-mno-evex512` can be used with `-mavx512xxx` to build binaries that can run on both legacy AVX512 targets and AVX10-256. There're still arguments about what's the expected behavior when this option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We decided to defer the support of `-mavx10.1` after we made consensus. Or furthermore, we start from supporting AVX10.2 and not providing any AVX10.1 options. Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D159250	2023-09-07 21:38:35 +08:00
Stefan Pintilie	492c1f3d7c	[PowerPC] Merge rotate and clear into single instruction. This patch tries to catch a codegen opportunity where the rotate and mask can be merged into a single RLDCL instruction. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D158328	2023-09-07 09:25:41 -04:00
Vladislav Dzhidzhoev	0de6baab91	[AArch64][GlobalISel] Look through COPY and G_BITCAST while selecting fcvtl2 (fpext) It tackles some regressions introduced in https://reviews.llvm.org/D144670.	2023-09-07 14:08:20 +02:00
pvanhout	69036eb735	[AMDGPU] Fix code-size-estimate.mir test Expensive-checks was failing on it.	2023-09-07 14:04:12 +02:00
Pierre van Houtryve	30955c9d22	[AMDGPU] Fix V_MOV_B32_indirect inst size (#65584 ) This inst lowers to a normal v_mov_b32 so it's not zero-sized, but has a size of 4. Solves SWDEV-416337	2023-09-07 13:12:58 +02:00
Tuan Chuong Goh	b7a305deca	[AArch64][GlobalISel] Optimise Combine Funnel Shift Combine any funnel shift with a shift amount of 0 to a copy. Modulo is applied to shift amount if it is larger than the instruction's bitwidth. Differential Revision: https://reviews.llvm.org/D157591	2023-09-07 11:58:12 +01:00
Vladislav Dzhidzhoev	1d0d2dfce7	[GlobalISel] Fold G_SHUFFLE_VECTOR with a single element mask to G_EXTRACT_VECTOR_ELT It introduces minor regression in arm64-vcvt_f.ll, which will be fixed later.	2023-09-07 12:03:04 +02:00
Mohamed Atef	741c127817	[SelectionDAG] Add computeOverflowForSignedMul / computeOverflowForUnsignedMul overflow handlers Support signed multiplication Support unsigned multiplication Differential Revision: https://reviews.llvm.org/D159406	2023-09-07 10:03:18 +01:00
Jianjian Guan	fab2594968	[RISCV][NFC] Remove unused checkline (#65560 )	2023-09-07 16:24:31 +08:00
Thomas	db5d845c73	[NVPTX] Make i16x2 a native type and add supported vec instructions (#65432 ) On sm_90 some instructions now support i16x2 which allows hardware to execute more efficiently add, min and max instructions. In order to support that we need to make i16x2 a native type in the backend. This does the necessary changes to make i16x2 a native type and adds support for the instructions natively supporting i16x2. This caused a negative test in nvptx slp to start passing. Changed the test to a positive one as the IR is correctly vectorized.	2023-09-06 21:59:13 -07:00
Daniel Hoekwater	866ae69cfa	[AArch64] [BranchRelaxation] Optimize for hot code size in AArch64 branch relaxation On AArch64, it is safe to let the linker handle relaxation of unconditional branches; in most cases, the destination is within range, and the linker doesn't need to do anything. If the linker does insert fixup code, it clobbers the x16 inter-procedural register, so x16 must be available across the branch before linking. If x16 isn't available, but some other register is, we can relax the branch either by spilling x16 OR using the free register for a manually-inserted indirect branch. This patch builds on D145211. While that patch is for correctness, this one is for performance of the common case. As noted in https://reviews.llvm.org/D145211#4537173, we can trust the linker to relax cross-section unconditional branches across which x16 is available. Programs that use machine function splitting care most about the performance of hot code at the expense of the performance of cold code, so we prioritize minimizing hot code size. Here's a breakdown of the cases: Hot -> Cold [x16 is free across the branch] Do nothing; let the linker relax the branch. Cold -> Hot [x16 is free across the branch] Do nothing; let the linker relax the branch. Hot -> Cold [x16 used across the branch, but there is a free register] Spill x16; let the linker relax the branch. Spilling requires fewer instructions than manually inserting an indirect branch. Cold -> Hot [x16 used across the branch, but there is a free register] Manually insert an indirect branch. Spilling would require adding a restore block in the hot section. Hot -> Cold [No free regs] Spill x16; let the linker relax the branch. Cold -> Hot [No free regs] Spill x16 and put the restore block at the end of the hot function; let the linker relax the branch. Ex: [Hot section] func.hot: ... hot code... func.restore: ... restore x16 ... B func.hot [Cold section] func.cold: ... spill x16 ... B func.restore Putting the restore block at the end of the function instead of just before the destination increases the cost of executing the store, but it avoids putting cold code in the middle of hot code. Since the restore is very rarely taken, this is a worthwhile tradeoff. Differential Revision: https://reviews.llvm.org/D156767	2023-09-06 20:44:40 +00:00
Florian Mayer	42a1d16179	Revert "[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340 )" This reverts commit 11171d81aeafb0c2818f288900423e366a2787fc. Broke ASAN bot.	2023-09-06 13:16:55 -07:00
Vladislav Dzhidzhoev	c39edd7b53	[AArch64][GlobalISel] Regenerate prelegalizercombiner-shuffle-vector.mir	2023-09-06 18:38:13 +02:00
Craig Topper	bb810d8fa0	[RISCV] Disable machine verifier in gisel-commandline-option.ll. NFC Hopefully this fixes the expensive checks build.	2023-09-06 09:32:32 -07:00
Simon Pilgrim	e4d0e12099	[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2) (REAPPLIED) Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold. This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well. Alive2: https://alive2.llvm.org/ce/z/2UpSbJ Differential Revision: https://reviews.llvm.org/D159198	2023-09-06 13:19:42 +01:00
Jay Foad	11171d81ae	[AMDGPU] Cope with SelectionDAG::UpdateNodeOperands returning a different SDNode (#65340 ) SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD instruction, before we delete the old IMAGE_LOAD instruction. But in UpdateNodeOperands can do CSE on the fly and return a different EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still exist and would refer to the old deleted IMAGE_LOAD instruction. This caused errors like: t31: v3i32,ch = <<Deleted Node!>> # D:1 This target-independent node should have been selected! UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209! Fix it by detecting the CSE case and replacing all uses of the original EXTRACT_SUBREG node with the CSE'd one.	2023-09-06 12:51:44 +01:00
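
Note on e4d0e12099 ([DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)): the commit message says the fold relies on the ADD being nsw. The sketch below is not LLVM code; it is a small brute-force model in Python, assuming a 4-bit narrow type sign-extended to 8 bits, that compares both sides of the fold for every input. The helper names (`to_signed`, `sext`, `count_mismatches`) and the chosen bit widths are illustrative assumptions, not taken from the patch.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide value; result is a to_bits-wide bit pattern."""
    return to_signed(value, from_bits) & ((1 << to_bits) - 1)


def count_mismatches(narrow=4, wide=8, require_nsw=True):
    """Count inputs where the two sides of the fold disagree.

    before: shl(sext(add(x, c1)), c2)
    after:  add(shl(sext(x), c2), shl(sext(c1), c2))
    With require_nsw=True, inputs whose narrow signed add overflows are skipped,
    modelling the nsw precondition named in the commit message.
    """
    wide_mask = (1 << wide) - 1
    lo, hi = -(1 << (narrow - 1)), (1 << (narrow - 1)) - 1
    mismatches = 0
    for x in range(1 << narrow):
        for c1 in range(1 << narrow):
            exact = to_signed(x, narrow) + to_signed(c1, narrow)
            if require_nsw and not (lo <= exact <= hi):
                continue  # add would overflow, so nsw does not hold
            for c2 in range(wide):
                before = (sext(x + c1, narrow, wide) << c2) & wide_mask
                after = ((sext(x, narrow, wide) << c2)
                         + (sext(c1, narrow, wide) << c2)) & wide_mask
                mismatches += before != after
    return mismatches
```

Calling `count_mismatches(require_nsw=True)` checks the fold under the nsw precondition, while `count_mismatches(require_nsw=False)` also includes wrapping adds such as x = 7, c1 = 1 in 4 bits, where the two sides differ.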

52796 Commits