llvm-project

Author	SHA1	Message	Date
pvanhout	c3cfbbc416	[GlobalISel] Add dead flags to implicit defs in ISel Checks for implicit defs that are unused within a pattern and mark them as dead. This is done directly at the TableGen level for efficiency. The instructions are directly created with the "dead" operand and no further analysis is needed later. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157273	2023-08-09 14:20:51 +02:00
Simon Pilgrim	8ae0e1f58d	[X86] Create X86ISD::SHUF128 512-bit masks with getV4X86ShuffleImm8ForMask This allows us to use the same canonicalizations as PSHUFD/SHUFPS etc. to avoid unnecessary demanded elts (better splat detection, blend pass through etc.) instead of defaulting to zero mask values.	2023-08-09 11:13:03 +01:00
Alex Bradbury	89b8ebf3d6	[LegalizeTypes][RISCV] Correct FP_TO_{S,U}INT expansion when bf16 isn't a legal type As noted in D156990, the logic in ExpandIntRes_FP_TO_SINT assumes that if the type action for the float type is TypeSoftPromoteHalf, it must have been an f16 (half). However, the meaning of that type action has been overloaded and it is used for both f16 and bf16. This patch adds an appropriate check to ensure ISD::FP16_TO_FP or ISD::BF16_TO_FP is emitted as required. Differential Revision: https://reviews.llvm.org/D157287	2023-08-09 11:01:28 +01:00
Igor Kirillov	60e2a849b0	[CodeGen] Disable FP LD1RX instructions generation for Neoverse-V1 These instructions show worse performance on Neoverse-V1 compared to a pair of LDR(LDP)/MOV instructions. This patch adds `no-sve-fp-ld1r` sub-target feature, which is enabled only on Neoverse-V1. Fixes https://github.com/llvm/llvm-project/issues/64498 Differential Revision: https://reviews.llvm.org/D157279	2023-08-09 09:33:45 +00:00
Simon Wallis	33b9634394	[ARM] v6-M XO: save CPSR around LoadStackGuard For Thumb-1 Execute-Only, expandLoadStackGuardBase generates a tMOVimm32 pseudo when calculating the stack offset. It does this in a context where the CPSR may be live. tMOVimm32 may corrupt CPSR. To fix this, generate save/restore CPSR around the tMOVimm32 using MRS/MSR to/from a scratch register. expandLoadStackGuardBase runs after register allocation, so the scratch register needs to be a physical register. Use R12 as a scratch register, as is usual when expanding a pseudo. MSR/MRS are some of the few v6-M instructions which operate on a high register. New stack-guard test case added which was generating incorrect code without the save/restore CPSR. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D156968	2023-08-09 08:40:35 +01:00
Konstantina Mitropoulou	2c5d1b5ab7	[DAGCombiner] Reassociate the operands from (OR (OR(CMP1, CMP2)), CMP3) to (OR (OR(CMP1, CMP3)), CMP2) This happens when CMP1 and CMP3 have the same predicate (or CMP2 and CMP3 have the same predicate). This helps optimizations such as the following one: CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C) CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156215	2023-08-08 20:08:01 -07:00
Konstantina Mitropoulou	51202b8d2e	[NFC][DAGCombiner] Tests for future commit. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155915	2023-08-08 20:05:23 -07:00
Weining Lu	f62c9252fc	[LoongArch] Support -march=native and -mtune= As described in [1][2], `-mtune=` is used to select the type of target microarchitecture, defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 has supported `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. Two preprocessor macros are defined based on user-provided `-march=` and `-mtune=` options and the defaults. - __loongarch_arch - __loongarch_tune Note that, to work with `-fno-integrated-cc1` we leverage cc1 options `-target-cpu` and `-tune-cpu` to pass driver options `-march=` and `-mtune=` respectively because cc1 needs this information to define macros in `LoongArchTargetInfo::getTargetDefines`. [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Reviewed By: xen0n, wangleiat, steven_wu, MaskRay Differential Revision: https://reviews.llvm.org/D155824	2023-08-09 10:29:50 +08:00
David Green	c782e3497d	[AArch64] Add VSHL knownBits handling. These can be handled in the same way as other shifts.	2023-08-08 21:59:53 +01:00
David Green	2bb727297d	[AArch64] Regenerate s/urem-seteq-* tests. NFC	2023-08-08 21:34:34 +01:00
Matt Arsenault	87b6f85c2b	AMDGPU: Add syncscopes to some atomic tests These were not testing what was intended, which should be the cases we can directly select to the instructions.	2023-08-08 14:38:06 -04:00
Matt Arsenault	3371849194	AMDGPU: Round out system atomics tests There were system scope tests only for integer min/max. Expand this to cover all of the integer operations.	2023-08-08 14:38:05 -04:00
Matt Arsenault	7db933a716	AMDGPU: Fix broken test checks There were incomplete generated checks plus some dead manual checks.	2023-08-08 14:38:05 -04:00
Simon Pilgrim	7593f9b59a	[X86] combineConcatVectorOps - add handling for X86ISD::SHUF128 nodes. Prevents regression on some future work to improve codegen for concat_vectors(extract_subvector(),extract_subvector()) patterns. X86ISD::SHUF128 optimization is still pretty poor (especially the zmm variant), not optimizing the shuffle demanded elts like we do for SHUFPS.	2023-08-08 18:13:43 +01:00
Igor Kirillov	84d444f909	[CodeGen] Fix incorrect pattern FMLA_* pseudo instructions * Remove the incorrect patterns from AArch64fmla_p/AArch64fmls_p * Add correct patterns to AArch64fmla_m1/AArch64fmls_m1 * Refactor fma_patfrags for the sake of PatFrags Fixes https://github.com/llvm/llvm-project/issues/64419 Differential Revision: https://reviews.llvm.org/D157095	2023-08-08 16:34:31 +00:00
pvanhout	96e1032a5e	[AMDGPU] Add extended-image-insts to RemoveIncompatibleFunctions Otherwise device libs still have issues at O0 (in OpenCL-CTS) Depends on D156972 as well. They're unrelated fixes but both are needed to fix the issue. Fixes SWDEV-402331 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D156973	2023-08-08 15:15:57 +02:00
pvanhout	98ccc70b93	[DAG] Fix crash in replaceStoreOfInsertLoad Idx's type can be different from Ptr's, causing a "Binary operator types must match" assertion failure when emitting the MUL. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156972	2023-08-08 15:15:34 +02:00
Alex Bradbury	f7dbc8501f	[LegalizeTypes][RISCV] Support libcalls for fpto{s,u}i of bfloat by extending to f32 first As there is no direct bf16 libcall for these conversions, extend to f32 first. This patch includes a tiny refactoring to pull out equivalent logic in ExpandIntRes_XROUND_XRINT so it can be reused in ExpandIntRes_FP_TO_{S,U}INT. This patch also demonstrates incorrect codegen for RV32 without zfbfmin for the newly enabled tests. As it doesn't introduce that incorrect codegen (caused by the assumption that 'TypeSoftPromoteHalf' is only used for f16 types), a fix will be added in a follow-up (D157287). Differential Revision: https://reviews.llvm.org/D156990	2023-08-08 13:56:32 +01:00
Jolanta Jensen	932972305b	[NFC][AArch64] Added checks for global entries in ReplaceWithVeclib testing This patch added checks for global entries in ReplaceWithVeclib testing using ArmPL and SLEEF vector libraries. Differential Revision: https://reviews.llvm.org/D157258	2023-08-08 12:28:58 +00:00
Matt Devereau	e8efe7f9d1	[AArch64][SME2][SVE2p1] Choose strided or contiguous loads Lower to the strided/contiguous addressing mode of ld1/ldnt1 instructions depending on register allocation. Differential Revision: https://reviews.llvm.org/D156311	2023-08-08 11:50:33 +00:00
Igor Kirillov	7542477d5d	[CodeGen] Precommit tests for D157095	2023-08-08 11:38:15 +00:00
Igor Kirillov	b560d5c7e3	[CodeGen] Pre-commit tests showing incorrect pattern FMLA_* pseudo instructions Differential Revision: https://reviews.llvm.org/D157094	2023-08-08 10:52:55 +00:00
David Green	de775f264d	[DAG] Add constant SPLAT handling in getNodes SIGN_EXTEND_INREG This helps simplify constant splats a little. Without this the code in llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14072 always returns the existing node. Differential Revision: https://reviews.llvm.org/D157259	2023-08-08 10:27:55 +01:00
Simon Pilgrim	943fda567a	[X86] matchTruncateWithPACK - canonically prefer v4i64 -> v4i32 shuffle vs truncation Pulled out of LowerTruncateVecPackWithSignBits - prefer shuffles unless we can cheaply split the vector. ComputeNumSignBits struggles with vXi64 through bitcasts, so we're usually better off with shuffles.	2023-08-08 10:05:24 +01:00
Luke Lau	5d510ea724	[RISCV] Lower vro{l,r} for fixed vectors We need to add new VL nodes to mirror ISD::ROTL and ISD::ROTR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157295	2023-08-08 09:47:00 +01:00
Luke Lau	768740ef77	[RISCV] Lower unary zvbb ops for fixed vectors This reuses the same strategy for fixed vectors as other ops, i.e. custom lower to a scalable *_vl SD node. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157294	2023-08-08 09:46:57 +01:00
Luke Lau	44383ac7fd	[RISCV] Add fixed vector tests for ct[l,t]z_zero_undef Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157293	2023-08-08 09:46:55 +01:00
David Green	0d0599249a	[AArch64] Regenerate fpround mir tests. NFC	2023-08-08 09:24:05 +01:00
Ben Shi	57c6fe273f	[CSKY] Optimize multiplication with immediates Optimize "Rx * imm" for specific immediates to ([IXH32|IXW32|IXD32] (LSLI Rx, shift), Rx). Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154768	2023-08-08 14:13:49 +08:00
Ben Shi	731bab50be	[CSKY][test][NFC] Add tests of multiplication with immediates These tests will be optimized with IXH32/IXW32/IXD32 in the future. Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154332	2023-08-08 14:13:49 +08:00
Ben Shi	30b52a3574	[CSKY] Optimize conditional branch and value select with BTSTI Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154768	2023-08-08 14:13:48 +08:00
Philip Reames	f0a9aacdb9	[RISCV] Use vmv.s.x for a constant build_vector when the entire size is less than 32 bits We have a variant of this for splats already, but hadn't handled the case where a single copy of the wider element can be inserted producing the entire required bit pattern. This shows up mostly in very small vector shuffle tests. Differential Revision: https://reviews.llvm.org/D157299	2023-08-07 17:15:05 -07:00
Nitin John Raj	c9fe119869	[RISCV][GlobalISel] Legalize G_ICMP and G_SELECT Test legalization for (i7, i8, i16, i32, i48, i64) on rv32 and for (i8, i15, i16, i32, i64, i72, i128) on rv64. Legalization fails for i96 on rv32 and i192 on rv64. Note that [i192 fails for AArch64](https://github.com/llvm/llvm-project/issues/64394). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157023	2023-08-07 16:44:29 -07:00
Nitin John Raj	cd61e8de06	[RISCV][GlobalISel] Legalize add/sub for wide and non-pow2 types Legalize G_ADD, G_SUB, G_(S/U)ADD(O/E). We test for (s7, s48, s64, s96) on rv32 and (s15, s72, s128, s192) on rv64. Differential Revision: https://reviews.llvm.org/D157019	2023-08-07 16:43:53 -07:00
Nitin John Raj	3bcfd6e962	[RISCV][GlobalISel] Legalize logical instructions for non-pow2 types Legalize G_AND, G_OR, G_XOR for (s7, s48) on rv32 and (s15, s72) on rv64. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157017	2023-08-07 16:23:47 -07:00
Matt Arsenault	4b1702e87a	AMDGPU: Fix counting source modifiers as literal constants This fixes overestimating code size. This was broken by 79f52af4cd9a76485dd50bcdbb5d393eb7a70103. https://reviews.llvm.org/D157103	2023-08-07 18:40:16 -04:00
Nitin John Raj	649e1d1b9d	[RISCV][GlobalISel] Legalize bitshift instructions for narrow types Legalize G_SHL, G_ASHR and G_LSHR for types narrower than and up to (and including) XLen: (i7, i8, i16 and i32) for rv32 and (i8, i15, i16, i32 and i64) for rv64. This requires adding some rules to handle G_ANYEXT, G_ZEXT and G_SEXT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155772	2023-08-07 15:11:34 -07:00
Craig Topper	07c8bcc21d	[AArch64] Narrow G_SEXT_INREG to s64 before lowering. This avoids narrowing after it has been expanded to shifts. The G_SEXT_INREG narrowing can use the second operand of the instruction to optimize the narrowing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157172	2023-08-07 14:34:21 -07:00
Craig Topper	7cc615413f	[RISCV] Add back handling of X > -1 to ISD::SETCC lowering. There are cases where the -1 doesn't become visible until lowering so the folding doesn't have a chance to run. I think in these cases there is a missed DAGCombine for truncate (undef), which I may fix separately, but RISC-V backend should protect itself. Fixes #64503. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157314	2023-08-07 13:00:57 -07:00
Nitin John Raj	b8fef7a6d4	[RISCV][GlobalISel] Legalize constants, undefined values, extension instructions, and (un)merge instructions for narrow types Test legalization for (s7, s8, s16, s32, s48, s64, s96) for rv32, (s8, s15, s16, s32, s64, s72, s128, s192) for rv64. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156383	2023-08-07 11:17:08 -07:00
Nitin John Raj	1b74459df8	[RISCV][GlobalISel] Fix tests for addition, subtraction and logical instructions Fix a bug introduced in a previous commit. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156380	2023-08-07 10:25:03 -07:00
John Brawn	f83ab2b3be	[ARM] Improve generation of thumb stack accesses Currently when a stack access is out of range of an sp-relative ldr or str then we jump straight to generating the offset with a literal pool load or mov32 pseudo-instruction. This patch improves that in two ways: * If the offset is within range of sp-relative add plus an ldr then use that. * When we use the mov32 pseudo-instruction, if putting part of the offset into the ldr will simplify the expansion of the mov32 then do so. Differential Revision: https://reviews.llvm.org/D156875	2023-08-07 17:53:32 +01:00
Philip Reames	47fe3b3b9a	[RISCV] Use v(f)slide1down for build_vector with dominant values If we have a dominant value, we can still use a v(f)slide1down to handle the last value in the vector if that value is neither undef nor the dominant value. Note that we can extend this idea to any tail of elements, but that ends up being a near complete merge of the v(f)slide1down insert path, and requires a bit more untangling on profitability heuristics first. Differential Revision: https://reviews.llvm.org/D157120	2023-08-07 07:54:29 -07:00
Philip Reames	999ac10d76	[RISCVGatherScatterLowering] Support broadcast base pointer A broadcast base pointer is the same as a scalar base pointer for GEP semantics (when there's at least one other vector operand). This is the form that SLP likes to emit, so we should handle it. Differential Revision: https://reviews.llvm.org/D157132	2023-08-07 07:42:04 -07:00
Jay Foad	56d92c1758	[MachineScheduler] Track physical register dependencies per-regunit Change the scheduler's physical register dependency tracking from registers-and-their-aliases to regunits. This has a couple of advantages when subregisters are used: - The dependency tracking is more accurate and creates fewer useless edges in the dependency graph. An AMDGPU example, edited for clarity: SU(0): $vgpr1 = V_MOV_B32 $sgpr0 SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1 SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0 There is a data dependency on $vgpr1 from SU(0) to SU(1) and from SU(1) to SU(2). But the old dependency tracking code also added a useless edge from SU(0) to SU(2) because it thought that SU(0)'s def of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1. - On targets like AMDGPU that make heavy use of subregisters, each register can have a huge number of aliases - it can be quadratic in the size of the largest defined register tuple. There is a much lower bound on the number of regunits per register, so iterating over regunits is faster than iterating over aliases. The LLVM compile-time tracker shows a tiny overall improvement of 0.03% on X86. I expect a larger compile-time improvement on targets like AMDGPU. Recommit after fixing AggressiveAntiDepBreaker in D156880. Differential Revision: https://reviews.llvm.org/D156552	2023-08-07 15:41:40 +01:00
Jay Foad	68a0a37371	[AggressiveAntiDepBreaker] Tweak the fix for renaming a subregister of a live register This patch tweaks the fix in D20627 "Do not rename registers that do not start an independent live range" to only consider Data dependencies, not Output or Anti dependencies. An Output or Anti dependency to a superreg does not imply that that superreg is live at the current instruction. This enables breaking anti-dependencies in a few more cases as shown by the lit test updates. Differential Revision: https://reviews.llvm.org/D156879	2023-08-07 15:41:40 +01:00
Alex Bradbury	380fd8201d	[RISCV][test] Add non-zfbfmin RUN lines to bfloat-convert.ll As requested in review for https://reviews.llvm.org/D156990 This additionally consistently uses the ilp32d/lp64d ABIs when the D extension is enabled.	2023-08-07 14:39:12 +01:00
Simon Pilgrim	0d1f8532bc	[X86] truncateVectorWithPACK - ensure we don't truncate to <1 x iXX> vector types Fuzz testing noticed that the sub-128-bit vector splitting added in ef4330f4f3cc didn't correctly halt at <2 x iXX> truncations.	2023-08-07 14:11:42 +01:00
Simon Pilgrim	711dff4577	[X86] Add matchTruncateWithPACK helper for matching signbits/knownbits for PACKSS/PACKUS Begin to consolidate the similar matching code we have - all have semi-similar constraints that still need merging together to ensure we get consistent codegen depending on when the truncate is lowered.	2023-08-07 14:11:42 +01:00
Jim Lin	f2bdc29f3e	[RISCV] Add a blank line after the end of RUN lines. NFC. Most test cases have a blank line after the end of the RUN lines for readability.	2023-08-07 18:38:09 +08:00


52796 Commits