llvm-project

Author	SHA1	Message	Date
Matt Arsenault	5dfdd3494b	AMDGPU: Don't try to fold wavefrontsize intrinsic in libcall simplify It's not a libcall so doesn't really belong here to begin with. Relying on checking the target name and explicit features isn't particularly sound either. The library doesn't use the intrinsic anymore, so it doesn't matter anyway.	2023-08-01 18:20:50 -04:00
Matt Arsenault	eb00555c16	AMDGPU: Add more tests for sincos recognition These show both broken cases and cases which are handled too conservatively.	2023-08-01 18:20:50 -04:00
Philip Reames	1e86abc914	[RISCVRVVInitUndef] Ignore tied use for partial undef register The purpose of this code is to restrict overlap between source and destination registers. The tied input register is conceptually part of the destination. I can't see any reason why we need to prevent a partial undef tied source here, and skipping it reduces register pressure slightly. Differential Revision: https://reviews.llvm.org/D156709	2023-08-01 12:16:26 -07:00
Philip Reames	e938217f81	[RISCV] Implement getOptimalMemOpType for memcpy/memset lowering This patch implements the getOptimalMemOpType callback which is used by the generic mem* lowering in SelectionDAG to pick the widest type used. This patch only changes the behavior when vector instructions are available, as the default is reasonable for scalar. Without this change, we were emitting either XLEN sized stores (for aligned operations) or byte sized stores (for unaligned operations.) Interestingly, the final codegen was nowhere near as bad as that would seem to imply. Generic load combining and store merging kicked in, and frequently (but not always) produced pretty reasonable vector code. The primary effects of this change are: * Enable the use of vector operations for memset of non-constant. Our generic store merging logic doesn't know how to merge a broadcast store, and thus we were seeing the generic (and awful) byte expansion lowering for unaligned memset. * Enable the generic misaligned overlap trick where we write to some of the same bytes twice. The alternative is to either a) use an increasingly small sequence of stores for the tail or b) use VL to restrict the vector store. The latter is not implemented at this time, so the former is what previously happened. Interestingly, I'm not sure that changing VL (as opposed to the overlap trick) is even obviously profitable here. Differential Revision: https://reviews.llvm.org/D156249	2023-08-01 12:14:50 -07:00
Craig Topper	5a519961c8	[RISCV] Call combineSelectToBinOp before generic select expansion for Zicond. This handles logical ops of setccs and optimizes when the true or false value is -1. Reviewed By: asb, wangpc Differential Revision: https://reviews.llvm.org/D156810	2023-08-01 12:09:35 -07:00
Philip Reames	e93a8137d3	[RISCVRVVInitUndef] Remove implicit single use assumption for IMPLICIT_DEF (try 2) Reapplying after revert due to sanitizer failure. Includes fix to avoid querying dead lanes for vreg introduced by previous transform. The code was written with the implicit assumption that each IMPLICIT_DEF is either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility. I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now. As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.) Differential Revision: https://reviews.llvm.org/D156477	2023-08-01 10:50:03 -07:00
Alex Bradbury	bc2ea021ec	[RISCV][test] Add 'atomicrmw xchg a, -1' tests in preparation for D156801 As noted by Craig, we can improve codegen for the -1 case as well.	2023-08-01 18:39:00 +01:00
Craig Topper	048458f94c	[RISCV] Add no NaN support to lowerFMAXIMUM_FMINIMUM. Using the nonans FMF and the DAG.isKnownNeverNaN on the inputs. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D156748	2023-08-01 09:51:24 -07:00
Mikhail Gudim	0fb3ebb2fc	[RISCV] Generalize `tryFoldSelectIntOp` to other operations. Currently, only `SUB`, `ADD`, `OR` and `XOR` are covered. This patch adds `AND`, `SHL`, `SRA`, `SRL`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155344	2023-08-01 11:27:10 -04:00
Alex Bradbury	fdac86cce8	[RISCV][test] Add atomicrmw test cases for suboptimal codegen report in #64090 <https://github.com/llvm/llvm-project/issues/64090> A forthcoming patch addresses these cases.	2023-08-01 15:30:09 +01:00
Francesco Petrogalli	cd921e0fd7	[MISched] Do not erase resource booking history for subunits. When dealing with the subunits of a resource group, we should reset the subunits' availability at the first available cycle of the resource that contains the subunits. Previously, the reset operation was returning cycle 0, effectively erasing the booking history of the subunits. Without this change, when using intervals for models that make use of subunits, the erasing of resource booking for subunits can raise the assertion "A resource is being overwritten" in `ResourceSegments::add`. The test added in the patch is one such case. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D156530	2023-08-01 14:00:37 +02:00
Paulo Matos	8f3b87fc14	[SPIRV] Add support for SPV_INTEL_optnone Adds support for SPV_INTEL_optnone. Currently still in draft form but I wanted to open this revision to ask some questions. Differential Revision: https://reviews.llvm.org/D156297	2023-08-01 12:53:54 +02:00
Ben Shi	75c3c6ac15	[CSKY] Optimize 'llvm.cttz.i32' and 'llvm.ctlz.i32' Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D156780	2023-08-01 18:15:20 +08:00
Ben Shi	f94e9bdc57	[CSKY][NFC][test] Add more tests of CodeGen for intrinsics Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D156543	2023-08-01 17:12:32 +08:00
Ben Shi	80cd505914	[CSKY] Optimize implementation of intrinsic 'llvm.cttz.i32' Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154588	2023-08-01 17:12:32 +08:00
Yeting Kuo	4c8cf92067	[RISCV] Use the first element of source as the start value of reduction. Previously, when llvm.reduce.* was lowered, the RISC-V backend created a scalar vector with the neutral element as the start value. For llvm.reduce.and/or/min/max/fmax/fmin, we can use the first element of the source as the start value. This benefits RVV since we can just use the source vector as the start vector. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155929	2023-08-01 13:15:40 +08:00
Jun Sha (Joshua)	934b490530	[RISCV] Expand load extension / truncate store for bf16 Currently, bf16 operations are automatically supported by promoting to float. This patch adds bf16 support by ensuring that load extension / truncate store operations are properly expanded. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156646	2023-08-01 11:10:41 +08:00
Tamir Duberstein	59afd29899	[BPF] Match CHECK w/ LLVM_ENABLE_ASSERTIONS=OFF (D156136)	2023-08-01 11:12:43 +09:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Steven Wu	42c9354a92	Revert "Reland "[LoongArch] Support -march=native and -mtune="" This reverts commit c56514f21b2cf08eaa7ac3a57ba4ce403a9c8956. This commit adds global state that is shared between clang driver and clang cc1, which is not correct when clang is used with `-fno-integrated-cc1` option (no integrated cc1). The -march and -mtune option needs to be properly passed through cc1 command-line and stored in TargetInfo.	2023-07-31 16:57:06 -07:00
Matt Arsenault	5b5bd81b71	AMDGPU: Move placement of RemoveIncompatibleFunctions This should be approximately first and run with other module passes. https://reviews.llvm.org/D155987	2023-07-31 19:22:04 -04:00
Matt Arsenault	db4d6ef9ef	AMDGPU: Directly emit fabs intrinsic instead of new libcall	2023-07-31 19:19:56 -04:00
Matt Arsenault	02a0b11331	AMDGPU: Remove weird usage of implicit operand on COPY For the purpose of the test it works as well to have a use after the copy itself.	2023-07-31 19:16:11 -04:00
Matt Arsenault	0aa439d502	AMDGPU/GlobalISel: Use SGPR results for G_AMDGPU_WAVE_ADDRESS	2023-07-31 19:16:11 -04:00
Tamir Duberstein	d542a56c1c	[BPF] Clean up SelLowering This patch contains a number of uncontroversial changes: - Replace all uses of `errs`, `assert`, `llvm_unreachable` with `report_fatal_error` with informative error strings. - Replace calls to `fail` in loops with at most one call per error instance. Previously a function with 19 arguments would log "too many args" 14 times. This was not helpful. - Change one `if (..) switch ...` to `if (..) { switch ...`. The added brace is consistent with a near-identical switch immediately above. - Elide one `SDValue` copy by using a reference rather than value. This is consistent with a variable declared immediately before it. Reviewed By: yonghong-song Differential Revision: https://reviews.llvm.org/D156136	2023-08-01 00:31:12 +03:00
David Green	778fa4edaf	[AArch64] Add some basic handling for bf16 constants. This adds some basic handling for bf16 constants, attempting to treat them a lot like fp16 constants where it can. Zero immediates get lowered to FMOVH0, others either get lowered to FMOVWHr(MOVi32imm) or use FMOVHi if they can. Without fp16 they get expanded. This may not always be optimal, but fixes a gap in our lowering. See llvm/test/CodeGen/AArch64/f16-imm.ll for the equivalent fp16 test. Differential Revision: https://reviews.llvm.org/D156649	2023-07-31 21:31:56 +01:00
Simon Pilgrim	071671e15c	[X86] Allow pre-SSE41 targets to extract multiple v16i8 elements coming from the same DWORD/WORD super-element Pre-SSE41 targets tended to have weak (serial) GPR<->VEC moves, meaning we only allowed a single v16i8 extraction before spilling the vector to stack and loading the i8 elements instead. But this didn't make use of the fact that the DWORD/WORD extraction we had to use could extract multiple i8 elements at the same time. This patch attempts to determine if all uses of a vector are element extractions, and works out whether all the extractions share the same WORD or (lowest) DWORD, in which case we can perform a single extraction and just shift/truncate the individual elements. Differential Revision: https://reviews.llvm.org/D156350	2023-07-31 17:08:34 +01:00
Matt Arsenault	8a677a7ff0	AMDGPU: Partially respect nobuiltin in libcall simplifier There are more contexts where it's not handled correctly but this is the simplest one. https://reviews.llvm.org/D156682	2023-07-31 10:56:46 -04:00
Simon Pilgrim	076bee1020	[DAG] getNode() - fold (zext (trunc (assertzext x))) -> (assertzext x) If the pre-truncated value was the same width as the extension, and the assertzext guarantees that the extended bits are already zero, then skip the zext/trunc 'zero_extend_inreg' pattern. Addresses several regressions noticed in D155472	2023-07-31 10:43:11 +01:00
Simon Tatham	60b98363c7	Retain all jump table range checks when using BTI. This modifies the switch-statement generation in SelectionDAGBuilder, specifically the part that generates case clusters of type CC_JumpTable. A table-based branch of any kind is at risk of being a JOP gadget, if it doesn't range-check the offset into the table. For some types of table branch, such as Arm TBB/TBH, the impact of this is limited because the value loaded from the table is a relative offset of limited size; for others, such as a MOV PC,Rn computed branch into a table of further branch instructions, the gadget is fully general. When compiling for branch-target enforcement via Arm's BTI system, many of these table branch idioms use branch instructions of types that do not require a BTI instruction at the branch destination. This avoids the need to put a BTI at the start of each case handler, reducing the number of available gadgets //with// BTIs (i.e. ones which could be used by a JOP attack in spite of the BTI system). But without a range check, the use of a non-BTI-requiring branch also opens up a larger range of followup gadgets for an attacker's use. A defence against this is to avoid optimising away the range check on the table offset, even if the compiler believes that no out-of-range value should be able to reach the table branch. (Rationale: that may be true for values generated legitimately by the program, but not those generated maliciously by attackers who have already corrupted the control flow.) The effect of keeping the range check and branching to an unreachable block is that no actual code is generated at that block, so it will typically point at the end of the function. That may still cause some kind of unpredictable code execution (such as executing data as code, or falling through to the next function in the code section), but even if so, there will only be //one// possible invalid branch target, rather than giving an attacker the choice of many possibilities. 
This defence is enabled only when branch target enforcement is in use. Without branch target enforcement, the range check is easily bypassed anyway, by branching in to a location just after it. But with enforcement, the attacker will have to enter the jump table dispatcher at the initial BTI and then go through the range check. (Or, if they don't, it's because they //already// have a general BTI-bypassing gadget.) Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D155485	2023-07-31 10:39:50 +01:00
Francesco Petrogalli	c4b21d57bc	[llc] Add the command line option `-sched-model-force-enable-intervals`. The option is used to force the use of resource intervals in the machine scheduler, effectively ignoring the value of `EnableIntervals` in the instance of the `SchedMachineModel`. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D156540	2023-07-31 10:10:18 +02:00
Nikita Popov	063b37e7b4	Reapply [IR] Mark and/or constant expressions as undesirable Reapply after D156401, which stops PatternMatch from recognizing binop constant expressions, which should avoid the infinite loops and assertion failures this patch previously exposed. ----- In preparation for removing support for and/or expressions, mark them as undesirable. As such, we will no longer implicitly create such expressions, but they still exist.	2023-07-31 09:54:24 +02:00
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
David Green	e8e49a3567	[AArch64][GlobalISel] G_FMINNUM and G_FMAXNUM vector lowering This attempts to expand the handling for G_FMAXNUM/G_FMINNUM for vector types, which is hopefully fairly straightforward now that fptrunc and fpext are working. Differential Revision: https://reviews.llvm.org/D156171	2023-07-31 07:35:28 +01:00
Craig Topper	eff53ce8fc	[RISCV] Remove unused CHECK prefix from test. NFC	2023-07-30 22:15:54 -07:00
Ben Shi	14e0a67a2d	[CSKY] Add more IR patterns to select FNMUL Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D155169	2023-07-31 12:12:56 +08:00
Jianjian GUAN	b7408ebbb7	[RISCV] Use x0 in vsetvli when avl is equal to vlmax. We could use x0 form in vsetvli when we already know the vlmax and avl is equal to it. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156404	2023-07-31 09:49:40 +08:00
David Green	cf39dea58d	[AArch64] Add a fminnum/fmaxnum test. NFC See D156171.	2023-07-30 17:27:05 +01:00
David Green	76f0d186d6	[AArch64] Regenerate arm64-vabs.ll, arm64-subvector-extend.ll and some mir tests. NFC	2023-07-30 16:51:01 +01:00
Jay Foad	58642565ec	[Hexagon] Add machine verification to some tests This is to help catch problems in D156552 that only showed up in an expensive checks build.	2023-07-30 12:07:34 +01:00
Jay Foad	e2e3f06813	Revert "[MachineScheduler] Track physical register dependencies per-regunit" This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc. It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS build.	2023-07-29 18:05:25 +01:00
Jay Foad	1a54671d54	[MachineScheduler] Track physical register dependencies per-regunit Change the scheduler's physical register dependency tracking from registers-and-their-aliases to regunits. This has a couple of advantages when subregisters are used: - The dependency tracking is more accurate and creates fewer useless edges in the dependency graph. An AMDGPU example, edited for clarity: SU(0): $vgpr1 = V_MOV_B32 $sgpr0 SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1 SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0 There is a data dependency on $vgpr1 from SU(0) to SU(1) and from SU(1) to SU(2). But the old dependency tracking code also added a useless edge from SU(0) to SU(2) because it thought that SU(0)'s def of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1. - On targets like AMDGPU that make heavy use of subregisters, each register can have a huge number of aliases - it can be quadratic in the size of the largest defined register tuple. There is a much lower bound on the number of regunits per register, so iterating over regunits is faster than iterating over aliases. The LLVM compile-time tracker shows a tiny overall improvement of 0.03% on X86. I expect a larger compile-time improvement on targets like AMDGPU. Differential Revision: https://reviews.llvm.org/D156552	2023-07-29 15:34:53 +01:00
Jay Foad	5a64c89c8d	[MachineScheduler] Test case for physical register dependencies Differential Revision: https://reviews.llvm.org/D156551	2023-07-29 15:34:53 +01:00
Anatoly Trosinenko	4210204f52	[AArch64] Refactor checks in sign-return-address.ll test Using implicit CHECK prefix in one FileCheck invocation and explicit CHECK-V83A in the other one seems to misguide to use CHECK: lines as a common matching prefix at various places. Also note that ; CHECK, CHECK-V83A: ... line only matches the "CHECK-V83A" prefix. This commit explicitly splits the checks into common ones (CHECK) and invocation-specific ones (COMPAT and V83A) and updates the assertions with the update_llc_test_checks.py script. Reviewed By: efriedma, MaskRay Differential Revision: https://reviews.llvm.org/D156327	2023-07-29 12:58:56 +03:00
Wael Yehia	9d4e8c09f4	[XCOFF] Do not put MergeableCStrings in their own section The current implementation generates a csect with a ".rodata.str.x.y" prefix for a MergeableCString variable definition. However, a reference to such variable does not get the prefix in its name because there's not enough information in the containing IR. In particular, without seeing the initializer and absent of some other indicators, we cannot tell that the referenced variable is a null- terminated string. When the AIX codegen in llvm was being developed, the prefixing was copied from ELF without having the linker take advantage of the info. Currently, the AIX linker does not have the capability to merge MergeableCString variables. If such feature would ever get implemented, the contract between the linker and compiler would have to be reconsidered. Here's the before and after of this change: ``` @a = global i64 320255973571806, align 8 @strA = unnamed_addr constant [7 x i8] c"hello\0A\00", align 1 ;; Mergeable1ByteCString @strB = unnamed_addr constant [8 x i8] c"Blahah\0A\00", align 1 ;; Mergeable1ByteCString @strC = unnamed_addr constant [2 x i16] [i16 1, i16 0], align 2 ;; Mergeable2ByteCString @strD = unnamed_addr constant [2 x i16] [i16 1, i16 1], align 2 ;; !isMergeableCString @strE = external unnamed_addr constant [2 x i16], align 2 -fdata-sections: .text extern .rodata.str1.1strA .text extern strA 0 SD RO 0 SD RO .text extern .rodata.str1.1strB .text extern strB 0 SD RO 0 SD RO .text extern .rodata.str2.2strC ===> .text extern strC 0 SD RO 0 SD RO .text extern strD .text extern strD 0 SD RO 0 SD RO .data extern a .data extern a 0 SD RW 0 SD RW undef extern strE undef extern strE 0 ER UA 0 ER UA -fno-data-sections: .text unamex .rodata.str1.1 .text unamex .rodata 0 SD RO 0 SD RO .text extern strA .text extern strA 0 LD RO 0 LD RO .text extern strB .text extern strB 0 LD RO 0 LD RO .text unamex .rodata.str2.2 ===> .text extern strC 0 SD RO 0 LD RO .text extern strC .text extern 
strD 0 LD RO 0 LD RO .text unamex .rodata .data unamex .data 0 SD RO 0 SD RW .text extern strD .data extern a 0 LD RO 0 LD RW .data unamex .data undef extern strE 0 SD RW 0 ER UA .data extern a 0 LD RW undef extern strE 0 ER UA ``` Reviewed by: David Tenty, Fangrui Song Differential Revision: https://reviews.llvm.org/D156202	2023-07-29 03:24:21 +00:00
Matt Arsenault	3240ae7034	AMDGPU/GlobalISel: Set dead on scc on manually selected instructions In SelectionDAG InstrEmitter automatically puts dead flags on unused physreg defs everywhere. The generated selectors should also set dead on physreg defs that were not used in the pattern.	2023-07-28 14:14:06 -04:00
Matt Arsenault	c26dfc81e2	[HACK] X86: Disable isCopyInstrImpl for undef subregister defs This is a workaround for a coalescer bug where coalescing SUBREG_TO_REG ends up losing the liveness of the high bits of the source register. The result is an incorrect undef subregister def instead of preserving the high values. Work around the observed failure after the resulting mov is eliminated during allocation until a proper fix is ready. I believe the proper fix is to make SUBREG_TO_REG use a tied operand. The test should catch a regression originally observed after b7836d856206ec39509d42529f958c920368166b and should not show a difference after a496c8be6e638ae58bb45f13113dbe3a4b7b23fd is reverted. https://reviews.llvm.org/D156164	2023-07-28 13:33:28 -04:00
Arthur Eubanks	f800c1f3b2	[PEI] Don't zero out noreg operands A tail call may have $noreg operands. Fixes a crash. Reviewed By: xgupta Differential Revision: https://reviews.llvm.org/D156485	2023-07-28 10:23:17 -07:00
Jeffrey Byrnes	391249d1af	[AMDGPU] Allow 8,16 bit sources in calculateSrcByte This is required for many trees produced in practice for i8 CodeGen. Differential Revision: https://reviews.llvm.org/D155864 Change-Id: Iac01d183d9998b15138bdc7a5051e3bed338e7d9	2023-07-28 09:50:21 -07:00
Jay Foad	945123384e	[PEI][ARM] Switch to backwards frame index elimination This adds better support for call frame pseudos that adjust SP in PEI::replaceFrameIndicesBackward. Running frame index elimination backwards is preferred because it can do backwards register scavenging (on targets that require scavenging) which does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156434	2023-07-28 17:32:51 +01:00
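
Note on e938217f81 above: the "misaligned overlap trick" it mentions can be illustrated with a small sketch. This is not LLVM's lowering code. `plan_stores` and its parameters are hypothetical names, and the model assumes a single power-of-two store width. It only shows how one extra full-width store ending at the last byte replaces an increasingly small sequence of tail stores.

```python
def plan_stores(length: int, width: int, allow_overlap: bool) -> list[tuple[int, int]]:
    """Return (offset, size) pairs that together cover `length` bytes.

    Hypothetical model: `width` is the widest store size (a power of two).
    """
    stores = []
    offset = 0
    # Emit as many full-width stores as fit.
    while length - offset >= width:
        stores.append((offset, width))
        offset += width
    remaining = length - offset
    if remaining == 0:
        return stores
    if allow_overlap and length >= width:
        # Overlap trick: one more full-width store that ends at the last byte.
        # It rewrites (width - remaining) bytes already written above.
        stores.append((length - width, width))
        return stores
    # Fallback: cover the tail with increasingly small power-of-two stores.
    size = width // 2
    while remaining > 0:
        if size <= remaining:
            stores.append((offset, size))
            offset += size
            remaining -= size
        else:
            size //= 2
    return stores


if __name__ == "__main__":
    print(plan_stores(31, 16, allow_overlap=True))
    print(plan_stores(31, 16, allow_overlap=False))
```

In this model a 31-byte operation with 16-byte stores needs two stores with overlap and five without.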


52796 Commits