llvm-project

Author	SHA1	Message	Date
Paul Walker	96c8d615d6	[SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices. Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based indices to be truncated when such truncation is lossless. This can enable the use of 32bit gather/scatter indices thus making it less likely to have to split a gather/scatter in two. Depends on D125194 Differential Revision: https://reviews.llvm.org/D130533	2022-08-18 18:00:53 +01:00
wanglian	eeac894418	Precommit tests for D132115 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D132116	2022-08-18 17:58:15 +08:00
gonglingqin	36038b5cb6	[LoongArch] Supports brcond with 21 bit offsets Differential Revision: https://reviews.llvm.org/D132006	2022-08-18 15:55:50 +08:00
WANG Xuerui	929d201b7a	[LoongArch] Add support for llvm.eh.dwarf.cfa It's the same as D126181 for RISCV. Differential Revision: https://reviews.llvm.org/D132012	2022-08-18 13:17:49 +08:00
Sam Clegg	fa306f1396	[WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: Fix signature of malloc in wasm64 mode Differential Revision: https://reviews.llvm.org/D132091	2022-08-17 18:16:34 -07:00
Luo, Yuanke	28733d86cf	[amdgpu] Change the RA to basic Specifying `-regalloc=fast` is not reliable. With fast register allocation, `LIS = getAnalysisIfAvailable<LiveIntervals>();` returns nullptr in the "si-lower-sgpr-spills" pass, so slot indexes are not created in the pass for newly inserted instructions. Machine instruction verification then fails when checking slot indexes. Because greedy-ra is time consuming, basic-ra can be used to reduce compile time for this test case. Differential Revision: https://reviews.llvm.org/D131931	2022-08-18 08:16:19 +08:00
Jeffrey Byrnes	1c8d7ea973	[AMDGPU] Implement pipeline solver for non-trivial pipelines Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t the choice we make when mapping an instruction -> SchedGroup is an NP problem. This patch implements the PipelineSolver class which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm. Differential Revision: https://reviews.llvm.org/D130797	2022-08-17 16:21:59 -07:00
Sanjay Patel	7f72a0f5bb	[SDAG] avoid generating libcall to function with same name This is a potentially better alternative to D131452 that also should avoid the infinite loop bug from: issue #56403 This is again a minimal fix to reduce merging pain for the release. But if this makes sense, then we might want to guard all of the RTLIB generation (and other libcalls?) with a similar name check. Differential Revision: https://reviews.llvm.org/D131521	2022-08-17 16:19:34 -04:00
Matthias Braun	19ce5e515f	RAGreedyStats: Ignore identity COPYs; count COPYs from/to physregs Improve copy statistics: - Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them. - Check physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination are the same physical register; they will be removed in the `virtregmap` pass anyway. Differential Revision: https://reviews.llvm.org/D131932	2022-08-17 12:53:29 -07:00
Archit Saxena	e170d955fe	Split EH code by default The current machine function splitter relies on profile data to do profile summary analysis and split blocks into the cold section. This may sometimes limit the usage of the machine function splitter, especially in cases where we could do some form of static analysis to split out cold blocks when profile data is absent or may be faulty (consider Sample PGO). Of all the code that could statically be marked cold, exception handling blocks are one kind (in fact the BFI framework also tends to mark them as cold) and the largest contributor in size. In my experiments I found that exception handling pads and all code reachable from them account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all exception handling blocks, and blocks only reachable from an exception handling pad, to the cold section. This flag has been shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on an internal Facebook search service. Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D131824	2022-08-17 12:40:31 -07:00
Sanjay Patel	8eddd1ec60	[AArch64] add test for recursive libcall lowering; NFC Issue #56403	2022-08-17 14:54:50 -04:00
Vladislav Dzhidzhoev	4b57939583	[AArch64][GlobalISel] Fallback to generic lowering of G_CTPOP Use generic lowering of G_CTPOP for s32 and s64 scalars when noimplicitfloat is specified. Differential Revision: https://reviews.llvm.org/D131454	2022-08-17 21:10:27 +03:00
Craig Topper	550fab53e1	[RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1). Extracted from D131729 where we handled C==0. It's now generalized to more constants. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132000	2022-08-17 09:50:08 -07:00
Nick Desaulniers	6b0e2fa6f0	[SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses As part of re-architecting callbr to no longer use blockaddresses (https://reviews.llvm.org/D129288), we don't really need them in MIR. They make comparing MachineBasicBlocks of indirect targets during MachineVerifier a PITA. Suggested by @efriedma from the discussion: https://reviews.llvm.org/D130290#3669531 Reviewed By: efriedma, void Differential Revision: https://reviews.llvm.org/D130316	2022-08-17 09:34:31 -07:00
David Penry	1c9f0408bc	Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules" This reverts commit 8c4aea438c310816bb4e4f9a32d783381ef3182e. Needed because buildbot failures (warnings) gave a clue that there was a functional bug in the ARM rejection logic. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D132037	2022-08-17 09:32:43 -07:00
Sotiris Apostolakis	848e9e454f	[SelectOpti] Remove test on loop-level analysis Remove a test that relied on the underlying instruction latency modeling. Such dependency blocks efforts such as D79483 to improve this cost modeling. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132029	2022-08-17 16:13:33 +00:00
David Penry	8c4aea438c	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D128941	2022-08-17 08:13:26 -07:00
Alex Bradbury	ce38128194	[RISCV] Avoid redundant branch-to-branch when expanding cmpxchg If the success value of a cmpxchg is used in a branch, the expanded cmpxchg sequence ends up with a redundant branch-to-branch (as the backend atomics expansion happens as late as possible, passes to optimise such cases have already run). This patch identifies this case and avoids it when expanding the cmpxchg. Note that a similar optimisation is possible for a BEQ on the cmpxchg success value. As it's hard to imagine a case where real-world code may do that, this patch doesn't handle that case. Differential Revision: https://reviews.llvm.org/D130192	2022-08-17 13:49:15 +01:00
OverMighty	232953f996	[AArch64] Add pattern for SQDML*Lv1i32_indexed There was no pattern to fold into these instructions. This patch adds the pattern obtained from the following ACLE intrinsics so that they generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and sqadd/sqsub instructions: - vqdmlalh_s16, vqdmlslh_s16 - vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16, vqdmlslh_laneq_s16 (when the lane index is 0) It also modifies the result of the existing pattern for the latter, when the lane index is not 0, to use the v1i32_indexed instructions instead of the v4i16_indexed ones. Fixes #49997. Differential Revision: https://reviews.llvm.org/D131700	2022-08-17 12:00:47 +01:00
Rainer Orth	d9993484ee	[Sparc] Don't use SunStyleELFSectionSwitchSyntax As discussed in D85414 <https://reviews.llvm.org/D85414>, two tests currently `FAIL` on Sparc since that backend uses the Sun assembler syntax for the `.section` directive, controlled by `SunStyleELFSectionSwitchSyntax`. Instead of adapting the affected tests, this patch changes that default. The internal assembler still accepts both forms as input, only the output syntax is affected. Current support for the Sun syntax is cursory at best: the built-in assembler cannot even assemble some of the directives emitted by GCC, and the set supported by the Solaris assembler is even larger: SPARC Assembly Language Reference Manual, 3.4 Pseudo-Op Attributes <https://docs.oracle.com/cd/E37838_01/html/E61063/gmabi.html#scrolltoc>. A few Sparc test cases need to be adjusted. At the same time, the patch fixes the failures from D85414 <https://reviews.llvm.org/D85414>. Tested on `sparcv9-sun-solaris2.11`. Differential Revision: https://reviews.llvm.org/D85415	2022-08-17 12:59:29 +02:00
Craig Topper	39707c1a9a	[RISCV] Add test coverage for (select (icmp X, Y), float, float). NFC We fold integer setcc into SELECT_CC during DAG combine even if the SELECT_CC has FP result type, but we had no test coverage.	2022-08-16 21:28:26 -07:00
Vitaly Buka	16fecdfa70	Revert "[AArch64] Add `foldCSELOfCSEl` DAG combine" Breaks ubsan on buildbot, details in D125504 This reverts commit 6f9423ef06926a70af84b77cb290c91214cf791a.	2022-08-16 20:29:37 -07:00
Craig Topper	d8cdd78b6c	[RISCV] Add test cases to show missed opportunity to fold (sub C, (xor (setcc), 1)). NFC (sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1. This would avoid the xori and in some cases remove an instruction needed to materialize the constant.	2022-08-16 16:40:53 -07:00
Eli Friedman	cfd2c5ce58	Untangle the mess which is MachineBasicBlock::hasAddressTaken(). There are two different senses in which a block can be "address-taken". There can be a BlockAddress involved, which means we need to map the IR-level value to some specific block of machine code. Or there can be constructs inside a function which involve using the address of a basic block to implement certain kinds of control flow. Mixing these together causes a problem: if target-specific passes are marking random blocks "address-taken", if we have a BlockAddress, we can't actually tell which MachineBasicBlock corresponds to the BlockAddress. So split this into two separate bits: one for BlockAddress, and one for the machine-specific bits. Discovered while trying to sort out related stuff on D102817. Differential Revision: https://reviews.llvm.org/D124697	2022-08-16 16:15:44 -07:00
Craig Topper	53ce22e429	Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This time using N1 instead of N0 since N1 points to the original setcc. This now affects scheduling as I expected. Original commit message: We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:51:07 -07:00
Craig Topper	b5a18de651	[RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is still preferable. (addi X, -1) can be compressed, sub with x0 on the LHS is never compressible.	2022-08-16 14:49:52 -07:00
Alexander Shaposhnikov	d68ba43ad2	[Intrinsics] Add initial support for NonNull attribute Add initial support for NonNull attribute. (https://github.com/llvm/llvm-project/issues/57113) Test plan: verify that for __thread int x; int main() { int* y = &x; return *y; } (with this patch) clang -O -fsanitize=null -S -emit-llvm -o - doesn't emit a null-pointer check Differential revision: https://reviews.llvm.org/D131872	2022-08-16 21:28:23 +00:00
Craig Topper	de6fd16971	[RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12. We still need to materialize the constant in a register and we may not be removing all uses of the original constant so it may increase code size.	2022-08-16 14:11:31 -07:00
Craig Topper	1180ed41ee	[RISCV] Add more test cases for (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). NFC In these test cases we do the transform, but the immediate is too large to form an ADDI so it didn't save any instructions. If the constant is opaque or has additional users we shouldn't do the transform if it doesn't form an ADDI.	2022-08-16 14:08:42 -07:00
Craig Topper	4854fa217f	[RISCV] Move test from setcc-logic.ll to select-const.ll. NFC Also add setne version of the test. Add some common prefixes to reduce number of identical CHECK lines.	2022-08-16 14:08:42 -07:00
Craig Topper	4184edc691	[RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc. This introduces an xori in some cases. I don't believe it was the intention of the original patch. This was an accident because nonan FP equality compares also use SETEQ/SETNE. Also pass the correct type to getSetCCInverse.	2022-08-16 13:00:36 -07:00
Craig Topper	87e7837293	[RISCV] Add test cases to show where we inverted a fp setcc and introduced an extra xori. In these tests we had (sub C, (seteq X, Y)) which we converted to the (add (setne X, Y), C-1). We don't have a FNE compare instruction so this created an XORI to invert an FEQ instruction. This might be a good idea since it can save a constant materialization, but does not appear to be the intention of the original patch.	2022-08-16 12:59:16 -07:00
Nicolas Miller	ccfabfbb1f	Fix subrange liveness checking at rematerialization This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead. The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister; this patch changes that so the subranges are checked even when the moved instruction uses the full register. This patch also adds a case to the original test for the subrange checking that triggers the issue described above. The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278 And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209 Essentially the greedy register allocator attempts to move the following instruction: ``` %3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec ``` From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges: ``` %3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r L0000000000000003 [2224r,3440r:0) 0@2224r L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r ``` So at `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `%3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load. On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908. This turned out to be because on gfx908, instead of moving the shift instruction into the loop, its value is spilled into an ACC register; gfx906 doesn't have ACC registers, and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling. With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop. Original Author: npmiller Reviewed by: rampitec Submitted by: rampitec Differential Revision: https://reviews.llvm.org/D131884	2022-08-16 10:50:09 -07:00
Arthur Eubanks	9181ce623f	[Windows] Put init_seg(compiler/lib) in llvm.global_ctors Currently we treat initializers with init_seg(compiler/lib) as similar to any other init_seg, they simply have a global variable in the proper section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to llvm.used. However, this doesn't match with how LLVM sees normal (or init_seg(user)) initializers via llvm.global_ctors. This causes issues like incorrect init_seg(compiler) vs init_seg(user) ordering due to GlobalOpt evaluating constructors, and the ability to remove init_seg(compiler/lib) initializers at all. Currently we use 'A' for priorities less than 200. Use 200 for init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"), which do not append the priority to the section name. Priorities between 200 and 400 use ".CRT$XCC${Priority}". This allows for some wiggle room for people/future extensions that want to add initializers between compiler and lib. Fixes #56922 Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D131910	2022-08-16 08:16:18 -07:00
Karl Meakin	6f9423ef06	[AArch64] Add `foldCSELOfCSEl` DAG combine Differential Revision: https://reviews.llvm.org/D125504	2022-08-16 12:49:11 +01:00
Zain Jaffal	7155ed4289	[AArch64] Add support for 256-bit non temporal loads Currently all temporal loads are mapped to `LDP` or `LDR`. This patch will map all the non temporal 256-bit loads into `LDNP`. Future patches should address other non-temporal loads. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D131773	2022-08-16 12:19:36 +01:00
Bing1 Yu	807b8cb06c	[X86] Fix a lowering issue of mask.compress which has undef float passthrough Previously, LegalizeDAG didn't check that mask.compress's passthrough might be float, and this led to a getConstant crash since it doesn't support fp Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131947	2022-08-16 17:54:45 +08:00
gonglingqin	a9d46d9af3	[LoongArch] Add codegen support for fabs Differential Revision: https://reviews.llvm.org/D131871	2022-08-16 14:41:27 +08:00
Weining Lu	d1f36da9e0	[LoongArch] Encode LoongArch specific ELF e_flags to binary by LoongArchTargetStreamer Reference: https://github.com/loongson/LoongArch-Documentation The last commit hash (main branch) is: 99016636af64d02dee05e39974d4c1e55875c45b Note: There are several PRs [1][2][3] that may affect the e_flags. After they got closed or merged, we should update the implementation here accordingly. [1] https://github.com/loongson/LoongArch-Documentation/pull/33 [2] https://github.com/loongson/LoongArch-Documentation/pull/47 [3] https://github.com/loongson/LoongArch-Documentation/pull/61 Differential Revision: https://reviews.llvm.org/D130239	2022-08-16 13:41:50 +08:00
Rahman Lavaee	df2213f345	[EHStreamer] Omit @LPStart when function has no landing pads When no landing pads exist for a function, `@LPStart` is undefined and must be omitted. EH table is generally not emitted for functions without landing pads, except when the personality function is unknown (`!isNoOpWithoutInvoke(classifyEHPersonality(Per))`). In that case, we must omit `@LPStart` even when machine function splitting is enabled. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131626	2022-08-15 17:09:46 -07:00
Vitaly Buka	e0e960923f	[AArch64] Fix signed integer overflow in CSINC case Followup to D131815, which overflows on different values.	2022-08-15 15:04:20 -07:00
Amy Kwan	a5bef98c75	[PowerPC][NFC] Add additional vector_shuffle tests involving scalar_to_vector. This patch adds additional test cases involving vector_shuffles where either its left, right or both inputs are scalar_to_vector nodes. These test cases involve v16i8, v2i64, v4i32 and v8i16 vector shuffles, and were generated in preparation for D130487. Differential Revision: https://reviews.llvm.org/D130485	2022-08-15 12:30:58 -05:00
Craig Topper	7a73ab5818	[RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64. We have a good selection of W instructions, so promoting a truncated value back to i64 is often free. This appears to be a net code size reduction on SPECINT2006. This has been split from D130397 as one of the patches needed to complete that. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131819	2022-08-15 08:32:51 -07:00
Simon Pilgrim	a7b85e4c0c	[X86] Freeze shl(x,1) -> add(x,x) vector fold (PR50468) Vector fold shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468 Differential Revision: https://reviews.llvm.org/D106675	2022-08-15 16:17:21 +01:00
Simon Pilgrim	41bdb8cd36	[X86] Fold insert_vector_elt(undef, elt, 0) --> scalar_to_vector(elt) I had hoped to make this a generic fold in DAGCombine, but there's quite a few regressions in Thumb2 MVE that need addressing first. Fixes regressions from D106675.	2022-08-15 14:56:30 +01:00
David Green	dfc95bab07	[DAG] Ensure more Legal BUILD_VECTOR elements types in shuffle->And combine This is a followup to D131350, which caused another problem for i64 types being split into i32 on i32 targets. This patch tries to make sure that either Illegal types are OK, or that the element types of a buildvector are legal and bigger than or equal to the size of the original elements. Differential Revision: https://reviews.llvm.org/D131883	2022-08-15 14:41:45 +01:00
Luo, Yuanke	853bb192c4	Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc" This reverts commit 30f9e6ebd30b79d13f99eaca4d829e0da07186b3.	2022-08-15 20:33:15 +08:00
Ayke van Laethem	a560e57a7e	[AVR] Only push and clear R1 in interrupts when necessary R1 is a reserved register, but LLVM gives the APIs to know when it is used or not. So this patch uses these APIs to only save/clear/restore R1 in interrupts when necessary. The main issue here was getting inline assembly to work. One could argue that this is the job of Clang, but for consistency I've made sure that R1 is always usable in inline assembly even if that means clearing it when it might not be needed. Information on inline assembly in AVR can be found here: https://www.nongnu.org/avr-libc/user-manual/inline_asm.html#asm_code Essentially, this seems to suggest that r1 can be freely used in avr-gcc inline assembly, even without specifying it as an input operand. Differential Revision: https://reviews.llvm.org/D117426	2022-08-15 14:29:38 +02:00
Ayke van Laethem	43a8dbc5be	[AVR] Use @earlyclobber instead of register scavenging The code to support the case when the register allocator has assigned the same register to the src and the dst register operand isn't actually needed: * LDWRdPtr and LDDWRdPtrQ have an @earlyclobber on the output register, so the register allocator will make sure to allocate a different register for the output register. * LDDWRdYQ does not have an @earlyclobber, but the pointer register is the fixed Y register which is reserved. The register allocator won't use reserved registers for the output value. This removes a special case in the code that makes the pseudo instruction expansion pass more complicated than it needs to be. Differential Revision: https://reviews.llvm.org/D131844	2022-08-15 14:29:38 +02:00
Ayke van Laethem	de48717fcf	[AVR] Support unaligned store This patch really just extends D39946 towards stores as well as loads. While the patch is in SelectionDAGBuilder, it only applies to AVR (the only target that supports unaligned atomic operations). Differential Revision: https://reviews.llvm.org/D128483	2022-08-15 14:29:37 +02:00
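
Several of the RISCV entries above rest on one arithmetic identity: when X is a setcc result (0 or 1), C - (X ^ 1) equals X + (C - 1). The sketch below only checks that identity on plain Python integers and models the simm12 range mentioned in de6fd16971 for the related setcc fold. It is not LLVM code, and every function name in it is hypothetical.

```python
# Illustrative check of the arithmetic behind the RISCV fold
# (sub C, (xor (setcc), 1)) -> (add (setcc), C-1).
# Hypothetical helper names; this does not use or mirror any LLVM API.


def sub_of_inverted_setcc(c: int, x: int) -> int:
    """Original form: C - (X ^ 1), where X is a setcc result (0 or 1)."""
    return c - (x ^ 1)


def add_of_setcc(c: int, x: int) -> int:
    """Folded form: X + (C - 1)."""
    return x + (c - 1)


def fits_simm12(value: int) -> bool:
    """True if value fits a 12-bit signed immediate (the ADDI range)."""
    return -2048 <= value <= 2047


def check_fold(constants) -> bool:
    """Compare both forms for every constant and both boolean values of X."""
    return all(
        sub_of_inverted_setcc(c, x) == add_of_setcc(c, x)
        for c in constants
        for x in (0, 1)
    )


if __name__ == "__main__":
    # The identity holds for every constant tried.
    print(check_fold(range(-4096, 4097)))
    # C = 2048 gives C-1 = 2047, which still fits; C = 2049 does not.
    print(fits_simm12(2048 - 1), fits_simm12(2049 - 1))
```

The check uses unbounded integers. The same identity holds under fixed-width wraparound arithmetic, but that case is not exercised here.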


44598 Commits