llvm-project

Author	SHA1	Message	Date
Sven van Haastregt	bfc961aeb2	[TargetLowering] Check boolean content when folding bit compare Updates an optimization that relies on boolean contents being either 0 or 1 to properly check for this before triggering. The following: (X & 8) != 0 --> (X & 8) >> 3 Produces unexpected results when a boolean 'true' value is represented by negative one. Patch by Erik Hogeman. Differential Revision: https://reviews.llvm.org/D89390	2020-10-21 11:46:55 +01:00
Sven van Haastregt	1af51f077b	[TargetLowering] Add test for bit comparison fold This adds a test covering an issue in bit comparison folding. The issue will be addressed in the subsequent commit. Patch by Erik Hogeman. Differential Revision: https://reviews.llvm.org/D89390	2020-10-21 11:46:45 +01:00
Florian Hahn	88241ffb56	[Passes] Move ADCE before DSE & LICM. The adjustment seems to have very little impact on optimizations. The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is in consumer-typeset and the size there actually decreases by -0.1%, with no significant changes in the stats. On its own, it is mildly positive in terms of compile-time, most likely due to LICM & DSE having to process slightly fewer instructions. It should also be unlikely that DSE/LICM make much new code dead. http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions With DSE & MemorySSA, it gives some nice compile-time improvements, due to the fact that DSE can re-use the PDT from ADCE, if it does not make any changes: http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D87322	2020-10-21 10:30:56 +01:00
Esme-Yi	9fbb060418	[NFC][PowerPC]Add tests for folding RLWINM before and after RA.	2020-10-21 06:38:22 +00:00
Austin Kerbow	ebdcef20ce	[AMDGPU] Avoid inserting noops during scheduling Passes that are run after the post-RA scheduler may insert instructions like waitcnt which eliminate the need for certain noops. After this patch the scheduler is still aware of possible latency from hazards but noops will not be inserted until the dedicated hazard recognizer pass is run. Depends on D89753. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89754	2020-10-20 17:11:36 -07:00
Austin Kerbow	37d907899f	[HazardRec] Allow inserting multiple wait-states simultaneously If a target can encode multiple wait-states into a noop allow emitting such instructions directly. Reviewed By: rampitec, dmgreen Differential Revision: https://reviews.llvm.org/D89753	2020-10-20 17:03:47 -07:00
Tony	1bc7bfffdb	[AMDGPU] Optimize waitcnt insertion for flat memory operations Change waitcnt insertion to check the memory operand tokens to see if flat memory operations access VMEM in the same way it does to check if accessing LDS. This avoids adding waitcnt for counters for address spaces that are not accessed. In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS. This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL. Differential Revision: https://reviews.llvm.org/D89618	2020-10-20 22:55:12 +00:00
Michael Liao	2a0e4d1c01	[amdgpu] Enhance AMDGPU AA. - In general, a generic pointer may alias to pointers in all other address spaces. However, for certain cases enforced by the programming model, we may find that a generic pointer won't alias to pointers to local objects. * When a generic pointer is loaded from the constant address space, it could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it won't alias to pointers to the PRIVATE or LOCAL address space. * When a generic pointer is passed as a kernel argument, it also could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it also won't alias to pointers to the PRIVATE or LOCAL address space. Differential Revision: https://reviews.llvm.org/D89525	2020-10-20 09:54:12 -04:00
Carl Ritson	be2afbd019	[AMDGPU] Remove fix up operand from SI_ELSE Remove immediate operand from SI_ELSE which indicates if EXEC has been modified. Instead always emit code that handles EXEC and remove unnecessary instructions during pre-RA optimisation. This facilitates passes (i.e. SIWholeQuadMode) adding exec mask manipulation post control flow lowering, and pre control flow lower passes do not need to be aware of SI_ELSE handling. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D89644	2020-10-20 19:15:21 +09:00
sstefan1	fbfb1c7909	[IR] Make nosync, nofree and willreturn default for intrinsics. D70365 allows us to make attributes default. This is a follow up to actually make nosync, nofree and willreturn default. The approach we chose, for now, is to opt-in to default attributes to avoid introducing problems to target specific intrinsics. Intrinsics with default attributes can be created using `DefaultAttrsIntrinsic` class.	2020-10-20 11:57:19 +02:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit 38f625d0d1360b035271422bab922d22ed04d79a. This commit contains some holes in its logic and has been causing issues since it was committed. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Kai Luo	638fee625d	[PowerPC] Add test case for missing `nsw` flag. NFC.	2020-10-20 03:47:49 +00:00
Qiu Chaofan	1b2fe71ecf	[DAGCombiner] Tighten reasscociation of visitFMA From LangRef, FMF contract should not enable reassociating to form arbitrary contractions. So it should not help rearrange nodes like (fma (fmul x, c1), c2, y) into (fma x, c1*c2, y). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89527	2020-10-20 10:13:01 +08:00
Wang, Pengfei	3a85472af2	[X86] Fix assert fail when element type is i1. extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail. Since we don't really handle vxi1 cases in below code, we can just return from here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89096	2020-10-20 09:26:32 +08:00
Evgenii Stepanov	188a7d6710	Add alloca size threshold for StackTagging initializer merging. Summary: Initializer merging generates pretty inefficient code for large allocas that also happens to trigger an exponential algorithm somewhere in Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867. This change adds an upper limit for the alloca size. The default limit is selected such that worst case size of memtag-generated code is similar to non-memtag (but because of the ISA quirks, this case is realized at the different value of alloca size, ex. memset inlining triggers at sizes below 512, but stack tagging instructions are 2x shorter, so limit is approx. 256). We could try harder to emit more compact code with initializer merging, but that would only affect large, sparsely initialized allocas, and those are doing fine already. Reviewers: vitalybuka, pcc Subscribers: llvm-commits	2020-10-19 13:44:07 -07:00
Craig Topper	edd0cb11bd	[SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats This enables these transforms for vectors: (ctpop x) u< 2 -> (x & x-1) == 0 (ctpop x) u> 1 -> (x & x-1) != 0 (ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0) (ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0) All enabled if CTPOP isn't Legal. This differs from the scalar behavior where the first two are done unconditionally and the last two are done if CTPOP isn't Legal or Custom. The Legal check produced better results for vectors based on X86's custom handling. Might be worth re-visiting scalars here. I disabled the looking through truncate for vectors. The code that creates new setcc can use the same result VT as the original setcc even if we truncated the input. That may work for most scalars, but definitely wouldn't work for vectors unless it was a vector of i1. Fixes or at least improves PR47825 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89346	2020-10-19 12:56:59 -07:00
Craig Topper	e28376ec28	[X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table. We have pseudo instructions we use for bitcasts between these types. We have them in the load folding table, but not the store folding table. This adds them there so they can be used for stack spills. I added an exact size check so that we don't fold when the stack slot is larger than the GPR. Otherwise the upper bits in the stack slot would be garbage. That would be fine for Eli's test case in PR47874, but I'm not sure it's safe in general. A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int pseudo instructions to use FR32 as the second source register class instead of VR128. That will keep the coalescer from promoting the register class of the bitcast instruction which will make the stack slot 4 bytes instead of 16 bytes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89656	2020-10-19 12:53:14 -07:00
Cameron McInally	629d1d117a	[SVE] Update vector reduction intrinsics in new tests. Remove `experimental` from the intrinsic names.	2020-10-19 13:27:46 -05:00
Amy Kwan	6a946fd06f	[DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead. MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and uses isOperationLegalOrCustom instead. Differential Revision: https://reviews.llvm.org/D80485	2020-10-19 12:23:04 -05:00
Piotr Sobczak	c872faf6e0	[AMDGPU] Do not generate S_CMP_LG_U64 on gfx7 S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64(). Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that do not support 64-bit scalar compare. Differential Revision: https://reviews.llvm.org/D89536	2020-10-19 14:44:31 +02:00
Kazushi (Jam) Marukawa	6bb60d3e26	[VE] Add setcc for fp128 Add setcc for fp128 and clean existing ISel patterns. Also add a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89683	2020-10-19 21:36:57 +09:00
Kazushi (Jam) Marukawa	fb2bb6fad4	[VE] Add cast to/from fp128 patterns Add cast to/from fp128 patterns. Clean other cast patterns too. Update a regression test by adding missing tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89682	2020-10-19 21:35:27 +09:00
Hans Wennborg	0628bea513	Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting" This broke Chromium's PGO build, it seems because hot-cold-splitting got turned on unintentionally. See comment on the code review for repro etc. > This patch adds -f[no-]split-cold-code CC1 options to clang. This allows > the splitting pass to be toggled on/off. The current method of passing > `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose > correctly (say, with `-O0` or `-Oz`). > > To implement the -fsplit-cold-code option, an attribute is applied to > functions to indicate that they may be considered for splitting. This > removes some complexity from the old/new PM pipeline builders, and > behaves as expected when LTO is enabled. > > Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org> > Differential Revision: https://reviews.llvm.org/D57265 > Reviewed By: Aditya Kumar, Vedant Kumar > Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar This reverts commit 273c299d5d649a0222fbde03c9a41e41913751b4.	2020-10-19 12:31:14 +02:00
Kazushi (Jam) Marukawa	8796746b2a	[VE] Support select_cc Add missing ISel patterns related to select_cc DAG nodes. Add a regression test covering all combinations of possible scalar types. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89672	2020-10-19 18:54:25 +09:00
Kazushi (Jam) Marukawa	25955cbae4	[VE] Support br_cc comparing fp128 Support br_cc instruction comparing fp128 values. Add a br_cc.ll regression test for all kinds of br_cc instructions. Also clean existing branch regression tests: clean the brcond.ll regression test for the brcond instruction and remove the mixed branch1.ll regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89627	2020-10-19 18:29:39 +09:00
Kazushi (Jam) Marukawa	af8b444de3	[VE] Update ISel patterns for select instruction Add an ISel pattern for the fp128 select instruction and optimize generated code for other types' select instructions. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89509	2020-10-19 18:28:21 +09:00
Kai Luo	354d3106c6	[PowerPC] Skip combining (uint_to_fp x) if x is not simple type Current powerpc64le backend hits ``` Combining: t7: f64 = uint_to_fp t6 llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291: llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed. ``` This patch fixes it by skipping combination if `t6` is not simple type. Fixed https://bugs.llvm.org/show_bug.cgi?id=47660. Reviewed By: #powerpc, steven.zhang Differential Revision: https://reviews.llvm.org/D88388	2020-10-19 05:23:46 +00:00
Fangrui Song	2819631914	[PrologEpilogInserter] Reduce PR16393 test and fix a prologue parameter in a debuginfo test	2020-10-18 22:18:42 -07:00
Craig Topper	9d23224bf6	[X86] Add test cases for PR47874. NFC	2020-10-18 12:45:01 -07:00
Fangrui Song	98797a5fc0	[PrologEpilogInserter][test] Improve SpilledToReg test D39386 made CalleeSavedInfo possible to spill a register to another register (vector register for POWER9) but did not actually test live-in.	2020-10-17 20:36:22 -07:00
Amara Emerson	4ad459997e	[AArch64][GlobalISel] Select csinc if a select has a 1 on RHS. Differential Revision: https://reviews.llvm.org/D89513	2020-10-16 16:49:52 -07:00
Albion Fung	d30155feaa	[PowerPC] Implementation of 128-bit Binary Vector Rotate builtins This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10. Differential Revision: https://reviews.llvm.org/D86819	2020-10-16 18:03:22 -04:00
Austin Kerbow	978fbd8268	[AMDGPU] Run hazard recognizer pass later If instructions were removed in peephole passes after the hazard recognizer was run it is possible that new hazards could be introduced. Fixes: SWDEV-253090 Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D89077	2020-10-16 12:15:51 -07:00
Amara Emerson	39c05a1a71	[AArch64][GlobalISel] Add selection support for v2s32 and v2s64 reductions for FADD/ADD. We'll need legalizer lower() support for the other types to work. Differential Revision: https://reviews.llvm.org/D89159	2020-10-16 11:41:57 -07:00
Amara Emerson	32f77eea2d	[AArch64][GlobalISel] Regbankselect reductions to use FPR bank for scalars. Differential Revision: https://reviews.llvm.org/D89075	2020-10-16 10:42:15 -07:00
Amara Emerson	9190411fcf	[AArch64][GlobalISel] Add basic legalizer rules for supported add/fadd reductions. NEON is pretty limited in its reduction support. As a first step add some basic rules for the legal types we can select. Differential Revision: https://reviews.llvm.org/D89070	2020-10-16 10:35:46 -07:00
Amara Emerson	6042c25b0a	[GlobalISel] Add translation support for vector reduction intrinsics. In order to prevent the ExpandReductions pass from expanding some intrinsics before they get to codegen, I had to add a -disable-expand-reductions flag for testing purposes. Differential Revision: https://reviews.llvm.org/D89028	2020-10-16 10:17:53 -07:00
Jay Foad	1417abe54c	[AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic Differential Revision: https://reviews.llvm.org/D89558	2020-10-16 17:10:21 +01:00
Matt Arsenault	ce16b6835b	AMDGPU: Don't kill super-register with overlapping copy This would end up killing part of the result super-register, resulting in a verifier error on a later use of the overlapping registers. We could add kills of any non-aliasing registers, but we should be moving away from relying on kill flags.	2020-10-16 09:34:35 -04:00
Florian Hahn	51ff04567b	Recommit "[DSE] Switch to MemorySSA-backed DSE by default." After investigation by @asbirlea, the issue that caused the revert appears to be an issue in the original source, rather than a problem with the compiler. This patch enables MemorySSA DSE again. This reverts commit 915310bf14cbac58a81fd60e0fa9dc8d341108e2.	2020-10-16 09:02:53 +01:00
Vedant Kumar	273c299d5d	[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting This patch adds -f[no-]split-cold-code CC1 options to clang. This allows the splitting pass to be toggled on/off. The current method of passing `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose correctly (say, with `-O0` or `-Oz`). To implement the -fsplit-cold-code option, an attribute is applied to functions to indicate that they may be considered for splitting. This removes some complexity from the old/new PM pipeline builders, and behaves as expected when LTO is enabled. Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org> Differential Revision: https://reviews.llvm.org/D57265 Reviewed By: Aditya Kumar, Vedant Kumar Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar	2020-10-15 23:13:33 +00:00
Amara Emerson	c2551c1f40	[GlobalISel] Remove scalar src from non-sequential fadd/fmul reductions. It's probably better to split these into separate G_FADD/G_FMUL + G_VECREDUCE operations in the translator rather than carrying the scalar around. The majority of the time it'll get simplified away as the scalars are probably identity values. Differential Revision: https://reviews.llvm.org/D89150	2020-10-15 15:51:44 -07:00
Kazushi (Jam) Marukawa	410e5b17cf	[VE] Support fabs/fcos/fsin/fsqrt math functions VE doesn't have instruction for fabs/fcos/fsin/fsqrt, so expand them. Add regression tests also. Update fcopysign regression test, also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89457	2020-10-16 06:27:38 +09:00
Thomas Lively	1992e30c2d	[WebAssembly] Prototype i8x16.popcnt As proposed at https://github.com/WebAssembly/simd/pull/379. Use a target builtin and intrinsic rather than normal codegen patterns to make the instruction opt-in until it is merged to the proposal and stabilized in engines. Differential Revision: https://reviews.llvm.org/D89446	2020-10-15 21:18:22 +00:00
Jameson Nash	122d92dfc3	fix symbol printing on windows Similar to MCSymbol::print in 3d6c8ebb584375d01b1acead4c2056b3f0c501fc (llvm-svn: 81682, PR4966), these symbols may need to be quoted to be handled by the linker correctly. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D87099	2020-10-15 17:14:55 -04:00
alex-t	42ed388120	[AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough removeMBBifRedundant normally tries to keep the predecessor's fallthrough when removing a redundant MBB. It has to change the MBB layout so that the new successor immediately follows the predecessor of the removed MBB. This is only allowed when the new successor itself has no successor to which it falls through. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89397	2020-10-15 23:20:54 +03:00
Evgenii Stepanov	2e794a46b5	[AArch64] Stack frame reordering. Implement stack frame reordering in the AArch64 backend. Unlike the X86 implementation, AArch64 does not seem to benefit from "access density" based frame reordering, mainly because it has a much smaller variety of addressing modes, and the fact that all instructions are 4 bytes so each frame object is either in range of an instruction (and then the access is "free") or not (and that has a code size cost of 4 bytes). This change improves Memory Tagging codegen by * Placing an object that has been chosen as the base tagged pointer of the function at SP + 0. This saves one instruction to setup the pointer (IRG does not have an offset immediate), and more because that object can now be referenced without materializing its tagged address in a scratch register. * Placing objects that go out of scope simultaneously together. This exposes opportunities for instruction merging in tryMergeAdjacentSTG. Differential Revision: https://reviews.llvm.org/D72366	2020-10-15 12:50:16 -07:00
Evgenii Stepanov	2f63e57fa5	[MTE] Pin the tagged base pointer to one of the stack slots. Summary: Pin the tagged base pointer to one of the stack slots, and (if necessary) rewrite tag offsets so that an object that occupies that slot has both address and tag offsets of 0. This allows ADDG instructions for that object to be eliminated and their uses replaced with the tagged base pointer itself. This optimization must be done in machine instructions and not in the IR instrumentation pass, because referring to a stack slot through an IRG pointer would confuse the stack coloring pass. The optimization makes a (pretty naive) attempt to find the slot that would benefit the most by counting the uses of stack slots in the function. Reviewers: ostannard, pcc Subscribers: merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72365	2020-10-15 12:50:16 -07:00
Stanislav Mekhanoshin	d1beb95d12	[AMDGPU] gfx1032 target Differential Revision: https://reviews.llvm.org/D89487	2020-10-15 12:41:18 -07:00
Thomas Lively	3f738d1f5e	Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions" This reverts commit 7c8385a352ba21cb388046290d93b53dc273cd9f with a typing fix to an instruction selection pattern.	2020-10-15 19:32:34 +00:00
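
Note on edd0cb11bd: the four CTPOP rewrites quoted in that commit message rest on the fact that `x & (x-1)` clears the lowest set bit of `x`. The sketch below checks the four equivalences exhaustively for small unsigned widths. It is an illustration only, not LLVM code; the helper names `popcount` and `ctpop_identities_hold` are hypothetical, and the bit width is an assumed parameter.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer (models ctpop)."""
    return bin(x).count("1")


def ctpop_identities_hold(bits: int = 8) -> bool:
    """Exhaustively check the four rewrites for every `bits`-wide unsigned x."""
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        # x - 1 wraps around for x == 0, as it would in fixed-width arithmetic.
        cleared = x & ((x - 1) & mask)
        pc = popcount(x)
        if (pc < 2) != (cleared == 0):                 # (ctpop x) u< 2
            return False
        if (pc > 1) != (cleared != 0):                 # (ctpop x) u> 1
            return False
        if (pc == 1) != (x != 0 and cleared == 0):     # (ctpop x) == 1
            return False
        if (pc != 1) != (x == 0 or cleared != 0):      # (ctpop x) != 1
            return False
    return True


if __name__ == "__main__":
    print(ctpop_identities_hold(8))
```

The last two forms need the extra `x != 0` / `x == 0` term because `x & (x-1)` is also zero when `x` itself is zero.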


36153 Commits
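
Note on bfc961aeb2: the fold `(X & 8) != 0 --> (X & 8) >> 3` always produces 0 or 1, so it only matches the comparison result when a boolean 'true' is represented as 1. The sketch below models both representations on 32-bit values. It is a simplified illustration rather than the TargetLowering implementation; the constants and function names are hypothetical.

```python
BITS = 32
ALL_ONES = (1 << BITS) - 1  # bit pattern of -1 in 32-bit two's complement

TRUE_IS_ONE = 1             # boolean 'true' represented as 1
TRUE_IS_NEG_ONE = ALL_ONES  # boolean 'true' represented as negative one


def compare_ne_zero(value: int, true_value: int) -> int:
    """Model of 'value != 0' under the given representation of 'true'."""
    return true_value if value != 0 else 0


def folded(x: int) -> int:
    """The folded form from the commit message: (X & 8) >> 3."""
    return (x & 8) >> 3


def fold_is_valid(true_value: int) -> bool:
    """Does the fold agree with the compare for every value of the low 4 bits?"""
    return all(
        compare_ne_zero(x & 8, true_value) == folded(x) for x in range(16)
    )


if __name__ == "__main__":
    print(fold_is_valid(TRUE_IS_ONE))      # the fold matches the compare
    print(fold_is_valid(TRUE_IS_NEG_ONE))  # the fold yields 1 where all-ones is expected
```

With 'true' as negative one, `X = 8` makes the compare produce the all-ones pattern while the shifted form produces 1, which is the mismatch the commit guards against.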