llvm-project

Author	SHA1	Message	Date
Róbert Ágoston	cd4ed08b5a	[GlobalISel] Don't combine instructions which are fed by memory instructions using different size Memory instructions like extending loads from the same address are not equal if their size is not equal. This fixes https://github.com/llvm/llvm-project/issues/53524. Differential Revision: https://reviews.llvm.org/D118805	2022-02-04 15:00:47 -08:00
Matt Arsenault	8b8b491379	AMDGPU/GlobalISel: Fix assertions on invalid addrspacecasts Fixes some assert on invalid situations and starts directly emitting the error.	2022-02-04 17:28:49 -05:00
Matt Arsenault	f1f15d0285	AMDGPU: Fix failing test	2022-02-04 16:54:09 -05:00
Masoud Ataei	8ce13bc93b	[PowerPC] Option controlling scalar MASS conversion differential: https://reviews.llvm.org/D119035 reviewer: bmahjour	2022-02-04 13:24:22 -08:00
Paulo Matos	c67c9cfe3f	[WebAssembly] Refactor and fix emission of external IR global decls Reland of 00bf4755. This patches fixes the visibility and linkage information of symbols referring to IR globals. Emission of external declarations is now done in the first execution of emitConstantPool rather than in emitLinkage (and a few other places). This is the point where we have already gathered information about used symbols (by running the MC Lower PrePass) and not yet started emitting any functions so that any declarations that need to be emitted are done so at the top of the file before any functions. This changes the order of a few directives in the final asm file which required an update to a few tests. Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D118995	2022-02-04 22:01:46 +01:00
Matt Arsenault	4622afa94c	AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass This is more precise in the face of indirect calls and aliases, still assuming the call target is defined somewhere in the current module. This sometimes changes the order the functions are printed, and also changes the point where context errors are printed relative to stdout. This also likely has negative consequences for compile time and memory usage.	2022-02-04 15:56:04 -05:00
Matt Arsenault	935abab65c	AMDGPU: Use module level register maximums for unknown callees Compute the theoretical register budget based on the IR function signature/attributes, and use the global maximum register budgets for unknown callees. This should fix the kernel reported register usage in the presence of indirect calls. The previous fix in 2b08f6af62afbf32e89a6a392dbafa92c62f7bdf was incorrect because it was only taking the maximum in the known call graph, and missing something that was either outside of it or codegened later. This fixes a second case I discovered where calls to aliases also did not work as expected. CallGraphAnalysis misses these, so functions called through aliases were not codegened ahead of callers as expected. CallGraphAnalysis should probably be fixed to understand this case, and there's likely a bug with IPRA here. This fixes numerous failures in the conformance test at -O0.	2022-02-04 15:56:03 -05:00
Sanjay Patel	fff3e1dbaa	[x86] enable fast sqrtss/sqrtps tuning for AMD Zen cores As discussed in D118534, all of the recent AMD CPUs have relatively fast (<14 cycle latency) "sqrtss" and "sqrtps" instructions: https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on So we should set this tuning flag to alter codegen of plain "sqrt(X)" expansion (as opposed to reciprocal-sqrt - there is other test coverage for that pattern). The expansion is both slower and less accurate than the hardware instruction. Differential Revision: https://reviews.llvm.org/D119001	2022-02-04 13:59:20 -05:00
Craig Topper	1d8bbe3d25	[RISCV] Implement a basic version of AArch64RedundantCopyElimination pass. Using AArch64's original implementation for reference, this patch implements a pass to remove unneeded copies of X0. This pass runs after register allocation and looks to see if a register is implied to be 0 by a branch in the predecessor basic block. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D118160	2022-02-04 10:43:46 -08:00
Craig Topper	234e54bdd8	[RISCV] Add more types of shuffles isShuffleMaskLegal. Add the vslidedown and interleave patterns that I recently implemented. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118952	2022-02-04 09:13:13 -08:00
Craig Topper	c83905a308	[RISCV] Add inline expansion for vector fround. This avoids a crash for scalable vectors or scalarization for fixed vectors. The algorithm is different enough that I don't think it makes sense to merge with ceil/floor/trunc. Algorithm is adapted from gcc's X86 SSE2 output. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D117247	2022-02-04 09:12:09 -08:00
Sanjay Patel	1eb4f88bfe	[x86] add test coverage for AMD Ryzen fast sqrt codegen; NFC	2022-02-04 11:40:54 -05:00
David Green	74d1fe72f4	[AArch64] Expand UADDLV patterns to SADDLV We already had some patterns for UADDV(UADDLP(x)) -> UADDLV(x), this simply expands them to the signed instructions by re-using the tablegen patterns. Differential Revision: https://reviews.llvm.org/D118133	2022-02-04 14:07:02 +00:00
Nikita Popov	f62a400cdf	[Statepoint] Determine return type from elementtype attribute Based on the LangRef change in D117890, this uses the elementtype attribute rather than the pointer element type to determine the statepoint callee function type, making statepoints compatible with opaque pointers.	2022-02-04 14:40:27 +01:00
Nikita Popov	46f9e45ef0	[Statepoint] Update gc.statepoint calls in tests with elementtype (NFC) This updates tests for the LangRef change in D117890.	2022-02-04 14:15:41 +01:00
John Brawn	0d8092dd48	[AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs These operations are scalarized but the result type v1i1 isn't which needs special handling (the same as is done for the non-strict versions of these operations). Differential Revision: https://reviews.llvm.org/D118258	2022-02-04 12:55:38 +00:00
Sanjay Patel	7b03725097	Revert "[x86] try harder to scalarize a vector load with extracted integer op uses" This reverts commit b4b97ec813a02585000f30ac7d532dda74e8bfda. As discussed in post-commit feedback at: https://reviews.llvm.org/D118376 ...there's a stage 2 failure on a Mac running a clang-refactor tool test.	2022-02-04 07:45:57 -05:00
Matt Devereau	6b73a4cc7d	[AArch64][SVE] Remove false register dependency for unary FP convert operations Generate movprfx for floating point convert zeroing pseudo operations Differential Revision: https://reviews.llvm.org/D118617	2022-02-04 09:55:39 +00:00
Bjorn Pettersson	3db39e7479	[DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies In the aftermath of D116895 a problem was found in the analysis of dependencies between store merge candidates in checkMergeStoreCandidatesForDependencies, which is needed to avoid introducing cycles in the DAG. In the past it has been enough (or assumed to be enough) to start scanning from non-chain operands when analysing the store merge candidates for dependencies, assuming that the analysis of chain dependencies performed when finding the candidates would cover potential dependencies that involve the chain operands. It was however discovered that one could end up with scenarios such as described in the aarch64-checkMergeStoreCandidatesForDependencies.ll test case, where the dependency between two stores is given by a mix of chain operand dependencies and non-chain operand dependencies. The fix in this patch makes sure that we also account for chain operand dependencies when doing the more elaborate analysis in checkMergeStoreCandidatesForDependencies, no longer relying on the earlier check involving chain operands being enough. Differential Revision: https://reviews.llvm.org/D118943	2022-02-04 08:53:01 +01:00
Jessica Paquette	9a61e731ff	[GlobalISel] Combine (G_ADDO x, 0) -> x + no carry out Similar to the G_MULO change. The code for checking if a constant is legal/pre-legalize is shared between these, and is kind of hairy. So, factor it out into a new function: `isConstantLegalOrBeforeLegalizer`. To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into a wrapper for two functions: - `isPreLegalize` - `isLegal` This is a bit easier to read in general. https://godbolt.org/z/KW7oszP1o Differential Revision: https://reviews.llvm.org/D118655	2022-02-03 14:25:15 -08:00
Jessica Paquette	c636899dc1	[GlobalISel] Combine: (G_MULO x, 0) -> 0 + no carry out Similar to the following combine in `DAGCombiner::visitMULO`: ``` // fold (mulo x, 0) -> 0 + no carry out if (isNullOrNullSplat(N1)) return CombineTo(N, DAG.getConstant(0, DL, VT), DAG.getConstant(0, DL, CarryVT)); ``` This fixes some generally poor codegen for `mulo`: https://godbolt.org/z/eTxYsvz8f Differential Revision: https://reviews.llvm.org/D118635	2022-02-03 14:23:58 -08:00
Vang Thao	2ca194ff55	[AMDGPU] Fix scheduler live-ins with debug inst at start of block GCNDownwardRPTracker RPTracker.reset() skips debug instructions for NextMI so RPTracker.getNext() will never give the beginning of a sched region if it is a debug value. In this case we will never set the live-ins for that block. Add check to see if getNext also equals the MI after skipping debug instructions. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D118853	2022-02-03 12:41:32 -08:00
Bjorn Pettersson	0352ee1a22	[CodeGenPrepare] Avoid out-of-bounds shift AddressingModeMatcher::matchOperationAddr may attempt to shift a variable by the same number of steps as found in the IR in a SHL instruction. This was done without considering that there could be undefined behavior in the IR, so the shift performed when compiling could end up having undefined behavior as well. This patch avoids UB in CodeGenPrepare by making sure that we limit the shift amount used, in a similar way to what is already done in CodeGenPrepare::optimizeLoadExt. Differential Revision: https://reviews.llvm.org/D118602	2022-02-03 21:03:58 +01:00
Caroline Concatto	961e954af5	[AArch64][SVE] Add more folds to make use of gather/scatter with 32-bit indices In AArch64ISelLowering.cpp this patch implements this fold: 1) GEP (%ptr, SHL ((stepvector(A) + splat(%offset))) << splat(B))) into GEP (%ptr + (%offset << B), step_vector (A << B)) The above transform simplifies the index operand so that it can be expressed as i32 elements. This allows using only one gather/scatter assembly instruction instead of two. Patch by Paul Walker (@paulwalker-arm). Depends on D117900 Differential Revision: https://reviews.llvm.org/D118345	2022-02-03 19:18:30 +00:00
Caroline Concatto	019f0221d5	[AArch64][SVE] Fold gather/scatter with 32bits when possible In AArch64ISelLowering.cpp this patch implements this fold: GEP (%ptr, (splat(%offset) + stepvector(A))) into GEP ((%ptr + %offset), stepvector(A)) The above transform simplifies the index operand so that it can be expressed as i32 elements. This allows using only one gather/scatter assembly instruction instead of two. Patch by Paul Walker (@paulwalker-arm). Depends on D118459 Differential Revision: https://reviews.llvm.org/D117900	2022-02-03 18:58:37 +00:00
Craig Topper	237eb37260	[RISCV] Add FMV_X_W and FMV_X_H to RISCVSExtWRemoval. Add -target-abi to sextw-removal.ll RUN lines to show benefit on new test case.	2022-02-03 09:40:47 -08:00
Sanjay Patel	a662456b33	[x86] add minimal test for sbb idiom and CPU capabilities; NFC D116804 proposes to alter codegen on this example based on CPU tuning, so check a variety of models to confirm it works as expected. We already have this test mixed in with several others in another test file, but it seems wasteful to add so many RUN lines to check this difference over and over again.	2022-02-03 12:32:36 -05:00
Sanjay Patel	3dbe33e4ec	[x86] remove CPU requirement for RUN line in test file; NFC A proposed change ( D118843 ) that would affect this test will not require a specific CPU model to show a difference.	2022-02-03 12:32:36 -05:00
Thomas Symalla	476babcc1d	[AMDGPU] Introduce new ISel combine for trunc-slr patterns In some cases, when selecting a (trunc (slr)) pattern, the slr gets translated to a v_lshrrev_b32_e64 instruction whereas the truncation gets selected to a sequence of v_and_b32_e64 and v_cmp_eq_u32_e64. In the final ISA, this appears as selecting the nth-bit: v_lshrrev_b32_e32 v0, 2, v1 v_and_b32_e32 v0, 1, v0 v_cmp_eq_u32_e32 vcc_lo, 1, v0 However, when the value used in the right shift is known at compilation time, the whole sequence can be reduced to two VALUs when the constant operand in the v_and is adjusted to (1 << lshrrev_operand): v_and_b32_e32 v0, (1 << 2), v1 v_cmp_ne_u32_e32 vcc_lo, 0, v0 In the example above, the following pseudo-code: v0 = (v1 >> 2) v0 = v0 & 1 vcc_lo = (v0 == 1) would be translated to: v0 = v1 & 0b100 vcc_lo = (v0 == 0b100) which should yield an equivalent result. This is a little bit hard to test as one needs to force the SelectionDAG to contain the nodes before instruction selection, but the test sequence was roughly derived from a production shader. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D118461	2022-02-03 18:06:44 +01:00
Sunho Kim	44601f4956	[AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull This teaches AArch64TargetLowering::shouldSinkOperands to sink the operands of aarch64_neon_pmull intrinsic. Differential Revision: https://reviews.llvm.org/D117944	2022-02-03 16:46:49 +00:00
Jay Foad	b9cf52bc3d	[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruction, because AMDGPUInstrInfo::isUniformMMO can already detect that various non-instruction pointers are uniform. Most of the test case churn is from tests that used undef as a pointer, which AMDGPUInstrInfo::isUniformMMO treats as uniform. Differential Revision: https://reviews.llvm.org/D118909	2022-02-03 16:27:48 +00:00
Jay Foad	42fc05e09c	[AMDGPU] Tweak tests in noclobber-barrier.ll Tweak some of the tests to demonstrate AMDGPUAnnotateUniformValues::visitLoadInst inserting a trivial getelementptr instruction, just to have somewhere to put amdgpu.uniform metadata. NFC.	2022-02-03 16:10:51 +00:00
Matt Devereau	1c6dca96ca	[AArch64][SVE] Fold vselect into predicated fmul, fsub and fadd Fold vselect with an unpredicated fmul/fsub/fadd operand into a predicated fmul/fsub/fadd: (vselect (p) (op (a) (b)) (a)) => (op -> (p) (a) (b)) Differential Revision: https://reviews.llvm.org/D117689	2022-02-03 13:43:15 +00:00
Shao-Ce SUN	005fd8aa70	[RISCV] Add support for Zihintpause extension Add support for the 'pause' hint instruction as an alias for 'fence w, 0'. To do this allow the 'fence' operands pred and succ to be set to 0 (the empty set). This will also allow future hints to be encoded as 'fence 0, <x>' and 'fence <x>, 0'. This patch is revised from @mundaym's D93019. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D117789	2022-02-03 20:55:47 +08:00
John Brawn	6f53960d64	[AArch64] Adjust machine-combiner-reassociate.mir test Use regular expressions for instruction numbers, as these can vary.	2022-02-03 12:40:14 +00:00
John Brawn	94843ea7d7	[AArch64] Make machine combiner patterns preserve MIFlags This is mainly done so that we don't lose the nofpexcept flag once we start emitting it. Differential Revision: https://reviews.llvm.org/D118621	2022-02-03 11:58:59 +00:00
David Green	31373fb88a	[AArch64] Reassociate integer extending reductions to pairwise addition. Given an (integer) vecreduce, we know the order of the inputs does not matter. We can convert UADDV(add(zext(extract_lo(x)), zext(extract_hi(x)))) into UADDV(UADDLP(x)). This can also happen through an extra add, where we transform UADDV(add(y, add(zext(extract_lo(x)), zext(extract_hi(x))))). This makes sure the same thing happens for signed cases too, which requires adding a new SADDLP node. Differential Revision: https://reviews.llvm.org/D118107	2022-02-03 11:05:48 +00:00
Simon Moll	73ac3b1371	[VE] Packed v512i32 isel and tests Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D118332	2022-02-03 11:01:54 +01:00
Roman Lebedev	ee4ba9f3a1	Revert "[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`." Unfortunately, it seems we really do need to take the long route; start from the "merge" block, find (all the) "dispatch" blocks, and deal with each "dispatch" block separately, instead of simply starting from each "dispatch" block like it would logically make sense, otherwise we run into a number of other missing folds around `switch` formation, missing sinking/hoisting and phase ordering. This reverts commit 85628ce75b3084dc0f185a320152baf85b59aba7. This reverts commit c5fff9095342a792bf4b9a077fe3c3a83c4e566c. This reverts commit 34a98e1046e3aa55e5f26ab20a15e96b4034d25a. This reverts commit 1e353f092288309d74d380367aa50bbd383780ed.	2022-02-03 12:32:50 +03:00
Sander de Smalen	01bfe9729a	[ISEL] Canonicalize STEP_VECTOR to LHS if RHS is a splat. This helps recognise patterns where we're trying to match STEP_VECTOR patterns to INDEX instructions that take a GPR for the Start/Step. The reason for canonicalising this operation to the LHS is because it will already be canonicalised to the LHS if the RHS is a constant splat vector. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D118459	2022-02-03 09:31:46 +00:00
Thomas Symalla	78bf2e0a3f	[AMDGPU] Update two Codegen tests. (NFC) This change adds a new Codegen test with auto-generated checks and updates divergence-driven-trunc-to-i1.ll with auto-generated checks. This is in preparation to D118461 to visualize the Codegen changes.	2022-02-03 10:28:53 +01:00
Fangrui Song	f3a66ec0bd	[asan][test] Re-generate asan-check-memaccess-add.ll with update_llc_test_checks.py * LABEL is important to give a better diagnostic in case a check pattern fails * Some NOT negative patterns are not effective. NEXT is useful to ensure a code sequence has the desired instructions and report a better diagnostic if something goes off. * Since the ABI says the first parameter is in RDI, replacing the pattern `[[REG16:.*]]` with `RDI` should not cause maintenance burden. Since the test is pretty mechanical, just use update_llc_test_checks.py to re-generate it. Most functions can use `nounwind` to avoid CFI directives. Reviewed By: kstoimenov, vitalybuka Differential Revision: https://reviews.llvm.org/D118864	2022-02-02 23:40:10 -08:00
Florian Mayer	29f92da522	[mte] fix compiler crash with musttail. see D118852 for matching hwasan fix. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118854	2022-02-02 16:13:46 -08:00
Matt Arsenault	d6fdbbcace	AMDGPU: Add second emergency slot for SGPR to vmem for large frames In a future change, we will sometimes use a VGPR offset for doing spills to memory, in which case we need 2 free VGPRs to do the SGPR spill. In most cases we could spill the VGPR along with the SGPR being spilled, but we don't have any free lanes for SGPR_1024 in wave32 so we could still potentially need a second scavenging slot.	2022-02-02 19:05:05 -05:00
Florian Mayer	f7a6c341cb	[mte] support more complicated lifetimes (e.g. for exceptions). Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118848	2022-02-02 14:39:22 -08:00
Sanjay Patel	0e9a3d3603	[x86] add test for 'sbb' false dependency stall; NFC	2022-02-02 16:56:10 -05:00
Florian Mayer	8680d6db1e	[mte] work around lifetime issue with setjmp. setjmp can return twice, but PostDominatorTree is unaware of this. as such, it overestimates postdominance, leaving some cases where memory does not get untagged on return. this causes false positives later in the program execution. this is a workaround for now, in the longer term PostDominatorTree should be made aware of returns_twice, as this may cause problems elsewhere. See D118647 for equivalent fix to HWASan. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118749	2022-02-02 13:55:09 -08:00
Matt Arsenault	a96dbb9035	CodeGen: Use asm register names in warning message This was using the ugly tablegenerated register enum names, which are really hideous for register tuples on AMDGPU. Use the prettier names which are recognized by the asm parser.	2022-02-02 14:20:12 -05:00
Matt Arsenault	245e25f9c3	AMDGPU: Implement isAsmClobberable Warn on inline assembly clobbering reserved registers. It should also warn on at least some reserved register defs, but that isn't happening right now. If you have a def and re-use of a register we reserve, the register coalescer will eliminate the intermediate virtual register. When the reserved reg def is introduced later by the backend, it will end up clobbering the value the register coalescer assumed was live through the range. There is also isInlineAsmReadOnlyReg, although I don't understand what the distinction really is. It's called in SelectionDAGBuilder, long before the set of reserved registers is frozen so I'm not sure how that can possibly work reliably. Unfortunately this is also using the ugly tablegenerated names for the registers.	2022-02-02 14:20:12 -05:00
Craig Topper	b73d151a11	[RISCV] Add DAG combines to transform ADD_VL/SUB_VL into widening add/sub. This adds or reuses ISD opcodes for vadd.wv, vaddu.wv, vadd.vv, vaddu.vv and a similar set for sub. I've included support for narrowing scalar splats that have known sign/zero bits similar to what was done for MUL_VL. The conversion to vwadd.vv proceeds in two phases. First we'll form a vwadd.wv by narrowing one of the operands. Then we'll visit the vwadd.wv to try to narrow the other operand. This turned out to be simpler than catching all the cases in one step. The forming of vwadd.wv can happen for either operand for add, but only the right hand side for sub since sub isn't commutable. An interesting quirk is that ADD_VL and VZEXT_VL/VSEXT_VL are formed during vector op legalization, but VMV_V_X_VL isn't usually formed until op legalization when BUILD_VECTORS are handled. This leads to VWADD_W_VL forming in one DAG combine round, and then a later DAG combine round sees the VMV_V_X_VL and needs to commute the operands to get the splat in position. This alone necessitated a VWADD_W_VL combine function which made forming vwadd.vv in two stages an easy choice. I've left out trying hard to form vwadd.wx instructions for now. It would only save an extend in the scalar domain which isn't as interesting. Might need to review the test coverage a bit. Most of the vwadd.wv instructions are coming from vXi64 tests on rv64. The tests were copy pasted from the existing multiply tests. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D117954	2022-02-02 10:03:08 -08:00

42034 Commits
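
Note on commit 476babcc1d ([AMDGPU] trunc-slr combine): the message argues that testing bit n via shift, mask, and compare is equivalent to masking with (1 << n) and comparing against zero. The following is a minimal scalar sketch of that equivalence in plain C++, not the actual ISel code. The function names `testBitShifted`, `testBitMasked`, and `formsAgree` are hypothetical and chosen only for illustration; the sketch assumes 32-bit unsigned values and shift amounts in the range 0 to 31.

```cpp
#include <cstdint>

// Original sequence from the commit message, modelled on scalars:
//   v0 = v1 >> shift;  v0 = v0 & 1;  vcc = (v0 == 1)
// Assumption: shift is in [0, 31], so the shift is well defined.
bool testBitShifted(std::uint32_t value, unsigned shift) {
  std::uint32_t shifted = value >> shift;
  std::uint32_t lowBit = shifted & 1u;
  return lowBit == 1u;
}

// Combined sequence from the commit message:
//   v0 = v1 & (1 << shift);  vcc = (v0 != 0)
// Comparing against (1 << shift) for equality gives the same answer,
// because the masked value can only be 0 or (1 << shift).
bool testBitMasked(std::uint32_t value, unsigned shift) {
  std::uint32_t masked = value & (1u << shift);
  return masked != 0u;
}

// Hypothetical helper: checks that both forms agree on `value`
// for every shift amount in [0, 31].
bool formsAgree(std::uint32_t value) {
  for (unsigned shift = 0; shift < 32; ++shift) {
    if (testBitShifted(value, shift) != testBitMasked(value, shift))
      return false;
  }
  return true;
}
```

Calling `formsAgree` on a handful of values (all zeros, all ones, a single set bit) is a quick sanity check of the rewrite. It does not exercise the SelectionDAG pattern itself, which the commit message notes is hard to test directly.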