llvm-project

Author	SHA1	Message	Date
jwanggit86	b853988e0d	[AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008 ) This patch ports the AMDGPURewriteUndefForPHI pass to the new pass manager. With this, the pass is supported under both the legacy and the new pass managers. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-09-12 13:32:02 -07:00
Matt Arsenault	c48248d7f9	AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp https://reviews.llvm.org/D158130	2023-09-12 23:23:10 +03:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Jay Foad	928c9d6851	[AMDGPU] Fix some MIR tests (#66090 ) Fix some problems in hand written MIR tests that only showed up when I tried to run LiveIntervals on them, after which they failed machine verification with "Use not jointly dominated by defs" errors.	2023-09-12 16:32:41 +01:00
Benjamin Kramer	bc8d85655c	[NVPTX] Tighten up legal v2i16 ops a bit TargetLoweringBase makes almost all ops legal by default, so make ones that Expand explicit and remove redundant legal settings.	2023-09-12 16:10:20 +02:00
David Green	b4c66f4e33	Revert "[AArch64][GlobalISel] Add lowering for constant BIT/BIF/BSP" This reverts commit cb5bad2acd7a498761d4979825d6801f5a845135 as the existing fcopysign code looks like it might be incorrect.	2023-09-12 14:18:44 +01:00
Allen	eaf23b2480	[GIsel][AArch64] Legalize <2 x i16> for G_INSERT_VECTOR_ELT (#65830) Widen the vector elements to 64 bits to make the operation legal, instead of clamping the number of elements. Depends on D153394. Fixes https://github.com/llvm/llvm-project/issues/63826	2023-09-12 21:15:01 +08:00
Jay Foad	0528dbfe5c	Add some -early-live-intervals RUN lines (#66058 ) This adds test coverage for an upcoming change to TwoAddressInstructionPass::processTiedPairs.	2023-09-12 13:06:10 +01:00
Yingwei Zheng	4793c2c3de	[DAGCombiner][RISCV] Prefer to sext i32 non-negative values (#65984 ) By default, `DAGCombiner` folds `sext x` to `zext x` when `x` is non-negative. It will generate redundant `zext` inst seq on riscv64 (typically `slli (srli x, 32), 32`). godbolt: https://godbolt.org/z/osf6adP1o This patch applies the transform iff `zext` is cheaper than `sext`.	2023-09-12 19:02:35 +08:00
Paul Walker	ea42c4ac6a	[SVE] Precommit test to show missing initialisation of call operand. When calling func_f8_and_v0_passed_via_memory the memory used to hold the first vector operand is allocated but not initialised.	2023-09-12 10:46:41 +00:00
Michal Paszkowski	efe0e10718	[SPIR-V] Support SPV_INTEL_arbitrary_precision_integers_extension, misc utils for other extensions Differential Revision: https://reviews.llvm.org/D158764	2023-09-12 02:45:15 -07:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 )" (#66060 ) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.	2023-09-12 15:13:59 +05:30
Ivan Kosarev	eaf737a4e0	[AMDGPU] Remove the GFX11 runs in CodeGen/AMDGPU/fma.f16.ll. It still fails with expensive checks enabled. This partially reverts: a1e38e0b8e3e [AMDGPU][GFX11] Add more test coverage for FMA instructions.	2023-09-12 10:30:52 +01:00
David Green	cb5bad2acd	[AArch64][GlobalISel] Add lowering for constant BIT/BIF/BSP (#65897 ) The non-constant bit/bif/bsp already work through tablegen patterns, this patch handles the constant case, mirroring the basic support for `or(and(X, C), and(Y, ~C))` from ISel tryCombineToBSL. BSP gets expanded to either BIT, BIF or BSL depending on the best register allocation. G_BIT can be replaced with G_BSP as a more general alternative.	2023-09-12 10:13:32 +01:00
David Green	b7cb18c5eb	[AArch64][GISel] Expand test coverage of FPow. This adds some more extensive test coverage for fpow intrinsics through global isel, and removes the unused vector libcall types. All types get scalarized, fp16 will be expanded to fp32 and then we lower to a libcall from there.	2023-09-12 10:08:09 +01:00
Ivan Kosarev	a1e38e0b8e	[AMDGPU][GFX11] Add more test coverage for FMA instructions. (#65935 ) This is another attempt to update the tests to run for GFX11. Previously done in <https://reviews.llvm.org/D153269>, and then reverted in <https://reviews.llvm.org/rG2d3e6c440244ad94777aa13566b0376eb3c088f1> due to a failure on a buildbot with expensive checks enabled. Commit 4b1702e87a2687569b197aea4721353f8b788182 fixed the problem.	2023-09-12 09:40:10 +01:00
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
pvanhout	2126a18d86	[AMDGPU] Regen combine-fma-add-mul-pre-legalize.mir	2023-09-12 08:50:12 +02:00
Matt Arsenault	cd4b906e18	RegisterCoalescer: Don't delete IMPLICIT_DEF if it's live into the same block Live out implicit_defs need to be kept, but the check for this only checked if the block parent was the same. This doesn't work if the parent blocks are the same but the value is live. Fixes verifier error "Instruction ending live segment doesn't read the register", which would appear at the coalesced non-implicit_def def. Fixes #38788 https://reviews.llvm.org/D158882	2023-09-12 09:28:33 +03:00
Matt Arsenault	de5585078e	RegisterCoalescer: Correctly set valid lanes when keeping live out implicit defs This fixes some verifier errors when live out implicit defs are coalesced with identity copies. Fixes some reduced testcases from issue #38788 but doesn't solve the original failure. I was surprised this seems to obviate the special casing in analyzeValue that's been there since the subregister liveness support went in. https://reviews.llvm.org/D158850	2023-09-12 09:28:33 +03:00
liqin.weng	1eec357494	[VP] IR expansion for maxnum/minnum Add basic handling for VP ops that can expand to non-predicate ops Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D159494	2023-09-12 10:15:52 +08:00
Fangrui Song	cfc1a87878	[test] Change llc -march= to -mtriple= & llvm-mc -arch= to -triple= Similar to 806761a7629df268c8aed49657aeccffa6bca449	2023-09-11 15:11:01 -07:00
Fangrui Song	806761a762	[test] Change llc -march= to -mtriple= The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. riscv64-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly.	2023-09-11 14:42:37 -07:00
Philip Reames	5352c79398	[RISCV] Add a combine to form masked.load from unit strided load (#65674 ) Add a DAG combine to form a masked.load from a masked_strided_load intrinsic with stride equal to element size. This covers a couple of extra test cases, and allows us to simplify and common some existing code on the concat_vector(load, ...) to strided load transform. This is the first in a mini-patch series to try and generalize our strided load and gather matching to handle more cases, and common up different approaches to the same problems in different places.	2023-09-11 13:01:14 -07:00
Vitaly Buka	f106b3f135	Revert "[PHIElimination] Handle subranges in LiveInterval updates" Leaks memory. This reverts commit 3bff611068ae70e3273a46bbc72bc66b66f98c1c.	2023-09-11 11:09:26 -07:00
Jeremy Morse	1ce1732f82	[DebugInfo] Use getStableDebugLoc to pick IRBuilder DebugLocs When IRBuilder is given an insertion position and there is debug-info, it sets the DebugLoc of newly inserted instructions to the DebugLoc of the insertion position. Unfortunately, that means if you insert in front of a debug intrinsics, your "real" instructions get potentially-misleading source locations from the debug intrinsics. Worse, if you compile -gmlt to get source locations but no variable locations, you'll get different source locations to a normal -g build, which is silly. Rectify this with the getStableDebugLoc method, which skips over any debug intrinsics to find the next "real" instruction. This is the source location that you would get if you compile with -gmlt, and it remains stable in the presence of debug intrinsics. The changed tests show a few locations where this has been happening, for example selecting line-zero locations for instrumentation on a perfectly valid call site. Differential Revision: https://reviews.llvm.org/D159485	2023-09-11 19:00:44 +01:00
Philip Reames	299d710e3d	[RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL This is the extract side of D159332. The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars. This is an idiomatic pattern for SLP. Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2. There's a subtlety here. For this to work, we're relying on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG. It's tempting to think we can do better than going through the stack, but well, I haven't found it yet if it exists. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V): output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles: 20703 output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles: 23903 output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles: 21604 output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles: 22804 output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles: 15204 output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles: 18404 output/sifive-x280/stack_by_vreg.mca:Total Cycles: 12104 output/sifive-x280/stack_element_by_element.mca:Total Cycles: 4304 I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work. Differential Revision: https://reviews.llvm.org/D159375	2023-09-11 10:49:17 -07:00
Stanislav Mekhanoshin	070c2570ad	[AMDGPU] Global ISel for packed fp32 instructions (#65803 )	2023-09-11 10:48:37 -07:00
Stanislav Mekhanoshin	093aa37744	[AMDGPU] Autogenerate min.ll/max.ll tests. NFC. (#65786 )	2023-09-11 10:29:53 -07:00
Luke Lau	e33f3f09b8	[RISCV] Shrink vslidedown when lowering fixed extract_subvector (#65598 ) As noted in https://github.com/llvm/llvm-project/pull/65392#discussion_r1316259471, when lowering an extract of a fixed length vector from another vector, we don't need to perform the vslidedown on the full vector type. Instead we can extract the smallest subregister that contains the subvector to be extracted and perform the vslidedown with a smaller LMUL. E.g, with +Zvl128b: v2i64 = extract_subvector nxv4i64, 2 is currently lowered as vsetivli zero, 2, e64, m4, ta, ma vslidedown.vi v8, v8, 2 This patch shrinks the vslidedown to LMUL=2: vsetivli zero, 2, e64, m2, ta, ma vslidedown.vi v8, v8, 2 Because we know that there's at least 128*2=256 bits in v8 at LMUL=2, and we only need the first 256 bits to extract a v2i64 at index 2. lowerEXTRACT_VECTOR_ELT already has this logic, so this extracts it out and reuses it. I've split this out into a separate PR rather than include it in #65392, with the hope that we'll be able to generalize it later. This patch refactors extract_subvector lowering to lower to extract_subreg directly, and to shortcut whenever the index is 0 when extracting a scalable vector. This doesn't change any of the existing behaviour, but makes an upcoming patch that extends the scalable path slightly easier to read.	2023-09-11 17:25:12 +01:00
Luke Lau	46f3ea5952	[RISCV] Add extract_subvector tests for a statically-known VLEN. NFC (#65389 ) This is partly a precommit for an upcoming patch, and partly to remove the fixed length LMUL restriction similarly to what was done in https://reviews.llvm.org/D158270, since it's no longer that relevant.	2023-09-11 16:28:53 +01:00
Simon Pilgrim	f8b04eb6d0	[X86] matchIndexRecursively - add zext(add/addlike(x,c)) -> index: zext(x), disp + zext(c) pattern handling More restricted alternative to a8cef6b58e2d41f	2023-09-11 15:36:13 +01:00
liqin.weng	3723ede3cf	[VP] IR expansion for zext/sext/trunc/fptoui/fptosi/sitofp/uitofp/fptrunc/fpext Add basic handling for VP ops that can expand to Cast intrinsics Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D159491	2023-09-11 21:14:38 +08:00
liqin.weng	28e74e6180	[VP] IR expansion for abs/smax/smin/umax/umin Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D159495	2023-09-11 21:14:37 +08:00
Max Iyengar	dbeb3d029d	Add missing vrnd intrinsics This patch adds 8 missing intrinsics as specified in the Arm ACLE document section 2.12.1.1: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#rounding-3 The intrinsics implemented are: vrnd32z_f64, vrnd32zq_f64, vrnd64z_f64, vrnd64zq_f64, vrnd32x_f64, vrnd32xq_f64, vrnd64x_f64, vrnd64xq_f64. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D158626	2023-09-11 12:59:18 +01:00
Simon Pilgrim	ef87d43834	Revert rGa8cef6b58e2d41f04ed4fa63c3f628eac1a28925 "[X86] promoteExtBeforeAdd - add support for or/xor 'addlike' patterns" Investigating reports of issues with second stage clang builds	2023-09-11 11:52:25 +01:00
Simon Pilgrim	a8cef6b58e	[X86] promoteExtBeforeAdd - add support for or/xor 'addlike' patterns Fold zext(addlike(x, C)) --> add(zext(x), C_zext) if its likely to help us create LEA instructions Addresses some regressions exposed by D155472	2023-09-11 10:17:34 +01:00
Simon Pilgrim	79941c3a0d	[X86] lea-2.ll - add test showing failure to fold shl(zext(or(x,c1)),c2) 'addlike' into LEA instruction	2023-09-11 10:17:33 +01:00
Nathan Gauër	56396b25f1	[SPIR-V] Add SPIR-V logical triple to llc This commit adds the minimal required bits to build a logical SPIR-V compute shader using LLC. - Skip OpenCL-only capabilities & extensions for Logical SPIR-V. - Generate required metadata for entrypoints from HLSL frontend. - Fix execution mode to GLCompute in logical. The main issue is the lack of a "vulkan" bit in the triple. This might need to be added as a vendor? Because as-is, SPIRV32/64 assumes OpenCL, and then, SPIRV assumes Vulkan. This is ok-ish today, but not correct. Differential Revision: https://reviews.llvm.org/D156424	2023-09-11 10:31:50 +02:00
Carl Ritson	3bff611068	[PHIElimination] Handle subranges in LiveInterval updates Add handling for subrange updates in LiveInterval preservation. This requires extending MachineBasicBlock::SplitCriticalEdge to also update subrange intervals. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158144	2023-09-11 17:15:09 +09:00
Jingu Kang	5474d49f1f	[AArch64] Remove copy instruction between uaddlv and urshr If there are copy instructions between uaddlv and urshr for transfer from gpr to fpr, and vice versa, try to remove them. Differential Revision: https://reviews.llvm.org/D159265	2023-09-11 09:06:09 +01:00
Yeting Kuo	1f15155d5e	[RISCV] Disable zcmp push/pop for variadic functions. (#65302) Variadic functions need a save region for variable arguments, and that region may overlap with the region used by zcmp push/pop.	2023-09-11 13:09:01 +08:00
Carl Ritson	1d8a94c4ff	[AMDGPU] SILowerControlFlow: fix preservation of LiveIntervals In emitElse, the live interval for the SI_ELSE source must be recalculated, as SI_ELSE is removed and a new user is placed at the block start. In emitIfBreak, the live interval for the newly created AndReg must be computed. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158141	2023-09-11 13:46:28 +09:00
Carl Ritson	46ee3b3914	[AMDGPU] SILowerI1Copies: clear kill flags on COPY (#65883 ) Clear kill flags on COPY source as it will be reused.	2023-09-11 12:30:08 +09:00
Shengchen Kan	503e3a4130	[X86] Remove _REV instructions from the EVEX2VEX tables (#65752 ) _REV instruction should not appear before encoding optimization, so there is no chance to compress it during MIR optimizations.	2023-09-11 09:54:05 +08:00
Fangrui Song	61c44f1822	[X86] FastISel -fno-pic: emit R_386_PC32 when calling an intrinsic This matches how a SelectionDAG::getExternalSymbol node is lowered. On x86-32, a function call in -fno-pic code should emit R_386_PC32 (since ebx is not set up). When linked as -shared (problematic!), the generated text relocation will work. Ideally, we should mark IR intrinsics created in CodeGenFunction::EmitBuiltinExpr as dso_local, but the code structure makes it not very feasible. Fix #51078	2023-09-10 15:03:36 -07:00
Simon Pilgrim	63af54a84e	[AArch64] ushl_sat.ll - regenerate checks. NFC. Add missing asm comments to reduce a future diff.	2023-09-10 19:45:20 +01:00
Simon Pilgrim	76c09d9c5e	[X86] matchIndexRecursively - don't peek through multiuse sext(add_nsw(x,c)) (PR65895) Fixes #65895	2023-09-10 16:54:18 +01:00
David Green	4e52fd8468	[AArch64] Add GlobalISel coverage for BIT/BIF/BSL. NFC Some of the 1x vector types are expanded to scalar, but the others that do not require constants look OK.	2023-09-10 13:02:35 +01:00
Matt Arsenault	17bd80601e	AMDGPU: Implement llvm.get.fpmode Currently s_getreg_b32 is missing the possible mode use. Really we need separate pseudos for mode-only accesses, but leave this as a pre-existing issue. https://reviews.llvm.org/D152710	2023-09-10 10:19:19 +03:00
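
The entry for cb5bad2acd above mentions the pattern `or(and(X, C), and(Y, ~C))`. As a rough illustration only, the following C++ sketch evaluates that bitwise select on plain 32-bit scalars. The function name `bitSelect` is hypothetical and is not taken from the LLVM sources; the sketch says nothing about how the AArch64 backend chooses between BIT, BIF and BSL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: for each bit position, take the bit from x where the
// mask c is 1 and the bit from y where the mask c is 0.
constexpr uint32_t bitSelect(uint32_t x, uint32_t y, uint32_t c) {
  return (x & c) | (y & ~c);
}

// Upper half comes from x, lower half from y.
static_assert(bitSelect(0xAAAAAAAAu, 0x55555555u, 0xFFFF0000u) == 0xAAAA5555u, "");
// An all-zero mask selects y; an all-ones mask selects x.
static_assert(bitSelect(0x12345678u, 0x9ABCDEF0u, 0x00000000u) == 0x9ABCDEF0u, "");
static_assert(bitSelect(0x12345678u, 0x9ABCDEF0u, 0xFFFFFFFFu) == 0x12345678u, "");
```

The checks are written as `static_assert` so that the identity is verified at compile time.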

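The X86 entries a8cef6b58e and f8b04eb6d0 above refer to folding `zext(addlike(x, C))` into an add of zero-extended operands. A minimal C++ sketch of the underlying arithmetic follows, under the assumption that an `or` counts as "addlike" when its operands share no set bits. The helper names are hypothetical and do not correspond to LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: `x | c` behaves like `x + c` when the operands
// have no set bits in common, because no bit position can produce a carry.
constexpr bool haveNoCommonBits(uint32_t x, uint32_t c) {
  return (x & c) == 0;
}

// zext(or(x, c)): combine in 32 bits, then widen to 64 bits.
constexpr uint64_t zextOfOr(uint32_t x, uint32_t c) {
  return static_cast<uint64_t>(x | c);
}

// add(zext(x), zext(c)): widen each operand first, then add in 64 bits.
constexpr uint64_t addOfZext(uint32_t x, uint32_t c) {
  return static_cast<uint64_t>(x) + static_cast<uint64_t>(c);
}

// Disjoint bits: both forms agree, even at the top of the 32-bit range.
static_assert(haveNoCommonBits(0xFFFFFFF0u, 0x0000000Fu), "");
static_assert(zextOfOr(0xFFFFFFF0u, 0x0000000Fu) == addOfZext(0xFFFFFFF0u, 0x0000000Fu), "");
// Overlapping bits: the forms differ, so the rewrite would not be valid.
static_assert(!haveNoCommonBits(1u, 1u), "");
static_assert(zextOfOr(1u, 1u) != addOfZext(1u, 1u), "");
```

The second pair of checks shows why the disjointness precondition matters: with overlapping bits, `or` and `add` produce different values.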

52796 Commits