llvm-project

Author	SHA1	Message	Date
Pravin Jagtap	c931f2e6fd	[AMDGPU] Autogenerate & pre-commit tests for D156301 and D157388 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157712	2023-08-18 09:50:44 -04:00
Carl Ritson	ad9eed1e77	[MachineVerifier] Verify LiveIntervals for PHIs Implement basic support for verifying LiveIntervals for PHIs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156872	2023-08-18 18:14:22 +09:00
Joe Nash	6aab000874	[AMDGPU] Convert fmul-2-combine-multi-use test to auto-gen NFC. Deletes the unused SI runline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D158198	2023-08-17 14:23:20 -04:00
Pravin Jagtap	af5fd142d3	[AMDGPU] Extend f32 support for llvm.amdgcn.update.dpp intrinsic This will be useful to avoid the bit-casting noise required to extend support for Floating Point Operations in atomic optimizer for DPP in D156301 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D156647	2023-08-17 10:45:19 -04:00
Christudasan Devadasan	81827f8cfb	[AMDGPU] Support wwm-reg AV spill pseudos The wwm register spill pseudos are currently defined for VGPR_32 regclass. It causes a verifier error for gfx908 or above as the regalloc sometimes restores the values to the vector superclass AV_32. Fixing it by supporting AV wwm-spill pseudos as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155646	2023-08-17 20:04:18 +05:30
Tuan Chuong Goh	74e191da07	[AArch64][GlobalISel] Combiner for EXT Keep components of UNMERGE larger after running the Artifact Combiner on it. This was intended to help with <v16i64> = G_SEXT <v16i16>, but implementation for legalizing EXT is in a following patch, therefore a test for this case will be included in the following patch Differential Revision: https://reviews.llvm.org/D157715	2023-08-17 14:34:45 +01:00
Matt Arsenault	c9d0d15e69	AMDGPU: Refine some rsq formation tests Drop unnecessary flags and metadata, add contract flags that should be necessary.	2023-08-16 13:37:03 -04:00
Matt Arsenault	66ee794064	AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls Apparently the spec has overloads for fmin/fmax and ldexp with one of the operands as scalar. We need to broadcast the scalars to the vector type. https://reviews.llvm.org/D158077	2023-08-16 09:42:26 -04:00
Ivan Kosarev	d7efe41598	[AMDGPU] Autogenerate the v_cndmask.ll and llvm.amdgcn.image.msaa.load.ll codegen tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157970	2023-08-16 12:50:51 +01:00
Ivan Kosarev	f9ab235318	[AMDGPU] Autogenerate the fmuladd.f16.ll and llvm.fmuladd.f16.ll codegen tests. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D157966	2023-08-16 12:49:45 +01:00
Carl Ritson	d0e246ff16	[LiveRange] Fix inaccurate verification of live-in PhysRegs Fix verification that a PhysReg is live in to an MBB. isLiveIn does not handle reg units, so cannot identify when a register would be defined because its super register is partially defined. Additionally a PhysReg may be partially defined at block entry and then fully defined before any use. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D157086	2023-08-16 17:42:42 +09:00
Joe Nash	a093032981	[AMDGPU][True16] Update FPToI1Pat GFX11 pat to use GFX11 instruction These cmp patterns were using the pre-GFX11 pseudo instruction, and so failed to compile. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D157912	2023-08-15 11:11:17 -04:00
Matt Arsenault	d251761660	AMDGPU: Replace log libcalls with log intrinsics	2023-08-15 10:48:46 -04:00
Matt Arsenault	81b278e613	AMDGPU: Fix fast f32 exp2 Mirror of the previous log changes, OpenCL conformance doesn't like interpreting afn as ignore denormal handling but was previously hidden by flag dropping.	2023-08-15 10:48:46 -04:00
Matt Arsenault	4b7b4b9458	AMDGPU: Fix fast f32 log/log10 OpenCL conformance didn't like interpreting afn as ignore the denormal handling. https://reviews.llvm.org/D157940	2023-08-15 10:48:46 -04:00
Matt Arsenault	e09b3593ba	AMDGPU: Fix fast math log2 f32 Apparently afn doesn't allow you to drop the denormal handling according to OpenCL conformance. This was hidden by losing the flags during the library linking process. Fast log is still broken and needs more work. https://reviews.llvm.org/D157936	2023-08-15 10:48:46 -04:00
Jay Foad	f0e5f73fdc	[MachineScheduler] Account for lane masks in basic block liveins Differential Revision: https://reviews.llvm.org/D157633	2023-08-15 09:52:43 +01:00
Matt Arsenault	1faa4797ca	AMDGPU: Handle unsafe exp.f32 with denormal handling I somehow missed this path when adding the new expansions. Saves a lot of instructions for afn + IEEE. https://reviews.llvm.org/D157867	2023-08-14 18:36:01 -04:00
Matt Arsenault	d45022b094	AMDGPU: Remove special case constant folding of divide We should probably just swap this out for the fdiv, but that's what the implementation is anyway.	2023-08-14 18:36:01 -04:00
Matt Arsenault	0eabe65bfb	AMDGPU: Replace ldexp libcalls with intrinsic	2023-08-14 18:36:01 -04:00
Matt Arsenault	f337a77c99	AMDGPU: Replace rounding libcalls with intrinsics	2023-08-14 18:36:01 -04:00
Matt Arsenault	c7876c55ac	AMDGPU: Replace fabs and copysign libcalls with intrinsics Preserves flags and metadata like the other cases.	2023-08-14 18:28:21 -04:00
Matt Arsenault	a70006c4c5	AMDGPU: Replace some libcalls with intrinsics OpenCL loses fast math information by going through libcall wrappers around intrinsics. Do this to preserve call site flags which are lost when inlining. It's not safe in general to propagate flags during inline, so avoid dealing with this by just special casing some of the useful calls.	2023-08-14 18:20:47 -04:00
Joe Nash	dc242f9f1e	[AMDGPU][NFC] Convert fpto{u|s}i f16 tests to auto-gen Makes it easier to add GFX11 runline in future patch, which has significantly different output.	2023-08-14 15:28:20 -04:00
Matt Arsenault	a8376bbe53	AMDGPU: Add baseline tests for libcall to intrinsic handling Test all the different itanium mangled opencl functions that are interesting to replace with raw intrinsic calls. https://reviews.llvm.org/D157873	2023-08-14 15:15:30 -04:00
Matt Arsenault	f44beecb78	AMDGPU: Try to use private version of sincos if available The comment was out of date, the device libs build does provide all the pointer overloads. An extremely pedantic interpretation of the spec would suggest only the flat version exists, but the overloads do exist in the implementation. https://reviews.llvm.org/D156720	2023-08-14 11:40:04 -04:00
Matt Arsenault	58fd1de09f	AMDGPU: Consider nobuiltin when querying defined libfuncs https://reviews.llvm.org/D156708	2023-08-14 11:30:12 -04:00
Matt Arsenault	42c6e4209c	AMDGPU: Handle multiple uses when matching sincos Match how the generic implementation handles this. We now will leave behind the dead other user for later passes to deal with. https://reviews.llvm.org/D156707	2023-08-14 11:28:41 -04:00
Nikita Popov	9deee6bffa	[SDAG] Don't transfer !range metadata without !noundef to SDAG (PR64589) D141386 changed the semantics of !range metadata to return poison on violation. If !range is combined with !noundef, violation is immediate UB instead, matching the old semantics. In theory, these IR semantics should also carry over into SDAG. In practice, DAGCombine has at least one key transform that is invalid in the presence of poison, namely the conversion of logical and/or to bitwise and/or (`c7b537bf09/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L11252)`). Ideally, we would fix this transform, but this will require substantial work to avoid codegen regressions. In the meantime, avoid transferring !range metadata without !noundef, effectively restoring the old !range metadata semantics on the SDAG layer. Fixes https://github.com/llvm/llvm-project/issues/64589. Differential Revision: https://reviews.llvm.org/D157685	2023-08-14 09:04:27 +02:00
Christudasan Devadasan	bd7c6e3c48	[AMDGPU] Precommit lit test for wwm-reg AV spill pseudos D155646.	2023-08-12 16:18:18 +05:30
Konrad Kusiak	4fa8a5487e	[AMDGPU] Add sanity check that fixes bad shift operation in AMD backend There is a problem with the SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to UB. This boolean function decides if two masks can be combined into 1. The idea here is that the bits which are "on" in one mask, don't overlap with the "on" bits of the other. Consider an example (10 bits for simplicity): Mask 1: 0101101000 Mask 2: 0000000110 Those can be combined into a single mask: 0101101110. To check if such an operation is possible, the code takes the mask which is greater and counts how many 0s there are, starting from the LSB and stopping at the first 1. Then, it shifts 1u by this number and compares it with the smaller mask. The problem is that when both masks are 0, the counter will find 32 zeroes in the first mask and will try to do a shift by 32 positions which leads to UB. The fix is a simple sanity check, if the bigger mask is 0 or not. https://reviews.llvm.org/D155051	2023-08-11 15:26:35 -04:00
Mirko Brkusanin	1e5359c6ba	[AMDGPU] Treat KIMM32 and KIMM16 operand types as noninlinable While they represent 32/16 bit immediate values, they are already included in the encoding of the instructions that use them and are not true literals. FMAMK and FMAAK instructions that use them are marked with fixed size so getInstSizeInBytes will not increase the size for these operands. We also add tests whose logic relies on KIMM16 and KIMM32 being considered not inlinable. Differential Revision: https://reviews.llvm.org/D157624	2023-08-11 18:46:39 +02:00
Jeffrey Byrnes	f76ffc1f40	[MCP] Invalidate copy for super register in copy source We must also track the super sources of a copy, otherwise we introduce a sort of subtle bug. Consider: 1. DEF r0:r1 2. USE r1 3. r6:r9 = COPY r10:r13 4. r14:15 = COPY r0:r1 5. USE r6 6. r1:4 = COPY r6:9 BackwardCopyPropagateBlock processes the instructions from bottom up. After processing 6., we will have propagatable copy for r1-r4 and r6-r9. After 5., we invalidate and erase the propagatable copy for r1-r4 and r6 but not for r7-r9. The issue is that when processing 3., data structures still say we have valid copies for dest regs r7-r9 (from 6.). The corresponding defs for these registers in 6. are r1:r4, which we mark as registers to invalidate. When invalidating, we find the copy that corresponds to r1 is 4. (this was added when processing 4.), and we say that r1 now maps to unpropagatable copies. Thus, when we process 2., we do not have a valid copy, but when we process 1. we do -- because the mapped copy for subregister r0 was never invalidated. The net result is to propagate the copy from 4. to 1., and replace DEF r0:r1 with DEF r14:r15. Then, we have a use before def in 2. The main issue is that we have an inconsistent state between which def regs and which src regs are valid. When processing 5., we mark all the defs in 6. as invalid, but only the subreg use as invalid. Either we must only invalidate the individual subreg for both uses and defs, or the super register for both. Differential Revision: https://reviews.llvm.org//D157564 Change-Id: I99d5e0b1a0d735e8ea3bd7d137b6464690aa9486	2023-08-11 09:01:18 -07:00
Jeffrey Byrnes	d0e54e377b	[AMDGPU] Extend CalculateByteProvider to capture vectors and signed Differential Revision: https://reviews.llvm.org/D157133 Change-Id: I9ba8727b4ac5a627de2f7d87d2169eb79e01f0ee	2023-08-11 08:47:17 -07:00
Joe Nash	2fb4bfa5ba	[AMDGPU][True16] Fix ISel for A16 Image Instructions The 16-bit VAddr arguments to A16 image instructions are packed into legal VGPR_32 operands in AMDGPULegalizerInfo::legalizeImageIntrinsic on all subtargets. With True16, we also need to pack if the number of VAddr is one because VGPR_16 is not a legal argument to those Image instructions. No change to emitted code intended on subtargets pre-GFX11, and none on GFX11 until True16 is active. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D157426	2023-08-11 11:12:16 -04:00
Matt Arsenault	8f18cf77e7	AMDGPU: Check for implicit defs before constant folding instruction Can't delete the constant folded instruction if scc is used. Fixes #63986 https://reviews.llvm.org/D157504	2023-08-11 10:29:53 -04:00
Matt Arsenault	1030483561	AMDGPU/GlobalISel: Handle stacksave/stackrestore https://reviews.llvm.org/D156670	2023-08-11 10:25:01 -04:00
Matt Arsenault	9a53f5f5c4	AMDGPU: Handle llvm.stacksave and llvm.stackrestore Not sure if the only valid use is to have stackrestore directly consume stacksave outputs or not. Handled exactly like a regular stack pointer so all the edge cases theoretically should work. https://reviews.llvm.org/D156669	2023-08-11 10:25:01 -04:00
Joe Nash	ef79d9e38e	[AMDGPU][NFC] Regenerate CHECKs as pre-commit for D157426	2023-08-11 09:55:59 -04:00
Matt Arsenault	29fff3e2ab	AMDGPU: Try to select fmul by power of 2 to ldexp For the f64 case, this gives us a cheaper to materialize 32-bit constant. It's less obviously a win for f32 and f16. It forces us to use a VOP3 encoding so it's a neutral code size change. GlobalISel cases don't work because of the constant-is-copy-to-vgpr problem. https://reviews.llvm.org/D157111	2023-08-11 07:57:55 -04:00
Matt Arsenault	c8a4f2a8c1	AMDGPU: Add baseline tests for fmul-to-ldexp patterns We can better handle some multiply-by-power-of-2 patterns as ldexp.	2023-08-11 07:57:55 -04:00
Stanislav Mekhanoshin	02046ad944	[AMDGPU] W/a for gfx940 byte0 fp8 conversion bug VOP1 form of these do not work. Differential Revision: https://reviews.llvm.org/D157683	2023-08-11 02:21:21 -07:00
pvanhout	490a867f16	[GlobalISel] Also set dead flags of implicit defs added by BuildMI BuildMI automatically adds the implicit operands of the instruction. This meant we couldn't set the dead flag on dead implicit defs in that case. Fix it by introducing an opcode to mark a given implicit def as dead. Fixes #64565 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157515	2023-08-11 08:38:37 +02:00
pvanhout	89e91e4c0c	[AMDGPU] Remove post-PromoteAlloca SROA run PromoteAlloca now uses SSAUpdater, it doesn't need SROA to clean-up after it anymore. Internal testing shows no noticeable performance impact. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D156398	2023-08-11 08:29:21 +02:00
Matt Arsenault	7575ee7167	AMDGPU: Add more test coverage for FP-typed atomicrmw xchg	2023-08-10 17:38:25 -04:00
Changpeng Fang	1e22873ef4	[AMDGPU][NFC] Rename two LIT test files	2023-08-10 11:31:14 -07:00
Jay Foad	3091bdb86d	[AMDGPU] Do not release VGPRs at -O0 This was an oversight when the GFX11 early release VGPRs optimization was reimplemented in D153279. Sending the DEALLOC_VGPRS message is a performance optimization so there is no need to do it at -O0. In addition it makes some kinds of post mortem debugging hard or impossible, since VGPR values are no longer available to inspect at the s_endpgm instruction. Differential Revision: https://reviews.llvm.org/D157599	2023-08-10 14:58:06 +01:00
Matt Arsenault	6dbd458128	AMDGPU: Remove pointless libcall optimization of fma/mad After the library is linked and trivially inlined, the generic fma and fmuladd intrinsics already handle these cases, and with precise flag handling. This was requiring all fast math flags when we really just need nsz for the fma(a, b, 0) case. https://reviews.llvm.org/D156677	2023-08-09 19:37:52 -04:00
Matt Arsenault	6448d5ba58	AMDGPU: Remove pointless libcall recognition of native_{divide|recip} This was trying to constant fold these calls, and also turn some of them into a regular fmul/fdiv. There's no point to doing that, the underlying library implementation should be using those in the first place. Even when the library does use the rcp intrinsics, the backend handles constant folding of those. This was also only performing the folds under overly strict fast-everything-is-required conditions. The one possible plus this gained over linking in the library is if you were using all fast math flags, it would propagate them to the new instructions. We could address this in the library by adding more fast math flags to the native implementations. The constant fold case also had no test coverage. https://reviews.llvm.org/D156676	2023-08-09 18:48:46 -04:00
Matt Arsenault	58e87c961e	AMDGPU: Port AMDGPULowerKernelArguments to new pass manager https://reviews.llvm.org/D157498	2023-08-09 18:34:30 -04:00
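
The mask check described in commit 4fa8a5487e can be sketched as a stand-alone C++ function. This is an illustrative reconstruction from the commit message only, not the code in SILoadStoreOptimizer: the names dmasksCanBeCombinedSketch and countTrailingZeros32 are hypothetical, and returning false when the larger mask is 0 is an assumption about how the sanity check resolves that case.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: number of zero bits below the lowest set bit.
// Returns 32 for an input of 0, which is the case that triggered the bug.
static unsigned countTrailingZeros32(uint32_t v) {
    if (v == 0)
        return 32;
    unsigned n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Hypothetical stand-alone version of the check from the commit message:
// the smaller mask must fit entirely below the lowest set bit of the larger.
static bool dmasksCanBeCombinedSketch(uint32_t maskA, uint32_t maskB) {
    const uint32_t maxMask = std::max(maskA, maskB);
    const uint32_t minMask = std::min(maskA, maskB);

    // Sanity check: without it, two zero masks give a shift by 32 below,
    // which is undefined behaviour for a 32-bit operand.
    // Assumption: such masks are treated as not combinable.
    if (maxMask == 0)
        return false;

    const unsigned allowedBits = countTrailingZeros32(maxMask);
    return (1u << allowedBits) > minMask;
}
```

With the example masks from the commit message, 0101101000 (0x168) and 0000000110 (0x6), the larger mask has three trailing zeros, so the smaller mask is compared against 1u << 3 = 8 and the pair is accepted. Two zero masks are rejected by the guard before any shift happens.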

6693 Commits