llvm-project

Author	SHA1	Message	Date
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
James Y Knight	137f785fa6	[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185) This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. While AMDGPU currently disables the use of all libcalls, I've changed it to instead disable all of them _except_ the atomic ones. Those are already emitted by the Clang frontend, and enabling them in the backend allows the same behavior there.	2023-12-18 16:51:06 -05:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Matt Arsenault	db8b85ac58	AMDGPU: Support llvm.exp10 (#65860)	2023-12-02 21:56:35 +07:00
Sander de Smalen	81b7f115fb	[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979) It seems TypeSize is currently broken in the sense that: TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8) without failing its assert that explicitly tests for this case: assert(LHS.Scalable == RHS.Scalable && ...); The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So this is evaluating if the pointer to the function Scalable == the pointer to the function Scalable, which is always true because LHS and RHS have the same class. This patch fixes the issue by renaming `TypeSize::Scalable` -> `TypeSize::getScalable`, as well as `TypeSize::Fixed` to `TypeSize::getFixed`, so that it no longer clashes with the variable in FixedOrScalableQuantity. The new methods now also better match the coding standard, which specifies that: * Variable names should be nouns (as they represent state) * Function names should be verb phrases (as they represent actions)	2023-11-22 08:52:53 +00:00
Acim-Maravic	f3138524db	[AMDGPU] Generic lowering for rint and nearbyint (#69596) There are three different rounding intrinsics that are brought down to the same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>	2023-11-14 18:49:21 +01:00
Diana	7f5d59b38d	[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186) The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call parameters are bundled up into 2 intrinsic arguments, one for those that should go in the SGPRs (the 3rd intrinsic argument), and one for those that should go in the VGPRs (the 4th intrinsic argument). Both will often be some kind of aggregate. Both instruction selection frameworks have some internal representation for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel, ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those because aggregates are dissolved very early on during ISel and we'd lose the inreg information. Therefore, this patch shortcircuits both the IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call from the very start. It tries to use the existing infrastructure as much as possible, by calling into the code for lowering tail calls. This has already gone through a few rounds of review in Phab: Differential Revision: https://reviews.llvm.org/D153761	2023-11-06 12:30:07 +01:00
Changpeng Fang	8ceb72ffe5	[AMDGPU] make v32i16/v32f16 legal (#70484) Some upcoming intrinsics will be using these new types	2023-10-27 15:28:31 -07:00
Pierre van Houtryve	40a426fac6	[AMDGPU] Constant fold FMAD_FTZ (#69443) Solves #68315	2023-10-19 16:05:51 +02:00
Jay Foad	21c2ba4bdb	[GlobalISel] Remove TargetLowering::isConstantUnsignedBitfieldExtractLegal Use LegalizerInfo::isLegalOrCustom instead. Differential Revision: https://reviews.llvm.org/D116807	2023-09-27 15:58:01 +01:00
Matt Arsenault	1328a8534b	AMDGPU: Fix handling of -0 in round lowering (#65761)	2023-09-19 09:14:17 +03:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Matt Arsenault	c48248d7f9	AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp https://reviews.llvm.org/D158130	2023-09-12 23:23:10 +03:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Kazu Hirata	57390c914b	[AMDGPU] Use isNullConstant and isOneConstant (NFC)	2023-08-27 08:26:52 -07:00
Bjorn Pettersson	a23e01ada7	[AMDGPU] Fix -Wenum-compare warnings Avoiding warnings like this when building with GCC: warning: enumeral mismatch in conditional expression: 'llvm::AMDGPUISD::NodeType' vs 'llvm::ISD::NodeType' [-Wenum-compare]	2023-08-23 14:24:30 +02:00
Diana Picus	26dc284498	[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions: * Put `inreg` arguments into SGPRs, starting at s0, and other arguments into VGPRs, starting at v8. No arguments should end up on the stack, if we don't have enough registers we should error out. * Lower the return (which is always void) as an S_ENDPGM. * Set the ScratchRSrc register to s48:51, as described in the docs. * Set the SP to s32, matching amdgpu_gfx. This might be revisited in a future patch. Differential Revision: https://reviews.llvm.org/D153517	2023-08-21 11:16:17 +02:00
Matt Arsenault	81b278e613	AMDGPU: Fix fast f32 exp2 Mirror of the previous log changes, OpenCL conformance doesn't like interpreting afn as ignore denormal handling but was previously hidden by flag dropping.	2023-08-15 10:48:46 -04:00
Matt Arsenault	4b7b4b9458	AMDGPU: Fix fast f32 log/log10 OpenCL conformance didn't like interpreting afn as ignore the denormal handling. https://reviews.llvm.org/D157940	2023-08-15 10:48:46 -04:00
Matt Arsenault	e09b3593ba	AMDGPU: Fix fast math log2 f32 Apparently afn doesn't allow you to drop the denormal handling according to OpenCL conformance. This was hidden by losing the flags during the library linking process. Fast log is still broken and needs more work. https://reviews.llvm.org/D157936	2023-08-15 10:48:46 -04:00
Matt Arsenault	1faa4797ca	AMDGPU: Handle unsafe exp.f32 with denormal handling I somehow missed this path when adding the new expansions. Saves a lot of instructions for afn + IEEE. https://reviews.llvm.org/D157867	2023-08-14 18:36:01 -04:00
Matt Arsenault	9a53f5f5c4	AMDGPU: Handle llvm.stacksave and llvm.stackrestore Not sure if the only valid use is to have stackrestore directly consume stacksave outputs or not. Handled exactly like a regular stack pointer so all the edge cases theoretically should work. https://reviews.llvm.org/D156669	2023-08-11 10:25:01 -04:00
Matt Arsenault	055a7f2512	AMDGPU: Adjust outdated comment	2023-07-31 08:05:13 -04:00
Matt Arsenault	0295513238	AMDGPU: Filter out contract flags when lowering exp It is unsafe to contract the fsub into the fmul. It also increases code size by duplicating a constant.	2023-07-20 18:14:24 -04:00
Matt Arsenault	fbe4ff8149	AMDGPU: Partially fix not respecting dynamic denormal mode The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.	2023-07-11 15:14:52 -04:00
Amara Emerson	3a80bdb316	[GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine. This check was unnecessary/incorrect, it was already being done by the target hook default implementation, and the one in the matcher was checking for a completely different thing. This change: 1) Removes the check and updates affected tests which now do some more reassociations. 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse check. Not sure why I didn't do this the first time.	2023-07-10 01:03:12 -07:00
Matt Arsenault	8ee1cc82c9	AMDGPU: Fold out sign bit ops on frexp_exp The sign bit has no impact on the exponent, so strip these away. Saves on the source modifier encoding cost. I left the GlobalISel handling until there's a resolution to issue #62628. We should do this in instcombine too, but legalization should be introducing more frexps than it currently is where this would occur.	2023-07-06 10:26:21 -04:00
Matt Arsenault	5491666248	AMDGPU: Correctly lower llvm.exp.f32 The library expansion has too many paths for all the permutations of DAZ, unsafe and the 3 exp functions. It's easier to expand it in the backend when we know all of these things. The library currently misses the no-infinity check on the overflow, which this handles optimizing out. Some of the <3 x half> fast tests regress due to vector widening dropping flags which will be fixed separately. Apparently there is no exp10 intrinsic, but there should be. Adds some deadish code in preparation for adding one while I'm following along with the current library expansion.	2023-07-05 17:23:49 -04:00
Matt Arsenault	ed556a1ad5	AMDGPU: Correctly lower llvm.exp2.f32 Previously this did a fast math expansion only.	2023-07-05 17:23:48 -04:00
Matt Arsenault	4e15f378ee	AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32 Previously we expanded these in a fast-math way and the device libraries were relying on this behavior. The libraries have a pending change to switch to the new target intrinsic. Unlike the library version, this takes advantage of no-infinities on the result overflow check.	2023-07-05 15:30:35 -04:00
Matt Arsenault	89ccfa1b39	AMDGPU: Use correct lowering for llvm.log2.f32 We previously directly codegened to v_log_f32, which is broken for denormals. The lowering isn't complicated, you simply need to scale denormal inputs and adjust the result. Note log and log10 are still not accurate enough, and will be fixed separately.	2023-06-23 08:37:37 -04:00
Matt Arsenault	d9333e360a	Revert "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit 1159c670d40e3ef302264c681fe7e0268a550874. Accidentally pushed wrong patch	2023-06-16 18:13:07 -04:00
Matt Arsenault	1159c670d4	AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp	2023-06-16 18:06:27 -04:00
Matt Arsenault	28f3edd2be	AMDGPU: Add llvm.amdgcn.exp2 intrinsic Provide direct access to v_exp_f32 and v_exp_f16, so we can start correctly lowering the generic exp intrinsics. Unfortunately have to break from the usual naming convention of matching the instruction name and stripping the v_ prefix. exp is already taken by the export intrinsic. On the clang builtin side, we have a choice of maintaining the convention to the instruction name, or following the intrinsic name.	2023-06-15 07:00:07 -04:00
Matt Arsenault	d0923a7739	AMDGPU: Correct constants used in fast math log expansion The division between float constants was done with less precision. Performing the divide in double and truncating to float provides the same value as used in the library fast math expansion.	2023-06-12 21:11:41 -04:00
Matt Arsenault	eccc89b26c	AMDGPU: Add llvm.amdgcn.log intrinsic This will map directly to the hardware instruction which does not handle denormals for f32. This will allow moving the generic intrinsic to be lowered correctly. Also handles selecting the f16 version, but there's no reason to use it over the generic intrinsic.	2023-06-12 21:10:30 -04:00
Matt Arsenault	abff7668ab	AMDGPU: Implement known bits functions for min3/max3/med3	2023-06-10 10:58:44 -04:00
Matt Arsenault	4e4c351ae5	AMDGPU: Avoid endpgm in middle of block for fallback trap lowering. This was inserting an s_endpgm in the middle of the block when it has to be a terminator. Split the block and insert a branch to a new block with the trap if it's not in a terminator position. Fixes verifier error on LDS in function with no trap support (and other trap sources).	2023-06-09 21:04:38 -04:00
Amara Emerson	086601eac2	[GlobalISel] Implement some binary reassociations, G_ADD for now - (op (op X, C1), C2) -> (op X, (op C1, C2)) - (op (op X, C1), Y) -> (op (op X, Y), C1) Some code duplication with the G_PTR_ADD reassociations unfortunately but no easy way to avoid it that I can see. Differential Revision: https://reviews.llvm.org/D150230	2023-06-08 21:14:58 -07:00
Matt Arsenault	c01f284fbb	AMDGPU: Fix regressions in integer mad matching Undo the canonicalize done in 0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c. Restores some regressed matching of integer mad. The selection patterns for the actual mads don't seem to be properly commuting, so some of the commuted cases are still missed. Fixes: SWDEV-363009	2023-06-08 16:48:47 -04:00
Matt Arsenault	3d0350b762	AMDGPU: Add MF independent version of getImplicitParameterOffset	2023-06-07 08:26:31 -04:00
Matt Arsenault	bc61bc8d6a	AMDGPU: Use available subtarget member	2023-06-07 08:26:31 -04:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Jay Foad	a4a3ac10cb	[AMDGPU] Remove extract_subvector patterns Removing them seems to slightly increase code quality as well as simplifying both the tablegen and C++ parts of the code. Differential Revision: https://reviews.llvm.org/D149853	2023-06-06 14:04:50 +01:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Elliot Goodrich	ac73c48e09	[llvm] Reduce ComplexDeinterleavingPass.h includes Remove the unnecessary `"llvm/IR/PatternMatch.h"` include directive from `ComplexDeinterleavingPass.h` and move it to the corresponding source file. Add missing includes that were transitively included by this header to 3 other source files. This reduces the total number of preprocessing tokens across the LLVM source files in `lib` from (roughly) 1,964,876,961 to 1,935,091,611 - a reduction of ~1.52%. This should result in a small improvement in compilation time.	2023-05-20 17:49:18 +01:00
Thomas Symalla	91a7aa4c9b	[AMDGPU] Improve abs modifier usage If a call to the llvm.fabs intrinsic has users in another reachable BB, SelectionDAG will not apply the abs modifier to these users and instead generate a v_and ..., 0x7fffffff instruction. For fneg instructions, the issue is similar. This patch implements `AMDGPUIselLowering::shouldSinkOperands`, which allows CodegenPrepare to call `tryToSinkFreeOperands`. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D150347	2023-05-19 12:02:21 +02:00
Philip Reames	0dc0c27989	[TLI] Add IsZero parameter to storeOfVectorConstantIsCheap [nfc] Make the decision to consider zero constant stores cheap target specific. Will be used in an upcoming change for RISCV.	2023-05-17 09:19:01 -07:00
Nicolai Hähnle	ef13308b26	AMDGPU/SDAG: Improve {extract,insert}_subvector lowering for 16-bit vectors v2: - simplify the escape to TableGen patterns Differential Revision: https://reviews.llvm.org/D149841	2023-05-05 10:55:18 +02:00
Sergei Barannikov	e744e51b12	[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196	2023-04-29 21:59:58 +03:00
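
Commit 89ccfa1b39 in the list above says that lowering llvm.log2.f32 correctly only requires scaling denormal inputs and adjusting the result. The following is a minimal Python sketch of that idea, not the backend code. The function names are hypothetical, `hw_log2_flush_denormals` is only a stand-in for a hardware log2 that treats f32 denormal inputs as zero, and the scale factor of 2**32 with a matching offset of 32 is an assumption chosen for illustration.

```python
import math

# Smallest normal f32 magnitude; smaller nonzero values are denormal.
F32_MIN_NORMAL = 2.0 ** -126


def hw_log2_flush_denormals(x):
    """Stand-in for a log2 instruction that flushes f32 denormal inputs to zero."""
    if abs(x) < F32_MIN_NORMAL:
        return -math.inf  # log2 of a (flushed) zero
    if x < 0.0:
        return math.nan
    return math.log2(x)


def log2_with_denormal_scaling(x):
    """Scale denormal inputs into the normal range, then undo the scale on the result."""
    is_denormal = 0.0 < x < F32_MIN_NORMAL
    # Assumed scale: multiplying by 2**32 lifts any f32 denormal into the normal range.
    scaled = x * 2.0 ** 32 if is_denormal else x
    result = hw_log2_flush_denormals(scaled)
    # log2(x * 2**32) == log2(x) + 32, so subtract the offset again.
    return result - 32.0 if is_denormal else result
```

Normal inputs pass through unchanged, so the extra cost in this sketch is one comparison, one multiply, and one subtract on the denormal path.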


570 Commits