llvm-project

Author	SHA1	Message	Date
Bjorn Pettersson	21a6890856	[Vectorize] Clean up Transforms/Vectorize.h Removed definitions of vectorizeBasicBlock and VectorizeConfig (possibly a remnant from the BBVectorize pass that was removed way back in 2017). Also reduced the amount of include dependencies on Transforms/Vectorize.h.	2023-04-17 13:54:19 +02:00
Kazu Hirata	bfaf0e7858	[AMDGPU] Modernize Status and BlockData (NFC) Identified with modernize-use-default-member-init.	2023-04-16 13:03:02 -07:00
Jay Foad	6b5067a81a	[AMDGPU] Don't assert that image intrinsics are supported Unsupported intrinsics should give a regular "cannot select" error. Differential Revision: https://reviews.llvm.org/D148147	2023-04-16 19:54:55 +01:00
NAKAMURA Takumi	7d5d987e93	[CMake] Reorder and reformat deps	2023-04-17 00:32:16 +09:00
Kazu Hirata	972983539b	[llvm] Apply fixes from readability-redundant-control-flow (NFC)	2023-04-16 00:13:46 -07:00
Kazu Hirata	4241d890ae	[Target] Use range-based for loops (NFC)	2023-04-15 14:14:56 -07:00
Akshay Khadse	842dc35fc9	Guard against dereferencing a nullptr In `lib/CodeGen/PrologEpilogInserter.cpp` file, `RS` is assigned via `RS = TRI->requiresRegisterScavenging(MF) ? new RegScavenger() : nullptr;`. This means that `RS` can be `nullptr`. While executing the `TFI->processFunctionBeforeFrameFinalized(MF, RS);`, the `RS` can be dereferenced in the call `RS->enterBasicBlock(MBB);` in file `lib/Target/AMDGPU/SIFrameLowering.cpp` Reviewed By: skan, arsenm Differential Revision: https://reviews.llvm.org/D146791	2023-04-15 11:30:43 +08:00
pvanhout	b3b3cb2d2f	[AMDGPU] Less aggressively break large PHIs In some cases, breaking large PHIs can very negatively affect performance (3x more instructions observed in a particular test case). This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases. e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason. Fixes SWDEV-392803 Fixes SWDEV-393781 Fixes SWDEV-394228 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147786	2023-04-14 15:41:26 +02:00
David Stuttard	fc83f1de5d	[AMDGPU] Add backend support for new PAL ELF Metadata 3.0 PAL Metadata 3.0 introduces an explicit structure in metadata for the programmable registers written out by the compiler backend, rather than using opaque registers, which can change between architectures and require encoding the bitfield information (which may change between versions) in the backend. This is the initial minimal implementation that enables the use of PAL Metadata 3.0. The change itself should be NFC for non-PAL, although the way the RSRC2 register is handled has been changed slightly. The test is fairly minimal, but checks that the metadata format looks as expected and verifies a couple of special cases such as tgid_[xyz]_en handling and PsInputAddr/Ena, which also change to explicit fields. Differential Revision: https://reviews.llvm.org/D147143	2023-04-14 09:57:13 +01:00
Diana Picus	b9ba05360e	[AMDGPU] Don't S_MOV_B32 into $scc The peephole optimizer tries to replace ``` %n:sgpr_32 = S_MOV_B32 x $scc = COPY %n ``` with a `S_MOV_B32` directly into `$scc`. This crashes because `S_MOV_B32` cannot take `$scc` as input. We currently generate code like this from GlobalISel when lowering a G_BRCOND with a constant condition. We should probably look into removing this kind of branch altogether, but until then we should at least not crash. This patch fixes the issue by making sure we don't apply the peephole optimization when trying to move into a physical register that doesn't belong to the correct register class. Differential Revision: https://reviews.llvm.org/D148117	2023-04-14 10:24:43 +02:00
Jay Foad	2d39f5b5cd	[AMDGPU] Allow use of TTMP registers in AMDGPUResourceUsageAnalysis With architected SGPRs, workgroup IDs are passed into a compute shader in TTMP registers. Allow for this in AMDGPUResourceUsageAnalysis instead of failing an assertion. Differential Revision: https://reviews.llvm.org/D148239	2023-04-13 16:56:22 +01:00
Jay Foad	cf736e2325	[AMDGPU] Avoid else-after-return in isLegalAddressingMode. NFC.	2023-04-13 16:55:58 +01:00
Simon Pilgrim	9e30b87afb	[TTI] getMinMaxReductionCost - add FastMathFlag argument Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>). This will be necessary to correctly recognize fast/nnan fmax/fmul reductions, which can avoid NaN handling. That will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6). Differential Revision: https://reviews.llvm.org/D148149	2023-04-13 10:42:42 +01:00
pvanhout	fd1d60873f	[AMDGPU] Remove CC exception for Promote Alloca Limits Apparently it was used to work around some issue that has been fixed. Removing it helps with high scratch usage observed in some cases due to failed alloca promotion. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D145586	2023-04-13 08:48:34 +02:00
Matt Arsenault	f608ac6286	AMDGPU: Push fneg into bitcast of integer select Avoids some regressions in the math libraries in a future patch.	2023-04-12 06:48:58 -04:00
Michael Liao	72fc08a541	[InstCombine] Teach alloca replacement to handle `addrspacecast` - As the address space cast may not be valid on a specific target, `addrspacecast` is not handled when an `alloca` can be replaced with the source of memcpy/memmove. This patch addresses that by querying a target hook on whether that address space cast is valid. For example, on most GPU targets, the cast from a global pointer to a generic pointer is valid. - If that cast is allowed (by querying `isValidAddrSpaceCast`), the replacement is enhanced to handle that `addrspacecast` as well. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D147025	2023-04-11 11:47:37 -04:00
Matt Arsenault	0f59720e1c	AMDGPU: Fold fneg into bitcast of build_vector The math libraries have a lot of code that performs manual sign bit operations by bitcasting doubles to int2 and doing bithacking on them. This is a bad canonical form we should rewrite to use high level sign operations directly on double. To avoid codegen regressions, we need to do a better job moving fnegs to operate only on the high 32-bits. This is only halfway to fixing the real case.	2023-04-11 07:12:01 -04:00
Jon Chesterfield	e17c1bb494	[amdgpu][nfc] Update comments on LDS lowering	2023-04-11 10:48:19 +01:00
Diana Picus	d9bf8aba23	[AMDGPU] Add MMOs for GFX11 Streamout Instructions The GFX11 NGG Streamout Instructions perform atomic operations on dedicated registers. At the moment, they lack machine memory operands, which causes the si-memory-legalizer pass to treat them conservatively and introduce several unnecessary waits and cache invalidations. This patch introduces a new address space to represent these special registers and teaches instruction selection to add memory operands with this new address space to DS_ADD/SUB_GS_REG_RTN. Since this address space is meant to be compiler-internal, we move it up a bit from the other address spaces and give it the number 128. According to the LLVM Language Reference, address space numbers can go all the way up to 2^24, but I'm not sure how well this is supported in practice [1], so using a smaller number seems safer. [1] `0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)` Differential Revision: https://reviews.llvm.org/D146031	2023-04-11 11:11:32 +02:00
Diana Picus	f8861ea023	clang-format file I'm about to touch. NFCI	2023-04-11 11:11:32 +02:00
pvanhout	0ff02cf015	[AMDGPU] Hide "removing function" diagnostics by default Use an `OptimizationRemark` for them even though it's not really an optimization. It just integrates better with the other diagnostics (enabling them is easy with `-pass-remarks`). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147703	2023-04-11 09:26:20 +02:00
Kazu Hirata	53ead5215b	[Target] Use isNullConstant and isOneConstant (NFC)	2023-04-10 18:23:07 -07:00
Changpeng Fang	3bc1e084ee	AMDGPU: Created a subclass for the return address operand in the tail call return instruction Summary: This is to avoid using the callee saved registers for the return address of the tail call return instruction. Reviewers: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147096	2023-04-10 10:53:33 -07:00
Jay Foad	f34a1953ce	[AMDGPU] Fix AddedComplexity for s_buffer_load patterns. NFCI. We set AddedComplexity = 100 for s_load patterns to prefer them over global loads, but for s_buffer_load patterns there is no need to do this and it was quietly overriding the AddedComplexity of each individual GCNPat that is defined inside SMLoad_Pattern (but in practice that did not appear to make any difference). Differential Revision: https://reviews.llvm.org/D145396	2023-04-10 17:26:50 +01:00
mmarjano	f6e70ed1c7	[AMDGPU] Extend tbuffer_load_format merge Add support for merging _IDXEN and _BOTHEN variants of TBUFFER_LOAD_FORMAT instruction.	2023-04-10 12:24:21 +02:00
skc7	b434051dc8	[AMDGPU] Introduce SIInstrWorklist to process instructions in moveToVALU Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147168	2023-04-10 11:34:14 +05:30
Chen Zheng	a3d5ec51ba	[AMDGPU][Global-ISel] reuse extension related patterns in td file However, the imported rules cannot be used for now because GlobalISel's selectImpl() seems to have a bug/limitation that creates an illegal COPY from VGPR to SGPR. For now, work around this by not auto-selecting these patterns. Fixes #61468 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147780	2023-04-10 02:11:33 +00:00
Matt Arsenault	2a9f1dad2c	AMDGPU: Fix LiveVariables verifier error for values defined before SI_END_CF GlobalISel happens to insert some constant materializations before SI_END_CF in one test. These need to be excluded from AliveBlocks since they are defined in the original block and used in the split block, so they aren't fully alive through either block. The case where a value defined in the first block was originally used in a later block is still broken. Avoids a verifier error in a future patch.	2023-04-08 07:05:35 -04:00
Matt Arsenault	7ac3ab34cb	AMDGPU: Fix missing MIR serialization for PSInputAddr/PSInputEnable Resuming any mir test for a pixel shader would assert in the AsmPrinter.	2023-04-08 07:05:35 -04:00
Jay Foad	985e24cc16	[AMDGPU] Fix a case of updating LiveIntervals in SIOptimizeExecMaskingPreRA This was causing two test failures when I applied D129208 to enable extra verification of LiveIntervals: LLVM :: CodeGen/AMDGPU/optimize-negated-cond-exec-masking-wave32.mir LLVM :: CodeGen/AMDGPU/optimize-negated-cond-exec-masking.mir Differential Revision: https://reviews.llvm.org/D147721	2023-04-08 08:39:21 +01:00
Jay Foad	b2e98c18cc	[AMDGPU] Fix comment in SIOptimizeExecMaskingPreRA	2023-04-07 11:07:50 +01:00
Ruiling Song	2ab6835f28	AMDGPU: mark SET_INACTIVE_* as convergent operation set_inactive is actually a kind of operation that passes certain values from active threads to inactive threads. In a later WWM operation, the newly activated threads, which were previously disabled, read the values passed to them by the set_inactive operation. So I think set_inactive is a convergent operation. Differential Revision: https://reviews.llvm.org/D147683	2023-04-07 09:10:43 +08:00
Nico Weber	72e01ef1f1	Revert "[AMDGPU] Add Lower Bound to PipelineSolver" This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9. Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724	2023-04-06 12:35:44 -04:00
Alexis Engelke	0c049ea60a	[MC] Always encode instruction into SmallVector All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream, however, incurs some overhead for the actual encoding. This change allows an MCCodeEmitter to directly emit an instruction into a SmallVector without using a raw_ostream and therefore allows for performance improvements in encoding. A default path that uses existing raw_ostream implementations is provided. Reviewed By: MaskRay, Amir Differential Revision: https://reviews.llvm.org/D145791	2023-04-06 16:21:49 +02:00
Jessica Del	04317d4da7	[AMDGPU][GISel] Add inverse ballot intrinsic The inverse ballot intrinsic takes in a boolean mask for all lanes and returns the boolean for the current lane. See SPIR-V's `subgroupInverseBallot()` in the GL_KHR_shader_subgroup extension (https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt). This allows decision making via branch and select instructions with a manually manipulated mask. Implemented in GlobalISel and SelectionDAG, since currently both are supported. The SelectionDAG required pseudo instructions to use the custom inserter. The boolean mask needs to be uniform for all lanes. Therefore we expect SGPR input. In case the source is in a VGPR, we insert one or more `v_readfirstlane` instructions. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D146287	2023-04-06 07:46:50 +02:00
Jeff Byrnes	3c42a58c4f	[AMDGPU] Add Lower Bound to PipelineSolver	2023-04-05 14:54:59 -07:00
Jon Chesterfield	3c76e5f0c8	[amdgpu][nfc] Remove dead code associated with LDS lowering Pass disabled since approximately D104962 for miscompiling openmp The functions under ReplaceConstant miscompile phis as noted in D112717 and have no users in tree other than the disabled pass. It seems likely it has no users out of tree. Deletes the test cases associated with the disabled pass as well. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D147586	2023-04-05 22:24:22 +01:00
Jon Chesterfield	0507448d82	[amdgpu] Implement dynamic LDS accesses from non-kernel functions The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so. 1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel 2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables 3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen 4/ The assembler builds the lookup table using the metadata 5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144233	2023-04-04 20:06:34 +01:00
Changpeng Fang	282b8ac1f0	Revert "AMDGPU: Created a subclass for the return address operand in the tail call return instruction" This reverts commit 461a559bc9bd755436ba8f12f8b74757e03f9b9f.	2023-04-04 11:44:52 -07:00
Changpeng Fang	461a559bc9	AMDGPU: Created a subclass for the return address operand in the tail call return instruction Summary: This is to avoid using the callee saved registers for the return address of the tail call return instruction. Reviewers: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147096	2023-04-04 10:56:58 -07:00
Craig Topper	1f60c8d025	[IR] Replace calls to ConstantFP::getNullValue with ConstantFP::getZero. NFC There is no getNullValue in ConstantFP. Due to inheritance, we're calling Constant::getNullValue which handles any type including FP. Since we already know we want an FP constant we can use ConstantFP::getZero which might be faster and is a more readable name for an FP zero.	2023-04-03 23:14:02 -07:00
Jon Chesterfield	62951784f0	[amdgpu][nfc] Refactor prior to D144233 to remove noise from diff	2023-04-03 16:47:01 +01:00
Craig Topper	219ff07f72	[Targets] Rename Flag->Glue. NFC Long long ago Glue was called Flag, and it was never completely renamed.	2023-04-02 19:28:51 -07:00
Aaron Ballman	ec1f5b947e	Revert "AMDGPU: Created a subclass for the return address operand in the tail call return instruction" This reverts commit 7a98934fadc3581ff024a77dc696b62f1a538ad5. This appears to have broken several bots, including: https://lab.llvm.org/buildbot/#/builders/42/builds/9472	2023-04-01 10:50:59 -04:00
Simon Pilgrim	8153b92d9b	[DAG] Add SelectionDAG::SplitScalar helper Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source. Differential Revision: https://reviews.llvm.org/D147264	2023-03-31 18:35:40 +01:00
Jay Foad	bdf52b5dfe	[AMDGPU] Don't bother to use OffsetMode to define Real SMEM instructions Various Real classes took an OffsetMode parameter, but only used it to extract the suffix for the name of the corresponding pseudo. I found this confusing because you couldn't usefully define and use a different OffsetMode here, e.g. one with different operand types to affect how the instruction was printed. Overall I think it's simpler to just pass in the suffixed pseudo name directly. Differential Revision: https://reviews.llvm.org/D147242	2023-03-31 15:00:30 +01:00
Jay Foad	8bad806f29	[AMDGPU] Do not reserve 16-bit registers There should be no need to reserve all SGPR hi16/lo16 halves, or all AGPR hi16 halves. This should be done by marking the corresponding register classes as not allocatable instead. Differential Revision: https://reviews.llvm.org/D147158	2023-03-31 14:56:27 +01:00
pvanhout	16e1f8e970	Revert "[AMDGPU] Select v_sat_pk_u8_i16" This reverts commit 64b45db34a0cd979dae9ca3016e9da517e57b987. Reason: the patterns are wrong which can result in a miscompilation. However, fixing the pattern is not trivial due to how i8 values are handled, and due to the additional type-checking performed by D147127: trunc/smax/smin are all defined as int ops in the DAG despite them working on vectors too. As this is not a much-needed pattern, I prefer reverting for now until I can find time to properly rewrite the pattern.	2023-03-31 12:39:49 +02:00
Jay Foad	6b6303ac00	[AMDGPU] Fix whitespace after D147216	2023-03-31 11:10:25 +01:00
Changpeng Fang	7a98934fad	AMDGPU: Created a subclass for the return address operand in the tail call return instruction Summary: This is to avoid using the callee saved registers for the return address of the tail call return instruction. Reviewers: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147096	2023-03-30 11:13:21 -07:00

7812 Commits