llvm-project

Author	SHA1	Message	Date
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
Nicolai Hähnle	84610a82a1	AMDGPU: Add AMDGPUSubtarget::getEUsPerCU() We will use this for more accurate occupancy computations. Note that IsaInfo takes WGP mode vs. CU mode into account on gfx10+. Differential Revision: https://reviews.llvm.org/D139467	2023-01-23 21:43:05 +01:00
Joe Nash	18ed279a3a	[AMDGPU] gfx11 subtarget features & early tests Tablegen definitions for subtarget features and cpp predicate functions to access the features. New Sub-TargetProcessors and common latencies. Simple changes to MIR codegen tests which pass on gfx11 because they have the same output as previous subtargets or operate on pseudo instructions which are reused from previous subtargets. Contributors: Jay Foad <jay.foad@amd.com> Petar Avramovic <Petar.Avramovic@amd.com> Patch 4/N for upstreaming of AMDGPU gfx11 architecture Depends on D124538 Reviewed By: Petar.Avramovic, foad Differential Revision: https://reviews.llvm.org/D125261	2022-05-11 10:31:49 -04:00
Jay Foad	8a53b25ed5	[AMDGPU] Use default member initializers in Subtarget classes Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtarget.h but forgetting to initialize it to false in AMDGPUSubtarget.cpp. This was mostly autogenerated by: clang-tidy -checks=-,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/Subtarget.cpp Differential Revision: https://reviews.llvm.org/D123613	2022-04-12 16:42:30 +01:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Matt Arsenault	89c447e4e6	AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal This was inheriting the mesa behavior, and as far as I know nobody is using opencl kernels with amdpal. The isMesaKernel check was irrelevant because this property needs to be held for all functions.	2022-01-20 12:12:05 -05:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Matt Arsenault	ec55dcedce	AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.	2021-09-21 22:57:17 -04:00
David Stuttard	b8173c3178	[AMDGPU] Stop mulhi from doing 24 bit mul for uniform values Added support to check if architecture supports s_mulhi which is used as part of the decision whether or not to use valu 24 bit mul (if the mulhi gets transformed to a valu op anyway, then may as well use it). This is an extension of the work in D97063 Differential Revision: https://reviews.llvm.org/D103321 Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14	2021-07-05 10:33:23 +01:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Praveen Velliengiri	61177943c9	[AMDGPU] Use MUBUF instructions for global address space access Currently, the compiler crashes in instruction selection of global load/stores in gfx600 due to the lack of FLAT instructions. This patch fixes the crash by selecting MUBUF instructions for global load/stores in gfx600. Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com> Reviewed by: t-tye Differential revision: https://reviews.llvm.org/D92483	2020-12-24 10:13:04 +00:00
Stanislav Mekhanoshin	cf6565f6d0	[AMDGPU] Enable multi-dword flat scratch load/stores Differential Revision: https://reviews.llvm.org/D91384	2020-11-12 13:38:56 -08:00
Jay Foad	830ed64ccd	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit 8b08fa0103c8d8e624b19fad5a5006e7a783ecb7. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Michael Liao	0d092303b4	[amdgpu] Enable use of AA during codegen. - Add an internal option `-amdgpu-use-aa-in-codegen` to enable or disable this feature. By default, it's enabled. Differential Revision: https://reviews.llvm.org/D89320	2020-10-27 09:46:23 -04:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Matt Arsenault	1ed4caff1d	AMDGPU: Lower the threshold reported for maximum stack size exceeded Check the actual maximum supported stack size for a kernel.	2020-10-21 12:06:27 -04:00
Stanislav Mekhanoshin	6ddadf9901	[AMDGPU] flat scratch ST addressing mode on gfx10 GFX10 enables third addressing mode for flat scratch instructions, an ST mode. In that mode both register operands are omitted and only swizzled offset is used in addition to flat_scratch base. Differential Revision: https://reviews.llvm.org/D89501	2020-10-19 15:29:52 -07:00
Tony	e2af9bd611	[AMDGPU] Correct comment typo in AMDGPUSubtarget.h	2020-10-16 08:49:02 +00:00
Konstantin Zhuravlyov	3fdf3b1539	AMDGPU: Update AMDHSA code object version handling Differential Revision: https://reviews.llvm.org/D89076	2020-10-14 13:04:27 -04:00
Rodrigo Dominguez	f71f5f39f6	[AMDGPU] Implement hardware bug workaround for image instructions Summary: This implements a workaround for a hardware bug in gfx8 and gfx9, where register usage is not estimated correctly for image_store and image_gather4 instructions when D16 is used. Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81172	2020-10-07 07:39:52 -04:00
Mirko Brkusanin	8b08fa0103	Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access" This reverts commit f5cd7ec9f3fc969ff5e1feed961996844333de3b. Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.	2020-09-29 15:33:34 +02:00
Matt Arsenault	af1c1e20f4	AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize	2020-08-27 19:39:44 -04:00
Matt Arsenault	a1bc37c9e5	AMDGPU: Use caller subtarget, not intrinsic declaration Intrinsic declarations use the default subtarget, but this should be using the subtarget for the calling function. I haven't been able to come up with a case where it matters though.	2020-08-27 16:42:09 -04:00
Matt Arsenault	70cd9f5b77	AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr Handle workitem intrinsics. There isn't really a way to adequately test this right now, since none of the known bits users are fine grained enough to test the edge conditions. This triggers a number of instances of the new 64-bit to 32-bit shift combine in the existing tests.	2020-08-24 09:53:27 -04:00
Mirko Brkusanin	0654ff703d	[AMDGPU] Use ds_read/write_b96/b128 when possible for SDag Do not break down local loads and stores so ds_read/write_b96/b128 in ISelLowering can be selected on subtargets that support them and if align requirements allow them. Differential Revision: https://reviews.llvm.org/D84403	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	5bd1febe21	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similarly to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line. This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all targets as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Matt Arsenault	2414bab5d7	AMDGPU/GlobalISel: Remove old hacks for boolean selection There were various hacks used to try to avoid making s1 SGPR vs. s1 VCC ambiguous after constraining the register before we had a strategy to deal with this. This also attempted to handle undef operands, which are now illegal gMIR.	2020-08-03 09:04:14 -04:00
Matt Arsenault	a4edc04693	AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat We also have never handled this for SelectionDAG, which needs additional work.	2020-07-28 11:18:05 -04:00
Dmitry Preobrazhensky	2e87acac9b	[AMDGPU] Removed s_mov_regrd and mov_fed opcodes These opcodes are not intended for public use. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81659	2020-07-17 19:52:54 +03:00
Matt Arsenault	6f09bb7da2	AMDGPU: Don't pass MachineFunction if only the IR Function is used	2020-06-18 11:06:46 -04:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Sebastian Neubauer	29a6ad94fd	[AMDGPU] Add G16 support to image instructions Add G16 feature for GFX10 and support A16 and G16 in GlobalISel. Differential Revision: https://reviews.llvm.org/D76836	2020-06-12 11:26:31 +02:00
Matt Arsenault	a8f7209255	AMDGPU: Change internal tracking of wave size Store the log2 wave size instead of forcing division and log2 operations when querying either.	2020-06-01 17:55:08 -04:00
Matt Arsenault	2484109378	AMDGPU/GlobalISel: Add boilerplate for inline asm lowering Test mostly from minor adjustments to the AArch64 one.	2020-05-29 16:49:23 -04:00
Matt Arsenault	2d2627d47a	AMDGPU: Remove fp-exceptions feature This was never used, and the only thing it changed was removed in 284472be6da3353d81dfd25b1ac4218e852d1e5f. The floating point mode is also not a property of the subtarget.	2020-05-29 15:19:59 -04:00
Jay Foad	42a5560503	[AMDGPU] New SIInsertHardClauses pass Enable clausing of memory loads on gfx10 by adding a new pass to insert the s_clause instructions that mark the start of each hard clause. Differential Revision: https://reviews.llvm.org/D79792	2020-05-14 18:54:49 +01:00
Fraser Cormack	c819ef9653	Provide operand indices to adjustSchedDependency This allows targets to know exactly which operands are contributing to the dependency, which is required for targets with per-operand scheduling models. Differential Revision: https://reviews.llvm.org/D77135	2020-04-17 11:08:44 +01:00
Matt Arsenault	221890d709	AMDGPU: Add feature for fast f32 denormals	2020-04-04 20:01:24 -04:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Jay Foad	0fe096c4e9	[AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup Summary: I think Max in the name was misleading. NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76860	2020-03-26 20:21:04 +00:00
Jay Foad	596bed3fd3	[AMDGPU] Remove unused methods. NFC.	2020-03-26 16:11:35 +00:00
Ram Nalamothu	24698e526f	Implement wave32 DWARF register mapping Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst. This enables generating appropriate DWARF register numbers for wave64 and wave32 modes.	2020-03-23 10:24:16 -04:00
Sebastian Neubauer	8756869170	[AMDGPU] Add a16 feature to gfx10 Based on D72931 This adds a new feature called A16 which is enabled for gfx10. gfx9 keeps the R128A16 feature so it can share all the instruction encodings with gfx7/8. Differential Revision: https://reviews.llvm.org/D73956	2020-02-10 09:04:23 +01:00
Matt Arsenault	5aa6e246a1	AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI Use cmp ord instead of cmp_class compared to the DAG version for the nan check, but mostly try to match the existing pattern. I think the sign doesn't matter for fract, so we could do a little better with the source modifier matching. I think this is also still broken as in D22898, but I'm leaving it as-is for now while I don't have an SI system to test on.	2020-02-05 14:32:01 -05:00
Jay Foad	4a331beadc	[AMDGPU] Fix vccz after v_readlane/v_readfirstlane to vcc_lo/hi Summary: Up to gfx9, writes to vcc_lo and vcc_hi by instructions like v_readlane and v_readfirstlane do not update vccz to reflect the new value of vcc. Fix it by reusing part of the existing vccz bug handling code, which inserts an "s_mov_b64 vcc, vcc" instruction to restore vccz just before an instruction that needs the correct value. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69661	2020-01-28 10:52:17 +00:00
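
Commit a8f7209255 above describes storing the log2 of the wave size so that queries need neither a division nor a log2 computation. A minimal sketch of that idea follows. The class `WaveSizeInfo` and its method `wavesForThreads` are hypothetical names chosen for illustration and are not taken from the LLVM sources; the values 5 and 6 assume wave32 and wave64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; this is not the actual AMDGPUSubtarget code.
class WaveSizeInfo {
  // Assumed encoding: 5 for wave32, 6 for wave64.
  uint8_t WavefrontSizeLog2;

public:
  explicit constexpr WaveSizeInfo(uint8_t Log2) : WavefrontSizeLog2(Log2) {}

  // The log2 value is stored directly, so returning it costs nothing.
  constexpr unsigned getWavefrontSizeLog2() const { return WavefrontSizeLog2; }

  // The wave size is recovered with a single shift.
  constexpr unsigned getWavefrontSize() const {
    return 1u << WavefrontSizeLog2;
  }

  // Number of waves needed to cover NumThreads, rounded up.
  // A shift replaces the division by the wave size.
  constexpr unsigned wavesForThreads(unsigned NumThreads) const {
    return (NumThreads + getWavefrontSize() - 1) >> WavefrontSizeLog2;
  }
};
```

With this layout, both the wave size and its log2 are available without a division, and a rounded-up thread-to-wave conversion reduces to an add and a shift.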

273 Commits