llvm-project

Author	SHA1	Message	Date
Matt Arsenault	2f5a116cf7	AMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8 If we have legal f16 instructions but no f16 med3, we can save one instruction by expanding out the min/max sequence compared to casting to f32 and casting back.	2023-05-23 08:48:25 +01:00
Mateja Marjanovic	cf76074a36	[AMDGPU][GlobalISel] Check exact width in get*ClassForBitWidth and widen if necessary Instead of checking if the given bitwidth is less or equal to a bitwidth of an existing RegClass, check if it has the exact same value. For LLVM vector types that don't have a corresponding Register Class, widen them during legalization. That goes for G_EXTRACT_VECTOR_ELT, G_INSERT_VECTOR_ELT and G_BUILD_VECTOR. Differential revision: https://reviews.llvm.org/D148096 Reviewers: foad, arsenm	2023-05-03 17:32:24 +02:00
Mateja Marjanovic	6175ec0bb6	Revert "[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR" This reverts commit b25c7cafcbe1b52ea2d1ff5e5c2f13674b5f297d.	2023-05-03 17:28:01 +02:00
Mateja Marjanovic	b25c7cafcb	[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR Widen the vector operand type in G_BUILD_VECTOR, G_INSERT_VECTOR_ELT, G_EXTRACT_VECTOR_ELT to the nearest larger RegClass.	2023-05-03 17:14:38 +02:00
Changpeng Fang	3bc1e084ee	AMDGPU: Created a subclass for the return address operand in the tail call return instruction Summary: This is to avoid using the callee saved registers for the return address of the tail call return instruction. Reviewers: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147096	2023-04-10 10:53:33 -07:00
Jon Chesterfield	0507448d82	[amdgpu] Implement dynamic LDS accesses from non-kernel functions The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so. 1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel 2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables 3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen 4/ The assembler builds the lookup table using the metadata 5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144233	2023-04-04 20:06:34 +01:00
Mariusz Sikora	ea064ee2a3	[AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions Introducing Subtarget Features for instructions: - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 Differential Revision: https://reviews.llvm.org/D146701	2023-03-24 13:10:40 +01:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr then the NSA format allows then the final register can act as a contigous set of remaining addresses. Update legalizer to pack register for this new format and allow instruction selection to use NSA encoding when number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Piotr Sobczak	a3d7b3121c	[AMDGPU][NFC] Add getMaxMUBUFImmOffset Replace magic constant 4095 with the function getMaxMUBUFImmOffset(). Differential Revision: https://reviews.llvm.org/D144623	2023-02-23 11:29:59 +01:00
Jessica Del	fc672b6a8b	[AMDGPU] Improved wide multiplies These checks show optimized instructions if an operand is known to be (partially) zero. Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a Differential Revision: https://reviews.llvm.org/D140208	2023-02-22 16:39:06 +01:00
Jay Foad	62e4f81c67	[AMDGPU] Simplify widenScalar condition for BigTy for G_(UN)MERGE_VALUES Differential Revision: https://reviews.llvm.org/D144250	2023-02-17 11:12:43 +00:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC) This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Kazu Hirata	64dad4ba9a	Use llvm::bit_cast (NFC)	2023-02-14 01:22:12 -08:00
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level folow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Mirko Brkusanin	43924cbd29	[AMDGPU][GlobalISel] Fix selection of image sample g16 instructions Pre-GFX10 A16 modifier would imply G16. From GFX10 and onwards there are separate instructions for 16bit gradients. This fixes the condition for selecting G16 opcodes. Also stop adding G16 flag to instructions that do not use gradients for GFX10 onwards.	2023-02-09 16:26:55 +01:00
Matt Arsenault	6ce86a7eff	AMDGPU: Ensure flat loads are broken into dword in functions We were assuming we could rely on the flat scratch init detection to imply if there are possible flat addressed stack objects, which doesn't work outside of a kernel. We should have a way to prove if a given flat access can't access the stack. We could use a not-stack parameter attribute to avoid these splits. Make the minimally correct change for GlobalISel; I'll address this better in my larger patch to rewrite load and store legalization. Fixes: SWDEV-218237	2023-02-05 05:25:15 -04:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code object version, That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Kazu Hirata	f20b5071f3	[llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC)	2023-01-28 09:06:31 -08:00
Matt Arsenault	93ec3fa402	AMDGPU: Support atomicrmw uinc_wrap/udec_wrap For now keep the exising intrinsics working.	2023-01-27 22:17:16 -04:00
Kazu Hirata	caa99a01f5	Use llvm::popcount instead of llvm::countPopulation(NFC)	2023-01-22 12:48:51 -08:00
Kazu Hirata	188ec33726	[llvm] Use llvm::bit_width (NFC)	2023-01-21 14:48:32 -08:00
Joe Loser	a288d7f937	[llvm][ADT] Replace uses of `makeMutableArrayRef` with deduction guides Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the same for `makeMutableArrayRef`. Once all of the places in-tree are using the deduction guides for `MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated. Differential Revision: https://reviews.llvm.org/D141814	2023-01-16 14:49:37 -07:00
Diana Picus	f95a5fbe7c	MachineIRBuilder: Rename buildMerge. NFC `buildMerge` may build a G_MERGE_VALUES, G_BUILD_VECTOR or G_CONCAT_VECTORS. Rename it to `buildMergeLikeInstr`. This is a follow-up suggested in https://reviews.llvm.org/D140964 Differential Revision: https://reviews.llvm.org/D141372	2023-01-13 09:32:58 +01:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Ivan Kosarev	85dada81e3	[AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138215	2022-12-19 11:39:08 +00:00
Matt Arsenault	012a85296b	AMDGPU/GlobalISel: Use ptrtoint to legalize constant 32-bit addrspacecast This was trying to merge 2 32-bit pointers into a 64-bit pointer. The artifact combiner was assuming merges to pointers use scalar sources, and ended up inserting invalid bitcast from a pointer to a scalar. It should probably be a verifier error to have pointer merge sources with a pointer result. Fixes verifier errors with EXPENSIVE_CHECKS.	2022-12-18 13:15:58 -05:00
Matt Arsenault	9d6003c764	AMDGPU: Lower addrspacecast on gfx6 Fixes inconsistent handling of constant-32bit case. Turns out we can lower all the casts just fine, it's just accessing the flat results that's a problem.	2022-12-18 08:02:45 -05:00
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Haojian Wu	50daddf279	Fix an -Wunused-variable warning in release build, NFC	2022-12-07 18:59:17 +01:00
Mirko Brkusanin	fe42ebe442	[AMDGPU][GlobalISel] Fix legalizing image intrinsics for new types We no longer need to increase vector size to 16 for intrinsics that use more than 8 vgprs for addr. There is no image intrinsic that needs more than 12 so all currently existing cases will be covered. Using incorrect size was causing an error in instruction selection because instructions were updated to require new types (9x32, 10x32, 11x32, 12x32). Differential Revision: https://reviews.llvm.org/D139546	2022-12-07 18:20:58 +01:00
Janek van Oirschot	587747d8d1	[AMDGPU] G_IS_FPCLASS lower() support for IEEE fp types Simplified globalisel version of sdag's expandIS_FPCLASS. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D139128	2022-12-07 11:53:09 +00:00
Pierre van Houtryve	a88deb4b65	[AMDGPU] Use aperture registers instead of S_GETREG Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register). Patch based on D31874 by @kzhuravl Depends on D137767 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137542	2022-11-30 12:25:10 +00:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Janek van Oirschot	322966f8f8	[AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic for llvm.is.fpclass	2022-11-28 16:00:36 -05:00
Ivan Kosarev	ec8ede8177	[AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones. Differential Revision: https://reviews.llvm.org/D138215	2022-11-24 10:50:26 +00:00
Matt Arsenault	1fe1299a93	GlobalISel: Legalize strict_fsub In the future should probably have a more convenient way to switch between building strict and non-strict ops.	2022-11-18 15:21:41 -08:00
Matt Arsenault	fe5b9a6a11	AMDGPU/GlobalISel: Make strict fadd, fmul and fma legal	2022-11-17 20:50:04 -08:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. Subsequent patch will start using an actual value of speed in the load/store vectorizer to compare if a vectorized access going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Yashwant Singh	1c9a93ae3a	[GlobalIsel][AMDGPU] Changing legalize rule for G_{UADDO\|UADDE\|USUBO\|USUBE\|SADDE\|SSUBE} Generic add and sub with carry are now legalized in a way to explicitly calculate carry/borrow output. i.e %6:_(s64), %7:_(s1) = G_UADDO %0, %1 becomes, %13:_(s32), %14:_(s1) = G_UADDO %2, %4 %15:_(s32), %16:_(s1) = G_UADDE %3, %5, %14 %6:_(s64) = G_MERGE_VALUES %13(s32), %15(s32) %7:_(s1) = G_ICMP intpred(ult), %6(s64), %1 Here G_MERGE and G_ICMP instructions are redundant for recalculating carry output. (Similar case for sub with borrow) This change fix this. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D137932	2022-11-14 23:42:23 +05:30
Jay Foad	5073ae2a88	[AMDGPU] Fix duplicated words in comments	2022-11-03 15:33:30 +00:00
Pierre van Houtryve	bb71079e30	[AMDGPU][GISel] Add missing V2S16 BUILD_VECTOR_TRUNC legalization Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135149	2022-10-06 06:48:53 +00:00
Pierre van Houtryve	c93104073c	[AMDGPU] Always lower SHUFFLE_VECTOR Make it illegal, remove InstructionSelector logic for it Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134967	2022-10-04 14:23:17 +00:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Carl Ritson	266b5dbc5d	[AMDGPU] Add MIMG NSA threshold configuration attribute Make MIMG NSA minimum addresses threshold an attribute that can be set on a function or configured via command line. This enables frontend tuning which allows increased NSA usage where beneficial. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D134780	2022-09-28 20:03:18 +09:00
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD which corresponds to llvm-ir's atomicrmw fadd instruction. global and flat atomic fadd patterns changes: Split rtn/no-rtn patterns Add missing patterns or fix predicates Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors). Patterns now check addrspace of pointer, added patterns for flat intrinsic. with global addrspace pointer that selects into global atomic instruction. buffer atomic fadd patterns changes: Rdit patterns to import into global-isel. Remove gfx6/gfx7 _addr64 and _offset patterns. Remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Petar Avramovic	5cee9047d5	AMDGPU: Improve atomicrmw fadd selection Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11 as for gfx90a. Add missing globalisel legalizer support for flat atomicrmw fadd f32 on gfx940 and gfx11. Isel support for gfx11 will be added in D130579. Differential Revision: https://reviews.llvm.org/D131560	2022-09-23 17:52:10 +02:00

587 Commits