llvm-project

Author	SHA1	Message	Date
Thorsten Schütt	deefe3fbc9	[GlobalIsel] Post-review combine ADDO (#85961 ) https://github.com/llvm/llvm-project/pull/82927	2024-03-21 03:56:40 +01:00
Jonas Paulsson	9ebd329ad8	Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 )" This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92. Reverting due to verifier complaints with expensive checks on build-bot.	2024-03-20 11:48:30 -04:00
Jonas Paulsson	05bde30585	Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 ) Have the verifier report a missing AdjustsStack flag rather than waiting until PEI asserts.	2024-03-20 10:29:12 -04:00
Pravin Jagtap	e52a687871	[AMDGPU][NFC] Test clean up (#85922 ) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-20 17:29:42 +05:30
Pravin Jagtap	070d1e8321	[AMDGPU] Add test for fpext & fptrunc with bf16. (#85909 ) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-20 14:45:38 +05:30
Peter Rong	4a026b5092	[AMDGCN] Use ZExt when handling indices in insertelement (#85718 ) When i1 true is used as an index, SExt extends it to i32 -1. This would cause BitVector to overflow. The language manual specifies that the index shall be treated as an unsigned number; this patch fixes that. (https://llvm.org/docs/LangRef.html#insertelement-instruction) This patch fixes #85717 --------- Signed-off-by: Peter Rong <PeterRong96@gmail.com>	2024-03-19 21:44:08 -07:00
Changpeng Fang	ab76052fa9	AMDGPU: Treat SWMMAC the same as MFMA and other WMMA for sched_barrier (#85721 )	2024-03-19 09:58:09 -07:00
Pravin Jagtap	08701e35ed	[AMDGPU][NFC] Test clean up. (#85775 ) Added common check for DPP and Iterative strategies for uniform value case since optimization applied is same. Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-19 18:00:34 +05:30
Pierre van Houtryve	953c13b5c9	[AMDGPU][PromoteAlloca] Whole-function alloca promotion to vector (#84735 ) Update PromoteAllocaToVector so it considers the whole function before promoting allocas. Allocas are scored & sorted so the highest value ones are seen first. The budget is now per function instead of per alloca. Passed internal performance testing.	2024-03-19 11:49:22 +01:00
Jonas Paulsson	09bc6abba6	[MachineFrameInfo] Refactoring around computeMaxcallFrameSize() (NFC) (#78001 ) - Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code. - Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().	2024-03-18 10:37:59 -04:00
Yingwei Zheng	38a44bdc93	[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572 ) In commit `2b582440c1`, we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better analysis/codegen (See also the discussion in https://github.com/llvm/llvm-project/pull/76338). This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass` is not supported by the target, it will be expanded by TLI. Fixes the regression introduced by `2b582440c1` and https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.	2024-03-18 18:27:45 +08:00
pvanhout	3493438605	Revert "[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333 )" This reverts commit 9b98692eedb78aa106539c36ba02944f32cae1ff.	2024-03-18 11:18:57 +01:00
Pierre van Houtryve	9b98692eed	[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333 ) This change allows us to use `--lto-partitions` in some cases (not at all guaranteed it works perfectly), as LDS is lowered before the module is split for parallel codegen. We must run LowerLDS before splitting modules as it needs to see all callers of functions with LDS to properly lower them.	2024-03-18 09:09:43 +01:00
Sameer Sahasrabuddhe	ec34699f75	[GlobalISel] convergence control tokens and intrinsics (#67006 ) [GlobalISel] Implement convergence control tokens and intrinsics in GMIR In the IR translator, convert the LLVM token type to LLT::token(), which is an alias for the s0 type. These show up as implicit uses on convergent operations. Differential Revision: https://reviews.llvm.org/D158147	2024-03-18 10:34:11 +05:30
Jay Foad	092999e70b	[AMDGPU] Update checks in new test after #85370	2024-03-15 14:10:30 +00:00
Matt Arsenault	9b5d9a81bd	AMDGPU: Regenerate test checks from c7c561ef9 The test output changed after initial commit/test in 5f774619eac5db73398225a4c924a9c1d437fb40	2024-03-15 16:27:42 +05:30
Matt Arsenault	c7c561ef98	AMDGPU: Enable ExpandLargeFpConvert for > 64-bit types Fixes casts between double/float/half and i128. The pass seems to be broken for bfloat though. I also believe we could have a better implementation which attempts to make use of the native 32-bit conversion instructions like the 64-bit expansion does.	2024-03-15 16:08:39 +05:30
Thorsten Schütt	5f774619ea	[GlobalIsel] Combine ADDO (#82927 ) Perform the requested arithmetic and produce a carry output in addition to the normal result. Clang has them as builtins (__builtin_add_overflow_p). The middle end has intrinsics for them (sadd_with_overflow). AArch64: ADDS Add and set flags On Neoverse V2, they run at half the throughput of basic arithmetic and have a limited set of pipelines.	2024-03-14 12:45:19 +01:00
Carl Ritson	c29b265eb9	Reapply "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104 )" This reverts commit 7d508eb5d38f4bbbab4230a666d9e742e271af61.	2024-03-14 10:56:43 +09:00
Harald van Dijk	ceb744eb2f	[AMDGPU] Fix canonicalization of truncated values. (#83054 ) We were relying on roundings to implicitly canonicalize, which is generally safe, except with roundings that may be optimized away. Fixes #82937.	2024-03-13 12:08:39 +00:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Matt Arsenault	bd72ebd8d1	AMDGPU: Add some more mfma hazard recognizer tests (#84727 )	2024-03-12 22:05:47 +05:30
Jake Egan	fa1d13590c	[AIX][tests] Disable failing tests on AIX These new tests are failing on the AIX bot because the -I option isn't supported. Disable these tests for now until they can be fixed.	2024-03-12 12:11:18 -04:00
Pierre van Houtryve	d4569d42b5	[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729 ) If all variables in the module are absolute, this means we're running the pass again on an already lowered module, and that works. If none of them are absolute, lowering can proceed as usual. Only diagnose cases where we have a mix of absolute/non-absolute GVs, which means we added LDS GVs after lowering, which is broken. See #81491 Split from #75333	2024-03-11 09:20:01 +01:00
AtariDreams	4e0e9b17c6	[SelectionDAG] Switch to LiveRegUnits (#84197 )	2024-03-11 12:47:39 +05:30
Carl Ritson	4a21e3afa2	[LiveIntervals] repairIntervalsInRange: recompute width changes (#78564 ) Extend repairIntervalsInRange to completely recompute the interval for a register if subregister defs exist without precise subrange matches (LaneMask exactly matching subregister). This occurs when register sequences are lowered to copies such that the size of the copies does not match any uses of the subregisters formed (i.e. during twoaddressinstruction). The subranges without this change are probably legal, but do not match those generated by live interval computation. This creates problems with other code that assumes subranges precisely cover all subregisters defined, e.g. shrinkToUses().	2024-03-11 15:24:17 +09:00
Carl Ritson	d9e6aa7048	[AMDGPU] Update LiveInterval def index for early-clobber (#79285 ) On converting an instruction to an early-clobber definition in convertToThreeAddress, we must also update live intervals for the register to start at the early-clobber index.	2024-03-11 14:54:11 +09:00
Jay Foad	fd3eaf76ba	[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352 )	2024-03-09 09:07:22 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes a 16-bit inlinable literal is either an `i16` or an `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Pierre van Houtryve	4b1910b11d	[GlobalISel][AMDGPU] Import patterns with multiple defs (#84171 ) Fixes #63216	2024-03-08 09:39:10 +01:00
Fangrui Song	66bd3cd75b	[AMDGPU,test] Change llc -march= to -mtriple= PR #75982 had been created before these tests were added, therefore some test were not updated.	2024-03-07 19:09:18 -08:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on its own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
Krzysztof Drewniak	6540f1635a	[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952 ) This commit adds the -lower-buffer-fat-pointers pass, which is applicable to all AMDGCN compilations. The purpose of this pass is to remove the type `ptr addrspace(7)` from incoming IR. This must be done at the LLVM IR level because `ptr addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled by SelectionDAG. The detailed operation of the pass is described in comments, but, in summary, the removal proceeds by: 1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of i160 (including vectors and aggregates). This is needed because the in-register representation of these pointers will stop matching their in-memory representation in step 2, and so ptrtoint/inttoptr operations are used to preserve the expected memory layout 2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two parts of a buffer fat pointer (the 128-bit address space 8 resource and the 32-bit address space 6 offset) visible in the IR. This also impacts the argument and return types of functions. 3. Splitting the resource and offset parts. All instructions that produce or consume buffer fat pointers (like GEP or load) are rewritten to produce or consume the resource and offset parts separately. For example, GEP updates the offset part of the result and a load uses the resource and offset parts to populate the relevant llvm.amdgcn.raw.ptr.buffer.load intrinsic call. At the end of this process, the original mutated instructions are replaced by their new split counterparts, ensuring no invalidly-typed IR escapes this pass. (For operations like call, where the struct form is needed, insertelement operations are inserted). 
Compared to LGC's PatchBufferOp ( `32cda89776/lgc/patch/PatchBufferOp.cpp` ): this pass - Also handles vectors of ptr addrspace(7)s - Also handles function boundaries - Includes the same uniform buffer optimization for loops and conditionals - Does not handle memcpy() and friends (this is future work) - Does not break up large loads and stores into smaller parts. This should be handled by extending the legalization of .buffer.{load,store} to handle larger types by producing multiple instructions (the same way ordinary LOAD and STORE are legalized). That work is planned for a followup commit. - Does not have special logic for handling divergent buffer descriptors. The logic in LGC is, as far as I can tell, incorrect in general, and, per discussions with @nhaehnle, isn't widely used. Therefore, divergent descriptors are handled with waterfall loops later in legalization. As a final matter, this commit updates atomic expansion to treat buffer operations analogously to global ones. (One question for reviewers: is the new pass in the right place? Should it be later in the pipeline?) Differential Revision: https://reviews.llvm.org/D158463	2024-03-06 09:49:58 -06:00
Mirko Brkušanin	1fd1f4c0e1	[AMDGPU] Handle amdgpu.last.use metadata (#83816 ) Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.	2024-03-06 16:33:52 +01:00
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name 'amdgpu_code_object_version' was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.	2024-03-06 09:51:48 -05:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
Shilei Tian	e9c1dbb408	Revert "[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )" This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks downstream.	2024-03-06 08:42:54 -05:00
Pierre van Houtryve	52d5b8e02d	[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843 ) gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants. Fixes SWDEV-447468	2024-03-06 10:38:20 +01:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
Noah Goldstein	17162b61c2	[KnownBits] Make `nuw` and `nsw` support in `computeForAddSub` optimal Just some improvements that should hopefully strengthen analysis. Closes #83580	2024-03-05 12:59:58 -06:00
bcahoon	4cf8b298cf	[AMDGPU][PromoteAlloca] Correctly handle a variable vector index (#83597 ) The promote alloca to vector transformation assumes that the vector index is a constant value. If it is not a constant, then either an assert occurs or the transformation generates an incorrect index.	2024-03-05 08:18:17 -06:00
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Mirko Brkušanin	27ce5121ee	[AMDGPU] Fix setting nontemporal in memory legalizer (#83815 ) Iterator MI can advance in insertWait() but we need original instruction to set temporal hint. Just move it before handling volatile.	2024-03-04 15:05:31 +01:00
Shilei Tian	530f0e64ec	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )	2024-03-04 08:40:42 -05:00
Mirko Brkušanin	982e9022ca	[AMDGPU] Add GFX12 memory legalizer tests (#83814 )	2024-03-04 11:22:04 +01:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
Bjorn Pettersson	da591d390e	[GlobalISel][TableGen] Take first result for multi-output instructions (#81130 ) Previously, tblgen would reject patterns where one of its nested instructions produced more than one result. These arise when the instruction definition contains 'outs' as well as 'Defs'. This patch fixes that by always taking the first result, which is how these situations are handled in SelectionDAG. Original patch: https://reviews.llvm.org/D86617 Continued as: https://github.com/llvm/llvm-project/pull/81130	2024-03-02 20:10:02 +01:00
Pierre van Houtryve	756166e342	[AMDGPU] Improve detection of non-null addrspacecast operands (#82311 ) Use IR analysis to infer when an addrspacecast operand is nonnull, then lower it to an intrinsic that the DAG can use to skip the null check. I did this using an intrinsic as it's non-intrusive. An alternative would have been to allow something like `!nonnull` on `addrspacecast` then lower that to a custom opcode (or add an operand to the addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just one target's use case IMO. I'm hoping that when we switch to GISel that we can move all this logic to the MIR level without losing info, but currently the DAG doesn't see enough so we need to act in CGP. Fixes: SWDEV-316445	2024-03-01 14:01:10 +01:00
Nick Anderson	ba8e9ace13	[AMDGPU] promote i1 arg type for amdgpu_cs (#82971 ) fixes #68087 Not sure where to put regression tests for this pr? Also, should i1 args not in reg also be promoted?	2024-03-01 14:25:46 +05:30
Leon Clark	5b07fd4799	[AMDGPU] Fix OpenCL conformance test failures for ctlz. (#83170 ) Remove LSH transform and restore previous lowering. Fixes conformance issue in [77615](https://github.com/llvm/llvm-project/pull/77615) where OpenCL integer_ops tests fail for integer_clz. Co-authored-by: Leon Clark <leoclark@amd.com>	2024-02-29 22:28:13 +00:00


7260 Commits
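Commit 4a026b5092 turns on the difference between sign- and zero-extending an `i1` index. The following is a minimal Python sketch of that arithmetic only; the helpers `zext`, `sext` and `as_signed` and the 4-element vector width are illustrative assumptions, not LLVM APIs. Sign extension copies the single set bit of `i1 true` into every new high bit, producing the all-ones 32-bit pattern (signed -1), whereas zero extension yields 1.

```python
def zext(value: int, from_bits: int, to_bits: int) -> int:
    """Zero-extend: keep the low from_bits bits and fill new high bits with 0."""
    assert to_bits >= from_bits
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend: replicate the sign bit into the new high bits.
    The result is returned as an unsigned to_bits-wide bit pattern."""
    assert to_bits >= from_bits
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Set every bit above the original width.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def as_signed(pattern: int, bits: int) -> int:
    """Reinterpret a bits-wide pattern as a two's-complement integer."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern


NUM_ELEMENTS = 4  # hypothetical vector width, for illustration only
true_i1 = 1
print(zext(true_i1, 1, 32))                  # 1
print(hex(sext(true_i1, 1, 32)))             # 0xffffffff
print(as_signed(sext(true_i1, 1, 32), 32))   # -1
print(sext(true_i1, 1, 32) < NUM_ELEMENTS)   # False: far out of range
```

Because the LangRef defines the `insertelement` index as unsigned, zero extension is the reading that preserves the index value 1.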
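Commit 5f774619ea describes ADDO as an add that produces an overflow (carry) output in addition to the normal result. The following is a minimal Python sketch of those semantics for fixed-width integers, modelled on the documented behavior of LLVM's `llvm.uadd.with.overflow` and `llvm.sadd.with.overflow` intrinsics. The function names are illustrative, and the sketch says nothing about how the GlobalISel combine itself is written.

```python
from typing import Tuple


def to_signed(x: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def uadd_with_overflow(a: int, b: int, bits: int) -> Tuple[int, bool]:
    """Unsigned add: the result wraps modulo 2**bits; the flag is the carry out."""
    mask = (1 << bits) - 1
    full = (a & mask) + (b & mask)
    return full & mask, full > mask


def sadd_with_overflow(a: int, b: int, bits: int) -> Tuple[int, bool]:
    """Signed add: the flag is set when the exact sum does not fit in `bits`
    bits of two's complement; the returned value is the wrapped result."""
    exact = to_signed(a, bits) + to_signed(b, bits)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return to_signed(exact, bits), not (lo <= exact <= hi)


print(uadd_with_overflow(200, 100, 8))   # (44, True)
print(sadd_with_overflow(100, 100, 8))   # (-56, True)
print(sadd_with_overflow(-100, 50, 8))   # (-50, False)
```

Both variants compute the same wrapped sum; only the overflow condition differs. That is why the signed and unsigned forms are separate operations, for example G_SADDO and G_UADDO in GlobalISel.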
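Commit e963d0740e notes that `fp16` and `bf16` cannot be told apart by looking at the 16-bit value alone. The following minimal Python sketch shows why: the same bit pattern decodes to different numbers under the two formats. A check such as "is this constant 1.0?" therefore has to know which format is meant. The helper names are illustrative, and the AMDGPU inline-constant tables themselves are not reproduced here.

```python
import struct


def decode_fp16(bits: int) -> float:
    """Interpret a 16-bit pattern as IEEE 754 binary16 (half precision)."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def decode_bf16(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16, i.e. the upper half of a binary32."""
    return struct.unpack("<f", (bits << 16).to_bytes(4, "little"))[0]


for pattern in (0x3C00, 0x3F80):
    print(hex(pattern), decode_fp16(pattern), decode_bf16(pattern))
# 0x3c00 1.0 0.0078125
# 0x3f80 1.875 1.0
```

Here 0x3C00 is 1.0 only as `fp16`, and 0x3F80 is 1.0 only as `bf16`. This is the ambiguity that motivated splitting the check into type-specific versions.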