llvm-project

Author	SHA1	Message	Date
Piyou Chen	b01c006f73	[TII][RISCV] Add renamable bit to copyPhysReg (#91179 ) The renamable flag is useful during MachineCopyPropagation, but the renamable flag will be dropped after lowerCopy in some cases. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.	2024-08-27 10:08:43 +08:00
Juan Manuel Martinez Caamaño	cbf34a5f77	[AMDGPU] Remove dead pass: AMDGPUMachineCFGStructurizer (#105645 )	2024-08-23 14:06:17 +02:00
Carl Ritson	fc6300a5f7	[AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395 ) Prevent operand folding from inlining constants into pseudo scalar transcendental f16 instructions. However still allow literal constants.	2024-08-17 16:52:38 +09:00
Ivan Kosarev	f0fe6c66cb	[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888 ) And declare it to take an MCRegister. Also rename related entities and remove a comment for the function that depending on its purpose is either irrelevant or misleading.	2024-08-14 17:04:15 +01:00
Brox Chen	ae059a1f9f	[AMDGPU][True16][CodeGen] support v_mov_b16 and v_swap_b16 in true16 format (#102198 ) support v_swap_b16 in true16 format. update tableGen pattern and folding for v_mov_b16. --------- Co-authored-by: guochen2 <guochen2@amd.com>	2024-08-08 16:52:59 -04:00
Matt Arsenault	ca409892c5	AMDGPU: Permit more frame index operands in verifier (#101691 ) Treat FI operands more like a register. When it gets materialized, we will typically need to introduce a scavenged register anyway. Add baseline tests for folding frame indexes into add/or.	2024-08-05 21:34:58 +04:00
Carl Ritson	62aa596ba1	[AMDGPU] Add no return image_sample intrinsics and instructions (#97542 ) An appropriately configured image resource descriptor can trigger image_sample instructions to store outputs directly to a linked memory location instead of returning to VGPRs. This is opaque to the backend as instruction encoding is unchanged; however, a mechanism is required to allow frontends to communicate that these instructions do not require destination VGPRs and store to memory. Flagging these as stores means they will not be optimized away.	2024-07-20 17:26:58 +09:00
Jay Foad	0ce3ea1bff	[AMDGPU] Simplify selection of llvm.amdgcn.inverse.ballot. NFCI. (#99345 )	2024-07-18 07:45:13 +01:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Jay Foad	f10a78b7e4	[AMDGPU] clang-tidy: use std::make_unique. NFC.	2024-07-17 07:58:09 +01:00
Carl Ritson	befd44bcdc	[AMDGPU] Update hasUnwantedEffectsWhenEXECEmpty (#97982 ) Add barriers and s_wait_event to hasUnwantedEffectsWhenEXECEmpty. Add a comment documenting the current expected use of the function.	2024-07-17 11:30:44 +09:00
Jay Foad	74b87b02d2	[AMDGPU] Fix and add namespace closing comments. NFC.	2024-07-16 16:56:31 +01:00
Jay Foad	ff81bbede4	[AMDGPU] Concatenate nested namespaces. NFC.	2024-07-16 15:36:49 +01:00
Matt Arsenault	d8b63b680d	AMDGPU: Don't fold clamp/omod modifiers without nofpexcept (#95950 )	2024-06-18 23:55:49 +02:00
Jay Foad	6b4760acc7	[AMDGPU] Make use of composeSubRegIndices. NFCI. (#95548 ) Simplify SIInstrInfo::buildExtractSubReg by building one COPY with a composed subreg index instead of two COPYs.	2024-06-14 15:07:04 +01:00
Carl Ritson	db096adba0	[AMDGPU] Remove SIWholeQuadMode pseudo wavemode optimization (#94133 ) This does not work correctly in divergent control flow. Can be replaced with a later exec mask manipulation optimizer. This reverts commit a3646ec1bc662e221c2a1d182987257c50958789.	2024-06-12 15:51:40 +09:00
Matt Arsenault	b263033c2b	AMDGPU: Remove arbitrary SCC liveness scan threshold (#94097 )	2024-06-01 14:19:38 +02:00
Matt Arsenault	5f243b3fff	AMDGPU: Generalize instruction shrinking code (#93810 ) Try to avoid referring to specific operand names, except in the special case. The special case for hasNamedOperand(Op32, sdst) seems to have been dead code.	2024-05-30 19:25:04 +02:00
Emma Pilkington	9e0be65f24	[AMDGPU] Fix broken MIR generated by gfx11 simulated trap lowering (#91652 ) This was breaking the CFG connection between uses of virtual registers after the trap and their definitions before it. Fixes SWDEV-460384. Fixes a bug in #85854.	2024-05-22 10:55:19 -04:00
Petar Avramovic	f5e49279c0	AMDGPU: fix isSafeToSink expecting exactly one predecessor (#89224 ) isSafeToSink needs to check if machine cycle has divergent exit branch but first it needs the MBB that contains cycle exit branch. Early-tailduplication can delete exit block created by structurize-cfg so there is still exactly one cycle exit block but the new cycle exit block can have multiple predecessors. Simplify search for MBBs that contain cycle exit branch by introducing helper method getExitingBlocks in GenericCycle. Fixes #89200	2024-05-10 13:02:05 +02:00
Stanislav Mekhanoshin	57216f7bd6	[AMDGPU] Support byte_sel modifier for v_cvt_f32_fp8 and v_cvt_f32_bf8 (#90887 )	2024-05-02 12:03:51 -07:00
David Stuttard	2914a11e3f	[AMDGPU] Fix hard clausing for image instructions on gfx12 (#90221 ) Also updated hard-clauses.mir to have separate versions for gfx11 and gfx12 since the MIR instructions are different for each of them.	2024-04-29 11:42:36 +01:00
Emma Pilkington	a04714701f	[AMDGPU] Add a trap lowering workaround for gfx11 (#85854 ) On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated as a nop, which means it isn't a correct lowering for the trap intrinsic. As a workaround, this commit instead lowers the trap intrinsic to instructions that simulate the behavior of s_trap 2. Fixes: SWDEV-438421	2024-04-24 09:43:54 -04:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659. There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support, but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and is not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
Carl Ritson	d9e6aa7048	[AMDGPU] Update LiveInterval def index for early-clobber (#79285 ) On converting an instruction to an early-clobber definition in convertToThreeAddress, we must also update live intervals for the register to start at the early-clobber index.	2024-03-11 14:54:11 +09:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402 ) The current implementation of `isInlinableLiteral16` assumes a 16-bit inlinable literal is either an `i16` or a `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, `i16`, `fp16`, `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on its own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
Shilei Tian	e9c1dbb408	Revert "[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )" This reverts commit 530f0e64ec11327879c44f2fd55c7c28efdbaa2d because it breaks downstream.	2024-03-06 08:42:54 -05:00
Pierre van Houtryve	52d5b8e02d	[AMDGPU] Don't form sext/abs/neg fp8 cvt (#83843 ) gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants. Fixes SWDEV-447468	2024-03-06 10:38:20 +01:00
Shilei Tian	530f0e64ec	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#81345 )	2024-03-04 08:40:42 -05:00
Shilei Tian	46734aa1e5	[AMDGPU] Use `bf16` instead of `i16` for bfloat (#80908 ) Currently we generally use `i16` to represent `bf16` in those tablegen files. This patch is trying to use `bf16` directly. Fix #79369.	2024-02-16 15:58:30 -05:00
Corbin Robeck	2d9f350449	[AMDGPU] Consolidate SGPRSpill and VGPRSpill into single Spill bit (#81901 ) Follow on to #81525 in the series of consolidating bits in TSFlags. Merge SGPRSpill and VGPRSpill into single Spill bit. Modify isSGPRSpill and isVGPRSpill helper functions to differentiate VGPR and SGPR spills: Spill+SALU=SGPR Spill, Spill+VALU=VGPR Spill. The only exception here is SGPR spills to VGPRs, which require an explicit instruction check.	2024-02-16 13:32:59 -05:00
Konstantin Zhuravlyov	fcef407aa2	AMDGPU/NFC: Remove some bits from TSFlags (#81525 ) - AMDGPU/NFC: Purge SOPK_ZEXT from TSFlags - Moved to helper function in SIInstInfo - AMDGPU/NFC: Purge VOPAsmPrefer32Bit from TSFlags - This flag did not make sense / remnants of something else I think	2024-02-12 16:43:48 -05:00
Philip Reames	3ff7caea33	[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339 )	2024-02-01 17:52:35 -08:00
Shengchen Kan	550f0eb2ce	[NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86	2024-01-26 20:50:58 +08:00
Jay Foad	e89a7c41ba	[AMDGPU] Update comment on SIInstrInfo::isLegalFLATOffset for GFX12	2024-01-19 15:53:06 +00:00
Jay Foad	80ccc72ec7	[AMDGPU] Remove GFX12 encoding hack (#78702 ) This is no longer needed now that we have implemented GFX12 encoding for all instructions.	2024-01-19 12:19:29 +00:00
Jay Foad	9ca36932b5	[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186 )	2024-01-18 10:23:27 +00:00
Ivan Kosarev	2a869ced61	[AMDGPU][True16] Support V_FLOOR_F16. (#78446 )	2024-01-18 08:43:47 +00:00
Mirko Brkušanin	1d286ad59b	[AMDGPU] Add mark last scratch load pass (#75512 )	2024-01-18 09:36:44 +01:00
Jay Foad	9d8e53818d	[AMDGPU] Refactor getNonSoftWaitcntOpcode and its callers (#77933 ) This avoids listing all soft waitcnt opcodes in two places (getNonSoftWaitcntOpcode and isSoftWaitcnt) and avoids the need for helpers isWaitcnt and isWaitcntVsCnt.	2024-01-12 17:12:09 +00:00
Ivan Kosarev	084f1c2ee0	[AMDGPU][True16] Support V_CEIL_F16. (#73108 ) As not all fake instructions have their real counterparts implemented yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow both fake and real True16 instructions in assembler and disassembler tests in the -mattr=+real-true16 mode during the transition period. Source DPP and destination VOPDstOperand_t16 operands are still not supported and will be addressed separately.	2024-01-10 08:46:19 +00:00
Alex Bradbury	197214e39b	[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710 ) This follows on from #76708, allowing `cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just `N->getAsZExtVal();` Introduced via `git grep -l "cast<ConstantSDNode>\(.\).getZExtValue" | xargs sed -E -i 's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and then using `git clang-format` on the result.	2024-01-09 12:25:17 +00:00
Jay Foad	daa4728dee	[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825 )	2024-01-08 19:13:38 +00:00
Nicolai Hähnle	49b492048a	AMDGPU: Fix packed 16-bit inline constants (#76522 ) Consistently treat packed 16-bit operands as 32-bit values, because that's really what they are. The attempt to treat them differently was ultimately incorrect and led to miscompiles, e.g. when using non-splat constants such as (1, 0) as operands. Recognize 32-bit float constants for i/u16 instructions. This is a bit odd conceptually, but it matches HW behavior and SP3. Remove isFoldableLiteralV216; there was too much magic in the dependency between it and its use in SIFoldOperands. Instead, we now simply rely on checking whether a constant is an inline constant, and trying a bunch of permutations of the low and high halves. This is more obviously correct and leads to some new cases where inline constants are used as shown by tests. Move the logic for switching packed add vs. sub into SIFoldOperands. This has two benefits: all logic that optimizes for inline constants in packed math is now in one place; and it applies to both SelectionDAG and GISel paths. Disable the use of opsel with v_dot* instructions on gfx11. They are documented to ignore opsel on src0 and src1. It may be interesting to re-enable the use of opsel on src2 as a future optimization. A similar "proper" fix of what inline constants mean could potentially be applied to unpacked 16-bit ops. However, it's less clear what the benefit would be, and there are surely places where we'd have to carefully audit whether values are properly sign- or zero-extended. It is best to keep such a change separate. Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)	2024-01-04 00:10:15 +01:00
Pierre van Houtryve	33565750e4	[AMDGPU] Fix moveToValu for copy to phys SGPRs (#76715 ) Fixes #76031	2024-01-02 14:45:33 +01:00
Alex Bradbury	80aeb62211	[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708 ) This helper function shortens examples like `cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to `Node->getConstantOperandVal(1);`. Implemented with: `git grep -l "cast<ConstantSDNode>\(.->getOperand\(.\)\)->getZExtValue\(\)" | xargs sed -E -i 's/cast<ConstantSDNode>\((.)->getOperand\((.)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/'` and `git grep -l "cast<ConstantSDNode>\(.\.getOperand\(.\)\)->getZExtValue\(\)" | xargs sed -E -i 's/cast<ConstantSDNode>\((.)\.getOperand\((.)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`. With a couple of simple manual fixes needed. Result then processed by `git clang-format`.	2024-01-02 13:14:28 +00:00
Ivan Kosarev	8c6172b0ac	[AMDGPU][True16] Don't use the VGPR_LO/HI16 register classes. (#76440 ) Removing the classes requires updating tests and so is planned to be done with a separate change.	2023-12-28 11:48:25 +00:00
Acim Maravic	48f36c6e74	[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158 ) Update DAG ISel to support 64-bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 --------- Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>	2023-12-25 11:55:20 +01:00
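
Note on commit e963d0740e: the message observes that `fp16` and `bf16` cannot be told apart by looking at the 16-bit value alone. The following is a minimal Python sketch of that point, not code from the patch. The helper names `decode_fp16` and `decode_bf16` are hypothetical and exist only for this illustration; the decoding relies on the standard IEEE half-precision layout and on bfloat16 being the upper 16 bits of an IEEE single-precision float.

```python
import struct


def decode_fp16(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 half-precision value."""
    # 'H' packs an unsigned 16-bit integer; 'e' unpacks a half-precision float.
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def decode_bf16(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16 (top half of a float32)."""
    # Shift the pattern into the upper 16 bits of a 32-bit word, then
    # reinterpret that word as a single-precision float.
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    # The same bit pattern names two different numbers depending on the type.
    for pattern in (0x3C00, 0x3F80):
        print(hex(pattern), decode_fp16(pattern), decode_bf16(pattern))
```

The pattern 0x3C00 is 1.0 as `fp16`, while 0x3F80 is 1.0 as `bf16`, so a check that receives only the raw bits has to be told which type is intended. This is consistent with the commit's choice to split the check into separate `i16`, `fp16`, and `bf16` versions.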

856 Commits