llvm-project

Author	SHA1	Message	Date
Kazu Hirata	7ada7bbee1	[Target] Use *{Set,Map}::contains (NFC)	2023-03-14 18:06:55 -07:00
pvanhout	1f1fea6c38	Reland: [DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 14:38:45 +01:00
pvanhout	0e79106fc9	Revert "[DAG/AMDGPU] Use UniformityAnalysis in DAGISel" This reverts commit 0022b5803fd4f5a4e9fcf233267c0ffa1b88f763.	2023-03-14 11:48:58 +01:00
pvanhout	0022b5803f	[DAG/AMDGPU] Use UniformityAnalysis in DAGISel Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis. No explosions seen during internal testing so this looks like a smooth transition. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D145918	2023-03-14 11:18:28 +01:00
Jay Foad	dc3882eace	[AMDGPU] Fix .amdhsa_shared_vgpr_count error checking for GFX11 Differential Revision: https://reviews.llvm.org/D145936	2023-03-14 09:05:32 +00:00
Jay Foad	23b0df72d2	[AMDGPU] Remove BoolToList class Replace all: foreach _ = BoolToList<cond>.ret in with: if cond then Thanks to Philip Reames for D145711 which enabled this.	2023-03-13 09:22:52 +00:00
Simon Pilgrim	9041682d2c	[DAG] Remove redundant isZExtFree(SDValue,VT) overrides. NFC. These implementations both match the TargetLoweringBase.isZExtFree implementation	2023-03-12 15:56:04 +00:00
Jon Chesterfield	d3dda422bf	[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221	2023-03-12 13:47:48 +00:00
Jay Foad	5fc5c7ebe2	[AMDGPU] Make use of defvar in defining SMEM Real instructions	2023-03-10 14:31:24 +00:00
Mirko Brkusanin	2eada459c7	[AMDGPU][MachineVerifier] Fix vdata reg count for MIMG d16 Differential Revision: https://reviews.llvm.org/D145785	2023-03-10 14:47:49 +01:00
Valery Pykhtin	8f6c47b7a4	[AMDGPU] Speedup GCNDownwardRPTracker::advanceBeforeNext The function makes liveness tests for the entire live register set for every instruction it passes by. This becomes very slow on high RP regions such as ASAN enabled code. Instead only uses of last tracked instruction should be tested and this greatly improves compilation time. This patch revealed few bugs in SIFormMemoryClauses and PreRARematStage::sinkTriviallyRematInsts which should be fixed first. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136267	2023-03-09 15:18:02 +01:00
Petar Avramovic	ded69779be	Fix SGPR + VGPR + offset Scratch offset folding Values in SGPR and VGPR register are treated as unsigned by hardware. When value in 32-bit SGPR or VGPR base can be negative calculate offset using 32-bit add instructions, otherwise use sgpr(unsigned) + vgpr(unsigned) + offset. LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR or VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144957	2023-03-09 10:53:41 +01:00
Petar Avramovic	3ae310d0ae	Fix VGPR + offset Scratch offset folding Values in VGPR register are treated as unsigned by hardware. When value in 32-bit VGPR base can be negative calculate offset using 32-bit add instruction, otherwise use vgpr base(unsigned) + offset. Does not affect case where whole offset comes from VGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in VGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144956	2023-03-09 10:52:44 +01:00
Petar Avramovic	5e56d59999	Fix SGPR + offset Scratch offset folding Values in SGPR register are treated as unsigned by hardware. When value in 32-bit SGPR base can be negative calculate offset using 32-bit add instruction, otherwise use sgpr base(unsigned) + offset. Does not affect case where whole offset comes from SGPR register (immediate offset is 0). LoopStrengthReduce.cpp changes offsets to negative and in some iterations value in SGPR register could be negative. Differential Revision: https://reviews.llvm.org/D144955	2023-03-09 10:52:44 +01:00
Stanislav Mekhanoshin	e7ec123c6a	[AMDGPU] Implement idempotent atomic lowering This turns an idempotent atomic operation into an atomic load. Fixes: SWDEV-385135 Differential Revision: https://reviews.llvm.org/D144759	2023-03-08 14:09:59 -08:00
Valery Pykhtin	a999669982	[AMDGPU] Scheduler: fix RP calculation for a MBB with one successor We reuse live registers after tracking one MBB as live-ins to the successor MBB if the successor is only one but we don't check if the successor has other predecessors. `A B` ` \ /` ` C` A and B have one successor but C has live-ins defined by A and B and therefore should be initialized using LIS. This fixes 83 lit tests out of 420 with EXPENSIVE_CHECK enabled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D136918	2023-03-08 12:20:03 +01:00
Stanislav Mekhanoshin	59162e3859	[AMDGPU] Skip buffer_wbl2 before atomic fence acquire Memory models for gfx90a and gfx940 do not require buffer_wbl2 before the fence for acquire ordering, but we do insert the full release. Fixes: SWDEV-386785 Differential Revision: https://reviews.llvm.org/D145524	2023-03-08 01:24:20 -08:00
Christudasan Devadasan	2171f04c12	[AMDGPU] Extend WorkGroupID* codegen for compute shaders Currently, the codegen support for llvm.amdgcn.workgroup.id* intrinsics are enabled only for compute kernels. In addition, this patch enables their selection for compute shaders on subtargets that have architected SGPRs. Differential Revision: https://reviews.llvm.org/D145045	2023-03-08 07:36:19 +05:30
Simon Pilgrim	e6287d57a3	[DAG] isNarrowingProfitable - consistently use SrcVT/DestVT argument names. NFC. Make it more obvious what order the narrowing types are in.	2023-03-07 14:00:06 +00:00
pvanhout	edca49cfb7	[AMDGPU] Match med3 for (max (min ..)) We previously only matched (min (max ...)) Depends on D144728 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145159	2023-03-07 11:14:31 +01:00
pvanhout	036431e31e	[AMDGPU] Use UniformityAnalysis in LateCodeGenPrepare Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145366	2023-03-06 13:35:57 +01:00
pvanhout	dbebebf6f6	[AMDGPU] Use UniformityAnalysis in CodeGenPrepare A little extra change was needed in UA because it didn't consider InvokeInst and it made call-constexpr.ll assert. Reviewed By: sameerds, arsenm Differential Revision: https://reviews.llvm.org/D145358	2023-03-06 13:26:51 +01:00
pvanhout	7a5d850da2	[AMDGPU] Use UniformityAnalysis in RewriteUndefsForPHI Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145359	2023-03-06 12:15:33 +01:00
Matt Arsenault	9f4746b65f	AMDGPU: Combine down fcopysign f64 magnitude Copy through the low bits and only apply an f32 copysign to the high half. This is effectively what we do for codegen anyway, but this provides some combine benefits. The cases involving constants show some small improvements. https://reviews.llvm.org/D142682	2023-03-06 05:54:25 -04:00
Matt Arsenault	606a62ce27	AMDGPU: Force sign operand of f64 fcopysign to f32 The fcopysign DAG operation, unlike the IR one, allows different types for the sign and magnitude. We can reduce the bitwidth of the high operand since only the sign bit matters. The default combine only introduces mixed fcopysign operand types from fpext/fptrunc. We effectively do this already during selection, but doing it earlier in the combiner should expose new combine opportunities (e.g. the existing tests now eliminate the load of the low half of the double). Unfortunately this isn't enough to handle the case I'm interested in just yet.	2023-03-05 19:54:13 -04:00
Matt Arsenault	bd1f7c417f	AMDGPU: Try to push fneg as integer into select I initially attempted to select the source modifier from xor of a sign mask. This proved to be more difficult since foldBinOpIntoSelect does not consider free fneg of integers and undoes the combine.	2023-03-05 18:53:16 -04:00
Jay Foad	7ba61eaf34	[AMDGPU] More precise limit on SALU cycles in s_delay_alu instructions This just tweaks the fix for D145232 to make the limit more precise, so that we could actually emit a delay of 3 SALU cycles (the maximum) if we had any SALU instructions that required it.	2023-03-05 08:14:15 +00:00
Matt Arsenault	ce3d93e4be	AMDGPU: Use static constexpr instead of static const Not sure why this was broken, but I was seeing this linker error: ld64.lld: error: undefined symbol: (anonymous namespace)::AMDGPUInsertDelayAlu::DelayInfo::SALU_CYCLES_MAX >>> referenced by AMDGPUInsertDelayAlu.cpp:129 (/Users/matt/src/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:129)	2023-03-03 18:50:34 -04:00
Jeffrey Byrnes	b89236a96f	[AMDGPU] Vectorize misaligned global loads & stores Based on experimentation on gfx906,908,90a and 1030, wider global loads / stores are more performant than multiple narrower ones independent of alignment -- this is especially true when combining 8 bit loads / stores, in which case speedup was usually 2x across all alignments. Differential Revision: https://reviews.llvm.org/D145170 Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1	2023-03-03 13:18:25 -08:00
Jay Foad	7442f8635b	[AMDGPU] Fix invalid instid value in s_delay_alu instruction Differential Revision: https://reviews.llvm.org/D145232	2023-03-03 21:08:26 +00:00
Nikita Popov	576060fb41	[ReplaceConstant] Extract code for expanding users of constant (NFC) AMDGPU implements some handy code for expanding all constexpr users of LDS globals. Extract the core logic into ReplaceConstant, so that it can be reused elsewhere.	2023-03-03 16:09:06 +01:00
Jay Foad	08bdff862c	[AMDGPU] Fix error message for illegal copy	2023-03-03 11:46:01 +00:00
ZHU Zijia	8fccdfa436	[AMDGPU] Remove outdated FIXME in comments [NFC] This case has already been handled by D106449.	2023-03-03 01:34:19 +08:00
Anshil Gandhi	7474cd3e2e	[SIAnnotateControlFlow] Use Uniformity analysis Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145013	2023-03-01 10:19:45 -07:00
Anshil Gandhi	1b52c7be91	[AMDGPUUnifyDivergentExitNodes] Use Uniformity Analysis Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145018	2023-03-01 10:17:11 -07:00
Ivan Kosarev	b06e5ad8a6	[AMDGPU][AsmParser][NFC] Simplify parsing cache policies. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144954	2023-03-01 12:34:21 +00:00
Anshil Gandhi	a78301560d	[AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D144162	2023-02-28 13:05:38 -07:00
Ivan Kosarev	905fa15d84	[AMDGPU][AsmParser] Distinguish literal and modifier SMEM offsets. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144902	2023-02-28 12:58:54 +00:00
Ivan Kosarev	dbbab71b76	[AMDGPU][NFC] Eliminate the u32imm operand definition. It is only used to infer the types of offset parameters in isel patterns, which we can specify directly. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D144890	2023-02-28 12:23:47 +00:00
Jon Chesterfield	bf579a7049	[amdgpu] Change LDS lowering default to hybrid Postponed from D139433 until the bug fixed by D139874 could be resolved. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141852	2023-02-24 15:20:12 +00:00
Justin Bogner	c083c89744	[AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC The matching for V_FMA_MIX was partially implemented with a C++ matcher (for fmas with 32 bit results and 16 bit inputs) and partially in tablegen (for fmas with 16 bit results). Move the C++ matcher logic into tablegen to make this more consistent and so we can remove the duplication between SDAG and GISel. Differential Revision: https://reviews.llvm.org/D144612	2023-02-23 10:23:34 -08:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of remaining addresses. Update legalizer to pack register for this new format and allow instruction selection to use NSA encoding when number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Mirko Brkusanin	b3dc0e69cf	[AMDGPU][MC][GFX11] Add Partial NSA format for image sample instructions Image sample instructions that need more than 5 VGPRs for VAddr can use partial NSA for NSA encoding format. VGPRs that can not fit into the encoding are sequential after the last one. This patch adds assembly and disassembly parts. Differential Revision: https://reviews.llvm.org/D144033	2023-02-23 13:33:34 +01:00
Piotr Sobczak	51a49ec52a	[AMDGPU] Clean up MUBUF immediate offset D143174 lifted the artificial type restriction by promoting offset to i32. This patch handles more cases: those involving immediate offset in MUBUF. Differential Revision: https://reviews.llvm.org/D144628	2023-02-23 13:29:53 +01:00
Piotr Sobczak	a3d7b3121c	[AMDGPU][NFC] Add getMaxMUBUFImmOffset Replace magic constant 4095 with the function getMaxMUBUFImmOffset(). Differential Revision: https://reviews.llvm.org/D144623	2023-02-23 11:29:59 +01:00
Konstantina Mitropoulou	944f429b21	[AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics Currently, raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics are lowered as buffer_load_{u8,u16}. This patch combines buffer_load_{u8,u16} and sign extension instructions in order to generate buffer_load_{i8,i16} instructions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144313	2023-02-22 09:01:33 -08:00
Joe Nash	80a8e6805a	[AMDGPU] Don't set src mods on permlane16 v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src modifiers on any input, but they can set op_sel on src0 or src1 to represent fi or bc when desired. The ISel patterns were setting the src_modifier bits to -1, effectively setting abs and neg as well, whenever it was intended to set op_sel, due to an error in ISel. ISel should now correctly only set the op_sel bits. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144519	2023-02-22 11:41:52 -05:00
Jay Foad	c9f4df57ca	[AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should not be calling into GCNSubtarget. Differential Revision: https://reviews.llvm.org/D144564	2023-02-22 16:19:05 +00:00
Jessica Del	fc672b6a8b	[AMDGPU] Improved wide multiplies These checks show optimized instructions if an operand is known to be (partially) zero. Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a Differential Revision: https://reviews.llvm.org/D140208	2023-02-22 16:39:06 +01:00
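
Illustration for commit 9f4746b65f (fcopysign f64 magnitude): the message says the low bits are copied through and only the high half receives the sign. The following is a minimal Python sketch of that bit-level idea, not the LLVM implementation. The function name `copysign_f64_via_high_half` is hypothetical, and the sketch assumes IEEE-754 binary64 values, where the sign is bit 31 of the high 32-bit half.

```python
import math
import struct


def copysign_f64_via_high_half(magnitude: float, sign: float) -> float:
    """Hypothetical helper: copysign on a double, touching only its high 32 bits."""
    # Reinterpret both doubles as 64-bit unsigned integers.
    mag_bits = struct.unpack("<Q", struct.pack("<d", magnitude))[0]
    sign_bits = struct.unpack("<Q", struct.pack("<d", sign))[0]

    low_half = mag_bits & 0xFFFFFFFF   # copied through unchanged
    high_half = mag_bits >> 32         # holds sign, exponent, top of mantissa

    # Only the sign bit of the sign operand matters.
    sign_bit = (sign_bits >> 63) & 1

    # Equivalent of a 32-bit copysign on the high half:
    # clear bit 31, then insert the sign bit.
    new_high = (high_half & 0x7FFFFFFF) | (sign_bit << 31)

    result_bits = (new_high << 32) | low_half
    return struct.unpack("<d", struct.pack("<Q", result_bits))[0]


if __name__ == "__main__":
    for mag, sgn in [(1.5, -2.0), (-3.25, 4.0), (0.1, -0.0)]:
        print(mag, sgn, copysign_f64_via_high_half(mag, sgn), math.copysign(mag, sgn))
```

The result is compared against `math.copysign` only as a reference; the low 32 bits of the magnitude are never modified, which is the property the commit message relies on.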

7724 Commits