llvm-project

Author	SHA1	Message	Date
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Carl Ritson	98f48723f2	[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D104622	2021-06-24 12:41:22 +09:00
Jon Chesterfield	660cae84c3	Revert "[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees" This reverts commit 6a3beb1f68d6791a4cd0190f68b48510f754a00a. Test case that triggers an infinite loop before the revert is at the review for D103138.	2021-06-24 02:33:50 +01:00
Stanislav Mekhanoshin	d274d64ef4	[AMDGPU] Check for pointer operand while refining LDS align Also skips the propagation if alignment is 1. Differential Revision: https://reviews.llvm.org/D104796	2021-06-23 12:27:55 -07:00
Jinsong Ji	c125af82a5	[DAGCombine] Check reassoc flags in aggressive fsub fusion This is from the discussion in https://reviews.llvm.org/D104247#inline-993387 The contract and reassoc flags shouldn't imply each other. All the aggressive fsub fusions reassociate operations, so we should guard them with a reassoc flag check. Reviewed By: mcberg2017 Differential Revision: https://reviews.llvm.org/D104723	2021-06-23 13:59:40 +00:00
Stanislav Mekhanoshin	2b43209ee3	[AMDGPU] Propagate LDS align into instructions Differential Revision: https://reviews.llvm.org/D104316	2021-06-23 00:57:16 -07:00
Matt Arsenault	39f8a792f0	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.	2021-06-22 13:42:49 -04:00
Matt Arsenault	2e120920ac	AMDGPU: Add baseline test for instructions zeroing high bits	2021-06-22 13:27:39 -04:00
Matt Arsenault	9ad8a1f6fb	AMDGPU: Fix high 16-bit optimization on gfx9 We can do this optimization in the majority of cases, but we currently don't have a way to do it. We do not track/model which instructions have which behavior, the control bit to change the high bit behavior, or making use of preserved bits at all. This is a bit fuzzy since we don't know precisely how the source instruction will be lowered, but that only really matters in one case (for fma_mixlo). We do need to fixup some of these cases after selection, but the pattern helps eliminate many of these zexts.	2021-06-22 13:16:45 -04:00
Stanislav Mekhanoshin	d797a7f8da	[AMDGPU] Use performOptimizedStructLayout for LDS sort This gives better packing. Differential Revision: https://reviews.llvm.org/D104331	2021-06-22 09:58:10 -07:00
Fangrui Song	f53d791520	Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular) Before: `warning: stack size limit exceeded (888) in main` After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned) Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104667	2021-06-22 09:55:20 -07:00
Jinsong Ji	3996311ee1	[DAGCombine] reassoc flag shouldn't enable contract According to IR LangRef, the FMF flags are: contract: Allow floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add). reassoc: Allow reassociation transformations for floating-point instructions. This may dramatically change results in floating-point. My understanding is that these two flags shouldn't imply each other, as we might have an SDNode that can be reassociated with others, but is not contractible. E.g., we may want the following fmul/fadd/fsub to freely reassoc, but don't want an fma being generated here: %F = fmul reassoc double %A, %B ; <double> [#uses=1] %G = fmul reassoc double %C, %D ; <double> [#uses=1] %H = fadd reassoc double %F, %G ; <double> [#uses=1] %I = fsub reassoc double %H, %E ; <double> [#uses=1] Before https://reviews.llvm.org/D45710, the `reassoc` flag actually did not imply isContractable either. The current implementation also only checks the flag on the fadd node, ignoring the fmul node; this patch updates that as well. Reviewed By: spatel, qiucf Differential Revision: https://reviews.llvm.org/D104247	2021-06-21 21:15:43 +00:00
Matt Arsenault	4819cd162e	AMDGPU: Add missing tests for v_fma_mixlo	2021-06-21 10:58:53 -04:00
Ruiling Song	208332de8a	[AMDGPU] Add Optimize VGPR LiveRange Pass. This pass aims to optimize VGPR live ranges in a typical divergent if-else control flow. For example: def(a) if(cond) use(a) ... // A else use(a) As AMDGPU accesses VGPRs with respect to the active mask, we can mark `a` as dead in region A. For details, please refer to the comments in the implementation file. The pass is enabled by default; the frontend can disable it through "-amdgpu-opt-vgpr-liverange=false". Differential Revision: https://reviews.llvm.org/D102212	2021-06-21 15:25:55 +08:00
hsmahesha	80fd5fa526	[AMDGPU] Replace non-kernel function uses of LDS globals by pointers. The main motivation behind pointer replacement of LDS uses within non-kernel functions is to prevent the subsequent LDS lowering pass from directly packing LDS (assume large LDS) into a struct type, which would otherwise cause huge memory to be allocated for the struct instance within every kernel. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103225	2021-06-21 11:51:49 +05:30
David Green	a24b02193a	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-06-20 17:03:30 +01:00
Michael Liao	940efa4f69	[amdgpu] Improve the conversion from f32 to i64. - Take the same principle as the conversion from f64 to i64 with extra necessary pre- and post-processing. It helps to reduce that conversion sequence by half compared to the legacy one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D104427	2021-06-19 12:46:48 -04:00
Matt Arsenault	d6467e00df	AMDGPU: Fix infinite loop in DAG combine with fneg + fma We were not reporting isFNegFree for v2f32, although it is effectively free after legalization. The generic combine was pulling fneg out of the fma source operands, and the AMDGPU combine was doing the opposite.	2021-06-18 19:09:03 -04:00
Matt Arsenault	ad4a18251a	AMDGPU: Fix assert on m0_lo16/m0_hi16 These get added (redundantly) to the bundle expanded for indirect register accesses. We hit this path only when there is a call in the function.	2021-06-18 18:48:53 -04:00
Anshil Gandhi	2e5dc4a1ef	[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask Implemented the transformation of xor (llvm.amdgcn.class x, mask), -1 into llvm.amdgcn.class(x, ~mask). Added LIT tests as well. Differential Revision: https://reviews.llvm.org/D104049	2021-06-18 13:04:12 -06:00
Jay Foad	1f9dcd2b73	[AMDGPU] Update generated checks. NFC.	2021-06-18 10:49:02 +01:00
Jon Roelofs	a2ab765029	[GISel] Eliminate redundant bitmasking This was a GISel vs SDAG regression that showed up at -Os on arm64 in: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test https://llvm.godbolt.org/z/aecjodsjG Differential revision: https://reviews.llvm.org/D103334	2021-06-17 12:53:00 -07:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow-up to commit 0ee439b705e82a4fe20e2, which changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int, clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32, which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls, several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Stanislav Mekhanoshin	0a07343e34	[AMDGPU] Fixed constexpr expansion to handle multiple uses The recently added convertConstantExprsToInstructions() does not handle the case when the same ConstantExpr is used multiple times in the same instruction. The first use is replaced, and the rest of the uses in the instruction are replaced as well by replaceUsesOfWith(). Then the function attempts to replace a constant that has already been destroyed. So far this interface is only used by the AMDGPU BE. Differential Revision: https://reviews.llvm.org/D104425	2021-06-16 16:57:41 -07:00
Stanislav Mekhanoshin	a11880468e	[AMDGPU] Fix lds superalign test. NFC.	2021-06-15 11:02:34 -07:00
madhur13490	c27e8141b3	[AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls This patch computes the max SGPRs and VGPRs used by the module in the presence of indirect calls and makes that the register requirement for functions/kernels which make indirect calls. This patch also refactors code in AMDGPUSubTarget.cpp, adding a "base" variant of getMaxNumSGPRs which is used by the MachineFunction version and the new Function version. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103636	2021-06-12 11:59:34 +05:30
hsmahesha	f6632f11ed	[AMDGPU] Fix missing lowering of LDS used in global scope. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103431	2021-06-10 08:40:01 +05:30
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit 211e584fa2a4c032e4d573e7cdbffd622aad0a8f. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit ea10a86984ea73fcec3b12d22404a15f2f59b219. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
David Green	297088d1ad	Revert "[DSE] Remove stores in the same loop iteration" Apparently non-dead stores are being removed, as noted in D100464. This reverts commit 222aeb4d51a46c5a81c9e4ccb16d1d19dd21ec95.	2021-06-08 21:23:08 +01:00
Michael Liao	27332968d8	[amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`. - Add `-enable-ocl-mangling-mismatch-workaround` to work around the mismatch on OCL name mangling so far. Reviewed By: yaxunl, rampitec Differential Revision: https://reviews.llvm.org/D103920	2021-06-08 15:42:27 -04:00
Matt Arsenault	31a9659de5	GlobalISel: Avoid use of G_INSERT in insertParts G_INSERT legalization is incomplete and doesn't work very well. Instead try to use sequences of G_MERGE_VALUES/G_UNMERGE_VALUES padding with undef values (although this can get pretty large). For the case of load/store narrowing, this is still performing the load/stores in irregularly sized pieces. It might be cleaner to split this down into equal sized pieces, and rely on load/store merging to optimize it.	2021-06-08 14:44:24 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Carl Ritson	f8816c7400	[AMDGPU] Add v5f32/VReg_160 support for MIMG instructions Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are required for a MIMG address operand. Maintain _V8 instruction variants of pseudo instructions allowing assembly prior to GFX10 to work as-is. Currently the validator can tell for GFX10 what the correct size is, so will disallow oversize address registers. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103672	2021-06-08 11:11:40 +09:00
Stanislav Mekhanoshin	05289dfb62	[AMDGPU] Handle constant LDS uses from different kernels This allows to lower an LDS variable into a kernel structure even if there is a constant expression used from different kernels. Differential Revision: https://reviews.llvm.org/D103655	2021-06-07 15:39:08 -07:00
hsmahesha	713ca2f360	[AMDGPU] Introduce command line switch to control super aligning of LDS. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103817	2021-06-08 03:58:13 +05:30
Matt Arsenault	ccf28ea800	AMDGPU: Move codegen test out of MIR test directory This is testing an actual pass, not the MIR parser/printer.	2021-06-07 14:26:48 -04:00
Sebastian Neubauer	96e1fcb1e0	[AMDGPU] Use s_add_i32 for address additions This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction. Differential Revision: https://reviews.llvm.org/D103322	2021-06-07 16:09:48 +02:00
hsmahesha	52ffbfdffc	[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering. Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their size. This will make sure that the members of sorted structure are properly aligned, and hence it will further reduce the probability of unaligned LDS access. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103261	2021-06-07 18:00:41 +05:30
Mirko Brkusanin	35ef4c940b	[AMDGPU][GlobalISel] Legalize G_ABS Legalize and select G_ABS so that we can use llvm.abs intrinsic Differential Revision: https://reviews.llvm.org/D102391	2021-06-04 14:46:43 +02:00
madhur13490	6a3beb1f68	[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees Don't propagate launch bound related attributes to address taken functions and their callees. The idea is to do a traversal over the call graph starting at address taken functions and erase the attributes set by the previous logic, i.e. process(). This two-phase approach makes sure that we don't miss deeply nested callees of address taken functions, as a function might be called directly as well as indirectly. This patch is also a reattempt of D94585, as latent issues in the hasAddressTaken function have been fixed in the recent past. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103138	2021-06-04 11:36:56 +05:30
hsmahesha	753437fc1d	Revert "[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering." This reverts commit d71ff907ef23eaef86ad66ba2d711e4986cd6cb2.	2021-06-04 11:16:46 +05:30
hsmahesha	d71ff907ef	[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering. Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their size. This will make sure that the members of sorted structure are properly aligned, and hence it will further reduce the probability of unaligned LDS access. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103261	2021-06-04 09:34:37 +05:30
Julien Pagès	37821155c9	[AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16 In this particular example, we had a crash when compiling it for several architectures. This patch extends the legalization of extract_subvector to avoid this problem. Differential Revision: https://reviews.llvm.org/D103344	2021-06-03 16:40:18 -04:00
Stanislav Mekhanoshin	9e2e49328f	[AMDGPU] All GWS instructions need aligned VGPR on gfx90a Fixes: SWDEV-288006 Differential Revision: https://reviews.llvm.org/D103197	2021-06-01 17:08:03 -07:00
Arthur Eubanks	8815ce03e8	Remove "Rewrite Symbols" from codegen pipeline It breaks up the function pass manager in the codegen pipeline. With empty parameters, it looks at the -mllvm flag -rewrite-map-file. This is likely not in use. Add a check that we only have one function pass manager in the codegen pipeline. Some tests relied on the fact that we had a module pass somewhere in the codegen pipeline. addr-label.ll crashes on ARM due to this change. This is because an ARMConstantPoolConstant containing a BasicBlock to represent a blockaddress may hold an invalid pointer to a BasicBlock if the blockaddress is invalidated by its BasicBlock getting removed. In that case all referencing blockaddresses are RAUW'd with a constant int. Making ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure that's the right fix. As a workaround, create a barrier right before ISel so that IR optimizations can't happen once an ARMConstantPoolConstant has been created. Reviewed By: rnk, MaskRay, compnerd Differential Revision: https://reviews.llvm.org/D99707	2021-05-31 08:32:36 -07:00
Djordje Todorovic	dee85d47d9	[LiveDebugVariables] Stop trimming locations of non-inlined vars D35953, D62650 and D73691 introduced trimming of variable locations in the LiveDebugVariables pass, since there are some cases where, after virtregrewrite, an exploded number of DBG_VALUEs is created for some inlined variables. As it looks, all problematic cases involved inlined variables, so it seems reasonable to stop trimming the location ranges for non-inlined variables. It has a very good impact on the llvm-locstats report. Differential Revision: https://reviews.llvm.org/D102917	2021-05-31 02:59:19 -07:00
David Green	222aeb4d51	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-05-31 10:22:37 +01:00
Sebastian Neubauer	690f5b7a01	[AMDGPU] Fix function calls with flat scratch When flat scratch is used, the stack pointer needs to be added when writing arguments to the stack. For buffer instructions, this is done in SelectMUBUFScratchOffen and SelectMUBUFScratchOffset. Move that to call argument lowering, like it is done in GlobalISel. Differential Revision: https://reviews.llvm.org/D103166	2021-05-28 11:22:13 +02:00
Sebastian Neubauer	6133b60a27	[AMDGPU] Precommit test Add scratch run to gfx-callable-argument-types.ll.	2021-05-28 11:22:13 +02:00
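
Note on commit 2e5dc4a1ef (fold of `xor (llvm.amdgcn.class x, mask), -1` into `llvm.amdgcn.class(x, ~mask)`): the fold is valid because every floating-point value falls into exactly one class, so negating the test is the same as testing the complementary mask. The following is a minimal Python sketch of that argument, not the LLVM implementation. The helper names (`fp_class_bit`, `amdgcn_class`, `folded_not_class`) are hypothetical, and the ten-bit class layout (signaling NaN, quiet NaN, -inf, -normal, -denormal, -0, +0, +denormal, +normal, +inf) is an assumption made for illustration.

```python
import struct

ALL_CLASSES = 0x3FF  # assumed: ten class bits in the test mask


def fp_class_bit(x: float) -> int:
    """Return the single class bit for a double (bit layout is an assumption)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0x7FF and frac != 0:
        # NaN: the top fraction bit distinguishes quiet from signaling.
        return 1 << 1 if frac & (1 << 51) else 1 << 0
    if exp == 0x7FF:
        return 1 << 2 if sign else 1 << 9  # -inf / +inf
    if exp == 0 and frac == 0:
        return 1 << 5 if sign else 1 << 6  # -0 / +0
    if exp == 0:
        return 1 << 4 if sign else 1 << 7  # -denormal / +denormal
    return 1 << 3 if sign else 1 << 8      # -normal / +normal


def amdgcn_class(x: float, mask: int) -> bool:
    """Model of the class test: is x in any class selected by mask?"""
    return (fp_class_bit(x) & mask) != 0


def folded_not_class(x: float, mask: int) -> bool:
    """Folded form: test against the complemented mask instead of negating."""
    return amdgcn_class(x, ~mask & ALL_CLASSES)
```

For any `x` and any ten-bit `mask`, `not amdgcn_class(x, mask)` and `folded_not_class(x, mask)` agree, which is the equivalence the commit relies on.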

4622 Commits