llvm-project

Author	SHA1	Message	Date
Nikita Popov	23dd750279	Revert "[IR] Don't mark mustprogress as type attribute" This reverts commit 84ed3a794b4ffe7bd673f1e5a17d507aa3113d12. A number of clang tests are also affected by this change. Revert until I can update them.	2021-07-09 18:46:00 +02:00
Nikita Popov	84ed3a794b	[IR] Don't mark mustprogress as type attribute This is a simple enum attribute. Test changes are because enum attributes are sorted before type attributes.	2021-07-09 18:24:16 +02:00
Nico Weber	97c675d3d4	Revert "Revert "Temporarily do not drop volatile stores before unreachable"" This reverts commit 52aeacfbf5ce5f949efe0eae029e56db171ea1f7. There isn't full agreement on a path forward yet, but there is agreement that this shouldn't land as-is. See discussion on https://reviews.llvm.org/D105338 Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality" This reverts commit f4877c78c0fc98be47b926439bbfe33d5e1d1b6d. And all the related changes to tests: This reverts commit 9a0152799f8e4a59e0483728c9f11c8a7805616f. This reverts commit 3f7c9cc27422f7302cf5a683eeb3978e6cb84270. This reverts commit 329f8197ef59f9bd23328b52d623ba768b51dbb2. This reverts commit aa9f58cc2c48ca6cfc853a2467cd775dc7622746. This reverts commit 2df37d5ddd38091aafbb7d338660e58836f4ac80. This reverts commit a72a44181264fd83e05be958c2712cbd4560aba7.	2021-07-09 11:44:34 -04:00
Roman Lebedev	2df37d5ddd	[NFC][Codegen] Harden a few tests to not rely that volatile store to null isn't erased	2021-07-09 13:30:42 +03:00
Stanislav Mekhanoshin	e5b0fe1b83	[AMDGPU] Mark more SOP instructions as rematerializable The rest of the SOP instructions implicitly set SCC and are not suitable for rematerialization. Differential Revision: https://reviews.llvm.org/D105670	2021-07-08 16:00:45 -07:00
Stanislav Mekhanoshin	de5582be26	[AMDGPU] Fix more indention in llc-pipeline test. NFC.	2021-07-08 11:20:00 -07:00
Stanislav Mekhanoshin	9dae86ce56	[AMDGPU] Fix indention in llc-pipeline test. NFC.	2021-07-08 11:08:25 -07:00
Stanislav Mekhanoshin	74a5760d35	[AMDGPU] Set LoopInfo as preserved by SIAnnotateControlFlow The pass does not change loops, it just adds calls. Differential Revision: https://reviews.llvm.org/D105583	2021-07-08 09:34:43 -07:00
Stanislav Mekhanoshin	0fdb25cd95	[AMDGPU] Disable garbage collection passes Differential Revision: https://reviews.llvm.org/D105593	2021-07-07 15:47:57 -07:00
Stanislav Mekhanoshin	a0ab45799b	[AMDGPU] Move atomic expand past infer address spaces There are cases where infer address spaces pass cannot yet infer an address space in the opt pipeline and then in the llc pipeline it runs too late for atomic expand pass to benefit from a specific address space. Move atomic expand pass past the infer address spaces. Fixes: SWDEV-293410 Differential Revision: https://reviews.llvm.org/D105511	2021-07-06 15:53:32 -07:00
Stanislav Mekhanoshin	5915d33874	[AMDGPU] Do not run IR optimizations at -O0 Differential Revision: https://reviews.llvm.org/D105515	2021-07-06 15:29:52 -07:00
Sebastian Neubauer	db646de3ee	[AMDGPU] Set optional PAL metadata Set informational fields in the .shader_functions table. Also correct the documentation, .scratch_memory_size and .lds_size are integers. Differential Revision: https://reviews.llvm.org/D105116	2021-07-06 11:58:00 +02:00
David Stuttard	83cb9632a1	[DAGCombiner] Add support for mulhi const folding in DAGCombiner Differential Revision: https://reviews.llvm.org/D103323 Change-Id: I4ffaaa32301795ba8a339567a68e77fe0862b869	2021-07-05 12:01:26 +01:00
David Stuttard	4b125b23ba	[DAGCombiner] Pre-commit test to demonstrate mulhi const folding D103323 will fold this Differential Revision: https://reviews.llvm.org/D105424 Change-Id: I64947215eb531fbd70b52a72203b39e43fefafcc	2021-07-05 11:34:38 +01:00
David Stuttard	b8173c3178	[AMDGPU] Stop mulhi from doing 24 bit mul for uniform values Added support to check if architecture supports s_mulhi which is used as part of the decision whether or not to use valu 24 bit mul (if the mulhi gets transformed to a valu op anyway, then may as well use it). This is an extension of the work in D97063 Differential Revision: https://reviews.llvm.org/D103321 Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14	2021-07-05 10:33:23 +01:00
Matt Arsenault	99c7e918b5	GlobalISel: Use LLT in call lowering callbacks This preserves the memory type so the lowerings can rely on them.	2021-07-01 12:15:54 -04:00
Stanislav Mekhanoshin	661577e698	[AMDGPU] Fix immediate sign during V_MOV_B64_PSEUDO expansion Creating a V_MOV_B32 with zero extended immediate source prevented conversion to V_BFREV_B32. Differential Revision: https://reviews.llvm.org/D105235	2021-07-01 09:00:29 -07:00
Matt Arsenault	28f2f66200	GlobalISel: Use LLT in memory legality queries This enables proper lowering of non-byte sized loads. We still aren't faithfully preserving memory types everywhere, so the legality checks still only consider the size.	2021-06-30 17:44:13 -04:00
Matt Arsenault	a601b308d9	GlobalISel: Lower non-byte loads and stores Previously we didn't preserve the memory type and had to blindly interpret a number of bytes. Now that non-byte memory accesses are representable, we can handle these correctly. Ported from DAG version (minus some weird special case i1 legality checking which I don't fully understand, and we don't have a way to query for) For now, this is NFC and the test changes are placeholders. Since the legality queries are still relying on byte-flattened memory sizes, the legalizer can't actually see these non-byte accesses. This keeps this change self contained without merging it with the larger patch to switch to LLT memory queries.	2021-06-30 17:05:50 -04:00
Matt Arsenault	748e0b07dc	GlobalISel: Preserve memory type when reducing load/store width	2021-06-30 17:05:29 -04:00
Matt Arsenault	d6270125fc	AMDGPU/GlobalISel: Remove some problematic testcases These testcases are a bit nonsensical and won't be handled correctly for a long time. Remove them to unblock load/store legalization work.	2021-06-30 17:05:29 -04:00
Matt Arsenault	fae05692a3	CodeGen: Print/parse LLTs in MachineMemOperands This will currently accept the old number of bytes syntax, and convert it to a scalar. This should be removed in the near future (I think I converted all of the tests already, but likely missed a few). Not sure what the exact syntax and policy should be. We can continue printing the number of bytes for non-generic instructions to avoid test churn and only allow non-scalar types for generic instructions. This will currently print the LLT in parentheses, but accept parsing the existing integers and implicitly converting to scalar. The parentheses are a bit ugly, but the parser logic seems unable to deal without either parentheses or some keyword to indicate the start of a type.	2021-06-30 16:54:13 -04:00
Jon Roelofs	a642872476	[GISel] Support llvm.memcpy.inline Differential revision: https://reviews.llvm.org/D105072	2021-06-30 12:39:05 -07:00
Stanislav Mekhanoshin	381ded345b	[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1, as it is now, RA cannot rematerialize a 64 bit register. This gives 10-20% uplift in a set of huge apps heavily using double precision math. Fixes: SWDEV-292645 Differential Revision: https://reviews.llvm.org/D104874	2021-06-30 11:45:38 -07:00
Tony Tye	7f19aa73c2	[AMDGPU] Update gfx90a memory model support Update AMDGPU gfx90a memory model to make coarse grain memory allocations consistent when fine grained system scope atomic acquire and release is performed. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D105137	2021-06-30 04:05:22 +00:00
Matt Arsenault	990278d026	CodeGen: Store LLT instead of uint64_t in MachineMemOperand GlobalISel is relying on regular MachineMemOperands to track all of the memory properties of accesses. Just the raw byte size is insufficient to disambiguate all situations. For example, if we need to split an unaligned extending load, we need to know the number of bits in the original source value and can't infer it from the result type. This is also a problem for extending vector loads. This does decrease the maximum representable size from the full uint64_t bytes to a maximum of 16-bits. No in tree testcases hit this, other than places using UINT64_MAX for unknown sizes. This may be an issue for G_MEMCPY and co., although they can just use unknown size for large static sizes. This also has potential for backend abuse by relying on the type when it really shouldn't be relevant after selection. This does not include the necessary MIR printer/parser changes to represent this.	2021-06-29 17:38:51 -04:00
Fangrui Song	a9854045f6	[test] Change -t to --syms and -s to -S for llvm-readobj RUN lines -s and -t will be changed to improve consistency with llvm-readelf. The inconsistency issue regularly contributes to confusion using the two tools.	2021-06-29 11:50:31 -07:00
Piotr Sobczak	f38a8b54ea	[AMDGPU] Fix 224-bit spills Related to D104622. Differential Revision: https://reviews.llvm.org/D105109	2021-06-29 17:52:16 +02:00
Brendon Cahoon	f9f5d41545	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
Jon Chesterfield	50ad3478bd	Disable ReplaceLDS pass, patch up tests to match Most tests passed with an extra argument to explicitly enable the pass. One does not, deleted it as part of this change. I can't see why the codegen would be different between default on and default off but switched on. It can be retrieved from the project history. This would be a revert, but git revert was not clean. Disabling the pass and leaving it in tree is less likely to cause breakage elsewhere than patching up the git revert conflicts on unfamiliar code. It'll be landed without review, as @hsmhsm is believed unavailable at present. Differential Revision: https://reviews.llvm.org/D104962	2021-06-26 01:36:42 +01:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Carl Ritson	98f48723f2	[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D104622	2021-06-24 12:41:22 +09:00
Jon Chesterfield	660cae84c3	Revert "[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees" This reverts commit 6a3beb1f68d6791a4cd0190f68b48510f754a00a. Test case that triggers an infinite loop before the revert is at the review for D103138.	2021-06-24 02:33:50 +01:00
Stanislav Mekhanoshin	d274d64ef4	[AMDGPU] Check for pointer operand while refining LDS align Also skips the propagation if alignment is 1. Differential Revision: https://reviews.llvm.org/D104796	2021-06-23 12:27:55 -07:00
Jinsong Ji	c125af82a5	[DAGCombine] Check reassoc flags in aggressive fsub fusion This is from the discussion in https://reviews.llvm.org/D104247#inline-993387 The contract and reassoc flags shouldn't imply each other. All the aggressive fsub fusions reassociate operations, so we should guard them with a reassoc flag check. Reviewed By: mcberg2017 Differential Revision: https://reviews.llvm.org/D104723	2021-06-23 13:59:40 +00:00
Stanislav Mekhanoshin	2b43209ee3	[AMDGPU] Propagate LDS align into instructions Differential Revision: https://reviews.llvm.org/D104316	2021-06-23 00:57:16 -07:00
Matt Arsenault	39f8a792f0	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.	2021-06-22 13:42:49 -04:00
Matt Arsenault	2e120920ac	AMDGPU: Add baseline test for instructions zeroing high bits	2021-06-22 13:27:39 -04:00
Matt Arsenault	9ad8a1f6fb	AMDGPU: Fix high 16-bit optimization on gfx9 We can do this optimization in the majority of cases, but we currently don't have a way to do it. We do not track/model which instructions have which behavior, the control bit to change the high bit behavior, or making use of preserved bits at all. This is a bit fuzzy since we don't know precisely how the source instruction will be lowered, but that only really matters in one case (for fma_mixlo). We do need to fixup some of these cases after selection, but the pattern helps eliminate many of these zexts.	2021-06-22 13:16:45 -04:00
Stanislav Mekhanoshin	d797a7f8da	[AMDGPU] Use performOptimizedStructLayout for LDS sort This gives better packing. Differential Revision: https://reviews.llvm.org/D104331	2021-06-22 09:58:10 -07:00
Fangrui Song	f53d791520	Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular) Before: `warning: stack size limit exceeded (888) in main` After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned) Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104667	2021-06-22 09:55:20 -07:00
Jinsong Ji	3996311ee1	[DAGCombine] reassoc flag shouldn't enable contract According to IR LangRef, the FMF flag: contract Allow floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add). reassoc Allow reassociation transformations for floating-point instructions. This may dramatically change results in floating-point. My understanding is that these two flags shouldn't imply each other, as we might have a SDNode that can be reassociated with others, but is not contractible. E.g.: we may want the following fmul/fadd/fsub to freely reassoc, but don't want an fma being generated here. %F = fmul reassoc double %A, %B ; <double> [#uses=1] %G = fmul reassoc double %C, %D ; <double> [#uses=1] %H = fadd reassoc double %F, %G ; <double> [#uses=1] %I = fsub reassoc double %H, %E ; <double> [#uses=1] Before https://reviews.llvm.org/D45710, the `reassoc` flag actually did not imply isContratable either. The current implementation also only checks the flag on the fadd node, ignoring the fmul node; this patch updates that as well. Reviewed By: spatel, qiucf Differential Revision: https://reviews.llvm.org/D104247	2021-06-21 21:15:43 +00:00
Matt Arsenault	4819cd162e	AMDGPU: Add missing tests for v_fma_mixlo	2021-06-21 10:58:53 -04:00
Ruiling Song	208332de8a	[AMDGPU] Add Optimize VGPR LiveRange Pass. This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example: def(a) if(cond) use(a) ... // A else use(a) As AMDGPU accesses VGPRs with respect to the active mask, we can mark `a` as dead in region A. For details, please refer to the comments in the implementation file. The pass is enabled by default; the frontend can disable it through "-amdgpu-opt-vgpr-liverange=false". Differential Revision: https://reviews.llvm.org/D102212	2021-06-21 15:25:55 +08:00
hsmahesha	80fd5fa526	[AMDGPU] Replace non-kernel function uses of LDS globals by pointers. The main motivation behind pointer replacement of LDS use within non-kernel functions is to prevent the subsequent LDS lowering pass from directly packing LDS (assume large LDS) into a struct type, which would otherwise cause a huge memory allocation for the struct instance within every kernel. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103225	2021-06-21 11:51:49 +05:30
David Green	a24b02193a	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-06-20 17:03:30 +01:00
Michael Liao	940efa4f69	[amdgpu] Improve the conversion from f32 to i64. - Take the same principle as the conversion from f64 to i64 with extra necessary pre- and post-processing. It helps to reduce that conversion sequence by half compared to the legacy one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D104427	2021-06-19 12:46:48 -04:00
Matt Arsenault	d6467e00df	AMDGPU: Fix infinite loop in DAG combine with fneg + fma We were not reporting isFNegFree for v2f32, although it is effectively free after legalization. The generic combine was pulling fneg out of the fma source operands, and the AMDGPU combine was doing the opposite.	2021-06-18 19:09:03 -04:00
Matt Arsenault	ad4a18251a	AMDGPU: Fix assert on m0_lo16/m0_hi16 These get added (redundantly) to the bundle expanded for indirect register accesses. We hit this path only when there is a call in the function.	2021-06-18 18:48:53 -04:00
Anshil Gandhi	2e5dc4a1ef	[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask Implemented the transformation of xor (llvm.amdgcn.class x, mask), -1 into llvm.amdgcn.class(x, ~mask). Added LIT tests as well. Differential Revision: https://reviews.llvm.org/D104049	2021-06-18 13:04:12 -06:00
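
The fold described in 2e5dc4a1ef, xor (llvm.amdgcn.class x, mask), -1 into llvm.amdgcn.class(x, ~mask), can be checked with a small model. This is only a sketch under stated assumptions, not the code from the patch: it assumes a 10-bit class mask and that every value falls into exactly one class (a one-hot class bit). The names `class_test` and `check_fold` are hypothetical and exist only for this illustration.

```python
# Assumption: the class mask has 10 bits and each floating-point value
# belongs to exactly one class, so its class is a one-hot bit.
CLASS_BITS = 10
FULL_MASK = (1 << CLASS_BITS) - 1


def class_test(class_bit: int, mask: int) -> bool:
    """Hypothetical model of a class test: true if the value's class is in mask."""
    return (class_bit & mask) != 0


def check_fold() -> bool:
    """Exhaustively compare the negated test with the test on the inverted mask."""
    for bit_index in range(CLASS_BITS):
        class_bit = 1 << bit_index
        for mask in range(FULL_MASK + 1):
            negated = not class_test(class_bit, mask)           # xor(class(x, mask), -1)
            folded = class_test(class_bit, ~mask & FULL_MASK)   # class(x, ~mask)
            if negated != folded:
                return False
    return True


if __name__ == "__main__":
    print(check_fold())
```

The equivalence relies on the one-hot assumption: a value that is in none of the masked classes must be in one of the remaining ones, so negating the result is the same as testing against the complemented mask.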


4652 Commits