llvm-project

Author	SHA1	Message	Date
Krzysztof Drewniak	6540f1635a	[AMDGPU] Add IR-level pass to rewrite away address space 7 (#77952) This commit adds the -lower-buffer-fat-pointers pass, which is applicable to all AMDGCN compilations. The purpose of this pass is to remove the type `ptr addrspace(7)` from incoming IR. This must be done at the LLVM IR level because `ptr addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled by SelectionDAG. The detailed operation of the pass is described in comments, but, in summary, the removal proceeds by: 1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of i160 (including vectors and aggregates). This is needed because the in-register representation of these pointers will stop matching their in-memory representation in step 2, and so ptrtoint/inttoptr operations are used to preserve the expected memory layout. 2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two parts of a buffer fat pointer (the 128-bit address space 8 resource and the 32-bit address space 6 offset) visible in the IR. This also impacts the argument and return types of functions. 3. Splitting the resource and offset parts. All instructions that produce or consume buffer fat pointers (like GEP or load) are rewritten to produce or consume the resource and offset parts separately. For example, GEP updates the offset part of the result and a load uses the resource and offset parts to populate the relevant llvm.amdgcn.raw.ptr.buffer.load intrinsic call. At the end of this process, the original mutated instructions are replaced by their new split counterparts, ensuring no invalidly-typed IR escapes this pass. (For operations like call, where the struct form is needed, insertelement operations are inserted).
Compared to LGC's PatchBufferOp ( `32cda89776/lgc/patch/PatchBufferOp.cpp` ): this pass - Also handles vectors of ptr addrspace(7)s - Also handles function boundaries - Includes the same uniform buffer optimization for loops and conditionals - Does not handle memcpy() and friends (this is future work) - Does not break up large loads and stores into smaller parts. This should be handled by extending the legalization of .buffer.{load,store} to handle larger types by producing multiple instructions (the same way ordinary LOAD and STORE are legalized). That work is planned for a followup commit. - Does not have special logic for handling divergent buffer descriptors. The logic in LGC is, as far as I can tell, incorrect in general, and, per discussions with @nhaehnle, isn't widely used. Therefore, divergent descriptors are handled with waterfall loops later in legalization. As a final matter, this commit updates atomic expansion to treat buffer operations analogously to global ones. (One question for reviewers: is the new pass in the right place? Should it be later in the pipeline?) Differential Revision: https://reviews.llvm.org/D158463	2024-03-06 09:49:58 -06:00
Mirko Brkušanin	1d286ad59b	[AMDGPU] Add mark last scratch load pass (#75512)	2024-01-18 09:36:44 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Petar Avramovic	6892c175c5	AMDGPU/GlobalISel: add AMDGPUGlobalISelDivergenceLowering pass (#75340) Add empty AMDGPUGlobalISelDivergenceLowering pass. This pass will implement - selection of divergent i1 phis as lane mask phis, requires lane mask merging in some cases - lower uses of divergent i1 values outside of the cycle using lane mask merging - lowering of all cases of temporal divergence: - lower uses of uniform i1 values outside of the cycle using lane mask merging - lower uses of uniform non-i1 values outside of the cycle using a copy to vgpr inside of the cycle Add very detailed set of regression tests for cases mentioned above. patch 1 from: https://github.com/llvm/llvm-project/pull/73337	2023-12-13 16:42:56 +01:00
Dominik Adamski	276a024b49	[NFC][AMDGPU] Unify AMDGPU address space enum (#73944) Types of AMDGPU address space were defined not only in Clang-specific class but also in LLVM header. If we unify the AMD GPU address space enumeration, then we can reuse it in Clang, Flang and LLVM.	2023-12-11 10:45:21 +01:00
Kazu Hirata	55531e715f	[Target] Remove unused forward declarations (NFC)	2023-12-10 10:38:55 -08:00
Jay Foad	28233b11ac	[AMDGPU] New AMDGPUInsertSingleUseVDST pass (#72388) Add support for emitting GFX11.5 s_singleuse_vdst instructions. This is a power saving feature whereby the compiler can annotate VALU instructions whose results are known to have only a single use, so the hardware can in some cases avoid writing the result back to VGPR RAM. To begin with the pass is disabled by default because of one missing feature: we need an exclusion list of opcodes that never qualify as single-use producers and/or consumers. A future patch will implement this and enable the pass by default. --------- Co-authored-by: Scott Egerton <scott.egerton@amd.com>	2023-11-24 10:23:06 +00:00
Matt Arsenault	d34a10a47d	AMDGPU: Port AMDGPUAttributor to new pass manager (#71349)	2023-11-07 15:40:40 +09:00
Valery Pykhtin	e808f8a616	[AMDGPU] GCNRegPressurePrinter pass to print GCNRegPressure values for testing. (#70031) Using GCNDownwardRPTracker or GCNUpwardRPTracker the pass collects register pressure values for a function and prints these values next to instructions. Output can be used to generate FileCheck rules in MIR tests.	2023-11-01 23:01:39 +01:00
Jay Foad	d85d143ad9	[AMDGPU] New image intrinsic optimizer pass (#67151) Implement a new pass to combine multiple image_load_2dmsaa and 2darraymsaa intrinsic calls into a single image_msaa_load if: - they refer to the same vaddr except for sample_id, - they use a constant sample_id and they fall into the same group, - they have the same dmask and the number of instructions and the number of vaddr/vdata dword transfers is reduced by the combine This should be valid on all GFX11 but a hardware bug renders it unworkable on GFX11.0.* so it is only enabled for GFX11.5. Based on a patch by Rodrigo Dominguez!	2023-09-26 09:33:49 +01:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Kazu Hirata	a9c7ba964f	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/AMDGPU.h:297:18: error: private field 'TM' is not used [-Werror,-Wunused-private-field]	2023-09-12 14:02:07 -07:00
jwanggit86	b853988e0d	[AMDGPU] Port AMDGPURewriteUndefForPHI to new pass manager (#66008) This patch ports the AMDGPURewriteUndefForPHI pass to the new pass manager. With this, the pass is supported under both the legacy and the new pass managers. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-09-12 13:32:02 -07:00
Matt Arsenault	f7dcabe502	AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPass https://reviews.llvm.org/D157660	2023-09-02 12:02:36 -04:00
Matt Arsenault	58e87c961e	AMDGPU: Port AMDGPULowerKernelArguments to new pass manager https://reviews.llvm.org/D157498	2023-08-09 18:34:30 -04:00
Matt Arsenault	9a806551a0	AMDGPU: Delete old PM support for libcall passes This has no reason to run in the codegen pipeline.	2023-08-01 18:22:02 -04:00
Matt Arsenault	5dfdd3494b	AMDGPU: Don't try to fold wavefrontsize intrinsic in libcall simplify It's not a libcall so doesn't really belong here to begin with. Relying on checking the target name and explicit features isn't particularly sound either. The library doesn't use the intrinsic anymore, so it doesn't matter anyway.	2023-08-01 18:20:50 -04:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Christudasan Devadasan	b4a62b1fa5	[AMDGPU] Enable whole wave register copy So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patch, we want the allocator to efficiently allocate them rather than using the custom pre-allocation pass. Any liverange split of virtual registers involved in whole-wave operations requires the resulting COPY introduced with the split to be performed for all lanes. It isn't implemented in the compiler yet. This patch would identify all such copies and manipulate the exec mask around them to enable all lanes without affecting the value of exec mask elsewhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143762	2023-07-07 22:58:55 +05:30
Brendon Cahoon	853b2a84cb	[AMDGPU] Reserve SGPR pair when long branches are present Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register scavenger may not find available registers, which causes a “did not find scavenging index” assert to occur in assignRegToScavengingIndex. In this patch, we estimate before register allocation whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. The distance threshold is an approximation as the exact code size and branch distance are unknown prior to register allocation. Patch by Corbin Robeck. Thanks! Differential Review: https://reviews.llvm.org/D149775	2023-06-29 16:50:46 -05:00
Pravin Jagtap	597fb7fb46	[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy. Atomic optimizer is turned on by default through D152649. This patch removes the usage of old command line option amdgpu-atomic-optimizations and transfers the responsibility to `amdgpu-atomic-optimizer-strategy`. We can safely remove the old option once LLPC removes all of its uses. Reviewed By: foad, arsenm, #amdgpu, cdevadas Differential Revision: https://reviews.llvm.org/D153007	2023-06-22 07:06:42 -04:00
Matt Arsenault	e777da468c	AMDGPU: Delete old AMDGPUPropagateAttributes pass The optimizing, non-broken features have all been moved to AMDGPUAttributor. The only remaining piece of functionality was the broken propagation of the wavesize features. This was fundamentally broken and a hack for device library linking. It doesn't matter when the device libraries are correctly linked and internalized. In case of linked-as-normal-bitcode (as comgr still does), we're reliant on the global subtarget anyway. If we can get away without forcing target-cpu, we should just as well be able to get away without propagating target-features.	2023-06-20 13:05:45 -04:00
Jay Foad	eb7491769a	[AMDGPU] Reimplement the GFX11 early release VGPRs optimization Implement this optimization in SIInsertWaitcnts, where we already have information about whether there might be outstanding VMEM store instructions. This has the following advantages: - Correctly handles atomics-with-return. - Correctly handles call instructions. - Should be faster because it does not require running a separate pass. Differential Revision: https://reviews.llvm.org/D153279	2023-06-19 17:12:54 +01:00
Matt Arsenault	a09f79d227	TargetTransformInfo: Add addrspacesMayAlias For some reason we used to only handle address space aliasing through chaining a target specific AA pass. We need never-fail simple queries in order to lower memmove intrinsics based purely on the address spaces. I also think it would be better if BasicAA checked this, rather than relying on the target AA passes. Currently we go through the more expensive AA analyses before getting to the trivial address space checks.	2023-06-13 20:44:00 -04:00
Matt Arsenault	3c848194f2	CodeGen: Expand memory intrinsics in PreISelIntrinsicLowering Expand large or unknown size memory intrinsics into loops in the default lowering pipeline if the target doesn't have the corresponding libfunc. Previously AMDGPU had a custom pass which existed to call the expansion utilities. With a default no-libcall option, we can remove the libfunc checks in LoopIdiomRecognize for these, which never made any sense. This also provides a path to lifting the immarg restriction on llvm.memcpy.inline. There seems to be a bug where TLI reports functions as available if you use -march and not -mtriple.	2023-06-09 21:04:37 -04:00
Pravin Jagtap	f6c8a8e9cb	[AMDGPU] Iterative scan implementation for atomic optimizer. This patch provides an alternative implementation to DPP for Scan Computations. An alternative implementation iterates over all active lanes of Wavefront using llvm.cttz and performs the following steps: 1. Read the value that needs to be atomically incremented using llvm.amdgcn.readlane intrinsic 2. Accumulate the result. 3. Update the scan result using llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147408	2023-06-09 01:08:44 -04:00
Valery Pykhtin	8d0412ce9d	[AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size. The main purpose of this is to simplify register pressure tracking as after the pass there is no need to track subreg liveness anymore. On the other hand this pass creates more possibilities for the subreg unaware code, as many of the subregs become ordinary registers. Interesting side effect: spill-vgpr.ll has lost a lot of spills. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D139732	2023-05-26 09:05:44 +02:00
Anshil Gandhi	a22ef958cb	[AMDGPUCodegenPrepare] Add NewPM Support Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D151241	2023-05-26 00:20:01 -06:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. These ptr addrspace(8) pointers are 128 bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
Pravin Jagtap	21a69bdb66	[NewPM][AMDGPU] Port amdgpu-atomic-optimizer Reviewed By: arsenm, sameerds, gandhi21299 Differential Revision: https://reviews.llvm.org/D148628	2023-04-20 00:27:47 -04:00
Diana Picus	d9bf8aba23	[AMDGPU] Add MMOs for GFX11 Streamout Instructions The GFX11 NGG Streamout Instructions perform atomic operations on dedicated registers. At the moment, they lack machine memory operands, which causes the si-memory-legalizer pass to treat them conservatively and introduce several unnecessary waits and cache invalidations. This patch introduces a new address space to represent these special registers and teaches instruction selection to add memory operands with this new address space to DS_ADD/SUB_GS_REG_RTN. Since this address space is meant to be compiler-internal, we move it up a bit from the other address spaces and give it the number 128. According to the LLVM Language Reference, address space numbers can go all the way up to 2^24, but I'm not sure how well this is supported in practice [1], so using a smaller number seems safer. [1] `0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)` Differential Revision: https://reviews.llvm.org/D146031	2023-04-11 11:11:32 +02:00
Diana Picus	f8861ea023	clang-format file I'm about to touch. NFCI	2023-04-11 11:11:32 +02:00
Jon Chesterfield	3c76e5f0c8	[amdgpu][nfc] Remove dead code associated with LDS lowering Pass disabled since approximately D104962 for miscompiling openmp. The functions under ReplaceConstant miscompile phis as noted in D112717 and have no users in tree other than the disabled pass. It seems likely it has no users out of tree. Deletes the test cases associated with the disabled pass as well. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D147586	2023-04-05 22:24:22 +01:00
Jeffrey Byrnes	b89236a96f	[AMDGPU] Vectorize misaligned global loads & stores Based on experimentation on gfx906,908,90a and 1030, wider global loads / stores are more performant than multiple narrower ones independent of alignment -- this is especially true when combining 8 bit loads / stores, in which case speedup was usually 2x across all alignments. Differential Revision: https://reviews.llvm.org/D145170 Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1	2023-03-03 13:18:25 -08:00
pvanhout	8e68c12045	[AMDGPU] Remove function with incompatible features Adds a new pass that removes functions if they use features that are not supported on the current GPU. This change is aimed at preventing crashes when building code at O0 that uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();` where ISA_VERSION is not constexpr, and intrinsic_a is not selectable on older targets. This is a pattern that's used all over the ROCm device libs. The main motive behind this change is to allow code using ROCm device libs to be built at O0. Note: the feature checking logic is done ad-hoc in the pass. There is no other pass that needs (or will need in the foreseeable future) to do similar feature-checking logic so I did not see a need to generalize the feature checking logic yet. It can (and should probably) be generalized later and moved to a TargetInfo-like class or helper file. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D139000	2023-02-21 10:42:39 +01:00
Matt Arsenault	e9c49901a4	AMDGPU/GlobalISel: Add stub custom regbankselect pass Uniformity analysis needs to be the fundamental basis for regbank decisions. The considerations of the default pass are secondary, but potentially useful for some edge cases (e.g. selecting AGPRs when arbitrary loads and stores can directly use them). This needs to be a separate pass since it requires new analysis dependencies. Boilerplate to subclass the existing pass which does nothing different.	2023-01-30 16:18:20 -04:00
Nick Desaulniers	ad99774a5f	[llvm][PassSupport] don't require passes to be default constructible Quite a few passes are not default constructible. In order to properly support -{start|stop}-{before|after}= for these passes, we would like to continue to use INITIALIZE_PASS, but not necessarily provide a default constructor. Delete the default constructors of classes derived from SelectionDAGISel. Link: https://github.com/llvm/llvm-project/issues/59538 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D140349	2022-12-20 14:07:29 -08:00
Matt Arsenault	f23f26032d	AMDGPU: Port AMDGPUCtorDtorLowering to new PM	2022-12-09 13:43:38 -05:00
Ruiling Song	cf14c7caac	AMDGPU: Add a pass to rewrite certain undef in PHI For the pattern of IR (%if terminates with a divergent branch.), divergence analysis will report %phi as uniform to help optimal code generation. ``` %if | \ | %then | / %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ] ``` In the backend, %phi and %uniform will be assigned a scalar register. But the %undef from %then will make the scalar register dead in %then. This will likely cause the register to be overwritten in %then. To fix the issue, we will rewrite %undef as %uniform. For details, please refer to the comment in AMDGPURewriteUndefForPHI.cpp. Currently there are no test changes shown, but this is mandatory for later changes. Reviewed by: sameerds Differential Revision: https://reviews.llvm.org/D133840	2022-09-26 09:54:47 +08:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Jay Foad	0f94d2b385	[AMDGPU] GFX11: automatically release VGPRs at the end of the shader GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs. Differential Revision: https://reviews.llvm.org/D128442	2022-06-30 20:55:14 +01:00
Jay Foad	cfb7ffdec0	[AMDGPU] New AMDGPUInsertDelayAlu pass Differential Revision: https://reviews.llvm.org/D128270	2022-06-29 21:30:20 +01:00
Ivan Kosarev	6ddf2a824d	[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling. As older waves execute long sequences of VALU instructions, this may prevent younger waves from address calculation and then issuing their VMEM loads, which in turn leads the VALU unit to idle. This patch tries to prevent this by temporarily raising the wave's priority. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D124246	2022-04-27 14:37:18 +01:00
Matt Arsenault	203a1e36ed	Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" This reverts commit 8a85be807bd453eb9c88d0126c75fd5ea393f60d. The unrelated failure this exposed was fixed.	2022-04-11 19:43:37 -04:00
serge-sans-paille	e188aae406	Cleanup header dependencies in LLVMCore Based on the output of include-what-you-use. This is a big chunk of changes. It is very likely to break downstream code unless they took a lot of care in avoiding hidden header dependencies, something the LLVM codebase doesn't do that well :-/ I've tried to summarize the biggest change below: - llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h - llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h - llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h - llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h - llvm/IR/LegacyPassManager.h no longer includes llvm/Pass.h - llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h - llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h And the usual count of preprocessed lines: $ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l before: 6400831 after: 6189948 200k lines less to process is not that bad ;-) Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D118652	2022-02-02 06:54:20 +01:00
Ron Lieberman	8a85be807b	Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" Offload abort in Nekbone This reverts commit 2b4876157562bc76e86f193d371348993905bc61.	2021-12-16 21:21:32 +00:00
Matt Arsenault	2b48761575	AMDGPU: Remove AMDGPUFixFunctionBitcasts pass This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls are supposed to work now, so drop the hack.	2021-12-15 18:20:48 -05:00
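
The iterative scan described in commit f6c8a8e9cb (visit each active lane via cttz, read its value, accumulate, and write back the intermediate result) can be sketched in plain Python. This is only an illustrative model, not the pass's actual code: the names `cttz` and `iterative_scan` are hypothetical, lanes are modeled as list indices, the exec mask as an integer bitmask, and the choice of an exclusive prefix sum for the per-lane result is an assumption made for the example.

```python
def cttz(mask: int) -> int:
    """Count trailing zeros: index of the lowest set bit (mask must be non-zero)."""
    return (mask & -mask).bit_length() - 1


def iterative_scan(values, exec_mask):
    """Model of a one-lane-per-iteration scan over the active lanes.

    values    -- per-lane operands (stand-in for what readlane would return)
    exec_mask -- bit i set means lane i is active
    Returns (total, scan) where scan[i] is the exclusive prefix sum for
    active lane i and None for inactive lanes (assumption for illustration).
    """
    scan = [None] * len(values)
    accumulator = 0
    remaining = exec_mask
    while remaining:
        lane = cttz(remaining)       # pick the next active lane
        scan[lane] = accumulator     # models writing the intermediate result (writelane)
        accumulator += values[lane]  # models reading the lane's value (readlane) and accumulating
        remaining &= remaining - 1   # clear the lowest set bit so the loop advances
    return accumulator, scan


total, scan = iterative_scan([1, 2, 3, 4], 0b1011)
```

With lanes 0, 1 and 3 active, the loop runs three times; the inactive lane 2 is never visited and contributes nothing to the total.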


198 Commits