llvm-project

Author	SHA1	Message	Date
Matt Arsenault	4cb110a84f	[RFC] IR: Support atomicrmw FP ops with vector types (#86796) Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x bfloat> on some targets and address spaces. Note this only supports the proper floating-point operations; float vector typed xchg is still not supported. cmpxchg still only supports integers, so this inserts bitcasts for the loop expansion. I have support for fp vector typed xchg, and vector of int/ptr separately implemented but I don't have an immediate need for those beyond feature consistency.	2024-04-06 15:27:45 -04:00
Kevin P. Neal	fe893c93b7	[FPEnv][AtomicExpand] Correct strictfp attribute handling in AtomicExpandPass (#87082) The AtomicExpand pass was lowering function calls with the strictfp attribute to sequences that included function calls incorrectly lacking the attribute. This patch corrects that. The pass now also emits the correct constrained fp call instead of normal FP instructions when in a function with the strictfp attribute. Test changes verified with D146845.	2024-03-29 14:54:51 -04:00
Rishabh Bali	fe42e72db2	[CodeGen] Port AtomicExpand to new Pass Manager (#71220) Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559	2024-02-25 18:42:22 +05:30
James Y Knight	137f785fa6	[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185) This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass, which matches what Clang already does in the frontend. While AMDGPU currently disables the use of all libcalls, I've changed it to instead disable all of them _except_ the atomic ones. Those are already emitted by the Clang frontend, and enabling them in the backend allows the same behavior there.	2023-12-18 16:51:06 -05:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible to differentiate from a missing datalayout `target datalayout = ""` in the IR file since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Alex Richardson	e86d6a43f0	Regenerate test checks for tests affected by D141060	2023-10-04 10:51:35 -07:00
Pravin Jagtap	5f8fd68672	[AMDGPU] Pre-commit test for D157495 Reviewed By: yassingh Differential Revision: https://reviews.llvm.org/D158243	2023-08-18 06:52:32 -04:00
Matt Arsenault	7575ee7167	AMDGPU: Add more test coverage for FP-typed atomicrmw xchg	2023-08-10 17:38:25 -04:00
Matt Arsenault	35be9e2903	AtomicExpand: Preserve syncscope when expanding partword atomics	2023-08-08 14:38:06 -04:00
Matt Arsenault	3371849194	AMDGPU: Round out system atomics tests There were system scope tests only for integer min/max. Expand this to cover all of the integer operations.	2023-08-08 14:38:05 -04:00
Matt Arsenault	b97e9a9a03	AMDGPU: Fix some typed pointers in atomic expand test	2023-08-08 14:38:05 -04:00
Kai Luo	f26af16e2c	[PowerPC][AIX] Enable quadword atomics by default for AIX On AIX, a libatomic supporting inline quadword atomic operations has been released, so that compatibility is not an issue now, we can enable quadword atomics by default. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D151312	2023-07-25 08:21:07 +08:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Krasimir Georgiev	c256e19671	Revert "Revert "IRBuilder: Fix not handling strictfp minnum/maxnum"" This reverts commit 593797ab9bedca6e9b0b7a9ed0589cf76023ab00. I didn't realize that there was already a fix for the broken tests fd2254b7358d0f78a79784688bd8012c1a52b9cf.	2023-07-12 14:13:31 +02:00
Krasimir Georgiev	593797ab9b	Revert "IRBuilder: Fix not handling strictfp minnum/maxnum" This reverts commit 14c3ab945be9c49964dbf79f13d8ff8df1ff7b72. Causes build bot failures.	2023-07-12 11:16:20 +00:00
Matt Arsenault	14c3ab945b	IRBuilder: Fix not handling strictfp minnum/maxnum Removing the rounding mode arguments seems like more trouble than it's worth. minimum and maximum are still broken. https://reviews.llvm.org/D154994	2023-07-11 18:51:50 -04:00
Matt Arsenault	3701ebe76b	AtomicExpand: Fix expanding atomics into unconstrained FP in strictfp functions Ideally the normal fadd/fmin/fmax this was creating would fail the verifier. It's probably also necessary to force off FP exception handlers in the cmpxchg loop but we don't have a generic way to do that now. Note strictfp builder is broken in the minnum/maxnum case https://reviews.llvm.org/D154993	2023-07-11 18:51:15 -04:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers"" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
Stanislav Mekhanoshin	94def1b44e	[AMDGPU] Do not expand fp atomics on gfx940 FP atomics are safe on gfx940. This fixes regression after D131560. Fixes: SWDEV-380468 Differential Revision: https://reviews.llvm.org/D143603	2023-02-08 13:22:04 -08:00
Stanislav Mekhanoshin	3e9f2af27a	[AMDGPU] Update atomic tests. NFC. This is to precommit tests before future patch.	2023-02-08 12:55:34 -08:00
Matt Arsenault	778cf5431c	IR: Add atomicrmw uinc_wrap and udec_wrap These are essentially add/sub 1 with a clamping value. AMDGPU has instructions for these. CUDA/HIP expose these as atomicInc/atomicDec. Currently we use target intrinsics for these, but those do no carry the ordering and syncscope. Add these to atomicrmw so we can carry these and benefit from the regular legalization processes.	2023-01-24 17:55:11 -04:00
Matt Arsenault	143ca74ed3	AtomicExpand: Convert tests to opaque pointers	2022-11-28 08:43:16 -05:00
Manuel Brito	f408635b26	[CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC] Differential Revision: https://reviews.llvm.org/D138483	2022-11-24 08:42:28 +00:00
Nuno Lopes	b50e1bd605	Revert "[CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC]" This reverts commit f50423c1a4422900aa1240fed643f5920451a88d.	2022-11-22 12:41:22 +00:00
Manuel Brito	f50423c1a4	[CodeGen] Use poison instead of undef as placeholder in AtomicExpandPass [NFC] Differential Revision: https://reviews.llvm.org/D138483	2022-11-22 11:40:25 +00:00
gonglingqin	19ae5391e3	[LoongArch] Expand atomicrmw fadd/fsub/fmin/fmax with CmpXChg Differential Revision: https://reviews.llvm.org/D137311	2022-11-14 10:11:37 +08:00
Matt Arsenault	3cfa03856f	AtomicExpand: Support cmpxchg expansion for small FP types Handles f16 atomics for AMDGPU.	2022-11-10 22:16:11 -08:00
Bjorn Pettersson	893e351f2f	[test] Avoid legacy PM default pipelines (O0,O1 etc) when running opt Two lit tests were found running something like this: opt -O<n> -pass-locked-to-legacy-PM ... The expand-atomicrmw-xchg-fp.ll seem to have used -O1 just to ensure that the -atomic-expand pass were thinking that it wasn't running at O0 level. Same thing can be ensured by using the -codegen-opt-level=1 option, making it possible to avoid using O1 in that test case. In the vector-reductions-expanded.ll test case it was possible to split the RUN line into using two opt invocations. First running "opt -O2" using the new PM, and then running "opt -expand-reductions" using the legacy PM. I think that given this patch we get closer to removing code related to 'AddOptimizationPasses' in opt.cpp. Differential Revision: https://reviews.llvm.org/D137626	2022-11-09 09:57:57 +01:00
Shilei Tian	1186e9d59f	[LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space The 32-bit floating-point atomic add instructions on AMDGPUs does not support a "flat" or "generic" address space. So, if the address space cannot be determined statically, the AMDGPU backend will fall back to a CAS loop (which does support "flat" addressing). Instead, this patch emits runtime address-space checks to allow native FP atomic add instructions for global and LDS memory (and non-atomic FP add instructions for private/scratch memory). In order to do that, this patch introduces a new interface function `emitExpandAtomicRMW`. It is expected to be called when a common atomic expand doesn't work for a specific target, such as the case we discussed here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129690	2022-11-04 14:11:05 -04:00
Matt Arsenault	b60a9ccd02	AtomicExpand: Use InstSimplifyFolder Automatically cleanup operations if we know the atomic has higher alignment.	2022-10-31 23:31:42 -07:00
Matt Arsenault	07f12170a2	AtomicExpand: Don't create unused instructions for some atomicrmw This wasn't used by every atomicrmw expansion.	2022-10-31 18:34:36 -07:00
Matt Arsenault	d0750ec475	AtomicExpand: Avoid some operations if the atomic is overaligned Let some of the pointer bithacking fold away if we know the LSB are 0.	2022-10-13 23:31:00 -07:00
Matt Arsenault	01adf1f3e5	AtomicExpand: Add some more overaligned atomic tests	2022-09-28 12:51:30 -04:00
Matt Arsenault	a61c3455c0	AtomicExpand: Use llvm.ptrmask instead of ptrtoint This removes the ptrtoint from the load's pointer operand, although we can't entirely eliminate these to get the LSB shift. In a future patch, this will avoid ptrtoint in the case where the atomic is overaligned to the word size.	2022-09-28 12:51:30 -04:00
Petar Avramovic	5cee9047d5	AMDGPU: Improve atomicrmw fadd selection Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11 as for gfx90a. Add missing globalisel legalizer support for flat atomicrmw fadd f32 on gfx940 and gfx11. Isel support for gfx11 will be added in D130579. Differential Revision: https://reviews.llvm.org/D131560	2022-09-23 17:52:10 +02:00
Petar Avramovic	48968c47b0	AMDGPU: Add detailed buffer, global and flat atomic fadd tests Precommit for D130579 that will remove manual selection and use patterns from td files. Tests are grouped based on target features. All patterns have rtn and no-rtn versions. buffer atomics patterns are selected based on the intrinsic used (raw or struct) and the offset operand (imm or vgpr): _offset raw with imm offset _offen raw with vgpr offset (or large imm offset) _idxen struct with imm offset _bothen struct with vgpr offset (or large imm offset) global and flat atomics are selected via intrinsic or the atomicrmw fadd. atomicrmw tests have amdgpu-unsafe-fp-atomics=true and non-system scope since they get expanded otherwise. atomicrmw fadd does not support vector type, test float and double. global atomics patterns are selected based on address type via (global or flat) intrinsic or atomicrmw fadd with global address(addrspace(1)). 'no suffix' vgpr addrspace(1) address _saddr sgpr addrspace(1)* address flat atomics patterns are selected via (flat)intrinsic or atomicrmw fadd with flat address (* - address space 0). Differential Revision: https://reviews.llvm.org/D131561	2022-09-23 17:52:10 +02:00
Matt Arsenault	b9a371f6d1	AtomicExpand: Use correct pointer size for integer This was using the default address space.	2022-09-20 16:51:05 -04:00
Matt Arsenault	4d322ba77b	AMDGPU: Add baseline test for expansion of 16-bit local atomics The expansion is currently using the wrong pointer size.	2022-09-20 16:51:05 -04:00
Matt Arsenault	784d2930c0	AtomicExpand: Switch test to generated checks	2022-09-20 16:51:05 -04:00
Matt Arsenault	28e03692ae	AMDGPU: Fix expansion of 16-bit atomicrmw Fixes issue 57830	2022-09-20 14:47:40 -04:00
Matt Arsenault	a4b1f7a8b5	AMDGPU: Add some tests for atomics with excess alignment	2022-09-19 19:27:21 -04:00
Matt Arsenault	3f77df8e29	AMDGPU: Update baseline test checks	2022-09-19 18:57:33 -04:00
Marco Elver	f0d6709e4a	[AtomicExpandPass] Always copy pcsections Metadata to expanded atomics When expanding IR atomics to target-specific atomics, copy all !pcsections Metadata to expanded atomics automatically. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130885	2022-09-07 11:36:01 +02:00
Kai Luo	ad2f7fd286	[AtomicExpand] Make floating point conversion happens before fence insertion IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations. This also fixes atomic load of floating point values which requires fence on PowerPC. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D127609	2022-08-31 09:54:58 +08:00
gonglingqin	e9a4b8e397	[LoongArch] Optimize the atomic store with amswap_db.[w/d] When AtomicOrdering is release or stronger, use amswap_db.[w/d] $zero, $a1, $a0 instead of dbar 0 st.[w/d] $a0, $a1, 0 Thanks to @xry111 for the suggestion: https://reviews.llvm.org/D128901#3626635 Differential Revision: https://reviews.llvm.org/D129838	2022-08-23 17:11:57 +08:00
gonglingqin	47f3dc6d49	[LoongArch] Add codegen support for atomic fence, atomic load and atomic store Differential Revision: https://reviews.llvm.org/D128901	2022-07-13 15:25:45 +08:00
Kai Luo	6710b21d46	[PowerPC] Allow llvm.ppc.cfence to accept pointer types In the context of atomic load, integer, pointer and float point types are allowed, thus we should allow llvm.ppc.cfence to accept any type mentioned. Fixes https://github.com/llvm/llvm-project/issues/55983. Reviewed By: shchenz, vchuravy Differential Revision: https://reviews.llvm.org/D127554	2022-06-24 10:55:32 +08:00

105 Commits