llvm-project

Author	SHA1	Message	Date
Anna Thomas	b2195bc771	[SelectionDAG][AArch64] Legalize FMAXIMUM/FMINIMUM The missing legalization in SelectionDAG was identified when adding the intrinsic support for vector reduction for maximum/minimum (D152370). Fixes part of PR: https://github.com/llvm/llvm-project/issues/63267 Differential Revision: https://reviews.llvm.org/D152718	2023-06-12 12:22:21 -04:00
Kazu Hirata	9eea63bc9c	[AMDGPU] Fix resource-usage-pal.ll	2023-06-12 08:06:46 -07:00
Baptiste	3604fdf18d	[AMDGPU] Do not assume stack size for PAL code object indirect calls There is no need to set a big default stack size for PAL code object indirect calls. The driver knows the max recursion depth, so it can compute a more accurate value from the minimum scratch size. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D150609	2023-06-12 10:14:17 -04:00
Ivan Kosarev	d09fa8ff2c	[AMDGPU][GFX11] Add test coverage for cases involving conversions from and to fp16 values. Other such tests, of which there are many, are to be updated with separate patches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152557	2023-06-12 13:04:40 +01:00
Francesco Petrogalli	45902a25fa	[MISched] Require asserts and AArch64 registered target for test. Fixes failure at https://lab.llvm.org/buildbot/#/builders/124/builds/7472: ``` llc: Unknown command line argument '-debug-only=machine-scheduler'. Try: '/home/buildbot/as-worker-91/clang-with-lto-ubuntu/build/stage1/bin/llc --help' ``` Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D152703	2023-06-12 13:51:19 +02:00
Simon Pilgrim	3d34f7be73	[GlobalIsel][X86] Rename x86_64-select-fcmp.mir to select-fcmp.mir and add 32-bit test coverage x86_64 was being used as shorthand for SSE2	2023-06-12 12:41:27 +01:00
Simon Pilgrim	22c17c6a1f	[GlobalIsel][X86] Move G_FCMP getActionDefinitionsBuilder out of setLegalizerInfo64bit and add 32-bit support We were using x86_64-only support as a SSE2 proxy	2023-06-12 12:18:37 +01:00
Simon Pilgrim	6a12ab874a	[GlobalIsel][X86] Regenerate legalize-fcmp.mir	2023-06-12 12:18:36 +01:00
Simon Pilgrim	1a576aa09d	[GlobalIsel][X86] Rename x86_64-legalize-fcmp to legalize-fcmp 32-bit support will be added shortly - x86_64 was being used as a shorthand for SSE2	2023-06-12 12:18:36 +01:00
Luke Lau	2a1716dec5	[LegalizeTypes][VP] Widen load/store of fixed length vectors to VP ops If we have a load/store with an illegal fixed length vector result type that needs widened, e.g. `x:v6i32 = load p` Instead of just widening it to: `x:v8i32 = load p` We can widen it to the equivalent VP operation and set the EVL to the exact number of elements needed: `x:v8i32 = vp_load a, b, mask=true, evl=6` Provided that the target supports vp_load/vp_store on the widened type. Scalable vectors are already widened this way where possible, so this largely reuses the same logic. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D148713	2023-06-12 10:21:04 +01:00
Francesco Petrogalli	15a16ef8e0	[MISched] Use StartAtCycle in trace dumps. This commit reworks the methods that dump traces with resource usage to take into account the StartAtCycle value added by https://reviews.llvm.org/D150310. For each i, the values of the lists StartAtCycle and ReservedCycles are printed with the interval [StartAtCycle[i], ReservedCycles[i]) ``` ... \| StartAtCycle[i] \| ... \| ReservedCycles[i] - 1 \| ReservedCycles[i] \| ... \| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \| \| ``` Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D150311	2023-06-12 09:11:48 +02:00
Fangrui Song	849f1dd15e	[XRay] Rename XRayOmitFunctionIndex to XRayFunctionIndex Apply my post-commit comment on D81995. The negative name misguided commit d8a8e5d6240a1db809cd95106910358e69bbf299 (`[clang][cli] Remove marshalling from Opt{In,Out}FFlag`) to: * accidentally flip the option to not emit the xray_fn_idx section. * change -fno-xray-function-index (instead of -fxray-function-index) to emit xray_fn_idx This patch renames XRayOmitFunctionIndex and makes -fxray-function-index emit xray_fn_idx, but the default remains -fno-xray-function-index .	2023-06-11 15:27:22 -07:00
Oleksii Lozovskyi	c72dea88b6	[AArch64][ARM][X86] Split XRay tests for Linux/macOS XRay instrumentation works for macOS running on Apple Silicon, but codegen is untested there. I'm going to make changes affecting this target, get the XRay tests running on AArch64. Data sections are going to become slightly different on x86_64 soon. I do want the tests to be specific about symbol names, so instead of having test check the common step, bifurcate tests a bit and check the full symbol names. As for ARM, XRay is not really supported on iOS at the moment, though ARM is also really used there with modern phones. Nevertheless, codegen tests exist and the output is going to change a little, make it easier to write the special case for iOS. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D145291	2023-06-11 12:53:29 -07:00
Simon Pilgrim	26706807c9	[GlobalIsel][X86] Ensure bit count legalizer patterns keep matching result + input scalar types	2023-06-11 18:56:28 +01:00
Ben Shi	71d90f3108	[AVR] Optimize 8-bit rotation when rotation bits == 3 Fixes https://github.com/llvm/llvm-project/issues/63100 Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D152365	2023-06-11 08:41:47 +08:00
Ben Shi	e21df8296d	[AVR] Optimize 8-bit rotation when rotation bits >= 4 Fixes https://github.com/llvm/llvm-project/issues/63100 Reviewed By: aykevl, Patryk27, jacquesguan Differential Revision: https://reviews.llvm.org/D152130	2023-06-11 08:36:22 +08:00
Noah Goldstein	b6808ba291	[X86] Make constant `mul` -> `shl` + `add`/`sub` work for vector types Something like: `%r = mul %x, <33, 33, 33, ...>` Is best lowered as: `%tmp = shl %x, <5, 5, 5, ...>; %r = add %tmp, %x` As well, since vectors have non-destructive shifts, we can also do cases where the multiply constant is `Pow2A +/- Pow2B` for arbitrary A and B, unlike in the scalar case where the extra `mov` instructions make it not worth it. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D150324	2023-06-10 14:38:46 -05:00
Matt Arsenault	6d2e5c3445	LowerMemIntrinsics: Skip memmove with different address spaces This is a quick fix for an assert when the source and dest have different address spaces. The pointer compare needs to have matching types, but we can't generically introduce addrspacecast and we don't know if the address spaces alias.	2023-06-10 12:28:05 -04:00
Ben Shi	f3837e726f	[AVR] Fix incorrect expansion of pseudo instruction ROLBRd Since ROLBRd needs an implicit R1 (on AVR) or an implicit R17 (on AVRTiny), we split ROLBRd to ROLBRdR1 (on AVR) and ROLBRdR17 (on AVRTiny). Reviewed By: aykevl, Patryk27 Differential Revision: https://reviews.llvm.org/D152248	2023-06-11 00:20:43 +08:00
Ben Shi	cef723a0fe	[AVR] Enable sub register liveness Reviewed By: Patryk27 Differential Revision: https://reviews.llvm.org/D152606	2023-06-11 00:16:35 +08:00
Ben Shi	3b8c12c18e	[AVR][NFC] Improve CodeGen tests Reviewed By: Patryk27 Differential Revision: https://reviews.llvm.org/D152605	2023-06-11 00:15:20 +08:00
Matt Arsenault	abff7668ab	AMDGPU: Implement known bits functions for min3/max3/med3	2023-06-10 10:58:44 -04:00
Matt Arsenault	f24de950e5	AMDGPU: Add baseline tests for known bits handling of med3	2023-06-10 10:58:39 -04:00
Matt Arsenault	5b657f50b8	AMDGPU: Move LICM after AMDGPUCodeGenPrepare The commit that added the run says it's to hoist uniform parts of integer division expansion. That expansion is performed later, so this didn't do anything in that case. Move this later so the original test shows the improvement. This also saves a run of "Canonicalize natural loops". Not sure why this appears to be still getting a separate loop PM run. Also feels a bit heavy to run this just for divide. Is there a way to specifically hoist the divide sequence when it expands?	2023-06-10 07:37:32 -04:00
Thorsten Schütt	24f49deacf	[GlobalIsel][X86] Legalize G_FREEZE Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152501	2023-06-10 07:33:02 +02:00
Matt Arsenault	4c0fc4841b	AMDGPU: Mark scalar loads as rematerializable This should be true, but this is useless as is. The rematerialization logic only permits rematerialization with constant physical register uses, so non-constant physregs or virtual register uses (the case that really matters) are not rematerialized. Add tests which show that nothing happens now, but should in the future. Also, all loads should really be rematerializable, so in the future this should apply to all the other kinds.	2023-06-09 21:20:21 -04:00
Matt Arsenault	4e4c351ae5	AMDGPU: Avoid endpgm in middle of block for fallback trap lowering. This was inserting an s_endpgm in the middle of the block when it has to be a terminator. Split the block and insert a branch to a new block with the trap if it's not in a terminator position. Fixes verifier error on LDS in function with no trap support (and other trap sources).	2023-06-09 21:04:38 -04:00
Matt Arsenault	3c848194f2	CodeGen: Expand memory intrinsics in PreISelIntrinsicLowering Expand large or unknown size memory intrinsics into loops in the default lowering pipeline if the target doesn't have the corresponding libfunc. Previously AMDGPU had a custom pass which existed to call the expansion utilities. With a default no-libcall option, we can remove the libfunc checks in LoopIdiomRecognize for these, which never made any sense. This also provides a path to lifting the immarg restriction on llvm.memcpy.inline. There seems to be a bug where TLI reports functions as available if you use -march and not -mtriple.	2023-06-09 21:04:37 -04:00
Matt Arsenault	4469aff148	AMDGPU: Add baseline tests for integer mad matching Test some clpeak-like patterns with multiple use muls.	2023-06-09 19:17:56 -04:00
Amara Emerson	6f6298e5b3	[GlobalISel] Fix D144336 in a different way, by choosing operands from the first of the div/rem insts. Differential Revision: https://reviews.llvm.org/D144336	2023-06-09 15:06:06 -07:00
Artem Belevich	8006c7e3f2	[NVPTX] Remove few more unneeded fp16 instruction variants Differential Revision: https://reviews.llvm.org/D152478	2023-06-09 12:09:08 -07:00
Amara Emerson	1c2c668846	[GlobalISel] Introduce G_CONSTANT_FOLD_BARRIER and use it to prevent constant folding hoisted constants. The constant hoisting pass tries to hoist large constants into predecessors and also generates remat instructions in terms of the hoisted constants. These aim to prevent codegen from rematerializing expensive constants multiple times. So we can re-use this optimization, we can preserve the no-op bitcasts that are used to anchor constants to the predecessor blocks. SelectionDAG achieves this by having the OpaqueConstant node, which is just a normal constant with an opaque flag set. I've opted to avoid introducing a new constant generic instruction here. Instead, we have a new G_CONSTANT_FOLD_BARRIER operation that constitutes a folding barrier. These are somewhat like the optimization hints, G_ASSERT_ZEXT in that they're eliminated by the generic instruction selection code. This change by itself has very minor improvements in -Os CTMark overall. What this does allow is better optimizations when future combines are added that rely on having expensive constants remain unfolded. Differential Revision: https://reviews.llvm.org/D144336	2023-06-09 11:45:06 -07:00
Caleb Zulawski	18077e9fd6	[WebAssembly] Re-land 8392bf6000ad Correctly handle single-element vectors to fix an assertion failure. Add tests that were missing from the original commit. Differential Revision: D151782	2023-06-09 08:42:27 -07:00
Simon Pilgrim	0662167c5b	[GlobalIsel][X86] Update legalization of G_PTR_ADD Replace the legacy legalizer versions Add test coverage for 32-bit targets and non-constant ptr offsets	2023-06-09 13:27:25 +01:00
pvanhout	df1782c2a2	[MCP] Do not remove redundant copy for COPY from undef I don't think we can safely remove the second COPY as redundant in such cases. The first COPY (which has undef src) may be lowered to a KILL instruction instead, resulting in no COPY being emitted at all. Testcase is X86 so it's in the same place as other testcases for this function, but this was initially spotted on AMDGPU with the following: ``` renamable $vgpr24 = PRED_COPY undef renamable $vgpr25, implicit $exec renamable $vgpr24 = PRED_COPY killed renamable $vgpr25, implicit $exec ``` The second COPY was removed as redundant, and the first one was lowered to a KILL (= removed too), causing $vgpr24 to not have $vgpr25's value. Fixes SWDEV-401507 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152502	2023-06-09 14:23:57 +02:00
David Stuttard	90431ca2e0	Reland [AMDGPU] New PAL metadata updates to ps_extra_lds_size and float_mode New metadata format contains full calculation of field contents for ps_extra_lds_size (vs old format where the value in RSRC register is used by PAL to calculate the value required). Also stop updating float_mode and rely on front end settings for this field. Differential Revision: https://reviews.llvm.org/D152247	2023-06-09 12:34:00 +01:00
Simon Pilgrim	b8f053f5d7	[GlobalIsel][X86] Add 32-bit test coverage to zero count tests This shows a current problem with G_CTTZ_ZERO_UNDEF result legalizations	2023-06-09 10:36:32 +01:00
Simon Pilgrim	2717d98a1e	[GlobalIsel][X86] legalize-select.mir - add x86-64 test coverage	2023-06-09 10:36:32 +01:00
pvanhout	ecbd37d5a3	[AMDGPU] Port no-hsa-graphic-shaders.ll to code object V4 Split from D146023 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152432	2023-06-09 09:07:53 +02:00
Pravin Jagtap	f6c8a8e9cb	[AMDGPU] Iterative scan implementation for atomic optimizer. This patch provides an alternative implementation to DPP for Scan Computations. An alternative implementation iterates over all active lanes of Wavefront using llvm.cttz and performs the following steps: 1. Read the value that needs to be atomically incremented using llvm.amdgcn.readlane intrinsic 2. Accumulate the result. 3. Update the scan result using llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147408	2023-06-09 01:08:44 -04:00
Amara Emerson	086601eac2	[GlobalISel] Implement some binary reassociations, G_ADD for now - (op (op X, C1), C2) -> (op X, (op C1, C2)) - (op (op X, C1), Y) -> (op (op X, Y), C1) Some code duplication with the G_PTR_ADD reassociations unfortunately but no easy way to avoid it that I can see. Differential Revision: https://reviews.llvm.org/D150230	2023-06-08 21:14:58 -07:00
Phoebe Wang	c778ca201e	[X86][BF16] Split vNbf16 vectors according to vNf16 Fixes #63017 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D151778	2023-06-09 09:04:56 +08:00
Phoebe Wang	7634905a73	[X86][BF16] Share FP16 vector ABI with BF16 The ABI of BF16 is identical to FP16 rather than i16. Fixes #62997 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D151710	2023-06-09 09:04:56 +08:00
Heejin Ahn	90073e8de3	[WebAssembly] Error out on invalid personality functions Without explicitly checking and erroring out, an invalid personality function, which is not `__gxx_wasm_personality_v0`, caused a segmentation fault down the line because `WasmEHFuncInfo` was not created. This explicitly checks the validity of personality functions in functions with EH pads and errors out explicitly with a helpful error message. This also adds some more assertions to ensure `WasmEHFuncInfo` is correctly created and non-null. Invalid personality functions wouldn't be generated by our Clang, but can be present in handwritten ll files, and more often, in files transformed by passes like `metarenamer`, which is often used with `bugpoint` to simplify names in `bugpoint`-reduced files. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D152203	2023-06-08 16:59:49 -07:00
Thomas Lively	100c756d96	Revert "Improve WebAssembly vector bitmask, mask reduction, and extending" This reverts commit 8392bf6000ad039bd0e55383d40a05ddf7b4af13. The commit missed some edge cases that led to crashes. Reverting to resolve downstream breakage while a fix is pending.	2023-06-08 14:36:29 -07:00
Matt Arsenault	c01f284fbb	AMDGPU: Fix regressions in integer mad matching Undo the canonicalize done in 0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c. Restores some regressed matching of integer mad. The selection patterns for the actual mads don't seem to be properly commuting, so some of the commuted cases are still missed. Fixes: SWDEV-363009	2023-06-08 16:48:47 -04:00
Artem Belevich	c16b7e54ac	[NVPTX] Allow using v4i32 for memcpy lowering. Differential Revision: https://reviews.llvm.org/D152317	2023-06-08 13:28:43 -07:00
Krzysztof Parzyszek	c6ddd04d73	[RDF] Create individual phi for each indivisible register This isn't quite using register units, but it's getting close. The phi generation is driven by register units, but each phi still contains a reference to a register, potentially with a mask that amounts to a unit. In cases of explicit register aliasing this may still create phis with references that are aliased, whereas separate phis would ideally contain disjoint references (this is all within a single basic block). Previously phis used maximal registers, now they use minimal. This is a step towards both, using register units directly, and a simpler liveness calculation algorithm. The idea is that a phi cannot reach a reference to anything smaller than the phi itself represents. Before there could be a phi for R1_R0, now there will be two for this case (assuming R0 and R1 have one unit each).	2023-06-08 11:07:57 -07:00
Thorsten Schütt	0b771c679a	[GlobalIsel][X86] Legalize G_SELECT with bug fixes Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152445	2023-06-08 18:54:25 +02:00
Craig Topper	167f2fa1b6	[RISCV] Fix crash in lowerVECTOR_INTERLEAVE when VecVT is an LMUL=8 type. If VecVT is an LMUL=8 VT, we can't concatenate the vectors as that would create an illegal type. Instead we need to split the vectors and emit two VECTOR_INTERLEAVE operations that can each be lowered. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D152411	2023-06-08 08:41:38 -07:00
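
The constant `mul` -> `shl` + `add`/`sub` rewrite described in commit b6808ba291 above can be illustrated with a small model. This is only a sketch of the arithmetic identity, not the X86 backend code: the function names `decompose` and `mul_by_const`, the brute-force search, and the choice of 32-bit unsigned lanes are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

MASK32 = 0xFFFFFFFF  # assumption: model each lane as a wrapping 32-bit integer


def decompose(c: int, bits: int = 32) -> Optional[Tuple[int, int, int]]:
    """Return (a, b, sign) such that c == 2**a + sign * 2**b, or None.

    Hypothetical helper: a brute-force search, chosen for clarity only.
    """
    for a in range(bits):
        for b in range(bits):
            for sign in (1, -1):
                if (1 << a) + sign * (1 << b) == c:
                    return a, b, sign
    return None


def mul_by_const(lanes: List[int], c: int) -> List[int]:
    """Multiply every lane by the splat constant c.

    When c has the form Pow2A +/- Pow2B, use two shifts and one add/sub
    per lane; otherwise fall back to a plain multiply.
    """
    d = decompose(c)
    if d is None:
        return [(x * c) & MASK32 for x in lanes]
    a, b, sign = d
    return [((x << a) + sign * (x << b)) & MASK32 for x in lanes]


if __name__ == "__main__":
    # 33 == 2**5 + 2**0, so x * 33 == (x << 5) + x in every lane.
    print(mul_by_const([1, 2, 3], 33))
```

A constant such as 11 cannot be written as the sum or difference of two powers of two, so in this model it takes the plain multiply path.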


48420 Commits