llvm-project

Author	SHA1	Message	Date
Vladislav Dzhidzhoev	e69916c943	[AArch64][GlobalISel] Legalize integer across-lane intrinsics with actual type Across-lane intrinsics with integer destination type (uaddv, saddv, umaxv, smaxv, uminv, sminv) were legalized with the destination type given in the LLVM IR intrinsic’s definition. It was wider than the actual destination type of the corresponding machine instruction. InstructionSelect was implicitly supposed to generate underlying extension instructions for these intrinsics, while the real destination type was opaque for other GlobalISel passes. Thus, llvm/test/CodeGen/AArch64/arm64-vaddv.ll failed on GlobalISel since the generated code was worse in functions that used the value of an across-lane intrinsic in following FP&SIMD instructions (functions with _used_by_laneop suffix). Here intrinsics are legalized and selected with an actual destination type, making it transparent to other passes. If the destination value is used in further instructions accepting FPR registers, there won’t be extra copies across register banks. i16 type is added to the list of the types of the FPR16 register bank to make it possible, and a few SelectionDAG patterns are modified to eliminate ambiguity in TableGen. Differential Revision: https://reviews.llvm.org/D156831	2023-08-17 18:19:56 +02:00
David Green	cf65afbf93	[AArch64][GISel] Extend lowering for fp round intrinsics. This extends the lowering of ceil, floor, nearbyint, rint, round, roundeven and trunc. They are all very similar, so can reuse the same legalization info. selectIntrinsicTrunc and selectIntrinsicRound can be removed as they can be selected via tablegen patterns, and G_INTRINSIC_ROUNDEVEN is marked as a gisel equivalent of froundeven. Otherwise this reuses the existing code, filling it out to handle more types. Differential Revision: https://reviews.llvm.org/D157679	2023-08-17 16:25:32 +01:00
Fraser Cormack	c058eb998a	[AArch64] Fix crash when neither Neon nor SVE are enabled The subtarget was unconditionally reporting that SVE was to be used to lower vectors when Neon was unavailable, even when SVE itself was unavailable. This decision leads other parts of the compiler to crash, e.g., when querying SVE vector sizes. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D158179	2023-08-17 16:07:27 +01:00
Pravin Jagtap	af5fd142d3	[AMDGPU] Extend f32 support for llvm.amdgcn.update.dpp intrinsic This will be useful to avoid the bit-casting noise required to extend support for Floating Point Operations in atomic optimizer for DPP in D156301 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D156647	2023-08-17 10:45:19 -04:00
Christudasan Devadasan	81827f8cfb	[AMDGPU] Support wwm-reg AV spill pseudos The wwm register spill pseudos are currently defined for VGPR_32 regclass. It causes a verifier error for gfx908 or above as the regalloc sometimes restores the values to the vector superclass AV_32. Fixing it by supporting AV wwm-spill pseudos as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155646	2023-08-17 20:04:18 +05:30
Tuan Chuong Goh	74e191da07	[AArch64][GlobalISel] Combiner for EXT Keep components of UNMERGE larger after running the Artifact Combiner on it. This was intended to help with <v16i64> = G_SEXT <v16i16>, but implementation for legalizing EXT is in a following patch, therefore a test for this case will be included in the following patch Differential Revision: https://reviews.llvm.org/D157715	2023-08-17 14:34:45 +01:00
Simon Pilgrim	a342f9802b	[X86] dagcombine-shifts.ll - add i686 test coverage	2023-08-17 13:35:05 +01:00
Simon Pilgrim	3c2432690a	[X86] Remove combineVectorSignBitsTruncation and leave TRUNCATE -> PACKSS/PACKUS to legalization/lowering Don't prematurely fold TRUNCATE nodes to PACKSS/PACKUS target nodes - we miss out on generic TRUNCATE folds. Helps some regressions from D152928 and #63946 Fixes #63710	2023-08-17 12:23:29 +01:00
Simon Pilgrim	dd7ba38078	[X86] matchTruncateWithPACK - consistently prefer shuffles for truncation to sub-64-bit vXi16 If we're truncating from v2i32 / v2i64 then PSHUFLW / PSHUFD+PSHUFLW should more easily allow further shuffle combines than a PACK chain will	2023-08-17 12:23:28 +01:00
Nikita Popov	6f1d9fb987	[X86] Add some i128 argument passing tests (NFC)	2023-08-17 12:00:37 +02:00
Nabeel Omer	d06608060c	[X86] Fix aliasing check between TargetFrameIndex and FrameIndex Compare slot indices instead of comparing pointer values. Closes #63645 Differential Revision: https://reviews.llvm.org/D157513	2023-08-17 10:48:08 +01:00
David Green	8fc6b1a18f	[AArch64] Add some vcvt tests. NFC. See D157679. This also removes some duplication from arm64-vfloatintrinsics.ll, where the tests now exist elsewhere.	2023-08-17 10:41:52 +01:00
Jonas Hahnfeld	eeac4321c5	Disable two tests without {arm,aarch64}-registered-target	2023-08-17 10:04:38 +02:00
Han Shen	317a0fe5bd	[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation. When building a fatbinary, the driver invokes the compiler multiple times with different "--target". (For example, with "-x cuda --cuda-gpu-arch=sm_70" flags, clang will be invoked twice, once with --target=x86_64_...., once with --target=sm_70) If we use -fsplit-machine-functions or -fno-split-machine-functions for such invocation, the driver reports an error. This CL changes the behavior so: - "-fsplit-machine-functions" is now passed to all targets, for non-X86 targets, the flag is a NOOP and causes a warning. - "-fno-split-machine-functions" now negates -fsplit-machine-functions (if -fno-split-machine-functions appears after any -fsplit-machine-functions) for any target triple, previously, it causes an error. - "-fsplit-machine-functions -Xarch_device -fno-split-machine-functions" enables MFS on host but disables MFS for GPUs without warnings/errors. - "-Xarch_host -fsplit-machine-functions" enables MFS on host but disables MFS for GPUs without warnings/errors. Reviewed by: xur, dhoekwater Differential Revision: https://reviews.llvm.org/D157750	2023-08-16 23:41:34 -07:00
Fangrui Song	4c89277095	[Mips][MC] AttemptToFoldSymbolOffsetDifference: revert isMicroMips special case D52985/D57677 added a .gcc_except_table workaround, but the new behavior doesn't match GNU assembler. ``` void foo(); int bar() { foo(); try { throw 1; } catch (int) { return 1; } return 0; } clang --target=mipsel-linux-gnu -mmicromips -S a.cc mipsel-linux-gnu-gcc -mmicromips -c a.s -o gnu.o .uleb128 ($cst_end0)-($cst_begin0) // bit 0 is not forced to 1 .uleb128 ($func_begin0)-($func_begin0) // bit 0 is not forced to 1 ``` I have inspected `.gcc_except_table` output by `mipsel-linux-gnu-gcc -mmicromips -c a.cc`. The `.uleb128` values are not forced to set the least significant bit. In addition, D57677's adjustment (even->odd) to CodeGen/Mips/micromips-b-range.ll is wrong. PC-relative `.long func - .` values will differ from GNU assembler as well. The original intention of D52985 seems unclear to me. I think whatever goal it wants to achieve should be moved to an upper layer. This isMicroMips special case has caused problems to fix MCAssembler::relaxLEB to use evaluateAsAbsolute instead of evaluateKnownAbsolute, which is needed to properly support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128. Differential Revision: https://reviews.llvm.org/D157655	2023-08-16 23:11:59 -07:00
Justin Bogner	72017fcf00	[DirectX] Only embed dxil when writing object files When emitting assembly we don't particularly want the binary DXIL embedded in the output. This was mostly there for testing purposes, so we update those tests to run the test directly using `opt` and restrict the -dxil-embed and -dxil-globals passes to running normally only in the case where we're trying to emit a DXContainer. Differential Revision: https://reviews.llvm.org/D158051	2023-08-16 13:12:32 -07:00
Daniel Hoekwater	2c43d591c6	[CodeGen] Move function splitting tests from X86 to Generic (NFC) Machine function splitting will become available for AArch64; since MFS is no longer X86-only, the tests for generic behavior should live somewhere other than tests/CodeGen/X86. MFS implementation doesn't vary much across platforms, and most tests should be identical between X86 and AArch64 besides instruction selection, so the tests can live together in tests/CodeGen/Generic. Differential Revision: https://reviews.llvm.org/D157563	2023-08-16 18:11:23 +00:00
Matt Arsenault	c9d0d15e69	AMDGPU: Refine some rsq formation tests Drop unnecessary flags and metadata, add contract flags that should be necessary.	2023-08-16 13:37:03 -04:00
Kazushi (Jam) Marukawa	922ac64b04	[VE] Avoid vectorizing store/load in scalar mode Avoid vectorizing store and load instructions in scalar mode. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D158049	2023-08-17 02:15:54 +09:00
Nicholas Guy	d65feccb12	[ARM] Set preferred function alignment Aligning functions yields small performance gains on embedded cores, moreso with numerous small function calls. Similar to aligning loops, if the function can fit within a single cache line then the performance overhead of fetching more instructions can be limited. Differential Revision: https://reviews.llvm.org/D157514	2023-08-16 17:31:21 +01:00
David Green	a047dfe0d5	[AArch64][GISel] Lower EXT of 0 to a COPY This allows us to select G_SHUFFLE_VECTOR with identity masks (possibly including undef elements), but avoid the actual EXT instruction if the shift amount is 0.	2023-08-16 17:12:15 +01:00
Eduard Zingerman	8f28e8069c	[BPF] support for BPF_ST instruction in codegen Generate store immediate instruction when CPUv4 is enabled. For example: $ cat test.c struct foo { unsigned char b; unsigned short h; unsigned int w; unsigned long d; }; void bar(volatile struct foo *p) { p->b = 1; p->h = 2; p->w = 3; p->d = 4; } $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d - ... 0000000000000000 <bar>: 0: 72 01 00 00 01 00 00 00 *(u8 *)(r1 + 0x0) = 0x1 1: 6a 01 02 00 02 00 00 00 *(u16 *)(r1 + 0x2) = 0x2 2: 62 01 04 00 03 00 00 00 *(u32 *)(r1 + 0x4) = 0x3 3: 7a 01 08 00 04 00 00 00 *(u64 *)(r1 + 0x8) = 0x4 4: 95 00 00 00 00 00 00 00 exit Take special care to: - apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST - validate immediate value when BPF_ST write is 64-bit: BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such write when immediate is -1, but it is incorrect to generate such write when immediate is +0xffff_ffff. This commit was previously reverted in e66affa17e32. The reason for revert was an unrelated bug in BPF backend, triggered by test case added in this commit if LLVM is built with LLVM_ENABLE_EXPENSIVE_CHECKS. The bug was fixed in D157806. Differential Revision: https://reviews.llvm.org/D140804	2023-08-16 17:51:28 +03:00
Philip Reames	3c2a66973e	[RISCVInsertVSETVLI] Generalize scalar extract (vmv.x.s, and vmv.f.s) handling vmv.x.s and vmv.f.s are unconditional. They read the low element of a vector register (not vector group), and function even when VL=0 or VSTART>0. As such, they are don't care with respect to both VL and LMUL. We'd previously had handling in the forward pass only via the NoRegister mechanism. (The only instructions with SEW but without VL are these extracts.) This patch moves that handling into getDemanded so that the backwards pass benefits as well. Differential Revision: https://reviews.llvm.org/D157991	2023-08-16 07:50:59 -07:00
Philip Reames	b06e52c32f	[RISCVInsertVSETVLI] Default to VL=1 for scalar extracts We were defaulting to VL=0 when we didn't otherwise have a vsetv nearby. Instead, let's use VL=1. VL=0 is very much a corner case in hardware, so let's avoid it if we can. Differential Revision: https://reviews.llvm.org/D158015	2023-08-16 07:35:00 -07:00
Matt Arsenault	66ee794064	AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls Apparently the spec has overloads for fmin/fmax and ldexp with one of the operands as scalar. We need to broadcast the scalars to the vector type. https://reviews.llvm.org/D158077	2023-08-16 09:42:26 -04:00
Paul Walker	566065207b	[SelectionDAG] Use TypeSize variant of ComputeValueVTs to compute correct offsets for scalable aggregate types. Differential Revision: https://reviews.llvm.org/D157872	2023-08-16 11:56:31 +00:00
Ivan Kosarev	d7efe41598	[AMDGPU] Autogenerate the v_cndmask.ll and llvm.amdgcn.image.msaa.load.ll codegen tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157970	2023-08-16 12:50:51 +01:00
Ivan Kosarev	f9ab235318	[AMDGPU] Autogenerate the fmuladd.f16.ll and llvm.fmuladd.f16.ll codegen tests. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D157966	2023-08-16 12:49:45 +01:00
David Green	983185d3c7	[AArch64] Regenerate postlegalizer-lowering-shuffle-splat.mir and postlegalizer-lowering-zip.mir. NFC	2023-08-16 11:00:50 +01:00
Carl Ritson	d0e246ff16	[LiveRange] Fix inaccurate verification of live-in PhysRegs Fix verification that a PhysReg is live in to an MBB. isLiveIn does not handle reg units, so cannot identify when a register would be defined because its super register is partially defined. Additionally a PhysReg may be partially defined at block entry and then fully defined before any use. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D157086	2023-08-16 17:42:42 +09:00
David Green	c5f763b563	[AArch64][GISel] Fix selection of G_CONSTANT_FOLD_BARRIER As far as I understand - When lowering a G_CONSTANT_FOLD_BARRIER we replace the DstReg with SrcReg, and need to check that the register class is equivalent when doing so for the replacement to be legal. During lowering we could end up visiting nodes in an odd order, leaving a G_CONSTANT_FOLD_BARRIER with a known regclass for the src, but only a regbank for the dst. Providing the Regbank contains the regclass, the replacement should still be safe. This fixes an assert seen in the llvm-test-suite when lowering hoisted constants, relaxing canReplaceReg to account for the case when the regbank covers the regclass, so it is better able to handle differences in visiting order. Differential Revision: https://reviews.llvm.org/D157202	2023-08-16 08:33:16 +01:00
Noah Goldstein	e7f7b63fb3	[DAGCombiner][X86] Guard `(X & Y) ==/!= Y` --> `(X & Y) !=/== 0` behind TLI preference On X86 for vec types `(X & Y) == Y` is generally preferable to `(X & Y) != 0`. Creating zero requires an extra instruction and on pre-avx512 targets there is no vector `pcmpne` so it requires two additional instructions to invert the `pcmpeq`. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D157014	2023-08-16 02:00:15 -05:00
Noah Goldstein	2549ec1866	[SelectionDAG] Improve `isKnownToBeAPowerOfTwo` Add additional cases for: select, vselect, {u,s}{min,max}, and, casts, rotl, rotr And improve handling of constants and shifts. Differential Revision: https://reviews.llvm.org/D156778	2023-08-16 02:00:15 -05:00
Noah Goldstein	ac485e4072	[SelectionDAG] Add/Improve cases in `isKnownNeverZero` 1) Handle casts a bit more cleanly just with a loop rather than with recursion. 2) Add additional cases for smin/smax 3) For shifts we can also deduce non-zero if the maximum shift amount on the known 1s is non-zero. Differential Revision: https://reviews.llvm.org/D156777	2023-08-16 02:00:15 -05:00
Noah Goldstein	2937d03e49	[X86] Add more tests for `isKnownNeverZero`; Differential Revision: https://reviews.llvm.org/D156776	2023-08-16 02:00:15 -05:00
Noah Goldstein	86345eb459	[X86] Add tests for `isKnownToBeAPowerOfTwo`; NFC Differential Revision: https://reviews.llvm.org/D156775	2023-08-16 02:00:14 -05:00
Aiden Grossman	87cccf8f6d	[MLGO] Remove unsupported tag from BB profile dump test This was added originally as the test was failing on NVPTX before an explicit target triple was set on the llc invocation. The test was fixed in 4afb1ee7bc0e3674238da2d3668f8d8b80596c62 but the unsupported directive was never removed.	2023-08-15 20:36:19 -07:00
Daniel Hoekwater	e8540723b3	Revert "[CodeGen] Move function splitting tests from X86 to Generic (NFC)" This reverts commit 1670e0ea076b12b3fcea2ea63f01bc09e0e7f3b2. Causes https://lab.llvm.org/buildbot/#/builders/188/builds/33943	2023-08-16 01:46:35 +00:00
Daniel Hoekwater	d7bca8e494	[AArch64] Relax cross-section branches Because the code layout is not known during compilation, the distance of cross-section jumps is not knowable at compile-time. Because of this, we should assume that any cross-sectional jumps are out of range. This assumption is necessary for machine function splitting on AArch64, which introduces cross-section branches in the middle of functions. The linker relaxes out-of-range unconditional branches, but it clobbers X16 to do so; it doesn't relax conditional branches, which must be manually relaxed by the compiler. Differential Revision: https://reviews.llvm.org/D145211	2023-08-16 01:43:07 +00:00
Daniel Hoekwater	1670e0ea07	[CodeGen] Move function splitting tests from X86 to Generic (NFC) Machine function splitting will become available for AArch64; since MFS is no longer X86-only, the tests for generic behavior should live somewhere other than tests/CodeGen/X86. MFS implementation doesn't vary much across platforms, and most tests should be identical between X86 and AArch64 besides instruction selection, so the tests can live together in tests/CodeGen/Generic. Differential Revision: https://reviews.llvm.org/D157563	2023-08-16 01:25:54 +00:00
Yeting Kuo	818e76d6f2	[RISCV] Add MC layer support for Zicfilp. This adds extension Zicfilp and support pseudo instruction lpad. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157362	2023-08-16 08:52:51 +08:00
Craig Topper	6eb36aed86	[RISCV][GISel] Mark G_GLOBAL_VALUE as legal. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D157950	2023-08-15 09:26:29 -07:00
Craig Topper	ae76574d4a	[RISCV][GISel] Make G_ICMP of pointers legal. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D157823	2023-08-15 09:25:52 -07:00
Craig Topper	88903fac1f	[RISCV][GISel] Make G_SELECT of pointers legal Reviewed By: reames Differential Revision: https://reviews.llvm.org/D157819	2023-08-15 09:20:08 -07:00
Simon Pilgrim	b0a77af4f1	[DAG] SimplifyDemandedBits - add sra(shl(x,c1),c1) -> sign_extend_inreg(x) demanded elts fold Move the sra(shl(x,c1),c1) -> sign_extend_inreg(x) fold inside SimplifyDemandedBits so we can recognize hidden splats with DemandedElts masks. Because the c1 shift amount has multiple uses, hidden splats won't get simplified to a splat constant buildvector - meaning the existing fold in DAGCombiner::visitSRA can't fire as it won't see a uniform shift amount. I also needed to add TLI preferSextInRegOfTruncate hook to help keep truncate(sign_extend_inreg(x)) vector patterns on X86 so we can use PACKSS more efficiently. Differential Revision: https://reviews.llvm.org/D157972	2023-08-15 16:32:03 +01:00
Joe Nash	a093032981	[AMDGPU][True16] Update FPToI1Pat GFX11 pat to use GFX11 instruction These cmp patterns were using the pre-GFX11 pseudo instruction, and so failed to compile. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D157912	2023-08-15 11:11:17 -04:00
Matt Arsenault	d251761660	AMDGPU: Replace log libcalls with log intrinsics	2023-08-15 10:48:46 -04:00
Matt Arsenault	81b278e613	AMDGPU: Fix fast f32 exp2 Mirror of the previous log changes, OpenCL conformance doesn't like interpreting afn as ignore denormal handling but was previously hidden by flag dropping.	2023-08-15 10:48:46 -04:00
Matt Arsenault	4b7b4b9458	AMDGPU: Fix fast f32 log/log10 OpenCL conformance didn't like interpreting afn as ignore the denormal handling. https://reviews.llvm.org/D157940	2023-08-15 10:48:46 -04:00
Matt Arsenault	e09b3593ba	AMDGPU: Fix fast math log2 f32 Apparently afn doesn't allow you to drop the denormal handling according to OpenCL conformance. This was hidden by losing the flags during the library linking process. Fast log is still broken and needs more work. https://reviews.llvm.org/D157936	2023-08-15 10:48:46 -04:00


52796 Commits