llvm-project

Author	SHA1	Message	Date
Krzysztof Drewniak	88871784fd	[AMDGPU] Allow buffer intrinsics to be marked volatile at the IR level (#77847 ) In order to ensure the correctness of ptr addrspace(7) lowering, we need a backwards-compatible way to flag buffer intrinsics as volatile that can't be dropped (unlike metadata). To achieve this in a backwards-compatible way, we use bit 31 of the auxiliary immediates of buffer intrinsics as the volatile flag. When this bit is set, the MachineMemOperand for said intrinsic is marked volatile. Existing code will ensure that this results in the appropriate use of flags like glc and dlc. This commit also harmonizes the handling of the auxiliary immediate for atomic intrinsics, which now go through extract_cpol like loads and stores, which masks off the volatile bit.	2024-01-12 11:20:01 -06:00
Jay Foad	dec74a8347	[AMDGPU] Fix VS_CNT overflow assertion (#77935 ) Always set the upper bound for VS_CNT higher than the lower bound. Before #77439 this code was only executed on function entry where the lower bound was 0 so it was not a problem. Fixes #77931	2024-01-12 17:11:19 +00:00
Philip Reames	e4d01bb227	[SCEV] Special case sext in isKnownNonZero (#77834 ) The existing logic in isKnownNonZero relies on unsigned ranges, which can be problematic when our range calculation is imprecise. Consider the following: %offset.nonzero = or i32 %offset, 1 --> %offset.nonzero U: [1,0) S: [1,0) %offset.i64 = sext i32 %offset.nonzero to i64 --> (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648) Note that the unsigned range for the sext does contain zero in this case despite the fact that it can never actually be zero. Instead, we can push the query down one level - relying on the fact that the sext is an invertible operation and that the result can only be zero if the input is. We could likely generalize this reasoning for other invertible operations, but special casing sext seems worthwhile.	2024-01-12 07:45:28 -08:00
Maciej Gabka	5dbf178154	[TLI][NFC] Fix ordering of ArmPL and SLEEF tests (#77609 ) This patch sorts the tests which check if SLEEF and ArmPL mappings are used, in the order of the math functions base names.	2024-01-12 15:06:25 +00:00
Natalie Chouinard	4f47372f8c	[SPIR-V] Add Float16 support when targeting Vulkan (#77115 ) Add Float16 to Vulkan's available capabilities, and guard Float16Buffer (Kernel-only capability) against being added outside OpenCL environments. Add tests to verify half and half vector types, and validate with spirv-val. Fixes #66398	2024-01-12 10:03:48 -05:00
Matthew Devereau	a8f83cc159	[AArch64][SME] Fix multi vector cvt builtins (#77656 ) This fixes cvt multi vector builtins that erroneously had inverted return vectors and vector parameters. This caused the incorrect instructions to be emitted.	2024-01-12 09:55:52 +00:00
Fangrui Song	7e604485e1	[test] Improve x86 inline asm tests Reorganize asm-modifier and make other cleanups.	2024-01-11 23:35:46 -08:00
Amara Emerson	1833e3fafa	Fix test failure introduced in 3baedb411121c188c4bb07f47efb755bf4d4cf87	2024-01-11 22:06:01 -08:00
Shengchen Kan	4f71068b72	[X86] Correct the asm comment for compression NF_ND -> NF	2024-01-12 12:55:11 +08:00
Carl Ritson	6752f1517d	[TwoAddressInstruction] Recompute live intervals for partial defs (#74431 ) Force live interval recomputation for a register if its definition is narrowed to become partial. The live interval repair process cannot otherwise detect these changes.	2024-01-12 13:26:01 +09:00
Emil J	3baedb4111	[GISel] Fix #77762 : extend correct source registers in combiner helper rule extend_through_phis (#77765 ) Since we already know which register we want to extend, we don't have to ask its defining MI about it --------- Co-authored-by: Emil Tywoniak <Emil.Tywoniak@hightec-rt.com>	2024-01-12 12:09:58 +08:00
Dávid Ferenc Szabó	d1d1e7d6d0	[NFC] Updating the tests for combine-ext.mir (#77756 )	2024-01-12 10:20:53 +07:00
Shengchen Kan	9095eec052	[X86][CodeGen] Support EVEX compression: NDD to nonNDD (#77731 )	2024-01-12 10:03:30 +08:00
Philip Reames	5ce067d592	Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V." This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus re-enables terminator folding for RISCV. The reported miscompile has been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.	2024-01-11 13:20:02 -08:00
Usman Nadeem	c3e3aa9c33	[AArch64][SVE2] Generate XAR (#77160 ) Bitwise exclusive OR and rotate right by immediate Select xar (x, y, imm) for the following pattern: or (shl (xor x, y), nBits-imm), (shr (xor x, y), imm) This is essentially: rotr (xor(x, y), imm)	2024-01-11 10:56:29 -08:00
Luke Lau	114e6d7ba0	[RISCV] Add test for strided gather with recursive disjoint or. NFC This already gets converted to a strided intrinsic because we currently call haveNoCommonBitsSet when checking or instructions, but an upcoming patch will change this logic and we want to preserve this case. Note that this IR is in the form that comes from instcombine. The splats need to be inline constexprs, otherwise isSplatValue() will fail. (It can't currently handle splats where the shufflevector is an instruction, and the insertelement is a constexpr.)	2024-01-12 00:02:28 +07:00
Mirko Brkušanin	3867e6689e	[AMDGPU] Add new GFX12 image atomic float instructions (#76946 )	2024-01-11 17:28:04 +01:00
HaohaiWen	b6fc463d4c	[SEH] Redirect test output to /dev/null (#77784 )	2024-01-11 23:31:57 +08:00
Luke Lau	3b3ee1f534	[RISCV] Add test for strided gather with disjoint or. NFC	2024-01-11 22:08:57 +07:00
Nikita Popov	13b5882ee6	[PowerPC] Add test for #77748 (NFC)	2024-01-11 15:45:52 +01:00
HaohaiWen	f892cc36fd	[BranchFolding] Fix missing predecessors of landing-pad (#77608 ) When removing an empty machine basic block, all of its successors should be inherited by its fall-through MBB. This keeps the CFG with only one entry, which is required by LiveDebugValues. Reland #77441 as LiveDebugValues test.	2024-01-11 22:09:41 +08:00
Jay Foad	b120dae9bb	[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628 ) Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc operand in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set appropriately depending on whether a hazard is found or not, rather than inserting an S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated. Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>	2024-01-11 13:20:19 +00:00
John Brawn	40d5c2bcd4	[clang][AArch64] Add a -mbranch-protection option to enable GCS (#75486 ) -mbranch-protection=gcs (enabled by -mbranch-protection=standard) causes generated objects to be marked with the gcs feature. This is done via the guarded-control-stack module flag, in a similar way to branch-target-enforcement and sign-return-address. Enabling GCS causes the GNU_PROPERTY_AARCH64_FEATURE_1_GCS bit to be set on generated objects. No code generation changes are required, as GCS just requires that functions are called using BL and returned from using RET (or other similar variant instructions), which is already the case.	2024-01-11 12:53:23 +00:00
Amara Emerson	bbbe8ecc17	[GlobalISel][Localizer] Allow localization of a small number of repeated phi uses. (#77566 ) We previously had a heuristic that if a value V was used multiple times in a single PHI, then to avoid potentially rematerializing into many predecessors we bail out. The phi uses only counted as a single use in the shouldLocalize() hook because it counted the PHI as a single instruction use, not factoring in it may have many incoming edges. It turns out this heuristic is slightly too pessimistic, and allowing a small number of these uses to be localized can improve code size due to shortening live ranges, especially if those ranges span a call. This change results in some improvements in size on CTMark -Os: ``` Program size.__text before after diff kimwitu++/kc 451676.00 451860.00 0.0% mafft/pairlocalalign 241460.00 241540.00 0.0% tramp3d-v4/tramp3d-v4 389216.00 389208.00 -0.0% 7zip/7zip-benchmark 587528.00 587464.00 -0.0% Bullet/bullet 457424.00 457348.00 -0.0% consumer-typeset/consumer-typeset 405472.00 405376.00 -0.0% SPASS/SPASS 410288.00 410120.00 -0.0% lencod/lencod 426396.00 426108.00 -0.1% ClamAV/clamscan 380108.00 379756.00 -0.1% sqlite3/sqlite3 283664.00 283372.00 -0.1% Geomean difference -0.0% ``` I experimented with different variations and thresholds. Using 3 instead of 2 resulted in a further 0.1% improvement on ClamAV but also regressed sqlite3 by the same %.	2024-01-11 18:57:37 +08:00
Shengchen Kan	e4e0b65838	[X86][test] Pre-commit test for #77731	2024-01-11 18:51:34 +08:00
Sjoerd Meijer	75d820dcdd	[AArch64] MI Scheduler: create more LDP/STP pairs (#77565 ) Target hook `canPairLdStOpc` is missing quite a few opcodes for which LDPs/STPs can be created. I was hoping that it would not be necessary to add these missing opcodes here and that the attached motivating test case would be handled by the LoadStoreOptimiser (especially after #71908), but it's not. The problem is that after register allocation some things are a lot harder to do. Consider this for the motivating example ``` [1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0) [2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4) [3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr [4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0) [5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0) ``` We can't combine the load in line [5] into the load on [1]: register q1 is used in between. And we can't combine [1] into [5]: it is aliasing with the STR on line [4]. So, adding some missing opcodes here seems the best/easiest approach. I will follow up to add some more missing cases here.	2024-01-11 09:46:47 +00:00
Simon Pilgrim	7bf13fe812	[DAG] Fold (sext (sext_inreg x)) -> (sext (trunc x)) if the trunc is free (#77616 )	2024-01-11 09:39:30 +00:00
Thorsten Schütt	d7642b2200	[GlobalIsel] Combine select to integer minmax (second attempt). (#77520 ) Instcombine canonicalizes selects to floating point and integer minmax. This and the dag combiner canonicalize to floating point minmax. None of them canonicalizes to integer minmax. On Neoverse V2 basic integer arithmetic and integer minmax have the same costs.	2024-01-11 09:50:33 +01:00
Diana Picus	16945bc16d	[AMDGPU] Don't send DEALLOC_VGPRs after calls (#77439 ) Calls do not have to wait for VsCnt, so after they return there might still be scratch stores in progress. It's important that we don't send the DEALLOC_VGPR message in that case, since that might release the VGPRs and scratch allocation before those stores are complete.	2024-01-11 09:14:52 +01:00
Luke Lau	e8790027b1	[RISCV] Allow vsetvlis with same register AVL in doLocalPostpass (#76801 )	2024-01-11 12:12:46 +07:00
Shengchen Kan	1fe7bdb87b	[X86][CodeGen] Support lowering for NDD ADD/SUB/ADC/SBB/OR/XOR/NEG/NOT/INC/DEC/IMUL (#77564 ) We supported encoding/decoding for these instructions in https://github.com/llvm/llvm-project/pull/76319 https://github.com/llvm/llvm-project/pull/76721 https://github.com/llvm/llvm-project/pull/76919	2024-01-11 12:15:17 +08:00
Min-Yih Hsu	03be448cce	[RISCV][AMDGPU] Mark test/CodeGen/Generic/live-debug-label.ll XFAIL for RISCV and AMDGPU (#77631 ) Both RISC-V and AMDGPU(GCN) deploy two VirtRegRewriter passes in their codegen pipeline. This test prematurely stops at the first one, which doesn't clean up the virtual register map and causes an assertion failure. Ideally we can solve this by teaching `-stop-after` how to stop at the last instance of a Pass, but we're just marking XFAIL for these two targets for now.	2024-01-10 16:47:34 -08:00
Craig Topper	3378514a4d	[RISCV] Use any_extend for type legalizing atomic_compare_swap with Zacas. (#77669 ) With Zacas we will use amocas.w which doesn't require the input to be sign extended.	2024-01-10 12:41:11 -08:00
Craig Topper	0a1b066bba	[RISCV] Support isel for Zacas for XLen and i32. (#77666 ) This adds new isel patterns for Zacas that take priority over the pseudoinstructions we use for the A extension. Support for 2x XLen types will come in a separate patch since they need to be done differently.	2024-01-10 12:00:40 -08:00
CarolineConcatto	14e7dac92a	[Clang][LLVM][AArch64]SVE2.1 update the intrinsics according to acle[1] (#76844 ) This patch changes the following intrinsics ```svst1uwq[_{d}] replaced by svst1wq[_{d}] svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}] svst1udq[_{d}] replaced by svst1dq[_{d}] svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}] ``` Drops 'u' from the quadword stores because it is simply truncating the quadwords to 32 bits ``` svextq_lane[_{d}] replaced by svextq[_{d}] ``` EXTQ follows the previously defined EXT intrinsics ``` svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}] ``` Introduced with the latest SME2 ACLE change [1]https://github.com/ARM-software/acle/pull/257	2024-01-10 17:12:14 +00:00
Sander de Smalen	d7ac412333	[AArch64][SME] Fix definition of uclamp/sclamp instructions. (#77619 ) For some reason the arguments were in the wrong order.	2024-01-10 17:07:03 +00:00
HaohaiWen	9bde5becb4	[BranchFolding][SEH] Add test to track SEH CFG optimization (#77598 ) This test tracks the BranchFolding pass, which removes a fall-through jump and leaves the landing pad as a machine basic block with no predecessors. It would raise the bug as introduced in #77441.	2024-01-10 22:34:18 +08:00
Ulrich Weigand	9aa8c82748	[SystemZ] Fix 256-bit shifts when i128 is legal When i128 is a legal type, SelectionDAG now attempts to use SRL_PARTS etc. with type i128, which is not implemented. Fix by marking those as Expand, just like we do for i64. Fixes https://github.com/llvm/llvm-project/issues/77132	2024-01-10 15:12:19 +01:00
Simon Pilgrim	cc21aa1922	[X86] lower1BitShuffle - fold permute(setcc(x,y)) -> setcc(permute(x),permute(y)) for 32/64-bit element vectors Noticed in #77459 - for wider element types, it's usually better to pre-shuffle the comparison arguments if we can, like we already do for broadcasts	2024-01-10 12:35:50 +00:00
Simon Pilgrim	78cf2c041b	[X86] pr77459.ll - add missing AVX512 check prefixes Missed these in 3210ce276350a247220b193db12a9b45d1034724 for the #77459 fix	2024-01-10 12:09:38 +00:00
Jay Foad	08da7ac80c	[AMDGPU] Fix broken sign-extended subword buffer load combine (#77470 )	2024-01-10 10:50:13 +00:00
Ivan Kosarev	084f1c2ee0	[AMDGPU][True16] Support V_CEIL_F16. (#73108 ) As not all fake instructions have their real counterparts implemented yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow both fake and real True16 instructions in assembler and disassembler tests in the -mattr=+real-true16 mode during the transition period. Source DPP and destination VOPDstOperand_t16 operands are still not supported and will be addressed separately.	2024-01-10 08:46:19 +00:00
Craig Topper	b788692fa5	[RISCV][NFC] Remove unused CHECK prefixes to fix buildbots. NFC	2024-01-09 23:37:18 -08:00
Serge Pavlov	7fc7ef1434	[GlobalISel] Lowering of {get,set,reset}_fpenv (#75086 ) The intrinsics get_fpenv, set_fpenv and reset_fpenv in this change are implemented as calls to math library functions. Target specific lowering will be implemented later on.	2024-01-10 14:18:00 +07:00
Juneyoung Lee	7388b7422f	[WebAssembly] Correctly consider signext/zext arg flags at function declaration (#77281 ) This patch fixes WebAssembly's FastISel pass to correctly consider signext/zeroext parameter flags at function declaration. Previously, the flags at call sites were only considered during code generation, which caused an interesting bug report #63388 . This is problematic especially because in WebAssembly's ABI, either signext or zeroext can be tagged to a function argument, and it must be correctly reflected in the generated code. Unit test https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/WebAssembly/signext-zeroext.ll shows that `i8 zeroext %t` and `i8 signext %t`'s code gen are different.	2024-01-09 23:54:43 -06:00
jiahanxie353	e42a70afab	[RISCV][GISel] IRTranslate and Legalize some instructions with scalable vector type * Add IRTranslate tests for ADD, SUB, AND, OR, and XOR with scalable vector types to show that they work as expected. * Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR of scalable vector type for the RISC-V vector extension.	2024-01-09 21:51:30 -07:00
Chia	a79d13f12a	[RISCV][ISel] Use vaaddu with rounding mode rnu for ISD::AVGCEILU. (#77473 ) Similar to #76550, but for `ISD::AVGCEILU`. Specifically, this patch aims to use `vaaddu` with rounding mode rnu (i.e `vxrm[1:0] = 0b00`) for `ISD::AVGCEILU`. ### Source code ``` define <vscale x 8 x i8> @vaaddu_vv_nxv8i8_ceil(<vscale x 8 x i8> %x, <vscale x 8 x i8> %y) { %xzv = zext <vscale x 8 x i8> %x to <vscale x 8 x i16> %yzv = zext <vscale x 8 x i8> %y to <vscale x 8 x i16> %add = add nuw nsw <vscale x 8 x i16> %xzv, %yzv %one = insertelement <vscale x 8 x i16> poison, i16 1, i32 0 %splat = shufflevector <vscale x 8 x i16> %one, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer %add1 = add nuw nsw <vscale x 8 x i16> %add, %splat %div = lshr <vscale x 8 x i16> %add1, %splat %ret = trunc <vscale x 8 x i16> %div to <vscale x 8 x i8> ret <vscale x 8 x i8> %ret } ``` ### Before this patch ``` vaaddu_vv_nxv8i8_ceil: vsetvli a0, zero, e8, m1, ta, ma vwaddu.vv v10, v8, v9 vsetvli zero, zero, e16, m2, ta, ma vadd.vi v10, v10, 1 vsetvli zero, zero, e8, m1, ta, ma vnsrl.wi v8, v10, 1 ret ``` ### After this patch ``` vaaddu_vv_nxv8i8_ceil: vsetvli a0, zero, e8, m1, ta, ma csrwi vxrm, 0 vaaddu.vv v8, v8, v9 ret ```	2024-01-10 12:08:16 +09:00
HaohaiWen	c9124adfd8	Revert "[SEH][CodeGen] Add test to track CFG optimization bug for SEH" (#77542 ) Reverts llvm/llvm-project#77441 I'll land it with fix.	2024-01-10 09:25:45 +08:00
Kai Luo	6615581526	[PowerPC] Make verifier happy when lowering `llvm.trap` (#77266 ) `llvm.trap` is lowered to `PPC::TRAP` and `PPC::TRAP` is set as terminator. The verifier complains that a terminator should not lie in the middle of an MBB. See #77095. Fix it by removing `isTerminator` and `isBarrier` and then setting `isTrap`, which was introduced by https://reviews.llvm.org/D48836# and is being used by X86 and AArch64. `PPC::TRAP` is not a hardware memory barrier and `llvm.trap` doesn't indicate a memory barrier either.	2024-01-10 09:23:30 +08:00
Zequan Wu	4e8986fc58	[Coverage] Mark coverage sections as metadata sections on COFF. (#76834 ) Mark `.lcovmap$M`, `.lcovfun$M`, `.lcovd` and `.lcovn` as metadata sections on COFF so they are not loaded into memory.	2024-01-09 16:58:28 -05:00
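
Three of the messages above describe small bit-level rules that are easy to check in isolation: the bit-31 volatile flag in the buffer intrinsics' auxiliary immediate (88871784fd), the XAR pattern being a rotate-right of an XOR (c3e3aa9c33), and the ceiling average that `ISD::AVGCEILU` stands for in the IR shown in a79d13f12a. The following is a minimal C++ sketch of those rules, not code from the LLVM tree. All function and constant names are hypothetical, the 32-bit lane width is chosen only for illustration, and the rotate helper assumes `imm` is in the range 1 to 31.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; none of these are LLVM APIs.

// 88871784fd: bit 31 of the auxiliary immediate is the volatile flag.
constexpr uint32_t kVolatileBit = 1u << 31;

constexpr bool auxIsVolatile(uint32_t aux) {
  return (aux & kVolatileBit) != 0;
}

// Models masking off the volatile bit so only the remaining bits are left.
constexpr uint32_t auxWithoutVolatile(uint32_t aux) {
  return aux & ~kVolatileBit;
}

// Plain rotate right on a 32-bit value; imm is assumed to be in [1, 31].
constexpr uint32_t rotr32(uint32_t v, unsigned imm) {
  return (v >> imm) | (v << (32u - imm));
}

// c3e3aa9c33: or (shl (xor x, y), nBits-imm), (shr (xor x, y), imm),
// written out literally for nBits = 32.
constexpr uint32_t xarPattern(uint32_t x, uint32_t y, unsigned imm) {
  return ((x ^ y) << (32u - imm)) | ((x ^ y) >> imm);
}

// a79d13f12a: zero-extend both inputs, add, add 1, shift right by 1, truncate.
constexpr uint8_t avgCeilU8(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(
      (static_cast<uint16_t>(x) + static_cast<uint16_t>(y) + 1u) >> 1);
}

// Small self-check tying each helper back to the rule it sketches.
inline void checkSketch() {
  assert(auxIsVolatile(kVolatileBit | 0x3u));
  assert(auxWithoutVolatile(kVolatileBit | 0x3u) == 0x3u);
  assert(xarPattern(0x12345678u, 0u, 8) == rotr32(0x12345678u, 8));
  assert(avgCeilU8(255, 255) == 255);  // no overflow thanks to widening
}
```

Calling `checkSketch()` from a test harness exercises each helper once. The widening in `avgCeilU8` mirrors the `zext` to `i16` in the IR, which is why the sum of two `i8` maxima plus one does not wrap.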


52796 Commits