llvm-project

Author	SHA1	Message	Date
Durgadoss R	340cc1702e	[LLVM][NVPTX]: Add intrinsic for setmaxnreg (#77289 ) This patch adds an intrinsic for setmaxnreg PTX instruction. * PTX Doc link for this instruction: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg * The i32 argument, an immediate value, specifies the actual absolute register count for the instruction. * The `setmaxnreg` instruction is available in SM90a. So, this patch adds 'hasSM90a' predicate to use in the NVPTX backend. * lit tests are added to verify the lowering of the intrinsic. * Verifier logic (and tests) are added to test the register count range and divisibility-by-8 requirements. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-01-09 12:04:13 -08:00
Fangrui Song	6c207ee5d2	[RISCV] Force relocations if initial MCSubtargetInfo contains FeatureRelax (#77436 ) Regarding ``` .option norelax j label .option relax // relaxable instructions // For assembly input, RISCVAsmParser::ParseInstruction will set ForceRelocs (https://reviews.llvm.org/D46423). // For direct object emission, ForceRelocs is not set after https://github.com/llvm/llvm-project/pull/73721 label: ``` The J instruction needs a relocation to ensure the target is correct after linker relaxation. This is related to a limitation in the assembler: RISCVAsmBackend::shouldForceRelocation decides upfront whether a relocation is needed, instead of checking more information (whether there are relaxable fragments in between). Despite the limitation, `j label` produces a relocation in direct object emission mode, but was broken by #73721 due to the shouldForceRelocation limitation. Add a workaround to RISCVTargetELFStreamer to emulate the previous behavior. Link: https://github.com/ClangBuiltLinux/linux/issues/1965	2024-01-09 11:24:21 -08:00
Shilei Tian	b629b8662c	[AMDGPU][MC] Use normal ELF syntax for section switching (#77267 ) For some reason `SunStyleELFSectionSwitchSyntax` is set to `true` for AMDGPU, but according to https://github.com/llvm/llvm-project/issues/64862#issuecomment-1880419239 that syntax is limited to Sun systems. Fix #64862.	2024-01-09 14:13:42 -05:00
Simon Pilgrim	3210ce2763	[X86] Fold (iX bitreverse(bitcast(vXi1 X))) -> (iX bitcast(shuffle(X))) X86 doesn't have a BITREVERSE instruction, so if we're working with a casted boolean vector, we're better off shuffling the vector instead if we have PSHUFB (SSSE3 or later) Fixes #77459	2024-01-09 19:06:32 +00:00
Simon Pilgrim	417df8ee4a	[X86] Add test coverage for #77459	2024-01-09 18:55:05 +00:00
Fangrui Song	f972e4d343	[MC,ELF] .section: unconditionally print section flag 'G' after 'o' * Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize argument goes before the section group name, if a reader doesn't know that the order of extra arguments is not affected by the order of flags. * 'a', 'w', and 'x' indicate basic permission-related flags. Separating them with 'G' is kinda ugly. Simplify code and move 'G' after 'o'. The new output is more similar to GCC.	2024-01-09 10:48:23 -08:00
Fangrui Song	7620f03ef7	[MC] Parse SHF_LINK_ORDER argument before section group name (#77407 ) When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from 2.35 onwards (https://sourceware.org/PR25381 https://sourceware.org/binutils/docs/as/Section.html) parses the SHF_LINK_ORDER argument before section group name, different from us. This is unfortunate, but does not matter because the `.section` flag `o` is a niche feature only used by compiler instrumentations, not adopted by hand-written assembly, and using both flags is extremely rare. Let's just match GNU assembler. There is another benefit: we now support zero-flag section group with the SHF_LINK_ORDER flag, while previously there was no syntax for it. While here, print 'G' after 'o' to be clear that the 'G' argument is parsed after the 'o' argument. To make the diff smaller, we don't print 'G' after 'w' in the absence of 'o' for now.	2024-01-09 10:42:34 -08:00
Matt Arsenault	888a20c466	AMDGPU: Drop amdgpu-no-lds-kernel-id attribute in LDS lowering (#71481 ) This is in preparation for moving the run of AMDGPUAttributor earlier. Currently it infers the lack of the corresponding intrinsic calls, so if we introduce new ones we need to remove the attribute from any possible transitive callers. This is more conservative than necessary, we could try to identify specific subgraphs where LDS globals are not used. Other options include teaching the attributor to avoid adding it in cases where the lowering may choose the table, but this seems more complex. Alternatively could add a second run which doesn't seem worth it. Depends #71349	2024-01-10 00:12:40 +07:00
HaohaiWen	a2dba0c977	[SEH][CodeGen] Add test to track CFG optimization bug for SEH (#77441 ) LiveDebugValues requires that the CFG have only one entry. BranchFolding and MachineBlockPlacement may remove all predecessors of a landing pad, which leaves it as another entry.	2024-01-09 22:30:13 +08:00
wanglei	98c6aa7229	[LoongArch] Implement LoongArchRegisterInfo::canRealignStack() (#76913 ) This patch fixes the crash issue in the test: CodeGen/LoongArch/can-not-realign-stack.ll Register allocator may spill virtual registers to the stack, which introduces stack alignment requirements (when the size of spilled registers exceeds the default alignment size of the stack). If a function does not have stack alignment requirements before register allocation, registers used for stack alignment will not be preserved. Therefore, we should implement `canRealignStack()` to inform the register allocator whether it is allowed to perform stack realignment operations.	2024-01-09 20:35:49 +08:00
wanglei	f499472de3	[LoongArch] Pre-commit test for #76913 . NFC This test will crash with expensive check. Crash message: ``` * Bad machine code: Using an undefined physical register * - function: main - basic block: %bb.0 entry (0x20fee70) - instruction: $r3 = frame-destroy ADDI_D $r22, -288 - operand 1: $r22 ```	2024-01-09 20:32:20 +08:00
Saiyedul Islam	4f7c402d9f	[AMDGPU][NFC] Update left over tests for COV5 (#76984 ) Update AMDGPU CodeGen lit tests to check for COV5 ABI.	2024-01-09 17:31:42 +05:30
Matt Arsenault	9be29ad48c	AMDGPU: Regenerate test checks Fix test failures after auto-merge of f9fec402896a90f3b09cea359c330f65a0908649	2024-01-09 17:56:27 +07:00
Matt Arsenault	daecc303bb	AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197 ) The library implementation is just a wrapper around a call to the intrinsic, but loses metadata. Swap out the call site to the intrinsic so that the lowering can see the !fpmath metadata and fast math flags. Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing !fpmath on OpenCL library sqrt calls. Also don't bother emitting native_sqrt anymore, it's just another wrapper around llvm.sqrt.	2024-01-09 15:13:58 +07:00
Nick Anderson	f1ec0d12bb	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #75380 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-09 13:32:59 +07:00
Shengchen Kan	38ce770ef1	[X86][test] Add test to check ah is not allocatable for register class gr8_norex2 This test should be added after #73529	2024-01-09 14:28:38 +08:00
Chia	0c24c175f2	[RISCV][ISel] Use vaaddu with rounding mode rdn for ISD::AVGFLOORU. (#76550 ) This patch aims to use `vaaddu` with rounding mode rdn (i.e `vxrm[1:0] = 0b10`) for `ISD::AVGFLOORU`. ### Source code ``` define <8 x i8> @vaaddu_auto(ptr %x, ptr %y, ptr %z) { %xv = load <8 x i8>, ptr %x, align 2 %yv = load <8 x i8>, ptr %y, align 2 %xzv = zext <8 x i8> %xv to <8 x i16> %yzv = zext <8 x i8> %yv to <8 x i16> %add = add nuw nsw <8 x i16> %xzv, %yzv %div = lshr <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1> %ret = trunc <8 x i16> %div to <8 x i8> ret <8 x i8> %ret } ``` ### Before this patch ``` vaaddu_auto: vsetivli zero, 8, e8, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) vwaddu.vv v10, v8, v9 vnsrl.wi v8, v10, 1 ret ``` ### After this patch ``` vaaddu_auto: vsetivli zero, 8, e8, mf2, ta, ma vle8.v v8, (a0) vle8.v v9, (a1) csrwi vxrm, 2 vaaddu.vv v8, v8, v9 ret ``` ### Note on signed averaging addition Based on the rvv spec, there is also a variant for signed averaging addition called `vaadd`. But AFAIU, no matter in which rounding mode, we cannot achieve the semantic of signed averaging addition through `vaadd`. Thus this patch only introduces `vaaddu`.	2024-01-09 15:17:38 +09:00
Craig Topper	a8e9dceb49	[RISCV] Use getELen() instead of hardcoded 64 in lowerBUILD_VECTOR. (#77355 ) This is needed to properly support Zve32x.	2024-01-08 19:36:15 -08:00
James Y Knight	b856e77b2d	Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703 ) Targets affected: - NVPTX and BPF: set to 64 bits. - ARC, Lanai, and MSP430: set to 0 (they don't implement atomics). Those which didn't yet add AtomicExpandPass to their pass pipeline now do so. This will result in larger atomic operations getting expanded to `__atomic_*` libcalls via AtomicExpandPass. On all these targets, this now matches what Clang already does in the frontend. The only targets which do not configure AtomicExpandPass now are: - DirectX and SPIRV: they aren't normal backends. - AVR: a single-cpu architecture with no privileged/user divide, which could implement all atomics by disabling/enabling interrupts, regardless of size/alignment. Will be addressed by future work.	2024-01-08 22:34:28 -05:00
Jim Lin	96c4f1034c	[RISCV] Add support predicating for ANDN/ORN/XNOR with short-forward-branch-opt. (#77077 ) ANDN/ORN/XNOR are like other ALU instructions. They should be predicable by CPUs that support short-forward-branch.	2024-01-09 11:12:44 +08:00
Usman Nadeem	ac8b4f8749	[AArch64][SVE2] Add pattern for BCAX (#77159 ) Bitwise clear and exclusive or Add pattern for: xor x, (and y, not(z)) -> bcax x, y, z	2024-01-08 15:51:33 -08:00
Craig Topper	faa326de97	[RISCV] Add branch+c.mv macrofusion for sifive-p450. (#76169 ) sifive-p450 supports a very restricted version of the short forward branch optimization from the sifive-7-series. For sifive-p450, a branch over a single c.mv can be macrofused as a conditional move operation. Due to encoding restrictions on c.mv, we can't conditionally move from X0. That would require c.li instead.	2024-01-08 15:23:26 -08:00
Jay Foad	daa4728dee	[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825 )	2024-01-08 19:13:38 +00:00
Min-Yih Hsu	478ec63312	[RISCV] Mark VFIRST and VCPOP as SignExtendingOpW (#77022 ) Since their values are small enough ([-1, 65535] & [0, 65535], respectively) to fit into signed 32 bits, any sext (or downcasting + sext) will be redundant. Hence marking them as SignExtendingOpW.	2024-01-08 10:59:06 -08:00
Min-Yih Hsu	4c66180e46	[RISCV] Use COPY to create artificial 64-bit uses in RISCVOptWInstrs's tests In reflection of 4dd5d967975fa8d52b8c60596d892d9dd5615809, we can now use COPY to physical registers to create artificial 64-bit uses to prevent RISCVOptWInstrs from optimizing away sext in the absence of the IsSignExtendingOpW flag. NFCI.	2024-01-08 10:03:32 -08:00
Simon Pilgrim	d460c1de3b	[DAG] SimplifyDemandedBits - don't fold sext(x) -> aext(x) if we lose an 0/-1 allsignbits mask (#77296 ) For targets that use 0/-1 boolean results, we want to keep this pattern through extensions/truncations as much as possible - so avoid simplifying to any_extend even if we don't demand the upper bits. Noticed in triage for https://reviews.llvm.org/D152928	2024-01-08 18:01:41 +00:00
Simon Pilgrim	52ebf61bac	[X86] ftrunc.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only. Add common AVX check prefix for 32/64 bit test coverage	2024-01-08 17:25:44 +00:00
Simon Pilgrim	fbfc9cb7ea	[X86] vector-shuffle-mmx.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only. Add nounwind to remove cfi noise as well.	2024-01-08 17:25:44 +00:00
Simon Pilgrim	9632f98716	[X86] legalize-shl-vec.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only. Add nounwind to remove cfi noise as well.	2024-01-08 17:25:44 +00:00
Simon Pilgrim	635f6d3845	[X86] inline-sse.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-08 17:25:44 +00:00
Simon Pilgrim	8bd16789ff	[X86] lea-2.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only (although in this case the gnux32 tests share the X64 checks)	2024-01-08 17:25:43 +00:00
Simon Pilgrim	61dcfaa745	[X86] i64-mem-copy.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only. Add nounwind to remove cfi noise as well.	2024-01-08 17:25:43 +00:00
Simon Pilgrim	f3f6677311	[X86] combine-bextr.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only. Add nounwind to remove cfi noise as well.	2024-01-08 17:25:43 +00:00
Amara Emerson	ff47989ec2	[AArch64][GlobalISel] Allow anyexting loads from 32b -> 64b to be legal. We can already support selection of these through imported patterns, we were just missing the legalizer rule to allow these to be formed. Nano size benefit overall.	2024-01-08 08:37:47 -08:00
Mirko Brkušanin	7ca4473dd9	[AMDGPU] Add new cache flushing instructions for GFX12 (#76944 ) Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>	2024-01-08 14:06:58 +00:00
Simon Pilgrim	0e4a38018a	[X86] avx2-nontemporal.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-08 12:37:34 +00:00
Simon Pilgrim	f1e3a8f1eb	[X86] avx2-gather.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-08 12:37:06 +00:00
Simon Pilgrim	e3f8e44b00	[X86] vector-lzcnt-256.ll / vector-tzcnt-256.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-08 12:35:48 +00:00
Simon Pilgrim	eb523a4d27	[X86] vec_extract - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-08 12:33:47 +00:00
Matt Arsenault	bdbaf6e61b	AMDGPU: Make v8bf16/v16bf16 legal types (#76678 ) Depends #76217	2024-01-08 18:59:01 +07:00
Nathan Gauër	a9ffc92fc4	[SPIR-V] Add pre-headers to loops. (#75844 ) This is the first of the 7 steps outlined in #75801. This PR explicitly calls the SimplifyLoops pass. The 6 other passes required to structurize the IR should directly follow this pass. Running this pass could generate empty basic-blocks, which are implicit fallthrough to the successor BB. There was a specific condition in the SPIR-V ISel which handled implicit fallthrough, but it couldn't work on empty basic-blocks. This commit removes the old logic, and adds this new logic, which checks all basic-blocks for implicit fallthroughs, including empty ones. --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2024-01-08 11:41:45 +01:00
Paschalis Mpeis	68a1583a89	[TLI] replace-with-veclib works with FRem Instruction. (#76166 ) Updated SLEEF and ArmPL tests with Fixed-Width and Scalable cases for frem. Those are mapped to fmod/fmodf.	2024-01-08 08:53:15 +00:00
Shengchen Kan	1c674666fa	[X86] Support EVEX compression for EGPR (#77202 ) Compress promoted instruction (EVEX) to pre-promotion instruction (legacy/VEX) when R16-R31 is not used. Alternative to #77065	2024-01-08 16:50:23 +08:00
Amara Emerson	ca20c99bb1	[GlobalISel][IRTranslator] Port switch binary tree search optimization. (#77279 ) This re-uses some code extracted earlier from SelectionDAG into SwitchLoweringUtils Much of the code is a straight port from SDAG's splitWorkItem(), with minor changes needed for GISel.	2024-01-08 15:53:09 +08:00
Amara Emerson	9de81ce87d	NFC: Another pre-commit test change.	2024-01-07 23:22:14 -08:00
Amara Emerson	fe1364f1e7	Update pre-committed test. Accidentally committed the wrong version, this one properly demonstrates the upcoming change.	2024-01-07 22:53:48 -08:00
Amara Emerson	624b48789f	[AArch64][NFC] Pre-commit IR translator switch lowering test.	2024-01-07 22:14:50 -08:00
Kai Luo	225e2704af	[PowerPC] Precommit test for lowering llvm.trap on ppc64le. NFC.	2024-01-08 10:20:01 +08:00
Chen Zheng	d6aef863d8	[PowerPC] make LR/LR8 CTR/CTR8 aliased (#76926 ) fixes https://github.com/llvm/llvm-project/issues/47156 fixes https://github.com/llvm/llvm-project/issues/47155	2024-01-08 09:37:40 +08:00
Fangrui Song	360996ac5a	[RISCV] Merge machine operand flag MO_PLT into MO_CALL (#77253 ) Since #72467, `@plt` in assembly output "call foo@plt" is omitted. We can trivially merge MO_PLT and MO_CALL without any functional change to assembly/relocatable file output. Earlier architectures use different call relocation types whether a PLT is potentially needed: R_386_PLT32/R_386_PC32, R_68K_PLT32/R_68K_PC32, R_SPARC_WDISP30/R_SPARC_WPLT30. However, as the PLT property is per-symbol instead of per-call-site and linkers can optimize out a PLT, the distinction has been confusing. Arm made good names R_ARM_CALL/R_AARCH64_CALL. Let's use MO_CALL instead of MO_PLT. As follow-ups, we can merge fixup_riscv_call/fixup_riscv_call_plt and VK_RISCV_CALL/VK_RISCV_CALL_PLT.	2024-01-07 12:43:39 -08:00
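
Illustrative note on 0c24c175f2 ([RISCV][ISel] vaaddu for ISD::AVGFLOORU) in the log above: the IR pattern in that message widens both i8 operands, adds, shifts right by one, and truncates. The sketch below is not taken from the patch. It is a small Python model, under the assumption that `vaaddu` with rounding mode rdn behaves as a floor average of the exact sum, checking that the widening form matches an overflow-free 8-bit form for every input pair. The function names are hypothetical.

```python
def avgflooru_widening(a: int, b: int) -> int:
    """Model of the IR pattern: zext i8 -> i16, add, lshr by 1, trunc to i8."""
    wide = (a & 0xFF) + (b & 0xFF)  # the sum of two 8-bit values fits in 16 bits
    return (wide >> 1) & 0xFF       # shift right by 1, then truncate to 8 bits


def avgflooru_narrow(a: int, b: int) -> int:
    """Floor average computed without ever needing more than 8 bits.

    Assumed to model vaaddu with vxrm = rdn (round down); hypothetical helper.
    """
    a &= 0xFF
    b &= 0xFF
    # Shared bits count once in full; differing bits contribute half each.
    return ((a & b) + ((a ^ b) >> 1)) & 0xFF


def check_all_i8() -> bool:
    """Exhaustively compare both forms over all 256 * 256 input pairs."""
    return all(
        avgflooru_widening(a, b) == avgflooru_narrow(a, b)
        for a in range(256)
        for b in range(256)
    )


if __name__ == "__main__":
    print(check_all_i8())
```

The exhaustive check is cheap at 8 bits (65536 pairs) and would not scale to wider element types, where a property-based or symbolic check would be the more usual choice.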

52796 Commits