llvm-project

Author	SHA1	Message	Date
Craig Topper	247eb13fab	[RISCV][GISel] Legalize G_BITREVERSE.	2023-11-09 16:27:21 -08:00
Maurice Heumann	8cbfc0b29d	[X86] Respect blockaddress offsets when performing X86 LEA fixups (#71641 ) The X86FixupLEAs pass drops blockaddress offsets when splitting up slow 3-ops LEAs, as can be seen in this example: https://godbolt.org/z/bEsc3Poje Before running the pass, the first instruction in bb.0 is a LEA with ebp, ebx and a blockaddress. After the transformation, the blockaddress is missing. This happens because the 3-ops LEA is split up into a 2-ops LEA plus an add instruction. However, as hasLEAOffset does not take blockaddresses into consideration, the add is not emitted, so the offset is dropped. Taking blockaddresses into consideration fixes this issue and results in the add instruction being emitted. This fixes #71667	2023-11-10 08:12:18 +08:00
stephenpeckham	1d1fede493	[XCOFF] Ensure .file is emitted before any .info pseudo-ops (#71577 ) When generating the assembly code for AIX/XCOFF, the .file pseudo-op needs to be emitted first, before any csects are generated. Otherwise, information such as the embedded command line will be associated with part of the object file rather than the entire object file.	2023-11-09 16:03:45 -06:00
Craig Topper	8b98d5b813	[RISCV][GISel] Enable libcall expansion for G_FCEIL and G_FFLOOR.	2023-11-09 13:14:42 -08:00
Craig Topper	679cc16c99	[RISCV] Disable early promotion for Zbs in performANDCombine with riscv-experimental-rv64-legal-i32 We can match this directly in isel with the i32 type being legal. The generic DAG combine will unpromote part of the pattern and prevent it from being matched in isel.	2023-11-09 09:51:31 -08:00
Craig Topper	24577bd089	[RISCV] Add BSET/BCLR/BINV/BEXT patterns for riscv-experimental-rv64-legal-i32.	2023-11-09 09:17:22 -08:00
Juergen Ributzka	6d1d7be133	Obsolete WebKit Calling Convention (#71567 ) The WebKit Calling Convention was created specifically for the WebKit FTL. FTL doesn't use LLVM anymore and therefore this calling convention is obsolete. This commit removes the WebKit CC, its associated tests, and documentation.	2023-11-09 09:08:41 -08:00
chuongg3	451bc3ec1d	[AArch64][GlobalISel] Legalize G_VECREDUCE_{MIN/MAX} (#69461 ) Legalizes G_VECREDUCE_{MIN/MAX} and selects instructions for vecreduce_{min/max}	2023-11-09 16:29:14 +00:00
Philip Reames	7ac8486e54	[RISCVInsertVSETVLI] Allow PRE with non-immediate AVLs (#71728 ) Extend our PRE logic to cover non-immediate AVL values. This covers large constant AVLs (which must be materialized in registers), and may help some code written explicitly with intrinsics. Looking at the existing code, I can't entirely figure out why I thought we needed VL == AVL to perform the PRE. My best guess is that I was worried about the VLMAX < VL < 2 * VLMAX case, but the spec explicitly says that vsetvli must be deterministic for any particular AVL value. That case was, possibly by accident, covering another legality precondition. Specifically, by only returning true for immediate and VLMAX AVL values, we didn't encounter the case where the AVL was a register and that register wasn't available in the predecessor (e.g. if AVL is a load in the MBB block itself). --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>	2023-11-09 08:03:13 -08:00
Shengchen Kan	c9017bc793	[X86] Support EGPR (R16-R31) for APX (#70958 ) 1. Map R16-R31 to DWARF registers 130-145. 2. Make R16-R31 caller-saved registers. 3. Make R16-R31 allocatable only when feature EGPR is supported. 4. Make R16-R31 available for instructions in legacy maps 0/1 and EVEX space, except XSAVE*/XRSTOR. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4 Explanations for some seemingly unrelated changes: inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir: The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is the encoding for the register class in the enum generated by tablegen. This encoding will change any time a new register class is added. Since the number is part of the input, it can become stale. seh-directive-errors.s: R16-R31 make ".seh_pushreg 17" legal. musttail-varargs.ll: It seems some LLVM passes use the number of registers rather than the number of allocatable registers as a heuristic. This PR relands #67702 after #70222 in order to reduce the compile-time regression when EGPR is not used.	2023-11-09 23:39:40 +08:00
Igor Kirillov	59a063d5c6	[ExpandMemCmp] Improve memcmp optimisation for boolean results (#71221 ) This patch enhances the optimization of memcmp calls when only two outcomes are needed and the comparison fits into one block, for example: bool result = memcmp(a, b, 6) > 0; Previously, LLVM would generate unnecessary operations even when the user of memcmp was only interested in a binary outcome.	2023-11-09 11:52:04 +00:00
Craig Topper	e3c120a585	[RISCV] Add a Zbb+Zbs command line to rv*zbs.ll to get coverage on an existing isel pattern. NFC This pattern wasn't tested: def : Pat<(XLenVT (and (rotl -2, (XLenVT GPR:$rs2)), GPR:$rs1)), (BCLR GPR:$rs1, GPR:$rs2)>;	2023-11-08 22:31:49 -08:00
Jianjian Guan	d36eb79ccc	[RISCV] Support Strict FP arithmetic Op when only have Zvfhmin (#68867 ) Include: STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FSQRT and STRICT_FMA.	2023-11-09 09:55:48 +08:00
Jun Wang	54470176af	[AMDGPU] Add inreg support for SGPR arguments (#67182 ) Function parameters marked with inreg are supposed to be allocated to SGPRs. However, for compute functions, this is ignored and function parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-11-08 11:35:52 -08:00
Simon Pilgrim	671d10ad39	[X86] Add fabs test coverage for Issue #70947	2023-11-08 16:20:34 +00:00
Simon Pilgrim	45f1db4855	[X86] vec_fabs.ll - add AVX2 test coverage	2023-11-08 16:20:34 +00:00
Dinar Temirbulatov	3f9d385e58	[AArch64][SME] Shuffle lowering, assume that the minimal SVE register is 128-bit, when NEON is not available. (#71647 ) We can assume that the minimal SVE register is 128-bit when NEON is not available, and we can lower a shuffle operation with one operand to the TBL1 SVE instruction.	2023-11-08 14:37:49 +00:00
alexfh	067632e141	Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to getter constants." due to a miscompile (#71598 ) - Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to getter constants." - causes a miscompile, see `112e49b381 (commitcomment-131943923)` - Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral types. NFC", which fixes a compiler warning in the commit above	2023-11-08 15:07:12 +01:00
Simon Pilgrim	33ecd93596	[X86] Add test coverage for ABDS/ABDU patterns with mismatching extension types	2023-11-08 10:33:18 +00:00
Jay Foad	d5f3b3b3b1	[RegScavenger] Simplify state tracking for backwards scavenging (#71202 ) Track the live register state immediately before, instead of after, MBBI. This makes it simple to track the state at the start or end of a basic block without a separate (and poorly named) Tracking flag. This changes the API of the backward(MachineBasicBlock::iterator I) method, which now recedes to the state just before, instead of just after, *I. Some clients are simplified by this change. There is one small functional change shown in the lit tests where multiple spilled registers all need to be reloaded before the same instruction. The reloads will now be inserted in the opposite order. This should not affect correctness.	2023-11-08 09:49:07 +00:00
Zhaoxuan Jiang	76b53a0216	[AArch64] (NFC) Fix test after loosening requirements for register renaming (#71634 ) The landing of https://reviews.llvm.org/D88663 renders the existing stp-opt-with-renaming-undef-assert test useless because the picked register for renaming becomes q0 instead of q16.	2023-11-08 09:12:57 +00:00
Diana Picus	3b905a0be5	[AMDGPU] ISel for llvm.amdgcn.set.inactive.chain.arg Add patterns to select int_amdgcn_set_inactive_chain_arg to V_SET_INACTIVE. This could probably use some more testing, but at least for simple cases V_SET_INACTIVE seems to mostly work out of the box. Differential Revision: https://reviews.llvm.org/D158605	2023-11-08 09:53:47 +01:00
Diana Picus	39830fea28	[AMDGPU][PEI] Set up SP for chain functions Initialize the SP to 0 in the prologue of functions with the `amdgpu_cs_chain` or `amdgpu_cs_chain_preserve` calling conventions, but only if they need one (i.e. if they contain calls to `amdgpu_gfx` functions or if they have stack objects). Also make sure we don't try to realign the stack (since 0 is aligned enough). Differential Revision: https://reviews.llvm.org/D156413	2023-11-08 09:27:34 +01:00
Craig Topper	24b11ba24d	[RISCV][GISel] Use default lowering for G_DYN_STACKALLOC.	2023-11-07 23:59:27 -08:00
Diana	1fa58c7790	[AMDGPU] Callee saves for amdgpu_cs_chain[_preserve] (#71526 ) Teach prolog epilog insertion how to handle functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions. For amdgpu_cs_chain functions, we only need to preserve the inactive lanes of VGPRs above v8, and only in the presence of calls via @llvm.amdgcn.cs.chain. For amdgpu_cs_chain_preserve functions, we will also need to preserve the active lanes for registers above the last argument VGPR. AFAICT there's no direct way to find out what the last argument VGPR is, so instead the patch uses the fact that chain calls from amdgpu_cs_chain_preserve functions can't use more VGPRs than the caller's VGPR arguments. In other words, it removes the operands of SI_CS_CHAIN_TC instructions from the list of callee saved registers. For both calling conventions, registers v0-v7 never need to be saved and restored, so we should never add them as WWM spills. Differential Revision: https://reviews.llvm.org/D156412	2023-11-08 08:28:15 +01:00
Qiu Chaofan	5f295552f1	[PowerPC] Fix incorrect symbol name of frexp libcall (#71626 ) frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.	2023-11-08 14:41:19 +08:00
Luke Lau	11c182740a	[RISCV] Use masked pseudo peephole for reduction pseudos (#71508 ) After #71483 we now have a way of marking masked pseudos as having an unmasked equivalent, but their mask shouldn't be folded unless it's all ones since it would affect the result. This patch uses it to mark the pseudos for vredsum and friends, which in turn allows us to remove the unmasked patterns, and catch some other forms of vmerge.	2023-11-08 12:46:06 +08:00
Qiu Chaofan	d199fd76f7	[NFC] Add f128 frexp intrinsics for PowerPC	2023-11-08 11:27:40 +08:00
Carl Ritson	af6ff98c53	[AMDGPU] Move WWM register pre-allocation to during regalloc (#70618 ) Move SIPreAllocateWWMRegs pass to just before VGPR allocation. This saves recomputation of the virtual matrix and live reg map, with the slight regression in O0 that live intervals and slot indexes must be computed.	2023-11-08 11:54:28 +09:00
Craig Topper	a6c80c4f70	[RISCV][GISel] Add support for G_SITOFP/G_UITOFP with F and D extensions.	2023-11-07 16:40:58 -08:00
Michael Maitland	ac4ff6168a	[CodeGen][MachineVerifier] Use TypeSize instead of unsigned for getRegSizeInBits (#70881 ) This patch changes getRegSizeInBits to return a TypeSize instead of an unsigned in the case that a virtual register has a scalable LLT. In the case that the register is physical, a Fixed TypeSize is returned. The MachineVerifier pass is updated to allow copies between fixed and scalable operands as long as the Src size will fit into the Dest size. This is a precommit for a follow-up change to GISel that generates COPYs with a scalable destination but a fixed-size source. This patch is stacked on https://github.com/llvm/llvm-project/pull/70893 for the ability to use scalable vector types in MIR tests.	2023-11-07 14:38:46 -05:00
Craig Topper	374fb4126f	[RISCV][GISel] Add support for G_FPTOSI/G_FPTOUI with F and D extensions.	2023-11-07 10:14:37 -08:00
Philip Reames	a7f35d54ee	[SCEV] Extend isImpliedCondOperandsViaRanges to independent predicates (#71110 ) As far as I can tell, there's nothing in this code which actually assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred RHS) are the same. Noticed while investigating something else, this is purely an opportunistic optimization while I'm looking at the code. Unfortunately, this doesn't solve my original problem. :)	2023-11-07 07:25:47 -08:00
Pierre van Houtryve	5db63d29fd	[AMDGPU] PromoteAlloca: Handle load/store subvectors using non-constant indexes (#71505 ) I assumed indexes were always ConstantInts, but that's not always the case. They can be other things as well. We can easily handle that by just emitting an add and letting InstSimplify do the constant folding for cases where it's really a ConstantInt. Solves SWDEV-429935	2023-11-07 15:29:41 +01:00
Mitch Phillips	9b2439167d	Revert "RegisterCoalescer: Generate test checks" This reverts commit 9832eb4bdd92e876a59fea5a3502572dc9bcf870. Reason: Dependency on change that was reverted in `ba385ae210`	2023-11-07 15:09:08 +01:00
Mitch Phillips	9e50c6e6b5	Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"" This reverts commit ba385ae210b3659bc9dfb78ef1d280d03c2c3b5a. Reason: Broke the MSan buildbot. See comments on `ba385ae210` for more information.	2023-11-07 15:08:45 +01:00
Pierre van Houtryve	4428b01faa	Reland: [AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-11-07 12:23:03 +01:00
Nikita Popov	6e56c35d19	[SpeculativeExecution] Add only-if-divergent-target pass option The optimization pipeline enables this option, but it was not preserved in -print-pipeline-passes output.	2023-11-07 11:49:37 +01:00
Graham Hunter	a850dbcc5c	[AArch64] Sink vscale calls into loops for better isel (#70304 ) For more recent SVE-capable CPUs it is beneficial to use the inc* instructions to increment a value by vscale (potentially shifted or multiplied) even in short loops. This patch tells codegenprepare to sink appropriate vscale calls into blocks where they are used so that isel can match them.	2023-11-07 10:29:42 +00:00
Luke Lau	fd4804423b	[RISCV] Add tests for pseudos that shouldn't have vmerge folded into them. NFC	2023-11-07 18:25:37 +08:00
Jim Lin	4306cfd40e	[RISCV] Fix using undefined variable %pt2 in mask-reg-alloc.mir testcase (#70764 ) The first PseudoVMERGE_VIM_M1 should use %pt1 as its operand instead of %pt2. I found this error when I added the LiveIntervals analysis pass in my downstream, and it crashes with the message: ``` Use of %7 does not have a corresponding definition on every path: 112r %6:vrnov0 = PseudoVMERGE_VIM_M1 %pt2:vrnov0(tied-def 0), %2:vr, 1, %4:vmv0, 1, 3 LLVM ERROR: Use not jointly dominated by defs. ```	2023-11-07 17:05:03 +08:00
Amara Emerson	e09184ffe0	[AArch64][GlobalISel] Remove -O0 from a legalizer test, which causes legalization failures to be silent. This was masking legalization failures in some functions in the test. Remove those for now since they don't actually work.	2023-11-07 00:36:15 -08:00
Nikita Popov	17764d2c87	[IR] Remove FP cast constant expressions (#71408 ) Remove support for the fptrunc, fpext, fptoui, fptosi, uitofp and sitofp constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. With this, the only remaining FP operation that still has constant expression support is fcmp. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-07 09:34:16 +01:00
Yeting Kuo	a5c1ecada2	[RISCV] Disable performCombineVMergeAndVOps for PseudoVIOTA_M. (#71483 ) This transformation might be illegal for `PseudoVIOTA_M`. The value of `viota.m vd, vs2` is the prefix sum of vs2, and adding a mask to it may produce a wrong prefix sum. For example, the result of the following expression is `{5, 5, 5, 3}`, ``` ; v4 = {1, 1, 1, 1} viota.m v1, v4 ; v0 = {0, 0, 0, 1}, v1 = {0, 1, 2, 3}, v8 = {5, 5, 5, 5} vmerge.vvm v8, v8, v1, v0.t ; v8 = {5, 5, 5, 3} ``` but if we merge them into `viota.m v8, v4, v0.t`, then the result is `{5, 5, 5, 0}`. Also, we still do `performCombineVMergeAndVOps` for `viota.m` when the mask of `vmerge.vvm` is a true mask.	2023-11-07 16:21:35 +08:00
Matt Arsenault	9832eb4bdd	RegisterCoalescer: Generate test checks Forgot to add the FileCheck part and generate checks before pushing this.	2023-11-07 16:58:47 +09:00
Matt Arsenault	ba385ae210	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit e0f86ca2004b2d87ffe3c1e8242650a29fa98a82. This was hitting some assertions which have since been relaxed.	2023-11-07 16:57:02 +09:00
Matt Arsenault	d34a10a47d	AMDGPU: Port AMDGPUAttributor to new pass manager (#71349 )	2023-11-07 15:40:40 +09:00
Amara Emerson	6b69584660	[GlobalISel] Fall back for bf16 conversions. (#71470 ) We don't support these correctly since we don't yet have FP types. AMDGPU tests were silently miscompiling bf16 as if they were fp16.	2023-11-06 21:18:57 -08:00
Jay Foad	521ac12a25	[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407 ) The special handling for blocks ending with a long branch has been unnecessary since D106445: "[amdgpu] Add 64-bit PC support when expanding unconditional branches."	2023-11-06 16:29:52 +00:00
Jay Foad	1c6102d19b	[AMDGPU] Regenerate checks for long-branch-reserve-register.ll	2023-11-06 15:33:23 +00:00
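The viota.m counterexample in a5c1ecada2 can be checked with a small Python model of the instruction semantics. This is a sketch, not LLVM code. It makes three assumptions:

- A masked viota.m counts source bits only at active elements.
- Inactive elements are left undisturbed (mask-undisturbed policy), per the RVV 1.0 spec.
- Element 0 comes first, matching the `{...}` notation in the commit message.

The helper names `viota` and `vmerge` are hypothetical.

```python
from typing import List, Optional


def viota(src: List[int], active: Optional[List[int]] = None,
          passthru: Optional[List[int]] = None) -> List[int]:
    """Hypothetical model of `viota.m vd, vs2[, v0.t]`.

    Each active element i receives the number of set bits of `src` at
    active indices j < i. Inactive elements keep their `passthru` value
    (mask-undisturbed policy is an assumption of this sketch).
    """
    n = len(src)
    active = active if active is not None else [1] * n
    out = list(passthru) if passthru is not None else [0] * n
    running = 0
    for i in range(n):
        if active[i]:
            out[i] = running
            running += src[i]  # only active elements contribute to the sum
    return out


def vmerge(false_vals: List[int], true_vals: List[int],
           mask: List[int]) -> List[int]:
    """Hypothetical model of `vmerge.vvm vd, vs2, vs1, v0`: vs1 where mask is set, else vs2."""
    return [t if m else f for f, t, m in zip(false_vals, true_vals, mask)]


v4 = [1, 1, 1, 1]
v0 = [0, 0, 0, 1]
v8 = [5, 5, 5, 5]

v1 = viota(v4)                              # unmasked prefix sum
separate = vmerge(v8, v1, v0)               # viota.m followed by vmerge.vvm
folded = viota(v4, active=v0, passthru=v8)  # the illegal fold: masked viota.m
print(separate, folded)
```

The two results differ only in the last element. In the folded form, the masked viota.m no longer counts the bits at elements 0 to 2, so the fold is only safe when the vmerge mask is all ones. With an all-ones mask, `viota(v4, active=[1, 1, 1, 1], passthru=v8)` equals `vmerge(v8, viota(v4), [1, 1, 1, 1])`.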

50716 Commits