llvm-project

Author	SHA1	Message	Date
Craig Topper	6b9752cc72	[RISCV] Add rv64zbkb.ll test for -riscv-experimental-rv64-legal-i32. NFC	2023-11-11 17:52:22 -08:00
Craig Topper	fdc904e568	[RISCV] Add isel pattern to turn (or (zext X), Y) into add.uw when X and Y are disjoint. Improve code for -riscv-experimental-rv64-legal-i32.	2023-11-11 15:51:38 -08:00
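A C sketch of why the `(or (zext X), Y)` to `add.uw` pattern is sound. It assumes the RV64 Zba definition of `add.uw` (rd = zext32(rs1) + rs2). The helper names `add_uw`, `or_zext`, `disjoint`, and `check_disjoint_or` are hypothetical. When X and Y share no set bits, no step of the addition produces a carry, so OR and ADD give the same result:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of Zba add.uw: zero-extend the low 32 bits of rs1,
   then add rs2. Hypothetical helper, not an LLVM API. */
static uint64_t add_uw(uint64_t rs1, uint64_t rs2) {
    return (uint64_t)(uint32_t)rs1 + rs2;
}

/* The DAG shape (or (zext X), Y) with a 32-bit X. */
static uint64_t or_zext(uint32_t x, uint64_t y) {
    return (uint64_t)x | y;
}

/* X and Y are disjoint when no bit position is set in both. */
static int disjoint(uint32_t x, uint64_t y) {
    return ((uint64_t)x & y) == 0;
}

/* For disjoint operands the addition never carries, so OR equals ADD. */
static void check_disjoint_or(uint32_t x, uint64_t y) {
    if (disjoint(x, y))
        assert(or_zext(x, y) == add_uw(x, y));
}
```

The disjointness check is what makes the rewrite legal: with overlapping bits, `1 | 1` is 1 while `1 + 1` is 2.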
Craig Topper	bf0963620c	[RISCV] Add (shl (zext GPR:), uimm5:) pattern for -riscv-experimental-rv64-legal-i32.	2023-11-11 15:14:02 -08:00
Craig Topper	994d882e15	[RISCV] Add an slli.uw pattern using zext for -riscv-experimental-rv64-legal-i32 We already had the pattern for GlobalISel. Move it over to SelectionDAG.	2023-11-11 14:41:56 -08:00
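The `slli.uw` patterns rely on `slli.uw` computing `(shl (zext X), C)` in one instruction. A C model, assuming the Zba definition (zero-extend the low 32 bits of rs1, then shift left by a 6-bit immediate); `slli_uw` and `shl_zext` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of Zba slli.uw: zero-extend rs1[31:0], then shift
   left by a 6-bit immediate. */
static uint64_t slli_uw(uint64_t rs1, unsigned shamt) {
    assert(shamt < 64);
    return (uint64_t)(uint32_t)rs1 << shamt;
}

/* The DAG shape (shl (zext X), C) for a 32-bit X and constant C. */
static uint64_t shl_zext(uint32_t x, unsigned c) {
    assert(c < 64);
    return (uint64_t)x << c;
}
```

The upper 32 bits of rs1 never reach the result, which is why a zext feeding the shift can be folded into the instruction.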
Momchil Velikov	e8209b2486	[MachineSink] Drop debug info for instructions deleted by sink-and-fold (#71443 ) After performing sink-and-fold over a COPY, the original instruction is replaced with one that produces its output in the destination of the copy. Its value is still available (in a hard register), so if there are debug instructions which refer to the (now deleted) virtual register, they could, in principle, be updated to refer to the hard register. However, it is not clear how to do that; moreover, in some cases the debug instructions may need to be replicated proportionally to the number of COPY instructions replaced, and in some extreme cases we can end up with a quadratic increase in the number of debug instructions, e.g.: int f(int); void g(int x) { int y = x + 1; int t0 = y; f(t0); int t1 = y; f(t1); }	2023-11-11 19:43:14 +00:00
David Green	0bd67566f7	[AArch64] Remove AArch64/aarch64-neon-v1i1-setcc.ll test. NFC These are replicated in llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll with more tests and updated check lines. Remove the duplicate test.	2023-11-11 18:22:41 +00:00
Craig Topper	bab2cf2d01	[RISCV][GISel] Promote s32 constant shift amounts to s64 on RV64. This allows us to reuse isel patterns from SelectionDAG. This is similar to what is done on AArch64.	2023-11-10 23:07:00 -08:00
Craig Topper	647c490f8a	[RISCV] Add an add.uw pattern using zext for -riscv-experimental-rv64-legal-i32 and global isel	2023-11-10 21:36:29 -08:00
Craig Topper	7e0bae5b34	[RISCV][GISel] Add isel patterns for SHXADD with s32 type on RV64.	2023-11-10 19:52:57 -08:00
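Why a 64-bit SHXADD can serve an s32 add-of-shift: the low 32 bits of a 64-bit left shift and add depend only on the low 32 bits of the inputs. A C sketch assuming the Zba semantics (rd = rs2 + (rs1 << n), n in 1..3); `shxadd` and `shxadd_s32` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of Zba sh1add/sh2add/sh3add: rd = rs2 + (rs1 << n). */
static uint64_t shxadd(unsigned n, uint64_t rs1, uint64_t rs2) {
    assert(n >= 1 && n <= 3);
    return rs2 + (rs1 << n);
}

/* The same arithmetic on s32 values: (add (shl x, n), y). */
static uint32_t shxadd_s32(unsigned n, uint32_t x, uint32_t y) {
    assert(n >= 1 && n <= 3);
    return (uint32_t)(y + (x << n));
}
```

Truncating `shxadd` to 32 bits gives the same value as `shxadd_s32` on the truncated inputs, so the s32 result can be read from the low half of the register.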
Craig Topper	a93dfb589d	[RISCV] Peek through zext in selectShiftMask. This improves the code for -riscv-experimental-rv64-legal-i32	2023-11-10 19:02:14 -08:00
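One reason a zext on a shift amount can be ignored: RV64 `sll` reads only the low 6 bits of the amount register, and `sllw` only the low 5, and zero-extending a 32-bit value leaves those bits unchanged. A C model under that assumption; `rv64_sll`, `rv64_sllw`, and `check_zext_amount` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of RV64 sll: only amt[5:0] is used. */
static uint64_t rv64_sll(uint64_t x, uint64_t amt) {
    return x << (amt & 63);
}

/* Assumed model of RV64 sllw: only amt[4:0] is used and the 32-bit
   result is sign-extended (two's-complement conversion assumed). */
static uint64_t rv64_sllw(uint64_t x, uint64_t amt) {
    uint32_t r = (uint32_t)x << (amt & 31);
    return (uint64_t)(int64_t)(int32_t)r;
}

/* Shifting by zext32(amt) or by amt itself gives the same result. */
static void check_zext_amount(uint64_t x, uint64_t amt) {
    uint64_t zext_amt = (uint32_t)amt;
    assert(rv64_sll(x, zext_amt) == rv64_sll(x, amt));
    assert(rv64_sllw(x, zext_amt) == rv64_sllw(x, amt));
}
```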
Craig Topper	83cc24e598	[RISCV] Add test case showing unnecessary zext of shift amounts with -riscv-experimental-rv64-legal-i32. NFC	2023-11-10 19:02:13 -08:00
Shoaib Meenai	c5dd1bbcc3	Revert "Revert "[IR] Mark lshr and ashr constant expressions as undesirable"" This reverts commit 8ee07a4be7f7d8654ecf25e7ce0a680975649544. The revert is breaking AMDGPU backend tests (which I didn't have enabled), and I don't want to risk breakages over the weekend, so just revert for now.	2023-11-10 17:26:14 -08:00
Shoaib Meenai	8ee07a4be7	Revert "[IR] Mark lshr and ashr constant expressions as undesirable" This reverts commit 82f68a992b9f89036042d57a5f6345cb2925b2c1. cd7ba9f3d090afb5d3b15b0dcf379d15d1e11e33 needs to be reverted to fix test failures on builds without assertions, and this one needs to be reverted first for that.	2023-11-10 17:08:35 -08:00
Craig Topper	ca603343db	[RISCV][GISel] Legalizer and register bank selection for G_JUMP_TABLE and G_BRJT (#71970 ) Testing together since they should come paired. Instruction selection will be a separate PR.	2023-11-10 13:09:24 -08:00
Joseph Huber	a3bd87b100	[AMDGPU] Call the `FINI_ARRAY` destructors in the correct order (#71815 ) Summary: The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY sections to call all the global constructors in a single kernel. Previously this mistakenly used the same iteration logic for both arrays. The destructors stored in FINI_ARRAY are stored in the same order as the ones in the INIT_ARRAY section so we need to traverse it in reverse order. Relanding after the revert in fe7b5e2cfcf6848287010291081f85fa1f6bb2ef using the IR builder interface instead of ConstantExpr.	2023-11-10 11:01:02 -06:00
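The required ordering, sketched in C: constructors run front to back, and destructors, stored in the same order, run back to front. The array bounds stand in for the linker-provided INIT_ARRAY/FINI_ARRAY start and stop symbols. `run_init_array`, `run_fini_array`, and the tracing callbacks are hypothetical, and the backend change itself emits IR rather than C:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*array_fn)(void);

/* Run every entry from start up to (not including) stop. */
static void run_init_array(array_fn *start, array_fn *stop) {
    for (array_fn *p = start; p != stop; ++p)
        (*p)();
}

/* Destructors share the constructors' order, so walk it in reverse. */
static void run_fini_array(array_fn *start, array_fn *stop) {
    for (array_fn *p = stop; p != start;)
        (*--p)();
}

/* Hypothetical demo callbacks that record the call order. */
static char trace[8];
static size_t trace_len;
static void record(char c) {
    assert(trace_len < sizeof trace);
    trace[trace_len++] = c;
}
static void ctor_a(void) { record('A'); }
static void ctor_b(void) { record('B'); }
static void dtor_a(void) { record('a'); }
static void dtor_b(void) { record('b'); }
```

With `{ctor_a, ctor_b}` and `{dtor_a, dtor_b}`, the trace is `ABba`: the object constructed last is destroyed first.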
Nikita Popov	fe7b5e2cfc	Revert "[AMDGPU] Call the `FINI_ARRAY` destructors in the correct order (#71815 )" This reverts commit c1d5865a313d0a8a254b37c852bdd444453c0f73. Introduces a new use of ConstantExpr::getAShr().	2023-11-10 17:01:06 +01:00
Joseph Huber	c1d5865a31	[AMDGPU] Call the `FINI_ARRAY` destructors in the correct order (#71815 ) Summary: The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY sections to call all the global constructors in a single kernel. Previously this mistakenly used the same iteration logic for both arrays. The destructors stored in FINI_ARRAY are stored in the same order as the ones in the INIT_ARRAY section so we need to traverse it in reverse order.	2023-11-10 09:34:04 -06:00
Joseph Huber	af8ebfdcd9	[NVPTX] Allow the ctor/dtor lowering pass to emit kernels (#71549 ) Summary: This pass emits the new "nvptx$device$init" and "nvptx$device$fini" kernels that are callable by the device. This intends to mimic the method of lowering for AMDGPU where we emit `amdgcn.device.init` and `amdgcn.device.fini` respectively. These kernels simply iterate a symbol called `__init_array_start/stop` and `__fini_array_start/stop`. Normally, the linker provides these symbols automatically. In the AMDGPU case we only need to call the kernel and we call the ctors / dtors. However, for NVPTX we require that the user initializes these variables to the associated globals that we already emit as a part of this pass. The motivation behind this change is to move away from OpenMP's handling of ctors / dtors. I would much prefer that the backend / runtime handles this. That allows us to handle ctors / dtors in a language-agnostic way. This approach requires that the runtime initializes the associated globals. They are marked `weak` so we can emit this per-TU. The kernel itself is `weak_odr` as it is copied exactly. One downside is that any module containing these kernels elicits the "stack size cannot be statically determined" warning every time from `nvlink`, which is annoying but inconsequential for functionality. It would be nice if there were a way to silence this warning, however.	2023-11-10 09:33:29 -06:00
Nikita Popov	82f68a992b	[IR] Mark lshr and ashr constant expressions as undesirable These will no longer be created by default during constant folding.	2023-11-10 16:29:13 +01:00
David Green	10ce319320	[AArch64][GlobalISel] Expand handling for sitofp and uitofp (#71282 ) Similar to #70635, this expands the handling of integer to fp conversions. The code is very similar to the float->integer conversions with types handled oppositely. There are some extra unhandled cases which require more handling for ASR operations.	2023-11-10 13:41:13 +00:00
Yingwei Zheng	650026897c	[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364 ) This patch lowers `sdiv x, +/-2^k` to `add + select + shift` when the short forward branch optimization is enabled. The latter instruction sequence performs faster than the sequence generated by the target-independent DAGCombiner. This algorithm is described in Hacker's Delight. This patch also removes duplicate logic in the X86 and AArch64 backends. But we cannot do this for the PowerPC backend since it generates a special instruction `addze`.	2023-11-10 21:38:47 +08:00
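A C sketch of the add + select + shift sequence for `sdiv x, +/-2^k`. Because C's `/` truncates toward zero, negative dividends are biased by 2^k - 1 before an arithmetic right shift. The sketch assumes `>>` on a negative `int32_t` is an arithmetic shift, which mainstream compilers provide but C leaves implementation-defined; `sdiv_pow2` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Divide x by 2^k (or -2^k when negative_divisor is set), 1 <= k <= 30. */
static int32_t sdiv_pow2(int32_t x, unsigned k, int negative_divisor) {
    assert(k >= 1 && k <= 30);
    int32_t bias = (int32_t)((1u << k) - 1u);
    int32_t t = x < 0 ? x + bias : x; /* the add + select */
    int32_t q = t >> k;               /* the arithmetic shift */
    return negative_divisor ? -q : q; /* negate for a negative divisor */
}
```

With short forward branches, the select can become a conditional branch over the add rather than a longer branchless sequence.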
Valery Pykhtin	87b8d94371	[AMDGPU] Fix GCNUpwardRPTracker. (#71186 ) Fixed: 1. Maximum register pressure calculation at the instruction level. Previously max RP included both def and use of registers of an instruction. Now maximum RP includes _uses_ and _early-clobber defs_. 2. Uses were incorrectly tracked, and this resulted in a mismatch between the live-in set reported by LiveIntervals and the tracked live reg set when the beginning of the block is reached. The interface has changed: moveMaxPressure is deprecated, and getMaxPressure and resetMaxPressure functions are added. The reset function now seems more consistent.	2023-11-10 13:44:10 +01:00
Serge Pavlov	5b0f703918	Revert "[ARM][FPEnv] Lowering of fpenv intrinsics" This reverts commit d62f040418bd167d1ddd2b79c640a90c0c2ea353. Some cuda buildbots start failing.	2023-11-10 16:24:51 +07:00
Serge Pavlov	d62f040418	[ARM][FPEnv] Lowering of fpenv intrinsics The change implements lowering of `get_fpenv`, `set_fpenv` and `reset_fpenv`. Differential Revision: https://reviews.llvm.org/D81843	2023-11-10 16:06:33 +07:00
Diana Picus	20e9e4f797	[AMDGPU] si-wqm: Skip only LiveMask COPY si-wqm sometimes needs to save the LiveMask in the entry block. Later on, while looking for a place to enter WQM/WWM, it unconditionally skips over the first COPY instruction in the entry block. This is incorrect for functions where the LiveMask doesn't need to be saved, and therefore the first COPY is more likely a COPY from a function argument and might need to be in some non-exact mode. This patch fixes the issue by also checking that the source of the COPY is the EXEC register. This produces different code in 3 of the existing tests: In wwm-reserved.ll, a SGPR copy is now inside the WWM area rather than outside. This is benign. In wave32.ll, we end up with an extra register copy. This is because the first COPY in the block is now part of the WWM block, so si-pre-allocate-wwm-regs will allocate a new register for its destination (when it was outside of the WWM region, the register allocator could just re-use the same register). We might be able to improve this in si-pre-allocate-wwm-regs but I haven't looked into it. The same thing happens in dual-source-blend-export.ll, but for that one it's harder to see because of the scheduling changes. I've uploaded the before/after si-wqm output for it here: https://reviews.llvm.org/differential/diff/553445/ Differential Revision: https://reviews.llvm.org/D158841	2023-11-10 09:30:44 +01:00
Matt Arsenault	67c3cb4f6b	AMDGPU: Use an explicit triple in test to avoid bot failures	2023-11-10 17:09:55 +09:00
Wang Pengcheng	9bb69c1d96	[RISCV] Enable LoopDataPrefetch pass (#66201 ) This lets us benefit from data prefetching when the `Zicbop` extension is supported. Tuning information for data prefetching is added to `RISCVTuneInfo`.	2023-11-10 15:39:58 +08:00
Craig Topper	fdbff88196	[RISCV][GISel] Add support for G_FCMP with F and D extensions. (#70624 ) We only have instructions for OEQ, OLT, and OLE. We need to convert other comparison codes into those. I think we'll likely want to split this up in the future to support optimizations. Maybe do some of it in the legalizer or in a new post legalizer lowering pass. So this patch is just enough to get something working without adding 11 additional patterns to tablegen for each type.	2023-11-09 20:45:35 -08:00
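One way to derive the remaining predicates from OEQ, OLT, and OLE, sketched in C. It assumes the F/D compare instructions (`feq`, `flt`, `fle`) return false when either operand is NaN, as C's `==`, `<`, and `<=` do. All function names are hypothetical, and this is not the code the patch adds:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The three primitives (stand-ins for feq, flt, fle). */
static bool oeq(double a, double b) { return a == b; }
static bool olt(double a, double b) { return a < b; }
static bool ole(double a, double b) { return a <= b; }

/* Ordered predicates: swap operands or combine primitives. */
static bool ogt(double a, double b) { return olt(b, a); }
static bool oge(double a, double b) { return ole(b, a); }
static bool one(double a, double b) { return olt(a, b) || olt(b, a); }
static bool ord(double a, double b) { return oeq(a, a) && oeq(b, b); }

/* Unordered predicates: invert the complementary ordered predicate. */
static bool uno(double a, double b) { return !ord(a, b); }
static bool ueq(double a, double b) { return !one(a, b); }
static bool une(double a, double b) { return !oeq(a, b); }
static bool ult(double a, double b) { return !ole(b, a); }
static bool ule(double a, double b) { return !olt(b, a); }
static bool ugt(double a, double b) { return !ole(a, b); }
static bool uge(double a, double b) { return !olt(a, b); }

/* Cross-check the derived predicates against C operators and isnan. */
static void check_predicates(double a, double b) {
    bool unordered = isnan(a) || isnan(b);
    assert(ogt(a, b) == (a > b));
    assert(oge(a, b) == (a >= b));
    assert(one(a, b) == (!unordered && a != b));
    assert(uno(a, b) == unordered);
    assert(ueq(a, b) == (unordered || a == b));
    assert(une(a, b) == (a != b));
    assert(ult(a, b) == (unordered || a < b));
    assert(ule(a, b) == (unordered || a <= b));
    assert(ugt(a, b) == (unordered || a > b));
    assert(uge(a, b) == (unordered || a >= b));
}
```

Most predicates cost one compare plus at most an inversion; ONE, ORD, UNO, and UEQ need two compares, which is part of why splitting this up later for optimization is attractive.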
Craig Topper	aae30f9e2c	[RISCV] Use Align(8) for the stack temporary created for SPLAT_VECTOR_SPLIT_I64_VL. The value needs to be read as an 8 byte vector element which requires the pointer to be 8 byte aligned according to the vector spec. Fixes #71787	2023-11-09 20:43:22 -08:00
Matt Arsenault	dd57bd0efe	Reapply "RegisterCoalescer: Generate test checks" This reverts commit 9b2439167d9f794e317fecbdbb0a6e96f9ea4b56. This was an unrelated NFC change to make a test more useful (really it should have been first, it was supposed to show the test diff).	2023-11-10 10:29:08 +09:00
Craig Topper	247eb13fab	[RISCV][GISel] Legalize G_BITREVERSE.	2023-11-09 16:27:21 -08:00
Maurice Heumann	8cbfc0b29d	[X86] Respect blockaddress offsets when performing X86 LEA fixups (#71641 ) The X86FixupLEAs pass drops blockaddress offsets when splitting up slow 3-ops LEAs, as can be seen in this example: https://godbolt.org/z/bEsc3Poje Before running the pass, the first instruction in bb.0 is an LEA with ebp, ebx and a blockaddress. After the transformation, the blockaddress is missing. This happens because the 3-ops LEA is split up into a 2-ops LEA + an add instruction. However, as hasLEAOffset does not take blockaddresses into consideration, the add is not emitted, and the offset is dropped. Taking blockaddresses into consideration fixes this issue and results in the add instruction being emitted. This fixes #71667	2023-11-10 08:12:18 +08:00
stephenpeckham	1d1fede493	[XCOFF] Ensure .file is emitted before any .info pseudo-ops (#71577 ) When generating the assembly code for AIX/XCOFF, the .file pseudo-op needs to be emitted first, before any csects are generated. Otherwise, information such as the embedded command line will be associated with part of the object file rather than the entire object file.	2023-11-09 16:03:45 -06:00
Craig Topper	8b98d5b813	[RISCV][GISel] Enable libcall expansion for G_FCEIL and G_FFLOOR.	2023-11-09 13:14:42 -08:00
Craig Topper	679cc16c99	[RISCV] Disable early promotion for Zbs in performANDCombine with riscv-experimental-rv64-legal-i32 We can match this directly in isel with the i32 type being legal. The generic DAG combine will unpromote part of the pattern and prevent it from being matched in isel.	2023-11-09 09:51:31 -08:00
Craig Topper	24577bd089	[RISCV] Add BSET/BCLR/BINV/BEXT patterns for riscv-experimental-rv64-legal-i32.	2023-11-09 09:17:22 -08:00
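Reference semantics for the four Zbs single-bit operations, as a C sketch assuming RV64, where only the low 6 bits of the index operand are used. The function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Set, clear, invert, or extract bit (rs2 mod 64) of rs1. */
static uint64_t bset(uint64_t rs1, uint64_t rs2) { return rs1 | (1ull << (rs2 & 63)); }
static uint64_t bclr(uint64_t rs1, uint64_t rs2) { return rs1 & ~(1ull << (rs2 & 63)); }
static uint64_t binv(uint64_t rs1, uint64_t rs2) { return rs1 ^ (1ull << (rs2 & 63)); }
static uint64_t bext(uint64_t rs1, uint64_t rs2) { return (rs1 >> (rs2 & 63)) & 1; }

/* Algebraic sanity checks for a given value and bit index. */
static void check_zbs(uint64_t x, uint64_t i) {
    assert(bext(bset(x, i), i) == 1);
    assert(bext(bclr(x, i), i) == 0);
    assert(binv(binv(x, i), i) == x);
}
```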
Juergen Ributzka	6d1d7be133	Obsolete WebKit Calling Convention (#71567 ) The WebKit Calling Convention was created specifically for the WebKit FTL. FTL doesn't use LLVM anymore and therefore this calling convention is obsolete. This commit removes the WebKit CC, its associated tests, and documentation.	2023-11-09 09:08:41 -08:00
chuongg3	451bc3ec1d	[AArch64][GlobalISel] Legalize G_VECREDUCE_{MIN/MAX} (#69461 ) Legalizes G_VECREDUCE_{MIN/MAX} and selects instructions for vecreduce_{min/max}	2023-11-09 16:29:14 +00:00
Philip Reames	7ac8486e54	[RISCVInsertVSETVLI] Allow PRE with non-immediate AVLs (#71728 ) Extend our PRE logic to cover non-immediate AVL values. This covers large constant AVLs (which must be materialized in registers), and may help some code written explicitly with intrinsics. Looking at the existing code, I can't entirely figure out why I thought we needed VL == AVL to perform the PRE. My best guess is that I was worried about the VLMAX < VL < 2 * VLMAX case, but the spec explicitly says that vsetvli must be deterministic on any particular AVL value. That case was, possibly by accident, covering another legality precondition. Specifically, by only returning true for immediate and VLMAX AVL values, we didn't encounter the case where the AVL was a register and that register wasn't available in the predecessor (e.g. if AVL is a load in the MBB block itself). --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>	2023-11-09 08:03:13 -08:00
Shengchen Kan	c9017bc793	[X86] Support EGPR (R16-R31) for APX (#70958 ) 1. Map R16-R31 to DWARF registers 130-145. 2. Make R16-R31 caller-saved registers. 3. Make R16-R31 allocatable only when feature EGPR is supported. 4. Make R16-R31 available for instructions in legacy maps 0/1 and EVEX space, except XSAVE*/XRSTOR. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4 Explanations for some seemingly unrelated changes: inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir: The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is the encoding for the register class in the enum generated by tablegen. This encoding will change any time a new register class is added. Since the number is part of the input, this means it can become stale. seh-directive-errors.s: R16-R31 makes ".seh_pushreg 17" legal. musttail-varargs.ll: It seems some LLVM passes use the number of registers rather than the number of allocatable registers as a heuristic. This PR relands #67702 after #70222 in order to reduce some compile-time regression when EGPR is not used.	2023-11-09 23:39:40 +08:00
Igor Kirillov	59a063d5c6	[ExpandMemCmp] Improve memcmp optimisation for boolean results (#71221 ) This patch enhances the optimization of memcmp calls when only two outcomes are needed and comparison fits into one block, for example: bool result = memcmp(a, b, 6) > 0; Previously, LLVM would generate unnecessary operations even when the user of memcmp was only interested in a binary outcome.	2023-11-09 11:52:04 +00:00
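Why one block can suffice for `memcmp(a, b, 6) > 0`: loading each 6-byte operand as a big-endian unsigned integer makes integer order match memcmp's byte-wise lexicographic order, so a single unsigned compare yields the boolean directly. A C sketch of that idea. `load_be`, `memcmp6_gt`, and `check_against_memcmp` are hypothetical, and the loads and byte swaps LLVM actually emits may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load n (<= 8) bytes as a big-endian unsigned integer, so integer
   order matches memcmp's lexicographic byte order. */
static uint64_t load_be(const unsigned char *p, size_t n) {
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i)
        v = (v << 8) | p[i];
    return v;
}

/* memcmp(a, b, 6) > 0 from one unsigned compare, without first
   materializing memcmp's signed three-way result. */
static int memcmp6_gt(const unsigned char *a, const unsigned char *b) {
    return load_be(a, 6) > load_be(b, 6);
}

/* The sketch should agree with the library call. */
static void check_against_memcmp(const unsigned char *a, const unsigned char *b) {
    assert(memcmp6_gt(a, b) == (memcmp(a, b, 6) > 0));
}
```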
Craig Topper	e3c120a585	[RISCV] Add a Zbb+Zbs command line to rv*zbs.ll to get coverage on an existing isel pattern. NFC This pattern wasn't tested: def : Pat<(XLenVT (and (rotl -2, (XLenVT GPR:$rs2)), GPR:$rs1)), (BCLR GPR:$rs1, GPR:$rs2)>;	2023-11-08 22:31:49 -08:00
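Why this pattern is a BCLR: `-2` has every bit set except bit 0, so rotating it left by `rs2` moves the single zero to bit `rs2`, giving the mask `~(1 << rs2)`. A C check assuming 64-bit XLEN; `rotl64`, `and_rotl_m2`, and `check_rotl_mask` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit rotate left; the amount is taken modulo 64. */
static uint64_t rotl64(uint64_t x, unsigned n) {
    n &= 63;
    return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* (and (rotl -2, rs2), rs1), the DAG shape matched by the pattern. */
static uint64_t and_rotl_m2(uint64_t rs1, unsigned rs2) {
    return rotl64((uint64_t)-2, rs2) & rs1;
}

/* The rotated constant equals the BCLR mask for the given index. */
static void check_rotl_mask(unsigned n) {
    assert(rotl64((uint64_t)-2, n) == ~(1ull << (n & 63)));
}
```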
Jianjian Guan	d36eb79ccc	[RISCV] Support Strict FP arithmetic Op when only have Zvfhmin (#68867 ) Include: STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FSQRT and STRICT_FMA.	2023-11-09 09:55:48 +08:00
Jun Wang	54470176af	[AMDGPU] Add inreg support for SGPR arguments (#67182 ) Function parameters marked with inreg are supposed to be allocated to SGPRs. However, for compute functions, this is ignored and function parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-11-08 11:35:52 -08:00
Simon Pilgrim	671d10ad39	[X86] Add fabs test coverage for Issue #70947	2023-11-08 16:20:34 +00:00
Simon Pilgrim	45f1db4855	[X86] vec_fabs.ll - add AVX2 test coverage	2023-11-08 16:20:34 +00:00
Dinar Temirbulatov	3f9d385e58	[AArch64][SME] Shuffle lowering, assume that the minimal SVE register is 128-bit, when NOEN is not available. (#71647 ) We can assume that the minimal SVE register is 128-bit when NEON is not available, and we can lower a shuffle operation with one operand to the SVE TBL1 instruction.	2023-11-08 14:37:49 +00:00
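A C model of a single-table byte TBL, the instruction a one-operand shuffle can lower to. Each result lane picks the table lane named by the corresponding index, or zero when the index is out of range. The model fixes the vector at 16 byte lanes, matching the 128-bit minimum assumed in the commit; `tbl1` and `LANES` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* byte lanes in a 128-bit vector */

/* out[i] = table[idx[i]] if idx[i] < LANES, else 0. */
static void tbl1(uint8_t out[LANES], const uint8_t table[LANES],
                 const uint8_t idx[LANES]) {
    assert(out != table && out != idx); /* this model does not support in-place use */
    for (int i = 0; i < LANES; ++i)
        out[i] = idx[i] < LANES ? table[idx[i]] : 0;
}
```

A shuffle mask over a single source becomes the index vector directly, which is why the one-operand case maps to one TBL.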
alexfh	067632e141	Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to getter constants." due to a miscompile (#71598 ) - Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to getter constants." - causes a miscompile, see `112e49b381 (commitcomment-131943923)` - Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral types. NFC", which fixes a compiler warning in the commit above	2023-11-08 15:07:12 +01:00
Simon Pilgrim	33ecd93596	[X86] Add test coverage for ABDS/ABDU patterns with mismatching extension types	2023-11-08 10:33:18 +00:00
Jay Foad	d5f3b3b3b1	[RegScavenger] Simplify state tracking for backwards scavenging (#71202 ) Track the live register state immediately before, instead of after, MBBI. This makes it simple to track the state at the start or end of a basic block without a separate (and poorly named) Tracking flag. This changes the API of the backward(MachineBasicBlock::iterator I) method, which now recedes to the state just before, instead of just after, *I. Some clients are simplified by this change. There is one small functional change shown in the lit tests where multiple spilled registers all need to be reloaded before the same instruction. The reloads will now be inserted in the opposite order. This should not affect correctness.	2023-11-08 09:49:07 +00:00


52796 Commits