llvm-project

Author	SHA1	Message	Date
Craig Topper	0a459dd4e9	[RISCV] Add tests for selecting G_BRCOND+G_ICMP. NFC These should have been part of e0e0891d741588684b0803d7724e5080f9c75537	2023-11-13 15:29:34 -08:00
Craig Topper	29d75cb9dc	[RISCV][GISel] Update legalize-smin.mir and legalize-smax.mir to test G_SMIN/G_SMAX. Looks like an incomplete fixup was done after copying the umin/umax tests.	2023-11-13 13:49:14 -08:00
Craig Topper	915e092400	[RISCV] Select zext as sext when sign bit is 0 for -riscv-experimental-rv64-legal-i32 In our default SelectionDAG where i32 isn't legal, the zext will become an i64 AND and often get optimized out on its own. With i32 legal, we need to turn it into sext.w and rely on RISCVOptWInstrs to remove it.	2023-11-13 12:21:36 -08:00
Craig Topper	05300222ba	[RISCV][GISel] Add really basic support for FP regbank selection for G_LOAD/G_STORE. (#70896 ) Coerce the register bank based on the users of the G_LOAD or the defining instruction for the G_STORE. s64 on rv32 is handled by forcing the FPRB register bank.	2023-11-13 12:12:16 -08:00
Craig Topper	70ce047f7e	[RISCV] Legalize G_CTLZ/G_CTLZ_ZERO_UNDEF/G_CTTZ/G_CTTZ_ZERO_UNDEF. (#72014 ) The base ISA does not support these operations. A future patch will enable them for Zbb.	2023-11-13 11:22:48 -08:00
Tom Stellard	877226f01f	[X86] Simplify regex in pr42616.ll test (#71980 )	2023-11-13 11:05:42 -08:00
Craig Topper	d8576e4542	[RISCV][GISel] Update RV64 legalize-ctpop.mir to account for constant shift amounts being i64 now. This changed while the ctpop patch was in review and I forgot to update it.	2023-11-13 10:38:36 -08:00
Craig Topper	90dd4c470f	[RISCV][GISel] Legalize G_CTPOP. (#72005 ) The base ISA does not have an instruction for this so we need to lower. Zbb support will come in a future patch.	2023-11-13 10:26:32 -08:00
Felipe de Azevedo Piovezan	83729e6716	[SelectionDAG] Disable FastISel for swiftasync functions (#70741 ) Most (x86) swiftasync functions tend to use both SelectionDAGISel and FastISel lowering: * FastISel argument lowering can only handle C calling convention. * FastISel fails mid-BB in a number of ways, including in simple `ret void` instructions under certain circumstances. This dance of SelectionDAG (argument) -> FastISel (some instructions) -> SelectionDAG(remaining instructions) is lossy; in particular, Argument information lowering is cleared after that first SelectionDAG run. Since swiftasync functions rely heavily on proper Argument lowering for debug information, this patch disables the use of FastISel in such functions.	2023-11-13 08:27:29 -08:00
Yingwei Zheng	d64d5ea102	[RISCV][CodeGenPrepare] Remove duplicated transform for zext. NFC. (#72053 ) After #71534 and #72052, the transform `zext -> zext nneg` in `RISCVCodeGenPrepare` is redundant.	2023-11-13 22:45:33 +08:00
David Green	2238363a5f	[AArch64] Prevent v1f16 vselect/setcc type expansion. (#72048) PR #71614 identified an issue in the lowering of v1f16 vector compares, where the `v1i1 setcc` is expanded to `v1i16 setcc`, and the `v1i16 setcc` tries to be expanded to a `v2i16 setcc` which fails. For floating point types we can let them scalarize instead though, generating a `setcc f16` that can be lowered using normal fp16 lowering. 07a8ff4892b2a54f0bd5843f863bcffa7a258f1f added a special case combine for v1 vselect to expand the predicate type to the same size as the fcmp operands. This turns that off for float types, allowing them to scalarize naturally, which hopefully fixes the issue by preventing the v1i16 setcc, meaning it won't try to widen to larger vectors. The codegen might not be optimal, but as far as I can tell everything generated successfully, providing that no `v1i16 setcc v1f16` instructions get generated.	2023-11-13 14:42:52 +00:00
Jay Foad	a4196666ac	[AMDGPU] Revert "Preliminary patch for divergence driven instruction selection. Operands Folding 1." (#71710 ) This reverts commit 201f892b3b597f24287ab6a712a286e25a45a7d9.	2023-11-13 13:53:10 +00:00
Simon Pilgrim	1a9fbf6166	[X86] combineLoad - reuse an existing VBROADCAST_LOAD constant for a smaller vector load of the same constant. Extends the existing code that performed something similar for SUBV_BROADCAST_LOAD, but this is just for cases where AVX2 targets load full-width 128-bit constant vectors but broadcast the equivalent 256-bit constant vector. Fixes AVX2 case for Issue #70947	2023-11-13 11:59:04 +00:00
Jay Foad	47f29043f0	[AMDGPU] Fix a GlobalISel RUN line This was added in D149795 without actually enabling GlobalISel.	2023-11-13 11:30:15 +00:00
Nemanja Ivanovic	563720c3be	[RISCV] Fix lowering of negative zero with Zdinx 32-bit (#71869 ) The compiler currently abends with an impossible reg-to-reg copy when producing a negative zero FP immediate on RV32 with -Zdinx. This is because we emit a negation that uses FP registers. Emit the right node to produce correct code.	2023-11-13 07:38:14 +01:00
Craig Topper	44e8bea400	[GISel][AArch64] Notify the Observer when CTTZ lowering changes the opcode to CTPOP. (#72008 )	2023-11-12 19:36:24 -08:00
Carl Ritson	edc38a6cbd	[AMDGPU] Add option to pre-allocate SGPR spill VGPRs (#70626 ) SGPR spill VGPRs are WWM registers so allow them to be allocated by SIPreAllocateWWMRegs pass. This intentionally prevents spilling of these VGPRs when enabled.	2023-11-13 12:21:18 +09:00
Carl Ritson	52b247b1d3	[PHIElimination] Handle subranges in LiveInterval updates (#69429 ) Add subrange tracking and handling for LiveIntervals during PHI elimination. This requires extending MachineBasicBlock::SplitCriticalEdge to also update subrange intervals.	2023-11-13 12:16:26 +09:00
Han Shen	ca10e3b2e5	[LLVM][NVPTX] Add BF16 vector instruction and fix lowering rules (#69415 ) Add support for bf16x2 instructions such as setp, fneg, fabs, etc; Fix the instructions that were not differentiated between sm_80 and sm_90 support, such as fpround etc. Add more bf16 test cases to ensure the correct behavior. --------- Co-authored-by: shenhan03 <shenhan03@kuaishou.com>	2023-11-12 21:48:31 +08:00
David Green	a31538d29c	[AArch64] Add a test showing inefficient register allocation around loop IVs. NFC	2023-11-12 13:11:03 +00:00
Craig Topper	ee95819503	[RISCV][GISel] Legalize G_FSHL/G_FSHR.	2023-11-11 20:23:29 -08:00
Craig Topper	b0e97c7757	[RISCV][GISel] Legalize G_ROTL/G_ROTR.	2023-11-11 20:12:07 -08:00
Craig Topper	7965a21f7a	[RISCV] Add more packh patterns.	2023-11-11 19:31:23 -08:00
Craig Topper	bfb7843580	[RISCV] Add packw/packh patterns for -riscv-experimental-rv64-legal-i32	2023-11-11 17:52:22 -08:00
Craig Topper	6b9752cc72	[RISCV] Add rv64zbkb.ll test for -riscv-experimental-rv64-legal-i32. NFC	2023-11-11 17:52:22 -08:00
Craig Topper	fdc904e568	[RISCV] Add isel pattern to turn (or (zext X), Y) into add.uw when X and Y are disjoint. Improve code for -riscv-experimental-rv64-legal-i32.	2023-11-11 15:51:38 -08:00
Craig Topper	bf0963620c	[RISCV] Add (shl (zext GPR:), uimm5:) pattern for -riscv-experimental-rv64-legal-i32.	2023-11-11 15:14:02 -08:00
Craig Topper	994d882e15	[RISCV] Add an slli.uw pattern using zext for -riscv-experimental-rv64-legal-i32 We already had the pattern for GlobalISel. Move it over to SelectionDAG.	2023-11-11 14:41:56 -08:00
Momchil Velikov	e8209b2486	[MachineSink] Drop debug info for instructions deleted by sink-and-fold (#71443 ) After performing sink-and-fold over a COPY, the original instruction is replaced with one that produces its output in the destination of the copy. Its value is still available (in a hard register), so if there are debug instructions which refer to the (now deleted) virtual register they could be updated to refer to the hard register, in principle. However, it's not clear how to do that, moreover in some cases the debug instructions may need to be replicated proportionally to the number of the COPY instructions replaced and in some extreme cases we can end up with quadratic increase in the number of debug instructions, e.g: int f(int); void g(int x) { int y = x + 1; int t0 = y; f(t0); int t1 = y; f(t1); }	2023-11-11 19:43:14 +00:00
David Green	0bd67566f7	[AArch64] Remove AArch64/aarch64-neon-v1i1-setcc.ll test. NFC These are replicated in llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll with more tests and updated check lines. Remove the duplicate test.	2023-11-11 18:22:41 +00:00
Craig Topper	bab2cf2d01	[RISCV][GISel] Promote s32 constant shift amounts to s64 on RV64. This allows us to reuse isel patterns from SelectionDAG. This is similar to what is done on AArch64.	2023-11-10 23:07:00 -08:00
Craig Topper	647c490f8a	[RISCV] Add an add.uw pattern using zext for -riscv-experimental-rv64-legal-i32 and global isel	2023-11-10 21:36:29 -08:00
Craig Topper	7e0bae5b34	[RISCV][GISel] Add isel patterns for SHXADD with s32 type on RV64.	2023-11-10 19:52:57 -08:00
Craig Topper	a93dfb589d	[RISCV] Peek through zext in selectShiftMask. This improves the code for -riscv-experimental-rv64-legal-i32	2023-11-10 19:02:14 -08:00
Craig Topper	83cc24e598	[RISCV] Add test case showing unnecessary zext of shift amounts with -riscv-experimental-rv64-legal-i32. NFC	2023-11-10 19:02:13 -08:00
Shoaib Meenai	c5dd1bbcc3	Revert "Revert "[IR] Mark lshr and ashr constant expressions as undesirable"" This reverts commit 8ee07a4be7f7d8654ecf25e7ce0a680975649544. The revert is breaking AMDGPU backend tests (which I didn't have enabled), and I don't want to risk breakages over the weekend, so just revert for now.	2023-11-10 17:26:14 -08:00
Shoaib Meenai	8ee07a4be7	Revert "[IR] Mark lshr and ashr constant expressions as undesirable" This reverts commit 82f68a992b9f89036042d57a5f6345cb2925b2c1. cd7ba9f3d090afb5d3b15b0dcf379d15d1e11e33 needs to be reverted to fix test failures on builds without assertions, and this one needs to be reverted first for that.	2023-11-10 17:08:35 -08:00
Craig Topper	ca603343db	[RISCV][GISel] Legalizer and register bank selection for G_JUMP_TABLE and G_BRJT (#71970 ) Testing together since they should come paired. Instruction selection will be a separate PR.	2023-11-10 13:09:24 -08:00
Joseph Huber	a3bd87b100	[AMDGPU] Call the `FINI_ARRAY` destructors in the correct order (#71815 ) Summary: The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY sections to call all the global constructors in a single kernel. Previously this mistakenly used the same iteration logic for both arrays. The destructors stored in FINI_ARRAY are stored in the same order as the ones in the INIT_ARRAY section so we need to traverse it in reverse order. Relanding after the revert in fe7b5e2cfcf6848287010291081f85fa1f6bb2ef using the IR builder interface instead of ConstantExpr.	2023-11-10 11:01:02 -06:00
Nikita Popov	fe7b5e2cfc	Revert "[AMDGPU] Call the `FINI_ARRAY` destructors in the correct order (#71815 )" This reverts commit c1d5865a313d0a8a254b37c852bdd444453c0f73. Introduces a new use of ConstantExpr::getAShr().	2023-11-10 17:01:06 +01:00
Joseph Huber	c1d5865a31	[AMDGPU] Call the `FINI_ARRAY` destructors in the correct order (#71815 ) Summary: The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY sections to call all the global constructors in a single kernel. Previously this mistakenly used the same iteration logic for both arrays. The destructors stored in FINI_ARRAY are stored in the same order as the ones in the INIT_ARRAY section so we need to traverse it in reverse order.	2023-11-10 09:34:04 -06:00
Joseph Huber	af8ebfdcd9	[NVPTX] Allow the ctor/dtor lowering pass to emit kernels (#71549) Summary: This pass emits the new "nvptx$device$init" and "nvptx$device$fini" kernels that are callable by the device. This intends to mimic the method of lowering for AMDGPU where we emit `amdgcn.device.init` and `amdgcn.device.fini` respectively. These kernels simply iterate a symbol called `__init_array_start/stop` and `__fini_array_start/stop`. Normally, the linker provides these symbols automatically. In the AMDGPU case we only need to call the kernel and we call the ctors / dtors. However, for NVPTX we require the user initializes these variables to the associated globals that we already emit as a part of this pass. The motivation behind this change is to move away from OpenMP's handling of ctors / dtors. I would much prefer that the backend / runtime handles this. That allows us to handle ctors / dtors in a language agnostic way. This approach requires that the runtime initializes the associated globals. They are marked `weak` so we can emit this per-TU. The kernel itself is `weak_odr` as it is copied exactly. One downside is that any module containing these kernels elicits the "stack size cannot be statically determined" warning every time from `nvlink`, which is annoying but inconsequential for functionality. It would be nice if there were a way to silence this warning however.	2023-11-10 09:33:29 -06:00
Nikita Popov	82f68a992b	[IR] Mark lshr and ashr constant expressions as undesirable These will no longer be created by default during constant folding.	2023-11-10 16:29:13 +01:00
David Green	10ce319320	[AArch64][GlobalISel] Expand handling for sitofp and uitofp (#71282 ) Similar to #70635, this expands the handling of integer to fp conversions. The code is very similar to the float->integer conversions with types handled oppositely. There are some extra unhandled cases which require more handling for ASR operations.	2023-11-10 13:41:13 +00:00
Yingwei Zheng	650026897c	[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364) This patch lowers `sdiv x, +/-2^k` to `add + select + shift` when the short forward branch optimization is enabled. The latter inst seq performs faster than the seq generated by target-independent DAGCombiner. This algorithm is described in Hacker's Delight. This patch also removes duplicate logic in the X86 and AArch64 backend. But we cannot do this for the PowerPC backend since it generates a special instruction `addze`.	2023-11-10 21:38:47 +08:00
Valery Pykhtin	87b8d94371	[AMDGPU] Fix GCNUpwardRPTracker. (#71186) Fixed: 1. Maximum register pressure calculation at the instruction level. Previously max RP included both def and use of registers of an instruction. Now maximum RP includes _uses_ and _early-clobber defs_. 2. Uses were incorrectly tracked and this resulted in a mismatch of live-in set reported by LiveIntervals and tracked live reg set when the beginning of the block is reached. Interface has changed: moveMaxPressure becomes deprecated and getMaxPressure, resetMaxPressure functions are added. The reset functions now seem more consistent.	2023-11-10 13:44:10 +01:00
Serge Pavlov	5b0f703918	Revert "[ARM][FPEnv] Lowering of fpenv intrinsics" This reverts commit d62f040418bd167d1ddd2b79c640a90c0c2ea353. Some cuda buildbots start failing.	2023-11-10 16:24:51 +07:00
Serge Pavlov	d62f040418	[ARM][FPEnv] Lowering of fpenv intrinsics The change implements lowering of `get_fpenv`, `set_fpenv` and `reset_fpenv`. Differential Revision: https://reviews.llvm.org/D81843	2023-11-10 16:06:33 +07:00
Diana Picus	20e9e4f797	[AMDGPU] si-wqm: Skip only LiveMask COPY si-wqm sometimes needs to save the LiveMask in the entry block. Later on, while looking for a place to enter WQM/WWM, it unconditionally skips over the first COPY instruction in the entry block. This is incorrect for functions where the LiveMask doesn't need to be saved, and therefore the first COPY is more likely a COPY from a function argument and might need to be in some non-exact mode. This patch fixes the issue by also checking that the source of the COPY is the EXEC register. This produces different code in 3 of the existing tests: In wwm-reserved.ll, a SGPR copy is now inside the WWM area rather than outside. This is benign. In wave32.ll, we end up with an extra register copy. This is because the first COPY in the block is now part of the WWM block, so si-pre-allocate-wwm-regs will allocate a new register for its destination (when it was outside of the WWM region, the register allocator could just re-use the same register). We might be able to improve this in si-pre-allocate-wwm-regs but I haven't looked into it. The same thing happens in dual-source-blend-export.ll, but for that one it's harder to see because of the scheduling changes. I've uploaded the before/after si-wqm output for it here: https://reviews.llvm.org/differential/diff/553445/ Differential Revision: https://reviews.llvm.org/D158841	2023-11-10 09:30:44 +01:00
Matt Arsenault	67c3cb4f6b	AMDGPU: Use an explicit triple in test to avoid bot failures	2023-11-10 17:09:55 +09:00
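
The `add + select + shift` sequence named in commit 650026897c can be sketched in C roughly as follows. This is an illustrative sketch only, not code from the patch: the function names `sdiv_pow2` and `sdiv_neg_pow2` are hypothetical, the sketch assumes 0 < k < 31, and it assumes `>>` on a negative `int32_t` is an arithmetic shift (implementation-defined in C, though mainstream compilers behave this way).

```c
#include <stdint.h>

/* Signed division by 2^k that rounds toward zero, like C's `/`.
 * Hypothetical helper for illustration; assumes 0 < k < 31 and an
 * arithmetic right shift for negative values. */
static int32_t sdiv_pow2(int32_t x, unsigned k) {
    int32_t bias = (int32_t)((1u << k) - 1);      /* 2^k - 1 */
    /* add + select: only negative inputs get the bias, so that the
     * shift below rounds toward zero instead of toward -infinity. */
    int32_t selected = (x < 0) ? x + bias : x;
    return selected >> k;                         /* shift */
}

/* Division by -(2^k): divide by 2^k, then negate the quotient.
 * With k >= 1 the quotient's magnitude is at most 2^30, so the
 * negation cannot overflow. */
static int32_t sdiv_neg_pow2(int32_t x, unsigned k) {
    return -sdiv_pow2(x, k);
}
```

For x = -7 and k = 1 the bias is 1, so the shifted value is -6 >> 1 = -3, which matches C's `-7 / 2`; a bare `-7 >> 1` would instead yield -4 under the same arithmetic-shift assumption.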

50770 Commits