llvm-project

Author	SHA1	Message	Date
David Green	b94913b8ad	[AArch64] Vector insert zero upper tests. NFC	2024-02-26 22:15:36 +00:00
Noah Goldstein	15a7de697a	[SelectionDAG] Support sign tracking through `{S|U}INT_TO_FP` Just a minimal amount of easily provable tracking. Proofs: https://alive2.llvm.org/ce/z/RQYbdw Closes #82808 Alive2 has an issue with `(sitofp i1)`, but it can be verified by hand: https://godbolt.org/z/qKr7hT7s9	2024-02-26 15:35:38 -06:00
Jeffrey Byrnes	113052b2b0	[AMDGPU] Prefer lower total register usage in regions with spilling Change-Id: Ia5c434b0945bdcbc357c5e06c3164118fc91df25	2024-02-26 12:19:52 -08:00
Craig Topper	f1bb88bee2	[RISCV] Use PromoteSetCCOperands to promote operands for UMAX/UMIN during type legalization. (#82716 ) For RISC-V, we were always choosing to sign extend when promoting i32->i64. If the promoted inputs happen to be zero extended already, we should use zero extend instead. This is what we do for SETCC.	2024-02-26 10:31:58 -08:00
Owen Anderson	ebb64d8370	[GlobalISel] Make the Combiner insert G_FREEZE when converting G_SELECT to binary operations. (#82733 ) This is needed because the binary operators (G_OR and G_AND) do not have the poison-suppressing semantics of G_SELECT. Fixes https://github.com/llvm/llvm-project/issues/72475	2024-02-26 10:50:37 -05:00
Petar Avramovic	433f8e741e	MachineSSAUpdater: use all vreg attributes instead of reg class only (#78431 ) When initializing MachineSSAUpdater, save all attributes of the current virtual register and create new virtual registers with the same attributes. New virtual registers now have both the same register class or bank and the same LLT. Previously, new virtual registers had the same register class but the LLT was not set (it was left as the default/empty LLT). Required by GlobalISel for AMDGPU: new 'lane mask' virtual registers created by MachineSSAUpdater need to have both register class and LLT. Patch 4 from: https://github.com/llvm/llvm-project/pull/73337	2024-02-26 13:46:13 +01:00
ostannard	749384c08e	[ARM] Update IsRestored for LR based on all returns (#82745 ) PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based on all of the return instructions in the function, not just one. However, there is also code in ARMLoadStoreOptimizer which changes return instructions, but it set IsRestored based on the one instruction it changed, not the whole function. The fix is to factor out the code added in #75527, and also call it from ARMLoadStoreOptimizer if it made a change to return instructions. Fixes #80287.	2024-02-26 12:23:25 +00:00
Oliver Stannard	8779cf68e8	Pre-commit test showing bug #80287 This test shows the bug where LR is used as a general-purpose register on a code path where it is not spilled to the stack.	2024-02-26 12:21:13 +00:00
Jack Styles	28233408a2	[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770 ) When using Greedy Register Allocation, there are times where early-clobber values are ignored, and assigned the same register. This is illegal behaviour for these instructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour. This patch takes the existing RISC-V patch and makes it target independent, then adds support for the ARM Architecture. Doing this will ensure early-clobber constraints are followed when using the ARM Architecture. Making the pass target independent will also open up the possibility that support for other architectures can be added in the future.	2024-02-26 12:12:31 +00:00
Luke Lau	3d084e37ab	[RISCV] Add tests for fixed length concat_vector. NFC These shufflevector chains will get combined into a n-ary concat_vectors node.	2024-02-26 20:03:25 +08:00
Yeting Kuo	e510fc7753	[VP][RISCV] Introduce vp.lrint/llrint and RISC-V support. (#82627 ) RISC-V implements vector lrint/llrint by vfcvt.x.f.v.	2024-02-26 16:37:41 +08:00
hev	8be39b3901	[LoongArch] Improve pattern matching for AddLike predicate (#82767 ) This commit updates the pattern matching logic for the `AddLike` predicate in `LoongArchInstrInfo.td` to use the `isBaseWithConstantOffset` function provided by `CurDAG`. This optimization aims to improve the efficiency of pattern matching by identifying cases where the operation can be represented as a base address plus a constant offset, which can lead to more efficient code generation.	2024-02-26 11:13:21 +08:00
Owen Anderson	2c5a68858b	Fix non-splat vector SREM expansion when one of the divisors is a power of two. (#82706 ) The expansion previously used, derived from Hacker's Delight, does not work correctly when the dividend is INT_MIN and the divisor is a power of two. We now use an alternate derivation of the A and Q constants specifically for the power-of-two divisor case to avoid this problem. Credit to Fabian Giesen for the new derivation. Fixes https://github.com/llvm/llvm-project/issues/77169	2024-02-25 10:13:05 -05:00
Rishabh Bali	fe42e72db2	[CodeGen] Port AtomicExpand to new Pass Manager (#71220 ) Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559	2024-02-25 18:42:22 +05:30
Thorsten Schütt	12d29cd171	test overflow intrinsics	2024-02-25 11:37:43 +01:00
Serge Pavlov	00c0638b56	[AArch64] Intrinsics aarch64_{get,set}_fpsr (#81867 ) Two new intrinsics are introduced to read/write FPSR. They are similar to the existing intrinsics aarch64_{get,set}_fpcr.	2024-02-24 20:25:21 +07:00
yingopq	96abee5eef	[Mips] Fix unable to handle inline assembly ends with compat-branch on MIPS (#77291 ) Modify: Add a global variable 'CurForbiddenSlotAttr' to save the current instruction's forbidden slot and whether reorder is set. This is the condition for deciding whether to add a nop. We add a pair of '.set noreorder' and '.set reorder' to wrap the current instruction and the next instruction. Then we can get the previous instruction's forbidden slot attribute and whether reorder is set from 'CurForbiddenSlotAttr'. If the previous instruction has a forbidden slot, .set reorder is active, and the current instruction is a CTI, then emit a NOP after it. Fix https://github.com/llvm/llvm-project/issues/61045. Because https://reviews.llvm.org/D158589 was left in the 'Needs Review' state and never completed, we commit it again as a pull request.	2024-02-24 15:13:43 +08:00
Jeffrey Byrnes	8f2bd8ae68	[AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (#81342 ) This implements the basic pipelining structure of exp/mfma interleaving for better extensibility. While it does have improved extensibility, there are controls which only enable it for DAGs with certain characteristics (matching the DAGs it has been designed against).	2024-02-23 17:13:20 -08:00
Visoiu Mistrih Francis	775bd60363	[RISCV] Add scheduling info for Zcmp (#82719 ) The order of the entries in the list is: outs, ins, Defs, Uses, implicit-defs, implicit uses, where the last two are added programmatically during codegen depending on the registers saved/restored and are not described in the TD files.	2024-02-23 15:44:57 -08:00
Kevin P. Neal	3e9e5e2771	[FPEnv][SystemZ] Correct strictfp test. Correct llvm-reduce strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test needed the strictfp attribute added to function definitions. Test changes verified with D146845.	2024-02-23 13:00:38 -05:00
Lukacma	08cb1a62f6	[AArch64][SVE] Add intrinsics to assembly mapping for svpmov (#81861 ) This patch enables translation of the svpmov intrinsic to the correct assembly instruction, instead of a function call.	2024-02-23 15:40:44 +00:00
hev	c747b24262	[NFC] Precommit a memcpy test for isOrEquivalentToAdd (#82758 )	2024-02-23 21:43:53 +08:00
Evgenii Kudriashov	790bcecce6	[GlobalISel] Fix a check that aligned tail call is lowered (#82016 ) Despite a valid tail call opportunity, backends still may not generate a tail call, or such lowering may not be implemented yet. Check that lowering has happened instead of its possibility when generating G_ASSERT_ALIGN.	2024-02-23 12:11:50 +01:00
Yeting Kuo	850dde063b	[RISCV][VP] Introduce vp saturating addition/subtraction and RISC-V support. (#82370 ) This patch also moves the MatchContext framework from DAGCombiner into an individual header file so that the framework can be used from other files in llvm/lib/CodeGen/SelectionDAG/.	2024-02-23 14:17:15 +08:00
Heejin Ahn	6e6bf9f817	[WebAssembly] Disable multivalue emission temporarily (#82714 ) We plan to enable multivalue in the features section soon (#80923) for other reasons, such as the feature having been standardized for many years and other features being developed (e.g. EH) depending on it. This is separate from enabling Clang experimental multivalue ABI (`-Xclang -target-abi -Xclang experimental-mv`), but it turned out we generate some multivalue code in the backend as well if it is enabled in the features section. Given that our backend multivalue generation still has not been much used nor tested, and enabling the feature in the features section can be a separate decision from how much multivalue (including none) we decide to generate for now, I'd like to temporarily disable the actual generation of multivalue in our backend. To do that, this adds an internal flag `-wasm-emit-multivalue` that defaults to false. All our existing multivalue tests can use this to test multivalue code. This flag can be removed later when we are confident the multivalue generation is well tested.	2024-02-22 19:17:15 -08:00
Alex MacLean	590c968e79	[NVPTX] fixup support for unaligned parameters and returns (#82562 ) Add support for unaligned parameters and return values. These must be loaded and stored one byte at a time and then bit manipulation is used to assemble the correct final result.	2024-02-22 17:27:28 -08:00
Philip Reames	ac518c7c99	[RISCV] Vector sub (zext, zext) -> sext (sub (zext, zext)) (#82455 ) This is legal as long as the inner zext retains at least one bit of increase so that the sub overflow case (0 - UINT_MAX) can be represented. Alive2 proof: https://alive2.llvm.org/ce/z/BKeV3W For RVV, restrict this to power of two sizes with the operation type being at least e8 to stick to legal extends. We could arguably handle i1 source types with some care if we wanted to. This is likely profitable because it may allow us to perform the sub instruction in a narrow LMUL (equivalently, in fewer DLEN-sized pieces) before widening for the user. We could arguably avoid narrowing below DLEN, but the transform should at worst introduce one extra extend and one extra vsetvli toggle if the source could previously be handled via loads explicit w/EEW.	2024-02-22 16:17:48 -08:00
Sumanth Gundapaneni	aaf2d078b6	[Hexagon] Clean up redundant transfer instructions. (#82663 ) This patch adds a Hexagon specific backend pass that cleans up redundant transfers after register allocation.	2024-02-22 17:31:37 -06:00
Nashe Mncube	744c0057e7	[AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr (#81724 ) When performing lowering of the fptrunc opcode returning fp16 with the +nofp flag enabled we could trigger a compiler crash. This is because we had no custom lowering implemented. This patch handles the case in which we need to promote an fp16 return type for fptrunc when the +nofp attr is enabled.	2024-02-22 19:15:52 +00:00
yandalur	6599c022be	[HEXAGON] Fix bit boundary for isub_hi in HexagonBitSimplify (#82336 ) Use bit boundary of 32 for high subregisters in HexagonBitSimplify. This fixes the subregister used in an upper half register store.	2024-02-22 11:48:06 -06:00
Craig Topper	5b53fa04db	[RISCV] Enable -riscv-enable-sink-fold by default. (#82026 ) AArch64 has had it enabled since late November, so hopefully the main issues have been resolved. I see a small reduction in dynamic instruction count on every benchmark in specint2017. The best improvement was 0.3% so nothing amazing.	2024-02-22 09:07:21 -08:00
Craig Topper	c1716e3fcf	[DAGCombiner][RISCV] CSE zext nneg and sext. (#82597 ) If we have a sext and a zext nneg with the same types and operand we should combine them into the sext. We can't go the other way because the nneg flag may only be valid in the context of the uses of the zext nneg.	2024-02-22 09:06:49 -08:00
Craig Topper	c9afd1ad78	[RISCV] Add test case showing missed opportunity to form sextload when sext and zext nneg are both present. NFC	2024-02-22 08:38:42 -08:00
Yingwei Zheng	0107c8824b	[RISCV][SDAG] Improve codegen of select with constants if zicond is available (#82456 ) This patch uses `add + czero.eqz/nez` to lower select with constants if zicond is available. ``` (select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1) (select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2) ``` The above code sequence is suggested by [RISCV Optimization Guide](https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html#_avoid_branches_using_conditional_moves).	2024-02-23 00:18:56 +08:00
Pierre van Houtryve	4235e44d4c	[GlobalISel] Constant-fold G_PTR_ADD with different type sizes (#81473 ) All other opcodes in the list are constrained to have the same type on both operands, but not G_PTR_ADD. Fixes #81464	2024-02-22 13:15:26 +01:00
Sander de Smalen	1f99a45012	[AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326 ) This patch removes the `-reverse-csr-restore-seq` option from AArch64FrameLowering, since this is no longer used. This patch was reverted because of a crash in PR#79623. Merging it back as it was fixed in PR#82492.	2024-02-22 12:01:53 +00:00
Billy Laws	f17e415142	[AArch64] Mangle names of all ARM64EC functions with entry thunks (#80996 ) This better matches MSVC output in cases where static functions have their addresses taken.	2024-02-22 12:36:18 +01:00
Harald van Dijk	4f12f47550	[AArch64] Switch to soft promoting half types. (#80576 ) The traditional promotion is known to generate wrong code. Like #80440 for ARM, except that far less is affected as on AArch64, hardware floating point support always includes FP16 support and is unaffected by these changes. This only affects `-mgeneral-regs-only` (Clang) / `-mattr=-fp-armv8` (LLVM). Because this only affects a configuration where no FP support is available at all, `useFPRegsForHalfType()` has no effect and is not specified: `f32` was getting legalized as a parameter and return type to an integer anyway.	2024-02-22 10:45:27 +00:00
Vyacheslav Levytskyy	4a602d9250	Add support for the SPV_INTEL_usm_storage_classes extension (#82247 ) Add support for the SPV_INTEL_usm_storage_classes extension: * https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_usm_storage_classes.asciidoc	2024-02-22 11:05:19 +01:00
Vyacheslav Levytskyy	6cca23a3b9	[SPIRV] Prevent creation of jump tables from switch (#82287 ) This PR prevents the creation of jump tables from switch. The reason is that SPIR-V doesn't know how to lower jump tables, and the sequence of commands that IRTranslator generates for switch via jump tables breaks SPIR-V Backend code generation with complaints about G_BRJT. The next example is the shortest code to break SPIR-V Backend code generation in this way: ``` target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n8:16:32:64" target triple = "spir64-unknown-unknown" define spir_func void @foo(i32 noundef %val) { entry: switch i32 %val, label %sw.epilog [ i32 0, label %sw.bb i32 1, label %sw.bb2 i32 2, label %sw.bb3 i32 3, label %sw.bb4 ] sw.bb: br label %sw.epilog sw.bb2: br label %sw.epilog sw.bb3: br label %sw.epilog sw.bb4: br label %sw.epilog sw.epilog: ret void } ``` To resolve the issue we set a high lower limit for the number of blocks in a jump table via getMinimumJumpTableEntries() and prevent the undesirable (or rather, unsupported at the moment) path of code generation.	2024-02-22 10:30:00 +01:00
Vyacheslav Levytskyy	fddf23c6f4	[SPIRV] Add support for the SPV_KHR_subgroup_rotate extension (#82374 ) This PR adds support for the SPV_KHR_subgroup_rotate extension that enables rotating values across invocations within a subgroup: * https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_subgroup_rotate.asciidoc	2024-02-22 10:27:59 +01:00
CarolineConcatto	c5253aa136	[AArch64] Restore Z-registers before P-registers (#79623 ) (#82492 ) This is needed by PR#77665[1] that uses a P-register while restoring Z-registers. The reverse for SVE register restore in the epilogue was added to guarantee performance, but further work was done to improve SVE frame restore, and besides that the scheduler may also change the order of the restore, undoing the reverse restore. This also fixes the problem reported in (PR #79623) on Windows with std::reverse and .base(). [1]https://github.com/llvm/llvm-project/pull/77665	2024-02-22 09:19:48 +00:00
Antonio Frighetto	25e7e8d993	[CGP] Permit tail call optimization on undefined return value We may freely allow tail call optzs on undef values as well. Fixes: https://github.com/llvm/llvm-project/issues/82387.	2024-02-22 10:09:15 +01:00
Nick Anderson	8bd327d6fe	[AMDGPU][GlobalISel] Add fdiv / sqrt to rsq combine (#78673 ) Fixes #64743	2024-02-22 09:47:36 +01:00
Yeting Kuo	7e97ae35ae	[RISCV] Teach RISCVMakeCompressible to handle Zca/Zcf/Zce/Zcd. (#81844 ) Make targets which don't have C but have Zca/Zcf/Zce/Zcd benefit from this pass.	2024-02-22 15:51:19 +08:00
Luke Lau	815644b4dd	[RISCV] Fix mgather -> riscv.masked.strided.load combine not extending indices (#82506 ) This fixes the miscompile reported in #82430 by telling isSimpleVIDSequence to sign extend to XLen instead of the width of the indices, since the "sequence" of indices generated by a strided load will be at XLen. This was the simplest way I could think of getting isSimpleVIDSequence to treat the indexes as if they were zero extended to XLenVT. Another way we could do this is by refactoring out the "get constant integers" part from isSimpleVIDSequence and handle them as APInts so we can separately zero extend it. Fixes #82430	2024-02-22 11:50:27 +08:00
Luke Lau	11d115d056	[RISCV] Adjust test case to show wrong stride. NFC See https://github.com/llvm/llvm-project/pull/82506#discussion_r1498080785	2024-02-22 11:08:45 +08:00
Sumanth Gundapaneni	d62ca8def3	[Hexagon] Optimize post-increment load and stores in loops. (#82418 ) This patch optimizes the post-increment instructions so that we can packetize them together. v1 = phi(v0, v3') v2,v3 = post_load v1, 4 v2',v3'= post_load v3, 4 This can be optimized in two ways v1 = phi(v0, v3') v2,v3' = post_load v1, 8 v2' = load v1, 4	2024-02-21 19:50:47 -06:00
Sumanth Gundapaneni	4c0fdcdb33	[Hexagon] Generate absolute-set load/store instructions. (#82034 ) The optimization finds the loads/stores of a specific form and translates the first load/store to an absolute-set form, thereby optimizing out the transfer and eliminating the constant extenders.	2024-02-21 19:50:29 -06:00
David Majnemer	be36812fb7	[TargetLowering] Be more efficient in fp -> bf16 NaN conversions We can avoid masking completely as it is OK (and probably preferable) to bring over some of the existing NaN payload.	2024-02-21 22:47:27 +00:00
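
Note on 0107c8824b: the two Zicond rewrites quoted in that commit message, `(select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)` and `(select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2)`, can be sanity-checked with a small Python model. This is only a sketch under stated assumptions: the helper names and the 64-bit wrap-around are made up for illustration and are not LLVM code; `czero.eqz` and `czero.nez` are modeled as "result is zero when the condition is zero (respectively nonzero), otherwise the first operand".

```python
MASK = (1 << 64) - 1  # assumption: model 64-bit wrapping registers (RV64)


def czero_eqz(rs1, rs2):
    """Model of czero.eqz: 0 if the condition rs2 is zero, else rs1."""
    return 0 if rs2 == 0 else rs1


def czero_nez(rs1, rs2):
    """Model of czero.nez: 0 if the condition rs2 is nonzero, else rs1."""
    return 0 if rs2 != 0 else rs1


def select(c, c1, c2):
    """Reference semantics: c1 when c is nonzero, otherwise c2."""
    return (c1 if c != 0 else c2) & MASK


def select_via_nez(c, c1, c2):
    """(add (czero_nez c2 - c1, c), c1), with wrapping arithmetic."""
    return (czero_nez((c2 - c1) & MASK, c) + c1) & MASK


def select_via_eqz(c, c1, c2):
    """(add (czero_eqz c1 - c2, c), c2), with wrapping arithmetic."""
    return (czero_eqz((c1 - c2) & MASK, c) + c2) & MASK
```

Both forms agree with `select` because the difference term is either zeroed (leaving the added constant) or kept (so the added constant cancels, modulo 2^64).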

52796 Commits