llvm-project

Author	SHA1	Message	Date
Akshat Oke	7b60e03d73	Reland "[CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684) `RegisterClassInfo` was supposed to be kept alive between pass runs, but it wasn't, which led to recomputations that increased compile time. Now the Impl class is a member of both the legacy and new passes so that it is not reconstructed on every pass run. --------- Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2025-02-12 18:54:39 +05:30
Akshat Oke	564b9b7f4d	Revert "[CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268) This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I investigate what's causing the compile-time regression.	2025-02-08 15:36:48 +05:30
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10-year-old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously, when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the newly constructed ValueTracker would do is call getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, places a useless entry in the RewriteMap, and requires skipping the no-op case where getNewSource would return the original source operand. This was a problem because, in the case of a subregister extract, shouldRewriteCopySrc would always say that it is useful to rewrite, and the use-def chain walk would abort, returning the original operand. Change the process to start by looking at the source operand instead. This does not fix the confused handling in the uncoalescable copy case, which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and in other passes like MachineCSE, to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Christudasan Devadasan	5aa4979c47	[CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)	2025-02-05 12:17:59 +05:30
Sergei Barannikov	ff9c041d96	[MachineScheduler] Fix physreg dependencies of ExitSU (#123541) Providing the correct operand index allows addPhysRegDataDeps to compute the correct latency. Pull Request: https://github.com/llvm/llvm-project/pull/123541	2025-02-01 20:40:50 +03:00
Alexander Richardson	213a939a79	[LegalizeDAG] Use Base+Offset instead of Offset+Base for jump tables This is needed for architectures that actually use strict pointer arithmetic instead of integers such as AArch64 with FEAT_CPA (see https://github.com/llvm/llvm-project/pull/105669) or CHERI. Using an index as the first operand of pointer arithmetic may result in an invalid output. While there are quite a few codegen changes here, these only change the order of registers in add instructions. One MIPS combine had to be updated to handle the new node order. Reviewed By: topperc Pull Request: https://github.com/llvm/llvm-project/pull/125279	2025-01-31 14:05:34 -08:00
Alex Richardson	c7d4ccfd83	[PowerPC] Autogenerate test checks in preparation for follow-up commit This just adds more lines that are checked	2025-01-31 12:01:31 -08:00
Stefan Pintilie	340706f311	[PowerPC] Fix saving of Link Register when using ROP Protect (#123101) An optimization was added that tries to move the uses of the mflr instruction away from the instruction itself. However, this doesn't work when we are using the hashst instruction because that instruction needs to be run before the stack frame is obtained. This patch disables moving instructions away from the mflr in the case where ROP protection is being used. --------- Co-authored-by: Lei Huang <lei@ca.ibm.com>	2025-01-22 13:44:20 -05:00
Sander de Smalen	6b1db79887	Revert "Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#123632)" There's a regression with one of the bootstrap builds for x86. I'll revert this while I investigate. This reverts commit 4df6d3df24ae9cff07c70c96a1663cbba6e1dca5.	2025-01-22 10:11:32 +00:00
Sander de Smalen	4df6d3df24	Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#123632) This PR aims to reland work done by @arsenm which was previously reverted due to some tangentially related scheduler issues as discussed on #76416. This PR cherry-picks the original commit (0e46b49de433), and adds another patch on top with the following changes: * The code in `updateRegDefsUses` now updates subranges when subreg-liveness-tracking is enabled. * When adding an implicit-def operand for the super-register, the code in `reMaterializeTrivialDef` which tries to remove undefined subranges should now take into account that the lanes from the super-reg are no longer undefined. Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-01-22 09:07:46 +00:00
TiborGY	3630d9ef65	[PartiallyInlineLibCalls] Add infrastructure for emitting optimization remarks from PartiallyInlineLibCalls (#122654) I am planning to add some optimization remarks to the `PartiallyInlineLibCalls` pass. However, since this pass does not emit any optimization remarks yet, I have to add the "infrastructure" for that first, which is what this PR is about.	2025-01-22 13:15:40 +07:00
Matt Arsenault	5e79ae60a6	DAG: Fix vector_shuffle -> splat fold defining undef lanes (#123596) For shuffle vector splats with undef lanes in the mask, this was introducing real values. Filter out build_vector results based on the undef elements in the mask. This avoids AMDGPU test regressions in a future change. test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse but I didn't investigate.	2025-01-21 23:55:50 +07:00
Guy David	1a935d7a17	[llvm] Mark scavenging spill-slots as spilled stack objects. (#122673) This seems like an oversight when copying code from other backends.	2025-01-14 10:18:31 +02:00
Hubert Tong	e438513f2e	[AIX][AsmPrinter] Fix unsigned subtraction wrap-around (#122214) Unsigned subtraction wrap-around occurs in `emitGlobalConstantImpl` on an AIX-specific code path from 8e4423eb0888 when a structure type has zero elements. With assertions enabled, this manifests as: ``` TypeSize llvm::StructLayout::getElementOffset(unsigned int) const: Assertion `Idx < NumElements && "Invalid element idx!"' failed. ```	2025-01-09 00:07:57 -04:00
Fangrui Song	5e0be962fe	[PowerPC] Support PIC Secure PLT for CALL_RM https://reviews.llvm.org/D111433 introduced PPCISD::CALL_RM for -frounding-math. -msecure-plt -frounding-math {-fpic,-fPIC} codegen for PPC32 became incorrect when a function contains function calls but no global variable references (GlobalBaseReg). As reported by @q66, musl/src/dirent/closedir.c implements such a function, which is miscompiled. PPCISD::CALL has custom logic to set up the base register (https://reviews.llvm.org/D42112). Add an extra case for CALL_RM. While here, improve the test to * actually test `case PPCISD::CALL`: we need a non-leaf function that doesn't access global variables (global variables lead to GlobalBaseReg, which calls `getGlobalBaseReg()` as well). * test `ExternalSymbolSDNode` with a memset. Supersedes: #72758 Pull Request: https://github.com/llvm/llvm-project/pull/121281	2025-01-06 08:59:42 -08:00
Craig Topper	4dfea22e77	[ExpandMemCmp][AArch64][PowerPC][RISCV][X86] Use llvm.ucmp instead of (sub (zext (icmp ugt)), (zext (icmp ult))). (#121530) AArch64 and PowerPC look like improvements. RISC-V is neutral. X86 trades a dependency-breaking xor before a seta for a movsx after an sbbb. Depending on how the result is used, this movsx might go away.	2025-01-03 09:19:32 -08:00
Simon Pilgrim	b3a7ab6f1f	[DAG] Don't allow implicit truncation in extract_element(bitcast(scalar_to_vector(X))) -> trunc(srl(X,C)) fold Limits #117900 to only fold when scalar_to_vector doesn't perform implicit truncation, as the scaled shift calculation doesn't currently account for this - this can be addressed in a future update. Fixes #121306	2024-12-30 16:08:35 +00:00
Fangrui Song	9efa7d7af3	Remove -print-lsr-output in favor of --stop-after=loop-reduce Pull Request: https://github.com/llvm/llvm-project/pull/121305	2024-12-29 18:58:30 -08:00
Fangrui Song	7b23f413d1	MCAsmStreamer: Omit initial ".text" llvm-mc --assemble prints an initial `.text` from `initSections`. This is weird for quick assembly tasks that do not specify `.text`. Omit the .text by moving section directive printing from `changeSection` to `switchSection`. switchSectionNoPrint now correctly calls the `changeSection` hook (needed by MachO). The initial directives of clang -S are now reordered. On ELF targets, we get `.file "a.c"; .text` instead of `.text; .file "a.c"`. If there is no function, `.text` will be omitted.	2024-12-22 22:03:44 -08:00
Benjamin Maxwell	a7dafea384	[SDAG] Allow folding stack slots into sincos/frexp in more cases (#118117) This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to check that it is safe to fold a store into a node that will expand to a library call that takes output pointers. This requires checking for two (independent) properties: 1. The store is not within a CALLSEQ_START..CALLSEQ_END pair * If it is, the expansion would lead to nested call sequences (which is invalid) 2. The node does not appear as a predecessor to the store * If it does, attempting to merge the store into the call would result in a cycle in the DAG These two properties are checked as part of the same traversal in `canFoldStoreIntoLibCallOutputPointers()`	2024-12-17 10:54:17 +00:00
Matt Arsenault	bb18e49edb	RegAlloc: Use DiagnosticInfo to report register allocation failures (#119492) Improve the non-fatal cases to use DiagnosticInfo, which will now provide a location. The allocators attempt to report different errors if it happens to see inline assembly is involved (this detection is quite unreliable) using srcloc instead of dbgloc. For now, leave this behavior unchanged. I think reporting the full location and context function would be more useful.	2024-12-16 10:49:08 +09:00
Fangrui Song	133352feb3	[test] Remove redundant -march= when target triple is specified in IR	2024-12-15 12:42:17 -08:00
Stefan Pintilie	67eb05b292	[PowerPC] Add special handling for arguments that are smaller than pointer size. (#119003) When arguments are passed in memory instead of registers, we currently load the entire pointer size even though the argument may be smaller. For example, if the pointer size is i32, then we use a load word even if the argument is only an i8. This patch zeros/extends the bits that are not required to ensure that we are getting the correct value even if the load is larger.	2024-12-12 09:43:53 -05:00
Guillaume DI FATTA	a1ee1a9126	[CodeGen] @llvm.experimental.stackmap make operands immediate (#117932) This pull request modifies the behavior of the `@llvm.experimental.stackmap` intrinsic to require that its first two operands (`id` and `numShadowBytes`) be immediate values. This change ensures that variables cannot be passed as the first two arguments to this intrinsic. Related Issue: https://github.com/llvm/llvm-project/issues/115733 ### Testing - Added new test cases to ensure errors are emitted for non-immediate operands. - Ran the full LLVM test suite to verify no regressions were introduced.	2024-12-11 17:41:19 +08:00
zhijian lin	4d06623b28	recalculate the live interval of the defined register of xvmaddmdp in the VSX FMA mutation pass. (#116071) The patch fixes https://github.com/llvm/llvm-project/issues/116061. The root cause of the assertion is that the FMA mutation pass does not update the subranges of the live interval for the defined register of the modified instruction. This patch recalculates the live interval of the defined register of xvmaddmdp in the VSX FMA mutation pass.	2024-12-10 11:21:15 -05:00
Amy Kwan	f31099ce58	[PowerPC][AIX] Emit PowerPC version for XCOFF (#113214) This PR implements the ability to emit the PPC version for both assembly and object files on AIX.	2024-12-10 11:11:50 -05:00
Lei Huang	a13ec9cd54	[PowerPC] Update data layout alignment of i128 to 16 (#118004) Fix 64-bit PowerPC part of https://github.com/llvm/llvm-project/issues/102783.	2024-12-09 18:02:24 -05:00
Maryam Moghadas	68e75eebec	[PPC] Custom lower ssubo for i64 (#118711) This is a follow-up patch to improve the codegen for ssubo node for i64 in 64-bit mode by custom lowering.	2024-12-05 17:22:44 -05:00
zhijian lin	6b5c67bd16	[PowerPC][Backend] use sign-extended value instead of zero-extended value for isIntS34Immediate() (#118703) The patch fixes the issue https://github.com/llvm/llvm-project/issues/118695	2024-12-05 09:08:18 -05:00
Simon Pilgrim	b1a48af56a	[DAG] SimplifyDemandedVectorElts - add handling for INT<->FP conversions (#117884)	2024-12-04 07:37:01 +00:00
Zaara Syeda	935bbbbde4	[PPC] Remove missed cases of ppc-merge-string-pool (#117626) PPCMergeStringPool was replaced with GlobalMerge with commit aaa37d6. Some uses of the ppc-merge-string-pool option were missed during that removal.	2024-12-03 13:31:26 -05:00
Simon Pilgrim	31b7d4333a	[DAG] Extend extract_element(bitcast(scalar_to_vector(X))) -> trunc(srl(X,C)) (#117900) When extracting a smaller integer from a scalar_to_vector source, we were limited to only folding/truncating the lowest bits of the scalar source. This patch extends the fold to handle extraction of any other element, by right shifting the source before truncation. Fixes a regression from #117884	2024-11-29 17:24:38 +00:00
Maryam Moghadas	dab4121a55	[PowerPC] Add custom lowering for ssubo (#111748) (#115875) This patch is to improve the codegen for ssubo node for i32 by custom lowering.	2024-11-28 13:55:53 -05:00
Maryam Moghadas	66d350a017	[PowerPC][NFC] Pre-commit test case to prepare for patch to custom lower ssubo	2024-11-28 10:37:03 -05:00
Nikita Popov	04a2d50efd	[PPC] Use getSignedConstant() for frame index offset The offset is signed. Fixes assertion failure reported at: https://github.com/llvm/llvm-project/pull/117558#issuecomment-2504413074	2024-11-28 10:49:45 +01:00
RolandF77	a475180498	[PowerPC] Use setbc for values from vector compare conditions (#114858) For P10 use the setbc instruction to get int values from vector compare summary condition results.	2024-11-27 12:47:10 -05:00
Zaara Syeda	b1a34b80b8	[NFC][Test] Fix PowerPC test gcov_ctr_ref_init.ll (#117577)	2024-11-26 12:09:49 -05:00
Zaara Syeda	8e4423eb08	[AsmPrinter] Fix handling in emitGlobalConstantImpl for AIX (#116255) When GlobalMerge creates a MergedGlobal of statics all initialized to zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results in just emitting zeros followed by labels for the aliases. We need to handle it more like how emitGlobalConstantStruct does by emitting each global inside the aggregate. --------- Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>	2024-11-19 09:58:25 -05:00
Akshat Oke	3f9d02aae8	[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326) With this, all machine SSA optimization passes are available in the new codegen pipeline.	2024-11-18 11:02:01 +05:30
Sergei Barannikov	032014ef10	[PowerPC] Add `SDNPMemOperand` to some nodes (#115580) Nodes created with `getMemIntrinsicNode` have memory operands. In order for operands to be propagated to machine instructions, the nodes should have `SDNPMemOperand` property. Similar to 3c8c385a.	2024-11-15 20:36:56 +03:00
Jake Egan	48cc435109	Reland "[PowerPC] Add error for incorrect use of memory operands (#114277)" (#115958) Commit 93589057830b2c3c35500ee8cac25c717a1e98f9 was reverted because it caused a failure with test `lld :: ELF/ppc64-local-exec-tls.s`. This relands the commit with a fix for the test.	2024-11-13 22:24:19 -05:00
Tex Riddell	5c2a133b13	Emit constrained atan2 intrinsic for clang builtin (#113636) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `Builtins.td` - Add f16 support for libm atan2 builtin - `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin - `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of atan2 for clang builtin to lib call calling convention check, now that atan2 maps to an intrinsic. - add atan2 cases to llvm.experimental.constrained tests for more backends: ARM, PowerPC, RISCV, SystemZ. - LangRef.rst: add llvm.experimental.constrained.atan2, revise llvm.atan2 description. Last part of Implement the atan2 HLSL Function. Fixes #70096.	2024-11-12 13:34:29 -08:00
Benjamin Maxwell	014455a587	[SDAG] Limit sincos/frexp stack slot folding to stores chained to entry (#115906) When the chain is not the entry node there is a risk the stores are within a (CALLSEQ_START, CALLSEQ_END), which when the node is expanded will lead to nested call sequences. It should be possible to check for this and allow more cases, but for now, let's limit this to cases where it's definitely safe. Fixes #115323	2024-11-12 20:48:41 +00:00
Zaara Syeda	aaa37d6755	[PPC] Replace PPCMergeStringPool with GlobalMerge for Linux (#114850) Enable merging all constants without looking at use in GlobalMerge by default to replace PPCMergeStringPool pass on Linux.	2024-11-12 14:02:01 -05:00
Jake Egan	0e52a0721e	Revert "[PowerPC] Add error for incorrect use of memory operands (#114277)" This commit broke a test on a couple of bots: `lld :: ELF/ppc64-local-exec-tls.s`. This reverts commit 93589057830b2c3c35500ee8cac25c717a1e98f9.	2024-11-12 04:03:06 -05:00
Jake Egan	9358905783	[PowerPC] Add error for incorrect use of memory operands (#114277) If an instruction doesn't support memory operands, but one is provided, an error should be raised. And conversely, if an instruction requires a memory operand, but none is given, an error should be raised.	2024-11-12 03:00:06 -05:00
Amy Kwan	4981f8cb72	[PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types. (#80784) This patch fixes the combines for vector_shuffles when either or both of their left- and right-hand-side inputs are scalar_to_vector nodes. Previously, when both the left and right inputs were scalar_to_vector nodes, the combine could not handle this situation, as the shuffle mask was updated incorrectly. As a temporary workaround, this combine was simply disabled. Now, this patch not only resolves the previous issue of the incorrect shuffle mask adjustments, but also updates any test cases that are affected by this change. Patch migrated from https://reviews.llvm.org/D130487.	2024-11-11 10:53:51 -05:00
Nikita Popov	dd116369f6	[InstSimplify] Fix incorrect poison propagation when folding phi (#96631) We can only replace phi(X, undef) with X, if X is known not to be poison. Otherwise, the result may be more poisonous on the undef branch. Fixes https://github.com/llvm/llvm-project/issues/68683.	2024-11-07 14:09:45 +01:00
abhishek-kaushik22	d2aff182d3	Revert "TLS loads opimization (hoist)" (#114740) This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4. Based on the discussions in #112772, this pass is not needed after the introduction of `llvm.threadlocal.address` intrinsic. Fixes https://github.com/llvm/llvm-project/issues/112771.	2024-11-07 10:10:28 +01:00
Benjamin Maxwell	ea6b8fa4b9	[SDAG] Merge multiple-result libcall expansion into DAG.expandMultipleResultFPLibCall() (#114792) This merges the logic for expanding both FFREXP and FSINCOS into one method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).	2024-11-06 11:06:06 +00:00

4011 Commits