llvm-project

Author	SHA1	Message	Date
Nikita Popov	d74831efeb	Revert "[SDAG] Fix fmaximum legalization errors (#142170)" This reverts commit 58cc1675ec7b4aa5bc2dab56180cb7af1b23ade5. I also made the incorrect assumption that we know both values are +/-0.0 here as well. Revert for now.	2025-06-04 14:35:30 +02:00
Nikita Popov	42605b8aa3	Revert "[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732)" This reverts commit 54da543a14da6dd0e594875241494949cb659b08. I made a logic error here with the assumption that both values are known to be +/-0.0.	2025-06-04 14:22:19 +02:00
Nikita Popov	54da543a14	[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732) When ordering signed zero, only check the sign of one of the values. We already know at this point that both values must be +/-0.0, so it is sufficient to check one of them to correctly order them. For example, for fmaximum, if we know LHS is `+0.0` then we can always select LHS, value of RHS does not matter. If LHS is `-0.0` we can always select RHS, value of RHS doesn't matter.	2025-06-04 10:41:30 +02:00
Matt Arsenault	01a6d0fffb	ARM: Use correct file extension for IR test (#142728)	2025-06-04 17:34:29 +09:00
Yingwei Zheng	1984c7539e	[ValueTracking] Do not use FMF from fcmp (#142266) This patch introduces an FMF parameter for `matchDecomposedSelectPattern` to pass FMF flags from select, instead of fcmp. Closes https://github.com/llvm/llvm-project/issues/137998. Closes https://github.com/llvm/llvm-project/issues/141017.	2025-06-02 18:21:14 +08:00
Nikita Popov	58cc1675ec	[SDAG] Fix fmaximum legalization errors (#142170) FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero handling. This is problematic, because it assumes the equivalent integer type is legal. Many targets have legal fp128, but illegal i128, so this results in legalization failures. Fix this by replacing IS_FPCLASS with checking the bitcast to integer instead. In that case it is sufficient to use any legal integer type, as we're just interested in the sign bit. This can be obtained via a stack temporary cast. There is existing FloatSignAsInt functionality used for legalization of FABS and similar we can use for this purpose. Fixes https://github.com/llvm/llvm-project/issues/139380. Fixes https://github.com/llvm/llvm-project/issues/139381. Fixes https://github.com/llvm/llvm-project/issues/140445.	2025-06-02 10:14:33 +02:00
David Green	7a688c080f	[ARM] Add vector vrint tests and fix FP16 to expand.	2025-05-31 12:21:46 +01:00
Folkert de Vries	46b389218b	[ARM]: codegen `llvm.roundeven.v*` (#141786) fixes https://github.com/llvm/llvm-project/issues/73588 The aarch64 version of `frintn.ll` notes the intention to auto-upgrade `frintn` to `roundeven`. I haven't been able to figure out how to make that happen though (either for arm or aarch64). The original issue came up in https://github.com/rust-lang/stdarch/pull/1807	2025-05-30 19:36:29 +01:00
Orlando Cazalet-Hyams	34a55c9376	[BranchFolding] Fix assertion failure in HoistCommonCodeInSuccs (#141028) Assertion failure introduced in #140063, which didn't account for TBB and FBB being the same block.	2025-05-22 13:21:26 +01:00
Mohammad Bashir	bcdce987c0	Fix regression tests with bad FileCheck checks (#140373) Fixes https://github.com/llvm/llvm-project/issues/140149	2025-05-22 07:59:57 +03:00
Jessica Clarke	1b41599cf8	[MC][AArch64][ARM][X86] Push target-dependent assembler flags into targets (#139844) The .syntax unified directive and .codeX/.code X directives are, other than some simple common printing code, exclusively implemented in the targets themselves. Thus, remove the corresponding MCAF_* flags and reimplement the directives solely within the targets. This avoids exposing all targets to all other targets' flags. Since MCAF_SubsectionsViaSymbols is all that remains, convert it to its own function like other directives, simplifying its implementation. Note that, on X86, we now always need a target streamer when parsing assembly, as it's now used for directives that aren't COFF-specific. It still does not however need to do anything when producing a non-COFF object file, so this commit does not introduce any new target streamers. There is some churn in test output, and corresponding UTC regex changes, due to comments no longer being flushed by these various directives (and EmitEOL is not exposed outside MCAsmStreamer.cpp so we couldn't do so even if we wanted to), but that was a bit odd to be doing anyway. This is motivated by Morello LLVM, which adds yet another assembler flag to distinguish A64 and C64 instruction sets, but did not update every switch and so emits warnings during the build. Rather than fix those warnings it seems better to instead make the problem not exist in the first place via this change.	2025-05-18 20:09:43 +01:00
YunQiang Su	780054d3ff	CodeGen: Add ISD::AssertNoFPClass (#138839) It is used to mark a value that we are sure is not of some fcType. The examples include: * An argument of a function that is marked with nofpclass * An output value of an intrinsic that is known not to be of some type This lets the following operations make some assumptions.	2025-05-15 16:05:15 +08:00
Simon Pilgrim	bde39d7251	[DAG] Add SDPatternMatch::m_BitwiseLogic common matcher for AND/OR/XOR nodes (#138301)	2025-05-06 12:50:50 +01:00
Matt Arsenault	93509064a6	Revert "ARM: Remove override of shouldRewriteCopySrc (#125219)" This reverts commit 9d90f8ba7113fd9c7b2662682ad94b744ed2b78c. Test fails the machine verifier. There's a bug somewhere, the unrepresentable cases should be avoided by the default logic.	2025-05-05 22:08:48 +02:00
Matt Arsenault	9d90f8ba71	ARM: Remove override of shouldRewriteCopySrc (#125219) All of the overrides of shouldRewriteCopySrc appear to be hacks for bugs in the base implementation, so I'm trying to delete all of the overrides. I was expecting this to find an example issue like the x86 version, but no tests change with this.	2025-05-05 19:57:04 +02:00
Matt Arsenault	44856d957e	ARM: Add test which shows overriding shouldRewriteCopySrc does something Split out from #125219	2025-05-05 17:32:37 +02:00
Sergei Barannikov	becd418626	[CGP] Despeculate ctlz/cttz with "illegal" integer types (#137197) The code below the removed check looks generic enough to support arbitrary integer widths. This change helps 32-bit targets avoid expensive expansion/libcalls in the case of zero input. Pull Request: https://github.com/llvm/llvm-project/pull/137197	2025-04-29 22:33:40 +03:00
John Brawn	dd87127f4e	[DAGCombiner] Eliminate fp casts if we have the right fast math flags (#131345) When floating-point operations are legalized to operations of a higher precision (e.g. f16 fadd being legalized to f32 fadd) then we get narrowing then widening operations between each operation. With the appropriate fast math flags (nnan ninf contract) we can eliminate these casts.	2025-04-28 11:21:51 +01:00
Sergei Barannikov	7af555e524	[ARM][RISCV] Partially revert #101786 (#137120) The change as is breaks the Linux kernel build as pointed out in the comments.	2025-04-24 10:13:05 +03:00
Sergei Barannikov	11a3de7e98	[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#101786) This is a reland of #99752 with the bug fixed (see test diff in the third commit in this PR). All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type of the argument, which can be wider than `int`. The fix is to make DAG legalizer pass the correct return type to `makeLibCall` and sign-extend the result afterwards. Original commit message: The main change is adding CTPOP to `RuntimeLibcalls.def` to allow targets to use LibCall action for CTPOP. DAG legalizers are changed accordingly. Pull Request: https://github.com/llvm/llvm-project/pull/101786	2025-04-23 12:43:05 +03:00
Benson Chu	50320504c8	[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been saved. - GPRCS1 - GPRCS2 - FPStatusRegs (new) - DPRCS - GPRCS3 - DPRCS2 FPSCR is present on all targets with a VFP, but the FPEXC register is not present on Cortex-M devices, so different amounts of bytes are being pushed onto the stack depending on our target, which would affect alignment for subsequent saves. DPRCS1 will sum up all previous bytes that were saved, and will emit extra instructions to ensure that its alignment is correct. My assumption is that if DPRCS1 is able to correct its alignment to be correct, then all subsequent saves will also have correct alignment. Avoid annotating the saving of FPSCR and FPEXC for functions marked with the interrupt_save_fp attribute, even though this is done as part of frame setup. Since these are status registers, there really is no viable way of annotating this. Since these aren't GPRs or DPRs, they can't be used with .save or .vsave directives. Instead, just record that the intermediate registers r4 and r5 are saved to the stack again. Co-authored-by: Jake Vossen <jake@vossen.dev> Co-authored-by: Alan Phipps <a-phipps@ti.com>	2025-04-22 14:31:29 -05:00
Fangrui Song	7117dea043	AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function; use lowerDSOLocalEquivalent in more cases https://reviews.llvm.org/D17938 introduced lowerRelativeReference to give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an `unnamed_addr` function, create a PLT-generating relocation. This was intended for C++ relative vtables, but C++ relative vtable ended up using DSOLocalEquivalent (lowerDSOLocalEquivalent). This special treatment of `unnamed_addr` seems unusual. Let's remove it. Only COFF needs an overload to generate a @IMGREL32 relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll). Pull Request: https://github.com/llvm/llvm-project/pull/134781	2025-04-08 10:11:20 -07:00
Un1q32	6f34d03b31	Remove iOS 5 check for tailcalls on ARM (#133354) Fixes #102053 The check was added in 8decdc472f308b13d7fb7fd50c3919db086c0417, and at the time iOS 5 was the latest iOS version, before that commit tail calls were disabled for all ARMv7 targets. Testing a build of wasm3 with the patch on a device running iOS 3.0 shows a noticeable performance improvement and no issues.	2025-04-04 16:02:39 -07:00
Alex MacLean	ad39049ec4	[DAGCombiner] Attempt to fold 'add' nodes to funnel-shift or rotate (#125612) Almost all of the rotate idioms that are valid for an 'or' are also valid when the halves are combined with an 'add'. Further, many of these cases are not handled by common bits tracking meaning that the 'add' is not converted to a 'disjoint or'.	2025-04-04 15:39:24 -07:00
Petr Hosek	4b19db6db9	Revert "AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function" (#133935) Reverts llvm/llvm-project#132684	2025-04-01 09:39:07 -07:00
Fangrui Song	dd862356e2	AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function https://reviews.llvm.org/D17938 introduced lowerRelativeReference to give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an `unnamed_addr` function, create a PLT-generating relocation. This was intended for C++ relative vtables, but C++ relative vtable ended up using DSOLocalEquivalent (lowerDSOLocalEquivalent). This special treatment of `unnamed_addr` seems unusual. Let's remove it. Only COFF needs an overload to generate a @IMGREL32 relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll). Pull Request: https://github.com/llvm/llvm-project/pull/132684	2025-03-31 20:44:29 -07:00
Fangrui Song	04a67528d3	[MC] Simplify MCBinaryExpr/MCUnaryExpr printing by reducing parentheses (#133674) The existing pretty printer generates excessive parentheses for MCBinaryExpr expressions. This update removes unnecessary parentheses of MCBinaryExpr with +/- operators and MCUnaryExpr. Since relocatable expressions only use + and -, this change improves readability in most cases. Examples: - (SymA - SymB) + C now prints as SymA - SymB + C. This updates the output of -fexperimental-relative-c++-abi-vtables for AArch64 and x86 to `.long _ZN1B3fooEv@PLT-_ZTV1B-8` - expr + (MCTargetExpr) now prints as expr + MCTargetExpr, with this change primarily affecting AMDGPUMCExpr.	2025-03-30 22:03:14 -07:00
Alex MacLean	672c51c9cb	[SDAG][tests] add some test cases covering an add-based rotate (#132842) Add tests to various targets covering rotate idioms where an 'ADD' node is used to combine the halves instead of an 'OR'. Some of these cases will be better optimized following #125612, while others are already well optimized or do not have a valid fold to a rotate or funnel-shift.	2025-03-26 09:47:28 -07:00
Akshat Oke	174110bf3c	[CodeGen][NPM] Port LiveDebugValues to NPM (#131563)	2025-03-24 11:34:45 +05:30
Jeremy Morse	792a6f8119	[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298) These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).	2025-03-14 15:50:49 +00:00
Frederik Harwath	6962cf1700	Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128) This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it.	2025-03-14 13:11:45 +01:00
Benson Chu	3b3356043c	Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute" This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.	2025-03-10 10:11:23 -05:00
Benson Chu	1f05703176	[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been saved. - GPRCS1 - GPRCS2 - FPStatusRegs (new) - DPRCS - GPRCS3 - DPRCS2 FPSCR is present on all targets with a VFP, but the FPEXC register is not present on Cortex-M devices, so different amounts of bytes are being pushed onto the stack depending on our target, which would affect alignment for subsequent saves. DPRCS1 will sum up all previous bytes that were saved, and will emit extra instructions to ensure that its alignment is correct. My assumption is that if DPRCS1 is able to correct its alignment to be correct, then all subsequent saves will also have correct alignment. Avoid annotating the saving of FPSCR and FPEXC for functions marked with the interrupt_save_fp attribute, even though this is done as part of frame setup. Since these are status registers, there really is no viable way of annotating this. Since these aren't GPRs or DPRs, they can't be used with .save or .vsave directives. Instead, just record that the intermediate registers r4 and r5 are saved to the stack again. Co-authored-by: Jake Vossen <jake@vossen.dev> Co-authored-by: Alan Phipps <a-phipps@ti.com>	2025-03-10 10:05:15 -05:00
John Brawn	d19218e507	[SelectionDAG] Preserve fast math flags when legalizing/promoting (#130124) When we have a floating-point operation that a target doesn't support for a given type, but does support for a wider type, then there are two ways this can be handled: * If the target doesn't have any registers at all of this type then LegalizeTypes will convert the operation. * If we do have registers but no operation for this type, then the operation action will be Promote and it's handled in PromoteNode. In both cases the operation at the wider type, and the conversion operations to and from that type, should have the same fast math flags as the original operation. This is being done in preparation for a DAGCombine patch which makes use of these fast math flags.	2025-03-07 14:46:32 +00:00
Daniel Paoliello	16e051f0b9	[win] NFC: Rename `EHCatchret` to `EHCont` to allow for EH Continuation targets that aren't `catchret` instructions (#129953) This change splits out the renaming and comment updates from #129612 as a non-functional change.	2025-03-06 09:28:44 -08:00
Matt Arsenault	b21663cb5b	SplitKit: Take register class directly from instruction definition (#129727) This fixes an expensive check failure after 8476a5d480304. The issue was essentially that getRegClassConstraintEffectForVReg was not doing anything useful, sometimes. If the register passed to it is not present in the instruction, it is a no-op and returns the original class. The Edit->getReg() register may not be the register as it appears in either the use or def instruction. It may be some split register, so take the register directly from the instruction being rematerialized. Also directly query the constraint from the def instruction, with a hardcoded operand index. This isn't ideal, but all the other rematerialize code makes the same assumption. So far I've been unable to reproduce this with a standalone MIR test. In the original case, stop-before=greedy and running the one pass is not working.	2025-03-06 20:06:35 +07:00
Benjamin Maxwell	0228b778a4	[SDAG] Add missing SoftenFloatRes legalization for FMODF (#129264) This is needed on some ARM platforms.	2025-03-05 13:45:48 +00:00
Lucas Ramirez	03677f63a7	[MachineScheduler] Optional scheduling of single-MI regions (#129704) Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see #128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag).	2025-03-04 17:46:44 +01:00
Akshat Oke	77f44a9642	[CodeGen][NewPM] Port MachineSink to NPM (#115434) Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)	2025-03-03 15:49:37 +05:30
Lucas Ramirez	15e295d30a	[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions (#128739) The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning. This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).	2025-02-27 11:27:07 +01:00
Eli Friedman	1b39328d74	[CodeGen] Fix MachineInstr::isSafeToMove handling of inline asm. (#126807) Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction. This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr. Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant. See also discussion on gcc bugtracker https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 .	2025-02-25 15:29:12 -08:00
Vikash Gupta	352c48f278	[SelectionDAG] Utilizing target hook convertSelectOfConstantsToMath for SelectwithConstant (#127599) The Target hook convertSelectOfConstantsToMath() needs to be used within SimplifySelectCC helper combine function in SelectionDAG Isel, where generic select folding with constants is happening into simple maths op using the condition as it is. It necessarily fixes #121145.	2025-02-25 20:32:24 +05:30
Simon Pilgrim	7de64925da	[DAG] shouldReduceLoadWidth - hasOneUse should check just the loaded value - not the chain (#128167) The hasOneUse check was failing in any case where the load was part of a chain - we should only be checking if the loaded value has one use, and any updates to the chain should be handled by the fold calling shouldReduceLoadWidth. I've updated the x86 implementation to match, although it has no effect here yet (I'm still looking at how to improve the x86 implementation) as the inner for loop was discarding chain uses anyway. By using SDValue::hasOneUse instead this patch exposes a missing dependency on the LLVMSelectionDAG library in a lot of tools + unittests, which resulted in having to make SDNode::hasNUsesOfValue inline. Noticed while fighting the x86 regressions in #122671	2025-02-24 11:09:41 +00:00
Nikita Popov	d8b2e432d6	[IR] Remove mul constant expression (#127046) Remove support for the mul constant expression, which has previously already been marked as undesirable. This removes the APIs to create mul expressions and updates tests to stop using mul expressions. Part of: https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179	2025-02-14 09:28:57 +01:00
Akshat Oke	7b60e03d73	Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684) `RegisterClassInfo` was supposed to be kept alive between pass runs, which wasn't being done leading to recomputations increasing the compile time. Now the Impl class is a member of the legacy and new passes so that it is not reconstructed on every pass run. --------- Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2025-02-12 18:54:39 +05:30
Cullen Rhodes	317a644ae6	[SDAG] Precommit tests for #126207 (NFC) (#126208) Add missing test coverage for codepaths touched by #126207.	2025-02-10 09:13:02 +00:00
Akshat Oke	564b9b7f4d	Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268) This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I investigate what's causing the compile-time regression.	2025-02-08 15:36:48 +05:30
Matt Arsenault	c268a3f093	DAG: Fix extract of load combine with mismatched vector element type Fix the case where the vector element type of the loaded extractelement input does not match the result type of the extract. This fixes a regression reported after c55a7659b38946350315ac4a18d9805deb1f0a54	2025-02-06 22:56:56 +07:00
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10 year old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySource would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other passes like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Christudasan Devadasan	44f638f88e	CodeGen][NewPM] Port PostRAScheduler to NPM. (#125798)	2025-02-05 12:45:59 +05:30
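
The fmaximum commits above (58cc1675ec and 54da543a14, plus their reverts d74831efeb and 42605b8aa3) turn on two details: how signed zeros are ordered, and reading a sign bit through an integer view of a floating-point value. The Python sketch below is a hypothetical scalar reference model, not LLVM's SelectionDAG expansion. It assumes IEEE 754-2019 `maximum` semantics (a NaN input propagates, and -0.0 orders below +0.0), and `sign_bit_set` and `fmaximum_ref` are illustrative names. It checks a single sign bit only on the path where the operands already compare equal. For non-NaN values that means the operands are identical or form a +0.0/-0.0 pair, which is the kind of invariant the reverted one-comparison change relied on.

```python
import math
import struct


def sign_bit_set(x: float) -> bool:
    """Read the sign bit through a 64-bit integer view of the double.

    Packing and unpacking bytes stands in for a bit cast through memory;
    it is an illustration, not LLVM's FloatSignAsInt helper.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return (bits >> 63) != 0


def fmaximum_ref(a: float, b: float) -> float:
    """Hypothetical reference model of IEEE 754-2019 maximum on doubles."""
    if math.isnan(a):
        return a  # NaN propagates
    if math.isnan(b):
        return b
    if a > b:
        return a
    if b > a:
        return b
    # Here a == b and neither is NaN: the operands are either identical or
    # a +0.0/-0.0 pair. Only under that invariant does one sign bit decide
    # the result: keep a unless it is negative (i.e. -0.0), else take b.
    return b if sign_bit_set(a) else a
```

For example, `fmaximum_ref(-0.0, 0.0)` and `fmaximum_ref(0.0, -0.0)` both return `0.0` with the sign bit clear, while `fmaximum_ref(-0.0, -0.0)` returns `-0.0`.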

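Commits ad39049ec4 and 672c51c9cb concern rotate idioms whose two halves are combined with `add` instead of `or`. The hypothetical helpers below model a 32-bit rotate-left on Python integers to show why the two forms agree. After truncation to 32 bits, `x << r` has its low `r` bits clear, and `x >> (32 - r)` fits entirely in those `r` bits. The halves therefore share no set bits, and the addition can never carry. The shift amount is restricted to 1..31 because shifting a 32-bit value by 32 is undefined behavior in C and yields poison in LLVM IR.

```python
MASK32 = 0xFFFF_FFFF


def rotl32_or(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r (1 <= r <= 31), halves joined with 'or'."""
    assert 0 <= x <= MASK32 and 1 <= r <= 31
    return ((x << r) | (x >> (32 - r))) & MASK32


def rotl32_add(x: int, r: int) -> int:
    """The same rotate with the halves joined by 'add'.

    (x << r) & MASK32 has its low r bits clear and x >> (32 - r) < 2**r,
    so the sum never exceeds 32 bits and no final mask is needed.
    """
    assert 0 <= x <= MASK32 and 1 <= r <= 31
    return ((x << r) & MASK32) + (x >> (32 - r))


# Compare the two forms over every valid shift amount for a few sample values.
for value in (0, 1, 0x80000001, 0x12345678, MASK32):
    for amount in range(1, 32):
        assert rotl32_add(value, amount) == rotl32_or(value, amount)
```

For instance, `rotl32_add(0x12345678, 8)` evaluates `0x34567800 + 0x12`, which is `0x34567812`, the same result as the `or` form.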

5067 Commits