llvm-project

Author	SHA1	Message	Date
Petr Hosek	4b19db6db9	Revert "AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function" (#133935 ) Reverts llvm/llvm-project#132684	2025-04-01 09:39:07 -07:00
Fangrui Song	dd862356e2	AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function https://reviews.llvm.org/D17938 introduced lowerRelativeReference to give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an `unnamed_addr` function, create a PLT-generating relocation. This was intended for C++ relative vtables, but C++ relative vtable ended up using DSOLocalEquivalent (lowerDSOLocalEquivalent). This special treatment of `unnamed_addr` seems unusual. Let's remove it. Only COFF needs an overload to generate a @IMGREL32 relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll). Pull Request: https://github.com/llvm/llvm-project/pull/132684	2025-03-31 20:44:29 -07:00
Fangrui Song	04a67528d3	[MC] Simplify MCBinaryExpr/MCUnaryExpr printing by reducing parentheses (#133674 ) The existing pretty printer generates excessive parentheses for MCBinaryExpr expressions. This update removes unnecessary parentheses of MCBinaryExpr with +/- operators and MCUnaryExpr. Since relocatable expressions only use + and -, this change improves readability in most cases. Examples: - (SymA - SymB) + C now prints as SymA - SymB + C. This updates the output of -fexperimental-relative-c++-abi-vtables for AArch64 and x86 to `.long _ZN1B3fooEv@PLT-_ZTV1B-8` - expr + (MCTargetExpr) now prints as expr + MCTargetExpr, with this change primarily affecting AMDGPUMCExpr.	2025-03-30 22:03:14 -07:00
Alex MacLean	672c51c9cb	[SDAG][tests] add some test cases covering an add-based rotate (#132842 ) Add tests to various targets covering rotate idioms where an 'ADD' node is used to combine the halves instead of an 'OR'. Some of these cases will be better optimized following #125612, while others are already well optimized or do not have a valid fold to a rotate or funnel-shift.	2025-03-26 09:47:28 -07:00
Akshat Oke	174110bf3c	[CodeGen][NPM] Port LiveDebugValues to NPM (#131563 )	2025-03-24 11:34:45 +05:30
Jeremy Morse	792a6f8119	[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298 ) These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).	2025-03-14 15:50:49 +00:00
Frederik Harwath	6962cf1700	Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128 ) This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it.	2025-03-14 13:11:45 +01:00
Benson Chu	3b3356043c	Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute" This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.	2025-03-10 10:11:23 -05:00
Benson Chu	1f05703176	[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been saved. - GPRCS1 - GPRCS2 - FPStatusRegs (new) - DPRCS - GPRCS3 - DPRCS2 FPSCR is present on all targets with a VFP, but the FPEXC register is not present on Cortex-M devices, so different amounts of bytes are being pushed onto the stack depending on our target, which would affect alignment for subsequent saves. DPRCS1 will sum up all previous bytes that were saved, and will emit extra instructions to ensure that its alignment is correct. My assumption is that if DPRCS1 is able to correct its alignment to be correct, then all subsequent saves will also have correct alignment. Avoid annotating the saving of FPSCR and FPEXC for functions marked with the interrupt_save_fp attribute, even though this is done as part of frame setup. Since these are status registers, there really is no viable way of annotating this. Since these aren't GPRs or DPRs, they can't be used with .save or .vsave directives. Instead, just record that the intermediate registers r4 and r5 are saved to the stack again. Co-authored-by: Jake Vossen <jake@vossen.dev> Co-authored-by: Alan Phipps <a-phipps@ti.com>	2025-03-10 10:05:15 -05:00
John Brawn	d19218e507	[SelectionDAG] Preserve fast math flags when legalizing/promoting (#130124 ) When we have a floating-point operation that a target doesn't support for a given type, but does support for a wider type, then there are two ways this can be handled: * If the target doesn't have any registers at all of this type then LegalizeTypes will convert the operation. * If we do have registers but no operation for this type, then the operation action will be Promote and it's handled in PromoteNode. In both cases the operation at the wider type, and the conversion operations to and from that type, should have the same fast math flags as the original operation. This is being done in preparation for a DAGCombine patch which makes use of these fast math flags.	2025-03-07 14:46:32 +00:00
Daniel Paoliello	16e051f0b9	[win] NFC: Rename `EHCatchret` to `EHCont` to allow for EH Continuation targets that aren't `catchret` instructions (#129953 ) This change splits out the renaming and comment updates from #129612 as a non-functional change.	2025-03-06 09:28:44 -08:00
Matt Arsenault	b21663cb5b	SplitKit: Take register class directly from instruction definition (#129727) This fixes an expensive check failure after 8476a5d480304. The issue was essentially that getRegClassConstraintEffectForVReg was not doing anything useful, sometimes. If the register passed to it is not present in the instruction, it is a no-op and returns the original class. The Edit->getReg() register may not be the register as it appears in either the use or def instruction. It may be some split register, so take the register directly from the instruction being rematerialized. Also directly query the constraint from the def instruction, with a hardcoded operand index. This isn't ideal, but all the other rematerialize code makes the same assumption. So far I've been unable to reproduce this with a standalone MIR test. In the original case, stop-before=greedy and running the one pass is not working.	2025-03-06 20:06:35 +07:00
Benjamin Maxwell	0228b778a4	[SDAG] Add missing SoftenFloatRes legalization for FMODF (#129264 ) This is needed on some ARM platforms.	2025-03-05 13:45:48 +00:00
Lucas Ramirez	03677f63a7	[MachineScheduler] Optional scheduling of single-MI regions (#129704 ) Following 15e295d the machine scheduler no longer filters-out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see #128739). Since all targets but AMDGPU do not care for this behavior, this introduces an off-by-default flag to `ScheduleDAGInstrs` to control whether such regions are going to be scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag).	2025-03-04 17:46:44 +01:00
Akshat Oke	77f44a9642	[CodeGen][NewPM] Port MachineSink to NPM (#115434 ) Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)	2025-03-03 15:49:37 +05:30
Lucas Ramirez	15e295d30a	[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions (#128739 ) The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning. This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).	2025-02-27 11:27:07 +01:00
Eli Friedman	1b39328d74	[CodeGen] Fix MachineInstr::isSafeToMove handling of inline asm. (#126807 ) Even if an inline asm doesn't have memory effects, we can't assume it's safe to speculate: it could trap, or cause undefined behavior. At the LLVM IR level, this is handled correctly: we don't speculate inline asm (unless it's marked "speculatable", but I don't think anyone does that). Codegen also needs to respect this restriction. This change stops Early If Conversion and similar passes from speculating an INLINEASM MachineInstr. Some uses of isSafeToMove probably could be switched to a different API: isSafeToMove assumes you're hoisting, but we could handle some forms of sinking more aggressively. But I'll leave that for a followup, if it turns out to be relevant. See also discussion on gcc bugtracker https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102150 .	2025-02-25 15:29:12 -08:00
Vikash Gupta	352c48f278	[SelectionDAG] Utilizing target hook convertSelectOfConstantsToMath for SelectwithConstant (#127599) The target hook convertSelectOfConstantsToMath() needs to be consulted in the SimplifySelectCC helper combine in SelectionDAG ISel, which folds a select of constants into simple math ops that use the condition as is. Fixes #121145.	2025-02-25 20:32:24 +05:30
Simon Pilgrim	7de64925da	[DAG] shouldReduceLoadWidth - hasOneUse should check just the loaded value - not the chain (#128167 ) The hasOneUse check was failing in any case where the load was part of a chain - we should only be checking if the loaded value has one use, and any updates to the chain should be handled by the fold calling shouldReduceLoadWidth. I've updated the x86 implementation to match, although it has no effect here yet (I'm still looking at how to improve the x86 implementation) as the inner for loop was discarding chain uses anyway. By using SDValue::hasOneUse instead this patch exposes a missing dependency on the LLVMSelectionDAG library in a lot of tools + unittests, which resulted in having to make SDNode::hasNUsesOfValue inline. Noticed while fighting the x86 regressions in #122671	2025-02-24 11:09:41 +00:00
Nikita Popov	d8b2e432d6	[IR] Remove mul constant expression (#127046 ) Remove support for the mul constant expression, which has previously already been marked as undesirable. This removes the APIs to create mul expressions and updates tests to stop using mul expressions. Part of: https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179	2025-02-14 09:28:57 +01:00
Akshat Oke	7b60e03d73	Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )" (#126684 ) `RegisterClassInfo` was supposed to be kept alive between pass runs, which wasn't being done leading to recomputations increasing the compile time. Now the Impl class is a member of the legacy and new passes so that it is not reconstructed on every pass run. --------- Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2025-02-12 18:54:39 +05:30
Cullen Rhodes	317a644ae6	[SDAG] Precommit tests for #126207 (NFC) (#126208 ) Add missing test coverage for codepaths touched by #126207.	2025-02-10 09:13:02 +00:00
Akshat Oke	564b9b7f4d	Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )" (#126268 ) This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I investigate what's causing the compile-time regression.	2025-02-08 15:36:48 +05:30
Matt Arsenault	c268a3f093	DAG: Fix extract of load combine with mismatched vector element type Fix the case where the vector element type of the loaded extractelement input does not match the result type of the extract. This fixes a regression reported after c55a7659b38946350315ac4a18d9805deb1f0a54	2025-02-06 22:56:56 +07:00
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10 year old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySource would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other passes like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Christudasan Devadasan	44f638f88e	[CodeGen][NewPM] Port PostRAScheduler to NPM. (#125798)	2025-02-05 12:45:59 +05:30
Christudasan Devadasan	5aa4979c47	[CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)	2025-02-05 12:17:59 +05:30
Akshat Oke	4313345f2e	[CodeGen][NewPM] Port MachineCopyPropagation to NPM (#125202 )	2025-02-04 15:45:03 +05:30
Matt Arsenault	3a2b552e44	TwoAddressInstruction: Fix assert on undef operand with -early-live-intervals (#125518 )	2025-02-03 23:48:28 +07:00
Sergei Barannikov	ff9c041d96	[MachineScheduler] Fix physreg dependencies of ExitSU (#123541 ) Providing the correct operand index allows addPhysRegDataDeps to compute the correct latency. Pull Request: https://github.com/llvm/llvm-project/pull/123541	2025-02-01 20:40:50 +03:00
TiborGY	3630d9ef65	[PartiallyInlineLibCalls] Add infrastructure for emitting optimization remarks from PartiallyInlineLibCalls (#122654 ) I am planning to add some optimization remarks to the `PartiallyInlineLibCalls` pass. However, since this pass does not emit any optimization remarks yet, I have to add the "infrastructure" for that first, which is what this PR is about.	2025-01-22 13:15:40 +07:00
Pedro Lobo	c23f2417dc	[CodeGenPrepare] Replace `undef` use with `poison` [NFC] (#123111 ) When generating a constant vector, if `UseSplat` is false, the indices different from the index of the extract can be filled with `poison` instead of `undef`.	2025-01-16 08:17:55 +00:00
Florian Hahn	0b3912622e	[ARM] Update LV test in test/Codegen/ARM after 1de3dc7d23.	2025-01-14 22:41:31 +00:00
David Green	ab9a80a3ad	[DAG] Allow AssertZExt to scalarize. (#122463 ) With range and undef metadata on a call we can have vector AssertZExt generated on a target with no vector operations. The AssertZExt needs to scalarize to a normal `AssertZext tin, ValueType`. I have added AssertSext too, although I do not have a test case. Fixes #110374	2025-01-11 16:29:06 +00:00
Antonio Frighetto	446a426436	[ARM] Record store with pre/post-indexed addressing as `mayStore` A miscompilation issue observed during machine sinking has been addressed with improved handling. Fixes: https://github.com/llvm/llvm-project/issues/121299.	2025-01-07 09:39:05 +01:00
Antonio Frighetto	7810e6a3a8	[ARM] Introduce test for PR121565 (NFC)	2025-01-07 09:39:05 +01:00
Björn Pettersson	3ad2399148	[DAGCombiner] Refactor and improve ReduceLoadOpStoreWidth (#119564) This patch makes a couple of improvements to ReduceLoadOpStoreWidth. When determining the minimum size of "NewBW" we now take byte boundaries into account. If we for example touch bits 6-10 we shouldn't accept NewBW=8, because we would fail later when detecting that we can't access bits from two different bytes in memory using a single load. Instead we make sure to align LSB/MSB according to byte size boundaries up front before searching for a viable "NewBW". In the past we only tried to find a "ShAmt" that was a multiple of "NewBW", but now we use a sliding window technique to scan for a viable "ShAmt" that is a multiple of the byte size. This can help find more opportunities for optimization (especially if the original type isn't byte sized, and for big-endian targets when the original load/store is aligned on the most significant bit).	2024-12-16 12:15:11 +01:00
Pengcheng Wang	da71203e6f	[MISched] Unify the way to specify scheduling direction (#119518 ) For pre-ra scheduling, we use two options `-misched-topdown` and `-misched-bottomup` to force the direction. While for post-ra scheduling, we use `-misched-postra-direction` with enumerated values (`topdown`, `bottomup` and `bidirectional`). This is not unified and adds some mental burdens. Here we replace these two options `-misched-topdown` and `-misched-bottomup` with `-misched-prera-direction` with the same enumerated values. To avoid the condition of `getNumOccurrences() > 0`, we add a new enum value `Unspecified` and make it the default initial value. These options are hidden, so we needn't keep the compatibility.	2024-12-12 11:24:07 +08:00
Bjorn Pettersson	22780f808a	[DAGCombiner] Fix to avoid writing outside original store in ReduceLoadOpStoreWidth (#119203) DAGCombiner::ReduceLoadOpStoreWidth could replace memory accesses with narrower loads/stores, although sometimes the new load/store would touch memory outside the original object. That seemed wrong, and this patch simply avoids doing the DAG combine in such situations. Also simplify the expression used to align ShAmt down to a multiple of NewBW. Subtracting (ShAmt % NewBW) should do the same thing as the old more complicated expression. The intention is to follow up with a patch that makes more attempts, trying to align the memory accesses at other offsets, allowing the transform to trigger in more situations. The current strategy for deciding size (NewBW) and offset (ShAmt) for the narrowed operations is a bit ad hoc, and does not really consider big-endian memory order in the same way as little-endian.	2024-12-11 15:07:16 +01:00
Bjorn Pettersson	bc1f3eb593	[DAGCombiner] Pre-commit test case for ReduceLoadOpStoreWidth. NFC Add test cases related to narrowing of load-op-store sequences. ReduceLoadOpStoreWidth isn't careful enough, so it may end up creating load/store operations that access memory outside the region touched by the original load/store. ARM is used as the target for the test cases to show what happens for both little-endian and big-endian. This patch also adds a way to override the TLI.isNarrowingProfitable check in DAGCombiner::ReduceLoadOpStoreWidth by using the option -combiner-reduce-load-op-store-width-force-narrowing-profitable. The idea is that it should be simpler to, for example, add lit tests verifying that the code is correct for big-endian (which is otherwise difficult since no in-tree big-endian targets override TLI.isNarrowingProfitable). This is a pre-commit for https://github.com/llvm/llvm-project/pull/119203	2024-12-11 15:07:15 +01:00
Sergei Barannikov	e0ed0333f0	Reland "[ARM] Stop gluing ALU nodes to branches / selects" (#118887 ) Re-landing #116970 after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Pull Request: https://github.com/llvm/llvm-project/pull/118887 Original commit message: Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.	2024-12-07 10:14:36 +03:00
Oliver Stannard	2d8e8dd2b8	[ARM] Add Cortex-A510 CPU for AArch32 (#118811 ) This core was originally AArch64-only, but the r1p0 revision added optional support for AArch32 at EL0. TRM: https://developer.arm.com/documentation/101604/0103	2024-12-06 08:51:22 +00:00
Oliver Stannard	99b862efba	[DAGISel][ARM] Fix vector truncate combine for big-endian (#118101 ) This DAG combine was incorrect for big-endian targets, because it assumes that when a bitcast changes the lane width, the least-significant bits of the wider lanes are in the lower-numbered lanes of the smaller type, which is only true for little-endian.	2024-12-04 14:32:15 +00:00
Simon Pilgrim	e6eac65ad6	[ARM] 2012-03-13-DAGCombineBug.ll - regenerate checks	2024-12-02 11:46:49 +00:00
Martin Storsjö	2a5e1da57a	Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232 ) Reverts llvm/llvm-project#116970. This change broke Wine compiled for armv7, causing segfaults when starting Wine. See llvm/llvm-project#116970 for more detailed discussion about the issue.	2024-12-02 00:02:25 +02:00
Sergei Barannikov	a348f223ca	[ARM] Stop gluing ALU nodes to branches / selects (#116970 ) Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change. Pull Request: https://github.com/llvm/llvm-project/pull/116970	2024-11-30 08:14:24 +03:00
Sergei Barannikov	61a23646c9	[SjLjEHPrepare] Configure call sites correctly (#117656) After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site` before all invoke instructions except the one in the entry block, which has the effect of bypassing landing pads on exceptions. When configuring the call site for a potentially throwing instruction, check that it is not an `InvokeInst` -- those are handled by earlier code.	2024-11-27 08:03:47 +03:00
Sergei Barannikov	ad9dcd96dc	Reland "[ARM] Stop gluing FP comparisons to FMSTAT" (#117248) Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. `TRI::getCrossCopyRegClass` is modified in a way that prevents DAG scheduler from copying FPSCR into a virtual register. The register allocator might need to spill the virtual register, but that only seems to work in Thumb mode.	2024-11-22 22:29:58 +03:00
Sergei Barannikov	5d32a1409d	Revert "[ARM] Stop gluing FP comparisons to FMSTAT" (#117175 ) Reverts llvm/llvm-project#116676 Reverting per post-commit feedback (causes miscompilation errors and/or assertion failures).	2024-11-21 18:26:53 +03:00
Sergei Barannikov	8c56dd3040	[ARM] Stop gluing FP comparisons to FMSTAT (#116676) Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. There might be a case when a copy can't be avoided (although not found in existing tests). If a copy is necessary, the virtual register will be created with `cl_FPSCR_NZCV` register class. If this register class is inappropriate, `TRI::getCrossCopyRegClass` should be modified to return the correct class. Pull Request: https://github.com/llvm/llvm-project/pull/116676	2024-11-20 16:07:05 +03:00

5043 Commits