Cause:
1. An `implicit_def` inside a bundle does not count as a definition of the
register for the machine instruction verifier.
2. As a result, including the `implicit_def` leaves the corresponding register
undefined, which triggers `Bad machine code: Using an undefined physical
register` in the machine instruction verifier.
Fixes https://github.com/llvm/llvm-project/issues/139102
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it became clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.
However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
the mere fact that it is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.
* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.
I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.
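To make the select/phi case from the list above concrete, here is a minimal IR
sketch (names and sizes are illustrative): once the marker's operand is a
select of two allocas, stack coloring cannot tell which object's lifetime
actually starts or ends.

```llvm
define void @select_lifetime(i1 %cond) {
  %a = alloca [32 x i8]
  %b = alloca [32 x i8]
  ; The marker no longer names a single alloca, so it is ambiguous whether
  ; %a or %b becomes live here.
  %p = select i1 %cond, ptr %a, ptr %b
  call void @llvm.lifetime.start.p0(i64 32, ptr %p)
  call void @llvm.lifetime.end.p0(i64 32, ptr %p)
  ret void
}

declare void @llvm.lifetime.start.p0(i64, ptr)
declare void @llvm.lifetime.end.p0(i64, ptr)
```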
This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-allocas dead. I plan to clean up that kind of
code after dropping the size argument as well.)
In practice, I've only found a few places that currently produce
lifetimes on non-allocas:
* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similarly for AddressSanitizer: it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
Currently we always produce a cost of 1 for all CostKinds that are not
RecipThroughput, which can underestimate the cost if the type has a
higher legalization cost (like larger vectors). This relaxes it to cover
all cost kinds.
A hardware loop instruction combines a subtract, compare with zero, and
branch. We currently account for the compare and branch being combined
into one in Cost::RateFormula, as part of more general handling for
compare-branch-zero, but don't account for the subtract, leading to
suboptimal decisions in some cases.
Fix this in Cost::RateRegister by noticing when we have such a subtract
and discounting the AddRecCost in such a case.
The insertion point of the COPY isn't always optimal and could eventually
lead to a worse block layout; see the regression test in the first
commit.
This change affects many architectures, but the total number of
instructions in the test cases appears to be slightly lower.
bics is available on ARM.
USAT regressions are to be fixed after this, because that is an issue
in ARMISelLowering and should be a separate PR.
Note that opt optimizes those test cases to min/max intrinsics anyway, so
this should have no real effect on codegen.
Proof: https://alive2.llvm.org/ce/z/kPVQ3_
This patch reassociates `add(add(vecreduce(a), b), add(vecreduce(c),
d))` into `add(vecreduce(add(a, c)), add(b, d))`, to combine the
reductions into a single node. This comes up after unrolling vectorized
loops.
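An IR-level sketch of the transform (the combine itself runs on the
SelectionDAG; the two functions below are intended to be equivalent for
integer adds):

```llvm
define i32 @before(<4 x i32> %a, i32 %b, <4 x i32> %c, i32 %d) {
  ; add(add(vecreduce(a), b), add(vecreduce(c), d)): two reduction nodes.
  %r1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)
  %s1 = add i32 %r1, %b
  %r2 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %c)
  %s2 = add i32 %r2, %d
  %sum = add i32 %s1, %s2
  ret i32 %sum
}

define i32 @after(<4 x i32> %a, i32 %b, <4 x i32> %c, i32 %d) {
  ; add(vecreduce(add(a, c)), add(b, d)): a single reduction node.
  %v = add <4 x i32> %a, %c
  %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)
  %bd = add i32 %b, %d
  %sum = add i32 %r, %bd
  ret i32 %sum
}

declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)
```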
There is another small change to move the reassociateReduction call in the
fadd combine out of the AllowNewConst block, as new constants will not be
created and it should be OK to perform the combine later, after legalization.
This saves 2 instructions in the ARM soft float case for fcmp ueq.
This code is written in a confusing, overly general way. The point
of getCmpLibcallCC is to express that the compiler-rt implementations
of the FP compares are different aliases around functions which may
return -1 in some cases. This does not apply to the call for unordered,
which returns a normal boolean.
Also stop overriding the default value for the unordered compare for ARM.
This was setting it to the same value as the default, which is now assumed.
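For illustration of the distinction described above, the two kinds of helpers
look roughly like this; the names follow the usual libgcc/compiler-rt
soft-float convention and are given only as an example, not as the exact calls
ARM emits:

```llvm
; Ordered compares return a tri-state value that the caller tests against zero
; with a particular condition code (which is what getCmpLibcallCC describes).
declare i32 @__eqsf2(float, float)     ; == 0 iff neither operand is NaN and a == b
declare i32 @__ltsf2(float, float)     ; <  0 iff neither operand is NaN and a <  b
; The unordered helper is a plain boolean, so no special mapping is needed.
declare i32 @__unordsf2(float, float)  ; != 0 iff either operand is NaN
```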
Now that #141786 handles scalar and neon types, this adds MVE
definitions and legalization for llvm.roundeven intrinsics. The existing
llvm.arm.mve.vrintn intrinsics are auto-upgraded to llvm.roundeven, like the
other vrint intrinsics, so they should continue to work.
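For example, a plain roundeven call on an MVE vector type like the one below
should now select to VRINTN rather than being expanded (sketch; the function
name is illustrative):

```llvm
define <4 x float> @roundeven_v4f32(<4 x float> %x) {
  %r = call <4 x float> @llvm.roundeven.v4f32(<4 x float> %x)
  ret <4 x float> %r
}

declare <4 x float> @llvm.roundeven.v4f32(<4 x float>)
```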
When floating-point operations are legalized to operations of a higher
precision (e.g. f16 fadd being legalized to f32 fadd), we end up with a
narrowing and then a widening conversion between consecutive operations. With
the appropriate fast math flags (nnan ninf contract) we can eliminate these
casts.
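A rough IR-level picture of the legalized form and the fold (the actual
combine operates on fp_round/fp_extend nodes in the DAG; this is just a
sketch):

```llvm
; (%a + %b) + %c with f16 fadd promoted to f32:
define half @two_adds(half %a, half %b, half %c) {
  %a.f32 = fpext half %a to float
  %b.f32 = fpext half %b to float
  %add1  = fadd nnan ninf contract float %a.f32, %b.f32
  %t     = fptrunc float %add1 to half   ; narrow ...
  %t.f32 = fpext half %t to float        ; ... then widen again
  %c.f32 = fpext half %c to float
  %add2  = fadd nnan ninf contract float %t.f32, %c.f32
  %r     = fptrunc float %add2 to half
  ret half %r
}
; With nnan ninf contract, the %t / %t.f32 pair can be removed and %add1 fed
; directly into %add2, keeping the intermediate result in f32.
```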
This is a reland of #99752 with the bug fixed (see test diff in the
third commit in this PR).
All `popcount` libcalls return `int`, but `ISD::CTPOP` returns the type
of the argument, which can be wider than `int`. The fix is to make the DAG
legalizer pass the correct return type to `makeLibCall` and sign-extend
the result afterwards.
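In IR terms the corrected lowering amounts to something like the following
(using the usual 64-bit popcount helper as an example):

```llvm
declare i32 @__popcountdi2(i64)   ; all popcount libcalls return 'int'

define i64 @popcount64(i64 %x) {
  ; The call is typed with the i32 return of the libcall, and the result is
  ; then sign-extended back to the original i64 CTPOP type.
  %narrow = call i32 @__popcountdi2(i64 %x)
  %wide   = sext i32 %narrow to i64
  ret i64 %wide
}
```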
Original commit message:
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
Pull Request: https://github.com/llvm/llvm-project/pull/101786
This adds a call to SimplifyDemandedBits from bitcasts with scalar input
types in SimplifyDemandedVectorElts, which can help simplify the input
scalar.
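An IR analogue of the situation (the change itself is in the DAG combiner):
only one element of the bitcast vector is used, so only part of the scalar is
demanded and the scalar computation can be simplified.

```llvm
define i32 @low_half_only(i64 %x) {
  %y = or i64 %x, -4294967296     ; 0xFFFFFFFF00000000: only touches the high 32 bits
  %v = bitcast i64 %y to <2 x i32>
  ; Only element 0 (the low 32 bits on a little-endian target) is demanded,
  ; so simplifying the demanded bits of the scalar input lets the 'or' be dropped.
  %e = extractelement <2 x i32> %v, i32 0
  ret i32 %e
}
```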
The existing pretty printer generates excessive parentheses for
MCBinaryExpr expressions. This update removes unnecessary parentheses
around MCBinaryExpr uses of the +/- operators and around MCUnaryExpr.
Since relocatable expressions only use + and -, this change improves
readability in most cases.
Examples:
- (SymA - SymB) + C now prints as SymA - SymB + C. This updates the output
  of -fexperimental-relative-c++-abi-vtables for AArch64 and x86 to
  `.long _ZN1B3fooEv@PLT-_ZTV1B-8`.
- expr + (MCTargetExpr) now prints as expr + MCTargetExpr, with this
change primarily affecting AMDGPUMCExpr.
Merged two loops that were iterating over the same machine basic block
into one, and made some minor readability improvements (variable
renaming, commenting, and absorbing an if condition into a variable).
This is a quick fix for EXPENSIVE_CHECKS bot failures. I still think we
could defer looking for a compatible subregister further up the use-def
chain, and should be able to check compatibility with the source that is
ultimately found.
AAPCS32 defines the fp16 and bf16 types as being passed as if they were
extended to 32 bits, with the high 16 bits being unspecified. The
extension is specified as happening as-if it was done in a register,
which means that for big endian targets, the actual value gets passed in
the higher-addressed half of the stack slot, instead of the lower-addressed
half as for little endian. Previously, for targets with the
fp16 extension, we were passing these types as a 16-bit stack slot,
which worked for little endian because every later stack slot would be
4-byte aligned, leaving a 2-byte gap, but was incorrect for big endian.
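A sketch of the situation (the argument list is chosen only to force the half
argument onto the stack under the hard-float ABI; names are illustrative):

```llvm
; %h ends up in a 4-byte stack slot, "as if extended to 32 bits in a register".
; On little endian the meaningful 16 bits are at offset 0 of the slot; on big
; endian they are in the higher-addressed half, at offset 2.
define void @caller(half %h) {
  call void @callee(double 0.0, double 0.0, double 0.0, double 0.0,
                    double 0.0, double 0.0, double 0.0, double 0.0,
                    i32 0, i32 0, i32 0, i32 0, half %h)
  ret void
}

declare void @callee(double, double, double, double, double, double, double, double,
                     i32, i32, i32, i32, half)
```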
This fixes the handling of subregister extract copies. This
will allow AMDGPU to remove its implementation of
shouldRewriteCopySrc, which exists as a 10-year-old workaround
for this bug. peephole-opt-fold-reg-sequence-subreg.mir will
show the expected improvement once the custom implementation
is removed.
The copy coalescing processing here is overly abstracted
from what's actually happening. Previously, when visiting
coalescable copy-like instructions, we would parse the
sources one at a time and then pass the def of the root
instruction into findNextSource. This meant that the
first thing the newly constructed ValueTracker would do
is call getVRegDef to find the instruction we are currently
processing. This added an unnecessary step, placed
a useless entry in the RewriteMap, and required skipping
the no-op case where getNewSource would return the original
source operand. This was a problem since, in the case
of a subregister extract, shouldRewriteCopySource would always
say that it is useful to rewrite and the use-def chain walk
would abort, returning the original operand. Instead, move the
processing to start at the source operand to begin with.
This does not fix the confused handling in the uncoalescable
copy case which is proving to be more difficult. Some currently
handled cases have multiple defs from a single source, and other
handled cases have 0 input operands. It would be simpler if
this were implemented with isCopyLikeInstr, rather than guessing
at the operand structure as it does now.
There are some improvements and some regressions. The
regressions appear to be downstream issues for the most part. One
of the uglier regressions is in PPC, where a sequence of insert_subregs
is used to build registers. I opened #125502 to use reg_sequence instead,
which may help.
The worst regression is an absurd SPARC testcase using a <251 x fp128>,
which uses a very long chain of insert_subregs.
We need improved subregister handling locally in PeepholeOptimizer,
and other passes like MachineCSE to fix some of the other regressions.
We should handle subregister composes and folding more indexes
into insert_subreg and reg_sequence.
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logic to
adjust the limit by removing reserved registers.
It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit`
directly, as the comment "This limit must be adjusted dynamically for
reserved registers" says.
Separate from https://github.com/llvm/llvm-project/pull/118787
This helps with +fp64 targets, where f64 is legal and was not previously
lowered. fpextends can be treated as a shift + cvt, and fptrunc can use a libcall.
MinNoSplitDisp was first introduced in D16890 to handle cases where the
ConstantIslands pass fails to converge in the presence of big basic
blocks. However, the computation of the variable seems to be wrong as it
currently computes the offset immediately following UserBB. In other
words, it represents the distance from the beginning of the function to
the end of UserBB. The distance from the beginning of the function does
not seem to be a good indicator of how big the basic block is unless the
basic block is close to the beginning of the function. I think
MinNoSplitDisp should compute the distance between UserOffset and the
end of UserBB instead.
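For example (made-up offsets): if UserBB occupies [0x1000, 0x3000) and
UserOffset is 0x2800, the current code effectively uses 0x3000, the offset of
the end of UserBB from the start of the function, whereas the distance from
UserOffset to the end of UserBB is only 0x800.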
Re-landing #116970 after fixing a miscompilation error.
The original change made it possible for CMPZ to have multiple uses;
`ARMDAGToDAGISel::SelectCMPZ` was not prepared for this.
Pull Request: https://github.com/llvm/llvm-project/pull/118887
Original commit message:
Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.
Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.
There were two bugs in the implementation of the MVE vsbciq (subtract
with carry across vector, with initial carry value) intrinsics:
* The VSBCI instruction behaves as if the carry-in is always set, but we
were selecting it when the carry-in is clear.
* The vsbciq intrinsics should generate IR with the carry-in set, but
they were leaving it clear.
These two bugs almost cancelled each other out, but resulted in
incorrect code when the vsbcq intrinsics (with a carry-in) were used,
and the carry-in was a compile time constant.
Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.
Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.
Pull Request: https://github.com/llvm/llvm-project/pull/116970
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.
This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.
Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.
This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for the CCR register class: it
makes the DAG scheduler and InstrEmitter try to avoid copies of the
`FPSCR_NZCV` register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.
`TRI::getCrossCopyRegClass` is modified in a way that prevents the DAG
scheduler from copying FPSCR into a virtual register. The register
allocator might need to spill the virtual register, but that only seems
to work in Thumb mode.
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.
This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.
Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.
This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for the CCR register class: it
makes the DAG scheduler and InstrEmitter try to avoid copies of the
`FPSCR_NZCV` register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.
There might be cases where a copy can't be avoided (although none were found
in existing tests). If a copy is necessary, the virtual register will be
created with `cl_FPSCR_NZCV` register class. If this register class is
inappropriate, `TRI::getCrossCopyRegClass` should be modified to return
the correct class.
Pull Request: https://github.com/llvm/llvm-project/pull/116676
For most MVE predicated FMA instructions, disabled lanes will contain
the value in the addend operand. However, the VFMAS instruction takes
the addend in a GPR, and the output register is shared with the first
multiply operand, so disabled lanes will get that value instead. This
means that we can't use the same intrinsic as for the other VFMA
instructions. Instead, we can codegen the vfmas intrinsic in clang to a
regular FMA plus select, which the backend already has patterns to
select VFMAS from.
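A hedged sketch of the kind of IR this implies for the predicated form
(intrinsic use and names are illustrative, not the exact builtin lowering):
the important part is that inactive lanes take the first multiply operand %a
rather than the addend.

```llvm
define <4 x float> @vfmas_m_sketch(<4 x float> %a, <4 x float> %b, float %c, <4 x i1> %p) {
  ; Splat the GPR addend, form an ordinary FMA, then select against %a so that
  ; disabled lanes keep the value of the first multiply operand, as VFMAS does.
  %c.ins   = insertelement <4 x float> poison, float %c, i32 0
  %c.splat = shufflevector <4 x float> %c.ins, <4 x float> poison, <4 x i32> zeroinitializer
  %fma     = call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c.splat)
  %res     = select <4 x i1> %p, <4 x float> %fma, <4 x float> %a
  ret <4 x float> %res
}

declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
```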
The MVE VADC and VSBC instructions read and write a carry bit in FPSCR,
which is exposed through the intrinsics. This makes it possible to write
code which has the FPSCR live across a function call, or which uses the
same value twice, so it needs to be possible to spill and reload it.
There is a missed optimisation in one of the test cases, where we reload
the FPSCR from the stack despite it still being live; I've not found a
simple way to prevent the register allocator from doing this.