llvm-project

Author	SHA1	Message	Date
zhijian lin	41647412c6	[PowerPC] Fix an LowerADDSUBO_CARRY error when converting carry bit for usubo_carry (#137809 ) In PowerPC, if a borrow occurs during a subtraction, the carry bit is zero (unset). The carry bit is set if no borrow occurs. For ISD::USUBO_CARRY, the nodes produce two results: the normal result of the addition or subtraction, and a boolean value that is 1 if and only if there is an outgoing carry or borrow. Therefore, we need to convert a 1 (which indicates a borrow in ISD::USUBO_CARRY) to 0 to match PowerPC's definition of borrow. Similarly, we need to convert a 0 (no borrow in ISD::USUBO_CARRY) to 1 for PowerPC. To perform this conversion, we use XOR 1 instead of XOR DAG.getAllOnesConstant(DL, CarryOp.getValueType()).	2025-04-30 10:39:09 -04:00
Vikram Hegde	53a8b89003	[CodeGen][NewPM] Port "ShrinkWrap" pass to NPM (#129880 )	2025-04-30 13:11:17 +05:30
Maryam Moghadas	82a1d5078d	[PowerPC] Add dense math half-precision floating-point outer-product accumulate to DMR instructions (#133272 ) This patch adds the following Dense Math Facility 16-bit half-precision floating-point calculation instructions: dmxvf16gerx2, dmxvf16gerx2pp, dmxvf16gerx2pn, dmxvf16gerx2np, dmxvf16gerx2nn, pmdmxvf16gerx2, pmdmxvf16gerx2pp, pmdmxvf16gerx2pn, pmdmxvf16gerx2np, pmdmxvf16gerx2nn, along with their corresponding intrinsics and tests.	2025-04-28 16:03:10 -04:00
RolandF77	a903c7b7f5	[PowerPC] Intrinsics and tests for dmr insert/extract (#135653 ) Add some intrinsics and LIT tests for PPC dmr insert/extract instructions.	2025-04-24 11:27:22 -04:00
zhijian lin	3e605b1e1d	[NFC] Add a pre-commit test case for #111696 (#136730 ) Add a pre-commit test case for patch https://github.com/llvm/llvm-project/pull/111696 The test checks that the ppc-vsx-fma-mutate pass, run with -schedule-ppc-vsx-fma-mutation-early, does not hoist the instruction `xxspltiw vs2, 1170469888` out of the loop. --------- Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>	2025-04-24 10:37:24 -04:00
Sergei Barannikov	5080a0251f	[CodeGenPrepare] Unfold slow ctpop when used in power-of-two test (#102731 ) DAG combiner already does this transformation, but in some cases it does not have a chance because either CodeGenPrepare or SelectionDAGBuilder move icmp to a different basic block. https://alive2.llvm.org/ce/z/ARzh99 Fixes #94829 Pull Request: https://github.com/llvm/llvm-project/pull/102731	2025-04-23 08:54:10 +03:00
zhijian lin	afda4c295b	Reland [SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#136701 ) This patch addresses the signed/zero extension of poison by using a poison value of the extended type instead of a constant zero of the extended type.	2025-04-22 17:36:41 -04:00
Maryam Moghadas	c40d3a411c	[PowerPC] Add dense math bfloat16 floating-point outer-product accumulate to DMR instructions (#133109 ) This patch adds the following Dense Math Facility bfloat16 floating-point calculation instructions: dmxvbf16gerx2, dmxvbf16gerx2pp,dmxvbf16gerx2pn, dmxvbf16gerx2np, dmxvbf16gerx2nn, pmdmxvbf16gerx2, pmdmxvbf16gerx2pp, pmdmxvbf16gerx2pn, pmdmxvbf16gerx2np, pmdmxvbf16gerx2nn, along with their corresponding intrinsics and tests.	2025-04-21 18:39:44 -04:00
Nico Weber	e18a77cfbe	Revert "[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741 )" This reverts commit f12078e72601e7c03e5d66afab034313caf8f791. Breaks `check-llvm`, see comments on https://github.com/llvm/llvm-project/pull/122741	2025-04-21 10:51:03 -04:00
zhijian lin	f12078e726	[SelectionDAG] Folding ZERO-EXTEND/SIGN_EXTEND poison to Poison value in getNode (#122741 ) The PR will fix the issue https://github.com/llvm/llvm-project/issues/122728 This patch addresses the signed/zero extension of poison by using a poison value of the extended type instead of a constant zero of the extended type.	2025-04-21 10:02:21 -04:00
Yingwei Zheng	7e5317139d	[PowerPC] Pre-commit tests for PR130742. NFC. (#135606 ) Needed by https://github.com/llvm/llvm-project/pull/130742.	2025-04-17 17:52:49 +08:00
Matt Arsenault	393c783a10	LICM: Avoid looking at use list of constant data (#134690 ) The codegen test changes seem incidental. Either way, sms-grp-order.ll seems to already not hit the original issue.	2025-04-13 17:06:38 +02:00
Douglas Yung	b03aa291b8	Add 'REQUIRES: asserts' to test undef-args.ll added in #135247 to skip test when asserts are not present. Should fix bot failure: https://lab.llvm.org/buildbot/#/builders/202/builds/601	2025-04-11 02:18:10 +00:00
zhijian lin	5aeeebc1f4	[NFC] add a pre-commit test case for patch 122741 (#135247 ) [NFC] add a pre-commit test case for patch [Eliminating li of 0 into arg registers of unused arguments](https://github.com/llvm/llvm-project/pull/122741) The test case checks that extensions of poison are lowered to undef, and that redundant instructions load 0 into argument registers for unused arguments.	2025-04-10 16:33:40 -04:00
zhijian lin	378ac572ac	Reland "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135056 ) A new ISD::POISON SDNode is introduced to represent the poison value in the IR, replacing the previous use of ISD::UNDEF	2025-04-10 11:29:14 -04:00
Lei Huang	3479c57466	PowerPC32:PIC: Update to bcl to fix branch prediction mis-predict issue (#134140 ) Update `bl` to `bcl 20, 31, .+4` for 32-bit PIC code gen so that the link stack is not corrupted, which would cause the branch predictor to mispredict. Fixes: https://github.com/llvm/llvm-project/issues/128644	2025-04-07 15:50:21 -04:00
Lei Huang	b518242156	[PowerPC] Fix instruction name for dmr insert (#134301 )	2025-04-04 15:56:30 -04:00
zhijian lin	1a540c3b8b	[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#133155 ) ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated, using ISD::UADDO_CARRY,ISD::USUBO_CARRY instead. Lowering the UADDO, UADDO_CARRY, USUBO, USUBO_CARRY in the patch.	2025-04-03 13:22:49 -04:00
Nikita Popov	9356091a98	[GlobalMerge][PPC] Don't merge globals in llvm.metadata section (#131801 ) The llvm.metadata section is not emitted and has special semantics. We should not merge globals in it, similarly to how we already skip merging of `llvm.xyz` globals. Fixes https://github.com/llvm/llvm-project/issues/131394.	2025-04-02 10:40:53 +02:00
Fangrui Song	04a67528d3	[MC] Simplify MCBinaryExpr/MCUnaryExpr printing by reducing parentheses (#133674 ) The existing pretty printer generates excessive parentheses for MCBinaryExpr expressions. This update removes unnecessary parentheses of MCBinaryExpr with +/- operators and MCUnaryExpr. Since relocatable expressions only use + and -, this change improves readability in most cases. Examples: - (SymA - SymB) + C now prints as SymA - SymB + C. This updates the output of -fexperimental-relative-c++-abi-vtables for AArch64 and x86 to `.long _ZN1B3fooEv@PLT-_ZTV1B-8` - expr + (MCTargetExpr) now prints as expr + MCTargetExpr, with this change primarily affecting AMDGPUMCExpr.	2025-03-30 22:03:14 -07:00
Tony Varghese	ff9c5c334a	[shrinkwrap] PowerPC's FP register should be honored when processing the save point for prologue. (#129855 ) When generating code for functions that have `__builtin_frame_address` calls and the `noinline` attribute, the prologue was not emitted correctly, leading to an assertion failure on PowerPC. The issue was due to improper insertion of the prologue for a function that contains an llvm `__builtin_frame_address`. The shrink-wrap pass computes the save and restore points of a function. The default points are the entry and exit points of the function. During shrink-wrapping, the frame pointer was not honored like the stack pointer and was treated as a callee-saved register. This change treats the FP like the SP and inserts the prologue above the instruction that uses the FP. --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-03-21 12:55:39 -04:00
RolandF77	0489447b07	[PowerPC] dmr extract update assembly operand order (#132083 ) The operand order of the assembly for dmr extract instructions has changed since they were added. The results now come before the uses.	2025-03-20 13:40:40 -04:00
Lei Huang	ade22fc1d9	[PowerPC] Support conversion between f16 and f128 (#130158 ) Enables conversion between f16 and f128. Expanding on pre-Power9 targets and using HW instructions on Power9. Fixes https://github.com/llvm/llvm-project/issues/92866 Commandeer of: https://github.com/llvm/llvm-project/pull/97677 --------- Co-authored-by: esmeyi <esme.yi@ibm.com>	2025-03-19 10:19:57 -04:00
Lei Huang	dbc7665b24	PowerPC: Use REG_SEQUENCE instead of INSERT_SUBREG (#129941 ) Update to use REG_SEQUENCE when possible. This patch only updates td patterns to use REG_SEQUENCE in place of INSERT_SUBREG in cases where doing so does not produce nested REG_SEQUENCEs. This seems to show some improvement in code gen for `llvm/test/CodeGen/PowerPC/mmaplus-intrinsics.ll`. Fixes part of https://github.com/llvm/llvm-project/issues/125502	2025-03-18 13:41:24 -04:00
Tony Varghese	aab4ce4d5e	[NFC][shrinkwrap] Add test point to capture the prologue and epilogue insertion by shrinkwrap pass for powerpc. (#131192 ) This is NFC patch to capture the insertion of prologue and epilogue by `shrinkwrap` pass for Powerpc target for functions that contain llvm `__builtin_frame_address`. --------- Co-authored-by: Tony Varghese <tony.varghese@ibm.com>	2025-03-18 10:26:44 -04:00
Maryam Moghadas	22c6674f1d	[PowerPC] Add Dense Math binary integer outer-Product accumulate to DMR Instructions (#130791 ) This commit adds the following Dense Math Facility integer calculation instructions: dmxvi8gerx4, dmxvi8gerx4pp, dmxvi8gerx4spp, pmdmxvi8gerx4, pmdmxvi8gerx4pp, and pmdmxvi8gerx4spp, along with their corresponding intrinsics and tests.	2025-03-18 09:40:07 -04:00
Hubert Tong	2091547d4c	[PPC codegen test] NFC: Fix RUN line; fix DATA checks to match 64-bit	2025-03-15 21:20:22 -04:00
Frederik Harwath	6962cf1700	Rename ExpandLargeFpConvertPass to ExpandFpPass (#131128 ) This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it.	2025-03-14 13:11:45 +01:00
RolandF77	4518780c3c	[PowerPC] Add intrinsics and tests for basic Dense Math enablement instructions (#129913 ) Add intrinsics and tests for Dense Math basic enablement instructions dmsetdmrz, dmmr, dmxor.	2025-03-12 12:55:29 -04:00
Daniel Paoliello	16e051f0b9	[win] NFC: Rename `EHCatchret` to `EHCont` to allow for EH Continuation targets that aren't `catchret` instructions (#129953 ) This change splits out the renaming and comment updates from #129612 as a non-functional change.	2025-03-06 09:28:44 -08:00
zhijian lin	0303fd2746	[PowerPC] hoist xxspltib out of loop body (#127121 ) Fixes https://github.com/llvm/llvm-project/issues/127119 Remove `hasSideEffects` from `xxspltib` since there is no special restriction specified in the ISA that prevent it from being reordered, move, CSE, or LICM. Removing this restriction will allow `xxspltib` to be hoisted out of loop bodies.	2025-03-03 11:14:02 -05:00
Benjamin Maxwell	55fdeccc45	[SDAG][X86] Remove hack needed to avoid missing x87 FPU stack pops (#128055 ) If a (two-result) node like `FMODF` or `FFREXP` is expanded to a library call, where said library has the function prototype like: `float(float, float*)` -- that is it returns a float from the call and via an output pointer. The first result of the node maps to the value returned by value and the second result maps to the value returned via the output pointer. If only the second result is used after the expansion, we hit an issue on x87 targets: ``` // Before expansion: t0, t1 = fmodf x return t1 // t0 is unused ``` Expanded result: ``` ptr = alloca ch0 = call modf ptr t0, ch1 = copy_from_reg, ch0 // t0 unused t1, ch2 = ldr ptr, ch1 return t1 ``` So far things are alright, but the DAGCombiner optimizes this to: ``` ptr = alloca ch0 = call modf ptr // copy_from_reg optimized out t1, ch1 = ldr ptr, ch0 return t1 ``` On most targets this is fine. The optimized out `copy_from_reg` is unused and is a NOP. However, x87 uses a floating-point stack, and if the `copy_from_reg` is optimized out it won't emit a pop needed to remove the unused result. The prior solution for this was to attach the chain from the `copy_from_reg` to the root, which did work, however, the root is not always available (it's set to null during legalize types). So the alternate solution in this patch is to replace the `copy_from_reg` with an `X86ISD::POP_FROM_X87_REG` within the X86 call lowering. This node is the same as `copy_from_reg` except this node makes it explicit that it may lower to an x87 FPU stack pop. Optimizations should be more cautious when handling this node than a normal CopyFromReg to avoid removing a required FPU stack pop. ``` ptr = alloca ch0 = call modf ptr t0, ch1 = pop_from_x87_reg, ch0 // t0 unused t1, ch2 = ldr ptr, ch1 return t1 ``` Using this node ensures a required x87 FPU pop is not removed due to the DAGCombiner. This is an alternate solution for #127976.	2025-03-03 12:23:28 +00:00
Akshat Oke	77f44a9642	[CodeGen][NewPM] Port MachineSink to NPM (#115434 ) Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)	2025-03-03 15:49:37 +05:30
RolandF77	a73e591f33	[PowerPC] custom lower v1024i1 load/store (#126969 ) Support moving PPC dense math register values to and from storage with LLVM IR load/store.	2025-02-28 10:25:07 -05:00
Lucas Ramirez	15e295d30a	[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions (#128739 ) The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages to reason correctly about the entire IR, since some MIs will not be assigned to a region at the beginning. This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).	2025-02-27 11:27:07 +01:00
Simon Pilgrim	bae41127e2	[DAG] replaceShuffleOfInsert - add support for shuffle_vector(scalar_to_vector(x),y) -> insert_vector_elt(y,x,c) (#127210 ) Begin extending replaceShuffleOfInsert to handle other forms of scalar insertion into a vector. I've limited this to targets that have Custom/Legal ISD::INSERT_VECTOR_ELT handling for now - although we can probably always fold this before LegalOperations.	2025-02-27 08:41:58 +00:00
Benjamin Maxwell	ea4e19df53	[SDAG] Add missing ppc_fp128 ExpandFloatRes for sincos[pi] (#128514 )	2025-02-25 08:56:16 +00:00
zhijian lin	481e1eba3a	[NFC] add a pre-commit test case for patch #127121 that hoists xxspltib out of loop (#127701 ) This is a pre-commit test case for patch https://github.com/llvm/llvm-project/pull/127121 that hoists xxspltib out of the loop	2025-02-21 10:29:52 -05:00
Benjamin Maxwell	f178e51747	[SDAG] Add missing ppc_fp128 ExpandFloatRes legalization for modf (#127895 ) Should fix: https://lab.llvm.org/buildbot/#/builders/72/builds/8380 (`test_modf_ppcf128` is the test case that needed the additional legalization)	2025-02-20 09:50:16 +07:00
David Tenty	aa9e519b24	Revert "[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#116984 )" This reverts commit 7763119c6eb0976e4836f81c9876c49a36d46d73 (leaving the modifications from 03cb46d248b08)..	2025-02-19 09:44:39 -05:00
Nikita Popov	cc539138ac	[CodeGen] Use __extendhfsf2 and __truncsfhf2 by default (#126880 ) The standard libcalls for half to float and float to half conversion are __extendhfsf2 and __truncsfhf2. However, LLVM currently uses __gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these libcalls are an ARM-ism and only provided by libgcc on that platform. compiler-rt always provides both libcalls. Use the standard libcalls by default, and only use the __gnu libcalls on ARM.	2025-02-19 10:16:57 +01:00
Craig Topper	256145b4b0	[PowerPC] Use getSignedTargetConstant in SelectOptimalAddrMode. (#127305 ) Fixes #127298.	2025-02-15 14:13:32 -08:00
zhijian lin	7763119c6e	[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#116984 ) ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated, using ISD::UADDO_CARRY,ISD::USUBO_CARRY instead. Lowering the UADDO, UADDO_CARRY, USUBO, USUBO_CARRY in the patch.	2025-02-13 09:09:17 -05:00
Akshat Oke	7b60e03d73	Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )" (#126684 ) `RegisterClassInfo` was supposed to be kept alive between pass runs, which wasn't being done leading to recomputations increasing the compile time. Now the Impl class is a member of the legacy and new passes so that it is not reconstructed on every pass run. --------- Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>	2025-02-12 18:54:39 +05:30
Akshat Oke	564b9b7f4d	Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )" (#126268 ) This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I investigate what's causing the compile-time regression.	2025-02-08 15:36:48 +05:30
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533 ) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10 year old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySource would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other passes like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Christudasan Devadasan	5aa4979c47	CodeGen][NewPM] Port MachineScheduler to NPM. (#125703 )	2025-02-05 12:17:59 +05:30
Sergei Barannikov	ff9c041d96	[MachineScheduler] Fix physreg dependencies of ExitSU (#123541 ) Providing the correct operand index allows addPhysRegDataDeps to compute the correct latency. Pull Request: https://github.com/llvm/llvm-project/pull/123541	2025-02-01 20:40:50 +03:00
Alexander Richardson	213a939a79	[LegalizeDAG] Use Base+Offset instead of Offset+Base for jump tables This is needed for architectures that actually use strict pointer arithmetic instead of integers such as AArch64 with FEAT_CPA (see https://github.com/llvm/llvm-project/pull/105669) or CHERI. Using an index as the first operand of pointer arithmetic may result in an invalid output. While there are quite a few codegen changes here, these only change the order of registers in add instructions. One MIPS combine had to be updated to handle the new node order. Reviewed By: topperc Pull Request: https://github.com/llvm/llvm-project/pull/125279	2025-01-31 14:05:34 -08:00
Alex Richardson	c7d4ccfd83	[PowerPC] Autogenerate a test checks in preparation for follow-up commit This just adds more lines that are checked	2025-01-31 12:01:31 -08:00
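
The carry conversion described in commit 41647412c6 can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the code from the patch: the function names (`usuboCarry`, `ppcSubtractExtended`, `flipCarry`, `flipWithAllOnes`, `conventionsAgree`) are hypothetical, the values are modelled as 32-bit integers, and the PowerPC side is simplified to "a + ~b + CA" with CA as the carry out of bit 31. It shows that XOR with 1 maps a 0/1 borrow flag to a 0/1 carry bit, whereas XOR with an all-ones constant also sets every other bit.

```cpp
#include <cassert>
#include <cstdint>

// Result of a subtract-with-borrow: the low 32 bits plus a single flag.
struct SubResult {
  uint32_t value;
  uint32_t flag; // borrow-out (ISD-style model) or CA (PowerPC-style model)
};

// ISD::USUBO_CARRY-style model: computes a - b - borrowIn.
// The flag is 1 if and only if a borrow occurs.
constexpr SubResult usuboCarry(uint32_t a, uint32_t b, uint32_t borrowIn) {
  uint64_t wide = uint64_t(a) - uint64_t(b) - uint64_t(borrowIn);
  return {uint32_t(wide), uint32_t((wide >> 32) & 1)};
}

// Simplified PowerPC-style model (assumption): computes a + ~b + CA.
// The carry out is 1 if and only if NO borrow occurs.
constexpr SubResult ppcSubtractExtended(uint32_t a, uint32_t b, uint32_t ca) {
  uint64_t wide = uint64_t(a) + uint64_t(uint32_t(~b)) + uint64_t(ca);
  return {uint32_t(wide), uint32_t(wide >> 32)};
}

// Converting between the two conventions flips only bit 0.
constexpr uint32_t flipCarry(uint32_t flag) { return flag ^ 1u; }

// XOR with all-ones flips every bit, so the result is no longer 0 or 1.
constexpr uint32_t flipWithAllOnes(uint32_t flag) { return flag ^ ~0u; }

// True when both models agree once the flags are converted with XOR 1.
constexpr bool conventionsAgree(uint32_t a, uint32_t b, uint32_t borrowIn) {
  SubResult isd = usuboCarry(a, b, borrowIn);
  SubResult ppc = ppcSubtractExtended(a, b, flipCarry(borrowIn));
  return isd.value == ppc.value && isd.flag == flipCarry(ppc.flag);
}

// Small self-check over a few hand-picked inputs.
inline void checkExamples() {
  assert(conventionsAgree(5, 7, 0)); // borrow occurs
  assert(conventionsAgree(7, 5, 1)); // no borrow
  assert(conventionsAgree(0, 0, 1)); // borrow caused only by borrowIn
}
```

In this model `flipCarry(1)` is 0 and `flipCarry(0)` is 1, matching the conversion the commit message describes, while `flipWithAllOnes(1)` is `0xFFFFFFFE`.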

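
Commit 5080a0251f unfolds a slow ctpop when it is used in a power-of-two test. The following is a minimal sketch of the kind of equivalence such an unfolding relies on; it is not taken from the patch. The helper names are hypothetical, and the two bit-trick forms shown are well-known identities assumed here for illustration, checked against a reference popcount.

```cpp
#include <cassert>
#include <cstdint>

// Reference population count (clears the lowest set bit each iteration).
constexpr unsigned popcount32(uint32_t x) {
  unsigned n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// "Exactly one bit set", written with ctpop.
constexpr bool isPow2ViaCtpop(uint32_t x) { return popcount32(x) == 1; }

// The same test without ctpop: (x ^ (x - 1)) > (x - 1), unsigned compare.
constexpr bool isPow2Unfolded(uint32_t x) { return (x ^ (x - 1)) > (x - 1); }

// "At most one bit set", written with ctpop.
constexpr bool isPow2OrZeroViaCtpop(uint32_t x) { return popcount32(x) < 2; }

// The same test without ctpop: (x & (x - 1)) == 0.
constexpr bool isPow2OrZeroUnfolded(uint32_t x) { return (x & (x - 1)) == 0; }

// Exhaustively compare both forms over the first `limit` values.
inline bool formsAgreeBelow(uint32_t limit) {
  for (uint32_t x = 0; x < limit; ++x) {
    if (isPow2ViaCtpop(x) != isPow2Unfolded(x))
      return false;
    if (isPow2OrZeroViaCtpop(x) != isPow2OrZeroUnfolded(x))
      return false;
  }
  return true;
}
```

The unfolded forms need only a subtract, one logical operation, and a compare, which is why they are attractive on targets where ctpop is slow.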

4054 Commits