llvm-project

Author	SHA1	Message	Date
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533 ) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10 year old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySource would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other passes like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Alexander Richardson	213a939a79	[LegalizeDAG] Use Base+Offset instead of Offset+Base for jump tables This is needed for architectures that actually use strict pointer arithmetic instead of integers such as AArch64 with FEAT_CPA (see https://github.com/llvm/llvm-project/pull/105669) or CHERI. Using an index as the first operand of pointer arithmetic may result in an invalid output. While there are quite a few codegen changes here, these only change the order of registers in add instructions. One MIPS combine had to be updated to handle the new node order. Reviewed By: topperc Pull Request: https://github.com/llvm/llvm-project/pull/125279	2025-01-31 14:05:34 -08:00
Jonas Paulsson	eb1a571114	[SystemZ] Replace SELRMux with COPY in case of identical operands. (#125108 ) If both operands of a SELRMux use the same register which is killed, and the SELRMux is expanded to a jump sequence, a broken MIR results if the kill flag is not removed. This patch replaces the SELRMux with a COPY in these cases.	2025-01-31 13:58:01 +01:00
Matt Arsenault	1cbfac04d0	SystemZ: Handle copies between gr64 and fp64 (#124890 ) I'm guessing based on tablegen definitions. I also don't really understand how this could have been missing. This defends against regressions in a future peephole-opt patch.	2025-01-30 11:08:08 +07:00
Mikhail Gudim	3c3c850a45	[ReachingDefAnalysis] Extend the analysis to stack objects. (#118097 ) We track definitions of stack objects, the implementation is identical to tracking of registers. Also, added printing of all found reaching definitions for testing purposes. --------- Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>	2025-01-29 10:55:16 -05:00
Jeffrey Byrnes	acb7859f07	[MachineSink] Extend loop sinking capability (#117247 ) The current MIR cycle sinking capabilities are rather limited. They only support sinking copies into a single successor block while obeying limits. This opt-in feature adds a more aggressive option that is not limited by the above concerns. The feature will try to "sink" by duplicating any top-level preheader instruction (that we are sure is safe to sink) into any user block, then does some dead code cleanup. In particular, this is useful for high RP situations when loop bodies have control flow.	2025-01-23 17:08:23 -08:00
Ulrich Weigand	6d5697f7cb	[SystemZ] Fix ICE with i128->i64 uaddo carry chain We can only optimize a uaddo_carry via specialized instruction if the carry was produced by another uaddo(_carry) instruction; there is already a check for that. However, i128 uaddo(_carry) use a completely different mechanism; they indicate carry in a vector register instead of the CC flag. Thus, we must also check that we don't mix those two - that check has been missing. Fixes: https://github.com/llvm/llvm-project/issues/124001	2025-01-23 19:15:11 +01:00
Ulrich Weigand	8424bf207e	[SystemZ] Add support for new cpu architecture - arch15 This patch adds support for the next-generation arch15 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Detection of arch15 as host processor. - Assembler/disassembler support for new instructions. - Exploitation of new instructions for code generation. - New vector (signed\|unsigned\|bool) __int128 data types. - New LLVM intrinsics for certain new instructions. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10305. Note: No currently available Z system supports the arch15 architecture. Once new systems become available, the official system name will be added as supported -march name.	2025-01-20 19:30:21 +01:00
Guillaume DI FATTA	a1ee1a9126	[CodeGen] @llvm.experimental.stackmap make operands immediate (#117932 ) This pull request modifies the behavior of the `@llvm.experimental.stackmap` intrinsic to require that its first two operands (`id` and `numShadowBytes`) be immediate values. This change ensures that variables cannot be passed as the first two arguments to this intrinsic. Related Issue: https://github.com/llvm/llvm-project/issues/115733 ### Testing - Added new test cases to ensure errors are emitted for non-immediate operands. - Ran the full LLVM test suite to verify no regressions were introduced.	2024-12-11 17:41:19 +08:00
anoopkg6	dc04d414df	SystemZ: Add support for __builtin_setjmp and __builtin_longjmp. (#119257 ) This PR includes fixes for the original PR #116642. Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ.	2024-12-10 19:50:51 +01:00
Ulrich Weigand	8787bc72a6	Revert "[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642 )" This reverts commit 030bbc92a705758f1131fb29cab5be6d6a27dd1f.	2024-12-07 00:55:54 +01:00
anoopkg6	030bbc92a7	[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642 ) Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ.	2024-12-06 23:33:33 +01:00
Matt Arsenault	d42ab5d0f0	SystemZ: Regenerate baseline checks for some coalescer tests (#118322 ) These were missing -NEXT checks and also had some dead checks. Also switch a test to actually check the output.	2024-12-06 12:18:51 -05:00
Serge Pavlov	eac8ea323a	[SystemZ] Modify tests for constrained rounding functions (#116952 ) The existing tests for constrained functions often use constant arguments. If constant evaluation is enhanced, such tests will not check code generation of the tested functions. To avoid it, the tests are modified to use loaded value instead of constants. Now only the tests for rounding functions are changed.	2024-11-22 15:15:41 +07:00
Tex Riddell	5c2a133b13	Emit constrained atan2 intrinsic for clang builtin (#113636 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `Builtins.td` - Add f16 support for libm atan2 builtin - `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin - `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of atan2 for clang builtin to lib call calling convention check, now that atan2 maps to an intrinsic. - add atan2 cases to llvm.experimental.constrained tests for more backends: ARM, PowerPC, RISCV, SystemZ. - LangRef.rst: add llvm.experimental.constrained.atan2, revise llvm.atan2 description. Last part of Implement the atan2 HLSL Function. Fixes #70096.	2024-11-12 13:34:29 -08:00
Jonas Paulsson	77ddcf7cbf	[SystemZ] Fix bitwidth problem in FindReplicatedImm(). (#115383 ) A test case emerged with an i32 truncating store of an i64 constant operand, where the i64 constant did not fit in 32 bits, which caused FindReplicatedImm() to crash. Make sure to truncate the APInt in these cases.	2024-11-11 22:16:20 +01:00
Kai Nacke	4a37799a48	[SystemZ][XRay] Implement XRay instrumentation for SystemZ (#113253 ) Expands pseudo instructions PATCHABLE_FUNCTION_ENTER and PATCHABLE_RET into a small instruction sequence which calls into the XRay library.	2024-11-05 15:42:55 -05:00
Kai Nacke	8b659736f7	[SystemZ] Make lit test more specific (#115050 ) The lit test fmuladd-soft-float.ll only specifies s390x as platform, but the test is Linux specific, causing problems when run on z/OS. This change updates the triple to fix this.	2024-11-05 15:29:32 -05:00
Jonas Paulsson	117e952a53	[LiveRangeEdit] Remove any MemoryOperand on MI when converting it to KILL. (#114407 ) When LiveRangeEdit::eliminateDeadDef() converts an MI to a KILL instruction, it should also call dropMemRefs() in order to erase any MachineMemOperand present. This was discovered in testing as the MachineVerifier does not accept an MMO without the corresponding MI mayLoad/mayStore flag, which the KILL opcode lacks.	2024-11-05 18:08:27 +01:00
tltao	0aec4d2b78	[SystemZ] Update large ada tests after HLASM syntax change (#113578 ) Fix buildbot failures seen on: https://lab.llvm.org/buildbot/#/builders/42/builds/1597 caused by: https://github.com/llvm/llvm-project/pull/113369 Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2024-10-24 11:02:36 -04:00
tltao	c170405996	[SystemZ] Introduce GNU and HLASM differences to asmwriter and update tests (#113369 ) Now that the GNU and HLASM `InstPrinter` paths are separated in https://github.com/llvm/llvm-project/pull/112975, differentiate between them in `SystemZInstrFormats.td`. The main differences are: - Tabs converted to space - Remove space after comma for instruction operands --------- Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2024-10-23 13:06:48 -04:00
Alex Rønne Petersen	5785cbb405	[llvm] Ensure that soft float targets don't emit `fma()` libcalls. (#106615 ) The previous behavior could be harmful in some edge cases, such as emitting a call to `fma()` in the `fma()` implementation itself. Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`. This was already done for PowerPC; this commit just extends that to Arm, z/Arch, and x86. MIPS and SPARC already got it right, but I added tests for them too, for good measure. Note: I don't have commit access.	2024-10-19 06:13:15 -07:00
Jay Foad	922992a22f	Fix typo "instrinsic" (#112899 )	2024-10-18 15:58:33 +01:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
Jonas Paulsson	76aa370f44	[SystemZ] Remove inlining threshold multiplier. (#106058 ) Due to recently reported problems with having the inlining threshold multiplier set fairly high (x3), this patch removes the multiplier while addressing the regressions seen by doing so in adjustInliningThreshold(). The specific cases that benefit from inlining that were now found to be in need of handling contain a considerable number of memory accesses to the same memory in both caller and callee.	2024-10-07 10:59:45 +02:00
Jonas Paulsson	f9fbfc587d	[SystemZ] Dump function signature on missing arg extension. (#109699 ) Make it easier to handle detected problems by providing the function signature(s) involved in cases of missing argument extensions.	2024-09-30 17:03:18 +02:00
Jonas Paulsson	0ef24aa549	Fix for logic in combineExtract() (#108208 ) A (csmith) test case appeared where combineExtract() crashed when the input vector was a bitcast into a vector of i1:s. Fix this by adding a check with canTreatAsByteVector() before the call.	2024-09-25 12:12:27 +02:00
Jonas Paulsson	14120227a3	Target ABI: improve call parameters extensions handling (#100757 ) For the purpose of verifying proper arguments extensions per the target's ABI, introduce the NoExt attribute that may be used by a target when neither sign- or zeroextension is required (e.g. with a struct in register). The purpose of doing so is to be able to verify that there is always one of these attributes present and by this detecting cases where sign/zero extension is actually missing. As a first step, this patch has the verification step done for the SystemZ backend only, but left off by default until all known issues have been addressed. Other targets/front-ends can now also add NoExt attribute where needed and do this check in the backend.	2024-09-19 16:59:31 +02:00
Simon Pilgrim	4baf29e81e	[DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds. Fixes #106202	2024-08-27 18:12:24 +01:00
Abhina Sree	a0be7053d7	[SystemZ][z/OS] Continuation of __ptr32 support (#103393 ) This is a continuation of the __ptr32 support added here `135fecd444`	2024-08-14 13:26:30 -04:00
tltao	bc747c3e13	[SystemZ][z/OS] Fix incorrect codegen for ADA_ENTRY pseudo instruction (#101415 ) The current MCInstBuilder for generating an ALGFI when loading something from the ADA is incorrect and will crash the compiler. r0 must also be excluded from the registers returned as the result, since it is treated as the value "0" on z/OS. Also add some tests to properly test the paths where LLILF and ALGFI are generated. --------- Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2024-08-01 13:23:49 -04:00
Jonas Paulsson	22bc9db92b	[SystemZ] Use the EVT version of getVectorVT() in combineTruncateExtract(). (#100150 ) A test case showed up where the new vector type is v24i16, which is not a simple MVT. In order to get an extended value type for cases like this, EVT::getVectorVT() needs to be called instead of MVT::getVectorVT(), otherwise the following call to getVectorElementType() in combineExtract() will fail.	2024-07-26 14:33:40 +02:00
Björn Pettersson	2b78303e3f	[DAGCombiner] Freeze maybe poison operands when folding select to logic (#84924 ) Just like for regular IR we need to treat SELECT as conditionally blocking poison in SelectionDAG. So (unless the condition itself is poison) the result is only poison if the selected true/false value is poison. Thus, when doing DAG combines that turn SELECT into arithmetic/logical operations (e.g. AND/OR) we need to make sure that the new operations aren't more poisonous. One way to do that is to use FREEZE to make sure the operands aren't poison. This patch aims at fixing the kind of miscompiles reported in https://github.com/llvm/llvm-project/issues/84653 and https://github.com/llvm/llvm-project/issues/85190 Solution is to make sure that we insert FREEZE, if needed to make the fold sound, when using the foldBoolSelectToLogic and foldVSelectToSignBitSplatMask DAG combines.	2024-07-22 17:19:46 +02:00
Volodymyr Vasylkun	e094abde42	[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774 ) The previous expansion of [US]CMP was done using two selects and two compares. It produced decent code, but on many platforms it is better to implement [US]CMP nodes by performing the following operation: ``` [us]cmp(x, y) = (x [us]> y) - (x [us]< y) ``` This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.	2024-07-16 20:56:18 +01:00
Ulrich Weigand	e8e406041e	Fix sext_in_reg from i1 to i128 The combineSIGN_EXTEND_INREG routine was using DAG.getConstant(-1, DL, VT), which does not result in the expected value when VT has more than 64 bits. Fix this by using DAG.getAllOnesConstant(DL, VT) instead. Also add test cases for v1i128 comparisons (which triggers the bug).	2024-07-15 11:26:37 +02:00
Manish Kausik H	69192e0193	[LegalizeDAG] Optimize CodeGen for `ISD::CTLZ_ZERO_UNDEF` (#83039 ) Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case. The details of the optimization are outlined in #82075 Fixes #82075 Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>	2024-07-08 14:01:32 +01:00
Zibi Sarbinowski	1de1818fab	[SystemZ] Address issue with super large stack frames (#96318 ) This PR fixes the following failure by adjusting the calculation of maximum displacement from Stack Pointer. `LLVM ERROR: Error while trying to spill R5D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! `	2024-06-27 09:23:35 -04:00
Farzon Lotfi	189d471191	[clang] Reland Add tanf16 builtin and support for tan constrained intrinsic (#94559 ) Relanding this PR now that https://github.com/llvm/llvm-project/pull/90503 has merged. with `FTAN` landing in [TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63 ) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm backends. In LLVM, the `llvm.experimental.constrained.cos` and `llvm.experimental.constrained.sin` intrinsics are used for performing cosine and sine calculations with additional constraints on floating-point operations. This behavior is expected for all floating-point math intrinsics. This change adds these constraints for the `tan` intrinsic. - `Builtins.td` - replace TanF128 with F16F128MathTemplate - `CGBuiltin.cpp` - map existing tan builtins to `tan` and `constrained_tan` intrinsic - `ConstrainedOps.def` map tan and constrained_tan to an ISDOpcode. resolves #91421 --------- Co-authored-by: Farzon Lotfi <farzon@farzon.com>	2024-06-10 20:46:26 -04:00
paperchalice	1bc8b3258e	[NewPM][CodeGen] Port `regallocfast` to new pass manager (#94426 ) This pull request ports `regallocfast` to the new pass manager. It exposes the parameter `filter` to handle different register classes for AMDGPU. IIUC AMDGPU needs to allocate different register classes separately, so it needs to implement its own `--<reg-class>-regalloc`. Now users can use e.g. `-passes=regallocfast<filter=sgpr>` to allocate a specific register class. The command line option `--regalloc-npm` is still a work in progress; the plan is to reuse the syntax of passes, e.g. use `--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to replace `--sgpr-regalloc` and `--vgpr-regalloc`.	2024-06-07 12:22:42 +08:00
paperchalice	9b0e1c2ca2	[NewPM][CodeGen] Port `finalize-isel` to new pass manager (#94214 ) It should preserve more analysis results, but it happens immediately after instruction selection.	2024-06-04 09:23:52 +08:00
Matt Arsenault	ddb87e0f96	SystemZ: Use REG_SEQUENCE for PAIR128 (#90640 ) PAIR128 should probably just be removed entirely Depends #90638	2024-05-17 13:16:34 +02:00
Jonas Paulsson	d6ee7e8481	[SystemZ] Handle address clobbering in splitMove(). (#92105 ) When expanding an L128 (which is used to reload i128) it is possible that the quadword destination register clobbers an address register. This patch adds an assertion against the case where both of the expanded parts clobber the address, and, in the case where one of the expanded parts does so, puts it last. Fixes #91437	2024-05-15 08:36:26 +02:00
Ulrich Weigand	de117dd533	[SystemZ] Add some more atomic load/store tests Verify atomic load/store of f128 on z14 where the type lives in VRs.	2024-05-07 16:57:17 +02:00
Ulrich Weigand	0a0cac6dbd	[SystemZ] Simplify f128 atomic load/store (#90977 ) Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128 to allow for simplified use in atomic load/store. Update logic to split 128-bit loads and stores in DAGCombine to also handle the f128 case where appropriate. This fixes the regressions introduced by recent atomic load/store patches.	2024-05-06 12:17:19 +02:00
Matt Arsenault	eb75af223f	Reapply "SystemZ: Fold copy of vector immediate to gr128" (#91099 ) This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b. Modify the instruction in place to transform it into a REG_SEQUENCE, which is what other implementations of foldImmediate do. Also start erasing the def instruction if there are no other uses. Fixes #91110.	2024-05-06 10:00:20 +02:00
Matt Arsenault	4b61d04645	SystemZ: Remove unnecessary REQUIRES asserts from tests	2024-05-06 09:52:35 +02:00
Matt Arsenault	181e82143e	SystemZ: Remove redundant REQUIRES systemz from test	2024-05-06 09:52:35 +02:00
Vitaly Buka	a415b4dfcc	Revert "SystemZ: Fold copy of vector immediate to gr128" (#91099 ) Fails here: https://lab.llvm.org/buildbot/#/builders/239/builds/6893 https://lab.llvm.org/buildbot/#/builders/5/builds/43113 https://lab.llvm.org/buildbot/#/builders/168/builds/20228 Reverts llvm/llvm-project#90706	2024-05-04 23:59:49 -07:00
Simon Pilgrim	caacf8685a	[DAG] Fold freeze(shuffle(x,y,m)) -> shuffle(freeze(x),freeze(y),m) (#90952 ) If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node. This requires special case handling to create a new ShuffleVectorSDNode. Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.	2024-05-04 12:03:10 +01:00
Matt Arsenault	49c5f4d56a	SystemZ: Fold copy of vector immediate to gr128 (#90706 ) If materializing a constant in a vector register that is just going to be copied to general registers, directly materialize the immediate in the gpr. This will avoid a few lit test regressions in a future commit.	2024-05-03 18:40:11 +02:00
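
The [US]CMP entry above (e094abde42) gives the expansion `[us]cmp(x, y) = (x [us]> y) - (x [us]< y)`. A minimal sketch of that arithmetic, written in Python purely for illustration: the function names `scmp` and `ucmp` and the `bits` parameter are assumptions made for this example, not LLVM APIs, and the actual patch operates on SelectionDAG nodes rather than on Python integers.

```python
def scmp(x: int, y: int) -> int:
    """Signed three-way compare via boolean arithmetic: returns -1, 0, or 1."""
    # Each comparison yields 0 or 1; their difference is the three-way result.
    return int(x > y) - int(x < y)


def ucmp(x: int, y: int, bits: int = 32) -> int:
    """Unsigned three-way compare on `bits`-wide values (hypothetical helper)."""
    # Reinterpret both inputs as unsigned `bits`-wide integers before comparing.
    mask = (1 << bits) - 1
    ux, uy = x & mask, y & mask
    return int(ux > uy) - int(ux < uy)
```

The difference between the two variants shows up when the sign bit is set: `scmp(-1, 1)` treats `-1` as less than `1`, while `ucmp(-1, 1)` reinterprets `-1` as the largest 32-bit unsigned value and treats it as greater.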


986 Commits