llvm-project

Author	SHA1	Message	Date
Ulrich Weigand	be7ef6c52b	[MachineLICM] Recognize registers clobbered at EH landing pad entry (#122446 ) EH landing pad entry implicitly clobbers target-specific exception pointer and exception selector registers. The post-RA MachineLICM pass needs to take these into account when deciding whether to hoist an instruction out of the loop that initializes one of these registers. Fixes: https://github.com/llvm/llvm-project/issues/122315	2025-04-25 22:27:27 +02:00
Jonas Paulsson	94a14f9f0d	[SystemZ] Add DAGCombine for FCOPYSIGN to remove rounding. (#136131 ) Add a DAGCombine for FCOPYSIGN that removes the rounding which is never needed as the sign bit is already in the correct place. This helps in particular the rounding to f16 case which needs a libcall. Also remove the roundings for other FP VTs and simplify the CPSDR patterns correspondingly. fp-copysign-03.ll test updated, now also covering the other FP VT combinations.	2025-04-24 11:05:51 +02:00
Jonas Paulsson	1ec22fae7e	[SystemZ] Handle f16 load positive/negative/complement without libcalls. (#136286 ) This can be done directly with the (64-bit) target instruction as only the sign bit is changed.	2025-04-24 10:49:40 +02:00
Jonas Paulsson	6d03f51f0c	[SystemZ] Add support for 16-bit floating point. (#109164 ) - _Float16 is now accepted by Clang. - The half IR type is fully handled by the backend. - These values are passed in FP registers and converted to/from float around each operation. - Compiler-rt conversion functions are now built for s390x including the missing extendhfdf2 which was added. Fixes #50374	2025-04-16 20:02:56 +02:00
Sergei Barannikov	3050061793	[AsmPrinter] Link .stack_sizes to the correct section (#135583 ) AsmPrinter may switch the current section when e.g., emitting a jump table for a switch. `.stack_sizes` should still be linked to the function section. If the section is wrong, readelf emits a warning "relocation symbol is not in the expected section".	2025-04-14 20:04:50 +03:00
Dominik Steenken	e071233fa5	[SystemZ] Consider VST/VL as SimpleBDXStore/Load (#135623 ) Previously `vst` and `vl` were not considered "simple" BDX stores and loads, leading to, among other things, some opportunities for `mvc` optimization to be missed. This PR addresses this and updates some tests to account for additional `mvc` instructions being emitted. This is observed to have a neutral or slightly beneficial effect performance-wise.	2025-04-14 18:58:57 +02:00
Ulrich Weigand	80267f8148	Support z17 processor name and scheduler description (#135254 ) The recently announced IBM z17 processor implements the architecture already supported as "arch15" in LLVM. This patch adds support for "z17" as an alternate architecture name for arch15. This patch also adds the scheduler description for the z17 processor, provided by Jonas Paulsson.	2025-04-11 00:20:58 +02:00
tltao	18189430ab	[SystemZ] Add check for INIT_UNDEF in getInstSizeInBytes (#134661 ) Due to some optimization changes, INIT_UNDEF is making its way to `getInstSizeInBytes` in `llvm/lib/Target/SystemZ/SystemZLongBranch.cpp` but we do not have an exception there in the assert. Since INIT_UNDEF is described as being similar to IMPLICIT_DEF and there is a check for IMPLICIT_DEF, it seems logical to also add a check for INIT_UNDEF. --------- Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2025-04-10 16:16:20 -04:00
Dominik Steenken	e9a3ea2218	[SystemZ, DebugInfo] Instrument SystemZ backend passes for Instr-Ref DebugInfo (#133061 ) This PR instruments the optimization passes in the SystemZ backend with calls to `MachineFunction::substituteDebugValuesForInst` where instruction substitutions are made to instructions that may compute tracked values. Tests are also added for each of the substitutions that were inserted. Details on the individual passes follow. ### systemz-copy-physregs When a copy targets an access register, we redirect the copy via an auxiliary register. This leads to the final result being written by a newly inserted SAR instruction, rather than the original MI, so we need to update the debug value tracking to account for this. ### systemz-long-branch This pass relaxes relative branch instructions based on the actual locations of blocks. Only one of the branch instructions qualifies for debug value tracking: BRCT, i.e. branch-relative-on-count, which subtracts 1 from a register and branches if the result is not zero. This is relaxed into an add-immediate and a conditional branch, so any `debug-instr-number` present must move to the add-immediate instruction. ### systemz-post-rewrite This pass replaces `LOCRMux` and `SELRMux` pseudoinstructions with either the real versions of those instructions, or with branching programs that implement the intent of the Pseudo. In all these cases, any `debug-instr-number` attached to the pseudo needs to be reallocated to the appropriate instruction in the result, either LOCR, SELR, or a COPY. ### systemz-elim-compare Similar to systemz-long-branch, for this pass, only a few substitutions are necessary, since it mainly deals with conditional branch instructions. The only exceptions are again branch-relative-on-count, as it modifies a counter as part of the instruction, as well as any of the load instructions that are affected.	2025-03-31 19:30:06 +02:00
Dominik Steenken	f24cf59d7a	[SystemZ] Add `is(LoadFrom\|StoreTo)StackSlotPostFE` to SystemZBackend (#132928 ) As part of an effort to enable instr-ref-based debug value tracking, this PR implements `SystemZInstrInfo::isLoadFromStackSlotPostFE`, as well as `SystemZInstrInfo::isStoreToStackSlotPostFE`. The implementation relies upon the presence of MachineMemoryOperands on the relevant `MachineInstr`s in order to access the `FrameIndex` post frame index elimination. Since these new functions are only meant to be called after frame-index elimination, they assert against the presence of a frame index on the base register operand of the instruction. Outside of the utility of these functions to enable instr-ref-based debug value tracking, they also change the behavior of the AsmPrinter, since it will now be able to properly detect non-folded spills and reloads, so this changes a number of tests that were checking specifically for folded reloads. Note that there are some tests that still check for `vst` and `vl` as folded spills/reloads even though they should be straight reloads. This will be addressed in a future PR. Co-authored-by: Dominik Steenken <dominik.steenken@gmail.com>	2025-03-25 15:03:54 +01:00
tltao	f7a32b85b5	[MC][SystemZ] Introduce Target Specific HLASM Streamer for z/OS (#130535 ) A more fleshed out version of a previous PR https://github.com/llvm/llvm-project/pull/107415. The goal is to provide platforms an alternative to the current MCAsmStreamer which only supports the GNU Asm syntax. RFC: https://discourse.llvm.org/t/rfc-llvm-add-support-for-target-specific-asm-streamer/85095 --------- Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2025-03-21 11:36:35 -04:00
Ulrich Weigand	f4ea1055ad	[SystemZ] Implement i128 funnel shifts These can be handled via the VECTOR SHIFT LEFT/RIGHT DOUBLE family of instructions, depending on architecture level. Fixes: https://github.com/llvm/llvm-project/issues/129955	2025-03-15 18:28:44 +01:00
Ulrich Weigand	4155cc0fb3	[SystemZ] Recognize carry/borrow computation Generate code using the VECTOR ADD COMPUTE CARRY and VECTOR SUBTRACT COMPUTE BORROW INDICATION instructions to implement open-coded IR with those semantics. Handles integer vector types as well as i128. Fixes: https://github.com/llvm/llvm-project/issues/129608	2025-03-15 18:28:44 +01:00
Ulrich Weigand	4a4987be36	[SystemZ] Optimize vector zero/sign extensions Generate more efficient code for zero or sign extensions where the source is a subvector generated via SHUFFLE_VECTOR. Specifically, recognize patterns corresponding to (series of) VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO DOUBLEWORD instruction. As a special case, also handle zero or sign extensions of a vector element to i128. Fixes: https://github.com/llvm/llvm-project/issues/129576 Fixes: https://github.com/llvm/llvm-project/issues/129899	2025-03-15 18:28:44 +01:00
Ulrich Weigand	cdc7864986	[SystemZ] Optimize widening and high-word vector multiplication Detect (non-intrinsic) IR patterns corresponding to the semantics of the various widening and high-word multiplication instructions. Specifically, this is done by: - Recognizing even/odd widening multiplication patterns in DAGCombine - Recognizing widening multiply-and-add on top during ISel - Implementing the standard MULHS/MULHU IR opcodes - Detecting high-word multiply-and-add (which common code does not) Depending on architecture level, this can support all integer vector types as well as the scalar i128 type. Fixes: https://github.com/llvm/llvm-project/issues/129705	2025-03-15 18:28:44 +01:00
Ulrich Weigand	7af3d3929e	[SystemZ] Optimize vector comparison reductions Generate efficient code using the condition code set by the VECTOR (FP) COMPARE family of instructions to implement vector comparison reductions, e.g. as resulting from __builtin_reduce_and/or of some vector comparison. Fixes: https://github.com/llvm/llvm-project/issues/129434	2025-03-15 18:28:44 +01:00
Jonas Paulsson	85318bae28	[MachineLateInstrsCleanup] Handle multiple kills for a preceding definition. (#119132 ) When removing a redundant definition in order to reuse an earlier identical one it is necessary to remove any earlier kill flag as well. Previously, the assumption has been that any register that kills the defined Reg is enough to handle for this purpose, but this is actually not quite enough. A kill of a super-register does not necessarily imply that all of its subregs (including Reg) are defined at that point: a partial definition of a register is legal. This means Reg may have been killed earlier and is not live at that point. This patch changes the tracking of kill flags to allow for multiple flags to be removed: instead of remembering just the single / latest kill flag, a vector is now used to track and remove them all. TinyPtrVector seems ideal for this as there is only very rarely more than one kill flag, and it doesn't seem to give much difference in compile time. The kill flags handling here is making this pass much more complicated than it would have to be. This pass does not depend on kill flags for its own use, so an interesting alternative to all this handling would be to just remove them all. If there actually is a serious user, maybe that pass could instead recompute them. Also adding an assertion which is unrelated to kill flags, but it seems to make sense (according to liberal assertion policy), to verify that the preceding definition is in fact identical in clearKillsForDef(). Fixes #117783	2025-03-13 15:50:54 +01:00
Akshat Oke	af4ec59f8d	[CodeGen][NPM] Port ExpandPostRAPseudos to NPM (#129509 )	2025-03-04 11:49:09 +05:30
Akshat Oke	77f44a9642	[CodeGen][NewPM] Port MachineSink to NPM (#115434 ) Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)	2025-03-03 15:49:37 +05:30
Akshat Oke	aa1fe57b19	[RegAlloc][NewPM] Plug Greedy RA in codegen pipeline (#120557 ) Use `-passes="regallocgreedy<[all\|sgpr\|wwm\|vgpr]>` to insert the greedy RA with a filter and `-regalloc-npm=<type>` to control which RA to use in existing pipeline.	2025-03-03 11:06:15 +05:30
Jonas Paulsson	c298f71ea6	[SystemZ] Fix regstate of SELRMux operand in selectSELRMux(). (#128555 ) It seems that there can be other cases with this that also can lead to wrong code (discovered with csmith). This time it involved not the kill flag but the undef flag. Use the intersection of the flags from both MachineOperand:s instead of the RegState from just one of them.	2025-02-28 15:03:04 +01:00
Ulrich Weigand	adacbf68eb	[SystemZ] Add codegen support for llvm.roundeven This is straightforward as we already had all the necessary instructions, they simply were not wired up. Also allows implementing the vec_round intrinsic via the standard llvm.roundeven IR instead of a platform intrinsic now.	2025-02-14 00:10:37 +01:00
Matt Arsenault	58a88001f3	PeepholeOpt: Fix looking for def of current copy to coalesce (#125533 ) This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10-year-old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySrc would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other passes like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.	2025-02-05 23:29:02 +07:00
Alexander Richardson	213a939a79	[LegalizeDAG] Use Base+Offset instead of Offset+Base for jump tables This is needed for architectures that actually use strict pointer arithmetic instead of integers such as AArch64 with FEAT_CPA (see https://github.com/llvm/llvm-project/pull/105669) or CHERI. Using an index as the first operand of pointer arithmetic may result in an invalid output. While there are quite a few codegen changes here, these only change the order of registers in add instructions. One MIPS combine had to be updated to handle the new node order. Reviewed By: topperc Pull Request: https://github.com/llvm/llvm-project/pull/125279	2025-01-31 14:05:34 -08:00
Jonas Paulsson	eb1a571114	[SystemZ] Replace SELRMux with COPY in case of identical operands. (#125108 ) If both operands of a SELRMux use the same register which is killed, and the SELRMux is expanded to a jump sequence, a broken MIR results if the kill flag is not removed. This patch replaces the SELRMux with a COPY in these cases.	2025-01-31 13:58:01 +01:00
Matt Arsenault	1cbfac04d0	SystemZ: Handle copies between gr64 and fp64 (#124890 ) I'm guessing based on tablegen definitions. I also don't really understand how this could have been missing. This defends against regressions in a future peephole-opt patch.	2025-01-30 11:08:08 +07:00
Mikhail Gudim	3c3c850a45	[ReachingDefAnalysis] Extend the analysis to stack objects. (#118097 ) We track definitions of stack objects, the implementation is identical to tracking of registers. Also, added printing of all found reaching definitions for testing purposes. --------- Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>	2025-01-29 10:55:16 -05:00
Jeffrey Byrnes	acb7859f07	[MachineSink] Extend loop sinking capability (#117247 ) The current MIR cycle sinking capabilities are rather limited. They only support sinking copies into a single successor block while obeying limits. This opt-in feature adds a more aggressive option, that is not limited to the above concerns. The feature will try to "sink" by duplicating any top-level preheader instruction (that we are sure is safe to sink) into any user block, then does some dead code cleanup. In particular, this is useful for high RP situations when loop bodies have control flow.	2025-01-23 17:08:23 -08:00
Ulrich Weigand	6d5697f7cb	[SystemZ] Fix ICE with i128->i64 uaddo carry chain We can only optimize a uaddo_carry via specialized instruction if the carry was produced by another uaddo(_carry) instruction; there is already a check for that. However, i128 uaddo(_carry) use a completely different mechanism; they indicate carry in a vector register instead of the CC flag. Thus, we must also check that we don't mix those two - that check has been missing. Fixes: https://github.com/llvm/llvm-project/issues/124001	2025-01-23 19:15:11 +01:00
Ulrich Weigand	8424bf207e	[SystemZ] Add support for new cpu architecture - arch15 This patch adds support for the next-generation arch15 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Detection of arch15 as host processor. - Assembler/disassembler support for new instructions. - Exploitation of new instructions for code generation. - New vector (signed\|unsigned\|bool) __int128 data types. - New LLVM intrinsics for certain new instructions. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10305. Note: No currently available Z system supports the arch15 architecture. Once new systems become available, the official system name will be added as supported -march name.	2025-01-20 19:30:21 +01:00
Guillaume DI FATTA	a1ee1a9126	[CodeGen] @llvm.experimental.stackmap make operands immediate (#117932 ) This pull request modifies the behavior of the `@llvm.experimental.stackmap` intrinsic to require that its first two operands (`id` and `numShadowBytes`) be immediate values. This change ensures that variables cannot be passed as the first two arguments to this intrinsic. Related Issue: https://github.com/llvm/llvm-project/issues/115733 ### Testing - Added new test cases to ensure errors are emitted for non-immediate operands. - Ran the full LLVM test suite to verify no regressions were introduced.	2024-12-11 17:41:19 +08:00
anoopkg6	dc04d414df	SystemZ: Add support for __builtin_setjmp and __builtin_longjmp. (#119257 ) This PR includes fixes for the original PR #116642. Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ.	2024-12-10 19:50:51 +01:00
Ulrich Weigand	8787bc72a6	Revert "[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642 )" This reverts commit 030bbc92a705758f1131fb29cab5be6d6a27dd1f.	2024-12-07 00:55:54 +01:00
anoopkg6	030bbc92a7	[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642 ) Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ.	2024-12-06 23:33:33 +01:00
Matt Arsenault	d42ab5d0f0	SystemZ: Regenerate baseline checks for some coalescer tests (#118322 ) These were missing -NEXT checks and also had some dead checks. Also switch a test to actually check the output.	2024-12-06 12:18:51 -05:00
Serge Pavlov	eac8ea323a	[SystemZ] Modify tests for constrained rounding functions (#116952 ) The existing tests for constrained functions often use constant arguments. If constant evaluation is enhanced, such tests will not check code generation of the tested functions. To avoid it, the tests are modified to use loaded value instead of constants. Now only the tests for rounding functions are changed.	2024-11-22 15:15:41 +07:00
Tex Riddell	5c2a133b13	Emit constrained atan2 intrinsic for clang builtin (#113636 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `Builtins.td` - Add f16 support for libm atan2 builtin - `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin - `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of atan2 for clang builtin to lib call calling convention check, now that atan2 maps to an intrinsic. - add atan2 cases to llvm.experimental.constrained tests for more backends: ARM, PowerPC, RISCV, SystemZ. - LangRef.rst: add llvm.experimental.constrained.atan2, revise llvm.atan2 description. Last part of Implement the atan2 HLSL Function. Fixes #70096.	2024-11-12 13:34:29 -08:00
Jonas Paulsson	77ddcf7cbf	[SystemZ] Fix bitwidth problem in FindReplicatedImm(). (#115383 ) A test case emerged with an i32 truncating store of an i64 constant operand, where the i64 constant did not fit in 32 bits, which caused FindReplicatedImm() to crash. Make sure to truncate the APInt in these cases.	2024-11-11 22:16:20 +01:00
Kai Nacke	4a37799a48	[SystemZ][XRay] Implement XRay instrumentation for SystemZ (#113253 ) Expands pseudo instructions PATCHABLE_FUNCTION_ENTER and PATCHABLE_RET into a small instruction sequence which calls into the XRay library.	2024-11-05 15:42:55 -05:00
Kai Nacke	8b659736f7	[SystemZ] Make lit test more specific (#115050 ) The lit test fmuladd-soft-float.ll only specifies s390x as platform, but the test is Linux specific, causing problems when run on z/OS. This change updates the triple to fix this.	2024-11-05 15:29:32 -05:00
Jonas Paulsson	117e952a53	[LiveRangeEdit] Remove any MemoryOperand on MI when converting it to KILL. (#114407 ) When LiveRangeEdit::eliminateDeadDef() converts an MI to a KILL instruction, it should also call dropMemRefs() in order to erase any MachineMemOperand present. This was discovered in testing as the MachineVerifier does not accept an MMO without the corresponding MI mayLoad/mayStore flag, which the KILL opcode lacks.	2024-11-05 18:08:27 +01:00
tltao	0aec4d2b78	[SystemZ] Update large ada tests after HLASM syntax change (#113578 ) Fix buildbot failures seen on: https://lab.llvm.org/buildbot/#/builders/42/builds/1597 caused by: https://github.com/llvm/llvm-project/pull/113369 Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2024-10-24 11:02:36 -04:00
tltao	c170405996	[SystemZ] Introduce GNU and HLASM differences to asmwriter and update tests (#113369 ) Now that the GNU and HLASM `InstPrinter` paths are separated in https://github.com/llvm/llvm-project/pull/112975, differentiate between them in `SystemZInstrFormats.td`. The main differences are: - Tabs converted to spaces - Remove space after comma for instruction operands --------- Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2024-10-23 13:06:48 -04:00
Alex Rønne Petersen	5785cbb405	[llvm] Ensure that soft float targets don't emit `fma()` libcalls. (#106615 ) The previous behavior could be harmful in some edge cases, such as emitting a call to `fma()` in the `fma()` implementation itself. Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`. This was already done for PowerPC; this commit just extends that to Arm, z/Arch, and x86. MIPS and SPARC already got it right, but I added tests for them too, for good measure. Note: I don't have commit access.	2024-10-19 06:13:15 -07:00
Jay Foad	922992a22f	Fix typo "instrinsic" (#112899 )	2024-10-18 15:58:33 +01:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
Jonas Paulsson	76aa370f44	[SystemZ] Remove inlining threshold multiplier. (#106058 ) Due to recently reported problems with having the inlining threshold multiplier set fairly high (x3), this patch removes the multiplier while addressing the regressions seen by doing so in adjustInliningThreshold(). The specific cases that benefit from inlining that were now found to be in need of handling contain a considerable number of memory accesses to the same memory in both caller and callee.	2024-10-07 10:59:45 +02:00
Jonas Paulsson	f9fbfc587d	[SystemZ] Dump function signature on missing arg extension. (#109699 ) Make it easier to handle detected problems by providing the function signature(s) involved in cases of missing argument extensions.	2024-09-30 17:03:18 +02:00
Jonas Paulsson	0ef24aa549	Fix for logic in combineExtract() (#108208 ) A (csmith) test case appeared where combineExtract() crashed when the input vector was a bitcast into a vector of i1:s. Fix this by adding a check with canTreatAsByteVector() before the call.	2024-09-25 12:12:27 +02:00
Jonas Paulsson	14120227a3	Target ABI: improve call parameters extensions handling (#100757 ) For the purpose of verifying proper argument extensions per the target's ABI, introduce the NoExt attribute that may be used by a target when neither sign- nor zero-extension is required (e.g. with a struct in register). The purpose of doing so is to be able to verify that there is always one of these attributes present and by this detecting cases where sign/zero extension is actually missing. As a first step, this patch has the verification step done for the SystemZ backend only, but left off by default until all known issues have been addressed. Other targets/front-ends can now also add NoExt attribute where needed and do this check in the backend.	2024-09-19 16:59:31 +02:00

1008 Commits