llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	4baf29e81e	[DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds. Fixes #106202	2024-08-27 18:12:24 +01:00
Abhina Sree	a0be7053d7	[SystemZ][z/OS] Continuation of __ptr32 support (#103393 ) This is a continuation of the __ptr32 support added here `135fecd444`	2024-08-14 13:26:30 -04:00
tltao	bc747c3e13	[SystemZ][z/OS] Fix incorrect codegen for ADA_ENTRY pseudo instruction (#101415 ) The current MCInstBuilder for generating an ALGFI when loading something from the ADA is incorrect and will crash the compiler. r0 must also be excluded from the registers returned as the result, since it is treated as the value "0" on z/OS. Also add some tests to properly test the paths where LLILF and ALGFI are generated. --------- Co-authored-by: Tony Tao <tonytao@ca.ibm.com>	2024-08-01 13:23:49 -04:00
Jonas Paulsson	22bc9db92b	[SystemZ] Use the EVT version of getVectorVT() in combineTruncateExtract(). (#100150 ) A test case showed up where the new vector type is v24i16, which is not a simple MVT. In order to get an extended value type for cases like this, EVT::getVectorVT() needs to be called instead of MVT::getVectorVT(), otherwise the following call to getVectorElementType() in combineExtract() will fail.	2024-07-26 14:33:40 +02:00
Björn Pettersson	2b78303e3f	[DAGCombiner] Freeze maybe poison operands when folding select to logic (#84924 ) Just like for regular IR we need to treat SELECT as conditionally blocking poison in SelectionDAG. So (unless the condition itself is poison) the result is only poison if the selected true/false value is poison. Thus, when doing DAG combines that turn SELECT into arithmetic/logical operations (e.g. AND/OR) we need to make sure that the new operations aren't more poisonous. One way to do that is to use FREEZE to make sure the operands aren't poison. This patch aims at fixing the kind of miscompiles reported in https://github.com/llvm/llvm-project/issues/84653 and https://github.com/llvm/llvm-project/issues/85190 The solution is to make sure that we insert FREEZE, if needed to make the fold sound, when using the foldBoolSelectToLogic and foldVSelectToSignBitSplatMask DAG combines.	2024-07-22 17:19:46 +02:00
Volodymyr Vasylkun	e094abde42	[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774 ) The previous expansion of [US]CMP was done using two selects and two compares. It produced decent code, but on many platforms it is better to implement [US]CMP nodes by performing the following operation: ``` [us]cmp(x, y) = (x [us]> y) - (x [us]< y) ``` This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.	2024-07-16 20:56:18 +01:00
Ulrich Weigand	e8e406041e	Fix sext_in_reg from i1 to i128 The combineSIGN_EXTEND_INREG routine was using DAG.getConstant(-1, DL, VT), which does not result in the expected value when VT has more than 64 bits. Fix this by using DAG.getAllOnesConstant(DL, VT) instead. Also add test cases for v1i128 comparisons (which triggers the bug).	2024-07-15 11:26:37 +02:00
Manish Kausik H	69192e0193	[LegalizeDAG] Optimize CodeGen for `ISD::CTLZ_ZERO_UNDEF` (#83039 ) Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case. The details of the optimization are outlined in #82075 Fixes #82075 Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>	2024-07-08 14:01:32 +01:00
Zibi Sarbinowski	1de1818fab	[SystemZ] Address issue with super large stack frames (#96318 ) This PR fixes the following failure by adjusting the calculation of maximum displacement from Stack Pointer. `LLVM ERROR: Error while trying to spill R5D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! `	2024-06-27 09:23:35 -04:00
Farzon Lotfi	189d471191	[clang] Reland Add tanf16 builtin and support for tan constrained intrinsic (#94559 ) Relanding this PR now that https://github.com/llvm/llvm-project/pull/90503 has merged. with `FTAN` landing in [TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63 ) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm backends. In LLVM, the `llvm.experimental.constrained.cos` and `llvm.experimental.constrained.sin` intrinsics are used for performing cosine and sine calculations with additional constraints on floating-point operations. This behavior is expected for all floating-point math intrinsics. This change adds these constraints for the `tan` intrinsic. - `Builtins.td` - replace TanF128 with F16F128MathTemplate - `CGBuiltin.cpp` - map existing tan builtins to `tan` and `constrained_tan` intrinsic - `ConstrainedOps.def` map tan and constrained_tan to an ISDOpcode. resolves #91421 --------- Co-authored-by: Farzon Lotfi <farzon@farzon.com>	2024-06-10 20:46:26 -04:00
paperchalice	1bc8b3258e	[NewPM][CodeGen] Port `regallocfast` to new pass manager (#94426 ) This pull request ports `regallocfast` to the new pass manager. It exposes the parameter `filter` to handle different register classes for AMDGPU. IIUC AMDGPU needs to allocate different register classes separately, so it needs to implement its own `--<reg-class>-regalloc`. Now users can use e.g. `-passes=regallocfast<filter=sgpr>` to allocate a specific register class. The command line option `--regalloc-npm` is still a work in progress; the plan is to reuse the syntax of passes, e.g. use `--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to replace `--sgpr-regalloc` and `--vgpr-regalloc`.	2024-06-07 12:22:42 +08:00
paperchalice	9b0e1c2ca2	[NewPM][CodeGen] Port `finalize-isel` to new pass manager (#94214 ) It should preserve more analysis results, but it happens immediately after instruction selection.	2024-06-04 09:23:52 +08:00
Matt Arsenault	ddb87e0f96	SystemZ: Use REG_SEQUENCE for PAIR128 (#90640 ) PAIR128 should probably just be removed entirely Depends #90638	2024-05-17 13:16:34 +02:00
Jonas Paulsson	d6ee7e8481	[SystemZ] Handle address clobbering in splitMove(). (#92105 ) When expanding an L128 (which is used to reload i128) it is possible that the quadword destination register clobbers an address register. This patch adds an assertion against the case where both of the expanded parts clobber the address, and in the case where one of the expanded parts does so, puts that part last. Fixes #91437	2024-05-15 08:36:26 +02:00
Ulrich Weigand	de117dd533	[SystemZ] Add some more atomic load/store tests Verify atomic load/store of f128 on z14 where the type lives in VRs.	2024-05-07 16:57:17 +02:00
Ulrich Weigand	0a0cac6dbd	[SystemZ] Simplify f128 atomic load/store (#90977 ) Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128 to allow for simplified use in atomic load/store. Update logic to split 128-bit loads and stores in DAGCombine to also handle the f128 case where appropriate. This fixes the regressions introduced by recent atomic load/store patches.	2024-05-06 12:17:19 +02:00
Matt Arsenault	eb75af223f	Reapply "SystemZ: Fold copy of vector immediate to gr128" (#91099 ) This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b. Modify the instruction in place to transform it into a REG_SEQUENCE, which is what other implementations of foldImmediate do. Also start erasing the def instruction if there are no other uses. Fixes #91110.	2024-05-06 10:00:20 +02:00
Matt Arsenault	4b61d04645	SystemZ: Remove unnecessary REQUIRES asserts from tests	2024-05-06 09:52:35 +02:00
Matt Arsenault	181e82143e	SystemZ: Remove redundant REQUIRES systemz from test	2024-05-06 09:52:35 +02:00
Vitaly Buka	a415b4dfcc	Revert "SystemZ: Fold copy of vector immediate to gr128" (#91099 ) Fails here: https://lab.llvm.org/buildbot/#/builders/239/builds/6893 https://lab.llvm.org/buildbot/#/builders/5/builds/43113 https://lab.llvm.org/buildbot/#/builders/168/builds/20228 Reverts llvm/llvm-project#90706	2024-05-04 23:59:49 -07:00
Simon Pilgrim	caacf8685a	[DAG] Fold freeze(shuffle(x,y,m)) -> shuffle(freeze(x),freeze(y),m) (#90952 ) If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node. This requires special case handling to create a new ShuffleVectorSDNode. Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.	2024-05-04 12:03:10 +01:00
Matt Arsenault	49c5f4d56a	SystemZ: Fold copy of vector immediate to gr128 (#90706 ) If materializing a constant in a vector register that is just going to be copied to general registers, directly materialize the immediate in the gpr. This will avoid a few lit test regressions in a future commit.	2024-05-03 18:40:11 +02:00
Matt Arsenault	edbe6ebb4d	SystemZ: Don't promote atomic store in IR (#90899 ) This is the mirror to the recent atomic load change. The same bitcast-back-to-integer case is a small code quality regression for the same reason. This would disappear with a bitcastable legal 128-bit type.	2024-05-03 10:04:12 +02:00
Matt Arsenault	6535e7a400	SystemZ: Remove redundant copy tests from 75f4baa70	2024-05-03 10:03:05 +02:00
Matt Arsenault	38f9c013a0	SystemZ: Stop casting fp typed atomic loads in the IR (#90768 ) shouldCastAtomicLoadInIR is a hack that should be removed. Simple bitcasting of operations should be in the domain of ordinary type legalization and does not need to be done in the IR. This introduces a code quality regression due to the hack currently used to avoid using 128-bit values in the case where the floating point value is ultimately used as an integer. This would be avoidable if there were always a legal 128-bit type (like v2i64). This is a pretty niche situation so I assume it's not important. I implemented about 85% of the work necessary to make v2i64 legal, but it was taking too long and I lack the necessary familiarity with systemz to complete it. I've pushed it here for someone to pick up: https://github.com/arsenm/llvm-project/pull/new/systemz-legal-v2i64 Depends #90861	2024-05-02 21:31:29 +02:00
Matt Arsenault	d11afe1c74	SystemZ: Handle gr128 to fp128 copies in copyPhysReg (#90861 )	2024-05-02 17:46:43 +02:00
Matt Arsenault	3a1e55904b	SystemZ: Add some tests for fp128 atomics with soft-float (#90826 )	2024-05-02 15:22:34 +02:00
Matt Arsenault	15027be6a5	SystemZ: Fix test failing the verifier	2024-05-02 08:37:27 +02:00
Matt Arsenault	376bc73b34	SystemZ: Fix accidentally commented out run line in test	2024-05-02 08:37:27 +02:00
Matt Arsenault	75f4baa705	SystemZ: Implement copyPhysReg between vr128 and gr128 (#90616 ) I have no idea if this is correct and I probably swapped the element ordering somewhere.	2024-04-30 23:02:54 +02:00
Jonas Paulsson	6c32a1fdf7	[SystemZ] Enable MachineCombiner for FP reassociation (#83546 ) Enable MachineCombining for FP add, sub and mul. In order for this to work, the default instruction selection of reg/mem opcodes is disabled for ISD nodes that carry the flags that allow reassociation. The reg/mem folding is instead done after MachineCombiner by PeepholeOptimizer. SystemZInstrInfo optimizeLoadInstr() and foldMemoryOperandImpl() ("LoadMI version") have been implemented for this purpose also by this patch.	2024-04-30 17:09:54 +02:00
Matt Arsenault	738c135ee0	SystemZ: Add more tests for fp128 atomics (#90269 ) These did not have proper floating point uses so weren't representative samples. The bitcast inserted by lowering could be absorbed by the load/store on the source/use.	2024-04-27 20:26:09 +02:00
Kai Nacke	d5022d9ad4	[SystemZ][z/OS] Make z/OS personality function known (#89679 ) This change adds the z/OS personality function to the list of known EH personality functions. It enables removing of the EH data/labels if the personality function is not invoked.	2024-04-23 10:39:03 -04:00
Kai Nacke	cce4dc7b7a	[SystemZ][z/OS] Implement llvm.returnaddress for XPLINK (#89440 ) The implementation follows the ELF implementation.	2024-04-22 11:01:22 -04:00
Kai Nacke	7e2c2981fb	[SystemZ][z/OS] Implement llvm.frameaddr for XPLINK (#89284 ) The implementation follows the ELF implementation.	2024-04-19 08:09:49 -04:00
Jonas Paulsson	7e4c6e98fa	[SystemZ] Bugfix in getDemandedSrcElements(). (#88623 ) For the intrinsic s390_vperm, all of the elements are demanded, so use an APInt with the value of '-1' for them (not '1'). Fixes https://github.com/llvm/llvm-project/issues/88397	2024-04-15 16:32:14 +02:00
Dominik Steenken	b794dc2325	[SystemZ] Add custom handling of legal vectors with reduce-add. (#88495 ) This commit skips the expansion of the `vector.reduce.add` intrinsic on vector-enabled SystemZ targets in order to introduce custom handling of `vector.reduce.add` for legal vector types using the VSUM instructions. This is limited to full vectors with scalar types up to `i32` due to performance concerns. It also adds testing for the generation of such custom handling, and adapts the related cost computation, as well as the testing for that. The expected result is a performance boost in certain benchmarks that make heavy use of `vector.reduce.add` with other benchmarks remaining constant. For instance, the assembly for `vector.reduce.add<4 x i32>` changes from ```hlasm vmrlg %v0, %v24, %v24 vaf %v0, %v24, %v0 vrepf %v1, %v0, 1 vaf %v0, %v0, %v1 vlgvf %r2, %v0, 0 ``` to ```hlasm vgbm %v0, 0 vsumqf %v0, %v24, %v0 vlgvf %r2, %v0, 3 ```	2024-04-12 18:05:30 +02:00
Jonas Paulsson	16b7cc69ef	[SystemZ] Eliminate call sequence instructions early. (#77812 ) On SystemZ, the outgoing argument area which is big enough for all calls in the function is created once during the prolog, as opposed to adjusting the stack around each call. The call-sequence instructions are therefore not really useful any more than to compute the maximum call frame size, which has so far been done by PEI, but can just as well be done at an earlier point. This patch removes the mapping of the CallFrameSetupOpcode and CallFrameDestroyOpcode and instead computes the MaxCallFrameSize directly after instruction selection and then removes the ADJCALLSTACK pseudos. This removes the confusing pseudos and also avoids the problem of having to keep the call frame size accurate when creating new MBBs. This fixes #76618 which exposed the need to maintain the call frame size when splitting blocks (which was not done).	2024-03-28 18:26:38 +01:00
Ulrich Weigand	4b907414d2	[SystemZ] Add support for llvm.readcyclecounter The llvm.readcyclecounter intrinsic can be implemented via the STORE CLOCK FAST (STCKF) instruction.	2024-03-22 20:01:02 +01:00
Jonas Paulsson	7564566779	Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698 )" - The check is now actually done in both PEI and the MachineVerifier. - More .mir tests trivially updated with "adjustsStack: true" as needed.	2024-03-21 20:24:57 -04:00
Jonas Paulsson	b4b5e8277a	Check for all frame instructions in finalize isel. (#85945 ) Check for all frame instructions in finalize isel, not just for the frame setup opcode. This was proven necessary, see #78001 for discussion.	2024-03-21 11:00:08 -04:00
Jonas Paulsson	9ebd329ad8	Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 )" This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92. Reverting due to verifier complaints with expensive checks on build-bot.	2024-03-20 11:48:30 -04:00
Neumann Hon	5fb2797f23	[GOFF][z/OS] Change PrivateGlobalPrefix and PrivateLabelPrefix to be L# (#85730 ) The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@ and @ respectively) are, in hindsight, poor choices for multiple reasons: First, there exist externally visible routines from the language environment that begin with @@. These functions are certainly not local/private by any means and they should not share a prefix with private globals. Secondly, both private globals and private labels should be handled the same way by GOFF, so it doesn't make much sense for them to have separate prefixes. GOFF remains the only file format where these are different and there is no reason for that to be the case	2024-03-20 10:30:30 -04:00
Jonas Paulsson	05bde30585	Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698 ) Have the verifier report a missing AdjustsStack flag rather than waiting until PEI asserts.	2024-03-20 10:29:12 -04:00
Ulrich Weigand	335f365982	Reapply: [SystemZ] Fix overflow flag for i128 USUBO We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement USUBO/USUBO_CARRY for the i128 data type. However, these instructions use an inverted sense of the borrow indication flag (a value of 1 indicates no borrow, while a value of 0 indicates borrow). This does not match the semantics of the boolean "overflow" flag of the USUBO/USUBO_CARRY ISD nodes. Fix this by generating code to explicitly invert the flag. These cancel out if the result of USUBO feeds into an USUBO_CARRY. To avoid unnecessary zero-extend operations, also improve the DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc))) sequences where appropriate. Fixes: https://github.com/llvm/llvm-project/issues/83268	2024-03-19 14:07:08 +01:00
Ulrich Weigand	d1c3795968	Revert "Fix overflow flag for i128 USUBO" This reverts commit d9c31ee9568277e4303715736b40925e41503596.	2024-03-19 11:43:05 +01:00
Ulrich Weigand	d9c31ee956	Fix overflow flag for i128 USUBO We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement USUBO/USUBO_CARRY for the i128 data type. However, these instructions use an inverted sense of the borrow indication flag (a value of 1 indicates no borrow, while a value of 0 indicates borrow). This does not match the semantics of the boolean "overflow" flag of the USUBO/USUBO_CARRY ISD nodes. Fix this by generating code to explicitly invert the flag. These cancel out if the result of USUBO feeds into an USUBO_CARRY. To avoid unnecessary zero-extend operations, also improve the DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc))) sequences where appropriate. Fixes: https://github.com/llvm/llvm-project/issues/83268	2024-03-19 11:20:52 +01:00
Jonas Paulsson	8b8e1adbde	[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879 ) - Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE nodes to regular LOAD/STORE nodes, make them legal and select those nodes properly instead. This avoids exposing them to the DAGCombiner. - AtomicExpand pass no longer casts float/double atomic load/stores to integer (FP128 is still cast).	2024-03-18 17:21:50 -04:00
Jonas Paulsson	09bc6abba6	[MachineFrameInfo] Refactoring around computeMaxCallFrameSize() (NFC) (#78001 ) - Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code. - Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().	2024-03-18 10:37:59 -04:00
Kevin P. Neal	3e9e5e2771	[FPEnv][SystemZ] Correct strictfp test. Correct llvm-reduce strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test needed the strictfp attribute added to function definitions. Test changes verified with D146845.	2024-02-23 13:00:38 -05:00
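
The [US]CMP expansion described in commit e094abde42 can be illustrated at the source level. The following is a minimal sketch in plain C++, not the SelectionDAG code from the patch: the helper names `cmp_select` and `cmp_arith` are hypothetical, and the templates only model the two-select form and the `(x > y) - (x < y)` form so the two can be compared. Signed instantiations stand in for scmp and unsigned ones for ucmp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the select-based expansion:
// two compares feeding two selects.
template <typename T>
constexpr int cmp_select(T x, T y) {
  return x < y ? -1 : (x > y ? 1 : 0);
}

// Hypothetical model of the arithmetic expansion from the commit message:
// [us]cmp(x, y) = (x [us]> y) - (x [us]< y), using each compare as 0 or 1.
template <typename T>
constexpr int cmp_arith(T x, T y) {
  return static_cast<int>(x > y) - static_cast<int>(x < y);
}

// Signed operands (scmp): -1 is less than 0.
static_assert(cmp_arith<std::int32_t>(-1, 0) == -1, "scmp less");
static_assert(cmp_arith<std::int32_t>(0, -1) == 1, "scmp greater");

// Unsigned operands (ucmp): the same bit pattern 0xFFFFFFFF is the maximum.
static_assert(cmp_arith<std::uint32_t>(0xFFFFFFFFu, 0u) == 1, "ucmp greater");
static_assert(cmp_arith<std::uint32_t>(0u, 0xFFFFFFFFu) == -1, "ucmp less");

// Both forms agree on equal operands as well.
static_assert(cmp_arith<std::int32_t>(7, 7) == cmp_select<std::int32_t>(7, 7),
              "equal operands give 0 in both forms");
```

The arithmetic form has no data-dependent selection, which is one plausible reason it can be cheaper on some targets; per the commit message, whether it is actually preferred is decided per target through a TargetLowering hook.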


958 Commits