This pull request modifies the behavior of the
`@llvm.experimental.stackmap` intrinsic to require that its two first
operands (`id` and `numShadowBytes`) be **immediate values**. This
change ensures that variables cannot be passed as two first arguments to
this intrinsic.
Related Issue: https://github.com/llvm/llvm-project/issues/115733
### Testing
- Added new test cases to ensure errors are emitted for non-immediate
operands.
- Ran the full LLVM test suite to verify no regressions were introduced.
The existing tests for constrained functions often use constant
arguments. If constant evaluation is enhanced, such tests will not check
code generation of the tested functions. To avoid it, the tests are
modified to use loaded value instead of constants. Now only the tests
for rounding functions are changed.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constraint atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.
Last part of Implement the atan2 HLSL Function. Fixes#70096.
A test case emerged with an i32 truncating store of an i64 constant
operand, where the i64 constant did not fit in 32 bits, which caused
FindReplicatedImm() to crash.
Make sure to truncate the APInt in these cases.
The lit test fmuladd-soft-float.ll only specifies s390x as platform,
but the test is Linux specific, causing problems when run on z/OS.
This change updates the triple to fix this.
When LiveRangeEdit::eliminateDeadDef() converts an MI to a KILL instruction,
it should also call dropMemRefs() in order to erase any MachineMemOperand
present.
This was discovered in testing as the MachineVerifier does not accept an MMO
without the corresponding MI mayLoad/mayStore flag, which the KILL opcode
lacks.
Now that the GNU and HLASM `InstPrinter` paths are separated in
https://github.com/llvm/llvm-project/pull/112975, differentiate between
them in `SystemZInstrFormats.td`.
The main difference are:
- Tabs converted to space
- Remove space after comma for instruction operands
---------
Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.
Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.
Note: I don't have commit access.
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
Due to recently reported problems with having the inlining threshold multiplier
set fairly high (x3), this patch removes the multiplier while addressing
the regressions seen by doing so in adjustInliningThreshold().
The specific cases that benefit from inlining that were now found to be in need
of handling contain a considerable number of memory accesses to the same
memory in both caller and callee.
A (csmith) test case appeared where combineExtract() crashed when the
input vector was a bitcast into a vector of i1:s. Fix this by adding a check
with canTreatAsByteVector() before the call.
For the purpose of verifying proper arguments extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
or zeroextension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that there is always one of these attributes
present and by this detecting cases where sign/zero extension is actually
missing.
As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.
Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.
Fixes#106202
The current MCInstBuilder for generating an ALGFI when loading something
from the ADA is incorrect and will crash the compiler.
r0 must also be excluded from the registers returned as the result,
since it is treated as the value "0" on z/OS.
Also add some tests to properly test the paths where LLILF and ALGFI are
generated.
---------
Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
A test case showed up where the new vector type is v24i16, which is not a simple
MVT. In order to get an extended value type for cases like this, EVT::getVectorVT()
needs to be called instead of MVT::getVectorVT(), otherwise the following call
to getVectorElementType() in combineExtract() will fail.
Just like for regular IR we need to treat SELECT as conditionally
blocking poison in SelectionDAG. So (unless the condition itself is
poison) the result is only poison if the selected true/false value is
poison.
Thus, when doing DAG combines that turn SELECT into arithmetic/logical
operations (e.g. AND/OR) we need to make sure that the new operations
aren't more poisonous. One way to do that is to use FREEZE to make
sure the operands aren't posion.
This patch aims at fixing the kind of miscompiles reported in
https://github.com/llvm/llvm-project/issues/84653
and
https://github.com/llvm/llvm-project/issues/85190
Solution is to make sure that we insert FREEZE, if needed to make
the fold sound, when using the foldBoolSelectToLogic and
foldVSelectToSignBitSplatMask DAG combines.
The previous expansion of [US]CMP was done using two selects and two
compares. It produced decent code, but on many platforms it is better to
implement [US]CMP nodes by performing the following operation:
```
[us]cmp(x, y) = (x [us]> y) - (x [us]< y)
```
This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.
The combineSIGN_EXTEND_INREG routine was using
DAG.getConstant(-1, DL, VT), which does not result in
the expected value when VT has more than 64 bits.
Fix this by using DAG.getAllOnesConstant(DL, VT) instead.
Also add test cases for v1i128 comparisons (which triggers
the bug).
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.
The details of the optimization are outlined in #82075Fixes#82075
Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
This PR fixes the following failure by adjusting the calculation of
maximum displacement from Stack Pointer.
`LLVM ERROR: Error while trying to spill R5D from class ADDR64Bit:
Cannot scavenge register without an emergency spill slot!
`
Relanding this PR now that
https://github.com/llvm/llvm-project/pull/90503 has merged. with `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63
) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm
backends.
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` map tan and constrained_tan to an ISDOpcode.
resolves #91421
---------
Co-authored-by: Farzon Lotfi <farzon@farzon.com>
This pull request port `regallocfast` to new pass manager. It exposes
the parameter `filter` to handle different register classes for AMDGPU.
IIUC AMDGPU need to allocate different register classes separately so it
need implement its own `--<reg-class>-regalloc`. Now users can use e.g.
`-passe=regallocfast<filter=sgpr>` to allocate specific register class.
The command line option `--regalloc-npm` is still in work progress, plan
to reuse the syntax of passes, e.g. use
`--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to
replace `--sgpr-regalloc` and `--vgpr-regalloc`.
When expanding an L128 (which is used to reload i128) it is
possible that the quadword destination register clobbers an
address register. This patch adds an assertion against the case
where both of the expanded parts clobber the address, and in the
case where one of the expanded parts do so puts it last.
Fixes#91437
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128
to allow for simplified use in atomic load/store.
Update logic to split 128-bit loads and stores in DAGCombine to also
handle the f128 case where appropriate. This fixes the regressions
introduced by recent atomic load/store patches.
This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b.
Modify the instruction in place to transform it into a REG_SEQUENCE,
which is what other implementations of foldImmediate do. Also start
erasing the def instruction if there are no other uses.
Fixes#91110.
If the shuffle mask contains no undef elements, then we can move the freeze through a shuffle node.
This requires special case handling to create a new ShuffleVectorSDNode.
Includes VECTOR_SHUFFLE support for isGuaranteedNotToBeUndefOrPoison / canCreateUndefOrPoison.
If materializing a constant in a vector register that is just
going to be copied to general registers, directly materialize
the immediate in the gpr. This will avoid a few lit test regressions
in a future commit.
This is the mirror to the recent atomic load change. The same
bitcast-back-to-integer case is a small code quality regression for the
same reason. This would disappear with a bitcastable legal 128-bit type.
shouldCastAtomicLoadInIR is a hack that should be removed. Simple
bitcasting of operations should be in the domain of ordinary type
legalization and does not need to be done in the IR.
This introduces a code quality regression due to the hack currently used
to avoid using 128-bit values in the case where the floating point value
is ultimately used as an integer. This would be avoidable if there were
always a legal 128-bit type (like v2i64). This is a pretty niche
situation so I assume it's not important.
I implemented about 85% of the work necessary to make v2i64 legal, but
it was taking too long and I lack the necessary familiarity with systemz
to complete it. I've pushed it here for someone to pick up:
https://github.com/arsenm/llvm-project/pull/new/systemz-legal-v2i64
Depends #90861