This avoids computing the dominator tree by removing the
simplifyInstruction use.
This was applying simplification with some kind of questionable
load-store forwarding and looking for the global. This had to have
been an ancient hack copied from previous backends. In the OpenCL
case, this is always emitted as required the direct global reference
anyway.
This implements the DAG patterns to enable instruction selection for the
LDAP1 and STL1 instructions from FEAT_LRCPC3. The instructions should
match the following combinations:
* Aqcuiring atomic load + vector insert element for LDAP1.
* Vector extract element + releasing atomic store for STL1.
Patterns have also been added to cope with the DAG structure found when
dealing with 1-lane sub-vectors.
Reviewed By: tmatheson, efriedma
Differential Revision: https://reviews.llvm.org/D153129
This patch readjusts the frame stack for the push and pop instructions
co-author: @Lukacma
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D134599
Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.
https://reviews.llvm.org/D143182
Fixes 3c848194f28decca41b7362f9dd35d4939797724. The TTI hook name got
renamed at some point in the process and the target implementation was
left behind.
Fixes: SWDEV-407329
Summary:
Adding a new option -traceback-table to print out the traceback info of xcoff ojbect file.
Reviewers: James Henderson, Fangrui Song, Stephen Peckham, Xing Xue
Differential Revision: https://reviews.llvm.org/D89049
The sign bit has no impact on the exponent, so strip these away. Saves
on the source modifier encoding cost. I left the GlobalISel handling
until there's a resolution to issue #62628.
We should do this in instcombine too, but legalization should be
introducing more frexps than it currently is where this would occur.
Patterns already exist for 64-bit that I've simply copied and
converted to include the necessary truncation.
Differential Revision: https://reviews.llvm.org/D154350
Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation
that undoes LICM hoistMinMax pass.
The undo transformation converts the following patterns:
x < min(a, b) -> x < a && x < b
x > min(a, b) -> x > a || x > b
x < max(a, b) -> x < a || x < b
x > max(a, b) -> x > a && x > b
Where 'a' or 'b' is a constant.
Also supports `sext min(...) ...` and `zext min(...) ...`.
Differential Revision: https://reviews.llvm.org/D147990
This patch is a follow up to D149722, D152669 and D153645, where a slightly more
optimized code sequence is generated for 64-bit and 32-bit local-exec accesses
when optimizations are turned on.
Handling is added PPCISelDAGToDAG.cpp in order to check if any D-form loads or
stores that follow an PPCISD::ADD_TLS can be optimized to use an X-Form load or
store. In this particular situation, this allows the ADD_TLS node to be removed
completely.
Differential Revision: https://reviews.llvm.org/D150367
1. Improved code that deduces register class from instruction definitions. Previously if some instruction didn't contain a reg class for an operand it was considered as no information on register class even if other instructions specified the class.
2. Added check on required size of resulting register because in some cases classes with smaller registers had been selected (for example VReg_1).
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D152832
The library expansion has too many paths for all the permutations of
DAZ, unsafe and the 3 exp functions. It's easier to expand it in the
backend when we know all of these things. The library currently misses
the no-infinity check on the overflow, which this handles optimizing
out.
Some of the <3 x half> fast tests regress due to vector widening
dropping flags which will be fixed separately.
Apparently there is no exp10 intrinsic, but there should be. Adds some
deadish code in preparation for adding one while I'm following along
with the current library expansion.
A year ago when I was not invested at all into compilers, I found an assertion error when building an AArch64 debug build with LTO + CFI, among other combinations.
It was posted as a github issue here: https://github.com/llvm/llvm-project/issues/54088
I took it upon myself to revisit the issue now that I have spent some more time working on LLVM.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D151276
These inherited the fast math checks from f32, but the manual suggests
these should be accurate enough for unconditional use. The definition
of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've
been a bit nervous about changing this as the OpenCL conformance test
does not cover half. Brute force produces identical values compared to
a reference host implementation for all values.
The SDAG will sometimes insert an extend between
the shift and an and (immediate) even though the
immediate is narrower than the narrow size.
This does not allow us to produce a rotate
instruction (such as rlwinm).
This patch just adds a combine to move the extend
onto the and.
Differential revision: https://reviews.llvm.org/D152911
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.
This change targets all the pseudos used in loads (unit, strided, segmented, fault first, and their combinations). As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.
One quirk is that I went ahead and treated the unmasked mask load instruction (vlm) the same way. We need the pass thru operand to model tail undefined, but since the instruction is unconditionally agnostic and the instruction has no mask, the policy operand is arguably unneeded. I kept it mostly for consistency sake.
Another quirk worth highlighting is that segment loads require a bit of dedicated handling. Surprisingly, we don't have IMPLICIT_DEF nodes of the right types, and attempting to use them results in some odd looking codegen and a few crashes. Instead, I left the REG_SEQUENCE form, and extended InsertVSETVLI to recognize the complex undefs. Arguably, we should probably revisit the handling of undef reg_sequence nodes here, but I'm hoping to side step that in this patch.
As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. I did have to delete one register allocation regression test as I couldn't figure out how to meaningfully update it. I spent a significant amount of time trying, and finally gave up.
Differential Revision: https://reviews.llvm.org/D154141
Previously we expanded these in a fast-math way and the device
libraries were relying on this behavior. The libraries have a pending
change to switch to the new target intrinsic.
Unlike the library version, this takes advantage of no-infinities on
the result overflow check.
Following from D153864, this patch implements the lowerDeinterleaveIntrinsic
hook to lower deinterleaves of loads into vlseg2 intrinsics.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153876
This patch teaches the RISCV TargetLowering class to lower interleave
intrinsics to vsseg2, so it can lower interleaved stores for scalable vectors.
Previously, we could only lower stores of interleaves for fixed length vectors
with vector shuffles.
This uses the lowerInterleaveIntrinsic interface for the interleaved
access pass that was added in D146218, and subsumes the DAG combine
approach taken in D144175
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D153864
This patch adds support for the ADA (associated data area), doing the following:
-Creates the ADA table to handle displacements
-Emits the ADA section in the SystemZAsmPrinter
-Lowers the ADA_ENTRY node into the appropriate load instruction
Differential Revision: https://reviews.llvm.org/D153788
This commit allows generating of complex number intrinsics for expressions
with constants or loops invariants, which are represented as splats.
For instance, after vectorizing loops in the following code snippets,
the ComplexDeinterleaving pass will be able to generate complex number
intrinsics:
```
complex<> x = ...;
for (int i = 0; i < N; ++i)
c[i] = a[i] * b[i] * x;
```
or
```
for (int i = 0; i < N; ++i)
c[i] = a[i] * b[i] * (11.0 + 3.0i);
```
Differential Revision: https://reviews.llvm.org/D153355