This avoids the call overhead as well as the save/restore of
fflags and the snan handling in the libm function.
The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.
The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.
I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.
We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.
Note the comparison constant is slightly different from the one glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either is valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<52 and 1<<53 since
they will all fit in i64. We only have a problem if the double can't fit
in i64.
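The guard and the float->int->float round trip described above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not the emitted instruction sequence: the names `inline_trunc` and `TWO_52` are made up for illustration, and Python does not model the fflags side effects (invalid/inexact) mentioned earlier.

```python
import math

# Assumption for illustration: the threshold is 1<<52, as in the message above.
TWO_52 = float(1 << 52)


def inline_trunc(x: float) -> float:
    """Round x toward zero using a guarded float->int->float conversion."""
    # NaN, infinities and anything with magnitude >= 2**52 fail this
    # comparison and are returned unchanged; they have no fractional bits
    # (or are not numbers at all).
    if not (abs(x) < TWO_52):
        return x
    as_int = int(x)         # float -> int, truncates toward zero
    result = float(as_int)  # int -> float, exact because |as_int| < 2**52
    # Models the final fsgnj: restores the sign so that inputs in (-1, 0)
    # and -0.0 produce -0.0 rather than +0.0.
    return math.copysign(result, x)
```

For example, `inline_trunc(-0.3)` yields `-0.0`, which is why the sign-injection step is only removable when signed zeros do not matter.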
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136508
Instead of using vslide1up, use vslide1down and build the other
direction. This avoids the early-clobber overlap constraint of
vslide1up.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136735
This reverts commit 65aaecca8842dec30d03734a7fe8ce33c5afec81.
There was an ordering problem in the calculation of the partial
remainder.
Original commit message:
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
Differential Revision: https://reviews.llvm.org/D135541
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
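The shift, remainder, and recombine steps can be checked with a small Python model. This is a sketch of the arithmetic only: `urem_even_divisor` is a hypothetical name, and the plain `%` on the odd divisor stands in for the original odd-divisor expansion.

```python
def urem_even_divisor(n: int, d: int) -> int:
    """Unsigned n % d computed through a remainder by the odd part of d."""
    if n < 0 or d <= 0:
        raise ValueError("expects an unsigned dividend and a positive divisor")
    k = (d & -d).bit_length() - 1   # number of trailing zeros in the divisor
    odd_d = d >> k                  # divisor with trailing zeros shifted out
    low_bits = n & ((1 << k) - 1)   # bits shifted out of the dividend
    partial = (n >> k) % odd_d      # stand-in for the odd-divisor algorithm
    # Shift the partial remainder back and add the bits that were shifted out.
    return (partial << k) + low_bits
```

With `n = 100` and `d = 12`, two trailing zeros are shifted out, `25 % 3` gives 1, and shifting back gives 4, which matches `100 % 12`.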
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D135541
This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d.
The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but
passes locally. I'm really confused.
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
Fixes PR57549.
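The identity is easy to check by modelling fixed-width arithmetic in Python. The bit width of 32 and the helper names are assumptions for illustration, and the and/negate form is just one way to write the conditional negate, not necessarily the exact nodes the combine produces.

```python
BW = 32                 # assumed bit width for this sketch
MASK = (1 << BW) - 1


def sra_sign(x: int) -> int:
    """(sra X, BW-1) on a BW-bit pattern: 0 if the sign bit is clear, else all ones."""
    return MASK if (x >> (BW - 1)) & 1 else 0


def mul_by_sign(x: int, y: int) -> int:
    """The original pattern: (mul (sra X, BW-1), Y), wrapped to BW bits."""
    return (sra_sign(x) * y) & MASK


def conditional_negate(x: int, y: int) -> int:
    """Y negated when X is negative, otherwise 0 (since 0 * Y == 0)."""
    return (-(sra_sign(x) & y)) & MASK
```

Both functions take and return unsigned BW-bit patterns, so `0xFFFFFFFB` represents -5.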
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399
The CanFoldNonConst path doesn't work correctly with opaque constants
because getNode won't constant fold if one of the constants is opaque, even
if the operation is AND/OR. This can lead to infinite loops.
This patch does the folding manually in the DAGCombine. Alternatively,
we could improve getNode but that seemed likely to have bigger impact
and possibly increase compile time for the additional checks. We wouldn't
want to directly constant fold because we need to preserve the opaque flag.
Fixes PR58511.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D136472
Enable Machine Combiner for O1/O2/O3 optimization levels. It makes RISCV
consistent with other targets running Machine Combiner.
Originally it was enabled only for -O3, however I looked through time reports
and usually it takes 0.1%-0.4% of total time, and never takes more than 1.0%.
Differential Revision: https://reviews.llvm.org/D136339
This is an alternative to fix PR57939 for RISC-V. It definitely
can be argued that the stack temporaries for RISC-V are being created
with an unnecessarily large alignment. But ignoring the alignment
in MachineFrameInfo also seems bad.
Looking at the test updates that went with the current ID==0 check,
it was intended to exclude things like the NoAlloc stackid. So I'm
not sure if scalable vectors are intentionally being excluded.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135913
For RISC-V, load/store instructions (excluding vector load/store) only
have a 12-bit immediate operand. If the offset is out of range, a
temporary register must be used to materialize it. If several such
offsets are close to each other (their relative offset satisfies
isInt<12>), the LocalStackSlotAllocation pass can materialize one value
in a frame base register and replace each original offset with that
register plus the relative offset.
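The idea can be illustrated with a simplified Python model. It is not the pass itself: `rebase_offsets` is a hypothetical helper, and picking the smallest offset as the base is an assumption made to keep the sketch short.

```python
def is_int12(value: int) -> bool:
    """True if value fits in a signed 12-bit immediate."""
    return -2048 <= value <= 2047


def rebase_offsets(offsets):
    """Return (base, relative_offsets) if one base register covers all accesses.

    Returns None when every offset already fits in the immediate field, or
    when the offsets are too far apart for a single base register.
    """
    if all(is_int12(o) for o in offsets):
        return None  # nothing to do, the immediates are already encodable
    base = min(offsets)  # assumption: use the lowest offset as the base
    relative = [o - base for o in offsets]
    if all(is_int12(r) for r in relative):
        return base, relative
    return None
```

For frame offsets 4096, 4100 and 4104, this yields a base of 4096 with relative offsets 0, 4 and 8, so the large constant is materialized once instead of three times.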
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98101
sifive-7-series has macrofusion support to convert a branch over
a single instruction into a conditional instruction. This can be
an improvement if the branch is hard to predict.
This patch adds support for the most basic case, a branch over a
move instruction. This is implemented as a pseudo instruction so
we can hide the control flow until all code motion passes complete.
I've disabled a recent select optimization if this feature is enabled
in the subtarget.
Related gcc patch for the same optimization: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg211045.html
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135814
The code incorrectly checked for CTLZ_ZERO_UNDEF instead of
CTTZ_ZERO_UNDEF.
While I was there I flipped the condition into an early out.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D136010
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.
In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change covers the documentation, SelectionDAG, and IRTranslator.
We also added a test for AMDGPU and updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly, and X86.
This patch fixes issue #57452.
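The difference between the two extensions for an `i1 true` index can be shown with a small Python model. The helpers `sext` and `zext` are illustrative stand-ins, not LLVM APIs.

```python
def sext(value: int, from_bits: int) -> int:
    """Interpret the low from_bits of value as a signed integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def zext(value: int, from_bits: int) -> int:
    """Interpret the low from_bits of value as an unsigned integer."""
    return value & ((1 << from_bits) - 1)


i1_true = 1
index_with_sext = sext(i1_true, 1)  # -1, the problematic index
index_with_zext = zext(i1_true, 1)  # 1, the intended index
```

With sign extension the single set bit is also the sign bit, which is how `i1 true` turns into `i32 -1`.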
Differential Revision: https://reviews.llvm.org/D132978
Same with (select C, X, -1), (select C, 0, X), and (select C, X, 0).
There's a DAGCombine after we turn the select into select_cc, but
that may introduce a setcc that didn't previously exist. We could
add more DAGCombines to remove the extra setcc, but this seemed lower
effort.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135833
Continuing the theme of adding branchless lowerings for simple selects, this time handle the 0 arm case. This is very common for various umin idioms, etc.
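For the zero-arm case, one branchless form is an AND with a mask derived from the condition. The message does not spell out the lowering, so the masks below are an assumption for illustration, as are the function names and the 64-bit width.

```python
XLEN = 64                 # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def select_x_or_zero(c: int, x: int) -> int:
    """(select c, x, 0) for c in {0, 1}: -c is all ones when c is 1."""
    return (-c & MASK) & x


def select_zero_or_x(c: int, x: int) -> int:
    """(select c, 0, x) for c in {0, 1}: c - 1 is all ones when c is 0."""
    return ((c - 1) & MASK) & x
```

Both helpers assume the condition has already been normalized to 0 or 1.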
Differential Revision: https://reviews.llvm.org/D135600
The patch selects VSELECT/VP_MERGE_VL nodes that use fmadd/fnmsub as the true operand
and the addend of the fmadd/fnmsub as the false operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135330
If the source is implicit_def, the register allocator won't have
any constraint on what register it picks for the destination. This
doesn't give the user much control of what register is being used.
So in my mind that means the only reason to honor the policy operand
is to control what policy is used in vsetvli to maybe avoid a vtype
change. Given the other optimizations we do on the policy field, I
don't think allowing the user this control is reliable.
Therefore, I think we should use agnostic policies if the source is
undef.
This should give better performance on some CPUs for VP intrinsics where
there is no merge operand and the backend adds IMPLICIT_DEF to the instruction.
Differential Revision: https://reviews.llvm.org/D135396
This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad.
Getting lit test failures on AMDGPU, but I can't reproduce them so far.
Reverting to investigate.
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
Fixes PR57549.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D135541
The scalar counterpart of this is `llvm.trunc`. However, the name
ISD::VP_TRUNC is already taken by the `trunc` of the LLVM IR. Naming this
`vp.ftrunc` would likely cause confusion with `vp.fptrunc`, so this adds
`vp.roundtozero`, which looks similar to `vp.roundeven`.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D135233
I tend to think we should ignore the policy bit in vsetvli insertion
if the tied operand is IMPLICIT_DEF. But that raises questions about
what the policy operand on RVV intrinsics means if you also pass
vundefined().
This change at least fixes some cases. I'll post a separate patch
for vsetvli insertion for discussion.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135386
We can lower these as an or with the negative of the condition value. This appears to result in significantly less branch-y code on multiple common idioms (as seen in tests).
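The or-with-negated-condition form can be checked with a short Python model. The 64-bit width and the function names are assumptions for illustration; the mirrored form with -1 in the false arm is included as a plausible counterpart, not something this message states.

```python
XLEN = 64                 # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def select_allones_or_x(c: int, x: int) -> int:
    """(select c, -1, x) for c in {0, 1}: OR x with the negated condition."""
    return (-c & MASK) | x


def select_x_or_allones(c: int, x: int) -> int:
    """(select c, x, -1) for c in {0, 1}: assumed mirror form using c - 1."""
    return ((c - 1) & MASK) | x
```

When `c` is 1 the negation is all ones, so the OR saturates to -1; when `c` is 0 the OR leaves `x` unchanged, and no branch is needed.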
Differential Revision: https://reviews.llvm.org/D135316
Some tests had scalable vector intrinsic names with fixed vector types.
Some had types in the wrong order.
Remove scalable vector test from fixed vector files.
Also replace insert+shuffle constexprs with fixed constant vectors.
Optimization for using compressed beqz and bnez.
If there is a pattern
```
br_cc val1 constval eq/neq place
select_cc val1 constval eq/neq trueval falseval
```
and constval does not fit in the compressed immediate format (6 bits) but does fit in
the immediate format (12 bits), we can replace it with a non-compressed subtraction
(an addi of the negated constant) and a compressed c.beqz/c.bnez:
```
addi val val -constval
c.beqz val place
```
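The rewrite relies on `val == constval` being equivalent to `val - constval == 0`. A rough Python model of the decision follows; `rewrite_eq_compare` and `branch_taken_after_rewrite` are hypothetical names, and the extra check that the negated constant fits in 12 bits is an assumption of this sketch.

```python
XLEN = 64                 # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def fits_simm(value: int, bits: int) -> bool:
    """True if value fits in a signed immediate of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def rewrite_eq_compare(constval: int):
    """Return the addi immediate for the rewrite, or None if it does not apply."""
    if fits_simm(constval, 6):
        return None  # the condition above only covers constants too wide for 6 bits
    if not fits_simm(constval, 12) or not fits_simm(-constval, 12):
        return None  # addi could not encode the adjustment
    return -constval


def branch_taken_after_rewrite(val: int, imm: int) -> bool:
    """Model of 'addi val, val, imm' followed by c.beqz on the result."""
    return ((val + imm) & MASK) == 0
```

For `constval = 100` the immediate is -100, and the branch is taken exactly when the register originally held 100.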
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132839