The 0 immediate can't be selected to vmsgtu.vi/vmsleu.vi by decrementing
the immediate. To prevent this we had special patterns that provided
alternate lowering for the 0 cases. This relied on tablegen prioritizing
the 0 pattern over the simm5_plus1 range.
This patch introduces simm5_plus1_nonzero that excludes 0. It also
excludes the special case for vmsltu.vi since we can just use
vmsltu.vx and let the 0 be selected to X0.
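Why 0 cannot be handled by decrementing can be seen in a minimal Python
sketch. It assumes 8-bit elements and models simm5_plus1 as the range
-15..16; the helper names are hypothetical and are not the tablegen
definitions.

```python
# Illustrative model only; helper names are hypothetical, not LLVM code.
MASK = 0xFF  # assumption: 8-bit vector elements for the demonstration


def is_simm5(imm):
    return -16 <= imm <= 15


def is_simm5_plus1(imm):
    # Immediates whose decremented value still fits in simm5: -15..16.
    return is_simm5(imm - 1)


def is_simm5_plus1_nonzero(imm):
    return imm != 0 and is_simm5_plus1(imm)


def ltu(x, imm):
    # Unsigned x < imm on 8-bit elements.
    return (x & MASK) < (imm & MASK)


def leu(x, imm):
    # Unsigned x <= imm on 8-bit elements.
    return (x & MASK) <= (imm & MASK)


def decrement_is_safe(imm):
    # True if "x <u imm" equals "x <=u imm-1" for every 8-bit x.
    return all(ltu(x, imm) == leu(x, imm - 1) for x in range(256))
```

For imm == 0 the decremented immediate is -1, which is the largest unsigned
value, so "x <=u -1" is always true while "x <u 0" is always false.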
This is an alternative to some of the changes in D116584.
Reviewed By: Chenbing.Zheng, asb
Differential Revision: https://reviews.llvm.org/D116723
Function calls and compare instructions tend to cause sext.w
instructions to be inserted. If we make good use of W instructions,
these operations can often end up being redundant. We don't always
detect these during SelectionDAG due to things like phis. There are also
some cases caused by failure to turn extload into sextload in
SelectionDAG. extload selects to LW allowing later sext.ws to become
redundant.
This patch adds a pass that examines the input of sext.w instructions trying
to determine if it is already sign extended, either by finding a
W instruction, other instructions that produce a sign extended result,
or looking through instructions that propagate sign bits. It uses
a worklist and visited set to search as far back as necessary.
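The worklist search can be sketched roughly as follows. This is a toy
Python model, not the pass itself: the instruction table, the opcode sets
and the function name are all assumptions made for illustration.

```python
# Toy model: instrs maps a value name to (opcode, [operand value names]).
# The opcode sets below are illustrative assumptions, not the pass's lists.
PRODUCES_SEXT = {"ADDW", "SUBW", "LW", "SEXT_W"}
PROPAGATES_SEXT = {"AND", "OR", "XOR", "PHI", "COPY"}


def is_sign_extended(instrs, start):
    """Return True if every path feeding `start` ends in a sign-extending op."""
    worklist = [start]
    visited = set()
    while worklist:
        name = worklist.pop()
        if name in visited:
            continue  # already examined, e.g. reached again through a loop phi
        visited.add(name)
        opcode, operands = instrs[name]
        if opcode in PRODUCES_SEXT:
            continue  # this path is known to be sign extended
        if opcode in PROPAGATES_SEXT:
            worklist.extend(operands)  # result is only as good as its inputs
            continue
        return False  # unknown producer: assume not sign extended
    return True
```

The visited set is what lets the search terminate on loops, where a phi
feeds an instruction that feeds the same phi again.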
Reviewed By: asb, kito-cheng
Differential Revision: https://reviews.llvm.org/D116397
The zextload hook is only used to determine whether to insert a
zero_extend or any_extend for narrow types leaving a basic block.
Returning true from this hook tends to cause any load whose output
leaves the basic block to become an LWU instead of an LW.
Since we tend to prefer sexts for i32 compares on RV64, this can
cause extra sext.w instructions to be created in other basic blocks.
If we use LW instead of LWU this gives the MIR pass from D116397
a better chance of removing them.
Another option might be to teach getPreferredExtendForValue in
FunctionLoweringInfo.cpp about our preference for sign_extend of
i32 compares. That would cause SIGN_EXTEND to be chosen for any
value used by a compare instead of using the isZExtFree heuristic.
That will require code to convert from the llvm::Type* to EVT/MVT
as well as querying the type legalization actions to get the
promoted type in order to call TargetLowering::isSExtCheaperThanZExt.
That seemed like many extra steps when no other target wants it.
Though it would avoid us needing to lean on the MIR pass in some cases.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D116567
The function that optimally inserts the exec mask
restore operations by combining blocks currently
visits the lowered END_CF pseudos in the forward
direction, because it iterates the setvector in
insertion order.
Due to the absence of BranchFolding at -O0, the
irregularly placed BBs cause the forward traversal
to incorrectly place two unconditional branches in
certain BBs while combining them, especially when
an intervening block later gets optimized away in
subsequent iterations.
This is avoided by iterating the setvector in reverse,
so the blocks at the bottom of a function are
optimized before those at the top.
Fixes: SWDEV-315215
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116273
Lower select(I1,Q,Q) by converting vector predicate Q to vector register V,
doing select(I1,V,V), and then converting the resulting V back to Q. Also,
try to avoid creating such situations in the first place.
Add support for Return Oriented Programming (ROP) protection for 32 bit.
This patch also adds testing for AIX on both 64 and 32 bit.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D111362
These instructions have nothing to do with the new MOP CPY instructions,
and are better named DUP to avoid confusion.
Differential Revision: https://reviews.llvm.org/D116655
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62.
The change contained many unrelated changes and e.g. restored
unit test failures for the old lld port.
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.
That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup.
Change-Id: Ibf8e397df94001f248fba609f072088a46abae08
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D115960
Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
This change extends the addressing mode optimization
pass to HVX vgather. This is specifically intended to
resolve the compiler not generating indexed addresses for
vgather stores to vtcm. Changed the vgather pseudo
instructions to accept an immediate operand and handled
addition of appropriate immediate operand in addressing
mode optimization pass.
The current AsmPrinter has support to emit the "Max Skip" operand
(the 3rd operand of .p2align), but there is no way to actually specify it.
Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a
per-block basis. Leaving the value as default (0) causes no observable differences
in behaviour.
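The effect of a "Max Skip" limit on alignment padding can be sketched as
below. This is a small Python model of the .p2align semantics, assuming
that 0 means "no limit"; the function name is hypothetical.

```python
def align_with_max_skip(offset, log2_align, max_bytes=0):
    """Return the offset after applying .p2align log2_align, , max_bytes."""
    align = 1 << log2_align
    padding = (-offset) % align  # bytes needed to reach the next boundary
    if max_bytes != 0 and padding > max_bytes:
        return offset  # would skip too many bytes: do not align at all
    return offset + padding
```

With the default of 0 the function always aligns, which matches the claim
that leaving the value unset causes no observable difference.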
Differential Revision: https://reviews.llvm.org/D114590
When constructDup is passed an extract_subvector it tries to use
extract_subvector's operand directly when creating the DUPLANE.
This is invalid when extracting from a scalable vector because the
necessary DUPLANE ISel patterns do not exist.
NOTE: This patch is an update to https://reviews.llvm.org/D110524
that originally fixed this but introduced a bug when the result
VT is 64 bits. I've restructured the code so the critical final
else block is entered when necessary.
Differential Revision: https://reviews.llvm.org/D116442
This uses `changeToCall` and `changeToInvokeAndSplitBasicBlock` from
`lib/Transforms/Utils`, replacing the custom logic. One difference of
those functions from our previous logic is they delete the original
`CallInst`/`InvokeInst`, which makes them tricky to use while iterating
through instructions/BBs. So this CL gathers the candidate calls first
and runs them through `changeToInvokeAndSplitBasicBlock` later.
Also this renames some variables.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D116620
D107530 did a small optimization that, if a function contains `setjmp`
calls but not other calls that can `longjmp`, we don't do SjLj
transformation on those `setjmp` calls, because they don't have
possibilities of returning from `longjmp`.
But we should remove those `setjmp` calls even in that case, because
Emscripten doesn't provide that function, assuming it is lowered away by
SjLj transformation. `setjmp` always returns 0 when called directly, so
this CL replaces them with `i32 0`.
Fixes https://github.com/emscripten-core/emscripten/issues/15679.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D116619
Ideally we should make USR as Def for these floating point instructions.
However, it violates some assembler MCChecker rules. This patch fixes
the issue by marking these FP instructions as non-sinkable.
For code below:
{
r7 = addasl(r3,r0,#2)
r8 = addasl(r3,r2,#2)
r5 = memw(r3+r0<<#2)
r6 = memw(r3+r2<<#2)
}
{
p1 = cmp.gtu(r6,r5)
if (p1.new) memw(r8+#0) = r5
if (p1.new) memw(r7+#0) = r6
}
{
r0 = mux(p1,r2,r4)
}
In the packetizer, a new packet is created for the cmp instruction since
there aren't enough resources in the previous packet. Also it is determined
that the cmp stalls by 2 cycles since it depends on the prior load of r5.
In the current packetizer implementation, the predicated store is evaluated
for whether it can go in the same packet as compare, and since the compare
stalls, the stall of the predicated store does not matter and it can go in
the same packet as the cmp. However the predicated store will stall for
more cycles because of its dependence on the addasl instruction and to
avoid that stall we can put it in a new packet.
Improve the packetizer to check if an instruction being added to a packet
will stall longer than the instructions already in the packet and, if so,
create a new packet.
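The added check can be pictured with a very small Python sketch. The
function name and the cycle counts are hypothetical; the real packetizer
derives stalls from the scheduling model.

```python
def should_start_new_packet(packet_stalls, candidate_stall):
    """packet_stalls: stall cycles of instructions already in the packet."""
    if not packet_stalls:
        return False  # empty packet: nothing to compare against
    # Start a new packet only if the candidate would stall longer than
    # anything the packet is already waiting on.
    return candidate_stall > max(packet_stalls)
```

In the example above the cmp stalls by 2 cycles, so a predicated store that
would stall longer (say 3 cycles, an assumed figure) goes into a new packet.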
Unsigned compares work with either zero extended or sign extended
inputs just like equality comparisons. I didn't allow this when
I refactored the code in D116421 due to lack of tests. But I've
since found a simple C test case that demonstrates when this can be
useful.
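The claim can be checked with a short Python sketch that models extending
i32 values to 64 bits. The helper names are hypothetical. Note that both
operands must use the same kind of extension.

```python
def zext32(x):
    # Zero extend a 32-bit value to 64 bits.
    return x & 0xFFFFFFFF


def sext32(x):
    # Sign extend a 32-bit value to 64 bits (as an unsigned 64-bit pattern).
    x &= 0xFFFFFFFF
    return (x | 0xFFFFFFFF00000000) if (x & 0x80000000) else x


def ult_agrees(a, b):
    # Unsigned less-than gives the same answer for both extension kinds.
    return (zext32(a) < zext32(b)) == (sext32(a) < sext32(b))
```

Sign extension moves every value with the top bit set above every value
without it and keeps the order inside each group, so the unsigned order
is preserved.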
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D116617
The HexagonVectorCombine pass was moving an instruction
incorrectly, which caused a use in a GEP that was not yet
defined.
HexagonVectorCombine removes a load from a group due to its
dependences, but in realignGroup, the load is processed anyway.
In realignGroup, when determining the maximum alignment, only
those instructions still in the group should be considered.
Previously we only recognized strided loads/stores when the initial
value for the phi was a strided constant vector.
This patch extends the support to a strided_constant added to a
splatted value. The rewritten loop will add the splat value to the
first element of the strided constant vector to use as the scalar
start value. The stride is unaffected.
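How the scalar start and stride fall out of such a start value can be
sketched as follows. This is a Python illustration with a hypothetical
function name, not the code in the patch.

```python
def split_strided_start(strided_const, splat):
    """Return (scalar_start, stride) for strided_const + splat, or None."""
    if len(strided_const) < 2:
        return None
    stride = strided_const[1] - strided_const[0]
    pairs = zip(strided_const, strided_const[1:])
    if any(b - a != stride for a, b in pairs):
        return None  # not a strided constant vector
    # The splat only shifts the first element; the stride is unaffected.
    return strided_const[0] + splat, stride
```

For example, `[0, 4, 8, 12]` added to a splat of 100 gives a scalar start
of 100 with stride 4.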
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D115958
This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them.
Motivation comes from compiling a simple example with -ftrapv.
Differential Revision: https://reviews.llvm.org/D116499
Skip operation on the lower byte in int16 logical left shift when
shift amount is greater than 8.
Skip operation on the higher byte in int16 logical & arithmetic
right shift when shift amount is greater than 8.
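The byte that can be skipped is visible in a Python model of the 16-bit
value as a (low byte, high byte) pair. The function names are hypothetical
and the model only covers shift amounts between 9 and 15.

```python
def lsl16(lo, hi, n):
    # Left shift: the low byte becomes 0, only the high byte needs work.
    assert 8 < n < 16
    return 0, (lo << (n - 8)) & 0xFF


def lsr16(lo, hi, n):
    # Logical right shift: the high byte becomes 0.
    assert 8 < n < 16
    return hi >> (n - 8), 0


def asr16(lo, hi, n):
    # Arithmetic right shift: the high byte becomes the sign fill.
    assert 8 < n < 16
    negative = bool(hi & 0x80)
    signed_hi = hi - 0x100 if negative else hi
    return (signed_hi >> (n - 8)) & 0xFF, (0xFF if negative else 0x00)
```

Each function returns the new (low, high) pair; one of the two bytes is a
constant or a sign fill and needs no per-bit shifting.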
Reviewed By: aykevl
Differential Revision: https://reviews.llvm.org/D115594
Currently, the code in TargetLoweringObjectFile only assigns
@init_array section type to plain .init_array sections, but not
prioritized sections like .init_array.00001.
This is inconsistent with the interpretation in the AsmParser
(see 791523bae6/llvm/lib/MC/MCParser/ELFAsmParser.cpp (L621-L632))
and upcoming expectations in LLD
(see https://github.com/rust-lang/rust/issues/92181 for context).
This patch assigns @init_array section type to all sections with an
.init_array prefix. The same is done for .fini_array and
.preinit_array as well. With that, the logic matches the AsmParser.
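A rough Python sketch of the prefix-based classification follows. It
assumes a "prefix match" means the exact name or the name followed by a
'.' suffix, and it falls back to "progbits" as a simplification; the
function name is hypothetical.

```python
def elf_section_type(name):
    """Map a section name to an ELF section type string (simplified)."""
    for prefix, sec_type in (
        (".init_array", "init_array"),
        (".fini_array", "fini_array"),
        (".preinit_array", "preinit_array"),
    ):
        # Assumption: match the plain name or a prioritized ".NNNNN" variant.
        if name == prefix or name.startswith(prefix + "."):
            return sec_type
    return "progbits"  # simplification: everything else is treated as data
```

With this rule `.init_array.00001` gets the same type as `.init_array`.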
Differential Revision: https://reviews.llvm.org/D116528
MOV64ri results in a significantly longer encoding, and use of this
operator is fairly avoidable as we can always check the size of the
immediate we're using.
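Checking the immediate size can be sketched in Python as below. The
function is hypothetical and only illustrates the idea of choosing the
shortest move for a 64-bit register; it is not the code in this patch.

```python
def pick_mov_opcode(imm):
    """Pick an X86 move opcode for a signed 64-bit immediate (sketch)."""
    if 0 <= imm <= 0xFFFFFFFF:
        return "MOV32ri"  # 32-bit move, upper half is implicitly zeroed
    if -(1 << 31) <= imm < (1 << 31):
        return "MOV64ri32"  # sign-extended 32-bit immediate
    return "MOV64ri"  # full 64-bit immediate, the longest encoding
```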
This is an updated version of D99045.
Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D116458
This is similar to what is done for targets that prefer zero extend
where we avoid using a zero extend if the promoted values are sign
extended.
We'll also check for zero extended operands for ugt, ult, uge, and ule when the
target prefers sign extend. This is different than preferring zero extend, where
we only check for sign bits on equality comparisons.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D116421
Early revisions of the VR4300 have a hardware bug where two consecutive
multiplications can produce an incorrect result in the second multiply.
This revision adds the `-mfix4300` flag to llvm (and clang) which, when
passed, provides a software fix for this issue.
More precise description of the "mulmul" bug:
```
mul.[s,d] fd,fs,ft
mul.[s,d] fd,fs,ft or [D]MULT[U] rs,rt
```
When the above sequence is executed by the CPU, if at least one of the
source operands of the first mul instruction happens to be `sNaN`, `0`
or `Infinity`, then the second mul instruction may produce an incorrect
result. This can happen both if the two mul instructions are next to each
other and if the first one is in a delay slot and the second is the first
instruction of the branch target.
Description of the fix:
This fix adds a backend pass to llvm which scans for mul instructions in
each basic block and inserts a nop whenever the following conditions are
met:
- The current instruction is a single or double-precision floating-point
mul instruction.
- The next instruction is either a mul instruction (any kind) or a branch
instruction.
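The two conditions above can be sketched in Python over a basic block
given as a list of mnemonics. The opcode sets (in particular the branch
list) and the function name are assumptions for illustration; the real
pass works on MachineInstrs.

```python
FP_MUL = {"mul.s", "mul.d"}
ANY_MUL = FP_MUL | {"mult", "multu", "dmult", "dmultu"}
BRANCH = {"b", "beq", "bne", "j", "jal", "jr"}  # assumed subset


def insert_mulmul_nops(block):
    """Return a copy of block with a nop after each hazardous FP mul."""
    out = []
    for i, op in enumerate(block):
        out.append(op)
        nxt = block[i + 1] if i + 1 < len(block) else None
        # Hazard: FP mul followed by any mul or by a branch.
        if op in FP_MUL and nxt is not None and (nxt in ANY_MUL or nxt in BRANCH):
            out.append("nop")
    return out
```

For example, `["mul.s", "mul.d", "add"]` becomes
`["mul.s", "nop", "mul.d", "add"]`.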
Differential Revision: https://reviews.llvm.org/D116238
For large integers (for example, magic numbers generated by
TargetLowering::BuildSDIV when dividing by constant), we may
need about 4 to 8 instructions to build them.
At the same time, it just takes two instructions to load
constants (with extra cycles to access memory), so it may be
profitable to put these integers into constant pool.
Reviewed By: asb, craig.topper
Differential Revision: https://reviews.llvm.org/D114950
This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND.
It also adds test cases for f32 and f64 constrained intrinsics that
correspond to the intrinsics in float-intrinsics.ll and
double-intrinsics.ll. Support for promoting the integer argument of
STRICT_FPOWI was added.
I've skipped adding tests for f16 intrinsics, since we don't have libcalls
for them and we have inconsistent support for promoting them in LegalizeDAG.
This will need to be examined more closely.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D116323