41506 Commits

Author SHA1 Message Date
Daniel Kiss
131c06e6da Revert "[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions."
This reverts commit f903c8505515f15e956febbd8cdfa0037fbaf689.
2022-01-06 19:17:45 +01:00
Craig Topper
ec4dd862bf [RISCV] Use simm5_plus1_nonzero in isel patterns for vmsgeu.vi/vmsltu.vi intrinsics.
The 0 immediate can't be selected to vmsgtu.vi/vmsleu.vi by decrementing
the immediate. To prevent his we had special patterns that provided
alternate lowering for the 0 cases. This relied on tablegen prioritizing
the 0 pattern over the sim5_plus1 range.

This patch introduces simm5_plus1_nonzero that excludes 0. It also
excludes the special case for vmsltu.vi since we can just use
vmsltu.vx and let the 0 be selected to X0.

This is an alternative to some of the changes in D116584.

Reviewed By: Chenbing.Zheng, asb

Differential Revision: https://reviews.llvm.org/D116723
2022-01-06 08:27:27 -08:00
Craig Topper
56ca11e31e [RISCV] Add an MIR pass to replace redundant sext.w instructions with copies.
Function calls and compare instructions tend to cause sext.w
instructions to be inserted. If we make good use of W instructions,
these operations can often end up being redundant. We don't always
detect these during SelectionDAG due to things like phis. There also
some cases caused by failure to turn extload into sextload in
SelectionDAG. extload selects to LW allowing later sext.ws to become
redundant.

This patch adds a pass that examines the input of sext.w instructions trying
to determine if it is already sign extended. Either by finding a
W instruction, other instructions that produce a sign extended result,
or looking through instructions that propagate sign bits. It uses
a worklist and visited set to search as far back as necessary.

Reviewed By: asb, kito-cheng

Differential Revision: https://reviews.llvm.org/D116397
2022-01-06 08:23:42 -08:00
Craig Topper
75117fb340 [RISCV] Don't advertise i32->i64 zextload as free for RV64.
The zextload hook is only used to determine whether to insert a
zero_extend or any_extend for narrow types leaving a basic block.
Returning true from this hook tends to cause any load whose output
leaves the basic block to become an LWU instead of an LW.

Since we tend to prefer sexts for i32 compares on RV64, this can
cause extra sext.w instructions to be created in other basic blocks.

If we use LW instead of LWU this gives the MIR pass from D116397
a better chance of removing them.

Another option might be to teach getPreferredExtendForValue in
FunctionLoweringInfo.cpp about our preference for sign_extend of
i32 compares. That would cause SIGN_EXTEND to be chosen for any
value used by a compare instead of using the isZExtFree heuristic.
That will require code to convert from the llvm::Type* to EVT/MVT
as well as querying the type legalization actions to get the
promoted type in order to call TargetLowering::isSExtCheaperThanZExt.
That seemed like many extra steps when no other target wants it.
Though it would avoid us needing to lean on the MIR pass in some cases.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116567
2022-01-06 08:13:42 -08:00
Nikita Popov
f430c1eb64 [Tests] Add elementtype attribute to indirect inline asm operands (NFC)
This updates LLVM tests for D116531 by adding elementtype attributes
to operands that correspond to indirect asm constraints.
2022-01-06 14:23:51 +01:00
David Green
ba927f66c0 [AArch64] Regenerate arith overflow test, and add a few more select tests. NFC 2022-01-06 11:02:14 +00:00
Christudasan Devadasan
50b5b367c1 [AMDGPU] Iterate LoweredEndCf in the reverse order
The function that optimally inserts the exec mask
restore operations by combining the blocks currently
visits the lowered END_CF pseudos in the forward
direction as it iterates the setvector in the order
the entries are inserted in it.

Due to the absence of BranchFolding at -O0, the
irregularly placed BBs cause the forward traversal
to incorrectly place two unconditional branches in
certain BBs while combining them, especially when
an intervening block later gets optimized away in
subsequent iterations.

It is avoided by reverse iterating the setvector.
The blocks at the bottom of a function will get
optimized first before processing those at the top.

Fixes: SWDEV-315215

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116273
2022-01-06 00:27:11 -05:00
Ikhlas Ajbar
2819e5de42 [Hexagon] Handle instruction selection for select(I1,Q,Q)
Lower select(I1,Q,Q) by converting vector predicate Q to vector register V,
doing select(I1,V,V), and then converting the resulting V back to Q. Also,
try to avoid creating such situations in the first place.
2022-01-05 14:50:12 -08:00
Stefan Pintilie
04496201e0 [PowerPC] Add support for ROP protection for 32 bit.
Add support for Return Oriented Programming (ROP) protection for 32 bit.
This patch also adds a testing for AIX on both 64 and 32 bit.

Reviewed By: amyk

Differential Revision: https://reviews.llvm.org/D111362
2022-01-05 15:15:53 -06:00
David Green
ca7ffe09dc [AArch64] Rename CPY to DUP. NFC
These instructions have nothing to do with the new MOP CPY instructions,
and are better named DUP to avoid confusion.

Differential Revision: https://reviews.llvm.org/D116655
2022-01-05 20:02:39 +00:00
Nico Weber
085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of s_cselect_b64.""
This reverts commit 859ebca744e634dcc89a2294ffa41574f947bd62.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas
859ebca744 Revert D109159 "[amdgpu] Enable selection of s_cselect_b64."
This reverts commit 640beb38e7710b939b3cfb3f4c54accc694b1d30.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Christudasan Devadasan
e7b89f3222 [AMDGPU] Regenerate test checks in collapse-endcf.mir. NFC 2022-01-05 12:06:11 -05:00
Shubham Pawar
41085357df [Hexagon] Extend OptAddrMode pass to vgather
This change extends the addressing mode optimization
pass to HVX vgather. This is specifically intended to
resolve compiler not generating indexed addresses for
vgather stores to vtcm. Changed the vgather pseudo
instructions to accept an immediate operand and handled
addition of appropriate immediate operand in addressing
mode optimization pass.
2022-01-05 08:44:21 -08:00
David Green
c30f97872f [AArch64] Regenerate some mir tests to new format. NFC 2022-01-05 15:12:22 +00:00
Nicholas Guy
13992498cd [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops
Differential Revision: https://reviews.llvm.org/D114879
2022-01-05 12:54:31 +00:00
Nicholas Guy
73d92faa2f [CodeGen] Emit alignment "Max Skip" operand
The current AsmPrinter has support to emit the "Max Skip" operand
(the 3rd of .p2align), however has no support for it to actually be specified.
Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a
per-block basis. Leaving the value as default (0) causes no observable differences
in behaviour.

Differential Revision: https://reviews.llvm.org/D114590
2022-01-05 12:54:30 +00:00
Paul Walker
3728a7de34 [SVE] Add ISel for fabs(fsub(a,b)) ==> FABD.
Differential Revision: https://reviews.llvm.org/D116227
2022-01-05 11:59:25 +00:00
Paul Walker
4325fd7402 [AArch64ISelLowering] Don't look through scalable extract_subvector when optimising DUPLANE.
When constructDup is passed an extract_subvector it tries to use
extract_subvector's operand directly when creating the DUPLANE.
This is invalid when extracting from a scalable vector because the
necessary DUPLANE ISel patterns do not exist.

NOTE: This patch is an update to https://reviews.llvm.org/D110524
that originally fixed this but introduced a bug when the result
VT is 64bits. I've restructured the code so the critial final
else block is entered when necessary.

Differential Revision: https://reviews.llvm.org/D116442
2022-01-05 11:56:59 +00:00
Victor Perez
96e220e688 [LegalizeTypes][VP] Add integer promotion support for vp.select
Promote select, vselect and vp.select in a similar way.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116400
2022-01-05 11:01:52 +00:00
Victor Perez
df5226dfb3 [LegalizeTypes][VP] Add widening support for vp.select
Widen vp.select the same way as select and vselect.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116407
2022-01-05 09:21:11 +00:00
Zi Xuan Wu
9566cf16ad [CSKY] Add codegen of select/br/cmp instruction and some frame lowering infra
Add basic integer codegen of select/br/cmp instruction. It also includes frame lowering code
such as prologue/epilogue.
2022-01-05 15:59:03 +08:00
Heejin Ahn
f2a43f06dd [WebAssembly] Use llvm utility functions in EH/SjLj
This uses `changeToCall` and `changeToInvokeAndSplitBasicBlock` from
`lib/Transforms/Utils`, replacing the custom logic. One difference of
those functions from our previous logic is they delete the original
`CallInst`/`InvokeInst`, which makes them tricky to use while iterating
through instructions/BBs. So this CL gathers the candidate calls first
and run them through `changeToInvokeAndSplitBasicBlock` later.

Also this renames some variables.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D116620
2022-01-04 17:47:20 -08:00
Heejin Ahn
f178f61e1d [WebAssembly] Nullify unnecessary setjmp calls
D107530 did a small optimization that, if a function contains `setjmp`
calls but not other calls that can `longjmp`, we don't do SjLj
transformation on those `setjmp` calls, because they don't have
possibilities of returning from `longjmp`.

But we should remove those `setjmp` calls even in that case, because
Emscripten doesn't provide that function, assuming it is lowered away by
SjLj transformation. `setjmp` always returns 0 when called directly, so
this CL replaces them with `i32 0`.

Fixes https://github.com/emscripten-core/emscripten/issues/15679.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D116619
2022-01-04 17:44:32 -08:00
Sumanth Gundapaneni
822448635e [Hexagon] Fix MachineSink not to hoist FP instructions that update USR.
Ideally we should make USR as Def for these floating point instructions.
However, it violates some assembler MCChecker rules. This patch fixes
the issue by marking these FP instructions as non-sinkable.
2022-01-04 15:55:22 -08:00
SANTANU DAS
52f347010a [Hexagon] Make A2_tfrsi not cheap for operands exceeding 16 bits
This patch aids to reduce code size since it removes generation
of back-to-back A2_tfrsi instructions. It is enabled only at -Os/-Oz.
2022-01-04 15:46:26 -08:00
Krzysztof Parzyszek
60944d132f [Hexagon] Convert codegen testcase from .ll to .mir 2022-01-04 15:41:32 -08:00
Harsha Jagasia
2b1c6df5a6 [Hexagon] Performance regression with b2b
For code below:
        {
                r7 = addasl(r3,r0,#2)
                r8 = addasl(r3,r2,#2)
                r5 = memw(r3+r0<<#2)
                r6 = memw(r3+r2<<#2)
        }
        {
                p1 = cmp.gtu(r6,r5)
                if (p1.new) memw(r8+#0) = r5
                if (p1.new) memw(r7+#0) = r6
        }
        {
                r0 = mux(p1,r2,r4)

        }

In packetizer, a new packet is created for the cmp instruction since
there arent enough resources in previous packet. Also it is determined
that the cmp stalls by 2 cycles since it depends on the prior load of r5.
In current packetizer implementation, the predicated store is evaluated
for whether it can go in the same packet as compare, and since the compare
stalls, the stall of the predicated store does not matter and it can go in
the same packet as the cmp. However the predicated store will stall for
more cycles because of its dependence on the addasl instruction and to
avoid that stall we can put it in a new packet.

Improve the packetizer to check if an instruction being added to packet
will stall longer than instruction already in packet and if so create a
new packet.
2022-01-04 14:09:47 -08:00
Craig Topper
a04b532505 [LegalizeIntegerTypes][RISCV] Teach PromoteSetCCOperands to check sign bits of unsigned compares.
Unsigned compares work with either zero extended or sign extended
inputs just like equality comparisons. I didn't allow this when
I refactored the code in D116421 due to lack of tests. But I've
since found a simple C test case that demonstrates when this can be
useful.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D116617
2022-01-04 12:38:47 -08:00
Brendon Cahoon
db5b791595 [Hexagon] Fix an instruction move in HexagonVectorCombine
The HexagonVectorCombine pass was moving an instruction
incorrectly, which caused a use in a GEP that was not yet
defined.

HexagonVectorCombine removes a load from a group due to its
dependences, but in realignGroup, the load is processed anyways.
In realignGroup, when determining the maximum alignment, only
those instructions still in the group should be considered.
2022-01-04 11:41:42 -08:00
Tasmia Rahman
e88eb6443f [Hexagon] Fix buildVector32 for v4i8 constants
The code for constructing a 32-bit constant from 4 8-bit constants has
a typo and uses one of the constants twice
2022-01-04 11:19:15 -08:00
Krzysztof Parzyszek
78f5014fea [Hexagon] Conversions to/from FP types, HVX and scalar
Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>
Co-authored-by: Sumanth Gundapaneni <sgundapa@quicinc.com>
2022-01-04 11:03:51 -08:00
Craig Topper
df2e728b77 [RISCV] Teach RISCVGatherScatterLowering to handle more complex recurrence start values.
Previously we only recognized strided loads/store when the initial
value for the phi was a strided constant vector.

This patch extends the support to a strided_constant added to a
splatted value. The rewritten loop will add the splat value to the
first element of the strided constant vector to use as the scalar
start value. The stride is unaffected.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D115958
2022-01-04 10:13:34 -08:00
Philip Reames
b061d86c69 [SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics
This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them.

Motivation comes from compiling a simple example with -ftrapv.

Differential Revision: https://reviews.llvm.org/D116499
2022-01-04 09:44:23 -08:00
Ben Shi
99e7bf46c9 [AVR] Optimize int16 shift operation for shift amount greater than 8
Skip operation on the lower byte in int16 logical left shift when
shift amount is greater than 8.

Skip operation on the higher byte in int16 logical & arithmetic
right shift when shift amount is greater than 8.

Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115594
2022-01-04 11:48:50 +00:00
Ben Shi
f4ef79306c [AVR] Optimize int8 arithmetic right shift 6 bits
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115593
2022-01-04 10:36:03 +00:00
Nikita Popov
4ef560ec60 [ELF] Handle .init_array prefix consistently
Currently, the code in TargetLoweringObjectFile only assigns
@init_array section type to plain .init_array sections, but not
prioritized sections like .init_array.00001.

This is inconsistent with the interpretation in the AsmParser
(see 791523bae6/llvm/lib/MC/MCParser/ELFAsmParser.cpp (L621-L632))
and upcoming expectations in LLD
(see https://github.com/rust-lang/rust/issues/92181 for context).

This patch assigns @init_array section type to all sections with an
.init_array prefix. The same is done for .fini_array and
.preinit_array as well. With that, the logic matches the AsmParser.

Differential Revision: https://reviews.llvm.org/D116528
2022-01-04 09:42:58 +01:00
Ben Shi
9fb4e79d06 Revert "[AVR] Optimize int8 arithmetic right shift 6 bits"
This reverts commit 5723261370b45fa4d0d295845c6ef9e223f2ff4a.

There are failures as reported in

https://lab.llvm.org/buildbot#builders/16/builds/21638
https://lab.llvm.org/buildbot#builders/104/builds/5394
2022-01-04 04:14:15 +00:00
Ben Shi
5723261370 [AVR] Optimize int8 arithmetic right shift 6 bits
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115593
2022-01-04 03:20:29 +00:00
Erik Desjardins
a390c9905d [X86] Improve selection of the mov instruction in FrameLowering
MOV64ri results in a significantly longer encoding, and use of this
operator is fairly avoidable as we can always check the size of the
immediate we're using.

This is an updated version of D99045.

Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116458
2022-01-03 11:10:16 -08:00
Erik Desjardins
95cf30401c [X86] autogen segmented stacks tests (NFC)
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116420
2022-01-03 11:09:35 -08:00
Victor Perez
5527139302 [RISCV][VP] Add RVV codegen for [nX]vXi1 vp.select
Expand [nX]vXi1 vp.select the same way as [nX]vXi1 vselect.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115546
2022-01-02 23:12:32 -08:00
Craig Topper
d00e438cfe [RISCV][LegalizeIntegerTypes] Teach PromoteSetCCOperands not to sext i32 comparisons for RV64 if the promoted values are already zero extended.
This is similar to what is done for targets that prefer zero extend
where we avoid using a zero extend if the promoted values are sign
extended.

We'll also check for zero extended operands for ugt, ult, uge, and ule when the
target prefers sign extend. This is different than preferring zero extend, where
we only check for sign bits on equality comparisons.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D116421
2021-12-31 17:15:20 -08:00
Random
2edcde00cb [MIPS] Add -mfix4300 flag to enable vr4300 mulmul bugfix pass
Early revisions of the VR4300 have a hardware bug where two consecutive
multiplications can produce an incorrect result in the second multiply.
This revision adds the `-mfix4300` flag to llvm (and clang) which, when
passed, provides a software fix for this issue.

More precise description of the "mulmul" bug:
```
mul.[s,d] fd,fs,ft
mul.[s,d] fd,fs,ft  or  [D]MULT[U] rs,rt
```

When the above sequence is executed by the CPU, if at least one of the
source operands of the first mul instruction happens to be `sNaN`, `0`
or `Infinity`, then the second mul instruction may produce an incorrect
result. This can happen both if the two mul instructions are next to each
other and if the first one is in a delay slot and the second is the first
instruction of the branch target.

Description of the fix:
This fix adds a backend pass to llvm which scans for mul instructions in
each basic block and inserts a nop whenever the following conditions are
met:

 - The current instruction is a single or double-precision floating-point
   mul instruction.
 - The next instruction is either a mul instruction (any kind) or a branch
   instruction.

Differential Revision: https://reviews.llvm.org/D116238
2021-12-31 15:59:44 +03:00
Jay Foad
866b195cb9 [AMDGPU] Regenerate checks for waitcnt-overflow.mir 2021-12-31 11:27:15 +00:00
wangpc
41454ab256 [RISCV] Use constant pool for large integers
For large integers (for example, magic numbers generated by
TargetLowering::BuildSDIV when dividing by constant), we may
need about 4~8 instructions to build them.
In the same time, it just takes two instructions to load
constants (with extra cycles to access memory), so it may be
profitable to put these integers into constant pool.

Reviewed By: asb, craig.topper

Differential Revision: https://reviews.llvm.org/D114950
2021-12-31 14:48:48 +08:00
jacquesguan
05f82dc877 [RISCV] Fix incorrect cases of vmv.s.f in the VSETVLI insert pass.
Fix incorrect cases of vmv.s.f and add test cases for it.

Differential Revision: https://reviews.llvm.org/D116432
2021-12-31 14:17:03 +08:00
Krzysztof Parzyszek
db83e3e507 [Hexagon] Generate HVX/FP arithmetic instructions
Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>
Co-authored-by: Sumanth Gundapaneni <sgundapa@quicinc.com>
Co-authored-by: Joshua Herrera <joshherr@quicinc.com>
2021-12-30 12:47:30 -08:00
Krzysztof Parzyszek
9e6afbedb0 [Hexagon] Generate HVX/FP compare instructions
Co-authored-by: Anirudh Sundar Subramaniam <quic_sanirudh@quicinc.com>
2021-12-30 12:17:22 -08:00
Craig Topper
15787ccd45 [RISCV] Add support for STRICT_LRINT/LLRINT/LROUND/LLROUND. Tests for other strict intrinsics.
This patch adds isel support for STRICT_LRINT/LLRINT/LROUND/LLROUND.

It also adds test cases for f32 and f64 constrained intrinsics that
correspond to the intrinsics in float-intrinsics.ll and
double-intrinsics.ll. Support for promoting the integer argument of
STRICT_FPOWI was added.

I've skipped adding tests for f16 intrinsics, since we don't have libcalls
for them and we have inconsistent support for promoting them in LegalizeDAG.
This will need to be examined more closely.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116323
2021-12-30 11:54:32 -08:00