52796 Commits

Author SHA1 Message Date
Amara Emerson
65f946cba4 [RISCV] Fix some GlobalISel tests using -march instead of -mtriple.
This caused llc to assume the wrong target triple and broke some internal
AddressSanitizer bots.
2023-10-19 16:30:47 -07:00
Min-Yih Hsu
e353cd8173
[RISCV] Apply IsSignExtendingOpW = 1 on fcvtmod.w.d (#69633)
This allows RISCVOptWInstrs to eliminate the redundant sign extension.
2023-10-19 14:55:33 -07:00
Tobias Stadler
b1a6b2cc40
[AArch64][GlobalISel] Fix miscompile on carry-in selection (#68840)
Eliding the vReg to NZCV conversion instruction for G_UADDE/... is illegal if
it causes the carry generating instruction to become dead because ISel
will just remove the dead instruction.
I accidentally introduced this here: https://reviews.llvm.org/D153164.
As far as I can tell, this is not exposed on the default clang settings,
because on O0 there is always a G_AND between boolean defs and uses, so
the optimization doesn't apply. Thus, when I tried to commit
https://reviews.llvm.org/D159140, which removes these G_ANDs on O0, I
broke some UBSan tests.
We fix this by recursively selecting the previous (NZCV-setting) instruction before continuing selection for the current instruction.
2023-10-19 19:50:46 +02:00
Caroline Concatto
200a92520c [Clang][SVE2.1] Add builtins and intrinsics for SVBFMLSLB/T
As described in: https://github.com/ARM-software/acle/pull/257

Patch by: Kerry McLaughlin <kerry.mclaughlin@arm.com>

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D151461
2023-10-19 16:44:39 +00:00
Matt Arsenault
3e49ce6ea1
InlineSpiller: Delete assert that implicit_def has no implicit operands (#69087)
It's not a verifier-enforced property that implicit_def may only have
one operand. Fixes assertions after the coalescer adds implicit-defs to
arbitrary instructions to preserve super-register liveness.

For some reason I'm unable to reproduce this as a MIR test running only
the allocator for the x86 test. Not sure it's worth keeping around.
2023-10-20 00:51:12 +09:00
Jay Foad
21e1b13f33
[TwoAddressInstruction] Handle physical registers with LiveIntervals (#66784)
Teach the LiveIntervals path in isPlainlyKilled to handle physical
registers, to get equivalent functionality with the LiveVariables path.

Test this by adding -early-live-intervals RUN lines to a handful of
tests that would fail without this.
2023-10-19 16:26:30 +01:00
Pierre van Houtryve
40a426fac6
[AMDGPU] Constant fold FMAD_FTZ (#69443)
Solves #68315
2023-10-19 16:05:51 +02:00
Simon Pilgrim
8505c3b15b [DAG] canCreateUndefOrPoison - remove AssertSext/AssertZext assumption that they never create undef/poison
We need to assume that we generate poison if the assertions failed

Fixes #66603
2023-10-19 13:28:53 +01:00
Simon Pilgrim
309e41dd13 [DAG] Add test coverage for Issue #66603 2023-10-19 13:28:52 +01:00
Momchil Velikov
d15fff6c69 Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'
This reverts revert 19505072123e43eccf528b660973067b5c9b4a26.

An issue was fixed in bea3684944c0d7962cd53ab77aad756cfee76b7c
and some newly appeared tests updated.
2023-10-19 13:18:25 +01:00
Ramkumar Ramachandra
98c90a13c6
ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924)
Issue #55208 noted that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT has a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.

This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.

The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
2023-10-19 13:05:04 +01:00
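As a rough illustration of what scalarizing the new vector variants means, the sketch below applies `std::lrint`/`std::llrint` lane by lane. It is a minimal model, not LLVM code: the helper names `lrintLanes` and `llrintLanes` are hypothetical, and the sketch assumes the default round-to-nearest-even rounding mode. A target with custom lowering (here, RISCV's `vfcvt.x.f.v`) would convert all lanes in one instruction instead of looping.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalarized model of a vector lrint: std::lrint on every lane.
// std::lrint rounds with the current rounding mode (round-to-nearest-even
// by default) and returns long.
template <std::size_t N>
std::array<long, N> lrintLanes(const std::array<double, N>& v) {
  std::array<long, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = std::lrint(v[i]);
  return out;
}

// Same model for llrint, which returns long long.
template <std::size_t N>
std::array<long long, N> llrintLanes(const std::array<double, N>& v) {
  std::array<long long, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = std::llrint(v[i]);
  return out;
}
```

Under the default rounding mode, `lrintLanes(std::array<double, 4>{2.5, 3.5, -2.5, 1.2})` yields `{2, 4, -2, 1}`, since ties round to even.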
Pierre-Andre Saulais
0b80288e9e [NVPTX] Preserve v16i8 vector loads when legalizing
This is done by lowering v16i8 loads into LoadV4 operations with i32
results instead of letting ReplaceLoadVector split them into smaller
loads during legalization. The lowering happens at dag-combine1 time, so that
vector operations with i8 elements can be optimised away instead of
being needlessly split during legalization, which involves storing to
the stack and loading the values back.
2023-10-19 12:34:25 +01:00
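To picture the lowering, the sketch below packs 16 byte lanes into four 32-bit words (the shape of a LoadV4 with i32 results) and recovers each i8 lane with a shift and truncation rather than a round-trip through the stack. It assumes little-endian lane order; `packAsV4I32` and `extractLane` are hypothetical helpers, not NVPTX backend functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs 16 byte lanes into four 32-bit words: lane i lands in word i / 4
// at bit offset 8 * (i % 4) (little-endian lane order).
std::array<std::uint32_t, 4> packAsV4I32(const std::array<std::uint8_t, 16>& bytes) {
  std::array<std::uint32_t, 4> words{};
  for (int i = 0; i < 16; ++i)
    words[i / 4] |= std::uint32_t(bytes[i]) << (8 * (i % 4));
  return words;
}

// Recovers byte lane `lane` (0-15) with a shift and a truncation.
std::uint8_t extractLane(const std::array<std::uint32_t, 4>& words, int lane) {
  return std::uint8_t(words[lane / 4] >> (8 * (lane % 4)));
}
```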
Simon Pilgrim
c43ac32bca
[DAG] Expand vXi1 add/sub overflow operations as xor/and (#69191)
Similar to what we already do for add/sub + saturation variants.

Scalar support will be added in a future patch covering the other variants at the same time.

Alive2: https://alive2.llvm.org/ce/z/rBDrNE

Fixes #69080
2023-10-19 11:48:51 +01:00
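The i1 identities can be checked exhaustively. The sketch below compares reference add/sub-with-overflow results against expanded forms: the value is `a xor b` in all four cases, the add overflow bit is `a and b`, and the sub overflow bit is `(not a) and b`. That last detail is our own derivation rather than text from the patch, and all function names are hypothetical. Signed i1 treats a set bit as -1, which is why the signed and unsigned variants expand identically.

```cpp
#include <cassert>

// Result of an add/sub-with-overflow on a single i1 lane.
struct OvfResult {
  bool value;
  bool overflow;
};

// Reference semantics: compute in int, then check the i1 range.
// Unsigned i1 holds {0, 1}.
OvfResult uaddoRef(bool a, bool b) {
  int r = int(a) + int(b);
  return {(r & 1) != 0, r > 1};
}
OvfResult usuboRef(bool a, bool b) {
  int r = int(a) - int(b);
  return {(r & 1) != 0, r < 0};
}
// Signed i1 holds {-1, 0}: a set bit means -1.
OvfResult saddoRef(bool a, bool b) {
  int r = -int(a) - int(b);
  return {(r & 1) != 0, r < -1};
}
OvfResult ssuboRef(bool a, bool b) {
  int r = -int(a) + int(b);
  return {(r & 1) != 0, r > 0};
}

// Expanded forms: xor for the value, and (with an inverted operand for sub)
// for the overflow bit.
OvfResult addoExpanded(bool a, bool b) { return {a != b, a && b}; }
OvfResult suboExpanded(bool a, bool b) { return {a != b, !a && b}; }

bool same(OvfResult x, OvfResult y) {
  return x.value == y.value && x.overflow == y.overflow;
}

// Checks all four operations over every pair of i1 inputs.
bool expansionsHold() {
  for (int ia = 0; ia < 2; ++ia)
    for (int ib = 0; ib < 2; ++ib) {
      const bool a = ia != 0, b = ib != 0;
      if (!same(uaddoRef(a, b), addoExpanded(a, b))) return false;
      if (!same(saddoRef(a, b), addoExpanded(a, b))) return false;
      if (!same(usuboRef(a, b), suboExpanded(a, b))) return false;
      if (!same(ssuboRef(a, b), suboExpanded(a, b))) return false;
    }
  return true;
}
```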
Pierre van Houtryve
d2edff839d
[AMDGPU] PeepholeSDWA: Don't assume inst srcs are registers (#69576)
To fix that ticket we only needed to address the V_LSHLREV_B16 case, but
I did it for all insts just in case.

Fixes #66899
2023-10-19 12:13:45 +02:00
Yeting Kuo
5341d5465d
[RISCV] Combine (and (select cond, x, -1), c) to (select cond, x, (and x, c)) with Zicond. (#69563)
It's only beneficial when cond is a setcc with an integer equality condition
code. In other cases, it has the same instruction count as the original.
2023-10-19 16:11:11 +08:00
Freddy Ye
278e533ee9
[X86] Support -march=pantherlake,clearwaterforest (#69277) 2023-10-19 15:11:15 +08:00
Wang Pengcheng
f4231bf446
[RISCV] Replace PostRAScheduler with PostMachineScheduler (#68696)
This matches what other targets have done.

It also makes DAG mutations such as MacroFusion take effect.
2023-10-19 13:30:41 +08:00
Craig Topper
d51855f700
[RISCV] Fix assertion failure from performBUILD_VECTORCombine when the binop is a shift. (#69349)
The RHS of a shift can have a different type than the LHS. If there are
undefs in the vector, we need the undef added to the RHS to match the
type of any shift amounts that are also added to the vector.

For now, just don't add shifts whose RHS and LHS types don't match.
2023-10-18 21:40:28 -07:00
Michal Paszkowski
817519058a
[SPIR-V] Emit proper pointer type for OpenCL kernel arguments (#67726) 2023-10-18 20:51:53 -07:00
Wang Pengcheng
654a3a3cbc
[OpenCL][RISCV] Support SPIR_KERNEL calling convention (#69282)
X86 supports this calling convention, but I can't find any special
handling for it, so I think we can just handle it via CC_RISCV.

This should fix #69197.
2023-10-19 11:00:39 +08:00
Lu Weining
78abc45c44
[LoongArch] Improve codegen for atomic cmpxchg ops (#69339)
PR #67391 improved atomic codegen by handling memory ordering specified
by the `cmpxchg` instruction. An acquire barrier needs to be generated
when memory ordering includes an acquire operation. This PR improves the
codegen further by only handling the failure ordering.
2023-10-19 09:21:51 +08:00
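For reference, LLVM IR's `cmpxchg` carries both a success and a failure ordering, and C++ exposes the same pair through the two-ordering overload of `std::atomic::compare_exchange_strong`. The sketch below only shows that source-level shape; it assumes clang emits a `cmpxchg` with `acq_rel` success and `acquire` failure orderings for this call, and it says nothing about which barriers the LoongArch backend then selects. `tryExchange` is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>

// Compare-exchange with distinct orderings: acq_rel applies to the
// read-modify-write on success, acquire applies to the load performed when
// the comparison fails. On failure, `observed` receives the current value.
bool tryExchange(std::atomic<int>& slot, int expected, int desired,
                 int& observed) {
  observed = expected;
  return slot.compare_exchange_strong(observed, desired,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire);
}
```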
wanglei
271087e3a0
[LoongArch] Implement COPY instruction between CFRs (#69300)
With this patch, all CFRs can be used for register allocation.
2023-10-19 09:20:27 +08:00
Craig Topper
e103515ced
[RISCV][GISel] Support passing arguments through the stack. (#69289)
This is needed when we run out of registers.
2023-10-18 17:48:58 -07:00
Arthur Eubanks
f3ea73133f
[ELF] Set large section flag for globals with an explicit section (#69396)
An oversight in https://reviews.llvm.org/D148836 since this is a
different code path.
2023-10-18 16:24:23 -07:00
Min-Yih Hsu
5f5faf407b
[RISCV][GISel] Add ISel support for SHXADD from Zba extension (#67863)
This patch consists of porting SDISel patterns of SHXADD instructions to
GISel.
Note that `non_imm12`, a predicate that was implemented with `PatLeaf`,
is now turned into a `PatFrag` of `<op>_with_non_imm12`, where `op` is
the operator that uses the `non_imm12` operand, as GISel doesn't have
an equivalent of `PatLeaf` at this moment.
2023-10-18 15:55:19 -07:00
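A minimal model of what the ported patterns compute: in Zba, `shNadd rd, rs1, rs2` produces `rs2 + (rs1 << N)` for N in {1, 2, 3}, modelled here with 64-bit registers. The `isNonImm12` helper is a hypothetical C++ stand-in for the `non_imm12` predicate, assuming that for a constant operand it means the value does not fit a signed 12-bit immediate; it is not the TableGen `PatFrag` itself.

```cpp
#include <cassert>
#include <cstdint>

// Zba shNadd semantics with 64-bit registers:
// rd = rs2 + (rs1 << n) for n in {1, 2, 3}, wrapping modulo 2^64.
std::uint64_t shNadd(unsigned n, std::uint64_t rs1, std::uint64_t rs2) {
  return rs2 + (rs1 << n);
}

// Signed 12-bit immediate range of I-type instructions such as addi.
bool isSimm12(std::int64_t v) { return v >= -2048 && v <= 2047; }

// Hypothetical stand-in for `non_imm12` on a constant operand: the constant
// cannot be encoded as a 12-bit immediate.
bool isNonImm12(std::int64_t v) { return !isSimm12(v); }
```

For example, indexing an array of 4-byte elements as `base + (i << 2)` maps onto `sh2add` with `rs1 = i` and `rs2 = base`.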
Craig Topper
040df124a2
[RISCV] Don't let performBUILD_VECTORCombine form a division or remainder with undef elements. (#69482)
Division/remainder by undef is immediate UB across the entire vector.
2023-10-18 13:51:22 -07:00
Stanislav Mekhanoshin
98e95a0055
[AMDGPU] Make S_MOV_B64_IMM_PSEUDO foldable (#69483)
With the legality checks in place it is now safe to do. S_MOV_B64 must
not be used with wide literals, so the test is updated.
2023-10-18 13:38:20 -07:00
David Green
8a701024f3 [ARM] Lower i1 concat via MVETRUNC
The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for fewer instructions.
2023-10-18 19:40:11 +01:00
Stanislav Mekhanoshin
84f398af74
[AMDGPU] Add missing test checks. NFC. (#69484) 2023-10-18 11:26:39 -07:00
Ilya Leoshkevich
8e810dc7d9
[SystemZ] Support builtin_{frame,return}_address() with non-zero argument (#69405)
When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC can already do this; add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
2023-10-18 19:05:31 +02:00
Stanislav Mekhanoshin
47ed921985
[AMDGPU] Add legality check when folding short 64-bit literals (#69391)
We can only fold it if it fits into 32 bits. I believe this has not
triggered yet because we do not generally select 64-bit literals.
2023-10-18 09:22:23 -07:00
Sirish Pande
28e4f97320
[AMDGPU] Save/Restore SCC bit across waterfall loop. (#68363)
The waterfall loop overwrites the SCC bit of the status register. Make
sure the SCC bit is saved and restored across it.
We only need to save/restore it when SCC is live across the waterfall
loop.

Co-authored-by: Sirish Pande <sirish.pande@amd.com>
2023-10-18 08:43:29 -05:00
David Green
c060757bcc [ARM] Correct v2i1 concat extract types.
When concatenating two v2i1 into a v4i1, we cannot extract each i64 element as an i32.
This casts to a v4i32 instead and extracts the correct vector lanes.
2023-10-18 13:40:38 +01:00
pvanhout
868abf0961 Revert "[AMDGPU] Remove Code Object V3 (#67118)"
This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.
2023-10-18 12:55:36 +02:00
Jay Foad
104db26004
[AMDGPU] Fix image intrinsic optimizer on loads from different resources (#69355)
The image intrinsic optimizer pass was neglecting to check any arguments
of the load intrinsic after the VAddr arguments. For example, multiple
loads from different resources should not have been combined but were,
because the pass was not checking the resource argument.
2023-10-18 11:08:01 +01:00
Paul Walker
675231eb09
[SVE ACLE] Allow default zero initialisation for svcount_t. (#69321)
This matches the behaviour of the other SVE ACLE types.
2023-10-18 10:40:07 +01:00
Amara Emerson
e93bddb287 [AArch64][GlobalISel] Precommit indexed sextload/zextload tests. 2023-10-18 00:23:20 -07:00
Shao-Ce SUN
f48dab5237
Add RV64 constraint to SRLIW (#69416)
Fixes #69408
2023-10-18 15:01:17 +08:00
Noah Goldstein
112e49b381 [DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants.
If `C0` is a mask and `C1` shifts out all the masked bits (to
essentially compare two subsets of `X`), we can arbitrarily re-order the
shift as `srl` or `shl`.

If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.

Otherwise, based on target preference, we can arbitrarily swap `srl`
and `shl` in/out to get better constants.

On x86 we can use this re-ordering to:
    1) get better `and` constants for `C0` (zero extended moves or
       avoid imm64).
    2) convert `srl` to `shl` if `shl` will be implementable with `lea`
       or `add` (both of which can be preferable).

Proofs: https://alive2.llvm.org/ce/z/qzGM_w

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152116
2023-10-18 01:16:55 -05:00
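The rotate case can be sanity-checked by brute force on 8-bit values. This sketch assumes one concrete shape of the fold: `C0` is the low mask `0xFF >> C1` and the shift is `srl`, so the compare asks whether X agrees with itself shifted by `C1`, which is the same as asking whether rotating X by `C1` leaves it unchanged. The helper names are hypothetical, and the patch's actual matching conditions may be broader or stricter.

```cpp
#include <cassert>
#include <cstdint>

// Rotate an 8-bit value left by r bits (r taken modulo 8).
std::uint8_t rotl8(std::uint8_t x, unsigned r) {
  r &= 7;
  return std::uint8_t((x << r) | (x >> ((8 - r) & 7)));
}

// Original form: compare the low (8 - c1) bits of x with x >> c1.
bool maskShiftForm(std::uint8_t x, unsigned c1) {
  const unsigned c0 = 0xFFu >> c1;  // low mask matching the bits srl keeps
  return (x & c0) == (unsigned(x) >> c1);
}

// Rotate form: x is unchanged by a rotation of c1 bits.
bool rotateForm(std::uint8_t x, unsigned c1) { return rotl8(x, c1) == x; }

// True when both forms give the same answer for every 8-bit x (c1 in 1..7).
bool formsAgree(unsigned c1) {
  for (unsigned v = 0; v < 256; ++v) {
    const auto x = static_cast<std::uint8_t>(v);
    if (maskShiftForm(x, c1) != rotateForm(x, c1)) return false;
  }
  return true;
}
```

`formsAgree` holds for `C1` = 1, 2 and 4 but fails for `C1` = 3 (for example, X = 0x49 satisfies the masked compare but is not rotation-invariant), which is consistent with the power-of-2 requirement on the shift amount.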
Noah Goldstein
0c2d28a448 [X86] Add tests for transform (icmp eq/ne (and X, C0), (shift X, C1)); NFC
Differential Revision: https://reviews.llvm.org/D152115
2023-10-18 01:16:55 -05:00
Pierre van Houtryve
c464fea779
[DAG] Constant fold FMAD (#69324)
This has very little effect on codegen in practice, but I think it is
nice to have.

See #68315
2023-10-18 07:46:24 +02:00
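ISD::FMAD is a multiply-add whose result must match separately rounded multiply and add operations, so constant folding it has to round the product before adding. The sketch below shows why this differs from a fused `std::fma`, using float inputs chosen so that the rounding of the product is visible. `fmadUnfused` is a hypothetical helper, and `volatile` is only there to stop the compiler from contracting the two operations into one.

```cpp
#include <cassert>
#include <cmath>

// Non-fused multiply-add: round a * b to float first, then add c.
// `volatile` prevents the compiler from contracting this into an FMA.
float fmadUnfused(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}
```

With `a = 1 + 2^-13` and `c = -(1 + 2^-12)`, the exact product `a * a` is `1 + 2^-12 + 2^-26`; rounding it to float drops the `2^-26` term, so the unfused result is 0 while `std::fma(a, a, c)` returns `2^-26`.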
Kai Luo
b42738805a [PowerPC] Auto gen test checks for #69299. NFC. 2023-10-18 02:21:22 +00:00
Nitin John Raj
ae3ba725b7
[RISCV][GlobalISel] Select G_FRAME_INDEX (#68254)
This patch is a bandage to get G_FRAME_INDEX working. We could import
the SelectionDAG patterns for the ComplexPattern FrameAddrRegImm, and
perhaps we will do that in the future. For now we just select it as an
addition with 0.
2023-10-17 17:56:42 -07:00
Mircea Trofin
ab91e05e48 [mlgo] Fix tests post 760e7d0 2023-10-17 12:19:54 -07:00
Artem Belevich
b33723710f
[NVPTX] Fixed a few more corner cases for v4i8 lowering. (#69263)
Fixes https://github.com/llvm/llvm-project/issues/69124
2023-10-17 11:06:11 -07:00
Stanislav Mekhanoshin
a22a1fe151
[AMDGPU] support 64-bit immediates in SIInstrInfo::FoldImmediate (#69260)
This is a part of https://github.com/llvm/llvm-project/issues/67781.
Until we select more 64-bit move immediates the impact is minimal.
2023-10-17 10:53:22 -07:00
David Green
4266815f4d
[AArch64] Convert negative constant aarch64_neon_sshl to VASHR (#68918)
In replacing shifts by splat with constant shifts, we can handle
negative shifts by flipping the sign and using a VASHR or VLSHR.
2023-10-17 18:41:23 +01:00
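One lane of the transformation can be modelled in scalar C++: SSHL shifts left for a positive amount and arithmetically right for a negative one, so a constant negative amount `-n` gives the same lane result as an arithmetic right shift by `n`, which is what VASHR expresses. The helpers are hypothetical, model only 32-bit elements with shift amounts in [-31, 31], and ignore SSHL's handling of out-of-range amounts.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit lane of SSHL: a non-negative amount shifts left, a negative
// amount shifts arithmetically right by its magnitude. Amounts in [-31, 31].
std::int32_t sshlLane(std::int32_t x, std::int8_t amount) {
  if (amount >= 0)
    return std::int32_t(std::uint32_t(x) << amount);  // avoid signed-shift UB
  return x >> -amount;  // arithmetic shift right for signed operands
}

// One lane of an arithmetic shift right by an immediate (the VASHR form).
std::int32_t vashrLane(std::int32_t x, unsigned n) { return x >> n; }
```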
David Green
658ed58de6 [AArch64] Add additional tests for fptosi/fptoui. NFC 2023-10-17 18:39:37 +01:00
akirchhoff-modular
4480e650b3
[YAMLParser] Improve plain scalar spec compliance (#68946)
The `YAMLParser.h` header file claims support for YAML 1.2 with a few
deviations, but our plain scalar parsing failed to parse some valid YAML
according to the spec. This change puts us more in compliance with the
YAML spec, now letting us parse plain scalars containing additional
special characters in cases where they are not ambiguous.
2023-10-17 11:28:14 -06:00
Guozhi Wei
760e7d00d1 [X86, Peephole] Enable FoldImmediate for X86
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate.

Also enhance the peephole pass to delete identical instructions after FoldImmediate.

Differential Revision: https://reviews.llvm.org/D151848
2023-10-17 16:22:42 +00:00