The code was written with the implicit assumption that each IMPLICIT_DEF either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility.
I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now.
As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.)
Differential Revision: https://reviews.llvm.org/D156477
This fixes a code gen issue where savings the swift async context
register (x22) accidentally overwrites the saved value of another
callee-saved register, corrupts its value and causes a crash.
Differential Revision: https://reviews.llvm.org/D156391
PATCHABLE_TYPED_EVENT_CALL and PATCHABLE_EVENT_CALL are pseudo
instructions that expand to XRay sleds, so getInstSizeInBytes
should reflect the size of the sleds, not the pseudo-instructions.
Differential Revision: https://reviews.llvm.org/D156272
The patterns were ripped out in
a4a3ac10cb1a40ccebed4e81cd7e94f1eb71602d so this always needs to be
custom lowered. I absolutely hate how difficult it is to write tests
for these, I have no doubt there are more of these hidden.
Fixes#64142
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D156113
The indexed fmlal should use a low numbered register for the index operand,
which this fixes by making it V128_lo.
Fixes 64104
Differential Revision: https://reviews.llvm.org/D156296
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
Without any additional tweaking, we can successfully legalize for wider
types (i64, i96 for rv32; i128, i192 for rv64) that are integer
multiples of XLen.
Reviewed By: arsenm, craig.topper
Differential Revision: https://reviews.llvm.org/D155639
This reuses the patterns introduced to help lower vnsr[a,l].vx in D155698.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155936
Similar to D155698 where the shift amount is extended, this patch extends the
ComplexPattern to handle the case where the shift amount has been truncated.
Truncations are custom lowered to truncate_vector_vl, and in cases like i64 ->
i16 they are truncated by one power of two at a time, so we need to unravel
nested layers of them.
The pattern can also be reused for Zvbb's vwsll.vx in an upcoming patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155928
Recommit only the tests that look good this time.
Correct X86 strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
Mostly these tests just needed the strictfp attribute on function
definitions. After D154991 the constrained intrinsics have the
strictfp attribute by default so they don't need it here, but other
functions do.
Test changes verified with D146845.
This removes selectSETCC and adds isel patterns for seteq/setne
conditions.
This removes the duplication of selectSETCC between lowering and
isel. This also gets some cases in xaluo.ll that we missed previously.
Reviewed By: wangpc
Differential Revision: https://reviews.llvm.org/D156250
In [1], a few new insns are proposed to expand BPF ISA to
. fixing the limitation of existing insn (e.g., 16bit jmp offset)
. adding new insns which may improve code quality
(sign_ext_ld, sign_ext_mov, st)
. feature complete (sdiv, smod)
. better user experience (bswap)
This patch implemented insn encoding for
. sign-extended load
. sign-extended mov
. sdiv/smod
. bswap insns
. unconditional jump with 32bit offset
The new bswap insns are generated under cpu=v4 for __builtin_bswap.
For cpu=v3 or earlier, for __builtin_bswap, be or le insns are generated
which is not intuitive for the user.
To support 32-bit branch offset, a 32-bit ja (JMPL) insn is implemented.
For conditional branch which is beyond 16-bit offset, llvm will do
some transformation 'cond_jmp' -> 'cond_jmp + jmpl' to simulate 32bit
conditional jmp. See BPFMIPeephole.cpp for details. The algorithm is
hueristic based. I have tested bpf selftest pyperf600 with unroll account
600 which can indeed generate 32-bit jump insn, e.g.,
13: 06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619>
Eduard is working on to add 'st' insn to cpu=v4.
A list of llc flags:
disable-ldsx, disable-movsx, disable-bswap,
disable-sdiv-smod, disable-gotol
can be used to disable a particular insn for cpu v4.
For example, user can do:
llc -march=bpf -mcpu=v4 -disable-movsx t.ll
to enable cpu v4 without movsx insns.
References:
[1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
Differential Revision: https://reviews.llvm.org/D144829
Correct AArch64 strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
Mostly these tests just needed the strictfp attribute on function
definitions. I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.
I have removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.
Test changes verified with D146845.
Correct PowerPC strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
Mostly these tests just needed the strictfp attribute on function
definitions. I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.
I have removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.
Test changes verified with D146845.
Correct X86 strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
Mostly these tests just needed the strictfp attribute on function
definitions. After D154991 the constrained intrinsics have the
strictfp attribute by default so they don't need it here, but other
functions do.
Test changes verified with D146845.
The previous condition was incorrect in some cases, like storing <2 x i32>
into a double. If IndexVal was >0, we ended up never storing anything.
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D156308
This patch expands testing coverage for ReplaceWithVeclib pass
when SLEEF vector library is used. It adds testing for all LLVM
intrinsics which correspond to math functions from libm.
llrint, llround and lrint are not included as currently
IR verifier pass does not allow to use vector types with them.
Differential Revision: https://reviews.llvm.org/D155623
Summary: D80642 added support for emitting AvailableExternally Linkage on AIX. However, an assertion of "Trying to get csect representation of this symbol but none was set." occurred when a function is declared as available_externally. This is due to we missing to generate a csect for the function. This patch fixes it.
Reviewed By: hubert.reinterpretcast, shchenz
Differential Revision: https://reviews.llvm.org/D156213
Signed-off-by: Esme Yi <esme.yi@ibm.com>
As described in [1][2], `-mtune=` is used to select the type of target
microarchitecture, defaults to the value of `-march`. The set of
possible values should be a superset of `-march` values. Currently
possible values of `-march=` and `-mtune=` are `native`, `loongarch64`
and `la464`.
D136146 has supported `-march={loongarch64,la464}` and this patch adds
support for `-march=native` and `-mtune=`.
A new ProcessorModel called `loongarch64` is defined in LoongArch.td
to support `-mtune=loongarch64`.
`llvm::sys::getHostCPUName()` returns `generic` on unknown or future
LoongArch CPUs, e.g. the not yet added `la664`, leading to
`llvm::LoongArch::isValidArchName()` failing to parse the arch name.
In this case, use `loongarch64` as the default arch name for 64-bit
CPUs.
And these two preprocessor macros are defined:
- __loongarch_arch
- __loongarch_tune
[1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc
[2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc
Reviewed By: xen0n, wangleiat
Differential Revision: https://reviews.llvm.org/D155824
When inline assembly code requests more registers than available, the
MachineInstr::emitError function in the RegAllocFast pass emits an error
but doesn't stop the pass, and then the compiler crashes later with an
assertion failure. This commit, mimicking the RegAllocGreedy pass, assigns
a random physical register, and therefore avoids the crash after producing
the diagnostic. This problem has been observed for both rustc and clang,
while it doesn't occur in gcc.
In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel meta data to identity kernels which have compile time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156040
Author: Corbin Robeck <corbin.robeck@amd.com>
Correct AMDGPU strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics
Mostly these tests just needed the strictfp attribute on function
definitions. I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.
I also removed attributes added to declare lines of intrinsics. The
attributes of intrinsics cannot be changed in a test so I eliminated
attempts to do so.
Test changes verified with D146845.
If we are selecting between two setccs that need to be legalized
with xor, the select will be legalized first. Detect this pattern
so we can pull the xor through to expose it to additional
optimizations.
We could generalize this to other operations, but those normally
get handled in DAG combine before select legalization.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156159
If we're selecting the result of two setccs that have been legalized
by introducing an xor with 1, we can pull the xor with 1 through the
select to enable more optimizations.
We could generalize this to other binary operators with identical
conditions, but those are usually caught before we legalize the select.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D156144
This reapplies the change for and, but also marks or as undesirable
at the same time. Only handling one of them can cause infinite
combine loops due to the asymmetric handling.
-----
In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
As described in [1][2], `-mtune=` is used to select the type of target
microarchitecture, defaults to the value of `-march`. The set of
possible values should be a superset of `-march` values. Currently
possible values of `-march=` and `-mtune=` are `native`, `loongarch64`
and `la464`.
D136146 has supported `-march={loongarch64,la464}` and this patch adds
support for `-march=native` and `-mtune=`.
A new ProcessorModel called `loongarch64` is defined in LoongArch.td
to support `-mtune=loongarch64`.
`llvm::sys::getHostCPUName()` returns `generic` on unknown or future
LoongArch CPUs, e.g. the not yet added `la664`, leading to
`llvm::LoongArch::isValidArchName()` failing to parse the arch name.
In this case, use `loongarch64` as the default arch name for 64-bit
CPUs.
And these two preprocessor macros are defined:
- __loongarch_arch
- __loongarch_tune
[1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc
[2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc
Differential Revision: https://reviews.llvm.org/D155824
rocm-device-libs and llpc were avoiding using f64 sqrt
intrinsics in favor of their own expansions. Port the
expansion into the backend. Both of these users should be
updated to call the intrinsic instead.
The library and llpc expansions are slightly different.
llpc uses an ldexp to do the scale; the library uses a multiply.
Use ldexp to do the scale instead of the multiply.
I believe v_ldexp_f64 and v_mul_f64 are always the same number of
cycles, but it's cheaper to materialize the 32-bit integer constant
than the 64-bit double constant.
The libraries have another fast version of sqrt which will
be handled separately.
I am tempted to do this in an IR expansion instead. In the IR
we could take advantage of computeKnownFPClass to avoid
the 0-or-inf argument check.