A llvm.vector.reduce.fadd(float, <1 x float>) will be translated to
G_VECREDUCE_SEQ_FADD with two scalar operands, which is illegal
according to the verifier. This makes sure we generate a fadd/fmul
instead.
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:
https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901
If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:
https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds
The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Tail calls are handled in a future patch.
Since the `dontcall-*` attributes are checked both by
`FastISel`/`GlobalISel` and `SelectionDAGBuilder`, and both `FastISel`
and `GlobalISel` bail for calls on Arm64EC for AFTER doing the check, we
ended up emitting duplicate copies of this error.
This change moves the checking for `dontcall-*` in `FastISel` and
`GlobalISel` to after it has been successfully lowered.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Unspeakable horrors happen around calls from whole wave functions, the
plan is to improve the handling of caller/callee-saved registers in
a future patch.
Tail calls are also handled in a future patch.
The IRTranslator sets the flags now more consistently with
`SelectionDAGBuilder::visitGetElementPtr()`. This affects `nuw` and `nusw`, as
well as the recently introduced `inbounds` MIFlag (see PR #150900).
This PR also adds more tests to `AArch64/GlobalISel/irtranslator-gep-flags.ll`
to cover all points in `IRTranslator::translateGetElementPtr` that set flags.
For SWDEV-516125.
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
These functions are for building G_PTR_ADDs when we know that the base
pointer and the result are both valid pointers into (or just after) the
same object. They are similar to SelectionDAG::getObjectPtrOffset.
This PR also changes call sites of the generic (build|materialize)PtrAdd
functions that implement pointer arithmetic to split large memory
accesses to the new functions. Since memory accesses have to fit into an
object in memory, pointer arithmetic to an offset into a large memory
access also yields an address in that object.
Currently, these (build|materialize)ObjectPtrOffset functions only add
"nuw" to the generated G_PTR_ADD, but I intend to introduce an
"inbounds" MIFlag in a later PR (analogous to a concurrent effort in
SDAG: #131862, related: #140017, #141725) that will also be set in the
(build|materialize)ObjectPtrOffset functions.
Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's
call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where
offsets are now folded into scratch instructions, and cases where the
behavior of the check regeneration script changed, resulting, e.g., in
better checks for "nusw G_PTR_ADD" instructions, matched empty lines,
and the use of "CHECK-NEXT" in MIPS tests.
For SWDEV-516125.
After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed
that the argument is an alloca, so we don't need to look at underlying
objects (which was not a correct thing to do anyway).
This also drops the offset argument for lifetime nodes in SDAG. The
offset is fixed to zero now. (Peculiarly, while SDAG pretended to have
an offset, it just gets silently dropped during selection.)
- This implementation is adapted from SDAG
X86TargetLowering::LowerGET_ROUNDING.
- llvm.set.rounding will be added later because it involves MXCSR
updates currently unsupported.
This is a reland of #138434 except that:
- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
have been dropped because they caused a test failure under asan, and
- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
been improved with structured bindings.
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.
Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
This patch replaces:
llvm::copy(Src, std::back_inserter(Dst));
with:
llvm::append_range(Dst, Src);
for breavity.
One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
In [1], Nikita Popov suggested that during lowering 'unreachable' insn
should not generate extra code for naked functions, and this applies to
all architectures. Note that for naked functions, 'unreachable' insn is
necessary in IR since the basic block needs a terminator to end.
This patch checked whether a function is naked function or not. If it is
a naked function, 'unreachable' insn will not generate ISD::TRAP.
[1] https://github.com/llvm/llvm-project/pull/131731
Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
From #106446, this adds a variant of getVectorIdxTy that returns an LLT.
Many uses only look at the width, so a getVectorIdxWidth was added as
the common base.
In contrast to SelectionDAG, GlobalISel created a new virtual register
for the return value of invariant.start, leaving subsequent users of the
invariant.start value with an undefined reference.
A minimal example:
```
%tmp = alloca i32, align 4, addrspace(5)
%tmpI = call ptr @llvm.invariant.start.p5(i64 4, ptr addrspace(5) %tmp) #3
call void @llvm.invariant.end.p5(ptr %tmpI, i64 4, ptr addrspace(5) %tmp) #3
store i32 %i, ptr %tmpI, align 4
```
Although the return value of invariant.start might not be intended for
any use beyond invariant.end (the fuzzer might not have created a
sensible situation here), an implicit definition of the corresponding
virtual register avoids a segfault in the target instruction selector
later.
This LLVM defect was identified via the AMD Fuzzing project.
We currently have an issue where bf16 patters can be used to match fp16
types, as GISel does not know about the difference between the two. This
patch explicitly disables them to make sure that they are never used.
The opposite can also happen too, where fp16 patterns are used for
operators that should be bf16. So this also changes any operations with
bf16 types to now cause a fallback to SDAG.
The pass setup for GISel has been slightly adjusted to make sure that a
verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass,
which otherwise can cause verifier issues when falling back.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
Handle @llvm.expect.with.probability in GlobalISel in the same way
@llvm.expect is handled, passing the value through as-is. This can be
encountered if the intrinsic is used without optimizations, which would
otherwise transform it out.
Fixes#115411 for GlobalISel
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.
This mirrors changes from #108942 but targeting PowerPC instead of
RISCV. Because both of these PRs modify the same driver functions, this
series is stack on top of the RISC-V one.
---------
Signed-off-by: Keith Packard <keithp@keithp.com>
Following the addition of the llvm.fake.use intrinsic and corresponding
MIR instruction, two further changes are planned: to add an
-fextend-lifetimes flag to Clang that emits these intrinsics, and to
have -Og enable this flag by default. Currently, some logic for handling
fake uses is gated by the optdebug attribute, which is intended to be
switched on by -fextend-lifetimes (and by extension -Og later on).
However, the decision was made that a general optdebug attribute should
be incompatible with other opt_ attributes (e.g. optsize, optnone),
since they all express different intents for how to optimize the
program. We would still like to allow -fextend-lifetimes with optsize
however (i.e. -Os -fextend-lifetimes should be legal), since it may be a
useful configuration and there is no technical reason to not allow it.
This patch resolves this by tracking MachineFunctions that have fake
uses, allowing us to run passes that interact with them and skip passes
that clash with them.