This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
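For illustration, the mechanical change at a typical use site looks like
this (a hypothetical example; any pointee type works the same way):
```
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Instruction.h"

// Before: llvm::SmallSet<llvm::Instruction *, 16> Visited;  (relied on the redirection)
// After: spell out the set type that is actually instantiated.
llvm::SmallPtrSet<llvm::Instruction *, 16> Visited;
```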
Mips requires fp128 args/returns to be passed differently than i128. It
handles this by inspecting the pre-legalization type. However, for soft
float libcalls, the original type is currently not provided (it will
look like an i128 call). To work around that, MIPS maintains a list of
libcalls working on fp128.
This patch removes that list by providing the original, pre-softening
type to calling convention lowering. This is done by carrying additional
information in CallLoweringInfo, as we unfortunately do need both types
(we want the un-softened type for OrigTy, but we need the softened type
for the actual register assignment, etc.).
This is in preparation for completely removing all the custom
pre-analysis code in the Mips backend and replacing it with use of
OrigTy.
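A rough sketch of the idea (the struct is heavily simplified and the field
names are assumptions, not necessarily what the patch uses):
```
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"

// Sketch: each call argument carries both the softened type used for the
// actual register assignment and the original, pre-softening IR type that
// CC lowering can consult as OrigTy.
struct ArgListEntrySketch {
  llvm::Value *Val = nullptr;
  llvm::Type *Ty = nullptr;      // softened type, e.g. i128 for a soft float libcall
  llvm::Type *OrigTy = nullptr;  // pre-softening type, e.g. fp128
};
```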
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a CMake issue, which has been
discussed in more detail in this Discourse post:
https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901
If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:
https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds
The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Tail calls will be handled in a future patch.
TargetFrameIndex shouldn't be used as an operand of a target-independent
node such as a load; doing so causes ISel issues.
#81635 fixed a similar issue where this code used a TargetConstant
instead of a Constant.
Fixes #142314.
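As a hedged sketch of the distinction (not the exact code touched by this
patch), a stack-slot load built during lowering should use a plain
FrameIndex node as its base pointer, since TargetFrameIndex is only
meaningful to the instruction selector:
```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: load VT from frame index FI. Using DAG.getTargetFrameIndex(FI,
// PtrVT) as the base of a target-independent ISD::LOAD is what causes the
// ISel issues; the non-target form is the right one here.
static SDValue loadFromStackSlot(SelectionDAG &DAG, const SDLoc &DL, int FI,
                                 EVT VT, EVT PtrVT,
                                 MachinePointerInfo PtrInfo) {
  SDValue Base = DAG.getFrameIndex(FI, PtrVT);
  return DAG.getLoad(VT, DL, DAG.getEntryNode(), Base, PtrInfo);
}
```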
Reported from
https://github.com/llvm/llvm-project/pull/153393#issuecomment-3189898813
During DAGCombine, an intermediate extract_subvector sequence was
generated:
```
t8: v9i16 = extract_subvector t3, Constant:i64<9>
t24: v8i16 = extract_subvector t8, Constant:i64<0>
```
One of the DAGCombine rules, which turns `(extract_subvector
(extract_subvector X, C), 0)` into `(extract_subvector X, C)`, kicked in
and turned that into `v8i16 = extract_subvector t3, Constant:i64<9>`. But
it forgot to check whether the extracted index is a multiple of the
result type's minimum number of elements, hence the crash.
This patch fixes this by adding an additional check.
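A minimal sketch of the added guard (names illustrative, not the exact
code):
```
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// Sketch: the fold
//   (extract_subvector (extract_subvector X, C), 0) -> (extract_subvector X, C)
// is only valid when C is a multiple of the result type's minimum number of
// elements. With C = 9 and a v8i16 result, as in the trace above, it is not.
static bool isValidSubvectorFoldIndex(EVT ResVT, uint64_t Idx) {
  return Idx % ResVT.getVectorMinNumElements() == 0;
}
```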
Add special handling of EXTRACT_SUBVECTOR, INSERT_SUBVECTOR,
EXTRACT_VECTOR_ELT, INSERT_VECTOR_ELT and SCALAR_TO_VECTOR in
isGuaranteedNotToBeUndefOrPoison. Make use of DemandedElts to improve
the analysis and only check relevant elements for each operand.
Also start using DemandedElts in the recursive calls that check
isGuaranteedNotToBeUndefOrPoison for all operands of operations that do
not create undef/poison. We can do that for a number of elementwise
operations for which the DemandedElts can be applied to every operand
(e.g. ADD, OR, BITREVERSE, TRUNCATE).
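As a rough illustration of the EXTRACT_SUBVECTOR case (simplified to
fixed-width vectors with a constant index; not the exact code):
```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: for V = extract_subvector(Src, Idx), only lanes
// [Idx, Idx + NumResultElts) of Src matter, so translate DemandedElts into
// the source's lane space before recursing.
static bool subvectorSrcNotUndefOrPoison(SelectionDAG &DAG, SDValue Src,
                                         const APInt &DemandedElts,
                                         unsigned Idx, bool PoisonOnly,
                                         unsigned Depth) {
  unsigned SrcElts = Src.getValueType().getVectorNumElements();
  APInt SrcDemanded = APInt::getZero(SrcElts);
  SrcDemanded.insertBits(DemandedElts, Idx);
  return DAG.isGuaranteedNotToBeUndefOrPoison(Src, SrcDemanded, PoisonOnly,
                                              Depth + 1);
}
```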
Instead of passing SDNode and ResNo separately.
This allows us to use SDValue::operator== and avoid creating an SDValue
from the operands inside the function.
780054d3ff18075a6bc433029f336931792b1d2d added support for
`ISD::AssertNoFPClass`.
This ISD node can be used with the `ppc_fp128` type, which is really
just two `f64`s and therefore requires expansion. Without support for
expanding the result, we get a fatal error because the legalizer does not
know how to expand the result of `ISD::AssertNoFPClass` on `ppc_fp128`:
```
ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3>
LLVM ERROR: Do not know how to expand the result of this operator!
```
Thus, this patch adds support for the expansion so we no longer hit
this error.
This fixes #151375.
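A hedged sketch of the shape of such an expansion hook (the method name
follows the existing ExpandFloatRes_* convention but is an assumption, and
simply dropping the assertion on the two halves is only the conservatively
correct option, not necessarily what this patch does):
```
// Sketch: expand the ppc_fp128 result by expanding the asserted operand.
// Dropping the no-fp-class assertion on the two f64 halves is always safe;
// re-attaching (parts of) the test mask per half is a separate design choice.
void DAGTypeLegalizer::ExpandFloatRes_AssertNoFPClass(SDNode *N, SDValue &Lo,
                                                      SDValue &Hi) {
  GetExpandedFloat(N->getOperand(0), Lo, Hi);
}
```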
https://github.com/llvm/llvm-project/pull/152709 exposed the original IR
argument type to the CC lowering logic. However, in SDAG, this used the
raw type, prior to aggregate splitting. This PR changes it to use the
non-aggregate type instead. (This matches what happened in the
GlobalISel case already.)
I've also added some more detailed documentation on the
InputArg/OutputArg fields, to explain how they differ. In most cases
ArgVT is going to be the EVT of OrigTy, so they encode very similar
information (OrigTy just preserves some additional information lost in
EVTs, like pointer types). One case where they do differ is in
post-legalization lowering of libcalls, where ArgVT is going to be a
legalized type, while OrigTy is going to be the original non-legalized
type.
Since the `dontcall-*` attributes are checked both by
`FastISel`/`GlobalISel` and `SelectionDAGBuilder`, and both `FastISel`
and `GlobalISel` bail on calls on Arm64EC only AFTER doing the check, we
ended up emitting duplicate copies of this error.
This change moves the `dontcall-*` check in `FastISel` and `GlobalISel`
to after the call has been successfully lowered.
Partially fixes #149023.
The original code `MRI.def_begin(Reg)->getParent()` may return the
incorrect MI, as the physical register `Reg` may have multiple
definitions.
This patch selects the correct MI to verify by comparing the MBB of each
definition.
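A minimal sketch of the approach (helper name is illustrative):
```
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Sketch: a physical register may have several definitions, so rather than
// taking MRI.def_begin(Reg)->getParent() blindly, pick the definition that
// lives in the basic block being verified.
static MachineInstr *findDefInBlock(const MachineRegisterInfo &MRI,
                                    Register Reg,
                                    const MachineBasicBlock *MBB) {
  for (MachineInstr &DefMI : MRI.def_instructions(Reg))
    if (DefMI.getParent() == MBB)
      return &DefMI;
  return nullptr;
}
```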
The new test case hangs with -O1/-O2/-O3 enabled; the BranchFolding pass
may be to blame.
We only do conditional streaming mode changes in two cases:
- Around calls in streaming-compatible functions that don't have a
streaming body
- At the entry/exit of streaming-compatible functions with a streaming
body
In both cases, the condition depends on the entry pstate.sm value. Given
this, we don't need to emit calls to __arm_sme_state at every mode
change.
This patch handles this by placing an "AArch64ISD::ENTRY_PSTATE_SM" node
in the entry block and copying the result to a register. The register is
then used whenever we need to emit a conditional streaming mode change.
The "ENTRY_PSTATE_SM" node expands to a call to "__arm_sme_state" only
if (after SelectionDAG) the function is determined to have
streaming-mode changes.
This has two main advantages:
1. It allows back-to-back conditional smstart/stop pairs to be folded
2. It has the correct behaviour for EH landing pads
- These are entered with pstate.sm = 0, and should switch mode based on
the entry pstate.sm
- Note: This is not fully implemented yet
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from an i128 argument.
The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).
This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet; this just does
the mechanical changes to thread the new argument through.
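For context, today's CCAssignFn signature, plus a hedged sketch of the
direction (exactly how the original type is passed is an assumption):
```
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Today: an assignment function only sees legalized MVTs and ArgFlags.
typedef bool CCAssignFnToday(unsigned ValNo, MVT ValVT, MVT LocVT,
                             CCValAssign::LocInfo LocInfo,
                             ISD::ArgFlagsTy ArgFlags, CCState &State);

// Sketch: also hand the original IR type (OrigTy from InputArg/OutputArg)
// to the assignment function, so ABI decisions can distinguish e.g. the
// i64 parts of an fp128 from those of an i128 (parameter shape assumed).
typedef bool CCAssignFnWithOrigTy(unsigned ValNo, MVT ValVT, MVT LocVT,
                                  CCValAssign::LocInfo LocInfo,
                                  ISD::ArgFlagsTy ArgFlags, Type *OrigTy,
                                  CCState &State);
```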
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
index-width bits of the pointer
For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.
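For intuition only, here is a hedged IRBuilder sketch of the extraction
behaviour in difference 2); it deliberately cannot model the provenance
difference in 1), since it still goes through `ptrtoint`, and it assumes
zero-extension when widening:
```
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Sketch: keep only the low index-width bits of the pointer, then resize to
// the requested integer type. Provenance capture is NOT modelled here.
static Value *emulatePtrToAddr(IRBuilder<> &B, const DataLayout &DL,
                               Value *Ptr, Type *DestIntTy) {
  unsigned AS = Ptr->getType()->getPointerAddressSpace();
  Value *Full =
      B.CreatePtrToInt(Ptr, B.getIntNTy(DL.getPointerSizeInBits(AS)));
  Value *Addr =
      B.CreateZExtOrTrunc(Full, B.getIntNTy(DL.getIndexSizeInBits(AS)));
  return B.CreateZExtOrTrunc(Addr, DestIntTy); // assumes zero-extension
}
```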
RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
The histogram DAG combine went into an infinite loop of creating the
same histogram node due to an incorrect use of the `refineUniformBase`
and `refineIndexType` APIs.
These APIs take SDValues by reference (SDValue&) and return `true` if
they were "refined" (i.e., set to new values).
Previously, this DAG combine created the `Ops` array (used to build the
new histogram node) before calling the `refine*` APIs. Since building the
array copies the SDValues, the refined values were never used to create
the new histogram node.
Reproducer: https://godbolt.org/z/hsGWhTaqY (it will time out)
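The bug pattern, boiled down to a self-contained toy (here `refine` stands
in for `refineUniformBase`/`refineIndexType`, which likewise update their
arguments by reference):
```
#include <cassert>

// Stand-in for the refine* APIs: updates its argument in place and reports
// whether anything changed.
static bool refine(int &Base) {
  Base = 42;
  return true;
}

int main() {
  int Base = 0;
  int OpsTooEarly[] = {Base};    // built before refining: holds the old value
  bool Changed = refine(Base);
  int OpsAfterRefine[] = {Base}; // built after refining: holds the new value
  assert(Changed && OpsTooEarly[0] == 0 && OpsAfterRefine[0] == 42);
  return 0;
}
```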
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __memcmp routine is a millicode implementation;
we use millicode for the memcmp function instead of a library call to
improve performance.
The information about whether a specific argument is vararg or fixed is
currently stored separately from all the other per-argument information,
which lives in ArgFlags. This means that it is not accessible from
CCAssignFn, and backends have developed all kinds of workarounds to
access it anyway.
Move this information to ArgFlags to make it directly available in all
relevant places.
I've opted to invert the existing flag and store it as IsVarArg, as I
think that both makes the meaning more obvious and provides a better
default (IsVarArg=false).
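A sketch of what this enables in a CC function (the accessor name is an
assumption modelled on the existing ArgFlagsTy getters):
```
#include "llvm/CodeGen/CallingConvLower.h"
using namespace llvm;

// Sketch: with the flag in ArgFlags, a calling-convention function can
// branch on vararg-ness directly instead of relying on per-target caches.
static bool CC_Sketch(unsigned ValNo, MVT ValVT, MVT LocVT,
                      CCValAssign::LocInfo LocInfo,
                      ISD::ArgFlagsTy ArgFlags, CCState &State) {
  if (ArgFlags.isVarArg()) {
    // e.g. route variadic arguments straight to the stack
  }
  // ... remaining assignment logic ...
  return false; // false means the argument was handled
}
```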
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Unspeakable horrors happen around calls from whole wave functions; the
plan is to improve the handling of caller/callee-saved registers in
a future patch.
Tail calls will also be handled in a future patch.
This change adds support for folding a SETCC when one or both of the
operands is a TRUNCATE with the appropriate no-wrap flags. This pattern
can occur when promoting i8 operations in NVPTX, and we currently have
some ISel rules to try to handle it.
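Roughly, the condition being checked looks like this (a hedged sketch, not
the exact combiner code):
```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: a TRUNCATE operand of a SETCC can be looked through when its
// no-wrap flag guarantees the truncation did not change the compared value
// (nuw for unsigned/equality predicates, nsw for signed ones).
static bool truncPreservesCompare(SDValue Op, bool IsSignedPred) {
  if (Op.getOpcode() != ISD::TRUNCATE)
    return false;
  SDNodeFlags Flags = Op->getFlags();
  return IsSignedPred ? Flags.hasNoSignedWrap() : Flags.hasNoUnsignedWrap();
}
```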
Now that #146490 removed the assertion in visitFreeze that the node was
still isGuaranteedNotToBeUndefOrPoison, we no longer need this
reduced-depth hack (which had to account for the difference in depth
between freeze(op()) and op(freeze())).
Helps with some of the minor regressions in #150017.
Similar to InstCombinerImpl::freezeOtherUses, attempt to ensure that we
merge multiple frozen/unfrozen uses of an SDValue. This fixes a number of
hasOneUse() problems when trying to push FREEZE nodes through the DAG.
Remove SimplifyMultipleUseDemandedBits handling of FREEZE nodes as we
now want to keep the common node, and not bypass for some nodes just
because of DemandedElts.
Fixes #149799
If using srl does not produce a legal constant for the RHS of the
final compare, try to use sra instead.
Because the AND constant is negative, the sign bit participates in the
compare; using an arithmetic shift right duplicates that bit.
Add a new combine to replace
```
(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
```
with
```
(mstore ch truevec ptr offset cond)
```
This saves a blend operation on targets that support conditional stores.
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
The optimization introduced by #125637 tried to avoid going through the
stack when promoting a bitcast with a vector result type. However, it
isn't correct if the input type is also a vector. This patch limits that
optimization to scalar-to-vector bitcasts only.
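A minimal sketch of the added restriction (names illustrative, not the
exact code):
```
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// Sketch: only use the stack-avoiding promotion for scalar-to-vector
// bitcasts; vector-to-vector bitcasts keep the previous lowering.
static bool canUseScalarToVectorPromotion(EVT SrcVT, EVT DstVT) {
  return !SrcVT.isVector() && DstVT.isVector();
}
```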