Rather than defining these tags in each object file that requires them,
we can declare them as undefined and require that they be defined
externally in, for example, compiler-rt or libcxxabi.
These instructions have one operand that is already narrow. Previously,
we pretended that this operand was a supported extension.
This could cause problems when we called getOrCreateExtendedOp on this
narrow operand when creating the VWADD_VL. If the narrow operand
happened to be an extend of the opposite type, we would peek through it
and then rebuild it with the wrong extension type. So (vwadd_w_vl (i32
(sext X)), (i16 (zext Y))) would become (vwadd_vl (i16 (sext X)), (i16
(sext Y))).
To prevent this, we ignore the operand instead and pass std::nullopt for
SupportsExt to getOrCreateExtendedOp so it won't peek through any
extends on the narrow source.
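To make the failure mode concrete, here is a toy C++ model of the peek-through behavior. `Operand`, `ExtKind`, and this simplified `getOrCreateExtendedOp` are hypothetical stand-ins, not the real RISC-V lowering code; the sketch only assumes that the helper rebuilds a peeked-through extend with whatever kind `SupportsExt` names.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy model, not LLVM's data structures.
enum class ExtKind { SExt, ZExt };

struct Operand {
  std::string name;            // the narrow source value, e.g. "Y"
  std::optional<ExtKind> ext;  // set if this operand is (ext name)
};

// Simplified stand-in for getOrCreateExtendedOp: if SupportsExt is set and
// the operand is an extend, peek through it and rebuild the extend using
// SupportsExt's kind. With std::nullopt the operand is left untouched.
Operand getOrCreateExtendedOp(const Operand &Op,
                              std::optional<ExtKind> SupportsExt) {
  if (SupportsExt && Op.ext)
    return Operand{Op.name, SupportsExt};
  return Op;
}
```

Passing `ExtKind::SExt` for a `(zext Y)` operand turns it into `(sext Y)`, mirroring the bug; passing `std::nullopt` leaves it as `(zext Y)`.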
Fixes #159152.
The result type of the vector extend intrinsics generated by the
BUILD_VECTOR lowering code should match how they are actually defined.
Currently, the result type defaults to the operand type, which can
conflict with calls to the same intrinsic from other paths.
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strlen routine is a millicode implementation;
we use millicode for the strlen function instead of a library call to
improve performance.
In this commit:
(1) Added new pass manager support for `ReachingDefAnalysis`.
(2) Added printer pass.
(3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.
This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which AMDGPU for instance needs when
folding constant offsets.
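The arithmetic claim can be checked with a C++ sketch that models a 64-bit pointer as two 32-bit halves. `Addr64`, `addWithCarry`, `canUseDisjointOr`, and `disjointOr` are hypothetical helpers, not AMDGPU code. The sketch assumes the DAG has already proven that the base and offset share no set bits and that the offset fits in 32 bits; under those conditions the single OR and the carry-chain addition agree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit address held in two 32-bit registers.
struct Addr64 {
  uint32_t lo;
  uint32_t hi;
};

uint64_t toU64(Addr64 a) { return (uint64_t(a.hi) << 32) | a.lo; }

// General lowering of base + off: add the low halves, then add the high
// halves together with the carry out of the low addition.
Addr64 addWithCarry(Addr64 base, uint64_t off) {
  uint32_t lo = base.lo + static_cast<uint32_t>(off);
  uint32_t carry = lo < base.lo ? 1u : 0u;  // unsigned wraparound => carry
  uint32_t hi = base.hi + static_cast<uint32_t>(off >> 32) + carry;
  return {lo, hi};
}

// If base and off share no set bits, base + off == base | off because no
// carries can occur. If off also fits in 32 bits, the high half is
// unchanged, so one 32-bit OR on the low half is enough.
bool canUseDisjointOr(Addr64 base, uint64_t off) {
  return (toU64(base) & off) == 0 && (off >> 32) == 0;
}

Addr64 disjointOr(Addr64 base, uint32_t off) {
  return {base.lo | off, base.hi};
}
```

For example, a base of `0x1'00001000` and an offset of `0x20` can use the OR form, while an offset of `0x1000` overlaps a set base bit and cannot.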
For SWDEV-516125.
There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.
For SWDEV-516125.
Previously, this would only accept full copy hints. This patch relaxes
that restriction to accept some subregister copies. Specifically, it now
accepts:
- Copies to/from physical registers if there is a compatible
super register
- Subreg-to-subreg copies
This may add the same hint to the hint vector repeatedly, but I'm not
sure whether that's a real problem.
If the original type was i32, type legalization will sign extend
the constant. This prevents it from having a single bit set or clear,
so other patterns can't match. If the upper bits aren't used, we
can ignore the sign extension.
The same applies to bclri and binvi.
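The bit-level reasoning can be checked with a small C++ sketch, assuming a 64-bit target (RV64) where an i32 constant is sign extended to 64 bits. The helper names are hypothetical; the sketch models only the check "single set bit" (bseti/binvi) or "single clear bit" (bclri) within the demanded low bits, not the actual isel patterns.

```cpp
#include <cassert>
#include <cstdint>

bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Models type legalization sign extending an i32 constant to 64 bits.
uint64_t signExtend32(uint32_t c) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(c)));
}

uint64_t lowBitsMask(unsigned bits) {
  return bits >= 64 ? ~0ull : ((1ull << bits) - 1);
}

// Hypothetical: ignore bits above the demanded width before testing.
bool isSingleBitSetInDemanded(uint64_t c, unsigned demandedBits) {
  return isPowerOf2(c & lowBitsMask(demandedBits));
}

bool isSingleBitClearInDemanded(uint64_t c, unsigned demandedBits) {
  return isPowerOf2(~c & lowBitsMask(demandedBits));
}
```

For example, the i32 constant `0x80000000` becomes `0xFFFFFFFF80000000`, which has many bits set, but masking to the low 32 bits recovers the single set bit.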
The original patterns for the Xqci select-like instructions used
`select`, and marked that ISD node as legal. This is not the usual way
that `select` is dealt with in the RISC-V backend.
Usually on RISC-V, we expand `select` to `riscv_select_cc` which holds
references to the operands of the comparison and the possible values
depending on the comparison. In retrospect, this is a much better fit
for our instructions, as most of them correspond to specific condition
codes, rather than more generic `select` with a truthy/falsey value.
This PR moves the Xqci select-like patterns to use `riscv_select_cc`
nodes. This applies to the Xqcicm, Xqcics and Xqcicli instruction
patterns.
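The difference between the two node shapes can be sketched with a toy C++ evaluator. `CondCode`, `selectValue`, and `selectCC` are hypothetical stand-ins, not the RISC-V backend's types. The point is that a `riscv_select_cc`-style node keeps both comparison operands and the condition code on the node itself, so a pattern for an instruction tied to one specific condition code can match the comparison directly, whereas a plain `select` only sees a truthy/falsey value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of condition codes.
enum class CondCode { EQ, NE, LT, GE, LTU, GEU };

bool evalCC(CondCode cc, int32_t a, int32_t b) {
  switch (cc) {
  case CondCode::EQ:  return a == b;
  case CondCode::NE:  return a != b;
  case CondCode::LT:  return a < b;
  case CondCode::GE:  return a >= b;
  case CondCode::LTU: return uint32_t(a) < uint32_t(b);
  case CondCode::GEU: return uint32_t(a) >= uint32_t(b);
  }
  return false;
}

// `select`-style: only the condition's value is visible here; the
// comparison that produced it is a separate node.
int32_t selectValue(bool cond, int32_t t, int32_t f) { return cond ? t : f; }

// `riscv_select_cc`-style: comparison operands and condition code travel
// with the node.
int32_t selectCC(int32_t lhs, int32_t rhs, CondCode cc, int32_t t, int32_t f) {
  return evalCC(cc, lhs, rhs) ? t : f;
}
```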
In order to match the existing codegen, minor additions had to be made
to `translateSetCCForBranch` to ensure that comparisons against specific
immediate values are left in a form that can be matched more closely by
the instructions. This prevents having to insert additional `li`
instructions and use the register forms.
There are a few slight regressions:
- There are sometimes more `mv` instructions than strictly necessary. I
believe these would not appear in larger examples, where the register
allocator has more leeway.
- In some tests where just one of the three extensions is enabled,
codegen falls back to using a branch over a move. With all three
extensions enabled (the configuration we most care about), these are not
seen.
- The generated patterns are very similar to each other - they have
similar complexity (7 or 8) and there are still overlaps. Sometimes the
choice between two instructions can be affected by the order of the
patterns in the tablegen file.
One other change is that Xqcicm instructions are prioritised over Xqcics
instructions where they have identical patterns. This is done because
one of the Xqcicm instructions is compressible (`qc.mveqi`), while
none of the Xqcics instructions are.
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
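The asymmetry can be illustrated with a toy matcher in C++. `Node`, `mk`, and the `match...` functions are hypothetical and only model operand order, not TableGen or SelectionDAG matching: the ADD matcher accepts the nested operation on either side because ADD is commutative, while the PTRADD matcher only accepts it as the offset (right operand).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy DAG node: opcode plus up to two operands.
struct Node {
  std::string op;  // "ptradd", "add", "shl", "mul", or "leaf"
  std::shared_ptr<Node> lhs, rhs;
};
using NodeP = std::shared_ptr<Node>;

NodeP leaf() { return std::make_shared<Node>(Node{"leaf", nullptr, nullptr}); }
NodeP mk(const std::string &op, NodeP l, NodeP r) {
  return std::make_shared<Node>(Node{op, l, r});
}

bool isShlOrMul(const NodeP &n) { return n->op == "shl" || n->op == "mul"; }

// Commutative ADD: one pattern covers both (add (mul x, y), z) and
// (add z, (mul x, y)), because the matcher also tries the swapped order.
bool matchAddOfShlOrMul(const NodeP &n) {
  return n->op == "add" && (isShlOrMul(n->lhs) || isShlOrMul(n->rhs));
}

// Non-commutative PTRADD: only (ptradd base, (shl|mul x, y)) is matched.
bool matchPtrAddOffsetShlOrMul(const NodeP &n) {
  return n->op == "ptradd" && isShlOrMul(n->rhs);
}
```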
For SWDEV-516125.
Add a new event, SCC_WRITE, for s_barrier_signal_isfirst and
s_barrier_leave, which are instructions that write to SCC; its counter
is KM_CNT. Also start tracking SCC for reads and writes.
An s_barrier_wait on the same barrier guarantees that the SCC write from
s_barrier_signal_isfirst has landed, so there is no need to insert
s_wait_kmcnt.
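The bookkeeping described above can be sketched as a tiny state machine in C++. `SCCTracker` is hypothetical and much simpler than the real waitcnt insertion pass: it does not track barrier IDs (it assumes every s_barrier_wait refers to the same barrier), and it conservatively treats only s_barrier_signal_isfirst, not s_barrier_leave, as resolved by s_barrier_wait, since the text only states the guarantee for the former.

```cpp
#include <cassert>
#include <string>

// Hypothetical model: an outstanding SCC_WRITE event (counted by KM_CNT)
// forces an s_wait_kmcnt before the next SCC read, unless an
// s_barrier_wait has already guaranteed the write landed.
struct SCCTracker {
  bool pendingSCCWrite = false;
  bool pendingFromSignalIsFirst = false;

  // Returns true if an s_wait_kmcnt must be inserted before `inst`.
  bool visit(const std::string &inst, bool readsSCC) {
    bool needWait = readsSCC && pendingSCCWrite;
    if (needWait)
      pendingSCCWrite = false;  // the inserted wait resolves the event
    if (inst == "s_barrier_signal_isfirst" || inst == "s_barrier_leave") {
      pendingSCCWrite = true;
      pendingFromSignalIsFirst = (inst == "s_barrier_signal_isfirst");
    } else if (inst == "s_barrier_wait" && pendingFromSignalIsFirst) {
      pendingSCCWrite = false;  // assumes the same barrier
    }
    return needWait;
  }
};
```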
Demonstrates the failure to keep avx512 mask predicate bit manipulation
patterns (based on the BMI1/BMI2/TBM style patterns) on the predicate
registers: unless the pattern is particularly complex, the cost of
transferring to/from GPRs outweighs any gains from better scalar
instructions.
I've been rather arbitrary with the mask types for the tests; I can
adjust them later if there are particular cases of interest.
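For reference, the scalar BMI1 idioms that such patterns are based on look like this when written over a 16-bit value standing in for an AVX-512 `__mmask16`. This only illustrates the bit manipulations themselves; it is not taken from the tests, and whether they stay on mask registers or move to GPRs is exactly what the tests examine.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic is done in int (integer promotion) and truncated back to 16 bits.
uint16_t blsr(uint16_t x) {    // clear the lowest set bit
  return static_cast<uint16_t>(x & (x - 1));
}
uint16_t blsi(uint16_t x) {    // isolate the lowest set bit
  return static_cast<uint16_t>(x & -x);
}
uint16_t blsmsk(uint16_t x) {  // mask up to and including the lowest set bit
  return static_cast<uint16_t>(x ^ (x - 1));
}
```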
As reported in https://github.com/llvm/llvm-project/issues/141034,
SelectionDAG::getNode had some unexpected behaviors when trying to
create vectors with UNDEF elements. Since
we treat both UNDEF and POISON as undefined (when using isUndef())
we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on
isUndef(), as that could make the resulting vector more poisonous.
The same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR.
Here are some examples:
This fold was done even if vec[idx] was POISON:
INSERT_VECTOR_ELT vec, UNDEF, idx -> vec
This fold was done even if any of vec[idx..idx+size] was POISON:
INSERT_SUBVECTOR vec, UNDEF, idx -> vec
This fold was done even if the elements not extracted from vec could
be POISON:
sub = EXTRACT_SUBVECTOR vec, idx
INSERT_SUBVECTOR UNDEF, sub, idx -> vec
With this patch we avoid such folds unless we can prove that the
result isn't more poisonous when eliminating the insert.
Fixes https://github.com/llvm/llvm-project/issues/141034
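The refinement rule behind these checks can be sketched with a toy per-lane model in C++. `Lane`, `noPoisonIn`, `canFoldInsertUndefElt`, `canFoldInsertUndefSubvector`, and `canFoldReinsertIntoUndef` are hypothetical; they capture only the condition described above (dropping the insert is allowed when none of the lanes that would change from UNDEF to the original vector's contents are POISON), not SelectionDAG's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane states; isUndef() treats Undef and Poison alike.
enum class Lane { Defined, Undef, Poison };

bool noPoisonIn(const std::vector<Lane> &vec, std::size_t begin,
                std::size_t end) {
  for (std::size_t i = begin; i < end; ++i)
    if (vec[i] == Lane::Poison)
      return false;
  return true;
}

// INSERT_VECTOR_ELT vec, UNDEF, idx -> vec
bool canFoldInsertUndefElt(const std::vector<Lane> &vec, std::size_t idx) {
  return noPoisonIn(vec, idx, idx + 1);
}

// INSERT_SUBVECTOR vec, UNDEF, idx -> vec, for a subvector of `size` lanes
bool canFoldInsertUndefSubvector(const std::vector<Lane> &vec,
                                 std::size_t idx, std::size_t size) {
  return noPoisonIn(vec, idx, idx + size);
}

// INSERT_SUBVECTOR UNDEF, (EXTRACT_SUBVECTOR vec, idx), idx -> vec:
// lanes outside [idx, idx + size) were UNDEF before the fold.
bool canFoldReinsertIntoUndef(const std::vector<Lane> &vec, std::size_t idx,
                              std::size_t size) {
  return noPoisonIn(vec, 0, idx) && noPoisonIn(vec, idx + size, vec.size());
}
```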
There is no RISCV isel support for bitcasts between f16 and bf16, so
they trigger a "cannot select" fatal error.
Co-authored-by: Ying Wang <wy446777@alibaba-inc.com>
With this change, construction of abstract subprogram DIEs is split in
two stages/functions:
creation of DIE (in DwarfCompileUnit::getOrCreateAbstractSubprogramDIE)
and its population with children (in
DwarfCompileUnit::constructAbstractSubprogramScopeDIE).
With that, abstract subprograms can be created/referenced from
DwarfDebug::beginModule, which should solve the issue with creating DIEs
for static local variables of inlined functions whose definitions were
optimized out. It fixes https://github.com/llvm/llvm-project/issues/29985.
The LexicalScopes class now stores a mapping from DISubprograms to their
corresponding llvm::Functions. It is supposed to be built before
processing of each function (so, now LexicalScopes class has a method
for "module initialization" alongside the method for "function
initialization"). It is used by DwarfCompileUnit to determine whether a
DISubprogram needs an abstract DIE before DwarfDebug::beginFunction is
invoked.
A DwarfCompileUnit::getOrCreateSubprogramDIE method is added, which can
create an abstract or a concrete DIE for a subprogram. It accepts an
llvm::Function* argument to determine whether a concrete DIE must be
created.
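The two-stage idea (create the DIE first so it can be referenced early, populate its children later) can be sketched with a toy C++ model. `ToyDIE`, `ToyFunction`, `ToyCompileUnit`, `getOrCreateDIE`, and `populateAbstractDIE` are all hypothetical and only loosely mirror getOrCreateSubprogramDIE and constructAbstractSubprogramScopeDIE; their signatures are invented for illustration. As in the text, a non-null function pointer requests a concrete DIE.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct ToyDIE {
  std::string subprogram;
  bool isAbstract;
  std::vector<std::string> children;
};

struct ToyFunction {};  // stands in for llvm::Function

class ToyCompileUnit {
  std::map<std::string, std::unique_ptr<ToyDIE>> abstractDIEs;
  std::map<std::string, std::unique_ptr<ToyDIE>> concreteDIEs;

public:
  // Stage 1: create (or look up) a DIE without populating its children.
  ToyDIE *getOrCreateDIE(const std::string &sp, const ToyFunction *fn) {
    auto &table = fn ? concreteDIEs : abstractDIEs;
    std::unique_ptr<ToyDIE> &slot = table[sp];
    if (!slot)
      slot = std::make_unique<ToyDIE>(ToyDIE{sp, fn == nullptr, {}});
    return slot.get();
  }

  // Stage 2: populate an (already referenced) abstract DIE with children.
  void populateAbstractDIE(const std::string &sp,
                           const std::vector<std::string> &kids) {
    ToyDIE *d = getOrCreateDIE(sp, nullptr);
    d->children.insert(d->children.end(), kids.begin(), kids.end());
  }
};
```

In this model, something like beginModule could call `getOrCreateDIE("foo", nullptr)` to reference the abstract DIE of an inlined function before its children exist; the later population step fills in the same object rather than creating a second one.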
This is a temporary fix for
https://github.com/llvm/llvm-project/issues/29985. Ideally, it will be
fixed by moving global variables and types emission to
DwarfDebug::endModule (https://reviews.llvm.org/D144007,
https://reviews.llvm.org/D144005).
Some code proposed by Ellis Hoag <ellis.sparky.hoag@gmail.com> in
https://github.com/llvm/llvm-project/pull/90523 was taken for this
commit.