Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
This fixes the crash in #168495. My combine rule was missing a check
that the source vector was in fact a vector, which caused the legality
check to fail in this example because the concat was trying to
concatenate a non-vector.
I have also gated the bitcast of the concat to only work on non-scalable
vectors as the mutation calls `getNumElements` which crashes when called
on a scalable vector.
Fixes #168495
For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The
lowering should be based on the element width.
I noticed this by inspection. No tests in tree are currently affected,
but I thought it would be good to fix so someone doesn't have to debug
it in the future.
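As a rough illustration of the element-wise semantics described above, here is a minimal standalone sketch (plain C++, not LLVM code). The names `countLeadingZeros` and `vectorCtlz` are hypothetical and exist only for this example; the point is that each lane is counted against the element width rather than the total vector width.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Count leading zeros of `value` interpreted as a `width`-bit integer.
// Hypothetical helper for illustration; `width` must be in [1, 64].
unsigned countLeadingZeros(uint64_t value, unsigned width) {
  assert(width >= 1 && width <= 64 && "unsupported width");
  unsigned count = 0;
  // Walk from the most significant bit of the element downwards.
  for (unsigned bit = width; bit-- > 0;) {
    if ((value >> bit) & 1)
      break;
    ++count;
  }
  return count;
}

// Element-wise CTLZ: every lane is counted against the element width,
// never against the total bit width of the whole vector.
std::vector<unsigned> vectorCtlz(const std::vector<uint64_t> &lanes,
                                 unsigned elementWidth) {
  std::vector<unsigned> result;
  result.reserve(lanes.size());
  for (uint64_t lane : lanes)
    result.push_back(countLeadingZeros(lane, elementWidth));
  return result;
}
```

For a vector of 8-bit elements, a lane holding 1 yields 7 regardless of how many lanes the vector has, which is why a lowering decision keyed on the full vector width would pick the wrong expansion.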
If vector-unaligned-mem support is not enabled, we should not generate
loads/stores that are not aligned to their element size.
We already do this for non-VP vector loads/stores.
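The rule above can be sketched as a small predicate. This is a simplified standalone model, not the actual lowering code: `isLegalVectorAccess` and its parameters are hypothetical names, and alignment and element size are assumed to be powers of two given in bytes.

```cpp
#include <cstdint>

// Hypothetical predicate modelling the rule: without unaligned vector
// memory support, an access must be aligned to at least its element size.
// Both `alignBytes` and `elementSizeBytes` are assumed to be powers of two.
bool isLegalVectorAccess(uint64_t alignBytes, uint64_t elementSizeBytes,
                         bool hasUnalignedVectorMem) {
  if (hasUnalignedVectorMem)
    return true; // Any alignment is acceptable.
  return alignBytes >= elementSizeBytes;
}
```

Under this model, a vector of 4-byte elements with alignment 1 is rejected unless unaligned vector memory is supported, so the access has to be rewritten with a narrower element type.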
This code has been in our downstream for about a year and a half after
finding the vectorizer generating misaligned loads/stores. I don't think
that is unique to our downstream.
Doing this for masked vp.load/store requires widening the mask as well
which is harder to do.
NOTE: Because we have to scale the VL, this will introduce additional
vsetvli and the VL optimizer will not be effective at optimizing any
arithmetic that is consumed by the store.
Updates the demanded elements before recursing through copies in case
the type of the source register changes from a non-vector register to a
vector register.
Fixes #167842.
This adds handling for f16 and f128 lround/llround under LP64 targets,
promoting the f16 where needed and using a libcall for f128. This
codegen is now identical to the SelectionDAG version.
20a22a45e96bc94c3a8295cccc9031bd87552725 was supposed to fully remove
these, but left around the functionality to actually compute them and a
unittest that ensured they worked. These are not development features in
the sense of features used in development mode, but experimental
features that have been superseded by MIR2Vec.
This PR adds a new combine to the `post-legalizer-combiner` pass. The
new combine checks for vectors being unmerged and subsequently padded
with `G_IMPLICIT_DEF` values by building a new vector. If such a case is
found, the vector being unmerged is instead just concatenated with a
`G_IMPLICIT_DEF` that is as wide as the vector being unmerged.
This removes unnecessary `mov` instructions in a few places.
In the Dhrystone benchmark, I found that some adjacent globals are not
merged, whereas GCC's anchor optimization handles them. Using
global-merge-max-offset to set the maximum offset can yield similar results
(still slightly different, but at least we can control the offset).
This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned`
and inserts necessary casts.
The added `MCRegUnitToIndex` functor is used with `SparseSet`,
`SparseMultiSet` and `IndexedMap` in a few places.
`MCRegUnit` is opaque to users, so it didn't seem worth making it a
full-fledged class like `Register`.
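To show why an `enum class : unsigned` catches mixed-up indices, here is a minimal standalone sketch. It does not use the LLVM headers: `RegUnit`, `RegUnitToIndex`, and `RegUnitSet` are hypothetical stand-ins for `MCRegUnit`, `MCRegUnitToIndex`, and an index-keyed container, written only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for an opaque register unit; not the LLVM type.
// A scoped enum does not convert implicitly to or from `unsigned`.
enum class RegUnit : unsigned {};

// Functor mapping a RegUnit to a dense index for index-keyed containers.
struct RegUnitToIndex {
  unsigned operator()(RegUnit unit) const {
    return static_cast<unsigned>(unit);
  }
};

// Toy index-keyed set that only accepts RegUnit keys.
class RegUnitSet {
  std::vector<bool> bits;
  RegUnitToIndex toIndex;

public:
  explicit RegUnitSet(std::size_t universe) : bits(universe, false) {}

  void insert(RegUnit unit) {
    assert(toIndex(unit) < bits.size() && "unit out of range");
    bits[toIndex(unit)] = true;
  }

  bool contains(RegUnit unit) const {
    assert(toIndex(unit) < bits.size() && "unit out of range");
    return bits[toIndex(unit)];
  }
};
```

With this shape, passing a plain `unsigned` (or another register-like type) to `insert` fails to compile, so every place that crosses the boundary needs an explicit cast that a reviewer can see.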
Static type checking has detected one issue in
`PrologueEpilogueInserter.cpp`, where `BitVector` created for
`MCRegister` is indexed by both `MCRegister` and `MCRegUnit`.
The number of casts could be reduced by using `IndexedMap` in more
places and/or adding a `BitVector` adaptor, but the number of casts *per
file* is still small and `IndexedMap` has limitations, so it didn't seem
worth the effort.
Pull Request: https://github.com/llvm/llvm-project/pull/167943
Teach `SDNodeInfoEmitter` TableGen backend to process `SDTypeConstraint`
records and emit tables for them. The tables are used by
`SDNodeInfo::verifyNode()` to validate a node being created.
This PR only adds validation code for `SDTCisVT` and `SDTCVecEltisVT`
constraints to keep it smaller.
Pull Request: https://github.com/llvm/llvm-project/pull/150125
TargetConstant nodes don't match TableGen ImmLeaf patterns during
instruction selection. When this zero constant flows into the AArch64
CCMP formation code, the machine verifier hits an assertion in expensive
checks.
Fixes: #168227
This PR improves the lowering of vectors of fp16 when using fpext.
Previously vectors of fp16 were scalarized leading to lots of extra
instructions. Now, vectors of fp16 will be lowered when extended to fp64
via the preexisting lowering logic for extends. To make use of the
existing logic, we need to add elements until we reach the next power of
2.
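The padding step can be sketched as simple arithmetic on the element count. This is a standalone illustration under the assumption that only the count matters; `nextPowerOfTwo` and `paddingElements` are hypothetical names, not functions from the patch.

```cpp
#include <cassert>

// Smallest power of two that is >= n. Hypothetical helper for illustration.
unsigned nextPowerOfTwo(unsigned n) {
  assert(n >= 1 && n <= (1u << 31) && "count out of supported range");
  unsigned power = 1;
  while (power < n)
    power <<= 1;
  return power;
}

// Number of extra elements needed to pad a vector of `numElements`
// up to the next power-of-two element count.
unsigned paddingElements(unsigned numElements) {
  return nextPowerOfTwo(numElements) - numElements;
}
```

For example, a 3-element vector needs one padding element to reach 4, while a 4-element vector needs none.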
This avoids scaling offsets back and forth. It is also what the SelectionDAG
equivalent (ComputeValueVTs) does, and will allow reusing
ComputeValueTypes with less effort.
After the base branch was moved to main, this somehow ended up
adding a second definition of RTLCI, instead of modifying the
existing one.
Also fix other build error with gcc bots.
This fixes the -fveclib flag getting lost on its way to the backend.
Previously this was its own cl::opt with a random boolean. Move the
flag handling into CommandFlags with other backend ABI-ish options,
and have clang directly set it, rather than forcing it to go through
command line parsing.
Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector
function. Clang has special handling for TargetLibraryInfo, where it
would directly construct one with the vector library in the pass pipeline.
RuntimeLibcallsInfo currently is not used as an analysis in codegen, and
needs to know the vector library when constructed.
RuntimeLibraryAnalysis could follow the same trick that
TargetLibraryInfo is using in the future, but a lot more boilerplate changes
are needed to thread that analysis through codegen. Ideally this would come
from an IR module flag, and nothing would be in TargetOptions. For now, it's
better for all of these sorts of controls to be consistent.
RegisterId can represent a physical register, a MCRegUnit, or
an index into a side structure that stores register masks. These 3
types were encoded by using the physical reg, stack slot, and
virtual register encoding partitions from the Register class.
This aliasing of encoding schemes wasn't well contained, so
Register::index2StackSlot and Register::stackSlotIndex appeared
in multiple places.
This patch gives RegisterRef its own encoding defines and separates
it from Register.
I've removed the generic idx() method in favor of getAsMCReg(),
getAsMCRegUnit(), and getMaskIdx() for some degree of type safety.
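A self-contained encoding with typed accessors might look roughly like the sketch below. The bit layout (top two bits as a kind tag) and the class name `RegisterIdSketch` are assumptions made for this example and are not taken from the patch; only the accessor names `getAsMCReg`, `getAsMCRegUnit`, and `getMaskIdx` come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: the top two bits of a 32-bit id select the kind,
// the remaining 30 bits hold the payload. Not the encoding used by LLVM.
class RegisterIdSketch {
  enum Kind : uint32_t { PhysReg = 0, Unit = 1, Mask = 2 };
  static constexpr uint32_t KindShift = 30;
  static constexpr uint32_t PayloadMask = (1u << KindShift) - 1;

  uint32_t id;

  RegisterIdSketch(Kind kind, uint32_t payload)
      : id((static_cast<uint32_t>(kind) << KindShift) | payload) {
    assert(payload <= PayloadMask && "payload does not fit");
  }

  Kind kind() const { return static_cast<Kind>(id >> KindShift); }

public:
  static RegisterIdSketch fromReg(uint32_t reg) { return {PhysReg, reg}; }
  static RegisterIdSketch fromUnit(uint32_t unit) { return {Unit, unit}; }
  static RegisterIdSketch fromMask(uint32_t idx) { return {Mask, idx}; }

  bool isReg() const { return kind() == PhysReg; }
  bool isUnit() const { return kind() == Unit; }
  bool isMask() const { return kind() == Mask; }

  // Typed accessors: each asserts that the id holds the requested kind,
  // instead of exposing one generic index.
  uint32_t getAsMCReg() const {
    assert(isReg());
    return id & PayloadMask;
  }
  uint32_t getAsMCRegUnit() const {
    assert(isUnit());
    return id & PayloadMask;
  }
  uint32_t getMaskIdx() const {
    assert(isMask());
    return id & PayloadMask;
  }
};
```

In this sketch an id of 0 decodes as physical register 0, which matches the note that RegisterId 0 can be treated like an MCRegister without special cases.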
Some places used the RegisterId field of RegisterRef directly as a
register. Those have been updated to use getAsMCReg.
Some special cases for RegisterId 0 have been removed as it can
be treated like a MCRegister by existing code.
I think I want to rename the Reg field of RegisterRef to Id, but
I'll do that in another patch.
Additionally, callers of the RegisterRef constructor need to be
audited for implicit conversions from Register/MCRegister
to unsigned.
InlineAsmLowering rejected inline assembly with memory reference inputs
if the values passed to the inline asm weren't pointers. The DAG
lowering however handled them just fine.
This patch updates InlineAsmLowering to store such values on the stack,
and then use the stack pointer as the "indirect" version of the operand.
In the new test, we're trying to fold a load and a X86ISD::CALL. The
call has a CopyToReg glued to it. The load and the call have different
input chains so they need to be merged. This results in a TokenFactor
that gets put between the CopyToReg and the final CALLm instruction. The
DAG scheduler can't handle that.
The load here was created by legalization of the extract_element using a
stack temporary store and load. A normal IR load would be chained into
the call sequence by SelectionDAGBuilder. This would usually have the load
chained in before the CopyToReg. The store/load created by legalization
don't get chained into the rest of the DAG.
Fixes #63790
The LocID for registers is just the register ID. The getLocID function
is supposed to hide this detail, but it wasn't being used consistently.
This avoids a bunch of implicit casts from Register or MCRegister to
unsigned.
This patch adds a Clang-compatible --save-stats option to opt, to
provide an easy-to-use way to save LLVM statistics files when working
with opt on the middle end.
This is a follow up on the addition to `llc`:
https://github.com/llvm/llvm-project/pull/163967
Like on Clang, one can specify --save-stats, --save-stats=cwd, and
--save-stats=obj with the same semantics and JSON format. The
pre-existing --stats option is not affected.
The implementation extracts the flag and its methods into the common
`CodeGen/CommandFlags` as `LLVM_ABI`, using a new registration class to
conservatively enable opt-in rather than let all tools take it. It's only
needed for llc and opt for now. Then it refactors llc and adds support
for opt.
Use it in `printVRegOrUnit()`, `getPressureSets()`/`PSetIterator`,
and in functions/classes dealing with register pressure.
Static type checking revealed several bugs, mainly in MachinePipeliner.
I'm not very familiar with this pass, so I left a bunch of FIXMEs.
There is one bug in `findUseBetween()` in RegisterPressure.cpp, also
annotated with a FIXME.