This flag applies to G_PTR_ADD instructions and indicates that the operation
implements an inbounds getelementptr operation, i.e., the pointer operand is in
bounds with respect to the allocated object it is based on, and the arithmetic does not
change that.
It is set when the IRTranslator lowers inbounds GEPs (currently only in some
cases, to be extended in a future PR), and in the
(build|materialize)ObjectPtrOffset functions.
Inbounds information is useful in ISel when we have instructions that perform
address computations whose intermediate steps must be in the same memory region
as the final result. A follow-up patch will start using it for AMDGPU's flat
memory instructions, where the immediate offset must not affect the memory
aperture of the address.
This is analogous to a concurrent effort in SDAG: #131862
(related: #140017, #141725).
For SWDEV-516125.
These functions are for building G_PTR_ADDs when we know that the base
pointer and the result are both valid pointers into (or just after) the
same object. They are similar to SelectionDAG::getObjectPtrOffset.
This PR also changes call sites of the generic (build|materialize)PtrAdd
functions that implement pointer arithmetic to split large memory
accesses so that they use the new functions. Since memory accesses have
to fit into an object in memory, pointer arithmetic to an offset within
a large memory access also yields an address in that object.
Currently, these (build|materialize)ObjectPtrOffset functions only add
"nuw" to the generated G_PTR_ADD, but I intend to introduce an
"inbounds" MIFlag in a later PR (analogous to a concurrent effort in
SDAG: #131862, related: #140017, #141725) that will also be set in the
(build|materialize)ObjectPtrOffset functions.
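As a rough illustration of such a call site (the helper name comes from this
PR; its exact signature is assumed here to mirror buildPtrAdd):
```
// Illustrative only: splitting a wide load into two halves. The offset to
// the high half stays inside the original in-memory object, so the helper
// can mark the G_PTR_ADD with nuw (and, later, inbounds).
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"

using namespace llvm;

static void splitLoadInHalves(MachineIRBuilder &B, LLT HalfTy, LLT PtrTy,
                              Register BasePtr, int64_t HighHalfOffset,
                              MachineMemOperand &LoMMO,
                              MachineMemOperand &HiMMO) {
  // Low half loads directly from the base pointer.
  B.buildLoad(HalfTy, BasePtr, LoMMO);
  auto Offset = B.buildConstant(LLT::scalar(64), HighHalfOffset);
  // Previously a plain buildPtrAdd; the assumed signature mirrors it.
  auto HighPtr = B.buildObjectPtrOffset(PtrTy, BasePtr, Offset);
  B.buildLoad(HalfTy, HighPtr, HiMMO);
}
```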
Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's
call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where
offsets are now folded into scratch instructions, and cases where the
behavior of the check regeneration script changed, resulting, e.g., in
better checks for "nusw G_PTR_ADD" instructions, matched empty lines,
and the use of "CHECK-NEXT" in MIPS tests.
For SWDEV-516125.
This adds a GISelValueTrackingPrinterPass that can print the known bits
and sign bit of each def in a function. It is built on the new pass
manager and so adds an NPM GISelValueTrackingAnalysis, renaming the older
class to GISelValueTrackingAnalysisLegacy.
The first 2 functions from the AArch64GISelMITest are ported over to a
MIR test to show it working. It also runs successfully on all files in
llvm/test/CodeGen/AArch64/GlobalISel/*.mir that are not invalid. It can
hopefully be used to test GlobalISel known bits analysis more directly
in common cases, without jumping through the hoops that the C++ tests
require.
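For reference, this is roughly the kind of query the printer makes for each
virtual-register def; the header path and exact method spellings below are
assumptions based on the renames above:
```
// Rough sketch of what gets reported per def: the known bits and the number
// of sign bits. GISelValueTracking is taken to be the renamed GISelKnownBits
// class; treat the include path and method names as assumptions.
#include "llvm/CodeGen/GlobalISel/GISelValueTracking.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

static void printDefValueInfo(MachineFunction &MF, GISelValueTracking &VT,
                              raw_ostream &OS) {
  for (const MachineBasicBlock &MBB : MF)
    for (const MachineInstr &MI : MBB)
      for (const MachineOperand &MO : MI.defs()) {
        Register Reg = MO.getReg();
        if (!Reg.isVirtual())
          continue;
        KnownBits Known = VT.getKnownBits(Reg);
        OS << printReg(Reg) << " KnownZero: " << Known.Zero
           << " KnownOne: " << Known.One
           << " SignBits: " << VT.computeNumSignBits(Reg) << '\n';
      }
}
```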
This patch fixes:
third-party/unittest/googletest/include/gtest/gtest.h:1379:11:
error: comparison of integers of different signs: 'const unsigned
long' and 'const int' [-Werror,-Wsign-compare]
The current implementation always creates a 1 bit constant for the
result of the `G_ICMP`, which will cause issues if the destination
register size is larger than that. With asserts enabled, it will cause a
crash in `buildConstant`:
```
llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp:322: virtual MachineInstrBuilder llvm::MachineIRBuilder::buildConstant(const DstOp &, const ConstantInt &): Assertion `EltTy.getScalarSizeInBits() == Val.getBitWidth() && "creating constant with the wrong size"' failed.
```
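A small sketch of the problem and the fix direction, under the assumption
that the combine simply needs to build the constant with the destination
register's type:
```
// Sketch only: when folding a G_ICMP to a known result, build the constant
// with the destination register's type instead of a hard-coded 1-bit value.
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

static void foldKnownICmp(MachineIRBuilder &B, MachineRegisterInfo &MRI,
                          Register Dst, bool Result) {
  LLT DstTy = MRI.getType(Dst);
  // Wrong: trips the buildConstant assertion whenever Dst is wider than 1 bit.
  // B.buildConstant(Dst, APInt(1, Result));
  // Right: let the constant width follow the destination type.
  B.buildConstant(Dst, APInt(DstTy.getScalarSizeInBits(), Result));
}
```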
This adds a Flags parameter to BinaryOp_match, allowing it to detect
different flags like Disjoint. An m_GDisjointOr matcher is added to detect
G_ORs with the disjoint flag, and G_AddLike then matches either m_GAdd or
m_GDisjointOr.
The rest is trying to allow matching `const MachineInstr&`, as opposed
to non-const references.
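A hedged usage example of the resulting matchers; m_GDisjointOr is the new
pattern, and the surrounding calls follow the existing MIPatternMatch API:
```
// A disjoint G_OR has no carries between its operands, so it behaves like an
// add and can be handled by the same add-like pattern.
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"

using namespace llvm;
using namespace MIPatternMatch;

static bool matchAddLike(Register Root, const MachineRegisterInfo &MRI,
                         Register &X, Register &Y) {
  return mi_match(Root, MRI, m_GAdd(m_Reg(X), m_Reg(Y))) ||
         mi_match(Root, MRI, m_GDisjointOr(m_Reg(X), m_Reg(Y)));
}
```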
The standard libcalls for half to float and float to half conversion are
__extendhfsf2 and __truncsfhf2. However, LLVM currently uses
__gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these
libcalls are an ARM-ism and only provided by libgcc on that platform.
compiler-rt always provides both libcalls.
Use the standard libcalls by default, and only use the __gnu libcalls on
ARM.
This pattern does the same thing as m_SpecificReg/Type except that the value
it matches against originates from an earlier pattern in the same
mi_match expression.
This patch also changes how commutative patterns are handled: in order
to support m_DeferredReg/Type, we always have to run the LHS pattern
before the RHS one.
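A hedged example of where the deferred pattern helps, e.g. matching
x * (x + 1); m_SpecificReg cannot be used here because X only gets bound
during the same match:
```
// The LHS of the G_MUL binds X first, so the deferred pattern inside the RHS
// sub-pattern can compare against it.
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"

using namespace llvm;
using namespace MIPatternMatch;

static bool matchXTimesXPlusOne(Register Root, const MachineRegisterInfo &MRI,
                                Register &X) {
  return mi_match(
      Root, MRI,
      m_GMul(m_Reg(X), m_GAdd(m_DeferredReg(X), m_SpecificICst(1))));
}
```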
m_Type is supposed to extract the underlying value type (equality type
comparison is covered by m_SpecificType), therefore it should take an LLT
reference as its argument rather than taking the LLT by value.
This originated from de256478e61d6488db751689af82d280ba114a6f, which
refactored out a good chunk of LLT reference usages. It just so happens
that (for some reason) nothing was using m_Type and no test was
covering it.
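A small illustration of the intended use after the fix, assuming m_Type is
used as an operand pattern that binds the matched register's type:
```
// Binds SrcTy to the type of the G_TRUNC source when Root is a G_TRUNC.
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"

using namespace llvm;
using namespace MIPatternMatch;

static bool getTruncSrcType(Register Root, const MachineRegisterInfo &MRI,
                            LLT &SrcTy) {
  return mi_match(Root, MRI, m_GTrunc(m_Type(SrcTy)));
}
```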
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasize its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
The implementation was missing the fact that the `G_EXTRACT_SUBVECTOR`
destination and source vectors can have different types.
Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.
Clarify the G_EXTRACT_SUBVECTOR specification.
While working on a MIR unittest, I noticed that parseMIR includes an
unused argument that sets a function name. This is not only redundant
but also irrelevant, as parseMIR is designed to parse an entire module, not
specific functions, even though most unittests contain a single function
per module. To streamline the API, I have removed this unnecessary
argument from parseMIR. However, if this argument was originally
included to enhance readability or for any other purpose, please let me
know.
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.
The details of the optimization are outlined in #82075.
Fixes #82075
Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
- Remove some cases where ULEB128 isn't needed
- Add a fastDecodeULEB128 tailored for GlobalISel which does unchecked
decoding optimized for the common case, which is 1-byte values. We
rarely have >1 byte Inst IDs, OpIdx, etc., and those are the most common
ULEB128 users by far.
This specific LEB128 decode function generates almost 2x fewer
instructions than the generic one.
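A sketch of the shape of such a decoder (not the exact implementation): take
the single-byte shortcut first, then fall back to an unchecked loop:
```
// Unchecked ULEB128 decode with a fast path for values that fit in 7 bits,
// which dominate instruction IDs and operand indices in the match tables.
#include <cstdint>

static uint64_t fastDecodeULEB128(const uint8_t *Ptr, unsigned &BytesRead) {
  // Common case: a single byte with the continuation bit clear.
  if ((Ptr[0] & 0x80) == 0) {
    BytesRead = 1;
    return Ptr[0];
  }
  // Slow path: standard ULEB128 accumulation, no bounds or overflow checks.
  uint64_t Value = 0;
  unsigned Shift = 0;
  unsigned Idx = 0;
  uint8_t Byte;
  do {
    Byte = Ptr[Idx++];
    Value |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  BytesRead = Idx;
  return Value;
}
```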
Currently the merge-like instructions are inserted at the
location of the G_PHI. The behaviour seems to have been correct before, but
the rework done in https://reviews.llvm.org/D114198 dropped the part
which makes sure the instructions are inserted after all
the G_PHIs.
Recommits llvm/llvm-project#80378 which was reverted in
llvm/llvm-project#84330. The problem was that the change in
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used
217 as an opcode instead of a regex.
This patch is stacked on
https://github.com/llvm/llvm-project/pull/80372,
https://github.com/llvm/llvm-project/pull/80307, and
https://github.com/llvm/llvm-project/pull/80306.
ShuffleVector on scalable vector types gets IRTranslated to
G_SPLAT_VECTOR, since a ShuffleVector that operates on scalable
vectors is a splat vector whose splat value is the 0th
element of the first operand, because the index mask operand is the
zeroinitializer (undef and poison are treated as zeroinitializer here).
This is analogous to what happens in SelectionDAG for ShuffleVector.
`buildSplatVector` is renamed to `buildBuildVectorSplatVector`. I did not
make this a separate patch because it would cause problems to revert
that change without reverting this change too.
This function can be called from buildCopyToRegs where at least one of
the types is a scalable vector type. This function crashed because it
did not know how to handle scalable vector types.
This patch extends the functionality of getGCDType to handle when at
least one of the types is a scalable vector. getGCDType between a fixed
and scalable vector is not implemented since the docstring of the
function explains that getGCDType is used to build MERGE/UNMERGE
instructions and we will never build a MERGE/UNMERGE between fixed and
scalable vectors.
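As an illustrative expectation only: for two scalable vectors of the same
scalar type, the result is assumed to keep the scalar type and take the GCD
of the minimum element counts.
```
// Illustrative expectation, not a documented guarantee of getGCDType.
#include "llvm/CodeGen/GlobalISel/Utils.h"

using namespace llvm;

LLT gcdOfScalableVectors() {
  LLT NxV4S32 = LLT::scalable_vector(4, LLT::scalar(32));
  LLT NxV6S32 = LLT::scalable_vector(6, LLT::scalar(32));
  // Assumed result: <vscale x 2 x s32>.
  return getGCDType(NxV4S32, NxV6S32);
}
```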
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This function can be called from buildCopyToRegs where at least one of
the types is a scalable vector type. This function crashed because it
did not know how to handle scalable vector types.
This patch extends the functionality of getLCMType to handle when at
least one of the types is a scalable vector. getLCMType between a fixed
and scalable vector is not implemented since the docstring of the
function explains that getLCMType is used to build MERGE/UNMERGE
instructions and we will never build a MERGE/UNMERGE between fixed and
scalable vectors.
This tries to allow libcalls to be tail called, using a similar method
to the DAG: the types are checked to make sure they match, and if so the
backend, through lowerCall, checks that the tail call is valid for all
arguments.
FileCheck currently compiles a regular expression of the form
`Prefix1|Prefix2|...` and uses it to find the next prefix in the input.
If we had a fast regex implementation, this would be a useful thing to
do, as the regex implementation would be able to match multiple prefixes
more efficiently than a naive approach. However, with our actual regex
implementation, finding the prefixes basically becomes O(InputLen *
RegexLen * LargeConstantFactor), which is a lot worse than a simple
string search.
Replace the regex with StringRef::find(), keeping track of the next
position of each prefix. There are various ways this could be improved
on, but it's already significantly faster than the previous approach.
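A simplified sketch of that approach (not FileCheck's actual code): cache the
next occurrence of every prefix and only re-run the string search for a
prefix once the scan has moved past its cached position:
```
// Finds the nearest upcoming prefix with plain string searches instead of a
// combined regex. Positions are cached per prefix across calls.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"

using namespace llvm;

struct PrefixScanner {
  StringRef Input;
  SmallVector<StringRef, 4> Prefixes;
  SmallVector<size_t, 4> NextPos;

  PrefixScanner(StringRef Input, ArrayRef<StringRef> Prefixes)
      : Input(Input), Prefixes(Prefixes.begin(), Prefixes.end()) {
    // Prime the cache with the first occurrence of every prefix.
    for (StringRef P : this->Prefixes)
      NextPos.push_back(Input.find(P));
  }

  // Returns the position of the nearest prefix at or after From and records
  // which prefix it was; StringRef::npos when no prefix occurs again.
  size_t next(size_t From, unsigned &PrefixIdx) {
    size_t Best = StringRef::npos;
    for (unsigned I = 0, E = Prefixes.size(); I != E; ++I) {
      // Only re-run the search once the scan has moved past the cached hit.
      if (NextPos[I] != StringRef::npos && NextPos[I] < From)
        NextPos[I] = Input.find(Prefixes[I], From);
      if (NextPos[I] < Best) {
        Best = NextPos[I];
        PrefixIdx = I;
      }
    }
    return Best;
  }
};
```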
For me, this improves check-llvm time from 138.5s to 132.5s, so by
around 4-5%.
For vector-interleaved-load-i16-stride-7.ll in particular, test time
drops from 5s to 2.5s.
The legalizer currently generates lots of G_AND artifacts.
For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary.
Currently these artifacts have to be removed using post-legalize combines.
Omitting these artifacts at their source in the artifact combiner has a few advantages:
- We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it.
- The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register).
- This cleans up a lot of legalizer output and even improves compilation-times.
AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count).
Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth.
This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case.
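A hedged sketch of the kind of known-bits check involved (not the actual
patch): the G_AND is redundant whenever the bits it would clear are already
known to be zero, e.g. a boolean def under ZeroOrOneBooleanContents:
```
// The mask is a no-op if every bit it would clear is already known zero, so
// the artifact combiner can skip emitting the G_AND and its G_CONSTANT.
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
#include "llvm/Support/KnownBits.h"

using namespace llvm;

static bool maskIsRedundant(GISelKnownBits &KB, Register Src,
                            const APInt &Mask) {
  KnownBits Known = KB.getKnownBits(Src);
  return (~Mask).isSubsetOf(Known.Zero);
}
```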
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D159140
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:
- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS
Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154766