Try to more aggressively narrow masks of extended values.
This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free.
This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc.
Differential Revision: https://reviews.llvm.org/D145866
These attributes have been accepted in ELF for a while, and are generated by
Clang in some places, so it makes sense to support them on MachO too.
https://reviews.llvm.org/D143173
An instruction that is "always uniform" is so even if it occurs in an
irreducible cycle. The output produced by such an instruction may depend on the
implementation defined cycle hierarchy, but that does not affect the uniformity
of the output. In other words, an "always uniform" instruction is uniform even
if it is not m-converged.
Reviewed By: ruiling, ronlieb
Differential Revision: https://reviews.llvm.org/D145572
This change improves FS discriminators in the following ways:
(1) use call-stack debug information in the the to generate
discriminators: the same (src/line) DILs can now have same
discriminator value if they come from different call-stacks.
This effectively increases the usable discriminator values
for each round of FS discriminator pass.
(2) don't generate the FS discriminator for meta instructions
(i.e. instructions not emitted). This reduces the number
discriminators conflicts (for the case we run out of discriminator
bits for that pass).
(3) use less expensive hashing of xxHash64.
These improvements should bring better performance for FSAFDO
and they should be used by default. But this change creates
incompatible FS discriminators. For the iterative profile users,
they might see a performance drop in the first release with
this change (due to the fact that the profiles have the old
discriminators and the compiler uses the new discriminator).
We have measured that this is not more than 1.5% on several
benchmarks. Note the degradation should be gone in the second
release and one should expect a performance gain over the binary
without this change.
One possible solution to the iterative profile issue would be
separating discriminators for profile-use and the ones emitted to
the binary. This would require a mechanism to allow two sets of
discriminators to be maintained and then phasing out the first
approach. This is too much churn in the compiler and the
performance implications do not seem to be worth the effort.
Instead, we put the changes under an option so iterative profile
users can do a gradual rollout of this change. We will make the
option default value to true in a later patch and eventually
purge this option from the code base.
Differential Revision: https://reviews.llvm.org/D145171
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145548
A pointer dereference was added (D141302) above an assert that checks
whether the pointer is null. This commit moves the assert above the
dereference and transforms it into an llvm_unreachable to better express
the intent that certain switch cases should never be reached.
Differential Revision: https://reviews.llvm.org/D145599
This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571 .
DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid.
However, if `.. + const` is broken down into a sequences of adds with carries, the benefit is not clear, introducing two more add(-with-carry) ops (total 6) in the case of the reported issue whereas the optimal sequence must only have 4 add(-with-carry)s.
This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D144116
I've pulled this change from D145404 to land in isolation because
I'm concerned the code might be more important than the test
coverage might suggest (NOTE: the code has no test coverage).
The merged store touches memory for other underlying objects, so mapping
the merged store to the first underlying object is not correct. For example
in https://github.com/llvm/llvm-project/issues/60744, the merged store is
not correctly analyzed as dependent with memory operations which are also
part of the merged store.
Fixes#60744
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D144711
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D145177
This is generally done by the InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
Differential Revision: https://reviews.llvm.org/D145143
It turns out that there are relatively trivial, albeit rare, cases that
require a MaxDepth of more than 16 (see added test). However, we want to
avoid having to rely on a large fixed MaxDepth.
Since these cases are relatively rare, apply the following strategy:
1. Start with a low MaxDepth of 16 - if the entry node was not
reached, we can return (the common case).
2. If the entry node was reached, exponentially increase MaxDepth up
to some large limit that should cover all cases and guard against
stack exhaustion.
This retains the better performance with a low MaxDepth in the common
case, and in complex cases backs off and retries. On a whole, this is
preferable vs. starting with a large MaxDepth which would unnecessarily
penalize the common case where a low MaxDepth is sufficient.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D145386
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in PromoteIntegerResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D144846
Move MaxDepth into the lambda, since it is not needed outside. This
fixes some compilers that complain about missing capture:
error C3493: 'MaxDepth' cannot be implicitly captured because no
default capture mode has been specified
Fixes: f693932fbea7 ("[SelectionDAG] Transitively copy NodeExtraInfo on RAUW")
GISel's CSE mechanism lazily inserts instructions into the CSE List
to improve on efficiency as well as efficacy of CSE
(for allowing partially built instructions to be fully built).
There's unfortunately a mutual recursion via
`handleRecordedInsts -> handleRecordedInst -> insertNode-> handleRecordedInsts`.
So this change simply records that we're already draining this list so we can just bail out on the recursion.
No changes to codegen are expected as we're still draining/handling the temporary
list via pop_back and we should get the same sequence of instructions
whether we call pop_back in a loop at the top level or recursive.
https://reviews.llvm.org/D145006
reviewed by: dsanders
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.
Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.
This fixes the remaining pcsections-atomics.ll tests on X86.
v2: Optimize copyExtraInfo() deep copy. For now we assume that only
NodeExtraInfo that have PCSections set require deep copy. Furthermore,
limit the depth of graph search while pre-populating the visited set,
assuming the to-be-replaced subgraph 'From' has limited complexity. An
assertion catches if the maximum depth needs to be increased.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D144677
This is guarding a check for isTypeLegal so it should check is
LegalTypes.
Fixes PR61111.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D145139
This patch adds AArch64 CodeGen support such that the type can be passed
and returned to/from functions, and also adds support to use this type in
load/store operations and PHI nodes.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D136862
As part of this work, removing `SDDbgValue::clearIsEmitted` originally added for
`dbg.addr` in 045c67769d7fe577fc38cccb6fb40fd814437447 was attempted, but it
appears some tests for `DBG_INSTR_REF` now depend on that behaviour as well, so
it was kept and comments were updated instead.
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898
Differential Revision: https://reviews.llvm.org/D144800
This removes the unused `FuncArgumentDbgValueKind::Addr` value originally added
by e24f5348798605a799c63ff09169d177d262cd37. The intent was to signal the
original intrinsic that marked a function argument, but the `Addr` part was
never used.
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898
Differential Revision: https://reviews.llvm.org/D144794
As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole.
There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well.
An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options, I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered.
Differential Revision: https://reviews.llvm.org/D145065
Default MaxDivRemBitWidthSupported to 128, so that divisions larger
than 128 bits are always expanded, without requiring additional
configuration from the target.
Note that this may still emit calls to __udivti3 on 32-bit targets,
which likely don't have an implementation of that builtin. However,
I believe this is sufficient to fix
https://github.com/llvm/llvm-project/issues/60531, because Zig must
already be defining those builtins.
Differential Revision: https://reviews.llvm.org/D144871
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in SplitVectorResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144744
Put jump table in .rdata for Windows to align with that for Linux.
It can avoid loading the same code page into I$ and D$ simultaneously
and thus favor performance.
Differential Revision: https://reviews.llvm.org/D144701
LLVM_DUMP_METHOD includes ATTRIBUTE_NOINLINE. operator<< isn't
what we normally consider a dump method so it should be ok to inline.
This fixes a warning from gcc that some other declaration for some
other class was inline but this one is noinline. Seems like a bogus
warning from gcc really.
This patch adds support for TargetExtType/target(...) representing
SPIR-V builtin types. After D135202, target(...) is the preferred way
for representing SPIR-V builtin types in LLVM IR and the only working
in the opaque pointer mode.
In order to maintain compatibility with LLVM IR generated by older
versions of Clang and LLVM/SPIR-V Translator, pointers-to-opaque-structs
denoting SPIR-V/OpenCL builtin types will be translated to equivalent
SPIR-V target extension types. This translation is only available in the
typed pointer mode (-opaque-pointers=0).
The relevant LIT tests with SPIR-V builtins were converted to use the
new target(...) notation.
Differential Revision: https://reviews.llvm.org/D144494
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG
will try to remove the sext_inreg, using vector_extract(x) directly. This can
lead to multiple uses of both sext_inreg(vector_extract(x)) and
vector_extract(x), leading to the generation of both umov and smov extracts.
This adds a target hook to prevent that under AArch64 where the sext_inreg can
be considered free if there are multiple uses of the sext and no uses of the
vector_extract. This helps fix a small regression from D144550.
Differential Revision: https://reviews.llvm.org/D144850
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.
Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.
This fixes the remaining pcsections-atomics.ll tests on X86.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D144677
We use this combine in the AArch64 postlegalizer combiner, which causes this
function to query the legalizer rules for the action for an invalid opcode/type
combination (G_AND and p0). Moving the legalizer query until after the validity
check in matchHoistLogicOpWithSameOpcodeHands() fixes this.
`(and/or (icmp eq/ne A,C0), (icmp eq/ne A,C1))` can be lowered to
`(icmp eq/ne (and (sub A, (smin C0, C1)), (not (sub (smax C0, C1), (smin C0, C1)))), 0)`
generically if `(sub (smax C0, C1), (smin C0,C1))` is a power of 2.
This covers the existing case of `(and/or (icmp eq/ne A, C_Pow2),(icmp eq/ne A, -C_Pow2))`
as well as other cases.
Alive2 Links:
EQ: https://alive2.llvm.org/ce/z/mLJiUW
NE: https://alive2.llvm.org/ce/z/TKnzUr
Differential Revision: https://reviews.llvm.org/D144283
In Codeview, the basic type of a complex represents the size
of an individual component rather than the sum of the real
and imaginary components.
Differential Revision: https://reviews.llvm.org/D143760
This is recommit of 2e416cdd52, fixed to be accepatble by GCC.
The original commit message is below.
With this change bitwise operations are allowed for FPClassTest
enumeration, it must simplify using this type. Also some functions
changed to get argument of type FPClassTest instead of unsigned.
Differential Revision: https://reviews.llvm.org/D144241