Remove the `-lower-global-dtors-via-cxa-atexit` escape hatch introduced
in D121736 [1], which switched the default lowering of global
destructors on MachO to use `__cxa_atexit()` to avoid emitting
deprecated `__mod_term_func` sections.
I added this flag as an escape hatch in case the switch caused any
problems. We didn't discover any, so we can now remove it.
[1] https://reviews.llvm.org/D121736
rdar://90277838
Differential Revision: https://reviews.llvm.org/D145715
Try to remove extra bitcasts around logicops if we're dealing with illegal types
Fixes the regressions in D145939
Differential Revision: https://reviews.llvm.org/D146032
Switch DAGISel over to UniformityAnalysis, which was one of the last remaining users of the DivergenceAnalysis.
No explosions seen during internal testing so this looks like a smooth transition.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D145918
This adds two new methods to ShuffleVectorInst, isInterleave and
isInterleaveMask, so that the logic to check if a shuffle mask is an
interleave can be shared across the TTI, codegen and the interleaved
access pass.
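As a rough illustration of the kind of check being shared, here is a simplified, standalone sketch. It is not the LLVM implementation: the name `isSimpleInterleaveMask`, its signature, and the restriction to fully defined masks (no undef lanes) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not LLVM's API. Returns true if Mask interleaves
// Factor equally sized sub-vectors of the concatenated inputs, e.g.
// <0,4,1,5,2,6,3,7> for Factor == 2. Undef (-1) lanes are rejected to keep
// the sketch short.
bool isSimpleInterleaveMask(const std::vector<int> &Mask, unsigned Factor) {
  assert(Factor >= 2 && "an interleave needs at least two inputs");
  if (Mask.empty() || Mask.size() % Factor != 0)
    return false;
  const std::size_t LaneLen = Mask.size() / Factor;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    // Result element I comes from sub-vector (I % Factor), at position
    // (I / Factor) within that sub-vector.
    const std::size_t Expected = (I % Factor) * LaneLen + I / Factor;
    if (Mask[I] < 0 || static_cast<std::size_t>(Mask[I]) != Expected)
      return false;
  }
  return true;
}
```

With this sketch, `{0, 4, 1, 5, 2, 6, 3, 7}` is accepted for a factor of 2, while the identity mask `{0, 1, 2, 3, 4, 5, 6, 7}` is not.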
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145971
Each target's `TargetInstrInfo` is responsible for announcing which code
patterns it is able to transform during the MachineCombiner pass.
Currently, these patterns are applied without preserving the debug
instruction number required by the InstrRef implementation of
LiveDebugValues. As such, we've seen a number of examples where debug
information is dropped for variables in InstrRef mode that were
otherwise available in VarLoc mode. This has been observed both in X86
and AArch examples.
This commit is an initial attempt at preserving said numbers by changing
the general (target agnostic) implementation of TargetInstrInfo: the
reassociation pattern must keep the debug number of the "top level"
instruction, i.e., the instruction whose value represents the final
value of the arithmetic expression. Intermediate values must have their
debug number dropped, as they have no equivalent value in the
unoptimized code.
Future work is required to update each target's
`TargetInstrInfo::genAlternativeCodeSequence` method.
Differential Revision: https://reviews.llvm.org/D145759
[DAGCombiner] handle more store value forwarding
When lowering calls on targets like PPC, some stack loads
are generated for by-value parameters. The CALLSEQ_START node
prevents such loads from being combined.
As suggested by @RolandF, this patch removes the unnecessary
loads for byval parameters by extending ForwardStoreValueToDirectLoad.
Reviewed By: nemanjai, RolandF
Differential Revision: https://reviews.llvm.org/D138899
Try to more aggressively narrow masks of extended values.
This is mainly for cases where the mask is trying to zero out any_extended upper bits, assuming we can zext/trunc the values for free.
This catches a few actual missed folds, as well as helps canonicalize a number of other cases which were being caught in isel etc.
Differential Revision: https://reviews.llvm.org/D145866
These attributes have been accepted in ELF for a while, and are generated by
Clang in some places, so it makes sense to support them on MachO too.
https://reviews.llvm.org/D143173
An instruction that is "always uniform" is so even if it occurs in an
irreducible cycle. The output produced by such an instruction may depend on the
implementation defined cycle hierarchy, but that does not affect the uniformity
of the output. In other words, an "always uniform" instruction is uniform even
if it is not m-converged.
Reviewed By: ruiling, ronlieb
Differential Revision: https://reviews.llvm.org/D145572
This change improves FS discriminators in the following ways:
(1) use call-stack debug information to generate
discriminators: the same (src/line) DILs can now have the same
discriminator value if they come from different call-stacks.
This effectively increases the usable discriminator values
for each round of FS discriminator pass.
(2) don't generate the FS discriminator for meta instructions
(i.e. instructions not emitted). This reduces the number of
discriminator conflicts (for the case where we run out of
discriminator bits for that pass).
(3) use the less expensive xxHash64 hashing.
These improvements should bring better performance for FSAFDO
and they should be used by default. But this change creates
incompatible FS discriminators. For the iterative profile users,
they might see a performance drop in the first release with
this change (due to the fact that the profiles have the old
discriminators and the compiler uses the new discriminator).
We have measured that this is not more than 1.5% on several
benchmarks. Note the degradation should be gone in the second
release and one should expect a performance gain over the binary
without this change.
One possible solution to the iterative profile issue would be
separating discriminators for profile-use and the ones emitted to
the binary. This would require a mechanism to allow two sets of
discriminators to be maintained and then phasing out the first
approach. This is too much churn in the compiler and the
performance implications do not seem to be worth the effort.
Instead, we put the changes under an option so iterative profile
users can do a gradual rollout of this change. We will make the
option default to true in a later patch and eventually
purge this option from the code base.
Differential Revision: https://reviews.llvm.org/D145171
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is custom-lowering strict_fpextend on
RISC-V to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145548
A pointer dereference was added (D141302) above an assert that checks
whether the pointer is null. This commit moves the assert above the
dereference and transforms it into an llvm_unreachable to better express
the intent that certain switch cases should never be reached.
Differential Revision: https://reviews.llvm.org/D145599
This patch resolves suboptimal code generation reported by https://github.com/llvm/llvm-project/issues/60571 .
DAGCombiner currently converts `(x or/xor const) + y` to `(x + y) + const` if this is valid.
However, if `.. + const` is broken down into a sequence of adds with carries, the benefit is not clear: in the reported issue it introduces two more add(-with-carry) ops (6 in total), whereas the optimal sequence needs only 4 add(-with-carry)s.
This patch resolves this issue by allowing this conversion only when (1) `.. + const` is legal or promotable, or (2) `const` is a sign bit because it does not introduce more adds.
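The arithmetic behind the rewrite can be checked with plain unsigned integers. The sketch below is illustrative only: the helper names are made up for this example, and it assumes that "valid" for `or` means `x` and `const` share no set bits, and that the sign-bit case refers to `xor` with the sign bit.

```cpp
#include <cassert>
#include <cstdint>

// (x | c) + y. The `or` behaves like an add only when x and c are disjoint.
std::uint32_t orThenAdd(std::uint32_t X, std::uint32_t C, std::uint32_t Y) {
  assert((X & C) == 0 && "or is only add-like for disjoint operands");
  return (X | C) + Y;
}

// The reassociated form (x + y) + c.
std::uint32_t addThenAddConst(std::uint32_t X, std::uint32_t C,
                              std::uint32_t Y) {
  return (X + Y) + C;
}

// (x ^ signbit) + y. Flipping the top bit equals adding it modulo 2^32,
// so no disjointness precondition is needed here.
std::uint32_t xorSignBitThenAdd(std::uint32_t X, std::uint32_t Y) {
  return (X ^ 0x80000000u) + Y;
}
```

For example, `orThenAdd(0xF0, 0x0F, 5)` and `addThenAddConst(0xF0, 0x0F, 5)` agree, as do `xorSignBitThenAdd(3, 4)` and `addThenAddConst(3, 0x80000000, 4)`.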
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D144116
I've pulled this change from D145404 to land in isolation because
I'm concerned the code might be more important than the test
coverage might suggest (NOTE: the code has no test coverage).
The merged store touches memory for other underlying objects, so mapping
the merged store to the first underlying object is not correct. For example
in https://github.com/llvm/llvm-project/issues/60744, the merged store is
not correctly analyzed as dependent with memory operations which are also
part of the merged store.
Fixes #60744
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D144711
This is generally done by InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D145177
This is generally done by InstCombine, but can be emitted as an
intermediate step and is cheap to handle.
Differential Revision: https://reviews.llvm.org/D145143
It turns out that there are relatively trivial, albeit rare, cases that
require a MaxDepth of more than 16 (see added test). However, we want to
avoid having to rely on a large fixed MaxDepth.
Since these cases are relatively rare, apply the following strategy:
1. Start with a low MaxDepth of 16 - if the entry node was not
reached, we can return (the common case).
2. If the entry node was reached, exponentially increase MaxDepth up
to some large limit that should cover all cases and guard against
stack exhaustion.
This retains the better performance with a low MaxDepth in the common
case, and in complex cases backs off and retries. On the whole, this is
preferable vs. starting with a large MaxDepth which would unnecessarily
penalize the common case where a low MaxDepth is sufficient.
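The retry strategy can be sketched generically as follows. This is not the code from the patch: `searchWithBackoff`, the `Attempt` callback, and the upper limit of 1024 are assumptions for illustration. `Attempt(MaxDepth)` stands in for one depth-limited search that returns true when the given depth was sufficient.

```cpp
#include <cassert>
#include <functional>

// Hypothetical driver. Tries a cheap, shallow search first and doubles the
// depth limit on failure. Returns the depth that succeeded, or 0 if even
// MaxLimit was not enough.
unsigned searchWithBackoff(const std::function<bool(unsigned)> &Attempt,
                           unsigned InitialDepth = 16,
                           unsigned MaxLimit = 1024) {
  assert(InitialDepth > 0 && InitialDepth <= MaxLimit);
  assert(MaxLimit <= (1u << 30) && "keeps MaxDepth *= 2 from overflowing");
  for (unsigned MaxDepth = InitialDepth; MaxDepth <= MaxLimit; MaxDepth *= 2)
    if (Attempt(MaxDepth))
      return MaxDepth; // Common case: the first, shallow attempt succeeds.
  return 0;
}
```

A search that needs a depth of 100 would be attempted at 16, 32 and 64 before succeeding at 128, while one that needs only 10 pays for a single shallow attempt.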
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D145386
To make legalization easier, the operands and outputs have the same size for
these ISD Nodes. When legalizing the results in PromoteIntegerResult the operands
are legalized to the same size as the outputs.
The ISD Node has two output/results, therefore the legalizing functions update
both results/outputs.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D144846
Move MaxDepth into the lambda, since it is not needed outside. This
fixes some compilers that complain about missing capture:
error C3493: 'MaxDepth' cannot be implicitly captured because no
default capture mode has been specified
Fixes: f693932fbea7 ("[SelectionDAG] Transitively copy NodeExtraInfo on RAUW")
GISel's CSE mechanism lazily inserts instructions into the CSE List
to improve on efficiency as well as efficacy of CSE
(for allowing partially built instructions to be fully built).
There's unfortunately a mutual recursion via
`handleRecordedInsts -> handleRecordedInst -> insertNode -> handleRecordedInsts`.
So this change simply records that we're already draining this list so we can just bail out on the recursion.
No changes to codegen are expected as we're still draining/handling the temporary
list via pop_back and we should get the same sequence of instructions
whether we call pop_back in a loop at the top level or recursively.
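The re-entrancy guard can be sketched in isolation as below. `LazyList` and its members are hypothetical stand-ins written for this example, not GISel's CSE classes; the point is only that a nested `drain()` call returns immediately while the outer loop keeps popping.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a lazily populated list; not GISel's CSEInfo.
class LazyList {
  std::vector<int> Pending;
  std::vector<int> Processed;
  bool Draining = false;

  void insert(int V) {
    // Inserting first tries to flush pending work, which is what creates
    // the mutual recursion with drain().
    drain();
    Processed.push_back(V);
  }

public:
  void record(int V) { Pending.push_back(V); }

  void drain() {
    if (Draining)
      return; // Already draining further up the stack, so bail out.
    Draining = true;
    while (!Pending.empty()) {
      int V = Pending.back();
      Pending.pop_back();
      insert(V);
    }
    assert(Pending.empty() && "drain must leave nothing pending");
    Draining = false;
  }

  const std::vector<int> &processed() const { return Processed; }
};
```

Recording 1, 2, 3 and then draining processes the entries in pop_back order (3, 2, 1), the same order the recursive version would produce.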
https://reviews.llvm.org/D145006
reviewed by: dsanders
During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.
Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.
This fixes the remaining pcsections-atomics.ll tests on X86.
v2: Optimize copyExtraInfo() deep copy. For now we assume that only
NodeExtraInfo that have PCSections set require deep copy. Furthermore,
limit the depth of graph search while pre-populating the visited set,
assuming the to-be-replaced subgraph 'From' has limited complexity. An
assertion catches if the maximum depth needs to be increased.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D144677
This is guarding a check for isTypeLegal so it should check
LegalTypes.
Fixes PR61111.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D145139
This patch adds AArch64 CodeGen support such that the type can be passed
and returned to/from functions, and also adds support to use this type in
load/store operations and PHI nodes.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D136862
As part of this work, removing `SDDbgValue::clearIsEmitted` originally added for
`dbg.addr` in 045c67769d7fe577fc38cccb6fb40fd814437447 was attempted, but it
appears some tests for `DBG_INSTR_REF` now depend on that behaviour as well, so
it was kept and comments were updated instead.
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898
Differential Revision: https://reviews.llvm.org/D144800
This removes the unused `FuncArgumentDbgValueKind::Addr` value originally added
by e24f5348798605a799c63ff09169d177d262cd37. The intent was to signal the
original intrinsic that marked a function argument, but the `Addr` part was
never used.
Part of `dbg.addr` removal
Discussed in https://discourse.llvm.org/t/what-is-the-status-of-dbg-addr/62898
Differential Revision: https://reviews.llvm.org/D144794
As noticed on D144789, when we have pairs of min/max nodes we often end up with multiple comparisons which we could reuse with commuted select ops, so check to see if a suitable SETCC already exists. This also allowed us to remove a similar X86 peephole.
There are other getSETCC cases where we could safely reuse other CondCodes as well - I've been trying to think of how we could reuse this logic in SelectionDAG but haven't found anything that always works well.
An alternative would be to have a TLI callback that returns a preferred CondCode from a list of options. I've noticed this helped fpclamptosat tests on some other targets (MVE + WebAssembly), but other tests suffered.
Differential Revision: https://reviews.llvm.org/D145065
Default MaxDivRemBitWidthSupported to 128, so that divisions larger
than 128 bits are always expanded, without requiring additional
configuration from the target.
Note that this may still emit calls to __udivti3 on 32-bit targets,
which likely don't have an implementation of that builtin. However,
I believe this is sufficient to fix
https://github.com/llvm/llvm-project/issues/60531, because Zig must
already be defining those builtins.
Differential Revision: https://reviews.llvm.org/D144871