The previous position of llvm.protected.field.ptr lowering for loads
and stores was problematic as it not only inhibited optimizations such
as DSE (as stores to a llvm.protected.field.ptr were not considered to
must-alias stores to the non-protected.field pointer) but also required
changes to other optimization passes to avoid transformations that would
reduce PFP coverage.
Address this by moving the load/store part of the lowering to
InstCombine, where it will run earlier than the PFP-breaking and
AA-relying transformations. The deactivation symbol, null comparison
and EmuPAC parts of the lowering remain in PreISelLowering.
Now that the transformation inhibitions are no longer needed, remove them
(i.e. partially revert #151649, and revert #182976).
This change resulted in a 2.4% reduction in Fleetbench .text size and
the following improvements to PFP performance overhead for BM_PROTO_Arena
on various microarchitectures:
before after
Apple M2 Ultra 3.5% 3.3%
Google Axion C4A 3.3% 2.9%
Google Axion N4A 2.7% 2.2%
Reviewers: fmayer, nikic, vitalybuka
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/186548
This fixes a rematerializer issue wherein re-creating the interval of a
non-rematerializable super-register defined over multiple MIs, some of
which defining entirely dead sub-registers, could cause a crash when
changing the order of sub-definitions (for example during scheduling)
because the re-created interval could end up with multiple connected
components, which is illegal. The solution is to split separate
components of the interval in such cases. The added unit test crashes
without that added behavior.
The rematerializer implements support for rolling back
rematerializations by modifying MIs that should normally be deleted in
an attempt to make them "transparent" to other analyses. This involves:
1. setting their opcode to DBG_VALUE and
2. setting their read register operands to the sentinel register.
This approach has several drawbacks.
1. It forces the rematerializer to support tracking these "dead MIs"
(even if support is optional, these data-structures have to exist).
2. It is not actually clear whether this mechanism will interact well
with all other analyses. This is an issue since the intent of the
rematerializer is to be usable in as many contexts as possible.
3. In practice, it has shown itself to be relatively error-prone.
This commit removes rollback support from the rematerializer and moves
those capabilities to a rematerializer listener than can be instantiated
on-demand and implements the same functionality on top of standard
rematerializer operations. The rematerializer now actually deletes MIs
that are no longer useful after rematerializations, and has support for
re-creating them on-demand without requiring additional tracking on its
part.
The visited set can grow rather large and we can use an unused field in
SDNode to store the same information without the use of a hash set.
This improves compile times: stage2-O3 -0.14%.
Fixes#189481
Implement ISD::SPLAT_VECTOR in SelectionDAG::computeKnownFPClass to
correctly propagate floating-point properties from scalar operands to
vectors.
Added AArch64 and RISC-V test coverage
Match naming convention for other m_Specific* matchers, and frees up the
m_Opc() matcher for future use in #84940 to allow us to capture the
opcode of a unknown binop
Moving to m_SpecificOpc does mess up the formatting in a few places,
I've tried to refactor to use the m_Value(SDValue, ....) matcher where I
can to retrieve some whitespace
For a given element, I believe A is only 0 when the divisor is INT_MIN.
The only way for NeedToApplyOffset to be false after processing all
elements, is for all divisors to be INT_MIN. If all divisors are
INT_MIN, then all divisors are a power of 2 and we wouldn't do the
transform.
Fix issue reported on
https://github.com/llvm/llvm-project/pull/188296#issuecomment-4179103756
`SwiftErrorValueTracking` holds per-function state used by
`IRTranslator`.
On targets where `TargetLowering::supportSwiftError()` is false, (e.g.
wasm) `SwiftErrorValueTracking::setFunction()` exits early.
Historically, that early return happened before clearing per-function
containers, and pointer members (including `SwiftErrorArg`) had no
in-class initialization.
The bad case is a function with a swifterror argument on such a target:
`IRTranslator` uses `SwiftError.getFunctionArg()` without checking
`supportSwiftError()` and this could read an uninitialized
`SwiftErrorArg` value. (SelectionDAG gates the `getFunctionArg` usages
behind `supportSwiftError()`, so it's specific to GlobalISel)
29391328ab66 added [a first test
case](llvm/test/CodeGen/WebAssembly/GlobalISel/irtranslator/args-swiftcc.ll)
that satisfies:
- the target is `supportSwiftError` = false
- use swiftcc
- use GlobalISel
and it made the issue observable with sanitizer builds. This commit
fixes the per-function container reinitialization and defensively add
explicit pointer member initializations.
This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.
Split off from #135148, and the final patch in the series for #135148
The evaluation order of function arguments is unspecified by the C++
standard. We had two getNode calls as function arguments which causes
the nodes to be created in a different order depending on the compiler
used. This patch moves them to their own variables to ensure they are
called in the same order on all compilers.
Possible fix for #190148.
If the divisor is INT_MIN, we can still treat it like any other power of
2. We'll fold i32 (seteq (srem X, INT_MIN)) to
(setule (rotr (add (mul X, 1), INT_MIN), 31), 1). Alive2 says this is
correct https://alive2.llvm.org/ce/z/vjzqKk.
The multiply is a NOP, the add toggles the sign bits. The rotate puts
the lowest 31 bits of into the upper 31 bits. The sign bit is now in the
LSB. The compare checks if the upper 31 bits are 0.
srem X, INT_MIN has a remainder of 0 if X is 0 or INT_MIN which is
equivalent to checking if the uppper 31 bits are 0 after the rotate.
I don't think we need to add any constant for power of 2 but toggling
the sign bit like we do now doesn't hurt.
Fixes#181643
For queries like `isKnownToBeAPowerOfTwo(V, OrZero=true)`, if an operand
is known to be "pow2-or-zero" but not strictly non-zero power-of-two,
the min/max case currently returns false even when the result remains
pow2-or-zero.
For instance:
- `A = select cond, 4, 0` (A is pow2-or-zero)
- `R = umin(A, 16)`
`R` is always in `{0, 4}` and querying `isKnownToBeAPowerOfTwo(R,
OrZero=true)` should be true.
Added unitests for baseline and failing case and now propagating
correctly to `OrZero` and `DemandedElts`
Initially added in #187709. It was reverted in #188833, because
[llvm-clang-x86_64-sie-win](https://lab.llvm.org/buildbot/#/builders/46/builds/32873)
was failing in
`cross-project-tests/debuginfo-tests/dexter-tests/nrvo.cpp`.
The test passed for me locally. After checking on another machine, I
found that `S_DEFRANGE_REGISTER_REL_INDIR` is only supported by
dbgeng/WinDbg from Windows 10.0 Build 19041 (released 2020) onwards.
SDKs before this will fail to read the value. That buildbot is on
Windows 10.0 Build 17763.
I'm not sure if we should make the generation of that record
conditional. Debuggers that can't read the record will skip it. They'll
still see that there's some local variable, but won't be able to display
the value.
As far as I know, users of older Windows 10 builds should be able to
install a newer Windows SDK and use the WinDbg from that version. But I
haven't tested that.
Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(c.f. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.
This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This addresses an illegal mutation kind, where gisel would hit an
assert. It expands vector elements for non-power2 elements or elements
less that i8 to a power of 2.
A fix to handle vector types correctly was needed in LegalizerHandler.
Fixes#185411
Do not remove IMPLICIT_DEF of a physreg unless all uses have an undef
flag added. Previously, only the first use instruction had undef flags
added. This will cause a failure in machine instruction verification.
Multi-instruction uses tested in AMDGPU/multi-use-implicit-def.mir and
X86/multi-use-implicit-def.mir.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
This makes the rematerializer able to rematerialize MIs at the end of a
basic block. We achieve this by tracking the parent basic block of every
region inside the rematerializer and adding an explicit target region to
some of the class's methods. The latter removes the requirement that we
track the MI of every region (`Rematerializer::MIRegion`) after the
analysis phase; the class member is therefore deleted.
This new ability will be used shortly to improve the design of the
rollback mechanism.
A better version of #175801 . see that for more info.
Fixes#185467
The original patch was checking the correctness of the transformation
based on the original Op1 , which was then negated (in the case of
IsAdd). This patch fixes that issue by inverting the sign bit in that
case.
Also pushed a slight nfc there to simplify the code and remove some
duplication.
alive2 proofs:
abds: https://alive2.llvm.org/ce/z/oJQPss
abdu: https://alive2.llvm.org/ce/z/HfPF5q
Note that the regression test is not (wrongly) affected anymore by the
patch (as it did before)
Remove LowerFCanonicalize. Added fallback for cases when the scalar type also has its Custom lowering to avoid regressions on AMDGPU and SystemZ.
Fixes#143862
This change adds support for adding listeners to the target-independent
rematerializer; listeners can catch certain rematerialization-related
events to implement some additional functionality on top of what the
rematerializer already performs.
This is NFC and has no user at the moment, but the plan is to have
listeners start being responsible for secondary/optional functionalities
that are at the moment integrated with the rematerializer itself. Two
examples of that are:
1. rollback support (currently optional), and
2. region tracking (currently mandatory, but not fundamentally necessary
to the rematerializer).
Enable pseudo probes emission for MachO. Due to the 16 character limit
of MachO segment and section, the file sections will be
`__PSEUDO_PROBE,__probes` and `__PSEUDO_PROBE,__probe_descs`.
This change adds a mechanism to stop emitting `.debug_pubname`,
`.debug_pubtypes` sections for a particular target.
This is particularly useful for cases where IR is generated by frontends
that do not explicitly disable these sections (as `Clang` does for
`NVPTX`), but still use `llc` for code generation.
Currently, only `NVPTX` uses this to disable these sections.
This is a second attempt at "[SelectionDAG] Expand
CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220)
That PR had to be reverted in 7d39664a6ae8daaf186b65578492244d96a50bf2
because we had crashes on AMDGPU since we didn't have scalarization
support, and other crashes on PowerPC because we didn't handle the case
when a vector needed widened. Tests for these are added in
AMDGPU/cttz-elts.ll, RISCV/rvv/cttz-elts-scalarize.ll and
PowerPC/cttz-elts.ll.
The former crash has been fixed by adding
DAGTypeLegalizer::ScalarizeVecOp_CTTZ_ELTS.
The second crash has been fixed by reworking
TargetLowering::expandCttzElts. The expansion for CTTZ_ELTS is nearly
identical to VECTOR_FIND_LAST_ACTIVE, except it uses a reverse step
vector and subtracts the result from VF. The easiest way to fix these
crashes without introducing regressions is to reuse the
VECTOR_FIND_LAST_ACTIVE expansion which already handles the case where
the vector needs widened.
This means that the node now needs to take in a boolean vector argument
and uses VSELECT instead of an AND to zero out inactive lanes, so the op
promotion code has also been shared.
The original differential revision is https://reviews.llvm.org/D153638
Reverted in
f5b5a30858
because of causing a clang crash.
This patch relands it with the crash fixed. Call `DTU->flush()` in each
iteration of `while (MadeChange)`
loop, flush all awaiting BasicBlocks deletion, and prevent iterator
invalidation.
The truncating store analogue of #181104.
Adds `Alignment` and `AddrSpace` parameters to
`TargetLoweringBase::getTruncStoreAction` and dependents, and introduces
a `getCustomTruncStoreAction` hook for targets to customize legalization
behavior using this new information.
This change is fully backwards compatible from the target's point of
view, with `setTruncStoreAction` having identical functionality. The
change is purely additive.
Remove ConstantFPSDNode handling from isKnownNeverNaN and fallback to
using computeKnownFPClass if there are no opcode matches in
isKnownNeverNaN
The test check changes are due to isKnownNeverNaN not handling
UNDEF/POISON but computeKnownFPClass does (POISON in particular now
returns isKnownNeverNaN == true, preventing a ISD::FCANONICALIZE call in
expandFMINNUM_FMAXNUM).
Move some functions around so that the CallBrInst processing is
contained. The 'static' functions don't need to be declared at the top;
just place them before the calls. Fix the naming to use lower-case for
the first letter of function names.
A recently introduced local is only used in an assertion which means we
get -Wunused-variable in release+noasserts builds. Mark it
[[maybe_unused]] rather than inlinine the definition given there are
multiple uses within the assert.