Similar to commit 245491a9f384e4c53421196533c2a2b693efaf8d for DwarfDebug.
This completely disables the expensive MCFragment walk code in
`AttemptToFoldSymbolOffsetDifference` when compiling sqlite3.i for
macOS.
In the future, we should try enabling the MCFragment walk only for
constructs like `.if . -_start == 1` and `.subsection a-b` and
remove these `setUseAssemblerInfoForParsing`.
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
Previously this assumed that `LLVM_ENABLE_ABI_BREAKING_CHECKS` would
always be enabled in this case, if it's not `TTI` does not exist.
Introduced in 7652a59407018c057cdc1163c9f64b5b6f0954eb
These blocks usually show up in the form of branches within inline
assembly. Since it's hard to rewire them, we fully omit paths with such
blocks from path cloning.
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
Since pointers in memory, as well as the index type are both 32 bits,
but in registers pointers are 64 bits, the mask generated by
llvm.ptrmask needs to be zero-extended.
Fixes: #94075
Fixes: rdar://125263567
This reduce the time complexity of the main loop of `findCandidates()`
method from $O(n^2)$ to $O(n \log n)$.
For small $n$, the modification does not regress the build time, but it
helps significantly when $n$ is large.
For one application, this reduces the runtime of the main loop from 120
seconds to 28 seconds.
This is the first commit for an enhanced version of machine outliner --
see
[RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
We add a feature that prevents the GlobalMerge pass from considering
data smaller than a minimum size in bytes for merging.
The MinSize is set in 3 ways:
1. If global-merge-min-data-size is explicitly set, then it uses that
value.
2. If SmallDataLimit is set and non-zero, then SmallDataLimit + 1 is
used.
3. Otherwise, 0 is used, which means all sizes are considered for
merging.
We found that this feature allowed us to see the benefit of the
GlobalMerge pass while eliminating some merging that was not beneficial.
This feature allowed us to enable the GlobalMerge pass on RISC-V in our
downstream by default because it led to improvements on multiple
benchmark suites.
I plan to post a separate patch to propose enabling this by default on
RISC-V. But I do not want that discussion to be part of the discussion
of adding this feature, so I am keeping the patches separate.
This avoids the pitfall where we set the uwtable to none:
```
func.setUWTableKind(llvm::UWTableKind::None)
```
`Attribute::getAsString()` would see an unknown attribute and fail an
assertion. In this patch, we assert that we do not see a None uwtable
kind.
This also skips the check of `UWTableKind::Async`. It is dominated by
the check of `UWTableKind::Default`, which has the same enum value
(nfc).
We were missing the PoisonOnly argument (so Depth + 1 was being used instead and the default Depth = 0 argument then being silently used)
Fixes#94145 and serves as the test case for 9e22c7a0ea87228dffcdfd7ab62724f72e0b3e30
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1
It still breaks EXPENSIVE_CHECKS build. Sorry.
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.
Since #93182 we can now call computeKnownBits inside getValidMaximumShiftAmount to determine the bounds of the shift amount ensuring that it wasn't poison, meaning if we did freeze the ahift amount, isGuaranteedNotToBeUndefOrPoison would then fail as we can't call computeKnownBits through FREEZE for potentially poison values.
I'm still reducing a decent test case but wanted to get the buildbot fix ASAP.
The getValidShiftAmountConstant/getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant helpers only worked with constant shift amounts, which could be problematic after type legalization (e.g. v2i64 might be partially scalarized or split into v4i32 on some targets such as 32-bit x86, Thumb2 MVE).
This patch proposes we generalize these helpers to work with ConstantRange+KnownBits if a scalar/buildvector constant isn't available.
Most restrictions are the same - the helper fails if any shift amount is out of bounds, getValidShiftConstant must be a specific constant uniform etc.
However, getValidMinimumShiftAmount/getValidMaximumShiftAmount now can return bounds values that aren't values in the actual data, as they are based off the common KnownBits of every vector element.
This addresses feedback on #92096
This adds codegen support for the "ptrauth" operand bundles, which can
be used to augment indirect calls with the equivalent of an
`@llvm.ptrauth.auth` intrinsic call on the call target (possibly
preceded by an `@llvm.ptrauth.blend` on the auth discriminator if
applicable.)
This allows the generation of combined authenticating calls
on AArch64 (in the BLRA* PAuth instructions), while avoiding
the raw just-authenticated function pointer from being
exposed to attackers.
This is done by threading a PtrAuthInfo descriptor through
the call lowering infrastructure, eventually selecting a BLRA
pseudo. The pseudo encapsulates the safe discriminator
computation, which together with the real BLRA* call get emitted
in late pseudo expansion in AsmPrinter.
Note that this also applies to the other forms of indirect calls,
notably invokes, rvmarker, and tail calls. Tail-calls in particular
bring some additional complexity, with the intersecting register
constraints of BTI and PAC discriminator computation.
However this doesn't currently support PAuth_LR tail-call variants.
This also adopts an x8+ allocation order for GPR64noip, matching
GPR64.
The current way of lowering `llvm.clear_cache` is a bit unusual. As
suggested by Matt Arsenault we are better off using an ISD node.
This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall
by default named `__clear_cache` and the default legalisation is a
libcall.
This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE`
needed by RISC-V on some platforms.
This patch adds basic support for scalable vector types in load & store
instructions for AArch64 with GISel.
Only scalable vector types with a 128-bit base size are supported, e.g.
`<vscale x 4 x i32>`, `<vscale x 16 x i8>`.
This patch adapted some ideas from a similar abandoned patch
[https://github.com/llvm/llvm-project/pull/72976](https://github.com/llvm/llvm-project/pull/72976).
This is an NFC patch to move DIExpressionCursor to DebugInfoMetada.h, so
that it can be used by classes in that header file.
Specifically, I want to use DIExpressionCursor in a subsequent patch:
https://github.com/llvm/llvm-project/pull/71718
This was only preserving the flags on the setcc, not the new select.
This was missing presumably due to getSelect not having a flags argument
until recently. Avoids regressions in a future commit.
Despite the comment, this isn't used to size bit vectors or tables.
That's done by VALUETYPE_SIZE. MAX_ALLOWED_VALUETYPE is only used by
some static_asserts that compare it to VALUETYPE_SIZE.
This patch removes it and most of the static_asserts. I left one where I
compared VALUETYPE_SIZE to token which is the first type that isn't part
of the VALUETYPE range. This isn't strictly needed, we'd probably catch
duplication error from VTEmitter.cpp first.
shouldRealignStack/canRealignStack are repeatedly called in PEI (through
hasStackRealignment). Checking function attributes is expensive, so
cache this data in the MachineFrameInfo, which had most data already.
This slightly changes the semantics of `MachineFrameInfo::ForcedRealign`
to be also true when the `stackrealign` attribute is set.
These are known to be inbounds, create them as such. NFCI because
constant expression construction currently already infers this.
Also drop the unnecessary zero-index GEP: This is equivalent to
the pointer itself nowadays.
The operation selection logic here doesn't really work when vector types
need to be split. This was also dropping the flags, and losing nnan made
the combine from select back to fmin/fmax unrecoverable. Preserve the
flags to assist a future commit.
Previously, we just check if the source is a virtual register and
this prevents some potential hoists.
We can see some improvements in AArch64/RISCV tests.
This adds VPSExtPromotedInteger and VPZExtPromotedInteger and uses them
to promote many arithmetic operations.
VPSExtPromotedInteger uses a shift pair because we don't have
VP_SIGN_EXTEND_INREG yet.
Handle these in promote float and vector widening. Currently we happen
to avoid emitting these unless legal or custom. Avoids regression in
a future commit which wants to unconditionally emit these.
In #91747, we changed the SDNode from `X86ISD::SUB` (FROM) to
`X86ISD::CCMP`
(TO) in the DAGCombine. The value type of `X86ISD::SUB` can be `i8, i32`
while the value type of `X86ISD::CCMP` is i32. This breaks the
assumption
that the value type should match after the combine and triggers the
error
```
SelectionDAG.cpp:10942: void
llvm::SelectionDAG::transferDbgValues(llvm::SDValue, llvm::SDValue,
unsigned int, unsigned int, bool): Assertion `FromNode && ToNode &&
"Can't modify dbg values"' failed.
```
when running tests
llvm/test/CodeGen/X86/apx/ccmp.ll
llvm/test/CodeGen/X86/apx/ctest.ll
in Release build when LLVM_ENABLE_ASSERTIONS is on.
In this patch, we fix it by creating a merged value.
DAG combine for `CCMP` and `CTESTrr`:
```
and/or(setcc(cc0, flag0), setcc(cc1, sub (X, Y)))
->
setcc(cc1, ccmp(X, Y, ~cflags/cflags, cc0/~cc0, flag0))
and/or(setcc(cc0, flag0), setcc(cc1, cmp (X, 0)))
->
setcc(cc1, ctest(X, X, ~cflags/cflags, cc0/~cc0, flag0))
```
where `cflags` is determined by `cc1`.
Generic DAG combine:
```
cmp(setcc(cc, X), 0)
brcond ne
->
X
brcond cc
sub(setcc(cc, X), 1)
brcond ne
->
X
brcond ~cc
```
Post DAG transform: `ANDrr/rm + CTESTrr -> CTESTrr/CTESTmr`
Pattern match for `CTESTri`:
```
X= and A, B
ctest(X, X, cflags, cc0/, flag0)
->
ctest(A, B, cflags, cc0/, flag0)
```
`CTESTmi` is already handled by the memory folding mechanism in MIR.