llvm/llvm-project#147560 changed the legacy SelectionDAG pass to always
require TargetTransformInfoWrapperPass (rather than only when assertions
are enabled). `SelectionDAGISelLegacy::getAnalysisUsage` was not updated
in that PR, causing hard-to-track-down crashes in assertions-disabled
builds.
This makes the required update, which should fix the crashes seen on
some buildbots and by some users.
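The shape of the fix, roughly (a sketch, not the verbatim patch; the
delegation to the base class is illustrative):
```cpp
void SelectionDAGISelLegacy::getAnalysisUsage(AnalysisUsage &AU) const {
  // Previously only required in assertion-enabled builds; after #147560
  // the TTI wrapper pass must be required unconditionally.
  AU.addRequired<TargetTransformInfoWrapperPass>();
  MachineFunctionPass::getAnalysisUsage(AU);
}
```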
This reverts commit 8ac7210b7f0ad49ae7809bf6a9faf2f7433384b0.
This breaks building the AArch64 backend; e.g. see
https://github.com/llvm/llvm-project/pull/144947
Revert to unbreak the build.
Also reverts the follow-up commit 1e76f012db3ccfaa05e238812e572b5b6d12c17e.
If a kernel is known to be executing only a single lane, IR
UniformityAnalysis will take note of that (via
GCNTTIImpl::hasBranchDivergence) and report that all values are uniform.
SelectionDAG's built-in divergence tracking should do the same.
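Roughly, the intent (an illustrative sketch; the exact hook signature is
an assumption):
```cpp
// If the target reports no branch divergence for this function (e.g. a
// known single-lane kernel, via GCNTTIImpl::hasBranchDivergence), treat
// every value as uniform and skip per-node divergence propagation.
if (!TTI->hasBranchDivergence(&F))
  return;
```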
Now that we can no longer generate any debug intrinsics, delete a
variety of code paths where they're handled. For the most part these are
plain deletions; in others I've tweaked comments to remain coherent or
added a type to (what were) type-generic lambdas.
This doesn't cover all the DbgInfoIntrinsic call sites, but it handles
most of the simple scenarios.
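For illustration, the typical shape of a deleted code path (a sketch,
not an actual removed hunk; `process` is hypothetical):
```cpp
for (Instruction &I : BB) {
  // Dead guard: debug intrinsics are no longer generated, so this branch
  // can never be taken and the whole check can be deleted.
  if (isa<DbgInfoIntrinsic>(&I))
    continue;
  process(I);
}
```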
Co-authored-by: Nikita Popov <github@npopov.com>
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call
for every function return.
Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR #95904, submitted by
@mahesh-attarde. We have reverted that pass in favor of this new
approach. We have also abandoned the AArch64-specific implementation
of the same pass in PR #121938 in favor of this more general solution.
The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation
The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branch (sketched below)
- Needs no post-processing or basic-block manipulation
- Disables the optimization at -Oz
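Conceptually, the emitted check now has this control flow (a C-like
sketch of the fast/cold paths, not the actual emitted machine code;
`StackGuardSlot` is a stand-in for the function's cookie stack slot):
```cpp
// Fast path, emitted inline at function return:
if (StackGuardSlot != __security_cookie)   // inline comparison
  __security_check_cookie(StackGuardSlot); // cold call, only on failure
// normal return path continues here
```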
Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.
These were identified by misc-include-cleaner. I've filtered out those
that break builds. I'm also staying away from llvm-config.h, config.h,
and Compiler.h, which would likely cause platform- or compiler-specific
build failures.
It is used to mark a value that we are sure is not of some fcType.
Examples include:
* An argument of a function marked with nofpclass
* The output value of an intrinsic known not to be of some type
Subsequent operations can then make assumptions based on this.
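For example, a later pass can query which classes a call's result cannot
be (a sketch; the accessor name is from recent LLVM and should be
treated as an assumption):
```cpp
// FPClassTest is a bitmask of IEEE-754 class bits (fcNan, fcInf, ...).
FPClassTest KnownNot = CB.getRetNoFPClass(); // classes the result cannot be
if ((KnownNot & fcNan) == fcNan)
  ; // safe to assume the result is never a NaN (quiet or signaling)
```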
If the SDNode is used, it can pick up the wrong result number, for
example looking at the known bits of the first result when it should be
looking at the second. The SDValue is already present as the
SelectCodeCommon checks move from parent to child; pass the SDValue
through to CheckNodePredicate as Op so that it can use it if necessary.
SDNode *N is still generated, keeping most PatFrags the same.
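A sketch of the resulting predicate shape (illustrative, not the
verbatim generated code):
```cpp
bool CheckNodePredicate(SDValue Op, SDNode *N) const {
  // Op pins the exact result number in use, so a known-bits query looks
  // at the right result (e.g. the second) instead of defaulting to the
  // first.
  return CurDAG->computeKnownBits(Op).isNonNegative();
}
```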
Fixes #137274
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar functions
feed into instruction insertion positions. Call sites where that's
necessary were updated a year ago, but to ensure some type safety we'd
like all calls to getFirstNonPHI to use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
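A representative before/after for these call sites (sketch; `Builder` is
a hypothetical IRBuilder):
```cpp
// Before: a raw instruction pointer, which drops the debug-info bit
// when it feeds an insertion position.
Instruction *InsertPt = BB->getFirstNonPHI();
// After: the iterator carries the debug-info bit that RemoveDIs needs.
BasicBlock::iterator InsertIt = BB->getFirstNonPHIIt();
Builder.SetInsertPoint(InsertIt);
```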
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.
This should be an NFC change for targets that enable AA during codegen
(such as AArch64), but it also gives a nice compile-time improvement in
some cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041
Note: This follows Nikita's suggestion on #123787.
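The pattern, roughly (a sketch assuming a per-function `AAResults &AA`
and two memory instructions `SI`/`LI` are in scope):
```cpp
// BatchAAResults caches query results; this is sound here because the
// IR no longer changes once we are in SelectionDAG.
BatchAAResults BatchAA(AA);
if (BatchAA.isNoAlias(MemoryLocation::get(SI), MemoryLocation::get(LI)))
  ; // the two accesses provably don't overlap
```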
With this change, targets are no longer required to place memory / strict-fp opcodes after the special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).
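For reference, the old marker scheme amounted to a range check like this
(sketch):
```cpp
// Any target opcode numbered at or above the marker was assumed to
// touch memory, forcing a specific ordering of each target's opcode enum.
static bool isTargetMemoryOpcode(unsigned Opcode) {
  return Opcode >= ISD::FIRST_TARGET_MEMORY_OPCODE;
}
```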
Pull Request: https://github.com/llvm/llvm-project/pull/119969
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range-based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo(), which also prevents
using range-based for loops. I plan to move this into SDUse in a
follow-up patch.
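The two resulting iteration styles look like this (illustrative):
```cpp
for (SDUse &U : N->uses())      // use_iterator: dereferences to SDUse&
  if (U.getUser()->getOpcode() == ISD::ADD)
    ; // inspect the use itself, not just the using node
for (SDNode *User : N->users()) // user_iterator: dereferences to SDNode*
  ; // visit each node that uses N
```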
This function is most often used in range-based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than an SDUse *, so users() is a better
name.
I've long been annoyed that we can't write a range-based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
In SelectionDAG, `TargetTransformInfo::hasBranchDivergence()` can be
called when both `NDEBUG` and `LLVM_ENABLE_ABI_BREAKING_CHECKS` are
enabled. In that case, the class member `TTI` is still initialized to
`nullptr`, causing a segfault.
Fix this by ensuring that all the calls to `hasBranchDivergence` and
`VerifyDAGDivergence` only occur when `NDEBUG` is disabled, and
`LLVM_ENABLE_ABI_BREAKING_CHECKS` is enabled.
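The guard now looks roughly like this (a sketch; the exact preprocessor
condition mirrors the description above):
```cpp
#if !defined(NDEBUG) && LLVM_ENABLE_ABI_BREAKING_CHECKS
  // TTI is only initialized in this configuration, so it is safe to use.
  if (TTI->hasBranchDivergence())
    CurDAG->VerifyDAGDivergence();
#endif
```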
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
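Illustrative instances of the three fix patterns (invented values; the
constructor signature with `implicitTrunc` is per #106524):
```cpp
APInt A(8, -1, /*isSigned=*/true);   // fix 1: set the correct isSigned
APInt B(8, 300, /*isSigned=*/false,
        /*implicitTrunc=*/true);     // fix 2: explicitly opt in to truncation
APInt C = APInt(16, 300).trunc(8);   // fix 3: compute inside APInt instead
```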
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for `this` pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend-lifetimes flag is intended to
eventually be enabled by `-Og`, as discussed in the RFC here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or `this`)
a fake.use of that variable's value is inserted at the end of its scope.
This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
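From C++, emitting such a fake use at the end of a variable's scope
looks roughly like this (a sketch; the insertion-point name is
hypothetical and the exact builder overload is an assumption):
```cpp
// "Use" V with no other effect, keeping it live until after this point.
IRBuilder<> Builder(ScopeEndInsertPt);
Builder.CreateIntrinsic(Intrinsic::fake_use, /*Types=*/{}, /*Args=*/{V});
```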
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
RFC:
https://discourse.llvm.org/t/rfc-extend-machine-value-type-from-uint8-t-to-uint16-t/80274
compile-time-tracker:
https://llvm-compile-time-tracker.com/compare.php?from=4b9fab591916eec9fd1942f37afe3b137b564089&to=177d28247efe5a4d59a8d8150b4daf01e4f57d74&stat=wall-time
Currently 208 out of 256 MVTs are used and we will run out soon, so
ultimately we need to extend the original `MVT::SimpleValueType` from
`uint8_t` to `uint16_t` to accommodate more types.
The `MatcherTable` uses `unsigned char` to encode the matcher code, so
the extended MVTs no longer fit into the table; we therefore use VBR to
encode them, as we already do for other values wider than 8 bits.
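VBR here is the matcher table's usual variable-length encoding: seven
payload bits per byte, with the high bit marking that more bytes follow
(sketch):
```cpp
static void emitVBR(uint64_t Val, std::vector<unsigned char> &Table) {
  while (Val >= 128) {
    Table.push_back((Val & 127) | 128); // high bit set: more bytes follow
    Val >>= 7;
  }
  Table.push_back(Val); // final byte, high bit clear
}
```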
The statistics below show the change in "Total Array size" of the
matcher table in each generated file:
```
Table                        Before     After  Change(%)
WebAssemblyGenDAGISel.inc     23576     23775      0.844
NVPTXGenDAGISel.inc          173498    173498      0
RISCVGenDAGISel.inc         2179121   2369929      8.756
AVRGenDAGISel.inc              2754      2754      0
PPCGenDAGISel.inc            163315    163617      0.185
MipsGenDAGISel.inc            47280     47447      0.353
SystemZGenDAGISel.inc         56243     56461      0.388
AArch64GenDAGISel.inc        467893    487830      4.261
MSP430GenDAGISel.inc           8069      8069      0
LoongArchGenDAGISel.inc       78928     79131      0.257
XCoreGenDAGISel.inc            3432      3432      0
BPFGenDAGISel.inc              3733      3733      0
VEGenDAGISel.inc              65174     66456      1.967
LanaiGenDAGISel.inc            2067      2067      0
X86GenDAGISel.inc            628787    636987      1.304
ARMGenDAGISel.inc            170968    171036      0.040
HexagonGenDAGISel.inc        155764    155764      0
SparcGenDAGISel.inc            5762      5798      0.625
AMDGPUGenDAGISel.inc         504356    504463      0.021
R600GenDAGISel.inc            29785     29785      0
```
The statistics below show the peak runtime memory usage when compiling a
simple C program with:
`/bin/time -v clang -target $TARGET -O3 -c test.c`
```
int test(int a) {
return a * 3;
}
```
```
Target       Before(kbytes)  After(kbytes)  Change(%)
wasm64               110172         110088     -0.076
nvptx64              109784         109980      0.179
riscv64              114020         113656     -0.319
avr                  110352         110068     -0.257
ppc64                112612         112476     -0.120
mips64               113588         113668      0.070
systemz              110860         110760     -0.090
aarch64              113704         113432     -0.239
msp430               110284         110200     -0.076
loongarch64          111052         110756     -0.267
xcore                108340         108020     -0.295
bpf                  110620         110708      0.080
ve                   110960         110920     -0.036
lanai                110180         109960     -0.200
x86_64               113640         113304     -0.296
arm64                113540         113172     -0.324
hexagon              114620         114684      0.056
sparc                110412         110136     -0.250
amdgcn               118164         117144     -0.863
r600                 111200         110508     -0.622
```
This is only used by x86, and only in the AsmPrinter module pass. I
think implementing this by looking at the underlying IR types instead
of the selected instructions is a pretty horrifying approach, but the
information is still available in the AsmPrinter.
This is https://reviews.llvm.org/D123933 resurrected.
I still don't know what the point of emitting _fltused is, but this
approach of looking at the IR types probably isn't the right way to
do this in the first place. If the intent is to report any FP
instructions, this will miss any implicitly introduced during codegen.
I also don't know why just unconditionally emitting it isn't an option.
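The IR-type scan amounts to something like this (a hedged sketch of the
approach, not the verbatim patch):
```cpp
// Report the module as "using FP" if any instruction produces or
// consumes a floating-point (or FP vector) value. Implicit FP introduced
// later during codegen is invisible to this check, hence the caveat above.
static bool moduleUsesFloatingPoint(const Module &M) {
  for (const Function &F : M)
    for (const BasicBlock &BB : F)
      for (const Instruction &I : BB) {
        if (I.getType()->isFPOrFPVectorTy())
          return true;
        for (const Value *Op : I.operands())
          if (Op->getType()->isFPOrFPVectorTy())
            return true;
      }
  return false;
}
```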
The last review mentioned that ARM targets might want to emit this too,
but I'm not going to go fix that. If someone wants to emit this on ARM,
they can move this into a common helper or analysis somewhere.