Similar to https://wg21.link/n4279
For example, insert_or_assign can be used to simplify
CodeGenModule::AddDeferredUnusedCoverageMapping in
clang/lib/CodeGen/CodeGenModule.cpp
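As a rough illustration of the semantics, here is a minimal sketch using `std::map::insert_or_assign`, the C++17 standard-library member that N4279 introduced. The helper name `recordLatest` is hypothetical, and the LLVM container member is only assumed to behave analogously.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: store the latest value for a key, overwriting any
// earlier one. Returns true if a new entry was created.
inline bool recordLatest(std::map<std::string, int> &M,
                         const std::string &Key, int Value) {
  // Unlike insert/try_emplace, insert_or_assign overwrites an existing value.
  // Unlike operator[], it reports whether the key was newly inserted.
  auto Result = M.insert_or_assign(Key, Value);
  assert(Result.first->second == Value);
  return Result.second;
}
```

A single call replaces the usual "find, then either assign or insert" sequence.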
Add subrange tracking and handling for LiveIntervals during PHI
elimination.
This requires extending MachineBasicBlock::SplitCriticalEdge to also
update subrange intervals.
With opaque pointers, `CreatePointerBitCastOrAddrSpaceCast` can be replaced with `CreateAddrSpaceCast`.
Replace or remove uses of `CreatePointerBitCastOrAddrSpaceCast`.
Opaque pointer cleanup effort.
Changes the size of allocations automatically.
For now, implements the case when a single range from start of the
allocation is alive and the allocation can be reduced.
Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors and destructors, removing a class of failures related to
shared library destruction order, and allowing targets other than OpenMP
to use the same support without needing to change the frontend.
This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything automatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
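The idea of a kernel that iterates a list of ctor / dtor functions can be sketched on the host as follows. This is only an illustration: `InitEntry`, `runCtors` and `runDtors` are hypothetical names, and the ordering (ascending priority for constructors, descending for destructors) is an assumption based on the documented `llvm.global_ctors` / `llvm.global_dtors` semantics, not code from this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical mirror of one list entry: a priority and a function to call.
struct InitEntry {
  int Priority;
  void (*Fn)();
};

// Call constructors in ascending priority order (lowest first).
inline void runCtors(std::vector<InitEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const InitEntry &A, const InitEntry &B) {
                     return A.Priority < B.Priority;
                   });
  for (const InitEntry &E : Entries)
    E.Fn();
}

// Call destructors in descending priority order (highest first).
inline void runDtors(std::vector<InitEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const InitEntry &A, const InitEntry &B) {
                     return A.Priority > B.Priority;
                   });
  for (const InitEntry &E : Entries)
    E.Fn();
}
```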
One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:
```
void foo(); // forward declaration so ~S() can call it
struct S { ~S() { foo(); } };
void foo() {
  static S s;
}
```
However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.
This changes the handling of ctors / dtors. This patch now outputs an
informational message regarding the deprecation if the old format is used.
This will be completely removed in a later release.
Depends on: https://github.com/llvm/llvm-project/pull/71549
This causes asserts to fire:
llvm/lib/Analysis/ValueTracking.cpp:4262:
std::tuple<Value *, FPClassTest, FPClassTest> llvm::fcmpImpliesClass(CmpInst::Predicate, const Function &, Value *, const APFloat *, bool):
Assertion `(RHSClass == fcPosNormal || RHSClass == fcNegNormal || RHSClass == fcPosSubnormal || RHSClass == fcNegSubnormal) && "should have been recognized as an exact class test"' failed.
See comments on the PR.
> Previously we could recognize exact class tests performed by
> an fcmp with special values (0s, infs and smallest normal).
> Expand this to recognize the implied classes by a compare with a general
> constant. e.g. fcmp ogt x, 1 implies positive and non-0.
>
> The API should be better merged with fcmpToClassTest but that
> made the diff way bigger, will try to do that in a future
> patch.
This reverts commit dc3faf0ed0e3f1ea9e435a006167d9649f865da1.
This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the
short forward branch optimization is enabled. This instruction sequence
is faster than the one generated by the target-independent
DAGCombiner. The algorithm is described in ***Hacker's Delight***.
This patch also removes duplicate logic in the X86 and AArch64 backends.
We cannot do the same for the PowerPC backend, since it generates a
special instruction, `addze`.
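For reference, the arithmetic behind an `add + select + shift` sequence can be sketched in C++ as below. This is a sketch of the general technique, not the backend's actual lowering code. The function name `sdivPow2` is hypothetical, and the sketch assumes `0 < K < 31` and an arithmetic right shift for negative values (guaranteed since C++20).

```cpp
#include <cassert>
#include <cstdint>

// Signed division by +/-2^K, rounding toward zero, without a divide.
inline int32_t sdivPow2(int32_t X, unsigned K, bool NegativeDivisor = false) {
  assert(K > 0 && K < 31);
  const int32_t Bias = (int32_t(1) << K) - 1;
  // add + select: only negative dividends need the bias, so that the shift
  // rounds toward zero instead of toward negative infinity.
  const int32_t Adjusted = X < 0 ? X + Bias : X;
  // shift: arithmetic right shift performs the division.
  const int32_t Quotient = Adjusted >> K;
  // A negative divisor just negates the quotient.
  return NegativeDivisor ? -Quotient : Quotient;
}
```

For example, `sdivPow2(-7, 1)` computes `(-7 + 1) >> 1`, matching `-7 / 2` in C++.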
Previously we could recognize exact class tests performed by
an fcmp with special values (0s, infs and smallest normal).
Expand this to recognize the implied classes by a compare with a general
constant. e.g. fcmp ogt x, 1 implies positive and non-0.
The API should be better merged with fcmpToClassTest but that
made the diff way bigger, will try to do that in a future
patch.
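To make the `fcmp ogt x, 1` example concrete, the following sketch spells out the classes that such a compare leaves possible, using `std::fpclassify`. The helper name is hypothetical and this is not the ValueTracking implementation. It only illustrates that any `x` with `x > 1.0f` must be a positive normal value or +infinity.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true if X is in one of the classes that remain
// possible once `X > 1.0f` (an ordered greater-than) is known to hold:
// positive normal or +infinity. NaN, zeros, subnormals and negatives are out.
inline bool inClassesImpliedByOgtOne(float X) {
  const int Class = std::fpclassify(X);
  return !std::signbit(X) && (Class == FP_NORMAL || Class == FP_INFINITE);
}
```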
Long scalar values can be split into multiple lines to improve
readability. The rules are described in Section 6.5. "Line Folding",
https://yaml.org/spec/1.2.2/#65-line-folding. In addition, for flow
scalar styles, the Spec states that "All leading and trailing white
space characters on each line are excluded from the content",
https://yaml.org/spec/1.2.2/#73-flow-scalar-styles.
The patch implements these unfolding rules for double-quoted,
single-quoted, and plain scalars.
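The folding rules can be sketched roughly as follows. This is a simplified illustration, not the YAML parser's code. The function `foldFlowScalar` is hypothetical, it ignores quoting and escape sequences, and it assumes `\n` line breaks. A single line break becomes a space, a run of N empty lines becomes N line feeds, and white space adjacent to a break is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

inline std::string foldFlowScalar(const std::string &Raw) {
  // Split the raw text into lines on '\n'.
  std::vector<std::string> Lines;
  std::size_t Start = 0;
  while (true) {
    const std::size_t End = Raw.find('\n', Start);
    if (End == std::string::npos) {
      Lines.push_back(Raw.substr(Start));
      break;
    }
    Lines.push_back(Raw.substr(Start, End - Start));
    Start = End + 1;
  }

  const char *WS = " \t";
  std::string Out;
  std::size_t EmptyLines = 0;
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    std::string Line = Lines[I];
    // Drop leading white space on continuation lines.
    if (I > 0)
      Line.erase(0, Line.find_first_not_of(WS));
    // Drop trailing white space on every line that is followed by a break.
    if (I + 1 < Lines.size()) {
      const std::size_t Last = Line.find_last_not_of(WS);
      Line.erase(Last == std::string::npos ? 0 : Last + 1);
    }
    if (I == 0) {
      Out = Line;
      continue;
    }
    // Empty lines between content are counted, not emitted directly.
    if (Line.empty() && I + 1 < Lines.size()) {
      ++EmptyLines;
      continue;
    }
    // One break folds to a space; N empty lines fold to N line feeds.
    if (EmptyLines == 0)
      Out += ' ';
    else
      Out.append(EmptyLines, '\n');
    Out += Line;
    EmptyLines = 0;
  }
  return Out;
}
```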
This patch plumbs the command line --experimental-debuginfo-iterators flag
into the pass managers, so that modules can be converted to the new
format, passes run, then converted back to the old format. That allows
developers to test out the new debuginfo representation across some part of
LLVM with no further work, and from the command line. It also installs
flag-catchers at the various points that bitcode and textual IR can egress
from a process, which temporarily convert the module to dbg.value format when
doing so.
No tests, alas, as it's designed to be transparent.
Differential Revision: https://reviews.llvm.org/D154372
The current isScalable function requires a user to call isVector
beforehand in order to avoid an assertion failure in the case that the LLT is
not a vector.
This patch adds helper functions that allow a user to query whether the
LLT is fixed or scalable without triggering an assertion failure in the case
that the LLT was never a vector in the first place.
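The shape of such helpers can be sketched with a toy type. `ToyLLT` below is a stand-in and not the real `llvm::LLT`, and the helper names `isFixedVector` / `isScalableVector` are assumptions about what the patch adds.

```cpp
#include <cassert>

// Toy stand-in for a low-level type; not the real llvm::LLT.
class ToyLLT {
  bool IsVector = false;
  bool IsScalable = false;

public:
  static ToyLLT scalar() { return ToyLLT(); }
  static ToyLLT fixedVector() {
    ToyLLT T;
    T.IsVector = true;
    return T;
  }
  static ToyLLT scalableVector() {
    ToyLLT T;
    T.IsVector = true;
    T.IsScalable = true;
    return T;
  }

  bool isVector() const { return IsVector; }

  // Mirrors the existing query: asserts when called on a non-vector.
  bool isScalable() const {
    assert(IsVector && "expected a vector type");
    return IsScalable;
  }

  // Helpers in the spirit of the patch: safe to call on any type, because
  // the isVector() check short-circuits before isScalable() can assert.
  bool isFixedVector() const { return isVector() && !isScalable(); }
  bool isScalableVector() const { return isVector() && isScalable(); }
};
```

With these, a caller can write `Ty.isScalableVector()` directly instead of guarding every `isScalable()` call with `isVector()`.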
Refactor this function to take a callback for each decoded string, rename it and change it to a static function in cpp. Move its (sole) caller definition from header to cpp.
- This is a split of patch https://github.com/llvm/llvm-project/pull/66825 to minimize the diff created in a big PR.
For a label difference like `.uleb128 A-B`, MC folds A-B even if A and B
are separated by a RISC-V linker-relaxable instruction. This incorrect
behavior is currently abused by DWARF v5 .debug_loclists/.debug_rnglists
(DW_LLE_offset_pair/DW_RLE_offset_pair entry kinds) implemented in
Clang/LLVM (see https://github.com/ClangBuiltLinux/linux/issues/1719 for
an instance).
96d6e190e9
defined R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128. This patch generates such
a pair of relocations to represent A-B that should not be folded.
GNU assembler computes the directive size by ignoring shrinkable section
content, therefore after linking the value of A-B cannot use more bytes
than the reserved number (`final size of uleb128 value at offset ... exceeds available space`).
We make the same assumption.
```
w1:
call foo
w2:
.space 120
w3:
.uleb128 w2-w1 # 1 byte, 0x08
.uleb128 w3-w1 # 2 bytes, 0x80 0x01
```
We do not conservatively reserve 10 bytes (maximum size of an uleb128
for uint64_t) as that would pessimize DWARF v5
DW_LLE_offset_pair/DW_RLE_offset_pair, nullifying the benefits of
introducing R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 relocations.
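The byte counts in the example above follow directly from the ULEB128 encoding, which stores 7 payload bits per byte. A minimal standalone encoder (a sketch, not the MC implementation) shows why 8 fits in one byte, 128 needs two, and a full `uint64_t` needs ten:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode Value as ULEB128: 7 bits per byte, least significant group first,
// with the high bit set on every byte except the last.
inline std::vector<uint8_t> encodeULEB128(uint64_t Value) {
  std::vector<uint8_t> Bytes;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Bytes.push_back(Byte);
  } while (Value != 0);
  return Bytes;
}
```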
The supported expressions are limited. For example,
* non-subtraction `.uleb128 A` is not allowed
* `.uleb128 A-B`: report an error unless A and B are both defined and in the same section
The new cl::opt `-riscv-uleb128-reloc` can be used to suppress the
relocations.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D157657
The WebKit Calling Convention was created specifically for the WebKit
FTL. FTL doesn't use LLVM anymore and therefore this calling convention is
obsolete.
This commit removes the WebKit CC, its associated tests, and
documentation.
Because the SmallVectorImpl destructor is not virtual, the destructor of
derived classes will not be called if pointers to the SmallVectorImpl
class are deleted directly. Making the SmallVectorImpl destructor
protected will prevent this.
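The effect of a protected, non-virtual destructor can be shown with a small standalone sketch. The class names here are hypothetical stand-ins, not the real SmallVector classes.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for the base class: the destructor is non-virtual and protected,
// so `delete basePtr;` does not compile outside the class hierarchy.
class VecImplBase {
protected:
  ~VecImplBase() = default;
};

// Stand-in for a derived class: it can still be destroyed normally.
class SmallVec : public VecImplBase {
public:
  ~SmallVec() = default;
};

static_assert(!std::is_destructible_v<VecImplBase>,
              "base cannot be destroyed directly");
static_assert(std::is_destructible_v<SmallVec>,
              "derived class remains destructible");
```

Deleting through a `VecImplBase *` now fails at compile time instead of silently skipping the derived destructor.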
This is the "central" patch to the removing-debug-intrinsics project: it
changes the instruction movement APIs (insert, move, splice) to interpret
the "Head" bits we're attaching to BasicBlock::iterators, and updates
debug-info records in the background to preserve the ordering of debug-info
(which is in DPValue objects instead of dbg.values). The cost is the
complexity of this patch, plus memory. The benefit is that LLVM developers
can cease thinking about whether they're moving debug-info or not, because
it'll happen behind the scenes.
All that complexity appears in BasicBlock::spliceDebugInfo, see the diagram
there for how we now manually shuffle debug-info around. Each potential
splice configuration gets tested in the added unit tests.
The rest of this patch applies the same reasoning in a variety of
scenarios. When moveBefore (and its siblings) are used to move
instructions around, the caller has to indicate whether they intend for
debug-info to move too (is it a "Preserving" call or not), and then the
"Head" bits are used to determine where debug-info moves to. Similar reasoning
is needed for insertBefore.
Differential Revision: https://reviews.llvm.org/D154353
BasicBlock.h and Instruction.h will eventually need to include
DebugProgramInstruction.h so that debug-info attached to instructions can
be enumerated and cloned. Originally, including it made compiling clang
much slower. I think I've pinned that down to the inclusion of
DebugInfoMetadata.h, which caused ~every LLVM translation unit to parse
all the debug-info classes.
This patch avoids that by shifting some functions into the cpp file rather
than the header, and restores the inclusion of DebugProgramInstruction.h in
BasicBlock.h so that the rest of the RemoveDIs functionality can land.
This test checks the error paths in the relocation-dependent functions readAddend and applyFixup. Checking these is useful to avoid unexpected assertion failures. Currently, opcode errors are triggered in most of the cases in AArch32, but there might be further checks to look for in the future. Different backends can also implement a similar test.
Support for ELF::R_ARM_THM_MOVW_PREL_NC and ELF::R_ARM_THM_MOVT_PREL
is added. Move instructions with PC-relative immediates can be handled
in Thumb mode with this addition.
Close https://github.com/llvm/llvm-project/issues/56980.
This patch introduces a lightweight optimization attribute for
coroutines that are guaranteed to be destroyed only after they reach
the final suspend point.
The rationale behind the patch is simple. See the example:
```C++
A foo() {
  dtor d;
  co_await something();
  dtor d1;
  co_await something();
  dtor d2;
  co_return 43;
}
```
Generally the generated .destroy function may be:
```C++
void foo.destroy(foo.Frame *frame) {
  switch(frame->suspend_index()) {
    case 1:
      frame->d.~dtor();
      break;
    case 2:
      frame->d.~dtor();
      frame->d1.~dtor();
      break;
    case 3:
      frame->d.~dtor();
      frame->d1.~dtor();
      frame->d2.~dtor();
      break;
    default: // coroutine completed or hasn't started
      break;
  }
  frame->promise.~promise_type();
  delete frame;
}
```
This is because the compiler needs to be ready for every case in which the
coroutine may be destroyed in a valid state.
However, from the user's perspective, we may know that certain
coroutine types are only ever destroyed after they reach the final
suspend point, and we need a way to tell the compiler so.
That is what this patch provides. Once the compiler knows that a
coroutine can only be destroyed after it completes, it can optimize the
above example to:
```C++
void foo.destroy(foo.Frame *frame) {
  frame->promise.~promise_type();
  delete frame;
}
```
I spent a lot of time experimenting with this downstream, and the
numbers are really good. In a real-world coroutine-heavy workload, the
size of the build dir (including .o files) shrinks by 14%, and the size
of the final libraries (excluding the .o files) shrinks by 8% in
Debug mode and 1% in Release mode.
This change broke building LLVM with Module support enabled, i.e.
`LLVM_ENABLE_MODULES=ON`.
This reverts commit f40da072ed51ba77bf46191b35a74208b1045042.
This reverts commit 957efa4ce4f0391147cec62746e997226ee2b836.
Original commit message below. In this follow-up, I've turned
unnecessary inclusions of DebugProgramInstruction.h into forward
declarations (which I hope fixes clang compile time), and fixed a memory
leak in the DebugInfoTest.cpp IR unittests.
I also tracked down a compile-time regression in D154080 (more explanation
there); as a result, some of the changes are hidden behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.
[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info
This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].
The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point it isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues are plumbed into the metadata hierarchy, and a
field is added to class Instruction, all of which are exercised in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.
This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.
This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.
There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
Also disables generation of MutateOpcode. It's almost never used in
combiners anyway.
If we really want to use it, it needs to be investigated & properly
fixed (see TODO)
Fixes #70780
Add a new intrinsic, similar to llvm.amdgcn.set.inactive, but used only
in functions with the `amdgpu_cs_chain` or `amdgpu_cs_chain_preserve`
calling conventions. It allows setting the inactive lanes to those of a
value received as a VGPR argument (whereas llvm.amdgcn.set.inactive
usually takes a constant as the value of the inactive lanes).
Differential Revision: https://reviews.llvm.org/D158604
The inference is trivial and leverages the MCOI OperandTypes encoded in
CodeGenInstructions to infer types across patterns in a CombineRule.
It's thus very limited and only supports CodeGenInstructions (but that's the
main use case so it's fine).
We only try to infer untyped operands in apply patterns when they're
temp reg defs, or immediates. Inference always outputs a `GITypeOf<$x>` where
$x is a named operand from a match pattern.
This allows us to drop the `GITypeOf` in most cases without any errors.
Make it easier to control which optimizations are enabled by making
OptimizeArgs a bit masked enum. There's currently only one such
optimization, but more will be added in followup commits.
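A bit-masked enum of this kind might look like the sketch below. The enumerator names are hypothetical (the message does not name the individual optimizations), and the operators are hand-written for illustration rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names; each optimization gets its own bit.
enum class OptimizeArgs : uint8_t {
  None = 0,
  FirstOpt = 1u << 0,
  SecondOpt = 1u << 1, // placeholder for a future optimization
};

constexpr OptimizeArgs operator|(OptimizeArgs A, OptimizeArgs B) {
  return static_cast<OptimizeArgs>(static_cast<uint8_t>(A) |
                                   static_cast<uint8_t>(B));
}

constexpr OptimizeArgs operator&(OptimizeArgs A, OptimizeArgs B) {
  return static_cast<OptimizeArgs>(static_cast<uint8_t>(A) &
                                   static_cast<uint8_t>(B));
}

// True if Flag is enabled in Set.
constexpr bool isEnabled(OptimizeArgs Set, OptimizeArgs Flag) {
  return (Set & Flag) != OptimizeArgs::None;
}
```

Callers can then enable any combination, for example `OptimizeArgs::FirstOpt | OptimizeArgs::SecondOpt`, and each optimization checks only its own bit.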
This adds support for `#pragma omp atomic compare fail`. Only Parser and AST support is implemented for now.
Reviewed By: tianshilei1992, ABataev
Differential Revision: https://reviews.llvm.org/D123235
…gSizeInBits
This patch changes getRegSizeInBits to return a TypeSize instead of an
unsigned in the case that a virtual register has a scalable LLT. In the
case that register is physical, a Fixed TypeSize is returned.
The MachineVerifier pass is updated to allow copies between fixed and
scalable operands as long as the Src size will fit into the Dest size.
This is a precommit which will be stacked on by a change to GISel to
generate COPYs with a scalable destination but a fixed size source.
This patch is stacked on https://github.com/llvm/llvm-project/pull/70893
for the ability to use scalable vector types in MIR tests.