Without this patch a `DW_ATE_complex_float` encoding trips an assertion in
`DebugHandlerBase::isUnsignedDIType` with the message `"Unsupported
encoding"`.
By adding a case to the `assert` for `DW_ATE_complex_float` it becomes
supported, behaving in the same way as the already supported `DW_ATE_float`
type (return false).
Note: For the reported reproducer:
#include <complex.h>
int main() {
long double complex r1;
}
The assertion isn't tripped without assignment tracking because instcombine
deletes everything, including the `dbg.declare`, without recovering any
location information. Whereas with assignment tracking we track a zeroing
memset that is emitted by clang.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D151795
This reverts commit 859b05b02d3fd9ab6b77f2bed8df6902fe704806.
Also reverts these follow-ups:
Revert "[RDF] Remove `constexpr` from `hash"
This reverts commit 621507ce20ad8eef2986be2712631165e53b7d91.
Revert "[RDF] Do not use trailing return type after all, NFC"
This reverts commit 46e19e3a2c45e7fb5f501bdb983a7151c158304f.
Revert "[RDF] Stop looking when reached code node in getNextRef with NextOnly"
This reverts commit a049ce9d1bd5a7c1c4fcccc6a801b72b00ea8e0f.
Revert "[RDF] Use trailing return type syntax, NFC"
This reverts commit d3b34b7f3a7cbfc96aea897419f167b5ee19e61a.
Revert "[RDF] Define short type names: NodeAddr<XyzNode*> -> Xyz, NFC"
This reverts commit f8ed60b56d1948422dda924fcf450560591e8a19.
This fold goes against the usual approach of pushing freeze into
operands. The idea behind the fold is that if the setcc feeds into
a brcond, the freeze can be dropped entirely.
Move the fold to brcond, where we can remove the freeze directly.
This ensures that there can be no infinite combine loops due to
conflicting transforms.
Differential Revision: https://reviews.llvm.org/D152544
If we have a load/store with an illegal fixed length vector result type that
needs widened, e.g. `x:v6i32 = load p`
Instead of just widening it to: `x:v8i32 = load p`
We can widen it to the equivalent VP operation and set the EVL to the
exact number of elements needed: `x:v8i32 = vp_load a, b, mask=true, evl=6`
Provided that the target supports vp_load/vp_store on the widened type.
Scalable vectors are already widened this way where possible, so this
largely reuses the same logic.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D148713
This commit re-work the methods that dump traces with resource usage to take into account the StartAtCycle value added by https://reviews.llvm.org/D150310.
For each i, the values of the lists StartAtCycle and ReservedCycles is are printed with the interval [StartAtCycle[i], ReservedCycles[i])
```
... | StartAtCycle[i] | ... | ReservedCycles[i] - 1 | ReservedCycles[i] | ...
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | |
```
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150311
Apply my post-commit comment on D81995. The negative name misguided commit
d8a8e5d6240a1db809cd95106910358e69bbf299 (`[clang][cli] Remove marshalling from
Opt{In,Out}FFlag`) to:
* accidentally flip the option to not emit the xray_fn_idx section.
* change -fno-xray-function-index (instead of -fxray-function-index) to emit xray_fn_idx
This patch renames XRayOmitFunctionIndex and makes -fxray-function-index emit
xray_fn_idx, but the default remains -fno-xray-function-index .
This is a quick fix for an assert when the source and dest have
different address spaces. The pointer compare needs to have matching
types, but we can't generically introduce addrspacecast and we don't
know if the address spaces alias.
This report was broken. The passed null block would crash in the
remark constructor.
I have a patch which will introduce a new block, and it seems to work
fine. It doesn't require legalizing any instructions in the new block,
so it's possible those will be missed.
Expand large or unknown size memory intrinsics into loops in the
default lowering pipeline if the target doesn't have the corresponding
libfunc. Previously AMDGPU had a custom pass which existed to call the
expansion utilities.
With a default no-libcall option, we can remove the libfunc checks in
LoopIdiomRecognize for these, which never made any sense. This also
provides a path to lifting the immarg restriction on
llvm.memcpy.inline.
There seems to be a bug where TLI reports functions as available if
you use -march and not -mtriple.
The name rdf::Use conflicts with llvm::Use when both namespaces are
used via `using namespace`. Specifically this happened in the declaration
of DataFlowGraph::newUse (in RDFGraph.cpp):
```
using namespace rdf;
Use newUse(...); <-- Lookup conflict for "Use"
```
Since the TRT lookup starts in a different namespace than that of the
leading type, this serves as a workaround. In general the rdf namespace
will not likely be introduced via `using namespace`, so this shouldn't
be a problem elsewhere.
hoisted constants.
The constant hoisting pass tries to hoist large constants into predecessors and also
generates remat instructions in terms of the hoisted constants. These aim to prevent
codegen from rematerializing expensive constants multiple times. So we can re-use
this optimization, we can preserve the no-op bitcasts that are used to anchor
constants to the predecessor blocks.
SelectionDAG achieves this by having the OpaqueConstant node, which is just a
normal constant with an opaque flag set. I've opted to avoid introducing a new
constant generic instruction here. Instead, we have a new G_CONSTANT_FOLD_BARRIER
operation that constitutes a folding barrier.
These are somewhat like the optimization hints, G_ASSERT_ZEXT in that they're
eliminated by the generic instruction selection code.
This change by itself has very minor improvements in -Os CTMark overall. What this
does allow is better optimizations when future combines are added that rely on having
expensive constants remain unfolded.
Differential Revision: https://reviews.llvm.org/D144336
Re-landing the code that was reverted because of the buildbot failure
in https://lab.llvm.org/buildbot#builders/9/builds/27319.
Original commit message
======================
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
I don't think we can safely remove the second COPY as redundant in such cases.
The first COPY (which has undef src) may be lowered to a KILL instruction instead, resulting in no COPY being emitted at all.
Testcase is X86 so it's in the same place as other testcases for this function, but this was initially spotted on AMDGPU with the following:
```
renamable $vgpr24 = PRED_COPY undef renamable $vgpr25, implicit $exec
renamable $vgpr24 = PRED_COPY killed renamable $vgpr25, implicit $exec
```
The second COPY waas removed as redundant, and the first one was lowered to a KILL (= removed too), causing $vgpr24 to not have $vgpr25's value.
Fixes SWDEV-401507
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D152502
Reverted because it produces the following builbot failure at https://lab.llvm.org/buildbot#builders/9/builds/27319:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp: In member function ‘virtual void ResourceSegments_getFirstAvailableAtFromBottom_empty_Test::TestBody()’:
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/unittests/CodeGen/SchedBoundary.cpp:395:31: error: call of overloaded ‘ResourceSegments(<brace-enclosed initializer list>)’ is ambiguous
395 | auto X = ResourceSegments({});
| ^
This reverts commit dc312f0331309692e8d6e06e93b3492b6a40989f.
The class `ResourceSegments` is used to keep track of the intervals
that represent resource usage of a list of instructions that are
being scheduled by the machine scheduler.
The collection is made of intervals that are closed on the left and
open on the right (represented by the standard notation `[a, b)`).
These collections of intervals can be extended by `add`ing new
intervals accordingly while scheduling a basic block.
Unit tests are added to verify the possible configurations of
intervals, and the relative possibility of scheduling a new
instruction in these configurations. Specifically, the methods
`getFirstAvailableAtFromBottom` and `getFirstAvailableAtFromTop` are
tested to make sure that both bottom-up and top-down scheduling work
when tracking resource usage across the basic block with
`ResourceSegments`.
Note that the scheduler tracks resource usage with two methods:
1. counters (via `std::vector<unsigned> ReservedCycles;`);
2. intervals (via `std::map<unsigned, ResourceSegments> ReservedResourceSegments;`).
This patch can be considered a NFC test for existing scheduling models
because the tracking system that uses intervals is turned off by
default (field `bit EnableIntervals = false;` in the tablegen class
`SchedMachineModel`).
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D150312
This removes BitCasts from isSource in Type Promotion, as I don't believe they
need to be treated as Sources. They will usually be from floats or hoisted
constants, where constants will be handled already.
This fixes#62513, but didn't otherwise cause any differences in the tests I
ran.
Differential Revision: https://reviews.llvm.org/D152112
rG2eb7cbf987f21 added this code, which results in crash for vector
nodes. This patch solves it by skipping for the vector nodes.
Thanks Steve for helping reducing the test case.
Co-authored-by: Steve Merritt <steve.merritt@intel.com>
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D152492
- (op (op X, C1), C2) -> (op X, (op C1, C2))
- (op (op X, C1), Y) -> (op (op X, Y), C1)
Some code duplication with the G_PTR_ADD reassociations unfortunately but no
easy way to avoid it that I can see.
Differential Revision: https://reviews.llvm.org/D150230
Without explicitly checking and erroring out, an invalid personality
function, which is not `__gxx_wasm_personality_v0`, caused
a segmentation fault down the line because `WasmEHFuncInfo` was not
created. This explicitly checks the validity of personality functions in
functions with EH pads and errors out explicitly with a helpful error
message. This also adds some more assertions to ensure `WasmEHFuncInfo`
is correctly created and non-null.
Invalid personality functions wouldn't be generated by our Clang, but
can be present in handwritten ll files, and more often, in files
transformed by passes like `metarenamer`, which is often used with
`bugpoint` to simplify names in `bugpoint`-reduced files.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D152203
This isn't quite using register units, but it's getting close. The phi
generation is driven by register units, but each phi still contains a
reference to a register, potentially with a mask that amounts to a unit.
In cases of explicit register aliasing this may still create phis with
references that are aliased, whereas separate phis would ideally contain
disjoint references (this is all within a single basic block).
Previously phis used maximal registers, now they use minimal. This is a
step towards both, using register units directly, and a simpler liveness
calculation algorithm. The idea is that a phi cannot reach a reference
to anything smaller than the phi itself represents. Before there could
be a phi for R1_R0, now there will be two for this case (assuming R0 and
R1 have one unit each).
matchCombineShlOfExtend did not check if the size of new shift would be
wider then a size of operand. Current condition did not work if the value
being shifted was zero. Updated to support vector splat.
Patch by: Acim Maravic
Differential Revision: https://reviews.llvm.org/D151122
Note: Some patterns in visitFMA are needed refined to support splat of constant.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D152260
Consider only targets where `MCAsmInfo::ExceptionsType == ExceptionHandling::None`
and that support CFI (when `MCAsmInfo::UsesCFIForDebug` is set to true):
currently, only AMDGPU.
This patch enables the emission of CFI information in the .eh_frame
section when the uwtable attribute is present on a function.
Before, we could generate CFI information for debugging puproses only.
This patch prepares AMDGPU to support collecting GPU stack traces in the future.
I did a first implementation (https://reviews.llvm.org/D139024)
but at the time I had not realized that no other platform used
`UsesCFIForDebug`.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D151806
struct VectorInfo manages resources such as dynamically allocated memory, it's generally a good practice to either implement a custom copy constructor or disable the default one
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D152230
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.
Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
Parametrize SampleProfileInference and SampleProfileLoaderBaseImpl by function
type (Function/MachineFunction) instead of block type
(BasicBlock/MachineBasicBlock). Move out specializations to appropriate
locations.
This change makes it possible to use GraphTraits instead of a custom TypeMap and
make SampleProfileInference not dependent on LLVM types, paving the way for
generalizing SampleProfileInference interfaces to BOLT IR types
(BinaryFunction/BinaryBasicBlock) in stale profile matching (D144500).
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D152187
If intrinsic `get_fpenv` or `set_fpenv` is lowered to the form where FP
environment is represented as a region in memory, extra moves can
appear. For example the code:
define void @func_01(ptr %ptr) {
%env = call i256 @llvm.get.fpenv.i256()
store i256 %env, ptr %ptr
ret void
}
produces DAG:
ch = get_fpenv_mem ch, memory_region
val: i256, ch = load ch, memory_region
ch = store ch, ptr, val
In this case the extra moves can be avoided if `get_fpenv_mem` got
pointer to the memory where the FP environment should be finally placed.
This change implement such optimization for this use case.
Differential Revision: https://reviews.llvm.org/D150437
This removes BitCasts from isSource in Type Promotion, as I don't believe they
need to be treated as Sources. They will usually be from floats or hoisted
constants, where constants will be handled already.
This fixes#62513, but didn't otherwise cause any differences in the tests I
ran.
Differential Revision: https://reviews.llvm.org/D152112
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115