The debug info attached to the BUNDLE is the first instruction in the
BUNDLE, even if a better debug info (line:column) is present in the
later instructions of the bundle. The patch tries to get a better debug
info first. If not, then a worse debug info without line number is
chosen.
---------
Co-authored-by: Vladislav Dzhidzhoev <dzhidzhoev@gmail.com>
Co-authored-by: Orlando Cazalet-Hyams <orlandoch.och@gmail.com>
This allows more accurate alias analysis to apply at the bundle level.
This has a bunch of minor effects in post-RA scheduling that look mostly
beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic).
The pre-existing (and unchanged) test in
CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a
bundle with MMOs can be parsed successfully.
v2:
- use cloneMergedMemRefs
- add another test to explicitly check the MMO bundling behavior
v3:
- use poison instead of undef to initialize the global variable in the
test
Currently MachineVerifier is missing verifying early-clobber operand
constraint.
The only other machine operand constraint - TiedTo is already verified.
This is in preparation of a future AMDGPU change where we are going to
create bundles before register allocation and want to rely on the
TwoAddressInstructionPass handling those bundles correctly.
v2:
- simplify the virtual register check and the test
As a follow-up to PR#165841, this change addresses `prof_md` metadata
loss in AtomicExpandPass when lowering `atomicrmw xchg` to a
Load-Linked/Store-Exclusive (LL/SC) loop.
This path is distinct from the LSE path addressed previously:
PR #165841 (and its tests) used `-mtriple=aarch64-linux-gnu`, which
targets a modern **ARMv8.1+** architecture. This architecture supports
**Large System Extensions (LSE)**, allowing `atomicrmw` to be lowered
directly to a more efficient hardware instruction.
This PR (and its tests) uses `-mtriple=aarch64--` or
`-mtriple=armv8-linux-gnueabihf`. This indicates an `ARMv8.0 or lower
architecture that does not support LSE`. On these targets, the pass must
fall back to synthesizing a manual LL/SC loop using the `ldaxr/stxr`
instruction pair.
Similar to previous issue, the new conditional branch was failin to
inherit the `prof_md` metadata. Theis PR correctly fix the branch
weights to the newly created branch within the LL/SC loop, ensuring
profile information is preserved.
Co-authored-by: Jin Huang <jingold@google.com>
In both ScheduleDAGInstrs and MachineScheduler, we call `BufferSize = 0`
as _reserved_ and `BufferSize = 1` as _unbuffered_. This convention is
stem from the fact that we set `SUnit::hasReservedResource` to true when
any of the SUnit's consumed resources has BufferSize equal to zero; set
`SUnit::isUnbuffered` to true when any of its consumed resources has
BufferSize equal to one.
However, `SchedBoundary::isUnbufferedGroup` doesn't really follow this
convention: it returns true when the resource in question is a
`ProcResGroup` and its BufferSize equals to **zero** rather than one.
This could be really confusing for the reader. This patch renames this
function to `isReservedGroup` in aligned with the convention mentioned
above.
NFC.
MachineFunctionSplitter was missing a skipFunction() check, causing it
to incorrectly split functions that should be skipped (e.g., functions
with optnone attribute).
This patch adds an early skipFunction() check in runOnMachineFunction()
to ensure these functions are never split, regardless of profile data
availability or other splitting conditions.
This avoids AArch64 legality rules depending on libcall
availability.
ARM, AArch64, and X86 all had custom lowering of fsincos which
all were just to emit calls to sincos_stret / sincosf_stret. This
messes with the cost heuristics around legality, because really
it's an expand/libcall cost and not a favorable custom.
This is a bit ugly, because we're emitting code trying to match the
C ABI lowered IR type for the aggregate return type. This now also
gives an easy way to lift the unhandled x86_32 darwin case, since
ARM already handled the return as sret case.
Enable the `SPV_INTEL_bfloat16_arithmetic` extension, which allows arithmetic, relational and `OpExtInst` instructions to take `bfloat16` arguments. This patch only adds support to arithmetic and relational ops. The extension itself is rather fresh, but `bfloat16` is ubiquitous at this point and not supporting these ops is limiting.
Previously, sign-extending a 1-bit boolean operand in `#DBG_VALUE` would
convert `true` to -1 (i.e., 0xffffffffffffffff). However, DWARF treats
booleans as unsigned values, so this resulted in the attribute
`DW_AT_const_value(0xffffffffffffffff)` being emitted. As a result, the
debugger would display the value as `255` instead of `true`.
This change modifies the behavior to use zero-extension for 1-bit values
instead, ensuring that `true` is represented as 1. Consequently, the
DWARF attribute emitted is now `DW_AT_const_value(1)`, which allows the
debugger to correctly display the boolean as `true`.
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.
Identified with readability-redundant-declaration.
In the process of determining whether two MachineOperands are equal and
calculating the hash of a MachineOperand, both MO_RegisterMask and
MO_RegisterLiveOut types were uniformly handled. However, when the type
is MO_RegisterLiveOut, calling getRegMask() triggers an assertion
failure. This PR addresses this issue.
This is consistent with other promotion, but causes negative constants
to be sign extended instead of zero extended in some cases.
I guess getNode and type legalizer are inconsistent about what
ANY_EXTEND of a constant does.
When a load or store accesses N bytes starting from a pointer P, and we want to
compute an offset pointer within these N bytes after P, we know that the
arithmetic to add the offset must be inbounds. This is for example relevant
when legalizing too-wide memory accesses, when lowering memcpy&Co., or when
optimizing "vector-load -> extractelement" into an offset load.
For SWDEV-516125.
This PR preserves the InBounds flag (#162477) where possible in PTRADD-related
DAGCombines. We can't preserve them in all the cases that we could in the
analogous GISel change (#152495) because SDAG usually represents pointers as
integers, which means that pointer provenance is not preserved between PTRADD
operations (see the discussion at PR #162477 for more details). This PR marks
the places in the DAGCombiner where this is relevant explicitly.
For SWDEV-516125.
For an insertelt with a dynamic index, the default handling in
DAGTypeLegalizer and LegalizeDAG will reserve a stack slot for the
vector, lower the insertelt to a store, then load the modified vector
back into temporaries. The vector store and load may be legalized into a
sequence of smaller operations depending on the target.
Let V = the vector size and L = the length of a chain of insertelts with
dynamic indices. In the worse case, this chain will lower to O(VL)
operations, which can increase code size dramatically.
Instead, identify such chains, reserve one stack slot for the vector,
and lower all of the insertelts to stores at once. This requires only
O(V + L) operations. This change only affects the default lowering
behavior.
DW_TAG_base_type DIEs are permitted to have both byte_size and bit_size
attributes "If the value of an object of the given type does not fully
occupy the storage described by a byte size attribute"
* Add DataSizeInBits to DIBasicType (`DIBasicType(... dataSize: n ...)` in IR).
* Change Clang to add DataSizeInBits to _BitInt type metadata.
* Change LLVM to add DW_AT_bit_size to base_type DIEs that have non-zero
DataSizeInBits.
TODO: Do we need to emit DW_AT_data_bit_offset for big endian targets?
See discussion on the PR.
Fixes [#61952](https://github.com/llvm/llvm-project/issues/61952)
---------
Co-authored-by: David Stenberg <david.stenberg@ericsson.com>
We want to be able to produce extr instructions post-legalization. They
are legal for scalars, acting as a funnel shift with a constant shift
amount. Unfortunately I'm not sure if there is a way currently to
represent that in the legalization rules, but it might be useful for
several operations - to be able to treat and test operands with constant
operands as legal or not.
This adds a change to the existing matchOrShiftToFunnelShift so that
AArch64 can generate such instructions post-legalization providing that
the operation is scalar and the shift amount is constant.
Currently it is considered suitable to lower to a bit test for a set of
switch case clusters when the the number of unique destinations
(`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:
`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6)`
However it is found for some cases on powerpc, for example, when
NumDests is 3, and the number of comparisons for each destination is all
2, it's not profitable to lower the switch to bit test. This is to add
an option to set the minimum of largest number of comparisons to use bit
test for switch lowering.
---------
Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>
[DAG] Fold mismatched widened avg idioms to narrow form (fixes half of
[llvm#147946](https://github.com/llvm/llvm-project/issues/147946))
1. `trunc(avgceilu(sext(x), sext(y))) -> avgceils(x, y)`
2. `trunc(avgceils(zext(x), zext(y))) -> avgceilu(x, y)`
When inputs are sign-extended, unsigned and signed averaging operations
produce identical results after truncation, allowing us to use the
semantically correct narrow operation.
alive2: https://alive2.llvm.org/ce/z/ZRbfHT
The argument_type and result_type type aliases in std::hash are
deprecated in C++17 and removed in C++20. This patch aligns two
specializations of ours with the C++ standard.
With try_emplace, we can pass the key and the arguments for the
value's constructor, which is a lot shorter than:
Map.insert(std::make_pair(Key, ValueType(Arg1, Arg2)))
Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.
For more context, see: https://github.com/llvm/llvm-project/pull/163405.
Cache extracted elements in lowerShuffleVector(). For example, when
lowering
```
%0:_(<2 x s32>) = G_BUILD_VECTOR %0, %1
%2:_(<N x s32>) = G_SHUFFLE_VECTOR %1, shufflemask(0, 0, 0, 0 ... x N )
```
Currently, we generate `N` `G_EXTRACT_VECTOR_ELT` for each element in
shufflemask. This is undesirable and bloats the code, especially for
larger vectors.
With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0`
and reuse it for all four result elements.
This patch moves the init, copyFrom, and grow methods in DenseMap and
SmallDenseMap from public to private to hide implementation details.
The only problem is that PhysicalRegisterUsageInfo calls
DenseMap::grow instead of DenseMap::reserve, which I don't think is
intended. This patch updates the call to reserve.
This is to follow the discussion in
https://github.com/llvm/llvm-project/pull/164565
CallBase can cover more call-like instructions which carry caling
convention flag.
Co-authored-by: Yuanke Luo <ykluo@birentech.com>
Since new select/phi instructions may construct loops, the expression
tree to be simplified may still be incomplete (i.e., it may contain
select with dummy values or phi without incoming values). This patch
removes the call to simplifyInstruction for now, as it doesn't break
existing tests.
Original PR: https://reviews.llvm.org/D36073
Fix the crash reported in
https://github.com/llvm/llvm-project/pull/163453#issuecomment-3429922732.
Now that the old llvm::identity has moved into IndexedMap.h under a
different name, this patch renames identity_cxx20 to identity. Note
that llvm::identity closely models std::identity from C++20.
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same
logic as in SDag's expandFMINIMUM_FMAXIMUM.
Update AMDGPU legalization rules: Pre GFX12 now uses new lowering method
and make G_FMINNUM_IEEE and G_FMAXNUM_IEEE legal to match SDag.
I'm not sure if this is the best way forward or not, but we have a lot
of issues with forgetting that shuffle_vectors can be scalar again and
again. (There is another example from the recent known-bits code added
recently). As a scalar-dst shuffle vector is just an extract, and a
scalar-source shuffle vector is just a build vector, this patch makes
scalar shuffle vector illegal and adjusts the irbuilder to create the
correct node as required.
Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole, it just requires that
transforms that create shuffles of new sizes to account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
Selection DAG has a more sophisticated execution order representation
than the simple sequence used in IR, so building the DAG can take into
account specific properties of the nodes to better express possible
parallelism. The existing implementation does this for constrained
function calls, some of them are considered as independent, which can
potentially improve the generated code. However this mechanism
incorrectly implies that the calls with exception behavior 'ebIgnore'
cannot raise floating-point exception. The purpose of this change is to
fix the implementation.
In the current implementation, constrained function calls don't
immediately update the DAG root. Instead, the DAG builder collects their
output chains and flushes them when the root is required. Constrained
function calls cannot be moved across calls of external functions and
intrinsics that access floating-point environment, they work as
barriers. Between the barriers, constrained function calls can be
reordered, they may be considered independent from viewpoint of raising
exceptions. For strictfp functions this is possible only if
floating-point trapping is disabled.
This change introduces a new restriction - the calls with default
exception handling cannot not be moved between strictfp function calls.
Otherwise the exceptions raised by such call can disturb the expected
exception sequence. It means that constrained function calls with strict
exception behavior act as barriers for the calls with non-strict
behavior and vice versa. Effectively it means that the entire sequence
of constrained calls in IR is split into "strict" and "non-strict"
regions, in which restrictions on the order of constrained calls are
relaxed, but move from one region to another is not allowed. It agrees
with the representation of strictfp code in high-level languages. For
example, C/C++ strictfp code correspond to blocks where pragma `STDC
FENV_ACCESS ON` is in effect, this restriction should help preserving
the intended semantics.
When floating-point exception trapping is enabled, constrained
intrinsics with 'ebStrict' cannot be reordered, their sequence must be
identical to the original source order. The current implementation does
not distinguish between strictfp modes with trapping and without it.
This change make assumption that the trapping is disabled. It is not
correct in the general case, but is compatible with the existing
implementation.
This patch introduces SDNodeFlags::InBounds, to show that an ISD::PTRADD SDNode
implements an inbounds getelementptr operation (i.e., the pointer operand is in
bounds wrt. an allocated object it is based on, and the arithmetic does not
change that). The flag is set in the DAG construction when lowering inbounds
GEPs.
Inbounds information is useful in the ISel when selecting memory instructions
that perform address computations whose intermediate steps must be in the same
memory region as the final result. Follow-up patches to propagate the flag in
DAGCombines and to use it when lowering AMDGPU's flat memory instructions,
where the immediate offset must not affect the memory aperture of the address
(similar to this GISel patch: #153001), are planned.
This mirrors #150900, which has introduced a similar flag in GlobalISel.
This patch supersedes #131862, which previously attempted to introduce an
SDNodeFlags::InBounds flag. The difference between this PR and #131862 is that
there is now an ISD::PTRADD opcode (PR #140017) and the InBounds flag is only
defined to apply to ISD::PTRADD DAG nodes. It is therefore unambiguous that
in-bounds-ness refers to a memory object into which the left operand of the
PTRADD node points (in contrast to #131862, where InBounds would have applied
to commutative ISD::ADD nodes, so that the semantics would be more difficult to
reason about).
For SWDEV-516125.