Implement the [tuple.apply] part of the P2255R2 wording for `std::make_from_tuple`.
```
Mandates: If tuple_size_v<remove_reference_t<Tuple>> is 1, then reference_constructs_from_temporary_v<T, decltype(get<0>(declval<Tuple>()))> is false.
```
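A minimal sketch (assuming a library that implements this wording) of the dangling-reference construction the new Mandates clause rejects; the example is illustrative and not taken from the patch:
```cpp
#include <tuple>

int main() {
  std::tuple<long> t{42};
  // Ill-formed under P2255R2: binding int&& to the long element would
  // materialize a temporary int, so
  // reference_constructs_from_temporary_v<int&&, long&> is true and the
  // Mandates clause fires.
  // auto&& r = std::make_from_tuple<int&&>(t);
  (void)t;
}
```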
Fixes #154274
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This makes the optimization in optimizeStringLength for strlen(gep
@glob, %x) -> sub (end of @glob), %x a little more resilient, and perhaps a
bit more correct for GEPs with non-array types.
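For intuition, a hedged source-level illustration of the pattern (names hypothetical; the transform itself operates on IR):
```cpp
#include <cstring>

static const char glob[] = "hello world";

// For an in-bounds offset x into a constant NUL-terminated array, strlen of
// the offset pointer can be folded to a subtraction from the array end:
// strlen(glob + x) == (sizeof(glob) - 1) - x.
std::size_t len_from(std::size_t x) {
  return std::strlen(glob + x);
}
```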
According to the instruction manual, when `vr0` is modified, the high 128
bits of `xr0` are undefined. Using `vinsgr2vr.b/h` to insert an `i8`/`i16`
into the low 128 bits of a 256-bit vector may therefore cause undefined
behavior when the high 128 bits are used by later instructions.
Support the following BCD format conversion builtins for PowerPC.
- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.
> Note: These built-in functions are valid only when all of the following
conditions are met:
> - `-qarch` is set to utilize POWER9 technology.
> - The `bcd.h` file is included.
## Prototypes
```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```
## Usage Details
`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
  - If the second parameter is 0, the sign code is set to 0xC.
  - If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> Notes:
> The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.
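A hedged usage sketch based on the prototypes above (compiled for a POWER9 target, e.g. with `-mcpu=pwr9`; the function names are illustrative):
```cpp
#include <altivec.h>

// Force a sign code onto a packed decimal value: with 0 as the second
// argument a positive value gets the sign code 0xC, with 1 it gets 0xF;
// a negative value always gets 0xD.
vector unsigned char set_sign(vector unsigned char packed) {
  return __builtin_bcdsetsign(packed, 0);
}

// Combine the decimal magnitude of `value` with the sign code of `sign_src`.
vector unsigned char copy_sign(vector unsigned char value,
                               vector unsigned char sign_src) {
  return __builtin_bcdcopysign(value, sign_src);
}
```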
---------
Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
This PR aims to resolve #152144.
In `SelectionDAG::canCreateUndefOrPoison`, the `ISD::SCMP`/`ISD::UCMP` cases
are added to always return false, as these nodes cannot generate poison or undef.
The `freeze-binary.ll` test file now covers the SCMP/UCMP cases.
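A minimal sketch of the shape of the change (simplified; the real function has a different signature and takes additional parameters):
```cpp
#include "llvm/CodeGen/ISDOpcodes.h"

// Simplified sketch: returns true if a node with this opcode can introduce
// undef or poison.
static bool canCreateUndefOrPoisonSketch(unsigned Opcode) {
  switch (Opcode) {
  case llvm::ISD::SCMP:
  case llvm::ISD::UCMP:
    // scmp/ucmp only ever produce -1, 0, or 1 from well-defined operands and
    // do not introduce undef or poison themselves.
    return false;
  default:
    return true; // conservative default in this sketch
  }
}
```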
---------
Co-authored-by: Temperz87 <temperz871@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
When HwModes are involved, an instruction encoding that does not belong to
any HwMode can end up being duplicated multiple times. We can do better by
mapping each HwMode to the list of encoding IDs it contains (that is,
duplicating IDs instead of encodings).
The encodings that were duplicated are still processed multiple times
(e.g., we call an expensive populateInstruction() on each instance).
This will be fixed in subsequent patches.
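A hedged sketch of the data-structure direction (names hypothetical, not the actual DecoderEmitter code):
```cpp
#include <map>
#include <vector>

// Keep a single table of encodings and let each HwMode refer to entries by
// ID, so an encoding shared by several HwModes is stored once and only its
// ID is duplicated.
using EncodingID = unsigned;
using HwModeID = unsigned;
std::map<HwModeID, std::vector<EncodingID>> HwModeToEncodingIDs;
```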
## Short Summary
This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI
for ZA state (e.g., lazy saves and agnostic ZA functions). It is
currently not enabled by default (but we aim to enable it by default in
LLVM 22). The goal is for this new pass to place ZA saves/restores more
optimally and to work with exception handling.
## Long Description
This patch reimplements management of ZA state for functions with
private and shared ZA state. Agnostic ZA functions will be handled in a
later patch. For now, this is under the flag `-aarch64-new-sme-abi`;
however, we intend for this to replace the current SelectionDAG
implementation once complete.
The approach taken here is to mark instructions as needing ZA to be in a
specific state ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly
defining or using ZA registers (such as $zt0 or $zab0) require the
"ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE"
state depending on the callee (having shared or private ZA).
We already add ZA register uses/definitions to machine instructions, so
no extra work is needed to mark these.
Calls need to be marked by gluing AArch64ISD::INOUT_ZA_USE or
AArch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.
These markers are then used by the MachineSMEABIPass to find
instructions where there is a transition between required ZA states.
These are the points where we need to insert code to set up or restore a
ZA save (or initialize ZA).
To handle control flow between blocks (which may have different ZA state
requirements), we bundle the incoming and outgoing edges of blocks.
Bundles are formed by assigning each block an incoming and outgoing
bundle (initially, all blocks have their own two bundles). Bundles are
then combined by joining the outgoing bundle of a block with the
incoming bundle of all successors.
These bundles are then assigned a ZA state based on the blocks that
participate in the bundle. Blocks whose incoming edges are in a bundle
"vote" for a ZA state that matches the state required at the first
instruction in the block, and likewise, blocks whose outgoing edges are
in a bundle vote for the state required at the last instruction in
the block. The ZA state with the most votes is used, which aims to
minimize the number of state transitions.
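A hedged sketch of the voting step (illustrative only, not the actual pass code; the state names follow the description above):
```cpp
#include <array>

enum class ZAState { Active, LocalSaved };

// Each participating block contributes one vote to a bundle: its required
// entry state if its incoming edges are in the bundle, its required exit
// state if its outgoing edges are. The state with the most votes wins,
// which keeps the number of transitions on the bundled edges small.
ZAState pickBundleState(const std::array<unsigned, 2> &Votes) {
  // Votes[0] counts "ACTIVE" votes, Votes[1] counts "LOCAL_SAVED" votes.
  return Votes[0] >= Votes[1] ? ZAState::Active : ZAState::LocalSaved;
}
```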
SCCP can use PredicateInfo to constrain ranges based on assume and
branch conditions. Currently, this is only enabled during IPSCCP.
This enables it for SCCP as well, which runs after functions have
already been simplified, while IPSCCP runs pre-inline. To a large
degree, CVP already handles range-based optimizations, but SCCP is more
reliable for the cases it can handle. In particular, SCCP works reliably
inside loops, which is something that CVP struggles with due to LVI
cycles.
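A hedged source-level illustration of the kind of loop-internal fold this enables (example constructed for this note, not taken from the patch):
```cpp
int sum_small(const int *a, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {
    int v = a[i];
    if (v >= 0 && v < 16) {
      // With branch conditions feeding PredicateInfo, SCCP knows v is in
      // [0, 16) here, so this check folds away even inside the loop, where
      // LVI/CVP can give up because of cycles in its value graph.
      if (v < 100)
        total += v;
    }
  }
  return total;
}
```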
I have made various optimizations to make PredicateInfo more efficient,
but unfortunately this still has significant compile-time cost (around
0.1-0.2%).
This attempts https://github.com/llvm/llvm-project/issues/132795 again.
Last time we tried this, we didn't have enough infra capacity, so we had to
revert. According to recent communication from the Infrastructure Area
Team, we should now have enough capacity to re-enable the LLDB tests.
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.
To solve this problem I have added a new
getReverseVectorInstrCost interface where the index is counted
in reverse from the end of the vector. Given a vector with
ElementCount EC, the extracted/inserted lane is
EC - 1 - Index. For scalable vectors this lane is unknown at
compile time. I've added an AArch64 hook that better represents
the cost, and also a RISCV hook that maintains compatibility
with the behaviour prior to this PR.
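For intuition, a small illustration of the reverse-index mapping (hypothetical helper, not part of the patch):
```cpp
#include <cassert>

// For a fixed-width vector the "reverse" index maps to an absolute lane.
// For scalable vectors EC is a runtime multiple of vscale, so the absolute
// lane cannot be computed at compile time and the target hook has to model
// the cost differently (e.g. whilelo + lastb on SVE).
unsigned reverseIndexToLane(unsigned EC, unsigned Index) {
  assert(Index < EC && "reverse index out of range");
  return EC - 1 - Index;
}
```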
I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
It can happen that the call is originally created as a MemoryDef,
and then later transforms show it is actually read-only and could
be a MemoryUse -- however, this is not guaranteed to be reflected
in MSSA.
This is the first step in untangling the variable step transform and
header mask optimizations as described in #152541.
Currently we replace all VF users globally in the plan, including
VPVectorEndPointerRecipe. However, this leaves reversed loads and stores
in an incorrect state until they are adjusted in optimizeMaskToEVL.
This moves the VPVectorEndPointerRecipe transform so that it is updated
in lockstep with the actual load/store recipe.
One thought that crossed my mind was that VPInterleaveRecipe could also
use VPVectorEndPointerRecipe, in which case we would also have been
computing the wrong address, because we don't transform it to an EVL
recipe that accounts for the reversed address.
Depends on #153625
This patch adds support for statement expressions. It also changes
emitCompoundStmt and emitCompoundStmtWithoutScope to accept an Address
that the optional result is written to. This allows the alloca to be
created ahead of the scope, which saves us from having to hoist the
alloca to its parent scope.
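For reference, a small example of the GNU statement-expression feature being supported (illustrative source, not from the patch):
```cpp
int twice_plus_one(int x) {
  // GNU statement expression: the value of the last expression statement
  // becomes the value of the whole ({ ... }) expression, so its result
  // needs a slot that outlives the inner scope.
  return ({ int t = x * 2; t + 1; });
}
```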
The check for ABI differences for inlined calls involves the caller, the
callee and the nested callee. Before inlining, the ABI is determined by
the target features of the callee. After inlining it is determined by
the caller. The features of the nested callee should never actually
matter.
InputArg/OutputArg now contain the OrigTy, so use that directly
instead of trying to recover it.
CC_RISCV is now *nearly* a normal CC assignment function. However, it
still differs by having an IsRet flag.
Replace Mips custom logic for retaining information about original types
in calling convention lowering by directly querying the OrigTy that is
now available.
There is one change in behavior here: If the return type is a struct
containing fp128 plus additional members, the result is now different,
as we no longer special case to a single fp128 member. I believe this is
fine, because this is a fake ABI anyway: Such cases should actually use
sret, and as such are a frontend responsibility, and Clang will indeed
emit these as sret, not as a return value struct. So this only impacts
manually written IR tests.
Generate QC_INSB/QC_INSBI from `or (and X, MaskImm), OrImm` iff the
value being inserted only sets bits that are known to be zero. This is based
on a similar DAG-to-DAG transform done in `AArch64`.
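A hedged source-level illustration of the pattern (constants chosen for this note):
```cpp
// (x & 0xFFFF00FF) | 0x00004200 inserts the byte 0x42 into bits [15:8].
// The OR immediate only sets bits the AND mask is known to have cleared,
// which is exactly the condition for selecting a single bit-field insert.
unsigned insert_field(unsigned x) {
  return (x & 0xFFFF00FFu) | 0x00004200u;
}
```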
PR #149247 made the MD accessible to the backend, so we can now leverage
it in the memory model. The first use case here is detecting whether a flat
op can access scratch memory.
This benefits both the MemoryLegalizer and InsertWaitCnt.