Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse-grained, as
it does not account for the differences between scalar and vector
compares. This PR extends the interface to take an EVT, allowing finer
control.
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to preserve
the specifically tested cases.
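A minimal sketch of what the extended hook could look like (signature and behavior assumed from the description above, not copied from the patch):
```
// Hypothetical override: returning true disables compare sinking.
// The EVT parameter is the new, finer-grained part of the interface.
bool AArch64TargetLowering::hasMultipleConditionRegisters(EVT VT) const {
  // Assumed policy: keep sinking scalar compares, but report multiple
  // condition registers for scalable vectors so their compares stay put.
  return VT.isScalableVector();
}
```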
Add an LLVMContext parameter to getOptimalMemOpType and
findOptimalMemOpLowering, so that EVT::getVectorVT can be used to
construct vector EVTs in getOptimalMemOpType.
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
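A hedged sketch of how the extra parameter might be used (parameter order and the policy below are assumptions for illustration):
```
// With an LLVMContext in hand, the hook can now build vector EVTs.
EVT MyTargetLowering::getOptimalMemOpType(LLVMContext &Ctx,
                                          const MemOp &Op) const {
  // Illustrative policy only: prefer v4i32 for sufficiently large ops.
  if (Op.size() >= 16)
    return EVT::getVectorVT(Ctx, MVT::i32, 4);
  return MVT::Other;
}
```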
In PowerPC, the AtomicCmpXchgInst is lowered to
ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS. However, this node does not handle
the weak attribute of AtomicCmpXchgInst. As a result, when compiling C++
atomic_compare_exchange_weak_explicit, the generated assembly includes a
"reservation lost" loop — i.e., it branches back and retries if the
stwcx. (store-conditional) fails. This differs from GCC’s codegen, which
does not include that loop for weak compare-exchange.
Since PowerPC uses LL/SC-style atomic instructions, the patch enables
AtomicExpandImpl::expandAtomicCmpXchg for PowerPC. With this, the weak
attribute is properly respected, and the "reservation lost" loop is
removed for weak operations.
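For reference, the kind of source that exercises this path (standard C++ atomics; not taken from the patch):
```
#include <atomic>

// A weak compare-exchange may fail spuriously, so the backend does not
// need a retry loop for a lost stwcx. reservation.
bool try_update(std::atomic<int> &v, int expected, int desired) {
  return v.compare_exchange_weak(expected, desired,
                                 std::memory_order_acq_rel,
                                 std::memory_order_acquire);
}
```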
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Vector shift word or doubleword requires a shift-amount vector of 31 or
63, which is too big for a splat immediate and would require a
multi-instruction sequence. However, the PPC instructions only use the
low 5 or 6 bits of the shift-amount vector elements, so an all-ones
mask, which we can generate efficiently, works.
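An illustrative source pattern (not from the patch) that needs such a shift-amount splat:
```
// The shift instructions read only the low 5/6 bits of each element,
// so an all-ones splat works as the 31 (or 63) shift amount.
typedef unsigned int v4u32 __attribute__((vector_size(16)));

v4u32 shift_right_31(v4u32 x) {
  return x >> 31;  // requires a splat of 31 in every lane
}
```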
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated in
favor of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch lowers UADDO,
UADDO_CARRY, USUBO and USUBO_CARRY.
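An illustrative source pattern that exercises these nodes (type legalization splits the wide add into UADDO for the low half and UADDO_CARRY for the high half):
```
// 128-bit addition: lowered via UADDO + UADDO_CARRY instead of the
// deprecated ADDC/ADDE pair.
unsigned __int128 add128(unsigned __int128 a, unsigned __int128 b) {
  return a + b;
}
```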
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.
This mirrors changes from #108942 but targets PowerPC instead of
RISC-V. Because both of these PRs modify the same driver functions, this
series is stacked on top of the RISC-V one.
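Assuming the PowerPC patch mirrors the RISC-V driver flags from #108942, usage would look roughly like this (the register and offset values are purely illustrative):
```
clang --target=powerpc64le-unknown-linux-gnu -fstack-protector-all \
      -mstack-protector-guard=tls \
      -mstack-protector-guard-reg=r13 \
      -mstack-protector-guard-offset=0x28 -c foo.c
```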
---------
Signed-off-by: Keith Packard <keithp@keithp.com>
Given a list of constraints for InlineAsm (e.g. "imr"), I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair amount of logic is duplicated between SelectionDAGISel and
GlobalISel for this.
That is because SelectionDAGISel is also trying to lower immediates
during selection. If we disentangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint
Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.
This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
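A minimal sketch of the shared two-phase scheme, with hypothetical names (nothing here is verbatim from the patch):
```
#include <functional>
#include <optional>
#include <vector>

enum class ConstraintCode { i, m, r };  // e.g. the "imr" constraint list

// Phase 1 walks candidates in preference order; phase 2 attempts to
// lower each one, and the first success wins. Sharing this loop keeps
// SelectionDAGISel and GlobalISel in agreement on priority.
std::optional<ConstraintCode>
chooseAndLower(const std::vector<ConstraintCode> &Ordered,
               const std::function<bool(ConstraintCode)> &TryLower) {
  for (ConstraintCode CC : Ordered)
    if (TryLower(CC))
      return CC;
  return std::nullopt;  // no candidate could be lowered
}
```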
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits so reviewers can better see what's changed. Will
squash when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
On PPC there are instructions to store an element from a vector (e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid a tail
constant in memset and in constant-splat array initialization.
This patch tries to explore these opportunities.
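An illustrative case (hypothetical, not from the patch's tests): the 4-byte tail below can be stored directly from the vector register that holds the splat, instead of materializing the constant in a GPR.
```
#include <cstring>

void fill(char *p) {
  std::memset(p, 42, 20);  // 16-byte vector store + 4-byte tail store
}
```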
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138883
This patch adds support for the TLS local-exec access model on AIX,
allowing generation of the 32-bit (specifically, non-optimized) code sequence.
This work is a follow-up of D149722.
The code sequence generated for this access model is as follows:
```
.tc var[TC],var[TL]@le // variable offset, with the @le relocation specifier
bla .__get_tpointer() // get the thread pointer, modifies r3
lwz reg1, var[TC](2) // load the variable offset
add reg2, r3, reg1 // add the variable offset to the retrieved thread pointer
```
Differential Revision: https://reviews.llvm.org/D152669
This patch adds support for the TLS local-exec access model on AIX,
allowing generation of the 64-bit (specifically, non-optimized) code sequence.
For this patch in particular, the generated sequence involves a load of the
variable offset, followed by an add of the loaded offset to r13 (which
holds the thread pointer). The code sequence looks like the following:
```
ld reg1,var[TC](2)
add reg2, reg1, r13 // r13 contains the thread pointer
```
The TOC (.tc pseudo-op) entries generated in the assembly files are also
changed to add the @le relocation specifier for the variable offset.
Differential Revision: https://reviews.llvm.org/D149722
The build failure should be fixed by de681d53. A follow-up refactor will
be done in future patches.
This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
On PowerPC VSX targets, fp-to-int is transformed into an xscv*
conversion followed by an mfvsr* move. When the result is to be stored,
the mfvsr* can be replaced by a direct store.
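An illustrative source pattern (not taken from the patch's tests) where the move-from-VSR can be folded into the store:
```
// The fp-to-int result is only stored, so a direct vector-scalar store
// can replace the mfvsr* + integer-store pair.
void store_fptosi(float f, int *p) {
  *p = (int)f;
}
```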
This change simplifies the optimization by reusing the existing
fp-to-int code, which helps CSE and the handling of strictfp cases.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D141473
This is a rework of:
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
Power ISA 3.0 introduced new 'test data class' instructions, which
accept flags for NaN/Infinity/Zero/Denormal. These instructions can be
used to implement custom lowering for llvm.is.fpclass, but some of the
categories the intrinsic covers are not natively testable (normal, and
distinguishing QNaN from SNaN).
For those categories, this patch uses a two-way or three-way combination
of the supported tests to implement the correct behavior.
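A sketch of the combination idea in plain C++ (illustrative only; the actual lowering composes the instruction's flag results in SDAG):
```
#include <cmath>

// "normal" is not a native test-data-class flag, but it is the
// complement of the union of the flags that are: NaN, infinity, zero,
// and denormal.
bool is_normal_by_combination(double x) {
  return !(std::isnan(x) || std::isinf(x) || x == 0.0 ||
           std::fpclassify(x) == FP_SUBNORMAL);
}
```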
Reviewed By: sepavloff, shchenz
Differential Revision: https://reviews.llvm.org/D140381
The input parameter IsByValArg of isEligibleForTCO() is false in all
cases, so it is redundant and can be removed.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D145028
A move towards using the generic ISD::ABDU nodes on more backends.
Also support ISD::ABDS for v4i32 types using the existing signbit-flip trick.
PowerPC has a select(icmp_ugt(x,y),sub(x,y),sub(y,x)) -> abdu(x,y) combine that I intend to move to DAGCombiner in a future patch.
The ABS(SUB(X,Y)) -> PPCISD::VABSD(X,Y,1) v4i32 combine wasn't legal (https://alive2.llvm.org/ce/z/jc2hLU) - so I've removed it, having already added the legal sub nsw tests equivalent.
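The select-of-subs pattern referenced above, in source form (scalar shown for brevity):
```
// select(icmp_ugt(x,y), sub(x,y), sub(y,x)) -> abdu(x,y)
unsigned absolute_difference(unsigned x, unsigned y) {
  return x > y ? x - y : y - x;
}
```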
Differential Revision: https://reviews.llvm.org/D142313
The check logic for TCO is scattered across two functions,
IsEligibleForTailCallOptimization_64SVR4() and IsEligibleForTailCallOptimization(),
and at the moment it serves only the instruction selection phase.
This patch refactors the existing logic to export an API for TCO
eligibility queries before the instruction selection phase.
Reviewed By: shchenz, nemanjai
Differential Revision: https://reviews.llvm.org/D141673
This doesn't make sense as an option. fneg and fabs are bit-preserving
by definition. If a target has fneg or fabs instructions that are not
bit-preserving, it's incorrect to lower fneg/fabs to use them.
Address the inconsistency between the FLT_ROUNDS_ and SET_ROUNDING SDAG
nodes. Rename FLT_ROUNDS_ to GET_ROUNDING and add an llvm.get.rounding
intrinsic to replace llvm.flt.rounds.
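For context, the C-level construct this serves (FLT_ROUNDS expands through __builtin_flt_rounds, i.e. the renamed intrinsic; example illustrative):
```
#include <cfloat>

int current_rounding_mode() {
  return FLT_ROUNDS;  // lowers via llvm.get.rounding after this change
}
```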
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139507
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
The vperm instruction requires its data to be in Altivec registers. If
one of the vector operands is not used after the vperm, it can be
substituted with xxperm, which doubles the number of available registers.
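An illustrative two-operand shuffle (hypothetical example) of the sort that selects a permute instruction:
```
typedef unsigned char v16u8 __attribute__((vector_size(16)));

// If 'a' or 'b' is dead after the permute, xxperm can be used, drawing
// on all 64 VSX registers rather than the 32 Altivec ones.
v16u8 interleave_low(v16u8 a, v16u8 b) {
  return __builtin_shufflevector(a, b, 0, 16, 1, 17, 2, 18, 3, 19,
                                 4, 20, 5, 21, 6, 22, 7, 23);
}
```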
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D133700
A target can report whether a misaligned access is 'fast', as defined
by the target, or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in the
load/store vectorizer to check that a vectorized access will be not
just fast, but no slower than before.
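A hedged sketch of the changed hook, as a direct translation of the old boolean (target name assumed):
```
bool MyTargetLowering::allowsMisalignedMemoryAccesses(
    EVT VT, unsigned AddrSpace, Align Alignment,
    MachineMemOperand::Flags Flags, unsigned *Fast) const {
  if (Fast)
    *Fast = 1;  // 1 reproduces the old 'true'; larger values can rank speed
  return true;
}
```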
Differential Revision: https://reviews.llvm.org/D124217
There are a few issues with the code we generate for atomic operations and the way we generate it:
- Hard-coded CR0 for compares
- Order of operands for compares not conducive to emitting compare-immediate or to CSE of compares
- Missing MachineMemOperand for st[bhwd]cx intrinsics
- Missing intrinsic properties for the same
- Unnecessary blocks with store-conditional instructions to clear the reservation (which ends up hindering performance)
- Moves from CR instructions just to compare the result of a store conditional with zero (even though it is a record-form instruction)
This patch aims to resolve all of those issues.
Differential revision: https://reviews.llvm.org/D134783
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review; https://reviews.llvm.org/D133030
needs to merge first for the LIT tests.
Reviewed By: shchenz, RKSimon
Differential Revision: https://reviews.llvm.org/D132146
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.
For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the MSB (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering: if the src operand is known never to be zero (e.g. due to the promotion masking), we can remove the CMOV zero-src handling.
Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying pass-through EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.
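A hedged sketch of the new callback shape (the predicate shown is illustrative, not X86's exact policy):
```
bool MyTargetLowering::isCheapToSpeculateCttz(Type *Ty) const {
  // Illustrative policy: speculate only for integer types that lower to
  // a cheap (possibly promoted) count-trailing-zeros sequence.
  return Ty->isIntegerTy(8) || Ty->isIntegerTy(16) ||
         Ty->isIntegerTy(32) || Ty->isIntegerTy(64);
}
```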
Differential Revision: https://reviews.llvm.org/D132520