510 Commits

Author SHA1 Message Date
Qiu Chaofan
5b8ea2d0e1 [PowerPC] Lower IS_FPCLASS by test data class instruction
Power ISA 3.0 introduced new 'test data class' instructions, which
accept flags for: NaN/Infinity/Zero/Denormal. This instruction can be
used to implement custom lowering for llvm.is.fpclass, but some extra
bits provided by the intrinsic are missing (normal and QNaN/SNaN).

For those categories not natively supported, this patch uses a two-way
or three-way combination to implement correct behavior.

Reviewed By: sepavloff, shchenz

Differential Revision: https://reviews.llvm.org/D140381
2023-04-03 11:37:17 +08:00
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Simon Pilgrim
da570ef1b4 [DAG] Match select(icmp(x,y),sub(x,y),sub(y,x)) -> abd(x,y) patterns
Pulled out of PowerPC, and added ABDS support as well (hence the additional v4i32 PPC matches)

Differential Revision: https://reviews.llvm.org/D144789
2023-03-14 15:10:30 +00:00
Ting Wang
bd4562976c [PowerPC][NFC] cleanup isEligibleForTCO
The input parameter IsByValArg to isEligibleForTCO() is false in all
cases, so it is considered redundant and should be removed.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D145028
2023-03-02 23:04:19 -05:00
Ting Wang
65f68812d3 [PowerPC] update PPCTTIImpl::supportsTailCallFor() check conditions
This patch reuse `PPCTargetLowering::isEligibleForTCO()` to check
`PPCTTIImpl::supportsTailCallFor()`.

Fixes #59315

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D140369
2023-02-28 22:29:16 -05:00
Simon Pilgrim
8757ce4901 [PowerPC] Replace PPCISD::VABSD cases with generic ISD::ABDU(X,Y) node
A move towards using the generic ISD::ABDU nodes on more backends

Also support ISD::ABDS for v4i32 types using the existing signbit flip trick

PowerPC has a select(icmp_ugt(x,y),sub(x,y),sub(y,x)) -> abdu(x,y) combine that I intend to move to DAGCombiner in a future patch.

The ABS(SUB(X,Y)) -> PPCISD::VABSD(X,Y,1) v4i32 combine wasn't legal (https://alive2.llvm.org/ce/z/jc2hLU) - so I've removed it, having already added the legal sub nsw tests equivalent.

Differential Revision: https://reviews.llvm.org/D142313
2023-02-25 20:17:17 +00:00
Ting Wang
d567e06946 [PowerPC][NFC] refactor eligible check for tail call optimization
The check logic for TCO is scattered in two functions:
IsEligibleForTailCallOptimization_64SVR4() IsEligibleForTailCallOptimization(),
and serves instruction selection phase only at this moment.

This patch aims to refactor existing logic to export an API for TCO
eligible query before instruction selection phase.

Reviewed By: shchenz, nemanjai

Differential Revision: https://reviews.llvm.org/D141673
2023-02-21 06:14:47 -05:00
Matt Arsenault
09dd4d870e DAG: Remove hasBitPreservingFPLogic
This doesn't make sense as an option. fneg and fabs are bit
preserving by definition. If a target has some fneg or fabs
instruction that are not bitpreserving it's incorrect to lower
fneg/fabs to use it.
2023-02-14 10:25:24 -04:00
Qiu Chaofan
a40ef656d8 [Intrinsic] Rename flt.rounds intrinsic to get.rounding
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D139507
2022-12-19 15:22:39 +08:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Krzysztof Parzyszek
864aaa21b4 TargetLowering: convert Optional to std::optional 2022-12-01 16:19:10 -08:00
Maryam Moghadas
934d5fa2b8 [PowerPC] Exploit xxperm, check for dead vectors and substitute vperm with xxperm
vperm instruction requires the data to be in the Altivec registers, if one of
the vector operands is not used after this vperm instruction then it can be
substituted by xxperm which doubles the number of available registers.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D133700
2022-11-23 13:28:12 -06:00
Stanislav Mekhanoshin
bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Nemanja Ivanovic
4ea121c904 [PowerPC] Fix a number of inefficiencies and issues with atomic code gen
There are a few issues with the code we generate for atomic operations and the way we generate it:

- Hard coded CR0 for compares
- Order of operands for compares not conducive to
  emitting compare-immediate or for CSE of compares
- Missing MachineMemOperand for st[bhwd]cx intrinsics
- Missing intrinsic properties for the same
- Unnecessary blocks with store conditional
  instructions to clear reservation (which ends
  up hindering performance)
- Move from CR instructions just to compare the
  result of a store conditional with zero (even
  though it is a record-form)

This patch aims to resolve all of those issues.

Differential revision: https://reviews.llvm.org/D134783
2022-10-03 19:55:29 -05:00
Paul Scoropan
ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for LIT tests

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Josh Stone
4dcfb09e40 [NFC][CodeGen] Use const MF in TargetLowering stack probe functions
This makes them callable from places like canUseAsPrologue.

Differential Revision: https://reviews.llvm.org/D134492
2022-09-23 09:30:32 -07:00
Simon Pilgrim
f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Masoud Ataei
96515df816 [PowerPC] Fix the check for scalar MASS conversion
Proposing to move the check for scalar MASS conversion from constructor
of PPCTargetLowering to the lowerLibCallBase function which decides
about the lowering.

The Target machine option Options.PPCGenScalarMASSEntries is set in
PPCTargetMachine.cpp. But an object of the class PPCTargetLowering
is created in one of the included header files. So, the constructor will run
before setting PPCGenScalarMASSEntries to correct value. So, we cannot
check this option in the constructor.

Differential: https://reviews.llvm.org/D128653
Reviewer: @bmahjour
2022-07-06 11:44:00 -07:00
Kai Luo
549e118e93 [PowerPC] Support 16-byte lock free atomics on pwr8 and up
Make 16-byte atomic type aligned to 16-byte on PPC64, thus consistent with GCC. Also enable inlining 16-byte atomics on non-AIX targets on PPC64.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D122377
2022-04-08 23:25:56 +00:00
Shao-Ce SUN
21ac474392 [NFC] Correct typo interger to integer 2022-02-17 21:17:47 +08:00
Amy Kwan
ac5a5a9cfe [PowerPC] Add default handling for single element vectors, and split/promote vNi1 vectors.
This patch updates the handling of vectors in getPreferredVectorAction():

For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.

The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.

```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```

Differential Revision: https://reviews.llvm.org/D119521
2022-02-15 08:44:08 -06:00
Ting Wang
097a95f2df [PowerPC] Add custom lowering for SELECT_CC fp128 using xsmaxcqp
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128,
llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.

Reviewed By: nemanjai, shchenz, amyk

Differential Revision: https://reviews.llvm.org/D117006
2022-02-09 21:48:28 -05:00
Masoud Ataei
256d253332 [PowerPC] Scalar IBM MASS library conversion pass
This patch introduces the conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX.

Differential: https://reviews.llvm.org/D101759

Reviewer: bmahjour
2022-02-02 07:54:19 -08:00
Qiu Chaofan
c2cc70e4f5 [NFC] Fix endif comments to match with include guard 2022-01-07 15:52:59 +08:00
Nemanja Ivanovic
c9cb8edc51 [PowerPC] Allow scalars for asm constraint "v" with VSX
Similarly to what GCC does, we should allow scalars with
the "v" constraint rather than introducing unnecessary
new constraints for scalars in Altivec registers.

Differential revision: https://reviews.llvm.org/D113635
2021-11-23 17:03:04 -06:00
Nemanja Ivanovic
5840f7197d [PowerPC] Respect rounding mode in the back end
Currently, the floating point instructions that depend on
rounding mode are correctly marked in the PPC back end with
an implicit use of the RM register. Similarly, instructions
that explicitly define the register are marked with an
implicit def of the same register. So for the most part,
RM-using code won't be moved across RM-setting instructions.

However, calls are not marked as RM-setting instructions so
code can be moved across calls. This is generally desired,
but so is the ability to turn off this behaviour with an
appropriate option - and -frounding-math really should be
that option.

This patch provides a set of call instructions (for direct
and indirect calls) that are marked with an implicit def of
the RM register. These will be used for calls that are marked
with the strictfp attribute.

Differential revision: https://reviews.llvm.org/D111433
2021-11-10 08:19:58 -06:00
Chen Zheng
5a8b196340 [PowerPC] handle more splat loads without stack operation
This mostly improves splat loads code generation on Power7

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106555
2021-11-03 05:17:41 +00:00
Arthur Eubanks
a0a4935182 Make more places that use alignment use uint64_t
Followup to D110451.
2021-10-08 16:35:19 -07:00
Amy Kwan
5041a485b9 [PowerPC] Exploit Prefixed Load/Stores using the refactored Load/Store Implementation
This patch exploits the prefixed load and store instructions utilizing the
refactored load/store implementation introduced in D93370.

Prefixed load and store instructions are emitted whenever we are loading or
storing a value with an offset that fits into a 34-bit signed immediate.
Patterns for the prefixed load and stores are added in this patch, as well as
the implementation that detects when we are loading and storing a value with an
offset that fits in 34-bits.

Differential Revision: https://reviews.llvm.org/D96075
2021-09-14 08:39:49 -05:00
Amy Kwan
351a0d8a90 [PowerPC] Update PC-Relative Load/Store Patterns to use the refactored Load/Store Implementation
This patch updates the PC-Relative load and store patterns to utilize the
refactored load/store implementation introduced in D93370.

PC-Relative implementation has been added to PPCISelLowering.cpp, and also the
patterns in PPCInstrPrefix.td have been updated and no longer require AddedComplexity.
All existing test cases pass with this update.

Differential Revision: https://reviews.llvm.org/D95116
2021-09-09 15:38:42 -05:00
Kai Luo
5eaebd5d64 [PowerPC] Implement quadword atomic load/store
Add support to load/store i128 atomically.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105612
2021-09-01 06:55:40 +00:00
Kai Luo
b9c3941cd6 [PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand
This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expand atomic operations post RA to avoid spilling that might prevent LL/SC progress.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D103614
2021-07-15 01:12:09 +00:00
David Spickett
e4ecd83fe9 [llvm][AArch64] Handle arrays of struct properly (from IR)
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123
2021-06-16 13:56:01 +00:00
Nikita Popov
1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Anshil Gandhi
1c5ff0b03f [PowerPC] [GlobalISel] Implementation of formal arguments lowering in the IRTranslator for the PPC backend
Differential Revision: https://reviews.llvm.org/D99812
2021-06-02 16:46:39 -06:00
Anshil Gandhi
3e5ddb83e3 Revert "Differential Revision: https://reviews.llvm.org/D99812"
This reverts commit c729f2a48a6ef6b20554494c5630082c89c3680c.
2021-06-02 16:36:00 -06:00
Anshil Gandhi
c729f2a48a Differential Revision: https://reviews.llvm.org/D99812 2021-06-02 14:09:52 -06:00
Jinsong Ji
b2581196eb [AIX] Enable stackprotect feature
AIX use `__ssp_canary_word` instead of `__stack_chk_guard`.
This patch update the target hook to use correct symbol,
so that the basic stackprotect feature can work.

The traceback will be handled in follow up patch.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D103100
2021-05-28 02:18:15 +00:00
Stefan Pintilie
15051f0b4a [PowerPC] Handle inline assembly clobber of link regsiter
This patch adds the handling of clobbers of the link register LR for inline
assembly.

This patch is to fix:
https://bugs.llvm.org/show_bug.cgi?id=50147

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D101657
2021-05-13 07:43:37 -05:00
Craig Topper
44e0e91db0 [ValueTypes] Rename MVT::getVectorNumElements() to MVT::getVectorMinNumElements(). Fix some misuses of getVectorNumElements()
getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.

The changes to isPow2VectorType and getPow2VectorType are copied from EVT.

The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.

This was motivated by the bug I fixed yesterday in 80b9510806cf11c57f2dd87191d3989fc45defa8

Reviewed By: frasercrmck, sdesmalen

Differential Revision: https://reviews.llvm.org/D102262
2021-05-12 07:46:45 -07:00
Amy Kwan
64d951be61 [PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns.
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.

The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and stores
instructions (such as when the immediates that fit within 34-bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating dependency of pattern declaration in TableGen.

The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.

This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.

Differential Revision: https://reviews.llvm.org/D93370
2021-04-30 09:53:19 -05:00
Jay Foad
82d34fe2b3 Fix typo "beneficiates" in comments 2021-04-22 12:30:16 +01:00
Nick Desaulniers
c440b97d89 [TargetLowering] move "o" and "X" constraint handling to base class
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D100416
2021-04-19 10:53:31 -07:00
Albion Fung
e29bb074c6 [PowerPC] Exploit xxsplti32dx (constant materialization) for scalars
This patch exploits the xxsplti32dx instruction available on Power10
in place of constant pool loads where xxspltidp would not be able to,
usually because the immediate cannot fit into 32 bits.

Differential Revision: https://reviews.llvm.org/D95458
2021-03-24 15:59:59 -04:00
Nemanja Ivanovic
b0f0115308 [AIX][TLS] Generate 32-bit general-dynamic access code sequence
Adds support for the TLS general dynamic access model to
assembly files on AIX 32-bit.

To generate the correct code sequence when accessing a TLS variable
`v`, we first create two TOC entry nodes, one for the variable offset, one
for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX`
node (new node introduced by this patch).
The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is
expanded to 2 copies (to put the variable offset and region handle in
the right registers) and a call to `__tls_get_addr`.

This patch also changes the way TC entries are generated in asm files.
If the generated TC entry is for the region handle of a TLS variable,
we add the `@m` relocation and the `.` prefix to the entry name.
For example:

```
L..C0:
  .tc .v[TC],v[TL]@m -> region handle
L..C1:
  .tc v[TC],v[TL] -> variable offset
```

Reviewed By: nemanjai, sfertile

Differential Revision: https://reviews.llvm.org/D97948
2021-03-08 09:30:19 -06:00
Nemanja Ivanovic
a5222aa085 [DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit 284f2bffc9bc5, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283
2021-02-09 06:33:48 -06:00
Craig Topper
11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Fangrui Song
022cc6e343 [PowerPC] Delete dead Lower* 2021-01-06 21:58:40 -08:00
Fangrui Song
bfa6ca07a8 [PowerPC] Delete remnant Darwin ISelLowering code 2021-01-06 21:40:40 -08:00
QingShan Zhang
ebdd20f430 Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
X86 and AArch64 expand it as libcall inside the target. And PowerPC also
want to expand them as libcall for P8. So, propose an implement in the
legalizer to common the logic and remove the code for X86/AArch64 to
avoid the duplicate code.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00