2405 Commits

Author SHA1 Message Date
John Brawn
cee7e7b245 [ARM] Correctly handle execute-only in EmitStructByval
Currently, when compiling for an execute-only target without movt,
EmitStructByval generates a constant pool load, which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.

Differential Revision: https://reviews.llvm.org/D154944
2023-07-19 13:56:36 +01:00
Oliver Stannard
aea8db8eb9 Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI."
This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.
2023-07-13 14:25:39 +01:00
Caslyn Tonelli
6d9065a716 Revert "[ARM] Correctly handle execute-only in EmitStructByval"
This reverts commit 210f61cbddeddac47b347db072d674ee142520f6.

Differential Revision: https://reviews.llvm.org/D155138
2023-07-12 23:29:54 +00:00
Jay Foad
58d1eaa3b6 [CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D154281
2023-07-12 14:29:26 +01:00
John Brawn
210f61cbdd [ARM] Correctly handle execute-only in EmitStructByval
Currently, when compiling for an execute-only target without movt,
EmitStructByval generates a constant pool load, which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.

Differential Revision: https://reviews.llvm.org/D154944
2023-07-12 11:48:01 +01:00
Ties Stuij
f0ae3c23b5 [ARM] in LowerConstantFP, make sure we cover armv6-m execute-only
Currently in LowerConstantFP, when we compile for execute-only (XO), we don't
check what architecture we're compiling for (<=v6m or >v6m). We shouldn't get
here for v6m, so put in an assert.

Reviewed By: simonwallis2, dmgreen

Differential Revision: https://reviews.llvm.org/D154506
2023-07-11 10:42:15 +01:00
John Brawn
4fb0e0114f [ARM] Generate out-of-line jump tables for XO without 32-bit branch
When we only have a 16-bit pc-relative branch instruction we generate
a table of addresses for a jump table. Currently this is placed inline,
but this won't work with execute-only memory. In this case generate
the jump table out-of-line.

Differential Revision: https://reviews.llvm.org/D153774
2023-06-28 13:30:39 +01:00
Ties Stuij
4f19c6a7c7 [ARM] allow long-call codegen for armv6-M eXecute Only (XO)
Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this
was only implemented for ~armv7+, effectively if MOVW/MOVT is
available. Regarding long calls, we remove the check for MOVW/MOVT when
generating code for XO. The check was already redundant, because the subtarget
initialization already checks whether XO is valid for the target, and targets that
generate valid XO code should be able to handle the (wrapper globaladdress)
node.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D153782
2023-06-28 10:50:24 +01:00
Maurice Heumann
249bd9eab0 [ARM] Fix codegen of unaligned volatile load/store of i64
Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE.
However, these instructions require the addresses to be aligned.
Unaligned loads/stores therefore should be ignored by this handling.

Differential Revision: https://reviews.llvm.org/D152790
2023-06-26 10:45:41 -07:00
Ties Stuij
2273741ea2 [ARM] generate armv6m eXecute Only (XO) code
[ARM] generate armv6m eXecute Only (XO) code for immediates, globals

Previously eXecute Only (XO) support was implemented for targets that support
MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449

XO prevents the compiler from generating data accesses to code sections. This
patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and
must resort to the following general pattern to avoid loads:

    movs    r3, :upper8_15:foo
    lsls    r3, #8
    adds    r3, :upper0_7:foo
    lsls    r3, #8
    adds    r3, :lower8_15:foo
    lsls    r3, #8
    adds    r3, :lower0_7:foo
    ldr     r3, [r3]

This is equivalent to the code pattern generated by GCC.
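
As a rough illustration only (a Python simulation, not part of the patch), the byte-at-a-time sequence above can be modelled as follows. The function name `build_address` is hypothetical, and the sketch assumes that each of the four relocations selects one byte of the 32-bit symbol address, from most to least significant. The final `ldr` is not modelled.

```python
def build_address(addr: int) -> int:
    """Model movs/lsls/adds: assemble a 32-bit value from four 8-bit immediates."""
    assert 0 <= addr < 2**32
    # Assumption: each relocation yields one byte of the address, ordered
    # :upper8_15:, :upper0_7:, :lower8_15:, :lower0_7:.
    chunks = [(addr >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    reg = chunks[0]                      # movs r3, :upper8_15:foo
    for byte in chunks[1:]:
        reg = (reg << 8) & 0xFFFFFFFF    # lsls r3, #8
        reg = (reg + byte) & 0xFFFFFFFF  # adds r3, <next byte>
    return reg
```

No step reads from memory, which is the property execute-only code needs: the address is materialised purely from immediates.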

The above relocations are new to LLVM and have been implemented in a parent
patch: https://reviews.llvm.org/D149443.

This patch limits itself to implementing codegen for this pattern and enabling
XO for armv6-M in the backend.

Separate patches will follow for:
- switch tables
- replacing specific loads from constant islands which are spread out over the
  ARM backend codebase. Amongst others: FastISel, call lowering, stack frames.

Reviewed By: john.brawn

Differential Revision: https://reviews.llvm.org/D152795
2023-06-23 10:50:47 +01:00
Igor Kirillov
40a81d3100 [CodeGen] Refactor IR generation functions to use IRBuilder in ComplexDeinterleaving pass
This patch updates several functions in LLVM's IR generation code to accept
an IRBuilder object as an argument, rather than an Instruction that indicates
the insertion point for new instructions.
This change is necessary to handle sophisticated -Ofast optimization cases
from D148558 where it's unclear which instructions should be used as the
insertion point for new operations.

Differential Revision: https://reviews.llvm.org/D148703
2023-05-30 16:18:28 +00:00
Sergei Barannikov
01a7967447 [CodeGen] Replace CCState's getNextStackOffset with getStackSize (NFC)
The term "next stack offset" is misleading because the next argument is
not necessarily allocated at this offset due to alignment constraints.
It also does not make much sense when allocating arguments at negative
offsets (introduced in a follow-up patch), because the returned offset
would be past the end of the next argument.
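
A small Python sketch of the alignment point, under stated assumptions: `StackAllocator` and `align_to` are hypothetical names for illustration and do not mirror the CCState API. It shows that the running stack size is not necessarily the offset at which the next argument lands.

```python
def align_to(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


class StackAllocator:
    """Hypothetical model of positive-offset stack argument allocation."""

    def __init__(self):
        self.stack_size = 0  # bytes allocated so far

    def allocate(self, size: int, align: int) -> int:
        # The argument goes at the aligned offset, which may be
        # larger than the current stack size.
        offset = align_to(self.stack_size, align)
        self.stack_size = offset + size
        return offset
```

After allocating a 4-byte argument, the stack size is 4, but an 8-byte-aligned argument is then placed at offset 8 rather than 4.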

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D149566
2023-05-17 21:51:45 +03:00
Jay Foad
d8229e2f14 [KnownBits] Define and use intersectWith and unionWith
Define intersectWith and unionWith as two complementary ways of
combining KnownBits. The names are chosen for consistency with
ConstantRange.

Deprecate commonBits as a synonym for intersectWith.
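
The two operations can be sketched in Python as below. This is a simplified model, not the LLVM class: known bits are represented as two integer masks, and the method names are snake_case stand-ins for `intersectWith` and `unionWith`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    zero: int  # mask of bits known to be 0
    one: int   # mask of bits known to be 1

    def intersect_with(self, other):
        # The value may be described by either operand:
        # keep only the facts both operands agree on.
        return KnownBits(self.zero & other.zero, self.one & other.one)

    def union_with(self, other):
        # Both operands describe the same value: pool their facts.
        return KnownBits(self.zero | other.zero, self.one | other.one)
```

Intersection loses information (useful for merging alternatives, such as the two arms of a select), while union gains it (useful when two analyses describe the same value).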

Differential Revision: https://reviews.llvm.org/D150443
2023-05-16 09:23:51 +01:00
Zequan Wu
3977b77a6b [CodeGen] Fix nomerge attribute not working in tail calls.
In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls.

For ARM, I only made the new MI inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's where tail calls get replaced. Not sure if there's any other place needed.

Fixes #61545.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D146749
2023-05-10 14:25:11 -04:00
NAKAMURA Takumi
c1221251fb Restore CodeGen/MachineValueType.h from Support
This is a rework of:

  - rG13e77db2df94 (r328395; MVT)

Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h`
can be restored as well.

Depends on D148767

Differential Revision: https://reviews.llvm.org/D149024
2023-05-03 00:13:20 +09:00
Sergei Barannikov
e744e51b12 [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148196
2023-04-29 21:59:58 +03:00
David Green
d321f3aa64 [ARM] Enable shouldFoldSelectWithIdentityConstant for MVE
We already have tablegen patterns for a lot of these, but performing the
combine earlier in DAG can help in a few extra cases.

Differential Revision: https://reviews.llvm.org/D149269
2023-04-28 14:57:51 +01:00
Daniel Kiss
d75e70d7ae [AArch64] Add preserve_all calling convention.
Clang accepts preserve_all for AArch64 while it is missing from the backend.

Fixes #58145

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135652
2023-04-28 14:55:38 +02:00
David Green
15d2821263 [ARM] Fix qsat for armv5te/armv6 + thumb-mode
This is a Thumb1 target, so will not have qsat instructions available. There
was a mismatch between hasBaseDSP and the instruction patterns when +dsp was
present, which is set by clang (but maybe shouldn't be). The target being
thumb1-only should override that, implying that it does not have any qadds.

Fixes #62273
2023-04-23 17:20:28 +01:00
Archibald Elliott
9ee4fe63bc [ARM] Fix Crashes in fp16/bf16 Inline Asm
We were still seeing occasional crashes with inline assembly blocks
using fp16/bf16 after my previous patches:
- https://reviews.llvm.org/rGff4027d152d0
- https://reviews.llvm.org/rG7d15212b8c0c
- https://reviews.llvm.org/rG20b2d11896d9

It turns out:
- The original two commits were wrong, and we should have always been
  choosing the SPR register class, not the HPR register class, so that
  LLVM's SelectionDAGBuilder correctly did the right splits/joins.
- The `splitValueIntoRegisterParts`/`joinRegisterPartsIntoValue` changes
  from rG20b2d11896d9 are still correct, even though they sometimes
  result in inefficient codegen of casts between fp16/bf16 and i32/f32
  (which is visible in these tests).

This patch fixes crashes in `getCopyToParts` and when trying to select
`(bf16 (bitconvert (fp16 ...)))` dags when Neon is enabled.

This patch also adds support for passing fp16/bf16 values using the 'x'
constraint that is LLVM-specific. This should broadly match how we pass
with 't' and 'w', but with a different set of valid S registers.

Differential Revision: https://reviews.llvm.org/D147715
2023-04-13 15:34:04 +01:00
David Green
b4df2b2c6c [ARM] Combine fadd into fcmla
This is the MVE equivalent of https://reviews.llvm.org/D146407. It adds a
target combine for fadd(a, vcmla(b, c, d)) -> vcmla(fadd(a, b), c, d), pushing
the fadd into the operands of the fcmla, which can help simplify away some
additions.

Differential Revision: https://reviews.llvm.org/D147200
2023-04-05 10:31:19 +01:00
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Simon Pilgrim
8153b92d9b [DAG] Add SelectionDAG::SplitScalar helper
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.

Differential Revision: https://reviews.llvm.org/D147264
2023-03-31 18:35:40 +01:00
Kazu Hirata
847b7f358b [ARM] Use isNullConstant and isOneConstant (NFC)
2023-03-29 21:50:34 -07:00
Caleb Zulawski
71dc3de533 [ARM] Improve min/max vector reductions on Arm
This patch adds some more efficient lowering for vecreduce.min/max under NEON,
using sequences of pairwise vpmin/vpmax to reduce to a single value.
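
One way such a pairwise reduction can proceed is sketched below in Python. This is a model under stated assumptions, not the lowering from the patch: `pairwise_min` imitates a vpmin-style step (minimum of adjacent pairs of each operand), `reduce_min` is a hypothetical driver, and the vector length is assumed to be a power of two.

```python
def pairwise_min(a, b):
    """Model of a pairwise-min step: min of adjacent pairs of a, then of b."""
    assert len(a) == len(b) and len(a) % 2 == 0

    def pairs(v):
        return [min(v[i], v[i + 1]) for i in range(0, len(v), 2)]

    return pairs(a) + pairs(b)


def reduce_min(v):
    """Reduce a power-of-two-length vector to its minimum with log2(n) steps."""
    n = len(v)
    assert n >= 2 and n & (n - 1) == 0
    for _ in range(n.bit_length() - 1):
        # Feeding the same vector as both operands halves the number of
        # distinct candidates on every step.
        v = pairwise_min(v, v)
    return v[0]
```

The same shape works for max by swapping `min` for `max`.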

This nearly resolves issues such as #50466, #40981, #38190.

Differential Revision: https://reviews.llvm.org/D146404
2023-03-22 16:00:19 +00:00
Archibald Elliott
b189218d44 [ARM] Fix Chain/Glue Bug in PerformVMOVhrCombine
In this optimisation, the Chain and Glue from the original CopyFromReg
were being lost, which resulted in miscompiles.

This fix just ensures that the input chains are correctly updated, and
that any users are also updated with the new chain from the new
CopyFromReg.

Fixes #60510.

Differential Revision: https://reviews.llvm.org/D143713
2023-03-06 11:55:54 +00:00
Archibald Elliott
20b2d11896 [ARM] Fix Crash in 't'/'w' handling without fp16/bf16
After https://reviews.llvm.org/rGff4027d152d0 and
https://reviews.llvm.org/rG7d15212b8c0c we saw crashes in SelectionDAG
when trying to use these constraints when you don't have the fp16 or
bf16 extensions.

However, it is still possible to move 16-bit floating point values into
the right place in S registers with a normal `vmov`, even if we don't
have fp16 instructions we can use within the inline assembly string.
This patch therefore fixes the crash.

I think the reason we weren't getting this crash before is that the
__fp16 and __bf16 types got an error diagnostic in the Clang
frontend when you didn't have the right architectural extensions to use
them. This restriction was recently relaxed.

The approach for bf16 needs a bit more explanation. Exactly how BF16 is
legalized was changed in rGb769eb02b526e3966847351e15d283514c2ec767 -
effectively, whether you have the right instructions to get a bf16 value
into/out of a S register with MoveTo/FromHPR depends on hasFullFP16, but
whether you use a HPR for a value of type MVT::bf16 depends on hasBF16.
This is why the tests are not changed by `+bf16` vs `-bf16`, but I've
left both sets of RUN lines in case this changes in the future.

Test Changes:
- Added more tests for inline asm (the core part)
- fp16-promote.ll and pr47454.ll show improvements where unnecessary
  fp16-fp32 up/down-casts are no longer emitted. This results in fewer
  libcalls where those casts would be done with a libcall.
- aes-erratum-fix.ll is fairly noisy, and I need to revisit this test so
  that the IR is more minimal than it is right now, because most of the
  changes in this commit do not relate to what AES is actually trying to
  verify.

Differential Revision: https://reviews.llvm.org/D143711
2023-03-06 11:55:08 +00:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC)
2023-02-19 22:04:47 -08:00
Kazu Hirata
cbde2124f1 Use APInt::popcount instead of APInt::countPopulation (NFC)
This is for consistency with the C++20-style bit manipulation
functions in <bit>.
2023-02-19 11:29:12 -08:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
Jake Egan
08533f8b86 Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation"
These commits are causing a test-suite build failure on AIX. Revert for now to allow time to investigate.
https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio

This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.
2023-02-14 15:20:06 -05:00
Alex Richardson
bd87a2449d [CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation
This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134282
2023-02-09 10:11:40 +00:00
Simon Pilgrim
2c580884c1 [ARM] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.
Use APInt::setBit() method instead of OR'ing individual bits.
2023-02-08 15:27:05 +00:00
Archibald Elliott
62c7f035b4 [NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.
2023-02-07 12:39:46 +00:00
David Green
734d113a6c [ARM] Remove reduce(shuffle) if all the lanes are used
This looks for vaddv(shuffle) or vmlav(shuffle, shuffle), with a shuffle where
all the lanes are used once. Due to the reduction being commutative the shuffle
can be removed.
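
The reasoning can be illustrated with a small Python sketch using integer lanes. The helper names are hypothetical, and the model covers only the single-input add reduction, not the vmlav case.

```python
def shuffle(v, mask):
    """Pick lanes of v according to mask."""
    return [v[i] for i in mask]


def uses_every_lane_once(mask, n):
    """True if mask is a permutation of lanes 0..n-1."""
    return sorted(mask) == list(range(n))


def reduce_add_of_shuffle(v, mask):
    if uses_every_lane_once(mask, len(v)):
        # A permutation does not change a commutative reduction,
        # so the shuffle can be skipped.
        return sum(v)
    return sum(shuffle(v, mask))
```

A mask such as `[0, 0, 1, 1]` repeats lanes, so the shuffle has to stay in that case.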

Differential Revision: https://reviews.llvm.org/D143382
2023-02-07 10:44:35 +00:00
David Green
c56846a892 [ARM] Remove FlattenVectorShuffle and add PerformVQDMULHCombine.
This removes the FlattenVectorShuffle that folds shuffles through certain
binops. This is now handled by generic DAG combines for all but ARMISD::VQDMULH
where a PerformVQDMULHCombine is added to compensate. It pushes identical
shuffles down through the operation, in a similar way to the other combines in
DAG.
2023-02-05 20:59:49 +00:00
Simon Tatham
60ea6f35a2 [ARM] Allow selecting hard-float ABI in integer-only MVE.
Armv8.1-M can be configured to support the integer subset of the MVE
vector instructions, and no floating point. In that situation, the FP
and vector registers still exist, and so do the load, store and move
instructions that transfer data in and out of them. So there's no
reason the hard floating point ABI can't be supported, and you might
reasonably want to use it, for the sake of intrinsics-based code
passing explicit MVE vector types between functions.

But the selection of the hard float ABI in the backend was gated on
Subtarget->hasVFP2Base(), which is false in the case of integer MVE
and no FP.

As a result, you'd silently get the soft float ABI even if you
deliberately tried to select it, e.g. with clang options such as
--target=arm-none-eabi -mfloat-abi=hard -march=armv8.1m.main+nofp+mve

The hard float ABI should have been gated on the weaker condition
Subtarget->hasFPRegs(), because the only requirement for being able to
pass arguments in the FP registers is that the registers themselves
should exist.

I haven't added a new test, because changing the existing
CodeGen/Thumb2/float-ops.ll test seemed sufficient. But I've added a
comment explaining why the results are expected to be what they are.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D142703
2023-02-01 09:05:12 +00:00
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC)
2023-01-28 09:23:07 -08:00
Guillaume Chatelet
355cc3fd8c [NFC] Deprecate SelectionDag functions taking Alignment as unsigned
2023-01-24 10:40:12 +00:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Kazu Hirata
188ec33726 [llvm] Use llvm::bit_width (NFC)
2023-01-21 14:48:32 -08:00
David Green
e49367e7f3 [ARM] Fix i1 shuffle lowering with multiple operands.
The existing lowering of i1 vector shuffle was only considering
single-source shuffles, always assuming the second was undef. This
extends that to properly handle both operands.
2023-01-17 11:29:51 +00:00
Fangrui Song
6052eac2a8 [ARM] Properly fix -Wsign-compare after D141791
2023-01-16 23:57:44 -08:00
Simon Pilgrim
cf47a8d383 Silence signed/unsigned comparison warnings. NFC.
2023-01-16 18:52:04 +00:00
Simon Pilgrim
f4f8f9f185 [Thumb2][MVE] Recognise shuffle truncation patterns suitable for ARMISD::MVETRUNC
I'm helping with the remaining regressions on D127115, and one of my candidate fixes caused some regressions with MVE interleaved shuffles due to poor handling of 'truncation' style shuffle masks (0,2,4,6,...).

This patch attempts to use the ARMISD::MVETRUNC node to handle these cases, based off existing code in LowerTruncate.

It handles both (0,2,4,6,...) and (1,3,5,7,....) 'top' style patterns (assuming no endian problems). I shift down the 'top' patterns - a basic search of ARM docs suggests MVE has some top/bottom truncation/narrowing instructions but I don't seem to be able to get them to be used.
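
A minimal Python sketch of recognising the two mask shapes, for illustration only: `match_trunc_mask` and the labels "bottom" and "top" are hypothetical names, and undef mask elements and endianness are not modelled.

```python
def match_trunc_mask(mask):
    """Return 'bottom' for (0,2,4,...), 'top' for (1,3,5,...), else None."""
    if not mask:
        return None
    for name, start in (("bottom", 0), ("top", 1)):
        # Every element must step by two from the starting lane.
        if list(mask) == [start + 2 * i for i in range(len(mask))]:
            return name
    return None
```

In this model a "top" match would then be shifted down so that it can be handled the same way as a "bottom" truncation.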

Differential Revision: https://reviews.llvm.org/D141791
2023-01-16 17:59:45 +00:00
Roman Lebedev
cc39c3b17f [Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack
https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.

Indeed, our legalization for them is not optimal.
We either split it into parts, or lower it into a libcall.
But if the shift amount is a multiple of CHAR_BIT,
we can also legalize it through the stack.

The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type
2. store the value we are shifting into one half of the slot
3. pad the other half of the slot: with zero for logical shifts, with the sign bit for arithmetic shifts
4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
5. load
6. split loaded integer

This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5

And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K

I think, if we are going to perform shift->shift-by-parts expansion more than once,
we should instead go through stack, which is what this patch does.
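
The idea can be simulated in Python as below. This is a sketch under stated assumptions, not the legalizer code: it models right shifts only, uses a little-endian byte layout, skips the final split into parts, and the function name `shift_right_through_stack` is hypothetical.

```python
def shift_right_through_stack(value, amount, width=64, arithmetic=False):
    """Right-shift a width-bit value by spilling it to a 2x-wide byte slot."""
    assert width % 8 == 0 and 0 <= amount < width and 0 <= value < 2**width
    nbytes = width // 8
    sign = (value >> (width - 1)) & 1
    pad = 0xFF if (arithmetic and sign) else 0x00
    # Steps 1-3: low half holds the value, high half holds the padding.
    slot = value.to_bytes(nbytes, "little") + bytes([pad]) * nbytes
    byte_off, bit_rem = divmod(amount, 8)
    # Steps 4-5: index into the slot by whole bytes and load width bits.
    loaded = int.from_bytes(slot[byte_off:byte_off + nbytes], "little")
    # Shift by the remaining sub-byte amount afterwards.
    loaded >>= bit_rem
    if arithmetic and sign:
        # Refill the vacated top bits with the sign bit.
        loaded |= ((1 << bit_rem) - 1) << (width - bit_rem)
    return loaded
```

For example, shifting the 32-bit value `0x12345678` right by 12 loads from byte offset 1 and then shifts by the remaining 4 bits.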

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140638
2023-01-14 19:12:18 +03:00
Craig Topper
79858d1908 [CodeGen][Target] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.
2023-01-13 23:12:48 -08:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::getFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
serge-sans-paille
38818b60c5 Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comments, some useless makeArrayRef calls have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Qiu Chaofan
a40ef656d8 [Intrinsic] Rename flt.rounds intrinsic to get.rounding
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D139507
2022-12-19 15:22:39 +08:00