4711 Commits

Author SHA1 Message Date
Craig Topper
df017ba9d3 [TargetLowering] Don't use ISD::SELECT_CC in expandFP_TO_INT_SAT.
This function gets called for vectors and ISD::SELECT_CC was never
intended to support vectors. Some updates were made to support
it when this function started getting used for vectors.

Overall, using separate ISD::SETCC and ISD::SELECT looks like an
improvement even for scalar.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149481
2023-04-29 10:23:08 -07:00
Julian Lettner
c3f0153ec2 [MachO] Disable atexit()-based lowering when LTO'ing kernel/kext code
The kernel and kext environments do not provide the `__cxa_atexit()`
function, so we can't use it for lowering global module destructors.

Unfortunately, just querying for "compiling for kernel/kext?" in the LTO
pipeline isn't possible (kernel/kext identifier isn't part of the triple
yet) so we need to pass down a CodeGen flag.

rdar://93536111

Differential Revision: https://reviews.llvm.org/D148967
2023-04-25 12:13:40 -07:00
David Green
15d2821263 [ARM] Fix qsat for armv5te/armv6 + thumb-mode
This is a Thumb1 target, so will not have qsat instructions available. There
was a mismatch between hasBaseDSP and the instruction patterns when +dsp was
present, which is set by clang (but maybe shouldn't be). The target being
thumb1-only should override that, implying that it does not have any qadds.

Fixes #62273
2023-04-23 17:20:28 +01:00
Archibald Elliott
9ee4fe63bc [ARM] Fix Crashes in fp16/bf16 Inline Asm
We were still seeing occasional crashes with inline assembly blocks
using fp16/bf16 after my previous patches:
- https://reviews.llvm.org/rGff4027d152d0
- https://reviews.llvm.org/rG7d15212b8c0c
- https://reviews.llvm.org/rG20b2d11896d9

It turns out:
- The original two commits were wrong, and we should have always been
  choosing the SPR register class, not the HPR register class, so that
  LLVM's SelectionDAGBuilder correctly did the right splits/joins.
- The `splitValueIntoRegisterParts`/`joinRegisterPartsIntoValue` changes
  from rG20b2d11896d9 are still correct, even though they sometimes
  result in inefficient codegen of casts between fp16/bf16 and i32/f32
  (which is visible in these tests).

This patch fixes crashes in `getCopyToParts` and when trying to select
`(bf16 (bitconvert (fp16 ...)))` dags when Neon is enabled.

This patch also adds support for passing fp16/bf16 values using the 'x'
constraint that is LLVM-specific. This should broadly match how we pass
with 't' and 'w', but with a different set of valid S registers.

Differential Revision: https://reviews.llvm.org/D147715
2023-04-13 15:34:04 +01:00
Archibald Elliott
eeb4fe093d [NFC][ARM] Fix Type in Test
I landed this test with a typo, the callsites all show `fp16_inner`
returning `half`, so the declaration should too.
2023-04-13 13:43:51 +01:00
David Green
9bd7b149c2 [ARM] Replace some uses of -mcpu=cortex-m33 with architectures features. NFC
This adjusts some of the tests to use the architecture features directly as
opposed to -mcpu=cortex-m33 names.
2023-04-13 11:57:32 +01:00
sgokhale
bb5befefc6 Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83.

An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833
2023-04-13 10:52:28 +05:30
Momchil Velikov
4ac6f99ae0 [LiveInterval] Fix live range overlap check
Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D145707
2023-04-11 11:11:30 +01:00
Nikita Popov
cd91992de8 [ARM] Convert test to opaque pointers
When converting this test to opaque pointers, we get a register
move between the call and the inline asm. However, the test
comment specifically says that there should be nothing between them.

As far as I can tell, this is fine, both in that the inline asm
doesn't use the relevant registers, but also more generally
because the inline asm doesn't declare any clobbers, so really
LLVM can do whatever, side effects or not. The test was added
by 618ce3e85ed1c68e89dc696b7c9ab94a6a910797 with only a reference
to Apple's internal issue tracker.

Differential Revision: https://reviews.llvm.org/D147512
2023-04-11 10:28:40 +02:00
sgokhale
5f0bccc3d1 [CodeGen][ShrinkWrap] Split restore point
This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs(Callee Saved Registers)/FI(Frame Index).

Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change
for others.

Co-authored-by: junbuml

Differential Revision: https://reviews.llvm.org/D42600
2023-04-11 11:58:50 +05:30
Nikita Popov
ebd579ccae [ARM] Regenerate test checks (NFC) 2023-04-04 11:25:13 +02:00
Nikita Popov
f45b22eeb5 [ARM] Convert some tests to opaque pointers (NFC) 2023-04-04 11:22:08 +02:00
Nikita Popov
24906aa83e [ARM] Regenerate test checks (NFC) 2023-04-04 11:22:08 +02:00
Nikita Popov
bc2de67f25 [ARM] Name instructions in test (NFC) 2023-04-04 11:22:08 +02:00
Zequan Wu
321d02cc6b [NFC] Update CodeGen/*/nomerge.ll tests with utils/update_llc_test_checks.py.
Precommit this patch for better diff view on D146749.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D147454
2023-04-03 19:52:39 -04:00
Martin Storsjö
c5383536cb [ARM] Handle generating SEH unwind info for t2STR_PRE/t2LDR_POST
This fixes compiling some uncommon cases.

Differential Revision: https://reviews.llvm.org/D147212
2023-03-31 10:22:28 +03:00
Sergei Barannikov
1f5e9a3502 [MCP] Do not try forward non-existent sub-register of a copy
In this example:
```
$d14 = COPY killed $d18
$s0 = MI $s28
```

$s28 is a sub-register of $d14. However, $d18 does not have
sub-registers and thus cannot be forwarded. Previously, this resulted
in $noreg being substituted in place of the use of $s28, which later
led to an assertion failure.

Fixes https://github.com/llvm/llvm-project/issues/60908, a regression
that was introduced in D141747.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D146930
2023-03-30 06:11:00 +03:00
David Green
3ac5a123d9 [ARM] Add Thumb Attributes for thumb thunks created in SLSHarding
Without this the function will be use an Arm subtarget, meaning the
instructions in it will be invalid for the current subtarget.

Differential Revision: https://reviews.llvm.org/D144733
2023-03-24 18:11:54 +00:00
Caleb Zulawski
71dc3de533 [ARM] Improve min/max vector reductions on Arm
This patch adds some more efficient lowering for vecreduce.min/max under NEON,
using sequences of pairwise vpmin/vpmax to reduce to a single value.

This nearly resolves issues such as #50466, #40981, #38190.

Differential Revision: https://reviews.llvm.org/D146404
2023-03-22 16:00:19 +00:00
Julian Lettner
e6a789ef9b Remove -lower-global-dtors-via-cxa-atexit flag
Remove the `-lower-global-dtors-via-cxa-atexit` escape hatch introduced
in D121736 [1], which switched the default lowering of global
destructors on MachO to use `__cxa_atexit()` to avoid emitting
deprecated `__mod_term_func` sections.

I added this flag as an escape hatch in case the switch causes any
problems.  We didn't discover any problems so now we can remove it.

[1] https://reviews.llvm.org/D121736

rdar://90277838

Differential Revision: https://reviews.llvm.org/D145715
2023-03-14 14:18:11 -07:00
Archibald Elliott
b189218d44 [ARM] Fix Chain/Glue Bug in PerformVMOVhrCombine
In this optimisation, the Chain and Glue from the original CopyFromReg
was being lost by this optimisation, which resulted in miscompiles.

This fix just ensures that the input chains are correctly updated, and
that any any users are also updated with the new chain from the new
CopyFromReg.

Fixes #60510.

Differential Revision: https://reviews.llvm.org/D143713
2023-03-06 11:55:54 +00:00
Archibald Elliott
c314667141 [ARM] Pre-Commit Tests for PR60510
Differential Revision: https://reviews.llvm.org/D143712
2023-03-06 11:55:53 +00:00
Archibald Elliott
20b2d11896 [ARM] Fix Crash in 't'/'w' handling without fp16/bf16
After https://reviews.llvm.org/rGff4027d152d0 and
https://reviews.llvm.org/rG7d15212b8c0c we saw crashes in SelectionDAG
when trying to use these constraints when you don't have the fp16 or
bf16 extensions.

However, it is still possible to move 16-bit floating point values into
the right place in S registers with a normal `vmov`, even if we don't
have fp16 instructions we can use within the inline assembly string.
This patch therefore fixes the crash.

I think the reason we weren't getting this crash before is because I
think the __fp16 and __bf16 types got an error diagnostic in the Clang
frontend when you didn't have the right architectural extensions to use
them. This restriction was recently relaxed.

The approach for bf16 needs a bit more explanation. Exactly how BF16 is
legalized was changed in rGb769eb02b526e3966847351e15d283514c2ec767 -
effectively, whether you have the right instructions to get a bf16 value
into/out of a S register with MoveTo/FromHPR depends on hasFullFP16, but
whether you use a HPR for a value of type MVT::bf16 depends on hasBF16.
This is why the tests are not changed by `+bf16` vs `-bf16`, but I've
left both sets of RUN lines in case this changes in the future.

Test Changes:
- Added more testing for testing inline asm (the core part)
- fp16-promote.ll and pr47454.ll show improvements where unnecessary
  fp16-fp32 up/down-casts are no longer emitted. This results in fewer
  libcalls where those casts would be done with a libcall.
- aes-erratum-fix.ll is fairly noisy, and I need to revisit this test so
  that the IR is more minimal than it is right now, because most of the
  changes in this commit do not relate to what AES is actually trying to
  verify.

Differential Revision: https://reviews.llvm.org/D143711
2023-03-06 11:55:08 +00:00
Arthur Eubanks
773d663e47 [IPO] Remove various legacy passes
These are part of the optimization pipeline, of which the legacy pass manager version is deprecated and being removed.
2023-02-27 19:06:08 -08:00
Nick Desaulniers
a3a84c9e25 [llvm] add CallBrPrepare pass to pipelines
Capstone of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Clang changes are still necessary to enable the use of outputs along
indirect edges of asm goto statements.

Link: https://github.com/llvm/llvm-project/issues/53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D140180
2023-02-16 17:58:34 -08:00
Samuel Parker
8f104a3f9a [ARM] O3-pipeline fix 2023-02-13 11:01:00 +00:00
Tim Northover
c4ce967e34 ARM: skip debug instructions when matching jump-table patterns.
When working out whether we can see a compressible jump-table pattern during
ConstantIslands, we were stopping when we saw a debug instruction. Instead it's
better to keep iterating backwards to the first real instruction.

https://reviews.llvm.org/D142019
2023-02-10 12:27:59 +00:00
Andrew Savonichev
c65b4d64d4 [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2023-02-09 18:45:20 +03:00
Anton Sidorenko
6820cb2dd5 [Test] Fix YAML mapping keys duplication. NFC.
YAML specification does not allow keys duplication an a mapping. However, YAML
parser in LLVM does not have any check on that and uses only the last key entry.
In this change duplicated keys are merged to satisfy the spec.

Differential Revision: https://reviews.llvm.org/D141848
2023-02-09 12:59:50 +03:00
Nick Desaulniers
07c7784d7b [llvm][IfConversion] update successor list when merging INLINEASM_BR
If this successor list is not correct, then branch-folding may
incorrectly think that the indirect target is dead and remove it. This
results in a dangling reference to the removed block as an operand to
the INLINEASM_BR, which later will get AsmPrinted into code that doesn't
assemble.

This was made more obvious by, but is not a regression of
https://reviews.llvm.org/D130316.

Fixes: https://github.com/llvm/llvm-project/issues/60346

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D142924
2023-02-07 10:28:11 -08:00
Nick Desaulniers
1cecfa407c precommit test for pr60346
Link: https://github.com/llvm/llvm-project/issues/60346

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D142923
2023-02-07 10:28:11 -08:00
Samuel Parker
7bff37783f [SDAG] Check fminnum/fmaxnum for non-zero operand.
Currently, in TargetLowering, if the target does not support fminnum, we lower
to fminimum if neither operand could be a NaN. But this isn't quite correct
because fminnum and fminimum treat +/-0 differently; so, we need to prove that
one of the operands isn't a zero, or we don't have signed zeros.

Differential Revision: https://reviews.llvm.org/D143256
2023-02-07 10:54:23 +00:00
Samuel Parker
a7de5c82bb [NFC] minnum/maxnum intrinsic tests
ARM and WebAssembly tests.
2023-02-07 10:47:40 +00:00
Samuel Parker
91f8289ff0 Revert "[DAGCombine] Fold redundant select"
This reverts commit bbdf24357932b064f2aa18ea1356b474e0220dde.
2023-02-07 10:37:20 +00:00
Sanjay Patel
fb3e3ef62e [SDAG] fix miscompiles caused by using ValueTracking matchSelectPattern to create FMINIMUM/FMAXIMUM
ValueTracking attempts to match compare+select patterns to FP min/max
operations, but it was created before the newer IEEE-754-2019
minimum/maximum ops were defined. Ie, matchSelectPattern() does not
account for the -0.0/+0.0 behavior that is specified in the newer
standard.

FMINIMUM/FMAXIMUM nodes were created to map to the newer standard:

/// FMINIMUM/FMAXIMUM - NaN-propagating minimum/maximum that also treat -0.0
/// as less than 0.0. While FMINNUM_IEEE/FMAXNUM_IEEE follow IEEE 754-2008
/// semantics, FMINIMUM/FMAXIMUM follow IEEE 754-2018 draft semantics.

We could adjust ValueTracking to deal with signed zero, but it seems like
a moot point given the divergent NaN behavior discussed in D143056, so just
delete this possibility to avoid bugs when converting IR to SDAG.

Differential Revision: https://reviews.llvm.org/D143106
2023-02-03 09:53:47 -05:00
Samuel Parker
bbdf243579 [DAGCombine] Fold redundant select
If a chain of two selects share a true/false value and are controlled
by two setcc nodes, that are never both true, we can fold away one of
the selects. So, the following:
(select (setcc X, const0, eq), Y,
  (select (setcc X, const1, eq), Z, Y))

Can be combined to:
  select (setcc X, const1, eq) Z, Y

Differential Revision: https://reviews.llvm.org/D142535
2023-02-02 09:43:21 +00:00
Tim Northover
6e520fcf45 Revert "ARM: skip debug instructions when matching jump-table patterns."
This reverts commit ce4fcea59e1d5829b4355b6401d7265be23f617a.

I committed it accidentally.
2023-01-26 13:26:10 +00:00
Tim Northover
ce4fcea59e ARM: skip debug instructions when matching jump-table patterns.
When working out whether we can see a compressible jump-table pattern during
ConstantIslands, we were stopping when we saw a debug instruction. Instead it's
better to keep iterating backwards to the first real instruction.
2023-01-26 13:00:36 +00:00
Matt Arsenault
778cf5431c IR: Add atomicrmw uinc_wrap and udec_wrap
These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
2023-01-24 17:55:11 -04:00
David Green
3770b4aa3c [ARM] Don't emit Arm speculation hardening thunks under Thumb and vice-versa
Given a patch like D129506, using instructions not valid for the current
target feature set becomes an error. This means that emitting Arm
instructions in a Thumb target (or vice versa) becomes an error. When
running in Thumb mode only thumb thunks will be needed, and in Arm mode
only arm thunks are needed. This patch limits the emitted thunks to just
the ones valid for the current architecture.

Differential Revision: https://reviews.llvm.org/D129693
2023-01-23 11:22:11 +00:00
Matt Arsenault
65420c8041 DAG: Use getNegatedExpression in combineMinNumMaxNum
Computing the negated RHS expression just to see if it compares equal
and throw it away feels dirty.
2023-01-23 06:07:23 -04:00
Matt Arsenault
3b80d02992 DAG: Look through fneg when trying to create unsafe minnum/maxnum
This makes most sense for isFNegFree targets, but shouldn't make
things worse without it. This avoids AMDGPU test regressions in a
future patch.

For some reason APFloat::compareAbsoluteValue is private, so compute
the neg of the constants.
2023-01-23 06:07:22 -04:00
Matt Arsenault
0ee04a1e3c ARM: Add baseline test for fneg + fcmp + select combine 2023-01-22 21:21:15 -04:00
Philip Reames
b3154d08e9 [ARM][AArch64] Switch to generic MEMBARRIER node
This change switches both targets from using target specific CompilerBarrier nodes to the recently introduced generic MEMBARRIER instruction.

A couple things to call out.

First, this changes the assembly comment printed. I'm not sure this matters, but if it does, we can simply drop this patch. This is a minor clean up at best.

Second, the ordering operand on the target instruction appears to be unused. We could easily add ordering to the generic instruction, but since we don't seem to have a motivating case in tree, I simply dropped the ordering when selecting to the generic instruction.

Differential Revision: https://reviews.llvm.org/D141513
2023-01-20 08:54:34 -08:00
OCHyams
99c12afeb4 [Assignment Tracking] Fix tests for buildbot failure (2)
Follow-up for 4ece50737d5385fb80cfa23f5297d1111f8eed39 (D142027).

Assignment Tracking Analysis now always runs and is skipped internally if
assignment tracking is disabled. Update these tests to expect to see the
pass run.

Buildbot failure: https://lab.llvm.org/buildbot/#/builders/57/builds/24094
2023-01-20 15:58:35 +00:00
Paul Kirth
557a5bc336 [codegen] Add StackFrameLayoutAnalysisPass
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.

Fixes: #58168

Reviewed By: nickdesaulniers, thegameg

Differential Revision: https://reviews.llvm.org/D135488
2023-01-19 01:51:14 +00:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Paul Kirth
fdc0bf6adc Revert "[codegen] Add StackFrameLayoutAnalysisPass"
This breaks on some AArch64 bots

This reverts commit 0a652c540556a118bbd9386ed3ab7fd9e60a9754.
2023-01-13 22:59:36 +00:00
Paul Kirth
0a652c5405 [codegen] Add StackFrameLayoutAnalysisPass
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.

Fixes: #58168

Reviewed By: nickdesaulniers, thegameg

Differential Revision: https://reviews.llvm.org/D135488
2023-01-13 20:52:48 +00:00
Nikita Popov
60442f0d44 [CodeGen] Convert some tests to opaque pointers (NFC)
These are mostly MIR tests, which I did not handle during previous
conversions.
2023-01-05 13:21:20 +01:00