XRay instrumentation works for macOS running on Apple Silicon, but
codegen is untested there. I'm going to make changes affecting this
target, so get the XRay tests running on AArch64.
Data sections are going to become slightly different on x86_64 soon.
I do want the tests to be specific about symbol names, so instead of
having the tests check only the common stem, bifurcate them a bit and
check the full symbol names.
As for ARM, XRay is not really supported on iOS at the moment, and
32-bit ARM is not really used there on modern phones anyway. Nevertheless,
codegen tests exist and their output is going to change a little, so
make it easier to write the special case for iOS.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D145291
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115
The change implements the intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read the floating-point environment, set it, or reset it
to some default state. They do the same actions as the C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.
The new intrinsics represent the FP environment as a value of integer type,
which is convenient for most targets, where the FP state is the content of
some register. Some targets, however, use long representations. On X86 the
FP environment is 256 bits, and even half of this size is not a legal
integer type. To facilitate legalization in such cases, two sets of DAG
nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM treat the FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when the
target has a long representation for the floating-point state.
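As a rough illustration, the choice between the two node families can be modeled as a check on whether the environment size is a legal integer width. This is a hypothetical sketch, not SelectionDAG code: `select_fpenv_nodes` and `X86_64_LEGAL_INT_BITS` are made-up names, and the set of legal widths stands in for the target's real type-legality query. Only the 256-bit X86 size comes from the text above; the 32-bit case is an assumed register-sized environment.

```python
from __future__ import annotations


def select_fpenv_nodes(env_bits: int, legal_int_bits: frozenset[int]) -> tuple[str, str]:
    """Return the (get, set) DAG node names for an FP environment of env_bits.

    Hypothetical model: an environment that fits a legal integer type is moved
    as a value; anything larger is handled as a region in memory.
    """
    if env_bits in legal_int_bits:
        return ("GET_FPENV", "SET_FPENV")
    return ("GET_FPENV_MEM", "SET_FPENV_MEM")


# Assumed legal scalar integer widths for a 64-bit target (illustrative only).
X86_64_LEGAL_INT_BITS = frozenset({8, 16, 32, 64})

print(select_fpenv_nodes(256, X86_64_LEGAL_INT_BITS))  # ('GET_FPENV_MEM', 'SET_FPENV_MEM')
print(select_fpenv_nodes(32, X86_64_LEGAL_INT_BITS))   # ('GET_FPENV', 'SET_FPENV')
```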
Differential Revision: https://reviews.llvm.org/D71742
This is an attempt to reland D42600 and enable this optimisation by default.
This also resolves the issue pointed out in the context of the PGO build.
Differential Revision: https://reviews.llvm.org/D42600
There are two motivations.
First, the `__stack_chk_guard` created by
`-fno-pic -fstack-protector -mstack-protector-guard=global` is referenced
directly on all ELF OSes except FreeBSD. This patch allows referencing the
symbol indirectly with `-fno-direct-access-external-data`.
Second, some Linux kernel folks want the `__stack_chk_guard` created by
`-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard`
to be referenced directly, avoiding R_X86_64_REX_GOTPCRELX (even though the
linker may optimize the relocation out).
https://github.com/llvm/llvm-project/issues/60116
Why they need this isn't so clear to me.
---
Add the module flag "direct-access-external-data" and set the dso_local
property of the stack protector symbol. The module flag can also benefit
other symbols synthesized by LLVMCodeGen that are not represented in LLVM IR.
Nowadays, with `-fno-pic` being uncommon, ideally we should set
"direct-access-external-data" whenever it is true. However, doing so would
require updating ~90 clang/test tests, which is too many.
As a compromise, we set "direct-access-external-data" only when it differs
from the implied default value.
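The "set only when different from the default" rule can be sketched as a pair of functions: one decides whether to emit the module flag, the other recovers the effective value when reading it back. This assumes, as a simplification not stated above, that the implied default is "direct access when not compiling PIC"; `implied_default`, `flag_to_emit` and `effective_value` are hypothetical names.

```python
from typing import Optional


def implied_default(is_pic: bool) -> bool:
    # Assumption: without the module flag, direct access is implied only for non-PIC code.
    return not is_pic


def flag_to_emit(requested: bool, is_pic: bool) -> Optional[bool]:
    """Value to store in "direct-access-external-data", or None to omit the flag."""
    return None if requested == implied_default(is_pic) else requested


def effective_value(module_flag: Optional[bool], is_pic: bool) -> bool:
    """What a consumer sees: the explicit flag if present, else the implied default."""
    return implied_default(is_pic) if module_flag is None else module_flag


# -fno-pic -fno-direct-access-external-data differs from the default, so it is recorded.
print(flag_to_emit(False, is_pic=False))  # False
# -fno-pic with default behaviour matches the implied value, so no flag is emitted.
print(flag_to_emit(True, is_pic=False))   # None
```

Omitting the flag in the common case is what keeps the existing clang tests unchanged, while `effective_value(flag_to_emit(r, p), p) == r` still holds for every combination of `r` and `p`.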
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D150841
We can compute a simpler expression for Lo for these cases. This
is an alternative for the test cases in D151180 that works for
more targets.
This is similar to some of the special cases we have for expanding
setcc operands.
Differential Revision: https://reviews.llvm.org/D151182
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a Python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with `black`.
If you run into any problems, post to Discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This is the implementation of D149782.
The patch implements a helper function that matches and folds the following cases in the DAGCombiner:
1. `bswap(logic_op(x, bswap(y))) -> logic_op(bswap(x), y)`
2. `bswap(logic_op(bswap(x), y)) -> logic_op(x, bswap(y))`
3. `bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y)` in the multi-use case, which still reduces the number of instructions.
The helper function accepts an SDValue with either the BSWAP or the BITREVERSE opcode. This patch folds only the BSWAP cases and leaves the BITREVERSE optimization for the future.
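The three folds are sound because a byte swap only permutes bytes, bitwise logic operates bit by bit, and bswap is its own inverse. The standalone sketch below checks the identities on random 32-bit values; `bswap32` and `check_folds` are illustrative helpers, not LLVM code.

```python
import operator
import random


def bswap32(v: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((v & 0xFFFFFFFF).to_bytes(4, "little"), "big")


LOGIC_OPS = (operator.and_, operator.or_, operator.xor)


def check_folds(x: int, y: int) -> None:
    for op in LOGIC_OPS:
        # 1. bswap(logic_op(x, bswap(y))) -> logic_op(bswap(x), y)
        assert bswap32(op(x, bswap32(y))) == op(bswap32(x), y)
        # 2. bswap(logic_op(bswap(x), y)) -> logic_op(x, bswap(y))
        assert bswap32(op(bswap32(x), y)) == op(x, bswap32(y))
        # 3. bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y)
        assert bswap32(op(bswap32(x), bswap32(y))) == op(x, y)


rng = random.Random(0)
for _ in range(1000):
    check_folds(rng.getrandbits(32), rng.getrandbits(32))
print("all three folds hold for and/or/xor")
```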
Reviewed By: RKSimon, goldstein.w.n
Differential Revision: https://reviews.llvm.org/D149783
Fold the following cases in the SelectionDAG combiner.
This patch includes the regression test cases:
```
bswap(logic_op(x, bswap(y))) -> logic_op(bswap(x), y)
bswap(logic_op(bswap(x), y)) -> logic_op(x, bswap(y))
bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y) (with multiuse)
```
Reviewed By: goldstein.w.n
Differential Revision: https://reviews.llvm.org/D149782
This patch fixes a potential crash due to RegAllocFast not rewriting virtual
registers. This essentially happens because of a call to
MachineInstr::addRegisterKilled() in the process of allocating a "killed" vreg.
That function can eventually delete implicit operands without RegAllocFast
noticing, leading to some operands being "skipped" and not rewritten to use
physical registers.
Note that I noticed this crash when working on a solution for tying a register
with one or more of its sub-registers within an instruction. (See the problem
description here:
https://discourse.llvm.org/t/pass-to-tie-an-output-operand-to-a-subregister-of-an-input-operand/67184).
Aside from this fix, I believe RegAllocFast could be further improved for
instructions with multiple uses of the same virtual register. You can see it in
the added test, where the implicit uses have been rewritten in a somewhat
surprising way because of phase ordering. Ultimately, when allocating vregs for
an instruction, I believe we should iterate on the vregs it uses (and then
process all the operands that use each vreg), instead of directly iterating on
operands and more or less assuming that each operand uses a different vreg.
This would in the end be quite close to what greedy+virtregrewriter does. If
that makes sense, I would probably spin off another patch (after I get more
familiar with RegAllocFast).
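The skipped-operand failure is a generic hazard of walking a list by index while a callee deletes entries from it. The sketch below models it with plain Python lists; the operand strings and `drop_earlier_implicit` are hypothetical stand-ins for MachineInstr operands and MachineInstr::addRegisterKilled(), and `group_by_vreg` only illustrates the per-vreg iteration order suggested above, not an implementation of it.

```python
from typing import Callable, Dict, List


def walk_by_index(operands: List[str], on_visit: Callable[[List[str], int], None]) -> List[str]:
    """Visit operands by position; on_visit may delete entries (the hazard)."""
    visited = []
    i = 0
    while i < len(operands):
        visited.append(operands[i])
        on_visit(operands, i)
        i += 1
    return visited


def drop_earlier_implicit(operands: List[str], i: int) -> None:
    """Hypothetical helper: handling "killed %2" removes an earlier implicit operand."""
    if operands[i] == "killed %2" and "implicit %1" in operands:
        operands.remove("implicit %1")


def group_by_vreg(operands: List[str]) -> Dict[str, List[str]]:
    """Per-vreg order: one entry per vreg, listing every operand that uses it."""
    groups: Dict[str, List[str]] = {}
    for op in operands:
        groups.setdefault(op.split()[-1], []).append(op)
    return groups


ops = ["%0", "implicit %1", "killed %2", "implicit %3"]
print(walk_by_index(list(ops), drop_earlier_implicit))
# ['%0', 'implicit %1', 'killed %2']: "implicit %3" shifted left and was never visited
print(group_by_vreg(["%5", "implicit %5", "%6"]))
# {'%5': ['%5', 'implicit %5'], '%6': ['%6']}
```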
Differential Revision: https://reviews.llvm.org/D145169
The current logic is pretty limited unless `Op` is a
constant. This at least covers the more obvious cases.
Reviewed By: craig.topper, foad
Differential Revision: https://reviews.llvm.org/D149196
In D79537, `nomerge` was made to apply only to non-tail calls. This fixes it by also applying it to tail calls.
For ARM, I only made the new MI inherit the flag for `TCRETURNdi` and `TCRETURNri`, because that's where tail calls get replaced. I'm not sure whether any other place needs it.
Fixes #61545.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D146749
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.
The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635
Assuming that the stack grows downwards, it is fine if the stack
pointer is exactly at the stacklet boundary. We should use a
less-or-equal condition when deciding whether to skip new memory
allocation.
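Under the assumption that the check compares the post-allocation stack pointer against the stacklet limit, the fix amounts to the comparison below. This is a minimal sketch with hypothetical names and plain integers for addresses; the strict variant is included only to show the off-by-one that the less-or-equal condition removes.

```python
def can_skip_allocation(sp: int, frame_size: int, stacklet_limit: int) -> bool:
    """True if the new frame fits in the current stacklet (stack grows downwards)."""
    new_sp = sp - frame_size
    # Less-or-equal: landing exactly on the boundary is still inside the stacklet.
    return stacklet_limit <= new_sp


def can_skip_allocation_strict(sp: int, frame_size: int, stacklet_limit: int) -> bool:
    """Strict comparison for contrast: needlessly allocates when new_sp == limit."""
    return stacklet_limit < sp - frame_size


print(can_skip_allocation(0x1100, 0x100, 0x1000))         # True
print(can_skip_allocation_strict(0x1100, 0x100, 0x1000))  # False
```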
Differential Revision: https://reviews.llvm.org/D149315
This function gets called for vectors and ISD::SELECT_CC was never
intended to support vectors. Some updates were made to support
it when this function started getting used for vectors.
Overall, using separate ISD::SETCC and ISD::SELECT looks like an
improvement even for scalars.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149481
The kernel and kext environments do not provide the `__cxa_atexit()`
function, so we can't use it for lowering global module destructors.
Unfortunately, just querying for "compiling for kernel/kext?" in the LTO
pipeline isn't possible (the kernel/kext identifier isn't part of the triple
yet), so we need to pass down a CodeGen flag.
rdar://93536111
Differential Revision: https://reviews.llvm.org/D148967
This is a Thumb1 target, so it will not have the qsat instructions available. There
was a mismatch between hasBaseDSP and the instruction patterns when +dsp was
present, which is set by clang (but maybe shouldn't be). The target being
Thumb1-only should override that, implying that it does not have any qadd instructions.
Fixes #62273
We were still seeing occasional crashes with inline assembly blocks
using fp16/bf16 after my previous patches:
- https://reviews.llvm.org/rGff4027d152d0
- https://reviews.llvm.org/rG7d15212b8c0c
- https://reviews.llvm.org/rG20b2d11896d9
It turns out:
- The original two commits were wrong, and we should have always been
  choosing the SPR register class, not the HPR register class, so that
  LLVM's SelectionDAGBuilder did the right splits/joins.
- The `splitValueIntoRegisterParts`/`joinRegisterPartsIntoValue` changes
from rG20b2d11896d9 are still correct, even though they sometimes
result in inefficient codegen of casts between fp16/bf16 and i32/f32
(which is visible in these tests).
This patch fixes crashes in `getCopyToParts` and when trying to select
`(bf16 (bitconvert (fp16 ...)))` dags when Neon is enabled.
This patch also adds support for passing fp16/bf16 values using the
LLVM-specific 'x' constraint. This should broadly match how we pass them
with 't' and 'w', but with a different set of valid S registers.
Differential Revision: https://reviews.llvm.org/D147715
When converting this test to opaque pointers, we get a register
move between the call and the inline asm. However, the test
comment specifically says that there should be nothing between them.
As far as I can tell, this is fine, both because the inline asm
doesn't use the relevant registers and, more generally, because the
inline asm doesn't declare any clobbers, so LLVM can really do
whatever it wants, side effects or not. The test was added
by 618ce3e85ed1c68e89dc696b7c9ab94a6a910797 with only a reference
to Apple's internal issue tracker.
Differential Revision: https://reviews.llvm.org/D147512
This patch splits a restore point to allow it to only post-dominate blocks reachable by a use
or def of CSRs (callee-saved registers) or FIs (frame indices).
Benchmarked on SPEC2017, this gives around a 4% improvement on povray and no significant change
for the other benchmarks.
Co-authored-by: junbuml
Differential Revision: https://reviews.llvm.org/D42600
In this example:
```
$d14 = COPY killed $d18
$s0 = MI $s28
```
$s28 is a sub-register of $d14. However, $d18 does not have
sub-registers and thus cannot be forwarded. Previously, this resulted
in $noreg being substituted in place of the use of $s28, which later
led to an assertion failure.
Fixes https://github.com/llvm/llvm-project/issues/60908, a regression
that was introduced in D141747.
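The forwarding check described above can be sketched with a hand-written sub-register table standing in for the target's register info. `SUBREGS` and `forward_use` are hypothetical names; the register overlaps match 32-bit Arm (D13 is S26/S27, D14 is S28/S29, D18 has no S sub-registers), and returning `None` models "leave the operand alone" instead of substituting $noreg.

```python
from typing import Dict, Optional

# Illustrative sub-register table: D14 overlaps S28/S29, D13 overlaps S26/S27,
# and D18 has no S sub-registers at all.
SUBREGS: Dict[str, Dict[str, str]] = {
    "$d13": {"ssub_0": "$s26", "ssub_1": "$s27"},
    "$d14": {"ssub_0": "$s28", "ssub_1": "$s29"},
    "$d18": {},
}


def forward_use(use_reg: str, copy_dst: str, copy_src: str) -> Optional[str]:
    """Register to use instead of use_reg after `copy_dst = COPY copy_src`.

    Returns None when no replacement exists, so the caller must keep the
    original operand rather than substituting $noreg.
    """
    if use_reg == copy_dst:
        return copy_src
    for index, sub in SUBREGS.get(copy_dst, {}).items():
        if sub == use_reg:
            # Same sub-register index in the source; it may not exist (e.g. $d18).
            return SUBREGS.get(copy_src, {}).get(index)
    return None


print(forward_use("$s28", "$d14", "$d18"))  # None: $d18 has no ssub_0
print(forward_use("$s29", "$d14", "$d13"))  # $s27
```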
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D146930
Without this, the function will use an Arm subtarget, meaning the
instructions in it will be invalid for the current subtarget.
Differential Revision: https://reviews.llvm.org/D144733
This patch adds some more efficient lowering for vecreduce.min/max under NEON,
using sequences of pairwise vpmin/vpmax to reduce to a single value.
This nearly resolves issues such as #50466, #40981, #38190.
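The pairwise idea can be modeled on plain lists: each VPMIN/VPMAX combines adjacent lanes, halving the number of lanes that still matter, so an n-lane reduction takes log2(n) steps. This sketch models only the lane arithmetic, assuming a power-of-two lane count; it is not the actual lowering, and `pairwise` and `reduce_vector` are illustrative names.

```python
from typing import Callable, List, Tuple


def pairwise(op: Callable[[int, int], int], a: List[int], b: List[int]) -> List[int]:
    """Model of VPMIN/VPMAX: combine adjacent lanes of a, then adjacent lanes of b."""
    lanes = a + b
    return [op(lanes[i], lanes[i + 1]) for i in range(0, len(lanes), 2)]


def reduce_vector(op: Callable[[int, int], int], vec: List[int]) -> Tuple[int, int]:
    """Reduce vec to one value with pairwise steps; returns (result, step count)."""
    n = len(vec)
    if n < 2 or n & (n - 1):
        raise ValueError("lane count must be a power of two >= 2")
    half = n // 2
    v = pairwise(op, vec[:half], vec[half:])  # first step consumes both halves
    useful, steps = half, 1
    while useful > 1:
        v = pairwise(op, v, v)  # feed the same register to both operands
        useful //= 2
        steps += 1
    return v[0], steps


print(reduce_vector(min, [5, 3, 9, 1, 7, 2, 8, 6]))  # (1, 3)
print(reduce_vector(max, [5, 3, 9, 1, 7, 2, 8, 6]))  # (9, 3)
```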
Differential Revision: https://reviews.llvm.org/D146404
Remove the `-lower-global-dtors-via-cxa-atexit` escape hatch introduced
in D121736 [1], which switched the default lowering of global
destructors on MachO to use `__cxa_atexit()` to avoid emitting
deprecated `__mod_term_func` sections.
I added this flag as an escape hatch in case the switch caused any
problems. We didn't discover any problems, so now we can remove it.
[1] https://reviews.llvm.org/D121736
rdar://90277838
Differential Revision: https://reviews.llvm.org/D145715
In this optimisation, the Chain and Glue from the original CopyFromReg
were being lost, which resulted in miscompiles.
This fix just ensures that the input chains are correctly updated, and
that any users are also updated with the new chain from the new
CopyFromReg.
Fixes #60510.
Differential Revision: https://reviews.llvm.org/D143713
After https://reviews.llvm.org/rGff4027d152d0 and
https://reviews.llvm.org/rG7d15212b8c0c we saw crashes in SelectionDAG
when trying to use these constraints when you don't have the fp16 or
bf16 extensions.
However, it is still possible to move 16-bit floating point values into
the right place in S registers with a normal `vmov`, even if we don't
have fp16 instructions we can use within the inline assembly string.
This patch therefore fixes the crash.
I think we weren't getting this crash before because the __fp16 and
__bf16 types got an error diagnostic in the Clang frontend when you
didn't have the right architectural extensions to use them. This
restriction was recently relaxed.
The approach for bf16 needs a bit more explanation. Exactly how BF16 is
legalized was changed in rGb769eb02b526e3966847351e15d283514c2ec767 -
effectively, whether you have the right instructions to get a bf16 value
into/out of an S register with MoveTo/FromHPR depends on hasFullFP16, but
whether you use a HPR for a value of type MVT::bf16 depends on hasBF16.
This is why the tests are not changed by `+bf16` vs `-bf16`, but I've
left both sets of RUN lines in case this changes in the future.
Test Changes:
- Added more tests for inline asm (the core part)
- fp16-promote.ll and pr47454.ll show improvements where unnecessary
fp16-fp32 up/down-casts are no longer emitted. This results in fewer
libcalls where those casts would be done with a libcall.
- aes-erratum-fix.ll is fairly noisy, and I need to revisit this test so
  that the IR is more minimal than it is right now, because most of the
  changes in this commit do not relate to what the AES test is actually
  trying to verify.
Differential Revision: https://reviews.llvm.org/D143711
When working out whether we can see a compressible jump-table pattern during
ConstantIslands, we were stopping when we saw a debug instruction. Instead it's
better to keep iterating backwards to the first real instruction.
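A simplified model of the change: instead of giving up at the first debug instruction, the backwards walk skips over debug instructions until it reaches a real one. The instruction names `LOAD_TABLE_ENTRY` and `JUMP_TABLE_BRANCH` are made up, and `is_debug` is a hypothetical stand-in for a real debug-instruction check.

```python
from typing import List, Optional


def is_debug(instr: str) -> bool:
    # Stand-in for a real debug-instruction check (DBG_VALUE, DBG_LABEL, ...).
    return instr.startswith("DBG_")


def prev_instruction_old(block: List[str], start: int) -> Optional[int]:
    """Old behaviour (simplified): give up as soon as a debug instruction is seen."""
    i = start - 1
    if i < 0 or is_debug(block[i]):
        return None
    return i


def prev_real_instruction(block: List[str], start: int) -> Optional[int]:
    """New behaviour: keep walking backwards past debug instructions."""
    i = start - 1
    while i >= 0 and is_debug(block[i]):
        i -= 1
    return i if i >= 0 else None


block = ["LOAD_TABLE_ENTRY", "DBG_VALUE", "DBG_VALUE", "JUMP_TABLE_BRANCH"]
print(prev_instruction_old(block, 3))   # None: pattern missed
print(prev_real_instruction(block, 3))  # 0
```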
https://reviews.llvm.org/D142019
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.
The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.
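The behaviour change fits in two one-line rules. In this hypothetical sketch, `None` stands for "no alignment written in the IR" (which, as noted above, computeAllocaDefaultAlign fills in with the preferred alignment), and the 4/16-byte values are illustrative only.

```python
from typing import Optional


def alloca_align_old(explicit: Optional[int], pref: int) -> int:
    """Previous override: the preferred alignment acted as a minimum."""
    return pref if explicit is None else max(explicit, pref)


def alloca_align_new(explicit: Optional[int], pref: int) -> int:
    """New rule: honour the IR alignment; fall back to the preferred one if unset."""
    return pref if explicit is None else explicit


print(alloca_align_old(4, 16), alloca_align_new(4, 16))  # 16 4
```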
Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).
Differential Revision: https://reviews.llvm.org/D135462
The YAML specification does not allow duplicate keys in a mapping. However,
the YAML parser in LLVM does not check for that and uses only the last entry
for a key. In this change, duplicated keys are merged to satisfy the spec.
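To see why duplicates are risky with a last-entry-wins parser, model a mapping as an ordered list of key/value pairs (no YAML library involved; the keys, values and function names are made up). The first `name` value is silently dropped, which is what merging the duplicated keys avoids.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]


def last_wins(pairs: Pairs) -> Dict[str, str]:
    """Model of a parser that silently keeps only the last value for each key."""
    result: Dict[str, str] = {}
    for key, value in pairs:
        result[key] = value
    return result


def duplicate_keys(pairs: Pairs) -> List[str]:
    """Keys that occur more than once in one mapping, in first-repeat order."""
    seen, dups = set(), []
    for key, _ in pairs:
        if key in seen and key not in dups:
            dups.append(key)
        seen.add(key)
    return dups


mapping = [("name", "foo"), ("alignment", "4"), ("name", "bar")]
print(duplicate_keys(mapping))  # ['name']
print(last_wins(mapping))       # {'name': 'bar', 'alignment': '4'}
```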
Differential Revision: https://reviews.llvm.org/D141848
If this successor list is not correct, then branch-folding may
incorrectly think that the indirect target is dead and remove it. This
results in a dangling reference to the removed block as an operand to
the INLINEASM_BR, which later will get AsmPrinted into code that doesn't
assemble.
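The failure mode is ordinary reachability: any block not reachable through successor lists looks dead. A minimal sketch with made-up block names, where `entry` ends in an INLINEASM_BR that may jump to either `fallthrough` or `indirect`, and `reachable_blocks` is an illustrative helper rather than LLVM's branch-folding code:

```python
from typing import Dict, List, Set


def reachable_blocks(entry: str, successors: Dict[str, List[str]]) -> Set[str]:
    """Blocks reachable from entry by following successor lists (iterative DFS)."""
    seen, stack = {entry}, [entry]
    while stack:
        block = stack.pop()
        for succ in successors.get(block, []):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


ALL_BLOCKS = {"entry", "fallthrough", "indirect", "exit"}
correct = {"entry": ["fallthrough", "indirect"], "fallthrough": ["exit"], "indirect": ["exit"]}
missing = {"entry": ["fallthrough"], "fallthrough": ["exit"], "indirect": ["exit"]}

print(sorted(ALL_BLOCKS - reachable_blocks("entry", correct)))  # []
print(sorted(ALL_BLOCKS - reachable_blocks("entry", missing)))  # ['indirect']
```

With the incomplete successor list, `indirect` looks unreachable and could be removed, even though the INLINEASM_BR operand still names it.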
This was made more obvious by, but is not a regression of,
https://reviews.llvm.org/D130316.
Fixes: https://github.com/llvm/llvm-project/issues/60346
Reviewed By: efriedma, void
Differential Revision: https://reviews.llvm.org/D142924