scalar_to_vector is difficult to make appear and test,
but I found one case where this makes an observable difference.
It fires more often than this in the test suite, but most of them
have no net result in the final code. This helps reduce regressions
in a future commit.
The lowering code was using the wrong chain value, which meant that
the 'smstart' after the call from streaming agnostic-ZA functions ->
non-streaming private-ZA functions was incorrectly removed from the DAG.
For some ABIs `update_cc_test_checks.py` is unable to generate tests
because of the mismatch between the mangled function names reported by
clang's `-asd-dump` and the function names in LLVM IR.
This patch fixes it by striping the leading underscore from the mangled
name for global functions if the data layout string says they have one.
This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.
---------
Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
This extension adds eleven instructions to accelerate interrupt
servicing.
The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest
This patch adds assembler only support.
---------
Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>
When passing an instruction with a register mask, the machine copy
propagation pass was dropping the information about some copy
instructions which define a register which is preserved by the mask,
because that register overlaps a register which is partially clobbered
by it. This resulted in a miscompilation for AArch64, because this
caused a live copy to be considered dead.
The fix is to clobber register masks by finding the set of reg units
which is preserved by the mask, and clobbering all units not in that
set.
Makes Inline Spiller amenable to the new PM.
This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted
because of two unused private members reported on sanitizer bots.
This patch simplifies select-based integer min/max reductions by
utilizing `llvm::getMinMaxReductionPredicate`, and generates
intrinsic-based min/max reductions by utilizing
`llvm::getMinMaxReductionIntrinsicOp`.
This partially reverts commit 07ff786e39e2190449998d3af1000454dee501be.
The hunk being reverted in this patch seems to break:
tools/llvm-gsymutil/ARM_AArch64/macho-merged-funcs-dwarf.yaml
under LLVM_ENABLE_EXPENSIVE_CHECKS.
`PassRegistry.def` already has this entry, but the dummy definition was
being pulled instead.
I couldn't reproduce the build failures that FIXME referenced, maybe the
Dummy pass getting in the way was part of the cause.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform recpliate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
Note that PointerUnion::is have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
In this patch, I'm calling call().getBase() for an instance of
PointerUnion. call() alone would return an instance of IndexCall,
which wraps PointerUnion. Note that isa<> cannot directly accept an
instance of IndexCall, at least without defining CastInfo.
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to reduce the load sizes.
I've limited this to extension to i32/i64 types as we're mostly interested in shuffle mask loading here, but we could include i16 types as well just as easily.
Inspired by a regression on #122485
There is a narrow special-case in isImpliedCondICmps that can benefit
from being taught about samesign. Since it costs us nothing to implement
it, teach it about samesign, for completeness. This patch marks the
completion of the effort to teach ValueTracking about samesign.
This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).
Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).
The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
// Per section that contains calls to imported functions:
// uint32_t SectionSize: Size in bytes for information in this section.
// uint32_t Section Number
// Per call to imported function in section:
// uint32_t Kind: the kind of imported function.
// uint32_t BranchOffset: the offset of the branch instruction in its
// parent section.
// uint32_t TargetSymbolId: the symbol id of the called function.
```
NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.
The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.
The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).
I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
With range and undef metadata on a call we can have vector AssertZExt
generated on a target with no vector operations. The AssertZExt needs to
scalarize to a normal `AssertZext tin, ValueType`. I have added
AssertSext too, although I do not have a test case.
Fixes#110374
If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.
Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb
We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and a the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).
Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgRFixes#119606.