If parts of a physical register for a given liverange, as assigned by
the register allocator, can be used to store other values not
represented by this liverange, then `LiveIntervals::addKillFlags`
normally avoids adding a kill flag on the use of this register
when the value's liverange ends.
However, if all the other regunits are artificial, then we can
still safely add the kill flag, since those parts of the register
can never be accessed independently.
When we have legal instructions we want to promote to sXLen and let isel
pattern matching removing the and/sext_inreg.
When using a libcall we want to use a 'si' libcall for small types
instead of 'di'. To match the RV64 ABI, we need to sign extend `unsigned
int` arguments. We reuse the shouldSignExtendTypeInLibCall hook from
SelectionDAG.
This DAG combine was incorrect for big-endian targets, because it
assumes that when a bitcast changes the lane width, the
least-significant bits of the wider lanes are in the lower-numbered
lanes of the smaller type, which is only true for little-endian.
The DAG has the same instructions: the signed and unsigned absolute
difference of it's input. For AArch64, they map to uabd and sabd for
Neon and SVE. The Neon and SVE instructions will require custom
patterns.
They are pseudo opcodes and are not imported by the IRTranslator. We
need combines to create them.
PowerPC, ARM, and AArch64 have native instructions.
/// i.e trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1)
/// or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1)
For GlobalISel, we are going to write the combines in MIR patterns.
see:
llvm/test/CodeGen/AArch64/abd-combine.ll
- [ ] combine into abd
- [ ] legalize and add td patterns
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
The existing analysis was already a pimpl wrapper.
I have extracted legacy pass logic to a LDVImpl wrapper named
`LiveDebugVariables` which is the analysis::Result now. This controls
whether to activate the LDV (depending on `-live-debug-variables` and
DIsubprogram) itself.
The legacy and new analysis only construct the LiveDebugVariables.
VirtRegRewriter will test this.
I want to use this function for GISel too so Type * is a better common
interface. All of the callers already convert EVT to Type * as needed
by calling lowering anyway.
It doesn't make sense to add a new generic ISD to handle riscv tuple
type. Instead we use `SPLAT_VECTOR` for ISD and further lower to
`VMV_V_X`.
Note: If there's `visitSPLAT_VECTOR` in generic DAG combiner, it needs
to skip riscv vector tuple type.
Stack on https://github.com/llvm/llvm-project/pull/114329
This update enhances the implementation of structural hashing for global
variables, using their initial contents. Private global variables or
constants are often used for metadata, where their names are not unique.
This can lead to the creation of different hash results although they
could be merged by the linker as they are effectively identical.
- Refine the hashing of GlobalVariables for strings or certain
Objective-C metadata cases that have section names. This can be further
extended to other scenarios.
- Expose StructuralHash for GlobalVariable so that this API can be
utilized by MachineStableHashing, which is also employed in the global
function outliner.
This change significantly improves size reduction by an additional 1% on
the LLD binary when the global function outliner and merger are enabled
together. As discussed in the RFC
https://discourse.llvm.org/t/loh-conflicting-with-machineoutliner/83279/8?u=kyulee-com,
if we disable or relocate the LOH pass, the size impact could increase
to 4%.
This reverts commit acf3b1aa932b2237c181686e52bc61584a80a3ff.
Broke https://lab.llvm.org/buildbot/#/builders/76/builds/5002
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o:(.toc+0x258): undefined reference to `vtable for llvm::DroppedVariableStatsIR'
This reverts commit 249755cedb17ffa707253edcef1a388f807caa35.
Broke https://lab.llvm.org/buildbot/#/builders/160/builds/9420
Note: This is test shard 99 of 154.
[==========] Running 2 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from DroppedVariableStatsMIR
[ RUN ] DroppedVariableStatsMIR.InlinedAt
--
exit: -11
Moved the IR unit test to the CodeGen folder to resolve linker errors:
`error: undefined reference to 'vtable for
llvm::DroppedVariableStatsIR'`
This patch is trying to reland
https://github.com/llvm/llvm-project/pull/115563
Currently FastISel triggers a fallback if there is an unreachable
terminator and the TrapUnreachable option is enabled (the ISD::TRAP
selection does not actually work).
Add handling for NoTrapAfterNoReturn, in which case we don't actually
need to emit a trap. The test is just there to make sure there is no
FastISel fallback (which is why I'm not testing the case without
noreturn). We have other tests that check the actual unreachable codegen
variations.
Need to force libcall legalization to treat the integer argument as
signed so that it can be promoted to XLen in call lowering for RV64.
Alternatively we could promote the operand before converting to libcall,
but going through call lowering is closer to what SelectionDAG does.
Extend the support for implicit selects in the form of OR with a ZExt
operand to support ADD and SUB binops as well. They similarly can form
implicit selects which can be profitable to convert back the branches.
PR: https://github.com/llvm/llvm-project/pull/115489
Handle @llvm.expect.with.probability in GlobalISel in the same way
@llvm.expect is handled, passing the value through as-is. This can be
encountered if the intrinsic is used without optimizations, which would
otherwise transform it out.
Fixes#115411 for GlobalISel
When extracting a smaller integer from a scalar_to_vector source, we were limited to only folding/truncating the lowest bits of the scalar source.
This patch extends the fold to handle extraction of any other element, by right shifting the source before truncation.
Fixes a regression from #117884
This reverts commit e9c49901a43f5b16c3df416460b7e4dbdd24ce03.
Current AMDGPURegBankSelect does nothing different then RegBankSelect.
Revert to using generic RegBankSelect in preparation for adding new
regbankselect passes. New AMDGPURegBankSelect, that will use uniformity
analysis for regbank select decisions, will not subclass RegBankSelect.
Revert regression tests to use regbankselect since amdgpu-regbankselect
will be used by new pass and behavior will be different.
* Enables conversion of several select-like instructions within one
group
* Any number of auxiliary instructions depending on the same condition
can be in between select-like instructions
* After splitting the basic block, move select-like instructions into
the relevant basic blocks and optimise them
* Make it easier to add support shift-base select-like instructions and
also any mixture of zext/sext/not instructions
After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site`
before all invoke instruction except the one in the entry block, which
has the effect of bypassing landing pads on exceptions.
When configuring the call site for a potentially throwing instruction
check that it is not `InvokeInst` -- they are handled by earlier code.
Handle \@llvm.expect.with.probability in SelectionDAGBuilder, FastISel,
and IntrinsicLowering in the same way \@llvm.expect is handled, where
the value is passed through as-is. This can be reached if the intrinsic
is used without optimizations, where it would otherwise be properly
transformed out.
Fixes#115411 for SelectionDAG. A similar patch is likely needed for
GlobalISel.
With cb57b7a7, MachineLateInstrsCleanup switched to using a map to keep
track of kill flags to remedy compile time regressions seen with huge
functions. It seems that the comment above clearKillsForDef() became stale with
that commit, and also that one of the arguments to it became unused,
both of which this patch fixes.
The optimiser will produce empty blocks that are unconditionally
executed according to the CFG -- while it may not be meaningful code,
and won't get a prologue_end position, we need to not crash on this
input.
The fault comes from assuming that there's always a next block with some
instructions in it, that will eventually produce some meaningful control
flow to stop at -- in the given reproducer in issue #117206 this isn't
true, because the function terminates with `unreachable`. Thus, I've
refactored the "get next instruction logic" into a helper that'll step
through all blocks and terminate if there aren't any more.
Reproducer from aeubanks
Assert that the passed value is a valid unsigned integer value for the
specified type.
For signed values getSignedConstant() / getSignedTargetConstant() should
be used instead.
Fix all the places I could find that did't do this. We were already
mostly correct for FP_ROUND after
9a976f36615dbe15e76c12b22f711b2e597a8e51, but not STRICT_FP_ROUND.
This looks like a rather weird change, so let me explain why this isn't
as unreasonable as it looks. Let's start with the problem it's solving.
```
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) { bb:
%i = icmp eq i32 %arg1, 1
br i1 %i, label %bb2, label %bb5
bb2: ; preds = %bb
%i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
%i4 = load i32, ptr %i3, align 4
br label %bb5
bb5: ; preds = %bb2, %bb
%i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
ret i32 %i6
}
```
Right now, we codegen this as:
```
li a3, 1
li a2, 13
bne a1, a3, .LBB0_2
lw a2, 4(a0)
.LBB0_2:
mv a0, a2
ret
```
In this example, we have two values which must be assigned to a0 per the
ABI (%arg, and the return value). SelectionDAG ensures that all values
used in a successor phi are defined before exit the predecessor block.
This creates an ADDI to materialize the immediate in the entry block.
Currently, this ADDI is not sunk into the tail block because we'd have
to split a critical edges to do so. Note that if our immediate was
anything large enough to require two instructions we *would* split this
critical edge.
Looking at other targets, we notice that they don't seem to have this
problem. They perform the sinking, and tail duplication that we don't.
Why? Well, it turns out for AArch64 that this is entirely an accident of
the existance of the gpr32all register class. The immediate is
materialized into the gpr32 class, and then copied into the gpr32all
register class. The existance of that copy puts us right back into the
two instruction case noted above.
This change essentially just bypasses this emergent behavior aspect of
the aarch64 behavior, and implements the same "always sink immediates"
behavior for RISCV as well.