- Move `MIPrinter` class to anonymous namespace, and remove it as a
friend of `MachineBasicBlock`.
- Move `canPredictBranchProbabilities` to `MachineBasicBlock` and change
it to use the new `BranchProbability::normalizeProbabilities` function
that accepts a range, and also to use `llvm::equal()` to check equality
of the two vectors.
- Use `ListSeparator` to print comma-separated lists instead of manual
code to do that.
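As a rough illustration of the last two bullets, here is a standalone sketch that does not use LLVM headers. `SimpleListSeparator`, `joinList`, `normalizeWeights` and `sameDistribution` are hypothetical stand-ins invented for this example; they only mimic the idea behind `ListSeparator` and behind normalizing two probability lists before comparing them, and they are not the code from this change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the idea behind llvm::ListSeparator (not the LLVM class):
// the first request yields an empty string, every later one yields ", ".
class SimpleListSeparator {
  bool First = true;
  std::string Separator;

public:
  explicit SimpleListSeparator(std::string Sep = ", ")
      : Separator(std::move(Sep)) {}

  std::string next() {
    if (First) {
      First = false;
      return "";
    }
    return Separator;
  }
};

// Prints a comma-separated list without hand-written "is this the first
// element?" bookkeeping at the call site.
std::string joinList(const std::vector<int> &Items) {
  std::ostringstream OS;
  SimpleListSeparator LS;
  for (int Item : Items)
    OS << LS.next() << Item;
  return OS.str();
}

// Hypothetical helper: scales integer weights so that they sum to roughly
// Denominator (integer division may lose a few units).
std::vector<uint32_t> normalizeWeights(const std::vector<uint32_t> &Weights,
                                       uint32_t Denominator = 100) {
  uint64_t Sum = std::accumulate(Weights.begin(), Weights.end(), uint64_t{0});
  assert(Sum != 0 && "cannot normalize an all-zero weight list");
  std::vector<uint32_t> Result;
  Result.reserve(Weights.size());
  for (uint32_t W : Weights)
    Result.push_back(static_cast<uint32_t>(uint64_t{W} * Denominator / Sum));
  return Result;
}

// Two weight lists describe the same distribution if their normalized forms
// compare equal element by element.
bool sameDistribution(const std::vector<uint32_t> &A,
                      const std::vector<uint32_t> &B) {
  std::vector<uint32_t> NA = normalizeWeights(A);
  std::vector<uint32_t> NB = normalizeWeights(B);
  return NA.size() == NB.size() && std::equal(NA.begin(), NA.end(), NB.begin());
}
```

With these stand-ins, `joinList({1, 2, 3})` produces `1, 2, 3`, and `sameDistribution({1, 1}, {2, 2})` is true because both lists normalize to the same values.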
This experimental option was introduced in 2015 via commit 1192294, and
the target hook was added in 2020 via commit 99e865b6. There does not
appear to have ever been a use of this target hook in tree.
This code is complicating one of the most complicated and hard to
understand parts of our code base, and was an experiment introduced
nearly 10 years ago. Let's get rid of it.
Note that the idea described in the original patch is not necessarily a
bad one, and we might return to it someday.
- Change MachineInstr operand accessors to use `ArrayRef` internally to
slice the operand array into sub-arrays.
- Minor: remove unnecessary {} on `MachineInstrBuilder::add`.
Implement proper splitting functions for PARTIAL_REDUCE_MLA ISD nodes.
This makes the udot_8to64 and sdot_8to64 tests generate dot product
instructions when the new ISD nodes are used.
---------
Co-authored-by: James Chesterman <james.chesterman@arm.com>
If the SDNode is used, it can pick up the wrong result number, for
example looking at the known bits of the first result where it should be
looking at the second. The SDValue is already available, because the
SelectCodeCommon checks move from parent to child, so pass the SDValue
through to CheckNodePredicate as Op so that it can be used if necessary.
SDNode *N is still generated, keeping most PatFrags the same.
Fixes #137274
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` intrinsics, but
are atomic and use IEEE 754-2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
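The NaN difference can be sketched in plain C++. The helpers below (`maximumLike`, `minimumLike`, `atomicFMaximum`) are hypothetical names for this illustration only. They model the IEEE 754-2019 maximum/minimum behaviour (a NaN input gives a NaN result, and -0.0 orders below +0.0), and the compare-exchange loop is just one generic way to express a read-modify-write. This is not how any target lowers the instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <limits>

// IEEE 754-2019 style maximum: propagates NaN and treats +0.0 as greater
// than -0.0. Contrast with std::fmax, which returns the non-NaN operand.
inline float maximumLike(float A, float B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<float>::quiet_NaN();
  if (A == 0.0f && B == 0.0f)
    return std::signbit(A) ? B : A; // prefer +0.0 when the signs differ
  return A > B ? A : B;
}

// IEEE 754-2019 style minimum: propagates NaN and treats -0.0 as less
// than +0.0.
inline float minimumLike(float A, float B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<float>::quiet_NaN();
  if (A == 0.0f && B == 0.0f)
    return std::signbit(A) ? A : B; // prefer -0.0 when the signs differ
  return A < B ? A : B;
}

// Read-modify-write sketch: stores maximumLike(old, Operand) and returns the
// old value, which is the shape of result an atomicrmw produces.
inline float atomicFMaximum(std::atomic<float> &Target, float Operand) {
  float Old = Target.load();
  // On failure, compare_exchange_weak reloads Old with the current value.
  while (!Target.compare_exchange_weak(Old, maximumLike(Old, Operand))) {
  }
  return Old;
}
```

For example, `maximumLike(1.0f, NaN)` is NaN, whereas `std::fmax(1.0f, NaN)` is `1.0f`.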
Fix issue 131298 where an undefined $scc register causes verifier errors
when using SI_KILL_F32_COND_IMM_TERMINATOR instructions. The problem
occurs because the $scc register defined in a comparison before the kill
terminator is used in successor blocks, but was not properly marked as live-in.
This patch:
- Adds code to check if SCC is used in the successor block
- Adds SCC as a live-in to successor blocks
- Handles both explicit and implicit uses of SCC
With this patch, the machine verifier no longer reports undefined $scc
errors in blocks following a kill terminator instruction.
Fixes #131298
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Reapplied after fixing the config issue that caused failures following
the previous merge.
This reverts commit fdbf073a86573c9ac4d595fac8e06d252ce1469f.
- Add new pass manager version of `MachineUniformityAnalysis`.
- Query `TargetTransformInfo` in new pass manager version.
- Use `printAsOperand` when printing machine function name.
The code below the removed check looks generic enough to support
arbitrary integer widths. This change helps 32-bit targets avoid
expensive expansion/libcalls in the case of zero input.
Pull Request: https://github.com/llvm/llvm-project/pull/137197
LegalizerHelper::reduceLoadStoreWidth does not work for non-byte-sized
types, because this would require (un)packing of bits across byte
boundaries.
Precommit tests: #134904
The current implementation tries to fold the operand before
rematerialization because folding can reduce register usage by one. But if
there is a physical register available we can still rematerialize it without
causing high register pressure.
This patch adds a check for that case to pick the better choice. Then we can produce
xorps %xmm1, %xmm1
ucomiss %xmm1, %xmm0
instead of
ucomiss LCPI0_1(%rip), %xmm0
When floating-point operations are legalized to operations of a higher
precision (e.g. f16 fadd being legalized to f32 fadd) then we get
narrowing then widening operations between each operation. With the
appropriate fast math flags (nnan ninf contract) we can eliminate these
casts.
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.
This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.
This does the following:
* Rename GlobalValue::getGUID(StringRef) to
getGUIDAssumingExternalLinkage. This is simply making explicit at the
callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
getGUIDAssumingExternalLinkage, to make the assumption explicit.
There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
Replace "concept based polymorphism" with simpler PImpl idiom.
This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it is two less :)
There are three commits to hopefully simplify the review.
The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.
The first commit makes `TargetTransformInfoImplBase` polymorphic, which
means all derived classes should `override` its methods. This is done in
the second commit to make the first one smaller. It appeared infeasible to
extract this into a separate PR because the first commit landed
separately would result in tons of `-Woverloaded-virtual` warnings (and
break `-Werror` builds).
The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformInfoImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.
Pull Request: https://github.com/llvm/llvm-project/pull/136674
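A minimal sketch of the resulting shape, with hypothetical class names (`CostModel`, `CostModelImplBase`, `ExampleTargetImpl`) standing in for `TTI`, the implementation base class, and a target implementation. It only illustrates the PImpl idiom with virtual methods and is not the actual class layout.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical implementation base class. Its methods are virtual, so
// targets override them instead of hiding them.
class CostModelImplBase {
public:
  virtual ~CostModelImplBase() = default;
  virtual int getInstructionCost(int /*Opcode*/) const { return 1; }
};

// Hypothetical target implementation. Because of `override`, a mismatched
// signature (say, taking `unsigned` instead of `int`) is a compile error
// rather than a silently separate method.
class ExampleTargetImpl final : public CostModelImplBase {
public:
  int getInstructionCost(int Opcode) const override {
    return Opcode == 0 ? 1 : 4;
  }
};

// Hypothetical facade: owns the implementation through a pointer and
// forwards each query to it.
class CostModel {
  std::unique_ptr<CostModelImplBase> Impl;

public:
  explicit CostModel(std::unique_ptr<CostModelImplBase> I)
      : Impl(std::move(I)) {}

  int getInstructionCost(int Opcode) const {
    return Impl->getInstructionCost(Opcode);
  }
};
```

A caller would construct `CostModel(std::make_unique<ExampleTargetImpl>())` and only ever see the facade's methods.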
EH landing pad entry implicitly clobbers target-specific exception
pointer and exception selector registers. The post-RA MachineLICM pass
needs to take these into account when deciding whether to hoist an
instruction out of the loop that initializes one of these registers.
Fixes: https://github.com/llvm/llvm-project/issues/122315
Add support for using the existing SCRATCH_STORE_BLOCK and
SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It
does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.
Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
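The stack usage trade-off can be sketched as follows. `blockSlotSizeInBytes` and the mask encoding are assumptions made for this illustration (bit `i` set means register `i` of the 32-register block is transferred, 4 bytes per register as implied by the 128-byte slot). The real computation in the patch may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned RegsPerBlock = 32;
constexpr unsigned BytesPerReg = 4; // 32 registers * 4 bytes = 128-byte slot

// Hypothetical helper: returns the slot size needed for one block. Gaps
// before the highest transferred register still occupy space (no
// compaction); only the unused tail of the block is trimmed.
inline unsigned blockSlotSizeInBytes(uint32_t TransferMask) {
  unsigned UsedRegs = 0; // index of the highest transferred register, plus one
  for (unsigned I = 0; I < RegsPerBlock; ++I)
    if (TransferMask & (uint32_t{1} << I))
      UsedRegs = I + 1;
  return UsedRegs * BytesPerReg;
}
```

Under these assumptions, a mask with callee-saved groups interleaved in eights, such as `0x00FF00FF`, needs 96 bytes: the gap at registers 8 to 15 is kept, while the unused last eight registers are trimmed.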
In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in SIFrameLowering. We also add new pseudos,
SI_BLOCK_SPILL_V1024_SAVE/RESTORE.
One peculiarity is that both the SI_BLOCK_SPILL_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).
This was reverted due to failures in the builds with expensive checks
on, now fixed by always updating LiveIntervals and SlotIndexes in
SILowerSGPRSpills.
It is used to mark a value that is known not to belong to some fcType.
Examples include:
* An argument of a function is marked with nofpclass
* The output value of an intrinsic is known not to be of some type
Subsequent operations can then make assumptions based on this.
---------
Co-authored-by: Your Name <you@example.com>
This reverts commit a9d93ecf1f8d2cfe3f77851e0df179b386cff353.
Reverted due to the commit including a config in LLVM headers that is not
available outside of the llvm source tree.
Without this, we end up with quadratic behavior affecting functions with
large numbers of code motion barriers, such as CFI jump tables.
As a drive-by cleanup, remove a redundant store to SawStore in this
pass as it is also done by isSafeToMove.
Reviewers: arsenm
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/136806
Reverts llvm/llvm-project#136205
Breaks buildbots, probably something about needing to restrict the test
to running on a specific target or the like - I haven't looked closely.
Co-authored-by: Vladislav Dzhidzhoev <dzhidzhoev@gmail.com>
And teach SelectionDAGBuilder to get the range metadata in
visitAtomicLoad.
This allows us to recognize that sign extending a byte load of a
boolean value from memory will produce zeros for the extended bits.
This allows us to remove an AND on RISC-V.
Tests copied from #136502 with range metadata added to i1 cases.
Some of the test effects overlap with #136502, but that patch can't
handle the acquire or seq_cst cases with the Zalasr extension. We
only have sign extending versions of those loads.
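The reasoning can be checked with a small C++ sketch. It assumes the removed AND is a zero-extending mask of the low byte (`& 0xFF`). `signExtendByte` and `maskIsRedundant` are hypothetical helpers for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Models a sign-extending byte load: the loaded byte is widened to 32 bits
// by replicating its top bit.
inline int32_t signExtendByte(uint8_t Byte) {
  return static_cast<int32_t>(static_cast<int8_t>(Byte));
}

// The mask is redundant when it leaves the sign-extended value unchanged,
// that is, when all extended bits are already zero.
inline bool maskIsRedundant(uint8_t Byte) {
  int32_t Extended = signExtendByte(Byte);
  return (Extended & 0xFF) == Extended;
}
```

For a byte known to lie in the range [0, 2), i.e. a boolean, the top bit is clear, so sign extension fills the upper bits with zeros and the mask changes nothing. For an arbitrary byte such as `0x80` the mask is still needed.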
This is part of a series of patches that tries to improve DILocation bug
detection in Debugify; see the review for more details. This is the patch
that adds the main feature, adding a set of `DebugLoc::get<Kind>`
functions that can be used for instructions with intentionally empty
DebugLocs to prevent Debugify from treating them as bugs, removing the
currently-pervasive false positives and allowing us to use Debugify (in
its original DI preservation mode) to reliably detect existing bugs and
regressions. This patch does not add uses of these functions, except for
once in Clang before optimizations, and in
`Instruction::dropLocation()`, since that is an obvious case that
immediately removes a set of false positives.
This reverts commit 427b6448a3af009e57c0142d6d8af83318b45093.
Original patch has been updated to include a fix to ensure
AArch64InstructionSelector::emitConstantVector supports all the cases
where isBuildVectorAllOnes returns true.
Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. This only
happens when the combine has been performed on the ISD node. Also add
a check to only do the DAG combine when the node can eventually be
lowered, so neon tests change too.
---------
Co-authored-by: James Chesterman <james.chesterman@arm.com>