As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:
When we compare sNaN vs NUM:
ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
+0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.
So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
`rethrow` instruction is a terminator, but when when its DAG is built in
`SelectionDAGBuilder` in a custom routine, it was NOT treated as such.
```ll
rethrow: ; preds = %catch.start
invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %1) ]
to label %unreachable unwind label %ehcleanup
ehcleanup: ; preds = %rethrow, %catch.dispatch
%tmp = phi i32 [ 10, %catch.dispatch ], [ 20, %rethrow ]
...
```
In this bitcode, because of the `phi`, a `CONST_I32` will be created in
the `rethrow` BB. Without this patch, the DAG for the `rethrow` BB looks
like this:
```
t0: ch,glue = EntryToken
t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
t5: ch = llvm.wasm.rethrow t0, TargetConstant:i32<12161>
t6: ch = TokenFactor t3, t5
t8: ch = br t6, BasicBlock:ch<unreachable 0x562532e43c50>
```
Note that `CopyToReg` and `llvm.wasm.rethrow` don't have dependence so
either can come first in the selected code, which can result in the code
like
```mir
bb.3.rethrow:
RETHROW 0, implicit-def dead $arguments
%9:i32 = CONST_I32 20, implicit-def dead $arguments
BR %bb.6, implicit-def dead $arguments
```
After this patch, `llvm.wasm.rethrow` is treated as a terminator, and
the DAG will look like
```
t0: ch,glue = EntryToken
t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
t5: ch = llvm.wasm.rethrow t3, TargetConstant:i32<12161>
t7: ch = br t5, BasicBlock:ch<unreachable 0x5555e3d32c70>
```
Note that now `rethrow` takes a token from `CopyToReg`, so `rethrow` has
to come after `CopyToReg`. And the resulting code will be
```mir
bb.3.rethrow:
%9:i32 = CONST_I32 20, implicit-def dead $arguments
RETHROW 0, implicit-def dead $arguments
BR %bb.6, implicit-def dead $arguments
```
I'm not very familiar with the internals of `getRoot` vs.
`getControlRoot`, but other terminator instructions seem to use the
latter, and using it for `rethrow` too worked.
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
lowerInvokeable wasn't updating the returned chain after emitting the
lowerEndEH, which caused SwiftErrorVal-handling code to re-set the DAG
root, and thus accidentally skip the EH_LABEL node it was supposed to
have addeed. After fixing that, a few places needed to be adjusted that
assume the specific shape of the returned DAG.
Fixes: #64826
Fixes: rdar://113994760
Prior to this change, `SelectionDAGBuilder` was producing `SDNode`s of
the form: `f32 = extract_vector_elt <1 x bfloat|half>, i32 0` when
lowering phis of `<1 x bfloat|half>` and running on a target that
promotes this type to `f32` (like some x86 or AMDGPU targets.)
This construct is invalid since this type of node only allows type
extensions for integer types.
It went unotice because the `extract_vector_elt` node is later broken
down in `bitcast` followed by `bf16_to_fp|fp_extend`. However, when the
argument of the phi is a constant we were crashing because the existing
code would try to constant fold this `extract_vector_elt` into a
any_ext.
This patch fixes this by using a proper decomposition for `<1 x
bfloat|half>`:
```
bfloat|half = bitcast <1 x blfoat|half>
float = fp_extend bfloat|half
```
This change should be NFC for the non-constant-folding cases and fix the
SDISel crashes (reported in
https://github.com/llvm/llvm-project/issues/94449) for the folding
cases.
Note: The change on the arm test is a missing fp16 to f32 constant folding
exposed by this patch. I'll push a separate improvement for that.
`SelectionDAGBuilder::handleDebugValue` has a parameter `Order` which
represents the insert-at position for the new DBG_VALUE. Prior to this patch
`SelectionDAGBuilder::SDNodeOrder` is used instead of the `Order` parameter.
The only code-paths where `Order != SDNodeOrder` are the two calls calls to
`handleDebugValue` from `salvageUnresolvedDbgValue`.
`salvageUnresolvedDbgValue` is called from `resolveOrClearDbgInfo` and
`dropDanglingDebugInfo`. The former is called after SelectionDAG completes one
block.
Some dbg.values can't be lowered to DBG_VALUEs right away. These get recorded
as 'dangling' - their order-number is saved - and get salvaged later through
`dropDanglingDebugInfo`, or if we've still got dangling debug info once the
whole block has been emitted, through `resolveOrClearDbgInfo`. Their saved
order-number is passed to `handleDebugValue`.
Prior to this patch, DBG_VALUEs inserted using these functions are inserted at
the "current" `SDNodeOrder` rather than the intended position that is passed to
the function.
Fix and add test.
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Much of this change was following how G_FSIN and G_FCOS were used.
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
- `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan
intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to
`ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for
`FTAN` and `STRICT_FTAN`
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall
mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map
`G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`,
`strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN`
and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a
vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to
the list of floating point math operations also associate `G_FTAN` with
the `TAN_F` runtime lib.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math
operation common behaviors.
- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp - List the function
expansion operations for `FTAN` and `STRICT_FTAN`. Also define both
opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN`
and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define
`SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN`
as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define
`FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - define tan as an
intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map
`LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map
`Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for
`Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan`
and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` -Define a Tan128 libcall and
ISD::FTAN as a target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for
tan intrinsic
resolves https://github.com/llvm/llvm-project/issues/70082
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
Since pointers in memory, as well as the index type are both 32 bits,
but in registers pointers are 64 bits, the mask generated by
llvm.ptrmask needs to be zero-extended.
Fixes: #94075
Fixes: rdar://125263567
This adds codegen support for the "ptrauth" operand bundles, which can
be used to augment indirect calls with the equivalent of an
`@llvm.ptrauth.auth` intrinsic call on the call target (possibly
preceded by an `@llvm.ptrauth.blend` on the auth discriminator if
applicable.)
This allows the generation of combined authenticating calls
on AArch64 (in the BLRA* PAuth instructions), while avoiding
the raw just-authenticated function pointer from being
exposed to attackers.
This is done by threading a PtrAuthInfo descriptor through
the call lowering infrastructure, eventually selecting a BLRA
pseudo. The pseudo encapsulates the safe discriminator
computation, which together with the real BLRA* call get emitted
in late pseudo expansion in AsmPrinter.
Note that this also applies to the other forms of indirect calls,
notably invokes, rvmarker, and tail calls. Tail-calls in particular
bring some additional complexity, with the intersecting register
constraints of BTI and PAC discriminator computation.
However this doesn't currently support PAuth_LR tail-call variants.
This also adopts an x8+ allocation order for GPR64noip, matching
GPR64.
The current way of lowering `llvm.clear_cache` is a bit unusual. As
suggested by Matt Arsenault we are better off using an ISD node.
This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall
by default named `__clear_cache` and the default legalisation is a
libcall.
This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE`
needed by RISC-V on some platforms.
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788
Current interface is:
llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)
The integer type used by 'inc_amount' needs to match the type of the buckets in memory.
The intrinsic covers the following operations:
* Gather load
* histogram on the elements of 'ptrs'
* multiply the histogram results by 'inc_amount'
* add the result of the multiply to the values loaded by the gather
* scatter store the results of the add
Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
In PR #88385 I've added support for auto-vectorisation of some early
exit loops, which requires using the experimental.cttz.elts to calculate
final indices in the early exit block. We need a more accurate cost
model for this intrinsic to better reflect the cost of work required in
the early exit block. I've tried to accurately represent the expansion
code for the intrinsic when the target does not have efficient lowering
for it. It's quite tricky to model because you need to first figure out
what types will actually be used in the expansion. The type used can
have a significant effect on the cost if you end up using illegal vector
types.
Tests added here:
Analysis/CostModel/AArch64/cttz_elts.ll
Analysis/CostModel/RISCV/cttz_elts.ll
This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
from the experimental namespace.
All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.
This is a fix for miscompiles reported in
https://github.com/llvm/llvm-project/issues/89060
After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).
Copy `nneg` flag when building `UINT_TO_FP` from `uitofp` and use
`nneg` flag in the one place we transform `UINT_TO_FP` -> `SINT_TO_FP`
if the operand is non-negative.
SVE has some non-temporal masked loads and stores. The metadata coming
from the nodes is not copied to the MMO at the moment though, meaning it
will generate a normal instruction. This patch ensures that the right
flags are set if the instruction has non-temporal metadata.
The earlier implementation on AMDGPU used explicit token operands at
SI_CALL and SI_CALL_ISEL. This is now replaced with CONVERGENCECTRL_GLUE
operands, with the following effects:
- The treatment of tokens at call-like operations is now consistent with
the treatment at intrinsics.
- Support for tail calls using implicit tokens at SI_TCRETURN "just
works".
- The extra parameter at call-like instructions is eliminated, thus
restoring those instructions and their handling to the original state.
The new glue node is placed after the existing glue node for the
outgoing call parameters, which seems to not interfere with selection of
the call-like nodes.
Before this fix, a duplicate llvm.dbg.value intrinsic referring to an
argument, after an alloca, would be generated with `$noreg`, losing
debug information. Instead, we silently drop the second debug info, so
it doesn't break the first one.
rdar://125375717
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
Remove getSizeOrUnknown call when MachineMemOperand is created. For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.
2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.
Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
This is needed to support non-intrinsic functions returning tuple types
which are represented as structs with scalable vector types in IR.
I suspect this may have been broken since
https://reviews.llvm.org/D158115
This patch renames DPLabel to DbgLabelRecord, in accordance with the
ongoing DbgRecord rename. This rename was fairly trivial, since DPLabel
isn't as widely used as DPValue and has no real conflicts in either its
full or abbreviated name. As usual, the entire replacement was done
automatically, with `s/DPLabel/DbgLabelRecord/` and `s/DPL/DLR/`.
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.
This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
Previously SelectionDAGBuilder used ABI alignment for
compressstore/expandload. This patch allows SelectionDAGBuilder to use
parameter alignment like vp intrinsics. This does not follow the
original code to default use vector type alignment, since it is possible
implemented to unaligned vector alignment.
Changes in Recommit:
1) Fix non-determanism by using `SmallMapVector` instead of
`SmallPtrSet`.
2) Fix bug in dependency pruning where we discounted the actual
`and/or` combining the two conditions. This lead to over pruning.
Closes#81689
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
MachineIR alias analysis assumes that only bytes after the pointer will
be accessed. This is incorrect if the stride is negative.
This is causing miscompiles in our downstream after SLP started making
strided loads.
Fixes#82657
Patch 2 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds the DPLabel class, which is the RemoveDIs llvm.dbg.label
equivalent.
1. Add DbgRecord base class for DPValue and the not-yet-added
DPLabel class.
2. Add the DPLabel class.
-> 3. Add support to passes.
The next patch, #82639, will enable conversion between dbg.labels and DPLabels.
AssignemntTrackingAnalysis support could have gone two ways:
1. Have the analysis store a DPLabel representation in its results -
SelectionDAGBuilder reads the analysis results and ignores all DbgRecord
kinds.
2. Ignore DPLabels in the analysis - SelectionDAGBuilder reads the analysis
results but still needs to iterate over DPLabels from the IR.
I went with option 2 because it's less work and is no less correct than 1. It's
worth noting that causes labels to sink to the bottom of packs of debug records.
e.g., [value, label, value] becomes [value, value, label]. This shouldn't be a
problem because labels and variable locations don't have an ordering requirement.
The ordering between variable locations is maintained and the label movement is
deterministic
This is another step in the direction of fixing the `Fixed(0) !=
Scalable(0)` bugbear, although whilst weird I don't believe it's causing
us any real issues.