2012 Commits

Author SHA1 Message Date
Daniil Kovalev
1488fb4153
[PAC][AArch64] Lower ptrauth constants in code (#96879)
This re-applies #94241 after fixing buildbot failure, see
https://lab.llvm.org/buildbot/#/builders/51/builds/570

According to standard, `constexpr` variables and `const` variables
initialized with constant expressions can be used in lambdas w/o
capturing - see https://en.cppreference.com/w/cpp/language/lambda.
However, MSVC used on buildkite seems to ignore that rule and does not
allow using such uncaptured variables in lambdas: we have "error C3493:
'Mask16' cannot be implicitly captured because no default capture mode
has been specified" - see
https://buildkite.com/llvm-project/github-pull-requests/builds/73238

Explicitly capturing such a variable, however, makes buildbot fail with
"error: lambda capture 'Mask16' is not required to be captured for this
use [-Werror,-Wunused-lambda-capture]" - see
https://lab.llvm.org/buildbot/#/builders/51/builds/570.

Fix both cases by using `0xffff` value directly instead of giving a name
to it.

Original PR description below.

Depends on #94240.

Define the following pseudos for lowering ptrauth constants in code:

- non-`extern_weak`:
  - no GOT load needed: `MOVaddrPAC` - similar to `MOVaddr`, with added
PAC;
  - GOT load needed: `LOADgotPAC` - similar to `LOADgot`, with added PAC;
- `extern_weak`: `LOADauthptrstatic` - similar to `LOADgot`, but use a
special stub slot named `sym$auth_ptr$key$disc` filled by dynamic linker
during relocation resolving instead of a GOT slot.

---------

Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
2024-06-28 07:29:38 +03:00
Shengchen Kan
15fc801cf0
[X86][CodeGen] Support hoisting load/store with conditional faulting (#96720)
1. Add TTI interface for conditional load/store.
2. Mark 1 x i16/i32/i64 masked load/store legal so that it's not
   legalized in pass scalarize-masked-mem-intrin.
3. Visit 1 x i16/i32/i64 masked load/store to build a target-specific
   CLOAD/CSTORE node to avoid error in
   `DAGTypeLegalizer::ScalarizeVectorResult`.
4. Combine DAG to simplify the nodes for CLOAD/CSTORE.
5. Lower CLOAD/CSTORE to CFCMOV by pattern match.

This is CodeGen part of #95515
2024-06-27 17:01:55 +08:00
Daniil Kovalev
99251f5a11
Revert "[PAC][AArch64] Lower ptrauth constants in code (#94241)" (#96865)
This reverts #94241.

See buildbot failure
https://lab.llvm.org/buildbot/#/builders/51/builds/570
2024-06-27 11:10:38 +03:00
Daniil Kovalev
b5cc19e572
[PAC][AArch64] Lower ptrauth constants in code (#94241)
Depends on #94240.

Define the following pseudos for lowering ptrauth constants in code:

- non-`extern_weak`:
  - no GOT load needed: `MOVaddrPAC` - similar to `MOVaddr`, with added
    PAC;
  - GOT load needed: `LOADgotPAC` - similar to `LOADgot`, with added PAC;
- `extern_weak`: `LOADauthptrstatic` - similar to `LOADgot`, but use a
  special stub slot named `sym$auth_ptr$key$disc` filled by dynamic linker
  during relocation resolving instead of a GOT slot.

---------

Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
2024-06-27 10:02:17 +03:00
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Heejin Ahn
3c8f3b91d8
[WebAssembly] Treat 'rethrow' as terminator in custom isel (#95967)
`rethrow` instruction is a terminator, but when when its DAG is built in
`SelectionDAGBuilder` in a custom routine, it was NOT treated as such.

```ll
rethrow:                                          ; preds = %catch.start
  invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %1) ]
          to label %unreachable unwind label %ehcleanup

ehcleanup:                                        ; preds = %rethrow, %catch.dispatch
  %tmp = phi i32 [ 10, %catch.dispatch ], [ 20, %rethrow ]
  ...
```

In this bitcode, because of the `phi`, a `CONST_I32` will be created in
the `rethrow` BB. Without this patch, the DAG for the `rethrow` BB looks
like this:
```
  t0: ch,glue = EntryToken
      t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
      t5: ch = llvm.wasm.rethrow t0, TargetConstant:i32<12161>
    t6: ch = TokenFactor t3, t5
  t8: ch = br t6, BasicBlock:ch<unreachable 0x562532e43c50>
```
Note that `CopyToReg` and `llvm.wasm.rethrow` don't have dependence so
either can come first in the selected code, which can result in the code
like
```mir
bb.3.rethrow:
  RETHROW 0, implicit-def dead $arguments
  %9:i32 = CONST_I32 20, implicit-def dead $arguments
  BR %bb.6, implicit-def dead $arguments
```

After this patch, `llvm.wasm.rethrow` is treated as a terminator, and
the DAG will look like
```
        t0: ch,glue = EntryToken
      t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
    t5: ch = llvm.wasm.rethrow t3, TargetConstant:i32<12161>
  t7: ch = br t5, BasicBlock:ch<unreachable 0x5555e3d32c70>
```
Note that now `rethrow` takes a token from `CopyToReg`, so `rethrow` has
to come after `CopyToReg`. And the resulting code will be
```mir
bb.3.rethrow:
  %9:i32 = CONST_I32 20, implicit-def dead $arguments
  RETHROW 0, implicit-def dead $arguments
  BR %bb.6, implicit-def dead $arguments
```

I'm not very familiar with the internals of `getRoot` vs.
`getControlRoot`, but other terminator instructions seem to use the
latter, and using it for `rethrow` too worked.
2024-06-18 21:56:41 -07:00
Matt Arsenault
62b5196be0
DAG: Fix asserting on invalid inline asm constraints (#95935) 2024-06-18 18:51:29 +02:00
Poseydon42
995835fe6d
[SelectionDAG] Add support for the 3-way comparison intrinsics [US]CMP (#91871)
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
2024-06-17 11:16:52 +02:00
NAKAMURA Takumi
85a7bba7d2
Cleanup MC/DC intrinsics for #82448 (#95496)
3rd arg of `tvbitmap.update` was made unused. Remove 3rd arg.

Sweep `condbitmap.update`, since it is no longer used.
2024-06-16 09:04:51 +09:00
Andreas Jonson
ae71609e91
[SDAG] Lower range attribute to AssertZext (#95450)
Add support for range attributes on calls, in addition to range metadata.
2024-06-14 10:04:33 +02:00
Jon Roelofs
785dc76c66
[llvm][SelectionDAG] Fix up chains in lowerInvokeable. rdar://113994760 (#94004)
lowerInvokeable wasn't updating the returned chain after emitting the
lowerEndEH, which caused SwiftErrorVal-handling code to re-set the DAG
root, and thus accidentally skip the EH_LABEL node it was supposed to
have addeed. After fixing that, a few places needed to be adjusted that
assume the specific shape of the returned DAG.

Fixes: #64826
Fixes: rdar://113994760
2024-06-13 18:05:52 -07:00
Jay Foad
d4a0154902
[llvm-project] Fix typo "seperate" (#95373) 2024-06-13 20:20:27 +01:00
Quentin Colombet
0605e984fa
[SDISel][Builder] Fix the instantiation of <1 x bfloat|half> (#94591)
Prior to this change, `SelectionDAGBuilder` was producing `SDNode`s of
the form: `f32 = extract_vector_elt <1 x bfloat|half>, i32 0` when
lowering phis of `<1 x bfloat|half>` and running on a target that
promotes this type to `f32` (like some x86 or AMDGPU targets.)

This construct is invalid since this type of node only allows type
extensions for integer types.
It went unotice because the `extract_vector_elt` node is later broken
down in `bitcast` followed by `bf16_to_fp|fp_extend`. However, when the
argument of the phi is a constant we were crashing because the existing
code would try to constant fold this `extract_vector_elt` into a
any_ext.

This patch fixes this by using a proper decomposition for `<1 x
bfloat|half>`:
```
bfloat|half = bitcast <1 x blfoat|half>
float = fp_extend bfloat|half
```

This change should be NFC for the non-constant-folding cases and fix the
SDISel crashes (reported in
https://github.com/llvm/llvm-project/issues/94449) for the folding
cases.

Note: The change on the arm test is a missing fp16 to f32 constant folding
exposed by this patch. I'll push a separate improvement for that.
2024-06-07 18:47:37 +02:00
Orlando Cazalet-Hyams
ea32197daa
[DebugInfo][SelectionDAG] Fix position of salvaged 'dangling' DBG_VALUEs (#94458)
`SelectionDAGBuilder::handleDebugValue` has a parameter `Order` which
represents the insert-at position for the new DBG_VALUE. Prior to this patch
`SelectionDAGBuilder::SDNodeOrder` is used instead of the `Order` parameter.

The only code-paths where `Order != SDNodeOrder` are the two calls calls to
`handleDebugValue` from `salvageUnresolvedDbgValue`.
`salvageUnresolvedDbgValue` is called from `resolveOrClearDbgInfo` and
`dropDanglingDebugInfo`. The former is called after SelectionDAG completes one
block.

Some dbg.values can't be lowered to DBG_VALUEs right away. These get recorded
as 'dangling' - their order-number is saved - and get salvaged later through
`dropDanglingDebugInfo`, or if we've still got dangling debug info once the
whole block has been emitted, through `resolveOrClearDbgInfo`. Their saved
order-number is passed to `handleDebugValue`.

Prior to this patch, DBG_VALUEs inserted using these functions are inserted at
the "current" `SDNodeOrder` rather than the intended position that is passed to
the function.

Fix and add test.
2024-06-06 09:17:38 +01:00
Farzon Lotfi
1d87433593
[x86] Add tan intrinsic part 4 (#90503)
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294


Much of this change was following how G_FSIN and G_FCOS were used.

Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
-  `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan
intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to
`ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for
`FTAN` and `STRICT_FTAN`
-  `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall
mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map
`G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`,
`strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN`
and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a
vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to
the list of floating point math operations also associate `G_FTAN` with
the `TAN_F` runtime lib.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math
operation common behaviors.
- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp - List the function
expansion operations for `FTAN` and `STRICT_FTAN`. Also define both
opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN`
and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define
`SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN`
as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define
`FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - define tan as an
intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map
`LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map
`Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for
`Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan`
and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` -Define a Tan128 libcall and
ISD::FTAN as a target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for
tan intrinsic

resolves https://github.com/llvm/llvm-project/issues/70082
2024-06-05 15:01:33 -04:00
Nikita Popov
deab451e7a
[IR] Remove support for icmp and fcmp constant expressions (#93038)
Remove support for the icmp and fcmp constant expressions.

This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179

As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
2024-06-04 08:31:03 +02:00
Jon Roelofs
0b4af3a5f4
[llvm][SelectionDAG] Relax llvm.ptrmask's size check on arm64_32 (#94125)
Since pointers in memory, as well as the index type are both 32 bits,
but in registers pointers are 64 bits, the mask generated by
llvm.ptrmask needs to be zero-extended.

Fixes: #94075
Fixes: rdar://125263567
2024-06-03 15:26:30 -07:00
Ahmed Bougacha
cc548ec47c
[AArch64][PAC] Lower authenticated calls with ptrauth bundles. (#85736)
This adds codegen support for the "ptrauth" operand bundles, which can
be used to augment indirect calls with the equivalent of an
`@llvm.ptrauth.auth` intrinsic call on the call target (possibly
preceded by an `@llvm.ptrauth.blend` on the auth discriminator if
applicable.)

This allows the generation of combined authenticating calls
on AArch64 (in the BLRA* PAuth instructions), while avoiding
the raw just-authenticated function pointer from being
exposed to attackers.

This is done by threading a PtrAuthInfo descriptor through
the call lowering infrastructure, eventually selecting a BLRA
pseudo.  The pseudo encapsulates the safe discriminator
computation, which together with the real BLRA* call get emitted
in late pseudo expansion in AsmPrinter.

Note that this also applies to the other forms of indirect calls,
notably invokes, rvmarker, and tail calls.  Tail-calls in particular
bring some additional complexity, with the intersecting register
constraints of BTI and PAC discriminator computation.
However this doesn't currently support PAuth_LR tail-call variants.

This also adopts an x8+ allocation order for GPR64noip, matching
GPR64.
2024-05-31 14:08:10 -07:00
Roger Ferrer Ibáñez
05e6bb40eb
[SelectionDAG] Add an ISD::CLEAR_CACHE node to lower llvm.clear_cache (#93795)
The current way of lowering `llvm.clear_cache` is a bit unusual. As
suggested by Matt Arsenault we are better off using an ISD node.

This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall
by default named `__clear_cache` and the default legalisation is a
libcall.

This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE`
needed by RISC-V on some platforms.
2024-05-30 14:55:32 +02:00
Graham Hunter
fbb37e9606
[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
2024-05-13 11:35:28 +01:00
Paul Walker
b277bf56d7
[LLVM][CodeGen][SVE] Clean up lowering of VECTOR_SPLICE operations. (#91330)
Remove DAG combine that is performing type legalisation and instead
add isel patterns for all legal types.
2024-05-10 10:53:57 +01:00
Simon Pilgrim
df21ee4c62 [DAG] Add clang-format off/on wrappers around compact switch handlers. NFC.
Avoids a problem identified in #90503
2024-05-09 17:02:17 +01:00
David Sherwood
b52fa9461a
[Analysis] Add cost model for experimental.cttz.elts intrinsic (#90720)
In PR #88385 I've added support for auto-vectorisation of some early
exit loops, which requires using the experimental.cttz.elts to calculate
final indices in the early exit block. We need a more accurate cost
model for this intrinsic to better reflect the cost of work required in
the early exit block. I've tried to accurately represent the expansion
code for the intrinsic when the target does not have efficient lowering
for it. It's quite tricky to model because you need to first figure out
what types will actually be used in the expansion. The type used can
have a significant effect on the cost if you end up using illegal vector
types.

Tests added here:

  Analysis/CostModel/AArch64/cttz_elts.ll
  Analysis/CostModel/RISCV/cttz_elts.ll
2024-05-09 09:40:33 +01:00
Aleksandr Popov
df311a2762
Add interface to check if a call has a deopt bundle (NFC) (#91348)
Encapsulate check that a call has a deopt bundle to make it easier to
change the deopt scheme.
2024-05-08 11:54:49 +02:00
Paul Walker
235cea720c
[NFC][LLVM] Refactor rounding mode detection of constrained fp intrinsic IDs (#90854)
I've refactored the code to genericise the implementation to better
allow for target specific constrained fp intrinsics.
2024-05-07 11:23:55 +01:00
Min-Yih Hsu
539f626ecd
[VP][RISCV] Add vp.cttz.elts intrinsic and its RISC-V codegen (#90502)
This intrinsic is the VP version of `experimental.cttz.elts`.
2024-04-30 09:27:10 -07:00
Maciej Gabka
bfc0317153
Move several vector intrinsics out of experimental namespace (#88748)
This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice

from the experimental namespace.

All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.
2024-04-29 10:16:45 +01:00
Pierre van Houtryve
cf328ff96d
[IR] Memory Model Relaxation Annotations (#78569)
Implements the core/target-agnostic components of Memory Model
Relaxation Annotations.

RFC:
https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5
2024-04-24 08:52:25 +02:00
Björn Pettersson
d8b253be56
[SelectionDAG] Mark frame index as "aliased" at argument copy elison (#89712)
This is a fix for miscompiles reported in
  https://github.com/llvm/llvm-project/issues/89060

After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).
2024-04-23 13:49:18 +02:00
Noah Goldstein
7013638978 [DAG] Add support for nneg flag with uitofp
Copy `nneg` flag when building `UINT_TO_FP` from `uitofp` and use
`nneg` flag in the one place we transform `UINT_TO_FP` -> `SINT_TO_FP`
if the operand is non-negative.
2024-04-09 23:06:55 -05:00
David Green
9fd2e2c2fd
[DAG][AArch64] Support masked loads/stores with nontemporal flags (#87608)
SVE has some non-temporal masked loads and stores. The metadata coming
from the nodes is not copied to the MMO at the moment though, meaning it
will generate a normal instruction. This patch ensures that the right
flags are set if the instruction has non-temporal metadata.
2024-04-08 08:53:27 +01:00
Sameer Sahasrabuddhe
421557974a
[AMDGPU] Use glue for convergence tokens at call-like operations (#86766)
The earlier implementation on AMDGPU used explicit token operands at
SI_CALL and SI_CALL_ISEL. This is now replaced with CONVERGENCECTRL_GLUE
operands, with the following effects:

- The treatment of tokens at call-like operations is now consistent with
the treatment at intrinsics.
- Support for tail calls using implicit tokens at SI_TCRETURN "just
works".
- The extra parameter at call-like instructions is eliminated, thus
restoring those instructions and their handling to the original state.

The new glue node is placed after the existing glue node for the
outgoing call parameters, which seems to not interfere with selection of
the call-like nodes.
2024-04-01 10:51:13 +05:30
Vitaly Buka
20f56e1f8e
[CodeGen] Add default lowering for llvm.allow.{runtime,ubsan}.check() (#86049)
RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-03-31 22:19:33 -07:00
Emil Pedersen
0e5c504d3d
[DebugInfo] [SelectionDAG] Fix handling of duplicate dbg values (#86598)
Before this fix, a duplicate llvm.dbg.value intrinsic referring to an
argument, after an alloca, would be generated with `$noreg`, losing
debug information. Instead, we silently drop the second debug info, so
it doesn't break the first one.

rdar://125375717
2024-03-26 12:09:22 -07:00
Il-Capitano
308ed0233a
[Intrinsics] Make patchpoint.i64 generic on its return type (#85911)
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
2024-03-26 19:08:52 +05:30
Harvin Iriawan
57146daeaa
[CodeGen] Update for scalable MemoryType in MMO (#70452)
Remove getSizeOrUnknown call when MachineMemOperand is created.  For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.

2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.

Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
2024-03-23 12:56:25 +00:00
Craig Topper
c67ed2f1e1
[SelectionDAG][RISCV] Use TypeSize version of ComputeValueVTs in TargetLowering::LowerCallTo. (#86166)
This is needed to support non-intrinsic functions returning tuple types
which are represented as structs with scalable vector types in IR.

I suspect this may have been broken since
https://reviews.llvm.org/D158115
2024-03-21 20:35:08 -07:00
Stephen Tozer
bdc77d1ecc
[RemoveDIs][NFC] Rename DPLabel->DbgLabelRecord (#85918)
This patch renames DPLabel to DbgLabelRecord, in accordance with the
ongoing DbgRecord rename. This rename was fairly trivial, since DPLabel
isn't as widely used as DPValue and has no real conflicts in either its
full or abbreviated name. As usual, the entire replacement was done
automatically, with `s/DPLabel/DbgLabelRecord/` and `s/DPL/DLR/`.
2024-03-20 13:11:28 +00:00
Stephen Tozer
ffd08c7759
[RemoveDIs][NFC] Rename DPValue -> DbgVariableRecord (#85216)
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:

- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.

Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:

```
  DPValue -> DbgVariableRecord
  DPVal -> DbgVarRec
  DPV -> DVR
```

Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
2024-03-19 20:07:07 +00:00
David Green
18da51b2b2 [CodeGen] More uses of LocationSize::beforeOrAfterPointer().
As an extension to #84751, this adds some extra uses of beforeOrAfterPointer()
instead of UnknownSize.
2024-03-18 20:18:49 +00:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Stephen Tozer
15f3f446c5
[RemoveDIs][NFC] Rename common interface functions for DPValues->DbgRecords (#84793)
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.

This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
2024-03-12 14:53:13 +00:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
Yeting Kuo
d95a0d7c0f
[DAG] Teach SelectionDAGBuilder to read parameter alignment of compressstore/expandload. (#83763)
Previously SelectionDAGBuilder used ABI alignment for
compressstore/expandload. This patch allows SelectionDAGBuilder to use
parameter alignment like vp intrinsics. This does not follow the
original code to default use vector type alignment, since it is possible
implemented to unaligned vector alignment.
2024-03-05 20:48:37 +08:00
Noah Goldstein
a4951eca40 Recommit "[X86] Don't always separate conditions in (br (and/or cond0, cond1)) into separate branches" (2nd Try)
Changes in Recommit:
    1) Fix non-determanism by using `SmallMapVector` instead of
       `SmallPtrSet`.
    2) Fix bug in dependency pruning where we discounted the actual
       `and/or` combining the two conditions. This lead to over pruning.

Closes #81689
2024-03-04 13:23:56 -06:00
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00
Sameer Sahasrabuddhe
c7fdd8c11e Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-04 13:28:04 +05:30
NAKAMURA Takumi
5b4759f9fd Revert "[X86] Don't always separate conditions in (br (and/or cond0, cond1)) into separate branches"
This has been buggy for a while.

Reverts #81689
This reverts commit ae76dfb74701e05e5ab4be194e20e49f10768e46.
2024-03-03 22:31:28 +09:00
Noah Goldstein
ae76dfb747 [X86] Don't always separate conditions in (br (and/or cond0, cond1)) into separate branches
It makes sense to split if the cost of computing `cond1` is high
(proportionally to how likely `cond0` is), but it doesn't really make
sense to introduce a second branch if its only a few instructions.

Splitting can also get in the way of potentially folding patterns.

This patch introduces some logic to try to check if the cost of
computing `cond1` is relatively low, and if so don't split the
branches.

Modest improvement on clang bootstrap build:
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=cycles
Average stage2-O3:   0.59% Improvement (cycles)
Average stage2-O0-g: 1.20% Improvement (cycles)

Likewise on llvm-test-suite on SKX saw a net 0.84% improvement  (cycles)

There is also a modest compile time improvement with this patch:
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=instructions%3Au

Note that the stage2 instruction count increases is expected, this
patch trades instructions for decreasing branch-misses (which is
proportionately lower):
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=branch-misses

NB: This will also likely help for APX targets with the new `CCMP` and
`CTEST` instructions.

Closes #81689
2024-03-01 15:35:34 -06:00