532 Commits

Author SHA1 Message Date
Simon Pilgrim
a0c7a29655 [GlobalISel] IRTranslator::translateGetElementPtr - don't assume a gep constant offset is representable as i64
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65052
2023-12-14 11:02:38 +00:00
Jay Foad
35ebd92d3d
[GlobalISel] Add G_PREFETCH (#74863) 2023-12-11 11:06:50 +00:00
Craig Topper
755c28a940
[GISel][Mips] Infer alignment when creating memory operand for G_VASTART. (#74004) 2023-11-30 19:55:23 -08:00
Youngsuk Kim
d8b8aa3a56 [llvm] Replace calls to Type::getPointerTo (NFC)
Cleanup work towards removing the method Type::getPointerTo.

If a call to Type::getPointerTo is used solely to support an unneeded
pointer-cast, remove the call entirely.
2023-11-27 10:49:34 -06:00
HaohaiWen
394bba766d
[CodeGen][DebugInfo] Add missing debug info for jump table BB (#71021)
visitJumpTable is called on FinishBasicBlock. At that time, getCurSDLoc
will always return SDLoc without DebugLoc since CurInst was set to
nullptr after visiting each instruction.
This patch passes SDLoc to buildJumpTable when visiting SwitchInst so
that visitJumpTable can use it later.
2023-11-18 19:17:51 +08:00
Michael Maitland
725e599637
[RISCV][GISEL] Add support for scalable vector types in lowerReturnVal (#71587)
Scalable vector types from LLVM IR are lowered into physical vector
registers in MIR based on calling convention for return instructions.
2023-11-15 17:30:53 -05:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Amara Emerson
6b69584660
[GlobalISel] Fall back for bf16 conversions. (#71470)
We don't support these correctly since we don't yet have FP types.
AMDGPU tests were silently miscompiling bf16 as if they were fp16.
2023-11-06 21:18:57 -08:00
Diana
7f5d59b38d
[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186)
The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call
parameters are bundled up into 2 intrinsic arguments, one for those that
should go in the SGPRs (the 3rd intrinsic argument), and one for those
that should go in the VGPRs (the 4th intrinsic argument). Both will
often be some kind of aggregate.

Both instruction selection frameworks have some internal representation
for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel,
ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those
because aggregates are dissolved very early on during ISel and we'd lose
the inreg information. Therefore, this patch shortcircuits both the
IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call
from the very start. It tries to use the existing infrastructure as much
as possible, by calling into the code for lowering tail calls.

This has already gone through a few rounds of review in Phab:

Differential Revision: https://reviews.llvm.org/D153761
2023-11-06 12:30:07 +01:00
Serge Pavlov
462d5830da [GlobalISel] Add support for *_fpmode intrinsics
The change implements support of the intrinsics `get_fpmode`,
`set_fpmode` and `reset_fpmode` in Global Instruction Selector. Now they
are lowered into library function calls.

Differential Revision: https://reviews.llvm.org/D158260
2023-10-09 21:14:07 +07:00
Mirko Brkusanin
72e3713009 [IRTranslator] Set NUW flag for inbounds gep and load/store offsets
Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D159515
2023-09-22 16:16:28 +02:00
Martin Storsjö
7a91bbbb00
[GlobalISel] Check for unsupported Windows features on invoke (#65864)
This matches what is done on calls, since
cc981d285d1aa33df201605b9a3e22dd2311ead2 (extended for another case in
5a751e747dbf2c267e944aa961e21de7a815e7eb).

Apply both those cases on invoke just like is done for call.

Also update the preexisting comment which was left without update in
5a751e747dbf2c267e944aa961e21de7a815e7eb.

This fixes github issue #61941.
2023-09-15 11:14:40 +03:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
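As a rough illustration of the enum-class entry above, here is a minimal sketch of why scoped enums make call-site problems visible. The namespaces `before` and `after` and the helpers `asIntOld` and `asIntNew` are hypothetical stand-ins, not the LLVM declarations; only the enumerator names mirror the ones mentioned in the message.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the LLVM declarations.
namespace before {
namespace CodeGenOpt {
// Unscoped enum: values convert to int implicitly.
enum Level { None, Less, Default, Aggressive };
} // namespace CodeGenOpt
} // namespace before

namespace after {
// Scoped enum: no implicit conversion to int.
enum class CodeGenOptLevel { None, Less, Default, Aggressive };
} // namespace after

// An int parameter silently accepts the unscoped enumerator.
inline int asIntOld(int Level) { return Level; }

// With the scoped enum the conversion must be written out explicitly.
inline int asIntNew(after::CodeGenOptLevel Level) {
  return static_cast<int>(Level);
}
```

Calling `asIntOld(before::CodeGenOpt::Aggressive)` compiles as is, while passing `after::CodeGenOptLevel::Aggressive` to an `int` parameter would be rejected by the compiler, which is the kind of diagnostic the message is counting on.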
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Felipe de Azevedo Piovezan
88417098bb [CodeGen][DebugInfo] Append OP_deref when converting an EntryValue dbg.declare
When we convert an EntryValue dbg.declare into an entry of the MF side table, we
currently copy its DIExpression as is, and rely on subsequent layers to "know"
that this expression is implicitly indirect. This is bad because it adds an
implicit assumption to the IR representation, and requires subsequent layers to
know about this assumption. This also limits the reusability of this table:
what if, in the future, we want to use this table for dbg.values?

This patch changes existing behavior so that the entities converting
dbg_declares explicitly add an OP_deref when converting EntryValue dbg.declares.

Differential Revision: https://reviews.llvm.org/D158437
2023-08-23 12:25:12 -04:00
David Green
a3f2751f78 [AArch64][GISel] Add handling for G_VECREDUCE_FMAXIMUM and G_VECREDUCE_FMINIMUM
This is a lot of copy-pasting for the existing handling of
G_VECREDUCE_FMAX/G_VECREDUCE_FMIN to add handling for
G_VECREDUCE_FMAXIMUM/G_VECREDUCE_FMINIMUM in the same way.

Differential Revision: https://reviews.llvm.org/D156615
2023-08-14 10:03:25 +01:00
Matt Arsenault
1ca0808db2 GlobalISel: Don't expand stacksave/stackrestore in IRTranslator
In some cases (likely invalid edge cases anyway), it's not correct to
directly copy the stack pointer register.
2023-08-09 18:33:55 -04:00
David Blaikie
4e429fd2a7 Few linter fixes
size() > 0 -> !empty
indentation
mismatched names on parameters in decls/defs
const on value return types
2023-07-31 18:52:57 +00:00
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
2023-06-28 14:50:16 -04:00
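To make the "two pieces as multiple return values" shape of the frexp entry above concrete, here is a small sketch built on the C library's `std::frexp`. The wrapper name `frexpPair` is hypothetical; it only mirrors the idea of returning fraction and exponent together and says nothing about how the intrinsic is lowered.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Returns {fraction, exponent} such that Value == fraction * 2^exponent,
// with |fraction| in [0.5, 1) for finite, non-zero inputs.
inline std::pair<float, int> frexpPair(float Value) {
  int Exp = 0;
  float Fract = std::frexp(Value, &Exp);
  return {Fract, Exp};
}

// Recombines the two pieces; std::ldexp computes Fract * 2^Exp.
inline float recombine(std::pair<float, int> Parts) {
  return std::ldexp(Parts.first, Parts.second);
}
```

For example, `frexpPair(8.0f)` yields the fraction `0.5f` and the exponent `4`, and `recombine` maps that pair back to `8.0f`.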
Amara Emerson
1c2c668846 [GlobalISel] Introduce G_CONSTANT_FOLD_BARRIER and use it to prevent constant folding
hoisted constants.

The constant hoisting pass tries to hoist large constants into predecessors and also
generates remat instructions in terms of the hoisted constants. These aim to prevent
codegen from rematerializing expensive constants multiple times. To re-use
this optimization, we preserve the no-op bitcasts that are used to anchor
constants to the predecessor blocks.

SelectionDAG achieves this by having the OpaqueConstant node, which is just a
normal constant with an opaque flag set. I've opted to avoid introducing a new
constant generic instruction here. Instead, we have a new G_CONSTANT_FOLD_BARRIER
operation that constitutes a folding barrier.

These are somewhat like optimization hints such as G_ASSERT_ZEXT, in that they're
eliminated by the generic instruction selection code.

This change by itself has very minor improvements in -Os CTMark overall. What this
does allow is better optimizations when future combines are added that rely on having
expensive constants remain unfolded.

Differential Revision: https://reviews.llvm.org/D144336
2023-06-09 11:45:06 -07:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Dávid Bolvanský
09515f2c20 [SDAG] Preserve unpredictable metadata, teach X86CmovConversion to respect this metadata
Sometimes a developer would like to have more control over cmov vs branch. We have unpredictable metadata in LLVM IR, but currently it is ignored by the X86 backend. Propagate this metadata and avoid cmov->branch conversion in X86CmovConversion for cmov with this metadata.

Example:

```
int MaxIndex(int n, int *a) {
    int t = 0;
    for (int i = 1; i < n; i++) {
        // cmov is converted to branch by X86CmovConversion
        if (a[i] > a[t]) t = i;
    }
    return t;
}

int MaxIndex2(int n, int *a) {
    int t = 0;
    for (int i = 1; i < n; i++) {
        // cmov is preserved
        if (__builtin_unpredictable(a[i] > a[t])) t = i;
    }
    return t;
}
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D118118
2023-06-01 20:56:44 +02:00
Felipe de Azevedo Piovezan
e8aee45be7 [IRTranslator] Implement translation of entry_value dbg.value intrinsics
For dbg.value intrinsics targeting an llvm::Argument address whose expression
starts with an entry value, we lower this to a DEBUG_VALUE targeting the livein
physical register corresponding to that Argument.

Depends on D151328

Differential Revision: https://reviews.llvm.org/D151329
2023-05-26 06:45:01 -04:00
Felipe de Azevedo Piovezan
b5983a38cb [IRTranslator][NFC] Refactor if/else chain into early returns
This will make it easier to add more cases in a subsequent commit and also
better conforms to the coding guidelines.

Differential Revision: https://reviews.llvm.org/D151328
2023-05-25 06:42:44 -04:00
Krzysztof Drewniak
0bc739a4ae [GlobalISel] Handle ptr size != index size in IRTranslator, CodeGenPrepare
While the original motivation for this patch (address space 7 on
AMDGPU) has been reworked and is not presently planned to reach IR
translation, the incorrect (by the spec) handling of index offset
width in IR translation and CodeGenPrepare is likely to trip someone
- possibly future AMD, since we have a p7:160:256:256:32 now, so we
convert to the other API now.

Reviewed By: aemerson, arsenm

Differential Revision: https://reviews.llvm.org/D143526
2023-05-12 16:21:01 +00:00
Felipe de Azevedo Piovezan
3f6e4e5b6e [IRTranslator][DebugInfo] Implement translation of entry_value vars
This commit implements IRTranslator lowering of dbg.declare intrinsics targeting
swiftasync Arguments, by putting them in the MachineFunction's table of
variables whose location doesn't change throughout the function.

Depends on D149881

Differential Revision: https://reviews.llvm.org/D149882
2023-05-12 11:55:39 -04:00
NAKAMURA Takumi
9cfeba5b12 Restore CodeGen/LowLevelType from Support
This is a rework of:
  - D30046 (LLT)

Since I have introduced `llvm-min-tblgen` as D146352, `llvm-tblgen`
may depend on `CodeGen`.

`LowLevelType.h` originally belonged to `CodeGen`. Almost all users are
still under `CodeGen` or `Target`. I think `CodeGen` is the right place
to put `LowLevelType.h`.

`MachineValueType.h` may be moved as well. (later, D149024)

I have made many modules depend on `CodeGen`. It is consistent but
inefficient. It will be split out later, D148769

Besides, I had to isolate MVT and LLT in modmap, since
`llvm::PredicateInfo` clashes between `TableGen/CodeGenSchedule.h`
and `Transforms/Utils/PredicateInfo.h`.
(I think better to introduce namespace llvm::TableGen)

Depends on D145937, D146352, and D148768.

Differential Revision: https://reviews.llvm.org/D148767
2023-05-03 00:13:19 +09:00
NAKAMURA Takumi
d45fae6010 Move CodeGen/LowLevelType => CodeGen/LowLevelTypeUtils
Before restoring `CodeGen/LowLevelType`, rename this to `LowLevelTypeUtils`.

Differential Revision: https://reviews.llvm.org/D148768
2023-04-25 08:53:17 +09:00
Felipe de Azevedo Piovezan
79a1e32915 [GlobalISel] Improve stack slot tracking in dbg.values
For IR like:

```
%alloca = alloca ...
dbg.value(%alloca, !myvar, OP_deref(<other_ops>))
```

GlobalISel lowers it to MIR:

```
%some_reg = G_FRAME_INDEX <stack_slot>
DBG_VALUE %some_reg, !myvar, OP_deref(<other_ops>)
```

In other words, if the value of `!myvar` can be obtained by
dereferencing an alloca, in MIR we say that the _location_ of a variable
is obtained by dereferencing register %some_reg (plus some
`<other_ops>`).

We can instead remove the use of `%some_reg`: the location of `!myvar`
_is_ `<stack_slot>` (plus some `<other_ops>`). This patch implements
this transformation, which improves debug information handling in O0, as
these registers hardly ever survive register allocation.

A note about testing: similar to what was done in D76934
(f24e2e9eebde4b7a1d), this patch exposed a bug in the Builder class when
using `-debug`, where we tried to print an incomplete instruction. The
changes in `MachineIRBuilder.cpp` address that.

Differential Revision: https://reviews.llvm.org/D147536
2023-04-05 08:21:00 -04:00
Kazu Hirata
b9c4b95b11 [llvm] Use ConstantInt::{isZero,isOne} (NFC) 2023-03-21 17:40:35 -07:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
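The explicit `<uint32_t>` in the has_single_bit entry above matters when the argument is wider than 32 bits: presumably it keeps the truncation that a `uint32_t` parameter used to perform implicitly. The sketch below uses hand-written helpers (`hasSingleBit32` and `hasSingleBit64` are hypothetical names, not the LLVM or standard-library functions) to show how the answer can differ.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written single-bit tests for illustration; not the LLVM helpers.
constexpr bool hasSingleBit32(uint32_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}
constexpr bool hasSingleBit64(uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// A 64-bit value with one bit set above bit 31 and one bit set below it.
constexpr uint64_t Wide = 0x100000004ULL;

// Truncating first drops the high bit, leaving 0x4.
constexpr bool TruncatedAnswer = hasSingleBit32(static_cast<uint32_t>(Wide));
// Testing the full width sees both bits.
constexpr bool FullWidthAnswer = hasSingleBit64(Wide);
```

Here the truncated test reports a single bit while the full-width test does not, so a behavior-preserving (NFC) replacement has to keep the 32-bit truncation.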
Kazu Hirata
55e2cd1609 Use llvm::count{lr}_{zero,one} (NFC) 2023-01-28 12:41:20 -08:00
Matt Arsenault
778cf5431c IR: Add atomicrmw uinc_wrap and udec_wrap
These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do not carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
2023-01-24 17:55:11 -04:00
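The "add/sub 1 with a clamping value" description in the uinc_wrap/udec_wrap entry above can be sketched as plain functions that compute the value stored back to memory. Atomicity, ordering and syncscope are deliberately left out, and the function names `uincWrap` and `udecWrap` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// New value for uinc_wrap: increment, wrapping to 0 once Old reaches Limit.
constexpr uint32_t uincWrap(uint32_t Old, uint32_t Limit) {
  return Old >= Limit ? 0 : Old + 1;
}

// New value for udec_wrap: decrement, wrapping to Limit when Old is 0
// or when Old is already above Limit.
constexpr uint32_t udecWrap(uint32_t Old, uint32_t Limit) {
  return (Old == 0 || Old > Limit) ? Limit : Old - 1;
}
```

With a limit of 7, `uincWrap` steps 6 to 7 and then 7 to 0, while `udecWrap` steps 1 to 0 and then 0 back to 7.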
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Matt Arsenault
e70ae0f46b DAG/GlobalISel: Fix broken/redundant setting of MODereferenceable
This was incorrectly setting dereferenceable on unaligned
operands. getLoadMemOperandFlags did the dereferenceability
check without alignment, and then both paths went on to check
isDereferenceableAndAlignedPointer. Make getLoadMemOperandFlags check
isDereferenceableAndAlignedPointer, and remove the second call.
2023-01-13 20:30:30 -05:00
James Y Knight
1ae36b1387 Remove special cases for invoke of non-throwing inline-asm.
Non-throwing inline asm infers the nounwind attribute in
instcombine. Thus, it can be handled in the same manner as
non-throwing target functions are generally. Further special casing is
unnecessary complexity.
2023-01-06 13:53:10 -05:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Amara Emerson
53445f5b1c [GlobalISel] Add a new G_INVOKE_REGION_START instruction to fix an EH bug.
We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.

Say we have:

throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  B returnbb

bb:
  %v = phi i1 %true, throwbb, %false....

When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).

throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
  B returnbb

bb:
  %v = phi i32 %true, throwbb, %false....

To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.

Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.

Differential Revision: https://reviews.llvm.org/D137905
2022-12-07 10:28:51 -08:00
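The difference between the reverse and forward terminator walks described in the G_INVOKE_REGION_START entry above can be sketched on a toy block. Everything here is hypothetical: `ToyInstr`, both helper functions, and the position of `G_INVOKE_REGION_START` at the top of the block are assumptions made for illustration, not the MachineBasicBlock API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: a name plus a terminator flag (not MachineInstr).
struct ToyInstr {
  std::string Name;
  bool IsTerminator;
};

// Reverse walk: start at the end and step back while the previous
// instruction is a terminator. Any non-terminator stops the walk.
inline std::size_t firstTerminatorReverse(const std::vector<ToyInstr> &Block) {
  std::size_t I = Block.size();
  while (I > 0 && Block[I - 1].IsTerminator)
    --I;
  return I;
}

// Forward walk: return the index of the first terminator from the top.
inline std::size_t firstTerminatorForward(const std::vector<ToyInstr> &Block) {
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (Block[I].IsTerminator)
      return I;
  return Block.size();
}

// Assumed layout of the throwing block, with the region-start marker first.
inline std::vector<ToyInstr> makeThrowBlock() {
  return {{"G_INVOKE_REGION_START", true}, {"EH_LABEL", false},
          {"COPY", false},                 {"BL", false},
          {"EH_LABEL", false},             {"B", true}};
}
```

On this toy block the reverse walk stops at the final branch (index 5), which is after the call, while the forward walk finds the region-start marker (index 0), which is before it.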
Krzysztof Parzyszek
ab672e9173 FPEnv: convert Optional to std::optional 2022-12-03 13:55:56 -06:00
Nicolai Hähnle
43b86bf992 AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.

Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.

There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.

Differential Revision: https://reviews.llvm.org/D138711
2022-11-29 22:15:11 +01:00
Janek van Oirschot
322966f8f8 [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp
class support and introduce GlobalISel implementation for AMDGPU

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
2022-11-28 16:00:36 -05:00
Matt Arsenault
162d9030ab GlobalISel: Pass through AA metadata for target memory intrinsics
The corresponding change for the DAG was done in fa4aac7335ac7ecabbb634d134bd4897783bf62b
2022-11-06 22:14:12 -08:00
Peter Rong
c2e7c9cb33 [CodeGen] Using ZExt for extractelement indices.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.

In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change covers the documentation, SelectionDAG and IRTranslator.
We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86

This patch fixes issue #57452.

Differential Revision: https://reviews.llvm.org/D132978
2022-10-15 15:45:35 -07:00
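The `i1 true` to `i32 -1` behavior mentioned in the extractelement entry above follows directly from sign extension of a one-bit value. A minimal sketch, with hypothetical helper names `signExtend` and `zeroExtend` that operate on the low `Bits` bits of a 32-bit word:

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the low Bits bits (Bits is assumed to be in [1, 32]).
constexpr uint32_t lowMask(unsigned Bits) {
  return Bits >= 32 ? 0xFFFFFFFFu : (1u << Bits) - 1;
}

// Zero extension: keep the low bits, fill the rest with zeros.
constexpr uint32_t zeroExtend(uint32_t Value, unsigned Bits) {
  return Value & lowMask(Bits);
}

// Sign extension: replicate the top bit of the Bits-wide value upwards.
constexpr int32_t signExtend(uint32_t Value, unsigned Bits) {
  const uint32_t SignBit = 1u << (Bits - 1);
  const uint32_t Low = Value & lowMask(Bits);
  return static_cast<int32_t>((Low ^ SignBit) - SignBit);
}
```

For a one-bit index with value 1, `signExtend(1, 1)` gives -1 while `zeroExtend(1, 1)` gives 1, which is why an index of `i1 true` only selects element 1 under zero extension.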
Matt Arsenault
34fb7803f8 GlobalISel: Pass through AssumptionCache 2022-09-19 19:10:51 -04:00
Matt Arsenault
0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Matt Arsenault
bb70b5d406 CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer
Previously this was assuming pointsToConstantMemory implies
dereferenceable.
2022-09-12 08:38:35 -04:00
Marco Elver
31a548021b [GlobalISel] Propagate PCSections metadata to MachineInstr
Propagate (most) PC sections metadata to MachineInstr when GlobalISel is
doing instruction selection.

This change results in support for architectures using GlobalISel (such
as -O0 with AArch64). Not all instructions may be supported yet, and
requires further target-specific handling (such as done for AArch64
pseudo-atomics). Expanding supported instructions is planned on a
case-by-case basis and new use cases for PC sections metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130886
2022-09-07 11:36:02 +02:00
Markus Böck
2fdf963daf [GlobalISel] Explicitly fail trying to translate gc.statepoint and related intrinsics
The provided testcase would previously fail with an assertion because later code tried to allocate registers for `token` return types and arguments. This is especially problematic as the process would then exit instead of falling back to using FastISel.

This patch fixes that by explicitly failing translation if any of these intrinsics is encountered.

Fixes https://github.com/llvm/llvm-project/issues/57349

Differential Revision: https://reviews.llvm.org/D132974
2022-08-31 00:47:17 +02:00
Eli Friedman
cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
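The "two separate bits" idea in the hasAddressTaken entry above can be pictured with a toy structure. The type `ToyBlock` and its field names are hypothetical and chosen only to mirror the two senses described in the message; they are not the MachineBasicBlock interface.

```cpp
#include <cassert>

// Toy block that tracks the two senses of "address-taken" separately.
struct ToyBlock {
  // An IR-level BlockAddress must map to this block.
  bool IRBlockAddressTaken = false;
  // A target-specific pass uses this block's address for control flow.
  bool MachineBlockAddressTaken = false;

  // Combined query for callers that do not care which sense applies.
  bool hasAddressTaken() const {
    return IRBlockAddressTaken || MachineBlockAddressTaken;
  }
};
```

With separate flags, a block marked only by a target pass still answers true to the combined query, but it can no longer be mistaken for the block that a BlockAddress refers to.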