35122 Commits

Author SHA1 Message Date
Matt Arsenault
460ffcddd9
AMDGPU: Make bf16/v2bf16 legal types (#76215)
There are some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.

Depends #76213 #76214
2024-01-04 22:31:18 +07:00
Simon Pilgrim
f45b75949d [DAG] SimplifyDemandedBits - call demanded elts variant directly for SELECT/SELECT_CC nodes.
Don't rebuild the demanded elts mask every time.
2024-01-04 10:53:45 +00:00
Simon Pilgrim
5b38ecff6e [DAG] BaseIndexOffset::equalBaseIndex - early out on failed matches. NFCI.
If we successfully cast only the first base node as GlobalAddressSDNode / ConstantPoolSDNode / FrameIndexSDNode then we can early out as we know that base won't cast as a later type.

Noticed while investigating profiles for potential compile time improvements.
2024-01-04 10:50:43 +00:00
Simon Pilgrim
43e0723899 [DAG] BaseIndexOffset::computeAliasing - early out on failed matches. NFCI.
Don't wait to test that all base ptr matches have succeeded
2024-01-04 10:50:42 +00:00
Simon Pilgrim
72db578d71 [DAG] Fix typo in VSELECT SimplifyDemandedVectorElts handling. NFC.
Rename UndefZero -> UndefSel (undefined elements from Sel operand).
2024-01-04 10:50:42 +00:00
Jie Fu
ff0c1f20a7 [CodeGen] Remove unused variables in TargetLoweringBase.cpp (NFC)
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:12: error: unused variable 'ModeN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |            ^~~~~
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:19: error: unused variable 'ModelN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |                   ^~~~~~
2 errors generated.
2024-01-04 18:45:55 +08:00
Thomas Preud'homme
ce61b0e9a4
Add out-of-line-atomics support to GlobalISel (#74588)
This patch implement the GlobalISel counterpart to
4d7df43ffdb460dddb2877a886f75f45c3fee188.
2024-01-04 10:15:16 +00:00
Jannik Silvanus
7954c57124
[IR] Fix GEP offset computations for vector GEPs (#75448)
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.

This PR fixes many places that rely on an incorrect pattern
that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of  `GTI.getSequentialElementStride(DL)`, 
which is a new helper function added in this PR.

This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size is different. This includes two cases:

* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
  are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.

Existing tests are unaffected, but a miscompilation of a new test is fixed.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-01-04 10:08:21 +01:00
David Green
5550e9c841
[GlobalISel][AArch64] Add libcall lowering for fpowi. (#67114)
This adds legalization, notably libcall lowering for fpowi. It is a
little different to other methods as the function takes both a float and
integer register. Otherwise all vectors get scalarized and fp16 is
promoted to fp32.
2024-01-04 07:26:23 +00:00
Micah Weston
7df28fd61a
[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Implements PGOAnalysisMap emitting in AsmPrinter with tests. (#75202)
Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF
section. Implements filecheck tests to verify emitting new fields.

This patch emits optional PGO related analyses into the bb-addr-map ELF
section during AsmPrinter. This currently supports Function Entry Count,
Machine Block Frequencies. and Machine Branch Probabilities. Each is
independently enabled via the `feature` byte of `bb-addr-map` for the given
function.

A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).
2024-01-03 19:17:44 -05:00
Craig Topper
bdcd7c0ba0
[DAGCombiner][RISCV] Preserve disjoint flag in folding (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) (#76860)
Since we are shifting both inputs to the original Or by the same amount
and inserting zeros in the LSBs, the result should still be disjoint.
2024-01-03 13:14:13 -08:00
Craig Topper
47a1704ac9
[SelectionDAG][X86] Use disjoint flag in SelectionDAG::isADDLike. (#76847)
Keep the haveNoCommonBitsSet check because we haven't started inferring
the flag yet.

I've added tests for two transforms, but these are not the only
transforms that use isADDLike.
2024-01-03 11:54:29 -08:00
Arthur Eubanks
c4146121e9 Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae.

Causes crashes, see repro on 0e46b49de4.
2024-01-03 17:09:46 +00:00
David Green
771fd1ad2a
[DAG] Extend input types if needed in combineShiftToAVG. (#76791)
This atempts to fix #76734 which is a crash in invalid TRUNC nodes types
from unoptimized input code in combineShiftToAVG. The NVT can be VT if
the larger type was legal and the adds will not overflow, in which case
the inputs should be extended.

From what I can tell this appears to be valid (if not optimal for this
case): https://alive2.llvm.org/ce/z/fRieHR

The result has also been changed to getExtOrTrunc in case that VT==NVT,
which is not handled by SEXT/ZEXT.
2024-01-03 10:52:01 +00:00
David Green
d659bd1635
[GlobalISel][AArch64] Tail call libcalls. (#74929)
This tries to allow libcalls to be tail called, using a similar method
to DAG where the type is checked to make sure they match, and if so the
backend, through lowerCall checks that the tailcall is valid for all
arguments.
2024-01-03 07:59:36 +00:00
David Green
5b5614c92f
[AArch64][GlobalISel] Add legalization for vecreduce.fmul (#73309)
There are no native operations that we can use for floating point mul,
so lower by splitting the vector into chunks multiple times. There is
still a missing fold for fmul_indexed, that could help the gisel test
cases a bit.
2024-01-03 07:49:20 +00:00
Craig Topper
bbd57e1832
[SelectionDAG] Add initial plumbing for the disjoint flag. (#76751)
This copies the flag from IR to the SDNode in SelectionDAGBuilder, clears
the flag in SimplifyDemandedBits, and adds it to canCreateUndefOrPoison.

Uses of the flag will come in later patches.
2024-01-02 21:58:00 -08:00
Thorsten Schütt
4b9194952d
[GlobalIsel] Combine selects with constants (#76089)
A first small step at combining selects.
2024-01-02 17:26:39 +01:00
Alex Bradbury
a181b42565 [llvm][NFC] Use SDValue::getConstantOperandAPInt(i) where possible
The helper function allows examples like
`cast<ConstantSDNode>(Op.getOperand(0))->getAPIntValue();` to be changed
to `Op.getConstantOperandAPInt(0);`.

See #76708 for further context. Although there are far fewer
opportunities for replacement, I used a similar git grep and sed combo
as before, given I already had it to hand:

`git grep -l "cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getAPIntValue\(\)" | xargs sed -E -i 's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getAPIntValue\(\)/\1->getConstantOperandAPInt(\2)/'`
and
`git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getAPIntValue\(\)" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getAPIntValue\(\)/\1.getConstantOperandAPInt(\2)/'`
2024-01-02 14:43:55 +00:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00
Shao-Ce SUN
9f6bf00b25
[DAGCombine] Add DAG optimisation for BF16_TO_FP (#69426)
fold bf16_to_fp(op & 0xffff) -> bf16_to_fp(op)
2023-12-27 17:20:54 +08:00
Vettel
dc1fadef23
[MCP] Enhance MCP copy Instruction removal for special case(reapply) (#74239)
Machine Copy Propagation Pass may lose some opportunities to further
remove the redundant copy instructions during the ForwardCopyPropagateBlock
procedure. When we Clobber a "Def" register, we also need to remove the record 
from the copy maps that indicates "Src" defined "Def" to ensure the correct semantics
of the ClobberRegister function.  This patch reapplies #70778 and addresses the corner 
case bug  #73512 specific to the AMDGPU backend. Additionally, it refines the criteria 
for removing empty records from the copy maps, thereby enhancing overall safety.

For more information, please see the C++ test case generated code in 
"vector.body" after the MCP Pass: https://gcc.godbolt.org/z/nK4oMaWv5.
2023-12-26 16:22:42 +08:00
Kazu Hirata
41cb686d0f [CodeGen] Use range-based for loops (NFC) 2023-12-24 22:45:50 -08:00
HaohaiWen
536b043219
[RegAllocFast] Lazily initialize InstrPosIndexes for each MBB (#76275)
Most basic block do not need to query dominates. Defer initialization of
InstrPosIndexes to first query for each MBB.
2023-12-25 09:42:31 +08:00
Simon Pilgrim
1e710cfc80 [DAG] Add TLI::isTruncateFree(SDValue, EVT) wrapper.
Similar to the existing isZExtFree(SDValue, EVT) wrapper, this will allow targets to override for specific cases (e.g. free truncation of an ext/extload node). But for now its just used to wrap the existing isTruncateFree(EVT, EVT) call.
2023-12-24 13:19:10 +00:00
Felipe de Azevedo Piovezan
acacec3bbf
[LiveDebugValues][nfc] Reduce memory usage of InstrRef (#76051)
Commit 1b531d54f623 (#74203) removed the usage of unique_ptrs of arrays
in favour of using vectors, but inadvertently increased peak memory
usage by removing the ability to deallocate vector memory that was no
longer needed mid-LDV.

In that same review, it was pointed out that `FuncValueTable` typedef
could be removed, since it was "just a vector".

This commit addresses both issues by making `FuncValueTable` a real data
structure, capable of mapping BBs to ValueTables and able to free
ValueTables as needed.

This reduces peak memory usage in the compiler by 10% in the benchmarks
flagged by the original review.

As a consequence, we had to remove a handful of instances of the
"declare-then-initialize" antipattern in unittests, as the
FuncValueTable class is no longer default-constructible.
2023-12-23 13:44:45 -03:00
Matt Arsenault
ed6dc62862
DAG: Handle equal size element build_vector promotion (#76213) 2023-12-23 20:43:14 +07:00
Nikita Popov
d82eccc752 [RegAllocFast] Avoid duplicate hash lookup (NFC) 2023-12-22 16:52:20 +01:00
HaohaiWen
40ec791b15
[RegAllocFast] Refactor dominates algorithm for large basic block (#72250)
The original brute force dominates algorithm is O(n) complexity so it is
very slow for very large machine basic block which is very common with
O0. This patch added InstrPosIndexes to assign index for each
instruction and use it to determine dominance. The complexity is now
O(1).
2023-12-22 23:06:16 +08:00
Matt Arsenault
f7c3627338
DAG: Implement promotion for strict_fpextend (#74310)
Test is a placeholder, will be merged into the existing test after
additional bug fixes for illegal f16 targets are fixed.
2023-12-22 17:15:52 +07:00
Matt Arsenault
0e46b49de4 Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86.

PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667
2023-12-22 16:46:22 +07:00
Wang Pengcheng
17858ce6f3
[MacroFusion] Remove createBranchMacroFusionDAGMutation (#76209)
Instead, we add a `BranchOnly` parameter to indicate that only
branches with its predecessors will be fused.

X86 is the only user of `createBranchMacroFusionDAGMutation`.
2023-12-22 16:31:38 +08:00
Matt Arsenault
4d1cd38c95 DAG: Handle promotion of fcanonicalize
This avoids a regression in a future commit
2023-12-22 12:50:18 +07:00
Felipe de Azevedo Piovezan
058e527434
[AccelTable][NFC] Fix typos and duplicated code (#76155)
Renaming a member variable from "Endoding" to "Encoding".

Also replace inlined code for "isNormalized" with a call to the
function, so that if the definition of normalization ever changes, we
only need to change the one place.
2023-12-21 16:10:30 -03:00
Craig Topper
0dcff0db3a
[RISCV] Add codegen support for experimental.vp.splice (#74688)
IR intrinsics were already defined, but no codegen support had been
added.

I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
2023-12-21 08:38:32 -08:00
yan zhou
cd09f4b951
[CodeGen] This patch fix a bug that may caused error for a self-defined target in SelectionDAG::getNode (#75320)
we need first judge N1.getNumOperands() > 0.

If Lowering Generated SDNode like.

```
v2i32 t20:  TargetOpNode.
i32 t21: extract_vector_elt t20  0
i32 t22: extract_vector_elt t20 1
```

will cause a error.
2023-12-21 19:39:05 +07:00
Paschalis Mpeis
2e3d77d6ed
[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642)
[TLI] Pass replace-with-veclib works with Scalable Vectors.

The pass is heavily refactored.
It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors.

 Improve tests for ArmPL and SLEEF Intrinsics:
- Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines.
- Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]`
-  Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.
2023-12-21 12:37:57 +00:00
Matt Arsenault
9e574a3936 DAG: Fix expansion of bf16 sourced extloads
Also fix assorted vector extload failures for AMDGPU.
2023-12-20 19:24:27 +07:00
Yusra Syeda
0768253c20
[SystemZ][z/OS] Add exception handling for XPLINK (#74638)
Adds emitting the exception table and the EH registers for XPLINK.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-12-19 13:58:33 -05:00
Jonas Paulsson
e32e147d6c
[DAGCombiner] Don't drop alignment info of original load. (#75626)
Pass the original MMO instead of different individual values.

getAlign() was used before where actually getOriginalAlign() would have been
better, and this patch has the same effect.
2023-12-19 16:30:47 +01:00
Rin
0894c2ee5f
[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792)
Avoid the pre-truncate of BUILD_VECTOR sources when there is more than
one use. This can avoid using unnecessary movs later down the
instruction selection pipeline.
2023-12-19 15:25:38 +00:00
Matt Arsenault
5781d79a20 ShadowStackGCLowering: Remove unnecessary std::string 2023-12-19 17:12:52 +07:00
Wang Pengcheng
9348d437f5
[SelectionDAG] Add space-optimized forms of OPC_EmitRegister (#73291)
The followed byte of `OPC_EmitRegister` is a MVT type, which is
usually i32 or i64.

We add `OPC_EmitRegisterI32` and `OPC_EmitRegisterI64` so that we
can reduce one byte.

Overall this reduces the llc binary size with all in-tree targets by
about 10K.
2023-12-19 17:31:49 +08:00
Matt Arsenault
e8d98fa16b ShadowGCLowering: Drop typed pointer handling 2023-12-19 14:03:54 +07:00
paperchalice
72c75501ec
[CodeGen] Port LowerEmuTLS to new pass manager (#75171)
In fact, this pass need `llc` to test. `TargetMachine` seems redundant,
because before adding this pass `CodeGenPassBuilder` already checks it:

ed4194bb8d/llvm/include/llvm/CodeGen/CodeGenPassBuilder.h (L590-L592)
2023-12-19 14:44:35 +08:00
Felipe de Azevedo Piovezan
da2db4a9e8
[InstrRef][NFC] Delete unused variables (#75501)
`V` was unused, and all the other deletions follow from that
observation.
2023-12-18 11:53:18 -08:00
Simon Pilgrim
7b1e4239b3
[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229)
We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads).

reduceLoadWidth can handle this for scalar loads, but not for vectors.

Noticed while triaging D152928
2023-12-18 16:21:11 +00:00
Ulrich Weigand
82a1bffd34
[SelectionDAG] Do not crash on large integers in CheckInteger (#75787)
The CheckInteger routine called from TableGen-generated selection logic
uses getSExtValue - which will abort if the underlying APInt does not
fit into an int64_t.

This case is now triggered by the SystemZ back-end since i128 is a legal
type on certain machines. While we do not have any regular instructions
that take 128-bit immediates (like most other platforms), there are
patterns in the .td files that recognize an i128 "xor ..., -1" as a
"not".

These patterns cause code to be generated that calls the CheckInteger
routine on some i128-valued integer, which may trigger the assert.

Fix by using trySExtValue instead.

Fixes https://github.com/llvm/llvm-project/issues/75710
2023-12-18 14:03:57 +01:00
Kazu Hirata
2570c7e284 [CodeGen] Remove unused forward declarations (NFC) 2023-12-17 09:09:39 -08:00
Kazu Hirata
4b3078ef2d [CodeGen] Remove unnecessary includes (NFC) 2023-12-17 09:09:38 -08:00