36058 Commits

Author SHA1 Message Date
Dmitry Borisenkov
a38d5e0632
[SelectionDAG] Use LAST_INTEGER_VALUETYPE instead of i64 (#98299)
When looking for the largest legal integer type for a target,
`TargetLowering::findOptimalMemOpLowering` assumes that `MVT::i64` is
the largest possible integer type. The patch removes this assumption and
uses `MVT::LAST_INTEGER_VALUETYPE` instead.
2024-07-10 21:38:50 +04:00
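As a rough illustration of the commit above, the search for the widest legal integer type can be sketched as a downward walk that starts at the last integer type rather than at a hard-coded 64-bit type. The enum `ToyIntVT`, the function `largestLegalInt`, and the `isLegal` predicate are hypothetical stand-ins, not LLVM's API.

```cpp
#include <cassert>
#include <functional>

// Toy stand-in for a few integer value types, ordered by width
// (hypothetical, not LLVM's MVT enum).
enum ToyIntVT { I8, I16, I32, I64, I128, FirstInt = I8, LastInt = I128 };

// Walk down from the widest integer type until a legal one is found.
// Starting at LastInt instead of a hard-coded I64 means a target with
// wider legal integers is not capped at 64 bits.
// If nothing is legal, this sketch simply returns the narrowest type.
inline ToyIntVT largestLegalInt(const std::function<bool(ToyIntVT)> &isLegal) {
  assert(static_cast<bool>(isLegal) && "predicate must be callable");
  int vt = LastInt;
  while (vt > FirstInt && !isLegal(static_cast<ToyIntVT>(vt)))
    --vt;
  return static_cast<ToyIntVT>(vt);
}
```

With a predicate that accepts everything up to `I32` the sketch returns `I32`; with one that accepts every type it returns `I128`, which a search starting at `I64` would miss.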
AtariDreams
4f8b2fff6d
[DAG] Use break instead of continue to leave do while (false) loop (NFC) (#97966) 2024-07-10 20:51:06 +04:00
paperchalice
abde52aa66
[CodeGen][NewPM] Port LiveIntervals to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.

This would be the last analysis required by `PHIElimination`.
2024-07-10 19:34:48 +08:00
Daniel Kiss
1782810b84 [Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, the branch protection, sign return address, and guarded control stack
attributes are only emitted as module flags to indicate that the functions need
to be generated with those features.
The problem is that in an LTO build the module flags are merged with the `min`
rule, which means that if one of the modules is not built with sign return
address, the features will be turned off for all functions, because the
functions take the branch-protection and sign-return-address features from the
module flags. sign-return-address is a function-level option, so functions from
files that are compiled with -mbranch-protection=pac-ret are expected to be
protected.
The inliner might inline functions with different sets of flags, as it doesn't
consider the module flags.

This patch adds the attributes to all functions and drops the checking of the
module flags for code generation.
The module flag is still used for generating the ELF markers.
It also drops the "true"/"false" values from the branch-protection-enforcement,
branch-protection-pauth-lr, and guarded-control-stack attributes, as presence of
the attribute means it is on, absence means off, and there is no other option.

Relanded with test fixes.
2024-07-10 11:32:41 +02:00
paperchalice
145a692947
[CodeGen] Format PHIElimination.cpp NFC (#98289)
clang-format will format the entire class when
`class PHIElimination : public MachineFunctionPass {` is changed. Format it
first to reduce unnecessary changes when porting it to the new pass manager.
2024-07-10 17:13:02 +08:00
Daniel Kiss
4b2daeccc7
Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284)
Reverts llvm/llvm-project#82819
2024-07-10 10:22:38 +02:00
Daniel Kiss
e15d67cfc2
[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, the branch protection, sign return address, and guarded control stack
attributes are only emitted as module flags to indicate that the functions need
to be generated with those features.
The problem is that in an LTO build the module flags are merged with the `min`
rule, which means that if one of the modules is not built with sign return
address, the features will be turned off for all functions, because the
functions take the branch-protection and sign-return-address features from the
module flags. sign-return-address is a function-level option, so functions from
files that are compiled with -mbranch-protection=pac-ret are expected to be
protected.
The inliner might inline functions with different sets of flags, as it doesn't
consider the module flags.

This patch adds the attributes to all functions and drops the checking of the
module flags for code generation.
The module flag is still used for generating the ELF markers.
It also drops the "true"/"false" values from the branch-protection-enforcement,
branch-protection-pauth-lr, and guarded-control-stack attributes, as presence of
the attribute means it is on, absence means off, and there is no other option.
2024-07-10 10:06:14 +02:00
Kazu Hirata
ef9aba2a2f
[CodeGen] Use range-based for loops (NFC) (#98104) 2024-07-10 16:10:48 +09:00
Alex Bradbury
4d052a7618
[Intrinsics][PreISelIntrinsicLowering] llvm.memset.inline length no longer needs to be constant (#95397)
As requested in
https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496
this patch removes the requirement that the length of llvm.memset.inline
is a constant, and adjusts PreISelIntrinsicLowering so that it supports
expanding the intrinsic when it has a non-constant length.
2024-07-10 07:58:52 +01:00
AdityaK
3e4adef946
[NFC] Add reference to the clustering algortihm for switch statements (#98239)
Menezes, Evandro, Sebastian Pop, and Aditya Kumar. "Clustering case
statements for indirect branch predictors." arXiv preprint
arXiv:1910.02351 (2019).

https://arxiv.org/pdf/1910.02351v2
2024-07-09 21:26:32 -07:00
David Tellenbach
8f159096e0
[AsmPrinter] Don't check for inlineasm dialect on non-X86 platforms (#98097)
AArch64 uses MCAsmInfo::AssemblerDialect to control the style of emitted
Neon assembly. E.g. Apple platforms use AsmWriterVariantTy::Apple by
default which collides with InlineAsm::AD_Intel (both value 1). Checking
for inlineasm dialects on non-X86 platforms can thus lead to problems.
2024-07-09 12:44:52 -07:00
Min-Yih Hsu
7e2f96194f
[MachineSink] Fix missing sinks along critical edges (#97618)
4e0bd3f improved early MachineLICM's capabilities to hoist COPY from
physical registers out of a loop. However, it accidentally broke one of
MachineSink's preconditions on sinking cheap instructions (in this case,
COPY), which considered those instructions profitable to sink only
when there are at least two of them in the same def-use chain in the
same basic block. So if early MachineLICM hoisted one of them out,
MachineSink no longer sinks the rest of the cheap instructions. This results
in redundant load immediate instructions in the motivating example
we've seen on RISC-V.

This patch fixes this by teaching MachineSink that if there is more than
one demand to sink a register into the same block from different
critical edges, it should be considered profitable as it increases the
CSE opportunities.
This change also improves two of AArch64's cases.
2024-07-09 10:48:22 -07:00
Luke Lau
baf22a527c [SelectionDAG] Handle vscale range wrapping in isKnownNeverZero
As pointed out by @preames, ConstantRange can wrap so it's possible
for zero to be in a range without zero being the minimum. This fixes
this by checking contains instead.
2024-07-09 23:05:22 +08:00
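The wrapping problem described in the commit above can be reproduced with a toy range type. This is only a sketch: `ToyRange` is a hypothetical 8-bit half-open range that is loosely inspired by the idea of a wrapping constant range, and it does not mirror the real `ConstantRange` interface.

```cpp
#include <cassert>
#include <cstdint>

// Toy half-open range [lo, hi) over 8-bit unsigned values that may wrap
// around (hypothetical type). In this sketch lo == hi means "empty".
struct ToyRange {
  uint8_t lo, hi;
  bool isWrapped() const { return lo > hi; }
  bool contains(uint8_t v) const {
    if (!isWrapped())
      return lo <= v && v < hi;
    return v >= lo || v < hi; // wrapped: [lo, 255] united with [0, hi)
  }
};

// Unsound test: only looks at the lower bound of the range.
inline bool neverZeroByMin(const ToyRange &r) { return r.lo != 0; }

// Sound test: asks whether zero is a member of the range.
inline bool neverZeroByContains(const ToyRange &r) { return !r.contains(0); }
```

For the wrapped range `{250, 3}` the lower bound is non-zero, so the first test claims the value is never zero, while the membership test correctly reports that zero is included.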
paperchalice
4010f894a1
[CodeGen][NewPM] Port SlotIndexes to new pass manager (#97941)
- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.
2024-07-09 12:09:11 +08:00
paperchalice
ac0b2814c3
[CodeGen][NewPM] Port LiveVariables to new pass manager (#97880)
- Port `LiveVariables` to new pass manager.
- Convert to `LiveVariablesWrapperPass` in legacy pass manager.
2024-07-09 10:50:43 +08:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
Kazu Hirata
d1f0ba6155
[AsmPrinter] Use range-based for loops (NFC) (#97977) 2024-07-09 05:55:29 +09:00
Manish Kausik H
69192e0193
[LegalizeDAG] Optimize CodeGen for ISD::CTLZ_ZERO_UNDEF (#83039)
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.

The details of the optimization are outlined in #82075

Fixes #82075

Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
2024-07-08 14:01:32 +01:00
Momchil Velikov
a497e987e5 Reapply "[AArch64] Lower extending sitofp using tbl (#92528)"
This re-commits d1a4f0c9fb559eb4c2fb56112e56343bcd333edc after
an issue was fixed in f92bfca9fc217cad9026598ef6755e711c0be070
("[AArch64] All bits of an exact right shift are demanded (#97448)").
2024-07-08 11:55:29 +01:00
esmeyi
c119da23af [PowerPC] Function descriptor symbol may be omitted for external symbol. #97526
If a function's address is taken, which means it may be called via a function pointer,
we need the function descriptor for it.
Otherwise, the function descriptor can be omitted for external symbols.
2024-07-08 03:47:33 -04:00
Fangrui Song
2718654c54
[MC] Support .cfi_label
GNU assembler 2.26 introduced the .cfi_label directive. It does not
expand to any CFI instructions, but defines a label in
.eh_frame/.debug_frame, which can be used by runtime patching code to
locate the FDE. .cfi_label is not allowed for CIE's initial
instructions, and can therefore be used to force the next instruction to
be placed in a FDE instead of a CIE.

In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force
DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have
copied this use.
```
.cfi_startproc
// DW_CFA_undefined is allowed for CIE's initial instructions.
// Without .cfi_label, gas would place DW_CFA_undefined in a CIE.
.cfi_label .Ldummy
.cfi_undefined ra
.cfi_endproc
```

No CFI instruction is associated with .cfi_label, so the
`case MCCFIInstruction::OpLabel:` code in BOLT is unreachable and is only
there to make -Wswitch happy.

Close #97222

Pull Request: https://github.com/llvm/llvm-project/pull/97922
2024-07-07 12:41:13 -07:00
Kazu Hirata
75bc20ff89
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914) 2024-07-07 08:23:41 +09:00
Youngsuk Kim
34855405b0 [llvm] Avoid 'raw_string_ostream::str' (NFC)
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.

Work towards TODO item to remove `raw_string_ostream::str()`.
2024-07-05 17:22:03 -05:00
Bjorn Pettersson
c2fbc701aa [SelectionDAG] Let ComputeKnownSignBits handle (shl (ext X), C) (#97695)
Add simple support for looking through ZEXT/ANYEXT/SEXT when doing
ComputeKnownSignBits for SHL. This is valid for the case when all
extended bits are shifted out, because then the number of sign bits
can be found by analysing the EXT operand.

A future improvement could be to pass along the "shifted left by"
information in the recursive calls to ComputeKnownSignBits, allowing
us to handle this more generically.
2024-07-05 22:37:26 +02:00
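The reasoning in the commit above can be checked on a small fixed-width model. The sketch below assumes an 8-bit operand extended to 16 bits; `numSignBits16` and `signBitsOfShlExt` are hypothetical helpers, and the formula is one plausible reading of the commit message rather than a copy of the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit of a 16-bit value (always >= 1).
inline unsigned numSignBits16(uint16_t v) {
  unsigned n = 1;
  while (n < 16 && (((v >> (15 - n)) & 1u) == ((v >> 15) & 1u)))
    ++n;
  return n;
}

// Lower bound on the sign bits of (shl (ext X), c), where X is 8 bits wide
// with srcSignBits known sign bits and the result is 16 bits wide.
// Only the case where every extended bit is shifted out (c >= 8) is handled;
// the kind of extension then no longer matters.
inline unsigned signBitsOfShlExt(unsigned srcSignBits, unsigned c) {
  const unsigned extBits = 16 - 8;
  assert(c < 16 && srcSignBits >= 1 && srcSignBits <= 8);
  if (c < extBits)
    return 1; // some extended bits survive; this sketch gives up
  unsigned total = srcSignBits + extBits; // as if X had been sign-extended
  return c < total ? total - c : 1;
}
```

For example, X = 0x01 has 7 sign bits; shifting its extension left by 10 gives 0x0400, which has 5 sign bits, matching 7 + 8 - 10.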
Luke Lau
e4b28420f6
[SelectionDAG] Handle VSCALE in isKnownNeverZero (#97789)
VSCALE is by definition greater than zero, but this checks it via
getVScaleRange anyway.

The motivation for this is to be able to check if the EVL for a VP
strided load is non-zero in #97394.

I added the tests to the RISC-V backend since the existing X86
known-never-zero.ll test crashed when trying to lower vscale for the
+sse2 RUN line.
2024-07-05 16:11:06 +08:00
Shengchen Kan
a48305e0f9 [X86][CodeGen] Convert masked.load/store to CLOAD/CSTORE node only when vector size = 1
This fixes the crash when building llvm-test-suite with avx512f + cf.
2024-07-05 15:50:21 +08:00
Shengchen Kan
c60b9307d0 Revert "[X86][CodeGen] Convert masked.load/store to CLOAD/CSTORE node only when vector size = 1"
This reverts commit 74984dee51307779a3eab10a8cd6102be37e1081.

It caused AArch64 test sve-nontemporal-masked-ldst.ll to fail.
2024-07-05 15:14:30 +08:00
Shengchen Kan
74984dee51 [X86][CodeGen] Convert masked.load/store to CLOAD/CSTORE node only when vector size = 1
This fixes the crash when building llvm-test-suite with avx512f + cf.
2024-07-05 14:35:42 +08:00
Craig Topper
33112cbf59 [DAGCombiner] Remove unnecessary assert from getShiftAmountTy wrapper. NFC
The same assert appears in the TargetLowering function.

Refine comment to describe as a convenience wrapper and leave it to
TargetLowering documentation to explain.
2024-07-04 19:05:54 -07:00
Craig Topper
8419da8bd4
[SelectionDAG] Remove LegalTypes argument from getShiftAmountConstant. (#97653)
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patch
removes it from getShiftAmountConstant, which is one of the callers of
getShiftAmountTy.
2024-07-04 18:33:25 -07:00
Craig Topper
3141c11fe8
[SelectionDAG] Remove LegalTypes argument from getShiftAmountTy. NFC (#97757)
This argument is no longer used inside the function. Remove it from the
interface.
2024-07-04 15:24:54 -07:00
Simon Pilgrim
687531fbed [DAG] PromoteIntRes_EXTRACT_SUBVECTOR - pull out repeated getOperand/getVectorElementType calls. NFC. 2024-07-04 17:12:43 +01:00
Craig Topper
34fe032fdb
[DAGCombiner] Use getShiftAmountConstant where possible. (#97683)
In #97645, I proposed removing the LegalTypes operand to
TargetLowering::getShiftAmountTy. This means we don't need to use the
DAGCombiner wrapper for getShiftAmountTy that manages this flag. Now we
can use getShiftAmountConstant and let it call
TargetLowering::getShiftAmountTy.
2024-07-04 08:44:50 -07:00
Craig Topper
f4d058fdb1
[SelectionDAG] Ignore LegalTypes parameter in TargetLoweringBase::getShiftAmountTy. (#97645)
When this flag was false, `getShiftAmountTy` would return `PointerTy`
instead of the target's preferred shift amount type for scalar shifts.

This used to be needed when the target's preferred type wasn't large
enough to support the shift amount needed for an illegal type. For
example, any scalar type larger than i256 on X86 since X86's preferred
shift amount type is i8.

For a while now, we've had code that uses `MVT::i32` if `LegalTypes` is
true, but the target's preferred type is too small. This fixed a
repeated cause of crashes where the `LegalTypes` flag wasn't set to
false when illegal types could be present.

This has made it unnecessary to set the `LegalTypes` flag correctly, and
as a result more and more places don't. So I think it's time for this
flag to go away.

This first patch just disconnects the flag. The interface and all
callers will be cleaned up in follow up patches.

The X86 test change is because we now have the same shift type for both
shifts in a (srl (sub C, (shl X, 32)), 32) sequence. This makes the shift
amounts appear equal in value and type, which is needed to enable a
combine.
2024-07-04 08:42:53 -07:00
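The fallback described in the commit above (use a 32-bit shift amount when the target's preferred type is too small) can be sketched with plain bit widths. `bitsForShiftAmount` and `shiftAmountBits` are hypothetical helpers working on integer widths, not the `TargetLowering` interface.

```cpp
#include <cassert>

// Smallest number of bits able to represent every shift amount
// 0 .. valueBits-1, i.e. ceil(log2(valueBits)).
inline unsigned bitsForShiftAmount(unsigned valueBits) {
  assert(valueBits > 0);
  unsigned bits = 0;
  while ((1ull << bits) < valueBits)
    ++bits;
  return bits;
}

// Pick the shift-amount width: the target's preferred width if it can hold
// every valid shift amount for the value type, otherwise fall back to 32 bits.
inline unsigned shiftAmountBits(unsigned valueBits, unsigned preferredBits) {
  return bitsForShiftAmount(valueBits) > preferredBits ? 32 : preferredBits;
}
```

With an 8-bit preferred type, as in the X86 example from the message, a 256-bit value still fits (shift amounts 0 to 255), while a 257-bit value needs the wider fallback.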
Nicholas Guy
6222c8f030
[IR][LangRef] Add partial reduction add intrinsic (#94499)
Adds the llvm.experimental.partial.reduce.add.* overloaded intrinsic,
this intrinsic represents add reductions that result in a narrower
vector.
2024-07-04 13:32:42 +01:00
Haohai Wen
73f5f83b19
[BasicBlockSections] Using MBBSectionID as DenseMap key (#97295)
getSectionIDNum may return the same value for two different MBBSectionIDs,
e.g. a Cold type MBBSectionID with number 0 and a Default type
MBBSectionID with number 2 both get the value 2 from getSectionIDNum. This
may lead to entries in MBBSectionRanges being overwritten. Using MBBSectionID
itself as the DenseMap key is a better choice.
2024-07-04 09:52:38 +08:00
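The key collision from the commit above can be reproduced with a small model. The layout of `ToySectionID` and the `flatNum` formula are assumptions chosen so that the example in the message (Cold with number 0 and Default with number 2 both mapping to 2) holds; `std::map` stands in for `DenseMap`.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Toy section ID: a kind plus a number (hypothetical layout).
enum class Kind : unsigned { Default = 0, Exception = 1, Cold = 2 };

struct ToySectionID {
  Kind kind;
  unsigned number;
  // Ordering on the full (kind, number) pair, so distinct IDs stay distinct.
  bool operator<(const ToySectionID &o) const {
    return std::make_pair(kind, number) < std::make_pair(o.kind, o.number);
  }
};

// Lossy key: different IDs can collapse to the same number.
inline unsigned flatNum(const ToySectionID &id) {
  return static_cast<unsigned>(id.kind) + id.number;
}
```

Keying a map by `flatNum` lets the second insertion overwrite the first, whereas keying by `ToySectionID` keeps both entries.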
Craig Topper
a3c5c83273 [DAGCombiner] Remove unneeded getValueType() calls in visitMULHS/MULHU. NFC
We have an existing VT variable that should match N0.getValueType.
2024-07-03 13:35:04 -07:00
Yingwei Zheng
d5c9ffd545
[SDAG] Intersect poison-generating flags after CSE (#97434)
This patch fixes a miscompilation when `N` gets CSEed to `Existing`:
```
Existing: t5: i32 = sub nuw Constant:i32<0>, t3
N: t30: i32 = sub Constant:i32<0>, t3
```

Fixes https://github.com/llvm/llvm-project/issues/96366.
2024-07-03 20:32:46 +08:00
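The fix in the commit above amounts to keeping only the flags that both nodes carry once they are merged. A minimal sketch, assuming a hypothetical bit layout for the flags (the real `SDNodeFlags` class is not modelled here):

```cpp
#include <cassert>
#include <cstdint>

// Toy poison-generating flags (hypothetical bit layout).
enum : uint8_t { NoUnsignedWrap = 1, NoSignedWrap = 2, Exact = 4 };

// When node N is replaced by an equivalent existing node, the surviving node
// may only keep flags that both nodes had. Otherwise it would promise
// something (for example, no unsigned wrap) that N never guaranteed.
inline uint8_t intersectFlags(uint8_t existing, uint8_t n) {
  return existing & n;
}
```

In the example from the message, `Existing` carries `nuw` and `N` carries no flags, so the merged node must end up with no flags.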
David Green
3b73cb3bf1 [AArch64][GlobalISel] Create copy rather than single-element concat
The verifier does not accept single-element G_CONCAT_VECTORS, so if there is a
single Op generate a COPY instead.
2024-07-03 10:22:15 +01:00
Alexis Engelke
bb260eb87d
[CodeGen] Only deduplicate PHIs on critical edges (#97064)
PHIElim deduplicates identical PHI nodes to reduce the number of copies
inserted. There are two cases:

1. Identical PHI nodes are in different blocks. That's the reason for
   this optimization; this can't be avoided at SSA-level. A necessary
   prerequisite for this is that the predecessors of all basic blocks
   (where such a PHI node could occur) are the same. This implies that
   all (>= 2) predecessors must have multiple successors, i.e. all edges
   into the block are critical edges.

2. Identical PHI nodes are in the same block. CSE can remove these.
   There are a few cases, however, where they still occur regardless:

   - expand-large-div-rem creates PHI nodes with large integers, which
     get lowered into one PHI per MVT. Later, some identical values
     (zeroes) get folded, resulting in identical PHI nodes.
   - peephole-opt occasionally inserts PHIs for the same value.
   - Some pseudo instruction emitters create redundant PHI nodes (e.g.,
     AVR's insertShift), merging the same values more than once.

   In any case, this happens rarely and MachineCSE handles most cases
   anyway, so that PHIElim only gets to see very few of such cases (see
   changed test files).

Currently, all PHI nodes are inserted into a DenseMap that checks
equality not by pointer but by operands. This hash map is pretty
expensive (hashing itself and the hash map), but only really useful in
the first case.

Avoid this expensive hashing most of the time by restricting it to basic
blocks with only critical input edges. This improves performance for
code with many PHI nodes, especially at -O0. (Note that Clang often
doesn't generate PHI nodes and -O0 includes no mem2reg. Other
compilers always generate PHI nodes.)
2024-07-03 11:19:05 +02:00
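The restriction described in the commit above depends on recognizing blocks whose incoming edges are all critical. The sketch below shows that check on a toy CFG; `ToyCFG` and `onlyCriticalInputEdges` are hypothetical names, and the real pass works on `MachineBasicBlock` rather than on index lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: succs[b] lists the successor block indices of block b (hypothetical).
using ToyCFG = std::vector<std::vector<std::size_t>>;

// True if block `bb` has at least two predecessors and every incoming edge is
// critical, i.e. every predecessor has more than one successor.
inline bool onlyCriticalInputEdges(const ToyCFG &succs, std::size_t bb) {
  assert(bb < succs.size());
  std::size_t numPreds = 0;
  for (const auto &out : succs) {
    bool isPred = false;
    for (std::size_t s : out)
      isPred = isPred || s == bb;
    if (!isPred)
      continue;
    ++numPreds;
    if (out.size() < 2)
      return false; // an edge from a single-successor block is not critical
  }
  return numPreds >= 2;
}
```

Only blocks for which this returns true would need the expensive operand-based PHI lookup; for all other blocks it could be skipped.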
Thorsten Schütt
c5b67dde98
[GlobalIsel][NFC] Modernize UBFX combine (#97513)
Credits: https://reviews.llvm.org/D99283
2024-07-03 09:19:40 +02:00
Kazu Hirata
3641efcf8c
[CodeGen] Use range-based for loops (NFC) (#97500) 2024-07-02 19:24:53 -07:00
Ryotaro KASUGA
0a369b06e3
Reapply "[MachinePipeliner] Fix constraints aren't considered in certain cases" (#97246) (#97259)

This reverts commit e6a961dbef773b16bda2cebc4bf9f3d1e0da42fc.

There is no difference from the original change. I re-ran the failed
test and it passed. So the failure wasn't caused by this change.
test result: https://lab.llvm.org/buildbot/#/builders/176/builds/585
2024-07-03 09:15:41 +09:00
Kazu Hirata
58fd3bea6d
[CodeGen] Use range-based for loops (NFC) (#97467) 2024-07-02 16:36:13 -07:00
Igor Kudrin
23db37c51c
[CodeGen] Do not emit TRAP for unreachable after @llvm.trap (#94570)
With `--trap-unreachable`, `clang` can emit double `TRAP` instructions
for code that contains a call to `__builtin_trap()`:

```
> cat test.c
void test() { __builtin_trap(); }
> clang test.c --target=x86_64 -mllvm --trap-unreachable -O1 -S -o -
...
test:
...
  ud2
  ud2
...
```

`SimplifyCFGPass` inserts `unreachable` after a call to a `noreturn`
function, and later this instruction causes `TRAP/G_TRAP` to be emitted
in `SelectionDAGBuilder::visitUnreachable()` or
`IRTranslator::translateUnreachable()` if
`TargetOptions.TrapUnreachable` is set.

The patch checks the instruction before `unreachable` and avoids
inserting an additional trap.
2024-07-02 15:36:02 -07:00
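The check described in the commit above can be illustrated with a toy lowering routine that looks at the instruction preceding `unreachable`. The string encoding and the function `lowerBlock` are hypothetical and exist only for illustration; they do not correspond to `SelectionDAGBuilder` or `IRTranslator` code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy lowering: turns a list of IR-like opcodes into machine-like opcodes.
// "trap" models a call to @llvm.trap and "unreachable" the IR terminator.
inline std::vector<std::string> lowerBlock(const std::vector<std::string> &ir,
                                           bool trapUnreachable) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < ir.size(); ++i) {
    if (ir[i] == "trap") {
      out.push_back("TRAP");
    } else if (ir[i] == "unreachable") {
      bool afterTrap = i > 0 && ir[i - 1] == "trap";
      if (trapUnreachable && !afterTrap)
        out.push_back("TRAP"); // only needed when nothing already traps
    } else {
      out.push_back(ir[i]);
    }
  }
  return out;
}
```

With `trapUnreachable` set, the sequence `trap, unreachable` lowers to a single `TRAP`, while `call, unreachable` still gets a trap after the call.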
Youngsuk Kim
a95c85fba5
[llvm][CodeGen] Avoid 'raw_string_ostream::str' (NFC) (#97318)
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.

Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-07-01 21:52:37 -04:00
Kazu Hirata
bf6f2c1c43
[CodeGen] Use range-based for loops (NFC) (#97187) 2024-07-01 16:11:09 -07:00
Shilei Tian
9a4f57ec1e
[SelectionDAG] Use EVT::getIntegerVT in getBitcastedAnyExtOrTrunc (#96658)
`SelectionDAG::getBitcastedAnyExtOrTrunc` assumes that there is always a valid
integer type corresponding to another type, which is not always true when it
comes to vector types. For example, `<3 x i8>` doesn't have a corresponding
integer type.

Fix SWDEV-464698.
2024-07-01 15:10:57 -04:00
Simon Pilgrim
163d00c666 [DAG] Pull out repeated SDLoc in SELECT/SETCC folds. NFC. 2024-07-01 18:03:46 +01:00
Alexis Engelke
80ffec7884
[AsmPrinter] Remove timers (#97046)
Timers are an out-of-line function call and a global variable access,
here twice per emitted instruction. At this granularity, not only do the
timing results become skewed, but the timers also add a performance
overhead when profiling is disabled. Also outside of the innermost loop,
timers add a measurable overhead. As this is quite expensive for a
mostly unused profiling facility, remove the timers.

Fixes #39650.
2024-07-01 16:20:54 +02:00