49268 Commits

Author SHA1 Message Date
Simon Pilgrim
076bee1020 [DAG] getNode() - fold (zext (trunc (assertzext x))) -> (assertzext x)
If the pre-truncated value was the same width as the extension, and the assertzext guarantees that the extended bits are already zero, then skip the zext/trunc 'zero_extend_inreg' pattern.
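
A minimal sketch of why the fold is sound, modelling integers as Python ints. The helper names (`trunc`, `zext`, `fold_is_noop`) are hypothetical and are not SelectionDAG APIs; the 8-bit and 32-bit widths are assumptions chosen for illustration.

```python
def trunc(value: int, bits: int) -> int:
    """Keep only the low `bits` bits of value."""
    return value & ((1 << bits) - 1)


def zext(value: int, from_bits: int, to_bits: int) -> int:
    """Zero-extend a from_bits-wide value to to_bits; the numeric value is unchanged."""
    assert from_bits <= to_bits
    assert 0 <= value < (1 << from_bits)
    return value


def fold_is_noop(x: int, narrow_bits: int, wide_bits: int) -> bool:
    """True when zext(trunc(x)) gives back x, i.e. the pair can be skipped."""
    return zext(trunc(x, narrow_bits), narrow_bits, wide_bits) == x


# With an "assertzext from i8" guarantee, every possible x fits in 8 bits,
# so the trunc/zext pair never changes the value.
all_asserted_values_ok = all(fold_is_noop(x, 8, 32) for x in range(1 << 8))

# Without that guarantee the pair is not a no-op: bit 8 is lost here.
unasserted_value_ok = fold_is_noop(0x1FF, 8, 32)
```

The second case shows why the assertzext is required: without it the truncation discards live bits.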

Addresses several regressions noticed in D155472
2023-07-31 10:43:11 +01:00
Simon Tatham
60b98363c7 Retain all jump table range checks when using BTI.
This modifies the switch-statement generation in SelectionDAGBuilder,
specifically the part that generates case clusters of type CC_JumpTable.

A table-based branch of any kind is at risk of being a JOP gadget, if
it doesn't range-check the offset into the table. For some types of
table branch, such as Arm TBB/TBH, the impact of this is limited
because the value loaded from the table is a relative offset of
limited size; for others, such as a MOV PC,Rn computed branch into a
table of further branch instructions, the gadget is fully general.

When compiling for branch-target enforcement via Arm's BTI system,
many of these table branch idioms use branch instructions of types
that do not require a BTI instruction at the branch destination. This
avoids the need to put a BTI at the start of each case handler,
reducing the number of available gadgets //with// BTIs (i.e. ones
which could be used by a JOP attack in spite of the BTI system). But
without a range check, the use of a non-BTI-requiring branch also
opens up a larger range of followup gadgets for an attacker's use.

A defence against this is to avoid optimising away the range check on
the table offset, even if the compiler believes that no out-of-range
value should be able to reach the table branch. (Rationale: that may
be true for values generated legitimately by the program, but not
those generated maliciously by attackers who have already corrupted
the control flow.)

The effect of keeping the range check and branching to an unreachable
block is that no actual code is generated at that block, so it will
typically point at the end of the function. That may still cause some
kind of unpredictable code execution (such as executing data as code,
or falling through to the next function in the code section), but even
if so, there will only be //one// possible invalid branch target,
rather than giving an attacker the choice of many possibilities.

This defence is enabled only when branch target enforcement is in use.
Without branch target enforcement, the range check is easily bypassed
anyway, by branching in to a location just after it. But with
enforcement, the attacker will have to enter the jump table dispatcher
at the initial BTI and then go through the range check. (Or, if they
don't, it's because they //already// have a general BTI-bypassing
gadget.)

Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D155485
2023-07-31 10:39:50 +01:00
Francesco Petrogalli
c4b21d57bc [llc] Add the command line option -sched-model-force-enable-intervals.
The option is used to force the use of resource intervals
in the machine scheduler, effectively ignoring the value of
`EnableIntervals` in the instance of the `SchedMachineModel`.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D156540
2023-07-31 10:10:18 +02:00
Nikita Popov
063b37e7b4 Reapply [IR] Mark and/or constant expressions as undesirable
Reapply after D156401, which stops PatternMatch from recognizing
binop constant expressions, which should avoid the infinite loops
and assertion failures this patch previously exposed.

-----

In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
2023-07-31 09:54:24 +02:00
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30
David Green
e8e49a3567 [AArch64][GlobalISel] G_FMINNUM and G_FMAXNUM vector lowering
This attempts to expand the handling for G_FMAXNUM/G_FMINNUM for vector types,
which is hopefully fairly straightforward now that fptrunc and fpext are
working.

Differential Revision: https://reviews.llvm.org/D156171
2023-07-31 07:35:28 +01:00
Craig Topper
eff53ce8fc [RISCV] Remove unused CHECK prefix from test. NFC 2023-07-30 22:15:54 -07:00
Ben Shi
14e0a67a2d [CSKY] Add more IR patterns to select FNMUL
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D155169
2023-07-31 12:12:56 +08:00
Jianjian GUAN
b7408ebbb7 [RISCV] Use x0 in vsetvli when avl is equal to vlmax.
We can use the x0 form of vsetvli when vlmax is already known and avl is equal to it.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156404
2023-07-31 09:49:40 +08:00
David Green
cf39dea58d [AArch64] Add a fminnum/fmaxnum test. NFC
See D156171.
2023-07-30 17:27:05 +01:00
David Green
76f0d186d6 [AArch64] Regenerate arm64-vabs.ll, arm64-subvector-extend.ll and some mir tests. NFC 2023-07-30 16:51:01 +01:00
Jay Foad
58642565ec [Hexagon] Add machine verification to some tests
This is to help catch problems in D156552 that only showed up in an
expensive checks build.
2023-07-30 12:07:34 +01:00
Jay Foad
e2e3f06813 Revert "[MachineScheduler] Track physical register dependencies per-regunit"
This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc.

It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS
build.
2023-07-29 18:05:25 +01:00
Jay Foad
1a54671d54 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.
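
The AMDGPU example in the first bullet can be reproduced with a small bottom-up model. This is a simplified sketch, not the ScheduleDAGInstrs implementation: the tables, the regunit names (`u0`, `u1`, `s0`) and the function `data_edges` are all hypothetical, and only data (def to use) edges are modelled.

```python
# Registers that overlap each register (including itself).
ALIASES = {
    "sgpr0": {"sgpr0"},
    "vgpr0": {"vgpr0", "vgpr0_vgpr1"},
    "vgpr1": {"vgpr1", "vgpr0_vgpr1"},
    "vgpr0_vgpr1": {"vgpr0", "vgpr1", "vgpr0_vgpr1"},
}
# Register units covered by each register (unit names are made up).
REGUNITS = {
    "sgpr0": {"s0"},
    "vgpr0": {"u0"},
    "vgpr1": {"u1"},
    "vgpr0_vgpr1": {"u0", "u1"},
}
# (defs, uses) for SU(0), SU(1), SU(2) from the example above.
INSTRS = [
    (["vgpr1"], ["sgpr0"]),
    (["vgpr1"], ["vgpr1"]),
    (["vgpr0_vgpr1"], ["vgpr0_vgpr1"]),
]


def data_edges(instrs, lookup_keys, own_keys):
    """Walk bottom-up; a def gets an edge to every pending use it can reach.

    lookup_keys(reg): keys searched for pending uses when reg is defined.
    own_keys(reg):    keys recorded for a use and cleared by a def.
    """
    pending_uses = {}
    edges = set()
    for su in reversed(range(len(instrs))):
        defs, uses = instrs[su]
        for reg in defs:
            for key in lookup_keys(reg):
                for user in pending_uses.get(key, ()):
                    edges.add((su, user))
            for key in own_keys(reg):
                pending_uses.pop(key, None)
        for reg in uses:
            for key in own_keys(reg):
                pending_uses.setdefault(key, set()).add(su)
    return edges


# Old scheme: search all aliases, but a def only clears its own register.
alias_edges = data_edges(INSTRS, lambda r: ALIASES[r], lambda r: {r})
# New scheme: search and clear by register unit.
regunit_edges = data_edges(INSTRS, lambda r: REGUNITS[r], lambda r: REGUNITS[r])
```

In this model the alias-based run yields the extra SU(0) to SU(2) edge, because SU(1)'s def of `vgpr1` does not clear the pending use recorded under `vgpr0_vgpr1`; the regunit-based run yields only SU(0) to SU(1) and SU(1) to SU(2).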

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Differential Revision: https://reviews.llvm.org/D156552
2023-07-29 15:34:53 +01:00
Jay Foad
5a64c89c8d [MachineScheduler] Test case for physical register dependencies
Differential Revision: https://reviews.llvm.org/D156551
2023-07-29 15:34:53 +01:00
Anatoly Trosinenko
4210204f52 [AArch64] Refactor checks in sign-return-address.ll test
Using the implicit CHECK prefix in one FileCheck invocation and an explicit
CHECK-V83A in the other seems to mislead people into using CHECK: lines as
a common matching prefix in various places. Also note that

; CHECK, CHECK-V83A: ...

line only matches the "CHECK-V83A" prefix.

This commit explicitly splits the checks into common ones (CHECK) and
invocation-specific ones (COMPAT and V83A) and updates the assertions
with the update_llc_test_checks.py script.

Reviewed By: efriedma, MaskRay

Differential Revision: https://reviews.llvm.org/D156327
2023-07-29 12:58:56 +03:00
Wael Yehia
9d4e8c09f4 [XCOFF] Do not put MergeableCStrings in their own section
The current implementation generates a csect with a
".rodata.str.x.y" prefix for a MergeableCString variable definition.
However, a reference to such variable does not get the prefix in its
name because there's not enough information in the containing IR.
In particular, without seeing the initializer and absent some other
indicators, we cannot tell that the referenced variable is a
null-terminated string.

When the AIX codegen in llvm was being developed, the prefixing was copied
from ELF without having the linker take advantage of the info.
Currently, the AIX linker does not have the capability to merge
MergeableCString variables. If such a feature is ever implemented,
the contract between the linker and compiler would have to be reconsidered.

Here's the before and after of this change:
```
@a = global i64 320255973571806, align 8
@strA = unnamed_addr constant [7 x i8] c"hello\0A\00", align 1  ;; Mergeable1ByteCString
@strB = unnamed_addr constant [8 x i8] c"Blahah\0A\00", align 1 ;; Mergeable1ByteCString
@strC = unnamed_addr constant [2 x i16] [i16 1, i16 0], align 2 ;; Mergeable2ByteCString
@strD = unnamed_addr constant [2 x i16] [i16 1, i16 1], align 2 ;; !isMergeableCString
@strE = external unnamed_addr constant [2 x i16], align 2

-fdata-sections:
  .text  extern        .rodata.str1.1strA        .text  extern        strA
    0    SD       RO                               0    SD       RO
  .text  extern        .rodata.str1.1strB        .text  extern        strB
    0    SD       RO                               0    SD       RO
  .text  extern        .rodata.str2.2strC  ===>  .text  extern        strC
    0    SD       RO                               0    SD       RO
  .text  extern        strD                      .text  extern        strD
    0    SD       RO                               0    SD       RO
  .data  extern        a                         .data  extern        a
    0    SD       RW                               0    SD       RW
  undef  extern        strE                      undef  extern        strE
    0    ER       UA                               0    ER       UA

-fno-data-sections:
  .text  unamex        .rodata.str1.1            .text  unamex        .rodata
    0    SD       RO                               0    SD       RO
  .text  extern        strA                      .text  extern        strA
    0    LD       RO                               0    LD       RO
  .text  extern        strB                      .text  extern        strB
    0    LD       RO                               0    LD       RO
  .text  unamex        .rodata.str2.2      ===>  .text  extern        strC
    0    SD       RO                               0    LD       RO
  .text  extern        strC                      .text  extern        strD
    0    LD       RO                               0    LD       RO
  .text  unamex        .rodata                   .data  unamex        .data
    0    SD       RO                               0    SD       RW
  .text  extern        strD                      .data  extern        a
    0    LD       RO                               0    LD       RW
  .data  unamex        .data                     undef  extern        strE
    0    SD       RW                               0    ER       UA
  .data  extern        a
    0    LD       RW
  undef  extern        strE
    0    ER       UA
```

Reviewed by: David Tenty, Fangrui Song

Differential Revision: https://reviews.llvm.org/D156202
2023-07-29 03:24:21 +00:00
Matt Arsenault
3240ae7034 AMDGPU/GlobalISel: Set dead on scc on manually selected instructions
In SelectionDAG InstrEmitter automatically puts dead flags on unused
physreg defs everywhere. The generated selectors should also set dead
on physreg defs that were not used in the pattern.
2023-07-28 14:14:06 -04:00
Matt Arsenault
c26dfc81e2 [HACK] X86: Disable isCopyInstrImpl for undef subregister defs
This is a workaround for a coalescer bug where coalescing
SUBREG_TO_REG ends up losing the liveness of the high bits of the
source register. The result is an incorrect undef subregister def
instead of preserving the high values. Work around the observed
failure after the resulting mov is eliminated during allocation until
a proper fix is ready. I believe the proper fix is to make
SUBREG_TO_REG use a tied operand.

The test should catch a regression originally observed after
b7836d856206ec39509d42529f958c920368166b and should not show a
difference after a496c8be6e638ae58bb45f13113dbe3a4b7b23fd is reverted.

https://reviews.llvm.org/D156164
2023-07-28 13:33:28 -04:00
Arthur Eubanks
f800c1f3b2 [PEI] Don't zero out noreg operands
A tail call may have $noreg operands.

Fixes a crash.

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D156485
2023-07-28 10:23:17 -07:00
Jeffrey Byrnes
391249d1af [AMDGPU] Allow 8,16 bit sources in calculateSrcByte
This is required for many trees produced in practice for i8 CodeGen.

Differential Revision: https://reviews.llvm.org/D155864

Change-Id: Iac01d183d9998b15138bdc7a5051e3bed338e7d9
2023-07-28 09:50:21 -07:00
Jay Foad
945123384e [PEI][ARM] Switch to backwards frame index elimination
This adds better support for call frame pseudos that adjust SP in
PEI::replaceFrameIndicesBackward.

Running frame index elimination backwards is preferred because it can
do backwards register scavenging (on targets that require scavenging)
which does not rely on accurate kill flags.

Differential Revision: https://reviews.llvm.org/D156434
2023-07-28 17:32:51 +01:00
Evgenii Kudriashov
c13e310fa7 [DAGCombine] Support truncated constants for fptosi.sat combining
Closes https://github.com/llvm/llvm-project/issues/56779

Reviewed By: RKSimon, dmgreen

Differential Revision: https://reviews.llvm.org/D152926
2023-07-28 18:54:39 +03:00
Mikhail Gudim
cb15e657b5 [RISCV] A test for conditional binary ops.
Consider the following pattern:

```
%binop_ = binop %x, %y
%select_ = select %c, %binop_, %x
```

If there is an identity `%identity` operand for `binop`, it is possible to transform
the above code to:
```
%operand = select %c, %y, %identity
%result = binop %x, %operand
```
This transformation is profitable when `%identity` is all zeroes or
ones.
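
The equivalence of the two forms can be checked with a small sketch. The helper names (`select`, `original_form`, `transformed_form`) are hypothetical, and Python's unbounded ints stand in for fixed-width values, with `-1` playing the role of all ones.

```python
import itertools
import operator


def select(cond: bool, if_true: int, if_false: int) -> int:
    """Model of the IR select instruction."""
    return if_true if cond else if_false


def original_form(binop, c: bool, x: int, y: int) -> int:
    # %binop_ = binop %x, %y ; %select_ = select %c, %binop_, %x
    return select(c, binop(x, y), x)


def transformed_form(binop, identity: int, c: bool, x: int, y: int) -> int:
    # %operand = select %c, %y, %identity ; %result = binop %x, %operand
    return binop(x, select(c, y, identity))


# (operation, identity) pairs; -1 acts as "all ones" for `and`.
CASES = [
    (operator.add, 0),
    (operator.or_, 0),
    (operator.xor, 0),
    (operator.and_, -1),
]

forms_agree = all(
    original_form(op, c, x, y) == transformed_form(op, ident, c, x, y)
    for (op, ident) in CASES
    for c, x, y in itertools.product([False, True], range(-4, 5), range(-4, 5))
)
```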

This patch commits a test for such patterns.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155481
2023-07-28 11:08:02 -04:00
melonedo
afb9c04a5a [RISCV] Add support for XCVbi extension in CV32E40P
Implement XCVbi intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, Nandni Jamnadas, @PaoloS, @simoncook, @xmj.

bf2ad26b4ff856aab9a62ad168e6bdefeedc374f was the original commit.
e4777dc4b9cb371971523cc603e1b8a5c7255e7e reverted it due to test failures caused by a merge conflict marker in llvm/test/CodeGen/RISCV/attributes that was accidentally checked in.
This commit removes the conflict marker and recommits.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154412
2023-07-28 21:54:10 +08:00
Ben Shi
7ca6b76934 [CSKY] Optimize conditional branches with float comparison
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D155424
2023-07-28 21:23:21 +08:00
melonedo
e4777dc4b9 Revert "[RISCV] Add support for XCVbi extension in CV32E40P"
This reverts commit bf2ad26b4ff856aab9a62ad168e6bdefeedc374f as it
checked in merge conflict markers.
2023-07-28 19:28:20 +08:00
Tuan Chuong Goh
8eeabf674c [AArch64] Add funnel shift lowering for SelectionDAG
Consider FSHR legal if shift amount is constant
Lower FSHL to FSHR if shift amount is constant
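
A short sketch of the identity that makes the FSHL to FSHR rewrite possible for a constant amount. The functions `fshl` and `fshr` below are hypothetical models of the funnel-shift semantics (concatenate, shift, take one half), and the rewrite is only checked for amounts that are non-zero modulo the bit width; a shift amount of zero is assumed to be handled separately.

```python
def fshl(a: int, b: int, amount: int, bits: int) -> int:
    """Funnel shift left: high half of (a:b) shifted left by amount mod bits."""
    mask = (1 << bits) - 1
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << (amount % bits)) >> bits) & mask


def fshr(a: int, b: int, amount: int, bits: int) -> int:
    """Funnel shift right: low half of (a:b) shifted right by amount mod bits."""
    mask = (1 << bits) - 1
    concat = ((a & mask) << bits) | (b & mask)
    return (concat >> (amount % bits)) & mask


def fshl_via_fshr(a: int, b: int, amount: int, bits: int) -> int:
    """Rewrite fshl by a constant c (c mod bits != 0) as fshr by bits - c."""
    assert amount % bits != 0
    return fshr(a, b, bits - (amount % bits), bits)


BITS = 8
rewrite_matches = all(
    fshl(a, b, c, BITS) == fshl_via_fshr(a, b, c, BITS)
    for a in (0x00, 0x5A, 0x81, 0xFF)
    for b in (0x00, 0x3C, 0x7E, 0xFF)
    for c in range(1, BITS)
)
```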

Differential Revision: https://reviews.llvm.org/D155565
2023-07-28 11:56:25 +01:00
Kevin Athey
d5f496a2cb Revert "[RISCVRVVInitUndef] Remove implicit single use assumption for IMPLICIT_DEF"
This reverts commit 9cf675923afa73a3dbe575803ebbbe9146701df8.

Breaking sanitizer buildbots: asan and fast
https://lab.llvm.org/buildbot/#/builders/168/builds/14824
https://lab.llvm.org/buildbot/#/builders/5/builds/35419
2023-07-28 12:23:18 +02:00
John Brawn
8336d38be9 [ARM] Correctly handle combining segmented stacks with execute-only
Using segmented stacks with execute-only mostly works, but we need to
use the correct movi32 opcode in 6-M, and there's one place where for
thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally
used which needed to be fixed.

Differential Revision: https://reviews.llvm.org/D156339
2023-07-28 10:37:40 +01:00
melonedo
bf2ad26b4f [RISCV] Add support for XCVbi extension in CV32E40P
Implement XCVbi intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, Nandni Jamnadas, @PaoloS, @simoncook, @xmj.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154412
2023-07-28 17:36:57 +08:00
melonedo
3c0604b224 [RISCV] Add support for XCVsimd extension in CV32E40P
Implement XCVsimd intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, Nandni Jamnadas, @PaoloS, @simoncook, @xmj.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153721
2023-07-28 16:52:32 +08:00
Jun Sha (Joshua)
e56bf13317 [RISCV] Remove some instructions from Zvfbfwma by implying Zfbfmin according to the latest spec
According to the latest spec, Zvfbfwma requires Zvfbfmin and Zvfbfmin requires Zfbfmin, with FLH/FSH/FMV.H.X/FMV.X.H removed from Zvfbfwma.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D155916
2023-07-28 15:52:03 +08:00
Freddy Ye
c9d92e6638 [X86] Support -march=arrowlake,arrowlake-s,lunarlake
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D156239
2023-07-28 15:05:54 +08:00
Qihan Cai
092e60a3fc [RISCV] Add support for XCValu extension in CV32E40P
Implement XCValu intrinsics for CV32E40P according to the specification.

This is a commit of the patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, Nandni Jamnadas, Serkan Muhcu, @jeremybennett, @lewis-revill, @liaolucy, @simoncook, @xmj

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153748
2023-07-28 11:37:31 +08:00
Fangrui Song
845d83d85f [test] Add --show-all-symbols to some llvm-objdump -d commands
llvm-objdump -d will be changed to not display mapping symbols by
default (D156190).
Add --show-all-symbols to make the intent clearer and prevent test
adjustment with the new behavior.
2023-07-27 19:33:51 -07:00
Jun Sha (Joshua)
2b6df4a336 [RISCV] Add codegen support for bf16 vector
This patch adds codegen support for vectors of bfloat16 type in the LLVM backend.
With this patch, Zvfbfmin/Zvfbfwma instructions as well as vle16/vse16 can be generated from the newly added bf16 IR intrinsics.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156287
2023-07-28 09:54:23 +08:00
Philip Reames
9cf675923a [RISCVRVVInitUndef] Remove implicit single use assumption for IMPLICIT_DEF
The code was written with the implicit assumption that each IMPLICIT_DEF is either a) the tied operand, or b) an untied source, but not both. This is true right now, but an upcoming change may allow CSE of IMPLICIT_DEFs in some cases, so let's rewrite the code to handle that possibility.

I added an MIR case which demonstrates the multiple use IMPLICIT_DEF. To my knowledge, this is not a reachable configuration from IR right now.

As an aside, this makes the structure a much closer match with the sub-reg liveness case, and we can probably just merge these routines. (Future work.)

Differential Revision: https://reviews.llvm.org/D156477
2023-07-27 16:25:56 -07:00
Jon Roelofs
3e0cdf332f
Upgrade a rdar://5907648 link to a github issue
https://github.com/llvm/llvm-project/issues/64174
2023-07-27 13:37:48 -07:00
Hiroshi Yamauchi
a90228b911 [AArch64][Windows] Fix the slot offset of the swift async context register.
This fixes a code gen issue where saving the swift async context
register (x22) accidentally overwrites the saved value of another
callee-saved register, corrupting its value and causing a crash.

Differential Revision: https://reviews.llvm.org/D156391
2023-07-27 12:32:43 -07:00
Craig Topper
a81e1f0fb2 [RISCV] When using vror.vi for left rotate, mask the inverted immediate to SEW.
This makes the assembly more readable.
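
A sketch of the arithmetic, under the assumption that a left rotate by `n` is emitted as a right rotate by the negated amount and that only the low log2(SEW) bits of the amount matter. The helpers (`rotl`, `rotr`, `masked_ror_amount`) are hypothetical and SEW is assumed to be a power of two.

```python
def rotr(value: int, amount: int, sew: int) -> int:
    """Rotate a sew-bit value right; only amount mod sew is significant."""
    mask = (1 << sew) - 1
    value &= mask
    amount %= sew
    return ((value >> amount) | (value << (sew - amount))) & mask


def rotl(value: int, amount: int, sew: int) -> int:
    """Rotate a sew-bit value left; only amount mod sew is significant."""
    mask = (1 << sew) - 1
    value &= mask
    amount %= sew
    return ((value << amount) | (value >> (sew - amount))) & mask


def masked_ror_amount(left_amount: int, sew: int) -> int:
    """Right-rotate amount equivalent to a left rotate, masked to sew."""
    return -left_amount & (sew - 1)


# For SEW=8, a left rotate by 1 becomes a right rotate by 7. An unmasked
# 6-bit immediate would read as 63, which rotates identically but is
# harder to read in the assembly.
readable_amount = masked_ror_amount(1, 8)
same_result = rotr(0xB4, 63, 8) == rotr(0xB4, readable_amount, 8)

equivalent = all(
    rotl(x, n, sew) == rotr(x, masked_ror_amount(n, sew), sew)
    for sew in (8, 16, 32, 64)
    for n in range(sew)
    for x in (0x1, 0xB4, 0xDEADBEEF)
)
```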

Reviewed By: luke

Differential Revision: https://reviews.llvm.org/D156348
2023-07-27 12:16:21 -07:00
Matt Arsenault
e5f04830c5 ARM: Use explicit triple in a test to avoid inheriting windows from the host 2023-07-27 13:18:50 -04:00
Daniel Hoekwater
3435a6a0bb [AArch64] [XRay] Account for XRay event instrs in Branch Relaxation
PATCHABLE_TYPED_EVENT_CALL and PATCHABLE_EVENT_CALL are pseudo
instructions that expand to XRay sleds, so getInstSizeInBytes
should reflect the size of the sleds, not the pseudo-instructions.

Differential Revision: https://reviews.llvm.org/D156272
2023-07-27 17:10:58 +00:00
Matt Arsenault
95e5a461f5 AMDGPU: Always custom lower extract_subvector
The patterns were ripped out in
a4a3ac10cb1a40ccebed4e81cd7e94f1eb71602d so this always needs to be
custom lowered. I absolutely hate how difficult it is to write tests
for these, I have no doubt there are more of these hidden.

Fixes #64142
2023-07-27 08:46:44 -04:00
Jay Foad
2dcf051259 [CodeGen] Store call frame size in MachineBasicBlock
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D156113
2023-07-27 10:32:00 +01:00
David Green
beabfe747b [AArch64] Sink splat to fmlal intrinsics
Similar to other neon index instructions, it is beneficial to sink the splat to
the instruction for fmlal in order for it to create the index.
2023-07-27 10:07:01 +01:00
David Green
509cb33469 [AArch64] Correct the regtype of indexed fmlal
The indexed fmlal should use a low numbered register for the index operand,
which this fixes by making it V128_lo.

Fixes #64104

Differential Revision: https://reviews.llvm.org/D156296
2023-07-27 08:27:03 +01:00
David Green
e012c5cfac [AArch64] Add test showing incorrect register usage of FMLAL. NFC
See D156296
2023-07-27 07:39:10 +01:00
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code; a few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Pravin Jagtap
1462053608 [AMDGPU] Propagate constants for llvm.amdgcn.wave.reduce.umin/umax
Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D156077
2023-07-26 23:46:01 -04:00