688 Commits

Author SHA1 Message Date
Ruiling, Song
67c55b1ffc
[AMDGPU] Make max dwords of memory cluster configurable (#119342)
We find it helpful to increase the value for graphics workloads. Make it
configurable so we can experiment with different values.
2024-12-18 14:17:27 +08:00
Fangrui Song
cd12922235 [test] Change llc -march= to -mtriple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449

-march= is error-prone when running on a host whose OS is different.
2024-12-15 13:08:02 -08:00
Fangrui Song
e339f0a9da [test] Remove redundant -march=x86-64 when target triple is specified in IR 2024-12-15 11:30:14 -08:00
Fangrui Song
40a4cbb0f2 [MIR,test] Change llc -march=x86-64 to -mtriple=x86_64
Similar to 806761a7629df268c8aed49657aeccffa6bca449

-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple (e.g. Windows, macOS).

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as these MIR tests do not
utilize object file format specific detail, but it's good to change
these tests to match neighboring files that use -mtriple=x86_64.
2024-12-15 11:23:08 -08:00
Fangrui Song
b279f6b098 [NVPTX,test] Change llc -march= to -mtriple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449

-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple (e.g. Windows, macOS),
leaving a target triple which may not make sense.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
nvptx{,64}-apple-darwin as ELF instead of rejecting it outright.
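
The difference between the two flags can be illustrated with a small Python sketch. This is a simplified model, not llc's actual implementation: the helper name `apply_march` and the example default triples are assumptions chosen for illustration.

```python
def apply_march(default_triple: str, march: str) -> str:
    """Model -march=: replace only the architecture component of a triple."""
    # A triple has the form arch-vendor-os[-environment].
    _arch, _, rest = default_triple.partition("-")
    return f"{march}-{rest}" if rest else march


# Hypothetical default triples for three different hosts.
for host in (
    "x86_64-unknown-linux-gnu",
    "x86_64-apple-darwin",
    "x86_64-pc-windows-msvc",
):
    print(host, "->", apply_march(host, "nvptx64"))
```

In this model the macOS host ends up with nvptx64-apple-darwin, the kind of triple described above. Passing -mtriple= removes the dependence on the host because every component is stated explicitly.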
2024-12-15 10:45:11 -08:00
Fangrui Song
2208c97c1b [Hexagon,test] Change llc -march= to -mtriple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449

-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense.

Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead
of rejecting it outright.
2024-12-15 10:20:22 -08:00
Fangrui Song
ae26f50aea [test] Change llc -march=mips* to -mtriple=mips*
Similar to 806761a7629df268c8aed49657aeccffa6bca449
2024-12-10 22:14:06 -08:00
Michael Maitland
b816c26289 [RISCV][MIR] Move skip-mir-comment-trailing-whitespace.mir into RISCV subdirectory 2024-11-11 12:02:29 -08:00
Michael Maitland
2b58458225
[MIRLexer][RISCV] Eat a space after the Machine comment (#115365)
The MIRPrinter emits ` :: ` at the start of an MMO. The MIRLexer eats all
the white space after the operand and before the `::` when there is no
comment. We need to eat the space after the comment to allow MIRLexer to
parse comments on an MMO.
2024-11-11 14:48:31 -05:00
Shilei Tian
6548b6354d Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
2024-11-08 20:21:16 -05:00
Shilei Tian
ca33649abe Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both
hip and openmp buildbots.
2024-11-08 16:36:35 -05:00
Shilei Tian
e215a1e27d
[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403) 2024-11-08 13:05:35 -05:00
dyung
bc7e099aa8
Revert "[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes" (#115353)
Reverts llvm/llvm-project#115291

Reverting due to test failures on many bots including
https://lab.llvm.org/buildbot/#/builders/174/builds/8049
2024-11-07 13:02:51 -05:00
Akshat Oke
21835ee28d
[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes (#115291) 2024-11-07 20:08:36 +05:30
Akshat Oke
e76d9214c8
[AMDGPU] Fix 3495d04 MIR test (#114963)
Needed to specify scratchRSrcReg and spreg in order to stop after
prologepilog.

- Fixes #113129 test failure
2024-11-05 17:11:47 +05:30
Akshat Oke
3495d04560
[AMDGPU][MIR] Serialize SpillPhysVGPRs (#113129) 2024-11-05 13:17:25 +05:30
Thorsten Schütt
4b028773b2
Revert "[GlobalISel] Import samesign flag" (#114256)
Reverts llvm/llvm-project#113090
2024-10-30 17:03:17 +01:00
Thorsten Schütt
72b115301d
[GlobalISel] Import samesign flag (#113090)
Credits: https://github.com/llvm/llvm-project/pull/111419
2024-10-30 16:34:01 +01:00
Jack Styles
86f76c3b17
[AArch64][Libunwind] Add Support for FEAT_PAuthLR DWARF Instruction (#112171)
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits:
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Because
`DW_CFA_AARCH64_negate_ra_state_with_pc` must be placed
immediately after the signing instruction, the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, it is negligible.

For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
2024-10-28 08:22:38 +00:00
Akshat Oke
6360652e9f
Reland [AMDGPU] Serialize WWM_REG vreg flag (#110229) (#112492)
A reland but not an exact copy, as `VRegInfo.Flags` from the parser is
now an int8 instead of a vector, so we only need to copy over the value.
2024-10-21 13:44:09 +05:30
Peter Collingbourne
3cab8827fd Revert "[AMDGPU] Serialize WWM_REG vreg flag (#110229)"
This reverts commit bec839d8eed9dd13fa7eaffd50b28f8f913de2e2.

Caused buildbot failures, e.g.
https://lab.llvm.org/buildbot/#/builders/52/builds/2928
2024-10-15 13:18:43 -07:00
Akshat Oke
8b20f1b924
[MIR] Fix tests for flags in register info (#112179)
[MIR] Serialize virtual register flags #110228 introduces register flags
which appear empty in .mir dumps. Future tests should use
`-simplify-mir`.
2024-10-14 18:28:54 +05:30
Akshat Oke
bec839d8ee
[AMDGPU] Serialize WWM_REG vreg flag (#110229) 2024-10-14 14:37:21 +05:30
Akshat Oke
dbfca24b99
[MIR] Serialize virtual register flags (#110228)
This introduces target-specific vreg flag serialization. Flags are represented as `uint8_t` and the `TargetRegisterInfo` override provides methods `getVRegFlagValue` to deserialize and `getVRegFlagsOfReg` to serialize.
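
A rough Python model of the round trip described above may help. It is only a sketch: the real hooks are C++ methods on `TargetRegisterInfo` with different signatures, the bit assigned to `WWM_REG` is an assumption, and `parse_flags` is a hypothetical helper added for illustration.

```python
# Hypothetical flag table for one target: flag name -> bit in a uint8_t mask.
# The name WWM_REG comes from the AMDGPU follow-up commit; the bit value is assumed.
FLAG_BITS = {"WWM_REG": 1 << 0}


def get_vreg_flag_value(name):
    """Deserialize: map a flag name from MIR text to its bit, or None if unknown."""
    return FLAG_BITS.get(name)


def get_vreg_flags_of_reg(mask):
    """Serialize: list the names of all flags set in an 8-bit mask."""
    return [name for name, bit in FLAG_BITS.items() if mask & bit]


def parse_flags(names):
    """Fold flag names into a uint8_t-sized mask, rejecting unknown names."""
    mask = 0
    for name in names:
        bit = get_vreg_flag_value(name)
        if bit is None:
            raise ValueError(f"unknown vreg flag: {name}")
        mask |= bit
    return mask & 0xFF
```

The model takes a plain mask where the real serializer takes a register, which keeps the example self-contained.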
2024-10-14 14:19:53 +05:30
Stephen Tozer
d826b0c90f
[LLVM] Add HasFakeUses to MachineFunction (#110097)
Following the addition of the llvm.fake.use intrinsic and corresponding
MIR instruction, two further changes are planned: to add an
-fextend-lifetimes flag to Clang that emits these intrinsics, and to
have -Og enable this flag by default. Currently, some logic for handling
fake uses is gated by the optdebug attribute, which is intended to be
switched on by -fextend-lifetimes (and by extension -Og later on).
However, the decision was made that a general optdebug attribute should
be incompatible with other opt_ attributes (e.g. optsize, optnone),
since they all express different intents for how to optimize the
program. We would still like to allow -fextend-lifetimes with optsize
however (i.e. -Os -fextend-lifetimes should be legal), since it may be a
useful configuration and there is no technical reason to not allow it.

This patch resolves this by tracking MachineFunctions that have fake
uses, allowing us to run passes that interact with them and skip passes
that clash with them.
2024-10-04 13:13:30 +01:00
Dominik Montada
d853adee00
[MIR] Fix return value when computed properties conflict with given prop (#109923)
This fixes a test failure when expensive checks are enabled. Use the
correct return value when computing machine function properties resulted
in an error (e.g. when conflicting with explicitly set values).

Without this, the machine verifier would crash even in the presence of
parsing errors which should have gently terminated execution.
2024-09-25 10:47:14 +02:00
Dominik Montada
8ba334bc4a
[MIR] Allow overriding isSSA, noPhis, noVRegs in MIR input (#108546)
Allow setting the computed properties IsSSA, NoPHIs, NoVRegs for MIR
functions in MIR input. The default value is still the computed value.
If the property is set to false, the computed result is ignored. Conflicting
values (e.g. setting IsSSA where the input MIR is clearly not SSA) lead to
an error.
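
The rules above can be summarized in a short Python sketch. It only models the decision logic as described in this message; `resolve_property` is a hypothetical name and does not correspond to a function in the MIR parser.

```python
from typing import Optional


def resolve_property(name: str, computed: bool, explicit: Optional[bool]) -> bool:
    """Combine a computed MachineFunction property with an optional explicit value."""
    if explicit is None:
        # Not given in the MIR input: keep the computed value.
        return computed
    if not explicit:
        # Explicitly false: the computed result is ignored.
        return False
    if not computed:
        # Explicitly true, but the function body contradicts it.
        raise ValueError(f"explicit {name}=true conflicts with the MIR body")
    return True
```

For example, `resolve_property("isSSA", True, False)` returns `False`, while `resolve_property("isSSA", False, True)` raises an error because the input is not in SSA form.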

Closes #37787
2024-09-24 14:21:45 +02:00
gonzalobg
78ae2de4c6
[NVPTX] Load/Store/Fence syncscope support (#106101)
Adds "initial" support for `syncscope` to the NVPTX backend
`load`/`store`/`fence` instructions.
Atomic Read-Modify-Write operations are intentionally not supported as part
of this initial PR.
2024-09-23 10:18:00 -07:00
Abinaya Saravanan
c010b72e9b
[HEXAGON] AddrModeOpt support for HVX and optimize adds (#106368)
This patch does the following:
1. Add support for optimizing the address mode of HVX load/store
instructions
2. Reduce the value of Add instruction immediates by replacing them with the
difference from other Addi instructions that share a common base:

For example, if we have the sequence of instructions below:

       r1 = add(r2,# 1024)
            ...
       r3 = add(r2,# 1152)
            ...
       r4 = add(r2,# 1280)

where the register r2 has the same reaching definition, they get
modified to the sequence below:

       r1 = add(r2,# 1024)
            ...
       r3 = add(r1,# 128)
            ...
       r4 = add(r1,# 256)
3. Fixes a bug in the pass where the addi instructions were modified based on a
predicated register definition, leading to incorrect output.

Eg:
         INST-1: if (p0) r2 = add(r13,# 128)
         INST-2: r1 = add(r2,# 1024)
         INST-3: r3 = add(r2,# 1152)
         INST-4: r5 = add(r2,# 1280)

In the above case, since r2's definition is predicated, we do not want
to modify the uses of r2 in INST-3/INST-4 with add(r1,#128/256)

4. Fixes a corner case

It looks like we never check whether the offset register is actually
live (not clobbered) at the optimization site. Add a check that it is
live at MBB entry. The rest should have already been verified.

5. Fixes bad codegen

For whatever reason we do the transformation without checking whether the value
in the register actually reaches the user. This is the second identical fix for
this pass.
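
The immediate rewriting from items 2 and 3 can be sketched in Python as follows. This is an illustrative model under simplifying assumptions, not the pass itself: `rebase_adds` and the tuple encoding are hypothetical, and the liveness and reaching-definition checks from items 4 and 5 are reduced to a single boolean.

```python
from typing import List, Tuple

Add = Tuple[str, str, int]  # (destination, base register, immediate)


def rebase_adds(adds: List[Add], base_def_predicated: bool = False) -> List[Add]:
    """Rewrite later adds on a shared base to use the first add's result."""
    # Item 3: if the shared base has a predicated definition, change nothing.
    if base_def_predicated or len(adds) < 2:
        return list(adds)
    first_dest, base, first_imm = adds[0]
    result = [adds[0]]
    for dest, reg, imm in adds[1:]:
        if reg == base:
            # Item 2: replace the immediate with the difference from the first add.
            result.append((dest, first_dest, imm - first_imm))
        else:
            result.append((dest, reg, imm))
    return result


example = [("r1", "r2", 1024), ("r3", "r2", 1152), ("r4", "r2", 1280)]
print(rebase_adds(example))
```

With the example input, the second and third adds become `r3 = add(r1, 128)` and `r4 = add(r1, 256)`, matching the sequence in item 2. Passing `base_def_predicated=True` leaves the list unchanged.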

   Co-authored-by: Anirudh Sundar <quic_sanirudh@quicinc.com>
   Co-authored-by: Sergei Larin <slarin@quicinc.com>
2024-09-13 18:48:34 -05:00
Diana Picus
3356208531
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
7792b4ae79.

The problem was a conflict with
e55d6f5ea2
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).

...if only we had a merge queue.
2024-09-13 11:54:30 +02:00
Diana Picus
7792b4ae79
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
2024-09-12 10:12:09 +02:00
Diana Picus
703ebca869
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
c7a7767fca.

The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.
2024-09-12 09:11:41 +02:00
Vitaly Buka
c7a7767fca
Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)
Breaks bots, see #105822.

Reverts llvm/llvm-project#105822
2024-09-10 09:51:43 -07:00
Diana Picus
44556e64f2
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
  %func_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %func_exec, label %func, label %tail

func:
  ; ... stuff that should run with the actual EXEC mask
  br label %tail

tail:
  ; ... stuff that runs with all the lanes enabled;
  ; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
2024-09-10 13:24:53 +02:00
Carl Ritson
16cda01d22
[AMDGPU] V_SET_INACTIVE optimizations (#98864)
Optimize V_SET_INACTIVE by allowing it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically be lowered to V_CNDMASK.
Some cases still require exec manipulation and V_MOV, as in the previous code.
GFX9 sees a slight instruction count increase in edge cases due to
its smaller constant bus.

Additionally, avoid introducing exec manipulation and V_MOVs where
a source of V_SET_INACTIVE is the destination.
This is a common pattern as WWM register pre-allocation often
assigns the same register.
2024-09-05 14:39:28 +09:00
Stephen Tozer
9a58b12fe7 [ExtendLifetimes][NFC] Add explicit triple to new fake-use tests
Several tests for the new fake use intrinsic are failing on NVPTX
buildbots due to relying on behaviour for their expected triple;
this commit adds that triple to each of them to prevent failures.

Fixes commit 3d08ade (#86149).

Example buildbot failures:
https://lab.llvm.org/buildbot/#/builders/160/builds/4175
https://lab.llvm.org/buildbot/#/builders/180/builds/4173
2024-08-29 18:43:35 +01:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for `this` pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be enabled by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or `this`) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Matt Arsenault
b1bcb7ca46 Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.

Drop the -O3 checks from default-attributes.hip. I don't know why they
are different on some bots but reverting this is far too disruptive.
2024-07-15 11:51:44 +04:00
paperchalice
c09ed6a29e
[CodeGen][NewPM] Port MachineVerifier to new pass manager (#98628)
- Add `MachineVerifierPass`.
- Use complete `MachineVerifierPass` in `VerifyInstrumentation` if
possible.

`LiveStacksAnalysis` will be added in the future; all other analyses are
done.
2024-07-15 12:42:44 +08:00
dyung
adaff46d08
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.

The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614

These bots have been broken for a day, so reverting to get everything
back to green.
2024-07-14 18:48:54 -07:00
Matt Arsenault
78bc1b64a6
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.

Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
2024-07-14 08:36:33 +04:00
Scott Linder
9f5756abef [MIR] Replace bespoke DIExpression parser
Resolve FIXME by using the LLParser implementation of parseDIExpression
from the MIParser.
2024-07-10 19:26:13 +00:00
Stephen Chou
3c24eb39fb
[LLVM][MIR] Support parsing bfloat immediates in MIR parser (#96010)
Adds support in MIR parser for parsing bfloat immediates, and adds a
test for this.
2024-06-25 16:44:14 -04:00
Alan Zhao
836703087d
[BranchFolder] Fix missing debug info with tail merging (#94715)
`BranchFolder::TryTailMergeBlocks(...)` removes unconditional branch
instructions and then recreates them. However, this process loses debug
source location information from the previous branch instruction, even
if tail merging doesn't change IR. This patch preserves the debug
information from the removed instruction and attaches it to the
recreated instruction.

Fixes #94050
2024-06-20 10:48:18 -07:00
paperchalice
1bc8b3258e
[NewPM][CodeGen] Port regallocfast to new pass manager (#94426)
This pull request ports `regallocfast` to the new pass manager. It exposes
the parameter `filter` to handle different register classes for AMDGPU.
IIUC AMDGPU needs to allocate different register classes separately, so it
needs to implement its own `--<reg-class>-regalloc`. Now users can use e.g.
`-passes=regallocfast<filter=sgpr>` to allocate a specific register class.
The command line option `--regalloc-npm` is still a work in progress; the plan
is to reuse the syntax of passes, e.g. use
`--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to
replace `--sgpr-regalloc` and `--vgpr-regalloc`.
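
To make the planned syntax concrete, here is a small Python sketch that splits such a string into pass names and parameters. It is not LLVM's parser: `parse_pipeline` is a hypothetical helper, the `;` separator between multiple parameters is an assumption, and nested pipelines are not handled.

```python
import re

# One entry: a pass name optionally followed by <key=value;...>.
_PASS_RE = re.compile(r"^(?P<name>[\w-]+)(?:<(?P<params>[^<>]*)>)?$")


def parse_pipeline(text):
    """Parse 'name<key=value>,name<...>' into a list of (name, params) pairs."""
    passes = []
    for item in text.split(","):
        match = _PASS_RE.match(item.strip())
        if match is None:
            raise ValueError(f"malformed pass entry: {item!r}")
        params = {}
        for param in filter(None, (match.group("params") or "").split(";")):
            key, _, value = param.partition("=")
            params[key] = value
        passes.append((match.group("name"), params))
    return passes


print(parse_pipeline("regallocfast<filter=sgpr>,greedy<filter=vgpr>"))
```

Each allocator in the list carries its own `filter`, which is how a single option could stand in for the separate per-register-class options.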
2024-06-07 12:22:42 +08:00
paperchalice
e7939d0df6
[Instrumentation] Support verifying machine function (#90931)
We need it to test isel-related passes. Currently
`verifyMachineFunction` is incomplete (no LiveIntervals support), but it is
enough for testing isel passes. We will migrate to the complete
`MachineVerifierPass` in the future.
2024-05-04 09:00:59 +08:00
Fangrui Song
b9ae06ba15 [test] Convert text files from CRLF to LF
Skip *.pdb, *.rc, *crlf*, and FileCheck/dos-style-eol.txt
2024-05-03 10:09:52 -07:00
David Tellenbach
cf2f32c97f
[MIR] Serialize MachineFrameInfo::isCalleeSavedInfoValid() (#90561)
For functions without a stack frame, no "stack" field is
serialized into MIR, which leads to isCalleeSavedInfoValid being false
when reading a MIR file back in. To fix this we should serialize
MachineFrameInfo::isCalleeSavedInfoValid() into MIR.
2024-05-01 10:07:51 -07:00
Jonas Paulsson
09bc6abba6
[MachineFrameInfo] Refactoring around computeMaxCallFrameSize() (NFC) (#78001)
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.

- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
2024-03-18 10:37:59 -04:00
Sameer Sahasrabuddhe
ec34699f75
[GlobalISel] convergence control tokens and intrinsics (#67006)
[GlobalISel] Implement convergence control tokens and intrinsics in GMIR

In the IR translator, convert the LLVM token type to LLT::token(), which is an
alias for the s0 type. These show up as implicit uses on convergent operations.

Differential Revision: https://reviews.llvm.org/D158147
2024-03-18 10:34:11 +05:30