341 Commits

Author SHA1 Message Date
Christudasan Devadasan
488d3924dd
[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508) 2024-10-16 13:22:57 +05:30
Daniel Paoliello
c9f27275c1
[clang][aarch64] Add support for the MSVC qualifiers __ptr32, __ptr64, __sptr, __uptr for AArch64 (#111879)
MSVC has a set of qualifiers to allow using 32-bit signed/unsigned
pointers when building 64-bit targets. This is useful for WoW code
(i.e., the part of Windows that handles running 32-bit applications on a
64-bit OS). Currently this is supported on x64 using the 270, 271 and
272 address spaces, but does not work for AArch64 at all.

This change adds the same 270, 271 and 272 address spaces to AArch64 and
adjusts the data layout string accordingly. Clang will generate the
correct address space casts, but these will currently be ignored until
the AArch64 backend is updated to handle them.
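
To make the signed/unsigned distinction concrete, here is a minimal sketch of how a 32-bit pointer value would be widened to 64 bits under each qualifier. The helper names `widenSptr` and `widenUptr` are hypothetical and only model the arithmetic; they are not Clang or LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical model of `__ptr32 __sptr`: sign-extend the 32-bit value.
std::uint64_t widenSptr(std::uint32_t p) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(p)));
}

// Hypothetical model of `__ptr32 __uptr`: zero-extend the 32-bit value.
std::uint64_t widenUptr(std::uint32_t p) {
  return static_cast<std::uint64_t>(p);
}
```

The two results differ only when the top bit is set: `0x80000000` becomes `0xFFFFFFFF80000000` when sign-extended and `0x0000000080000000` when zero-extended.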

Partially fixes #62536

This is a resurrected version of <https://reviews.llvm.org/D158857>
(originally created by @a_vorobev) - I've cleaned it up a little, fixed
the rest of the tests and added to auto-upgrade for the data layout.
2024-10-15 10:37:36 -07:00
Paul Walker
665457815f [LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels. 2024-10-09 14:17:39 +00:00
Paul Walker
1b3fc75451 Revert "[LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels."
This reverts commit 886d98e149843f3890ef4dd556a5dee45ff97fe9.
2024-10-09 10:44:00 +00:00
Paul Walker
886d98e149 [LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels. 2024-10-09 10:22:52 +00:00
rjmansfield
0717898124
Fix cl::desc typos in aarch64-enable-dead-defs and arm-implicit-it. (#106712) 2024-08-30 19:15:05 +01:00
Sander de Smalen
6c189eaea9
[AArch64] Add SME peephole optimizer pass (#104612)
This pass removes back-to-back smstart/smstop instructions
to reduce the number of streaming mode changes in a function.

The implementation as proposed doesn't aim to solve all problems
yet and suggests a number of cases that can be optimized in the
future.
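
As a rough illustration of the idea, the sketch below cancels strictly adjacent mode toggles in a toy instruction list. The `Op` enum and `removeBackToBackToggles` are hypothetical names; the real pass operates on machine instructions and has to check operands and legality, which this model ignores.

```cpp
#include <vector>

// Toy instruction kinds; anything that is not a mode toggle is `Other`.
enum class Op { SMStart, SMStop, Other };

// Drops a toggle when it directly follows the opposite toggle, so the
// pair has no net effect on the streaming mode.
std::vector<Op> removeBackToBackToggles(const std::vector<Op> &in) {
  std::vector<Op> out;
  for (Op op : in) {
    bool cancels =
        !out.empty() &&
        ((out.back() == Op::SMStart && op == Op::SMStop) ||
         (out.back() == Op::SMStop && op == Op::SMStart));
    if (cancels)
      out.pop_back(); // remove the earlier toggle and skip this one
    else
      out.push_back(op);
  }
  return out;
}
```

Toggles separated by any other instruction are left untouched, which matches the conservative scope described in the message.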
2024-08-21 09:44:01 +01:00
Anatoly Trosinenko
cc53b953ac
[AArch64] When hardening against SLS, only create called thunks (#97472)
In preparation for implementing hardening of BLRA* instructions,
restrict thunk function generation to only the thunks being actually
called from any function. As described in the existing comments,
emitting all possible thunks for BLRAA and BLRAB instructions would mean
adding about 1800 functions in total, most of which are likely not to be
called.

This commit merges AArch64SLSHardening class into SLSBLRThunkInserter,
so thunks can be created as needed while rewriting a machine function.
The usages of TII, TRI and ST fields of AArch64SLSHardening class are
replaced with requesting them in-place, as ThunkInserter assumes
multiple "entry points" in contrast to the only runOnMachineFunction
method of AArch64SLSHardening.

The runOnMachineFunction method essentially replaces pre-existing
insertThunks implementation as there is no more need to insert all
possible thunks unconditionally. Instead, thunks are created on first
use from inside of insertThunks method.
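
The "create on first use" approach can be sketched as a small table keyed by thunk kind and register. `ThunkTable`, `getOrCreate` and the generated names are hypothetical and stand in for the real thunk inserter, which creates machine functions rather than strings.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

class ThunkTable {
public:
  // Returns the thunk name for (kind, reg), creating the entry on first use.
  const std::string &getOrCreate(const std::string &kind, unsigned reg) {
    auto key = std::make_pair(kind, reg);
    auto it = thunks_.find(key);
    if (it == thunks_.end()) {
      std::string name =
          "__hypothetical_thunk_" + kind + "_x" + std::to_string(reg);
      it = thunks_.emplace(key, std::move(name)).first;
    }
    return it->second;
  }

  // Number of thunks actually created so far.
  std::size_t created() const { return thunks_.size(); }

private:
  std::map<std::pair<std::string, unsigned>, std::string> thunks_;
};
```

Only the combinations that are requested ever exist, so the table stays small even when the space of possible thunks is large.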
2024-07-05 13:12:09 +03:00
Nikita Popov
5cd0ba30f5
Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321) (#96462)
On MSVC the `this` uses inside `decltype` require a lambda capture. On
clang they result in an unused capture warning instead. Add the capture
and suppress the warning with `(void)this`.

-----

Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
2024-06-24 15:00:11 +02:00
Nikita Popov
e5a41f0afc Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)"
My attempt to fix the Windows build made things worse,
revert entirely for now.

This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c.
This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5.
This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.
2024-06-24 10:32:03 +02:00
Nikita Popov
957dc4366d
[IR] Lazily initialize the class to pass name mapping (NFC) (#96321)
Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.

This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.

This way there is no compile-time impact if the mapping is not used.
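
A minimal sketch of the lazy scheme, assuming a plain string-to-string map: callbacks are stored at registration time and run once, on the first lookup. `LazyNameMap`, `registerInit` and `lookup` are hypothetical names, not the actual LLVM interface.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class LazyNameMap {
public:
  using Map = std::map<std::string, std::string>;
  using InitCallback = std::function<void(Map &)>;

  // Cheap: only stores the callback, does not populate the map.
  void registerInit(InitCallback cb) { pending_.push_back(std::move(cb)); }

  // Runs any pending callbacks, then looks up the name.
  // Returns an empty string when the class name is unknown.
  std::string lookup(const std::string &className) {
    runPending();
    auto it = names_.find(className);
    return it == names_.end() ? std::string() : it->second;
  }

private:
  void runPending() {
    std::vector<InitCallback> todo;
    todo.swap(pending_); // leaves pending_ empty, so each callback runs once
    for (auto &cb : todo)
      cb(names_);
  }

  Map names_;
  std::vector<InitCallback> pending_;
};
```

If `lookup` is never called, none of the callbacks run, which is the property the message relies on.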
2024-06-24 09:40:09 +02:00
Sander de Smalen
8e0cd7382a
[AArch64] Consider runtime mode when deciding to use SVE for fixed-length vectors. (#96081)
This also fixes the case where an SVE div is incorrectly assumed to be
available in non-streaming mode with SME.
2024-06-20 17:08:14 +01:00
Min-Yih Hsu
37e309f163
[AArch64][LoopIdiom] Generalize AArch64LoopIdiomTransform into LoopIdiomVectorize (#94081)
To facilitate sharing LoopIdiomTransform between AArch64 and RISC-V,
this first patch moves AArch64LoopIdiomTransform from lib/Target/AArch64
to lib/Transforms/Vectorize and renames it to LoopIdiomVectorize. The
following patch (#94082) will teach LoopIdiomVectorize how to generate VP
intrinsics (in addition to the current masked vector style) in favor of
RVV.
2024-06-07 14:06:11 -07:00
paperchalice
7652a59407
Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149)
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
2024-06-04 08:10:58 +08:00
paperchalice
8917afaf0e
Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146)
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1

It still breaks EXPENSIVE_CHECKS build. Sorry.
2024-06-02 14:31:52 +08:00
paperchalice
d2cdc8ab45
[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567)
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support the new pass version. In the new pass manager, `-verify-machineinstrs` belongs to the verify instrumentation and is enabled by default.
2024-06-02 09:12:33 +08:00
Sander de Smalen
1015f51dd9
[AArch64] NFC: Rename -force-streaming-compatible-sve to -force-streaming-compatible (#92774)
The behaviour of the flag should be equivalent to
__arm_streaming_compatible.

At the moment, the name suggests that '-force-streaming-compatible-sve'
on its own (i.e. without specifying `+sve`) enables the compiler to use
the streaming-compatible subset of SVE instructions, but the semantics
merely are that the function can be called with either PSTATE.SM=0 or
PSTATE.SM=1.
2024-05-22 07:58:54 +01:00
Doug Wyatt
ddecadabeb
[clang backend] In AArch64's DataLayout, specify a minimum function alignment of 4. (#90702)
This addresses an issue where the explicit alignment of 2 (for C++ ABI
reasons) was being propagated to the back end and causing under-aligned
functions (in special sections).
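
As a simplified model, assuming the data-layout minimum acts as a lower bound on whatever alignment the front end requests, the effect can be sketched as follows. `effectiveFunctionAlign` is a hypothetical helper, not an LLVM function.

```cpp
#include <algorithm>

// Hypothetical model: the back end never goes below the target minimum.
unsigned effectiveFunctionAlign(unsigned explicitAlign,
                                unsigned minFunctionAlign) {
  return std::max(explicitAlign, minFunctionAlign);
}
```

With a minimum of 4, an explicit alignment of 2 is raised to 4, while larger explicit alignments are kept.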

This is an alternate approach suggested by @efriedma-quic in PR #90415.

Fixes #90358
2024-05-05 19:05:15 -07:00
Sander de Smalen
c3d58867bd
[AArch64][SME] Create new pass to remove COALESCER_BARRIER early. (#85386)
The purpose of the COALESCER_BARRIER pseudo node is to prevent the
register coalescer from coalescing certain COPY instructions around
smstart/smstop instructions, so that we spill only the (required) FPR
register rather than the encompassing ZPR register.

The pseudos are removed in the AArch64ExpandPseudo pass. However,
because the node itself is a _use_ of a register, this occasionally
leads to redundant spills/fills, because the register allocator thinks
the virtual register is actually used before an smstart/smstop
instruction, causing it to be filled, at which point it requires
immediate spilling again to ensure it stays live over the smstart/smstop
instruction.

We can avoid that by removing the pseudo nodes right after coalescing,
but before register allocation.
2024-04-15 15:07:20 +01:00
Nathan Lanza
da385e8251
[aarch64] Unguard GEPOpt from O3
This chunk of code currently runs only if the optimization mode is O3
AND the EnableGEPOpt flag is set. Given that this is the only use case
for the EnableGEPOpt flag, the guarding against O3 is kinda pointless.
If the user wants to enable it then the flag should be sufficient.
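
The change in the guard can be sketched as two predicates. The enum and function names below are hypothetical stand-ins for the real option and optimization-level types.

```cpp
// Hypothetical stand-in for the codegen optimization level.
enum class OptLevel { None, Less, Default, Aggressive };

// Before: the flag only had an effect at the highest optimization level.
bool runsGEPOptBefore(OptLevel level, bool enableGEPOpt) {
  return level == OptLevel::Aggressive && enableGEPOpt;
}

// After: the flag alone decides.
bool runsGEPOptAfter(OptLevel /*level*/, bool enableGEPOpt) {
  return enableGEPOpt;
}
```

The two predicates agree at `Aggressive`; they differ only when the flag is set at a lower level.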

Reviewers: TNorthover, aeubanks

Reviewed By: aeubanks

Pull Request: https://github.com/llvm/llvm-project/pull/86588
2024-03-25 18:08:36 -04:00
paperchalice
29bf32efbb
[NewPM][AArch64] Add AArch64PassRegistry.def (#85215)
PR #83567 ports `SelectionDAGISel` to the new pass manager, then each
backend should provide `<Target>DagToDagISel()` in new pass manager
style. Then each target should provide `<Target>PassRegistry.def` to
register backend passes in `registerPassBuilderCallbacks` to reduce
duplicate code.
This PR adds `AArch64PassRegistry.def` to AArch64 backend and
boilerplate code in `registerPassBuilderCallbacks`.
2024-03-21 10:57:51 +08:00
ostannard
503c55e170
[AArch64] Move SLS later in pass pipeline (#84210)
Currently, the SLS hardening pass is run before the machine outliner,
which means that the outliner creates new functions and calls which do
not have the SLS hardening applied.

The fix for this is to move the SLS passes to after the outliner, as has
recently been done for the return address signing pass.

This also avoids a bug where the SLS outliner emits code with
instructions after a return, which the outliner doesn't correctly
handle.
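
Why the ordering matters can be shown with a toy pipeline in which outlining adds a new function and hardening marks every function that exists when it runs. All names here (`Fn`, `hardenAll`, `outline`, `unhardenedAfterPipeline`) are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Fn {
  std::string name;
  bool hardened;
};
using Module = std::vector<Fn>;

// Marks every function that currently exists in the module.
void hardenAll(Module &m) {
  for (Fn &f : m)
    f.hardened = true;
}

// Models the outliner creating a fresh, unhardened function.
void outline(Module &m) { m.push_back(Fn{"hypothetical_outlined_fn", false}); }

// Runs both steps in the requested order and counts unhardened functions.
std::size_t unhardenedAfterPipeline(bool hardenBeforeOutlining) {
  Module m = {Fn{"f", false}, Fn{"g", false}};
  if (hardenBeforeOutlining) {
    hardenAll(m);
    outline(m);
  } else {
    outline(m);
    hardenAll(m);
  }
  std::size_t n = 0;
  for (const Fn &f : m)
    if (!f.hardened)
      ++n;
  return n;
}
```

Hardening first leaves the outlined function unprotected; hardening last covers it.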
2024-03-07 09:28:49 +00:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
Yuta Mukai
70eab122bc
[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589)
Add AArch64 implementations for the interfaces of MachinePipeliner pass.
The pass is disabled by default for AArch64. It is enabled by specifying
--aarch64-enable-pipeliner.

5 tests in llvm-test-suites show performance improvement by more than 5%
on a Neoverse V1 processor.

| test | improvement |
| ---------------------------------------------------------------- | -----------:|
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test | 16% |
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test | 16% |
| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test | 14% |
| SingleSource/Benchmarks/Misc/flops-5.test | 13% |
| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test | 6% |

(base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm
-aarch64-enable-pipeliner -mllvm
-pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm
-pipeliner-enable-copytophi=0)

On the other hand, there are cases of significant performance
degradation. Algorithm improvements and adding the option/pragma will be
needed in the future.
2024-02-02 10:33:44 +09:00
Eli Friedman
a6065f0fa5
Arm64EC entry/exit thunks, consolidated. (#79067)
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.

Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.

The other changes are supporting changes, to handle the new constructs
generated by that pass.

There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.

There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.

I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.

Thanks to @dpaoliello for extensive testing and suggestions.

(Originally posted as https://reviews.llvm.org/D157547 .)
2024-01-22 21:28:07 -08:00
David Sherwood
c7148467fc
[AArch64] Add an AArch64 pass for loop idiom transformations (#72273)
We have added a new pass that looks for loops such as the following:

```
  while (i != max_len)
      if (a[i] != b[i])
          break;

  ... use index i ...
```

Although similar to a memcmp, this is slightly different because instead
of returning the difference between the values of the first non-matching
pair of bytes, it returns the index of the first mismatch. As such, we
are not able to lower this to a memcmp call.

The new pass can now spot such idioms and transform them into a
specialised predicated loop that gives a significant performance
improvement for AArch64. It is intended as a stop-gap solution until
this can be handled by the vectoriser, which doesn't currently deal with
early exits.

This specialised loop makes use of a generic intrinsic that counts the
trailing zero elements in a predicate vector. This was added in
https://reviews.llvm.org/D159283 and for SVE we end up with brkb & incp
instructions.
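
The shape of the transformation can be approximated in portable C++. The sketch below compares a scalar mismatch search with a chunked one that builds a per-lane "not equal" predicate and takes the index of its first set lane, which plays the role of the count-trailing-zero-elements intrinsic. The fixed width `kLanes` and all function names are assumptions for illustration; the real pass emits scalable-vector IR, not this code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Scalar reference: first index in [i, max_len) where a and b differ,
// or max_len if they match throughout. Assumes i <= max_len.
std::size_t mismatchScalar(const std::uint8_t *a, const std::uint8_t *b,
                           std::size_t i, std::size_t max_len) {
  while (i != max_len) {
    if (a[i] != b[i])
      break;
    ++i;
  }
  return i;
}

constexpr std::size_t kLanes = 16; // hypothetical fixed vector width

// Stand-in for counting trailing zero elements of a predicate:
// index of the first true lane, or kLanes if none is set.
std::size_t firstSetLane(const bool (&pred)[kLanes]) {
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (pred[lane])
      return lane;
  return kLanes;
}

// Chunked search: only lanes below `active` are compared, which models
// the loop predicate that keeps accesses inside [i, max_len).
std::size_t mismatchChunked(const std::uint8_t *a, const std::uint8_t *b,
                            std::size_t i, std::size_t max_len) {
  while (i < max_len) {
    std::size_t active = std::min(kLanes, max_len - i);
    bool ne[kLanes] = {};
    for (std::size_t lane = 0; lane < active; ++lane)
      ne[lane] = a[i + lane] != b[i + lane];
    std::size_t first = firstSetLane(ne);
    if (first < active)
      return i + first; // index of the first mismatch
    i += active;
  }
  return max_len;
}
```

Both functions return the index of the first mismatch rather than a byte difference, which is the distinction from memcmp drawn above.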

Although we have added this pass only for AArch64, it was written in a
generic way so that in theory it could be used by other targets.
Currently the pass requires scalable vector support and needs to know
the minimum page size for the target, however it's possible to make it
work for fixed-width vectors too. Also, the llvm.experimental.cttz.elts
intrinsic used by the pass has generic lowering, but can be made
efficient for targets with instructions similar to SVE's brkb, cntp and
incp.

Original version of patch was posted on Phabricator:

 https://reviews.llvm.org/D158291

Patch co-authored by Kerry McLaughlin (@kmclaughlin-arm) and David
Sherwood (@david-arm)

See the original discussion on Discourse:

https://discourse.llvm.org/t/aarch64-target-specific-loop-idiom-recognition/72383
2024-01-09 11:29:28 +00:00
Momchil Velikov
ac06d4e4cb Re-commit "[MachineSink][AArch64] Enable sink-and-fold by default (#72132)"
This re-commits 13fe0386454d after fixing a couple of issues in the LLDB
testsuite in ef9bcace834e and 6b87d84ff45d
2023-11-27 11:28:22 +00:00
Momchil Velikov
4ac5b0da8d Revert "[MachineSink][AArch64] Enable sink-and-fold by default (#72132)"
This reverts commit 13fe0386454d2f4c9bad4e20fc59699d1a49b8cf.

May have broken an LLDB test https://lab.llvm.org/buildbot/#/builders/96/builds/48609
2023-11-16 17:07:39 +00:00
Momchil Velikov
13fe038645
[MachineSink][AArch64] Enable sink-and-fold by default (#72132)
Enable the optimisation by default for AArch64 after a compile time
regression fix in e8209b2486d8
2023-11-16 12:12:56 +00:00
David Sherwood
bdc0afc871
[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets (#71166)
There are some workloads that are negatively impacted by using jump
tables when the number of entries is small. The SPEC2017 perlbench
benchmark is one example of this, where increasing the threshold to
around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum
threshold based on empirical evidence rather than science, and just
manually increased the threshold until I got the best performance
without impacting other workloads. For neoverse-v1 I saw around ~0.2%
improvement in the SPEC2017 integer geomean, and no overall change for
neoverse-n1. If we find issues with this threshold later on we can
always revisit this.
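
The effect of the threshold alone can be sketched as a single comparison. This is a deliberate simplification: `meetsMinJumpTableEntries` is a hypothetical helper, and real switch lowering weighs other factors besides the case count.

```cpp
// Thresholds taken from this message: 13 is the new AArch64 minimum,
// 4 is the value the affected tests are set back to.
constexpr unsigned kNewMinEntries = 13;
constexpr unsigned kOldMinEntries = 4;

// Hypothetical model: a switch is only a jump-table candidate when it
// has at least `minEntries` cases.
bool meetsMinJumpTableEntries(unsigned numCases, unsigned minEntries) {
  return numCases >= minEntries;
}
```

Under this model a switch with 8 cases was a candidate with the threshold at 4 and no longer is with the threshold at 13.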

The most significant SPEC2017 score changes on neoverse-v1 were:

500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%

and the rest saw changes < 0.5%.

I updated CodeGen/AArch64/min-jump-table.ll to reflect the new
threshold. For most of the affected tests I manually set the min number
of entries back to 4 on the RUN line because the tests seem to rely upon
this behaviour.
2023-11-14 13:00:28 +00:00
Oliver Stannard
339faffd05 Revert "[AArch64] Move SLS later in pass pipeline"
The (MF.size() == 0) assertion is triggering when building at -O0.
Reverting this while I work out what is going wrong.

This reverts commit 7e8eccd990d37d2771ca5ad7a84f54c3cfc4a5e1.
2023-10-26 09:50:13 +01:00
Momchil Velikov
9d35387811
[AArch64] Disable by default MachineSink sink-and-fold (#70101)
There is a report about a large compile time regression in V8 when
generating debug info.
2023-10-25 10:58:31 +01:00
Oliver Stannard
7e8eccd990 [AArch64] Move SLS later in pass pipeline
Currently, the SLS hardening pass is run before the machine outliner,
which means that the outliner creates new functions and calls which do
not have the SLS hardening applied.

The fix for this is to move the SLS passes to after the outliner, as has
recently been done for the return address signing pass.

This also avoids a bug where the SLS outliner emits code with
instructions after a return, which the outliner doesn't correctly
handle.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D158511
2023-10-25 10:45:12 +01:00
Momchil Velikov
d15fff6c69 Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'
This reverts revert 19505072123e43eccf528b660973067b5c9b4a26.

An issue was fixed in bea3684944c0d7962cd53ab77aad756cfee76b7c
and some newly appeared tests updated.
2023-10-19 13:18:25 +01:00
Amara Emerson
1950507212 Revert "Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'"
This reverts commit dbb9faedec5e28ab3f584f5e14d31e475ac268ac.

This seems to cause miscompiles on CTMark/sqlite3 and others with GISel.
2023-10-15 14:16:37 -07:00
Momchil Velikov
dbb9faedec Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)'
This re-applies commit a9d0ab2ee572f179f80483f3ebbbcdd03c3b4481, which
was reverted by 8abb2ace888bdd04a1bdb4ac2f2fc25d57a5760a.

The issue was fixed by 7510f32f906ab4e583542eae2611b020f88629af
2023-10-13 12:14:22 +01:00
Caroline Tice
8abb2ace88 Revert "Re-apply "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)""
This reverts commit a9d0ab2ee572f179f80483f3ebbbcdd03c3b4481.
That commit is causing clang crashes.
2023-10-06 20:51:48 -07:00
Momchil Velikov
a9d0ab2ee5 Re-apply "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)"
This re-applies commit ace20e24287b, which was reverted in eff4ef25b3dc.

The issues were fixed in:

  * b30765caf874 [AArch64] Fix an incorrect handling of debug values in
    MachineSink (#68107)

  * b454b04d6869 [AArch64] Fix a compiler crash in MachineSink (#67705)
2023-10-06 09:34:42 +01:00
Momchil Velikov
eff4ef25b3 Revert "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)"
This reverts commit ace20e24287bf531bb1185e213642c3b49eb293c.

This might be causing a buildbot failure at
https://green.lab.llvm.org/green/job/clang-stage1-RA/35786/
2023-09-27 14:24:59 +01:00
Momchil Velikov
ace20e2428
[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432) 2023-09-27 10:05:32 +01:00
Momchil Velikov
c649fd34e9 [MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode
This patch adds a new code transformation to the `MachineSink` pass,
that tries to sink copies of an instruction, when the copies can be folded
into the addressing modes of load/store instructions, or
replace another instruction (currently, copies into a hard register).

The criteria for performing the transformation are that:
* the register pressure at the sink destination block must not
  exceed the register pressure limits
* the latency and throughput of the load/store or the copy must not deteriorate
* the original instruction must be deleted

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D152828
2023-09-25 10:49:44 +01:00
Anatoly Trosinenko
eb02ee44d3 [AArch64] Move PAuth codegen down the machine pipeline
To simplify handling PAuth in the machine outliner, introduce a
separate AArch64PointerAuth pass that is executed after both
Prologue/Epilogue Inserter and Machine Outliner passes.

After moving to AArch64PointerAuth, signLR and authenticateLR are
not used outside of their class anymore, so make them private and
simplify accordingly.

The new pass is added via AArch64PassConfig::addPostBBSections(),
so that it can change the code size before branch relaxation occurs.
AArch64BranchTargets is placed there too, so it can take into account
any PACI(A|B)SP instructions and not excessively add BTIs at the start
of functions.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D159357
2023-09-22 14:49:14 +03:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
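
What the switch to `enum class` buys callers can be checked at compile time. The sketch below mirrors the names from the replacement patterns above inside hypothetical `before` and `after` namespaces; the enumerator lists are assumptions for illustration, not copies of the LLVM headers.

```cpp
#include <type_traits>

namespace before {
namespace CodeGenOpt {
enum Level { None, Less, Default, Aggressive }; // unscoped enum
}
} // namespace before

namespace after {
enum class CodeGenOptLevel { None, Less, Default, Aggressive }; // scoped enum
}

// An unscoped enum converts to int implicitly, so a misplaced argument
// can compile silently.
static_assert(std::is_convertible<before::CodeGenOpt::Level, int>::value,
              "unscoped enum converts to int implicitly");

// A scoped enum does not, so the same mistake becomes a compile error.
static_assert(!std::is_convertible<after::CodeGenOptLevel, int>::value,
              "enum class requires an explicit cast");

// Conversions remain possible, but must be spelled out.
int levelAsInt(after::CodeGenOptLevel level) {
  return static_cast<int>(level);
}
```

This is why the change makes call-site problems visible when a parameter list is reordered later.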
2023-09-14 14:10:14 -07:00
Sander de Smalen
ecb7b9c5c5 [Clang][AArch64] Diagnostics for SME attributes when target doesn't have 'sme'
This patch adds error diagnostics to Clang when code uses the AArch64 SME
attributes without specifying 'sme' as an available target attribute.

* Function definitions marked as '__arm_streaming', '__arm_locally_streaming',
  '__arm_shared_za' or '__arm_new_za' will by definition use or require SME
  instructions.
* Calls from non-streaming functions to streaming-functions require
  the compiler to enable/disable streaming-SVE mode around the call-site.

In some cases we can accept the SME attributes without having 'sme' enabled:
* Function declarations can have the SME attributes.
* Definitions can be __arm_streaming_compatible since the generated
  code should execute on processing elements without SME.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D157269
2023-08-09 12:31:02 +00:00
Zhongyunde
05aae0839f Reland [AArch64][NFC] Call the API getVScaleRange directly
Use the maximum 64 for BitWidth of getVScaleRange to avoid returning an empty range.

The previous change caused a Buildbot failure because of the self-assignment MinSVEVectorSize = MinSVEVectorSize:
    error: explicitly assigning value of variable of type 'unsigned int' to itself [-Werror,-Wself-assign]

Reviewed By: sdesmalen, nikic, dmgreen
Differential Revision: https://reviews.llvm.org/D155708
2023-07-26 18:55:31 +08:00
Zhongyunde
ebaac2b2d6 Revert "[AArch64][NFC] Call the API getVScaleRange directly"
This reverts commit 67005c8e6fa9464f8bc436305a422071013ae499.
2023-07-26 16:44:14 +08:00
Zhongyunde
67005c8e6f [AArch64][NFC] Call the API getVScaleRange directly
Use the maximum 64 for BitWidth of getVScaleRange to
avoid returning an empty range.

Reviewed By: sdesmalen, nikic, dmgreen
Differential Revision: https://reviews.llvm.org/D155708
2023-07-26 15:54:04 +08:00
Daniel Hoekwater
0315fca912 [AArch64] Move branch relaxation after bbsection assignment
Because branch relaxation needs to factor in if branches target
a block in the same section or a different one, it needs to run
after the Basic Block Sections / Machine Function Splitting passes.

Because Jump table compression relies on block offsets remaining
fixed after the table is compressed, we must also move the JT
compression pass.

The only tests affected are ones enforcing just the ordering and
a few that have basic block ids changed because RenumberBlocks
hasn't run yet.

Differential Revision: https://reviews.llvm.org/D153829
2023-07-21 20:24:52 +00:00
Sander de Smalen
08fd44b300 [AArch64] Force streaming-compatible codegen when attributes are set.
Before this patch, the only way to generate streaming-compatible code
was to use the `-force-streaming-compatible-sve` flag, but the compiler
should also avoid the use of instructions invalid in streaming mode
when a function has the aarch64_pstate_sm_enabled/compatible attribute.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D155428
2023-07-18 10:26:00 +00:00
Sami Tolvanen
e9569748de [CodeGen][KCFI] Move cfi-type lowering to TargetLowering
KCFI machine function passes transform indirect calls with a
cfi-type attribute into architecture-specific type checks bundled
together with the calls. Instead of having a separate pass for each
architecture, add a generic machine function pass for KCFI and
move the architecture-specific code that emits the actual check to
TargetLowering. This avoids unnecessary duplication and makes it
easier to add KCFI support to other architectures.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D149915
2023-05-09 18:38:54 +00:00