45995 Commits

Author SHA1 Message Date
Johannes Doerfert
f6e3a89cc0 [AMDGPU] Annotate the intrinsics to be default and nocallback
Differential Revision: https://reviews.llvm.org/D135155
2022-12-07 14:25:25 -08:00
Jon Chesterfield
d77ae7f251 [amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139433
2022-12-07 22:02:54 +00:00
Chris Bieneman
c861ea8736 Generate DXIL Shader hash
DXIL shader bitcode is hashed and the hash is placed into the final
output object file in its own data part.

This change modifies the DXContainerGlobals pass to compute the shader
hash (just an MD5 of the bitcode) and put the shader hash data into a
global for the HASH part.

This also sets the hash flag as appropriate for if the hashed shader
contained debug information. There is additional handling required to
get debug information in shaders working correctly with our tooling,
but that will be addressed in subsequent patches.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D139357
2022-12-07 15:22:55 -06:00
Koakuma
f8f41c3fcd [SPARC] Lower SELECT_CC to MOVr on 64-bit target whenever possible
On 64-bit target, when doing i64 SELECT_CC where one of the comparison operands
is a constant zero, try to fold the compare and MOVcc into a MOVr instruction.

For all integers, EQ and NE comparison are available, additionally for signed
integers, GT, GE, LT, and LE is also available.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138922
2022-12-07 15:34:58 -05:00
Brad Smith
7806f86a5e Revert "[SPARC] Mark the %g0 register as constant & use it to materialize zeros"
2 of the Sparc tests are now failing.

This reverts commit 2c41310fc146a1f609147c65ac5f30e5a57e84a8.
2022-12-07 15:27:57 -05:00
Tim Northover
6b98824a58 AArch64: emit fcmp ord %a, zeroinitializer as a single fcmeq.
Most "ord" checks need two real-world compares to implement, but this is the
canonical form of a "!isnan" check, which is equivalent to comparing the input
for equality against itself.
2022-12-07 19:17:30 +00:00
Koakuma
2c41310fc1 [SPARC] Mark the %g0 register as constant & use it to materialize zeros
Materialize zeros by copying from %g0, which is now marked as constant.

This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.

This continues @arichardson's patch at D132561.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138887
2022-12-07 13:34:13 -05:00
Joe Nash
bbfbec94b1 [AMDGPU] Enable OMod on more VOP3 instructions
OMod was disabled if OpSel was enabled, but that restriction is more
specific than necessary. Any VOP3 with float operands can use OMod.

On GFX11, FMAC_F16_e64 can use op_sel.
Previously, SIFoldOperands and convertToThreeAddress were accidentally correct when
they reinterpreted the zero OMod operand on V_FMAC_F16_e64 as the OpSel operand on
V_FMA_F16_gfx9_e64. Now we explicitly add op_sel if required.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139469
2022-12-07 13:30:33 -05:00
Amara Emerson
53445f5b1c [GlobalISel] Add a new G_INVOKE_REGION_START instruction to fix an EH bug.
We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.

Say we have:

throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  B returnbb

bb:
  %v = phi i1 %true, throwbb, %false....

When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).

throwbb:
  EH_LABEL
  x0 = %callarg1
  BL @may_throw_call
  EH_LABEL
  %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
  B returnbb

bb:
  %v = phi i32 %true, throwbb, %false....

To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.

Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.

Differential Revision: https://reviews.llvm.org/D137905
2022-12-07 10:28:51 -08:00
Craig Topper
f1fd5c9b36 [RISCV] Remove pseudos for whole register load, store, and move.
The MC layer instructions have the correct register classes, and
the pseudos don't have any additional operands. So there doesn't
seem to be any reason for them to exist.

The pseudos were incorrectly going through code in RISCVMCInstLower
that converted LMUL>1 register classes to LMUL1 register class.
This makes the MCInst technically malformed, and prevented the
vl2r.v, vl4r.v, and vl8r.v InstAliases from matching. This accounts
for all of the .ll test diffs.

Differential Revision: https://reviews.llvm.org/D139511
2022-12-07 10:19:58 -08:00
Craig Topper
938d0d6d7b [RISCV] Replace uses of hasStdExtC with COrZca.
Except MakeCompressible which will need more work.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139504
2022-12-07 09:34:01 -08:00
Mirko Brkusanin
fe42ebe442 [AMDGPU][GlobalISel] Fix legalizing image intrinsics for new types
We no longer need to increase vector size to 16 for intrinsics that use more
than 8 vgprs for addr. There is no image intrinsic that needs more than 12
so all currently existing cases will be covered. Using incorrect size was
causing an error in instruction selection because instructions were updated
to require new types (9x32, 10x32, 11x32, 12x32).

Differential Revision: https://reviews.llvm.org/D139546
2022-12-07 18:20:58 +01:00
Joe Nash
c6c4f54e6b [AMDGPU] Add gfx11 runline to omod test. NFC 2022-12-07 11:40:39 -05:00
Phoebe Wang
fc16ca7120 [X86] Pre-commit test for pr59305 2022-12-08 00:35:42 +08:00
Doru Bercea
f7d426b362 Automate tests. 2022-12-07 10:05:02 -06:00
Simon Pilgrim
519adee201 [X86] combine-and.ll - add some 256/512-bit test coverage for D138521 2022-12-07 14:21:24 +00:00
Simon Pilgrim
abe6dbeedd [X86] combine-and.ll - add AVX1 test coverage 2022-12-07 14:21:24 +00:00
Nico Weber
bfe292b00f [llvm] Auto-cleanup left-over file from earlier version of this test on bots
When this test was originally added in 991dfedfd738ce, it didn't pass
`-o` to to llc, causing llc to write a .s file to the source directory.
On the next run, lit would then try to run that as a test.
Make the test auto-cleanup that file for a while.
2022-12-07 09:09:45 -05:00
Haojian Wu
3abdd9b91b Fix a typo ll->llc in test 2022-12-07 13:24:06 +01:00
Haojian Wu
a2215149ae Add implementation isTargetCanonicalConstantNode for hexagon.
This fixes an infinite compiling loop caused by https://reviews.llvm.org/D137140

Differential Revision: https://reviews.llvm.org/D139525
2022-12-07 13:21:29 +01:00
Janek van Oirschot
587747d8d1 [AMDGPU] G_IS_FPCLASS lower() support for IEEE fp types
Simplified globalisel version of sdag's expandIS_FPCLASS.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D139128
2022-12-07 11:53:09 +00:00
Anton Sidorenko
f8ed709345 [MachineCombiner] Extend reassociation logic to handle inverse instructions
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.

Differential Revision: https://reviews.llvm.org/D136754
2022-12-07 13:50:28 +03:00
Piotr Sobczak
1e3abd82b9 [AMDGPU] Fix wide spills
Update spill code to account for new vector types with
bit widths: 288, 320, 352, 384.

Related to D138205.

Differential Revision: https://reviews.llvm.org/D139203
2022-12-07 10:23:52 +01:00
David Sherwood
bfb6f47e9e [SVE] Change some bfloat lane intrinsics to use i32 immediates
Almost all of the other SVE LLVM IR intrinsics take i32 values
for lane indices or other immediates. We should bring the bfloat
intrinsics in line with that. It will also make it easier to
add support for the SVE2.1 float intrinsics in future, since
they reuse the same underlying instruction classes.

I've maintained backwards compatibility with the old i64 variants
and used the autoupgrade mechanism.

Differential Revision: https://reviews.llvm.org/D138788
2022-12-07 09:19:54 +00:00
Qiu Chaofan
62f20f51ce [PowerPC] Support test data class intrinsic of 128-bit float
We've exploited test data class instructions introduced in ISA 3.0.
This change unifies the scalar intrinsics into ppc_test_data_class
and add support for 128-bit precision float values using xststdcqp.

Vector versions of the intrinsic can't be unified because they return
vector int instead of int.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138105
2022-12-07 16:44:12 +08:00
Yeting Kuo
0f8c761c48 [VP][RISCV] Recommit "Add vp.fshl/fshr and RISC-V support."
This reverts commit 7883e5b061bdbbe8bee5f479ebe911db5045b7e9.

The original commit was reverted that it didn't update test files after D136263
landed. The recommit fixed those.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139509
2022-12-07 15:58:12 +08:00
Justin Bogner
bcfdaa96f5 [AMDGPU] Handle min(max(x, y), max(min(x, y), z)) in med3 combines
Differential Revision: https://reviews.llvm.org/D139508
2022-12-06 22:59:43 -08:00
Justin Bogner
916ae0a060 [AMDGPU] Handle nnan and fast on the call in fpmed3 patterns
We were only allowing these med3 patterns if the operands were known
to not be NaN, but we should also allow it if the calls to max/min
have the `nnan` or `fast` flags.

Differential Revision: https://reviews.llvm.org/D139506
2022-12-06 22:57:52 -08:00
Justin Bogner
f1353d2f1a [AMDGPU] Precommit GISel test for min(max(x, y), max(min(x, y), z)) -> med3
These combines will be added by https://reviews.llvm.org/D139508
2022-12-06 22:57:39 -08:00
Rahman Lavaee
6015a045d7 [Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number.
Let Propeller use specialized IDs for basic blocks, instead of MBB number.

This allows optimizations not just prior to asm-printer, but throughout the entire codegen.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.

####Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR.  This is done as follows.
    - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
    - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
    - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization.  Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
    - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
    - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR.  Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
    - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline.  Hence, MBB numbers are not suitable and we need something else.
####Solution
We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block.  It assigns `MachineFunction::NextMBBID` to the MBB ID and then increments it, which ensures having unique IDs.

 To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.

The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.

####Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D100808
2022-12-06 22:50:09 -08:00
Kazu Hirata
7883e5b061 Revert "[VP][RISCV] Add vp.fshl/fshr and RISC-V support."
This reverts commit 70de0e014013b4d97febe6704881a9a8c893d078.

I'm seeing:

Failed Tests (2):
  LLVM :: CodeGen/RISCV/rvv/fixed-vectors-fshr-fshl-vp.ll
  LLVM :: CodeGen/RISCV/rvv/fshr-fshl-vp.ll

Also reported at:

https://lab.llvm.org/buildbot/#/builders/123/builds/14531
2022-12-06 22:27:43 -08:00
Justin Bogner
6ec24c1d5c [AMDGPU] Precommit GISel test showing missing med3 combines
These combines will be added by https://reviews.llvm.org/D139506
2022-12-06 22:22:47 -08:00
Monk Chiang
7b50c18360 [RISCV] Codegen support for Zfhmin.
The Zfhmin subset only has FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S.
If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are also included.
Since most instructions are not included for Zfhmin, so most operations are promoted.
The patch primarily about making f16 a legal type.

RISC-V ISA info:
https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139391
2022-12-06 22:14:15 -08:00
Yeting Kuo
70de0e0140 [VP][RISCV] Add vp.fshl/fshr and RISC-V support.
The patch made VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to
achieve the codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D138379
2022-12-07 12:16:36 +08:00
Craig Topper
8d30b9e64f [RISCV] Move VSPILL/VRELOAD expansion for vector tuples to eliminateFrameIndex.
We need a scratch GPR to increment the base pointer for each subsequent
register. We currently reuse the input GPR for the base pointer without
declaring it as a Def of the pseudo.

We can't add it as a Def of the pseudo at creation time because it doesn't
get register allocated. This was tried in D109405.

Seems the only choice we have is to scavenge the GPR. This patch
moves the expansion to eliminateFrameIndex where we can create
virtual registers that will be scavenged. This also eliminates the
extra operand for passing vlenb from frame lowering to expand pseudos.

I need to do more testing on real world code, but wanted to get this
up for early review.

I hope this will fix the issue reported in D123394, but I haven't
checked yet.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139169
2022-12-06 15:42:00 -08:00
Paul Robinson
a459529858 [Mips] Convert a test to check 'target=...'
Although it should base the check on host, not target, if possible.

Part of the project to eliminate special handling for triples in lit
expressions.
2022-12-06 15:24:23 -08:00
Craig Topper
1806ce9097 [RISCV] Teach RISCVMatInt to prefer li+slli over lui+addi(w) for compressibility.
With C extension, li with a 6 bit immediate followed by slli is 4 bytes.
The lui+addi(w) sequence is at least 6 bytes.

The two sequences probably have similar execution latency. The exception
being if the target supports lui+addi(w) macrofusion.

Since the execution latency is probably the same I didn't restrict
this to C extension.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139135
2022-12-06 10:31:17 -08:00
Mircea Trofin
4c97745bf0 Reapply "[mlgo] Dependency-free training mode logger"
This reverts commit 8abe7b11f74bea63d3134c144137b72146da0c7b.

Added the missing cast which was causing a build problem on certain compilers.
2022-12-06 10:29:50 -08:00
Peter Rong
ee31a4a702 [ARM] IselLowering unsigned overflow to crash using APInt in PerformSHLSimplify
This diff fixes issue https://github.com/llvm/llvm-project/issues/59317

We should check if bitwidth is lower than the shift amount before we subtract them to avoid unsigned overflow.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D139238
2022-12-06 09:58:27 -08:00
Florian Hahn
8abe7b11f7
Revert "[mlgo] Dependency-free training mode logger"
This reverts commit c5ff6f72342e0a4b0ba2ec9f603bedca86721e80.

This breaks building on macOS:

FAILED: lib/Analysis/CMakeFiles/LLVMAnalysis.dir/TensorSpec.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DBUILD_EXAMPLES -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/clang-build/lib/Analysis -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/lib/Analysis -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/clang-build/include -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk -mmacosx-version-min=10.14  -fno-exceptions -fno-rtti -UNDEBUG -std=c++17 -MD -MT lib/Analysis/CMakeFiles/LLVMAnalysis.dir/TensorSpec.cpp.o -MF lib/Analysis/CMakeFiles/LLVMAnalysis.dir/TensorSpec.cpp.o.d -o lib/Analysis/CMakeFiles/LLVMAnalysis.dir/TensorSpec.cpp.o -c /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/lib/Analysis/TensorSpec.cpp
In file included from /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/lib/Analysis/TensorSpec.cpp:16:
In file included from /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/include/llvm/Analysis/TensorSpec.h:16:
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/include/llvm/Support/JSON.h:354:29: error: non-constant-expression cannot be narrowed from type 'unsigned long' to 'int64_t' (aka 'long long') in initializer list [-Wc++11-narrowing]
    create<int64_t>(int64_t{I});
                            ^
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/lib/Analysis/TensorSpec.cpp:55:18: note: in instantiation of function template specialization 'llvm::json::Value::Value<unsigned long, void, void, void>' requested here
        OS.value(D);
                 ^
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/include/llvm/Support/JSON.h:354:29: note: insert an explicit cast to silence this issue
    create<int64_t>(int64_t{I});
                            ^
                            static_cast<int64_t>( )
1 error generated.

https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/33120/consoleFull#-145995569149ba4694-19c4-4d7e-bec5-911270d8a58c
2022-12-06 17:24:55 +00:00
Nico Weber
a862d09a92 Revert "[amdgpu] Reimplement LDS lowering"
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed.
Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
2022-12-06 12:01:36 -05:00
Sanjay Patel
adc7c589c3 [SDAG] try to convert bit set/clear to signbit test when trunc is free
(X & Pow2MaskC) == 0 --> (trunc X) >= 0
(X & Pow2MaskC) != 0 --> (trunc X) <  0

This was noted as a regression in the post-commit feedback for D112634
(where we canonicalized IR differently).

For x86, this saves a few instruction bytes. AArch64 seems neutral.

Differential Revision: https://reviews.llvm.org/D139363
2022-12-06 11:34:48 -05:00
Sanjay Patel
772c2f461b [AArch64][RISCV][x86] add tests for masked val equality with 0; NFC 2022-12-06 11:34:48 -05:00
Jon Chesterfield
982017240d [amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139433
2022-12-06 16:28:15 +00:00
Mircea Trofin
c5ff6f7234 [mlgo] Dependency-free training mode logger
This is the next step in dropping the dependency on protobuf.

The simple logger produces an output consisting of lines of json
strings. Tensor values - which should constitute the bulk of the data -
are serialized as raw byte buffers. This allows for light-weight reading
of the values.

The next step is to switch the training logic to the new logging format,
following which the protobuf-based logger will be dropped, together with
the training dependency on protobuf.

Subsequent changes will also stop buffering and stream, instead - the
buffering model is just as a convenient point-in-time.

Differential Revision: https://reviews.llvm.org/D139370
2022-12-06 08:12:45 -08:00
Sander de Smalen
5922a04dbd [AArch64][SVE2p1] Make use of REVD instruction.
Reversing double-words within a quard-word is possible using the REVD instruction
when SVE2p1 is enabled.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D139119
2022-12-06 15:42:32 +00:00
chenglin.bi
5cd900ce3c [AArch64] Transform shift+and to shift+shift to select more shifted register
and (shl/srl/sra, x, c), mask --> shl (srl/sra, x, c1), c2

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D138904
2022-12-06 23:29:56 +08:00
bipmis
bda1f0b96c Add tests which can be matched to umull 2022-12-06 12:49:05 +00:00
Archibald Elliott
83b3304dd2 [AArch64] Implement __arm_rsr128/__arm_wsr128
This only contains the SelectionDAG implementation. GlobalISel to
follow.

The broad approach is:
- Introduce new builtins for 128-bit wide instructions.
- Lower these to @llvm.read_register.i128/@llvm.write_register.i128
- Introduce target-specific ISD nodes which have legal operands (two
  i64s rather than an i128). These are named AArch64::{MRRS, MSRR} to
  match the instructions they are for. These are a little complex as
  they need to match the "shape" of what they're replacing or the
  legaliser complains.
- Select these using the existing tryReadRegister/tryWriteRegister to
  share the MDString parsing code, and introduce additional code to
  ensure these are selected into the right MRRS/MSRR instructions. What
  makes this hard is ensuring that the two i64s end up in an XSeqPair
  register pair, because SelectionDAG doesn't care that much about
  register classes if it can avoid doing so.

The main change to existing code is the reorganisation of
tryReadRegister and tryWriteRegister to try to keep the string parsing
code separate from the instruction creating code.

This also includes the changes to clang to define and use the ACLE
feature macro named `__ARM_FEATURE_SYSREG128`.

Contributors:
  Sam Elliott
  Lucas Prates

Differential Revision: https://reviews.llvm.org/D139086
2022-12-06 11:39:05 +00:00
Ties Stuij
94e7e58fa4 [AArch64] implement GPR (U/S)(MIN/MAX) instruction SDag support
Using SelectionDag, lower umin, umax, smin, smax intrinsics to corresponding
UMIN, UMAX, SMIN, SMAX instructions when feat CSSC is available.

See specs for corresponding immediate and register versions in:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D138813
2022-12-06 10:57:49 +00:00