52796 Commits

Author SHA1 Message Date
Craig Topper
e837ef91e3 [RISCV][GISel] Re-generate legalize-vastart-rv32.mir and legalize-vastart-rv64.mir to fix buildbot failure. NFC
I must have messed something up when addressing feedback on the patch
that added these tests.
2023-12-08 13:08:46 -08:00
Maryam Moghadas
8f6f5ec776
[PowerPC] Move __ehinfo TOC entries to the end of the TOC section (#73586)
On AIX, __ehinfo TOC entries are never referenced directly by
instructions, so we can allocate them with the TE storage mapping
class to move them to the end of the TOC.
2023-12-08 15:03:11 -05:00
Michael Maitland
e8dbed097a [RISCV][GISEL] Fix RUN lines in vararg.ll
The `< %s` needed to be removed. This change fixes the test introduced
in 02379d19147afda413a2bc757e8d2f5249d772d1
2023-12-08 11:56:55 -08:00
Michael Maitland
02379d1914 [RISCV][GISEL] Add vararg.ll LLVM IR -> ASM test
This test is added to be the counterpart of the SelectionDAG
llvm/test/CodeGen/RISCV/vararg.ll test. Minor changes were made compared
to the other version, all of which are commented in the test file added in
this commit.
2023-12-08 11:25:54 -08:00
Arthur Eubanks
687e63a2bd
[X86] Allow accessing large globals in small code model (#74785)
This removes some assumptions that the small code model will only
reference "near" globals.

There are still some missing optimizations and wrong code sequences, but
I'd like to address those separately. This will require auditing any
checks of the code model in the X86 backend.
2023-12-08 11:09:54 -08:00
Craig Topper
478d093e1b
[RISCV][GISel] Reverse the operands the buildStore created in legalizeVAStart. (#73989)
We need to store the frame index to the location pointed to by the
VASTART, not the other way around.
2023-12-08 10:45:53 -08:00
Michael Maitland
3a38baa0e7
[GISEL][RISCV] Legalize llvm.vacopy intrinsic (#73066)
In the future, we can consider adding a G_VACOPY opcode instead of going
through the GIntrinsic for all targets. We take the approach in this patch
because that is what other targets do today.
2023-12-08 13:45:32 -05:00
Michael Maitland
6f9cb9a75c
[RISCV][GISEL] Legalize G_VAARG through expansion. (#73065)
G_VAARG can be expanded similar to SelectionDAG::expandVAArg through
LegalizerHelper::lower. This patch implements the lowering through this
style of expansion.

The expansion gets the head of the va_list by loading the pointer to
va_list. Then, the head of the list is adjusted depending on argument
alignment information. This gives a pointer to the element to be read
out of the va_list. Next, the head of the va_list is bumped to the next
element in the list. The new head of the list is stored back to the
original pointer to the head of the va_list so that subsequent G_VAARG
instructions get the next element in the list. Lastly, the element is
loaded from the alignment adjusted pointer constructed earlier.

This change is stacked on #73062.
2023-12-08 13:24:27 -05:00
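
The expansion steps described in 6f9cb9a75c can be sketched outside of LLVM. The following is a minimal Python model, not the actual LegalizerHelper code: memory is a plain dictionary from address to value, and the names `va_arg`, `align_up`, and `slot_align` (the assumed minimum argument slot alignment) are hypothetical. How the real lowering rounds the bump size is also an assumption here.

```python
def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (align is a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def va_arg(memory: dict, va_list_ptr: int, size: int, align: int,
           slot_align: int = 4):
    """Read one variadic argument and advance the va_list.

    memory maps an address to the value stored there (a toy model).
    slot_align is an assumed minimum slot alignment, not taken from the patch.
    """
    head = memory[va_list_ptr]                # 1. load the head of the va_list
    if align > slot_align:                    # 2. adjust for argument alignment
        head = align_up(head, align)
    elem_ptr = head                           #    pointer to the element to read
    next_head = head + align_up(size, slot_align)  # 3. bump to the next element
    memory[va_list_ptr] = next_head           # 4. store the new head back
    return memory[elem_ptr]                   # 5. load from the adjusted pointer


# Toy usage: the va_list pointer lives at address 0 and points at address 16.
mem = {0: 16, 16: 11, 20: 22, 24: 33}
first = va_arg(mem, 0, size=4, align=4)   # reads the slot at 16
second = va_arg(mem, 0, size=8, align=8)  # head 20 is realigned to 24
```

Because the new head is written back in step 4, the second call starts from where the first one stopped, which is the property the commit message calls out.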
Jonas Paulsson
435ba72afd
[SystemZ] Simplify handling of AtomicRMW instructions. (#74789)
Let the AtomicExpand pass do more of the job of expanding
AtomicRMWInst:s in order to simplify the handling in the backend.

The only cases that the backend needs to handle itself are those of
subword size (8/16 bits) and those directly corresponding to a target
instruction.
2023-12-08 17:19:17 +01:00
Benjamin Kramer
06ebe3b237 [NVPTX] Fix a typo that makes the output invalid PTX
It's surprisingly tricky to trigger this as it's only used by abs/neg
which expand into and/xor in the integer domain.
2023-12-08 14:22:07 +01:00
Jay Foad
e38c29c2b7 [AMDGPU] Add GFX11 test coverage to integer-mad-patterns.ll 2023-12-08 13:06:03 +00:00
Saiyedul Islam
5c4c199fe3
[AMDGPU][NFC] Improve testing for AMDHSA ABI Version (#74300)
Add tests for COV4 as well as COV5 instead of only testing for the
default version.
2023-12-08 18:09:45 +05:30
Simon Pilgrim
5f91335a55 [X86] canonicalizeBitSelect - always use VPTERNLOGD for sub-32bit types
We were using VPTERNLOGQ for everything but i32 types, which made broadcasts wider than necessary

Noticed in #73509
2023-12-08 11:38:32 +00:00
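
A bit select is a purely bitwise operation, so the lane width chosen for the ternary-logic instruction should not change the result; it only affects things like broadcast width. The sketch below models a ternary-logic lookup in Python to illustrate this. The function names are hypothetical, and the mapping of `a`, `b`, `c` onto the real instruction's operands is an assumption.

```python
BITSELECT = 0xCA  # truth table of (a & b) | (~a & c), indexed by bits (a, b, c)


def ternlog(a: int, b: int, c: int, imm8: int, width: int) -> int:
    """Bitwise ternary logic: each result bit is looked up in imm8 using
    the corresponding bits of a, b and c as a 3-bit index."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out


def bitselect_lanes(a: int, b: int, c: int, lane_bits: int, total_bits: int) -> int:
    """Apply the bit select lane by lane and reassemble the full value."""
    lane_mask = (1 << lane_bits) - 1
    out = 0
    for shift in range(0, total_bits, lane_bits):
        lane = ternlog((a >> shift) & lane_mask,
                       (b >> shift) & lane_mask,
                       (c >> shift) & lane_mask,
                       BITSELECT, lane_bits)
        out |= lane << shift
    return out
```

Running `bitselect_lanes` with 32-bit and 64-bit lanes over the same inputs gives identical bits, which is why the narrower D form is a safe choice for sub-32-bit element types.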
Simon Pilgrim
faecc736e2
[DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443) 2023-12-08 10:53:51 +00:00
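
The idea in faecc736e2 can be illustrated with a small Python sketch, assuming a vector is modelled as a list of constants (with `None` for a non-constant element) and the demanded elements as a list of booleans. The function name and the treatment of an empty demanded set are assumptions, not the DAG implementation.

```python
def is_splat(elts, demanded):
    """Return True if every demanded element holds the same constant.

    elts: list of constants, None meaning "not a constant".
    demanded: list of bools, one per element.
    """
    wanted = [e for e, d in zip(elts, demanded) if d]
    if not wanted or any(e is None for e in wanted):
        return False  # assumption: nothing demanded is not treated as a splat
    return all(e == wanted[0] for e in wanted)
```

With this model, `[1, 2, 1, 3]` counts as a splat when only elements 0 and 2 are demanded, even though the full vector is not uniform.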
Simon Pilgrim
8859a4f630 [X86] LowerBUILD_VECTOR - don't use insert_element(constant, elt, idx) if we have a freeze(undef) element
Fixes #74736
2023-12-08 10:28:56 +00:00
Valery Pykhtin
901c5be524
[AMDGPU] Fix GCNUpwardRPTracker: max register pressure on defs. (#74422)
Treat a defined register as fully live "at" the instruction and update maximum pressure accordingly. Fixes #3786.
2023-12-08 11:27:08 +01:00
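
To see why counting a def as live "at" the instruction matters, here is a deliberately simplified Python sketch of a bottom-up pressure walk. It counts registers rather than weighting them by class or lane mask, and the function name and instruction encoding are hypothetical; it is not the GCNUpwardRPTracker code.

```python
def max_pressure_upward(instrs, live_out):
    """Walk instructions bottom-up and return the maximum number of
    simultaneously live registers.

    instrs: list of (defs, uses) pairs in program order.
    live_out: registers live after the last instruction.
    """
    live = set(live_out)
    max_p = len(live)
    for defs, uses in reversed(instrs):
        # Defs count as live at the instruction even if nothing below
        # uses them (a dead def still occupies a register there).
        at_instr = live | set(defs)
        max_p = max(max_p, len(at_instr))
        # Above the instruction, defs are no longer live but uses are.
        live = (live - set(defs)) | set(uses)
        max_p = max(max_p, len(live))
    return max_p
```

For a single instruction that defines an unused register `d` while `x` is live across it, this returns 2; a tracker that only looked at the live set before and after the instruction would report 1.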
wanglei
cdc3732566 [LoongArch] Mark ISD::FNEG as legal 2023-12-08 15:07:58 +08:00
wanglei
9f70e708a7
[LoongArch] Make ISD::FSQRT a legal operation with lsx/lasx feature (#74795)
And add some patterns:
1. (fdiv 1.0, vector)
2. (fdiv 1.0, (fsqrt vector))
2023-12-08 14:16:26 +08:00
Philip Reames
ffb2af3ed6
[SCEVExpander] Attempt to reinfer flags dropped due to CSE (#72431)
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison-generating flags (nsw, nuw, exact, nneg, etc.)
which may not be valid for the newly added users.

This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expander to reinfer
the flags on the CSE candidate if legal for all possible users.

This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.

This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tentative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
2023-12-07 13:20:36 -08:00
Craig Topper
e87f33d9ce
[RISCV][MC] Pass MCSubtargetInfo down to shouldForceRelocation and evaluateTargetFixup. (#73721)
Instead of using the STI stored in RISCVAsmBackend, try to get it from
the MCFragment.

This addresses the issue raised here
https://discourse.llvm.org/t/possible-problem-related-to-subtarget-usage/75283
2023-12-07 13:17:58 -08:00
Natalie Chouinard
6c6f8b1acd
[SPIR-V] Fixup tests (#73371)
These tests are currently failing at tip-of-tree, but pass with minor
FileCheck updates that look reasonable.
2023-12-07 15:23:27 -05:00
Stefan Pintilie
ea8b95d0d5
[PowerPC] Add a set of extended mnemonics that are missing from Power 10. (#73003)
This patch adds the majority of the missing extended mnemonics that were
introduced in Power 10.

The only extended mnemonics that were not added are related to the plq
and pstq instructions. These will be added in a separate patch as the
instructions themselves would also have to be added.
2023-12-07 13:40:00 -05:00
David Green
e3720bbc08 [AArch64] Extend and cleanup vector icmp test cases. NFC 2023-12-07 18:39:33 +00:00
Simon Pilgrim
f1200ca7ac
[DAG] visitEXTRACT_VECTOR_ELT - constant fold legal fp imm values (#74304)
If we're extracting a constant floating point value, and the constant is a legal fp imm value, then replace the extraction with a fp constant.
2023-12-07 14:56:12 +00:00
Simon Pilgrim
5384fb3d40 [X86] gep-expanded-vector.ll - regenerate checks 2023-12-07 14:07:10 +00:00
wanglei
9ff7d0ebeb
[LoongArch] Add codegen support for icmp/fcmp with lsx/lasx features (#74700)
Mark ISD::SETCC node as legal, and add handling for the vector types
condition codes.
2023-12-07 20:11:43 +08:00
Harald van Dijk
03edfe6148
Implement SoftPromoteHalf for FFREXP. (#74076)
`llvm/test/CodeGen/RISCV/llvm.frexp.ll` and
`llvm/test/CodeGen/X86/llvm.frexp.ll` contain a number of disabled tests
for unimplemented functionality. This implements one missing part of it.
2023-12-07 11:10:17 +00:00
Simon Pilgrim
22df0886a1
[DAG] Don't split f64 constant stores if the fp imm is legal (#74622)
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for #74304
2023-12-07 10:33:03 +00:00
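
For context on what "splitting" an f64 constant store means in 22df0886a1, the following Python sketch shows the two 32-bit halves that a pair of i32 stores would write for a given double. The helper names are hypothetical and the low/high ordering assumes a little-endian layout.

```python
import struct


def split_f64(value: float):
    """Return the (low, high) 32-bit halves of an IEEE-754 double."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_f64(low: int, high: int) -> float:
    """Rebuild the double from its two 32-bit halves."""
    return struct.unpack("<d", struct.pack("<Q", (high << 32) | low))[0]
```

For `1.0` the halves are `0x00000000` and `0x3FF00000`. When the target can materialize the fp immediate directly, a single f64 store avoids emitting these two integer stores.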
Sjoerd Meijer
3acbd38492
[AArch64] Optimise MOVI + CMGT to CMGE (#74499)
This fixes a regression that occurred for a pattern of MOVI + CMGT
instructions, which can be optimised to CMGE. I.e., when the signed
greater-than compare has -1 as an operand, we can rewrite that as a
compare greater than or equal to 0, which is what CMGE does.

Fixes #61836
2023-12-07 08:32:02 +00:00
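
The rewrite in 3acbd38492 relies on the identity that, for signed integers, `x > -1` holds exactly when `x >= 0`. A short Python check over every 8-bit value illustrates this; the helper names are hypothetical and the 8-bit lane width is chosen only to keep the exhaustive check small.

```python
def to_signed(x: int, bits: int = 8) -> int:
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def cmgt_minus_one(x: int) -> bool:
    """Signed compare: x > -1 (the MOVI #-1 + CMGT pattern)."""
    return to_signed(x) > -1


def cmge_zero(x: int) -> bool:
    """Signed compare: x >= 0 (the single CMGE-with-zero form)."""
    return to_signed(x) >= 0


# Exhaustive check over all 8-bit patterns.
equivalent = all(cmgt_minus_one(x) == cmge_zero(x) for x in range(256))
```

Since the two predicates agree on every input, the constant materialization for -1 can be dropped in favour of the compare against zero.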
Fangrui Song
39ba027f4e [RISCV,test] Test whether MCAssembler uses function target-features
Test https://discourse.llvm.org/t/possible-problem-related-to-subtarget-usage/75283
The test is similar to ARM/relax-per-target-feature.ll in spirit.
2023-12-07 00:23:42 -08:00
Chen Zheng
4b932d84f4
[PowerPC] redesign the target flags (#69695)
12 bits are not enough for PPC's target-specific flags. With 8 bits for
the bitmask flags and 4 bits for the direct mask, PPC can have at most 16
direct masks and 8 bitmask flags. That is not enough for PPC; see the issue in
https://github.com/llvm/llvm-project/pull/66316

Redesign how the PPC target sets the target-specific flags. With this
patch, all PPC target flags are direct flags; there are no bitmask flags
in PPC anymore.

This patch aligns PPC with targets like X86, which also have many
target-specific flags.

The patch also fixes a bug related to the flags `MO_TLSGDM_FLAG` and
`MO_LO`. They have the same value, and the test case changes in this PR
show the bug.
2023-12-07 12:47:25 +08:00
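
The capacity argument in 4b932d84f4 is simple arithmetic, sketched below in Python. The function names are hypothetical, and the direct-only figure assumes the flag field stays 12 bits wide, which the commit message implies but does not state outright.

```python
TOTAL_BITS = 12
BITMASK_BITS = 8
DIRECT_BITS = TOTAL_BITS - BITMASK_BITS  # 4 bits left for the direct mask


def capacity_split(direct_bits: int, bitmask_bits: int):
    """Old layout: (distinct direct values, independent bitmask flags)."""
    return 2 ** direct_bits, bitmask_bits


def capacity_direct_only(total_bits: int) -> int:
    """New layout: every bit contributes to a single direct value."""
    return 2 ** total_bits
```

The split layout gives 16 direct values and 8 bitmask flags, matching the numbers in the message, while treating all 12 bits as one direct value allows 4096 distinct encodings (some of which may be reserved, for example for "no flag").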
Craig Topper
b310932f87
[RISCV] Add vmv.x.s to RISCVOptWInstrs. (#74519)
This instruction produces a 32-bit sign extended value if the SEW is less than or
equal to 32.
2023-12-06 17:06:56 -08:00
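
The property used in b310932f87 can be checked with a small Python model: when an element of width SEW is sign-extended to a 64-bit register and SEW is at most 32, the result is also a valid sign-extended 32-bit value. The helper names are hypothetical and the 64-bit register width is an assumption for the example.

```python
def sign_extend(value: int, from_bits: int, to_bits: int = 64) -> int:
    """Sign-extend the low from_bits bits of value to to_bits bits,
    returning the result as an unsigned to_bits-wide integer."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def is_sext32(x: int, xlen: int = 64) -> bool:
    """True if x (an xlen-bit value) equals the sign extension of its low 32 bits."""
    return sign_extend(x, 32, xlen) == x


# Every element pattern for SEW = 8 yields a sign-extended 32-bit result.
sew8_ok = all(is_sext32(sign_extend(v, 8)) for v in range(256))
```

A 64-bit element does not have this guarantee: a value such as `0x1_0000_0000` is not the sign extension of its low 32 bits, which is why the condition on SEW is needed.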
Thurston Dang
69c4930aad Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit 1f283a60a4bb896fa2d37ce00a3018924be82b9f.
Reason: breaks MSan buildbot
(https://lab.llvm.org/buildbot/#/builders/74/builds/24077)
2023-12-06 19:27:21 +00:00
paperchalice
5baf66f3c2
[CodeGen] Port WasmEHPrepare to new pass manager (#74435)
Port `WasmEHPrepare` to new pass manager, also rename `wasmehprepare` to
`wasm-eh-prepare`.
2023-12-06 11:11:00 -08:00
Artem Belevich
a2d3bb1fa9
Revert "[NVPTX] Lower 16xi8 and 8xi8 stores efficiently (#73646)" (#74518)
This reverts commit 173fcf7da592acd284dc50749558fd36928861f0.

We need to constrain the optimization to properly aligned loads/stores
only.
https://github.com/llvm/llvm-project/pull/73646#issuecomment-1841454559
2023-12-06 10:48:43 -08:00
Matthew Devereau
8186e1500b
[SME2] Add LUTI2 and LUTI4 single Builtins and Intrinsics (#73304)
See https://github.com/ARM-software/acle/pull/217

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-06 16:35:56 +00:00
Matt Arsenault
1f283a60a4 Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit 9e50c6e6b5741895f58f3e530004052844b6af9f. A few assertion and verifier
errors have been fixed in the coalescer and allocator, so hopefully this sticks this time.
2023-12-06 23:07:22 +07:00
Matt Arsenault
546a9ce80c
CodeGen: Fix bypassing legality checks for IMPLICIT_DEF rematerialization (#73934)
It's permitted to have extra implicit-def operands of the same main
register after the main register def. If there are implicit operands,
use the standard legality checks which verify the operand contents.

Depends #73933
2023-12-06 21:43:19 +07:00
Shengchen Kan
b8bc2351b8
[X86][test] Simplify test avx512-broadcast-unfold.ll (#74593)
The test was updated by

opt -passes=early-cse -S llvm/test/CodeGen/X86/avx512-broadcast-unfold.ll
2023-12-06 22:38:32 +08:00
Simon Pilgrim
56eb3e738a
[X86] Set x87 fld1/fldz pseudo instructions as rematerializable (#74592)
No need to generate/spill/restore to cpu stack

Cleanup work to allow us to properly use isFPImmLegal and fix some regressions encountered while looking at #74304
2023-12-06 14:36:42 +00:00
Matthew Devereau
30faf19a88
[SME2] Add LUTI2 and LUTI4 double Builtins and Intrinsics (#73305)
See https://github.com/ARM-software/acle/pull/217

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-06 14:35:11 +00:00
Simon Pilgrim
bf454839a1 [X86] vec_zero_cse.ll - replace X32 check prefix with X86
We use X32 for gnux32 triples - X86 should be used for 32-bit triples
2023-12-06 14:06:37 +00:00
Shengchen Kan
50c66600b8 [X86][test] Migrate test avx512-broadcast-unfold.ll for opaque pointers 2023-12-06 21:24:30 +08:00
Simon Pilgrim
609d980b3f [ARM] Regenerate aapcs-hfa-code.ll 2023-12-06 12:09:30 +00:00
Simon Pilgrim
f12a0ba53e [X86] zero-remat.ll - regenerate checks 2023-12-06 11:15:55 +00:00
Simon Pilgrim
322c7c717b [X86] slow-unaligned-mem.ll - improve checks
We can't easily convert this to use the update scripts, but we can manually improve the checks so we check for the right number of stores
2023-12-06 10:50:57 +00:00
Matthew Devereau
6704d6aadd
[SME2] Add LUTI2 and LUTI4 quad Builtins and Intrinsics (#73317)
See https://github.com/ARM-software/acle/pull/217

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-06 10:08:04 +00:00
Pierre van Houtryve
ecd2f56a80
[AMDGPU] Warn if 'amdgpu-waves-per-eu' target occupancy was not met (#74055)
This should make it a bit harder to miss this type of issue. The warning
only shows if amdgpu-waves-per-eu is used.

See SWDEV-434482
2023-12-06 10:46:46 +01:00
Matt Arsenault
08e63dd8fe AMDGPU: Add a MIR test to catch infinite loop
This is derived from one of the regressions reported
after aed1a2217a1da0c9fb7d2c0856302dee25b1d4a1
2023-12-06 15:58:32 +07:00
wanglei
de21308f78 [LoongArch] Make ISD::VSELECT a legal operation with lsx/lasx 2023-12-06 16:43:38 +08:00