52422 Commits

Author SHA1 Message Date
Michael Maitland
2f400a2fd7
[GISEL] Add G_VSCALE instruction (#84542) 2024-03-12 20:22:49 -04:00
S. Bharadwaj Yadavalli
54f631d116
[DirectX][NFC] Model precise overload type specification of DXIL Ops (#83917)
Implement an abstraction to specify precise overload types supported by
DXIL ops. These overload types are typically a subset of LLVM
intrinsics.

Implement the corresponding changes in DXILEmitter backend.

Add tests to verify expected errors for unsupported overload types at
code generation time.

Add tests to check for correct overload error output.
2024-03-12 16:51:18 -04:00
Arthur Eubanks
6bbb73b4cb
[X86] Fix determining if globals with size <8 bits are large (#84975)
Previously any global under 8 bits would accidentally be considered 0
sized, which is considered a large global.
2024-03-12 12:43:29 -07:00
Arthur Eubanks
45219702e7 [test][X86] Precommit test for large data threshold and i1 global 2024-03-12 19:08:40 +00:00
Simon Pilgrim
c1af6ab505 [X86] getFauxShuffleMask - recognise CONCAT(SUB0, SUB1) style patterns
Handles the INSERT_SUBVECTOR(INSERT_SUBVECTOR(UNDEF,SUB0,0),SUB1,N) pattern

Currently limited to v8i64/v8f64 cases as only AVX512 has decent cross lane 2-input shuffles, the plan is to relax this as I deal with some regressions
2024-03-12 17:40:19 +00:00
Jun Wang
c4e517f59c
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
Matt Arsenault
bd72ebd8d1
AMDGPU: Add some more mfma hazard recognizer tests (#84727) 2024-03-12 22:05:47 +05:30
Jake Egan
fa1d13590c [AIX][tests] Disable failing tests on AIX
These new tests are failing on the AIX bot because the -I option isn't supported.

Disable these tests for now until they can be fixed.
2024-03-12 12:11:18 -04:00
Nemanja Ivanovic
08dd645c15
[RISC-V] Bad immediate value for Zcmp instructions with E extension (#84925)
When we are using the Zcmp extension together with the E extension in
32-bit mode and we need to spill both callee-saved registers as well as
needing a couple of 32-bit stack slots, we emit a meaningless stack
adjustment with cm.push/cm.popret. Furthermore this leads to the stack
slot for the ra being clobbered so control returns to a random location.

This is just a pre-commit test so that the PR for the fix shows the
difference in code generation.
2024-03-12 16:26:49 +01:00
Bjorn Pettersson
4d0f79e346 Pre commit test cases SRL/SRA support in canCreateUndefOrPoison. NFC
Add test cases to show that we can't push freeze through SRA/SRL with
'exact' flag when there are multiple uses.
2024-03-12 16:03:18 +01:00
Danial Klimkin
afd4758703
Revert "[NVPTX] Add support for atomic add for f16 type" (#84918)
Reverts llvm/llvm-project#84295 due to breakages.
2024-03-12 15:01:18 +01:00
Adrian Kuegel
8e0f4b943f
[NVPTX] Add support for atomic add for f16 type (#84295)
atom.add.noftz.f16 is supported since SM 7.0
2024-03-12 09:12:44 +01:00
Dhruv Chawla (work)
1d900e2984
[AArch64][GlobalISel] Avoid generating inserts for undefs when selecting G_BUILD_VECTOR (#84452)
It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef
values choose random registers for copying values from.
2024-03-12 11:57:07 +05:30
Phoebe Wang
e89b4bcf32
[X86] Remove SlowDivide tuning from GRTTuning (#84676)
The DIV32/64 throughput was improved since Goldmont in the Atom
architecture. The Alder Lake-E shows similar number too. So we shouldn't
add such tunings to Gracemont and later products.

Checked from Agner Fog's table and uops.info.
2024-03-12 13:41:49 +08:00
Craig Topper
884b051a42 Recommit "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690)"
With special case with Add constant is 0.

Original message:
We can support these by changing the sext promotion to -zext(-C) and
replacing a sgt check with ugt. Reframing the logic in terms of how the
unsigned range are affected. More comments in the patch.

The new cases check isLegalAddImmediate to avoid some
regressions in lit tests.
2024-03-11 12:39:38 -07:00
Michael Maitland
034cc2f5d0
[GISEL] Add G_INSERT_SUBVECTOR and G_EXTRACT_SUBVECTOR (#84538)
G_INSERT and G_EXTRACT are not sufficient to use to represent both
INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector.

We would like to be able to INSERT/EXTRACT on vectors in cases that
INSERT/EXTRACT on vector subregisters are not sufficient, so we add
these opcodes.

I tried to do a patch where we treated G_EXTRACT as both
G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop
at this
[point](8b5b294ec2/llvm/lib/Target/RISCV/RISCVISelLowering.cpp (L9932))
in the SDAG equivalent code.
2024-03-11 13:47:30 -04:00
Simon Pilgrim
6cd68c2f87 [X86] Add base SSE2 coverage to SRL/SRA combines tests 2024-03-11 16:25:05 +00:00
Simon Pilgrim
7dc4d5f6a0 [X86] Add AVX512 (x86-64-v4) coverage to generic shift combines tests 2024-03-11 16:22:47 +00:00
Sivan Shani
5e688f0dbd [llvm][arm] add T1 and T2 assembly options for vlldm and vlstm
Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

T1 allow for an optional registers list,
the register list must be {d0-d15}.
T2 define a mandatory register list,
the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-03-11 14:27:28 +00:00
Pierre van Houtryve
d4569d42b5
[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729)
If all variables in the module are absolute, this means we're running
the pass again on an already lowered module, and that works.
If none of them are absolute, lowering can proceed as usual.
Only diagnose cases where we have a mix of absolute/non-absolute GVs,
which means we added LDS GVs after lowering, which is broken.

See #81491
Split from #75333
2024-03-11 09:20:01 +01:00
Craig Topper
561ddb1687 Revert "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690)"
This reverts commit 0813b90ff5d195d8a40c280f6b745f1cc43e087a.

Fixes miscompile reported in #84718.
2024-03-11 00:51:21 -07:00
AtariDreams
4e0e9b17c6
[SelectionDAG] Switch to LiveRegUnits (#84197) 2024-03-11 12:47:39 +05:30
Carl Ritson
4a21e3afa2
[LiveIntervals] repairIntervalsInRange: recompute width changes (#78564)
Extend repairIntervalsInRange to completely recompute the interva for a
register if subregister defs exist without precise subrange matches
(LaneMask exactly matching subregister).
This occurs when register sequences are lowered to copies such that the
size of the copies do not match any uses of the subregisters formed
(i.e. during twoaddressinstruction).

The subranges without this change are probably legal, but do not match
those generated by live interval computation. This creates problems with
other code that assumes subranges precisely cover all subregisters
defined, e.g. shrinkToUses().
2024-03-11 15:24:17 +09:00
Kito Cheng
b7f97d3661
[RISCV] Place mergeable small read only data into srodata section (#82214)
Small mergeable read only data was place on the sdata before, but it
also means it lose the mergeable property, which means lose some code
size optimization opportunity during link time.
2024-03-11 13:57:06 +08:00
Carl Ritson
d9e6aa7048
[AMDGPU] Update LiveInterval def index for early-clobber (#79285)
On converting an instruction to an early-clobber definition in
convertToThreeAddress, we must also update live intervals for the
register to start at the early-clobber index.
2024-03-11 14:54:11 +09:00
Craig Topper
d8d2dea7fc
[RISCV] Handle FP riscv_masked_strided_load with 0 stride. (#84576)
Previously, we tried to create an integer extending load. We need to a
non-extending FP load instead.

Fixes #84541.
2024-03-10 21:22:37 -07:00
wanglei
edd4c6c6dc
[LoongArch] Make sure that the LoongArchISD::BSTRINS node uses the correct MSB value (#84454)
The `MSB` must not be greater than `GRLen`. Without this patch, newly
added test cases will crash with LoongArch32, resulting in a 'cannot
select' error.
2024-03-11 08:59:17 +08:00
Simon Pilgrim
862c7e0218 [X86] combineAndShuffleNot - ensure the type is legal before create X86ISD::ANDNP target nodes
Fixes #84660
2024-03-10 16:23:51 +00:00
Simon Pilgrim
92d7aca441
[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496)
Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-03-09 16:21:25 +00:00
Jay Foad
fd3eaf76ba
[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352) 2024-03-09 09:07:22 +00:00
Craig Topper
6b270358c7
[SelectionDAG] Allow FREEZE to be hoisted before FP SETCC. (#84358)
No nans/infs in SelectionDAG is complicated. Hopefully I've captured
all of the cases. I've only applied to ConsiderFlags to the SDNodeFlags
since those are the only ones that will be droped by hoisting. The
condition code and TargetOptions would still be in effect.
    
Recovers some regression from #84232.
2024-03-08 17:21:21 -08:00
yingopq
755b439694
[Mips] Fix missing sign extension in expansion of sub-word atomic max (#77072)
Add sign extension "SEB/SEH" before compare.

Fix #61881
2024-03-08 15:41:31 -05:00
David Majnemer
edc1c3d24e [AArch64] Make more vector f16 operations legal
v8f16 is a legal type but promoting to v16f16 would result in an illegal
type.

Let's legalize these by a combination of splitting+promoting resulting
in a pair of v4f16.

Also, we were being overly cautious with different v4f16 nodes. Mark
more of them safe to promote to v4f32.
2024-03-08 19:52:54 +00:00
David Majnemer
5f935e9181 [AArch64] Optimize fp64 <-> fp16 SIMD conversions
Legalization would result in needless scalarization. Add some
DAGCombines to fix this up.
2024-03-08 19:52:53 +00:00
Shilei Tian
e963d0740e
[AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
2024-03-08 14:49:52 -05:00
Craig Topper
a456885efc
[SelectionDAG] Allow FREEZE to be hoisted before integer SETCC. (#84241)
Teach canCreateUndefOrPoison that ISD::SETCC with integer operands can
never create undef/poison. FP SETCC is more complicated and will be
handled in a future patch.

Teach isGuaranteedNotToBeUndefOrPoison that ISD::CONDCODE is not
poison/undef. Its a special constant only used by setcc/select_cc like
nodes. This is needed since the hoisting will only hoist if exactly one
operand might be poison. setcc has 3 operand including the condition
code.
    
Recovers some regression from #84232.
2024-03-08 10:17:54 -08:00
Lukacma
2b4d8188b2
[Clang][LLVM][SVE2.1] Created intrinsics for DUPQ instr. (#83260)
This patch adds clang and llvm support for following intrinsic and maps
it to DUPQ instruction:
```
   // Variants are also available for:
   // _s8, _u16, _s16, _u32, _s32, _u64, _s64
   // _bf16, _f16, _f32, _f64
   svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx);
```
2024-03-08 15:35:48 +00:00
Paul Walker
bd6eb54886
[LLVM][CodeGen] Teach SelectionDAG how to expand FREM to a vector math call. (#83859)
This removes, at least when a vector library is available, a failure
case for scalable vectors. Doing so means we can confidently cost vector
FREM instructions without making an assumption that later passes will
transform the IR before it gets to the code generator.

NOTE: Whilst only FREM has been implemented the same mechanism
can be used for the other libm related ISD nodes.
2024-03-08 12:09:05 +00:00
zhongyunde 00443407
a110a1c0ed [AArch64] MachineCombiner msub matching for i64 2024-03-08 18:14:26 +08:00
zhongyunde 00443407
3a62edcf52 [AArch64] MachineCombiner msub matching
Pattern should be sorted in priority order since the pattern evalutor
stops checking as soon as it finds a faster sequence.
so for a * b - c * d, we prefer to match the 2nd operands of sub,
which can be use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix https://github.com/llvm/llvm-project/issues/84152
2024-03-08 18:14:25 +08:00
Sizov Nikita
ef1eb0315e
[AArch64] Add neon bici test for haddu and shadd (#84073)
Add neon bici test for haddu and shadd, prerequisite for #76644
2024-03-08 09:45:58 +00:00
Pierre van Houtryve
4b1910b11d
[GlobalISel][AMDGPU] Import patterns with multiple defs (#84171)
Fixes #63216
2024-03-08 09:39:10 +01:00
Vyacheslav Levytskyy
fb1be9b33c
[SPIR-V] Insert a bitcast before load/store instruction to keep SPIR-V code valid (#84069)
This PR introduces a step after instruction selection where instructions
can be traversed from the perspective of their validity from the
specification point of view. The PR adds also a way to correct
load/store when there is a type mismatch contradicting the specification
-- an additional bitcast is inserted to keep types consistent.
Correspondent test cases are added and existing test cases are
corrected.

This PR helps to successfully validate with the `spirv-val` tool
(https://github.com/KhronosGroup/SPIRV-Tools) some output that
previously led to validation errors and crashes of back translation from
SPIRV to LLVM IR from the side of SPIRV Translator project
(https://github.com/KhronosGroup/SPIRV-LLVM-Translator).

The added step of bringing instructions to required by the specification
type correspondence can be (should be and will be) extended beyond
load/store instructions to ensure validity rules of other SPIRV
instructions related to type inference.
2024-03-08 08:31:56 +01:00
Amara Emerson
f6b825f51e Revert "Revert "[AArch64][GlobalISel] Fix incorrect selection of monotonic s32->s64 anyext load.""
Attempt 2. The first one was trying to call isa<> on an MI reference that was free'd.

This reverts commit ee24409c40ff35c3221892d9723331c233ca9f0e.
2024-03-07 23:28:33 -08:00
Fangrui Song
66bd3cd75b [AMDGPU,test] Change llc -march= to -mtriple=
PR #75982 had been created before these tests were added, therefore
some test were not updated.
2024-03-07 19:09:18 -08:00
Chen Zheng
cc34e56b86 [PPC][NFC] add an option to expose the bug in 74951 2024-03-07 20:52:44 -05:00
Chen Zheng
e7a22e72de [PPC] precommit cases for issue 74915 2024-03-07 20:22:26 -05:00
Igor Kudrin
0cd7942c7f
[llvm-dwarfdump] Fix parsing DW_CFA_AARCH64_negate_ra_state (#84128)
The saved state of the AARCH64_DWARF_PAUTH_RA_STATE register was not
updated, so `llvm-dwarfdump` continued to dump it as `reg34=1` even if
the correct value is `0`:

```
> llvm-dwarfdump -v test.o
...
0000002c 00000024 00000030 FDE cie=00000000 pc=00000030...00000064
  Format:       DWARF32
  DW_CFA_advance_loc: 4
  DW_CFA_AARCH64_negate_ra_state:
  DW_CFA_advance_loc: 4
  DW_CFA_def_cfa_offset: +16
  DW_CFA_offset: W30 -16
  DW_CFA_remember_state:
  DW_CFA_advance_loc: 16
  DW_CFA_def_cfa_offset: +0
  DW_CFA_advance_loc: 4
  DW_CFA_AARCH64_negate_ra_state:
  DW_CFA_restore: W30
  DW_CFA_advance_loc: 4
  DW_CFA_restore_state:
  DW_CFA_advance_loc: 12
  DW_CFA_def_cfa_offset: +0
  DW_CFA_advance_loc: 4
  DW_CFA_AARCH64_negate_ra_state:
  DW_CFA_restore: W30
  DW_CFA_nop:

  0x30: CFA=WSP
  0x34: CFA=WSP: reg34=1
  0x38: CFA=WSP+16: W30=[CFA-16], reg34=1
  0x48: CFA=WSP: W30=[CFA-16], reg34=1
  0x4c: CFA=WSP: reg34=1               <--- should be '=0'
  0x50: CFA=WSP+16: W30=[CFA-16], reg34=1
  0x5c: CFA=WSP: W30=[CFA-16], reg34=1
  0x60: CFA=WSP: reg34=1               <--- should be '=0'
```
2024-03-08 07:34:20 +07:00
Craig Topper
0d4978f3cf [RISCV] Update some tests I missed in 909ab0e0d1903ad2329ca9fdf248d21330f9437f. NFC 2024-03-07 16:21:41 -08:00
Amara Emerson
26fa440957 [GlobalISel] Fix yet another pointer type invalid combining issue, this time in tryFoldSelectOfConstants() 2024-03-07 15:58:28 -08:00