44749 Commits

Author SHA1 Message Date
Fangrui Song
2417618d5c [Verifier] Reject dllexport with non-default visibility
Add a visibility check for dllimport and dllexport. Note: dllimport with a
non-default visibility (implicit dso_local) is already rejected, but with a less
clear dso_local error.

The MC level visibility `MCSA_Exported` (D123951) is mapped from IR level
default visibility when dllexport is specified. The D123951 error is now very
difficult to trigger (needs to disable the IR verifier).

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D133267
2022-09-05 10:53:41 -07:00
Amara Emerson
511f2169a8 [GlobalISel] Update combine-build-vector.mir test checks before patch. 2022-09-05 16:06:05 +01:00
Amara Emerson
22b6a4fcac [GlobalISel] Update test checks before a patch. 2022-09-05 15:24:07 +01:00
David Sherwood
ffa6267300 [CodeGen] Support extracting fixed-length vectors from illegal scalable vectors
For some indices we can simply extract the fixed-length subvector from the
low half of the scalable vector, for example when the index is less than the
minimum number of elements in the low half. For all other cases we can
expand the operation through the stack by storing out the vector and
reloading the fixed-length part we need.

Fixes https://github.com/llvm/llvm-project/issues/55412

Tests added here:

  CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll

Differential Revision: https://reviews.llvm.org/D117499
2022-09-05 15:05:14 +01:00
Ivan Kosarev
5db8d6fd2b [AMDGPU][CodeGen] Support (base | offset) SMEM loads.
Prevents generation of unnecessary s_or_b32 instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132552
2022-09-05 14:22:06 +01:00
Andrey Tretyakov
1268cf6454 [SPIRV] Add tests to improve test coverage
Differential Revision: https://reviews.llvm.org/D133265
2022-09-05 15:52:01 +03:00
Ivan Kosarev
1f550d86b2 [AMDGPU][CodeGen] Pre-commit a test on (base | offset) SMEM loads for D132552.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D133021
2022-09-05 13:12:43 +01:00
Ivan Kosarev
f33645301e [AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130263
2022-09-05 12:53:05 +01:00
Amara Emerson
fb60e50c78 [GlobalISel] Fix a combine crash due to a negative G_INSERT_VECTOR_ELT idx.
These should really be folded away to undef but we shouldn't crash in any case.
2022-09-05 12:10:17 +01:00
Simon Pilgrim
4e6783f866 [DAG] getFreeze()/getNode() - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57554)
Similar to #57402 - we were calling isGuaranteedNotToBeUndefOrPoison on the freeze operand (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1

Fixes #57554
2022-09-05 11:46:46 +01:00
Craig Topper
0d1d36cfa6 [X86] Pre-commit tests for D130862. NFC 2022-09-04 21:19:01 -07:00
gonglingqin
bc743bf666 [LoongArch] Add codegen support for fcopysign
Differential Revision: https://reviews.llvm.org/D133185
2022-09-05 11:03:54 +08:00
Simon Pilgrim
e438ce5694 [AMDGPU] Add -verify-machineinstrs to attr-amdgpu-flat-work-group-size* tests
These were affected by D131825 (and reported on Issue #57149) - adding the verification will help ensure that we don't hit this again on builds with EXPENSIVE_CHECKS enabled
2022-09-03 13:47:41 +01:00
Simon Pilgrim
62cdfdab4d [DAG] canCreateUndefOrPoison - add freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c) support
We already have plenty of assertions in place to ensure that the insertion index is constant and inrange
2022-09-03 13:41:33 +01:00
Simon Pilgrim
3968844bff [X86] Add test showing failure to fold freeze(insert_subvector(x,y,c)) -> insert_subvector(freeze(x),freeze(y),c)
If at least one of x and y are known never poison.
2022-09-03 13:27:08 +01:00
Daniil Fukalov
b4e1b0e00d [LiveIntervals] Split live intervals on any dead def
Each dead def of the same virtual register is required to be split into multiple
virtual registers with separate live intervals to avoid MachineVerifier error.

Partially fixes https://github.com/llvm/llvm-project/issues/56050 and
https://github.com/llvm/llvm-project/issues/56051

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D130477
2022-09-02 20:00:22 +03:00
Andrey Tretyakov
f20c9c42d2 [SPIRV] Add tests to improve test coverage
Differential Revision: https://reviews.llvm.org/D132903
2022-09-02 13:19:28 +03:00
WANG Xuerui
2dd434c3ee [LoongArch] Support lowering br_jt
Jump tables cannot be generated yet, due to missing support for emitting
local addresses.

Differential Revision: https://reviews.llvm.org/D132653
2022-09-02 17:57:50 +08:00
Andrey Tretyakov
13453c9861 [SPIRV] Add tests to improve test coverage
Differential Revision: https://reviews.llvm.org/D132817
2022-09-02 11:59:18 +03:00
Nuno Lopes
858fe8664e Expand Div/Rem: consider the case where the dividend is zero
So we can't use ctlz in poison-producing mode
2022-09-01 17:04:26 +01:00
Matt Devereau
b9062ceffc [AArch64][SVE] Add floating-point repeated complex pattern llc tests 2022-09-01 15:04:59 +00:00
Nikita Popov
5134bd432f [DwarfEhPrepare] Assign dummy debug location for inserted _Unwind_Resume calls (PR57469)
DwarfEhPrepare inserts calls to _Unwind_Resume into landing pads.
If _Unwind_Resume happens to be defined in the same module and
debug info is used, then this leads to a verifier error:

  inlinable function call in a function with debug info must
    have a !dbg location
  call void @_Unwind_Resume(ptr %exn.obj) #0

Fix this by assigning a dummy location to the call. (As this
happens in the backend, inlining is not actually relevant here.)

Fixes https://github.com/llvm/llvm-project/issues/57469.

Differential Revision: https://reviews.llvm.org/D133095
2022-09-01 16:35:49 +02:00
Ilia Diachkov
698c800142 [SPIRV] support builtin types and ExtInsts selection
The patch adds the support of OpenCL and SPIR-V built-in types. It also
implements ExtInst selection and adds spv_unreachable and spv_alloca
intrinsics which improve the generation of the corresponding SPIR-V code.
Five LIT tests are included to demonstrate the improvement.

Differential Revision: https://reviews.llvm.org/D132648

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
2022-09-01 16:44:54 +03:00
Amara Emerson
4cf3db41da [GlobalISel] Add sdiv exact (X, constant) -> mul combine.
This port of the SDAG optimization is only for exact sdiv case.

Differential Revision: https://reviews.llvm.org/D130517
2022-09-01 13:34:00 +01:00
gonglingqin
6e47ebdcec [LoongArch] Support ISD::BR_CC and branch according to condition flag register
Use bceqz/bcnez instead of movcf2gr + bnez/beqz for branch jumps.

Differential Revision: https://reviews.llvm.org/D132824
2022-09-01 10:43:16 +08:00
Sam Clegg
349a2c37f9 [WebAssembly][MC] Update tests after recent removal of .size directives for functions
These were missing from https://reviews.llvm.org/D132929
2022-08-31 14:54:13 -07:00
Nikita Popov
ab6876a40d reland: [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs.

This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in 8201e3ef5c.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129997

Originally landed as
commit 08860f525a23 ("[Local] Allow creating callbr with duplicate successors")

Reverted in
commit 1cf6b93df168 ("Revert "[Local] Allow creating callbr with duplicate successors"")
2022-08-31 13:23:00 -07:00
Nick Desaulniers
d7474bef77 [llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets
This fixes a crash observed after
https://reviews.llvm.org/D129997.

Similar to D88823.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130127
2022-08-31 13:07:10 -07:00
Nick Desaulniers
86b6b39411 [llvm][test] precommit test for D130127
Add a FIXME that will be immediately removed in D130127.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D132609
2022-08-31 12:40:52 -07:00
Simon Pilgrim
eaede4b5b7 [DAG] extractShiftForRotate - replace assertion for shift opcode with an early-out
We feed the result from the first extractShiftForRotate call into the second, and that result might no longer be a shift op (usually due to constant folding).

NOTE: We REALLY need to stop creating nodes on the fly inside extractShiftForRotate!

Fixes Issue #57474
2022-08-31 15:50:48 +01:00
Hassnaa Hamdi
a6d9c944df [AArch64 - SVE]: Use SVE to lower reduce.fadd.
Differential Revision: https://reviews.llvm.org/D132573

skip custom-lowering for v1f64 to be expanded instead, because it has only one lane

Differential Revision: https://reviews.llvm.org/D132959
2022-08-31 12:31:06 +00:00
Hassnaa Hamdi
d8655bdeb4 [AArch64-SVE-fixed]:
change vscale_range<2,0> to vscale_range<1,0> for 64/128-bit vectors of fadda tests
2022-08-31 11:40:46 +00:00
Simon Pilgrim
9d22800275 [DAG] visitFreeze - account for operand depth when calling isGuaranteedNotToBeUndefOrPoison (PR57402)
We were calling isGuaranteedNotToBeUndefOrPoison on operands (with Depth = 0), but wasn't accounting for the fact that a later isGuaranteedNotToBeUndefOrPoison assertion will call from the new node (with Depth = 0 as well) - which will then recursively call isGuaranteedNotToBeUndefOrPoison for its operands with Depth = 1

Fixes #57402
2022-08-31 12:20:30 +01:00
gonglingqin
fb9d67636a [LoongArch] Support floating-point number reciprocal
Differential Revision: https://reviews.llvm.org/D132847
2022-08-31 14:20:46 +08:00
Xiang Li
6917799e37 [DirectX backend] change MinVectorRegisterBitWidth to 32.
This is to avoid vector-combine generate vector4 on float.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D132826
2022-08-30 23:20:12 -07:00
Kai Luo
ad2f7fd286 [AtomicExpand] Make floating point conversion happens before fence insertion
IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations.
This also fixes atomic load of floating point values which requires fence on PowerPC.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D127609
2022-08-31 09:54:58 +08:00
Markus Böck
2fdf963daf [GlobalISel] Explicitly fail trying to translate gc.statepoint and related intrinsics
The provided testcase would previously fail with an assertion due to later down below trying to allocate registers for `token` return types and arguments. This is especially problematic as the process would then exit instead of falling back to using FastIsel.

This patch fixes that by simply explicitly failing translation if either of these intrinsics are encountered.

Fixes https://github.com/llvm/llvm-project/issues/57349

Differential Revision: https://reviews.llvm.org/D132974
2022-08-31 00:47:17 +02:00
Mingming Liu
4df696fbe9 [NFC] Move a test case across files.
The test case is about pmull2 instruction generated used than a SIMD
ldr being generated. So aarch64-pmull2.ll is a better test file.

Differential Revision: https://reviews.llvm.org/D132277
2022-08-30 14:16:28 -07:00
Craig Topper
893f5e95e2 [RISCV] Improve isel of AND with shiftedMask containing 32 leading zeros and some trailing zeros.
We can use srliw to shift out the trailing bits and slli to shift
back in zeros. The sign extend of srliw will 0 the upper 32 bits
since we will be shifting a 0 into bit 31.
2022-08-30 12:22:46 -07:00
Stanislav Mekhanoshin
fd1f8c85f2 [AMDGPU] Limit TID / wavefrontsize uniformness to 1D kernels
If a kernel has uneven dimensions we can have a value of workitem-id-x
divided by the wavefrontsize non-uniform. For example dimensions (65, 2)
will have workitems with address (64, 0) and (0, 1) packed into a same
wave which gives 1 and 0 after the division by 64 respectively.

Unfortunately, this limits the optimization to OpenCL only and only if
reqd_work_group_size attribute is set. This patch limits it to 1D kernels,
although that shall be possible to perform this optimization is the size
of the X dimension is a power of 2, we just do not currently have
infrastructure to query it.

Note that presence of amdgpu-no-workitem-id-y attribute does not help
as it only hints the lack of the workitem-id-y query, but not the absence
of the actual 2nd dimension, therefore affecting just the SGPR allocation.

Differential Revision: https://reviews.llvm.org/D132879
2022-08-30 12:22:08 -07:00
Justin Bogner
f9433161f5 [AMDGPU] Precommit two tests showing missed combines to v_med3 2022-08-30 11:56:09 -07:00
Stephen Long
40999cbd93 [SVE] Fix SVEDup0 matching -0.0f
Because of D128669, CPY is being used to zero active lanes even in the case of -0.0f. This patch checks for floating point positive zero. That way SVEDup0 won't match -0.0f.

Fixes https://github.com/llvm/llvm-project/issues/57428

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D132880
2022-08-30 11:07:17 -07:00
Jon Chesterfield
9b0b912e15 [amdgpu][nfc] Add test case showing false aliasing in LDS lowering 2022-08-30 15:33:57 +01:00
Thomas Symalla
d26dd37149 [NFC][AMDGPU] Pre-commit tests for D132837.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132930
2022-08-30 13:55:36 +02:00
Tomas Matheson
050dad57f7 [AArch64][GISel] constrain regclass for 128->64 copy
When selecting G_EXTRACT to COPY for extracting a 64-bit GPR from
a 128-bit register pair (XSeqPair) we know enough to constrain the
destination register class to gpr64. Without this it may have only
a register bank and some copy elimination code would assert while
assuming that a register class existed.

The register class has to be set explicitly because we might hit the
COPY -> COPY case where register class can't be inferred.

This would cause the following to crash in selection, where the store
is commented (otherwise the store constrains the register class):

  define dso_local i128 @load_atomic_i128_unordered(i128* %p) {
    %pair = cmpxchg i128* %p, i128 0, i128 0 acquire acquire
    %val = extractvalue { i128, i1 } %pair, 0
    ; store i128 %val, i128* %p
    ret i128 %val
  }

Differential Revision: https://reviews.llvm.org/D132665
2022-08-30 11:02:51 +01:00
Tomas Matheson
9a390d6692 [AArch64][GISel] fix G_ADD*/G_SUB* legalization
widenScalarDst updates the insert point to after MI, so
widenScalarSrc must be called before widenScalarDst. Otherwise
The updated Src values will appear after MI and break SSA. e.g.:

  %14:_(s64), %15:_(s1) = G_UADDE %9:_, %11:_, %13:_

becomes

  %14:_(s64), %16:_(s32) = G_UADDE %9:_, %11:_, %17:_
  %15:_(s1) = G_TRUNC %16:_(s32)
  %17:_(s32) = G_ZEXT %13:_(s1)

Differential Revision: https://reviews.llvm.org/D132547

Change-Id: Ie3458747a6879433f4d5ab9939d2bd102dd0f2db
2022-08-30 10:59:32 +01:00
Ting Wang
710923cdc8 [PowerPC] CTRLoop pseudo instructions should not be duplicated
Add isNotDuplicable to CTRLoop pseudo instructions, to avoid other pass
such as early-tailduplication break the loop structure by duplicating
pseudo instructions.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D132738
2022-08-30 04:32:29 -04:00
Ting Wang
f908cbc36f [NFC][PowerPC] Add test case to show ctrloop mi shall not be duplicated
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D132899
2022-08-30 01:57:22 -04:00
wanglian
e2bb9774b1 [LegalizeTypes] Support widen result for VECTOR_REVERSE.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132359
2022-08-30 10:01:26 +08:00
Craig Topper
e25eb61d03 [RISCV] Enable (srl (and X, C2), C) to form SRLIW in more cases.
Don't require the AND has one use and don't depend on
targetShrinkDemandedConstant turning C2 into 0xffffffff. Instead,
check that the constant is 0xffffffff after replacing any bits
that will be shifted out with 1s.

Another way to fix this might be to prevent SimplifyDemandedBits
from destroying the ANDI after type legalization using
targetShrinkDemandedBits. That would prevent the CSE that created
this mess. targetShrinkDemandedBits is currently only enable after
legalize ops. Quick experiment shows we can't just change when it
runs, we would need to try a different heuristic for post type
legalization.
2022-08-29 15:52:08 -07:00