52796 Commits

Author SHA1 Message Date
Matt Arsenault
35c2a7542c AMDGPU: Fix asserting on fast f16 pown
https://reviews.llvm.org/D158903
2023-08-25 19:56:20 -04:00
Snehasish Kumar
3dbabeadd6 [CodeGen] Remove unused option in MachineFunctionSplitter.
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158885
2023-08-25 21:24:28 +00:00
Craig Topper
398c855457 [RISCV] Improve splatPartsI64WithVL for vlmax scalable vector constants where Hi and Lo are the same.
We can use a 32-bit splat and bitcast to i64 vector.

This only handles the case where we are using vlmax so that the new
vl is cheap to compute. This could be generalized to double the VL.
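
The equivalence behind this can be sketched in plain C++. This is only an illustration of the bit-level identity, not the RISC-V lowering code; the function names `splatI64` and `splatI32AndBitcast` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reference: splat the 64-bit value (Hi << 32) | Lo across NumElts elements.
std::vector<uint64_t> splatI64(uint32_t Hi, uint32_t Lo, std::size_t NumElts) {
  uint64_t V = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return std::vector<uint64_t>(NumElts, V);
}

// Alternative for the Hi == Lo case: splat one 32-bit value across twice as
// many elements, then reinterpret the bytes as 64-bit elements (the bitcast).
// Because both halves are identical, the result does not depend on endianness.
std::vector<uint64_t> splatI32AndBitcast(uint32_t Part, std::size_t NumElts) {
  assert(NumElts > 0 && "sketch assumes a non-empty vector");
  std::vector<uint32_t> Narrow(2 * NumElts, Part);
  std::vector<uint64_t> Wide(NumElts);
  std::memcpy(Wide.data(), Narrow.data(), NumElts * sizeof(uint64_t));
  return Wide;
}
```

Calling both helpers with the same 32-bit pattern for Hi and Lo yields identical vectors. The narrow splat covers twice as many elements, which is why the message notes that the new vl must be cheap to compute.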

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158879
2023-08-25 14:15:41 -07:00
Craig Topper
4184bafa9b [RISCV] Refactor lowerSPLAT_VECTOR_PARTS to use splatPartsI64WithVL for scalable vectors.
There was quite a bit of duplication between splatPartsI64WithVL
and the scalable vector handling in lowerSPLAT_VECTOR_PARTS, but
scalable vector had one additional case. Move that case to
splatPartsI64WithVL which improves some fixed vector tests.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158876
2023-08-25 14:15:40 -07:00
Jeffrey Byrnes
3ba8dabbf3 [AMDGPU] Add sdot4 / sdot8 intrinsics for gfx11
This provides a uniform way to lower into the relevant instructions across all generations.

Differential Revision: https://reviews.llvm.org/D158468

Change-Id: I1f7ba4b15ee470738535cf1c7d177a11fc471e43
2023-08-25 11:45:55 -07:00
Daniel Paoliello
8d0c3db388 Emit the CodeView S_ARMSWITCHTABLE debug symbol for jump tables
The CodeView `S_ARMSWITCHTABLE` debug symbol describes the layout of a jump table. It contains the following information:

* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).

Together this information can be used by debuggers and binary analysis tools to understand what a jump-table indirect branch is doing and where it might jump to.

Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)

This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149367
2023-08-25 10:19:17 -07:00
Yolanda Chen
291101aa8e [WebAssembly] Optimize vector shift using a splat value from outside block
The vector shift operation in WebAssembly uses an i32 shift amount type, while
LLVM IR requires a binary operator's operands to have the same type. When the
shift amount operand is splatted in a different block, the splat source is
not exported and the vector shift is unrolled to scalar shifts. This
patch enables the vector shift to identify the splat source value from the other
block and generate the expected WebAssembly bytecode when lowering.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D158399
2023-08-25 08:13:27 -07:00
LiaoChunyu
1b12427c01 [VP][RISCV] Add vp.is.fpclass and RISC-V support
After FCLASS_VL (D151176) there is still no vp.fpclass; this patch tries to support it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152993
2023-08-25 15:40:55 +08:00
Konstantina Mitropoulou
48fa79a503 Revert "[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points."
This reverts commit 5ec13535235d07eafd64058551bc495f87c283b1.
2023-08-24 20:39:04 -07:00
Daniel Hoekwater
8c249c44d4 [CodeGen][AArch64] Don't split functions with a red zone on AArch64
Because unconditional branch relaxation on AArch64 grows the stack to
spill a register, splitting a function would cause the red zone to be
overwritten. Explicitly disable MFS for such functions.

Differential Revision: https://reviews.llvm.org/D157127
2023-08-24 21:57:35 +00:00
Daniel Hoekwater
c9f328844d Reland "[CodeGen] Fix unconditional branch duplication issue in bbsections"
Reverted in 4c8d056f50342d5401f5930ed60e5e48b211c3fb because it broke
buildbot `llvm-clang-x86_64-expensive-checks-debian` due to the AArch64
test generating invalid code. The issue still exists, but it's fixed in
D156767, so the AArch64 test should be added there.

Differential Revision: https://reviews.llvm.org/D158674
2023-08-24 21:27:55 +00:00
Felipe de Azevedo Piovezan
6be47fb8be [CodeGen] Separate X86 and Aarch entry_value test
Addresses the bot issues raised in D158636.
2023-08-24 17:07:21 -04:00
Felipe de Azevedo Piovezan
e070a5d230 [CodeGen] Separate X86 and Aarch tests
The directory this test used to live in is exclusive to Aarch.
Addresses the failure reported in D158636.
2023-08-24 16:33:13 -04:00
Konstantina Mitropoulou
5ec1353523 [DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns with floating points.
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)

If the operands are proven to be non-NaN, then the optimization can be applied
for all predicates.

We can apply the optimization for the following predicates for FMINNUM/FMAXNUM
(for quiet and signaling NaNs) and for FMINNUM_IEEE/FMAXNUM_IEEE if we can prove
that the operands are not signaling NaNs.
- ordered lt/le and ||
- ordered gt/ge and ||
- unordered lt/le and &&
- unordered gt/ge and &&
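
The non-NaN case of the fold can be checked with a small scalar model. This is a sketch using `std::min`/`std::max` on doubles, not the DAGCombiner code, and the function names are made up for illustration; it covers only the `<` predicate and assumes no operand is NaN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Original form: two compares against the same C, joined by OR / AND.
bool orOfLess(double A, double B, double C) { return (A < C) || (B < C); }
bool andOfLess(double A, double B, double C) { return (A < C) && (B < C); }

// Folded form: (A < C) || (B < C) becomes min(A, B) < C.
// Assumption for this sketch: no operand is NaN.
bool foldedOr(double A, double B, double C) {
  assert(!std::isnan(A) && !std::isnan(B) && !std::isnan(C));
  return std::min(A, B) < C;
}

// Folded form: (A < C) && (B < C) becomes max(A, B) < C.
// Same non-NaN assumption as above.
bool foldedAnd(double A, double B, double C) {
  assert(!std::isnan(A) && !std::isnan(B) && !std::isnan(C));
  return std::max(A, B) < C;
}
```

With NaN operands the two forms can disagree, which is why the message restricts the predicates that may be folded for each min/max flavor.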

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155267
2023-08-24 10:48:56 -07:00
Daniel Hoekwater
4c8d056f50 Revert "[CodeGen] Fix unconditional branch duplication issue in bbsections"
This reverts commit 994eb5adc40cd001d82d0f95d18d1827b57e496c.
Breaks buildbot `llvm-clang-x86_64-expensive-checks-debian`
https://lab.llvm.org/buildbot/#/builders/16/builds/53620
2023-08-24 16:59:17 +00:00
Daniel Hoekwater
994eb5adc4 [CodeGen] Fix unconditional branch duplication issue in bbsections
If an end section basic block ends in an unconditional branch to its
fallthrough, BasicBlockSections will duplicate the unconditional branch.
This doesn't break x86, but it is a (slight) size optimization and more
importantly prevents AArch64 builds from breaking.

Ex:
```
bb1 (bbsections Hot):
  jmp bb2

bb2 (bbsections Cold):
  /* do work... */
```

After running sortBasicBlocksAndUpdateBranches():
```
bb1 (bbsections Hot):
  jmp bb2
  jmp bb2

bb2 (bbsections Cold):
  /* do work... */
```
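
One way to picture the fix is a guard that refuses to append a branch the block already ends with. The following is a toy model only: `ToyBlock`, `ensureBranchTo` and `sizeAfterEnsure` are hypothetical names, instructions are plain strings, and nothing here reflects the actual BasicBlockSections implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy basic block: a list of textual instructions (hypothetical model).
struct ToyBlock {
  std::vector<std::string> Insts;
};

// Append "jmp Target" unless the block already ends with that exact branch.
void ensureBranchTo(ToyBlock &BB, const std::string &Target) {
  const std::string Jump = "jmp " + Target;
  if (!BB.Insts.empty() && BB.Insts.back() == Jump)
    return; // already branches there; a second copy would be redundant
  BB.Insts.push_back(Jump);
}

// Convenience wrapper: instruction count after ensuring the branch exists.
std::size_t sizeAfterEnsure(ToyBlock BB, const std::string &Target) {
  ensureBranchTo(BB, Target);
  return BB.Insts.size();
}
```

A block that already ends in `jmp bb2` keeps a single branch, while an empty block gains exactly one.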

Differential Revision: https://reviews.llvm.org/D158674
2023-08-24 16:22:55 +00:00
Luke Lau
515bd40b4e [RISCV] Fix test using wrong variable. NFC
Looks like this test was trying to check if two shifts were combined, but it
was accidentally using the insertelement instead of the splat.
2023-08-24 15:45:43 +01:00
Yeting Kuo
243d8cdb03 [RISCV] Add missed HasRoundModeOp for VPseudoUnaryMask_FRM/VPseudoUnaryMask_FRM.
Missed HasRoundModeOp makes performCombineVMergeAndVOps use wrong operands for
VFCVT_RM instructions.

Reviewed By: luke

Differential Revision: https://reviews.llvm.org/D158711
2023-08-24 21:45:22 +08:00
Felipe de Azevedo Piovezan
35f4ef1fee [SelectionDAG][DebugInfo] Handle entry_value dbg.value DIExprs earlier
When SelectionDAG converts dbg.value intrinsics, it first ensures we have
already generated code for the value operator of the intrinsic. The rationale
being that if we haven't had the need to generate code for this value, it won't
be a debug value that causes the generation.

For example, if the first use of the physical register of an argument is a
dbg.value, we are going to hit this code path.  However, this is irrelevant for
entry value expressions: by definition we are not interested in the _current_
value of the physical register, but rather in its value at the start of the
function. To deal with this, this patch changes lowering to handle this case as
early as possible.

Differential Revision: https://reviews.llvm.org/D158649
2023-08-24 09:33:53 -04:00
Oliver Stannard
40614e1c14 [ARM] Save and restore CPSR around tMOVimm32
When resolving a frame index with a large offset for v6M execute-only,
we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a
sequence of instructions, all of which are flag-setting. However, a
frame index may be generated for a register spill or reload instruction,
which can be inserted at a point where CPSR is live. This patch inserts
MRS and MSR instructions around the tMOVimm32 to save and restore the
value of CPSR, if CPSR is live at that point.

This may need up to two virtual registers (one to build the immediate
value, one to save CPSR) during frame index lowering, which happens
after register allocation, so we need to ensure two spill slots are
available to the register scavenger so that it can free up enough
registers for this.

There is no test for the emission (or not) of the MRS/MSR pair, because
it requires a spill or reload to be inserted at a point where CPSR is
live, which requires a large, complex function and is fragile enough
that any optimisation changes will break the test. This bug was easily
found by csmith with -verify-machineinstrs, which I now run regularly on
v6M execute-only (and many other combinations).

Patch by John Brawn and myself.

Reviewed By: stuij

Differential Revision: https://reviews.llvm.org/D158404
2023-08-24 14:15:02 +01:00
Felipe de Azevedo Piovezan
27425aec86 [CodeGen][DebugInfo] Add x86 entry value tests
We should also test the x86 target, since it has different backend defaults from
ARM.

Differential Revision: https://reviews.llvm.org/D158636
2023-08-24 08:48:48 -04:00
Simon Pilgrim
19777deba4 [X86] matchAddressRecursively - add foldMaskedShiftToBEXTR handling to ZERO_EXTEND nodes. 2023-08-24 13:14:41 +01:00
Simon Pilgrim
69a0f23598 [X86] extract-bits.ll - add test showing failure to match BEXTR through ZERO_EXTEND node 2023-08-24 13:14:41 +01:00
Matt Arsenault
d86a7d631c GlobalISel: Add constant fold combine for zext/sext/anyext
Could use more work for vectors.

https://reviews.llvm.org/D156534
2023-08-24 08:10:01 -04:00
Matt Arsenault
e52acb817d GlobalISel: Add shifts to constant_fold combine
Currently we're getting away with post-selection constant folding on
these (a hack which exists for the DAG).

https://reviews.llvm.org/D156534
2023-08-24 08:09:57 -04:00
Luke Lau
e772c0ecd8 [RISCV] Use vmv.v.x if Hi bits are undef when lowering splat_vector_parts
When lowering a splat_vector_parts, if the hi bits are undefined then we can
splat the lo bits without having to check if it's going to be sign extended or
not, because those bits will be undefined anyway.

I've handled it for both fixed and scalable vectors, but there's no diff
on the scalable vror tests, since the hi bits aren't combined away to
undef in SimplifyDemanded for scalable vectors. I'm not sure why that is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158625
2023-08-24 12:19:09 +01:00
David Green
8ec2392622 [AArch64][GISel] Expand coverage of FNeg.
This adds some more extensive test coverage for fneg through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases.
2023-08-24 12:00:23 +01:00
Simon Pilgrim
e283ef7e93 [X86] matchAddressRecursively - add foldMaskAndShiftToScale handling to ZERO_EXTEND nodes. 2023-08-24 11:47:07 +01:00
Simon Pilgrim
f84cd7e579 [X86] fold-and-shift-x86_64.ll - add zext test case where upper bits are known zero (and won't get simplified to any_extend)
Add test coverage showing failure to use foldMaskAndShiftToScale with zero_extend nodes
2023-08-24 11:47:07 +01:00
David Green
e2fa9610c6 [AArch64][GISel] Expand coverage of FMul.
This adds some more extensive test coverage for fmul through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases.
2023-08-24 11:41:15 +01:00
Jingu Kang
3b485a6622 [AArch64] Mark known zero for high 16-bits of uaddlv intrinsic output with v8i8
The uaddlv with v8i8 returns a 16-bit value, but clang generates a 32-bit intrinsic
and a trunc for it. In this case, we can mark the high 16 bits of
the intrinsic output as known zero.
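
The bound is easy to confirm with a scalar model of the reduction. This is a sketch, not the AArch64 backend code, and the helper names are hypothetical: eight unsigned 8-bit lanes can sum to at most 8 * 255 = 2040, which fits well within 16 bits.

```cpp
#include <array>
#include <cstdint>

// Scalar model of an unsigned widening add-across-lanes on eight u8 lanes,
// with the result held in a 32-bit value as the intrinsic form does.
uint32_t uaddlvV8I8(const std::array<uint8_t, 8> &V) {
  uint32_t Sum = 0;
  for (uint8_t E : V)
    Sum += E;
  return Sum;
}

// Worst-case input: every lane holds 0xFF.
std::array<uint8_t, 8> maxInputV8I8() {
  std::array<uint8_t, 8> V;
  V.fill(0xFF);
  return V;
}
```

Even for the worst-case input the upper 16 bits of the 32-bit result stay zero.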

Differential Revision:
2023-08-24 10:55:19 +01:00
Simon Pilgrim
19cdd45b08 [X86] X86DAGToDAGISel::matchIndexRecursively - add SIGN_EXTEND(ADD_NSW(X,C)) handling
Split an index register from IndexReg = SIGN_EXTEND(ADD_NSW(X,C)) to IndexReg = SIGN_EXTEND(X), Offset = SIGN_EXTEND(C)
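
The arithmetic identity can be sketched with fixed-width integers. This is an illustration rather than the X86 matcher itself, with hypothetical function names; it shows why the no-signed-wrap guarantee matters.

```cpp
#include <cassert>
#include <cstdint>

// sext(add nsw X, C): the 32-bit add is assumed not to overflow (the nsw
// guarantee), so widening the 32-bit sum is well defined.
int64_t sextOfAdd(int32_t X, int32_t C) {
  int64_t Wide = static_cast<int64_t>(X) + C;
  assert(Wide >= INT32_MIN && Wide <= INT32_MAX && "nsw precondition");
  return static_cast<int64_t>(static_cast<int32_t>(Wide));
}

// Split form: sext(X) + sext(C), computed entirely in 64 bits.
int64_t splitForm(int32_t X, int32_t C) {
  return static_cast<int64_t>(X) + static_cast<int64_t>(C);
}

// Without nsw the add may wrap in 32 bits, and the split is no longer valid.
// The unsigned-to-signed conversion wraps modulo 2^32 (guaranteed since
// C++20, and the behavior of mainstream compilers before that).
int64_t sextOfWrappingAdd(int32_t X, int32_t C) {
  uint32_t Wrapped = static_cast<uint32_t>(X) + static_cast<uint32_t>(C);
  return static_cast<int64_t>(static_cast<int32_t>(Wrapped));
}
```

For any pair that satisfies the nsw precondition the two forms agree; for `INT32_MAX + 1` the wrapping add and the split form differ.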
2023-08-24 10:19:37 +01:00
Serge Pavlov
6862f0fab1 [FPEnv] Intrinsics for access to FP control modes
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all target dynamic floating-point control
modes, which include, for instance, rounding direction, precision,
treatment of denormals and so on. The intrinsics do the same
operations as the C library functions 'fegetmode' and 'fesetmode'. By
default they are lowered to calls to these functions.

Two main use cases are supported by this implementation.

1. Local modification of the control modes. In this case the code
usually has a pattern (in pseudocode):

    saved_modes = get_fpmode()
    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    set_fpmode(saved_modes)

In the case when it is known that the current FP environment is default,
the code may be shorter:

    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    reset_fpmode()

Such patterns appear not only in user code but also in implementations
of various FP controlling pragmas. In particular, the implementation of
`#pragma STDC FENV_ROUND` requires similar code if the target does not
support static rounding mode.

2. Portable control of FP modes. FP control modes are usually set by
writing to a control register. Targets differ in the layout of this
register and in the way it is accessed. Using a set of target-specific
definitions for the control register bits together with these intrinsic
functions provides a sufficiently portable way to handle control modes
across a wide range of hardware.

This change defines only the LLVM intrinsic functions, which implement the
access required for the aforementioned use cases.
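
The save/set/restore pattern from use case 1 can be sketched in standard C++. As a stand-in for the full set of control modes, this example uses only the rounding direction through `std::fegetround` and `std::fesetround` from `<cfenv>`; it does not call the new intrinsics, and the names `ScopedRounding` and `roundingInsideScope` are hypothetical.

```cpp
#include <cfenv>

// RAII guard: saves the current rounding direction, installs a new one,
// and restores the saved direction when the scope ends.
class ScopedRounding {
  int Saved;

public:
  explicit ScopedRounding(int NewMode) : Saved(std::fegetround()) {
    std::fesetround(NewMode);
  }
  ~ScopedRounding() { std::fesetround(Saved); }
  ScopedRounding(const ScopedRounding &) = delete;
  ScopedRounding &operator=(const ScopedRounding &) = delete;
};

// Reports the rounding direction that is active while the guard is alive.
int roundingInsideScope(int NewMode) {
  ScopedRounding Guard(NewMode);
  return std::fegetround();
}
```

Inside the scope the new direction is active; after the call returns, the previous direction is back in effect.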

Differential Revision: https://reviews.llvm.org/D82525
2023-08-24 15:52:19 +07:00
Craig Topper
2ad50f354a [DAGCombiner][RISCV][AArch64][PowerPC] Restrict foldAndOrOfSETCC from using SMIN/SMAX where an OR/AND would do.
This removes some diffs created by D153502.

I'm assuming an AND/OR won't be worse than an SMIN/SMAX. For
RISC-V at least, AND/OR can be a shorter encoding than SMIN/SMAX.

It's weird that we have two different functions responsible for
folding logic of setccs, but I'm not ready to try to untangle that.

I'm unclear if the PowerPC change is a regression or not. It looks
like it might use more registers, but I don't understand PowerPC
registers so I'm not sure.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158292
2023-08-23 20:26:23 -07:00
Kai Luo
1ceaec3e81 [PowerPC][altivec] Optimize codegen of vec_promote
According to https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-vec-promote, elements not specified by the input index argument are undefined, so we don't need to set these elements to zero.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D158487
2023-08-24 02:10:13 +00:00
Matt Arsenault
8ce75acd1a AMDGPU: Expand and modernize llvm.sqrt.f32 tests 2023-08-23 20:39:18 -04:00
Matt Arsenault
16bc07ac91 AMDGPU: Select f64 fmul by negative power of 2 to ldexp
Select fmul x, -K -> ldexp(-x, log2(fabsK))
Select fmul fabs(x), -K -> ldexp(-|x|, log2(fabsK))
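
The identity behind these patterns can be checked in scalar C++. This is a sketch of the arithmetic only, not the AMDGPU selection code; the function names are hypothetical and K is assumed to be a negative power of two.

```cpp
#include <cassert>
#include <cmath>

// Reference: plain multiplication by a negative power of two.
double mulByNegPow2(double X, double K) { return X * K; }

// Rewritten form: negate X and scale by the exponent of |K|.
// Assumption for this sketch: K is negative and |K| is an exact power of two.
double viaLdexp(double X, double K) {
  int Exp = std::ilogb(std::fabs(K)); // exact log2 for powers of two
  assert(K < 0 && std::ldexp(1.0, Exp) == std::fabs(K));
  return std::ldexp(-X, Exp);
}
```

Both forms give the same result for such K, for example with X = 3.5 and K = -8.0, or with X = fabs(-1.25) and K = -0.5.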

https://reviews.llvm.org/D158173
2023-08-23 20:36:01 -04:00
Matt Arsenault
4c4ff50361 AMDGPU: Add more baseline test for fmul to ldexp patterns 2023-08-23 20:31:54 -04:00
Matt Arsenault
a738bdf35e AMDGPU: Permit more rsq formation in AMDGPUCodeGenPrepare
We were basing the decision to defer the fast case to codegen on the fdiv
itself, and not looking for a foldable sqrt input.

https://reviews.llvm.org/D158127
2023-08-23 20:06:50 -04:00
Craig Topper
1f395115da [RISCV] Add Zicond instructions to RISCVOptWInstrs like XVentanaCondOps. 2023-08-23 16:57:16 -07:00
Matt Arsenault
e954085f80 AMDGPU: Fix more unsafe rsq formation
Introducing rsq contract flags is wrong, and also requires some level
of approximate functions. AMDGPUCodeGenPrepare already should handle
the f32 cases with appropriate flags, and I don't see how new
situations to handle would arise during legalization (other than cases
involving the rcp intrinsic, which instcombine tries to
handle). AMDGPUCodeGenPrepare does need to learn better handling of
rcp/rsq for f64 though, which we never bothered to handle well.

Removes another obstacle to correctly lowering sqrt.

https://reviews.llvm.org/D158099
2023-08-23 19:28:49 -04:00
Nitin John Raj
c07062a2e9 [RISCV][GlobalISel] Select G_CONSTANT, G_ANYEXT, COPY
We select G_CONSTANT generic opcodes by materializing the constant in a
register. G_ANYEXT is replaced with COPY.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158504
2023-08-23 15:33:44 -07:00
David Tellenbach
979e8ae4fc [AArch64] Check opcode before trying to extract register from operand
When matching FNEG patterns for the MachineCombiner we need to check for
opcodes first, before trying to extract a register from an operand.
Otherwise handling of instructions with non-register operands causes the
compiler to crash.

Differential Revision: https://reviews.llvm.org/D158473
2023-08-23 14:46:31 -07:00
Simon Pilgrim
5d79a8d148 [X86] fold-and-shift.ll - add x86-64 test coverage
Although we already have fold-and-shift-x86_64.ll - this adds additional test coverage for various and-shift patterns split by sign/zero extensions from i32 index patterns to i64 pointers
2023-08-23 22:16:54 +01:00
Yingwei Zheng
d6639f83a9 [SDAG][RISCV] Avoid folding setcc (xor C1, -1), C2, cond into setcc (xor C2, -1), C1, cond
This patch fixes https://github.com/llvm/llvm-project/issues/64935.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158654
2023-08-24 04:18:17 +08:00
Changpeng Fang
ffa7c7897c [AMDGPU] Emit .actual_access metadata
Summary:
  Emit .actual_access metadata for the deduced argument access qualifier,
and .access for kernel_arg_access_qual.

Reviewers:
 arsenm

Differential Revision:
  https://reviews.llvm.org/D157451
2023-08-23 12:57:29 -07:00
Peter Rong
f58fbfc746 [X86][CodeGen] Add a dag pattern to fix #64323
After the recent patch D30189, the error message for #64323 changed.
When DAGCombiner was optimizing `(vextract (scalar_to_vector val), 0) -> val`, it didn't
consider the possibility that the inserted value type has fewer bits than the dest type.
This patch fixes that.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D158355
2023-08-23 10:50:32 -07:00
David Green
adaf545a50 [GlobalISel] Limit shift_of_shifted_logic_chain to non-zero folds
After D157690 we are seeing some crashes from Global ISel, which seem to be
related to the shift_of_shifted_logic_chain combine that can remove too many
instructions if the shift amount is zero.

This limits the fold to non-zero shifts, under the assumption that it is better
in that case to fold away the shift to a COPY.

Differential Revision: https://reviews.llvm.org/D158596
2023-08-23 18:17:37 +01:00
Felipe de Azevedo Piovezan
88417098bb [CodeGen][DebugInfo] Append OP_deref when converting an EntryValue dbg.declare
When we convert an EntryValue dbg.declare into an entry of the MF side table, we
currently copy its DIExpression as is, and rely on subsequent layers to "know"
that this expression is implicitly indirect. This is bad because it adds an
implicit assumption to the IR representation, and requires subsequent layers to
know about this assumption. This also limits the reusability of this table:
what if, in the future, we want to use this table for dbg.values?

This patch changes existing behavior so that the entities converting
dbg_declares explicitly add an OP_deref when converting EntryValue dbg.declares.

Differential Revision: https://reviews.llvm.org/D158437
2023-08-23 12:25:12 -04:00
Neumann Hon
d00f59893e [SystemZ][z/OS] Fix the entry point marker for leaf functions
The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags, and instead blindly marks all functions as non-leaf routines.

Differential Revision: https://reviews.llvm.org/D157701

Reviewed By: uweigand
2023-08-23 09:50:01 -04:00