68051 Commits

Author SHA1 Message Date
Monk Chiang
2b045324b2 [RISCV] Add scheduling resources for vector segment instructions.
Add scheduling resources for vector segment instructions

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128886
2022-07-12 22:51:58 -07:00
Simon Pilgrim
66bfd1ba8c [X86] Move isInRange(ArrayRef<int>) inside assert to fix NDEBUG builds. NFC.
Fix unused static function warning introduced by D129207
2022-07-12 21:51:07 +01:00
Nick Desaulniers
2240d72f15 [X86] initial -mfunction-return=thunk-extern support
Adds support for:
* `-mfunction-return=<value>` command line flag, and
* `__attribute__((function_return("<value>")))` function attribute

Where the supported <value>s are:
* keep (disable)
* thunk-extern (enable)

thunk-extern enables clang to change ret instructions into jmps to an
external symbol named __x86_return_thunk, implemented as a new
MachineFunctionPass named "x86-return-thunks", keyed off the new IR
attribute fn_ret_thunk_extern.

The symbol __x86_return_thunk is expected to be provided by the runtime
the compiled code is linked against and is not defined by the compiler.
Enabling this option alone doesn't provide mitigations without
corresponding definitions of __x86_return_thunk!
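
A hypothetical usage sketch (the function names and comments are
illustrative, not from the patch): the flag sets the default for the
translation unit, and the attribute overrides it per function.

/* clang -mfunction-return=thunk-extern -c mitigated.c */

/* Opt one function back out of the mitigation. */
__attribute__((function_return("keep")))
int fast_path(int x) {
  return x + 1; /* emitted with a plain ret */
}

int mitigated(int x) {
  return x * 2; /* ret becomes: jmp __x86_return_thunk */
}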

This new MachineFunctionPass is very similar to "x86-lvi-ret".

The <value>s "thunk" and "thunk-inline" are currently unsupported. It's
not yet clear that they are necessary, i.e. whether the thunk pattern
they would emit is beneficial or would be used anywhere.

Should the <value>s "thunk" and "thunk-inline" become necessary,
x86-return-thunks could probably be merged into x86-retpoline-thunks
which has pre-existing machinery for emitting thunks (which could be
used to implement the <value> "thunk").

Has been found to build+boot with corresponding Linux
kernel patches. This helps the Linux kernel mitigate RETBLEED.
* CVE-2022-23816
* CVE-2022-28693
* CVE-2022-29901

See also:
* "RETBLEED: Arbitrary Speculative Code Execution with Return
Instructions."
* AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion
* TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0
  2022-07-12
* Return Stack Buffer Underflow /
  CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702

SystemZ may eventually want to support "thunk-extern" and "thunk"; both
options are used by the Linux kernel's CONFIG_EXPOLINE.

This functionality has been available in GCC since the 8.1 release, and
was backported to the 7.3 release.

Many thanks to the folks who provided discreet review off-list due to the
embargoed nature of this hardware vulnerability. Many Bothans died to
bring us this information.

Link: https://www.youtube.com/watch?v=IF6HbCKQHK8
Link: https://github.com/llvm/llvm-project/issues/54404
Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html
Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html

Reviewed By: aaron.ballman, craig.topper

Differential Revision: https://reviews.llvm.org/D129572
2022-07-12 09:17:54 -07:00
Jay Foad
5d41fe0768 [AMDGPU] SILowerControlFlow uses LiveIntervals
The availability of LiveIntervals affects kill flags in the output, so
declare the use to avoid strange effects where the output of this pass
is different depending on what other passes are scheduled after it.

Differential Revision: https://reviews.llvm.org/D129555
2022-07-12 16:53:53 +01:00
Igor Kudrin
9ff10a0d62 [NVPTX] Add missing pass names
Differential Revision:
2022-07-12 07:58:13 -07:00
Rosie Sumpter
e5edc1b5ee [AArch64][SVE] Ensure PTEST operands have type nxv16i1
Currently any legal predicate type will be pattern-matched when
creating a PTEST instruction. This could be a problem in the future since
PTEST always uses the .B specifier for the operand, but it is not
always guaranteed that the extra lanes of unpacked types (e.g. nxv4i1)
are zero. This patch ensures the operands of PTEST are of type nxv16i1,
where the undef lanes are set to zero.

Differential Revision: https://reviews.llvm.org/D129282
2022-07-12 09:27:59 +01:00
Craig Topper
c5be6a8308 [RISCV] Use X0 in place of VLMaxSentinel in lowering.
I thought I had already fixed all of these, but I guess I missed one.
2022-07-11 23:29:04 -07:00
Xiang1 Zhang
a45dd3d814 [X86] Support -mstack-protector-guard-symbol
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129346
2022-07-12 10:17:00 +08:00
Xiang1 Zhang
643786213b Revert "[X86] Support -mstack-protector-guard-symbol"
This reverts commit efbaad1c4a526e91b034e56386e98a9268cd87b2
due to missing review information.
2022-07-12 10:14:32 +08:00
Xiang1 Zhang
efbaad1c4a [X86] Support -mstack-protector-guard-symbol 2022-07-12 10:13:48 +08:00
Craig Topper
c3c17b1695 [RISCV] Use MVT for the argument to getMaskTypeFor. NFC
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
2022-07-11 15:14:44 -07:00
Piotr Sobczak
2bd8e74b94 [AMDGPU] Fix bitcast v4i64/v16i16
Fix a regression introduced in D128865.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129375
2022-07-11 22:27:52 +02:00
Craig Topper
759e5e0096 [RISCV] Remove doPeepholeLoadStoreADDI.
All of the cases should be handled by SelectAddrRegImm now.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129451
2022-07-11 10:44:33 -07:00
Craig Topper
907d923a20 [RISCV] Move the custom isel for (add X, imm) into SelectAddrRegImm.
This custom isel was used to split the lo12 bits of the imm so that
they could be folded into load/store addresses via a post-isel
peephole.

This patch instead splits the immediate during isel and folds the
lo12 bits there, removing the need for the post-isel peephole to do anything.
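
For reference, a small C model of that hi20/lo12 split (an illustrative
sketch following the usual RISC-V lui/addi materialization, not the
patch's code):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
  int32_t imm = 0x12345678;
  int32_t hi20 = (imm + 0x800) >> 12;   /* materialized with lui */
  int32_t lo12 = imm - (hi20 << 12);    /* folds into the ld/sd offset */
  assert(lo12 >= -2048 && lo12 < 2048); /* fits a signed 12-bit field */
  assert((hi20 << 12) + lo12 == imm);   /* the split reconstructs imm */
  printf("hi20=0x%x lo12=%d\n", (unsigned)hi20, (int)lo12);
  return 0;
}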

After this we'll be able to remove the post-isel peephole.

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D129450
2022-07-11 10:44:33 -07:00
spupyrev
eecd41aa09 Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.
2022-07-11 09:50:47 -07:00
Craig Topper
1a2bd44b77 [RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true.
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.

To fix this correctly, we need to consider the alignment of the load
we'd be replacing, but that's not possible in the current interface.
2022-07-11 09:40:08 -07:00
Rafael Auler
6d0528636a Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. The NeverAlign fragment ensures that
the next fragment (the first instruction in the pair) does not end at a
given alignment boundary, by emitting a minimal-size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).
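
A toy C model of that rule (assumed semantics for illustration; MC's
real implementation works on fragments, not raw byte offsets like this):

#include <stdio.h>

/* Pad with one 1-byte nop iff the next instruction would otherwise end
   exactly on the alignment boundary, i.e. the decode split would fall
   between the two macro-fusible instructions. Shifting by one byte can
   never recreate the condition, so a single minimal nop suffices. */
static unsigned neveralign_padding(unsigned offset, unsigned inst_size,
                                   unsigned boundary) {
  return (offset + inst_size) % boundary == 0 ? 1u : 0u;
}

int main(void) {
  printf("pad=%u\n", neveralign_padding(61, 3, 64)); /* 1: ends at 64 */
  printf("pad=%u\n", neveralign_padding(60, 3, 64)); /* 0: ends at 63 */
  return 0;
}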

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of the NeverAlign fragment is to prevent
the first instruction in a pair from ending at a given alignment boundary,
by inserting at most one minimum-size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles so they do
not cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-07-11 09:31:52 -07:00
Simon Pilgrim
97868fb972 [X86] isTargetShuffleEquivalent - attempt to match SM_SentinelZero shuffle mask elements using known bits
If the combined shuffle mask requires zero elements, we don't currently have much chance of matching them against the expected source vector. This patch uses the SelectionDAG::MaskedVectorIsZero wrapper to attempt to determine whether the element we expect to use is already known to be zero.

I've also tightened up the ExpectedMask assertion to always be in range - we're never giving it a target shuffle mask that has sentinels at all - allowing us to remove some of the confusing bounds checks.

This attempts to address some of the regressions uncovered by D129150 where we more aggressively fold shuffles as AND / 'clear' masks which results in more combined shuffles using SM_SentinelZero.

Differential Revision: https://reviews.llvm.org/D129207
2022-07-11 15:29:44 +01:00
John Brawn
ddd9485129 [MVE] Don't distribute add of vecreduce if it has more than one use
If the add has more than one use then applying the transformation
won't cause it to be removed, so we can end up applying it again
causing an infinite loop.

Differential Revision: https://reviews.llvm.org/D129361
2022-07-11 14:13:29 +01:00
David Sherwood
03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using it for the
conditional branch (see the scalar sketch below).
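
A scalar C model of those two steps (VF and active_lane_mask are
illustrative stand-ins for the vectorizer's concepts, not real APIs):

#include <stdint.h>
#include <stdio.h>

#define VF 4 /* assumed vectorization factor */

/* Scalar model of llvm.get.active.lane.mask(base, tc): lane l is
   active iff base + l < tc. */
static unsigned active_lane_mask(uint64_t base, uint64_t tc) {
  unsigned mask = 0;
  for (unsigned l = 0; l < VF; ++l)
    if (base + l < tc)
      mask |= 1u << l;
  return mask;
}

int main(void) {
  const uint64_t tc = 10; /* trip count */
  unsigned mask = active_lane_mask(0, tc); /* set up in the pre-header */
  for (uint64_t i = 0;;) {
    /* ...the masked vector body executes under 'mask' here... */
    printf("i=%llu mask=0x%x\n", (unsigned long long)i, mask);
    i += VF;
    mask = active_lane_mask(i, tc); /* step 1: mask for the next iteration */
    if (!(mask & 1u))               /* step 2: branch on its first bit */
      break;
  }
  return 0;
}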

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
David Green
0a11ad2aa8 [ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present.
If MVE.fp is not present then we cannot select the vector i1 fp
operations to VCMP instructions, so need to expand.
2022-07-11 13:03:30 +01:00
David Green
46fc4de065 [AArch64] Guard FP16 fptosi_sat patterns with HasFullFP16. NFC
We shouldn't get this far as the operations are already legalized, but
the patterns should be guarded with hasFullFP16.
2022-07-11 08:35:40 +01:00
LiaoChunyu
3f68f0f816 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcodes:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-07-11 14:10:27 +08:00
Pengcheng Wang
8977989449 [RISCV] Increase complexity of RVV element extraction patterns
Some tests failed in our downstream because the VFMV+FSD pattern was
matched first. Both the FSD and VSE patterns have the same
complexity, but FSD is matched before VSE in the generated
matcher table.

This problem only occurs in our downstream (so sorry that I can't
provide a test here) and increasing the value of `AddedComplexity`
can fix it.

Reviewed By: StephenFan, craig.topper

Differential Revision: https://reviews.llvm.org/D129360
2022-07-11 10:53:15 +08:00
Craig Topper
35ec8a423d [RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immediate.

I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the compiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.

Test file was copied from AArch64/ARM and has not been committed yet.
Will post a separate review for that.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D129402
2022-07-10 14:10:17 -07:00
NAKAMURA Takumi
393e12bddd R600ISelLowering.h: Silence a warning. [-Warray-parameter]
FIXME: Could it be rewritten with llvm::ArrayRef?
2022-07-10 18:29:55 +09:00
Nicolai Hähnle
ede600377c ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a588030d8d4004f5d7e443afe46245e9a92 due to
Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
than error categories, to be checked in more detail and reapplied
separately.)

Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 10:29:15 +02:00
Nicolai Hähnle
e9ce1a5880 Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f062457c928c18a88c612f39d9e168f65a85.

Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
2022-07-10 09:54:30 +02:00
Nicolai Hähnle
e6f1f06245 ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm
directory, either by defining a new getter function or, in many cases,
moving the static variable directly into the only function that uses it.

Differential Revision: https://reviews.llvm.org/D129120
2022-07-10 09:15:08 +02:00
Craig Topper
5f7641a3be [RISCV] Modify the custom isel for (add X, imm) used by load/stores.
We have custom isel that tries to select the Lo12 bits using a
separate ADDI that can later folded into the load/store address
by the post-isel peephole.

This patch disables this if the load/store already had a non-zero
offset. A non-zero offset implies that CodeGenPrepare split several
large offsets used by different loads and stores into a common large
offset and multiple small offsets that could be folded. Folding more
of the lo12 bits changes this common offset by increasing the small
offsets. While this can save an instruction to materialize the common
offset, it can also prevent the small offsets from fitting in a
compressed load/store instruction.

Removing this also simplifies the last piece needed to fold the custom
isel for add into SelectAddrRegImm and remove the post-isel peephole.
2022-07-09 22:47:27 -07:00
Craig Topper
9c6a2200e2 [RISCV] Support folding constant addresses in SelectAddrRegImm.
We already handled this by folding an ADDI in the post-isel peephole.
My goal is to remove that peephole so this adds the functionality
to isel.
2022-07-09 13:12:02 -07:00
David Blaikie
9008d0a38e Fix -Warray-parameter warning
Remove the bound in the definition, since it's not guaranteed and could
provide a false sense of security (I'd be inclined to go further and
change this to a pointer parameter, since that's what it really is - but
figured I'd preserve some of the author's intent here)
2022-07-09 17:04:01 +00:00
serge-sans-paille
e1272ab6ec [AMDGPU][NFC] Harmonize decl&def of R600TargetLowering::OptimizeSwizzle
The freshly baked -Warray-parameter warning discovered an inconsistency
between the argument declarations; use the stricter one.

This fixes build issues like https://lab.llvm.org/buildbot#builders/18/builds/5305
2022-07-09 09:07:31 +02:00
Philip Reames
b12930e133 [RISCV] Switch to using get.active.lane.mask when tail folding
The motivation here is to a) bring us closer into alignment with AArch64 under the assumption that codepath is better tested, and b) simplify pattern matching in an upcoming change.

The immediate impact is a significant IR reduction but a fairly minimal change in the generated assembly. Due to a difference in expansion behavior we get a saturating add vs a non-saturating one for the old code, but that's about it. This difference comes down to different handling of overflow, which doesn't seem to be possible here anyway, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice.

Differential Revision: https://reviews.llvm.org/D129221
2022-07-08 10:24:59 -07:00
Craig Topper
92f1794d41 [RISCV] Mark fminnum_vl and fmaxnum_vl as commutable. 2022-07-08 10:19:09 -07:00
Philip Reames
264018d764 [RISCV] Mark vsadd(u)_vl as commutable
This allows fixed length vectors involving splats on the LHS to commute into the _vx form of the instruction. Oddly, the generic canonicalization rules appear to catch the scalable vector cases. I haven't fully dug in to understand why, but I suspect it's because of a difference in how we represent splats (splat_vector vs build_vector).

Differential Revision: https://reviews.llvm.org/D129302
2022-07-08 10:18:21 -07:00
Craig Topper
a246eb6814 [RISCV] Mark (s/u)min_vl and (s/u)max_vl as commutable. 2022-07-08 09:59:42 -07:00
Matt Arsenault
02769f2b3f AArch64/GlobalISel: Stop using legal s1 values
As far as I can tell treating s1 values as legal makes no sense. There
are no allocatable 1-bit registers. SelectionDAG legalizes the usual
set of boolean operations to 32-bits, and this should do the
same. This avoids some special case handling in the selector of s1
values, and some extra code to look through truncates.

This makes some code worse at -O0, since nothing cleans up the 'and 1'
the artifact combiner inserts. We could probably add some
non-essential combines or teach the artifact combiner to elide
intermediates between boolean uses and defs.
2022-07-08 11:55:08 -04:00
Matt Arsenault
1ee6ce9bad GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults in to the zext behavior.

The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.

I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having both isLoad and isAtomic set on these
makes the most sense.
2022-07-08 11:55:08 -04:00
Cullen Rhodes
d1c51d45f0 [AArch64] Use Neoverse N2 sched model as default for:
  - Cortex-A710
  - Cortex-X2
  - Neoverse-V1
  - Neoverse-512tvb

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D129203
2022-07-08 13:34:13 +00:00
Phoebe Wang
8fb083d33e [X86][FP16] Add constrained FP support for scalar emulation
This is a follow-up patch to support constrained FP in FP16 emulation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D128114
2022-07-08 20:33:42 +08:00
Sanjay Patel
8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
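
The arithmetic identity behind such a fold is easy to check in C (this
illustrates why the rewrite is sound, not the combine's actual
known-bits plumbing): when every possibly-set bit of x is also set in
the constant C, the subtraction never borrows, so C - x equals C ^ x.

#include <assert.h>
#include <stdio.h>

int main(void) {
  const unsigned C = 0x7f; /* constant whose set bits cover x */
  for (unsigned x = 0; x <= C; ++x) {
    assert((x & ~C) == 0);    /* x's bits are a subset of C's */
    assert(C - x == (C ^ x)); /* no borrow => sub is bitwise xor */
  }
  printf("C - x == C ^ x for all x with bits inside 0x%x\n", C);
  return 0;
}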

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
David Green
4334cbd49b [AArch64] Remove incorrect use of DemandElts
This call to computeKnownBits was passing in a 0xff mask, as though it
were expected to be used as a DemandBits mask, not a DemandElts mask.
2022-07-08 11:38:00 +01:00
Kito Cheng
5c45ae4108 [RISCV] Fix wrong register rename for store value during make-compressible optimization
The current implementation renames both registers in a store instruction if
we store the base address into memory with the same base register. That's OK
if the offset is 0; however, it is a wrong transform if the offset isn't 0.
A small example:

sd      a0, 808(a0)

We should not transform this into:

addi    a2, a0, 768
sd      a2, 40(a2)

We should instead just rename the base address, like this:

addi    a2, a0, 768
sd      a0, 40(a2)

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128876
2022-07-08 18:07:17 +08:00
Cullen Rhodes
03af9ba680 [AArch64] Initial sched model for Neoverse N2
The optimization guide can be found here:
https://developer.arm.com/documentation/PJDOC-466751330-18256/latest/

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128631
2022-07-08 09:39:13 +00:00
Weining Lu
1d27f26426 [LoongArch] Add codegen support for multiplication operations
Reference:
https://llvm.org/docs/LangRef.html#mul-instruction

Differential Revision: https://reviews.llvm.org/D128194
2022-07-08 17:15:17 +08:00
Lian Wang
9cfb28d672 [RISCV] Change VECTOR_SPLICE mask operation from expand to promote
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128717
2022-07-08 06:20:22 +00:00
Abinav Puthan Purayil
17a81ecf85 [AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection
This change replaces the C++ predicates with the HasNoUse builtin
predicate that would enable the no-ret atomic op selection in
GlobalISel.

Differential Revision: https://reviews.llvm.org/D125213
2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil
7504c7a877 [AMDGPU] Use AddedComplexity for ret and noret atomic ops selection
This patch removes the predicate for return atomic ops and uses
AddedComplexity to distinguish its selection from its no return variant.
This will produce better matchers that don't unnecessarily check for
the negated predicate if the initial predicate failed. Also, it
simplifies the enabling of no return atomic ops selection in GlobalISel.

Differential Revision: https://reviews.llvm.org/D128241
2022-07-08 09:47:33 +05:30
Haohai Wen
18a1085e02 [X86] Fix collectLeaves for adds used by phi that forms loop
When the add has additional users, we should identify whether the add's
user is a phi that forms a loop, rather than the root's user.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D129169
2022-07-08 10:39:02 +08:00