176973 Commits

Author SHA1 Message Date
Kazu Hirata
1daf2994de [llvm] Use StringRef::contains (NFC) 2023-12-23 22:21:52 -08:00
Shengchen Kan
17ff25a58e [X86][NFC] Not infer OpSize from Xi8|16|32|64
For legacy (arithmetic) instructions, the operand size override prefix (0x66)
is used to switch the operand data size from 32b to 16b (in 32/64-bit mode),
16b to 32b (in 16-bit mode). That's why we set OpSize16 for 16-bit instructions
and set OpSize32 for 32-bit instructions.

But it's not a generic rule any more after APX. APX adds 4 variants for
arithmetic instructions: promoted EVEX, NDD (new data destination), NF (no flag),
NF_NDD. All the 4 variants are in EVEX space and only legal in 64-bit
mode. EVEX.pp is set to 01 for the 16-bit instructions to encode 0x66.
For APX, we should set OpSizeFixed for 8/16/32/64-bit variants and set PD for
the 16-bit variants.

Hence, to reuse the classes ITy and its subclasses BinOp* for APX instructions,
we extract the OpSize setting from the class ITy.
2023-12-24 12:00:25 +08:00
Shengchen Kan
6e20df1a3b [X86][NFC] Set default OpPrefix to PS for XOP/VEX/EVEX instructions
It helps simplify the class definitions. Now, the only explicit usage of PS is
to check prefix 0x66/0xf2/0xf3 can not be used a prefix, e.g. wbinvd.

See 82974e0114f02ffc07557e217d87f8dc4e100a26 for more details.
2023-12-24 10:20:40 +08:00
Felipe de Azevedo Piovezan
acacec3bbf
[LiveDebugValues][nfc] Reduce memory usage of InstrRef (#76051)
Commit 1b531d54f623 (#74203) removed the usage of unique_ptrs of arrays
in favour of using vectors, but inadvertently increased peak memory
usage by removing the ability to deallocate vector memory that was no
longer needed mid-LDV.

In that same review, it was pointed out that `FuncValueTable` typedef
could be removed, since it was "just a vector".

This commit addresses both issues by making `FuncValueTable` a real data
structure, capable of mapping BBs to ValueTables and able to free
ValueTables as needed.

This reduces peak memory usage in the compiler by 10% in the benchmarks
flagged by the original review.

As a consequence, we had to remove a handful of instances of the
"declare-then-initialize" antipattern in unittests, as the
FuncValueTable class is no longer default-constructible.
2023-12-23 13:44:45 -03:00
Florian Hahn
fbcf8a8cbb
[ConstraintElim] Add (UGE, var, 0) to unsigned system for new vars. (#76262)
The constraint system used for ConstraintElimination assumes all
varibles to be signed. This can cause missed optimization in the
unsigned system, due to missing the information that all variables are
unsigned (non-negative).

Variables can be marked as non-negative by adding Var >= 0 for all
variables. This is done for arguments on ConstraintInfo construction and
after adding new variables. This handles cases like the ones outlined in
https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341

The original example shared above is now handled without this change,
but adding another variable means that instcombine won't be able to
simplify examples like https://godbolt.org/z/hTnra7zdY

Adding the extra variables comes with a slight compile-time increase
https://llvm-compile-time-tracker.com/compare.php?from=7568b36a2bc1a1e496ec29246966ffdfc3a8b87f&to=641a47f0acce7755e340447386013a2e086f03d9&stat=instructions:u

stage1-O3    stage1-ReleaseThinLTO    stage1-ReleaseLTO-g  stage1-O0-g
 +0.04%           +0.07%                   +0.05%           +0.02%
stage2-O3    stage2-O0-g    stage2-clang
  +0.05%         +0.05%        +0.05%

https://github.com/llvm/llvm-project/pull/76262
2023-12-23 15:53:48 +01:00
Matt Arsenault
ed6dc62862
DAG: Handle equal size element build_vector promotion (#76213) 2023-12-23 20:43:14 +07:00
Kazu Hirata
03dc806b12 [Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC) 2023-12-22 14:51:22 -08:00
Yingwei Zheng
345d7b1618
[InstCombine] Fold minmax intrinsic using KnownBits information (#76242)
This patch tries to fold minmax intrinsic by using
`computeConstantRangeIncludingKnownBits`.
Fixes regression in
[_karatsuba_rec:cpython/Modules/_decimal/libmpdec/mpdecimal.c](c31943af16/Modules/_decimal/libmpdec/mpdecimal.c (L5460-L5462)),
which was introduced by #71396.
See also
https://github.com/dtcxzyw/llvm-opt-benchmark/issues/16#issuecomment-1865875756.

Alive2 for splat vectors with undef: https://alive2.llvm.org/ce/z/J8hKWd
2023-12-23 04:41:32 +08:00
Momchil Velikov
4b6968952e
[AArch64] Implement spill/fill of predicate pair register classes (#76068)
We are getting ICE with, e.g.
```
#include <arm_sve.h>

 void g();
 svboolx2_t f0(int64_t i, int64_t n) {
     svboolx2_t r = svwhilelt_b16_x2(i, n);
     g();
     return r;
 }
```
2023-12-22 15:54:12 +00:00
Nikita Popov
d82eccc752 [RegAllocFast] Avoid duplicate hash lookup (NFC) 2023-12-22 16:52:20 +01:00
Nikita Popov
658b260dbf [Attributor] Don't construct pretty GEPs
Bring this in line with other transforms like ArgPromotion/SROA/
SCEVExpander and always produce canonical i8 GEPs.
2023-12-22 16:48:13 +01:00
HaohaiWen
40ec791b15
[RegAllocFast] Refactor dominates algorithm for large basic block (#72250)
The original brute force dominates algorithm is O(n) complexity so it is
very slow for very large machine basic block which is very common with
O0. This patch added InstrPosIndexes to assign index for each
instruction and use it to determine dominance. The complexity is now
O(1).
2023-12-22 23:06:16 +08:00
Lucas Duarte Prates
e4f1c52832
[AArch64] Assembly support for the Armv9.5-A Memory System Extensions (#76237)
This implements assembly support for the Memory Systems Extensions
introduced as part of the Armv9.5-A architecture version.
The changes include:
* New subtarget feature for FEAT_TLBIW.
* New system registers for FEAT_HDBSS:
  * HDBSSBR_EL2 and HDBSSPROD_EL2.
* New system registers for FEAT_HACDBS:
  * HACDBSBR_EL2 and HACDBSCONS_EL2.
* New TLBI instructions for FEAT_TLBIW:
  * VMALLWS2E1(nXS), VMALLWS2E1IS(nXS) and VMALLWS2E1OS(nXS).
* New system register for FEAT_FGWTE3:
  * FGWTE3_EL3.
2023-12-22 14:40:29 +00:00
Simon Pilgrim
3736e1d1cd [SCEV] Ensure shift amount is in range before calling getZExtValue()
Fixes #76234
2023-12-22 14:16:54 +00:00
Tomas Matheson
f5ab0bb148
[AArch64] paci<k>171615 auti<k>171615 assembly (#76227)
This adds the following instructions which are added in PAuthLR:
 - PACIA171615
 - PACIB171615
 - AUTIA171615
 - AUTIB171615

Also updates some encodings to match final published values.

Documentation can be found here:

https://developer.arm.com/documentation/ddi0602/2023-12/Base-Instructions

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-22 13:54:21 +00:00
Nikita Popov
c16559137c [IndVars] Avoid unnecessary truncate for zext nneg use
When performing sext IV widening, if one of the narrow uses is in
a zext nneg, we can treat it like an sext and avoid the insertion
of a trunc.
2023-12-22 11:30:17 +01:00
Nikita Popov
24e80d4cc5 [IndVars] Move "using namespace" to top-level scope (NFC) 2023-12-22 11:28:54 +01:00
Matt Arsenault
f7c3627338
DAG: Implement promotion for strict_fpextend (#74310)
Test is a placeholder, will be merged into the existing test after
additional bug fixes for illegal f16 targets are fixed.
2023-12-22 17:15:52 +07:00
Lucas Duarte Prates
7109a462cd
[AArch64] Assembly support for the Armv9.5-A RAS Extensions (#76161)
This implements assembly support for the RAS extensions introduced as
part of the Armv9.5-A architecture version.
The changes include:
* New system registers for Delegated SError exceptions for EL3
(FEAT_E3DSE):
  * VDISR_EL3
  * VSESR_EL3

Mode details about these extensions can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/

Co-authored-by: Jirui Wu <jirui.wu@arm.com>
Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>
2023-12-22 10:06:06 +00:00
Matt Arsenault
0e46b49de4 Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86.

PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667
2023-12-22 16:46:22 +07:00
Nikita Popov
54067c5fbe [SROA] Use memcpy if type size does not match store size
The original memcpy also copies the padding, so make sure that
this is still the case after splitting.

Fixes https://github.com/llvm/llvm-project/issues/64081.
2023-12-22 10:19:22 +01:00
Wang Pengcheng
17858ce6f3
[MacroFusion] Remove createBranchMacroFusionDAGMutation (#76209)
Instead, we add a `BranchOnly` parameter to indicate that only
branches with its predecessors will be fused.

X86 is the only user of `createBranchMacroFusionDAGMutation`.
2023-12-22 16:31:38 +08:00
Shan Huang
06a9c6738a
[CVP] Fix #76058: missing debug location in processSDiv function (#76118)
This PR fixes #76058.
2023-12-22 09:26:32 +01:00
Shengchen Kan
ff32ab3ae7 [X86][NFC] Not imply TB in PS|PD|XS|XD
This can help us aovid introducing new classes T_MAP*PS|PD|XS|XD
when a new opcode map is supported.

And, T_MAP*PS|PD|XS|XD does not look better than T_MAP*, PS|PD|XS|XD.
2023-12-22 15:44:30 +08:00
Aiden Grossman
a15532d764
[X86] Add CPU detection for more znver2 CPUs (#74955)
This patch adds proper detection support for more znver2 CPUs.

Specifically, this adds in support for CPUs codenamed Renoir, Lucienne,
and Mendocino.

This was originally proposedfor Renoir in
https://reviews.llvm.org/D96220 and
got approved, but slipped through the cracks. However, there is still a
demand for this feature.

In addition to adding support for more znver2 CPUs, this patch also includes
some additional refactoring and comments related to cpu model
information for zen CPUs.

Fixes https://github.com/llvm/llvm-project/issues/74934.
2023-12-21 23:39:28 -08:00
XinWang10
1d4691a233
[X86][MC] Support Enc/Dec for EGPR for promoted CMPCCXADD instruction (#76125)
R16-R31 was added into GPRs in
https://github.com/llvm/llvm-project/pull/70958,
This patch supports the encoding/decoding for promoted CMPCCXADD
instruction in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2023-12-22 15:19:56 +08:00
Wang Pengcheng
f9c908862a
[RISCV] Split TuneShiftedZExtFusion (#76032)
We split `TuneShiftedZExtFusion` into three fusions to make them
reusable and match the GCC implementation[1].

The zexth/zextw fusions can be reused by XiangShan[2] and other
commercial processors, but shifted zero extension is not so common.

`macro-fusions-veyron-v1.mir` is renamed so it's not relevant to
specific processor.

References:
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html
[2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode
2023-12-22 14:37:26 +08:00
wangpc
90f816e61f [RISCV] Rename TuneVeyronFusions to TuneVentanaVeyron
And fusion features are added to processor definition.
2023-12-22 14:29:31 +08:00
XinWang10
7b3323fffb
[X86][MC] Support Enc/Dec for EGPR for promoted CET instruction (#76023)
R16-R31 was added into GPRs in
https://github.com/llvm/llvm-project/pull/70958,
This patch supports the encoding/decoding for promoted CET instruction
in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2023-12-22 14:11:32 +08:00
Matt Arsenault
4d1cd38c95 DAG: Handle promotion of fcanonicalize
This avoids a regression in a future commit
2023-12-22 12:50:18 +07:00
Vitaly Buka
0ccc1e7acd Revert "[AArch64] Fold more load.x into load.i with large offset"
Issue #76202

This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.
2023-12-21 21:12:40 -08:00
Matt Arsenault
248fba0cd8 AMDGPU: Remove pointless setOperationAction for xint_to_fp
The legalize action for uint_to_fp/sint_to_fp uses the source integer
type, not the result FP type so setting an action on an FP type does
nothing.
2023-12-22 11:24:35 +07:00
Shengchen Kan
62d8ae0a1e [X86][NFC] Remove class (VEX/EVEX/XOP)_4V and add class VVVV
`VEX_4V` does not look simpler than `VEX, VVVV`. It's kind of confusing
b/c classes like `VEX_L`, `VEX_LIG` do not imply `VEX` but it does.

For APX, we have promote EVEX, NDD, NF and NDD_NF instructions. All of
the 4 variants are in EVEX space and NDD/NDD_NF set the VVVV fields.
To extract the common fields (e.g EVEX) into a class and set VVVV
conditionally, we need VVVV to not imply other prefixes.
2023-12-22 10:38:15 +08:00
Craig Topper
e64f5d6305
[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682)
ISD::VP_MERGE treats the false operand as the source for elements past
VL. The vmerge instruction encodes 3 registers and treats the vd
register as the source for the tail.

This patch adds a new ISD opcode that models the tail source explicitly.
During lowering we copy the false operand to this operand.

I think we can merge RISCVISD::VSELECT_VL with this new opcode by using
an UNDEF passthru, but I'll save that for another patch.
2023-12-21 14:34:49 -08:00
Derek Schuff
35a5df2de6
[WebAssembly][Object] Record section start offsets at start of payload (#76188)
LLVM ObjectFile currently records the start offsets of sections as the
start of the section header, whereas most other tools (WABT, emscripten,
wasm-tools) record it as the start of the section content, after the
header. This affects binutils tools such as objdump and nm, but not
compilation/assembly (since that is driven by symbols and assembler
labels which already have their values inside the section payload rather
in the header. This patch updates LLVM to match the other tools.
2023-12-21 14:16:37 -08:00
Felipe de Azevedo Piovezan
058e527434
[AccelTable][NFC] Fix typos and duplicated code (#76155)
Renaming a member variable from "Endoding" to "Encoding".

Also replace inlined code for "isNormalized" with a call to the
function, so that if the definition of normalization ever changes, we
only need to change the one place.
2023-12-21 16:10:30 -03:00
Arthur Eubanks
7433b1ca3e Reapply "[X86] Set SHF_X86_64_LARGE for globals with explicit well-known large section name (#74381)"
This reverts commit 19fff858931bf575b63a0078cc553f8f93cced20.

Now that explicit large globals are handled properly in the small code model.
2023-12-21 10:51:30 -08:00
Arthur Eubanks
2366d53d8d
[X86] Fix more medium code model addressing modes (#75641)
By looking at whether a global is large instead of looking at the code
model.

This also fixes references to large data in the small code model.

We now always fold any 32-bit offset into the addressing mode with the
large code model since it uses 64-bit relocations.
2023-12-21 10:40:56 -08:00
Tomas Matheson
7bd17212ef Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947)
This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae.

Fix expensive checks failure by properly marking register def for ADR.
2023-12-21 18:32:55 +00:00
David Li
f44079db22
[ISel] Add pattern matching for depositing subreg value (#75978)
Depositing value into the lowest byte/word is a common code pattern.
This patch improves the code generation for it to avoid redundant AND
and OR operations.
2023-12-21 10:18:57 -08:00
Tomas Matheson
192f720178 Re-land "[AArch64] Add FEAT_PAuthLR assembler support" (#75947)
This reverts commit 199a0f9f5aaf72ff856f68e3bb708e783252af17.
Fixed the left-shift of signed integer which was causing UB.
2023-12-21 18:09:31 +00:00
Mikhail Gudim
411cba215a
Revert "[InstCombine] Extend foldICmpBinOp to add-like or. (#71… (#76167)
…396)"

This reverts commit 8773c9be3d9868288f1f46957945d50ff58e4e91.
2023-12-21 11:41:09 -05:00
Craig Topper
0dcff0db3a
[RISCV] Add codegen support for experimental.vp.splice (#74688)
IR intrinsics were already defined, but no codegen support had been
added.

I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
2023-12-21 08:38:32 -08:00
Tomas Matheson
199a0f9f5a Revert "[AArch64] Add FEAT_PAuthLR assembler support"
This reverts commit 934b1099cbf14fa3f86a269dff957da8e5fb619f.

Buildbot failues on sanitizer-x86_64-linux-fast
2023-12-21 16:26:39 +00:00
Tomas Matheson
9f0f558742 Revert "[AArch64] Codegen support for FEAT_PAuthLR"
This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5.

Builtbot failures with expensive checks enabled.
2023-12-21 16:25:55 +00:00
Kazu Hirata
e01c063684 [llvm] Use DenseMap::contains (NFC) 2023-12-21 08:18:47 -08:00
Nikita Popov
a134abf4be
[ValueTracking] Make isGuaranteedNotToBeUndef() more precise (#76160)
Currently isGuaranteedNotToBeUndef() is the same as
isGuaranteedNotToBeUndefOrPoison(). This function is used in places
where we only care about undef (due to multi-use issues), not poison.

Make it more precise by only considering instructions that can create
undef (like loads or call), and ignore those that can only create
poison. In particular, we can ignore poison-generating flags.

This means that inferring more flags has less chance to pessimize other
transforms.
2023-12-21 16:49:37 +01:00
Nikita Popov
b8df88b41c [InstCombine] Support zext nneg in gep of sext add fold
Add m_NNegZext() and m_SExtLike() matchers to make doing these kinds
of changes simpler in the future.
2023-12-21 16:38:09 +01:00
Jay Foad
8fdfd34cd2
[AMDGPU] Remove GDS and GWS for GFX12 (#76148) 2023-12-21 15:27:08 +00:00
Shengchen Kan
8eccf2b872 [X86] Set Uses = [EFLAGS] for ADCX/ADOX
According to Intel SDE, ADCX reads CF and ADOX reads OF. `Uses` was
set to empty by accident, the bug was not exposed b/c compiler never
emits these instructions.
2023-12-21 23:01:00 +08:00