669 Commits

Author SHA1 Message Date
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
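A hedged sketch of scheduling the ported pass through the new PM (the header and constructor shape are assumed from the port, not verified):

```
#include "llvm/CodeGen/AtomicExpand.h" // header name assumed
#include "llvm/IR/PassManager.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;

// Add the new-PM AtomicExpandPass to a function pass pipeline; TM is the
// TargetMachine the pass consults when lowering atomic operations.
static void addAtomicExpand(FunctionPassManager &FPM,
                            const TargetMachine *TM) {
  FPM.addPass(AtomicExpandPass(TM)); // ctor taking the TargetMachine assumed
}
```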
2024-02-25 18:42:22 +05:30
Craig Topper
e4057aacc5
[X86] Add missing pass initialization calls. (#82447)
If the passes aren't registered, they don't show up in print-after-all.
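The fix follows the usual pattern of calling each pass initializer from the target's initialization hook; a sketch, with the specific pass names assumed rather than taken from the patch:

```
#include "llvm/PassRegistry.h"
using namespace llvm;

// Declarations normally come from X86.h; the names here are assumed.
void initializeX86FixupVectorConstantsPassPass(PassRegistry &);
void initializeX86ReturnThunksPass(PassRegistry &);

extern "C" void LLVMInitializeX86Target() {
  // Registering the passes makes them visible to the PassRegistry, and
  // therefore to -print-after-all and friends.
  PassRegistry &PR = *PassRegistry::getPassRegistry();
  initializeX86FixupVectorConstantsPassPass(PR);
  initializeX86ReturnThunksPass(PR);
}
```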
2024-02-20 21:11:01 -08:00
Shoaib Meenai
741b836330
[X86] Fix RTTI proxy emission for 32-bit (#78622)
32-bit x86 doesn't have an appropriate relocation type we can use to
elide the RTTI proxies, so we need to emit them. This would previously
cause crashes when using the relative vtable ABI for 32-bit x86.
2024-01-18 13:31:33 -08:00
Shengchen Kan
a5902a4d24 [X86][NFC] Rename variables/passes for EVEX compression optimization
RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031

APX introduces EGPR, NDD and NF instructions. In addition to compressing
EVEX encoded AVX512 instructions into VEX encoding, we also have several
more possible optimizations.

a. Promoted instruction (EVEX space) -> pre-promotion instruction (legacy space)
b. NDD (EVEX space) -> non-NDD (legacy space)
c. NF_ND (EVEX space) -> NF (EVEX space)

The first two types of compression can usually reduce code size, while
the third type can aid hardware decoding even though the instruction
length remains unchanged.

So we do the renaming for the upcoming APX optimizations.

BTW, I clang-formatted the code in X86CompressEVEX.cpp and
X86CompressEVEXTablesEmitter.cpp.

This patch also extracts the NFC in #77065 into a separate commit.
2024-01-06 12:41:09 +08:00
yubingex007-a11y
f2517cbcee
[X86][AMX] remove related code of X86PreAMXConfigPass (#69569)
In https://reviews.llvm.org/D125075, we switched to using
FastPreTileConfig at O0 and abandoned X86PreAMXConfigPass.
We can now safely remove the code related to X86PreAMXConfigPass.
2023-10-20 13:43:34 +08:00
Harald van Dijk
a21abc782a
[X86] Align i128 to 16 bytes in x86 datalayouts
This is an attempt at rebooting https://reviews.llvm.org/D28990

I've included AutoUpgrade changes to modify the data layout to satisfy the compatible-layout check. This does mean that allocas, loads, stores, etc. in old IR will automatically get this new alignment.

This should fix PR46320.
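For illustration, the datalayout component involved; the string below is a hedged reconstruction of the current x86-64 layout, not copied from the patch:

```
// The key addition is the explicit "i128:128" alignment component.
static const char *X86_64DataLayout =
    "e-m:e-p270:32:32-p271:32:32-p272:64:64-"
    "i64:64-i128:128-f80:128-n8:16:32:64-S128";
```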

Reviewed By: echristo, rnk, tmgross

Differential Revision: https://reviews.llvm.org/D86310
2023-10-11 10:23:38 +01:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
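A hedged illustration of the mechanical update (both enums live in llvm/Support/CodeGen.h):

```
#include "llvm/Support/CodeGen.h"
using namespace llvm;

// Before: CodeGenOpt::Level OL = CodeGenOpt::Aggressive;
//         CodeGenFileType FT = CGFT_ObjectFile;
// After (scoped enum classes):
CodeGenOptLevel OL = CodeGenOptLevel::Aggressive;
CodeGenFileType FT = CodeGenFileType::ObjectFile;
```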
2023-09-14 14:10:14 -07:00
Fangrui Song
bf6e39367c [X86] Clean up GlobalISel headers. NFC 2023-08-21 23:55:08 -07:00
Simon Pilgrim
0b91de5ea3 [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds
This patch analyzes AVX512 instructions for full-vector-width folded loads from the constant pool and attempts to determine whether they can be replaced with a smaller broadcast-folded variant. Typically the broadcast opportunities were missed due to type-width mismatches or multi-use limitations that have since been removed by later passes.

As well as introducing broadcast fold tables (which can hopefully be extended/automated in the future), this also handles mismatches in the AND/ANDN/OR/XOR/TERNLOG type-widths, catching additional missed opportunities.

This patch is pulled from the ongoing work based on D150143, but without removing the existing DAG constant broadcast lowering code - this patch is currently a late-stage cleanup only.

The intention is to add additional broadcast/extension handling of constants in future patches, but it turned out that AVX512 broadcast handling was the easiest to start with.

Differential Revision: https://reviews.llvm.org/D150526
2023-05-23 10:58:17 +01:00
Sami Tolvanen
e9569748de [CodeGen][KCFI] Move cfi-type lowering to TargetLowering
KCFI machine function passes transform indirect calls with a
cfi-type attribute into architecture-specific type checks bundled
together with the calls. Instead of having a separate pass for each
architecture, add a generic machine function pass for KCFI and
move the architecture-specific code that emits the actual check to
TargetLowering. This avoids unnecessary duplication and makes it
easier to add KCFI support to other architectures.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D149915
2023-05-09 18:38:54 +00:00
Luo, Yuanke
e4ceb5a7bb [X86] Create extra prolog/epilog for stack realignment
Fix some bugs and reland e4c1dfed38370b4 and 614c63bec6d67c.
1. Run argument stack rebase pass before the reserved physical register
   is finalized.
2. Add a LEA pseudo instruction to prevent the instruction from being
   eliminated.
3. Don't support X32.
2023-03-22 22:20:27 +08:00
Luo, Yuanke
da8260a9b1 Revert "[X86] Create extra prolog/epilog for stack realignment"
This reverts commit e4c1dfed38370b4933f05c8e24b1d77df56b526c.
2023-03-21 20:30:29 +08:00
Luo, Yuanke
e4c1dfed38 [X86] Create extra prolog/epilog for stack realignment
The base pointer register is reserved by the compiler when there is
a dynamically sized alloca and stack realignment in a function. However,
the base pointer register is not defined in the X86 ABI, so users can
use this register in inline assembly. Such inline assembly would
clobber the base pointer register without the user being aware of it.
This patch creates an extra prolog to save the stack pointer to a
scratch register and uses this register to reference arguments from
the stack. For some calling conventions (e.g. regcall), there may be
few scratch registers.
Below is the example code for such case.

```
extern int bar(void *p);
long long foo(size_t size, char c, int id) {
  __attribute__((__aligned__(64))) int a;
  char *p = (char *)alloca(size);
  asm volatile ("nop"::"S"(405):);
  asm volatile ("movl %0, %1"::"r"(id), "m"(a):);
  p[2] = 8;
  memset(p, c, size);
  return bar(p);
}
```
The prolog/epilog below will be emitted for this case.
```
leal    4(%esp), %ebx
.cfi_def_cfa %ebx, 0
andl    $-128, %esp
pushl   -4(%ebx)
...
leal    4(%ebx), %esp
.cfi_def_cfa %esp, 4
```

Differential Revision: https://reviews.llvm.org/D145650
2023-03-21 08:09:56 +08:00
Noah Goldstein
69a322fed1 Add new pass X86FixupInstTuning for fixing up machine-instruction selection.
There are a variety of cases where we want more control over the exact
instruction emitted. This commit creates a new pass to fix up
instructions after the DAG has been lowered. The pass is only meant to
replace instructions that are guaranteed to be interchangeable, not to
do analysis for special cases.

Handling these instruction changes in X86ISelLowering or
X86ISelDAGToDAG isn't ideal, as it's liable either to break existing
patterns that expect a certain instruction or to generate infinite
loops.

Additionally, operating at the MachineInstr level allows us to access
scheduling/code-size information when making these decisions.

Currently only implements `{v}permilps` -> `{v}shufps/{v}shufd` but
more transforms can be added.

Differential Revision: https://reviews.llvm.org/D143787
2023-02-27 18:53:25 -06:00
Archibald Elliott
62c7f035b4 [NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.
2023-02-07 12:39:46 +00:00
Dominik Adamski
6809af1a23 Revert "[OpenMP][OMPIRBuilder] Move SIMD alignment calculation to LLVM Frontend"
This reverts commit ed01de67433174d3157e9d239d59dd465d52c6a5.
2023-01-13 14:38:17 -06:00
Dominik Adamski
ed01de6743 [OpenMP][OMPIRBuilder] Move SIMD alignment calculation to LLVM Frontend
Currently, the default SIMD alignment is specified by the Clang-specific
TargetInfo class. This class cannot be reused for LLVM Flang. If we move
the default alignment field into the TargetMachine class, then we can
create TargetMachine objects and query them to find the SIMD alignment.

Scope of changes:
  1) Added information about maximal allowed SIMD alignment to TargetMachine
     classes.
  2) Removed getSimdDefaultAlign function from Clang TargetInfo class.
  3) Refactored createTargetMachine function.

Reviewed By: jsjodin

Differential Revision: https://reviews.llvm.org/D138496
2023-01-13 14:07:29 -06:00
Matt Arsenault
69e75ae695 CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over
multiple times. The point at which this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.

I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.

A missing feature I could still use is a way to access a custom
analysis pass on the IR here.
2022-12-21 10:49:32 -05:00
Nick Desaulniers
be8fd64091 [llvm][X86ISelDAGToDAG] support -{start|stop}-{before|after}=x86-isel
Follow a similar pattern to AMDGPUDAGToDAGISel's constructor so that we
can use INITIALIZE_PASS to register the pass. This allows for more
fine-grained testability of SelectionDAGISel.
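A hedged sketch of the registration pattern this enables (the exact name string and description are assumed):

```
// In X86ISelDAGToDAG.cpp: give the pass a static ID and register it so
// that -start/stop-{before,after}=x86-isel can address it by name.
char X86DAGToDAGISel::ID = 0;

INITIALIZE_PASS(X86DAGToDAGISel, "x86-isel",
                "X86 DAG->DAG Instruction Selection", false, false)
```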

Link: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140323
2022-12-20 14:16:45 -08:00
Fangrui Song
4b1b9e22b3 Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 04:21:08 +00:00
Fangrui Song
bac974278c CodeGen/CommandFlags: Convert Optional to std::optional 2022-12-03 18:38:12 +00:00
Krzysztof Parzyszek
8c7c20f033 Convert Optional<CodeModel> to std::optional<CodeModel> 2022-12-03 12:08:47 -06:00
Sami Tolvanen
cff5bef948 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Relands 67504c95494ff05be2a613129110c9bcf17f6c13 with a fix for
32-bit builds.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 22:41:38 +00:00
Sami Tolvanen
a79060e275 Revert "KCFI sanitizer"
This reverts commit 67504c95494ff05be2a613129110c9bcf17f6c13 as using
PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.
2022-08-24 19:30:13 +00:00
Sami Tolvanen
67504c9549 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 18:52:42 +00:00
Phoebe Wang
190518da4b [X86] Use generic tuning for "x86-64" if "tune-cpu" is not specified
This is an alternative to D129154. See discussions on https://discourse.llvm.org/t/fast-scalar-fsqrt-tuning-in-x86/63605

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D129647
2022-07-15 10:05:08 +08:00
Nick Desaulniers
2240d72f15 [X86] initial -mfunction-return=thunk-extern support
Adds support for:
* `-mfunction-return=<value>` command line flag, and
* `__attribute__((function_return("<value>")))` function attribute

Where the supported <value>s are:
* keep (disable)
* thunk-extern (enable)

thunk-extern enables clang to change ret instructions into jmps to an
external symbol named __x86_return_thunk, implemented as a new
MachineFunctionPass named "x86-return-thunks", keyed off the new IR
attribute fn_ret_thunk_extern.
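A minimal usage sketch; the attribute mirrors what -mfunction-return=thunk-extern applies globally, and with it the function's ret is emitted as a jmp to __x86_return_thunk:

```
// Per-function opt-in; the runtime must provide __x86_return_thunk.
__attribute__((function_return("thunk-extern")))
int f(void) { return 42; }
```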

The symbol __x86_return_thunk is expected to be provided by the runtime
the compiled code is linked against and is not defined by the compiler.
Enabling this option alone doesn't provide mitigations without
corresponding definitions of __x86_return_thunk!

This new MachineFunctionPass is very similar to "x86-lvi-ret".

The <value>s "thunk" and "thunk-inline" are currently unsupported. It's
not clear yet that they are necessary: whether the thunk pattern they
would emit is beneficial or used anywhere.

Should the <value>s "thunk" and "thunk-inline" become necessary,
x86-return-thunks could probably be merged into x86-retpoline-thunks
which has pre-existing machinery for emitting thunks (which could be
used to implement the <value> "thunk").

Has been found to build+boot with corresponding Linux
kernel patches. This helps the Linux kernel mitigate RETBLEED.
* CVE-2022-23816
* CVE-2022-28693
* CVE-2022-29901

See also:
* "RETBLEED: Arbitrary Speculative Code Execution with Return
Instructions."
* AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion
* TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0
  2022-07-12
* Return Stack Buffer Underflow /
  CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702

SystemZ may eventually want to support "thunk-extern" and "thunk"; both
options are used by the Linux kernel's CONFIG_EXPOLINE.

This functionality has been available in GCC since the 8.1 release, and
was backported to the 7.3 release.

Many thanks to folks who provided discreet review off-list due to the
embargoed nature of this hardware vulnerability. Many Bothans died to
bring us this information.

Link: https://www.youtube.com/watch?v=IF6HbCKQHK8
Link: https://github.com/llvm/llvm-project/issues/54404
Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html
Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html
Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html

Reviewed By: aaron.ballman, craig.topper

Differential Revision: https://reviews.llvm.org/D129572
2022-07-12 09:17:54 -07:00
Luo, Yuanke
5cb0979870 [X86][AMX] Split greedy RA for tile register
When we fill the shape into the tile configuration memory, the shape is
taken from the AMX pseudo instruction. However, the register holding the
shape may be split or spilled by greedy RA. That can cause us to fill the
shape into the configuration memory after ldtilecfg has executed, so the
shape configuration would be wrong.
This patch splits tile register allocation out of greedy register
allocation, so that after tile registers are allocated the shape
registers are still virtual registers. The shape registers may only be
redefined or multi-defined by the PHI elimination and two-address passes,
which doesn't affect the tile register configuration.

Differential Revision: https://reviews.llvm.org/D128584
2022-06-29 10:35:43 +08:00
Kazu Hirata
e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Paul Robinson
654a835c3f [PS5] Trap after noreturn calls, with special case for stack-check-fail 2022-06-15 09:02:17 -07:00
Luo, Yuanke
496156ac57 [X86][AMX] Multiple configure for AMX register.
The previous solution depends on the variable name to record the shape
information. However, that is not reliable, because in release builds the
compiler does not set variable names. This can be worked around with the
additional option `-fno-discard-value-names`, but that is not acceptable
for users.
This patch preconfigures the tile registers with machine instructions,
following the same approach as single configure. In the future we can
fall back to multiple configure when single configure fails due to the
shape dependency issue.
The algorithm to configure the tile registers is simple in this patch; we
may improve it in the future. It configures tile registers per basic
block. The compiler spills a tile register if it is live out of the basic
block. After configuration there should be no spills across a tile
configure during register allocation. Just like fast register allocation,
the algorithm walks the instructions in reverse order. When the shape
dependency isn't met, it inserts ldtilecfg after the last instruction
that defines the shape.
In post-configuration the compiler also walks the basic block to collect
the physical tile register numbers and generates instructions to fill the
stack slots with the corresponding shape information.
TODO: There is some follow-up work in D125602. The risk is that modifying
the fast RA may cause regressions, as fast RA is used for different
targets. We may create an independent RA for tile registers.

Differential Revision: https://reviews.llvm.org/D125075
2022-05-24 13:18:42 +08:00
Craig Topper
589517925b [X86] Call initializeX86PreTileConfigPass from LLVMInitializeX86Target.
Without this, the pass doesn't show up in print-before/after-all.

Differential Revision: https://reviews.llvm.org/D124973
2022-05-04 19:09:06 -07:00
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Jameson Nash
c4b1a63a1b mark getTargetTransformInfo and getTargetIRAnalysis as const
Seems like this can be const, since Passes shouldn't modify it.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D120518
2022-02-25 14:30:44 -05:00
Yuanfang Chen
f927021410 Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de084a540cfa38b72e609b25ea0569bb7.

Restrict the tests to Windows only since the flag symbol hash depends on
system-dependent path normalization.
2022-02-10 15:16:17 -08:00
Yuanfang Chen
b380a31de0 Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683f80d174ea9c97000db3ec3276bc022.

Breaks bots:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview
2022-02-10 14:17:37 -08:00
Yuanfang Chen
bd3a1de683 [clang-cl] Support the /JMC flag
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/

The `/JMC` flag enables these instrumentations:
- Insert, at the beginning of every function immediately after the prologue,
  a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
  The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
  global variable (the global variable is initialized to 1) with the name
  convention `__<hash>_<filename>`. All such global variables are placed in
  the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
  with a directory path. MSVC uses some unknown hashing function. Here I
  used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
  option via ".drectve" section. This is to prevent failure in
  case `__CheckForDebuggerJustMyCode` is not provided during linking.

Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before the CodeGenPrepare pass, so as not to interfere with mid-end optimizations and to keep the instrumentation target-independent (I'm still working on an ELF port in a separate patch).
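A hedged sketch of the conceptual shape of the instrumentation; the flag symbol name and hash below are illustrative, not the real emitted names:

```
// MSVC-style sketch. The real flag is emitted by the pass, named
// __<hash>_<filename>, initialized to 1, and placed in .msvcjmc.
extern "C" void __fastcall __CheckForDebuggerJustMyCode(unsigned char *);
extern "C" unsigned char __A1B2C3_foo_cpp; // hypothetical flag symbol

int foo() {
  __CheckForDebuggerJustMyCode(&__A1B2C3_foo_cpp); // inserted after prologue
  return 0;
}
```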

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D118428
2022-02-10 10:26:30 -08:00
Phoebe Wang
37d1d02200 [X86][MS] Change the alignment of f80 to 16 bytes on Windows 32bits to match with ICC
MSVC currently doesn't support 80-bit long double. ICC supports it when
the option `/Qlong-double` is specified. Change the alignment of f80 to
16 bytes so that we can be compatible with ICC's option.

Reviewed By: rnk, craig.topper

Differential Revision: https://reviews.llvm.org/D115942
2022-01-23 09:58:46 +08:00
Phoebe Wang
f63a805a4e Revert "[X86][MS] Change the alignment of f80 to 16 bytes on Windows 32bits to match with ICC"
This reverts commit 1bb0caf561688681be67cc91560348c9e43fcbf3.
2022-01-15 10:54:38 +08:00
Phoebe Wang
1bb0caf561 [X86][MS] Change the alignment of f80 to 16 bytes on Windows 32bits to match with ICC
MSVC currently doesn't support 80-bit long double. ICC supports it when
the option `/Qlong-double` is specified. Change the alignment of f80 to
16 bytes so that we can be compatible with ICC's option.

Reviewed By: rnk, craig.topper

Differential Revision: https://reviews.llvm.org/D115942
2022-01-12 17:50:37 +08:00
Florian Hahn
ff3b085ab0
[X86] Use bundle for CALL_RVMARKER expansion.
This patch updates expandCALL_RVMARKER to wrap the call, marker and
objc runtime call in an instruction bundle. This ensures later passes,
like machine block placement, cannot break them up.

On AArch64, the instruction sequence is already wrapped in a bundle.
Keeping the whole instruction sequence together is highly desirable for
performance and outweighs potential other benefits from breaking the
sequence up.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D115230
2021-12-14 10:53:22 +00:00
Reid Kleckner
89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Hongtao Yu
d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currently PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is invoked separately (i.e., through the linker or llc), where the user always has to pass -pseudo-probe-for-profiling explicitly. I'm making the pass a default pass that requires no command line argument to trigger; it will actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Amara Emerson
4ceea77409 [X86] Rename the X86WinAllocaExpander pass and related symbols to "DynAlloca". NFC.
For x86 Darwin, we have a stack checking feature which re-uses some of this
machinery around stack probing on Windows. Renaming this to be more appropriate
for a generic feature.

Differential Revision: https://reviews.llvm.org/D109993
2021-09-20 16:19:28 -07:00
Nick Desaulniers
3787ee4571 reland [IR] make -stack-alignment= into a module attr
Relands commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13 with fixes for
MIPS.

Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.
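A hedged sketch of the module-level encoding; the module flag name is assumed from the patch description, not verified:

```
#include "llvm/IR/Module.h"
using namespace llvm;

// Record the stack alignment as a module flag so it survives LTO instead
// of living only in CodeGenOpts ("override-stack-alignment" is assumed).
static void recordStackAlignment(Module &M, unsigned AlignInBytes) {
  M.setModuleFlag(Module::Error, "override-stack-alignment", AlignInBytes);
}
```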

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 10:59:46 -07:00
Nick Desaulniers
a596b54d47 Revert "[IR] make -stack-alignment= into a module attr"
This reverts commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13.

Breaks the MIPS build.
2021-06-08 08:55:50 -07:00
Nick Desaulniers
433c8d950c [IR] make -stack-alignment= into a module attr
Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 08:31:04 -07:00
Harald van Dijk
75521bd9d8
[X32] Add Triple::isX32(), use it.
So far, support for x86_64-linux-gnux32 has been handled by explicit
comparisons of Triple.getEnvironment() to GNUX32. This worked as long as
x86_64-linux-gnux32 was the only X32 environment to worry about, but we
now have x86_64-linux-muslx32 as well. To support this, this change adds
an isX32() function and uses it. It replaces all checks for GNUX32 or
MuslX32 by isX32(), except for the following:

- Triple::isGNUEnvironment() and Triple::isMusl() are supposed to treat
  GNUX32 and MuslX32 differently.
- computeTargetTriple() needs to be able to transform triples to add or
  remove X32 from the environment and needs to map GNU to GNUX32, and
  Musl to MuslX32.
- getMultiarchTriple() completely lacks any Musl support and retains the
  explicit check for GNUX32 as it can only return x86_64-linux-gnux32.
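A minimal sketch of the new helper, assuming it keys purely off the triple's environment:

```
// Treat both the glibc and musl x32 environments as X32.
bool Triple::isX32() const {
  EnvironmentType Env = getEnvironment();
  return Env == Triple::GNUX32 || Env == Triple::MuslX32;
}
```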

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D103777
2021-06-07 20:48:39 +01:00
Xiang1 Zhang
d4bdeca576 [X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
2021-05-08 14:21:11 +08:00
Xiang1 Zhang
bebafe01a7 Revert "[X86] Support AMX fast register allocation"
This reverts commit 77e2e5e07d01fe0b83c39d0c527c0d3d2e659146.
2021-05-08 13:43:32 +08:00