36581 Commits

Author SHA1 Message Date
Yuta Saito
d4efc3e097
[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332)
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:

1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
    - [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
    - [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested

See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
2024-10-15 02:41:43 +09:00
Michael Marjieh
b5600c6f85
[TargetLowering][SelectionDAG] Exploit nneg Flag in UINT_TO_FP (#108931)
1. Propagate the nneg flag in WidenVecRes
2. Use SINT_TO_FP in expandUINT_TO_FP when possible.
2024-10-14 20:55:48 +04:00
Akshat Oke
cd6c2b80be
[NewPM][CodeGen] Port StackColoring to NPM (#111812) 2024-10-14 19:23:34 +05:30
c8ef
a3b0c31ebc
Revert "[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes." (#112200)
Reverts llvm/llvm-project#111774

This appears to be causing some tests to fail.
2024-10-14 21:43:49 +08:00
c8ef
11f625cb87
[DAG] Enhance SDPatternMatch to match integer minimum and maximum patterns in addition to the existing ISD nodes. (#111774)
Closes #108218.

This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.
2024-10-14 21:19:34 +08:00
David Green
828d72b263
[GlobalISel] Add an assert for the DemandedElts APInt size. (#112150)
Similar to the other implementations in DAG/ValueTracking, this adds an
assert that the size of the DemandedElts is what we expect it to be -
the size of a fixed length vector or APInt(1,1) otherwise. The
G_BUILDVECTOR is fixed as it was passing an original DemandedElts for
the scalar operands.
2024-10-14 09:59:26 +01:00
Akshat Oke
3dba5d8584
[MIR] Add missing noteNewVirtualRegister callbacks (#111634)
The delegates' callback isn't invoked on parsing new virtual registers.

There are two places in the serialization where new virtual registers can be discovered: in register infos and in instructions.
2024-10-14 14:29:09 +05:30
Akshat Oke
dbfca24b99
[MIR] Serialize virtual register flags (#110228)
[MIR] Serialize virtual register flags

This introduces target-specific vreg flag serialization. Flags are represented as `uint8_t` and the `TargetRegisterInfo` override provides methods `getVRegFlagValue` to deserialize and `getVRegFlagsOfReg` to serialize.
2024-10-14 14:19:53 +05:30
duk
464a7ee79e
[CodeGen] Generalize trap emission after SP check fail (#109744)
Generalize and improve some target-specific code that emits traps after
stack protector failure in SelectionDAG & GlobalIsel.
2024-10-12 20:01:22 -04:00
Kazu Hirata
a62768c427
[CodeGen] Simplify code with *Map::operator[] (NFC) (#112075) 2024-10-11 23:01:21 -07:00
Tex Riddell
82b40fd4fd
Fix scalar overload name constructed by ReplaceWithVeclib.cpp (#111095)
ReplaceWithVeclib.cpp would construct overload name using all the
arguments in the intrinsic, but overloads should only be constructed
from arguments for which isVectorIntrinsicWithOverloadTypeAtArg returns
true, including the return type first (index -1).

Additionally,
- skip when `Intrinsic::not_intrinsic`, otherwise
`isVectorIntrinsicWithOverloadTypeAtArg` asserts for some
IntrinsicCalls.

Unblocks translation for pow and atan2 intrinsics.

Fixes #111093
2024-10-11 14:38:35 -07:00
Ellis Hoag
adaa603224
[MachineVerifier] Report errors from one thread at a time (#111605)
Create the `ReportedErrors` class to track the number of reported errors
during verification. The class will block reporting errors if some other
thread is currently reporting an error.

I've encountered a case where there were many different verifications
reporting errors at the same time on different threads. This ensures
that we don't start printing the error from one case until we are
completely done printing errors from other cases. Most of the time
`AbortOnError = true` so we usually abort after reporting the first
error.

Depends on https://github.com/llvm/llvm-project/pull/111602.
2024-10-11 11:53:44 -07:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Yingwei Zheng
ec3e0a5900
Revert "[CodeGenPrepare] Convert ctpop(X) ==/!= 1 into ctpop(X) u</u> 2/1" (#111932)
Reverts llvm/llvm-project#111284 to fix clang stage2 builds.
Investigating...

Failed buildbots:
https://lab.llvm.org/buildbot/#/builders/76/builds/3576
https://lab.llvm.org/buildbot/#/builders/168/builds/4308
https://lab.llvm.org/buildbot/#/builders/127/builds/1087
2024-10-11 11:08:07 +08:00
Yingwei Zheng
e3894f58e1
[CodeGenPrepare] Convert ctpop(X) ==/!= 1 into ctpop(X) u</u> 2/1 (#111284)
Some targets have better codegen for `ctpop(X) u< 2` than `ctpop(X) ==
1`. After https://github.com/llvm/llvm-project/pull/100899, we set the
range of ctpop's return value to indicate the argument/result is
non-zero.

This patch converts `ctpop(X) ==/!= 1` into `ctpop(X) u</u> 2/1` in CGP
to fix https://github.com/llvm/llvm-project/issues/95255.
2024-10-11 09:08:38 +08:00
Ellis Hoag
cb5fbd2f60
[CodeLayout] Do not verify after assigning blocks (#111754)
Rather than invariantly running `F->verify()` when asserts are enabled,
run machine IR verification in LIT tests only.

Swap `CHECK-PERF` and `CHECK-SIZE` in `code_placement_ext_tsp_large.ll`.

Remove `={0,1,true,false}` from flags in tests.
2024-10-10 09:01:50 -07:00
Vladimir Radosavljevic
dabb0ddbd7
[MCP] Skip invalidating def constant regs during forward propagation (#111129)
Before this patch, redundant COPY couldn't be removed for the following
case:
```
  %reg1 = COPY %const-reg
  ... // There is a def of %const-reg
  %reg2 = COPY killed %reg1
```
where this can be optimized to:
```
  ... // There is a def of %const-reg
  %reg2 = COPY %const-reg
```

This patch allows for such optimization by not invalidating defined
constant registers. This is safe, as architectures like AArch64 and
RISCV replace a dead definition of a GPR with a zero constant register
for certain instructions.
2024-10-10 18:05:42 +04:00
Oliver Stannard
1e49670b31
[DAGISel] Keep flags when converting FP load/store to integer (#111679)
This DAG combine replaces a floating-point load/store pair which has no
other uses with an integer one, but did not copy the memory operand
flags to the new instructions, resulting in it dropping the volatile
flag. This optimisation is still valid if one or both of the
instructions is volatile, so we can copy over the whole
MachineMemOperand to generate volatile integer loads and stores where
needed.
2024-10-10 09:17:50 +01:00
YunQiang Su
8d35ab80fc
AArch64: Add FMINNUM_IEEE and FMAXNUM_IEEE support (#107855)
FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use
them to canonicalize a floating point number. And
FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding
FMINIMUMNUM/FMAXIMUMNUM, so let's define them.

Update combine_andor_with_cmps.ll.
Add fp-maximumnum-minimumnum.ll, with nnan testcases only.

V1F64 is not supported yet.
If we set v1f64 as legal, FMINNUM/FMAXNUM will have some problem:
   both of them use `if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT))`.

AArch64 depends on `expandFMINNUM_FMAXNUM` returning `SDValue()`
for FMAXNUM and FMINNUM.

We should fix this problem, while it will be in future patch.
2024-10-10 15:09:47 +08:00
YunQiang Su
d52c8408ff
SelectionDAG/expandFMINNUM_FMAXNUM: skips vector if SETCC/VSELECT is not legal (#109570)
If SETCC or VSELECT is not legal for vector, we should not expand it,
instead we can split the vectors.

So that, some simple scale instructions can be emitted instead of
some pairs of comparation+selection.
2024-10-10 08:39:25 +08:00
Jeffrey Byrnes
853c43d04a
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
2024-10-09 14:30:09 -07:00
Ellis Hoag
d905a3c51b
[NFC] Format MachineVerifier.cpp to remove extra indentation (#111602)
Many structs in this class have the wrong indentation. To generate this
diff, I touched the first line of each struct and then ran `git
clang-format`. This will make blaming more difficult, but this
autoformatting is difficult to avoid triggering. I think it's best to
push this as one NFC PR.
2024-10-09 08:26:30 -07:00
Matt Arsenault
ced15cd418
DAG: Preserve more flags when expanding gep (#110815)
This allows selecting the addressing mode for stack instructions
in cases where we need to prove the sign bit is zero.
2024-10-09 13:51:52 +04:00
William G Hatch
181840459d
[LiveDebugValues][NVPTX]VarLocBasedImpl handle vregs, enable for NVPTX (#111456)
This patch handles virtual registers in the VarLocBasedImpl of the
LiveDebugVariables pass, allowing it to be used on architectures that
depend on virtual registers in debugging, like NVPTX. It enables the
pass for NVPTX.
2024-10-08 19:38:47 -06:00
Simon Pilgrim
1dcb6dc757 [DAG] foldVSelectToSignBitSplatMask - pull out repeated code and use getShiftAmountConstant helper.
We're assuming shift amount type matches the result type - which is true for vectors, but I'm hoping to generalize this fold in the future.
2024-10-08 17:36:34 +01:00
Juan Manuel Martinez Caamaño
327124ece7
[NFC][EarlyIfConverter] Rename SSAIfConv::runOnMachineFunction to SSAIfConv::init (#111500) 2024-10-08 11:11:23 +02:00
Ralf Jung
29ec0716a8
Fix comment typo in ExpandFCOPYSIGN (#111489)
I noticed this while following
https://github.com/llvm/llvm-project/pull/111269. It makes little sense
that FCOPYSIGN would look at the sign of `x`, right? Surely this must be
`y`. Also fix the inconsistency where it's sometimes `x` and sometimes
`X`.
2024-10-08 12:47:56 +04:00
Juan Manuel Martinez Caamaño
1df8ccd35b
Revert "[NFC][EarlyIfConverter] Turn SSAIfConv into a local variable (#107390)" (#111385)
This reverts commit 09a4c23eb410d4be52202bed21c967a3653c3544.
2024-10-08 10:27:22 +02:00
Juan Manuel Martinez Caamaño
d5ec01b0dd
Revert "[NFC][EarlyIfConverter] Replace boolean Predicate for a class (#108519)" (#111372)
This reverts commit 9e7315912656628b606e884e39cdeb261b476f16.
2024-10-07 16:03:11 +02:00
Juan Manuel Martinez Caamaño
a018353f4b Revert "[NFC][EarlyIfConverter] Remove unused member variables"
This reverts commit 3c83102f0615c7d66f6df698ca472ddbf0e9483d.
2024-10-07 14:37:37 +02:00
Paul Walker
02dd6b1014
[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803)
Specifically:
  fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm,
  fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt &
  ftrunc
2024-10-07 13:01:59 +01:00
Luke Lau
c98e41f858
[LegalizeVectorTypes] Always widen fabs (#111298)
fabs and fneg are similar nodes in that they can always be expanded to
integer ops, but currently they diverge when widened.

If the widened vector fabs is marked as expand (and the corresponding
scalar type is too), LegalizeVectorTypes thinks that it may be turned
into a libcall and so will unroll it to avoid the overhead on the undef
elements.

However unlike the other ops in that list like fsin, fround, flog etc.,
an fabs marked as expand will never be legalized into a libcall. Like
fneg, it can always be expanded into an integer op.

This moves it below unrollExpandedOp to bring it in line with fneg,
which fixes an issue on RISC-V with f16 fabs being unexpectedly
scalarized when there's no zfhmin.
2024-10-07 17:40:32 +08:00
Luke Lau
18d3a5d558
[LegalizeVectorTypes] When widening don't check for libcalls if promoted (#111297)
When widening some FP ops, LegalizeVectorTypes will check to see if the
widened op may be scalarized and then turned into a bunch of libcalls,
and if so unroll early to avoid unnecessary libcalls of the padded undef
elements.

It checks if the widened op is legal or custom to see if it will be
scalarized, but promoted ops will also avoid scalarization.

This relaxes the check to account for this which fixes some illegal
vector types on RISC-V from being scalarized when they could be widened.
2024-10-07 16:42:36 +08:00
Kazu Hirata
bea28037f6
[CodeGen] Avoid repeated hash lookups (NFC) (#111274) 2024-10-06 09:21:25 -07:00
Craig Topper
20e37f03c6
[GISel] Don't preserve NSW flag when converting G_MUL of INT_MIN to G_SHL. (#111230)
mul and shl have different meanings for the nsw flag. We need to drop it
when converting a multiply by the minimum negative value.
2024-10-05 10:27:09 -07:00
Daniel Hoekwater
8305e9fc09
Revert "[CFIFixup] Factor CFI remember/restore insertion into a helper (NFC)" (#111168)
Reverts llvm/llvm-project#111066

This seems to be breaking some builds:
- https://lab.llvm.org/buildbot/#/builders/51/builds/4732
- https://lab.llvm.org/buildbot/#/builders/41/builds/2534
- https://lab.llvm.org/buildbot/#/builders/73/builds/6601
2024-10-04 10:34:03 -04:00
Daniel Hoekwater
47c8b95dae
[CFIFixup] Factor CFI remember/restore insertion into a helper (NFC) (#111066)
Inserting a remember/restore pair is a very clean abstraction, so we can
split the logic out into a helper function. Additionally, cleaning this up
will make it easier to add logic for handling functions that are split across
multiple sections.
2024-10-04 09:31:22 -04:00
Stephen Tozer
b01be72af0 [NFC][CodeGen] Remove unused HasFakeUses MachineFunctionProperty
A previous commit d826b0c9 accidentally added a new MachineFunctionProperty,
HasFakeUses, that was unused by the commit (and results in an
uncovered-switch warning, which was fixed by a separate followup 1811e872);
this patch removes that enum value.
2024-10-04 13:58:29 +01:00
Jie Fu
1811e87204 [CodeGen] Fix enumeration value 'HasFakeUses' not handled in switch (NFC)
llvm-project/llvm/lib/CodeGen/MachineFunction.cpp:95:10:
error: enumeration value 'HasFakeUses' not handled in switch [-Werror,-Wswitch]
  switch(Prop) {
         ^~~~
1 error generated.
2024-10-04 20:27:40 +08:00
Stephen Tozer
d826b0c90f
[LLVM] Add HasFakeUses to MachineFunction (#110097)
Following the addition of the llvm.fake.use intrinsic and corresponding
MIR instruction, two further changes are planned: to add an
-fextend-lifetimes flag to Clang that emits these intrinsics, and to
have -Og enable this flag by default. Currently, some logic for handling
fake uses is gated by the optdebug attribute, which is intended to be
switched on by -fextend-lifetimes (and by extension -Og later on).
However, the decision was made that a general optdebug attribute should
be incompatible with other opt_ attributes (e.g. optsize, optnone),
since they all express different intents for how to optimize the
program. We would still like to allow -fextend-lifetimes with optsize
however (i.e. -Os -fextend-lifetimes should be legal), since it may be a
useful configuration and there is no technical reason to not allow it.

This patch resolves this by tracking MachineFunctions that have fake
uses, allowing us to run passes that interact with them and skip passes
that clash with them.
2024-10-04 13:13:30 +01:00
William G Hatch
ae635d6f99
[NVPTX] add support for .debug_loc section (#110905)
Enable .debug_loc section for NVPTX backend.

This commit makes NVPTX omit DW_AT_low_pc (and DW_AT_high_pc) for
DW_TAG_compile_unit. This is because cuda-gdb uses the compile unit's
low_pc as a base address, and adds the addresses in the debug_loc
section to it. Removing low_pc is equivalent to setting that base
address to zero, so addition doesn't break the location ranges.
Additionally, this patch forces debug_loc label emission to emit single
labels with no subtraction or base. This would not be necessary if we
could emit `label1 - label2` expressions in PTX. The PTX documentation
at
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#debugging-directives-section
makes it seem like this is supported, but it doesn't actually work. I
believe when that documentation says that you can subtract “label
addresses between labels in the same dwarf section”, it doesn't merely
mean that the labels need to be in the same section as each other, but
in fact they need to be in the same section as the use. If support for
label subtraction is supported such that in the debug_loc section you
can subtract labels from the main code section, then we can remove the
workarounds added in this PR.

Also, since this now emits valid .debug_loc sections, it replaces the
empty .debug_loc to force existence of at least one debug section with
an empty .debug_macinfo section, which matches what nvcc does.
2024-10-03 16:31:49 -06:00
Luke Lau
487686b82e
[SDAG][RISCV] Don't promote VP_REDUCE_{FADD,FMUL} (#111000)
In https://reviews.llvm.org/D153848, promotion was added for a variety
of f16 ops with zvfhmin, including VP reductions.

However I don't believe it's correct to promote f16 fadd or fmul
reductions to f32 since we need to round the intermediate results.

Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two
different results depending on whether we compiled with +zvfh or
+zvfhmin, for example with a 3 element reduction:

	; v9 = [0.1563, 5.97e-8, 0.00006104]

	; zvfh
	vsetivli x0, 3, e16, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v9, v8
	vfmv.f.s fa0, v8
	; fa0 = 0.1563

	; zvfhmin
	vsetivli x0, 3, e16, m1, ta, ma
	vfwcvt.f.f.v v10, v9
	vsetivli x0, 3, e32, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v10, v8
	vfmv.f.s fa0, v8
	fcvt.h.s fa0, fa0
	; fa0 = 0.1564

This same thing happens with reassociative reductions e.g. vfredusum.vs,
and this also applies for bf16.

I couldn't find anything in the LangRef for reductions that suggest the
excess precision is allowed. There may be something we can do in Clang
with -fexcess-precision=fast, but I haven't looked into this yet.

I presume the same precision issue occurs with fmul, but not with
fmin/fmax/fminimum/fmaximum.

I can't think of another way of lowering these other than scalarizing,
and we can't scalarize scalable vectors, so this just removes the
promotion and adjusts the cost model to return an invalid cost. (It
looks like we also don't currently cost fmul reductions, so presumably
they also have an invalid cost?)

I think this should be enough to stop the loop vectorizer or SLP from
emitting these intrinsics.
2024-10-04 00:17:45 +08:00
Mehdi Amini
6c7a3f80e7
Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110938)
This macros is always defined: either 0 or 1. The correct pattern is to
use #if.

Re-apply #110185 with more fixes for debug build with the ABI breaking
checks disabled.
2024-10-03 01:24:14 +02:00
Christopher Di Bella
45ad1ac4a3
Revert "Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if inst… (#110923)
…ead of #ifdef (#110883)"

This reverts commit 1905cdbf4ef15565504036c52725cb0622ee64ef, which
causes lots of failures where LLVM doesn't have the right header guards.
The errors can be seen on
[BuildKite](https://buildkite.com/llvm-project/upstream-bazel/builds/112362#01924eae-231c-4d06-ba87-2c538cf40e04),
where the source uses `#ifndef NDEBUG`, but the content in question is
defined when `LLVM_ENABLE_ABI_BREAKING_CHECKS == 1`.

For example, `llvm/include/llvm/Support/GenericDomTreeConstruction.h`
has the following:

```cpp
// Helper struct used during edge insertions.
struct InsertionInfo {
  // ...
#ifdef LLVM_ENABLE_ABI_BREAKING_CHECKS
  SmallVector<TreeNodePtr, 8> VisitedUnaffected;
#endif
};

// ...

InsertionInfo II;
// ...
#ifndef NDEBUG
            II.VisitedUnaffected.push_back(SuccTN);
#endif
```
2024-10-02 13:54:09 -07:00
Francis Visoiu Mistrih
916e6ad7d0
[CodeGen] Fix InstructionCount remarks for MI bundles (#107621)
For MI bundles, the instruction count remark doesn't count the
instructions inside the bundle.
2024-10-02 13:37:25 -07:00
spupyrev
9016f27c42
[CodeLayout] Size-aware machine block placement (#109711)
This is an implementation of a new "size-aware" machine block placement.
The
idea is to reorder blocks so that the number of fall-through jumps is
maximized.
Observe that profile data is ignored for the optimization, and it is
applied only
for instances with hasOptSize()=true.
This strategy has two benefits:
(i) it eliminates jump instructions, which results in smaller text size;
(ii) we avoid using profile data while reordering blocks, which yields
more
"uniform" functions, thus helping ICF and machine outliner/merger.

For large (mobile) apps, the size benefits of (i) and (ii) are roughly
the same,
combined providing up to 0.5% uncompressed and up to 1% compressed
savings size
on top of the current solution.

The optimization is turned off by default.
2024-10-02 10:48:08 -07:00
Mehdi Amini
1905cdbf4e
Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110883)
This macros is always defined: either 0 or 1. The correct pattern is to
use #if.

Reapply https://github.com/llvm/llvm-project/pull/110185 with fixes.
2024-10-02 18:43:16 +02:00
Matt Arsenault
187dcd8e22
DAG: Preserve disjoint flag when emitting final instructions (#110795) 2024-10-02 19:37:04 +04:00
Bevin Hansson
1a65d95d00
[CodeGen][RAGreedy] Inform LiveDebugVariables about snippets spilled by InlineSpiller. (#109962)
RAGreedy invokes InlineSpiller to spill a particular virtreg inline.
When the spiller does this, it also identifies small, adjacent liveranges called
snippets. These are also spilled or rematerialized in the process.

However, the spiller does not inform RA that it has spilled these regs.
This means that debug variable locations referencing these regs/ranges
are lost.

Mark any spilled regs which do not have a stack slot assigned to them as
allocated to the slot being spilled to to tell LDV that those regs are
located in that slot, even though the regs might no longer exist in the
program after regalloc is finished. Also, inform RA about all of the
regs which were replaced (spilled or rematted), not just the one that was
requested so that it can properly manage the ranges of the debug vars.
2024-10-02 10:29:56 +02:00
Michael Maitland
f957d080e9
[RISCV][GISEL] Legalize G_EXTRACT_SUBVECTOR (#109426)
This is heavily based on the SelectionDAG lowerEXTRACT_SUBVECTOR code.
2024-10-01 14:08:49 -04:00