54 Commits

Author SHA1 Message Date
Roman Lebedev
0aef747b84
[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated
The motivation is that the update script has at least two deviations
(`<...>@GOT`/`<...>@PLT`/ and not hiding pointer arithmetics) from
what pretty much all the checklines were generated with,
and most of the tests are still not updated, so each time one of the
non-up-to-date tests is updated to see the effect of the code change,
there is a lot of noise. Instead of having to deal with that each
time, let's just deal with everything at once.

This has been done via:
```
cd llvm-project/llvm/test/CodeGen/X86
grep -rl "; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py" | xargs -L1 <...>/llvm-project/llvm/utils/update_llc_test_checks.py --llc-binary <...>/llvm-project/build/bin/llc
```

Not all tests were regenerated, however.
2021-06-11 23:57:02 +03:00
Jim Lin
4456805938 [X86] Don't fold (fneg (fma (fneg X), Y, (fneg Z))) to (fma X, Y, Z)
Check if it has no signed zeros flag (nsz) in getNegatedExpression for x86.
This patch fixed miscompilation: https://alive2.llvm.org/ce/z/XxwBAJ

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D90901
2021-05-21 23:02:19 +08:00
Wang, Pengfei
c22dc71b12 [CodeGen][X86] Remove unused trivial check-prefixes from all CodeGen/X86 directory.
I had manually removed unused prefixes from CodeGen/X86 directory for more than 100 tests.
I checked the change history for each of them at the beginning, and then I mainly focused on the format since I found all of the unused prefixes were result from either insensible copy or residuum after functional update.
I think it's OK to remove the remaining X86 tests by script now. I wrote a rough script which works for me in most tests. I put it in llvm/utils temporarily for review and hope it may help other components owners.
The tests in this patch are all generated by the tool and checked by update tool for the autogenerated tests. I skimmed them and checked about 30 tests and didn't find any unexpected changes.

Reviewed By: mtrofin, MaskRay

Differential Revision: https://reviews.llvm.org/D91496
2020-11-16 09:45:55 +08:00
Sanjay Patel
39009a8245 [DAGCombiner] tighten fast-math constraints for fma fold
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)

This is only allowed when "reassoc" is present on the fadd.

As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.

I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.

Differential Revision: https://reviews.llvm.org/D82499
2020-07-12 08:51:49 -04:00
Sanjay Patel
26fd3ffa78 [x86][AArch64] add tests for fmul-fma combine; NFC
As discussed in D80801, there's a possible overstep in
what is allowed by the 'contract' fast-math-flag.
2020-06-24 15:56:32 -04:00
Sanjay Patel
702cf93356 [DAGCombiner] allow more folding of fadd + fmul into fma
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))

The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.

This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.

'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.

Differential Revision: https://reviews.llvm.org/D80801
2020-06-09 10:41:27 -04:00
Sanjay Patel
912502e8ef [AArch64][x86] add tests for FMA combines; NFC 2020-05-29 08:58:37 -04:00
Simon Pilgrim
4aa7b9cc96 [X86] X86InstComments - add FMA4 comments
These typically match the FMA3 equivalents, although the multiply operands sometimes get flipped due to the FMA3 permute variants.
2020-02-08 17:02:00 +00:00
Simon Pilgrim
c96001035d [X86] isNegatibleForFree - allow pre-legalized FMA negation
As long as the FMA operation is legal (which we can proxy for the FMA3/FMA4 variants as well), we don't have to wait for the LegalOperations stage.
2020-02-07 17:04:17 +00:00
Cameron McInally
5d9271802b Revert "[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns.ll"
This reverts commit 06de52674da73f30751f3ff19fdf457f87077c65.

llvm-svn: 363314
2019-06-13 19:25:06 +00:00
Simon Pilgrim
287e78c82b [DAGCombine] GetNegatedExpression - constant float vector support (PR42105)
Add support for negation of constant build vectors.

Differential Revision: https://reviews.llvm.org/D62963

llvm-svn: 363040
2019-06-11 09:44:33 +00:00
Cameron McInally
06de52674d [NFC][CodeGen] Add unary fneg tests to X86/fma_patterns.ll
llvm-svn: 362730
2019-06-06 18:41:18 +00:00
Craig Topper
aa5eb2fbaa [X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers.
When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction.

Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.

llvm-svn: 345488
2018-10-29 04:52:04 +00:00
Sanjay Patel
9e7e0fd828 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344528
2018-10-15 15:56:39 +00:00
Sanjay Patel
475a53649e [x86] add tests for fma with undef elts; NFC
llvm-svn: 344527
2018-10-15 15:47:37 +00:00
Sanjay Patel
4e970ff022 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344525
2018-10-15 15:38:38 +00:00
Sanjay Patel
b06ac18ee9 [x86] add tests for fma with undef elts; NFC
llvm-svn: 344523
2018-10-15 15:28:44 +00:00
Simon Pilgrim
ad23f270db [X86] Standardize floating point assembly comments
Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well.

Differential Revision: https://reviews.llvm.org/D52702

llvm-svn: 343562
2018-10-02 09:08:51 +00:00
Simon Pilgrim
43e4e648ef [X86] Regenerate fma comments.
llvm-svn: 343376
2018-09-29 14:31:00 +00:00
Francis Visoiu Mistrih
25528d6de7 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Dinar Temirbulatov
a0beedef1c [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35965

llvm-svn: 309926
2017-08-03 08:50:18 +00:00
Dinar Temirbulatov
aead31a36f [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35839

llvm-svn: 309298
2017-07-27 17:47:01 +00:00
Simon Pilgrim
ddf407dec9 [X86][FMA] Regenerate test with broadcast comments.
llvm-svn: 309093
2017-07-26 10:20:49 +00:00
Craig Topper
2caa97c891 [AVX-512] Fix the execution domain for scalar FMA instructions.
llvm-svn: 296271
2017-02-25 19:36:28 +00:00
Nicolai Haehnle
3c67a08d1b X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math
This re-adds checks for the patterns that were disabled with r288506.

Reviewers: spatel, delena, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27346

llvm-svn: 289049
2016-12-08 14:08:08 +00:00
Nicolai Haehnle
33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
Craig Topper
713085e60a [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered.
This allows broadcast loads to used when available.

llvm-svn: 279958
2016-08-29 04:49:31 +00:00
Craig Topper
c48c029610 [AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types.
llvm-svn: 277327
2016-08-01 07:55:33 +00:00
Craig Topper
ce415ff9c5 [AVX512] Add load folding support for the unmasked forms of the FMA instructions.
llvm-svn: 276615
2016-07-25 07:20:35 +00:00
Craig Topper
b6519db90d [AVX512] Implement commuting support for EVEX encoded FMA3 instructions.
llvm-svn: 276521
2016-07-23 07:16:56 +00:00
Craig Topper
5c913e84df [AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported.
Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this a good first step.

llvm-svn: 275763
2016-07-18 06:14:34 +00:00
Craig Topper
e5ce84a33c [AVX512] Add VLX 128/256-bit SET0 operations that encode to 128/256-bit EVEX encoded VPXORD so all 32 registers can be used.
llvm-svn: 268884
2016-05-08 21:33:53 +00:00
Vyacheslav Klochkov
a3cd08b05c X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes.
Reviewer: Simon Pilgrim.
Differential Revision: http://reviews.llvm.org/D15317

llvm-svn: 255080
2015-12-09 00:12:13 +00:00
Simon Pilgrim
5a64d98303 [X86][FMA4] Explicitly set the domain of FMA4 float/double scalar instructions
Both were defaulting to the float domain - now matches the packed instructions.

llvm-svn: 254841
2015-12-05 07:07:42 +00:00
Simon Pilgrim
3fc3454a0c [X86][FMA] Optimize FNEG(FMUL) Patterns
On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using a FNMSUB (-(X*Y)-0)

Fix for PR24366

Differential Revision: http://reviews.llvm.org/D14909

llvm-svn: 254495
2015-12-02 09:07:55 +00:00
Simon Pilgrim
db26b3ddfa [X86][FMA4] Prefer FMA4 to FMA
We currently output FMA instructions on targets which support both FMA4 + FMA (i.e. later Bulldozer CPUS bdver2/bdver3/bdver4).

This patch flips this so FMA4 is preferred; this is for several reasons:

1 - FMA4 is non-destructive reducing the need for mov instructions.
2 - Its more straighforward to commute and fold inputs (although the recent work on FMA has reduced this difference).
3 - All supported targets have FMA4 performance equal or better to FMA - Piledriver (bdver2) in particular has half the throughput when executing FMA instructions.

Its looks like no future AMD processor lines will support FMA4 after the Bulldozer series so we're not causing problems for later CPUs.

Differential Revision: http://reviews.llvm.org/D14997

llvm-svn: 254339
2015-11-30 22:22:06 +00:00
Simon Pilgrim
82f663d755 [X86][FMA] More thorough FMA tests
Added FMADD/FMSUB/FNMADD/FNMSUB tests for all types

Added load folding tests for 512-bit vectors

NOTE: Many of the AVX512 FMA instructions don't yet commute/fold correctly

As discussed on D14909

llvm-svn: 254232
2015-11-28 14:28:44 +00:00
Simon Pilgrim
1d881ae225 [X86][FMA] Begun adding AVX512 FMA tests
As discussed on D14909

llvm-svn: 254180
2015-11-26 20:53:28 +00:00
Simon Pilgrim
1b4fecb098 [X86][FMA] Optimize FNEG(FMA) Patterns
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.

Fix for PR24364

Differential Revision: http://reviews.llvm.org/D14906

llvm-svn: 254016
2015-11-24 20:31:46 +00:00
Simon Pilgrim
806c42a747 [X86][FMA] Regenerate tests.
Fixes some broken checks.

llvm-svn: 253830
2015-11-22 19:05:53 +00:00
Andrew Kaylor
4731bea3e5 Improved the operands commute transformation for X86-FMA3 instructions.
All 3 operands of FMA3 instructions are commutable now.

Patch by Slava Klochkov

Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab).

Differential Revision: http://reviews.llvm.org/D13269

llvm-svn: 252335
2015-11-06 19:47:25 +00:00
Simon Pilgrim
d45c88bbb5 [DAGCombiner] Improved FMA combine support for vectors
Enabled constant canonicalization for all constants.

Improved combining of constant vectors.

llvm-svn: 249993
2015-10-11 19:48:12 +00:00
Simon Pilgrim
4003ed2da3 [DAGCombiner] Improve FMA support for interpolation patterns
This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents.

This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t))))

Differential Revision: http://reviews.llvm.org/D13003

llvm-svn: 248210
2015-09-21 20:32:48 +00:00
Simon Pilgrim
779bcf3e3d [X86][FMA] Refreshed fma tests
llvm-svn: 247508
2015-09-12 15:33:05 +00:00
David Blaikie
a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
Craig Topper
12f0d9ef2c Improve logic that decides if its profitable to commute when some of the virtual registers involved have uses/defs chains connecting them to physical register. Fix up the tests that this change improves.
llvm-svn: 221336
2014-11-05 06:43:02 +00:00
Stephen Lin
98cbca2e4d Disambiguate function names in some CodeGen tests. (Some tests were using function names that also were names of instructions and/or doing other unusual things that were making the test not amenable to otherwise scriptable pattern matching.) No functionality change.
llvm-svn: 186621
2013-07-18 22:29:15 +00:00
Craig Topper
908e685102 Mark FMA4 instructions as commutable and add them to the folding tables.
llvm-svn: 163035
2012-08-31 23:10:34 +00:00
Craig Topper
c0387f6b23 Mark FMA3 instructions as commutable so that the operands to the multiply part can be commuted.
llvm-svn: 163001
2012-08-31 16:31:13 +00:00
Craig Topper
c30fdbc46c Add support for converting llvm.fma to fma4 instructions.
llvm-svn: 162999
2012-08-31 15:40:30 +00:00