Cover more cases in preparation for making greater use
of fcmp based lowerings. Also add more tests for the inverted
cases. Test iszero | isnan test masks. We should probably just
generate every combination of test masks.
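
Enumerating every test mask is mechanical. The sketch below assumes the masks in question are the 10-bit class masks of `llvm.is.fpclass`, with the bit layout given in the LLVM LangRef. The helper names (`mask_of`, `inverted`, `all_masks`) are hypothetical and not part of this patch.

```python
# Assumed bit layout of the llvm.is.fpclass test mask (per the LLVM LangRef).
FPCLASS_BITS = {
    "snan": 1 << 0,
    "qnan": 1 << 1,
    "ninf": 1 << 2,
    "nnormal": 1 << 3,
    "nsubnormal": 1 << 4,
    "nzero": 1 << 5,
    "pzero": 1 << 6,
    "psubnormal": 1 << 7,
    "pnormal": 1 << 8,
    "pinf": 1 << 9,
}
ALL_BITS = (1 << len(FPCLASS_BITS)) - 1  # 0x3ff


def mask_of(*names):
    """OR together the named class bits (hypothetical helper)."""
    mask = 0
    for name in names:
        mask |= FPCLASS_BITS[name]
    return mask


def inverted(mask):
    """Return the mask that tests the complementary set of classes."""
    return ALL_BITS & ~mask


def all_masks():
    """Every possible test mask, from 0 (no class) to 0x3ff (any class)."""
    return range(ALL_BITS + 1)


# The "iszero | isnan" case mentioned above, and its inverted counterpart.
iszero_or_isnan = mask_of("nzero", "pzero", "snan", "qnan")
not_zero_or_nan = inverted(iszero_or_isnan)
```

Iterating over `all_masks()` would give one test per mask, which covers the inverted cases automatically.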
This builtin will be converted to the llvm.set.rounding intrinsic
at the IR level and should work with "#pragma STDC FENV_ACCESS ON",
since it changes the default FP environment. Users can change the rounding
mode via this builtin without introducing a libc dependency.
Reviewed by: andrew.w.kaylor, rjmccall, sepavloff, aaron.ballman
Differential Revision: https://reviews.llvm.org/D145765
Signed-off-by: jinge90 <ge.jin@intel.com>
With opaque pointers, all function pointer types are the same, meaning there should be no bitcasts.
Internal benchmarks with SampleFDO look neutral.
This was added in D36333.
Reviewed By: tejohnson, davidxl
Differential Revision: https://reviews.llvm.org/D146099
When the affine.parallel op was introduced, affine utilities weren't
extended to handle it. Extending these is straightforward and natural
given that addAffineParallelOpDomain has also been added.
Update/complete memref region compute to account for affine.parallel
ops. Handle failure cleanly.
Add and expose utilities missing for affine.parallel to be consistent
with affine.for.
All of these allow various affine passes to work with a combination of
affine.parallel and affine.for ops.
Differential Revision: https://reviews.llvm.org/D145669
The two methods don't belong in BinaryFunction.
Move the dispatch tables into target-specific MCPlusBuilder methods.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D131813
Pre-calculate the register size table in MCPlusBuilder constructor,
similar to `AliasMap`/`SmallerAliasMap` in `initAliases`.
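
As a rough illustration of the pattern only (not BOLT's actual code), the sketch below builds a size table once in a constructor so that later queries are plain indexed lookups. `compute_reg_size` and `Builder` are hypothetical names, and the sizes are made-up sample data.

```python
def compute_reg_size(reg):
    """Hypothetical per-register size query in bytes (sample data only)."""
    return {0: 8, 1: 4, 2: 2, 3: 1}[reg]


class Builder:
    """Toy stand-in for a builder object; not BOLT's MCPlusBuilder."""

    def __init__(self, num_regs):
        # Pay the per-register computation once, at construction time.
        self._reg_sizes = [compute_reg_size(r) for r in range(num_regs)]

    def get_reg_size(self, reg):
        # Later queries are simple table lookups.
        return self._reg_sizes[reg]
```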
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D145828
This allows building the compiler builtins library for the Armv8-M
Baseline architecture. It can be built in the same way as other
baremetal targets using the appropriate '--target' flag
(e.g. --target=armv8m.base-eabi).
NOTE: As with the other Cortex-M targets, only the builtins library is
supported. There is no support for sanitizers, etc.
The armv8m.base architecture is a superset of armv6m, so adding it to
the cmake files using thumb1_SOURCES is almost enough for it to compile.
Minor changes are needed to divsi3 and udivsi3, because armv8m.base does
have support for div instructions but not mov with an immediate operand.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D143297
Adds perf event listeners when RTDyldObjectLinkingLayer is used in -jit-kind=orc
mode.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D126214
This patch handles strict-fp operations on fixed types using new ISD nodes
with a RISCVISD::STRICT_ prefix.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145900
For now, I have introduced `llvm::tmp::getValueType(Rec)` as a copy from
`CodeGenTarget.cpp`. This will be removed in the near future, once
IntrinsicEmitter no longer depends on MVT.
Differential Revision: https://reviews.llvm.org/D143844
https://reviews.llvm.org/D146014 removed the dependency on errno from
several targets and added it to the `libc_test` macro. However,
strtol_test_helper is not a `libc_test` but a `cc_library` so it's
missing a dependency.
This is a clean up before fixing issues identified in this pass by
https://github.com/llvm/llvm-project/issues/61380 and similar issues.
- Move patterns definitions closer to declarations.
- Simplify pattern definitions.
- Drop the hand-written pass constructor in favor of an auto-generated one.
- Fix typos in pass description.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D146077
D132262 tried to simplify `IsMetadataOrEHFrameSection` originally introduced in
D127549 but caused a regression as `.quad` directives in
```
.section .note,"a",@note; note:
.quad extern-note # extern is undefined
.section .rodata,"a",@progbits; rodata:
.quad extern-rodata # extern is undefined
.section .nonalloc,"",@progbits; nw:
.quad extern-nw
```
are incorrectly rejected: these differences may be link-time constants and
are allowed in GNU assembler and LLVM MC's non-RISC-V ports.
Relax the conditions to allow these cases. For A-B, A may be defined later, but
this requiresFixups call has to eagerly make a decision. For now, emit ADD/SUB
unless A is `.L*`. This heuristic handles many temporary label differences for
.debug_* and .apple_types sections. Ideally we should delay the decision of
PC-relative vs ADD/SUB until A is defined.
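
A minimal sketch of the decision rule described above, assuming only the name of A is inspected. The function name `use_add_sub` is hypothetical; the real check lives in LLVM MC's RISC-V backend and is not reproduced here.

```python
def use_add_sub(a_name: str) -> bool:
    """Decide eagerly whether A-B is emitted as an ADD/SUB relocation pair.

    Assumption: a name starting with ".L" marks a temporary label, which is
    the one case where ADD/SUB is not chosen.
    """
    return not a_name.startswith(".L")
```

Under this rule `use_add_sub("extern")` is true, so `extern-note` from the example is accepted, while a temporary label such as `.Ltmp0` is not given ADD/SUB.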
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D145474
Follow-up to D143226
Currently we incorrectly emit R_RISCV_ADD32/R_RISCV_SUB32.
Emit R_RISCV_PLT32 instead. The new behavior matches x86-64 and AArch64.
Or else InstCombine can incorrectly report that no change has been made.
This optimization doesn't really fit into InstCombine since it optimizes multiple instructions at once; there's likely a more comprehensive fix.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D146064
Previously, we generated function calls to compare values for sorting. It turns
out that the compiler doesn't inline those function calls. We now directly
generate inlined code. Also, modify the code for comparing values to use fewer
branches.
This improves all sort implementations in general. For arabic-2005.mtx CSR, the
improvement is around 25%.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D145442
Add conversion for integer multiplication in scf reductions in the
SCF to OpenMP dialect conversion.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D145948
This patch adds support for reductions of the max intrinsic for scalar
types. Max is lowered as a compare-select in the default lowering
flow for Flang. This pattern is matched and replaced with the
OpenMP dialect reduction operation.
Note: This is a temporary flow. The plan is to move to a flow where
the OpenMP reduction operation is inserted during lowering.
Reviewed By: do
Differential Revision: https://reviews.llvm.org/D145083
The FP16 broadcast and transpose can always use the same instructions as are
used for i16 vectors, with or without +fullfp16. This fills in some extra costs
to make sure we get them right.
Differential Revision: https://reviews.llvm.org/D146035
Remove the `-lower-global-dtors-via-cxa-atexit` escape hatch introduced
in D121736 [1], which switched the default lowering of global
destructors on MachO to use `__cxa_atexit()` to avoid emitting
deprecated `__mod_term_func` sections.
I added this flag as an escape hatch in case the switch caused any
problems. We didn't discover any problems, so now we can remove it.
[1] https://reviews.llvm.org/D121736
rdar://90277838
Differential Revision: https://reviews.llvm.org/D145715
The bundler accepts both of the following for the --target option:
hip-amdgcn-amd-amdhsa-gfx900 (no env field)
hip-amdgcn-amd-amdhsa--gfx900 (blank env field)
The environment field is defined as optional for Triples
in Triple.h. However, in this patch we update the bundler to
internally standardize to include the env field. While users
aren't required to specify an env field when listing targets on
the command line, bundles generated by the offload-bundler will
include the ABI field.
This standardization simplifies things for APIs that deal with
bundles generated by the clang-offload-bundler tool.
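
The standardization can be sketched as follows. This is an illustration, not the bundler's implementation: `normalize_target` is a hypothetical name, and the sketch assumes a target of the form `<kind>-<arch>-<vendor>-<os>[-<env>]-<processor>` in which the processor itself contains no `-`.

```python
def normalize_target(target: str) -> str:
    """Insert a blank env field when the triple has only arch-vendor-os.

    Assumption: fields are separated by '-' and the processor field
    contains no '-' of its own.
    """
    parts = target.split("-")
    if len(parts) == 5:
        # No env field present: add an empty one before the processor.
        kind, arch, vendor, os_name, processor = parts
        parts = [kind, arch, vendor, os_name, "", processor]
    elif len(parts) != 6:
        raise ValueError(f"unexpected target format: {target!r}")
    return "-".join(parts)
```

Both spellings listed above map to the same string, `hip-amdgcn-amd-amdhsa--gfx900`, so code that consumes bundles only has to handle one form.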
Differential Revision: https://reviews.llvm.org/D145770
Cost modeling for GEPs should actually be target dependent, but it is currently
done inside SLP in a target-independent way.
Sinking it into TTI enables target-dependent implementations.
This patch adds a new TTI interface and an implementation of the basic
functionality that tries to retain the existing cost modeling.
Differential Revision: https://reviews.llvm.org/D144770