Without VBMI, we are better off permuting v16i32 sub-lanes, even though its a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.)
Fixes#54658
Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns
As mentioned on PR52267.
Differential Revision: https://reviews.llvm.org/D122815
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.
Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type.
Differential Revision: https://reviews.llvm.org/D122891
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.
Differential Revision: https://reviews.llvm.org/D122891
This approach is used by AArch64/RISCV to make frame-setup/frame-destroy
instructions contiguous instead of being interleaved by CFI instructions. Code
checking `MBBI->getFlag(MachineInstr::FrameSetup) || MBBI->isCFIInstruction()`
can be simplified to just check FrameSetup.
This helps locate all CFI instructions in the prologue, which can be handy to use
.cfi_remember_state/.cfi_restore_state to decrease unwind table size (D114545).
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122541
This inverts a fold recently added to IR with:
3491f2f4b033
We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB
The motivation for the IR change was to improve matching to
'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828 ),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
Ie, when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NAN semantics
of MINSS/MAXSS.
I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.
Differential Revision: https://reviews.llvm.org/D122726
The AMX combiner would store undef or zero to stack and invoke tileload
to load the data to tile register. To avoid the store/load, we can
materialzie undef or zero value to tilezero.
Differential Revision: https://reviews.llvm.org/D122714
As mentioned on D122482, if we've generated a masked overflow test see if we can fold it to X86ISD::BT to feed a X86ISD::ADC/SBB
Differential Revision: https://reviews.llvm.org/D122572
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check to make sure the shift amount is
valid that was not in the rough draft.
I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).
Differential Revision: https://reviews.llvm.org/D122655
If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds.
We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet.
As suggested by @davezarzycki on Issue #35256
Differential Revision: https://reviews.llvm.org/D122482
This is an alternative to D122376. Rather than working around the
problem, this patch requires that struct return types in intrinsics
are anonymous/literal and adds auto-upgrade code to convert
existing uses of intrinsics with named struct types.
This ensures that the mapping between intrinsic name and
intrinsic function type is actually bijective, as it is supposed
to be.
This also fixes https://github.com/llvm/llvm-project/issues/37891.
Differential Revision: https://reviews.llvm.org/D122471
This patch add some tests to show some optimization opportunities
for bitreverse-shift.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D121507
(Connect InlineAsm Memory Operand with its real value not just name)
Revert 2 history bugfix patch:
Revert "[X86][MS-InlineAsm] Make the constraint *m to be simple place holder"
This patch revert https://reviews.llvm.org/D115225 which mainly
fix problems intrduced by https://reviews.llvm.org/D113096
This reverts commit d7c07f60b35f901f5bd9153b11807124a9bdde60.
Revert "Reland "[X86][MS-InlineAsm] Use exact conditions to recognize MS global variables""
This patch revert https://reviews.llvm.org/D116090 which fix problem
intrduced by https://reviews.llvm.org/D115225
This reverts commit 24c68ea1eb4fc0d0e782424ddb02da9e8c53ddf5.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D120886
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736