llvm-project

Author	SHA1	Message	Date
Philip Reames	7ca0af8943	[RISCV] Consolidate test lines in fence lowering test These are identical for RV32 and RV64.	2023-01-09 11:00:47 -08:00
Philip Reames	0fcbb12465	[RISCV] Add test coverage for singlethread fences	2023-01-09 10:14:12 -08:00
Florian Hahn	596a8d07d4	[AArch64] Add additional reassociation test. Add a test where the reassociation candidates are split across 2 blocks.	2023-01-09 16:38:19 +00:00
Florian Hahn	70520e2f1c	[AArch64] Add test showing reassociation potential. Add a test case where some ops of a reassociate-able expression are in an earlier block. This can appear in practice, e.g. when computing the final reduction value after vectorization.	2023-01-09 15:20:55 +00:00
Sander de Smalen	8aff167b34	[AArch64][SME] Improve streaming-compatible codegen for extending loads/truncating stores. This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE, which also improves code quality for extending loads and truncating stores. Reviewed By: hassnaa-arm Differential Revision: https://reviews.llvm.org/D141266	2023-01-09 15:08:04 +00:00
Alexey Baturo	35b8bb0ab3	[RISC-V][HWASAN] Don't explicitly load GOT entry to call hwasan mismatch routine Reviewed by: luismarques Differential Revision: https://reviews.llvm.org/D132994	2023-01-09 16:46:28 +03:00
David Green	90f24bef47	[ARM] Fold And/Or into CSel if possible This is the ARM equivalent of D141119, where we fold `and x, (csel 0, 1, cc)` to `csel ZR, x, cc` if we know that x is 0/1 and for `or x, (csel 0, 1, cc)` emit `csinc x, ZR, cc`. The or pattern gets recognized from a cmov under Arm. Differential Revision: https://reviews.llvm.org/D141137	2023-01-09 13:28:57 +00:00
David Green	07d6af6a71	[AArch64] Fold And/Or into CSel if possible If we have `and x, (csel 0, 1, cc)` and we know that x is 0/1, then we can emit a `csel ZR, x, cc`. Similarly for `or x, (csel 0, 1, cc)` we can emit `csinc x, ZR, cc`. This can help where we cannot otherwise generate ccmp instructions. Differential Revision: https://reviews.llvm.org/D141119	2023-01-09 11:52:37 +00:00
Tim Northover	5b24d42106	TailDuplication: do not remove trivial PHIs from addr-taken blocks. Unlike an anonymous block, it will not be removed even though we've resolved all valid paths to get here. So removing a PHI can leave vregs with no definition, violating SSA. Instead, this converts it to an IMPLICIT_DEF.	2023-01-09 11:12:33 +00:00
Stephen Tozer	da0faa0594	[DebugInfo] Produce variadic DBG_INSTR_REFs from ISel This patch modifies SelectionDAG and FastISel to produce DBG_INSTR_REFs with variadic expressions, and produce DBG_INSTR_REFs for debug values with variadic location expressions. The former essentially means just prepending DW_OP_LLVM_arg, 0 to the existing expression. The latter is achieved in MachineFunction::finalizeDebugInstrRefs and InstrEmitter::EmitDbgInstrRef. Reviewed By: jmorse, Orlando Differential Revision: https://reviews.llvm.org/D133929	2023-01-09 08:58:33 +00:00
zhongyunde	9e83333445	[AArch64][SelectionDAG] Eliminates redundant zero-extension for 32-bit popcount Fix https://github.com/llvm/llvm-project/issues/59597. mov w8, w0 + fmov d0, x8 ==> fmov s0, w0 Reviewed By: dmgreen, efriedma Differential Revision: https://reviews.llvm.org/D140649	2023-01-09 16:08:16 +08:00
Serguei Katkov	fd64bd94ed	[Inline Spiller] Extend the snippet by statepoint uses A snippet is a tiny live interval that has a copy- or fill-like def and a copy- or spill-like use at the end (either might be absent). A snippet has only one use/def inside the interval, and the interval is located in one basic block. When the inline spiller spills a reg around its uses, it also forces the spilling of connected snippets: those that were produced by splitting the same original reg and whose def is a full copy of our reg or whose last use is a full copy to our reg. The definition of a snippet is extended to allow more than one use/def, provided all the other uses are statepoint instructions, which will fold the fill into their operand. That way we do not introduce new fills/spills. Reviewed By: qcolombet, dantrushin Differential Revision: https://reviews.llvm.org/D138093	2023-01-09 13:30:57 +07:00
Simon Pilgrim	ddab12d118	[X86] Add shuffle test coverage for Issue #59860	2023-01-08 19:06:06 +00:00
Ayke van Laethem	9592920890	[AVR] Optimize 32-bit shifts: optimize REG_SEQUENCE This pseudo-instruction stores two small (8-bit) registers into one wide (16-bit) register. But apparently the order matters a lot to the register allocator. This patch changes the order of inserting the registers to optimize for the best register allocation in the tests of shift32.ll. It might be detrimental in other cases, but keeping the registers in the same physical register seems like it would be a common case. Differential Revision: https://reviews.llvm.org/D140573	2023-01-08 20:05:31 +01:00
Ayke van Laethem	fad5e0cf50	[AVR] Optimize 32-bit shifts: reverse shift + move This optimization turns shifts of almost a multiple of 8 into a shift into the opposite direction. Unfortunately it doesn't compose well with the other optimizations (I've tried) so it's separate from them. Differential Revision: https://reviews.llvm.org/D140572	2023-01-08 20:05:31 +01:00
Ayke van Laethem	81f5f22f27	[AVR] Optimize 32-bit shifts: shift by 4 bits This uses a complicated shift sequence that avr-gcc also uses, but extended to work over any number of bytes and in both directions (logical shift left and logical shift right). Unfortunately it can't be used for an arithmetic shift right: I've tried to come up with a sequence but couldn't. Differential Revision: https://reviews.llvm.org/D140571	2023-01-08 20:05:31 +01:00
Ayke van Laethem	8f8afabd32	[AVR] Optimize 32-bit shift: move bytes around This patch optimizes 32-bit constant shifts by renaming registers. This is very effective as the compiler would otherwise need to do a lot of single bit shift instructions. Instead, the registers are renamed at the SSA level which means the register allocator will insert the necessary mov instructions. Unfortunately, the register allocator will insert some unnecessary movs with the current code. This will be fixed in a later patch. Differential Revision: https://reviews.llvm.org/D140570	2023-01-08 20:05:31 +01:00
Ayke van Laethem	840d10a1d2	[AVR] Custom lower 32-bit shift instructions 32-bit shift instructions were previously expanded using the default SelectionDAG expander, which meant it used 16-bit constant shifts and ORed them together. This works, but is far from optimal. I've optimized 32-bit shifts on AVR using a custom inserter. This is done using three new pseudo-instructions that take the upper and lower bits of the value in two separate 16-bit registers and output two 16-bit registers. This is the first commit in a series. When completed, shift instructions will take around 31% fewer instructions on average for constant 32-bit shifts, and are in all cases equal or better than the old behavior. It also tends to match or outperform avr-gcc: the only cases where avr-gcc does better are when it uses a loop to shift, or when the LLVM register allocator inserts some unnecessary movs. But it even outperforms avr-gcc in some cases where avr-gcc does not use a loop. As a side effect, non-constant 32-bit shifts also become more efficient. For some real-world differences: the build of compiler-rt I use in TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9% smaller. I think picolibc is a better representation of real-world code, but even a ~1% reduction in code size is really significant. The current patch just lays the groundwork. The result is actually a regression in code size. Later patches will use this as a basis to optimize these shift instructions. Differential Revision: https://reviews.llvm.org/D140569	2023-01-08 20:05:31 +01:00
Ayke van Laethem	0408b131eb	[SelectionDAG][AVR] Add support for lrint and lround intrinsics Integer legalization already supported splitting the output integer of llround and llrint, but did not support this for lround and lrint yet. This is not a problem for 32-bit architectures, but for 8/16-bit architectures like AVR it results in a crash like this: ExpandIntegerResult #0: t7: i32 = lround t6 LLVM ERROR: Do not know how to expand the result of this operator! This patch simply adds lrint/lround to the list of ISD opcodes to expand. Fixes https://github.com/llvm/llvm-project/issues/59573. Differential Revision: https://reviews.llvm.org/D140822	2023-01-08 18:56:07 +01:00
Ayke van Laethem	167338de96	[AVR] correctly declare __do_copy_data and __do_clear_bss These two symbols are declared in object files to indicate whether .data needs to be copied from flash or .bss needs to be cleared. They are supported on avr-gcc and reduce firmware size a bit, which is especially important on very small chips. I checked the behavior of avr-gcc and matched it as well as possible. From my investigation, it seems to work as follows: __do_copy_data is set when the compiler finds a data symbol: * without a section name * with a section name starting with ".data" or ".gnu.linkonce.d" * with a section name starting with ".rodata" or ".gnu.linkonce.r" and flash and RAM are in the same address space __do_clear_bss is set when the compiler finds a data symbol: * without a section name * with a section name that starts with .bss Simply checking whether the calculated section name starts with ".data", ".rodata" or ".bss" should result in the same behavior. Fixes: https://github.com/llvm/llvm-project/issues/58857 Differential Revision: https://reviews.llvm.org/D140830	2023-01-08 18:56:06 +01:00
Paul Walker	c9602e02fc	[SVE] Fix incorrect VT usage when lowering fixed length vector divides. Ensure the negation required when lowering negative power-of-two divides uses the scalable vector container type with the fixed length result extracted from it. Fixes: #59647 Differential Revision: https://reviews.llvm.org/D140563	2023-01-08 12:22:05 +00:00
Matt Arsenault	270e96f435	Revert "AMDGPU: Invert handling of enqueued block detection" This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e. The runtime is having trouble with this at -O0 when the inputs are always enabled.	2023-01-07 21:48:07 -05:00
Eduard Zingerman	f60aefdc7f	[BPF] generate btf_decl_tag records for params of extern functions After frontend changes in the following commit: "BPF: preserve btf_decl_tag for parameters of extern functions" the same mechanism can be used to get the list of function parameters and associated btf_decl_tag entries for both extern and non-extern functions. This commit extracts this mechanism into a separate auxiliary function BTFDebug::processDISubprogram(). The function is called for both extern and non-extern functions in order to generate the corresponding BTF_DECL_TAG records. Differential Revision: https://reviews.llvm.org/D140971	2023-01-07 09:32:18 -08:00
Michal Paszkowski	99203241df	[SPIR-V] Map IR function pointers to registers in ModuleAnalysis SPIRVModuleAnalysis collects module and external function registers (usually result of OpFunction) for use when emitting OpFunctionCall. This patch makes the mapping between the functions and registers using pointers (instead of name strings) to ensure anonymous functions and calls can be resolved properly. Differential Revision: https://reviews.llvm.org/D140548	2023-01-07 15:38:01 +01:00
David Green	0d4ab5de7f	[ARM][AArch64] Add tests for And/Or into CSel fold. NFC	2023-01-07 14:08:29 +00:00
gonglingqin	2c174a53d5	[LoongArch] Move illegal ImmArg tests to llvm/test/Verifier This patch also fixes incorrect function declarations in test cases and removes -disable-verify from the test case. Fix https://github.com/llvm/llvm-project/issues/59839	2023-01-07 12:10:13 +08:00
Matt Arsenault	47554a0c73	AMDGPU: Use more accurate IR type for block handle The device library uses this as a struct with a pointer sized integer and 2 ints.	2023-01-06 21:23:28 -05:00
Matt Arsenault	b7587ca837	AMDGPU: Add more opencl printf tests	2023-01-06 21:23:14 -05:00
Matt Arsenault	47288cc977	AMDGPU: Invert handling of enqueued block detection Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).	2023-01-06 21:16:08 -05:00
Matt Arsenault	0416883dc1	AMDGPU: Fix enqueue block lowering for opaque pointers This was looking for a specific constant cast of the function, when the type doesn't matter. Doesn't bother trying to handle typed pointers, it will just assert. Things probably don't work completely correctly if the block kernel address is captured somewhere else, but that wouldn't work before either. The uses should really be loads out of the handle, and the handle initializer should contain the kernel address.	2023-01-06 21:15:39 -05:00
Matt Arsenault	4ce5400a3f	AMDGPU: Convert enqueue-kernel.ll to opaque pointers This demonstrates the pass is broken with them, the follow up change will fix it.	2023-01-06 21:15:39 -05:00
Matt Arsenault	8723836358	AMDGPU: Add additional printf string tests Test various inputs passed to %s.	2023-01-06 17:22:13 -05:00
Stephen Tozer	c383f4d655	[DebugInfo] Allow non-stack_value variadic expressions and use in DBG_INSTR_REF Prior to this patch, variadic DIExpressions (i.e. ones that contain DW_OP_LLVM_arg) could only be created by salvaging debug values to create stack value expressions, resulting in a DBG_VALUE_LIST being created. As of the previous patch in this patch stack, DBG_INSTR_REF's syntax has been changed to match DBG_VALUE_LIST in preparation for supporting variadic expressions. This patch adds some minor changes needed to allow variadic expressions that aren't stack values to exist, and allows variadic expressions that are trivially reducible to non-variadic expressions to be handled similarly to non-variadic expressions. Reviewed by: jmorse Differential Revision: https://reviews.llvm.org/D133926	2023-01-06 19:31:10 +00:00
James Y Knight	1ae36b1387	Remove special cases for invoke of non-throwing inline-asm. Non-throwing inline asm infers the nounwind attribute in instcombine. Thus, it can be handled in the same manner as non-throwing target functions are generally. Further special casing is unnecessary complexity.	2023-01-06 13:53:10 -05:00
Stephen Tozer	e10e936315	[DebugInfo][NFC] Add new MachineOperand type and change DBG_INSTR_REF syntax This patch makes two notable changes to the MIR debug info representation, which result in different MIR output but identical final DWARF output (NFC w.r.t. the full compilation). The two changes are: * The introduction of a new MachineOperand type, MO_DbgInstrRef, which consists of two unsigned numbers that are used to index an instruction and an output operand within that instruction, having a meaning identical to first two operands of the current DBG_INSTR_REF instruction. This operand is only used in DBG_INSTR_REF (see below). * A change in syntax for the DBG_INSTR_REF instruction, shuffling the operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE, and replacing the first two operands with a single MO_DbgInstrRef-type operand. This patch is the first of a set that will allow DBG_INSTR_REF instructions to refer to multiple machine locations in the same manner as DBG_VALUE_LIST. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D129372	2023-01-06 18:03:48 +00:00
LiDongjin	4554663bc0	Recommit "[RISCV] Enable the LocalStackSlotAllocation pass support" This includes a fix for the tramp3d failure from the llvm-testsuite that caused the last revert. Hopefully the other failures were the same issue. Original commit message: For RISC-V, load/store (excluding vector load/store) instructions only have a 12-bit immediate operand. If the offset is out of range, a temp register must be used to make up this offset. If these offsets have a small (IsInt<12>) relative offset between them, the LocalStackSlotAllocation pass can find a value as the frame base register's value and replace the original offset with this register's value plus the relative offset. Co-authored-by: luxufan <luxufan@iscas.ac.cn> Co-authored-by: Craig Topper <craig.topper@sifive.com> Differential Revision: https://reviews.llvm.org/D98101	2023-01-06 09:54:19 -08:00
Craig Topper	9f087ba05b	[RISCV] Improve 4x and 8x (s/u)int_to_fp. Previously we emitted a 4x or 8x vzext followed by a vfcvt. We can instead use a 2x or 4x vzext followed by a vfwcvt.	2023-01-06 08:39:14 -08:00
Craig Topper	1aa9862df3	[RISCV] Add more XVentanaCondOps patterns. Add patterns with seteq/setne conditions. We don't have instructions for seteq/setne except for comparing with zero and need to emit an ADDI or XOR before a seqz/snez to compare other values. The select ISD node takes a 0/1 value for the condition, but the VT_MASKC(N) instructions check all XLen bits for zero or non-zero. We can use this to avoid the seqz/snez in many cases. This is a pretty ridiculous number of patterns. I wonder if we could use some ComplexPatterns to merge them, but I'd like to do that as a follow up and focus on correctness of the result in this patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D140421	2023-01-06 08:29:23 -08:00
Craig Topper	e5a71a41d8	[RISCV] Add support for the vscale_range attribute. This is based on @frasercrmck's D107290. At least some of the clang portion of D107290 has already been committed. This uses vscale_range for min/max vector width unless the command line overrides are used. As a follow up, I plan to add a max or exact VLEN option to clang to control the vscale_range. This will eliminate many of the reasons for users to use the overrides through the -mllvm interface. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139873	2023-01-06 08:20:37 -08:00
Luke Lau	275658d1af	[SelectionDAG] Implicitly truncate known bits in SPLAT_VECTOR Now that D139525 fixes the Hexagon infinite loop, the stopgap can be removed to provide more information about known bits in SPLAT_VECTOR whose operands are smaller than the bit width (which is most of the time) Reviewed By: reames Differential Revision: https://reviews.llvm.org/D141075	2023-01-06 15:43:47 +00:00
Luke Lau	b599a30e93	[WebAssembly][NFC] Add test case for PR59626 For D141079 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D141120	2023-01-06 15:43:44 +00:00
Hassnaa Hamdi	9eb698946d	[AArch64][SME]: Make 'Expand' the default action for all Ops. By default expand all operations, then change to Custom/Legal if needed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D141068	2023-01-06 15:32:07 +00:00
Sanjay Patel	bf82070ea4	[SDAG] try to avoid multiply for X*Y==0 Forking this off from D140850 - https://alive2.llvm.org/ce/z/TgBeK_ https://alive2.llvm.org/ce/z/STVD7d We could almost justify doing this in IR, but consideration for "minsize" requires that we only try it in codegen -- the transform is not reversible. In all other cases, avoiding multiply should be a win because a mul is more expensive than simple/parallelizable compares. AArch even has a trick to keep instruction count even for some types. Differential Revision: https://reviews.llvm.org/D141086	2023-01-06 09:06:11 -05:00
Matt Arsenault	b4d44322d9	AMDGPU/GlobalISel: Add missing test for implicit_def regbankselect	2023-01-06 08:58:10 -05:00
Matt Arsenault	6fe85933d4	AMDGPU/GlobalISel: Add wave32 checks to bool test	2023-01-06 08:58:10 -05:00
Sanjay Patel	bd87b84a02	[AArch64] add tests for x*y == 0; NFC	2023-01-06 08:37:04 -05:00
Sanjay Patel	f58eedeeee	[x86] add tests for x*y == 0; NFC	2023-01-06 08:37:04 -05:00
Luke Lau	fb6602616c	[WebAssembly] Explicitly add {z,s}ext so extends are selected During DAG legalization, {u,s}itofp instructions on v2i8, v2i16, v4i8 and v4i16 types ended up being legalized into scalar instructions, when they could just be extended to v2i32/v4i32 instead. Fixes https://github.com/llvm/llvm-project/issues/57182 Differential Revision: https://reviews.llvm.org/D140916	2023-01-06 12:28:29 +00:00
Ties Stuij	0b066e02a6	[AArch64] add GlobalIsel support for scalar CNT instruction When feature CSSC is available we should use instruction CNT for s32, s64 and s128 types in GlobalIsel's G_CTPOP. spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits- Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D139417	2023-01-06 11:08:34 +00:00
Noah Goldstein	960bf8a454	[X86] Add tests for atomic bittest with register/memory operands Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D140938	2023-01-06 17:55:38 +08:00
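
The And/Or-into-CSel fold described in commits 07d6af6a71 and 90f24bef47 above can be sanity-checked with a small model. The following is only an illustrative sketch, not code from either patch: it assumes `csel a, b, cc` yields `a` when the condition holds and `b` otherwise, that `csinc a, b, cc` yields `a` or `b + 1`, and that `ZR` reads as 0. The helper names `csel`, `csinc`, and `check_folds` are hypothetical.

```python
from itertools import product

ZR = 0  # assumption: the zero register always reads as 0


def csel(a, b, cc):
    """Assumed CSEL semantics: a when the condition holds, otherwise b."""
    return a if cc else b


def csinc(a, b, cc):
    """Assumed CSINC semantics: a when the condition holds, otherwise b + 1."""
    return a if cc else b + 1


def check_folds():
    """Return True if both folds match the original and/or for every 0/1 input."""
    for x, cc in product((0, 1), (False, True)):
        # and x, (csel 0, 1, cc)  ==>  csel ZR, x, cc
        if (x & csel(0, 1, cc)) != csel(ZR, x, cc):
            return False
        # or x, (csel 0, 1, cc)  ==>  csinc x, ZR, cc
        if (x | csel(0, 1, cc)) != csinc(x, ZR, cc):
            return False
    return True


print(check_folds())
```

Under these assumptions the folds agree on all four combinations of `x` and `cc`. The restriction to 0/1 values matters: with `x = 2` and a false condition, `2 & 1` is 0 while `csel(ZR, 2, False)` is 2.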

46450 Commits