llvm-project

Commit Graph

Author	SHA1	Message	Date
Stefan Pintilie	1492c88f49	[PowerPC] Fix bugs in sign-/zero-extension elimination This patch fixes the following two bugs in `PPCInstrInfo::isSignOrZeroExtended` helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass. - Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer). - Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension). To handle the first case, the parameter for the helpers is changed from `MachineInstr` to a register number to distinguish first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow mir test case. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D40554	2022-08-19 07:05:40 -05:00
Justin Hibbits	f43b228581	PowerPC: Don't hoist float multiply + add to fused operation on SPE SPE doesn't have a fmadd instruction, so don't bother hoisting a multiply and add sequence to this, as it'd become just a library call. Hoisting happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".	2022-08-10 11:04:27 -04:00
Chen Zheng	d9004dfbab	[PowerPC] mapping hardward loop intrinsics to powerpc pseudo Map hardware loop intrinsics loop_decrement and set_loop_iteration to the new PowerPC pseudo instructions, so that the hardware loop intrinsics will be expanded to normal cmp+branch form or ctrloop form based on the CTR register usage on MIR level. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D123366	2022-08-08 21:34:20 -04:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Eli Friedman	1a6d82b93f	Fix misc uses of "long" variables to use "int64_t". I don't have any evidence these particular uses are actually causing any issues, but we should avoid accidentally truncating immediate values depending on the host.	2022-07-27 09:47:19 -07:00
Masoud Ataei	96515df816	[PowerPC] Fix the check for scalar MASS conversion Proposing to move the check for scalar MASS conversion from constructor of PPCTargetLowering to the lowerLibCallBase function which decides about the lowering. The Target machine option Options.PPCGenScalarMASSEntries is set in PPCTargetMachine.cpp. But an object of the class PPCTargetLowering is created in one of the included header files. So, the constructor will run before setting PPCGenScalarMASSEntries to correct value. So, we cannot check this option in the constructor. Differential: https://reviews.llvm.org/D128653 Reviewer: @bmahjour	2022-07-06 11:44:00 -07:00
Ting Wang	88b6d22791	[PowerPC] Improve getNormalLoadInput to reach more splat load opportunities There are straight forward splat load opportunities blocked by getNormalLoadInput(), since those cases involve consecutive bitcasts. Improve by looking through bitcasts. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D128703	2022-06-28 08:02:49 -04:00
Nemanja Ivanovic	e09f6ff3c1	[PowerPC] Disable automatic generation of STXVP There are instances where using paired vector stores leads to significant performance degradation due to issues with store forwarding.To avoid falling into this trap with compiler - generated code, we will not emit these instructions unless the user requests them explicitly(with a builtin or by specifying the option). Reviewed By : lei, amyk, saghir Differential Revision: https://reviews.llvm.org/D127218	2022-06-20 14:30:29 -05:00
Amy Kwan	34033a84b8	[PowerPC] Skip combine for vector_shuffles when two scalar_to_vector nodes are different vector types. Currently in `combineVectorShuffle()`, we update the shuffle mask if either input vector comes from a scalar_to_vector, and we keep the respective input vectors in its permuted form by producing PPCISD::SCALAR_TO_VECTOR_PERMUTED. However, it is possible that we end up in a situation where both input vectors to the vector_shuffle are scalar_to_vector, and are different vector types. In situations like this, the shuffle mask is updated incorrectly as the current code assumes both scalar_to_vector inputs are the same vector type. This patch skips the combines for vector_shuffle if both input vectors are scalar_to_vector, and if they are of different vector types. A follow up patch will focus on fixing this issue afterwards, in order to correctly update the shuffle mask. Differential Revision: https://reviews.llvm.org/D127818	2022-06-15 14:12:18 -05:00
Quinn Pham	335e8bf100	[PowerPC] emit VSX instructions instead of VMX instructions for vector loads and stores This patch changes the PowerPC backend to generate VSX load/store instructions for all vector loads/stores on Power8 and earlier (LE) instead of VMX load/store instructions. The reason for this change is because VMX instructions require the vector to be 16-byte aligned. So, a vector load/store will fail with VMX instructions if the vector is misaligned. Also, `gcc` generates VSX instructions in this situation which allow for unaligned access but require a swap instruction after loading/before storing. This is not an issue for BE because we already emit VSX instructions since no swap is required. And this is not an issue on Power9 and up since we have access to `lxv[x]`/`stxv[x]` which allow for unaligned access and do not require swaps. This patch also delays the VSX load/store for LE combines until after LegalizeOps to prioritize other load/store combines. Reviewed By: #powerpc, stefanp Differential Revision: https://reviews.llvm.org/D127309	2022-06-15 12:06:04 -05:00
Stefan Pintilie	263f1b2f5d	[PowerPC] Fix combine step for shufflevector. The combine step for shufflevector will sometimes replace undef in the mask with a defined value. This can cause an infinite loop in some cases as another combine will then put the undef back in the mask. This patch fixes the issue so that undefs are not replaced when doing a combine. Reviewed By: ZarkoCA, amyk, quinnp, saghir Differential Revision: https://reviews.llvm.org/D127439	2022-06-14 11:31:24 -05:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Zongwei Lan	ad73ce318e	[Target] use getSubtarget<> instead of static_cast<>(getSubtarget()) Differential Revision: https://reviews.llvm.org/D125391	2022-05-26 11:22:41 -07:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
Fangrui Song	fb193db2c7	[PowerPC] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2022-04-19 22:35:05 -07:00
Kai Luo	18679ac0d7	[PowerPC] Adjust `MaxAtomicSizeInBitsSupported` on PPC64 AtomicExpandPass uses this variable to determine emitting libcalls or not. The default value is 1024 and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_` libcalls for those target unable to inline atomic ops and finally the backend emits `__sync_` libcalls. Thanks @efriedma for pointing it out. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122868	2022-04-09 00:03:09 +00:00
Kai Luo	549e118e93	[PowerPC] Support 16-byte lock free atomics on pwr8 and up Make 16-byte atomic type aligned to 16-byte on PPC64, thus consistent with GCC. Also enable inlining 16-byte atomics on non-AIX targets on PPC64. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D122377	2022-04-08 23:25:56 +00:00
Ting Wang	b389354b28	[Clang][PowerPC] Add max/min intrinsics to Clang and PPC backend Add support for builtin_[max\|min] which has below prototype: A builtin_max (A1, A2, A3, ...) All arguments must have the same type; they must all be float, double, or long double. Internally use SelectCC to get the result. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D122478	2022-04-05 22:43:48 -04:00
Stefan Pintilie	585c85abe5	[PowerPC] Fix lowering of byval parameters for sizes greater than 8 bytes. To store a byval parameter the existing code would store as many 8 byte elements as was required to store the full size of the byval parameter. For example, a paramter of size 16 would store two element of 8 bytes. A paramter of size 12 would also store two elements of 8 bytes. This would sometimes store too many bytes as the size of the paramter is not always a factor of 8. This patch fixes that issue and now byval paramters are stored with the correct number of bytes. Reviewed By: nemanjai, #powerpc, quinnp, amyk Differential Revision: https://reviews.llvm.org/D121430	2022-03-31 15:12:46 -05:00
Shao-Ce SUN	662b9fa02c	[NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122557	2022-03-29 09:53:24 +08:00
Aaron Puchert	c1a31ee65b	[PPCISelLowering] Avoid emitting calls to __multi3, __muloti4 After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3 instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not even with compiler-rt. Block it as well so that we get inline code. Because libgcc doesn't have __muloti4, we block that as well. Fixes #54460. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122090	2022-03-20 20:59:30 +01:00
Qiu Chaofan	300e1293de	[PowerPC] Disable perfect shuffle by default We are going to remove the old 'perfect shuffle' optimization since it brings performance penalty in hot loop around vectors. For example, in following loop sharing the same mask: %v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> %v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead of `vperm-vperm`. In some large loop cases, this causes 20%+ performance penalty. The original attempt to resolve this is to pre-record masks of every shufflevector operation in DAG, but that is somewhat complex and brings unnecessary computation (to scan all nodes) in optimization. Here we disable it by default. There're indeed some cases becoming worse after this, which will be fixed in a more careful way in future patches. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D121082	2022-03-15 15:52:24 +08:00
Masoud Ataei	30f30e1c12	[PowerPC] Fix the none tail call in scalar MASS conversion This patch is proposing a fix for patch https://reviews.llvm.org/D101759 on none tail call math function conversion to MASS call. Differential: https://reviews.llvm.org/D121016 reviewer: @nemanjai	2022-03-08 08:59:17 -08:00
Qiu Chaofan	b2497e5435	[PowerPC] Add generic fnmsub intrinsic Currently in Clang, we have two types of builtins for fnmsub operation: one for float/double vector, they'll be transformed into IR operations; one for float/double scalar, they'll generate corresponding intrinsics. But for the vector version of builtin, the 3 op chain may be recognized as expensive by some passes (like early cse). We need some way to keep the fnmsub form until code generation. This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub intrinsics. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D116015	2022-03-07 13:00:06 +08:00
Qiu Chaofan	43d48ed220	[PowerPC] Add option to disable perfect shuffle Perfect shuffle was introduced into PowerPC backend years ago, and only available in big-endian subtargets. This optimization has good effects in simple cases, but brings serious negative impact in large programs with many shuffle instructions sharing the same mask. Here introduces a temporary backend hidden option to control it until we implemented better way to fix the gap in vectorshuffle decomposition. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D120072	2022-02-21 01:39:35 +08:00
Ting Wang	097a95f2df	[PowerPC] Add custom lowering for SELECT_CC fp128 using xsmaxcqp Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection, and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128, llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt. Reviewed By: nemanjai, shchenz, amyk Differential Revision: https://reviews.llvm.org/D117006	2022-02-09 21:48:28 -05:00
Nikita Popov	149195f576	[PPCISelLowering] Avoid use of getPointerElementType() Use the value type instead.	2022-02-07 14:30:15 +01:00
Masoud Ataei	8ce13bc93b	[PowerPC] Option controling scalar MASS convertion differential: https://reviews.llvm.org/D119035 reviewer: bmahjour	2022-02-04 13:24:22 -08:00
Benjamin Kramer	85243124cf	Tweak some uses of std::iota to skip initializing the underlying storage. NFCI.	2022-02-04 17:00:50 +01:00
Masoud Ataei	256d253332	[PowerPC] Scalar IBM MASS library conversion pass This patch introduces the conversions from math function calls to MASS library calls. To resolves calls generated with these conversions, one need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX. Differential: https://reviews.llvm.org/D101759 Reviewer: bmahjour	2022-02-02 07:54:19 -08:00
Amy Kwan	0d6e64755a	[PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert. This patch updates the P10 patterns with a load feeding into an insertelt to utilize the refactored load and store infrastructure, as well as updating any tests that exhibit any codegen changes. Furthermore, custom legalization is added for v4f32 on Power9 and above to not only assist with adjusting the refactored load/stores for P10 vector insert, but also it enables the utilization of direct moves. Differential Revision: https://reviews.llvm.org/D115691	2022-02-01 08:48:37 -06:00
Amy Kwan	9cc5b064f1	[PowerPC] Update handling of splat loads for v4i32/v4f32/v2i64 to require non-extending loads. This patch updates how splat loads handled and is an extension of D106555. Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only non-extending loads. For v8i16/v16i8 types, they are updated to handle extending loads only if the memory VT is the same vector element VT type. A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT node should not be produced. In this test, it depicts the following f64 extending load used in a v2f64 build vector, but the extending load is actually used in more places other than the build vector (such as in t12 and t16). ``` Type-legalized selection DAG: %bb.0 'test:entry' SelectionDAG has 20 nodes: t0: ch = EntryToken t4: i64,ch = CopyFromReg t0, Register:i64 %1 t6: i64,ch = CopyFromReg t0, Register:i64 %2 t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64 t16: f64 = fadd t31, t37 t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64 t36: ch = TokenFactor t34, t37:1 t27: v2f64 = BUILD_VECTOR t37, t37 t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27 t12: f64 = fadd t11, t37 t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64 t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64 t2: i64,ch = CopyFromReg t0, Register:i64 %0 t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64 t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1 ``` Differential Revision: https://reviews.llvm.org/D117803	2022-01-28 08:23:01 -06:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Nemanja Ivanovic	0c56bc92e4	[PowerPC] Fix eq/ne comparison of v2i64 pre-Power8 In commit `1674d9b6b2`, I fixed the bug where we didn't consider both words of the result of the comparison. However, the logic needs to be different for eq and ne. Namely for eq, we need both words of the doubleword to equal so it is an AND. OTOH for ne, we need either word to be unequal so it is an OR.	2022-01-26 08:59:08 -06:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Stefan Pintilie	1324bb29f7	[PowerPC] Fix issue with strict float to int conversion. When doing the float to int conversion the strict conversion also needs to retun a chain. This patch fixes that. Reviewed By: nemanjai, #powerpc, qiucf Differential Revision: https://reviews.llvm.org/D117464	2022-01-19 10:57:22 -06:00
Jim Lin	d6b0734837	[NFC] Use Register instead of unsigned	2022-01-19 20:17:04 +08:00
Kazu Hirata	69ccc96162	[llvm] Use the default constructor for SDValue (NFC)	2022-01-01 10:36:59 -08:00
Nemanja Ivanovic	1674d9b6b2	[PowerPC] Fix vector equality comparison for v2i64 pre-Power8 The current code makes the assumption that equality comparison can be performed with a word comparison instruction. While this is true if the entire 64-bit results are used, it does not generally work. It is possible that the low order words and high order words produce different results and a user of only one will get the wrong result. This patch adds an and of the result words so that each word has the result of the comparison of the entire doubleword that contains it. Differential revision: https://reviews.llvm.org/D115678	2021-12-21 14:28:41 -06:00
Chen Zheng	d0022a7250	[PowerPC] copy byval parameter to caller's stack when needed Now we won't copy the byval parameter (bigger than 8 bytes) to caller's parameter save area. Instead, we will only copy the byval parameter when it can not be passed entirely in registers which means we have to use parameter save area according to the 64 bit SVR4 ABI. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111485	2021-12-09 01:00:47 +00:00
Amy Kwan	c27734c183	[PowerPC] Fix load/store selection infrastructure when load/store intrinsics are used on P10. The load/store infrastructure previously made an incorrect assumption that whenever it is used with a load/store intrinsic on Power10 - those intrinsics would automatically be the lxvp/stxvp intrinsics introduced in Power10. However, this is obviously not the case as there are multiple instances of pre-P10 intrinsics that use the refactored load/store implementation. This patch corrects this assumption, and produces the expected intrinsic on pre-P10. Differential Revision: https://reviews.llvm.org/D114978	2021-12-02 15:59:29 -06:00
Nemanja Ivanovic	b7bf937bbe	[PowerPC] Provide XL-compatible vec_round implementation The XL implementation of vec_round for vector double uses "round-to-nearest, ties to even" just as the vector float `version does. However clang and gcc use "round-to-nearest-away" for vector double and "round-to-nearest, ties to even" for vector float. The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__ macro similarly to other instances of incompatibility. Differential revision: https://reviews.llvm.org/D113642	2021-11-24 06:43:56 -06:00
Nemanja Ivanovic	c9cb8edc51	[PowerPC] Allow scalars for asm constraint "v" with VSX Similarly to what GCC does, we should allow scalars with the "v" constraint rather than introducing unnecessary new constraints for scalars in Altivec registers. Differential revision: https://reviews.llvm.org/D113635	2021-11-23 17:03:04 -06:00
Lei Huang	f50c6c1718	[PowerPC] Fix 32bit vector insert instructions for ISA3.1 The platform independent ISD::INSERT_VECTOR_ELT take a element index, but vins* instructions take a byte index. Update 32bit td patterns for vector insert to handle the element index accordingly. Since vector insert for non constant index are supported in ISA3.1, there is no need to use platform specific ISD node, PPCISD::VECINSERT. Update td pattern to directly use ISD::INSERT_VECTOR_ELT instead. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D113802	2021-11-15 14:36:39 -06:00
Kazu Hirata	d243cbf8ea	[llvm] Use isa instead of dyn_cast (NFC)	2021-11-14 19:40:46 -08:00
Kazu Hirata	609ccbb240	[PowerPC] Use SDNode::uses (NFC)	2021-11-13 08:34:22 -08:00
Nemanja Ivanovic	5840f7197d	[PowerPC] Respect rounding mode in the back end Currently, the floating point instructions that depend on rounding mode are correctly marked in the PPC back end with an implicit use of the RM register. Similarly, instructions that explicitly define the register are marked with an implicit def of the same register. So for the most part, RM-using code won't be moved across RM-setting instructions. However, calls are not marked as RM-setting instructions so code can be moved across calls. This is generally desired, but so is the ability to turn off this behaviour with an appropriate option - and -frounding-math really should be that option. This patch provides a set of call instructions (for direct and indirect calls) that are marked with an implicit def of the RM register. These will be used for calls that are marked with the strictfp attribute. Differential revision: https://reviews.llvm.org/D111433	2021-11-10 08:19:58 -06:00

1 2 3 4 5 ...

1729 Commits