The vperm instruction requires its data to be in Altivec registers. If one of
the vector operands is not used after the vperm, the instruction can be
substituted with xxperm, which doubles the number of available registers.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D133700
This patch adds spilling for the new WACC registers.
In order to get the spilling test to work, the MMA instructions from Power 10 are
now supported for Future CPU, except that they all use the new WACC
registers instead of the ACC registers from Power 10.
Reviewed By: amyk, saghir
Differential Revision: https://reviews.llvm.org/D136728
A target can report whether a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but also not slower than before.
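For illustration, a sketch of the resulting hook shape (the parameter list here is inferred from the existing overloads and may differ):
```
// Sketch only: the out-parameter changes from bool* to unsigned*, where 0
// keeps the old "not fast" meaning and larger values denote a target-defined
// speed.
virtual bool allowsMisalignedMemoryAccesses(
    EVT VT, unsigned AddrSpace = 0, Align Alignment = Align(1),
    MachineMemOperand::Flags Flags = MachineMemOperand::MONone,
    unsigned *Fast = nullptr) const {
  return false; // conservative default, unchanged
}
```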
Differential Revision: https://reviews.llvm.org/D124217
Direct-move instructions are usually more efficient than a load followed by a store
for conversion. But direct moves are not needed when the source register
was just loaded from some address.
The pattern has already been recognized, but the source value of strict
nodes is not the first operand (that is the chain) but the second.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138011
When lowering vector shuffles into the xxsplti32dx instruction on Power10, we
canonicalize the right operand to be a BUILD_VECTOR and as a result, get the
commuted vector shuffle node.
However, a vector shuffle node will not always be returned as the result of
commuting a vector shuffle. To handle such a scenario, this patch updates the
original cast of the shuffle into a dyn_cast<> and checks that the result is a valid
vector shuffle node prior to obtaining the commuted shuffle mask.
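A minimal sketch of the added guard (variable names are illustrative, not the exact code):
```
// The commuted node may not come back as a shuffle, so use dyn_cast instead
// of cast and bail out when it is not one.
SDValue Commuted = DAG.getCommutedVectorShuffle(*SVN);
auto *CommutedSV = dyn_cast<ShuffleVectorSDNode>(Commuted);
if (!CommutedSV)
  return SDValue(); // not a vector shuffle; skip the xxsplti32dx lowering
ArrayRef<int> CommutedMask = CommutedSV->getMask();
```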
This patch also adds a new test case that demonstrates this scenario (primarily
seen on 32-bit) and that originally crashed prior to this fix.
Differential Revision: https://reviews.llvm.org/D135024
There are a few issues with the code we generate for atomic operations and the way we generate it:
- Hard coded CR0 for compares
- Order of operands for compares not conducive to
emitting compare-immediate or for CSE of compares
- Missing MachineMemOperand for st[bhwd]cx intrinsics
- Missing intrinsic properties for the same
- Unnecessary blocks with store conditional
instructions to clear reservation (which ends
up hindering performance)
- Move from CR instructions just to compare the
result of a store conditional with zero (even
though it is a record-form)
This patch aims to resolve all of those issues.
Differential revision: https://reviews.llvm.org/D134783
This feature implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (comprised of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to merge first for the LIT tests.
Reviewed By: shchenz, RKSimon
Differential Revision: https://reviews.llvm.org/D132146
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
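A sketch of the added convenience overload (the exact shape is an assumption):
```
// Builds the two size operands as pointer-sized target constants internally.
SDValue getCALLSEQ_END(SDValue Chain, uint64_t Size1, uint64_t Size2,
                       SDValue Glue, const SDLoc &DL) {
  return getCALLSEQ_END(Chain,
                        getIntPtrConstant(Size1, DL, /*isTarget=*/true),
                        getIntPtrConstant(Size2, DL, /*isTarget=*/true),
                        Glue, DL);
}
```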
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
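For illustration, a minimal before/after of a typical call site:
```
#include <iterator>

static const char *const Names[] = {"add", "sub", "mul"};

// Before (pre-C++17 helper): llvm::array_lengthof(Names)
// After  (C++17 and later):  std::size(Names)
static_assert(std::size(Names) == 3, "std::size handles C-style arrays");
```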
Differential Revision: https://reviews.llvm.org/D133429
This patch fixes the following two bugs in `PPCInstrInfo::isSignOrZeroExtended` helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass.
- Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer).
- Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension).
To handle the first case, the parameter for the helpers is changed from `MachineInstr` to a register number to distinguish the first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow a MIR test case.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D40554
SPE doesn't have an fmadd instruction, so don't bother hoisting a
multiply-and-add sequence into one, as it would become just a library call.
Hoisting happens too late for the CTR usability test to veto using the
CTR in a loop, and results in an assert "Invalid PPC CTR loop!".
Map hardware loop intrinsics loop_decrement and set_loop_iteration
to the new PowerPC pseudo instructions, so that the hardware loop
intrinsics will be expanded to normal cmp+branch form or ctrloop
form based on the CTR register usage at the MIR level.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D123366
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.
This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.
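An illustrative sketch of building such a node (surrounding context assumed):
```
// The trailing "is-truncating" flag of FP_ROUND is created as a target
// constant so instruction patterns see a consistent node kind.
SDValue Rounded =
    DAG.getNode(ISD::FP_ROUND, DL, MVT::f32, Val,
                DAG.getIntPtrConstant(0, DL, /*isTarget=*/true));
```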
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130370
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
This proposes to move the check for scalar MASS conversion from the constructor
of PPCTargetLowering to the lowerLibCallBase function, which decides
about the lowering.
The target machine option Options.PPCGenScalarMASSEntries is set in
PPCTargetMachine.cpp, but an object of the PPCTargetLowering class is
created in one of the included header files. So the constructor runs
before PPCGenScalarMASSEntries is set to its correct value, and we cannot
check this option in the constructor.
Differential: https://reviews.llvm.org/D128653
Reviewer: @bmahjour
There are straightforward splat load opportunities blocked by
getNormalLoadInput(), since those cases involve consecutive bitcasts.
Improve this by looking through bitcasts.
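A minimal sketch of the idea (variable names are illustrative):
```
// Look through any number of bitcasts before checking whether the splat
// input is a plain (normal, non-extending) load.
SDValue In = peekThroughBitcasts(Op.getOperand(0));
LoadSDNode *LD = dyn_cast<LoadSDNode>(In);
bool IsSplatLoadCandidate = LD && ISD::isNormalLoad(LD);
```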
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D128703
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding. To avoid falling
into this trap with compiler-generated code, we will not emit these
instructions unless the user requests them explicitly (with a builtin or by
specifying the option).
Reviewed By: lei, amyk, saghir
Differential Revision: https://reviews.llvm.org/D127218
Currently in `combineVectorShuffle()`, we update the shuffle mask if either
input vector comes from a scalar_to_vector, and we keep the respective input
vectors in their permuted form by producing PPCISD::SCALAR_TO_VECTOR_PERMUTED.
However, it is possible that we end up in a situation where both input vectors
to the vector_shuffle are scalar_to_vector, and are different vector types.
In situations like this, the shuffle mask is updated incorrectly as the current
code assumes both scalar_to_vector inputs are the same vector type.
This patch skips the combine for vector_shuffle if both input vectors are
scalar_to_vector and are of different vector types. A follow-up patch
will focus on fixing this case afterwards, in order to correctly update the
shuffle mask.
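A hedged sketch of the bail-out condition (operand names are assumptions):
```
// If both shuffle inputs are scalar_to_vector nodes of different vector
// types, skip the combine instead of updating the mask incorrectly.
if (LHS.getOpcode() == ISD::SCALAR_TO_VECTOR &&
    RHS.getOpcode() == ISD::SCALAR_TO_VECTOR &&
    LHS.getValueType() != RHS.getValueType())
  return SDValue();
```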
Differential Revision: https://reviews.llvm.org/D127818
This patch changes the PowerPC backend to generate VSX load/store instructions
for all vector loads/stores on Power8 and earlier (LE) instead of VMX
load/store instructions. The reason for this change is because VMX instructions
require the vector to be 16-byte aligned. So, a vector load/store will fail with
VMX instructions if the vector is misaligned. Also, `gcc` generates VSX
instructions in this situation which allow for unaligned access but require a
swap instruction after loading/before storing. This is not an issue for BE
because we already emit VSX instructions since no swap is required. And this is
not an issue on Power9 and up since we have access to `lxv[x]`/`stxv[x]` which
allow for unaligned access and do not require swaps.
This patch also delays the VSX load/store for LE combines until after
LegalizeOps to prioritize other load/store combines.
Reviewed By: #powerpc, stefanp
Differential Revision: https://reviews.llvm.org/D127309
The combine step for shufflevector will sometimes replace undef in the mask
with a defined value. This can cause an infinite loop in some cases as another
combine will then put the undef back in the mask.
This patch fixes the issue so that undefs are not replaced when doing a combine.
Reviewed By: ZarkoCA, amyk, quinnp, saghir
Differential Revision: https://reviews.llvm.org/D127439
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
If the mask is made up of elements that form a mask in the higher type,
we can convert the shuffle of bitcasts into a shuffle of the bitcast type,
simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.
The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.
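A self-contained illustration of the mask check; this mirrors the idea rather than the in-tree helper:
```
#include <array>
#include <cassert>
#include <cstddef>

// A mask over N narrow elements can be treated as a mask over N/2 elements
// of twice the width when every pair selects two consecutive, pair-aligned
// source lanes.
template <size_t N>
static bool widenMask(const std::array<int, N> &Mask,
                      std::array<int, N / 2> &Wide) {
  for (size_t I = 0; I < N; I += 2) {
    if (Mask[I] < 0 || Mask[I] % 2 != 0 || Mask[I + 1] != Mask[I] + 1)
      return false;
    Wide[I / 2] = Mask[I] / 2;
  }
  return true;
}

int main() {
  std::array<int, 4> M = {2, 3, 0, 1}; // v4i32 mask
  std::array<int, 2> W;
  assert(widenMask(M, W) && W[0] == 1 && W[1] == 0); // equivalent v2i64 <1,0>
  return 0;
}
```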
Differential Revision: https://reviews.llvm.org/D123801
AtomicExpandPass uses this variable to determine whether to emit libcalls. The default value is 1024, and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_*` libcalls for those targets unable to inline atomic ops, and the backend finally emits `__sync_*` libcalls. Thanks @efriedma for pointing it out.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122868
Make the 16-byte atomic type 16-byte aligned on PPC64, thus consistent with GCC. Also enable inlining of 16-byte atomics on non-AIX PPC64 targets.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D122377
Add support for builtin_[max|min], which have the below prototype:
A builtin_max (A1, A2, A3, ...)
All arguments must have the same type; they must all be float, double, or long double.
Internally use SelectCC to get the result.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D122478
To store a byval parameter, the existing code would store as many 8-byte elements
as were required to cover the full size of the byval parameter.
For example, a parameter of size 16 would store two elements of 8 bytes.
A parameter of size 12 would also store two elements of 8 bytes.
This would sometimes store too many bytes, as the size of the parameter is not
always a multiple of 8.
This patch fixes that issue, and byval parameters are now stored with the correct
number of bytes.
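The sizing issue can be illustrated with a standalone sketch; it models the store sizes only, under the assumption that the tail is covered with narrower power-of-two stores:
```
#include <cassert>
#include <cstdint>
#include <vector>

// Store sizes emitted for a byval parameter: full 8-byte pieces, then the
// remaining tail in decreasing power-of-two chunks, so no store writes past
// the end of the parameter.
static std::vector<uint64_t> byValStoreSizes(uint64_t Size) {
  std::vector<uint64_t> Chunks;
  uint64_t Remaining = Size;
  for (uint64_t Chunk = 8; Chunk > 0; Chunk /= 2)
    while (Remaining >= Chunk) {
      Chunks.push_back(Chunk);
      Remaining -= Chunk;
    }
  return Chunks;
}

int main() {
  // A 16-byte parameter: two 8-byte stores, as before.
  assert(byValStoreSizes(16) == (std::vector<uint64_t>{8, 8}));
  // A 12-byte parameter: 8 + 4 bytes, not two 8-byte stores writing 16 bytes.
  assert(byValStoreSizes(12) == (std::vector<uint64_t>{8, 4}));
  return 0;
}
```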
Reviewed By: nemanjai, #powerpc, quinnp, amyk
Differential Revision: https://reviews.llvm.org/D121430
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.
Because libgcc doesn't have __muloti4, we block that as well.
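A hedged sketch of what blocking the calls looks like in the lowering setup (the exact guard conditions belong to the patch):
```
// Clearing the libcall names forces the overflow multiply to be expanded
// inline instead of calling __mulodi4 / __muloti4.
setLibcallName(RTLIB::MULO_I64, nullptr);
setLibcallName(RTLIB::MULO_I128, nullptr);
```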
Fixes #54460.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122090
We are going to remove the old 'perfect shuffle' optimization since it
brings a performance penalty in hot loops over vectors. For example, consider
the following shuffles in a loop sharing the same mask:
%v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
%v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.
The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (scanning all nodes) in the optimization. Here we
disable it by default. There are indeed some cases that become worse after
this; they will be fixed in a more careful way in future patches.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D121082
Currently in Clang, we have two types of builtins for the fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of the builtin, the 3-op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces a ppc.fnmsub.* intrinsic to unify the four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
Perfect shuffle was introduced into PowerPC backend years ago, and only
available in big-endian subtargets. This optimization has good effects
in simple cases, but brings serious negative impact in large programs
with many shuffle instructions sharing the same mask.
This introduces a temporary hidden backend option to control it until we
implement a better way to fix the gap in vector shuffle decomposition.
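As a sketch of such an option (the flag name and default here are illustrative, not necessarily the ones added):
```
#include "llvm/Support/CommandLine.h"

// A hidden backend flag guarding the perfect-shuffle lowering path.
static llvm::cl::opt<bool> DisablePerfectShuffle(
    "ppc-disable-perfect-shuffle",
    llvm::cl::desc("Disable Altivec perfect-shuffle decomposition"),
    llvm::cl::Hidden, llvm::cl::init(false));
```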
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120072
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection for llvm.maxnum.f128,
llvm.minnum.f128, and select_cc with ordered gt/lt and (don't care) gt/lt.
Reviewed By: nemanjai, shchenz, amyk
Differential Revision: https://reviews.llvm.org/D117006
This patch introduces the conversions from math function calls
to MASS library calls. To resolve calls generated by these conversions, one
needs to link the libxlopt.a library. This patch is tested on PowerPC Linux and AIX.
Differential: https://reviews.llvm.org/D101759
Reviewer: bmahjour
This patch updates the P10 patterns with a load feeding into an insertelt to
utilize the refactored load and store infrastructure, as well as updating any
tests that exhibit any codegen changes.
Furthermore, custom legalization is added for v4f32 on Power9 and above, not
only to assist with adjusting the refactored load/stores for P10 vector insert,
but also to enable the utilization of direct moves.
Differential Revision: https://reviews.llvm.org/D115691
This patch updates how splat loads are handled and is an extension of D106555.
Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only
non-extending loads. For v8i16/v16i8 types, they are updated to handle extending
loads only if the memory VT is the same as the vector element VT.
A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT
node should not be produced. The test depicts the following f64
extending load used in a v2f64 build vector, where the extending load is also
used in places other than the build vector (such as in t12 and t16).
```
Type-legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
t0: ch = EntryToken
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64
t16: f64 = fadd t31, t37
t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64
t36: ch = TokenFactor t34, t37:1
t27: v2f64 = BUILD_VECTOR t37, t37
t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27
t12: f64 = fadd t11, t37
t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64
t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64
t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1
```
Differential Revision: https://reviews.llvm.org/D117803
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
In commit 1674d9b6b2, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely, for eq we need both words of the doubleword to be equal, so it
is an AND. On the other hand, for ne we need either word to be unequal, so it
is an OR.
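A standalone illustration of the stated logic:
```
#include <cassert>
#include <cstdint>

// Comparing a doubleword via its two 32-bit halves: eq needs both halves to
// match (AND of the per-word results); ne needs either half to differ (OR).
static bool eq64(uint32_t AHi, uint32_t ALo, uint32_t BHi, uint32_t BLo) {
  return (AHi == BHi) && (ALo == BLo);
}
static bool ne64(uint32_t AHi, uint32_t ALo, uint32_t BHi, uint32_t BLo) {
  return (AHi != BHi) || (ALo != BLo);
}

int main() {
  assert(!eq64(0, 1, 0, 2)); // low words differ -> not equal
  assert(ne64(0, 1, 0, 2));  // low words differ -> unequal
  assert(eq64(7, 7, 7, 7) && !ne64(7, 7, 7, 7));
  return 0;
}
```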
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.