Commit Graph

48 Commits

Author SHA1 Message Date
Kazu Hirata 192d9dd731 [mlir] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 19:58:32 -08:00
Kazu Hirata 1a36588ec6 [mlir] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 18:50:27 -08:00
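For 1a36588ec6 and the related commits in this migration: a minimal C++17 sketch of the std::optional idioms that replace llvm::None and the deprecated hasValue()/getValue()/getValueOr() accessors named in the linked discussion. The helper functions are hypothetical. Only the std::optional API is standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: index of the first negative element, or std::nullopt
// (previously spelled llvm::None) when there is none.
std::optional<std::size_t> findFirstNegative(const std::vector<int64_t> &values) {
  for (std::size_t i = 0; i < values.size(); ++i)
    if (values[i] < 0)
      return i;
  return std::nullopt;
}

// value_or() replaces the deprecated getValueOr().
std::size_t firstNegativeOr(const std::vector<int64_t> &values, std::size_t fallback) {
  return findFirstNegative(values).value_or(fallback);
}

// Contextual bool conversion replaces hasValue(); operator* replaces getValue().
void doubleFirstNegative(std::vector<int64_t> &values) {
  if (std::optional<std::size_t> idx = findFirstNegative(values))
    values[*idx] *= 2;
}
```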
Nicolas Vasilache 3af6438372 Revert "[WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D"
This reverts commit 7db25f78db.

This was mistakenly stacked below (and committed) along with an NFC change.
2022-12-01 02:57:03 -08:00
Nicolas Vasilache 7db25f78db [WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D
Differential Revision: https://reviews.llvm.org/D139040
2022-12-01 02:49:47 -08:00
Quinn Dawkins c0321edc26 [mlir][gpu] Adding support for transposed mma_load_matrix
Enables transposed gpu.subgroup_mma_load_matrix and updates the lowerings in Vector to GPU and GPU to SPIRV. Needed to enable lowering B-transposed matmuls to wmma ops.

Taken over from author: stanley-nod <stanley@nod-labs.com>

Reviewed By: ThomasRaoux, antiagainst

Differential Revision: https://reviews.llvm.org/D138770
2022-11-29 03:35:49 +00:00
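For c0321edc26: a minimal sketch of what a transposed matrix load means, written as plain C++ rather than GPU code. It assumes a row-major source buffer with leading dimension `ld`; `loadTile` is a hypothetical name and not part of the gpu dialect or its lowerings. With `transpose` set, a B operand stored as N x K can be read as its K x N transpose without a separate transpose step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: copy a rows x cols tile out of a row-major buffer
// with leading dimension `ld`, optionally reading it transposed.
std::vector<float> loadTile(const std::vector<float> &src, std::size_t ld,
                            std::size_t rows, std::size_t cols, bool transpose) {
  std::vector<float> tile(rows * cols);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      // Non-transposed: tile(i, j) = src(i, j). Transposed: tile(i, j) = src(j, i).
      tile[i * cols + j] = transpose ? src[j * ld + i] : src[i * ld + j];
  return tile;
}
```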
Ramkumar Ramachandra d32ec5232c mlir/VectorToGPU: use std::optional (NFC)
This is part of an effort to migrate from llvm::Optional to std::optional:

See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
2022-11-27 13:32:18 -08:00
Aliia Khasanova 399638f98c Merge kDynamicSize and kDynamicSentinel into one constant.
resolve conflicts

Differential Revision: https://reviews.llvm.org/D138282
2022-11-21 13:01:26 +00:00
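For 399638f98c: a sketch of why a single sentinel is convenient. The name `kDynamic` and the choice of INT64_MIN below are illustrative assumptions in this sketch. The point is that shape and stride code can test one constant instead of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Illustrative single sentinel for "dynamic" sizes, strides and offsets.
// Assumption: INT64_MIN, which is never a valid size and an implausible stride.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

constexpr bool isDynamic(int64_t value) { return value == kDynamic; }

// Counts dynamic entries in a shape or stride list; the same test works for
// both because there is only one sentinel to compare against.
std::size_t countDynamic(const std::vector<int64_t> &values) {
  std::size_t n = 0;
  for (int64_t v : values)
    if (isDynamic(v))
      ++n;
  return n;
}
```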
Mehdi Amini 6a7a1188d3 Apply clang-tidy fixes for llvm-else-after-return in VectorToGPU.cpp (NFC) 2022-11-06 20:15:00 +00:00
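For 6a7a1188d3: the llvm-else-after-return check flags an `else` that follows a branch ending in `return`. A minimal before/after sketch on a hypothetical helper; both versions behave identically.

```cpp
#include <cassert>

// Before: the `else` is redundant because the `if` branch always returns.
int clampToMaxBefore(int v, int hi) {
  if (v > hi)
    return hi;
  else
    return v;
}

// After: the `else` is dropped and the fall-through path returns directly.
int clampToMaxAfter(int v, int hi) {
  if (v > hi)
    return hi;
  return v;
}
```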
Manish Gupta 114ba722c1 [mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes
This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on
the largest possible tiles for matrixB. It requires handling
`vector.extract_strided_slice` in the vector-to-nvgpu lowering.

Differential Revision: https://reviews.llvm.org/D135749
2022-10-19 17:10:21 -07:00
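For 114ba722c1: a sketch of how a warp-level tile decomposes into native `mma.sync` instructions. It assumes f16 operands with the PTX shape m16n8k16; `MmaShape` and `numNativeMmaOps` are hypothetical names, not the NVGPU utilities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical M x N x K shape descriptor.
struct MmaShape {
  int64_t m, n, k;
};

// Assumption: the native f16 mma.sync shape is m16n8k16.
constexpr MmaShape kNativeF16{16, 8, 16};

// Number of native mma.sync instructions needed to cover a warp tile, or -1
// if the tile is not a multiple of the native shape in every dimension.
constexpr int64_t numNativeMmaOps(MmaShape tile, MmaShape native) {
  if (tile.m % native.m != 0 || tile.n % native.n != 0 || tile.k % native.k != 0)
    return -1;
  return (tile.m / native.m) * (tile.n / native.n) * (tile.k / native.k);
}
```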
Christopher Bate ea2ed80e6d [mlir][nvgpu] NFC - move NVGPU conversion helpers to NvGpu utils library
The ConvertVectorToGpu pass implementation contained a small private
support library for performing various calculations during conversion
between `vector` and `nvgpu.mma.sync` and `nvgpu.ldmatrix` operations.
The support library is moved under `Dialect/NVGPU/Utils` because the
functions have wider utility. Some documentation comments are added or
improved.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D135303
2022-10-05 20:21:27 -06:00
Jakub Kuderski abc362a107 [mlir][arith] Change dialect name from Arithmetic to Arith
Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22.

Tested with:
`ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples`

and `bazel build --config=generic_clang @llvm-project//mlir:all`.

Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini

Differential Revision: https://reviews.llvm.org/D134762
2022-09-29 11:23:28 -04:00
Kazu Hirata be650de57d [mlir] Use empty (NFC) 2022-09-18 17:46:53 -07:00
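For be650de57d: presumably the usual `empty()` cleanup, in which a size comparison against zero is replaced by a direct emptiness test. A before/after sketch on hypothetical helpers:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Before: emptiness expressed as a size comparison.
bool noOperandsBefore(const std::vector<std::string> &operands) {
  return operands.size() == 0;
}

// After: empty() states the intent directly.
bool noOperandsAfter(const std::vector<std::string> &operands) {
  return operands.empty();
}
```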
Oleg Shyshkov 4758e916e1 [mlir] Change IteratorType in ContractionOp in Vector dialect from string to enum.
This is the first step in replacing iterator_type strings with enums in the Vector and Linalg dialects. This change adds IteratorTypeAttr and uses it in ContractionOp.

To avoid breaking all the tests, print/parse code has conversion between string and enum for now.

There is shared code in StructuredOpsUtils.h that expects iterator types to be strings. To break this dependency, this change forks the helper functions `isParallelIterator` and `isReductionIterator` into utils in both dialects and adds `getIteratorTypeNames()` to support backward compatibility with StructuredGenerator.

In the later changes, I plan to add a similar enum attribute to Linalg.

Differential Revision: https://reviews.llvm.org/D133696
2022-09-12 16:59:34 +02:00
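For 4758e916e1: a minimal sketch of moving iterator types from strings to an enum while keeping the string spelling for printing and parsing. The enum and function names below are stand-ins chosen for illustration, not the ODS-generated declarations.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the enum-based iterator type.
enum class IteratorType { parallel, reduction };

// Parse the legacy string spelling, as print/parse still accept it.
std::optional<IteratorType> parseIteratorType(const std::string &s) {
  if (s == "parallel")
    return IteratorType::parallel;
  if (s == "reduction")
    return IteratorType::reduction;
  return std::nullopt;
}

// Print back to the legacy string spelling.
const char *stringifyIteratorType(IteratorType t) {
  return t == IteratorType::parallel ? "parallel" : "reduction";
}

// Enum-based counterparts of the string-based helpers.
bool isParallelIterator(IteratorType t) { return t == IteratorType::parallel; }
bool isReductionIterator(IteratorType t) { return t == IteratorType::reduction; }
```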
Michele Scuttari 67d0d7ac0a [MLIR] Update pass declarations to new autogenerated files
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.

Reviewed By: mehdi_amini, rriddle

Differential Review: https://reviews.llvm.org/D132838
2022-08-31 12:28:45 +02:00
Michele Scuttari 039b969b32 Revert "[MLIR] Update pass declarations to new autogenerated files"
This reverts commit 2be8af8f0e.
2022-08-30 22:21:55 +02:00
Michele Scuttari 2be8af8f0e [MLIR] Update pass declarations to new autogenerated files
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.

Reviewed By: mehdi_amini, rriddle

Differential Review: https://reviews.llvm.org/D132838
2022-08-30 21:56:31 +02:00
Manish Gupta 14d79afeae [mlir][NVGPU] nvgpu.mmasync on F32 through TF32
Adds an optional attribute to support tensor cores on the F32 datatype by lowering to `mma.sync` with TF32 operands. Since TF32 is not a native datatype in LLVM, we are adding `tf32Enabled` as an attribute to allow the IR to be aware of the `MmaSyncOp` datatype. Additionally, this patch adds placeholders for an nvgpu-to-nvgpu transformation targeting higher-precision tf32x3.

For mma.sync on f32 input using tensor cores, there are two possibilities:
(a) tf32   (1 `mma.sync` per warp-level matrix-multiply-accumulate)
(b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate)

Typically, tf32 tensor core acceleration comes at a cost in accuracy from the missing precision bits. While f32 has 23 precision bits, tf32 has only 10. tf32x3 aims to recover the lost precision bits by splitting each operand into two tf32 values and issuing three `mma.sync` tensor core operations.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D130294
2022-08-01 23:23:27 +00:00
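For 14d79afeae: a CPU-side sketch of the tf32 and tf32x3 schemes described above. It emulates TF32 by truncating an f32 mantissa to 10 explicit bits, which is an assumption (hardware conversion may round instead), and forms the tf32x3 product from the three big/small cross terms while dropping small*small. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Emulate TF32 by clearing the low 13 of the 23 f32 mantissa bits.
// Assumption: truncation (round toward zero).
float toTf32(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0xFFFFE000u;
  float r;
  std::memcpy(&r, &bits, sizeof r);
  return r;
}

// tf32: one product of the rounded operands (one mma.sync per term).
float mulTf32(float a, float b) { return toTf32(a) * toTf32(b); }

// tf32x3: split each operand into big + small tf32 parts and keep the three
// largest cross terms, dropping aSmall * bSmall.
float mulTf32x3(float a, float b) {
  float aBig = toTf32(a), aSmall = toTf32(a - aBig);
  float bBig = toTf32(b), bSmall = toTf32(b - bBig);
  return aBig * bBig + (aBig * bSmall + aSmall * bBig);
}
```

For a = 1 + 2^-20, the single tf32 product rounds to 1.0, while the tf32x3 form returns 1 + 2^-19, matching the exact product up to the dropped 2^-40 term.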
Jeff Niu e179532284 [mlir] Remove types from attributes
This patch removes the `type` field from `Attribute` along with the
`Attribute::getType` accessor.

Going forward, this means that attributes in MLIR will no longer have
types as a first-class concept. This patch lays the groundwork to
incrementally remove or refactor code that relies on generic attributes
being typed. The immediate impact will be on attributes that rely on
`Attribute` containing a type, such as `IntegerAttr`,
`DenseElementsAttr`, and `ml_program::ExternAttr`, which will now need
to define a type parameter on their storage classes. This will save
memory as all other attribute kinds will no longer contain a type.

Moreover, it will not be possible to generically query the type of an
attribute directly. This patch provides an attribute interface
`TypedAttr` that implements only one method, `getType`, which can be
used to generically query the types of attributes that implement the
interface. This interface can be used to retain the concept of a "typed
attribute". The ODS-generated accessor for a `type` parameter
automatically implements this method.

Next steps will be to refactor the assembly formats of certain operations
that rely on `parseAttribute(type)` and `printAttributeWithoutType` to
remove special handling of type elision until `type` can be removed from
the dialect parsing hook entirely; and incrementally remove uses of
`TypedAttr`.

Reviewed By: lattner, rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D130092
2022-07-31 20:01:31 -04:00
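For e179532284: a plain C++ model of the design, in which the base attribute no longer stores a type and only attributes that opt into a `TypedAttr`-like interface expose one. All classes here are hypothetical stand-ins built on multiple inheritance and `dynamic_cast`; MLIR's real attributes and interfaces are implemented differently.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Base attribute: no type field, so untyped attributes pay no storage for one.
struct Attribute {
  virtual ~Attribute() = default;
};

// Opt-in interface with a single method, getType().
struct TypedAttr {
  virtual ~TypedAttr() = default;
  virtual std::string getType() const = 0;
};

// A typed attribute stores its type as its own parameter.
struct IntAttr : Attribute, TypedAttr {
  IntAttr(std::string t, long v) : type(std::move(t)), value(v) {}
  std::string getType() const override { return type; }
  std::string type;
  long value;
};

// An untyped attribute implements only the base.
struct FlagAttr : Attribute {};

// Generic code can query a type only through the interface.
std::string typeOrNone(const Attribute &attr) {
  if (const auto *typed = dynamic_cast<const TypedAttr *>(&attr))
    return typed->getType();
  return "<none>";
}
```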
Jacques Pienaar d2c0572b2e [mlir] Flip LinAlg dialect to _Both
This one required more changes than ideal due to overlapping generated name
with different return types. Changed getIndexingMaps to getIndexingMapsArray to
move it out of the way/highlight that it returns (more expensively) a
SmallVector and uses the prefixed name for the Attribute.

Differential Revision: https://reviews.llvm.org/D129919
2022-07-19 14:42:58 -07:00
Christopher Bate 670eee08ce [mlir][VectorToGPU] Fix support for i4, col-major operand support
For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code
was missing support for the `i4` data type. While fixing this, another
bug was discovered that caused the number of ldmatrix tiles calculated for
certain operand types and configurations to be incorrect. This change
fixes both issues and adds additional tests.

Differential Revision: https://reviews.llvm.org/D128074
2022-06-30 10:26:59 -06:00
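For 670eee08ce: the ldmatrix tile count depends on element bitwidth because each tile row is 128 bits, which holds 8 f16, 16 i8, or 32 i4 elements. A hypothetical helper (not the NvGpu utils API) that counts 8-row by 128-bit tiles for an operand whose rows are contiguous:

```cpp
#include <cassert>
#include <cstdint>

// Count the 8-row x 128-bit tiles covering a rows x cols operand with the
// given element bitwidth. Returns -1 if the operand is not tileable.
constexpr int64_t numLdmatrixTiles(int64_t rows, int64_t cols, int64_t bitwidth) {
  const int64_t rowBits = cols * bitwidth;
  if (rows % 8 != 0 || rowBits % 128 != 0)
    return -1;
  return (rows / 8) * (rowBits / 128);
}
```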
Kazu Hirata 064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
Alex Zinenko 8b68da2c7d [mlir] move SCF headers to SCF/{IR,Transforms} respectively
This aligns the SCF dialect file layout with the majority of the dialects.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D128049
2022-06-20 10:18:01 +02:00
Christopher Bate 51b925df94 [mlir][nvgpu] shared memory access optimization pass
This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by
`newColIdx = col % vecSize + perm[row](col/vecSize,row)`
where `perm` is a permutation function indexed by `row` and `vecSize`
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized accesses
common to loading matrix multiplication operands to/from shared memory.

Differential Revision: https://reviews.llvm.org/D127457
2022-06-17 09:31:05 -06:00
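For 51b925df94: a sketch of the column rewrite for one row. It reads `perm` as returning a vector index, which is then scaled by `vecSize`, and it assumes an XOR-based permutation with a power-of-two number of vectors per row. Both readings are assumptions, and the pass may use a different permutation. The key property is that within a row the mapping is a bijection and each element keeps its offset inside its 128-bit vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical swizzle: vecSize = elements per 128-bit access,
// numVecs = vectors per row (assumed to be a power of two).
int64_t swizzleCol(int64_t row, int64_t col, int64_t vecSize, int64_t numVecs) {
  int64_t vecIdx = col / vecSize;                     // which vector in the row
  int64_t permuted = (vecIdx ^ row) & (numVecs - 1);  // perm(col / vecSize, row)
  return permuted * vecSize + col % vecSize;          // keep intra-vector offset
}
```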
Mogball d7ef488bb6 [mlir][gpu] Move GPU headers into IR/ and Transforms/
Depends on D127350

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D127352
2022-06-09 22:49:03 +00:00
Thomas Raoux 271a48e029 [mlir][VectorToGPU] Fix bug generating incorrect ldmatrix ops
ldmatrix transpose can only be used with types that are 16 bits wide.

Differential Revision: https://reviews.llvm.org/D126846
2022-06-03 04:30:22 +00:00
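For 271a48e029: a hypothetical guard expressing the rule in the commit message. The fallback to vector loads is an assumption based on the load paths described in 1ca772ed95, not a statement of what the fixed code does.

```cpp
#include <cassert>

// Transposed ldmatrix is only valid for 16-bit element types.
constexpr bool canUseTransposedLdmatrix(unsigned elementBitwidth) {
  return elementBitwidth == 16;
}

// Hypothetical choice of load strategy for a transposed operand.
enum class LoadKind { LdmatrixTrans, VectorLoads };

constexpr LoadKind chooseTransposedLoad(unsigned elementBitwidth) {
  return canUseTransposedLdmatrix(elementBitwidth) ? LoadKind::LdmatrixTrans
                                                   : LoadKind::VectorLoads;
}
```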
Christopher Bate 1ca772ed95 [MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass
This change adds the option to lower to NvGpu dialect ops during the
VectorToGPU conversion pass. Because this transformation reuses
existing VectorToGPU logic, a separate VectorToNvGpu conversion pass is
not created. The option `use-nvgpu` is added to the VectorToGPU pass.
When this is true, the pass will attempt to convert slices rooted at
`vector.contract` operations into `nvgpu.mma.sync` ops, and
`vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or
one or more `vector.load` operations.  The specific data loaded will
depend on the thread id within a subgroup (warp). These index
calculations depend on data type and shape of the MMA op
according to the downstream PTX specification. The code for supporting
these details is separated into `NvGpuSupport.cpp|h`.

Differential Revision: https://reviews.llvm.org/D122940
2022-05-20 09:42:55 -06:00
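For 1ca772ed95: an example of the thread-id-dependent index calculation the commit refers to. The mapping below follows our reading of the PTX ISA layout for the 16 x 8 accumulator fragment of `mma.sync` m16n8k16 (four elements per lane); the function name is hypothetical.

```cpp
#include <cassert>
#include <utility>

// Map (lane id, element index i in 0..3) to (row, col) in the 16 x 8
// accumulator tile.
std::pair<int, int> accumulatorRowCol(int laneId, int i) {
  int groupId = laneId >> 2;       // 8 groups of 4 lanes
  int threadInGroup = laneId % 4;  // position within the group
  int row = groupId + (i >= 2 ? 8 : 0);
  int col = threadInGroup * 2 + (i & 1);
  return {row, col};
}
```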
Jacques Pienaar 7c38fd605b [mlir] Flip Vector dialect accessors used to prefixed form.
This has been on _Both for a couple of weeks. Flip usages in core with
intention to flip flag to _Prefixed in follow up. Needed to add a couple
of helper methods in AffineOps and Linalg to facilitate a pure flag flip
in follow up as some of these classes are used in templates and so
sensitive to Vector dialect changes.

Differential Revision: https://reviews.llvm.org/D122151
2022-03-28 11:24:47 -07:00
Thomas Raoux d77f483640 [mlir][gpu] Relax restriction on mma load/store op
These ops can support more complex layouts as long as the innermost
dimension is contiguous.

Differential Revision: https://reviews.llvm.org/D122452
2022-03-25 04:03:40 +00:00
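For d77f483640: a sketch of the relaxed condition, assuming it amounts to checking that the innermost stride is 1 and then using the next-outer stride as the leading dimension. `mmaLeadingDimension` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the leading dimension (stride between consecutive rows) if the
// innermost dimension is contiguous, or std::nullopt otherwise.
std::optional<int64_t> mmaLeadingDimension(const std::vector<int64_t> &strides) {
  if (strides.size() < 2 || strides.back() != 1)
    return std::nullopt;
  return strides[strides.size() - 2];
}
```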
River Riddle 47f175b09b [mlir] Update FuncOp conversion passes to Pass/InterfacePass<FunctionOpInterface>
These passes generally don't rely on any special aspects of FuncOp, and moving allows
for these passes to be used in many more situations. The passes that obviously weren't
relying on invariants guaranteed by a "function" were updated to be generic passes; the
rest were updated to be FunctionOpInterface InterfacePasses.

The test updates are NFC switching from implicit nesting (-pass -pass2) form to
the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do).

Differential Revision: https://reviews.llvm.org/D121190
2022-03-08 12:25:32 -08:00
Matthias Springer 99ef9eebad [mlir][vector][NFC] Split into IR, Transforms and Utils
This reduces the dependencies of the MLIRVector target and makes the dialect consistent with other dialects.

Differential Revision: https://reviews.llvm.org/D118533
2022-01-31 19:17:09 +09:00
Thomas Raoux a57ccad5a6 [VectorToGPU] Fix horizontal stride calculation for N-D memref
Fix a bug in how we calculate the stride of mma load/store ops for N-D
memrefs

Differential Revision: https://reviews.llvm.org/D118378
2022-01-27 13:35:56 -08:00
River Riddle e084679f96 [mlir] Make locations required when adding/creating block arguments
BlockArguments gained the ability to have locations attached a while ago, but they
have always been optional. This goes against the core tenet of MLIR where location
information is a requirement, so this commit updates the API to require locations.

Fixes #53279

Differential Revision: https://reviews.llvm.org/D117633
2022-01-19 17:35:35 -08:00
River Riddle 4157455425 [mlir][Pass] Deprecate FunctionPass in favor of OperationPass<FuncOp>
The only benefit of FunctionPass is that it filters out function
declarations. This isn't enough to justify carrying it around, as we can
simply filter out declarations when necessary within the pass. We can
also explore better scheduling primitives to filter out declarations
at the pipeline level in the future.

The definition of FunctionPass is left intact for now to allow time for downstream
users to migrate.

Differential Revision: https://reviews.llvm.org/D117182
2022-01-18 19:52:44 -08:00
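For 4157455425: a model of filtering out declarations inside the pass itself, which is what FunctionPass used to do implicitly. `FuncModel` and `runOnDefinitionsOnly` are hypothetical. In this sketch a declaration is simply a function without a body.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function record: a declaration has no body.
struct FuncModel {
  std::string name;
  bool hasBody;
};

// Visit only function definitions, skipping declarations explicitly.
std::vector<std::string> runOnDefinitionsOnly(const std::vector<FuncModel> &funcs) {
  std::vector<std::string> visited;
  for (const FuncModel &f : funcs) {
    if (!f.hasBody)
      continue;  // declaration: nothing to transform
    visited.push_back(f.name);
  }
  return visited;
}
```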
Mehdi Amini e4853be2f1 Apply clang-tidy fixes for performance-for-range-copy to MLIR (NFC) 2022-01-02 22:19:56 +00:00
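For e4853be2f1: performance-for-range-copy flags range-based loops that copy each element by value when a const reference would do. A before/after sketch on hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Before: `auto` copies every std::string on each iteration.
std::size_t totalLengthBefore(const std::vector<std::string> &names) {
  std::size_t total = 0;
  for (auto name : names)
    total += name.size();
  return total;
}

// After: bind by const reference; no copies, same result.
std::size_t totalLengthAfter(const std::vector<std::string> &names) {
  std::size_t total = 0;
  for (const auto &name : names)
    total += name.size();
  return total;
}
```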
Mehdi Amini 6786d7e4f5 Apply clang-tidy fixes for readability-simplify-boolean-expr to MLIR (NFC)
Reviewed By: rriddle, Mogball

Differential Revision: https://reviews.llvm.org/D116253
2022-01-02 01:59:31 +00:00
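For 6786d7e4f5: readability-simplify-boolean-expr flags control flow that only returns boolean literals. A before/after sketch on a hypothetical helper:

```cpp
#include <cassert>

// Before: an if/else whose branches just return true or false.
bool isSquareBefore(int rows, int cols) {
  if (rows == cols)
    return true;
  else
    return false;
}

// After: return the condition directly.
bool isSquareAfter(int rows, int cols) { return rows == cols; }
```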
Jacques Pienaar c0342a2de8 [mlir] Switching accessors to prefixed form (NFC)
Makes eventual prefixing flag flip smaller change.
2021-12-20 08:03:43 -08:00
Nicolas Vasilache c537a94334 [mlir][Vector] Thread 0-d vectors through vector.transfer ops
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.

Reviewed By: ThomasRaoux, springerm

Differential Revision: https://reviews.llvm.org/D114803
2021-12-01 16:49:43 +00:00
Alexander Belyaev 9b1d90e8ac [mlir] Move min/max ops from Std to Arith.
Differential Revision: https://reviews.llvm.org/D113881
2021-11-15 13:19:17 +01:00
Thomas Raoux e7969240dc [mlir][VectorToGPU] Support more cases in conversion to MMA ops
Support load with broadcast, elementwise divf op and remove the
hardcoded restriction on the vector size. Picking the right size should
be enforced by the user; conversion to llvm/spirv will fail if the size is
not supported.

Differential Revision: https://reviews.llvm.org/D113618
2021-11-11 13:10:38 -08:00
River Riddle 937e40a8cf [mlir] Remove the non-templated DenseElementsAttr::getSplatValue
This predates the templated variant, and has been simply forwarding
to getSplatValue<Attribute> for some time. Removing this makes the
API a bit more uniform, and also helps prevent users from thinking
it is "cheap".
2021-11-09 01:40:40 +00:00
thomasraoux 7fbb0678fa [mlir][VectorToGPU] Add support for elementwise mma to vector to GPU
Differential Revision: https://reviews.llvm.org/D112960
2021-11-02 08:01:04 -07:00
Jacques Pienaar cfb72fd3a0 [mlir] Switch arith, llvm, std & shape dialects to accessors prefixed both form.
Following
https://llvm.discourse.group/t/psa-ods-generated-accessors-will-change-to-have-a-get-prefix-update-you-apis/4476,
this follows flipping these dialects to _Both prefixed form. This
changes the accessors to have a prefix. This was possible mostly without
breaking changes if the existing convenience methods were used.

(https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp
was used to migrate the callers post flipping, using the output from
Operator.cpp)

Differential Revision: https://reviews.llvm.org/D112383
2021-10-24 18:36:33 -07:00
Mogball a54f4eae0e [MLIR] Replace std ops with arith dialect ops
Precursor: https://reviews.llvm.org/D110200

Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.

Renamed all instances of operations in the codebase and in tests.

Reviewed By: rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D110797
2021-10-13 03:07:03 +00:00
thomasraoux 4392841949 [mlir][VectorToGPU] Support converting vector.broadcast to MMA op
Differential Revision: https://reviews.llvm.org/D105175
2021-06-30 09:08:55 -07:00
thomasraoux 1a86559276 [mlir][VectorToGPU] Add conversion for scf::For op with Matrix operands
Differential Revision: https://reviews.llvm.org/D104134
2021-06-24 15:42:28 -07:00
thomasraoux 6413226dce [mlir][VectorToGPU] Add conversion for splat constant to MMA const matrix
Differential Revision: https://reviews.llvm.org/D104133
2021-06-24 15:38:12 -07:00
Matthias Springer 66f878cee9 [mlir][NFC] Remove Standard dialect dependency on MemRef dialect
* Remove dependency: Standard --> MemRef
* Add dependencies: GPUToNVVMTransforms --> MemRef, Linalg --> MemRef, MemRef --> Tensor
* Note: The `subtensor_insert_propagate_dest_cast` test case in MemRef/canonicalize.mlir will be moved to Tensor/canonicalize.mlir in a subsequent commit, which moves over the remaining Tensor ops from the Standard dialect to the Tensor dialect.

Differential Revision: https://reviews.llvm.org/D104506
2021-06-21 17:55:23 +09:00
thomasraoux edd9515bd1 [mlir][VectorToGPU] First step to convert vector ops to GPU MMA ops
This is the first step to convert vector ops to MMA operations in order to
target GPUs' tensor core ops. This currently only supports simple cases;
transpose and element-wise operations will be added later.

Differential Revision: https://reviews.llvm.org/D102962
2021-06-11 07:52:32 -07:00