Go to file
zhoujing 47fc50bb92 Add pass to support VX/VF instruction generation
Summary: Add pass to support VX/VF instruction generation

Test Plan: none

Reviewers: hualin

Reviewed By: hualin

Subscribers: yanming

Differential Revision: http://www.tpt.com/D583
2023-02-07 14:00:15 +08:00
.github [NFC] Fix exception in version-check.py script 2022-09-15 13:34:29 +02:00
bolt [DebugInfo] llvm::Optional => std::optional 2022-12-05 00:09:22 +00:00
clang WORKAROUND: Fix libclc building error 2023-01-16 16:05:08 +08:00
clang-tools-extra Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 06:31:11 +00:00
cmake [cmake] Add missing CMakePushCheckState include to FindLibEdit.cmake 2022-11-07 18:20:19 +01:00
compiler-rt Successfully build crt0.o and libworkitem.a 2023-01-17 18:17:55 +08:00
cross-project-tests [dexter-tests] Add attribute optnone to main function 2022-10-26 20:57:49 +00:00
flang Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 06:31:11 +00:00
libc [libc][Obvious] Update error bounds for uint_test.QuickMulHiTests. 2022-12-02 18:13:35 -05:00
libclc Add script to automatically build all programs 2023-01-30 13:43:50 +08:00
libcxx [libc++][NFC] Fix typo in comment 2022-12-02 12:20:27 -08:00
libcxxabi [libc++abi][LIT][AIX] Use Vector instructions available on Power7 in vec_reg_restore.pass.cpp 2022-11-29 14:08:03 -05:00
libunwind [CMake] Use LLVM_TARGET_TRIPLE in runtimes 2022-11-29 04:08:24 +00:00
lld Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 06:31:11 +00:00
lldb Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 06:31:11 +00:00
llvm Add pass to support VX/VF instruction generation 2023-02-07 14:00:15 +08:00
llvm-libgcc [cmake] Slight fix ups to make robust to the full range of GNUInstallDirs 2022-07-26 14:48:49 +00:00
mlir [mlir] Use std::nullopt instead of None in comments (NFC) 2022-12-04 19:58:32 -08:00
openmp [OpenMP][libomptarget] Add hasQueue() function in NextGen plugin's AsyncInfoWrapperTy 2022-12-04 13:24:40 +01:00
polly Remove unused #include "llvm/ADT/Optional.h" 2022-12-05 06:31:11 +00:00
pstl Revert "[cmake] Use `CMAKE_INSTALL_LIBDIR` too" 2022-08-18 22:46:32 -04:00
runtimes [runtimes] Fix runtimes-test-depends 2022-11-30 16:55:51 -08:00
third-party [llvm] [cmake] Set EXCLUDE_FROM_ALL on gtest and TestingSupport 2022-11-24 17:52:22 +01:00
utils [bazel] Add a missing dependency after 786cbb09ed 2022-12-02 16:56:05 +01:00
.arcconfig Add pass to support VX/VF instruction generation 2023-02-07 14:00:15 +08:00
.arclint
.clang-format
.clang-tidy Add -misc-const-correctness to .clang-tidy 2022-08-08 13:00:52 -07:00
.git-blame-ignore-revs Add __config formatting to .git-blame-ignore-revs 2022-06-14 09:52:49 -04:00
.gitignore Add script to automatically build all programs 2023-01-30 13:43:50 +08:00
.mailmap .mailmap: add entry for myself 2022-12-03 09:52:57 +01:00
CONTRIBUTING.md docs: update some bug tracker references (NFC) 2022-01-10 15:59:08 -08:00
LICENSE.TXT [docs] Add LICENSE.txt to the root of the mono-repo 2022-08-24 09:35:00 +02:00
README-LLVM.md Update README 2022-12-29 09:56:50 +08:00
README.md Update README.md 2023-01-30 14:09:35 +08:00
SECURITY.md
build-ventus.sh Update build-ventus.sh 2023-02-03 22:16:48 +08:00

README.md

This is the Ventus GPGPU port of LLVM Compiler Infrastructure

Ventus GPGPU is based on RISCV RV32IMACZfinxZve32f ISA with fully redefined concept of V-extension.

For more architecture detail, please refer to Ventus GPGPU Arch

Getting Started

Download all the repositories firstly

libclc can be built from llvm-ventus repository

2: Build all the programs

Assume you have already installed essential build tools such as cmake, clang, ninja etc.

Run ./build-ventus.sh to automatically build all the programs, but we need to run firstly

  • export POCL_DIR=<path-to-pocl-dir>, default folder path will be set to be <llvm-ventus-parentFolder>/pocl
  • export OCL_ICD_DIR=<path-to-ocl-icd-dir>, default folder path will be set to be <llvm-ventus-parentFolder>/ocl-icd

You can dive into build-ventus.sh file to see the detailed information about build process

3: Bridge icd loader

Run export OCL_ICD_VENDORS=<path_to>/libpocl.so to tell ocl icd loader where the icd driver is.

Finally, run export POCL_DEVICES="ventus" to tell pocl driver which device is available(should we set ventus as default device?).

You will see Ventus GPGPU device is found if your setup is correct.

$ <pocl-install-dir>/bin/poclcc -l

LIST OF DEVICES:
0:
  Vendor:   THU
    Name:   Ventus GPGPU device
 Version:   2.2 HSTR: THU-ventus-gpgpu

Also, you can try to set POCL_DEBUG=all and run example under <pocl-build-dir> to see the full OpenCL software stack execution pipeline. For example(Work in progress).

aries@legion:~/workspace/ventus-pocl/build-ventus$ ./examples/vecadd/vecadd
** Final POCL_DEBUG flags: FFFFFFFFFFFFFFFF
[2023-01-19 01:58:36.582896453]POCL: in fn POclCreateCommandQueue at line 98:
  |   GENERAL |  Created Command Queue 3 (0x561c8b324d90) on device 0
[2023-01-19 01:58:36.583753425]POCL: in fn void pocl_llvm_create_context(cl_context) at line 431:
  |      LLVM |  Created context 2 (0x561c8b324c60)
[2023-01-19 01:58:36.583777345]POCL: in fn POclCreateContext at line 227:
  |   GENERAL |  Created Context 2 (0x561c8b324c60)
[2023-01-19 01:58:36.583811014]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 2 (0x561c8b324c60), Refcount: 2
[2023-01-19 01:58:36.583817177]POCL: in fn POclCreateCommandQueue at line 98:
  |   GENERAL |  Created Command Queue 4 (0x561c8b3275b0) on device 0
[2023-01-19 01:58:36.584022807]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 2 (0x561c8b324c60), Refcount: 3
[2023-01-19 01:58:36.584082504]POCL: in fn compile_and_link_program at line 719:
  |      LLVM |  building program with options
[2023-01-19 01:58:36.584105109]POCL: in fn compile_and_link_program at line 756:
  |      LLVM |  building program for 1 devs with options
[2023-01-19 01:58:36.584127804]POCL: in fn compile_and_link_program at line 760:
  |      LLVM |     BUILDING for device: ventus
[2023-01-19 01:58:36.584144767]POCL: in fn pocl_driver_build_source at line 712:
  |      LLVM |  building from sources for device 0
[2023-01-19 01:58:36.585000531]POCL: in fn int pocl_llvm_build_program(cl_program, unsigned int, cl_uint, _cl_program* const*, const char**, int) at line 406:
  |      LLVM |  all build options: -DPOCL_DEVICE_ADDRESS_BITS=32 -D__USE_CLANG_OPENCL_C_H -xcl -Dinline= -I. -cl-kernel-arg-info  -D__ENDIAN_LITTLE__=1 -DCL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE=0 -D__OPENCL_VERSION__=200 -cl-std=CL2.0 -D__OPENCL_C_VERSION__=200 -fno-builtin -triple=riscv32 -target-cpu ventus-gpgpu
[2023-01-19 01:58:37.002888436]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 2 (0x561c8b324c60), Refcount: 4
[2023-01-19 01:58:37.002944911]POCL: in fn POclCreateBuffer at line 256:
  |    MEMORY |  Created Buffer 6 (0x561c8b329330), MEM_HOST_PTR: 0x561c8b65c9d0, device_ptrs[0]: (nil), SIZE 512, FLAGS 36
[2023-01-19 01:58:37.002959038]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 2 (0x561c8b324c60), Refcount: 5
[2023-01-19 01:58:37.002964856]POCL: in fn POclCreateBuffer at line 256:
  |    MEMORY |  Created Buffer 7 (0x561c8b331280), MEM_HOST_PTR: 0x561c8b39bb50, device_ptrs[0]: (nil), SIZE 512, FLAGS 36
[2023-01-19 01:58:37.002971025]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 2 (0x561c8b324c60), Refcount: 6
[2023-01-19 01:58:37.002976107]POCL: in fn POclCreateBuffer at line 256:
  |    MEMORY |  Created Buffer 8 (0x561c8b32c130), MEM_HOST_PTR: (nil), device_ptrs[0]: (nil), SIZE 512, FLAGS 1
[2023-01-19 01:58:37.002991572]POCL: in fn POclCreateKernel at line 139:
  |   GENERAL |  Created Kernel vecadd (0x561c8b32bd30)
[2023-01-19 01:58:37.003011888]POCL: in fn POclSetKernelArg at line 107:
  |   GENERAL |  Kernel          vecadd || SetArg idx   0 ||   float* || Local 0 || Size      8 || Value 0x7ffca34f6350 || Pointer 0x561c8b329330 || *(uint32*)Value:        0 || *(uint64*)Value:        0 ||
Hex Value:  3093328B 1C560000
[2023-01-19 01:58:37.003033612]POCL: in fn POclSetKernelArg at line 107:
  |   GENERAL |  Kernel          vecadd || SetArg idx   1 ||   float* || Local 0 || Size      8 || Value 0x7ffca34f6358 || Pointer 0x561c8b331280 || *(uint32*)Value:        0 || *(uint64*)Value:        0 ||
Hex Value:  8012338B 1C560000
[2023-01-19 01:58:37.003047371]POCL: in fn POclSetKernelArg at line 107:
  |   GENERAL |  Kernel          vecadd || SetArg idx   2 ||   float* || Local 0 || Size      8 || Value 0x7ffca34f6360 || Pointer 0x561c8b32c130 || *(uint32*)Value:        0 || *(uint64*)Value:        0 ||
Hex Value:  30C1328B 1C560000
[2023-01-19 01:58:37.003077696]POCL: in fn pocl_kernel_calc_wg_size at line 168:
  |   GENERAL |  Preparing kernel vecadd with local size 128 x 1 x 1 group sizes 1 x 1 x 1...
[2023-01-19 01:58:37.003111530]POCL: in fn pocl_driver_alloc_mem_obj at line 420:
  |    MEMORY |  Basic device ALLOC 0x561c8b65c9d0 / size 512
[2023-01-19 01:58:37.003123085]POCL: in fn pocl_driver_alloc_mem_obj at line 420:
  |    MEMORY |  Basic device ALLOC 0x561c8b39bb50 / size 512
[2023-01-19 01:58:37.003136372]POCL: in fn pocl_driver_alloc_mem_obj at line 420:
  |    MEMORY |  Basic device ALLOC 0x561c8b65bfb0 / size 512
[2023-01-19 01:58:37.003148678]POCL: in fn POclRetainCommandQueue at line 33:
  | REFCOUNTS |  Retain Command Queue 4 (0x561c8b3275b0), Refcount: 2
[2023-01-19 01:58:37.003159512]POCL: in fn pocl_create_event at line 526:
  |    EVENTS |  Created event 1 (0x561c8b329500) Command ndrange_kernel
[2023-01-19 01:58:37.003187245]POCL: in fn pocl_create_command_struct at line 669:
  |    EVENTS |  Created immediate command struct: CMD 0x561c8b32e440 (event 1 / 0x561c8b329500, type: ndrange_kernel)
[2023-01-19 01:58:37.003217828]POCL: in fn POclRetainKernel at line 33:
  | REFCOUNTS |  Retain Kernel vecadd (0x561c8b32bd30), Refcount: 2
[2023-01-19 01:58:37.003231820]POCL: in fn pocl_command_enqueue at line 1191:
  |    EVENTS |  In-order Q; adding event syncs
[2023-01-19 01:58:37.003263335]POCL: in fn pocl_command_enqueue at line 1236:
  |    EVENTS |  Pushed Event 1 to CQ 4.
[2023-01-19 01:58:37.003286127]POCL: in fn pocl_update_event_queued at line 2084:
  |    EVENTS |  Event queued: 1
[2023-01-19 01:58:37.003365527]POCL: in fn pocl_check_kernel_disk_cache at line 941:
  |   GENERAL |  Using a cached WG function: /work/tptuser/.cache/pocl/kcache/KK/IMMMNPJCCGPBGPCPFBDFJAFNGPBHCOEHIPAHC/vecadd/128-1-1-goffs0/vecadd.so
dlopen("/work/tptuser/.cache/pocl/kcache/KK/IMMMNPJCCGPBGPCPFBDFJAFNGPBHCOEHIPAHC/vecadd/128-1-1-goffs0/vecadd.so") failed with '/work/tptuser/.cache/pocl/kcache/KK/IMMMNPJCCGPBGPCPFBDFJAFNGPBHCOEHIPAHC/vecadd/128-1-1-goffs0/vecadd.so: wrong ELF class: ELFCLASS32'.
note: missing symbols in the kernel binary might be reported as 'file not found' errors.
Aborted (core dumped)

4: Compile a OpenCL C program into Ventus GPGPU assembly

vector_add.cl:

__kernel void vectorAdd(__global float* A, __global float* B) {
  unsigned tid = get_global_id(0);
  A[tid] += B[tid];
}

Compiler OpenCL C into Ventus assembly:

clang -cl-std=CL2.0 -target riscv32 -mcpu=ventus-gpgpu -O1 -S vector_add.cl -o vector_add.s

NOTE: OpenCL host side program should be linked with icd loader -lOpenCL.

5: TODOs

  • Emit barrier instruction for all stores to local/global memory except sGPR spill.
  • Stacks for sGPR spilling and per-thread usage is supported by using RISCV::X2 as warp level stack, RISCV::X4 as per-thread level stack. But the 2 stack size calculation are not yet splitted out, so a lot of stack slots are wasted.
  • VentusRegextInsertion pass may generate incorrect register ordering for next instruction, see FIXME in that pass. To avoid breaking def-use chain, we could keep the extended instruction unmodified by removing Op.setRegIgnoreDUChain() from the pass, the elf generation pass should ignore the higher bit(>2^5) of the register encoding automatically.
  • Pattern match VV and VX optimization. There is only type information in the DAG pattern matching, we can't specify whether to match a DAG to a vop.vv or vop.vx MIR in a tblgen pattern, so a fix pass should be ran after codegen pass.
  • Opencl kernel api - get_enqueued_local_size, need to support non-uniform workgroup