This repository is a clone of the public TVM repository (https://github.com/apache/tvm), plus experimental modifications that provide support for the NEC SX-Aurora TSUBASA Vector Engine (VE).
- After installing all tools necessary for LLVM-VE and TVM, run:
```bash
git clone https://github.com/saudet/llvm-project/
mkdir llvm-project/build
cd llvm-project/build
git checkout hpce/develop
git submodule update --init --recursive
LLVM_PATH=$(pwd)
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang ../llvm
make -j100
cd ../..

git clone https://github.com/sx-aurora-dev/vednn
mkdir vednn/build
cd vednn/build
cmake -DLLVM_DIR=$LLVM_PATH/lib/cmake/llvm/ -DLLVM_INSTALL_PREFIX=$LLVM_PATH ..
make -j100
cd ../..

git clone https://github.com/sx-aurora-dev/vml
mkdir vml/build
cd vml/build
sed -i /test/d ../CMakeLists.txt
cmake -DLLVM_DIR=$LLVM_PATH/lib/cmake/llvm/ -DLLVM_INSTALL_PREFIX=$LLVM_PATH -DNLC_VERSION=2.3.0 -DBUILD_VEORUN_TF=OFF -DUSE_VEDNN=OFF ..
make -j100
cd ../..

git clone https://github.com/saudet/tvm
mkdir tvm/build
cd tvm/build
git checkout aurora
git submodule update --init --recursive
cmake -DBUILD_FOR_VE=TRUE -DUSE_LLVM=$LLVM_PATH/bin/llvm-config ..
make -j100
cd ../..

git clone https://github.com/siju-samuel/darknet
cd darknet
git checkout tvm
make
mkdir -p ~/.tvm_test_data/darknet
cp libdarknet.so ~/.tvm_test_data/darknet/libdarknet2.0.so
cd ..
```
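Before moving on, it can help to sanity-check the toolchain. A minimal sketch, assuming the `hpce/develop` branch builds LLVM's VE backend and that `LLVM_PATH` from the steps above is still set:

```bash
# Confirm the VE backend is among the built LLVM targets (target name assumed to be "VE").
$LLVM_PATH/bin/llvm-config --targets-built | grep -i ve

# Confirm the TVM Python package resolves to this checkout.
export TVM_HOME=$(pwd)/tvm/
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
python3 -c "import tvm; print(tvm.__file__)"
```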
- BERT model from "Speed up your BERT inference by 3x on CPUs using Apache TVM":
```bash
export TVM_HOME=$(pwd)/tvm/
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
cd tvm/apps/howto_deploy
export RV_REPORT=1
export RV_FORCE_WIDTH=256
export RV_FORCE_FUNCTIONS=\
fused_reshape_add_cast_expand_dims_broadcast_to_reshape_2_compute___tvm_parallel_lambda,\
fused_take_transpose_contrib_reverse_reshape_transpose_2_compute___tvm_parallel_lambda,\
fused_contrib_reverse_reshape_transpose_reshape_2_compute___tvm_parallel_lambda,\
fused_subtract_add_sqrt_divide_multiply_add_2_compute___tvm_parallel_lambda,
export RV_FORCE_LOOPS=for_body,for_body2,for_body5,for_body5.1,for_body5.2,for_body5.us.us,for_body5.us.us.1,for_body5.us.us.2
make lib/cpp_deploy_normal  # lib/cpp_deploy_pack  # small functions just to test
OMP_NUM_THREADS=8 lib/cpp_deploy_bert
```
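The `RV_*` variables appear to steer the Region Vectorizer bundled with LLVM-VE: `RV_FORCE_WIDTH=256` matches the 256-element vector registers of the VE, while `RV_FORCE_FUNCTIONS` and `RV_FORCE_LOOPS` name the fused TVM kernels and loops to vectorize. If the model changes, the fused symbol names change too; one way to rediscover them is to list the parallel kernels defined in the model's exported shared library (the library path below is hypothetical):

```bash
# List the fused "__tvm_parallel_lambda" kernels defined in an exported model
# library; these are candidates for RV_FORCE_FUNCTIONS (lib/bert.so is a hypothetical name).
nm -D --defined-only lib/bert.so | grep __tvm_parallel_lambda
```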
- Benchmark for "resnet-50", "mobilenet", "vgg-19", "inception_v3", and other ImageNet models:
```bash
export TVM_HOME=$(pwd)/tvm/
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
cd tvm/apps/benchmark
make
OMP_NUM_THREADS=8 python3 ve_imagenet_bench.py
```
- Darknet models like YOLOv2, YOLOv3, etc.:
```bash
export TVM_HOME=$(pwd)/tvm/
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
cd tvm/tutorials/frontend
make
OMP_NUM_THREADS=8 python3 from_darknet_on_ve.py
```
Notes:
- The `libtvm_runtime.so` created via CMake won't work with, for example, `cpp_deploy_normal`
  - This is because it's for the x86 host, not the VE target (use a `libtvm_runtime_pack.so` for that)
- The functions from a static "TVM system lib" do not get registered with TVM properly for some reason
  - Instead, we need to export models to shared libraries (see the sketch after these notes)
- The deployed libraries link and run well with BLAS from NLC and OpenMP from NCC
  - However, more work needs to be done to link with oneDNN and veDNN
- The vectorized code currently generated by LLVM-VE for the VPU crashes with `Segmentation fault`
  - Please refer to sx-aurora-dev/llvm-project#24
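As a concrete illustration of the shared-library route mentioned in the notes, here is a minimal sketch of compiling a trivial operator and exporting it as a `.so` that `tvm.runtime.load_module` (or the C++ deploy apps) can pick up. It assumes a TVM 0.7-era Python API; `llvm` targets the host, and the fork-specific flags needed to target the VE are not shown:

```bash
python3 - <<'EOF'
# Minimal sketch: build a one-operator module and export it as a shared library
# instead of a static "system lib". Names and target are assumptions.
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)

mod = tvm.build(s, [A, B], target="llvm", name="add_one")  # host target; VE flags omitted
mod.export_library("add_one.so")  # load later with tvm.runtime.load_module("add_one.so")
EOF
```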
Apache TVM (incubating) is a compiler stack for deep learning systems. It is designed to close the gap between productivity-focused deep learning frameworks and performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end-to-end compilation to different backends.
© Contributors. Licensed under the Apache-2.0 license.

TVM adopts the Apache committer model; we aim to create an open-source project that is maintained and owned by the community. Check out the Contributor Guide.
We learned a lot from the following projects when building TVM.
- Halide: Part of TVM's TIR and arithmetic simplification module originates from Halide. We also learned and adapted some part of lowering pipeline from Halide.
- Loopy: use of integer set analysis and its loop transformation primitives.
- Theano: the design inspiration of symbolic scan operator for recurrence.