Skip to content
Daniel Ayres edited this page Apr 15, 2015 · 1 revision

Overview

As part of the Google Summer of Code 2009 program, Daniel Ayres will implement an OpenCL version of beagle-lib.

Description

This project will implement an open-source C++ library to compute phylogenetic tree likelihoods on the GPU, with the core loops in OpenCL. Currently in phylogenetic inference there is a strong demand for more computational speed and typically likelihood calculation is the main bottleneck. Moving to GPU acceleration is a natural step to address that issue, as significant speedups in likelihood calculation have been observed both in my work and in that of others. The library will make use of the OpenCL framework, an open standard for general-purpose computation on GPUs. This hardware-independent, GPU-accelerated implementation of a likelihood calculation library has the potential of benefiting several state-of-the-art evolutionary analysis packages.

The OpenCL library will be a continuation of the beagle-lib project hosted on this site.

Project goals

  • Implements the minimum interface to compute tree likelihoods for BEAST
  • Implements additional interface elements for compatibility with other state-of-the-art phylogenetic software. Preliminarily these will be GARLI and MrBayes
  • Achieves significant speedups over current CPU implementations
  • Includes bindings for higher-level languages
  • Is well-documented

Project Plan

Preparation weeks

  • Learned more about OpenCL
  • Installed Snow Leopard and got familiarized with OpenCL examples
  • Learned more about CUDA Driver API
  • Tested autotools version of beagle-lib and documented issues on Google Code
  • Cloned trunk of beagle-lib into opencl-dev branch
  • Created project home page and documented current plan
  • Created preliminary logo for beagle-lib

05/23 - 05/30

  • Participated in a workgroup with the developers of the top phylogenetic and population genetic software today. This group had the purpose of defining a new API for beagle which would allow it to be used by these programs.
  • Gave presentation on GPU programming model for CUDA and OpenCL for the group. Also presented on previous work done on likelihood calculation on GPUs.
  • Together with workgroup participants, defined a new interface for the beagle library. (specifically beagle.h was modified)
  • Together with workgroup participants, updated the reference CPU implementation to match the new API. (among others BeagleCPUImpl.cpp was modified)
  • Set up Xcode project for the library and test cases to ease debugging process.

05/30 - 06/06

  • Re-write of CUDA implementation to comply with most new API functions in beagle.h
    • missing all but setTransitionMatrix, calculateEdgeLogLikelihoods
    • works with tinyTest
  • Cleaned up CUDA and CPU code significantly and fixed several bugs

06/06 - 06/13

  • Created new kernel to calculate edge likelihoods on GPU (calculateEdgeLogLikelihoods)
  • Continued to implement other additional functionality in CUDA implementation to match new API
    • fourTaxon example now works on GPU
  • Restructured project so that GPU kernels are not mixed with CUDA code that is run on the host (new files are PeelingFunctions.cu/PeelingKernels.cu and TransitionFunctions.cu/TransitionKernels.cu)
  • Functions in CUDASharedFunctions.cu adapted and moved to CUDASharedFunctions.c thus no longer require nvcc compiler
  • Worked on autotools issues

06/13 - 06/20

  • Implemented additional functionality in CUDA implementation related to new calculateEdgeLogLikelihoods
    • Dynamic scaling
    • StatesPartials
    • non-nucleotide state models
  • Visited mentor Aaron Darling at UC Davis
    • Gave talk to Eisen lab at Davis about BEAGLE and summer of code project

06/20 - 06/27

  • Implemented additional functions specified in new API
    • getPartials
    • setTransitionMatrix
  • Addressed rescaling issues with changes to API
    • additional scaling parameters for calculateRootLogLikelihoods and calculateEdgeLogLikelihoods
    • modified operation list in updatePartials
    • additional parameter categoryCount in createInstance
    • additional function setCategoryRates
  • Started porting to CUDA driver API
    • modified CUDA calls in BeagleCUDAImpl.cpp and CUDASharedFunctions.c

06/27 - 07/05

  • Converted from CUDA runtime API to driver API
  • Significant restructuring of CUDA implementation (now called GPU implementation)
    • New classes GPUInterface and KernelLauncher
    • Moved all CUDA specific code to a lower level. This is in preparation for OpenCL version and will allow the two GPU versions to share all code except for minor kernel modifications and the GPUInterface implementation

07/06 - 07/12

  • Modified API, CPU and GPU implementations and examples to reflect rescaling related changes 1 and 2 from http://bit.ly/1zhHKb
  • Modified API, implemented change 3 from http://bit.ly/1zhHKb to CPU version, GPU version is a work in progress
  • Switched to PTX kernel compilation, this results in slightly longer GPU initialization times but allows for performance gains from future driver improvements. See http://bit.ly/3Ra0qk
  • Fixed autotools compilation issues, see http://bit.ly/12MaGe
  • Updated Xcode project to reflect PTX change. Requires modified CUDA plugin available at http://bit.ly/14wAQJ
  • Fixed JNI wrapper to reflect modified API

07/13 - 07/19

  • Continued work on rate categories reimplementation on GPU version
    • Previous implementation presented some issues when mixing with new API definitions for matrixCount
  • Initial work on derivative calculation kernels
    • Started with kernels I developed for GARLI but are largely unoptimized
    • Exploring breaking complex kernel into smaller ones
  • Considered using the experimental OpenCL C++ bindings (http://www.khronos.org/registry/cl/) as a replacement for current CUDA/OpenCL GPUInterface
    • Decided against it; added complexity, less flexibility and little benefit. Might be useful as a separate project though

07/20 - 07/26

  • Completed work on rate categories implementation on GPU version
    • Modified BeagleGPUImpl, KernelLauncher, BeagleCUDA_kernels
  • Fixed compilation issues so that CUDA version works under Snow Leopard
    • Modifications to makefiles still have to be merged and conditionally activated
  • Fixed minor issues with conversion to libhmsbeagle
  • Minor updates to Xcode project

07/27 - 08/02

  • Began work on OpenCL version
    • Created OpenCL implementation of GPUInterface
    • Started converting CUDA kernels to OpenCL
  • Fixed several rate categories related bugs
  • Modified API with regards to scaling
    • modified updatePartials. Allows separate setting of write and read scalingBuffers as well as cumulativeScalingBuffer
    • new accumulateScaleFactors, allows accumulating factors into a scaling buffer
    • new removeScaleFactors, allows removing factors from a cumulative scaling buffer
    • new resetScaleFactors which resets a cumulative scaling buffer
  • Added API function for setting partials at tips
    • new setTipPartials which requires only one copy of input partials instead category count copies
  • added support for run time selection of resource (not very robust at the moment but allows one to choose between running on CPU or GPU)

08/03 - 08/09

  • Continued work on OpenCL implementation
    • Converted kernels to OpenCL
    • Currently solving a command queue bug
  • Implemented missing scaling functions in GPU version
    • added remove, reset, and accumulate with updatePartials
  • Implemented support for rate categories in CPU calculateEdgeLogLikelihoods
  • API refactoring
    • added beagle prefix to enums, structs, and functions
  • Simplified code base
    • removed no longer needed DYNAMIC_SCALING blocks
    • removed no longer needed caching of tip states/partials
  • Fixed bugs
    • GPU synchronize issue when not rescaling
    • resource list initialization, error when no CUDA device
    • getDeviceDescription, was always returning info from the first device
    • minor Xcode project issues
  • Cleaning up
    • removed obsolete branches from repository
    • updated examples to latest API changes
    • improved output when debug flags defined
    • beagle.h API documentation

08/09 - 08/17

  • Completed work on OpenCL implementation
    • matches CUDA version feature-wise
    • can be compiled on Mac OS 10.6 with ./configure && make
  • Fixed minor bugs in GPU implementation

Deliverables

05/30

  • new API defined

06/20

  • CUDA driver API port of beagle-lib

08/17

  • OpenCL implementation of beagle-lib

Decision Discussion

Convert beagle-lib CUDA implementation to CUDA Driver API

Although the driver API is more complex, it will allow for much easier transition to OpenCL. And, since we intend to maintain both CUDA and OpenCL implementations of beagle-lib, it makes sense for these to be as similar as possible.

Documentation