GSoC2009

Overview

As part of the Google Summer of Code 2009 program, Daniel Ayres will implement an OpenCL version of beagle-lib.

Description

This project will implement an open-source C++ library to compute phylogenetic tree likelihoods on the GPU, with the core loops in OpenCL. Currently in phylogenetic inference there is a strong demand for more computational speed and typically likelihood calculation is the main bottleneck. Moving to GPU acceleration is a natural step to address that issue, as significant speedups in likelihood calculation have been observed both in my work and in that of others. The library will make use of the OpenCL framework, an open standard for general-purpose computation on GPUs. This hardware-independent, GPU-accelerated implementation of a likelihood calculation library has the potential of benefiting several state-of-the-art evolutionary analysis packages.

The OpenCL library will be a continuation of the beagle-lib project hosted on this site.

Project goals

Implements the minimum interface to compute tree likelihoods for BEAST
Implements additional interface elements for compatibility with other state-of-the-art phylogenetic software. Preliminarily these will be GARLI and MrBayes
Achieves significant speedups over current CPU implementations
Includes bindings for higher-level languages
Is well-documented

Project Plan

Preparation weeks

Learned more about OpenCL
Installed Snow Leopard and got familiarized with OpenCL examples
Learned more about CUDA Driver API
Tested autotools version of beagle-lib and documented issues on Google Code
Cloned trunk of beagle-lib into opencl-dev branch
Created project home page and documented current plan
Created preliminary logo for beagle-lib

05/23 - 05/30

Participated in a workgroup with the developers of the top phylogenetic and population genetic software today. This group had the purpose of defining a new API for beagle which would allow it to be used by these programs.
Gave presentation on GPU programming model for CUDA and OpenCL for the group. Also presented on previous work done on likelihood calculation on GPUs.
Together with workgroup participants, defined a new interface for the beagle library. (specifically beagle.h was modified)
Together with workgroup participants, updated the reference CPU implementation to match the new API. (among others BeagleCPUImpl.cpp was modified)
Set up Xcode project for the library and test cases to ease debugging process.

05/30 - 06/06

Re-write of CUDA implementation to comply with most new API functions in beagle.h
- missing all but setTransitionMatrix, calculateEdgeLogLikelihoods
- works with tinyTest
Cleaned up CUDA and CPU code significantly and fixed several bugs

06/06 - 06/13

Created new kernel to calculate edge likelihoods on GPU (calculateEdgeLogLikelihoods)
Continued to implement other additional functionality in CUDA implementation to match new API
- fourTaxon example now works on GPU
Restructured project so that GPU kernels are not mixed with CUDA code that is run on the host (new files are PeelingFunctions.cu/PeelingKernels.cu and TransitionFunctions.cu/TransitionKernels.cu)
Functions in CUDASharedFunctions.cu adapted and moved to CUDASharedFunctions.c thus no longer require nvcc compiler
Worked on autotools issues
- Fixed OS X compatibility issues http://code.google.com/p/beagle-lib/issues/list

06/13 - 06/20

Implemented additional functionality in CUDA implementation related to new calculateEdgeLogLikelihoods
- Dynamic scaling
- StatesPartials
- non-nucleotide state models
Visited mentor Aaron Darling at UC Davis
- Gave talk to Eisen lab at Davis about BEAGLE and summer of code project

06/20 - 06/27

Implemented additional functions specified in new API
- getPartials
- setTransitionMatrix
Addressed rescaling issues with changes to API
- additional scaling parameters for calculateRootLogLikelihoods and calculateEdgeLogLikelihoods
- modified operation list in updatePartials
- additional parameter categoryCount in createInstance
- additional function setCategoryRates
Started porting to CUDA driver API
- modified CUDA calls in BeagleCUDAImpl.cpp and CUDASharedFunctions.c

06/27 - 07/05

Converted from CUDA runtime API to driver API
Significant restructuring of CUDA implementation (now called GPU implementation)
- New classes GPUInterface and KernelLauncher
- Moved all CUDA specific code to a lower level. This is in preparation for OpenCL version and will allow the two GPU versions to share all code except for minor kernel modifications and the GPUInterface implementation

07/06 - 07/12

Modified API, CPU and GPU implementations and examples to reflect rescaling related changes 1 and 2 from http://bit.ly/1zhHKb
Modified API, implemented change 3 from http://bit.ly/1zhHKb to CPU version, GPU version is a work in progress
Switched to PTX kernel compilation, this results in slightly longer GPU initialization times but allows for performance gains from future driver improvements. See http://bit.ly/3Ra0qk
Fixed autotools compilation issues, see http://bit.ly/12MaGe
Updated Xcode project to reflect PTX change. Requires modified CUDA plugin available at http://bit.ly/14wAQJ
Fixed JNI wrapper to reflect modified API

07/13 - 07/19

Continued work on rate categories reimplementation on GPU version
- Previous implementation presented some issues when mixing with new API definitions for matrixCount
Initial work on derivative calculation kernels
- Started with kernels I developed for GARLI but are largely unoptimized
- Exploring breaking complex kernel into smaller ones
Considered using the experimental OpenCL C++ bindings (http://www.khronos.org/registry/cl/) as a replacement for current CUDA/OpenCL GPUInterface
- Decided against it; added complexity, less flexibility and little benefit. Might be useful as a separate project though

07/20 - 07/26

Completed work on rate categories implementation on GPU version
- Modified BeagleGPUImpl, KernelLauncher, BeagleCUDA_kernels
Fixed compilation issues so that CUDA version works under Snow Leopard
- Modifications to makefiles still have to be merged and conditionally activated
Fixed minor issues with conversion to libhmsbeagle
Minor updates to Xcode project

07/27 - 08/02

Began work on OpenCL version
- Created OpenCL implementation of GPUInterface
- Started converting CUDA kernels to OpenCL
Fixed several rate categories related bugs
Modified API with regards to scaling
- modified updatePartials. Allows separate setting of write and read scalingBuffers as well as cumulativeScalingBuffer
- new accumulateScaleFactors, allows accumulating factors into a scaling buffer
- new removeScaleFactors, allows removing factors from a cumulative scaling buffer
- new resetScaleFactors which resets a cumulative scaling buffer
Added API function for setting partials at tips
- new setTipPartials which requires only one copy of input partials instead category count copies
added support for run time selection of resource (not very robust at the moment but allows one to choose between running on CPU or GPU)

08/03 - 08/09

Continued work on OpenCL implementation
- Converted kernels to OpenCL
- Currently solving a command queue bug
Implemented missing scaling functions in GPU version
- added remove, reset, and accumulate with updatePartials
Implemented support for rate categories in CPU calculateEdgeLogLikelihoods
API refactoring
- added beagle prefix to enums, structs, and functions
Simplified code base
- removed no longer needed DYNAMIC_SCALING blocks
- removed no longer needed caching of tip states/partials
Fixed bugs
- GPU synchronize issue when not rescaling
- resource list initialization, error when no CUDA device
- getDeviceDescription, was always returning info from the first device
- minor Xcode project issues
Cleaning up
- removed obsolete branches from repository
- updated examples to latest API changes
- improved output when debug flags defined
- beagle.h API documentation

08/09 - 08/17

Completed work on OpenCL implementation
- matches CUDA version feature-wise
- can be compiled on Mac OS 10.6 with ./configure && make
Fixed minor bugs in GPU implementation

Deliverables

05/30

new API defined

06/20

CUDA driver API port of beagle-lib

08/17

OpenCL implementation of beagle-lib

Decision Discussion

Convert beagle-lib CUDA implementation to CUDA Driver API

Although the driver API is more complex, it will allow for much easier transition to OpenCL. And, since we intend to maintain both CUDA and OpenCL implementations of beagle-lib, it makes sense for these to be as similar as possible.

Documentation

Home

Release Notes

macOS Install Instructions
Windows Install Instructions
Linux Install Instructions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly