wtsi-npg/illumina2bam
Here is a collection of tools to generate or process BAM/SAM files using the Picard Java API.

It currently includes Illumina2bam, BamIndexDecoder, BamReadTrimmer, BamMerger, BamTagStripper, BamQualityQuantisation, ChangeBamHeader, SplitBamByReadGroup, AlignmentFilter and others.

It is a NetBeans Java project. You should be able to open it directly in NetBeans.

To test: ant test

To generate jar files: ant jar

You can find more information at http://gq1.github.com/illumina2bam/.

---------------------------------------------------
Illumina2bam

This tool converts Illumina BCL files directly to BAM files using the Picard Java API. It creates one BAM file for an entire lane.

It puts the indexing read's quality and sequence in tags on the first read record. We suggest the tags "QT" and "BC", e.g. QT:Z:HHHHHHHH BC:Z:ATCACGTT. The tag names can be overridden on the command line. You can pipe the output BAM to BamIndexDecoder to demultiplex it.

It records the Illumina software that ran beforehand, e.g. RTA, in PG header records.

By default only reads that pass the PF filter are written. There is an option to write all reads to the BAM.

By default, bcl2qseq also stops sequences identified as TruSeq controls from passing the qseq filter field. Illumina2bam does not do that yet, so expect these reads to be included in the output.

We expect read qualities to differ from those produced by default bcl2qseq, because we give Ns a quality of 0 (rather than 2, which we are not convinced about) and we do not apply the EAMSS(?) flattening of the 3' tail of a read to quality values of 2, a.k.a. "The Killer Bs".

There is also an option to record the second base call from SCL files.

The RunInfo or runParameters file under the runfolder, or the config XML files under the Intensities and BaseCalls directories, are used to get the tile list, the cycle ranges per read and other metadata. CLOCS and filter files are also needed. If a CLOCS file is missing, the tool tries to use a locs or pos file instead.

By default, all reads are placed in one read group, whose ID is 1.
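The quality handling described above (an N gets quality 0 rather than 2) can be illustrated with a minimal sketch. It assumes the standard Illumina BCL byte layout: the low two bits select the base, the upper six bits hold the quality, and an all-zero byte is a no-call. The class and method names are hypothetical and are not taken from the Illumina2bam source.

```java
// Sketch only: decodes one BCL byte under the standard Illumina layout.
// Assumption: this mirrors the behaviour described in the README; it is
// not copied from the Illumina2bam source, and all names are hypothetical.
final class BclByteSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Base call: the low two bits index A, C, G, T; an all-zero byte is a no-call (N). */
    static char base(byte b) {
        int v = b & 0xFF;
        return v == 0 ? 'N' : BASES[v & 0x3];
    }

    /** Phred quality: the upper six bits. A no-call therefore keeps quality 0, not 2. */
    static int quality(byte b) {
        return (b & 0xFF) >>> 2;
    }

    /** Quality as a Phred+33 character, as it appears in SAM text. */
    static char qualityChar(byte b) {
        return (char) (quality(b) + 33);
    }

    public static void main(String[] args) {
        byte noCall = 0;
        byte called = (byte) ((40 << 2) | 2); // quality 40, base index 2 (G)
        System.out.println(base(noCall) + " " + quality(noCall) + " " + qualityChar(noCall));
        System.out.println(base(called) + " " + quality(called) + " " + qualityChar(called));
    }
}
```

No EAMSS-style masking is applied anywhere in the sketch, so the quality of each called base is passed through unchanged.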
Sample and library names can be passed on the command line; otherwise "unknown" is used.

EXAMPLE TO RUN:

    java -jar "Illumina2bam.jar" INTENSITY_DIR=testdata/110323_HS13_06000_B_B039WABXX/Data/Intensities LANE=1 OUTPUT=testdata/6000_1.bam VALIDATION_STRINGENCY=STRICT CREATE_INDEX=false CREATE_MD5_FILE=true FIRST_TILE=1101 TILE_LIMIT=1 COMPRESSION_LEVEL=1

HELP:

    java -jar "Illumina2bam.jar" -h

USAGE: Illumina2bam [options]

Convert Illumina BCL to BAM or SAM file.
Version: 0.03

Options:

--help
-h
    Displays options specific to this tool.

--stdhelp
-H
    Displays options specific to this tool AND options common to all Picard command line tools.

--version
    Displays program version.

RUN_FOLDER=File
R=File
    Illumina runfolder directory including runParameters xml file under it, upwards two levels from Intensities directory if not given. Default value: null.

INTENSITY_DIR=File
I=File
    Illumina intensities directory including config xml file, and clocs, locs or pos files under lane directory. Required.

BASECALLS_DIR=File
B=File
    Illumina basecalls directory including config xml file, and filter files, bcl, maybe scl files under lane cycle directory, using BaseCalls directory under intensities if not given. Default value: null.

LANE=Integer
L=Integer
    Lane number. Required.

OUTPUT=File
O=File
    Output file name. Required.

GENERATE_SECONDARY_BASE_CALLS=Boolean
E2=Boolean
    Including second base call or not, default false. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

PF_FILTER=Boolean
PF=Boolean
    Filter cluster or not, default true. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}

READ_GROUP_ID=String
RG=String
    ID used to link RG header record with RG tag in SAM record, default 1. Default value: 1. This option can be set to 'null' to clear the default value.

SAMPLE_ALIAS=String
SM=String
    The name of the sequenced sample, using library name if not given. Default value: null.

LIBRARY_NAME=String
LB=String
    The name of the sequenced library, default unknown. Default value: unknown. This option can be set to 'null' to clear the default value.

STUDY_NAME=String
ST=String
    The name of the study. Default value: null.

PLATFORM_UNIT=String
PU=String
    The platform unit, using runfolder name plus lane number if not given. Default value: null.

RUN_START_DATE=Iso8601Date
    The start date of the run, read from config file if not given. Default value: null.

SEQUENCING_CENTER=String
SC=String
    Sequence center name, default SC for Sanger Center. Default value: SC. This option can be set to 'null' to clear the default value.

PLATFORM=String
    The name of the sequencing technology that produced the read, default ILLUMINA. Default value: ILLUMINA. This option can be set to 'null' to clear the default value.

FIRST_TILE=Integer
    If set, this is the first tile to be processed (for debugging). Note that tiles are not processed in numerical order. Default value: null.

TILE_LIMIT=Integer
    If set, process no more than this many tiles (for debugging). Default value: null.

BARCODE_SEQUENCE_TAG_NAME=String
BC_SEQ=String
    Tag name for barcode sequence. Default value: BC. This option can be set to 'null' to clear the default value.

BARCODE_QUALITY_TAG_NAME=String
BC_QUAL=String
    Tag name for barcode quality. Default value: QT. This option can be set to 'null' to clear the default value.

SECOND_BARCODE_SEQUENCE_TAG_NAME=String
SEC_BC_SEQ=String
    Tag name for second barcode sequence. Default value: null.

SECOND_BARCODE_QUALITY_TAG_NAME=String
SEC_BC_QUAL=String
    Tag name for second barcode quality. Default value: null.

-----------------------------------------------
BamReadTrimmer

Strip part of a read at a fixed position, typically a prefix of the forward read, and optionally place the removed bases and their qualities in BAM tags.
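The trimming described above can be sketched on plain strings. This is an illustration only: it assumes the first position to trim is counted from 1 and ignores strand handling, and the class and method names are hypothetical rather than taken from BamReadTrimmer itself.

```java
// Sketch only: fixed-position trimming of a read's bases or qualities.
// Assumptions (not confirmed by the README): positions are 1-based and
// the read is handled in the orientation in which it is stored.
final class ReadTrimSketch {
    /** The part kept in the read and the part removed (which could go into a tag). */
    static final class Trimmed {
        final String kept;
        final String removed;

        Trimmed(String kept, String removed) {
            this.kept = kept;
            this.removed = removed;
        }
    }

    /** Removes `length` characters starting at the 1-based `firstPosition`. */
    static Trimmed trim(String s, int firstPosition, int length) {
        int start = firstPosition - 1;
        int end = start + length;
        if (start < 0 || length < 0 || end > s.length()) {
            throw new IllegalArgumentException("Trim region is outside the read");
        }
        return new Trimmed(s.substring(0, start) + s.substring(end), s.substring(start, end));
    }

    public static void main(String[] args) {
        // The same region is cut from the bases and from the qualities.
        Trimmed bases = trim("ACGTACGT", 1, 3);
        Trimmed quals = trim("HHHHIIII", 1, 3);
        System.out.println("read:  " + bases.kept + " " + quals.kept);
        System.out.println("saved: " + bases.removed + " " + quals.removed);
    }
}
```

With a first position of 1 and a length of 3, the first three bases and their qualities are removed from the read and are available to be stored in tags.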
EXAMPLE TO RUN:

    java -jar "BamReadTrimmer.jar" INPUT=testdata/bam/6210_8.sam OUTPUT=testdata/6210_8_trimmed.bam FIRST_POSITION_TO_TRIM=1 TRIM_LENGTH=3 CREATE_MD5_FILE=true ONLY_FORWARD_READ=true SAVE_TRIM=true TRIM_BASE_TAG=rs TRIM_QUALITY_TAG=qs VALIDATION_STRINGENCY=SILENT

-----------------------------------------------
BamMerger

Merge the BAM/SAM alignment information in an aligned BAM file with the data in an unmapped BAM file, producing a third BAM file that has the alignment data and all the additional data from the unmapped BAM. The SQ records and alignment PG records in the aligned BAM file are added to the header of the unmapped BAM file to form the header of the output file.

EXAMPLE TO RUN:

    java -jar "BamMerger.jar" ALIGNED=testdata/bam/6210_8_aligned.sam I=testdata/bam/6210_8.sam OUTPUT=testdata/6210_8_merged.bam VALIDATION_STRINGENCY=SILENT

-----------------------------------------------
STANDARD PICARD OPTIONS:

TMP_DIR=File
    Default value: /tmp/username. This option can be set to 'null' to clear the default value.

VERBOSITY=LogLevel
    Control verbosity of logging. Default value: INFO. This option can be set to 'null' to clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}

QUIET=Boolean
    Whether to suppress job-summary info on System.err. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

VALIDATION_STRINGENCY=ValidationStringency
    Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT, SILENT}

COMPRESSION_LEVEL=Integer
    Compression level for all compressed files created (e.g. BAM and GELI). Default value: 5. This option can be set to 'null' to clear the default value.

MAX_RECORDS_IN_RAM=Integer
    When writing SAM files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort a SAM file, and increases the amount of RAM needed. Default value: 500000. This option can be set to 'null' to clear the default value.

CREATE_INDEX=Boolean
    Whether to create a BAM index when writing a coordinate-sorted BAM file. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

CREATE_MD5_FILE=Boolean
    Whether to create an MD5 digest for any BAM files created. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

OPTIONS_FILE=File
    File of OPTION_NAME=value pairs. No positional parameters allowed. Unlike command-line options, unrecognized options are ignored. A single-valued option set in an options file may be overridden by a subsequent command-line option. A line starting with '#' is considered a comment. This option may be specified 0 or more times.
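The OPTIONS_FILE rules above (OPTION_NAME=value pairs, '#' comment lines, a later command-line value overriding a single-valued option) can be illustrated with a small sketch. This is not Picard's parser: the class and method names are hypothetical, and skipping blank lines is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: reads OPTION_NAME=value lines the way the help text describes.
// Not Picard's implementation; skipping blank lines is an assumption.
final class OptionsFileSketch {
    /** Parses options-file lines, ignoring comments that start with '#'. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected OPTION_NAME=value: " + raw);
            }
            options.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return options;
    }

    /** A value given later on the command line replaces the one from the file. */
    static Map<String, String> merge(Map<String, String> fromFile, Map<String, String> commandLine) {
        Map<String, String> merged = new LinkedHashMap<>(fromFile);
        merged.putAll(commandLine);
        return merged;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
                "# debugging settings",
                "LANE=1",
                "COMPRESSION_LEVEL=1");
        Map<String, String> commandLine = new LinkedHashMap<>();
        commandLine.put("COMPRESSION_LEVEL", "5");
        System.out.println(merge(parse(file), commandLine));
    }
}
```

Unlike the real tool, the sketch does not check option names against a known list; it only shows the file syntax and the override order.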
About
Generate and process BAM files from Illumina sequencing instrument files