Readthrough transcription detection and gene assignment improvement
- Added option to base talon, --create_novel_spliced_genes, which will create a novel gene if a spliced read is found that does not share splice junctions with any genes, but does overlap an existing locus or loci in the reference
- Improved gene assignment for novel transcripts
  - Reads that contain splice sites that overlap splice sites in a set of multiple non-overlapping genes (without shared splice sites) are assigned to novel genes with the novelty label fusion
  - Reads that contain splice sites that are shared between a set of multiple genes (i.e. the genes overlap) now have the tie broken based on the distance of the read's 5' / 3' ends to transcripts from this gene set
  - This change improves the assignment of reads and transcripts to the right gene and thus improves gene quantification as well
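
The tiebreak between overlapping genes can be pictured with a small sketch. This is not TALON's code: the Transcript class, the tiebreak_gene function, and the choice of summed absolute distance as the metric are all assumptions made for illustration, since these notes do not specify how the 5' and 3' distances are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript record (not a TALON data structure)."""
    gene_id: str
    start: int  # 5' end coordinate
    end: int    # 3' end coordinate


def tiebreak_gene(read_start, read_end, candidates):
    """Return the gene_id of the candidate transcript whose ends lie closest
    to the read's ends.

    Assumption: closeness is the sum of the absolute 5' and 3' distances.
    """
    if not candidates:
        raise ValueError("no candidate transcripts to choose from")
    best = min(
        candidates,
        key=lambda t: abs(t.start - read_start) + abs(t.end - read_end),
    )
    return best.gene_id


# Two overlapping genes that share splice sites with the read.
candidates = [
    Transcript("geneA", 1000, 5000),
    Transcript("geneB", 1200, 9000),
]
print(tiebreak_gene(1010, 5050, candidates))
```

With these made-up coordinates the read's ends sit 60 bases in total from the geneA transcript and 4140 bases from the geneB transcript, so the read goes to geneA.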
Single-cell support
- Added option to talon to use cell barcodes in the alignment files as separate datasets (--cb)
- Added utility to output TALON quantifications in AnnData format (talon_create_adata). Particularly useful for single-cell and large datasets where a dense matrix representation is prohibitive. Capable of generating both gene- and transcript-level AnnData objects
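
To illustrate what treating cell barcodes as separate datasets means, here is a minimal sketch that groups SAM records by barcode. It assumes the barcode is stored in the conventional CB:Z: optional tag; the function names cell_barcode and group_by_barcode and the example records are hypothetical and are not part of TALON.

```python
def cell_barcode(sam_line, tag="CB"):
    """Return the value of an optional SAM tag (default CB), or None.

    Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    """
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag:
            return parts[2]
    return None


def group_by_barcode(sam_lines):
    """Map each cell barcode to the names of the reads that carry it."""
    groups = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        barcode = cell_barcode(line)
        if barcode is not None:
            read_name = line.split("\t", 1)[0]
            groups.setdefault(barcode, []).append(read_name)
    return groups


# Made-up alignment records: 11 mandatory columns followed by a CB tag.
records = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:GGGT",
    "read3\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC",
]
groups = group_by_barcode(records)
print(groups)
```

Each key in the resulting dictionary would then play the role of one dataset, which is why the number of datasets can grow large enough for a sparse representation such as AnnData to matter.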
Requirements changes
- Removed pybedtools as a requirement as it was breaking installs and is not required
- Restriction of Python versions to <=3.6 and >3.8, as Python 3.8 changes the behavior of variable sharing across multiprocessing threads (https://stackoverflow.com/questions/70552775/multiprocess-inherently-shared-memory-in-no-longer-working-on-python-3-10-comin)
Miscellaneous
- Added option to talon_filter_transcripts to exclude ISM transcripts (--excludeISMs)
- Added option to talon_filter_transcripts to include all transcripts from the reference annotation, regardless of whether they were observed in the datasets or not (--includeAnnot)
- Added script to return the longest observed ends for each transcript instead of the ones reported by TALON (call_longest_ends)
- Added --verbosity option to talon to tune how much output the user sees (0 = only errors, 1 = logging, 2 = debug)
- Added support for BAM files as input in addition to SAM files
- Added multithreading to SAM to BAM compression
- Fixed minor bugs with temporary output directory behavior
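
The three --verbosity levels listed above correspond naturally to levels in Python's standard logging module. The sketch below shows one plausible mapping; the names VERBOSITY_TO_LEVEL, level_for, and make_logger are hypothetical, and treating level 1 as logging.INFO is an assumption, since these notes only describe it as "logging".

```python
import logging

# Assumed mapping from the documented verbosity values to logging levels.
VERBOSITY_TO_LEVEL = {
    0: logging.ERROR,  # only errors
    1: logging.INFO,   # regular log messages (assumed to mean INFO)
    2: logging.DEBUG,  # debug output
}


def level_for(verbosity):
    """Translate a verbosity value (0, 1, or 2) into a logging level."""
    try:
        return VERBOSITY_TO_LEVEL[verbosity]
    except KeyError:
        raise ValueError(f"verbosity must be 0, 1, or 2, got {verbosity!r}") from None


def make_logger(verbosity, name="talon_sketch"):
    """Return a logger whose threshold matches the requested verbosity."""
    logger = logging.getLogger(name)
    logger.setLevel(level_for(verbosity))
    return logger


logger = make_logger(1)
logger.info("shown at verbosity 1 and 2 once a handler is attached")
logger.debug("shown only at verbosity 2")
```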