You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a few questions about the strand information in JACUSA 2 output, and the proper usage of that information for RNA editing identification & frequency estimation:
In the variant detection scenario call-1 and given an RNA sample sequenced on the first strand, using -P RF-FIRSTSTRAND will allow some sites to have two lines of output, each for a strand, like this bed-like out output:
#contig start end name score strand bases11 info filter ref
3 10084 10085 call-1 0.0 + 0,5,0,0 * * C
3 10084 10085 call-1 0.0 - 0,0,7,0 * * G
3 10085 10086 call-1 0.0 + 0,0,5,0 * * G
3 10085 10086 call-1 0.0 - 0,7,0,0 * * C
3 10086 10087 call-1 0.0 + 0,0,0,5 * * T
3 10086 10087 call-1 0.0 - 7,0,0,0 * * A
3 10087 10088 call-1 0.0 + 5,0,0,0 * * A
3 10087 10088 call-1 0.0 - 0,0,0,7 * * T
Above is a simple case of non-editing sites, where the ref allele is simply inverted, but the allele counts (here equal to depth) are not the same (but similar) for inverted bases. How to interpret this? My guess is that there are some reads mapped to both strands, contributing to allele counts on both strands, while reads that mapped specifically to either strands contribute to the difference in allele counts. Can you confirm on this?
#contig start end name score strand bases11 info filter ref
3 10088 10089 call-1 1.556381955545767 + 2,0,3,0 * * G
3 10088 10089 call-1 1.4505124214123128 - 0,5,0,2 * * C
In a more complicated case, where RNA editing is detected on both strands, I feel uncertain about how to interpret this result. I wonder if it's ok to interpret this as true RNA editing events on both strands, or if we are more confident about editing called on one of the strands, based on certain statistics (e.g., Z score)?
If my guess about the first example is correct, it could be possible that the 3 reads covering G on the positive strand are contributed by the same reads that also mapped to C on the negative strand (base order : A, C, G, T). If unfortunately those reads were from negative strand-sourced transcripts, then RNA editing occurred at the genomic location on negative strand.
The problem is: we won't know what proportion of allele counts are actually contributed by the same reads, nor which strand of transcripts were those reads from.
I could think of the following options to determine which strand an editing is more likely to occurred at, when JACUSA2 detected candidate editing on both strands of a given site:
Simply pick the strand that has a higher z score (i.e., the divergence between sample and simulated base vectors, if in call-1).
Pick the strand that has a higher total coverage at that site; if coverage ties between strands, pick the strand with a higher z score.
I know this is a long discussion, but I would appreciate a lot if you could clarify my questions, and/or offer your insights in how to solve the dual-strand problem.
The text was updated successfully, but these errors were encountered:
call-1 works like call-2 when it comes to library type and strand information.
The only difference is that an in silico sample is simulated from the existing read information by substituting all non-reference base calls with the reference base.
Are you sure, that "RF-FIRSTSTRAND" is correct library type? You could check with how the strand information derived in JACUSA2 call-1 compares to existing GTF annotation.
Do you have an IGV snapshot of the problematic region?
Hi Michael & fellow users,
I have a few questions about the strand information in JACUSA 2 output, and the proper usage of that information for RNA editing identification & frequency estimation:
In the variant detection scenario
call-1
and given an RNA sample sequenced on the first strand, using-P RF-FIRSTSTRAND
will allow some sites to have two lines of output, each for a strand, like this bed-like out output:Above is a simple case of non-editing sites, where the ref allele is simply inverted, but the allele counts (here equal to depth) are not the same (but similar) for inverted bases. How to interpret this? My guess is that there are some reads mapped to both strands, contributing to allele counts on both strands, while reads that mapped specifically to either strands contribute to the difference in allele counts. Can you confirm on this?
In a more complicated case, where RNA editing is detected on both strands, I feel uncertain about how to interpret this result. I wonder if it's ok to interpret this as true RNA editing events on both strands, or if we are more confident about editing called on one of the strands, based on certain statistics (e.g., Z score)?
If my guess about the first example is correct, it could be possible that the 3 reads covering
G
on the positive strand are contributed by the same reads that also mapped toC
on the negative strand (base order : A, C, G, T). If unfortunately those reads were from negative strand-sourced transcripts, then RNA editing occurred at the genomic location on negative strand.The problem is: we won't know what proportion of allele counts are actually contributed by the same reads, nor which strand of transcripts were those reads from.
I could think of the following options to determine which strand an editing is more likely to occurred at, when JACUSA2 detected candidate editing on both strands of a given site:
call-1
).Since the proportion of overlapping genes on opposite strands is generally low (an average of 3.3% of overlapping exons on opposite strands in human chromosome), I think it is fine to either discard these sites, or pick one strand and give up potential information on the other. What do you think?
I know this is a long discussion, but I would appreciate a lot if you could clarify my questions, and/or offer your insights in how to solve the dual-strand problem.
The text was updated successfully, but these errors were encountered: