Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JACUJA2 Code Example Request #73

Open
Chan-W00 opened this issue Jun 20, 2024 · 3 comments
Open

JACUJA2 Code Example Request #73

Chan-W00 opened this issue Jun 20, 2024 · 3 comments

Comments

@Chan-W00
Copy link

Dear JACUSA2 team,

I am a truely new to bioinformatics and have been trying to analyze C-to-T base editing that is mediated APOBEC1 enzyme directly conjugated to protein of interest (POI).

My situation is:

  1. Bulk RNA-seq from HEK293 transiently transfected with POI-APOBEC1 or mCherry-APOBEC1 in 150 PE
  2. Biological duplicates
  3. Finished pre-processing, such as trimming (cutadopt), alignment (STAR), sorting (samtools), deduplication (Picard), MD tagging (samtools) and indexing (samtools)
  4. Once tried with
    java -jar JACUSA_v2.0.4.jar call-2 -r result.out cond.1_rep.1.bam,cond.1_rep.2.bam cond.2_rep.1.bam,cond.2_rep.2.bam -p 4
  5. I got following result (Several first strigs):
    #contig start end name score strand bases11 bases12 bases21 bases22 info filter ref
    chr1 14521 14522 call-2 0.0639599068288419 . 1,0,15,0 0,0,12,0 1,0,13,0 1,0,21,0 * * G
    chr1 14522 14523 call-2 0.46276221056984923 . 0,16,0,0 0,12,0,0 0,15,0,0 0,21,0,1 * * C
    chr1 14541 14542 call-2 0.12227137530251753 . 17,0,2,0 13,0,1,0 11,0,4,0 23,0,1,0 * * A
    chr1 14573 14574 call-2 0.20814993206096233 . 24,0,8,0 22,0,1,0 26,0,5,0 25,0,2,0 * * A

My questions are:

  1. Is my code correct? Otherwise, please make recommanded code for me
  2. What does basesN mean? (bases12-bases22)
  3. What does ref mean, while I didn't put reference genome (.fa) file to the code?

Please.. Give me any answer, otherwise my professor will kill me 😹.

@piechottam
Copy link
Contributor

  1. This depends on you - you could add filtering " -a D ". If you know your library type - you could add it to your cal "-P ...".
  2. basesN (you can find details in the manual). -> base12 -> condition 1 and replicate 2 (speaking in filenames: cond.1_rep.2.bam). The ","-vector corresponds to the observed base call count A,C,G,T
  3. ref is the reference base. The MD field from the BAM file (check SAM file format) is used to deduce the reference base.

@Chan-W00
Copy link
Author

Chan-W00 commented Jun 20, 2024

What a fast... thanks for reply.

So, according to your answer,
"chr1 14521 14522 call-2 0.0639599068288419 . 1,0,15,0 0,0,12,0 1,0,13,0 1,0,21,0 * * G" means that
in chromosome number 1 loci 14521-2, condition_1 rep_1 has 1,0,15,0 base conversion from G to A,C,G,T, respectively.

I got it and this is reason why the third position (G) is higher than others.

I really want to caculate total editing number per gene but have no idea about it, yet.
Is there any recommandation for me to do this?

@piechottam
Copy link
Contributor

Sry, for the late response.

You can use bedtools to add the gene name or some other ID (as new column based on coordinate overlap) to the JACUAS2 output.
In R you could then aggregate over the freshly added column,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants