Using TrajGWAS for large-scale datasets: how to improve performance? #48

parekhpravesh · 2024-09-05T07:08:45Z

I would like to run TrajGWAS on some large-scale longitudinal phenotypes. Specifically, I have 100,000 observations, 48 covariates (plus an intercept), and 100 phenotypes. I would also like effect size estimates, which means running a Wald test.

As an example, I ran TrajGWAS for one phenotype. I start Julia with: julia --threads 64 and then do:

    trajgwas(@formula(y ~ 1 + X1 + X2 + ... + X48),
             @formula(y ~ 1),
             @formula(y ~ 1 + X1 + X2 + ... + X48),
             :id,
             path_to_csv_file,
             path_to_plink_file,
             pvalfile = p_output_name,
             nullfile = null_output_name,
             covrowinds = covrowmask,
             geneticrowinds = geneticrowmask,
             parallel = true,
             test = :wald)

I am running this as a Slurm job with --cpus-per-task=64 and --mem-per-cpu=7G. Julia version: 1.10.0

However, after about 22 hours, only about 700 SNPs have been written to the output file. This is quite slow, and I wonder whether there are any suggestions for making it more efficient. Perhaps I am not specifying parallelisation correctly?

parekhpravesh · 2024-09-18T19:40:37Z

Hello, just following up on this - I tried the same settings and after 3+ days of computation, only about 3000 SNPs were written to the output file. Do you have any tips/suggestions on how the performance can be improved?

kose-y · 2024-09-19T01:15:48Z

Oh, sorry for the late response. The Wald test, giving the effect sizes, is much slower than the score test, which does not give the effect sizes. Our suggestion is first to screen the SNPs with the score test and take a subset of SNPs with low p-values, then compute the effect sizes using the Wald test only for the selected SNPs.
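
The screening step described above could be sketched roughly as follows. This is only an illustration in plain Julia, not part of TrajGWAS: the helper `screen_snps`, the column name `betapval`, the tab delimiter, and the 5e-8 threshold are all assumptions to adapt to the actual score-test output file. It also assumes that rows in the p-value file appear in the same order as the SNPs in the genetic file.

```julia
# Minimal sketch: pick SNPs that pass a score-test screen, for a later Wald run.
# Assumptions (hypothetical, adjust to the real output): the score-test result
# is a delimited text file with a header row and a p-value column named
# "betapval"; rows are in the same order as the SNPs in the genetic file.

"""
Return the 1-based indices of data rows (header excluded) whose value in
column `pcol` is below `threshold`.
"""
function screen_snps(path::AbstractString; pcol::AbstractString = "betapval",
                     threshold::Float64 = 5e-8, delim::Char = '\t')
    keep = Int[]
    open(path) do io
        header = split(readline(io), delim)
        j = findfirst(==(pcol), header)
        j === nothing && error("column $pcol not found in $path")
        for (i, line) in enumerate(eachline(io))
            fields = split(line, delim)
            length(fields) >= j || continue      # skip short or empty rows
            p = tryparse(Float64, fields[j])
            # keep only rows with a numeric p-value below the threshold
            if p !== nothing && p < threshold
                push!(keep, i)
            end
        end
    end
    return keep
end

# Tiny made-up file to exercise the helper (not real results).
demo = tempname()
write(demo, "snpid\tbetapval\nrs1\t0.5\nrs2\t1e-9\nrs3\t3e-8\n")
hits = screen_snps(demo)   # indices of rs2 and rs3
```

The resulting indices could then be used to restrict the second, Wald-test run to the selected SNPs. I believe the relevant keyword is `snpinds`, but that is an assumption worth confirming against the TrajGWAS documentation for the installed version.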

parekhpravesh · 2024-09-23T13:20:54Z

Thank you - I tried running a score test and could finish the analysis in ~35 hours - could you confirm if I am specifying the parallelisation option correctly? Or is everything implemented for single threaded computation and it doesn't really matter?
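
Independently of how TrajGWAS uses threads internally (which this thread does not settle), one quick sanity check is to confirm that the Julia session inside the Slurm job actually sees the requested threads. The sketch below uses only Base Julia and the LinearAlgebra standard library; note that BLAS threads are configured separately from Julia threads.

```julia
using LinearAlgebra

# Julia threads: set by `julia --threads N` or the JULIA_NUM_THREADS variable.
julia_threads = Threads.nthreads()

# BLAS threads: controlled separately from Julia threads.
blas_threads = BLAS.get_num_threads()

# Logical CPU threads that Julia detects on the node.
cpu_threads = Sys.CPU_THREADS

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)
println("CPU threads:   ", cpu_threads)
```

If the first number is 1 despite `--threads 64`, the flag is not reaching the Julia process (for example because of how the job script launches it), and any multithreaded code path would have no extra threads to use.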
