Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to use Keyword arguments covrowinds and geneticrowinds when using a VCF file? #37

Open
zjy71 opened this issue Sep 2, 2023 · 8 comments

Comments

@zjy71
Copy link

zjy71 commented Sep 2, 2023

Hello,
I want to know how to use Keyword arguments covrowinds and geneticrowinds when using a VCF file? Only some of samples in my VCF file are with an interested phenotype.
Here is my script and errors.

vcfdir = "/mnt/OS5300/data/public_data/impute/"
chr = 1
fitted_null = "PLT.null.txt"
pvaldir = "/share/home/output/"
vcf_dir = "/share/home/bin/vcf_id.txt"

vcffile = vcfdir * "glimpse_merged.chr$(chr)"
pvalfile = pvaldir * "/P.chr$(chr).txt"
covfile = "/share/home/trajgwas_lg_coding.csv"

pheno_data = CSV.read(covfile, DataFrame)
vcf = CSV.read(vcf_dir, DataFrame, header = false)[!,1]

covrowmask, geneticrowmask = matchindices(@formula(H ~ 1 + age + BMI + day),
        @formula(y ~ 1),
        @formula(Hb ~ 1 + age + BMI + day),
        :id,
        pheno_data, vcf);

trajgwas(@formula(H ~ 1 + age + BMI + day),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI + day),
        :id,
        pheno_data,
        vcffile;
        geneticformat = "VCF",
        vcftype = :DS,
        pvalfile,
        nullfile = fitted_null,
        covrowinds = covrowmask,
        geneticrowinds = geneticrowmask)`

`ERROR: LoadError: AssertionError: Null model contains IDs not in the /mnt/OS5300/data/public_data/impute/glimpse_merged.chr1 file applying `geneticrowinds` mask.
Stacktrace:
 [1] trajgwas(fittednullmodel::WSVarLmmModel{Float64}, geneticfile::String; analysistype::String, geneticformat::String, vcftype::Symbol, samplepath::Nothing, testformula::FormulaTerm{Term, Term}, test::Symbol, init::Function, pvalfile::String, snpmodel::Val{1}, snpinds::Nothing, usespa::Bool, reportchisq::Bool, geneticrowinds::Vector{Int64}, solver::Ipopt.Optimizer, solver_config::Dict{String, Any}, parallel::Bool, runs::Int64, verbose::Bool, snpset::Nothing, e::Nothing, kwargs::Base.Pairs{Symbol, UnitRange{Int64}, Tuple{Symbol}, NamedTuple{(:covrowinds,), Tuple{UnitRange{Int64}}}})
   @ TrajGWAS ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:273
 [2] trajgwas(nullmeanformula::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, reformula::FormulaTerm{Term, ConstantTerm{Int64}}, nullwsvarformula::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, idvar::Symbol, nulldf::DataFrame, geneticfile::String; nullfile::String, solver::Ipopt.Optimizer, solver_config::Dict{String, Any}, parallel::Bool, runs::Int64, verbose::Bool, init::typeof(init_ls!), kwargs::Base.Pairs{Symbol, Any, NTuple{5, Symbol}, NamedTuple{(:geneticformat, :vcftype, :pvalfile, :covrowinds, :geneticrowinds), Tuple{String, Symbol, String, UnitRange{Int64}, Vector{Int64}}}})
   @ TrajGWAS ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:177
 [3] top-level scope
   @ ~/bin/model.jl:28
in expression starting at /share/home/bin/model.jl:28

Looking forward to your reply.

@kose-y
Copy link
Member

kose-y commented Sep 6, 2023

In general, the lefthand side of the ~ of the formulae should not differ.

@kose-y
Copy link
Member

kose-y commented Sep 6, 2023

Please let us know if it works after matching the lefthand sides. We will add a check to make sure that they are identical in the future if that resolves the problem.

@zjy71
Copy link
Author

zjy71 commented Sep 7, 2023

I am sorry for my minor mistake. But after I changed lefthand side of the ~ of the formula, I got these errors.

ERROR: LoadError: MethodError: no method matching trajgwas(::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, ::FormulaTerm{Term, ConstantTerm{Int64}}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, ::Symbol, ::DataFrame, ::String, ::String; geneticformat="VCF", vcftype=:DS, nullfile="/share/home/trajgwas/PLT.null.txt", covrowinds=Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], geneticrowinds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 20084, 20085, 20086, 20087, 20088, 20089, 20090, 20091, 20092, 20093])
Closest candidates are:
trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame, ::Union{Nothing, AbstractString}; nullfile, solver, solver_config, parallel, runs, verbose, init, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149
trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, !Matched::AbstractString, ::Union{Nothing, AbstractString}; covtype, covrowinds, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:128
trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149 got unsupported keyword arguments "geneticformat", "vcftype", "nullfile", "covrowinds", "geneticrowinds"
...
Stacktrace:
[1] top-level scope
@ ~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37
in expression starting at /share/home/bin/model.jl:37

@kose-y
Copy link
Member

kose-y commented Sep 7, 2023

Could you share what's in

~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37
in expression starting at /share/home/bin/model.jl:37

?

You seem to have an extra String argument to the trajgwas function before the semicolon.

@zjy71
Copy link
Author

zjy71 commented Sep 7, 2023

line 37 is
trajgwas(@formula(H ~ 1 + age + BMI + day), @formula(Hb ~ 1), @formula(Hb ~ 1 + age + BMI + day), :id, pheno_data, vcffile; geneticformat = "VCF", vcftype = :DS, pvalfile, nullfile = fitted_null, covrowinds = covrowmask, geneticrowinds = geneticrowmask)
any argument is extra?

@kose-y
Copy link
Member

kose-y commented Sep 7, 2023

Is what you gave still the content there? The lefthand sides of the formulae still do not match yet, and

ERROR: LoadError: MethodError: no method matching
trajgwas(::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, ::FormulaTerm{Term, ConstantTerm{Int64}}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, ::Symbol, ::DataFrame, ::String, ::String; geneticformat="VCF", vcftype=:DS, nullfile="/share/home/trajgwas/PLT.null.txt", covrowinds=Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], geneticrowinds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 20084, 20085, 20086, 20087, 20088, 20089, 20090, 20091, 20092, 20093])

indicates that the erroring trajgwas function call has seven arguments before the semicolon, but the code you gave has six.

Please check the current content of Line 37 of file ~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl and Line 37 of file /share/home/bin/model.jl.

@zjy71
Copy link
Author

zjy71 commented Sep 8, 2023

Thank you for your prompt and patient response. I checked the script and reran it. But I still got some errors.
Here are my script and errors.

using ArgParse
using DataFrames, CSV
using Statistics
using TrajGWAS
using WiSER
using LinearAlgebra
BLAS.set_num_threads(1)

vcfdir = "/mnt/lsy/OS5300/data/public_data/impute/"
chr = 21
fitted_null = "/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/input/PLT.null.txt"
pvaldir = "/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/output/score"
vcf_index_dir = "/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/bin/vcf_id.txt"

vcffile = vcfdir * "longgang_NIPT_glimpse_50948.merged.chr$(chr)"
pvalfile = pvaldir * "/PLT.chr$(chr).txt"
covfile = "/share/home/lsy_guoxinxin/project/anemia/trajGWAS/longgang_input/trajgwas_lg_coding.csv"

pheno_data = CSV.read(covfile, DataFrame, types=[String31, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64])
vcf_index = CSV.read(vcf_index_dir, DataFrame, header = false)[!,1]

covrowmask, geneticrowmask = matchindices(@formula(Hb ~ 1 + age + BMI),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI),
        :nipt_id,
        pheno_data, vcf_index);

trajgwas(@formula(Hb ~ 1 + age + BMI),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI),
        :nipt_id,
        pheno_data,
        vcffile,
        geneticformat = "VCF",
        vcftype = :DS,
        pvalfile,
        nullfile = fitted_null,
        covrowinds = covrowmask,
        geneticrowinds = geneticrowmask)
ERROR: LoadError: MethodError: no method matching trajgwas(::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term, Term}}, ::FormulaTerm{Term, ConstantTerm{Int64}}, ::FormulaTerm{Term, Tuple{ConstantTerm{Int64}, Term, Term, Term}}, ::Symbol, ::DataFrame, ::String, ::String; geneticformat="VCF", vcftype=:DS, nullfile="/share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/input/PLT.null.txt", covrowinds=Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], geneticrowinds=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  20084, 20085, 20086, 20087, 20088, 20089, 20090, 20091, 20092, 20093])
Closest candidates are:
  trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame, ::Union{Nothing, AbstractString}; nullfile, solver, solver_config, parallel, runs, verbose, init, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149
  trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, !Matched::AbstractString, ::Union{Nothing, AbstractString}; covtype, covrowinds, kwargs...) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:128
  trajgwas(::FormulaTerm, ::FormulaTerm, ::FormulaTerm, ::Union{String, Symbol}, ::DataFrame) at ~/.julia/packages/TrajGWAS/uUosN/src/gwas.jl:149 got unsupported keyword arguments "geneticformat", "vcftype", "nullfile", "covrowinds", "geneticrowinds"
  ...
Stacktrace:
 [1] top-level scope
   @ ~/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37
in expression starting at /share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/bin/1.null_model.jl:37

Line 37 of /share/home/lsy_yangzijing/trajgwas_xx/longgang_20230902/bin/1.null_model.jl is

trajgwas(@formula(Hb ~ 1 + age + BMI),
        @formula(Hb ~ 1),
        @formula(Hb ~ 1 + age + BMI),
        :nipt_id,
        pheno_data,
        vcffile,
        geneticformat = "VCF",
        vcftype = :DS,
        pvalfile,
        nullfile = fitted_null,
        covrowinds = covrowmask,
        geneticrowinds = geneticrowmask)

I'm not very familiar with the Julia language. Any wrong argument or did I miss some arguments when I using trajgwas()? I have a VCF file with more sample than my pheno file (I am sure all sample in pheno file is in VCF file). So I try to using covrowinds and geneticrowinds to deal with it. But I never succeed. When I kept the sample in the VCF file and the pheno file the same, everything went well.

@kose-y
Copy link
Member

kose-y commented Sep 8, 2023

With the current code, you are not reaching where you had gotten the error before. It seems like you have accidentally changed a semicolon to a comma. While it is not always mandatory in Julia, a semicolon is necessary before the keyword arguments in trajgwas(), as it uses variable-length arguments internally. You need to change the two lines with vcffile, to vcffile;.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants