Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing peek params for non-cgat-flow repositories #103

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

stydodt
Copy link

@stydodt stydodt commented Oct 24, 2019

peek_params was using pipeline directory when config was run, irrespective of value of working dir. This doesn't work if the pipeline is not a cgat-flow pipeline and pipeline_genesets is not in the same directory. Need to change operator priority of and and or.

Unfortunately this still doesn't fix the problem that it won't know where to look if workingdir is set to '?!', but its not in the cgat-flow directory.

…ctive of value of working dir. This doesn't work if the pipeline is not a cgat-flow pipeline and pipeline_genesets is not in the same directory. Need to change operator priority of and and or
@Acribbs
Copy link
Contributor

Acribbs commented Nov 1, 2019

Hi, sorry not sure if im understanding your problem. Are you able to elaborate in more detail what you are specifically trying to do? Do you have test case?

I am able to run pipelines outside the cgat-flow directory and don't have any issues.

@IanSudbery
Copy link
Contributor

Most pipelines have a peek_params at the start that loads parameters from the genesets pipeline.
To find out where to find the genesets pipeline is, it looks in the pipeline.yml.

When you do pipeline_mypipe.py config, most likely annotations_dir is set to ?!, so the pipeline will not be able to find an annotations dir to get the config from.

These lines in config.py deal with this possibility:

if ("config" in sys.argv or "check" in sys.argv or "clone" in sys.argv and workingdir == "?!"):
        workingdir = os.path.join(get_params()["pipelinedir"],
                                  "pipeline_" + pipeline)

The idea is that if config, check or clone is run, and workingdir is set to ?! (which is the parameter provided to peek_params(workingdir=PARAMS["annotation_dir"]), then the '?!' is overwritten with pipeline_genesets from the same directory as the current pipeline file.

There are two problems with this.

The first is that ("config" in sys.argv or "check" in sys.argv or "clone" in sys.argv and workingdir == "?!") is interpreted as

("config" in sys.argv) or 
("check" in sys.argv) or 
("clone" in sys.argv and workingdir == "?!")

when we want it interpreted as

("config" in sys.argv or "check" in sys.argv or "clone" in sys.argv) and 
(workingdir == "?!")

so in the old formulation, workingdir is only checked if the command is clone, not if it is config or check. This means that whenever you run config, workingdir is being overwritten with %(pipeinedir)s/pipeline_genesets whatever annotations_dir is set to in %(pipelinedir)s/pipeline.yml.

This is a problem, because if you are developing a pipeline in a directory that does not contain pipeline_genesets, it will cause the an error.

To reproduce this behaviour, create the following new pipeline in a non cgatflow location, say ~/devel/mypipe/pipeline_mypipe.py:

from cgatcore import pipeline as P
from ruffus import originate

PARAMS = P.get_parameters(
    [os.path.join(P.snip(__file__, ".py"), "pipeline.yml"),
    "pipeline.yml"] )

PARAMS.update(P.peek_parameters(
    PARAMS["annotations_dir"],
    "genesets",
    prefix="annotations_",
    update_interface=True,
    restrict_interface=True))

@originate("test")
def dummy(outfile):
    P.touch(outfile)

with a pipeline.yml in ~/mypipe/pipeline_mypipe/pipeline.yml that contains:

annotations:
    dir: PATH_TO_ANNOTATIONS

(with the actual path subbed in) - actually, it doesn't matter what you put here.

Now go to a new directory and run

$ python ~/devel/mypipe/pipeline_mypipe.py config

Changing to the new formulation only solves half the problem though. Under the suggested change config will actually load the annotations interface from the location specified in ~/devel/mypipe/pipeline_mypipe/pipeline.yml, but really, you'd like to not hard code a location there, but put ?!. But if you do this, config will still fall back on trying to load ~/devel/mypipe/pipeline_genesets/pipeline.yml.

@Acribbs
Copy link
Contributor

Acribbs commented Nov 8, 2019

Thanks @IanSudbery I get it now. I will attempt to run your example

@Acribbs
Copy link
Contributor

Acribbs commented Dec 9, 2019

Hi @IanSudbery , I managed to have a quick look at this. Could all of this not be handled in the PARAMS better by also parameterising the pipeline name? i.e.:

PARAMS.update(P.peek_parameters(
    PARAMS["annotations_dir"],
    "genesets",
    prefix="annotations_",
    update_interface=True,
    restrict_interface=True))

to

PARAMS.update(P.peek_parameters(
    PARAMS["annotations_dir"],
    PARAMS["annotation_pipelinename"],
    prefix="annotations_",
    update_interface=True,
    restrict_interface=True))

Then it wouldn't fallback to using pipeline_genesets?

@IanSudbery
Copy link
Contributor

I don't understand? How is this different? Presumably you would put "genesets" in your template pipeline.yml, and then it would evaluate to the same thing?

I wonder if the real solution is that if you are cloning, or contiging, or checking, you don't actually need to import the config values, so perhaps if clone, config or check is the command and workingdir="?!", then the function should just return without doing anything (perhaps with a warning)?

@Acribbs
Copy link
Contributor

Acribbs commented Dec 9, 2019

Maybe I'm reading the code wrong, but the having annotation_dir: ?! still tries to load ~/devel/mypipe/pipeline_genesets/ because the pipeline name is hard coded to genesets? Then you could add warnings that no pipeline name has been set and it returns nothing.

I agree that the solution should be return a warning (or at least a better warning that it returning that it can't find the working directory).

@IanSudbery
Copy link
Contributor

IanSudbery commented Dec 10, 2019 via email

@Acribbs
Copy link
Contributor

Acribbs commented Dec 10, 2019

To be honest I haven't looked at this in a lot of detail, iv had meningitis last week and still recovering, but I think the solution should work.

@IanSudbery
Copy link
Contributor

IanSudbery commented Dec 10, 2019 via email

@Acribbs
Copy link
Contributor

Acribbs commented Oct 26, 2024

Just wondering if this needed to be looked at again?

@IanSudbery
Copy link
Contributor

This PR still fixes the problem of the incorrect grouping of conditions. The main fact that peek_parameters will still throw an error if run with a non-cgatflow pipeline remains.

Its sat dormant because we've stopped really using peek_paramters as it just makes pipelines difficult to share as standalone entities. I suspect that your solution - return nothing, is still the correct one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants