Add --allow-missing for dvc commit
#10524
Comments
Makes sense, should be quite straightforward to add. I hope someone from the community can pick it up. |
I can give this a shot. @ermolaev94 I am new to the community and need some help here. I understand that you need a flag |
@anunayasri thanks! the scope for this is to add The idea is that I think the use case comes from a practical issue like this. Let's say we have a pipeline that has a lot of data (input), output (models), etc. It also depends on some python files (source code tracked by Git). Let's say we added a comment in that source file and we don't want to run the pipeline again (since it doesn't change the result but takes a lot of time to download the data and run it), but we want to update Let me know if that makes sense. |
Thanks for the detailed response, @shcheklein. This makes sense. I will try it out soon and get back to you. |
@shcheklein Can you guide me how to reproduce this issue of downloading data on I have tried the following -
|
I don't think this is a good issue for contributions, as there are a lot of open product questions. @ermolaev94, if you are not aware, We could not implement this for This PR implements virtual operation for The problem is that In that case, you do a lot of |
@skshetry just to add a bit more color (and @ermolaev94 can correct me):
I think this is about pipelines primarily, not
I think we actually advertise a few use cases, including the one where you quickly change a dependency and don't want to run the whole pipeline. Here is the description: https://dvc.org/doc/command-reference/commit#description
could you please clarify / do you remember where / when that discussion was happening? |
Yup, I understand that. If you read in #9440, the suggestion from Ruslan is to add support for updating pipelines through
Documentation is not quite right. Please read this comment: #9389 (comment). I had always thought of
I hope the discussion in #9389 and #9440 captures most of it. |
Hmm 🤔 I'm pretty sure (I can probably find the original implementation discussion) that Anyways, I think
I would go not from Git analogs but from practical DVC-specific scenarios if possible. E.g. pin down existing state to It's way easier for me to think in these terms tbh. And then if needed map to Git analogs. It might well be that the name is not perfect and we might well have discrepancy in semantics for |
Sorry, I don't think I read it very carefully. I guess the source code itself is a dependency. Currently, It looks like we already support Lines 45 to 54 in f56343d
Although virtual operation would help for large datasets here. |
But it does not, not by default. By default, So, what
Reading #919 (and, #1601), Ruslan is right here. |
Looks like this issue needs more discussion. I think I should stop working on it for now.
@skshetry I am looking to contribute to the repo. Could you please point me to beginner-friendly issues that I can work on? |
Hi, I think it's okay to implement To give you more information, internally, every command in dvc mirrors an API of the same name in Line 9 in f56343d
Line 45 in f56343d
You can take an example of Lines 104 to 109 in f56343d
|
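The "command mirrors an API of the same name" pattern described above can be illustrated with a small, self-contained sketch. This is not DVC's code: `ToyRepo`, `build_parser`, and `run` are hypothetical names, and the `--allow-missing` option on `commit` is the flag proposed in this issue, not something known to exist. The sketch only shows how a CLI option is typically threaded through to an API method that shares the command's name.

```python
import argparse


class ToyRepo:
    """Hypothetical stand-in for a repository object; not DVC's Repo class."""

    def commit(self, target=None, force=False, allow_missing=False):
        # The API method receives exactly the options the CLI parsed.
        return {"target": target, "force": force, "allow_missing": allow_missing}


def build_parser():
    parser = argparse.ArgumentParser(prog="toy")
    sub = parser.add_subparsers(dest="cmd", required=True)
    commit = sub.add_parser("commit")
    commit.add_argument("target", nargs="?")
    commit.add_argument("-f", "--force", action="store_true")
    # Proposed flag from this issue (an assumption, for illustration only).
    commit.add_argument("--allow-missing", action="store_true")
    return parser


def run(argv, repo=None):
    args = build_parser().parse_args(argv)
    repo = repo or ToyRepo()
    # The command is a thin wrapper: look up the API method with the same name.
    api = getattr(repo, args.cmd)
    return api(target=args.target, force=args.force, allow_missing=args.allow_missing)
```

With this layout, adding a flag means touching two places: the argument parser and the signature of the API method it forwards to.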
Sometimes it's necessary to update pipeline hashes and params without downloading large datasets. Currently I can't run dvc commit on a machine without the data, even if I only want to update my source code. It would be great to have a flag, similar to the one for dvc repro, that provides a way to ignore files that are not in the cache.
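One possible reading of the requested behavior, as a minimal sketch under stated assumptions: hashes are refreshed for files that are present locally, and the previously recorded hash is kept for files that are missing when the flag is set. The names `refresh_hashes` and `MissingDataError` are hypothetical, the use of MD5 over whole files is a simplification, and the semantics DVC should actually adopt were still under discussion in the thread.

```python
import hashlib
from pathlib import Path


class MissingDataError(Exception):
    """Raised when a tracked file is absent and missing data is not allowed."""


def file_md5(path):
    # Hash the whole file in one read; fine for a sketch, not for huge files.
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def refresh_hashes(recorded, allow_missing=False):
    """Return updated hashes for a mapping of path -> previously recorded hash."""
    updated = {}
    for path, old_hash in recorded.items():
        if Path(path).exists():
            # File is available locally: record its current hash.
            updated[path] = file_md5(path)
        elif allow_missing:
            # File is not available: keep the hash that was already recorded.
            updated[path] = old_hash
        else:
            raise MissingDataError(path)
    return updated
```

In this sketch a changed source file gets a new hash, while a large dataset that was never downloaded keeps its old entry instead of causing an error.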