YODA principles #113
WIP: more
First draft is done.
The lecturer hands out the requirements: The project needs to

- be prepared in the form of a DataLad dataset
- contain a data analysis performed with Python tools
Haven't read this in full yet, but this tells me that something in here is Python-specific, and that I can ignore it for other projects.
That's true, I get your point. I will try to reduce that impression if we keep this.
Can I get your take on the general idea? It was to put the YODA principles into the context of the narrative, which I thought would be easiest in the context of a data analysis. My initial idea was:
- introduce YODA principles
- introduce Python API
- combine them in a data analysis project (maybe with Raw data from online personality tests datalad/datasets.datalad.org#24)
- publish that somewhere
all wrapped up in a "midterm project" context in the educational narrative.
However, thinking about this now, it also feels like a lot for a single chapter (YODA, Python API, datalad publish). The alternative would be to have the individual parts in their own chapters or as parts of other chapters, and then combine/apply them in a single section.
I'm still undecided, so if anyone has preferences...
The section :ref:`run` already outlined the problem of associating
a result with an input and a script. It can be difficult to link a
figure from your data analysis project with an input data file or a
script, even if you created this figure yourself.
In that sense it is yet another linkage aspect (in addition to P2): provenance capture. Only executing through datalad run links the code and the inputs to the outputs. Uncertain if we want to attempt merging P2 and P3, or how to delineate them better. This is not directly relevant to your writing, but a possible conceptual weakness of the principles.
In a sense the P2 summary is also a good summary for P3....
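To make the linkage point concrete, here is a minimal sketch of what "linking the code and the inputs to the outputs" could amount to. It is not DataLad's implementation: `run_with_provenance` and the record layout are hypothetical, and checksums stand in for whatever identifiers a real tool would store.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256(path):
    """Return the hex SHA-256 digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_with_provenance(cmd, inputs, outputs, cwd):
    """Run cmd in cwd and return a record tying command, inputs, and outputs together.

    Hypothetical helper for illustration only; not how DataLad stores provenance.
    """
    cwd = Path(cwd)
    # Fingerprint the inputs before the command runs.
    record = {
        "cmd": list(cmd),
        "inputs": {p: sha256(cwd / p) for p in inputs},
    }
    # check=True raises if the command fails, so no record exists for a failed run.
    subprocess.run(cmd, cwd=cwd, check=True)
    # Fingerprint the outputs the command produced.
    record["outputs"] = {p: sha256(cwd / p) for p in outputs}
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "input.txt").write_text("hello")
        # The "analysis" here just upper-cases the input file.
        code = "open('out.txt', 'w').write(open('input.txt').read().upper())"
        rec = run_with_provenance(
            [sys.executable, "-c", code], ["input.txt"], ["out.txt"], tmp
        )
        print(sorted(rec))
```

The record only exists because the command was executed through the wrapper, which is the sense in which provenance capture is a separate linkage aspect from merely keeping inputs and code side by side.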
about containerized computational environments for reproducible data science,
check out `this section <https://the-turing-way.netlify.com/reproducible_environments/06/containers#Containers_section>`_
in the wonderful book `The Turing Way <https://the-turing-way.netlify.com/introduction/introduction>`_,
a comprehensive guide to reproducible data science.
eventually move into the future section
docs/intro/narrative.rst
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This handbook will teach you simple and yet advanced principles of data
management for reproducible, comprehensible, transparent, and FAIR data
Link FAIR to their page?
I like! Works nicely with the principles being outlined before being explained later.
I'm adding a placeholder for a section on the YODA principles and organizational best practices for data analyses in DataLad datasets. This could be woven into the narrative by using one of the datasets in here for a "midterm" data analysis project. Afterwards, in conjunction with #111, we could share these results somewhere that is not on the same file system.