Crawling of stanford dataspace, and simple indexes #11
Conversation
Some failures are due to a bug somewhere in twisted or scrapy, leading to …
Codecov Report

@@            Coverage Diff             @@
##           master      #11       +/-   ##
===========================================
- Coverage   86.44%   70.71%   -15.73%
===========================================
  Files          51       51
  Lines        4130     4180      +50
===========================================
- Hits         3570     2956     -614
- Misses        560     1224     +664

Continue to review full report at Codecov.
* origin/master:
  TST: Mark test_simple1 as a known V6 failure
  TST: travis: Add V6 run
  TST: Drop stale known_failure_v6's
  RF: rename simple_with_stanford_lib.py to stanford_lib.py
  BF: crcns - use new datacite interface
  BF(workaround): adjust for absent pruning commits due to --incremental
  BF: need to use "incremental=True" now for aggregate_metadata
  BF: use legacy.openfmri.org
An elderly effort. IIRC it was working, but the datasets of interest were broken (broken tarballs, IIRC) anyway. With no immediate need it was abandoned, so let's let it RIP.
Changes from NF: crawl Stanford digital repository datalad#2241 to support crawling of the stanford dataspace (to close stanford digital repository datasets.datalad.org#16) were done separately in ENH: crawl stanford lib initial crawler #17.

The target file path could be determined (see the sketch below):
- from the target file url relative to the initial url (e.g. all the files are on the same website),
- from the path to the page (relative to the initial url?) which contains the link to the target file (e.g. when we have a website which points to external components or to some generic "keystore")
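A minimal sketch of those two resolution modes using Python's urllib.parse.urljoin; the URLs and variable names are hypothetical and this is not the actual datalad-crawler code:

```python
from urllib.parse import urljoin

# Hypothetical example URLs -- not taken from the actual crawler or datasets.
initial_url = "https://example.org/dataset/"             # url the crawl started from
page_url = "https://example.org/dataset/sub/page.html"   # index page that contains the link
href = "files/data.tar.gz"                               # relative link found on that page

# 1) resolve relative to the initial url: all files are assumed to live on the
#    same website, laid out relative to the starting point
print(urljoin(initial_url, href))   # https://example.org/dataset/files/data.tar.gz

# 2) resolve relative to the page that contains the link: useful when pages
#    point to external components or to some generic "keystore"
print(urljoin(page_url, href))      # https://example.org/dataset/sub/files/data.tar.gz
```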