Verify mirrors #83

Open
jrdnbradford opened this issue Oct 17, 2024 · 1 comment

jrdnbradford commented Oct 17, 2024

From discussion in #82.

We need an unexported function that tests whether a given URL really is a mirror and is working. Checking it against the mirrors list won't quite do the job, as many mirrors aren't listed there.

This would spare us workarounds like the one fixed in #84.

@jrdnbradford
Contributor Author

Since gutenbergr currently only supports http, https, and ftp mirrors, perhaps the check should be limited to these. We could do something like:

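is_gutenbergr_compatible_mirror <- function(url) {
  # Drop trailing slashes so the README path is built consistently.
  base_url <- sub("/+$", "", url)
  readme_url <- paste0(base_url, "/README")
  readme <- read_url(readme_url)
  # fixed = TRUE matches the file name literally; otherwise "." is a
  # regex wildcard and would also match strings such as "GUTINDEXXALL".
  contains_pg_string <- any(grepl("GUTINDEX.ALL", readme, fixed = TRUE))
  contains_pg_string
}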

If read_url can read a root README and that README references GUTINDEX.ALL, then the URL is most likely a running mirror that gutenbergr can handle. We can add this check in a few key places, such as when try_gutenberg_download fails, or when gutenberg_get_mirror runs and the gutenberg_mirror option isn't already set.
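
For discussion, here is a slightly more defensive sketch of the same idea. It assumes two things that aren't settled above: that an unreadable README should count as "not a mirror" rather than raise an error, and that the scheme restriction (http, https, ftp) is enforced inside the check itself. The `reader` argument is hypothetical; it stands in for the internal read_url and defaults to base R's readLines so the snippet runs on its own.

```r
# Sketch only, not the package's implementation.
# `reader` is a hypothetical stand-in for gutenbergr's internal read_url();
# how read_url() behaves on failure is an assumption here.
is_gutenbergr_compatible_mirror <- function(url, reader = readLines) {
  # Only accept the schemes mentioned above: http, https, and ftp.
  if (!grepl("^(https?|ftp)://", url, ignore.case = TRUE)) {
    return(FALSE)
  }

  base_url <- sub("/+$", "", url)
  readme_url <- paste0(base_url, "/README")

  # Treat any read error (timeout, 404, bad host) as "not a usable mirror".
  readme <- tryCatch(
    suppressWarnings(reader(readme_url)),
    error = function(e) character(0)
  )

  # Match the index file name literally rather than as a regex.
  any(grepl("GUTINDEX.ALL", readme, fixed = TRUE))
}

# Offline usage with a fake reader, so no network access is needed.
fake_reader <- function(u) c("Welcome", "See GUTINDEX.ALL for the index")
is_gutenbergr_compatible_mirror("https://example.org/gutenberg/", reader = fake_reader)
```

Making the reader injectable is mostly for testing: the check can then be exercised without hitting a real mirror.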

There are obviously several conditions we can check for besides/instead of this. Let me know what you think.
