wishlist s3 cache #1748
Comments
While a nice idea, I fear this could be quite complicated in practice. In particular, managing a cache between multiple processes that may even be running on different hosts is likely to lead to a number of low-level difficulties. For the use case you mention, it may be better to set up a separate process to act as a caching proxy. Having only one process to manage the files makes things much easier. A quick web search brings up https://github.com/rhelmer/caching-s3-proxy although it seems not to have been touched for a while, and I don't know how well it would work on very large files.
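To make the single-process idea concrete, here is a minimal sketch, assuming only the Python standard library and a placeholder public bucket URL (this is not the caching-s3-proxy project linked above): a local HTTP endpoint that forwards ranged GETs to S3 and stores each byte range on disk. A real deployment would also need authentication/signed requests, cache eviction, locking, and proper Content-Range propagation.

```python
# Minimal sketch of a single-process caching proxy for ranged S3 GETs.
# UPSTREAM and CACHE_DIR are placeholders; there is no auth, eviction, or
# Content-Range propagation here, so treat this as an illustration only.
import hashlib
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://example-bucket.s3.amazonaws.com"  # placeholder public bucket
CACHE_DIR = "/tmp/s3-range-cache"


class CachingS3Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        rng = self.headers.get("Range", "")
        # Cache key covers both the object key and the requested byte range.
        digest = hashlib.sha256(f"{self.path}|{rng}".encode()).hexdigest()
        cache_path = os.path.join(CACHE_DIR, digest)

        if os.path.exists(cache_path):          # cache hit: serve the local copy
            with open(cache_path, "rb") as fh:
                body = fh.read()
        else:                                   # cache miss: fetch upstream, then store
            req = urllib.request.Request(UPSTREAM + self.path)
            if rng:
                req.add_header("Range", rng)
            with urllib.request.urlopen(req) as resp:
                body = resp.read()
            os.makedirs(CACHE_DIR, exist_ok=True)
            with open(cache_path, "wb") as fh:
                fh.write(body)

        self.send_response(206 if rng else 200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Point a client at http://127.0.0.1:8765/<object-key> instead of the bucket.
    HTTPServer(("127.0.0.1", 8765), CachingS3Handler).serve_forever()
```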
I'll give some thought to the MITM-style solution you've suggested.
You might also take a look at CloudWatch, which offers some caching of S3 requests.
I have to imagine you didn't intend CloudWatch; that seems very unrelated. However, https://github.com/nginxinc/nginx-s3-gateway is another possibility.
CloudFront offers the ability to cache requests to S3-sourced files. I'm mentioning it here as it may be another option for those using S3 for sharing BAM files. Hope this might be useful. Apologies if you're talking about something else. edit: I meant CloudFront, not CloudWatch. Sorry!
Can you point to a URL that gives a little more documentation on what you're thinking of? The CloudWatch that I know serves a very different purpose, as near as I understand.
htslib is awesome, and I feel embarrassed and greedy to wish for more, but I have a workflow where I see the potential for a big win, and I'm sure I'm not the only one who would benefit.
Consider https://requests-cache.readthedocs.io/en/stable/, a transparent drop-in for requests that does local caching, and imagine how many of us are pulling the same 0.01% of the same S3 file over and over again during development while never needing 99.99% of the whole BAM/CRAM.
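For concreteness, this is the requests-cache pattern being pointed to: a transparent, process-local cache installed behind the ordinary requests API (the URL below is just a placeholder).

```python
# requests-cache in its "transparent drop-in" mode: after install_cache(),
# repeated identical requests are answered from a local SQLite file.
import requests
import requests_cache

requests_cache.install_cache("dev_cache")   # creates dev_cache.sqlite in the working directory

url = "https://example.org/some/resource"   # placeholder URL
first = requests.get(url)                   # goes to the network
second = requests.get(url)                  # served from the local cache
print(getattr(second, "from_cache", False)) # True when the cached copy was used
```

The wish below is essentially this behaviour applied to htslib's ranged S3 reads.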
If there were an ENV var or other config to allow htslib-s3-plugin to write to a local cache, and to pull from that cache when available, it seems like that would pay off quite well for a presumably common use case.
Even if it isn't used in every code path, a partial win would still pay off handsomely.
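As a purely hypothetical illustration of that wished-for behaviour (none of these names, including the HTS_LOCAL_CACHE_DIR variable, exist in htslib today), a plugin-level cache would amount to a read-through lookup keyed on the object URL plus byte range, roughly:

```python
# Hypothetical read-through block cache keyed on (url, offset, length).
# HTS_LOCAL_CACHE_DIR is an invented name, not an htslib setting.
import hashlib
import os

CACHE_DIR = os.environ.get("HTS_LOCAL_CACHE_DIR", "/tmp/hts-local-cache")


def cached_range_read(url, offset, length, fetch):
    """Return bytes [offset, offset+length) of url, calling fetch() only on a miss."""
    digest = hashlib.sha256(f"{url}:{offset}:{length}".encode()).hexdigest()
    path = os.path.join(CACHE_DIR, digest)
    if os.path.exists(path):                 # hit: reuse the locally stored range
        with open(path, "rb") as fh:
            return fh.read()
    data = fetch(url, offset, length)        # miss: caller-supplied S3/HTTP fetch
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return data
```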
It seems sufficiently adjacent to #1670 that perhaps, during that work, some notes about the relevant code paths could be left behind.