Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a cache system #75

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Conversation

moyodiallo
Copy link
Contributor

@moyodiallo moyodiallo commented Dec 1, 2023

The response of a request is cached and it is invalidated every time one of the packages of the new request is involve between the new opam commit and the old one (cached opam commit).
Anytime the cached response is the same as the new response, we keep the old one for the oldest opam commit.

The response of a request is cached and it is invalidated every time one
of then packages of the new request is involve ind the differencies
between the new opam commit and the old one(cached opam commit).

Anytime the cached response is the same as the new response, we keep the
old one for the oldest opam commits which are used.
Invalidate the cache when the new commit is older in the commit history.
@moyodiallo moyodiallo marked this pull request as ready for review December 4, 2023 14:53
On addition to direct dependencies some transitive deps could also
change between 2 commits of opam repository.
@talex5
Copy link
Contributor

talex5 commented Dec 8, 2023

This is too large for me to review (and seems to include a load of unnecessary reformatting).

Could you say something about how it works and why it's correct?

I would expect it to work like this:

  • Every time the solver asks for the candidates of a package, remember the Git hash of the packages/$pkg directory. We already have this hash, as we use it for reading the directory.
  • When solving a request which is the same except for the opam-repository commit, quickly check if all of those directories have the same hashes.

I see all kinds of code here shelling out to git diff, git pull, git log, etc, which seems odd.

Note: Putting the cache on the worker means it might not work so well with a cluster, since it's less likely the worker handling a request also handled the previous one. An alternative would be to send all the hashes back to ocaml-ci and let it do the check in one place. However, that puts more load on the single CI instance, so it might not be a good trade-off.

@moyodiallo
Copy link
Contributor Author

Could you say something about how it works and why it's correct?

  • For the first time when a request is solved, the result is cached with 2 keys. The hash of the request and hash of request without the opam repository commits (The first (url,commit) list and the second (url,_) list.
  • When a request is sent the solver-service. It is hashed and found, if not it goes to the next step.
  • The request hash (first key) wasn't in the cache, the request is hashed without the opam-repository commits (the second key), to find if there's an old solved for that request. If yes, it try to invalidate the cache with the new opam-repository commits (when the commits of the request and cache differ). If the cache is invalidated, it solves again. If the result is the same as old one, the old one is kept to conserve the old opam-repository commits.
  • Invalidating a cache is when the packages obtained with a diff of the new and old commit of an opam-repository, are involve in the request, and the transitive dependencies of that request are also involved.

With that we only re-solve a request if its packages are involve in the change of the opam-repository commits:

  • All the requests are solved at their first time.
  • Any request that changes its opam-repository commits could be resolved.
  • Two requests with different opam-repository URLs are treated differently (considered as different).

@moyodiallo
Copy link
Contributor Author

Note: Putting the cache on the worker means it might not work so well with a cluster, since it's less likely the worker handling a request also handled the previous one. An alternative would be to send all the hashes back to ocaml-ci and let it do the check in one place. However, that puts more load on the single CI instance, so it might not be a good trade-off.

It could be interesting to have a node that distribute the requests among the solver-workers. Could we have that kind of config at the current structure (ocluster, ocluster-workers) ?

* Same solution with different commit (is_same_solution).
* Same commits have no changes.
@moyodiallo
Copy link
Contributor Author

solve-cache
solve-cache(set/get) only record the results coming from the CI, it doesn't solve.

I think this could solve the cases :

  • having the cache on the CI side which will have more load on the single instance
  • having also the cache in the solver-workers, that will ends up missing lot of hit.

@moyodiallo moyodiallo marked this pull request as draft December 12, 2023 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants