Add a cache system #75

moyodiallo · 2023-12-01T18:13:45Z

The response of a request is cached and it is invalidated every time one of the packages of the new request is involve between the new opam commit and the old one (cached opam commit).
Anytime the cached response is the same as the new response, we keep the old one for the oldest opam commit.

The response of a request is cached and it is invalidated every time one of then packages of the new request is involve ind the differencies between the new opam commit and the old one(cached opam commit). Anytime the cached response is the same as the new response, we keep the old one for the oldest opam commits which are used.

Invalidate the cache when the new commit is older in the commit history.

On addition to direct dependencies some transitive deps could also change between 2 commits of opam repository.

talex5 · 2023-12-08T09:11:54Z

This is too large for me to review (and seems to include a load of unnecessary reformatting).

Could you say something about how it works and why it's correct?

I would expect it to work like this:

Every time the solver asks for the candidates of a package, remember the Git hash of the packages/$pkg directory. We already have this hash, as we use it for reading the directory.
When solving a request which is the same except for the opam-repository commit, quickly check if all of those directories have the same hashes.

I see all kinds of code here shelling out to git diff, git pull, git log, etc, which seems odd.

Note: Putting the cache on the worker means it might not work so well with a cluster, since it's less likely the worker handling a request also handled the previous one. An alternative would be to send all the hashes back to ocaml-ci and let it do the check in one place. However, that puts more load on the single CI instance, so it might not be a good trade-off.

moyodiallo · 2023-12-08T11:13:18Z

Could you say something about how it works and why it's correct?

For the first time when a request is solved, the result is cached with 2 keys. The hash of the request and hash of request without the opam repository commits (The first (url,commit) list and the second (url,_) list.
When a request is sent the solver-service. It is hashed and found, if not it goes to the next step.
The request hash (first key) wasn't in the cache, the request is hashed without the opam-repository commits (the second key), to find if there's an old solved for that request. If yes, it try to invalidate the cache with the new opam-repository commits (when the commits of the request and cache differ). If the cache is invalidated, it solves again. If the result is the same as old one, the old one is kept to conserve the old opam-repository commits.
Invalidating a cache is when the packages obtained with a diff of the new and old commit of an opam-repository, are involve in the request, and the transitive dependencies of that request are also involved.

With that we only re-solve a request if its packages are involve in the change of the opam-repository commits:

All the requests are solved at their first time.
Any request that changes its opam-repository commits could be resolved.
Two requests with different opam-repository URLs are treated differently (considered as different).

moyodiallo · 2023-12-08T11:16:16Z

Note: Putting the cache on the worker means it might not work so well with a cluster, since it's less likely the worker handling a request also handled the previous one. An alternative would be to send all the hashes back to ocaml-ci and let it do the check in one place. However, that puts more load on the single CI instance, so it might not be a good trade-off.

It could be interesting to have a node that distribute the requests among the solver-workers. Could we have that kind of config at the current structure (ocluster, ocluster-workers) ?

* Same solution with different commit (is_same_solution). * Same commits have no changes.

moyodiallo · 2023-12-11T10:51:37Z

solve-cache(set/get) only record the results coming from the CI, it doesn't solve.

I think this could solve the cases :

having the cache on the CI side which will have more load on the single instance
having also the cache in the solver-workers, that will ends up missing lot of hit.

moyodiallo force-pushed the add-cache-system branch from b109239 to d1120d4 Compare December 3, 2023 14:55

moyodiallo added 2 commits December 3, 2023 16:01

Refactor and fix the invalidation

10f8152

Invalidate the cache when the new commit is older in the commit history.

Refactoring

53e4c0e

moyodiallo marked this pull request as ready for review December 4, 2023 14:53

Update Dockerfile

ffe2df9

moyodiallo force-pushed the add-cache-system branch from 3f230ca to ffe2df9 Compare December 4, 2023 20:31

Invalidate the cache because of a transitive dep.

ef0beba

On addition to direct dependencies some transitive deps could also change between 2 commits of opam repository.

moyodiallo force-pushed the add-cache-system branch from 98ab919 to ef0beba Compare December 6, 2023 17:23

moyodiallo requested review from talex5, tmattio and tmcgilchrist December 7, 2023 12:21

Some corrections

e004ee9

* Same solution with different commit (is_same_solution). * Same commits have no changes.

moyodiallo marked this pull request as draft December 12, 2023 09:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a cache system #75

Add a cache system #75

moyodiallo commented Dec 1, 2023 •

edited

Loading

talex5 commented Dec 8, 2023

moyodiallo commented Dec 8, 2023

moyodiallo commented Dec 8, 2023

moyodiallo commented Dec 11, 2023

Add a cache system #75

Are you sure you want to change the base?

Add a cache system #75

Conversation

moyodiallo commented Dec 1, 2023 • edited Loading

talex5 commented Dec 8, 2023

moyodiallo commented Dec 8, 2023

moyodiallo commented Dec 8, 2023

moyodiallo commented Dec 11, 2023

moyodiallo commented Dec 1, 2023 •

edited

Loading