This repository has been archived by the owner on Oct 23, 2019. It is now read-only.

Pipeline workers #86

Open
markfinger opened this issue Mar 30, 2016 · 9 comments

Comments

@markfinger
Owner

The original plan for workers turned out to be flawed as the transport cost quickly reduced any wins that parallelisation introduced. I rambled a bit in https://github.com/markfinger/unfort/blob/327161abaf47b59152c3634dcf258afb17e92a8b/README.md#worker-processes

I still think there's some merit, and it's worth investigating. It may even bring some structural wins by forcing a cleaner separation between the pipeline and the high-level wiring.

@markfinger changed the title from "Workers" to "Pipeline workers" on Mar 30, 2016
@markfinger
Owner Author

Unfort handles files as individual units, so moving the pipeline into child processes shouldn't be too difficult, at least in concept.

The current process involves two phases: graph population and code generation. Consolidating them would simplify the wiring and would let us depend purely on the graph's completed signal to denote that everything's done.
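
A rough sketch of what relying on a single completed signal could look like once the two phases are merged. `GraphTracker`, `trace` and `complete` are hypothetical names for illustration, not unfort's actual graph API; the idea is just that a file only counts as done once its code has been generated, and its dependencies are registered before the completion check runs.

```javascript
const { EventEmitter } = require('events');

// Hypothetical tracker, not unfort's real graph implementation.
class GraphTracker extends EventEmitter {
  constructor() {
    super();
    this.pending = new Set(); // files still moving through the pipeline
    this.done = new Set();    // files that have been fully processed
  }

  // Register a file that has been discovered but not yet processed.
  trace(file) {
    if (this.done.has(file) || this.pending.has(file)) return;
    this.pending.add(file);
  }

  // Mark a file as fully processed. Dependencies it revealed are registered
  // first, so the signal cannot fire while work is still outstanding.
  complete(file, dependencies = []) {
    for (const dep of dependencies) this.trace(dep);
    this.pending.delete(file);
    this.done.add(file);
    if (this.pending.size === 0) this.emit('completed', [...this.done]);
  }
}
```

With something like this, the high-level wiring would only need to listen for `completed`, regardless of whether the work ran in the foreground or in a worker.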

@markfinger
Owner Author

One downside to moving the pipeline into workers is that it would probably require any overrides to be made in separate files so that the workers can import them without triggering side-effects.

@markfinger
Owner Author

We should create a reasonably large test case with 10,000+ files in random directories. The code should be randomly generated, with the files importing one another so that the graph picks up everything.

At that scale, profiling should start to highlight low-hanging fruit.
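
A minimal sketch of such a generator, under a few assumptions: files are CommonJS modules, the first file is the entry point, and each file requires the next one so that everything is reachable, with some random extra imports (cycles included) on top. `createRandom` and `generateProject` are made-up names; the project is built in memory as a `Map` of path to source so that it stays deterministic for a given seed.

```javascript
const path = require('path');

// Small deterministic PRNG (mulberry32-style) so that a given seed always
// produces the same project.
function createRandom(seed) {
  let state = seed >>> 0;
  return function random() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Returns a Map of relative file path -> generated source.
function generateProject(fileCount, seed = 1) {
  const random = createRandom(seed);
  const pick = (n) => Math.floor(random() * n);

  // Place every file in a randomly chosen, possibly nested, directory.
  const paths = [];
  for (let i = 0; i < fileCount; i++) {
    const depth = pick(4);
    const dirs = [];
    for (let d = 0; d < depth; d++) dirs.push(`dir_${pick(10)}`);
    paths.push(path.posix.join(...dirs, `module_${i}.js`));
  }

  const files = new Map();
  paths.forEach((file, i) => {
    const deps = new Set();
    // Chain each file to the next so the whole project is reachable
    // from the first file.
    if (i + 1 < fileCount) deps.add(i + 1);
    // Add up to three random imports on top.
    const extra = pick(4);
    for (let n = 0; n < extra; n++) {
      const target = pick(fileCount);
      if (target !== i) deps.add(target);
    }
    const lines = [...deps].map((target) => {
      let rel = path.posix.relative(path.posix.dirname(file), paths[target]);
      if (!rel.startsWith('.')) rel = `./${rel}`;
      return `require(${JSON.stringify(rel)});`;
    });
    lines.push(`module.exports = ${i};`);
    files.set(file, lines.join('\n') + '\n');
  });
  return files;
}
```

Writing the map out to a temporary directory and pointing a build at `module_0.js` would then give a repeatable input for profiling.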

@markfinger
Owner Author

Key benefits that I'm expecting from pipeline workers would be:

  • increasing the IO thread count so that jobs touching the FS complete faster
  • off-loading CPU-intensive code that's blocking the event loop

@markfinger
Owner Author

Interactions with the persistent cache will get a bit fiddlier though. I wonder if it's worthwhile running the cache in the foreground and only using worker processes for computation.

Might be a good opportunity to investigate cache perf: #87

@spalger
Contributor

spalger commented Mar 30, 2016

From your last comment it sounds like you're thinking like I am: workers would be responsible for populating the cache, the foreground would only read from cache, and the foreground would tell the worker processes which caches to populate?

@markfinger
Owner Author

In an optimal case, yeah. There are trade-offs though.

Currently, caching is implemented purely at a low level within the pipeline itself. That approach has been quite useful, as it allows the flexibility to handle edge-cases.

One example is caching path resolution for dependencies. There are some edge-cases that are hard to detect (for example, does ./foo map to ./foo.js or ./foo/index.js?). In these cases the current implementation never caches those specific paths, so as to avoid the ambiguity.
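
To illustrate the ambiguity, here is a simplified sketch of one way to read that rule. It is not the actual unfort code: `candidatesFor`, `resolveDependency` and the `exists` predicate are hypothetical, and the lookup ignores package.json `main` fields and extensions other than `.js`. A resolution is only flagged as cacheable when the specifier names the file explicitly, since otherwise adding `./foo.js` later would silently change what `./foo` should resolve to.

```javascript
const path = require('path');

// Candidate files for a relative specifier, in the order they are tried.
function candidatesFor(fromDir, specifier) {
  const base = path.posix.join(fromDir, specifier);
  // An explicit extension leaves exactly one candidate.
  if (path.posix.extname(base)) return [base];
  return [`${base}.js`, path.posix.join(base, 'index.js')];
}

// `exists` is a hypothetical predicate; a real implementation would check
// the file system instead.
function resolveDependency(fromDir, specifier, exists) {
  const candidates = candidatesFor(fromDir, specifier);
  const resolved = candidates.find((file) => exists(file)) || null;
  // Only unambiguous specifiers are considered safe to cache.
  const cacheable = resolved !== null && candidates.length === 1;
  return { resolved, cacheable };
}

// Usage with an in-memory stand-in for the file system.
const onDisk = new Set(['src/foo/index.js', 'src/bar.js']);
const exists = (file) => onDisk.has(file);
const ambiguous = resolveDependency('src', './foo', exists);
const explicit = resolveDependency('src', './bar.js', exists);
```

Here `ambiguous.cacheable` is false even though the lookup succeeded, while `explicit.cacheable` is true.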

Another example: in some projects I'm selectively turning off persistent caching for particular assets that are generated by LESS or SASS compilers. Mostly as it's a pain to wire up the invalidation.

In effect, I suppose it's a balancing act of trying to preserve low-level control so that experimentation is possible, while also trying to maintain a sane and robust codebase.

Hmm, it's probably worth doing some research and musing over this for a little bit before starting to prototype.

@markfinger
Owner Author

@spalger sorry missed a lot of detail in your suggestion, got side-tracked and rambled a bit 😛

But, yeah, that'd be a good approach, not sure what the implementation would look like though.

@markfinger
Owner Author

Assuming that the pipeline were converted to running in workers, it might be worthwhile to move dependency resolution out of the pipeline. This would enable better interactions with any FS caching/buffering - see #88

In effect, this would ensure that the pipeline is used purely for transformation and AST introspection.
