Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation in ClojureScript extremely slow #356

Open
ku1ik opened this issue Apr 27, 2016 · 9 comments
Open

Validation in ClojureScript extremely slow #356

ku1ik opened this issue Apr 27, 2016 · 9 comments

Comments

@ku1ik
Copy link

ku1ik commented Apr 27, 2016

I have defined schema'd record (here https://github.com/asciinema/asciinema-player/blob/19c5c423e3ca78df1ae04dcebbf0d32437e303e1/src/asciinema/player/vt.cljc#L56), used s/defn to few functions and enabled schema validation in test via (use-fixtures :once schema.test/validate-schemas) (here https://github.com/asciinema/asciinema-player/blob/19c5c423e3ca78df1ae04dcebbf0d32437e303e1/test/asciinema/player/vt_test.cljc#L15).

I have noticed that schema validation is very slow when under ClojureScript. Without validation the tests under both Clojure and ClojureScript were finishing in few seconds. After enabling the validation I noticed that it's still couple of seconds for Clojure but 15+ minutes for ClojureScript (phantomjs). Right now I'm enabling test validation for Clojure only because of this, using reader conditional (as seen in the second link above).

It seems it's not test-specific problem, because I can observe the same behavior when using (s/set-fn-validation! true).

I tried to find a specific part of this VT schema that makes it so slow (changing different sub-schemas to s/Any etc) and it seems there's no single key/sub-schema that makes it slow, it just adds up from all sub-schemas.

I'd be happy to give more information, test, dig deeper. Any hints?

@w01fe
Copy link
Member

w01fe commented Apr 27, 2016

Thanks for the report. Is this possibly the same issue as #352 ? Are you observing failures, or is it slow despite everything succeeding?

I don't know much about ClojureScript profiling, but I'll try to take a look and see what I can find. If you can share a slow js file that I can profile in Chrome, that would be a big help. Thanks!

@ku1ik
Copy link
Author

ku1ik commented Apr 27, 2016

It's always succeeding so not sure if the underlying issue is related to the mentioned one by you.
I'll prepare slow .js file for Chrome, hold on.

@ku1ik
Copy link
Author

ku1ik commented Apr 27, 2016

I pushed compiled files with simple html file to this repo: https://github.com/sickill/gh-356
It has been compiled without advanced optimization (so it's easier to see function names in profiler), with (s/set-fn-validation! true).

@ku1ik
Copy link
Author

ku1ik commented Apr 27, 2016

I have added CPU profile dump to the repo, you can load it in Chrome profiler. Lot's of schema.spec.element_transformer and schema.spec.collection.CollectionSpec.schema$spec$core$CoreSpec$checker$arity$2 calls. Stack is very deep there, and it's hard for me to see where it actually ends.

@ku1ik
Copy link
Author

ku1ik commented Apr 27, 2016

If you switch to "Heavy (Bottom Up)" view in the profiler it's easier to spot. (btw, sorry for the noise ;-)

@w01fe
Copy link
Member

w01fe commented Apr 28, 2016

Thanks, that's very helpful!

Now that I'm looking at the code, it seems like you have a large state
object that is being validated many times (included in generative tests)?
Out of the box, Schema will validate the entire VT object every time you
call the feed-one function, including walking all of the buffers and such
(which, given the representation as a sequence of pairs, could be pretty
expensive). There is no caching for parts that have already been
validated, so the full validation is redone every time the function is
called (even if only a small part of the state has changed).

I'm not sure if this is the only thing going on or if something else is
unexpectedly slow (especially given the comparative results clj vs cljs),
but nothing pops out in the profile to me. Did you notice anything?

If there is a way to improve the performance of cljs checking, we'd
definitely like to do that. In the mean-time, as a workaround you could
declare feed-one to be ^:never-validate, so that you get schema checking
on the other functions but don't revalidate the entire state on every
step. You could also create a custom checker that caches results --
something like a map from schema to IdentityHashMap with the set of
known-good inputs. If you're interested in going down that road, I would
be happy to give some pointers.

On Thu, Apr 28, 2016 at 12:53 AM, Marcin Kulik [email protected]
wrote:

I pushed compiled files with simple html file to this repo:
https://github.com/sickill/gh-356
It has been compiled without advanced optimization (so it's easier to see
function names in profiler), with (s/set-fn-validation! true).


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#356 (comment)

@ku1ik
Copy link
Author

ku1ik commented Apr 28, 2016

Yes, it's validated many times, including generative tests. Generative tests are the major cause of 15+ minutes test run because of the high number of times they stress feed-one + schema validation.
However when I disabled generative test, left only the regular unit ones, it was also an order of magnitude slower (under CLJS) than without schema validation. So instead of 2 seconds they were running for 20 seconds.

I didn't observe anything I could stick to in the profile (mostly because I don't know schema's internals).
Re the size of the object I'm validating, yeah, it's not the smallest, so I suspected the size plays role here, however I didn't expect such a difference (between CLJS without validation and CLJS with validation, also between CLJS and CLJ with validation). Apparently this code runs orders of magnitude faster on CLJ, I just didn't notice it when it was all in a small scale.

Re adding ^:never-validate to feed-one: this is a place where I actually want to have the schema checked in tests, especially with generative tests which put lots of rubbish (random bytes) onto this virtual terminal, and I want to find about any violations of schema resulting from mishandling edge cases.

I appreciate you dug deeper, looked at my code ❤️. I think I'll stay with the reader conditional enabling schema checking in tests only under CLJ, which is fine as I'm running tests under both CLJ and CLJS on CI so potential bugs will be caught anyway.

@w01fe
Copy link
Member

w01fe commented Apr 29, 2016

Ah, thanks for the context -- that all makes sense, and sounds like a reasonable compromise. I haven't done much with performance in ClojureScript, but I don't know of any specific reason it would be slower -- maybe the JIT just isn't as good at optimizing the work as its set up as does as the JVM (the specific way checkers are created is aimed at high performance after aggressive inlining, FWIW).

So I guess I'll put this on the back burner for now, but I'll definitely keep it in the back of my mind next time I'm digging into things.

@ku1ik
Copy link
Author

ku1ik commented May 4, 2016

Ok, thanks! (feel free to close this or keep it open, whichever you prefer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants