Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CollectionSpec checker #436

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

nilern
Copy link

@nilern nilern commented Feb 5, 2022

  • Replace a doseq with reduce on JVM
  • Accumulate into a JS array on cljs instead of a persistent vector in an atom

The latter could improve #356 substantially; in the Malli seqex impl I had a similar situation with a persistent set in an atom on cljs and it was just unacceptably slow.

@@ -23,19 +23,18 @@
(let [_ (macros/assert! (= 2 (count e)) "remaining can have only one schema.")
c (spec/sub-checker (second e) params)]
#?(:clj (fn [^java.util.List res x]
(doseq [i x]
(.add res (c i)))
(reduce (fn [res i] (doto res (.add (c i)))) res x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this expected to be faster? The List can't be used safely concurrently so I would be surprised if you could do better than the doseq. Have you profiled and seen a speedup?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce (and run!) is faster and more memory efficient because as an internal iterator it can be fully specialized to the collection and usually does not need to allocate at all. I have profiled it sometime but by now it is just a habit like using (into [] (map ...) x) instead of (into [] (map ... x)). Since reduce works on all seqs really the only reason to ever use doseq is if :when/:while/:let is super useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The weird thing about reduce here is that it's threading through an extra argument that you don't care about because you're mutating the argument anyway. I think it makes the code slightly less clear and I would be a bit surprised if there was a performance benefit, but I'd be happy to believe it if there were accompanying measurements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The weird thing about reduce here is that it's threading through an extra argument that you don't care about because you're mutating the argument anyway.

Might be worth comparing to (reduce #(.add res (c %2))) nil x).

Copy link
Contributor

@frenchy64 frenchy64 Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it would probably beat the current code handily since I don't see a type hint on the inner res.

EDIT: I've added a reflection check to the master build.
EDIT2: Of course I missed the doto propagating the type hint, nevermind.

(then res nil))
:cljs (fn [res x]
(swap! res into (map c x))
(reduce (fn [res i] (doto res (.push (c i)))) res x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that something mutable could be faster here, but I'm again unsure about the use of reduce. Can you share profiling results?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

into uses reduce too! Here it is more important that the data structure is mutable. I'll do some profiling later.

@@ -58,7 +57,7 @@

:cljs
(defn- has-error? [l]
(some utils/error? l)))
(.some l utils/error?)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with cljs, can you explain the change here please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that l is a JS array it just uses Array#some.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Is that expected to be appreciably faster?

@w01fe
Copy link
Member

w01fe commented Feb 5, 2022

Thanks for the PR! Certainly happy to accept optimizations but always prefer if they come with measurements. And would like to learn more about the choice of reduce over doseq.

@nilern
Copy link
Author

nilern commented Feb 5, 2022

I want to do some benchmarks but for now I am out of time.

@w01fe
Copy link
Member

w01fe commented Feb 6, 2022

Thanks! I just want to make sure we are striking the right balance between performance and clarity, i.e. only making optimizations that make a measurable difference. My intuition is admittedly stale here, so I'm happy to take these with measurements, or happy if another maintainer wants to take them without.

@nilern
Copy link
Author

nilern commented Feb 12, 2022

I don't think this adds much complexity; it is just using JS arrays just like it was already using ArrayList on the JVM.

While I am fairly confident it will also be faster it would be silly to optimize without benchmarking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants