Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nat.repr slow for very large literals #5771

Open
nomeata opened this issue Oct 19, 2024 · 3 comments
Open

Nat.repr slow for very large literals #5771

nomeata opened this issue Oct 19, 2024 · 3 comments
Labels
bug Something isn't working P-low We are not planning to work on this issue

Comments

@nomeata
Copy link
Contributor

nomeata commented Oct 19, 2024

When experimenting with encoding set as bitvectors as Nats, which seems like it should be rather efficient, I noticed that processing the file in VSCode was quick enough, but lake build would take a long time, and lean is busy writing the .c file.

I suspect that Nat.repr is just very slow on larger literals, given that it goes through List Char, rather than allocating a String of the right size and then (linearly) updating it digit by digit (or chopping the Nat into USize-sized limbs, using the C code to print it, and concatenating efficiently).

It also seems to scale quadratically with the length of the number, as

#time #guard_msgs(drop all) in #eval (Nat.repr (10^50000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^100000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^200000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^400000)).length

shows, which gives me (on live.lean-lang.org) timings

time: 622ms
time: 2374ms
time: 9302ms
time: 37342ms

Versions

"4.12.0-nightly-2024-10-18"

Impact

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

@digama0
Copy link
Collaborator

digama0 commented Oct 20, 2024

Base conversion is algotihmically quadratic. (Every digit requires doing what amounts to (n % 10, n / 10) and there are not many shortcuts.) I don't think the List Char thing (which is linear) is what you are witnessing. As long as you only do base conversion up to bounded length (by breaking it into limbs of some size), this should be much faster.

@nomeata
Copy link
Contributor Author

nomeata commented Oct 20, 2024

Ok, but presumably there are still some large constant factors on the table here?

@digama0
Copy link
Collaborator

digama0 commented Oct 20, 2024

Yes, possibly, I'm just saying that these numbers don't demonstrate that. If large number literals are transitioned to use a from_array function instead of translating a string, the calls to Nat.repr would disappear except for small numbers, where I think they are not a bottleneck.

@nomeata nomeata changed the title Nat.repr slow for very large literals, quadratic algorithm Nat.repr slow for very large literals Oct 22, 2024
@leanprover-bot leanprover-bot added the P-low We are not planning to work on this issue label Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P-low We are not planning to work on this issue
Projects
None yet
Development

No branches or pull requests

3 participants