Nat.repr slow for very large literals #5771

nomeata · 2024-10-19T11:58:01Z

When experimenting with encoding sets as bitvectors stored in Nats, which seems like it should be rather efficient, I noticed that processing the file in VSCode was quick enough, but lake build took a long time, with lean busy writing the .c file.
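For readers unfamiliar with this encoding, here is a minimal sketch of sets of naturals stored as the bits of a single Nat, where element i is present iff bit i is set. The function names (bitsetInsert, bitsetMem, bitsetUnion, bitsetInter) are hypothetical and are not taken from the issue or from any library.

```lean
-- Hypothetical sketch (not from the issue): a finite set of naturals encoded
-- as the bits of one Nat; element i is in the set iff bit i is 1.
def bitsetInsert (s i : Nat) : Nat := s ||| ((1 : Nat) <<< i)

def bitsetMem (s i : Nat) : Bool := Nat.testBit s i

def bitsetUnion (s t : Nat) : Nat := s ||| t

def bitsetInter (s t : Nat) : Nat := s &&& t

#eval bitsetInsert (bitsetInsert 0 3) 5               -- 40 (bits 3 and 5)
#eval bitsetMem (bitsetInsert (bitsetInsert 0 3) 5) 5 -- true
#eval bitsetMem (bitsetInsert (bitsetInsert 0 3) 5) 4 -- false
```

Because element i contributes the bit 2^i, a set whose largest element is around 400000 is already a Nat with roughly 120,000 decimal digits, so literals of this kind quickly become very large.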

I suspect that Nat.repr is simply very slow on large literals, since it goes through List Char instead of allocating a String of the right size and then filling it in digit by digit in linear time (or chopping the Nat into USize-sized limbs, printing each limb with the C code, and concatenating the results efficiently).
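A minimal sketch of the limb-chopping idea, written in plain Lean rather than C. The names limbBase, toLimbsAux, padLimb and reprByLimbs are hypothetical, and the limb size of 10^9 (nine decimal digits per limb, which fits in a 32-bit word) is an assumption, not something the issue specifies.

```lean
-- Hypothetical sketch (not the core `Nat.repr`): print a Nat in decimal by
-- splitting it into base-10^9 limbs and printing each limb separately.
def limbBase : Nat := 1000000000  -- 10^9, i.e. 9 decimal digits per limb

-- Collect limbs most-significant first; `acc` holds the lower limbs seen so far.
partial def toLimbsAux (n : Nat) (acc : List Nat) : List Nat :=
  if n < limbBase then n :: acc
  else toLimbsAux (n / limbBase) ((n % limbBase) :: acc)

-- Left-pad a limb's decimal digits with zeros to exactly 9 characters.
def padLimb (d : Nat) : String :=
  let s := Nat.repr d
  "".pushn '0' (9 - s.length) ++ s

def reprByLimbs (n : Nat) : String :=
  match toLimbsAux n [] with
  | [] => "0"  -- unreachable: `toLimbsAux` always returns at least one limb
  | top :: rest =>
    -- The leading limb is printed without padding; all others are padded.
    rest.foldl (fun acc d => acc ++ padLimb d) (Nat.repr top)

#eval reprByLimbs (10^20)                      -- "100000000000000000000"
#eval reprByLimbs (10^20) == Nat.repr (10^20)  -- true
```

Each step still divides the whole remaining number, so this only saves a constant factor (one big divmod per nine digits instead of one per digit); it does not change the asymptotic cost.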

It also seems to scale quadratically with the length of the number, as

#time #guard_msgs(drop all) in #eval (Nat.repr (10^50000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^100000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^200000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^400000)).length

shows; on live.lean-lang.org it gives me these timings:

time: 622ms
time: 2374ms
time: 9302ms
time: 37342ms
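Each row doubles the exponent, and therefore roughly doubles the number of digits. The ratios between consecutive timings are:

$$
\frac{2374}{622} \approx 3.82, \qquad \frac{9302}{2374} \approx 3.92, \qquad \frac{37342}{9302} \approx 4.01
$$

All three are close to 2^2 = 4, which is what quadratic scaling in the digit count predicts.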

Versions

"4.12.0-nightly-2024-10-18"

Impact

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.


digama0 · 2024-10-20T12:50:55Z

Base conversion is algorithmically quadratic. (Every digit requires doing what amounts to (n % 10, n / 10), and there are not many shortcuts.) I don't think the List Char overhead (which is linear) is what you are seeing. As long as you only do base conversion up to a bounded length (by breaking the number into limbs of some size), this should be much faster.
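A rough cost model for this argument, assuming that dividing a number with d decimal digits by a one-word divisor (10, or a limb base 10^k) takes time proportional to d with some constant c. The digit-at-a-time loop and a loop over k-digit limbs then cost roughly:

$$
T_{\text{digit}}(d) \approx \sum_{i=1}^{d} c\,(d-i+1) = \frac{c\,d(d+1)}{2} = \Theta(d^2),
\qquad
T_{\text{limb}}(d) \approx \sum_{j=1}^{d/k} c\,\bigl(d-(j-1)k\bigr) = \frac{c\,d^2}{2k} + \frac{c\,d}{2}.
$$

Under this model, chopping into k-digit limbs keeps the quadratic shape but shrinks the leading constant by a factor of about k.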

nomeata · 2024-10-20T14:21:02Z

Ok, but presumably there are still some large constant factors on the table here?

digama0 · 2024-10-20T14:52:10Z

Yes, possibly; I'm just saying that these numbers don't demonstrate that. If large number literals were switched to use a from_array function instead of being translated from a string, the calls to Nat.repr would disappear except for small numbers, where I don't think they are a bottleneck.
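To illustrate the from_array idea, here is a minimal sketch that rebuilds a Nat from an array of machine-word limbs, so that a large literal could be stored as a list of numbers rather than a decimal string. ofLimbs is a hypothetical name and 64-bit little-endian limbs are an assumption; this is not an existing Lean runtime API.

```lean
-- Hypothetical sketch: reassemble a Nat from little-endian 64-bit limbs
-- (limbs[0] is the least significant word). Not an existing core function.
def ofLimbs (limbs : Array UInt64) : Nat :=
  -- foldr visits the most significant limb first, shifting the
  -- accumulator up by 64 bits before adding the next lower limb.
  limbs.foldr (fun limb acc => acc * 2^64 + limb.toNat) 0

#eval ofLimbs #[5]      -- 5
#eval ofLimbs #[0, 1]   -- 18446744073709551616 (= 2^64)
```

No decimal conversion is involved. The fold is only for illustration; a runtime implementation would presumably copy the limbs directly into a big-number buffer instead of rebuilding the value one limb at a time.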

nomeata added the "bug" label (Something isn't working) Oct 19, 2024

nomeata mentioned this issue Oct 19, 2024

represent verified implication graph in Lean teorth/equational_theories#636 (Draft)

nomeata changed the title from "Nat.repr slow for very large literals, quadratic algorithm" to "Nat.repr slow for very large literals" Oct 22, 2024

nomeata mentioned this issue Oct 22, 2024

Optimizing terms for kernel reduction (meta issue) #5806 (Open, 11 tasks)

leanprover-bot added the "P-low" label (We are not planning to work on this issue) Oct 25, 2024
