You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When experimenting with encoding set as bitvectors as Nats, which seems like it should be rather efficient, I noticed that processing the file in VSCode was quick enough, but lake build would take a long time, and lean is busy writing the .c file.
I suspect that Nat.repr is just very slow on larger literals, given that it goes through List Char, rather than allocating a String of the right size and then (linearly) updating it digit by digit (or chopping the Nat into USize-sized limbs, using the C code to print it, and concatenating efficiently).
It also seems to scale quadratically with the length of the number, as
#time #guard_msgs(drop all) in #eval (Nat.repr (10^50000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^100000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^200000)).length
#time #guard_msgs(drop all) in #eval (Nat.repr (10^400000)).length
shows, which gives me (on live.lean-lang.org) timings
Base conversion is algotihmically quadratic. (Every digit requires doing what amounts to (n % 10, n / 10) and there are not many shortcuts.) I don't think the List Char thing (which is linear) is what you are witnessing. As long as you only do base conversion up to bounded length (by breaking it into limbs of some size), this should be much faster.
Yes, possibly, I'm just saying that these numbers don't demonstrate that. If large number literals are transitioned to use a from_array function instead of translating a string, the calls to Nat.repr would disappear except for small numbers, where I think they are not a bottleneck.
nomeata
changed the title
Nat.repr slow for very large literals, quadratic algorithm
Nat.repr slow for very large literals
Oct 22, 2024
When experimenting with encoding set as bitvectors as
Nat
s, which seems like it should be rather efficient, I noticed that processing the file in VSCode was quick enough, butlake build
would take a long time, and lean is busy writing the.c
file.I suspect that
Nat.repr
is just very slow on larger literals, given that it goes throughList Char
, rather than allocating aString
of the right size and then (linearly) updating it digit by digit (or chopping theNat
intoUSize
-sized limbs, using the C code to print it, and concatenating efficiently).It also seems to scale quadratically with the length of the number, as
shows, which gives me (on live.lean-lang.org) timings
Versions
"4.12.0-nightly-2024-10-18"
Impact
Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.
The text was updated successfully, but these errors were encountered: