Hello, I'm reading the lib.rs code and found the `encode_with_unstable` API. It doesn't seem to be mentioned in the documentation.
But it takes up a large part of lib.rs, and the comments in the code don't explain why it exists or what it does.
Could you add some extra explanation?
This is a great question. I have some nice internal documentation explaining what problem this is solving, I'll see if I can make a version of it that doesn't include internal-only details.
Any update on this? I'm working on a PR for this repo and need to make sure I don't break `encode_with_unstable`. I think I get the main point: if you split text arbitrarily, at positions not necessarily aligned with the regex splits, the tokens at the boundary where the split occurs might end up different than if the whole string were tokenized at once. But it would help to get more backstory on the motivation for this and the use cases it serves.
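The boundary effect described above can be reproduced with a toy byte-pair encoder. This is a minimal, hypothetical sketch: the `bpe` and `common_prefix_len` functions and the two-rule merge table are invented for illustration, it skips the regex pre-splitting step entirely, and it is not tiktoken's implementation or its `encode_with_unstable` API. It only shows that the last token of an encoded prefix can change once more text is appended.

```rust
use std::collections::HashMap;

/// Toy byte-pair encoder (hypothetical, for illustration only): starting from
/// single characters, repeatedly merge the adjacent pair with the lowest rank
/// until no adjacent pair appears in the merge table.
fn bpe(text: &str, ranks: &HashMap<(String, String), usize>) -> Vec<String> {
    let mut parts: Vec<String> = text.chars().map(|c| c.to_string()).collect();
    loop {
        // Find the mergeable adjacent pair with the lowest rank
        // (ties go to the leftmost pair because of tuple ordering).
        let best = (0..parts.len().saturating_sub(1))
            .filter_map(|i| {
                ranks
                    .get(&(parts[i].clone(), parts[i + 1].clone()))
                    .map(|&rank| (rank, i))
            })
            .min();
        match best {
            Some((_, i)) => {
                let merged = format!("{}{}", parts[i], parts[i + 1]);
                parts[i] = merged;
                parts.remove(i + 1);
            }
            None => return parts,
        }
    }
}

/// Number of leading tokens that two encodings have in common.
fn common_prefix_len(a: &[String], b: &[String]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

fn main() {
    // Invented merge table: "b"+"c" has priority over "a"+"b".
    let mut ranks = HashMap::new();
    ranks.insert(("b".to_string(), "c".to_string()), 0);
    ranks.insert(("a".to_string(), "b".to_string()), 1);

    let prefix = bpe("ab", &ranks); // text cut at an arbitrary point
    let full = bpe("abc", &ranks); // the same text with its continuation
    let stable = common_prefix_len(&prefix, &full);

    println!("prefix = {:?}", prefix);
    println!("full   = {:?}", full);
    println!("prefix tokens that survive: {}", stable);
}
```

Running it prints `["ab"]` for the prefix and `["a", "bc"]` for the full string, so none of the prefix's tokens survive: appending `c` lets the higher-priority `b`+`c` merge win. An API that separates "stable" tokens from "unstable" trailing tokens would need to account for cases like this, though how tiktoken handles them is not explained in this thread.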
tiktoken/src/lib.rs, line 524 in 095924e