`decode_bytes()` improper error handling #35
Comments
The
>>> tuple(decode_bytes([b'08 War \xaf No More Trouble.shn.mp3']))
('08 War \\xaf', ' No More Trouble.shn.mp3')
This behavior allows for relatively simple error handling in the presence of multiple decoding errors in an input chunk. Generally, there is no guarantee that
If one wants to receive individual decoded lines, one can decode first and itemize the decoded stream. The following examples illustrate the difference:
>>> tuple(decode_bytes(itemize([b'08 War \xaf No More Trouble.shn.mp3'], None)))
('08 War \\xaf', ' No More Trouble.shn.mp3')
>>> tuple(itemize(decode_bytes([b'08 War \xaf No More Trouble.shn.mp3']), None))
('08 War \\xaf No More Trouble.shn.mp3',)
WDYT about extending the documentation to make users aware of this effect?
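To make the ordering effect easy to try without datalad-next installed, here is a minimal sketch. `decode_chunks()` and `split_lines()` are hypothetical, simplified stand-ins written for this comment. They are not the library's `decode_bytes()` and `itemize()`, and they only assume the behavior shown in the examples above: a decoding error ends the current output chunk, and the remainder of the input chunk is yielded separately.

```python
from typing import Iterable, Iterator, Union


def decode_chunks(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Hypothetical stand-in for decode_bytes().

    On a decoding error, yield the text decoded so far plus an escaped form
    of the offending bytes, then continue with the rest of the input chunk
    as a separate output chunk.
    """
    for chunk in chunks:
        while chunk:
            try:
                yield chunk.decode(encoding)
                break
            except UnicodeDecodeError as e:
                # e.start and e.end delimit the undecodable bytes
                good = chunk[:e.start].decode(encoding)
                bad = chunk[e.start:e.end].decode(encoding, errors="backslashreplace")
                yield good + bad
                chunk = chunk[e.end:]


def split_lines(chunks: Iterable[Union[bytes, str]]) -> Iterator[Union[bytes, str]]:
    """Hypothetical stand-in for itemize(..., None), splitting on newlines only."""
    pending = None
    for chunk in chunks:
        sep = b"\n" if isinstance(chunk, bytes) else "\n"
        pending = chunk if pending is None else pending + chunk
        # everything before the last separator is complete; keep the rest
        *complete, pending = pending.split(sep)
        yield from complete
    if pending:
        yield pending


raw = [b"08 War \xaf No More Trouble.shn.mp3\nsecond.mp3\n"]

# Split first, decode second: the broken name ends up as two items
split_then_decode = tuple(decode_chunks(split_lines(raw)))

# Decode first, split second: one item per line, as intended
decode_then_split = tuple(split_lines(decode_chunks(raw)))

print(split_then_decode)
print(decode_then_split)
```

With two input lines, the first ordering produces three items while the second produces two, which is the reason for decoding before itemizing.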
`decode_bytes()` can yield multiple output chunks for one input chunk (in the case of decoding errors). This implies that it cannot be meaningfully used after `itemize()` without threatening the semantics of the items (i.e., with split-by-line items are no longer unique lines). For this reason, this commit changes the order of these helpers in the `gitworktree()` implementation. Refs: https://github.com/datalad/datalad-next/issues/740
Thanks! This makes sense and it should be documented, I agree. I addressed the problem for
This also revealed another issue datalad/datalad-next#742
Analog change to 5fe82ea for `iter_gitdiff()` Refs: https://github.com/datalad/datalad-next/issues/740
Analog change to 5fe82ea for `iter_gittree()` Refs: https://github.com/datalad/datalad-next/issues/740
I am moving this issue to
Fixed by #36
Reopening, because the docs are not yet adjusted
I have a line from `git ls-files` that comes out of `itemize()` like so:
`decode_bytes()` trips on the `\xaf` and yields
i.e. it swallows the end of the file name.
A test like the following is sufficient to reproduce the problem
Giving:
(the test target is probably not the real thing, but it shows the problematic behavior)