Approval for array-based data structure and multi-tiered digests for sequence collections #10
Comments
@jmarshall and @jb-adams, I would be interested to hear your thoughts.
I see this as a way to incorporate the two paradigms for sequence identifiers that we've been struggling to navigate:
The result is a vendor-specific "meta-checksum" that contains within it the vendor-neutral checksum component. I think this is good, and a good way to get adopters to implement it, but it does raise a few questions:
My questions are mainly based on assessment of equivalency, and making sure we don't have a false-negative problem (2 collections are the same at the sequence level but appear different) when researchers share results. But if this is not a concern, then everything looks good to me.
I guess the following comment is relevant here, depending on whether the question of ordering is included in the definition of the "digest algorithm": #7 (comment)
This was approved in the ADR with PR #14
I would like to solicit feedback from community members on the latest iteration of the digest algorithm for sequence collection identifiers. After lots of discussion (see #8, #1), here is the latest proposal.
It's an array of arrays, and here I'm showing just 3 arrays, but this approach works for any number of arrays, and is backwards compatible with sequence collections that lack certain array definitions.
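As a rough illustration of the array-based, multi-tiered idea, here is a minimal Python sketch. The array names (`names`, `lengths`, `sequences`), the `digest` helper, the use of SHA-256, and the JSON serialization are all assumptions made for this example, not the algorithm under discussion. It only shows the shape of the approach: each array gets its own digest, and the collection digest is computed over those array digests, so a collection that lacks a given array can still be digested in the same way.

```python
import hashlib
import json


def digest(obj):
    """Digest any JSON-serializable object.

    Hypothetical helper: SHA-256 over compact JSON is an assumption,
    not the digest function proposed in this issue.
    """
    canonical = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def collection_digests(collection):
    """Return (top_level_digest, per_array_digests) for a dict of arrays."""
    # Tier 1: one digest per array (element order within an array is preserved).
    array_digests = {name: digest(values) for name, values in collection.items()}
    # Tier 0: a single digest over the mapping of array name -> array digest.
    return digest(array_digests), array_digests


# Hypothetical collection with three arrays; any number of arrays works.
collection = {
    "names": ["chr1", "chr2"],
    "lengths": [8, 4],
    "sequences": [digest("ACGTACGT"), digest("TTGA")],
}
top, per_array = collection_digests(collection)

# A collection without the "names" array still yields the same digest
# for the arrays it does share, but a different top-level digest.
subset = {k: v for k, v in collection.items() if k != "names"}
top_subset, per_array_subset = collection_digests(subset)
```

Because each array is digested separately, two collections can be compared array by array (for example, same `sequences` but different `names`) rather than only by the top-level digest.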
The retrieval works like this
A simple server could allow only `recursion=1`, but we agreed that `recursion=0` is very useful and should probably be a required part of the specification, while `recursion=2` should probably be disabled. Given that the `=0` and `=1` layers are possible, this also enables retrieval of components, which is independently valuable:
Regardless of what elements end up in a sequence collection, we're in a position to approve this as the digest algorithm. Feedback welcome!
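The recursion levels discussed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the agreed API: the `Store` class, its method names, the SHA-256/JSON `digest` helper, and the reading that `recursion=0` returns the per-array digests while `recursion=1` expands them into the arrays themselves are all hypothetical. `recursion=2` is rejected here to mirror the suggestion that it be disabled.

```python
import hashlib
import json


def digest(obj):
    """Hypothetical helper: SHA-256 over compact JSON (an assumption)."""
    canonical = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Store:
    """In-memory, content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}

    def put(self, obj):
        key = digest(obj)
        self.objects[key] = obj
        return key

    def add_collection(self, collection):
        # Store each array under its own digest, then store the mapping
        # of array name -> array digest under the top-level digest.
        layer0 = {name: self.put(values) for name, values in collection.items()}
        return self.put(layer0)

    def retrieve(self, key, recursion=1):
        if recursion not in (0, 1):
            raise ValueError("only recursion=0 and recursion=1 are enabled")
        obj = self.objects[key]
        if recursion == 0 or not isinstance(obj, dict):
            # Return exactly what was digested, without expansion.
            return obj
        # Expand one layer: replace each array digest with its array.
        return {name: self.objects[d] for name, d in obj.items()}


store = Store()
top = store.add_collection({"names": ["chr1", "chr2"], "lengths": [8, 4]})

layer0 = store.retrieve(top, recursion=0)  # array name -> array digest
layer1 = store.retrieve(top, recursion=1)  # array name -> array contents
# Component retrieval: fetch a single array by its own digest.
lengths = store.retrieve(layer0["lengths"], recursion=0)
```

Since every array is stored under its own digest, retrieving a single component is an ordinary lookup rather than a special case.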