[EPIC] Prototype workflow for generating and accessioning speech-to-text extraction #1

Open
1 of 14 tasks
jmartin-sul opened this issue Sep 12, 2024 · 0 comments

jmartin-sul commented Sep 12, 2024

Some admonishments/guidelines for the investigations below, based on the 2024-09-10 planning meeting

model choice/configuration

  • We are assuming that we will use Whisper or some variant (e.g. WhisperX), because it offered what we felt was the best combination of output quality and performance among the tools evaluated earlier in 2024 (@edsu may be able to link to that analysis for context?). If we determine that we want to go with a completely different model, we need to write up our reasoning for approval.
  • We want our solution to expose Whisper's tuning parameters so that we can tweak them as needed. Completely black-box solutions that run Whisper with no access to configuration aren't acceptable.
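
To make the "access to tuning parameters" requirement concrete, here is a minimal sketch of how a prototype might carry Whisper options through to the model. It assumes the open-source `openai-whisper` Python package running on our own infrastructure; the `SpeechToTextOptions` class and `run_speech_to_text` function are hypothetical names invented for illustration, and the defaults shown are placeholders rather than recommendations.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechToTextOptions:
    """Hypothetical container for the Whisper settings we may want to tweak."""

    model_name: str = "large"                  # which Whisper checkpoint to load
    language: Optional[str] = None             # None lets Whisper detect the language
    beam_size: Optional[int] = None            # None uses Whisper's default decoding
    temperature: float = 0.0
    condition_on_previous_text: bool = True
    initial_prompt: Optional[str] = None

    def transcribe_kwargs(self) -> dict:
        """Return only the options that were set, as keyword arguments for transcribe()."""
        opts = asdict(self)
        opts.pop("model_name")  # used by load_model(), not by transcribe()
        return {key: value for key, value in opts.items() if value is not None}


def run_speech_to_text(audio_path: str, options: SpeechToTextOptions) -> dict:
    """Run Whisper locally with the given options (requires openai-whisper and ffmpeg)."""
    import whisper  # imported lazily so the options class works without the package

    model = whisper.load_model(options.model_name)
    return model.transcribe(audio_path, **options.transcribe_kwargs())
```

A caller would build something like `SpeechToTextOptions(language="en", beam_size=5)` and pass it to `run_speech_to_text`. The point is only that every knob stays visible and overridable in our own code, which a hosted black-box service would not allow.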

terminology

After some discussion, we settled on the term "speech to text" to encompass text extraction from speech in audio, whether or not the audio is accompanied by video. The alternatives each had problems: there was confusion and no consensus about whether "caption" applies to audio-only material, and "caption" also applies to still-image descriptions; "transcript" doesn't quite encompass what captions do for video.

So, e.g., speechToTextWF, speech_to_text as a snake-case variable name, "speech to text" or "speech-to-text generation" as a human-readable term, etc.

infrastructure provisioning

  • We would like to avoid (or at least minimize) vendor lock-in. We're highly likely to go with AWS to start, since we have more departmental expertise there, but GCP isn't out of the question. The cloud vendor has to be an org with which Stanford has a business agreement and which is available through Cardinal Cloud, so that might rule out anything other than Amazon and Google? But as much as possible, we should use building blocks that have analogs in multiple major cloud vendors.
  • Related, but somewhat standalone: ultimately, we should define and deploy the cloud infrastructure using Terraform. It's meant to be platform agnostic and the department already uses it. Also, all of our permanent prod/stage/qa cloud infrastructure is deployed on the assumption that Terraform is the source of truth, so things created manually (e.g. using the AWS web console or one-off aws CLI commands) will cause confusion in the future. It's totally fine to experiment with building blocks by manually spinning them up that way, but once the experiment is done, those should be torn down and defined formally in Terraform.

model usage

  • ⚠️ It is unacceptable for our data to be used to train the models of other orgs. This rules out, for example, OpenAI's hosted Whisper service. This is a SUL-wide rule, at the moment.

todo

jmartin-sul changed the title from "[WIP] EPIC: Prototype workflow for transcriptioning and captioning" to "EPIC: Prototype workflow for transcriptioning and captioning" on Sep 12, 2024
jmartin-sul changed the title from "EPIC: Prototype workflow for transcriptioning and captioning" to "[EPIC] Prototype workflow for transcriptioning and captioning" on Sep 12, 2024
jmartin-sul changed the title from "[EPIC] Prototype workflow for transcriptioning and captioning" to "[EPIC] Prototype workflow for generating and accessioning transcripts/captions" on Sep 12, 2024
jmartin-sul changed the title from "[EPIC] Prototype workflow for generating and accessioning transcripts/captions" to "[EPIC] Prototype workflow for generating and accessioning speech-to-text extraction" on Sep 13, 2024