In the current implementation of the addFiles workflow, a separate API call is made for each individual file when a completed Globus upload is finalized on the Dataverse end. This pattern was simply inherited from how direct uploads to S3 are handled. There is no real need to keep doing this - we could instead call the Globus /operation/endpoint/.../ls on the entire folder once, when we get the confirmation that the transfer task has been completed, and populate the sizes for all the files in the upload batch.
In the ongoing production use case, the depositors are trying to transfer entire 8K-file datasets in one batch. Miraculously, this has worked for at least one dataset, but these per-file size lookups proved to be the bottleneck and took an obscene amount of time.
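A minimal sketch of what the single-listing approach could look like, assuming a Globus bearer token, the collection/endpoint id, and the upload folder path are already available from the existing upload context (the class and method names here are hypothetical, not Dataverse's actual code):

```java
import jakarta.json.Json;
import jakarta.json.JsonObject;
import jakarta.json.JsonValue;
import java.io.StringReader;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class GlobusBatchSizeLookup {

    private static final String TRANSFER_API = "https://transfer.api.globus.org/v0.10";

    /**
     * Lists the upload folder once via the Globus Transfer API
     * (GET /operation/endpoint/{endpointId}/ls?path=...) and returns a
     * map of file name -> size in bytes for every file in the folder.
     */
    public static Map<String, Long> fetchFolderSizes(String endpointId,
                                                     String folderPath,
                                                     String accessToken) throws Exception {
        String url = TRANSFER_API + "/operation/endpoint/" + endpointId
                + "/ls?path=" + URLEncoder.encode(folderPath, StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Globus ls failed with HTTP " + response.statusCode());
        }

        // The ls response carries a DATA array with one entry per directory item;
        // file entries have "name", "size", and "type" fields.
        JsonObject listing = Json.createReader(new StringReader(response.body())).readObject();
        Map<String, Long> sizes = new HashMap<>();
        for (JsonValue item : listing.getJsonArray("DATA")) {
            JsonObject entry = item.asJsonObject();
            if ("file".equals(entry.getString("type"))) {
                sizes.put(entry.getString("name"), entry.getJsonNumber("size").longValue());
            }
        }
        return sizes;
    }
}
```

The resulting map could then be consulted while finalizing the addFiles batch, so each file's size is filled in from the one folder listing instead of triggering its own lookup request.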