IIUC, that issue only happens when the result set of the query is non-deterministic, i.e. only when there's a non-deterministic filter on the rows (such as LIMIT). So I can see two solutions:
1. Detect deterministic cases and avoid copying in those cases.
2. Rely on row ids to avoid running the filter twice: create the UDF table based on the original query, and then inner-join it on row ids with an unfiltered version of the query.
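To make the second option concrete, here is a minimal sketch using the standard-library `sqlite3` module rather than datachain's actual query layer. The table names `source_rows` and `udf_input` and both helper functions are hypothetical, and `ORDER BY random() LIMIT ?` merely stands in for any non-deterministic filter. The idea is that the filter is evaluated once, only the selected row ids are stored, and the full rows are recovered by joining against the unfiltered table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_rows (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO source_rows (id, name) VALUES (?, ?)",
        [(i, f"file-{i}") for i in range(1, 101)],
    )
    return conn


def select_once_then_join(conn: sqlite3.Connection, limit: int):
    """Run the non-deterministic filter once, then reuse the stored row ids."""
    # Step 1: evaluate the filtered query exactly once and keep only row ids.
    conn.execute("DROP TABLE IF EXISTS udf_input")
    conn.execute("CREATE TABLE udf_input (id INTEGER PRIMARY KEY)")
    conn.execute(
        "INSERT INTO udf_input (id) "
        "SELECT id FROM source_rows ORDER BY random() LIMIT ?",
        (limit,),
    )
    udf_ids = [row[0] for row in conn.execute("SELECT id FROM udf_input ORDER BY id")]

    # Step 2: inner-join the stored ids with the UNFILTERED table, so the
    # non-deterministic filter is never executed a second time.
    joined = conn.execute(
        "SELECT r.id, r.name FROM source_rows AS r "
        "INNER JOIN udf_input AS u ON u.id = r.id ORDER BY r.id"
    ).fetchall()
    return udf_ids, joined


if __name__ == "__main__":
    connection = build_demo_db()
    ids, rows = select_once_then_join(connection, 10)
    # The joined rows cover exactly the ids the UDF table was built from.
    print(ids == [row_id for row_id, _ in rows])
```

Because the join runs against the unfiltered table, the rows it returns always match the ids the UDF table was built from, even though a second execution of the filtered query could pick a different subset.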
The pre-UDF logic in
datachain/src/datachain/query/dataset.py
Lines 589 to 598 in ee43fd1
For context, this was introduced in https://github.com/iterative/dvcx/pull/1068