Some OptimizeIteration tasks fail with UnpicklingError #10

Open
tddough98 opened this issue Jul 28, 2023 · 1 comment

Comments

@tddough98
Collaborator

I saw 3/50 OptimizeIteration tasks fail simultaneously with the same error. It was the 4th round of OptimizeIteration, so the previous 3 rounds were able to run successfully.

2023-07-28 14:50:47,959 - Optimize4.11 - INFO - Beginning Optimize4 11
2023-07-28 14:50:48,003 - Optimize4.11 - ERROR - Failed to interpret file '/data/peer/doughet/merlin-analysis/rawdata/Optimize3/scale_factors.npy' as a pickle
Traceback (most recent call last):
  File "/home/doughet/miniconda3/envs/merlin/lib/python3.10/site-packages/numpy/lib/npyio.py", line 441, in load
    return pickle.load(fid, **pickle_kwargs)
EOFError: Ran out of input

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lila/home/doughet/MERlin/merlin/core/analysistask.py", line 333, in run
    self._run_analysis(fragmentIndex)
  File "/lila/home/doughet/MERlin/merlin/analysis/optimize.py", line 77, in _run_analysis
    scaleFactors = self._get_previous_scale_factors()
  File "/lila/home/doughet/MERlin/merlin/analysis/optimize.py", line 193, in _get_previous_scale_factors
    scaleFactors = previousIteration.get_scale_factors()
  File "/lila/home/doughet/MERlin/merlin/analysis/optimize.py", line 386, in get_scale_factors
    return self.dataSet.load_numpy_analysis_result(
  File "/lila/home/doughet/MERlin/merlin/core/dataset.py", line 657, in load_numpy_analysis_result
    return np.load(savePath, allow_pickle=True)
  File "/home/doughet/miniconda3/envs/merlin/lib/python3.10/site-packages/numpy/lib/npyio.py", line 443, in load
    raise pickle.UnpicklingError(
_pickle.UnpicklingError: Failed to interpret file '/data/peer/doughet/merlin-analysis/rawdata/Optimize3/scale_factors.npy' as a pickle

However, when I restart the pipeline, the task completes without issue. Maybe it's a file system issue where multiple tasks open or modify the same file simultaneously, so all of them fail together?
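If the cause really is a reader opening `scale_factors.npy` while another process is still writing it, one common mitigation is to write to a temporary file and then rename it into place. The sketch below is only an illustration of that idea, not MERlin's current code: `save_numpy_atomically` is a hypothetical helper, and it assumes the temporary file and the final path live on the same file system (rename semantics on network file systems can be weaker than on local disks).

```python
import os
import tempfile

import numpy as np


def save_numpy_atomically(array, save_path):
    """Hypothetical helper: write `array` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(save_path))
    # Create the temporary file next to the target so the rename stays
    # within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".npy.tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            np.save(f, array)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # os.replace swaps the complete file into place in a single step.
        os.replace(tmp_path, save_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern a concurrent `np.load` should see either the old complete file or the new complete file, but not a half-written one.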

Not sure if this is related, but I'm also seeing these warnings printed to the console:

/lila/home/doughet/MERlin/merlin/util/registration.py:119: RuntimeWarning: invalid value encountered in divide
  unsmoothm = np.divide(dIdv + dIdu, dIdu - dIdv)
/lila/home/doughet/MERlin/merlin/util/registration.py:116: RuntimeWarning: invalid value encountered in divide
  m = np.divide(-(fdv + fdu), (fdu - fdv))
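For context, NumPy typically emits "invalid value encountered in divide" when an element-wise division evaluates 0/0, which yields NaN. A minimal sketch of one way to avoid the warning is shown below, using the `out` and `where` arguments of `np.divide`. The name `safe_divide` and the choice of NaN as the fill value are assumptions for illustration; whether NaN is the right result for `registration.py` is not established in this thread.

```python
import numpy as np


def safe_divide(numerator, denominator, fill_value=np.nan):
    """Hypothetical helper: divide element-wise, skipping zero denominators."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    shape = np.broadcast(numerator, denominator).shape
    # Positions that are not computed keep the fill value.
    out = np.full(shape, fill_value, dtype=float)
    # `where` restricts the division to entries with a nonzero denominator,
    # so 0/0 is never evaluated and no RuntimeWarning is raised.
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out
```
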
@tddough98
Collaborator Author

I'm also sometimes seeing the same error when loading backgrounds.npy from the previous OptimizeIteration task.

This also occurs with more than one task failing at the same time, so I'm more inclined to think it's an issue with simultaneous file access.

I wonder if the failed tasks could be restarted immediately, since rerunning the pipeline always seems to succeed.
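As a rough sketch of the "restart immediately" idea, the load itself could be retried a few times before the task is marked as failed. Everything here is hypothetical: `load_numpy_with_retry`, the attempt count, and the delay are illustrative choices, not part of MERlin, and retrying only hides the race rather than removing it.

```python
import pickle
import time

import numpy as np


def load_numpy_with_retry(path, attempts=5, delay_seconds=1.0):
    """Hypothetical helper: retry np.load when the file looks incomplete."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return np.load(path, allow_pickle=True)
        except (pickle.UnpicklingError, EOFError, ValueError, OSError):
            if attempt == attempts:
                raise  # give up and surface the original error
            # Wait a little longer after each failed attempt.
            time.sleep(delay_seconds * attempt)
```

A call such as `load_numpy_with_retry(savePath)` could stand in for the `np.load(savePath, allow_pickle=True)` call shown in the traceback, under the assumption that the file becomes readable once the writer finishes.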
