-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parallel processing using shared globals (macOS) #140
Comments
Sorry, just noticed that you are using the multiprocessing backend and not loky, so the above doesn't apply & I don't know why some of the macos processes are not seeing the changes to the global state... |
... yes, at some point when this changed in joblib, I had to force the multiprocessing backend to avoid problems as you just described... but I never tested on macOS. Using the multiprocessing backend in general can be subject to other issues... |
- move rpbp_models dir to src/rpbp/models and add to MANIFEST.in - models are now installed to `rpbp_install_dir`/models - get_default_models_base() now returns `rpbp_install_dir`/models - `compile_rpbp_models` always compiles models - todo: run this if fail to find pre-compiled binaries - remove pbiotools dependency from compile_rpbp_models - this allows us to run rpbp-compile-models as part of bioconda recipe - user gets pre-compiled stan models as part of rp-bp installation - TODO: add logic to check if pre-compiled models are missing & if so compile them? Use bioconda dependencies in CI and test macos / different Python versions - add MacOS CI jobs - use 1 cpu in tests on mac due to parallel processing issue dieterich-lab#140 - test Python versions 3.7-3.10 - install all dependencies including pbiotools from bioconda - use available cpus in tests instead of hard-coding num_cpus - disable failing flake8 checks in pre-commit for now - apply pre-commit fixes - add missing `import scipy.stats` to estimate_orf_bayes_factors.py TODO: remove THIS! add debug output of RawArray
- move rpbp_models dir to src/rpbp/models and add to MANIFEST.in - models are now installed to `rpbp_install_dir`/models - get_default_models_base() now returns `rpbp_install_dir`/models - `compile_rpbp_models` - checks if compiled exes exist, if so doesn't compile unless `--force` is specified - remove pbiotools dependency - this allows us to run rpbp-compile-models as part of bioconda recipe build - user gets pre-compiled stan models as part of rp-bp installation - these functions now call compile_stan_models() to ensure exes are there before using them: - orf_profile_construction/estimate_metagene_profile_bayes_factors.py - translation_prediction/estimate_orf_bayes_factors.py Use bioconda dependencies in CI and test macos / different Python versions - add MacOS CI jobs - use 1 cpu in tests on mac due to parallel processing issue dieterich-lab#140 - test Python versions 3.7-3.10 - install all dependencies including pbiotools from bioconda - use available cpus in tests instead of hard-coding num_cpus - disable failing flake8 checks in pre-commit for now - apply pre-commit fixes - add missing `import scipy.stats` to estimate_orf_bayes_factors.py
- move rpbp_models dir to src/rpbp/models and add to MANIFEST.in - models are now installed to `rpbp_install_dir`/models - get_default_models_base() now returns `rpbp_install_dir`/models - `compile_rpbp_models` - checks if compiled exes exist, if so doesn't compile unless `--force` is specified - remove pbiotools dependency - this allows us to run rpbp-compile-models as part of bioconda recipe build - user gets pre-compiled stan models as part of rp-bp installation - these functions now call compile_stan_models() to ensure exes are there before using them: - orf_profile_construction/estimate_metagene_profile_bayes_factors.py - translation_prediction/estimate_orf_bayes_factors.py Use bioconda dependencies in CI and test macos / different Python versions - add MacOS CI jobs - use 1 cpu in tests on mac due to parallel processing issue dieterich-lab#140 - test Python versions 3.7-3.10 - install all dependencies including pbiotools from bioconda - use available cpus in tests instead of hard-coding num_cpus - disable failing flake8 checks in pre-commit for now - apply pre-commit fixes - add missing `import scipy.stats` to estimate_orf_bayes_factors.py
- move rpbp_models dir to src/rpbp/models and add to MANIFEST.in - models are now installed to `rpbp_install_dir`/models - get_default_models_base() now returns `rpbp_install_dir`/models - `compile_rpbp_models` - checks if compiled exes exist, if so doesn't compile unless `--force` is specified - remove pbiotools dependency - this allows us to run rpbp-compile-models as part of bioconda recipe build - user gets pre-compiled stan models as part of rp-bp installation - use compiled models instead of re-compiling - these functions would previously re-compile the stan files with every `CmdStanModel` instantiation - orf_profile_construction/estimate_metagene_profile_bayes_factors.py - translation_prediction/estimate_orf_bayes_factors.py - specify `exe_file` instead of `stan_file` to avoid re-compiling them - call compile_rpbp_models.compile() to ensure exes are there before using them Use bioconda dependencies in CI and test macos / different Python versions - add MacOS CI jobs - use 1 cpu in tests on mac due to parallel processing issue dieterich-lab#140 - test Python versions 3.7-3.10 - install all dependencies including pbiotools from bioconda - use available cpus in tests instead of hard-coding num_cpus - disable failing flake8 checks in pre-commit for now - apply pre-commit fixes - add missing `import scipy.stats` to estimate_orf_bayes_factors.py
- move rpbp_models dir to src/rpbp/models and add to MANIFEST.in - models are now installed to `rpbp_install_dir`/models - get_default_models_base() now returns `rpbp_install_dir`/models - `compile_rpbp_models` - checks if compiled exes exist, if so doesn't compile unless `--force` is specified - remove pbiotools dependency - this allows us to run rpbp-compile-models as part of bioconda recipe build - user gets pre-compiled stan models as part of rp-bp installation - use compiled models instead of re-compiling - these functions would previously re-compile the stan files with every `CmdStanModel` instantiation - orf_profile_construction/estimate_metagene_profile_bayes_factors.py - translation_prediction/estimate_orf_bayes_factors.py - specify `exe_file` instead of `stan_file` to avoid re-compiling them - call compile_rpbp_models.compile() to ensure exes are there before using them Use bioconda dependencies in CI and test macos / different Python versions - add MacOS CI jobs - use 1 cpu in tests on mac due to parallel processing issue dieterich-lab#140 - test Python versions 3.7-3.10 - install all dependencies including pbiotools from bioconda - use available cpus in tests instead of hard-coding num_cpus - disable failing flake8 checks in pre-commit for now - apply pre-commit fixes - add missing `import scipy.stats` to estimate_orf_bayes_factors.py
For macOS users (with conda install), this might be a problem in general. We should probably add a note in the docs. |
The parallel shared data strategy currently used is:
set global var to point to large array of data -> fork parallel processes that all have read-only access to this global data
This is fine as long as the fork happens after the global var has been set to the desired value.
However the joblib backend loky maintains a pool of processes instead of always forking for every parallel operation (which is good), so depending on various factors it could be that the process where
get_all_bayes_factors
runs was forked before the global variable was set to the desired value, and it still sees the previous value0
instead of the RawArray (which is obviously bad).This job the processes always saw the global array data: https://github.com/lkeegan/rp-bp/actions/runs/3894618408/jobs/6648857310
This job the processes saw the previous global int
0
: https://github.com/lkeegan/rp-bp/actions/runs/3894618408/jobs/6648857542#step:7:1653I'll look into solutions - one simple one could be to pass a dict containing the RawArrays (just the pointers, not the data) to the function instead of using global variables to share this information.
Originally posted by @lkeegan in #25 (comment)
The text was updated successfully, but these errors were encountered: