pymatviz: import cost way too high (#209)
here's a rough first draft of what the import time regression test might look like. doesn't work yet and the values in `BASELINE_IMPORT_TIMES` are placeholders:

```python
import sys
import time
from importlib.util import find_spec, module_from_spec
from types import ModuleType

import pytest


def uncached_import(module_name: str) -> ModuleType:
    # Find the module spec
    spec = find_spec(module_name)
    if spec is None or spec.loader is None:
        raise ImportError(f"No module named '{module_name}'")

    # Create a new module based on the spec
    module = module_from_spec(spec)

    # Add the module to sys.modules temporarily
    sys.modules[module_name] = module
    try:
        # Load the module
        spec.loader.exec_module(module)
    finally:
        # Remove the module from sys.modules
        sys.modules.pop(module_name, None)
    return module


def measure_import_time(module_name: str) -> float:
    # Remove the module and its submodules from sys.modules if they're already there
    to_remove = [
        name
        for name in sys.modules
        if name == module_name or name.startswith(f"{module_name}.")
    ]
    for name in to_remove:
        sys.modules.pop(name, None)

    # Measure import time
    start_time = time.perf_counter()
    uncached_import(module_name)
    end_time = time.perf_counter()
    return end_time - start_time


def measure_import_times(modules: dict[str, float]) -> dict[str, float]:
    return {module: measure_import_time(module) for module in modules}


# Hard-coded baseline import times in seconds (replace with actual measured values)
BASELINE_IMPORT_TIMES = {
    "pymatviz": 0.8,
    "pymatviz.ptable": 1.1,
    "pymatviz.io": 0.03,
    "pymatviz.scatter": 0.04,
    "pymatviz.phonons": 0.02,
}


def test_import_time_regression() -> None:
    import_times = measure_import_times(BASELINE_IMPORT_TIMES)

    for module, baseline_time in BASELINE_IMPORT_TIMES.items():
        measured_time = import_times[module]
        # rel=0.2 allows +/-20% deviation from baseline (CI is slower than local)
        assert measured_time == pytest.approx(
            baseline_time, rel=0.2
        ), f"{module} import took {measured_time:.4f}s, expected {baseline_time:.4f}s"

    # Additional check that no module takes an unusually long time to import
    max_import_time = max(import_times.values())
    assert (
        max_import_time <= 2.0
    ), f"A module is taking an unusually long time to import: {max_import_time:.4f}s"
```
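One caveat with the in-process approach above: re-importing inside the same interpreter can't fully undo caching (C extension modules stay loaded even after being popped from `sys.modules`), so timings drift low after the first run. A more robust sketch times the import in a fresh child interpreter per measurement; note it also includes interpreter startup overhead, which could be subtracted by timing a `python -c pass` baseline:

```python
# Sketch: measure import time in a fresh interpreter, avoiding
# sys.modules / C-extension caching that in-process re-import
# can't reset. Assumes the module is installed in the current env.
import subprocess
import sys
import time


def import_time_subprocess(module_name: str) -> float:
    """Time `import <module_name>` in a clean child interpreter."""
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, "-c", f"import {module_name}"], check=True
    )
    return time.perf_counter() - start
```

The trade-off is speed: spawning one interpreter per module makes the test slower, but the numbers are reproducible across CI runs.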
Running

```sh
pip install tuna
python -X importtime -c "import pymatviz" 2> pmv.log
tuna pmv.log
```

reveals that one major factor in the high import cost is a module which runs

```python
try:
    import torch
except ImportError:
    torch = None
```

at the module level rather than inside a function.
Thanks for sharing that tool (again), I'll take over from here :)
takes almost 2 seconds to import the latest version of pymatviz. used to be a lot less. i suspect #194 may have caused this problem without me noticing at the time. current import waterfall can be generated with:

```sh
pip install tuna
python -X importtime -c "import pymatviz" 2> pmv.log
tuna pmv.log
```

should figure out which modules take the longest to load, and decide if they really need to be imported from `pymatviz/__init__.py` or if they could be lazy-loaded instead (i.e. only loaded if the user actually accesses them). should also put some regression tests in place to ensure no future PRs blow up the import time again. might also make sense to see how large code bases like pandas keep their import time low.