[SDK]Support Docker image as objective in the tune API #2338

Open
wants to merge 3 commits into
base: master
86 changes: 47 additions & 39 deletions sdk/python/v1beta1/kubeflow/katib/api/katib_client.py
@@ -164,9 +164,9 @@ def tune(
         self,
         # TODO (andreyvelich): How to be consistent with other APIs (name) ?
         name: str,
-        objective: Callable,
+        objective: Union[Callable, str],
         parameters: Dict[str, Any],
-        base_image: str = constants.BASE_IMAGE_TENSORFLOW,
+        #base_image: str = constants.BASE_IMAGE_TENSORFLOW,
Member

I think we should keep the base_image parameter, since we use it when the user sets the objective as a training function.
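
To illustrate the suggestion, here is a minimal sketch of how both inputs could coexist. The helper name resolve_trial_image and the DEFAULT_BASE_IMAGE value are hypothetical and not part of the SDK; the idea is only that base_image stays a parameter for function objectives, while a string objective names the image directly.

```python
from typing import Callable, Union

# Placeholder default (assumption); the real default lives in the SDK's constants module.
DEFAULT_BASE_IMAGE = "example.com/default-training-image:latest"


def resolve_trial_image(
    objective: Union[Callable, str],
    base_image: str = DEFAULT_BASE_IMAGE,
) -> str:
    """Pick the container image for a Trial (hypothetical helper)."""
    if callable(objective):
        # Function objective: its code is injected into base_image.
        return base_image
    if isinstance(objective, str):
        # Image objective: the string itself names the image to run.
        return objective
    raise ValueError("The objective must be a callable function or a Docker image.")
```

With this shape, a caller who passes a function can still override base_image, and a caller who passes an image string never needs to set it.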

         namespace: Optional[str] = None,
         env_per_trial: Optional[
             Union[Dict[str, str], List[Union[client.V1EnvVar, client.V1EnvFromSource]]]
@@ -294,20 +294,12 @@ def tune(
         if max_failed_trial_count is not None:
             experiment.spec.max_failed_trial_count = max_failed_trial_count

-        # Validate objective function.
-        utils.validate_objective_function(objective)
-
-        # Extract objective function implementation.
-        objective_code = inspect.getsource(objective)
-
-        # Objective function might be defined in some indented scope
-        # (e.g. in another function). We need to dedent the function code.
-        objective_code = textwrap.dedent(objective_code)
-
         # Iterate over input parameters.
         input_params = {}
         experiment_params = []
         trial_params = []
+        base_image = constants.BASE_IMAGE_TENSORFLOW
+
         for p_name, p_value in parameters.items():
             # If input parameter value is Katib Experiment parameter sample.
             if isinstance(p_value, models.V1beta1ParameterSpec):
@@ -326,33 +318,49 @@ def tune(
                 # Otherwise, add value to the function input.
                 input_params[p_name] = p_value

-        # Wrap objective function to execute it from the file. For example
-        # def objective(parameters):
-        #     print(f'Parameters are {parameters}')
-        # objective({'lr': '${trialParameters.lr}', 'epochs': '${trialParameters.epochs}', 'is_dist': False})
-        objective_code = f"{objective_code}\n{objective.__name__}({input_params})\n"
-
-        # Prepare execute script template.
-        exec_script = textwrap.dedent(
-            """
-            program_path=$(mktemp -d)
-            read -r -d '' SCRIPT << EOM\n
-            {objective_code}
-            EOM
-            printf "%s" "$SCRIPT" > $program_path/ephemeral_objective.py
-            python3 -u $program_path/ephemeral_objective.py"""
-        )
-
-        # Add objective code to the execute script.
-        exec_script = exec_script.format(objective_code=objective_code)
-
-        # Install Python packages if that is required.
-        if packages_to_install is not None:
-            exec_script = (
-                utils.get_script_for_python_packages(packages_to_install, pip_index_url)
-                + exec_script
+        # Handle different types of objective input
+        if callable(objective):
+            # Validate objective function.
+            utils.validate_objective_function(objective)
+
+            # Extract objective function implementation.
+            objective_code = inspect.getsource(objective)
+
+            # Objective function might be defined in some indented scope
+            # (e.g. in another function). We need to dedent the function code.
+            objective_code = textwrap.dedent(objective_code)
+
+            # Wrap objective function to execute it from the file. For example
+            # def objective(parameters):
+            #     print(f'Parameters are {parameters}')
+            # objective({'lr': '${trialParameters.lr}', 'epochs': '${trialParameters.epochs}', 'is_dist': False})
+            objective_code = f"{objective_code}\n{objective.__name__}({input_params})\n"
+
+            # Prepare execute script template.
+            exec_script = textwrap.dedent(
+                """
+                program_path=$(mktemp -d)
+                read -r -d '' SCRIPT << EOM\n
+                {objective_code}
+                EOM
+                printf "%s" "$SCRIPT" > $program_path/ephemeral_objective.py
+                python3 -u $program_path/ephemeral_objective.py"""
+            )
+
+            # Add objective code to the execute script.
+            exec_script = exec_script.format(objective_code=objective_code)
+
+            # Install Python packages if that is required.
+            if packages_to_install is not None:
+                exec_script = (
+                    utils.get_script_for_python_packages(packages_to_install, pip_index_url)
+                    + exec_script
             )
+        elif isinstance(objective, str):
+            base_image = objective
+        else:
+            raise ValueError("The objective must be a callable function or a docker image.")

         if isinstance(resources_per_trial, dict):
             if "gpu" in resources_per_trial:
                 resources_per_trial["nvidia.com/gpu"] = resources_per_trial.pop("gpu")
@@ -395,8 +403,8 @@ def tune(
                             client.V1Container(
                                 name=constants.DEFAULT_PRIMARY_CONTAINER_NAME,
                                 image=base_image,
-                                command=["bash", "-c"],
-                                args=[exec_script],
+                                command=["bash", "-c"] if callable(objective) else None,
+                                args=[exec_script] if callable(objective) else None,
Comment on lines +406 to +407
Member

Also, I'm not sure we can assign None to command and args here when a Docker image is used as the objective.

As @andreyvelich showed in his example, we sometimes need to pass command and args to the training container to run Python scripts with parameters.

Could you explain your idea in more detail so that I can understand it better? WDYT 👀 @akhilsaivenkata @andreyvelich
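
For context on the question above: in Kubernetes, a container that sets neither command nor args runs the image's own ENTRYPOINT and CMD, and setting them overrides those defaults. The sketch below uses plain dictionaries instead of client.V1Container so that it is self-contained; container_spec, the container name, and the image names are hypothetical.

```python
from typing import Dict, List, Optional


def container_spec(
    image: str,
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build a plain-dict container spec, omitting unset fields (hypothetical helper)."""
    # "training-container" is a placeholder name, not the SDK constant.
    spec: Dict[str, object] = {"name": "training-container", "image": image}
    if command is not None:
        # Overrides the image's ENTRYPOINT.
        spec["command"] = command
    if args is not None:
        # Overrides the image's CMD.
        spec["args"] = args
    return spec


# Image-only objective: the image's ENTRYPOINT and CMD decide what runs.
image_only = container_spec("example.com/my-training:latest")

# Image plus explicit command and args, e.g. to run a script with parameters.
with_overrides = container_spec(
    "example.com/my-training:latest",
    command=["python3", "train.py"],
    args=["--lr", "${trialParameters.lr}"],
)
```

Under this reading, leaving both fields as None is valid for images that already define an entrypoint, but it gives the user no way to pass Trial parameters on the command line.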

Member

I think that initially we can just allow the user to set an image as the objective, without command and args.
This is similar to how we allow creating a training job using the base_image parameter: https://github.com/kubeflow/training-operator/blob/master/sdk/python/kubeflow/training/api/training_client.py#L327C35-L327C45.

                                 env=env if env else None,
                                 env_from=env_from if env_from else None,
                                 resources=resources_per_trial,
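
The wrapping step in the callable branch of the diff can be tried in isolation. This is a rough sketch, not the SDK code: wrap_objective_source is a hypothetical helper that takes the source as a string (standing in for the result of inspect.getsource), dedents it, and appends the call, as the comments in the diff describe.

```python
import textwrap
from typing import Any, Dict


def wrap_objective_source(source: str, func_name: str, input_params: Dict[str, Any]) -> str:
    """Dedent a function's source and append a call to it (hypothetical helper)."""
    # The function may have been defined in an indented scope.
    code = textwrap.dedent(source)
    # Append the call so that running the file executes the objective.
    return f"{code}\n{func_name}({input_params})\n"


# Source as it might look for a function defined inside another function.
nested_source = """
    def objective(parameters):
        print(f"Parameters are {parameters}")
"""

script = wrap_objective_source(
    nested_source,
    "objective",
    {"lr": "${trialParameters.lr}", "is_dist": False},
)
```

The resulting script is plain Python text ending in a call to objective(...). In the diff, that text is then substituted into the bash heredoc template and written to ephemeral_objective.py.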