
[enhancement]: Support fastjsonschema as well as jsonschema for schema validator #5764

Open
ani-sinha opened this issue Oct 1, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@ani-sinha
Contributor

Enhancement

In RHEL, we are looking to drop support for jsonschema and would like to use fastjsonschema instead. Its API documentation is here:
https://horejsek.github.io/python-fastjsonschema/

It would be nice if cloud-init could support both python libraries to check json schema.
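One way to support both libraries is to choose a validator backend at runtime. This is a minimal sketch, assuming jsonschema is preferred when both are installed; `select_schema_backend` is a hypothetical helper, not part of cloud-init:

```python
import importlib.util
from typing import Optional, Sequence

# Hypothetical preference order: full-featured jsonschema first, then the
# lighter fastjsonschema. Neither this tuple nor the helper is cloud-init API.
PREFERRED_BACKENDS = ("jsonschema", "fastjsonschema")


def select_schema_backend(
    preferred: Sequence[str] = PREFERRED_BACKENDS,
) -> Optional[str]:
    """Return the first importable module name from `preferred`, or None."""
    for name in preferred:
        # find_spec() locates a top-level module without importing it.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    backend = select_schema_backend()
    print(backend or "no schema validator installed; skipping validation")
```

Checking with `find_spec()` keeps the probe cheap at boot, since the chosen library is only imported when validation actually runs.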

@ani-sinha ani-sinha added enhancement New feature or request new An issue that still needs triage labels Oct 1, 2024
@blackboxsw blackboxsw removed the new An issue that still needs triage label Oct 1, 2024
@blackboxsw
Collaborator

blackboxsw commented Oct 1, 2024

Good suggestion @ani-sinha. I can see it packaged in the Alpine, Rawhide, Fedora, Debian and Ubuntu archives as well, so making this switch should be desirable for distributions that have this package available. This will be a fairly complex bit of work, because cloudinit.config.schema is tightly coupled to the following jsonschema APIs for its custom extensions:

  • jsonschema.ValidationError
  • jsonschema.Draft4Validator, FormatChecker
  • jsonschema.exceptions.best_match, SchemaError
  • jsonschema.validators.create

Still, this is definitely worth exploring further to size the effort involved.
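To make the coupling concrete, here is a minimal sketch of the kind of extension the list above refers to: a custom `deprecated` keyword added to a Draft 4 validator with `jsonschema.validators.extend` (a convenience wrapper around `validators.create`). The keyword and its `deprecated_version` field are assumptions for illustration, not cloud-init's actual schema code. As far as its documented `compile()` parameters show, fastjsonschema accepts custom `formats` but offers no hook for custom keywords, which is part of why a port is non-trivial.

```python
try:
    import jsonschema
    from jsonschema import validators
except ImportError:  # jsonschema not installed: make_validator() returns None
    jsonschema = None


def _deprecated(validator, value, instance, schema):
    """Hypothetical 'deprecated' keyword handler.

    jsonschema calls keyword functions as fn(validator, value, instance,
    schema) and treats each yielded ValidationError as a finding. Via
    'properties', this subschema is only visited when the key is present.
    """
    if value:
        version = schema.get("deprecated_version", "unknown")
        yield jsonschema.ValidationError(f"Deprecated in version {version}.")


def make_validator(schema):
    """Return a Draft 4 validator that also understands 'deprecated'."""
    if jsonschema is None:
        return None
    cls = validators.extend(
        jsonschema.Draft4Validator, {"deprecated": _deprecated}
    )
    return cls(schema, format_checker=jsonschema.FormatChecker())


if __name__ == "__main__":
    example_schema = {
        "type": "object",
        "properties": {
            "apt_update": {"type": "boolean", "deprecated": True,
                           "deprecated_version": "22.2"},
            "hostname": {"type": "string"},
        },
        "additionalProperties": False,
    }
    v = make_validator(example_schema)
    if v is not None:
        for err in v.iter_errors({"apt_update": True, "bogus": "asdf"}):
            print(err.message)
```

Because deprecations surface as ordinary `ValidationError`s here, a real implementation would still need to tell them apart from hard errors when rendering annotations.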

@blackboxsw
Collaborator

blackboxsw commented Oct 2, 2024

To test the impact on boot, I wrapped util.log_time around our current schema validation performed during early boot, and it averages 0.003 seconds. Given that adapting our custom annotations, errors and deprecation handling to fastjsonschema is a fairly complex bit of work, upstream is unlikely to prioritize this effort: python3-jsonschema is functional and adds minimal cost to boot time.
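To reproduce a measurement like this outside cloud-init, a minimal standard-library sketch follows. It does not use cloud-init's util.log_time; it simply averages `time.perf_counter()` deltas over several calls. `average_runtime` is a hypothetical helper, and the callable you pass in stands for whatever validation step is being measured.

```python
import statistics
import time
from typing import Callable


def average_runtime(fn: Callable[[], object], runs: int = 100) -> float:
    """Return the mean wall-clock time of fn() in seconds over `runs` calls."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.fmean(samples)


if __name__ == "__main__":
    # Placeholder workload; substitute a call into the validator under test.
    print(f"{average_runtime(lambda: sum(range(10_000))):.6f} s")
```

Averaging many calls smooths out one-off noise, but it measures warm runs; the first call at boot (module import, schema loading) may cost noticeably more.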

We would welcome patches for this work and would help shepherd those changes in. If the community takes on this effort, we want to make sure we retain our schema error and deprecation annotation functionality:

cat > example.yaml <<EOF
#cloud-config

# Basic system setup
hostname: example-host
bogus: asdf

# Package management
apt_update: true
package_upgrade: true
packages:
  - git
  - nginx
  - python3
EOF
lxc launch ubuntu-daily:oracular test-5764 -c security.nesting=true -c cloud-init.user-data="$(cat example.yaml)"
lxc exec test-5764 -- cloud-init status --wait
lxc exec test-5764 bash
root@example-host:~# cloud-init schema --system --annotate
Found cloud-config data types: user-data, network-config

1. user-data at /var/lib/cloud/instances/e4605433-ac8c-4514-873a-6661cb8ae7ac/cloud-config.txt:
#cloud-config

# from 1 files
# part-001

---
apt_update: true		# D1
bogus: asdf		# E1
hostname: example-host
package_upgrade: true
packages:
- git
- nginx
- python3
...

# Errors: -------------
# E1: Additional properties are not allowed ('bogus' was unexpected)


# Deprecations: -------------
# D1:  Deprecated in version 22.2. Use **package_update** instead.



2. network-config at /var/lib/cloud/instances/e4605433-ac8c-4514-873a-6661cb8ae7ac/network-config.json:
  Valid schema network-config
Error: Invalid schema: user-data
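One way to keep the E1/D1 annotations independent of which validator is installed is to produce them in a separate pass over the config. This is a minimal standard-library sketch, assuming the schema marks deprecated properties with hypothetical `deprecated`, `deprecated_version` and `deprecated_description` fields (not necessarily cloud-init's exact schema); `annotate` is a hypothetical helper that only inspects top-level keys.

```python
from typing import Dict, List, Tuple


def annotate(schema: dict, data: dict) -> Tuple[Dict[str, str], str]:
    """Return ({key: marker}, footer) for the top level of a config.

    Simplifications: one error per unexpected key (jsonschema groups them
    into a single message), and a dict-valued additionalProperties is
    treated as "extras allowed".
    """
    props = schema.get("properties", {})
    extras_allowed = schema.get("additionalProperties", True) is not False
    markers: Dict[str, str] = {}
    errors: List[str] = []
    deprecations: List[str] = []

    for key in sorted(data):
        spec = props.get(key)
        if spec is None:
            if not extras_allowed:
                errors.append(
                    f"Additional properties are not allowed ({key!r} was unexpected)"
                )
                markers[key] = f"E{len(errors)}"
        elif spec.get("deprecated"):
            msg = f"Deprecated in version {spec.get('deprecated_version', '?')}."
            if spec.get("deprecated_description"):
                msg += " " + spec["deprecated_description"]
            deprecations.append(msg)
            markers[key] = f"D{len(deprecations)}"

    lines: List[str] = []
    if errors:
        lines.append("# Errors: -------------")
        lines += [f"# E{i}: {m}" for i, m in enumerate(errors, 1)]
    if deprecations:
        lines.append("# Deprecations: -------------")
        lines += [f"# D{i}: {m}" for i, m in enumerate(deprecations, 1)]
    return markers, "\n".join(lines)


if __name__ == "__main__":
    example_schema = {
        "properties": {
            "apt_update": {"deprecated": True, "deprecated_version": "22.2",
                           "deprecated_description": "Use **package_update** instead."},
            "hostname": {}, "package_upgrade": {}, "packages": {},
        },
        "additionalProperties": False,
    }
    example = {"hostname": "example-host", "bogus": "asdf", "apt_update": True,
               "package_upgrade": True, "packages": ["git", "nginx", "python3"]}
    print(annotate(example_schema, example)[1])
```

For the example user-data above, this marks `apt_update` as D1 and `bogus` as E1, mirroring the footer that `cloud-init schema --annotate` prints.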

@vittyvk

vittyvk commented Oct 3, 2024

I think the main motivation for switching from jsonschema to fastjsonschema is to reduce the install footprint, not to cut validation time. For example, on my Fedora 40 system I see:

$ rpm -q --requires python3-fastjsonschema | grep -v rpmlib
python(abi) = 3.12

$ rpm -q --requires python3-jsonschema | grep -v rpmlib
/usr/bin/python3
python(abi) = 3.12
python3.12dist(attrs) >= 22.2
python3.12dist(jsonschema-specifications) >= 2023.3.6
python3.12dist(referencing) >= 0.28.4
python3.12dist(rpds-py) >= 0.7.1
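A similar comparison can be made from Python's side with the standard library's `importlib.metadata`. It reads the Requires-Dist metadata of installed distributions rather than RPM dependencies, so the entries will not match the output above exactly (for example, optional extras show up with environment markers). `declared_requirements` is a hypothetical helper:

```python
from importlib import metadata
from typing import List, Optional


def declared_requirements(dist_name: str) -> Optional[List[str]]:
    """Return the Requires-Dist entries of an installed distribution.

    Returns None when the distribution is not installed, and an empty list
    when it declares no dependencies.
    """
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("fastjsonschema", "jsonschema"):
        reqs = declared_requirements(name)
        print(f"{name}: {'not installed' if reqs is None else reqs}")
```

This lists only direct dependencies; the transitive footprint (for example, what `referencing` itself pulls in) needs a recursive walk.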

Schema validation is optional, which leaves two options: install jsonschema with all its dependencies and get full validation, or don't install it and skip validation completely. Maybe there's room for a third option, e.g. install fastjsonschema and get some 'basic' validation (errors only)?
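The three tiers could look roughly like the following: prefer jsonschema for full validation, fall back to fastjsonschema for errors-only validation, and otherwise skip. This is a minimal sketch, assuming a plain JSON Schema with no custom keywords; `validate_tiered` and its mode names are hypothetical. jsonschema's `iter_errors()` yields every error, while a function returned by `fastjsonschema.compile()` raises `JsonSchemaValueException` at the first failure. The two libraries also word their messages differently, and fastjsonschema selects its draft from the schema's `$schema` key rather than using Draft 4 explicitly.

```python
from typing import Any, List, Tuple


def validate_tiered(schema: dict, data: Any) -> Tuple[str, List[str]]:
    """Return (mode, error messages) using the best available validator.

    mode is "full" (jsonschema: all errors), "basic" (fastjsonschema:
    first error only) or "skipped" (neither library importable).
    """
    try:
        import jsonschema
    except ImportError:
        pass
    else:
        validator = jsonschema.Draft4Validator(
            schema, format_checker=jsonschema.FormatChecker()
        )
        return "full", [err.message for err in validator.iter_errors(data)]

    try:
        import fastjsonschema
    except ImportError:
        return "skipped", []  # no validator installed: skip validation

    check = fastjsonschema.compile(schema)  # returns a validating function
    try:
        check(data)
    except fastjsonschema.JsonSchemaValueException as exc:
        # fastjsonschema stops at the first failing rule.
        return "basic", [exc.message]
    return "basic", []


if __name__ == "__main__":
    example_schema = {
        "type": "object",
        "properties": {"hostname": {"type": "string"}},
        "additionalProperties": False,
    }
    print(validate_tiered(example_schema, {"hostname": "h", "bogus": "asdf"}))
```

Deprecation reporting would sit outside this function, since neither tier here knows about custom keywords.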
