Merge latest updates to integration into feature/ingest2metrics before merge back
RayPlante committed Dec 8, 2023
2 parents 7713bf9 + d96af33 commit e2f299f
Showing 19 changed files with 924 additions and 16 deletions.
13 changes: 13 additions & 0 deletions docker/cacerts/README.md
@@ -0,0 +1,13 @@
This directory contains non-standard CA certificates needed to build the docker
images.

Failures building the Docker containers defined in ../ due to SSL certificate
verification errors may be a consequence of your local network's firewall. In
particular, the firewall may be substituting external site certificates with
its own signed by a non-standard CA certificate (chain). If so, you can place
the necessary certificates into this directory; they will be passed into the
containers, allowing them to safely connect to those external sites.

Be sure the certificates are in PEM format and include a .crt file extension.

Do not remove this README file; doing so may cause a Docker build failure.
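
The README above asks that certificates be PEM-formatted and carry a `.crt` extension. A minimal sketch of how one might check a directory against those two requirements before building is shown here. The function name `list_pem_certs` is hypothetical and not part of this repository, and the check only looks for the standard PEM header line rather than fully parsing the certificate.

```python
from pathlib import Path

# Standard first line of a PEM-encoded certificate
PEM_HEADER = "-----BEGIN CERTIFICATE-----"

def list_pem_certs(cert_dir):
    """Return sorted names of *.crt files in cert_dir that look PEM-encoded.

    Hypothetical helper for illustration; it is not part of the build scripts.
    """
    found = []
    for path in sorted(Path(cert_dir).glob("*.crt")):
        # DER files are binary; decode leniently so they simply fail to match
        text = path.read_text(encoding="ascii", errors="ignore")
        if PEM_HEADER in text:
            found.append(path.name)
    return found
```

Files that are missing from the returned list either lack the `.crt` extension or do not appear to be PEM-encoded, so they would likely not be usable by the build as described.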
5 changes: 5 additions & 0 deletions docker/dockbuild.sh
@@ -33,6 +33,11 @@ setup_build

log_intro # record start of build into log

# install CA certs into containers that can use them
if { echo $BUILD_IMAGES | grep -qs pymongo; }; then
cp_ca_certs_to pymongo
fi

for container in $BUILD_IMAGES; do
echo '+ ' docker build $BUILD_OPTS -t $PACKAGE_NAME/$container $container | logit
docker build $BUILD_OPTS -t $PACKAGE_NAME/$container $container 2>&1 | logit
4 changes: 2 additions & 2 deletions docker/ejsonschema/Dockerfile
@@ -9,9 +9,9 @@ RUN PYTHON=python3.8 uwsgi --build-plugin "/usr/src/uwsgi/plugins/python python3
RUN update-alternatives --install /usr/lib/uwsgi/plugins/python3_plugin.so \
python_plugin.so /usr/lib/uwsgi/plugins/python38_plugin.so 1

-RUN python -m pip install setuptools --upgrade
+RUN python -m pip install "setuptools<66.0.0"
 RUN python -m pip install json-spec jsonschema==2.4.0 requests \
-pytest==4.6.5 filelock crossrefapi pyyaml
+pytest==4.6.5 filelock crossrefapi pyyaml jsonpath_ng
RUN python -m pip install --no-dependencies jsonmerge==1.3.0

WORKDIR /root
2 changes: 1 addition & 1 deletion docker/jqfromsrc/Dockerfile
@@ -2,7 +2,7 @@ From oar-metadata/pymongo

RUN apt-get update && \
apt-get install -y libonig-dev curl build-essential libtool zip \
-unzip autoconf git
+unzip autoconf git bison
RUN pip install pipenv

WORKDIR /root
5 changes: 5 additions & 0 deletions docker/pymongo/Dockerfile
@@ -11,6 +11,11 @@ RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1; \
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1; \
update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
RUN locale-gen en_US.UTF-8

COPY cacerts/README.md cacerts/*.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates
ENV REQUESTS_CA_BUNDLE /etc/ssl/certs/ca-certificates.crt

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
12 changes: 12 additions & 0 deletions docker/pymongo/cacerts/README.md
@@ -0,0 +1,12 @@
This directory contains non-standard CA certificates needed to build the docker
images.

Failures building the Docker containers defined in ../ due to SSL certificate
verification errors may be a consequence of your local network's firewall. In
particular, the firewall may be substituting external site certificates with
its own signed by a non-standard CA certificate (chain). If so, you can place
the necessary certificates into this directory; they will be passed into the
containers, allowing them to safely connect to those external sites.

Be sure the certificates are in PEM format and include a .crt file extension.

135 changes: 135 additions & 0 deletions model/README-NERDm.md
@@ -0,0 +1,135 @@
# The NIST Extensible Resource Data Model (NERDm): JSON schemas for rich description of digital resources

## Overview

The NIST Extensible Resource Data Model (NERDm) is a set of schemas for encoding in JSON format metadata
that describe digital resources. The variety of digital resources it can describe includes not only
digital data sets and collections, but also software, digital services, web sites and portals, and
digital twins. It was created to serve as the internal metadata format used by the NIST Public Data
Repository and Science Portal to drive rich presentations on the web and to enable discovery; however, it
was also designed to enable programmatic access to resources and their metadata by external users.
Interoperability was also a key design aim: the schemas are defined using the JSON Schema standard [1, 2,
3], metadata are encoded as JSON-LD [4, 5], and their semantics are tied to community ontologies, with an
emphasis on DCAT [6] and the US federal Project Open Data (POD) [7] models. Finally, extensibility is also
central to its design: the schemas are composed of a central core schema and various extension schemas.
New extensions to support richer metadata concepts can be added over time without breaking existing
applications.

### About Validation

Validation is central to NERDm's extensibility model. Consuming applications should be able to choose
which metadata extensions they care to support and ignore terms and extensions they don't support.
Furthermore, they should not fail when a NERDm document leverages extensions they don't recognize, even
when on-the-fly validation is required. To support this flexibility, the NERDm framework allows
documents to declare what extensions are being used and where. We have developed an optional extension
to the standard JSON Schema validation (see ejsonschema below) to support flexible validation: while a
standard JSON Schema validator can validate a NERDm document against the NERDm core schema, our extension
will validate a NERDm document against any recognized extensions and ignore those that are not
recognized.
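
The flexible validation idea described above can be illustrated with a toy sketch. This is not how ejsonschema is implemented; it only shows the "check what you recognize, skip what you don't" behavior. The property name `_extensionSchemas`, the example URIs, and the helper names `validate_flexibly` and `require_title` are all assumptions made for illustration.

```python
# Assumed name of the property in which a document declares its extensions
EXT_KEY = "_extensionSchemas"

def validate_flexibly(doc, recognized):
    """Apply the checker for each declared extension that is recognized.

    `recognized` maps extension URIs to checker functions.  Returns the list
    of declared URIs that were skipped because no checker is registered.
    A failing checker is expected to raise ValueError.
    """
    skipped = []
    for uri in doc.get(EXT_KEY, []):
        checker = recognized.get(uri)
        if checker is None:
            skipped.append(uri)   # unrecognized extension: ignore, don't fail
        else:
            checker(doc)
    return skipped

def require_title(doc):
    """Toy checker standing in for validation against an extension schema."""
    if "title" not in doc:
        raise ValueError("missing required property: title")

# Hypothetical extension URIs, for illustration only
recognized = {"https://example.org/ext/known#": require_title}
doc = {
    "title": "An example resource",
    EXT_KEY: ["https://example.org/ext/known#", "https://example.org/ext/unknown#"],
}
skipped = validate_flexibly(doc, recognized)
```

Here the document passes the one recognized check, and the unrecognized extension is reported back rather than causing a failure.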

### Data Model Summary

The NERDm data model is based around the concept of resource, semantically equivalent to a schema.org
Resource, and as in schema.org, there can be different types of resources, such as data sets and
software. A NERDm document indicates what types the resource qualifies as via the JSON-LD `@type`
property. All NERDm Resources are described by metadata terms from the core NERDm schema; however,
different resource types can be described by additional metadata properties (often drawing on particular
NERDm extension schemas). A Resource can contain Components of various types (including
DCAT-defined Distributions); these can include specifically downloadable data files, hierarchical data
collections, links to web sites (like software repositories), software tools, or other NERDm Resources.
Through the NERDm extension system, domain-specific metadata can be included at either the resource or
component level. The direct semantic and syntactic connections to the DCAT, POD, and schema.org schemas
are intended to ensure unambiguous conversion of NERDm documents into those schemas.
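
As the summary above notes, a NERDm document declares its resource types through the JSON-LD `@type` property. In JSON-LD, `@type` may hold either a single string or an array of strings, so a consumer typically normalizes it before testing membership. The following is a minimal sketch of that step; the helper names and the type values (`ex:DataSet`, `ex:Software`) are hypothetical and not taken from the NERDm schemas.

```python
def resource_types(doc):
    """Return the document's JSON-LD @type values as a list."""
    types = doc.get("@type", [])
    if isinstance(types, str):
        types = [types]          # a single type may be given as a bare string
    return list(types)

def is_a(doc, type_name):
    """Return True if the document declares the given type."""
    return type_name in resource_types(doc)

# Hypothetical type names, for illustration only
example = {"@type": ["ex:DataSet", "ex:Software"]}
single = {"@type": "ex:DataSet"}
```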

### NERDm Schemas

* [nerdm-schema.json](nerdm-schema.json) -- the Core NERDm schema that includes definitions of the
base `Resource` and `Component` types
* [nerdm-pub-schema.json](nerdm-pub-schema.json) -- an extension schema that defines different kinds
of resource publications.
* [nerdm-rls-schema.json](nerdm-rls-schema.json) -- an extension schema that defines types that help
describe different versions or releases of resources.
* [nerdm-bib-schema.json](nerdm-bib-schema.json) -- an extension schema that defines types for richer
descriptions of bibliographic references. In particular, this enables closer interoperability
with DataCite metadata.
* [nerdm-agg-schema.json](nerdm-agg-schema.json) -- an extension schema that defines different types of
data collections or aggregations that are important to the NIST Public Data Repository.
* [nerdm-exp-schema.json](nerdm-exp-schema.json) -- an extension schema that defines types for
describing experimental data and their context.
* [nerdm-sip-schema.json](nerdm-sip-schema.json) -- an extension schema used by the NIST Public Data
Repository to describe a Submission Information Package (SIP).


### Status and Future

As of this writing, the Core NERDm schema and its framework stand at version 0.7 and are compatible with
the "draft-04" version of JSON Schema. Version 1.0 is projected to be released in 2023. In that
release, the NERDm schemas will be updated to the "draft2020" version of JSON Schema [2, 3]. Other
improvements will include stronger support for RDF and the Linked Data Platform through its support of
JSON-LD [5].

## Key Links

<dl>
<dt> The NERDm JSON Schema Files: <br/>
<a href="https://github.com/usnistgov/oar-metadata/tree/integration/model">
https://github.com/usnistgov/oar-metadata/tree/integration/model</a> </dt>
<dd> This directory contains the latest (and previous) versions of the core NERDm Schema and various
extensions. All files with names of the form, "*-schema*.json" are JSON Schema definition files; those
that do not include a version in the file name represent the latest versions. The latest version of the
core schema is called `nerdm-schema.json`, and schemas with names of the form,
"nerdm-[ext]-schema.json", contain extension schemas. All NERDm schemas here are documented
internally, including semantic definitions of all terms. </dd>

<dt> ejsonschema: Software for Validating JSON supporting extension schemas <br/>
<a href="https://github.com/usnistgov/ejsonschema">
https://github.com/usnistgov/ejsonschema</a> </dt>
<dd> This software repository provides Python software that extends the community software library,
python-jsonschema
(<a href="https://github.com/python-jsonschema/jsonschema">https://github.com/python-jsonschema/jsonschema</a>)
to support NERDm's extension framework. Use the scripts/validate script to validate NERDm
documents on the command line. (Type <code>validate -h</code> for more information.) </dd>

<dt> Example NERDm Documents <br/>
<a href="https://github.com/usnistgov/oar-metadata/tree/integration/model/examples">
https://github.com/usnistgov/oar-metadata/tree/integration/model/examples</a> </dt>
<dd> This folder contains example NERDm documents that illustrate the NERDm data model and use of
extension schemas. These all can be validated using the ejsonschema validate script. </dd>

<dt> NERDm Support Software <br/>
<a href="https://github.com/usnistgov/oar-metadata">
https://github.com/usnistgov/oar-metadata</a> </dt>
<dd> This software repository includes a Python package, <code>nistoar.nerdm</code>, that aids in
creating and handling NERDm documents. In particular, it includes converters that convert NERDm
instances into other formats (like POD, schema.org, DataCite and DCAT). It can also transform NERDm
documents conforming to earlier versions of the schemas to that of the latest versions. </dd>
</dl>

## References

[1] JSON Schema Website, URL: https://json-schema.org/

[2] Galiegue, F., Zyp, K., and Court, G. (2013). JSON Schema: core definitions and terminology (draft04),
IETF Internet-Draft, URL: https://datatracker.ietf.org/doc/html/draft-zyp-json-schema-04

[3] Galiegue, F., Zyp, K., and Court, G. (2013). JSON Schema: interactive and non interactive validation
(draft04), IETF Internet-Draft, URL: https://datatracker.ietf.org/doc/html/draft-fge-json-schema-00

[4] JSON-LD Website, URL: https://json-ld.org/

[5] Sporny, M., Longley, D., Kellogg, G., Lanthaler, M., Champin, P., Lindström, N. (2020) JSON-LD 1.1: A
JSON-based Serialization for Linked Data, W3C Recommendation 16 July 2020, URL:
https://www.w3.org/TR/json-ld/

[6] Albertoni, R., Browning, D., Cox, S., Gonzalez Beltran, A., Perego, A., Winstanley, P. (2020) Data
Catalog Vocabulary (DCAT) - Version 2, W3C Recommendation 04 February 2020, URL:
https://www.w3.org/TR/vocab-dcat-2/

[7] United States Government, DCAT-US Schema v1.1 (Project Open Data Metadata Schema), URL:
https://resources.data.gov/resources/dcat-us/

[8] McBride, B. (2004). The Resource Description Framework (RDF) and its Vocabulary Description Language
RDFS. Handbook on Ontologies, 51-65. https://doi.org/10.1007/978-3-540-24750-0_3

[9] Candan, K. S., Liu, H., & Suvarna, R. (2001). Resource description framework. ACM SIGKDD Explorations
Newsletter, 3(1), 6-19. https://doi.org/10.1145/507533.507536
7 changes: 7 additions & 0 deletions model/README.md
@@ -0,0 +1,7 @@
This folder contains files that define various kinds of data models supported by the oar-metadata software.

For more information...

* ...about the NERDm Schema Framework, see [README-NERDm.md](README-NERDm.md)
* check out NERDm examples in the examples subfolder

15 changes: 15 additions & 0 deletions oar-build/_dockbuild.sh
@@ -60,6 +60,21 @@ function setup_build {
BUILD_OPTS=`collect_build_opts`
}

function cp_ca_certs_to {
    # assuming we are in the docker dir
    [ \! -d cacerts ] || {
        crts=`compgen -G 'cacerts/*.crt' || true`
        [ -z "$crts" ] || {
            echo "${prog}: installing CA certs from docker/cacerts"
            for cont in $@; do
                mkdir -p $cont/cacerts
                echo '+' cp $crts cacerts/README.md $cont/cacerts
                cp $crts cacerts/README.md $cont/cacerts
            done
        }
    }
}

function help {
helpfile=$OAR_BUILD_DIR/dockbuild_help.txt
[ -f "$OAR_DOCKER_DIR/dockbuild_help.txt" ] && \
25 changes: 25 additions & 0 deletions python/nistoar/base/config.py
@@ -7,6 +7,8 @@
from collections.abc import Mapping
from urllib.parse import urlparse

import jsonpath_ng as jp

from . import OARException

oar_home = None
@@ -476,3 +478,26 @@ def lookup_config_server(serverport):
"""
raise NotImplementedError()

NO_VALUE=NotImplemented
RAISE=NO_VALUE
def hget_jp(obj: Mapping, path: str, default=None):
    """
    return the first value from within a hierarchical dictionary (e.g. JSON or config structure)
    that corresponds to a given location path. The location path is a JSONPath-compliant string
    (https://goessner.net/articles/JsonPath/). This function is intended for use with paths that
    uniquely locate data--i.e. resolve to only one value.
    :param dict obj: the dictionary to search for a matching value.
    :param str path: a string indicating the location of the value to return. This should be
                     a JSONPath-compliant string (where the initial "$." is optional)
    :param default: the value to return if the path does not resolve to an existing location
                     (None by default); pass RAISE to have a KeyError raised instead.
    :raises KeyError: if default is RAISE and the path does not resolve to an existing location.
    """
    try:
        return jp.parse(path).find(obj)[0].value
    except IndexError:
        if default is RAISE:
            raise KeyError(path)
        return default

hget = hget_jp
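
To illustrate the lookup and `RAISE` sentinel semantics of `hget` without requiring `jsonpath_ng`, here is a simplified stand-in. It is an assumption-laden sketch, not the committed implementation: the name `hget_dotted` is hypothetical, and it supports only dot-delimited keys (with an optional leading `$.`), not full JSONPath syntax such as array indexes or wildcards.

```python
from collections.abc import Mapping

RAISE = NotImplemented   # sentinel, mirroring the definition in the diff above

def hget_dotted(obj, path, default=None):
    """Simplified stand-in for hget: supports only dot-delimited keys."""
    if path.startswith("$."):
        path = path[2:]               # the leading "$." is optional
    current = obj
    for key in path.split("."):
        if isinstance(current, Mapping) and key in current:
            current = current[key]    # descend one level
        else:
            if default is RAISE:
                raise KeyError(path)
            return default
    return current

# Hypothetical configuration data, for illustration only
cfg = {"store": {"db": {"port": 27017}}}
port = hget_dotted(cfg, "store.db.port")
missing = hget_dotted(cfg, "store.db.host", "localhost")
```

Passing `RAISE` as the default turns a missing path into a `KeyError`, which can be useful when a configuration parameter is required rather than optional.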
