
[Docs] data quality -- freshness #10612

Open · wants to merge 5 commits into base: develop

Conversation

Quantisan (Contributor)

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

For more information about contributing, visit our community resources.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

Commits:

  • Key freshness Expectations refer to sample data
  • ideas for scenarios
  • ideas for pitfalls
  • draft for intro and conclusion
  • draft for Example
  • copy edit Data preview
  • section transitions, and conclusion
  • update Example to be more standard
  • copy edit Scenarios
  • copy edit Pitfalls as list
  • add 'Factors to consider when setting freshness thresholds' section
  • heading casing
  • shorten intro
  • re-write Factors to consider as a checklist
  • streamline intro
  • add tip on When to use MaxToBeBetween vs MinToBeBetween (see the sketch after this list)
  • shorten 'Factors to consider' checklist
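
For context on that tip: ExpectColumnMaxToBeBetween bounds the newest timestamp in a column (is the data fresh enough?), while ExpectColumnMinToBeBetween bounds the oldest (does the data reach back far enough?). Below is a minimal sketch of both, assuming the GX 1.x Core API and a made-up in-memory orders dataframe with a last_updated column; it illustrates the pattern and is not the example from the docs page under review.

from datetime import datetime, timedelta

import pandas as pd
import great_expectations as gx

context = gx.get_context()

# Register an in-memory dataframe as a throwaway demo asset
# ("demo", "orders", and "orders_batch" are arbitrary names).
batch_definition = (
    context.data_sources.add_pandas(name="demo")
    .add_dataframe_asset(name="orders")
    .add_batch_definition_whole_dataframe(name="orders_batch")
)
df = pd.DataFrame(
    {
        "last_updated": [
            datetime.now() - timedelta(hours=2),
            datetime.now() - timedelta(days=3),
        ]
    }
)
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Freshness: the newest record must be no older than 24 hours.
freshness = gx.expectations.ExpectColumnMaxToBeBetween(
    column="last_updated",
    min_value=datetime.now() - timedelta(hours=24),
)

# Coverage: the oldest record must fall inside a 30-day window.
coverage = gx.expectations.ExpectColumnMinToBeBetween(
    column="last_updated",
    min_value=datetime.now() - timedelta(days=30),
)

for expectation in (freshness, coverage):
    print(batch.validate(expectation).success)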

netlify bot commented Nov 1, 2024

Deploy Preview for niobium-lead-7998 failed.

🔨 Latest commit: 7fa5012
🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/67246fff9dd34100089f70f0


codecov bot commented Nov 1, 2024

❌ 1 test failed:

Tests completed: 9688 · Failed: 1 · Passed: 9687 · Skipped: 740
Top failed test (by shortest run time):
tests.expectations.core.test_expect_column_values_to_be_of_type::test_expect_column_values_to_be_in_set_render_performance

Stack trace (2.02s run time):
@pytest.mark.unit
def test_expect_column_values_to_be_in_set_render_performance():
    """
    This test prevents a historical bug in which rendering took ~10 seconds to render 400 items.
    """

    large_number = 400

    x = ExpectColumnValuesToBeInSet(
        column="foo_column_name", value_set=["foo" for _ in range(large_number)]
    )

>   duration_s = timeit.timeit(x.render, number=1)

.../expectations/core/test_expect_column_values_to_be_of_type.py:118: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12/timeit.py:237: in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12/timeit.py:180: in timeit
    timing = self.inner(it, self.timer)
<timeit-src>:6: in inner
    ???
great_expectations/expectations/expectation.py:434: in render
    self.rendered_content = inline_renderer.get_rendered_content()
.../render/renderer/inline_renderer.py:217: in get_rendered_content
    return self._get_atomic_rendered_content_for_object(render_object=render_object)
.../render/renderer/inline_renderer.py:94: in _get_atomic_rendered_content_for_object
    self._get_atomic_rendered_content_from_renderer_names(
.../render/renderer/inline_renderer.py:126: in _get_atomic_rendered_content_from_renderer_names
    renderer_rendered_content = self._get_renderer_atomic_rendered_content(
.../render/renderer/inline_renderer.py:147: in _get_renderer_atomic_rendered_content
    renderer_rendered_content = InlineRenderer._get_rendered_content_from_renderer_impl(
.../render/renderer/inline_renderer.py:199: in _get_rendered_content_from_renderer_impl
    renderer_rendered_content = renderer_fn(configuration=render_object)
.../render/renderer/renderer.py:28: in inner_func
    return renderer_fn(*args, **kwargs)
great_expectations/expectations/expectation.py:137: in inner_func
    rendered_string_template = render_func(*args, **kwargs)
great_expectations/expectations/expectation.py:576: in _prescriptive_summary
    "params": renderer_configuration.params.dict(),
great_expectations/render/renderer_configuration.py:97: in dict
    return super().dict(
.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12.../pydantic/v1/main.py:449: in dict
    return dict(
.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12.../pydantic/v1/main.py:866: in _iter
    v = self._get_value(
.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12.../pydantic/v1/main.py:742: in _get_value
    v_dict = v.dict(
great_expectations/render/renderer_configuration.py:97: in dict
    return super().dict(
.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12.../pydantic/v1/main.py:449: in dict
    return dict(
.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12.../pydantic/v1/main.py:866: in _iter
    v = self._get_value(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'pydantic.v1.main.v__315'>, v = 'foo', to_dict = True
by_alias = True, include = None, exclude = None, exclude_unset = False
exclude_defaults = False, exclude_none = True

>   @classmethod
E   Failed: Timeout >2.0s

.../hostedtoolcache/Python/3.12.7......................../x64/lib/python3.12.../pydantic/v1/main.py:727: Failed

To compare individual test run times against the main branch, see the Test Analytics Dashboard.
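
A note on reading the trace: the suite enforces a time budget (the "Failed: Timeout >2.0s" line), so the traceback ends wherever execution happened to be when the budget expired, here inside pydantic's dict serialization, rather than at a failing assertion. A minimal, self-contained sketch of the same guard pattern, assuming the pytest-timeout plugin (the plugin and all names below are assumptions, not the repo's actual configuration):

import time
import timeit

import pytest


def render_fn():
    # Hypothetical stand-in for the real Expectation.render call,
    # so the sketch runs on its own.
    time.sleep(0.01)


@pytest.mark.timeout(2.0)  # assumption: pytest-timeout supplies this marker
def test_render_stays_fast():
    # Fail fast if a rendering regression pushes the call past the budget.
    duration_s = timeit.timeit(render_fn, number=1)
    assert duration_s < 1.5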
