Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table with Null engine not created on cluster #327

Open
benjamin-awd opened this issue Jul 26, 2024 · 4 comments · May be fixed by #328
Open

Table with Null engine not created on cluster #327

benjamin-awd opened this issue Jul 26, 2024 · 4 comments · May be fixed by #328
Labels
bug Something isn't working

Comments

@benjamin-awd
Copy link

benjamin-awd commented Jul 26, 2024

Describe the bug

Tables with null engines are not created using the ON CLUSTER clause

Steps to reproduce

  1. Attempt to create a relation with a null engine e.g.
{{
    config(
        engine="Null",
        materialized="materialized_view",
        schema="default",
        cluster="default"
    )
}}

select * from {{ ref("stg_testing") }}
  1. Materialized view + target table is only created on a single server, and not across the entire cluster

Expected behaviour

There should be an option to allow a null engine table to be created on cluster

Code examples, such as models or profile settings

Also tried defining the cluster via profiles.yml, but this makes no difference

@benjamin-awd benjamin-awd added the bug Something isn't working label Jul 26, 2024
@benjamin-awd
Copy link
Author

I figure the quick fix for this is to update the logic in get_on_cluster():

    @classmethod
    def get_on_cluster(
        cls: Type[Self], cluster: str = '', materialized: str = '', engine: str = ''
    ) -> bool:
        if cluster.strip():
            return (
                materialized in ('view', 'dictionary')
                or 'distributed' in materialized
                or 'Replicated' in engine
+               or engine in ('Null')
            )

        else:
            return False

@benjamin-awd
Copy link
Author

Another relevant issue: #237

@emirkmo
Copy link

emirkmo commented Jul 26, 2024

The current on_cluster macro is really overloaded. It is a conditional macro that checks for things other than cluster.

For example, I wanted to use it in a post-hook for configuring Row Policies but basically couldn't.

Perhaps instead of adding more and more cases to the if statement, we can refactor it to work by default when called, and fix the problems from its misuse in other macros?

Or we should leave the current one as is since it is used everywhere, but make a real_on_cluster_macro that does not do any checking of the relation, just returns the cluster correctly formatted as 'ON CLUSTER <cluster>' if cluster exists... :D . The current one can use the real_on_cluster_macro under the hood. It should really be called conditional_on_cluster_macro since it conditionally returns on_cluster if some conditions not specified in the README are met.

This is a sincere suggestion, what do you think @benjamin-awd? Then you could use it with any type of Engine etc. and the conditional one can be used by those implementations that need more logic such as distributed/Replicated materializations..

@benjamin-awd benjamin-awd linked a pull request Jul 28, 2024 that will close this issue
3 tasks
@benjamin-awd
Copy link
Author

benjamin-awd commented Jul 28, 2024

@emirkmo I ended up creating a force_on_cluster config argument that should add the ON CLUSTER clause regardless of engine or materialization type in #328

Perhaps instead of adding more and more cases to the if statement, we can refactor it to work by default when called, and fix the problems from its misuse in other macros?

This is a good point -- I prefer that option (something like this) that adds the ON CLUSTER clause to everything by default if the cluster argument is defined but I'm not sure what consequences that would have.

As a user, when I specify the cluster argument in the config, I'd expect my relations to be created on the cluster. If a user wants to only create a model on a single connected shard, then they should remove the cluster argument from their model (or profiles).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants