-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added method to GoogleBigQuery to copy between projects #1111
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes requested - happy to chat more about any of my review if you'd like. Also, ideally there would be a new test for this new method.
parsons/google/google_bigquery.py
Outdated
except Exception as e: | ||
print("Exception during copy between projects", e) | ||
else: | ||
print("BigQuery copy failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we hit this failure after creating the dataset in the first try/except, the attempt might fail but still have a side effect of creating the dataset, right? That feels a little unpredictable although it wouldn't necessarily cause problems because this code doesn't try to re-create an already existing table. Still, you might want to either leave a note about order of execution so that it's clear that a failure might still have created a dataset or restructure this so we're checking for both failure modes at once rather than sequentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I shouldn't do the try/except at all but just create the job and if there's a general exception just let the computer say so. (I think this would be an example of your guidance below re general exceptions?)
parsons/google/google_bigquery.py
Outdated
dataset = bigquery.Dataset(dataset_id) | ||
dataset = self.client.create_dataset(dataset, timeout=30) | ||
else: # fail | ||
print("BigQuery copy failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to use the logger here (and throughout the method) rather than print. Here's an example of the logger in this file - if you've not used it before and are unfamiliar with log levels happy to explain a bit more how they work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks--this looks like an error level (from what I know)--does that sound right?
parsons/google/google_bigquery.py
Outdated
result = job.result() | ||
print(result) | ||
except Exception as e: | ||
print("Exception during copy between projects", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it makes sense to capture the NotFound exceptions and log the results without actually raising the exception, this exception here just captures any exception and saves the feedback. Generally speaking we want to raise any unexpected errors so we can notice and fix whatever the problem is sooner rather than later, so I would remove this specific try/except block and just call the code normally. If there's an exception, there's an exception - and we want to know about it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, in software development, "don't catch general exceptions" is a good rule of thumb. This blog post has some good details of why.
No description provided.