
feat: Update Tile Pre-Processor to support more modes #6558

Open · wants to merge 4 commits into main from tile-preprocessor-sdxl

Conversation

@blessedcoolant (Collaborator) commented Jun 29, 2024

Summary

Add support for more methods of resampling to work with Tile ControlNet models.

  • regular - The default resampler: simply downsamples the input image by the given rate.
  • blur - A new guided-filter-based mode that works with the newly available Tile ControlNet models for SDXL.
  • var - Reportedly used for image variations.
  • super - TODO: intended for super-size (upscaling) workflows, but probably should not be in this particular node anyway.
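The mode dispatch described above could be sketched roughly as follows. This is an illustrative sketch only: the function and parameter names are assumptions, the real node uses cv2-based helpers, and a box blur stands in for the guided filter here.

```python
import numpy as np

def tile_preprocess(np_img: np.ndarray, mode: str, down_sampling_rate: int = 2) -> np.ndarray:
    """Toy sketch of the mode dispatch; names are illustrative, not the PR's code."""
    if mode == "regular":
        # Plain downsample by striding; the real node uses cv2.resize
        # and accepts a float down-sampling rate.
        return np_img[::down_sampling_rate, ::down_sampling_rate]
    if mode == "blur":
        # A 3x3 box blur stands in for the guided-filter path here.
        h, w = np_img.shape[:2]
        padded = np.pad(np_img.astype(np.float32), ((1, 1), (1, 1), (0, 0)), mode="edge")
        out = np.zeros((h, w, np_img.shape[2]), dtype=np.float32)
        for dy in range(3):
            for dx in range(3):
                out += padded[dy:dy + h, dx:dx + w]
        return (out / 9.0).astype(np_img.dtype)
    raise ValueError(f"unsupported mode: {mode}")
```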

The Tile Image Processor is now also available in the Linear UI.

Related Issues / Discussions

QA Instructions

Merge Plan

Test and Merge.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

@github-actions bot added labels python, invocations, backend, frontend (Jun 29, 2024)
@blessedcoolant force-pushed the tile-preprocessor-sdxl branch 2 times, most recently from fd8a155 to 2911149 (June 29, 2024 14:26)
@blessedcoolant marked this pull request as ready for review (June 29, 2024 14:26)
@blessedcoolant force-pushed the tile-preprocessor-sdxl branch 2 times, most recently from 3dc5566 to 9a23dd2 (June 29, 2024 14:50)
@RyanJDick RyanJDick left a comment

I did an initial high-level pass. I haven't actually looked at the specifics of how the new modes are implemented yet - I'll do that once some of these initial questions are answered.

Have you done any testing with the new modes? Can you give guidance on when you might want to choose one over another?

invokeai/app/invocations/controlnet_image_processors.py — several review threads (outdated, resolved)

if self.mode == "regular":
    np_img = HWC3(np_img)
    if self.down_sampling_rate < 1.1:
Collaborator:

I realize that you didn't change this code, but do you know why 1.1 was chosen as the cut-off? It seems like we intentionally chose to allow float down-sampling rates, implying that a value such as 1.07 could be a valid rate.
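For illustration, the guard being discussed behaves like this (a cv2-free sketch with an assumed function name, not the node's actual code; nearest-neighbour indexing stands in for cv2.resize):

```python
import numpy as np

def maybe_downsample(np_img: np.ndarray, rate: float) -> np.ndarray:
    """Rates below 1.1 are a no-op, so an otherwise-valid float
    rate like 1.07 never triggers a resize."""
    if rate < 1.1:
        return np_img
    h, w = np_img.shape[:2]
    new_h, new_w = int(h / rate), int(w / rate)
    # Nearest-neighbour sampling of the source grid.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return np_img[rows[:, None], cols]
```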

Collaborator (Author):

I am not sure why this was done originally. I'll need to check.

invokeai/app/invocations/controlnet_image_processors.py — review threads (outdated, resolved)
Comment on lines 531 to 535
if random.random() > 0.5:
    np_img = self.apply_gaussian_blur(np_img, ksize=int(blur_strength), sigmaX=blur_strength / 2)

if random.random() > 0.5:
    np_img = self.apply_guided_filter(np_img, radius, eps, int(scale_factor))
Collaborator:

Why non-deterministic? Should we make these two different modes?

Collaborator (Author):

Also, this particular model seems to prefer it: https://huggingface.co/xinsir/controlnet-tile-sdxl-1.0

@@ -0,0 +1,283 @@
# ruff: noqa: E741
Collaborator:

This file appears to have been copied from somewhere. Can you link to the original?

Also, why use this implementation over cv2.ximgproc.guidedFilter?

Collaborator (Author):

The PR was made to support this new Tile model: https://huggingface.co/xinsir/controlnet-tile-sdxl-1.0

That model uses a custom version of guidedFilter which its authors claim is faster. I have not benchmarked this myself. Functionally, using the inbuilt guidedFilter from cv2 should also work just fine, imo.

Collaborator:

Seems like it would be easy to test for both speed and behavior. I think that's worth doing so we can decide whether it's worth bringing in this file.

If we do decide that this file is better than the cv2 version, then we should:

  • Explain why we are using this over cv2 at the top of this file.
  • Permalink to the source that this file was copied from.
  • Remove import cv2.ximgproc from the invocation file.
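For the speed/behavior comparison suggested above, the standard single-channel guided filter (He et al., 2010) is only a few lines of NumPy and could serve as a neutral correctness baseline. This is a minimal sketch under that assumption, not the copied file's implementation or cv2's:

```python
import numpy as np

def _box(x: np.ndarray, r: int) -> np.ndarray:
    """Mean filter of radius r via an integral image (O(1) per pixel)."""
    h, w = x.shape
    k = 2 * r + 1
    padded = np.pad(x, r, mode="edge")
    integral = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return (
        integral[k:, k:] - integral[:-k, k:] - integral[k:, :-k] + integral[:-k, :-k]
    ) / (k * k)

def guided_filter(guide: np.ndarray, src: np.ndarray, r: int = 4, eps: float = 1e-2) -> np.ndarray:
    """Single-channel guided filter: per-window linear model a*guide + b."""
    mean_i = _box(guide, r)
    mean_p = _box(src, r)
    cov_ip = _box(guide * src, r) - mean_i * mean_p
    var_i = _box(guide * guide, r) - mean_i * mean_i
    a = cov_ip / (var_i + eps)
    b = mean_p - a * mean_i
    # Average the per-window coefficients, then reconstruct the output.
    return _box(a, r) * guide + _box(b, r)
```

Timing this against both the copied file and cv2.ximgproc.guidedFilter on the same inputs would answer the speed question directly.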

@RyanJDick RyanJDick left a comment

I reviewed the sampling mode implementations.

Do you have test results that you can share for the new modes? Are they empirically better?

Comment on lines +518 to +520
# referenced from
# https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic/blob/37f1c4575b543fb2036e39f5763d082fdd135318/TTP_tile_preprocessor_v5.py
def _blur_resample(self, np_img: np.ndarray[Any, Any]):
Collaborator:

This function does not seem to match anything in the linked file. Is the link wrong?

Comment on lines +526 to +535
blur_strength = random.sample([i / 10.0 for i in range(10, 201, 2)], k=1)[0]
radius = random.sample([i for i in range(1, 40, 2)], k=1)[0]  # noqa: C416
eps = random.sample([i / 1000.0 for i in range(1, 101, 2)], k=1)[0]
scale_factor = random.sample([i / 10.0 for i in range(10, 181, 5)], k=1)[0]

if random.random() > 0.5:
    np_img = self._apply_gaussian_blur(np_img, ksize=int(blur_strength), sigma_x=blur_strength / 2)

if random.random() > 0.5:
    np_img = self._apply_guided_filter(np_img, radius, eps, int(scale_factor))
Collaborator:

All of the random sampling in this block makes it look like experimental code. Nodes should be deterministic, so we need to do one of the following:

  • Hard-code these values to good defaults.
  • Expose them as parameters.
  • Add a seed. (This seems like a bad idea. I can't think of a good reason for a preprocessor to need a seed.)

Comment on lines +521 to +524
height, width, _ = np_img.shape
ratio = np.sqrt(1024.0 * 1024.0 / (width * height))
resize_w, resize_h = int(width * ratio), int(height * ratio)
np_img = cv2.resize(np_img, (resize_w, resize_h))
Collaborator:

It looks like the first step is to resize to match the area 1024x1024. Why? Is this SDXL-specific? Does this interact in some important way with one of the later hard-coded values?

Comment on lines +544 to +547
height, width, _ = np_img.shape
ratio = np.sqrt(1024.0 * 1024.0 / (width * height))
resize_w, resize_h = int(width * ratio) // 48 * 48, int(height * ratio) // 48 * 48
np_img = cv2.resize(np_img, (resize_w, resize_h))
Collaborator:

Same question as above re: why 1024x1024?

And why force it to be a multiple of 48?
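For concreteness, the resize arithmetic being questioned reduces to the following (a restatement of the diff's math, not new behavior; 1024x1024 is SDXL's native training resolution, which may be the rationale for the area target, while the choice of 48 remains the open question):

```python
import math

def latent_friendly_size(width: int, height: int,
                         area: int = 1024 * 1024, multiple: int = 48) -> tuple[int, int]:
    """Scale to a fixed pixel area while preserving aspect ratio,
    then floor each side to a multiple of `multiple`."""
    ratio = math.sqrt(area / (width * height))
    w = int(width * ratio) // multiple * multiple
    h = int(height * ratio) // multiple * multiple
    return w, h
```

Note that the flooring step means even an exact 1024x1024 input comes out at 1008x1008, since 1024 is not a multiple of 48.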
