[BUG] UMAP fails to correctly embed small datasets when random_state is not set.
#6024
Comments
In deterministic mode, the iterations are processed differently. However, confirming if this explains the difference observed here would have to be investigated further. However, smaller datasets definitely require larger |
Running with higher Given that the example code above leaves most parameters up to their defaults, it should be the case that
Checking out where this warning likely originates from, I suspect this is related to the fact that neither
I can confirm that when either specifying |
Which version of RAPIDS are you using? This error should simply mean that the |
I'm using version 24.06, so that checks out.
Describe the bug
UMAP fails to correctly embed small datasets when random_state is not set (or, rather, when it is set to None). This affects dataset sizes smaller than roughly 90 samples.
Steps/Code to reproduce bug
Running any dataset under 90 samples through unseeded/seeded UMAP will be enough to reproduce this bug. Below is a simple example using make_blobs data.
The output looks something like this:
Expected behavior
Failing to pass a random state seed should not interfere with the quality of UMAP embeddings.
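One way to put a number on "quality of UMAP embeddings" is to measure how many of each point's nearest neighbors survive the projection. The block below is a minimal sketch of that idea, not code from this issue: `neighbor_preservation` and `compare_seeds` are hypothetical helper names, the blob parameters (80 samples, 10 features, 3 centers) are assumptions chosen to sit under the reported 90-sample threshold, and `compare_seeds` assumes a GPU environment with cuML and scikit-learn installed.

```python
import math


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to p.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def neighbor_preservation(original, embedded, k=5):
    """Mean fraction of each point's k nearest neighbors kept by the embedding."""
    before = knn_indices(original, k)
    after = knn_indices(embedded, k)
    return sum(len(b & a) / k for b, a in zip(before, after)) / len(original)


def compare_seeds(n_samples=80, seeds=(None, 42)):
    """Hypothetical helper: score unseeded vs. seeded cuML UMAP on blob data.

    Imports are local so the metric above stays usable without a GPU.
    """
    from cuml.manifold import UMAP
    from sklearn.datasets import make_blobs

    # Assumed toy dataset; the issue's original parameters are not shown.
    X, _ = make_blobs(n_samples=n_samples, n_features=10, centers=3, random_state=0)
    scores = {}
    for seed in seeds:
        embedding = UMAP(random_state=seed).fit_transform(X)
        scores[seed] = neighbor_preservation(X.tolist(), embedding.tolist())
    return scores
```

If the expectation above holds, the scores returned by `compare_seeds()` for `None` and for a fixed seed should be close to each other. A large gap between the two on datasets under roughly 90 samples would be consistent with the behavior described in this report.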
Environment details (please complete the following information):
Additional context
This behavior is independent of the number of epochs and of the choice of initialization algorithm.
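A claim like this can be checked with a small parameter sweep. The following is a rough sketch under stated assumptions rather than the procedure actually used here: `parameter_grid` and `run_sweep` are hypothetical names, the epoch counts and the seed value 42 are arbitrary illustrative choices, and `run_sweep` assumes cuML and scikit-learn are available. It scores each embedding with scikit-learn's `trustworthiness`.

```python
from itertools import product


def parameter_grid(
    n_epochs_values=(200, 500, 2000),    # assumed values, for illustration only
    init_values=("spectral", "random"),  # the two initializations to compare
    seeds=(None, 42),                    # unseeded vs. an arbitrary fixed seed
):
    """Yield one UMAP keyword-argument dict per parameter combination."""
    for n_epochs, init, seed in product(n_epochs_values, init_values, seeds):
        yield {"n_epochs": n_epochs, "init": init, "random_state": seed}


def run_sweep(n_samples=80):
    """Hypothetical helper: embed a small blob dataset under every combination.

    Needs a GPU with cuML installed; imports are local so the grid itself
    can be inspected on any machine.
    """
    from cuml.manifold import UMAP
    from sklearn.datasets import make_blobs
    from sklearn.manifold import trustworthiness

    # Assumed toy dataset below the roughly 90-sample threshold.
    X, _ = make_blobs(n_samples=n_samples, n_features=10, centers=3, random_state=0)
    results = []
    for params in parameter_grid():
        embedding = UMAP(**params).fit_transform(X)
        results.append((params, trustworthiness(X, embedding, n_neighbors=5)))
    return results
```

If the scores for `random_state=None` stay low across every `n_epochs` and `init` value while the seeded runs score well, that would support the observation that neither parameter is responsible.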