Failed to use TunedModel with precomputed-SVM #1141
Comments
Thanks @KeishiS for the positive feedback and for posting. I'm afraid that when MLJTuning (or MLJBase's `evaluate!`) resamples the data, only the rows are resampled, so the Gram matrix is never reduced to the blocks a precomputed kernel needs.

It looks like you may have better luck with the LIBSVM version of the model (which also has an MLJ interface). In that case you can pass a kernel function rather than an explicit matrix, which won't suffer from this issue, right? Would this suit your purpose?

For the record, it is theoretically possible to fix the sk-learn API. The proper interface point for "metadata" that needs to be resampled is to pass it along with the data. So, a corrected workflow would look something like

```julia
mach = machine(SVC(), X, y, kernel)
evaluate!(mach, resampling=...)
```

To implement this would also require adding a "data front end" to the MLJ interface, to articulate exactly how the resampling is to be done, because the default resampling of arrays (just resampling the rows) doesn't work in this case. Unfortunately, the MLJ sk-learn interfaces are created with a lot of metaprogramming and are therefore difficult to customise. So a fix here would be complicated.
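The "data front end" hooks mentioned above live in MLJModelInterface. The sketch below is only a rough illustration under assumptions: `PrecomputedSVC` is a hypothetical stand-in type, not the actual sk-learn wrapper. It shows the two methods a front end overloads, and the comments note why the test-fold slicing is the awkward part.

```julia
import MLJModelInterface as MMI

# Hypothetical stand-in model type, just to show where the hooks attach.
struct PrecomputedSVC <: MMI.Deterministic end

# 1. Convert the user-supplied arguments into a model-specific representation.
MMI.reformat(::PrecomputedSVC, gmat, y) = (Matrix(gmat), y)

# 2. Say how that representation is resampled for a fold with row indices I.
#    Fitting on a precomputed kernel needs the block gmat[I, I]; prediction on
#    a test fold would additionally need the training indices (gmat[test, train]),
#    which this hook alone cannot supply, and that is part of why the fix is
#    not straightforward.
MMI.selectrows(::PrecomputedSVC, I, gmat, y) = (gmat[I, I], y[I])
```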
Thank you for your reply! 😄 I wasn't familiar with the concept of a "data front end", so I'll take some time to study the information at the link you provided. While my example code creates a Gram matrix from simple toy data, I'm actually considering using a graph kernel, where computing the kernel for multiple graphs in parallel would be more efficient. That's why I was hoping to use it as a precomputed kernel if possible. I appreciate your suggestion of the LIBSVM-based model. Based on the information you've provided, I'll think about whether there might be a good alternative approach. For now, I'll close this issue. Thank you very much for taking the time to address my concerns.
First of all, thank you for the great work you're doing in maintaining this project. I encountered what seems to be a bug when attempting to use a support vector classifier with a precomputed Gram matrix while performing hyperparameter tuning with `TunedModel`. I would like to submit a pull request to address the issue, but I'm unsure which part of the codebase needs modification. Any advice would be greatly appreciated.

Describe the bug
When performing parameter search with TunedModel on an SVM with a precomputed kernel, the data splitting is not carried out properly.
To Reproduce
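A minimal script along the following lines triggers the problem. This is only an illustrative sketch: the model name (`SVMClassifier`) and the hyperparameter names (`kernel`, `C`) are assumptions mirroring the scikit-learn wrapper and may differ from the registered names.

```julia
using MLJ

# Model and hyperparameter names are assumed to mirror scikit-learn's SVC.
SVC = @load SVMClassifier pkg=MLJScikitLearnInterface

n = 100
X = randn(n, 5)
y = coerce(rand(["a", "b"], n), Multiclass)
gmat = X * X'                       # n×n precomputed Gram matrix

model = SVC(kernel="precomputed")
r = range(model, :C, lower=0.1, upper=10.0)
tuned = TunedModel(model=model, range=r, tuning=Grid(),
                   resampling=CV(nfolds=3), measure=accuracy)

mach = machine(tuned, MLJ.table(gmat), y)
fit!(mach)    # each fold receives gmat[train, :] / gmat[test, :], not the
              # gmat[train, train] / gmat[test, train] blocks an SVM expects
```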
Expected behavior
During the process of searching for the best params, the Gram matrix `gmat` is divided into training data and test data. We expect `gmat[train_idx, train_idx]` and `gmat[test_idx, train_idx]` to be created. However, the current code splits it into `gmat[train_idx, :]` and `gmat[test_idx, :]`. This operation is executed in the `fit_and_extract_on_fold` function in `MLJBase.jl/src/resampling.jl`.
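To make the expected split concrete, here is a minimal, self-contained sketch (plain Julia indexing, independent of MLJ internals) contrasting the blocks a precomputed-kernel SVM needs with what row-only resampling currently yields:

```julia
n = 10
X = randn(n, 3)
gmat = X * X'                               # n×n Gram matrix

train_idx, test_idx = 1:7, 8:10

# Blocks a precomputed-kernel SVM needs:
gmat_fit     = gmat[train_idx, train_idx]   # 7×7, kernel between training points
gmat_predict = gmat[test_idx, train_idx]    # 3×7, kernel between test and training points

# What row-only resampling currently produces:
gmat_fit_bad     = gmat[train_idx, :]       # 7×10, columns still refer to test points
gmat_predict_bad = gmat[test_idx, :]        # 3×10
```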
Versions
I would be grateful for any advice on how to approach solving this issue. Thank you for taking the time to read and consider this matter!