Recursive Feature Elimination RFE - Feature Request? #426

Sinansi · 2020-01-24T14:27:17Z

Hello,

I am doing feature selection using Scikit Learn Recursive Feature Elimination RFE.
This algorithm takes ages in Python. I searched on Julia Observer and couldn't find any equivalent in Julia. Is it true that Julia has no RFE implementation? If so, could it be considered as a feature request?

Thank you!

ablaom · 2020-01-26T19:46:26Z

Do you have a good reason to believe this would be any faster if implemented in Julia? I would guess that most of the work in RFE is training the basic model (over and over). If that model is wrapped C code (the typical case in sklearn) then this ought to be close to optimal, no?

I ask because the expedient thing to do might be just wrap the python function, to start with.

Sinansi · 2020-02-12T19:26:22Z

@ablaom thanks for your reply. RFE is a very powerful feature selection technique. Honestly, I find it to be the best thus far because it considers the nature of your specific dataset, and it is perhaps the only selection technique that advises you on the number of features to use.

Unfortunately, I don't have solid evidence that implementing RFE in Julia will be faster than the one in Python (sklearn), but I see that Julia is generally faster than Python, especially for processing that requires a lot of loops.

Meanwhile, I totally support the idea of wrapping the Python function. That sounds like a great start.

ablaom · 2020-10-11T21:25:52Z

See also JuliaAI/MLJModels.jl#314

ablaom · 2021-02-28T22:57:06Z

Worth mentioning that Caret has a variant with resampling, in addition to a vanilla version that looks the same as the python one. We should start with the vanilla one.

https://topepo.github.io/caret/recursive-feature-elimination.html

ablaom · 2021-03-01T00:03:20Z

RFE applies a supervised model that must report feature importances (could be coefficients, in case of a linear model) but there is not a uniform interface for this at present. We should also keep in mind that some models report several different types of scores. I've opened #747 to address this.

In the meantime, a "getter" function (acting on the model's report) could do, as in the sk-learn implementation.
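For illustration only, a minimal dependency-free sketch of vanilla RFE driven by such a getter might look as follows. The names `fit_linear`, `importances` and `rfe` are hypothetical and not part of MLJ or sklearn. Ordinary least squares stands in for the supervised model, and absolute coefficients stand in for the reported importances (which assumes the features are on comparable scales).

```julia
using LinearAlgebra

# Hypothetical stand-in for a supervised learner: ordinary least squares.
# Returns one coefficient per column of X.
fit_linear(X::AbstractMatrix, y::AbstractVector) = X \ y

# Hypothetical "getter": maps a fitted model to one importance per feature.
importances(coefs::AbstractVector) = abs.(coefs)

# Vanilla RFE sketch: refit on the surviving columns, score them with `getter`,
# and drop the lowest-scoring ones, at most `step` per round, until
# `n_features` remain. Returns the surviving column indices in original order.
function rfe(X::AbstractMatrix, y::AbstractVector;
             n_features::Int, step::Int = 1,
             fit = fit_linear, getter = importances)
    1 <= n_features <= size(X, 2) ||
        throw(ArgumentError("n_features must be between 1 and size(X, 2)"))
    step >= 1 || throw(ArgumentError("step must be positive"))
    kept = collect(1:size(X, 2))
    while length(kept) > n_features
        model = fit(X[:, kept], y)               # the repeated, expensive part
        scores = getter(model)
        ndrop = min(step, length(kept) - n_features)
        worst = partialsortperm(scores, 1:ndrop) # positions of the lowest scores
        deleteat!(kept, sort(worst))             # deleteat! needs sorted positions
    end
    return kept
end

# Noise-free toy data: only columns 1 and 3 influence the target.
X = [sin(i * j) for i in 1:20, j in 1:4]
y = 3 .* X[:, 1] .- 2 .* X[:, 3]

selected = rfe(X, y; n_features = 2)   # columns 1 and 3 survive
```

Passing a different `fit`/`getter` pair swaps in another model without touching the loop, which is roughly the role a uniform feature-importance interface would play.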

Also, maybe it's worth having an option to use Shapley values (eg https://github.com/nredell/ShapML.jl) ??

Sinansi · 2021-03-03T18:01:56Z

Yes please, the vanilla version is a good start.

ablaom · 2021-03-04T21:50:55Z

Some design notes to self or whoever ultimately picks this up:

I propose this be a Supervised model wrapper with a transform method that transforms a new table of input features to the reduced set (for use in conjunction with other models). The wrapper could be trained on the final selection of reduced features, so that predict is based on those, enabling easy out-of-sample evaluation of the feature reduction, with an opt-out option for final train (like in TunedModel).

(Structurally such a wrapper would look similar to other wrappers, like TunedModel or EnsembleModel. A technical detail is that under the hood there need to be two parameterised types, ProbabilisticRFEModel{M} and DeterministicRFEModel{M}, as we do in those examples.)

@boliu-christine

ablaom · 2021-12-21T20:03:03Z

On second thoughts, for consistency with other feature reduction strategies, a wrapper is probably not advised. We should simply implement this as an Unsupervised model which includes the target as a training argument. See #874.

vboussange · 2023-10-27T15:56:22Z

Hey, any updates on this topic?
I may start working on a PR if there have not been any advances yet - taking sklearn as a template. Feel free to provide input on how you would proceed if some of you have already given it some deeper thought!

ablaom · 2023-10-29T20:33:20Z

Yes, it would be great to get help with this. I don't have any new thoughts on this, and actually haven't been thinking about it recently. Possibly @OkonSamuel, who worked on the feature importance API, may want to comment.

Here is the feature importance API: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Feature-importances . Let me know if you have any questions about it.

My preference would be for a new package, rather than a PR here. It could still be part of the standard MLJ.jl install, and be integrated in documentation. As I say above, RFE would probably be implemented as an Unsupervised model, but one that includes the target y in the training data, as in MLJModelInterface.fit(rfe_model, verbosity, X, y), unlike existing Unsupervised models (of which you can find examples in MLJModels/src/builtins/transformers.jl).
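To make the shape of that proposal concrete, here is a rough, dependency-free sketch. `Unsupervised`, `fit` and `transform` below are local stand-ins that only mirror the MLJModelInterface signatures (in a real package one would subtype `MLJModelInterface.Unsupervised` and extend `MLJModelInterface.fit` and `MLJModelInterface.transform`). `ToySelector` is hypothetical, and its correlation-based ranking is a placeholder rather than RFE. The table is assumed to be a NamedTuple of column vectors.

```julia
using Statistics

# Local stand-in for MLJModelInterface.Unsupervised (assumption: no MLJ loaded).
abstract type Unsupervised end

# Hypothetical selector: keeps the `n_features` columns most correlated with y.
struct ToySelector <: Unsupervised
    n_features::Int
end

# `fit` receives the target `y` even though the model is a transformer.
# It returns the usual (fitresult, cache, report) triple.
function fit(model::ToySelector, verbosity::Int, X::NamedTuple, y::AbstractVector)
    colnames = collect(keys(X))
    scores = [abs(cor(X[name], y)) for name in colnames]
    order = sortperm(scores; rev = true)       # best-scoring columns first
    keep = sort(order[1:model.n_features])     # restore original column order
    fitresult = colnames[keep]
    verbosity > 0 && @info "Selected features" fitresult
    cache = nothing
    report = (scores = NamedTuple{Tuple(colnames)}(Tuple(scores)),)
    return fitresult, cache, report
end

# `transform` needs only inputs: it projects a new table onto the selection.
transform(::ToySelector, fitresult, Xnew::NamedTuple) =
    NamedTuple{Tuple(fitresult)}(Xnew)

X = (a = [1.0, 2.0, 3.0, 4.0],
     b = [1.0, -1.0, 1.0, -1.0],
     c = [2.0, 1.0, 4.0, 3.0])
y = [1.1, 2.0, 2.9, 4.2]

selector = ToySelector(2)
fitresult, _, report = fit(selector, 0, X, y)
Xsmall = transform(selector, fitresult, X)     # keeps columns :a and :c
```

The point of the split is that the target is needed only at training time, so the fitted selector can be applied to new inputs like any other transformer.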

For testing, here are supervised models reporting feature importances:

julia> models() do m
       m.reports_feature_importances
       end
15-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = AdaBoostStumpClassifier, package_name = DecisionTree, ... )
 (name = CatBoostClassifier, package_name = CatBoost, ... )
 (name = CatBoostRegressor, package_name = CatBoost, ... )
 (name = DecisionTreeClassifier, package_name = DecisionTree, ... )
 (name = DecisionTreeRegressor, package_name = DecisionTree, ... )
 (name = EvoTreeClassifier, package_name = EvoTrees, ... )
 (name = EvoTreeCount, package_name = EvoTrees, ... )
 (name = EvoTreeGaussian, package_name = EvoTrees, ... )
 (name = EvoTreeMLE, package_name = EvoTrees, ... )
 (name = EvoTreeRegressor, package_name = EvoTrees, ... )
 (name = RandomForestClassifier, package_name = DecisionTree, ... )
 (name = RandomForestRegressor, package_name = DecisionTree, ... )
 (name = XGBoostClassifier, package_name = XGBoost, ... )
 (name = XGBoostCount, package_name = XGBoost, ... )
 (name = XGBoostRegressor, package_name = XGBoost, ... )

A bunch of ScikitLearn models have also been added but not yet updated in the model registry.

OkonSamuel · 2023-10-30T20:48:21Z

| Hey, any updates on this topic? I may start working on a PR if there have not been any advances yet - taking sklearn as a template. Feel free to provide input on how you would proceed if some of you have already given it some deeper thought!

As @ablaom has pointed out, the building blocks for supporting this have been added to the MLJ API. Ideally the approach would be to develop an external package that supports RFE, but we just haven't gotten around to doing this yet. This can be implemented as a wrapper model similar to the way we implement EnsembleModel.

ablaom · 2023-10-31T02:32:42Z

| This can be implemented as a wrapper model similar to the way we implement EnsembleModel.

I think the right approach is to view this as an Unsupervised transformer (wrapping a classifier) that has the target as part of training data, which is now allowed. See #874

ablaom · 2023-11-15T19:11:31Z

@vboussange I have a Master's student keen to take this on now. What is the status of your own efforts? Are you happy for him to take this on? He can start immediately.

vboussange · 2023-11-15T20:28:27Z

Hey @ablaom, this slowly sank down my todo list - I have not made any progress. Please go ahead if you have some workforce!

ablaom · 2024-06-19T21:36:07Z

Done. See FeatureSelection.jl; the model is RecursiveFeatureElimination.

ablaom added the enhancement New feature or request label Jan 26, 2020

ablaom mentioned this issue Feb 28, 2021

Improved feature importance support #747

Open

13 tasks

ablaom mentioned this issue Sep 1, 2023

A de-correlation model for feature exclusion #1047

Open

ablaom assigned OkonSamuel Nov 16, 2023

ablaom closed this as completed Jun 19, 2024
