I noticed that when full formulas are used in the search_terms input argument and some of those formulas are incompatible, cv_varsel doesn't check for the incompatibility and includes variables from two incompatible formulas. This can be seen in the following reprex:
library(brms)
library(projpred)
set.seed(2)
N <- 100
p <- 5 #number of parameters
dat <- as.data.frame(matrix(rnorm(N*p), nrow = N, ncol = p)) #initialize data frame with p covariates
names(dat) <- paste0('x', 1:p)
betas <- rnorm(p) # simulate effect values
dat$y <- rnorm(N, mean=as.matrix(dat[, paste0('x', 1:p)]) %*% betas) #y is a noisy observation of the linear combination of covariates
formula_all <- as.formula(paste0('y~', paste(paste0('x', 1:p), collapse = '+')))
ref_mod <- brm(formula_all, data = dat, refresh = 0)
cv_select_prj <- cv_varsel(ref_mod, method = 'forward', cv_method = 'LOO', refit_prj = FALSE, search_terms = c('x1+x2','x1+x3','x1+x2+x4'))
Here, I would assume that either x1, x2, x4 would be included or x1 and x3 only, as x1+x2+x4 is not compatible with x1+x3. However, cv_varsel returns:
I think this behavior is correct given projpred's current implementation of the search_terms argument: After x1 + x3 was chosen, x2 from x1 + x2 is a candidate because the required x1 term is already included in the chosen terms. And after x2 was chosen additionally, x4 from x1 + x2 + x4 is a candidate because the required x1 and x2 terms are already included in the chosen terms.
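To make that reasoning concrete, here is a minimal base-R sketch of the candidate logic as described above. It is a toy model and not projpred's internal code: the helper names split_terms() and next_candidates() are made up for illustration, and the rule "a search term yields a candidate once all but one of its terms are already chosen" is an assumption based on the explanation in this comment. It also starts from the state where x1 + x3 has already been chosen and does not model how that first step is made.

```r
# Toy illustration only; this is NOT projpred's internal code.
# `split_terms()` and `next_candidates()` are hypothetical helper names.
split_terms <- function(term) trimws(strsplit(term, "+", fixed = TRUE)[[1]])

next_candidates <- function(search_terms, chosen) {
  cands <- character(0)
  for (st in search_terms) {
    # Terms of this search term that have not been chosen yet
    remaining <- setdiff(split_terms(st), chosen)
    # Assumed rule: a search term contributes a candidate once all but one
    # of its terms are already among the chosen terms.
    if (length(remaining) == 1L) cands <- union(cands, remaining)
  }
  cands
}

search_terms <- c("x1+x2", "x1+x3", "x1+x2+x4")

# State described above: x1 + x3 has already been chosen.
step1 <- next_candidates(search_terms, chosen = c("x1", "x3"))
# Then x2 is chosen additionally.
step2 <- next_candidates(search_terms, chosen = c("x1", "x3", "x2"))

print(step1)
print(step2)
```

Under this assumed rule, nothing checks whether the already chosen terms came from a search term that is compatible with the one currently being inspected, which is why terms from x1 + x3 and x1 + x2 + x4 can end up in the same solution path.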
Despite this correctness, I think I know what you intended: At size 3 (including the intercept), x1 + x2 and x1 + x3 should be candidates (thereby forcing the inclusion of x1). At size 4, x1 + x2 + x4 should be a candidate only if x1 + x2 has been chosen at size 3. I don't think this is possible with the current implementation of the search_terms argument. I tried
but neither gives the desired result. So I'll label this as a feature request (currently, I think the - syntax is the way to go).
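For the feature request, the intended behavior could be sketched roughly as follows. This is purely hypothetical and does not exist in projpred: compatible_candidates() and split_terms() are made-up names, and the compatibility rule (a search term stays eligible only if it contains every term chosen so far) is an assumption about what is being asked for. Like any toy, it only covers the step where a single term is added.

```r
# Hypothetical sketch of the requested behavior; NOT an existing projpred feature.
# `split_terms()` and `compatible_candidates()` are made-up helper names.
split_terms <- function(term) trimws(strsplit(term, "+", fixed = TRUE)[[1]])

compatible_candidates <- function(search_terms, chosen) {
  cands <- character(0)
  for (st in search_terms) {
    st_terms <- split_terms(st)
    # Assumed rule: only search terms that contain ALL chosen terms stay eligible.
    if (!all(chosen %in% st_terms)) next
    remaining <- setdiff(st_terms, chosen)
    # As before, a candidate is offered when exactly one term is left to add.
    if (length(remaining) == 1L) cands <- union(cands, remaining)
  }
  cands
}

search_terms <- c("x1+x2", "x1+x3", "x1+x2+x4")

# After x1 + x3: no search term contains both x1 and x3 plus one more term.
after_x1_x3 <- compatible_candidates(search_terms, chosen = c("x1", "x3"))
# After x1 + x2: x1 + x2 + x4 is compatible and offers x4.
after_x1_x2 <- compatible_candidates(search_terms, chosen = c("x1", "x2"))

print(after_x1_x3)
print(after_x1_x2)
```

With this rule, the search would end at x1, x3 on one path, or continue to x1, x2, x4 on the other, which matches the expectation stated in the original report.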
BTW (just for the record, because I first thought like you that this was a bug): Before merging #360, we had the same behavior (but in a slightly different manner): There, within search_forward(), we got
cands
# [1] "x2 + x3"
at size 4 (including the intercept), i.e., after x1 + x3 has been chosen. This was also correct from projpred's point of view—in the same sense as now that #360 has been merged. So #360 (more precisely, the efficiency improvement mentioned here) doesn't seem to have affected this.
Is this the intended behaviour or should we try to resolve this? The problem seems to lie in select_possible_terms_size in formula.R.