Using kfold for model selection after splitting by groups #169

Open
lkschwarz opened this issue Apr 25, 2021 · 6 comments

Comments

@lkschwarz

I am using the rstanarm and loo packages to fit a logistic regression with four different intercepts and a univariate slope, hierarchical by individual. I then use k-fold leave-one-group-out cross-validation for model selection (kfold_split_grouped, then kfold). I get the same error when running the kfold command regardless of model complexity.
Error message:
Fitting K = 60 models distributed over 3 cores
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: object 'n_chains' not found
I think it has to do with the number of cores in the kfold command (the above error occurred with 3 cores). With one core it works, but impossibly slowly. With more than one core, the error appears.
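
For readers following along, the fold assignment described above can be sketched on its own, without fitting any model. This is a minimal illustration and not the code from the attached file: it uses the built-in mtcars data with cyl standing in for the individual-level grouping variable, which is an assumption made purely for the example. Setting K equal to the number of groups gives leave-one-group-out folds.

```r
library(loo)

# Stand-in grouping variable (assumption: cyl plays the role of "individual")
groups <- mtcars$cyl
n_groups <- length(unique(groups))

# One fold per group, so every observation from a group is held out together
folds_grouped <- kfold_split_grouped(K = n_groups, x = groups)

# Cross-tabulate fold ids against groups to confirm no group is split across folds
fold_table <- table(fold = folds_grouped, group = groups)
print(fold_table)
```

The resulting integer vector is what would be passed as the folds argument of kfold.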

I updated R and all packages yesterday:
R Session info:
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
other attached packages:
loo_2.4.1 MCMCvis_0.15.1 rstanarm_2.21.1 Rcpp_1.0.6

I have included some inelegant sample code that should reproduce the error. The problem occurs at line 83.
rstanarm_logistic_hier_test.txt

Thanks,
LKS

@jgabry
Member

jgabry commented May 18, 2021

Sorry you're getting an error. I changed iter to a small number just to make things go faster but otherwise I used your example and I don't get any error running the kfold part. But I'm on Mac so perhaps this is a Windows issue.

To help me figure this out, aside from convergence warnings, can you tell me whether running this simpler example is successful or if you run into a similar error?

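library(rstanarm)

n_chains <- 2
# Small hierarchical model; iter is kept tiny so the check runs quickly
fit <- stan_lmer(mpg ~ disp + (1|cyl), data = mtcars,
                 refresh = 0, iter = 50, cores = n_chains, chains = n_chains)
# 3-fold cross-validation refit across 2 cores; this is the step under test
k <- kfold(fit, cores = 2, folds = loo::kfold_split_random(K = 3, N = nrow(mtcars)))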

@jgabry
Member

jgabry commented May 18, 2021

And sorry for the slow reply!

jgabry added and removed the question label on May 18, 2021
@lkschwarz
Author

lkschwarz commented May 20, 2021 via email

@jgabry
Member

jgabry commented May 20, 2021

Thanks for checking. I bet there's a problem with how we're doing parallelization on Windows. Will try to look into it soon. If possible, can you try one other thing? If you replace n_chains with a number, does it work? That is, if you just put 2 everywhere you currently have n_chains, does it run without error? The answer to that will be helpful when trying to figure out what the problem is here.
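
One plausible mechanism, offered as an assumption rather than a confirmed diagnosis of rstanarm's internals: socket-based clusters from the base parallel package start fresh R processes that do not inherit variables from the main session, so a call that refers to a global such as n_chains fails on the workers unless the variable is exported first. Forked workers on Mac and Linux inherit the parent session, which would be consistent with the error showing up only on Windows. The sketch below uses only base parallel functions and reproduces the same kind of message:

```r
library(parallel)

n_chains <- 2  # exists only in the main session

# Socket cluster: each worker is a fresh R process
cl <- makePSOCKcluster(2)

# The workers have never seen n_chains, so evaluating it there fails
res_missing <- tryCatch(
  clusterEvalQ(cl, n_chains * 1),
  error = function(e) conditionMessage(e)
)
print(res_missing)

# After exporting the variable explicitly, the same expression succeeds
clusterExport(cl, "n_chains", envir = environment())
res_exported <- clusterEvalQ(cl, n_chains * 1)

stopCluster(cl)
```

Replacing n_chains with a literal 2 sidesteps the lookup entirely, which is why that substitution is a useful diagnostic.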

@lkschwarz
Author

lkschwarz commented May 20, 2021 via email

@jgabry
Member

jgabry commented May 20, 2021

Ok great, that's super helpful for narrowing down where the problem is. And I'm glad you can at least get it working this way until we fix it.
