chris-german/WiSER-Reproducibility

Part 1: Data

Abstract

We apply WiSER to three datasets to investigate factors related to intra-individual variability: the Women’s Health Study accelerometry study (WHS, available via dbGaP), the Action to Control Cardiovascular Risk in Diabetes trial (ACCORD, available via BioLINCC), and S&P 500/President Trump’s Twitter data (publicly available). The WHS data contain accelerometer measurements on over 15,000 women over 7 days. The ACCORD data come from a multi-center trial in patients with type 2 diabetes. The S&P 500/Trump Twitter data were downloaded from publicly available web APIs and contain Trump's tweets and daily historical stock data for the stocks in the S&P 500.

Availability

  • Data are publicly available.

Publicly available data

Description

File format(s)

  • CSV or other plain text.

Data dictionary

Part 2: Code

Abstract

Code to download data (when applicable), clean data, and reproduce results is provided as Jupyter notebooks in the GitHub repository for this project. Each subfolder contains the code and related content for one analysis in the paper.

Description

Code format(s)

Supporting software requirements

Version of primary software used

WiSER.jl v0.0.2.

Libraries and dependencies used by the code

  • R packages (used for plotting): data.table, facetscales, ggplot2, gridExtra, scales
  • Julia packages: WiSER (and its dependencies found at https://GitHub.com/OpenMendel/WiSER.jl/blob/master/Project.toml), CodecZlib, CSV, DataFrames, DelimitedFiles, GLM, KNITRO [academic license], MarketData, RCall, Roots, SpecialFunctions, StatsBase, TimeZones.

Parallelization used

  • No parallel code used

License

  • MIT License (default)

Additional information (optional)

No parallelization was used for these analyses, but WiSER makes it easy to add; see the WiSER documentation on GitHub.

For reproducibility, each subfolder includes a Project.toml and Manifest.toml pair that records the exact Julia package versions used. Running ] activate . in Julia from that directory activates the environment, and ] instantiate then installs the recorded dependencies if they are not already present.
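The activation step described above can also be scripted with the Pkg API instead of the Pkg REPL. The following is a minimal sketch, assuming Julia was started in (or has changed directory into) one of the repository's subfolders that contains a Project.toml and Manifest.toml pair:

```julia
using Pkg

# Activate the environment defined by the Project.toml/Manifest.toml pair
# in the current directory (the scripted form of `] activate .`).
Pkg.activate(".")

# Install the exact package versions recorded in Manifest.toml
# (assumes network access to the package registry on first use).
Pkg.instantiate()

# Optional: print the resolved packages to confirm the environment is active.
Pkg.status()
```

KNITRO is listed above as requiring an academic license, so subfolders that depend on it may need that license configured before their notebooks will run.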

Part 3: Reproducibility workflow

Scope

The Jupyter notebooks and code provided can be used to reproduce all results (including tables and figures) in Sections 5 and 6, and their accompanying supplementary material sections (S.5-S.8).

Workflow

Format(s)

  • Self-contained R Markdown file, Jupyter notebook, or other literate programming approach

Instructions

Each subfolder in the GitHub repository corresponds to a section of the paper (Simulations, Women's Health Study, ACCORD, Twitter/Stock data). Each contains Jupyter notebooks (.ipynb files) that go step by step through the workflow of the analyses presented in the paper: downloading the data (when applicable), cleaning the data, and analyzing the data. Once you have access to the data sets that require researcher requests, you can run these notebooks on the data and they will produce the results seen in the paper. For readability, rendered .html versions of the notebooks are also included, which can be opened to view the notebook contents without launching Jupyter.

Note: In order to run Julia in a Jupyter notebook, you must install Julia and the IJulia package. After downloading and launching Julia, IJulia can be installed and Jupyter notebook can be launched by running the following code in Julia:

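# Install and build IJulia, the Julia kernel for Jupyter
using Pkg
Pkg.add("IJulia")
Pkg.build("IJulia")

# Launch the Jupyter notebook server in the browser
using IJulia
notebook()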

Expected run-time

Approximate time needed to reproduce the analyses on a standard desktop machine:

  • > 8 hours

Additional information (optional)

The simulations take the bulk of the time. The real data analyses, including cleaning the data, should take under an hour on a standard desktop machine.
