Link2Care RCT: Data Rescue for a Multi-Modal Intervention Study
Project Summary
Link2Care was a multi-year randomized controlled trial (RCT) aimed at reducing re-incarceration and improving health outcomes among homeless adults recently released from jail. The study leveraged mobile technology and case management to connect participants with healthcare and social services. Data collection spanned April 2018 to May 2023, with multiple modalities and formats across hundreds of variables.
- Study Registration: ClinicalTrials.gov NCT03399500
- Institution: UTHealth School of Public Health (Dallas Campus)
- My PI: Dr. M. Brad Cannell
- Study Lead PIs: Dr. Michael Businelle, Dr. Jennifer Reingle Gonzalez
- Project Involvement: Sep 2023 - Jan 2024
- Status: Data reconstruction completed; reporting handled by other team members
Tech Stack & Constraints
This project involved rescuing a fragmented dataset collected via QDS, REDCap, and Excel, along with data files that had been pre-processed in SPSS and SAS. The source files had minimal validation and inconsistent variable naming. Despite the complexity, all work was completed on a local workstation with reproducible outputs and documentation.
Core Tools & Libraries (R)
tidyverse, lubridate, stringr, dplyr, purrr, readr, readxl, openxlsx, haven, here
codebookr: used to generate a full codebook from a custom metadata CSV
Quarto: narrative-style documentation and reproducible reporting
My Contributions
Data Preprocessing
- Restructured delay discounting task (DDT), arrest, and bridge session data into long format (one row per subject-visit)
- Validated date windows against study protocols
- Preprocessing QMD
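The preprocessing steps above can be illustrated with a minimal sketch. This is not the project code: the column names (`ddt_k_v1`, `date_v1`, ...), the toy values, and the 30-day ± 7-day follow-up window are all hypothetical, chosen only to show the reshape-then-validate pattern.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide export: one row per subject, one column per measure-visit.
ddt_wide <- tibble(
  subject_id = c(2001, 2002),
  ddt_k_v1   = c(0.012, 0.150),
  ddt_k_v2   = c(0.010, NA),
  date_v1    = as.Date(c("2018-05-01", "2018-06-10")),
  date_v2    = as.Date(c("2018-06-02", NA))
)

ddt_long <- ddt_wide %>%
  # Split "ddt_k_v1" into the measure name ("ddt_k") and the visit number ("1").
  pivot_longer(
    cols          = -subject_id,
    names_to      = c(".value", "visit"),
    names_pattern = "(.+)_v(\\d+)"
  ) %>%
  mutate(visit = as.integer(visit)) %>%
  # Drop visits that never happened (no visit date recorded).
  filter(!is.na(date)) %>%
  # Days elapsed since each subject's earliest visit.
  group_by(subject_id) %>%
  mutate(days_from_v1 = as.integer(date - min(date))) %>%
  ungroup() %>%
  # Assumed protocol window: follow-up expected 30 days (+/- 7) after visit 1.
  mutate(in_window = visit == 1 | abs(days_from_v1 - 30) <= 7)
```

The result has one row per subject-visit, and `in_window` flags rows to review against the protocol rather than dropping them.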
Metadata Mapping
- Extracted variable metadata across disparate files
- Created a unified variable map to standardize naming and selection
- Variable Map QMD
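As a rough illustration of how a variable map can drive both selection and renaming, here is a small sketch. The map layout (`source`, `old_name`, `new_name`, `keep`), the helper `apply_var_map()`, and the example column names are hypothetical, not the actual map used in the study.

```r
library(dplyr)

# Hypothetical variable map: one row per source column.
var_map <- tibble(
  source   = c("qds_v1", "qds_v1", "qds_v1", "redcap"),
  old_name = c("SUBJECT", "DOB_1", "JUNK", "record_id"),
  new_name = c("subject_id", "date_of_birth", "junk", "subject_id"),
  keep     = c(TRUE, TRUE, FALSE, TRUE)
)

# Keep only mapped columns for one source file and give them standard names.
apply_var_map <- function(df, map, source_name) {
  m <- map %>%
    filter(source == source_name, keep, old_name %in% names(df))
  # A named vector c(new = "old") selects and renames in one step.
  lookup <- setNames(m$old_name, m$new_name)
  df %>% select(all_of(lookup))
}

qds_raw <- tibble(
  SUBJECT = c(2001, 2002),
  DOB_1   = c("1980-01-01", "1975-05-05"),
  JUNK    = c(0, 0)
)
qds_std <- apply_var_map(qds_raw, var_map, "qds_v1")
```

Keeping the map in a single table means naming decisions live in one reviewable place instead of being scattered across scripts.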
Data Integration & Cleaning
- Combined datasets into a wide-format structure (1,606 subjects × 1,129 variables, one subject per row)
- Deduplicated records and consolidated overlapping variables
- Combining QMD
- Calculating QMD
- Post-Processing QMD
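The integration steps above follow a common pattern: deduplicate, join on the subject identifier, then consolidate variables that appear in more than one source. The sketch below uses toy data with hypothetical column names, and the rule of preferring the QDS value over the REDCap value is an assumption made for illustration.

```r
library(dplyr)

# Toy sources with a duplicated record and an overlapping variable (gender).
qds <- tibble(
  subject_id = c(2001, 2001, 2002),
  gender     = c("F", "F", NA)
)
redcap <- tibble(
  subject_id = c(2001, 2002, 2003),
  gender     = c(NA, "M", "M"),
  arrests    = c(0, 2, 1)
)

combined <- qds %>%
  distinct() %>%                                   # drop exact duplicate rows
  full_join(redcap, by = "subject_id",
            suffix = c("_qds", "_redcap")) %>%     # keep subjects from either source
  mutate(gender = coalesce(gender_qds, gender_redcap)) %>%  # first non-missing value wins
  select(-gender_qds, -gender_redcap)

# Guard the "one subject per row" invariant before any downstream calculation.
stopifnot(!anyDuplicated(combined$subject_id))
```

In a real pipeline, conflicting non-missing values would need to be flagged for review rather than silently resolved by `coalesce()`.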
Documentation & Codebook
- Generated a full codebook using codebookr and the metadata CSV
- Codebook QMD
- Workflow of using a CSV for attribute metadata later incorporated into the codebookr package (though not formally credited)
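The idea behind the CSV-driven workflow can be sketched in base R: each row of a metadata table describes one variable, and its fields are attached to the matching column as attributes. The helper `add_attributes()`, the metadata columns, and the toy data are hypothetical. The attribute names `description` and `source` reflect my understanding of what codebookr reads and should be checked against the package documentation.

```r
# Hypothetical metadata table; in practice this would be read from a CSV file.
meta <- data.frame(
  variable    = c("subject_id", "arrests", "not_in_data"),
  description = c("Unique participant identifier",
                  "Number of arrests since release",
                  "Variable absent from this data set"),
  source      = c("QDS", "REDCap", "Excel"),
  stringsAsFactors = FALSE
)

analysis_df <- data.frame(subject_id = c(2001, 2002), arrests = c(0, 2))

# Attach each metadata row to its column as attributes; skip unmatched names.
add_attributes <- function(df, meta) {
  for (i in seq_len(nrow(meta))) {
    col <- meta$variable[i]
    if (!col %in% names(df)) next
    attr(df[[col]], "description") <- meta$description[i]
    attr(df[[col]], "source")      <- meta$source[i]
  }
  df
}

labelled_df <- add_attributes(analysis_df, meta)

# With codebookr installed, the labelled data frame would then be passed on,
# for example: cb <- codebookr::codebook(labelled_df)
```

Keeping the descriptions in a CSV lets collaborators edit variable metadata without touching R code.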
Reflections
This project was a masterclass in data triage. I stepped into a fragmented, multi-format dataset with minimal validation and rebuilt it into a clean, analyzable structure. While my contributions weren’t reflected in the final repository or publications, the reproducible pipeline I created was used to validate and replicate the final outputs. I’m proud of the work and of the resilience it took to do it right.