Link2Care RCT: Data Rescue for a Multi-Modal Intervention Study

R · REDCap · SPSS · SAS · RCT · Public Health · Homelessness · Health Disparities · Social Justice · Record Linkage · Data Cleaning · Reproducible Analysis · Health Informatics · Work Product · Academic Research
Rebuilt and documented a fragmented dataset for Link2Care, an RCT targeting health and housing outcomes among formerly incarcerated homeless adults. Standardized variables across QDS, REDCap, Excel, SPSS, and SAS inputs, created reproducible pipelines, and generated a full codebook. Final dataset included 1,606 subjects and 1,129 variables.
Author

Morrigan M.

Published

January 22, 2024

Modified

July 14, 2025

Project Summary

Link2Care was a multi-year randomized controlled trial (RCT) aimed at reducing re-incarceration and improving health outcomes among homeless adults recently released from jail. The study leveraged mobile technology and case management to connect participants with healthcare and social services. Data collection spanned April 2018 to May 2023, with multiple modalities and formats across hundreds of variables.

Tech Stack & Constraints

This project involved rescuing a fragmented dataset collected via QDS, REDCap, and Excel, along with data files pre-processed in SPSS and SAS, all with minimal validation and inconsistent variable naming. Despite the complexity, all work was completed on a local workstation, with reproducible outputs and documentation.

Core Tools & Libraries (R)

  • tidyverse, lubridate, stringr, dplyr, purrr, readr, readxl, openxlsx, haven, here
  • codebookr — used to generate a full codebook from a custom metadata CSV
  • Quarto — narrative-style documentation and reproducible reporting

My Contributions

Data Preprocessing

  • Restructured delay discounting task (DDT), arrest, and bridge session data into long format (Subject-Visit)
  • Validated date windows against study protocols
  • Preprocessing QMD
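The Subject-Visit restructure above can be sketched with tidyr's pivot_longer. The data and column names here (ddt_score_v1, etc.) are hypothetical stand-ins for the actual study exports, which I don't reproduce here:

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format DDT export: one row per subject, one column per visit
ddt_wide <- tibble(
  subject_id   = c(101, 102),
  ddt_score_v1 = c(0.42, 0.37),
  ddt_score_v2 = c(0.45, NA)
)

# Reshape to long (Subject-Visit) format: one row per subject per visit
ddt_long <- ddt_wide |>
  pivot_longer(
    cols            = starts_with("ddt_score_v"),
    names_to        = "visit",
    names_prefix    = "ddt_score_v",
    names_transform = list(visit = as.integer),
    values_to       = "ddt_score"
  )
```

The same shape then makes per-visit checks (such as validating visit dates against protocol windows) a simple grouped filter rather than a column-by-column audit.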

Metadata Mapping

  • Extracted variable metadata across disparate files
  • Created a unified variable map to standardize naming and selection
  • Variable Map QMD
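A variable map like the one described above can drive renaming directly: dplyr's rename() accepts a named vector built from the map, so standardization stays data-driven rather than hand-coded. The map contents below are illustrative, not the actual Link2Care names:

```r
library(dplyr)

# Hypothetical variable map: one row per variable, source name -> standardized name
var_map <- tibble(
  source_name = c("SUBJ_ID", "dob", "visit_dt"),
  clean_name  = c("subject_id", "date_of_birth", "visit_date")
)

# rename() accepts a named vector of the form c(new_name = "old_name")
rename_vec <- setNames(var_map$source_name, var_map$clean_name)

raw   <- tibble(SUBJ_ID = 1, dob = "1990-01-01", visit_dt = "2020-06-01")
clean <- rename(raw, all_of(rename_vec))
```

Because the map lives in one CSV, the same vector can standardize every input file before merging, and any naming change is made once.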

Data Integration & Cleaning

Documentation & Codebook

  • Generated a full codebook using codebookr and the metadata CSV
  • Codebook QMD
  • Developed the workflow of driving column attributes from a metadata CSV, which was later incorporated into the codebookr package (though not formally credited)
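The core codebookr pattern is to attach attributes to columns and then render a Word codebook. A minimal sketch follows the package's documented usage; in the actual pipeline, cb_add_col_attributes() calls were driven programmatically from the metadata CSV across all 1,129 variables, and the column names and attribute text here are hypothetical:

```r
library(dplyr)
library(codebookr)

# Hypothetical slice of the merged analysis dataset
study <- tibble(
  subject_id = c(101, 102),
  visit      = c(1L, 1L)
)

# Attach column-level attributes (in the real pipeline these came from the metadata CSV)
study <- study |>
  cb_add_col_attributes(
    subject_id,
    description = "Participant identifier",
    source      = "QDS intake interview"
  )

# Generate the codebook and write it out as a Word document
study_codebook <- codebook(
  study,
  title    = "Link2Care",
  subtitle = "Merged Analysis Dataset"
)
print(study_codebook, "link2care_codebook.docx")
```

Keeping the attribute text in a CSV rather than in code means non-programmers on the study team could review and correct variable descriptions without touching the pipeline.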

Reflections

This project was a masterclass in data triage. I stepped into a fragmented, multi-format dataset with no validation and rebuilt it into a clean, analyzable structure. While my contributions weren’t reflected in the final repository or publications, the reproducible pipeline I created was used to validate and replicate the final outputs. I’m proud of the work—and the resilience it took to do it right.
