Researcher: Opt Out Oversight

Context:
You have spent two years building an accent conversion model to help people with speech disorders, trained entirely on public YouTube videos. Before release, you realize that although the videos were public, several creators never imagined their content would be used to train a health-related AI model. Using their data feels like a violation of contextual integrity, even if it is legally permissible.
Dilemma:
A) Delete all data from non-consenting creators. This incurs a devastating two-year delay and funding loss for your accessibility project.
B) Continue using the data, arguing that the vital public health benefit and the public nature of the videos outweigh the lack of explicit consent.
Story behind the dilemma:
This commentary addresses critical ethical issues in using social media data to train AI models for digital phenotyping, the practice of quantifying human traits and health conditions using digital technology. The authors argue that the common research practice of scraping publicly available social media posts to train healthcare AI models is ethically problematic, even when legally permissible.
The core ethical violation is the lack of explicit consent. Users who share content publicly for social interaction do not necessarily consent to having their data used for health-related analysis or to train AI systems. This is especially critical when sensitive information, such as labels related to neurodiversity or mental health, is scraped and used, because doing so risks amplifying stigma and causing harm.
To counter these risks, the authors advocate for a shift away from simply scraping data based on its public availability. Instead, they propose the adoption of community-based participatory design (CBPD) principles. This approach involves collaborating directly with the communities whose data is being used, ensuring they have a voice in how the AI is developed and for what purposes. This fosters trust, improves model relevance, and ensures that the development of healthcare AI respects the autonomy and contextual expectations of the individuals it aims to serve.
