Data Science for Mental Health (DS4MH) @ The Alan Turing Institute
About Us
The vision for this interest group is to kick-start one or more projects that apply contemporary data science to multi-modal data for mental health, providing insight and benefit for individuals and clinicians and contributing to fundamental research in mental health (including dementia) as well as to data science methodology. It aims to provide an informal bridge between clinicians, charities, and data owners (such as CRIS, UKDP, and Biobank) on one side and data science researchers on the other, to stimulate and align cutting-edge research in this area.
Events
Meetings
We organise monthly meetings (including half-hour invited talks) at the Turing. Meetings are organised and moderated by Jenny Chim, Yue Wu, and Emilio Ferrucci. Please join our mailing list for the latest information.
As a part of AI UK Fringe, we jointly organised a hybrid event with the NLP interest group on AI for Mental Health Monitoring on 28th March 2024.
See here for our previous talks.
Upcoming Events
Meetings
Date | Time | Presenter | Title |
---|---|---|---|
2025.06.19 | 15:00 | Introduction | |
 | 15:05 | Dr. Amrit Kaur Purba (University of Cambridge) | Navigating the Digital Landscape: Understanding the Impact of Social Media on Youth Health<br>The impact of social media on youth health is a complex and evolving issue that requires ongoing attention and adaptive research strategies. As social media platforms rapidly change, research methods must evolve to better understand their influence on young people's overall health. The challenge of distinguishing causality from correlation highlights the need for more rigorous studies, supported by advancements in objective social media data collection and causal analysis of observational data. Just as past public health efforts addressed physical health risks, today’s digital landscape demands a similar focus on 'digital sanitation' to protect youth from harmful content that can influence their behaviours and overall health. Securing funding for large-scale, longitudinal research is crucial to uncover the causal pathways of social media’s impact, assess its long-term effects on youth health, and develop evidence-based interventions to mitigate its growing risks. Policymakers must take swift, proactive action to implement measures that protect young people and hold social media platforms accountable. Research-driven policies, coupled with efforts to foster digital mindfulness and scientific literacy among youth, educators, and caregivers, are essential for empowering individuals to navigate the digital world safely. A coordinated approach—anchored in robust research, forward-thinking policy, and comprehensive education—is key to safeguarding youth health in an ever-evolving digital landscape. |
 | 15:45 | Yuanchao Li (University of Edinburgh) | Leveraging Pre-Trained Speech and Language Models for Semi-Supervised Cognitive State Classification<br>The lack of labeled data is a common challenge in speech classification tasks, particularly those requiring extensive subjective assessment, such as cognitive state classification. We propose a Semi-Supervised Learning (SSL) framework, introducing a novel multi-view pseudo-labeling method that leverages pre-trained speech and language models to select the most confident data for training the classification model. Acoustically, unlabeled data are compared to labeled data using the Fréchet audio distance, calculated from embeddings generated by multiple audio encoders. Linguistically, large language models are prompted to revise automatic speech recognition transcriptions and predict labels based on our proposed task-specific knowledge. High-confidence data are identified when pseudo-labels from both sources align, while mismatches are treated as low-confidence data. A bimodal classifier is then trained to iteratively label the low-confidence data until a predefined criterion is met. We evaluate our SSL framework on both emotion recognition and Alzheimer’s dementia detection tasks. Experimental results demonstrate that our method achieves competitive performance compared to fully supervised learning using only 30% of the labeled data and significantly outperforms two selected baselines. |
 | 16:20 | After talks discussion | |
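
For readers unfamiliar with multi-view pseudo-labeling, the agreement rule described in the 15:45 abstract (a sample is high-confidence only when the acoustic and linguistic pseudo-labels match) can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name `split_by_agreement`, the label strings, and the assumption that each view has already produced one pseudo-label per unlabeled sample are all hypothetical, chosen only for illustration.

```python
from typing import Hashable, Sequence


def split_by_agreement(
    acoustic_labels: Sequence[Hashable],
    linguistic_labels: Sequence[Hashable],
) -> tuple[dict[int, Hashable], list[int]]:
    """Split unlabeled samples into high- and low-confidence sets.

    Hypothetical helper: both inputs hold one pseudo-label per sample,
    in the same order. How each view produces its labels (audio-embedding
    distances, prompted language models) is outside this sketch.
    """
    if len(acoustic_labels) != len(linguistic_labels):
        raise ValueError("both views must label the same samples")

    high_confidence: dict[int, Hashable] = {}  # sample index -> agreed label
    low_confidence: list[int] = []             # indices where the views disagree
    for index, (acoustic, linguistic) in enumerate(
        zip(acoustic_labels, linguistic_labels)
    ):
        if acoustic == linguistic:
            high_confidence[index] = acoustic
        else:
            low_confidence.append(index)
    return high_confidence, low_confidence


# Illustrative labels only; real tasks define their own label sets.
high, low = split_by_agreement(
    ["ad", "control", "ad"],
    ["ad", "ad", "ad"],
)
print(high)  # samples 0 and 2: both views agree
print(low)   # sample 1: views disagree, left for later relabeling
```

In the framework the abstract describes, the low-confidence indices would then be passed to a classifier for iterative relabeling; that step is omitted here.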