1.
Classifying the information needs of survivors of domestic violence in online health communities using large language models: prediction model development and evaluation study
Shaowei Guan,
Vivian Hui,
Gregor Štiglic,
Rose Eva Constantino,
Young Ji Lee,
Arkers Kwan Ching Wong, 2025, original scientific article
Description: Background: Domestic violence (DV) is a significant public health concern affecting the physical and mental well-being of
numerous women, imposing a substantial health care burden. However, women facing DV often encounter barriers to seeking
in-person help due to stigma, shame, and embarrassment. As a result, many survivors of DV turn to online health communities
as a safe and anonymous space to share their experiences and seek support. Understanding the information needs of survivors of
DV in online health communities through multiclass classification is crucial for providing timely and appropriate support.
Objective: The objective was to develop a fine-tuned large language model (LLM) that can provide fast and accurate predictions
of the information needs of survivors of DV from their online posts, enabling health care professionals to offer timely and
personalized assistance.
Methods: We collected 294 posts from Reddit subcommunities focused on DV shared by women aged ≥18 years who
self-identified as experiencing intimate partner violence. We identified 8 types of information needs: shelters/DV centers/agencies;
legal; childbearing; police; DV report procedure/documentation; safety planning; DV knowledge; and communication. Data
augmentation was applied using GPT-3.5 to expand our dataset to 2216 samples by generating 1922 additional posts that imitated
the existing data. We adopted a progressive training strategy to fine-tune GPT-3.5 for multiclass text classification using 2032
posts. We trained the model on 1 class at a time, monitoring performance closely. When suboptimal results were observed, we
generated additional samples of the misclassified ones to give them more attention. We reserved 184 posts for internal testing
and 74 for external validation. Model performance was evaluated using accuracy, recall, precision, and F1-score, along with CIs
for each metric.
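The evaluation metrics named above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not state how per-class scores were averaged or how the CIs were derived, so macro-averaging and a percentile bootstrap are assumptions, and the function names `macro_metrics` and `bootstrap_ci` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                  labels: Sequence[str]) -> Tuple[float, float, float, float]:
    """Return (accuracy, macro precision, macro recall, macro F1).

    Macro-averaging is an assumption; the abstract does not name the scheme.
    """
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    k = len(labels)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def bootstrap_ci(y_true: Sequence[str], y_pred: Sequence[str],
                 labels: Sequence[str], metric_index: int = 3,
                 n_resamples: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap CI for one metric (index 3 = macro F1).

    The bootstrap is an assumed CI method, chosen only for illustration.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_resamples):
        # Resample posts with replacement, keeping true/predicted pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_true = [y_true[i] for i in idx]
        sample_pred = [y_pred[i] for i in idx]
        stats.append(macro_metrics(sample_true, sample_pred, labels)[metric_index])
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy example with two of the eight information-need categories.
labels = ["legal", "police"]
y_true = ["legal", "legal", "police", "police"]
y_pred = ["legal", "police", "police", "police"]
accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred, labels)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, labels)
```

With only 40 real posts in the internal test set, intervals of this kind are expected to be wide, which is consistent with the spread of the CIs reported in the Results.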
Results: Using 40 real posts and 144 artificial intelligence–generated posts as the test dataset, our model achieved an F1-score
of 70.49% (95% CI 60.63%-80.35%) for real posts, outperforming the original GPT-3.5 and GPT-4, fine-tuned Llama 2-7B and
Llama 3-8B, and long short-term memory. On artificial intelligence–generated posts, our model attained an F1-score of 84.58%
(95% CI 80.38%-88.78%), surpassing all baselines. When tested on an external validation dataset (n=74), the model achieved
an F1-score of 59.67% (95% CI 51.86%-67.49%), outperforming other models. Statistical analysis revealed that our model significantly outperformed the others in F1-score (P=.047 for real posts; P<.001 for external validation posts). Furthermore, our
model was faster, taking 19.108 seconds for predictions versus 1150 seconds for manual assessment.
Conclusions: Our fine-tuned LLM can accurately and efficiently extract and identify DV-related information needs through
multiclass classification from online posts. In addition, we used LLM-based data augmentation techniques to overcome the
limitations of a relatively small and imbalanced dataset. By generating timely and accurate predictions, we can empower health
care professionals to provide rapid and suitable assistance to survivors of DV.
Keywords: domestic violence, online health communities, large language models, generative artificial intelligence, artificial intelligence
Published in DKUM: 22.07.2025; Views: 0; Downloads: 2
Full text (780.00 KB)
This record has multiple files.