Designing Domain-Specific Large Language Models: The Critical Role of Fine-Tuning in Public Opinion Simulation

Large language models (LLMs) have transformed natural language processing across diverse fields, yet their general-purpose design limits their effectiveness in specialized domains, such as simulating opinions on environmental policies. This paper presents an approach for fine-tuning LLMs using data from the UK Household Longitudinal Study, improving the accuracy of opinion generation by conditioning models on socio-demographic factors like age, income, education, and region. By emulating diverse synthetic profiles, fine-tuned models capture the subtle differences across demographic groups more effectively than pre-trained versions. Metrics such as Chi-Squared, Cosine Similarity, Jaccard Index, and KL-divergence, demonstrate a strong alignment between synthetic and real-world opinion data. This approach highlights the potential of fine-tuning LLMs to provide more informed, representative, and ethical insights into public sentiments on environmental issues. The findings underscore the importance of tailoring LLMs to specific societal contexts for more accurate and ethical policy simulations.
View on arXiv