WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada

Rapid information access is vital during wildfires, yet traditional data sources are slow and costly. Social media offers real-time updates, but extracting relevant insights remains a challenge. We present WildFireCan-MMD, a new multimodal dataset of X posts from recent Canadian wildfires, annotated across twelve key themes. Evaluating both vision-language models and custom-trained classifiers, we show that while zero-shot prompting offers quick deployment, even simple trained models outperform them when labelled data is available. Our best-performing transformer-based fine-tuned model reaches 83% f-score, outperforming gpt4 by 23%. As a use case, we demonstrate how this model can be used to uncover trends during wildfires. Our findings highlight the enduring importance of tailored datasets and task-specific training. Importantly, such datasets should be localized, as disaster response requirements vary across regions and contexts.
View on arXiv@article{sherritt2025_2504.13231, title={ WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada }, author={ Braeden Sherritt and Isar Nejadgholi and Marzieh Amini }, journal={arXiv preprint arXiv:2504.13231}, year={ 2025 } }