WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada

Abstract

Rapid information access is vital during wildfires, yet traditional data sources are slow and costly. Social media offers real-time updates, but extracting relevant insights remains a challenge. We present WildFireCan-MMD, a new multimodal dataset of X posts from recent Canadian wildfires, annotated across twelve key themes. Evaluating both vision-language models and custom-trained classifiers, we show that while zero-shot prompting enables quick deployment, even simple trained models outperform it when labelled data are available. Our best-performing fine-tuned transformer-based model reaches an F-score of 83%, outperforming GPT-4 by 23 percentage points. As a use case, we demonstrate how this model can be used to uncover trends during wildfires. Our findings highlight the enduring importance of tailored datasets and task-specific training. Importantly, such datasets should be localized, as disaster response requirements vary across regions and contexts.

@article{sherritt2025_2504.13231,
  title={WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada},
  author={Braeden Sherritt and Isar Nejadgholi and Marzieh Amini},
  journal={arXiv preprint arXiv:2504.13231},
  year={2025}
}