Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023 |
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
IEEE International Conference on Computer Vision (ICCV), 2023 |
A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions
Conference on Computational Natural Language Learning (CoNLL), 2019 |
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
IEEE International Conference on Computer Vision (ICCV), 2019 |
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2019 |