Continual Learning for VQA
A study of continual learning settings for multimodal learning.
We explore continual learning in Visual Question Answering (VQA), where tasks can be defined according to either the visual or the textual modality.
- We introduce four settings in which tasks are defined by question types, by visual content, or by different VQA datasets.
- We benchmark several continual learning algorithms and find that the straightforward Experience Replay strategy is the only approach that is consistently effective across settings (a minimal sketch of this strategy follows the list).
- We compare single-stream encoder-only models that use object-centric or patch-based visual representations. We find that the object-centric model is more robust to forgetting, which highlights the importance of developing continual learning approaches that perform well for patch-based representations.
- We analyze the evolution of representational similarity during training. We find that the representations of the two modalities change at different rates, with visual representations showing lower similarity across checkpoints. This motivates our future work on separately controlling the regularization weight for the representations of each input modality (see the similarity sketch after this list).
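To make the Experience Replay strategy concrete, here is a minimal sketch: a fixed-size memory filled by reservoir sampling, with stored examples mixed into each training batch so the model keeps rehearsing earlier tasks. This is an illustration of the general technique, not the exact implementation from our experiments; `task_loaders`, `model`, and `optimizer` are hypothetical placeholders.

```python
import random

class ReservoirBuffer:
    """Fixed-size memory filled with reservoir sampling, so every
    example seen so far has an equal probability of being stored."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0

    def add(self, example):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Keep the new example with probability capacity / n_seen.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

buffer = ReservoirBuffer(capacity=500)

# Hypothetical training loop: each batch from the current task is
# augmented with memory samples drawn from earlier tasks.
for task_loader in task_loaders:
    for batch in task_loader:  # batch: list of (image, question, answer)
        replay = buffer.sample(len(batch))
        loss = model.loss(batch + replay)  # hypothetical model API
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        for example in batch:
            buffer.add(example)
```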
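For the representational-similarity analysis, one standard metric is linear Centered Kernel Alignment (CKA), which scores how similar two sets of activations are for the same probe inputs. The sketch below assumes CKA is the metric of choice and uses random arrays purely to keep the example runnable; in practice the inputs would be visual- and text-token activations extracted at two training checkpoints.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n, d1) and Y (n, d2)
    computed for the same n probe inputs. Returns a value in [0, 1];
    lower values indicate more representational drift."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Hypothetical usage: compare activations before and after training on a
# new task, separately per modality. Random data stands in for real features.
rng = np.random.default_rng(0)
vis_before, vis_after = rng.normal(size=(256, 768)), rng.normal(size=(256, 768))
print("visual-token similarity:", linear_cka(vis_before, vis_after))
```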
References
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering (2022)