Reference the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: an Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.
Mention the diversity of the audio (natural sounds, urban environments, etc.) and the linguistic variety of the captions.
You can also download the evaluation (1.2 GB) or analysis (14.4 GB) subsets separately.
🛠️ Producing a Write-up
The full development set is approximately 6.5 GB.
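Because these archives are large, it can help to confirm there is enough free disk space before starting a download. The following is a minimal sketch, not an official tool: the sizes are the approximate figures quoted in this text, while the function names and the `headroom` factor (extra room for the extracted files next to the archive) are assumptions made for illustration.

```python
import shutil

# Approximate archive sizes in GB, as quoted in the text.
SUBSET_SIZES_GB = {"development": 6.5, "evaluation": 1.2, "analysis": 14.4}


def required_bytes(subsets, headroom=2.0):
    """Return the bytes needed for the named subsets.

    headroom is an assumed multiplier that leaves space for the
    extracted files alongside the downloaded archive.
    """
    total_gb = sum(SUBSET_SIZES_GB[name] for name in subsets)
    return round(total_gb * headroom * 1000**3)


def has_room(path, subsets, headroom=2.0):
    """Check whether the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(subsets, headroom)


if __name__ == "__main__":
    wanted = ["development", "evaluation"]
    print("Enough space:", has_room(".", wanted))
```

Setting `headroom=1.0` checks for the archive size alone, which is enough if you delete each archive right after extracting it.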
The dataset can be accessed through platforms like Zenodo.
If you are writing a technical report or paper using this data, ensure you include these standard sections: