BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

Jan 1, 2025
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte Suresh, François Savard, Ahmed Masry, Shravan Nayak, Rabiul Awal, Mahsa Massoud, Amirhossein Abaskohi, Zichao Li, Suyuchen Wang, Pierre-Andre Noel, Mats Leon Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Sara Shanian, Ying Zhang, Sathwik Tejaswi Madhusudhan, Joao Monteiro, Krishnamurthy Dj Dvijotham, Torsten Scholak, Nicolas Chapados, Sepideh Kharaghani, Sean Hughes, M. Özsu, Siva Reddy, Marco Pedersoli, Yoshua Bengio, Christopher Pal, Issam H. Laradji, Spandana Gella, Perouz Taslakian, David Vazquez, Sai Rajeswar
Type: Publication
Venue: The Thirteenth International Conference on Learning Representations (ICLR 2025)