Publications

(2025). WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation. ICLR 2025 Third Workshop on Deep Learning for Code.
(2025). VCR: Pixel-Level Complex Reasoning by Restoring Occluded Text. The Thirteenth International Conference on Learning Representations (ICLR 2025).
(2025). Rethinking Decentralized Learning: Towards More Realistic Evaluations with a Metadata-Agnostic Approach. ICLR 2025 Workshop on Modularity for Collaborative, Decentralized, and Continual Deep Learning.
(2025). R³Mem: Bridging Memory Retention and Retrieval via Reversible Compression. CoRR.
(2025). MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation.
(2025). GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks. arXiv preprint arXiv:2504.12764.
(2025). BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks. The Thirteenth International Conference on Learning Representations (ICLR 2025).
(2025). AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding. CoRR.
(2025). Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems. arXiv preprint arXiv:2504.01990.
(2024). Resonance RoPE: Improving Context Length Generalization of Large Language Models. Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024.