AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture
Categories
- cs.AI
Initially Published At
2025-11-28T15:02:19Z
Authors
- Yibin Wen
- Qingmei Li
- Zi Ye
- Jiarui Zhang
- Zurong Mai
- Jing Wu
- Shuohong Lou
- Yuhang Chen
- Henglian Huang
- Xiaoya Fan
- Yang Zhang
- Defeng Gu
- Lingyuan Zhao
- Yutong Lu
- Haohuan Fu
- Jianxi Huang
- Juepeng Zheng
Summary
Recent advances in Vision-Language Models (VLMs) have significantly impacted various industries. In agriculture, these multimodal capabilities hold great promise for applications such as precision farming, crop monitoring, pest detection, and environmental sustainability. However, although several Visual Question Answering (VQA) datasets and benchmarks have been developed to assess VLM performance, they often fail to evaluate the reasoning and problem-solving skills that complex agricultural contexts demand. To address this gap, we introduce AgroCoT, a VQA dataset that integrates Chain-of-Thought (CoT) reasoning and is specifically designed to evaluate the reasoning capabilities of VLMs. With 4,759 carefully curated samples, AgroCoT provides a comprehensive and robust evaluation of reasoning abilities, particularly in zero-shot scenarios, focusing on the models’ capacity for logical reasoning and effective problem-solving. Our evaluation of 30 representative VLMs, including both proprietary and open-source models, reveals a gap in their reasoning capabilities and underscores the importance of incorporating CoT into assessments. Our dataset is available at https://huggingface.co/datasets/wenyb/AgriCoT.
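To make the zero-shot CoT evaluation setting concrete, the following is a minimal sketch of how such a benchmark might be scored. It is not the authors' harness. It assumes a multiple-choice format with a gold option letter per sample, and the `Sample` fields, the prompt wording, the `Answer: <letter>` convention, and the `model(image_path, prompt)` callable are all hypothetical placeholders. The actual AgroCoT schema and protocol may differ.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

LETTERS = "ABCDEFGHIJ"


@dataclass
class Sample:
    # Hypothetical record layout; the real dataset schema may differ.
    image_path: str
    question: str
    options: Sequence[str]
    answer: str  # gold option letter, e.g. "B"


# Zero-shot CoT prompt: no worked examples, only an instruction to reason first.
COT_TEMPLATE = (
    "{question}\n{options}\n"
    "Think step by step, then give the final answer as 'Answer: <letter>'."
)

# Matches "Answer: B", "answer: (C)", etc.; only uppercase option letters count.
ANSWER_RE = re.compile(r"[Aa]nswer:\s*\(?([A-J])\b")


def build_prompt(sample: Sample) -> str:
    """Render the question and lettered options into the CoT prompt."""
    opts = "\n".join(f"{LETTERS[i]}. {o}" for i, o in enumerate(sample.options))
    return COT_TEMPLATE.format(question=sample.question, options=opts)


def extract_choice(response: str) -> Optional[str]:
    """Return the last 'Answer: X' letter, or None if the model gave none."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def zero_shot_accuracy(
    samples: Sequence[Sample], model: Callable[[str, str], str]
) -> float:
    """Fraction of samples whose extracted final choice equals the gold letter."""
    if not samples:
        return 0.0
    correct = sum(
        extract_choice(model(s.image_path, build_prompt(s))) == s.answer.upper()
        for s in samples
    )
    return correct / len(samples)
```

Taking the last `Answer:` match, rather than the first, keeps option letters mentioned during the intermediate reasoning from being scored as the final choice. A response with no parsable answer counts as incorrect.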