AgroCoT-- A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

Authors

Yibin Wen
Qingmei Li
Zi Ye
Jiarui Zhang
Zurong Mai
Jing Wu
Shuohong Lou
Yuhang Chen
Henglian Huang
Xiaoya Fan
Yang Zhang
Defeng Gu
Lingyuan Zhao
Yutong Lu
Haohuan Fu
Jianxi Huang
Juepeng Zheng

Recent advancements in Vision-Language Models (VLMs) have significantly impacted various industries. In agriculture, these multimodal capabilities hold great promise for applications such as precision farming, crop monitoring, pest detection, and environmental sustainability. However, while several Visual Question Answering (VQA) datasets and benchmarks have been developed to assess VLM performance, they often fail to effectively evaluate the critical reasoning and problem-solving skills needed in complex agricultural contexts. To address this gap, we introduce AgroCoT, a VQA dataset that integrates Chain-of-Thought (CoT) reasoning, specifically designed to evaluate the reasoning capabilities of VLMs. With 4,759 carefully curated samples, AgroCoT provides a comprehensive and robust evaluation of reasoning abilities, particularly in zero-shot scenarios, focusing on the models’ ability to engage in logical reasoning and effective problem-solving. Our evaluation of 30 representative VLMs, including both proprietary and open-source models, reveals a gap in their reasoning capabilities, which underscores the importance of incorporating CoT for assessments. Our dataset is available at https://huggingface.co/datasets/wenyb/AgriCoT.

AgroCoT-- A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

PDF

Categories

Initially Published At

Authors

Summary