Large language models (LLMs) have made it easier to extract information from images, whether that means writing detailed descriptions or recognizing text. In this article, we compare six popular models (Gemini Pro 1.5, Gemini Flash 1.5, ChatGPT-4o, GPT-4o-mini, Claude 3.5 Sonnet and Claude 3.5 Haiku) to identify the best performer for image recognition and text extraction tasks.
Question 1: Image Recognition – Describe the Photo Concisely
Ranking of LLM Models
- Gemini Pro 1.5
Polished and concise, Gemini Pro 1.5 provides detailed yet succinct descriptions that perfectly balance accuracy and brevity.
- Gemini Flash 1.5
Accurate and clear, Gemini Flash 1.5 offers robust image descriptions, although it’s slightly less polished than Pro.
- ChatGPT-4o
ChatGPT-4o delivers detailed descriptions with clear identification of the outfit and background. However, its phrasing can be slightly less refined than the top two models.
- Claude 3.5 Sonnet
Descriptive and thorough, Claude 3.5 Sonnet excels in detail but tends to over-explain, making it less concise than higher-ranked options.
- GPT-4o-mini
Simple and effective for basic recognition, GPT-4o-mini provides accurate descriptions but lacks depth in covering finer details.
- Claude 3.5 Haiku
Misinterpreted the image entirely, offering no useful description, which places it last in this category.
Question 2: Text Extraction – Extract the Text Concisely
For text extraction, the models were tested on the following quote:
“You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future.”
Performance Results
Five of the six models reproduced the original text exactly, for 100% accuracy in text extraction. The exception was Claude 3.5 Haiku, which again returned no usable output. The models with a perfect match were:
- Gemini Pro 1.5
- Gemini Flash 1.5
- ChatGPT-4o
- GPT-4o-mini
- Claude 3.5 Sonnet
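For readers who want to repeat the text-extraction check on their own outputs, here is a minimal sketch of how a "perfect match" could be verified. It is not the procedure used for this article, which is not documented here. The function names `normalize`, `is_exact_match` and `extraction_score` are hypothetical, and the choice to ignore differences in quote style and whitespace is an assumption.

```python
import difflib
import unicodedata

# The quote used in the text-extraction test.
REFERENCE = (
    "You can’t connect the dots looking forward; you can only connect them "
    "looking backwards. So you have to trust that the dots will somehow "
    "connect in your future."
)

# Assumption: curly and straight quotes are treated as equivalent.
QUOTE_MAP = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def normalize(text: str) -> str:
    """Unify Unicode forms, straighten quotes and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    for curly, straight in QUOTE_MAP.items():
        text = text.replace(curly, straight)
    return " ".join(text.split())


def is_exact_match(reference: str, extracted: str) -> bool:
    """True when the extracted text equals the reference after normalization."""
    return normalize(reference) == normalize(extracted)


def extraction_score(reference: str, extracted: str) -> float:
    """Character-level similarity between 0.0 and 1.0 (1.0 means identical)."""
    matcher = difflib.SequenceMatcher(
        None, normalize(reference), normalize(extracted), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    # Hypothetical model output with a straight apostrophe and a line break.
    sample = (
        "You can't connect the dots looking forward; you can only connect "
        "them looking backwards.\nSo you have to trust that the dots will "
        "somehow connect in your future."
    )
    print(is_exact_match(REFERENCE, sample))
    print(extraction_score(REFERENCE, ""))
```

A boolean check matches the pass/fail framing used above, while the similarity score is useful when a model returns text that is close but not identical. An empty response, such as the one reported for Claude 3.5 Haiku, fails the exact-match check.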
Final Rankings
- Gemini Pro 1.5
- Gemini Flash 1.5
- ChatGPT-4o
- Claude 3.5 Sonnet
- GPT-4o-mini
- Claude 3.5 Haiku
Gemini Pro 1.5, Gemini Flash 1.5 and ChatGPT-4o delivered the strongest results across image recognition and text extraction. Gemini Pro 1.5’s consistently polished and accurate output makes it the standout choice in this comparison for users who need precise, versatile performance.