Comparing AI Vision: Which Model Wins in Image Recognition and Text Extraction?

Large language models (LLMs) have made it easier to extract information from images, whether that means writing detailed descriptions or recognizing text. In this article, we compare six popular models: Gemini Pro 1.5, Gemini Flash 1.5, ChatGPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, and Claude 3.5 Haiku. The goal is to identify the best performer for image recognition and text extraction tasks.

Question 1: Image Recognition – Describe the Photo Concisely

Ranking of LLM Models

  1. Gemini Pro 1.5
    Polished and concise, Gemini Pro 1.5 provides detailed yet succinct descriptions that perfectly balance accuracy and brevity.
  2. Gemini Flash 1.5
    Accurate and clear, Gemini Flash 1.5 offers robust image descriptions, although it’s slightly less polished than Pro.
  3. ChatGPT-4o
    ChatGPT-4o delivers detailed descriptions with clear identification of the outfit and background. However, its phrasing can be slightly less refined than the top two models.
  4. Claude 3.5 Sonnet
    Descriptive and thorough, Claude 3.5 Sonnet excels in detail but tends to over-explain, making it less concise than higher-ranked options.
  5. GPT-4o-mini
    Simple and effective for basic recognition, GPT-4o-mini provides accurate descriptions but lacks depth in covering finer details.
  6. Claude 3.5 Haiku
    Claude 3.5 Haiku misinterpreted the image entirely and offered no useful description, which places it last in this category.

Question 2: Text Extraction – Extract the Text Concisely

For text extraction, the models were tested on the following quote:

“You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future.”

Performance Results

Five of the six models matched the original text exactly, achieving 100% accuracy in text extraction. The exception was Claude 3.5 Haiku, which again returned no useful output. The five models with a perfect match were:

  • Gemini Pro 1.5
  • Gemini Flash 1.5
  • ChatGPT-4o
  • GPT-4o-mini
  • Claude 3.5 Sonnet
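
For readers who want to reproduce this kind of check, here is a minimal sketch of how a "perfect match" could be scored. It is not the procedure used for this article, which was not scripted; the function names `normalize` and `extraction_accuracy` are hypothetical, and the choice to treat curly and straight quotes as equivalent is an assumption. It uses only Python's standard library.

```python
import difflib

# Reference quote from the test image (straight apostrophe used here).
REFERENCE = (
    "You can't connect the dots looking forward; you can only connect them "
    "looking backwards. So you have to trust that the dots will somehow "
    "connect in your future."
)

# Assumption: curly and straight quotes count as the same character.
QUOTE_MAP = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def normalize(text: str) -> str:
    """Fold curly quotes, collapse whitespace, drop surrounding double quotes."""
    for curly, straight in QUOTE_MAP.items():
        text = text.replace(curly, straight)
    return " ".join(text.split()).strip('"')


def extraction_accuracy(reference: str, extracted: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means an exact match."""
    ref, out = normalize(reference), normalize(extracted)
    # autojunk=False keeps the ratio exact for longer strings.
    return difflib.SequenceMatcher(None, ref, out, autojunk=False).ratio()


if __name__ == "__main__":
    # Hypothetical model output, wrapped in curly quotes as in the image.
    model_output = "\u201c" + REFERENCE.replace("'", "\u2019") + "\u201d"
    print(extraction_accuracy(REFERENCE, model_output))
```

A model that returns no text at all scores 0.0 under this sketch, while partial extractions land somewhere in between, which makes the score usable for ranking as well as for a pass/fail check.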

Final Rankings

  1. Gemini Pro 1.5
  2. Gemini Flash 1.5
  3. ChatGPT-4o
  4. Claude 3.5 Sonnet
  5. GPT-4o-mini
  6. Claude 3.5 Haiku

Gemini Pro 1.5, Gemini Flash 1.5, and ChatGPT-4o delivered the strongest results across image recognition and text extraction. Since five of the six models tied on text extraction, the final order mirrors the image recognition ranking. Gemini Pro 1.5’s consistent polish and accuracy make it the standout choice for users who need precise and versatile AI performance.