How AI Understands Dog Breeds: A Research Project

When you hear “Siberian Husky,” what comes to mind? Most people think first of the breed’s distinctive appearance, but what about its temperament, or how well it works in different environments? That question led me to explore how AI models associate dog breeds with their non-physical characteristics.

Why Dogs?

I’ve been training dogs since 1998, starting with a Rottweiler. Over the years, I’ve been deeply involved in search and rescue, obedience, and protection sports (IPO and Mondioring) as both handler and helper. I’ve also had the privilege of working with military and police dogs, particularly working-line German Shepherds and Belgian Malinois. My own Malinois was an incredible partner: she approached everything with amazing drive and dedication. Currently, I’m training my first working-line Lab, which has been fascinating. The work is different, but I’m seeing interesting parallels between traditional gundog training and modern approaches to sport and working dogs. This background with different breeds and training methodologies made me curious about how LLMs understand and categorize dog breeds and their characteristics.

The Original Plan: Embedding-Based Analysis

The initial concept was straightforward: use modern embedding models to understand how AI “thinks” about different dog breeds and their characteristics. By comparing the embeddings of phrases like “works well independently” or “needs constant attention” with embeddings of different dog breeds, we could potentially uncover interesting patterns in how language models have learned to associate breeds with various traits.

I used OpenAI’s text-embedding-3-large model to generate embeddings for both breed names and characteristic phrases, planning to use cosine similarity to find meaningful associations. The hypothesis was that breeds historically bred for similar purposes would cluster together in embedding space, and their embeddings would be closer to relevant characteristic phrases.
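To make the comparison concrete, here is a minimal sketch of the cosine-similarity step. It is not the project’s actual code: the function names, the breed labels, and the tiny three-dimensional vectors are all made up for illustration, and real embeddings from text-embedding-3-large would be far higher-dimensional and fetched from the API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_breeds_for_trait(trait_vec, breed_vecs):
    """Return (breed, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(vec, trait_vec))
        for name, vec in breed_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up toy vectors (assumption): stand-ins for real breed embeddings.
toy_breeds = {
    "Breed A": [1.0, 0.0, 0.0],
    "Breed B": [0.0, 1.0, 0.0],
    "Breed C": [1.0, 1.0, 0.0],
}
# Made-up toy vector (assumption): stand-in for a characteristic phrase.
toy_trait = [1.0, 0.0, 0.0]

ranking = rank_breeds_for_trait(toy_trait, toy_breeds)
for breed, score in ranking:
    print(f"{breed}: {score:.3f}")
```

With real embeddings, the same ranking would be computed once per characteristic phrase, and the hypothesis would predict that breeds with a shared working history end up near the top for the same traits.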

However, this approach didn’t yield the insights I was hoping for. The cosine similarities between breeds and characteristics often didn’t align with well-known breed traits, suggesting that either the embedding space wasn’t capturing these relationships effectively, or the relationships were more complex than simple vector similarities could reveal.

Pivoting to Direct AI Assessment

After the embedding approach proved unsuccessful, I shifted to a more direct method: asking Google’s Gemini-2.0-Pro-05-02 Experimental model (as of February 2025) to explicitly rate, on a scale of 0 to 100, how well each characteristic applies to each breed.
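A rough sketch of what one such rating request might look like is below. The prompt wording and both helper names are my assumptions for illustration, not the prompt actually used, and the model call itself is left out so the example runs without an API key. It only shows how a prompt could be built and how a free-text reply could be turned into a validated score.

```python
import re

def build_rating_prompt(breed, trait):
    """Build a single-rating prompt (hypothetical wording)."""
    return (
        "On a scale of 0 to 100, how well does the characteristic "
        f'"{trait}" apply to the dog breed "{breed}"? '
        "Reply with a single integer and nothing else."
    )

def parse_rating(reply):
    """Return the first integer in the reply if it lies in 0..100, else None."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    value = int(match.group())
    return value if 0 <= value <= 100 else None

prompt = build_rating_prompt("Siberian Husky", "works well independently")
print(prompt)

# Simulated replies (made up) to show how parsing behaves.
for reply in ["85", "Score: 72/100", "no idea", "150"]:
    print(repr(reply), "->", parse_rating(reply))
```

Rejecting out-of-range or unparseable replies, rather than clamping them, keeps bad responses visible so they can be retried instead of silently skewing the dataset.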

This approach produced much more nuanced and generally accurate results. The AI showed a sophisticated understanding of breed characteristics, often aligning well with common knowledge about different breeds. For example:

It correctly identified breeds known for independence versus those needing constant human interaction
It accurately rated working dogs’ abilities in their traditional roles
It showed understanding of breed-specific behavioral traits

However, the AI wasn’t perfect, and some of its assessments revealed interesting blind spots. A notable example is the Finnish Lapphund, which received a surprisingly low score for “working in cold conditions,” a trait that should score high given the breed’s history of working in Arctic conditions. Mismatches like this between AI perception and reality highlight the limits of what AI models learn about specialized domains.

Making It Interactive: A Citizen Science Approach

To better understand where AI perceptions align with or differ from human knowledge, I’ve created an interactive website where people can vote on these AI-generated trait assessments. The goal is to create a dataset that captures both AI and human understanding of dog breeds.
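As a sketch of how the votes could later be compared against the AI scores, the snippet below computes an agreement rate per breed and trait pair. The vote format (a simple agree or disagree flag), the function name, and the sample records are all assumptions made for illustration; the real voting data may be structured differently.

```python
from collections import defaultdict

def agreement_rates(votes):
    """Map (breed, trait) -> fraction of votes that agreed with the AI score."""
    counts = defaultdict(lambda: [0, 0])  # [agree count, total count]
    for breed, trait, agrees in votes:
        entry = counts[(breed, trait)]
        if agrees:
            entry[0] += 1
        entry[1] += 1
    return {key: agree / total for key, (agree, total) in counts.items()}

# Made-up votes (assumption): (breed, trait, voter agrees with AI score).
toy_votes = [
    ("Breed A", "trait x", False),
    ("Breed A", "trait x", False),
    ("Breed A", "trait x", True),
    ("Breed B", "trait y", True),
]

rates = agreement_rates(toy_votes)
for (breed, trait), rate in sorted(rates.items()):
    print(f"{breed} / {trait}: {rate:.2f}")
```

Pairs with a low agreement rate and enough votes would be the natural candidates for a closer look, since that is where human knowledge and the AI assessment diverge most.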

All the data is open and freely available, including:

Complete lists of breeds and characteristics
AI-generated similarity scores
Original embeddings for both breeds and characteristics
User voting data (coming soon)

You can find the dataset on GitHub and explore the interactive visualization at Dog Breed Research.

What’s Next?

This project opens up several interesting research directions:

Analyzing patterns in how AI models understand specialized domains like dog breeds
Comparing different AI models’ understanding of breed characteristics
Studying how human knowledge differs from AI-generated assessments
Using the collected voting data to improve AI understanding of breed traits

If you’re interested in contributing to this research, head over to the website and start voting on breed characteristics. Every vote helps build a better understanding of how AI perceptions compare to human knowledge.

The code, data, and methodology are all open source – feel free to use them for your own research or analysis.

Get Involved

You can find the project at:

Website: Dog Breed Research
GitHub: Project Repository
Contact: @joonaheino on X