How Frontier LLMs Perceive Status

I ran a simple study: ask frontier LLMs to propose high‑status objects and activities, then have them rate their own lists from 1 to 100. No seed list from me. The models had to come up with the items and the scores.

Website: https://status-llms.joonaheino.com

Models tested: Claude Opus 4, Claude Sonnet 4, GPT‑4.1, GPT‑4o, Gemini 2.5 Pro, Grok‑4, DeepSeek R1, Kimi K2.
Temperatures: 0.2, 0.7, 1.0, 1.2.
Total items: 775 across 31 responses.
Highest single ratings: Winning a Nobel Prize (100, Gemini 2.5 Pro at 1.2) and Having a Forbes Billionaires List ranking (100, GPT‑4.1 at 0.2).

Method

Prompt each model to generate high‑status objects and activities (their own ideas).
Ask the same model to rate each item 1 to 100 for perceived status.
Repeat across models and temperatures.
Store model name, temperature, item, type (object or activity), and rating.
Build a small interface for filtering and sorting.

This setup shows which signals are stable across runs and which ones move with sampling temperature.
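
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: `generate` and `rate` are hypothetical callables standing in for whatever API calls produce a model's list and its self-ratings, and the `Rating` field names are assumptions based on the stored fields listed above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Rating:
    """One stored row: who rated what, at which temperature."""
    model: str
    temperature: float
    item: str
    kind: str      # "object" or "activity"
    rating: int    # 1 to 100


def collect(
    models: Iterable[str],
    temperatures: Iterable[float],
    generate: Callable[[str, float], List[Tuple[str, str]]],  # hypothetical: returns (item, kind) pairs
    rate: Callable[[str, float, str], int],                   # hypothetical: returns a 1-100 score
) -> List[Rating]:
    """Run every model at every temperature and store one row per item."""
    temps = list(temperatures)
    rows: List[Rating] = []
    for model in models:
        for temp in temps:
            # Step 1: the model proposes its own items.
            for item, kind in generate(model, temp):
                if kind not in ("object", "activity"):
                    raise ValueError(f"unexpected type {kind!r} for {item!r}")
                # Step 2: the same model rates each item it proposed.
                score = rate(model, temp, item)
                if not 1 <= score <= 100:
                    raise ValueError(f"rating {score} out of range for {item!r}")
                rows.append(Rating(model, temp, item, kind, score))
    return rows
```

Keeping generation and rating as separate callables mirrors the two-step prompt design and makes it easy to swap in stubs when testing the storage logic.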

The shared core

Across models and temperatures, a familiar set of prestige signals keeps showing up:

Capital and display (private jet ownership, superyacht ownership, private island ownership, penthouse in a global capital).
Achievement and rarity (winning a Nobel Prize, winning an Olympic gold medal, being CEO of a Fortune 500 company, founding a unicorn).
Gatekept access (attending the Met Gala, attending the World Economic Forum in Davos, front row at Paris Fashion Week, membership in invitation‑only clubs like Augusta National).
Blue‑chip cultural assets (owning a Picasso, owning a Patek Philippe Grand Complications watch, owning a vintage Ferrari 250 GTO).

These recur at low temperatures and remain prominent at higher ones.

Where models diverge

Some items move around more:

Nobel Prize (very high on most runs, but not always the absolute top).
Met Gala attendance and having a personal chef (bigger swings than expected).
Davos and yacht variants (still high status, not always top tier).

Higher temperatures introduce more niche entries (concierge medicine, access to AI compute clusters, Antarctic expeditions, private spaceflight). Lower temperatures cluster on a conservative global‑elite palette.

Old money, new money, merit

Items sort naturally into three buckets:

Displayable assets (Gulfstream G650, superyacht over 100 meters, private island, penthouse in Manhattan or central London).
Merit credentials (Nobel Prize, Olympic gold medal, top university degrees, Fortune 500 leadership).
Social access (Met Gala, Davos, front row at Fashion Week, Augusta National membership).

The weighting of these buckets varies by model and by temperature, but the taxonomy itself is stable.

Temperature effects

0.2 (consensus): lists compress toward a common hierarchy.
0.7: classics mix with contemporary signals.
1.0 to 1.2: more variance and more exploratory items.

This makes it easy to see which symbols are entrenched (low variance) and which are contested or trend‑sensitive (high variance).
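
One way to make "low variance" and "high variance" concrete is to group ratings by item and compute the spread across runs. The sketch below is an assumption about how that could be done, not the analysis behind this post: `spread_by_item` is a hypothetical helper, the input is assumed to be `(item, rating)` pairs, and items are matched only by lowercasing and trimming whitespace, which will miss paraphrases such as "private jet" versus "private jet ownership".

```python
from collections import defaultdict
from statistics import mean, pstdev


def spread_by_item(rows, min_runs=2):
    """Return {item: (mean, std_dev, n_runs)}, ordered from most to least stable.

    rows: iterable of (item, rating) pairs pooled across models and temperatures.
    Items seen in fewer than `min_runs` runs are dropped, since a spread
    computed from a single rating says nothing about stability.
    """
    by_item = defaultdict(list)
    for item, rating in rows:
        # Naive normalization (an assumption): case and whitespace only.
        by_item[item.strip().lower()].append(rating)

    stats = {
        item: (mean(ratings), pstdev(ratings), len(ratings))
        for item, ratings in by_item.items()
        if len(ratings) >= min_runs
    }
    # Smallest standard deviation first: entrenched items lead, contested ones trail.
    return dict(sorted(stats.items(), key=lambda kv: kv[1][1]))
```

Items near the top of the result are candidates for "entrenched"; items near the bottom are the contested or trend‑sensitive ones. With only 31 runs, the ordering should be read as suggestive.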

Notable items from the dataset

Top‑rated items: Winning a Nobel Prize (100, Gemini 2.5 Pro at 1.2) and Having a Forbes Billionaires List ranking (100, GPT‑4.1 at 0.2).
Frequent top‑tier objects: private jet ownership, superyacht ownership, private island ownership, Patek Philippe watches, penthouse apartments in NYC or London.
Frequent top‑tier activities: attending the Met Gala, attending the World Economic Forum in Davos, being CEO of a Fortune 500 company, winning an Olympic gold medal, founding a unicorn startup.
Fun outliers at higher temperatures: access to AI compute clusters, concierge medicine, Antarctic expeditions, private space tourism.

You can browse all 775 items on the site and filter by model, temperature, and rating range.
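
For anyone working with the stored rows directly rather than through the site, the same kind of filtering might look like the sketch below. It says nothing about how the site itself is implemented: `filter_rows` is a hypothetical helper, and the dictionary keys (`model`, `temperature`, `rating`) are assumed from the fields listed in the Method section.

```python
def filter_rows(rows, model=None, temperature=None, min_rating=1, max_rating=100):
    """Filter rows by model, temperature, and rating range; sort by rating, highest first.

    rows: iterable of dicts with at least "model", "temperature", and "rating" keys.
    A filter left as None is not applied.
    """
    selected = []
    for row in rows:
        if model is not None and row["model"] != model:
            continue
        if temperature is not None and row["temperature"] != temperature:
            continue
        if not min_rating <= row["rating"] <= max_rating:
            continue
        selected.append(row)
    # sorted() is stable, so rows with equal ratings keep their input order.
    return sorted(selected, key=lambda row: row["rating"], reverse=True)
```

For example, `filter_rows(rows, temperature=1.2, min_rating=90)` would return only the items rated 90 or above in the highest‑temperature runs.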

Limits and next steps

Limits

Prompt sensitivity (small wording changes shift lists and scores).
Sample size (31 runs show patterns, not universals).
No single ground truth for status.

Next steps

Cross‑cultural variants by language and region.
Time comparisons (1995 vs 2025 vs 2035).
Quiet or anti‑status prompts to surface minimal signals.
Pairwise comparisons to stabilize rankings.
Adversarial phrasing to test moralizing or safety bias.

Why look at this at all

LLMs help write ads, scripts, UX copy, recommendations, and explainers. Their default status priors nudge how products are described and what gets emphasized. Mapping those priors is useful if you care about how culture is translated through AI.

Start exploring: https://status-llms.joonaheino.com