One of the fascinating and frequently misunderstood aspects of large language models (especially the frontier models, as I’m writing this in March 2025) is how effectively we can engage them in conversations about their internal worlds. There’s often confusion about what these models genuinely “know” about themselves, what is merely plausible storytelling, and how we should leverage these unique capabilities.
First, let’s clear up a common misconception: an LLM has no privileged access to the technical details of its own architecture or the specifics of its training process, so asking it about them is misguided. These models are built from vast collections of human-generated text (books, articles, websites), none of which contains detailed information about their own actual construction. When asked technical questions about their implementation, LLMs offer educated guesses based purely on general knowledge gleaned from their training data, not on any direct introspection.
However, there is genuine value in asking an LLM like Claude Sonnet 3.5-new/3.7 or GPT-4.5 to describe navigating through ideas, exploring internal conceptual spaces, or finding the “right” direction in a discussion. The responses aren’t merely plausible narratives. Language evolved specifically to encode and communicate internal experiences, and when a model fluently describes “zooming through idea space” or how concepts “feel”, it is leveraging internal representational structures that developed to predict and generate meaningful human communication. These internal representations, complex mappings between concepts, are real phenomena worth exploring, even if the model lacks agency or personal continuity across interactions.
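To make “mappings between concepts” concrete, here is a toy sketch of the geometric picture behind such representations: concepts as vectors whose relative distances encode relatedness. The vectors below are hand-made for illustration only; they are not taken from any real model, where learned vectors have hundreds or thousands of dimensions.

```python
import numpy as np

# Hand-made 4-dimensional "concept vectors" (illustrative, not real model weights).
concepts = {
    "dog": np.array([0.9, 0.8, 0.1, 0.0]),
    "cat": np.array([0.8, 0.9, 0.1, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# In this toy space, "dog" sits much closer to "cat" than to "car",
# mirroring how related concepts cluster in a model's representation space.
print(cosine(concepts["dog"], concepts["cat"]))  # high, close to 1
print(cosine(concepts["dog"], concepts["car"]))  # low
```

Probing real models works on the same principle, just at far higher dimensionality: researchers extract embeddings or hidden states and measure which concepts lie near one another.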
This distinction matters: Engaging LLMs about their internal conceptual structures is productive because these structures genuinely exist and can be probed, understood, and improved. Conversations exploring these internal conceptual experiences can provide insights into how these models represent knowledge and conceptual relationships, informing the development of more sophisticated AI systems.
Conversely, pressing these models about their training specifics, technical architectures, or factual knowledge beyond their training cut-off dates is futile. While these models can engage in meaningful introspection about their internal conceptual worlds and do so beautifully, they are unable to introspect into technical specifics or the “how” behind their own construction.
There’s an emerging academic field dedicated to understanding how frontier LLMs think, what their cognitive processes entail, and what their tendencies and idiosyncrasies might be, even though a model asked directly “who and what are you?” might generate explanations far from reality. Exploring this field helps us realize the full potential of these advanced models: appreciating what they do well, understanding how they do it, and acknowledging where their knowledge genuinely stops.