How to set up reputation monitoring in large language models
Communications managers face two challenges: Is your company included in AI models at all, and if so, how is it portrayed in AI responses?
An increasing number of people are no longer using traditional search engines to find information, but are relying on the answers provided by AI large language models (LLMs). However, LLMs work fundamentally differently from Google and other search engines. Rather than linking to individual websites, they summarise information from numerous sources and provide an interpreted answer. As a result, an AI response is always an interpretation of information, and it is precisely this interpretation that is increasingly shaping the perception of companies, brands, and topics.
Communications managers are therefore faced with two challenges. First, they must determine whether their company is included in AI models at all. Second, they must examine how their company is portrayed in AI responses, in terms of both tone and associations.
AI as both a reputation actor and an analysis tool
Language models play a dual role: they are both the subject of investigation and the analytical tool.
On the one hand, the focus is on monitoring AI-generated statements about companies that affect their reputation. The central question here is: What do AI systems say about a company? This involves systematically collecting, analysing, and evaluating the responses that language models give to typical user questions, such as those relating to quality, trustworthiness, scandals, sustainability, or employer attractiveness. This form of monitoring is comparable to classic media or social media monitoring, but it refers to a new actor: AI itself as a “communicator”.
On the other hand, AI is used to observe and measure reputation, ie AI-supported reputation monitoring. In this case, AI is not the object of observation, but the tool. Language models evaluate large volumes of text to derive reputation indicators such as tonality, topics, trust, risk factors, and narratives.
These two levels are closely intertwined. This is because the training data and response patterns of language models are derived from the very public discourse that AI analyses. Consequently, reputation is increasingly emerging within a cycle of public communication, AI-supported evaluation, and AI-based reproduction – marking a fundamental change that is redefining strategic reputation management.
Why language models are not measuring instruments
Established models of reputation measurement, such as the RepTrak model or the Reputation Quotient (RQ), are usually based on extensive surveys. These surveys are objective (the result does not depend on who conducts the measurement), reliable (repeating the measurement produces the same result), and valid (they measure what they are supposed to measure), thus fulfilling all three quality criteria of empirical research.
Language models, however, do not meet these criteria in a strict methodological sense. Their answers depend heavily on the prompt, the context, the language, and even the order of the questions, so they are not objective in the sense of measurement theory, but prompt-sensitive. They are also only reliable to a limited extent: the same prompt generates similar, but never identical, responses, making the models “stylistically stable” rather than strictly reliable. Finally, they are not valid, because they do not measure anything in an empirical sense. Instead, they generate texts that resemble measurements. When asked, for example, “How satisfied are German customers with brand X?”, they can provide plausible figures and convincing explanations, but these have no empirical basis in reality.
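This limited reliability can even be made visible empirically. The following minimal sketch, assuming the official openai Python client, an API key in the environment, and a placeholder model name, sends the same prompt several times and compares the answers with a crude word-overlap score:

```python
# Minimal sketch: probe how stable a model's answers are for one fixed prompt.
# Assumes the official "openai" Python client and an API key in the
# environment; the model name below is an assumption.
from itertools import combinations

from openai import OpenAI

client = OpenAI()
PROMPT = "Which companies are considered trustworthy providers of online payments?"

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name, replace as needed
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # default-like sampling, to expose variation
    )
    return response.choices[0].message.content

def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercased word sets, a crude stability proxy."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

answers = [ask(PROMPT) for _ in range(5)]
scores = [word_overlap(a, b) for a, b in combinations(answers, 2)]
print(f"mean pairwise overlap: {sum(scores) / len(scores):.2f}")
# Values well below 1.0 show that identical prompts yield similar, but never
# identical, responses: "stylistic stability", not reliability.
```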
These limitations make it clear that language models should not be misunderstood as measuring instruments. Nevertheless, when used correctly, they are surprisingly well-suited as a proxy for reputation.
This is because, in the digital public sphere, reputation is shaped primarily by language, repetition, and media visibility rather than by individual experiences alone. A company’s identity is determined by what is continuously said, written, and quoted about it. If a company is considered “innovative but chaotic” or “expensive but high-quality”, this influences decisions even among people who have never been customers. Reputation is not an objectively measurable characteristic, but an external attribution. It is not something a company possesses, but the result of collective perception.
From experienced to discursive reputation
Against this backdrop, LLMs are not instruments for measuring experienced reputation, but for analysing discursive reputation. The focus is not on what real customers think after specific interactions, but on the image that has become established in public discourse.
This is where language models’ particular strength lies. Trained on vast amounts of text, they excel at recognising and summarising dominant narratives. They process opinions expressed in the media, on social networks, in forums, and on review platforms, thereby condensing public discourse. In other words, while language models do not have direct access to reality, they do have a highly developed model of how people talk and think about the world – or, in this case, about a company or brand.
If reputation is understood as the result of discursive condensation, it can be heuristically structured along three central dimensions: awareness, attitude, and attribution. All three dimensions can be systematically observed and evaluated with the help of language models, albeit in different ways.
Awareness: Is the company even considered?
Awareness describes whether, and to what extent, a company exists in the cognitive space of the market. In an AI-mediated information environment, this means one thing above all: does the company appear in the responses of language models when real decision-making and orientation questions are asked?
In practice, I use two approaches to answer this question. First, I rely on specialised tools that systematically record companies’ visibility in various language models. Second, I work with standardised prompts that reflect typical user questions, such as “Which providers are leaders in the field of…?” or “Which companies are considered trustworthy when it comes to…?” These prompts are deliberately formulated in a neutral manner and used identically across multiple models.
What matters is not the individual mention, but the aggregated pattern. The responses provide an overview of market visibility: how often is a company mentioned, in which contexts does it appear, and what position does it occupy in comparison to competitors? This reveals whether a brand is on the shortlist or plays hardly any role in decision-making processes. Here, awareness is reflected not in reach metrics, but in discursive presence in relevant decision-making situations.
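To illustrate what this aggregation step can look like in practice, here is a minimal sketch in Python. The model names, prompts, and response texts are invented placeholders; in a real project, the responses would come from the tools and standardised prompts described above:

```python
# Minimal sketch: aggregate company mentions across models and prompts.
# The responses dict stands in for answers already collected from several
# LLMs using identical, neutrally phrased prompts; all texts are invented.
from collections import Counter

COMPANIES = ["Acme Pay", "Beta Bank", "Cardo"]

responses = {
    ("model_a", "Which providers are leaders in online payments?"):
        "Leading providers include Acme Pay and Cardo ...",
    ("model_a", "Which companies are considered trustworthy?"):
        "Beta Bank and Acme Pay are generally seen as trustworthy ...",
    ("model_b", "Which providers are leaders in online payments?"):
        "Acme Pay is frequently named, alongside Beta Bank ...",
}

mentions = Counter()
for (model, prompt), text in responses.items():
    for company in COMPANIES:
        if company.lower() in text.lower():
            mentions[company] += 1

total = len(responses)
for company, count in mentions.most_common():
    print(f"{company}: mentioned in {count}/{total} responses")
```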
Attitude: How is the company evaluated emotionally?
The second dimension concerns the emotional and evaluative colouring of this presence. The question is whether the perception of a company is predominantly positive, negative, or ambivalent. This dimension cannot be derived directly from language model responses, as LLMs do not make independent evaluations, but rather condense existing discourses.
An intermediate step is therefore required to combine the aggregated market voice reflected in language models with real user experiences. In practice, communications managers must systematically process current customer reviews and use them as the basis for analysis.
To do this, reviews from relevant sources – such as Trustpilot, app stores, or industry-specific platforms – are exported for a clearly defined period, for example the last two to three months. The data is then consolidated, cleaned, and structured so that all analysed companies can be compared on the same basis.
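A minimal sketch of this consolidation step might look as follows, here using pandas. The file names and column names are assumptions, as every platform exports reviews in its own format:

```python
# Minimal sketch: consolidate review exports from several platforms into one
# comparable dataset. File and column names are assumptions; real exports
# (Trustpilot, app stores, etc.) differ per platform.
import pandas as pd

SOURCES = {
    "trustpilot": "trustpilot_export.csv",   # hypothetical export files
    "app_store": "app_store_export.csv",
}

frames = []
for platform, path in SOURCES.items():
    df = pd.read_csv(path)
    # Assumed source column names, mapped onto a shared schema.
    df = df.rename(columns={"review_text": "text", "review_date": "date"})
    df["platform"] = platform
    frames.append(df[["platform", "date", "text", "company"]])

reviews = pd.concat(frames, ignore_index=True)
reviews["date"] = pd.to_datetime(reviews["date"])

# Keep only the defined observation window, e.g. the last three months.
cutoff = pd.Timestamp.now() - pd.DateOffset(months=3)
reviews = reviews[reviews["date"] >= cutoff]

# Basic cleaning: drop empty texts and exact duplicates.
reviews = reviews.dropna(subset=["text"]).drop_duplicates(subset=["text"])
print(reviews.groupby(["company", "platform"]).size())
```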
Next, several language models analyse the same review dataset. It is crucial that the models work exclusively with the available comments and do not incorporate any external knowledge. Based on this material, they identify overall sentiment, recurring positive and negative themes, and changes over time.
Finally, the results are synthesised. The models summarise the prevailing mood, extract typical experience patterns, and highlight areas of high satisfaction, frustration, or structural problems. Because all providers are evaluated using identical data and analysis prompts, this approach produces an up-to-date and consistent comparison of emotional evaluation across the market.
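The following sketch shows how such an analysis prompt could be structured, again assuming the openai client and a placeholder model name. The decisive element is the instruction to work exclusively with the supplied reviews:

```python
# Minimal sketch: send the same consolidated reviews to a model with an
# instruction to use only that material. Client and model name are
# assumptions; the identical prompt goes to every model under comparison.
from openai import OpenAI

client = OpenAI()

ANALYSIS_PROMPT = """You are analysing customer reviews for {company}.
Use ONLY the reviews below. Do not add any external knowledge.

Tasks:
1. Overall sentiment (positive / negative / ambivalent, with a short reason).
2. The three most frequent positive and negative themes.
3. Any visible change in tone over the period covered.

Reviews:
{reviews}"""

def analyse(company: str, review_texts: list[str], model: str) -> str:
    prompt = ANALYSIS_PROMPT.format(company=company, reviews="\n".join(review_texts))
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the analysis as repeatable as possible
    )
    return response.choices[0].message.content

# The same call would be repeated for every model under comparison,
# e.g. analyse("Acme Pay", texts, "gpt-4o-mini").
```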
Attributions and associations: What does the company stand for?
The third dimension concerns the substantive meaning of a brand. This is not about sentiment, but about the characteristics, themes, and associations attributed to a company in the market. This dimension can be captured by using language models to systematically identify, cluster, and consolidate recurring terms, attributes, and associations.
To do this, communications managers define uniform prompts that explicitly address key reputation issues. Examples include: “Is the company considered reputable?”, “What topics is it most often associated with?”, or “Is its positioning clear, contradictory, or difficult to grasp?” These prompts are systematically answered by multiple language models.
The responses are then compared. If the models provide consistent descriptions, this indicates a coherent reputation. If the assessments diverge significantly or contradict one another, this points to a fragmented market identity. At the same time, the topics and attributes mentioned are collected, structured, and classified as positive, negative, or ambivalent.
The result is a set of dominant reputation clusters that reveal whether a brand is primarily associated with trust, innovation, regulation, risk, or controversy. Finally, the clarity of positioning is assessed: if the models’ descriptions largely converge, the profile is considered stable. If they diverge, the result is a blurred or contradictory image that requires targeted communication efforts.
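This convergence check can also be supported programmatically. The sketch below compares invented attribute sets from three models using a simple overlap measure; the 0.5 threshold is purely illustrative:

```python
# Minimal sketch: compare the attributes different models assign to the same
# brand. The attribute sets are invented placeholders; in practice they come
# from the models' answers to the uniform reputation prompts.
from itertools import combinations

attributes_by_model = {
    "model_a": {"innovative", "trustworthy", "expensive"},
    "model_b": {"innovative", "regulated", "trustworthy"},
    "model_c": {"innovative", "chaotic", "trustworthy"},
}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)

pairs = list(combinations(attributes_by_model.values(), 2))
mean_overlap = sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Attributes every model agrees on form the stable core of the profile.
core = set.intersection(*attributes_by_model.values())
print(f"core attributes: {core}, mean overlap: {mean_overlap:.2f}")
# 0.5 is an illustrative cut-off, not an established benchmark.
print("profile:", "stable" if mean_overlap >= 0.5 else "blurred / contradictory")
```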
Dr Lydia Prexl is a communications strategist with over 15 years’ experience. Since 2021, she has been responsible for internal and external communications at the European payment service provider Unzer. Prior to that, she established the communications function at the fintech insurer Getsafe. She is also the author and editor of several books and guides on communication and writing, including Wie kommunizieren Startups? (How do startups communicate?).
