Publisher AEO (Agent Optimization) is the strategic management of how news content is ingested, cited, and monetized by generative AI models. It's a balance: publishers must allow retrieval bots (like OAI-SearchBot) to access headlines and summaries for citation visibility, while blocking training bots (like GPTBot) from scraping full-text archives for unpaid model training. This is the "Scraper Dilemma": block too much and you become invisible; allow too much and you cannibalize subscription revenue.
The Problem: The "Free Rider" Crisis
For 20 years, publishers relied on the "Google bargain": we give you content, you give us traffic. With AI, the bargain is broken. A user asks "What is the latest on the Fed interest rate decision?" The AI reads the Wall Street Journal, Bloomberg, and NYT, then synthesizes a perfect three-paragraph summary. The user reads it and leaves. Zero clicks. Zero ad impressions. Zero subscription conversions.
The risk runs both ways. If you put content behind a hard paywall that blocks all bots, the AI says "I cannot verify this source" and cites a lower-quality free blog instead. You lose authority. If you open your paywall to bots, the AI consumes your premium product for free. You lose revenue. This is the publisher dimension of the Vertical Split: the reward isn't traffic, it's monetizable trust.
The Solution: The "Tiered Access" Protocol
Treat AI agents differently based on their intent. You need a granular access strategy.
1. The "Abstract Layer" (free for AI)
You can't let the AI read the whole article, but you must let it read the abstract. Expose the headline, the lede (first two paragraphs), and the key data points (date, author, entities) in your Article schema. This gives the AI enough context to cite you ("According to the NYT...") without enough tokens to generate a full substitute.
2. The "Paywall Property" (schema enforcement)
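Explicitly mark gated content with the isAccessibleForFree: false property. This is a machine-readable signal, not an enforcement mechanism: it tells compliant bots that the page is paywalled, so they can index it without treating the gated text as free to display in full. Bots that ignore structured data will ignore this too, which is why it works alongside the access rules below rather than replacing them.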
3. The "Training Block" (copyright defense)
As detailed in the robots.txt strategy guide, separate "search" from "training": allow OAI-SearchBot (for real-time news citations) and block GPTBot (for building the next model).
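As a rough sketch of the search-versus-training split, the snippet below holds a two-rule robots.txt and checks it with Python's standard-library urllib.robotparser. The article URL and the is_allowed helper are hypothetical, and robots.txt is only honored by crawlers that choose to comply.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block the training crawler, allow the retrieval crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

# Hypothetical article URL, used only to exercise the rules.
ARTICLE_URL = "https://news.example.com/markets/fed-rate-decision"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, ARTICLE_URL))
```

Running a check like this before deploying a robots.txt change can catch a blanket rule that accidentally blocks the retrieval bot as well.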
Technical Implementation: The Paywall Schema
Here is the JSON-LD structure that protects your revenue while maintaining your visibility.
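A minimal sketch of such a structure, generated in Python so the lede extraction is explicit. It follows the isAccessibleForFree plus hasPart/cssSelector pattern from Google's paywalled-content guidelines. The function names, the ".paywall" selector, and the choice to carry the lede in the description property are assumptions for illustration, not the author's markup.

```python
import json


def build_paywalled_article_jsonld(headline, author, date_published,
                                   paragraphs, paywall_selector=".paywall"):
    """Build NewsArticle JSON-LD that exposes the abstract layer only.

    `paywall_selector` is an assumed CSS class wrapping the gated body.
    """
    lede = " ".join(paragraphs[:2])  # abstract layer: first two paragraphs only
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author},
        "description": lede,
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            "cssSelector": paywall_selector,
        },
    }


def to_script_tag(data: dict) -> str:
    """Serialize to a JSON-LD script tag, escaping '<' so the tag cannot be closed early."""
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    doc = build_paywalled_article_jsonld(
        headline="Example headline",       # placeholder values
        author="Example Reporter",
        date_published="2024-01-01T12:00:00+00:00",
        paragraphs=["First paragraph.", "Second paragraph.", "Gated analysis."],
    )
    print(to_script_tag(doc))
```

The third paragraph never reaches the markup, so the structured data carries enough to attribute the story but not enough to replace it.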
The "Summary" Optimization
AI models love summaries. If you don't provide one, they'll try to generate one (often poorly). Create a dedicated summary field in your CMS and populate it with 3-5 bullet points. Inject this summary into the <meta name="description"> and the JSON-LD abstract field. This increases the probability that the AI uses your approved summary rather than hallucinating one.
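One way to wire the CMS summary field into both places is sketched below. The render_summary_markup helper and its input format (a list of bullet strings) are hypothetical; a real CMS would supply its own field names and length limits.

```python
import html


def render_summary_markup(bullets: list[str]) -> tuple[str, dict]:
    """Turn editor-written summary bullets into a meta tag and a JSON-LD fragment."""
    # Drop empty bullets and stray whitespace, then flatten to one plain-text string.
    cleaned = [b.strip() for b in bullets if b.strip()]
    summary = " ".join(cleaned)

    # Attribute values must be HTML-escaped; the JSON-LD value stays unescaped
    # because json.dumps handles quoting when the fragment is serialized.
    meta_tag = f'<meta name="description" content="{html.escape(summary, quote=True)}">'
    jsonld_fragment = {"abstract": summary}
    return meta_tag, jsonld_fragment


if __name__ == "__main__":
    tag, fragment = render_summary_markup([
        "Fed holds rates.",
        ' "Dot plot" shifts. ',
        "",
    ])
    print(tag)
    print(fragment)
```

The fragment is meant to be merged into the article's existing JSON-LD object, so the approved summary and the paywall markup travel together.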
Google News vs. AI News
| Feature | Google News / Top Stories | AI News Agent (Perplexity) |
| --- | --- | --- |
| Ranking Factor | Recency + CTR | Information density + trust |
| User Intent | Scan headlines | Synthesize a narrative |
| Paywall Handling | "First Click Free" (legacy) | Schema-based enforcement |
| Traffic | High volume (low intent) | Low volume (high intent) |
| Citation Style | Link + image | Footnote citation |
Strategic Advantage: The "Live Blog" Schema
For breaking news, static articles are too slow, and AI agents favor fresh data. Use the LiveBlogPosting schema, which marks the URL as ongoing coverage with timestamped updates. Agents such as Perplexity may re-crawl these URLs more often, which improves your chance of being the "first source" cited on a developing story.
The contrarian point: the publisher instinct to wall everything off is exactly backwards. The asset isn't the article body the AI wants to read; it's being the named, trusted source the AI is forced to attribute, and that's something you give away on purpose.
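For the LiveBlogPosting markup mentioned above, a minimal sketch follows. It uses the schema.org properties coverageStartTime and liveBlogUpdate; the build_live_blog_jsonld helper, the tuple format for updates, and the sample URL are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def build_live_blog_jsonld(url, headline, coverage_start, updates):
    """Build LiveBlogPosting JSON-LD.

    `updates` is a list of (timestamp, headline, body) tuples with
    timezone-aware datetimes; newest updates are emitted first.
    """
    ordered = sorted(updates, key=lambda u: u[0], reverse=True)
    last_modified = ordered[0][0] if ordered else coverage_start
    return {
        "@context": "https://schema.org",
        "@type": "LiveBlogPosting",
        "url": url,
        "headline": headline,
        "coverageStartTime": coverage_start.isoformat(),
        "dateModified": last_modified.isoformat(),
        "liveBlogUpdate": [
            {
                "@type": "BlogPosting",
                "headline": update_headline,
                "articleBody": body,
                "datePublished": ts.isoformat(),
            }
            for ts, update_headline, body in ordered
        ],
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    doc = build_live_blog_jsonld(
        url="https://news.example.com/live/fed-decision",  # placeholder URL
        headline="Fed decision: live updates",
        coverage_start=start,
        updates=[
            (datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc), "Statement released", "..."),
            (datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc), "Press conference begins", "..."),
        ],
    )
    print(json.dumps(doc, indent=2))
```

Regenerating the markup on every update keeps dateModified in step with the newest entry, which is the freshness signal a re-crawling agent would be looking for.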
Key Takeaways
- Differentiate the bots. Don't use a blanket Disallow: / for OpenAI. Allow OAI-SearchBot if you want traffic, even while you block GPTBot to protect IP.
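- Schema is the guardrail. isAccessibleForFree: false is the closest thing you have to a machine-readable rights signal. Compliant bots honor it, but it does not technically block access, so pair it with your bot-access rules. Implement it on all premium URLs.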
- The abstract strategy. Give the AI the "who, what, when" for free. Charge the human for the "why and how."
- Token efficiency. As noted in the token efficiency audit, heavy ads and trackers slow ingestion. Serve bots a "lite" version (the same article text, without ad and tracker scripts) so they index breaking news before the timeout.
- Syndication risk. If you syndicate to MSN or Yahoo, the AI might read it there (for free) instead of on your site. Review your canonical tags.
References & Further Reading
- Google Search Central: Subscription and Paywalled Content. The official structured-data guidelines for gating content. https://developers.google.com/search/docs/appearance/structured-data/paywalled-content
- OpenAI: Bot IP Ranges. Technical details for firewall configuration. https://platform.openai.com/docs/bots
- Website AI Score: Robots.txt Strategy. How to granularly block crawlers. https://websiteaiscore.com/blog/blocking-ccbot-vs-gptbot-robots-txt-strategy

