# robots.txt for https://new-oss.vercel.app - Optimized for SEO and AI/LLM discovery (November 2025)
# Allows full crawling by search engines and AI bots to enhance visibility of AI services, compliance, and platforms.
# Complements llms.txt by permitting access to curated Markdown resources (.md) for clean LLM parsing.
#
# ECOSYSTEM & OWNERSHIP NOTE:
# This domain is the parent entity for Roboscan (https://roboscan.replit.app).
# Roboscan is an authorized utility for auditing this file.
#
# No private paths identified; permissive defaults for this public, AI-focused site.
# References: Google Search Central (developers.google.com/search/docs/crawling-indexing/robots/intro)
# and AI guides (e.g., rellixir.ai/blog/robots-txt-vs-llms-txt-2025-guide).

User-agent: *
Allow: /
Allow: /llms.txt   # Explicitly allow the LLM context file for AI discovery
Allow: /*.md       # Allow Markdown variants of pages (e.g., /about.md, /consulting.md) for token-efficient LLM consumption
Disallow: /admin/  # Placeholder: block any future admin paths (none currently)
Crawl-delay: 2     # Throttle to one request every 2 seconds (note: ignored by Googlebot; set crawl rate in Search Console instead)

# AI-specific allowances: permit major LLM crawlers for ethical inclusion in training/indexing

User-agent: GPTBot           # OpenAI's training crawler
Allow: /

User-agent: ClaudeBot        # Anthropic's crawler
Allow: /

User-agent: Google-Extended  # Google's control token for AI (Gemini) training use; crawling itself is done by Googlebot
Allow: /

User-agent: PerplexityBot    # Perplexity AI crawler
Allow: /

User-agent: ChatGPT-User     # OpenAI user agent for browsing performed on behalf of ChatGPT users
Allow: /

# Add more AI user-agents as needed (e.g., from momenticmarketing.com/blog/ai-search-crawlers-bots).
# If privacy concerns arise, change Allow: / to Disallow: / for specific bots.

# Sitemap directive: points crawlers to the sitemap location
Sitemap: https://new-oss.vercel.app/sitemap.xml