Day 1

A reference map of every major model family — proprietary APIs, open-weight, and specialized — and how to think about cost, capability, and tradeoffs.

Context

The AI model landscape, as of the most recent updates, has six distinct segments: five vendor families and the open-weight tier. Understanding what lives in each segment, and why, is foundational knowledge for every AI PM.

Segment 1: Anthropic
Anthropic’s Claude family is built around Constitutional AI alignment and a tiered capability structure. As of the most recent update, the Claude family includes models such as claude-opus-4-6 (highest capability; supports a 1M token context window, suitable for whole-codebase analysis, long document ingestion, and complex multi-step reasoning), claude-sonnet-4-6 (the production workhorse; 200K context, offering a strong performance-to-cost ratio for most enterprise workloads), and claude-haiku-4-5 (fastest and most cost-effective; 200K context, ideal for high-volume classification and real-time UX). Always pull the current model strings from docs.anthropic.com/en/docs/about-claude/models — these change with each release.

Segment 2: OpenAI
OpenAI’s lineup, as of the most recent update, includes the standard family (gpt-4o, 128K context, multimodal, strong function-calling) and the reasoning series (o3, o4-mini; chain-of-thought reasoning with extended compute at inference time). The reasoning series excels at multi-step logic, math, and code — refer to OpenAI’s pricing page for the latest cost details. gpt-4o-mini serves the high-volume economy tier.

Segment 3: Google
Gemini 2.5 Pro leads Google’s family with a 1M+ token context window natively — one of the largest context windows as of the most recent updates. Gemini Flash variants offer lower-cost, lower-latency options. Google’s native integration with Search, Workspace, and Vertex AI makes it the default choice for organizations already in the Google ecosystem.

Segment 4: Microsoft
Microsoft operates two AI tracks as of the most recent updates. The first is the OpenAI partnership — all GPT-4o and o-series models are available via Azure OpenAI Service with enterprise identity, compliance, and VNet integration. The second is Microsoft’s own research portfolio: Phi-4 (14B parameter model achieving near-frontier reasoning at a fraction of the size, available on Azure AI Foundry and Hugging Face) and the new MAI model family — Microsoft’s proprietary series. MAI-DS-R1 is a safety-tuned distillation of DeepSeek R1, enhanced with Microsoft’s responsible AI training and available on Azure. MAI-1 is Microsoft’s in-house large language model reportedly trained at ~500B parameters, designed for Azure enterprise deployments. The MAI family is significant because it gives Microsoft API-level control over safety and compliance without depending on a third-party model provider.

Segment 5: Nvidia
Nvidia entered the model-as-a-product space with the Nemotron family. Llama-3.1-Nemotron-70B-Instruct is a fine-tune of Meta’s Llama 3.1 70B using Nvidia’s synthetic data pipeline, achieving near-GPT-4o performance at open-weight cost. Nemotron-4-340B-Instruct is Nvidia’s larger proprietary model, also available via the Nvidia API and NGC catalog. Minitron is Nvidia’s compressed series (8B, 4B) produced via pruning and distillation — designed for on-device and edge deployment. Nvidia’s strategic play is to own the inference stack (GPU + TensorRT-LLM + model) end-to-end, so model quality drives hardware adoption.

Segment 6: Open-weight tier
The self-hostable tier has reached near-frontier capability. Key models: Llama 4 Maverick (Meta, Community license, MoE architecture, strong general tasks), DeepSeek V3 / R1 (DeepSeek, MIT license — see note below), Mistral Large 2 (Mistral AI, commercial license, EU-residency friendly), Qwen 2.5-72B (Alibaba, Apache 2.0, best multilingual performance in APAC), and Phi-4 (Microsoft, MIT license, frontier reasoning at 14B). The business implication: enterprise customers can now self-host near-frontier models at near-zero per-token cost. Understanding this tier is essential for pricing your own products and defending API cost to skeptical buyers.

The cost-compression trajectory
DeepSeek R1 (released in January 2025) demonstrated that near-frontier reasoning performance could be achieved at a reported training cost dramatically lower than that of established frontier labs. Whether or not that figure captures full infrastructure costs, the directional signal is clear: training cost barriers are compressing faster than expected, and this has continued into 2025–26 with Llama 4 and MAI-DS-R1. For API providers, sustained competitive advantage increasingly rests on safety credibility, enterprise compliance infrastructure (SOC 2, HIPAA, FedRAMP), long-context architecture, and ongoing research quality, not on training cost moats alone.

Model pricing
Pricing changes quarterly and should never be hardcoded into documentation or estimates. The correct habit: always pull live pricing from the official pricing pages listed in the Resources section of this lesson before any real estimate. The formula that stays stable regardless of price changes:

daily cost =
  (daily_requests × avg_input_tokens  × input_price_per_1M_tokens  / 1,000,000)
+ (daily_requests × avg_output_tokens × output_price_per_1M_tokens / 1,000,000)

monthly cost = daily cost × 30

Apply this formula to the code lab below, then cross-check with live pricing pages. Pricing details for specific models can be found on their respective Anthropic, OpenAI, and Google pricing pages.
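As a quick check of the formula, the sketch below computes the input and output terms separately so the split between them is visible. The request volume, token counts, and prices are illustrative placeholders chosen for round numbers, not real provider prices, and the helper name daily_cost_terms is hypothetical.

```python
def daily_cost_terms(daily_requests, avg_input_tokens, avg_output_tokens,
                     input_price_per_1m, output_price_per_1m):
    """Return (input_term, output_term) of the daily cost, in dollars."""
    input_term = daily_requests * avg_input_tokens * input_price_per_1m / 1_000_000
    output_term = daily_requests * avg_output_tokens * output_price_per_1m / 1_000_000
    return input_term, output_term


# Placeholder inputs: 1,000 requests/day, 2,000 input and 500 output tokens each,
# priced at $2.00 (input) and $8.00 (output) per 1M tokens. NOT real prices.
input_term, output_term = daily_cost_terms(1_000, 2_000, 500, 2.00, 8.00)
daily = input_term + output_term
monthly = daily * 30

print(f"input term : ${input_term:.2f}/day")
print(f"output term: ${output_term:.2f}/day")
print(f"daily cost : ${daily:.2f}")
print(f"monthly    : ${monthly:.2f}")
```

With these placeholder values the two terms come out equal ($4.00/day each, $240.00/month in total): the output price is four times higher, but each request carries four times as many input tokens. The split depends on the workload shape as much as on the price ratio.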

Tasks (4)

  1. Set up your ai-pm-60days GitHub repo (25 min)
    Create the portfolio repository you will build throughout this course:
    • Create a public GitHub repo named ai-pm-60days
    • Add a README explaining the project and your learning goals
    • Create a /day-01/ folder as the first entry point
    • Commit and push — this is your Day 1 artifact
    This repo is your primary portfolio artifact. 60 days of quality commits demonstrates applied learning more credibly than any resume claim. A hiring manager should be able to browse your commit history and see a PM who built things, not just read about them.
  2. Build the 2026 model comparison matrix (25 min)
    Create /day-01/model_comparison.md with two clearly labeled sections:
    • Section 1 — Proprietary APIs: Include Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5, GPT-4o, o3/o4-mini, Gemini 2.5 Pro, MAI-DS-R1, Llama-3.1-Nemotron-70B. Columns: model name, context window, pricing page link (no hardcoded numbers), key capability strengths, best-fit use case.
    • Section 2 — Open-weight tier: Include Llama 4 Maverick, DeepSeek V3/R1, Mistral Large 2, Qwen 2.5-72B, Phi-4. Columns: provider, license type, key strength, the enterprise deployment scenario it enables.
    • For every model, link to the official pricing page — never write down a number that will be wrong next quarter.
  3. Cost-model a real enterprise use case (25 min)
    Apply the cost formula to a contract review product scenario:
    • Scenario: 500 contracts/month (roughly 17 per day, since the formula takes daily requests), 50K input tokens per contract, 2K output tokens per contract
    • Use the formula: monthly cost = daily_requests × 30 × [(input_tokens × input_price/1M) + (output_tokens × output_price/1M)]
    • Pull current pricing from the official pages (Anthropic, OpenAI, Google) and calculate for at least 3 models
    • Identify the volume threshold at which cost becomes a business-critical constraint and architectural changes (caching, model routing, batching) become necessary
    • Save your analysis as /day-01/cost_model.md
  4. Write the open-source objection response (25 min)
    Prepare a structured 200-word answer to the enterprise buyer question: “We have strong AI engineers. Why should we pay for Claude API when we can self-host Llama 4 Maverick for free?”
    • Lead with acknowledgment — Llama 4 Maverick is a genuinely strong model
    • Name specific differentiators: Constitutional AI training, Responsible Scaling Policy, SOC 2 Type II, HIPAA eligibility, 99.9% SLA, audit trails, support contracts
    • Address the real cost of self-hosting: GPU infrastructure, model serving, fine-tuning maintenance, security patching, compliance overhead
    • End with the honest framing: benchmark both on your specific use case, but know what proprietary APIs include beyond the token price
    • Save as /day-01/open_source_objection.md
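
For the volume-threshold question in Task 3 and the self-hosting cost point in Task 4, one rough approach is to find the monthly request volume at which API spend equals a fixed monthly figure, such as a budget ceiling or an estimated all-in self-hosting cost. This is a sketch under stated assumptions: the function names, the $5,000 fixed cost, and the prices are all hypothetical placeholders, and a real comparison would need live prices and a measured infrastructure estimate.

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_1m, output_price_per_1m):
    """API cost of one request in dollars (prices are per 1M tokens)."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000


def breakeven_requests_per_month(fixed_monthly_cost, input_tokens, output_tokens,
                                 input_price_per_1m, output_price_per_1m):
    """Monthly request count at which API spend equals fixed_monthly_cost."""
    per_request = cost_per_request(input_tokens, output_tokens,
                                   input_price_per_1m, output_price_per_1m)
    if per_request <= 0:
        raise ValueError("per-request cost must be positive")
    return fixed_monthly_cost / per_request


# Contract-review shape from Task 3: 50K input, 2K output tokens per contract.
# Placeholder prices ($2.00 / $8.00 per 1M tokens) and a placeholder fixed cost
# of $5,000/month. All three numbers are hypothetical.
per_contract = cost_per_request(50_000, 2_000, 2.00, 8.00)
threshold = breakeven_requests_per_month(5_000, 50_000, 2_000, 2.00, 8.00)

print(f"API cost per contract : ${per_contract:.3f}")
print(f"Break-even volume     : {threshold:,.0f} contracts/month")
```

Under these placeholder numbers, the 500 contracts/month scenario costs about $58 through the API and sits far below the break-even volume. The useful output is the order of magnitude of the threshold; the exact figure is only as good as the inputs.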

Model landscape: cost estimation framework — Python

# Day 01 — AI Model Landscape & Cost Estimation Framework
#
# PRICING NOTE: Always verify prices at official pages before any real estimate.
# Links to all provider pricing pages:
#   Anthropic : https://www.anthropic.com/pricing
#   OpenAI    : https://openai.com/pricing
#   Google    : https://cloud.google.com/pricing
#   Microsoft : https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure
#   Nvidia    : https://build.nvidia.com/  (NIM API pricing)
#
# The formula (memorise this — plug in live prices before any real estimate):
#
#   daily cost = (daily_requests x avg_input_tokens  x input_price_per_1M  / 1_000_000)
#              + (daily_requests x avg_output_tokens x output_price_per_1M / 1_000_000)
#
#   monthly cost = daily_cost x 30

# ---- MODEL REGISTRY -------------------------------------------------------
# 'ctx' = context window  |  'tier' = rough cost bucket

PROPRIETARY_MODELS = {
    # -- Anthropic -----------------------------------------------------------
    "claude-opus-4-6":           {"ctx": "1M",   "tier": "premium",   "provider": "Anthropic",  "notes": "1M context, highest capability, complex reasoning"},
    "claude-sonnet-4-6":         {"ctx": "200K", "tier": "standard",  "provider": "Anthropic",  "notes": "Production workhorse, best perf/cost ratio"},
    "claude-haiku-4-5-20251001": {"ctx": "200K", "tier": "economy",   "provider": "Anthropic",  "notes": "High-volume, fast, lowest cost"},
    # -- OpenAI --------------------------------------------------------------
    "gpt-4o":                    {"ctx": "128K", "tier": "standard",  "provider": "OpenAI",     "notes": "Strong ecosystem, multimodal, function-calling"},
    "o3":                        {"ctx": "128K", "tier": "reasoning",  "provider": "OpenAI",     "notes": "Best reasoning, 3-10x latency premium"},
    "o4-mini":                   {"ctx": "128K", "tier": "reasoning",  "provider": "OpenAI",     "notes": "Reasoning at economy cost"},
    "gpt-4o-mini":               {"ctx": "128K", "tier": "economy",   "provider": "OpenAI",     "notes": "High-volume, low-cost standard model"},
    # -- Google --------------------------------------------------------------
    "gemini-2.5-pro":            {"ctx": "1M+",  "tier": "premium",   "provider": "Google",     "notes": "Largest native context, Google ecosystem native"},
    # -- Microsoft -----------------------------------------------------------
    "phi-4":                     {"ctx": "16K",  "tier": "economy",   "provider": "Microsoft",  "notes": "Frontier reasoning at 14B params, MIT license"},
    "mai-ds-r1":                 {"ctx": "128K", "tier": "reasoning",  "provider": "Microsoft",  "notes": "DeepSeek R1 + Microsoft responsible AI tuning"},
    # -- Nvidia --------------------------------------------------------------
    "llama-3.1-nemotron-70b":    {"ctx": "128K", "tier": "standard",  "provider": "Nvidia",     "notes": "Llama 3.1 fine-tuned by Nvidia, near-GPT-4o"},
    "nemotron-4-340b":           {"ctx": "4K",   "tier": "premium",   "provider": "Nvidia",     "notes": "Nvidia's largest proprietary model, NIM API"},
}

OPEN_WEIGHT_MODELS = [
    {"model": "Llama 4 Maverick",  "provider": "Meta",      "license": "Community",  "strength": "Frontier-class MoE, free self-host"},
    {"model": "DeepSeek V3 / R1",  "provider": "DeepSeek",  "license": "MIT",        "strength": "Near-frontier reasoning, low training cost signal"},
    {"model": "Mistral Large 2",   "provider": "Mistral",   "license": "Commercial", "strength": "EU data residency, strong reasoning"},
    {"model": "Qwen 2.5-72B",      "provider": "Alibaba",   "license": "Apache 2.0", "strength": "Best multilingual, APAC enterprise"},
    {"model": "Phi-4",             "provider": "Microsoft", "license": "MIT",        "strength": "Frontier reasoning at 14B, edge deployable"},
]

# ---- COST FORMULA ---------------------------------------------------------

def monthly_cost(daily_requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_1m, output_price_per_1m):
    """
    monthly_cost = daily_requests x 30 x [
        (avg_input_tokens  x input_price_per_1M  / 1_000_000) +
        (avg_output_tokens x output_price_per_1M / 1_000_000)
    ]
    Prices are per 1M tokens. Plug in LIVE prices from provider pricing pages.
    """
    cost_per_request = (
        (avg_input_tokens  / 1_000_000) * input_price_per_1m +
        (avg_output_tokens / 1_000_000) * output_price_per_1m
    )
    return cost_per_request * daily_requests * 30

# ---- OUTPUT ---------------------------------------------------------------

print("=" * 68)
print("PROPRIETARY MODEL LANDSCAPE  (verify model strings at provider docs)")
print("=" * 68)
for name, m in PROPRIETARY_MODELS.items():
    print(f"  [{m['provider']:<11}]  {name:<32}  ctx: {m['ctx']:>5}  [{m['tier']}]")
    print(f"              {m['notes']}")

print()
print("=" * 68)
print("OPEN-WEIGHT TIER  (self-hostable, near-zero per-token cost)")
print("=" * 68)
for m in OPEN_WEIGHT_MODELS:
    print(f"  {m['model']:<22}  {m['provider']:<11}  {m['license']:<12}")
    print(f"    {m['strength']}")

print()
print("=" * 68)
print("COST FORMULA DEMO  (VERIFY ALL PRICES before any real estimate)")
print("=" * 68)
print("Scenario: ~500 contracts/month (17/day x 30 = 510), 50K input tokens, 2K output tokens")
print("Formula : monthly = daily_req x 30 x [(in_tok x in_price + out_tok x out_price) / 1M]")
print()
# Placeholder prices: REPLACE WITH CURRENT PRICES FROM PROVIDER PAGES
# 17 requests/day approximates 500 contracts/month (500 / 30, rounded up)
scenarios = [
    ("Claude Sonnet 4.6  ", 17, 50_000, 2_000,  3.00, 15.00),   # verify at anthropic.com/pricing
    ("Claude Haiku 4.5   ", 17, 50_000, 2_000,  0.25,  1.25),   # verify at anthropic.com/pricing
    ("GPT-4o             ", 17, 50_000, 2_000,  2.50, 10.00),   # verify at openai.com/pricing
    ("Gemini 2.5 Pro     ", 17, 50_000, 2_000,  1.25,  5.00),   # verify at ai.google.dev/pricing
]
for name, req, inp, out, in_p, out_p in scenarios:
    cost = monthly_cost(req, inp, out, in_p, out_p)
    print(f"  {name}  ~${cost:>8,.2f}/month   <-- VERIFY PRICE")
print()
print("Tip: Output tokens are typically priced several times higher than input tokens.")
print("     Optimise prompts (input) AND response length (output) for cost control.")
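
For Task 2, a registry like the one in this lab can be rendered into Markdown table rows for model_comparison.md. The sketch below uses a two-entry sample registry and the pricing URLs from the header comments. The pricing_url field and the to_markdown_table helper are illustrative additions, not part of the lab code.

```python
SAMPLE_MODELS = {
    "claude-sonnet-4-6": {"ctx": "200K", "provider": "Anthropic",
                          "pricing_url": "https://www.anthropic.com/pricing"},
    "gpt-4o":            {"ctx": "128K", "provider": "OpenAI",
                          "pricing_url": "https://openai.com/pricing"},
}


def to_markdown_table(models):
    """Render a model registry as a Markdown table with pricing links, no prices."""
    lines = [
        "| Model | Provider | Context window | Pricing page |",
        "| --- | --- | --- | --- |",
    ]
    for name, info in models.items():
        lines.append(
            f"| {name} | {info['provider']} | {info['ctx']} | "
            f"[link]({info['pricing_url']}) |"
        )
    return "\n".join(lines)


table = to_markdown_table(SAMPLE_MODELS)
print(table)
```

Linking to the pricing page instead of storing a number keeps the table in line with the rule above: no hardcoded prices.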

Interview question

Walk me through the 2026 AI model landscape. Which model would you choose for a new enterprise contract review product, and why?

A strong answer covers four things: the landscape structure, the recommendation with specific reasoning, the competitive alternatives, and the open-source objection.

The landscape: Four main proprietary players — Anthropic (Claude family, Constitutional AI), OpenAI (GPT + reasoning o-series), Google (Gemini, longest native context), and Microsoft (Azure OpenAI + MAI family + Phi-4). Plus Nvidia’s Nemotron models entering the space. The open-weight tier — Llama 4, DeepSeek, Mistral, Qwen, Phi-4 — is now near-frontier quality and changes the pricing conversation entirely.

Recommendation for contract review: Claude Sonnet 4.6 (claude-sonnet-4-6) for most contracts; Claude Opus 4.6 for unusually complex multi-jurisdiction documents where the 1M context window and maximum reasoning depth are justified. Sonnet’s 200K context handles full contracts without chunking (avoiding retrieval errors), and Constitutional AI training produces calibrated outputs on sensitive legal content. For the latest pricing details, refer to Anthropic’s pricing page.

Why not GPT-4o? Strong ecosystem, but 128K context forces chunking on large contracts, which increases architectural complexity. For Azure-first customers, Claude via Azure AI Marketplace removes the procurement friction — same model, enterprise identity integration included.
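
The chunking argument can be made concrete with a rough fit check. The sketch below assumes token counts are already known (real counts depend on each provider's tokenizer), treats 200K and 128K as 200,000 and 128,000 tokens, and reserves room for the prompt and the output. The names fits_in_context and chunks_needed, the 6,000-token reserve, and the 150K-token document are all hypothetical.

```python
import math


def fits_in_context(doc_tokens, context_window, reserved_tokens=6_000):
    """True if the document plus reserved prompt/output room fits in one call."""
    return doc_tokens + reserved_tokens <= context_window


def chunks_needed(doc_tokens, context_window, reserved_tokens=6_000):
    """Minimum number of calls if the document is split into equal-budget chunks."""
    usable = context_window - reserved_tokens
    if usable <= 0:
        raise ValueError("reserved_tokens must be smaller than the context window")
    return max(1, math.ceil(doc_tokens / usable))


# A hypothetical 150K-token contract against the two window sizes discussed above.
doc_tokens = 150_000
for label, window in [("200K window", 200_000), ("128K window", 128_000)]:
    print(label, fits_in_context(doc_tokens, window), chunks_needed(doc_tokens, window))
```

Here the 150K-token document fits the larger window in one call and needs two chunks in the smaller one. The 50K-token contracts in the Task 3 scenario would fit either window, so chunking only becomes a factor for documents larger than the smaller window.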

The open-source objection: Llama 4 Maverick is an excellent model. What self-hosting doesn’t include is Anthropic’s Constitutional AI training, Responsible Scaling Policy, SOC 2 Type II certification, HIPAA eligibility, a 99.9% SLA, or enterprise audit trails. For a law firm handling privileged documents, those aren’t nice-to-haves — they’re procurement requirements. Always benchmark both on your specific use case before committing.

PM angle

The model landscape is not a static list to memorise — it’s a map that changes every quarter. The PM skill is knowing how to read the map: which segment each model belongs to, what tradeoffs define that segment, and how pricing, context window, safety posture, and compliance certifications affect build/buy/partner decisions. Microsoft’s MAI family and Nvidia’s Nemotron entry signal that model production is broadening beyond the original frontier labs — the competitive landscape in 2027 will look meaningfully different from today.

Resources