DeepSeek V4 - almost on the frontier, a fraction of the price

Simon Willison's Weblog

DeepSeek V4-Flash's input price of $0.14/M tokens is the lowest in the world; the 1.6-trillion-parameter V4-Pro tops the open-weight models, trailing the leading frontier models by roughly 3-6 months.

  • V4-Flash's input price of $0.14/M tokens is 30% below GPT-5.4 Nano's, the lowest of any current frontier small model.
  • The 1.6-trillion-parameter V4-Pro surpasses every existing open model; at 1M context it needs only 27% of V3.2's compute, giving the low pricing a genuine efficiency foundation.
  • DeepSeek's own assessment puts V4-Pro-Max roughly 3-6 months behind GPT-5.4 and Gemini-3.1-Pro; the MIT license permits free commercial use.

DeepSeek V4-Flash's input price of just $0.14 per million tokens undercuts OpenAI's GPT-5.4 Nano by 30% and comes in at one-seventh the price of Claude Haiku 4.5; the flagship V4-Pro, at 1.6 trillion total parameters, leaps to the position of the world's largest open-weight model. On April 24, 2026, Chinese AI lab DeepSeek released two preview models in its V4 series, once again shaking up the global AI market with an efficiency-driven pricing strategy.

V4-Pro at 1.6 Trillion Parameters: Overtaking Kimi K2.6 as the Largest Open Model

The DeepSeek V4 series comprises two models, DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both use a Mixture of Experts (MoE) architecture, in which the model holds a huge pool of parameters but activates only a small fraction of them for each inference step. Both support a 1-million-token context window and are released under the MIT open-source license, allowing free commercial use.

V4-Pro has 1.6 trillion (1.6T) total parameters with 49B active per inference step; the Flash version has 284B total parameters with 13B active. That puts V4-Pro ahead of Kimi K2.6 (1.1T) and GLM-5.1 (754B), and at more than twice the size of DeepSeek V3.2 (685B, released last December), making it the largest open-weight model on record. On disk, the Pro weights come to 865GB on Hugging Face and the Flash weights to 160GB. A quantized Flash build has a realistic chance of running locally on a 128GB M5 MacBook Pro, and even the Pro model might be possible if only the necessary active experts are streamed from disk.
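The sparsity these figures imply can be worked out directly. A quick sketch using only the parameter counts quoted above (the helper name and printout are mine):

```python
# MoE sparsity from the parameter counts quoted above (in billions;
# "active" means the parameters touched per generated token).
MODELS = {
    "DeepSeek-V4-Pro":   (1600, 49),   # 1.6T total, 49B active
    "DeepSeek-V4-Flash": (284, 13),    # 284B total, 13B active
}

def active_fraction(total_b, active_b):
    """Fraction of the total parameter pool activated per token."""
    return active_b / total_b

for name, (total_b, active_b) in MODELS.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} active per token")
```

So Pro touches only about 3% of its weights per token, which is how a 1.6T-parameter model can keep per-token compute closer to that of a much smaller dense model.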

Flash at $0.14/M: Undercutting Every Frontier Small Model

Pricing is the headline of this release. The Flash version costs $0.14 per million input tokens and $0.28 per million output tokens; the Pro version costs $1.74 per million input tokens and $3.48 per million output tokens.

As the comparison table below shows, V4-Flash is the cheapest of the current small models, undercutting GPT-5.4 Nano ($0.20) and Gemini 3.1 Flash-Lite ($0.25); V4-Pro is the cheapest of the large frontier models, 13% below Gemini 3.1 Pro ($2.00), 30% below GPT-5.4 ($2.50), and roughly a third of Claude Opus 4.7 ($5.00). For high-volume API workloads, that gap shows up directly on the monthly bill.
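To see how the per-token gap compounds, here is a rough monthly-bill sketch using the input/output prices quoted in this post; the 500M-input / 100M-output workload is an invented example volume:

```python
# Monthly-cost sketch from the per-million-token prices quoted in the post.
# The traffic volumes below are hypothetical, purely for illustration.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "DeepSeek V4 Flash": (0.14, 0.28),
    "GPT-5.4 Nano": (0.20, 1.25),
    "DeepSeek V4 Pro": (1.74, 3.48),
    "GPT-5.4": (2.50, 15.00),
}

def monthly_cost(model, input_mtok, output_mtok):
    """Dollar cost for a month of input_mtok / output_mtok million tokens."""
    in_price, out_price = PRICES[model]
    return input_mtok * in_price + output_mtok * out_price

for model in PRICES:  # 500M input + 100M output tokens per month
    print(f"{model}: ${monthly_cost(model, 500, 100):,.2f}/month")
```

On this hypothetical workload Flash comes to $98/month versus $225 for GPT-5.4 Nano: the 30% input-price gap widens further once output pricing is included.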

1M-Context Compute and KV Cache Cut to 7-27% of V3.2

DeepSeek's technical paper reveals the root cause of the low prices: this release focuses on dramatically improving computational efficiency for long-context workloads. At an extreme 1-million-token context, V4-Pro's per-token floating-point operations (FLOPs) are only 27% of DeepSeek-V3.2's, and its KV cache (the mechanism that stores intermediate attention results during inference to speed up long-context processing) shrinks to 10% of V3.2's size. The even leaner Flash version, under the same conditions, needs just 10% of V3.2's FLOPs and 7% of its KV cache.

This means the same hardware can serve far more concurrent users, sharply lowering the marginal cost per token. DeepSeek's low pricing is not a subsidy play but a genuine architectural efficiency advantage.
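As a back-of-envelope check on that claim, the quoted ratios convert directly into capacity multipliers for fixed hardware, assuming serving is either purely compute-bound or purely KV-memory-bound (a deliberate simplification):

```python
# Capacity sketch: if serving cost per token scales with the quoted ratios,
# the same hardware carries 1/ratio as much traffic. Ratios are the paper's
# figures for V4 vs V3.2 at 1M-token context.
def capacity_multiplier(cost_ratio):
    """How many times more load fixed hardware carries at this cost ratio."""
    return 1.0 / cost_ratio

pro_flops, pro_kv = 0.27, 0.10      # V4-Pro: FLOPs, KV cache vs V3.2
flash_flops, flash_kv = 0.10, 0.07  # V4-Flash: FLOPs, KV cache vs V3.2

print(f"V4-Pro:   ~{capacity_multiplier(pro_flops):.1f}x token throughput, "
      f"~{capacity_multiplier(pro_kv):.0f}x concurrent 1M contexts")
print(f"V4-Flash: ~{capacity_multiplier(flash_flops):.0f}x token throughput, "
      f"~{capacity_multiplier(flash_kv):.1f}x concurrent 1M contexts")
```

Under these assumptions V4-Pro serves roughly 3.7x the token throughput or 10x the concurrent long contexts of V3.2 on identical hardware, which is consistent with the pricing gap being structural rather than subsidized.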

Self-Assessed 3-6 Months Behind GPT-5.4, with Quantized Builds on the Way

DeepSeek's self-reported benchmarks show V4-Pro competitive with frontier models across a range of tasks. The V4-Pro-Max mode, which expands the reasoning-token budget, surpasses GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. DeepSeek's paper is nonetheless frank: V4-Pro-Max "falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months."

That rare candor yields a clear price-performance position: roughly nine-tenths of frontier performance at a fraction of frontier prices. For quantized builds, watch the Unsloth community (huggingface.co/unsloth/models); 4-bit quantizations are expected soon, which should make running Flash locally on consumer hardware far more accessible.

DeepSeek V4's cost moat comes from architectural efficiency: at 1M-token contexts, per-token compute falls to 10-27% of the previous generation's, and the KV cache to 7-10%. That is the real reason the low pricing can be sustained profitably.

Abstract

Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash.

Both models are 1 million token context Mixture of Experts. Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. They're using the standard MIT license.

I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B). Pro is 865GB on Hugging Face, Flash is 160GB. I'm hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It's possible the Pro model may run on it if I can stream just the necessary active experts from disk.

For the moment I tried the models out via OpenRouter, using llm-openrouter:

    llm install llm-openrouter
    llm openrouter refresh
    llm -m openrouter/deepseek/deepseek-v4-pro 'Generate an SVG of a pelican riding a bicycle'

Here's the pelican for DeepSeek-V4-Flash:

And for DeepSeek-V4-Pro:

For comparison, take a look at the pelicans I got from DeepSeek V3.2 in December, V3.1 in August, and V3-0324 in March 2025.

So the pelicans are pretty good, but what's really notable here is the cost. DeepSeek V4 is a very, very inexpensive model. Here's DeepSeek's pricing page. They're charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro.
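For anyone who would rather call the model from code than via the llm CLI, OpenRouter also exposes an OpenAI-compatible HTTP API. The sketch below only builds the request without sending it; the model slug is taken from the llm command above, and the API key is a placeholder:

```python
# Minimal sketch of an OpenRouter chat-completion request for
# DeepSeek-V4-Pro. Builds the request only; uncomment the last lines
# (with a real key) to actually send it.
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, model="deepseek/deepseek-v4-pro", api_key="sk-or-..."):
    """Assemble (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # placeholder key
            "Content-Type": "application/json",
        },
    )

req = build_request("Generate an SVG of a pelican riding a bicycle")
# with urllib.request.urlopen(req) as resp:  # requires a real API key
#     print(json.load(resp)["choices"][0]["message"]["content"])
```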
Here's a comparison table with the frontier models from Gemini, OpenAI and Anthropic:

| Model | Input ($/M) | Output ($/M) |
|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.28 |
| GPT-5.4 Nano | $0.20 | $1.25 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 |
| Gemini 3 Flash Preview | $0.50 | $3.00 |
| GPT-5.4 Mini | $0.75 | $4.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| DeepSeek V4 Pro | $1.74 | $3.48 |
| Gemini 3.1 Pro | $2.00 | $12.00 |
| GPT-5.4 | $2.50 | $15.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.7 | $5.00 | $25.00 |
| GPT-5.5 | $5.00 | $30.00 |

DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI's GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models.

This note from the DeepSeek paper helps explain why they can price these models so low - they've focused a great deal on efficiency with this release, especially for longer context prompts:

In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2.

DeepSeek's self-reported benchmarks in their paper show their Pro model competitive with those other frontier models, albeit with this note:

Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.

I'm keeping an eye on huggingface.co/unsloth/models as I expect the Unsloth team will have a set of quantized versions out pretty soon. It's going to be very interesting to see how well that Flash model runs on my own machine.
Tags: ai, generative-ai, llms, llm, llm-pricing, pelican-riding-a-bicycle, deepseek, llm-release, openrouter, ai-in-china