creamyhorror 1 day ago [-]
Two key quotes:
• Reasoning: Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months. Furthermore, DeepSeek-V4-Flash-Max achieves comparable performance to GPT-5.2 and Gemini-3.0-Pro, establishing itself as a highly cost-effective architecture for complex reasoning tasks.
• Agent: On public benchmarks, DeepSeek-V4-Pro-Max is on par with leading open-source models, such as Kimi-K2.6 and GLM-5.1, but slightly worse than frontier closed models. In our internal evaluation, DeepSeek-V4-Pro-Max outperforms Claude Sonnet 4.5 and approaches the level of Opus 4.5.
While they're some months behind closed SOTA (though benchmarks put them close), I wonder if DeepSeek V4's longer context capabilities and KV-cache advantage will make up for this.
daemonologist 1 day ago [-]
$1.47/M input, $3.48/M output, open weights (MIT license), and competitive with the frontier on their selected benchmarks. Big if it holds up on real-world tasks.
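For a rough sense of what those quoted rates mean per request, here's a minimal cost sketch. The $1.47/M input and $3.48/M output rates come from the comment above; the token counts in the example are illustrative assumptions, not figures from the thread.

```python
# Per-token rates derived from the quoted per-million prices
# ($1.47/M input, $3.48/M output).
INPUT_RATE = 1.47 / 1_000_000   # USD per input token
OUTPUT_RATE = 3.48 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the quoted rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical workload: a 50k-token prompt with a 2k-token completion.
cost = request_cost(50_000, 2_000)
print(f"${cost:.4f}")  # 0.0735 input + 0.00696 output ≈ $0.0805
```

At these rates a fairly large agentic request stays under a dime, which is the gap commenters are reacting to relative to frontier closed-model pricing.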
nthypes 1 day ago [-]
Insane! The price is amazing for Opus 4.6-level frontier performance.
nthypes 1 day ago [-]
Actually better than Opus 4.6 on Terminal Bench 2.0