DeepSeek-V4
Towards Highly Efficient Million-Token Context Intelligence
DeepSeek-V4 is a preview series of open Mixture-of-Experts LLMs: V4‑Pro (1.6T total parameters, 49B active) and V4‑Flash (284B total, 13B active), both supporting a 1M-token context window. A new hybrid attention scheme (CSA+HCA) reduces long-context compute and KV-cache memory, while mHC connections and the Muon optimizer improve training stability. The models are trained on over 32T tokens and post-trained with expert specialization followed by consolidation.
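The gap between total and active parameters comes from sparse expert routing: each token is sent to only a small top-k subset of experts, so only those experts' weights participate in the forward pass. The following is a minimal illustrative sketch of top-k MoE routing in NumPy, not DeepSeek's actual implementation; the expert count, hidden size, and k are placeholder values chosen for the example.

```python
# Illustrative sketch (not DeepSeek's code): a 1.6T-parameter MoE can run with
# far fewer "active" parameters because each token is routed to a small top-k
# subset of experts, and only those experts' weights are used.
import numpy as np

rng = np.random.default_rng(0)

n_experts = 8          # hypothetical expert count for this sketch
d_model = 16           # hypothetical hidden size
top_k = 2              # experts activated per token

# One tiny linear "expert" per slot (random placeholder weights).
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))   # router projection

def moe_forward(x):
    """Route a single token vector to its top-k experts and mix their outputs."""
    logits = x @ router                          # (n_experts,) router scores
    chosen = np.argsort(logits)[-top_k:]         # indices of the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                         # softmax over chosen experts
    # Only the chosen experts' weights are touched: these are the "active" params.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.normal(size=d_model)
out = moe_forward(token)
active = top_k * d_model * d_model               # weights used for this token
total = n_experts * d_model * d_model            # weights stored in the layer
print(f"active/total expert parameters per token: {active}/{total}")
```

At scale the same ratio applies: the full expert pool defines the total parameter count, while the per-token cost scales only with the routed subset.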