From 智源社区
“Hardcore innovation will become more common in the future. The reason it’s not widely understood yet is that society needs to be educated by real-world results. Once society starts celebrating hardcore innovators, the collective mindset will shift. What we need now is more proof and time.”
— Liang Wenfeng, Founder of DeepSeek
In recent days, DeepSeek has exploded in popularity worldwide. However, due to its low-profile approach and lack of publicity, the general public knows little about this highly promising tech company—its founding background, business scope, or product strategy.
After compiling all available information, I put together this piece:
The AI Players: Their Backgrounds, Battles, and Hiring Trends
This is the second installment of this series, and possibly the most comprehensive historical account of DeepSeek to date.
All DeepSeek-related images in this article, unless otherwise noted, are sourced from official releases and application screenshots. Special thanks to 暗涌Waves for publishing two in-depth interviews with Liang Wenfeng, which provided valuable insights for this piece.
The Birth of DeepSeek
Around this time last year, a friend from High-flyer (幻方量化) asked me, “Do you want to build a large-scale AI model in China?” I spent the entire afternoon just sipping coffee. As it turns out, life is all about choices.
The “High-flyer” mentioned here is DeepSeek’s investor and, in a way, its parent company. High-flyer is a quantitative fund: it relies on algorithms, rather than human judgment, to make investment decisions. Founded in 2015, it grew rapidly, surpassing ¥100 billion in assets under management by 2021 and earning a place among China’s “Big Four” quant funds.
The firm’s founder, Liang Wenfeng, who later established DeepSeek, is an unconventional figure in the financial world. Born in the 1980s, he never studied abroad and won no Olympiad medals; instead, he graduated from Zhejiang University’s Department of Electronic Engineering with an AI specialization. A homegrown technologist, Liang keeps a low profile, spending his days reading research papers, writing code, and joining group discussions.
Unlike traditional business executives, Liang is more of a pure tech geek. Industry insiders and DeepSeek researchers hold him in high regard, describing him as someone with exceptional infrastructure engineering and AI model research skills, strong leadership in resource mobilization, and the ability to outperform frontline researchers even in technical details. His learning ability has been described as “terrifying.”
AI Ambitions: From Quant to AGI
Even before founding DeepSeek, High-flyer had been laying the groundwork in AI. In a May 2023 interview with 暗涌Waves, Liang noted:
“After OpenAI released GPT-3 in 2020, it became clear that computing power would be the key factor in AI development. But even in 2021, when we started building Firefly-2 (萤火二号), most people still couldn’t see it.”
Based on this foresight, the company began constructing its own computing infrastructure.
“From a single GPU, to 100 GPUs in 2015, to 1,000 in 2019, and eventually 10,000—it was a gradual process. At first, we used third-party data centers, but as our scale grew, we had to build our own.”
According to Caijing Eleven People, as of 2023 fewer than five companies in China owned more than 10,000 GPUs, and apart from the major tech giants, one of them was High-flyer. 10,000 Nvidia A100 chips are widely regarded as the computing threshold for training proprietary large-scale AI models.
Interestingly, Liang stated in an interview that their AI pursuit wasn’t driven by some hidden business strategy—it was pure curiosity.
1. The Official Launch of DeepSeek
In May 2023, 暗涌Waves asked Liang:
“High-flyer recently announced its entry into large-scale AI models. Why would a quant fund take this step?”
His response was clear-cut:
“What we’re doing has nothing to do with quant finance. We set up a brand-new company, DeepSeek, for this purpose. Many of us in High-flyer have backgrounds in AI. Initially, we explored various applications and ended up in finance because it was sufficiently complex. Now, general AI seems like the next big challenge. So for us, it’s a question of how to do it—not why.”
DeepSeek wasn’t chasing market trends or commercial interests—it was about the pursuit of AGI and tackling the hardest, most important problems. The name DeepSeek (深度求索) was officially confirmed in May 2023, and by July 17, 2023, DeepSeek AI Research Institute was formally established in Hangzhou.
On November 2, 2023, DeepSeek delivered its first major breakthrough: the open-source release of DeepSeek Coder, a cutting-edge AI model for code generation. Available in 1.3B, 6.7B, and 33B parameter sizes, it included both base and instruction-tuned models.
At the time, Meta’s CodeLlama was the industry benchmark for open-source code models. But DeepSeek Coder outperformed CodeLlama across multiple benchmarks, leading by:
- 9.3% on HumanEval
- 10.8% on MBPP
- 5.9% on DS-1000
A remarkable feat, considering that DeepSeek Coder’s 6.7B model surpassed CodeLlama’s 34B version. After instruction tuning, DeepSeek Coder even outperformed OpenAI’s GPT-3.5 Turbo.
Beyond code generation, DeepSeek Coder also excelled in math and reasoning tasks, further proving its capabilities.
Building the Dream Team
Just three days later, on November 5, 2023, DeepSeek began aggressive recruitment, posting multiple job openings on its WeChat public account. Positions included:
- AGI Research Interns
- Data Architects
- Senior Data Collection Engineers
- Deep Learning Engineers

When it comes to hiring, Liang emphasizes two must-have qualities: passion and strong fundamentals. He believes:
“Innovation requires minimal intervention and management. People need the freedom to explore and make mistakes. True innovation comes from within—it cannot be forced or taught.”
2. Relentless Model Releases & Commitment to Open-Source
After the success of DeepSeek Coder, DeepSeek set its sights on a bigger challenge: general-purpose large models.
- November 29, 2023 – DeepSeek released its first general LLM, DeepSeek LLM 67B, outperforming Meta’s LLaMA2 70B on nearly 20 public benchmarks in both English and Chinese, excelling in reasoning, mathematics, and coding. Unlike many competitors, DeepSeek fully open-sourced not just the 7B and 67B models but also nine training checkpoints, a rare move in the open-source community.
- December 18, 2023 – DeepSeek launched DreamCraft3D, a text-to-3D model, pushing AIGC from 2D to 3D. Users could generate high-quality 3D assets from simple text prompts, outperforming existing methods in subjective evaluations.
- January 7, 2024 – DeepSeek published a 40+ page technical report for DeepSeek LLM 67B, detailing its scaling laws, alignment methods, and AGI evaluation frameworks.
- January 11, 2024 – DeepSeek released China’s first MoE-based LLM, DeepSeekMoE, supporting both Chinese and English for free commercial use. The model demonstrated superior efficiency across 2B, 16B, and 145B scales.
- January 25, 2024 – The DeepSeek Coder technical report was released, showcasing its repo-level code understanding and Fill-in-the-Middle (FIM) training, which significantly improved code completion.
- January 30, 2024 – DeepSeek opened its API platform, offering 10 million free tokens and OpenAI API compatibility for both Chat and Coder models (a usage sketch follows this timeline).
- February 5, 2024 – DeepSeek introduced DeepSeekMath, a 7B model for mathematical reasoning that approached GPT-4-level performance on the MATH benchmark, outperforming open-source models in the 30B-70B range.
- February 28, 2024 – To ease concerns about licensing, DeepSeek released a detailed open-source FAQ, clarifying commercial usage and licensing terms.
- March 11, 2024 – DeepSeek launched its first multimodal LLM, DeepSeek-VL, in 7B and 1.3B sizes, alongside a research paper.
- March 20, 2024 – DeepSeek’s founder, Liang Wenfeng, spoke at NVIDIA GTC 2024, discussing value alignment in LLMs and the challenges of balancing AI ethics with diverse societal values.
- March 2024 – DeepSeek API entered paid service, triggering China’s LLM price war with a groundbreaking pricing model:
  - ¥1 per million input tokens
  - ¥2 per million output tokens
  - DeepSeek also obtained China’s LLM regulatory approval, clearing the way for wider adoption.
- May 2024 – The DeepSeek-V2 MoE LLM was open-sourced, leveraging MLA (Multi-Head Latent Attention) to cut KV-cache memory usage by 87-95% compared to traditional MHA, paired with the sparse DeepSeekMoE architecture for computational efficiency (a simplified sketch of the MLA idea follows at the end of this section).
  - Price: ¥1 per million input tokens, ¥2 per million output tokens
  - Praised by SemiAnalysis as “possibly the best AI research paper of the year.”
  - Former OpenAI engineer Andrew Carr called it “full of groundbreaking insights.”
  - Competes directly with GPT-4-Turbo, but at 1/70th of the API cost.
- June 17, 2024 – DeepSeek Coder V2 launched, claiming to outperform GPT-4-Turbo in coding tasks. Available in 236B and 16B versions, fully open-sourced, with API access at the same low price.
- June 21, 2024 – DeepSeek Coder introduced live code execution, mirroring Claude 3.5 Sonnet’s Artifacts feature, enabling one-click code execution within the browser.
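To make the OpenAI API compatibility mentioned in the January 30, 2024 entry concrete, here is a minimal usage sketch that points the official OpenAI Python SDK at DeepSeek's endpoint. The base URL and the `deepseek-chat` model id reflect DeepSeek's public API documentation, but treat them as assumptions to verify rather than guarantees.

```python
# Minimal sketch: calling the DeepSeek chat API through its OpenAI-compatible interface.
# Assumes the endpoint "https://api.deepseek.com" and the model id "deepseek-chat";
# verify both against the current DeepSeek API docs before relying on them.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued on the DeepSeek API platform
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed id of the general chat model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a function that reverses a string."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Because the interface mirrors OpenAI's, an existing application can switch providers by changing only the base URL, API key, and model name, which is part of why DeepSeek's later price cuts rippled through the market so quickly.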
DeepSeek is not just shaping the open-source AI landscape; it is redefining LLM pricing and accessibility, bringing GPT-4-level models to the masses at an unprecedentedly low cost.
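To give a feel for where MLA's memory savings come from (see the May 2024 entry above), here is a simplified, illustrative PyTorch sketch. It is not DeepSeek's implementation: it omits details such as decoupled rotary position embeddings, and every dimension below is made up. It only demonstrates why caching one shared latent vector per token, instead of full per-head keys and values, shrinks the KV cache.

```python
# Simplified illustration of the core idea in Multi-Head Latent Attention (MLA):
# cache a small shared latent vector per token and reconstruct keys/values from it,
# instead of caching full per-head K and V. Dimensions below are illustrative only.
import torch

d_model, n_heads, d_head, d_latent = 4096, 32, 128, 512

W_down = torch.randn(d_model, d_latent) * 0.02           # hidden state -> latent
W_up_k = torch.randn(d_latent, n_heads * d_head) * 0.02  # latent -> per-head keys
W_up_v = torch.randn(d_latent, n_heads * d_head) * 0.02  # latent -> per-head values

hidden = torch.randn(1, 1024, d_model)  # (batch, seq_len, d_model)

# Standard multi-head attention caches K and V directly:
mha_cache_per_token = 2 * n_heads * d_head  # floats cached per token

# MLA caches only the latent vector; K and V are reconstructed on the fly:
latent = hidden @ W_down                               # this is what gets cached
k = (latent @ W_up_k).view(1, 1024, n_heads, d_head)   # used by attention as usual
v = (latent @ W_up_v).view(1, 1024, n_heads, d_head)
mla_cache_per_token = d_latent

print(f"KV-cache reduction: {1 - mla_cache_per_token / mha_cache_per_token:.1%}")
# With these illustrative sizes the cache shrinks by ~93.8%, in the same ballpark
# as the 87-95% savings cited above.
```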
3. Continuous Breakthroughs & Global Recognition
In May 2024, DeepSeek made headlines with its MoE-based open-source model, DeepSeek-V2, which delivered GPT-4-Turbo-level performance at just ¥1 per million input tokens, 1/70th the price of its competitor. This aggressive pricing forced major AI players like Zhipu, ByteDance, and Alibaba to slash their prices. Meanwhile, a new wave of restrictions on access to GPT models in China led many AI applications to turn to domestic models for the first time.
In July 2024, DeepSeek’s founder Liang Wenfeng addressed the price war, stating:
“We were surprised by how sensitive people are to pricing. We simply set our prices based on costs, ensuring we neither subsidize nor overcharge. The model is profitable even at this price.”
Unlike other companies relying on subsidies, DeepSeek maintained profitability even at rock-bottom prices. When asked whether this was a user acquisition strategy, Liang emphasized:
“That’s not our main goal. Our costs dropped due to innovations in next-gen model structures, so we passed the savings on. We believe AI and APIs should be affordable for everyone.”
DeepSeek’s vision-driven approach continued to unfold:
- July 4, 2024 – DeepSeek API extended context length to 128K without increasing costs, far exceeding GPT-3.5’s original 4K limit.
- July 10, 2024 – DeepSeekMath-7B was the model of choice for the top four teams in the first AI Mathematical Olympiad (AIMO) Progress Prize competition.
- July 18, 2024 – DeepSeek-V2 topped the open-source rankings in the Global LLM Arena (Chatbot Arena), surpassing LLaMA3-70B, Qwen2-72B, Nemotron-4-340B, and Gemma2-27B.
- July 26, 2024 – DeepSeek API introduced advanced features like Fill-in-the-Middle (FIM) completion, function calling, and structured JSON output, significantly improving coding capabilities (a hedged function-calling sketch follows this timeline).
- August 2, 2024 – DeepSeek slashed API costs with an innovative disk-based context caching mechanism, cutting the input price to ¥0.1 per million tokens on cache hits.
- August 16, 2024 – DeepSeek-Prover-V1.5 was released, surpassing multiple open-source models in theorem-proving benchmarks.
- September 6, 2024 – DeepSeek-V2.5 merged Chat and Code models, improving alignment with human preferences and excelling in writing and instruction-following tasks.
- September 18, 2024 – DeepSeek-V2.5 dominated domestic LLM rankings, setting new performance benchmarks.
- November 20, 2024 – DeepSeek-R1-Lite, a reasoning model comparable to OpenAI’s o1-preview, launched; it also served to generate high-quality synthetic data for future model training.
- December 10, 2024 – The final fine-tuned release of the series, DeepSeek-V2.5-1210, debuted, excelling in math, coding, writing, and role-playing tasks. DeepSeek’s web app also gained live internet search.
- December 13, 2024 – DeepSeek-VL2 multimodal MoE model was open-sourced, significantly enhancing vision capabilities across 3B, 16B, and 27B variants.
- December 26, 2024 – DeepSeek-V3 launched, with training costs estimated at just $5.5 million. The model rivaled leading proprietary AI systems while offering faster generation speeds. A 45-day promotional period made the new API more accessible.
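As a concrete illustration of the function calling added on July 26, 2024, here is a hedged sketch that assumes DeepSeek follows the OpenAI tool-calling conventions, consistent with the API compatibility noted earlier. The `get_weather` tool and its parameters are purely hypothetical.

```python
# Hedged sketch of function calling against the DeepSeek API. Assumes OpenAI-style
# tool calling on the "deepseek-chat" model; the get_weather tool is illustrative only.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments come back as structured JSON
# rather than free text, which is what makes agent-style integrations practical.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```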
2025: A New Era for DeepSeek
- January 15, 2025 – The official DeepSeek app launched on iOS and Android.
- January 20, 2025 – DeepSeek-R1, a reasoning model matching OpenAI’s o1 in performance, was open-sourced. DeepSeek also adopted the MIT License and explicitly allowed model distillation, further embracing open-source collaboration.
- January 27, 2025 – DeepSeek Janus-Pro, a multimodal model named after the Roman god Janus, was open-sourced, excelling in both visual understanding and image generation.
Global Recognition & Industry Impact
DeepSeek’s rapid ascent has drawn global attention. Even U.S. President Donald Trump acknowledged its rise as a “wake-up call” for America, while Microsoft CEO Satya Nadella and OpenAI CEO Sam Altman praised its technological breakthroughs.
The DeepSeek Phenomenon: A Chinese AI Miracle
In just two years, DeepSeek has transformed from an unknown startup into a dominant force in AI, proving that Chinese companies can compete at the highest levels of AI innovation.
Trump’s “wake-up call” and Anthropic’s underlying concerns only confirm China’s growing AI power: DeepSeek is not just riding the wave, it is reshaping its course.
4. Key Product Releases
- Nov 2, 2023 – DeepSeek Coder (Code Model)
- Nov 29, 2023 – DeepSeek LLM 67B (General Model)
- Dec 18, 2023 – DreamCraft3D (Text-to-3D Model)
- Jan 11, 2024 – DeepSeekMoE (MoE Model)
- Feb 5, 2024 – DeepSeekMath (Math Reasoning Model)
- Mar 11, 2024 – DeepSeek-VL (Multimodal Model)
- May 2024 – DeepSeek-V2 (MoE General Model)
- June 17, 2024 – DeepSeek Coder V2 (Code Model)
- Sept 6, 2024 – DeepSeek-V2.5 (Merged General & Code Model)
- Dec 13, 2024 – DeepSeek-VL2 (Multimodal MoE Model)
- Dec 26, 2024 – DeepSeek-V3 (Next-Gen General Model)
- Jan 15, 2025 – DeepSeek Official App (iOS & Android)
- Jan 20, 2025 – DeepSeek-R1 (Reasoning Model)
- Jan 27, 2025 – DeepSeek Janus-Pro (Multimodal Model)