
DeepSeek Founder Interview

China's AI Cannot Forever Follow; Someone Must Stand at the Technological Frontier.

Source: 暗涌 Waves, Translated by DeepSeek

With the release of the V3 open-source model, DeepSeek has once again captured global attention, this time going viral internationally.

The training cost of DeepSeek V3 is estimated at just one-eleventh that of Llama 3.1 405B, yet V3 outperforms it. On multiple benchmarks, DeepSeek V3 has achieved state-of-the-art (SOTA) results among open-source models, surpassing Llama 3.1 405B and competing head-to-head with top models such as GPT-4o and Claude 3.5 Sonnet. At the same time, it is priced lower than Claude 3.5 Haiku, at just 9% of Claude 3.5 Sonnet’s cost. Ranked 7th on the Chatbot Arena leaderboard, DeepSeek V3 is the only open-source model in the top ten, and it is released under the permissive MIT license.

In May 2024, DeepSeek rose to fame with the release of its open-source model DeepSeek V2, which offered unprecedented cost-performance and sparked a price war in China’s large model market. As the only company outside the tech giants with a stockpile of 10,000 A100 GPUs, DeepSeek has made many unconventional choices. It focuses on research and technology rather than consumer-facing applications or commercialization, and it has steadfastly pursued an open-source path without raising external funding.

How was DeepSeek built? Waves (暗涌), a publication under 36Kr, interviewed DeepSeek’s reclusive founder, Liang Wenfeng, in May 2023 and July 2024. A technical idealist, Liang is a rare voice in China’s tech scene: he prioritizes “right and wrong” over “profit and loss” and advocates for original innovation.

01 How Did the Price War Begin?

Waves: After the release of DeepSeek V2, a fierce price war erupted in the large model market. Some say you’ve become a disruptor.

Liang Wenfeng: We didn’t intend to be disruptors; it just happened.

Waves: Were you surprised by the outcome?

Liang Wenfeng: Very surprised. We didn’t expect pricing to be so sensitive. We simply followed our own pace and priced based on cost, aiming for modest profits without subsidies.

Waves: Five days later, Zhipu AI followed suit, then ByteDance, Alibaba, Baidu, and Tencent.

Liang Wenfeng: Zhipu AI only reduced prices for an entry-level product. ByteDance was the first to match our flagship model’s price, triggering a wave of reductions. Major companies, with higher costs, are now operating at a loss, reminiscent of the internet era’s subsidy wars.

Waves: Externally, price cuts seem like a user grab, typical of internet-era competition.

Liang Wenfeng: Grabbing users isn’t our goal. We reduced prices because our next-gen model structure lowered costs, and we believe APIs and AI should be affordable and accessible to all.

Waves: Why focus on model structure instead of copying Llama like others?

Liang Wenfeng: If the goal is applications, copying Llama makes sense. But our destination is AGI, requiring new model structures and foundational research to scale up efficiently.

02 The Real Gap is Between Originality and Imitation

Waves: Why did DeepSeek V2 surprise Silicon Valley?

Liang Wenfeng: In the U.S., such innovation is common. What’s surprising is a Chinese company contributing as an innovator, not just a follower.

Waves: But isn’t this approach too luxurious for China, given the heavy investment required?

Liang Wenfeng: Innovation is costly, but China’s economy and tech giants’ profits are substantial. What’s lacking isn’t capital but confidence and the ability to organize high-density talent for effective innovation.

Waves: Why do Chinese companies, including well-funded ones, prioritize quick commercialization?

Liang Wenfeng: For decades, we’ve emphasized profit over innovation. Innovation requires curiosity and creativity, not just a commercial drive.

03 DeepSeek’s Mission: Research and Exploration

Waves: Why did a quantitative fund like High-Flyer (幻方量化, a quantitative investment firm and an early DeepSeek backer) decide to venture into large models? What’s the rationale behind this move?

Liang Wenfeng: Actually, our work on large models has no direct connection to quantitative finance. We established a new company called DeepSeek specifically for this purpose. Many of High-Flyer’s core team members have backgrounds in artificial intelligence. Initially, we explored various scenarios and eventually settled on finance due to its complexity. We believe that artificial general intelligence (AGI) is one of the next great challenges, so for us, it’s more about how to do it rather than why.

Waves: Are you planning to train a general-purpose large model, or will you focus on a vertical-specific model, such as one tailored for finance?

Liang Wenfeng: Our goal is to develop AGI. Language models are likely a necessary step toward AGI, as they already exhibit some AGI-like characteristics. We’ll start with language models and later expand into areas like vision.

Waves: With major tech companies entering the field, many startups have abandoned the idea of focusing solely on general-purpose large models. Why haven’t you?

Liang Wenfeng: We won’t rush into developing applications based on our models. Our focus remains on the large models themselves.

Waves: Many believe that startups entering the field after major players have established dominance is no longer a good strategy. What’s your take?

Liang Wenfeng: At this point, neither major companies nor startups can easily establish a decisive technological advantage. With OpenAI leading the way and everyone building on publicly available papers and code, both large companies and startups will likely develop their own large language models by next year. Startups and incumbents each have their opportunities. While startups don’t currently control vertical-specific scenarios, these scenarios are fragmented and better suited for agile startup organizations.

In the long term, the barriers to applying large models will continue to lower, meaning startups will have opportunities to enter the field at any point in the next 20 years. Our goal is clear: we’re not focusing on verticals or applications but on research and exploration.

Waves: Why do you define your mission as “research and exploration”?

Liang Wenfeng: It’s driven by curiosity. On a broader level, we want to test certain hypotheses. For example, we believe that human intelligence may fundamentally be rooted in language—that thinking is essentially a linguistic process. What we perceive as thinking might just be the brain weaving language. This suggests that human-like AI (AGI) could emerge from language models. On a more immediate level, GPT-4 still holds many mysteries. While we aim to replicate it, we’re also conducting research to uncover its secrets.

Waves: But research comes with significantly higher costs.

Liang Wenfeng: If we were merely replicating existing work, we could rely on publicly available papers or open-source code, requiring minimal training or fine-tuning, which would be cost-effective. However, research involves extensive experimentation and comparison, demanding more computational resources and highly skilled personnel, which drives up costs.

Waves: Where does the funding for research come from?

Liang Wenfeng: High-Flyer is one of our investors and has an ample R&D budget. In addition, we have an annual donation budget of several hundred million yuan, which has traditionally gone to public welfare organizations. If needed, we can reallocate some of these funds.

Waves: But building foundational large models requires at least $200-300 million just to get started. How do you plan to sustain such ongoing investments?

Liang Wenfeng: We’re in talks with various potential investors. However, many VCs are hesitant about funding pure research. They have exit expectations and want to see quick product commercialization. Given our research-first approach, securing VC funding is challenging. That said, we already have computational resources and an engineering team, which gives us a solid foundation.

Waves: What kind of business model are you envisioning?

Liang Wenfeng: We’re considering open-sourcing most of our training results, which could align with commercialization efforts. We want to make large models accessible to everyone, even small app developers, at low costs, rather than having the technology monopolized by a few companies.

Waves: Major tech companies will also offer services later. What differentiates you from them?

Liang Wenfeng: Their models are likely to be tied to their platforms or ecosystems, whereas ours will remain completely free and open.

Waves: Regardless, it seems somewhat crazy for a commercial company to engage in such an open-ended, research-driven exploration.

Liang Wenfeng: If you’re looking for a purely commercial rationale, you won’t find one—it’s not cost-effective. From a business perspective, fundamental research has a low return on investment. When OpenAI’s early investors put money in, they weren’t thinking about returns; they genuinely wanted to advance the field. We’re confident that, given our capabilities and the timing, we’re among the best-suited to pursue this mission.

04 The Curiosity-Driven Reserve of 10,000 GPUs

Waves: GPUs have become a scarce resource in the ChatGPT-driven entrepreneurial wave. Yet, as early as 2021, you had the foresight to stockpile 10,000 GPUs. Why?

Liang Wenfeng: It was a gradual process, starting from a single GPU, then 100 in 2015, 1,000 in 2019, and eventually 10,000. Initially, we hosted our machines in third-party internet data centers (IDCs), but as the scale grew, hosting could no longer meet our needs, so we built our own data centers. Many might assume there’s some hidden business logic behind this, but in reality, it was primarily driven by curiosity.

Waves: What kind of curiosity?

Liang Wenfeng: Curiosity about the boundaries of AI capabilities. For outsiders, the ChatGPT wave was a massive shock, but for insiders, the real turning point was AlexNet in 2012, which revived neural network research after decades of dormancy. While specific technologies keep evolving, the underlying combination of models, data, and computing power remains constant. When OpenAI released GPT-3 in 2020, it became clear that massive computational power was essential. Yet even in 2021, when we invested in building the Fire-Flyer II cluster, most people still didn’t fully grasp this.

Waves: So, you’ve been focusing on computational power since 2012?

Liang Wenfeng: For researchers, the thirst for computing power is insatiable. After conducting small-scale experiments, we always want to scale up. Since then, we’ve consciously deployed as much computational power as possible.

Waves: Many assume that building such a computing cluster is for quantitative hedge funds using machine learning for price predictions.

Liang Wenfeng: If we were solely focused on quantitative investing, a few GPUs would suffice. Beyond investing, we conduct extensive research to understand what paradigms can fully describe financial markets, whether there are more concise representations, the limits of different paradigms, and their broader applicability.

Waves: But this process is also a money-burning endeavor.

Liang Wenfeng: Something exciting can’t always be measured in monetary terms. It’s like buying a piano for your home—you can afford it, and there’s a group of people eager to play it.

Waves: GPUs typically depreciate at a rate of 20% annually.

Liang Wenfeng: We haven’t calculated it precisely, but it’s likely less. Nvidia GPUs are like hard currency—even older models are still widely used. When we retired some older GPUs, they still held significant resale value, so we didn’t lose much.

Waves: Building a computing cluster also incurs substantial maintenance, labor, and electricity costs.

Liang Wenfeng: Electricity and maintenance costs are actually quite low: each year they amount to only about 1% of the hardware cost. Labor costs are higher, but they’re an investment in the future and the company’s greatest asset. We select people who are relatively down-to-earth, curious, and eager to conduct research here.

Waves: In 2021, High-Flyer was among the first in the Asia-Pacific region to acquire A100 GPUs. Why were you ahead of some cloud providers?

Liang Wenfeng: We had been researching, testing, and planning for new GPUs well in advance. As for cloud providers, their demand was fragmented until 2022, when autonomous driving created a need for rented training machines with paying customers, prompting them to build the necessary infrastructure. Large companies rarely focus solely on research or training—they’re more driven by business needs.

Waves: How do you view the competitive landscape of large models?

Liang Wenfeng: Major companies certainly have advantages, but if they can’t quickly apply their models, they may not sustain their efforts, as they need to see results. Leading startups also have solid technical foundations, but like the previous wave of AI startups, they face commercialization challenges.

Waves: Some might think a quantitative fund emphasizing AI is just creating hype for other businesses.

Liang Wenfeng: Actually, our quantitative fund has largely stopped raising external capital.

Waves: How do you distinguish between true AI believers and opportunists?

Liang Wenfeng: True believers were here before and will remain here. They’re more likely to buy GPUs in bulk or sign long-term agreements with cloud providers rather than renting short-term.

05 DeepSeek’s Talent: Homegrown Innovators

Waves: Jack Clark, former policy director at OpenAI and co-founder of Anthropic, described DeepSeek as having hired “a group of enigmatic geniuses.” What kind of team built DeepSeek V2?

Liang Wenfeng: There are no enigmatic geniuses here. The team consists of fresh graduates from top universities, PhD interns in their fourth or fifth year, and young researchers who graduated just a few years ago.

Waves: Many large model companies are fixated on recruiting talent from overseas. Some believe that the top 50 talents in this field are not in China. Where does your team come from?

Liang Wenfeng: The V2 model was developed entirely by local talent, with no returnees from overseas. While the top 50 talents might not be in China, perhaps we can cultivate such individuals ourselves.

Waves: How did the MLA innovation come about? I heard the idea originated from a young researcher’s personal interest?

High-Flyer proposed a novel MLA (Multi-head Latent Attention) architecture, reducing GPU memory usage to 5%-13% of that of the previously dominant MHA (Multi-Head Attention) architecture.

Liang Wenfeng: After studying how mainstream attention architectures had evolved, he came up with the idea of designing an alternative. However, turning the idea into reality was a lengthy process. We formed a dedicated team and spent several months making it work.

Waves: The birth of such divergent ideas seems closely tied to your entirely innovation-driven organizational structure. Even during the High-Flyer era, you rarely assigned tasks or goals top-down. But for AGI, an uncertain frontier, does this require more management intervention?

Liang Wenfeng: DeepSeek operates entirely bottom-up as well. We don’t pre-assign roles; instead, roles emerge naturally. Everyone has unique experiences and comes with their own ideas—they don’t need to be pushed. During exploration, when someone encounters a problem, they naturally gather others to discuss it. However, when an idea shows potential, we do allocate resources top-down.

Waves: I’ve heard that DeepSeek is very flexible in allocating GPUs and personnel.

Liang Wenfeng: There are no limits on how individuals can allocate GPUs or personnel. If someone has an idea, they can immediately access the training cluster’s GPUs without approval. Since there are no hierarchies or cross-departmental barriers, they can also freely involve anyone, as long as the other person is interested.

Waves: This loose management style also depends on having a team driven by strong passion. I’ve heard you’re skilled at identifying talent through unconventional metrics, allowing exceptional individuals who don’t fit traditional evaluation criteria to stand out.

Liang Wenfeng: Our hiring criteria have always been passion and curiosity. That’s why many team members have unique and interesting backgrounds. For many, the desire to conduct research far outweighs their concern for money.

Waves: The Transformer was born at Google’s AI Lab, and ChatGPT at OpenAI. What do you think is the difference in the value of innovation between a large company’s AI Lab and a startup?

Liang Wenfeng: Whether it’s Google’s lab, OpenAI, or even the AI Labs of major Chinese companies, they all hold significant value. The fact that OpenAI succeeded also has an element of historical serendipity.

06 Innovation is Self-Generated, Not Taught

Waves: To a large extent, is innovation also a matter of chance? I noticed that the row of meeting rooms in your office has doors on both sides that can be easily pushed open. Your colleagues say this is to leave room for serendipity. The story of the Transformer’s creation involves someone who happened to overhear a conversation, joined in, and eventually helped turn it into a universal framework.

Liang Wenfeng: I believe innovation is, first and foremost, a matter of belief. Why is Silicon Valley so innovative? It’s because they dare to take risks. When ChatGPT emerged, the entire domestic ecosystem lacked confidence in pursuing cutting-edge innovation. From investors to major companies, everyone felt the gap was too wide and opted to focus on applications instead. But innovation requires confidence first. This kind of confidence is often more evident in young people.

Waves: But you don’t participate in fundraising and rarely make public statements. In terms of social visibility, you’re certainly less prominent than companies actively raising funds. How do you ensure DeepSeek is the top choice for those working on large models?

Liang Wenfeng: Because we’re tackling the hardest problems. What attracts top talent the most is the opportunity to solve the world’s most challenging problems. In fact, top talent in China is undervalued. There’s too little hardcore innovation at the societal level, so they rarely get the chance to stand out. By working on the most difficult problems, we naturally attract them.

Waves: Recently, OpenAI’s release didn’t include GPT-5, leading many to believe the technology curve is slowing down. Many are also starting to question the Scaling Law. What’s your take?

Liang Wenfeng: We’re more optimistic. The industry seems to be progressing as expected. OpenAI isn’t infallible; they can’t always lead the way.

Waves: How long do you think it will take to achieve AGI? Before releasing DeepSeek V2, you released models for code generation and mathematics, and you switched from dense models to MoE. What are the milestones on your AGI roadmap?

Liang Wenfeng: It could be 2 years, 5 years, or 10 years—but it will definitely happen within our lifetime. As for the roadmap, even within our company, there’s no consensus. However, we’re betting on three directions: first, mathematics and code; second, multimodal capabilities; and third, natural language itself. Mathematics and code are natural testing grounds for AGI, similar to Go—a closed, verifiable system where high intelligence might be achieved through self-learning. On the other hand, multimodal learning and engaging with the real world may also be necessary for AGI. We remain open to all possibilities.

Waves: What do you think the endgame for large models will look like?

Liang Wenfeng: There will be specialized companies providing foundational models and services, with a long chain of professional divisions. More people will build on top of these to meet society’s diverse needs.

Waves: Over the past year, there have been many changes in China’s large model ecosystem. For example, Wang Huiwen, who was very active at the beginning of last year, exited midway, and the companies that joined later have started to differentiate themselves.

Liang Wenfeng: Wang Huiwen bore all the losses himself, allowing others to exit unscathed. He made a choice that was most unfavorable to himself but beneficial to everyone else. I admire his integrity.

Waves: Where are you focusing most of your energy now?

Liang Wenfeng: Most of my energy is spent researching the next generation of large models. There are still many unsolved problems.

Waves: Other large model startups insist on balancing both research and commercialization. After all, technological advantages don’t last forever, and it’s important to capitalize on the window of opportunity to turn technological strengths into products. Is DeepSeek’s focus on model research due to insufficient model capabilities?

Liang Wenfeng: All the old playbooks are products of the previous generation and may not hold in the future. Applying internet business logic to discuss the profit models of future AI is like discussing General Electric and Coca-Cola when Pony Ma was starting Tencent. It’s likely a case of “marking the boat to find a lost sword” (刻舟求剑): relying on old reference points in a situation that has already moved on.

Waves: High-Flyer has always had a strong technological and innovative DNA, and its growth has been relatively smooth. Is this why you’re more optimistic?

Liang Wenfeng: High-Flyer has, to some extent, strengthened our confidence in technology-driven innovation, but it hasn’t been all smooth sailing. We’ve gone through a long accumulation process. What the outside world sees is High-Flyer post-2015, but we’ve actually been at it for 16 years.

Waves: Returning to the topic of original innovation. With the economy entering a downturn and capital entering a cold cycle, will this further suppress original innovation?

Liang Wenfeng: I don’t think so. The adjustment of China’s industrial structure will rely more on hardcore technological innovation. When people realize that the quick money of the past was largely due to luck, they’ll be more willing to roll up their sleeves and engage in real innovation.

Waves: So you’re optimistic about this as well?

Liang Wenfeng: I grew up in the 1980s in a fifth-tier city in Guangdong. My father was an elementary school teacher. In the 1990s, there were many money-making opportunities in Guangdong, and many parents came to our house saying that education was useless. Looking back now, their views have changed, because making money has become harder; even opportunities like driving a taxi have dried up. Within one generation, the mindset has shifted.

Hardcore innovation will become more common in the future. It might not be easily understood now because society as a whole needs to be educated by facts. When society rewards those who engage in hardcore innovation with success and fame, collective attitudes will change. We just need more examples and time.

07 The Future of DeepSeek and AI

Waves: DeepSeek currently exudes an early OpenAI-style idealism and is open-source. Do you plan to close-source in the future? Both OpenAI and Mistral transitioned from open-source to closed-source at some point.

Liang Wenfeng: We will not close-source. We believe that building a robust technical ecosystem is more important at this stage.

Waves: Do you have any plans for fundraising? Media reports suggest that High-Flyer intends to spin off DeepSeek for an independent IPO. Ultimately, many AI startups in Silicon Valley have found it unavoidable to align with major companies.

Liang Wenfeng: We don’t have any fundraising plans in the short term. Money has never been our problem; the issue we face is the embargo on high-end chips.

Waves: Many people believe that building AGI is entirely different from doing quantitative trading. Quant trading can be done quietly, but AGI seems to require more fanfare and alliances to amplify investment.

Liang Wenfeng: Greater investment does not necessarily lead to greater innovation. Otherwise, major companies would monopolize all innovation.

Waves: You’re not focusing on applications right now—is it because your team lacks operational expertise?

Liang Wenfeng: We believe the current stage is an explosion of technological innovation, not application. In the long term, we hope to build an ecosystem where the industry directly uses our technology and outputs. We focus solely on foundational models and cutting-edge innovation, while other companies can build B2B or B2C businesses on DeepSeek’s foundation. If a complete industry value chain can be formed, there’s no need for us to develop applications ourselves. Of course, we could do applications if needed, but research and innovation will always be our top priority.

Waves: But if companies choose APIs, why would they choose DeepSeek over major corporations?

Liang Wenfeng: The future will likely lean toward specialized divisions of labor. Foundational large models require continuous innovation. Major companies have their limits and are not necessarily the best fit.

Waves: Can technology alone create a competitive edge? After all, you’ve said there are no absolute technical secrets.

Liang Wenfeng: Technology has no secrets, but rebuilding it from scratch takes time and money. In theory, NVIDIA’s GPUs hold no technical secrets and are easy to replicate, but assembling a team and catching up with the next generation of technology takes time. In practice, this creates a wide moat.

Waves: After your price cuts, ByteDance was quick to follow, suggesting they felt some level of threat. What’s your take on startups finding new ways to compete with major corporations?

Liang Wenfeng: To be honest, we don’t really care about that. It was just something we did along the way. Providing cloud services isn’t our main goal—our goal remains achieving AGI.

So far, we haven’t seen any new approaches, but major corporations don’t have a clear advantage either. They have existing user bases, but their cash flow businesses are also burdensome, making them vulnerable to disruption.

Waves: How do you see the endgame for the other six large-model startups besides DeepSeek?

Liang Wenfeng: Two to three of them may survive. Everyone is still in the money-burning phase right now, so companies with clearer self-positioning and more refined operations have a better chance. Others might transform completely. Valuable innovations won’t disappear but will re-emerge in different forms.

Waves: During the High-Flyer era, your competitive stance was often described as “doing your own thing” and rarely engaging in horizontal comparisons. What’s your foundational thinking about competition?

Liang Wenfeng: I often think about whether something can improve society’s operational efficiency and whether you can find a position in the industrial chain where you excel. As long as the endgame increases societal efficiency, it’s valid. Many things along the way are transitional; over-focusing on them inevitably leads to confusion.

08 Innovation Cannot Be Taught

Waves: How is DeepSeek’s recruitment progressing?

Liang Wenfeng: The initial team is already in place. In the early stages, since we were short-staffed, we temporarily borrowed some people from High-Flyer. When ChatGPT (GPT-3.5) took off at the end of last year, we had already started recruiting. However, we still need more people to join.

Waves: Talent for large-model startups is scarce. Some investors say that many of the most suitable candidates can only be found in AI labs at giants like OpenAI or Facebook AI Research. Will you seek out such talent overseas?

Liang Wenfeng: If you’re chasing short-term goals, it makes sense to hire people with existing experience. But if you’re looking long-term, experience becomes less critical—basic abilities, creativity, and passion matter more. From this perspective, there are plenty of suitable candidates in China.

Waves: Why isn’t experience that important?

Liang Wenfeng: You don’t necessarily need someone who has done something before to do it again. One principle we follow when hiring at High-Flyer is to prioritize ability over experience. For our core technical roles, most of the team consists of recent graduates or those with only one or two years of experience.

Waves: In innovative fields, do you think experience can be a hindrance?

Liang Wenfeng: When tackling a problem, experienced people may instinctively tell you how it should be done. In contrast, those without experience will explore and think deeply about how to solve the problem, ultimately finding a solution that suits the current situation.

Waves: High-Flyer transitioned from being complete outsiders to becoming a leader in the financial sector within a few years. Was this hiring philosophy one of the secrets to that success?

Liang Wenfeng: Our core team—including myself—had no prior experience in quantitative finance, which is quite unique. I wouldn’t call it a secret to success, but it is a part of High-Flyer’s culture. We don’t intentionally avoid hiring experienced people, but we focus more on their abilities.

Take our sales team as an example. Our two key salespeople came from unrelated industries. One had previously worked in foreign trade for German machinery, and the other was a backend developer at a brokerage firm. When they entered this field, they had no experience, no resources, and no connections.

Today, we might be the only large private fund manager that relies primarily on direct sales. Direct sales mean no fees paid to intermediaries, which yields higher profit margins at the same scale and performance. Many firms have tried to imitate us but haven’t succeeded.

Waves: Why haven’t others succeeded in copying your approach?

Liang Wenfeng: Because copying just one aspect isn’t enough for innovation; it needs to align with the company’s culture and management style. In fact, during their first year, our two salespeople brought in no business at all, and they only started showing results in the second year. But our evaluation standards are different from those of typical companies. We don’t have KPIs or fixed “targets.”

Waves: So, what are your evaluation standards?

Liang Wenfeng: Unlike most companies that prioritize client order volumes, we encourage our sales team to build their own networks, meet more people, and expand their influence. A trustworthy and upright salesperson might not get clients to place orders immediately, but they can make clients feel they’re reliable.

Waves: Once you’ve hired the right people, how do you get them up to speed?

Liang Wenfeng: Assign them important tasks and don’t interfere. Let them figure things out and perform. A company’s DNA is very hard to imitate. For instance, if you hire inexperienced people, how do you assess their potential? And once they’re onboard, how do you help them grow? These are not things that can be directly copied.

Waves: What do you think are the necessary conditions for building an innovative organization?

Liang Wenfeng: Our conclusion is that innovation requires as little interference and management as possible, giving people the freedom to explore and make mistakes. Innovation often emerges naturally—it’s not something you can force or teach.

Waves: This is quite an unconventional management style. How do you ensure that people work effectively and stay aligned with your goals under this approach?

Liang Wenfeng: We ensure alignment by hiring people who share our values, and we maintain cohesion through company culture. That said, we don’t have a formalized company culture, because anything formalized can also hinder innovation. Most of the time, it’s about leading by example. How managers make decisions in specific situations becomes an unwritten guideline.

Waves: In this wave of large-model competition, do you think startups’ innovative organizational structures could be the key to competing against major corporations?

Liang Wenfeng: If you follow textbook methodologies, startups doing what we do would seem doomed to fail. But the market is constantly changing. The real determinant isn’t existing rules or conditions—it’s the ability to adapt and respond to change. Many big corporations’ organizational structures can no longer support rapid decision-making and execution. Their prior experience and inertia can become constraints. In this new AI wave, there will undoubtedly be a batch of new companies rising.

Waves: What excites you most about doing this?

Liang Wenfeng: Figuring out whether our hypotheses are correct. If they are, it’s incredibly thrilling.

Waves: What are the non-negotiable criteria for this wave of hiring for large-model development?

Liang Wenfeng: Passion and solid fundamental abilities. Nothing else is as important.

Waves: Are these people easy to find?

Liang Wenfeng: Their enthusiasm usually reveals itself—they genuinely want to do this. These are the kinds of people who often end up finding you as well.

Waves: Developing large models can require endless investment. Does the potential cost give you pause?

Liang Wenfeng: Innovation is inherently expensive and inefficient, often accompanied by waste. That’s why innovation can only emerge once an economy reaches a certain level of development. When resources are scarce, or in industries not driven by innovation, cost and efficiency are paramount. Look at OpenAI—they burned through a lot of money before achieving results.

Waves: Do you feel like you’re doing something crazy?

Liang Wenfeng: I’m not sure if it’s crazy, but there are many things in this world that can’t be explained purely by logic. For example, many programmers are passionate contributors to open-source communities—they work hard all day and still find the energy to contribute code.

Waves: There’s a sense of spiritual reward in that.

Liang Wenfeng: It’s like hiking 50 kilometers—your body is exhausted, but your spirit feels fulfilled.

Waves: Do you think curiosity-driven “craziness” can be sustained?

Liang Wenfeng: Not everyone can stay “crazy” for a lifetime, but most people, in their younger years, can dedicate themselves fully and passionately to something without any ulterior motives.