DeepSeek R2 expected to release before May
Company behind popular Chinese LLM looking to capitalise on success
DeepSeek is expected to launch its second model, R2, before May, ahead of its original schedule, following the splash created by DeepSeek R1 earlier this year.
The Chinese start-up had pencilled in early May for the release of its updated model, but is now rushing to complete the work.
According to Reuters, citing a number of anonymous sources, R2 is expected to produce better code and to reason in languages other than English. However, no firm date has yet been pinned down.
The first public release of DeepSeek caused a stir earlier this year not just because of its popularity, but because of the vastly lower cost of training and running the app compared to more mature LLMs. The organisation claimed that its final training run cost just $5.6 million. That compares with the $5 billion loss that AI pioneer OpenAI posted in 2024 on revenues of $3.7 billion – losses that are expected to ratchet up to $44 billion before the company expects to achieve profitability in 2029.
DeepSeek was able to train its models for a fraction of the cost of rivals by, it claimed, implementing a number of low-level code optimisations to ensure that the application could run efficiently on older Nvidia hardware, after US sanctions took effect in 2023.
According to Anthropic CEO Dario Amodei, the cost of training an LLM normally ranges between $100 million and $1 billion, depending on the purpose, while GPT-4 cost more than $100 million, according to OpenAI CEO Sam Altman. Training also requires the ingestion of vast quantities of data, typically scraped from the internet, with or without the consent of the owner or creator.
DeepSeek, according to Reuters’ research in China, is run more like a research lab than a profit-making organisation, with a flat hierarchical structure.
It was founded by Liang Wenfeng, an engineering graduate who became a billionaire via his hedge fund High-Flyer. The company operates from offices in Beijing within walking distance of both Tsinghua University and Peking University, two of China's most prestigious institutions of higher education.
The hedge fund took an early interest in AI as a means of fine-tuning its trading, re-investing 70 per cent of revenue into AI research. The company built two supercomputers using pre-sanctions Nvidia A100 chips in 2020 and 2021 for the purpose of training AI models.
However, it has been claimed that in addition to the code optimisations, DeepSeek took some other shortcuts. OpenAI has accused it of stealing its intellectual property via distillation, a process that transfers knowledge from a larger model to a smaller one. This enables models to be trained in a fraction of the time (and cost) it usually takes.
OpenAI has not provided the evidence to back up this claim.
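Whatever the merits of OpenAI's claim, distillation itself is a well-documented technique: a smaller "student" model is trained to match the temperature-softened output distribution of a larger "teacher", rather than learning only from hard labels. The sketch below shows the classic distillation loss in plain Python with NumPy; it is purely illustrative and does not represent DeepSeek's or OpenAI's actual code.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw model logits to probabilities; a higher temperature
    flattens the distribution, exposing the teacher's 'dark knowledge'
    about which wrong answers are nearly right."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student
    distributions, averaged over the batch. Scaling by temperature**2
    keeps gradient magnitudes comparable across temperature choices."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl)) * temperature ** 2
```

In a real training loop the student's parameters would be updated by gradient descent to minimise this loss over the teacher's outputs on a large corpus, which is far cheaper than training the student from scratch on raw data.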
Furthermore, a New York-based security company found sensitive data from DeepSeek unsecured on the open internet. This data included the origin of log requests, containing chat history, API keys, directory structures and chatbot metadata logs.