How did DeepSeek build artificial intelligence with less money?
Last month, US financial markets tumbled after a Chinese startup called DeepSeek said it had built one of the world’s most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.
AI companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.
As DeepSeek’s engineers detailed in a research paper published just after Christmas, the startup used a number of technological tricks to significantly reduce the cost of building its system. Its engineers needed only about $6 million in raw computing power, roughly one-tenth of what Meta spent building its latest AI technology.
What exactly did DeepSeek do? Here is a guide.
How are AI technologies built?
The leading AI technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analyzing enormous amounts of data.
The most powerful systems spend months analyzing just about all the English text on the internet, as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.
About 15 years ago, AI researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powers neural networks.
As companies packed more GPUs into their computer data centers, their AI systems could analyze more data.
But the best GPUs cost around $40,000, and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.
How did DeepSeek reduce costs?
It did many things. Most notably, it embraced a method called “mixture of experts.”
Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.
If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.
With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could concentrate on its particular field.
Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.
The experts still needed to trade some information with one another, and the generalist, which had a decent but not detailed understanding of each subject, could coordinate the interactions between the experts.
It is a bit like an editor overseeing a newsroom full of specialist reporters.
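To make the idea concrete, here is a minimal sketch in Python with NumPy of how a mixture-of-experts layer can work. Everything in it, the sizes, the router, the toy experts, is an illustrative assumption rather than DeepSeek’s actual design: a small “router” scores the experts, only the best two actually run, and a generalist network (the “editor”) always runs alongside them.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_EXPERTS, TOP_K = 16, 8, 2          # toy sizes chosen for illustration

# Each "expert" is a small network; here just one weight matrix apiece.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(N_EXPERTS)]
generalist = rng.normal(size=(DIM, DIM))  # always runs, like the "editor"
router = rng.normal(size=(DIM, N_EXPERTS))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Send the input to the generalist plus only its top-k experts."""
    scores = x @ router                    # how relevant is each expert?
    top = np.argsort(scores)[-TOP_K:]      # keep the k best-matching experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                           # softmax over the chosen experts
    expert_out = sum(wi * (x @ experts[i]) for wi, i in zip(w, top))
    return expert_out + x @ generalist     # the other six experts never run

print(moe_layer(rng.normal(size=DIM)).shape)   # (16,)
```

Because only two of the eight experts run for any given input, most of the multiplication, and most of the data traveling between chips, is simply skipped.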
Is that more efficient?
Much more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers elementary school math class can understand.
There is math involved in this?
Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979 …
You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimate of a circle’s circumference.
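A quick sketch in Python makes the point (the radius here is an arbitrary example):

```python
import math

radius = 10.0
exact = 2 * math.pi * radius   # circumference using full-precision pi
rough = 2 * 3.14 * radius      # circumference using pi cut to two decimals

print(exact)   # 62.83185307179586
print(rough)   # about 62.8, off by only about 0.05 percent
```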
DeepSeek did something similar, but on a much larger scale, in training its AI technology.
The math that allows a neural network to identify patterns in text is really just multiplication, lots and lots and lots of multiplication. We are talking months of multiplication across thousands of computer chips.
Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory, half the space. In essence, it lopped several decimals from each number.
This meant that each calculation was less precise. But that did not matter. The calculations were precise enough to produce a remarkably powerful neural network.
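DeepSeek’s paper describes training with an 8-bit floating-point format, but the underlying trade-off is easier to see with a simpler scheme. The hypothetical NumPy sketch below squeezes 32-bit values into 8-bit integers plus one shared scale factor: each number drops to a single byte and loses a few decimals, yet stays close to the original.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)    # made-up example data

# Squeeze every value into 8 bits plus one shared scale factor.
scale = np.abs(weights).max() / 127.0
squeezed = np.round(weights / scale).astype(np.int8)  # 1 byte per number
restored = squeezed.astype(np.float32) * scale        # decode for comparison

print(weights[:3])                       # original 32-bit values
print(restored[:3])                      # 8-bit versions: a few decimals gone
print(np.abs(weights - restored).max())  # worst-case rounding error is small
```

Fewer bits per number means more numbers fit on each chip and less data has to travel between chips.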
That’s all?
Well, they added another trick.
After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem, a key calculation that helps decide how the neural network operates, it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. That made the answer more precise.
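Here is a small illustrative sketch of that idea, again an assumption for demonstration rather than DeepSeek’s actual code: the inputs stay in low precision (16 bits here, since NumPy has no 8-bit float), but the running total of the multiplications is kept in 32 bits.

```python
import numpy as np

rng = np.random.default_rng(1)
# Inputs stored in low precision to save memory.
a = rng.normal(size=50_000).astype(np.float16)
b = rng.normal(size=50_000).astype(np.float16)
products = a * b                          # the multiplications themselves

# Keeping the running total in 16 bits lets rounding error pile up.
narrow = np.float16(0.0)
for p in products:
    narrow = np.float16(narrow + p)

# Stretching the total across 32 bits keeps the answer far more precise.
wide = np.float32(0.0)
for p in products:
    wide = np.float32(wide + np.float32(p))

# High-precision yardstick for the same products.
reference = products.astype(np.float64).sum()
print(float(narrow), float(wide), float(reference))
```

The narrow 16-bit total drifts away from the true answer as rounding errors accumulate; the 32-bit total stays very close to it.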
So any high school student could have done this?
Well, no. In their paper, the DeepSeek engineers showed that they were also very good at writing the highly complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.
Few people have that kind of skill. But serious AI labs have the talented engineers needed to match what DeepSeek did.
Then why didn’t they do this already?
Some AI labs may already be using at least some of the same tricks. Companies like OpenAI do not always reveal what they are doing behind closed doors.
But others were clearly surprised by what DeepSeek did. Doing what the startup did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars, if not billions, in electrical power.
In other words, it requires enormous amounts of risk.
“You have to put a lot of money on the line to try new things, and often, they fail,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specializes in building efficient AI systems and previously worked as an AI researcher at Meta.
“That is why we don’t see much innovation: People are afraid to lose many millions just to try something that doesn’t work,” he added.
Many critics have pointed out that DeepSeek’s $6 million covered only what the startup spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.
DeepSeek experimented, and it paid off. Now, because the Chinese startup has shared its methods with other AI researchers, its technological tricks are poised to significantly reduce the cost of building AI.