Discussion about this post

Rick Greenwald:

Hi Tom -

A couple of thoughts on what appears to be a disconnect between DeepSeek and other AI technology. I may post this on my Substack later today. I welcome opinions on the following.

-----------------------------

I am not familiar with AI technology in any depth, and turning the idea I am proposing from an analogy into an insight would require that depth. But I think there may be a way to explain the large cost differential between what is going on at OpenAI and similar operations and what is going on at DeepSeek.

I am not deep in AI, but I am in another technology, database management. There are two scenarios, off the top of my head, that could display a similar disparity in costs and complexity, but only to people who do not fully understand the requirements in those scenarios. You know, like journalists or investors.

The first is protecting data integrity in OLTP systems at any kind of scale: making sure that individual write operations do not corrupt the results of other concurrent write operations. If your system does few writes, or none at all, you don't have to worry about whether your DBMS handles this very tricky area well. If you have a significant amount of write activity, and, especially, if you scale the number of people doing write operations, the situation grows very complex very fast. And the only way to address the issues this raises is to take their implications into account from the very beginning of writing the code for the DBMS. (You can't "fix" lost data integrity after it is gone.) Including these considerations as you build the DBMS makes that building much more expensive and time consuming, both in development and in testing.

If you just want to skip that part, the DBMS can be much simpler. And cheaper to develop, test and maintain. But if you try to scale this type of DBMS for writes, things will start to go bad. (The good news is that you may not notice how bad they went until much later. Wait, is that actually good news?)
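The hazard described above can be sketched in a few lines of Python. This is a toy illustration of the classic lost-update anomaly, not any real DBMS: two writers each do a read-modify-write on the same value, and without coordination one write can silently overwrite the other.

```python
import threading

# Toy model of a single "row" updated by concurrent writers.
balance = {"value": 0}
lock = threading.Lock()

def deposit_unsafe(n):
    # The "simpler, cheaper" DBMS: no coordination between writers.
    for _ in range(n):
        v = balance["value"]       # read
        balance["value"] = v + 1   # write -- if another thread deposited in
                                   # between, that deposit is silently lost

def deposit_safe(n):
    # The "expensive" part: coordination around every write.
    for _ in range(n):
        with lock:
            balance["value"] += 1

# With coordination, every write survives regardless of how many writers run.
threads = [threading.Thread(target=deposit_safe, args=(100_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance["value"])  # 200000 -- deposit_unsafe gives no such guarantee
```

Under load, the unsafe version loses updates nondeterministically, which is exactly why you may not notice until much later.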

Another similar area is high availability. The capability for providing high availability is typically measured with two metrics - Recovery Time Objective (RTO) and Recovery Point Objective (RPO). A low RTO means your system will not be down for very long if you have to recover it. A low RPO means that you will not lose much data if the system crashes.

You can get some availability pretty easily. Heck, simple backups will give you the ability to recover your data, although the RTO could be measured in hours. If you can accept larger amounts of lost data, a higher-RPO approach can also be pretty basic.

But if you are talking about near-zero RTO and RPO, it's really hard. And, again, you are talking about considerations that must be taken into account throughout the development of the DBMS and carefully implemented, frequently at multiple times the cost of a more tolerant system. And, again, there is no shortcut for this.
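To make the trade-off concrete, here is some back-of-the-envelope RPO/RTO arithmetic (the numbers are illustrative, not from any real system): with periodic backups, worst-case data loss equals the backup interval, and recovery time is bounded by how fast you can restore the data set.

```python
def worst_case_rpo_minutes(backup_interval_minutes: float) -> float:
    # A crash just before the next backup loses everything since the last one,
    # so worst-case data loss equals the backup interval.
    return backup_interval_minutes

def restore_rto_minutes(dataset_gb: float, restore_gb_per_min: float) -> float:
    # Time to restore from backup at a given restore throughput.
    return dataset_gb / restore_gb_per_min

# Nightly backups: up to a full day of lost work...
print(worst_case_rpo_minutes(24 * 60))   # 1440.0 minutes
# ...and restoring 500 GB at 2 GB/min keeps you down for hours.
print(restore_rto_minutes(500, 2))       # 250.0 minutes
```

Driving both numbers toward zero means replacing this backup-and-restore model entirely with replication and failover built into the system from the start, which is where the cost multiplier comes from.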

I don’t know if AI has similar challenges in implementation. Since AI is primarily a data read process, these particular things may not matter in the use of AI products. But they could matter in the development and training of the systems, or there could be other areas with similar characteristics.

Remember, from the standpoint of using a DBMS, there is no visible difference between a system that can handle high-volume write operations or deliver very low RPOs and RTOs and one that cannot. From the outside, they look the same. But when problems come up, the systems are dramatically different in how they adjust and react. If there is something similar in AI technology, it makes perfect sense that the simpler system is much cheaper.

Paolo Magrassi:

Brian Merchant weaves a large canvas with a thin thread. It reminds me of February 2023, when Google Bard gave a couple of silly answers in an online demo to the press and Alphabet lost about $100 billion in market value in a day. Furthermore, we've seen plenty of LLM benchmarks that were subverted within two weeks.

Having said that, the idea that Deep Learning just depends on chips is as silly as believing that human ingenuity (algorithms and architectures) has no role in deep learning and AI in general.
