It's a matter of time before artificial general intelligence is achieved: Jeff Denworth, VAST Data

Jeff Denworth explains that as AI models, like generative AI, become more advanced, they require tremendous amounts of data for training, which is both a technological and financial challenge.
Muqbil Ahmar
  • Updated On May 9, 2024 at 03:30 PM IST
Jeff Denworth, Co-founder at VAST Data

As they say, data is the new oil. Organizations today need a robust data infrastructure in order to unlock the value of emergent technologies such as analytics and AI. Only such an infrastructure can help data-intensive organizations confront complex real-world challenges, and it calls for a modern data storage system that can cater to the requirements of these technologies. ETCIO caught up with Jeff Denworth, Co-founder at VAST Data, for a free-wheeling chat on how people are building new mechanisms to capture data and figuring out ways to extract value from it. Some excerpts:

There is a lot of buzz around AI currently, particularly after the advent of generative AI. How has it changed the data paradigm?

The interesting thing about generative AI is that you now have machines generating new information in the form of a response, for example, when you put a prompt into ChatGPT. There's a service that OpenAI is about to offer called Sora. You could write some text, and a five-minute video can come out of just a few kilobytes of input that you provide the system. You start with five kilobytes of an idea and you end up with ten gigabytes of output, an increase of several orders of magnitude in the amount of data being created by machines. Jensen Huang, the CEO of NVIDIA, says that in the future, you won't need to store things when you have machines that know everything and are intelligent; you can generate an answer on the fly, as you search for an answer to a question. These are, therefore, very interesting times as far as AI is concerned.
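To put that amplification in perspective, here is a quick back-of-the-envelope calculation using the five-kilobyte and ten-gigabyte figures Denworth quotes; the exact ratio will vary with video length, resolution and codec, so treat it as an order-of-magnitude sketch rather than a precise figure.

```python
# Rough check of the prompt-to-output amplification described above.
# The 5 KB and 10 GB figures are taken from the interview; real sizes depend
# on video length, resolution and codec, so this is only an order-of-magnitude sketch.
prompt_bytes = 5 * 1024          # ~5 KB of text prompt
output_bytes = 10 * 1024**3      # ~10 GB of generated video

amplification = output_bytes / prompt_bytes
print(f"Amplification factor: ~{amplification:,.0f}x")  # roughly two million times
```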

Another important aspect to understand is that these models are only trained periodically. For example, the free version of ChatGPT has been trained on data only up to 2022. A training run typically requires an investment of a few billion dollars and is therefore difficult to carry out at short intervals. So, the moment GenAI models are created and released, they are no longer current with what's happening in the world.

Thus, there's another capability entering the market called retrieval-augmented generation, also referred to as RAG. This allows you to deploy an AI model so that people can use it, and when the model doesn't have a current answer, it can retrieve one from an external system at the time you ask the question. From a data perspective, things are going to get crazy, because these models are getting so intelligent that it's hard to distinguish whether you're interacting with a human or a robot. They're becoming, in many cases, smarter than humans, particularly in difficult subjects that require tons of studying. You can train a model within a month, while a person might take 10 years to reach that level of capability.
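For readers who want to see the pattern concretely, the sketch below shows the retrieve-then-generate flow in plain Python. The embed() and generate() functions are illustrative stand-ins, not any vendor's API, and the toy "documents" simply reuse facts from this interview; a production RAG system would use a real embedding model, a vector database and an LLM.

```python
# Minimal sketch of retrieval-augmented generation (RAG): retrieve the most
# relevant document for a question, then hand it to the model as context.
# embed() and generate() are hypothetical placeholders, not a real SDK.
from math import sqrt

def embed(text: str) -> list[float]:
    # Placeholder embedding: letter-frequency vector, normalised to unit length.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def generate(prompt: str) -> str:
    # Placeholder for a call to a large language model.
    return f"[model response conditioned on]\n{prompt}"

# A tiny "current" knowledge base that a frozen model would not know about.
documents = [
    "VAST Data was founded in 2016, ten days after OpenAI.",
    "The free version of ChatGPT was trained on data up to 2022.",
]
index = [(doc, embed(doc)) for doc in documents]

def answer(question: str) -> str:
    q_vec = embed(question)
    best_doc = max(index, key=lambda item: cosine(q_vec, item[1]))[0]  # retrieve
    prompt = f"Context: {best_doc}\nQuestion: {question}\nAnswer from the context."
    return generate(prompt)                                            # generate

print(answer("When was VAST Data founded?"))
```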

What role does data play in this new kind of AI? When do you see the advent of Artificial General Intelligence?

AI is based on data. All new forms of AI are statistical methods implemented on extreme-scale supercomputers. Models get more accurate as you expose them to greater amounts of data. If you feed these models more information about the real world, ranging from physics to bioinformatics to mathematics, as large AI research centers routinely do, they will graduate to a level where they're as intelligent as human beings. It is only a matter of time before the world achieves artificial general intelligence. We are seeing a massive rush to acquire extremely large data sets in order to drive that pursuit. The training data for the next-generation model is 2,000 times larger than what GPT-3 was trained on. In order to train something that understands the natural world, you have to show it the whole world, and that's a very data-intensive process. The amounts of compute power and data needed to train these next-generation models are astronomical.
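The relationship between data volume and model quality that Denworth alludes to is usually summarized in the research literature as an empirical "scaling law" of the form L(D) = E + B / D^beta, where loss falls as a power of the training-set size. The constants in the toy snippet below are invented purely to illustrate the shape of that curve; real coefficients are fit empirically and differ per model family.

```python
# Toy data-scaling curve of the form L(D) = E + B / D**beta.
# The constants are made up for illustration; real scaling-law coefficients
# are fit empirically and vary across model families.
E, B, beta = 1.7, 400.0, 0.28   # irreducible loss, scale term, exponent (assumed)

for tokens in [1e9, 1e10, 1e11, 1e12, 2e12]:
    loss = E + B / tokens**beta
    print(f"{tokens:>8.0e} training tokens -> loss ~ {loss:.2f}")
```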

Therefore, a number of companies are gearing up to make spectacular investments. The first public evidence of this has now come out in the market, with Microsoft and OpenAI reportedly developing plans to build a $100 billion computer. To put that in perspective, $100 billion is the GDP of a sizable nation, and that's just one machine which runs one program. What OpenAI and Microsoft are doing in public, other players of that level and scale are planning in private.

Moreover, there is a huge amount of data that is getting synthesized. There is an effort to manufacture a synthesized version of the world. This creates a new kind of problem for AI systems. If you have AI generating the data that it has to train itself with, at some point it becomes a recursive process. That is when the whole model collapses. It happens when you just don't have enough quality data to keep the model from drifting. Once these models start drifting and hallucinating, they will have to undergo retraining.
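The recursive-training failure Denworth describes is often called "model collapse" in the research literature. The toy simulation below is only an illustration of the mechanism: it repeatedly refits a simple Gaussian "model" on a finite sample drawn from its own previous fit, and the estimate drifts away from the original distribution generation after generation.

```python
# Toy illustration of training on your own synthetic output ("model collapse"):
# each generation refits a Gaussian to a finite sample drawn from the previous
# generation's model, so estimation error accumulates and the fit drifts.
import random
import statistics

random.seed(0)
mu, sigma = 0.0, 1.0      # generation 0: the "real world" distribution
SAMPLE_SIZE = 200         # finite synthetic dataset per generation

for generation in range(1, 11):
    synthetic = [random.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
    mu = statistics.fmean(synthetic)     # refit the model on its own output
    sigma = statistics.stdev(synthetic)
    print(f"gen {generation:2d}: mean = {mu:+.3f}, std = {sigma:.3f}")
```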

You pointed to the surge in revolutionary technologies like AI, which have data as their foundation. What role is VAST Data playing in it?

We started VAST Data in the early part of 2016; we were founded 10 days after OpenAI. We had an idea for a new distributed style of system, a data storage system and a data processing system, that essentially resolved fundamental trade-offs people deal with as they manage and process their data. The problem with other systems is that they don't scale very well and they're not designed for rapid data ingestion.

We realized that, for the first time in 20 years, it was possible to build a new style of distributed systems architecture, one that didn't have the disadvantages of the existing systems: a system on which you could build web-scale, data-center-scale systems with the performance you should expect from the fastest infrastructure in the world (which is flash storage today), but at a cost point closer to what you would expect from an archive. That remedies the fundamental tradeoff that has always existed between performance and capacity within an enterprise data center.

Companies like Google, Baidu, Facebook, and Uber have been publicly working on a subset of AI called deep learning; Uber, for example, is doing it for computer vision. For the first time ever, you have machines that can make sense of a data type that for the last 30 years was not processable by a data warehouse or an analytics system: unstructured data. By virtue of GPU computing, a renaissance is underway in how people understand data, because neural networks can now make sense of this data type. Until the advent of generative AI, large language models and computer vision, that unstructured data just lived in archive systems that nobody had the ability to make use of.


Please shed some light on the journey of VAST Data. What fundamental problem are you trying to solve?

Our journey started with the idea that we could unlock access to this data so that it could be understood. So, we built a file system and an object storage system, a classic enterprise storage infrastructure, that broke that 30-year-old tradeoff between performance and capacity. In 2019, we said, "We're here to kill the hard drive." If you don't have hard drives in your data center, all your data is on flash, and you no longer need the storage tiering methodology that IT practitioners have been working with for the last 30 years. All of that goes away. You just have one big pool of fast, cheap infrastructure that scales linearly.

We built the organization on that premise. We're valued at $9 billion, making us one of the most valuable private infrastructure companies in history. In the storage space, we've achieved something that nobody ever thought would be possible.

Bringing commercial viability to AI algorithms is still a question mark. What is your perspective on it?

That is a challenge. We have realized that at some point people will need to figure out how to bring commercial viability to algorithms that are being pioneered in organizations such as Baidu, Google and Meta. When all industries want to harvest this new capability, they are going to need an enterprise systems manufacturer or a systems engineering company that can make this simple for every organization to deploy and scale.

The VAST Data platform is about the convergence of unstructured data with a next-generation approach to data warehousing that is designed for radical levels of transactionality. As data flows in from the natural world, you can contextualize it immediately in a data warehouse that sits alongside your raw data. In the near future, our cores will be programmable by our customers: you can embed deep learning technologies and algorithms into the system, and it can learn from the data in real time as it flows in.
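As a rough illustration of "contextualizing data as it flows in", the sketch below enriches each incoming record at ingest time and writes the result into a queryable table next to the raw payload. It uses plain SQLite and a placeholder enrichment function purely to show the data flow; it does not depict the VAST platform's actual APIs.

```python
# Generic sketch of contextualizing data at ingest time: raw events are enriched
# on arrival and stored in a queryable table alongside the raw payload.
# SQLite and the enrich() function are illustrative stand-ins, not a vendor API.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, raw TEXT, label TEXT, score REAL)"
)

def enrich(payload: dict) -> tuple[str, float]:
    # Placeholder for an embedded model: tag the event and score it at ingest time.
    text = payload.get("text", "")
    label = "question" if text.strip().endswith("?") else "statement"
    return label, min(1.0, len(text) / 100.0)

def ingest(payload: dict) -> None:
    label, score = enrich(payload)                     # contextualize immediately...
    conn.execute(
        "INSERT INTO events (raw, label, score) VALUES (?, ?, ?)",
        (json.dumps(payload), label, score),
    )                                                  # ...and store beside the raw data

for event in [{"text": "Is the cluster healthy?"}, {"text": "Backup completed."}]:
    ingest(event)

for row in conn.execute("SELECT label, score, raw FROM events"):
    print(row)
```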

How are you helping industries leverage promising technologies such as AI? Please share some success stories.

If you can build a system with the capabilities to support the needs of the present style of computing (on which a lot of AI depends), you have probably solved a number of problems that have existed in enterprise environments until now. Our solution is used by many of the world's largest banks, governments, media companies, healthcare research organizations, and other industries for the day-to-day data management that allows them to bring AI into their environments.

For example, Pixar selected us to build movies on our platform years ago. At that time, they were experimenting with a new style of rendering that needed high-performance infrastructure. We've had a great partnership with them.

For AI algorithms to function, there will be a need for super-fast computing and processing. With cloud becoming the de facto standard, what is your perspective on the urgency of an AI cloud?

Tier one cloud service providers were not ready for the needs of intensive GPU computing. There's a good chance that they won't have the power and the data center capability to deploy thousands of GPUs, which is needed to build intelligent infrastructure.

So, a new crop of AI cloud companies is emerging. Organizations like CoreWeave and Lambda Labs in the United States and Core42 in the UAE are building multi-billion-dollar clouds specifically designed for extremely high-performance GPU computing. These data centers run on hundreds of megawatts and come with a level of efficiency and capability that you typically wouldn't get from a tier one cloud. The world needs a new set of AI clouds with multi-tenant infrastructure that has enterprise capability. Some of these clusters, for example Cori, are single systems consisting of 22,000 GPUs. We are talking about something that requires at least tens of megawatts of power just for one system and takes up an entire warehouse.
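A rough power-budget check shows why a 22,000-GPU system lands in the tens of megawatts. The per-GPU wattage and the overhead multiplier below are assumed values, typical of current high-end accelerators and data-center overheads, not figures quoted in the interview.

```python
# Rough power-budget sketch for a 22,000-GPU system. The per-GPU wattage and the
# overhead multiplier are assumptions (typical of current high-end accelerators
# and facility overheads), not figures quoted in the interview.
gpus = 22_000
watts_per_gpu = 700          # assumed draw of a high-end training GPU
overhead = 1.5               # assumed multiplier for hosts, networking, cooling

total_mw = gpus * watts_per_gpu * overhead / 1e6
print(f"Estimated facility power: ~{total_mw:.1f} MW")   # ~23 MW, i.e. tens of megawatts
```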

What is your assessment of the India market as far as modern data management is concerned?

As the country adopts new AI models, it will need modern data platforms like ours to make full use of these emerging technologies.

  • Published On May 3, 2024 at 10:39 AM IST