Nvidia dominance at risk with LPU? - MetaVisions #20

Hi all, hope you’re doing fantastic! Whilst listening to my favorite podcast, All In (go check them out), I learned about Groq AI, a company that could revolutionise the AI infrastructure stack and send real shock waves through a market Nvidia currently dominates.
Nvidia’s dominance
I would be pretty surprised if many of you have not seen Nvidia’s incredible market run over the last 12 months.

Nvidia made most of its name in the gaming sphere, competing toe to toe with AMD in the GPU market over the last decade, but the last year has been truly magical, and there is ‘AI’ good reason for it.
When ChatGPT exploded at the start of last year, Large Language Models started being announced and released from all corners of the internet. This kicked off an investment war between tech titans Amazon, Google, Microsoft, Meta, Tesla and others to secure large-scale AI infrastructure where models can be trained, applications developed, products launched and money made.
Why is most of this investment going into GPUs?
Training an LLM is a complex process that I won’t attempt to describe fully here. The essential fact is that it requires enormous numbers of calculations to be performed over and over again, many of them at the same time. Inference, which is when the trained model generates responses, also requires lots of (different) calculations.
At the start, researchers tried to use Central Processing Units for this process, but they quickly realised it was inefficient and slow, as CPUs are designed to be quick and precise at handling a few tasks at a time.
Graphics Processing Units were initially created to support the increasingly heavy graphical demands of video games, which means they were designed to handle many data points simultaneously, a process known as parallel processing. This makes GPUs great at handling many parameters at the same time.
So, in simple terms, GPUs matter for LLM training and inference because they can do many calculations at once, increasing the speed at which these processes run.
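To make the difference concrete, here is a minimal sketch in Python, using NumPy as a stand-in: the element-by-element loop mimics a CPU working through values one at a time, while the single vectorised operation mimics the GPU’s “do it all at once” approach. The sizes and timings are illustrative, not a real benchmark.

```python
import time
import numpy as np

# Two matrices, loosely standing in for a small neural-network layer.
# (Toy sizes; real LLM layers are far larger.)
a = np.random.rand(256, 256)
b = np.random.rand(256, 256)

# CPU-style: compute each output element one at a time, in sequence.
start = time.perf_counter()
out_loop = np.zeros((256, 256))
for i in range(256):
    for j in range(256):
        out_loop[i, j] = np.dot(a[i, :], b[:, j])
print(f"one at a time: {time.perf_counter() - start:.3f}s")

# GPU-style: one operation over the whole matrix at once.
start = time.perf_counter()
out_vec = a @ b
print(f"all at once:   {time.perf_counter() - start:.5f}s")
```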
Why has Nvidia completely dominated this market against AMD?
It has been a combination of factors such as:
• Early strategic investments in AI - according to Nvidia, it saw the potential of GPUs for AI workloads as early as the mid-2000s.
• CUDA software stack - optimised for a wide range of AI applications and workloads.
• Hardware design - architectures built for massively parallel compute.

The graph above shows the remarkable growth of Nvidia’s Data Center segment, which has lifted overall quarterly revenue from $7.2b to $22.1b over the past four quarters.
We are not too sure how long Nvidia’s dominance, or this infrastructure build-out, will last, as we have yet to see organisations generate revenue that backs up their investment into AI infrastructure. But it has been one of the greatest phenomena I’ve seen.
If you had invested £10,000 in Nvidia 5 years ago, your portfolio would be worth £198,319 today.
Groq AI - A disruption in the AI stack
Nvidia’s position in the market appears secure and should stay that way until a competitor like AMD catches up and releases cards as good as the green side’s. AMD has started that process by introducing AI accelerators in its RDNA 3 GPUs.
However, a new competitor has gotten some publicity recently: Groq AI’s chatbot has gone viral. Users are impressed by how quickly it responds to prompts. This is described as inference speed, or latency: the faster the inference, the sooner the model’s response arrives.
Why is Groq AI so quick at responding to prompts?
The main reason is the Language Processing Unit, a chip developed by Groq AI. It is not an LLM; it is a new piece of hardware that sits in the AI infrastructure stack.
The LPU has been designed specifically for inference, and it allows Groq to generate around 500 tokens per second, a big improvement over the roughly 40 tokens per second of GPT-3.5, which relies on GPUs for this process.
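That difference is easy to feel as a user. A quick back-of-envelope calculation using the speeds quoted above (the 300-token response length is my assumption, for illustration):

```python
# Time to generate a ~300-token answer at the speeds mentioned above.
# (The 300-token response length is an assumption for illustration.)
response_tokens = 300

for name, tokens_per_second in [("Groq LPU", 500), ("GPT-3.5 on GPUs", 40)]:
    seconds = response_tokens / tokens_per_second
    print(f"{name}: {seconds:.1f} seconds")

# Groq LPU: 0.6 seconds
# GPT-3.5 on GPUs: 7.5 seconds
```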
With language-based operations in mind, the chipset uses:
• Sequential Processing - processes tasks in sequence, which suits language comprehension and generation (see the sketch after this list)
• Single-Core Architecture - a single-core design with near-instant memory access
• Optimised Memory Bandwidth - by using SRAM instead of DRAM or HBM, data is instantly accessible through one of the highest-performance memory technologies available
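To see why a chip built around sequential processing helps, here is a minimal sketch of the autoregressive loop every LLM runs at inference time: each new token needs a full pass through the model and depends on every token before it, so per-step latency, not just raw parallelism, determines how fast responses feel. `toy_model` and `greedy_sample` are hypothetical stand-ins, not Groq’s actual API.

```python
import random

def toy_model(tokens):
    # Hypothetical stand-in for a full forward pass: returns fake "logits"
    # (scores over a toy vocabulary of 100 tokens).
    random.seed(len(tokens))
    return [random.random() for _ in range(100)]

def greedy_sample(logits):
    # Pick the highest-scoring token (greedy decoding).
    return max(range(len(logits)), key=lambda i: logits[i])

def generate(model, prompt_tokens, max_new_tokens=20):
    # Generation is inherently sequential: token N+1 cannot be computed
    # until token N exists, so low per-step latency is what matters.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)                # one forward pass per token
        tokens.append(greedy_sample(logits))  # feed the result back in
    return tokens

print(generate(toy_model, [1, 2, 3]))
```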
Will LPU chipsets disrupt Nvidia’s dominance?
Making predictions in this market is not easy. I do think LPUs will play a big part in modern AI infrastructure due to their powerful inference performance, which is key to a good user experience in applications that use LLMs.
We need to bear in mind that LPUs cannot be used for training LLMs, which means GPUs will remain a crucial part of AI infrastructure, as they are still the best-performing solution for LLM training.
“Folks still wanted a one-size-fits-all solution like a GPU which they can use for both their training and inference. Now the emerging market has forced people to find differentiation and a general solution won't help them accomplish that.” - Mark Heaps, Chief Evangelist @ Groq AI
See you next week,
Davi, MetaVisions