Cloud Computing’s Intensifying ARMs Race with Liftr Insights’ Paul Teich
Advisor: Simon Erickson
“Buy, buy, buy!”
Those appear to be the most common words in the semiconductor industry right now. 2020 has been one of its most exciting years on record, with billion-dollar acquisitions taking place at a frenetic pace.
NVIDIA (Nasdaq: NVDA) has earned the distinction of being the industry’s biggest spender. Its $40 billion purchase of ARM Holdings takes the cake as the year’s largest acquisition, though it also shelled out another $7 billion for Mellanox just eight months prior.
Analog Devices (Nasdaq: ADI) paid $21 billion for Maxim Integrated Products (Nasdaq: MXIM) in July. And AMD (Nasdaq: AMD) recently wrote a $35 billion check for Xilinx (Nasdaq: XLNX).
That’s a lot of money changing hands! It seems we’re in the middle of a semiconductor arms race. Competition amongst chipmakers is intense, and no one wants to get left behind using yesterday’s technology.
The market implications of these M&A deals will also be significant. Every mega-acquisition changes a company’s competitive position and its relationship with customers. It also provides visionary leaders with a war chest of new resources to help them execute on their bigger-picture battle plans.
But why now?
What’s going on in the chipmaking world that could possibly justify so many massive acquisitions so suddenly?
Are there important new technologies hitting the market, causing companies to course-correct to respond to changing customer needs?
Is artificial intelligence becoming more demanding, requiring more horsepower and efficiency from specialized chips in order to keep up?
And as investors, should we be excited or terrified about the billions of dollars of our shareholder capital flooding into these acquisitions?
To help us answer those questions, we brought in one of the semiconductor industry’s brightest minds.
Paul Teich is the principal analyst at Liftr Insights. He has 40 years of IT experience and 12 patents to his name. In my opinion, he’s the most thorough and insightful analyst of the entire semiconductor industry.
In an exclusive interview with 7investing CEO Simon Erickson, Paul explains how NVIDIA rose to prominence in training deep learning models, why security will matter for edge computing, and the importance of power efficiency in the new age of machine learning inference.
Paul also describes why developer ecosystems are becoming critical for custom chip development and how Amazon’s (Nasdaq: AMZN) Inferentia has started a trend of large companies developing their own ASICs. He also reveals his favorite semiconductor acquisition of 2020 and even takes a few questions that were submitted by our 7investing followers!
00:00 – Introduction and why NVIDIA’s GPUs were perfect for deep learning training. What will be the role of GPUs in machine learning inference?
10:37 – Artificial intelligence’s role in data center network security and the importance of Smart NICs.
15:52 – Is data science scalable? Are companies beginning to build deeper within the cloud computing stack?
21:21 – Edge computing: Will it become commoditized or remain specialized?
24:30 – The importance of computing efficiency in the new age of machine learning inference.
33:58 – Are GPUs at risk of being replaced by FPGAs or custom ASICs?
43:27 – Big Tech’s split between custom silicon and merchant supply.
52:04 – What was the best semiconductor acquisition of 2020?
54:55 – Why was NVIDIA so interested in buying ARM Holdings for $40 billion?
58:03 – Is Intel keeping up with its competitors in the AI arms race?
Publicly traded companies mentioned in this interview include Amazon, AMD, Fastly, Intel, NVIDIA, and Xilinx. 7investing’s advisors and/or guests may have positions in the companies that are mentioned.
This interview was originally recorded on November 3, 2020 and was first published on November 5, 2020.
Simon Erickson 00:00
Hello, everyone, and welcome to this edition of the 7investing podcast. My name is Simon Erickson, I’m the founder and CEO of 7investing. And I’m really excited today to talk about the semiconductor and cloud computing industry. Because there’s a lot of talk about whether it’s the end of Moore’s Law. There are a lot of acquisitions going on out here in 2020. Artificial intelligence is a huge headline that everybody wants to talk about. So I really wanted to go to an expert to get the true scoop on what’s going on out there. And that’s why I’m super excited to talk with Paul Teich. He’s the principal analyst at Liftr Insights in Austin, Texas, and he’s joining me on the podcast this afternoon. Hey, Paul, thanks for joining me on 7investing!
Paul Teich 00:39
My pleasure, Simon.
Simon Erickson 00:40
Paul, we’ve got so many different angles we could take with this, you know. We’ve kind of mentioned that Moore’s Law is winding down and CPU architectures are changing. We’ve seen some multibillion-dollar acquisitions out there in the semiconductor space. But maybe let’s start with artificial intelligence. Because you and I last chatted back in March. Last time you were on the 7investing podcast, we talked about the different layers of cloud computing, right? From the ground up, you’ve got infrastructure as a service: if you just want memory, you just want storage, you’ve got those things. Then you have the building blocks on top of that: platform as a service. And then software as a service at the top: the application layer. Paul, for my first question, it seems like a lot of those early SaaS companies were “out of the box” solutions, right? Maybe I wanted something for payroll processing. Maybe I wanted something for inventory or something like that. But artificial intelligence seems so much more complex than these simple out-of-the-box SaaS solutions, which aren’t as applicable anymore. What’s your take on artificial intelligence? And how is it changing cloud computing these days?
Paul Teich 01:48
Wow, so, big question. It’s completely changing everything. But let’s start with what that really means. So model training is the heart of what we call (I’m just going to say) deep learning or machine learning, right? AI is the big thing. Machine learning is a subset of AI, and deep learning is kind of what this explosion over the last five or seven years is all about. Deep learning training requires real-world data. Or, if you’re training cars, simulated automotive data, but you’re basing that on real-world simulations, right? You have to have a lot of data to train a model, and it’s very compute intensive. That’s where Nvidia has made a hell of a lot of money the past five years. It’s really compute intensive, and it’s basically matrix math. That’s accelerated really well on GPUs, and now we get the special tensor cores to accelerate it even better. We can talk about memory in a second, because memory bandwidth and memory controllers all factor in, given that each model is actually different. We’ve gone from these very simplistic, you know, single-layer neuron models to not just multiple layers of hundreds and thousands of neurons, but ensembles of models. So what we now call an AI is actually a collection of deep learning models that run together, and each part specializes in a different part of your recognition problem. And therein lies the challenge with building a single chip to rule them all. GPUs are really good in a general-purpose sense; they’re much better than a CPU. And for lots of reasons, they’re easier to program than FPGAs. FPGAs have a place, and we can talk about that. But where GPUs shine is parallel processing all of this matrix math that has to happen to do the inferencing.
Paul Teich 04:07
Once you’ve built a trained deep learning model. So you train a model over here on essentially a supercomputer, whether it’s in the cloud or a dedicated NVIDIA HGX box, right? It takes some cycles to train a model. It takes a lot of data to train a model. We haven’t really figured out this whole single-shot thing. You’ve probably seen some academic papers and some initial benchmarks, and recently there’s been some single-shot research published. But it’s certainly not yet a thing where you can train a model with a little data and have it do something really useful, repeatedly, with high quality. So let’s say we’re talking about Alexa. Everybody’s favorite at-home smart speaker. We’ll take Amazon Alexa running in AWS, okay? Once you’ve trained a model and it’s fairly stable, then you can look at how to optimize it. And so after a lot of iterations, AWS said, “Maybe it’d be more efficient if we designed our own chip to go do this.” Now, the clouds don’t share any of their internal data. Google, Microsoft, Amazon…they don’t share data with their supply chain. And so they’re sitting on all of this voice data. The folks running Alexa eventually have this really big collection of what people are saying to smart speakers and what kinds of questions they need to answer about the weather and sports and TV channels and entertainment. And so they train an AI. It’s very specialized AI. And they have all this data that says, “Here’s the way our model should work.” And their models may look completely different from GPT-2, GPT-3, and the open source speech recognition and speech generation models that are out there, because they’ve evolved differently.
Paul Teich 06:19
And so AWS can afford to say, “We’re going to deploy our speech recognition and speech generation,” because part of Alexa is speaking back to you. So the pathway is: there’s a set of models that determine what the user is saying, what I speak into a smart speaker. Then it goes into the cloud and figures out what the actual intent was, with a different set of models. It says, “Oh, he’s asking about last night’s sports score,” or whatever, right? And then you wrap that back up and speak it back to me in an English sentence, and that takes some AI too. So it’s all of these ensembles of models together delivering what we think of as a simple smart speaker experience. To run the cloud-based portion of this, Amazon designed its own inferencing chip called Inferentia. Unsurprisingly. It’s not a real imaginative name.
Simon Erickson 07:21
Real clever name by Amazon. Good job, Jeff. Good job, Bezos.
Paul Teich 07:24
Good job. It’s an inferencing chip, okay? It’s not a training chip. They’re still using GPUs and other stuff to do training. But what they’re trying to do is lower the operational cost of Alexa. And to do that, they need to lower the energy cost of answering my query. That’s your operational cost. So when I speak into the smart speaker, it figures out what I said and what the answer should be, and then phrases it back to me. That’s all energy. It doesn’t matter how much the chips cost. That’s all energy and response time. The faster I can do certain parts of it, the more time it has to think about the quality of the response it’s going to give me. If I spend all my time on just the speech recognition and the speech generation, then I don’t have a lot of time to actually figure out, “Is he talking about a recipe or a sports score?” And that’s the important part: giving relevant information back to users, based on their context, with this artificial intelligence.
Paul Teich 08:40
And so with the data that Amazon’s collected on Alexa, it can train its own models. And based on those trained models, it has a preferred architecture to design Inferentia chips around. Now, Inferentia chips might not work with somebody else’s speech recognition and generation engines. They might not be the right answer. And that’s the challenge: the memory profiles may be vastly different, the number of hyperparameters may be different, and the search space may be completely different. So what the memory access patterns look like, how many convolutions you need to do per second, you know, it all changes as you talk about designing a chip that’s set up to deliver you more targeted ads, versus answering a question, versus finding that restaurant on the map. All of these are fundamentally different models. So where we go with that is that training is probably going to stay on general-purpose hardware. Ideally, a lot of them want to design their own general-purpose chip. And we kind of know what that looks like. It’s a Google TPU: TPU v2, v3, and so on.
Simon Erickson 10:06
Matrix math. The thing that made NVIDIA a ton of money with GPUs, right?
Paul Teich 10:10
Right. Right now, you add the tensor core to the GPU. At some point we’ll know the war is over, when Nvidia cuts all the GPU stuff out. Right now it’s not really much of a GPU, but there’s still some legacy there. GPU is NVIDIA’s DNA. When they make a part that doesn’t have a GPU legacy anymore, that’s when the AI revolution is pretty much over. So a big subset of AI is security. It starts with fraud detection in credit card transactions. But really quickly, it morphs into data center security. Our concept of data center security used to be the igloo metaphor, right? It’s really hard and crunchy on the outside, but once you break through, it’s soft and chewy. [laughs] The polar bear analogy. Now, with the cloud data center and the cloud network, you have to assume they’re under constant attack. It’s a huge attack surface. Amazon, Google, the super seven, even the real estate plays, the tier twos, the telecoms: everybody is under constant attack now. So we’ve moved past that whole concept of “We’re just gonna have a firewall, and once they break through the firewall, they can have whatever they want.” We’re now doing behavioral analysis of what is running behind those first couple of layers of protection. What that behavioral analysis is trying to do is let legitimately credentialed apps do their thing. “I have the security clearance, this is a usual thing for me to do, I should have access to this file. Don’t prevent me from doing that.” That would be a business deceleration. At the same time, somebody might be doing something unusual: they have credentials, but not for that part of the system, and it’s not a usual request for that person or that app to make.
Paul Teich 12:29
AI is now being trained on security data: what is usual and unusual for different roles. And that then comes back down into the cloud data center with these Smart NICs [Smart Network Interface Cards]. And so, Nvidia and Mellanox, as part of Nvidia’s core data center strategy. Nvidia and Mellanox came before Nvidia and ARM. That was a huge move on their part. Because what does Mellanox do? Mellanox actually has a couple of different flavors of Smart NICs. One of them has an FPGA in it. That’s not gonna last long, right? We pretty well know that a programmable ASIC is going to carry the day, and it’s probably going to have some Nvidia tensor core magic in it at some point to do the AI stuff. But that point of presence at every server node in a hyperscale data center is where you’re looking at the network traffic in real time with a trained AI model. This is what Microsoft Azure’s FPGA-enabled Catapult Smart NIC does as well. It’s been doing that for years. It’s attached to every Azure server, and has been for years. So these are doing behavioral analysis to make sure that traffic is optimized in real time. Because a hyperscale data center is very fluid in terms of assembling and disassembling clusters of nodes, that whole software-defined networking thing comes into play in a huge way. And AI helps all of that become optimized. But what you’re really also doing is preventing bad actors. Do you have the right credentials? And oh, by the way, not just do you have the right credentials, but is this something we expected you to do with those credentials? If this is not normal behavior, I need to go figure out whether it’s normal, or ask someone. So when we talk about AI permeating the cloud, it’s permeated the cloud at a fundamental level. Look at Alibaba’s X-Dragon Smart NIC, AWS’ Nitro, Microsoft’s Catapult. We have to assume Google’s got something and isn’t telling us about it.
Gotta assume they’re doing something similar, right? Oracle is doing something similar. Yeah, if you go down the list.
Paul Teich 15:10
But what they’re doing is essentially a couple of things. A Smart NIC has that extra level of security and assurance that protects your virtual core or bare metal. But it also gives you better performance, because you’re not actually running that stuff on the node. And that’s the big story: now, not only have they offloaded the security and the credentials, but they’ve offloaded the virtualization stack. So you’re paying very little overhead for virtualizing all this metal now, because you have that Smart NIC as a sidecar to your actual server node, doing the work.
Simon Erickson 15:52
And Paul, are companies starting to build things deeper into the stack? I mean, it used to be you’d hire a SaaS company to provide an application that you’d run within your organization. Now that large companies better understand what cloud computing is capable of, are they asking for infrastructure as a service, so they can start doing the engineering and building the apps on their own?
Paul Teich 16:17
I think one of the things we’re running into, not just with analytics and apps like SAP in general, but with AI specifically, is that data scientists don’t scale very well. A data scientist is a credentialed, degreed, experienced kind of human, and we haven’t figured out how to automate that part yet. We haven’t yet invented the deep learning training and models that would say, “My kind of problem needs this kind of training.” How would I optimize my set of oil change and maintenance auto shops across the US? How would I analyze that data compared to a restaurant chain? Compared to a hospital system? The data looks different. Even within, you know, comparing hospital systems or Jiffy Lube oil change chains, right? Everybody’s been collecting data in different ways for the past 20, 30, 40 years.
Paul Teich 17:38
And so analytics is a challenge for the industry. It takes a human data scientist to go in and look at it now, and it takes a human to start loading data to train AIs. So you can’t just say, “I’m going to create a data lake and let SAP HANA…” Somebody has to be really conversant in how to assemble that analytics package, how to train a useful AI model, and then how to go deploy it, without any of that being automated.
Paul Teich 18:19
I think that’s where we’re seeing the SaaS slowdown. With software as a service, one size does not fit all, and we haven’t been able to figure that out for deep learning training, because every inferencing solution now is an ensemble of trained deep learning systems. It’s still kind of rocket science. We haven’t simplified it. Everybody’s trying. They’re trying to create a high-level construct so I can just feed it data and it can go find patterns. But the patterns may not be important. The challenge is, they may be important, or they may not matter. Or, culturally, there’s the apocryphal story (I’m not sure if we talked about it in the past) of that Amazon HR system they implemented a few years ago, where they trained it on a few years’ worth of Amazon human resources success data. They asked, “Who are the candidates who are going to be successful at Amazon?” And so after a few months of this AI crawling through the resumes, somebody raised a flag and said, “Yeah, we’re not seeing any female resumes.” It turned out to be a cultural problem at Amazon at the time. To their credit, they stopped the HR AI and said, “Yeah, we’re not promoting enough females; we are not gender unbiased. We’ve got gender bias in our system. So we’re going to shut the AI down, and we’re going to revisit the past few months’ worth of resumes that it didn’t give us because they were female. We have a cultural problem.”
Paul Teich 20:15
So when we talk about bias in AI, it could be that the system has had a systemic problem that humans haven’t recognized. Maybe we’ve kind of blinded ourselves; maybe we just don’t know the problem exists. And the AI will expose it, and take action on it, if we tell it, “Hey, do the thing that matters the most.” And this is kind of the paperclip problem in AI. If you build an artificial intelligence that’s tuned to making a better paperclip, and everything looks like a paperclip, pretty soon it starts consuming the earth to make better paperclips, right?
Simon Erickson 21:00
It’s going to do what you tell it to do.
Paul Teich 21:03
So you give it a bunch of resumes and ask, “Who’s been successful in the past?” and it turns out you had a problem with who was successful in the past. And so you change the system, and that changes the data that feeds into the AI.
Paul Teich 21:21
So that kind of brings us to a question we had about the edge. Edge computing and AI is a big deal. So what happens at the edge? What happens at the core?
Simon Erickson 21:33
And that’s from Sorabh Arora, by the way, just to give a shout-out for this question from our 7investing audience. Specifically, the question was, “Paul, how do you foresee edge computing playing out? Will it become commoditized, or will it remain specialized?”
Paul Teich 21:52
I think it’s going to be specialized, mostly because of this AI thing that’s literally permeating everything. If all we were talking about was, like, an ARM or a RISC-V microcontroller, you know, done.
Paul Teich 22:08
But we live in a different world now. Security at the edge has seen more challenges than security in the core, because the core is heavily defended. If somebody pries into an IoT edge node or an endpoint, gets past that authentication and credential system, and becomes a known good player on the system, your whole network is exposed. That’s the reason why we take a look at: “Is that actor secure? Is that actor qualified to take that action, even though they have credentials? Have they done that before? Is that something we want them to do?” And so IoT endpoints need that security.
Paul Teich 22:51
And that’s going to rely on inferencing models trained elsewhere. You can’t train that in a cost-effective, low-power IoT endpoint. But there’ll be some little AI, deep learning acceleration kernel to run simple models, or increasingly complicated models, at the edge. And while the constraints are different, the metric serves the same end: lowering the cost of each inference. The energy cost of an inference matters in a battery-powered or solar-powered IoT endpoint. It matters in a 5G base station. All of these places are going to be running inferencing models to tune performance, as well as for defense.
Simon Erickson 23:51
So let me unpack some of that, Paul. Holy cow, what a bunch to take in! I probably need to watch that about 10 more times for it all to sink in. But there are some interesting points you bring up. First of all, data science is having challenges scaling. And, you know, part of this is that we’ve seen these platforms (not just the software at the top application layer, but platforms like the Splunks, the Twilios, and the Fastlys of the world) move to more usage-based models now. So they can try to automate a lot of that stuff and take the human element out of it. Find those analytical problems. Security vendors, too; same thing. The goal is to try to make decisions as efficiently as possible.
Simon Erickson 24:30
I wanted to follow up on something you mentioned multiple times: the difference between machine learning training and machine learning inference, which are very different from each other. We talked about Inferentia. We talked about Amazon. We’ve been training models to recognize certain words for so long. We’ve been training autonomous cars on what a stop sign is, what a deer running in front of the car is, for years.
Paul Teich 24:54
And still the race car drives into the wall! [laughs]
Simon Erickson 24:56
Right! Nvidia has trained everything, from video to rendering, to recognize things. And now it’s kind of taking the next step to inference. It’s using GANs (generative adversarial networks) to create the shapes of people’s faces for Zoom calls, so those calls take less bandwidth. Obviously, latency is a really big deal for self-driving cars. Amazon has an interesting… I mean, the computing for inference is so much greater than for training.
Simon Erickson 25:28
So shifting directions now to the semiconductor part of our conversation. We’ve gotten used to GPUs for training. I would say Nvidia has become like the de facto standard for training. Do you think that’s going to be the same for inference? Or is this going to require a boatload more horsepower than Nvidia can provide?
Paul Teich 25:47
It’s not about horsepower. Nvidia’s challenge is lowering the cost of energy per inference, no matter what your inferencing model looks like. What’s the energy cost to deliver that inference in the time you have available to deliver it? If it’s an IoT or edge use case, there are different constraints: What am I doing with my sensor data? Where am I sending it? How am I summarizing it, right?
Paul Teich 26:19
But let’s back up a second. Moore’s Law is kind of broken; you made a glancing mention of that at the beginning, right? And where we see that it’s broken is in core frequencies just not really going anywhere. We measure the core frequencies in the clouds at Liftr Insights, and they’re flat. You get some new stuff in Cascade Lake that’s starting to lift a little bit. But we’re still not regularly seeing four and five gigahertz cores. It just doesn’t happen.
Simon Erickson 26:51
Can you talk a little bit more about that, too, Paul? What you’re doing with Liftr Insights? The public cloud transparency, and how you’re seeing behind the scenes what’s getting deployed out there, or what’s being used for instances out there?
Paul Teich 27:03
Right, so we do a semimonthly enumeration of all of the rentable configurations from the top four clouds today, with more later: Microsoft, Google, Alibaba, and Amazon. And that configuration space is huge. If we’re talking about just the SKUs, the configurations available for Linux on demand worldwide, without taking zones into account, just the SKUs available in each provider region, it’s about 17,000 per scan and accelerating. It started at 11,000 back at the beginning of 2019. So you take this big shelf space, and we look at AMD versus Intel versus ARM, which is gaining ground. And we look at Nvidia versus the FPGAs, Intel and AMD, and all these others. We have a vast amount of data that the clouds tell us: for the most part, 90 to 93% coverage of base speeds for each virtual CPU, how many cores per instance, how much memory per core. So we see a lot of this data flying by. And we do this full enumeration every time we scan, so we know what’s been added at a very granular level: this size was added in this geography, a new instance type was added. For example, I think AWS announced yesterday that their new A100-based instance type is now GA [generally available]. Yes, as of yesterday, in two regions, US West and US East. And we’ll see that on our mid-month scan. So every two weeks, we pick up the new stuff that shows up. And what’s really apparent is that frequencies per core have been flat.
Paul Teich 29:02
Not only that, but each cloud has its own profile. Google’s a little bit lower than the rest. They run their cloud a little bit cooler, maybe closer to ambient air temperature. Again, cooling is about lowering the PUE [power usage effectiveness] for a data center. Different clouds do different tricks in terms of “How do I not spend a lot of money on air conditioning, so more of my power budget goes to those compute jobs?” So Google’s generally lower frequency. I’d call the rest fairly high frequency by comparison, but it’s a narrow range, okay? It’s like 2.4 to 2.8 gigahertz, with very little above three gigahertz: just some isolated instance types. Some of the newer, specialized ones are starting to edge up there, but really slowly. We’ll see that happen wholesale when they go to water cooling. When the clouds start to adopt water cooling and other exotic cooling for mainstream instance types, we’ll see more of the higher frequency types.
Paul Teich 30:05
But where that comes into play is that you can’t really just accelerate things. A lot of problems don’t scale by adding a whole lot more CPU cores. Actually, a lot of problems are limited by the memory domain. On the GPU side, this is why Nvidia has those big HGX boxes with 16 GPUs and all their associated memory: that memory domain sets the limit on how fast they can compute. If they expand beyond the box, the network becomes the limiting factor on how fast that cluster can operate, just like an HPC [high performance computing] cluster. That’s why people went to InfiniBand and all these extreme measures. Once you have a big cluster of compute power, all those little serial network links become the limiting factor on how fast you can go.
Paul Teich 30:59
So, back to inferencing chips. Where the Moore’s Law thing comes into play is when you have a trained model that doesn’t look like anyone else’s trained model. When you’ve taken your data and created something that’s pretty optimal, then you have to tune that model for the hardware you have available. And what’s happened in the past few years is that we’ve democratized design. Intel, bless them, is the last vendor standing as an integrated design and manufacturing shop, for the most part. AMD separated from GlobalFoundries a long time ago now. Nvidia has never been there. But what we’re finding is that the clouds have enough performance data and enough cash on hand that, if they want, they can go design a chip to accelerate inference. Take Alexa, to spoil the punchline. Amazon has data centers doing Alexa. You’ve got 10, 20, 30 million people using Alexa every hour, so you have data center space dedicated to serving Alexa in a meaningful way. That’s influential. So you design your own chip. And if you deploy this chip at enough scale, then you get the benefit of having optimal code that suits your service needs, running on a processor that’s better tuned to run that code at a lower operational cost. That is why you do this.
Paul Teich 32:55
So you have a one-time upfront capex to buy a server with GPUs in it, and it’s probably a lot more than buying that same server with the Inferentia chip in it, because the Inferentia chip isn’t going through distribution. There’s nobody getting a margin on it. You designed it, you have a contract manufacturer build it, and it goes straight into your system.
Paul Teich 33:21
But what matters the most isn’t that cost differential. What matters the most is: am I getting the best opex out of that part? If I’m not getting good opex out of that part, I need to be looking on the merchant market for something that does a better job, or I need to share my data with somebody who can do a better job of custom designing a part. So all of the clouds are designing their own inferencing chips right now. Everybody in the top 12 is doing some kind of research and development on inferencing chips for some set of applications that matters to them.
Simon Erickson 33:58
And my investing question for you on that is: is that impacting the Nvidias and the Xilinxes (and now the AMDs) of the world that have designed those chips? Are they being disrupted by in-house proprietary ASICs?
Paul Teich 34:14
They will be. I mean, if you look at what Amazon is doing, clearly the Inferentia estate could have been GPUs, and it’s expanding there. And I can say the same about Graviton and ARM-based processors: they’re growing Graviton dramatically. Same thing there, lowering the cost of serving a processor-based task, right? And that cost is opex. Customers want performance. They don’t care what instruction set delivers it; they just want reliable performance. If Graviton is lowering Amazon’s operational cost, that changes the equation for Amazon. And I think we’re seeing some of the fallout from ARM server chip development happen in the merchant market. I think I’ve heard rumors about Marvell very recently losing some developers. So I would kind of expect it’s going to be a little tougher there.
Paul Teich 35:24
I’m mixing a lot of things here. So let me back up a second and close out GPUs and FPGAs. I think FPGAs will always have a slice of the market, because you can test a design on an FPGA before committing to an ASIC, custom silicon. So I think folks will always have test beds. They’ll probably have some initial deployments of big new capabilities using FPGAs, if they’ve got to get them to market ahead of a full custom silicon design. The big battle is going to be on the software side.
Paul Teich 36:09
And so the history, as we all kind of know, is that AMD has done a really good job on client GPUs, because of DirectX and these really simple GPU APIs. And so they were able to latch onto Windows and the gaming consoles and provide this driver-level software for whoever managed the platform: in this case, the Microsoft Windows or Xbox folks, or whoever, right? They did a lot of software co-development with AMD. AMD tried it with OpenCL and OpenGL; it didn’t really stick. They don’t really have a general software developer program that would translate into competing with Nvidia in IaaS. It just didn’t happen for AMD, for a variety of reasons. Xilinx has pretty good developer momentum. Okay, the challenge is that FPGAs are still hard to program. But they at least have an ecosystem, and they know how to build and grow it. And I think that’s the real synergy with AMD. AMD gets the FPGAs, and that keeps a slice of the high-value market for developer teams and early deployments and things like that. But the real value for AMD is a seasoned software developer team that knows how to create an ecosystem. That’s fighting-your-way-out-of-the-paper-bag kind of stuff.
Simon Erickson 37:49
Okay. So Paul, allow me to step back just so I can connect the dots as best I can here. It seems like you’ve got some powerhouse use cases out there. You’ve got Alexa used by 30 million people at a time, so you’ve got a flood of data. At that point, you want to create optimal software code, right? Something that’s going to manipulate that data in exactly the way you want. And that’s why Amazon goes out and creates its own custom chips. So you’ve got these really big use cases, and the reason they want to do this is to get the opex as low as possible. You’ve got trillions of operations per second, and you want the watts required, the power to actually do those operations, to be as low as possible. And so we’re starting to see, if I’m hearing you correctly, the software, the ecosystem, and the developer side of this becoming a more and more important part of that middle step between the power and the end software itself.
Paul Teich 38:39
And if we take a look at IaaS, the difference between SaaS and IaaS is that my data scientists can go build what they need using IaaS and PaaS.
Simon Erickson 38:52
And when you’re saying IaaS, is this Nvidia’s CUDA? Where you actually are letting a developer build on a GPU and program something and code it in C? Are you talking about something different?
Paul Teich 39:03
Python. But yeah, you are down at the level of programming in Python using these higher-level libraries, maybe some higher-level constructs. That’s where some of the PaaS stuff that’s evolving now comes into play. But the challenge is that packaging that as software as a service that applies to a whole bunch of horizontal companies doing the same thing just hasn’t happened yet. We don’t have that knowledge in the industry yet. And so AMD wants in. Right now, we don’t see many MI series GPUs in the cloud. There’s a bunch of reasons, but one of them is that they don’t have the same CUDA-level programming support and library-level programming support that Nvidia has. Driver support for enterprise is another thing that Xilinx does well that AMD could definitely use: enterprise, cloud-class driver support. Not gaming drivers, but enterprise deep learning, bet-your-business, mission-critical driver support.
Paul Teich 40:14
And that takes us back to Nvidia now, which is buying ARM. They’ve got the Smart NICs with Mellanox, and now they’re trying to buy ARM. We’ll see what happens in China.
Simon Erickson 40:32
$40 billion, Paul! This isn’t pocket change anymore! That’s a lot of money!
Paul Teich 40:37
But you’re sitting on a war chest. I mean, at some point, it’s not worth anything if you don’t spend it. Right?
Simon Erickson 40:44
Fair enough. Good point.
Paul Teich 40:49
Are they overpaying? Don’t care. Okay. If all they did was keep that cash and, you know, generate arbitrage money off of it, that just doesn’t count. So they’re investing now in the future of their business and in competing with Intel, which has been Jensen’s dream for a long time. And so the trick there is that the super seven, the top dozen clouds, are probably not your target. They are rolling their own right now. They’re doing their own in-house inferencing chips. A lot of them are looking at doing their own processor development, and a lot of that is ARM-based, by the way. Possibly RISC-V, depending on where it goes in the future. Those are pretty much your choices. But RISC-V doesn’t yet have an enterprise ecosystem. ARM at least has had that work done for it, and AWS has really been pushing it.
Paul Teich 41:42
So custom design in the top clouds is going to be a feature of the landscape. It’s going to eat into market share. Where Nvidia and AMD come in (AMD is now in the Smart NIC market, because Xilinx has been powering Smart NICs) is that AMD/Xilinx gets AMD’s GPUs into play and gets AMD into the Smart NIC game. You’re starting to see these ecosystems really develop for telcos and tier-two clouds: folks who need to buy it and don’t have the money to do their own design. It still takes tens of millions of dollars to design, verify, harden, and deploy a chip design at scale. It’s not for the faint of heart, and it takes specialized skill sets. So you’re looking at $30 to $40 million, maybe a little bit less if you’re doing this a lot. That’s a big investment, right? Unless you’re spending hundreds of millions of dollars developing several chips at a time over the course of a year, like Amazon does. They do that.
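Why scale matters so much here comes down to amortization: the one-time design cost (often called NRE, non-recurring engineering) is spread across every chip deployed. A minimal sketch, using the midpoint of Paul’s $30 to $40 million range and a per-chip manufacturing cost that is purely a hypothetical placeholder:

```python
# Sketch of design-cost amortization. NRE uses the midpoint of the
# $30-40 million range quoted in the interview; UNIT_COST is hypothetical.

def cost_per_chip(nre: float, units: int, unit_cost: float) -> float:
    """Spread the one-time design cost evenly across all chips deployed."""
    return nre / units + unit_cost

NRE = 35_000_000  # one-time design, verification, and hardening cost ($)
UNIT_COST = 500   # hypothetical manufacturing cost per chip ($)

for units in (10_000, 100_000, 1_000_000):
    per_chip = cost_per_chip(NRE, units, UNIT_COST)
    print(f"{units:>9,} chips -> ${per_chip:,.0f} per chip")
# Prints $4,000, $850, and $535 per chip respectively.
```

Under these assumptions, a buyer deploying ten thousand units carries a heavy design premium on every chip, while a cloud deploying at Alexa scale makes the same design cost nearly disappear. That is the gap Paul describes between the top clouds and everyone else.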
Paul Teich 43:01
They have chip design teams, so one more chip for servers is not a big deal for them. Not as big a deal as it would be for somebody in the telco space, where that’s not a center of excellence. And so this is where I think Nvidia, Intel, AMD, and these combined ecosystems will still have a lot of running room. Because not everybody can design their own chip.
Paul Teich 43:27
First, you have to have an app. Alexa is the app. Google search is the app. Microsoft language translation and your spell checker are the app. Those clouds have apps. They need specialized silicon to lower the cost of energy to make their margins. And so if you’re just selling cycles, okay, you’re a real estate investment trust. You may have some security app data; everybody on that score is generating a lot of security knowledge. But in terms of app knowledge, that’s probably not the dedicated focus. So look to the big app-based clouds and you’ll find it. Facebook? Yeah. I’d be really surprised if they’re not designing their own chips. When you have the app and you’re deployed at that kind of scale, you’re working with somebody to go make this all run faster, better, cheaper.
Simon Erickson 44:37
That’s the next question we had from our audience, Paul, and I think it’s exactly what you’re talking about right now.
Simon Erickson 44:41
This is from Manoj Roje. He says: “as the seven hyperscalers plus Apple are building their own chips, inference, etc., what percent of their capex spend would be on semiconductors, and how do they divide it between central and edge cloud deployments in the next five years?” I believe the hyperscalers he’s talking about are the large clouds, the Alphabets, the Facebooks, the Alibabas of the world. So are the large companies putting more and more money into semiconductor chips that they’re developing themselves? And are they deploying them in centralized or edge cloud deployments?
Paul Teich 45:16
All of the above? Multiple equations with multiple variables. But let’s try to tease that apart. Okay, so…
Simon Erickson 45:24
A great question, by the way.
Paul Teich 45:25
It is a very good question. What’s your edge? In some respects, Amazon’s AWS edge is Local Zones. At Liftr Insights, we started to pick up Local Zones and Wavelength this last month or so. We’re starting to see those new regions pop up. And the endpoint then becomes an Outposts enclave: AWS Outposts. So there, you’re talking about distributing data center infrastructure to the edge as part of your business. And Outposts are a slice of what AWS does in their data center. It’s not just probable; they’re definitely going to put Graviton in Outposts. That’s their plan.
Paul Teich 46:21
Okay. If you want to do inferencing, they’ll have that Inferentia chip. They’ve probably already started to have some GPU-enabled SKUs out on the edge for the general-purpose folks. So I think the real question there is what percentage of that mix is merchant supply, and what is in-house design and contract manufacturing? Because that’s mostly what is going to affect Intel, AMD, Nvidia, those folks: how much of this does Amazon build by itself over the long haul? Could they get into memory eventually? That’s probably way farther than I’m going to talk. Memory chip supply will probably be merchant for a long time. They’re going to do their own custom assemblies and packaging, I would expect. There’s probably a point at which the big clouds get into buying known good die and doing their own multi-chip packaging. “I want to put a GPU, a high-bandwidth memory stack, and a couple of processors in the same package. Nvidia, can you ship me known good die?” They’ll complain, but they’ll do it. [laughs] As long as that business doesn’t go to the Inferentia chip, they’re pretty happy with that.
Paul Teich 47:47
So the landscape of who’s designing chips, building chips, and packaging chips will change. Known good die will come from either the merchant market or a contract manufacturer. And it’s too early to call out percentages. But you have to imagine, with AWS pointing the way with Graviton and Inferentia, that an increasing part of their mix for compute power is going to be in-house designed, contract-manufactured silicon.
Simon Erickson 48:26
And what’s your take on the question you just posed, about proprietary ASICs versus merchant silicon that’s commercially available? For the big guys like Amazon, is this a 30/70 split of in-house versus merchant? Or is it 50/50? Where are they spending their time and their money?
Paul Teich 48:45
Take memory and storage out of it for now?
Simon Erickson 48:47
Paul Teich 48:47
Okay. Those are really specialized things. But for compute power for IaaS, I think if you look at what it takes to design for Inferentia, AWS had to stand up the same kind of development environment that Nvidia has, that Xilinx has, in order to enable developers to access Inferentia. That’s a big effort. Graviton, less so. It’s a standard instruction set; it’s ARM. You know, Linux supports ARM. They’ve taken some time to port their databases, so a lot of their PaaS databases now run on Graviton.
Paul Teich 49:32
And so, running specialized silicon internally makes a lot of sense. Exposing it to IaaS and PaaS customers takes a whole different level of commitment. So once you have the cheap chips, once you’re buying them in volume to serve your search, to serve your grammar checker, then you make the decision: “okay, these boards are cheap now. Do I want to spend the money on software to go enable them for IaaS development?” And that’s a different business question. I think increasingly the answer will be, “yeah, why don’t we do that.” For processors, definitely. For accelerators, it depends on how general purpose that silicon is. You’ve got to imagine Inferentia has some promise for apps that are not voice, and that Amazon has worked very hard on certain classes of models, for image recognition and so on, so that they run on Inferentia now that it’s been built.
Simon Erickson 50:39
So Inferentia’s not just a proprietary, behind the scenes and behind the walls of Amazon project. This could be something that opens up, because of all the work that they put into it.
Paul Teich 50:48
That’s their intent. You can go rent an Inferentia instance type. And the other thing is drivers. Right now, all of the enterprise drivers, with the exception of an AMD virtual desktop SKU at Azure, run on Xeon. So all the GPUs, even Inferentia, run with an Intel processor paired with them, because of stable drivers. They want a stable platform, a single platform for driving development. Where we’ll see AMD being really successful, and this is also true within AWS for Graviton, is when you start seeing the first dedicated accelerator instances where they’ve paired an EPYC with an Nvidia chip, or even an EPYC with an AMD GPU, for one of those general-purpose compute types. Because right now, that doesn’t happen. Same with Graviton: when you see Graviton paired with Inferentia in AWS’ public cloud, as a rentable instance type, it means they’ve done a lot of development work on those drivers.
Simon Erickson 52:04
Paul, my final question for you: we’ve seen so many acquisitions, and we’ve been talking about them throughout this entire conversation. But just to put a few numbers behind them: Nvidia bought Mellanox, which you were talking about for networking chips, earlier this year in a $7 billion deal. We just saw them go out and buy ARM Holdings, the majority of ARM Holdings, for $40 billion. We see Analog Devices buying Maxim Integrated Products. We haven’t even talked about that one; that was a quiet $21 billion deal. And then AMD going out and buying Xilinx, a company you and I have talked about for years, for $35 billion.
Simon Erickson 52:42
You can take this in so many different directions, Paul. But maybe the question I ask you is “What was your favorite acquisition of all of those that were announced in 2020?” Who got the best deal? Maybe even price tag aside, what are you most excited about in terms of these semiconductor acquisitions that took place this year?
Paul Teich 53:01
Aside from my previous history with AMD (I worked there for 20 years), I’m going to have to call the AMD/Xilinx deal. I think it potentially has the biggest impact on my central interest, which is the cloud data center. Some of the other stuff, automotive and IoT, has some good plays, but I think the Xilinx purchase is AMD acquiring a mature software organization at its core. It’s not about the FPGAs. The Smart NICs are also a good play. But it’s really about the fact that Xilinx understands the businesses that AMD wants to be in. Their software development teams could use AMD’s hardware resources, those products, and be really successful. Other people want to go take down Intel explicitly. For AMD, I think this is kind of table stakes to remain a player in the core data center. They’ve got a really good product with EPYC, EPYC Gen 2, Gen 3. From all I’ve heard, it looks pretty solid. But they have to grow as an organization to stay in the game, and I think this keeps that rolling.
Simon Erickson 54:25
It gets them closer to developers. Xilinx makes very customizable, reprogrammable chips. This kind of gives AMD a relationship with the developers who are using those products.
Paul Teich 54:37
Absolutely. And it gives them that whole experienced team to go create a GPU ecosystem and, more importantly, a deep learning ecosystem, to go combat Nvidia and those in-house designs. They’ve got to compete there too.
Simon Erickson 54:55
And back to the Nvidia/ARM acquisition, too. Again, $40 billion. That’s a huge acquisition, and I’ve seen a lot of headlines. I’d love to hear your perspective on this one, Paul. Is this all about the IP? The relationships ARM had with the cloud? The licensing model ARM had? Or what is it about ARM Holdings that Nvidia is so interested in?
Paul Teich 55:16
From my particular viewpoint, low power design. Nvidia’s had real trouble below 30 watts, historically. They’re great at designing high performance parts. I mean, no two ways about it. That’s their forte. They’re great at that ecosystem enablement. But they haven’t been really good at lowering that cost per transaction.
Paul Teich 55:44
So what we’re seeing with the A100 (and somebody is going to give me a call tomorrow about this, I know [laughs]) is kind of a mainframe story. Okay. If you keep an A100 fed and happy, if you keep it 100% utilized, the cost per inference goes way down. But that’s IBM’s story for a Linux mainframe. And some people do keep it fed. But it’s really a training part. For most folks, if you cannot virtualize that A100 satisfactorily, it’s really not going to lower the cost per transaction enough. So anybody who puts this in a public cloud (AWS, GCP, and you’ve got to imagine Azure and Alibaba are not far behind) has got to figure out how to slice and dice an A100 to attract folks to do inferencing on it, and how to keep it fed well. Or else they’re going to start pushing their customers, via pricing and availability, to other products. So anyhow, that was probably more than you asked for…
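The “keep it fed” argument is simple division: a rented accelerator costs the same per hour whether it is busy or idle, so cost per inference rises as utilization falls. The sketch below assumes a hypothetical hourly price and peak throughput (neither is a real A100 figure); slicing one GPU among several tenants, as Paul describes, is one way a cloud can push utilization up.

```python
# Sketch: cost per inference vs. utilization for a rented accelerator.
# HOURLY_COST and PEAK_PER_SECOND are hypothetical, not real A100 figures.

def cost_per_inference(hourly_cost: float, peak_per_second: float,
                       utilization: float) -> float:
    """Hourly cost divided by the inferences actually served that hour."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    served_per_hour = peak_per_second * 3600 * utilization
    return hourly_cost / served_per_hour

HOURLY_COST = 3.60       # hypothetical $ per accelerator-hour
PEAK_PER_SECOND = 1_000  # hypothetical inferences/second at full load

for utilization in (1.0, 0.5, 0.1):
    cost = cost_per_inference(HOURLY_COST, PEAK_PER_SECOND, utilization)
    per_million = cost * 1_000_000
    print(f"{utilization:.0%} utilized -> ${per_million:.2f} per million inferences")
# Roughly $1.00, $2.00, and $10.00 per million inferences.
```

With these assumed figures, an accelerator that sits 90% idle costs ten times as much per inference as one kept fully busy, which is the cost-per-transaction problem Paul raises for inferencing on a training-class part.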
Simon Erickson 57:01
I was actually going to ask you another question about it! I’m wanting even more! It’s still all about the opex, though? It’s about presenting options to the cloud providers, so they can get the costs down, in terms of watts, depending on whatever it is they want to use those processors for.
Paul Teich 57:16
And I buy Jensen’s point that he’s buying a customer list, right? But with that customer list, he’s got expectations. So the whole of Nvidia is going to have to figure out how to serve ARM’s customers well, and to do that they’ve got to leverage ARM’s low-power design capabilities. And this is where ARM hasn’t been effective: designing a general-purpose GPU that can be used in general applications. So this is another marriage that could have an impact. But Nvidia is already the market leader in their segment, and ARM is already the front runner if you’re not AMD or Intel. I think it’s big, but less transformative than AMD and Xilinx.
Simon Erickson 58:03
Great. And how about Intel? Is Intel keeping up with all the developments in AI? We talked a lot about AMD. We talked a lot about Nvidia. How is Intel reacting to its competitors moving so quickly?
Paul Teich 58:13
Simon Erickson 58:15
That’s another two-hour conversation I just set you up with. [laughs]
Paul Teich 58:18
It’s a tough question. Because, you know, they’ve got FPGAs, but they’re caught in the middle between GPUs and specialty silicon: Habana, and before that other companies, right? Nervana. There’s a whole litany of companies that Intel has bought.
Paul Teich 58:32
And then there are the CPU design folks, who are putting in tensor cores, essentially. They’re putting in AI acceleration, deep learning acceleration. And so Intel’s challenge is that it’s always CPU first. So they’ve designed a bit of software, this Intel oneAPI, or whatever they’re calling it, that is unified: “you give us code and we’ll optimally port it to our processors, or to an FPGA, or to our GPUs when that happens, or to our Habana silicon.” Right?
Paul Teich 59:04
And so, you write to this high-level code. Fundamentally, I think that group should spin out and do the whole write-once, run-on-anything thing, right? If they can do that code, that’s brilliant, and it’s got a whole business model all by itself.
Paul Teich 59:23
But Intel is a silicon design and manufacturing house. They are processor-first, and that’s the thing that bites them every time. They have care and feeding and market funds for the other groups, and they constantly have internal struggles about whom to put forward and whom not to put forward. At some point they have to grow, just like Nvidia needs to grow past GPUs into an AI company: “We don’t care if it runs graphics.” Intel’s got to evolve to a point where they’re just like, “okay, we don’t care what silicon we’re selling you. We need to sell you appropriate silicon for your app.”
Simon Erickson 1:00:05
Yeah, great insight here, Paul.
Simon Erickson 1:00:07
You can tell — for anyone watching this conversation — why I refer to Paul Teich as “the smartest guy in the room” when it comes to cloud computing and when it comes to the semiconductor industry. He’s got 40 years of IT experience and 12 patents to his name. He’s a principal analyst out at Liftr Insights in Austin, Texas. Hey Paul, I had a lot of fun. Thanks for chatting with me and 7investing today.
Paul Teich 1:00:26
Thanks for having me, Simon. It was a lot of fun.
Simon Erickson 1:00:28
And thanks to everybody for tuning in. We had a good time. We are here to empower you to invest in your future. We are 7investing!