This article is available as a video essay on YouTube
In the days after SpaceX’s awe-inspiring Starship launch-and-catch — watch the first eight minutes of this video if you haven’t yet — there was another older video floating around on X, this time of Richard Bowles, a former executive at Arianespace, the European rocket company. The event was the Singapore Satellite Industry Forum, and the year was 2013:
This morning, SpaceX came along and said, “We foresee a launch costing $7 million”. Well, ok, let’s ignore the 7, let’s say $15 million…at $15 million every operator would change their gameplan completely. Every supplier would change their gameplan completely. We wouldn’t be building satellites exactly as we are today, so a lot of these questions I think it might be interesting to go on that and say, “Where do you see your companies if you’re going to compete with a $15 million launch program.” So Richard, where do you see your company competing with a $15 million launch?…
RB: SpaceX is an interesting phenomenon. We saw it, and you just mentioned it, I thought it was $5 million or $7 million…
Why don’t you take Arianespace instead of SpaceX first. Where would you compete with a $15 million launch?
RB: I’ve got to talk about what I’m competing with, because that then predicates exactly how we will compete when we analyze what we are competing with. Obviously we like to analyze the competition.
So today, SpaceX hasn’t launched into the geosynchronous orbit yet, they’re doing very well, their progress is going forward amazingly well, but what I’m discovering in the market is that SpaceX primarily seems to be selling a dream, which is good, we should all dream, but I think a $5 million launch, or a $15 million launch, is a bit of a dream. Personally I think reusability is a dream. Recently I was at a session where I was told that there was no recovery plan because they’re not going to have any failures, so I think that’s a part of the dream.
So at the moment, I feel that we’re looking, and you’re presenting to me, how am I going to respond to a dream? My answer to respond to a dream is that first of all, you don’t wake people up, they have to wake up on their own, and then once the market has woken up to the dream and the reality, then we’ll compete with that.
But they are looking at a price which is about half yours today.
RB: It’s a dream.
Alright. Suppose that you wake up and they’re there, what would you Arianespace do.
RB: We would have to react to it. They’re not supermen, so whatever they can do we can do. We would then have to follow. But today, at the moment…it is a theoretical question at this moment in time.
I personally don’t believe it’s going to be theoretical for that much longer. They’ve done everything almost they said they would do. That’s true.
The moderator ended up winning the day; in 2020 Elon Musk said on a podcast that the “best case” for Falcon 9 launches was indeed $15 million (i.e. most cost more, but that price point had been achieved). Of course customers pay a lot more: SpaceX charges a retail price of $67 million per launch, in part because it has no competition; Arianespace retired the Ariane 5 rocket, which had a retail launch price of $178 million, in 2023. Ariane 6 had its first launch this year, but it’s not price competitive, in part because it’s not reusable. From Politico:
The idea of copying SpaceX and making Ariane partly reusable was considered and rejected. That decision haunts France’s Economy Minister Bruno Le Maire. “In 2014 there was a fork in the road, and we didn’t take the right path,” Le Maire said in 2020.
But just because it works for Elon, doesn’t make it good for Europe. Once it’s up and running, Ariane 6 should have nine launches a year — of which around four will be for institutional missions, like government reconnaissance satellites and earth observation systems. The rest will be targeted at commercial clients.
Compare that to SpaceX. Fed by a steady stream of Pentagon and industry contracts, in addition to missions for its own Starlink satellite constellation, Musk’s company carried out a record 96 launches in 2023.
“It wasn’t that we just said reusability is bullshit,” said [former head of the European Space Agency Jan] Wörner of the early talks around Ariane 6 in the mid-2010s, and the consideration of building reusable stages rather than burning through fresh components each mission. “If you have 10 flights per year and you are only building one new launcher per year then from an industrial point of view that’s not going to work.”
Wörner’s statement is like Bowles’s in that it sees the world as static; Bowles couldn’t see ahead to a world where SpaceX actually figured out how to reuse rockets by landing them on drone ships, much less the version 2 example of catching a much larger rocket that we saw this weekend. Wörner, meanwhile, can’t see backwards: the reason why SpaceX has so much more volume, both from external customers and from itself (Starlink), is that it is cheap. Cheapness creates scale, which makes things even cheaper, and the ultimate output is entirely new markets.
The SpaceX Dream
Of course Bowles was right in another way: SpaceX is a dream. It’s a dream of going to Mars, and beyond, of extending humanity’s reach beyond our home planet; Arianespace is just a business. That, though, has been their undoing. A business carefully evaluates options, and doesn’t necessarily choose the highest upside one, but rather the one with the largest expected value, a calculation that incorporates the likelihood of success — and even then most find it prudent to hedge, or build in option value.
A dreamer, though, starts with success, and works backwards. In this case, Musk explained the motivation for driving down launch costs on X:
First off, this made it imperative that SpaceX find a way to launch a massively larger rocket that is fully recoverable, and doesn’t include the weight and logistics costs of the previous approach (this weekend SpaceX caught the Super Heavy booster; the next step is catching the Starship spacecraft that sits above it). Once SpaceX can launch massively larger rockets cheaply, though, it can start to do other things, like dramatically expand Starlink capability.
The next generation Starlink satellites, which are so big that only Starship can launch them, will allow for a 10X increase in bandwidth and, with the reduced altitude, faster latency https://t.co/HLYdjjia3o
— Elon Musk (@elonmusk) October 14, 2024
Starlink won’t be the only beneficiary; the Singapore moderator had it right back in 2013: everyone will change their gameplan completely, which will mean more business for SpaceX, which will only make things cheaper, which will mean even more business. Indeed, there is a window to rocketports that don’t have anything to do with Mars, but simply facilitate drastically faster transportation here on planet earth. The transformative possibilities of scale — and the dramatic decrease in price that follows — are both real and hard to imagine.
Tesla’s Robotaxi Presentation
The Starship triumph wasn’t the only Musk-related story of the week: last Thursday Tesla held its We, Robot event where it promised to unveil its Robotaxi, and observers were considerably less impressed. From Bloomberg:
Elon Musk unveiled Tesla Inc.’s highly anticipated self-driving taxi at a flashy event that was light on specifics, sending its stock sliding as investors questioned how the carmaker will achieve its ambitious goals. The chief executive officer showed off prototypes of a slick two-door sedan called the Cybercab late Thursday, along with a van concept and an updated version of Tesla’s humanoid robot. The robotaxi — which has no steering wheel or pedals — could cost less than $30,000 and “probably” will go into production in 2026, Musk said.
The product launch, held on a movie studio lot near Los Angeles, didn’t address how Tesla will make the leap from selling advanced driver-assistance features to fully autonomous vehicles. Musk’s presentation lacked technical details and glossed over topics including regulation or whether the company will own and operate its own fleet of Cybercabs. As Jefferies analysts put it, Tesla’s robotaxi appears “toothless.”
The underwhelming event sent Tesla’s shares tumbling as much as 10% Friday in New York, the biggest intraday decline in more than two months. They were down 7.6% at 12:29 p.m., wiping out $58 billion in market value. The stock had soared almost 70% since mid-April, largely in anticipation of the event. Uber Technologies Inc. and Lyft Inc., competing ride-hailing companies whose investors had been nervously awaiting the Cybercab’s debut, each surged as much as 11% Friday. Uber’s stock hit an all-time high.
Tesla has a track record of blowing past timelines Musk has offered for all manner of future products, and has had a particularly difficult time following through on his self-driving forecasts. The CEO told investors in 2019 that Tesla would have more than 1 million robotaxis on the road by the following year. The company hasn’t deployed a single autonomous vehicle in the years since.
First off, the shockingly short presentation (22:44 from start to “Let’s get the party started”) was indeed devoid of any details about the Robotaxi business case. Secondly, all of the criticisms of Musk’s mistaken predictions about self-driving are absolutely true. Moreover, the fact of the matter is that Tesla is now far behind the current state-of-the-art, Waymo, which is in operation in four U.S. cities and about to start up in two more. Waymo has achieved Level 4 automation, while Tesla’s cars are stuck at Level 2. To review the levels of automation:
- Level 0: Limited features that provide warnings and momentary assistance (e.g. automatic emergency braking)
- Level 1: Steering or brake/acceleration automation (e.g. cruise control or lane centering)
- Level 2: Steering and brake/acceleration control, which must be constantly supervised (i.e. hands-on-wheel)
- Level 3: Self-driving that only operates under pre-defined conditions, and in which the driver must take control immediately when requested
- Level 4: Self-driving that only operates under pre-defined conditions, under which the driver is not expected to take control
- Level 5: Self-driving under all conditions, with no expectation of driver control
Waymo has two big advantages relative to Tesla: first, its cars have a dramatically more expansive sensor suite, including camera, radar, and LiDAR; the last of these is the most accurate way to measure depth, which is particularly tricky for cameras and fairly imprecise for radar. Second, any Waymo car can be taken over by a remote driver any time it encounters a problem. This doesn’t happen often (once every 17,311 miles in sunny California last year), but it is comforting to know that there is a fallback.
The challenge is that both of these advantages cost money: LiDAR is the biggest reason why the Generation 5 Waymos on the streets of San Francisco cost a reported $200,000; Generation 6 has fewer sensors and should be considerably cheaper, and prices will come down as Waymo scales, but this is still a barrier. Humans in data centers, meanwhile, sitting poised to take over a car that encounters trouble, are not just a cost center but also a limit on scalability. Then again, higher cost structures are their own limitation on scalability; Waymos are awesome but they will need to get an order of magnitude cheaper to change the world.
The Autonomy Dream
What was notable about Musk’s Tesla presentation is what it actually did include. Start with that last point; Musk’s focus was on that changing-the-world bit:
You see a lot of sci-fi movies where the future is dark and dismal. It’s not a future you want to be in. I love Blade Runner, but I don’t know if we want that future. I think we want that duster he’s wearing, but not the bleak apocalypse. We want to have a fun, exciting future that if you could look in a crystal ball and see the future, you’d be like “Yes, I wish that I could be there now”. That’s what we want.
Musk proceeded to talk about having a lounge on wheels that gave you your time back and was safer to boot, and which didn’t need ugly parking lots; the keynote slides added parks to LAX and SoFi and Dodger Stadiums:
One of the things that is really interesting is how will this affect the cities that we live in. When you drive around a city, or the car drives you around the city, you see that there’s a lot of parking lots. There’s parking lots everywhere. There are parking garages. So what would happen if you have an autonomous world is that you can now turn parking lots into parks…there’s a lot of opportunity to create greenspace in the cities that we live in.
This is certainly an attractive vision; it’s also far beyond the world of Uber and Lyft or even Waymo, which are focused on solving the world as it actually exists today. That means dealing with human drivers, which means there will be parking lots for a long time to come. Musk’s vision is a dream.
What, though, would that dream require, if it were to come true? Musk said it himself: full autonomy provided by a fleet of low cost vehicles that make it silly — or prohibitively expensive, thanks to sky-rocketing insurance — for anyone to drive themselves. That isn’t Level 4, like Waymo, it’s Level 5, and, just as importantly, it’s cheap, because cheap drives scale and scale drives change.
Tesla’s strategy for “cheap” is well-known: the company eschews LiDAR, and removed radar from new models a few years ago, claiming that it would accomplish its goals using cameras alone. Setting aside the viability of this claim, the connection to the dream is clear: a cameras-only approach enables the low cost vehicles integral to Musk’s dream. Yes, Waymo equipment costs will come down with scale, but Waymo’s current approach is both safer in the present and also more limited in bringing about the future.
What many folks seemed to miss in Musk’s presentation was his explanation as to how Tesla — and only Tesla — might get there.
The Bitter Lesson
Rich Sutton wrote one of the most important and provocative articles about AI in 2019; it’s called The Bitter Lesson:
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
The examples Sutton goes over include chess, where search beat deterministic programming, and Go, where unsupervised learning did the same. In both cases bringing massive amounts of compute to bear was both simpler and more effective than humans trying to encode their own shortcuts and heuristics. The same thing happened with speech recognition and computer vision: deep learning massively outperforms any sort of human-guided algorithms. Sutton notes towards the end:
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
It’s a brilliant observation, to which I might humbly add one additional component: while the Bitter Lesson is predicated on there being an ever-increasing amount of compute, which reliably solves once-intractable problems, one of the lessons of LLMs is that you also need an ever-increasing amount of data. Existing models are already trained on all of the data AI labs can get their hands on, including most of the Internet, YouTube transcripts, scanned books, etc.; there is much talk about creating synthetic data, both from humans and from other LLMs, to ensure that scaling laws continue. The alternative is that we hit the so-called “data wall”.
LLMs, meanwhile, are commonly thought about in terms of language — it is in the name, after all — but what they actually predict are tokens, and tokens can be anything, including driving data. Timothy Lee explained some of Waymo’s research in this area at Understanding AI:
Any self-driving system needs an ability to predict the actions of other vehicles. For example, consider this driving scene I borrowed from a Waymo research paper:
Vehicle A wants to turn left, but it needs to do it without running into cars B or D. There are a number of plausible ways for this scene to unfold. Maybe B will slow down and let A turn. Maybe B will proceed, D will slow down, and A will squeeze in between them. Maybe A will wait for both vehicles to pass before making the turn. A’s actions depend on what B and D do, and C’s actions, in turn, depend on what A does.
If you are driving any of these four vehicles, you need to be able to predict where the other vehicles are likely to be one, two, and three seconds from now. Doing this is the job of the prediction module of a self-driving stack. Its goal is to output a series of predictions that look like this:
Researchers at Waymo and elsewhere struggled to model interactions like this in a realistic way. It’s not just that each individual vehicle is affected by a complex set of factors that are difficult to translate into computer code. Each vehicle’s actions depend on the actions of other vehicles. So as the number of cars increases, the computational complexity of the problem grows exponentially.
But then Waymo discovered that transformer-based networks were a good way to solve this kind of problem.
“In driving scenarios, road users may be likened to participants in a constant dialogue, continuously exchanging a dynamic series of actions and reactions mirroring the fluidity of communication,” Waymo researchers wrote in a 2023 research paper.
Just as a language model outputs a series of tokens representing text, Waymo’s vehicle prediction model outputs a series of tokens representing vehicle trajectories—things like “maintain speed and direction,” “turn 5 degrees left,” or “slow down by 3 mph”.
Rather than trying to explicitly formulate a series of rules for vehicles to follow (like “stay in your lane” and “don’t hit other vehicles”), Waymo trained the model like an LLM. The model learned the rules of driving by trying to predict the trajectories of human-driven vehicles on real roads.
This data-driven approach allowed the model to learn subtleties of vehicle interactions that are not described in any driver manual and would be hard to capture with explicit computer code.
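To make the token analogy concrete, here is a minimal sketch of the idea in Python. It is not Waymo's model: the token vocabulary, the thresholds, the heading convention (degrees, decreasing when the car turns left), and the logged drives are all hypothetical, and a simple bigram counter stands in for the transformer. What it shows is the shape of the approach: motion is discretized into tokens, and the "rules" are learned by predicting the next token from logged human driving rather than written by hand.

```python
from collections import Counter, defaultdict

def tokenize(states):
    """Turn a list of (speed_mph, heading_deg) samples into discrete action tokens.

    Thresholds and token names are illustrative assumptions.
    """
    tokens = []
    for (s0, h0), (s1, h1) in zip(states, states[1:]):
        ds, dh = s1 - s0, h1 - h0
        speed = "slow" if ds < -1 else "accel" if ds > 1 else "hold"
        turn = "left" if dh < -2 else "right" if dh > 2 else "straight"
        tokens.append(f"{speed}/{turn}")
    return tokens

def train_bigram(token_sequences):
    """Count which token follows which; a stand-in for a learned sequence model."""
    counts = defaultdict(Counter)
    for seq in token_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequently observed next token, or None if prev was never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

# Two hypothetical logged left turns: brake, then steer left, then straighten out.
drive_a = [(30, 0), (30, 0), (27, 0), (24, -5), (24, -10), (24, -10)]
drive_b = [(25, 0), (22, 0), (19, -4), (19, -8), (19, -8)]

model = train_bigram([tokenize(drive_a), tokenize(drive_b)])
next_token = predict_next(model, "slow/straight")
```

Nothing in the code says "brake before turning"; the model picks up that pattern because both logged drives contain it. A real system would condition on far more context (other vehicles, the map, a longer history), which is where the transformer comes in.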
This is not yet a panacea. Lee notes later in his article:
One big problem Sinavski noted is that Wayve hasn’t found a vision-language model that’s “really good at spatial reasoning.” If you’re a long-time reader of Understanding AI, you might remember when I asked leading LLMs to tell the time from an analog clock or solve a maze. ChatGPT, Claude, and Gemini all failed because today’s foundation models are not good at thinking geometrically.
This seems like it would be a big downside for a model that’s supposed to drive a car. And I suspect it’s why Waymo’s perception system isn’t just one big network. Waymo still uses traditional computer code to divide the driving scene up into discrete objects and compute a numerical bounding box for each one. This kind of pre-processing gives the prediction network a head start as it reasons about what will happen next.
Another concern is that the opaque internals of LLMs make them difficult to debug. If a self-driving system makes a mistake, engineers want to be able to look under the hood and figure out what happened. That’s much easier to do in a system like Waymo’s, where some of the basic data structures (like the list of scene elements and their bounding boxes) were designed by human engineers.
But the broader point here is that self-driving companies do not face a binary choice between hand-crafted code or one big end-to-end network. The optimal self-driving architecture is likely to be a mix of different approaches. Companies will need to learn the best division of labor from trial and error.
That sounds right, but for one thing: The Bitter Lesson. To go back to Sutton:
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
If The Bitter Lesson ends up applying to true Level 5 autonomy, then Waymo is already signed up for school. A “mix of different approaches” clearly works better now, and may for the next few years, but does it get them to Level 5? And what of the data, to the extent it is essential to The Sweet Outcome of self-taught AI? This was the part of the Tesla presentation I referenced above:
One of the reasons why the computer can be so much better than a person is that we have millions of cars that are training on driving. It’s like living millions of lives simultaneously and seeing very unusual situations that a person in their entire lifetime would not see, hopefully. With that amount of training data, it’s obviously going to be much better than what a human could be, because you can’t live a million lives. It can also see in all directions simultaneously, and it doesn’t get tired or text or any of those things, so it will naturally be 10x, 20x, 30x safer than a human for all those reasons.
I want to emphasize that the solution that we have is AI and vision. There’s no expensive equipment needed. The Model 3 and Model Y and S and X that we make today will be capable of full autonomy unsupervised. And that means that our costs of producing the vehicle is low.
Again, Musk has been over-promising and under-delivering in terms of self-driving for existing Tesla owners for years now, so the jury is very much out on whether current cars get full unsupervised autonomy. But that doesn’t change the fact that those cars do have cameras, and those cameras are capturing data and doing fine-tuning right now, at a scale that Waymo has no way of matching. This is what I think Andrej Karpathy, the former Tesla Autopilot head, was referring to in his recent appearance on the No Priors podcast:
I think people think that Waymo is ahead of Tesla, I think personally Tesla is ahead of Waymo, and I know it doesn’t look like that, but I’m still very bullish on Tesla and its self-driving program. I think that Tesla has a software problem, and I think Waymo has a hardware problem, is the way I put it, and I think software problems are much easier. Tesla has deployment of all these cars on earth at scale, and I think Waymo needs to get there. The moment Tesla gets to the point where they can actually deploy and it actually works I think is going to be really incredible…
I’m not sure that people are appreciating that Tesla actually does use a lot of expensive sensors, they just do it at training time. So there are a bunch of cars that drive around with LiDARS, they do a bunch of stuff that doesn’t scale and they have extra sensors etc., and they do mapping and all this stuff. You’re doing it at training time and then you’re distilling that into a test-time package that is deployed to the cars and is vision only. It’s like an arbitrage on sensors and expense. And so I think it’s actually kind of a brilliant strategy that I don’t think is fully appreciated, and I think is going to work out well, because the pixels have the information, and I think the network will be capable of doing that, and yes at training time I think these sensors are really useful, but I don’t think they are as useful at test time.
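The sensor arbitrage Karpathy describes can be illustrated with a deliberately tiny sketch. This is not Tesla's pipeline: the data is made up, and the "network" is a one-parameter model that assumes depth is inversely proportional to an object's apparent height in pixels. The point is only the structure: LiDAR supplies the labels during training, and the deployed estimator needs nothing but the camera measurement.

```python
# Toy illustration only: hypothetical data and a one-parameter model,
# not any company's actual training pipeline.

def fit_depth_scale(pixel_heights, lidar_depths):
    """Least-squares fit of depth ~= k / pixel_height, using LiDAR depths as labels."""
    xs = [1.0 / h for h in pixel_heights]
    num = sum(x * y for x, y in zip(xs, lidar_depths))
    den = sum(x * x for x in xs)
    return num / den

def camera_only_depth(k, pixel_height):
    """Test-time estimate: uses only the camera measurement and the learned scale."""
    return k / pixel_height

# Training time: a small instrumented fleet records both camera and LiDAR readings.
train_pixel_heights = [150.0, 75.0, 50.0, 30.0]  # apparent object height in pixels
train_lidar_depths = [10.0, 20.0, 30.0, 50.0]    # LiDAR-measured distance in meters

k = fit_depth_scale(train_pixel_heights, train_lidar_depths)

# Test time: a production car with cameras only sees an object 60 pixels tall.
estimate = camera_only_depth(k, 60.0)
```

With these made-up numbers the camera-only estimate comes out at roughly 25 meters, without a LiDAR unit on the deployed car. Whether real-world pixels carry enough information for a network to do this reliably in every condition is exactly the open question.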
Do note that Karpathy — who worked at Tesla for five years — is hardly a neutral observer, and also note that he forecasts a fully neural net approach to driving as taking ten years; that’s hardly next year, as Musk promised. That end goal, though, is Level 5, with low cost sensors and thus low cost cars, the key ingredient of realizing the dream of full autonomy and the transformation that would follow.
The Cost of Dreams
I don’t, for the record, know if the Tesla approach is going to work; my experience with both Waymo and Tesla definitely makes clear that Waymo is ahead right now (and the disengagement numbers for Tesla are multiple orders of magnitude worse). Most experts assume that LiDAR sensors in particular are non-negotiable.
The Tesla bet, though, is that Waymo’s approach ultimately doesn’t scale and isn’t generalizable to true Level 5, while starting with the dream — true autonomy — leads Tesla down a better path of relying on nothing but AI, fueled by data and fine-tuning that you can only do if you already have millions of cars on the road. That is the connection to SpaceX and what happened this weekend: if you start with the dream, then understand the cost structure necessary to achieve that dream, you force yourself down the only path possible, forgoing easier solutions that don’t scale for fantastical ones that do.