I stumbled upon a post of mine written a year ago, right after the loud DeepSeek R1 release. Today I would add something about a mission worth fighting for and about research taste. Otherwise it still reads fresh.

Deep expertise in GPU/TPU programming.

Sharded matmuls and communication collectives are the most important building blocks of modern AI systems. Not many engineers have years of experience using them, pushing the corresponding hardware to its limits, and finding ways to do more computation with less hardware.

The training/inference codebases are quite unusual: there isn’t a lot of code, but the code is hard to understand if you’re not an expert in the area, and it’s impossible to optimize unless you know a lot of math and hardware details.

Grown, not built.

It’s not always possible to gather a lot of talented people in one room and hope they will jell into a cohesive organization. What definitely works is finding 3 to 7 founding members who can work together like the Three Musketeers, letting them jell, and then slowly adding more people who match the culture of the “Musketeers”.

Founder-led.

Good luck reaching any meaningful results when the company is led by a former VP who plans to jump ship in the next 2-3 years to supercharge their career. They will think about the company’s output in terms of the breathtaking presentations they will give, and they won’t worry about the whole company failing, because it’s failing forward: they get a lot of important learnings along the way and are ready to tackle a bigger role afterwards.

Long-term vs short-term.

While it’s very important to be able to hit the intermediate checkpoints, like matching the performance of selected competitors and reproducing important known results in-house, the group should have a common long-term goal unrelated to the ever-changing hype.

Look at Demis Hassabis. “Solve intelligence, then solve everything else”. Despite the hype and anxiety in the area, his direction has stayed the same for more than 15 years. Some projects need a tight group of people working for 5 years. Deep work. Reading and thinking in the dead of night, when everyone is asleep.

Grow a magical garden vs grow a magical tree.

There is a difference between growing a team that is able to build novel tech and products and building an org that can quickly replicate some successful product.

Successful models and products are just the results of an ability to fathom great depths of science, engineering, and teamwork.