
Why I Left Netflix to Join a 3 Month Old Startup

Mar 21, 2022 Sam Redai

Over the past few months, I’ve been asked about this often enough that it felt like a fitting topic for the first post on this blog. Let me start by saying that I loved working at Netflix! Nothing in life is perfect, but my time at Netflix was a period of immense professional and personal growth. It felt like the first opportunity in my career where I could test the limits of my technical skill set. The culture of personal responsibility taught me how to own my failures and how to understand my successes. I learned a tremendous amount from my colleagues, and many of my professional relationships have grown into incredible friendships. Much of this fuels the question many people ask me: “Why did you leave Netflix to join Tabular, a brand new company?”

“Migration Paranoia”

When Jason Reid hired me at Netflix, I was the first engineer on a team tasked with migrating Netflix’s entire Data Science and Engineering organization from Python 2 to Python 3. It wasn’t exactly like Instagram’s well-known 2-to-3 migration, but we were looking at many of the same challenges: thousands of applications, libraries, and scripts, touching many critical components of the analytics organization. As any data professional will tell you, safely running big data pipelines while migrating them to new code is never easy. I remember having the occasional nightmare where, while testing a migration, I’d break some production pipeline and ruin some data-powered Netflix feature like top 10, the personalized recommendation engine, or I don’t know, some super important DVD-rental dataset! I can’t remember who came up with the name for this condition, but pretty soon we were referring to it as “Migration Paranoia”.

The root of this paranoia was that safely performing a test run of someone else’s application involved many manual steps that could go wrong: creating a test table, scouring the scripts and imported libraries for hard-coded table references, and making sure I didn’t delete some poor engineer’s entire production table by missing a DROP TABLE statement. Not to mention all of the side effects caused by external systems like orchestration engines, APIs, and signals from upstream pipelines. It doesn’t take much imagination to come up with ways a test data pipeline could harm a production environment. Then, I came across a pipeline that wrote to Iceberg tables.

Apache Iceberg: A Powerful Table Format

Iceberg is an open source table format created at Netflix by Ryan Blue and Dan Weeks. Its adoption at Netflix is strong and, at the time, it was on track to become the single table format used across the entire Netflix data warehouse. A ton of features have helped it become an extremely popular format in the industry. Instead of overloading a metastore with table and partition information, it includes rich table metadata that’s stored with the actual data. One of the features this enables is Iceberg’s snapshot mechanism. Each change to a table produces a new snapshot, and these snapshots collectively form a linear history of the table’s evolution. This is often described as analogous to how commits work in Git, allowing you to time-travel instantly to previous snapshots, as you would to previous commits in Git.

It was the ability to write to a table in a way that produces an unpublished snapshot that soon became indispensable to the Python migration effort. An unpublished snapshot serves as a queryable product of writing directly to a table while leaving the production state of the table completely untouched. To publish or not to publish was determined by a simple Spark setting on the job called ‘spark.wap.id’. What this meant for us is that testing changes to a pipeline was as simple as running the new code in parallel with publishing disabled. The published and unpublished snapshots could then be compared to detect any regressions. A good night’s sleep was earned simply by ensuring that our test runs had publishing disabled! After the Python migration was successfully completed, my team and I went on to build and maintain a number of tools in the data space, many of which leveraged powerful Iceberg features.
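To make the idea concrete, here is a toy model in plain Python of writing an unpublished snapshot and comparing it against the published one. It is only a sketch of the concept, not Iceberg’s or Spark’s API: `ToyTable`, `Snapshot`, and `diff` are hypothetical names invented for illustration, each write is simplified to carry the table’s full contents, and the `wap_id` argument merely stands in for the role ‘spark.wap.id’ played on our jobs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    snapshot_id: int
    rows: list
    wap_id: Optional[str] = None  # set only for staged (unpublished) writes


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table with snapshot history (not the Iceberg API)."""
    snapshots: dict = field(default_factory=dict)
    current_id: Optional[int] = None  # the snapshot production readers see
    _next_id: int = 1

    def write(self, rows, wap_id=None):
        """Record a new snapshot; publish it only when no wap_id is given."""
        snap = Snapshot(self._next_id, list(rows), wap_id)
        self.snapshots[snap.snapshot_id] = snap
        self._next_id += 1
        if wap_id is None:
            self.current_id = snap.snapshot_id  # normal write: visible immediately
        return snap.snapshot_id

    def read(self, snapshot_id=None):
        """Read the published state, or a specific snapshot by id."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        return self.snapshots[sid].rows


def diff(table, published_id, staged_id):
    """Compare two snapshots row by row to surface regressions."""
    old = set(table.read(published_id))
    new = set(table.read(staged_id))
    return {"missing": sorted(old - new), "unexpected": sorted(new - old)}


table = ToyTable()
prod_id = table.write([("alice", 3), ("bob", 5)])                     # existing job, published
test_id = table.write([("alice", 3), ("bob", 6)], wap_id="py3-test")  # migrated job, staged

published_rows = table.read()           # production readers still see the old data
report = diff(table, prod_id, test_id)  # the regression shows up in the comparison
print(published_rows)
print(report)
```

The point of the sketch is that the staged write never moves `current_id`, so a test run can go through the real write path without production readers ever noticing.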

The Startup

When Dan Weeks, Jason Reid, and Ryan Blue announced that they would be leaving Netflix to start a new company named Tabular, it was huge news. As you can imagine, there weren’t many details early on, but I was excited to hear that the open source Iceberg project would be a core component of the new company. As more details emerged, it became even clearer that Tabular, at its founding, was poised for success. Data infrastructure is hard, and organizations have been trying, with varying success, to create scalable, resilient, and secure infrastructure for their data at rest. The charter for Tabular is to take decades of collective experience solving the most challenging problems at the largest scale and infuse that into an awesome product. Add to that an extraordinary founding team, investment from a venture capital firm with an outstanding reputation, and a design for a product built around a proven technology, and you have a great recipe for success. When Jason reached out with an offer, I immediately knew it was a rare opportunity to be a part of something special.

Although I was a Software Engineer who spent most of my time either coding or reviewing code, I had grown a reputation internally at Netflix as a strong advocate for Iceberg. Joining Tabular is a natural extension of that advocacy, but on a larger and much more impactful level. Instead of advocating for Iceberg adoption to engineers at Netflix, I can advocate for its adoption globally. Furthermore, I believe in the Iceberg community and its ambitious roadmap. It also helps that the synergy between the Iceberg open source project and Tabular that Ryan described in his blog post, Tabular and the Iceberg Community, feels like a healthy, winning formula. And not just healthy for the ASF or Tabular, but for the long list of companies that are fully adopting Iceberg today.

The Road Ahead

Tabular’s goal to build a data platform that’s independent of any particular processing engine or storage provider is an ambitious one that will change the way the entire industry thinks of data infrastructure. That being said, there is a great deal of work, along with unforeseen challenges, on the road ahead. I’m fortunate to be working with a tremendous founding team of engineers and I’m eager to help build what will be a truly innovative product!

-Sam
