My Bets for 2017, by Lloyd Tabb
By Lloyd Tabb
Founder, Chairman & Chief Technology Officer at Looker
December 14, 2016 — Santa Cruz, CA
[Editor’s note: Aside from being Looker’s Founder, Chairman and CTO, Lloyd Tabb is a lifelong engineer, entrepreneur and VC. He founded LiveOp and Mozilla.org, was an early engineer at Netscape (writing the first HTML Composer and first scripting engine), original database and language lead at Borland, and a venture partner at CMEA Capital.]
1) Moore’s Law holds true for databases
Per Moore’s law, CPUs are always getting faster and cheaper. Of late, databases have been following the same pattern.
In 2013, Amazon changed the game when it introduced Redshift, a massively parallel processing (MPP) database that allowed companies to store and analyze all their data for a reasonable price. Since then, however, companies that saw products like Redshift as datastores with effectively limitless capacity have hit a wall. They have hundreds of terabytes or even petabytes of data and are stuck choosing between paying more for the speed they had become accustomed to and waiting five minutes for a query to return.
Enter (or reenter) Moore’s law. Redshift has become the industry standard for cloud MPP databases, and we don’t see that changing anytime soon. That said, our prediction for 2017 is that on-demand MPP databases like Google BigQuery and Snowflake will see a huge uptick in popularity. On-demand databases charge pennies for storage, allowing companies to store data without worrying about cost. When users want to run queries or pull data, the database spins up the hardware it needs and gets the job done in seconds. These databases are fast and scalable, and we expect to see a lot of companies using them in 2017.
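The trade-off between an always-on cluster and an on-demand database can be sketched with some simple arithmetic. The snippet below is only an illustration: the function names are made up for this post, and every price in it is a hypothetical placeholder, not any vendor's actual rate.

```python
# A rough cost sketch. All prices and workload numbers are hypothetical
# placeholders chosen for easy arithmetic, not real vendor rates.


def provisioned_monthly_cost(nodes: int, price_per_node_hour: float,
                             hours: float = 730.0) -> float:
    """Always-on cluster: every node-hour is billed, busy or idle."""
    return nodes * price_per_node_hour * hours


def on_demand_monthly_cost(stored_tb: float,
                           storage_price_per_tb_month: float,
                           queries: int,
                           tb_scanned_per_query: float,
                           price_per_tb_scanned: float) -> float:
    """On-demand: a small storage bill plus compute billed per query."""
    storage = stored_tb * storage_price_per_tb_month
    compute = queries * tb_scanned_per_query * price_per_tb_scanned
    return storage + compute


if __name__ == "__main__":
    # Hypothetical workload: 10 TB stored, 100 queries a month.
    always_on = provisioned_monthly_cost(nodes=2, price_per_node_hour=1.0)
    pay_per_query = on_demand_monthly_cost(
        stored_tb=10, storage_price_per_tb_month=20.0,
        queries=100, tb_scanned_per_query=0.5, price_per_tb_scanned=5.0)
    print(f"always-on: ${always_on:,.2f}  on-demand: ${pay_per_query:,.2f}")
```

Under these made-up numbers the on-demand bill is dominated by how much data the queries scan, so the comparison flips as query volume grows. Plug in your own rates before drawing any conclusion.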
2) SQL will have another extraordinary year
SQL has been around for decades, but from the late 1990s to the mid-2000s, it went out of style as people started exploring NoSQL and Hadoop alternatives. SQL, however, has come back with a vengeance. The renaissance of SQL has been beautiful to behold, and I don’t even think it’s near its peak yet.
The innovations we’re seeing are blowing our minds. With BigQuery, Google has created a product that is essentially infinitely scalable (the original goal of Hadoop) AND practical for analytics (the original goal of relational databases).
SQL engines for Hadoop have continued to gain traction. Products like SparkSQL and Presto are popping up in enterprises and as cloud services because they allow companies to leverage their existing Hadoop clusters and cloud storage for speedy analytics. What’s not to love?
To top it all off, companies like Snowflake, and now Amazon with Athena, are building giant SQL data engines that query directly on S3 buckets, a source that was previously only accessible via the command line.
2016 was the best year SQL has ever had — 2017 will be even better.
3) The data lake will find purpose
Companies have been collecting data for a while, so the data lake is well-stocked with fish. But the people who needed data most generally couldn’t find the right fish.
I support the notion of a data lake, dumping all your raw data into one data warehouse. But it doesn’t work if you don’t have a way to make it cohesive when you query it. There have been great innovations by companies like Segment, Fivetran and Stitch, which make moving data into the lake easier. Modeling data is the final step that brings it all together and helps some of the best companies in the world see through data.
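To make the modeling step concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real warehouse. The table names, columns, and sample rows are all hypothetical; the point is only that two raw feeds that disagree on naming and casing can be reconciled in one modeled view that everyone queries.

```python
import sqlite3


def build_demo_lake() -> sqlite3.Connection:
    """Load two hypothetical raw feeds and define one modeled view over them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Raw data as it might land from two different (hypothetical) sources.
        CREATE TABLE raw_events (user_email TEXT, event TEXT);
        CREATE TABLE raw_payments (email TEXT, amount_cents INTEGER);

        INSERT INTO raw_events VALUES
            ('ana@example.com', 'login'),
            ('ana@example.com', 'search'),
            ('bo@example.com', 'login');
        INSERT INTO raw_payments VALUES
            ('ANA@example.com', 1200),
            ('ana@example.com', 800);

        -- The model: normalize the join key and aggregate each source
        -- before joining, so rows are not double counted.
        CREATE VIEW customer_overview AS
        SELECT e.email,
               e.event_count,
               COALESCE(p.revenue_cents, 0) AS revenue_cents
        FROM (SELECT lower(user_email) AS email, COUNT(*) AS event_count
              FROM raw_events
              GROUP BY lower(user_email)) AS e
        LEFT JOIN (SELECT lower(email) AS email,
                          SUM(amount_cents) AS revenue_cents
                   FROM raw_payments
                   GROUP BY lower(email)) AS p
          ON p.email = e.email;
    """)
    return conn


def customer_overview(conn: sqlite3.Connection) -> list:
    """Query the modeled view instead of the raw tables."""
    return conn.execute(
        "SELECT email, event_count, revenue_cents "
        "FROM customer_overview ORDER BY email"
    ).fetchall()


if __name__ == "__main__":
    for row in customer_overview(build_demo_lake()):
        print(row)
```

The raw tables stay untouched in the lake. The view carries the business logic (which key identifies a customer, how revenue is summed), so analysts do not have to rediscover it in every query.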
Companies like Docker, Amazon Prime Now and BuzzFeed are using all their data to create comprehensive views of their customers and of their businesses. When these two steps, moving the data in and modeling it, are in place, the data lake can finally be a powerful way to get all your data into the hands of every decision-maker and make companies more successful.