New AWS big data-related features play well with Looker (and others)
By Sara Isenberg
Founder, Publisher, Editor-in-Chief, Santa Cruz Tech Beat
April 20, 2017 — Santa Cruz, CA
You can now use Amazon Athena to query encrypted data stored in Amazon S3.
“Big Data” is changing the way organizations collect and analyze their data, and two local companies are at the heart of this transformation. Amazon Web Services offers a durable and cost-effective way to store enormous amounts of data in the cloud, where other applications can access it for analysis. Looker builds the visualization tools that help users find and explore the needles in these data haystacks. Together, they are making it possible for ordinary people in all industries to collect, process, and draw insights from data at a fraction of the traditional cost.
At last fall’s re:Invent conference, AWS announced Amazon Athena, a tool that makes it even simpler to query the vast treasure troves of cloud data in Amazon S3. Looker’s Erin Franz blogged about how this helps Looker help its customers. Franz says:
“Looker took an early bet on SQL as the lingua franca for data analysis. We developed a product that directly leverages the underlying power and functionality of SQL dialects, and already has full support for the ecosystem of Amazon products including RDS, Redshift, and EMR via Spark SQL, Hive and Presto. Now, either in conjunction with these engines or separately, you can leverage Looker on Athena to make data in S3 available across your organization. Looker doesn’t move your data from S3, it directly leverages the power of Athena to query the data where it lives.”
A deeper dive on Athena:
AWS Technical Evangelist Tara Walker writes:
“Amazon Athena is a serverless interactive query service that enables users to easily analyze data in Amazon S3 using standard SQL. At Athena’s core is Presto, a distributed SQL engine to run queries with ANSI SQL support and Apache Hive which allows Athena to work with popular data formats like CSV, JSON, ORC, Avro, and Parquet and adds common Data Definition Language (DDL) operations like create, drop, and alter tables. Athena enables the performant query access to datasets stored in Amazon Simple Storage Service (S3) with structured and unstructured data format.”
Sensitive information, such as security logs, financial transactions, and healthcare records, is often protected by encrypting it. In many cases, however, users have had to choose between securing their data and making it available for analytics. One of the big advantages of AWS is that you don’t have to make tough choices with your data. Amazon Redshift and Amazon EMR have long supported analytics on encrypted data, and Amazon is excited to bring this capability to Athena. Now you can run SQL queries directly against your encrypted data in S3 and write encrypted results back to your S3 bucket. Both server-side and client-side encryption are supported, so your data is protected at rest in Amazon S3, in transit as it travels to and from Amazon S3, and over encrypted communication channels when accessed through Athena’s JDBC driver. (Source: AWS YouTube.)
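As a rough illustration of writing encrypted results back to S3, the sketch below builds the parameters that the boto3 Athena client's start_query_execution call accepts. The bucket, database, and table names and both helper functions are hypothetical. The real API call is isolated in run_query, which assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Optional

# Encryption options Athena accepts for query results.
VALID_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def build_query_request(sql: str, database: str, output_location: str,
                        encryption_option: str = "SSE_S3",
                        kms_key: Optional[str] = None) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution API.

    Hypothetical helper: it only assembles a dict and makes no AWS calls.
    """
    if encryption_option not in VALID_OPTIONS:
        raise ValueError(f"unsupported encryption option: {encryption_option}")
    if encryption_option != "SSE_S3" and not kms_key:
        raise ValueError("KMS-based options need a KMS key ARN or ID")

    encryption = {"EncryptionOption": encryption_option}
    if encryption_option != "SSE_S3":
        encryption["KmsKey"] = kms_key

    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": output_location,  # where result files are written
            "EncryptionConfiguration": encryption,
        },
    }


def run_query(request: dict) -> str:
    """Submit the request to Athena and return the query execution ID."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # All names below are placeholders, not real resources.
    request = build_query_request(
        sql="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
        database="example_db",
        output_location="s3://example-results-bucket/athena/",
    )
    print(request)
```

Keeping the request builder separate from the network call makes it possible to inspect the encryption settings before anything is sent to AWS.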
Organizations capture and analyze more data than ever before, and these tools help them find the digital needles in their exabyte-scale haystacks. Here is how the pieces fit together:
- You store data in a “data lake” built on Amazon S3, an easy, cost-effective place to put lots of data in its native format.
- You encrypt it to be absolutely sure it stays secure and nobody but you can use it.
- Normally, you’d have to extract the data, transform it into a useful format and load it into a database to run queries. But the Athena service lets you work with this data on demand without all that Extract/Transform/Load overhead.
- Looker is built on top of these services to function as the front-end interface that lets you run these queries and see the results in a friendly way.
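To make the "no ETL" point concrete, here is a minimal sketch of the kind of DDL statement that tells Athena to treat files already sitting in S3 as a table. The helper function, table name, columns, and bucket path are all hypothetical; the Parquet format is an assumption chosen for the example, and the has_encrypted_data table property is the one Athena uses to mark an encrypted dataset.

```python
from typing import List, Tuple


def external_table_ddl(database: str, table: str,
                       columns: List[Tuple[str, str]],
                       location: str, encrypted: bool = False) -> str:
    """Return a CREATE EXTERNAL TABLE statement over data left in place in S3.

    Hypothetical helper: it only builds a SQL string and makes no AWS calls.
    """
    if not location.startswith("s3://") or not location.endswith("/"):
        raise ValueError("location should be an S3 folder path ending in '/'")

    # One "`name` type" entry per column, comma-separated.
    cols = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "STORED AS PARQUET\n"  # assumed format for this example
        f"LOCATION '{location}'"
    )
    if encrypted:
        # Marks the underlying S3 data as encrypted.
        ddl += "\nTBLPROPERTIES ('has_encrypted_data'='true')"
    return ddl


if __name__ == "__main__":
    # Placeholder schema and path, for illustration only.
    print(external_table_ddl(
        database="example_db",
        table="sensor_readings",
        columns=[("device_id", "string"), ("reading", "double"), ("ts", "timestamp")],
        location="s3://example-data-lake/sensor-readings/",
        encrypted=True,
    ))
```

The statement only registers a schema and a location. The data itself stays in S3, and a front end such as Looker can then issue ordinary SQL against the table.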
All of this could theoretically help the builders and makers right here in Santa Cruz work with data analysis in fields such as the Internet of Things (Calliope), genomics (UCSC Genomics Institute), machine data and sensor streams (Zero Motorcycles), or marketing operations (Nanigans). [Note: Nanigans isn’t located in Santa Cruz but, hey, Doug Erickson works there.]