Big data analysis usually requires big data infrastructure, plus a highly skilled team of operations and development staff to manage the different aspects of that analysis.
Google BigQuery solves this by providing:
- a fully managed
- no ops
- low cost
analytics database. Google takes care of infrastructure, and provides a SQL-like manner of interacting with data.
- provides a service for near real-time interactive analysis of big data sets, from terabytes (TB) to petabytes (PB)
- based on columnar structure for high performance
- query using SQL-like syntax
- only pay for storage and processing
- zero administration for performance and scale
- secure, reliable, and supports open standards
- projects can host one or more datasets
- access to data is typically controlled using access control lists (ACLs) on a dataset
- tables contain the data in BigQuery, and each table defines a schema for its data
- BigQuery also supports views (virtual tables) defined by a SQL query
CBTNuggets BigQuery project diagram:
BigQuery has a number of ready-made public sample tables that are useful for experimenting when getting started.
natality is one of those sample tables.
SELECT weight_pounds, state, year, gestation_weeks
FROM publicdata:samples.natality  -- project:dataset.table
ORDER BY weight_pounds ASC
LIMIT 10;
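The project:dataset.table form used in the FROM clause can be split into its three parts mechanically. The following is a minimal Python sketch, assuming the colon-separated form shown in the query; `TableRef` and `parse_table_ref` are hypothetical helper names for illustration, not part of any BigQuery library.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three levels of a table reference (hypothetical helper type)."""
    project: str
    dataset: str
    table: str


def parse_table_ref(ref: str) -> TableRef:
    """Split 'project:dataset.table' into project, dataset, and table."""
    # Everything before the first colon is the project.
    project, sep, rest = ref.partition(":")
    if not sep or not project:
        raise ValueError(f"missing project in {ref!r}")
    # The remainder is 'dataset.table'.
    dataset, sep, table = rest.partition(".")
    if not sep or not dataset or not table:
        raise ValueError(f"expected dataset.table in {ref!r}")
    return TableRef(project, dataset, table)


print(parse_table_ref("publicdata:samples.natality"))
# TableRef(project='publicdata', dataset='samples', table='natality')
```

Here publicdata is the project, samples is the dataset, and natality is the table.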
BigQuery can be accessed in a number of ways:
- Web UI at bigquery.cloud.google.com
- Cloud SDK using the bq command-line tool:
bq query ...
- interactively via
- RESTful JSON API via client libraries
- using a variety of 3rd-party tools
- Google Apps Script
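To give a feel for the RESTful JSON API option, here is a rough Python sketch that only builds the URL and JSON body for a query request and sends nothing. It assumes the v2 `jobs.query` endpoint and its `query`, `useLegacySql`, and `dryRun` request fields; `build_query_request` and the project ID `my-project` are hypothetical names for illustration. A real call would also need an OAuth 2.0 access token, which is not shown here.

```python
import json

# Assumed base URL of the BigQuery v2 REST API.
API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, dry_run: bool = False):
    """Return (url, json_body) for a jobs.query POST; nothing is sent."""
    url = f"{API_ROOT}/projects/{project_id}/queries"
    body = {
        "query": sql,
        # The project:dataset.table form is legacy SQL syntax.
        "useLegacySql": True,
        # With dryRun set, the query is validated but not executed.
        "dryRun": dry_run,
    }
    return url, json.dumps(body)


sql = (
    "SELECT weight_pounds, state, year, gestation_weeks "
    "FROM publicdata:samples.natality "
    "ORDER BY weight_pounds ASC LIMIT 10"
)
# 'my-project' is a placeholder for the project that would be billed.
url, body = build_query_request("my-project", sql, dry_run=True)
print(url)
print(body)
```

In practice the client libraries wrap this request and handle authentication, so building it by hand is mostly useful for understanding what goes over the wire.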
The sidebar has:
- project switcher
- public data sets
We can see details for the natality dataset.
On the natality dataset:
select 'Query Table'
paste the following query, or write your own, into the textarea:
SELECT weight_pounds, state, year, gestation_weeks FROM publicdata:samples.natality ORDER BY weight_pounds ASC LIMIT 10;
the green tick on the right is the query validator; it confirms that the query is valid and shows how much data the query will process
- this is important when estimating what you'll pay, because you are charged for the data processed
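The link between bytes processed and cost is simple arithmetic, sketched below in Python. This assumes query pricing is quoted per TiB of data processed; the price used in the example is a placeholder, not a real BigQuery rate, so check the current pricing page before relying on it. `format_bytes` and `estimate_query_cost` are hypothetical helper names.

```python
def format_bytes(n: int) -> str:
    """Render a byte count in binary units, e.g. 1536 -> '1.5 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(n)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


def estimate_query_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate query cost from the validator's bytes-processed figure.

    price_per_tib is supplied by the caller; no real rate is built in.
    """
    tib = bytes_processed / 2**40
    return tib * price_per_tib


# Example: a query the validator reports as processing 3 TiB,
# with a placeholder price of 5.0 currency units per TiB.
processed = 3 * 2**40
print(format_bytes(processed))               # 3.0 TiB
print(estimate_query_cost(processed, 5.0))   # 15.0
```

Selecting fewer columns reduces the bytes processed in a columnar store, which is why the validator's figure changes as the query is edited.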