As the public cloud vendors compete for your big data storage, processing, and analytics dollars, each vendor offers different data ingestion methods to optimize the bulk data loading process to capture your data (and your dollars). Google is no different and offers a bulk loading option for both batch and streaming workloads for Google BigQuery.
While SnapLogic has supported Google BigQuery for quite some time, both streaming and batch bulk-loading options were introduced in November with the Fall 2017 release (R4.11) to speed up SnapLogic pipelines that load data into Google BigQuery. With these capabilities, SnapLogic customers can use the more than 400 Snaps to connect to nearly any source and bulk load that data into their Google BigQuery data warehouse.
Instead of inserting (writing) one record at a time into Google BigQuery, the new SnapLogic Google BigQuery Bulk Load Snaps load data, as the name suggests, in bulk into your Google BigQuery dataset. Whether you are uploading data files as a batch process, which automatically uses high-speed Google Cloud Storage for temporary file staging, or streaming data, the insert process is optimized for bulk operations, resulting in much higher performance and shorter load times.
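To illustrate why loading in bulk cuts overhead compared with writing one record at a time, here is a minimal Python sketch of the general batching idea. It is not how the Snaps are implemented: the helper names `chunked` and `load_in_batches`, the `send_batch` callable, and the batch size of 500 are all hypothetical choices for illustration, and no Google or SnapLogic API is called.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def chunked(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield successive lists holding at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def load_in_batches(
    records: Iterable[Record],
    send_batch: Callable[[List[Record]], object],
    batch_size: int,
) -> int:
    """Pass records to send_batch in groups and return the number of requests made.

    send_batch is a hypothetical stand-in for whatever call delivers
    one batch of rows to the warehouse.
    """
    requests = 0
    for batch in chunked(records, batch_size):
        send_batch(batch)
        requests += 1
    return requests


# Usage: 100,000 records in batches of 500 (an arbitrary size) need 200 requests.
rows = [{"id": i} for i in range(100_000)]
delivered: List[Record] = []
print(load_in_batches(rows, delivered.extend, batch_size=500))  # 200
```

With a batch size of 1, the same 100,000 records would need 100,000 separate requests, which is the per-record overhead that bulk loading avoids.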
To give you a sense of the performance increase, internal testing compared loading 100,000 documents with the Google BigQuery Write Snap against the Google BigQuery Bulk Load (Streaming) Snap and showed at least a 50 percent reduction in load time (your results will vary based on your batch load setting, number of columns, and length of data). At 1,000,000 records, the same testing showed at least an 80 percent reduction in load time (again, your results will vary).
Batch processing jobs, which were previously not supported, also benefit. Once your data files are loaded to high-speed Google Cloud Storage (which the Google BigQuery Bulk Load (Cloud Storage) Snap handles automatically), loading to Google BigQuery is extremely fast. An internal test showed that a JSON file with 1.5 million records loads into Google BigQuery in just over 90 seconds (again, your time will vary greatly based on your data).
Now that we’ve talked about the tech, consider the business applications: supporting your IoT, Customer 360, digital marketing, operations, or other large-volume data analytics use cases, and reaching time-to-value faster on those use cases with SnapLogic.
See the Google BigQuery Snaps in action below.
Give our new Google BigQuery Bulk Load Snaps a try and let us know what you think.