One of the challenges that data scientists face when running machine learning workloads is processing information before it’s ready for use. Google unveiled a new cloud service Thursday aimed at easing that pain.
Google Cloud Dataprep will automatically detect data schemas, joins, and anomalies such as missing or duplicate values, without requiring coding. After that, it will help users build a set of rules for processing the information. Those rules are then built in Apache Beam format and can be run in products like Google's Cloud Dataflow, which processes information as it's loaded into services like the BigQuery data warehouse.
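To give a sense of the kind of checks being automated, here is a minimal Python sketch of flagging missing and duplicate values in a small set of records. It is an illustration only, not how Cloud Dataprep works internally; the function name `profile_records` and the sample data are hypothetical.

```python
from collections import Counter


def profile_records(records, columns):
    """Count missing values per column and list rows that appear more than once.

    Hypothetical helper for illustration; not part of any Google API.
    """
    missing = {col: 0 for col in columns}
    for row in records:
        for col in columns:
            value = row.get(col)
            # Treat None and blank strings as missing.
            if value is None or (isinstance(value, str) and value.strip() == ""):
                missing[col] += 1

    # Reduce each row to a tuple of its column values so rows can be counted.
    row_counts = Counter(tuple(row.get(col) for col in columns) for row in records)
    duplicates = [dict(zip(columns, key)) for key, n in row_counts.items() if n > 1]
    return {"missing": missing, "duplicates": duplicates}


# Made-up sample data, not from any real dataset.
records = [
    {"city": "Austin", "temp_f": 91},
    {"city": "Boston", "temp_f": None},
    {"city": "Austin", "temp_f": 91},
    {"city": "", "temp_f": 70},
]
report = profile_records(records, ["city", "temp_f"])
print(report)
```

A report like this is the sort of input a person would otherwise use to hand-write cleanup rules, such as dropping repeated rows or filling in blanks.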
While Cloud Dataprep is built to prepare data for machine learning, the system also uses machine learning itself to try to determine which rules will be most useful for customers. As of Thursday, it's available in private beta.
BigQuery is receiving a number of enhancements as well, including a new Commercial Datasets program that's now available in public beta. It will let users take information from AccuWeather, Dow Jones, Xignite, HouseCanary, and Remine and directly feed it into BigQuery for further processing.
BigQuery can also now query data stored in Cloud Bigtable, Google’s managed NoSQL database offering for low-latency data. That means users can write one SQL query that taps into information from both Bigtable and BigQuery. Previously, they had to write a separate program to search Bigtable.