
Ingesting Data from Google BigQuery

Connecting to Google BigQuery

To connect to Google BigQuery, you need to set up authentication using a Google Cloud service account and provide the following information:

  • Name: A friendly name that identifies the connection so you can reuse it when ingesting additional tables

  • Dataset Name: The BigQuery dataset containing the tables you want to ingest

  • Service Account JSON Key File (Optional): Paste the complete contents of your Google Cloud service account JSON key file. If provided, it is used instead of the individual fields below, which you can then skip.

  • Or Enter Details Individually:

    • Client ID: The unique identifier for the service account client (from client_id in JSON key file)
    • Client Email: The service account email address for authentication (from client_email in JSON key file)
    • Private Key ID: The unique identifier for the private key (from private_key_id in JSON key file)
    • Private Key: The private key for the service account in PEM format (from private_key in JSON key file)
    • Project ID: The Google Cloud Project ID containing your BigQuery datasets (from project_id in JSON key file)
    • Client X509 Certificate URL: The URL for the X.509 certificate associated with the service account (from client_x509_cert_url in JSON key file)

Prerequisites

Before connecting to Google BigQuery, ensure that:

  • Your Google Cloud Project is accessible and billing is enabled
  • You have a valid service account with appropriate permissions
  • The specified dataset exists and is accessible
  • Network connectivity allows HTTPS access to BigQuery APIs
  • Service account authentication is properly configured

Service Account Setup

To set up service account authentication for BigQuery:

  1. Create a service account in your Google Cloud Project
  2. Generate a JSON key file containing all the required credentials
  3. Assign appropriate BigQuery roles such as:
    • BigQuery Data Viewer for read access to datasets and tables
    • BigQuery Job User for running queries
  4. Use the JSON key file by either:
    • Recommended: Pasting the entire JSON key file content into the connection form
    • Alternative: Extracting individual fields from the JSON key file for manual entry
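
If you opt for manual entry, the individual values can be pulled straight out of the downloaded JSON key file. A minimal sketch in Python (the field names follow Google's service account key-file schema; the sample content below is illustrative, not real credentials):

```python
import json

# Fields the connection form asks for, as named in the key file.
REQUIRED_FIELDS = [
    "client_id",
    "client_email",
    "private_key_id",
    "private_key",
    "project_id",
    "client_x509_cert_url",
]

def extract_connection_fields(key_file_text: str) -> dict:
    """Parse a service account key file and return the fields the form asks for."""
    key = json.loads(key_file_text)
    missing = [f for f in REQUIRED_FIELDS if f not in key]
    if missing:
        raise ValueError(f"Key file is missing fields: {missing}")
    return {f: key[f] for f in REQUIRED_FIELDS}

# Dummy key-file content for demonstration only:
sample = json.dumps({
    "type": "service_account",
    "project_id": "your-google-cloud-project-id",
    "private_key_id": "abc123def456ghi789",
    "private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n",
    "client_email": "vendia-service@your-project.iam.gserviceaccount.com",
    "client_id": "123456789012345678901",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/...",
})
fields = extract_connection_fields(sample)
print(fields["client_email"])  # vendia-service@your-project.iam.gserviceaccount.com
```

A check like this catches a truncated or partially pasted key file before you attempt to connect.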

Required Permissions

The service account connecting to BigQuery must have the following permissions:

  • bigquery.datasets.get permission on the target dataset
  • bigquery.tables.list permission to list tables in the dataset
  • bigquery.tables.get permission on the tables you want to ingest
  • bigquery.tables.getData permission to read table data
  • bigquery.jobs.create permission to run queries
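
One way to grant exactly this set, rather than a broad predefined role, is a custom IAM role. A sketch of a role definition file usable with `gcloud iam roles create` (the title and description are illustrative; the permission list mirrors the bullets above):

```yaml
title: "Vendia BigQuery Ingest"
description: "Minimal permissions for ingesting BigQuery tables"
stage: "GA"
includedPermissions:
  - bigquery.datasets.get
  - bigquery.tables.list
  - bigquery.tables.get
  - bigquery.tables.getData
  - bigquery.jobs.create
```

Note that `bigquery.jobs.create` must be granted at the project level, since query jobs run in the project rather than in a dataset.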

Example Configuration

Here’s an example of a typical BigQuery connection configuration:

| Field | Example Value |
|---|---|
| Name | Production BigQuery Analytics |
| Dataset Name | analytics_dataset |
| Service Account JSON Key File | { "type": "service_account", "project_id": "...", ... } |
| Client Email | vendia-service@your-project.iam.gserviceaccount.com |
| Private Key | -----BEGIN PRIVATE KEY-----\n... |
| Project ID | your-google-cloud-project-id |
| Private Key ID | abc123def456ghi789 |
| Client ID | 123456789012345678901 |
| Client X509 Cert URL | https://www.googleapis.com/robot/v1/metadata/x509/... |

Vendia Supported and Unsupported BigQuery Data Types

| Vendia Supported BigQuery Data Types | Vendia Unsupported BigQuery Data Types |
|---|---|
| BOOLEAN (BOOL) | ARRAY |
| BYTES | BIGNUMERIC (BIGDECIMAL) |
| DATE | GEOGRAPHY |
| DATETIME | INTERVAL |
| FLOAT64 (FLOAT) | JSON |
| INT64 (INTEGER, INT, BIGINT, etc.) | RANGE |
| NUMERIC (DECIMAL) | STRUCT (RECORD) |
| STRING | |
| TIME | |
| TIMESTAMP | |
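
Before ingesting, it can help to screen a table's schema against this matrix. A minimal sketch (the type sets mirror the table above, including aliases; the schema dict is a hypothetical example, not fetched from BigQuery):

```python
# Type support matrix mirroring the table above, including aliases.
SUPPORTED = {"BOOLEAN", "BOOL", "BYTES", "DATE", "DATETIME", "FLOAT64", "FLOAT",
             "INT64", "INTEGER", "INT", "BIGINT", "NUMERIC", "DECIMAL",
             "STRING", "TIME", "TIMESTAMP"}
UNSUPPORTED = {"ARRAY", "BIGNUMERIC", "BIGDECIMAL", "GEOGRAPHY", "INTERVAL",
               "JSON", "RANGE", "STRUCT", "RECORD"}

def unsupported_columns(schema: dict) -> list:
    """Return the columns whose BigQuery type cannot be ingested."""
    return [col for col, typ in schema.items() if typ.upper() in UNSUPPORTED]

# Hypothetical table schema: column name -> BigQuery type
schema = {"id": "INT64", "name": "STRING", "tags": "ARRAY", "geo": "GEOGRAPHY"}
print(unsupported_columns(schema))  # ['tags', 'geo']
```

Columns flagged here would need to be excluded, or cast to a supported type (for example, serializing a STRUCT to a STRING in a view) before ingestion.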

Best Practices

  • Security: Use service accounts with minimal required permissions instead of user accounts
  • Key Management: Store private keys securely and rotate them regularly
  • Cost Optimization: Use appropriate BigQuery slot reservations and partitioned tables
  • Testing: Test connectivity with a small sample table before ingesting large datasets
  • Dataset Organization: Use descriptive dataset and table names for better organization

Troubleshooting

If you encounter connection issues:

  1. Authentication Failed: Verify that service account credentials and JSON key file are valid
  2. Project Not Found: Check that the Project ID is correct and accessible
  3. Dataset Not Found: Confirm that the dataset exists and the service account has access
  4. Permission Denied: Ensure that the service account has the required BigQuery permissions
  5. API Errors: Verify that the BigQuery API is enabled for your Google Cloud Project
  6. Network Issues: Check that firewall rules allow HTTPS connections to Google APIs
  7. Quota Exceeded: Monitor BigQuery quotas and request increases if needed
  8. Invalid Private Key: Ensure that the private key is in the correct PEM format
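
For item 8, a common failure mode is pasting a private key whose newlines arrived as literal `\n` escape sequences from the JSON file. A quick local sanity check (a sketch; it inspects only the PEM header and footer, not the key material itself):

```python
def looks_like_valid_pem(private_key: str) -> bool:
    """Rough sanity check for a service account private key in PEM format."""
    key = private_key.replace("\\n", "\n")  # undo literal \n escapes from JSON
    lines = key.strip().splitlines()
    return (len(lines) >= 3
            and lines[0] == "-----BEGIN PRIVATE KEY-----"
            and lines[-1] == "-----END PRIVATE KEY-----")

# A key copied out of a JSON file, with literal \n sequences still present:
print(looks_like_valid_pem(
    "-----BEGIN PRIVATE KEY-----\\nMIIE...\\n-----END PRIVATE KEY-----"))  # True
```

If this check fails on a key you copied manually, re-paste the full JSON key file instead and let the connection form handle the escaping.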

Next Steps

After successfully connecting to your BigQuery dataset, you can:

  • Select specific tables to ingest
  • Configure data transformations and mappings
  • Set up incremental data ingestion jobs
  • Schedule regular data synchronization tasks