Ingesting Data from Google BigQuery
Connecting to Google BigQuery
To connect to Google BigQuery, you need to set up authentication using a Google Cloud service account and provide the following information:
- Name: A friendly name for your connection to easily identify and reuse it for ingesting additional tables
- Dataset Name: The BigQuery dataset containing the tables you want to ingest
- Service Account JSON Key File (Optional): Paste your complete Google Cloud service account JSON key file content. If provided, this is used instead of the individual fields below, which you can then skip.

Or Enter Details Individually:
- Client ID: The unique identifier for the service account client (from `client_id` in the JSON key file)
- Client Email: The service account email address for authentication (from `client_email` in the JSON key file)
- Private Key ID: The unique identifier for the private key (from `private_key_id` in the JSON key file)
- Private Key: The private key for the service account in PEM format (from `private_key` in the JSON key file)
- Project ID: The Google Cloud Project ID containing your BigQuery datasets (from `project_id` in the JSON key file)
- Client X509 Certificate URL: The URL for the X.509 certificate associated with the service account (from `client_x509_cert_url` in the JSON key file)
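The individual fields above map directly to keys in the service account JSON key file. As a minimal sketch (the key file content below uses placeholder values, not real credentials), extracting them looks like this:

```python
import json

# A truncated sample of a Google Cloud service account JSON key file.
# All values are placeholders for illustration only.
key_file_content = """
{
  "type": "service_account",
  "project_id": "your-google-cloud-project-id",
  "private_key_id": "abc123def456ghi789",
  "private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n",
  "client_email": "vendia-service@your-project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
"""

key = json.loads(key_file_content)

# Each individual connection field comes straight from the key file:
connection_fields = {
    "Client ID": key["client_id"],
    "Client Email": key["client_email"],
    "Private Key ID": key["private_key_id"],
    "Private Key": key["private_key"],
    "Project ID": key["project_id"],
    "Client X509 Certificate URL": key["client_x509_cert_url"],
}
```

Pasting the whole key file is simpler and less error-prone than copying fields one by one, which is why the JSON option is recommended.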
Prerequisites
Before connecting to Google BigQuery, ensure that:
- Your Google Cloud Project is accessible and billing is enabled
- You have a valid service account with appropriate permissions
- The specified dataset exists and is accessible
- Network connectivity allows HTTPS access to BigQuery APIs
- Service account authentication is properly configured
Service Account Setup
To set up service account authentication for BigQuery:
- Create a service account in your Google Cloud Project
- Generate a JSON key file containing all the required credentials
- Assign appropriate BigQuery roles such as:
  - BigQuery Data Viewer for read access to datasets and tables
  - BigQuery Job User for running queries
- Use the JSON key file by either:
  - Recommended: Pasting the entire JSON key file content into the connection form
  - Alternative: Extracting individual fields from the JSON key file for manual entry
Required Permissions
The service account connecting to BigQuery must have the following permissions:
- `bigquery.datasets.get` permission on the target dataset
- `bigquery.tables.list` permission to list tables in the dataset
- `bigquery.tables.get` permission on the tables you want to ingest
- `bigquery.tables.getData` permission to read table data
- `bigquery.jobs.create` permission to run queries
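A quick pre-flight check can compare the permissions actually granted to the service account (for example, obtained via IAM's `testIamPermissions` API or by inspecting the assigned roles) against this required list. The sketch below is pure Python; the `granted` set is an illustrative example, not output from a real API call:

```python
# The BigQuery permissions the connection needs, per the list above.
REQUIRED_PERMISSIONS = {
    "bigquery.datasets.get",
    "bigquery.tables.list",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.jobs.create",
}

def missing_permissions(granted):
    """Return the required permissions that are not in the granted set."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))

# Example: a grant that covers reads but not running query jobs.
granted = {
    "bigquery.datasets.get",
    "bigquery.tables.list",
    "bigquery.tables.get",
    "bigquery.tables.getData",
}
print(missing_permissions(granted))  # ['bigquery.jobs.create']
```

If the result is non-empty, assign a role (such as BigQuery Job User) that grants the missing permissions before retrying the connection.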
Example Configuration
Here’s an example of a typical BigQuery connection configuration:
Field | Example Value |
---|---|
Name | Production BigQuery Analytics |
Dataset Name | analytics_dataset |
Service Account JSON Key File | { "type": "service_account", "project_id": "...", ... } |
Client Email | vendia-service@your-project.iam.gserviceaccount.com |
Private Key | -----BEGIN PRIVATE KEY-----\n... |
Project ID | your-google-cloud-project-id |
Private Key ID | abc123def456ghi789 |
Client ID | 123456789012345678901 |
Client X509 Cert URL | https://www.googleapis.com/robot/v1/metadata/x509/... |
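If you enter the fields individually, a small validation pass before submitting can catch the most common mistakes: a missing field or a private key that is not in PEM format. This is a hypothetical helper for illustration; the dictionary keys simply mirror the JSON key file field names:

```python
# Field names mirror the keys in the service account JSON key file.
REQUIRED_FIELDS = [
    "client_id", "client_email", "private_key_id",
    "private_key", "project_id", "client_x509_cert_url",
]

def validate_connection(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not config.get(f)]
    key = config.get("private_key", "")
    if key and not key.startswith("-----BEGIN PRIVATE KEY-----"):
        problems.append("private_key is not in PEM format")
    return problems

config = {
    "client_id": "123456789012345678901",
    "client_email": "vendia-service@your-project.iam.gserviceaccount.com",
    "private_key_id": "abc123def456ghi789",
    "private_key": "MIIEvQIBADANBg...",  # raw base64, missing the PEM header
    "project_id": "your-google-cloud-project-id",
}
print(validate_connection(config))
# ['missing field: client_x509_cert_url', 'private_key is not in PEM format']
```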
Vendia Supported and Unsupported BigQuery Data Types
Vendia Supported BigQuery Data Types | Vendia Unsupported BigQuery Data Types |
---|---|
BOOLEAN (BOOL) | ARRAY |
BYTES | BIGNUMERIC (BIGDECIMAL) |
DATE | GEOGRAPHY |
DATETIME | INTERVAL |
FLOAT64 (FLOAT) | JSON |
INT64 (INTEGER, INT, BIGINT, etc.) | RANGE |
NUMERIC (DECIMAL) | STRUCT (RECORD) |
STRING | |
TIME | |
TIMESTAMP | |
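Before ingesting a table, it can help to check its schema against the supported list above, since BigQuery exposes several aliases (for example, `BOOL` for `BOOLEAN`, `INTEGER` for `INT64`). A minimal sketch, with a made-up schema represented as `(name, type)` pairs:

```python
# Supported canonical types, per the table above.
SUPPORTED_TYPES = {
    "BOOLEAN", "BYTES", "DATE", "DATETIME", "FLOAT64",
    "INT64", "NUMERIC", "STRING", "TIME", "TIMESTAMP",
}
# BigQuery aliases normalized to their canonical names.
ALIASES = {
    "BOOL": "BOOLEAN", "FLOAT": "FLOAT64", "INTEGER": "INT64",
    "INT": "INT64", "BIGINT": "INT64", "DECIMAL": "NUMERIC",
}

def split_columns(schema):
    """Partition (name, type) pairs into supported and unsupported columns."""
    supported, unsupported = [], []
    for name, col_type in schema:
        canonical = ALIASES.get(col_type.upper(), col_type.upper())
        (supported if canonical in SUPPORTED_TYPES else unsupported).append(name)
    return supported, unsupported

schema = [("id", "INTEGER"), ("payload", "JSON"), ("created_at", "TIMESTAMP")]
print(split_columns(schema))  # (['id', 'created_at'], ['payload'])
```

Columns with unsupported types (such as the `JSON` column above) need to be excluded or converted, for example by casting to `STRING` in a view, before the table can be ingested.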
Best Practices
- Security: Use service accounts with minimal required permissions instead of user accounts
- Key Management: Store private keys securely and rotate them regularly
- Cost Optimization: Use appropriate BigQuery slot reservations and partitioned tables
- Testing: Test connectivity with a small sample table before ingesting large datasets
- Dataset Organization: Use descriptive dataset and table names for better organization
Troubleshooting
If you encounter connection issues:
- Authentication Failed: Verify that service account credentials and JSON key file are valid
- Project Not Found: Check that the Project ID is correct and accessible
- Dataset Not Found: Confirm that the dataset exists and the service account has access
- Permission Denied: Ensure that the service account has the required BigQuery permissions
- API Errors: Verify that the BigQuery API is enabled for your Google Cloud Project
- Network Issues: Check that firewall rules allow HTTPS connections to Google APIs
- Quota Exceeded: Monitor BigQuery quotas and request increases if needed
- Invalid Private Key: Ensure that the private key is in the correct PEM format
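One frequent cause of "Invalid Private Key" errors (an assumption based on general experience with JSON-encoded keys, not specific to this connector) is pasting a key that still contains literal `\n` escape sequences from the JSON file instead of real newlines. A hypothetical normalization helper:

```python
def normalize_private_key(key):
    """Convert literal backslash-n sequences (common when a key is copied
    out of a JSON file) into real newlines, and check the PEM markers."""
    key = key.replace("\\n", "\n").strip() + "\n"
    if not key.startswith("-----BEGIN PRIVATE KEY-----"):
        raise ValueError("private key is missing its PEM header")
    if "-----END PRIVATE KEY-----" not in key:
        raise ValueError("private key is missing its PEM footer")
    return key

# A key pasted with literal \n sequences instead of line breaks:
pasted = "-----BEGIN PRIVATE KEY-----\\nMIIEvQIBADANBg...\\n-----END PRIVATE KEY-----"
fixed = normalize_private_key(pasted)
print(fixed.count("\n"))  # 3 — real newlines restored
```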
Next Steps
After successfully connecting to your BigQuery dataset, you can:
- Select specific tables to ingest
- Configure data transformations and mappings
- Set up incremental data ingestion jobs
- Schedule regular data synchronization tasks