Ingesting Data from Amazon S3
Connecting to Amazon S3
To connect to Amazon S3 and ingest CSV files, you need to provide the following information:
- Name: A friendly name for your connection to easily identify and reuse it for ingesting additional files
- Role ARN: The ARN of the AWS role that Vendia will assume to access your S3 bucket
- S3 Bucket Name: The name of the S3 bucket containing your CSV file
- Bucket Region: The AWS region where your S3 bucket is located (e.g., us-east-1)
Prerequisites
Before connecting, you must update your IAM role’s trust relationship to allow Vendia to access your S3 bucket.
Update Trust Relationship
Add a trust relationship to your IAM role so that Vendia’s AWS accounts can assume the role and access the S3 bucket on your behalf.
Note: The actual Vendia AWS account numbers are provided within the product UI when setting up your S3 connection.
Follow these steps to update the trust relationship:
- Go to the AWS IAM console
- Find the role you’re using for Vendia access
- Click on the “Trust relationships” tab
- Click “Edit trust policy”
- Add or merge the trust relationship policy shown below (replace VENDIA_ACCOUNT_ID1 and VENDIA_ACCOUNT_ID2 with the account numbers from the product UI)
- Click “Update trust policy”
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::VENDIA_ACCOUNT_ID1:root",
          "arn:aws:iam::VENDIA_ACCOUNT_ID2:root"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
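If the role already has a trust policy, the Vendia statement has to be merged into it rather than pasted over it. The following is a minimal Python sketch of that merge, using only the standard library. The function names are hypothetical, and the account IDs passed in are placeholders for the values shown in the product UI.

```python
import copy
import json


def vendia_trust_statement(account_ids):
    """Build an sts:AssumeRole statement for the given AWS account IDs."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": [f"arn:aws:iam::{a}:root" for a in account_ids]},
        "Action": "sts:AssumeRole",
    }


def merge_trust_policy(existing_policy, account_ids):
    """Return a copy of existing_policy with the new statement appended."""
    merged = copy.deepcopy(existing_policy)
    merged.setdefault("Version", "2012-10-17")
    statements = merged.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    new_statement = vendia_trust_statement(account_ids)
    if new_statement not in statements:  # do not add the same statement twice
        statements.append(new_statement)
    merged["Statement"] = statements
    return merged


if __name__ == "__main__":
    # Hypothetical existing policy; yours will differ.
    existing = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    placeholders = ["VENDIA_ACCOUNT_ID1", "VENDIA_ACCOUNT_ID2"]
    print(json.dumps(merge_trust_policy(existing, placeholders), indent=2))
```

The printed JSON can be pasted into the “Edit trust policy” editor once the placeholders are replaced with real account numbers.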
Required IAM Permissions
The IAM role must have the following permissions for the S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
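To avoid hand-editing the bucket name in two places, the policy above could be generated from the bucket name. This is a small sketch, not a Vendia tool: the function name is hypothetical, and the bucket-name check is a simplified version of the S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens).

```python
import json
import re

# Simplified S3 bucket-name rule; the full rules have further restrictions.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def s3_read_policy(bucket_name):
    """Return the read-only permissions policy for one bucket."""
    if not BUCKET_NAME.fullmatch(bucket_name):
        raise ValueError(f"not a valid bucket name: {bucket_name!r}")
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # ListBucket applies to the bucket, GetObject to its objects.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_read_policy("your-bucket-name"), indent=2))
```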
Advanced Settings (Optional)
Client-Side Encryption (CSE)
If your S3 objects are encrypted using client-side encryption, you can provide the encryption master key:
- Encryption Key (Optional): The encryption master key for client-side encryption (CSE)
Supported File Formats
Currently, Vendia supports ingesting CSV files from Amazon S3. The CSV files should follow standard formatting conventions:
- Comma-separated values
- Optional header row
- UTF-8 encoding (recommended)
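Before uploading a file, it can be worth checking these conventions locally. The sketch below uses Python's standard `csv` module to confirm that a file decodes as UTF-8 and that every row has the same number of fields. It is an illustrative pre-flight check with a hypothetical function name, and it does not reproduce how Vendia itself parses CSV.

```python
import csv
import io


def check_csv(data, has_header=True):
    """Return (header, data_row_count) or raise ValueError with the reason."""
    try:
        text = data.decode("utf-8-sig")  # also accepts a UTF-8 byte-order mark
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8: {exc}") from exc

    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        raise ValueError("file is empty")

    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(
                f"row {number} has {len(row)} fields, expected {width}"
            )

    header = rows[0] if has_header else None
    data_rows = len(rows) - 1 if has_header else len(rows)
    return header, data_rows


if __name__ == "__main__":
    sample = b"id,name\n1,Ada\n2,Grace\n"
    print(check_csv(sample))
```

Pass `has_header=False` for files without a header row; the first row is then counted as data.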
Supported Data Types
Vendia supports the following data types for CSV columns:
Data Type | Description | Example Values |
---|---|---|
STRING | Text data of any length | “John Doe”, “Product Name” |
INTEGER | 32-bit integer numbers | 123, -456, 0 |
LONG | 64-bit integer numbers | 1234567890123, -987654321 |
FLOAT | Floating-point decimal numbers | 3.14, -2.5, 1.23E+10 |
BOOLEAN | True/false values | true, false, 1, 0 |
DATE | Date values | 2023-01-18, 1/18/2023 |
TIMESTAMP | Date and time values | 2024-06-08 17:28:00 |
BINARY | Binary data encoded as base64 or hex | SGVsbG8= (base64), 48656c6c6f (hex) |
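To see how sample values line up with the table above, here is a rough Python sketch that checks whether a raw CSV string fits a given type. The accepted formats are assumptions inferred from the example values in the table (for instance, two date layouts and case-insensitive booleans); Vendia's actual parser may accept more or fewer forms. All names are hypothetical.

```python
import base64
import re
from datetime import datetime

INT32 = (-2**31, 2**31 - 1)
INT64 = (-2**63, 2**63 - 1)
# Layouts inferred from the example values; assumed, not confirmed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M:%S",)


def _in_int_range(value, bounds):
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        return False
    return bounds[0] <= int(value) <= bounds[1]


def _parses_as(value, formats):
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def _is_float(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_binary(value):
    # Accept the value if it decodes as strict base64 or as hex.
    decoders = (lambda v: base64.b64decode(v, validate=True), bytes.fromhex)
    for decode in decoders:
        try:
            decode(value)
            return True
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
    return False


CHECKS = {
    "STRING": lambda v: True,
    "INTEGER": lambda v: _in_int_range(v, INT32),
    "LONG": lambda v: _in_int_range(v, INT64),
    "FLOAT": _is_float,
    "BOOLEAN": lambda v: v.lower() in {"true", "false", "1", "0"},
    "DATE": lambda v: _parses_as(v, DATE_FORMATS),
    "TIMESTAMP": lambda v: _parses_as(v, TIMESTAMP_FORMATS),
    "BINARY": _is_binary,
}


def matches_type(value, type_name):
    """Return True if the raw CSV string fits the named column type."""
    return CHECKS[type_name](value)


if __name__ == "__main__":
    for value, type_name in [("123", "INTEGER"), ("1/18/2023", "DATE")]:
        print(value, type_name, matches_type(value, type_name))
```

The check is deliberately per-type rather than an inference of the "best" type, because values such as `1` and `0` fit INTEGER, LONG, FLOAT, and BOOLEAN alike.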
Example Configuration
Here’s an example of a typical S3 connection configuration:
Field | Example Value |
---|---|
Name | Production Data Bucket |
Role ARN | arn:aws:iam::123456789012:role/VendiaS3AccessRole |
S3 Bucket Name | my-company-data-bucket |
Bucket Region | us-east-1 |
Encryption Key | (optional, only if using CSE) |
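The fields above can be sanity-checked before they are entered in the UI. The following Python sketch models the configuration as a dataclass and flags values that do not look right. The class, its field names, and the patterns are illustrative assumptions, not part of a Vendia API; the patterns only check the general shape of an IAM role ARN and an AWS region name.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Shape checks only: they do not confirm that the role or region exists.
ROLE_ARN = re.compile(r"arn:aws[a-z-]*:iam::[0-9]{12}:role/.+")
REGION = re.compile(r"[a-z]{2}(-[a-z]+)+-[0-9]")


@dataclass
class S3Connection:
    name: str
    role_arn: str
    bucket_name: str
    bucket_region: str
    encryption_key: Optional[str] = None  # only needed when using CSE

    def problems(self) -> List[str]:
        """Return a list of human-readable problems; empty means none found."""
        found = []
        if not self.name.strip():
            found.append("Name is empty")
        if not ROLE_ARN.fullmatch(self.role_arn):
            found.append("Role ARN does not look like an IAM role ARN")
        if not self.bucket_name.strip():
            found.append("S3 Bucket Name is empty")
        if not REGION.fullmatch(self.bucket_region):
            found.append("Bucket Region does not look like an AWS region")
        return found


if __name__ == "__main__":
    connection = S3Connection(
        name="Production Data Bucket",
        role_arn="arn:aws:iam::123456789012:role/VendiaS3AccessRole",
        bucket_name="my-company-data-bucket",
        bucket_region="us-east-1",
    )
    print(connection.problems())
```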
Best Practices
- Use IAM roles instead of access keys for enhanced security
- Apply the principle of least privilege when setting up IAM permissions
- Ensure the Bucket Region you specify is the region where the S3 bucket was created
- Consider using S3 bucket policies to further restrict access
- Test the connection with a small sample file first
Troubleshooting
If you encounter connection issues:
- Verify the trust relationship includes the Vendia account IDs (provided in the product UI)
- Confirm the IAM role has the required S3 permissions
- Check that the bucket region matches the specified region
- Ensure the bucket name is spelled correctly
- Verify that the objects you want to ingest exist in the bucket
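Two of these checks, the trusted account IDs and the bucket region, can be scripted. The sketch below assumes the third-party `boto3` SDK is installed and that your own AWS credentials are allowed to call `iam:GetRole` and `s3:GetBucketLocation`. The helper names are hypothetical. The pure helpers work without `boto3`, which is imported only inside `run_checks`.

```python
import json
import re
from urllib.parse import unquote


def _as_list(value):
    """IAM policy fields may hold one item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def trusted_account_ids(trust_policy):
    """Collect account IDs whose root principal may call sts:AssumeRole."""
    ids = set()
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):  # e.g. the wildcard "*"
            continue
        for entry in _as_list(principal.get("AWS", [])):
            # Matches a root ARN or a bare 12-digit account ID.
            match = re.fullmatch(
                r"(?:arn:aws[a-z-]*:iam::)?([0-9]{12})(?::root)?", entry
            )
            if match:
                ids.add(match.group(1))
    return ids


def missing_accounts(trust_policy, expected_ids):
    """Return the expected account IDs that the trust policy does not list."""
    return set(expected_ids) - trusted_account_ids(trust_policy)


def bucket_region(location_constraint):
    """Map a GetBucketLocation result to a region name."""
    if location_constraint is None:  # S3 reports us-east-1 as null
        return "us-east-1"
    if location_constraint == "EU":  # legacy alias for eu-west-1
        return "eu-west-1"
    return location_constraint


def run_checks(role_name, bucket, expected_region, expected_ids):
    """Query AWS and report trust-policy and region mismatches."""
    import boto3  # third-party AWS SDK, imported lazily

    role = boto3.client("iam").get_role(RoleName=role_name)["Role"]
    policy = role["AssumeRolePolicyDocument"]
    if isinstance(policy, str):  # handle a URL-encoded JSON document
        policy = json.loads(unquote(policy))
    location = boto3.client("s3").get_bucket_location(Bucket=bucket)
    actual_region = bucket_region(location.get("LocationConstraint"))
    return {
        "missing_trusted_accounts": sorted(missing_accounts(policy, expected_ids)),
        "region_matches": actual_region == expected_region,
    }
```

A call such as `run_checks("VendiaS3AccessRole", "my-company-data-bucket", "us-east-1", [...])`, with the account IDs from the product UI, returns an empty `missing_trusted_accounts` list and `region_matches: True` when both settings line up.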
Next Steps
After successfully connecting to your S3 bucket, you can:
- Browse and select specific CSV files to ingest
- Configure CSV parsing options (headers, delimiters, etc.)
- Set up data transformations and mappings