Ingesting Data from Cloudera CDP Hive
Connecting to Cloudera CDP Hive
To connect to Cloudera CDP Hive, you need to provide the following information:
- Name: A friendly name for your connection to easily identify and reuse it for ingesting additional tables
- Hive JDBC URL: The Hive JDBC URL for your Cloudera CDP environment. This URL is used to connect to the Hive service in your Cloudera CDP environment. It typically follows the format
  jdbc:hive2://<hostname>:<port>/<database>;transportMode=http;ssl=true;httpPath=cliservice
- Atlas URL: The Atlas URL for your Cloudera CDP environment. This URL is used to connect to the Atlas service in your Cloudera CDP environment. It typically follows the format
  http://<hostname>:<port>/api/atlas/v2
- Username: The username for the Cloudera account you want to connect with (Supports Cloudera SSO authentication)
- Password: The password for the Cloudera account you want to connect with
- Database: The Hive database containing the tables you want to ingest
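If you assemble these URLs from separate host, port, and database values, a small helper can keep the format consistent. The following is a minimal sketch, assuming the typical formats listed above; `build_hive_jdbc_url` and `build_atlas_url` are hypothetical helper names, not part of Vendia or Cloudera tooling, and your environment may need different parameters.

```python
def build_hive_jdbc_url(hostname: str, port: int, database: str) -> str:
    """Assemble a Hive JDBC URL in the typical HTTP-transport format."""
    return (
        f"jdbc:hive2://{hostname}:{port}/{database}"
        ";transportMode=http;ssl=true;httpPath=cliservice"
    )


def build_atlas_url(hostname: str, port: int, scheme: str = "http") -> str:
    """Assemble an Atlas URL in the typical /api/atlas/v2 format."""
    return f"{scheme}://{hostname}:{port}/api/atlas/v2"
```

The resulting strings can be pasted into the Hive JDBC URL and Atlas URL fields. The `scheme` parameter is an illustrative addition for environments that expose Atlas over HTTPS.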
Prerequisites
Before connecting to Cloudera CDP Hive, ensure that:
- Your Cloudera CDP environment is accessible from Vendia
- You have valid credentials with appropriate permissions
- The Hive and Atlas services are running and accessible
- Network connectivity allows access to the specified ports
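A quick way to check the network prerequisite is to try opening a TCP connection to each service port. The sketch below uses only the Python standard library; `can_reach` and `check_endpoints` are hypothetical helper names. It confirms only that the port accepts connections from the machine where you run it, so a successful result does not by itself prove that Vendia can reach the same endpoint.

```python
import socket
from typing import Dict, Tuple


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(
    endpoints: Dict[str, Tuple[str, int]], timeout: float = 5.0
) -> Dict[str, bool]:
    """Check several named (host, port) pairs and report which are reachable."""
    return {
        name: can_reach(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

For example, `check_endpoints({"hive": ("hive-server.company.com", 443), "atlas": ("atlas-server.company.com", 21000)})` returns a dictionary mapping each name to `True` or `False`.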
Required Permissions
The user account connecting to Cloudera CDP Hive must have the following permissions:
- Read access to the Hive database and tables you want to ingest
- Access to the Atlas service for metadata retrieval
- Permission to execute queries on the specified Hive database
Example Configuration
Here’s an example of a typical Cloudera CDP Hive connection configuration:
| Field | Example Value |
|---|---|
| Name | Production Hive Environment |
| Hive JDBC URL | jdbc:hive2://hive-server.company.com:443/default;transportMode=http;ssl=true;httpPath=cliservice |
| Atlas URL | http://atlas-server.company.com:21000/api/atlas/v2 |
| Username | hive-user |
| Password | **** |
| Database | analytics_db |
Vendia Supported and Unsupported Cloudera Hive Data Types
| Supported Cloudera Hive Data Types | Unsupported Cloudera Hive Data Types |
|---|---|
| BIGINT | ARRAY |
| BINARY | CHAR |
| BOOLEAN | INTERVAL |
| DATE | MAP |
| DECIMAL | UNIONTYPE |
| DOUBLE | VARCHAR |
| DOUBLE PRECISION | |
| FLOAT | |
| INT | |
| INTEGER | |
| NUMERIC | |
| SMALLINT | |
| STRING | |
| STRUCT | |
| TIMESTAMP | |
| TINYINT | |
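Before ingesting a table, it can help to scan its column types against the supported list. The following is a minimal sketch, assuming you already have a mapping of column names to Hive type strings (for example, copied from `DESCRIBE` output); `base_type` and `unsupported_columns` are hypothetical helper names. It inspects only the top-level type of each column, so it makes no claim about how types nested inside a STRUCT are handled.

```python
import re
from typing import Dict

# Supported types, taken from the table above.
SUPPORTED_TYPES = {
    "BIGINT", "BINARY", "BOOLEAN", "DATE", "DECIMAL", "DOUBLE",
    "DOUBLE PRECISION", "FLOAT", "INT", "INTEGER", "NUMERIC",
    "SMALLINT", "STRING", "STRUCT", "TIMESTAMP", "TINYINT",
}


def base_type(hive_type: str) -> str:
    """Drop parameters such as (10,2) or <...> and normalize case and spacing."""
    head = re.split(r"[(<]", hive_type.strip(), maxsplit=1)[0]
    return " ".join(head.upper().split())


def unsupported_columns(schema: Dict[str, str]) -> Dict[str, str]:
    """Return the columns whose top-level type is not in the supported list."""
    return {
        column: hive_type
        for column, hive_type in schema.items()
        if base_type(hive_type) not in SUPPORTED_TYPES
    }
```

Columns returned by `unsupported_columns` are candidates for casting to a supported type (for instance, VARCHAR to STRING) in a Hive view before ingestion.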
Best Practices
- Use secure connections (SSL/TLS) when connecting to Cloudera CDP environments
- Ensure proper network security and firewall configurations
- Test connectivity with a small subset of data before ingesting large tables
- Use service accounts with minimal required permissions for production environments
Troubleshooting
If you encounter connection issues:
- Verify the Hive JDBC URL format and ensure all parameters are correct
- Confirm the Atlas URL is accessible and the service is running
- Check network connectivity to both Hive and Atlas services
- Validate user credentials and permissions in Cloudera
- Ensure the specified database exists and is accessible
- Review Cloudera logs for any authentication or authorization errors
- Test the connection from a tool like Beeline to verify JDBC connectivity
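The first troubleshooting step, verifying the JDBC URL format, can be partly automated. The sketch below compares a URL against the typical format described earlier; `check_hive_jdbc_url` is a hypothetical helper name. Because that format is only typical, a reported difference is a prompt to double-check the value, not proof that the URL is wrong for your environment.

```python
import re
from typing import List

# Matches jdbc:hive2://<hostname>:<port>/<database> followed by ;key=value pairs.
JDBC_PATTERN = re.compile(
    r"^jdbc:hive2://(?P<host>[^:/;]+):(?P<port>\d+)/"
    r"(?P<database>[^;]*)(?P<params>(?:;[^;]+)*)$"
)

# Parameter values from the typical format; an assumption, adjust as needed.
EXPECTED_PARAMS = {"transportMode": "http", "ssl": "true", "httpPath": "cliservice"}


def check_hive_jdbc_url(url: str) -> List[str]:
    """Return a list of differences from the typical format (empty if none)."""
    match = JDBC_PATTERN.match(url.strip())
    if match is None:
        return ["URL does not match jdbc:hive2://<hostname>:<port>/<database>"]

    # Split ";key=value" segments into a dictionary.
    params = dict(
        item.split("=", 1)
        for item in match.group("params").split(";")
        if "=" in item
    )

    problems = []
    if not match.group("database"):
        problems.append("database name is empty")
    for key, value in EXPECTED_PARAMS.items():
        if params.get(key) != value:
            problems.append(f"expected {key}={value}, found {params.get(key)!r}")
    return problems
```

An empty list means the URL has the typical shape; connectivity and credentials still need to be verified separately, for example with Beeline.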
Next Steps
After successfully connecting to your Cloudera CDP Hive environment, you can:
- Select specific tables to ingest
- Configure data transformations and mappings
- Set up incremental data ingestion jobs
- Schedule regular data synchronization tasks