Ingesting Data from Cloudera CDP Hive
Connecting to Cloudera CDP Hive
To connect to Cloudera CDP Hive, you need to provide the following information:
- Name: A friendly name for your connection to easily identify and reuse it for ingesting additional tables
- Hive JDBC URL: The JDBC URL of the Hive service in your Cloudera CDP environment. It typically follows the format:
jdbc:hive2://<hostname>:<port>/<database>;transportMode=http;ssl=true;httpPath=cliservice
- Atlas URL: The URL of the Atlas service in your Cloudera CDP environment. It typically follows the format:
http://<hostname>:<port>/api/atlas/v2
- Username: The username for the Cloudera account you want to connect with (Cloudera SSO authentication is supported)
- Password: The password for the Cloudera account you want to connect with
- Database: The Hive database containing the tables you want to ingest
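The two URL formats above can be assembled from their parts. The following is a minimal sketch, not part of Vendia's product: the helper names `build_hive_jdbc_url` and `build_atlas_url` are hypothetical, and the defaults (`cliservice`, `http`) are taken from the typical formats shown above. Your environment may use different values.

```python
def build_hive_jdbc_url(hostname: str, port: int, database: str,
                        http_path: str = "cliservice") -> str:
    """Assemble a Hive JDBC URL in the HTTP-transport format shown above.

    Hypothetical helper for illustration only.
    """
    if not hostname or not database:
        raise ValueError("hostname and database are required")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return (
        f"jdbc:hive2://{hostname}:{port}/{database}"
        f";transportMode=http;ssl=true;httpPath={http_path}"
    )


def build_atlas_url(hostname: str, port: int, scheme: str = "http") -> str:
    """Assemble an Atlas URL in the typical format shown above.

    The scheme defaults to http to match the documented format; pass
    "https" if your Atlas endpoint is served over TLS.
    """
    if not hostname:
        raise ValueError("hostname is required")
    return f"{scheme}://{hostname}:{port}/api/atlas/v2"


if __name__ == "__main__":
    print(build_hive_jdbc_url("hive-server.company.com", 443, "default"))
    print(build_atlas_url("atlas-server.company.com", 21000))
```

Building the URLs from separate fields makes it harder to drop a parameter such as `transportMode` or `httpPath` when copying values between environments.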
Prerequisites
Before connecting to Cloudera CDP Hive, ensure that:
- Your Cloudera CDP environment is accessible from Vendia
- You have valid credentials with appropriate permissions
- The Hive and Atlas services are running and accessible
- Network connectivity allows access to the specified ports
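A quick way to check the port-level prerequisites is a plain TCP connection attempt. The sketch below uses only the Python standard library; `port_is_reachable` is a hypothetical helper name. It tests reachability from the machine where it runs, which is not necessarily the network path Vendia uses, so treat a success as a necessary check rather than a sufficient one.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only confirms that something accepts connections on the port.
    It does not verify that the service is Hive or Atlas, nor that
    credentials are valid.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hostnames and ports below are the example values from this page.
    for name, host, port in [
        ("Hive", "hive-server.company.com", 443),
        ("Atlas", "atlas-server.company.com", 21000),
    ]:
        status = "reachable" if port_is_reachable(host, port) else "NOT reachable"
        print(f"{name} {host}:{port} is {status}")
```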
Required Permissions
The user account connecting to Cloudera CDP Hive must have the following permissions:
- Read access to the Hive database and tables you want to ingest
- Access to the Atlas service for metadata retrieval
- Permission to execute queries on the specified Hive database
Example Configuration
Here’s an example of a typical Cloudera CDP Hive connection configuration:
Field | Example Value |
---|---|
Name | Production Hive Environment |
Hive JDBC URL | jdbc:hive2://hive-server.company.com:443/default;transportMode=http;ssl=true;httpPath=cliservice |
Atlas URL | http://atlas-server.company.com:21000/api/atlas/v2 |
Username | hive-user |
Password | **** |
Database | analytics_db |
Vendia Supported and Unsupported Cloudera Hive Data Types
Supported Cloudera Hive Data Types | Unsupported Cloudera Hive Data Types |
---|---|
BIGINT | ARRAY |
BINARY | CHAR |
BOOLEAN | INTERVAL |
DATE | MAP |
DECIMAL | STRUCT |
DOUBLE | UNIONTYPE |
DOUBLE PRECISION | VARCHAR |
FLOAT | |
INT | |
INTEGER | |
NUMERIC | |
SMALLINT | |
STRING | |
TIMESTAMP | |
TINYINT | |
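Before ingesting a table, it can help to compare its column types against the table above. The following sketch assumes you already have a mapping of column names to Hive type strings (for example, copied from `DESCRIBE` output); the function names are hypothetical. It takes a conservative approach and flags any type that is not in the supported list, which includes the types listed as unsupported as well as any type this page does not mention.

```python
import re

# Supported types, copied from the table above.
SUPPORTED_TYPES = {
    "BIGINT", "BINARY", "BOOLEAN", "DATE", "DECIMAL", "DOUBLE",
    "DOUBLE PRECISION", "FLOAT", "INT", "INTEGER", "NUMERIC",
    "SMALLINT", "STRING", "TIMESTAMP", "TINYINT",
}


def base_type(hive_type: str) -> str:
    """Strip parameters such as (10,2) or <INT> and normalize case and spacing."""
    name = re.split(r"[(<]", hive_type.strip(), maxsplit=1)[0]
    return " ".join(name.upper().split())


def unsupported_columns(schema: dict[str, str]) -> dict[str, str]:
    """Return the columns whose base type is not in the supported list."""
    return {
        column: hive_type
        for column, hive_type in schema.items()
        if base_type(hive_type) not in SUPPORTED_TYPES
    }


if __name__ == "__main__":
    # Hypothetical schema for illustration.
    schema = {
        "id": "bigint",
        "price": "decimal(10,2)",
        "tags": "array<string>",
        "code": "varchar(10)",
    }
    for column, hive_type in unsupported_columns(schema).items():
        print(f"{column}: {hive_type} is not in the supported list")
```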
Best Practices
- Use secure connections (SSL/TLS) when connecting to Cloudera CDP environments
- Ensure proper network security and firewall configurations
- Test connectivity with a small subset of data before ingesting large tables
- Use service accounts with minimal required permissions for production environments
Troubleshooting
If you encounter connection issues:
- Verify the Hive JDBC URL format and ensure all parameters are correct
- Confirm the Atlas URL is accessible and the service is running
- Check network connectivity to both Hive and Atlas services
- Validate user credentials and permissions in Cloudera
- Ensure the specified database exists and is accessible
- Review Cloudera logs for any authentication or authorization errors
- Test the connection from a tool like Beeline to verify JDBC connectivity
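The first check in the list, verifying the JDBC URL format, can be partly automated. The sketch below is an illustration under stated assumptions: it handles only the simple single-host format shown on this page (`jdbc:hive2://<hostname>:<port>/<database>;key=value;...`), not every form a Hive JDBC URL can take, and the function names are hypothetical.

```python
def parse_hive_jdbc_url(url: str) -> dict:
    """Split a simple Hive JDBC URL into host, port, database, and parameters."""
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError(f"URL must start with {prefix}")

    # Everything before the first ';' is host:port/database.
    location, *param_parts = url[len(prefix):].split(";")
    authority, _, database = location.partition("/")
    host, _, port = authority.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected <hostname>:<port> after jdbc:hive2://")

    params = {}
    for part in param_parts:
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"parameter without '=': {part!r}")
        params[key] = value

    return {"host": host, "port": int(port), "database": database, "params": params}


def missing_http_params(url: str) -> list[str]:
    """List the parameters from the typical format that the URL lacks."""
    expected = ("transportMode", "ssl", "httpPath")
    params = parse_hive_jdbc_url(url)["params"]
    return [key for key in expected if key not in params]


if __name__ == "__main__":
    url = ("jdbc:hive2://hive-server.company.com:443/default"
           ";transportMode=http;ssl=true;httpPath=cliservice")
    print(parse_hive_jdbc_url(url))
    print("missing:", missing_http_params(url))
```

A URL that parses cleanly can still fail to connect, so this complements the Beeline test rather than replacing it.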
Next Steps
After successfully connecting to your Cloudera CDP Hive environment, you can:
- Select specific tables to ingest
- Configure data transformations and mappings
- Set up incremental data ingestion jobs
- Schedule regular data synchronization tasks