Ingesting Data from Cloudera CDP Hive

Connecting to Cloudera CDP Hive

To connect to Cloudera CDP Hive, you need to provide the following information:

  • Name: A friendly name for the connection, so you can identify it and reuse it when ingesting additional tables
  • Hive JDBC URL: The URL used to connect to the Hive service in your Cloudera CDP environment. It typically follows the format jdbc:hive2://<hostname>:<port>/<database>;transportMode=http;ssl=true;httpPath=cliservice
  • Atlas URL: The URL used to connect to the Atlas service in your Cloudera CDP environment. It typically follows the format http://<hostname>:<port>/api/atlas/v2
  • Username: The username for the Cloudera account you want to connect with (Cloudera SSO authentication is supported)
  • Password: The password for the Cloudera account you want to connect with
  • Database: The Hive database containing the tables you want to ingest
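
As a minimal sketch, the two URL formats described above can be assembled from their parts as follows. The function names and the default values for httpPath and the Atlas scheme are assumptions made for illustration; they are not part of Vendia or Cloudera tooling, and your environment may require different parameters.

```python
def build_hive_jdbc_url(hostname: str, port: int, database: str,
                        http_path: str = "cliservice") -> str:
    """Assemble a Hive JDBC URL in the HTTP-transport format shown above.

    Hypothetical helper: the parameter names and the "cliservice" default
    mirror the typical format only; confirm them for your environment.
    """
    if not hostname or not database:
        raise ValueError("hostname and database are required")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return (
        f"jdbc:hive2://{hostname}:{port}/{database}"
        f";transportMode=http;ssl=true;httpPath={http_path}"
    )


def build_atlas_url(hostname: str, port: int, scheme: str = "http") -> str:
    """Assemble an Atlas URL in the format shown above (hypothetical helper)."""
    if not hostname:
        raise ValueError("hostname is required")
    return f"{scheme}://{hostname}:{port}/api/atlas/v2"


# Placeholder host names; substitute the values for your own environment.
hive_url = build_hive_jdbc_url("hive-server.company.com", 443, "default")
atlas_url = build_atlas_url("atlas-server.company.com", 21000)
```

Building the URLs from separate parts makes it harder to drop a required parameter such as transportMode or httpPath when copying a URL by hand.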

Prerequisites

Before connecting to Cloudera CDP Hive, ensure that:

  • Your Cloudera CDP environment is accessible from Vendia
  • You have valid credentials with appropriate permissions
  • The Hive and Atlas services are running and accessible
  • Network and firewall rules allow traffic to the ports specified in the Hive JDBC URL and the Atlas URL

Required Permissions

The user account connecting to Cloudera CDP Hive must have the following permissions:

  • Read access to the Hive database and tables you want to ingest
  • Access to the Atlas service for metadata retrieval
  • Permission to execute queries on the specified Hive database

Example Configuration

Here’s an example of a typical Cloudera CDP Hive connection configuration:

| Field | Example Value |
| --- | --- |
| Name | Production Hive Environment |
| Hive JDBC URL | jdbc:hive2://hive-server.company.com:443/default;transportMode=http;ssl=true;httpPath=cliservice |
| Atlas URL | http://atlas-server.company.com:21000/api/atlas/v2 |
| Username | hive-user |
| Password | **** |
| Database | analytics_db |

Vendia Supported and Unsupported Cloudera Hive Data Types

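| Supported Cloudera Hive Data Types | Unsupported Cloudera Hive Data Types |
| --- | --- |
| BIGINT | ARRAY |
| BINARY | CHAR |
| BOOLEAN | INTERVAL |
| DATE | MAP |
| DECIMAL | STRUCT |
| DOUBLE | UNIONTYPE |
| DOUBLE PRECISION | VARCHAR |
| FLOAT | |
| INT | |
| INTEGER | |
| NUMERIC | |
| SMALLINT | |
| STRING | |
| TIMESTAMP | |
| TINYINT | |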
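
Before ingesting a table, it can help to check its column types against the two lists above. The following is a minimal sketch: the function names, the normalization of parameterized types such as decimal(10,2), and the "unknown" bucket for types that appear in neither list are all assumptions made for illustration, not Vendia behavior.

```python
import re

# Type names copied from the supported/unsupported lists above.
SUPPORTED = frozenset({
    "BIGINT", "BINARY", "BOOLEAN", "DATE", "DECIMAL", "DOUBLE",
    "DOUBLE PRECISION", "FLOAT", "INT", "INTEGER", "NUMERIC",
    "SMALLINT", "STRING", "TIMESTAMP", "TINYINT",
})
UNSUPPORTED = frozenset({
    "ARRAY", "CHAR", "INTERVAL", "MAP", "STRUCT", "UNIONTYPE", "VARCHAR",
})


def base_type(declared: str) -> str:
    """Reduce a declared type to its base name.

    Assumption: parameters in (...) or <...> do not affect support, so
    "decimal(10,2)" becomes "DECIMAL" and "array<string>" becomes "ARRAY".
    """
    head = re.split(r"[(<]", declared.strip(), maxsplit=1)[0]
    return " ".join(head.upper().split())


def classify_columns(columns: dict[str, str]) -> dict[str, list[str]]:
    """Group column names by whether their type is listed as supported."""
    result = {"supported": [], "unsupported": [], "unknown": []}
    for name, declared in columns.items():
        type_name = base_type(declared)
        if type_name in SUPPORTED:
            result["supported"].append(name)
        elif type_name in UNSUPPORTED:
            result["unsupported"].append(name)
        else:
            # Not in either list: flag it for manual review.
            result["unknown"].append(name)
    return result


# Hypothetical schema used only to illustrate the check.
report = classify_columns({
    "order_id": "bigint",
    "amount": "decimal(10,2)",
    "tags": "array<string>",
    "country_code": "varchar(2)",
})
```

Columns reported as unsupported or unknown are candidates for casting to a supported type (for example, VARCHAR to STRING) in a Hive view before ingestion.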

Best Practices

  • Use secure connections (SSL/TLS) when connecting to Cloudera CDP environments
  • Ensure proper network security and firewall configurations
  • Test connectivity with a small subset of data before ingesting large tables
  • Use service accounts with minimal required permissions for production environments

Troubleshooting

If you encounter connection issues:

  1. Verify the Hive JDBC URL format and ensure all parameters are correct
  2. Confirm the Atlas URL is accessible and the service is running
  3. Check network connectivity to both Hive and Atlas services
  4. Validate user credentials and permissions in Cloudera
  5. Ensure the specified database exists and is accessible
  6. Review Cloudera logs for any authentication or authorization errors
  7. Test the connection from a tool like Beeline to verify JDBC connectivity
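
Steps 1 through 3 above can be partly scripted. The sketch below extracts the host and port from each URL and attempts a plain TCP connection. The function names are hypothetical, and the parser assumes the single-host <hostname>:<port> format shown earlier; it does not handle other JDBC URL variants. A successful TCP connection shows only that the port is reachable, not that authentication or the service itself works.

```python
import socket
from urllib.parse import urlsplit


def hive_endpoint(jdbc_url: str) -> tuple[str, int]:
    """Extract (host, port) from a jdbc:hive2://<hostname>:<port>/... URL."""
    prefix = "jdbc:hive2://"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with jdbc:hive2://")
    # Drop the ;key=value session parameters, then the /<database> path.
    authority = jdbc_url[len(prefix):].split(";", 1)[0].split("/", 1)[0]
    host, sep, port = authority.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected <hostname>:<port> in the JDBC URL")
    return host, int(port)


def atlas_endpoint(atlas_url: str) -> tuple[str, int]:
    """Extract (host, port) from an Atlas URL, defaulting the port by scheme."""
    parts = urlsplit(atlas_url)
    if parts.hostname is None:
        raise ValueError("Atlas URL has no hostname")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example usage (placeholder URL; run from a host with the same network access):
#   host, port = hive_endpoint("jdbc:hive2://hive-server.company.com:443/default;ssl=true")
#   print(host, port, is_reachable(host, port))
```

Run the check from a machine with the same network path as the connecting service; a result from a laptop inside the corporate network says little about access from outside it.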

Next Steps

After successfully connecting to your Cloudera CDP Hive environment, you can:

  • Select specific tables to ingest
  • Configure data transformations and mappings
  • Set up incremental data ingestion jobs
  • Schedule regular data synchronization tasks