Data Transformations
Data Transformations in Analytical Data Sharing
Vendia’s analytical data sharing platform provides powerful, no-code capabilities for transforming and preparing your data before sharing it with partners or consumers. These features help you meet governance requirements, protect sensitive information, and create targeted data products.
Transformation Types
Vendia supports two main types of data transformations:
- Column-Level Transformations: Control data visibility and apply masking to protect sensitive information
- Row-Level Transformations: Filter data to create targeted datasets based on specific conditions
Transformation Workflows
Vendia supports flexible transformation workflows to meet different data sharing needs:
- Transform During Ingestion: Apply filtering and masking as data is ingested from source systems
- Ingest Then Transform: Ingest all data first, then create new filtered/masked tables as needed
Important: Column-level and row-level filtering rules must be configured during data ingestion or table creation and cannot be modified after a table has been created.
Column-Level Transformations
Vendia provides fine-grained control over data visibility and access through column-level filtering rules and permissions. These features help ensure data governance, privacy compliance, and secure data sharing.
Column Masking
Apply dynamic masking to sensitive columns to protect personally identifiable information (PII) or confidential business data.
Setting Up Column Masking
Configure column masking during the data ingestion process or when creating a new Vendia table:
- When to Configure: During data ingestion or table creation
- Select Columns: Choose which columns need masking protection
- Choose Column Action:
- Allow: All columns are allowed by default (no masking)
- Exclude: Exclude this column from the table entirely
- Mask entire value: Replace the entire value with asterisks (*)
- Mask everything except the last: Replace all but the last specified number of characters with asterisks
- Replace entire value with: Replace the entire value with a custom provided value
Example Masking Scenarios
-- Original dataemail: john.doe@company.comphone: +1-555-123-4567customer_id: 12345
-- After applying column actionsemail: ****************** (mask entire value)phone: +1-555-***-**** (mask everything except the last 4 characters)customer_id: [CUSTOMER_ID] (replace entire value with custom text)-- Note: Some columns may be excluded entirely and not appear in the result
Column Inclusion and Exclusion Rules
Control which columns are visible in your data tables through inclusion and exclusion actions:
- Allow: Include the column in the table with original values (default behavior)
- Exclude: Remove the column entirely from the table
Column-Level Best Practices
- Plan Thoroughly: Since column rules cannot be changed after table creation, invest time in planning before ingestion
- Principle of Least Privilege: Only grant access to columns that consumers actually need
- Test Before Production: Always validate masking and visibility rules during the ingestion preview phase
- Documentation: Maintain clear documentation of column access decisions for future reference
- Compliance Review: Ensure column rules meet regulatory requirements before finalizing table creation
Row-Level Transformations
Vendia allows you to define row-level filtering rules to control which rows of data are shared in your table. These filters help you create targeted datasets by restricting data access based on specific attribute values and conditions.
Setting Up Row-Level Filters
Configure row-level filtering during the data ingestion process or when creating a new Vendia table. Each filter consists of three components:
- Attribute: Select the column/field you want to filter on
- Operator: Choose the comparison operator for your filter condition
- Constraint: Specify the value to compare against
Available Operators
- Equal to: Exact match (=)
- Not equal to: Exclude exact matches (≠)
- Greater than: Values greater than the constraint (>)
- Greater than or equal to: Values greater than or equal to the constraint (≥)
- Less than: Values less than the constraint (<)
- Less than or equal to: Values less than or equal to the constraint (≤)
- Like: Pattern matching with wildcards
- Not like: Exclude pattern matches
- In: Match any value in a list
- Not in: Exclude any value in a list
- Null: Include only null/empty values
- Not null: Exclude null/empty values
Example Row-Level Filters
Basic Filtering Examples
-- Filter for specific marketing opt-in statusWHERE MARKETINGOPTIN_AIRLINE = 'true'
-- Filter for customers in specific regionsWHERE REGION = 'North America'
-- Filter for recent transactionsWHERE ORDER_DATE >= '2024-01-01'
-- Filter for high-value customersWHERE CUSTOMER_TIER IN ('Premium', 'VIP')
-- Exclude test dataWHERE CUSTOMER_TYPE != 'TEST'
-- Filter for active accounts onlyWHERE STATUS IS NOT NULL AND STATUS != 'INACTIVE'
Advanced Filtering Scenarios
-- Combine multiple conditions for targeted customer segmentsWHERE REGION = 'Europe' AND CUSTOMER_TIER = 'Premium' AND LAST_ORDER_DATE >= '2024-06-01'
-- Filter for specific product categories with minimum order valuesWHERE PRODUCT_CATEGORY LIKE '%Electronics%' AND ORDER_VALUE > 100
-- Exclude internal and test dataWHERE CUSTOMER_TYPE NOT IN ('INTERNAL', 'TEST', 'DEMO')
-- Filter for customers with complete profile informationWHERE EMAIL IS NOT NULL AND PHONE IS NOT NULL AND ADDRESS IS NOT NULL
Row-Level Best Practices
- Start Simple: Begin with basic single-condition filters before adding complex multi-condition rules
- Test Thoroughly: Always preview filter results to ensure they produce the expected data subset
- Document Filter Logic: Maintain clear documentation of filter criteria and business rationale
- Consider Performance: Complex filters with multiple conditions may impact query performance
- Plan for Updates: Since filters can’t be changed after table creation, consider future data needs
- Validate Data Quality: Ensure filter attributes have consistent, reliable data for effective filtering
Common Use Cases
Geographic Data Segmentation
-- North American customers onlyWHERE COUNTRY IN ('USA', 'Canada', 'Mexico')
-- European Union compliance filteringWHERE REGION = 'EU' AND GDPR_CONSENT = 'true'
Customer Segmentation
-- High-value customersWHERE LIFETIME_VALUE > 10000 AND STATUS = 'ACTIVE'
-- Recent customer acquisitionWHERE SIGNUP_DATE >= '2024-01-01'
Time-Based Filtering
-- Recent transactions onlyWHERE TRANSACTION_DATE >= '2024-07-01'
-- Exclude historical test dataWHERE CREATED_DATE >= '2023-01-01' AND ACCOUNT_TYPE != 'TEST'
Data Quality Filtering
-- Complete customer records onlyWHERE EMAIL IS NOT NULL AND PHONE IS NOT NULL
-- Exclude system-generated test recordsWHERE ACCOUNT_TYPE NOT IN ('TEST', 'SYSTEM', 'DEMO')
Transformation Workflow Steps
Step 1: Plan Before Ingestion
- Data Classification: Identify sensitive columns (PII, financial data, proprietary information) before ingestion
- Compliance Requirements: Consider GDPR, CCPA, HIPAA, or industry-specific regulations
- Consumer Requirements: Define what data consumers need to see before creating the table
- Filter Criteria: Determine which attributes will be used for row-level filtering
Step 2: Configure During Ingestion/Table Creation
- Set Up Column Actions: Configure column actions in your data connector
- Define Filter Conditions: Create rules that specify which rows to include in your table
- Test Configuration: Preview masked/filtered data to verify actions behave as expected
Step 3: Deploy and Finalize
- Validate Configuration: Ensure transformations meet compliance requirements
- Complete Process: Finalize once all column actions and filters are validated
- Document Decisions: Record transformation decisions for future reference
Example Use Cases
- Create tenant- or customer-specific datasets with custom filters and masking
- Redact sensitive fields before sharing with external partners
- Apply geographic or regulatory compliance filters for region-specific data sharing
- Mask PII while preserving data utility for analytics
Learn More
- Data Joins
- Getting Started with Analytical Data Sharing
- Analytical Data Sharing FAQ
- Contact Vendia Support for specific questions.