Metadata Scanning Overview

Piotr Kononow, Dataedo Team, 20th September, 2023

Types of connectors

Native connectors

Dataedo ships with a number of native connectors that connect to specific technologies (databases, applications, BI tools, ETL tools, etc.) and extract metadata.

  1. Database connectors - SQL Server, Snowflake, PostgreSQL and more
  2. BI connectors - Power BI, Tableau, SSRS and more
  3. Application connectors - Microsoft Dynamics 365, Salesforce
  4. Metadata repository connectors - AWS Glue Data Catalog, Apache Hive Metastore, Microsoft Dataverse
  5. File connectors - Apache Parquet, Delta Lake, CSV, JSON, and more
  6. ETL connectors - Azure Data Factory, SSIS, dbt
  7. Storage access - Local disk, FTP/SFTP, Amazon S3, Azure Blob Storage/Data Lake Storage

Browse full list of connectors

ODBC connector

ODBC (Open Database Connectivity) is a standard way of accessing databases. Most databases, applications and other sources provide ODBC drivers that you install and configure on your workstation. Once the driver is configured, select the Dataedo ODBC connector and metadata will be imported from your source. Please note that the metadata available through most ODBC drivers is limited.

Learn more about ODBC connector

Custom SQL Connectors (from 23.2)

In version 23.2, we are introducing a mechanism for creating custom connectors for data sources that support SQL and expose metadata through data dictionary tables (as most relational databases do). A custom connector is defined by a series of predefined SQL queries that extract metadata.

Furthermore, these connectors have the capability to support data profiling, reference data, and primary key/foreign key tester functionalities.

As of now, the custom connectors are crafted exclusively by the Dataedo team. However, we are working towards empowering users to create their own custom connectors in forthcoming updates.

Learn more about custom SQL connectors

Metadata import with interface tables

Dataedo lets users import various metadata into the catalog using interface tables: a set of predefined tables where users can upload metadata they have extracted on their own and then run an import that loads it safely into the repository.

Learn more about interface tables

SQL DDL imports

Sometimes you cannot allow third-party software to connect to your database, but you can dump the database structure as a set of SQL DDL scripts (create statements). Dataedo can import the schema from DDL scripts written in selected SQL dialects.

Learn more about importing metadata from DDL scripts

Metadata extraction techniques

Dataedo uses the following techniques to extract metadata from your source:

Scanning data dictionary tables

Most databases and data platforms provide System Catalog / Data Dictionary tables. Dataedo scans those tables to identify tables, columns and other structures.
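As a rough illustration of what scanning a data dictionary looks like, the sketch below reads SQLite's built-in catalog (`sqlite_master` and `PRAGMA table_info`). SQLite is used only because it runs without setup; the function name `scan_schema` and the sample tables are hypothetical, and these are not the queries Dataedo itself runs.

```python
import sqlite3


def scan_schema(conn):
    """Return {table_name: [(column_name, declared_type, is_primary_key), ...]}."""
    schema = {}
    # sqlite_master is SQLite's catalog table; other databases expose
    # similar information through their own system views.
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [
            (name, col_type, bool(pk)) for _, name, col_type, _, _, pk in columns
        ]
    return schema


# Hypothetical sample database, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

schema = scan_schema(conn)
for table, columns in schema.items():
    print(table, columns)
```

On other platforms the same idea would typically be applied to views such as `information_schema.tables` and `information_schema.columns`, where available.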

Calling APIs

Some data sources expose their metadata through APIs.

Sampling documents

Sources like document stores (MongoDB, for instance) require sampling documents to identify their structure.
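The idea of inferring structure from a sample can be sketched as follows. This is a minimal illustration using plain Python dictionaries as stand-ins for documents; the function `infer_structure`, the path notation (`address.city`, `tags[]`) and the sample data are assumptions made for this example, not Dataedo's actual algorithm.

```python
from collections import defaultdict


def infer_structure(documents):
    """Map each field path to the sorted type names seen across the sample."""
    fields = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            # Nested document: extend the path with each key.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array: record the element types under "path[]".
            for item in value:
                walk(item, f"{path}[]")
        else:
            fields[path].add(type(value).__name__)

    for doc in documents:
        walk(doc, "")
    return {path: sorted(types) for path, types in sorted(fields.items())}


# Hypothetical sample; documents in the same collection may differ in shape.
sample = [
    {"_id": 1, "name": "Ann", "address": {"city": "Gdansk"}, "tags": ["a", "b"]},
    {"_id": 2, "name": "Bob", "age": 41, "address": {"city": None}},
]

structure = infer_structure(sample)
for path, types in structure.items():
    print(path, types)
```

Because only a sample is read, fields that occur in unsampled documents will not appear in the result, which is the usual trade-off of this technique.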

Parsing report/ETL code

To extract metadata from BI or ETL tools, Dataedo needs to parse the code of reports and packages.

SQL Parsing

To build lineage for SQL code Dataedo parses SQL to identify objects used, their columns and data movement.
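To make the notion of table-level lineage concrete, here is a deliberately naive sketch that uses regular expressions instead of a real SQL parser. It only handles a simple `INSERT INTO ... SELECT` statement; subqueries, CTEs, quoted identifiers and column-level lineage would need a proper parser. The function name `table_lineage` and the sample statement are hypothetical.

```python
import re


def table_lineage(sql):
    """Return (target_table, sorted source tables) for a simple INSERT ... SELECT."""
    # The table being written to.
    target = re.search(r"\binsert\s+into\s+([\w.]+)", sql, re.IGNORECASE)
    # Tables being read: whatever follows FROM or JOIN.
    sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE)
    return (target.group(1) if target else None, sorted(set(sources)))


# Hypothetical statement used only for the demonstration.
sql = """
INSERT INTO dw.sales_summary (customer_id, total)
SELECT c.id, SUM(o.total)
FROM staging.orders o
JOIN staging.customer c ON c.id = o.customer_id
GROUP BY c.id
"""

lineage = table_lineage(sql)
print(lineage)
```

The result pairs one target with the sources that feed it, which is the basic shape of a data movement edge in a lineage graph.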

Learn more about SQL parsing

Data profiling

To extract information about column data profile (min-max, average, number of nulls, etc.) Dataedo executes a number of predefined data profiling SQL queries.
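A profiling query of this kind might look like the sketch below, again using an in-memory SQLite database purely for convenience. The function `profile_column`, the table and the values are assumptions for illustration; Dataedo's own predefined queries are not shown in this article.

```python
import sqlite3


def profile_column(conn, table, column):
    """Compute row count, null count, min, max and average for one column."""
    # Identifiers are interpolated into the query, so pass only trusted names.
    row = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), MIN("{column}"), '
        f'MAX("{column}"), AVG("{column}") FROM "{table}"'
    ).fetchone()
    total, non_null, minimum, maximum, average = row
    return {
        "rows": total,
        "nulls": total - non_null,  # COUNT(column) skips NULL values
        "min": minimum,
        "max": maximum,
        "avg": average,
    }


# Hypothetical sample data with one NULL value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (total) VALUES (?)",
    [(10.0,), (30.0,), (None,), (20.0,)],
)

profile = profile_column(conn, "orders", "total")
print(profile)
```

A single aggregate query per column keeps the number of table scans low, which matters when profiling large tables.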

Learn more about Data Profiling
