How to read this article?
This article explains the vision for the product and the roadmap for its development over the next two years. Please bear in mind that this roadmap can change: with each release (every 3-6 months) we update our plans, and the vision evolves over time with your feedback, requests and the direction the market is going.
Please check out 24.1 release notes.
Below are the key development areas:
Data Governance
Workflows (Q1-Q2 2024)
"Workflows" is an umbrella name for a family of features that will help you manage the authoring of your documentation:
- Drafts (Q2 2024) - Dataedo will introduce a two-tiered system in which objects will be in either a draft or a published state. This will apply to user-created objects, such as domains, subject areas and terms, and to imported objects, such as tables, reports and stored procedures. Draft objects will only be visible to data stewards (editors), and viewers will only be able to see published objects.
- Statuses (released in 24.1) - Building on the foundation of the draft/published system, Dataedo will offer a feature allowing objects to adopt a status from a custom-defined list available in your repository. This enhancement (inspired by Jira) lets users accurately determine the progress phase of a specific object's documentation and identify who is currently responsible for it. Each status will be limited to specific roles, so not everyone will be able to move an object to any status (permissions will probably not be in version 1 of this feature).
- Change suggestion (Q2 2024) - Soon, everyone (with a free viewer license and the right permission) will be able to suggest edits to the description of any asset. These suggestions go to a dashboard where editors decide whether to accept or reject them.
- Request for change (late 2024) - In addition to suggesting edits, users will be able to ask for changes to any object's documentation.
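To make the idea of role-limited statuses more concrete, here is a minimal Python sketch. It is not Dataedo's implementation: the status names, role names, `ALLOWED_TRANSITIONS` table and `CatalogObject` class are all assumptions made for illustration, and for brevity the sketch treats "Published" as just another status rather than a separate draft/published tier.

```python
from dataclasses import dataclass, field

# Hypothetical status list and role rules; real values would come from
# the custom-defined list in your repository.
ALLOWED_TRANSITIONS = {
    ("Draft", "In review"): {"editor", "steward"},
    ("In review", "Published"): {"steward"},
    ("In review", "Draft"): {"steward"},
}


@dataclass
class CatalogObject:
    name: str
    status: str = "Draft"
    history: list = field(default_factory=list)

    def move_to(self, new_status: str, role: str) -> None:
        """Change status if the role is allowed to make this transition."""
        allowed_roles = ALLOWED_TRANSITIONS.get((self.status, new_status))
        if allowed_roles is None:
            raise ValueError(f"No transition from {self.status!r} to {new_status!r}")
        if role not in allowed_roles:
            raise PermissionError(f"Role {role!r} cannot move to {new_status!r}")
        # Keep a simple audit trail of who moved the object and where.
        self.history.append((self.status, new_status, role))
        self.status = new_status


def visible_to_viewer(obj: CatalogObject) -> bool:
    """Viewers only see published objects; editors see everything."""
    return obj.status == "Published"
```

In this sketch an editor can send an object to review, but only a steward can publish it, which mirrors the idea that not every role can move an object to every status.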
Asset Certifications (Q2-Q3 2024)
Trust in data is an important aspect of data governance and data democratization. One way to increase that trust is to have data stewards and data owners certify assets. Dataedo will enable them to provide a certificate for each asset in the catalog, each with a timestamp and signed by a specific person.
AI Data Classifications (Q2 2024)
While Dataedo currently identifies sensitive data based on column names, we're going a step further. Our new feature will scan the actual content of the data to determine its type. This means it can detect whether the data is sensitive, like an email, name, address, etc.
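The difference between name-based and content-based classification can be illustrated with a small Python sketch. This is an assumption-laden toy, not the planned feature: the `PATTERNS` table, the `classify_column` function and the 0.8 match ratio are all hypothetical, and a real classifier would likely rely on a trained model rather than two regular expressions.

```python
import re

# Illustrative patterns only; a production classifier would need many more
# types and likely a trained model rather than regular expressions.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\s\-()]{6,}\d$"),
}


def classify_column(values, min_match_ratio=0.8):
    """Return the label whose pattern matches most non-empty values, or None."""
    samples = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not samples:
        return None
    best_label, best_ratio = None, 0.0
    for label, pattern in PATTERNS.items():
        ratio = sum(1 for s in samples if pattern.match(s)) / len(samples)
        if ratio > best_ratio:
            best_label, best_ratio = label, ratio
    # Only report a classification when enough sampled values agree.
    return best_label if best_ratio >= min_match_ratio else None
```

Because the function looks at sampled values rather than the column name, a column called `contact` that holds email addresses would still be flagged as `email`.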
Steward Hub (2024)
Steward Hub is a suite of features crafted to assist Data Stewards in their daily tasks. With the aid of built-in rules, dictionaries, and sophisticated language models like ChatGPT, the Steward Hub recommends areas of improvement, suggesting documentation fill-ins and potential entity links. The first stage of the feature was released in 24.1, and more functionality will be released throughout 2024 and beyond.
Here's a peek into some of the Steward Hub's capabilities:
- Domains
  - Propose potential domains and subject areas
  - Offer suggestions for creating links from domains to data assets
- Business Glossary
  - Recommend possible terms for creation
  - Show terms by their definition status (e.g., to define, for review, publish)
  - Suggest linking terms to columns, tables, and reports
- Data Dictionary
  - Identify missing descriptions or overly simplistic ones that need refining
  - Propose improved descriptions
- Foreign keys
  - Advise on potential foreign keys, drawing from names and SQL queries/views in the repository
- Data Classification
  - Suggest classifications, enhancing the current Data Classification module
- Reference Data
  - Display lookups according to their definition status
  - Recommend lookups based on column names, data types, and data profiling
  - Suggest how to link defined lookups to other columns in the catalog (e.g., connecting a "Country" lookup to every country column in the repository).
With the Steward Hub, Dataedo reinforces its commitment to streamlining the data documentation process.
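As one example of how such a rule might work, the Python sketch below suggests foreign key candidates purely from column names. It covers only the name-matching part of the idea (not SQL queries or views), and the `suggest_foreign_keys` function and its input format are assumptions made for illustration rather than anything Dataedo has published.

```python
def suggest_foreign_keys(tables):
    """Suggest (child_table, column, parent_table, parent_column) candidates.

    `tables` maps table name -> {"columns": [...], "primary_key": "..."}.
    A column is a candidate when it is named like another table's primary key.
    """
    # Index primary keys by lower-cased name for case-insensitive matching.
    pk_index = {}
    for name, meta in tables.items():
        pk = meta.get("primary_key")
        if pk:
            pk_index.setdefault(pk.lower(), []).append((name, pk))

    suggestions = []
    for name, meta in tables.items():
        for column in meta["columns"]:
            for parent, pk in pk_index.get(column.lower(), []):
                if parent != name:  # a table's own key is not a foreign key
                    suggestions.append((name, column, parent, pk))
    return suggestions
```

Given a `customers` table keyed on `customer_id` and an `orders` table that also has a `customer_id` column, the sketch proposes a link from `orders.customer_id` to `customers.customer_id`. Candidates like this would still need a steward's review before being accepted.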
Data Discovery
We are prioritizing how viewers find and discover metadata in our catalog. That's why we will be redesigning navigation and most lists, forms and diagrams in Dataedo Portal.
Redesigned ERD diagrams (Q1 2024)
We will merge ERD diagrams from Desktop and Portal and enable users to modify and save ERDs directly in the Portal.
ERDs for SQL queries and views (Q2 2024)
Dataedo Portal can currently draw a diagram of lineage for a database view or SQL query. In the future we would like to visualize query joins (including nested queries) in an ER diagram (similarly to how SSMS does it).
Similar objects (Q2 2024)
To make it easier to find the right report, dataset or column, Portal will show similar objects based on various built-in rules.
Data Lineage
Data Lineage automations
See Connectors and Metadata Extraction section.
Connectors and Metadata Extraction
SQL Parsing
Building lineage for insert and update statements in stored procedures.
BI tools
- Qlik Sense (2024)
- Looker (2024)
ETL tools
- AWS Glue (2024)
- Talend (2024)
- Informatica PowerCenter (2024)
Import mechanism improvements (Q2-... 2024)
We are completely redesigning our import mechanism to provide the following improvements:
- Importing multiple databases at once (Q2 2024)
- Profiling and scanning data (Q2 2024) - at import you will be able to automatically profile data or refresh lookups.
Import from Dataedo Portal
Currently, importing metadata is only available with Desktop or command line files. We are working on enabling imports directly in Portal.
Convenient scheduler (Q2-Q3 2024)
We want to make it easier to schedule imports and make sure your metadata is up to date. To help achieve that, Portal will let you easily schedule imports and monitor their status for each source.
Data Profiling
JSON structure discovery (Q2 2024)
With the advent of Big Data, relational databases have abandoned their strict data models in many places, and developers are saving unstructured data in text/JSON columns. Understanding the structure of documents in those columns is as crucial as understanding the columns in a table. Therefore, we will provide a JSON scanner for columns identified by data stewards (Data Steward Assistant could also suggest such fields). This scanner will extract the schema of the JSON document and create a linked
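Schema extraction of this kind can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: `infer_schema` and `scan_column` are hypothetical names, the path notation (`$.key`, `[]` for array items) is our own choice, and the planned scanner may work quite differently.

```python
import json


def infer_schema(value, path="$", schema=None):
    """Collect the set of JSON types seen at each path of a document."""
    if schema is None:
        schema = {}
    if isinstance(value, dict):
        kind = "object"
    elif isinstance(value, list):
        kind = "array"
    elif isinstance(value, bool):  # check bool before int: bool is an int subclass
        kind = "boolean"
    elif isinstance(value, (int, float)):
        kind = "number"
    elif isinstance(value, str):
        kind = "string"
    else:
        kind = "null"
    schema.setdefault(path, set()).add(kind)
    if isinstance(value, dict):
        for key, child in value.items():
            infer_schema(child, f"{path}.{key}", schema)
    elif isinstance(value, list):
        for child in value:
            infer_schema(child, f"{path}[]", schema)
    return schema


def scan_column(raw_values):
    """Merge the schemas of every parseable JSON document in a column."""
    schema = {}
    for raw in raw_values:
        try:
            infer_schema(json.loads(raw), schema=schema)
        except (TypeError, ValueError):
            continue  # skip NULLs and values that are not valid JSON
    return schema
```

Merging types per path across many rows exposes fields whose type varies between documents (for example an `id` stored sometimes as a number and sometimes as a string), which is exactly the kind of detail that is hard to see by reading individual rows.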
Different "perspectives" for tables (Q2-... 2024)
Some tables hold multiple entities, or entities in different states.
Take the example of an orders table. An order marked as 'APPROVED' requires an order date to be set, while a draft order does not. When analyzing such a table, it's beneficial to view the distribution and null values of the order_date column separately for approved and draft orders.
Dataedo is introducing a feature that lets users create different "perspectives" on a single table. This helps us get a clearer picture of what data tables actually represent. These perspectives will also be a foundation for an upcoming Data Quality module.
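The orders example can be worked through in Python to show why a single profile of the whole table can mislead. Everything here is illustrative: the sample rows are made up, and `null_rate` and the `perspectives` dictionary of row filters are hypothetical names, not part of Dataedo.

```python
def null_rate(rows, column, predicate=lambda row: True):
    """Share of rows matching `predicate` where `column` is None."""
    subset = [row for row in rows if predicate(row)]
    if not subset:
        return None
    return sum(1 for row in subset if row.get(column) is None) / len(subset)


# Made-up sample data for the orders table.
orders = [
    {"status": "APPROVED", "order_date": "2024-01-05"},
    {"status": "APPROVED", "order_date": "2024-01-09"},
    {"status": "DRAFT", "order_date": None},
    {"status": "DRAFT", "order_date": "2024-02-01"},
]

# Hypothetical "perspectives": named row filters over the same table.
perspectives = {
    "all orders": lambda row: True,
    "approved": lambda row: row["status"] == "APPROVED",
    "draft": lambda row: row["status"] == "DRAFT",
}

# Profile the same column once per perspective.
profile = {
    name: null_rate(orders, "order_date", predicate)
    for name, predicate in perspectives.items()
}
```

On this sample, the table-wide null rate of `order_date` is 25%, which hides the fact that approved orders have no nulls at all while half of the draft orders do.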
Dashboards & Analytics (Q2-... 2024)
We are planning to deliver a number of built-in dashboards:
- Metadata Ingestion Dashboard - Dashboard that helps track status, time and number of objects imported daily.
- Dataedo Usage Dashboard - Dashboard that shows usage of Dataedo - daily users, objects visited, searches and edits daily.
- Data Stewardship Dashboard - Dashboard for Data Stewards and DG management that shows documentation progress (broken down by different metadata types), weekly and monthly edits, and steward activity.
- Data Catalog Dashboard - Dashboard that shows summary and statistics of data assets in the catalog.
- Data Profiling Dashboard - High level overview of data profiled in the catalog and each data source.
- Data Quality Dashboard - See Data Quality section.
Data Quality (Q2-... 2024)
- Data Quality rules:
  - Simple column rules: value less/more/between, string length is less/more/between, column is not null, value fits pattern
  - Number of rows rules
  - Custom SQL rules that let users write custom tests in SQL
  - Foreign Key rules: DQ module will perform referential integrity tests from defined foreign keys/relationships in bulk
  - Unique Key rules: DQ module will perform uniqueness tests from defined primary/unique keys in bulk
  - Reference Data rules: DQ module will test if values match defined lookup in Reference Data module
- Thresholds - user will be able to provide values/thresholds for Pass/Warning/Error results.
- Filters - option to define filters for tables to test specific subset of rows (e.g. approved documents, transactions after 2020, etc.).
- Testing engine - Part of Dataedo DQ will be
- Tests log - Test logs will be saved in the DQ repository (open SQL database).
- Data Quality dashboard - presents high level overview of tests and quality over time with the ability to drill into specific tables, columns and tests.
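To show how simple column rules, filters and thresholds could fit together, here is a short Python sketch. It is a sketch under assumptions, not the planned testing engine: the rule helpers, the `run_rule` function and the `warn_below`/`error_below` parameters are hypothetical names chosen for this example.

```python
import re


def not_null(value):
    """Rule: column is not null."""
    return value is not None


def between(low, high):
    """Rule factory: value lies in the inclusive range [low, high]."""
    return lambda value: value is not None and low <= value <= high


def matches(pattern):
    """Rule factory: the whole value fits the regular expression."""
    compiled = re.compile(pattern)
    return lambda value: value is not None and compiled.fullmatch(str(value)) is not None


def run_rule(rows, column, check, row_filter=None, warn_below=1.0, error_below=0.9):
    """Apply `check` to `column` and grade the pass rate against thresholds."""
    tested = [row for row in rows if row_filter is None or row_filter(row)]
    if not tested:
        # Nothing matched the filter, so there is nothing to grade.
        return {"rows": 0, "pass_rate": None, "result": None}
    passed = sum(1 for row in tested if check(row.get(column)))
    pass_rate = passed / len(tested)
    if pass_rate < error_below:
        result = "Error"
    elif pass_rate < warn_below:
        result = "Warning"
    else:
        result = "Pass"
    return {"rows": len(tested), "pass_rate": pass_rate, "result": result}
```

A call such as `run_rule(rows, "amount", not_null, row_filter=lambda r: r["status"] == "APPROVED")` tests only approved rows, which corresponds to the filter option above. Grading by pass rate rather than a hard pass/fail is one possible reading of the Pass/Warning/Error thresholds.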
Exports and Integrations
MS Teams and Slack integrations (sometime in 2024?)
We would like to make Dataedo more interactive, and one of the ways to do that will be integration with your messaging tool - MS Teams or Slack.
Dataedo API (sometime in 2024)
The Dataedo API will make integration easier. It will consist of REST API methods and will let you define webhooks to get notified about events and changes in Dataedo.
Excel export replacement (Q1 2024)
Excel export will be decommissioned, and we will build in a number of grid views that provide an option to copy or save to Excel.
Deployment and administration
Repository on PostgreSQL (sometime in 2024)
To make Dataedo hosting easier, we are working on a repository hosted on PostgreSQL.
Desktop authentication with Portal users (Q2-Q3 2024) and permissions in Desktop (late 2024)
Dataedo Desktop will enable connecting to the repository using the same account as Dataedo Portal does. This will pave the way for permissions in Desktop.