docs: Add Data Catalog section for connectors.
Fabiana Clemente authored and Fabiana Clemente committed Feb 8, 2024
1 parent fda3d77 commit 46ebb1a
Showing 24 changed files with 97 additions and 23 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
24 changes: 24 additions & 0 deletions docs/data_catalog/connectors/create_connector.md
@@ -0,0 +1,24 @@
# How to create a connector in Fabric's Data Catalog?

:fontawesome-brands-youtube:{ .youtube } <a href="https://youtube.com/clip/UgkxVTrEn2jY8GL-wqSXX3PByuUH5Q81Usih?si=xdpQ4eTCo_SEcvxp"><u>How to create a connector to an RDBMS in Fabric?</u></a>

To create a connector in YData Fabric, select the "Connectors" page from the left side menu, as illustrated in the image below.

[ADD HERE THE IMAGE]

Click *"Add Connector"* and a list of connector types to choose from will be displayed.

[Add here the image]

In this example, we will create a connector to an AWS S3 storage.
The credentials/secrets to your storage will be requested. After adding them, you can *"Test connection"*
to ensure that all the details are correct.
A confirmation message, similar to the one shown in the image below, should appear on your screen,
letting you know that you can now save your connector successfully!

[Add here the image]

**Congrats!** 🚀 You have now created your first **Connector**! You can now create different *Datasources*
in your project's Data Catalog.
Get ready for your journey of improved quality data for AI.
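The kind of check the *"Test connection"* button performs starts with making sure the credential payload is complete before any network call is attempted. The sketch below is a minimal illustration of that idea; the field names are hypothetical and do not reflect Fabric's actual schema or internals:

```python
# Minimal sketch of validating an S3-style credential payload before
# testing a connection. Field names are illustrative, not Fabric's schema.
REQUIRED_S3_FIELDS = {"access_key_id", "secret_access_key", "region"}

def validate_s3_credentials(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks complete."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_S3_FIELDS - payload.keys())]
    problems += [f"empty field: {k}"
                 for k in sorted(REQUIRED_S3_FIELDS & payload.keys())
                 if not str(payload[k]).strip()]
    return problems

creds = {"access_key_id": "AKIA...", "secret_access_key": "", "region": "eu-west-1"}
print(validate_s3_credentials(creds))  # ['empty field: secret_access_key']
```

Only once a payload passes checks like these does it make sense to attempt the actual connection against the storage.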

1 change: 0 additions & 1 deletion docs/data_catalog/connectors/file_formats.md

This file was deleted.

20 changes: 15 additions & 5 deletions docs/data_catalog/connectors/index.md
@@ -1,8 +1,18 @@
# Connectors

Introduce the user what are the connectors. Explain the following:
- Our connectors are highly scalable (due to Dask)
- The benefits of an easy and re-usable connection
- Security, etc.
Fabric connectors play an important role in the landscape of data-driven projects, acting as essential components that facilitate the movement and integration of data across different systems, platforms, and applications.
Fabric connectors were designed to offer seamless and easy connectivity for data exchange between disparate data sources (such as databases, cloud storage systems, etc.).

## Benefits

- **Data Integration:** Fabric connectors are primarily used to consume and integrate data from a variety of sources in a single project, ensuring that data can be easily combined, transformed, and made ready for analysis or operational use.
- **Automation of data flows:** They automate the process of data extraction, transformation, and loading (ETL), which is crucial for keeping the data used in a project up to date and accurate.
- **Simplification of data access:** Fabric connectors simplify the process of accessing and using data from specialized or complex systems, making it easier for users without deep technical expertise to leverage data for insights.
- **Enhancement of data security:** Designed to securely manage the credentials and access to your different storages.

## Get started with Fabric Connectors
- :fontawesome-brands-youtube:{ .youtube } <a href="https://youtube.com/clip/UgkxVTrEn2jY8GL-wqSXX3PByuUH5Q81Usih?si=xdpQ4eTCo_SEcvxp"><u>How to create a connector in Fabric?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=1zYreRKsNGE"><u>How to use Object Storage Connectors through Labs?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=1zYreRKsNGE"><u>How to use RDBMS connectors through Labs?</u></a>
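The "easy and re-usable connection" benefit above boils down to a simple pattern: secrets are stored once under a connector name, and every datasource created afterwards refers to that name instead of re-entering credentials. The sketch below illustrates the concept only; the class and field names are hypothetical and are not Fabric's API:

```python
# Conceptual sketch of a named-connector registry: credentials are entered
# once and reused by name. Illustrative only -- not Fabric's actual code.
class ConnectorRegistry:
    def __init__(self):
        self._connectors = {}

    def add(self, name: str, conn_type: str, **secrets):
        """Register a connector once; duplicate names are rejected."""
        if name in self._connectors:
            raise ValueError(f"connector {name!r} already exists")
        self._connectors[name] = {"type": conn_type, "secrets": secrets}

    def get(self, name: str) -> dict:
        """Any later datasource looks the connection up by name."""
        return self._connectors[name]

registry = ConnectorRegistry()
registry.add("marketing-s3", "aws_s3",
             access_key_id="AKIA...", secret_access_key="...")
# Later, a datasource reuses the stored connection by name:
print(registry.get("marketing-s3")["type"])  # aws_s3
```

This is also where the security benefit comes from: secrets live in one managed place rather than being copied into every project that needs the data.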
27 changes: 24 additions & 3 deletions docs/data_catalog/connectors/supported_connections.md
@@ -1,4 +1,25 @@
Mainly leverage information below:
https://www.notion.so/a4ed9105cdf94d3f92da13f7774e45b6?v=b72e059a809d44e5baacfa56b94f17cc&pvs=4
# Supported connections

Also useful to build the docs: https://doc.dataiku.com/dss/latest/connecting/connections.html
Fabric can read and write data from a variety of data sources.

## Connectors

Here is the list of the available connectors in Fabric.

| Connector Name | Type | Supported file types | Notes |
|:---------------------|:--------------:|-------------------------------------:|:------------------------------------------------------------------------------------------------------|
| AWS S3 | Object Storage | `Parquet` `CSV` | |
| Azure Blob Storage   | Object Storage | `Parquet` `CSV`                       |                                                                                                        |
| Azure Data Lake | Object Storage | `Parquet` `CSV` | |
| Google Cloud storage | Object Storage | `Parquet` `CSV` | |
| Upload file | File | `Parquet` `CSV` | Maximum file size is 700MB. <br/>Bigger files should be uploaded and read from <br/>remote object storages |
| Google BigQuery | Big Table | `Not applicable` | |
| MySQL | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |
| Azure SQL Server | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |
| PostgreSQL           | RDBMS          | `Not applicable`                      | Supports reading whole schemas or specifying a query                                                   |
| Snowflake | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |
| Oracle DB | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |

## Haven't found your storage?

To understand our development roadmap or to request prioritization of a new data connector, reach out to us at [ydata.ai/contact-us](https://ydata.ai/contact-us).
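For the RDBMS connectors, "reading whole schemas or specifying a query" corresponds to two familiar access patterns. The sketch below illustrates both with Python's built-in `sqlite3`, standing in for any of the database engines listed above:

```python
import sqlite3

# In-memory database standing in for a remote RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "PT"), (2, "US"), (3, "PT")])

# Pattern 1: read the whole schema -- enumerate every table, then pull each.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['customers']

# Pattern 2: specify a query -- only the rows/columns you need are read.
rows = conn.execute(
    "SELECT id FROM customers WHERE country = 'PT'").fetchall()
print(rows)  # [(1,), (3,)]
```

Specifying a query is usually preferable for large databases, since filtering happens on the database side before any data is transferred.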
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
42 changes: 28 additions & 14 deletions docs/data_catalog/index.md
@@ -1,22 +1,36 @@
# Data Catalog

We need to improve the below narrative Mainly focus on:
In the realm of data management and analysis, the ability to efficiently discover, understand,
and access data is crucial. Fabric's Data Catalog emerges as a pivotal solution in this context,
designed to facilitate an organized, searchable, and accessible repository of metadata.
This chapter introduces the concept, functionality, and advantages of the *Data Catalog* within
Fabric's ecosystem, offering developers a comprehensive overview of its significance and utility.

- What is Fabric's Data Catalog
- What are the benefits for the developers
- What can be found in Fabric's Catalog - Connectors & Datasets (no need to explain them here, only links for the next sections)
To ensure that large volumes of data can be processed through the entire data pipeline,
Fabric is equipped with integrated connectors for various types of storages (from RDBMS to cloud object storage),
guaranteeing the data never leaves your premises. Furthermore, Fabric's Catalog ensures timely and scalable
data analysis, as it runs on top of a distributed architecture powered by Kubernetes and [Dask](https://docs.dask.org/en/stable/).

The narrative of a Data-Centric platform must, as expected, start with the data. Data Sources are the primary way of ingesting and understanding data.
The benefits of Fabric's Data Catalog for data teams are manifold, enhancing not only the efficiency but also
the effectiveness of data understanding operations:

To ensure that large volumes of data can be processed through the entire platform, integrated direct Connectors for various types of data sources - namely relational database management systems (RDBMS), data warehouses, remote object storages and even local files - exist. These Connectors offer:
- **Improved Data Accessibility:** With the Data Catalog, developers can consume the data
they need for a given project through a user-friendly interface, significantly reducing the time spent searching for data across disparate sources.
This enhanced discoverability makes it easier to initiate data analysis, machine learning projects,
or any other data-driven tasks.

- **simplified** authentication
- **scalability** and **high-throughput**, through the combination of a distributed computing engine with an infrastructure which scales on-demand
- **security**, as the connection is direct and data is never moved from the original premises (only read into memory and discarded when no longer necessary)
- **Enhanced Data Governance and Quality:** Fabric's Data Catalog provides comprehensive tools for the governance of data assets in data-driven projects,
including data quality profiling and metadata management. These tools help maintain high data quality and compliance with regulatory standards,
ensuring that developers work with reliable and standardized information throughout the project.

- **Knowledge and Insight Sharing:** Through detailed metadata, data quality warnings and detailed profiling,
Fabric's Data Catalog enhances the understanding of data's context and behaviour. This shared knowledge base supports better decision-making
and innovation in a data-driven project.
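The "only read into memory and discarded when no longer necessary" behaviour mentioned above is essentially out-of-core processing: data is consumed in bounded chunks rather than copied wholesale, which is what lets a Dask-backed architecture scale to large volumes. A stdlib-only sketch of the idea (Dask generalizes this pattern across partitions and a cluster):

```python
import csv
import io

# Sketch of out-of-core processing: aggregate a CSV stream in fixed-size
# chunks so memory stays bounded regardless of the file's total size.
def count_rows_in_chunks(stream, chunk_size=2):
    total, chunk = 0, []
    for row in csv.reader(stream):
        chunk.append(row)
        if len(chunk) >= chunk_size:
            total += len(chunk)  # aggregate the chunk...
            chunk.clear()        # ...then discard it from memory
    return total + len(chunk)    # flush whatever remains

data = io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n")
print(count_rows_in_chunks(data))  # 5
```

Only one chunk is ever held at a time, so the peak memory footprint is governed by `chunk_size`, not by the size of the source data.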

## Related Materials
- 📖 <a href="https://ydata.ai/resources/whitepaper-compare-data-catalogs"><u>Data Catalogs in the modern data stack</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=1zYreRKsNGE"><u>How to create your first Datasource from a CSV file?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://youtube.com/clip/UgkxN3cYbXHvH2C-dKbX1DYQHV34uea5R9t2?si=u65rOifZHZwsGIgY"><u>How to create a *Database* in the Data Catalog?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=3JyuJlQLM4Q&t=1s"><u>How to automate data quality profiling?</u></a>


## (List of quick links from the documentation. Ex below: )
- What Connectors support Fabric Catalog
- How to understand the quality of my dataset with Fabric Catalog profiling
- How to profile a database
- (etc)
6 changes: 6 additions & 0 deletions mkdocs.yml
@@ -15,6 +15,12 @@ nav:
- How to create your first Lab: "get-started/create_lab.md"
- How to create your first Pipeline: "get-started/create_pipeline.md"
- Fabric Community: "get-started/fabric_community.md"
- Data catalog:
- 'data_catalog/index.md'
- Connectors:
- 'data_catalog/connectors/index.md'
- How to create a connector?: 'data_catalog/connectors/create_connector.md'
- Supported connectors: 'data_catalog/connectors/supported_connections.md'
- SDK:
- Overview: "sdk/index.md"
- Installation: 'sdk/installation.md'
