docs: Add Data Catalog section for connectors.
Fabiana Clemente authored and Fabiana Clemente committed Feb 8, 2024
1 parent fda3d77 commit 46ebb1a
Showing 24 changed files with 97 additions and 23 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
24 changes: 24 additions & 0 deletions docs/data_catalog/connectors/create_connector.md
@@ -0,0 +1,24 @@
# How to create a connector in Fabric's Data Catalog?

:fontawesome-brands-youtube:{ .youtube } <a href="https://youtube.com/clip/UgkxVTrEn2jY8GL-wqSXX3PByuUH5Q81Usih?si=xdpQ4eTCo_SEcvxp"><u>How to create a connector to an RDBMS in Fabric?</u></a>

To create a connector in YData Fabric, select the "Connectors" page from the left side menu, as illustrated in the image below.

[ADD HERE THE IMAGE]

Click *"Add Connector"* and a list of connector types to choose from will be displayed.

[Add here the image]

In this example, we will create a connector to an AWS S3 storage.
The credentials/secrets to your storage will be requested. After adding them, you can *"Test connection"*
to ensure that all the details are correct.
A confirmation message, similar to the one shown in the image below, should appear on your screen,
letting you know that you can now save your connector successfully!

[Add here the image]

**Congrats!** 🚀 You have now created your first **Connector**! You can now create different *Datasources*
in your project's Data Catalog.
Get ready for your journey of improved quality data for AI.
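The kind of check the *"Test connection"* button performs starts with making sure the credential payload is complete before any network call is attempted. The sketch below is a minimal illustration of that idea; the field names are hypothetical and do not reflect Fabric's actual schema or internals:

```python
# Minimal sketch of validating an S3-style credential payload before
# testing a connection. Field names are illustrative, not Fabric's schema.
REQUIRED_S3_FIELDS = {"access_key_id", "secret_access_key", "region"}

def validate_s3_credentials(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks complete."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_S3_FIELDS - payload.keys())]
    problems += [f"empty field: {k}"
                 for k in sorted(REQUIRED_S3_FIELDS & payload.keys())
                 if not str(payload[k]).strip()]
    return problems

creds = {"access_key_id": "AKIA...", "secret_access_key": "", "region": "eu-west-1"}
print(validate_s3_credentials(creds))  # ['empty field: secret_access_key']
```

Only once a payload passes checks like these does it make sense to attempt the actual connection against the storage.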

1 change: 0 additions & 1 deletion docs/data_catalog/connectors/file_formats.md

This file was deleted.

20 changes: 15 additions & 5 deletions docs/data_catalog/connectors/index.md
@@ -1,8 +1,18 @@
# Connectors

Introduce the user what are the connectors. Explain the following:
- Our connectors are highly scalable (due to Dask)
- The benefits of an easy and re-usable connection
- Security, etc.
Fabric connectors play an important role in the landscape of data-driven projects, acting as essential components that facilitate the movement and integration of data across different systems, platforms, and applications.
Fabric connectors were designed to offer seamless and easy connectivity for data exchange between disparate data sources (such as databases, cloud storage systems, etc.).

## Benefits

- **Data Integration:** Fabric connectors are primarily used to consume and integrate data from a variety of sources in a single project, ensuring that data can be easily combined, transformed, and made ready for analysis or operational use.
- **Automation of data flows:** They automate the process of data extraction, transformation, and loading (ETL), which is crucial for keeping the data used in a project up to date and accurate.
- **Simplification of data access:** Fabric connectors simplify the process of accessing and using data from specialized or complex systems, making it easier for users without deep technical expertise to leverage data for insights.
- **Enhancement of data security:** Designed to securely manage the credentials and access to your different storages.

## Get started with Fabric Connectors
- :fontawesome-brands-youtube:{ .youtube } <a href="https://youtube.com/clip/UgkxVTrEn2jY8GL-wqSXX3PByuUH5Q81Usih?si=xdpQ4eTCo_SEcvxp"><u>How to create a connector in Fabric?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=1zYreRKsNGE"><u>How to use Object Storage Connectors through Labs?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=1zYreRKsNGE"><u>How to use RDBMS connectors through Labs?</u></a>
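The "easy and re-usable connection" benefit above boils down to a simple pattern: secrets are stored once under a connector name, and every datasource created afterwards refers to that name instead of re-entering credentials. The sketch below illustrates the concept only; the class and field names are hypothetical and are not Fabric's API:

```python
# Conceptual sketch of a named-connector registry: credentials are entered
# once and reused by name. Illustrative only -- not Fabric's actual code.
class ConnectorRegistry:
    def __init__(self):
        self._connectors = {}

    def add(self, name: str, conn_type: str, **secrets):
        """Register a connector once; duplicate names are rejected."""
        if name in self._connectors:
            raise ValueError(f"connector {name!r} already exists")
        self._connectors[name] = {"type": conn_type, "secrets": secrets}

    def get(self, name: str) -> dict:
        """Any later datasource looks the connection up by name."""
        return self._connectors[name]

registry = ConnectorRegistry()
registry.add("marketing-s3", "aws_s3",
             access_key_id="AKIA...", secret_access_key="...")
# Later, a datasource reuses the stored connection by name:
print(registry.get("marketing-s3")["type"])  # aws_s3
```

This is also where the security benefit comes from: secrets live in one managed place rather than being copied into every project that needs the data.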
27 changes: 24 additions & 3 deletions docs/data_catalog/connectors/supported_connections.md
@@ -1,4 +1,25 @@
Mainly leverage information below:
https://www.notion.so/a4ed9105cdf94d3f92da13f7774e45b6?v=b72e059a809d44e5baacfa56b94f17cc&pvs=4
# Supported connections

Also useful to build the docs: https://doc.dataiku.com/dss/latest/connecting/connections.html
Fabric can read and write data from a variety of data sources.

## Connectors

Here is the list of the available connectors in Fabric.

| Connector Name | Type | Supported file types | Notes |
|:---------------------|:--------------:|-------------------------------------:|:------------------------------------------------------------------------------------------------------|
| AWS S3 | Object Storage | `Parquet` `CSV` | |
| Azure Blob Storage   | Object Storage | `Parquet` `CSV`                       |                                                                                                        |
| Azure Data Lake | Object Storage | `Parquet` `CSV` | |
| Google Cloud storage | Object Storage | `Parquet` `CSV` | |
| Upload file | File | `Parquet` `CSV` | Maximum file size is 700MB. <br/>Bigger files should be uploaded and read from <br/>remote object storages |
| Google BigQuery | Big Table | `Not applicable` | |
| MySQL | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |
| Azure SQL Server | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |
| PostgreSQL           | RDBMS          | `Not applicable`                      | Supports reading whole schemas or specifying a query                                                   |
| Snowflake | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |
| Oracle DB | RDBMS | `Not applicable` | Supports reading whole schemas or specifying a query |

## Haven't found your storage?

To understand our development roadmap or to request prioritization of a new data connector, reach out to us at [ydata.ai/contact-us](https://ydata.ai/contact-us).
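For the RDBMS connectors, "reading whole schemas or specifying a query" corresponds to two familiar access patterns. The sketch below illustrates both with Python's built-in `sqlite3`, standing in for any of the database engines listed above:

```python
import sqlite3

# In-memory database standing in for a remote RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "PT"), (2, "US"), (3, "PT")])

# Pattern 1: read the whole schema -- enumerate every table, then pull each.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['customers']

# Pattern 2: specify a query -- only the rows/columns you need are read.
rows = conn.execute(
    "SELECT id FROM customers WHERE country = 'PT'").fetchall()
print(rows)  # [(1,), (3,)]
```

Specifying a query is usually preferable for large databases, since filtering happens on the database side before any data is transferred.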
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
42 changes: 28 additions & 14 deletions docs/data_catalog/index.md
@@ -1,22 +1,36 @@
# Data Catalog

We need to improve the below narrative Mainly focus on:
In the realm of data management and analysis, the ability to efficiently discover, understand,
and access data is crucial. Fabric's Data Catalog emerges as a pivotal solution in this context,
designed to facilitate an organized, searchable, and accessible repository of metadata.
This chapter introduces the concept, functionality, and advantages of the *Data Catalog* within
Fabric's ecosystem, offering developers a comprehensive overview of its significance and utility.

- What is Fabric's Data Catalog
- What are the benefits for the developers
- What can be found in Fabric's Catalog - Connectors & Datasets (no need to explain them here, only links for the next sections)
To ensure that large volumes of data can be processed through the entire data pipeline,
Fabric is equipped with integrated connectors for various types of storages (from RDBMS to cloud object storage),
guaranteeing the data never leaves your premises. Furthermore, Fabric's Catalog ensures timely and scalable
data analysis, as it runs on top of a distributed architecture powered by Kubernetes and [Dask](https://docs.dask.org/en/stable/).

The narrative of a Data-Centric platform must, as expected, start with the data. Data Sources are the primary way of ingesting and understanding data.
The benefits of Fabric's Data Catalog for data teams are manifold, enhancing not only the efficiency but also
the effectiveness of data understanding operations:

To ensure that large volumes of data can be processed through the entire platform, integrated direct Connectors for various types of data sources - namely relational database management systems (RDBMS), data warehouses, remote object storages and even local files - exist. These Connectors offer:
- **Improved Data Accessibility:** With the Data Catalog, developers can consume the data
they need for a given project through a user-friendly interface, significantly reducing the time spent searching for data across disparate sources.
This enhanced discoverability makes it easier to initiate data analysis, machine learning projects,
or any other data-driven tasks.

- **simplified** authentication
- **scalability** and **high-throughput**, through the combination of a distributed computing engine with an infrastructure which scales on-demand
- **security**, as the connection is direct and data is never moved from the original premises (only read into memory and discarded when no longer necessary)
- **Enhanced Data Governance and Quality:** Fabric's Data Catalog provides comprehensive tools for the governance of data assets in data-driven projects,
including data quality profiling and metadata management. These tools help maintain high data quality and compliance with regulatory standards,
ensuring that developers work with reliable and standardized information throughout the project.

- **Knowledge and Insight Sharing:** Through detailed metadata, data quality warnings and detailed profiling,
Fabric's Data Catalog enhances the understanding of data's context and behaviour. This shared knowledge base supports better decision-making
and innovation in a data-driven project.
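The "only read into memory and discarded when no longer necessary" behaviour mentioned above is essentially out-of-core processing: data is consumed in bounded chunks rather than copied wholesale, which is what lets a Dask-backed architecture scale to large volumes. A stdlib-only sketch of the idea (Dask generalizes this pattern across partitions and a cluster):

```python
import csv
import io

# Sketch of out-of-core processing: aggregate a CSV stream in fixed-size
# chunks so memory stays bounded regardless of the file's total size.
def count_rows_in_chunks(stream, chunk_size=2):
    total, chunk = 0, []
    for row in csv.reader(stream):
        chunk.append(row)
        if len(chunk) >= chunk_size:
            total += len(chunk)  # aggregate the chunk...
            chunk.clear()        # ...then discard it from memory
    return total + len(chunk)    # flush whatever remains

data = io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n")
print(count_rows_in_chunks(data))  # 5
```

Only one chunk is ever held at a time, so the peak memory footprint is governed by `chunk_size`, not by the size of the source data.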

## Related Materials
- 📖 <a href="https://ydata.ai/resources/whitepaper-compare-data-catalogs"><u>Data Catalogs in the modern data stack</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=1zYreRKsNGE"><u>How to create your first Datasource from a CSV file?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://youtube.com/clip/UgkxN3cYbXHvH2C-dKbX1DYQHV34uea5R9t2?si=u65rOifZHZwsGIgY"><u>How to create a *Database* in the Data Catalog?</u></a>
- :fontawesome-brands-youtube:{ .youtube } <a href="https://www.youtube.com/watch?v=3JyuJlQLM4Q&t=1s"><u>How to automate data quality profiling?</u></a>


## (List of quick links from the documentation. Ex below: )
- What Connectors support Fabric Catalog
- How to understand the quality of my dataset with Fabric Catalog profiling
- How to profile a database
- (etc)
6 changes: 6 additions & 0 deletions mkdocs.yml
@@ -15,6 +15,12 @@ nav:
- How to create your first Lab: "get-started/create_lab.md"
- How to create your first Pipeline: "get-started/create_pipeline.md"
- Fabric Community: "get-started/fabric_community.md"
- Data catalog:
- 'data_catalog/index.md'
- Connectors:
- 'data_catalog/connectors/index.md'
- How to create a connector?: 'data_catalog/connectors/create_connector.md'
- Supported connectors: 'data_catalog/connectors/supported_connections.md'
- SDK:
- Overview: "sdk/index.md"
- Installation: 'sdk/installation.md'
