diff --git a/pages/advanced-algorithms/available-algorithms/migrate.mdx b/pages/advanced-algorithms/available-algorithms/migrate.mdx index 4b274e9d0..dccec6d37 100644 --- a/pages/advanced-algorithms/available-algorithms/migrate.mdx +++ b/pages/advanced-algorithms/available-algorithms/migrate.mdx @@ -9,7 +9,13 @@ import { Steps } from 'nextra/components' # migrate -A module that contains procedures describing graphs on a meta-level. +The `migrate` module provides an efficient way to transfer data from various relational databases +into Memgraph. It allows you to retrieve data from external source systems, +transforming tabular data into graph structures. + +With Cypher, you can shape the migrated data dynamically, making it easy to create nodes, +establish relationships, and enrich your graph. Below are examples showing how to retrieve, +filter, and convert relational data into a graph format. Input: } -* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module - will migrate all the rows from the table. In the case that a SQL query is provided, the module - will migrate the rows returned from the queries. -* `config: mgp.Map` ➡ Connection configuration parameters (as in `mysql.connector.connect`). -* `config_path` ➡ Path to a JSON file containing configuration parameters (as in `mysql.connector.connect`). -* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries can be parameterized. In that case, `params` provides parameter values. - +- `table_or_sql: str` ➡ Table name or an SQL query. +- `config: mgp.Map` ➡ Connection parameters (as in `mysql.connector.connect`). +- `config_path` ➡ Path to a JSON file containing configuration parameters. +- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable). + {

Output:

} -* `row: mgp.Map`: The result table as a stream of rows. +- `row: mgp.Map` ➡ The result table as a stream of rows. {

Usage:

} -To inspect a sample of rows, use the following query: - +#### Retrieve and inspect data ```cypher -CALL migrate.mysql('example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.mysql('example_table', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'demo_db'} ) YIELD row -RETURN row; +RETURN row LIMIT 5000; ``` -In the case you want to migrate specific results from a SQL query, it is enough to modify the -first argument of the query module call, and continue to use the Cypher query language to -shape your results: - +#### Filter specific data ```cypher -CALL migrate.mysql('SELECT * FROM example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.mysql('SELECT * FROM users', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'demo_db'} ) YIELD row -WITH row WHERE row.age >= 30 RETURN row; ``` -### `sql_server()` +#### Create nodes from migrated data +```cypher +CALL migrate.mysql('SELECT id, name, age FROM users', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'demo_db'} ) +YIELD row +CREATE (u:User {id: row.id, name: row.name, age: row.age}); +``` + +#### Create relationships between users +```cypher +CALL migrate.mysql('SELECT user1_id, user2_id FROM friendships', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'demo_db'} ) +YIELD row +MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id}) +CREATE (u1)-[:FRIENDS_WITH]->(u2); +``` -With the `migrate.sql_server()` procedure you can access SQL Server and migrate your data -to Memgraph. The result table is converted into a stream, and the returned rows can -be used to create graph structures. The value of the `config` parameter must be -at least an empty map. If `config_path` is passed, every key-value pair from -JSON file will overwrite any values in `config` file. 
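#### Parameterize a query

The `params` argument can supply runtime values instead of embedding them in the SQL string. The sketch below is a hedged example, not taken from the source docs: it assumes the driver's `%s` placeholder syntax (as used by `mysql.connector`) and that `config_path` is passed as an empty string when unused, following the argument order listed above.

```cypher
CALL migrate.mysql('SELECT id, name FROM users WHERE age >= %s',
                   {user: 'memgraph',
                    password: 'password',
                    host: 'localhost',
                    database: 'demo_db'},
                   '', [30])
YIELD row
RETURN row;
```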
+--- + +### `oracle_db()` + +With the `migrate.oracle_db()` procedure, you can access Oracle DB and migrate your data to Memgraph. {

Input:

} -* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module - will migrate all the rows from the table. In the case that a SQL query is provided, the module - will migrate the rows returned from the queries. -* `config: mgp.Map` ➡ Connection configuration parameters (as in `pyodbc.connect`). -* `config_path` ➡ Path to the JSON file containing configuration parameters (as in `pyodbc.connect`). -* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries can be parameterized. In that case, `params` provides parameter values. - +- `table_or_sql: str` ➡ Table name or an SQL query. +- `config: mgp.Map` ➡ Connection parameters (as in `oracledb.connect`). +- `config_path` ➡ Path to a JSON file containing configuration parameters. +- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable). + {

Output:

} -* `row: mgp.Map`: The result table as a stream of rows. +- `row: mgp.Map` ➡ The result table as a stream of rows. {

Usage:

} -To inspect the first 5000 rows from a database, use the following query: - +#### Retrieve and inspect data ```cypher -CALL migrate.sql_server('example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.oracle_db('example_table', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'demo_db'} ) YIELD row RETURN row LIMIT 5000; ``` -In the case you want to migrate specific results from a SQL query, it is enough to modify the -first argument of the query module call, and continue to use the Cypher query language to -shape your results: - +#### Merge nodes to avoid duplicates ```cypher -CALL migrate.sql_server('SELECT * FROM example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.oracle_db('SELECT id, name FROM companies', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'business_db'} ) YIELD row -WITH row -WHERE row.age >= 30 -RETURN row; +MERGE (c:Company {id: row.id}) +SET c.name = row.name; ``` -### `oracle_db()` +--- + +### `postgresql()` -With the `migrate.oracle_db` you can access Oracle DB and migrate your data to Memgraph. -The result table is converted into a stream, and the returned rows can be used to -create graph structures. The value of the `config` parameter must be at least an -empty map. If `config_path` is passed, every key-value pair from JSON file will -overwrite any values in `config` file. +With the `migrate.postgresql()` procedure, you can access PostgreSQL and migrate your data to Memgraph. {

Input:

} -* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module - will migrate all the rows from the table. In the case that a SQL query is provided, the module - will migrate the rows returned from the queries. -* `config: mgp.Map` ➡ Connection configuration parameters (as in `oracledb.connect`). -* `config_path` ➡ Path to the JSON file containing configuration parameters (as in `oracledb.connect`). -* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries may be parameterized. In that case, `params` provides parameter values. - +- `table_or_sql: str` ➡ Table name or an SQL query. +- `config: mgp.Map` ➡ Connection parameters (as in `psycopg2.connect`). +- `config_path` ➡ Path to a JSON file containing configuration parameters. +- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable). + {

Output:

} -* `row: mgp.Map`: The result table as a stream of rows. +- `row: mgp.Map` ➡ The result table as a stream of rows. {

Usage:

} -To inspect the first 5000 rows from a database, use the following query: - +#### Retrieve and inspect data ```cypher -CALL migrate.oracle_db('example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.postgresql('example_table', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'demo_db'} ) YIELD row RETURN row LIMIT 5000; ``` -In the case you want to migrate specific results from a SQL query, it is enough to modify the -first argument of the query module call, and continue to use the Cypher query language to -shape your results: +#### Create nodes for products +```cypher +CALL migrate.postgresql('SELECT product_id, name, price FROM products', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'retail_db'} ) +YIELD row +CREATE (p:Product {id: row.product_id, name: row.name, price: row.price}); +``` +#### Establish relationships between orders and customers ```cypher -CALL migrate.oracle_db('SELECT * FROM example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.postgresql('SELECT order_id, customer_id FROM orders', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'retail_db'} ) YIELD row -WITH row -WHERE row.age >= 30 -RETURN row; +MATCH (o:Order {id: row.order_id}), (c:Customer {id: row.customer_id}) +CREATE (c)-[:PLACED]->(o); ``` -### `postgresql()` +--- -With the `migrate.postgresql` you can access PostgreSQL and migrate your data to Memgraph. -The result table is converted into a stream, and the returned rows can be used to -create graph structures. The value of the `config` parameter must be at least an -empty map. If `config_path` is passed, every key-value pair from JSON file will -overwrite any values in `config` file. +### `sql_server()` + +With the `migrate.sql_server()` procedure, you can access SQL Server and migrate your data to Memgraph. {

Input:

} -* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module - will migrate all the rows from the table. In the case that a SQL query is provided, the module - will migrate the rows returned from the queries. -* `config: mgp.Map` ➡ Connection configuration parameters (as in `psycopg2.connect`). -* `config_path` ➡ Path to the JSON file containing configuration parameters (as in `psycopg2.connect`). -* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries may be parameterized. In that case, `params` provides parameter values. - +- `table_or_sql: str` ➡ Table name or an SQL query. +- `config: mgp.Map` ➡ Connection parameters (as in `pyodbc.connect`). +- `config_path` ➡ Path to a JSON file containing configuration parameters. +- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable). + {

Output:

} -* `row: mgp.Map`: The result table as a stream of rows. +- `row: mgp.Map` ➡ The result table as a stream of rows. {

Usage:

} -To inspect the first 5000 rows from a database, use the following query: - +#### Retrieve and inspect data ```cypher -CALL migrate.postgresql('example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.sql_server('example_table', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'demo_db'} ) YIELD row RETURN row LIMIT 5000; ``` -In the case you want to migrate specific results from a SQL query, it is enough to modify the -first argument of the query module call, and continue to use the Cypher query language to -shape your results: +#### Convert SQL table rows into graph nodes +```cypher +CALL migrate.sql_server('SELECT id, name, role FROM employees', {user: 'memgraph', + password: 'password', + host: 'localhost', + database: 'company_db'} ) +YIELD row +CREATE (e:Employee {id: row.id, name: row.name, role: row.role}); +``` + +--- + +### `s3()` + +With the `migrate.s3()` procedure, you can **access a CSV file in AWS S3**, stream the data into Memgraph, +and transform it into a **graph representation** using Cypher. The migration uses the Python `boto3` client. + +{

Input:

} + +- `file_path: str` ➡ S3 file path in the format `'s3://bucket-name/path/to/file.csv'`. +- `config: mgp.Map` ➡ AWS connection parameters. All parameters are optional. + - `aws_access_key_id` - if not provided, the `AWS_ACCESS_KEY_ID` environment variable is used + - `aws_secret_access_key` - if not provided, the `AWS_SECRET_ACCESS_KEY` environment variable is used + - `region_name` - if not provided, the `AWS_REGION` environment variable is used + - `aws_session_token` - if not provided, the `AWS_SESSION_TOKEN` environment variable is used +- `config_path: str` (optional) ➡ Path to a JSON file containing AWS credentials. + +{

Output:

} + +- `row: mgp.Map` ➡ Each row from the CSV file as a structured dictionary. + +{

Usage:

} + +#### Retrieve and inspect CSV data from S3 +```cypher +CALL migrate.s3('s3://my-bucket/data.csv', {aws_access_key_id: 'your-key', + aws_secret_access_key: 'your-secret', + region_name: 'us-east-1'} ) +YIELD row +RETURN row +LIMIT 100; +``` + +#### Filter specific rows from the CSV ```cypher -CALL migrate.postgresql('SELECT * FROM example_table', {user:'memgraph', - password:'password', - host:'localhost', - database:'demo_db'} ) +CALL migrate.s3('s3://my-bucket/customers.csv', {aws_access_key_id: 'your-key', + aws_secret_access_key: 'your-secret', + region_name: 'us-west-2'} ) YIELD row -WITH row WHERE row.age >= 30 RETURN row; ``` + +#### Create nodes dynamically from CSV data +```cypher +CALL migrate.s3('s3://my-bucket/employees.csv', {aws_access_key_id: 'your-key', + aws_secret_access_key: 'your-secret', + region_name: 'eu-central-1'} ) +YIELD row +CREATE (e:Employee {id: row.id, name: row.name, position: row.position}); +``` diff --git a/pages/data-migration.mdx b/pages/data-migration.mdx index 169c6fd4e..5a8cccc0b 100644 --- a/pages/data-migration.mdx +++ b/pages/data-migration.mdx @@ -15,7 +15,7 @@ Where is the data you want to migrate? - [CYPHERL files](#cypherl-files) - [Neo4j](#neo4j) - [Data from an application or a program](#data-from-an-application-or-a-program) -- [Relational database management system (MySQL, SQL Server, Oracle)](#rdbms) +- [Relational database management system (MySQL, SQL Server, Oracle DB, PostgreSQL, AWS S3)](#rdbms) - [In a stream](#data-from-a-stream) - [Parquet, ORC or IPC/Feather/Arrow file](#parquet-orc-or-ipcfeatherarrow-file) - [NetworkX, PyG or DGL graph](#networkx-pyg-or-dgl-graph) @@ -86,10 +86,10 @@ data](/data-modeling) and rewrite the CSV file, then import it into Memgraph using the LOAD CSV clause, like in this [example](/data-migration/migrate-from-rdbms). 
-Alternatively, you can use the [`migration` +Alternatively, you can use the [`migrate` module](/advanced-algorithms/available-algorithms/migrate) from the MAGE graph -library which allows you to access data from a MySQL database, an SQL Server or -an Oracle database. +library, which allows you to access data from a MySQL database, an SQL Server, +an Oracle database, PostgreSQL, or a CSV file in AWS S3. ## Data from a stream