Commit 92e4cad

Migrate from S3 (#1157)
* Add migration from S3 docs
* add reference to new procedure

Co-authored-by: katarinasupe <[email protected]>
1 parent bc45957 commit 92e4cad

2 files changed

+176
-123
lines changed

pages/advanced-algorithms/available-algorithms/migrate.mdx

Lines changed: 172 additions & 119 deletions
@@ -9,7 +9,13 @@ import { Steps } from 'nextra/components'
99

1010
# migrate
1111

12-
A module that contains procedures describing graphs on a meta-level.
12+
The `migrate` module provides an efficient way to transfer data from relational databases and CSV files in AWS S3
13+
into Memgraph. This module allows you to retrieve data from various source systems,
14+
transforming tabular data into graph structures.
15+
16+
With Cypher, you can shape the migrated data dynamically, making it easy to create nodes,
17+
establish relationships, and enrich your graph. Below are examples showing how to retrieve,
18+
filter, and convert relational data into a graph format.
1319

1420
<Cards>
1521
<Cards.Card
@@ -25,204 +31,251 @@ A module that contains procedures describing graphs on a meta-level.
2531
| **Implementation** | Python |
2632
| **Parallelism** | sequential |
2733

34+
---
35+
2836
## Procedures
2937

3038
### `mysql()`
3139

32-
With the `migrate.mysql()` procedure you can access MySQL and migrate your data to Memgraph.
33-
The result table is converted into a stream, and the returned rows can be used
34-
to create graph structures. The value of the `config` parameter must be at least
35-
an empty map. If `config_path` is passed, every key-value pair from JSON file
36-
will overwrite any values in `config` file.
40+
With the `migrate.mysql()` procedure, you can access MySQL and migrate your data to Memgraph.
41+
The result table is converted into a stream, and the returned rows can be used to create graph structures.
3742

3843
{<h4 className="custom-header"> Input: </h4>}
3944

40-
* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module
41-
will migrate all the rows from the table. In the case that a SQL query is provided, the module
42-
will migrate the rows returned from the queries.
43-
* `config: mgp.Map` ➡ Connection configuration parameters (as in `mysql.connector.connect`).
44-
* `config_path` ➡ Path to a JSON file containing configuration parameters (as in `mysql.connector.connect`).
45-
* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries can be parameterized. In that case, `params` provides parameter values.
46-
45+
- `table_or_sql: str` ➡ Table name or an SQL query.
46+
- `config: mgp.Map` ➡ Connection parameters (as in `mysql.connector.connect`).
47+
- `config_path` ➡ Path to a JSON file containing configuration parameters.
48+
- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable).
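
The earlier wording of this section stated that, when `config_path` is passed, every key-value pair from the JSON file overwrites the matching value in `config`. A minimal Python sketch of that precedence follows. `merge_config` is a hypothetical helper written for illustration, not the module's actual code, and it assumes that an empty `config_path` means no file was given.

```python
import json
import os
import tempfile


def merge_config(config, config_path=""):
    """Return connection parameters, letting the JSON file override `config`."""
    merged = dict(config)  # copy so the caller's map is not mutated
    if config_path:
        with open(config_path, "r", encoding="utf-8") as handle:
            merged.update(json.load(handle))  # file values win on conflicts
    return merged


# Demo: write a temporary JSON file and merge it with an inline map.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"host": "db.internal", "port": 3306}, tmp)

inline = {"user": "memgraph", "host": "localhost"}
merged = merge_config(inline, tmp.name)
os.remove(tmp.name)
print(merged)
```

In this sketch the inline `host` is replaced by the value from the file, while `user` survives because the file does not define it.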
49+
4750
{<h4 className="custom-header"> Output: </h4>}
4851

49-
* `row: mgp.Map`: The result table as a stream of rows.
52+
- `row: mgp.Map` ➡ The result table as a stream of rows.
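
To make "a stream of rows" concrete, the sketch below yields one map per result row, keyed by column name. It uses the standard-library `sqlite3` driver as a stand-in for a real source database. `stream_rows` is a hypothetical helper, and the rule it uses to tell a table name from an SQL query (a single token is a table name) is an assumption, not something this page specifies. Placeholder syntax for `params` also varies by driver; `?` is the `sqlite3` style.

```python
import sqlite3


def stream_rows(connection, table_or_sql, params=None):
    """Yield each result row as a dict keyed by column name."""
    # Assumption: a single token is a table name, anything else is an SQL query.
    if len(table_or_sql.split()) == 1:
        # Illustration only: real code should validate the table name.
        query = f"SELECT * FROM {table_or_sql}"
    else:
        query = table_or_sql
    cursor = connection.cursor()
    cursor.execute(query, params or ())
    columns = [description[0] for description in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


# Demo data in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Ana", 34), (2, "Ivan", 27)],
)

all_rows = list(stream_rows(connection, "users"))
adults = list(
    stream_rows(connection, "SELECT name FROM users WHERE age >= ?", (30,))
)
print(all_rows)
print(adults)
```

Because the helper is a generator, rows are produced one at a time rather than loaded into memory up front, which is the property a Cypher `YIELD row` clause relies on.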
5053

5154
{<h4 className="custom-header"> Usage: </h4>}
5255

53-
To inspect a sample of rows, use the following query:
54-
56+
#### Retrieve and inspect data
5557
```cypher
56-
CALL migrate.mysql('example_table', {user:'memgraph',
57-
password:'password',
58-
host:'localhost',
59-
database:'demo_db'} )
58+
CALL migrate.mysql('example_table', {user: 'memgraph',
59+
password: 'password',
60+
host: 'localhost',
61+
database: 'demo_db'} )
6062
YIELD row
61-
RETURN row;
63+
RETURN row
6264
LIMIT 5000;
6365
```
6466

65-
In the case you want to migrate specific results from a SQL query, it is enough to modify the
66-
first argument of the query module call, and continue to use the Cypher query language to
67-
shape your results:
68-
67+
#### Filter specific data
6968
```cypher
70-
CALL migrate.mysql('SELECT * FROM example_table', {user:'memgraph',
71-
password:'password',
72-
host:'localhost',
73-
database:'demo_db'} )
69+
CALL migrate.mysql('SELECT * FROM users', {user: 'memgraph',
70+
password: 'password',
71+
host: 'localhost',
72+
database: 'demo_db'} )
7473
YIELD row
75-
WITH row
7674
WHERE row.age >= 30
7775
RETURN row;
7876
```
7977

80-
### `sql_server()`
78+
#### Create nodes from migrated data
79+
```cypher
80+
CALL migrate.mysql('SELECT id, name, age FROM users', {user: 'memgraph',
81+
password: 'password',
82+
host: 'localhost',
83+
database: 'demo_db'} )
84+
YIELD row
85+
CREATE (u:User {id: row.id, name: row.name, age: row.age});
86+
```
87+
88+
#### Create relationships between users
89+
```cypher
90+
CALL migrate.mysql('SELECT user1_id, user2_id FROM friendships', {user: 'memgraph',
91+
password: 'password',
92+
host: 'localhost',
93+
database: 'demo_db'} )
94+
YIELD row
95+
MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id})
96+
CREATE (u1)-[:FRIENDS_WITH]->(u2);
97+
```
8198

82-
With the `migrate.sql_server()` procedure you can access SQL Server and migrate your data
83-
to Memgraph. The result table is converted into a stream, and the returned rows can
84-
be used to create graph structures. The value of the `config` parameter must be
85-
at least an empty map. If `config_path` is passed, every key-value pair from
86-
JSON file will overwrite any values in `config` file.
99+
---
100+
101+
### `oracle_db()`
102+
103+
With the `migrate.oracle_db()` procedure, you can access Oracle DB and migrate your data to Memgraph.
87104

88105
{<h4 className="custom-header"> Input: </h4>}
89106

90-
* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module
91-
will migrate all the rows from the table. In the case that a SQL query is provided, the module
92-
will migrate the rows returned from the queries.
93-
* `config: mgp.Map` ➡ Connection configuration parameters (as in `pyodbc.connect`).
94-
* `config_path` ➡ Path to the JSON file containing configuration parameters (as in `pyodbc.connect`).
95-
* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries can be parameterized. In that case, `params` provides parameter values.
96-
107+
- `table_or_sql: str` ➡ Table name or an SQL query.
108+
- `config: mgp.Map` ➡ Connection parameters (as in `oracledb.connect`).
109+
- `config_path` ➡ Path to a JSON file containing configuration parameters.
110+
- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable).
111+
97112
{<h4 className="custom-header"> Output: </h4>}
98113

99-
* `row: mgp.Map`: The result table as a stream of rows.
114+
- `row: mgp.Map` ➡ The result table as a stream of rows.
100115

101116
{<h4 className="custom-header"> Usage: </h4>}
102117

103-
To inspect the first 5000 rows from a database, use the following query:
104-
118+
#### Retrieve and inspect data
105119
```cypher
106-
CALL migrate.sql_server('example_table', {user:'memgraph',
107-
password:'password',
108-
host:'localhost',
109-
database:'demo_db'} )
120+
CALL migrate.oracle_db('example_table', {user: 'memgraph',
121+
password: 'password',
122+
host: 'localhost',
123+
database: 'demo_db'} )
110124
YIELD row
111125
RETURN row
112126
LIMIT 5000;
113127
```
114128

115-
In the case you want to migrate specific results from a SQL query, it is enough to modify the
116-
first argument of the query module call, and continue to use the Cypher query language to
117-
shape your results:
118-
129+
#### Merge nodes to avoid duplicates
119130
```cypher
120-
CALL migrate.sql_server('SELECT * FROM example_table', {user:'memgraph',
121-
password:'password',
122-
host:'localhost',
123-
database:'demo_db'} )
131+
CALL migrate.oracle_db('SELECT id, name FROM companies', {user: 'memgraph',
132+
password: 'password',
133+
host: 'localhost',
134+
database: 'business_db'} )
124135
YIELD row
125-
WITH row
126-
WHERE row.age >= 30
127-
RETURN row;
136+
MERGE (c:Company {id: row.id})
137+
SET c.name = row.name;
128138
```
129139

130-
### `oracle_db()`
140+
---
141+
142+
### `postgresql()`
131143

132-
With the `migrate.oracle_db` you can access Oracle DB and migrate your data to Memgraph.
133-
The result table is converted into a stream, and the returned rows can be used to
134-
create graph structures. The value of the `config` parameter must be at least an
135-
empty map. If `config_path` is passed, every key-value pair from JSON file will
136-
overwrite any values in `config` file.
144+
With the `migrate.postgresql()` procedure, you can access PostgreSQL and migrate your data to Memgraph.
137145

138146
{<h4 className="custom-header"> Input: </h4>}
139147

140-
* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module
141-
will migrate all the rows from the table. In the case that a SQL query is provided, the module
142-
will migrate the rows returned from the queries.
143-
* `config: mgp.Map` ➡ Connection configuration parameters (as in `oracledb.connect`).
144-
* `config_path` ➡ Path to the JSON file containing configuration parameters (as in `oracledb.connect`).
145-
* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries may be parameterized. In that case, `params` provides parameter values.
146-
148+
- `table_or_sql: str` ➡ Table name or an SQL query.
149+
- `config: mgp.Map` ➡ Connection parameters (as in `psycopg2.connect`).
150+
- `config_path` ➡ Path to a JSON file containing configuration parameters.
151+
- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable).
152+
147153
{<h4 className="custom-header"> Output: </h4>}
148154

149-
* `row: mgp.Map`: The result table as a stream of rows.
155+
- `row: mgp.Map` ➡ The result table as a stream of rows.
150156

151157
{<h4 className="custom-header"> Usage: </h4>}
152158

153-
To inspect the first 5000 rows from a database, use the following query:
154-
159+
#### Retrieve and inspect data
155160
```cypher
156-
CALL migrate.oracle_db('example_table', {user:'memgraph',
157-
password:'password',
158-
host:'localhost',
159-
database:'demo_db'} )
161+
CALL migrate.postgresql('example_table', {user: 'memgraph',
162+
password: 'password',
163+
host: 'localhost',
164+
database: 'demo_db'} )
160165
YIELD row
161166
RETURN row
162167
LIMIT 5000;
163168
```
164169

165-
In the case you want to migrate specific results from a SQL query, it is enough to modify the
166-
first argument of the query module call, and continue to use the Cypher query language to
167-
shape your results:
170+
#### Create nodes for products
171+
```cypher
172+
CALL migrate.postgresql('SELECT product_id, name, price FROM products', {user: 'memgraph',
173+
password: 'password',
174+
host: 'localhost',
175+
database: 'retail_db'} )
176+
YIELD row
177+
CREATE (p:Product {id: row.product_id, name: row.name, price: row.price});
178+
```
168179

180+
#### Establish relationships between orders and customers
169181
```cypher
170-
CALL migrate.oracle_db('SELECT * FROM example_table', {user:'memgraph',
171-
password:'password',
172-
host:'localhost',
173-
database:'demo_db'} )
182+
CALL migrate.postgresql('SELECT order_id, customer_id FROM orders', {user: 'memgraph',
183+
password: 'password',
184+
host: 'localhost',
185+
database: 'retail_db'} )
174186
YIELD row
175-
WITH row
176-
WHERE row.age >= 30
177-
RETURN row;
187+
MATCH (o:Order {id: row.order_id}), (c:Customer {id: row.customer_id})
188+
CREATE (c)-[:PLACED]->(o);
178189
```
179190

180-
### `postgresql()`
191+
---
181192

182-
With the `migrate.postgresql` you can access PostgreSQL and migrate your data to Memgraph.
183-
The result table is converted into a stream, and the returned rows can be used to
184-
create graph structures. The value of the `config` parameter must be at least an
185-
empty map. If `config_path` is passed, every key-value pair from JSON file will
186-
overwrite any values in `config` file.
193+
### `sql_server()`
194+
195+
With the `migrate.sql_server()` procedure, you can access SQL Server and migrate your data to Memgraph.
187196

188197
{<h4 className="custom-header"> Input: </h4>}
189198

190-
* `table_or_sql: str` ➡ Table name or an SQL query. When the table name is specified, the module
191-
will migrate all the rows from the table. In the case that a SQL query is provided, the module
192-
will migrate the rows returned from the queries.
193-
* `config: mgp.Map` ➡ Connection configuration parameters (as in `psycopg2.connect`).
194-
* `config_path` ➡ Path to the JSON file containing configuration parameters (as in `psycopg2.connect`).
195-
* `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Optionally, queries may be parameterized. In that case, `params` provides parameter values.
196-
199+
- `table_or_sql: str` ➡ Table name or an SQL query.
200+
- `config: mgp.Map` ➡ Connection parameters (as in `pyodbc.connect`).
201+
- `config_path` ➡ Path to a JSON file containing configuration parameters.
202+
- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable).
203+
197204
{<h4 className="custom-header"> Output: </h4>}
198205

199-
* `row: mgp.Map`: The result table as a stream of rows.
206+
- `row: mgp.Map` ➡ The result table as a stream of rows.
200207

201208
{<h4 className="custom-header"> Usage: </h4>}
202209

203-
To inspect the first 5000 rows from a database, use the following query:
204-
210+
#### Retrieve and inspect data
205211
```cypher
206-
CALL migrate.postgresql('example_table', {user:'memgraph',
207-
password:'password',
208-
host:'localhost',
209-
database:'demo_db'} )
212+
CALL migrate.sql_server('example_table', {user: 'memgraph',
213+
password: 'password',
214+
host: 'localhost',
215+
database: 'demo_db'} )
210216
YIELD row
211217
RETURN row
212218
LIMIT 5000;
213219
```
214220

215-
In the case you want to migrate specific results from a SQL query, it is enough to modify the
216-
first argument of the query module call, and continue to use the Cypher query language to
217-
shape your results:
221+
#### Convert SQL table rows into graph nodes
222+
```cypher
223+
CALL migrate.sql_server('SELECT id, name, role FROM employees', {user: 'memgraph',
224+
password: 'password',
225+
host: 'localhost',
226+
database: 'company_db'} )
227+
YIELD row
228+
CREATE (e:Employee {id: row.id, name: row.name, role: row.role});
229+
```
230+
231+
---
232+
233+
### `s3()`
218234

235+
With the `migrate.s3()` procedure, you can **access a CSV file in AWS S3**, stream the data into Memgraph,
236+
and transform it into a **graph representation** using Cypher. The migration uses the Python `boto3` client.
237+
238+
{<h4 className="custom-header"> Input: </h4>}
239+
240+
- `file_path: str` ➡ S3 file path in the format `'s3://bucket-name/path/to/file.csv'`.
241+
- `config: mgp.Map` ➡ AWS connection parameters. All of them are optional.
242+
  - `aws_access_key_id` - if not provided, environment variable `AWS_ACCESS_KEY_ID` will be used
243+
  - `aws_secret_access_key` - if not provided, environment variable `AWS_SECRET_ACCESS_KEY` will be used
244+
  - `region_name` - if not provided, environment variable `AWS_REGION` will be used
245+
  - `aws_session_token` - if not provided, environment variable `AWS_SESSION_TOKEN` will be used
246+
- `config_path: str` (optional) ➡ Path to a JSON file containing AWS credentials.
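
The inputs above can be illustrated with a short Python sketch that splits the `s3://` path into a bucket and an object key and applies the documented environment-variable fallbacks. `split_s3_path` and `resolve_aws_config` are hypothetical names chosen for this example; the module's real implementation may differ, and no AWS call is made here.

```python
import os
from urllib.parse import urlparse

# Config key -> environment variable used when the key is missing.
ENV_FALLBACKS = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "region_name": "AWS_REGION",
    "aws_session_token": "AWS_SESSION_TOKEN",
}


def split_s3_path(file_path):
    """Split 's3://bucket-name/path/to/file.csv' into (bucket, key)."""
    parsed = urlparse(file_path)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Expected 's3://bucket-name/path/to/file.csv', got {file_path!r}")
    return parsed.netloc, key


def resolve_aws_config(config, environ=None):
    """Prefer explicit config values, then fall back to environment variables."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key, env_name in ENV_FALLBACKS.items():
        value = config.get(key) or environ.get(env_name)
        if value:
            resolved[key] = value
    return resolved


bucket, key = split_s3_path("s3://my-bucket/exports/customers.csv")
settings = resolve_aws_config(
    {"region_name": "us-east-1"},
    environ={"AWS_ACCESS_KEY_ID": "env-key", "AWS_REGION": "eu-west-1"},
)
print(bucket, key)
print(settings)
```

Passing `environ` explicitly keeps the demo independent of the machine it runs on; in real use the function would read `os.environ`.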
247+
248+
{<h4 className="custom-header"> Output: </h4>}
249+
250+
- `row: mgp.Map` ➡ Each row from the CSV file as a structured dictionary.
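
As a rough picture of what "a structured dictionary" per row means, the sketch below parses CSV text with the standard-library `csv.DictReader`. The sample data and the `csv_rows` helper are invented for illustration. Note that in this sketch every value is a string, so a numeric comparison needs an explicit conversion; whether the module converts types itself is not stated on this page.

```python
import csv
import io


def csv_rows(text):
    """Yield one dict per CSV record, using the header line as keys."""
    reader = csv.DictReader(io.StringIO(text))
    for record in reader:
        yield dict(record)


# Hypothetical file contents; a real run would read them from S3.
sample = "id,name,age\n1,Ana,34\n2,Ivan,27\n"

rows = list(csv_rows(sample))
# Values are strings here, so convert before comparing numerically.
older = [row for row in rows if int(row["age"]) >= 30]
print(rows)
print(older)
```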
251+
252+
{<h4 className="custom-header"> Usage: </h4>}
253+
254+
#### Retrieve and inspect CSV data from S3
255+
```cypher
256+
CALL migrate.s3('s3://my-bucket/data.csv', {aws_access_key_id: 'your-key',
257+
aws_secret_access_key: 'your-secret',
258+
region_name: 'us-east-1'} )
259+
YIELD row
260+
RETURN row
261+
LIMIT 100;
262+
```
263+
264+
#### Filter specific rows from the CSV
219265
```cypher
220-
CALL migrate.postgresql('SELECT * FROM example_table', {user:'memgraph',
221-
password:'password',
222-
host:'localhost',
223-
database:'demo_db'} )
266+
CALL migrate.s3('s3://my-bucket/customers.csv', {aws_access_key_id: 'your-key',
267+
aws_secret_access_key: 'your-secret',
268+
region_name: 'us-west-2'} )
224269
YIELD row
225-
WITH row
226270
WHERE row.age >= 30
227271
RETURN row;
228272
```
273+
274+
#### Create nodes dynamically from CSV data
275+
```cypher
276+
CALL migrate.s3('s3://my-bucket/employees.csv', {aws_access_key_id: 'your-key',
277+
aws_secret_access_key: 'your-secret',
278+
region_name: 'eu-central-1'} )
279+
YIELD row
280+
CREATE (e:Employee {id: row.id, name: row.name, position: row.position});
281+
```
