diff --git a/LLMs/src/Neo4jOpenAIApoc.ipynb b/LLMs/src/Neo4jOpenAIApoc.ipynb
new file mode 100644
index 0000000..bab44e5
--- /dev/null
+++ b/LLMs/src/Neo4jOpenAIApoc.ipynb
@@ -0,0 +1,553 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyNtGurXJvdCqbMNGCM35Inr",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "EehUBcly6lRg",
+ "outputId": "a764076b-f9dc-4439-f521-c6fb1d533135"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Requirement already satisfied: neo4j in /usr/local/lib/python3.10/dist-packages (5.9.0)\n",
+ "Requirement already satisfied: pytz in /usr/local/lib/python3.10/dist-packages (from neo4j) (2022.7.1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install neo4j"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Integrate LLM workflows with Knowledge Graph using Neo4j and APOC\n",
+ "## OpenAI and VertexAI endpoints are now available as APOC Extended procedures\n",
+ "Probably a day doesn't go by that you don't hear about new and exciting things happening in the Large Language Model (LLM) space. There are so many opportunities and use cases for any company to utilize the power of LLMs to enhance their productivity, transform or manipulate their data, and power conversational AI and QA systems.\n",
+ "To make it easier for you to integrate LLMs with Knowledge Graphs, the team at Neo4j has begun the journey of adding support for LLM integrations. The integrations are available as APOC Extended procedures. At the moment, OpenAI and VertexAI endpoints are supported, but we plan to add support for many more.\n",
+ "When I was brainstorming what would be a cool use case to demonstrate the newly added APOC procedures, my friend Michael Hunger suggested an exciting idea. What if we used graph context, or the neighborhood of a node, to enrich the information stored in text embeddings? That way, the vector similarity search could produce better results due to the increased richness of embedded information. The idea is simple but compelling and could be helpful in many use cases.\n",
+ "\n",
+ "## Neo4j environment setup\n",
+ "In this example, we will use both the APOC and Graph Data Science libraries. Luckily, Neo4j Sandbox projects have both libraries installed and additionally come with a prepopulated database. Therefore, you can set up the environment with a couple of clicks. We will use [the small Movie project](https://sandbox.neo4j.com/?usecase=movies) to avoid incurring a more considerable LLM API cost."
+ ],
+ "metadata": {
+ "id": "Dz238wQH1UEd"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Define Neo4j connections\n",
+ "from neo4j import GraphDatabase\n",
+ "host = 'bolt://44.215.124.63:7687'\n",
+ "user = 'neo4j'\n",
+ "password = 'steel-orders-reproduction'\n",
+ "driver = GraphDatabase.driver(host,auth=(user, password))\n",
+ "\n",
+ "def run_query(query, params={}):\n",
+ " with driver.session() as session:\n",
+ " result = session.run(query, params)\n",
+ " return result.to_df()"
+ ],
+ "metadata": {
+ "id": "QM4VE2Q_6n0D"
+ },
+ "execution_count": 2,
+ "outputs": []
+ },
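+ {
+ "cell_type": "markdown",
+ "source": [
+ "As a quick sanity check, the run_query helper returns a pandas DataFrame, so a simple count query (a hypothetical example, not part of the original workflow) can verify the connection:\n",
+ "```python\n",
+ "# Count the movies in the sandbox dataset (38 in the small Movie project)\n",
+ "run_query(\"MATCH (m:Movie) RETURN count(m) AS movies\")\n",
+ "```"
+ ],
+ "metadata": {}
+ },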
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "\n",
+ "The dataset contains Movie and Person nodes. There are only 38 movies, so we are dealing with a tiny dataset. The information provides a movie's title and tagline, when it was released, and who acted in or directed it.\n",
+ "## Constructing text embedding values\n",
+ "\n",
+ "We will be using the OpenAI API endpoints. Therefore, you will need to create an OpenAI account if you haven't already.\n",
+ "\n",
+ "As mentioned, the idea is to use the neighborhood of a node to construct its text embedding representation. Since the graph model is simple, we don't have a lot of creative freedom. We will create text embedding representations of movies by using their properties and neighbor information. In this instance, the neighbor information is only about its actors and directors. However, I believe that this concept can be applied to more complex graph schemas and used to improve your vector similarity search applications.\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "CgJdlP841fj4"
+ }
+ },
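+ {
+ "cell_type": "markdown",
+ "source": [
+ "To make the idea concrete, here is a rough illustration (not part of the original workflow) of assembling a node's context string from its properties and neighbors in plain Python; the movie and neighbors values are hypothetical stand-ins for data retrieved from the graph:\n",
+ "```python\n",
+ "movie = {\"title\": \"The Matrix\", \"released\": 1999, \"tagline\": \"Welcome to the Real World\"}\n",
+ "neighbors = {\"ACTED_IN\": [\"Keanu Reeves\", \"Carrie-Anne Moss\"], \"DIRECTED\": [\"Lana Wachowski\"]}\n",
+ "\n",
+ "# Concatenate the node's own properties with its neighborhood information\n",
+ "context = f\"Movie title: {movie['title']} year: {movie['released']} plot: {movie['tagline']}\\n\"\n",
+ "for rel_type, names in neighbors.items():\n",
+ "    context += f\"{rel_type}: {', '.join(names)}\\n\"\n",
+ "print(context)\n",
+ "```"
+ ],
+ "metadata": {}
+ },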
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "\n",
+ "The typical approach we see nowadays, where we simply chunk and embed documents, might fail when looking for information that spans multiple documents. This problem is also known as multi-hop question answering. However, the multi-hop QA problem can be solved using knowledge graphs. One way to look at a knowledge graph is as condensed information storage. For example, an information extraction pipeline can be used to extract relevant information from various records. Using knowledge graphs, you can represent highly-connected information that spans multiple documents as relationships between various entities.\n",
+ "\n",
+ "One solution is to use LLMs to generate a Cypher statement that can be used to retrieve connected information from the database. Another solution, which we will use here, is to use the connection information to enrich the text embedding representations. Additionally, the enhanced information can be retrieved at query time to provide additional context to the LLM from which it can base its response.\n",
+ "\n",
+ "The following Cypher query can be used to retrieve all the relevant information about the movie nodes from their neighbors."
+ ],
+ "metadata": {
+ "id": "yz60G2IR1sX3"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "print(run_query(\"\"\"\n",
+ "MATCH (m:Movie)\n",
+ "MATCH (m)-[r:ACTED_IN|DIRECTED]-(t)\n",
+ "WITH m, type(r) as type, collect(t.name) as names\n",
+ "WITH m, type+\": \"+reduce(s=\"\", n IN names | s + n + \", \") as types\n",
+ "WITH m, collect(types) as contexts\n",
+ "WITH m, \"Movie title: \"+ m.title + \" year: \"+coalesce(m.released,\"\") +\" plot: \"+ coalesce(m.tagline,\"\")+\"\\n\" +\n",
+ " reduce(s=\"\", c in contexts | s + substring(c, 0, size(c)-2) +\"\\n\") as context\n",
+ "RETURN context LIMIT 1\n",
+ "\"\"\")['context'][0])"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "M7_NaY4S7HTa",
+ "outputId": "962b21fb-0e90-4799-ad6d-8d2715c11524"
+ },
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Movie title: The Matrix year: 1999 plot: Welcome to the Real World\n",
+ "ACTED_IN: Emil Eifrem, Hugo Weaving, Laurence Fishburne, Carrie-Anne Moss, Keanu Reeves\n",
+ "DIRECTED: Lana Wachowski, Lilly Wachowski\n",
+ "\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Depending on your domain, you might also use custom queries that retrieve information more than one hop away, or aggregate some of the results.\n",
+ "\n",
+ "We will now use OpenAI's embedding endpoint to generate text embeddings representing the movies and their context and store them as node properties."
+ ],
+ "metadata": {
+ "id": "qsjWWgtP10Dg"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "openai_api_key = \"OPENAI_API_KEY\""
+ ],
+ "metadata": {
+ "id": "HQuFAs7d_Yl6"
+ },
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "run_query(\"\"\"\n",
+ "CALL apoc.periodic.iterate(\n",
+ " 'MATCH (m:Movie) RETURN id(m) AS id',\n",
+ " 'MATCH (m:Movie)\n",
+ " WHERE id(m) = id\n",
+ " MATCH (m)-[r:ACTED_IN|DIRECTED]-(t)\n",
+ " WITH m, type(r) as type, collect(t.name) as names\n",
+ " WITH m, type+\": \"+reduce(s=\"\", n IN names | s + n + \", \") as types\n",
+ " WITH m, collect(types) as contexts\n",
+ " WITH m, \"Movie title: \"+ m.title + \" year: \"+coalesce(m.released,\"\") +\" plot: \"+ coalesce(m.tagline,\"\")+\"\\n\" +\n",
+ " reduce(s=\"\", c in contexts | s + substring(c, 0, size(c)-2) +\"\\n\") as context\n",
+ " CALL apoc.ml.openai.embedding([context], $apiKey) YIELD embedding\n",
+ " SET m.embedding = embedding',\n",
+ " {batchSize:1, retries:3, params: {apiKey: $apiKey}})\n",
+ "\"\"\", {'apiKey': openai_api_key})['errorMessages'][0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "wzeKmQnT9Mqv",
+ "outputId": "8ee5f12c-828d-4cb9-aa35-d6be9a832e47"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{}"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The newly added apoc.ml.openai.embedding procedure makes generating text embeddings very easy using OpenAI's API. We wrap the API call with apoc.periodic.iterate to batch the transactions and introduce a retry policy.\n",
+ "\n",
+ "# Retrieval-augmented LLMs\n",
+ "\n",
+ "It looks like the mainstream trend is to provide LLMs with external information at query time. We can even find OpenAI's guides on how to provide relevant information as part of the prompt to generate the answer.\n",
+ "\n",
+ "\n",
+ "\n",
+ "Here, we will use vector similarity search to find relevant movies given the user input. The workflow is the following:\n",
+ "\n",
+ "1. We embed the user question with the same text embedding model we used to embed node context information.\n",
+ "2. We use the cosine similarity to find the top 3 most relevant nodes and return their information to the LLM.\n",
+ "3. The LLM constructs the final answer based on the provided information.\n",
+ "\n",
+ "Since we will be using the gpt-3.5-turbo model to generate the final answer, it is good practice to define the system prompt. To make it more readable, we will define the system prompt as a Python variable and then use query parameters when executing Cypher statements.\n"
+ ],
+ "metadata": {
+ "id": "h6FbeBKO12H4"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "system_prompt = \"\"\"\n",
+ "You are an assistant that helps to generate text to form nice and human understandable answers based on the provided information.\n",
+ "The latest prompt contains the information, and you need to generate a human readable response based on the given information.\n",
+ "Make the answer sound like a response to the question. Do not mention that you based the result on the given information.\n",
+ "Do not add any additional information that is not explicitly provided in the latest prompt.\n",
+ "I repeat, do not add any information that is not explicitly given.\n",
+ "\"\"\""
+ ],
+ "metadata": {
+ "id": "KR8qsmX72AQ_"
+ },
+ "execution_count": 6,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Next, we will define a function that constructs a user prompt based on the user question and the provided context from the database."
+ ],
+ "metadata": {
+ "id": "vK2Xj7ky2BGr"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def generate_user_prompt(question, context):\n",
+ " return f\"\"\"\n",
+ " The question is {question}\n",
+ " Answer the question by using the provided information:\n",
+ " {context}\n",
+ " \"\"\""
+ ],
+ "metadata": {
+ "id": "tGHG1dOi2DlI"
+ },
+ "execution_count": 7,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Before asking the LLM to generate answers, we must define the intelligent search tool that will provide relevant context information based on the vector similarity search. As mentioned, we need to embed the user input and then use the cosine similarity to identify relevant nodes. With graphs, you can decide the type of information you want to retrieve and provide as context. In this example, we will return the same context information that was used to generate text embeddings along with similar movie information."
+ ],
+ "metadata": {
+ "id": "GXrNjboh2EWH"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def retrieve_context(question, k=3):\n",
+ " data = run_query(\n",
+ " \"\"\"\n",
+ " // retrieve the embedding of the question\n",
+ " CALL apoc.ml.openai.embedding([$question], $apiKey) YIELD embedding\n",
+ " // match relevant movies\n",
+ " MATCH (m:Movie)\n",
+ " WITH m, gds.similarity.cosine(embedding, m.embedding) AS score\n",
+ " ORDER BY score DESC\n",
+ " // limit the number of relevant documents\n",
+ " LIMIT toInteger($k)\n",
+ " // retrieve graph context\n",
+ " MATCH (m)--()--(m1:Movie)\n",
+ " WITH m,m1, count(*) AS count\n",
+ " ORDER BY count DESC\n",
+ " WITH m, apoc.text.join(collect(m1.title)[..3], \", \") AS similarMovies\n",
+ " MATCH (m)-[r:ACTED_IN|DIRECTED]-(t)\n",
+ " WITH m, similarMovies, type(r) as type, collect(t.name) as names\n",
+ " WITH m, similarMovies, type+\": \"+reduce(s=\"\", n IN names | s + n + \", \") as types\n",
+ " WITH m, similarMovies, collect(types) as contexts\n",
+ " WITH m, \"Movie title: \"+ m.title + \" year: \"+coalesce(m.released,\"\") +\" plot: \"+ coalesce(m.tagline,\"\")+\"\\n\" +\n",
+ " reduce(s=\"\", c in contexts | s + substring(c, 0, size(c)-2) +\"\\n\") + \"similar movies:\" + similarMovies + \"\\n\" as context\n",
+ " RETURN context\n",
+ " \"\"\",\n",
+ " {\"question\": question, \"k\": k, \"apiKey\": openai_api_key},\n",
+ " )\n",
+ " return data[\"context\"].to_list()"
+ ],
+ "metadata": {
+ "id": "E4EGxJuLAm1y"
+ },
+ "execution_count": 8,
+ "outputs": []
+ },
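+ {
+ "cell_type": "markdown",
+ "source": [
+ "For intuition, the cosine similarity used in the query above can be computed in a few lines of pure Python (a minimal sketch; the workflow itself relies on the database-side gds.similarity.cosine function):\n",
+ "```python\n",
+ "import math\n",
+ "\n",
+ "def cosine_similarity(a, b):\n",
+ "    # dot(a, b) / (|a| * |b|)\n",
+ "    dot = sum(x * y for x, y in zip(a, b))\n",
+ "    norm_a = math.sqrt(sum(x * x for x in a))\n",
+ "    norm_b = math.sqrt(sum(y * y for y in b))\n",
+ "    return dot / (norm_a * norm_b)\n",
+ "\n",
+ "# Parallel vectors have similarity 1.0 (up to floating point)\n",
+ "print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))\n",
+ "```"
+ ],
+ "metadata": {}
+ },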
+ {
+ "cell_type": "markdown",
+ "source": [
+ "At the moment, you need to use the gds.similarity.cosine function to calculate the cosine similarity between the question and relevant nodes. After identifying the relevant nodes, we retrieve the context using two additional MATCH clauses. You can check out Neo4j's GraphAcademy to learn more about the Cypher query language.\n",
+ "\n",
+ "Finally, we can define the function that takes in the user question and returns an answer."
+ ],
+ "metadata": {
+ "id": "PlPZeC3k2JML"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def generate_answer(question):\n",
+ " # Retrieve context\n",
+ " context = retrieve_context(question)\n",
+ " # Print context\n",
+ " for c in context:\n",
+ " print(c)\n",
+ " # Generate answer\n",
+ " response = run_query(\n",
+ " \"\"\"\n",
+ " CALL apoc.ml.openai.chat([{role:'system', content: $system},\n",
+ " {role: 'user', content: $user}], $apiKey) YIELD value\n",
+ " RETURN value.choices[0].message.content AS answer\n",
+ " \"\"\",\n",
+ " {\n",
+ " \"system\": system_prompt,\n",
+ " \"user\": generate_user_prompt(question, context),\n",
+ " \"apiKey\": openai_api_key,\n",
+ " },\n",
+ " )\n",
+ " return response[\"answer\"][0]"
+ ],
+ "metadata": {
+ "id": "AWL1XQh62Jax"
+ },
+ "execution_count": 9,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Let's test our retrieval-augmented LLM workflow."
+ ],
+ "metadata": {
+ "id": "pfUcyFnG2NW9"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "generate_answer(\"Who played in the Matrix?\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 313
+ },
+ "id": "MSiHagee_70-",
+ "outputId": "398d15be-da08-4a6f-bcb5-d00b9dd1a0f1"
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Movie title: The Matrix year: 1999 plot: Welcome to the Real World\n",
+ "ACTED_IN: Emil Eifrem, Hugo Weaving, Laurence Fishburne, Carrie-Anne Moss, Keanu Reeves\n",
+ "DIRECTED: Lana Wachowski, Lilly Wachowski\n",
+ "similar movies:The Matrix Revolutions, The Matrix Reloaded, V for Vendetta\n",
+ "\n",
+ "Movie title: The Matrix Reloaded year: 2003 plot: Free your mind\n",
+ "DIRECTED: Lana Wachowski, Lilly Wachowski\n",
+ "ACTED_IN: Hugo Weaving, Laurence Fishburne, Carrie-Anne Moss, Keanu Reeves\n",
+ "similar movies:The Matrix Revolutions, The Matrix, V for Vendetta\n",
+ "\n",
+ "Movie title: The Matrix Revolutions year: 2003 plot: Everything that has a beginning has an end\n",
+ "DIRECTED: Lana Wachowski, Lilly Wachowski\n",
+ "ACTED_IN: Hugo Weaving, Laurence Fishburne, Carrie-Anne Moss, Keanu Reeves\n",
+ "similar movies:The Matrix Reloaded, The Matrix, V for Vendetta\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'The actors who played in The Matrix are Emil Eifrem, Hugo Weaving, Laurence Fishburne, Carrie-Anne Moss and Keanu Reeves.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "generate_answer(\"Recommend a movie with Jack Nicholson?\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 331
+ },
+ "id": "XepSm2miD7ac",
+ "outputId": "b86d6d0c-cea7-49bc-95d6-299c4fbd84ec"
+ },
+ "execution_count": 11,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Movie title: Something's Gotta Give year: 2003 plot: \n",
+ "ACTED_IN: Keanu Reeves, Diane Keaton, Jack Nicholson\n",
+ "DIRECTED: Nancy Meyers\n",
+ "similar movies:Something's Gotta Give, The Replacements, Johnny Mnemonic\n",
+ "\n",
+ "Movie title: One Flew Over the Cuckoo's Nest year: 1975 plot: If he's crazy, what does that make you?\n",
+ "ACTED_IN: Danny DeVito, Jack Nicholson\n",
+ "DIRECTED: Milos Forman\n",
+ "similar movies:Hoffa, As Good as It Gets, Something's Gotta Give\n",
+ "\n",
+ "Movie title: As Good as It Gets year: 1997 plot: A comedy from the heart that goes for the throat.\n",
+ "ACTED_IN: Helen Hunt, Jack Nicholson, Cuba Gooding Jr., Greg Kinnear\n",
+ "DIRECTED: James L. Brooks\n",
+ "similar movies:A Few Good Men, Cast Away, Twister\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'If you\\'re looking for a movie recommendation featuring Jack Nicholson, I\\'d suggest checking out \"One Flew Over the Cuckoo\\'s Nest\" from 1975. The movie stars Danny DeVito and Jack Nicholson, and was directed by Milos Forman. It\\'s a classic drama that portrays the struggles of patients in a mental institution.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "generate_answer(\"What are similar movies to As Good as It Gets?\")"
+ ],
+ "metadata": {
+ "id": "DtWRYPGnEgv9",
+ "outputId": "33ab60fd-4fdc-4adf-df15-ef8d8ae0e80e",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 331
+ }
+ },
+ "execution_count": 13,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Movie title: Something's Gotta Give year: 2003 plot: \n",
+ "ACTED_IN: Keanu Reeves, Diane Keaton, Jack Nicholson\n",
+ "DIRECTED: Nancy Meyers\n",
+ "similar movies:Something's Gotta Give, The Replacements, Johnny Mnemonic\n",
+ "\n",
+ "Movie title: As Good as It Gets year: 1997 plot: A comedy from the heart that goes for the throat.\n",
+ "ACTED_IN: Helen Hunt, Jack Nicholson, Cuba Gooding Jr., Greg Kinnear\n",
+ "DIRECTED: James L. Brooks\n",
+ "similar movies:A Few Good Men, Cast Away, Twister\n",
+ "\n",
+ "Movie title: The Devil's Advocate year: 1997 plot: Evil has its winning ways\n",
+ "DIRECTED: Taylor Hackford\n",
+ "ACTED_IN: Al Pacino, Charlize Theron, Keanu Reeves\n",
+ "similar movies:That Thing You Do, Something's Gotta Give, The Replacements\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'There are a few similar movies to \"As Good as It Gets\" that you may enjoy. If you liked the combination of comedy and drama in the plot, you may also enjoy \"A Few Good Men\" and \"Cast Away\". If you enjoyed the acting of Jack Nicholson, you might also like \"The Replacements\" and \"Johnny Mnemonic\", both of which he had a role in.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Summary\n",
+ "And there you have it, a glimpse into the fascinating world of integrating Large Language Models with Knowledge Graphs. As the field continues to evolve, so too will the tools and techniques at our disposal. With Neo4j and APOC's continued advancements, we can expect even greater innovation in how we handle and process data."
+ ],
+ "metadata": {
+ "id": "uvT3ajut2Q9z"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "-1pwS5zJx28Y"
+ },
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/LLMs/src/generic_cypher_gpt4.ipynb b/LLMs/src/generic_cypher_gpt4.ipynb
new file mode 100644
index 0000000..853e236
--- /dev/null
+++ b/LLMs/src/generic_cypher_gpt4.ipynb
@@ -0,0 +1,1109 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyMV47wMTNlwFnmIA1EX1mSX",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "JUDi5el-l8d8",
+ "outputId": "9104694a-4f9e-4c3d-a66d-7c013ad333c6"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Requirement already satisfied: openai in /usr/local/lib/python3.9/dist-packages (0.27.4)\n",
+ "Requirement already satisfied: neo4j in /usr/local/lib/python3.9/dist-packages (5.7.0)\n",
+ "Requirement already satisfied: aiohttp in /usr/local/lib/python3.9/dist-packages (from openai) (3.8.4)\n",
+ "Requirement already satisfied: requests>=2.20 in /usr/local/lib/python3.9/dist-packages (from openai) (2.27.1)\n",
+ "Requirement already satisfied: tqdm in /usr/local/lib/python3.9/dist-packages (from openai) (4.65.0)\n",
+ "Requirement already satisfied: pytz in /usr/local/lib/python3.9/dist-packages (from neo4j) (2022.7.1)\n",
+ "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests>=2.20->openai) (1.26.15)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests>=2.20->openai) (2022.12.7)\n",
+ "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests>=2.20->openai) (2.0.12)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests>=2.20->openai) (3.4)\n",
+ "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.9/dist-packages (from aiohttp->openai) (4.0.2)\n",
+ "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.9/dist-packages (from aiohttp->openai) (1.9.2)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.9/dist-packages (from aiohttp->openai) (23.1.0)\n",
+ "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.9/dist-packages (from aiohttp->openai) (1.3.3)\n",
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.9/dist-packages (from aiohttp->openai) (6.0.4)\n",
+ "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.9/dist-packages (from aiohttp->openai) (1.3.1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install openai neo4j"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Generating Cypher queries with GPT-4 on any graph schema\n",
+ "\n",
+ "Large language models have a great potential to translate natural language into query language. For example, some people like to use GPT models to translate text to SQL, while others want to use GPT models to construct SPARQL queries. Personally, I prefer exploring how to translate natural language to Cypher query language.\n",
+ "In my experiments, I have noticed there are two approaches to developing an LLM flow that constructs Cypher statements. One option is to provide example queries in the prompt or use the examples to fine-tune an LLM. However, this approach requires some work, since example Cypher queries must be produced for each graph schema. On the other hand, we can provide an LLM directly with schema information and let it construct Cypher statements based on the graph schema alone. With the second approach, we can develop a generic Cypher statement model that produces Cypher statements for any input graph schema, eliminating the need for additional work like generating example Cypher statements.\n",
+ "This blog post aims to show you how to implement a Cypher statement-generating model by providing only the graph schema information. We will evaluate the model's Cypher construction capabilities on three graphs with different graph schemas. Currently, the only model I recommend for generating Cypher statements based only on the provided graph schema is GPT-4. Other models like GPT-3.5-turbo or text-davinci-003 aren't that great, and I have yet to find an open-source LLM that follows prompt instructions as well as GPT-4.\n",
+ "## Experiment Setup\n",
+ "I have implemented a Python class that connects to a Neo4j instance and fetches the schema information when initialized. The graph schema information can then be used as input to the GPT-4 model."
+ ],
+ "metadata": {
+ "id": "NYILquGHZrd7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "node_properties_query = \"\"\"\n",
+ "CALL apoc.meta.data()\n",
+ "YIELD label, other, elementType, type, property\n",
+ "WHERE NOT type = \"RELATIONSHIP\" AND elementType = \"node\"\n",
+ "WITH label AS nodeLabels, collect(property) AS properties\n",
+ "RETURN {labels: nodeLabels, properties: properties} AS output\n",
+ "\n",
+ "\"\"\"\n",
+ "\n",
+ "rel_properties_query = \"\"\"\n",
+ "CALL apoc.meta.data()\n",
+ "YIELD label, other, elementType, type, property\n",
+ "WHERE NOT type = \"RELATIONSHIP\" AND elementType = \"relationship\"\n",
+ "WITH label AS nodeLabels, collect(property) AS properties\n",
+ "RETURN {type: nodeLabels, properties: properties} AS output\n",
+ "\"\"\"\n",
+ "\n",
+ "rel_query = \"\"\"\n",
+ "CALL apoc.meta.data()\n",
+ "YIELD label, other, elementType, type, property\n",
+ "WHERE type = \"RELATIONSHIP\" AND elementType = \"node\"\n",
+ "RETURN {source: label, relationship: property, target: other} AS output\n",
+ "\"\"\""
+ ],
+ "metadata": {
+ "id": "67N5Q5-CmuG8"
+ },
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from neo4j import GraphDatabase\n",
+ "from neo4j.exceptions import CypherSyntaxError\n",
+ "import openai\n",
+ "\n",
+ "\n",
+ "def schema_text(node_props, rel_props, rels):\n",
+ " return f\"\"\"\n",
+ " This is the schema representation of the Neo4j database.\n",
+ " Node properties are the following:\n",
+ " {node_props}\n",
+ " Relationship properties are the following:\n",
+ " {rel_props}\n",
+ " Relationships point from source to target nodes\n",
+ " {rels}\n",
+ " Make sure to respect relationship types and directions\n",
+ " \"\"\"\n",
+ "\n",
+ "\n",
+ "class Neo4jGPTQuery:\n",
+ " def __init__(self, url, user, password, openai_api_key):\n",
+ " self.driver = GraphDatabase.driver(url, auth=(user, password))\n",
+ " openai.api_key = openai_api_key\n",
+ " # construct schema\n",
+ " self.schema = self.generate_schema()\n",
+ "\n",
+ "\n",
+ " def generate_schema(self):\n",
+ " node_props = self.query_database(node_properties_query)\n",
+ " rel_props = self.query_database(rel_properties_query)\n",
+ " rels = self.query_database(rel_query)\n",
+ " return schema_text(node_props, rel_props, rels)\n",
+ "\n",
+ " def refresh_schema(self):\n",
+ " self.schema = self.generate_schema()\n",
+ "\n",
+ " def get_system_message(self):\n",
+ " return f\"\"\"\n",
+ " Task: Generate Cypher queries to query a Neo4j graph database based on the provided schema definition.\n",
+ " Instructions:\n",
+ " Use only the provided relationship types and properties.\n",
+ " Do not use any other relationship types or properties that are not provided.\n",
+ " If you cannot generate a Cypher statement based on the provided schema, explain the reason to the user.\n",
+ " Schema:\n",
+ " {self.schema}\n",
+ "\n",
+ " Note: Do not include any explanations or apologies in your responses.\n",
+ " \"\"\"\n",
+ "\n",
+ " def query_database(self, neo4j_query, params={}):\n",
+ " with self.driver.session() as session:\n",
+ " result = session.run(neo4j_query, params)\n",
+ " output = [r.values() for r in result]\n",
+ " output.insert(0, result.keys())\n",
+ " return output\n",
+ "\n",
+ " def construct_cypher(self, question, history=None):\n",
+ " messages = [\n",
+ " {\"role\": \"system\", \"content\": self.get_system_message()},\n",
+ " {\"role\": \"user\", \"content\": question},\n",
+ " ]\n",
+ " # Used for Cypher healing flows\n",
+ " if history:\n",
+ " messages.extend(history)\n",
+ "\n",
+ " completions = openai.ChatCompletion.create(\n",
+ " model=\"gpt-4\",\n",
+ " temperature=0.0,\n",
+ " max_tokens=1000,\n",
+ " messages=messages\n",
+ " )\n",
+ " return completions.choices[0].message.content\n",
+ "\n",
+ " def run(self, question, history=None, retry=True):\n",
+ " # Construct Cypher statement\n",
+ " cypher = self.construct_cypher(question, history)\n",
+ " print(cypher)\n",
+ " try:\n",
+ " return self.query_database(cypher)\n",
+ " # Self-healing flow\n",
+ " except CypherSyntaxError as e:\n",
+ " # If out of retries\n",
+ " if not retry:\n",
+ " return \"Invalid Cypher syntax\"\n",
+ " # Self-healing Cypher flow by\n",
+ " # providing specific error to GPT-4\n",
+ " print(\"Retrying\")\n",
+ " return self.run(\n",
+ " question,\n",
+ " [\n",
+ " {\"role\": \"assistant\", \"content\": cypher},\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": f\"\"\"This query returns an error: {str(e)} \n",
+ Give me an improved query that works without any explanations or apologies\"\"\",\n",
+ " },\n",
+ " ],\n",
+ " retry=False\n",
+ " )\n"
+ ],
+ "metadata": {
+ "id": "IHY0Kt2-mFFq"
+ },
+ "execution_count": 3,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "It's interesting how I ended up with the final system message that gets GPT-4 to follow my instructions. At first, I wrote my directions as plain text and added some constraints. However, the model wasn't doing exactly what I wanted, so I opened ChatGPT in a web browser and asked it to rewrite my instructions in a manner that GPT-4 would understand. ChatGPT seems to understand what works best as a GPT-4 prompt, as the model behaved much better with the new prompt structure.\n",
+ "\n",
+ "\n",
+ "The GPT-4 model uses the ChatCompletion endpoint, which takes a combination of system, user, and optional assistant messages when we want to ask follow-up questions. We always start with only the system and user messages. However, if the generated Cypher statement has a syntax error, the self-healing flow is started, where we include the error in a follow-up question so that GPT-4 can fix the query. This is why the optional history parameter is included for the Cypher self-healing flow.\n",
+ "\n",
+ "The run function starts by generating a Cypher statement. Then, the generated Cypher statement is used to query the Neo4j database. If the Cypher syntax is valid, the query results are returned. However, suppose there is a Cypher syntax error. In that case, we do a single follow-up to GPT-4, provide the generated Cypher statement it constructed in the previous call, and include the error from the Neo4j database. GPT-4 is quite good at fixing a Cypher statement when provided with the error.\n",
+ "\n",
+ "The self-healing Cypher flow was inspired by others who have used similar flows for Python and other code. However, I have limited the follow-up Cypher healing to a single iteration. If the follow-up doesn't produce a valid Cypher statement, the function returns an \"Invalid Cypher syntax\" response.\n",
+ "Let's now test the capabilities of GPT-4 to construct Cypher statements based on the provided graph schema only."
+ ],
+ "metadata": {
+ "id": "LXk1Uy-vZ7cm"
+ }
+ },
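+ {
+ "cell_type": "markdown",
+ "source": [
+ "Stripped of the OpenAI and Neo4j plumbing, the single-retry logic described above boils down to the following sketch (`generate_cypher` and `run_query` are hypothetical stand-ins for the actual API calls in the class):\n",
+ "\n",
+ "```python\n",
+ "def run(question, history=None, retry=True):\n",
+ "    cypher = generate_cypher(question, history)  # LLM call (stub)\n",
+ "    try:\n",
+ "        return run_query(cypher)  # Neo4j call (stub)\n",
+ "    except CypherSyntaxError as e:\n",
+ "        if not retry:\n",
+ "            return \"Invalid Cypher syntax\"\n",
+ "        # Single follow-up: hand the failing query and the error back to GPT-4\n",
+ "        return run(\n",
+ "            question,\n",
+ "            history=[\n",
+ "                {\"role\": \"assistant\", \"content\": cypher},\n",
+ "                {\"role\": \"user\", \"content\": f\"This query returns an error: {e}\"},\n",
+ "            ],\n",
+ "            retry=False,\n",
+ "        )\n",
+ "```"
+ ],
+ "metadata": {}
+ },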
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We will begin with a simple airport route graph, which is available [as the GDS project in Neo4j Sandbox](https://sandbox.neo4j.com/?usecase=graph-data-science2).\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "CGCS37mZushD"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ ""
+ ],
+ "metadata": {
+ "id": "Lc7d99b6azdm"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "This graph schema is relatively simple. The graph contains information about airports and their routes. Additionally, information about the airport's city, region, country, and continent is stored as separate nodes.\n",
+ "\n",
+ "We can instantiate the Python class used to query the airport graph with the following Python code:"
+ ],
+ "metadata": {
+ "id": "qxzVgleva2dp"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "openai_key = \"INSERT_OPENAI_API_KEY\""
+ ],
+ "metadata": {
+ "id": "7hg23hmtqIvD"
+ },
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "gds_db = Neo4jGPTQuery(\n",
+ " url=\"bolt://18.207.187.166:7687\",\n",
+ " user=\"neo4j\",\n",
+ " password=\"preferences-accomplishments-vent\",\n",
+ " openai_api_key=openai_key,\n",
+ ")\n"
+ ],
+ "metadata": {
+ "id": "NZTTW3TkpKFY"
+ },
+ "execution_count": 5,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now we can begin our experiment. First, we will begin with a simple question."
+ ],
+ "metadata": {
+ "id": "aD4z9PW7a8i-"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "gds_db.run(\"\"\"\n",
+ "What is the city with the most airports?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "TyBNV92QqeUn",
+ "outputId": "fca146ab-ac24-4abc-9d6b-3db3db0d4f8d"
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (a:Airport)-[:IN_CITY]->(c:City)\n",
+ "RETURN c.name AS City, COUNT(a) AS NumberOfAirports\n",
+ "ORDER BY NumberOfAirports DESC\n",
+ "LIMIT 1\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['City', 'NumberOfAirports'], ['London', 6]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Great start. The Cypher statement was correctly generated, and we found that London has six airports. Next, let's try something more complex."
+ ],
+ "metadata": {
+ "id": "Qm67jPVaa-u4"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "gds_db.run(\"\"\"\n",
+ "calculate the minimum, maximum, average, and standard deviation of the number of flights out of each airport.\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "QLbBT4zVrkgS",
+ "outputId": "c9a4e5b8-7761-43fe-df29-7829a14d9783"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (a:Airport)-[r:HAS_ROUTE]->(:Airport)\n",
+ "WITH a, count(r) as num_flights\n",
+ "RETURN min(num_flights) as min_flights, max(num_flights) as max_flights, avg(num_flights) as avg_flights, stDev(num_flights) as stddev_flights\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['min_flights', 'max_flights', 'avg_flights', 'stddev_flights'],\n",
+ " [1, 307, 20.905362776025285, 38.28730861505158]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Quite nice. The GPT-4 model correctly assumed that flights relate to the HAS_ROUTE relationship. Additionally, it accurately aggregates flights per airport, then calculates the specified metrics.\n",
+ "\n",
+ "Let's now throw it a curveball. We will ask the model to calculate the variance since Cypher doesn't have any built-in function to calculate the variance."
+ ],
+ "metadata": {
+ "id": "Ci3zSR8ZbBN9"
+ }
+ },
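+ {
+ "cell_type": "markdown",
+ "source": [
+ "As a side note, one workaround (my own, not something GPT-4 produced) is to fetch the per-airport route counts with Cypher and compute the variance client-side, or simply square Cypher's `stDev()` result:\n",
+ "\n",
+ "```python\n",
+ "import statistics\n",
+ "\n",
+ "counts_query = \"\"\"\n",
+ "MATCH (a:Airport)-[r:HAS_ROUTE]->(:Airport)\n",
+ "WITH a, count(r) AS num_flights\n",
+ "RETURN num_flights\n",
+ "\"\"\"\n",
+ "# rows = gds_db.query_database(counts_query)  # first row is the header\n",
+ "# variance = statistics.variance([row[0] for row in rows[1:]])\n",
+ "# Alternatively: the sample variance is just the square of stDev\n",
+ "```"
+ ],
+ "metadata": {}
+ },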
+ {
+ "cell_type": "code",
+ "source": [
+ "gds_db.run(\"\"\"\n",
+ "calculate the variance of the number of flights out of each airport.\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 111
+ },
+ "id": "OpnkI_laEMda",
+ "outputId": "8ecb6490-5e61-4d94-f803-2d10df74776b"
+ },
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "The provided schema does not have information about the number of flights for each airport. Therefore, it is not possible to calculate the variance of the number of flights out of each airport using the given schema.\n",
+ "Retrying\n",
+ "As mentioned earlier, the provided schema does not have information about the number of flights for each airport. Therefore, it is not possible to create a query to calculate the variance of the number of flights out of each airport using the given schema.\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Invalid Cypher syntax'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "First of all, GPT-4 provided explanations when explicitly told not to. Secondly, neither response contains a Cypher statement that makes any sense. In this example, even the self-healing flow couldn't succeed, since we are not dealing with a Cypher syntax error but with a GPT-4 malfunction.\n",
+ "\n",
+ "I have noticed that GPT-4 struggles when it needs to perform multiple aggregations using different grouping keys in a single Cypher statement. Here it wanted to split the statement into two parts (which don't work either), but in other cases it wants to borrow syntax from SQL.\n",
+ "\n",
+ "However, GPT-4 is quite obedient and provides the specified results from the database as instructed by the user."
+ ],
+ "metadata": {
+ "id": "PHk4gjdebFEg"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "gds_db.run(\"\"\"\n",
+ "Find the shortest route between ATL and IAH airports\n",
+ "and return only the iata and runways property of the nodes as a map object\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "h_IxUkoIykhd",
+ "outputId": "f6e60bb7-7a0c-4d5a-f7a7-448adda75d26"
+ },
+ "execution_count": 9,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (atl:Airport {iata: \"ATL\"}), (iah:Airport {iata: \"IAH\"}), path = shortestPath((atl)-[:HAS_ROUTE*]-(iah))\n",
+ "WITH nodes(path) AS airports\n",
+ "UNWIND airports AS airport\n",
+ "RETURN {iata: airport.iata, runways: airport.runways} AS airportInfo\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['airportInfo'],\n",
+ " [{'iata': 'ATL', 'runways': 5}],\n",
+ " [{'iata': 'IAH', 'runways': 5}]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Here is where the power of GPT-4 shines. The more specific we are in what we want to find and how we want the results to be structured, the better it works.\n",
+ "We can also test if it knows how to use the GDS library."
+ ],
+ "metadata": {
+ "id": "e2VDhhXxbJyR"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "print(gds_db.construct_cypher(\"\"\"\n",
+ "Calculate the betweenness centrality of airports using the Graph Data Science library\n",
+ "\"\"\"))"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "7S6rkSYzufWM",
+ "outputId": "b5d75083-5198-4ca6-b873-e7718f735f73"
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "CALL gds.betweenness.stream({\n",
+ " nodeProjection: 'Airport',\n",
+ " relationshipProjection: {\n",
+ " HAS_ROUTE: {\n",
+ " type: 'HAS_ROUTE',\n",
+ " orientation: 'UNDIRECTED'\n",
+ " }\n",
+ " }\n",
+ "})\n",
+ "YIELD nodeId, score\n",
+ "RETURN gds.util.asNode(nodeId).id AS airportId, score\n",
+ "ORDER BY score DESC\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Well, the constructed Cypher statement looks fine. However, there is one problem: the generated statement uses an anonymous graph projection, which was deprecated and removed in GDS v2. Here we see issues arising from GPT-4's knowledge cutoff date. GDS v2 was released after the cutoff, so the new syntax is not baked into GPT-4, and as a result the model currently doesn't produce valid GDS procedure calls.\n",
+ "\n",
+ "If you pay attention, you will also notice that GPT-4 never uses the Cypher subquery syntax, which is again another syntax change that was added after the knowledge cutoff date.\n",
+ "\n",
+ "Interestingly, if you calculate any values with graph algorithms and store them as node properties, GPT-4 has no problem retrieving them."
+ ],
+ "metadata": {
+ "id": "55cxuibdbNhl"
+ }
+ },
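+ {
+ "cell_type": "markdown",
+ "source": [
+ "For reference, GDS v2 replaced anonymous projections with named graphs. A sketch of the post-cutoff syntax (it would need the live GDS sandbox to actually run) looks like this:\n",
+ "\n",
+ "```python\n",
+ "project_query = \"\"\"\n",
+ "CALL gds.graph.project('airports', 'Airport',\n",
+ "  {HAS_ROUTE: {type: 'HAS_ROUTE', orientation: 'UNDIRECTED'}})\n",
+ "\"\"\"\n",
+ "betweenness_query = \"\"\"\n",
+ "CALL gds.betweenness.stream('airports')\n",
+ "YIELD nodeId, score\n",
+ "RETURN gds.util.asNode(nodeId).iata AS airport, score\n",
+ "ORDER BY score DESC\n",
+ "\"\"\"\n",
+ "# gds_db.query_database(project_query)\n",
+ "# gds_db.query_database(betweenness_query)\n",
+ "```"
+ ],
+ "metadata": {}
+ },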
+ {
+ "cell_type": "code",
+ "source": [
+ "gds_db.run(\"\"\"\n",
+ "Use PageRank to find the five most important airports and return their descr and pagerank value\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "6DPu-MyYwFa7",
+ "outputId": "b3829b72-16a5-40d9-ac95-61d9d11ce17b"
+ },
+ "execution_count": 11,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (a:Airport)\n",
+ "RETURN a.descr, a.pagerank\n",
+ "ORDER BY a.pagerank DESC\n",
+ "LIMIT 5\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['a.descr', 'a.pagerank'],\n",
+ " ['Dallas/Fort Worth International Airport', 11.97978260670334],\n",
+ " [\"Chicago O'Hare International Airport\", 11.162988178920267],\n",
+ " ['Denver International Airport', 10.997299338126387],\n",
+ " ['Hartsfield - Jackson Atlanta International Airport', 10.389948350302957],\n",
+ " ['Istanbul International Airport', 8.42580121770578]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "It looks like Dallas and Chicago have the highest PageRank scores.\n",
+ "## Healthcare sandbox\n",
+ "You might say that the airport sandbox might have been part of GPT-4's training data. That is definitely a possibility. Therefore, let's test GPT-4's ability to construct Cypher statements on the latest Neo4j Sandbox project, which deals with healthcare data and was published between December 2022 and January 2023. That should be after the GPT-4 knowledge cutoff date.\n",
+ "\n",
+ ""
+ ],
+ "metadata": {
+ "id": "PtyWWubP4xLh"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The healthcare graph schema revolves around adverse drug event cases. Each case is related to the relevant drugs. In addition, other information is available, such as the age group, outcome, and reaction. Here, I took the example questions from the sandbox guide, as I am not familiar with the adverse drug events domain."
+ ],
+ "metadata": {
+ "id": "D2U9HoJobWBi"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "hc_db = Neo4jGPTQuery(\n",
+ " url=\"bolt://3.216.123.73:7687\",\n",
+ " user=\"neo4j\",\n",
+ " password=\"reenlistment-superstructures-shafts\",\n",
+ " openai_api_key=openai_key,\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "UsKiPnx1wy8Z"
+ },
+ "execution_count": 12,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "hc_db.run(\"\"\"\n",
+ "What are the top 5 side effects reported?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ch0QpKzb491V",
+ "outputId": "14eecb9d-cc15-44a6-bca5-34cefdb02a1a"
+ },
+ "execution_count": 13,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (c:Case)-[:HAS_REACTION]->(r:Reaction)\n",
+ "RETURN r.description as SideEffect, COUNT(*) as Frequency\n",
+ "ORDER BY Frequency DESC\n",
+ "LIMIT 5\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['SideEffect', 'Frequency'],\n",
+ " ['Fatigue', 303],\n",
+ " ['Product dose omission issue', 285],\n",
+ " ['Headache', 272],\n",
+ " ['Nausea', 256],\n",
+ " ['Pain', 253]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "It would be interesting to learn how GPT-4 knew that side effects can be found as Reaction nodes. Even I couldn't have found that without any details about the graph. Are there graphs out there with a similar schema, or is the knowledge cutoff date of GPT-4 not that accurate? Or does it simply have great intuition for finding relevant data based on node labels and their properties?\n",
+ "\n",
+ "Let's try something more complex."
+ ],
+ "metadata": {
+ "id": "Tapjpf0HbaO6"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "hc_db.run(\"\"\"\n",
+ "What are the top 3 manufacturing companies with the most reported side effects?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "bMEOMI0n5aT5",
+ "outputId": "7d91e1a7-69ca-4d9c-c9e1-71401113f74c"
+ },
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (m:Manufacturer)-[:REGISTERED]->(c:Case)-[:HAS_REACTION]->(r:Reaction)\n",
+ "RETURN m.manufacturerName, COUNT(r) as sideEffectsCount\n",
+ "ORDER BY sideEffectsCount DESC\n",
+ "LIMIT 3\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['m.manufacturerName', 'sideEffectsCount'],\n",
+ " ['TAKEDA', 5058],\n",
+ " ['PFIZER', 3219],\n",
+ " ['NOVARTIS', 1823]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Here, we can see that GPT-4 takes our request very literally. Since we are asking for the count of reported side effects, it expands to the Reaction nodes and counts them. On the other hand, we could request only the number of cases.\n",
+ ],
+ "metadata": {
+ "id": "7-uxtXJWbeGH"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "hc_db.run(\"\"\"\n",
+ "What are the top 3 manufacturing companies with the most reported cases?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "gUBIsss9PXLd",
+ "outputId": "3fdf29a7-e4fb-4985-bc57-ece657f4762b"
+ },
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (m:Manufacturer)-[:REGISTERED]->(c:Case)\n",
+ "RETURN m.manufacturerName, COUNT(c) as case_count\n",
+ "ORDER BY case_count DESC\n",
+ "LIMIT 3\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['m.manufacturerName', 'case_count'],\n",
+ " ['TAKEDA', 617],\n",
+ " ['CELGENE', 572],\n",
+ " ['PFIZER', 513]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now, let's try something where GPT-4 has to both filter and aggregate.\n",
+ ],
+ "metadata": {
+ "id": "qjwYZA2GbgOk"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "hc_db.run(\"\"\"\n",
+ "What are the top 5 drugs whose side effects resulted in Death of patients as an outcome?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "0SVUT8l95sVo",
+ "outputId": "1f6d09a5-b3d4-47bc-d48d-042b398a81ca"
+ },
+ "execution_count": 16,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (d:Drug)-[:IS_PRIMARY_SUSPECT|:IS_SECONDARY_SUSPECT|:IS_CONCOMITANT|:IS_INTERACTING]->(c:Case)-[:RESULTED_IN]->(o:Outcome)\n",
+ "WHERE o.outcome = \"Death\"\n",
+ "RETURN d.name, COUNT(*) as count\n",
+ "ORDER BY count DESC\n",
+ "LIMIT 5\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['d.name', 'count']]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "One thing that happens occasionally is that GPT-4 messes up the relationship direction. For example, the relationships from the Drug to the Case nodes should point in the opposite direction. Additionally, the Sandbox guide uses only the IS_PRIMARY_SUSPECT relationship type, but we can't really blame GPT-4 for including the others, given the question's ambiguity.\n",
+ "\n",
+ "Note that GPT-4 is not deterministic, so it may get the relationship direction right one time and wrong the next. For me, it worked correctly one day and not the other, although I got consistent results within the same day, so who knows what is happening behind the scenes.\n",
+ "\n",
+ "What I found interesting is that the GPT-4 model knew that the outcome property contains information about the death of patients. But more than that, it knew that the death value should be capitalized, which makes me think the model saw this dataset in one form or another.\n",
+ "\n",
+ "## Custom astronomical dataset\n",
+ "I have decided to construct a custom astronomical dataset that the model definitely hasn't seen during training, since it didn't exist until I started writing this post. It is tiny, but good enough to test GPT-4's ability to generalize. I created a [blank project on Neo4j Sandbox](https://sandbox.neo4j.com/?usecase=blank-sandbox) and then seeded the database with the following script."
+ ],
+ "metadata": {
+ "id": "EdFQGZADTTCW"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "astro_db = Neo4jGPTQuery(\n",
+ " url=\"bolt://35.171.160.87:7687\",\n",
+ " user=\"neo4j\",\n",
+ " password=\"discontinuance-fifths-sports\",\n",
+ " openai_api_key=openai_key,\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "3Ceh2OEh6JMR"
+ },
+ "execution_count": 17,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "url = \"https://gist.githubusercontent.com/tomasonjo/52b2da916ef5cd1c2adf0ad62cc71a26/raw/a3a8716f7b28f3a82ce59e6e7df28389e3cb33cb/astro.cql\"\n",
+ "astro_db.query_database(\"CALL apoc.cypher.runFile($url)\", {'url':url})\n",
+ "astro_db.refresh_schema()"
+ ],
+ "metadata": {
+ "id": "w8GjNAPnTd8k"
+ },
+ "execution_count": 18,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The constructed graph has the following schema.\n",
+ "\n",
+ "\n",
+ "\n",
+ "The database contains the planets of our solar system, which orbit the Sun. Additionally, satellites like the ISS, the Moon, and the Hubble Space Telescope are included."
+ ],
+ "metadata": {
+ "id": "pRsVNCkxbt2Z"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "astro_db.run(\"\"\"\n",
+ "What orbits the Earth?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "i-nFUrbfT7Zp",
+ "outputId": "8aeace90-e748-46c4-e5a1-0fba731e8b96"
+ },
+ "execution_count": 19,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (ao:AstronomicalObject {name: \"Earth\"})<-[:ORBITS]-(o)\n",
+ "RETURN o.name, labels(o)\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['o.name', 'labels(o)'],\n",
+ " ['Hubble Space Telescope', ['Satellite']],\n",
+ " ['ISS', ['Satellite']],\n",
+ " ['Moon', ['Satellite']]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 19
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Remember, GPT-4 only knows that the database contains satellites and astronomical objects. Astronomical objects orbit other astronomical objects, while satellites can only orbit astronomical objects. It looks like GPT-4 used its world knowledge to assume that only a satellite would orbit the Earth, which is impressive. We can observe that GPT-4 probably makes a lot of assumptions based on its baked-in knowledge to help us with our queries.\n",
+ "\n",
+ "Let's dig deeper."
+ ],
+ "metadata": {
+ "id": "YQXN8CRNb1j1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "astro_db.run(\"\"\"\n",
+ "Does ISS orbits the Sun?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "hU7agLmMUUOa",
+ "outputId": "f842f49c-c764-49ac-cc11-39e0720ac178"
+ },
+ "execution_count": 20,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH (s:Satellite {name: \"ISS\"})-[:ORBITS]->(a:AstronomicalObject {name: \"Sun\"})\n",
+ "RETURN s.name as Satellite, a.name as AstronomicalObject\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['Satellite', 'AstronomicalObject']]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 20
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "So, the ISS doesn't directly orbit the sun. We can rephrase our question."
+ ],
+ "metadata": {
+ "id": "XYizIkKHb4JP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "astro_db.run(\"\"\"\n",
+ "Does ISS orbits the Sun? Find any path between them\n",
+ "and return names of nodes in the path\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "l6RY0-1jXyyK",
+ "outputId": "b244c722-09b4-41d8-960b-f28bd785c3bf"
+ },
+ "execution_count": 21,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "MATCH path = (iss:Satellite {name: \"ISS\"})-[:ORBITS*]->(sun:AstronomicalObject {name: \"Sun\"})\n",
+ "RETURN [node in nodes(path) | node.name] AS path_names\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[['path_names'], [['ISS', 'Earth', 'Sun']]]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 21
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now, it uses a variable-length path pattern to find if ISS orbits the sun by proxy. Of course, we gave it a hint to use that, but it is still remarkable. For the final example, let's observe how good GPT-4 is at guessing never-seen-before property values."
+ ],
+ "metadata": {
+ "id": "iIwY3cBtb7X3"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "astro_db.run(\"\"\"\n",
+ "What's the altitude difference between ISS and Hubble telescope\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 239
+ },
+ "id": "_gdrePY_b_wz",
+ "outputId": "66b2469c-0848-49b9-de21-942e214bf0e3"
+ },
+ "execution_count": 22,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "To find the altitude difference between ISS and Hubble telescope, you can use the following Cypher query:\n",
+ "\n",
+ "```cypher\n",
+ "MATCH (s1:Satellite {name: \"ISS\"}), (s2:Satellite {name: \"Hubble Telescope\"})\n",
+ "RETURN abs(s1.altitude - s2.altitude) as altitude_difference\n",
+ "```\n",
+ "Retrying\n",
+ "```cypher\n",
+ "MATCH (s1:Satellite {name: \"ISS\"}), (s2:Satellite {name: \"Hubble Telescope\"})\n",
+ "RETURN abs(s1.altitude - s2.altitude) as altitude_difference\n",
+ "```\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Invalid Cypher syntax'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 22
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "To tell you the truth, I am kind of relieved that GPT-4 didn't correctly guess that Hubble is stored in the database as \"Hubble Space Telescope\". Other than that, the generated Cypher statement is perfectly valid.\n",
+ "## Summary\n",
+ "GPT-4 has great potential for generating Cypher statements based only on the provided graph schema. My opinion is that it has seen a lot of datasets and graph models during its training, so it is fairly good at guessing which properties to use and sometimes even their values. However, you can always provide the model with instructions about which properties to use and specify exact values if it isn't performing well on your specific graph model. The limitations I have observed during this experiment are the following:\n",
+ "* Multiple aggregations with different grouping keys are a problem\n",
+ "* Version two of the Graph Data Science library is beyond the knowledge cutoff date\n",
+ "* Sometimes it messes up the relationship direction (not frequently, though)\n",
+ "* The non-deterministic nature of GPT-4 makes it feel like you are dealing with a horoscope-based model, where identical queries work in the morning but not in the afternoon\n",
+ "* Sometimes the model bypasses system instructions and provides explanations for queries\n",
+ "\n",
+ "The schema-only approach to GPT-4 can work in experimental setups to help developers or researchers without malicious intent interact with the graph database. On the other hand, if you want to build something more production-ready, I would recommend providing the model with examples of Cypher statements."
+ ],
+ "metadata": {
+ "id": "Gq1a80BHb-Oc"
+ }
+ },
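+ {
+ "cell_type": "markdown",
+ "source": [
+ "To sketch what the example-based approach could look like (a hypothetical few-shot setup, not code used in this post), you could prepend question/Cypher pairs to the chat messages before the user's actual question:\n",
+ "\n",
+ "```python\n",
+ "few_shot_messages = [\n",
+ "    {\"role\": \"system\", \"content\": \"Translate the user's question into a Cypher statement.\"},\n",
+ "    # One worked example grounds the model on this specific graph model\n",
+ "    {\"role\": \"user\", \"content\": \"What is the city with the most airports?\"},\n",
+ "    {\n",
+ "        \"role\": \"assistant\",\n",
+ "        \"content\": \"MATCH (a:Airport)-[:IN_CITY]->(c:City) \"\n",
+ "        \"RETURN c.name AS city, count(a) AS airports \"\n",
+ "        \"ORDER BY airports DESC LIMIT 1\",\n",
+ "    },\n",
+ "]\n",
+ "# messages = few_shot_messages + [{\"role\": \"user\", \"content\": question}]\n",
+ "```"
+ ],
+ "metadata": {}
+ },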
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "U_WPbddjWVEc"
+ },
+ "execution_count": 22,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/LLMs/src/langchain_fulltext_agent.ipynb b/LLMs/src/langchain_fulltext_agent.ipynb
new file mode 100644
index 0000000..1b8b448
--- /dev/null
+++ b/LLMs/src/langchain_fulltext_agent.ipynb
@@ -0,0 +1,496 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyPEI+4M9KTa9FUFAUClVHNd",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "gH7LglHfbQfQ",
+ "outputId": "5a7b1c49-f5d5-4be6-aaca-8fa544121e64"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Requirement already satisfied: neo4j in /usr/local/lib/python3.10/dist-packages (5.9.0)\n",
+ "Requirement already satisfied: openai in /usr/local/lib/python3.10/dist-packages (0.27.8)\n",
+ "Requirement already satisfied: langchain in /usr/local/lib/python3.10/dist-packages (0.0.205)\n",
+ "Requirement already satisfied: pytz in /usr/local/lib/python3.10/dist-packages (from neo4j) (2022.7.1)\n",
+ "Requirement already satisfied: requests>=2.20 in /usr/local/lib/python3.10/dist-packages (from openai) (2.27.1)\n",
+ "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai) (4.65.0)\n",
+ "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from openai) (3.8.4)\n",
+ "Requirement already satisfied: PyYAML>=5.4.1 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0)\n",
+ "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.10)\n",
+ "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.2)\n",
+ "Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.5.8)\n",
+ "Requirement already satisfied: langchainplus-sdk>=0.0.9 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.0.11)\n",
+ "Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.8.4)\n",
+ "Requirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.22.4)\n",
+ "Requirement already satisfied: openapi-schema-pydantic<2.0,>=1.2 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.2.4)\n",
+ "Requirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.10.7)\n",
+ "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.2.2)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (23.1.0)\n",
+ "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (2.0.12)\n",
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (6.0.4)\n",
+ "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (1.9.2)\n",
+ "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (1.3.3)\n",
+ "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (1.3.1)\n",
+ "Requirement already satisfied: marshmallow<4.0.0,>=3.3.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (3.19.0)\n",
+ "Requirement already satisfied: marshmallow-enum<2.0.0,>=1.5.1 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (1.5.1)\n",
+ "Requirement already satisfied: typing-inspect>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (0.9.0)\n",
+ "Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2,>=1->langchain) (4.5.0)\n",
+ "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (1.26.15)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (2022.12.7)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (3.4)\n",
+ "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (2.0.2)\n",
+ "Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.10/dist-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (23.1)\n",
+ "Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from typing-inspect>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (1.0.0)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install neo4j openai langchain"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain.chat_models import ChatOpenAI\n",
+ "from langchain.chains import GraphCypherQAChain\n",
+ "from langchain.graphs import Neo4jGraph\n",
+ "\n",
+ "graph = Neo4jGraph(\n",
+ " url=\"bolt://3.238.222.252:7687\",\n",
+ " username=\"neo4j\",\n",
+ " password=\"bushels-harpoon-deduction\"\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "ad1yMWkjbUTj"
+ },
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ['OPENAI_API_KEY'] = \"sk-\"\n",
+ "\n",
+ "chain = GraphCypherQAChain.from_llm(\n",
+ " ChatOpenAI(temperature=0, model_name=\"gpt-4-0613\"), graph=graph, verbose=True,\n",
+ " return_direct=True\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "jhE0tzoXbiGs"
+ },
+ "execution_count": 13,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"Who played in the Matrix?\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "zkWN8VX6bngj",
+ "outputId": "281973da-fcae-4588-d02f-ae9e5f232a0e"
+ },
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Matrix' RETURN a.name\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "graph.query(\"\"\"CREATE FULLTEXT INDEX entities IF NOT EXISTS\n",
+ "FOR (n:Movie|Person)\n",
+ "ON EACH [n.name, n.title]\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "SmS2Zu_bcZKQ",
+ "outputId": "dbd210fe-ab29-4c76-f5ea-ce3c5ed53468"
+ },
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "graph.query(\"\"\"\n",
+ "CALL db.index.fulltext.queryNodes(\"entities\", \"Matrix\") YIELD node, score\n",
+ "RETURN node.title, score\n",
+ "LIMIT 3\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "AE3-hCuVccdL",
+ "outputId": "67a78c3c-ac92-4985-b028-fc7d5a4261ac"
+ },
+ "execution_count": 16,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[{'node.title': 'Matrix, The', 'score': 4.259097099304199},\n",
+ " {'node.title': 'Matrix Revolutions, The', 'score': 3.7098255157470703},\n",
+ " {'node.title': 'Matrix Reloaded, The', 'score': 3.7098255157470703}]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from __future__ import annotations\n",
+ "from langchain.chains.base import Chain\n",
+ "\n",
+ "from typing import Any, Dict, List\n",
+ "\n",
+ "from pydantic import Field\n",
+ "\n",
+ "\n",
+ "fulltext_search = \"\"\"\n",
+ "CALL db.index.fulltext.queryNodes(\"entities\", $query)\n",
+ "YIELD node, score\n",
+ "RETURN coalesce(node.title, node.name) AS option, labels(node)[0] AS type LIMIT 3\n",
+ "\"\"\"\n",
+ "\n",
+ "\n",
+ "class Neo4jFulltextGraphChain(Chain):\n",
+ " \"\"\"Chain for keyword question-answering against a graph.\"\"\"\n",
+ "\n",
+ " graph: Neo4jGraph = Field(exclude=True)\n",
+ " input_key: str = \"query\" #: :meta private:\n",
+ " output_key: str = \"result\" #: :meta private:\n",
+ "\n",
+ " @property\n",
+ " def input_keys(self) -> List[str]:\n",
+ " \"\"\"Return the input keys.\n",
+ " :meta private:\n",
+ " \"\"\"\n",
+ " return [self.input_key]\n",
+ "\n",
+ " @property\n",
+ " def output_keys(self) -> List[str]:\n",
+ " \"\"\"Return the output keys.\n",
+ " :meta private:\n",
+ " \"\"\"\n",
+ " _output_keys = [self.output_key]\n",
+ " return _output_keys\n",
+ "\n",
+ " def _call(self, inputs: Dict[str, str]) -> Dict[str, Any]:\n",
+ " \"\"\"Extract entities, look up info and answer question.\"\"\"\n",
+ " question = inputs[self.input_key]\n",
+ " context = self.graph.query(\n",
+ " fulltext_search, {'query': question})\n",
+ " return {self.output_key: context}\n"
+ ],
+ "metadata": {
+ "id": "gUbIKwm_cfVy"
+ },
+ "execution_count": 17,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "fulltext = Neo4jFulltextGraphChain(graph=graph)\n",
+ "fulltext.run(\"Matrix\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "jAGQ0Y4CiAPw",
+ "outputId": "aa99af7d-49ce-4803-d1d8-60e332b63733"
+ },
+ "execution_count": 18,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[{'option': 'Matrix, The', 'type': 'Movie'},\n",
+ " {'option': 'Matrix Revolutions, The', 'type': 'Movie'},\n",
+ " {'option': 'Matrix Reloaded, The', 'type': 'Movie'}]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "fulltext.run(\"Keanu\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "N9_SWhELiF5l",
+ "outputId": "040582cd-d741-48da-eae6-ca2f972c24f0"
+ },
+ "execution_count": 19,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[{'option': 'Keanu', 'type': 'Movie'},\n",
+ " {'option': 'Keanu Reeves', 'type': 'Actor'}]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 19
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain.agents import initialize_agent, Tool\n",
+ "from langchain.agents import AgentType\n",
+ "\n",
+ "llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
+ "tools = [\n",
+ " Tool(\n",
+ " name=\"Movies\",\n",
+ " func=chain.run,\n",
+ "        description=\"\"\"useful for when you need information about movies, actors, directors, and so on.\n",
+ "    Input must be a full question.\n",
+ "    Always make sure to use the entity search first to validate the names of movies and people.\"\"\",\n",
+ " ),\n",
+ " Tool(\n",
+ " name=\"Entity search\",\n",
+ " func=fulltext.run,\n",
+ "        description=\"\"\"useful for when you need to find the exact names of people and movies. The tool returns three options, and you have to select the best one.\n",
+ " Input must be a single entity.\"\"\",\n",
+ " ),\n",
+ "\n",
+ "]"
+ ],
+ "metadata": {
+ "id": "brcvHFabiNcu"
+ },
+ "execution_count": 20,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "graph_agent = initialize_agent(tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)"
+ ],
+ "metadata": {
+ "id": "XNbJEch5j4Zg"
+ },
+ "execution_count": 21,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "graph_agent.run(\"Who played in the Matrix?\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "id": "RslNeUUEj-Y4",
+ "outputId": "1e705c06-2356-46d3-8c52-b61c0def267e"
+ },
+ "execution_count": 22,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "\u001b[32;1m\u001b[1;3mThought: I should use the Movies tool to get information about the movie \"The Matrix\" and its cast.\n",
+ "Action:\n",
+ "```\n",
+ "{\n",
+ " \"action\": \"Movies\",\n",
+ " \"action_input\": \"cast of The Matrix\"\n",
+ "}\n",
+ "```\n",
+ "\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (m:Movie {title: 'The Matrix'})-[:ACTED_IN]-(a:Actor)\n",
+ "RETURN a.name\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n",
+ "\n",
+ "Observation: \u001b[36;1m\u001b[1;3m[]\u001b[0m\n",
+ "Thought:\u001b[32;1m\u001b[1;3mThe observation should not be an empty list. Let me try again.\n",
+ "Action:\n",
+ "```\n",
+ "{\n",
+ " \"action\": \"Movies\",\n",
+ " \"action_input\": \"The Matrix\"\n",
+ "}\n",
+ "```\n",
+ "\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (m:Movie) WHERE m.title = 'The Matrix' RETURN m\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n",
+ "\n",
+ "Observation: \u001b[36;1m\u001b[1;3m[]\u001b[0m\n",
+ "Thought:\u001b[32;1m\u001b[1;3mI need to check the exact name of the movie using the Entity search tool first.\n",
+ "Action:\n",
+ "```\n",
+ "{\n",
+ " \"action\": \"Entity search\",\n",
+ " \"action_input\": \"The Matrix\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "\u001b[0m\n",
+ "Observation: \u001b[33;1m\u001b[1;3m[{'option': 'Matrix, The', 'type': 'Movie'}, {'option': 'Matrix Reloaded, The', 'type': 'Movie'}, {'option': 'Matrix Revolutions, The', 'type': 'Movie'}]\u001b[0m\n",
+ "Thought:\u001b[32;1m\u001b[1;3mI can now use the correct name of the movie to get the cast using the Movies tool.\n",
+ "Action:\n",
+ "```\n",
+ "{\n",
+ " \"action\": \"Movies\",\n",
+ " \"action_input\": \"cast of Matrix, The\"\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (m:Movie {title: \"Matrix, The\"})-[:ACTED_IN]-(a:Actor)\n",
+ "RETURN a.name\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n",
+ "\n",
+ "Observation: \u001b[36;1m\u001b[1;3m[{'a.name': 'Hugo Weaving'}, {'a.name': 'Laurence Fishburne'}, {'a.name': 'Keanu Reeves'}, {'a.name': 'Carrie-Anne Moss'}]\u001b[0m\n",
+ "Thought:"
+ ]
+ },
+ {
+ "output_type": "error",
+ "ename": "OutputParserException",
+ "evalue": "ignored",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/agents/chat/output_parser.py\u001b[0m in \u001b[0;36mparse\u001b[0;34m(self, text)\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 18\u001b[0;31m \u001b[0maction\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtext\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"```\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 19\u001b[0m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mjson\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloads\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maction\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mIndexError\u001b[0m: list index out of range",
+ "\nDuring handling of the above exception, another exception occurred:\n",
+ "\u001b[0;31mOutputParserException\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mgraph_agent\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Who played in the Matrix?\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/chains/base.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, callbacks, tags, *args, **kwargs)\u001b[0m\n\u001b[1;32m 271\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 272\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"`run` supports only one positional argument.\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 273\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcallbacks\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcallbacks\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtags\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtags\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0m_output_key\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 274\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 275\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mkwargs\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/chains/base.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks, tags, include_run_info)\u001b[0m\n\u001b[1;32m 147\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 148\u001b[0m \u001b[0mrun_manager\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mon_chain_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 149\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 150\u001b[0m \u001b[0mrun_manager\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mon_chain_end\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 151\u001b[0m final_outputs: Dict[str, Any] = self.prep_outputs(\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/chains/base.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks, tags, include_run_info)\u001b[0m\n\u001b[1;32m 141\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 142\u001b[0m outputs = (\n\u001b[0;32m--> 143\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrun_manager\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mrun_manager\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 144\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mnew_arg_supported\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 145\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/agents/agent.py\u001b[0m in \u001b[0;36m_call\u001b[0;34m(self, inputs, run_manager)\u001b[0m\n\u001b[1;32m 955\u001b[0m \u001b[0;31m# We now enter the agent loop (until it returns something).\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 956\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_should_continue\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0miterations\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtime_elapsed\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 957\u001b[0;31m next_step_output = self._take_next_step(\n\u001b[0m\u001b[1;32m 958\u001b[0m \u001b[0mname_to_tool_map\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 959\u001b[0m \u001b[0mcolor_mapping\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/agents/agent.py\u001b[0m in \u001b[0;36m_take_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 771\u001b[0m \u001b[0mraise_error\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 772\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mraise_error\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 773\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 774\u001b[0m \u001b[0mtext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 775\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandle_parsing_errors\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbool\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/agents/agent.py\u001b[0m in \u001b[0;36m_take_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 760\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 761\u001b[0m \u001b[0;31m# Call the LLM to see what to do.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 762\u001b[0;31m output = self.agent.plan(\n\u001b[0m\u001b[1;32m 763\u001b[0m \u001b[0mintermediate_steps\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 764\u001b[0m \u001b[0mcallbacks\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mrun_manager\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_child\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mrun_manager\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/agents/agent.py\u001b[0m in \u001b[0;36mplan\u001b[0;34m(self, intermediate_steps, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 442\u001b[0m \u001b[0mfull_inputs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_full_inputs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mintermediate_steps\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 443\u001b[0m \u001b[0mfull_output\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mllm_chain\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcallbacks\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcallbacks\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mfull_inputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 444\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutput_parser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mparse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 445\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 446\u001b[0m async def aplan(\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/agents/chat/output_parser.py\u001b[0m in \u001b[0;36mparse\u001b[0;34m(self, text)\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mincludes_answer\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 30\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mOutputParserException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Could not parse LLM output: {text}\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 31\u001b[0m return AgentFinish(\n\u001b[1;32m 32\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m\"output\"\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mtext\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mFINAL_ANSWER_ACTION\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtext\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mOutputParserException\u001b[0m: Could not parse LLM output: I have the list of actors who played in The Matrix."
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "mdBxM7R_kDJf"
+ },
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/LLMs/src/langchain_neo4j.ipynb b/LLMs/src/langchain_neo4j.ipynb
new file mode 100644
index 0000000..b2eb565
--- /dev/null
+++ b/LLMs/src/langchain_neo4j.ipynb
@@ -0,0 +1,541 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyOcXBcR86rPwNdXV2pzMJOf",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "lqp9wLWW-UFE",
+ "outputId": "156a39ce-bcaf-4f78-d07e-62a343da4f81"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Collecting neo4j\n",
+ " Downloading neo4j-5.8.1.tar.gz (187 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m187.7/187.7 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
+ " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
+ " Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n",
+ " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
+ "Collecting openai\n",
+ " Downloading openai-0.27.7-py3-none-any.whl (71 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m72.0/72.0 kB\u001b[0m \u001b[31m7.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting langchain\n",
+ " Downloading langchain-0.0.180-py3-none-any.whl (922 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m922.9/922.9 kB\u001b[0m \u001b[31m25.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: pytz in /usr/local/lib/python3.10/dist-packages (from neo4j) (2022.7.1)\n",
+ "Requirement already satisfied: requests>=2.20 in /usr/local/lib/python3.10/dist-packages (from openai) (2.27.1)\n",
+ "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai) (4.65.0)\n",
+ "Collecting aiohttp (from openai)\n",
+ " Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m38.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: PyYAML>=5.4.1 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0)\n",
+ "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.10)\n",
+ "Collecting async-timeout<5.0.0,>=4.0.0 (from langchain)\n",
+ " Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n",
+ "Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)\n",
+ " Downloading dataclasses_json-0.5.7-py3-none-any.whl (25 kB)\n",
+ "Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.8.4)\n",
+ "Requirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.22.4)\n",
+ "Collecting openapi-schema-pydantic<2.0,>=1.2 (from langchain)\n",
+ " Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m5.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.10.7)\n",
+ "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.2.2)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (23.1.0)\n",
+ "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (2.0.12)\n",
+ "Collecting multidict<7.0,>=4.5 (from aiohttp->openai)\n",
+ " Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.5/114.5 kB\u001b[0m \u001b[31m11.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting yarl<2.0,>=1.0 (from aiohttp->openai)\n",
+ " Downloading yarl-1.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (268 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m21.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting frozenlist>=1.1.1 (from aiohttp->openai)\n",
+ " Downloading frozenlist-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (149 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.6/149.6 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting aiosignal>=1.1.2 (from aiohttp->openai)\n",
+ " Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n",
+ "Collecting marshmallow<4.0.0,>=3.3.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)\n",
+ " Downloading marshmallow-3.19.0-py3-none-any.whl (49 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.1/49.1 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting marshmallow-enum<2.0.0,>=1.5.1 (from dataclasses-json<0.6.0,>=0.5.7->langchain)\n",
+ " Downloading marshmallow_enum-1.5.1-py2.py3-none-any.whl (4.2 kB)\n",
+ "Collecting typing-inspect>=0.4.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain)\n",
+ " Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
+ "Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2,>=1->langchain) (4.5.0)\n",
+ "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (1.26.15)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (2022.12.7)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (3.4)\n",
+ "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (2.0.2)\n",
+ "Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.10/dist-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (23.1)\n",
+ "Collecting mypy-extensions>=0.3.0 (from typing-inspect>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain)\n",
+ " Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
+ "Building wheels for collected packages: neo4j\n",
+ " Building wheel for neo4j (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for neo4j: filename=neo4j-5.8.1-py3-none-any.whl size=258701 sha256=a706d5cd9390e35eeaef28489bc4f9751940010945aeca57839f274b756e5295\n",
+ " Stored in directory: /root/.cache/pip/wheels/89/ee/2a/85f7b50c16580f09d88dcdbfd546db95d5b29b9967a3603ca1\n",
+ "Successfully built neo4j\n",
+ "Installing collected packages: neo4j, mypy-extensions, multidict, marshmallow, frozenlist, async-timeout, yarl, typing-inspect, openapi-schema-pydantic, marshmallow-enum, aiosignal, dataclasses-json, aiohttp, openai, langchain\n",
+ "Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 dataclasses-json-0.5.7 frozenlist-1.3.3 langchain-0.0.180 marshmallow-3.19.0 marshmallow-enum-1.5.1 multidict-6.0.4 mypy-extensions-1.0.0 neo4j-5.8.1 openai-0.27.7 openapi-schema-pydantic-1.2.4 typing-inspect-0.9.0 yarl-1.9.2\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install neo4j openai langchain"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# LangChain has added Cypher Search\n",
+ "## With the LangChain library, you can conveniently generate Cypher queries, enabling efficient retrieval of information from Neo4j.\n",
+ "\n",
+ "If you have developed or plan to implement any solution that uses Large Language Models, you have most likely heard of the LangChain library. LangChain is the most widely known Python library for developing applications that use LLMs in one capacity or another. It is designed to be modular, allowing us to use any LLM with any of the available modules, such as chains, tools, memory, or agents.\n",
+ "\n",
+ "A month ago, I spent a week researching and implementing a solution that allows anyone to retrieve information from Neo4j directly from the LangChain library and use it in their LLM applications. I learned quite a lot about the internals of the LangChain library and wrote up my experience in a blog post.\n",
+ "\n",
+ "A colleague of mine showed me a LangChain feature request asking that my work on retrieving information from a Neo4j database be added as a module directly to the LangChain library, so that no additional code or external modules would be needed to integrate Neo4j into LangChain applications. Since I was already familiar with LangChain's internals, I decided to try implementing Cypher search capabilities myself. I spent a weekend researching and coding the solution and ensuring it conformed to the contribution standards so that it could be added to the library. Luckily, the maintainers of LangChain are very responsive and open to new ideas, and Cypher Search has been added in the latest release of the library. Thanks to Harrison Chase for maintaining such a great library and also being very responsive to new ideas.\n",
+ "\n",
+ "In this tutorial, I will show you how to use the newly added Cypher Search in the LangChain library to retrieve information from a Neo4j database.\n",
+ "\n",
+ "## What is a knowledge graph\n",
+ "LangChain already has integrations with vector and SQL databases, so why do we need an integration with a graph database like Neo4j?\n",
+ "Knowledge graphs are ideal for storing heterogeneous and highly connected data. For instance, the above image contains information about people, organizations, movies, websites, etc. While the ability to intuitively model and store diverse sets of data is incredible, I think the main benefit of using graphs is the ability to analyze data points through their relationships. Graphs enable us to uncover connections and correlations that might otherwise remain unnoticed with traditional database and analytics approaches, as they often overlook the context surrounding individual data points.\n",
+ "\n",
+ "The power of graph databases truly shines when dealing with complex systems where interdependencies and interactions are vital to understanding the system.\n",
+ "They enable us to see beyond individual data points and delve into the intricate relationships that define their context. This provides a deeper, more holistic view of the data, facilitating better decision-making and knowledge discovery.\n",
+ "\n",
+ "## Setting up Neo4j environment\n",
+ "\n",
+ "If you have an existing Neo4j database, you can use it to try the newly added Cypher Search. The Cypher Search module uses graph schema information to generate Cypher statements, meaning you can plug it into any Neo4j database.\n",
+ "If you don't have any Neo4j database yet, you can use [Neo4j Sandbox](https://neo4j.com/sandbox/), which offers a free cloud instance of a Neo4j database. You need to register and instantiate any of the available pre-populated databases. I will be using the [ICIJ Paradise Papers dataset](https://sandbox.neo4j.com/?usecase=icij-paradise-papers) in this blog post, but you can use any other if you want. The dataset has been made available by the International Consortium of Investigative Journalists as part of their Offshore Leaks Database.\n",
+ "\n",
+ "The graph contains four types of nodes:\n",
+ "* Entity - The offshore legal entity. This could be a company, trust, foundation, or other legal entity created in a low-tax jurisdiction.\n",
+ "* Officer - A person or company who plays a role in an offshore entity, such as beneficiary, director, or shareholder. The relationships shown in the diagram are just a sample of all the existing ones.\n",
+ "* Intermediary - A go-between for someone seeking an offshore corporation and an offshore service provider - usually a law firm or a middleman that asks an offshore service provider to create an offshore firm.\n",
+ "* Address - The registered address as it appears in the original databases obtained by ICIJ.\n",
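+ "\n",
+ "As a quick sketch of how these node types connect (the relationship name here is the one that appears in the generated queries later in this notebook, so treat it as illustrative), you could rank intermediaries by the number of entities they are linked to directly in Cypher:\n",
+ "\n",
+ "```cypher\n",
+ "// Rank intermediaries by how many offshore entities they are linked to\n",
+ "MATCH (i:Intermediary)-[:CONNECTED_TO]->(e:Entity)\n",
+ "RETURN i.name AS intermediary, count(e) AS entities\n",
+ "ORDER BY entities DESC\n",
+ "LIMIT 5\n",
+ "```\n",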
+ "\n",
+ "## Knowledge Graph Cypher Search\n",
+ "The name Cypher Search comes from Cypher, which is a query language used to interact with graph databases like Neo4j.\n",
+ "\n",
+ "\n",
+ "\n",
+ "In order to allow LangChain to retrieve information from graph databases, I implemented a module that can convert the natural language to a Cypher statement, use it to retrieve data from Neo4j and return the retrieved information to the user in a natural language form. This two-way conversion process between natural language and database language not only enhances the overall accessibility of data retrieval but also greatly improves the user experience.\n",
+ "\n",
+ "\n",
+ "The beauty of the LangChain library is in its simplicity. We only need a couple of lines of code and we can retrieve information from Neo4j using natural language."
+ ],
+ "metadata": {
+ "id": "2wg51pF7LVVi"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain.chat_models import ChatOpenAI\n",
+ "from langchain.chains import GraphCypherQAChain\n",
+ "from langchain.graphs import Neo4jGraph\n",
+ "\n",
+ "graph = Neo4jGraph(\n",
+ " url=\"bolt://3.239.187.136:7687\", \n",
+ " username=\"neo4j\", \n",
+ " password=\"cushions-haul-revolution\"\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "hTz9REHt-XxO"
+ },
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ['OPENAI_API_KEY'] = \"sk-\"\n",
+ "\n",
+ "chain = GraphCypherQAChain.from_llm(\n",
+ " ChatOpenAI(temperature=0), graph=graph, verbose=True,\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "DkEz1NTBFrXu"
+ },
+ "execution_count": 3,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Here, we are using the gpt-3.5-turbo model from OpenAI to generate Cypher statements. The Cypher statements are generated based on the graph schema, which means that, in theory, you can plug the Cypher chain into any Neo4j instance, and it should be able to answer natural language questions. Unfortunately, I haven't yet tested other LLM providers' ability to generate Cypher statements since I don't have access to any of them. Still, I would love to hear your evaluation of other LLMs generating Cypher statements if you give it a go. Of course, if you want to break the dependency on LLM cloud providers, you can always [fine-tune an open-source LLM to generate Cypher statements](https://towardsdatascience.com/fine-tuning-an-llm-model-with-h2o-llm-studio-to-generate-cypher-statements-3f34822ad5).\n",
+ "\n",
+ "Let's start with a simple test."
+ ],
+ "metadata": {
+ "id": "UW50UoMUMRSg"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "Which intermediary is connected to the most entities?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 258
+ },
+ "id": "nblHLc6TJnLT",
+ "outputId": "5d54fa86-d442-4361-f24e-2ec068de16e2"
+ },
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (i:Intermediary)-[:CONNECTED_TO]->(e:Entity)\n",
+ "RETURN i.name, COUNT(e) AS num_entities\n",
+ "ORDER BY num_entities DESC\n",
+ "LIMIT 1\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'i.name': 'Group - SUN Capital Partner Group', 'num_entities': 115}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Based on the provided information, the intermediary that is connected to the most entities is the Group - SUN Capital Partner Group, with a total of 115 entities.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We can observe the generated Cypher statement and the retrieved information from Neo4j used to form the answer. That is as easy a setup as it gets. Let's move on to the next example."
+ ],
+ "metadata": {
+ "id": "QMHK94FyMYnV"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "Who are the officers of ZZZ-MILI COMPANY LTD.?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 222
+ },
+ "id": "_ia3wgn7JDg3",
+ "outputId": "d2fada74-3351-49a1-c2ad-ed4fbc9d2d8b"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (o:Officer)-[:OFFICER_OF]->(e:Entity{name:'ZZZ-MILI COMPANY LTD.'}) RETURN o.name\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'o.name': 'Whitson - Iva Marie'}, {'o.name': 'Whitson - Claud S.'}, {'o.name': 'Whitson - Claud S.'}, {'o.name': 'FINSBURY NOMINEES LTD'}, {'o.name': 'EURO NOMINEES LTD.'}, {'o.name': 'EURO SECURITIES LTD.'}, {'o.name': 'COMPANY DIRECTORS LTD.'}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'The officers of ZZZ-MILI COMPANY LTD. are Whitson - Iva Marie, Whitson - Claud S., FINSBURY NOMINEES LTD, EURO NOMINEES LTD., EURO SECURITIES LTD., and COMPANY DIRECTORS LTD.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Since we are using a graph, let's construct a question that would utilize the power of graph databases."
+ ],
+ "metadata": {
+ "id": "X04jYeaeMhZe"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "How are entities SOUTHWEST LAND DEVELOPMENT LTD. and Dragon Capital Markets Limited related?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "id": "IvGS4_c1CHQg"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The generated Cypher statement looks fine at first glance. However, there is a problem: the statement uses variable-length path-finding syntax and also treats relationships as undirected. As a result, this type of query is highly unoptimized and would explode in the number of rows.\n",
+ "\n",
+ "The nice thing about gpt-3.5-turbo is that it follows hints and instructions we drop in the input. For example, we can ask it to find only the shortest path."
+ ],
+ "metadata": {
+ "id": "W1Cqo3KrMoU1"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "How are entities SOUTHWEST LAND DEVELOPMENT LTD. and Dragon Capital Markets Limited connected?\n",
+ "Find a shortest path.\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 278
+ },
+ "id": "ZVQPdna0GaLP",
+ "outputId": "2e44963e-652d-44ce-ba6a-f33a2ab5b95e"
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (e1:Entity {name: 'SOUTHWEST LAND DEVELOPMENT LTD.'}), (e2:Entity {name: 'Dragon Capital Markets Limited'})\n",
+ "MATCH p=shortestPath((e1)-[*]-(e2))\n",
+ "RETURN p\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'p': [{'sourceID': 'Paradise Papers - Appleby', 'jurisdiction': 'KY', 'service_provider': 'Appleby', 'countries': 'Cayman Islands', 'jurisdiction_description': 'Cayman Islands', 'type': 'CE', 'valid_until': 'Appleby data is current through 2014', 'ibcRUC': '50843', 'labels(n)': '[\"Entity\"]', 'name': 'SOUTHWEST LAND DEVELOPMENT LTD.', 'country_codes': 'CYM', 'incorporation_date': '1993-Oct-05', 'node_id': '82011899'}, 'REGISTERED_ADDRESS', {'sourceID': 'Paradise Papers - Appleby', 'valid_until': 'Appleby data is current through 2014', 'address': 'Clifton House', 'labels(n)': '[\"Address\"]', 'name': 'Clifton House; 75 Fort Street; Grand Cayman KY1-1108; Cayman Islands', 'country_codes': 'CYM', 'countries': 'Cayman Islands', 'node_id': '81027146'}, 'REGISTERED_ADDRESS', {'sourceID': 'Paradise Papers - Appleby', 'jurisdiction': 'KY', 'service_provider': 'Appleby', 'countries': 'British Virgin Islands;Cayman Islands', 'jurisdiction_description': 'Cayman Islands', 'type': 'CE', 'valid_until': 'Appleby data is current through 2014', 'ibcRUC': '251645', 'labels(n)': '[\"Entity\"]', 'name': 'Dragon Capital Markets Limited', 'country_codes': 'VGB;CYM', 'incorporation_date': '1996-May-02', 'node_id': '82014099'}]}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'SOUTHWEST LAND DEVELOPMENT LTD. and Dragon Capital Markets Limited are connected through the fact that they were both serviced by Appleby in the Cayman Islands jurisdiction. A shortest path between them would be to follow the service provider Appleby and the jurisdiction of Cayman Islands.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now that we have dropped a hint that only the shortest path should be retrieved, we no longer run into cardinality explosion troubles. One thing I noticed, though, is that the LLM sometimes doesn't provide the best results when a path object is returned. We can fix that as well by instructing the model which information to use."
+ ],
+ "metadata": {
+ "id": "dKYeXyqnMrJm"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "How are entities SOUTHWEST LAND DEVELOPMENT LTD. and Dragon Capital Markets Limited connected?\n",
+ "Find a shortest path.\n",
+ "Return only name properties of nodes and relationship types\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 239
+ },
+ "id": "jpUGmHgzIF4i",
+ "outputId": "dfb66d71-7eca-4595-cc69-2fab4d033b62"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH path = shortestPath((e1:Entity {name: 'SOUTHWEST LAND DEVELOPMENT LTD.'})-[*]-(e2:Entity {name: 'Dragon Capital Markets Limited'}))\n",
+ "RETURN [n IN nodes(path) | n.name] AS Names, [r IN relationships(path) | type(r)] AS Relationships\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'Names': ['SOUTHWEST LAND DEVELOPMENT LTD.', 'Clifton House; 75 Fort Street; Grand Cayman KY1-1108; Cayman Islands', 'Dragon Capital Markets Limited'], 'Relationships': ['REGISTERED_ADDRESS', 'REGISTERED_ADDRESS']}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'The entities SOUTHWEST LAND DEVELOPMENT LTD. and Dragon Capital Markets Limited are connected through the relationship types REGISTERED_ADDRESS and REGISTERED_ADDRESS. A shortest path between them can be found by following these relationships. The name properties of the nodes along this path are SOUTHWEST LAND DEVELOPMENT LTD., Clifton House; 75 Fort Street; Grand Cayman KY1-1108; Cayman Islands, and Dragon Capital Markets Limited.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now we get a better and more appropriate response. The more hints you drop to an LLM, the better results you can expect. For example, you can also instruct it which relationships it can traverse."
+ ],
+ "metadata": {
+ "id": "i1ISIYQvMxEF"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "How are entities SOUTHWEST LAND DEVELOPMENT LTD. and Dragon Capital Markets Limited connected?\n",
+ "Find a shortest path and use only officer, intermediary, and connected relationships.\n",
+ "Return only name properties of nodes and relationship types\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 278
+ },
+ "id": "Sxhj6n7NT6bd",
+ "outputId": "6b431f95-986e-41bf-d13e-f5da5e2b8694"
+ },
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (e1:Entity {name: 'SOUTHWEST LAND DEVELOPMENT LTD.'}), (e2:Entity {name: 'Dragon Capital Markets Limited'})\n",
+ "MATCH p=shortestPath((e1)-[:OFFICER_OF|INTERMEDIARY_OF|CONNECTED_TO*]-(e2))\n",
+ "RETURN [n IN nodes(p) | n.name] AS Names, [r IN relationships(p) | type(r)] AS Relationships\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'Names': ['SOUTHWEST LAND DEVELOPMENT LTD.', 'Appleby Trust (Cayman) Ltd.', 'Dragon Capital Clean Development Investments Ltd.', 'Group - Dragon Capital', 'Dragon Capital Markets Limited'], 'Relationships': ['INTERMEDIARY_OF', 'INTERMEDIARY_OF', 'CONNECTED_TO', 'CONNECTED_TO']}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'SOUTHWEST LAND DEVELOPMENT LTD. and Dragon Capital Markets Limited are connected through intermediary and connected relationships. The shortest path between them is as follows: SOUTHWEST LAND DEVELOPMENT LTD. (INTERMEDIARY_OF) -> Appleby Trust (Cayman) Ltd. (INTERMEDIARY_OF) -> Dragon Capital Clean Development Investments Ltd. (CONNECTED_TO) -> Group - Dragon Capital (CONNECTED_TO) -> Dragon Capital Markets Limited.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Summary\n",
+ "Graph databases are an excellent tool for retrieving or analyzing the connections between various entities like people and organizations. In this blog post, we looked at a simple shortest path use case, where the number of relationships and the sequence of relationship types is unknown beforehand. These types of queries are virtually impossible in a vector database and could also be quite complicated in a SQL database.\n",
+ "I am thrilled about the addition of Cypher Search to the LangChain library. Please test it out, and let me know how it works for you, especially if you are testing it on other LLM models or have exciting use cases."
+ ],
+ "metadata": {
+ "id": "myeTTU1zM0b-"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "5mmBVoJiIr9K"
+ },
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/LLMs/src/langchain_neo4j_tips.ipynb b/LLMs/src/langchain_neo4j_tips.ipynb
new file mode 100644
index 0000000..4e64ad3
--- /dev/null
+++ b/LLMs/src/langchain_neo4j_tips.ipynb
@@ -0,0 +1,859 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyM4xOyaU8TfeD/8PDfwIi4J",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "SgGtmyZxDfCI",
+ "outputId": "1a5f5dcf-7052-4a6a-a604-a507bebe7159"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Requirement already satisfied: neo4j in /usr/local/lib/python3.10/dist-packages (5.9.0)\n",
+ "Requirement already satisfied: openai in /usr/local/lib/python3.10/dist-packages (0.27.7)\n",
+ "Requirement already satisfied: langchain in /usr/local/lib/python3.10/dist-packages (0.0.183)\n",
+ "Requirement already satisfied: pytz in /usr/local/lib/python3.10/dist-packages (from neo4j) (2022.7.1)\n",
+ "Requirement already satisfied: requests>=2.20 in /usr/local/lib/python3.10/dist-packages (from openai) (2.27.1)\n",
+ "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai) (4.65.0)\n",
+ "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from openai) (3.8.4)\n",
+ "Requirement already satisfied: PyYAML>=5.4.1 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0)\n",
+ "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.10)\n",
+ "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.2)\n",
+ "Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /usr/local/lib/python3.10/dist-packages (from langchain) (0.5.7)\n",
+ "Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.8.4)\n",
+ "Requirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.22.4)\n",
+ "Requirement already satisfied: openapi-schema-pydantic<2.0,>=1.2 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.2.4)\n",
+ "Requirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.10.7)\n",
+ "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.2.2)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (23.1.0)\n",
+ "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (2.0.12)\n",
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (6.0.4)\n",
+ "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (1.9.2)\n",
+ "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (1.3.3)\n",
+ "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai) (1.3.1)\n",
+ "Requirement already satisfied: marshmallow<4.0.0,>=3.3.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (3.19.0)\n",
+ "Requirement already satisfied: marshmallow-enum<2.0.0,>=1.5.1 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (1.5.1)\n",
+ "Requirement already satisfied: typing-inspect>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain) (0.9.0)\n",
+ "Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2,>=1->langchain) (4.5.0)\n",
+ "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (1.26.15)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (2022.12.7)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai) (3.4)\n",
+ "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (2.0.2)\n",
+ "Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.10/dist-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (23.1)\n",
+ "Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from typing-inspect>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (1.0.0)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install neo4j openai langchain"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# LangChain Cypher search: Tips & Tricks\n",
+ "## How to optimize prompts for Cypher statement generation to retrieve relevant information from Neo4j in your LLM applications\n",
+ "\n",
+ "Last time, we looked at how to get started with [Cypher Search in the LangChain](https://towardsdatascience.com/langchain-has-added-cypher-search-cb9d821120d5) library and why you would want to use knowledge graphs in your LLM applications. In this blog post, we will continue to explore various use cases for integrating knowledge graphs into LLM and LangChain applications. Along the way, you will learn how to improve prompts to produce better and more accurate Cypher statements.\n",
+ "\n",
+ "Specifically, we will look at how to use the few-shot capabilities of LLMs by providing a couple of Cypher statement examples, which can be used to specify which Cypher statements the LLM should produce, what the results should look like, and more. Additionally, you will learn how you can integrate graph algorithms from the Neo4j Graph Data Science library into your LangChain applications.\n",
+ "\n",
+ "## Neo4j environment setup\n",
+ "\n",
+ "In this blog post, we will be using the [Twitch dataset that is available in Neo4j Sandbox](https://sandbox.neo4j.com/?usecase=twitch).\n",
+ "\n",
+ "\n",
+ "\n",
+ "The Twitch social network consists of users. A small percentage of those users broadcast their gameplay or activities through live streams. In the graph model, users who do live streams are tagged with a secondary label Stream. The graph also captures which teams streamers belong to, which games they play on stream, and in which language they present their content. We also know how many followers they had at the moment of scraping, the all-time historical view count, and when they created their accounts. The most relevant information for network analysis is knowing which users engaged in the streamer's chat. You can distinguish whether the user who chatted in the stream was a regular user (CHATTER relationship), a moderator of the stream (MODERATOR relationship), or a stream VIP.\n",
+ "The network information was scraped between the 7th and the 10th of May 2021, so the dataset reflects that period rather than the present.\n",
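+ "\n",
+ "To get a feel for the model (the labels and relationship types here match the Cypher statements generated later in this notebook), a quick exploratory query might look like:\n",
+ "\n",
+ "```cypher\n",
+ "// Top games by the number of streamers broadcasting them\n",
+ "MATCH (s:Stream)-[:PLAYS]->(g:Game)\n",
+ "RETURN g.name AS game, count(s) AS streamers\n",
+ "ORDER BY streamers DESC\n",
+ "LIMIT 5\n",
+ "```\n",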
+ "## Improving LangChain Cypher search\n",
+ "First, we have to set up the LangChain Cypher search."
+ ],
+ "metadata": {
+ "id": "2lXfT6Luz2oV"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain.chat_models import ChatOpenAI\n",
+ "from langchain.chains import GraphCypherQAChain\n",
+ "from langchain.graphs import Neo4jGraph\n",
+ "\n",
+ "graph = Neo4jGraph(\n",
+ " url=\"bolt://44.212.12.199:7687\", \n",
+ " username=\"neo4j\", \n",
+ " password=\"buoy-warehouse-subordinates\"\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "x5oiK71aDhwm"
+ },
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ['OPENAI_API_KEY'] = \"OPENAI_API_KEY\"\n",
+ "\n",
+ "chain = GraphCypherQAChain.from_llm(\n",
+ " ChatOpenAI(temperature=0), graph=graph, verbose=True,\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "Wliyn4YWDrj0"
+ },
+ "execution_count": 3,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "I really love how easy it is to set up Cypher Search in the LangChain library. You only need to define the Neo4j and OpenAI credentials, and you are good to go. Under the hood, the graph object inspects the graph schema and passes it to the GraphCypherQAChain to construct accurate Cypher statements.\n",
+ "\n",
+ "Let's begin with a simple question."
+ ],
+ "metadata": {
+ "id": "rd2pXgxZ0Ovo"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "Which Fortnite streamer has the most followers?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 258
+ },
+ "id": "oaKOhoyKDvzj",
+ "outputId": "2953b317-196e-455a-a836-ce4965f2b34d"
+ },
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (s:Stream)-[:PLAYS]->(:Game {name: 'Fortnite'})\n",
+ "RETURN s.name, s.followers\n",
+ "ORDER BY s.followers DESC\n",
+ "LIMIT 1\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'s.name': 'thegrefg', 's.followers': 7269018}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'According to the provided information, the Fortnite streamer with the most followers is thegrefg, with a total of 7,269,018 followers.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The Cypher chain constructed a relevant Cypher statement, used it to retrieve information from Neo4j, and provided the answer in natural language form.\n",
+ "\n",
+ "Now let's ask another question."
+ ],
+ "metadata": {
+ "id": "RH7heaDW0RJP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"\"\"\n",
+ "Which Italian streamer has the most followers?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 296
+ },
+ "id": "4Cdzdk29E77u",
+ "outputId": "0eef44d4-1d19-4b22-f901-08f68881fb1c"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "WARNING:langchain.chat_models.openai:Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID ff0e12be427eab2e88d77430f44dc49a in your message.).\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (s:Stream)-[:HAS_LANGUAGE]->(:Language {name: 'Italian'})\n",
+ "RETURN s.name, s.followers\n",
+ "ORDER BY s.followers DESC\n",
+ "LIMIT 1\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "\"I'm sorry, but I cannot provide an answer to your question as no information has been provided. Please provide more details or a specific name to assist you better.\""
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The generated Cypher statement looks valid, but unfortunately, we didn't get any results. The problem is that the language values are stored as two-character language codes, and the LLM is unaware of that. We have a few options to overcome this problem. First, we can utilize the few-shot capabilities of LLMs by providing examples of Cypher statements, which the model then imitates when generating new Cypher statements. To add example Cypher statements to the prompt, we have to update the Cypher-generating prompt. You can take a look at the default prompt used to generate Cypher statements to better understand the update we are going to make."
+ ],
+ "metadata": {
+ "id": "X1_7O7MD0TYr"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# https://github.com/hwchase17/langchain/blob/master/langchain/chains/graph_qa/prompts.py\n",
+ "from langchain.prompts.prompt import PromptTemplate\n",
+ "\n",
+ "\n",
+ "CYPHER_GENERATION_TEMPLATE = \"\"\"Task:Generate Cypher statement to query a graph database.\n",
+ "Instructions:\n",
+ "Use only the provided relationship types and properties in the schema.\n",
+ "Do not use any other relationship types or properties that are not provided.\n",
+ "Schema:\n",
+ "{schema}\n",
+ "Cypher examples:\n",
+ "# How many streamers are from Norway?\n",
+ "MATCH (s:Stream)-[:HAS_LANGUAGE]->(:Language {{name: 'no'}})\n",
+ "RETURN count(s) AS streamers\n",
+ "\n",
+ "Note: Do not include any explanations or apologies in your responses.\n",
+ "Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.\n",
+ "Do not include any text except the generated Cypher statement.\n",
+ "\n",
+ "The question is:\n",
+ "{question}\"\"\"\n",
+ "CYPHER_GENERATION_PROMPT = PromptTemplate(\n",
+ " input_variables=[\"schema\", \"question\"], template=CYPHER_GENERATION_TEMPLATE\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "Wu8DkdjXF0Vf"
+ },
+ "execution_count": 6,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "If you compare the new Cypher-generating prompt to the default one, you can observe that we only added the Cypher examples section. The example shows the model that language values are given as two-character language codes. Now we can test the improved Cypher chain on the question about the most-followed Italian streamer."
+ ],
+ "metadata": {
+ "id": "2gR8nPcQ0VAM"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain_language_example = GraphCypherQAChain.from_llm(\n",
+ " ChatOpenAI(temperature=0), graph=graph, verbose=True,\n",
+ " cypher_prompt=CYPHER_GENERATION_PROMPT\n",
+ ")\n",
+ "\n",
+ "chain_language_example.run(\"\"\"\n",
+ "Which Italian streamer has the most followers?\n",
+ "\"\"\")\n"
+ ],
+ "metadata": {
+ "id": "uHqJDR43HFOf",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 276
+ },
+ "outputId": "1801e211-2250-4b57-b145-50aaba77edcb"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (s:Stream)-[:HAS_LANGUAGE]->(:Language {name: 'it'})\n",
+ "WHERE s.followers IS NOT NULL\n",
+ "RETURN s.name AS streamer, s.followers AS followers\n",
+ "ORDER BY followers DESC\n",
+ "LIMIT 1\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'streamer': 'pow3rtv', 'followers': 1530428}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'According to the provided information, the streamer with the most followers is pow3rtv, with a total of 1530428 followers.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The model is now aware that the languages are given as two-character country codes and can now accurately answer questions that use the language information.\n",
+ "\n",
+ "## Using graph algorithms to answer questions\n",
+ "In the previous blog post, we looked at how integrating graph databases into LLM applications can answer questions like how entities are connected by finding the shortest or other paths between them. Today we will look at another use cases where graph databases can be used in LLM applications that other databases struggle with, specifically how we can use graph algorithms like PageRank to provide relevant answers. For example, we can use personalized PageRank to provide recommendations to an end user at query time.\n",
+ "\n",
+ "Take a look at the following example:"
+ ],
+ "metadata": {
+ "id": "sjqCT1x00XXF"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain_language_example.run(\"\"\"\n",
+ "Which streamers should I also watch if I like pokimane?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 239
+ },
+ "id": "-fsAoE_MHvDq",
+ "outputId": "271397f0-c027-4e2c-a310-9ef39b1da9d1"
+ },
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (s1:Stream)-[:PLAYS]->(:Game {name: 'League of Legends'})<-[:PLAYS]-(s2:Stream)-[:CHATTER]->(s1)\n",
+ "WHERE s1.name = 'pokimane'\n",
+ "RETURN s2.name AS recommended_streamer\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Based on your interest in Pokimane, you may also enjoy watching other popular streamers such as Valkyrae, LilyPichu, and Fuslie. These streamers have similar content and personalities that may appeal to your interests.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Interestingly, every time we rerun this question, the model will generate a different Cypher statement. However, one thing is consistent. For some reason, every time the League of Legends is somehow included in the query.\n",
+ "\n",
+ "A bit more worrying fact is that the LLM model provided recommendations even though it wasn't provided with any suggestions in the prompt context. It's known that gpt-3.5-turbo sometimes doesn't follow the rules, especially if you do not repeat them more than once.\n",
+ "\n",
+ "Repeating the instruction three times can help gpt-3.5-turbo solve this problem. However, by repeating instructions, you are increasing the token count and consequently the cost of Cypher generation. Therefore, it would take some prompt engineering to get the best results using the lowest count of tokens.\n",
+ "\n",
+ "As mentioned, we will use Personalized PageRank to provide stream recommendations. But first, we need to project the in-memory graph and run the Node Similarity algorithm to prepare the graph to be able to give recommendations. Look at my [previous blog post](https://towardsdatascience.com/twitchverse-a-network-analysis-of-twitch-universe-using-neo4j-graph-data-science-d7218b4453ff) to learn more about how graph algorithms can be used to analyze the Twitch network."
+ ],
+ "metadata": {
+ "id": "Uui_hb0v0cCf"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Project in-memory graph\n",
+ "graph.query(\"\"\"\n",
+ "CALL gds.graph.project('shared-audience',\n",
+ " ['User', 'Stream'],\n",
+ " {CHATTER: {orientation:'REVERSE'}})\n",
+ "\"\"\")\n",
+ "\n",
+ "# Run node similarity algorithm\n",
+ "graph.query(\"\"\"\n",
+ "CALL gds.nodeSimilarity.mutate('shared-audience',\n",
+ " {similarityMetric: 'Jaccard',similarityCutoff:0.05, topK:10, sudo:true,\n",
+ " mutateProperty:'score', mutateRelationshipType:'SHARED_AUDIENCE'})\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "6NimpqImIXzU",
+ "outputId": "073caee8-2102-41cb-d5c9-caa4e6d4add1"
+ },
+ "execution_count": 9,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[{'preProcessingMillis': 0,\n",
+ " 'computeMillis': 109479,\n",
+ " 'mutateMillis': 39,\n",
+ " 'postProcessingMillis': -1,\n",
+ " 'nodesCompared': 4538,\n",
+ " 'relationshipsWritten': 23609,\n",
+ " 'similarityDistribution': {'p1': 0.05039477348327637,\n",
+ " 'max': 0.9291150569915771,\n",
+ " 'p5': 0.05223870277404785,\n",
+ " 'p90': 0.27272772789001465,\n",
+ " 'p50': 0.08695673942565918,\n",
+ " 'p95': 0.3424661159515381,\n",
+ " 'p10': 0.05494499206542969,\n",
+ " 'p75': 0.14691996574401855,\n",
+ " 'p99': 0.46153998374938965,\n",
+ " 'p25': 0.06399989128112793,\n",
+ " 'p100': 0.9291150569915771,\n",
+ " 'min': 0.04999995231628418,\n",
+ " 'mean': 0.1265612697302399,\n",
+ " 'stdDev': 0.09586148263128431},\n",
+ " 'configuration': {'topK': 10,\n",
+ " 'similarityMetric': 'JACCARD',\n",
+ " 'bottomK': 10,\n",
+ " 'bottomN': 0,\n",
+ " 'mutateRelationshipType': 'SHARED_AUDIENCE',\n",
+ " 'topN': 0,\n",
+ " 'concurrency': 4,\n",
+ " 'jobId': '78d599a8-ae8b-4a5d-8ccb-1471b7b6bbeb',\n",
+ " 'degreeCutoff': 1,\n",
+ " 'similarityCutoff': 0.05,\n",
+ " 'logProgress': True,\n",
+ " 'nodeLabels': ['*'],\n",
+ " 'sudo': True,\n",
+ " 'relationshipTypes': ['*'],\n",
+ " 'mutateProperty': 'score'}}]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ]
+ },
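+ {
+ "cell_type": "markdown",
+ "source": [
+ "The `gds.nodeSimilarity` call above scores pairs of streams by the Jaccard similarity of their audiences (the reversed CHATTER relationships). As a quick refresher on the metric itself, here is a minimal sketch with made-up user sets (the `audience_*` names are purely illustrative):\n",
+ "\n",
+ "```python\n",
+ "# Jaccard similarity: |intersection| / |union| of two sets.\n",
+ "def jaccard(a, b):\n",
+ "    return len(a & b) / len(a | b)\n",
+ "\n",
+ "audience_a = {\"u1\", \"u2\", \"u3\"}\n",
+ "audience_b = {\"u2\", \"u3\", \"u4\"}\n",
+ "print(jaccard(audience_a, audience_b))  # -> 0.5\n",
+ "```\n",
+ "\n",
+ "With `similarityCutoff: 0.05`, only stream pairs whose audiences overlap at least that much get a SHARED_AUDIENCE relationship."
+ ],
+ "metadata": {}
+ },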
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The node similarity algorithm will take about 30 seconds to complete as the database has almost five million users. The Cypher statement to provide recommendations using Personalized PageRank is the following:"
+ ],
+ "metadata": {
+ "id": "LFCvIpyW0o2J"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "graph.query(\"\"\"\n",
+ "MATCH (s:Stream)\n",
+ "WHERE s.name = \"kimdoe\"\n",
+ "WITH collect(s) AS sourceNodes\n",
+ "CALL gds.pageRank.stream(\"shared-audience\", \n",
+ " {sourceNodes:sourceNodes, relationshipTypes:['SHARED_AUDIENCE'], \n",
+ " nodeLabels:['Stream']})\n",
+ "YIELD nodeId, score\n",
+ "WITH gds.util.asNode(nodeId) AS node, score\n",
+ "WHERE NOT node in sourceNodes\n",
+ "RETURN node.name AS streamer, score\n",
+ "ORDER BY score DESC LIMIT 3\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "hmoSjlwsquPi",
+ "outputId": "a35e7e2b-9016-42f3-8e1d-ce97d0282e86"
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[{'streamer': 'tranth', 'score': 0.13697276805472164},\n",
+ " {'streamer': 'jungtaejune', 'score': 0.13697276805472164},\n",
+ " {'streamer': 'hanryang1125', 'score': 0.1051181893540686}]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ]
+ },
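+ {
+ "cell_type": "markdown",
+ "source": [
+ "To build intuition for what the Personalized PageRank call above computes, here is a toy power-iteration sketch in pure Python (the graph and streamer names are made up; GDS implements the same idea at scale). The key difference from plain PageRank is that the teleport probability mass flows back to the source nodes instead of being spread uniformly, which biases the ranking toward the source's neighborhood:\n",
+ "\n",
+ "```python\n",
+ "# Toy personalized PageRank via power iteration (illustrative only).\n",
+ "graph = {\"kimdoe\": [\"tranth\", \"hanryang1125\"],\n",
+ "         \"tranth\": [\"kimdoe\"],\n",
+ "         \"hanryang1125\": [\"tranth\"],\n",
+ "         \"other\": [\"other2\"],\n",
+ "         \"other2\": [\"other\"]}\n",
+ "d = 0.85                     # damping factor\n",
+ "source = {\"kimdoe\"}          # personalization set\n",
+ "rank = {n: (1.0 if n in source else 0.0) for n in graph}\n",
+ "for _ in range(50):\n",
+ "    # Teleport mass goes only to the source nodes.\n",
+ "    new = {n: ((1 - d) / len(source) if n in source else 0.0) for n in graph}\n",
+ "    for n, targets in graph.items():\n",
+ "        for t in targets:\n",
+ "            new[t] += d * rank[n] / len(targets)\n",
+ "    rank = new\n",
+ "# Nodes reachable from the source outrank the disconnected component.\n",
+ "print(sorted(rank, key=rank.get, reverse=True))\n",
+ "```\n",
+ "\n",
+ "The real query additionally filters out the source nodes themselves before returning the top results."
+ ],
+ "metadata": {}
+ },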
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The OpenAI LLMs could be better at using the Graph Data Science library as their knowledge cutoff is September 2021, and version 2 of the Graph Data Science library was released in April 2022. Therefore, we need to provide another example in the prompt to show the LLM show to use Personalized PageRank to give recommendations."
+ ],
+ "metadata": {
+ "id": "VlfPC50_0rBU"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# https://github.com/hwchase17/langchain/blob/master/langchain/chains/graph_qa/prompts.py\n",
+ "\n",
+ "CYPHER_RECOMMENDATION_TEMPLATE = \"\"\"Task:Generate Cypher statement to query a graph database.\n",
+ "Instructions:\n",
+ "Use only the provided relationship types and properties in the schema.\n",
+ "Do not use any other relationship types or properties that are not provided.\n",
+ "Schema:\n",
+ "{schema}\n",
+ "Cypher examples:\n",
+ "# How many streamers are from Norway?\n",
+ "MATCH (s:Stream)-[:HAS_LANGUAGE]->(:Language {{name: 'no'}})\n",
+ "RETURN count(s) AS streamers\n",
+ "# Which streamers do you recommend if I like kimdoe?\n",
+ "MATCH (s:Stream)\n",
+ "WHERE s.name = \"kimdoe\"\n",
+ "WITH collect(s) AS sourceNodes\n",
+ "CALL gds.pageRank.stream(\"shared-audience\", \n",
+ " {{sourceNodes:sourceNodes, relationshipTypes:['SHARED_AUDIENCE'], \n",
+ " nodeLabels:['Stream']}})\n",
+ "YIELD nodeId, score\n",
+ "WITH gds.util.asNode(nodeId) AS node, score\n",
+ "WHERE NOT node in sourceNodes\n",
+ "RETURN node.name AS streamer, score\n",
+ "ORDER BY score DESC LIMIT 3\n",
+ "\n",
+ "Note: Do not include any explanations or apologies in your responses.\n",
+ "Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.\n",
+ "Do not include any text except the generated Cypher statement.\n",
+ "\n",
+ "The question is:\n",
+ "{question}\"\"\"\n",
+ "CYPHER_RECOMMENDATION_PROMPT = PromptTemplate(\n",
+ " input_variables=[\"schema\", \"question\"], template=CYPHER_RECOMMENDATION_TEMPLATE\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "ZzgTn65jJXvF"
+ },
+ "execution_count": 11,
+ "outputs": []
+ },
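+ {
+ "cell_type": "markdown",
+ "source": [
+ "As an aside, note the doubled curly braces in the example Cypher (e.g. `{{name: 'no'}}`). LangChain's `PromptTemplate` uses Python `str.format`-style substitution, so literal braces in the examples must be escaped by doubling them, while `{schema}` and `{question}` remain substitution slots. A minimal sketch using only the standard library (the query fragment is illustrative):\n",
+ "\n",
+ "```python\n",
+ "# Doubled braces survive formatting as literal braces;\n",
+ "# single-brace placeholders are substituted.\n",
+ "template = \"MATCH (:Language {{name: 'no'}}) RETURN {question}\"\n",
+ "print(template.format(question=\"count(*)\"))\n",
+ "# -> MATCH (:Language {name: 'no'}) RETURN count(*)\n",
+ "```"
+ ],
+ "metadata": {}
+ },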
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We can now test the Personalized PageRank recommendations."
+ ],
+ "metadata": {
+ "id": "KaSQI5me0s5f"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain_recommendation_example = GraphCypherQAChain.from_llm(\n",
+ " ChatOpenAI(temperature=0, model_name='gpt-4'), graph=graph, verbose=True,\n",
+ " cypher_prompt=CYPHER_RECOMMENDATION_PROMPT, \n",
+ ")\n",
+ "\n",
+ "chain_recommendation_example.run(\"\"\"\n",
+ "Which streamers do you recommend if I like pokimane?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "id": "r14bQBlaL03n",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 405
+ },
+ "outputId": "25bef31a-21b2-44e6-b09a-0e60ea97bec6"
+ },
+ "execution_count": 12,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (s:Stream)\n",
+ "WHERE s.name = \"pokimane\"\n",
+ "WITH collect(s) AS sourceNodes\n",
+ "CALL gds.pageRank.stream(\"shared-audience\", \n",
+ " {sourceNodes:sourceNodes, relationshipTypes:['SHARED_AUDIENCE'], \n",
+ " nodeLabels:['Stream']})\n",
+ "YIELD nodeId, score\n",
+ "WITH gds.util.asNode(nodeId) AS node, score\n",
+ "WHERE NOT node in sourceNodes\n",
+ "RETURN node.name AS streamer, score\n",
+ "ORDER BY score DESC LIMIT 3\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'streamer': 'xchocobars', 'score': 0.2343657053097286}, {'streamer': 'ariasaki', 'score': 0.06485239618458194}, {'streamer': 'natsumiii', 'score': 0.05969369486512491}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "\"Based on the information provided, I recommend checking out the following streamers if you like Pokimane:\\n\\n1. xchocobars with a score of 0.2344\\n2. ariasaki with a score of 0.0649\\n3. natsumiii with a score of 0.0597\\n\\nThese scores indicate their similarity to Pokimane's content and style. Enjoy watching!\""
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 12
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Unfortunately, here, we have to use the gpt-4 model as the gpt-3.5-turbo is stubborn and doesn't want to imitate the complex Personalized PageRank example.\n",
+ "\n",
+ "We can also test if the gpt-4 model will decide to generalize the Personalized PageRank recommendation in other use cases."
+ ],
+ "metadata": {
+ "id": "F6BOSdmG0u8W"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain_recommendation_example.run(\"\"\"\n",
+ "Which streamers do you recommend to watch if I like Chess games?\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 278
+ },
+ "id": "ZvWNlpFIMJwj",
+ "outputId": "15f6edae-2938-4ae2-aa9c-d1ad3d78ddbb"
+ },
+ "execution_count": 13,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (s:Stream)-[:PLAYS]->(:Game {name: 'Chess'})\n",
+ "RETURN s.name AS streamer\n",
+ "ORDER BY s.followers DESC LIMIT 10\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'streamer': 'gmhikaru'}, {'streamer': 'thisisnotgeorgenotfound'}, {'streamer': 'gothamchess'}, {'streamer': 'mates'}, {'streamer': 'akanemsko'}, {'streamer': 'xntentacion'}, {'streamer': 'chessbrah'}, {'streamer': 'inet_saju'}, {'streamer': 'annacramling'}, {'streamer': 'michelleputtini'}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'I recommend the following chess streamers for you to watch: gmhikaru, thisisnotgeorgenotfound, gothamchess, mates, akanemsko, xntentacion, chessbrah, inet_saju, annacramling, and michelleputtini. Enjoy watching their chess games!'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The LLM decided to take a more straightforward route to provide recommendations and simply returned the three chess players with the highest follower count. We can't really blame it for choosing this option.\n",
+ "\n",
+ "However, LLMs are quite good at listening to hints:"
+ ],
+ "metadata": {
+ "id": "72O2RJqR0xtE"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain_recommendation_example.run(\"\"\"\n",
+ "Which streamers do you recommend to watch if I like Chess games?\n",
+ "Use Personalized PageRank to provide recommendations.\n",
+ "Do not exclude sourceNodes in the answer\n",
+ "\"\"\")"
+ ],
+ "metadata": {
+ "id": "ys23tsgfMYRp",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 368
+ },
+ "outputId": "672cd445-cd6e-464d-cb4e-557a81f638b5"
+ },
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3mMATCH (s:Stream)-[:PLAYS]->(:Game {name: 'Chess'})\n",
+ "WITH collect(s) AS sourceNodes\n",
+ "CALL gds.pageRank.stream(\"shared-audience\", \n",
+ " {sourceNodes:sourceNodes, relationshipTypes:['SHARED_AUDIENCE'], \n",
+ " nodeLabels:['Stream']})\n",
+ "YIELD nodeId, score\n",
+ "WITH gds.util.asNode(nodeId) AS node, score\n",
+ "RETURN node.name AS streamer, score\n",
+ "ORDER BY score DESC LIMIT 3\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'streamer': 'segonaye', 'score': 1.1104359051332637}, {'streamer': 'dafatw01', 'score': 0.978557815808113}, {'streamer': 'chessbrah', 'score': 0.9612404689154856}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'Based on the Personalized PageRank algorithm, I recommend the following streamers for watching Chess games:\\n\\n1. segonaye with a score of 1.1104359051332637\\n2. dafatw01 with a score of 0.978557815808113\\n3. chessbrah with a score of 0.9612404689154856\\n\\nThese streamers have been ranked according to their relevance to Chess games. Enjoy watching!'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Summary\n",
+ "In this blog post, we expanded on using knowledge graphs in LangChain applications, focusing on improving prompts for better Cypher statements. The main opportunity to improve the Cypher generation accuracy is to use the few-shot capabilities of LLMs, offering Cypher statement examples that dictate the type of statements an LLM should produce. Sometimes, the LLM model doesn't correctly guess the property values, while other times, it doesn't provide the Cypher statements we would like it to generate. Additionally, we have looked at how we can use graph algorithms like Personalized PageRank in LLM applications to provide better and more relevant answers."
+ ],
+ "metadata": {
+ "id": "vD3m_ovB01_3"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "zJvtuXlvuaon"
+ },
+ "execution_count": 14,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
diff --git a/LLMs/src/langchain_neo4j_vertexai.ipynb b/LLMs/src/langchain_neo4j_vertexai.ipynb
new file mode 100644
index 0000000..328ee8e
--- /dev/null
+++ b/LLMs/src/langchain_neo4j_vertexai.ipynb
@@ -0,0 +1,582 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyNOOwWqANx97QWxWt3GRMrG",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "id": "ISol2e_-PGJK",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "a57d5819-205c-4841-d1d2-bd295016c581"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.6/2.6 MB\u001b[0m \u001b[31m25.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m27.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m188.5/188.5 kB\u001b[0m \u001b[31m12.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
+ " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
+ " Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n",
+ " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m321.3/321.3 kB\u001b[0m \u001b[31m29.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m78.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m68.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m9.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.5/114.5 kB\u001b[0m \u001b[31m10.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m25.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.6/149.6 kB\u001b[0m \u001b[31m16.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.1/49.1 kB\u001b[0m \u001b[31m5.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25h Building wheel for neo4j (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install --quiet --upgrade google-cloud-aiplatform langchain neo4j"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import IPython\n",
+ "app = IPython.Application.instance()\n",
+ "app.kernel.do_shutdown(True)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "r_D_9En9ceMe",
+ "outputId": "9be7a2c5-79cc-40ce-b857-ede1372fb4b5"
+ },
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "{'status': 'ok', 'restart': True}"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 2
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain.graphs import Neo4jGraph\n",
+ "\n",
+ "graph = Neo4jGraph(\n",
+ " url=\"bolt://34.239.133.123:7687\",\n",
+ " username=\"neo4j\",\n",
+ " password=\"traps-henry-milliliters\"\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "WedomuMuPT91"
+ },
+ "execution_count": 1,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from google.colab import files\n",
+ "\n",
+ "uploaded = files.upload()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 73
+ },
+ "id": "7zgoRpLPb1Kv",
+ "outputId": "45c6238f-b6b1-4305-8a1d-8a3c35d983c0"
+ },
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "text/html": [
+ "\n",
+ " \n",
+ " \n",
+ " "
+ ]
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Saving credentials.json to credentials.json\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import os\n",
+ "\n",
+ "from langchain.chat_models import ChatVertexAI\n",
+ "from langchain.chains import GraphCypherQAChain\n",
+ "\n",
+ "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = \"credentials.json\""
+ ],
+ "metadata": {
+ "id": "Ax6qjSxIPMy2"
+ },
+ "execution_count": 3,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain.chains import GraphCypherQAChain\n",
+ "\n",
+ "chain = GraphCypherQAChain.from_llm(\n",
+ " ChatVertexAI(temperature=0, model_name=\"codechat-bison\"), graph=graph, verbose=True,\n",
+ ")\n",
+ "\n",
+ "chain.run(\"Who is the most followed fortnite streamer?\")"
+ ],
+ "metadata": {
+ "id": "8uYg8LhCQRU9",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 261
+ },
+ "outputId": "5b2a7617-c132-4402-b2cc-73a5a618ad29"
+ },
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3m\n",
+ "MATCH (s:Stream)-[:PLAYS]->(g:Game)\n",
+ "WHERE g.name = \"Fortnite\"\n",
+ "RETURN s.followers ORDER BY s.followers DESC LIMIT 1\n",
+ "\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'s.followers': 7269018}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'The most followed Fortnite streamer is Ninja.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"Please recommend a good chess stream?\")"
+ ],
+ "metadata": {
+ "id": "eDEYAakNe21i",
+ "outputId": "396acdbc-1a42-422d-ac9d-a901b58e9176",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 296
+ }
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3m\n",
+ "MATCH (s:Stream)-[:PLAYS]->(g:Game)\n",
+ "WHERE g.name = \"Chess\"\n",
+ "RETURN s\n",
+ "ORDER BY s.total_view_count DESC\n",
+ "LIMIT 1\n",
+ "\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'s': {'followers': 8772, 'name': 'imsatranc', 'url': 'https://www.twitch.tv/imsatranc'}}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'I recommend imsatranc. He has 8772 followers and his stream is at https://www.twitch.tv/imsatranc.'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"Which are the top five games played by streamers?\")"
+ ],
+ "metadata": {
+ "id": "k_V1nduwfPfk",
+ "outputId": "b1591717-bd2a-448a-e6be-23160f269228",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 316
+ }
+ },
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3m\n",
+ "MATCH (s:Stream)-[:PLAYS]->(g:Game)\n",
+ "RETURN g.name AS game,\n",
+ " COUNT(*) AS number_of_streams\n",
+ "ORDER BY number_of_streams DESC\n",
+ "LIMIT 5;\n",
+ "\u001b[0m\n",
+ "Full Context:\n",
+ "\u001b[32;1m\u001b[1;3m[{'game': 'Just Chatting', 'number_of_streams': 868}, {'game': 'Resident Evil Village', 'number_of_streams': 442}, {'game': 'Grand Theft Auto V', 'number_of_streams': 380}, {'game': 'League of Legends', 'number_of_streams': 279}, {'game': 'Fortnite', 'number_of_streams': 217}]\u001b[0m\n",
+ "\n",
+ "\u001b[1m> Finished chain.\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'The top five games played by streamers are:\\n\\n1. Just Chatting\\n2. Resident Evil Village\\n3. Grand Theft Auto V\\n4. League of Legends\\n5. Fortnite'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "chain.run(\"Which streams are the most important based on PageRank?\")"
+ ],
+ "metadata": {
+ "id": "th3WtJwofgQU",
+ "outputId": "21ba5792-93a9-431b-aaa9-42201ffca095",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 814
+ }
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "\n",
+ "\u001b[1m> Entering new chain...\u001b[0m\n",
+ "Generated Cypher:\n",
+ "\u001b[32;1m\u001b[1;3m\n",
+ "MATCH (s:Stream)\n",
+ "WITH s AS stream\n",
+ "CALL pageRank(\n",
+ " startNode: stream,\n",
+ " maxIterations: 10,\n",
+ " dampingFactor: 0.85\n",
+ ")\n",
+ "RETURN stream,\n",
+ " rnk AS pageRank\n",
+ "ORDER BY rnk DESC\n",
+ "LIMIT 10\n",
+ "\u001b[0m\n"
+ ]
+ },
+ {
+ "output_type": "error",
+ "ename": "ValueError",
+ "evalue": "ignored",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mCypherSyntaxError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/graphs/neo4j_graph.py\u001b[0m in \u001b[0;36mquery\u001b[0;34m(self, query, params)\u001b[0m\n\u001b[1;32m 80\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 81\u001b[0;31m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msession\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 82\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mr\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/neo4j/_sync/work/session.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, query, parameters, **kwargs)\u001b[0m\n\u001b[1;32m 310\u001b[0m \u001b[0mparameters\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparameters\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 311\u001b[0;31m self._auto_result._run(\n\u001b[0m\u001b[1;32m 312\u001b[0m \u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparameters\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_config\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdatabase\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/neo4j/_sync/work/result.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self, query, parameters, db, imp_user, access_mode, bookmarks, notifications_min_severity, notifications_disabled_categories)\u001b[0m\n\u001b[1;32m 165\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_connection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend_all\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 166\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_attach\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 167\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/neo4j/_sync/work/result.py\u001b[0m in \u001b[0;36m_attach\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 273\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_attached\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 274\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_connection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfetch_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 275\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_common.py\u001b[0m in \u001b[0;36minner\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 179\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 180\u001b[0;31m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 181\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mNeo4jError\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mServiceUnavailable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mSessionExpired\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_bolt.py\u001b[0m in \u001b[0;36mfetch_message\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 807\u001b[0m )\n\u001b[0;32m--> 808\u001b[0;31m \u001b[0mres\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_process_message\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtag\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfields\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 809\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0midle_since\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mperf_counter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_bolt5.py\u001b[0m in \u001b[0;36m_process_message\u001b[0;34m(self, tag, fields)\u001b[0m\n\u001b[1;32m 360\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 361\u001b[0;31m \u001b[0mresponse\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mon_failure\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msummary_metadata\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 362\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mServiceUnavailable\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mDatabaseUnavailable\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/neo4j/_sync/io/_common.py\u001b[0m in \u001b[0;36mon_failure\u001b[0;34m(self, metadata)\u001b[0m\n\u001b[1;32m 246\u001b[0m \u001b[0mUtil\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcallback\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhandler\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 247\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mNeo4jError\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhydrate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mmetadata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 248\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mCypherSyntaxError\u001b[0m: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '10': expected \"%\", \"(\", \"YIELD\" or an identifier (line 6, column 20 (offset: 92))\n\" maxIterations: 10,\"\n ^}",
+ "\nDuring handling of the above exception, another exception occurred:\n",
+ "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mchain\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Which streams are the most important based on PageRank?\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/chains/base.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, callbacks, tags, *args, **kwargs)\u001b[0m\n\u001b[1;32m 288\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 289\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"`run` supports only one positional argument.\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 290\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcallbacks\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcallbacks\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtags\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtags\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0m_output_key\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 291\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 292\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mkwargs\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/chains/base.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks, tags, include_run_info)\u001b[0m\n\u001b[1;32m 164\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mKeyboardInterrupt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 165\u001b[0m \u001b[0mrun_manager\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mon_chain_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 166\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 167\u001b[0m \u001b[0mrun_manager\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mon_chain_end\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 168\u001b[0m final_outputs: Dict[str, Any] = self.prep_outputs(\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/chains/base.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks, tags, include_run_info)\u001b[0m\n\u001b[1;32m 158\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 159\u001b[0m outputs = (\n\u001b[0;32m--> 160\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrun_manager\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mrun_manager\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 161\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mnew_arg_supported\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/chains/graph_qa/cypher.py\u001b[0m in \u001b[0;36m_call\u001b[0;34m(self, inputs, run_manager)\u001b[0m\n\u001b[1;32m 110\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 111\u001b[0m \u001b[0;31m# Retrieve and limit the number of results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 112\u001b[0;31m \u001b[0mcontext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgraph\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgenerated_cypher\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtop_k\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 113\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 114\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreturn_direct\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/langchain/graphs/neo4j_graph.py\u001b[0m in \u001b[0;36mquery\u001b[0;34m(self, query, params)\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mr\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 83\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mCypherSyntaxError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 84\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Generated Cypher Statement is not valid\\n\"\u001b[0m \u001b[0;34mf\"{e}\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 85\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 86\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrefresh_schema\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mValueError\u001b[0m: Generated Cypher Statement is not valid\n{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '10': expected \"%\", \"(\", \"YIELD\" or an identifier (line 6, column 20 (offset: 92))\n\" maxIterations: 10,\"\n ^}"
+ ]
+ }
+ ]
+ }
+ ]
+}
\ No newline at end of file