diff --git a/README.md b/README.md index 89bfb1d8b..0dcd63121 100644 --- a/README.md +++ b/README.md @@ -43,24 +43,24 @@ TruLens supports the evaluation of tracking for any LLM app framework. Choose a **Langchain** -[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) +[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) -[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). +[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). **Llama-Index** -[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) +[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) -[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) +[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) **No Framework** -[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) +[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) -[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) +[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) ## TruLens-Explain diff --git a/docs/trulens_eval/gh_top_intro.md b/docs/trulens_eval/gh_top_intro.md index 881d9c59b..0dff33007 100644 --- a/docs/trulens_eval/gh_top_intro.md +++ b/docs/trulens_eval/gh_top_intro.md @@ -43,21 +43,21 @@ TruLens supports the evaluation of tracking for any LLM app framework. Choose a **Langchain** -[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) +[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) -[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). +[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). **Llama-Index** -[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) +[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) -[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) +[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) **No Framework** -[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) +[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) -[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) +[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) diff --git a/docs/trulens_eval/intro.md b/docs/trulens_eval/intro.md index 26731ebf5..9368300bf 100644 --- a/docs/trulens_eval/intro.md +++ b/docs/trulens_eval/intro.md @@ -46,24 +46,24 @@ TruLens supports the evaluation of tracking for any LLM app framework. Choose a **Langchain** -[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) +[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) -[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). +[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). **Llama-Index** -[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) +[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) -[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) +[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) **No Framework** -[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) +[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) -[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) +[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) ### 💡 Contributing diff --git a/trulens_eval/README.md b/trulens_eval/README.md index 26731ebf5..9368300bf 100644 --- a/trulens_eval/README.md +++ b/trulens_eval/README.md @@ -46,24 +46,24 @@ TruLens supports the evaluation of tracking for any LLM app framework. Choose a **Langchain** -[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) +[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/langchain_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb) -[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). +[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py). **Llama-Index** -[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) +[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb) -[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) +[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py) **No Framework** -[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) +[text2text_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/text2text_quickstart.ipynb). +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb) -[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.15.3/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) +[text2text_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.16.0/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py) ### 💡 Contributing diff --git a/trulens_eval/examples/expositional/vector-dbs/pinecone/pinecone_evals_build_better_rags.ipynb b/trulens_eval/examples/expositional/vector-dbs/pinecone/pinecone_evals_build_better_rags.ipynb index f4aa8a0ea..540e73a9c 100644 --- a/trulens_eval/examples/expositional/vector-dbs/pinecone/pinecone_evals_build_better_rags.ipynb +++ b/trulens_eval/examples/expositional/vector-dbs/pinecone/pinecone_evals_build_better_rags.ipynb @@ -41,7 +41,7 @@ "metadata": {}, "outputs": [], "source": [ - "!pip install -qU trulens-eval==0.15.3 langchain==0.0.315 openai==0.28.1 tiktoken==0.5.1 \"pinecone-client[grpc]==2.2.4\" pinecone-datasets==0.5.1 datasets==2.14.5" + "!pip install -qU trulens-eval==0.16.0 langchain==0.0.315 openai==0.28.1 tiktoken==0.5.1 \"pinecone-client[grpc]==2.2.4\" pinecone-datasets==0.5.1 datasets==2.14.5" ] }, { diff --git a/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb b/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb index 8b50e642b..b895c9bb5 100644 --- a/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb +++ b/trulens_eval/examples/quickstart/colab/langchain_quickstart_colab.ipynb @@ -17,7 +17,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "9006f7e5", + "id": "43c40f8c", "metadata": {}, "source": [ "# Langchain Quickstart\n", @@ -30,17 +30,17 @@ { "cell_type": "code", "execution_count": null, - "id": "e7a4cb73", + "id": "7252cea6", "metadata": {}, "outputs": [], "source": [ - "# ! pip install trulens_eval==0.15.3 langchain>=0.0.263" + "# ! pip install trulens_eval==0.16.0 langchain>=0.0.263" ] }, { "attachments": {}, "cell_type": "markdown", - "id": "4ab38498", + "id": "c9cad9a4", "metadata": {}, "source": [ "## Setup\n", @@ -51,7 +51,7 @@ { "cell_type": "code", "execution_count": null, - "id": "568b10ef", + "id": "7f1ac901", "metadata": {}, "outputs": [], "source": [ @@ -63,7 +63,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "530f1fb0", + "id": "e32c71ae", "metadata": {}, "source": [ "### Import from LangChain and TruLens" @@ -72,7 +72,7 @@ { "cell_type": "code", "execution_count": null, - "id": "7171031d", + "id": "837c1cbd", "metadata": {}, "outputs": [], "source": [ @@ -95,7 +95,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "89cc5a91", + "id": "29e890d1", "metadata": {}, "source": [ "### Create Simple LLM Application\n", @@ -106,7 +106,7 @@ { "cell_type": "code", "execution_count": null, - "id": "45e6a3db", + "id": "0841bc6a", "metadata": {}, "outputs": [], "source": [ @@ -128,7 +128,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "ba6384b0", + "id": "03d83080", "metadata": {}, "source": [ "### Send your first request" @@ -137,7 +137,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c78eba16", + "id": "0b0164c8", "metadata": {}, "outputs": [], "source": [ @@ -147,7 +147,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b644712a", + "id": "e35d82cc", "metadata": {}, "outputs": [], "source": [ @@ -159,7 +159,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "c1fd9e81", + "id": "91eb2148", "metadata": {}, "source": [ "## Initialize Feedback Function(s)" @@ -168,7 +168,7 @@ { "cell_type": "code", "execution_count": null, - "id": "31cef648", + "id": "75262341", "metadata": {}, "outputs": [], "source": [ @@ -184,7 +184,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "1741be4e", + "id": "f41ccf78", "metadata": {}, "source": [ "## Instrument chain for logging with TruLens" @@ -193,7 +193,7 @@ { "cell_type": "code", "execution_count": null, - "id": "7c6acdf0", + "id": "16aa7a47", "metadata": {}, "outputs": [], "source": [ @@ -205,7 +205,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f44c9977", + "id": "56ab4066", "metadata": {}, "outputs": [], "source": [ @@ -217,7 +217,7 @@ }, { "cell_type": "markdown", - "id": "81d7e1c6", + "id": "ac14bb88", "metadata": {}, "source": [ "## Retrieve records and feedback" @@ -226,7 +226,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e71c86ef", + "id": "6b227bd2", "metadata": {}, "outputs": [], "source": [ @@ -241,7 +241,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c68972e3", + "id": "9bd6dc55", "metadata": {}, "outputs": [], "source": [ @@ -263,7 +263,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "5e92c52f", + "id": "a271adc2", "metadata": {}, "source": [ "## Explore in a Dashboard" @@ -272,7 +272,7 @@ { "cell_type": "code", "execution_count": null, - "id": "427b551a", + "id": "b0f2a3b1", "metadata": {}, "outputs": [], "source": [ @@ -284,7 +284,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "4eb451d6", + "id": "d59c64a2", "metadata": {}, "source": [ "Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard." @@ -293,7 +293,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "2d79a878", + "id": "6249580d", "metadata": {}, "source": [ "### Chain Leaderboard\n", @@ -326,7 +326,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "2cc19084", + "id": "e6d8282a", "metadata": {}, "source": [ "Note: Feedback functions evaluated in the deferred manner can be seen in the \"Progress\" page of the TruLens dashboard." @@ -335,7 +335,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "1c499778", + "id": "def6bf2c", "metadata": {}, "source": [ "## Or view results directly in your notebook" @@ -344,7 +344,7 @@ { "cell_type": "code", "execution_count": null, - "id": "004879bf", + "id": "8ea1b983", "metadata": {}, "outputs": [], "source": [ diff --git a/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb b/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb index b62b376a8..67e0fd77f 100644 --- a/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb +++ b/trulens_eval/examples/quickstart/colab/llama_index_quickstart_colab.ipynb @@ -17,7 +17,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "26f87cc8", + "id": "ebc95f04", "metadata": {}, "source": [ "# Llama-Index Quickstart\n", @@ -32,7 +32,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "b75b26eb", + "id": "0e65b87c", "metadata": {}, "source": [ "## Setup\n", @@ -44,17 +44,17 @@ { "cell_type": "code", "execution_count": null, - "id": "dc735073", + "id": "f49f8da3", "metadata": {}, "outputs": [], "source": [ - "#! pip install trulens-eval==0.15.3 llama_index>=0.8.29post1 html2text>=2020.1.16" + "#! pip install trulens-eval==0.16.0 llama_index>=0.8.29post1 html2text>=2020.1.16" ] }, { "attachments": {}, "cell_type": "markdown", - "id": "d5869342", + "id": "eb064a67", "metadata": {}, "source": [ "### Add API keys\n", @@ -64,7 +64,7 @@ { "cell_type": "code", "execution_count": null, - "id": "80c4270b", + "id": "bb08bb57", "metadata": {}, "outputs": [], "source": [ @@ -75,7 +75,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "2e21ccc8", + "id": "de3f940c", "metadata": {}, "source": [ "### Import from LlamaIndex and TruLens" @@ -84,7 +84,7 @@ { "cell_type": "code", "execution_count": null, - "id": "15d2d792", + "id": "36c5cb79", "metadata": {}, "outputs": [], "source": [ @@ -98,7 +98,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "1749ca95", + "id": "d293e059", "metadata": {}, "source": [ "### Create Simple LLM Application\n", @@ -109,7 +109,7 @@ { "cell_type": "code", "execution_count": null, - "id": "59492f68", + "id": "68b3eb86", "metadata": {}, "outputs": [], "source": [ @@ -126,7 +126,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "2d648df8", + "id": "39ee4d6e", "metadata": {}, "source": [ "### Send your first request" @@ -135,7 +135,7 @@ { "cell_type": "code", "execution_count": null, - "id": "740c6020", + "id": "73c7ae1d", "metadata": {}, "outputs": [], "source": [ @@ -146,7 +146,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "72dfc51d", + "id": "a4a498bc", "metadata": {}, "source": [ "## Initialize Feedback Function(s)" @@ -155,7 +155,7 @@ { "cell_type": "code", "execution_count": null, - "id": "98baac75", + "id": "d0edc9be", "metadata": {}, "outputs": [], "source": [ @@ -184,7 +184,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "e39e33ab", + "id": "e0f88878", "metadata": {}, "source": [ "## Instrument app for logging with TruLens" @@ -193,7 +193,7 @@ { "cell_type": "code", "execution_count": null, - "id": "08850ddd", + "id": "144933f6", "metadata": {}, "outputs": [], "source": [ @@ -205,7 +205,7 @@ { "cell_type": "code", "execution_count": null, - "id": "bf6c0c0f", + "id": "cf42648b", "metadata": {}, "outputs": [], "source": [ @@ -217,7 +217,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "57b3899b", + "id": "cb7c8943", "metadata": {}, "source": [ "## Explore in a Dashboard" @@ -226,7 +226,7 @@ { "cell_type": "code", "execution_count": null, - "id": "eeb1d378", + "id": "c6172cb7", "metadata": {}, "outputs": [], "source": [ @@ -238,7 +238,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "02ff95de", + "id": "73375c7a", "metadata": {}, "source": [ "Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard." @@ -247,7 +247,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "13834f95", + "id": "3a6e0efa", "metadata": {}, "source": [ "Note: Feedback functions evaluated in the deferred manner can be seen in the \"Progress\" page of the TruLens dashboard." @@ -256,7 +256,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "129cf23e", + "id": "e0abb30f", "metadata": {}, "source": [ "## Or view results directly in your notebook" @@ -265,7 +265,7 @@ { "cell_type": "code", "execution_count": null, - "id": "bf7a26c5", + "id": "b1cc4087", "metadata": {}, "outputs": [], "source": [ diff --git a/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb b/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb index 1cd9b5281..1b12ce78c 100644 --- a/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb +++ b/trulens_eval/examples/quickstart/colab/text2text_quickstart_colab.ipynb @@ -17,7 +17,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "4cd76c30", + "id": "4aaab2fb", "metadata": {}, "source": [ "# Text to Text Quickstart\n", @@ -30,17 +30,17 @@ { "cell_type": "code", "execution_count": null, - "id": "d6ab0b90", + "id": "ca634ebd", "metadata": {}, "outputs": [], "source": [ - "# ! pip install trulens_eval==0.15.3" + "# ! pip install trulens_eval==0.16.0" ] }, { "attachments": {}, "cell_type": "markdown", - "id": "6bdcd805", + "id": "58ac01c6", "metadata": {}, "source": [ "## Setup\n", @@ -51,7 +51,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8ed8e99e", + "id": "1e432e79", "metadata": {}, "outputs": [], "source": [ @@ -63,7 +63,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b0173a86", + "id": "1dd39167", "metadata": {}, "outputs": [], "source": [ @@ -74,7 +74,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "641ce18d", + "id": "1d0dbe31", "metadata": {}, "source": [ "### Import from TruLens" @@ -83,7 +83,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6ed8fe19", + "id": "72407c47", "metadata": {}, "outputs": [], "source": [ @@ -97,7 +97,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "19f60f5c", + "id": "30f662e8", "metadata": {}, "source": [ "### Create Simple Text to Text Application\n", @@ -108,7 +108,7 @@ { "cell_type": "code", "execution_count": null, - "id": "a2855cd1", + "id": "5c616e85", "metadata": {}, "outputs": [], "source": [ @@ -125,7 +125,7 @@ { "cell_type": "code", "execution_count": null, - "id": "be55f788", + "id": "28712c86", "metadata": {}, "outputs": [], "source": [ @@ -138,7 +138,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "1a557cc5", + "id": "0ad46d92", "metadata": {}, "source": [ "### Send your first request" @@ -147,7 +147,7 @@ { "cell_type": "code", "execution_count": null, - "id": "d9e2b930", + "id": "18f165fe", "metadata": {}, "outputs": [], "source": [ @@ -159,7 +159,7 @@ { "cell_type": "code", "execution_count": null, - "id": "2751890a", + "id": "47d071c8", "metadata": {}, "outputs": [], "source": [ @@ -169,7 +169,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "e6b552f0", + "id": "b2001c8f", "metadata": {}, "source": [ "## Initialize Feedback Function(s)" @@ -178,7 +178,7 @@ { "cell_type": "code", "execution_count": null, - "id": "9725f089", + "id": "7671ad0b", "metadata": {}, "outputs": [], "source": [ @@ -192,7 +192,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "0315595d", + "id": "5d57263d", "metadata": {}, "source": [ "## Instrument the callable for logging with TruLens" @@ -201,7 +201,7 @@ { "cell_type": "code", "execution_count": null, - "id": "554d6f07", + "id": "9e96ba52", "metadata": {}, "outputs": [], "source": [ @@ -213,7 +213,7 @@ { "cell_type": "code", "execution_count": null, - "id": "dc206f53", + "id": "e65a61f1", "metadata": {}, "outputs": [], "source": [ @@ -224,7 +224,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3ab0623e", + "id": "e28e26aa", "metadata": {}, "outputs": [], "source": [ @@ -235,7 +235,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "1cd23e1b", + "id": "6637224e", "metadata": {}, "source": [ "## Explore in a Dashboard" @@ -244,7 +244,7 @@ { "cell_type": "code", "execution_count": null, - "id": "039ab508", + "id": "bc12f699", "metadata": {}, "outputs": [], "source": [ @@ -256,7 +256,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "9d5e3fc5", + "id": "5a604e84", "metadata": {}, "source": [ "Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard." @@ -265,7 +265,7 @@ { "attachments": {}, "cell_type": "markdown", - "id": "3aa7b1a3", + "id": "c103cce8", "metadata": {}, "source": [ "## Or view results directly in your notebook" @@ -274,7 +274,7 @@ { "cell_type": "code", "execution_count": null, - "id": "47acff2f", + "id": "383596e2", "metadata": {}, "outputs": [], "source": [ diff --git a/trulens_eval/examples/quickstart/langchain_async.ipynb b/trulens_eval/examples/quickstart/langchain_async.ipynb index 52e6270de..dafc458dc 100644 --- a/trulens_eval/examples/quickstart/langchain_async.ipynb +++ b/trulens_eval/examples/quickstart/langchain_async.ipynb @@ -28,7 +28,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ! pip install trulens_eval==0.15.3 langchain>=0.0.263" + "# ! pip install trulens_eval==0.16.0 langchain>=0.0.263" ] }, { diff --git a/trulens_eval/examples/quickstart/langchain_quickstart.ipynb b/trulens_eval/examples/quickstart/langchain_quickstart.ipynb index df555bda9..be2c8ab2a 100644 --- a/trulens_eval/examples/quickstart/langchain_quickstart.ipynb +++ b/trulens_eval/examples/quickstart/langchain_quickstart.ipynb @@ -18,7 +18,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ! pip install trulens_eval==0.15.3 langchain>=0.0.263" + "# ! pip install trulens_eval==0.16.0 langchain>=0.0.263" ] }, { diff --git a/trulens_eval/examples/quickstart/langchain_retrieval_agent.ipynb b/trulens_eval/examples/quickstart/langchain_retrieval_agent.ipynb index 18a93b8c6..92d2879de 100644 --- a/trulens_eval/examples/quickstart/langchain_retrieval_agent.ipynb +++ b/trulens_eval/examples/quickstart/langchain_retrieval_agent.ipynb @@ -23,7 +23,7 @@ "metadata": {}, "outputs": [], "source": [ - "#! pip install trulens_eval==0.15.3 langchain==0.0.315 unstructured==0.10.23 chromadb==0.4.14" + "#! pip install trulens_eval==0.16.0 langchain==0.0.315 unstructured==0.10.23 chromadb==0.4.14" ] }, { diff --git a/trulens_eval/examples/quickstart/llama_index_async.ipynb b/trulens_eval/examples/quickstart/llama_index_async.ipynb index a4f25218a..da415e56a 100644 --- a/trulens_eval/examples/quickstart/llama_index_async.ipynb +++ b/trulens_eval/examples/quickstart/llama_index_async.ipynb @@ -24,7 +24,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ! pip install trulens_eval==0.15.3 llama_index>=0.8.29post1 html2text>=2020.1.16" + "# ! pip install trulens_eval==0.16.0 llama_index>=0.8.29post1 html2text>=2020.1.16" ] }, { diff --git a/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb b/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb index c46ab8d83..0eefc27c2 100644 --- a/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb +++ b/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb @@ -31,7 +31,7 @@ "metadata": {}, "outputs": [], "source": [ - "#! pip install trulens-eval==0.15.3 llama_index>=0.8.29post1 html2text>=2020.1.16" + "#! pip install trulens-eval==0.16.0 llama_index>=0.8.29post1 html2text>=2020.1.16" ] }, { diff --git a/trulens_eval/examples/quickstart/pinecone_quickstart.ipynb b/trulens_eval/examples/quickstart/pinecone_quickstart.ipynb index 49c39a2a8..82b7b68ed 100644 --- a/trulens_eval/examples/quickstart/pinecone_quickstart.ipynb +++ b/trulens_eval/examples/quickstart/pinecone_quickstart.ipynb @@ -29,7 +29,7 @@ "metadata": {}, "outputs": [], "source": [ - "#! pip install trulens-eval==0.15.3 llama_index>=0.8.29post1 pinecone-client>=2.2.2 nltk>=3.8.1 html2text>=2020.1.16" + "#! pip install trulens-eval==0.16.0 llama_index>=0.8.29post1 pinecone-client>=2.2.2 nltk>=3.8.1 html2text>=2020.1.16" ] }, { diff --git a/trulens_eval/examples/quickstart/prototype_evals.ipynb b/trulens_eval/examples/quickstart/prototype_evals.ipynb index e5b536f6e..a4b96449a 100644 --- a/trulens_eval/examples/quickstart/prototype_evals.ipynb +++ b/trulens_eval/examples/quickstart/prototype_evals.ipynb @@ -32,7 +32,7 @@ "metadata": {}, "outputs": [], "source": [ - "#! pip install trulens-eval==0.15.3" + "#! pip install trulens-eval==0.16.0" ] }, { diff --git a/trulens_eval/examples/quickstart/py_script_quickstarts/all_tools.py b/trulens_eval/examples/quickstart/py_script_quickstarts/all_tools.py index f72e8a0ec..d89fa525a 100644 --- a/trulens_eval/examples/quickstart/py_script_quickstarts/all_tools.py +++ b/trulens_eval/examples/quickstart/py_script_quickstarts/all_tools.py @@ -2,14 +2,16 @@ # coding: utf-8 # # Langchain Quickstart -# +# # In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response. -# +# # [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/langchain_quickstart.ipynb) # In[ ]: -# ! pip install trulens_eval==0.15.3 langchain>=0.0.263 + +# ! pip install trulens_eval==0.16.0 langchain>=0.0.263 + # ## Setup # ### Add API keys @@ -17,24 +19,22 @@ # In[ ]: -import os +import os os.environ["OPENAI_API_KEY"] = "..." os.environ["HUGGINGFACE_API_KEY"] = "..." + # ### Import from LangChain and TruLens # In[ ]: + from IPython.display import JSON # Imports main tools: -from trulens_eval import Feedback -from trulens_eval import Huggingface -from trulens_eval import Tru -from trulens_eval import TruChain +from trulens_eval import TruChain, Feedback, Huggingface, Tru from trulens_eval.schema import FeedbackResult - tru = Tru() # Imports from langchain to build app. You may need to install langchain first @@ -42,16 +42,17 @@ # ! pip install langchain>=0.0.170 from langchain.chains import LLMChain from langchain.llms import OpenAI -from langchain.prompts.chat import ChatPromptTemplate +from langchain.prompts.chat import ChatPromptTemplate, PromptTemplate from langchain.prompts.chat import HumanMessagePromptTemplate -from langchain.prompts.chat import PromptTemplate + # ### Create Simple LLM Application -# +# # This example uses a LangChain framework and OpenAI LLM # In[ ]: + full_prompt = HumanMessagePromptTemplate( prompt=PromptTemplate( template= @@ -66,22 +67,28 @@ chain = LLMChain(llm=llm, prompt=chat_prompt_template, verbose=True) + # ### Send your first request # In[ ]: + prompt_input = '¿que hora es?' + # In[ ]: + llm_response = chain(prompt_input) display(llm_response) + # ## Initialize Feedback Function(s) # In[ ]: + # Initialize Huggingface-based feedback function collection class: hugs = Huggingface() @@ -90,82 +97,93 @@ # By default this will check language match on the main app input and main app # output. + # ## Instrument chain for logging with TruLens # In[ ]: -tru_recorder = TruChain( - chain, app_id='Chain1_ChatApplication', feedbacks=[f_lang_match] -) + +tru_recorder = TruChain(chain, + app_id='Chain1_ChatApplication', + feedbacks=[f_lang_match]) + # In[ ]: + with tru_recorder as recording: llm_response = chain(prompt_input) display(llm_response) + # ## Retrieve records and feedback # In[ ]: + # The record of the ap invocation can be retrieved from the `recording`: -rec = recording.get() # use .get if only one record +rec = recording.get() # use .get if only one record # recs = recording.records # use .records if multiple display(rec) + # In[ ]: + # The results of the feedback functions can be rertireved from the record. These # are `Future` instances (see `concurrent.futures`). You can use `as_completed` # to wait until they have finished evaluating. from concurrent.futures import as_completed -for feedback_future in as_completed(rec.feedback_results): +for feedback_future in as_completed(rec.feedback_results): feedback, feedback_result = feedback_future.result() - + feedback: Feedback feedbac_result: FeedbackResult display(feedback.name, feedback_result.result) + # ## Explore in a Dashboard # In[ ]: -tru.run_dashboard() # open a local streamlit app to explore + +tru.run_dashboard() # open a local streamlit app to explore # tru.stop_dashboard() # stop if needed + # Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard. # ### Chain Leaderboard -# +# # Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up. -# +# # Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best). -# +# # ![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png) -# +# # To dive deeper on a particular chain, click "Select Chain". -# +# # ### Understand chain performance with Evaluations -# +# # To learn more about the performance of a particular chain or LLM model, we can select it to view its evaluations at the record level. LLM quality is assessed through the use of feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses and can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more. -# +# # The evaluations tab provides record-level metadata and feedback on the quality of your LLM application. -# +# # ![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png) -# +# # ### Deep dive into full chain metadata -# +# # Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by tru_chain_recorder. -# +# # ![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png) -# +# # If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page. # Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard. @@ -174,104 +192,129 @@ # In[ ]: -tru.get_records_and_feedback(app_ids=[] - )[0] # pass an empty list of app_ids to get all + +tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all + # # Logging Methods -# +# # ## Automatic Logging -# +# # The simplest method for logging with TruLens is by wrapping with TruChain and including the tru argument, as shown in the quickstart. -# +# # This is done like so: # In[ ]: -truchain = TruChain(chain, app_id='Chain1_ChatApplication', tru=tru) + +truchain = TruChain( + chain, + app_id='Chain1_ChatApplication', + tru=tru +) truchain("This will be automatically logged.") + # Feedback functions can also be logged automatically by providing them in a list to the feedbacks arg. # In[ ]: + truchain = TruChain( chain, app_id='Chain1_ChatApplication', - feedbacks=[f_lang_match], # feedback functions + feedbacks=[f_lang_match], # feedback functions tru=tru ) truchain("This will be automatically logged.") + # ## Manual Logging -# +# # ### Wrap with TruChain to instrument your chain # In[ ]: + tc = TruChain(chain, app_id='Chain1_ChatApplication') + # ### Set up logging and instrumentation -# +# # Making the first call to your wrapped LLM Application will now also produce a log or "record" of the chain execution. -# +# # In[ ]: + prompt_input = 'que hora es?' gpt3_response, record = tc.call_with_record(prompt_input) + # We can log the records but first we need to log the chain itself. # In[ ]: + tru.add_app(app=truchain) + # Then we can log the record: # In[ ]: + tru.add_record(record) + # ### Log App Feedback # Capturing app feedback such as user feedback of the responses can be added with one call. # In[ ]: + thumb_result = True -tru.add_feedback( - name="👍 (1) or 👎 (0)", record_id=record.record_id, result=thumb_result -) +tru.add_feedback(name="👍 (1) or 👎 (0)", + record_id=record.record_id, + result=thumb_result) + # ### Evaluate Quality -# +# # Following the request to your app, you can then evaluate LLM quality using feedback functions. This is completed in a sequential call to minimize latency for your application, and evaluations will also be logged to your local machine. -# +# # To get feedback on the quality of your LLM, you can use any of the provided feedback functions or add your own. -# +# # To assess your LLM quality, you can provide the feedback functions to `tru.run_feedback()` in a list provided to `feedback_functions`. -# +# # In[ ]: + feedback_results = tru.run_feedback_functions( - record=record, feedback_functions=[f_lang_match] + record=record, + feedback_functions=[f_lang_match] ) display(feedback_results) + # After capturing feedback, you can then log it to your local database. # In[ ]: + tru.add_feedbacks(feedback_results) + # ### Out-of-band Feedback evaluation -# +# # In the above example, the feedback function evaluation is done in the same process as the chain evaluation. The alternative approach is the use the provided persistent evaluator started via `tru.start_deferred_feedback_evaluator`. Then specify the `feedback_mode` for `TruChain` as `deferred` to let the evaluator handle the feedback functions. -# +# # For demonstration purposes, we start the evaluator here but it can be started in another process. # In[ ]: + truchain: TruChain = TruChain( chain, app_id='Chain1_ChatApplication', @@ -284,25 +327,22 @@ truchain("This will be logged by deferred evaluator.") tru.stop_evaluator() + # # Custom Functions -# +# # Feedback functions are an extensible framework for evaluating LLMs. You can add your own feedback functions to evaluate the qualities required by your application by updating `trulens_eval/feedback.py`, or simply creating a new provider class and feedback function in youre notebook. If your contributions would be useful for others, we encourage you to contribute to TruLens! -# +# # Feedback functions are organized by model provider into Provider classes. -# +# # The process for adding new feedback functions is: # 1. Create a new Provider class or locate an existing one that applies to your feedback function. If your feedback function does not rely on a model provider, you can create a standalone class. Add the new feedback function method to your selected class. Your new method can either take a single text (str) as a parameter or both prompt (str) and response (str). It should return a float between 0 (worst) and 1 (best). # In[ ]: -from trulens_eval import Feedback -from trulens_eval import Provider -from trulens_eval import Select -from trulens_eval import Tru +from trulens_eval import Provider, Feedback, Select, Tru class StandAlone(Provider): - def custom_feedback(self, my_text_field: str) -> float: """ A dummy function of text inputs to float outputs. @@ -320,53 +360,57 @@ def custom_feedback(self, my_text_field: str) -> float: # In[ ]: + standalone = StandAlone() -f_custom_function = Feedback(standalone.custom_feedback - ).on(my_text_field=Select.RecordOutput) +f_custom_function = Feedback(standalone.custom_feedback).on( + my_text_field=Select.RecordOutput +) + # 3. Your feedback function is now ready to use just like the out of the box feedback functions. Below is an example of it being used. # In[ ]: + tru = Tru() feedback_results = tru.run_feedback_functions( - record=record, feedback_functions=[f_custom_function] + record=record, + feedback_functions=[f_custom_function] ) tru.add_feedbacks(feedback_results) + # ## Multi-Output Feedback functions # Trulens also supports multi-output feedback functions. As a typical feedback function will output a float between 0 and 1, multi-output should output a dictionary of `output_key` to a float between 0 and 1. The feedbacks table will display the feedback with column `feedback_name:::outputkey` # In[ ]: -multi_output_feedback = Feedback( - lambda input_param: { - 'output_key1': 0.1, - 'output_key2': 0.9 - }, name="multi" -).on(input_param=Select.RecordOutput) + +multi_output_feedback = Feedback(lambda input_param: {'output_key1': 0.1, 'output_key2': 0.9}, name="multi").on( + input_param=Select.RecordOutput +) feedback_results = tru.run_feedback_functions( - record=record, feedback_functions=[multi_output_feedback] + record=record, + feedback_functions=[multi_output_feedback] ) tru.add_feedbacks(feedback_results) + # In[ ]: + # Aggregators will run on the same dict keys. import numpy as np - -multi_output_feedback = Feedback( - lambda input_param: { - 'output_key1': 0.1, - 'output_key2': 0.9 - }, - name="multi-agg" -).on(input_param=Select.RecordOutput).aggregate(np.mean) +multi_output_feedback = Feedback(lambda input_param: {'output_key1': 0.1, 'output_key2': 0.9}, name="multi-agg").on( + input_param=Select.RecordOutput +).aggregate(np.mean) feedback_results = tru.run_feedback_functions( - record=record, feedback_functions=[multi_output_feedback] + record=record, + feedback_functions=[multi_output_feedback] ) tru.add_feedbacks(feedback_results) + # In[ ]: @@ -376,16 +420,12 @@ def dict_aggregator(list_dict_input): for dict_input in list_dict_input: agg += dict_input['output_key1'] return agg - - -multi_output_feedback = Feedback( - lambda input_param: { - 'output_key1': 0.1, - 'output_key2': 0.9 - }, - name="multi-agg-dict" -).on(input_param=Select.RecordOutput).aggregate(dict_aggregator) +multi_output_feedback = Feedback(lambda input_param: {'output_key1': 0.1, 'output_key2': 0.9}, name="multi-agg-dict").on( + input_param=Select.RecordOutput +).aggregate(dict_aggregator) feedback_results = tru.run_feedback_functions( - record=record, feedback_functions=[multi_output_feedback] + record=record, + feedback_functions=[multi_output_feedback] ) tru.add_feedbacks(feedback_results) + diff --git a/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py b/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py index 4dffb39e9..8fcecf53e 100644 --- a/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py +++ b/trulens_eval/examples/quickstart/py_script_quickstarts/langchain_quickstart.py @@ -2,14 +2,16 @@ # coding: utf-8 # # Langchain Quickstart -# +# # In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response. -# +# # [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/langchain_quickstart.ipynb) # In[ ]: -# ! pip install trulens_eval==0.15.3 langchain>=0.0.263 + +# ! pip install trulens_eval==0.16.0 langchain>=0.0.263 + # ## Setup # ### Add API keys @@ -17,24 +19,22 @@ # In[ ]: -import os +import os os.environ["OPENAI_API_KEY"] = "..." os.environ["HUGGINGFACE_API_KEY"] = "..." + # ### Import from LangChain and TruLens # In[ ]: + from IPython.display import JSON # Imports main tools: -from trulens_eval import Feedback -from trulens_eval import Huggingface -from trulens_eval import Tru -from trulens_eval import TruChain +from trulens_eval import TruChain, Feedback, Huggingface, Tru from trulens_eval.schema import FeedbackResult - tru = Tru() # Imports from langchain to build app. You may need to install langchain first @@ -42,16 +42,17 @@ # ! pip install langchain>=0.0.170 from langchain.chains import LLMChain from langchain.llms import OpenAI -from langchain.prompts.chat import ChatPromptTemplate +from langchain.prompts.chat import ChatPromptTemplate, PromptTemplate from langchain.prompts.chat import HumanMessagePromptTemplate -from langchain.prompts.chat import PromptTemplate + # ### Create Simple LLM Application -# +# # This example uses a LangChain framework and OpenAI LLM # In[ ]: + full_prompt = HumanMessagePromptTemplate( prompt=PromptTemplate( template= @@ -66,22 +67,28 @@ chain = LLMChain(llm=llm, prompt=chat_prompt_template, verbose=True) + # ### Send your first request # In[ ]: + prompt_input = '¿que hora es?' + # In[ ]: + llm_response = chain(prompt_input) display(llm_response) + # ## Initialize Feedback Function(s) # In[ ]: + # Initialize Huggingface-based feedback function collection class: hugs = Huggingface() @@ -90,82 +97,93 @@ # By default this will check language match on the main app input and main app # output. + # ## Instrument chain for logging with TruLens # In[ ]: -tru_recorder = TruChain( - chain, app_id='Chain1_ChatApplication', feedbacks=[f_lang_match] -) + +tru_recorder = TruChain(chain, + app_id='Chain1_ChatApplication', + feedbacks=[f_lang_match]) + # In[ ]: + with tru_recorder as recording: llm_response = chain(prompt_input) display(llm_response) + # ## Retrieve records and feedback # In[ ]: + # The record of the ap invocation can be retrieved from the `recording`: -rec = recording.get() # use .get if only one record +rec = recording.get() # use .get if only one record # recs = recording.records # use .records if multiple display(rec) + # In[ ]: + # The results of the feedback functions can be rertireved from the record. These # are `Future` instances (see `concurrent.futures`). You can use `as_completed` # to wait until they have finished evaluating. from concurrent.futures import as_completed -for feedback_future in as_completed(rec.feedback_results): +for feedback_future in as_completed(rec.feedback_results): feedback, feedback_result = feedback_future.result() - + feedback: Feedback feedbac_result: FeedbackResult display(feedback.name, feedback_result.result) + # ## Explore in a Dashboard # In[ ]: -tru.run_dashboard() # open a local streamlit app to explore + +tru.run_dashboard() # open a local streamlit app to explore # tru.stop_dashboard() # stop if needed + # Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard. # ### Chain Leaderboard -# +# # Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up. -# +# # Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best). -# +# # ![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png) -# +# # To dive deeper on a particular chain, click "Select Chain". -# +# # ### Understand chain performance with Evaluations -# +# # To learn more about the performance of a particular chain or LLM model, we can select it to view its evaluations at the record level. LLM quality is assessed through the use of feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses and can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more. -# +# # The evaluations tab provides record-level metadata and feedback on the quality of your LLM application. -# +# # ![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png) -# +# # ### Deep dive into full chain metadata -# +# # Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by tru_chain_recorder. -# +# # ![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png) -# +# # If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page. # Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard. @@ -174,5 +192,6 @@ # In[ ]: -tru.get_records_and_feedback(app_ids=[] - )[0] # pass an empty list of app_ids to get all + +tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all + diff --git a/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py b/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py index 3529b36ff..b6ac3bc42 100644 --- a/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py +++ b/trulens_eval/examples/quickstart/py_script_quickstarts/llama_index_quickstart.py @@ -2,70 +2,77 @@ # coding: utf-8 # # Llama-Index Quickstart -# +# # In this quickstart you will create a simple Llama Index App and learn how to log it and get feedback on an LLM response. -# +# # For evaluation, we will leverage the "hallucination triad" of groundedness, context relevance and answer relevance. -# +# # [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb) # ## Setup -# +# # ### Install dependencies # Let's install some of the dependencies for this notebook if we don't have them already # In[ ]: -#! pip install trulens-eval==0.15.3 llama_index>=0.8.29post1 html2text>=2020.1.16 + +#! pip install trulens-eval==0.16.0 llama_index>=0.8.29post1 html2text>=2020.1.16 + # ### Add API keys # For this quickstart, you will need Open AI and Huggingface keys. The OpenAI key is used for embeddings and GPT, and the Huggingface key is used for evaluation. # In[ ]: -import os +import os os.environ["OPENAI_API_KEY"] = "..." + # ### Import from LlamaIndex and TruLens # In[ ]: -from trulens_eval import Feedback -from trulens_eval import Tru -from trulens_eval import TruLlama + +from trulens_eval import Feedback, Tru, TruLlama from trulens_eval.feedback import Groundedness from trulens_eval.feedback.provider.openai import OpenAI tru = Tru() + # ### Create Simple LLM Application -# +# # This example uses LlamaIndex which internally uses an OpenAI LLM. # In[ ]: -from llama_index import SimpleWebPageReader -from llama_index import VectorStoreIndex -documents = SimpleWebPageReader(html_to_text=True).load_data( - ["http://paulgraham.com/worked.html"] -) +from llama_index import VectorStoreIndex, SimpleWebPageReader + +documents = SimpleWebPageReader( + html_to_text=True +).load_data(["http://paulgraham.com/worked.html"]) index = VectorStoreIndex.from_documents(documents) query_engine = index.as_query_engine() + # ### Send your first request # In[ ]: + response = query_engine.query("What did the author do growing up?") print(response) + # ## Initialize Feedback Function(s) # In[ ]: + import numpy as np # Initialize provider class @@ -76,7 +83,8 @@ # Define a groundedness feedback function f_groundedness = Feedback(grounded.groundedness_measure_with_cot_reasons).on( TruLlama.select_source_nodes().node.text -).on_output().aggregate(grounded.grounded_statements_aggregator) + ).on_output( + ).aggregate(grounded.grounded_statements_aggregator) # Question/answer relevance between overall question and answer. f_qa_relevance = Feedback(openai.relevance).on_input_output() @@ -84,32 +92,37 @@ # Question/statement relevance between question and each context chunk. f_qs_relevance = Feedback(openai.qs_relevance).on_input().on( TruLlama.select_source_nodes().node.text -).aggregate(np.mean) + ).aggregate(np.mean) + # ## Instrument app for logging with TruLens # In[ ]: -tru_query_engine_recorder = TruLlama( - query_engine, + +tru_query_engine_recorder = TruLlama(query_engine, app_id='LlamaIndex_App1', - feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance] -) + feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance]) + # In[ ]: + # or as context manager with tru_query_engine_recorder as recording: query_engine.query("What did the author do growing up?") + # ## Explore in a Dashboard # In[ ]: -tru.run_dashboard() # open a local streamlit app to explore + +tru.run_dashboard() # open a local streamlit app to explore # tru.stop_dashboard() # stop if needed + # Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard. # Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard. @@ -118,5 +131,6 @@ # In[ ]: -tru.get_records_and_feedback(app_ids=[] - )[0] # pass an empty list of app_ids to get all + +tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all + diff --git a/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py b/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py index 129db5bdf..d2acf0b6b 100644 --- a/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py +++ b/trulens_eval/examples/quickstart/py_script_quickstarts/text2text_quickstart.py @@ -2,14 +2,16 @@ # coding: utf-8 # # Text to Text Quickstart -# +# # In this quickstart you will create a simple text to text application and learn how to log it and get feedback. -# +# # [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/text2text_quickstart.ipynb) # In[ ]: -# ! pip install trulens_eval==0.15.3 + +# ! pip install trulens_eval==0.16.0 + # ## Setup # ### Add API keys @@ -17,32 +19,33 @@ # In[ ]: -import os +import os os.environ["OPENAI_API_KEY"] = "..." os.environ["HUGGINGFACE_API_KEY"] = "..." + # In[ ]: -import openai +import openai openai.api_key = os.environ["OPENAI_API_KEY"] + # ### Import from TruLens # In[ ]: + from IPython.display import JSON # Imports main tools: -from trulens_eval import Feedback -from trulens_eval import Huggingface -from trulens_eval import Tru - +from trulens_eval import Feedback, Huggingface, Tru tru = Tru() + # ### Create Simple Text to Text Application -# +# # This example uses a bare bones OpenAI LLM, and a non-LLM just for demonstration purposes. # In[ ]: @@ -50,26 +53,18 @@ def llm_standalone(prompt): return openai.ChatCompletion.create( - model="gpt-3.5-turbo", - messages=[ - { - "role": - "system", - "content": - "You are a question and answer bot, and you answer super upbeat." - }, { - "role": "user", - "content": prompt - } + model="gpt-3.5-turbo", + messages=[ + {"role": "system", "content": "You are a question and answer bot, and you answer super upbeat."}, + {"role": "user", "content": prompt} ] )["choices"][0]["message"]["content"] # In[ ]: -import hashlib - +import hashlib def simple_hash_callable(prompt): h = hashlib.shake_256(prompt.encode('utf-8')) return str(h.hexdigest(20)) @@ -79,60 +74,70 @@ def simple_hash_callable(prompt): # In[ ]: -prompt_input = "How good is language AI?" + +prompt_input="How good is language AI?" prompt_output = llm_standalone(prompt_input) prompt_output + # In[ ]: + simple_hash_callable(prompt_input) + # ## Initialize Feedback Function(s) # In[ ]: + # Initialize Huggingface-based feedback function collection class: hugs = Huggingface() # Define a sentiment feedback function using HuggingFace. f_sentiment = Feedback(hugs.positive_sentiment).on_output() + # ## Instrument the callable for logging with TruLens # In[ ]: + from trulens_eval import TruBasicApp +tru_llm_standalone_recorder = TruBasicApp(llm_standalone, app_id="Happy Bot", feedbacks=[f_sentiment]) +tru_simple_hash_callable_recorder = TruBasicApp(simple_hash_callable, app_id="Hasher", feedbacks=[f_sentiment]) -tru_llm_standalone_recorder = TruBasicApp( - llm_standalone, app_id="Happy Bot", feedbacks=[f_sentiment] -) -tru_simple_hash_callable_recorder = TruBasicApp( - simple_hash_callable, app_id="Hasher", feedbacks=[f_sentiment] -) # In[ ]: + with tru_llm_standalone_recorder as recording: tru_llm_standalone_recorder.app(prompt_input) + # In[ ]: + with tru_simple_hash_callable_recorder as recording: tru_simple_hash_callable_recorder.app(prompt_input) + # ## Explore in a Dashboard # In[ ]: -tru.run_dashboard() # open a local streamlit app to explore + +tru.run_dashboard() # open a local streamlit app to explore # tru.stop_dashboard() # stop if needed + # Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard. # ## Or view results directly in your notebook # In[ ]: -tru.get_records_and_feedback(app_ids=[] - )[0] # pass an empty list of app_ids to get all + +tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all + diff --git a/trulens_eval/examples/quickstart/summarization_eval.ipynb b/trulens_eval/examples/quickstart/summarization_eval.ipynb index 73279c701..d5b81752d 100644 --- a/trulens_eval/examples/quickstart/summarization_eval.ipynb +++ b/trulens_eval/examples/quickstart/summarization_eval.ipynb @@ -32,7 +32,7 @@ "metadata": {}, "outputs": [], "source": [ - "\"\"\"!pip install trulens_eval==0.15.3\n", + "\"\"\"!pip install trulens_eval==0.16.0\n", " bert_score==0.3.13 \\\n", " evaluate==0.4.0 \\\n", " absl-py==1.4.0 \\\n", diff --git a/trulens_eval/examples/quickstart/text2text_quickstart.ipynb b/trulens_eval/examples/quickstart/text2text_quickstart.ipynb index 4c3a175f7..012456a76 100644 --- a/trulens_eval/examples/quickstart/text2text_quickstart.ipynb +++ b/trulens_eval/examples/quickstart/text2text_quickstart.ipynb @@ -18,7 +18,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ! pip install trulens_eval==0.15.3" + "# ! pip install trulens_eval==0.16.0" ] }, { diff --git a/trulens_eval/generated_files/all_tools.ipynb b/trulens_eval/generated_files/all_tools.ipynb index dc7ac857d..48bcf103e 100644 --- a/trulens_eval/generated_files/all_tools.ipynb +++ b/trulens_eval/generated_files/all_tools.ipynb @@ -18,7 +18,7 @@ "metadata": {}, "outputs": [], "source": [ - "# ! pip install trulens_eval==0.15.3 langchain>=0.0.263" + "# ! pip install trulens_eval==0.16.0 langchain>=0.0.263" ] }, { diff --git a/trulens_eval/trulens_eval/__init__.py b/trulens_eval/trulens_eval/__init__.py index 1d3592db7..8833773ba 100644 --- a/trulens_eval/trulens_eval/__init__.py +++ b/trulens_eval/trulens_eval/__init__.py @@ -78,7 +78,7 @@ """ -__version__ = "0.15.3" +__version__ = "0.16.0" from trulens_eval.feedback import Bedrock from trulens_eval.feedback import Feedback