-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release/v1.4 #679
Release/v1.4 #679
Conversation
WalkthroughThis comprehensive update enhances the PandasAI library with advanced reasoning capabilities, custom skills management, and SQLite database connectivity. It also improves code execution context, query execution tracking, and introduces new test cases for added functionalities. The changes are aimed at improving the user experience, expanding the library's capabilities, and ensuring robustness. Changes
TipsChat with CodeRabbit Bot (
|
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. @@ Coverage Diff @@
## main #679 +/- ##
==========================================
+ Coverage 83.84% 84.80% +0.96%
==========================================
Files 67 70 +3
Lines 3206 3547 +341
==========================================
+ Hits 2688 3008 +320
- Misses 518 539 +21
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 23
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- mkdocs.yml
Files selected for processing (29)
- docs/examples.md (1 hunks)
- docs/getting-started.md (1 hunks)
- docs/skills.md (1 hunks)
- examples/skills_example.py (1 hunks)
- examples/using_pandasai_log_server.py (1 hunks)
- pandasai/init.py (2 hunks)
- pandasai/agent/init.py (6 hunks)
- pandasai/assets/prompt_templates/advanced_reasoning.tmpl (1 hunks)
- pandasai/assets/prompt_templates/check_if_relevant_to_conversation.tmpl (1 hunks)
- pandasai/assets/prompt_templates/generate_python_code.tmpl (1 hunks)
- pandasai/assets/prompt_templates/simple_reasoning.tmpl (1 hunks)
- pandasai/exceptions.py (1 hunks)
- pandasai/helpers/code_manager.py (8 hunks)
- pandasai/helpers/query_exec_tracker.py (1 hunks)
- pandasai/helpers/skills_manager.py (1 hunks)
- pandasai/llm/base.py (3 hunks)
- pandasai/prompts/base.py (4 hunks)
- pandasai/prompts/generate_python_code.py (2 hunks)
- pandasai/schemas/df_config.py (3 hunks)
- pandasai/skills/init.py (1 hunks)
- pandasai/smart_dataframe/init.py (4 hunks)
- pandasai/smart_datalake/init.py (17 hunks)
- tests/llms/test_base_llm.py (1 hunks)
- tests/prompts/test_generate_python_code_prompt.py (5 hunks)
- tests/skills/test_skills.py (1 hunks)
- tests/test_codemanager.py (7 hunks)
- tests/test_query_tracker.py (1 hunks)
- tests/test_smartdataframe.py (5 hunks)
- tests/test_smartdatalake.py (2 hunks)
Files skipped from review due to trivial changes (8)
- docs/examples.md
- pandasai/assets/prompt_templates/check_if_relevant_to_conversation.tmpl
- pandasai/assets/prompt_templates/simple_reasoning.tmpl
- pandasai/exceptions.py
- pandasai/prompts/generate_python_code.py
- tests/llms/test_base_llm.py
- tests/test_query_tracker.py
- tests/test_smartdataframe.py
Additional comments: 64
pandasai/assets/prompt_templates/generate_python_code.tmpl (1)
- 9-17: The new hunk introduces additional instructions for the user, such as reasoning step-by-step, avoiding the use of technical table names, and considering the last message in the conversation. Ensure that these instructions are clear and understandable to the user. Also, verify that the
{skills}
and{reasoning}
placeholders are correctly replaced with the intended content during runtime.docs/getting-started.md (1)
- 193-193: The new setting
use_advanced_reasoning_framework
has been introduced. Ensure that the implications of enabling this setting are clearly documented, especially in terms of performance and resource usage. Also, consider providing examples of scenarios where this setting would be beneficial.pandasai/assets/prompt_templates/advanced_reasoning.tmpl (1)
- 1-3: The changes in this hunk are related to the instructions for creating advanced reasoning prompts. The instructions are clear and concise, and they provide the necessary guidance for wrapping the reasoning and answer in the appropriate tags, as well as returning the updated
analyze_data
function. However, it would be beneficial to provide an example for better understanding.Example:
For instance, if the last step was to calculate the mean of a dataset, your template might look like this: <reasoning> To find the average value of the dataset, we calculate the mean. This gives us a single value that represents the center of the dataset. </reasoning> <answer> The mean of the dataset is calculated using the `mean()` function in pandas. </answer> ```python def analyze_data(data): mean = data.mean() return mean</blockquote></details> <details><summary>docs/skills.md (1)</summary><blockquote> * 108-108: The function `plot_salaries_using_streamlit` is not defined in the provided code. It seems like a typo, and the function `plot_salaries` should be added as a skill instead. Please verify. ```diff - agent.add_skills(plot_salaries_using_streamlit) + agent.add_skills(plot_salaries)
pandasai/__init__.py (1)
- 261-268: The
__all__
list has been updated to includeskill
. This meansskill
will be imported whenfrom pandasai import *
is used. Ensure that this is the intended behavior and thatskill
is designed to be used directly.examples/using_pandasai_log_server.py (1)
- 21-23: Environment variables are being set within the script. This is not a good practice as it can lead to security issues and it's not scalable. Environment variables should be set outside the script, in the environment where the script is running. If these lines are for demonstration purposes, consider adding a comment to clarify this.
26, 42:
The same API key is used for both the OpenAI instance and the logging server. If these services require different keys, this could be a potential issue. Please verify.examples/skills_example.py (3)
41-41: The
Agent
is initialized with amemory_size
of 10. Ensure that this value is appropriate for your use case. If theAgent
needs to remember more or fewer past interactions, adjust this value accordingly.43-43: The
plot_salaries
skill is added to theAgent
. Ensure that this skill is compatible with theAgent
and that it can be used effectively in the chat interface.46-47: The
Agent
is used to process a chat command. Ensure that the command is correctly interpreted by theAgent
and that the response is as expected.pandasai/helpers/skills_manager.py (1)
- 98-99: The setter for
used_skills
allows for the entire_used_skills
list to be replaced. This could potentially remove all previously used skills. If this is not the intended behavior, consider removing this setter or adding additional checks.pandasai/prompts/base.py (4)
14-14: The
_config
attribute has been added to theAbstractPrompt
class. Ensure that this attribute is used appropriately in the subclasses and that it is initialized before use to avoidNoneType
errors.31-32: The
on_prompt_generation
method has been added. This method seems to be a hook that can be overridden by subclasses to perform additional tasks during the prompt generation process. Ensure that this method is used correctly in the subclasses.65-74: The
set_config
andget_config
methods have been added. These methods are used to set and retrieve the configuration for the prompt. Theget_config
method can retrieve the entire configuration or a specific key from the configuration. Ensure that these methods are used correctly in the subclasses and that the configuration is set before it is retrieved.90-101: The
on_prompt_generation
method is now called at the beginning of theto_string
method. This allows subclasses to perform additional tasks during the prompt generation process. Also, theset_vars
method is now called with an additional condition to excludeAbstractPrompt
instances. This change ensures that nested prompts do not receive other nested prompts as arguments, which could potentially lead to infinite recursion. Ensure that these changes do not introduce any unexpected behavior in the subclasses.pandasai/llm/base.py (3)
- 123-142: The
_extract_tag_text
method uses regular expressions to extract text between specified tags from a response string. This method could potentially returnNone
if no match is found. Ensure that the calling code can handle aNone
return value. Also, consider precompiling the regular expression for performance improvement if this method is called frequently.- match = re.search( - f"(<{tag}>)(.*)(</{tag}>)", - response, - re.DOTALL | re.MULTILINE, - ) + pattern = re.compile(f"(<{tag}>)(.*)(</{tag}>)", re.DOTALL | re.MULTILINE) + match = pattern.search(response)
168-173: The
_extract_answer
method splits the response into sentences and filters out any sentence containing "temp_chart.png". This seems like a very specific condition. Ensure that this condition is valid for all cases where this method will be used. If this is a temporary workaround or a condition specific to a certain use case, consider making this more generic or configurable.200-211: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [192-208]
The
generate_code
method's return type has been changed from a single string to a list of strings. This change will affect all places where this method is called. Ensure that all calls to this method throughout the codebase have been updated to handle the new return type. Also, the method now extracts and returns the code, reasoning, and answer from the response. Ensure that this is the intended behavior and that it doesn't break any existing functionality.- def generate_code(self, instruction: AbstractPrompt) -> str: + def generate_code(self, instruction: AbstractPrompt) -> [str, str, str]:pandasai/agent/__init__.py (6)
3-4: The import statement for
skill
is added but it's not clear from the context whether it's a module or a class. If it's a class, it should be capitalized according to Python's naming conventions. Please verify.46-57: The
set_instance_type
method is called on theSmartDatalake
object to set the instance type. A new methodadd_skills
is added to theAgent
class to add skills to theSmartDatalake
. This is a good practice as it allows for better modularity and separation of concerns.85-86: The
check_if_related_to_conversation
method now returns a boolean value indicating whether the query is related to the conversation. This value is then passed to theis_related_query
method of theSmartDatalake
object. This is a good practice as it makes the code more readable and understandable.96-96: The return type of the
check_if_related_to_conversation
method is now explicitly defined asbool
. This is a good practice as it makes the code more readable and understandable.121-121: The
check_if_related_to_conversation
method now returns the value ofrelated
. This is a good practice as it makes the code more readable and understandable.189-207: New properties are added to the
Agent
class to get the last code generated, the last code executed, the last prompt, the last reasoning, and the last answer from theSmartDatalake
object. This is a good practice as it allows for better encapsulation and data hiding.pandasai/smart_dataframe/__init__.py (4)
29-29: The import statement
from pandasai.skills import skill
is added. Ensure that theskill
module is correctly implemented and contains all the necessary classes or functions that are used in this file. Also, verify that theskill
module does not have any side effects upon import.306-307: The
set_instance_type
method is called on theSmartDatalake
instance. Ensure that this method is correctly implemented in theSmartDatalake
class and that it correctly handles the instance type. Also, verify that the instance type is used correctly in theSmartDatalake
class.326-330: The
add_skills
method is added to theSmartDataframe
class. This method calls theadd_skills
method on theSmartDatalake
instance. Ensure that theadd_skills
method is correctly implemented in theSmartDatalake
class and that it correctly handles the skills. Also, verify that the skills are used correctly in theSmartDatalake
class.692-698: The
last_reasoning
andlast_answer
properties are added to theSmartDataframe
class. These properties return thelast_reasoning
andlast_answer
attributes of theSmartDatalake
instance, respectively. Ensure that these attributes are correctly set in theSmartDatalake
class and that they correctly represent the last reasoning and answer.tests/test_smartdatalake.py (1)
- 194-211: This test case is checking if the
last_answer
andlast_reasoning
attributes are correctly set after a chat interaction. The test case seems to be well written and covers the necessary assertions. However, it would be beneficial to add a test case for whenuse_advanced_reasoning_framework
is set toFalse
to ensure that thelast_answer
andlast_reasoning
attributes behave as expected in that scenario as well.pandasai/helpers/code_manager.py (8)
9-9: The
SkillsManager
class is imported but not used in this hunk. Ensure that it is used in the code sections not included in this hunk, otherwise, it's an unnecessary import.28-50: The
CodeExecutionContext
class is introduced to provide additional context during code execution. It includes aprompt_id
and askills_manager
. The class is well-structured and follows the best practices of encapsulation with the use of properties.208-217: The
execute_code
method now takes an instance ofCodeExecutionContext
as a parameter instead ofprompt_id
. This change provides more context during code execution. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.234-240: The
execute_code
method now uses theprompt_id
from thecontext
parameter. Also, it resets theused_skills
in theskills_manager
before executing the code. This is a good practice as it ensures that theused_skills
are reset before each code execution.242-242: The
_clean_code
method now takes an instance ofCodeExecutionContext
as a parameter. This change provides more context during code cleaning. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.257-261: The
execute_code
method now adds the used skills to the environment before executing the code. This is a good practice as it ensures that the skills are available in the environment during code execution.411-411: The
_clean_code
method now takes an instance ofCodeExecutionContext
as a parameter. This change provides more context during code cleaning. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.435-450: The
_clean_code
method now checks for used skills inside theanalyze_data
function and adds them to theskills_manager
. This is a good practice as it ensures that the used skills are tracked and can be reused in subsequent code executions.tests/prompts/test_generate_python_code_prompt.py (6)
54-54: The new variable "skills" is being set to an empty string. Ensure that this is the intended behavior and that the "skills" variable is used appropriately in the rest of the code.
69-69: The instruction about not changing the params of the initial Python function has been updated to include a note about using the right dataframes given the context. This is a good addition as it provides more clarity to the user.
79-79: The instruction for the "Analyze" step has been updated to specify that if the user asks to plot a chart, it must be saved as an image in "temp_chart.png" and not shown. This is a good change as it provides clear instructions to the user about how to handle chart requests.
85-88: New instructions have been added to guide the user on how to approach the task and how to format their answer. These instructions provide additional clarity and guidance to the user, which can help improve the quality of the user's work.
159-159: The instruction for the "Analyze" step in the custom instructions has been updated to match the change made in the default instructions. This ensures consistency between the default and custom instructions.
170-170: The instruction for the "Analyze" step in the test for custom instructions has been updated to match the change made in the default instructions. This ensures that the test is checking the correct behavior.
tests/test_codemanager.py (3)
13-13: The
CodeExecutionContext
class is now imported and used in the test suite. Ensure that the tests cover all the functionalities of this new class.74-77: A new fixture
exec_context
is introduced. It creates a mock object forCodeExecutionContext
. This is a good practice as it allows for isolation of tests and makes them more reliable.190-200: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [182-197]
The
exec_context
fixture is now used as an argument in the test methods. This change is consistent with the introduction of theCodeExecutionContext
class. Ensure that theCodeExecutionContext
class is properly mocked and that the tests still cover all the functionalities of theCodeManager
class.pandasai/smart_datalake/__init__.py (16)
20-28: The new imports
SkillsManager
,skill
, andQueryExecTracker
are introduced. Ensure these modules are correctly implemented and available in the project structure.44-48: The new imports
CodeExecutionContext
andAdvancedReasoningDisabledError
are introduced. Ensure these modules are correctly implemented and available in the project structure.57-64: New instance variables
_conversation_id
,_skills
,_instance
, and_query_exec_tracker
are introduced. Make sure they are used appropriately throughout the class.110-111: A
SkillsManager
instance is being created. Ensure that theSkillsManager
class is correctly implemented and that its methods are used appropriately in theSmartDatalake
class.124-130: The
_conversation_id
is being set to a new UUID,_instance
is being set to the class name, and_query_exec_tracker
is being initialized with the log server configuration. Ensure that these changes are compatible with the rest of the codebase.218-222: The
add_skills
method is introduced to add skills to theSkillsManager
. Ensure that theadd_skills
method of theSkillsManager
class is correctly implemented and that it handles the input correctly.257-266: The
prompt
is being set with the configuration, dataframes, conversation, and skills. Ensure that these changes are compatible with the rest of the codebase and that theprompt
is used correctly.314-326: The
QueryExecTracker
is being used to start a new track, add query info, and add dataframes. Ensure that these methods are correctly implemented in theQueryExecTracker
class and that they are used correctly in theSmartDatalake
class.338-340: The
QueryExecTracker
is being used to execute the cache get function. Ensure that theexecute_func
method of theQueryExecTracker
class is correctly implemented and that it handles the input correctly.356-367: The
QueryExecTracker
is being used to execute theget_prompt
andgenerate_code
methods. Ensure that these methods are correctly implemented and that they handle the input correctly.393-397: A
CodeExecutionContext
is being created and passed to theexecute_code
method. Ensure that theCodeExecutionContext
class is correctly implemented and that it is used correctly in theSmartDatalake
class.416-422: The
QueryExecTracker
is being used to execute theretry_run_code
method. Ensure that this method is correctly implemented and that it handles the input correctly.450-475: The
QueryExecTracker
is being used to get the execution time, execute theparse
method, set the success flag, and publish the results. Ensure that these methods are correctly implemented and that they handle the input correctly.514-531: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [492-521]
The
_retry_run_code
method now returns a list. Ensure that this change is compatible with the rest of the codebase.
686-709: The
last_reasoning
andlast_answer
properties are introduced. They raise anAdvancedReasoningDisabledError
if the advanced reasoning framework is not enabled. Ensure that this exception is handled correctly in the codebase.734-736: The
instance
property is introduced. Ensure that it is used correctly in the codebase.pandasai/schemas/df_config.py (3)
11-15: The new
LogServerConfig
class is a TypedDict that includesserver_url
andapi_key
. Ensure that the server URL and API key are being properly validated elsewhere in the code, as this TypedDict does not include any validation logic. Also, consider encrypting or securely storing theapi_key
as it is sensitive information.16-22: A new configuration option
use_advanced_reasoning_framework
has been added to theConfig
class. Ensure that this new configuration option is being used correctly throughout the codebase.35-35: The
log_server
attribute of typeLogServerConfig
has been added to theConfig
class. Ensure that this new attribute is being used correctly throughout the codebase and that it is being properly initialized.
def skill(skill_function): | ||
def wrapped_function(*args, **kwargs): | ||
return skill_function(*args, **kwargs) | ||
|
||
wrapped_function.name = skill_function.__name__ | ||
wrapped_function.func_def = ( | ||
"""def pandasai.skills.{funcion_name}{signature}""".format( | ||
funcion_name=wrapped_function.name, | ||
signature=str(inspect.signature(skill_function)), | ||
) | ||
) | ||
|
||
doc_string = skill_function.__doc__ | ||
|
||
wrapped_function.print = ( | ||
""" | ||
<function> | ||
{signature} | ||
{doc_string} | ||
</function> | ||
""" | ||
).format( | ||
signature=wrapped_function.func_def, | ||
doc_string=""" \"\"\"{0}\n \"\"\"""".format(doc_string) | ||
if doc_string is not None | ||
else "", | ||
) | ||
|
||
return wrapped_function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The skill
function is a decorator that wraps a given function, preserving its name and signature, and providing a formatted string representation. However, there's a typo in the string format placeholder on line 10, which should be function_name
instead of funcion_name
. This could lead to an error when trying to format the string. Here's the corrected code:
- 10: """def pandasai.skills.{funcion_name}{signature}""".format(
+ 10: """def pandasai.skills.{function_name}{signature}""".format(
Also, the wrapped_function.print
attribute seems to be a string representation of the function, including its signature and docstring. However, it's not clear why this is necessary, as Python already provides the inspect
module for this purpose. If there's a specific reason for this, it might be worth documenting it in the code.
import matplotlib.pyplot as plt | ||
|
||
plt.bar(name, salaries) | ||
plt.xlabel("Employee Name") | ||
plt.ylabel("Salary") | ||
plt.title("Employee Salaries") | ||
plt.xticks(rotation=45) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The matplotlib.pyplot
import is inside the function plot_salaries
. This could lead to performance issues if the function is called multiple times, as the import statement will be executed each time the function is called. It's a best practice to place all import statements at the top of the file. This makes it easier to see what modules are used in the script and avoids repeated imports.
+ import matplotlib.pyplot as plt
...
- import matplotlib.pyplot as plt
import matplotlib.pyplot as plt | ||
|
||
plt.bar(name, salary) | ||
plt.xlabel("Employee Name") | ||
plt.ylabel("Salary") | ||
plt.title("Employee Salaries") | ||
plt.xticks(rotation=45) | ||
plt.savefig("temp_chart.png") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue as above is present here with the matplotlib.pyplot
import inside the function plot_salaries
. Move the import to the top of the file.
+ import matplotlib.pyplot as plt
...
- import matplotlib.pyplot as plt
import os | ||
import pandas as pd | ||
from pandasai import Agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The os
and pd
(pandas) modules are imported but not used in this code snippet. If they are not used elsewhere in the code, consider removing these imports to clean up the code.
- import os
- import pandas as pd
[employees_df, salaries_df], | ||
config={ | ||
"llm": llm, | ||
"enable_cache": True, | ||
"log_server": { | ||
"server_url": "SERVER_URL", | ||
"api_key": "YOUR_API_KEY", | ||
}, | ||
}, | ||
memory_size=10, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Agent
class is instantiated twice with the same parameters. This could be a potential redundancy. If the two instances are not used differently, consider reusing the same instance to adhere to the DRY (Don't Repeat Yourself) principle.
37, 57:
The same chat query is being made twice. If the responses are expected to be the same, consider removing one of the queries to avoid redundancy.
@@ -147,6 +169,7 @@ def test_exception_handling( | |||
code_manager.execute_code = Mock( | |||
side_effect=NoCodeFoundError("No code found in the answer.") | |||
) | |||
code_manager.execute_code.__name__ = "execute_code" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __name__
attribute of the execute_code
mock object is set to "execute_code". This seems unnecessary as the __name__
attribute of a Mock
object is automatically set to the name of the method it's mocking. If there's a specific reason for this, it's fine. Otherwise, consider removing it.
- code_manager.execute_code.__name__ = "execute_code"
def __init__( | ||
self, | ||
server_config: Union[dict, None] = None, | ||
) -> None: | ||
self._success = False | ||
self._start_time = None | ||
self._server_config = server_config | ||
self._query_info = {} | ||
self._is_related_query = True | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The QueryExecTracker
class is initialized with a server_config
parameter which is optional and can be None
. However, there is no check to ensure that server_config
is a dictionary when it is not None
. This could lead to unexpected behavior if a non-dictionary object is passed. Consider adding a type check for server_config
in the constructor.
def __init__(
self,
server_config: Union[dict, None] = None,
) -> None:
self._success = False
self._start_time = None
+ if server_config is not None and not isinstance(server_config, dict):
+ raise ValueError("server_config must be a dictionary or None")
self._server_config = server_config
self._query_info = {}
self._is_related_query = True
def execute_func(self, function, *args, **kwargs) -> Any: | ||
""" | ||
Tracks function executions, calculates execution time and prepare data | ||
Args: | ||
function (function): Function that is to be executed | ||
|
||
Returns: | ||
Any: Response return after function execution | ||
""" | ||
start_time = time.time() | ||
|
||
# Get the tag from kwargs if provided, or use the function name as the default | ||
tag = kwargs.pop("tag", function.__name__) | ||
|
||
try: | ||
result = function(*args, **kwargs) | ||
|
||
execution_time = time.time() - start_time | ||
if tag not in exec_steps: | ||
return result | ||
|
||
step_data = self._generate_exec_step(tag, result) | ||
|
||
step_data["success"] = True | ||
step_data["execution_time"] = execution_time | ||
|
||
self._steps.append(step_data) | ||
|
||
return result | ||
|
||
except Exception: | ||
execution_time = time.time() - start_time | ||
self._steps.append( | ||
{ | ||
"type": exec_steps[tag], | ||
"success": False, | ||
"execution_time": execution_time, | ||
} | ||
) | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The execute_func
method is designed to execute a function and track its execution time. However, it also modifies the function's kwargs by popping the "tag" key if it exists. This could lead to unexpected behavior if the function relies on this key. Consider copying the kwargs before modifying them.
def execute_func(self, function, *args, **kwargs) -> Any:
"""
Tracks function executions, calculates execution time and prepare data
Args:
function (function): Function that is to be executed
Returns:
Any: Response return after function execution
"""
start_time = time.time()
# Get the tag from kwargs if provided, or use the function name as the default
+ kwargs_copy = kwargs.copy()
- tag = kwargs.pop("tag", function.__name__)
+ tag = kwargs_copy.pop("tag", function.__name__)
try:
- result = function(*args, **kwargs)
+ result = function(*args, **kwargs_copy)
if self._server_config is None: | ||
server_url = os.environ.get("LOGGING_SERVER_URL") | ||
api_key = os.environ.get("LOGGING_SERVER_API_KEY") | ||
else: | ||
server_url = self._server_config.get( | ||
"server_url", os.environ.get("LOGGING_SERVER_URL") | ||
) | ||
api_key = self._server_config.get( | ||
"api_key", os.environ.get("LOGGING_SERVER_API_KEY") | ||
) | ||
|
||
if api_key is None or server_url is None: | ||
return | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The publish
method retrieves the server_url
and api_key
from the server_config
dictionary or from environment variables. However, it does not validate these values before attempting to send a request. If either of these values is None
or an empty string, the request will fail. Consider adding validation for these values before sending the request.
if self._server_config is None:
server_url = os.environ.get("LOGGING_SERVER_URL")
api_key = os.environ.get("LOGGING_SERVER_API_KEY")
else:
server_url = self._server_config.get(
"server_url", os.environ.get("LOGGING_SERVER_URL")
)
api_key = self._server_config.get(
"api_key", os.environ.get("LOGGING_SERVER_API_KEY")
)
+ if not server_url or not api_key:
+ raise ValueError("server_url and api_key must be provided")
if api_key is None or server_url is None:
return
def _format_response(self, result: ResponseType) -> ResponseType: | ||
""" | ||
Format output response | ||
Args: | ||
result (ResponseType): response returned after execution | ||
|
||
Returns: | ||
ResponseType: formatted response output | ||
""" | ||
formatted_result = {} | ||
if result["type"] == "dataframe": | ||
formatted_result = { | ||
"type": result["type"], | ||
"value": { | ||
"headers": result["value"].columns.tolist(), | ||
"rows": result["value"].values.tolist(), | ||
}, | ||
} | ||
return formatted_result | ||
elif result["type"] == "plot": | ||
with open(result["value"], "rb") as image_file: | ||
image_data = image_file.read() | ||
# Encode the image data to Base64 | ||
base64_image = ( | ||
f"data:image/png;base64,{base64.b64encode(image_data).decode()}" | ||
) | ||
formatted_result = { | ||
"type": result["type"], | ||
"value": base64_image, | ||
} | ||
|
||
return formatted_result | ||
else: | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _format_response
method handles different types of responses, but it does not handle unexpected types. If the result
dictionary contains a type that is not "dataframe" or "plot", the method simply returns the original result. This could lead to inconsistent return types. Consider adding a default case to handle unexpected types.
def _format_response(self, result: ResponseType) -> ResponseType:
"""
Format output response
Args:
result (ResponseType): response returned after execution
Returns:
ResponseType: formatted response output
"""
formatted_result = {}
if result["type"] == "dataframe":
formatted_result = {
"type": result["type"],
"value": {
"headers": result["value"].columns.tolist(),
"rows": result["value"].values.tolist(),
},
}
return formatted_result
elif result["type"] == "plot":
with open(result["value"], "rb") as image_file:
image_data = image_file.read()
# Encode the image data to Base64
base64_image = (
f"data:image/png;base64,{base64.b64encode(image_data).decode()}"
)
formatted_result = {
"type": result["type"],
"value": base64_image,
}
return formatted_result
+ else:
+ raise ValueError(f"Unexpected result type: {result['type']}")
else:
return result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- mkdocs.yml
Files selected for processing (2)
- pandasai/helpers/code_manager.py (8 hunks)
- pandasai/smart_dataframe/init.py (4 hunks)
Additional comments: 12
pandasai/smart_dataframe/__init__.py (4)
29-29: The import of
skill
frompandasai.skills
is new. Ensure that theskill
module and its usage are properly tested and documented. Also, verify that theskill
module does not introduce any circular dependencies.306-307: The
set_instance_type
method is being called on theSmartDatalake
instance. Ensure that this method exists in theSmartDatalake
class and that it properly handles the instance type. Also, verify that the instance type is used appropriately within theSmartDatalake
class.326-330: The
add_skills
method is a new addition to theSmartDataframe
class. Ensure that this method is properly tested and that the skills are correctly added to theSmartDatalake
instance. Also, verify that the skills are used appropriately within theSmartDatalake
class.695-701: The
last_reasoning
andlast_answer
properties are new additions to theSmartDataframe
class. Ensure that these properties are properly tested and that they correctly retrieve the last reasoning and answer from theSmartDatalake
instance. Also, verify that these properties are used appropriately within theSmartDataframe
class.pandasai/helpers/code_manager.py (8)
9-9: The
SkillsManager
class is imported but not used in this hunk. Ensure that it is used in the code sections not included in this hunk.28-50: The
CodeExecutionContext
class is introduced to provide additional context for code execution. It includes aprompt_id
and askills_manager
. The class is well-structured and follows best practices.208-217: The
execute_code
method now takes acontext
parameter of typeCodeExecutionContext
instead ofprompt_id
. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.234-240: The
file_name
foradd_save_chart
is now derived fromcontext.prompt_id
instead ofprompt_id
. Also, theused_skills
of theskills_manager
in the context is reset before the code is cleaned and executed. This is a good practice as it ensures that the skills used are tracked accurately for each execution context.242-242: The
_clean_code
method now takes acontext
parameter of typeCodeExecutionContext
. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.257-262: The environment for code execution now includes the skills used in the current context. This is a good practice as it allows the executed code to use the skills defined in the context.
411-411: The
_clean_code
method now takes acontext
parameter of typeCodeExecutionContext
. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.435-451: The
_clean_code
method now checks for used skills inside theanalyze_data
function. This is a good practice as it allows the method to accurately track the skills used in the code. However, theadd_used_skill
method could potentially add a function name that is not a skill. Consider checking if the function name is a skill before adding it toused_skills
.- context.skills_manager.add_used_skill(function_name) + if context.skills_manager.is_skill(function_name): + context.skills_manager.add_used_skill(function_name)
* Added support for sqlite connectors * Removed linting error * Added tests for sqlite connector * chore: style the code --------- Co-authored-by: Gabriele Venturi <[email protected]>
… returning strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (6)
- docs/connectors.md (1 hunks)
- examples/from_sql.py (2 hunks)
- pandasai/connectors/init.py (2 hunks)
- pandasai/connectors/base.py (1 hunks)
- pandasai/connectors/sql.py (2 hunks)
- tests/connectors/test_sqlite.py (1 hunks)
Files skipped from review due to trivial changes (2)
- pandasai/connectors/init.py
- tests/connectors/test_sqlite.py
Additional comments: 5
pandasai/connectors/base.py (1)
- 42-49: The new
SqliteConnectorConfig
class extendsSQLBaseConnectorConfig
and adds two new attributes:table
anddatabase
. This class will be used to configure the connection to SQLite databases. Ensure that thetable
anddatabase
attributes are correctly used in theSqliteConnector
class and that the user is informed about these new configuration parameters in the documentation.examples/from_sql.py (2)
5-5: The import of
SqliteConnector
is added to the existing imports. Ensure that theSqliteConnector
class is implemented correctly and is compatible with the existing codebase.51-53: The
invoice_connector
is added to theSmartDatalake
instance. Ensure that theSmartDatalake
class can handle multiple connectors and that theSqliteConnector
is compatible with it.- "database": "local_path_to_db", + "database": os.getenv("SQLITE_DB_PATH", "local_path_to_db"),Committable suggestion (Beta)
# With a Sqlite database invoice_connector = SqliteConnector( config={ "database": os.getenv("SQLITE_DB_PATH", "local_path_to_db"), "table": "invoices", "where": [["status", "=", "pending"]], } )
pandasai/connectors/sql.py (2)
8-8: The import statement has been updated to include
SqliteConnectorConfig
. This is a new configuration class for theSqliteConnector
class. Ensure that this class is defined and correctly handles the configuration for SQLite connections.367-450: A new
SqliteConnector
class has been added. This class inherits from theSQLConnector
class and is used to connect to SQLite databases. The class includes methods for initializing the connector, loading the configuration, initializing the database connection, closing the connection, getting the head of the data source, and returning the string representation of the connector.The
__init__
method has been updated to set the dialect to "sqlite" and populate the configuration from environment variables if the configuration is a dictionary. The_load_connector_config
method uses theSqliteConnectorConfig
class to load the configuration. The_init_connection
method creates a SQLAlchemy engine and connection for the SQLite database. Thehead
method runs a SQL query to get the column names and 5 random rows from the data source. The__repr__
method returns a string representation of the connector.Ensure that the
SqliteConnectorConfig
class correctly handles the configuration for SQLite connections and that theSqliteConnector
class correctly connects to SQLite databases and performs the necessary operations.
# With a Sqlite databse | ||
|
||
invoice_connector = SqliteConnector( | ||
config={ | ||
"database": "local_path_to_db", | ||
"table": "invoices", | ||
"where": [["status", "=", "pending"]], | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new SqliteConnector
instance invoice_connector
is created. The database path is hardcoded as "local_path_to_db". This might not be the correct path for all environments. Consider making this a configurable parameter.
### Sqlite connector | ||
|
||
Similarly to the PostgreSQL and MySQL connectors, the Sqlite connector allows you to connect to a local Sqlite database file. It is designed to be easy to use, even if you are not familiar with Sqlite or with PandasAI. | ||
|
||
To use the Sqlite connector, you only need to import it into your Python code and pass it to a `SmartDataframe` or `SmartDatalake` object: | ||
|
||
```python | ||
from pandasai.connectors import SqliteConnector | ||
|
||
connector = SqliteConnector(config={ | ||
"database" : "PATH_TO_DB", | ||
"table" : "actor", | ||
"where" :[ | ||
["first_name","=","PENELOPE"] | ||
] | ||
}) | ||
|
||
df = SmartDataframe(connector) | ||
df.chat('How many records are there ?') | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new Sqlite connector is a great addition to the library. However, it's important to note that the "database" parameter in the configuration expects a file path to the SQLite database. This could potentially lead to file path issues if not handled correctly. Ensure that the file path is always valid and the file exists before trying to connect to the database. Also, consider adding error handling for invalid file paths or inaccessible files.
connector = SqliteConnector(config={
"database" : "PATH_TO_DB",
"table" : "actor",
"where" :[
["first_name","=","PENELOPE"]
]
})
Committable suggestion (Beta)
### Sqlite connector | |
Similarly to the PostgreSQL and MySQL connectors, the Sqlite connector allows you to connect to a local Sqlite database file. It is designed to be easy to use, even if you are not familiar with Sqlite or with PandasAI. | |
To use the Sqlite connector, you only need to import it into your Python code and pass it to a `SmartDataframe` or `SmartDatalake` object: | |
```python | |
from pandasai.connectors import SqliteConnector | |
connector = SqliteConnector(config={ | |
"database" : "PATH_TO_DB", | |
"table" : "actor", | |
"where" :[ | |
["first_name","=","PENELOPE"] | |
] | |
}) | |
df = SmartDataframe(connector) | |
df.chat('How many records are there ?') | |
``` | |
import os | |
database_path = "PATH_TO_DB" | |
if not os.path.exists(database_path): | |
raise FileNotFoundError(f"The database file {database_path} does not exist.") | |
connector = SqliteConnector(config={ | |
"database" : database_path, | |
"table" : "actor", | |
"where" :[ | |
["first_name","=","PENELOPE"] | |
] | |
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (2)
- pandasai/helpers/output_types/_output_types.py (1 hunks)
- tests/test_smartdataframe.py (5 hunks)
Files skipped from review due to trivial changes (1)
- pandasai/helpers/output_types/_output_types.py
Additional comments: 3
tests/test_smartdataframe.py (3)
228-233: The instructions for the
analyze_data
function have been updated to specify that if the user asks to plot a chart, it should be saved as an image intemp_chart.png
and not displayed. This is a good change as it ensures that the function behaves consistently across different environments (e.g., a headless server where displaying a chart might not be possible). However, it would be helpful to specify wheretemp_chart.png
should be saved (e.g., in the current directory, in a specific directory, etc.).243-246: The instructions have been updated to specify that the "technical" names of the tables should not be written in the answer. This is a good change as it encourages the use of user-friendly names in the output, which can improve the user experience. However, it would be helpful to provide some examples of what constitutes a "technical" name and what a suitable replacement might be.
295-298: The instructions have been updated to specify that the updated
analyze_data
function should be returned wrapped withinpython
. This is a good change as it ensures that the returned function is properly formatted as Python code. However, it would be helpful to provide an example of what this should look like.
@@ -215,7 +215,7 @@ def test_run_with_privacy_enforcement(self, llm): | |||
User: How many countries are in the dataframe? | |||
</conversation> | |||
|
|||
This is the initial python function. Do not change the params. | |||
This is the initial python function. Do not change the params. Given the context, use the right dataframes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "This is the initial python function. Do not change the params. Given the context, use the right dataframes." is a bit vague. It would be helpful to specify which function this comment is referring to and what parameters should not be changed. Also, it would be beneficial to clarify what is meant by "use the right dataframes" - does this mean to select the appropriate dataframe from a set of dataframes based on the context? If so, what criteria should be used to select the appropriate dataframe?
@@ -275,7 +276,7 @@ def test_run_passing_output_type(self, llm, output_type, output_type_hint): | |||
User: How many countries are in the dataframe? | |||
</conversation> | |||
|
|||
This is the initial python function. Do not change the params. | |||
This is the initial python function. Do not change the params. Given the context, use the right dataframes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment "This is the initial python function. Do not change the params. Given the context, use the right dataframes." is repeated here. As mentioned earlier, it would be helpful to clarify this comment.
3. Analyze: Conducting the actual analysis (if the user asks to plot a chart you must save it as an image in temp_chart.png and not show the chart.) | ||
At the end, return a dictionary of: | ||
{output_type_hint} | ||
""" | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instructions for the analyze_data
function are repeated here. It would be helpful to avoid repetition and only include these instructions once, perhaps at the start of the test suite.
Summary by CodeRabbit
New Features:
SqliteConnector
to connect to SQLite databases, expanding the range of supported data sources.Improvements:
QueryExecTracker
class.Bug Fixes:
CodeManager
class.Tests:
Documentation: