feat: add GitFile Component #5458

Open
wants to merge 20 commits into
base: main
from

Conversation

raphaelchristi
Contributor

This pull request introduces a new GitFileComponent that allows users to analyze Git repositories and retrieve the content of selected files from specified branches. The component provides the following features:

  • Repository URL input for connecting to Git repositories
  • Branch selection dropdown that dynamically fetches available branches
  • File selection interface to choose specific files from the repository
  • File content retrieval functionality with binary file detection
  • Error handling for various Git operations
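
For readers unfamiliar with binary file detection, here is a minimal sketch of one common heuristic. It is not taken from this PR: the function name `is_binary_file` and the `sample_size` default are assumptions for illustration. It treats a file as binary when the first bytes contain a NUL byte or are not valid UTF-8.

```python
import codecs
from pathlib import Path


def is_binary_file(path: Path, sample_size: int = 8192) -> bool:
    """Heuristic check (hypothetical helper, not the PR's implementation)."""
    with path.open("rb") as fh:
        chunk = fh.read(sample_size)
    # NUL bytes almost never appear in text files.
    if b"\x00" in chunk:
        return True
    # An incremental decoder tolerates a multi-byte character that is cut
    # off at the end of the sample, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

Sampling only the first few kilobytes keeps the check cheap for large files, at the cost of missing binary content that starts later in the file.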

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Dec 26, 2024
@raphaelchristi raphaelchristi force-pushed the feature/git-file-component branch from 744298e to 57f6ce0 Compare December 26, 2024 17:46
@raphaelchristi raphaelchristi changed the title feat: add GitFileComponent feat: add GitFile Component Dec 26, 2024
Contributor

@ogabrielluiz ogabrielluiz left a comment

Please take a look at the other Git component to check an example on how to use anyio.Path

Contributor

@ogabrielluiz ogabrielluiz left a comment

Thanks for this, @raphaelchristi !

Comment on lines 47 to 54

    @asynccontextmanager
    async def temp_git_repo(self):
        """Async context manager for temporary git repository cloning."""
        temp_dir = tempfile.mkdtemp()
        try:
            yield temp_dir
        finally:
            await asyncio.get_event_loop().run_in_executor(None, lambda: shutil.rmtree(temp_dir, ignore_errors=True))

Contributor

Suggested change

-   @asynccontextmanager
-   async def temp_git_repo(self):
-       """Async context manager for temporary git repository cloning."""
-       temp_dir = tempfile.mkdtemp()
-       try:
-           yield temp_dir
-       finally:
-           await asyncio.get_event_loop().run_in_executor(None, lambda: shutil.rmtree(temp_dir, ignore_errors=True))
+   @asynccontextmanager
+   async def temp_git_repo(self):
+       """Context manager for handling temporary clone directory."""
+       temp_dir = None
+       try:
+           temp_dir = tempfile.mkdtemp(prefix="langflow_clone_")
+           yield temp_dir
+       finally:
+           if temp_dir:
+               await anyio.Path(temp_dir).rmdir()
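
One caveat worth illustrating: `rmdir()` follows `pathlib` semantics and removes only empty directories, so a directory that holds a cloned repository would likely need a recursive delete. The following is a minimal sketch under that assumption, not the code that was merged. The name `temp_clone_dir` and the `demo` coroutine are hypothetical, and it uses `asyncio.to_thread` with `shutil.rmtree` rather than `anyio`.

```python
import asyncio
import shutil
import tempfile
from contextlib import asynccontextmanager
from pathlib import Path


@asynccontextmanager
async def temp_clone_dir(prefix: str = "langflow_clone_"):
    """Yield a temporary directory and remove it recursively on exit."""
    # mkdtemp touches the filesystem, so run it off the event loop.
    temp_dir = await asyncio.to_thread(tempfile.mkdtemp, prefix=prefix)
    try:
        yield temp_dir
    finally:
        # rmtree deletes the directory even when it still contains files.
        await asyncio.to_thread(shutil.rmtree, temp_dir, ignore_errors=True)


async def demo() -> bool:
    """Create a file inside the directory, then report whether it survives."""
    async with temp_clone_dir() as path:
        created = Path(path)
        (created / "README.md").write_text("placeholder", encoding="utf-8")
    return created.exists()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The `demo` coroutine leaves a file in the directory on purpose, which is the situation after a real clone.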

Comment on lines 116 to 134

    await asyncio.get_event_loop().run_in_executor(
        None,
        lambda: git.Repo.clone_from(
            self.repository_url, temp_dir, branch=self.branch, depth=1, single_branch=True
        ),
    )

Contributor

This will not work. Use this instead:

Suggested change

-   await asyncio.get_event_loop().run_in_executor(
-       None,
-       lambda: git.Repo.clone_from(
-           self.repository_url, temp_dir, branch=self.branch, depth=1, single_branch=True
-       ),
-   )
+   await asyncio.to_thread(
+       git.Repo.clone_from,
+       self.repository_url,
+       temp_dir,
+       branch=self.branch,
+       depth=1,
+       single_branch=True,
+   )
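
The suggestion relies on `asyncio.to_thread` forwarding both positional and keyword arguments to the blocking callable, so no `lambda` wrapper is needed. A small runnable sketch of that pattern follows. `blocking_clone` is a hypothetical stand-in for `git.Repo.clone_from` that sleeps instead of touching the network, and the URL and destination path are placeholders.

```python
import asyncio
import time


def blocking_clone(url: str, dest: str, *, branch: str, depth: int = 1) -> str:
    """Hypothetical stand-in for a blocking clone call."""
    time.sleep(0.05)  # simulate slow I/O
    return f"{url}@{branch} -> {dest} (depth={depth})"


async def main() -> str:
    # to_thread runs the callable in a worker thread and passes the
    # remaining positional and keyword arguments straight through.
    return await asyncio.to_thread(
        blocking_clone,
        "https://example.com/repo.git",
        "/tmp/clone",
        branch="main",
        depth=1,
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```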

Comment on lines 146 to 151

    if not self.repository_url:
        return [Data(data={"error": "Please enter a repository URL"})]
    if not self.branch or self.branch == "Enter repository URL first":
        return [Data(data={"error": "Please select a branch"})]
    if not self.selected_files:
        return [Data(data={"error": "Please select at least one file"})]

Contributor

These should be exceptions.
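
A rough sketch of what raising instead of returning error payloads could look like. The follow-up commit mentions a `GitFileError`, but its definition is not shown in this conversation, so the class below and the `validate_inputs` helper are assumptions for illustration only.

```python
class GitFileError(ValueError):
    """Hypothetical error type; the PR's actual definition is not shown here."""


def validate_inputs(repository_url: str, branch: str, selected_files: list) -> None:
    """Raise instead of returning an error payload (illustrative helper)."""
    if not repository_url:
        raise GitFileError("Please enter a repository URL")
    # The placeholder string is the dropdown's initial value in the quoted code.
    if not branch or branch == "Enter repository URL first":
        raise GitFileError("Please select a branch")
    if not selected_files:
        raise GitFileError("Please select at least one file")
```

Raising lets the caller decide how to surface the failure, and keeps error cases out of the normal return type.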

Comment on lines 155 to 177

    await asyncio.get_event_loop().run_in_executor(
        None,
        lambda: git.Repo.clone_from(
            self.repository_url, temp_dir, branch=self.branch, depth=1, single_branch=True
        ),
    )

Contributor

Never use asyncio.get_event_loop(): if there is no running event loop, it may raise an error. Always use asyncio.to_thread instead.

raphaelchristi added a commit to raphaelchristi/langflow that referenced this pull request Jan 3, 2025
- Replace asyncio.get_event_loop() with asyncio.to_thread()
- Add proper error handling with GitFileError
- Improve temporary directory cleanup
- Make required fields explicit
- Add type hints and docstrings
- Enhance error messages and logging

Fixes langflow-ai#5458
@ogabrielluiz ogabrielluiz force-pushed the feature/git-file-component branch from ddd877e to c3b8418 Compare January 10, 2025 15:05
…agement

- Introduced `anyio` for improved async file operations.
- Updated temporary directory creation and cleanup logic for better error handling and efficiency.
- Added a test case handling mechanism in the `__call__` method.
- Improved binary file detection and content reading using `aiofiles`.
- Enhanced error logging during temporary directory cleanup.

These changes aim to streamline the GitFileComponent's functionality and improve overall performance.

- Introduced comprehensive unit tests for the GitFileComponent to validate its behavior.
- Added tests for handling missing repository URL, branch, and selected files.
- Implemented tests for binary and text file detection.
- Created tests for retrieving branches and successfully getting file contents.
- Utilized mocking to simulate Git operations and file handling.

These tests aim to ensure the reliability and correctness of the GitFileComponent's functionality.
@ogabrielluiz ogabrielluiz requested review from erichare and removed request for ogabrielluiz January 10, 2025 15:51
@edwinjosechittilappilly
Collaborator

Experiencing issues when loading a Git repository. I also suggest adding real_time_refresh=True, so that as soon as the Git URL is updated, the branch and the subsequent fields are refreshed automatically instead of having to refresh each field manually.

Issue Faced:
(image: Screenshot 2025-01-16 at 12 52 21 PM)

Collaborator

@edwinjosechittilappilly edwinjosechittilappilly left a comment

Use real_time_refresh so that dependent fields refresh as soon as the Git URL changes.

Also use update_build_config to update the fields' data, instead of relying on multiple refresh buttons.
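
To make the request concrete, here is a standalone sketch of the idea. It does not import Langflow: the plain-dict `build_config`, the field names `repository_url`, `branch` and `selected_files`, and the injected `list_branches` callable are all assumptions for illustration, loosely modeled on the hook named above. The real component would fetch branches from the repository itself.

```python
from __future__ import annotations

from typing import Any, Callable


def update_build_config(
    build_config: dict[str, Any],
    field_value: Any,
    field_name: str | None = None,
    *,
    list_branches: Callable[[str], list[str]],
) -> dict[str, Any]:
    """Refresh dependent fields when the repository URL changes (illustrative)."""
    if field_name == "repository_url":
        branches = list_branches(field_value) if field_value else []
        build_config["branch"]["options"] = branches
        build_config["branch"]["value"] = branches[0] if branches else ""
        # The file list depends on the branch, so reset it as well.
        build_config["selected_files"]["options"] = []
        build_config["selected_files"]["value"] = []
    return build_config
```

With the URL field marked for real-time refresh, a single hook like this could replace the per-field refresh buttons by updating every dependent field in one pass.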

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 18, 2025

codspeed-hq bot commented Jan 18, 2025

CodSpeed Performance Report

Merging #5458 will improve performance by 13.74%

Comparing raphaelchristi:feature/git-file-component (61d4638) with main (d8eabc7)

Summary

⚡ 1 improvement
✅ 14 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | raphaelchristi:feature/git-file-component | Change |
| --- | --- | --- | --- |
| test_successful_run_with_input_type_any | 297.7 ms | 261.8 ms | +13.74% |

Labels
enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.
4 participants