feat: add GitExtractor component #5459

raphaelchristi
Contributor

@raphaelchristi raphaelchristi commented Dec 26, 2024

This pull request introduces a new component called GitExtractorComponent to the langflow project, for analyzing Git repositories.

Features:

  • Repository Info: Extracts branch, remotes, and commit details
  • Statistics: Calculates file counts, sizes, and line numbers
  • Directory Structure: Generates complete folder tree
  • File Content: Extracts text files with binary handling
  • Memory Safe: Implements content truncation for large repos
  • Error Handling: Graceful error recovery and resource cleanup
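To illustrate the "binary handling" and "content truncation" items, here is a minimal sketch of how such a reader could work. It is not the code from this PR: the function name `read_text_preview`, the null-byte heuristic, the placeholder strings, and the `max_bytes` default are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical placeholder strings, chosen for this sketch only.
BINARY_PLACEHOLDER = "(binary file)"
TRUNCATION_NOTE = "\n... (truncated)"


def read_text_preview(path: Path, max_bytes: int = 100_000) -> str:
    """Return at most max_bytes of a file as text, or a placeholder for binary data."""
    # Read one byte past the limit so we can tell whether the file was cut off,
    # without ever loading a very large file fully into memory.
    with path.open("rb") as handle:
        data = handle.read(max_bytes + 1)

    # Heuristic (an assumption): a NUL byte in the sampled data means "binary".
    if b"\x00" in data:
        return BINARY_PLACEHOLDER

    truncated = len(data) > max_bytes
    # errors="replace" keeps decoding from failing if the cut lands mid-character.
    text = data[:max_bytes].decode("utf-8", errors="replace")
    return text + TRUNCATION_NOTE if truncated else text
```

Bounding the read, rather than truncating after a full read, is what keeps memory use predictable on large repositories.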

The GitExtractorComponent is designed to analyze a Git repository and provide various information and file contents. Additionally, the __init__.py file has been updated to include this new component.

New component addition:

  • src/backend/base/langflow/components/git/gitextractor.py: Introduced the GitExtractorComponent class, which includes methods for retrieving repository information, statistics, directory structure, file contents, and text-based file contents. This component uses asynchronous context management for temporary Git repository cloning and handles potential GitError exceptions.
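
As a rough picture of what "statistics" and "directory structure" can mean for a cloned checkout, the sketch below walks a directory with the standard library only. The names `directory_stats` and `directory_tree`, the choice to skip `.git`, and the output shapes are assumptions for illustration, not the component's actual methods.

```python
import os
from pathlib import Path


def directory_stats(root: Path) -> dict[str, int]:
    """Count files, total bytes, and text lines under root, skipping .git."""
    stats = {"files": 0, "bytes": 0, "lines": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into .git.
        dirnames[:] = [name for name in dirnames if name != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            stats["files"] += 1
            stats["bytes"] += path.stat().st_size
            try:
                stats["lines"] += len(path.read_text(encoding="utf-8").splitlines())
            except UnicodeDecodeError:
                pass  # Not valid UTF-8: counted as a file, but not for lines.
    return stats


def directory_tree(root: Path, indent: str = "") -> list[str]:
    """Return an indented listing of root, directories first, skipping .git."""
    lines: list[str] = []
    for entry in sorted(root.iterdir(), key=lambda p: (p.is_file(), p.name)):
        if entry.name == ".git":
            continue
        lines.append(f"{indent}{entry.name}{'/' if entry.is_dir() else ''}")
        if entry.is_dir():
            lines.extend(directory_tree(entry, indent + "    "))
    return lines
```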

Initialization update:

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Dec 26, 2024
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 26, 2024
@raphaelchristi raphaelchristi force-pushed the feature/git-extractor-component branch from f2d65d0 to 68737d2 Compare December 26, 2024 18:11
Contributor

@ogabrielluiz ogabrielluiz left a comment

Hey, @raphaelchristi

This is looking good.

The tmpdir calls are blocking and it would be better if they were async. Could you refactor that?

Also, you don't need to delete the folder manually: a context manager will remove it as soon as the code block exits.
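
For context, the cleanup half of this suggestion can be sketched with the standard library's `tempfile.TemporaryDirectory`. This only shows the synchronous pattern, not the async refactor, and the helper name `count_files_in_scratch_dir` is hypothetical.

```python
import tempfile
from pathlib import Path


def count_files_in_scratch_dir(files: dict[str, bytes]) -> tuple[int, Path]:
    """Write files into a temporary directory, count them, and return the count and path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, payload in files.items():
            (root / name).write_bytes(payload)
        count = sum(1 for entry in root.iterdir() if entry.is_file())
    # Leaving the with block has already deleted the directory,
    # so no explicit shutil.rmtree call is needed here.
    return count, root
```

The returned path no longer exists by the time the caller sees it; the directory is also removed if the body raises.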

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Dec 28, 2024
@raphaelchristi
Contributor Author

Hi @ogabrielluiz,

Thank you for the review! I've implemented the suggested changes:

  • Converted all methods to async using async/await
  • Added asynccontextmanager for the tmpdir operations
  • Implemented automatic cleanup using the context manager

Let me know if you'd like me to make any additional adjustments to the implementation.
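
A minimal sketch of the pattern described in this comment, assuming Python 3.9+ for `asyncio.to_thread`. It is not the merged implementation: `temp_workdir` and `demo` are hypothetical names, and offloading the blocking calls to a worker thread is just one way to keep them off the event loop.

```python
import asyncio
import shutil
import tempfile
from contextlib import asynccontextmanager
from pathlib import Path


@asynccontextmanager
async def temp_workdir(prefix: str = "git_extract_"):
    """Yield a temporary directory and always remove it afterwards."""
    # mkdtemp touches the filesystem, so run it in a worker thread.
    path = Path(await asyncio.to_thread(tempfile.mkdtemp, prefix=prefix))
    try:
        yield path
    finally:
        # Cleanup lives in one place and runs even if the body raises.
        await asyncio.to_thread(shutil.rmtree, path, ignore_errors=True)


async def demo() -> tuple[bool, bool]:
    """Report whether the directory existed inside and after the block."""
    async with temp_workdir() as workdir:
        (workdir / "README.md").write_text("hello", encoding="utf-8")
        existed_inside = workdir.exists()
    return existed_inside, workdir.exists()
```

Callers only write `async with temp_workdir() as workdir:` and never handle deletion themselves.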

Collaborator

@edwinjosechittilappilly edwinjosechittilappilly left a comment

LGTM

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 16, 2025
@edwinjosechittilappilly
Collaborator

Good work, @raphaelchristi! Please follow up if the CI tests fail.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 16, 2025
… cleanup

- Convert methods to async using async/await
- Add asynccontextmanager for automatic tmpdir cleanup
- Remove manual shutil.rmtree calls
@ogabrielluiz ogabrielluiz force-pushed the feature/git-extractor-component branch from 3583f2a to 21f978b Compare January 16, 2025 18:25
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 16, 2025
@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Jan 17, 2025
Merged via the queue into langflow-ai:main with commit ed2a761 Jan 17, 2025
34 of 35 checks passed
Labels
enhancement New feature or request lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
3 participants