Note
A stable release (v1.2.0) will be made available on Friday, March 8, 2024. Please check back here for updates.
This toolkit provides a straightforward interface to Google's Gemini Pro 1.0 and upcoming 1.5 AI models. It supports text generation, image captioning and analysis, and multi-turn chat (chatbot) functionality by abstracting complex API calls into simpler, more accessible commands. It is especially useful for everyday users, developers, and researchers who want to incorporate advanced AI capabilities into their projects without delving into the intricacies of direct API communication. It offers access to the full suite of Google's Gemini Pro models and, soon, the Gemini Ultra large language models.
- Chat Functionality: Chat with Gemini's advanced conversational models.
- Image Captioning: Analyze images and generate descriptive captions or insights.
- Text Generation: Generate creative and contextually relevant text based on prompts.
- Command-Line Interface (CLI): Access Gemini AI functionalities directly from the command line for quick integrations and testing.
- Python Wrapper: Enables seamless interaction with the full suite of Gemini models offered by Google using just two lines of code.
- Python 3.x
- An API key from Google AI Studio
The following Python packages are required:
- requests: For making HTTP requests to Google's Gemini API.
The following Python packages are optional:
- python-dotenv: For managing API keys and other environment variables.
Follow these steps to set up the Gemini AI Wrapper and CLI on your system:
Clone the repository:
git clone https://github.com/RMNCLDYO/Gemini-AI-Wrapper-and-CLI.git
cd Gemini-AI-Wrapper-and-CLI
Install required Python packages:
pip install -r requirements.txt
- To use the Gemini API, you'll need an API key. If you don't already have one, create a key in Google AI Studio.
- Once you have your API key, create a new file named .env in the root directory (main folder), or rename the example.env file in the root directory of this project to .env.
- Add your API key to the .env file as follows: API_KEY=your_api_key_here
- The program will automatically load and use your API key when chatting with the language model.
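To illustrate the idea of loading a key from a .env file, here is a minimal standard-library sketch. It is not the toolkit's actual loader (which may rely on python-dotenv); the helper names `read_env_file` and `get_api_key` are hypothetical, and only the `API_KEY` variable name comes from the steps above.

```python
import os


def read_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env-style file into a dict."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without a KEY=value split.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Return API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("API_KEY") or read_env_file(path).get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY not found in the environment or in " + path)
    return key
```

Checking the process environment first means a key exported in the shell takes precedence over the file, which is a design choice of this sketch rather than documented behavior of the toolkit.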
python gemini_cli.py chat
python gemini_cli.py text "Your text prompt here"
python gemini_cli.py vision path/to/your/image.jpg "Vision prompt"
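For readers curious how a subcommand interface like the three commands above could be wired up, the following is a rough sketch using the standard `argparse` module. It is an assumption about structure, not the actual contents of gemini_cli.py; `build_parser` and `describe` are hypothetical names, and `describe` merely stands in for the real dispatch to the chat, text, and vision handlers.

```python
import argparse


def build_parser():
    """Build a parser with chat, text, and vision subcommands."""
    parser = argparse.ArgumentParser(prog="gemini_cli.py")
    subparsers = parser.add_subparsers(dest="mode", required=True)

    # chat takes no extra arguments; the session is interactive.
    subparsers.add_parser("chat", help="Start a multi-turn chat session")

    text_parser = subparsers.add_parser("text", help="Generate text from a prompt")
    text_parser.add_argument("prompt")

    vision_parser = subparsers.add_parser("vision", help="Ask about an image")
    vision_parser.add_argument("image")
    vision_parser.add_argument("prompt")
    return parser


def describe(args):
    """Return a label for the selected mode (placeholder for real dispatch)."""
    if args.mode == "chat":
        return "chat"
    if args.mode == "text":
        return f"text: {args.prompt}"
    return f"vision: {args.image} | {args.prompt}"
```

With this layout, `build_parser().parse_args(["vision", "path/to/your/image.jpg", "Vision prompt"])` yields a namespace whose `mode`, `image`, and `prompt` attributes mirror the positional arguments of the vision command.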
from gemini_chat import ChatAPI
ChatAPI().chat()
from gemini_text import TextAPI
TextAPI().text("Your text prompt here")
from gemini_vision import VisionAPI
VisionAPI().vision("path/to/your/image.jpg", "Vision prompt")
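To give a sense of what a wrapper call such as `TextAPI().text(...)` abstracts away, the sketch below builds the kind of request a direct call to the Gemini REST API would need. The endpoint and payload shape follow Google's public `generateContent` method (v1beta) as I understand it; whether the toolkit uses exactly this URL, API version, or model name internally is an assumption, and `build_text_request` is a hypothetical helper.

```python
import json

# Assumed endpoint, based on Google's public Generative Language REST API (v1beta).
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_text_request(prompt, api_key, model="gemini-pro"):
    """Return the URL and JSON body for a single-turn text generation request."""
    url = f"{BASE_URL}/{model}:generateContent?key={api_key}"
    # A request carries a list of contents, each made of one or more parts.
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(payload)
```

Sending it would then be a matter of something like `requests.post(url, data=body, headers={"Content-Type": "application/json"})`, followed by reading the generated text out of the JSON response.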
- Images must be in one of the following image data MIME types:
  - PNG - image/png
  - JPEG - image/jpeg
  - WEBP - image/webp
  - HEIC - image/heic
  - HEIF - image/heif
- Maximum of 16 individual images.
- Maximum of 4MB of data, including images and text.
- No specific limits to the number of pixels in an image; however, larger images are scaled down to fit a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.
Prompts with a single image tend to yield better results.
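The limits above can be checked before a request is sent. The following is a minimal sketch, not part of the toolkit: `check_request` and `scaled_size` are hypothetical helpers, "4MB" is read as 4 MiB of raw bytes (the API may count encoded data differently), and the rounding used for scaled dimensions is an assumption.

```python
import os

# File extensions mapped to the supported MIME types listed above.
ALLOWED_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".heic": "image/heic",
    ".heif": "image/heif",
}
MAX_IMAGES = 16
MAX_TOTAL_BYTES = 4 * 1024 * 1024  # assumption: "4MB" interpreted as 4 MiB
MAX_SIDE = 3072


def check_request(images, prompt):
    """Validate (filename, size_in_bytes) pairs plus a prompt; return problems."""
    problems = []
    if len(images) > MAX_IMAGES:
        problems.append(f"too many images: {len(images)} > {MAX_IMAGES}")
    total = len(prompt.encode("utf-8"))
    for name, size in images:
        extension = os.path.splitext(name)[1].lower()
        if extension not in ALLOWED_MIME_TYPES:
            problems.append(f"unsupported image type: {name}")
        total += size
    if total > MAX_TOTAL_BYTES:
        problems.append(f"request too large: {total} bytes")
    return problems


def scaled_size(width, height, max_side=MAX_SIDE):
    """Estimate dimensions after fitting within max_side x max_side."""
    # Never upscale: the factor is capped at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    return round(width * scale), round(height * scale)
```

For example, a 6144 x 3072 image would be halved to 3072 x 1536 under this estimate, while an 800 x 600 image would be left unchanged.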
Contributions are welcome!
Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.
Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:
- Check if the issue has already been reported.
- Use the Bug Report template to create a detailed report.
- Submit the report here.
Your report will help us make the project better for everyone.
Got an idea for a new feature? Feel free to suggest it. Here's how:
- Check if the feature has already been suggested or implemented.
- Use the Feature Request template to create a detailed request.
- Submit the request here.
Your suggestions for improvements are always welcome.
Stay up-to-date with the latest changes and improvements in each version:
- CHANGELOG.md provides detailed descriptions of each release.
Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed.
Licensed under the MIT License. See LICENSE for details.