Feature Additions and Improvements: Goodreads Scraper #43
Functions added:
- Custom Scrape: Implemented custom scraping for specific Goodreads lists.
- Added arguments for easy script use. See ReadMe.md for detailed information on available arguments.
- Implemented functionality to export book IDs from the database to a text format.

The new features enhance the flexibility and usability of the script, allowing users to specify custom scraping parameters and export data in a more accessible format.
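For readers who want a feel for the export step, here is a minimal sketch of writing book IDs from a SQLite database to a text file, one ID per line. It is not the code from this pull request: the function name `export_book_ids`, the table name `books`, and the column name `book_id` are all assumptions made for illustration.

```python
import sqlite3
from pathlib import Path


def export_book_ids(db_path, out_path, table="books", column="book_id"):
    """Write one book ID per line; table and column names are hypothetical."""
    # Identifiers cannot be bound as SQL parameters, so they are only
    # interpolated after a basic sanity check.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("unexpected table or column name")

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            f"SELECT DISTINCT {column} FROM {table} ORDER BY {column}"
        ).fetchall()
    finally:
        conn.close()

    ids = [str(row[0]) for row in rows]
    # End the file with a newline only when there is something to write.
    text = "\n".join(ids) + ("\n" if ids else "")
    Path(out_path).write_text(text, encoding="utf-8")
    return len(ids)
```

Writing one ID per line keeps the output easy to feed back into a script that reads book IDs from a text file.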
Modified the file to fix non-working functions and added new functionalities such as format info and publication info.
…extra library as it's a standard library module in Python
Hi there! This looks wonderful; thank you for all your work! I haven't had time to test yet but will try to do this ASAP. If we integrate your changes, would you like to be credited in the README? If so, how would you like to be credited (username, name, something else)? These are significant changes that we haven't had time to make ourselves, and I want to make sure you get the credit you deserve.
Hi! If this goes well, this will be my first contribution ^_^ I would love for my username (GrimmXoXo or GM) to appear on the contribution, but please make sure that this works well.
- Implemented get_reviews.py, a script that extracts reviews for given book IDs from Goodreads.
- The script reads input from a text file containing book IDs and outputs the reviews in a SQLite database.
- Added a log file for easier debugging of the script.
- Updated requirements.txt to include langdetect, which filters out non-English reviews.
- Included an example on how to run the script in the README.md file.
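The flow described above (book IDs from a text file, reviews into SQLite, failures logged) can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of get_reviews.py: the names `read_book_ids` and `store_reviews`, the `reviews` table layout, and the caller-supplied `fetch_reviews` and `keep` callables are all hypothetical, and no network access is shown.

```python
import logging
import sqlite3

logger = logging.getLogger("get_reviews_sketch")


def read_book_ids(path):
    """Return the non-empty, stripped lines of a text file of book IDs."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def store_reviews(db_path, book_ids, fetch_reviews, keep=lambda text: True):
    """Fetch reviews per book ID and insert the kept ones into SQLite.

    fetch_reviews(book_id) and keep(text) are hypothetical callables
    supplied by the caller; the table layout is also an assumption.
    """
    conn = sqlite3.connect(db_path)
    saved = 0
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS reviews (book_id TEXT, review TEXT)"
        )
        for book_id in book_ids:
            try:
                reviews = list(fetch_reviews(book_id))
            except Exception:
                # Log the failure and move on so one bad ID does not stop the run.
                logger.exception("failed to fetch reviews for %s", book_id)
                continue
            kept = [(book_id, text) for text in reviews if keep(text)]
            conn.executemany("INSERT INTO reviews VALUES (?, ?)", kept)
            saved += len(kept)
            logger.info(
                "book %s: stored %d of %d reviews", book_id, len(kept), len(reviews)
            )
        conn.commit()
    finally:
        conn.close()
    return saved
```

To send these messages to a log file, one option is to call `logging.basicConfig(filename="get_reviews.log", level=logging.INFO)` before running (the file name is an assumption). With langdetect installed, a language filter could be passed as `keep=lambda text: detect(text) == "en"` after `from langdetect import detect`; note that `detect` can raise an exception on empty or very short text, so a real script would likely guard that call.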
That would be all, I think. I added a new script to fetch reviews, a log file for reviews to help debug problems (logging can also be added to book_id and book_details), and a README inside the folder that shows how to use the new script with a working example.
Description
Hi there, I was working on a book recommendation system project and wanted to get data for my models. I came across your repository and liked the work, so I decided to improve upon it and use it in my project. If it helps, I would also like to contribute a bit towards Goodreads Scraper.
Changes Made
List out the key changes made in this pull request, including any new features added, bugs fixed, or improvements made.
get_books.py: modified to ensure compatibility with the current structure of the Goodreads website.
Files Modified
List the files modified/added in this pull request.
Checklist
Ensure that the following tasks have been completed:
Preview
This is the output for a particular Goodreads category/collection:
![database_scraper](https://private-user-images.githubusercontent.com/146922228/330418931-7051731f-1032-41f6-b819-83b411b8b6e4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1MjY0NjcsIm5iZiI6MTczOTUyNjE2NywicGF0aCI6Ii8xNDY5MjIyMjgvMzMwNDE4OTMxLTcwNTE3MzFmLTEwMzItNDFmNi1iODE5LTgzYjQxMWI4YjZlNC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNFQwOTQyNDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zOTYxNDc4YmE2NDZiMzJhNGRmZjJlYTM1NmYzYTNlNDZlNzJiZDc4OTIzY2FmMzMxN2Q3YmE3ZDc1ZmNiYzBlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.bOQeE-tA7JE7gTOOoxdgj3Uyy64yQHHQC_w4-pQENgE)
This is the output of the JSON files produced by get_books.py: