Commit 16426de

feat: add examples in javascript
1 parent 5f5ec1b commit 16426de

7 files changed

+956
-2
lines changed

scrapegraph-js/PAGINATION.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
# SmartScraper Pagination

This document describes the pagination functionality added to the ScrapeGraph JavaScript SDK.

## Overview

The `smartScraper` function now supports pagination, allowing you to scrape multiple pages of content in a single request. This is particularly useful for e-commerce sites, search results, news feeds, and other paginated content.

## Usage

### Basic Pagination

```javascript
import { smartScraper } from 'scrapegraph-js';

const apiKey = process.env.SGAI_APIKEY;
const url = 'https://example.com/products';
const prompt = 'Extract all product information';
const totalPages = 5; // Scrape 5 pages

const result = await smartScraper(apiKey, url, prompt, null, null, totalPages);
```

### Pagination with Schema

```javascript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';

// Shape of the data the scraper should return
const ProductSchema = z.object({
  products: z.array(z.object({
    name: z.string(),
    price: z.string(),
    rating: z.string().optional(),
  })),
});

// apiKey, url, and prompt are defined as in the Basic Pagination example
const result = await smartScraper(
  apiKey,
  url,
  prompt,
  ProductSchema,
  null, // numberOfScrolls (not used here)
  3     // totalPages
);
```

### Pagination with Scrolling

```javascript
const result = await smartScraper(
  apiKey,
  url,
  prompt,
  null,
  10, // 10 scrolls per page
  2   // 2 pages
);
```

### All Features Combined

```javascript
const result = await smartScraper(
  apiKey,
  url,
  prompt,
  ProductSchema,
  5, // numberOfScrolls
  3  // totalPages
);
```

## Function Signature

```javascript
smartScraper(apiKey, url, prompt, schema, numberOfScrolls, totalPages)
```

### Parameters

- `apiKey` (string): Your ScrapeGraph AI API key
- `url` (string): The URL of the webpage to scrape
- `prompt` (string): Natural language prompt describing what data to extract
- `schema` (Object, optional): Zod schema object defining the output structure
- `numberOfScrolls` (number, optional): Number of times to scroll the page (0-100)
- `totalPages` (number, optional): Number of pages to scrape (1-10)
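
Because the signature is positional, skipping an optional argument means passing `null` in its place. The sketch below shows one way to build the argument list from a named-options object. `toSmartScraperArgs` is a hypothetical helper written for illustration; it is not part of `scrapegraph-js`, and the API key shown is a placeholder.

```javascript
// Hypothetical helper (not part of scrapegraph-js): converts a named-options
// object into the positional argument list documented above.
function toSmartScraperArgs({
  apiKey,
  url,
  prompt,
  schema = null,
  numberOfScrolls = null,
  totalPages = null,
}) {
  return [apiKey, url, prompt, schema, numberOfScrolls, totalPages];
}

const args = toSmartScraperArgs({
  apiKey: 'your-api-key', // placeholder value
  url: 'https://example.com/products',
  prompt: 'Extract all product information',
  totalPages: 3,
});

// schema and numberOfScrolls fall back to null; totalPages is 3
console.log(args.slice(3));
```

With the real SDK function, the list could then be spread into the call as `await smartScraper(...args)`.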

### Parameter Validation

- `totalPages` must be an integer between 1 and 10
- `numberOfScrolls` must be an integer between 0 and 100
- Both parameters are optional and default to `null`
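
The rules above can be checked before a request is sent. The following is a minimal sketch of such a pre-flight check; `checkPaginationArgs` is a hypothetical name, and the SDK's own validation may differ in wording and behavior.

```javascript
// Hypothetical pre-flight check mirroring the documented ranges.
// Returns a list of problems; an empty list means the values look valid.
function checkPaginationArgs(numberOfScrolls = null, totalPages = null) {
  const problems = [];
  if (
    totalPages !== null &&
    (!Number.isInteger(totalPages) || totalPages < 1 || totalPages > 10)
  ) {
    problems.push('totalPages must be an integer between 1 and 10');
  }
  if (
    numberOfScrolls !== null &&
    (!Number.isInteger(numberOfScrolls) || numberOfScrolls < 0 || numberOfScrolls > 100)
  ) {
    problems.push('numberOfScrolls must be an integer between 0 and 100');
  }
  return problems;
}

console.log(checkPaginationArgs(null, 5));   // no problems
console.log(checkPaginationArgs(101, 0));    // both rules violated
console.log(checkPaginationArgs(2.5, null)); // numberOfScrolls is not an integer
```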

## Examples

### E-commerce Product Scraping

```javascript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';

const ProductSchema = z.object({
  products: z.array(z.object({
    name: z.string(),
    price: z.string(),
    rating: z.string().optional(),
    image_url: z.string().optional(),
  })),
});

const result = await smartScraper(
  process.env.SGAI_APIKEY,
  'https://www.amazon.com/s?k=laptops',
  'Extract all laptop products with name, price, rating, and image',
  ProductSchema,
  null,
  5 // Scrape 5 pages of results
);
```

### News Articles Scraping

```javascript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';

const NewsSchema = z.object({
  articles: z.array(z.object({
    title: z.string(),
    summary: z.string(),
    author: z.string().optional(),
    date: z.string().optional(),
  })),
});

const result = await smartScraper(
  process.env.SGAI_APIKEY,
  'https://news.example.com',
  'Extract all news articles with title, summary, author, and date',
  NewsSchema,
  3, // Scroll 3 times per page
  4  // Scrape 4 pages
);
```

## Error Handling

The function will throw an error if:
- `totalPages` is not an integer between 1 and 10
- `numberOfScrolls` is not an integer between 0 and 100
- The API key is invalid
- The network request fails

```javascript
try {
  const result = await smartScraper(apiKey, url, prompt, null, null, totalPages);
  console.log('Success:', result);
} catch (error) {
  // Validation errors for pagination mention the parameter name
  if (error.message.includes('totalPages')) {
    console.error('Pagination error:', error.message);
  } else {
    console.error('Other error:', error.message);
  }
}
```
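
The same message check can be extended to `numberOfScrolls` and factored into a small helper. This is a sketch only: `classifyScrapeError` is a hypothetical name, and it assumes, as the example above does, that validation messages mention the offending parameter by name. The sample messages are invented for the demonstration.

```javascript
// Hypothetical helper: sorts a thrown value into a coarse category.
// Assumption: validation messages contain the parameter name.
function classifyScrapeError(error) {
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('totalPages')) return 'pagination';
  if (message.includes('numberOfScrolls')) return 'scrolling';
  return 'other'; // invalid API key, network failure, and anything else
}

// Sample messages are made up for illustration
console.log(classifyScrapeError(new Error('totalPages must be an integer between 1 and 10')));
console.log(classifyScrapeError(new Error('numberOfScrolls must be an integer between 0 and 100')));
console.log(classifyScrapeError(new Error('socket hang up')));
```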

## Backward Compatibility

The pagination feature is fully backward compatible. All existing function calls will continue to work:

```javascript
// These all work as before
await smartScraper(apiKey, url, prompt);
await smartScraper(apiKey, url, prompt, schema);
await smartScraper(apiKey, url, prompt, schema, numberOfScrolls);
```

## Performance Considerations

- Pagination requests may take significantly longer than single-page requests
- Use small `totalPages` values while testing
- Some websites may not support pagination
- Rate limiting may apply to large pagination requests

## Testing

Run the pagination tests:

```bash
npm test
```

Or run specific examples:

```bash
node examples/smartScraper_pagination_example.js
node examples/smartScraper_pagination_enhanced_example.js
node examples/smartScraper_pagination_with_scroll_example.js
```

## Best Practices

1. **Start Small**: Begin with 1-2 pages for testing
2. **Use Schemas**: Define clear schemas for structured data extraction
3. **Error Handling**: Always wrap calls in try-catch blocks
4. **Rate Limiting**: Be mindful of API rate limits with large pagination requests
5. **Website Compatibility**: Not all websites support pagination, so test thoroughly
6. **Performance**: Monitor request times and adjust parameters accordingly

## Troubleshooting

### Common Issues

1. **Validation Error**: Ensure `totalPages` is an integer between 1 and 10 and `numberOfScrolls` is an integer between 0 and 100
2. **Timeout**: Try reducing `totalPages` or `numberOfScrolls`
3. **No Results**: Some websites may not support pagination
4. **Rate Limiting**: Reduce request frequency or pagination size
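
The advice to reduce `totalPages` after a timeout can be automated. The sketch below retries with half as many pages each time a request fails. `scrapeWithFallback` and `fakeScrape` are hypothetical names; `fakeScrape` stands in for the real request so the example runs offline. Note that this sketch retries on any error, so in practice you may want to rethrow validation or authentication errors immediately.

```javascript
// Hypothetical helper: retries a scrape with half as many pages after a failure.
// `scrape` is any async function that takes a page count. With the SDK it could be
//   (pages) => smartScraper(apiKey, url, prompt, null, null, pages)
async function scrapeWithFallback(scrape, totalPages) {
  let pages = totalPages;
  for (;;) {
    try {
      return { pages, result: await scrape(pages) };
    } catch (error) {
      if (pages === 1) throw error; // nothing smaller to try
      pages = Math.max(1, Math.floor(pages / 2));
    }
  }
}

// Stand-in scraper for demonstration: fails for more than 2 pages
const fakeScrape = async (pages) => {
  if (pages > 2) throw new Error('timeout');
  return `scraped ${pages} page(s)`;
};

// Tries 8, then 4, then succeeds with 2
scrapeWithFallback(fakeScrape, 8).then((outcome) => {
  console.log(outcome.pages, outcome.result);
});
```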

### Debug Tips

```javascript
console.log('Starting pagination request...');
console.log('URL:', url);
console.log('Total Pages:', totalPages);
console.log('Number of Scrolls:', numberOfScrolls);

const startTime = Date.now();
const result = await smartScraper(apiKey, url, prompt, schema, numberOfScrolls, totalPages);
const duration = Date.now() - startTime;

console.log('Request completed in:', duration, 'ms');
console.log('Result type:', typeof result);
```

## Support

For issues or questions about pagination functionality:

1. Check the examples in the `examples/` directory
2. Run the test suite with `npm test`
3. Review the error messages for specific guidance
4. Check the main SDK documentation

---

*This pagination feature is designed to work with the existing ScrapeGraph AI API and maintains full backward compatibility with existing code.*
