refactor!: Update the crawlers & storage clients structure (#828)
## Description

Update the directory structure of crawlers & storage clients, as discussed
earlier on Slack.

I decided to export nothing on the 2nd level because of the extras, and
because it would also be pretty huge (taking into account that we also have
models there).

E.g. for BS crawler:
```diff
- from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+ from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
```

Or for memory storage client:
```diff
- from crawlee.memory_storage_client import MemoryStorageClient
+ from crawlee.storage_clients import MemoryStorageClient
```

This should generally be more aligned with the concepts of Crawlee. It is,
of course, quite a breaking change, but better to do it now than later.

This will not be applied to the JS version, where sub-packages like
`PlaywrightCrawler` are their own packages.

## Issue

- Closes: #764

## Breaking changes

### Crawlers & CrawlingContexts

- All crawler and crawling context classes have been consolidated into a
single sub-package called `crawlers`.
- The affected classes include: `AbstractHttpCrawler`,
`AbstractHttpParser`, `BasicCrawler`, `BasicCrawlerOptions`,
`BasicCrawlingContext`, `BeautifulSoupCrawler`,
`BeautifulSoupCrawlingContext`, `BeautifulSoupParserType`,
`ContextPipeline`, `HttpCrawler`, `HttpCrawlerOptions`,
`HttpCrawlingContext`, `HttpCrawlingResult`,
`ParsedHttpCrawlingContext`, `ParselCrawler`, `ParselCrawlingContext`,
`PlaywrightCrawler`, `PlaywrightCrawlingContext`,
`PlaywrightPreNavCrawlingContext`.

Example update:
```diff
- from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+ from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
```
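
For code that has to run against both the old and the new layout during a transition, one option is a small fallback import helper. The sketch below is only an illustration and not part of this change; `import_first` is a hypothetical name, and it relies solely on the standard library's `importlib`.

```python
from importlib import import_module
from typing import Any


def import_first(name: str, module_paths: list[str]) -> Any:
    """Return attribute `name` from the first module in `module_paths` that provides it.

    Hypothetical helper: the module paths are tried in the given order, and a
    module that is missing or lacks the attribute is skipped.
    """
    for path in module_paths:
        try:
            return getattr(import_module(path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f'{name!r} not found in any of: {module_paths}')
```

With it, `import_first('BeautifulSoupCrawler', ['crawlee.crawlers', 'crawlee.beautifulsoup_crawler'])` would prefer the new location and fall back to the old one. A plain `try`/`except ImportError` around the two import statements achieves the same thing and keeps static type checkers happier.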

### Storage clients

- All storage client classes have been moved into a single sub-package
called `storage_clients`.
- The affected classes include: `MemoryStorageClient`,
`BaseStorageClient`.

Example update:
```diff
- from crawlee.memory_storage_client import MemoryStorageClient
+ from crawlee.storage_clients import MemoryStorageClient
```

### CurlImpersonateHttpClient

- The `CurlImpersonateHttpClient` changed its import location.

Example update:
```diff
- from crawlee.http_clients.curl_impersonate import CurlImpersonateHttpClient
+ from crawlee.http_clients import CurlImpersonateHttpClient
```
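
The import moves above are mechanical, so they could be applied to user code with a small rewriting script. The following is a minimal sketch, assuming only `from ... import ...` statements need updating; `MODULE_MOVES` and `rewrite_imports` are hypothetical names, and the mapping covers only the old module paths that appear in this commit, so it may be incomplete.

```python
import re

# Old module path -> new module path. Assumption: only the paths visible in
# this commit's examples and diffs are listed; other old modules may exist.
MODULE_MOVES = {
    'crawlee.basic_crawler': 'crawlee.crawlers',
    'crawlee.beautifulsoup_crawler': 'crawlee.crawlers',
    'crawlee.http_crawler': 'crawlee.crawlers',
    'crawlee.parsel_crawler': 'crawlee.crawlers',
    'crawlee.playwright_crawler': 'crawlee.crawlers',
    'crawlee.memory_storage_client': 'crawlee.storage_clients',
    'crawlee.http_clients.curl_impersonate': 'crawlee.http_clients',
}

# Matches the `from <old module> import` head of an import statement.
_FROM_IMPORT = re.compile(
    r'^([ \t]*from[ \t]+)('
    + '|'.join(re.escape(module) for module in MODULE_MOVES)
    + r')([ \t]+import\b)',
    flags=re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Return `source` with old crawlee module paths replaced by the new ones."""
    return _FROM_IMPORT.sub(
        lambda match: match.group(1) + MODULE_MOVES[match.group(2)] + match.group(3),
        source,
    )
```

The sketch only rewrites the module path; it does not merge two resulting `from crawlee.crawlers import ...` lines into one, and it leaves `import crawlee.x` style statements untouched.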
vdusek authored Dec 20, 2024
1 parent c58e973 commit 0ba04d1
Showing 175 changed files with 479 additions and 345 deletions.
2 changes: 1 addition & 1 deletion docs/deployment/code/apify/crawler_as_actor_example.py
@@ -1,6 +1,6 @@
 from apify import Actor

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/add_data_to_dataset_bs.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/add_data_to_dataset_pw.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


 async def main() -> None:
3 changes: 1 addition & 2 deletions docs/examples/code/beautifulsoup_crawler.py
@@ -1,8 +1,7 @@
 import asyncio
 from datetime import timedelta

-from crawlee.basic_crawler import BasicCrawlingContext
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BasicCrawlingContext, BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/beautifulsoup_crawler_stop.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/capture_screenshot_using_playwright.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 from crawlee.storages import KeyValueStore

2 changes: 1 addition & 1 deletion docs/examples/code/crawl_all_links_on_website_bs.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/crawl_all_links_on_website_pw.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/crawl_multiple_urls_bs.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/crawl_multiple_urls_pw.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/crawl_specific_links_on_website_bs.py
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import Glob
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/crawl_specific_links_on_website_pw.py
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import Glob
-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


 async def main() -> None:
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import EnqueueStrategy
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import EnqueueStrategy
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import EnqueueStrategy
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import EnqueueStrategy
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/export_entire_dataset_to_file_csv.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/export_entire_dataset_to_file_json.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/fill_and_submit_web_form_crawler.py
@@ -2,7 +2,7 @@
 from urllib.parse import urlencode

 from crawlee import Request
-from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
+from crawlee.crawlers import HttpCrawler, HttpCrawlingContext


 async def main() -> None:
3 changes: 1 addition & 2 deletions docs/examples/code/parsel_crawler.py
@@ -1,7 +1,6 @@
 import asyncio

-from crawlee.basic_crawler import BasicCrawlingContext
-from crawlee.parsel_crawler import ParselCrawler, ParselCrawlingContext
+from crawlee.crawlers import BasicCrawlingContext, ParselCrawler, ParselCrawlingContext

 # Regex for identifying email addresses on a webpage.
 EMAIL_REGEX = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
6 changes: 1 addition & 5 deletions docs/examples/code/playwright_crawler.py
@@ -1,10 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import (
-    PlaywrightCrawler,
-    PlaywrightCrawlingContext,
-    PlaywrightPreNavCrawlingContext,
-)
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext, PlaywrightPreNavCrawlingContext


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/examples/code/playwright_crawler_with_camoufox.py
@@ -5,7 +5,7 @@
 from typing_extensions import override

 from crawlee.browsers import BrowserPool, PlaywrightBrowserController, PlaywrightBrowserPlugin
-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


 class CamoufoxPlugin(PlaywrightBrowserPlugin):
4 changes: 2 additions & 2 deletions docs/guides/code/http_clients/curl_impersonate_example.py
@@ -1,7 +1,7 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
-from crawlee.http_clients.curl_impersonate import CurlImpersonateHttpClient
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.http_clients import CurlImpersonateHttpClient


 async def main() -> None:
2 changes: 1 addition & 1 deletion docs/guides/code/http_clients/httpx_example.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 from crawlee.http_clients import HttpxHttpClient

2 changes: 1 addition & 1 deletion docs/guides/code/proxy_management/inspecting_bs_example.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 from crawlee.proxy_configuration import ProxyConfiguration

2 changes: 1 addition & 1 deletion docs/guides/code/proxy_management/inspecting_pw_example.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 from crawlee.proxy_configuration import ProxyConfiguration

@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 from crawlee.proxy_configuration import ProxyConfiguration

@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 from crawlee.proxy_configuration import ProxyConfiguration

2 changes: 1 addition & 1 deletion docs/guides/code/proxy_management/session_bs_example.py
@@ -1,4 +1,4 @@
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
+from crawlee.crawlers import BeautifulSoupCrawler
 from crawlee.proxy_configuration import ProxyConfiguration

2 changes: 1 addition & 1 deletion docs/guides/code/proxy_management/session_pw_example.py
@@ -1,4 +1,4 @@
-from crawlee.playwright_crawler import PlaywrightCrawler
+from crawlee.crawlers import PlaywrightCrawler
 from crawlee.proxy_configuration import ProxyConfiguration

2 changes: 1 addition & 1 deletion docs/guides/code/proxy_management/tiers_bs_example.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 from crawlee.proxy_configuration import ProxyConfiguration

2 changes: 1 addition & 1 deletion docs/guides/code/proxy_management/tiers_pw_example.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 from crawlee.proxy_configuration import ProxyConfiguration

2 changes: 1 addition & 1 deletion docs/guides/code/request_storage/do_not_purge_example.py
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee.configuration import Configuration
-from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
+from crawlee.crawlers import HttpCrawler, HttpCrawlingContext


 async def main() -> None:
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.memory_storage_client import MemoryStorageClient
+from crawlee.storage_clients import MemoryStorageClient


 async def main() -> None:
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
+from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
 from crawlee.request_loaders import RequestList

@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
+from crawlee.crawlers import HttpCrawler, HttpCrawlingContext


 async def main() -> None:
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
+from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
 from crawlee.storages import RequestQueue

2 changes: 1 addition & 1 deletion docs/guides/code/request_storage/tandem_example.py
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.parsel_crawler import ParselCrawler, ParselCrawlingContext
+from crawlee.crawlers import ParselCrawler, ParselCrawlingContext
 from crawlee.request_loaders import RequestList

@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.parsel_crawler import ParselCrawler, ParselCrawlingContext
+from crawlee.crawlers import ParselCrawler, ParselCrawlingContext
 from crawlee.request_loaders import RequestList, RequestManagerTandem
 from crawlee.storages import RequestQueue

@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


 async def main() -> None:
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 from crawlee.storages import Dataset

@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


 async def main() -> None:
@@ -1,6 +1,6 @@
 import asyncio

-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 from crawlee.storages import KeyValueStore

@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import ConcurrencySettings
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
+from crawlee.crawlers import BeautifulSoupCrawler


 async def main() -> None:
@@ -1,7 +1,7 @@
 import asyncio

 from crawlee import ConcurrencySettings
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler
+from crawlee.crawlers import BeautifulSoupCrawler


 async def main() -> None: