Releases: apify/crawlee-python
0.6.2
0.6.1
0.6.1 (2025-03-03)
🐛 Bug Fixes
- Add `browserforge` to mandatory dependencies (#1044) (ddfbde8) by @Pijukatel
0.6.0
0.6.0 (2025-03-03)
🚀 Features
- Integrate browserforge fingerprints (#829) (2b156b4) by @Pijukatel
- Add AdaptivePlaywrightCrawler (#872) (5ba70b6) by @Pijukatel
- Implement `_snapshot_client` for `Snapshotter` (#957) (ba4d384) by @Mantisus
- Add adaptive context helpers (#964) (e248f17) by @Pijukatel
- [breaking] Enable additional status codes arguments to PlaywrightCrawler (#959) (87cf446) by @Pijukatel
- Replace `HeaderGenerator` implementation by `browserforge` implementation (#960) (c2f8c93) by @Pijukatel
🐛 Bug Fixes
- Fix playwright template and dockerfile (#972) (c33b34d) by @janbuchar
- Fix installing dependencies via pip in project template (#977) (1e3b8eb) by @janbuchar
- Fix default migration storage (#1018) (6a0c4d9) by @Pijukatel
- Fix logger name for http based loggers (#1023) (bfb3944) by @Pijukatel
- Remove allow_redirects override in CurlImpersonateHttpClient (#1017) (01d855a) by @2tunnels
- Remove follow_redirects override in HttpxHttpClient (#1015) (88afda3) by @2tunnels
- Fix flaky test_common_headers_and_user_agent (#1030) (58aa70e) by @Pijukatel
Refactor
- [breaking] Remove unused config properties (#978) (4b7fe29) by @vdusek
- [breaking] Remove Base prefix from abstract class names (#980) (8ccb5d4) by @vdusek
- [breaking] Change default `incognito context` to `persistent context` for `Playwright` (#985) (f01520d) by @Mantisus
- [breaking] Change `Session` cookies from `dict` to `SessionCookies` with `CookieJar` (#984) (6523b3a) by @Mantisus
- [breaking] Replace enum with literal for `EnqueueStrategy` (#1019) (d2481ef) by @vdusek
- [breaking] Update status code handling (#1028) (6b59471) by @Mantisus
- [breaking] Move `cli` dependencies to optional dependencies (#1011) (4382959) by @Mantisus
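Migration note for #1019: the general shape of an enum-to-literal change can be sketched as follows. This is an illustration only, not Crawlee's code: the alias `StrategyLiteral`, the helper `validate_strategy`, and the four string values are assumptions made for the sketch.

```python
from typing import Literal, get_args

# Hypothetical alias for illustration; the values are assumed, not copied from crawlee.
StrategyLiteral = Literal['all', 'same-domain', 'same-hostname', 'same-origin']


def validate_strategy(value: str) -> str:
    """Return the value if it is one of the allowed literal strings, else raise."""
    allowed = get_args(StrategyLiteral)
    if value not in allowed:
        raise ValueError(f'Unknown strategy {value!r}; expected one of {allowed}')
    return value
```

With a literal type, callers pass plain strings such as `'same-domain'` instead of importing an enum member, and a type checker can still reject misspelled values.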
0.5.4
0.5.4 (2025-02-05)
🚀 Features
- Add support `use_incognito_pages` for `browser_launch_options` in `PlaywrightCrawler` (#941) (eae3a33) by @Mantisus
🐛 Bug Fixes
- Fix session management with retire (#947) (caee03f) by @Mantisus
- Fix templates - poetry-plugin-export version and camoufox template name (#952) (7addea6) by @Pijukatel
- Fix convert relative link to absolute in `enqueue_links` for response with redirect (#956) (694102e) by @Mantisus
- Fix `CurlImpersonateHttpClient` cookies handler (#946) (ed415c4) by @Mantisus
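The class of problem behind #956 can be illustrated with the standard library alone: after a redirect, relative links should be resolved against the URL the page was actually loaded from, not the URL originally requested. The helper `to_absolute` and the example URLs are hypothetical and do not reflect Crawlee's internals.

```python
from typing import Optional
from urllib.parse import urljoin


def to_absolute(link: str, requested_url: str, loaded_url: Optional[str] = None) -> str:
    """Resolve a possibly relative link against the post-redirect URL when one is known."""
    # Assumption for this sketch: loaded_url is the final URL after any redirects.
    base = loaded_url or requested_url
    return urljoin(base, link)


requested = 'https://example.com/old/page'
loaded = 'https://example.com/new/section/page'

# Resolving against the requested URL gives a path under /old/.
wrong = to_absolute('item/1', requested)
# Resolving against the loaded URL gives a path under /new/section/.
right = to_absolute('item/1', requested, loaded)
```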
0.5.3
0.5.3 (2025-01-31)
🚀 Features
- Add keep_alive flag to `crawler.__init__` (#921) (7a82d0c) by @Pijukatel
- Add `block_requests` helper for `PlaywrightCrawler` (#919) (1030459) by @Mantisus
- Return request handlers from decorator methods to allow further decoration (#934) (9ec0aae) by @mylank
- Add `transform_request_function` for `enqueue_links` (#923) (6b15957) by @Mantisus
- Add `time_remaining_secs` property to `MIGRATING` event data (#940) (b44501b) by @fnesveda
- Add LogisticalRegressionPredictor - rendering type predictor for adaptive crawling (#930) (8440499) by @Pijukatel
🐛 Bug Fixes
0.5.2
0.5.1
0.5.1 (2025-01-07)
🐛 Bug Fixes
- Make result of RequestList.is_empty independent of fetch_next_request calls (#876) (d50249e) by @janbuchar
0.5.0
0.5.0 (2025-01-02)
🚀 Features
- Add possibility to use None as no proxy in tiered proxies (#760) (0fbd017) by @Pijukatel
- Add `use_state` context method (#682) (868b41e) by @Mantisus
- Add pre-navigation hooks router to AbstractHttpCrawler (#791) (0f23205) by @Pijukatel
- Add example of how to integrate Camoufox into PlaywrightCrawler (#789) (246cfc4) by @Pijukatel
- Expose event types, improve on/emit signature, allow parameterless listeners (#800) (c102c4c) by @janbuchar
- Add stop method to BasicCrawler (#807) (6d01af4) by @Pijukatel
- Add `html_to_text` helper function (#792) (2b9d970) by @Pijukatel
- [breaking] Implement `RequestManagerTandem`, remove `add_request` from `RequestList`, accept any iterable in `RequestList` constructor (#777) (4172652) by @janbuchar
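To give a rough idea of what an HTML-to-text helper such as the one added in #792 does, here is a minimal standard-library sketch. It is not Crawlee's implementation and does not use its API: the names `_TextExtractor` and `simple_html_to_text` are hypothetical, and the whitespace handling is deliberately naive.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text while skipping the contents of script and style tags."""

    SKIPPED = {'script', 'style'}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def simple_html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, with chunks joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks)
```

A real helper would likely also handle block-level line breaks and entity edge cases; the sketch only shows the basic idea of dropping markup and non-visible content.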
🐛 Bug Fixes
- Fix circular import in `KeyValueStore` (#805) (8bdf49d) by @Mantisus
- [breaking] Refactor service usage to rely on `service_locator` (#691) (1d31c6c) by @vdusek
- Pass `verify` in httpx client (#802) (074d083) by @Mantisus
- Fix `page_options` for `PlaywrightBrowserPlugin` (#796) (bd3bdd4) by @Mantisus
- Fix event migrating handler in `RequestQueue` (#825) (fd6663f) by @Mantisus
- Respect user configuration for work with status codes (#812) (8daf4bd) by @Mantisus
- Fix `abort-on-error` for successive runs (#834) (0cea673) by @Mantisus
- Relax ServiceLocator restrictions (#837) (aa3667f) by @janbuchar
- Fix typo in exports (#841) (8fa6ac9) by @janbuchar
Refactor
- [breaking] Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance (#746) (9d3c269) by @Pijukatel
- [breaking] Remove `json_` and `order_no` from `Request` (#788) (5381d13) by @Mantisus
- [breaking] Rename PwPreNavContext to PwPreNavCrawlingContext (#827) (84b61a3) by @vdusek
- [breaking] Rename PlaywrightCrawler kwargs: browser_options, page_options (#831) (ffc6048) by @Pijukatel
- [breaking] Update the crawlers & storage clients structure (#828) (0ba04d1) by @vdusek
0.4.5
0.4.5 (2024-12-06)
🚀 Features
- Improve project bootstrapping (#538) (367899c) by @janbuchar
🐛 Bug Fixes
- Add upper bound of HTTPX version (#775) (b59e34d) by @vdusek
- Fix incorrect use of desired concurrency ratio (#780) (d1f8bfb) by @Pijukatel
- Remove pydantic constraint <2.10.0 and update timedelta validator, serializer type hints (#757) (c0050c0) by @Pijukatel