CheerioCrawler not persisting cookies #2618
Comments
Cookies are persisted per session, so your second request is (almost certainly) getting a new session.
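To illustrate the point about sessions, here is a toy model in Python. It is not Crawlee's implementation: `ToySession`, `ToySessionPool`, and the random session pick are hypothetical and exist only to show why a cookie stored during one request can be invisible to the next one when each session keeps its own cookie jar.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical session: an id plus its own private cookie jar."""
    id: str
    cookies: dict = field(default_factory=dict)


class ToySessionPool:
    """Hypothetical pool that hands out one of its sessions per request."""

    def __init__(self, size: int, seed: int = 0):
        self._rng = random.Random(seed)
        self.sessions = [ToySession(id=f"session_{i}") for i in range(size)]

    def get_session(self) -> ToySession:
        # Assumption: the pool picks a session at random for each request.
        return self._rng.choice(self.sessions)


def cookie_visible_on_second_request(pool: ToySessionPool) -> bool:
    """Store a cookie during one request, then check a second request."""
    first = pool.get_session()
    first.cookies["queue_token"] = "abc"
    second = pool.get_session()
    return "queue_token" in second.cookies
```

With a pool of size 1 in this model, the second request always sees the cookie. With a larger pool, it depends on which session happens to be picked.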
How do I make sure the second request is using the same session?
What are you trying to do?
You could set
The website I'm trying to scrape has an anti-bot feature where you need to wait in an access queue. The access queue page sends a When I detect this, I sleep for the required amount of time and then re-queue the same URL. I can't find a way to refresh a page via Cheerio directly, so I have to re-queue it with a different unique key. However, this seems difficult to implement with many sessions, since I cannot specify that the request should go through the same session. Maybe there's a better way to handle this use case in Crawlee that I'm not aware of?
Can you give me the URL of that website?
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/cheerio (CheerioCrawler)
Issue description
The CheerioCrawler is not persisting cookies at all. The session storage does have the cookies for the request.url but it is not being set. Manually trying to set it in the preNavigationHooks does not work, as session.getCookieString(request.url) is empty. Both useSessionPool: true and persistCookiesPerSession: true are set.
Code sample
Package version
v3.11.1
Node.js version
v20.16.0
Operating system
MacOS Sonoma
Apify platform
I have tested this on the next release
No response
Other context
Here's a small Python script to test if Crawlee is properly setting cookies. It will set a cookie on
GET /
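The original script is not preserved in this thread. As a stand-in, here is a minimal sketch of such a test server using only the Python standard library. Everything about it is an assumption rather than the reporter's code: the cookie name `crawlee_test`, the behaviour of paths other than `/` (they echo back the `Cookie` header they received), and the `make_server` helper.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieTestHandler(BaseHTTPRequestHandler):
    """Sets a cookie on GET / and echoes the received Cookie header elsewhere."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/":
            # Hypothetical cookie name, chosen for illustration only.
            self.send_header("Set-Cookie", "crawlee_test=1; Path=/")
            body = b"cookie set\n"
        else:
            # Report whatever Cookie header the client sent back, if any.
            received = self.headers.get("Cookie", "")
            body = f"cookie header: {received or '<none>'}\n".encode()
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging.


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) the test server on localhost."""
    return HTTPServer(("127.0.0.1", port), CookieTestHandler)
```

Calling `make_server().serve_forever()` starts it on port 8000. A crawler that persists cookies correctly should request `/` first and then see `crawlee_test=1` echoed back from any other path; `<none>` means the cookie was not sent.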