Add tests and change logic for checking availability in ncku_tainan#89
kevinjcliao merged 11 commits into g0v:master
kevinjcliao left a comment
Thanks for adding unit tests!
Note, this project is currently in maintenance mode while we wait for 1922.gov.tw to come online. In the meantime, it's always possible the self-paid hospitals program starts again, and we need to be ready for it. Thank you for this PR! It cleans up the code and makes it more stable.
Requesting changes for some small nits, and then happy to accept!
https://g0v-tw.slack.com/archives/C020EQ0R8TW/p1622465181397300
    if i != len(optionValues) - 1:
        # Prepare data_dict for the next POST request
        post_data["__VIEWSTATE"] = soup.find(
            "input", {"id": "__VIEWSTATE"}
        ).get("value")
        post_data["__VIEWSTATEGENERATOR"] = soup.find(
            "input", {"id": "__VIEWSTATEGENERATOR"}
        ).get("value")
        post_data["__EVENTVALIDATION"] = soup.find(
            "input", {"id": "__EVENTVALIDATION"}
        ).get("value")
        post_data["ctl00$MainContent$ddlWeeks"] = optionValues[i + 1]
        post_data["ctl00$MainContent$ddlWeeks_02"] = date

        # Launch POST request
        # Using sync since each data_dict in POST request depends on previous html text.
        r = requests.post(url, verify=CERT, data=post_data, timeout=5)
        soup = bs4.BeautifulSoup(r.text, "html.parser")
Can you 'early return' instead?
```
if i == len(optionValues) - 1:
    break
# rest of if logic
```
on Line 79?
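The suggestion above amounts to a guard clause: exit the loop on the last week, then run the former `if` body one indentation level shallower. A minimal sketch of that shape, assuming a hypothetical `collect_pages` helper and a `fetch_page` callable that stands in for the real POST request (neither name is from the PR):

```python
def collect_pages(option_values, fetch_page):
    """Visit each weekly page, requesting the next one until the last week.

    `fetch_page` is a hypothetical stand-in for the POST request in the scraper.
    """
    visited = []
    for i, value in enumerate(option_values):
        visited.append(value)  # stand-in for parsing the current page
        if i == len(option_values) - 1:
            break  # last week: there is no next page to request
        # Former `if` body, now unindented: prepare and send the next request.
        fetch_page(option_values[i + 1])
    return visited


requested = []
visited = collect_pages(["week-1", "week-2", "week-3"], requested.append)
```

Here `requested` ends up holding only the second and third weeks, since the first page is assumed to be already loaded and no request follows the last one.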
    # Get first day of each weekly appointments list.
    soup = bs4.BeautifulSoup(html, "html.parser")
    selectTag = soup.find("select", {"id": "ctl00_MainContent_ddlWeeks"})
    optionValues = list(map(lambda x: x.get("value"), selectTag.find_all("option")))
Nit: option_values. Prefer snake_case for Python variable names.
    map(lambda x: x.get("value"), selectTag.find_all("option"))
    # Get first day of each weekly appointments list.
    soup = bs4.BeautifulSoup(html, "html.parser")
    selectTag = soup.find("select", {"id": "ctl00_MainContent_ddlWeeks"})
Nit: select_tag. Prefer snake_case for Python variable names.
    async with session.get(URL_SELF_PAID, timeout=5) as r:
        html_self_paid = await r.text()
    async with session.get(URL_GOV_PAID, timeout=5) as r:
Have you tested the timeout? In my experience 5 seconds is often too short for these scraping jobs.
A timeout error never occurred with timeout=5; however, I will increase it to 10 seconds to match the other scrapers.
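One way to exercise a timeout without hitting the real site is to run it against a deliberately slow coroutine. The sketch below uses only `asyncio.wait_for` from the standard library as a stand-in for the `aiohttp` and `requests` timeout arguments in the PR; `REQUEST_TIMEOUT_SECONDS`, `fetch_with_timeout`, and `slow_page` are hypothetical names, not part of the project.

```python
import asyncio

# Hypothetical shared constant so every scraper uses the same limit.
REQUEST_TIMEOUT_SECONDS = 10


async def fetch_with_timeout(coro, timeout=REQUEST_TIMEOUT_SECONDS):
    """Await `coro`, returning None if it takes longer than `timeout` seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def slow_page(delay):
    """Simulate a server that takes `delay` seconds to respond."""
    await asyncio.sleep(delay)
    return "<html></html>"


fast_result = asyncio.run(fetch_with_timeout(slow_page(0.01), timeout=1))
slow_result = asyncio.run(fetch_with_timeout(slow_page(0.2), timeout=0.01))
```

The second call is cancelled by the short limit and yields `None`, which is the behaviour a too-small timeout would produce against a slow server.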
      # Launch POST request
      # Using sync since each data_dict in POST request depends on previous html text.
    - r = requests.post(url, verify=CERT, data=post_data, timeout=5)
    + r = requests.post(url, verify=CERT, data=post_data, timeout=10)
Changing timeout from 5 sec to 10 sec
    - timeout = aiohttp.ClientTimeout(total=5)
    + timeout = aiohttp.ClientTimeout(total=10)
      async with aiohttp.ClientSession(timeout=timeout) as session:
    -     async with session.get(URL_SELF_PAID, timeout=5) as r:
    +     async with session.get(URL_SELF_PAID, timeout=timeout) as r:
              html_self_paid = await r.text()
    -     async with session.get(URL_GOV_PAID, timeout=5) as r:
    +     async with session.get(URL_GOV_PAID, timeout=timeout) as r:
Changing from 5 sec to 10 sec
    select_tag = soup.find("select", {"id": "ctl00_MainContent_ddlWeeks"})
    option_values = list(
        map(lambda x: x.get("value"), select_tag.find_all("option"))
    )
Renamed selectTag -> select_tag and optionValues -> option_values to follow Python naming conventions.
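As an optional follow-up to the rename, the `list(map(lambda ...))` call could also be written as a list comprehension, which many Python style guides find easier to read. A small sketch, assuming a hypothetical `get_option_values` helper; plain dicts stand in for the `bs4` tag objects here because both expose `.get()`, and the option values are made-up placeholders:

```python
def get_option_values(options):
    """Return the `value` attribute of each option-like object."""
    return [option.get("value") for option in options]


# Plain dicts stand in for the tags returned by select_tag.find_all("option").
options = [{"value": "week-1"}, {"value": "week-2"}, {}]
option_values = get_option_values(options)
```

An option without a `value` attribute yields `None`, the same as with the original `map` and `lambda` version.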
Related Issue: #46
Working at branch parse_ncku_tainan
Test Plan Checklist: