Skip to content

Add tests and change logic for checking availability in ncku_tainan#89

Merged
kevinjcliao merged 11 commits intog0v:masterfrom
typingmonk:parse_ncku_tainan
Jun 15, 2021
Merged

Add tests and change logic for checking availability in ncku_tainan#89
kevinjcliao merged 11 commits intog0v:masterfrom
typingmonk:parse_ncku_tainan

Conversation

@typingmonk
Copy link
Copy Markdown
Contributor

Related Issue: #46

Working at branch parse_ncku_tainan

Test Plan Checklist:

  • yarn tc passes with no errors
  • yarn lint fixes formatting errors.
  • Load up the site and confirm it's working.

Copy link
Copy Markdown
Collaborator

@kevinjcliao kevinjcliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding unit tests!

Note, this project is currently in maintenance mode while we wait for 1922.gov.tw to come online. In the meantime, it's always possible the self-paid hospitals program starts again, and we need to be ready for it. Thank you for this PR! It cleans up the code and makes it more stable.

Requesting changes for some small nits, and then happy to accept!

https://g0v-tw.slack.com/archives/C020EQ0R8TW/p1622465181397300

Comment thread backend/Parsers/ncku_tainan.py Outdated
Comment on lines +79 to +96
if i != len(optionValues) - 1:
# Prepare data_dict for the next POST request
post_data["__VIEWSTATE"] = soup.find(
"input", {"id": "__VIEWSTATE"}
).get("value")
post_data["__VIEWSTATEGENERATOR"] = soup.find(
"input", {"id": "__VIEWSTATEGENERATOR"}
).get("value")
post_data["__EVENTVALIDATION"] = soup.find(
"input", {"id": "__EVENTVALIDATION"}
).get("value")
post_data["ctl00$MainContent$ddlWeeks"] = optionValues[i + 1]
post_data["ctl00$MainContent$ddlWeeks_02"] = date

# Launch POST request
# Using sync since each data_dict in POST request depends on previous html text.
r = requests.post(url, verify=CERT, data=post_data, timeout=5)
soup = bs4.BeautifulSoup(r.text, "html.parser")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you 'early return' instead?

if i == len(optionValues) - 1:
    break
# rest of if logic
```

on Line 79? 

Comment thread backend/Parsers/ncku_tainan.py Outdated
# Get first day of each weekly appointments list.
soup = bs4.BeautifulSoup(html, "html.parser")
selectTag = soup.find("select", {"id": "ctl00_MainContent_ddlWeeks"})
optionValues = list(map(lambda x: x.get("value"), selectTag.find_all("option")))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: option_values. Prefer Snake Case for Python variable names.

Comment thread backend/Parsers/ncku_tainan.py Outdated
map(lambda x: x.get("value"), selectTag.find_all("option"))
# Get first day of each weekly appointments list.
soup = bs4.BeautifulSoup(html, "html.parser")
selectTag = soup.find("select", {"id": "ctl00_MainContent_ddlWeeks"})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: select_tag: Prefer Snake Case for Python variable names.

Comment thread backend/Parsers/ncku_tainan.py Outdated
Comment on lines +25 to +27
async with session.get(URL_SELF_PAID, timeout=5) as r:
html_self_paid = await r.text()
async with session.get(URL_GOV_PAID, timeout=5) as r:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested the timeout? In my experience 5 seconds is often too short for these scraping jobs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout error never happened in setting timeout=5, however I will increase it to 10 sec as other scraper.

# Launch POST request
# Using sync since each data_dict in POST request depends on previous html text.
r = requests.post(url, verify=CERT, data=post_data, timeout=5)
r = requests.post(url, verify=CERT, data=post_data, timeout=10)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing timeout from 5 sec to 10 sec

Comment thread backend/Parsers/ncku_tainan.py Outdated
Comment on lines +23 to +27
timeout = aiohttp.ClientTimeout(total=5)
timeout = aiohttp.ClientTimeout(total=10)
async with aiohttp.ClientSession(timeout=timeout) as session:
async with session.get(URL_SELF_PAID, timeout=5) as r:
async with session.get(URL_SELF_PAID, timeout=timeout) as r:
html_self_paid = await r.text()
async with session.get(URL_GOV_PAID, timeout=5) as r:
async with session.get(URL_GOV_PAID, timeout=timeout) as r:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing from 5 sec to 10 sec

select_tag = soup.find("select", {"id": "ctl00_MainContent_ddlWeeks"})
option_values = list(
map(lambda x: x.get("value"), select_tag.find_all("option"))
)
Copy link
Copy Markdown
Contributor Author

@typingmonk typingmonk Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selectTag -> select_tag and optionValues -> option_values to fit python naming convention.

Comment thread backend/Parsers/ncku_tainan.py Outdated
@typingmonk typingmonk requested a review from kevinjcliao June 1, 2021 09:01
@kevinjcliao kevinjcliao merged commit 9dd0c01 into g0v:master Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants