I was using tarantula and noticed that it tried to crawl a "tel:" link and ultimately failed because the path of the parsed URI was nil. I looked into it and saw that a simple fix would be to add tel to the skip_uri_patterns list in Crawler's initialize method. However, the crawler would have the same problem with any other URI scheme that isn't listed in skip_uri_patterns, so a more general approach may be better. Do you think it would make more sense to skip URIs that start with any scheme name, or is there a reason you specifically chose to skip only the javascript, mailto, and http schemes?
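To make the general approach concrete, here is a minimal sketch of what I have in mind. It is not taken from tarantula's source: `SCHEME_PREFIX` and `skip_link?` are hypothetical names, and I am assuming the check would run on the raw link string before it is parsed. The pattern follows the RFC 3986 definition of a scheme (a letter, then letters, digits, "+", "-", or ".", then a colon).

```ruby
# Hypothetical pattern, not part of tarantula: matches any link that begins
# with an RFC 3986 scheme name followed by a colon (tel:, mailto:, http:, ...).
SCHEME_PREFIX = /\A[a-z][a-z0-9+.\-]*:/i

# Hypothetical helper: returns true when the link carries an explicit scheme
# and should therefore be skipped rather than crawled.
def skip_link?(link)
  SCHEME_PREFIX.match?(link.to_s.strip)
end

links = [
  "tel:+15551234567",
  "mailto:someone@example.com",
  "http://example.com/",
  "/users/1",
  "users?page=2"
]

# Only scheme-less (relative) links remain candidates for crawling.
crawlable = links.reject { |link| skip_link?(link) }
```

One trade-off with this sketch is that it skips every absolute URL, including http and https links that point back at the application under test, so it only makes sense if that matches the intent of the existing http pattern.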