Fix(data_collector): fix us_index collector.py Http Error 403 Forbidden; Remove FutureWarning. #2047
+13
−3
Description
Motivation and Context
The us_index collector.py stopped working and now fails with the following error:
Traceback (most recent call last):
File "Y:\repo\qlib\scripts\data_collector\us_index\collector.py", line 273, in <module>
fire.Fire(partial(get_instruments, market_index="us_index"))
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\fire\core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\fire\core.py", line 559, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "\Trade-server\d\repo\qlib\scripts\data_collector\utils.py", line 672, in get_instruments
getattr(obj, method)()
File "\Trade-server\d\repo\qlib\scripts\data_collector\index.py", line 213, in parse_instruments
changers_df = self.get_changes()
^^^^^^^^^^^^^^^^^^
File "\Trade-server\d\repo\qlib\scripts\data_collector\us_index\collector.py", line 229, in get_changes
changes_df = pd.read_html(self.WIKISP500_CHANGES_URL)[-1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 1240, in read_html
return _parse(
^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 983, in _parse
tables = p.parse_tables()
^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 249, in parse_tables
tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 806, in _build_doc
raise e
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 785, in _build_doc
with get_handle(
^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\common.py", line 728, in get_handle
ioargs = _get_filepath_or_buffer(
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\common.py", line 384, in _get_filepath_or_buffer
with urlopen(req_info) as req:
^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\common.py", line 289, in urlopen
return urllib.request.urlopen(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 215, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 521, in open
response = meth(req, response)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 630, in http_response
response = self.parent.error(
^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 559, in error
return self._call_chain(*args)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 492, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 639, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
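The traceback shows that pd.read_html, when given a URL, downloads the page through urllib.request.urlopen, which sends urllib's default User-Agent. A 403 on such a call often means the server refuses that default. Assuming that is the cause here, one possible workaround is to download the page with an explicit User-Agent header before parsing it. The sketch below is illustrative only and is not necessarily the change made in this PR; USER_AGENT, build_request, and fetch_html are hypothetical names.

```python
from urllib.request import Request, urlopen

# Hypothetical value: any descriptive User-Agent string, not one taken from this PR.
USER_AGENT = "Mozilla/5.0 (compatible; qlib-data-collector)"


def build_request(url: str) -> Request:
    """Build a GET request that sends an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page as text; raises urllib.error.HTTPError on 4xx/5xx."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The returned text can then be passed to the HTML parser in place of the URL, so the request headers stay under the caller's control.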
The script also emits the following FutureWarning:
scripts\data_collector\us_index\collector.py:151: FutureWarning: Passing literal html to 'read_html' is deprecated and will be removed in a future version. To read from a literal string, wrap it in a 'StringIO' object.
df_list = pd.read_html(_data.text)
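The warning message names its own remedy: wrap the HTML string in a StringIO object before passing it to read_html. A minimal sketch, assuming pandas and an HTML parser such as lxml are installed; SAMPLE_HTML and parse_tables are hypothetical stand-ins for the `_data.text` value and the call site in collector.py.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the `_data.text` response body.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Symbol</th><th>Security</th></tr></thead>
  <tbody><tr><td>AAPL</td><td>Apple Inc.</td></tr></tbody>
</table>
"""


def parse_tables(html_text: str) -> list:
    """Parse every <table> in an HTML string into a list of DataFrames."""
    # Passing html_text directly triggers the FutureWarning quoted above;
    # wrapping it in StringIO hands pandas a file-like object instead.
    return pd.read_html(StringIO(html_text))
```

Calling parse_tables(SAMPLE_HTML) yields a one-element list whose DataFrame has the columns Symbol and Security.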
How Has This Been Tested?
Ran:
python collector.py --index_name SP500 --qlib_dir ~/.qlib/qlib_data/us_data --method parse_instruments
Confirmed that the sp500.txt file is created successfully and the FutureWarning is gone.
pytest qlib/tests/test_all_pipeline.py under the parent directory of qlib.
Screenshots of Test Results (if appropriate):
Types of changes