@kzhdev kzhdev commented Nov 6, 2025

Description

  1. Fix the HTTP Error 403: Forbidden raised by the us_index collector.py
  2. Remove the FutureWarning

Motivation and Context

The us_index collector.py stopped working; it fails with the following error:

Traceback (most recent call last):
  File "Y:\repo\qlib\scripts\data_collector\us_index\collector.py", line 273, in <module>
    fire.Fire(partial(get_instruments, market_index="us_index"))
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\fire\core.py", line 559, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "\Trade-server\d\repo\qlib\scripts\data_collector\utils.py", line 672, in get_instruments
    getattr(obj, method)()
  File "\Trade-server\d\repo\qlib\scripts\data_collector\index.py", line 213, in parse_instruments
    changers_df = self.get_changes()
                  ^^^^^^^^^^^^^^^^^^
  File "\Trade-server\d\repo\qlib\scripts\data_collector\us_index\collector.py", line 229, in get_changes
    changes_df = pd.read_html(self.WIKISP500_CHANGES_URL)[-1]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 1240, in read_html
    return _parse(
           ^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 983, in _parse
    tables = p.parse_tables()
             ^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 249, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
                                ^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 806, in _build_doc
    raise e
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\html.py", line 785, in _build_doc
    with get_handle(
         ^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\common.py", line 728, in get_handle
    ioargs = _get_filepath_or_buffer(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\common.py", line 384, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
         ^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\site-packages\pandas\io\common.py", line 289, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 215, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 521, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 630, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 559, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "C:\Users\auror\miniforge3\envs\qlib\Lib\urllib\request.py", line 639, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
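
The trace shows pd.read_html fetching the URL through urllib.request.urlopen, which sends a Python-urllib User-Agent unless one is supplied; a 403 here most likely means the server rejects that default. Below is a minimal sketch of one workaround, assuming the page is fetched separately with an explicit User-Agent and the text is parsed afterwards. The names build_request, fetch_html, and DEFAULT_USER_AGENT are hypothetical and are not taken from the collector.

```python
import urllib.request

# Hypothetical value, used for illustration only.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; qlib-data-collector)"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its body decoded as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The returned text can then be passed to the HTML parser instead of the raw URL, so the download no longer depends on the default headers.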

The script also has the following FutureWarning:

scripts\data_collector\us_index\collector.py:151: FutureWarning: Passing literal html to 'read_html' is deprecated and will be removed in a future version. To read from a literal string, wrap it in a 'StringIO' object.
df_list = pd.read_html(_data.text)
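
The warning asks for the HTML text to be wrapped in a file-like object rather than passed as a bare string. A minimal sketch of that change follows; as_html_buffer and read_tables are hypothetical helper names, and pandas is imported inside read_tables only so that the buffer helper can be used without it.

```python
from io import StringIO


def as_html_buffer(html_text: str) -> StringIO:
    """Wrap literal HTML in a file-like object, as the FutureWarning requests."""
    return StringIO(html_text)


def read_tables(html_text: str):
    """Parse every <table> in the given HTML text into a list of DataFrames."""
    import pandas as pd  # third-party; needs an HTML parser such as lxml installed

    # Passing a buffer instead of the raw string avoids the deprecated code path.
    return pd.read_html(as_html_buffer(html_text))
```

Under this sketch, the line flagged in the warning would become a call such as read_tables(_data.text).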

How Has This Been Tested?

Ran python collector.py --index_name SP500 --qlib_dir ~/.qlib/qlib_data/us_data --method parse_instruments and confirmed that the sp500.txt file is created successfully and the FutureWarning is gone.

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
image
  1. Your own tests:
image

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

kzhdev commented Nov 6, 2025

@microsoft-github-policy-service agree

@SunsetWolf (Collaborator)

Hi, @kzhdev

It's great to see your contribution; you're helping to make qlib better. Thank you very much.

Some suggestions:

  • Generating the User-Agent randomly would be better than the current hard-coded value.
    Several packages can generate a random User-Agent, e.g. fake-useragent.

  • Also, the code needs to pass CI.
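
One way to read the first suggestion, sketched with the standard library only: pick the User-Agent at random from a small pool for each request. This is a stand-in for what a package such as fake-useragent would provide, not that package's API. USER_AGENT_POOL, pick_user_agent, and build_random_agent_request are hypothetical names, and the strings in the pool are illustrative placeholders.

```python
import random
import urllib.request
from typing import Optional

# Illustrative placeholder strings; a real pool would need to be kept up to date.
USER_AGENT_POOL = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
)


def pick_user_agent(rng: Optional[random.Random] = None) -> str:
    """Return one User-Agent from the pool; pass a seeded rng for repeatable picks."""
    chooser = rng if rng is not None else random
    return chooser.choice(USER_AGENT_POOL)


def build_random_agent_request(
    url: str, rng: Optional[random.Random] = None
) -> urllib.request.Request:
    """Build a GET request whose User-Agent is drawn at random from the pool."""
    return urllib.request.Request(url, headers={"User-Agent": pick_user_agent(rng)})
```

Accepting an optional random.Random keeps the choice reproducible in tests while still varying the header in normal runs.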
