Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub API rate limit #127

Closed
achimnol opened this issue Jan 20, 2025 · 13 comments · Fixed by #148
Closed

GitHub API rate limit #127

achimnol opened this issue Jan 20, 2025 · 13 comments · Fixed by #148
Assignees
Labels
answered Indicates an answered question. bug Something isn't working question Further information is requested

Comments

@achimnol
Copy link

achimnol commented Jan 20, 2025

After #116, I see now it performs retries while downloading PBS and other stuffs.
Unfortunately, there is another problem: we now hit the GitHub's API rate limit. 🫠

Would there be any better way to avoid it?

For instance,

  • Could I set GITHUB_TOKEN to authenticate the fetch requests?
  • How could I configure GitHub Actions cache to preserve the already fetched artifacts in previous workflow runs?

Example workflow run: https://github.com/lablup/backend.ai/actions/runs/12868611532

@jsirois
Copy link
Contributor

jsirois commented Jan 20, 2025

For auth, you can do like so:

SCIENCE_AUTH_API_GITHUB_COM_BEARER: ${{ secrets.GITHUB_TOKEN }}

Also see: https://github.com/pex-tool/pex/blob/027781802e73d68fb26a75f6e2b7d8b65e3b160b/.github/workflows/ci.yml#L10-L12

The whole auth scheme is defined here:

lift/science/fetcher.py

Lines 78 to 127 in 2bd9199

def _configure_auth(url: Url) -> httpx.Auth | tuple[str, str] | None:
if not url.info.hostname:
return None
normalized_hostname = url.info.hostname.upper().replace(".", "_").replace("-", "_")
env_auth_prefix = f"SCIENCE_AUTH_{normalized_hostname}"
env_auth = {key: value for key, value in os.environ.items() if key.startswith(env_auth_prefix)}
def check_ambiguous_auth(auth_type: str) -> None:
if env_auth:
raise AmbiguousAuthError(
f"{auth_type.capitalize()} auth was configured for {url} via env var but so was: "
f"{", ".join(env_auth)}"
)
def get_username(auth_type: str) -> str | None:
return env_auth.pop(f"{env_auth_prefix}_{auth_type.upper()}_USER", None)
def require_password(auth_type: str) -> str:
env_var = f"{env_auth_prefix}_{auth_type.upper()}_PASS"
passwd = env_auth.pop(env_var, None)
if not passwd:
raise InvalidAuthError(
f"{auth_type.capitalize()} auth requires a password be configured via the "
f"{env_var} env var."
)
return passwd
if bearer := env_auth.pop(f"{env_auth_prefix}_BEARER", None):
check_ambiguous_auth("bearer")
return "Authorization", f"Bearer {bearer}"
if username := get_username("basic"):
password = require_password("basic")
check_ambiguous_auth("basic")
return httpx.BasicAuth(username=username, password=password)
if username := get_username("digest"):
password = require_password("digest")
check_ambiguous_auth("digest")
return httpx.DigestAuth(username=username, password=password)
try:
return httpx.NetRCAuth(None)
except (FileNotFoundError, IsADirectoryError):
pass
except NetrcParseError as e:
logger.warning(f"Not using netrc for auth, netrc file is invalid: {e}")
return None

So you could similarly define basic auth or use ~/.netrc.

For caching, I won't teach GitHub actions, but science allows you to control the cache with --cache-dir: https://science.scie.app/cli.html#science

So you can use SCIENCE_CACHE_DIR.

@jsirois jsirois self-assigned this Jan 20, 2025
@jsirois jsirois added question Further information is requested answered Indicates an answered question. labels Jan 20, 2025
@jsirois
Copy link
Contributor

jsirois commented Jan 20, 2025

@achimnol I've marked this as an answered question, but please confirm this answer works for you.

@achimnol
Copy link
Author

Thanks for the detailed answer.
I'll check out with my team and leave the result.

@jsirois
Copy link
Contributor

jsirois commented Jan 30, 2025

@achimnol do you have any results to report?

@achimnol
Copy link
Author

achimnol commented Feb 1, 2025

@achimnol do you have any results to report?

We made a temporary CI workflow to reproduce the issue before applying the GitHub API token, but we failed to reproduce it when we triggered the workflow manually several times afterwards. 😞

I'll update you once it happens again.

@Yaminyam
Copy link

@jsirois
Copy link
Contributor

jsirois commented Feb 19, 2025

@Yaminyam this is a problem - there should be at least 2 hits and there is just 1:

:; git grep SCIENCE_AUTH_API_GITHUB_COM_BEARER
.github/workflows/ci.yml:      SCIENCE_AUTH_API_GITHUB_COM_BEARER: ${{ secrets.GITHUB_TOKEN }}

1st rule of Pants club: Pants blocks env vars by default
2nd rule of Pants club: Pants is generally sneaky and does things behind your back. All is usually not as it seems.

So you need to leak the SCIENCE_AUTH_API_GITHUB_COM_BEARER env var through Pants to underlying processes somewhere in your config. I think you still use https://www.pantsbuild.org/stable/reference/subsystems/subprocess-environment for Python / Pex subprocesses to do this, but I have not hacked on or used Pants for several years now; so you'll want to make sure.

@jsirois
Copy link
Contributor

jsirois commented Mar 6, 2025

@Yaminyam have you been able to configure Pants to let the SCIENCE_AUTH_API_GITHUB_COM_BEARER env var leak through to Pex processes?

@cairijun
Copy link

return "Authorization", f"Bearer {bearer}"

https://github.com/encode/httpx/blob/9e8ab40369bd3ec2cc8bff37ab79bf5769c8b00f/httpx/_client.py#L448

I think passing a tuple as auth means Basic Authentication in httpx, not Bearer.

@jsirois
Copy link
Contributor

jsirois commented Mar 18, 2025

@cairijun indeed. That tuple should be routed to the request headers but it is not. Thanks for spotting this bug! I'll get out a release today with a fix.

Keep in mind, my observation above is still pertinent. Even with the fix, things should still fail due to rate limits unless the env var is punched through past Pants hermetic sandbox walls.

@jsirois jsirois added bug Something isn't working in progress Indicates the assignee is actively working on the item. labels Mar 18, 2025
jsirois added a commit to jsirois/lift that referenced this issue Mar 18, 2025
@jsirois
Copy link
Contributor

jsirois commented Mar 18, 2025

@cairijun thank you very much for looking at the code. That was an embarrasing bug. #148 is out for review and I should have a release in the next several hours. I'll in turn produce a Pex release that bumps its minimum science requirement to that release. I'll not when both releases are complete here and then close the issue.

@jsirois
Copy link
Contributor

jsirois commented Mar 18, 2025

OK, science 0.12.2 is released with the fix: https://github.com/a-scie/lift/releases/tag/v0.12.2

@jsirois
Copy link
Contributor

jsirois commented Mar 18, 2025

And now Pex 2.33.5 is released with a bump to science>=0.12.2: https://github.com/pex-tool/pex/releases/tag/v2.33.5

@achimnol, @Yaminyam and @cairijun please speak up if you are not able to upgrade to Pex 2.33.5 or newer and fix your rate limit issue after ensuring Pants is not blocking the SCIENCE_AUTH_API_GITHUB_COM_BEARER env var (it might be easiest to inspect a Pex scie build sandbox __run.sh script to confirm locally).

@jsirois jsirois removed the in progress Indicates the assignee is actively working on the item. label Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
answered Indicates an answered question. bug Something isn't working question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants