-
Notifications
You must be signed in to change notification settings - Fork 433
[CI] add sglang integration test #1085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new CI job test-sglang-integration that triggers external integration tests on the T-one testing platform. The job creates a remote test job using the T-one API, polls for completion status, and reports results back to the GitHub Actions workflow.
Key changes:
- Adds new
test-sglang-integrationjob that integrates with T-one external testing platform - Implements job creation and status polling logic using T-one REST APIs
- Uses authenticated API calls with base64-encoded signatures for job management
Comments suppressed due to low confidence (1)
.github/workflows/ci.yml:250
- The indentation is incorrect. Lines 230-250 appear to belong to the
test-wheel-ubuntujob based on their content (they referencetest_env/bin/activateand run mooncake services), but they are placed after thetest-sglang-integrationjob definition ends. These steps should either be:
- Indented to be part of the
test-wheel-ubuntujob (if they were originally there), or - Part of a separate job definition.
The current structure will cause a YAML parsing error in the GitHub Actions workflow.
- name: Start Mooncake Master
run: |
source test_env/bin/activate
mkdir -p /tmp/mooncake_storage
mooncake_master \
--eviction_high_watermark_ratio=0.95 \
--cluster_id=ci_test_cluster \
--port 50051 &
sleep 3
shell: bash
- name: Run Python Tensor API Performance Test (CI check)
env:
MOONCAKE_MASTER: "127.0.0.1:50051"
MOONCAKE_TE_META_DATA_SERVER: "http://127.0.0.1:8080/metadata"
MOONCAKE_PROTOCOL: "tcp"
LOCAL_HOSTNAME: "127.0.0.1"
run: |
source test_env/bin/activate
python scripts/test_tensor_api.py -n 1
shell: bash
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - name: trigger T-one test | ||
| if: ${{ env.tone_user_name != '' }} | ||
| run: | | ||
| curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The artifact download in line 178 fetches artifacts using the GitHub API without authentication. This will work for public repositories, but if the repository is private, this call will fail. Consider using the GITHUB_TOKEN for authentication:
curl -L -H "Accept: application/vnd.github+json" -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json| curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json | |
| curl -L -H "Accept: application/vnd.github+json" -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json |
| if: ${{ env.tone_user_name != '' }} | ||
| run: | | ||
| curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json | ||
| artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json) |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling is missing for the jq command that extracts the artifact_id. If no artifact matching "py312" is found, artifact_id will be empty, but this is not checked before using it in the API call on line 182. This could result in an invalid API request. Add validation:
artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
if [ -z "$artifact_id" ]; then
echo "ERROR: No py312 artifact found"
exit 1
fi| artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json) | |
| artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json) | |
| if [ -z "$artifact_id" ]; then | |
| echo "ERROR: No py312 artifact found" | |
| exit 1 | |
| fi |
| fi | ||
| job_id=$(jq .data.id job.json) | ||
| echo "check job status here and remember to cancel it before restart the job !" | ||
| echo "job_url: https://tone.openanolis.cn/ws/gclfnh19/test_result/${job_id}" |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The job URL contains a hard-coded workspace ID "gclfnh19" (line 196), but the API request uses workspace "mooncake_test" (line 182). If these need to match, consider using a consistent value or making it configurable. If they represent different entities in the T-one system, add a comment explaining the relationship.
| echo "job_id=${job_id}" >> $GITHUB_ENV | ||
| shell: bash | ||
|
|
||
| - name: qurey job results |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling error: "qurey" should be "query".
| - name: qurey job results | |
| - name: query job results |
| curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json | ||
| jq . job.json | ||
| if [ $? -ne 0 ]; then | ||
| echo "curl failed" | ||
| exit 1 | ||
| fi |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition checking jq exit status after displaying the JSON output is problematic. Line 183 runs jq . job.json which will display the JSON, and line 184 checks $? which captures the exit code of jq, not the preceding curl command. If the curl succeeded but returned invalid JSON, this check would catch it, but the logic seems intended to check if curl failed. Consider checking the curl exit status directly or adding error handling to the curl command itself (e.g., using curl's -f flag to fail on HTTP errors).
| curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json | |
| jq . job.json | |
| if [ $? -ne 0 ]; then | |
| echo "curl failed" | |
| exit 1 | |
| fi | |
| curl -sf -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json | |
| if [ $? -ne 0 ]; then | |
| echo "curl failed" | |
| exit 1 | |
| fi | |
| jq . job.json |
| echo "job_url: https://tone.openanolis.cn/ws/gclfnh19/test_result/${job_id}" | ||
| echo "job_id=${job_id}" >> $GITHUB_ENV | ||
| shell: bash | ||
|
|
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The job_id environment variable is set using echo "job_id=${job_id}" >> $GITHUB_ENV on line 197, but it's only available in subsequent steps, not within the same step. This is correct usage. However, ensure that if this step fails after setting the job_id but before completion, there's no mechanism to clean up the remote job. Consider adding a cleanup step that runs on failure using if: failure() to cancel any created jobs.
| - name: Cancel remote job on failure | |
| if: failure() && env.tone_user_name != '' | |
| run: | | |
| if [ -n "${{ env.job_id }}" ]; then | |
| signature="${{ secrets.TONE_USER_NAME }}|${{ secrets.TONE_USER_TOKEN }}|$(python3 -c "import time;print(time.time())")" | |
| signature="$(python3 -c "import base64;print(base64.b64encode(\"$signature\".encode('utf-8')).decode('utf-8'))")" | |
| curl -s -H 'Content-Type: application/json' -X POST -d "{\"id\":${{ env.job_id }},\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/cancel/ | |
| echo "Attempted to cancel remote job with id ${{ env.job_id }}" | |
| else | |
| echo "No job_id found, nothing to cancel." | |
| fi | |
| shell: bash |
| job_status=$(jq .data.job_second_state job_status.json) | ||
| if [[ $job_status =~ "pass" ]]; then | ||
| echo "job successful !" | ||
| exit 0 | ||
| elif [[ $job_status =~ "fail" ]] ; then |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern matching using =~ on lines 217 and 220 will match if the string contains "pass" or "fail" anywhere in the job_second_state field. This could lead to false positives if the state contains these substrings in unexpected ways. Consider using exact string matching or anchored patterns if the API returns specific state values like "pass", "fail", "running", etc. For example: if [ "$job_status" = "\"pass\"" ] (note the quotes since jq output includes JSON quotes) or use jq -r to get raw output without quotes.
| job_status=$(jq .data.job_second_state job_status.json) | |
| if [[ $job_status =~ "pass" ]]; then | |
| echo "job successful !" | |
| exit 0 | |
| elif [[ $job_status =~ "fail" ]] ; then | |
| job_status=$(jq -r .data.job_second_state job_status.json) | |
| if [ "$job_status" = "pass" ]; then | |
| echo "job successful !" | |
| exit 0 | |
| elif [ "$job_status" = "fail" ] ; then |
| time=0 | ||
| while true; do | ||
| if [ $time -gt 720 ]; then | ||
| echo "timeout" | ||
| exit 1 |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The polling loop uses a simple counter with 10-second sleeps, resulting in a 2-hour timeout (720 * 10 seconds = 7200 seconds = 2 hours). However, this timeout might be too long for CI pipelines. Consider:
- Adding a configurable timeout or making the timeout value more explicit (e.g.,
max_iterations=720 # 2 hours with 10s intervals) - Reducing the timeout to a more reasonable value for CI (e.g., 30-60 minutes)
- Adding periodic status updates to show the job is still running
| steps: | ||
| - name: trigger T-one test | ||
| if: ${{ env.tone_user_name != '' }} | ||
| run: | | ||
| curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json | ||
| artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json) | ||
| signature="${{ secrets.TONE_USER_NAME }}|${{ secrets.TONE_USER_TOKEN }}|$(python3 -c "import time;print(time.time())")" | ||
| signature="$(python3 -c "import base64;print(base64.b64encode(\"$signature\".encode('utf-8')).decode('utf-8'))")" | ||
| curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Security concern: The signature generation uses a timestamp but doesn't validate the response within a time window, and the signature is passed in plaintext over HTTPS. While HTTPS provides transport security, consider if the T-one API has specific security requirements. Additionally, ensure that the secrets (TONE_USER_NAME and TONE_USER_TOKEN) are properly configured in the repository settings, as missing secrets could cause silent failures or expose the authentication mechanism.
| steps: | |
| - name: trigger T-one test | |
| if: ${{ env.tone_user_name != '' }} | |
| run: | | |
| curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json | |
| artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json) | |
| signature="${{ secrets.TONE_USER_NAME }}|${{ secrets.TONE_USER_TOKEN }}|$(python3 -c "import time;print(time.time())")" | |
| signature="$(python3 -c "import base64;print(base64.b64encode(\"$signature\".encode('utf-8')).decode('utf-8'))")" | |
| curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json | |
| tone_user_token: ${{ secrets.TONE_USER_TOKEN }} | |
| steps: | |
| - name: Check T-one secrets | |
| run: | | |
| if [ -z "${{ secrets.TONE_USER_NAME }}" ] || [ -z "${{ secrets.TONE_USER_TOKEN }}" ]; then | |
| echo "ERROR: TONE_USER_NAME and/or TONE_USER_TOKEN secrets are not set." | |
| exit 1 | |
| fi | |
| shell: bash | |
| - name: trigger T-one test | |
| if: ${{ env.tone_user_name != '' && env.tone_user_token != '' }} | |
| run: | | |
| curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json | |
| artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json) | |
| timestamp=$(python3 -c "import time;print(int(time.time()))") | |
| signature=$(python3 -c "import hmac, hashlib, os; key=os.environ['TONE_USER_TOKEN'].encode('utf-8'); msg=f'{os.environ['TONE_USER_NAME']}|{key.decode()}|{os.environ['TIMESTAMP']}'.encode('utf-8'); print(hmac.new(key, msg, hashlib.sha256).hexdigest())" TONE_USER_NAME="${{ secrets.TONE_USER_NAME }}" TONE_USER_TOKEN="${{ secrets.TONE_USER_TOKEN }}" TIMESTAMP="$timestamp") | |
| curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"timestamp\":\"$timestamp\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json |
Description
Type of Change
How Has This Been Tested?
Checklist