Skip to content

Conversation

@Ann-1024
Copy link
Collaborator

Description

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

@gemini-code-assist
Copy link
Contributor

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

@stmatengss stmatengss requested a review from Copilot November 21, 2025 02:06
Copilot finished reviewing on behalf of stmatengss November 21, 2025 02:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new CI job test-sglang-integration that triggers external integration tests on the T-one testing platform. The job creates a remote test job using the T-one API, polls for completion status, and reports results back to the GitHub Actions workflow.

Key changes:

  • Adds new test-sglang-integration job that integrates with T-one external testing platform
  • Implements job creation and status polling logic using T-one REST APIs
  • Uses authenticated API calls with base64-encoded signatures for job management
Comments suppressed due to low confidence (1)

.github/workflows/ci.yml:250

  • The indentation is incorrect. Lines 230-250 appear to belong to the test-wheel-ubuntu job based on their content (they reference test_env/bin/activate and run mooncake services), but they are placed after the test-sglang-integration job definition ends. These steps should either be:
  1. Indented to be part of the test-wheel-ubuntu job (if they were originally there), or
  2. Part of a separate job definition.

The current structure will cause a YAML parsing error in the GitHub Actions workflow.

    - name: Start Mooncake Master
      run: |
        source test_env/bin/activate
        mkdir -p /tmp/mooncake_storage
        mooncake_master \
          --eviction_high_watermark_ratio=0.95 \
          --cluster_id=ci_test_cluster \
          --port 50051 &
        sleep 3
      shell: bash

    - name: Run Python Tensor API Performance Test (CI check)
      env:
        MOONCAKE_MASTER: "127.0.0.1:50051"
        MOONCAKE_TE_META_DATA_SERVER: "http://127.0.0.1:8080/metadata"
        MOONCAKE_PROTOCOL: "tcp"
        LOCAL_HOSTNAME: "127.0.0.1"
      run: |
        source test_env/bin/activate
        python scripts/test_tensor_api.py -n 1
      shell: bash

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- name: trigger T-one test
if: ${{ env.tone_user_name != '' }}
run: |
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The artifact download in line 178 fetches artifacts using the GitHub API without authentication. This will work for public repositories, but if the repository is private, this call will fail. Consider using the GITHUB_TOKEN for authentication:

curl -L -H "Accept: application/vnd.github+json" -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json
Suggested change
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json
curl -L -H "Accept: application/vnd.github+json" -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json

Copilot uses AI. Check for mistakes.
if: ${{ env.tone_user_name != '' }}
run: |
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json
artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling is missing for the jq command that extracts the artifact_id. If no artifact matching "py312" is found, artifact_id will be empty, but this is not checked before using it in the API call on line 182. This could result in an invalid API request. Add validation:

artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
if [ -z "$artifact_id" ]; then
  echo "ERROR: No py312 artifact found"
  exit 1
fi
Suggested change
artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
if [ -z "$artifact_id" ]; then
echo "ERROR: No py312 artifact found"
exit 1
fi

Copilot uses AI. Check for mistakes.
fi
job_id=$(jq .data.id job.json)
echo "check job status here and remember to cancel it before restart the job !"
echo "job_url: https://tone.openanolis.cn/ws/gclfnh19/test_result/${job_id}"
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The job URL contains a hard-coded workspace ID "gclfnh19" (line 196), but the API request uses workspace "mooncake_test" (line 182). If these need to match, consider using a consistent value or making it configurable. If they represent different entities in the T-one system, add a comment explaining the relationship.

Copilot uses AI. Check for mistakes.
echo "job_id=${job_id}" >> $GITHUB_ENV
shell: bash

- name: qurey job results
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: "qurey" should be "query".

Suggested change
- name: qurey job results
- name: query job results

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +187
curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json
jq . job.json
if [ $? -ne 0 ]; then
echo "curl failed"
exit 1
fi
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition checking jq exit status after displaying the JSON output is problematic. Line 183 runs jq . job.json which will display the JSON, and line 184 checks $? which captures the exit code of jq, not the preceding curl command. If the curl succeeded but returned invalid JSON, this check would catch it, but the logic seems intended to check if curl failed. Consider checking the curl exit status directly or adding error handling to the curl command itself (e.g., using curl's -f flag to fail on HTTP errors).

Suggested change
curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json
jq . job.json
if [ $? -ne 0 ]; then
echo "curl failed"
exit 1
fi
curl -sf -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json
if [ $? -ne 0 ]; then
echo "curl failed"
exit 1
fi
jq . job.json

Copilot uses AI. Check for mistakes.
echo "job_url: https://tone.openanolis.cn/ws/gclfnh19/test_result/${job_id}"
echo "job_id=${job_id}" >> $GITHUB_ENV
shell: bash

Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The job_id environment variable is set using echo "job_id=${job_id}" >> $GITHUB_ENV on line 197, but it's only available in subsequent steps, not within the same step. This is correct usage. However, ensure that if this step fails after setting the job_id but before completion, there's no mechanism to clean up the remote job. Consider adding a cleanup step that runs on failure using if: failure() to cancel any created jobs.

Suggested change
- name: Cancel remote job on failure
if: failure() && env.tone_user_name != ''
run: |
if [ -n "${{ env.job_id }}" ]; then
signature="${{ secrets.TONE_USER_NAME }}|${{ secrets.TONE_USER_TOKEN }}|$(python3 -c "import time;print(time.time())")"
signature="$(python3 -c "import base64;print(base64.b64encode(\"$signature\".encode('utf-8')).decode('utf-8'))")"
curl -s -H 'Content-Type: application/json' -X POST -d "{\"id\":${{ env.job_id }},\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/cancel/
echo "Attempted to cancel remote job with id ${{ env.job_id }}"
else
echo "No job_id found, nothing to cancel."
fi
shell: bash

Copilot uses AI. Check for mistakes.
Comment on lines +216 to +220
job_status=$(jq .data.job_second_state job_status.json)
if [[ $job_status =~ "pass" ]]; then
echo "job successful !"
exit 0
elif [[ $job_status =~ "fail" ]] ; then
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern matching using =~ on lines 217 and 220 will match if the string contains "pass" or "fail" anywhere in the job_second_state field. This could lead to false positives if the state contains these substrings in unexpected ways. Consider using exact string matching or anchored patterns if the API returns specific state values like "pass", "fail", "running", etc. For example: if [ "$job_status" = "\"pass\"" ] (note the quotes since jq output includes JSON quotes) or use jq -r to get raw output without quotes.

Suggested change
job_status=$(jq .data.job_second_state job_status.json)
if [[ $job_status =~ "pass" ]]; then
echo "job successful !"
exit 0
elif [[ $job_status =~ "fail" ]] ; then
job_status=$(jq -r .data.job_second_state job_status.json)
if [ "$job_status" = "pass" ]; then
echo "job successful !"
exit 0
elif [ "$job_status" = "fail" ] ; then

Copilot uses AI. Check for mistakes.
Comment on lines +203 to +207
time=0
while true; do
if [ $time -gt 720 ]; then
echo "timeout"
exit 1
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The polling loop uses a simple counter with 10-second sleeps, resulting in a 2-hour timeout (720 * 10 seconds = 7200 seconds = 2 hours). However, this timeout might be too long for CI pipelines. Consider:

  1. Adding a configurable timeout or making the timeout value more explicit (e.g., max_iterations=720 # 2 hours with 10s intervals)
  2. Reducing the timeout to a more reasonable value for CI (e.g., 30-60 minutes)
  3. Adding periodic status updates to show the job is still running

Copilot uses AI. Check for mistakes.
Comment on lines +174 to +182
steps:
- name: trigger T-one test
if: ${{ env.tone_user_name != '' }}
run: |
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json
artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
signature="${{ secrets.TONE_USER_NAME }}|${{ secrets.TONE_USER_TOKEN }}|$(python3 -c "import time;print(time.time())")"
signature="$(python3 -c "import base64;print(base64.b64encode(\"$signature\".encode('utf-8')).decode('utf-8'))")"
curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Security concern: The signature generation uses a timestamp but doesn't validate the response within a time window, and the signature is passed in plaintext over HTTPS. While HTTPS provides transport security, consider if the T-one API has specific security requirements. Additionally, ensure that the secrets (TONE_USER_NAME and TONE_USER_TOKEN) are properly configured in the repository settings, as missing secrets could cause silent failures or expose the authentication mechanism.

Suggested change
steps:
- name: trigger T-one test
if: ${{ env.tone_user_name != '' }}
run: |
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json
artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
signature="${{ secrets.TONE_USER_NAME }}|${{ secrets.TONE_USER_TOKEN }}|$(python3 -c "import time;print(time.time())")"
signature="$(python3 -c "import base64;print(base64.b64encode(\"$signature\".encode('utf-8')).decode('utf-8'))")"
curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json
tone_user_token: ${{ secrets.TONE_USER_TOKEN }}
steps:
- name: Check T-one secrets
run: |
if [ -z "${{ secrets.TONE_USER_NAME }}" ] || [ -z "${{ secrets.TONE_USER_TOKEN }}" ]; then
echo "ERROR: TONE_USER_NAME and/or TONE_USER_TOKEN secrets are not set."
exit 1
fi
shell: bash
- name: trigger T-one test
if: ${{ env.tone_user_name != '' && env.tone_user_token != '' }}
run: |
curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts > artifact.json
artifact_id=$(jq -r ".artifacts[] | select(.name | contains(\"py312\") ) | .id" artifact.json)
timestamp=$(python3 -c "import time;print(int(time.time()))")
signature=$(python3 -c "import hmac, hashlib, os; key=os.environ['TONE_USER_TOKEN'].encode('utf-8'); msg=f'{os.environ['TONE_USER_NAME']}|{key.decode()}|{os.environ['TIMESTAMP']}'.encode('utf-8'); print(hmac.new(key, msg, hashlib.sha256).hexdigest())" TONE_USER_NAME="${{ secrets.TONE_USER_NAME }}" TONE_USER_TOKEN="${{ secrets.TONE_USER_TOKEN }}" TIMESTAMP="$timestamp")
curl -s -H 'Content-Type: application/json' -X POST -d "{\"workspace\":\"mooncake_test\",\"project\":\"mooncake-ci\",\"template\":\"mooncake-ci-test\",\"name\":\"mooncake-ci-${{ github.sha }}\",\"username\":\"${{ secrets.TONE_USER_NAME }}\",\"env_ifs\":\" \",\"env_info\":\"ARTIFACT_ID=${artifact_id} GIT_REPO=${{ github.repository }}\",\"timestamp\":\"$timestamp\",\"signature\":\"$signature\"}" https://tone.openanolis.cn/api/job/create/ > job.json

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant