
Commit af7f3c7

fix conflicts with main
Signed-off-by: Sophie du Couédic <[email protected]>
2 parents 8caf79a + eeedaf4 commit af7f3c7


10 files changed: +225 / -71 lines

.github/workflows/test.yml

Lines changed: 6 additions & 13 deletions
```diff
@@ -43,17 +43,14 @@ jobs:
 markers: "v0 and cpu and e2e"
 flags: "--timeout=300"
 - name: "V1-e2e"
-markers: "v1 and cpu and e2e"
+markers: "v1 and cpu and e2e and not cb"
 flags: "--timeout=300 --forked"
-- name: "V1-worker"
-markers: "v1 and not e2e"
-flags: "--timeout=300"
-- name: "utils"
-markers: "utils"
-flags: "--timeout=300"
-- name: "cb"
-markers: "cb"
+- name: "V1-cb"
+markers: "v1 and cpu and cb"
 flags: "--timeout=300 --forked"
+- name: "V1-worker and utils"
+markers: "v1 and not e2e or utils"
+flags: "--timeout=300"
 
 name: "${{ matrix.test_suite.name }} (${{ matrix.vllm_version.name }})"
 
@@ -163,10 +160,6 @@ jobs:
 # `uv run`, to avoid having `uv run` re-sync any dependencies or
 # re-install the vllm_sypre package from source
 source .venv/bin/activate
-if [ ${{ matrix.test_suite.markers }} == "cb" ]; then
-# install custom fms branch
-uv pip install git+https://github.com/foundation-model-stack/foundation-model-stack@paged_attn_mock --force-reinstall
-fi
 # commands to run if condition is true
 python3 -m pytest ${{ matrix.test_suite.flags }} \
 tests -v -m "${{ matrix.test_suite.markers }}"
```
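The reworked matrix depends on pytest's `-m` marker expressions, where `not` binds tighter than `and`, which binds tighter than `or`, so `"v1 and not e2e or utils"` reads as `(v1 and not e2e) or utils`. The following is a minimal sketch, assuming a test's markers are available as a plain set of names. It evaluates the three updated expressions by rewriting each marker name into a Python boolean, which works because Python uses the same `not`/`and`/`or` precedence. The helper names are hypothetical, and this is an illustration rather than pytest's own matcher:

```python
import re

# The three suites visible in the updated matrix (names and expressions from test.yml).
UPDATED_SUITES = {
    "V1-e2e": "v1 and cpu and e2e and not cb",
    "V1-cb": "v1 and cpu and cb",
    "V1-worker and utils": "v1 and not e2e or utils",
}

KEYWORDS = {"and", "or", "not"}


def matches(expression: str, markers: set) -> bool:
    """Evaluate a pytest-style -m expression against one test's markers.

    Hypothetical helper: each marker name becomes the literal True or False,
    keywords and parentheses are kept, so Python applies not > and > or.
    """
    def to_bool(match: re.Match) -> str:
        word = match.group(0)
        return word if word in KEYWORDS else str(word in markers)

    python_expr = re.sub(r"\w+", to_bool, expression)
    # For these expressions only True/False/and/or/not remain, so no builtins are needed.
    return bool(eval(python_expr, {"__builtins__": {}}, {}))


def suites_for(markers: set) -> list:
    """Names of the updated suites whose expression selects these markers."""
    return [name for name, expr in UPDATED_SUITES.items() if matches(expr, markers)]


if __name__ == "__main__":
    # A test carrying v1, cpu, e2e and cb markers now runs only in V1-cb.
    print(suites_for({"v1", "cpu", "e2e", "cb"}))  # prints ['V1-cb']
```

With the previous matrix, a test carrying all four of those markers matched both the old `V1-e2e` expression (`v1 and cpu and e2e`) and the old `cb` suite (`cb`). The added `and not cb` keeps the two suites disjoint.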

docs/contributing/README.md

Lines changed: 67 additions & 2 deletions
````diff
@@ -12,7 +12,7 @@ If you encounter a bug or have a feature request, please search [existing issues
 
 You can also reach out for support in the `#sig-spyre` channel in the [vLLM Slack](https://inviter.co/vllm-slack) workspace.
 
-## Developing
+## Docs
 
 ### Building the docs with MkDocs
 
@@ -21,7 +21,7 @@ You can also reach out for support in the `#sig-spyre` channel in the [vLLM Slac
 Install MkDocs along with the [plugins](https://github.com/vllm-project/vllm-spyre/blob/main/mkdocs.yaml) used in the vLLM Spyre documentation.
 
 ```bash
-pip install -r docs/requirements-docs.txt
+uv pip install -r docs/requirements-docs.txt
 ```
 
 !!! note
@@ -118,6 +118,71 @@ Then, run the continuous batching tests:
 python -m pytest -v -x tests/e2e -m cb
 ```
 
+## Debugging
+
+!!! tip
+You can `oc edit` a pod and change the image without having the pod schedule to a different node. This can be useful for testing whether software or hardware is the issue.
+
+- The script `/opt/sentient/bin/aiu-query-devices` in the pod can be used to see the connectivity between the `AIUs` on the machine. You can also infer this from environment variables with names like `AIU_TIER_\d_SET_\d_RANK_\d`.
+
+- `SPYRE_DEVICES` can be used to select which devices will be selected for each `RANK`. This is similar to how `CUDA_VISIBLE_DEVICES` works for GPU.
+
+!!! example
+`0,2,4,6` will assign rank `0` to AIU index `0`, rank `1` to AIU index `2`, rank `2` to AIU index `4`, and rank `3` to AIU index `6`.
+
+- An alternative is to use `AIU_WORLD_RANK_\d=0000:aa:00.0` to explicitly map ranks to `PCI` addresses (make sure there are no duplicates used at runtime).
+
+- A bash script that uses `/opt/sentient/senlib/bin/senlib_unit_test` to check each `AIU` allocated to the pod to see if they work for a basic test:
+
+```shell
+--8<-- "tools/check_aiu.sh"
+```
+
+### Logging levels
+
+Various log levels that can be configured:
+
+- `DTLOG_LEVEL` - `TRACE, DEBUG, INFO, WARNING, ERROR`
+- `TORCH_SENDNN_LOG` - `WARNING, CRITICAL`
+- `VLLM_LOGGING_LEVEL` - `DEBUG, INFO, WARNING, ERROR`
+
+!!! tip
+`DTLOG_LEVEL=INFO` (piped to file) can help you see what device addresses are actually in use. Look for the string `Opened: SEN:VFIO`.
+
+!!! tip
+In order to stop massive log spew, this configuration is ideal:
+```
+export DTLOG_LEVEL=ERROR
+export TORCH_SENDNN_LOG=CRITICAL
+```
+
+### Topology Aware Allocation
+
+This section is specific to the AIU operator and scheduling workloads onto specific cards.
+
+(TODO: link to docs once they exist)
+
+- This mode supports users to request a special set of AIU cards based on `PCI` topology. By using this mode, we can guarantee to pick up AIU cards of a particular class in the node:
+
+- `Tier0` provides a set of cards in the same `PCI` switch.
+- `Tier1` provides a set of cards from at most one-hop away `PCI` switch.
+- `Tier2` provides a set of cards from at most two-hops away `PCI` switch.
+
+- Running a Multi AIU Job using `ibm.com/aiu_pf_tier0,tier1,tier2`:
+
+- This resource type is used for picking up a topology aware card set, which is required to run tensor parallel (`TP`) workloads more effectively. By using `tierX` class resource, `TP` users can automatically get a best performing card set for the workload.
+
+- The maximum number of allocatable resources in each tier depends on the platform & cluster, but we can get up to:
+
+- `Tier0` - `4` cards
+- `Tier1` - `8` cards
+- `Tier2` - `16` cards
+
+- Devices in `tier0` can do `peer-to-peer (P2P) RDMA`, devices on different trees use `Host DMA` sharing files through `/dev/shm`.
+
+!!! warning
+If you request cards greater than the cards supported by the switch, the pod will never be scheduled. In the above example, if you specify `ibm.com/aiu_pf_tier0: 5` in your yaml, the pod will never be scheduled because the maximum set of cards in `tier0` was specified as `4`.
+
 ## Pull Requests
 
 ### Linting
````
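The `SPYRE_DEVICES` and `AIU_WORLD_RANK_<n>` bullets added to the contributing guide describe two ways of pinning ranks to devices. The following is a small sketch of the documented behavior only. Position `i` in `SPYRE_DEVICES` becomes rank `i`, and `AIU_WORLD_RANK_<n>` pins rank `n` to a PCI address, with no address used twice. The function names are hypothetical, and this is not how vLLM Spyre or the AIU runtime actually parse these variables:

```python
import re


def ranks_from_spyre_devices(value: str) -> dict:
    """Map rank -> AIU index from a SPYRE_DEVICES-style list such as "0,2,4,6".

    Hypothetical helper mirroring the documented example: the i-th entry is
    the AIU index assigned to rank i.
    """
    indices = [int(item) for item in value.split(",") if item.strip()]
    return dict(enumerate(indices))


def ranks_from_world_rank_vars(env: dict) -> dict:
    """Map rank -> PCI address from AIU_WORLD_RANK_<n> variables.

    Hypothetical helper: ignores unrelated variables and raises ValueError
    if two ranks name the same PCI address.
    """
    pattern = re.compile(r"AIU_WORLD_RANK_(\d+)")
    mapping = {}
    for name, address in env.items():
        match = pattern.fullmatch(name)
        if match:
            mapping[int(match.group(1))] = address

    seen = {}
    for rank, address in sorted(mapping.items()):
        if address in seen:
            raise ValueError(f"ranks {seen[address]} and {rank} both use {address}")
        seen[address] = rank
    return mapping


if __name__ == "__main__":
    print(ranks_from_spyre_devices("0,2,4,6"))  # {0: 0, 1: 2, 2: 4, 3: 6}
```

To inspect a real pod, `ranks_from_world_rank_vars(dict(os.environ))` would apply the same duplicate check to the variables actually set there.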

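The tier limits and the scheduling warning in the Topology Aware Allocation section can also be expressed as a small pre-submission check. This sketch assumes the upper bounds quoted there (4, 8 and 16 cards for `tier0`, `tier1` and `tier2`), although the real limits depend on the platform and cluster. It also assumes the `tier1` and `tier2` resources follow the same naming as `ibm.com/aiu_pf_tier0`. Both helpers are hypothetical and are not part of the AIU operator:

```python
# Upper bounds quoted in the guide; actual limits depend on platform and cluster.
# The tier1/tier2 resource names are assumed to follow the tier0 naming.
TIER_MAX_CARDS = {
    "ibm.com/aiu_pf_tier0": 4,
    "ibm.com/aiu_pf_tier1": 8,
    "ibm.com/aiu_pf_tier2": 16,
}


def unschedulable_requests(limits: dict) -> list:
    """Return the tier resources whose requested count exceeds the tier maximum.

    Per the guide, a pod with any such request would never be scheduled.
    """
    return [
        name
        for name, count in limits.items()
        if name in TIER_MAX_CARDS and int(count) > TIER_MAX_CARDS[name]
    ]


def tightest_tier(num_cards: int):
    """Lowest tier (fewest PCI switch hops) whose maximum fits num_cards, else None."""
    for name, max_cards in TIER_MAX_CARDS.items():
        if num_cards <= max_cards:
            return name
    return None


if __name__ == "__main__":
    # The example from the warning: 5 cards requested from tier0.
    print(unschedulable_requests({"ibm.com/aiu_pf_tier0": 5}))  # ['ibm.com/aiu_pf_tier0']
    print(tightest_tier(5))  # ibm.com/aiu_pf_tier1
```

Preferring the lowest tier that fits keeps cards as close together as possible. According to the guide, only `tier0` devices can use peer-to-peer RDMA.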
pyproject.toml

Lines changed: 3 additions & 1 deletion
```diff
@@ -12,7 +12,7 @@ readme = "README.md"
 license = {text = "Apache 2"}
 dependencies = [
 "fms-model-optimizer>=0.2.0",
-"ibm-fms==1.0.0",
+"ibm-fms==1.1.0",
 "vllm>=0.9.0,!=0.9.1",
 ]
 requires-python = ">=3.9"
@@ -140,6 +140,8 @@ plugins.md013.enabled = false # line-length
 plugins.md041.enabled = false # first-line-h1
 plugins.md033.enabled = false # inline-html
 plugins.md024.allow_different_nesting = true # no-duplicate-headers
+plugins.md007.enabled = true
+plugins.md007.indent = 4
 
 [dependency-groups]
 dev = [
```
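The new `plugins.md007` settings enable the Markdown linter's unordered-list indentation rule with a 4-space step. This is presumably for the nested tier lists that this commit adds to the contributing guide. The following is a simplified sketch of what such a rule checks. It assumes only `-`, `*` and `+` bullets, skips fenced code, and tests only that each bullet starts at a multiple of `indent` spaces. The linter's real rule also tracks nesting depth and other cases, and the function name is hypothetical:

```python
import re

# An unordered-list bullet: optional leading spaces, a bullet char, then a space.
BULLET = re.compile(r"^( *)[-*+] ")


def md007_violations(markdown: str, indent: int = 4) -> list:
    """Return (line_number, leading_spaces) for bullets not on an `indent` step.

    Simplified stand-in for an unordered-list indentation rule: lines inside
    ``` fences are skipped, and nesting depth is not tracked.
    """
    violations = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = BULLET.match(line)
        if match and len(match.group(1)) % indent != 0:
            violations.append((number, len(match.group(1))))
    return violations


if __name__ == "__main__":
    sample = "- parent\n\n    - child at 4 spaces\n  - child at 2 spaces\n"
    print(md007_violations(sample))  # [(4, 2)]
```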

tests/e2e/test_spyre_cb.py

Lines changed: 37 additions & 2 deletions
```diff
@@ -8,9 +8,14 @@
 from typing import Any
 
 import pytest
+<<<<<<< HEAD
 from spyre_util import (compare_results, create_random_request,
 generate_hf_output, generate_spyre_vllm_output,
 get_spyre_model_list)
+=======
+from spyre_util import (create_random_request, generate_cb_spyre_vllm_output,
+get_spyre_backend_list, get_spyre_model_list)
+>>>>>>> origin/main
 from vllm import EngineArgs, SamplingParams
 from vllm.v1.engine import EngineCoreRequest
 from vllm.v1.engine.core import EngineCore
@@ -23,6 +28,7 @@
 "appropriately completes the request. Be polite in your response to the "
 "user.\n\n### Instruction:\n{}\n\n### Response:")
 
+<<<<<<< HEAD
 
 @pytest.mark.cb
 @pytest.mark.parametrize("max_num_seqs", [2, 3, 4],
@@ -41,6 +47,35 @@
 "how do I add multiple new columns in m for power query or power bi?"),
 template.format("Convert char to string in Java."),
 ]])
+=======
+@pytest.mark.cb
+@pytest.mark.v1
+@pytest.mark.parametrize("max_num_seqs", [2, 3, 4],
+ids=lambda val: f"max_num_seqs({val})")
+@pytest.mark.parametrize("model", get_spyre_model_list())
+@pytest.mark.parametrize("backend", get_spyre_backend_list())
+@pytest.mark.parametrize(
+"prompts",
+[
+[
+"7 6 5 4",
+"10 9 8 7",
+],
+[
+"7 6 5 4",
+"10 9 8 7",
+"8 7 6 5",
+],
+[
+"7 6 5 4",
+"10 9 8 7",
+"8 7 6 5",
+"9 8 7 6",
+],
+],
+ids=lambda val: f"num_prompts({len(val)})",
+)
+>>>>>>> origin/main
 def test_cb_handling(
 model: str,
 backend: str,
@@ -648,9 +683,9 @@ def augment_checked_steps(
 
 
 @pytest.mark.cb
+@pytest.mark.v1
 @pytest.mark.parametrize("model", get_spyre_model_list())
-@pytest.mark.parametrize(
-"backend", [pytest.param("eager", marks=pytest.mark.cpu, id="eager")])
+@pytest.mark.parametrize("backend", get_spyre_backend_list())
 @pytest.mark.parametrize("max_num_seqs", [2])
 @pytest.mark.parametrize(
 "seqs_max_tokens,prompts_lengths,steps_add_reqs,checked_steps,"
```
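The first three hunks above add Git conflict-marker lines (`<<<<<<< HEAD`, `=======`, `>>>>>>> origin/main`), so both sides of the `spyre_util` import and of the `test_cb_handling` decorators remain in the file as committed. Below is a minimal sketch of a check that flags such lines before they are committed. It is a simplified stand-in for existing tooling such as the `check-merge-conflict` hook from pre-commit, and the function names are hypothetical:

```python
import re
import sys
from pathlib import Path

# A marker line is exactly seven '<' or '>' (optionally followed by a space and
# a ref name such as HEAD or origin/main), or exactly seven '=' on their own.
CONFLICT_MARKER = re.compile(r"^(?:<{7}|>{7})(?: .*)?$|^={7}$")


def find_conflict_markers(text: str) -> list:
    """Return (line_number, line) pairs for unresolved merge-conflict markers."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


def check_files(paths) -> int:
    """Print every marker found in the given files; return how many were found."""
    total = 0
    for path in paths:
        for number, line in find_conflict_markers(Path(path).read_text()):
            print(f"{path}:{number}: {line}")
            total += 1
    return total


if __name__ == "__main__":
    # e.g. python find_markers.py tests/e2e/test_spyre_cb.py
    sys.exit(1 if check_files(sys.argv[1:]) else 0)
```

Separator lines made of a different number of `=` characters, or `=` characters after other text (such as the `echo` banners in `tools/check_aiu.sh`), are not flagged.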

tools/check_aiu.sh

Lines changed: 39 additions & 0 deletions
```diff
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+# A bash script that uses `/opt/sentient/senlib/bin/senlib_unit_test`
+# to check each AIU allocated to the pod to see if
+# they work for a basic test:
+
+cleanup_done=0
+cleanup() {
+if [ "$cleanup_done" -eq 0 ] && [ -f ~/.senlib.json.bak ]; then
+echo "Restoring .senlib.json from backup"
+cp ~/.senlib.json.bak ~/.senlib.json
+cleanup_done=1
+fi
+kill -- -$PPID
+wait
+exit
+}
+
+trap cleanup EXIT SIGINT
+
+# Create backup .senlib.json if it doesn't exist
+if [ -f "$HOME"/.senlib.json ]; then
+if [ ! -f "$HOME"/.senlib.json.bak ]; then
+echo "Creating backup of $HOME/.senlib.json"
+cp "$HOME"/.senlib.json "$HOME"/.senlib.json.bak
+else
+echo "$HOME/.senlib.json.bak already exists"
+fi
+fi
+
+for device_id in $(jq -r .GENERAL.sen_bus_id[] /etc/aiu/senlib_config.json); do
+echo "======================================================================"
+echo "Checking AIU ${device_id}"
+echo "======================================================================"
+jq -n '{"GENERAL": { "sen_bus_id": "'"${device_id}"'" }}' > .senlib.json
+# run in background to not override bash signal handler
+timeout 10 /opt/sentient/senlib/bin/senlib_unit_test --gtest_filter=SmlPF1VF0.Open &
+wait
+done
```
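The per-device loop in `tools/check_aiu.sh` can be sketched in Python as follows. The sketch reads the same `/etc/aiu/senlib_config.json`, writes a single-device `.senlib.json` for each `sen_bus_id`, and runs the same `senlib_unit_test` filter with a 10-second timeout. It assumes the unit test reads `.senlib.json` from its working directory. The shell script writes that file relative to the current directory, but its backup and restore logic targets `$HOME/.senlib.json`, so the two match only when the script is run from `$HOME`. The helper names are hypothetical, and the backup/restore and signal handling of the shell version are deliberately left out:

```python
import json
import subprocess
from pathlib import Path

# Paths and gtest filter copied from tools/check_aiu.sh.
SENLIB_CONFIG = Path("/etc/aiu/senlib_config.json")
UNIT_TEST = "/opt/sentient/senlib/bin/senlib_unit_test"
GTEST_FILTER = "--gtest_filter=SmlPF1VF0.Open"


def device_ids(config: dict) -> list:
    """Python equivalent of `jq -r .GENERAL.sen_bus_id[]`."""
    return list(config["GENERAL"]["sen_bus_id"])


def single_device_config(device_id: str) -> dict:
    """Python equivalent of the `jq -n '{"GENERAL": {"sen_bus_id": ...}}'` step."""
    return {"GENERAL": {"sen_bus_id": device_id}}


def check_each_aiu(workdir: Path, timeout_s: float = 10.0) -> dict:
    """Run the basic open test once per AIU and return device id -> outcome.

    Assumption: the unit test picks up `.senlib.json` from `workdir`, which is
    used as its working directory.
    """
    config = json.loads(SENLIB_CONFIG.read_text())
    target = workdir / ".senlib.json"
    outcomes = {}
    for device_id in device_ids(config):
        print("=" * 70)  # separator banner, as in the shell script
        print(f"Checking AIU {device_id}")
        print("=" * 70)
        target.write_text(json.dumps(single_device_config(device_id)))
        try:
            result = subprocess.run([UNIT_TEST, GTEST_FILTER],
                                    cwd=workdir, timeout=timeout_s)
            outcomes[device_id] = ("ok" if result.returncode == 0
                                   else f"exit code {result.returncode}")
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            outcomes[device_id] = "timed out"
    return outcomes


if __name__ == "__main__":
    for device, outcome in check_each_aiu(Path.cwd()).items():
        print(device, outcome)
```

Collecting an outcome per device (rather than only streaming the test output) makes it easier to spot which allocated AIU failed or hung.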
