Commit

Merge pull request #8 from ROCmSoftwarePlatform/jack-fixes
Some fixes for rccl parser
lcskrishna authored Oct 27, 2023
2 parents 9fa0777 + 7beb999 commit 305353d
Showing 4 changed files with 9 additions and 4 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -29,6 +29,11 @@ NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL <application/executable> |& tee nccl
```
HSA_FORCE_FINE_GRAIN_PCIE=1 NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL <application/executable> |& tee nccl_debug_log.txt
```
**NOTE:**
For some workloads, buffered output can alter the RCCL/NCCL log format, which may break the parser. The following environment variable and `stdbuf` prefix can help:
```
PYTHONBUFFERED=x stdbuf -i0 -o0 -e0
```
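
The note above can also be applied from Python when the workload is launched by a script. The following is a minimal sketch, not part of this commit: the helper names (`build_unbuffered_command`, `build_debug_env`, `run_and_log`) are hypothetical, and it sets `PYTHONUNBUFFERED`, which is the spelling CPython documents for disabling stdout/stderr buffering. Unlike `|& tee`, it writes only to the log file and does not echo to the terminal.

```python
import os
import shutil
import subprocess


def build_unbuffered_command(app_argv):
    """Prefix app_argv with `stdbuf -i0 -o0 -e0` when stdbuf is installed."""
    stdbuf = shutil.which("stdbuf")
    prefix = [stdbuf, "-i0", "-o0", "-e0"] if stdbuf else []
    return prefix + list(app_argv)


def build_debug_env(base_env=None):
    """Return a copy of the environment with the RCCL/NCCL logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"
    env["NCCL_DEBUG_SUBSYS"] = "INIT,COLL"
    # Assumption: use the variable name documented by CPython for unbuffered stdio.
    env["PYTHONUNBUFFERED"] = "1"
    return env


def run_and_log(app_argv, log_path="nccl_debug_log.txt"):
    """Run the workload and write stdout and stderr to log_path; return the exit code."""
    with open(log_path, "w") as log:
        proc = subprocess.run(
            build_unbuffered_command(app_argv),
            env=build_debug_env(),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return proc.returncode
```

For example, `run_and_log(["python", "train.py"])` would produce `nccl_debug_log.txt` for a hypothetical `train.py` workload; `HSA_FORCE_FINE_GRAIN_PCIE=1` could be added to the environment the same way if needed.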


### Automated way:
2 changes: 1 addition & 1 deletion rccl-tests
4 changes: 2 additions & 2 deletions run_parser_and_generate_summary.py
@@ -17,7 +17,7 @@ def main():
     rccl_tests_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "rccl-tests")
     os.system("cp net_unique.sh " + rccl_tests_path)
     os.chdir(rccl_tests_path)
-    if os.system("./install.sh > /dev/null 2>&1"):
+    if os.system("./install.sh --rccl_home=/opt/rocm 2>&1"):
         print("ERROR: Failed to install rccl-tests.")
         sys.exit(1)
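
The hunk above stops discarding the installer output and passes `--rccl_home=/opt/rocm`. As a possible follow-up, and only as a sketch rather than what this commit does, the same step could be run through `subprocess.run` with an argument list, which avoids shell parsing and exposes the exit code directly. The helper name `run_step` and its `description` parameter are hypothetical.

```python
import subprocess


def run_step(argv, cwd=None, description="run step"):
    """Run argv in cwd with output left visible; return True only on exit code 0."""
    try:
        result = subprocess.run(argv, cwd=cwd)
    except OSError as exc:
        # Raised, for example, when the executable does not exist.
        print(f"ERROR: Could not start command to {description}: {exc}")
        return False
    if result.returncode != 0:
        print(f"ERROR: Failed to {description} (exit code {result.returncode}).")
        return False
    return True


# Hypothetical usage mirroring the hunk above:
# if not run_step(["./install.sh", "--rccl_home=/opt/rocm"],
#                 cwd=rccl_tests_path, description="install rccl-tests"):
#     sys.exit(1)
```

Passing `cwd=` also removes the need to `os.chdir` into the `rccl-tests` directory before the call.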

@@ -27,7 +27,7 @@ def main():
         print ("ERROR: Unable to run rccl-tests properly.")
         sys.exit(1)
     os.system("mv rccl_perf_log.txt ../")
-    os.chdir(os.path.join(os.path.dirname(os.path.realpath(__file__)), "../"))
+    os.chdir(os.path.join(os.path.dirname(os.path.realpath(__file__))))

     print (os.getcwd())
     summary_cmd = "python generate_summary.py --log-file rccl_perf_log.txt --script-file net_unique.sh --count-file net_counts.csv"
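
The `os.chdir` change above matters because joining `"../"` onto the script's directory selects its parent, while `generate_summary.py` and the moved `rccl_perf_log.txt` sit next to the script. A sketch of one way to make this step independent of the working directory follows. The helper names `script_dir` and `summary_command` are hypothetical, and it assumes `net_unique.sh` and `net_counts.csv` also live in the script's directory.

```python
import os


def script_dir(script_file):
    """Return the resolved directory that contains script_file."""
    return os.path.dirname(os.path.realpath(script_file))


def summary_command(script_file, python_exe="python"):
    """Build the generate_summary.py invocation using absolute paths."""
    base = script_dir(script_file)
    return [
        python_exe,
        os.path.join(base, "generate_summary.py"),
        "--log-file", os.path.join(base, "rccl_perf_log.txt"),
        "--script-file", os.path.join(base, "net_unique.sh"),
        "--count-file", os.path.join(base, "net_counts.csv"),
    ]


# Hypothetical usage inside the script:
# subprocess.run(summary_command(__file__), check=True)
```

With absolute paths, the command gives the same result whichever directory the process is in when it runs.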