[midend/lib/Conversion]TOSA:ReduceSumOP Vectorize Optimization #490

Open
wants to merge 5 commits into base: main
Changes from 2 commits
87 changes: 87 additions & 0 deletions examples/BuddyDeepSeekR1/AnalyseDialectOps.py
Member
This analysis seems unrelated to the vectorization optimization. Please separate it into a different PR.

Author

This script is just an analysis utility that counts the number of operations in each dialect.
I will move it into a separate PR.

@@ -0,0 +1,87 @@
#!/usr/bin/env python3

import re
from collections import Counter, defaultdict
from pathlib import Path

def extract_dialect_ops(mlir_file_path):
    """
    Extract operations from all dialects in an MLIR file and count their occurrences.

    Args:
        mlir_file_path (str): Path to the MLIR file

    Returns:
        dict: Dictionary mapping dialect names to Counter objects
    """
    # Read the MLIR file
    with open(mlir_file_path, 'r') as f:
        content = f.read()

    # Find all operations using a regex.
    # The pattern matches dotted operation names with a dialect prefix,
    # e.g. "tosa.reduce_sum" -> ("tosa", "reduce_sum").
    op_pattern = r'([a-zA-Z_][a-zA-Z0-9_]*)\.([a-zA-Z_][a-zA-Z0-9_]*)'
    all_ops = re.findall(op_pattern, content)

    # Group operations by dialect
    dialect_ops = defaultdict(Counter)
    for dialect, op in all_ops:
        # Skip common non-dialect prefixes
        if dialect.lower() in ['func', 'module', 'memref', 'arith', 'builtin']:
            continue
        dialect_ops[dialect][op] += 1

    return dialect_ops

def main():
    # Get the directory of the current script
    current_dir = Path(__file__).parent

    # Construct the path to subgraph0.mlir
    mlir_file = current_dir / 'subgraph0.mlir'

    if not mlir_file.exists():
        print(f"Error: {mlir_file} not found")
        return

    # Extract and count operations by dialect
    dialect_ops = extract_dialect_ops(str(mlir_file))

    # Print results
    print("\nMLIR Operation Statistics:")
    print("=" * 60)
    print(f"{'Dialect':<20} {'Operation':<30} {'Count':<10}")
    print("=" * 60)

    total_ops = 0
    total_unique_ops = 0

    # Sort dialects by total operation count
    sorted_dialects = sorted(
        dialect_ops.items(),
        key=lambda x: sum(x[1].values()),
        reverse=True
    )

    for dialect, ops in sorted_dialects:
        dialect_total = sum(ops.values())
        total_ops += dialect_total
        total_unique_ops += len(ops)

        print(f"\n{dialect} (Total: {dialect_total} ops)")
        print("-" * 60)

        # Sort operations by count
        sorted_ops = sorted(ops.items(), key=lambda x: x[1], reverse=True)
        for op, count in sorted_ops:
            print(f"{'':<20} {op:<30} {count:<10}")

    print("\n" + "=" * 60)
    print(f"Total dialects: {len(dialect_ops)}")
    print(f"Total unique operations: {total_unique_ops}")
    print(f"Total operation instances: {total_ops}")

if __name__ == "__main__":
    main()
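As a quick, self-contained sanity check of the regex and skip list this script relies on (the sample MLIR text below is invented for illustration, not taken from subgraph0.mlir):

```python
import re
from collections import Counter, defaultdict

# Same pattern and skip list as AnalyseDialectOps.py
OP_PATTERN = r'([a-zA-Z_][a-zA-Z0-9_]*)\.([a-zA-Z_][a-zA-Z0-9_]*)'
SKIPPED = {'func', 'module', 'memref', 'arith', 'builtin'}

def count_dialect_ops(text):
    """Count (dialect, op) occurrences, skipping non-dialect prefixes."""
    counts = defaultdict(Counter)
    for dialect, op in re.findall(OP_PATTERN, text):
        if dialect.lower() in SKIPPED:
            continue
        counts[dialect][op] += 1
    return counts

sample = """
%0 = tosa.reduce_sum %arg0 {axis = 1 : i32} : tensor<1x32xf32>
%1 = tosa.reduce_sum %0 {axis = 0 : i32} : tensor<1x1xf32>
%2 = arith.addf %a, %b : f32
"""
print(dict(count_dialect_ops(sample)))  # tosa: reduce_sum counted twice; arith skipped
```

Note that the pattern matches any dotted identifier pair, so attribute names like `some.attr` would also be counted; for a rough per-dialect census that is usually acceptable.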
22 changes: 22 additions & 0 deletions examples/BuddyDeepSeekR1/makefile
@@ -0,0 +1,22 @@
BUDDY_OPT := ../../build/bin/buddy-opt
MLIR_OPT := ../../llvm/build/bin/mlir-opt
MLIR_TRANSLATE := ../../llvm/build/bin/mlir-translate
MLIR_CPU_RUNNER := ../../llvm/build/bin/mlir-cpu-runner
LLC := ../../llvm/build/bin/llc
OPT_FLAG := -O0

ifeq ($(shell uname),Linux)
MLIR_RUNNER_UTILS := ../../llvm/build/lib/libmlir_runner_utils.so
MLIR_C_RUNNER_UTILS := ../../llvm/build/lib/libmlir_c_runner_utils.so
LIB_OMP := ../../llvm/build/lib/libomp.so
MTRIPLE := x86_64-unknown-linux-gnu
else ifeq ($(shell uname),Darwin)
MLIR_RUNNER_UTILS := ../../llvm/build/lib/libmlir_runner_utils.dylib
MLIR_C_RUNNER_UTILS := ../../llvm/build/lib/libmlir_c_runner_utils.dylib
MTRIPLE := x86_64-apple-darwin
endif

lower-deepseek-r1-tosa:
	@${MLIR_OPT} ./subgraph0.mlir \
		-pass-pipeline "builtin.module(func.func(tosa-to-linalg-named),func.func(tosa-to-linalg),func.func(tosa-to-tensor),func.func(tosa-to-arith))" \
		-o ./subgraph0-lower.mlir
121 changes: 121 additions & 0 deletions examples/BuddyNext/compare_outputs.sh
Member
Please use FileCheck to verify correctness and use rtclock to obtain performance results. See here as an example.
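For readers unfamiliar with that pattern, here is a minimal sketch of such a test. The pass list and shared-library path are placeholders (not the exact BuddyNext flags), and `rtclock`/`printF64` are assumed to come from the MLIR runner utility libraries:

```mlir
// RUN: buddy-opt %s <lowering passes> | \
// RUN: mlir-cpu-runner -e main -entry-point-result=void \
// RUN:   -shared-libs=<libmlir_runner_utils> | FileCheck %s

func.func private @rtclock() -> f64
func.func private @printF64(f64)

func.func @main() {
  %t0 = call @rtclock() : () -> f64
  // ... kernel under test ...
  %t1 = call @rtclock() : () -> f64
  %elapsed = arith.subf %t1, %t0 : f64
  // CHECK: {{[0-9]+\.[0-9]+}}
  call @printF64(%elapsed) : (f64) -> ()
  return
}
```

In practice the CHECK lines would also pin the expected output data, so FileCheck verifies correctness while rtclock reports the timing.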

Author

This script verifies that different pass pipelines produce the same results for the same file, and computes the speedup ratio to measure efficiency.

Author
Understood — for any MLIR file, correctness should be verified with FileCheck.

@@ -0,0 +1,121 @@
#!/bin/bash

# Set up colored output
GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m' # No Color
YELLOW='\033[1;33m'
BLUE='\033[0;34m'

# Check command-line arguments
if [ $# -ne 2 ]; then
    echo -e "${YELLOW}Usage: $0 <command1> <command2>${NC}"
    echo "Example: $0 'next-reduce-sum-run' 'next-reduce-sum-vec-manual-run'"
    exit 1
fi

CMD1="$1"
CMD2="$2"
RUNS=10

# Create temporary files to store outputs
OUTPUT1=$(mktemp)
OUTPUT2=$(mktemp)
PROCESSED1=$(mktemp)
SPEEDUPS=$(mktemp)

# Extract timing data (plain or scientific-notation floats)
extract_time() {
    local file="$1"
    grep -o '[0-9]\+\.[0-9]\+e[-+]\?[0-9]\+\|[0-9]\+\.[0-9]\+' "$file"
}

# Convert a time value to seconds
convert_to_seconds() {
    local time_val="$1"
    if [[ $time_val =~ e ]]; then
        echo "$time_val" | sed 's/e/*10^/' | bc -l
    else
        printf "%.9f" "$time_val"
    fi
}

# Compute the mean of the values in a file
calculate_mean() {
    local file="$1"
    local sum=0
    local count=0
    while read -r line; do
        sum=$(echo "$sum + $line" | bc -l)
        count=$((count + 1))
    done < "$file"
    if [ $count -gt 0 ]; then
        echo "scale=9; $sum / $count" | bc -l
    else
        echo "0"
    fi
}

echo -e "${BLUE}Running each version $RUNS times...${NC}"

# Run both commands and record the per-run speedup
for ((i=1; i<=$RUNS; i++)); do
    echo -ne "\rRun $i/$RUNS"

    # Run the first command
    TEMP_OUT1=$(mktemp)
    make $CMD1 > "$TEMP_OUT1" 2>/dev/null
    TIME1=$(extract_time "$TEMP_OUT1")
    if [ -n "$TIME1" ]; then
        TIME1=$(convert_to_seconds "$TIME1")
    fi

    # Run the second command
    TEMP_OUT2=$(mktemp)
    make $CMD2 > "$TEMP_OUT2" 2>/dev/null
    TIME2=$(extract_time "$TEMP_OUT2")
    if [ -n "$TIME2" ]; then
        TIME2=$(convert_to_seconds "$TIME2")
    fi

    # Save the first run's output for comparison,
    # masking memref base addresses, which differ between runs
    if [ $i -eq 1 ]; then
        grep "data =" "$TEMP_OUT1" | sed 's/base@ = [^[:space:]]*/base@ = <addr>/g' > "$PROCESSED1"
        grep "data =" "$TEMP_OUT2" | sed 's/base@ = [^[:space:]]*/base@ = <addr>/g' > "$OUTPUT2"
    fi

    # Compute this run's speedup
    if [ -n "$TIME1" ] && [ -n "$TIME2" ] && [ "$TIME1" != "0" ] && [ "$TIME2" != "0" ]; then
        echo "scale=9; $TIME1/$TIME2" | bc -l >> "$SPEEDUPS"
    fi

    rm "$TEMP_OUT1" "$TEMP_OUT2"
done
echo

# Compare the data outputs
echo -e "\n${BLUE}Comparing output data:${NC}"
if diff "$PROCESSED1" "$OUTPUT2" > /dev/null; then
    echo -e "${GREEN}✓ Outputs match! Both versions produce the same results.${NC}"
else
    echo -e "${RED}✗ Outputs differ! Found differences:${NC}"
    echo "----------------------------------------"
    diff "$PROCESSED1" "$OUTPUT2"
    echo "----------------------------------------"
fi

# Compute the mean speedup
echo -e "\n${BLUE}Performance Comparison:${NC}"
SPEEDUP_MEAN=$(calculate_mean "$SPEEDUPS")

if [ -n "$SPEEDUP_MEAN" ] && [ "$SPEEDUP_MEAN" != "0" ]; then
    if [ $(echo "$SPEEDUP_MEAN > 1" | bc -l) -eq 1 ]; then
        printf "${GREEN}Second version is %.2fx faster${NC}\n" "$SPEEDUP_MEAN"
    else
        SLOWDOWN=$(echo "scale=2; 1/$SPEEDUP_MEAN" | bc -l)
        printf "${RED}Second version is %.2fx slower${NC}\n" "$SLOWDOWN"
    fi
fi

# Clean up temporary files
rm "$OUTPUT1" "$OUTPUT2" "$PROCESSED1" "$SPEEDUPS"
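For reference, the per-run averaging the script delegates to `bc` can be sketched in Python (a hypothetical helper, not part of the PR; timing values are invented):

```python
def mean_speedup(times1, times2):
    """Mean of per-run ratios time1/time2, mirroring the script's SPEEDUPS file."""
    ratios = [t1 / t2 for t1, t2 in zip(times1, times2) if t1 > 0 and t2 > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0

# Two runs: baseline took 2ms and 4ms, vectorized version 1ms each time.
print(mean_speedup([2.0e-3, 4.0e-3], [1.0e-3, 1.0e-3]))  # → 3.0
```

Averaging the per-run ratios (rather than the ratio of averages) is what the shell script does; the two differ when run times are noisy, which is worth keeping in mind when reading the reported speedup.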