Description
I recently tested using dynolog to dynamically collect profiling data from PyTorch programs and found that a core dump occurs with the following NVIDIA PyTorch container images:
nvcr.io/nvidia/pytorch:25.01-py3
nvcr.io/nvidia/pytorch:25.02-py3
GPU: A800
My test program is as follows:
```python
import os
import time

import torch
import torch.nn as nn
from torchvision.models import resnet18

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ResNet-18 trained repeatedly on one fixed random batch of 800 images
model = resnet18().to(device)
inputs = torch.randn(800, 3, 224, 224).to(device)
targets = torch.randint(0, 1000, (800,)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)


def test(iterations=200):
    # Print the PID so the process can be identified when triggering the trace
    print("pid: {}".format(os.getpid()))
    # Warm-up forward pass
    _ = model(inputs)
    times = []
    for idx in range(1, iterations):
        start_time = time.perf_counter()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Host-side time only: CUDA work is asynchronous and is not synchronized here
        times.append(time.perf_counter() - start_time)
        print("Iter {} cost {:.3f}".format(idx, times[-1]))
    return times


if __name__ == "__main__":
    test()
```
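Because the failure shows up as a core dump rather than a Python exception, it can help to run the reproducer as a child process and check how it terminated. The following is a minimal sketch, assuming a POSIX system and that the test program above is saved as `repro.py` (a hypothetical file name). `run_and_detect_crash` is a hypothetical helper, and setting `KINETO_USE_DAEMON=1` follows the dynolog documentation for letting PyTorch's Kineto profiler register with the daemon, so treat it as an assumption about this particular setup.

```python
import os
import signal
import subprocess
import sys
from typing import Dict, Optional, Sequence


def run_and_detect_crash(
    cmd: Sequence[str], extra_env: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Run cmd as a child process.

    Returns the name of the signal that killed the child (for example
    "SIGSEGV" or "SIGABRT", which typically produce a core dump), or None
    if the child exited normally, whatever its exit status.
    """
    env = os.environ.copy()
    if extra_env:
        env.update(extra_env)
    proc = subprocess.run(list(cmd), env=env)
    # On POSIX, a negative return code -N means the child was terminated by signal N.
    if proc.returncode < 0:
        signum = -proc.returncode
        try:
            return signal.Signals(signum).name
        except ValueError:
            # Some signal numbers (e.g. most real-time signals) have no enum member.
            return "signal {}".format(signum)
    return None


# Hypothetical usage with the reproducer saved as repro.py:
#   crashed = run_and_detect_crash(
#       [sys.executable, "repro.py"], extra_env={"KINETO_USE_DAEMON": "1"}
#   )
#   print(crashed or "exited without a signal")
```

Checking the termination signal distinguishes a native crash inside the profiler path from an ordinary Python error, which exits with a positive status instead.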