This repository was archived by the owner on Oct 6, 2025. It is now read-only.
RuntimeError: CUDA error: device-side assert triggered #17
Open
Description
I was trying to run the model on my own dataset of KG triples to compare its performance, but I ran into a problem.
When I ran the training command for the policy gradient model:
./experiment.sh configs/<model>.sh --train 0
I got the following error:
RuntimeError: CUDA error: device-side assert triggered
Full stack trace:
33%|████████████████████████████████████████████████ | 226/677 [01:38<02:55, 2.58it/s]
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:256: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [283,0,0], thread: [0,0,0] Assertion `sum > accZero` failed.
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 765, in <module>
run_experiment(args)
File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 746, in run_experiment
train(lf)
File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 235, in train
lf.run_train(train_data, dev_data)
File "/workspace/KGReasoning/code/MultiHopKG/src/learn_framework.py", line 108, in run_train
loss = self.loss(mini_batch)
File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 58, in loss
output = self.rollout(e1, r, e2, num_steps=self.num_rollout_steps)
File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 135, in rollout
sample_outcome = self.sample_action(db_outcomes, inv_offset)
File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 205, in sample_action
sample_outcome = sample(action_space, action_dist)
File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 190, in sample
sample_action_dist = apply_action_dropout_mask(action_dist, action_mask)
File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 177, in apply_action_dropout_mask
action_keep_mask = var_cuda(rand > self.action_dropout_rate).float()
File "/workspace/KGReasoning/code/MultiHopKG/src/utils/ops.py", line 121, in var_cuda
return Variable(x, requires_grad=requires_grad).cuda()
RuntimeError: CUDA error: device-side assert triggered
Could you help me debug this? What are the possible sources of this error, and how can I fix them?
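A note on reading the trace: the line from MultinomialKernel.cu (Assertion `sum > accZero` failed) comes from the multinomial sampling kernel and means that at least one row of the probabilities passed to the sampler did not have a sum greater than zero. CUDA kernels run asynchronously, so the Python frame shown last (var_cuda) is probably just where the error surfaced, not where it happened. Re-running with the environment variable CUDA_LAUNCH_BLOCKING=1 set should make the traceback point at the real call. What produces the empty row is a guess on my side: on custom data it could be an entity whose action space is empty, a row where action dropout masks out every action, or NaN values in the distribution. Below is a minimal sketch of a check for such rows. It is plain Python over nested lists (for a tensor, something like action_dist.cpu().tolist() would be needed first), and the name find_unsamplable_rows is hypothetical, not part of MultiHopKG.

```python
import math


def find_unsamplable_rows(dist_rows):
    """Return indices of rows that a multinomial sampler cannot draw from.

    A row is flagged if it contains NaN, infinity, or a negative entry,
    or if its entries do not sum to a value greater than zero.
    """
    bad = []
    for i, row in enumerate(dist_rows):
        has_invalid = any(math.isnan(p) or math.isinf(p) or p < 0 for p in row)
        total = sum(row)
        # "not total > 0" also catches a NaN total, since NaN > 0 is False.
        if has_invalid or not total > 0:
            bad.append(i)
    return bad


# Illustrative data only, not taken from the run above.
rows = [
    [0.5, 0.5, 0.0],           # fine
    [0.0, 0.0, 0.0],           # every action masked out, sum is zero
    [float("nan"), 0.2, 0.8],  # NaN in the distribution
]
print(find_unsamplable_rows(rows))
```

Running a check like this on the distribution just before it is sampled (on the batch where training stops, around step 226 of 677) would show whether the problem is an all-zero row or NaN values, which in turn narrows down whether the data or the training dynamics are at fault.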
yuanllong