Skip to content
This repository was archived by the owner on Oct 6, 2025. It is now read-only.

RuntimeError: CUDA error: device-side assert triggered #17

@shivam1702

Description

@shivam1702

I was trying to run the model on my custom data of KG triples, to compare its performance, however I encountered a problem.

Upon running the training command for policy gradient model:
./experiment.sh configs/<model>.sh --train 0

Encountered the following error:
RuntimeError: CUDA error: device-side assert triggered

Full stack trace:

 33%|████████████████████████████████████████████████                                                                                                | 226/677 [01:38<02:55,  2.58it/s]
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:256: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [283,0,0], thread: [0,0,0] Assertion `sum > accZero` failed.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 765, in <module>
    run_experiment(args)
  File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 746, in run_experiment
    train(lf)
  File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 235, in train
    lf.run_train(train_data, dev_data)
  File "/workspace/KGReasoning/code/MultiHopKG/src/learn_framework.py", line 108, in run_train
    loss = self.loss(mini_batch)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 58, in loss
    output = self.rollout(e1, r, e2, num_steps=self.num_rollout_steps)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 135, in rollout
    sample_outcome = self.sample_action(db_outcomes, inv_offset)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 205, in sample_action
    sample_outcome = sample(action_space, action_dist)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 190, in sample
    sample_action_dist = apply_action_dropout_mask(action_dist, action_mask)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 177, in apply_action_dropout_mask
    action_keep_mask = var_cuda(rand > self.action_dropout_rate).float()
  File "/workspace/KGReasoning/code/MultiHopKG/src/utils/ops.py", line 121, in var_cuda
    return Variable(x, requires_grad=requires_grad).cuda()
RuntimeError: CUDA error: device-side assert triggered

Kindly help me debug this, possible error sources and how to remove them.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions