

Evaluation Error in IDOL #48

Closed
aylinaydincs opened this issue Nov 29, 2022 · 4 comments · May be fixed by #61

Comments

@aylinaydincs

Hi,
When I try to train a model for IDOL over an SSH connection to a server, I get the following error in the evaluation stage:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/train_net.py", line 161, in main
    res = Trainer.test(cfg, model)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/defaults.py", line 617, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/aylinaydin/Project/VNext/detectron2/evaluation/evaluator.py", line 158, in inference_on_dataset
    outputs = model(inputs)
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 284, in forward
    0])  # (height, width) is resized size, images.image_sizes[0] is original size
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 357, in inference
    det_masks = output_mask[indices]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
I tried changing the device of each tensor, but I couldn't fix it. Can you help me?

@aylinaydincs
Author

My server has 20 CPUs and 2 GPUs.

@aylinaydincs
Author

I still couldn't solve the error "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)".
It comes from the inference() method; I think the tensors end up on different devices. Should I change self.merge_on_cpu = cfg.MODEL.IDOL.MERGE_ON_CPU in idol.py?

@ldknight

ldknight commented Dec 5, 2022

You can add the following two lines of code at the error location, just before det_masks = output_mask[indices]. With this change I successfully ran inference on a single GPU.

    if not isinstance(indices, list):
        indices = indices.cpu()
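As a self-contained sketch of the same device-alignment idea: safe_index below is a hypothetical helper, not part of IDOL, and it moves the indices onto the indexed tensor's device (a variant of the .cpu() fix above that also works when the tensor itself stays on the GPU).

```python
import torch

def safe_index(output_mask, indices):
    # PyTorch requires index tensors to live on the same device as the
    # tensor being indexed. If `indices` is a tensor, move it onto
    # `output_mask`'s device before indexing; plain Python lists need
    # no device handling.
    if torch.is_tensor(indices):
        indices = indices.to(output_mask.device)
    return output_mask[indices]

# CPU-only demonstration of the indexing pattern:
masks = torch.arange(12.0).reshape(4, 3)   # 4 rows of 3 values
idx = torch.tensor([0, 2])                 # select rows 0 and 2
print(safe_index(masks, idx))
```

Forcing .cpu() (as in the snippet above) also works, but copying the small index tensor to the mask tensor's device avoids pulling a GPU-resident mask back to the CPU.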

@aylinaydincs
Author

Thank you for your suggestion, I think it works!
