

Evaluation Error in IDOL #48

Closed
aylinaydincs opened this issue Nov 29, 2022 · 4 comments · May be fixed by #61

Comments

@aylinaydincs

Hi,
When I try to train a model for IDOL over an SSH connection to a server, I get the following error in the evaluation stage:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/train_net.py", line 161, in main
    res = Trainer.test(cfg, model)
  File "/home/aylinaydin/Project/VNext/detectron2/engine/defaults.py", line 617, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/aylinaydin/Project/VNext/detectron2/evaluation/evaluator.py", line 158, in inference_on_dataset
    outputs = model(inputs)
  File "/home/aylinaydin/anaconda3/envs/project/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 284, in forward
    0])  # (height, width) is resized size, images.image_sizes[0] is original size
  File "/home/aylinaydin/Project/VNext/projects/IDOL/idol/idol.py", line 357, in inference
    det_masks = output_mask[indices]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
I tried changing the device of each tensor, but I couldn't fix it. Can you help me?

@aylinaydincs
Author

My server has 20 CPUs and 2 GPUs.

@aylinaydincs
Author

I still couldn't solve the error "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)".
It comes from the inference() method; I think the tensors end up on different devices. Should I change self.merge_on_cpu = cfg.MODEL.IDOL.MERGE_ON_CPU in idol.py?

@ldknight

ldknight commented Dec 5, 2022

You can add the following two lines of code at the error location, just before det_masks = output_mask[indices]. With this change I successfully ran inference on a single GPU.

    if not isinstance(indices, list):
        indices = indices.cpu()
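As a self-contained sketch of the same device-alignment idea: safe_index below is a hypothetical helper, not part of IDOL, and it moves the indices onto the indexed tensor's device (a variant of the .cpu() fix above that also works when the tensor itself stays on the GPU).

```python
import torch

def safe_index(output_mask, indices):
    # PyTorch requires index tensors to live on the same device as the
    # tensor being indexed. If `indices` is a tensor, move it onto
    # `output_mask`'s device before indexing; plain Python lists need
    # no device handling.
    if torch.is_tensor(indices):
        indices = indices.to(output_mask.device)
    return output_mask[indices]

# CPU-only demonstration of the indexing pattern:
masks = torch.arange(12.0).reshape(4, 3)   # 4 rows of 3 values
idx = torch.tensor([0, 2])                 # select rows 0 and 2
print(safe_index(masks, idx))
```

Forcing .cpu() (as in the snippet above) also works, but copying the small index tensor to the mask tensor's device avoids pulling a GPU-resident mask back to the CPU.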

@aylinaydincs
Author

Thank you for your suggestion, I think it works!
