Efficient mining using all the available computational resources #26
Hello, I've never tried this. In our setup we use slurm as the scheduler, which gives us containers with the right amount of GPUs/CPUs per job. You could try setting the CUDA device at the beginning of the embedding module based on the iteration value, maybe. Not sure if that would work. If you have an environment where you can test this, we would love a PR to do this.
Thanks for the reply. Indeed, adding the following lines at the beginning of the embedding module's `run` method does the trick:

```python
def run(
    self,
    iteration_value: tp.Optional[tp.Any] = None,
    iteration_index: int = 0,
):
    # Pick a CUDA device for this shard, based on the numeric shard index
    # embedded in the file name (second-to-last dot-separated component).
    num_gpus = self.config.encode.config.requirements.gpus_per_node
    try:
        rank = int(iteration_value.split(".")[-2]) % num_gpus
        torch.cuda.set_device(rank)
    except ValueError:
        # The file name does not carry a numeric shard index; keep the default device.
        pass
```

[...] I can open a PR if you like.

Best,
Z
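For context on how the rank is derived: the snippet above assumes the shard file name carries a numeric index as its second-to-last dot-separated component, so consecutive shards land on different GPUs. A tiny standalone sketch with made-up file names (the real names come from the preceding text-splitting step):

```python
num_gpus = 2
# Hypothetical shard names for illustration only.
for shard in ["corpus.en.0.gz", "corpus.en.1.gz", "corpus.en.2.gz", "corpus.en.3.gz"]:
    rank = int(shard.split(".")[-2]) % num_gpus
    print(f"{shard} -> cuda:{rank}")  # -> cuda:0, cuda:1, cuda:0, cuda:1
```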
Thanks for checking that it works. You probably want to use [...]
You can alternatively set up a slurm cluster on your machine. I just made my 2x RTX 3090 Ubuntu machine into a "slurm cluster".
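For reference, a minimal sketch of what a single-node slurm setup on such a machine can look like. This is not the exact procedure from the comment above: the package name, config paths, and GPU count are assumptions for a stock Ubuntu box with two GPUs.

```bash
# Install the slurm workload manager (Debian/Ubuntu package name).
sudo apt install slurm-wlm

# Declare the node and its GPUs in /etc/slurm/slurm.conf and /etc/slurm/gres.conf
# (GresTypes=gpu, Gres=gpu:2 on the NodeName line, one File=/dev/nvidiaN line per GPU),
# then restart the daemons.
sudo systemctl restart slurmctld slurmd

# Sanity checks: the node should show up as idle, and a job that asks for
# one GPU should see exactly one device.
sinfo
srun --gres=gpu:1 nvidia-smi
```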
Hello,

I have two monolingual datasets with about 1M sentences each and I would like to mine bitext from them. For this, I am following the quickstart and modifying `demo.yml` according to my needs. Here is what it looks like at the moment: [...]

To start mining, I run: [...]

With this configuration, the embedding step utilizes only 1 GPU. I have tried to modify the `embed_text` config options to make it run on multiple GPUs, but unsuccessfully. Is there a way to run the embedding step on multiple GPUs? In general, given a machine with `n_gpus` and `n_cpus`, what are the config parameters to modify to make full use of the available computational resources?

Thank you in advance,
Z