
potential little fixes appendix-D4.ipynb #427

Merged
merged 2 commits into rasbt:main from casinca:Append-D4-fixes on Nov 3, 2024

Conversation

@casinca (Contributor) commented Nov 2, 2024

While toying around to compare with my own code, I noticed two things in the complete training function from D.4 that I'm not sure about. Let me know what you think.

You showed 2 ways of passing the peak_lr value to the optimizer: passing it directly as an argument to the train_model function, or retrieving it from the optimizer's parameters inside train_model with peak_lr = optimizer.param_groups[0]["lr"], which is the way implemented in the notebook and the book.

But in the code, the lr argument for optimizer = torch.optim.AdamW(model.parameters(), weight_decay=0.1) is never passed as lr=peak_lr, so it falls back to AdamW's default of 1e-3 instead of the intended peak_lr = 5e-4 when we retrieve it with peak_lr = optimizer.param_groups[0]["lr"]
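A minimal, self-contained sketch of the mismatch (a dummy nn.Linear stands in for the GPT model; peak_lr = 5e-4 as in the notebook):

```python
import torch

model = torch.nn.Linear(4, 4)  # dummy stand-in for the GPT model
peak_lr = 5e-4                 # the intended peak learning rate

# As currently written: no lr= argument, so AdamW falls back to its default of 1e-3
optimizer = torch.optim.AdamW(model.parameters(), weight_decay=0.1)
print(optimizer.param_groups[0]["lr"])  # 0.001

# Proposed fix: pass the peak learning rate explicitly
optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, weight_decay=0.1)
print(optimizer.param_groups[0]["lr"])  # 0.0005
```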

There is also a gap in the gradient clipping: there is no clipping for the first step after the warmup ends, when global_step == warmup_steps, because the warmup branch is entered with if global_step < warmup_steps: while the clipping only kicks in with if global_step > warmup_steps:

I'm not sure it was intended to have no clipping at the step where the lr is at its maximum, because you also mentioned:

Applies gradient clipping after the warmup phase to avoid exploding gradients

Example of an output with warmup_steps = 18, printing yes/no under the above conditions:
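A toy version of that check (hypothetical, not the notebook's code) makes the one-step gap visible:

```python
warmup_steps = 18

for global_step in range(warmup_steps - 1, warmup_steps + 2):
    in_warmup = global_step < warmup_steps  # warmup branch condition
    clipped = global_step > warmup_steps    # clipping condition as currently written
    print(f"step {global_step}: warmup={in_warmup}, clipped={clipped}")

# step 17: warmup=True, clipped=False
# step 18: warmup=False, clipped=False   <- neither warmed up nor clipped
# step 19: warmup=False, clipped=True
#
# Changing the clipping check to `global_step >= warmup_steps` closes the gap.
```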

- lr missing argument for passing peak_lr to the optimizer
- filling 1 step gap for gradient clipping

@rasbt (Owner) commented Nov 3, 2024

Thanks a lot for the PR. Yes, you are right: since we retrieve the peak LR from the optimizer in the training function, we should initialize the optimizer with the peak LR.

I agree with your update, but I'll set the peak_lr to 0.001 so that the loss and plots afterwards don't change too much (to make it a bit less confusing for the readers)
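In code, that adjustment presumably amounts to something like this (same dummy-model setup as the sketch above):

```python
import torch

model = torch.nn.Linear(4, 4)  # dummy stand-in for the GPT model

# Initializing with 0.001 (AdamW's previous implicit default) keeps the retrieved
# peak_lr, and therefore the loss values and plots, essentially unchanged.
peak_lr = 0.001
optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, weight_decay=0.1)
assert optimizer.param_groups[0]["lr"] == peak_lr
```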

Regarding the second point, that's a great catch as well. I just double-checked and you are correct!

rasbt merged commit 9ce0be3 into rasbt:main on Nov 3, 2024
8 checks passed
casinca deleted the Append-D4-fixes branch on November 4, 2024, 18:31