potential little fixes appendix-D4 .ipynb
#427
Merged
While toying around to compare with my code, I found two things I'm not sure about in the complete training function from D.4. Let me know what you think.
You showed 2 ways of passing the `peak_lr` value to the optimizer: passing it directly as an argument to the `train_model` function, or retrieving it from the optimizer's parameters inside the `train_model` function with `peak_lr = optimizer.param_groups[0]["lr"]`, which is the way implemented in the notebook and the book.
But in the code, the `lr` argument for `optimizer = torch.optim.AdamW(model.parameters(), weight_decay=0.1)` is never passed as `lr=peak_lr`, so it falls back to AdamW's default of 1e-3 instead of `peak_lr = 5e-4` when we retrieve it with `peak_lr = optimizer.param_groups[0]["lr"]`.
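A minimal sketch of this first point, assuming a throwaway `torch.nn.Linear` model as a stand-in for the notebook's model (the names `opt_default` and `opt_explicit` are only for illustration). It compares the learning rate read back from the optimizer with and without `lr=peak_lr`:

```python
import torch

# Stand-in model (assumption): only needed to give the optimizer some parameters.
model = torch.nn.Linear(4, 2)
peak_lr = 5e-4

# As in the notebook: no lr argument, so AdamW falls back to its default.
opt_default = torch.optim.AdamW(model.parameters(), weight_decay=0.1)

# Possible fix: pass the peak learning rate explicitly.
opt_explicit = torch.optim.AdamW(model.parameters(), lr=peak_lr, weight_decay=0.1)

# This is the lookup train_model uses to recover peak_lr.
lr_default = opt_default.param_groups[0]["lr"]
lr_explicit = opt_explicit.param_groups[0]["lr"]

print(lr_default)   # 0.001
print(lr_explicit)  # 0.0005
```

With the explicit argument, the value retrieved inside `train_model` would match the intended `peak_lr = 5e-4`.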
There is a gap in the gradient clipping: no clipping is applied on the 1st step after the warmup ends, when `global_step == warmup_steps`, because the warmup stops at `if global_step < warmup_steps:` and the clipping starts at `if global_step > warmup_steps:`.
I'm not sure it was intended to have no clipping when the lr is at its max, because you also mentioned:
Example of an output with `warmup_steps = 18` that prints yes/no under the above conditions
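The clipping gap can also be reproduced without the model. This is a standalone sketch, not the notebook's code: `trace_schedule`, `unhandled_steps`, and the `clip_from_peak` flag are hypothetical helpers that evaluate only the two conditions quoted above for each step, and `total_steps=22` is an arbitrary choice.

```python
def trace_schedule(warmup_steps, total_steps, clip_from_peak=False):
    """Return (step, in_warmup, clipped) for each step of a toy loop.

    clip_from_peak=False mirrors `if global_step > warmup_steps:`;
    clip_from_peak=True uses `>=` instead (one possible fix, an assumption).
    """
    rows = []
    for global_step in range(total_steps):
        in_warmup = global_step < warmup_steps
        if clip_from_peak:
            clipped = global_step >= warmup_steps
        else:
            clipped = global_step > warmup_steps
        rows.append((global_step, in_warmup, clipped))
    return rows


def unhandled_steps(rows):
    """Steps that are neither in warmup nor clipped."""
    return [step for step, in_warmup, clipped in rows
            if not in_warmup and not clipped]


warmup_steps = 18
current = trace_schedule(warmup_steps, total_steps=22)

# Print the steps around the end of warmup.
for step, in_warmup, clipped in current[16:21]:
    warmup_label = "yes" if in_warmup else "no"
    clip_label = "yes" if clipped else "no"
    print(f"step {step} | warmup: {warmup_label} | clipping: {clip_label}")

fixed = trace_schedule(warmup_steps, total_steps=22, clip_from_peak=True)
print(unhandled_steps(current))  # [18]
print(unhandled_steps(fixed))    # []
```

Changing the comparison to `>=` is only one way to close the gap. Whether step 18 should be clipped depends on what the training function intends at the peak learning rate.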