str_concat memory leak #92
Comments
Hi Martin. I think you are correct that the token memory is not explicitly freed before the program exits, and the tokens that are being merged are also not freed, which is probably more problematic. There is already a fix for this in a draft PR which ultimately puts all the pointers into Strings and Lists of Strings, so all memory should be freed without calling free directly.
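For contrast, here is a minimal C-style sketch of the explicit cleanup the draft PR makes unnecessary (`merge_step` and the ownership model are hypothetical, not the actual PR code): once the token list owns its strings, every merge has to free the two operands it replaces, which is exactly the bookkeeping Mojo's owned `String` and `List` types do automatically when values go out of scope.

```c
#include <stdlib.h>

/* Hypothetical sketch (not the actual PR code). Assumes a str_concat
   that returns a freshly malloc'd string owned by the caller, as
   sketched further down the thread. */
char *str_concat(const char *a, const char *b);

/* Merge tokens[i] and tokens[i+1] in place; returns the new count.
   Without the two free() calls, the replaced operands leak. */
int merge_step(char **tokens, int n, int i) {
    char *merged = str_concat(tokens[i], tokens[i + 1]);
    free(tokens[i]);               /* the two operands are replaced ... */
    free(tokens[i + 1]);           /* ... so they must be released */
    tokens[i] = merged;            /* the list now owns merged */
    for (int j = i + 1; j < n - 1; j++)
        tokens[j] = tokens[j + 1]; /* close the gap */
    return n - 1;
}
```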
Sounds very good. I am currently wondering whether I should go for Lists and Strings in my port or deal with memory management myself; I need to find out whether it is performance-critical, but I guess it's not. Thanks for the reply, great project.
Did you check the performance of this change? I am playing around with the Tokenizer and noticed that Dict is really quite slow compared to Python's {} in my tests. I have not yet checked whether a Mojo sort/search implementation following the C code is faster.
As I said in the same comment you quoted:
I didn't play with prompts much, which is when the dict lookup would be used. It is outside the transformer and is not measured in the tokens/sec benchmark. In previous benchmarks the vocab lookup was only taking 6 ms, so I doubt there will be a difference with Dict big enough to warrant deciding on performance rather than on simplifying the code base. But Llamatune reports the total runtime as well as tokens/sec, so any problems I have overlooked would be noticed there. The Llama2 vocab is 30,000 tokens, though, and the binary search being replaced is O(log N) average time while Dict is O(1).
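For reference, the kind of sorted-vocabulary binary search under discussion looks roughly like this (a simplified sketch in the spirit of the C source, not the actual code):

```c
#include <string.h>

/* Simplified sketch of a lookup over a lexicographically sorted
   vocabulary: O(log N) string comparisons per query, versus O(1)
   average for a hash-based Dict. In practice each entry would also
   carry the original token id alongside the string. */
int str_lookup(const char *s, char **sorted_vocab, int vocab_size) {
    int lo = 0, hi = vocab_size - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(s, sorted_vocab[mid]);
        if (cmp == 0)
            return mid;      /* index into the sorted array */
        if (cmp > 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;               /* not in the vocabulary */
}
```

At 30,000 tokens this is about 15 string comparisons per lookup, which is consistent with the measured difference against a Dict being too small to matter.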
I am also working on a port of llama2 to Mojo. Your port is excellent; I'm just doing it myself for the sake of learning, probably sticking closer to the C source, and we'll see how it goes. I'm struggling a bit with the tokenizer in Andrej's C code, and taking a look at your code, I wonder if there is a (probably not problematic) memory leak in the str_concat method: I can't see right now that the memory allocated there is freed at any point. I might be completely wrong, just thought to drop you a line.
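For concreteness, the suspected pattern would look something like this (a hypothetical reconstruction in the C idiom, not the actual Mojo code): the concatenation allocates a fresh buffer on every call, so if no caller ever frees the result, every merge attempt leaks one allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstruction of the suspected leak: str_concat
   allocates a new buffer on every call and no caller frees it. */
char *str_concat(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    char *out = malloc(la + lb + 1);
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1); /* copies the trailing '\0' too */
    return out; /* caller must free(); if nothing does, it leaks */
}
```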