support for trust_remote_code / 8k context #410
Being able to use a monkey patch would be cool, too, but I assume that's even more work.
What I'm most interested in is being able to use models that use this: Most of them are 8k. https://huggingface.co/TheBloke/airoboros-33B-gpt4-1-4-SuperHOT-8K-fp16/tree/main
This is planned as a separate add-on, but it is currently unfinished.
Oh, OK, fair enough. Whenever you have a spare moment, would you kindly point me to the call in the code that loads a 16-bit llama-based model (the kind I'd download from HF), so I can rig it to work myself? When I have the time, I'll figure out how to get Python to tell me the line number. If that happens before you get around to replying, I'll close out the issue. It could be either the code in KoboldAI or the code in transformers itself; I don't care which.
The easiest way to do it is with our Basic HF backend, since there it will be in the `from_pretrained` lines; in the main backend it's quite complicated. The hold-up is that the Basic HF backend is unfinished and unstable, so your mileage may vary considerably.
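For reference, adding the flag to the `from_pretrained` lines might look roughly like the sketch below. It is not KoboldAI's code and does not show where the Basic HF backend makes its call; `hf_load_kwargs` and `load_fp16` are hypothetical names. It only assumes the standard transformers parameters `trust_remote_code`, `revision`, and `torch_dtype`, and it loads in 16-bit as requested. Pinning `revision` to a commit is optional but keeps a later push to the repo from changing the code you execute.

```python
from typing import Any, Dict, Optional


def hf_load_kwargs(trust_remote_code: bool = True, revision: Optional[str] = None) -> Dict[str, Any]:
    """Extra keyword arguments shared by the tokenizer and model from_pretrained calls (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"trust_remote_code": trust_remote_code}
    if revision is not None:
        # Pinning a branch, tag, or commit hash fixes which remote modeling code gets run.
        kwargs["revision"] = revision
    return kwargs


def load_fp16(model_name: str, trust_remote_code: bool = True, revision: Optional[str] = None):
    """Load a 16-bit causal LM and its tokenizer from the Hub (hypothetical helper; downloads weights)."""
    # Imported lazily so hf_load_kwargs() works even without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = hf_load_kwargs(trust_remote_code, revision)
    tokenizer = AutoTokenizer.from_pretrained(model_name, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, **kwargs)
    return tokenizer, model
```

Only enable `trust_remote_code` for repos you trust, since it executes Python shipped with the model.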
Hmm, yeah, I'm having some issues with it. :( Check this out, though: Also, there's this: The patch is three lines. That code ameliorates the increase in perplexity. Here's a Colab: https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=b80b3f37
I just noticed everything you merged. Thanks! I'd been hopping between forks, and this makes my life a lot easier.
In case you aren't aware, transformers now supports RoPE scaling: https://huggingface.co/docs/transformers/main/model_doc/llama#transformers.LlamaConfig
We automatically use RoPE scaling if it's present in a model's config. Manual control for it is planned.
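Until manual control exists, the scaling can be set by hand when calling transformers directly. The sketch below assumes the `rope_scaling` format documented for `LlamaConfig` around transformers 4.31: a dict with exactly a `type` key (`"linear"` or `"dynamic"`) and a float `factor` greater than 1. Newer releases rename the key to `rope_type`, so check the version you have. It also assumes a 2048-token base model stretched to 8192 tokens (factor 4.0). `rope_scaling_for` and `load_with_rope_scaling` are hypothetical helpers, not KoboldAI's planned implementation.

```python
from typing import Dict, Union


def rope_scaling_for(target_ctx: int, trained_ctx: int = 2048, kind: str = "linear") -> Dict[str, Union[str, float]]:
    """Build a LlamaConfig-style rope_scaling dict that stretches trained_ctx to target_ctx (hypothetical)."""
    if kind not in ("linear", "dynamic"):
        raise ValueError(f"unsupported rope scaling type: {kind!r}")
    factor = target_ctx / trained_ctx  # true division, so factor is always a float as the config expects
    if factor <= 1.0:
        raise ValueError("target_ctx must be larger than trained_ctx")
    return {"type": kind, "factor": factor}


def load_with_rope_scaling(model_name: str, target_ctx: int, trained_ctx: int = 2048):
    """Load a 16-bit model with rope_scaling overriding whatever config.json says (hypothetical)."""
    import torch
    from transformers import AutoModelForCausalLM

    # Keyword arguments matching config attributes update the loaded config before the model is built.
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        rope_scaling=rope_scaling_for(target_ctx, trained_ctx),
    )
```

For an 8k model fine-tuned from a 2k base, `rope_scaling_for(8192)` gives `{"type": "linear", "factor": 4.0}`.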
Ooh, nice. That makes my life a lot easier. Incidentally, I stumbled upon this: https://github.com/jquesnelle/scaled-rope. Basically, it builds a wheel with the code needed to support all these different scaling methods, along with patch functions, e.g. `def patch_llama_for_linear_scaled_rotary_embeddings(model, scale):`. I found it while trying to load some models whose layers were causing problems, and it takes care of those.
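The idea behind "linear scaled rotary embeddings" can be shown without any model: positions are divided by `scale` before the RoPE rotation angles are computed, so position 8191 with a scale of 4 lands inside the 0 to 2047 range a 2k model was trained on. The pure-Python sketch below illustrates that technique only. It is not the scaled-rope package's code, and `rope_inv_freq`, `rope_angles`, and `apply_rope` are hypothetical names. The pairing of dimension `i` with `i + dim/2` follows the "rotate half" convention used by the Hugging Face Llama implementation.

```python
import math
from typing import List, Sequence


def rope_inv_freq(dim: int, base: float = 10000.0) -> List[float]:
    """RoPE inverse frequencies base^(-2i/dim) for i in [0, dim/2)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rope_angles(position: float, dim: int, scale: float = 1.0, base: float = 10000.0) -> List[float]:
    """Rotation angle per frequency; linear scaling divides the position by `scale`."""
    scaled_pos = position / scale  # e.g. position 8191 with scale 4 behaves like position 2047.75
    return [scaled_pos * f for f in rope_inv_freq(dim, base)]


def apply_rope(x: Sequence[float], position: float, scale: float = 1.0, base: float = 10000.0) -> List[float]:
    """Rotate vector x for one position, pairing dimension i with i + dim/2 (rotate-half convention)."""
    dim = len(x)
    if dim % 2:
        raise ValueError("RoPE needs an even head dimension")
    half = dim // 2
    out = list(x)
    for i, angle in enumerate(rope_angles(position, dim, scale, base)):
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * cos_a - b * sin_a  # 2-D rotation of the pair (a, b) by `angle`
        out[i + half] = b * cos_a + a * sin_a
    return out
```

Because each pair is only rotated, vector norms are preserved; the scaling changes which angles a long position gets, not the magnitude of the query or key.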
Hello,
There are a number of models I'd like to try which require this. I know that I asked you about this in the past, and IIRC you mentioned that you removed it because you wanted to implement it properly.
In the interim, would you kindly tell me what I have to change in order to pass this flag to the appropriate call(s)? You don't have to do it for every conceivable situation or type of model, just for `hf` or `hf_torch` or whichever backend is necessary (16-bit; don't worry about loading in 8 or 4 bit) to load e.g. llama-based models, maybe Falcon, etc. I'd just as happily patch transformers itself; whatever gets it to work. I'm mostly trying to load the models with an increased context size.
Thanks.