"What I also noticed in the model was that it was very verbose and was generating far more tokens than the base model. There were a lot of questions in which the model just kept on generating tokens until it hit the 1024-token cutoff."
Vipul Sehgal
Paper Club Presenter
Qwen