Adding a long prompt can help you fight LLM hallucinations. However, if you know exactly how you want your LLM output constrained, there are much better strategies!
Did you know you can force your LLM to ALWAYS generate valid JSON? Or to follow a well-defined answer template? You can do that and more with the transformers-compatible outlines library.
Constrained generation doesn't just give you control over your LLM -- your text generation application also becomes faster! The more constrained your text generation is, the bigger the speedup you'll see!
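Here's a minimal sketch of what JSON-constrained generation looks like, assuming the outlines 0.x-style API (the checkpoint and schema below are illustrative):

```python
# A minimal sketch of JSON-constrained generation with outlines
# (0.x-style API; the checkpoint and schema are illustrative).
import outlines
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    age: int

# Wrap a transformers checkpoint with outlines
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

# This generator can only emit JSON that matches the Character schema
generator = outlines.generate.json(model, Character)

character = generator("Invent a fictional character and describe them as JSON.")
print(character)  # always a valid Character instance, never malformed JSON
```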
Follow @remi and other outlines folks to stay on top of the constrained generation game
Up to 3x faster LLM generation with no extra resources/requirements - ngram speculation has landed in transformers!
All you need to do is add prompt_lookup_num_tokens=10 to your generate call, and you'll get faster generation!
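A minimal usage sketch (the checkpoint and prompt are illustrative):

```python
# Minimal usage sketch -- the checkpoint and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Input-grounded tasks (summarization, QA over a document, code editing)
# benefit most, since the output reuses chunks of the input.
prompt = "Summarize the following article:\n..."
inputs = tokenizer(prompt, return_tensors="pt")

# Enabling ngram speculation is a single extra argument to generate()
outputs = model.generate(**inputs, prompt_lookup_num_tokens=10, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```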
How does it work?
It builds on assisted generation, where a smaller assistant model drafts candidate sequences and the main model verifies them. The net result is a significant speedup whenever the main model agrees with the candidates! However, it does require a smaller model trained similarly, sharing the same tokenizer.
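For contrast, a sketch of classic assisted generation (both checkpoints are illustrative; the assistant must use the same tokenizer as the main model):

```python
# Classic assisted generation, for contrast -- both checkpoints are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
# The assistant is a smaller model sharing the same tokenizer
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```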
The idea, introduced (and implemented) by Apoorv Saxena, is to gather the candidate sequences from the input text itself: if the latest generated ngram also appears in the input, use its continuation there as the candidate! No smaller model is required, and you still get significant speedups.
In fact, the overhead of gathering and testing the candidates is so small that you should use this technique whenever possible!
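Here's a toy sketch of the candidate-gathering idea (a simplification, not the actual transformers implementation):

```python
# Toy illustration of prompt-lookup candidate gathering: find the most recent
# ngram earlier in the token sequence and propose its continuation as draft tokens.
def find_candidate(tokens: list[int], ngram_size: int = 3, num_candidate_tokens: int = 10) -> list[int]:
    if len(tokens) < ngram_size:
        return []
    ngram = tokens[-ngram_size:]
    # Scan earlier positions for the same ngram; use what followed it as the candidate.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == ngram:
            continuation = tokens[start + ngram_size:start + ngram_size + num_candidate_tokens]
            if continuation:
                return continuation
    return []

# Example: the trailing ngram (7, 8, 9) appeared earlier, so its continuation is proposed.
print(find_candidate([1, 7, 8, 9, 4, 5, 6, 7, 8, 9]))  # -> [4, 5, 6, 7, 8, 9]
```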