Intel Neural Chat
Fine-tuned 7B-parameter LLM models, one of which reached the top of the Hugging Face 7B LLM Leaderboard.
Here is a collective list of the instruction datasets used for Neural Chat fine-tuning. In total, they contain about 1.5M instruction samples and 5M tokens.
| Type      | Language | Dataset                               | Samples        |
|-----------|----------|---------------------------------------|----------------|
| HC3       | en       | HC3                                   | 24K            |
| dolly     | en       | databricks-dolly-15k                  | 15K            |
| alpaca-zh | zh       | tigerbot-alpaca-zh-0.5m               | 500K           |
| alpaca-en | en       | TigerResearch/tigerbot-alpaca-en-50k  | 50K            |
| math      | en       | tigerbot-gsm-8k-en                    | 8K             |
| general   | en       | tigerbot-stackexchange-qa-en-0.5m     | 500K           |
| OpenOrca  | en       | Open-Orca/OpenOrca                    | 400K (sampled) |
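For illustration, the sketch below shows one way such a mixture could be assembled with the Hugging Face `datasets` library, using Dolly and OpenOrca as examples; the remaining sources would be handled the same way. The Hub repo IDs match the organizations credited below, but the field mapping and the `to_common_schema` helper are assumptions for illustration, not the NeuralChat team's actual preprocessing.

```python
# A minimal sketch, assuming the `datasets` library and the Hub repo IDs
# below. The `to_common_schema` helper is hypothetical, introduced only to
# show projecting differing source schemas onto one shared schema.
from datasets import load_dataset, concatenate_datasets

# Dolly ships as a single 15K-example train split.
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

# OpenOrca is much larger; the mixture uses a ~400K-example random sample.
openorca = (
    load_dataset("Open-Orca/OpenOrca", split="train")
    .shuffle(seed=42)
    .select(range(400_000))
)

def to_common_schema(example, instruction_key, response_key):
    """Project a source-specific record onto (instruction, response)."""
    return {
        "instruction": example[instruction_key],
        "response": example[response_key],
    }

# Dolly uses `instruction`/`response`; OpenOrca uses `question`/`response`.
dolly = dolly.map(
    to_common_schema,
    fn_kwargs={"instruction_key": "instruction", "response_key": "response"},
    remove_columns=dolly.column_names,
)
openorca = openorca.map(
    to_common_schema,
    fn_kwargs={"instruction_key": "question", "response_key": "response"},
    remove_columns=openorca.column_names,
)

# Concatenation requires identical features, which the mapping guarantees.
mixture = concatenate_datasets([dolly, openorca]).shuffle(seed=42)
print(mixture)
```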
The collective dataset has been validated on multiple LLMs (such as MPT, Llama, and Llama2) by the NeuralChat team (Kaokao Lv, Wenxin Zhang, Xuhui Ren, and Haihao Shen) from Intel/SATG/AIA/AIPT. Thanks to Hello-SimpleAI, databricks, TigerResearch/TigerBot, and Open-Orca for releasing these open-source instruction datasets.