Sample dataset?
4
#23 opened 3 days ago
by
dweb
Details on the evaluation with lighteval
1
#22 opened 3 days ago
by
amaracani
Has this published dataset also gone through PII processing?
2
#20 opened 4 days ago
by
kimcando
tiny-fineweb
1
#19 opened 6 days ago
by
3thn
Training configs for data ablation study
1
#14 opened 10 days ago
by
jimmyhbx
Reprocessing for a new language
7
#12 opened 11 days ago
by
pere
Are copyrighted works included in this dataset?
4
#9 opened 14 days ago
by
umm-maybe
Any plans to train models on a larger subset of the dataset?
1
#8 opened 14 days ago
by
mrfakename
Split by languages?
3
#7 opened 14 days ago
by
mhenrichsen
Thank you for the great dataset
#5 opened 15 days ago
by
musicurgy
Torrent?
2
#4 opened 15 days ago
by
emilss
Scoring documents with LLM and making scores available as a quality filter (Ask-LLM)
1
#3 opened 15 days ago
by
Lauler