[Submitted on 30 Sep 2024 (v1), last revised 15 Oct 2024 (this version, v2)]

Title: Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs

Authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Jan Ebert, Alexander Arno Weber, Richard Rutmann, Charvi Jain, Max Lübbering, Daniel Steinigen, Johannes Leveling, Katrin Klug, Jasper Schulze Buschhoff, Lena Jurkschat, Hammam Abdelwahab, Benny Jörg Stein, Karl-Heinz Sylla, Pavel Denisov, Nicolo' Brandizzi, Qasid Saleem, Anirban Bhowmick, Lennard Helmer, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Alex Jude, Lalith Manjunath, Samuel Weinbach, Carolin Penke, Oleg Filatov, Shima Asaadi, Fabio Barth, Rafet Sifa, Fabian Küch, Andreas Herten, René Jäkel, Georg Rehm, Stefan Kesselheim, Joachim Köhler, Nicolas Flores-Herr

Abstract: We present two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset comprising around 60% non-English data and utilizing a custom multilingual tokenizer, our models address the limitations of existing LLMs that predominantly focus on English or a few high-resource languages. We detail the models' development principles, i.e., data composition, tokenizer optimization, and training methodologies. The models demonstrate competitive performance across multilingual benchmarks, as evidenced by their performance on European versions of ARC, HellaSwag, MMLU, and TruthfulQA.

Submission history
From: Mehdi Ali
[v1] Mon, 30 Sep 2024 16:05:38 UTC (391 KB)
[v2] Tue, 15 Oct 2024 17:09:40 UTC (4,358 KB)
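The abstract credits a custom multilingual tokenizer with handling a roughly 60% non-English corpus efficiently. A common way to quantify such tokenizer quality is "fertility", the average number of tokens produced per word: an English-centric vocabulary fragments non-English words into many pieces, while a multilingual vocabulary keeps fertility low. The sketch below illustrates the metric only; `toy_tokenize`, the vocabularies, and the sample sentence are hypothetical stand-ins, not the paper's actual tokenizer or data.

```python
# Hypothetical sketch: comparing tokenizer "fertility" (tokens per word)
# across vocabularies. `toy_tokenize` is a greedy longest-match subword
# tokenizer standing in for a real trained tokenizer (e.g. BPE/Unigram).

def toy_tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocab;
    characters not covered by the vocab become single-char tokens."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest vocab entry that matches at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append(word[i])  # fall back to one character
                i += 1
    return tokens

def fertility(text, vocab):
    """Average number of tokens produced per whitespace-separated word.
    Lower is better: the tokenizer covers the language more compactly."""
    return len(toy_tokenize(text, vocab)) / len(text.split())

# Hypothetical vocabularies: one English-centric, one extended with
# German subwords (illustrating multilingual tokenizer optimization).
english_vocab = {"the", "lang", "uage", "model", "s"}
multi_vocab = english_vocab | {"die", "sprach", "modelle"}

sentence_de = "die Sprachmodelle"
print(fertility(sentence_de, english_vocab))  # high: words shatter into chars
print(fertility(sentence_de, multi_vocab))    # low: whole subwords match
```

A lower fertility on non-English text means shorter sequences, and hence cheaper training and inference for those languages, which is one practical motivation for building a custom multilingual tokenizer rather than reusing an English-centric one.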