Occitan to French: AI Translation Comparison

Updated 2026-03-10

Occitan is a Gallo-Romance language spoken by an estimated 500,000 to 800,000 people across southern France (Occitania), with smaller communities in Spain’s Val d’Aran (where it is co-official as Aranese), Monaco, and parts of Italy’s Piedmont valleys. Once a prestige literary language of medieval Europe, the language of the troubadours, Occitan has declined for centuries under French language policy, from the Ordinance of Villers-Cotterêts (1539) through to the Toubon Law (1994).

The language comprises six major dialects (Languedocien, Provençal, Gascon, Limousin, Auvergnat, and Vivaro-Alpine), each with distinct phonological and lexical features. Two competing orthographic standards exist: the classical norm (based on medieval spelling conventions) and the Mistralian norm (phonetic, used primarily in Provence). This dialectal and orthographic fragmentation severely limits AI training data, because digital Occitan content is sparse and split across variants.

Key translation challenges include Occitan’s enclitic pronoun system, subjunctive usage patterns that differ from French, and partitive constructions that do not map one-to-one onto French. Translation demand is driven by cultural preservation, education (calandretas, the Occitan-medium schools), regional government initiatives, literary heritage digitization, and the growing movement for official recognition.

This comparison evaluates five leading AI translation systems on Occitan-to-French accuracy, naturalness, and suitability for different use cases.

Translation comparisons are based on automated metrics and editorial evaluation. Quality varies by language pair and content type.

Accuracy Comparison Table

| System | BLEU Score | COMET Score | Editorial Rating (1-10) | Best For |
|---|---|---|---|---|
| Google Translate | 20.8 | 0.729 | 5.1 | Basic comprehension, short texts |
| DeepL | 16.5 | 0.694 | 4.3 | Very limited, not recommended as primary |
| GPT-4 | 25.7 | 0.771 | 6.3 | Complex content, literary texts |
| Claude | 22.9 | 0.749 | 5.6 | Formal documents, longer texts |
| NLLB-200 | 24.3 | 0.762 | 6.0 | Free, self-hosted, dedicated low-resource support |
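
As a rough sanity check on the table above, the sketch below ranks the five systems on each measure and tests whether the three rankings agree. The scores are copied from the table; the dictionary layout and the function name `rank_by` are illustrative choices and not part of any evaluation toolkit.

```python
# Scores copied from the accuracy comparison table above.
SCORES = {
    "Google Translate": {"bleu": 20.8, "comet": 0.729, "editorial": 5.1},
    "DeepL": {"bleu": 16.5, "comet": 0.694, "editorial": 4.3},
    "GPT-4": {"bleu": 25.7, "comet": 0.771, "editorial": 6.3},
    "Claude": {"bleu": 22.9, "comet": 0.749, "editorial": 5.6},
    "NLLB-200": {"bleu": 24.3, "comet": 0.762, "editorial": 6.0},
}


def rank_by(metric):
    """Return system names sorted from best to worst on one metric."""
    return sorted(SCORES, key=lambda name: SCORES[name][metric], reverse=True)


# One ranking per metric, then a check that all rankings are identical.
rankings = {metric: rank_by(metric) for metric in ("bleu", "comet", "editorial")}
metrics_agree = len({tuple(order) for order in rankings.values()}) == 1

for metric, order in rankings.items():
    print(f"{metric:>9}: {' > '.join(order)}")
print("All metrics agree on the ordering:", metrics_agree)
```

On these figures the automated metrics and the editorial rating give the same ordering, so the choice of measure does not change which system comes out ahead for this pair.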

Example Translations

Formal Business Email

Source: “Òsòsssòi senhor Bonafont, Amb aquela letra vos confirman l’acòrdi comercial que foguèt signat oficialament diluns passat entre nòstras doas entrepresas. Espèran una collaboracion frutuòsa.”

| System | Translation |
|---|---|
| Google | Cher Monsieur Bonafont, Par cette lettre, nous vous confirmons l’accord commercial qui a été signé officiellement lundi dernier entre nos deux entreprises. Nous espérons une collaboration fructueuse. |
| DeepL | Cher Monsieur Bonafont, Par cette lettre nous confirmons l’accord commercial signé lundi entre nos entreprises. Nous espérons une collaboration. |
| GPT-4 | Cher Monsieur Bonafont, Par la présente, nous vous confirmons l’accord commercial qui a été officiellement signé lundi dernier entre nos deux entreprises. Nous espérons vivement une collaboration fructueuse et durable. |
| Claude | Cher Monsieur Bonafont, Par cette lettre, nous vous confirmons l’accord commercial qui a été signé officiellement lundi dernier entre nos deux entreprises. Nous espérons une collaboration fructueuse. |
| NLLB-200 | Cher Monsieur Bonafont, Par cette lettre, nous vous confirmons l’accord commercial qui a été signé officiellement lundi dernier entre nos deux entreprises. Nous espérons une collaboration fructueuse. |

Assessment: The close Romance relationship between Occitan and French allows all systems to produce competent formal translations. GPT-4 adds the most professional polish with “par la présente” (a standard formal letter opening), “officiellement signé” (the more natural French word order), and “vivement…fructueuse et durable” (sincerely…fruitful and lasting), although “et durable” has no counterpart in the source and goes beyond “frutuòsa”. DeepL produces a truncated version: it drops “vos” (to you), “oficialament” (officially), “doas” (two), and “frutuòsa” (fruitful), and reduces “diluns passat” (last Monday) to just “lundi”. Cognate density between Occitan and French is very high in formal registers, making this the easiest domain for all systems.
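
Truncation of the kind seen in the DeepL output can be caught cheaply before human review. The sketch below is a minimal illustration: it flags any output whose word count falls well below the median across systems. The function name `flag_short_outputs` and the 0.8 threshold are assumptions chosen for this example, not values used in the evaluation.

```python
from statistics import median

# French outputs from the formal email example. Claude and NLLB-200
# returned the same text as Google in this sample.
GOOGLE = (
    "Cher Monsieur Bonafont, Par cette lettre, nous vous confirmons "
    "l'accord commercial qui a été signé officiellement lundi dernier "
    "entre nos deux entreprises. Nous espérons une collaboration fructueuse."
)
DEEPL = (
    "Cher Monsieur Bonafont, Par cette lettre nous confirmons l'accord "
    "commercial signé lundi entre nos entreprises. Nous espérons une "
    "collaboration."
)
GPT4 = (
    "Cher Monsieur Bonafont, Par la présente, nous vous confirmons "
    "l'accord commercial qui a été officiellement signé lundi dernier "
    "entre nos deux entreprises. Nous espérons vivement une collaboration "
    "fructueuse et durable."
)
OUTPUTS = {
    "Google": GOOGLE,
    "DeepL": DEEPL,
    "GPT-4": GPT4,
    "Claude": GOOGLE,
    "NLLB-200": GOOGLE,
}


def flag_short_outputs(outputs, min_ratio=0.8):
    """Return systems whose word count is below min_ratio x the median.

    min_ratio is a hypothetical threshold picked for illustration.
    """
    counts = {name: len(text.split()) for name, text in outputs.items()}
    typical = median(counts.values())
    return [name for name, n in counts.items() if n < min_ratio * typical]


print(flag_short_outputs(OUTPUTS))
```

A length check only signals that something may be missing; it cannot say which words were dropped, and a short output can still be a faithful one, so flagged sentences still need a human comparison against the source.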

Casual Conversation

Source: “Adiu, cossí vas? Ièr soi anat a la platja, l’aiga èra bèla coma tot. Deman i torni. Te’n vòles venir amb ieu?”

| System | Translation |
|---|---|
| Google | Salut, comment ça va? Hier je suis allé à la plage, l’eau était belle comme tout. Demain j’y retourne. Tu veux venir avec moi? |
| DeepL | Bonjour, comment allez-vous? Hier je suis allé à la plage, l’eau était belle. Demain j’y retourne. Voulez-vous venir? |
| GPT-4 | Salut, ça va? Hier je suis allé à la plage, l’eau était vraiment magnifique. Demain j’y retourne. Ça te dit de venir avec moi? |
| Claude | Salut, comment ça va? Hier je suis allé à la plage, l’eau était belle comme tout. Demain j’y retourne. Tu veux venir avec moi? |
| NLLB-200 | Salut, comment ça va? Hier je suis allé à la plage, l’eau était belle comme tout. Demain j’y retourne. Tu veux venir avec moi? |

Assessment: GPT-4 best captures the casual register with “ça va?” (the most informal greeting), “vraiment magnifique” (really gorgeous, an expressive rendering of the enthusiastic “bèla coma tot”), and “ça te dit de venir” (feel like coming?, a distinctly informal phrasing). The Occitan intensifier “bèla coma tot” (beautiful as anything) is translated literally as “belle comme tout” by Google, Claude, and NLLB-200; this is valid French but less common than GPT-4’s adaptation. DeepL misreads the register, using formal “vous” in a clearly casual exchange, and drops both the intensifier “coma tot” and “amb ieu” (with me). The Occitan greeting “Adiu” (hello/goodbye, cognate with “adieu”) is rendered as the informal “Salut” by every system except DeepL, which uses “Bonjour”.
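
The register mismatch described above can be screened for with a simple marker count. The sketch below is a heuristic only: the marker sets and the function name `guess_register` are assumptions made for this example. Because “vous” is also the ordinary plural, a “formal” result means the output deserves a look, not that it is wrong.

```python
import re

# Hypothetical marker lists for illustration; real register detection
# would need far more context than isolated pronouns.
FORMAL_MARKERS = {"vous", "votre", "vos"}
INFORMAL_MARKERS = {"tu", "te", "toi", "ton", "ta", "tes", "salut"}


def guess_register(french_text):
    """Return 'formal', 'informal', or 'unknown' from pronoun markers."""
    # Split into runs of letters so "allez-vous" yields "allez" and "vous".
    words = re.findall(r"[^\W\d_]+", french_text.lower())
    formal = sum(word in FORMAL_MARKERS for word in words)
    informal = sum(word in INFORMAL_MARKERS for word in words)
    if formal > informal:
        return "formal"
    if informal > formal:
        return "informal"
    return "unknown"


DEEPL = (
    "Bonjour, comment allez-vous? Hier je suis allé à la plage, "
    "l'eau était belle. Demain j'y retourne. Voulez-vous venir?"
)
GOOGLE = (
    "Salut, comment ça va? Hier je suis allé à la plage, l'eau était "
    "belle comme tout. Demain j'y retourne. Tu veux venir avec moi?"
)

print("DeepL:", guess_register(DEEPL))
print("Google:", guess_register(GOOGLE))
```

Comparing the guessed register of the output with the register expected from the source (here, an exchange opened with “Adiu” and addressed with “te”) gives a quick way to route suspicious sentences to a reviewer.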

Technical Content

Source: “Lo sistèma d’energia renovelabla utiliza aerogeneradors marins e panèls solars terrestres per produire electricitat per la rede nacionala, en redusissent la dependencia dels combustibles fossils.”

| System | Translation |
|---|---|
| Google | Le système d’énergie renouvelable utilise des éoliennes marines et des panneaux solaires terrestres pour produire de l’électricité pour le réseau national, en réduisant la dépendance aux combustibles fossiles. |
| DeepL | Le système d’énergie renouvelable utilise des éoliennes et des panneaux solaires pour produire de l’électricité, en réduisant la dépendance aux combustibles fossiles. |
| GPT-4 | Le système d’énergie renouvelable fait appel à des aérogénérateurs offshore et à des panneaux solaires terrestres pour produire de l’électricité à destination du réseau national, réduisant ainsi la dépendance aux combustibles fossiles. |
| Claude | Le système d’énergie renouvelable utilise des éoliennes marines et des panneaux solaires terrestres pour produire de l’électricité pour le réseau national, en réduisant la dépendance aux combustibles fossiles. |
| NLLB-200 | Le système d’énergie renouvelable utilise des éoliennes marines et des panneaux solaires terrestres pour produire de l’électricité pour le réseau national, en réduisant la dépendance aux combustibles fossiles. |

Assessment: GPT-4 uses the most formal technical French, with “fait appel à” (draws on, in place of the plainer “utilise”), “aérogénérateurs” (the direct French cognate of “aerogeneradors”), “offshore” (standard in French energy discourse), and “à destination du réseau national” (destined for the national grid). DeepL drops both “marins” (marine/offshore) and “terrestres” (terrestrial), and omits “per la rede nacionala” (for the national grid) entirely. Technical vocabulary is strongly cognate between the two languages, with most terms nearly identical.

Strengths and Weaknesses

Google Translate

Strengths: Free and accessible. Handles Languedocien and Provençal reasonably well. Benefits from Romance language family knowledge.

Weaknesses: Limited register adaptation. Struggles with dialectal variation. Literal approach to idioms.

DeepL

Strengths: Clean French output for simple content.

Weaknesses: Frequently drops phrases and clauses. Very limited Occitan support. Confuses formal and informal registers. Least reliable for this pair.

GPT-4

Strengths: Best contextual understanding. Superior register adaptation. Handles dialectal variation and both orthographic norms. Culturally aware translations.

Weaknesses: Higher cost. May occasionally hallucinate content for unfamiliar dialectal forms. Slower processing.

Claude

Strengths: Consistent quality for longer documents. Reliable formal register. Good baseline accuracy.

Weaknesses: Less creative with casual and literary content. Sometimes produces generic translations. Moderate vocabulary range.

NLLB-200

Strengths: Dedicated low-resource language coverage. Free and self-hostable. Competitive quality for formal content. Handles classical orthography.

Weaknesses: No register adaptation. Literal translation approach. Limited dialectal awareness.

Recommendations

| Use Case | Recommended System |
|---|---|
| Quick personal translation | Google Translate (free) |
| Cultural heritage and literary digitization | GPT-4 with human review |
| Regional government communications | GPT-4 or Claude |
| Education materials (calandretas) | NLLB-200 or Claude |
| Academic research on Occitan texts | GPT-4 |
| High-volume processing | NLLB-200 (self-hosted) |
| Troubadour poetry and medieval texts | GPT-4 with specialist review |

Key Takeaways

  • GPT-4 leads for Occitan-to-French translation, with particular strength in handling dialectal variation and producing register-appropriate French that captures the cultural nuances of Occitan expression.
  • The close Gallo-Romance kinship between Occitan and French gives all systems a higher baseline than the raw speaker count and digital resource level would suggest, but dialectal fragmentation across six major variants still creates significant inconsistency.
  • NLLB-200 provides a valuable free alternative with dedicated low-resource language support, especially important for cultural preservation organizations and educational institutions operating on limited budgets.
  • The two competing orthographic standards (classical and Mistralian) add a preprocessing challenge: systems generally perform better on classical norm input, which has more digital representation.
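
Because the two norms spell the same sounds differently, a pipeline can make a rough guess at which norm a text uses before translation. The sketch below counts a few well-known digraph contrasts: classical “nh” and “lh” (as in “montanha”) against Mistralian “gn” and “ou” (as in “mountagno”). It is a crude heuristic under those assumptions, not a validated classifier, and the function name `guess_norm` is hypothetical.

```python
# Digraphs that tend to separate the two norms. These hint lists are
# assumptions for illustration and will misfire on short or mixed texts.
CLASSICAL_HINTS = ("nh", "lh")
MISTRALIAN_HINTS = ("gn", "ou")


def guess_norm(text):
    """Guess 'classical', 'mistralian', or 'unknown' from digraph counts."""
    lowered = text.lower()
    classical = sum(lowered.count(hint) for hint in CLASSICAL_HINTS)
    mistralian = sum(lowered.count(hint) for hint in MISTRALIAN_HINTS)
    if classical > mistralian:
        return "classical"
    if mistralian > classical:
        return "mistralian"
    return "unknown"


print(guess_norm("la montanha e la filha"))
print(guess_norm("la mountagno"))
```

Texts that come back “unknown” are best left for a human to label. The guess could then be used to prioritize Mistralian-norm input for closer review, given that systems generally perform better on the classical norm.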

Next Steps