English to Persian: AI Translation Guide
Persian (Farsi) is spoken by over 110 million people across Iran, Afghanistan (as Dari), and Tajikistan (as Tajik). English-to-Persian translation serves diaspora communities, international business, academic exchange, and media localization. Persian uses a modified Arabic script written right-to-left, has subject-object-verb (SOV) word order, and links nouns to their modifiers with the ezafe construction, which has no English equivalent. Each of these features creates distinct challenges for AI translation.
This guide compares five AI translation systems on English-to-Persian quality.
Translation comparisons are based on automated metrics and editorial evaluation. Quality varies by language pair and content type.
Accuracy Comparison Table
| System | BLEU Score | COMET Score | Editorial Rating (1-10) | Best For |
|---|---|---|---|---|
| Google Translate | 28.4 | 0.808 | 6.9 | General use, broadest coverage |
| DeepL | 26.7 | 0.794 | 6.5 | Formal text (limited Persian) |
| GPT-4 | 30.2 | 0.821 | 7.3 | Contextual accuracy, idioms |
| Claude | 28.9 | 0.811 | 7.0 | Long-form, consistent output |
| NLLB-200 | 25.3 | 0.781 | 6.2 | Budget, self-hosted |
For background on these metrics, see Translation Quality Metrics: BLEU, COMET, and Human Evaluation Explained.
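As a quick sanity check on the table above, the sketch below ranks the five systems on each metric and tests whether the three rankings agree. The scores are copied from the table; the dictionary layout and the `rank_by` helper are illustrative names, not part of any evaluation toolkit.

```python
# Scores copied from the comparison table above.
SCORES = {
    "Google Translate": {"bleu": 28.4, "comet": 0.808, "editorial": 6.9},
    "DeepL": {"bleu": 26.7, "comet": 0.794, "editorial": 6.5},
    "GPT-4": {"bleu": 30.2, "comet": 0.821, "editorial": 7.3},
    "Claude": {"bleu": 28.9, "comet": 0.811, "editorial": 7.0},
    "NLLB-200": {"bleu": 25.3, "comet": 0.781, "editorial": 6.2},
}


def rank_by(metric):
    """Return system names ordered from best to worst on one metric."""
    return sorted(SCORES, key=lambda name: SCORES[name][metric], reverse=True)


rankings = {metric: rank_by(metric) for metric in ("bleu", "comet", "editorial")}

# True when every metric produces the same ordering of systems.
metrics_agree = len({tuple(order) for order in rankings.values()}) == 1

for metric, order in rankings.items():
    print(f"{metric:>9}: {' > '.join(order)}")
print("All metrics give the same order:", metrics_agree)
```

On these figures, BLEU, COMET, and the editorial rating all produce the same order, with GPT-4 first and NLLB-200 last.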
Best Overall: GPT-4
GPT-4 produces the most accurate and natural English-to-Persian translations, with the strongest handling of the ezafe construction, idiomatic expressions, and formal/informal register. Its contextual understanding allows it to choose appropriate Persian vocabulary and sentence structures that dedicated neural machine translation (NMT) systems miss. The gap between GPT-4 and other systems is larger for Persian than for many European languages.
Best Free Option: Google Translate
Google Translate is the best free option for English-to-Persian, offering broad vocabulary coverage and reasonable accuracy for standard text. Its quality has improved over the years, though output can feel stiff and overly formal. NLLB-200 supports Persian but produces the lowest quality output, often with grammatical errors and awkward phrasing.
Common Challenges for English to Persian
The Ezafe Construction
Persian uses the ezafe (a short unstressed vowel, typically “-e” or “-ye”) to connect nouns to their modifiers. “The red book” becomes “کتاب قرمز” (ketab-e qermez), where the ezafe links “book” and “red.” In writing, the ezafe is usually not marked, creating ambiguity that AI systems must resolve. Compound ezafe chains like “کتاب تاریخ ایران” (book of history of Iran) can extend to three or four levels, and AI systems must generate these correctly from English input.
GPT-4 handles ezafe construction most accurately. NLLB-200 frequently misorders modifiers or drops ezafe connections.
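The linking pattern described above can be illustrated on transliterated text. This is a minimal sketch, assuming a simplified Latin transliteration in which the linker is "-ye" after a final vowel and "-e" otherwise; the function names are hypothetical, and real Persian text would need proper morphological handling.

```python
VOWELS = "aeiou"  # assumption: vowels of this sketch's simplified transliteration


def add_ezafe(word):
    """Attach the ezafe linker: "-ye" after a final vowel, "-e" otherwise."""
    suffix = "-ye" if word[-1].lower() in VOWELS else "-e"
    return word + suffix


def ezafe_chain(words):
    """Link a head noun to its modifiers; every word except the last takes ezafe."""
    if not words:
        return ""
    linked = [add_ezafe(w) for w in words[:-1]] + [words[-1]]
    return " ".join(linked)


print(ezafe_chain(["ketab", "qermez"]))          # ketab-e qermez
print(ezafe_chain(["ketab", "tarikh", "iran"]))  # ketab-e tarikh-e iran
print(ezafe_chain(["khane", "bozorg"]))          # khane-ye bozorg
```

The chain makes the translation problem visible: the linker is pronounced on every word but the last, yet it is usually absent from the written form a system has to produce or interpret.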
SOV Word Order
Persian places the verb at the end of the sentence. “I read the book” becomes “من کتاب را خواندم” (man ketab ra khandam, literally “I book [object marker] read”). AI systems must restructure English subject-verb-object (SVO) order into Persian SOV order, and all five systems do this reliably for simple sentences. Complex sentences with multiple clauses require more careful restructuring, and NMT systems sometimes produce English-like word order that sounds unnatural in Persian.
The “Ra” Object Marker
Persian uses the postposition “را” (ra) to mark definite direct objects. “I saw the man” requires “ra” (مرد را دیدم), but “I saw a man” does not (مردی دیدم). AI systems must correctly determine definiteness from English context and apply “ra” accordingly. Overuse or underuse of “ra” is a common error, particularly in NLLB-200 and Google Translate.
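The word-order and object-marking rules from the last two sections can be combined in a toy gloss builder. This is a sketch on transliterated words only: the `gloss_sov` function is hypothetical, it covers just the simple definite and indefinite cases shown in the examples above, and it does not attempt real translation.

```python
def gloss_sov(subject, obj, verb, definite):
    """Arrange a simple clause in Persian order: subject, object, verb.

    A definite object is followed by the marker "ra"; an indefinite one
    takes the suffix "-i" instead, as in the examples above.
    """
    obj_phrase = f"{obj} ra" if definite else f"{obj}i"
    parts = [subject, obj_phrase, verb]
    return " ".join(p for p in parts if p)  # an empty subject is simply dropped


print(gloss_sov("man", "ketab", "khandam", definite=True))  # man ketab ra khandam
print(gloss_sov("", "mard", "didam", definite=True))        # mard ra didam
print(gloss_sov("", "mard", "didam", definite=False))       # mardi didam
```

The `definite` flag is the hard part in practice: English signals it with “the” and “a”, but also through context, which is where overuse and underuse of “ra” come from.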
Formal vs. Informal Register
Persian has significant register variation. Formal written Persian uses Arabic-origin vocabulary and complex syntax, while spoken Persian is simpler and uses more native vocabulary. Even near-synonyms differ in register: “to cause” can be “باعث شدن” (formal) or “سبب شدن” (semi-formal), both built on Arabic loanwords. AI systems tend to produce overly formal output from neutral English input. GPT-4 can be prompted for specific register levels.
Regional Variants
Iranian Persian, Dari (Afghanistan), and Tajik (Tajikistan) differ in vocabulary, pronunciation, and some grammar. Most AI systems produce Iranian Persian by default. If the target audience speaks Dari or Tajik, output will contain unfamiliar vocabulary and constructions. GPT-4 can be prompted for Dari, but no system reliably produces Tajik.
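Since register and regional variety both depend on prompting, one way to make the request explicit is a small prompt builder like the sketch below. The wording of the instruction, the `build_prompt` function, and the option names are all assumptions for illustration; the sketch only composes a string and does not call any model API. Tajik is left out deliberately, because no system is reported to produce it reliably.

```python
REGISTERS = {
    "formal": "formal written Persian",
    "neutral": "neutral, everyday written Persian",
    "informal": "informal, conversational Persian",
}

VARIETIES = {
    "iranian": "Iranian Persian",
    "dari": "Dari as used in Afghanistan",
}


def build_prompt(text, register="neutral", variety="iranian"):
    """Compose a translation instruction with an explicit register and variety."""
    if register not in REGISTERS:
        raise ValueError(f"unknown register: {register}")
    if variety not in VARIETIES:
        raise ValueError(f"unknown variety: {variety}")
    return (
        f"Translate the following English text into {VARIETIES[variety]}. "
        f"Use {REGISTERS[register]}. "
        "Return only the translation.\n\n"
        f"{text}"
    )


print(build_prompt("Please send the report by Friday.", register="formal"))
print(build_prompt("See you tomorrow!", register="informal", variety="dari"))
```

Stating the register and variety in every request keeps output consistent across a batch, though the result still needs review by a native speaker of the target variety.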
Use Case Recommendations
| Use Case | Recommended System |
|---|---|
| Business correspondence | GPT-4 |
| Government / formal documents | GPT-4 with human review |
| Media / news translation | Google Translate or GPT-4 |
| Academic text | GPT-4 or Claude |
| Dari-targeted content | GPT-4 with dialect prompting |
| High-volume processing | Google Translate |
| Budget-sensitive, self-hosted | NLLB-200 |
| Long-form editorial | Claude |
Key Takeaways
- GPT-4 leads for English-to-Persian by a wider margin than for most language pairs. Its handling of the ezafe construction and register selection is notably better than NMT alternatives.
- Persian is a medium-resource language for AI translation. Quality is lower than European pairs but higher than truly low-resource languages.
- The ezafe construction and “ra” object marking are the most frequent sources of errors across all systems. These are Persian-specific features with no English equivalent.
- Regional variant handling is limited. If your audience is in Afghanistan or Tajikistan, human review by a native speaker of the relevant variety is essential.
Next Steps
- Full model comparison: Read Best Translation AI in 2026: Complete Model Comparison.
- System comparison: See Google Translate vs. DeepL vs. AI: Which Is Best?.
- When to add humans: Learn more in Human vs. AI Translation: When Each Makes Sense.