l10n: document a bit how it works (#8102)

* l10n: document a bit how it works

* add a link to fluent

Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com>

* fix typo

Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com>

---------

Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com>
Sylvestre Ledru 2025-06-11 09:30:48 +02:00 committed by GitHub
parent 7e4877fb30
commit 61d69a18d7

docs/src/l10n.md Normal file

@@ -0,0 +1,187 @@
# 🌍 Localization (L10n) in uutils coreutils
This guide explains how localization (L10n) is implemented in the **Rust-based coreutils project**, detailing the use of [Fluent](https://projectfluent.org/) files, runtime behavior, and developer integration.
---
## 📁 Fluent File Layout
Each utility has its own set of translation files under:
```
src/uu/<utility>/locales/<locale>.ftl
```
Examples:
```
src/uu/ls/locales/en-US.ftl
src/uu/ls/locales/fr-FR.ftl
```
These files follow Fluent syntax and contain localized message patterns.
---
## ⚙️ Initialization
Localization must be explicitly initialized at runtime using:
```
setup_localization(path)
```
This is typically done:
- In `src/bin/coreutils.rs` for **multi-call binaries**
- In `src/uucore/src/lib.rs` for **single-call utilities**
The string argument (`path` above) determines where the Fluent files are looked up.
---
## 🌐 Locale Detection
Locale selection is automatic and performed via:
```
fn detect_system_locale() -> Result<LanguageIdentifier, LocalizationError>
```
It reads the `LANG` environment variable (e.g., `fr-FR.UTF-8`), strips the encoding suffix (`.UTF-8`), and parses the remainder as a language identifier.
If parsing fails or `LANG` is not set, it falls back to:
```
const DEFAULT_LOCALE: &str = "en-US";
```
You can override the locale at runtime by running:
```
LANG=ja-JP ./target/debug/ls
```
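The detection steps described above can be sketched with the standard library alone. This is a simplified illustration, not the project's code: `locale_from_lang` and `detect_locale_sketch` are hypothetical names, the result is a plain `String` rather than a `LanguageIdentifier`, and only the "unset or empty" fallback is modeled (the real function also falls back when parsing fails).

```rust
/// Fallback locale, mirroring the `DEFAULT_LOCALE` constant shown above.
const DEFAULT_LOCALE: &str = "en-US";

/// Hypothetical helper: turns a raw `LANG` value into a locale string.
/// A plain `String` keeps the sketch dependency-free.
fn locale_from_lang(lang: Option<&str>) -> String {
    lang
        // Keep only the part before the encoding: "fr-FR.UTF-8" -> "fr-FR".
        .and_then(|value| value.split('.').next())
        // Treat an empty value like an unset variable.
        .filter(|tag| !tag.is_empty())
        .unwrap_or(DEFAULT_LOCALE)
        .to_string()
}

/// Hypothetical wrapper: reads `LANG` from the process environment.
fn detect_locale_sketch() -> String {
    locale_from_lang(std::env::var("LANG").ok().as_deref())
}
```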
---
## 📥 Retrieving Messages
Two APIs are available:
### `get_message(id: &str) -> String`
Returns the message from the current locale bundle.
```
let msg = get_message("id-greeting");
```
If the message is not found in the current locale, the lookup falls back to `en-US`. If it is still missing, the ID itself is returned.
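The fallback chain (current locale, then `en-US`, then the ID itself) can be illustrated with two plain hash maps. This is only a sketch under assumptions: `Messages` and `lookup_with_fallback` are hypothetical names, and the real implementation works on Fluent bundles rather than maps.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a loaded bundle: message ID -> pattern.
type Messages = HashMap<&'static str, &'static str>;

/// Looks up `id` in the current locale, then in the `en-US` fallback,
/// and finally returns the ID itself.
fn lookup_with_fallback(current: &Messages, fallback: &Messages, id: &str) -> String {
    current
        .get(id)
        // Not in the current locale: try the en-US messages.
        .or_else(|| fallback.get(id))
        .map(|pattern| pattern.to_string())
        // Missing everywhere: return the ID unchanged.
        .unwrap_or_else(|| id.to_string())
}
```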
---
### `get_message_with_args(id: &str, args: HashMap<String, String>) -> String`
Supports variable interpolation and pluralization.
```
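use std::collections::HashMap;

// The key "error" must match the `$error` placeholder in the Fluent message.
let msg = get_message_with_args(
    "error-io",
    HashMap::from([
        ("error".to_string(), std::io::Error::last_os_error().to_string()),
    ]),
);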
```
Fluent message example:
```
error-io = I/O error occurred: { $error }
```
Variables must match the Fluent placeholder keys (`$error`, `$name`, `$count`, etc.).
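To see why the keys have to match, here is a deliberately naive stand-in for placeholder substitution. It uses plain string replacement instead of Fluent's resolver, `interpolate_sketch` is a hypothetical name, and it does not reproduce how Fluent itself reports a missing variable. It only shows that an argument whose key differs from the placeholder name never reaches the output.

```rust
use std::collections::HashMap;

/// Naive stand-in for Fluent's resolver: replaces each `{ $key }`
/// placeholder with the matching entry in `args`.
fn interpolate_sketch(pattern: &str, args: &HashMap<String, String>) -> String {
    let mut out = pattern.to_string();
    for (key, value) in args {
        // Builds the literal text "{ $key }" for this argument name.
        let placeholder = format!("{{ ${key} }}");
        out = out.replace(&placeholder, value);
    }
    out
}
```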
---
## 📦 Fluent Syntax Example
```
id-greeting = Hello, world!
welcome = Welcome, { $name }!
count-files = You have { $count ->
    [one] { $count } file
   *[other] { $count } files
}
```
Use plural rules and inline variables to adapt messages dynamically.
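For English, the `count-files` selector behaves like the hand-written function below: the `one` category covers exactly 1 and everything else falls into `other`. This is only an illustration of the selection logic. `count_files_en` is a hypothetical name, and other locales have different plural categories, which is why the choice is left to Fluent instead of being hard-coded like this.

```rust
/// Hypothetical hand-written equivalent of the `count-files` message
/// for English integer counts.
fn count_files_en(count: u64) -> String {
    match count {
        // Plural category `one`.
        1 => format!("You have {count} file"),
        // Plural category `other` (the default variant, marked with `*`).
        _ => format!("You have {count} files"),
    }
}
```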
---
## 🧪 Testing Localization
Run all localization-related unit tests with:
```
cargo test --lib -p uucore
```
Tests include:
- Loading bundles
- Plural logic
- Locale fallback
- Fluent parse errors
- Thread-local behavior
- ...
---
## 🧵 Thread-local Storage
Localization is stored per thread using a `OnceLock`.
Each thread must call `setup_localization()` individually.
Initialization is **one-time-only** per thread — re-initialization results in an error.
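The per-thread, set-once behavior can be reproduced with a thread-local `OnceLock` from the standard library. The sketch below assumes the stored state is just a locale name. `LOCALE`, `setup_sketch`, and `current_locale` are hypothetical names, not the uucore API.

```rust
use std::sync::OnceLock;

thread_local! {
    // Hypothetical per-thread slot holding the active locale name.
    static LOCALE: OnceLock<String> = const { OnceLock::new() };
}

/// Hypothetical stand-in for `setup_localization()`:
/// succeeds once per thread, then returns an error.
fn setup_sketch(locale: &str) -> Result<(), String> {
    LOCALE.with(|slot| {
        slot.set(locale.to_string())
            .map_err(|_| "localization already initialized on this thread".to_string())
    })
}

/// Returns the locale set on the calling thread, if any.
fn current_locale() -> Option<String> {
    LOCALE.with(|slot| slot.get().cloned())
}
```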
---
## 🧪 Development vs Release Mode
During development (`cfg(debug_assertions)`), paths are resolved relative to the crate source:
```
$CARGO_MANIFEST_DIR/../uu/<utility>/locales/
```
In release mode, **paths are resolved relative to the executable**:
```
<executable_dir>/locales/<utility>/
```
If the locales directory cannot be found at either path, `setup_localization()` returns an error.
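The two layouts can be compared side by side with a small path-building helper. This is a sketch under assumptions: `locales_dir_sketch` is a hypothetical name, `manifest_dir` stands in for `$CARGO_MANIFEST_DIR`, `exe_dir` for the directory of the running executable, and the existence check that produces the error is left out. A caller would typically pass `cfg!(debug_assertions)` as the `debug` flag.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper mirroring the two directory layouts.
fn locales_dir_sketch(
    debug: bool,
    manifest_dir: &Path,
    exe_dir: &Path,
    utility: &str,
) -> PathBuf {
    if debug {
        // Development: $CARGO_MANIFEST_DIR/../uu/<utility>/locales/
        manifest_dir.join("..").join("uu").join(utility).join("locales")
    } else {
        // Release: <executable_dir>/locales/<utility>/
        exe_dir.join("locales").join(utility)
    }
}
```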
---
## 🔤 Unicode Isolation Handling
By default, the Fluent system wraps variables with Unicode directional isolate characters (`U+2068`, `U+2069`) to protect against visual reordering issues in bidirectional text (e.g., mixing Arabic and English).
In this implementation, isolation is **disabled** via:
```
bundle.set_use_isolating(false);
```
This improves readability in CLI environments by preventing extraneous characters around interpolated values.
With isolation disabled (as rendered):
```
"Welcome, Alice!"
```
Fluent default (disabled here):
```
"Welcome, \u{2068}Alice\u{2069}!"
```
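The effect of the setting can be mimicked by hand, assuming the isolate characters wrap only the interpolated value. `welcome_sketch` is a hypothetical helper that formats the `welcome` message without Fluent, so it shows the resulting strings rather than the library's internals.

```rust
/// U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE.
const FSI: char = '\u{2068}';
const PDI: char = '\u{2069}';

/// Hypothetical helper: builds the `welcome` message by hand, optionally
/// wrapping the interpolated name in isolate characters.
fn welcome_sketch(name: &str, use_isolating: bool) -> String {
    if use_isolating {
        // Two invisible characters surround the value.
        format!("Welcome, {}{}{}!", FSI, name, PDI)
    } else {
        // Plain output, as produced with isolation disabled.
        format!("Welcome, {}!", name)
    }
}
```

The isolated variant is two characters longer even though most terminals render both strings identically, which matters when output is compared byte for byte, for example in tests.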