mirror of
https://github.com/uutils/coreutils.git
synced 2025-07-07 21:45:01 +00:00
l10n: document a bit how it works (#8102)
* l10n: document a bit how it works
* add a link to fluent

Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com>

* fix typo

Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com>

---------

Co-authored-by: Daniel Hofstetter <daniel.hofstetter@42dh.com>
parent 7e4877fb30
commit 61d69a18d7
1 changed file with 187 additions and 0 deletions
docs/src/l10n.md
Normal file
# 🌍 Localization (L10n) in uutils coreutils

This guide explains how localization (L10n) is implemented in the **Rust-based coreutils project**, detailing the use of [Fluent](https://projectfluent.org/) files, runtime behavior, and developer integration.

---

## 📁 Fluent File Layout

Each utility has its own set of translation files under:

```
src/uu/<utility>/locales/<locale>.ftl
```

Examples:

```
src/uu/ls/locales/en-US.ftl
src/uu/ls/locales/fr-FR.ftl
```

These files follow Fluent syntax and contain localized message patterns.
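
The layout above can be expressed as a small path builder. This is only a sketch: `ftl_path` is a hypothetical helper introduced for illustration and is not part of uucore.

```rust
use std::path::PathBuf;

/// Builds the conventional location of a utility's Fluent file:
/// `src/uu/<utility>/locales/<locale>.ftl`.
/// Hypothetical helper for illustration only; not a uucore function.
fn ftl_path(utility: &str, locale: &str) -> PathBuf {
    PathBuf::from("src/uu")
        .join(utility)
        .join("locales")
        .join(format!("{locale}.ftl"))
}
```

For example, `ftl_path("ls", "fr-FR")` yields the second path listed in the examples.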

---

## ⚙️ Initialization

Localization must be explicitly initialized at runtime using:

```
setup_localization(path)
```

This is typically done:

- In `src/bin/coreutils.rs` for **multi-call binaries**
- In `src/uucore/src/lib.rs` for **single-call utilities**

The string parameter determines the lookup path for Fluent files.

---

## 🌐 Locale Detection

Locale selection is automatic and performed via:

```
fn detect_system_locale() -> Result<LanguageIdentifier, LocalizationError>
```

It reads the `LANG` environment variable (e.g., `fr-FR.UTF-8`), strips the encoding suffix (here `.UTF-8`), and parses the remainder as a language identifier.

If `LANG` is not set or parsing fails, it falls back to:

```
const DEFAULT_LOCALE: &str = "en-US";
```
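
The detection steps can be sketched with the standard library alone. This is a simplified stand-in, not the real `detect_system_locale()`: it returns a plain `String` rather than a `LanguageIdentifier`, its shape check accepts only the hyphenated `ll` or `ll-RR` forms used in this guide, and the names `locale_from_lang` and `looks_like_locale` are hypothetical.

```rust
/// Fallback used when `LANG` is unset or cannot be parsed.
const DEFAULT_LOCALE: &str = "en-US";

/// Simplified stand-in for the detection logic (assumption: a `String`
/// replaces the real `LanguageIdentifier` result type).
fn locale_from_lang(lang: Option<&str>) -> String {
    let Some(raw) = lang else {
        return DEFAULT_LOCALE.to_string();
    };
    // Strip the encoding suffix: "fr-FR.UTF-8" -> "fr-FR".
    let tag = raw.split('.').next().unwrap_or("");
    if looks_like_locale(tag) {
        tag.to_string()
    } else {
        DEFAULT_LOCALE.to_string()
    }
}

/// Rough shape check: a 2-3 letter lowercase language, optionally followed
/// by '-' and a 2-letter uppercase region. A real language-identifier
/// parser accepts more forms than this.
fn looks_like_locale(tag: &str) -> bool {
    let mut parts = tag.split('-');
    let lang_ok = parts.next().map_or(false, |l| {
        (2..=3).contains(&l.len()) && l.chars().all(|c| c.is_ascii_lowercase())
    });
    let region_ok = parts.next().map_or(true, |r| {
        r.len() == 2 && r.chars().all(|c| c.is_ascii_uppercase())
    });
    lang_ok && region_ok && parts.next().is_none()
}
```

To feed it the real environment, one could call `locale_from_lang(std::env::var("LANG").ok().as_deref())`.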

You can override the locale at runtime by running:

```
LANG=ja-JP ./target/debug/ls
```

---

## 📥 Retrieving Messages

Two APIs are available:

### `get_message(id: &str) -> String`

Returns the message from the current locale bundle.

```
let msg = get_message("id-greeting");
```

If the message is not found in the current locale, the lookup falls back to `en-US`. If it is missing there too, the ID itself is returned.
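
The fallback order can be modeled with plain maps. This is a toy sketch under stated assumptions: `HashMap`s stand in for Fluent bundles, and `lookup` is a hypothetical name rather than the actual implementation of `get_message`.

```rust
use std::collections::HashMap;

/// Toy model of the lookup order: current locale, then `en-US`, then the ID.
/// Plain maps stand in for Fluent bundles (an assumption for illustration).
fn lookup(
    id: &str,
    current: &HashMap<&str, &str>,
    fallback_en_us: &HashMap<&str, &str>,
) -> String {
    current
        .get(id)
        .or_else(|| fallback_en_us.get(id))
        .map(|msg| msg.to_string())
        // Nothing found anywhere: return the ID so the caller still has output.
        .unwrap_or_else(|| id.to_string())
}
```

Returning the ID as a last resort means a missing translation degrades to a visible, searchable key instead of an empty string.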

---

### `get_message_with_args(id: &str, args: HashMap<String, String>) -> String`

Supports variable interpolation and pluralization.

```
use std::collections::HashMap;

// Pass the text of the last OS error as the `$error` variable.
let msg = get_message_with_args(
    "error-io",
    HashMap::from([
        ("error".to_string(), std::io::Error::last_os_error().to_string())
    ])
);
```

Fluent message example:

```
error-io = I/O error occurred: { $error }
```

The keys in `args` must match the Fluent placeholder names (`$error`, `$name`, `$count`, etc.), without the leading `$`.

---

## 📦 Fluent Syntax Example

```
id-greeting = Hello, world!
welcome = Welcome, { $name }!
count-files = You have { $count ->
    [one] { $count } file
   *[other] { $count } files
}
```

Use plural rules and inline variables to adapt messages dynamically.
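
To make the select expression concrete, here is a hand-written equivalent of `count-files` for English only. It is a sketch, not how Fluent works internally: Fluent chooses the variant from the locale's plural rules, whereas the hypothetical `count_files_en` below hard-codes the English rule for whole numbers.

```rust
/// Hand-written equivalent of the `count-files` message for English only.
/// The English rule for whole numbers is hard-coded here:
/// exactly 1 selects `[one]`, anything else selects `*[other]`.
fn count_files_en(count: u64) -> String {
    let noun = if count == 1 { "file" } else { "files" };
    format!("You have {count} {noun}")
}
```

Keeping this logic in the `.ftl` file rather than in Rust lets each locale define its own plural categories without code changes.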

---

## 🧪 Testing Localization

Run all localization-related unit tests with:

```
cargo test --lib -p uucore
```

Tests include:

- Loading bundles
- Plural logic
- Locale fallback
- Fluent parse errors
- Thread-local behavior
- ...

---

## 🧵 Thread-local Storage

Localization state is stored per thread, with a `OnceLock` guarding initialization.
Each thread must call `setup_localization()` individually.
Initialization is **one-time-only** per thread; attempting to re-initialize results in an error.
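
The per-thread, set-once behavior can be reproduced with `thread_local!` and `OnceLock` from the standard library. This is a minimal sketch: `LOCALE`, `init_locale`, `current_locale`, and `locale_in_new_thread` are hypothetical names, and a `String` stands in for the real localizer state.

```rust
use std::sync::OnceLock;
use std::thread;

thread_local! {
    // One cell per thread; a `String` stands in for the real localizer
    // state (assumption for illustration).
    static LOCALE: OnceLock<String> = const { OnceLock::new() };
}

/// Initializes the current thread's cell; fails if it is already set.
fn init_locale(locale: &str) -> Result<(), String> {
    LOCALE.with(|cell| {
        cell.set(locale.to_string())
            .map_err(|_| "localization already initialized on this thread".to_string())
    })
}

/// Returns the current thread's locale, if it has been initialized.
fn current_locale() -> Option<String> {
    LOCALE.with(|cell| cell.get().cloned())
}

/// Reads the locale from a freshly spawned thread, which has its own,
/// still-empty cell regardless of what the caller initialized.
fn locale_in_new_thread() -> Option<String> {
    thread::spawn(current_locale).join().unwrap()
}
```

A second `init_locale` call on the same thread returns `Err`, and a new thread sees no locale until it performs its own initialization.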

---

## 🧪 Development vs Release Mode

During development (`cfg(debug_assertions)`), paths are resolved relative to the crate source:

```
$CARGO_MANIFEST_DIR/../uu/<utility>/locales/
```

In release mode, **paths are resolved relative to the executable**:

```
<executable_dir>/locales/<utility>/
```

If both paths fail, `setup_localization()` returns an error.
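
The two resolution rules can be sketched as one function. The function and parameter names are hypothetical: `manifest_dir` plays the role of `$CARGO_MANIFEST_DIR`, `exe_dir` is the directory containing the executable, and `debug` would be `cfg!(debug_assertions)` in a real build.

```rust
use std::path::{Path, PathBuf};

/// Sketch of the two lookup rules; names are hypothetical.
/// Existence checks and error handling are left out.
fn locales_dir(utility: &str, manifest_dir: &Path, exe_dir: &Path, debug: bool) -> PathBuf {
    if debug {
        // Development: $CARGO_MANIFEST_DIR/../uu/<utility>/locales/
        manifest_dir.join("..").join("uu").join(utility).join("locales")
    } else {
        // Release: <executable_dir>/locales/<utility>/
        exe_dir.join("locales").join(utility)
    }
}
```

Passing the directories in as parameters keeps the sketch independent of Cargo and of the running binary's location.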

---

## 🔤 Unicode Isolation Handling

By default, the Fluent system wraps variables with Unicode directional isolate characters (`U+2068`, `U+2069`) to protect against visual reordering issues in bidirectional text (e.g., mixing Arabic and English).

In this implementation, isolation is **disabled** via:

```
bundle.set_use_isolating(false);
```

This improves readability in CLI environments by preventing extraneous characters around interpolated values.

Output with isolation disabled (as rendered here):

```
"Welcome, Alice!"
```

Fluent default (disabled here):

```
"Welcome, \u{2068}Alice\u{2069}!"
```
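
The difference between the two outputs can be shown without Fluent. This is a hand-rolled illustration, not the Fluent formatter: the hypothetical `welcome` function simply adds or omits the two isolate characters around the interpolated value.

```rust
/// U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE.
const FSI: char = '\u{2068}';
const PDI: char = '\u{2069}';

/// Renders "Welcome, <name>!" with or without isolation marks around the
/// interpolated value. Illustration only; not the Fluent formatter.
fn welcome(name: &str, isolate: bool) -> String {
    if isolate {
        format!("Welcome, {FSI}{name}{PDI}!")
    } else {
        format!("Welcome, {name}!")
    }
}
```

The isolated form is two characters longer even though most terminals draw nothing for them, so it can break exact string comparisons in tests and scripts.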