diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..1398694 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,49 @@ +# Contributing + +## Translation improvements + +You can find our translations in [`src/bin/edit/localization.rs`](./src/bin/edit/localization.rs). +Please feel free to open a pull request with your changes at any time. +If you'd like to discuss your changes first, please feel free to open an issue. + +## Bug reports + +If you find any bugs, we gladly accept pull requests without prior discussion. +Otherwise, you can of course always open an issue for us to look into. + +## Feature requests + +Please open a new issue for any feature requests you have in mind. +Keeping the binary size of the editor small is a priority for us, so we may need to discuss new features first, at least until we have support for plugins. + +## Code changes + +The project focuses on a small binary size and sufficient (good) performance. +As such, we generally do not accept pull requests that introduce dependencies (there are always exceptions of course). +Otherwise, you can consider this project a playground for trying out any cool ideas you have. + +The overall architecture of the project can be summarized as follows: +* The underlying text buffer in `src/buffer` doesn't keep track of line breaks in the document. + This is a crucial design aspect that permeates the entire codebase. + + To oversimplify, the *only* state that is kept is the current cursor position. + When the user asks to move to another line, the editor seeks through the underlying document in `O(n)` until it finds the corresponding number of line breaks. + * As a result, `src/simd` contains crucial `memchr2` functions to quickly find the next or previous line break (peaks at over 100GB/s). + * Furthermore, `src/unicode` implements a `Utf8Chars` iterator which transparently inserts U+FFFD replacements during iteration (runs at up to 4GB/s).
+ * Furthermore, `src/unicode` also implements grapheme cluster segmentation and cluster width measurement via its `MeasurementConfig` (runs at up to 600MB/s). + * If word wrap is disabled, `memchr2` is used for all navigation across lines, allowing us to breeze through 1GB files as if they were 1MB. + * Even if word wrap is enabled, it's still sufficiently smooth thanks to `MeasurementConfig`. This is only possible because these base functions are heavily optimized. +* `src/framebuffer.rs` implements a "framebuffer" like in video games. + It allows us to draw the UI output into an intermediate buffer first, accumulating all changes and handling things like color blending. + Then, it can compare the accumulated output with the previous frame and only send the necessary changes to the terminal. +* `src/tui.rs` implements an immediate mode UI. Its module-level documentation gives an overview of how it works and I recommend reading it. +* `src/vt.rs` implements our VT parser. +* `src/sys` contains our platform abstractions. +* Finally, `src/bin/edit` ties everything together. + It's roughly 90% UI code and business logic. + It contains a little bit of VT logic in `setup_terminal`. + +If you have an issue with your terminal, the places of interest are: +* The VT parser in `src/vt.rs` +* Platform-specific code in `src/sys` +* The `setup_terminal` function in `src/bin/edit/main.rs` diff --git a/README.md b/README.md index 6328e65..a49312a 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,20 @@ -# MS-DOS Editor Redux +# Microsoft Edit -TBA +A simple editor for simple needs. + +This editor pays homage to the classic [MS-DOS Editor](https://en.wikipedia.org/wiki/MS-DOS_Editor), but with a modern interface and modern input controls similar to VS Code. The goal is to provide an accessible editor that even those largely unfamiliar with terminals can use.
+ +## Installation + +* Download the latest release from our [releases page](https://github.com/microsoft/edit/releases/latest) +* Extract the archive +* Copy the `edit` binary to a directory in your `PATH` +* You may delete any other files in the archive if you don't need them + +## Build Instructions + +* [Install Rust](https://www.rust-lang.org/tools/install) +* Install the nightly toolchain: `rustup install nightly` + * Alternatively, set the environment variable `RUSTC_BOOTSTRAP=1` +* Clone the repository +* For a release build run: `cargo build --config .cargo/release.toml --release` diff --git a/src/apperr.rs b/src/apperr.rs index d6d16e8..bb6ee4d 100644 --- a/src/apperr.rs +++ b/src/apperr.rs @@ -1,12 +1,16 @@ +//! Provides a transparent error type for edit. + use std::{io, result}; use crate::sys; -// Remember to add an entry to `Error::message()` for each new error. pub const APP_ICU_MISSING: Error = Error::new_app(0); +/// Edit's transparent `Result` type. pub type Result = result::Result; +/// Edit's transparent `Error` type. +/// Abstracts over system and application errors. #[derive(Debug, Clone, Copy, PartialEq, Eq)] pub enum Error { App(u32), diff --git a/src/arena/debug.rs b/src/arena/debug.rs index 2e6402a..34209ae 100644 --- a/src/arena/debug.rs +++ b/src/arena/debug.rs @@ -7,9 +7,34 @@ use std::ptr::NonNull; use super::release; use crate::apperr; +/// A debug wrapper for [`release::Arena`]. +/// +/// The problem with [`super::ScratchArena`] is that it only "borrows" an underlying +/// [`release::Arena`]. Once the [`super::ScratchArena`] is dropped it resets the watermark +/// of the underlying [`release::Arena`], freeing all allocations done since borrowing it. +/// +/// It is completely valid for the same [`release::Arena`] to be borrowed multiple times at once, +/// *as long as* you only use the most recent borrow. 
Bad example: +/// ```should_panic +/// use edit::arena::scratch_arena; +/// +/// let mut scratch1 = scratch_arena(None); +/// let mut scratch2 = scratch_arena(None); +/// +/// let foo = scratch1.alloc_uninit::(); +/// +/// // This will also reset `scratch1`'s allocation. +/// drop(scratch2); +/// +/// *foo; // BOOM! ...if it wasn't for our debug wrapper. +/// ``` +/// +/// To avoid this, this wraps the real [`release::Arena`] in a "debug" one, which pretends as if every +/// instance of itself is a distinct [`release::Arena`] instance. Then we use this "debug" [`release::Arena`] +/// for [`super::ScratchArena`] which allows us to track which borrow is the most recent one. pub enum Arena { // Delegate is 'static, because release::Arena requires no lifetime - // annotations, and so this struct cannot use them either. + // annotations, and so this mere debug helper cannot use them either. Delegated { delegate: &'static release::Arena, borrow: usize }, Owned { arena: release::Arena }, } diff --git a/src/arena/mod.rs b/src/arena/mod.rs index 9bfb14b..ac43fb9 100644 --- a/src/arena/mod.rs +++ b/src/arena/mod.rs @@ -1,12 +1,14 @@ +//! Arena allocators. Small and fast. + #[cfg(debug_assertions)] mod debug; mod release; mod scratch; mod string; -#[cfg(debug_assertions)] +#[cfg(all(not(doc), debug_assertions))] pub use self::debug::Arena; -#[cfg(not(debug_assertions))] +#[cfg(any(doc, not(debug_assertions)))] pub use self::release::Arena; pub use self::scratch::{ScratchArena, init, scratch_arena}; pub use self::string::ArenaString; diff --git a/src/arena/release.rs b/src/arena/release.rs index 0e81e95..8edcc8d 100644 --- a/src/arena/release.rs +++ b/src/arena/release.rs @@ -12,12 +12,36 @@ use crate::{apperr, sys}; const ALLOC_CHUNK_SIZE: usize = 64 * KIBI; +/// An arena allocator. +/// +/// If you have never used an arena allocator before, think of it as +/// allocating objects on the stack, but the stack is *really* big. 
+/// Each time you allocate, memory gets pushed onto the end of the stack; +/// each time you deallocate, memory gets popped from the end of the stack. +/// +/// One reason you'd want to use this is obviously performance: It's very simple +/// and so it's also very fast, >10x faster than your system allocator. +/// +/// However, modern allocators such as `mimalloc` are just as fast, so why not use them? +/// Because their performance comes at the cost of binary size, and we can't have that. +/// +/// The biggest benefit though is that it sometimes massively simplifies lifetime +/// and memory management. This is best seen in this project's UI code, which +/// uses an arena to allocate a tree of UI nodes. This is infamously difficult +/// to do in Rust, but not so when you have an arena allocator: +/// All nodes have the same lifetime, so you can just use references. +/// +/// # Safety +/// +/// **Do not** push objects into the arena that require destructors. +/// Destructors are not executed. Use a pool allocator for that. pub struct Arena { base: NonNull, capacity: usize, commit: Cell, offset: Cell, + /// See [`super::debug`], which uses this for borrow tracking. #[cfg(debug_assertions)] pub(super) borrows: Cell, } @@ -61,6 +85,7 @@ impl Arena { /// Obviously, this is GIGA UNSAFE. It runs no destructors and does not check /// whether the offset is valid. You better take care when using this function. pub unsafe fn reset(&self, to: usize) { + // Fill the deallocated memory with 0xDD to aid debugging. if cfg!(debug_assertions) && self.offset.get() > to { let commit = self.commit.get(); let len = (self.offset.get() + 128).min(commit) - to; diff --git a/src/arena/scratch.rs b/src/arena/scratch.rs index 9c72e0a..d7dcbbe 100644 --- a/src/arena/scratch.rs +++ b/src/arena/scratch.rs @@ -9,6 +9,7 @@ use crate::helpers::*; static mut S_SCRATCH: [release::Arena; 2] = const { [release::Arena::empty(), release::Arena::empty()] }; +/// Call this before using [`scratch_arena`].
pub fn init() -> apperr::Result<()> { unsafe { for s in &mut S_SCRATCH[..] { @@ -18,8 +19,27 @@ pub fn init() -> apperr::Result<()> { Ok(()) } -/// Returns a new scratch arena for temporary allocations, -/// ensuring it doesn't conflict with the provided arena. +/// Need an arena for temporary allocations? [`scratch_arena`] has you covered. +/// Call [`scratch_arena`] and it'll return an [`Arena`] that resets when it goes out of scope. +/// +/// --- +/// +/// Most methods make just two kinds of allocations: +/// * Interior: Temporary data that can be deallocated when the function returns. +/// * Exterior: Data that is returned to the caller and must remain alive until the caller stops using it. +/// +/// Such methods only have two lifetimes, for which you consequently also only need two arenas. +/// ...even if your method calls other methods recursively! This is because the exterior allocations +/// of a callee are simply interior allocations to the caller, and so on, recursively. +/// +/// This works as long as the two arenas flip/flop between being used as interior/exterior allocator +/// along the callstack. To ensure that is the case, we use a recursion counter in debug builds. +/// +/// This approach was described, among others, at: <https://nullprogram.com/blog/2023/09/27/> +/// +/// # Safety +/// +/// If your function takes an [`Arena`] argument, you **MUST** pass it to `scratch_arena` as `Some(&arena)`. pub fn scratch_arena(conflict: Option<&Arena>) -> ScratchArena<'static> { unsafe { #[cfg(debug_assertions)] @@ -31,18 +51,9 @@ pub fn scratch_arena(conflict: Option<&Arena>) -> ScratchArena<'static> { } } -// Most methods make just two kinds of allocations: -// * Interior: Temporary data that can be deallocated when the function returns. -// * Exterior: Data that is returned to the caller and must remain alive until the caller stops using it. -// -// Such methods only have two lifetimes, for which you consequently also only need two arenas. -// ...even if your method calls other methods recursively!
This is because the exterior allocations -// of a callee are simply interior allocations to the caller, and so on, recursively. -// -// This works as long as the two arenas flip/flop between being used as interior/exterior allocator -// along the callstack. To ensure that is the case, we use a recursion counter in debug builds. -// -// This approach was described among others at: https://nullprogram.com/blog/2023/09/27/ +/// Borrows an [`Arena`] for temporary allocations. +/// +/// See [`scratch_arena`]. #[cfg(debug_assertions)] pub struct ScratchArena<'a> { arena: debug::Arena, diff --git a/src/arena/string.rs b/src/arena/string.rs index 0d8211d..ecda536 100644 --- a/src/arena/string.rs +++ b/src/arena/string.rs @@ -4,49 +4,63 @@ use std::ops::{Bound, Deref, DerefMut, RangeBounds}; use super::Arena; use crate::helpers::*; +/// A custom string type, because `std` lacks allocator support for [`String`]. +/// +/// To keep things simple, this one is hardcoded to [`Arena`]. #[derive(Clone)] pub struct ArenaString<'a> { vec: Vec, } impl<'a> ArenaString<'a> { + /// Creates a new [`ArenaString`] in the given arena. #[must_use] pub const fn new_in(arena: &'a Arena) -> Self { Self { vec: Vec::new_in(arena) } } - #[inline] + /// Turns a [`str`] into an [`ArenaString`]. + #[must_use] pub fn from_str(arena: &'a Arena, s: &str) -> Self { let mut res = Self::new_in(arena); res.push_str(s); res } + /// It says right here that you checked if `bytes` is valid UTF-8 + /// and you are sure it is. Presto! Here's an `ArenaString`! + /// /// # Safety /// - /// It says "unchecked" right there. What did you expect? + /// You fool! It says "unchecked" right there. Now the house is burning. #[inline] #[must_use] pub unsafe fn from_utf8_unchecked(bytes: Vec) -> Self { Self { vec: bytes } } - pub fn from_utf8_lossy<'s>(arena: &'a Arena, v: &'s [u8]) -> Result<&'s str, ArenaString<'a>> { - let mut iter = v.utf8_chunks(); + /// Checks whether `text` contains only valid UTF-8. 
+ /// If the entire string is valid, it returns `Ok(text)`. + /// Otherwise, it returns `Err(ArenaString)` with all invalid sequences replaced with U+FFFD. + pub fn from_utf8_lossy<'s>( + arena: &'a Arena, + text: &'s [u8], + ) -> Result<&'s str, ArenaString<'a>> { + let mut iter = text.utf8_chunks(); let Some(mut chunk) = iter.next() else { return Ok(""); }; let valid = chunk.valid(); if chunk.invalid().is_empty() { - debug_assert_eq!(valid.len(), v.len()); - return Ok(unsafe { str::from_utf8_unchecked(v) }); + debug_assert_eq!(valid.len(), text.len()); + return Ok(unsafe { str::from_utf8_unchecked(text) }); } const REPLACEMENT: &str = "\u{FFFD}"; let mut res = Self::new_in(arena); - res.reserve(v.len()); + res.reserve(text.len()); loop { res.push_str(chunk.valid()); @@ -62,6 +76,7 @@ impl<'a> ArenaString<'a> { Err(res) } + /// Turns a [`Vec`] into an [`ArenaString`], replacing invalid UTF-8 sequences with U+FFFD. #[must_use] pub fn from_utf8_lossy_owned(v: Vec) -> Self { match Self::from_utf8_lossy(v.allocator(), &v) { @@ -70,26 +85,32 @@ impl<'a> ArenaString<'a> { } } + /// It's empty. pub fn is_empty(&self) -> bool { self.vec.is_empty() } + /// It's lengthy. pub fn len(&self) -> usize { self.vec.len() } + /// Its capacity. pub fn capacity(&self) -> usize { self.vec.capacity() } + /// It's a [`String`], now it's a [`str`]. Wow! pub fn as_str(&self) -> &str { unsafe { str::from_utf8_unchecked(self.vec.as_slice()) } } + /// It's a [`String`], now it's a [`str`]. And it's mutable! WOW! pub fn as_mut_str(&mut self) -> &mut str { unsafe { str::from_utf8_unchecked_mut(self.vec.as_mut_slice()) } } + /// Now it's bytes! pub fn as_bytes(&self) -> &[u8] { self.vec.as_slice() } @@ -103,22 +124,32 @@ impl<'a> ArenaString<'a> { &mut self.vec } + /// Reserves *additional* memory. For you old folks out there (totally not me), + /// this is different from C++'s `reserve`, which reserves a total size.
pub fn reserve(&mut self, additional: usize) { self.vec.reserve(additional) } + /// Now it's small! Alarming! + /// + /// *Do not* call this unless this string is the last thing on the arena. + /// Arenas are stacks; they can't deallocate what's in the middle. pub fn shrink_to_fit(&mut self) { self.vec.shrink_to_fit() } + /// To no surprise, this clears the string. pub fn clear(&mut self) { self.vec.clear() } + /// Append some text. pub fn push_str(&mut self, string: &str) { self.vec.extend_from_slice(string.as_bytes()) } + /// Append a single character. + #[inline] pub fn push(&mut self, ch: char) { match ch.len_utf8() { 1 => self.vec.push(ch as u8), @@ -156,6 +187,7 @@ impl<'a> ArenaString<'a> { } } + /// Replaces a byte range (which must lie on character boundaries) with a new string. pub fn replace_range>(&mut self, range: R, replace_with: &str) { match range.start_bound() { Bound::Included(&n) => assert!(self.is_char_boundary(n)), diff --git a/src/base64.rs b/src/base64.rs index 138de1e..d978b45 100644 --- a/src/base64.rs +++ b/src/base64.rs @@ -1,19 +1,31 @@ +//! Base64 facilities. + use crate::arena::ArenaString; const CHARSET: [u8; 64] = *b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; +/// Encodes the given bytes as base64 and appends them to the destination string. pub fn encode(dst: &mut ArenaString, src: &[u8]) { unsafe { let mut inp = src.as_ptr(); let mut remaining = src.len(); let dst = dst.as_mut_vec(); + + // One aspect of base64 is that the encoded length can be calculated accurately in advance. let out_len = src.len().div_ceil(3) * 4; + // ... we can then use this fact to reserve space all at once. dst.reserve(out_len); + // SAFETY: Getting a pointer to the reserved space is only safe + // *after* calling `reserve()` as it may change the pointer. let mut out = dst.as_mut_ptr().add(dst.len()); if remaining != 0 { + // Translate chunks of 3 source bytes into 4 base64-encoded bytes.
while remaining > 3 { + // SAFETY: Thanks to `remaining > 3`, reading 4 bytes at once is safe. + // This improves performance massively over a byte-by-byte approach, + // because it allows us to byte-swap the read and use simple bit-shifts below. let val = u32::from_be((inp as *const u32).read_unaligned()); inp = inp.add(3); remaining -= 3; @@ -32,6 +44,8 @@ pub fn encode(dst: &mut ArenaString, src: &[u8]) { let mut in1 = 0; let mut in2 = 0; + // We can simplify the following logic by assuming that there's only 1 + // byte left. If there's >1 byte left, one or both '=' will be overwritten. *out.add(3) = b'='; *out.add(2) = b'='; diff --git a/src/bin/edit/main.rs b/src/bin/edit/main.rs index 6281a04..39777e1 100644 --- a/src/bin/edit/main.rs +++ b/src/bin/edit/main.rs @@ -27,7 +27,7 @@ use edit::input::{self, kbmod, vk}; use edit::oklab::oklab_blend; use edit::tui::*; use edit::vt::{self, Token}; -use edit::{apperr, base64, path, sys}; +use edit::{apperr, base64, icu, path, sys}; use localization::*; use state::*; @@ -51,6 +51,10 @@ fn main() -> process::ExitCode { } fn run() -> apperr::Result<()> { + let items = vec!["hello.txt", "hallo.txt", "world.txt", "Hello, world.txt"]; + let mut sorted = items.clone(); + sorted.sort_by(|a, b| icu::compare_strings(a.as_bytes(), b.as_bytes())); + // Init `sys` first, as everything else may depend on its functionality (IO, function pointers, etc.). let _sys_deinit = sys::init()?; // Next init `arena`, so that `scratch_arena` works. `loc` depends on it. diff --git a/src/buffer/mod.rs b/src/buffer/mod.rs index 66096d1..5536117 100644 --- a/src/buffer/mod.rs +++ b/src/buffer/mod.rs @@ -1,13 +1,17 @@ +//! A text buffer for a text editor. +//! //! Implements a Unicode-aware, layout-aware text buffer for terminals. //! It's based on a gap buffer. It has no line cache and instead relies //! on the performance of the ucd module for fast text navigation. //! +//! --- +//! //! If the project ever outgrows a basic gap buffer (e.g.
to add time travel) //! an ideal, alternative architecture would be a piece table with immutable trees. //! The tree nodes can be allocated on the same arena allocator as the added chunks, //! making lifetime management fairly easy. The algorithm is described here: -//! * https://cdacamar.github.io/data%20structures/algorithms/benchmarking/text%20editors/c++/editor-data-structures/ -//! * https://github.com/cdacamar/fredbuf +//! * <https://cdacamar.github.io/data%20structures/algorithms/benchmarking/text%20editors/c++/editor-data-structures/> +//! * <https://github.com/cdacamar/fredbuf> //! //! The downside is that text navigation & search takes a performance hit due to small chunks. //! The solution to the former is to keep line caches, which further complicates the architecture. @@ -36,8 +40,8 @@ use crate::framebuffer::{Framebuffer, IndexedColor}; use crate::helpers::*; use crate::oklab::oklab_blend; use crate::simd::memchr2; -use crate::unicode::{Cursor, MeasurementConfig}; -use crate::{apperr, icu, unicode}; +use crate::unicode::{self, Cursor, MeasurementConfig}; +use crate::{apperr, icu}; /// The margin template is used for line numbers. /// The max. line number we should ever expect is probably 64-bit, @@ -47,16 +51,25 @@ const MARGIN_TEMPLATE: &str = " │ "; /// Happens to reuse MARGIN_TEMPLATE, because it has sufficient whitespace. const TAB_WHITESPACE: &str = MARGIN_TEMPLATE; +/// Stores statistics about the whole document. #[derive(Copy, Clone)] pub struct TextBufferStatistics { logical_lines: CoordType, visual_lines: CoordType, } +/// Stores the active text selection. #[derive(Copy, Clone)] enum TextBufferSelection { + /// No active selection. None, + /// The user is currently selecting text. + /// + /// Moving the cursor will update the selection. Active { beg: Point, end: Point }, + /// The user stopped selecting text. + /// + /// Moving the cursor will destroy the selection. Done { beg: Point, end: Point }, } @@ -66,6 +79,9 @@ impl TextBufferSelection { } } +/// In order to group actions into a single undo step, +/// we need to know the type of action that was performed.
+/// This stores the action type. #[derive(Copy, Clone, Eq, PartialEq)] enum HistoryType { Other, @@ -73,11 +89,15 @@ enum HistoryType { Delete, } +/// An undo/redo entry. struct HistoryEntry { - /// Logical cursor position before the change was made. + /// [`TextBuffer::cursor`] position before the change was made. cursor_before: Point, + /// [`TextBuffer::selection`] before the change was made. selection_before: TextBufferSelection, + /// [`TextBuffer::stats`] before the change was made. stats_before: TextBufferStatistics, + /// [`GapBuffer::generation`] before the change was made. generation_before: u32, /// Logical cursor position where the change took place. /// The position is at the start of the changed range. @@ -88,21 +108,38 @@ struct HistoryEntry { added: Vec, } +/// Caches an ICU search operation. struct ActiveSearch { + /// The search pattern. pattern: String, + /// The search options. options: SearchOptions, + /// The ICU `UText` object. text: icu::Text, + /// The ICU `URegularExpression` object. regex: icu::Regex, + /// [`GapBuffer::generation`] when the search was created. + /// This is used to detect if we need to refresh the + /// [`ActiveSearch::regex`] object. buffer_generation: u32, + /// [`TextBuffer::selection_generation`] when the search was + /// created. When the user manually selects text, we need to + /// refresh the [`ActiveSearch::pattern`] with it. selection_generation: u32, + /// Stores the text buffer offset in between searches. next_search_offset: usize, + /// If we know there were no hits, we can skip searching. no_matches: bool, } +/// Options for a search operation. #[derive(Default, Clone, Copy, Eq, PartialEq)] pub struct SearchOptions { + /// If true, the search is case-sensitive. pub match_case: bool, + /// If true, the search matches whole words. pub whole_word: bool, + /// If true, the search uses regex. 
pub use_regex: bool, } @@ -111,22 +148,36 @@ pub struct SearchOptions { struct ActiveEditLineInfo { /// Points to the start of the currently being edited line. safe_start: Cursor, + /// Number of visual rows of the line that starts + /// at [`ActiveEditLineInfo::safe_start`]. line_height_in_rows: CoordType, + /// Byte distance from the start of the line at + /// [`ActiveEditLineInfo::safe_start`] to the next line. distance_next_line_start: usize, } +/// Char- or word-wise navigation? Your choice. pub enum CursorMovement { Grapheme, Word, } +/// The result of a call to [`TextBuffer::render()`]. pub struct RenderResult { + /// The maximum visual X position we encountered during rendering. pub visual_pos_x_max: CoordType, } +/// A [`TextBuffer`] with inner mutability. pub type TextBufferCell = SemiRefCell; + +/// A [`TextBuffer`] inside an [`Rc`]. +/// +/// We need this because the TUI system needs to borrow +/// the given text buffer(s) until after the layout process. pub type RcTextBuffer = Rc; +/// A text buffer for a text editor. pub struct TextBuffer { buffer: GapBuffer, @@ -167,11 +218,15 @@ pub struct TextBuffer { } impl TextBuffer { + /// Creates a new text buffer inside an [`Rc`]. + /// See [`TextBuffer::new()`]. pub fn new_rc(small: bool) -> apperr::Result { let buffer = TextBuffer::new(small)?; Ok(Rc::new(SemiRefCell::new(buffer))) } + /// Creates a new text buffer. With `small` you can control + /// if the buffer is optimized for <1MiB contents. pub fn new(small: bool) -> apperr::Result { Ok(Self { buffer: GapBuffer::new(small)?, @@ -209,26 +264,36 @@ impl TextBuffer { }) } + /// Length of the document in bytes. pub fn text_length(&self) -> usize { self.buffer.len() } + /// Number of logical lines in the document, + /// that is, lines separated by newlines. pub fn logical_line_count(&self) -> CoordType { self.stats.logical_lines } + /// Number of visual lines in the document, + /// that is, the number of lines after layout. 
pub fn visual_line_count(&self) -> CoordType { self.stats.visual_lines } + /// Does the buffer need to be saved? pub fn is_dirty(&self) -> bool { self.last_save_generation != self.buffer.generation() } + /// The buffer generation changes on every edit. + /// Compare it with a previously returned value to check + /// whether the buffer has changed since then. pub fn generation(&self) -> u32 { self.buffer.generation() } + /// Force the buffer to be dirty. pub fn mark_as_dirty(&mut self) { self.last_save_generation = self.buffer.generation().wrapping_sub(1); } @@ -237,10 +302,12 @@ impl TextBuffer { self.last_save_generation = self.buffer.generation(); } + /// The encoding used during reading/writing. "UTF-8" is the default. pub fn encoding(&self) -> &'static str { self.encoding } + /// Set the encoding used during reading/writing. pub fn set_encoding(&mut self, encoding: &'static str) { if self.encoding != encoding { self.encoding = encoding; @@ -248,10 +315,14 @@ impl TextBuffer { } } + /// Whether the document uses CRLF (`true`) or LF (`false`) newlines. pub fn is_crlf(&self) -> bool { self.newlines_are_crlf } + /// Changes the newline type used in the document. + /// + /// NOTE: Cannot be undone. pub fn normalize_newlines(&mut self, crlf: bool) { let newline: &[u8] = if crlf { b"\r\n" } else { b"\n" }; let mut off = 0; @@ -318,26 +389,34 @@ impl TextBuffer { self.newlines_are_crlf = crlf; } + /// Whether typing overwrites existing text instead of inserting it. pub fn is_overtype(&self) -> bool { self.overtype } + /// Set the overtype mode. pub fn set_overtype(&mut self, overtype: bool) { self.overtype = overtype; } + /// Gets the logical cursor position, that is, + /// the position in lines and graphemes per line. pub fn cursor_logical_pos(&self) -> Point { self.cursor.logical_pos } + /// Gets the visual cursor position, that is, + /// the position in laid out rows and columns. pub fn cursor_visual_pos(&self) -> Point { self.cursor.visual_pos } + /// Gets the width of the left margin.
pub fn margin_width(&self) -> CoordType { self.margin_width } + /// Enables or disables the left margin. pub fn set_margin_enabled(&mut self, enabled: bool) -> bool { if self.margin_enabled == enabled { false @@ -348,22 +427,38 @@ impl TextBuffer { } } + /// Gets the width of the text contents for layout. pub fn text_width(&self) -> CoordType { self.width - self.margin_width } + /// Ask the TUI system to scroll the buffer and make the cursor visible. + /// + /// TODO: This function shows that [`TextBuffer`] is poorly abstracted + /// away from the TUI system. The only reason this exists is so that + /// if someone outside the TUI code enables word-wrap, the TUI code + /// recognizes this and scrolls the cursor into view. But otherwise, + /// scrolling, views, etc., are all UI concerns, so this should not be here. pub fn make_cursor_visible(&mut self) { self.wants_cursor_visibility = true; } + /// For the TUI code to retrieve a prior [`TextBuffer::make_cursor_visible()`] request. pub fn take_cursor_visibility_request(&mut self) -> bool { mem::take(&mut self.wants_cursor_visibility) } + /// Is word-wrap enabled? + /// + /// Technically, this is a misnomer, because it's line-wrapping. pub fn is_word_wrap_enabled(&self) -> bool { self.word_wrap_enabled } + /// Enable or disable word-wrap. + /// + /// NOTE: It's expected that the TUI code calls `set_width()` sometime after this. + /// This will then trigger the actual recalculation of the cursor position. pub fn set_word_wrap(&mut self, enabled: bool) { if self.word_wrap_enabled != enabled { self.word_wrap_enabled = enabled; @@ -372,6 +467,11 @@ impl TextBuffer { } } + /// Set the width available for layout. + /// + /// Ideally this would be a pure UI concern, but the text buffer needs this + /// so that it can abstract away visual cursor movement such as "go a line up". + /// What would that even mean if it didn't know how wide a line is?
pub fn set_width(&mut self, width: CoordType) -> bool { if width <= 0 || width == self.width { false @@ -382,10 +482,12 @@ impl TextBuffer { } } + /// Gets the tab width. Could be anything, but is expected to be 1-8. pub fn tab_size(&self) -> CoordType { self.tab_size } + /// Sets the tab width. Clamped to 1-8. pub fn set_tab_size(&mut self, width: CoordType) -> bool { let width = width.clamp(1, 8); if width == self.tab_size { @@ -397,18 +499,22 @@ impl TextBuffer { } } + /// Returns whether tabs are used for indentation. pub fn indent_with_tabs(&self) -> bool { self.indent_with_tabs } + /// Sets whether tabs or spaces are used for indentation. pub fn set_indent_with_tabs(&mut self, indent_with_tabs: bool) { self.indent_with_tabs = indent_with_tabs; } + /// Sets whether the line the cursor is on should be highlighted. pub fn set_line_highlight_enabled(&mut self, enabled: bool) { self.line_highlight_enabled = enabled; } + /// Sets a ruler column, e.g. 80. pub fn set_ruler(&mut self, column: CoordType) { self.ruler = column; } @@ -799,6 +905,7 @@ impl TextBuffer { Ok(()) } + /// Returns whether there is a selection. pub fn has_selection(&self) -> bool { self.selection.is_some() } @@ -809,6 +916,7 @@ impl TextBuffer { self.selection_generation } + /// Moves the cursor to `visual_pos` and updates the selection to contain it. pub fn selection_update_visual(&mut self, visual_pos: Point) { let cursor = self.cursor; self.set_cursor_for_selection(self.cursor_move_to_visual_internal(cursor, visual_pos)); @@ -826,6 +934,7 @@ impl TextBuffer { } } + /// Moves the cursor to `logical_pos` and updates the selection to contain it. pub fn selection_update_logical(&mut self, logical_pos: Point) { let cursor = self.cursor; self.set_cursor_for_selection(self.cursor_move_to_logical_internal(cursor, logical_pos)); @@ -843,6 +952,7 @@ impl TextBuffer { } } + /// Moves the cursor by `delta` and updates the selection to contain it.
pub fn selection_update_delta(&mut self, granularity: CursorMovement, delta: CoordType) { let cursor = self.cursor; self.set_cursor_for_selection(self.cursor_move_delta_internal(cursor, granularity, delta)); @@ -860,6 +970,7 @@ impl TextBuffer { } } + /// Select the current word. pub fn select_word(&mut self) { let Range { start, end } = navigation::word_select(&self.buffer, self.cursor.offset); let beg = self.cursor_move_to_offset_internal(self.cursor, start); @@ -871,6 +982,7 @@ impl TextBuffer { }); } + /// Select the current line. pub fn select_line(&mut self) { let beg = self.cursor_move_to_logical_internal( self.cursor, @@ -885,6 +997,7 @@ impl TextBuffer { }); } + /// Select the entire document. pub fn select_all(&mut self) { let beg = Default::default(); let end = self.cursor_move_to_logical_internal(beg, Point::MAX); @@ -895,18 +1008,23 @@ impl TextBuffer { }); } + /// Turn an active selection into a finalized selection. + /// + /// Any future cursor movement will destroy the selection. pub fn selection_finalize(&mut self) { if let TextBufferSelection::Active { beg, end } = self.selection { self.set_selection(TextBufferSelection::Done { beg, end }); } } + /// Destroy the current selection. pub fn clear_selection(&mut self) -> bool { let had_selection = self.selection.is_some(); self.set_selection(TextBufferSelection::None); had_selection } + /// Find the next occurrence of the given `pattern` and select it. pub fn find_and_select(&mut self, pattern: &str, options: SearchOptions) -> apperr::Result<()> { if let Some(search) = &mut self.search { let search = search.get_mut(); @@ -959,6 +1077,7 @@ impl TextBuffer { Ok(()) } + /// Find the next occurrence of the given `pattern` and replace it with `replacement`. pub fn find_and_replace( &mut self, pattern: &str, @@ -978,6 +1097,7 @@ impl TextBuffer { self.find_and_select(pattern, options) } + /// Find all occurrences of the given `pattern` and replace them with `replacement`. 
pub fn find_and_replace_all( &mut self, pattern: &str, @@ -1333,18 +1453,22 @@ impl TextBuffer { cursor } + /// Moves the cursor to the given offset. pub fn cursor_move_to_offset(&mut self, offset: usize) { unsafe { self.set_cursor(self.cursor_move_to_offset_internal(self.cursor, offset)) } } + /// Moves the cursor to the given logical position. pub fn cursor_move_to_logical(&mut self, pos: Point) { unsafe { self.set_cursor(self.cursor_move_to_logical_internal(self.cursor, pos)) } } + /// Moves the cursor to the given visual position. pub fn cursor_move_to_visual(&mut self, pos: Point) { unsafe { self.set_cursor(self.cursor_move_to_visual_internal(self.cursor, pos)) } } + /// Moves the cursor by the given delta. pub fn cursor_move_delta(&mut self, granularity: CursorMovement, delta: CoordType) { unsafe { self.set_cursor(self.cursor_move_delta_internal(self.cursor, granularity, delta)) } } @@ -1847,11 +1971,13 @@ impl TextBuffer { self.edit_end(); } - // TODO: This function is ripe for some optimizations: - // * Instead of replacing the entire selection, - // it should unindent each line directly (as if multiple cursors had been used). - // * The cursor movement at the end is rather costly, but at least without word wrap - // it should be possible to calculate it directly from the removed amount. + /// Unindents the current selection or line. + /// + /// TODO: This function is ripe for some optimizations: + /// * Instead of replacing the entire selection, + /// it should unindent each line directly (as if multiple cursors had been used). + /// * The cursor movement at the end is rather costly, but at least without word wrap + /// it should be possible to calculate it directly from the removed amount. 
pub fn unindent(&mut self) { let mut selection_beg = self.cursor.logical_pos; let mut selection_end = selection_beg; @@ -1927,7 +2053,8 @@ impl TextBuffer { self.set_cursor_internal(self.cursor_move_to_logical_internal(self.cursor, selection_end)); } - /// Extracts a chunk of text or a line if no selection is active. May optionally delete it. + /// Extracts the contents of the current selection. + /// May optionally delete it, if requested. This is meant to be used for Ctrl+X. pub fn extract_selection(&mut self, delete: bool) -> Vec { let Some((beg, end)) = self.selection_range_internal(true) else { return Vec::new(); @@ -1946,6 +2073,9 @@ impl TextBuffer { out } + /// Extracts the contents of the current selection the user made. + /// This differs from [`TextBuffer::extract_selection()`] in that + /// it does nothing if the selection was made by searching. pub fn extract_user_selection(&mut self, delete: bool) -> Option> { if !self.has_selection() { return None; @@ -1961,10 +2091,17 @@ impl TextBuffer { Some(self.extract_selection(delete)) } + /// Returns the current selection anchors, or `None` if there + /// is no selection. The returned logical positions are sorted. pub fn selection_range(&self) -> Option<(Cursor, Cursor)> { self.selection_range_internal(false) } + /// Returns the current selection anchors. + /// + /// If there's no selection and `line_fallback` is `true`, + /// the start/end of the current line are returned. + /// This is meant to be used for Ctrl+C / Ctrl+X. fn selection_range_internal(&self, line_fallback: bool) -> Option<(Cursor, Cursor)> { let [beg, end] = match self.selection { TextBufferSelection::None if !line_fallback => return None, @@ -1983,6 +2120,8 @@ impl TextBuffer { if beg.offset < end.offset { Some((beg, end)) } else { None } } + /// Starts a new edit operation. + /// This is used for tracking the undo/redo history. 
fn edit_begin(&mut self, history_type: HistoryType, cursor: Cursor) { self.active_edit_depth += 1; if self.active_edit_depth > 1 { @@ -2033,6 +2172,8 @@ impl TextBuffer { } } + /// Writes `text` into the buffer at the current cursor position. + /// It records the change in the undo stack. fn edit_write(&mut self, text: &[u8]) { let logical_y_before = self.cursor.logical_pos.y; @@ -2052,6 +2193,8 @@ impl TextBuffer { self.stats.logical_lines += self.cursor.logical_pos.y - logical_y_before; } + /// Deletes the text between the current cursor position and `to`. + /// It records the change in the undo stack. fn edit_delete(&mut self, to: Cursor) { debug_assert!(to.offset >= self.active_edit_off); @@ -2076,6 +2219,8 @@ impl TextBuffer { self.stats.logical_lines += logical_y_before - to.logical_pos.y; } + /// Finalizes the current edit operation + /// and recalculates the line statistics. fn edit_end(&mut self) { self.active_edit_depth -= 1; assert!(self.active_edit_depth >= 0); @@ -2125,10 +2270,12 @@ impl TextBuffer { self.reflow(false); } + /// Undo the last edit operation. pub fn undo(&mut self) { self.undo_redo(true); } + /// Redo the last undo operation. pub fn redo(&mut self) { self.undo_redo(false); } @@ -2238,10 +2385,12 @@ impl TextBuffer { self.reflow(false); } + /// For interfacing with ICU. pub(crate) fn read_backward(&self, off: usize) -> &[u8] { self.buffer.read_backward(off) } + /// For interfacing with ICU. pub fn read_forward(&self, off: usize) -> &[u8] { self.buffer.read_forward(off) } diff --git a/src/cell.rs b/src/cell.rs index 861be7a..64f9e3f 100644 --- a/src/cell.rs +++ b/src/cell.rs @@ -1,4 +1,4 @@ -//! Like `RefCell`, but without any runtime checks in release mode. +//! [`std::cell::RefCell`], but without runtime checks in release builds. 
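`edit_begin` increments `active_edit_depth` and checks `if self.active_edit_depth > 1`. `edit_end` decrements the counter and asserts that it never goes negative. The sketch below models that nesting, under the assumption that only the outermost begin/end pair produces a single undo entry. `History` and its fields are hypothetical and are not the real `TextBuffer` API:

```rust
/// Hypothetical stand-in for the undo bookkeeping; not the real `TextBuffer` API.
#[derive(Default)]
struct History {
    /// Committed undo entries; each one holds every write of one outermost edit.
    entries: Vec<Vec<String>>,
    /// Writes recorded since the outermost `edit_begin()`.
    pending: Vec<String>,
    /// Nesting depth, mirroring `active_edit_depth`.
    depth: i32,
}

impl History {
    fn edit_begin(&mut self) {
        // Nested calls only bump the depth; they join the already open group.
        self.depth += 1;
    }

    fn edit_write(&mut self, text: &str) {
        assert!(self.depth > 0, "writes must happen inside an edit");
        self.pending.push(text.to_string());
    }

    fn edit_end(&mut self) {
        self.depth -= 1;
        assert!(self.depth >= 0);
        // Only the outermost `edit_end()` commits the group as a single undo entry.
        if self.depth == 0 && !self.pending.is_empty() {
            self.entries.push(std::mem::take(&mut self.pending));
        }
    }
}
```

With this design, a compound operation such as "replace all" can call helpers that open their own edits, and the user still sees one undo step.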
#[cfg(debug_assertions)] pub use debug::*; diff --git a/src/document.rs b/src/document.rs index 1d6c1f2..64b6fec 100644 --- a/src/document.rs +++ b/src/document.rs @@ -8,7 +8,7 @@ use std::path::PathBuf; use crate::arena::{ArenaString, scratch_arena}; use crate::helpers::ReplaceRange as _; -/// An abstraction over potentially chunked text containers. +/// An abstraction over reading from text containers. pub trait ReadableDocument { /// Read some bytes starting at (including) the given absolute offset. /// @@ -16,7 +16,7 @@ pub trait ReadableDocument { /// /// * Be lenient on inputs: /// * The given offset may be out of bounds and you MUST clamp it. - /// * You SHOULD NOT assume that offsets are at grapheme cluster boundaries. + /// * You should not assume that offsets are at grapheme cluster boundaries. /// * Be strict on outputs: /// * You MUST NOT break grapheme clusters across chunks. /// * You MUST NOT return an empty slice unless the offset is at or beyond the end. @@ -28,14 +28,21 @@ pub trait ReadableDocument { /// /// * Be lenient on inputs: /// * The given offset may be out of bounds and you MUST clamp it. - /// * You SHOULD NOT assume that offsets are at grapheme cluster boundaries. + /// * You should not assume that offsets are at grapheme cluster boundaries. /// * Be strict on outputs: /// * You MUST NOT break grapheme clusters across chunks. /// * You MUST NOT return an empty slice unless the offset is zero. fn read_backward(&self, off: usize) -> &[u8]; } +/// An abstraction over writing to text containers. pub trait WriteableDocument: ReadableDocument { + /// Replace the given range with the given bytes. + /// + /// # Warning + /// + /// * The given range may be out of bounds and you MUST clamp it. + /// * The replacement may not be valid UTF8. 
fn replace(&mut self, range: Range, replacement: &[u8]); } diff --git a/src/framebuffer.rs b/src/framebuffer.rs index 1abd28d..5018f66 100644 --- a/src/framebuffer.rs +++ b/src/framebuffer.rs @@ -1,3 +1,5 @@ +//! A shoddy framebuffer for terminal applications. + use std::cell::Cell; use std::fmt::Write; use std::ops::{BitOr, BitXor}; @@ -24,6 +26,7 @@ const CACHE_TABLE_SIZE: usize = 1 << CACHE_TABLE_LOG2_SIZE; /// 8 bits out, but rather shift 56 bits down to get the best bits from the top. const CACHE_TABLE_SHIFT: usize = usize::BITS as usize - CACHE_TABLE_LOG2_SIZE; +/// Standard 16 VT & default foreground/background colors. #[derive(Clone, Copy)] pub enum IndexedColor { Black, @@ -47,33 +50,55 @@ pub enum IndexedColor { Foreground, } +/// Number of indices used by [`IndexedColor`]. pub const INDEXED_COLORS_COUNT: usize = 18; +/// Fallback theme. pub const DEFAULT_THEME: [u32; INDEXED_COLORS_COUNT] = [ 0xff000000, 0xff212cbe, 0xff3aae3f, 0xff4a9abe, 0xffbe4d20, 0xffbe54bb, 0xffb2a700, 0xffbebebe, 0xff808080, 0xff303eff, 0xff51ea58, 0xff44c9ff, 0xffff6a2f, 0xffff74fc, 0xfff0e100, 0xffffffff, 0xff000000, 0xffffffff, ]; +/// A shoddy framebuffer for terminal applications. +/// +/// The idea is that you create a [`Framebuffer`], draw a bunch of text and +/// colors into it, and it takes care of figuring out what changed since the +/// last rendering and sending the differences as VT to the terminal. +/// +/// This is an improvement over how many other terminal applications work, +/// as they fail to accurately track what changed. If you watch the output +/// of `vim` for instance, you'll notice that it redraws unrelated parts of +/// the screen all the time. pub struct Framebuffer { + /// Store the color palette. indexed_colors: [u32; INDEXED_COLORS_COUNT], + /// Front and back buffers. Indexed by `frame_counter & 1`. buffers: [Buffer; 2], + /// The current frame counter. Increments on every `flip` call. 
frame_counter: usize, - auto_colors: [u32; 2], // [dark, light] + /// The colors used for `contrasted()`. It stores the default colors + /// of the palette as [dark, light], unless the palette is recognized + /// as a light theme, in which case it swaps them. + auto_colors: [u32; 2], + /// A cache table for previously contrasted colors. + /// See: contrast_colors: [Cell<(u32, u32)>; CACHE_TABLE_SIZE], } impl Framebuffer { + /// Creates a new framebuffer. pub fn new() -> Self { Self { indexed_colors: DEFAULT_THEME, buffers: Default::default(), frame_counter: 0, auto_colors: [0, 0], - contrast_colors: [const { Cell::new((0, 0)) }; 256], + contrast_colors: [const { Cell::new((0, 0)) }; CACHE_TABLE_SIZE], } } + /// Sets the base color palette. pub fn set_indexed_colors(&mut self, colors: [u32; INDEXED_COLORS_COUNT]) { self.indexed_colors = colors; @@ -86,6 +111,7 @@ impl Framebuffer { } } + /// Begins a new frame with the given `size`. pub fn flip(&mut self, size: Size) { if size != self.buffers[0].bg_bitmap.size { for buffer in &mut self.buffers { @@ -117,9 +143,7 @@ impl Framebuffer { /// Replaces text contents in a single line of the framebuffer. /// All coordinates are in viewport coordinates. - /// Assumes that all tabs have been replaced with spaces. - /// - /// TODO: This function is ripe for performance improvements. + /// Assumes that control characters have been replaced or escaped. pub fn replace_text( &mut self, y: CoordType, @@ -131,6 +155,18 @@ impl Framebuffer { back.text.replace_text(y, origin_x, clip_right, text) } + /// Draws a scrollbar in the given `track` rectangle. + /// + /// Not entirely sure why I put it here instead of elsewhere. + /// + /// # Parameters + /// + /// * `clip_rect`: Clips the rendering to this rectangle. + /// This is relevant when you have scrollareas inside scrollareas. + /// * `track`: The rectangle in which to draw the scrollbar. + /// In absolute viewport coordinates.
+ /// * `content_offset`: The current offset of the scrollarea. + /// * `content_height`: The height of the scrollarea content. pub fn draw_scrollbar( &mut self, clip_rect: Rect, @@ -247,8 +283,10 @@ impl Framebuffer { self.indexed_colors[index as usize] } - // To facilitate constant folding by the compiler, - // alpha is given as a fraction (`numerator` / `denominator`). + /// Returns a color from the palette with the given alpha applied. + /// + /// To facilitate constant folding by the compiler, + /// alpha is given as a fraction (`numerator` / `denominator`). #[inline] pub fn indexed_alpha(&self, index: IndexedColor, numerator: u32, denominator: u32) -> u32 { let c = self.indexed_colors[index as usize]; @@ -259,6 +297,7 @@ impl Framebuffer { a << 24 | r << 16 | g << 8 | b } + /// Returns a color that contrasts with the brightness of the given `color`. pub fn contrasted(&self, color: u32) -> u32 { let idx = (color as usize).wrapping_mul(HASH_MULTIPLIER) >> CACHE_TABLE_SHIFT; let slot = self.contrast_colors[idx].get(); @@ -277,16 +316,25 @@ impl Framebuffer { srgb_to_oklab(color).l < 0.5 } + /// Blends the given sRGB color onto the background bitmap. + /// + /// TODO: The current approach blends foreground/background independently, + /// but ideally blending a semi-transparent dark color via `blend_bg` should also darken the text below it. pub fn blend_bg(&mut self, target: Rect, bg: u32) { let back = &mut self.buffers[self.frame_counter & 1]; back.bg_bitmap.blend(target, bg); } + /// Blends the given sRGB color onto the foreground bitmap. + /// + /// TODO: The current approach blends foreground/background independently, + /// but ideally `blend_fg` should blend with the background color below it. pub fn blend_fg(&mut self, target: Rect, fg: u32) { let back = &mut self.buffers[self.frame_counter & 1]; back.fg_bitmap.blend(target, fg); } + /// Reverses the foreground and background colors in the given rectangle.
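`contrasted()` memoizes its results in a small direct-mapped table. It multiplies the color by `HASH_MULTIPLIER` and shifts the product right by `CACHE_TABLE_SHIFT`, so only the well-mixed top bits select a slot. The sketch below shows that scheme under two assumptions: a 256-entry table, and the common 64-bit Fibonacci-hashing constant standing in for the real `HASH_MULTIPLIER`, whose value isn't shown here. `ContrastCache` and `get_or_compute` are hypothetical names. Each slot stores an `Option` rather than relying on a `(0, 0)` sentinel:

```rust
use std::cell::Cell;

// Assumed parameters: a 256-entry table (log2 = 8, like the old `256` literal)
// and the 64-bit Fibonacci-hashing constant in place of the real HASH_MULTIPLIER.
const LOG2_SIZE: u32 = 8;
const SIZE: usize = 1 << LOG2_SIZE;
const SHIFT: u32 = u64::BITS - LOG2_SIZE;
const MULTIPLIER: u64 = 0x9E37_79B9_7F4A_7C15;

/// Hypothetical direct-mapped cache: each slot remembers one (input, result) pair.
struct ContrastCache {
    slots: [Cell<Option<(u32, u32)>>; SIZE],
}

impl ContrastCache {
    fn new() -> Self {
        Self { slots: std::array::from_fn(|_| Cell::new(None)) }
    }

    /// Multiplies, then keeps only the top `LOG2_SIZE` bits, which are the best mixed.
    fn index(color: u32) -> usize {
        ((color as u64).wrapping_mul(MULTIPLIER) >> SHIFT) as usize
    }

    /// Returns the cached result for `color`, or computes and stores it.
    /// A colliding color simply overwrites the slot.
    fn get_or_compute(&self, color: u32, compute: impl FnOnce(u32) -> u32) -> u32 {
        let slot = &self.slots[Self::index(color)];
        if let Some((key, value)) = slot.get() {
            if key == color {
                return value;
            }
        }
        let value = compute(color);
        slot.set(Some((color, value)));
        value
    }
}
```

Because the table uses `Cell`, a lookup can update it through `&self`, just as `contrasted(&self, ...)` does.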
pub fn reverse(&mut self, target: Rect) { let back = &mut self.buffers[self.frame_counter & 1]; @@ -310,17 +358,23 @@ impl Framebuffer { } } + /// Replaces VT attributes in the given rectangle. pub fn replace_attr(&mut self, target: Rect, mask: Attributes, attr: Attributes) { let back = &mut self.buffers[self.frame_counter & 1]; back.attributes.replace(target, mask, attr); } + /// Sets the current visible cursor position and type. + /// + /// Call this when focus is inside an editable area and you want to show the cursor. pub fn set_cursor(&mut self, pos: Point, overtype: bool) { let back = &mut self.buffers[self.frame_counter & 1]; back.cursor.pos = pos; back.cursor.overtype = overtype; } + /// Renders the framebuffer contents accumulated since the + /// last call to `flip()` and returns them serialized as VT. pub fn render<'a>(&mut self, arena: &'a Arena) -> ArenaString<'a> { let idx = self.frame_counter & 1; // Borrows the front/back buffers without letting Rust know that we have a reference to self. @@ -484,6 +538,7 @@ struct Buffer { cursor: Cursor, } +/// A buffer for the text contents of the framebuffer. #[derive(Default)] struct LineBuffer { lines: Vec, @@ -509,10 +564,8 @@ impl LineBuffer { /// Replaces text contents in a single line of the framebuffer. /// All coordinates are in viewport coordinates. - /// Assumes that all tabs have been replaced with spaces. - /// - /// TODO: This function is ripe for performance improvements. - pub fn replace_text( + /// Assumes that control characters have been replaced or escaped. + fn replace_text( &mut self, y: CoordType, origin_x: CoordType, @@ -632,6 +685,7 @@ impl LineBuffer { } } +/// An sRGB bitmap. #[derive(Default)] struct Bitmap { data: Vec, @@ -647,6 +701,10 @@ impl Bitmap { memset(&mut self.data, color); } + /// Blends the given sRGB color onto the bitmap. + /// + /// This uses the `oklab` color space for blending so the + /// resulting colors may look different from what you'd expect. 
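`flip()`, `render()` and the two buffers indexed by `frame_counter & 1` together implement the "only send what changed" idea from the `Framebuffer` docs. The sketch below is text-only. `MiniFramebuffer` is hypothetical, and it ignores colors, attributes, wide glyphs and the cursor. Emitting one CUP (`ESC[row;colH`) sequence per changed cell is also a simplification of what a real renderer would do:

```rust
/// Hypothetical, heavily simplified framebuffer: one `char` per cell, no colors.
struct MiniFramebuffer {
    /// Front and back buffers, indexed by `frame_counter & 1` like the real one.
    buffers: [Vec<char>; 2],
    width: usize,
    frame_counter: usize,
}

impl MiniFramebuffer {
    fn new(width: usize, height: usize) -> Self {
        let blank = vec![' '; width * height];
        Self { buffers: [blank.clone(), blank], width, frame_counter: 0 }
    }

    /// Starts a new frame: the previous back buffer becomes the front buffer,
    /// and the other one is cleared to be drawn into.
    fn flip(&mut self) {
        self.frame_counter += 1;
        let back = &mut self.buffers[self.frame_counter & 1];
        back.fill(' ');
    }

    fn put(&mut self, x: usize, y: usize, c: char) {
        let w = self.width;
        self.buffers[self.frame_counter & 1][y * w + x] = c;
    }

    /// Emits a 1-based cursor move plus the character for each cell that
    /// differs from the previous frame.
    fn render(&self) -> String {
        let back = &self.buffers[self.frame_counter & 1];
        let front = &self.buffers[(self.frame_counter + 1) & 1];
        let mut out = String::new();
        for (i, (&b, &f)) in back.iter().zip(front).enumerate() {
            if b != f {
                let (x, y) = (i % self.width, i / self.width);
                out.push_str(&format!("\x1b[{};{}H{}", y + 1, x + 1, b));
            }
        }
        out
    }
}
```

Redrawing an identical frame produces no output at all. This is the property the struct docs contrast with applications that repaint unrelated parts of the screen.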
fn blend(&mut self, target: Rect, color: u32) { if (color & 0xff000000) == 0x00000000 { return; @@ -700,11 +758,14 @@ impl Bitmap { } } +/// A bitfield for VT text attributes. +/// +/// It being a bitfield allows for simple diffing. #[repr(transparent)] #[derive(Default, Clone, Copy, PartialEq, Eq)] pub struct Attributes(u8); -#[allow(non_upper_case_globals)] // Mimics an enum, but it's actually a bitfield. Allows simple diffing. +#[allow(non_upper_case_globals)] impl Attributes { pub const None: Attributes = Attributes(0); pub const Italic: Attributes = Attributes(0b1); @@ -734,6 +795,7 @@ impl BitXor for Attributes { } } +/// Stores VT attributes for the framebuffer. #[derive(Default)] struct AttributeBuffer { data: Vec, @@ -782,6 +844,7 @@ impl AttributeBuffer { } } +/// Stores cursor position and type for the framebuffer. #[derive(Default, PartialEq, Eq)] struct Cursor { pos: Point, diff --git a/src/hash.rs b/src/hash.rs index 4001160..914d48d 100644 --- a/src/hash.rs +++ b/src/hash.rs @@ -1,3 +1,12 @@ +//! Provides fast, non-cryptographic hash functions. + +/// The venerable wyhash hash function. +/// +/// It's fast, has good statistical properties, and is in the public domain. +/// See: +/// If you visit the link, you'll find that it was superseded by "rapidhash", +/// but that's not particularly interesting for this project. rapidhash results +/// in way larger assembly and isn't faster when hashing small amounts of data. pub fn hash(mut seed: u64, data: &[u8]) -> u64 { unsafe { const S0: u64 = 0xa0761d6478bd642f; diff --git a/src/helpers.rs b/src/helpers.rs index 0b6eedb..f76f6c1 100644 --- a/src/helpers.rs +++ b/src/helpers.rs @@ -1,3 +1,5 @@ +//! Random assortment of helpers I didn't know where to put. 
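`Attributes` is a plain bitfield, so XOR-ing the previous and next attributes leaves exactly the bits that changed. This is what makes the diffing cheap. The sketch below shows how a renderer might turn that XOR into SGR parameters. Only `None` and `Italic` appear in the diff; the `Underlined` bit value and the `sgr_diff` helper are assumptions:

```rust
use std::ops::{BitOr, BitXor};

#[derive(Default, Clone, Copy, PartialEq, Eq, Debug)]
struct Attributes(u8);

#[allow(non_upper_case_globals)]
impl Attributes {
    const None: Attributes = Attributes(0);
    const Italic: Attributes = Attributes(0b1);
    // Assumed bit; only `None` and `Italic` are visible in the diff.
    const Underlined: Attributes = Attributes(0b10);

    fn is(self, attr: Attributes) -> bool {
        (self.0 & attr.0) == attr.0
    }
}

impl BitOr for Attributes {
    type Output = Attributes;
    fn bitor(self, rhs: Attributes) -> Attributes {
        Attributes(self.0 | rhs.0)
    }
}

impl BitXor for Attributes {
    type Output = Attributes;
    fn bitxor(self, rhs: Attributes) -> Attributes {
        Attributes(self.0 ^ rhs.0)
    }
}

/// Hypothetical helper: SGR parameters that switch from `prev` to `next`.
/// XOR leaves only the changed bits, so unchanged attributes emit nothing.
fn sgr_diff(prev: Attributes, next: Attributes) -> Vec<u8> {
    let changed = prev ^ next;
    let mut params = Vec::new();
    if changed.is(Attributes::Italic) {
        // SGR 3 = italic on, SGR 23 = italic off.
        params.push(if next.is(Attributes::Italic) { 3 } else { 23 });
    }
    if changed.is(Attributes::Underlined) {
        // SGR 4 = underline on, SGR 24 = underline off.
        params.push(if next.is(Attributes::Underlined) { 4 } else { 24 });
    }
    params
}
```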
+ use std::alloc::Allocator; use std::cmp::Ordering; use std::io::Read; @@ -15,11 +17,17 @@ pub const KIBI: usize = 1024; pub const MEBI: usize = 1024 * 1024; pub const GIBI: usize = 1024 * 1024 * 1024; +/// A viewport coordinate type used throughout the application. pub type CoordType = i32; +/// To avoid overflow issues because you're adding two [`CoordType::MAX`] values together, +/// you can use [`COORD_TYPE_SAFE_MIN`] and [`COORD_TYPE_SAFE_MAX`]. pub const COORD_TYPE_SAFE_MAX: CoordType = 32767; + +/// See [`COORD_TYPE_SAFE_MAX`]. pub const COORD_TYPE_SAFE_MIN: CoordType = -32767 - 1; +/// A 2D point. Uses [`CoordType`]. #[derive(Default, Debug, Clone, Copy, PartialEq, Eq)] pub struct Point { pub x: CoordType, @@ -46,6 +54,7 @@ impl Ord for Point { } } +/// A 2D size. Uses [`CoordType`]. #[derive(Default, Debug, Clone, Copy, PartialEq, Eq)] pub struct Size { pub width: CoordType, @@ -58,6 +67,7 @@ impl Size { } } +/// A 2D rectangle. Uses [`CoordType`]. #[derive(Default, Debug, Clone, Copy, PartialEq, Eq)] pub struct Rect { pub left: CoordType, @@ -67,34 +77,44 @@ pub struct Rect { } impl Rect { + /// Mimics CSS's `padding` property where `padding: a` is `a a a a`. pub fn one(value: CoordType) -> Self { Self { left: value, top: value, right: value, bottom: value } } + /// Mimics CSS's `padding` property where `padding: a b` is `a b a b`, + /// and `a` is top/bottom and `b` is left/right. pub fn two(top_bottom: CoordType, left_right: CoordType) -> Self { Self { left: left_right, top: top_bottom, right: left_right, bottom: top_bottom } } + /// Mimics CSS's `padding` property where `padding: a b c` is `a b c b`, + /// and `a` is top, `b` is left/right, and `c` is bottom. pub fn three(top: CoordType, left_right: CoordType, bottom: CoordType) -> Self { Self { left: left_right, top, right: left_right, bottom } } + /// Is the rectangle empty? pub fn is_empty(&self) -> bool { self.left >= self.right || self.top >= self.bottom } + /// Width of the rectangle. 
pub fn width(&self) -> CoordType { self.right - self.left } + /// Height of the rectangle. pub fn height(&self) -> CoordType { self.bottom - self.top } + /// Check if it contains a point. pub fn contains(&self, point: Point) -> bool { point.x >= self.left && point.x < self.right && point.y >= self.top && point.y < self.bottom } + /// Intersect two rectangles. pub fn intersect(&self, rhs: Self) -> Self { let l = self.left.max(rhs.left); let t = self.top.max(rhs.top); @@ -110,7 +130,7 @@ impl Rect { } } -/// `std::cmp::minmax` is unstable, as per usual. +/// [`std::cmp::minmax`] is unstable, as per usual. pub fn minmax<T>(v1: T, v2: T) -> [T; 2] where T: Ord, @@ -145,12 +165,16 @@ pub const unsafe fn str_from_raw_parts<'a>(ptr: *const u8, len: usize) -> &'a st unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) } } +/// [`<[T]>::copy_from_slice`] panics if the two slices have different lengths. +/// This one just returns the copied amount. pub fn slice_copy_safe(dst: &mut [T], src: &[T]) -> usize { let len = src.len().min(dst.len()); unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), len) }; len } +/// [`Vec::splice`] results in really bad assembly. +/// This doesn't. Don't use [`Vec::splice`]. pub trait ReplaceRange { fn replace_range>(&mut self, range: R, src: &[T]); } @@ -205,6 +229,7 @@ fn vec_replace_impl(dst: &mut Vec, range: Range`] buffers. pub fn file_read_uninit( file: &mut T, buf: &mut [MaybeUninit], @@ -216,11 +241,13 @@ pub fn file_read_uninit( } } +/// Turns a [`&[T]`] into a [`&[MaybeUninit<T>]`]. #[inline(always)] pub const fn slice_as_uninit_ref<T>(slice: &[T]) -> &[MaybeUninit<T>] { unsafe { slice::from_raw_parts(slice.as_ptr() as *const MaybeUninit<T>, slice.len()) } } +/// Turns a [`&mut [T]`] into a [`&mut [MaybeUninit<T>]`].
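`Rect::one`, `two` and `three` follow the CSS `padding` shorthand, so the arguments go top, then left/right, then bottom. The self-contained sketch below covers those constructors plus `intersect`. The diff shows `intersect` only as far as computing `left` and `top` with `max`. Taking the `min` of `right`/`bottom`, and clamping them so that disjoint rectangles yield an empty result rather than an inverted one, are assumptions:

```rust
type CoordType = i32;

#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)]
struct Rect {
    left: CoordType,
    top: CoordType,
    right: CoordType,
    bottom: CoordType,
}

impl Rect {
    /// `padding: a` -> all four sides.
    fn one(value: CoordType) -> Self {
        Self { left: value, top: value, right: value, bottom: value }
    }

    /// `padding: a b` -> `a` top/bottom, `b` left/right.
    fn two(top_bottom: CoordType, left_right: CoordType) -> Self {
        Self { left: left_right, top: top_bottom, right: left_right, bottom: top_bottom }
    }

    /// `padding: a b c` -> `a` top, `b` left/right, `c` bottom.
    fn three(top: CoordType, left_right: CoordType, bottom: CoordType) -> Self {
        Self { left: left_right, top, right: left_right, bottom }
    }

    fn is_empty(&self) -> bool {
        self.left >= self.right || self.top >= self.bottom
    }

    /// Assumed completion: `right`/`bottom` take the `min`, then get clamped so
    /// that disjoint inputs give an empty rectangle instead of an inverted one.
    fn intersect(&self, rhs: Self) -> Self {
        let l = self.left.max(rhs.left);
        let t = self.top.max(rhs.top);
        let r = self.right.min(rhs.right).max(l);
        let b = self.bottom.min(rhs.bottom).max(t);
        Self { left: l, top: t, right: r, bottom: b }
    }
}
```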
#[inline(always)] pub const fn slice_as_uninit_mut(slice: &mut [T]) -> &mut [MaybeUninit] { unsafe { slice::from_raw_parts_mut(slice.as_mut_ptr() as *mut MaybeUninit, slice.len()) } diff --git a/src/icu.rs b/src/icu.rs index 74423a4..bf3b03b 100644 --- a/src/icu.rs +++ b/src/icu.rs @@ -1,3 +1,5 @@ +//! Bindings to the ICU library. + use std::cmp::Ordering; use std::ffi::CStr; use std::mem; @@ -13,6 +15,7 @@ use crate::{apperr, arena_format, sys}; static mut ENCODINGS: Vec<&'static str> = Vec::new(); +/// Returns a list of encodings ICU supports. pub fn get_available_encodings() -> &'static [&'static str] { // OnceCell for people that want to put it into a static. #[allow(static_mut_refs)] @@ -38,6 +41,7 @@ pub fn get_available_encodings() -> &'static [&'static str] { } } +/// Formats the given ICU error code into a human-readable string. pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Result { fn format(code: u32) -> &'static str { let Ok(f) = init_if_needed() else { @@ -62,6 +66,7 @@ pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Re } } +/// Converts between two encodings using ICU. pub struct Converter<'pivot> { source: *mut icu_ffi::UConverter, target: *mut icu_ffi::UConverter, @@ -80,6 +85,14 @@ impl Drop for Converter<'_> { } impl<'pivot> Converter<'pivot> { + /// Constructs a new `Converter` instance. + /// + /// # Parameters + /// + /// * `pivot_buffer`: A buffer used to cache partial conversions. + /// Don't make it too small. + /// * `source_encoding`: The source encoding name (e.g., "UTF-8"). + /// * `target_encoding`: The target encoding name (e.g., "UTF-16"). pub fn new( pivot_buffer: &'pivot mut [MaybeUninit], source_encoding: &str, @@ -114,6 +127,20 @@ impl<'pivot> Converter<'pivot> { arena_format!(arena, "{}\0", input) } + /// Performs one step of the encoding conversion. + /// + /// # Parameters + /// + /// * `input`: The input buffer to convert from. 
+ /// It should be in the `source_encoding` that was previously specified. + /// * `output`: The output buffer to convert to. + /// It should be in the `target_encoding` that was previously specified. + /// + /// # Returns + /// + /// A tuple containing: + /// 1. The number of bytes read from the input buffer. + /// 2. The number of bytes written to the output buffer. pub fn convert( &mut self, input: &[u8], @@ -168,24 +195,26 @@ impl<'pivot> Converter<'pivot> { // I picked 64 because it seemed like a reasonable lower bound. const CACHE_SIZE: usize = 64; -// Caches a chunk of TextBuffer contents (UTF-8) in UTF-16 format. +/// Caches a chunk of TextBuffer contents (UTF-8) in UTF-16 format. struct Cache { - /// The translated text. Contains `len`-many valid items. + /// The translated text. Contains [`Cache::utf16_len`]-many valid items. utf16: [u16; CACHE_SIZE], - /// For each character in `utf16` this stores the offset in the `TextBuffer`, + /// For each character in [`Cache::utf16`] this stores the offset in the [`TextBuffer`], /// relative to the start offset stored in `native_beg`. - /// This has the same length as `utf16`. + /// This has the same length as [`Cache::utf16`]. utf16_to_utf8_offsets: [u16; CACHE_SIZE], - /// `utf8_to_utf16_offsets[native_offset - native_beg]` will tell you which character - /// in `utf16` maps to the given `native_offset` in the underlying `TextBuffer`. + /// `utf8_to_utf16_offsets[native_offset - native_beg]` will tell you which character in + /// [`Cache::utf16`] maps to the given `native_offset` in the underlying [`TextBuffer`]. /// Contains `native_end - native_beg`-many valid items. utf8_to_utf16_offsets: [u16; CACHE_SIZE], - /// The number of valid items in `utf16`. + /// The number of valid items in [`Cache::utf16`]. utf16_len: usize, + /// Offset of the first non-ASCII character. + /// Less than or equal to [`Cache::utf16_len`]. 
native_indexing_limit: usize, - /// The range of UTF-8 text in the `TextBuffer` that this chunk covers. + /// The range of UTF-8 text in the [`TextBuffer`] that this chunk covers. utf8_range: Range, } @@ -195,9 +224,15 @@ struct DoubleCache { mru: bool, } -// I initially did this properly with a PhantomData marker for the TextBuffer lifetime, -// but it was a pain so now I don't. Not a big deal - its only use is in a self-referential -// struct in TextBuffer which Rust can't deal with anyway. +/// A wrapper around ICU's `UText` struct. +/// +/// In our case its only purpose is to adapt a [`TextBuffer`] for ICU. +/// +/// # Safety +/// +/// Warning! No lifetime tracking is done here. +/// I initially did it properly with a PhantomData marker for the TextBuffer +/// lifetime, but it was a pain so now I don't. Not a big deal in our case. pub struct Text(&'static mut icu_ffi::UText); impl Drop for Text { @@ -208,11 +243,12 @@ impl Drop for Text { } impl Text { - /// Constructs an ICU `UText` instance from a `TextBuffer`. + /// Constructs an ICU `UText` instance from a [`TextBuffer`]. /// /// # Safety /// - /// The caller must ensure that the given `TextBuffer` outlives the returned `Text` instance. + /// The caller must ensure that the given [`TextBuffer`] + /// outlives the returned `Text` instance. pub unsafe fn new(tb: &TextBuffer) -> apperr::Result { let f = init_if_needed()?; @@ -349,12 +385,16 @@ fn utext_access_impl<'a>( let dirty = ut.a != tb.generation() as i64; if dirty { + // The text buffer contents have changed. + // Invalidate both caches so that future calls don't mistakenly use them + // when they enter the for loop in the else branch below (`dirty == false`). double_cache.cache[0].utf16_len = 0; double_cache.cache[1].utf16_len = 0; double_cache.cache[0].utf8_range = 0..0; double_cache.cache[1].utf8_range = 0..0; ut.a = tb.generation() as i64; } else { + // Check if one of the caches already contains the requested range. 
for (i, cache) in double_cache.cache.iter_mut().enumerate() { if cache.utf8_range.contains(&index_contained) { double_cache.mru = i != 0; @@ -443,13 +483,12 @@ fn utext_access_impl<'a>( } } - // TODO: This loop is the slow part of our uregex search. May be worth optimizing. loop { let Some(c) = it.next() else { break; }; - // Thanks to our `if utf16_len >= utf16_limit` check, + // Thanks to our `if utf16_len >= UTF16_LEN_LIMIT` check, // we can safely assume that this will fit. unsafe { let utf8_len_beg = utf8_len; @@ -515,7 +554,11 @@ extern "C" fn utext_map_native_index_to_utf16(ut: &icu_ffi::UText, native_index: off_rel as i32 } -// Same reason here for not using a PhantomData marker as with `Text`. +/// A wrapper around ICU's `URegularExpression` struct. +/// +/// # Safety +/// +/// Warning! No lifetime tracking is done here. pub struct Regex(&'static mut icu_ffi::URegularExpression); impl Drop for Regex { @@ -526,8 +569,14 @@ impl Drop for Regex { } impl Regex { + /// Enable case-insensitive matching. pub const CASE_INSENSITIVE: i32 = icu_ffi::UREGEX_CASE_INSENSITIVE; + + /// If set, ^ and $ match the start and end of each line. + /// Otherwise, they match the start and end of the entire string. pub const MULTILINE: i32 = icu_ffi::UREGEX_MULTILINE; + + /// Treat the given pattern as a literal string. pub const LITERAL: i32 = icu_ffi::UREGEX_LITERAL; /// Constructs a regex, plain and simple. Read `uregex_open` docs. @@ -566,7 +615,7 @@ impl Regex { } /// Updates the regex pattern with the given text. - /// If the text contents have changed, you can pass the same text as you usued + /// If the text contents have changed, you can pass the same text as you used /// initially and it'll trigger ICU to reload the text and invalidate its caches. /// /// # Safety @@ -578,6 +627,7 @@ impl Regex { unsafe { (f.uregex_setUText)(self.0, text.0 as *const _ as *mut _, &mut status) }; } + /// Sets the regex to the absolute offset in the underlying text. 
pub fn reset(&mut self, index: usize) { let f = assume_loaded(); let mut status = icu_ffi::U_ZERO_ERROR; @@ -611,6 +661,7 @@ impl Iterator for Regex { static mut ROOT_COLLATOR: Option<*mut icu_ffi::UCollator> = None; +/// Compares two UTF-8 strings for sorting using ICU's collation algorithm. pub fn compare_strings(a: &[u8], b: &[u8]) -> Ordering { // OnceCell for people that want to put it into a static. #[allow(static_mut_refs)] @@ -688,6 +739,10 @@ fn compare_strings_ascii(a: &[u8], b: &[u8]) -> Ordering { static mut ROOT_CASEMAP: Option<*mut icu_ffi::UCaseMap> = None; +/// Applies Unicode case folding to the given UTF-8 string. +/// +/// Case folding differs from lowercasing in that the output is primarily useful +/// to machines for comparisons. It's like applying Unicode normalization. pub fn fold_case<'a>(arena: &'a Arena, input: &str) -> ArenaString<'a> { // OnceCell for people that want to put it into a static. #[allow(static_mut_refs)] diff --git a/src/input.rs b/src/input.rs index 7f4a3bb..204eb2e 100644 --- a/src/input.rs +++ b/src/input.rs @@ -1,10 +1,17 @@ +//! Parses VT sequences into input events. +//! +//! In the future this allows us to take apart the application and +//! support input schemes that aren't VT, such as UEFI or a GUI. + use crate::helpers::{CoordType, Point, Size}; use crate::vt; -// TODO: Is this a good idea? I did it to allow typing `kbmod::CTRL | vk::A`. -// The reason it's an awkard u32 and not a struct is to hopefully make ABIs easier later. -// Of course you could just translate on the ABI boundary, but my hope is that this -// design lets me realize some restrictions early on that I can't foresee yet. +/// Represents a key/modifier combination. +/// +/// TODO: Is this a good idea? I did it to allow typing `kbmod::CTRL | vk::A`. +/// The reason it's an awkward u32 and not a struct is to hopefully make ABIs easier later.
+/// Of course you could just translate on the ABI boundary, but my hope is that this +/// design lets me realize some restrictions early on that I can't foresee yet. #[repr(transparent)] #[derive(Clone, Copy, PartialEq, Eq)] pub struct InputKey(u32); @@ -47,6 +54,7 @@ impl InputKey { } } +/// A keyboard modifier. Ctrl/Alt/Shift. #[repr(transparent)] #[derive(Clone, Copy, PartialEq, Eq)] pub struct InputKeyMod(u32); @@ -83,8 +91,10 @@ impl std::ops::BitOrAssign for InputKeyMod { } } -// The codes defined here match the VK_* constants on Windows. -// It's a convenient way to handle keyboard input, even on other platforms. +/// Keyboard keys. +/// +/// The codes defined here match the VK_* constants on Windows. +/// It's a convenient way to handle keyboard input, even on other platforms. pub mod vk { use super::InputKey; @@ -189,6 +199,7 @@ pub mod vk { pub const F24: InputKey = InputKey::new(0x87); } +/// Keyboard modifiers. pub mod kbmod { use super::InputKeyMod; @@ -203,12 +214,17 @@ pub mod kbmod { pub const CTRL_ALT_SHIFT: InputKeyMod = InputKeyMod::new(0x07000000); } +/// Text input. +/// +/// "Keyboard" input is also "text" input and vice versa. +/// It differs in that text input can also be Unicode. #[derive(Clone, Copy)] pub struct InputText<'a> { pub text: &'a str, pub bracketed: bool, } +/// Mouse input state. Up/Down, Left/Right, etc. #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)] pub enum InputMouseState { #[default] @@ -224,21 +240,34 @@ pub enum InputMouseState { Scroll, } +/// Mouse input. #[derive(Clone, Copy)] pub struct InputMouse { + /// The state of the mouse. Up/Down, Left/Right, etc. pub state: InputMouseState, + /// Any keyboard modifiers that are held down. pub modifiers: InputKeyMod, + /// Position of the mouse in the viewport. pub position: Point, + /// Scroll delta. pub scroll: Point, } +/// Primary result type of the parser. pub enum Input<'input> { + /// Window resize event. Resize(Size), + /// Text input.
+ /// + /// Note that [`Input::Keyboard`] events can also be text. Text(InputText<'input>), + /// Keyboard input. Keyboard(InputKey), + /// Mouse input. Mouse(InputMouse), } +/// Parses VT sequences into input events. pub struct Parser { bracketed_paste: bool, x10_mouse_want: bool, @@ -247,6 +276,9 @@ pub struct Parser { } impl Parser { + /// Creates a new parser that turns VT sequences into input events. + /// + /// Keep the instance alive for the lifetime of the input stream. pub fn new() -> Self { Self { bracketed_paste: false, @@ -256,7 +288,8 @@ impl Parser { } } - /// Turns VT sequences into keyboard, mouse, etc., inputs. + /// Takes a [`vt::Stream`] and returns a [`Stream`] + /// that turns VT sequences into input events. pub fn parse<'parser, 'vt, 'input>( &'parser mut self, stream: vt::Stream<'vt, 'input>, @@ -265,15 +298,15 @@ impl Parser { } } +/// An iterator-like stream that parses VT sequences into input events. +/// +/// Can't implement [`Iterator`], because this is a "lending iterator". pub struct Stream<'parser, 'vt, 'input> { parser: &'parser mut Parser, stream: vt::Stream<'vt, 'input>, } impl<'input> Stream<'_, '_, 'input> { - /// Parses the next input action from the previously given input. - /// - /// Can't implement Iterator, because this is a "lending iterator". #[allow(clippy::should_implement_trait)] pub fn next(&mut self) -> Option<Input<'input>> { loop { @@ -446,6 +479,17 @@ impl<'input> Stream<'_, '_, 'input> { } } + /// Once we encounter the start of a bracketed paste, + /// this function seeks to the end of the paste. + /// + /// A bracketed paste is basically: + /// ```text + /// <ESC>[200~ lots of text <ESC>[201~ + /// ``` + /// + /// The text in between is expected to be taken literally. + /// It can be anything though, including other escape sequences. + /// This is the reason why this is a separate method.
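A minimal sketch of what seeking to the end of a bracketed paste involves, assuming the xterm markers `ESC [200~` (start) and `ESC [201~` (end) and an input buffer that already contains the whole paste. `split_bracketed_paste` is a hypothetical helper; the real `handle_bracketed_paste` works on the `vt::Stream` and also has to cope with a paste that is split across several reads.

```rust
/// Start and end markers of a bracketed paste (xterm convention).
const PASTE_START: &[u8] = b"\x1b[200~";
const PASTE_END: &[u8] = b"\x1b[201~";

/// Hypothetical helper: given the input *after* the start marker, returns
/// the literal pasted bytes and the number of bytes consumed (including
/// the end marker). Returns `None` if the end marker hasn't arrived yet,
/// in which case a real parser would have to wait for more input.
fn split_bracketed_paste(input: &[u8]) -> Option<(&[u8], usize)> {
    // Everything up to the end marker is taken literally, even other
    // escape sequences, so the only thing we search for is PASTE_END.
    let end = input
        .windows(PASTE_END.len())
        .position(|w| w == PASTE_END)?;
    Some((&input[..end], end + PASTE_END.len()))
}
```

Note how the embedded `ESC [1m` in a pasted string would be returned verbatim instead of being interpreted, which is exactly why the paste body must bypass the regular VT dispatch.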
#[cold] fn handle_bracketed_paste(&mut self) -> Option<Input<'input>> { let beg = self.stream.offset(); diff --git a/src/oklab.rs b/src/oklab.rs index c512c1c..9759cce 100644 --- a/src/oklab.rs +++ b/src/oklab.rs @@ -1,7 +1,10 @@ -//! This module implements Oklab as defined at: https://bottosson.github.io/posts/oklab/ +//! Oklab color space conversions. +//! +//! Implements Oklab as defined at: <https://bottosson.github.io/posts/oklab/> #![allow(clippy::excessive_precision)] +/// An Oklab color with alpha. pub struct Lab { pub l: f32, pub a: f32, @@ -9,6 +12,7 @@ pub struct Lab { pub alpha: f32, } +/// Converts a 32-bit sRGB color to Oklab. pub fn srgb_to_oklab(color: u32) -> Lab { let r = SRGB_TO_RGB_LUT[(color & 0xff) as usize]; let g = SRGB_TO_RGB_LUT[((color >> 8) & 0xff) as usize]; @@ -31,6 +35,7 @@ pub fn srgb_to_oklab(color: u32) -> Lab { } } +/// Converts an Oklab color to a 32-bit sRGB color. pub fn oklab_to_srgb(c: Lab) -> u32 { let l_ = c.l + 0.3963377774 * c.a + 0.2158037573 * c.b; let m_ = c.l - 0.1055613458 * c.a - 0.0638541728 * c.b; @@ -57,6 +62,7 @@ pub fn oklab_to_srgb(c: Lab) -> u32 { r | (g << 8) | (b << 16) | (a << 24) } +/// Blends two 32-bit sRGB colors in the Oklab color space. pub fn oklab_blend(dst: u32, src: u32) -> u32 { let dst = srgb_to_oklab(dst); let src = srgb_to_oklab(src); diff --git a/src/path.rs b/src/path.rs index f6df958..1b49b2e 100644 --- a/src/path.rs +++ b/src/path.rs @@ -1,3 +1,5 @@ +//! Path-related helpers. + use std::ffi::OsStr; use std::path::{Component, MAIN_SEPARATOR_STR, Path, PathBuf}; diff --git a/src/simd/memchr2.rs b/src/simd/memchr2.rs index 4ef2086..9b8ff10 100644 --- a/src/simd/memchr2.rs +++ b/src/simd/memchr2.rs @@ -1,13 +1,13 @@ -//! Rust has a very popular `memchr` crate. It's quite fast, so you may ask yourself -//! why we don't just use it: Simply put, this is optimized for short inputs. +//! `memchr`, but with two needles. use std::ptr; use super::distance; -/// memchr(), but with two needles.
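For reference, the sRGB to Oklab direction can be sketched like this, using the matrices published in Björn Ottosson's Oklab post. It assumes the same channel packing as the diff (red in the low byte, alpha in the high byte) and evaluates the sRGB transfer function directly where the diff uses `SRGB_TO_RGB_LUT`; treat it as an illustration, not a copy of the module.

```rust
/// An Oklab color with alpha, mirroring the `Lab` struct in the diff.
#[derive(Debug, Clone, Copy)]
pub struct Lab {
    pub l: f32,
    pub a: f32,
    pub b: f32,
    pub alpha: f32,
}

/// sRGB transfer function decode: 8-bit channel -> linear [0, 1].
/// (The diff uses a precomputed lookup table instead.)
fn srgb_to_linear(c: u8) -> f32 {
    let c = c as f32 / 255.0;
    if c <= 0.04045 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// Sketch of sRGB -> Oklab. Assumed layout: R in bits 0-7, G in 8-15,
/// B in 16-23, alpha in 24-31 (as in the diff's `oklab_to_srgb`).
pub fn srgb_to_oklab(color: u32) -> Lab {
    let r = srgb_to_linear((color & 0xff) as u8);
    let g = srgb_to_linear(((color >> 8) & 0xff) as u8);
    let b = srgb_to_linear(((color >> 16) & 0xff) as u8);
    let alpha = (color >> 24) as f32 / 255.0;

    // Linear sRGB -> approximate cone responses (LMS).
    let l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b;
    let m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b;
    let s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b;

    // Cube-root non-linearity.
    let (l_, m_, s_) = (l.cbrt(), m.cbrt(), s.cbrt());

    // LMS' -> Lab.
    Lab {
        l: 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_,
        a: 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_,
        b: 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_,
        alpha,
    }
}
```

Opaque white lands at roughly `L = 1, a = 0, b = 0` and black at `L = 0`, which is a quick sanity check for any implementation of these matrices.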
-/// Returns the index of the first occurrence of either needle in the `haystack`. -/// If no needle is found, `haystack.len()` is returned. +/// `memchr`, but with two needles. +/// +/// Returns the index of the first occurrence of either needle in the +/// `haystack`. If no needle is found, `haystack.len()` is returned. /// `offset` specifies the index to start searching from. pub fn memchr2(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> usize { unsafe { diff --git a/src/simd/memrchr2.rs b/src/simd/memrchr2.rs index dd7be9a..c5b7bf8 100644 --- a/src/simd/memrchr2.rs +++ b/src/simd/memrchr2.rs @@ -1,16 +1,15 @@ -//! Rust has a very popular `memchr` crate. It's quite fast, so you may ask yourself -//! why we don't just use it: Simply put, this is optimized for short inputs. +//! `memrchr`, but with two needles. use std::ptr; use super::distance; -/// Same as `memchr2`, but searches from the end of the haystack. -/// If no needle is found, 0 is returned. +/// `memrchr`, but with two needles: searches backward through the haystack. /// -/// *NOTE: Unlike `memchr2` (or `memrchr`), an offset PAST the hit is returned.* -/// This is because this function is primarily used for `unicode::newlines_backward`, -/// which needs exactly that. +/// If no needle is found, 0 is returned. +/// Unlike `memchr2` (or `memrchr`), an offset PAST the hit is returned. +/// This is because this function is primarily used for +/// `ucd::newlines_backward`, which needs exactly that. pub fn memrchr2(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> Option<usize> { unsafe { let beg = haystack.as_ptr(); diff --git a/src/simd/memset.rs b/src/simd/memset.rs index c89855d..7748440 100644 --- a/src/simd/memset.rs +++ b/src/simd/memset.rs @@ -1,21 +1,25 @@ -//! This module provides a `memset` function for "arbitrary" sizes (1/2/4/8 bytes), as the regular `memset` -//! is only implemented for byte-sized arrays. This allows us to more aggressively unroll loops and to -//!
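The `memchr2` contract documented above (first hit at or after `offset`, otherwise `haystack.len()`) can be written as a plain scalar loop. This is only a readability sketch: `memchr2_scalar` is a hypothetical name, clamping an out-of-range `offset` is an assumption (the diff doesn't show what the real function does there), and it has none of the SIMD speed.

```rust
/// Scalar reference for the documented `memchr2` contract: returns the
/// index of the first byte at or after `offset` that equals either
/// needle, or `haystack.len()` if there is none.
pub fn memchr2_scalar(needle1: u8, needle2: u8, haystack: &[u8], offset: usize) -> usize {
    // Clamp so that an out-of-range offset simply yields "not found".
    let start = offset.min(haystack.len());
    haystack[start..]
        .iter()
        .position(|&b| b == needle1 || b == needle2)
        // `position` is relative to `start`; translate it back.
        .map_or(haystack.len(), |i| start + i)
}
```

Returning `haystack.len()` rather than an `Option` lets callers use the result directly as the end of a line slice, e.g. searching for `b'\n'` and `b'\r'` at once.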
use AVX2 on x64 for the non-byte-sized cases and opens the door to compiling with `-Copt-level=s`. +//! `memset` for arbitrary sizes (1/2/4/8 bytes). //! -//! This implementation uses SWAR to only have a single implementation for all 4 sizes: By duplicating smaller -//! types into a larger `u64` register we can treat all sizes as if they were `u64`. The only thing we need -//! to take care of then, is the tail end of the array, where we need to write 0-7 additional bytes. +//! Clang calls the C `memset` function only for byte-sized types (or 0 fills). +//! However, we need to fill other types as well. For those, clang generates +//! SIMD loops under higher optimization levels. With `-Os`, however, it only +//! generates a trivial loop, which is too slow for our needs. +//! +//! This implementation uses SWAR to only have a single implementation for all +//! 4 sizes: By duplicating smaller types into a larger `u64` register we can +//! treat all sizes as if they were `u64`. The only thing we need to take care +//! of is the tail end of the array, where 0-7 additional bytes must be written. use std::mem; use super::distance; -/// A trait to mark types that are safe to use with `memset`. +/// A marker trait for types that are safe to `memset`. /// /// # Safety /// /// Just like with C's `memset`, bad things happen -/// if you use this with types that are non-trivial. +/// if you use this with non-trivial types. pub unsafe trait MemsetSafe: Copy {} unsafe impl MemsetSafe for u8 {} @@ -30,6 +34,7 @@ unsafe impl MemsetSafe for i32 {} unsafe impl MemsetSafe for i64 {} unsafe impl MemsetSafe for isize {} +/// Fills a slice with the given value. #[inline] pub fn memset<T: MemsetSafe>(dst: &mut [T], val: T) { unsafe { diff --git a/src/simd/mod.rs b/src/simd/mod.rs index 2d81740..a114135 100644 --- a/src/simd/mod.rs +++ b/src/simd/mod.rs @@ -1,3 +1,5 @@ +//! Provides various high-throughput utilities.
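The SWAR approach described in the `memset` module docs comes down to two steps: splat the value across a `u64`, then store 8 bytes at a time and finish with a 0-7 byte tail. The safe-Rust sketch below illustrates only that idea; `splat_*`, `fill_with_pattern`, and `memset_u16` are hypothetical helpers, while the crate itself uses one generic `memset<T: MemsetSafe>` with SIMD paths.

```rust
/// Broadcasts a 1-, 2- or 4-byte value into every lane of a `u64`.
/// Multiplying by a constant with a 1 in each lane copies the value into
/// every lane; no lane overflows into the next.
fn splat_u8(v: u8) -> u64 {
    (v as u64) * 0x0101_0101_0101_0101
}
fn splat_u16(v: u16) -> u64 {
    (v as u64) * 0x0001_0001_0001_0001
}
fn splat_u32(v: u32) -> u64 {
    (v as u64) * 0x0000_0001_0000_0001
}

/// Writes `pattern` 8 bytes at a time, then handles the 0-7 byte tail.
/// Since `pattern` repeats every element-size bytes and `dst.len()` is a
/// multiple of the element size, the tail always covers whole elements.
fn fill_with_pattern(dst: &mut [u8], pattern: u64) {
    // Native byte order, so the lanes line up with how `T` is stored.
    let bytes = pattern.to_ne_bytes();
    let mut chunks = dst.chunks_exact_mut(8);
    for chunk in &mut chunks {
        chunk.copy_from_slice(&bytes);
    }
    let tail = chunks.into_remainder();
    let n = tail.len();
    tail.copy_from_slice(&bytes[..n]);
}

/// Hypothetical typed wrapper for `u16`.
fn memset_u16(dst: &mut [u16], val: u16) {
    // SAFETY: `u16` has no padding or invalid bit patterns, and a `u8`
    // view of the same memory has alignment 1 and exactly twice the length.
    let bytes = unsafe {
        std::slice::from_raw_parts_mut(dst.as_mut_ptr() as *mut u8, dst.len() * 2)
    };
    fill_with_pattern(bytes, splat_u16(val));
}
```

With 7 `u16` values (14 bytes), the loop performs one 8-byte store and the tail copies the first 6 bytes of the pattern, i.e. exactly three more elements.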
+ mod memchr2; mod memrchr2; mod memset; diff --git a/src/sys/mod.rs b/src/sys/mod.rs index 3e4d7b4..df204a8 100644 --- a/src/sys/mod.rs +++ b/src/sys/mod.rs @@ -1,3 +1,5 @@ +//! Platform abstractions. + use std::fs::File; use std::path::Path; diff --git a/src/sys/unix.rs b/src/sys/unix.rs index a10855e..966574e 100644 --- a/src/sys/unix.rs +++ b/src/sys/unix.rs @@ -1,3 +1,8 @@ +//! Unix-specific platform code. +//! +//! Read the `windows` module for reference. +//! TODO: This reminds me that the sys API should probably be a trait. + use std::ffi::{CStr, c_int, c_void}; use std::fs::{self, File}; use std::mem::{self, MaybeUninit}; diff --git a/src/sys/windows.rs b/src/sys/windows.rs index 92aba14..4a1bae5 100644 --- a/src/sys/windows.rs +++ b/src/sys/windows.rs @@ -73,6 +73,7 @@ extern "system" fn console_ctrl_handler(_ctrl_type: u32) -> Foundation::BOOL { 1 } +/// Initializes the platform-specific state. pub fn init() -> apperr::Result { unsafe { // Get the stdin and stdout handles first, so that if this function fails, @@ -151,6 +152,7 @@ impl Drop for Deinit { } } +/// Switches the terminal into raw mode, etc. pub fn switch_modes() -> apperr::Result<()> { unsafe { check_bool_return(Console::SetConsoleCtrlHandler(Some(console_ctrl_handler), 1))?; @@ -180,6 +182,10 @@ pub fn switch_modes() -> apperr::Result<()> { } } +/// During startup we need to get the window size from the terminal. +/// Because I didn't want to type a bunch of code, this function tells +/// [`read_stdin`] to inject a fake sequence, which gets picked up by +/// the input parser and provided to the TUI code. pub fn inject_window_size_into_stdin() { unsafe { STATE.inject_resize = true; @@ -202,9 +208,11 @@ fn get_console_size() -> Option { /// Reads from stdin. /// -/// Returns `None` if there was an error reading from stdin. -/// Returns `Some("")` if the given timeout was reached. -/// Otherwise, it returns the read, non-empty string. 
+/// # Returns +/// +/// * `None` if there was an error reading from stdin. +/// * `Some("")` if the given timeout was reached. +/// * Otherwise, it returns the read, non-empty string. pub fn read_stdin(arena: &Arena, mut timeout: time::Duration) -> Option> { let scratch = scratch_arena(Some(arena)); @@ -351,6 +359,10 @@ pub fn read_stdin(arena: &Arena, mut timeout: time::Duration) -> Option Option { unsafe { let handle = Console::GetStdHandle(Console::STD_INPUT_HANDLE); @@ -376,12 +394,14 @@ pub fn open_stdin_if_redirected() -> Option { } } +/// A unique identifier for a file. #[derive(Clone)] #[repr(transparent)] pub struct FileId(FileSystem::FILE_ID_INFO); impl PartialEq for FileId { fn eq(&self, other: &Self) -> bool { + // Lowers to an efficient word-wise comparison. const SIZE: usize = std::mem::size_of::(); let a: &[u8; SIZE] = unsafe { mem::transmute(&self.0) }; let b: &[u8; SIZE] = unsafe { mem::transmute(&other.0) }; @@ -405,6 +425,10 @@ pub fn file_id(file: &File) -> apperr::Result { } } +/// Canonicalizes the given path. +/// +/// This differs from [`fs::canonicalize`] in that it strips the `\\?\` UNC +/// prefix on Windows. This is because it's confusing/ugly when displaying it. pub fn canonicalize(path: &Path) -> std::io::Result { let mut path = fs::canonicalize(path)?; let path = path.as_mut_os_string(); @@ -421,8 +445,8 @@ pub fn canonicalize(path: &Path) -> std::io::Result { } /// Reserves a virtual memory region of the given size. -/// To commit the memory, use `virtual_commit`. -/// To release the memory, use `virtual_release`. +/// To commit the memory, use [`virtual_commit`]. +/// To release the memory, use [`virtual_release`]. /// /// # Safety /// @@ -456,7 +480,7 @@ pub unsafe fn virtual_reserve(size: usize) -> apperr::Result> { /// # Safety /// /// This function is unsafe because it uses raw pointers. -/// Make sure to only pass pointers acquired from `virtual_reserve`. +/// Make sure to only pass pointers acquired from [`virtual_reserve`]. 
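Stripping the `\\?\` prefix needs some care, because `\\?\UNC\server\share` cannot simply lose its first four characters. A hypothetical `&str`-based sketch follows. The diff's `canonicalize` operates on the `OsString` returned by `fs::canonicalize`, and its exact condition isn't visible here, so the drive-letter check is an assumption.

```rust
/// Hypothetical sketch: strips the `\\?\` verbatim prefix that
/// `std::fs::canonicalize` produces on Windows, but only for plain
/// drive-letter paths. `\\?\UNC\server\share` is left untouched, because
/// blindly stripping 4 bytes would yield the wrong path `UNC\server\share`.
fn strip_verbatim_prefix(path: &str) -> &str {
    if let Some(rest) = path.strip_prefix(r"\\?\") {
        let b = rest.as_bytes();
        // Expect "X:" (drive letter + colon) right after the prefix.
        if b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':' {
            return rest;
        }
    }
    path
}
```

Keeping verbatim UNC paths intact trades a little display ugliness for never showing a path that points somewhere else.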
pub unsafe fn virtual_release(base: NonNull, size: usize) { unsafe { Memory::VirtualFree(base.as_ptr() as *mut _, size, Memory::MEM_RELEASE); @@ -468,8 +492,8 @@ pub unsafe fn virtual_release(base: NonNull, size: usize) { /// # Safety /// /// This function is unsafe because it uses raw pointers. -/// Make sure to only pass pointers acquired from `virtual_reserve` -/// and to pass a size less than or equal to the size passed to `virtual_reserve`. +/// Make sure to only pass pointers acquired from [`virtual_reserve`] +/// and to pass a size less than or equal to the size passed to [`virtual_reserve`]. pub unsafe fn virtual_commit(base: NonNull, size: usize) -> apperr::Result<()> { unsafe { check_ptr_return(Memory::VirtualAlloc( @@ -511,14 +535,17 @@ pub unsafe fn get_proc_address(handle: NonNull, name: &CStr) -> apper } } +/// Loads the "common" portion of ICU4C. pub fn load_libicuuc() -> apperr::Result> { unsafe { load_library(w!("icuuc.dll")) } } +/// Loads the internationalization portion of ICU4C. pub fn load_libicui18n() -> apperr::Result> { unsafe { load_library(w!("icuin.dll")) } } +/// Returns a list of preferred languages for the current user. pub fn preferred_languages(arena: &Arena) -> Vec { // If the GetUserPreferredUILanguages() don't fit into 512 characters, // honestly, just give up. How many languages do you realistically need? @@ -606,6 +633,7 @@ pub(crate) fn io_error_to_apperr(err: std::io::Error) -> apperr::Error { gle_to_apperr(err.raw_os_error().unwrap_or(0) as u32) } +/// Formats a platform error code into a human-readable string. pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Result { unsafe { let mut ptr: *mut u8 = null_mut(); @@ -635,6 +663,7 @@ pub fn apperr_format(f: &mut std::fmt::Formatter<'_>, code: u32) -> std::fmt::Re } } +/// Checks if the given error is a "file not found" error. 
pub fn apperr_is_not_found(err: apperr::Error) -> bool { err == gle_to_apperr(Foundation::ERROR_FILE_NOT_FOUND) } diff --git a/src/tui.rs b/src/tui.rs index b9d37b1..0a84059 100644 --- a/src/tui.rs +++ b/src/tui.rs @@ -1,3 +1,145 @@ +//! An immediate mode UI framework for terminals. +//! +//! # Why immediate mode? +//! +//! This uses an "immediate mode" design, similar to [ImGui](https://github.com/ocornut/imgui). +//! The reason for this is that I expect the UI needs for any terminal application to be +//! fairly minimal, and for that purpose an immediate mode design is much simpler to use. +//! +//! So what's "immediate mode"? The primary alternative is called "retained mode". +//! The difference is that when you create a button in this framework in one frame, +//! and you stop telling this framework about it in the next frame, the button will vanish. +//! When you use a regular retained mode UI framework, you create the button once, +//! set up callbacks for when it is clicked, and then stop worrying about it. +//! +//! The downside of immediate mode is that your UI code _may_ become cluttered. +//! The upside however is that you cannot leak UI elements, you don't need to +//! worry about lifetimes or callbacks, and that simple UIs are simple to write. +//! +//! More importantly though, the primary reason for this is that the +//! lack of callbacks means we can use this design across a plain C ABI, +//! which we'll need once plugins come into play. GTK's `g_signal_connect` +//! shows that the alternative can be rather cumbersome. +//! +//! # Design overview +//! +//! While this file is fairly lengthy, the overall algorithm is simple. +//! On the first frame ever: +//! * Prepare an empty `arena_next`. +//! * Parse the incoming [`input::Input`], which should be a resize event. +//! * Create a new [`Context`] instance and give it to the caller. +//! * Now the caller will draw their UI with the [`Context`] by calling the
various [`Context`] UI methods, such as [`Context::block_begin()`] and +//! [`Context::block_end()`]. These two are, by the way, the basis upon which +//! all other UI elements are built. Each UI element that is created gets +//! allocated onto `arena_next` and inserted into the UI tree. +//! That tree works exactly like the DOM tree in HTML: Each node in the tree +//! has a parent, children, and siblings. The tree layout at the end is then +//! a direct mirror of the code "layout" that created it. +//! * Once the caller is done and drops the [`Context`], it'll secretly call +//! `report_context_completion`. This does a number of things: +//! * The DOM tree that was built is stored in `prev_tree`. +//! * A hashmap of all nodes is built and stored in `prev_node_map`. +//! * `arena_next` is swapped with `arena_prev`. +//! * Each UI node is measured and laid out. +//! * Now the caller is expected to repeat this process with a [`None`] +//! input event until [`Tui::needs_settling()`] returns false. +//! This is necessary, because when [`Context::button()`] returns `true` +//! in one frame, it may change the state in the caller's code +//! and require another frame to be drawn. +//! * Finally, a call to [`Tui::render()`] will render the UI tree into the +//! framebuffer and return VT output. +//! +//! On every subsequent frame the process is similar, but one crucial element +//! of any immediate mode UI framework is added: +//! Now when the caller draws their UI, the various [`Context`] UI elements +//! have access to `prev_node_map` and the previously built UI tree. +//! This allows the UI framework to reuse the previously computed layout for +//! hit tests, caching scroll offsets, and so on. +//! +//! In the end it looks very similar: +//! * Prepare an empty `arena_next`. +//! * Parse the incoming [`input::Input`]... +//! * **BUT** now we can hit-test mouse clicks onto the previously built +//! UI tree. This way we can delegate focus on left mouse clicks.
* Create a new [`Context`] instance and give it to the caller. +//! * The caller draws their UI with the [`Context`]... +//! * **BUT** we can preserve the UI state across frames. +//! * Continue rendering until [`Tui::needs_settling()`] returns false. +//! * And the final call to [`Tui::render()`]. +//! +//! # Classnames and node IDs +//! +//! So how do we find which node from the previous tree correlates to the +//! current node? Each node needs to be constructed with a "classname". +//! The classname is hashed with the parent node ID as the seed. This derived +//! hash is then used as the new child node ID. Assuming the hash function's +//! collision likelihood is low, these hashes serve as true IDs. +//! +//! This has the nice added property that finding a node with the same ID +//! guarantees that all of its parent nodes must have equivalent IDs as well. +//! This turns "is the focus anywhere inside this subtree" into an O(1) check. +//! +//! Classnames are used because I was hoping to add theming in the future +//! with a syntax similar to CSS (simplified, however). +//! +//! # Example +//! +//! ``` +//! use edit::helpers::Size; +//! use edit::input::Input; +//! use edit::tui::*; +//! use edit::{arena, arena_format}; +//! +//! struct State { +//! counter: i32, +//! } +//! +//! fn main() { +//! arena::init().unwrap(); +//! +//! // Create a `Tui` instance which holds state across frames. +//! let mut tui = Tui::new().unwrap(); +//! let mut state = State { counter: 0 }; +//! let input = Input::Resize(Size { width: 80, height: 24 }); +//! +//! // Pass the input to the TUI. +//! { +//! let mut ctx = tui.create_context(Some(input)); +//! draw(&mut ctx, &mut state); +//! } +//! +//! // Continue until the layout has settled. +//! while tui.needs_settling() { +//! let mut ctx = tui.create_context(None); +//! draw(&mut ctx, &mut state); +//! } +//! +//! // Render the output. +//! let scratch = arena::scratch_arena(None);
let output = tui.render(&*scratch); +//! println!("{}", output); +//! } +//! +//! fn draw(ctx: &mut Context, state: &mut State) { +//! ctx.table_begin("classname"); +//! { +//! ctx.table_next_row(); +//! +//! // Thanks to the lack of callbacks, we can use a primitive +//! // if condition here, as well as in any potential C code. +//! if ctx.button("button", "Click me!") { +//! state.counter += 1; +//! } +//! +//! // Similarly, formatting and showing labels is straightforward. +//! // It's impossible to forget updating the label this way. +//! ctx.label("label", &arena_format!(ctx.arena(), "Counter: {}", state.counter)); +//! } +//! ctx.table_end(); +//! } +//! ``` + use std::arch::breakpoint; #[cfg(debug_assertions)] use std::collections::HashSet; @@ -22,33 +164,46 @@ type InputKey = input::InputKey; type InputMouseState = input::InputMouseState; type InputText<'input> = input::InputText<'input>; +/// Since [`TextBuffer`] creation and management is expensive, +/// we cache instances of them for reuse between frames. +/// This is used for [`Context::editline()`]. struct CachedTextBuffer { node_id: u64, editor: RcTextBuffer, seen: bool, } +/// Since [`Context::editline()`] and [`Context::textarea()`] +/// do almost the same thing, this abstracts over the two. enum TextBufferPayload<'a> { Editline(&'a mut dyn WriteableDocument), Textarea(RcTextBuffer), } +/// In order for the TUI to show the correct Ctrl/Alt/Shift +/// translations, this struct lets you set them. pub struct ModifierTranslations { pub ctrl: &'static str, pub alt: &'static str, pub shift: &'static str, } +/// Controls to which node the floater is anchored. #[derive(Default, Clone, Copy, PartialEq, Eq)] pub enum Anchor { + /// The floater is attached relative to the node created last. #[default] Last, + /// The floater is attached relative to the current node (= parent of new nodes). Parent, + /// The floater is attached relative to the root node (= usually the viewport). 
Root, } +/// Controls the position of the floater. See [`Context::attr_float`]. #[derive(Default)] pub struct FloatSpec { + /// Controls to which node the floater is anchored. pub anchor: Anchor, // Specifies the origin of the container relative to the container size. [0, 1] pub gravity_x: f32, @@ -58,36 +213,65 @@ pub struct FloatSpec { pub offset_y: f32, } +/// Informs you about the change that was made to the list selection. #[derive(Clone, Copy, PartialEq, Eq)] pub enum ListSelection { + /// The selection wasn't changed. Unchanged, + /// The selection was changed to the current list item. Selected, + /// The selection was changed to the current list item + /// *and* the item was also activated (Enter or double-click). Activated, } +/// Controls the position of a node relative to its parent. #[derive(Default)] pub enum Position { + /// The child is stretched to fill the parent. #[default] Stretch, + /// The child is positioned at the left edge of the parent. Left, + /// The child is positioned at the center of the parent. Center, + /// The child is positioned at the right edge of the parent. Right, } +/// Controls the text overflow behavior of a label +/// when the text doesn't fit the container. #[derive(Default, Clone, Copy, PartialEq, Eq)] pub enum Overflow { + /// Text is simply cut off when it doesn't fit. #[default] Clip, + /// An ellipsis is shown at the end of the text. TruncateHead, + /// An ellipsis is shown in the middle of the text. TruncateMiddle, + /// An ellipsis is shown at the beginning of the text. TruncateTail, } +/// There are two types of lifetimes the TUI code needs to manage: +/// * Across frames +/// * Per frame +/// +/// [`Tui`] manages the first one. It's also the entry point for +/// everything else you may want to do. pub struct Tui { + /// Arena used for the previous frame. arena_prev: Arena, + /// Arena used for the current frame. arena_next: Arena, + /// The UI tree built in the previous frame.
+ /// This refers to memory in `arena_prev`. prev_tree: Tree<'static>, + /// A hashmap of all nodes built in the previous frame. + /// This refers to memory in `arena_prev`. prev_node_map: NodeMap<'static>, + /// The framebuffer used for rendering. framebuffer: Framebuffer, modifier_translations: ModifierTranslations, @@ -97,27 +281,51 @@ pub struct Tui { modal_default_fg: u32, /// Last known terminal size. + /// + /// This lives here instead of [`Context`], because we need to + /// track the state across frames and input events. + /// This also applies to the remaining members in this block below. size: Size, /// Last known mouse position. mouse_position: Point, /// Between mouse down and up, the position where the mouse was pressed. /// Otherwise, this contains Point::MIN. mouse_down_position: Point, + /// Node ID of the node that was clicked on. + /// Used for tracking drag targets. left_mouse_down_target: u64, + /// Timestamp of the last mouse up event. + /// Used for tracking double/triple clicks. mouse_up_timestamp: std::time::Instant, + /// The current mouse state. mouse_state: InputMouseState, + /// Whether the mouse is currently being dragged. mouse_is_drag: bool, + /// The number of clicks that have happened in a row. + /// Gets reset once the mouse has been released for a while. mouse_click_counter: CoordType, + /// The path to the node that was clicked on. mouse_down_node_path: Vec<u64>, + /// The position of the first click in a double/triple click series. first_click_position: Point, + /// The node ID of the node that was first clicked on + /// in a double/triple click series. first_click_target: u64, + /// Path to the currently focused node. focused_node_path: Vec<u64>, + /// Contains the last element in [`Tui::focused_node_path`]. + /// This way we can track if the focus changed, because then we + /// need to scroll the node into view if it's within a scrollarea. focused_node_for_scrolling: u64, + /// A list of cached text buffers used for [`Context::editline()`].
cached_text_buffers: Vec<CachedTextBuffer>, + /// The clipboard contents. clipboard: Vec<u8>, + /// A counter that is incremented every time the clipboard changes. + /// Allows for tracking clipboard changes without comparing contents. clipboard_generation: u32, settling_have: i32, @@ -126,6 +334,7 @@ pub struct Tui { } impl Tui { + /// Creates a new [`Tui`] instance for storing state across frames. pub fn new() -> apperr::Result<Self> { let arena_prev = Arena::new(128 * MEBI)?; let arena_next = Arena::new(128 * MEBI)?; @@ -179,56 +388,74 @@ impl Tui { Ok(tui) } + /// Sets up the framebuffer's color palette. pub fn setup_indexed_colors(&mut self, colors: [u32; INDEXED_COLORS_COUNT]) { self.framebuffer.set_indexed_colors(colors); } + /// Sets up translations for Ctrl/Alt/Shift modifiers. pub fn setup_modifier_translations(&mut self, translations: ModifierTranslations) { self.modifier_translations = translations; } + /// Sets the default background color for floaters (dropdowns, etc.). pub fn set_floater_default_bg(&mut self, color: u32) { self.floater_default_bg = color; } + /// Sets the default foreground color for floaters (dropdowns, etc.). pub fn set_floater_default_fg(&mut self, color: u32) { self.floater_default_fg = color; } + /// Sets the default background color for modals. pub fn set_modal_default_bg(&mut self, color: u32) { self.modal_default_bg = color; } + /// Sets the default foreground color for modals. pub fn set_modal_default_fg(&mut self, color: u32) { self.modal_default_fg = color; } + /// Returns the timeout for the next read (smaller than [`time::Duration::MAX`] + /// while animations, etc., are running) and resets it to `MAX`. pub fn read_timeout(&mut self) -> time::Duration { mem::replace(&mut self.read_timeout, time::Duration::MAX) } + /// Returns an indexed color from the framebuffer. #[inline] pub fn indexed(&self, index: IndexedColor) -> u32 { self.framebuffer.indexed(index) } + /// Returns an indexed color from the framebuffer with the given alpha. + /// See [`Framebuffer::indexed_alpha()`].
#[inline] pub fn indexed_alpha(&self, index: IndexedColor, numerator: u32, denominator: u32) -> u32 { self.framebuffer.indexed_alpha(index, numerator, denominator) } + /// Returns a color in contrast with the given color. + /// See [`Framebuffer::contrasted()`]. pub fn contrasted(&self, color: u32) -> u32 { self.framebuffer.contrasted(color) } + /// Returns the current clipboard contents. pub fn clipboard(&self) -> &[u8] { &self.clipboard } + /// Returns the current clipboard generation. + /// The generation changes every time the clipboard contents change. + /// This allows you to track clipboard changes. pub fn clipboard_generation(&self) -> u32 { self.clipboard_generation } + /// Starts a new frame and returns a [`Context`] for it. pub fn create_context<'a, 'input>( &'a mut self, input: Option<Input<'input>>, @@ -547,7 +774,7 @@ impl Tui { self.settling_want = (self.settling_have + 1).min(20); } - /// Renders all nodes into a string-frame representation. + /// Renders the last frame into the framebuffer and returns the VT output. pub fn render<'a>(&mut self, arena: &'a Arena) -> ArenaString<'a> { self.framebuffer.flip(self.size); for child in self.prev_tree.iterate_roots() { @@ -1118,6 +1345,8 @@ impl Tui { } } +/// [`Context`] is a temporary object that is created for each frame. +/// Its primary purpose is to build a UI tree. pub struct Context<'a, 'input> { tui: &'a mut Tui, @@ -1148,6 +1377,7 @@ impl<'a> Drop for Context<'a, '_> { } impl<'a> Context<'a, '_> { + /// Gets an arena for temporary allocations, such as for [`arena_format`]. pub fn arena(&self) -> &'a Arena { // TODO: // `Context` borrows `Tui` for lifetime 'a, so `self.tui` should be `&'a Tui`, right? @@ -1158,32 +1388,45 @@ impl<'a> Context<'a, '_> { unsafe { mem::transmute::<&'_ Arena, &'a Arena>(&self.tui.arena_next) } } + /// Returns the viewport size. pub fn size(&self) -> Size { + // We don't use the size stored in the framebuffer, because until + // `render()` is called, the framebuffer will use a stale size.
self.tui.size } + /// Returns an indexed color from the framebuffer. #[inline] pub fn indexed(&self, index: IndexedColor) -> u32 { self.tui.framebuffer.indexed(index) } + /// Returns an indexed color from the framebuffer with the given alpha. + /// See [`Framebuffer::indexed_alpha()`]. #[inline] pub fn indexed_alpha(&self, index: IndexedColor, numerator: u32, denominator: u32) -> u32 { self.tui.framebuffer.indexed_alpha(index, numerator, denominator) } + /// Returns a color in contrast with the given color. + /// See [`Framebuffer::contrasted()`]. pub fn contrasted(&self, color: u32) -> u32 { self.tui.framebuffer.contrasted(color) } + /// Returns the current clipboard contents. pub fn clipboard(&self) -> &[u8] { self.tui.clipboard() } + /// Returns the current clipboard generation. + /// The generation changes every time the clipboard contents change. + /// This allows you to track clipboard changes. pub fn clipboard_generation(&self) -> u32 { self.tui.clipboard_generation() } + /// Sets the clipboard contents. pub fn set_clipboard(&mut self, data: Vec<u8>) { if !data.is_empty() { self.tui.clipboard = data; @@ -1192,13 +1435,14 @@ impl<'a> Context<'a, '_> { } } + /// Tells the UI framework that your state changed and you need another layout pass. pub fn needs_rerender(&mut self) { // If this hits, the call stack is responsible is trying to deadlock you. debug_assert!(self.tui.settling_have < 15); self.needs_settling = true; } - /// Begins a new UI block (container) with a unique ID. + /// Begins a generic UI block (container) with a unique ID derived from the given `classname`. pub fn block_begin(&mut self, classname: &'static str) { let parent = self.tree.current_node; @@ -1232,6 +1476,7 @@ impl<'a> Context<'a, '_> { } /// Mixes in an extra value to the next UI block's ID for uniqueness. + /// Use this when you build a list of items with the same classname.
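The node ID scheme from the module docs (classname hashed with the parent ID as the seed, optionally combined with a value from `next_block_id_mixin`) can be sketched as below. FNV-1a is used purely for illustration: the hash function the TUI actually uses isn't shown in this diff, and `child_id` is a hypothetical name.

```rust
/// Hypothetical sketch of "hash the classname with the parent ID as seed",
/// using 64-bit FNV-1a for illustration only.
fn child_id(parent_id: u64, classname: &str, mixin: u64) -> u64 {
    const FNV_OFFSET_BASIS: u64 = 0xCBF2_9CE4_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01B3;

    let mixin_bytes = mixin.to_le_bytes();
    // Seed with the parent ID (folded into the standard offset basis).
    let mut h = parent_id ^ FNV_OFFSET_BASIS;
    // Hash the classname, then the mixin that disambiguates siblings
    // sharing the same classname (e.g. list items).
    for &byte in classname.as_bytes().iter().chain(mixin_bytes.iter()) {
        h ^= u64::from(byte);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}
```

Every step of this FNV-1a variant (XOR a byte, multiply by an odd constant modulo 2^64) is a bijection on `u64`, so for the same classname and mixin, two different parent IDs always produce different child IDs; collisions can only arise between different classnames or mixins.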
pub fn next_block_id_mixin(&mut self, id: u64) { self.next_block_id_mixin = id; } @@ -1241,6 +1486,8 @@ impl<'a> Context<'a, '_> { last_node.attributes.focusable = true; } + /// If this is the first time the current node is being drawn, + /// it'll steal the active focus. pub fn focus_on_first_present(&mut self) { let steal = { let mut last_node = self.tree.last_node.borrow_mut(); @@ -1252,6 +1499,7 @@ impl<'a> Context<'a, '_> { } } + /// Steals the focus unconditionally. pub fn steal_focus(&mut self) { self.steal_focus_for(self.tree.last_node); } @@ -1263,12 +1511,14 @@ impl<'a> Context<'a, '_> { } } + /// If the current node owns the focus, it'll be given to the parent. pub fn toss_focus_up(&mut self) { if self.tui.pop_focusable_node(1) { self.needs_rerender(); } } + /// If the parent node owns the focus, it'll be given to the current node. pub fn inherit_focus(&mut self) { let mut last_node = self.tree.last_node.borrow_mut(); let Some(parent) = last_node.parent else { @@ -1289,17 +1539,23 @@ impl<'a> Context<'a, '_> { } } + /// Causes keyboard focus to be unable to escape this node and its children. + /// It's a "well" because if the focus is inside it, it can't escape. pub fn attr_focus_well(&mut self) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.attributes.focus_well = true; } + /// Explicitly sets the intrinsic size of the current node. + /// The intrinsic size is the size the node ideally wants to be. pub fn attr_intrinsic_size(&mut self, size: Size) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.intrinsic_size = size; last_node.intrinsic_size_set = true; } + /// Turns the current node into a floating node, + /// like a popup, modal or a tooltip. pub fn attr_float(&mut self, spec: FloatSpec) { let last_node = self.tree.last_node; let anchor = { @@ -1328,16 +1584,19 @@ impl<'a> Context<'a, '_> { ln.attributes.fg = self.tui.floater_default_fg; } + /// Gives the current node a border. 
pub fn attr_border(&mut self) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.attributes.bordered = true; } + /// Sets the current node's position inside the parent. pub fn attr_position(&mut self, align: Position) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.attributes.position = align; } + /// Assigns padding to the current node. pub fn attr_padding(&mut self, padding: Rect) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.attributes.padding = Self::normalize_rect(padding); @@ -1352,21 +1611,27 @@ impl<'a> Context<'a, '_> { } } + /// Assigns an sRGB background color to the current node. pub fn attr_background_rgba(&mut self, bg: u32) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.attributes.bg = bg; } + /// Assigns an sRGB foreground color to the current node. pub fn attr_foreground_rgba(&mut self, fg: u32) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.attributes.fg = fg; } + /// Applies reverse-video to the current node: + /// Background and foreground colors are swapped. pub fn attr_reverse(&mut self) { let mut last_node = self.tree.last_node.borrow_mut(); last_node.attributes.reverse = true; } + /// Checks if the current keyboard input matches the given shortcut, + /// consumes it if so, and returns true in that case. pub fn consume_shortcut(&mut self, shortcut: InputKey) -> bool { if !self.input_consumed && self.input_keyboard == Some(shortcut) { self.set_input_consumed(); @@ -1381,26 +1646,31 @@ impl<'a> Context<'a, '_> { self.input_consumed = true; } + /// Returns whether the mouse was pressed down on the current node. pub fn was_mouse_down(&mut self) -> bool { let last_node = self.tree.last_node.borrow(); self.tui.was_mouse_down_on_node(last_node.id) } + /// Returns whether the mouse was pressed down on the current node's subtree.
pub fn contains_mouse_down(&mut self) -> bool { let last_node = self.tree.last_node.borrow(); self.tui.was_mouse_down_on_subtree(&last_node) } + /// Returns whether the current node is focused. pub fn is_focused(&mut self) -> bool { let last_node = self.tree.last_node.borrow(); self.tui.is_node_focused(last_node.id) } + /// Returns whether the current node's subtree is focused. pub fn contains_focus(&mut self) -> bool { let last_node = self.tree.last_node.borrow(); self.tui.is_subtree_focused(&last_node) } + /// Begins a modal window. Call [`Context::modal_end()`]. pub fn modal_begin(&mut self, classname: &'static str, title: &str) { self.block_begin(classname); self.attr_float(FloatSpec { anchor: Anchor::Root, ..Default::default() }); @@ -1433,12 +1703,16 @@ impl<'a> Context<'a, '_> { self.last_modal = Some(self.tree.last_node); } + /// Ends the current modal window block. pub fn modal_end(&mut self) -> bool { self.block_end(); self.block_end(); self.contains_focus() && self.consume_shortcut(vk::ESCAPE) } + /// Begins a table block. Call [`Context::table_end()`]. + /// Tables are the primary way to create a grid layout, + /// and to lay out controls on a single row (= a table with 1 row). pub fn table_begin(&mut self, classname: &'static str) { self.block_begin(classname); @@ -1449,6 +1723,8 @@ impl<'a> Context<'a, '_> { }); } + /// Assigns widths to the columns of the current table. + /// By default, the table will left-align all columns. pub fn table_set_columns(&mut self, columns: &[CoordType]) { let mut last_node = self.tree.last_node.borrow_mut(); if let NodeContent::Table(spec) = &mut last_node.content { @@ -1459,6 +1735,7 @@ impl<'a> Context<'a, '_> { } } + /// Assigns the gap between cells in the current table.
pub fn table_set_cell_gap(&mut self, cell_gap: Size) { let mut last_node = self.tree.last_node.borrow_mut(); if let NodeContent::Table(spec) = &mut last_node.content { @@ -1468,6 +1745,7 @@ impl<'a> Context<'a, '_> { } } + /// Starts the next row in the current table. pub fn table_next_row(&mut self) { { let current_node = self.tree.current_node.borrow(); @@ -1492,6 +1770,7 @@ impl<'a> Context<'a, '_> { self.block_begin("row"); } + /// Ends the current table block. pub fn table_end(&mut self) { let current_node = self.tree.current_node.borrow(); @@ -1504,12 +1783,29 @@ impl<'a> Context<'a, '_> { self.block_end(); // table } + /// Creates a simple text label. pub fn label(&mut self, classname: &'static str, text: &str) { self.styled_label_begin(classname); self.styled_label_add_text(text); self.styled_label_end(); } + /// Creates a styled text label. + /// + /// # Example + /// ``` + /// use edit::framebuffer::IndexedColor; + /// use edit::tui::Context; + /// + /// fn draw(ctx: &mut Context) { + /// ctx.styled_label_begin("label"); + /// // Shows "Hello" in the inherited foreground color. + /// ctx.styled_label_add_text("Hello"); + /// // Shows ", World!" next to "Hello" in red. + /// ctx.styled_label_set_foreground(ctx.indexed(IndexedColor::Red)); + /// ctx.styled_label_add_text(", World!"); + /// } + /// ``` pub fn styled_label_begin(&mut self, classname: &'static str) { self.block_begin(classname); self.tree.last_node.borrow_mut().content = NodeContent::Text(TextContent { @@ -1519,6 +1815,7 @@ impl<'a> Context<'a, '_> { }); } + /// Changes the active pencil color of the current label. pub fn styled_label_set_foreground(&mut self, fg: u32) { let mut node = self.tree.last_node.borrow_mut(); let NodeContent::Text(content) = &mut node.content else { @@ -1535,6 +1832,7 @@ impl<'a> Context<'a, '_> { } } + /// Changes the active pencil attributes of the current label. 
pub fn styled_label_set_attributes(&mut self, attr: Attributes) { let mut node = self.tree.last_node.borrow_mut(); let NodeContent::Text(content) = &mut node.content else { @@ -1547,6 +1845,7 @@ impl<'a> Context<'a, '_> { } } + /// Adds text to the current label. pub fn styled_label_add_text(&mut self, text: &str) { let mut node = self.tree.last_node.borrow_mut(); let NodeContent::Text(content) = &mut node.content else { @@ -1556,6 +1855,7 @@ impl<'a> Context<'a, '_> { content.text.push_str(text); } + /// Ends the current label block. pub fn styled_label_end(&mut self) { { let mut last_node = self.tree.last_node.borrow_mut(); @@ -1573,6 +1873,7 @@ impl<'a> Context<'a, '_> { self.block_end(); } + /// Sets the overflow behavior of the current label. pub fn attr_overflow(&mut self, overflow: Overflow) { let mut last_node = self.tree.last_node.borrow_mut(); let NodeContent::Text(content) = &mut last_node.content else { @@ -1582,6 +1883,8 @@ impl<'a> Context<'a, '_> { content.overflow = overflow; } + /// Creates a button with the given text. + /// Returns true if the button was activated. pub fn button(&mut self, classname: &'static str, text: &str) -> bool { self.styled_label_begin(classname); self.attr_focusable(); @@ -1596,6 +1899,8 @@ impl<'a> Context<'a, '_> { self.button_activated() } + /// Creates a checkbox with the given text. + /// Returns true if the checkbox was activated. pub fn checkbox(&mut self, classname: &'static str, text: &str, checked: &mut bool) -> bool { self.styled_label_begin(classname); self.attr_focusable(); @@ -1628,6 +1933,8 @@ impl<'a> Context<'a, '_> { } } + /// Creates a text input field. + /// Returns true if the text contents changed. pub fn editline<'s, 'b: 's>( &'s mut self, classname: &'static str, @@ -1636,6 +1943,7 @@ impl<'a> Context<'a, '_> { self.textarea_internal(classname, TextBufferPayload::Editline(text)) } + /// Creates a text area. 
pub fn textarea(&mut self, classname: &'static str, tb: RcTextBuffer) { self.textarea_internal(classname, TextBufferPayload::Textarea(tb)); } @@ -2244,6 +2552,7 @@ impl<'a> Context<'a, '_> { tc.scroll_offset.y = scroll_y; } + /// Creates a scrollable area. pub fn scrollarea_begin(&mut self, classname: &'static str, intrinsic_size: Size) { self.block_begin(classname); @@ -2270,6 +2579,7 @@ impl<'a> Context<'a, '_> { self.tree.last_node = container_node; } + /// Scrolls the current scrollable area to the given position. pub fn scrollarea_scroll_to(&mut self, pos: Point) { let mut container = self.tree.last_node.borrow_mut(); if let NodeContent::Scrollarea(sc) = &mut container.content { @@ -2279,6 +2589,7 @@ impl<'a> Context<'a, '_> { } } + /// Ends the current scrollarea block. pub fn scrollarea_end(&mut self) { self.block_end(); // content block self.block_end(); // outer container @@ -2366,6 +2677,7 @@ impl<'a> Context<'a, '_> { } } + /// Creates a list where exactly one item is selected. pub fn list_begin(&mut self, classname: &'static str) { self.block_begin(classname); self.attr_focusable(); @@ -2387,12 +2699,15 @@ impl<'a> Context<'a, '_> { last_node.content = NodeContent::List(content); } + /// Creates a list item with the given text. pub fn list_item(&mut self, select: bool, text: &str) -> ListSelection { self.styled_list_item_begin(); self.styled_label_add_text(text); self.styled_list_item_end(select) } + /// Creates a list item consisting of a styled label. + /// See [`Context::styled_label_begin`]. pub fn styled_list_item_begin(&mut self) { let list = self.tree.current_node; let idx = list.borrow().child_count; @@ -2403,6 +2718,7 @@ impl<'a> Context<'a, '_> { self.attr_focusable(); } + /// Ends the current styled list item. pub fn styled_list_item_end(&mut self, select: bool) -> ListSelection { self.styled_label_end(); @@ -2458,6 +2774,7 @@ impl<'a> Context<'a, '_> { } } + /// Ends the current list block.
pub fn list_end(&mut self) { self.block_end(); @@ -2565,12 +2882,16 @@ impl<'a> Context<'a, '_> { } } + /// Creates a menubar, to be shown at the top of the screen. pub fn menubar_begin(&mut self) { self.table_begin("menubar"); self.attr_focus_well(); self.table_next_row(); } + /// Appends a menu to the current menubar. + /// + /// Returns true if the menu is open. Continue appending items to it in that case. pub fn menubar_menu_begin(&mut self, text: &str, accelerator: char) -> bool { let mixin = self.tree.current_node.borrow().child_count as u64; self.next_block_id_mixin(mixin); @@ -2614,6 +2935,7 @@ impl<'a> Context<'a, '_> { } } + /// Appends a button to the current menu. pub fn menubar_menu_button( &mut self, text: &str, @@ -2623,6 +2945,8 @@ impl<'a> Context<'a, '_> { self.menubar_menu_checkbox(text, accelerator, shortcut, false) } + /// Appends a checkbox to the current menu. + /// Returns true if the checkbox was activated. pub fn menubar_menu_checkbox( &mut self, text: &str, @@ -2658,6 +2982,7 @@ impl<'a> Context<'a, '_> { clicked } + /// Ends the current menu. pub fn menubar_menu_end(&mut self) { self.table_end(); @@ -2688,6 +3013,7 @@ impl<'a> Context<'a, '_> { } } + /// Ends the current menubar. pub fn menubar_end(&mut self) { self.table_end(); } @@ -2766,6 +3092,7 @@ impl<'a> Context<'a, '_> { } } +/// See [`Tree::visit_all`]. #[derive(Clone, Copy)] enum VisitControl { Continue, @@ -2773,6 +3100,7 @@ enum VisitControl { Stop, } +/// Stores the root of the "DOM" tree of the UI. struct Tree<'a> { tail: &'a NodeCell<'a>, root_first: &'a NodeCell<'a>, @@ -2785,6 +3113,8 @@ struct Tree<'a> { } impl<'a> Tree<'a> { + /// Creates a new tree inside the given arena. + /// A single root node is added for the main contents. fn new(arena: &'a Arena) -> Self { let root = Self::alloc_node(arena); { @@ -2809,6 +3139,7 @@ impl<'a> Tree<'a> { arena.alloc_uninit().write(Default::default()) } + /// Appends a child node to the current node. 
fn push_child(&mut self, node: &'a NodeCell<'a>) { let mut n = node.borrow_mut(); n.parent = Some(self.current_node); @@ -2844,6 +3175,8 @@ impl<'a> Tree<'a> { self.checksum = wymix(self.checksum, n.id); } + /// Removes the current node from its parent and appends it as a new root. + /// Used for [`Context::attr_float`]. fn move_node_to_root(&mut self, node: &'a NodeCell<'a>, anchor: Option<&'a NodeCell<'a>>) { let mut n = node.borrow_mut(); let Some(parent) = n.parent else { @@ -2879,6 +3212,7 @@ impl<'a> Tree<'a> { self.root_last = node; } + /// Completes the current node and moves focus to the parent. fn pop_stack(&mut self) { let current_node = self.current_node.borrow(); let stack_parent = current_node.stack_parent.unwrap(); @@ -2954,6 +3288,10 @@ impl<'a> Tree<'a> { } } +/// A hashmap of node IDs to nodes. +/// +/// This map uses a simple open addressing scheme with linear probing. +/// It's fast, simple, and sufficient for the small number of nodes we have. struct NodeMap<'a> { slots: &'a [Option<&'a NodeCell<'a>>], shift: usize, @@ -2967,7 +3305,10 @@ impl Default for NodeMap<'static> { } impl<'a> NodeMap<'a> { + /// Creates a new node map for the given tree. fn new(arena: &'a Arena, tree: &Tree<'a>) -> Self { + // Since we aren't expected to have millions of nodes, + // we allocate 4x the number of slots for a 25% fill factor. let width = (4 * tree.count + 1).ilog2().max(1) as usize; let slots = 1 << width; let shift = 64 - width; @@ -2997,6 +3338,7 @@ impl<'a> NodeMap<'a> { Self { slots, shift, mask } } + /// Gets a node by its ID. fn get(&mut self, id: u64) -> Option<&'a NodeCell<'a>> { let shift = self.shift; let mask = self.mask; @@ -3021,7 +3363,7 @@ struct FloatAttributes { offset_y: f32, } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). 
#[derive(Default)] struct NodeAttributes { float: Option, @@ -3036,20 +3378,20 @@ struct NodeAttributes { focus_void: bool, // Prevents focus from entering via Tab } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). struct ListContent<'a> { selected: u64, // Points to the Node that holds this ListContent instance, if any>. selected_node: Option<&'a NodeCell<'a>>, } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). struct TableContent<'a> { columns: Vec, cell_gap: Size, } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). struct StyledTextChunk { offset: usize, fg: u32, @@ -3059,14 +3401,14 @@ struct StyledTextChunk { const INVALID_STYLED_TEXT_CHUNK: StyledTextChunk = StyledTextChunk { offset: usize::MAX, fg: 0, attr: Attributes::None }; -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). struct TextContent<'a> { text: ArenaString<'a>, chunks: Vec, overflow: Overflow, } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). struct TextareaContent<'a> { buffer: &'a TextBufferCell, @@ -3081,7 +3423,7 @@ struct TextareaContent<'a> { has_focus: bool, } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). #[derive(Clone)] struct ScrollareaContent { scroll_offset: Point, @@ -3089,7 +3431,7 @@ struct ScrollareaContent { thumb_height: CoordType, } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). #[derive(Default)] enum NodeContent<'a> { #[default] @@ -3102,7 +3444,7 @@ enum NodeContent<'a> { Scrollarea(ScrollareaContent), } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). 
#[derive(Default)] struct NodeSiblings<'a> { prev: Option<&'a NodeCell<'a>>, @@ -3122,7 +3464,7 @@ impl<'a> NodeSiblings<'a> { } } -// NOTE: Must not contain items that require drop(). +/// NOTE: Must not contain items that require drop(). #[derive(Default)] struct NodeChildren<'a> { first: Option<&'a NodeCell<'a>>, @@ -3144,7 +3486,9 @@ impl<'a> NodeChildren<'a> { type NodeCell<'a> = SemiRefCell>; -// NOTE: Must not contain items that require drop(). +/// A node in the UI tree. +/// +/// NOTE: Must not contain items that require drop(). #[derive(Default)] struct Node<'a> { prev: Option<&'a NodeCell<'a>>, @@ -3171,6 +3515,8 @@ struct Node<'a> { } impl Node<'_> { + /// Given an outer rectangle (including padding and borders) of this node, + /// this returns the inner rectangle (excluding padding and borders). fn outer_to_inner(&self, mut outer: Rect) -> Rect { let l = self.attributes.bordered; let t = self.attributes.bordered; @@ -3184,6 +3530,8 @@ impl Node<'_> { outer } + /// Given an intrinsic size (excluding padding and borders) of this node, + /// this returns the outer size (including padding and borders). fn intrinsic_to_outer(&self) -> Size { let l = self.attributes.bordered; let t = self.attributes.bordered; @@ -3202,6 +3550,7 @@ impl Node<'_> { size } + /// Computes the intrinsic size of this node and its children. fn compute_intrinsic_size(&mut self) { match &mut self.content { NodeContent::Table(spec) => { diff --git a/src/unicode/measurement.rs b/src/unicode/measurement.rs index 115dac4..eb1540d 100644 --- a/src/unicode/measurement.rs +++ b/src/unicode/measurement.rs @@ -6,17 +6,24 @@ use crate::document::ReadableDocument; use crate::helpers::{CoordType, Point}; use crate::simd::{memchr2, memrchr2}; +/// Stores a position inside a [`ReadableDocument`]. +/// +/// The cursor tracks both the absolute byte-offset, +/// as well as the position in terminal-related coordinates. 
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq)] pub struct Cursor { /// Offset in bytes within the buffer. pub offset: usize, /// Position in the buffer in lines (.y) and grapheme clusters (.x). + /// /// Line wrapping has NO influence on this. pub logical_pos: Point, /// Position in the buffer in laid out rows (.y) and columns (.x). + /// /// Line wrapping has an influence on this. pub visual_pos: Point, /// Horizontal position in visual columns. + /// /// Line wrapping has NO influence on this and if word wrap is disabled, /// it's identical to `visual_pos.x`. This is useful for calculating tab widths. pub column: CoordType, @@ -27,6 +34,7 @@ pub struct Cursor { pub wrap_opp: bool, } +/// Your entrypoint to navigating inside a [`ReadableDocument`]. #[derive(Clone)] pub struct MeasurementConfig<'doc> { buffer: &'doc dyn ReadableDocument, @@ -36,25 +44,41 @@ pub struct MeasurementConfig<'doc> { } impl<'doc> MeasurementConfig<'doc> { + /// Creates a new [`MeasurementConfig`] for the given document. pub fn new(buffer: &'doc dyn ReadableDocument) -> Self { Self { buffer, tab_size: 8, word_wrap_column: 0, cursor: Default::default() } } + /// Sets the tab size. + /// + /// Defaults to 8, because that's what a tab in terminals evaluates to. pub fn with_tab_size(mut self, tab_size: CoordType) -> Self { self.tab_size = tab_size.max(1); self } + /// You want word wrap? Set it here! + /// + /// Defaults to 0, which means no word wrap. pub fn with_word_wrap_column(mut self, word_wrap_column: CoordType) -> Self { self.word_wrap_column = word_wrap_column; self } + /// Sets the initial cursor to the given position. + /// + /// WARNING: While the code doesn't panic if the cursor is invalid, + /// the results will obviously be complete garbage. pub fn with_cursor(mut self, cursor: Cursor) -> Self { self.cursor = cursor; self } + /// Navigates **forward** to the given absolute offset. + /// + /// # Returns + /// + /// The cursor position after the navigation. 
pub fn goto_offset(&mut self, offset: usize) -> Cursor { self.cursor = Self::measure_forward( self.tab_size, @@ -68,6 +92,13 @@ impl<'doc> MeasurementConfig<'doc> { self.cursor } + /// Navigates **forward** to the given logical position. + /// + /// Logical positions are in lines and grapheme clusters. + /// + /// # Returns + /// + /// The cursor position after the navigation. pub fn goto_logical(&mut self, logical_target: Point) -> Cursor { self.cursor = Self::measure_forward( self.tab_size, @@ -81,6 +112,13 @@ impl<'doc> MeasurementConfig<'doc> { self.cursor } + /// Navigates **forward** to the given visual position. + /// + /// Visual positions are in laid out rows and columns. + /// + /// # Returns + /// + /// The cursor position after the navigation. pub fn goto_visual(&mut self, visual_target: Point) -> Cursor { self.cursor = Self::measure_forward( self.tab_size, @@ -94,6 +132,7 @@ impl<'doc> MeasurementConfig<'doc> { self.cursor } + /// Returns the current cursor position. pub fn cursor(&self) -> Cursor { self.cursor } @@ -447,10 +486,33 @@ impl<'doc> MeasurementConfig<'doc> { } } -// TODO: This code could be optimized by replacing memchr with manual line counting. -// If `line_stop` is very far away, we could accumulate newline counts horizontally -// in a AVX2 register (= 32 u8 slots). Then, every 256 bytes we compute the horizontal -// sum via `_mm256_sad_epu8` yielding us the newline count in the last block. +/// Seeks forward to the given line start. +/// +/// If given a piece of `text`, and assuming you're currently at `offset` which +/// is on the logical line `line`, this will seek forward until the logical line +/// `line_stop` is reached. For instance, if `line` is 0 and `line_stop` is 2, +/// it'll seek forward past 2 line feeds. +/// +/// This function always stops exactly past a line feed +/// and thus returns a position at the start of a line.
+/// +/// # Warning +/// +/// If the end of `text` is hit before reaching `line_stop`, the function +/// will return an offset of `text.len()`, not at the start of a line. +/// +/// # Parameters +/// +/// * `text`: The text to search in. +/// * `offset`: The offset to start searching from. +/// * `line`: The current line. +/// * `line_stop`: The line to stop at. +/// +/// # Returns +/// +/// A tuple consisting of: +/// * The new offset. +/// * The line number that was reached. pub fn newlines_forward( text: &[u8], mut offset: usize, @@ -467,6 +529,13 @@ pub fn newlines_forward( offset = offset.min(len); loop { + // TODO: This code could be optimized by replacing memchr with manual line counting. + // + // If `line_stop` is very far away, we could accumulate newline counts horizontally + // in an AVX2 register (= 32 u8 slots). Then, every 256 bytes we compute the horizontal + // sum via `_mm256_sad_epu8` yielding us the newline count in the last block. + // + // We could also just use `_mm256_sad_epu8` on each fetch as-is. offset = memchr2(b'\n', b'\n', text, offset); if offset >= len { break; @@ -482,9 +551,18 @@ pub fn newlines_forward( (offset, line) } -// Seeks to the start of the given line. -// No matter what parameters are given, it only returns an offset at the start of a line. -// Put differently, even if `line == line_stop`, it'll seek backward to the line start. +/// Seeks backward to the given line start. +/// +/// See [`newlines_forward`] for details. +/// This function does almost the same thing, but in reverse. +/// +/// # Warning +/// +/// In addition to the notes in [`newlines_forward`]: +/// +/// No matter what parameters are given, [`newlines_backward`] only returns an +/// offset at the start of a line. Put differently, even if `line == line_stop`, +/// it'll seek backward to the line start. pub fn newlines_backward( text: &[u8], mut offset: usize, @@ -506,6 +584,10 @@ pub fn newlines_backward( } } +/// Returns an offset past a newline.
+/// +/// If `offset` is right in front of a newline, +/// this will return the offset past said newline. pub fn skip_newline(text: &[u8], mut offset: usize) -> usize { if offset >= text.len() { return offset; @@ -522,6 +604,7 @@ pub fn skip_newline(text: &[u8], mut offset: usize) -> usize { offset } +/// Strips a trailing newline from the given text. pub fn strip_newline(mut text: &[u8]) -> &[u8] { // Rust generates surprisingly tight assembly for this. if text.last() == Some(&b'\n') { diff --git a/src/unicode/mod.rs b/src/unicode/mod.rs index ffe4b8a..ee139ee 100644 --- a/src/unicode/mod.rs +++ b/src/unicode/mod.rs @@ -1,3 +1,5 @@ +//! Everything related to Unicode lives here. + mod measurement; mod tables; mod utf8; diff --git a/src/unicode/utf8.rs b/src/unicode/utf8.rs index 4b8cb38..7fcefc7 100644 --- a/src/unicode/utf8.rs +++ b/src/unicode/utf8.rs @@ -1,5 +1,14 @@ use std::{hint, iter}; +/// An iterator over UTF-8 encoded characters. +/// +/// This differs from [`std::str::Chars`] in that it works on unsanitized +/// byte slices and transparently replaces invalid UTF-8 sequences with U+FFFD. +/// +/// This follows ICU's bitmask approach for `U8_NEXT_OR_FFFD` relatively +/// closely. This is important for compatibility, because it implements the +/// WHATWG recommendation for UTF8 error recovery. It's also helpful, because +/// the excellent folks at ICU have probably spent a lot of time optimizing it. #[derive(Clone, Copy)] pub struct Utf8Chars<'a> { source: &'a [u8], @@ -7,30 +16,39 @@ pub struct Utf8Chars<'a> { } impl<'a> Utf8Chars<'a> { + /// Creates a new `Utf8Chars` iterator starting at the given `offset`. pub fn new(source: &'a [u8], offset: usize) -> Self { Self { source, offset } } + /// Returns the byte slice this iterator was created with. pub fn source(&self) -> &'a [u8] { self.source } + /// Checks if the source is empty. pub fn is_empty(&self) -> bool { self.source.is_empty() } + /// Returns the length of the source. 
pub fn len(&self) -> usize { self.source.len() } + /// Returns the current offset in the byte slice. + /// + /// This will be past the last returned character. pub fn offset(&self) -> usize { self.offset } + /// Sets the offset to continue iterating from. pub fn seek(&mut self, offset: usize) { self.offset = offset; } + /// Returns true if `next` will return another character. pub fn has_next(&self) -> bool { self.offset < self.source.len() } @@ -39,9 +57,6 @@ impl<'a> Utf8Chars<'a> { // performance actually suffers when this gets inlined. #[cold] fn next_slow(&mut self, c: u8) -> char { - // See: https://datatracker.ietf.org/doc/html/rfc3629 - // as well as ICU's `utf8.h` for the bitmask approach. - if self.offset >= self.source.len() { return Self::fffd(); } @@ -114,12 +129,10 @@ impl<'a> Utf8Chars<'a> { // The trail byte is the index and the lead byte mask is the value. // This is because the split at 0x90 requires more bits than fit into an u8. const TRAIL1_LEAD_BITS: [u8; 16] = [ - // +------ 0xF4 lead - // |+----- 0xF3 lead - // ||+---- 0xF2 lead - // |||+--- 0xF1 lead - // ||||+-- 0xF0 lead - // vvvvv + // --------- 0xF4 lead + // | ... + // | +---- 0xF0 lead + // v v 0b_00000, // 0b_00000, // 0b_00000, // @@ -143,6 +156,8 @@ impl<'a> Utf8Chars<'a> { cp &= !0xF0; // Now we can verify if it's actually <= 0xF4. + // Curiously, this if condition does a lot of heavy lifting for + // performance (+13%). I think it's just a coincidence though. if cp > 4 { return Self::fffd(); } @@ -191,7 +206,8 @@ impl<'a> Utf8Chars<'a> { } } - // Improves performance by ~5% and reduces code size. + // This simultaneously serves as a `cold_path` marker. + // It improves performance by ~5% and reduces code size. #[cold] #[inline(always)] fn fffd() -> char { @@ -202,8 +218,6 @@ impl<'a> Utf8Chars<'a> { impl Iterator for Utf8Chars<'_> { type Item = char; - // At opt-level="s", this function doesn't get inlined, - // but performance greatly suffers in that case. 
#[inline] fn next(&mut self) -> Option { if self.offset >= self.source.len() { diff --git a/src/vt.rs b/src/vt.rs index 91063db..2306ce5 100644 --- a/src/vt.rs +++ b/src/vt.rs @@ -1,19 +1,38 @@ +//! Our VT parser. + use std::{mem, time}; use crate::simd::memchr2; +/// The parser produces these tokens. pub enum Token<'parser, 'input> { + /// A bunch of text. Doesn't contain any control characters. Text(&'input str), + /// A single control character, like backspace or return. Ctrl(char), + /// We encountered `ESC x` and this contains `x`. Esc(char), + /// We encountered `ESC O x` and this contains `x`. SS3(char), + /// A CSI sequence started with `ESC [`. + /// + /// They are the most common escape sequences. See [`Csi`]. Csi(&'parser Csi), + /// An OSC sequence started with `ESC ]`. + /// + /// The sequence may be split up into multiple tokens if the input + /// is given in chunks. This is indicated by the `partial` field. Osc { data: &'input str, partial: bool }, + /// A DCS sequence started with `ESC P`. + /// + /// The sequence may be split up into multiple tokens if the input + /// is given in chunks. This is indicated by the `partial` field. Dcs { data: &'input str, partial: bool }, } +/// Stores the state of the parser. #[derive(Clone, Copy)] -pub enum State { +enum State { Ground, Esc, Ss3, @@ -24,10 +43,20 @@ pub enum State { DcsEsc, } +/// A single CSI sequence, parsed for your convenience. pub struct Csi { + /// The parameters of the CSI sequence. pub params: [u16; 32], + /// The number of parameters stored in [`Csi::params`]. pub param_count: usize, + /// The private byte, if any. `0` if none. + /// + /// The private byte is the first character right after the + /// `ESC [` sequence. It is usually a `?` or `<`. pub private_byte: char, + /// The final byte of the CSI sequence. + /// + /// This is the last character of the sequence, e.g. `m` or `H`.
pub final_byte: char, } @@ -73,6 +102,9 @@ impl Parser { } } +/// An iterator that parses VT sequences into [`Token`]s. +/// +/// Can't implement [`Iterator`], because this is a "lending iterator". pub struct Stream<'parser, 'input> { parser: &'parser mut Parser, input: &'input str, @@ -80,10 +112,12 @@ pub struct Stream<'parser, 'input> { } impl<'parser, 'input> Stream<'parser, 'input> { + /// Returns the input that is being parsed. pub fn input(&self) -> &'input str { self.input } + /// Returns the current parser offset. pub fn offset(&self) -> usize { self.off } @@ -99,8 +133,6 @@ impl<'parser, 'input> Stream<'parser, 'input> { } /// Parses the next VT sequence from the previously given input. - /// - /// Can't implement Iterator, because this is a "lending iterator". #[allow(clippy::should_implement_trait)] pub fn next(&mut self) -> Option> { // I don't know how to tell Rust that `self.parser` and its lifetime diff --git a/tools/grapheme-table-gen/README.md b/tools/grapheme-table-gen/README.md new file mode 100644 index 0000000..48d3a37 --- /dev/null +++ b/tools/grapheme-table-gen/README.md @@ -0,0 +1,15 @@ +# Grapheme Table Generator + +This tool processes Unicode Character Database (UCD) XML files to generate efficient, multi-stage trie lookup tables for properties relevant to terminal applications: +* Grapheme cluster breaking rules +* Line breaking rules (optional) +* Character width properties + +## Usage + +* Download [ucd.nounihan.grouped.zip](https://www.unicode.org/Public/UCD/latest/ucdxml/ucd.nounihan.grouped.zip) +* Run some equivalent of: + ```sh + grapheme-table-gen --lang=rust --extended --no-ambiguous --line-breaks path/to/ucd.nounihan.grouped.xml + ``` +* Place the result in `src/unicode/tables.rs`
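The line-seeking contract documented for `newlines_forward` and `newlines_backward` in `src/unicode/measurement.rs` can be illustrated with a small standalone sketch. This is not the crate's implementation: it scans bytes with `iter().position` / `rposition` instead of the SIMD `memchr2` / `memrchr2`, and it uses `isize` for line numbers where the crate uses its own `CoordType`. The function bodies are assumptions written only to reproduce the documented behavior: stop exactly past a line feed, fall back to `text.len()` when the text runs out, and always return a line start when seeking backward.

```rust
/// Sketch of the documented `newlines_forward` contract (not the crate's code).
/// Starting at `offset` on logical line `line`, skips past line feeds until
/// `line_stop` is reached or the text ends. Returns `(offset, line)`.
pub fn newlines_forward(text: &[u8], offset: usize, mut line: isize, line_stop: isize) -> (usize, isize) {
    let len = text.len();
    let mut offset = offset.min(len);
    while line < line_stop {
        match text[offset..].iter().position(|&b| b == b'\n') {
            // Stop exactly past the line feed, i.e. at the start of the next line.
            Some(i) => {
                offset += i + 1;
                line += 1;
            }
            // Ran out of text: return `text.len()`, which may not be a line start.
            None => {
                offset = len;
                break;
            }
        }
    }
    (offset, line)
}

/// Sketch of the documented `newlines_backward` contract (not the crate's code).
/// Seeks backward from `offset` on line `line` to the start of line `line_stop`.
/// Even if `line == line_stop`, it still returns the start of the current line.
pub fn newlines_backward(text: &[u8], offset: usize, mut line: isize, line_stop: isize) -> (usize, isize) {
    let mut offset = offset.min(text.len());
    loop {
        // The line feed before `offset` terminates the previous line.
        match text[..offset].iter().rposition(|&b| b == b'\n') {
            // Reached the target line: its start is right past that line feed.
            Some(i) if line <= line_stop => return (i + 1, line),
            // Not there yet: step onto the previous line and keep searching.
            Some(i) => {
                offset = i;
                line -= 1;
            }
            // No line feed left: the start of the text is the start of a line.
            None => return (0, line),
        }
    }
}
```

For `b"ab\ncd\nef"`, seeking forward from offset 0 on line 0 to line 2 passes two line feeds and lands on offset 6, while seeking backward from offset 7 with `line == line_stop == 2` still rewinds to offset 6, the start of that line.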