feat(tui): syntax highlighting via syntect with theme picker (#11447)

## Summary

Adds syntax highlighting to the TUI for fenced code blocks in markdown
responses and file diffs, plus a `/theme` command with live preview and
persistent theme selection. Uses syntect (~250 grammars, 32 bundled
themes, ~1 MB binary cost) — the same engine behind `bat`, `delta`, and
`xi-editor`. Includes guardrails for large inputs, graceful fallback to
plain text, and SSH-aware clipboard integration for the `/copy` command.

<img width="1554" height="1014" alt="image"
src="https://github.com/user-attachments/assets/38737a79-8717-4715-b857-94cf1ba59b85"
/>

<img width="2354" height="1374" alt="image"
src="https://github.com/user-attachments/assets/25d30a00-c487-4af8-9cb6-63b0695a4be7"
/>

## Problem

Code blocks in the TUI (markdown responses and file diffs) render
without syntax highlighting, making it hard to scan code at a glance.
Users also have no way to pick a color theme that matches their terminal
aesthetic.

## Mental model

The highlighting system has three layers:

1. **Syntax engine** (`render::highlight`) -- a thin wrapper around
syntect + two-face. It owns a process-global `SyntaxSet` (~250 grammars)
and a `RwLock<Theme>` that can be swapped at runtime. All public entry
points accept `(code, lang)` and return ratatui `Span`/`Line` vectors or
`None` when the language is unrecognized or the input exceeds safety
guardrails.

2. **Rendering consumers** -- `markdown_render` feeds fenced code blocks
through the engine; `diff_render` highlights Add/Delete content as a
whole file and Update hunks per-hunk (preserving parser state across
hunk lines). Both callers fall back to plain unstyled text when the
engine returns `None`.

3. **Theme lifecycle** -- at startup the config's `tui.theme` is
resolved to a syntect `Theme` via `set_theme_override`. At runtime the
`/theme` picker calls `set_syntax_theme` to swap themes live; on cancel
it restores the snapshot taken at open. On confirm it persists `[tui]
theme = "..."` to config.toml.
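
The theme lifecycle above can be illustrated with a minimal sketch. It
assumes a stand-in `Theme` struct and a hypothetical
`preview_then_finish` helper; only the names `set_syntax_theme` and
`current_syntax_theme` come from this change, and their bodies here are
illustrative rather than the real implementation.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for syntect's `Theme`; only the name matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Theme {
    pub name: String,
}

/// Process-global theme slot. `None` means no theme has been set yet.
static THEME: RwLock<Option<Theme>> = RwLock::new(None);

/// Swap the active theme (what a live preview would call per selection).
pub fn set_syntax_theme(theme: Theme) {
    *THEME.write().unwrap() = Some(theme);
}

/// Clone the active theme so a picker can restore it on cancel.
pub fn current_syntax_theme() -> Option<Theme> {
    THEME.read().unwrap().clone()
}

/// Hypothetical picker flow: preview each candidate, then confirm or cancel.
pub fn preview_then_finish(candidates: &[&str], confirm: bool) -> Option<Theme> {
    // Snapshot taken when the picker opens.
    let snapshot = current_syntax_theme();
    for name in candidates {
        set_syntax_theme(Theme { name: name.to_string() });
    }
    if !confirm {
        // Cancel: put back whatever was active before the picker opened.
        *THEME.write().unwrap() = snapshot;
    }
    current_syntax_theme()
}
```

Persisting the confirmed choice to config.toml is left out of the
sketch; it only shows the snapshot-and-restore behavior.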

## Non-goals

- Inline diff highlighting (word-level change detection within a line).
- Semantic / LSP-backed highlighting.
- Theme authoring tooling; users supply standard `.tmTheme` files.

## Tradeoffs

| Decision | Upside | Downside |
| --- | --- | --- |
| syntect over tree-sitter / arborium | ~1 MB binary increase for ~250 grammars + 32 themes; battle-tested crate powering widely-used tools (`bat`, `delta`, `xi-editor`). tree-sitter would add ~12 MB for 20-30 languages or ~35 MB for full coverage. | Regex-based; less structurally accurate than tree-sitter for some languages (e.g. language injections like JS-in-HTML). |
| Global `RwLock<Theme>` | Enables live `/theme` preview without threading `Theme` through every call site | Lock contention risk (mitigated: reads vastly outnumber writes, single UI thread) |
| Skip background / italic / underline from themes | Terminal BG preserved, avoids ugly rendering on some themes | Themes that rely on these properties lose fidelity |
| Guardrails: 512 KB / 10k lines | Prevents pathological stalls on huge diffs or pastes | Very large files render without color |
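
As a rough illustration of the guardrail row, the check could look like
the sketch below. The constant and function names are hypothetical, and
whether the real limits are inclusive or exclusive is an assumption.

```rust
/// Limits quoted in the tradeoffs table (names are illustrative).
const MAX_HIGHLIGHT_BYTES: usize = 512 * 1024;
const MAX_HIGHLIGHT_LINES: usize = 10_000;

/// Returns true when the input is large enough that highlighting
/// should be skipped and the caller should render plain text instead.
pub fn exceeds_highlight_limits(code: &str) -> bool {
    code.len() > MAX_HIGHLIGHT_BYTES || code.lines().count() > MAX_HIGHLIGHT_LINES
}
```

Checking the byte length first keeps the common oversized case cheap,
since `len()` is constant time while counting lines scans the input.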

## Architecture

```
config.toml  ─[tui.theme]─>  set_theme_override()  ─>  THEME (RwLock)
                                                              │
                  ┌───────────────────────────────────────────┘
                  │
  markdown_render ─── highlight_code_to_lines(code, lang) ─> Vec<Line>
  diff_render     ─── highlight_code_to_styled_spans(code, lang) ─> Option<Vec<Vec<Span>>>
                  │
                  │   (None ⇒ plain text fallback)
                  │
  /theme picker   ─── set_syntax_theme(theme)    // live preview swap
                  ─── current_syntax_theme()      // snapshot for cancel
                  ─── resolve_theme_by_name(name) // lookup by kebab-case
```
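
The `None ⇒ plain text fallback` edge in the diagram can be sketched as
follows. `StyledSpan`, `highlight`, and `render` are hypothetical
stand-ins for the ratatui types and the real entry points; the toy
engine only recognizes `rust` so that the fallback path is easy to see.

```rust
/// Hypothetical stand-in for a styled span: text plus optional RGB foreground.
#[derive(Debug, PartialEq)]
pub struct StyledSpan {
    pub text: String,
    pub fg: Option<(u8, u8, u8)>,
}

/// Toy engine: returns `None` for any language it does not know.
pub fn highlight(code: &str, lang: &str) -> Option<Vec<Vec<StyledSpan>>> {
    if lang != "rust" {
        return None;
    }
    Some(
        code.lines()
            .map(|l| vec![StyledSpan { text: l.to_string(), fg: Some((200, 120, 50)) }])
            .collect(),
    )
}

/// Caller side: use highlighted spans when available, otherwise emit
/// one unstyled span per line.
pub fn render(code: &str, lang: &str) -> Vec<Vec<StyledSpan>> {
    highlight(code, lang).unwrap_or_else(|| {
        code.lines()
            .map(|l| vec![StyledSpan { text: l.to_string(), fg: None }])
            .collect()
    })
}
```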

Key files:

- `tui/src/render/highlight.rs` -- engine, theme management, guardrails
- `tui/src/diff_render.rs` -- syntax-aware diff line wrapping
- `tui/src/theme_picker.rs` -- `/theme` command builder
- `tui/src/bottom_pane/list_selection_view.rs` -- side content panel,
callbacks
- `core/src/config/types.rs` -- `Tui::theme` field
- `core/src/config/edit.rs` -- `syntax_theme_edit()` helper

## Observability

- `tracing::warn` when a configured theme name cannot be resolved.
- `Config::startup_warnings` surfaces the same message as a TUI banner.
- `tracing::error` when persisting theme selection fails.

## Tests

- Unit tests in `highlight.rs`: language coverage, fallback behavior,
CRLF stripping, style conversion, guardrail enforcement, theme name
mapping exhaustiveness.
- Unit tests in `diff_render.rs`: snapshot gallery at multiple terminal
sizes (80x24, 94x35, 120x40), syntax-highlighted wrapping, large-diff
guardrail, rename-to-different-extension highlighting, parser state
preservation across hunk lines.
- Unit tests in `theme_picker.rs`: preview rendering (wide + narrow),
dim overlay on deletions, subtitle truncation, cancel-restore, fallback
for unavailable configured theme.
- Unit tests in `list_selection_view.rs`: side layout geometry, stacked
fallback, buffer clearing, cancel/selection-changed callbacks.
- Integration test in `lib.rs`: theme warning uses the final
(post-resume) config.

## Cargo Deny: Unmaintained Dependency Exceptions

This PR adds two `cargo deny` advisory exceptions for transitive
dependencies pulled in by `syntect v5.3.0`:

| Advisory | Crate | Status |
|----------|-------|--------|
| RUSTSEC-2024-0320 | `yaml-rust` | Unmaintained (maintainer unreachable) |
| RUSTSEC-2025-0141 | `bincode` | Unmaintained (development ceased; v1.3.3 considered complete) |

**Why this is safe in our usage:**

- Neither advisory describes a known security vulnerability. Both are
"unmaintained" notices only.
- `bincode` is used by syntect to deserialize pre-compiled syntax sets.
These are **static vendored artifacts** baked into the binary at
build time. No user-supplied bincode data is ever deserialized.
- Attack surface is zero for both crates; exploitation would require a
supply-chain compromise of our own build artifacts.
- These exceptions can be removed when syntect migrates to `yaml-rust2`
and drops `bincode`, or when alternative crates are available upstream.
Felipe Coury
2026-02-22 01:26:58 -03:00
committed by GitHub
parent 1dad0a7f4a
commit c4f1af7a86
26 changed files with 3726 additions and 317 deletions


@@ -1,3 +1,4 @@
+use crate::render::highlight::highlight_code_to_lines;
 use crate::render::line_utils::line_to_static;
 use crate::wrapping::RtOptions;
 use crate::wrapping::adaptive_wrap_line;
@@ -99,6 +100,8 @@ where
     pending_marker_line: bool,
     in_paragraph: bool,
     in_code_block: bool,
+    code_block_lang: Option<String>,
+    code_block_buffer: String,
     wrap_width: Option<usize>,
     current_line_content: Option<Line<'static>>,
     current_initial_indent: Vec<Span<'static>>,
@@ -124,6 +127,8 @@ where
             pending_marker_line: false,
             in_paragraph: false,
             in_code_block: false,
+            code_block_lang: None,
+            code_block_buffer: String::new(),
             wrap_width,
             current_line_content: None,
             current_initial_indent: Vec::new(),
@@ -278,6 +283,16 @@ where
             self.push_line(Line::default());
         }
         self.pending_marker_line = false;
+        // When inside a fenced code block with a known language, accumulate
+        // text into the buffer for batch highlighting in end_codeblock().
+        // Append verbatim — pulldown-cmark text events already contain the
+        // original line breaks, so inserting separators would double them.
+        if self.in_code_block && self.code_block_lang.is_some() {
+            self.code_block_buffer.push_str(&text);
+            return;
+        }
         if self.in_code_block && !self.needs_newline {
             let has_content = self
                 .current_line_content
@@ -394,12 +409,25 @@ where
         self.needs_newline = false;
     }
-    fn start_codeblock(&mut self, _lang: Option<String>, indent: Option<Span<'static>>) {
+    fn start_codeblock(&mut self, lang: Option<String>, indent: Option<Span<'static>>) {
         self.flush_current_line();
         if !self.text.lines.is_empty() {
             self.push_blank_line();
         }
         self.in_code_block = true;
+        // Extract the language token from the info string. CommonMark info
+        // strings can contain metadata after the language, separated by commas,
+        // spaces, or other delimiters (e.g. "rust,no_run", "rust title=demo").
+        // Take only the first token so the syntax lookup succeeds.
+        let lang = lang
+            .as_deref()
+            .and_then(|s| s.split([',', ' ', '\t']).next())
+            .filter(|s| !s.is_empty())
+            .map(std::string::ToString::to_string);
+        self.code_block_lang = lang;
+        self.code_block_buffer.clear();
         self.indent_stack.push(IndentContext::new(
             vec![indent.unwrap_or_default()],
             None,
@@ -409,6 +437,20 @@ where
     }
     fn end_codeblock(&mut self) {
+        // If we buffered code for a known language, syntax-highlight it now.
+        if let Some(lang) = self.code_block_lang.take() {
+            let code = std::mem::take(&mut self.code_block_buffer);
+            if !code.is_empty() {
+                let highlighted = highlight_code_to_lines(&code, &lang);
+                for hl_line in highlighted {
+                    self.push_line(Line::default());
+                    for span in hl_line.spans {
+                        self.push_span(span);
+                    }
+                }
+            }
+        }
         self.needs_newline = true;
         self.in_code_block = false;
         self.indent_stack.pop();
@@ -689,4 +731,39 @@ mod tests {
             "expected full URL-like token in one rendered line, got: {lines:?}"
         );
     }
+    #[test]
+    fn fenced_code_info_string_with_metadata_highlights() {
+        // CommonMark info strings like "rust,no_run" or "rust title=demo"
+        // contain metadata after the language token. The language must be
+        // extracted (first word / comma-separated token) so highlighting works.
+        for info in &["rust,no_run", "rust no_run", "rust title=\"demo\""] {
+            let markdown = format!("```{info}\nfn main() {{}}\n```\n");
+            let rendered = render_markdown_text(&markdown);
+            let has_rgb = rendered.lines.iter().any(|line| {
+                line.spans
+                    .iter()
+                    .any(|s| matches!(s.style.fg, Some(ratatui::style::Color::Rgb(..))))
+            });
+            assert!(
+                has_rgb,
+                "info string \"{info}\" should still produce syntax highlighting"
+            );
+        }
+    }
+    #[test]
+    fn crlf_code_block_no_extra_blank_lines() {
+        // pulldown-cmark can split CRLF code blocks into multiple Text events.
+        // The buffer must concatenate them verbatim — no inserted separators.
+        let markdown = "```rust\r\nfn main() {}\r\n line2\r\n```\r\n";
+        let rendered = render_markdown_text(markdown);
+        let lines = lines_to_strings(&rendered);
+        // Should be exactly two code lines; no spurious blank line between them.
+        assert_eq!(
+            lines,
+            vec!["fn main() {}".to_string(), " line2".to_string()],
+            "CRLF code block should not produce extra blank lines: {lines:?}"
+        );
+    }
 }
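
The info-string handling added to `start_codeblock` can be tried in
isolation. The sketch below lifts the same iterator chain into a free
function; the name `language_token` is hypothetical and does not appear
in this change.

```rust
/// Extract the language token from a fenced-code info string by taking
/// the text before the first comma, space, or tab. Returns `None` when
/// there is no info string or the first token is empty.
pub fn language_token(info: Option<&str>) -> Option<String> {
    info.and_then(|s| s.split([',', ' ', '\t']).next())
        .filter(|s| !s.is_empty())
        .map(str::to_string)
}
```

With this logic, an info string that begins with a delimiter (for
example a leading space) yields an empty first token and therefore no
language, so such a block would take the unhighlighted path.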