mirror of
https://github.com/openai/codex.git
synced 2026-05-02 04:11:39 +03:00
feat(tui): syntax highlighting via syntect with theme picker (#11447)
## Summary

Adds syntax highlighting to the TUI for fenced code blocks in markdown responses and file diffs, plus a `/theme` command with live preview and persistent theme selection. Uses syntect (~250 grammars, 32 bundled themes, ~1 MB binary cost), the same engine behind `bat`, `delta`, and `xi-editor`. Includes guardrails for large inputs, graceful fallback to plain text, and SSH-aware clipboard integration for the `/copy` command.

<img width="1554" height="1014" alt="image" src="https://github.com/user-attachments/assets/38737a79-8717-4715-b857-94cf1ba59b85" />
<img width="2354" height="1374" alt="image" src="https://github.com/user-attachments/assets/25d30a00-c487-4af8-9cb6-63b0695a4be7" />

## Problem

Code blocks in the TUI (markdown responses and file diffs) render without syntax highlighting, making it hard to scan code at a glance. Users also have no way to pick a color theme that matches their terminal aesthetic.

## Mental model

The highlighting system has three layers:

1. **Syntax engine** (`render::highlight`) -- a thin wrapper around syntect + two-face. It owns a process-global `SyntaxSet` (~250 grammars) and a `RwLock<Theme>` that can be swapped at runtime. All public entry points accept `(code, lang)` and return ratatui `Span`/`Line` vectors or `None` when the language is unrecognized or the input exceeds safety guardrails.
2. **Rendering consumers** -- `markdown_render` feeds fenced code blocks through the engine; `diff_render` highlights Add/Delete content as a whole file and Update hunks per-hunk (preserving parser state across hunk lines). Both callers fall back to plain unstyled text when the engine returns `None`.
3. **Theme lifecycle** -- at startup the config's `tui.theme` is resolved to a syntect `Theme` via `set_theme_override`. At runtime the `/theme` picker calls `set_syntax_theme` to swap themes live; on cancel it restores the snapshot taken at open. On confirm it persists `[tui] theme = "..."` to config.toml.
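The theme lifecycle in layer 3 could look roughly like the sketch below. It is a minimal illustration using only the standard library, not the code from `highlight.rs`: the `Theme` struct is a hypothetical stand-in for syntect's theme type, the signatures of `current_syntax_theme` and `set_syntax_theme` are assumed, and `preview_theme` is an invented helper that compresses the picker's open / cancel / confirm flow into one call.

```rust
use std::sync::{OnceLock, RwLock};

/// Hypothetical stand-in for a syntect `Theme`; only the name matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Theme {
    pub name: String,
}

/// Process-global theme slot, lazily initialised with a default theme.
fn theme_lock() -> &'static RwLock<Theme> {
    static THEME: OnceLock<RwLock<Theme>> = OnceLock::new();
    THEME.get_or_init(|| {
        RwLock::new(Theme {
            name: "default".to_string(),
        })
    })
}

/// Snapshot the active theme (what a picker would save when it opens).
pub fn current_syntax_theme() -> Theme {
    theme_lock().read().expect("theme lock poisoned").clone()
}

/// Swap the active theme (live preview, or restore on cancel).
pub fn set_syntax_theme(theme: Theme) {
    *theme_lock().write().expect("theme lock poisoned") = theme;
}

/// Hypothetical helper: preview `candidate`, keep it on confirm, otherwise
/// restore the snapshot. Returns the theme name to persist, if any.
pub fn preview_theme(candidate: Theme, confirmed: bool) -> Option<String> {
    let snapshot = current_syntax_theme();
    set_syntax_theme(candidate.clone());
    if confirmed {
        Some(candidate.name)
    } else {
        set_syntax_theme(snapshot);
        None
    }
}
```

Readers take the lock only long enough to clone the theme, which fits the description's note that reads vastly outnumber writes.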
## Non-goals

- Inline diff highlighting (word-level change detection within a line).
- Semantic / LSP-backed highlighting.
- Theme authoring tooling; users supply standard `.tmTheme` files.

## Tradeoffs

| Decision | Upside | Downside |
| --- | --- | --- |
| syntect over tree-sitter / arborium | ~1 MB binary increase for ~250 grammars + 32 themes; battle-tested crate powering widely-used tools (`bat`, `delta`, `xi-editor`). tree-sitter would add ~12 MB for 20-30 languages or ~35 MB for full coverage. | Regex-based; less structurally accurate than tree-sitter for some languages (e.g. language injections like JS-in-HTML). |
| Global `RwLock<Theme>` | Enables live `/theme` preview without threading Theme through every call site | Lock contention risk (mitigated: reads vastly outnumber writes, single UI thread) |
| Skip background / italic / underline from themes | Terminal BG preserved, avoids ugly rendering on some themes | Themes that rely on these properties lose fidelity |
| Guardrails: 512 KB / 10k lines | Prevents pathological stalls on huge diffs or pastes | Very large files render without color |

## Architecture

```
config.toml ─[tui.theme]─> set_theme_override() ─> THEME (RwLock)
                                                        │
                ┌───────────────────────────────────────┘
                │
markdown_render ─── highlight_code_to_lines(code, lang) ─> Vec<Line>
diff_render     ─── highlight_code_to_styled_spans(code, lang) ─> Option<Vec<Vec<Span>>>
                │
                │   (None ⇒ plain text fallback)
                │
/theme picker   ─── set_syntax_theme(theme)      // live preview swap
                ─── current_syntax_theme()       // snapshot for cancel
                ─── resolve_theme_by_name(name)  // lookup by kebab-case
```

Key files:

- `tui/src/render/highlight.rs` -- engine, theme management, guardrails
- `tui/src/diff_render.rs` -- syntax-aware diff line wrapping
- `tui/src/theme_picker.rs` -- `/theme` command builder
- `tui/src/bottom_pane/list_selection_view.rs` -- side content panel, callbacks
- `core/src/config/types.rs` -- `Tui::theme` field
- `core/src/config/edit.rs` -- `syntax_theme_edit()` helper

## Observability

- `tracing::warn` when a configured theme name cannot be resolved.
- `Config::startup_warnings` surfaces the same message as a TUI banner.
- `tracing::error` when persisting theme selection fails.

## Tests

- Unit tests in `highlight.rs`: language coverage, fallback behavior, CRLF stripping, style conversion, guardrail enforcement, theme name mapping exhaustiveness.
- Unit tests in `diff_render.rs`: snapshot gallery at multiple terminal sizes (80x24, 94x35, 120x40), syntax-highlighted wrapping, large-diff guardrail, rename-to-different-extension highlighting, parser state preservation across hunk lines.
- Unit tests in `theme_picker.rs`: preview rendering (wide + narrow), dim overlay on deletions, subtitle truncation, cancel-restore, fallback for unavailable configured theme.
- Unit tests in `list_selection_view.rs`: side layout geometry, stacked fallback, buffer clearing, cancel/selection-changed callbacks.
- Integration test in `lib.rs`: theme warning uses the final (post-resume) config.

## Cargo Deny: Unmaintained Dependency Exceptions

This PR adds two `cargo deny` advisory exceptions for transitive dependencies pulled in by `syntect v5.3.0`:

| Advisory | Crate | Status |
|----------|-------|--------|
| RUSTSEC-2024-0320 | `yaml-rust` | Unmaintained (maintainer unreachable) |
| RUSTSEC-2025-0141 | `bincode` | Unmaintained (development ceased; v1.3.3 considered complete) |

**Why this is safe in our usage:**

- Neither advisory describes a known security vulnerability. Both are "unmaintained" notices only.
- `bincode` is used by syntect to deserialize pre-compiled syntax sets. These are **static vendored artifacts** baked into the binary at build time. No user-supplied bincode data is ever deserialized.
- Attack surface is zero for both crates; exploitation would require a supply-chain compromise of our own build artifacts.
- These exceptions can be removed when syntect migrates to `yaml-rust2` and drops `bincode`, or when alternative crates are available upstream.
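
The 512 KB / 10k-line guardrail listed under Tradeoffs, together with the "`None` means plain text" fallback, might be sketched as follows. This is an illustration under assumptions, not the code in `highlight.rs`: the constant and function names are hypothetical, the highlighter is passed in as a closure so the sketch stays free of external crates, and whether the real check is strict (`>`) or inclusive (`>=`) is not stated in the description.

```rust
/// Limits quoted in the PR description; the constant names are hypothetical.
const MAX_HIGHLIGHT_BYTES: usize = 512 * 1024;
const MAX_HIGHLIGHT_LINES: usize = 10_000;

/// True when the input is larger than either limit.
/// Assumption: the comparison is strict; the real check may differ.
pub fn exceeds_highlight_limits(code: &str) -> bool {
    code.len() > MAX_HIGHLIGHT_BYTES || code.lines().count() > MAX_HIGHLIGHT_LINES
}

/// Run `highlight` only when the input is within limits. If the input is too
/// large, or the highlighter returns `None` (for example an unknown
/// language), fall back to the unstyled lines.
pub fn highlight_or_plain<F>(code: &str, highlight: F) -> Vec<String>
where
    F: FnOnce(&str) -> Option<Vec<String>>,
{
    if !exceeds_highlight_limits(code) {
        if let Some(lines) = highlight(code) {
            return lines;
        }
    }
    code.lines().map(str::to_string).collect()
}
```

Checking the byte length first keeps the common case cheap, since `len()` is constant time while counting lines scans the whole input.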
@@ -1,3 +1,4 @@
+use crate::render::highlight::highlight_code_to_lines;
 use crate::render::line_utils::line_to_static;
 use crate::wrapping::RtOptions;
 use crate::wrapping::adaptive_wrap_line;
@@ -99,6 +100,8 @@ where
     pending_marker_line: bool,
     in_paragraph: bool,
     in_code_block: bool,
+    code_block_lang: Option<String>,
+    code_block_buffer: String,
     wrap_width: Option<usize>,
     current_line_content: Option<Line<'static>>,
     current_initial_indent: Vec<Span<'static>>,
@@ -124,6 +127,8 @@ where
             pending_marker_line: false,
             in_paragraph: false,
             in_code_block: false,
+            code_block_lang: None,
+            code_block_buffer: String::new(),
             wrap_width,
             current_line_content: None,
             current_initial_indent: Vec::new(),
@@ -278,6 +283,16 @@ where
             self.push_line(Line::default());
         }
         self.pending_marker_line = false;
+
+        // When inside a fenced code block with a known language, accumulate
+        // text into the buffer for batch highlighting in end_codeblock().
+        // Append verbatim — pulldown-cmark text events already contain the
+        // original line breaks, so inserting separators would double them.
+        if self.in_code_block && self.code_block_lang.is_some() {
+            self.code_block_buffer.push_str(&text);
+            return;
+        }
+
         if self.in_code_block && !self.needs_newline {
             let has_content = self
                 .current_line_content
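
The comment in the hunk above explains why text events are appended verbatim. A small standalone sketch of the difference, assuming each text event already ends with its own line break; both function names are hypothetical and exist only for comparison:

```rust
/// Concatenate text chunks verbatim, as the buffer in the diff does.
pub fn join_verbatim(chunks: &[&str]) -> String {
    chunks.concat()
}

/// The rejected alternative: inserting "\n" between chunks, which doubles
/// the line breaks when each chunk already carries its own.
pub fn join_with_newlines(chunks: &[&str]) -> String {
    chunks.join("\n")
}
```

With the chunks `"fn main() {}\n"` and `"let x = 1;\n"`, the verbatim join yields two lines while the separator join yields three, the middle one blank.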
@@ -394,12 +409,25 @@ where
         self.needs_newline = false;
     }

-    fn start_codeblock(&mut self, _lang: Option<String>, indent: Option<Span<'static>>) {
+    fn start_codeblock(&mut self, lang: Option<String>, indent: Option<Span<'static>>) {
         self.flush_current_line();
         if !self.text.lines.is_empty() {
             self.push_blank_line();
         }
         self.in_code_block = true;
+
+        // Extract the language token from the info string. CommonMark info
+        // strings can contain metadata after the language, separated by commas,
+        // spaces, or other delimiters (e.g. "rust,no_run", "rust title=demo").
+        // Take only the first token so the syntax lookup succeeds.
+        let lang = lang
+            .as_deref()
+            .and_then(|s| s.split([',', ' ', '\t']).next())
+            .filter(|s| !s.is_empty())
+            .map(std::string::ToString::to_string);
+        self.code_block_lang = lang;
+        self.code_block_buffer.clear();
+
         self.indent_stack.push(IndentContext::new(
             vec![indent.unwrap_or_default()],
             None,
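
The info-string handling in `start_codeblock` can be tried in isolation. The sketch below lifts the same `split` / `filter` chain into a free function; the name `info_string_lang` is hypothetical and does not appear in the commit.

```rust
/// Return the language token from a fenced-code info string, if any.
/// Splits on comma, space, or tab and keeps only the first piece,
/// mirroring the chain used in `start_codeblock`.
pub fn info_string_lang(info: &str) -> Option<&str> {
    info.split([',', ' ', '\t'])
        .next()
        .filter(|token| !token.is_empty())
}
```

Note that an info string beginning with a delimiter (for example `",rust"`) yields `None` under this rule, because the first token is empty; such a block would then take the unbuffered, unhighlighted path.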
@@ -409,6 +437,20 @@ where
     }

     fn end_codeblock(&mut self) {
+        // If we buffered code for a known language, syntax-highlight it now.
+        if let Some(lang) = self.code_block_lang.take() {
+            let code = std::mem::take(&mut self.code_block_buffer);
+            if !code.is_empty() {
+                let highlighted = highlight_code_to_lines(&code, &lang);
+                for hl_line in highlighted {
+                    self.push_line(Line::default());
+                    for span in hl_line.spans {
+                        self.push_span(span);
+                    }
+                }
+            }
+        }
+
         self.needs_newline = true;
         self.in_code_block = false;
         self.indent_stack.pop();
@@ -689,4 +731,39 @@ mod tests {
             "expected full URL-like token in one rendered line, got: {lines:?}"
         );
     }
+
+    #[test]
+    fn fenced_code_info_string_with_metadata_highlights() {
+        // CommonMark info strings like "rust,no_run" or "rust title=demo"
+        // contain metadata after the language token. The language must be
+        // extracted (first word / comma-separated token) so highlighting works.
+        for info in &["rust,no_run", "rust no_run", "rust title=\"demo\""] {
+            let markdown = format!("```{info}\nfn main() {{}}\n```\n");
+            let rendered = render_markdown_text(&markdown);
+            let has_rgb = rendered.lines.iter().any(|line| {
+                line.spans
+                    .iter()
+                    .any(|s| matches!(s.style.fg, Some(ratatui::style::Color::Rgb(..))))
+            });
+            assert!(
+                has_rgb,
+                "info string \"{info}\" should still produce syntax highlighting"
+            );
+        }
+    }
+
+    #[test]
+    fn crlf_code_block_no_extra_blank_lines() {
+        // pulldown-cmark can split CRLF code blocks into multiple Text events.
+        // The buffer must concatenate them verbatim — no inserted separators.
+        let markdown = "```rust\r\nfn main() {}\r\n line2\r\n```\r\n";
+        let rendered = render_markdown_text(markdown);
+        let lines = lines_to_strings(&rendered);
+        // Should be exactly two code lines; no spurious blank line between them.
+        assert_eq!(
+            lines,
+            vec!["fn main() {}".to_string(), " line2".to_string()],
+            "CRLF code block should not produce extra blank lines: {lines:?}"
+        );
+    }
 }
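
The CRLF test above expects two clean code lines from CRLF input. How `highlight.rs` strips line endings is not shown in this diff, so the following is only one possible sketch of per-line CRLF stripping; the function name `split_code_lines` is hypothetical.

```rust
/// Split `code` into lines, dropping the trailing "\n" or "\r\n" of each.
/// Assumption: this mirrors the "CRLF stripping" mentioned in the PR
/// description; the actual implementation may differ.
pub fn split_code_lines(code: &str) -> Vec<&str> {
    code.split_inclusive('\n')
        .map(|line| line.strip_suffix('\n').unwrap_or(line))
        .map(|line| line.strip_suffix('\r').unwrap_or(line))
        .collect()
}
```

Applied to the buffered text from the test, `"fn main() {}\r\n line2\r\n"`, this gives exactly the two lines the assertion expects, with no stray carriage returns left at line ends.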