Language Support

Supported File Types

Language Check extracts prose from these file formats using tree-sitter parsers:

Format	Language ID	Extensions	Parser / Strategy
Markdown	`markdown`	`.md`, `.markdown`	tree-sitter-markdown
MDX	(alias)	`.mdx`	Treated as Markdown
HTML	`html`	`.html`, `.htm`	tree-sitter-html
XHTML	(alias)	`.xhtml`	Treated as HTML
LaTeX	`latex`	`.tex`, `.latex`, `.ltx`	tree-sitter-latex
R Sweave	`sweave`	`.Rnw`, `.rnw`	R chunk preprocessing + tree-sitter-latex
reStructuredText	`rst`	`.rst`, `.rest`	tree-sitter-rst
Org mode	`org`	`.org`	tree-sitter-org (vendored)
BibTeX	`bibtex`	`.bib`	tree-sitter-bibtex
Typst	`typst`	`.typ`	tree-sitter-typst (vendored)
Forester	`forester`	`.tree`	tree-sitter-forester (vendored)
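As an illustration only, the table above can be read as a lookup from file extension to language ID. The sketch below transcribes it into a Python dictionary. The function name `language_id_for` and the exact, case-sensitive suffix matching are assumptions made for this example, not a description of how Language Check resolves files internally.

```python
from pathlib import Path
from typing import Optional

# Extension -> language ID, transcribed from the table above.
# MDX and XHTML are aliases, so they resolve to the Markdown and HTML IDs.
EXTENSION_TO_LANGUAGE = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".mdx": "markdown",
    ".html": "html",
    ".htm": "html",
    ".xhtml": "html",
    ".tex": "latex",
    ".latex": "latex",
    ".ltx": "latex",
    ".Rnw": "sweave",
    ".rnw": "sweave",
    ".rst": "rst",
    ".rest": "rst",
    ".org": "org",
    ".bib": "bibtex",
    ".typ": "typst",
    ".tree": "forester",
}


def language_id_for(path: str) -> Optional[str]:
    """Return the language ID for a path, or None if the extension is unknown.

    Assumption: matching is exact on the final suffix. The table lists both
    `.Rnw` and `.rnw`, which suggests that case matters.
    """
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix)
```

For example, `language_id_for("paper.Rnw")` yields `"sweave"`, while a path such as `main.rs` yields `None` because no built-in format claims that extension.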

Prose extraction details

Each language has a custom prose extractor that understands which parts of a document contain human-readable text:

Markdown / HTML — Uses tree-sitter query patterns to select prose nodes, skipping code blocks, front matter, and inline code.
LaTeX — Walks the AST, collecting word nodes from \begin{document} onward. Skips the preamble, math environments, verbatim/minted/algorithm blocks, and structural commands (\ref, \label, \includegraphics, etc.). Display math (\[...\]) is bridged into the surrounding prose as an exclusion zone.
R Sweave — Preprocesses R code chunks (<<...>>= through @) by blanking them with whitespace, then delegates to the LaTeX extractor.
reStructuredText — Extracts paragraph and title nodes. Skips code-block, math, raw, and similar directives. Inline literals are marked as exclusion zones.
Org mode — Extracts paragraph text and heading titles. Skips #+begin_src blocks, drawers (:PROPERTIES:), LaTeX environments, comments, and tables.
BibTeX — Extracts prose from specific fields: title, booktitle, abstract, note, annote, annotation, howpublished, and series. Other fields (author, journal, year, etc.) are ignored. LaTeX commands inside values (e.g. \emph{...}) are handled via exclusion zones.
Typst — Collects text nodes from paragraphs, headings, and list items. Skips code blocks (```), inline code (`), math (inline $...$ and block $ ... $), #code expressions, set/show rules, let bindings, imports, includes, labels, references, URLs, escapes, and comments. Inline markup (*bold*, _italic_) is bridged through.
Forester — Collects text and escape nodes, skipping math (#{...}, ##{...}), verbatim fences, wiki links, comments, and structural commands (\import, \ref, \def, etc.). Display math is bridged as an exclusion zone.
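The R Sweave step described in the list above (blanking `<<...>>=` through `@` with whitespace before handing the text to the LaTeX extractor) can be sketched as follows. This is a minimal illustration, not the project's implementation: the name `blank_r_chunks`, the line-based regular expressions, and the rule that a chunk ends at a line containing only `@` are assumptions. Presumably the point of blanking rather than deleting is that positions in the remaining text still line up with the original file, so the sketch preserves line breaks and character counts.

```python
import re

# Assumed chunk delimiters: a header line such as `<<plot, echo=FALSE>>=`
# and a terminator line containing only `@`.
CHUNK_START = re.compile(r"^<<.*>>=\s*$")
CHUNK_END = re.compile(r"^@\s*$")


def blank_r_chunks(source: str) -> str:
    """Replace R code chunks with spaces, keeping every line break.

    The result has the same number of characters and lines as the input,
    so character offsets computed on it map back to the original text.
    """
    out = []
    in_chunk = False
    for line in source.splitlines(keepends=True):
        starts = not in_chunk and CHUNK_START.match(line)
        if starts:
            in_chunk = True
        if in_chunk:
            # Blank everything except line terminators.
            out.append(re.sub(r"[^\r\n]", " ", line))
            # The header line itself can never close the chunk.
            if not starts and CHUNK_END.match(line):
                in_chunk = False
        else:
            out.append(line)
    return "".join(out)
```

After this pass the document contains only LaTeX and whitespace, so a LaTeX-only extractor can process it without knowing anything about R.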

Adding more file types

You can add support for extra file types without writing any code, in one of two ways:

map new extensions onto existing built-in language IDs, or
define regex-based Simplified Language Schema (SLS) YAML files in .langcheck/schemas/.

See the Config-Only Language Guide for both workflows, including a full schema example.

Tip

To add support for an entirely new markup language with its own tree-sitter grammar, see the Plugin Language Guide.

Checking Languages

The spell-check and grammar-check language is separate from the file type. Click the language indicator in the VS Code status bar to switch:

EN-US — American English
EN-GB — British English
DE-DE — German
FR — French
ES — Spanish

When no explicit language is set, the language can also be detected automatically using the whatlang crate.
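The precedence described here (an explicit choice wins, detection is used only when nothing is set) can be sketched as below. Everything in the sketch is hypothetical: `resolve_check_language`, the `detect` callable standing in for a detector such as whatlang, and the `fallback` used when detection yields nothing usable are assumptions for illustration, since this page does not specify that behaviour.

```python
from typing import Callable, Optional

# Language codes as listed above, with their descriptions.
CHECK_LANGUAGES = {
    "EN-US": "American English",
    "EN-GB": "British English",
    "DE-DE": "German",
    "FR": "French",
    "ES": "Spanish",
}


def resolve_check_language(
    explicit: Optional[str],
    text: str,
    detect: Callable[[str], Optional[str]],
    fallback: str = "EN-US",  # assumption: not specified by the documentation
) -> str:
    """Pick the checking language for a document.

    An explicitly selected language always wins. Otherwise the detector is
    consulted, and its answer is used only if it is one of the listed codes.
    """
    if explicit is not None:
        return explicit
    detected = detect(text)
    if detected in CHECK_LANGUAGES:
        return detected
    return fallback
```

Passing the detector in as a parameter keeps the sketch independent of any particular detection library.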