Language Support

Supported File Types

Language Check extracts prose from these file formats using tree-sitter parsers:

| Format | Language ID | Extensions | Parser / Strategy |
|---|---|---|---|
| Markdown | markdown | .md, .markdown | tree-sitter-markdown |
| MDX | (alias) | .mdx | Treated as Markdown |
| HTML | html | .html, .htm | tree-sitter-html |
| XHTML | (alias) | .xhtml | Treated as HTML |
| LaTeX | latex | .tex, .latex, .ltx | tree-sitter-latex |
| R Sweave | sweave | .Rnw, .rnw | R chunk preprocessing + tree-sitter-latex |
| reStructuredText | rst | .rst, .rest | tree-sitter-rst |
| Org mode | org | .org | tree-sitter-org (vendored) |
| BibTeX | bibtex | .bib | tree-sitter-bibtex |
| Typst | typst | .typ | tree-sitter-typst (vendored) |
| Forester | forester | .tree | tree-sitter-forester (vendored) |
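
The table above can be read as a lookup from file extension to language ID. The following is a minimal Python sketch of that lookup, not Language Check's actual code: the names `EXTENSION_TO_LANGUAGE` and `language_id_for` are hypothetical, and it assumes that the MDX and XHTML aliases resolve to the `markdown` and `html` IDs and that extensions are matched exactly as listed (which would explain why both .Rnw and .rnw appear).

```python
from pathlib import Path
from typing import Optional

# Extension -> language ID, transcribed from the table of supported file types.
# Assumption: the MDX and XHTML aliases resolve to the IDs of their base formats.
EXTENSION_TO_LANGUAGE = {
    ".md": "markdown", ".markdown": "markdown", ".mdx": "markdown",
    ".html": "html", ".htm": "html", ".xhtml": "html",
    ".tex": "latex", ".latex": "latex", ".ltx": "latex",
    ".Rnw": "sweave", ".rnw": "sweave",
    ".rst": "rst", ".rest": "rst",
    ".org": "org",
    ".bib": "bibtex",
    ".typ": "typst",
    ".tree": "forester",
}


def language_id_for(path: str) -> Optional[str]:
    """Return the language ID for a path, or None if the extension is unknown."""
    # Assumption: matching is case-sensitive and uses only the final suffix.
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix)
```

Under these assumptions, `language_id_for("notes/chapter.mdx")` returns `"markdown"`, and a path with an unlisted extension such as `main.rs` returns `None`.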

Prose extraction details

Each language has a custom prose extractor that understands which parts of a document contain human-readable text:

  • Markdown / HTML — Uses tree-sitter query patterns to select prose nodes, skipping code blocks, front matter, and inline code.

  • LaTeX — Tree-walks the AST collecting word nodes from \begin{document} onward. Skips preamble, math environments, verbatim/minted/algorithm blocks, and structural commands (\ref, \label, \includegraphics, etc.). Display math (\[...\]) bridges into surrounding prose as an exclusion zone.

  • R Sweave — Preprocesses R code chunks (<<...>>= through @) by blanking them with whitespace, then delegates to the LaTeX extractor.

  • reStructuredText — Extracts paragraph and title nodes. Skips code-block, math, raw, and similar directives. Inline literals are marked as exclusion zones.

  • Org mode — Extracts paragraph text and heading titles. Skips #+begin_src blocks, drawers (:PROPERTIES:), LaTeX environments, comments, and tables.

  • BibTeX — Extracts prose from specific fields: title, booktitle, abstract, note, annote, annotation, howpublished, and series. Other fields (author, journal, year, etc.) are ignored. LaTeX commands inside values (e.g. \emph{...}) are handled via exclusion zones.

  • Typst — Collects text nodes from paragraphs, headings, and list items. Skips code blocks (```), inline code (`), math ($...$ and $ ... $), #code expressions, set/show rules, let bindings, imports, includes, labels, references, URLs, escapes, and comments. Inline markup (*bold*, _italic_) is bridged through.

  • Forester — Collects text and escape nodes, skipping math (#{...}, ##{...}), verbatim fences, wiki links, comments, and structural commands (\import, \ref, \def, etc.). Display math bridges as an exclusion zone.
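
The R Sweave preprocessing step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's implementation: the function name `blank_r_chunks` is hypothetical, a chunk is assumed to open on a line of the form `<<options>>=` and to close on a line beginning with `@`, and the delimiter lines themselves are blanked along with the chunk body.

```python
import re

# Assumed delimiters: "<<...>>=" on its own line opens a chunk,
# and a line starting with "@" closes it.
CHUNK_OPEN = re.compile(r"^<<.*>>=\s*$")
CHUNK_CLOSE = re.compile(r"^@(\s.*)?$")


def blank_r_chunks(source: str) -> str:
    """Replace R code chunks with spaces, preserving length and line breaks."""
    out = []
    in_chunk = False
    for line in source.split("\n"):
        text = line.rstrip("\r")  # tolerate Windows line endings
        if not in_chunk and CHUNK_OPEN.match(text):
            in_chunk = True
            blank = True
        elif in_chunk:
            blank = True
            if CHUNK_CLOSE.match(text):
                in_chunk = False
        else:
            blank = False
        # Swap every character except line-break characters for a space.
        out.append(re.sub(r"[^\r\n]", " ", line) if blank else line)
    return "\n".join(out)
```

Because each blanked character becomes exactly one space and line breaks are kept, every remaining word sits at the same line and column as in the original file, so diagnostics produced by a downstream LaTeX extractor would still point at the right place.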

Adding more file types

You can add support for extra file types without writing code, in two ways:

  • map new extensions onto existing built-in language IDs, or

  • define regex-based Simplified Language Schema (SLS) YAML files in .langcheck/schemas/.

See the Config-Only Language Guide for both workflows, including a full schema example.

Tip

To add support for an entirely new markup language with its own tree-sitter grammar, see the Plugin Language Guide.

Checking Languages

The language used for spell checking and grammar checking is independent of the file type. Click the language indicator in the VS Code status bar to switch:

  • EN-US — American English

  • EN-GB — British English

  • DE-DE — German

  • FR — French

  • ES — Spanish

If no language is set explicitly, Language Check can detect it automatically using the whatlang crate.