Adding A Parser
Finding a parser
New parsers for difftastic must be reasonably complete and maintained.
There are many tree-sitter parsers available, and the tree-sitter website includes a list of some well-known parsers.
Add the source code
Tree-sitter parsers are sometimes packaged on npm, sometimes packaged on crates.io, and have different release frequencies. Difftastic uses git subtrees (not git submodules) to track parsers.
Once you've found a parser, add it as a git subtree to
vendor/
. We'll use
tree-sitter-json as
an example.
$ git subtree add --prefix=vendor/tree-sitter-json git@github.com:tree-sitter/tree-sitter-json.git master
Configure the build
Cargo does not allow packages to include subdirectories that contain a
Cargo.toml
. Add a symlink to the src/
parser subdirectory.
$ cd vendor
$ ln -s tree-sitter-json/src tree-sitter-json-src
You can now add the parser to build by including the directory in
build.rs
.
TreeSitterParser {
name: "tree-sitter-json",
src_dir: "vendor/tree-sitter-json-src",
extra_files: vec![],
},
If your parser includes custom C or C++ files for lexing (e.g. a
scanner.cc
), add them to extra_files
.
Configure parsing
Add an entry to tree_sitter_parser.rs
for your language.
Json => {
let language = unsafe { tree_sitter_json() };
TreeSitterConfig {
name: "JSON",
language,
atom_nodes: vec!["string"].into_iter().collect(),
delimiter_tokens: vec![("{", "}"), ("[", "]")],
highlight_query: ts::Query::new(
language,
include_str!("../vendor/highlights/json.scm"),
)
.unwrap(),
}
}
name
is the human-readable name shown in the UI.
atom_nodes
is a list of tree-sitter node names that should be
treated as atoms even though the nodes have children. This is common
for things like string literals or interpolated strings, where the
node might have children for the opening and closing quote.
If you don't set atom_nodes
, you may notice added/removed content
shown in white. This is usually a sign that child node should have its
parent treated as an atom.
delimiter_tokens
are delimiters that difftastic stores on
the enclosing list node. This allows difftastic to distinguish
delimiter tokens from other punctuation in the language.
If you don't set delimiter_tokens
, difftastic will consider the
tokens in isolation, and may think that a (
was added but the )
was unchanged.
You can use difft --dump-ts foo.json
to see the results of the
tree-sitter parser, and difft --dump-syntax foo.json
to confirm that
you've set atoms and delimiters correctly.
Configure language detection
Update from_extension
in guess_language.rs
to detect your new
language.
"json" => Some(Json),
There may also file names or shebangs associated with your language. GitHub's linguist definitions is a useful source of common file extension.
Syntax highlighting (Optional)
To add syntax highlighting to the package, you'll also need a symlink
to the highlights.scm
, if available.
$ cd vendor/highlights
$ ln -s ../tree-sitter-json/queries/highlights.scm json.scm
Add a regression test
Finally, add a regression test for your language. This ensures that the output for your test file doesn't change unexpectedly.
Regression test files live in sample_files/
and have the form
foo_before.abc
and foo_after.abc
.
$ nano simple_before.json
$ nano simple_after.json
Run the regression test script and update the .expected
file.
$ ./sample_files/compare_all.sh
$ cp sample_files/compare.result sample_files/compare.expected