Favorite Rust Crates for one-off scripts

I am not a professional Rust programmer. Most of my Rust code is back-scratchers, hacked together to scratch an itch. Over time, I have built up a set of crates that I like to use because they make gluing together a script easier.

These are tasks that I used to reach for Python to do. My thought process went like this: "If I want to share this project, I'd like to use a lingua franca, so other people will understand it." So I chose Python.

However, as time has moved on, the Python ecosystem looks very different from when I used Python professionally. Python packaging has forsaken the Zen of Python ethos that "There should be one-- and preferably only one --obvious way to do it." We are spoiled for choice in Python, but that means every new project feels like a research task to figure out which tools I should use to start it.

So now, I tend to reach for Rust to start a new project. Cargo gives me the definitive answer to "How do I start this project?".

Rust also gives me the type safety that I've come to love. Sure, I can bolt on something like Pydantic and mypy to give me stronger typing, but that means I am just trying to squeeze Python into a Rust-shaped hole. Why don't I just use Rust? So I do.

I've aborted too many Python projects in favor of Rust due to the complexity of trying to make Python more Rust-like. Now I skip that pain and just choose Rust.

The core toolchain of Rust (Cargo, Clippy, and rust-analyzer) gives me a set of basic, well-integrated tools that make starting a Rust project easy. No more cobbling together different tools from a wealth of choices. There is one obvious way to do it. From there the world is wide open. So what are some common crates that I use to hack together scripts to scratch an itch?

My script

Recently I wrote a script to generate an Anki deck to help me when practicing my CW Academy vocabulary.

This is a pretty typical hacky script that I wrote to accomplish one goal, quality be damned. There were .unwrap() calls everywhere. I didn't care. I cared about the result more than I cared about the quality of the code that got me there.

The goal of this script was to turn the vocabulary I'm learning in the CW Academy's beginner course into an Anki deck that drills me on the sounds of these words, callsigns, and phrases.

The best way to learn Morse code, or CW as it is called in amateur radio, is to memorize the sounds that the characters make as whole sounds rather than as dits and dahs.

Proficiency in CW means gaining what is called "Instant Character Recognition", where we hear a sound like "dah dit" and instantly think "N". Eventually, if we're lucky, this instant recognition will develop into identifying groups of sounds as whole words.
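To make the "dah dit" means "N" idea concrete, here is a tiny sketch of that lookup in Rust. The table covers only four letters, and the names sound and recognize are made up for this illustration; nothing like this is in the actual script.

```rust
/// Spoken Morse sound for a handful of letters (deliberately incomplete).
fn sound(letter: char) -> Option<&'static str> {
    match letter.to_ascii_uppercase() {
        'A' => Some("dit dah"),
        'E' => Some("dit"),
        'N' => Some("dah dit"),
        'T' => Some("dah"),
        _ => None,
    }
}

/// Map a heard sound back to its letter, searching the same small table.
fn recognize(heard: &str) -> Option<char> {
    ['A', 'E', 'N', 'T']
        .iter()
        .copied()
        .find(|&letter| sound(letter) == Some(heard))
}

fn main() {
    // Hearing "dah dit" should bring "N" to mind.
    assert_eq!(recognize("dah dit"), Some('N'));

    // Spell out TEA the slow way, one letter at a time.
    let spelled: Vec<&str> = "TEA".chars().filter_map(sound).collect();
    println!("{}", spelled.join(" / "));
}
```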

So my goal was to take a collection of words, callsigns, and phrases that are part of the CW Academy's beginner course, turn them into sound files, and use Anki to drill them with spaced repetition.

My theory is that if I use spaced repetition to learn these sounds, Anki will take care of playing the sounds I haven't quite internalized more often and the ones I have a good handle on less often, thus optimizing my learning process.

So what crates did I use to accomplish this goal?

Serde

cargo add serde serde_derive serde_json

Most often, I am taking data I've scraped from somewhere, processing it, and producing some kind of output. In the case of my Anki deck script, I took the JSON from Morse Code World's CW Academy training page, parsed it into Rust structs, and used those to generate Morse code.

Rust's answer to that problem is the powerful Serde. Serde provides a collection of annotations for Rust types that can be used by parsing crates to serialize and deserialize values to and from Rust. I was tempted to omit it here because it is so ubiquitous, but in case you're unaware, I am including it.

The JSON I was consuming is not too surprising:

[
  {
    "copying": {
      "characters": ["A"],
      "words": ["TEA"],
      "abbreviations": ["AA"],
      "phrases": ["EAT AT TEN"]
    }
  }
]

In Python, I might be tempted to just parse this into lists and dictionaries, which is fine, but this is the 21st century, and we have really powerful type systems. Serde makes it really easy to turn a bag of objects into structured data.

Is this overkill compared to lists and dictionaries? Maybe, but it is the way I enjoy coding, so this is what I do. I had to put up with spontaneous KeyError, IndexError, and NoneType errors for a decade. I'm not going back. You can pull my product and sum types from my cold dead hands.

use serde_derive::Deserialize;

#[derive(Clone, Default, Debug, PartialEq, Deserialize)]
struct Session {
    copying: Copying,
}

#[derive(Clone, Default, Debug, PartialEq, Deserialize)]
struct Copying {
    #[serde(default)]
    abbreviations: Vec<String>,

    #[serde(default)]
    callsigns: Vec<String>,

    #[serde(default)]
    characters: Vec<String>,

    #[serde(default)]
    numbers: Vec<String>,

    #[serde(default)]
    phrases: Vec<String>,

    #[serde(default)]
    words: Vec<String>,
}

Parsing this is straightforward:

use std::fs::File;

// ...

let cwops_data = File::open("./data/cwops.json")
    .expect("could not open data file");

let sessions: Vec<Session> = serde_json::from_reader(cwops_data)
    .expect("could not deserialize session data");

Rust does some type-inference magic and turns that JSON file into a vector of Session structs. The #[serde(default)] annotation tells Serde to fall back to the field type's default value when the field is missing from the JSON. For a Vec<String>, that is an empty vector. Easy peasy.
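If "the default value" sounds hand-wavy, this small sketch shows what it is. It uses only the standard library's Default trait and involves no Serde at all. The struct is a trimmed-down copy of Copying for illustration.

```rust
// A trimmed-down copy of the Copying struct, for illustration only.
#[derive(Debug, Default, PartialEq)]
struct Copying {
    characters: Vec<String>,
    words: Vec<String>,
}

fn main() {
    // The default value of a Vec<String> is an empty vector ...
    let empty: Vec<String> = Default::default();
    assert!(empty.is_empty());

    // ... so a derived Default gives every field an empty vector.
    let copying = Copying::default();
    assert!(copying.characters.is_empty());
    assert!(copying.words.is_empty());

    println!("{copying:?}");
}
```

That empty vector is what a missing "callsigns" or "numbers" key ends up as, so the rest of the script can loop over every field without checking whether it was present.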

There are parsing crates for all manner of serialization formats: JSON, YAML, even CSV. If it is a popular format, there's probably a Serde crate to process it.

subprocess

cargo add subprocess

Shelling out in professional, production code is more often than not a big ol' faux pas. However, in these kinds of hacked-together scripts, good taste be damned. Sometimes you want to just shell out to something as a proof of concept.

I suppose I could have figured out how to generate tones at a certain frequency, at a certain number of words per minute, with the spacing I wanted. That would have been a fun exercise, but it would have been a distraction from my goal.

In the case of my Anki deck script, I needed to generate sound files. I found a command called cwwav that turns the text piped into it into a .wav file. It has everything I needed, but it is a separate command. I needed a way to execute it from within Rust.

I started with Rust's std::process module, but I ran into a few issues. It didn't seem to find the cwwav command on my PATH, and feeding data to stdin is awkward. I basically wanted the equivalent of running a command in bash, piping in some input, and capturing its output. I didn't care if it blocked.
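For a sense of the ceremony involved, here is roughly what "pipe in some input, capture the output" looks like with only std::process. This is a sketch, not the code from my script: it assumes a Unix-like system, uses tr as a stand-in for cwwav, and pipe_through is a name invented for the example.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Run `program` with `args`, feed `input` to its stdin, and return its stdout.
/// Fine for small inputs; large ones would need the write to happen on a
/// separate thread so that neither pipe fills up and stalls the other.
fn pipe_through(program: &str, args: &[&str], input: &str) -> std::io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write the input, then drop the handle so the child sees end-of-file.
    {
        let mut stdin = child.stdin.take().expect("stdin was requested as piped");
        stdin.write_all(input.as_bytes())?;
    }

    // Block until the child exits and collect everything it printed.
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    // `tr` upper-cases whatever is piped into it.
    let shouted = pipe_through("tr", &["a-z", "A-Z"], "eat at ten")?;
    println!("{shouted}"); // prints: EAT AT TEN
    Ok(())
}
```

It works, but it is a fair amount of plumbing for what is a one-liner in a shell.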

The crate I found was subprocess:

use subprocess::Exec;

// ...

Exec::cmd("cwwav")
    .stdin(answer.as_str())
    .args(&[
        "-o", &wav_file.to_string_lossy(),
        &format!("--frequency={0}", options.frequency),
        &format!("--wpm={0}", options.wpm),
        &format!("--farnsworth={0}", options.farnsworth),
    ])
    .capture()
    .expect("unable to run cwwav");

This is pretty straightforward. It shells out, feeds the text to the command's stdin, blocks until the command finishes, and returns the exit status along with the output it captured. Is it elegant? Is it pretty? Nah, but it gets the job done.

logging

cargo add log env_logger

These provide a super simple way to add logging to the script:

use env_logger::Env;

env_logger::Builder::from_env(
  Env::default().default_filter_or("info")
).init();

log::info!("Encoding {answer:#?} to {wav_file:#?}");

It uses the same format strings as format!() and println!(), and env_logger lets the log level be controlled through the environment (the RUST_LOG variable by default).
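The {answer:#?} syntax captures a variable from the surrounding scope and prints it with its Debug formatting. That isn't specific to log, so a quick sketch with format!() shows what such a line renders to. The describe helper and the values are made up for the example.

```rust
use std::path::{Path, PathBuf};

/// Build the same message the log line would carry.
/// `{name:#?}` captures `name` from scope and uses pretty Debug formatting,
/// which for strings and paths means a quoted value.
fn describe(answer: &str, wav_file: &Path) -> String {
    format!("Encoding {answer:#?} to {wav_file:#?}")
}

fn main() {
    let answer = String::from("EAT AT TEN");
    let wav_file = PathBuf::from("out/eat-at-ten.wav");

    // prints: Encoding "EAT AT TEN" to "out/eat-at-ten.wav"
    println!("{}", describe(&answer, &wav_file));
}
```

The quotes around each value come from the Debug formatting, which is handy in logs because stray whitespace or an empty string is easy to spot.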

clap

cargo add clap --features derive

In the original version of this post, I suggested using structopt. Ben Pfaff on Mastodon pointed out that clap has integrated the derive feature of structopt. The structopt crate is no longer necessary to derive a struct from command line arguments, and clap can be used directly now. In fact, the structopt project has been placed into maintenance mode, so from this point on I will use clap.

The clap crate provides a derive macro to declare command line arguments using Serde-like annotations. These annotations make it very easy to describe the command line arguments and sub-commands for your command.

use clap::Parser;
use std::path::PathBuf;

// ...

#[derive(Clone, PartialEq, Debug, clap::Parser)]
#[command(version, about = "A tool to generate a custom Ankideck to CW Academy")]
struct Args {
    /// use N words per minute
    #[arg(short = 'w', long = "wpm", default_value = "25")]
    wpm: usize,

    /// use N WPM Farnsworth speed
    #[arg(short = 'F', long = "farnsworth", default_value = "15")]
    farnsworth: usize,

    /// use sidetone frequency N Hz
    #[arg(short = 'f', long = "frequency", default_value = "500")]
    frequency: usize,

    /// use this directory for building the Anki deck
    #[arg(short = 'o', long = "output", default_value = "out")]
    out_path: PathBuf,
}

// ...

let args = Args::parse();

This produces nice, consistent help text for the command. There are also a number of supplemental crates that build on top of clap to provide things such as shell completion, man page generation, etc.

$ ./target/release/ankiweb-cwacademy --help
A tool to generate a custom Ankideck to CW Academy

Usage: ankiweb-cwacademy [OPTIONS]

Options:
  -w, --wpm <WPM>                use N words per minute [default: 25]
  -F, --farnsworth <FARNSWORTH>  use N WPM Farnsworth speed [default: 15]
  -f, --frequency <FREQUENCY>    use sidetone frequency N Hz [default: 500]
  -o, --output <OUT_PATH>        use this directory for building the Anki deck [default: out]
  -h, --help                     Print help
  -V, --version                  Print version

Conclusion

That is the basic set. There are some more advanced crates that I might pick if I were making a more production-ready project. These would be thiserror, anyhow, and maybe tracing.

For one-off scripts, these are the crates I reach for every time. If you would like to see the code for the Anki deck generator, it is located at gitlab:ericcodes/ericcodes/ankiweb-cwacademy.

I will likely upload the deck to the shared deck repository after this deck has some time to bake.