
Introduction

Rust is a new language created by Graydon Hoare. It’s described as safe, concurrent, and practical. Knowing how difficult concurrency can be in C/C++, I’ve decided to give it a look.

Playing with the examples you get while googling is nice, but you cannot evaluate a language without really trying to do something new with it. The goal must be simple enough to have a chance to finish it and complex enough to do something useful with it.

Goal

The goal is to develop an application similar to pexec.

The parameters will be:
  • A folder that will be scanned recursively.
  • A TOML configuration file with
    • A regex to match the files that will be processed
    • A list of commands to execute on each of the files matched.

The files discovered will be processed using as many CPU cores as are available.

This makes a simple but usable application (for example, to post-process pictures with ImageMagick) that demonstrates the use of Rust where it should shine: concurrency.

Rust installation

Rust installation is well described on the rust-lang website. Cargo is the de facto package manager for Rust. It ships with every standard Rust installation.

The Rust book starts by explaining how to use rustc, the Rust compiler, directly, and then shows the same example using Cargo, which is the simplest way to get started.

Hello World!

Creating the project with cargo is straightforward:

$ cargo new rpexec --bin

If you use git as version control (if not, think about it twice ;-):

$ cargo new rpexec --bin --vcs git
Created binary (application) `rpexec` project

This should create the basic file structure below:

$ /usr/bin/tree -a
.
├── .git
│   ├── HEAD
│   ├── config
│   ├── description
│   ├── hooks
│   │   └── README.sample
│   ├── info
│   │   └── exclude
│   ├── objects
│   │   ├── info
│   │   └── pack
│   └── refs
│       ├── heads
│       └── tags
├── .gitignore
├── Cargo.toml
└── src
    └── main.rs

10 directories, 8 files

The example can be executed right now!

$ cargo run
   Compiling rpexec v0.1.0 (file:///X:/wp/meta/prj/rust/rpexec)
    Finished dev [unoptimized + debuginfo] target(s) in 1.1 secs
     Running `target\debug\rpexec.exe`
Hello, world!

So far, so good!

Path & PathBuf

To start, I wanted to implement a recursive function to scan all files and folders starting from a given point. Recursion can be avoided most of the time but it’s always interesting to test how the language handles it.

Ten seconds of Googling pointed me to the read_dir function of the Rust standard library. The example makes use of std::path::Path, which seems to be an operating-system-dependent structure that holds a filesystem path.

I start by adding a vector that will contain all the files discovered during the scan:

use std::path::Path;

fn main() {
    // 'files' will contain the list of files discovered.
    let mut files : Vec<Path> = Vec::new();
    println!("Hello, world!");
}

Compiling it with:

$ cargo build

Leads to the following result:

   Compiling rpexec v0.1.0 (rpexec)
error[E0277]: the trait bound `[u8]: std::marker::Sized` is not satisfied in `std::path::Path`
 --> src\main.rs:5:21
  |
5 |     let mut files : Vec<Path> = Vec::new();
  |                     ^^^^^^^^^ `[u8]` does not have a constant size known at compile-time
  |
  = help: within `std::path::Path`, the trait `std::marker::Sized` is not implemented for `[u8]`
  = note: required because it appears within the type `std::path::Path`
  = note: required by `std::vec::Vec`

error[E0277]: the trait bound `[u8]: std::marker::Sized` is not satisfied in `std::path::Path`
 --> src\main.rs:5:33
  |
5 |     let mut files : Vec<Path> = Vec::new();
  |                                 ^^^^^^^^ `[u8]` does not have a constant size known at compile-time
  |
  = help: within `std::path::Path`, the trait `std::marker::Sized` is not implemented for `[u8]`
  = note: required because it appears within the type `std::path::Path`
  = note: required by `<std::vec::Vec<T>>::new`

error: aborting due to 2 previous errors

error: Could not compile `rpexec`.

To learn more, run the command again with --verbose.

Wow! That is a lot of error output for a single added line! It seems a Vec requires its elements to have a fixed size, which is not the case for Path. This is confirmed by the Path documentation, which says:

This is an unsized type, meaning that it must always be used behind
a pointer like & or Box. For an owned version of this type, see PathBuf.

Ok, seems PathBuf is the key.

use std::path::PathBuf;

fn main() {
    // 'files' will contain the list of files discovered.
    let mut files : Vec<PathBuf> = Vec::new();
    println!("Hello, world!");
}

And the resulting compilation:

   Compiling rpexec v0.1.0 (rpexec)
warning: unused variable: `files`
 --> src\main.rs:5:9
  |
5 |     let mut files : Vec<PathBuf> = Vec::new();
  |         ^^^^^^^^^
  |
  = note: #[warn(unused_variables)] on by default
  = note: to disable this warning, consider using `_files` instead

warning: variable does not need to be mutable
 --> src\main.rs:5:9
  |
5 |     let mut files : Vec<PathBuf> = Vec::new();
  |         ^^^^^^^^^
  |
  = note: #[warn(unused_mut)] on by default

    Finished dev [unoptimized + debuginfo] target(s) in 0.77 secs

Ok, now the 2 errors have been turned into warnings:
  • The variable files is unused
  • The variable files does not need to be mutable

Try to use the vector:

use std::path::PathBuf;

fn main() {
    // 'files' will contain the list of files discovered.
    let mut files : Vec<PathBuf> = Vec::new();
    for f in files {
        println!("Discovered: {}", f);
    }
}

Gives the following:

   Compiling rpexec v0.1.0 (rpexec)
error[E0277]: the trait bound `std::path::PathBuf: std::fmt::Display` is not satisfied
 --> src\main.rs:7:36
  |
7 |         println!("Discovered: {}", f);
  |                                    ^ `std::path::PathBuf` cannot be formatted with the default formatter; try using `:?` instead if you are using a format string
  |
  = help: the trait `std::fmt::Display` is not implemented for `std::path::PathBuf`
  = note: required by `std::fmt::Display::fmt`

error: aborting due to previous error

Looking at the PathBuf documentation again:

pub fn display(&self) -> Display

Returns an object that implements Display for safely printing paths that may contain non-Unicode data.

use std::path::PathBuf;

fn main() {
    // 'files' will contain the list of files discovered.
    let mut files : Vec<PathBuf> = Vec::new();
    for f in files {
        println!("Discovered: {}", f.display());
    }
}

   Compiling rpexec v0.1.0 (rpexec)
warning: variable does not need to be mutable
 --> src\main.rs:5:9
  |
5 |     let mut files : Vec<PathBuf> = Vec::new();
  |         ^^^^^^^^^
  |
  = note: #[warn(unused_mut)] on by default

Finally, it compiles… Error messages are good, but the compiler and the type system do not give anything away for free.
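To summarize what the compiler was asking for, here is a minimal sketch (the describe function and the file names are made up for illustration). It relies on the fact that Path relates to PathBuf the way str relates to String: the borrowed form is passed around by reference, the owned form is what a Vec can store.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: takes a borrowed path. `&Path` has a known size
// even though `Path` itself does not.
fn describe(p: &Path) -> String {
    // `display()` returns a printable, possibly lossy, view of the path.
    format!("Discovered: {}", p.display())
}

fn main() {
    // A Vec needs sized elements, so it stores owned `PathBuf` values.
    let mut files: Vec<PathBuf> = Vec::new();
    files.push(PathBuf::from("src/main.rs"));
    files.push(Path::new("Cargo.toml").to_path_buf());

    // Iterating over `&files` keeps the vector usable afterwards;
    // `&PathBuf` coerces to `&Path` at the call site.
    for f in &files {
        println!("{}", describe(f));
    }
}
```

Taking &Path in function signatures keeps callers free to pass either a Path or a PathBuf.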

Conditional compilation

As I’m often jumping between Linux and Windows, it’s nice to have code that works on both systems without any manual changes. For that, conditional compilation is sometimes the simplest solution:

use std::path;

fn recursive_find(p : &path::Path, v : &Vec<path::PathBuf>) {
    if p.is_dir() {
        println!("{} is a directory", p.display());
    } else {
        println!("{} is a file", p.display());
    }
}

fn main() {
    // 'files' will contain the list of files discovered.
    let mut files : Vec<path::PathBuf> = Vec::new();
    let p;
    if cfg!(target_os = "windows") {
        p = path::Path::new("C:");
    } else {
        p = path::Path::new("/");
    }
    recursive_find(&p, &files);
    for f in files {
        println!("Discovered: {}", f.display());
    }
}

After tons of warnings from the compiler, I can see:

C: is a directory
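One detail worth knowing: cfg!(...) expands to a plain true or false, so both branches of the if are still compiled and type-checked on every platform. When a branch uses something that only exists on one system, the #[cfg(...)] attribute is the alternative, because it removes the item entirely. A minimal sketch, where default_root is a hypothetical helper and "C:\\" is my own choice of root (as far as I know, "C:" without a backslash means the current directory on drive C):

```rust
use std::path::Path;

// Only one of these two definitions is compiled, depending on the target.
#[cfg(target_os = "windows")]
fn default_root() -> &'static Path {
    Path::new("C:\\")
}

#[cfg(not(target_os = "windows"))]
fn default_root() -> &'static Path {
    Path::new("/")
}

fn main() {
    // `cfg!` yields a boolean constant; both arms must compile everywhere.
    let kind = if cfg!(target_os = "windows") { "windows" } else { "other" };
    println!("{} root: {}", kind, default_root().display());
}
```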

Mutability

Trying to add the discovered file to the vector:

fn recursive_find(p : &path::Path, v : &Vec<path::PathBuf>) {
    if p.is_dir() {
        println!("{} is a directory", p.display());
    } else {
        v.push(p);
    }
}

Gives:

error[E0308]: mismatched types
 --> src\main.rs:7:16
  |
7 |         v.push(p);
  |                ^ expected struct `std::path::PathBuf`, found reference
  |
  = note: expected type `std::path::PathBuf`
             found type `&std::path::Path`
  = help: here are some functions which might fulfill your needs:
          - .to_path_buf()

Nice! The compiler gives us the solution! Changing the line to v.push(p.to_path_buf()) gives:

error[E0596]: cannot borrow immutable borrowed content `*v` as mutable
 --> src\main.rs:7:9
  |
3 | fn recursive_find(p : &path::Path, v : &Vec<path::PathBuf>) {
  |                                        ------------------- use `&mut Vec<path::PathBuf>` here to make mutable
...
7 |         v.push(p.to_path_buf());
  |         ^ cannot borrow as mutable

Again, the compiler is clever enough to give good error messages.
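The two fixes suggested by the compiler can be seen together in a small sketch (record is a hypothetical name, not part of the program): the parameter becomes &mut Vec<PathBuf>, and the caller has to pass &mut explicitly, which makes the mutation visible at the call site.

```rust
use std::path::{Path, PathBuf};

// `&mut Vec<_>` lets the callee push; `&Vec<_>` would only allow reading.
fn record(p: &Path, v: &mut Vec<PathBuf>) {
    // Convert the borrowed path into an owned one before storing it.
    v.push(p.to_path_buf());
}

fn main() {
    // The binding itself must be `mut` to hand out a mutable borrow.
    let mut files = Vec::new();
    record(Path::new("a.txt"), &mut files);
    record(Path::new("b.txt"), &mut files);
    println!("{} files recorded", files.len());
}
```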

Reading the directory

Reading the content of the directory requires std::fs::read_dir:

pub fn read_dir<P: AsRef<Path>>(path: P) -> Result<ReadDir>

Returns an iterator over the entries within a directory.

The iterator will yield instances of io::Result<DirEntry>. New errors may be encountered after an iterator is initially constructed.

The function returns a Result which has to be handled. The simplest solution is to start by calling the panic! macro to print any issue.

use std::path;
use std::fs;

fn recursive_find(p : &path::Path, v : &mut Vec<path::PathBuf>) {
    if p.is_dir() {
        println!("{} is a directory", p.display());
        match fs::read_dir(p) {
            Err(why) => panic!("{:?}", why),
            Ok(entries) => for entry in entries {
                match entry {
                    Err(why) => panic!("{:?}", why),
                    Ok(entry) =>  recursive_find(entry.path().as_path(), v)
                }
            }
        }
    } else {
        v.push(p.to_path_buf());
    }
}

fn main() {
    // 'files' will contain the list of files discovered.
    let mut files : Vec<path::PathBuf> = Vec::new();
    let p;
    if cfg!(target_os = "windows") {
        p = path::Path::new("C:");
    } else {
        p = path::Path::new("/");
    }
    recursive_find(&p, &mut files);
    for f in files {
        println!("Discovered: {}", f.display());
    }
}

Running the result will quickly lead to a panic:

thread 'main' panicked at 'Error { repr: Os { code: 5, message: "Access is denied." } }', src\main.rs:8:24
stack backtrace:
   0: std::panicking::default_hook::{{closure}}
   1: std::panicking::default_hook
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic
   4: std::panicking::begin_panic_fmt
   5: rpexec::recursive_find
             at src/main.rs:8
   6: rpexec::recursive_find
             at src/main.rs:12
   7: rpexec::main
             at src/main.rs:30
   8: __rust_maybe_catch_panic
   9: std::rt::lang_start
  10: main

Adding Logging

As we have seen in the previous post, handling errors with panic is not always the best solution. In order to let the recursive scan give us a list of accessible files, we need to ignore the errors. Ignoring an error blindly is (almost) always a bad idea. It’s much better to log it, so that there is a means to debug it later if needed.

Adding logging in Rust is really easy. The log crate provides the facade (the logging macros: info!, warn!, debug!, …). The backend responsible for handling the log messages is separate and can be chosen depending on the needs. In this particular example, a simple logger will be sufficient. env_logger looks like a perfect fit.

Cargo & modules

To add logging support, simply add the following 2 dependencies to your Cargo.toml:

[package]
name = "rpexec"
version = "0.1.0"
authors = ["13pgeiser"]

[dependencies]
log = "0.3.8"
env_logger = "0.4.3"

In the rust code, add at the beginning:

#[macro_use] extern crate log;
extern crate env_logger;

and replace the panic! macros with warn!:

use std::path;
use std::fs;

#[macro_use] extern crate log;
extern crate env_logger;

fn recursive_find(p : &path::Path, v : &mut Vec<path::PathBuf>) {
    if p.is_dir() {
        println!("{} is a directory", p.display());
        match fs::read_dir(p) {
            Err(why) => warn!("{:?}", why),
            Ok(entries) => for entry in entries {
                match entry {
                    Err(why) => warn!("{:?}", why),
                    Ok(entry) => recursive_find(entry.path().as_path(), v)
                }
            }
        }
    } else {
        v.push(p.to_path_buf());
    }
}

fn main() {
    match env_logger::init() {
        Err(why) => panic!("{:?}", why),
        Ok(_) => ()
    }
    // 'files' will contain the list of files discovered.
    let mut files : Vec<path::PathBuf> = Vec::new();
    let p;
    if cfg!(target_os = "windows") {
        p = path::Path::new("C:");
    } else {
        p = path::Path::new("/");
    }
    recursive_find(&p, &mut files);
    for f in files {
        println!("Discovered: {}", f.display());
    }
}

Adding external dependencies proved to be much easier than I thought. Everything went flawlessly.

Cleanup

A bit of cleanup:
  • Removing println!
  • Creating a read_dir function that takes only the Path as a parameter

use std::path;
use std::fs;

#[macro_use] extern crate log;
extern crate env_logger;

fn recursive_find(p : &path::Path, v : &mut Vec<path::PathBuf>) {
    if p.is_dir() {
        match fs::read_dir(p) {
            Err(why) => warn!("{:?}", why),
            Ok(entries) => for entry in entries {
                match entry {
                    Err(why) => warn!("{:?}", why),
                    Ok(entry) => recursive_find(entry.path().as_path(), v)
                }
            }
        }
    } else {
        v.push(p.to_path_buf());
    }
}

fn read_dir(p : &path::Path) -> Vec<path::PathBuf> {
    let mut files : Vec<path::PathBuf> = Vec::new();
    recursive_find(&p, &mut files);
    files
}

fn main() {
    match env_logger::init() {
        Err(why) => panic!("{:?}", why),
        Ok(_) => ()
    }
    let p;
    if cfg!(target_os = "windows") {
        p = path::Path::new("C:");
    } else {
        p = path::Path::new("/");
    }
    let files = read_dir(&p);
    for f in files {
        println!("Discovered: {}", f.display());
    }
}
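As mentioned earlier, recursion can usually be avoided. For comparison, here is a sketch of the same scan written with an explicit stack of pending directories. It is not the version used in the rest of this post, and it makes a few assumptions of its own: find_files is a hypothetical name, errors go to stderr instead of the logger to keep the example dependency-free, and the starting point is expected to be a directory. It also uses DirEntry::file_type, which does not follow symbolic links, so a link to a directory is listed as a file rather than descended into.

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Iterative scan: directories still to visit are kept on an explicit stack.
fn find_files(root: &Path) -> Vec<PathBuf> {
    let mut files = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        let entries = match fs::read_dir(&dir) {
            Ok(entries) => entries,
            Err(why) => {
                eprintln!("skipping {}: {}", dir.display(), why);
                continue;
            }
        };
        for entry in entries {
            let entry = match entry {
                Ok(entry) => entry,
                Err(why) => {
                    eprintln!("skipping an entry of {}: {}", dir.display(), why);
                    continue;
                }
            };
            // `file_type()` reports a symlink as a symlink, not as its target.
            match entry.file_type() {
                Ok(t) if t.is_dir() => pending.push(entry.path()),
                Ok(_) => files.push(entry.path()),
                Err(why) => eprintln!("skipping {}: {}", entry.path().display(), why),
            }
        }
    }
    files
}

fn main() {
    let files = find_files(Path::new("."));
    println!("{} files found", files.len());
}
```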

Docopt

Docopt is a command-line interface description language coming from Python. A good implementation of it is available in Rust: https://docs.rs/docopt/0.8.3/docopt/

First add the required dependencies in your Cargo.toml file:

docopt = "0.8.3"
serde = "1.0"
serde_derive = "1.0"

Then add the following lines at the beginning of the program:

#[macro_use] extern crate serde_derive;
extern crate docopt;
use docopt::Docopt;

At global scope, add the USAGE string and the structure that will hold the results:

static USAGE: &'static str = "
Usage: rpexec <cfg> <folder>
       rpexec --help

Options:
    -h, --help     Show this help.
";

#[derive(Debug, Deserialize)]
struct Args {
    arg_cfg: Option<String>,
    arg_folder: Option<String>,
    flag_help: bool,
}

And in the main function, you can now access the parameters:

fn main() {
    let args: Args = Docopt::new(USAGE)
                            .and_then(|d| d.deserialize())
                            .unwrap_or_else(|e| e.exit());
    println!("{:?}", args);
    let folder = args.arg_folder.unwrap();
    ...

The final source code is the following:

use std::path;
use std::fs;

#[macro_use] extern crate log;
extern crate env_logger;

#[macro_use] extern crate serde_derive;
extern crate docopt;
use docopt::Docopt;

fn recursive_find(p : &path::Path, v : &mut Vec<path::PathBuf>) {
    if p.is_dir() {
        match fs::read_dir(p) {
            Err(why) => warn!("{:?}", why),
            Ok(entries) => for entry in entries {
                match entry {
                    Err(why) => warn!("{:?}", why),
                    Ok(entry) => recursive_find(entry.path().as_path(), v)
                }
            }
        }
    } else {
        v.push(p.to_path_buf());
    }
}

fn read_dir(p : &path::Path) -> Vec<path::PathBuf> {
    let mut files : Vec<path::PathBuf> = Vec::new();
    recursive_find(&p, &mut files);
    files
}

static USAGE: &'static str = "
Usage: rpexec <cfg> <folder>
       rpexec --help

Options:
    -h, --help     Show this help.
";

#[derive(Debug, Deserialize)]
struct Args {
    arg_cfg: Option<String>,
    arg_folder: Option<String>,
    flag_help: bool,
}

fn main() {
    let args: Args = Docopt::new(USAGE)
                            .and_then(|d| d.deserialize())
                            .unwrap_or_else(|e| e.exit());
    println!("{:?}", args);
    let folder = args.arg_folder.unwrap();
    match env_logger::init() {
        Err(why) => panic!("{:?}", why),
        Ok(_) => ()
    }
    let p = path::Path::new(&folder);
    let files = read_dir(&p);
    for f in files {
        println!("Discovered: {}", f.display());
    }
}

Again, quite easy!

The binary can now be called with arguments. From rpexec/target/debug, the program can be run as rpexec.exe cfg.toml deps, even though the configuration file does not exist yet.

rpexec\target\debug>rpexec.exe cfg.toml deps
Args { arg_cfg: Some("cfg.toml"), arg_folder: Some("deps"), flag_help: false }
Discovered: deps\libaho_corasick-d82beb3221574513.rlib
Discovered: deps\libdocopt-ec92cae87e20792b.rlib
Discovered: deps\libenv_logger-792faa4f253012f4.rlib
Discovered: deps\liblazy_static-3abe432cfe74cd43.rlib
Discovered: deps\liblibc-ad563abe9be22a0c.rlib
Discovered: deps\liblog-1a5f68b4313fd17c.rlib
Discovered: deps\libmemchr-6b9b4c395b9f85d7.rlib
Discovered: deps\libquote-17430abfe964663d.rlib
Discovered: deps\libregex-74137e40bd0b850e.rlib
Discovered: deps\libregex_syntax-7c512e09064da28c.rlib
Discovered: deps\libserde-f0a953cb6047220f.rlib
Discovered: deps\libserde_derive_internals-cc1344aee24c1045.rlib
Discovered: deps\libstrsim-d7edeb0f6e49f7ac.rlib
Discovered: deps\libsyn-075a49a5f5023de6.rlib
Discovered: deps\libsynom-391ce3b89b1a79b9.rlib
Discovered: deps\libthread_local-df6d741e06a83da6.rlib
Discovered: deps\libunicode_xid-d1282a2c4617451c.rlib
Discovered: deps\libunreachable-0d982b238a8c544f.rlib
Discovered: deps\libutf8_ranges-d5a96f4cdc58866a.rlib
Discovered: deps\libvoid-cdc19a0366cf91b4.rlib
Discovered: deps\rpexec-6b19306f9617ff91.exe
Discovered: deps\serde_derive-64247ade3c46a7ae.dll

TOML

Add the required dependencies in your Cargo.toml file:

toml = "0.4.5"

Add the following declarations:

use std::fs::File;
use std::io::Read;
extern crate toml;

Create a structure to get the content of the toml file:

#[derive(Deserialize)]
struct Config {
    regex: String,
}

And read the config file:

let cfg = args.arg_cfg.unwrap();
let mut f = File::open(cfg).expect("File not found");
let mut content = String::new();
f.read_to_string(&mut content).expect("Reading failed");
println!("Content: {}", content);
let config : Config = toml::from_str(&content).unwrap();
println!("toml: regex = {}", config.regex);
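On more recent toolchains (Rust 1.26 and later), the open / read_to_string pair can be collapsed into a single call to std::fs::read_to_string. A small sketch, assuming a hypothetical read_config_text helper that also puts the file name into the error message instead of panicking:

```rust
use std::fs;
use std::path::Path;

// Reads the whole configuration file into a String.
// On failure, the error text names the file that could not be read.
fn read_config_text(path: &Path) -> Result<String, String> {
    fs::read_to_string(path)
        .map_err(|why| format!("cannot read {}: {}", path.display(), why))
}

fn main() {
    match read_config_text(Path::new("cfg.toml")) {
        Ok(content) => println!("Content: {}", content),
        Err(why) => eprintln!("{}", why),
    }
}
```

The returned String can then be passed to toml::from_str exactly as above.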

Regex

Add the required dependencies in your Cargo.toml file:

regex = "0.2.3"

Add the following declarations:

extern crate regex;
use regex::bytes::Regex;

Create the regular expression matcher in the main function:

let re = Regex::new(&config.regex).unwrap();

Pass it to the read_dir and the recursive_find functions and change the line v.push(p.to_path_buf()); to:

match p.to_str() {
    None => warn!("Cannot handle: {}", p.display()),
    Some(s) => if re.is_match(s.as_bytes()) {
        v.push(p.to_path_buf());
    }
}

And you’re done! Now the files will be filtered using the regular expression from the TOML file given as a parameter!

Template engine

In order to create commands from the configuration file, the simplest approach is to use a template engine. Tera seems to be simple enough and well supported for my example.

Add the required dependencies in your Cargo.toml file (serde_json is required for Tera filters):

tera = "0.10.10"
serde_json = "1.0.9"

Make sure to reference this new crate in main.rs:

#[macro_use] extern crate tera;
use std::collections::HashMap;
use tera::{Result, Value, to_value};

Update the configuration structure and add 2 filters that will be used in the templates (see <https://tera.netlify.com/docs/templates/#filters> for more):

#[derive(Deserialize)]
struct Config {
    regex: String,
    cmds: Vec<String>,
}

// Tera filter: file name without its directory and without its extension.
pub fn basename(value: Value, _: HashMap<String, Value>) -> Result<Value> {
    // The first argument is the filter name reported in error messages.
    let s = try_get_value!("basename", "value", String, value);
    let p = path::Path::new(&s);
    let basename = p.file_stem().unwrap().to_str().unwrap();
    Ok(to_value(basename).unwrap())
}

// Tera filter: directory that contains the file.
pub fn parent(value: Value, _: HashMap<String, Value>) -> Result<Value> {
    let s = try_get_value!("parent", "value", String, value);
    let p = path::Path::new(&s);
    let parent = p.parent().unwrap().to_str().unwrap();
    Ok(to_value(parent).unwrap())
}

And replace the final loop with the following:

for p in files {
    match p.to_str() {
        None => warn!("Cannot handle: {}", p.display()),
        Some(s) =>  {
            for cmd in &config.cmds {
                let mut context = tera::Context::new();
                context.add("path", &s);
                let mut tmpl = tera::Tera::default();
                tmpl.register_filter("basename", basename);
                tmpl.register_filter("parent", parent);
                tmpl.add_raw_template("one_off", &cmd).expect("failed");
                match tmpl.render("one_off", &context) {
                    Err(why) => warn!("{:?}", why),
                    Ok(result) => println!("{}", result)
                }
            }
        }
    }
}

Now, we can create a command line in the cfg.toml passed as parameters:

regex = ".*\\.exe"
cmds = [
    "{{path}}",
    "{{path | parent}}/{{path | basename }}",
]

And you’re done!
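To make it clearer what the two filters return, here is a standard-library-only sketch of the same path operations. The stem_of and parent_of helpers are hypothetical and return an Option instead of calling unwrap. Note that file_stem drops the last extension, so the "basename" filter yields rpexec for rpexec.exe.

```rust
use std::path::Path;

// File name without directory and without the last extension.
fn stem_of(s: &str) -> Option<&str> {
    Path::new(s).file_stem()?.to_str()
}

// Directory part of the path; None for a root such as "/".
fn parent_of(s: &str) -> Option<&str> {
    Path::new(s).parent()?.to_str()
}

fn main() {
    for s in &["deps/rpexec.exe", "archive.tar.gz", "/"] {
        println!("{:?}: stem = {:?}, parent = {:?}", s, stem_of(s), parent_of(s));
    }
}
```

A path such as "/" has neither a stem nor a parent, which is where the unwrap calls in the filters would panic.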

Using iterators

Iterators allow a task to be performed on each element of a sequence. For more details, see the Iterator chapter of the Rust book (2nd edition) and the Wikipedia article on the Iterator pattern.

Rewriting our code to use an iterator is straightforward. The content of the final for loop is extracted in a function:

pub fn handle_path(p : &path::PathBuf, cmds : &Vec<String>) {
    match p.to_str() {
        None => {
            warn!("Cannot handle: {}", p.display());
        },
        Some(s) =>  {
            for cmd in cmds {
                let mut context = tera::Context::new();
                context.add("path", &s);
                let mut tmpl = tera::Tera::default();
                tmpl.register_filter("basename", basename);
                tmpl.register_filter("parent", parent);
                tmpl.add_raw_template("one_off", &cmd).expect("failed");
                match tmpl.render("one_off", &context) {
                    Err(why) => warn!("{:?}", why),
                    Ok(result) => println!("{}", result)
                }
            }
        }
    }
}

Then, the for loop is replaced with an iterator call:

files.iter().for_each(|p| handle_path(p, &config.cmds));

This makes the code much more readable.
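Iterators become more interesting once adapters are chained. As an illustration only (with_extension is a hypothetical helper and is not used by rpexec, which filters with a regex), here is a sketch that selects paths by extension with filter, map and collect:

```rust
use std::path::{Path, PathBuf};

// Keeps only the paths whose extension equals `ext`, borrowing from `files`.
fn with_extension<'a>(files: &'a [PathBuf], ext: &str) -> Vec<&'a Path> {
    files
        .iter()
        // A path without extension yields None and is filtered out.
        .filter(|p| p.extension().map_or(false, |e| e == ext))
        .map(|p| p.as_path())
        .collect()
}

fn main() {
    let files = vec![
        PathBuf::from("a.exe"),
        PathBuf::from("b.txt"),
        PathBuf::from("c.exe"),
    ];
    // Same shape as the `for_each` call used in this post.
    with_extension(&files, "exe")
        .iter()
        .for_each(|p| println!("Selected: {}", p.display()));
}
```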

Executing the resulting commands

Calling an external program is achieved using std::process::Command:

use std::process::Command;

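The std::process::Command API expects each argument to be passed one by one to the .arg() method. The simplest way to avoid argument processing is to pass the whole command line to the shell. With the function below, handle_path can call run_command(result) instead of printing the rendered template: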

pub fn run_command(cmdline : String) {
    // println!("\nexecuting: {}", cmdline);
    let output;
    if cfg!(target_os = "windows") {
        output = Command::new("cmd.exe").args(&["/C", &cmdline]).output().unwrap();
    } else {
        output = Command::new("sh").args(&["-c", &cmdline]).output().unwrap();
    }
    let stdout = String::from_utf8_lossy(&output.stdout);
    if stdout.len() > 0 {
        print!("{}", stdout);
    }
    let stderr = String::from_utf8_lossy(&output.stderr);
    if stderr.len() > 0 {
        eprint!("{}", stderr);
    }
}
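The function above unwraps the result, so a shell that fails to start aborts the whole run, and a non-zero exit code goes unnoticed. A possible variant, sketched under my own assumptions (run_shell is a hypothetical name), returns the io::Result to the caller and checks the exit status. Keep in mind that file names containing spaces or quotes are interpreted by the shell, so templates may need to quote {{path}}.

```rust
use std::io;
use std::process::{Command, Output};

// Runs `cmdline` through the platform shell and returns the captured output.
fn run_shell(cmdline: &str) -> io::Result<Output> {
    if cfg!(target_os = "windows") {
        Command::new("cmd.exe").args(&["/C", cmdline]).output()
    } else {
        Command::new("sh").args(&["-c", cmdline]).output()
    }
}

fn main() {
    match run_shell("echo hello") {
        Ok(out) => {
            print!("{}", String::from_utf8_lossy(&out.stdout));
            eprint!("{}", String::from_utf8_lossy(&out.stderr));
            // The shell started, but the command itself may still have failed.
            if !out.status.success() {
                eprintln!("command failed: {}", out.status);
            }
        }
        Err(why) => eprintln!("cannot start shell: {}", why),
    }
}
```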

Going parallel with Rayon!

Rayon is a data-parallelism library for Rust. It implements parallel iterators that spread the work over the available cores.

Enabling parallel processing in this small application is as easy as this:

Update the Cargo.toml file:

rayon = "0.9.0"

Use this new crate:

extern crate rayon;
use rayon::prelude::*;

And finally replace the iterator by a parallel iterator:

// files.iter().for_each(|p| handle_path(p, &config.cmds));
files.par_iter().for_each(|p| handle_path(p, &config.cmds));
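To get a feeling for what par_iter saves us from writing, here is a rough standard-library-only sketch of the same idea. It is not how Rayon works internally (Rayon balances the load with work stealing, whereas this sketch statically splits the slice into one chunk per core), and par_for_each is a hypothetical helper. It assumes a recent toolchain, since std::thread::scope needs Rust 1.63 or later.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Calls `f` on every item, using one thread per chunk of the slice.
fn par_for_each<T, F>(items: &[T], f: F)
where
    T: Sync,
    F: Fn(&T) + Sync,
{
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((items.len() + workers - 1) / workers).max(1);
    // Scoped threads may borrow `items` and `f`; all are joined before `scope` returns.
    thread::scope(|s| {
        for chunk in items.chunks(chunk_len) {
            let f = &f;
            s.spawn(move || {
                for item in chunk {
                    f(item);
                }
            });
        }
    });
}

fn main() {
    let processed = AtomicUsize::new(0);
    let files: Vec<String> = (0..20).map(|i| format!("file{}.png", i)).collect();
    par_for_each(&files, |_name| {
        // A real program would run the commands for this file here.
        processed.fetch_add(1, Ordering::SeqCst);
    });
    println!("{} files processed", processed.load(Ordering::SeqCst));
}
```

With Rayon, all of this collapses into the single par_iter() call shown above.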

Wow! That was really easy! Enjoy!