Recoverability / Repeatability

Can execution resume from halfway.

Workflow Steps

Maturity Levels

🧹 Need manual clean up, then restart from scratch.

Usually takes effort to:

  • Find out what needs cleaning up.
  • Figure out how to clean up, and do it.

This will happen again.

🔁 Restart from scratch.

Tooling can automatically clean up for you.

Maybe you want to keep the failing environment around for investigation.

♻️ Reuse and recycle

  • If it's already done, we won't do it again.

  • Replace / update existing resources.

  • Dependency updates must be propagated.

    • Re-download file if it changed.
    • Restart the web application if configuration changed.

This means if we fail at 90% / 2 hours into the process, we can restart at that point without waiting another 2 hours.

API Implications

  • Implementors should provide a "check" function, and a "do it" function.
  • The check function checks if the task is in the desired state.
  • If not, the do it function is called.

choochoo:

  • Requires a visit_fn ("do it" function).
  • check_fn is run if present, otherwise it always runs the visit_fn.
  • Bonus: The check function is run after the visit function to detect if the logic is correct.

Live Demo

  • Run it twice.
  • Replay last 2 steps.
  • Update the file.
time ./target/release/examples/demo
rm -rf /tmp/choochoo/demo/station_{g,h}
  • 514 KB, 1 second

    for i in {1..200000}; do printf "application contents $i"; done | gzip -f > app.zip
    
  • 5 MB, 7 seconds

    for i in {1..2000000}; do printf "application contents $i"; done | gzip -f > app.zip