bash · rust · refactoring · software-engineering · cli

When Your Shell Script Outgrows Its "Shell"

4/3/2026

6 minute read

The Scoring Algorithm That Broke the Spell

There is a moment in every project where you stop and look at what you have written. Not with the satisfaction of shipping, but with the quiet recognition that the tool you chose has become the problem you are solving.

For me, that moment was writing a professor curve in awk. A function that calculates which AI model is the best coder by weighting nine benchmark axes (correctness, complexity, code quality, efficiency, stability, edge cases, debugging, format, safety), raising each to a power of 1.6, applying a curved grading scale, and penalizing models that fall below quality gates.

```bash
calc_coding_score() {
  local corr="$1" cmplx="$2" cq="$3" eff="$4" stab="$5" edge="$6" dbg="$7" fmt="$8" safe="$9"
  awk -v corr="$corr" -v cmplx="$cmplx" -v cq="$cq" -v eff="$eff" \
    -v stab="$stab" -v edge="$edge" -v dbg="$dbg" -v fmt="$fmt" -v safe="$safe" '
    function pw(base, ex) {
      if (base <= 0) return 0
      return exp(log(base) * ex)
    }
    BEGIN {
      w = pw(corr,1.6)*0.40 + pw(cmplx,1.6)*0.20 + pw(cq,1.6)*0.15 + \
          pw(eff,1.6)*0.05 + pw(stab,1.6)*0.10 + pw(edge,1.6)*0.03 + \
          pw(dbg,1.6)*0.03 + pw(fmt,1.6)*0.02 + pw(safe,1.6)*0.02
      curved = pw(w * 100 / 100, 1.3) * 100
      # ... quality gate penalties ...
    }'
}
```

Nine parameters passed as positional arguments to a bash function that immediately shells out to awk because bash cannot do floating-point math. The awk block implements power functions, weighted sums, a professor curve, and quality gate penalties. It works. It is correct. And it is absurd.

This function lives in AST, a CLI tool I built to track AI model performance from the terminal. The initial scope was focused: fetch model data from an API, format it into a clean dashboard, print it. Bash was the obvious choice. curl for HTTP, jq for JSON, printf for formatting. Three dependencies, zero build step, zero compilation. I had a shipping version in an afternoon.

That was about 600 lines ago. AST is now 824.

What 824 Lines of Bash Looks Like

It is not the line count that matters. Plenty of bash scripts are longer and perfectly fine. What matters is what those lines are doing.

AST now has a Unicode TUI with box-drawing characters and semantic colors. It has a watch mode that clears the screen and redraws every N seconds with rank change arrows. It has parallel HTTP fetches using background processes and wait. It has a braille loading spinner. It has multi-provider support, section filtering, JSON output, and a Homebrew tap. And it has a scoring algorithm that reproduces a third-party formula using awk because the language it is written in cannot multiply two decimals.
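The parallel fetches are worth seeing concretely, because the pattern is pure bash idiom: fan each request out as a background process, then wait for all of them. A minimal sketch, with sleep and echo standing in for the real curl calls and made-up provider names:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the background-process fan-out described above.
# `sleep` + `echo` stand in for the real `curl` fetches, and the
# provider names are placeholders, not AST's actual provider list.
tmpdir=$(mktemp -d)

for provider in alpha beta gamma; do
  {
    sleep 0.1                                    # simulated network latency
    echo "data-$provider" > "$tmpdir/$provider.json"
  } &
done

wait   # block until every background fetch has finished

result=$(cat "$tmpdir"/*.json)   # glob order: alpha, beta, gamma
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The catch is everything the sketch leaves out: collecting per-process exit codes, timeouts, and partial failures all have to be hand-rolled on top of `wait`.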

Each of these features was a reasonable addition at the time. Multi-provider support? Add a case statement and a variable. Watch mode? A while loop with sleep. Section filtering? A dispatch table of render functions. Each one made sense in isolation. Together, they produced a script where the complexity exceeds what the language was designed to manage.
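The dispatch table of render functions looks roughly like this. A sketch with made-up section names and trivial renderers, assuming bash 4+ for associative arrays:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative renderers; AST's real ones draw the boxed TUI sections.
render_header() { printf 'HEADER\n'; }
render_scores() { printf 'SCORES\n'; }

# Dispatch table mapping section name -> render function.
# Associative arrays need bash 4+, already newer than the 3.2 macOS ships.
declare -A renderers=(
  [header]=render_header
  [scores]=render_scores
)

# "--sections header,scores"-style filtering: call each requested
# section's renderer and skip names with no entry in the table.
sections="header,scores,bogus"
output=""
IFS=',' read -ra requested <<< "$sections"
for section in "${requested[@]}"; do
  fn="${renderers[$section]:-}"
  if [ -n "$fn" ]; then
    output+="$("$fn")"$'\n'
  fi
done
printf '%s' "$output"
```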

The signals are specific. Error handling is manual: every curl call needs explicit exit code checking, every jq pipeline can fail silently if a field is missing, and set -euo pipefail catches "some" failures, which is not a property you want in an error handling strategy.

Testing is another. A function that outputs ANSI escape sequences through printf is a different experience to test than one that returns a typed value. Renaming a variable in an 824-line bash script offers no compiler feedback. ShellCheck catches syntax issues, but it cannot tell you that your refactored function now returns a string where a number was expected, because bash does not have that concept.

And the architecture wants to be something bash cannot express: section renderers, a dispatch loop, data extraction pipelines, parallel I/O, all built as functions sharing global variables because bash has no modules, no return types, and no data structures beyond arrays and strings.
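The set -euo pipefail gap is easy to demonstrate. In this contrived sketch (false stands in for a failing fetch, and demo is a made-up function), the failure inside the command substitution never trips set -e, because local reports its own exit status rather than the substituted command's:

```bash
#!/usr/bin/env bash
set -euo pipefail

# `false` stands in for a failing `curl`/`jq` call; this is a
# contrived sketch, not AST's actual code.
demo() {
  # `local x=$(cmd)` reports the exit status of `local` itself,
  # which is success, so `set -e` never sees the failure of `cmd`.
  local body=$(false)
  echo "still running, body='${body}'"
}
out=$(demo)
# Execution reaches this point even though the "fetch" failed.
# Splitting declaration and assignment (`local body; body=$(...)`)
# is the standard fix, but nothing forces you to remember it.
```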

The Pillars Nobody Ranks

Software development has pillars. Not principles. Pillars. The structural elements that hold a project up over time. Testing. Maintainability. Readability. Reliability. And one that developers consistently treat as optional: refactoring.

Testing gets its own frameworks, its own conferences, its own certification tracks. Maintainability gets discussed in code reviews. Readability has style guides. Reliability has SLAs. Refactoring gets a Jira ticket that moves to the next sprint. Then the next. Then the backlog. Then nowhere.

This is backwards. Refactoring is not cleanup. It is not technical debt repayment. It is not the thing you do when you have "spare time," a concept that does not exist in software. Refactoring is the mechanism by which code stays aligned with what it actually does. As requirements grow, the original structure stops fitting. Refactoring is how you reshape the structure to match the new reality.

When you skip it, the code does not stop working. It stops being honest. The function names describe what they used to do. The architecture reflects decisions that no longer apply. The language itself, in my case bash, reflects a scope that no longer exists.

Why Rust

The rewrite target is Rust. Not because Rust is trendy (though it is), but because the specific problems AST has are the specific problems Rust solves. The scoring function makes this concrete.

Here is the bash version again: nine positional arguments, handed off to awk because bash cannot multiply decimals:

```bash
calc_coding_score() {
  local corr="$1" cmplx="$2" cq="$3" eff="$4" stab="$5" edge="$6" dbg="$7" fmt="$8" safe="$9"
  awk -v corr="$corr" -v cmplx="$cmplx" ... '
    function pw(base, ex) { return exp(log(base) * ex) }
    BEGIN {
      w = pw(corr,1.6)*0.40 + pw(cmplx,1.6)*0.20 + pw(cq,1.6)*0.15 + ...
      curved = pw(w * 100 / 100, 1.3) * 100
    }'
}
```

Here is what the same function looks like in Rust:

```rust
fn calc_coding_score(axes: &BenchmarkAxes) -> f64 {
    let weights = [
        (axes.correctness, 0.40), (axes.complexity, 0.20),
        (axes.code_quality, 0.15), (axes.efficiency, 0.05),
        (axes.stability, 0.10), (axes.edge_cases, 0.03),
        (axes.debugging, 0.03), (axes.format, 0.02),
        (axes.safety, 0.02),
    ];

    let base: f64 = weights.iter()
        .map(|(score, weight)| score.powf(1.6) * weight)
        .sum::<f64>() * 100.0;

    let mut curved = (base / 100.0).powf(1.3) * 100.0;

    if axes.correctness < 0.90 { curved -= 8.0 }
    if axes.correctness < 0.70 { curved -= 10.0 }
    if axes.correctness < 0.50 { curved -= 15.0 }
    if axes.code_quality < 0.60 { curved -= 8.0 }
    if axes.code_quality < 0.40 { curved -= 15.0 }
    if axes.complexity < 0.30 { curved -= 12.0 }

    curved
}
```

No subprocess. No string-to-number coercion. No positional arguments where the ninth parameter is $9 and you hope you counted right. The types are explicit. The struct field names document what each axis means. The quality gate penalties are visible instead of buried in an awk block. And if you rename code_quality to quality, the compiler tells you every call site that needs updating, a kind of feedback that simply does not exist in bash.

This is not about Rust being a better language in the abstract. It is about the specific gap between what AST needs and what bash provides. The scoring function is one example. The same gap shows up in error handling (Result<T, E> vs. silent empty strings from failed jq pipelines), in architecture (modules with defined interfaces vs. functions sharing global variables), and in distribution (a compiled binary vs. a script that assumes macOS Sequoia).
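A sketch of what the error handling side of that gap looks like in practice, using a hypothetical helper rather than AST's actual code: where a failed jq pipeline hands bash an empty string that flows silently into the math, Rust hands the caller a Result that must be dealt with at the call site:

```rust
// Hypothetical sketch, not AST's actual code: extracting one
// benchmark axis from an API response. A missing field or a
// malformed number is a value the caller must handle, not an
// empty string that keeps flowing downstream.
fn parse_axis(raw: Option<&str>) -> Result<f64, String> {
    let text = raw.ok_or_else(|| "axis missing from response".to_string())?;
    text.trim()
        .parse::<f64>()
        .map_err(|err| format!("axis is not a number: {err}"))
}

fn main() {
    // The happy path and both failure modes are explicit here.
    assert_eq!(parse_axis(Some("0.92")), Ok(0.92));
    assert!(parse_axis(None).is_err());
    assert!(parse_axis(Some("n/a")).is_err());
    println!("all parse_axis checks passed");
}
```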

Refactoring as a Pillar

AST works. It has a CI pipeline, a test suite, a Homebrew tap, and users. Nothing is broken. But the structure that made sense at 200 lines does not make sense at 824, and it will make less sense at 1,500. The responsible thing is not to keep patching the bash script until it collapses under its own weight. The responsible thing is to recognize that the project outgrew its original form and reshape it.

That is refactoring. Not maintenance. Maintenance fixes what is broken. Refactoring reshapes what works but no longer fits. And it deserves to stand next to testing, readability, and reliability as a structural pillar, not sit in the backlog as a chore you will get to when there is "spare time."

Why Bash Was Still Right

Bash was the right choice when I started AST. It gave me a production CLI in an afternoon with zero dependencies beyond what macOS ships. I shipped it, got it into Homebrew, built a user base, and learned what the tool needed to become by running it in the real world. None of that would have been faster in Rust. The compile times alone would have slowed the early iteration where I was changing the output format three times a day.

This is the part that gets lost in language debates. The right tool changes as the project changes. Starting in bash was not a mistake that needs correcting. It was an implementation that served its purpose and is now being succeeded by the next one.

This pattern plays out at every scale. The TypeScript compiler was written in TypeScript for over a decade. It worked. It shipped. It powered an ecosystem used by millions. Then in 2025, Microsoft announced a rewrite in Go, not because TypeScript failed, but because the scale of the ecosystem outgrew what a self-hosted compiler could deliver in build performance. Ten years of serving its purpose, then a language change when the requirements demanded it. Ryan Dahl built Node.js in C++, ran it for years, then started over with Deno in Rust because he understood what the runtime actually needed after living with the first version.

Neither rewrite was an admission of failure. Both were the natural next step after an implementation proved what the project needed to become. The bash version taught me what the Rust version needs to be. That is what iteration is for.

If you are looking at a project and thinking "this works but it is getting harder to change," that is not a sign you built it wrong. That is a sign you built it well enough to find its limits. You now have something more valuable than a clean codebase: the knowledge of what the project actually requires.

Iterate. Reshape. Refactor. There is no done.