Pipelines and filters

Compose small tools into an evidence-producing report.

sort | uniq -c · 20 min read · 30 min lab · foundation

A pipe connects stdout from one command to stdin of the next. Each stage should have a simple job: select, transform, count, sort, or display. Pipelines are readable when each stage answers one question.

In the field

You need to count repeated error types in a copied log.

Worked command

$ grep "ERROR" app.log | cut -d' ' -f5 | sort | uniq -c | sort -nr
$ find . -type f -name '*.log' -print0 | xargs -0 grep -n "timeout"

Anti-pattern

Do not build a long pipeline before testing each stage.

Safer pattern

Run the first stage, inspect output, then add one filter at a time.
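One way to practice the safer pattern is sketched below. The log contents and the choice of field 5 are assumptions made up for illustration; your real log will likely have a different layout, so check the `cut` stage against your own data before trusting the counts.

```shell
# Hypothetical sample log: the 5th space-separated field holds the error type.
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-01 10:00:01 ERROR db timeout
2024-01-01 10:00:02 INFO web started
2024-01-01 10:00:03 ERROR db refused
2024-01-01 10:00:04 ERROR api timeout
EOF

# Stage 1: select. Confirm that only ERROR lines survive.
grep "ERROR" "$log"

# Stage 2: transform. Confirm that field 5 is the value you want to count.
grep "ERROR" "$log" | cut -d' ' -f5

# Stages 3-5: group, count, rank. Add these only after stage 2 looks right.
report=$(grep "ERROR" "$log" | cut -d' ' -f5 | sort | uniq -c | sort -nr)
printf '%s\n' "$report"

rm -f "$log"
```

Each intermediate run answers one question, so when the final counts look wrong you already know which stage to suspect.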

Knowledge check

In `... | sort | uniq -c`, why must `sort` come before `uniq -c`?

  • A. uniq -c sorts internally, so sort just makes the output prettier
  • B. uniq only collapses adjacent duplicate lines, so identical lines must be grouped first
  • C. Without sort, uniq -c counts each line as unique because duplicates aren't adjacent
  • D. sort is needed so uniq can read stdin instead of a file argument

Correct: B. uniq only collapses adjacent duplicate lines, so identical lines must be grouped first

Why

`uniq` compares each line only to its immediate neighbor, so duplicates scattered through the stream are never merged unless `sort` groups them first. The tempting wrong answer assumes `uniq` sorts on its own — it does not; it just deduplicates adjacent lines.
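The adjacency rule is easy to see on a three-line input. This is a minimal sketch with made-up values; `awk` appears only to strip the padding that `uniq -c` puts in front of each count, which varies between implementations.

```shell
# Unsorted input: the two "timeout" lines are not adjacent.
input=$(printf 'timeout\nrefused\ntimeout\n')

# Without sort: uniq -c sees three separate runs, each of length 1.
unsorted=$(printf '%s\n' "$input" | uniq -c | awk '{print $1, $2}')

# With sort: identical lines become neighbors, so they collapse into one count.
sorted=$(printf '%s\n' "$input" | sort | uniq -c | awk '{print $1, $2}')

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted" "$sorted"
```

Without `sort`, "timeout" is reported twice with a count of 1 each; with `sort`, it is reported once with a count of 2.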

Practice checklist

  1. Create a small log sample.
  2. Count repeated terms with `sort | uniq -c`.
  3. Repeat with a null-safe `find | xargs` pipeline.
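For step 3, a scratch directory makes it safe to experiment. The sketch below assumes hypothetical file names, including one with a space, to show why the NUL-delimited form matters. It relies on `find -print0` and `xargs -0`, which GNU and BSD tools provide.

```shell
# Scratch directory with a hypothetical log whose name contains a space.
dir=$(mktemp -d)
printf 'ok\nconnect timeout\n' > "$dir/app one.log"
printf 'timeout again\n' > "$dir/b.log"
printf 'timeout in a non-log file\n' > "$dir/notes.txt"

# -print0 and -0 separate names with NUL bytes, so "app one.log" stays one argument.
# The trailing sort only makes the output order predictable.
matches=$(cd "$dir" && find . -type f -name '*.log' -print0 | xargs -0 grep -n "timeout" | sort)
printf '%s\n' "$matches"

rm -rf "$dir"
```

If you drop `-print0` and `-0`, `xargs` splits "app one.log" at the space and `grep` reports two files that do not exist.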

Deliverable evidence

  • A four-stage pipeline with a one-line explanation for each stage.
Teaching diagram · ch03 · mental model

[Diagram: three streams, redirected separately. stdin (fd 0) feeds the command (grep / sort); stdout (fd 1) goes to `> file`; stderr (fd 2) goes to `2> file`. stdout pipes to the next stage while stderr stays on screen. `>` redirects fd 1 only; fd 2 needs its own `2>`.]

shows: A command reads stdin (fd0) and writes two independent outputs — stdout (fd1) and stderr (fd2) — each redirectable on its own, so `>` captures data while `2>` captures diagnostics.

does not prove: It shows the wiring of the three streams, not that any given command actually sends errors to stderr — a few tools misroute diagnostics to stdout, so you still confirm by inspecting both targets.
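You can confirm the wiring for a specific command by redirecting each stream to its own target and inspecting both. The sketch below assumes a hypothetical `missing.log` that does not exist, so `grep` has something to complain about.

```shell
dir=$(mktemp -d)
printf 'ERROR timeout\n' > "$dir/app.log"

# grep writes the match to stdout and the missing-file complaint to stderr.
# > captures fd 1 only; 2> captures fd 2 separately.
# grep exits non-zero because one file is missing, hence the "|| true".
grep "ERROR" "$dir/app.log" "$dir/missing.log" > "$dir/out.txt" 2> "$dir/err.txt" || true

# Only stdout enters the pipe. stderr is redirected here just to keep the demo quiet.
count=$(grep "ERROR" "$dir/app.log" "$dir/missing.log" 2> /dev/null | wc -l | tr -d ' ')

out=$(cat "$dir/out.txt")
err=$(cat "$dir/err.txt")
printf 'stdout captured: %s\n' "$out"
printf 'stderr captured: %s\n' "$err"
printf 'lines through the pipe: %s\n' "$count"

rm -rf "$dir"
```

The match lands in `out.txt`, the diagnostic lands in `err.txt`, and `wc -l` counts only the one line that travelled through the pipe.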

Memorize this

Commit these to memory, then drill them until recall is automatic.

pipe · sort · uniq -c · xargs -0 · cut

Recall practice · Meaning -> command

Cue: Count how many times each distinct line appears, ordered most-frequent first

sort | uniq -c | sort -nr