Pipelines and filters
Compose small tools into an evidence-producing report.
sort | uniq -c
20 min read, 30 min lab
foundation
A pipe connects stdout from one command to stdin of the next. Each stage should have a simple job: select, transform, count, sort, or display. Pipelines are readable when each stage answers one question.
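As a minimal sketch, the pipeline below gives every stage exactly one of those jobs. The sample lines and the `top_error` function name are made up for illustration. `tr` and `head` are ordinary tools picked here for the transform and display roles, not ones the lesson prescribes.

```shell
#!/bin/sh
# Hypothetical sample input: made-up log lines in "LEVEL message" form.
sample='INFO start
ERROR Disk full
ERROR disk full
WARN slow
ERROR Timeout'

# Each line of the function is one stage; the comment names its role.
top_error() {
  grep 'ERROR' |                  # select: keep only ERROR lines
    cut -d' ' -f2- |              # transform: drop the level field
    tr '[:upper:]' '[:lower:]' |  # transform: fold case so "Disk" matches "disk"
    sort |                        # sort: group identical messages together
    uniq -c |                     # count: one "COUNT message" line per group
    sort -nr |                    # sort: highest count first
    head -n 1                     # display: keep only the top entry
}

printf '%s\n' "$sample" | top_error
```

Changing `head -n 1` to `head -n 5` touches only the display stage and leaves the other stages alone. That independence is what keeping one job per stage buys you.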
You need to count repeated error types in a copied log.
Worked command
$ grep "ERROR" app.log | cut -d' ' -f5 | sort | uniq -c | sort -nr
$ find . -type f -name '*.log' -print0 | xargs -0 grep -n "timeout"
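Here is a sketch that runs the first worked command on a throwaway log, so you can see the evidence it produces. The log layout is an assumption: date, time, level, component, then the error type as field 5. It was chosen so that `cut -d' ' -f5` lands on the error type. The `count_error_types` name is hypothetical.

```shell
#!/bin/sh
# Assumed (hypothetical) app.log layout:
#   DATE TIME LEVEL COMPONENT ERROR_TYPE details...
# With single-space delimiters, field 5 is the error type.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/app.log" <<'EOF'
2024-05-01 10:00:01 ERROR db Timeout query took too long
2024-05-01 10:00:02 INFO web Request served
2024-05-01 10:00:03 ERROR db Timeout retry failed
2024-05-01 10:00:04 ERROR auth BadToken rejected
2024-05-01 10:00:05 ERROR db Timeout pool exhausted
EOF

count_error_types() {
  grep "ERROR" "$1" |   # select: only lines containing ERROR
    cut -d' ' -f5 |     # transform: keep field 5, the error type
    sort |              # group identical types next to each other
    uniq -c |           # count each group
    sort -nr            # most frequent first
}

count_error_types "$workdir/app.log"
```

`cut -d' '` treats every single space as a delimiter, so a log that pads columns with runs of spaces shifts the field numbers. Check field 5 against your own log before trusting the counts.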
Do not build a long pipeline before testing each stage.
Run the first stage, inspect output, then add one filter at a time.
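The stage-by-stage habit can be sketched like this: each step keeps the previous result, adds one filter, and gets inspected before you move on. The sample log and the `step1`, `step2`, and `step3` variable names are hypothetical. Saving each stage in a variable only suits a tiny sample like this one. On a real log you would re-run the growing pipeline with `| head` appended instead.

```shell
#!/bin/sh
# Made-up sample log: each line is "LEVEL TYPE".
log='ERROR Timeout
INFO ok
ERROR Timeout
ERROR Refused'

# Step 1: run only the first stage and inspect what it emits.
step1=$(printf '%s\n' "$log" | grep 'ERROR')
printf '%s\n' "$step1"

# Step 2: add one filter (keep the type column) and inspect again.
step2=$(printf '%s\n' "$step1" | cut -d' ' -f2)
printf '%s\n' "$step2"

# Step 3: only once step 2 looks right, group and count.
step3=$(printf '%s\n' "$step2" | sort | uniq -c)
printf '%s\n' "$step3"
```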
In `... | sort | uniq -c`, why must `sort` come before `uniq -c`?
Correct answer: `uniq` only collapses adjacent duplicate lines, so identical lines must be grouped first.
`uniq` compares each line only to its immediate neighbor, so duplicates scattered through the stream are never merged unless `sort` groups them first. The tempting wrong answer assumes `uniq` sorts its input on its own. It does not; it only deduplicates adjacent lines.
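You can check the answer by feeding `uniq -c` input whose duplicates are not adjacent, once without `sort` and once with it. The three-line input is made up.

```shell
#!/bin/sh
# Made-up input: the two "a" lines are separated by "b".
input='a
b
a'

# Without sort: uniq compares each line only to the one before it,
# so the two "a" lines are counted separately (three output lines).
unsorted=$(printf '%s\n' "$input" | uniq -c)
printf '%s\n' "$unsorted"

# With sort first: the "a" lines become neighbors and collapse into one count.
sorted=$(printf '%s\n' "$input" | sort | uniq -c)
printf '%s\n' "$sorted"
```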
Practice checklist
- Create a small log sample.
- Count repeated terms with `sort | uniq -c`.
- Repeat with a null-safe `find -print0 | xargs -0` pipeline.
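For the null-safe exercise, this sketch builds a scratch directory containing a file whose name has a space. It then runs the second worked command alongside a plain `find | xargs` version for comparison. The file names and contents are hypothetical.

```shell
#!/bin/sh
# Scratch directory; "web server.log" deliberately contains a space.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'ok\nconnection timeout\n' > "$dir/web server.log"
printf 'timeout again\n' > "$dir/db.log"

# Null-safe: find ends each path with a NUL byte and xargs -0 splits only on NUL,
# so "web server.log" reaches grep as a single argument.
safe=$(find "$dir" -type f -name '*.log' -print0 | xargs -0 grep -n "timeout")
printf '%s\n' "$safe"

# Unsafe, for comparison: plain xargs splits on blanks, so grep receives
# ".../web" and "server.log", neither of which exists. Their errors are
# discarded here, and '|| true' ignores the resulting non-zero exit status.
unsafe=$(find "$dir" -type f -name '*.log' | xargs grep -n "timeout" 2>/dev/null || true)
printf '%s\n' "$unsafe"
```

Only the `-print0`/`-0` version reports the match inside `web server.log`. The plain version silently loses it.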
Deliverable evidence
- A four-stage pipeline with a one-line explanation for each stage.
shows: A command reads stdin (fd 0) and writes two independent outputs, stdout (fd 1) and stderr (fd 2). Each can be redirected on its own, so `>` captures data while `2>` captures diagnostics.
does not prove: It shows the wiring of the three streams, not that a given command actually sends its errors to stderr. A few tools write diagnostics to stdout, so you still confirm by inspecting both targets.
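You can check the stream wiring with a sketch like this one, which asks `ls` for one file that exists and one that does not. The file names are hypothetical, and the exact wording of the `ls` error message differs between systems.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/present.txt"

# One path exists, one does not. ls writes the listing to stdout (fd 1) and
# the complaint to stderr (fd 2); > and 2> send each stream to its own file.
# ls exits non-zero because of the missing path; '|| true' keeps going.
ls "$dir/present.txt" "$dir/missing.txt" > "$dir/out.txt" 2> "$dir/err.txt" || true

echo "stdout captured:"; cat "$dir/out.txt"
echo "stderr captured:"; cat "$dir/err.txt"

# A pipe carries only stdout: grep sees the listing, while the error
# message bypasses the pipe and goes straight to the terminal.
ls "$dir/present.txt" "$dir/missing.txt" | grep -c 'present'
```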
Commit these to memory, then drill them until recall is automatic.
cue: Count how many times each distinct line appears, ordered most frequent first.
sort | uniq -c | sort -nr