// article
Cardinality lessons from a synthetic log storm
#observability#cost#labs
The Data Pipelines for Logs cohort starts with a boring CSV of field names. By week three, that CSV becomes a battlefield — which fields deserve indexes, which should be sampled, and which should never have existed.
We inject a synthetic storm: thousands of ephemeral keys designed to mimic a buggy client release. Your job is not regex heroics; it is a negotiation with future you about what "must be searchable" really means.
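To make the failure mode concrete, here is a minimal Python sketch of what the injected records can look like. The field names, volumes, and the `make_storm_record` helper are all illustrative assumptions, not the cohort's actual generator.

```python
import json
import random
import uuid

# Sketch of the storm: each record mints a brand-new field NAME because a
# per-request ID leaks into the key itself, so key cardinality grows
# linearly with log volume. All names here are hypothetical.
STABLE_FIELDS = {"service": "checkout", "level": "error"}

def make_storm_record() -> str:
    record = dict(STABLE_FIELDS)
    # The simulated bug: a request UUID interpolated into the field name.
    ephemeral_key = f"retry_state_{uuid.uuid4().hex[:8]}"
    record[ephemeral_key] = random.choice(["pending", "failed"])
    return json.dumps(record)

if __name__ == "__main__":
    for _ in range(5):
        print(make_storm_record())
```

A few thousand of these per minute is enough to bloat most field indexes, which is the point: the fix is a schema decision, not a cleverer parser.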
Mentors grade on communication as much as configuration. Nothing gets dropped silently: if you drop a field, you document the drop; if you keep a noisy field, you defend the cost.
We end with a short worksheet you could hand to finance — rough monthly storage deltas, not fake precision. The numbers are illustrative, but the structure is what alumni reuse.
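For a sense of the arithmetic behind those deltas, here is a back-of-the-envelope sketch in Python. Every number in it (event volume, field size, index overhead) is an illustrative placeholder, not a measurement from the lab.

```python
# Rough monthly storage delta for keeping one field on every event.
# All constants below are illustrative assumptions.
EVENTS_PER_DAY = 50_000_000
DAYS_PER_MONTH = 30
GIB = 1024 ** 3

def monthly_delta_gib(avg_field_bytes: float, index_overhead: float = 0.0) -> float:
    """Raw bytes added per month, inflated by an assumed index overhead factor."""
    raw = avg_field_bytes * EVENTS_PER_DAY * DAYS_PER_MONTH
    return raw * (1 + index_overhead) / GIB

# Keeping a 40-byte noisy field, unindexed vs. indexed at an assumed 50% overhead:
print(f"unindexed: {monthly_delta_gib(40):.1f} GiB/month")   # ~55.9
print(f"indexed:   {monthly_delta_gib(40, 0.5):.1f} GiB/month")  # ~83.8
```

The worksheet's value is the shape of this calculation, one line per contested field, so finance can see what each "keep" decision costs before the bill arrives.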