Reconciliation at Scale — Part 1

Harjeet Singh
Aug 15, 2023

I remember a poem we read in English Literature class that went like this:

“Water, water, everywhere,
And all the boards did shrink;
Water, water, everywhere,
Nor any drop to drink.”
- Samuel Taylor Coleridge

Okay, let’s talk about Reconciliation; we’ll come back to the poem later. First of all, what is Reconciliation? There are technical definitions, but I’ll describe what it encompasses instead. Once the ETL has run and every process that makes data available for consumption has finished, your data is ready to be served. But how do we know that the data about to be served is correct, complete, and no different from what it should be?
Note the choice of words: complete and different. We are not talking about errors in the data, or about data that is incomplete in the sense that some columns are missing, the raw data was corrupt, a field that should hold a customer email doesn’t contain an email-like value, or an integer field contains a string. All of that belongs to pre-cleaning or post-cleaning, depending on how your pipeline ingestion is set up and at which stage the data is transformed from raw to processed, or from staging to the consumption layer. We are talking about whether all the data that was supposed to travel from point A to point B actually arrived. Did it change in transit? Did we lose any data points?
The process of verifying this is Reconciliation. Long definition, right…
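To make the two questions concrete (did we lose anything, and did anything change in transit), here is a minimal sketch. It assumes both sides fit in memory as lists of dicts and share a unique key column (the hypothetical `id`); the names `reconcile`, `row_fingerprint` and `ReconReport` are illustrative, not from any library. Completeness is checked with row counts plus keys present on only one side, and "changed in transit" with a SHA-256 fingerprint per row.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class ReconReport:
    source_count: int
    target_count: int
    missing: List[Any]     # keys at point A that never reached point B
    unexpected: List[Any]  # keys that appear only at point B
    changed: List[Any]     # keys on both sides whose row content differs

    @property
    def ok(self) -> bool:
        return (self.source_count == self.target_count
                and not self.missing and not self.unexpected and not self.changed)


def row_fingerprint(row: Row) -> str:
    # Serialize fields in sorted-name order so column order does not matter.
    # repr() keeps types visible, so 10 vs "10" counts as a change.
    canonical = "|".join(f"{name}={row[name]!r}" for name in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: List[Row], target: List[Row], key: str) -> ReconReport:
    # Assumption: `key` is unique on each side; duplicates would overwrite here.
    src = {row[key]: row_fingerprint(row) for row in source}
    tgt = {row[key]: row_fingerprint(row) for row in target}
    return ReconReport(
        source_count=len(source),
        target_count=len(target),
        missing=sorted(k for k in src if k not in tgt),
        unexpected=sorted(k for k in tgt if k not in src),
        changed=sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    )


# Hypothetical data: row 3 is lost and row 2's amount changes in transit.
source = [
    {"id": 1, "email": "a@x.com", "amount": 10},
    {"id": 2, "email": "b@x.com", "amount": 20},
    {"id": 3, "email": "c@x.com", "amount": 30},
]
target = [
    {"id": 1, "email": "a@x.com", "amount": 10},
    {"id": 2, "email": "b@x.com", "amount": 25},
]
report = reconcile(source, target, key="id")
print(report.missing, report.unexpected, report.changed, report.ok)  # → [3] [] [2] False
```

Pulling every row into memory only works for small tables. At larger volumes the same comparison is more commonly expressed inside the warehouse (counts, key anti-joins, aggregated hashes), but the logic being checked is the same.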
