Live · Dataset
Indian Addresses Raw
4.37M raw Indian address records from MCA & bank branch sources
About
indian-addresses-raw is an open dataset of 4.37 million unstructured Indian addresses drawn from MCA (Ministry of Corporate Affairs) corporate filings and bank branch registries. Every record carries source_type, lifecycle_state, the declared state and pincode, latitude/longitude coordinates where available, and full provenance metadata, which makes it the largest publicly available corpus of raw Indian addresses. It is the upstream source for the gold-labeled training set (indian-addresses-gold) and the Qwen3 Indian Address Parser model.
Features
- 4,370,606 raw address records — largest public Indian address corpus
- Sources: MCA corporate filings + Indian bank branch registries
- Per-record source_type, lifecycle_state, and provenance metadata
- Declared state, pincode, latitude, and longitude fields
- Cross-linked with indian-addresses-gold and the Qwen3 address parser model
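The per-record fields listed above lend themselves to simple pre-filtering before parsing or training. The sketch below is illustrative only: the `RawAddress` class, the `raw_text` field, the exact column names, and the sample values (including the `"mca"`, `"bank_branch"`, and `"example"` strings) are assumptions, not the dataset's published schema. The only outside fact it relies on is that Indian PIN codes are six digits with a non-zero first digit.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Indian PIN codes are six digits and never start with 0.
PIN_RE = re.compile(r"[1-9][0-9]{5}")


@dataclass
class RawAddress:
    """Hypothetical record shape; field names are assumed, not the real schema."""
    raw_text: str
    source_type: str
    lifecycle_state: str
    state: Optional[str] = None
    pincode: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def has_valid_pincode(rec: RawAddress) -> bool:
    """True when the declared pincode looks like a well-formed PIN code."""
    return rec.pincode is not None and PIN_RE.fullmatch(rec.pincode.strip()) is not None


def has_coordinates(rec: RawAddress) -> bool:
    """True when both latitude and longitude are present."""
    return rec.latitude is not None and rec.longitude is not None


def filter_usable(records: Iterable[RawAddress],
                  source_type: Optional[str] = None) -> List[RawAddress]:
    """Keep records with a well-formed pincode, optionally from one source_type."""
    return [
        rec for rec in records
        if has_valid_pincode(rec)
        and (source_type is None or rec.source_type == source_type)
    ]


# Made-up sample rows for illustration; none come from the dataset itself.
sample = [
    RawAddress("Plot 12, MG Road, Bengaluru 560001", "mca", "example",
               "Karnataka", "560001", 12.97, 77.59),
    RawAddress("Main Branch, Station Road", "bank_branch", "example",
               "Gujarat", "38001"),  # five digits, so it is rejected
    RawAddress("Near Bus Stand, Pune", "bank_branch", "example",
               "Maharashtra", "411001"),
]

usable = filter_usable(sample)
bank_only = filter_usable(sample, source_type="bank_branch")
```

Since coordinates are only present "where available", `has_coordinates` is kept separate from the pincode check so callers can choose whether missing lat/lon should exclude a record.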
Tags
Dataset, Indian Addresses, MCA, NLP, Open Data, Geospatial