This project can currently dump and (partially) normalize white pages from Deutsche Telekom's CDs and DVDs.

The on-disc data currently comes in four flavours (see https://erdgeist.org/posts/2008/datenmessie.html):

version 1) Teleauskunft 1188 from 1992 (April-June)
version 2) Teleauskunft 1188 Telefon-Teilnehmer, Oktober 1995 / Telefon-Teilnehmer Gesamtausgabe from 1995/1996
version 3) Telefonbuch für Deutschland, Version 1.0 1996 through DasTelefonbuch, Deutschland, Herbst 2003
version 4) DasTelefonbuch, Map&Route, Frühjahr 2004 until now

version 1
=========

Notes: Strings are encoded in cp437; those inside records are stored in a 7-bit packed encoding. Only the .001 files on each CD are interesting.
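
As a minimal sketch of the cp437 part only (the 7-bit packing of record strings is not covered here), plain strings can be decoded with Python's built-in codec. The helper name decode_cp437 is made up for this illustration and is not part of the project.

```python
def decode_cp437(raw: bytes) -> str:
    """Decode a plain (non-packed) cp437 byte string into text.

    Hypothetical helper for illustration; it does not handle the
    7-bit packed strings stored inside records.
    """
    return raw.decode("cp437")


# Example: in cp437 the byte 0x81 is a lowercase u-umlaut, so
# decode_cp437(b"M\x81nchen") yields the city name "München".
```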

Each file consists of a standard header and a number of pages. The first page starts at 0x800 and each following page starts 0x2000 bytes after the previous one.
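
The page layout above boils down to simple arithmetic. A small sketch, assuming pages are numbered from zero; the constant and function names are chosen for this example only.

```python
FIRST_PAGE_OFFSET = 0x800  # file offset of the first page
PAGE_STRIDE = 0x2000       # distance between consecutive page starts


def page_offset(index: int) -> int:
    """Return the file offset of the page with the given zero-based index."""
    if index < 0:
        raise ValueError("page index must not be negative")
    return FIRST_PAGE_OFFSET + index * PAGE_STRIDE
```

For instance, page_offset(0) is 0x800 and page_offset(1) is 0x2800.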

The important header values are the number of pages at (uint16_t*)0x40, the total number of records in the file at (uint32_t*)0x42, and a \0-separated list of gasse, city, zip and prefix starting at 0xe8.
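
A rough sketch of reading these header values. It rests on assumptions not stated above: integers are taken to be little-endian, the string list is taken to hold exactly the four entries in the order given, and it is searched for only up to 0x800, where the first page begins. The function name parse_header is hypothetical.

```python
import struct

HEADER_END = 0x800      # the first page starts here
STRING_LIST_AT = 0xe8   # start of the \0-separated string list


def parse_header(data: bytes):
    """Return (page_count, record_count, strings) from a file header.

    Assumes little-endian integers and four \\0-separated cp437 strings
    (gasse, city, zip, prefix) in that order.
    """
    (page_count,) = struct.unpack_from("<H", data, 0x40)
    (record_count,) = struct.unpack_from("<I", data, 0x42)
    raw_fields = data[STRING_LIST_AT:HEADER_END].split(b"\x00")[:4]
    strings = [field.decode("cp437") for field in raw_fields]
    return page_count, record_count, strings
```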

Each page is either a "normal" page containing phone entries or a "blob" page containing multi-line records, which are referenced from "normal" pages inside the same file. A page starts with a flag at (uint8_t*)0x00, followed by the size of the blob's contents at (uint16_t*)0x02 (if this is != 0, the page is a blob page) and the count of records in that page at (uint16_t*)0x04. Starting at 0x0e, there is one pointer into this page (plus offset 0x0e) for each record.
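
The page header could be read along these lines. This is only a sketch under several assumptions that the description leaves open: integers are little-endian, each record pointer is a uint16_t, and "plus offset 0x0e" means that 0x0e must be added to the stored value to get the record's offset within the page. All names are invented for this example.

```python
import struct
from typing import List, NamedTuple

POINTER_TABLE = 0x0e  # the record pointer table starts here


class PageHeader(NamedTuple):
    flag: int
    blob_size: int
    record_count: int
    record_offsets: List[int]  # offsets relative to the page start

    @property
    def is_blob(self) -> bool:
        # A non-zero blob size marks a "blob" page.
        return self.blob_size != 0


def parse_page_header(page: bytes) -> PageHeader:
    """Parse the fixed fields and the pointer table of a single page."""
    flag = page[0x00]
    blob_size, record_count = struct.unpack_from("<HH", page, 0x02)
    # Assumption: one 16-bit little-endian pointer per record.
    raw = struct.unpack_from(f"<{record_count}H", page, POINTER_TABLE)
    # Assumption: stored pointers are relative to the table start.
    offsets = [pointer + POINTER_TABLE for pointer in raw]
    return PageHeader(flag, blob_size, record_count, offsets)
```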

Each record