author    Dirk Engling <erdgeist@erdgeist.org>  2015-06-09 12:37:02 +0200
committer Dirk Engling <erdgeist@erdgeist.org>  2015-06-09 12:37:02 +0200
commit    63d76f54c09f8e265cefac36d9c8536eb89aeb76
tree      95211e8740ab2b458b7007456c25e2799ab135dc
parent    8d0b4ac83bf3334c66fe7024bc4ada3b7f955157
Document version 4
-rw-r--r-- README 14
1 files changed, 12 insertions, 2 deletions
diff --git a/README b/README
index a05d0de..ea504f6 100644
--- a/README
+++ b/README
@@ -67,10 +67,20 @@ Now we look at the continuation entry parts: A column_id of 0x4003 denotes a sta
 version 3
 =========
 
-Some of the files on disc are obfuscated while others are plain. They consist of chunks of lha compressed data (as readable by the lha command). The obfuscation is a simple XOR of the first 32 (for streets: 34) bytes with a static 4 byte key that changes with every CD. Since lha headers do have a static signature (i.e. -lh5-), it's easy to derive the current key by xoring the static value with the bytes found in the file. Conveniently un-obfuscated files generate a all-zero key and don't need special treatment. The path name of each chunk is identical so it must be re-written and the header checksum must be re-calculated.
+All strings are iso8859-1. Some of the files on disc are obfuscated while others are plain. They consist of chunks of lha compressed data (as readable by the lha command). The obfuscation is a simple XOR of the first 32 (for streets: 34) bytes with a static 4 byte key that changes with every CD. Since lha headers do have a static signature (i.e. -lh5-), it's easy to derive the current key by xoring the static value with the bytes found in the file. Conveniently un-obfuscated files generate a all-zero key and don't need special treatment. The path name of each chunk is identical so it must be re-written and the header checksum must be re-calculated.
 
 After decompressing all files (quite a lot, it's split at 3k entries per file) from dat/teiln.dat (case insensitive), we get three kinds of files: one with all Vorname columns, one with all Nachname columns and one with the rest of the columns. Before 2000_Q1 files {0,3,6,...} were Nachname, {1,4,7,...} were Vorname and {2,5,8,...} were all the other tables. After 2000_Q1 (incl.) {0,3,6,...} were all other columns, while {1,4,7,...} were Nachname and {2,5,8,...} Vorname columns. The first char in each entry in the Nachname column contains a continuation flag, where "1" means single line entry, "3" is the first and "2" are the remaining lines in a continuation.
 
-If there is a dat/strassen.dat (case insensitive), the concatenated decompressed chunks are a list of all street names referenced in the street/hnr column.
+If there is a dat/strassen.dat (case insensitive), the concatenated decompressed chunks are a list of all street names referenced (hex, 0-based) in the street/hnr column.
 
 If there is a dat/karto.dat (case insensitive), the concatenated decompressed chunks are a sorted list of all zip/streetname(/hnr) combinations on that CD, each line finished by the geo coordinates of that address.
+
+version 4
+=========
+
+All strings are iso8859-1. All relevant information are in the files phonebook.db and streets.tl. If you need geo-coordinates to all your addresses, you'll also need the zip-streets-hn-geo.tl or zip-streets-hn.tl files, respectively. The files consist of zlib compressed chunks with streetname file just being a \0 separated list of streets, each referenced (dec, 0-based) from the street/hnr column. For the zip-streets*-geo.tl files, the concatenated decompressed chunks are a sorted list of all zip/streetname(/hnr) combinations on that CD, each line finished by the geo coordinates of that address (since they are sorted, you can do a binary search for each address in that file).
+
+The phonebook.db decompresses in a lot chunks with around 3000 entries sorted into 11 columns, the first being a binary flag byte and the latter: Nachname, Vorname, Namenszusatz+Addresszusatz, Verweise, Strassenindex_Hausnummer, Vorwahl, Postleitzahl, Ort, Rufnummer, Email+Webadresse respectively.
+
+The flag byte's lower nibble is 0 for a single line entry, 1 for the start of a continuation record and 2 for a trailing line of a continuation. The flag's top nibble bits are 0x80 set for a business record (as opposed to a natural person), 0x40 set, if the number must not be included in reverse query results and 0x20 used but unknown.
+
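Reviewer note on the added version 4 text: the flag byte layout is compact enough to check against a small sketch. The Python below is only an illustration of the bit layout as the README describes it, not code from this repository. The names EntryFlags, decode_flag_byte and group_entries are hypothetical, and treating a trailing line without a preceding start line as its own record is an assumption the README does not cover.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Lower-nibble values as described in the version 4 section of the README.
SINGLE_LINE = 0
CONTINUATION_START = 1
CONTINUATION_TRAILING = 2


@dataclass(frozen=True)
class EntryFlags:
    """Decoded view of one flag byte (hypothetical helper type)."""
    line_kind: int           # lower nibble: 0, 1 or 2 per the README
    is_business: bool        # 0x80: business record, not a natural person
    hide_from_reverse: bool  # 0x40: exclude from reverse query results
    unknown_0x20: bool       # 0x20: used on disc, meaning unknown


def decode_flag_byte(flag: int) -> EntryFlags:
    """Split a single flag byte into its documented parts."""
    if not 0 <= flag <= 0xFF:
        raise ValueError("flag must fit into one byte")
    return EntryFlags(
        line_kind=flag & 0x0F,
        is_business=bool(flag & 0x80),
        hide_from_reverse=bool(flag & 0x40),
        unknown_0x20=bool(flag & 0x20),
    )


def group_entries(flags: Iterable[int]) -> List[List[int]]:
    """Group entry indices into records using the continuation nibble.

    Assumption: a trailing line that has no preceding line starts a
    record of its own instead of raising an error.
    """
    records: List[List[int]] = []
    for index, flag in enumerate(flags):
        if (flag & 0x0F) == CONTINUATION_TRAILING and records:
            records[-1].append(index)
        else:
            records.append([index])
    return records
```

For example, decode_flag_byte(0x81) describes the first line of a business continuation record, and group_entries([0x00, 0x81, 0x82, 0x82, 0x40]) groups the five lines into three records.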