author Dirk Engling <erdgeist@erdgeist.org> 2015-05-19 14:15:23 +0200
committer Dirk Engling <erdgeist@erdgeist.org> 2015-05-19 14:15:23 +0200
commit b3a053c07a9f43b951196c62533d6dab0d3ccd3d
tree cde57a8a7f8f5877d83d93e62a921b2bff4f68b6
parent 0641cbc58f607b57df2a8f70d368bb5bef4f31d4
Documentation of version 2
-rw-r--r-- README 28
1 files changed, 28 insertions, 0 deletions
diff --git a/README b/README
index 7da5e3b..ef9f374 100644
--- a/README
+++ b/README
@@ -25,4 +25,32 @@ The entries are then packed into a 7 bit stream, with the 0000001b separating th
version 2
=========

Each database file (atb?dd00) is composed of several contiguous chunks of PKware-packed blocks. Each decompressed block is 8192 bytes. The number of chunks (excluding the first) is at (uint16_t*)0x14. The start offsets of all but the first chunk are encoded in a packed list of 19-bit integers at 0x20. The first chunk starts at the end of this table; the offsets of the remaining chunks are relative to 0x800. The number of blocks per chunk is at (uint16_t*)0x1c; for the first chunk, the number of blocks minus one is at (uint16_t*)0x1e. The example tool "blast" from the zlib archive can decompress these blocks.

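The header fields above could be read roughly as in the following Python sketch. It assumes little-endian integers and that the 19-bit list is packed least-significant-bit first; the text states neither, so both are guesses to check against real files. The function names are illustrative only.

```python
import struct


def read_header(data: bytes) -> dict:
    """Read the counters at 0x14, 0x1c and 0x1e.

    ASSUMPTION: all integers are little-endian.
    """
    (extra_chunks,) = struct.unpack_from("<H", data, 0x14)
    (blocks_per_chunk,) = struct.unpack_from("<H", data, 0x1C)
    (first_chunk_blocks_minus_1,) = struct.unpack_from("<H", data, 0x1E)
    return {
        "extra_chunks": extra_chunks,  # chunks excluding the first
        "blocks_per_chunk": blocks_per_chunk,
        "first_chunk_blocks": first_chunk_blocks_minus_1 + 1,
    }


def unpack_19bit(data: bytes, start: int, count: int) -> list:
    """Unpack `count` 19-bit integers beginning at byte offset `start`.

    ASSUMPTION: values are packed LSB-first; if real files decode to
    nonsense, the bit order is the first thing to revisit.
    """
    nbytes = (count * 19 + 7) // 8
    bits = int.from_bytes(data[start:start + nbytes], "little")
    return [(bits >> (19 * i)) & 0x7FFFF for i in range(count)]
```

With these helpers, `unpack_19bit(data, 0x20, read_header(data)["extra_chunks"])` would yield the start offsets of all chunks but the first.
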
A corresponding index file atb?di00 consists of a list of uint32_t offsets, one for each record, relative to the contiguous stream of all decompressed 8192-byte blocks. These offsets start at file position 0x8, and (uint32_t*)0x0 holds the number of records in the index (and thus in the database).

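As a hedged illustration, the index could be loaded like this in Python. Little-endian byte order is an assumption, and `read_index` and `locate` are made-up names.

```python
import struct

BLOCK_SIZE = 8192  # decompressed block size stated above


def read_index(data: bytes) -> list:
    """Return the record offsets stored in an index file.

    ASSUMPTION: integers are little-endian. The record count is at 0x0
    and the offsets start at 0x8, as described above.
    """
    (count,) = struct.unpack_from("<I", data, 0x0)
    return list(struct.unpack_from("<%dI" % count, data, 0x8))


def locate(offset: int) -> tuple:
    """Split a stream offset into (block number, offset inside that block)."""
    return divmod(offset, BLOCK_SIZE)
```

`locate` is only a convenience for finding which decompressed block a record offset falls into.
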
The CDs have a different database layout for each region (NO, W, S). The reason is that prefixes for north-east Germany have more than 5 digits. A record is arranged around its record offset, with (most) flags, counts and offsets after the offset and strings before it.

As in version 1, each record can have multiple entries. The columns present in the first entry (which is always there) are described by the bits set in a flag word and refer to the string table. Columns present in continuation entries are encoded directly by the tuples described below.

The number of entry parts is at (uint16_t*)0x0 and is guaranteed to be at least 1; the number of flag bytes whose bits describe the first entry is at (uint16_t*)0x2. Both are uint16_t values relative to the record offset. Each entry is described by a tuple of uint16_t (column_id, offset) stored in a single uint32_t, starting at (uint16_t*)0x4. The fact that the two halves belong together in one uint32_t is important: a tuple cannot span a block boundary and is instead pushed to the beginning of the next block, with all further tuples pushed accordingly. The offsets in each entry's tuples refer to a position before the record, adjusted by the number of flag bytes for the first entry (i.e. if 2 flag bytes are signalled at (uint16_t*)0x2, the string table ends two bytes before the record offset).

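The block-boundary rule for tuples can be sketched as follows. This is only an illustration under several assumptions not stated in the text: little-endian integers, column_id stored before offset inside the uint32_t, and one tuple per entry part. `read_tuples` is a hypothetical helper operating on the concatenated decompressed stream.

```python
import struct

BLOCK_SIZE = 8192  # decompressed block size


def read_tuples(stream: bytes, record: int) -> list:
    """Read the (column_id, offset) tuples that follow a record offset.

    ASSUMPTIONS: little-endian, column_id precedes offset, and the count
    at record+0x0 is the number of tuples to read.
    """
    (parts,) = struct.unpack_from("<H", stream, record)
    pos = record + 4  # tuples start at (uint16_t*)0x4 relative to the record
    tuples = []
    for _ in range(parts):
        # A 4-byte tuple must not straddle a block boundary; if it would,
        # it is pushed to the beginning of the next block.
        if pos % BLOCK_SIZE > BLOCK_SIZE - 4:
            pos += BLOCK_SIZE - pos % BLOCK_SIZE
        column_id, offset = struct.unpack_from("<HH", stream, pos)
        tuples.append((column_id, offset))
        pos += 4
    return tuples
```

Because `pos` is only ever advanced, every tuple after a pushed one is shifted as well, which matches the "pushed accordingly" behaviour described above.
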
All strings are encoded in cp437.

The first entry is parsed from the end: all strings with dynamic length end with their length byte, so they can be consumed from the end; fixed-length strings do not have a length byte but can be \0-terminated. The exact layout of the first entry depends on the database layout; region NO differs slightly. All columns described from here on are listed in reverse order, i.e. PPPPPZZZZZC. In all layouts the entry ends with a continuation indicator, a fixed 1-character string "1", preceded by a fixed 5-character string containing a zip code. In layouts S and W, this is preceded by a fixed 5-character string containing a phone prefix (Vorwahl), which is not actually used except in the UI (the correct Vorwahl is encoded with bit 0x0020 in S/W and 0x0010 in NO).

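Parsing from the end could look like the following Python sketch. It assumes the length byte counts only the string's own bytes (not itself); the helper names `pop_dynamic` and `pop_fixed` are invented for this example.

```python
def pop_dynamic(buf: bytes, end: int) -> tuple:
    """Consume a dynamic string that ends just before `end`.

    The last byte (at end - 1) is the length byte.
    ASSUMPTION: the length does not include the length byte itself.
    Returns (decoded string, new end position).
    """
    length = buf[end - 1]
    start = end - 1 - length
    return buf[start:end - 1].decode("cp437"), start


def pop_fixed(buf: bytes, end: int, width: int) -> tuple:
    """Consume a fixed-width string ending at `end`; it may be \\0-terminated."""
    start = end - width
    raw = buf[start:end].split(b"\0", 1)[0]
    return raw.decode("cp437"), start
```

For a buffer ending in `...ZZZZZC`, one would call `pop_fixed(buf, end, 1)` for the continuation indicator, then `pop_fixed(buf, end, 5)` for the zip code, and continue with the flag-dependent columns.
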
The next columns depend on the bits set in the first entry's flag word:
If bit 0x0080 is set, a fixed 5-character string is present that overrides the first fixed zip code.
If bit 0x0040 is set, a fixed 1-character string "X" is present, whose meaning is unclear.
If bit 0x0020 is set, NO has a dynamic string containing the Rufnummer, while S/W has a fixed 5-character string containing the Vorwahl.
If bit 0x0010 is set, a dynamic string is present, representing the Vorwahl in NO and the Rufnummer in S/W.
The next 8 bits indicate dynamic strings: 0x0008 => Vorname, 0x0004 => Strasse, 0x0002 => Ort, 0x0001 => Hausnummer, 0x8000 => Zusaetze, 0x4000 => Ortszusatz, 0x2000 => S/W: Adresszusatz NO: Verweise, 0x1000 => NO: Adresszusatz S/W: Verweise.

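The eight dynamic-string bits can be written down as a small lookup table. The sketch below only maps flag bits to column names; the list order mirrors the order in which the bits are listed above, and whether that matches the storage order is an assumption. The names `DYNAMIC_COLUMNS` and `dynamic_columns` are illustrative.

```python
# (bit, column name in NO, column name in S/W), in the order listed above.
DYNAMIC_COLUMNS = [
    (0x0008, "Vorname", "Vorname"),
    (0x0004, "Strasse", "Strasse"),
    (0x0002, "Ort", "Ort"),
    (0x0001, "Hausnummer", "Hausnummer"),
    (0x8000, "Zusaetze", "Zusaetze"),
    (0x4000, "Ortszusatz", "Ortszusatz"),
    (0x2000, "Verweise", "Adresszusatz"),
    (0x1000, "Adresszusatz", "Verweise"),
]


def dynamic_columns(flags: int, layout: str) -> list:
    """Return the names of the dynamic columns selected by `flags`.

    `layout` must be "NO", "S" or "W".
    """
    if layout not in ("NO", "S", "W"):
        raise ValueError("unknown layout: %r" % layout)
    pick = 1 if layout == "NO" else 2
    return [row[pick] for row in DYNAMIC_COLUMNS if flags & row[0]]
```

Keeping the NO and S/W names side by side makes the swapped meaning of bits 0x2000 and 0x1000 easy to see.
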
From here on there are only dynamic strings: a column representing the Ort is always present. In the NO layout there is also the unused phone prefix (Vorwahl). Finally there is the Name column.

Now we look at the continuation entry parts: a column_id of 0x4003 denotes the start of an entry, in the sense of a new line in the telephone book. This is in fact always the column_id for the first entry. For all but the first entry, the corresponding string in the string table is the one-byte string "2"; as stated above, for the first entry this continuation indicator is always "1". All other entry parts represent columns of the current entry. The exact mapping from column id to column varies between NO, S and W and can be looked up in the source code ;) String length is implicit in the previous string's offset.

version 3
=========

TBD.