From 52532242ef38937387fac2303e0371860a15caa3 Mon Sep 17 00:00:00 2001
From: itsme <itsme@xs4all.nl>
Date: Fri, 9 Jul 2021 17:13:12 +0200
Subject: updated documentation

---
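Note for readers of this patch: the header and `.tad` layouts documented below can be sketched with Python's `struct` module. This is only a minimal sketch, not the project's parser. Little-endian byte order, an 8-byte magic (so that the fields add up to the stated 19 bytes), and entries starting directly after the two leading `uint32` values are assumptions; the function names are made up for illustration.

```python
import struct

# Assumed layout: 8-byte magic, uint16 unknown, 5-byte version,
# uint16 encoding, uint16 blocksize = 19 bytes, little-endian (assumption).
DAT_HEADER = struct.Struct("<8sH5sHH")


def parse_dat_header(data):
    """Split the first 19 bytes of a .dat file into the documented fields."""
    magic, unknown, version, encoding, blocksize = DAT_HEADER.unpack_from(data)
    if magic != b"CroFile\x00":
        raise ValueError("not a CroFile .dat header")
    return {
        "unknown": unknown,
        "version": version.decode("ascii"),
        "encoding": encoding,
        "blocksize": blocksize,
    }


def tad_entry_struct(version):
    """Pick the entry layout: 64-bit offsets for 01.03, 32-bit otherwise."""
    return struct.Struct("<QLL" if version == "01.03" else "<LLL")


def parse_tad(data, version):
    """Return (nr_deleted, first_deleted, entries) from raw .tad bytes.

    Each entry is an (offset, size, checksum) tuple; the size field is
    returned raw, including any flag bits or the 0xFFFFFFFF marker.
    """
    nr_deleted, first_deleted = struct.unpack_from("<LL", data, 0)
    entry = tad_entry_struct(version)
    last_start = len(data) - entry.size
    entries = [entry.unpack_from(data, pos)
               for pos in range(8, last_start + 1, entry.size)]
    return nr_deleted, first_deleted, entries
```

The version string returned by `parse_dat_header` is what selects the entry width in `parse_tad`, mirroring the dependency between the `.dat` header and the `.tad` file described in the patch.
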
 docs/cronos-research.md | 118 ++++++++++++++++++++++++++++--------------------
 1 file changed, 68 insertions(+), 50 deletions(-)
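
Note on the decode/encode formulas added in this patch: the sketch below shows how they could be applied to byte strings. It assumes all arithmetic is reduced modulo 256 and that `INV` is the inverse permutation of `KOD`. `DEMO_KOD` is a stand-in permutation used only to exercise the code; it is not the real table from the documentation, and the function names are hypothetical.

```python
def invert_table(kod):
    """Build INV so that INV[KOD[x]] == x for every byte value x."""
    inv = [0] * 256
    for index, value in enumerate(kod):
        inv[value] = index
    return inv


def decode(kod, shift, encoded):
    """b[i] = KOD[a[i]] - (i + shift), reduced modulo 256 (assumption)."""
    return bytes((kod[a] - (i + shift)) % 256 for i, a in enumerate(encoded))


def encode(inv, shift, plain):
    """a[i] = INV[b[i] + (i + shift)], index reduced modulo 256 (assumption)."""
    return bytes(inv[(b + i + shift) % 256] for i, b in enumerate(plain))


# Stand-in permutation of 0..255, NOT the real KOD table.
DEMO_KOD = [(7 * x + 3) % 256 for x in range(256)]
DEMO_INV = invert_table(DEMO_KOD)
```

With the real table substituted for `DEMO_KOD`, `decode(KOD, shift, data)` would follow the first formula and `encode(invert_table(KOD), shift, data)` the second; encoding and then decoding with the same `shift` returns the original bytes.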

diff --git a/docs/cronos-research.md b/docs/cronos-research.md
index 64e2d51..a60d054 100644
--- a/docs/cronos-research.md
+++ b/docs/cronos-research.md
@@ -21,44 +21,55 @@ On a default Windows installation, the CronosPro app shows with several encoding
 
 ##Files ending in .dat
 
-All .dat files start with the string `"CroFile\0"` and then 8 more header bytes
+All .dat files start with a 19 byte header:
 
-`CroStru.dat` has
+    char      magic[8]      // always: 'CroFile\x00'
+    uint16    unknown
+    char      version[5]    // 01.02, 01.03, 01.04
+    uint16    encoding      // 1 = KOD, 3 = encrypted
+    uint16    blocksize     // 0040 = Bank, 0400 = Index, 0200 = Stru
+    
+This is followed by a block of seemingly random data, 0x101 or 0x100 minus 19 bytes long.
 
-    xx yy 30 31 2e 30 32 01 == ? ? 0 1 . 0 2 ?
+The meaning of the `unknown` word is unclear, but it does not seem to be random; it might be a checksum.
 
+In `CroBank.dat` the most common values of this word are c8 05 (313 times), b8 00 (196 times), 4e 13 (116 times), 00 00 (95 times) and 98 00 (81 times), out of 1964 databases.
 
-CroBank.dat and CroIndex.dat have (as found in the big dump)
+In `CroStru.dat` the most common values of this word are c8 05 (351 times), b8 00 (224 times), 4e 13 (119 times), 00 00 (103 times) and 98 00 (83 times), out of 1964 databases.
 
-    xx yy 30 31 2e 30 32 0[0123] == ? ? 0 1 . 0 2 ?
+In `CroIndex.dat` the most common values of this word are c8 05 (312 times), b8 00 (194 times), 4e 13 (107 times), 00 00 (107 times) and 98 00 (82 times), out of 1964 databases.
 
-    xx yy 30 31 2e 30 33 0[023]  == ? ? 0 1 . 0 3 ?
 
-    xx yy 30 31 2e 30 34 03      == ? ? 0 1 . 0 4 ?
+##Files ending in .tad
 
+The first two `uint32` values are the number of deleted records and the `.tad` offset of the first deleted entry.
+The deleted entries form a linked list; their size field is always 0xFFFFFFFF.
 
-which seems to be the version identifier. The xx yy part is unclear but seems not to be random, might be a checksum.
+Depending on the version in the `.dat` header, `.tad` files use either 32-bit or 64-bit file offsets.
 
-In `CroBank.dat` there's a bias towards 313 times c8 05, 196 times b8 00, 116 times 4e 13, 95 times 00 00, and 81 times 98 00 out of 1964 databases.
+Versions `01.02` and `01.04` use 32-bit offsets:
 
-In `CroStru.dat` there's a bias towards 351 times c8 05, 224 times b8 00, 119 times 4e 13, 103 times 00 00 and 83 times 98 00 out of 1964 databases.
+    uint32 offset
+    uint32 size       // with flag in upper bit, 0 -> large record
+    uint32 checksum   // but sometimes just 0x00000000, 0x00000001 or 0x00000002
 
-In `CroIndex.dat` there's a bias towards 312 times c8 05, 194 times b8 00, 107 times 4e 13, 107 times 00 00 and 82 times 98 00 out of 1964 databases.
+Version `01.03` uses 64-bit offsets:
 
-##Files ending in .tad
+    uint64 offset
+    uint32 size       // with flag in upper bit, 0 -> large record
+    uint32 checksum   // but sometimes just 0x00000000, 0x00000001 or 0x00000002
 
-The first two `uint32_t` seem to be the amount and the offset to the first free block.
+where size can be 0xffffffff (indicating a free/deleted block).
+Bit 31 of the size indicates that this is an extended record.
 
-The original description made it look like there were different formats for the block references, but all entries in the .tads appear to follow the scheme:
+Extended records start with plaintext: { uint32 offset, uint32 size } or { uint64 offset, uint32 size }
 
-    uint32_t offset
-    uint32_t size       // with flag in upper bit, 0 -> large record
-    uint32_t checksum   // but sometimes just 0x00000000, 0x00000001 or 0x00000002
 
-where size can be 0xffffffff (probably to indicate a free/deleted block) some size entries have their top bits set. In some files the offset looks garbled but usually the top bit of the size then is set.
+## the 'old format'
+
+The original description made it look like there were different formats for the block references.
 
-large records start with plaintext: { uint32 offset, uint32 size? }
-followed by data obfuscated with 'shift==0'
+This was found in previously existing documentation, but no sample databases with this format have been found so far.
 
 If the .dat file has a version of 01.03 or later, the corresponding .tad file looks like this:
 
@@ -69,12 +80,12 @@ If the .dat file has a version of 01.03 or later, the corresponding .tad file lo
 
 The old description would also assume 12 byte reference blocks but a packed struct, probably if the CroFile version is 01.01.
 
-    uint32_t offset1
-    uint16_t size1
-    uint32_t offset2
-    uint16_t size2
+    uint32 offset1
+    uint16 size1
+    uint32 offset2
+    uint16 size2
 
-with the first chunk read from offset1 with length size1 and potentially more parts with total length of size2 starting at file offset offset2 with the first `uint32_t` of the 256 byte chunk being the next chunk's offset and a maximum of 252 bytes being actual data.
+The first chunk is read from offset1 with length size1. Further parts, with a total length of size2, start at file offset offset2; each is a 256 byte chunk whose first `uint32` is the next chunk's offset, leaving at most 252 bytes of actual data.
 
 However, I never found files with .tad like that. Also the original description insisted on those chunks needing the decode-magic outlined below, but the python implementation only does that for CroStru files and still seems to produce results.
 
@@ -114,7 +125,14 @@ Interesting files are CroStru.dat containing metadata on the database within blo
       0x7e, 0xab, 0x59, 0x52, 0x54, 0x9c, 0xd2, 0xe9,
       0xef, 0xdd, 0x37, 0x1e, 0x8f, 0xcb, 0x8a, 0x90,
       0xfc, 0x84, 0xe5, 0xf9, 0x14, 0x19, 0xdf, 0x6e,
-      0x23, 0xc4, 0x66, 0xeb, 0xcc, 0x22, 0x1c, 0x5c };
+      0x23, 0xc4, 0x66, 0xeb, 0xcc, 0x22, 0x1c, 0x5c,
+    };
+
+
+Given the `shift`, the encoded data `a[0]..a[n-1]` and the decoded data `b[0]..b[n-1]`, the encoding works as follows (`INV` is the inverse of the `KOD` table, and all arithmetic is modulo 256):
+
+    decode: b[i] = KOD[a[i]] - (i+shift)
+    encode: a[i] = INV[b[i] + (i+shift)]
 
 
 The original description of an older database format called the per block counter start offset 'sistN' which seems to imply it to be constant for certain entries. They correspond to a "system number" of meta entries visible in the database software. Where these offsets come from is currently unknown, the existing code just brute forces through all offsets and looks for certain sentinels.
@@ -136,11 +154,11 @@ Names are stored as: `byte strlen + char value[strlen]`
 
 The first entry contains:
 
-    byte
+    uint8
     array {
         Name keyname
-        uint32_t index_or_size;   // size when bit31 is set.
-        byte data[size]
+        uint32 index_or_size;   // size when bit31 is set.
+        uint8 data[size]
     }
 
 this results in a dictionary, with keys like: `Bank`, `BankId`, `BankTable`, `Base`nnn, etc.
@@ -149,27 +167,27 @@ the `Base000` entry contains the record number for the table definition of the f
 
 ## table definitions
 
-    byte version
-    word16 unk1
-    word16 unk2
-    word16 unk3
-    word32 unk4
-    word32 unk5
+    uint8  version
+    uint16 unk1
+    uint16 unk2
+    uint16 unk3
+    uint32 unk4
+    uint32 unk5
     Name   tablename
     Name   unk6
-    word32 unk7
-    word32 nrfields
+    uint32 unk7
+    uint32 nrfields
 
     array {
-      word16 entrysize    -- total nr of bytes in this entry.
-      word16 fieldtype      0 = sysnum, 2 = text, 4 = number
-      word32 fieldindex ??
+      uint16 entrysize    -- total nr of bytes in this entry.
+      uint16 fieldtype      0 = sysnum, 2 = text, 4 = number
+      uint32 fieldindex ??
       Name   fieldname
-      word32 
-      byte
-      word32 fieldindex ??
-      word32 fieldsize
-      word32 ?
+      uint32 
+      uint8
+      uint32 fieldindex ??
+      uint32 fieldsize
+      uint32 ?
       ...
     } fields[nrfields]
 
@@ -180,9 +198,9 @@ the `Base000` entry contains the record number for the table definition of the f
 
 some records are compressed, the format is like this:
 
-    word16 size
-    byte   head[2] = { 8, 0 }
-    word32 crc32
-    byte   compdata[size-4]
-    byte   tail[3] = { 0, 0, 2 }
+    uint16 size
+    uint8  head[2] = { 8, 0 }
+    uint32 crc32
+    uint8  compdata[size-4]
+    uint8  tail[3] = { 0, 0, 2 }
 
-- 
cgit v1.2.3