author    itsme <itsme@xs4all.nl>  2021-07-09 17:13:12 +0200
committer itsme <itsme@xs4all.nl>  2021-07-09 17:14:20 +0200
commit    52532242ef38937387fac2303e0371860a15caa3 (patch)
tree      aff67efe6a796e3335b8f420ba7d0ec13dca3189
parent    277f905849f9a050089049b2c84c45fac6203045 (diff)

updated documentation

 -rw-r--r--  docs/cronos-research.md | 118
 1 file changed, 68 insertions(+), 50 deletions(-)
diff --git a/docs/cronos-research.md b/docs/cronos-research.md
index 64e2d51..a60d054 100644
--- a/docs/cronos-research.md
+++ b/docs/cronos-research.md
@@ -21,44 +21,55 @@ On a default Windows installation, the CronosPro app shows with several encoding
 
 ##Files ending in .dat
 
-All .dat files start with the string `"CroFile\0"` and then 8 more header bytes
+All .dat files start with a 19 byte header:
 
-`CroStru.dat` has
+    char   magic[9]    // allways: 'CroFile\x00'
+    uint16 unknown
+    char   version[5]  // 01.02, 01.03, 01.04
+    uint16 encoding    // 1 = KOD, 3 = encrypted
+    uint16 blocksize   // 0040 = Bank, 0400 = Index, 0200 = Stru
+
+This is followed by a block of 0x101 or 0x100 minus 19 bytes seemingly random data.
 
-    xx yy 30 31 2e 30 32 01  == ? ? 0 1 . 0 2 ?
+The unknown word is unclear but seems not to be random, might be a checksum.
 
+In `CroBank.dat` there's a bias towards 313 times c8 05, 196 times b8 00, 116 times 4e 13, 95 times 00 00, and 81 times 98 00 out of 1964 databases.
 
-CroBank.dat and CroIndex.dat have (as found in the big dump)
+In `CroStru.dat` there's a bias towards 351 times c8 05, 224 times b8 00, 119 times 4e 13, 103 times 00 00 and 83 times 98 00 out of 1964 databases.
 
-    xx yy 30 31 2e 30 32 0[0123] == ? ? 0 1 . 0 2 ?
+In `CroIndex.dat` there's a bias towards 312 times c8 05, 194 times b8 00, 107 times 4e 13, 107 times 00 00 and 82 times 98 00 out of 1964 databases.
 
-    xx yy 30 31 2e 30 33 0[023]  == ? ? 0 1 . 0 3 ?
 
-    xx yy 30 31 2e 30 34 03  == ? ? 0 1 . 0 4 ?
+##Files ending in .tad
 
+The first two `uint32` are the number of deleted records and the tad offset to the first deleted entry.
+The deleted entries form a linked list, with the size always 0xFFFFFFFF.
 
-which seems to be the version identifier. The xx yy part is unclear but seems not to be random, might be a checksum.
+Depending on the version in the `.dat` header, `.tad` use either 32 bit or 64 bit file offsets
 
-In `CroBank.dat` there's a bias towards 313 times c8 05, 196 times b8 00, 116 times 4e 13, 95 times 00 00, and 81 times 98 00 out of 1964 databases.
+version `01.02` and `01.04` use 32 bit offsets:
 
-In `CroStru.dat` there's a bias towards 351 times c8 05, 224 times b8 00, 119 times 4e 13, 103 times 00 00 and 83 times 98 00 out of 1964 databases.
+    uint32 offset
+    uint32 size     // with flag in upper bit, 0 -> large record
+    uint32 checksum // but sometimes just 0x00000000, 0x00000001 or 0x00000002
 
-In `CroIndex.dat` there's a bias towards 312 times c8 05, 194 times b8 00, 107 times 4e 13, 107 times 00 00 and 82 times 98 00 out of 1964 databases.
+version `01.03` uses 64 bit offsets:
 
-##Files ending in .tad
+    uint64 offset
+    uint32 size     // with flag in upper bit, 0 -> large record
+    uint32 checksum // but sometimes just 0x00000000, 0x00000001 or 0x00000002
 
-The first two `uint32_t` seem to be the amount and the offset to the first free block.
+where size can be 0xffffffff (indicating a free/deleted block).
+Bit 31 of the size indicates that this is an extended record.
 
-The original description made it look like there were different formats for the block references, but all entries in the .tads appear to follow the scheme:
+Extended records start with plaintext: { uint32 offset, uint32 size } or { uint64 offset, uint32 size }
 
-    uint32_t offset
-    uint32_t size     // with flag in upper bit, 0 -> large record
-    uint32_t checksum // but sometimes just 0x00000000, 0x00000001 or 0x00000002
 
-where size can be 0xffffffff (probably to indicate a free/deleted block) some size entries have their top bits set. In some files the offset looks garbled but usually the top bit of the size then is set.
+## the 'old format'
+
+The original description made it look like there were different formats for the block references.
 
-large records start with plaintext: { uint32 offset, uint32 size? }
-followed by data obfuscated with 'shift==0'
+This was found in previously existing documentation, but no sample databases with this format were found so far.
 
 If the .dat file has a version of 01.03 or later, the corresponding .tad file looks like this:
 
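The updated header layout can be checked against real files with a short parser. This is a sketch, not code from the commit: field names follow the struct in the diff, and little-endian byte order is an assumption (the fields as listed total 20 bytes, one more than the stated 19, so treat the sizes as approximate):

```python
import struct

def parse_dat_header(data: bytes) -> dict:
    # magic[9], uint16 unknown, version[5], uint16 encoding, uint16 blocksize
    magic, unknown, version, encoding, blocksize = struct.unpack_from("<9sH5sHH", data, 0)
    assert magic.startswith(b"CroFile\x00")
    return {
        "version": version.decode("ascii"),  # e.g. '01.02', '01.03', '01.04'
        "unknown": unknown,                  # possibly a checksum
        "encoding": encoding,                # 1 = KOD, 3 = encrypted
        "blocksize": blocksize,              # 0x40 = Bank, 0x400 = Index, 0x200 = Stru
    }
```

The returned `version` string then selects between the 32 bit and 64 bit `.tad` entry layouts described above.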
@@ -69,12 +80,12 @@ If the .dat file has a version of 01.03 or later, the corresponding .tad file lo
 
 The old description would also assume 12 byte reference blocks but a packed struct, probably if the CroFile version is 01.01.
 
-    uint32_t offset1
-    uint16_t size1
-    uint32_t offset2
-    uint16_t size2
+    uint32 offset1
+    uint16 size1
+    uint32 offset2
+    uint16 size2
 
-with the first chunk read from offset1 with length size1 and potentially more parts with total length of size2 starting at file offset offset2 with the first `uint32_t` of the 256 byte chunk being the next chunk's offset and a maximum of 252 bytes being actual data.
+with the first chunk read from offset1 with length size1 and potentially more parts with total length of size2 starting at file offset offset2 with the first `uint32` of the 256 byte chunk being the next chunk's offset and a maximum of 252 bytes being actual data.
 
 However, I never found files with .tad like that. Also the original description insisted on those chunks needing the decode-magic outlined below, but the python implementation only does that for CroStru files and still seems to produce results.
 
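The chained read described in this hunk can be sketched as follows. This is hypothetical code: as the text notes, no sample files with this layout were found, and it assumes `size2` counts only the chained parts (not the first chunk):

```python
import struct

def read_chained(f, offset1: int, size1: int, offset2: int, size2: int) -> bytes:
    # first part: a plain read of size1 bytes at offset1
    f.seek(offset1)
    data = f.read(size1)
    # remaining parts: 256 byte chunks, each starting with the next chunk's
    # offset as uint32, followed by up to 252 bytes of actual data
    remaining = size2
    pos = offset2
    while remaining > 0:
        f.seek(pos)
        chunk = f.read(256)
        pos = struct.unpack_from("<L", chunk, 0)[0]
        take = min(252, remaining)
        data += chunk[4:4 + take]
        remaining -= take
    return data
```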
@@ -114,7 +125,14 @@ Interesting files are CroStru.dat containing metadata on the database within blo
     0x7e, 0xab, 0x59, 0x52, 0x54, 0x9c, 0xd2, 0xe9,
     0xef, 0xdd, 0x37, 0x1e, 0x8f, 0xcb, 0x8a, 0x90,
     0xfc, 0x84, 0xe5, 0xf9, 0x14, 0x19, 0xdf, 0x6e,
-    0x23, 0xc4, 0x66, 0xeb, 0xcc, 0x22, 0x1c, 0x5c };
+    0x23, 0xc4, 0x66, 0xeb, 0xcc, 0x22, 0x1c, 0x5c,
+    };
+
+
+given the `shift`, the encoded data: `a[0]..a[n-1]` and the decoded data: `b[0]..b[n-1]`, the encoding works as follows:
+
+    decode: b[i] = KOD[a[i]] - (i+shift)
+    encode: a[i] = INV[b[i] + (i+shift)]
 
 
 The original description of an older database format called the per block counter start offset 'sistN' which seems to imply it to be constant for certain entries. They correspond to a "system number" of meta entries visible in the database software. Where these offsets come from is currently unknown, the existing code just brute forces through all offsets and looks for certain sentinels.
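The two formulas added in this hunk are mutual inverses whenever `INV` is the inverse permutation of `KOD`. A round-trip sketch, using a stand-in permutation rather than the real 256-byte KOD table from the document:

```python
# placeholder permutation, NOT the real KOD table from the document
KOD = list(range(255, -1, -1))
INV = [0] * 256
for i, v in enumerate(KOD):
    INV[v] = i          # INV[KOD[x]] == x, so INV is the inverse table

def decode(data: bytes, shift: int) -> bytes:
    # b[i] = KOD[a[i]] - (i+shift), all byte arithmetic mod 256
    return bytes((KOD[b] - (i + shift)) & 0xFF for i, b in enumerate(data))

def encode(data: bytes, shift: int) -> bytes:
    # a[i] = INV[b[i] + (i+shift)]
    return bytes(INV[(b + (i + shift)) & 0xFF] for i, b in enumerate(data))
```

With the real table substituted for the placeholder, `decode` matches the formula the document gives for KOD-obfuscated blocks.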
@@ -136,11 +154,11 @@ Names are stored as: `byte strlen + char value[strlen]`
 
 The first entry contains:
 
-    byte
+    uint8
     array {
         Name keyname
-        uint32_t index_or_size;   // size when bit31 is set.
-        byte data[size]
+        uint32 index_or_size;     // size when bit31 is set.
+        uint8 data[size]
     }
 
 this results in a dictionary, with keys like: `Bank`, `BankId`, `BankTable`, `Base`nnn, etc.
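A reader for this first entry could look like the sketch below. It interprets bit31-set values as inline data of that size and everything else as a bare record index with no data following; that split, and the cp1251 text encoding, are assumptions, not something the document states:

```python
import io
import struct

def read_name(f) -> str:
    # Names are stored as: byte strlen + char value[strlen]
    n = f.read(1)[0]
    return f.read(n).decode("cp1251")   # encoding is an assumption

def parse_first_entry(data: bytes) -> dict:
    f = io.BytesIO(data)
    f.read(1)                           # leading uint8, meaning unknown
    entries = {}
    while f.tell() < len(data):
        key = read_name(f)
        index_or_size, = struct.unpack("<L", f.read(4))
        if index_or_size & 0x80000000:  # bit31 set -> size of inline data
            entries[key] = f.read(index_or_size & 0x7FFFFFFF)
        else:                           # otherwise -> index of another record
            entries[key] = index_or_size
    return entries
```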
@@ -149,27 +167,27 @@ the `Base000` entry contains the record number for the table definition of the f
 
 ## table definitions
 
-    byte version
-    word16 unk1
-    word16 unk2
-    word16 unk3
-    word32 unk4
-    word32 unk5
+    uint8 version
+    uint16 unk1
+    uint16 unk2
+    uint16 unk3
+    uint32 unk4
+    uint32 unk5
     Name tablename
     Name unk6
-    word32 unk7
-    word32 nrfields
+    uint32 unk7
+    uint32 nrfields
 
     array {
-        word16 entrysize   -- total nr of bytes in this entry.
-        word16 fieldtype   0 = sysnum, 2 = text, 4 = number
-        word32 fieldindex ??
+        uint16 entrysize   -- total nr of bytes in this entry.
+        uint16 fieldtype   0 = sysnum, 2 = text, 4 = number
+        uint32 fieldindex ??
         Name fieldname
-        word32
-        byte
-        word32 fieldindex ??
-        word32 fieldsize
-        word32 ?
+        uint32
+        uint8
+        uint32 fieldindex ??
+        uint32 fieldsize
+        uint32 ?
         ...
     } fields[nrfields]
 
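Because each field entry carries its own `entrysize`, the field array can be walked without decoding every unknown member. A sketch, assuming `entrysize` includes the size word itself:

```python
import struct

def iter_field_entries(data: bytes, offset: int, nrfields: int):
    # yields (entry offset, entrysize, fieldtype) for each field definition,
    # stepping over the unknown members via entrysize
    for _ in range(nrfields):
        entrysize, fieldtype = struct.unpack_from("<HH", data, offset)
        yield offset, entrysize, fieldtype  # 0 = sysnum, 2 = text, 4 = number
        offset += entrysize
```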
@@ -180,9 +198,9 @@ the `Base000` entry contains the record number for the table definition of the f
 
 some records are compressed, the format is like this:
 
-    word16 size
-    byte head[2] = { 8, 0 }
-    word32 crc32
-    byte compdata[size-4]
-    byte tail[3] = { 0, 0, 2 }
+    uint16 size
+    uint8 head[2] = { 8, 0 }
+    uint32 crc32
+    uint8 compdata[size-4]
+    uint8 tail[3] = { 0, 0, 2 }
 
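Splitting such a compressed record into its pieces follows directly from the layout (note `compdata` is `size-4` bytes, so `size` covers the crc32 plus the compressed payload). The document does not state what the crc32 covers, which compressor is used, or the byte order of `size`, so this sketch only slices the fields and assumes little-endian:

```python
import struct

def parse_compressed_record(data: bytes) -> dict:
    size, = struct.unpack_from("<H", data, 0)  # byte order is an assumption
    head = data[2:4]                           # expected { 8, 0 }
    crc, = struct.unpack_from("<L", data, 4)
    compdata = data[8:4 + size]                # size-4 bytes of compressed payload
    tail = data[4 + size:7 + size]             # expected { 0, 0, 2 }
    return {"size": size, "head": head, "crc32": crc,
            "compdata": compdata, "tail": tail}
```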