Compressed Linetables
- Posted by petelomax Aug 29, 2014
- 1613 views
Just thought I'd post a fairly simple mini challenge for anyone who fancies it:
I use a LineTab to map machine code offsets to a source code line number. It is a simple flat sequence of integers, with negative values representing a "skip" (most will be -1..-25, but it should cope with -1073741824 or thereabouts) and positive values representing an ever-increasing offset (an upper limit of 1073741823 would be more than enough). Some of these are quite large, one just spotted was #334 (820) entries long, so it is about time that I considered compressing them somehow. Speed is not a significant issue. Here is that 820-entry-long one as an example:
{-637,0,-23,5,6,-2,12,13,14,-71,20,-78,21,27,29,-2,31,35,37,39,44,46,-1,48,53,58,60,66,68,71,74,75,81,91,93,98, 100,-1,110,116,117,-3,123,-1,124,-42,132,-11,133,136,142,146,152,158,161,-1,163,169,172,175,178,180,183,185, 192,203,209,220,226,227,-1,228,-3,234,238,242,245,248,-1,252,-55,258,-4,263,264,266,269,275,-1,281,283,285,287, 293,298,302,306,-1,308,310,-2,312,314,-1,318,321,323,326,332,338,340,-33,342,-2,344,-5,347,350,353,357,359,361, 365,367,369,-1,372,376,-1,378,381,383,385,-1,388,392,395,397,401,405,-33,407,-4,412,413,416,420,423,426,430,437, 443,447,450,452,454,458,460,-1,463,-21,467,-14,470,473,477,481,-3,485,486,487,488,493,497,-23,499,501,-6,502,504, 509,-1,514,521,523,525,532,533,534,535,-7,536,541,542,543,544,-1,545,550,552,-1,559,562,564,567,569,-41,573,-9, 574,575,-10,576,-1,577,-4,583,584,590,-26,592,-1,594,-1,599,607,-11,613,619,622,628,635,637,644,646,647,-1,648, -24,654,-1,659,661,-2,667,671,675,679,683,-1,687,-17,693,-3,695,699,701,703,710,716,-8,720,-3,722,723,-1,724,-1, 725,731,737,740,745,-1,750,752,754,756,759,-1,761,762,-23,763,767,769,775,782,784,790,792,795,798,-4,800,-1,806, 810,812,814,819,821,828,829,-1,830,-3,836,841,-8,843,-2,848,-45,854,-3,856,862,866,-7,869,-3,874,-1,880,881,882, 889,900,906,913,919,920,-1,921,-20,927,931,-4,933,-2,935,938,942,-7,946,-9,952,953,954,956,-1,958,961,964,970, 976,979,981,987,993,999,1002,1008,-19,1009,-3,1014,1016,1019,-6,1022,-3,1028,1032,-1,1035,1038,1040,1045,-19, 1050,1056,1057,1058,1061,1068,1079,1085,1092,1098,1099,1100,-1,1101,-22,1107,1111,1116,1118,-1,1120,1124,1127,-1, 1131,-2,1135,-20,1141,1142,1143,1145,1147,1150,-13,1151,-7,1157,-1,1158,1164,-2,1167,1171,1174,1177,-1,1180,-13, 1183,-3,1188,1189,1190,1191,1192,1199,1206,1212,1219,1225,1232,1238,1239,-1,1240,-32,1246,1253,1259,1266,1273,-8, 1277,-12,1283,1287,1289,-19,1295,-1,1300,1301,1302,1303,-12,1304,-5,1309,1312,1315,1317,1319,1323,1325,1327,-1, 1330,1337,1343,1347,-1,1349,1352,1354,1356,-32,1359,1363,-3,1366,1369,1371,1373,1376,1379,1385,1389,1392,1395, 1399,1405,1409,1413,-41,1417,-3,1419,1425,1428,1431,1434,1440,-12,1446,-7,1447,-1,1450,1456,1462,1468,1474,1477, 1482,-1,1487,-1,1489,1491,1493,1496,-2,1498,1504,1507,-24,1513,1515,1516,-4,1517,-11,1523,1524,1527,-4,1530,-2, 1532,1533,1534,1535,1536,1537,1538,1543,-10,1546,-3,1547,1554,1560,1567,1574,1581,1587,1594,1600,1601,-1,1602,-22, 1608,1612,1615,1618,1621,1624,1627,1630,-10,1633,-10,1639,-1,1642,-7,1643,-1,1645,1651,1652,1655,1660,1662,1668, 1672,1674,1677,1680,1683,1690,1694,1701,-32,1702,-7,1703,-1,1705,1711,1712,1716,1721,1723,1729,1731,1735,1738, 1741,1744,1747,1754,1758,1765,-35,1766,-17,1767,-1,1768,1771,1774,1777,1780,1782,1785,1788,1791,1793,-1,1795,1796, 1798,1800,1802,1804,1806,1808,1814,1816,1821,1823,-1,1828,-38,1829,-3,1830,1836,1838,1842,1847,1849,1852,1854, 1857,1858,-1,1859,1862,1864,1869,-1,1871,-1,1875,1880,1885,1887,1893,1896,1899,1905,1908,1915,1917,1920,-38,1921, -15,1922,1925,1932,1934,-1,1935,-1,1942,-1,1945,-4,1948,1949,-23,1952,1956,-2,1958,1962,-1,1968,1970,1975,1980, -13,1981,-3,1983,1987,-1,1993,1996,1999,-10,2004,-4,2005,2009,-1,2015,-1,2022,-1,2025,-1,2027,-14,2030,-1,2032, 2035,2038,2041,2044,2047,2048,2053,2054,2056,2058,2060,2062,-23,2064,2067,2069,2075,2077,2082,2085,2087,-12,2094, -1,2096,2097,2104,-1,2107,-2,2110,-2,2112,2113,-16,2120,-3,2124,-3,2126,2130,-1,2136,2137,2140,2145,2150,-17,2151, -2,2153,-3,2156,-3,2158,2162,-1,2168,-2,2169,2172,2175,2180,-19,2181,-4,2186,2190,-1,2196,2199,2202,-9,2204,-143, 2209,2214}What that means:
offset 0 corresponds to line 638 (ie -(-637)+1) aka -LineTab[1]+(LineTab[2]>=0) offset 5 corresponds to line 662 (ie -(-637)+1-(-23)+1) aka -[1]+([2]>=0)+-[3]+([4]>=0) offset 6 corresponds to line 663 (ie "" +1) aka "" +([5]>=0) offset 12 corresponds to line 666... (ie "" -(-2)+1) aka "" +-[6]+([7]>=0)To convert an offset to a source line number (which I only need to do when the app crashes) I simply start at the top and add up line numbers until I get to the required offset, not that you really needed to understand that. It is reasonably compact, but 3280 bytes could clearly be improved on, something nearer to 900 should be quite possible, and I would be mightily impressed if someone could jam that lot into 100 or so bytes!
A full-on LZW or whatever approach feels like overkill and could quite probably become intermittently error-prone. The compress() routine from database.e seems like a reasonable starting point (or something entirely different if you fancy), but that "ever-increasing offset" looks like a few "reset base" in the middle or suchlike should help even more. Assuming you need one every 256 bytes, a finger in the air says that saves ~820*0.6*3-50=1426 bytes. Equally (again, not trying to push a solution on anyone) I could see that -637 held as {-128,-128,-128,-128,-125}. But that's enough of my daft half-baked ideas!
The challenge, should anyone be at all interested, is to pack that into the shortest sequence of bytes, and of course unpack said bytes to obtain the original flawlessly.
Bonus points: convert an offset into a line number without uncompressing.