i86 Projects

back   

ChaOS is an organic bootstrapped self-compiling operating system for Intel/AMD platforms, in continuous development since 1995. It is not a linux derivative. The compiler, linker and source editor which produce the executable image are embedded within the operating system, along with the entire source code and an integral source-level debugger. This makes for a compact development platform which has successfully migrated from 80486 to present-day Intel i86 platforms.

ChaOS is predominantly a text-based system but runs in a choosable VESA graphics mode, so has graphics capability which is used for design software outputting to a CO2 laser. It serves a valuable purpose in preserving custom business software, unaffected by the version changes which plague the mainstream operating system world. ChaOS has evolved to be simplistic yet reliable, and by being different, is completely immune to internet viruses.

Public i86 projects:
{armc1}

ChaOS source language is C, with inline assembler in asm {....} blocks. C types within the source code are enhanced by short-form aliases, to improve readability and speed compilation. These aliases are

:

    CH  signed char
    UC  unsigned char
    SI  signed short, 16-bit little-endian
    UI  unsigned short, 16-bit little-endian
    Sm  signed short, 16-bit big-endian
    Um  unsigned short, 16-bit big-endian
    SL  signed long, 32-bit little-endian
    UL  unsigned long, 32-bit little-endian
    SM  signed long, 32-bit big-endian
    UM  unsigned long, 32-bit big-endian
    SQ  signed long long, 64-bit little-endian
    UQ  unsigned long long, 64-bit little-endian
    SP  signed long long, 64-bit big-endian
    UP  unsigned long, 64-bit big-endian

28.12.17 - A proper DEFLATE algorithm Fixed an error in my coding of DEFLATE when scanning match chains - 258-byte matches were being dropped, instead of being stored in the match table. This fix pushes compression ratio beyond that seen in the DOCX files I am working with, 1.3% better on packages typically compressing to one sixth of original size. At these high compression ratios, the 1.4% gain results in DOCX files which are 10% smaller but which work fine when opened by Microsoft Word16.

Fixing a slight error in the code length frequency distribution squeezes files by a further 0.1%. All this without going down the lazy-match route. The verdict must be that classic DEFLATE algorithms drop potential matches from their hash tables to favour speed over maximum compression.

16.12.17 - A proper DEFLATE algorithm As mentioned earlier, public DEFLATE source code is opaque to me especially when looking at hash table generation. It is largely 16-bit code, written to run in a series of 64k segments. Since memory is no issue in a 32-bit flat linear system, I decided to try a linear table to map occurences of the byte-triplets which serve as the baseline string-match in DEFLATE. Using 16-bit values to point back into the 32k input buffer, this table is 256*256*256*2 bytes in length, i.e. 32Mb, initialized to all zeros. At each step through the input buffer this table can be checked for a previous occurrence of the triplet (i.e. a non-zero value), otherwise the current offset is saved to the table in 1-based format (i.e. plus one). This produces some fodder to output DEFLATE length and distance codes, with a simple loop to check succeeding input bytes to match longer strings.

A two-pass approach is needed, through each input block, first to identify the string matches (each of which is saved to a table for the second pass), and to increment the frequency distributions for the output codes. The Huffman trees can then be generated for those output codes. The second pass is very simple

:
    MATCH* m=matchtable;
    for(n=0;n< matches;n++)
        {
         while(offset< match->offset)
            {
             //output literal symbol, 0 through 255
            }
         //output match->length code
         //output match->distance code
         match++;
         offset+=match->len;
        }
    while(offset< inputblocklen)
        {
         //output literal symbol, 0 through 255
        }

This simplistic approach resulted in a bit more compression on the DOCX XML files, and served to debug the full range of DEFLATE length and distance codes.

With the addition of another table, this time 32k*16-bit offsets (I call this the chain table), when an entry in the main 32Mb table is found to be occupied, offsets can be moved to the chain table and the later match patched in. This results in chains of triplet-matches. For each triplet match encountered in the first pass, these chains can be checked for longer string matches, naturally favouring the nearer matches (shorter distances) which are first in the chains. This is the essence of DEFLATE. This algorithm produces the 75-90% compression ratio we expect. Direct comparison with Microsoft Word16 DOCX compression reveals this algorithm to be about 0.3% worse on the compression ratio. The difference is down to lack of the "lazy-matching" tweak. I prefer speed and simplicity of code over absolute compression.

21.11.17 - ZIP/UNZIP/DOCX Added code to {unzip} to extract a full archive to current directory and subdirectory tree, creating also a ziplist file to record the contents of the archive. Developed {zip} further to process ziplist into a multi-file .zip package, all fairly straightforward. Running UNZIP/ZIP sequence now produces a file which can be loaded by Windows10, but which produces an error when attempting to extract (inflate) parts from the archive (whilst Debian Linux works fine)..

Internet Wisdom suggests the way around this problem (and similar problems with older .zip files and newer Windows versions) is to download a program called 7-Zip, decompress then re-compress the archive, which does work. However I correctly guessed that the Windows issue is with the way in which a zero-length Huffman distance tree is stored in the Deflate block head. Deflate block encoding stores the length of the distance tree, less 1, in 5 bits. Therefore it is not possible to store an empty distance tree, it must have at least one leaf node. Linux allows the symbol for this single leaf node to evaluate to zero; Windows however requires this single leaf node to be non-zero.

With this tweak in place I have a .zip format which is acceptable to Windows10. Therefore I can now disassemble a DOCX into its component parts, then reassemble them for the Windows10/Word2016 environment.

One extra tweak relates to DOCX image components which initially were missing from my reassembled packages. Parts such as JPEG may be stored uncompressed into the ZIP package, because they would not compress down much further, if at all. An extra parameter in ziplist records the compression method (now 8 or 0), so {zip} can reassemble the package with the same compression mix.

17.11.17 - Simple .ZIP file Initial Deflate algorithm complete, using just literal symbols (no distance codes), wrapped into {zip} project to create a simple .zip file with one Local File Header, one Central Directory File Header and one End of Central Directory record. Compression ratio is awful, just 33% but the object at the moment is to get a working DOCX format. Debian Linux accepts the .zip image and displays the single-file contents, and will decompress the output provided the crc32 is correct. Just for the record the crc is performed on the Inflated file image.

13.11.17 - Huffman Encoding Source code in ZLIB ZIP archive is mature and complex, probably easier to write a DEFLATE algorithm for ChaOS from scratch, rather than hacking the Linux source code. Created a debug version of my INFLATE code, to produce a symbol frequency distribution for the first Huffman tree encoded in a sample DEFLATE block (this being the code_length array).

I already know how to build a Huffman tree from the simple array of code lengths in a DEFLATE block - the question is can I create a Huffman tree from a symbol frequency array?

After four days the answer is YES. It is relatively easy to sort and coallesce leaf nodes to produce a Huffman tree, but near impossible to rearrange nodes at each level into ascending symbol order. However the code length array as encoded at the start of a DEFLATE block can more or less be read straight from the tree heirarchy - code length is simply the number of parent links up to the tree root node. A pass though the tree to find code lengths for each symbol also reveals the minimum and maximum code lengths. With this information it is dead simple to generate the huffman bit codes corresponding to each code length, remembering we need the length AND the bitcode to generate a token for the DEFLATE output stream:

    UL  lvl,m,code=0;
    for(lvl=mincode;lvl<=maxcode;lvl++)
        {
         for(m=0;m< syms;m++)
            {
             if(codelen[m]==lvl)
                {                        //ascending syms at each level
                 bitcode[m]=code;code++; //produce adjacent ascending codes
                }
            }
         code<<=1;
        }

9.11.17 - ED/UNZIP/RED Quickly hacked together a selectlist program {unzip} to display contents of ZIP file and decompress content on a keypress a into a rough read-only hack {red} of my source editor {ed}. Added //z: hyperlink format to {ed} similar to //p: hyperlink which invokes my PDF reader in a separate screen area for browsing references while editing source files.

Added global clipboard to ChaOS , copy function to {red} and paste function to {ed}. Now I can transfer ZIP archive content straight to {ed} without cluttering my disks with extracted files. Added parameters to //z: hyperlink to go straight to a nominated file + source line - really useful. Stopped short of a grep search of the ZIP for a text match, but not difficult to do should I ever need to find a needle in a ZIP haystack.

8.11.17 - XML/DOCX/PKZIP Investigating DOCX file format and nibbling at the associated ECMA Specifications to discover that DOCX is in fact a PKZIP archive containing a group of XML parts. Also discovered that compression method within these packages is type 8 DEFLATE, and identical to PDF format except that ZLIB header and trailing checksum are omitted. I already have code running in ChaOS to INFLATE these blocks, so could see the uncompressed DOCX parts within short order.

Further serendipity in downloading a GitHub archive, to have a look at some source code for the DEFLATE algorithm - exactly the same compression is used here too.

6.11.17 - {xhc}/kbd Crafted my first interrupt TRB ring for USB keyboard on XHC. Easiest is to clear the link->toggle flag, so that TRBs can be re-used unchanged next time around. Keyboard Transfer Complete interrupts disrupt the MSD driver, (more work to do here), but code worked straight away on an Asus BeeBox - a quantum step towards running ChaOS on machines with no PS/2 port.

2.11.17 - {xhc}/msd Playing around with SCSI READ16/WRITE16 READCAP/READCAP16 to discover USB->SATA docking stations can actually access sectors above the 2Tb ceiling. Modified {xhc}/msd to invoke READCAP16 when READCAP returns 0xffffffff sectors, and switch to READ16/WRITE16 CBW protocol for drives >2Tb.

1.11.17 - {xhc}/msd Added Mass Storage Device class driver to XHC, now able to back up development partitions to SS USB stick five times faster than USB2.

30.10.17 - {xhc} USB configuration now advanced through getUDD, getproductstring, getconfigdescriptors, enough then to construct an Input Context for a USB Mass Storage Device, including Endpoint Contexts for the standard Bulkin/Bulkout pipes. Thus Slot State is advanced to Configured state, ready to try talking to the device proper.

Command Completion code 17s are returned until the Input Context is filled in properly for this - key points are the Input Context size is encoded in the Slot Context (keep count of max endpoint index when processing Endpoint Descriptors into EndPoint Context blocks) and Input Context->add&1 needs to be set, besides the relevant endpoint bit fields, to flag a Slot Context update.

Cache synchronization is needed before gathering data from the XHC dma buffers - I have tried many ways to do this but only asm{wbinvd} does the trick.

27.10.17 - {xhc} Initial framework now in place, device connect and disconnect events now creating/destroying container xUSB DEVs, with XHC slot allocation, endpoint 0 configuration and transition to XHC Addressed state. Simple Endpoint 0 TRB ring with Link TRB and PCS toggle all working, using a GETDESCRIPTOR request as test fodder.

What caught me out for a while is the aggressive power-saving of USB3, which requires a check of USB2/USB1 link state before attempting any USB traffic. The symptom is Command Completion code 4 (USB transaction error), and the cure is a setlinkstate(15),msdelay(20,setlinkstate(0) sequence which should bring the link state to 3 (running) with a PORTSC Event. I do not fully understand the power saving regime yet, that will come later.

23.10.17 - {xhc} Event Ring and Command Ring now running. xHC hardware reset stop spurious startup interrupts, of course all registers need to be then programmed from scratch. Using NoOp (TRB type 23) to test the command ring, and get the RCS flag working correctly. xHC reports 13 available ports, 1 to 9 are USB2, 10 to 13 are USB3, some are connected internally on my laptop, some are unconnected. xHC PCI[0xd0] is a port routing register to switch ports between the PCH EHC controller and the xHC. Flicking these switches to 1 causes a Port Status change event for each port with a device attached. Copying the contents of PCI[0xdc] (which equals 3) to PCI[0xd8] enables the two marked SuperSpeed external USB sockets for USB3 device connect on port 10 or 11. If a USB2 device is plugged into these sockets it appears on port 1 or 2.

20.10.17 - {xhc} Beginning work on USB3 via {xhc}, a driver for the XHCI controller on my laptop.

First job is to understand and construct an Event Ring, to provide a stream of interrupts and Event TRBs. Although PCI:0x3c register as filled in by BIOS on this device indicates a connection to pic vector 7, this vector number is absent from the LPC Bridge redirection registers, and no PCI pin interrupts come through to my handler. This is fixed by using the Message Signalled Interrupt mechanism, setting MSI Address Register to 0xfee00000 and MSI Data Register to 0x0030/0x0130/0x4030/0x4130/0xc030/0xc130 for a handler on IVT vector 0x30.

The Event Ring is easy enough to set up, remembering that the XHC is the TRB producer, and software is the consmer of events, opposite to the other TRB rings in the device driver scenario. One fact either missing from the documentation, or which I have overlooked in the documentation is that the EHB flag in the Interrupter Dequeue Pointer latches on if this register is written with an invalid address - e.g. for an empty ring, just after setting ERSTBA - anything other than the first ring segment base address will latch EHB. Of course this then generates an interrupt, but with no event - the XHC is protesting about an attempt to advance the dequeue pointer before an event has been produced.

To discover this barely-useful fact, I had made an algorithmic error inside my interrupt handler, advancing the Dequeue pointer too far. The flood of interrupts produced confused me for a while - expecting just one interrupt per second from MFINDEX rollover events but getting 1000 interrupts per second - until one realizes this rate is the default setting for IMOD - the XHC Interrupt Moderation Register.

Also of interest is the programming of 64-bit MMIO registers on the XHC from 32-bit mode, which requires the low-order 32-bits to be written first. The new register value latches when the upper 32 bits are written. This led me to a glaring error in my 64-bit extensions to the current 32-bit i86 compiler, where an indirect 64-bit memory read using common source and destination register reads the first 32-bits OK, then gobbledegook for the second read, having destroyed the source memory address. {cc} compiler now modified with a proper fix for this error, and with low-high ordering on 64-bit writes. This allows MMIO register arrays which include 64-bit registers (such as the XHC) to be declared as structures, reading or writing those MMIO registers using structure member notation.

Finally, to observe the XHC Event Ring producing MFINDEX wrap event TRBs, on my Dell Inspiron laptop the default PCI setting for inactivity timeout needs to be changed, i.e. PCI[0x40]&=~0x00380000; (No timeout). Otherwise the XHC goes into power saving and MFINDEX is frozen. (No events!)