ChaOS Diary to Sep 2015

ChaOS Diary - monoblog and links to reference documents

Many golden nuggets lie herein.        Trademarks are acknowledged as copyright of their respective owners

25/9/2015 Intel D510MO Motherboard - No legacy MBR multiboot when UEFI is on!

Built a Mini-ITX box with a bargain basement D510MO motherboard which cost just ten pounds including 1Gb RAM. This motherboard has a UEFI BIOS, so it would serve as a testbed for a ChaOS UEFI boot, or as a low-power server.

It has been a long time since I experienced a problem booting ChaOS, but the D510MO had me stumped for a couple of days. Eventually, I stumbled across some source code in TianoCore which implements the legacy MBR support within the UEFI. This restricts the legacy boot to MBRs with no more than one partition marked as active. For many years now I have used a ChaOS partition table in the MBR with, usually, all 4 partitions marked as active, so that I can boot from any of the 4 partitions - absolutely essential when developing operating systems. If a partition becomes non-bootable during development, I have up to three backup partitions to fall back on.

After cursing TianoCore for this (realising it is nigh impossible to make a small patch to the BIOS image to NOP out this test), I found that this behaviour can be avoided by setting the D510MO UEFI BIOS option to off. Maybe I just found out why this eBay trader was selling the D510MO so cheap! Nevertheless this is seriously annoying. Although there is a rule from the old MSDOS FDISK days that only one partition should be marked as active, this can be enforced in disk formatting software rather than by the BIOS. Worryingly, there may be many more Intel motherboards being flashed with this algorithm. Does Intel think Windows is the only OS out there?
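
The kind of test involved can be pictured with a short sketch. This is not the TianoCore source: it only assumes the classic MBR layout (four 16-byte partition entries starting at byte offset 446, a boot indicator of 0x80 in the first byte of an active entry, and the 55 AA signature at offset 510). The function names and the "at most one" acceptance rule are illustrative assumptions.

```c
#include <stdint.h>

#define MBR_SIZE          512
#define MBR_TABLE_OFFSET  446   /* first partition entry */
#define MBR_ENTRY_SIZE    16
#define MBR_ENTRIES       4
#define MBR_ACTIVE        0x80  /* boot indicator value for 'active' */

/* Count partition entries flagged active.
   Returns -1 if the 55 AA boot signature is missing. */
int mbr_count_active(const uint8_t sector[MBR_SIZE])
{
    int count = 0;

    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;
    for (int i = 0; i < MBR_ENTRIES; i++)
        if (sector[MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE] == MBR_ACTIVE)
            count++;
    return count;
}

/* Hypothetical strict loader: refuses an MBR with more than one
   active entry (assumed rule, for illustration only). */
int mbr_strict_loader_accepts(const uint8_t sector[MBR_SIZE])
{
    int n = mbr_count_active(sector);
    return n >= 0 && n <= 1;
}
```

A ChaOS-style table with all four entries active gives a count of 4, so a loader applying this rule would refuse it, while a multiboot-friendly loader would simply let the user pick any active entry.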

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28738

21/9/2015 {armc0} Crafting the ut type system

Expressions analyzer now recoded to use stack tmps rather than push and pop. Attention now turns to expanding ut and parsing of structure declarations. Spent some time considering how C++ types will blend into the type system, including ways to implement class scope. As types in C++ follow scope visibility rules similar to those for data names, I have reversed the type-matching order for named types (struct, union, enum for now) so that types are matched in the correct scope order.

Seeing there is a problem in my design when using a named type in a structure memberlist - i.e. where to store the namestring - added a new tNAMED type for structure members. The same reasoning applies to function argument lists, where argument names are declared. So function arglists can now be either a list of all tNAMED types, or a list of non-tNAMED types (but not a mixture of the two).
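
A minimal sketch of the "all named or all unnamed" rule for argument lists. Only the tNAMED tag comes from the entry above; the other tags, the flat array representation and the function name are hypothetical and say nothing about how ut actually stores its types.

```c
#include <stddef.h>

/* Hypothetical type tags; only tNAMED is taken from the diary entry. */
enum typetag { tINT, tCHAR, tPOINTER, tNAMED };

/* Returns 1 if the list is all tNAMED or contains no tNAMED at all
   (an empty list counts as valid), 0 if the two forms are mixed. */
int arglist_is_consistent(const enum typetag *args, size_t n)
{
    size_t named = 0;

    for (size_t i = 0; i < n; i++)
        if (args[i] == tNAMED)
            named++;
    return named == 0 || named == n;
}
```

Under this sketch a prototype such as int f(int, char *) and a definition such as int f(int a, char *b) both pass, while a mixture of named and unnamed arguments is rejected.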

As with function declarations, once the {armc0} parser can process structure declaration syntax, I can add code for the structure member postfix operators . and ->.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28738

12/9/2015 {armc0} Upside-down stack frame takes shape

The main consequence of the upside-down stack frame model is that push and pop become problematic, as they shift the stack frame. It is possible to track the address change, but this would require an extra field in the PCODE structure, to carry the frame shift through to the compiler backend. Source level debugging would be very awkward, with these floating local symbol addresses.

My solution is to keep count of temporaries normally created by push, and maxargs, i.e. the maximum argument count for functions called by the current function. The local address offsets closest to SP are then reserved for function arguments using standard C order. Code generation for a function call changes slightly:

    assignment-expression
    push      accumulator
    assignment-expression
    push      accumulator
    call      function
    ADD       SP,8
becomes:
    assignment-expression
    store     [SP+4],accumulator
    assignment-expression
    store     [SP+0],accumulator
    call      function

In fact the new arrangement is easier, as function arguments can be compiled and stored in left-right order, rather than buffered and fed to the pcode stack in right-left C-style order. This is an elegant solution, so I will run with it. The same logic can be applied to other stack temporaries generated during compilation; I will store these just above the function arg spaces.

    {armc0} new stack model 12/9/2015

    -----------------------
    |      outer arg1     |
    -----------------------
    |      outer arg0     |    SP+4+functionargmax*4+localtmps*4+localvariablespace
    -------------------------
    |   return address    |    SP+functionargmax*4+localtmps*4+localvariablespace
    -------------------------
    |  local variable 0   |
    -----------------------
    |  local variable 1   |
    -----------------------
          ..........
    -----------------------
    |  local variable n   |    SP+functionargmax*4+localtmps*4
    -------------------------
    |  local temp     0   |
    -----------------------
    |  local temp     1   |
    -----------------------
          ..........
    -----------------------
    |  local temp     n   |    SP+functionargmax*4
    -----------------------
    -------------------------
    | function arg max-1  |    <-function arguments effectively 'pushed'
    -----------------------        C-style, in reverse order
          ..........
    -----------------------
    | function arg 0      |    SP+0
    -----------------------
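
The offsets in the diagram can be computed as in the sketch below. The field names follow the labels in the diagram (functionargmax, localtmps, localvariablespace); the struct and function names are hypothetical, and 4-byte slots are assumed throughout, as in the diagram.

```c
#include <stdint.h>

/* Totals known only at the end of the function body. */
struct frame {
    uint32_t functionargmax;      /* largest argument count of any callee */
    uint32_t localtmps;           /* number of 4-byte temporaries */
    uint32_t localvariablespace;  /* bytes of local variables */
};

/* Bytes reserved below the return address (the #imm of sub sp,#imm). */
uint32_t frame_size(const struct frame *f)
{
    return f->functionargmax * 4 + f->localtmps * 4 + f->localvariablespace;
}

/* SP-relative offset of outgoing argument i for a callee. */
uint32_t callee_arg_offset(uint32_t i) { return i * 4; }

/* Lowest SP-relative offset of the temporaries area. */
uint32_t temps_base(const struct frame *f) { return f->functionargmax * 4; }

/* Lowest SP-relative offset of the local variable area. */
uint32_t locals_base(const struct frame *f)
{
    return f->functionargmax * 4 + f->localtmps * 4;
}

/* SP-relative offset of the return address. */
uint32_t return_address_offset(const struct frame *f) { return frame_size(f); }

/* SP-relative offset of incoming ('outer') argument i. */
uint32_t outer_arg_offset(const struct frame *f, uint32_t i)
{
    return frame_size(f) + 4 + i * 4;
}
```

For example, with functionargmax=2, localtmps=3 and 8 bytes of locals, the frame is 28 bytes, the temporaries start at SP+8, the locals at SP+20, the return address sits at SP+28 and outer arg0 at SP+32. Every offset is non-negative, which is the point of the model.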

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28737

6/9/2015 {armc0} A C Compiler with upside-down local addressing

Cortex-M0 instruction set constraints are pretty much fatal to my r12 stack pointer model in {armc}, because r8 to r12 are not readily accessible to the 16-bit Thumb instruction set. Tried using r7 as a stack frame pointer, but this too is out because register offsets in 16-bit Thumb ldr and str cannot be negative. Essentially, my idea of creating an Intel EBP-style stack frame on Cortex-M0 is impossible.

To fix this, I am designing armc0, a compiler with an alternate stack model in which all local variable offsets are positive. The answer I think is to delay address generation for local variables until the end of a C function. At this point the local variable stack space requirement is fixed, so positive address offsets exist for all local variables and function arguments, after filling in the #imm for the obligatory sub sp,#imm pcode. Fortunately I have always buffered pcodes in my compilers until the end of fct-body, so changing the access pcodes for local variables from BP to SP is light spanner work.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

31/8/2015 {armc} Legwork

A busy week fleshing out the type system, and implementing initializers. enum and typedef now supported, plus initializers for multi-dimensional arrays as laid out in the language specifications. Also introduced register type specifier, using the spare registers above .registerbank.

Also began dealing with Cortex-M0 constraints as regards immediate constants. Extended the '_p' prefix to generate literal pool variables for immediate values not encodable in the Thumb instruction set.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

22/8/2015 {armc} The first C functions

Arm Thumb C function call mechanism now in place, using ldr reg,(literal), blx reg instructions (rather than PC-relative branch instruction) so that calls can be made to anywhere in 32-bit address space.

Using STM32F411 as the testbed, settled on r12 as a stack frame pointer, analogous to EBP in Intel speak (as used in ChaOS). Local data and function arguments now accessible to the expressions analyzer, thus generating my first true ARM C functions.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

21/8/2015 {armc} Subordinating the assembler

With ut in place, and before going hell-bent on compiling C functions, I spent a couple of days rewriting the assembler to use the C symbol tables for identifier matches, rather than a separate namespace. I have used this idea to great effect in ChaOS, it saves a lot of grief when interfacing assembly code with C code. As long as you remember how dumb an assembler is when it comes to linking symbols together, things work fine - in other words - use overloaded function names in the assembler at your peril.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

19/8/2015 {armc} ut type system

Spent a few days immersed in C type declarations, determined that the new ut type system should be able to handle the more complex types such as int (*)[4] and int (*(*pf)[4])(int a,int b) which escaped my understanding till now. So now I can move forward, certain in the knowledge that should I ever need to create a pointer to an array of 4 pointers to a function taking two arguments of type int and returning int, I have a type system equal to the challenge.
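
For readers who want to see the two declarations from this entry at work, here is a small standard C sketch. The names add, sub, optable, call_through_pf and sum_row are made up for the illustration; only the two type shapes come from the entry.

```c
#include <stddef.h>

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* An array of 4 pointers to function(int,int) returning int. */
static int (*optable[4])(int a, int b) = { add, sub, add, sub };

/* pf: pointer to that whole array, the second declaration in the entry. */
static int (*(*pf)[4])(int a, int b) = &optable;

/* Call entry i of the table through pf: dereference pf to reach the
   array, index it to reach a function pointer, then call it. */
int call_through_pf(size_t i, int a, int b)
{
    return (*pf)[i](a, b);
}

/* int (*)[4] used as a parameter type: rows points to arrays of 4 int,
   so rows[r] is row r of a two-dimensional array. */
int sum_row(int (*rows)[4], size_t r)
{
    int total = 0;

    for (size_t i = 0; i < 4; i++)
        total += rows[r][i];
    return total;
}
```

Reading the declarator inside out gives the same wording as the entry: pf is a pointer, to an array of 4, of pointers, to functions taking (int, int) and returning int.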

At the practical level, this new type parser also recognises and encodes embedded identifiers, whether as function arguments or more importantly as the identifier in a declaration. Just one flag, gotident, therefore determines whether the parser scanned a bare type name (as needed for a type cast) or a data declaration. And the parser is properly recursive, creating or linking in all the necessary subtypes in complex declarations as the recursion unwinds, to return the complete type as a simple offset into the ut array.

So the parser is now capable of encoding a full function prototype, with argument names, to arrive at the most interesting point in C program compilation: fct-body. In other words, on parsing an opening brace rather than a semicolon after a function declaration, the parser can enter compound-statement, and begin generating code.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

14/8/2015 {armc} ut complex typenames

Array handling now just about working, including pointer add and subtract. Moving towards function declarations I discover that the construct int(*)[2] is the typename for pointer to array of 2 int, a fact that completely went over my head when I started writing my first C compiler many years ago. So it is a construct which would cause an error in my ChaOS compiler - but I am anxious to make sure there are no similar flaws in the {armc} type parser. Function pointers are declared the same way, so this moment of enlightenment came at just the right time.
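
A short standard C sketch of what pointer add and subtract mean for the int(*)[2] type: the pointer steps over whole 2-element arrays, not single ints. The function names are hypothetical.

```c
#include <stddef.h>

/* Distance in bytes between p and p+1 for a pointer to array of 2 int. */
size_t step_of_pointer_to_pair(void)
{
    int pairs[3][2] = { {1, 2}, {3, 4}, {5, 6} };
    int (*p)[2] = pairs;

    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Pointer subtraction counts whole arrays of 2 int, not ints. */
ptrdiff_t pairs_between(int (*first)[2], int (*last)[2])
{
    return last - first;
}

/* Dereferencing yields the array; indexing it yields an element. */
int second_of(int (*p)[2])
{
    return (*p)[1];
}
```

So a compiler has to scale the add or subtract by sizeof(int[2]), which is why the type parser needs to carry the array dimension inside the pointed-to type.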

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

11/8/2015 {armc} A C compiler for ARM Thumb

On with the legwork. Added code to parse array declarators, and enter doarrayindex() at postfix '[' operator. Added stack and SRAM windows to {arm} Live Debug to view and debug compiler output.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

6/8/2015 {armc} A C compiler for ARM Thumb

It has been coming for a while now. To short-circuit the transition from {ARM assembler with inline C blocks} to {C compiler with inline ARM asm blocks} I have just tried inserting a call to declarationstatement() inside the assembler itself (to be able to blend C code with assembly statements). This idea is not going to work easily, only a compiler writer would understand why.

So I have started {armc}, copied from {armt} and initially changing just the outer parsing loop, to enter the assembler only from within an asm { ... } construct, otherwise calling declarationstatement() (is a C program just a sequence of declarations?).

As an exercise, I attempted to create an I2C program to interface a Nucleo board with an LCD panel. Though not entirely successful (only flickers were seen from the LCD panel) it demonstrates that the parsing strategy, generation of code etc are broadly correct.

So back to the legwork of the C expression analyzer. Not mentioned so far is the work that has been going on to design the register model for this compiler. To break away from the two-register model of the ChaOS compiler, and provide a flexible design which can vary the size of register set used, I have tried a number of different designs so far. My compiler benchmark/nemesis has always been the expression

        *ptr++=*ptr1++;

My general rule of thumb is: if the register model works for this C expression, it is not far wrong.
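
To show why this expression is a good stress test, the sketch below places it in a copy loop and then spells out one plausible sequence of the loads, increments and stores a register model has to juggle. The ordering in the expanded version is only an illustration, not the code either compiler emits, and the function names are hypothetical.

```c
#include <stddef.h>

/* The benchmark expression used directly in a copy loop. */
void copy_words(int *ptr, const int *ptr1, size_t n)
{
    while (n--)
        *ptr++ = *ptr1++;
}

/* The same work with each step made explicit: two addresses must be
   kept alive while both pointers are updated and a value is in flight. */
void copy_words_expanded(int *ptr, const int *ptr1, size_t n)
{
    while (n--) {
        int *dst = ptr;          /* lvalue address: old value of ptr  */
        ptr = ptr + 1;           /* post-increment side effect        */
        const int *src = ptr1;   /* rvalue address: old value of ptr1 */
        ptr1 = ptr1 + 1;         /* post-increment side effect        */
        int value = *src;        /* load the rvalue                   */
        *dst = value;            /* store through the saved address   */
    }
}
```

Both functions produce the same result; the expanded form needs three live values at once (dst, src and value) besides the two pointer variables.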

Another general rule I have is to try to generate code in a single pass, not attempt to re-order the source code to make it easy on the compiler. This means for the expression

        a=b=c=d;

I prefer to create temporaries for the addresses of a, b and c to be used in the eventual right-to-left store, rather than scan forward to compile c = d first. Luckily my simplistic two-register ChaOS model is analogous to the RISC architecture in ARM, where address values must be in a register before a memory load or store operation. For a=b my compilers produce:

        {cc}  no optimizations          {armc}, registerbank=2
        ----------------------        ----------------------
        mov  eax,&a                     ldr  r00,&a     //from pool literal
        mov  ecx,&b                     ldr  r01,&b
        mov  ecx,[ecx]                  ldr  r01,[r01]
        mov  [eax],ecx                  str  r01,[r00]
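
The single-pass treatment of a=b=c=d described above can be mimicked in C as follows. It is a sketch assuming all four operands are plain int; the function and temporary names are hypothetical.

```c
/* Equivalent of a=b=c=d compiled left to right in one pass. */
int chain_assign(int *a, int *b, int *c, int d)
{
    int *tmp0 = a;     /* address temporaries, created in source order */
    int *tmp1 = b;
    int *tmp2 = c;
    int value = d;     /* the rvalue is reached last                   */

    *tmp2 = value;     /* the stores then run right to left            */
    *tmp1 = value;
    *tmp0 = value;
    return value;      /* value of the whole expression                */
}
```

With more names in the chain than spare registers, the address temporaries are what end up in stack slots.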

So far I have tried a sliding accumulator technique, but settled back on a fixed accumulator at r00, with one or more registers available for the rvalues in an expression. However many extra registers you have, there will be an occasion when you run out, and need temporary storage. In the ChaOS compiler these temporaries are created by pushing values on to the stack. So in {armt} and {armc}, I will do the same.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

4/8/2015 {armt} Relative and Unary operators

Fleshed out {armt} expression analyzer, adding TMPBOOL (17/7/15) to support the relative operators, and support for unarynot, unaryneg, and the bitwise complement operator. Used TMPBOOL and the long-planned JCOND pcode to implement the conditional branches needed to support the && and || operators.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

1/8/2015 STM32F411 FPU core, {arm} Live Debug

Added code to enable coprocessor access for the FPU (CP10 and CP11 on the Cortex-M4), uploaded a code nugget to move coprocessor registers to ARM core registers, then transmit the results over ST-Link. Added FPU core register display to {arm}. Toolchain is now ready to watch FPU instructions.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

29/7/2015 STM32F411 cold start, {arm} Live Debug

Cold/warm start oscillator startup written and debugged for STM32F411. Reset by the ST-Link is not the same as a power off-on cycle, because the ST-Link leaves the oscillator running at 96MHz during the reset. Using my {arm} Live Debug, MCO and oscilloscope this took under 2 hours, compared to at least 18 hours when I did this for the SAM3X on the Arduino Due a few months ago.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

29/7/2015 Tufnol gear-cutting on CO2 Laser

With linearity improvement in place on the laser CNC, I revisited an old idea (once tried) to cut a helical print gear. Several passes are required with advancing focus depth. The dual-pass engraving method (14/7/15) has already been improved, now using adjacent columns of a 16x16 array of values to map artwork pixels to laser power. In this test, 15 overpasses were performed on Tufnol using an artwork with different grey levels representing the laser power level. The result is a 3D cut with pretty much the correct tooth geometry. Several similar passes using artworks with progressively narrower teeth produce a very useful-looking Tufnol gear.

Careful measurement of the finished 67T 20PA gear (done by inking the gear teeth, then rolling the gear along a straight-edge on a piece of paper) showed an actual pressure angle of 22.4 degrees, an error of 2.4 degrees. The error is quickly shown to be 180/67, being the error in transferring a flat triangular artwork to the gear circumference. Therefore for this operation the artwork angle a needed for a gear wheel with T teeth and pressure angle PA is given by:

        a=PA-(180/T);
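
As a worked example of the formula, with angles in degrees (the function name is hypothetical):

```c
/* Artwork angle needed so that the cut gear ends up with pressure
   angle pa_degrees, for a gear wheel with the given tooth count. */
double artwork_angle(double pa_degrees, int teeth)
{
    return pa_degrees - 180.0 / (double)teeth;
}
```

For the 67T 20PA gear, 180/67 is about 2.69, so artwork_angle(20.0, 67) comes to roughly 17.31 degrees.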

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

28/7/2015 STM32F411 Flashwrite

Flash write for the more capable STM32F411 is now working, but using Erase All is time-consuming (due to the large flash memory capacity), so I added a Page Erase command to erase 64k blocks as needed. Reduces erase time for my short development programs from 8 seconds to 1 second.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

26/7/2015 STM32F030 startup sequence, {arm} live debugging

Studied STM32F030 documents in order to set up GPIO for simple LED blink, to have visible feedback from the Nucleo board as well as the debug register state. Used this simple toolchain to develop a cold start sequence for STM32F030 - going through the gears to set the device running at 64MHz (and beyond) using the PLL clock (with LED blink rate to prove it!). Then programmed the MCO output, AHB enable and associated GPIO to be able to see the various clock signals on the oscilloscope. The different GPIO port output mode settings make a lot more sense when viewed this way.

{armt} is already creaking. Inline data is essential - it is the only way to get arbitrary 32-bit values into the Cortex-M0. I need a way to get 32-bit SRAM addresses into the unit (these are the addresses of modifiable data items). The assembler needs to maintain a list of values needed by the program, then dump these into the instruction stream at a convenient time, but I am not sure this will be worth the effort. So I add a quick literalpool hack. I choose the prefix '_p' to signal to the assembler that a symbol is a forward reference to a literal value in the code stream. Then I add a new .pool directive, to write out the SRAM addresses of the corresponding data symbols (32-bit aligned), whilst filling in the target addresses for ldr reg,literal instructions.
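
The fix-up step for one ldr reg,literal instruction might look like the sketch below. It is not the {armt} source. It assumes the 16-bit Thumb LDR (literal) encoding as I understand it from the ARMv6-M architecture manual: opcode 0x4800 | Rt<<8 | imm8, with the literal read from Align(PC,4) + imm8*4, where PC reads as the instruction address plus 4. The function name and error convention are hypothetical.

```c
#include <stdint.h>

/* Build the 16-bit Thumb 'ldr rt,[pc,#imm]' opcode for an instruction
   at insn_addr that loads the 32-bit literal stored at pool_addr.
   Returns 0 on success, -1 if the literal cannot be reached. */
int thumb_ldr_literal(uint32_t insn_addr, uint32_t pool_addr,
                      unsigned rt, uint16_t *opcode)
{
    uint32_t base = (insn_addr + 4u) & ~3u;   /* Align(PC, 4) */
    uint32_t offset;

    /* Only r0-r7, word-aligned literals, and forward references. */
    if (rt > 7u || (pool_addr & 3u) != 0u || pool_addr < base)
        return -1;
    offset = pool_addr - base;
    if (offset > 1020u)                       /* imm8 * 4 limit */
        return -1;
    *opcode = (uint16_t)(0x4800u | (rt << 8) | (offset >> 2));
    return 0;
}
```

The 1020-byte forward reach is what forces a pool to be dumped into the instruction stream at regular intervals, and the alignment test is why the pool itself must be 32-bit aligned.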

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

24/7/2015 STM Nucleo evaluation boards, {arm} and Cortex-M4 and Cortex-M0 Single-Step Debug

Created separate assembler .include files for the STM32F030 and STM32F411, moving device-dependent info to where it belongs, whilst adding extra .dot directives and .raw file header fields to keep on top of all the differences.

Crafted flashwrite for the STM32F030, to write my .raw program images to the device, verified by readback using flashread.

Added no-arg option to {arm} disassembler, to detect connection to Nucleo board and upload flash contents back into the disassembler. This is a great way to see the code generated by the {armt} assembler and expression analyzer. Added some quick hacks to generate ST-Link commands for resetting the target processor, and execute single-step with register display update.

My ARM debugger is born!

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

19/7/2015 STM Nucleo evaluation boards, ST-Link Flash write

As touched on earlier, flashwrite over ST-Link requires an uploaded code nugget to do the dirty work, i.e. trigger the half-word write cycle needed for the Flash memory controller on the STM32 chips. In conjunction with sequences to perform Flash Unlock, and Flash Erase, and using flashread to read back memory contents over the ST-Link, managed to achieve a simple multiple half-word flash write sequence. So now I know it can be done.

On the downside, some time was lost tracking down exceptions caused by invalid opcodes when programming the STM32F030 Cortex-M0 processor.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

18/7/2015 STM Nucleo evaluation boards, ST-Link

First glimmer of communication observed; exhaustive testing shows a simple but strict initialization sequence puts the board into SWD mode:

        observe initial power-up mode is 1 (USB_MASS)
        send killer command 0xf220a3
        observe mode is now 2 (DEBUG)

This is all done using my EHC/ard command ardsend (which transmits command-line arg data on the USB bulkout pipe) and bulkin pipe polling. With this, ST-Link is now live, and I soon have processor run, halt, reset, register read and memory read all up and running. Useful command sequences are quickly being absorbed into my EHC USB -> ST-Link gadget driver, sufficient already to support a flashread command to retrieve the STM32 device memory contents.

Flash write over the ST-Link is a little more complicated - it requires code to run on the target processor - so a small code thunk has to be written to SRAM, registers have to be set correctly, then a processor run command executed. Nevertheless, the compilation, installation and running of such code over the ST-Link is a great exercise for improving {armt}.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

17/7/2015 STM Nucleo evaluation boards, ST-Link

Received Nucleo boards STM32F030 and STM32F411, my new {armt} testbeds. I am strongly interested in the SWD (Serial Wire Debug) which is available through the USB interface on these boards. Straight away started on a USB gadget driver for the ST-Link USB vendor-model signature, to begin throwing commands at the interface and observing what happens. ST-Link is well-documented, so here goes...

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

17/7/2015 {armt}.nc - expression analyzer - Relative operators

With pcodes now in place for the bitwise and arithmetic operator levels, my attention turns to coding the relative operators (<, <=, == etc). In the ChaOS compiler these operators generate code to place a value 0 or 1 in the accumulator, which serves both as a test value for branch code (e.g. if(n < m)) whilst also being the correct value should the value of an expression need to be computed (e.g. a=n < m;). This simplistic approach has served me well, but a better way would be to introduce a new TMPTYP into the expression analyzer, which can be carried through to test() or fetch().

TMPTYPBOOL would be encoded with a value indicating the test which was performed, arranged in an enumeration with complementary tests in adjacent positions (EQ,NE,GT,LE,GE,LT etc, in ARM language). The condition can then be inverted by XORing the condition code bit 0. (I am thinking forward to the implementation of a full C compiler here, and code needed to support flow control statements).
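
A minimal sketch of that enumeration and the XOR trick. The enumerator names and numeric values are hypothetical, following the order listed above; the real ARM condition field uses different numbers but pairs complementary conditions the same way (EQ/NE, GE/LT and GT/LE differ only in bit 0).

```c
/* Complementary tests sit in adjacent even/odd positions. */
enum cond {
    COND_EQ,   /* 0 */
    COND_NE,   /* 1 */
    COND_GT,   /* 2 */
    COND_LE,   /* 3 */
    COND_GE,   /* 4 */
    COND_LT    /* 5 */
};

/* Invert a condition by flipping bit 0 of its code. */
enum cond cond_invert(enum cond c)
{
    return (enum cond)(c ^ 1);
}
```

A compiler can then emit the inverted test for if(n < m) to branch around the body, i.e. jump on cond_invert(COND_LT), which is COND_GE.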

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

15/7/2015 {armt}.nc - expression analyzer - Multiply, Divide and Modulo operator

Steadily filling in the gaps in the expressions analyzer. There is no ARM modulo instruction (usually remainders are a by-product of a DIV operation), so the workaround I came up with is a division, followed by a multiplication, then a subtraction to find the result:

    To achieve a % b

    udiv tmp,a,b
    mul  tmp1,tmp,b
    sub  a,a,tmp1

Not too fussed at the moment about the exact behaviour of this code in the various signed/unsigned number combinations - this is sufficient to get this section of the expression analyzer filled in.
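
The same three steps written in C, as a sketch for checking the arithmetic. The function names are hypothetical. The signed variant relies on C99 division truncating toward zero, so the result takes the sign of the dividend, as the C % operator does; b must be non-zero in both, and the signed case of the most negative value divided by -1 is left out.

```c
#include <stdint.h>

/* a % b for unsigned operands via divide, multiply, subtract. */
uint32_t umod(uint32_t a, uint32_t b)
{
    uint32_t tmp  = a / b;      /* udiv tmp,a,b    */
    uint32_t tmp1 = tmp * b;    /* mul  tmp1,tmp,b */
    return a - tmp1;            /* sub  a,a,tmp1   */
}

/* Signed version: same sequence with a signed divide. */
int32_t smod(int32_t a, int32_t b)
{
    int32_t tmp  = a / b;       /* signed divide, truncates toward zero */
    int32_t tmp1 = tmp * b;
    return a - tmp1;
}
```

For example umod(17, 5) gives 2, and smod(-7, 2) gives -1, matching -7 % 2 in C.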

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

14/7/2015 CO2 Laser

Revived an old control in the D10 software to double-cut each pixel column, effectively changing cut resolution from 600x600 dpi to 1200x600 dpi. Cutting is much cleaner and extraction is better, I am guessing because detached dust particles are smaller and therefore more mobile in the airflow.

Also added potentiometer to laser PSU, to control discharge current through the tube, another way of controlling laser cut depth, and easier than messing with motor speeds. Ordered milliammeter for permanent display of tube current.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

13/7/2015 CO2 Laser

Improved cutting performance with the new laser tube has a downside - gradual blockage of the dust extraction results in reduced cutting depth, becoming really significant after about 9 hours. The way around this is probably to run two fast engraving passes at reduced power, with a nozzle clean and extractor shakedown in between. Happily the new encoder system provides an easy way to resynch the work for another pass. Added a quick resynch command to D10 to find TDC on the encoder.

Update 14/7/2015: Fast-pass engrave attempted, 6.5 hour duration after thorough cleaning of all apparatus. Results very good, depth adequate so second pass not required for this roller. Added video camera hooked into CCTV system, to monitor the machine from home.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

11/7/2015 {armt}/{avrx}/CO2 Laser encoder

Programming Arduino circuit to feed encoder information to D10/D10a via spare lines on parallel port. Having introduced a feedback loop to the laser, it is only a small job to use this information to detect motor stall/lost steps whilst engraving and shut down before damaging the work.

Using {avrx} for programming the Arduino, I am thinking that an ARM microcontroller would make a better fist of the maths needed to detect stepper drive non-linearity, and adjust step timing accordingly. Also at this time the BBC announced the MicroBit - using an ARM Cortex-M0. So development has to turn full on to my ARM Thumb assembler - this is where the future will lie.

Ordered a couple of STM Nucleo development boards, the F411 Cortex-M4 and F030 Cortex-M0 version, just to get a handle on this trend.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724

8/7/2015 CO2 Laser, Nema42 drive

Replaced 3:1 Berger Lahr y-axis stepper drive with direct drive Nema42 motor, and connected through double universal joint - i.e. constant velocity joint. This eliminates nonlinearity in the 3:1 belt drive and pulleys, which were dependent on the roller stub not being bent or damaged.

* direct diary upload via FTP1 (encrypted!) from 4Gb CFS SATA partition saa3: over ChaOS version 1.03.28724