TIMER 0 Overflow ISR code comparison

This is the tiny TIMER 0 overflow Interrupt Service Routine (ISR) contained in Arduino source file wiring.c, version 1.6.10.
ISR(TIMER0_OVF_vect)
{
   // copy these to local variables so they can be stored in registers
   // (volatile variables must be read from memory on every access)
   unsigned long m = timer0_millis;
   unsigned char f = timer0_fract;

   m += MILLIS_INC;
   f += FRACT_INC;
   if (f >= FRACT_MAX) {
      f -= FRACT_MAX;
      m += 1;
   }

   timer0_fract = f;
   timer0_millis = m;
   timer0_overflow_count++;
}
The C code above was compiled with GCC 4.9.2 (avr-gcc.exe dated 7/25/2016) using an 8 MHz clock. It generates the sequence of AVR instructions below. The number of bytes needed for each instruction, the clock cycles it used, the corresponding C instructions, and most other comments were added manually.
000000 <__vector_16>:                     ; Bytes Clocks
   0:   1f 92           push    r1           ;  2  2
   2:   0f 92           push    r0           ;  2  2
   4:   0f b6           in      r0, 0x3f     ;  2  1   Value = 63 = SREG
   6:   0f 92           push    r0           ;  2  2
   8:   11 24           eor     r1, r1       ;  2  1   Set R1 = 0
   a:   2f 93           push    r18          ;  2  2
   c:   3f 93           push    r19          ;  2  2
   e:   8f 93           push    r24          ;  2  2
  10:   9f 93           push    r25          ;  2  2
  12:   af 93           push    r26          ;  2  2
  14:   bf 93           push    r27          ;  2  2

  16:   80 91 01 01     lds     r24, 0x0101  ;  4  2   m = timer0_millis;
  1a:   90 91 02 01     lds     r25, 0x0102  ;  4  2
  1e:   a0 91 03 01     lds     r26, 0x0103  ;  4  2
  22:   b0 91 04 01     lds     r27, 0x0104  ;  4  2
  26:	30 91 00 01     lds     r19, 0x0100  ;  4  2   f = timer0_fract;
  2a:   23 e0           ldi     r18, 0x06    ;  2  1   R18 = FRACT_INC (at 8 MHz)
  2c:   23 0f           add     r18, r19     ;  2  1   f += FRACT_INC;
  2e:   2d 37           cpi     r18, 0x7D    ;  2  1   0x7D = 125 (FRACT_MAX)
  30:   20 f4           brcc    .+8          ;  2 1/2  if (f >= FRACT_MAX) {

  32:   02 96           adiw    r24, 0x02    ;  2  2   m += MILLIS_INC;
  34:   a1 1d           adc     r26, r1      ;  2  1     (the false branch)
  36:   b1 1d           adc     r27, r1      ;  2  1
  38:   05 c0           rjmp    .+10         ;  2  2   (jump to 0x44)

  3a:   29 e8           ldi     r18, 0x89    ;  2  1   f -= FRACT_MAX;
  3c:   23 0f           add     r18, r19     ;  2  1
  3e:   03 96           adiw    r24, 0x03    ;  2  2   m += MILLIS_INC + 1;
  40:   a1 1d           adc     r26, r1      ;  2  1
  42:   b1 1d           adc     r27, r1      ;  2  1

  44:   20 93 00 01     sts     0x0100, r18  ;  4  2   timer0_fract = f;
  48:   80 93 01 01     sts     0x0101, r24  ;  4  2   timer0_millis = m;
  4c:   90 93 02 01     sts     0x0102, r25  ;  4  2
  50:   a0 93 03 01     sts     0x0103, r26  ;  4  2
  54:   b0 93 04 01     sts     0x0104, r27  ;  4  2

  58:   80 91 05 01     lds     r24, 0x0105  ;  4  2   timer0_overflow_count++;
  5c:   90 91 06 01     lds     r25, 0x0106  ;  4  2
  60:   a0 91 07 01     lds     r26, 0x0107  ;  4  2
  64:   b0 91 08 01     lds     r27, 0x0108  ;  4  2
  68:   01 96           adiw    r24, 0x01    ;  2  2
  6a:   a1 1d           adc     r26, r1      ;  2  1
  6c:   b1 1d           adc     r27, r1      ;  2  1
  6e:   80 93 05 01     sts     0x0105, r24  ;  4  2
  72:   90 93 06 01     sts     0x0106, r25  ;  4  2
  76:   a0 93 07 01     sts     0x0107, r26  ;  4  2
  7a:   b0 93 08 01     sts     0x0108, r27  ;  4  2

  7e:   bf 91           pop     r27          ;  2  2
  80:   af 91           pop     r26          ;  2  2
  82:   9f 91           pop     r25          ;  2  2
  84:   8f 91           pop     r24          ;  2  2
  86:   3f 91           pop     r19          ;  2  2
  88:   2f 91           pop     r18          ;  2  2
  8a:   0f 90           pop     r0           ;  2  2
  8c:   0f be           out     0x3f, r0     ;  2  1  Value = 63 = SREG
  8e:   0f 90           pop     r0           ;  2  2
  90:   1f 90           pop     r1           ;  2  2
  92:   18 95           reti                 ;  2  4
The next byte address after the "reti" would be 0x94 = 148 decimal, which means the C routine generated 148 bytes of FLASH memory binary code.
It takes either 93 or 94 clock cycles to execute it, depending on whether the "if" statement is TRUE or FALSE.

Note: The "1/2" count notation on the "brcs" line means "either 1 or 2". Only a single clock cycle is needed if the logic falls through (to "subi ...") but 2 clock cycles are needed if the jump to address 0x3A is taken.



Consider the following code, in AVR Assembly language. For comparison sake, it uses the very same variable names, which are declared so that their relative position to one another is preset. This is done by assigning their addresses. The addresses themselves are not important, but the relative position of the variables to one another is, with timer0_millis immediately following the timer0_fract variable. This code is not contained in wiring.c, but it performs the very same function.
timer0_fract  = 0x180  ; unsigned char (1 byte)
timer0_millis = 0x181  ; unsigned long (4 bytes) 0x181 to 0x184

AHi_T0_fract  = timer0_fract >> 8
ALo_T0_fract  = timer0_fract & 0xFF
AHi_T0_millis = timer0_millis >> 8
ALo_T0_millis = timer0_millis & 0xFF

;  Just FYI:    16 MHz   8 MHz    4 MHz
; MILLIS_INC  =    1        2        4
; FRACT_INC   =    3        6       12
; FRACT_MAX   =  125      125      125   So -FRACT_MAX always = -125 = 0x83

VECTOR_16:                  ; Bytes Clocks
   PUSH    R31                 ;  2  2
   PUSH    R30                 ;  2  2   Only PUSH the 5 Registers needed
   PUSH    R29                 ;  2  2
   PUSH    R28                 ;  2  2
   PUSH    R27                 ;  2  2
   IN      R27, SREG           ;  2  1   Save the FLAGs (machine state) on entry

   LDI     R31, AHi_T0_fract   ;  2  1   Point to the timer0_fract BYTE to work
   LDI     R30, ALo_T0_fract   ;  2  1    with it prior to timer0_millis
   LD      R29, Z              ;  2  2   R28 = timer0_fract
   LDI     R28, FRACT_INC      ;  2  1   
   ADD     R29, R28            ;  2  1   Advance timer0_fract by FRACT_INC
   ST      Z+, R29             ;  2  2   Always save timer0_fract now
   LDI     R28, -FRACT_MAX     ;  2  1   Add 131 to generate CARRY if R29 just
   ADD     R29, R28            ;  2  1    exceeded 125 (FRACT_MAX)
   BRCC    Load_MILLIS_INC     ;  2 1/2  If it didn't, skip the next 2 instruction
   LD      R28, -Z             ;  2  2   Decrement Z without changing CARRY
   ST      Z+, R29             ;  2  2   Replace timer0_fract with rollover value

Load_MILLIS_INC:               ; CARRY is set only if FRACT_INC rolled past 125
   LDI     R28, MILLIS_INC     ;  2  1   The amount to increment the LSByte only
   LD      R29, Z              ;  2  2   Pointing to the timer0_millis LSByte
   ADC     R29, R28            ;  2  1   Increment the LSByte by MILLIS_INC and
   ST      Z+, R29             ;  2  2    save it to SRAM and advance Z to Byte #2
   BRCC    ISR_16_Exit         ;  2 1/2  We're done if no CARRY (most of the time)

   LDI     R28, 0              ;  2  1   Initialize the Register to add below
   LD      R29, Z              ;  2  2   Retrieve timer0_millis byte #2
   ADC     R29, R28            ;  2  1   CARRY is already set; increment R29
   ST      Z+, R29             ;  2  2   Save timer0_millis byte #2
   BRCC    ISR_16_Exit         ;  2 1/2  Done if no CARRY (255 out of 256 times
   LD      R29, Z              ;  2  2    that the low order byte rolled over
   ADC     R29, R28            ;  2  1    into the 2nd byte, which about every
   ST      Z+, R29             ;  2  2    125 clock cycles)
   BRCC    ISR_16_Exit         ;  2 1/2
   LD      R29, Z              ;  2  2
   ADC     R29, R28            ;  2  1
   ST      Z, R29              ;  2  2

ISR_16_Exit:   
   OUT     SREG, R27           ;  2  1   Restore the FLAGs saved in R27
   POP     R27                 ;  2  2
   POP     R28                 ;  2  2
   POP     R29                 ;  2  2
   POP     R30                 ;  2  2
   POP     R31                 ;  2  2
   RETI                        ;  2  4

The Assembly language version requires 82 bytes of FLASH memory binary code, versus 148 for the C version.
The lowest possible clock cycle count is 46, which occurs most (i.e., > 90%) of the time.
  3 cycles are added if the timer0_fract BYTE exceeds 125 (about 1 out of 21 times at 8 MHz)
  7 more cycles are added if the 2nd byte of timer0_millis needs to be incremented (depends on clock frequency)
  6 more cycles are added if the 3nd byte of timer0_millis needs to be incremented (every 65,536th time)
  4 more cycles are added if the High byte of timer0_millis needs to be incremented (every 16,777,216th time)
So, the worst possible case: 66 clock cycles, which occurs once in about 4-2/3 hours.

Why is all this cycle counting important? So, at 8 MHz:
  this interrupt occurs every 0.002048 seconds (= about 488.28) times each second
  this amounts to 1/.002048 * (93 - 46) or upwards of 20,000 clock cycles each second.

At 16 MHz, the interrupt occurs twice that often!
In order to speed up any device, minimize the number of clock cycles taken by frequently-executed code, like ISRs.

The Assembly source is not proprietary, and is released as free software. It is an early rewrite of the Timer 0 overflow code. The Timer0 code now present in MIRTOS differents significantly, maintaining a seconds count as well, adding a timer fine-tuning variable, and recalculating the two ..._INC values once each second to support dynamically changing the MCU system clock from the command line.