Some ways to make an embedded system safer in the Covid-19’s times (II).

In the previous post I started a series of tips on how to make our embedded systems safer and more reliable. In this post we will discuss five more.

6 Use a watchdog

A program might get stuck in a block of instructions for a variety of reasons. Some of them are:

  • Memory gets corrupted.
  • A hardware peripheral (internal or external) fails and stops responding.
  • In a real-time system, a higher-priority task has monopolized the CPU, or the system fell into a priority-inversion state.

No matter what the problem was, we need to handle it: the system must recover or, at the very least, get into a safe state.

A watchdog is a timer, external or internal to the chip, that must be fed (tickled, kicked) within a time limit. If the program fails to feed it, the watchdog resets the system. So if the program gets stuck in an infinite loop for any of the reasons given above, it will not be able to feed the watchdog and, as a consequence, the system will reset.

An external watchdog is completely independent from the chip, whilst an internal watchdog shares some dependencies with it (power, clocks). If the overall system allows it, prefer the external one; otherwise use the internal one with caution.

Watchdogs can also be single or windowed. While a single watchdog must be kicked before its timeout expires, a windowed watchdog must be kicked inside a time window: not before the window opens and not after it closes.
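To make the difference between a single (non-windowed) and a windowed watchdog concrete, here is a minimal host-side model in C. It is only a sketch: the names wdt_model_t and wdt_model_kick and the millisecond values are made up for illustration, and a real watchdog does this job in hardware, not in software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical software model of a watchdog (illustration only).
typedef struct
{
   uint32_t window_open_ms;  // earliest allowed kick (0 for a single watchdog)
   uint32_t window_close_ms; // latest allowed kick (the timeout)
   uint32_t last_kick_ms;    // timestamp of the last accepted kick
} wdt_model_t;

// Returns true if the kick is accepted, false if a real watchdog would
// have reset the system (kick too early or too late).
bool wdt_model_kick( wdt_model_t* wdt, uint32_t now_ms )
{
   assert( wdt != NULL );

   bool accepted = false;

   // unsigned subtraction stays correct when the millisecond counter wraps
   uint32_t elapsed = now_ms - wdt->last_kick_ms;

   if( elapsed >= wdt->window_open_ms && elapsed <= wdt->window_close_ms ){
      wdt->last_kick_ms = now_ms;
      accepted = true;
   }

   return accepted;
}
```

With window_open_ms set to 0 the model behaves like a single watchdog; any other value rejects kicks that arrive too early, which is how a windowed watchdog catches a program that is running too fast or looping on the feed instruction.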

Some internal watchdogs can fire an interrupt before the system resets. We can take advantage of this situation by putting the system into a safe state before it resets.

The ATmega328 chip, which is the core of the Arduino Uno boards, includes a single watchdog. And it can run in one of three modes:

  • Interrupt
  • System reset
  • Interrupt and system reset

Although the Arduino platform does not include direct support for the watchdog, we can still use it. All we have to do is call the functions declared in wdt.h, which is part of the AVR C library (avr-libc) used by the Arduino toolchain:

#include <avr/wdt.h>

wdt_enable( TIME_OUT ); // configuration; TIME_OUT is one of the WDTO_* constants, e.g. WDTO_2S
wdt_disable();          // should be used only under very specific conditions; better to avoid it
wdt_reset();            // feed the dog

With this library we cannot configure the watchdog in any mode other than system reset. I did not find a way to set the interrupt mode through it (you might take a look at the file avr/wdt.h), so we must live with that.

A word of caution: never feed the watchdog from an ISR or a periodic function. That’s not going to work, because a timer interrupt can keep firing (and keep feeding the dog) while the main program is stuck. The use of a watchdog is an art, and before you feed it you must check that the critical conditions in the program have been met.
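One common way to check those conditions is to let every critical activity "check in" and to feed the dog only when all of them have done so. The following is a minimal sketch of that idea, not a complete watchdog strategy. The flag names, watchdog_checkin(), watchdog_service() and hw_feed() are hypothetical; on the ATmega328, hw_feed() would simply call wdt_reset(), but here it is a stand-in so the code can run on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// One flag per critical activity (hypothetical names)
#define CHECK_SENSOR  ( 1u << 0 )
#define CHECK_CONTROL ( 1u << 1 )
#define CHECK_COMMS   ( 1u << 2 )
#define CHECK_ALL     ( CHECK_SENSOR | CHECK_CONTROL | CHECK_COMMS )

static uint8_t  checkins   = 0; // flags reported since the last feed
static uint32_t feed_count = 0; // host-side stand-in, see hw_feed()

// Stand-in for the hardware access. On the ATmega328: wdt_reset().
static void hw_feed( void )
{
   ++feed_count;
}

// Each critical activity calls this once it has finished successfully.
void watchdog_checkin( uint8_t flag )
{
   assert( ( flag & ~CHECK_ALL ) == 0 ); // reject unknown flags
   checkins |= flag;
}

// Called from the main loop (never from an ISR). Feeds the dog only if
// every critical activity has checked in since the last feed.
bool watchdog_service( void )
{
   bool fed = false;

   if( ( checkins & CHECK_ALL ) == CHECK_ALL ){
      hw_feed();
      checkins = 0;
      fed = true;
   }

   return fed;
}
```

If any single activity hangs, its flag never shows up, watchdog_service() stops feeding, and the watchdog eventually resets the system.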

Watchdog theory and practice is a topic of its own and deserves a longer article. Nonetheless, I have curated some interesting must-read online resources written by embedded systems gurus. I hope you find them as interesting as I did.

Finally, on this subject, we should also use the Power-on reset and Brown-out features of modern processors. Power-on reset holds the reset line in its active state until the power supply reaches a stable and safe level, so the processor starts safely. A Brown-out condition occurs when the power supply falls below a threshold level, and either an interrupt or a reset (or both) is fired. The ATmega328 chip includes both; however, like the watchdog library, they are not part of the Arduino API, but you can still use them. And please, use them! Remember the kind of firmware we are tackling.

7 Make your system real time

This is a good one. The most critical parts of your system must be executed even while less important ones are running. A blocking function like delay() hogs the CPU until its timeout expires, and no other function can do anything useful in the meantime, even if that other function is critical to the system. The only way to break this behaviour is to interrupt, somehow, the function that has monopolized the CPU and yield control to the interrupt service routine.
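A complementary habit (not a replacement for interrupts or an RTOS) is to avoid blocking waits in your own code, so that the main loop keeps visiting every job. Below is a minimal sketch of a non-blocking timer check in C. The function name period_elapsed() is hypothetical, and the tick value is passed in as a parameter so the code also runs on a PC; on Arduino you would pass millis().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Returns true once every 'period_ms'. 'now_ms' is the current tick count
// (millis() on Arduino); '*last_ms' remembers when the period last elapsed.
bool period_elapsed( uint32_t now_ms, uint32_t* last_ms, uint32_t period_ms )
{
   assert( last_ms != NULL );

   bool elapsed = false;

   // unsigned subtraction stays correct when the tick counter wraps around
   if( (uint32_t)( now_ms - *last_ms ) >= period_ms ){
      *last_ms = now_ms;
      elapsed = true;
   }

   return elapsed;
}
```

In loop() you would write something like if( period_elapsed( millis(), &last_blink_ms, 500 ) ){ ... } instead of delay( 500 ). The function returns immediately, so the rest of the loop, including the critical parts, keeps running.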

Can Arduino become a real-time system? To some extent, yes. We have two options: use an interrupt-driven system, or use a real-time operating system. Arduino is not, and was never meant to be, real time; that’s why access to interrupts is limited, and some peripherals (like the ADC module) do not use interrupts: they poll instead (blocking the system, of course).

We might struggle with the interrupts in Arduino or, better yet, we can use a pre-emptive real-time operating system, like FreeRTOS. Pre-emptive means that tasks with higher priorities (more important in the eyes of the application) will preempt (take control from) tasks with lower priorities (less important).

I have been working on a project called The Molcajete project, which aims to bring together Arduino and FreeRTOS. It also lets you compile from the command line, build multi-file projects, and program from the main() function itself. Of course there are more options to choose from, but the point here is that a system that is supposed to respond to critical events within strict time limits must be real time, through interrupts or a pre-emptive operating system.

Hint: If you choose the pre-emptive real-time operating system solution, then don’t use dynamic objects; use static ones instead. I wrote an entry on creating and using static tasks.

8 Make your functions have only one exit point when possible

Do not scatter the function’s exit points. We are all used to putting a return at whatever point we know the program must leave a function. Although this is not bad in itself, it might become a nightmare when the time comes to maintain the program. For example, instead of:

bool foo( uint8_t arg )
{
   if( arg > 128 ){
      return true;
   } else{
      return false;
   }
}

We should do this:

bool foo( uint8_t arg )
{
   bool ret_val = false;

   if( arg > 128 ){
      ret_val = true;
   } else{
      ret_val = false;
   }

   return ret_val;
}

I want to point out an important good programming practice here: always initialize your variables. Even more, since C99 allows it (and it has been around for 20 years), declare (and initialize) your variables as near as possible to the point where they are going to be used. Those times when we had to declare our variables at the very beginning of our functions are gone. For example, say we want to read the ADC and apply a simple filter; then we might code the function this way:

uint16_t adc_filter( uint8_t channel )
{
   // some stuff here

   // more stuff here

   uint16_t adc_reading = 0;
   for( uint8_t i = 0; i < 8; ++i ){
      adc_reading += analogRead( channel );
   }

   return adc_reading / 8;
}

Observe that I have declared the variable adc_reading exactly where I’m about to use it. Now take a look at the for statement: what do you see? I’m declaring the counting variable inside the for! This way the counting variable is local to the loop, so it does not exist outside of it.

9 Test all paths

We set up a path (or branch) to exit or to signal that something went wrong, disrupting the normal program behaviour. But have we really tested that path, or was it just a good intention? We must force the system to take those maleficent branches in order to verify that they really do what they are intended to do. Perhaps we made a mistake when coding the condition (if( a = b ) instead of if( a == b ), you know what I mean), and when the time comes for that branch to save us, it does not fire.

Testing all paths will not always be possible because of the asynchronous nature of the events in our embedded systems. There will be scenarios that depend completely on outside hardware, so the best we can do is to mock the offending sub-system, or at least to double-check our conditionals.

To learn more about mocking you might take a look at this.
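As a small illustration of mocking, here is one possible way to force a failure branch in plain C: pass the hardware access in as a function pointer and replace it with a test double. Everything in this sketch is hypothetical (heater_update(), the read_temp_fn type, the mock_sensor_* functions, and the temperatures); it only shows the mechanism, and mocking frameworks offer more convenient ways to do the same thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Signature of the hardware access we want to be able to replace in tests.
typedef bool (*read_temp_fn)( int16_t* out_celsius );

typedef enum { HEATER_OFF, HEATER_ON, HEATER_FAULT } heater_state_t;

// Decides what to do with the heater. A sensor failure must lead to the
// safe state (HEATER_FAULT), the branch that is hard to hit on real hardware.
heater_state_t heater_update( read_temp_fn read_temp, int16_t setpoint )
{
   assert( read_temp != NULL );

   heater_state_t state = HEATER_FAULT;
   int16_t celsius = 0;

   if( read_temp( &celsius ) == false ){
      state = HEATER_FAULT;
   } else if( celsius < setpoint ){
      state = HEATER_ON;
   } else{
      state = HEATER_OFF;
   }

   return state;
}

// Test doubles: stand-ins for the real sensor driver.
bool mock_sensor_cold( int16_t* out ){ *out = 10; return true; }
bool mock_sensor_hot( int16_t* out ){ *out = 90; return true; }
bool mock_sensor_broken( int16_t* out ){ (void)out; return false; }
```

Calling heater_update() with each of the three doubles drives the function through all of its branches, including the fault one, without touching any hardware.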

10 Use redundancy for critical variables

How do you know that an X-ray originating in a galaxy far, far away has not corrupted one of your critical variables by flipping a bit? At first there is no way to tell. However, there are a bunch of workarounds for when this catastrophic scenario shows up (if ever). One of them is the so-called One’s complement pattern (OCP). (I saw it for the first and only time in: Douglass, Bruce Powel. Design Patterns for Embedded Systems in C. Newnes, 2011. Beware: the book is full of errors and typos, and the code is hard to read and follow. However, the theoretical information inside the book is great.)

Besides X-rays, there are other, more mundane sources of corruption:

  • Electromagnetic interference (EMI).
  • Faulty hardware, including the RAM.
  • Power supply spikes.

The OCP technique keeps a copy of the variable, but stored as its one’s complement. Before using our critical variable we invert the one’s complement copy and compare it with the original. If they match, we can use the variable; otherwise we take the appropriate actions. In a multithreaded system (e.g. one using a multitasking kernel) the reading and writing must be atomic, perhaps protected by a mutex.

The next program shows an implementation of the OCP pattern with a single variable. Of course, the program can be modified to handle compound types, and even better, to use abstract data types. Later on I will include a program in C++, but for now I hope you can see the spirit and the benefits of this technique.

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>


typedef struct
{
   uint16_t temperature; // raw variable
   uint16_t inverted;    // inverted variable
} Temperature_t;

static uint16_t invert( uint16_t val )
{
   return ~val;
}

void set( Temperature_t* me, uint16_t temp_raw )
{
   me->temperature = temp_raw;
   me->inverted = invert( temp_raw );
}

bool get( const Temperature_t* me, uint16_t* res )
{
   bool pass = false;

   uint16_t raw = invert( me->inverted );

   if( me->temperature == raw ){
      *res = me->temperature;
      pass = true;
   } else{
      pass = false;

      // fire the alarms!
   }

   return pass;
}

// ----
#define A0 0
uint16_t analogRead( uint8_t c )
{
   return 0xaa;
}
// ----


int main()
{
   Temperature_t lm35;

   set( &lm35, analogRead( A0 ) );

   // do some stuff in between ...

   
   // ---- in production mode:

   uint16_t lm35_raw;
   if( get( &lm35, &lm35_raw ) == true ){

      // do something with the critical variable 'lm35_raw'

   } else{

      // oops, something went wrong!

   }

   
   // ---- in testing mode:

   uint16_t val = 0;
   bool ok = get( &lm35, &val );
   assert( ok == true );
   assert( 0xaa == val );
   // both should pass

   lm35.temperature += 1;
   // an X-ray beam passed through our chip

   uint16_t other_val = 0;
   ok = get( &lm35, &other_val );
   assert( ok == false );
   // get() detected the corruption and left other_val untouched

   return 0;
}
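The text above mentions that the program can be modified to handle compound types. One possible way to do it, shown here only as a sketch, is to protect the value byte by byte. The names ocp_store() and ocp_load() are hypothetical and do not come from the book; the caller provides two buffers of the same size as the protected object, and in a multitasking system the calls would still need the atomic access discussed earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Writes 'len' bytes from 'src' into 'raw' and their one's complement
// into 'inv'.
void ocp_store( uint8_t* raw, uint8_t* inv, const void* src, size_t len )
{
   assert( raw != NULL && inv != NULL && src != NULL );

   const uint8_t* bytes = (const uint8_t*)src;
   for( size_t i = 0; i < len; ++i ){
      raw[ i ] = bytes[ i ];
      inv[ i ] = (uint8_t)~bytes[ i ];
   }
}

// Copies the protected value into 'dst' only if every byte still matches
// its complement. Returns false (and leaves 'dst' untouched) otherwise.
bool ocp_load( const uint8_t* raw, const uint8_t* inv, void* dst, size_t len )
{
   assert( raw != NULL && inv != NULL && dst != NULL );

   bool pass = true;
   for( size_t i = 0; i < len; ++i ){
      if( raw[ i ] != (uint8_t)~inv[ i ] ){
         pass = false;
      }
   }

   if( pass ){
      uint8_t* out = (uint8_t*)dst;
      for( size_t i = 0; i < len; ++i ){
         out[ i ] = raw[ i ];
      }
   }

   return pass;
}
```

The price is twice the RAM for every protected object plus a comparison on each read, so it makes sense to reserve it for the handful of variables that are truly critical.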

Conclusions

We’ve seen five more techniques, not all of them easy to implement, that might help us build safer and more reliable firmware. I hope you find them not only interesting but useful as well, and that you can adapt some (or all!) of them into your current and ongoing projects. Feel free to drop me a line.

You can reach the first part of this series here, and the last part here.
