Long in the making, nanopb 0.4 has seen some wide-reaching improvements in reaction to the development of the rest of the protobuf ecosystem. This document showcases features that are not immediately visible, but that you may want to take advantage of.
A lot of effort has been spent on retaining backwards and forwards compatibility with previous nanopb versions. For a list of breaking changes, see the migration document.
The basic design of nanopb has always been that the information about messages is stored in a compact descriptor format, which is iterated at runtime. Initially this format was very tightly coupled to the encoder and decoder logic.
In nanopb-0.3.0 the field iteration logic was separated into pb_common.c. Already at that point it was clear that the old format was getting too limited, but it was not extended at that time.
Now in 0.4, the descriptor format has been completely decoupled from the encoder and decoder logic, and redesigned to meet new demands.
Previously each field was stored as a pb_field_t struct, which was between 8 and 32 bytes in size, depending on compilation options and platform. Now information about fields is stored as a variable-length sequence of uint32_t data words. There are 1-, 2-, 4- and 8-word formats, with the 8-word format containing plenty of space for future extensibility.
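To make the idea of a variable-length descriptor more concrete, here is a minimal sketch of how code might walk such a table. The encoding is an assumption for illustration only: the two lowest bits of a field's first word are taken to give the descriptor length as a power of two (0 gives 1 word, 1 gives 2, 2 gives 4, 3 gives 8). The actual bit layout used by nanopb is not reproduced here, and the table and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: the two lowest bits of a field's first word
 * select the descriptor length (1, 2, 4 or 8 words). */
static size_t descriptor_words(uint32_t first_word)
{
    return (size_t)1u << (first_word & 3u);
}

/* Walk a packed descriptor table and count the fields it describes.
 * Returns 0 if a descriptor would run past the end of the table. */
static size_t count_fields(const uint32_t *table, size_t table_len)
{
    size_t pos = 0;
    size_t fields = 0;

    while (pos < table_len) {
        size_t len = descriptor_words(table[pos]);
        if (len > table_len - pos) {
            return 0; /* truncated or corrupt table */
        }
        pos += len;   /* jump to the next field's first word */
        fields++;
    }
    return fields;
}

/* Example table: one 1-word field, one 2-word field, one 4-word field.
 * Only the low bits of each first word matter to the walker above. */
static const uint32_t example_table[] = {
    0x00000100u,                          /* 1 word  (low bits 0) */
    0x00000201u, 0x00000000u,             /* 2 words (low bits 1) */
    0x00000302u, 0x00000000u, 0u, 0u      /* 4 words (low bits 2) */
};
```

Because only the first word is needed to find the next field, simple fields can cost a single word while fields needing more metadata grow to 2, 4 or 8 words.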
One benefit of the variable-length format is that most messages now take less storage space. Most fields use 2 words, while simple fields in small messages require only 1 word. The benefit is larger if code previously required the PB_FIELD_16BIT or PB_FIELD_32BIT options. In the AllTypes test case, 0.3 had a data size of 1008 bytes in the 8-bit configuration and 1408 bytes in the 16-bit configuration. The new format in 0.4 takes 896 bytes for either of these.
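Working only from the byte counts quoted above, the relative reduction in descriptor data size for the AllTypes test case comes out roughly as:

```latex
\begin{aligned}
\text{8-bit configuration: }  & 1 - \tfrac{896}{1008} \approx 0.111 \quad (\approx 11\%\ \text{smaller}) \\
\text{16-bit configuration: } & 1 - \tfrac{896}{1408} \approx 0.364 \quad (\approx 36\%\ \text{smaller})
\end{aligned}
```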
In addition, the new decoupling has allowed moving most of the field descriptor data into flash on Harvard architectures such as AVR. Previously nanopb was quite RAM-heavy on AVR, which, unlike most other platforms, cannot keep ordinary constants in flash.
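The general technique can be sketched as follows; this is an illustration, not nanopb's own code. On AVR, avr-libc's PROGMEM attribute and pgm_read_dword() from <avr/pgmspace.h> place a table in flash and read it back, while on other targets a plain const table can be read directly. The macro names MY_PROGMEM and MY_READ_U32 and the table contents are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
/* On AVR, PROGMEM keeps the table in flash; it must be read with pgm_read_dword(). */
#define MY_PROGMEM PROGMEM
#define MY_READ_U32(ptr) pgm_read_dword(ptr)
#else
/* Elsewhere, const data can be read through a normal pointer. */
#define MY_PROGMEM
#define MY_READ_U32(ptr) (*(ptr))
#endif

/* Hypothetical descriptor words for a small message. */
static const uint32_t field_info[] MY_PROGMEM = { 0x11u, 0x22u, 0x33u };

/* Read every word through the accessor macro, as descriptor-walking code would. */
static uint32_t sum_descriptor_words(void)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < sizeof(field_info) / sizeof(field_info[0]); i++) {
        sum += MY_READ_U32(&field_info[i]);
    }
    return sum;
}
```

The key design point is that all reads go through one accessor, so only that macro has to know whether the data lives in flash or in ordinary memory.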
The nanopb generator is now available as a Python package, installable using the pip package manager. This reduces the need for binary packages: if you already have Python installed, you can just run pip install nanopb and have the generator available on the path as nanopb_generator.
The generator can also take advantage of the Python-based protoc available in the grpcio-tools Python package. If you install that as well, a binary protoc is no longer needed.
Initially, the nanopb generator was used in two steps: first calling protoc to parse the .proto file into the .pb binary format, and then calling nanopb_generator.py to output the .pb.h and .pb.c files.
Nanopb 0.2.3 added support for running as a protoc plugin, which allowed single-step generation using the --nanopb_out parameter. However, the plugin mode has two complications: passing options to the nanopb generator itself becomes more difficult, and the generator does not know the actual path of the input files. The second limitation has been particularly problematic for locating .options files.
Both of these older methods still work and will remain supported. However, nanopb_generator can now also take .proto files directly, and it will transparently call protoc in the background.
Since its very beginnings, nanopb has supported field callbacks to allow processing structures that are larger than what could fit in memory at once. Until now the callback functions have been stored in the message structure in a pb_callback_t struct.
Storing function pointers alongside user data is somewhat risky from a security point of view. In addition, it has caused problems with oneof fields, which reuse the same storage space for multiple submessages. Because there is no separate area for each submessage, there is no space to store the callback pointers either.
Nanopb-0.4.0 introduces callbacks that are referenced by function name instead of having their pointers set separately. This should work well for most applications that have a single callback function for each message type. For more complex needs, pb_callback_t will also remain supported.
Function name callbacks also allow specifying custom data types for inclusion in the message structure. For example, you could have a MyObject* pointer along with the other message fields, and then process that object in a custom way in your callback.
This feature is demonstrated in the tests/oneof_callback test case and the examples/network_server example.
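The difference can be illustrated with a self-contained toy that does not use nanopb's API at all. A toy decoder calls a callback that is bound by name at compile time, instead of a function pointer stored in the message, and the message carries a custom MyObject* member as described above. All names and the callback signature are hypothetical; in real nanopb the callback name and data type are configured through generator options and the signature comes from the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES    4
#define MAX_NAME_LEN 16

/* Hypothetical custom datatype carried inside the message struct. */
typedef struct {
    char   names[MAX_NAMES][MAX_NAME_LEN];
    size_t count;
} MyObject;

/* Toy message: holds user data, but no callback function pointer. */
typedef struct {
    int       id;
    MyObject *obj;
} Person;

/* Callback bound by name: the toy decoder calls it directly. */
static int Person_names_callback(Person *msg, const char *value)
{
    MyObject *obj = msg->obj;
    if (obj == NULL || obj->count >= MAX_NAMES || strlen(value) >= MAX_NAME_LEN) {
        return 0; /* reject rather than overflow the buffers */
    }
    strcpy(obj->names[obj->count], value);
    obj->count++;
    return 1;
}

/* Toy decoder: passes each repeated string value to the named callback. */
static int toy_decode_person(Person *msg, int id, const char *const *values, size_t n)
{
    size_t i;
    msg->id = id;
    for (i = 0; i < n; i++) {
        if (!Person_names_callback(msg, values[i])) {
            return 0;
        }
    }
    return 1;
}
```

Since the callback is resolved by the linker, nothing in the message memory holds a code pointer, and a submessage inside a oneof does not need its own slot for one.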
As mentioned above, callbacks inside submessages inside oneofs have been problematic to use. To make pb_callback_t-style callbacks usable there, a new generator option, submsg_callback, was added.
Setting this option to true will cause a new message-level callback to be added before the which_field of the oneof. This callback will be called when the submessage tag number is known, but before the actual message is decoded. The callback can either set callback pointers inside the submessage, or completely decode the submessage there and then. If any unread data remains after the callback returns, normal submessage decoding will continue.
There is an example of this in the tests/oneof_callback test case.
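The control flow described above can be modeled with a toy byte stream, again without nanopb's real types. In this sketch the hypothetical on_submsg() hook runs once the oneof tag is known and may consume part or all of the submessage bytes; whatever it leaves unread is passed to a stand-in for normal decoding. All type and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stream over a byte buffer (hypothetical, not pb_istream_t). */
typedef struct {
    const uint8_t *buf;
    size_t         left;
} ToyStream;

typedef struct {
    int    which;         /* tag number of the active oneof member */
    size_t hook_bytes;    /* bytes consumed by the message-level hook */
    size_t normal_bytes;  /* bytes consumed by regular decoding */
} ToyMsg;

/* Consume up to n bytes and report how many were actually consumed. */
static size_t toy_skip(ToyStream *s, size_t n)
{
    if (n > s->left) {
        n = s->left;
    }
    s->buf += n;
    s->left -= n;
    return n;
}

/* Hook: for tag 2 it decodes the whole submessage itself;
 * for other tags it only peeks at (consumes) one header byte. */
static void on_submsg(ToyMsg *msg, ToyStream *s)
{
    msg->hook_bytes += toy_skip(s, msg->which == 2 ? s->left : 1);
}

/* Stand-in for normal submessage decoding: consumes all remaining bytes. */
static void decode_rest(ToyMsg *msg, ToyStream *s)
{
    msg->normal_bytes += toy_skip(s, s->left);
}

static void toy_decode_oneof(ToyMsg *msg, int tag, const uint8_t *data, size_t len)
{
    ToyStream sub = { data, len };
    msg->which = tag;           /* tag is known before any payload is read */
    on_submsg(msg, &sub);       /* hook may read part or all of the data */
    if (sub.left > 0) {
        decode_rest(msg, &sub); /* leftover data gets normal decoding */
    }
}
```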
It is often said that good C code is chock full of macros. Or maybe I got it wrong. But since nanopb 0.2, field descriptor generation has relied heavily on macros. This allows it to automatically adapt to differences in type alignment on different platforms, and decouples the Python generation logic from how the message descriptors are implemented on the C side.
Now in 0.4.0, I’ve made the macros even more abstract. Time will tell whether this was as great an idea as I think it is, but now the complete list of fields in each message is available in the .pb.h file. This allows a kind of metaprogramming using X-macros.
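For readers unfamiliar with the pattern, here is a minimal, self-contained X-macro sketch. The list only imitates the idea of a per-message field list in a header; its name (PERSON_FIELDLIST) and argument layout are hypothetical, not the format nanopb actually generates. The same list is expanded twice: once to declare a struct and once to build a table of field names and tag numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field list in X-macro form: X(ctype, name, tag). */
#define PERSON_FIELDLIST(X) \
    X(int,   id,    1)      \
    X(float, score, 2)      \
    X(long,  flags, 3)

/* Expansion 1: declare one struct member per field. */
#define DECLARE_MEMBER(ctype, name, tag) ctype name;
typedef struct {
    PERSON_FIELDLIST(DECLARE_MEMBER)
} Person;

/* Expansion 2: build a table of field names and tag numbers. */
typedef struct {
    const char *name;
    int         tag;
} FieldInfo;

#define DESCRIBE_FIELD(ctype, name, tag) { #name, tag },
static const FieldInfo person_fields[] = {
    PERSON_FIELDLIST(DESCRIBE_FIELD)
};

#define PERSON_FIELD_COUNT (sizeof(person_fields) / sizeof(person_fields[0]))
```

Adding a line to the list updates both the struct and the table at once, which is what makes such lists attractive for generated code.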
One feature that this can be used for is binding the message descriptor to a custom structure or C++ class type. You could have a bunch of other fields in the structure, and even the data types can differ to an extent; nanopb will automatically detect the size and position of each field. The generated .pb.c files now just contain calls of PB_BIND(msgname, structname, width). Adding a similar call to your own code will bind the message to your own structure.
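The binding idea itself can be modeled in plain C, assuming a field list of the kind described above is available. In this sketch a hypothetical BIND_FIELDS macro expands the list against any struct type and records each member's offset and size with offsetof and sizeof, so the user's struct may contain extra members, order fields differently, and use a narrower integer type. It does not reproduce PB_BIND; all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field list: X(structtype, member, tag) per protobuf field. */
#define SENSOR_FIELDLIST(X, s) \
    X(s, temperature, 1)       \
    X(s, humidity,    2)

typedef struct {
    uint8_t tag;
    size_t  offset;
    size_t  size;
} BoundField;

/* Record where each listed member lives inside the given struct type. */
#define BIND_ONE(s, member, tag) { tag, offsetof(s, member), sizeof(((s *)0)->member) },
#define BIND_FIELDS(list, s) { list(BIND_ONE, s) }

/* User's own struct: extra members, different order, narrower type. */
typedef struct {
    const char *label;        /* not part of the message */
    int16_t     humidity;
    int32_t     temperature;
    uint32_t    last_update;  /* not part of the message */
} MySensor;

static const BoundField my_sensor_fields[] = BIND_FIELDS(SENSOR_FIELDLIST, MySensor);
```

An encoder driven by such a table only ever uses the recorded offsets and sizes, which is why it does not care what else the struct contains.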
The protobuf format specifies that strings must consist of valid UTF-8 code points. Previously nanopb did not enforce this, requiring extra care in the user code. Now optional UTF-8 validation is available with the compilation option PB_VALIDATE_UTF8.
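As an illustration of what such a check involves, here is a standalone validator sketch; it is not nanopb's implementation. It follows the RFC 3629 rules: it rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates (U+D800 to U+DFFF) and code points above U+10FFFF. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if buf[0..len) is well-formed UTF-8 (RFC 3629). */
static bool utf8_is_valid(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t  c = buf[i];
        size_t   extra;
        uint32_t cp;

        if (c < 0x80) { i++; continue; }              /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;                            /* stray continuation or 0xF8+ */

        if (extra > len - i - 1) return false;        /* truncated sequence */

        for (size_t k = 1; k <= extra; k++) {
            uint8_t cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false;    /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF) {
            return false;
        }
        i += extra + 1;
    }
    return true;
}
```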
Some platforms, such as AVR, do not support the double datatype, instead making it an alias for float. This has caused problems when processing messages with double fields that were produced on other machines. There has been an example of how to perform the conversion between double and float manually.
Now that example is integrated as an optional feature in the nanopb core. By defining PB_CONVERT_DOUBLE_FLOAT, the required conversion between 32- and 64-bit floating-point formats happens automatically on decoding and encoding.
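To show what such a conversion involves, the sketch below converts between IEEE 754 binary64 and binary32 bit patterns using only integer operations, which is what a platform without a real 64-bit double has to do. It is a simplified sketch, not nanopb's implementation: it truncates the mantissa instead of rounding, and flushes values too small for a normal float to signed zero. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* binary64 bits -> binary32 bits, using integer math only.
 * Simplifications: mantissa is truncated (not rounded), and values too
 * small for a normal float become signed zero. */
static uint32_t double_bits_to_float_bits(uint64_t d)
{
    uint32_t sign = (uint32_t)(d >> 63) << 31;
    int      exp  = (int)((d >> 52) & 0x7FF);
    uint64_t mant = d & (((uint64_t)1 << 52) - 1);   /* 52 mantissa bits */
    int      e    = exp - 1023 + 127;                /* rebias exponent */

    if (exp == 0x7FF) {                              /* infinity or NaN */
        return sign | 0x7F800000u | (mant ? 0x400000u : 0u);
    }
    if (exp == 0 || e <= 0) {                        /* zero or underflow */
        return sign;
    }
    if (e >= 0xFF) {                                 /* overflow to infinity */
        return sign | 0x7F800000u;
    }
    return sign | ((uint32_t)e << 23) | (uint32_t)(mant >> 29);
}

/* binary32 bits -> binary64 bits; exact, since every float fits in a double. */
static uint64_t float_bits_to_double_bits(uint32_t f)
{
    uint64_t sign = (uint64_t)(f >> 31) << 63;
    int      exp  = (int)((f >> 23) & 0xFF);
    uint64_t mant = f & 0x7FFFFFu;                   /* 23 mantissa bits */

    if (exp == 0xFF) {                               /* infinity or NaN */
        return sign | 0x7FF0000000000000ull | (mant << 29);
    }
    if (exp == 0) {
        int e = -126;
        if (mant == 0) {
            return sign;                             /* signed zero */
        }
        while (!(mant & 0x800000u)) {                /* normalize a subnormal */
            mant <<= 1;
            e--;
        }
        return sign | ((uint64_t)(e + 1023) << 52) | ((mant & 0x7FFFFFu) << 29);
    }
    return sign | ((uint64_t)(exp - 127 + 1023) << 52) | (mant << 29);
}
```

On decoding, a function like the first one would be applied to the 64-bit wire value before storing it into a float; on encoding, the second one would produce the 64-bit value to write.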
Testing on embedded platforms has been integrated into the continuous testing environment. Now all of the 80+ test cases are automatically run on STM32 and AVR targets. Previously only a few specialized test cases were tested manually on embedded systems.
The nanopb fuzzer has also been integrated into Google’s OSS-Fuzz platform, giving a huge boost to the CPU power available for randomized testing.