Planet Linux Australia


Gary Pendergast: WordPress Importers: Defining a Schema

While schemata are usually implemented using language-specific tools (e.g., XML uses XML Schema, JSON uses JSON Schema), they largely use the same concepts when talking about data. This is rather helpful: we don’t need to make a decision on data formats before we can start thinking about how the data should be arranged.

Note: Since these concepts apply equally to all data formats, I’m using “WXR” in this post as shorthand for “the structured data section of whichever file format we ultimately use”, rather than specifically referring to the existing WXR format. 🙂

Why is a Schema Important?

It’s fair to ask: if the WordPress Importers have survived this entire time without a formal schema, why would we need one now?

There are two major reasons why we haven’t needed one in the past:

  • WXR has remained largely unchanged in the last 10 years: there have been small additions or tweaks, but nothing significant. There’s been no need to keep track of changes.
  • WXR is currently very simple, with just a handful of basic elements. In a recent experiment, I was able to implement a JavaScript-based WXR generator in just a few days, entirely by referencing the Core implementation.

These reasons are also why it would help to implement a schema for the future:

  • As work on WXR proceeds, there will likely need to be substantial changes to what data is included: adding new fields, modifying existing fields, and removing redundant fields. Tracking these changes helps ensure any WXR implementations can stay in sync.
  • These changes will result in a more complex schema: relying on the source to re-implement it will become increasingly difficult and error-prone. Following Gutenberg’s lead, it’s likely that we’d want to provide official libraries in both PHP and JavaScript: keeping them in sync is best done from a source schema, rather than having one implementation copy the other.

Taking the time to plan out a schema now gives us a solid base to work from, and it allows for future changes to happen in a reliable fashion.
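To make that concrete, here’s a minimal sketch (in Python, using the third-party jsonschema library) of how a source schema could pin down a couple of export fields and be used to validate data before import. The field names are purely illustrative, not a proposal for the real WXR schema.

# A minimal, hypothetical schema sketch: the field names are illustrative only.
from jsonschema import validate, ValidationError  # third-party: pip install jsonschema

WXR_POST_SCHEMA = {
    "type": "object",
    "required": ["title", "post_type", "status"],
    "properties": {
        "title": {"type": "string"},
        "post_type": {"type": "string"},
        "status": {"type": "string", "enum": ["publish", "draft", "pending", "private"]},
        "attachment_url": {"type": "string", "format": "uri"},
    },
}

def validate_post(post: dict) -> bool:
    """Return True if an exported post matches the schema."""
    try:
        validate(instance=post, schema=WXR_POST_SCHEMA)
        return True
    except ValidationError as err:
        print(f"Export does not match schema: {err.message}")
        return False

print(validate_post({"title": "Hello", "post_type": "post", "status": "publish"}))

The point isn’t the particular library: it’s that a PHP implementation and a JavaScript implementation could both be validated against (or generated from) the same source schema, rather than one copying the other.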

WXR for all of WordPress

With a well defined schema, we can start to expand what data will be included in a WXR file.

Media

Interestingly, many of the challenges around media files are less to do with WXR, and more to do with importer capabilities. The biggest headache is retrieving the actual files, which the importer currently handles by trying to retrieve the file from the remote server, as defined in the wp:attachment_url node. In context, this behaviour is understandable: 10+ years ago, personal internet connections were too slow to be moving media around, it was better to have the servers talk to each other. It’s a useful mechanism that we should keep as a fallback, but the more reliable solution is to include the media file with the export.
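To illustrate that ordering (and only as a sketch, not the importer’s actual code), a media import step might prefer a file bundled with the export and fall back to the remote URL. The bundle layout and field names below are assumptions.

import shutil
import urllib.request
from pathlib import Path

def import_media(item: dict, bundle_dir: Path, uploads_dir: Path) -> Path:
    """Prefer a media file bundled with the export; fall back to fetching
    the remote attachment URL (the importer's current behaviour)."""
    filename = item["filename"]                      # hypothetical field name
    destination = uploads_dir / filename
    bundled = bundle_dir / "media" / filename        # assumed bundle layout
    if bundled.exists():
        shutil.copyfile(bundled, destination)
    else:
        # Fallback: retrieve from the original server, as the importer does today.
        urllib.request.urlretrieve(item["attachment_url"], str(destination))
    return destination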

Plugins and Themes

There are two parts to plugins and themes: the code, and the content. Modern WordPress sites require plugins to function, and most are customised to suit their particular theme.

For exporting the code, I wonder if a tiered solution could be applied (there’s a rough sketch in code after this list):

  • Anything from WordPress.org would just need its slug, since it can be re-downloaded during import. Particularly as WordPress continues to move towards an auto-updated future, modified versions of plugins and themes are explicitly not supported.
  • Third party plugins and themes would be given a filter to use, where they can provide a download URL that can be included in the export file.
  • Third party plugins/themes that don’t provide a download URL would either need to be skipped, or zipped up and included in the export file.
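Roughly, that decision tree might look like this (a Python-flavoured sketch for illustration; all of the keys and the download-URL hook are hypothetical):

def export_plugin_entry(plugin: dict) -> dict:
    """Decide how a plugin (or theme) is represented in the export.
    All keys here are hypothetical, for illustration only."""
    if plugin.get("from_wordpress_org"):
        # WordPress.org plugins and themes can be re-downloaded during import.
        return {"source": "wordpress.org", "slug": plugin["slug"]}
    if plugin.get("download_url"):
        # Third party code can provide a download URL via a filter.
        return {"source": "url", "url": plugin["download_url"]}
    # Otherwise: either skip it, or zip it up and bundle it with the export.
    return {"source": "bundled", "archive": f"plugins/{plugin['slug']}.zip"}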

For exporting the content, WXR already includes custom post types, but doesn’t include custom settings, or custom tables. The former should be included automatically, and the latter would likely be handled by an appropriate action for the plugin to hook into.

Settings

There are currently a handful of special settings that are exported, but (as I just noted, particularly with plugins and themes being exported) this would likely need to be expanded to include most items in wp_options.

Users

Currently, the bare minimum information about users who’ve authored a post is included in the export. This would need to be expanded to include more user information, as well as users who aren’t post authors.

WXR for parts of WordPress

The modern use case for importers isn’t just to handle a full site, but to handle keeping sites in sync. For example, most news organisations will have a staging site (or even several layers of staging!) which is synchronised to production.

While it’s well outside the scope of this project to directly handle every one of these use cases, we should be able to provide the framework for organisations to build reliable platforms on. Exports should be repeatable, objects in the export should have unique identifiers, and the importer should be able to handle any subset of WXR.

WXR Beyond WordPress

Up until this point, we’ve really been talking about WordPress→WordPress migrations, but I think WXR is a useful format beyond that. Instead of just containing direct exports of the data from particular plugins, we could also allow it to contain “types” of data. This turns WXR into an intermediary language: exports can be created from any source, and imported into WordPress.

Let’s consider an example. Say we create a tool that can export a Shopify, Wix, or GoDaddy site to WXR: how would we represent an online store in the WXR file? We don’t want to export in the format that any particular plugin would use, since a WordPress Core tool shouldn’t be advantaging one plugin over others.

Instead, it would be better if we could format the data in a platform-agnostic way, which plugins could then implement support for. As luck would have it, Schema.org provides exactly the kind of data structure we could use here. It’s been actively maintained for nearly nine years, it supports a wide variety of data types, and is intentionally platform-agnostic.
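For example, a product from one of those stores could be described using Schema.org’s Product type, roughly like this (hand-written JSON-LD for illustration, not a proposed WXR extension; the values are made up):

import json

# A Schema.org "Product" serialised as JSON-LD. The values are invented.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example T-Shirt",
    "sku": "TSHIRT-001",
    "offers": {
        "@type": "Offer",
        "price": "25.00",
        "priceCurrency": "AUD",
        "availability": "https://schema.org/InStock",
    },
}

print(json.dumps(product, indent=2))

Any plugin that understands the Product type could then claim that part of the import, regardless of which platform generated it.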

Gazing into my crystal ball for a moment, I can certainly imagine a future where plugins could implement and declare support for importing certain data types. When handling such an import (assuming one of those plugins wasn’t already installed), the WordPress Importer could offer them as options during the import process. This kind of seamless integration allows WordPress to show that it offers the same kind of fully-featured site building experience that modern CMS services do.

Of course, reality is never quite as simple as crystal balls and magic wands make it out to be. We have to contend with services that provide incomplete or fragmented exports, and there are even services that deliberately don’t provide exports at all. In the next post, I’ll be writing about why we should address this problem, and how we might be able to go about it.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.


Gary Pendergast: WordPress Importers: Getting Our House in Order

The previous post talked about the broad problems we need to tackle to bring our importers up to speed, making them available for everyone to use.

In this post, I’m going to focus on what we could do with the existing technology, in order to give us the best possible framework going forward.

A Reliable Base

Importers are an interesting technical problem. Much like you’d expect from any backup/restore code, importers need to be extremely reliable. They need to comfortably handle all sorts of unusual data, and they need to keep it all safe. Particularly considering their age, the WordPress Importers do a remarkably good job of handling most content you can throw at them.

However, modern development practices have evolved and improved since the importers were first written, and we should certainly be making use of such practices, when they fit with our requirements.

For building reliable software that we expect to largely run by itself, a variety of comprehensive automated testing is critical. This ensures we can confidently take on the broader issues, safe in the knowledge that we have a reliable base to work from.

Testing must be the first item on this list. A variety of automated testing gives us confidence that changes are safe, and that the code can continue to be maintained in the future.

Data formats must be well defined. While this is useful for ensuring data can be handled in a predictable fashion, it’s also a very clear demonstration of our commitment to data freedom.

APIs for creating or extending importers should be straightforward to hook into.

Performance Isn’t an Optional Extra

With sites constantly growing in size (and with the export files potentially gaining a heap of extra data), we need to care about the performance of the importers.

Luckily, there’s already been some substantial work done on this front.

There are other groups in the WordPress world who’ve made performance improvements in their own tools: gathering all of that experience is a relatively quick way to bring in production-tested improvements.

The WXR Format

It’s worth talking about the WXR format itself, and determining whether it’s the best option for handling exports into the future. XML-based formats are largely viewed as a relic of days gone by, so (if we were to completely ignore backwards compatibility for a moment) is there a modern data format that would work better?

The short answer… kind of. 🙂

XML is actually well suited to this use case, and (particularly when looking at performance improvements) is the only data format for which PHP comes with a built-in streaming parser.
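To illustrate what streaming buys us (sketched in Python here purely for brevity; PHP’s built-in equivalent is the XMLReader extension), a streaming parse walks a large WXR file one item at a time instead of loading the whole document into RAM:

import xml.etree.ElementTree as ET

def count_items(wxr_path: str) -> int:
    """Stream through a WXR file, one <item> at a time, without
    holding the whole document in memory."""
    count = 0
    for event, element in ET.iterparse(wxr_path, events=("end",)):
        if element.tag == "item":
            count += 1
            element.clear()  # free the element once it has been processed
    return count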

That said, WXR is basically an extension of the RSS format: as we add more data to the file that clearly doesn’t belong in RSS, there is likely an argument for defining an entirely WordPress-focused schema.

Alternative Formats

It’s important to consider what the priorities are for our export format, which will help guide any decision we make. So, I’d like to suggest the following priorities (in approximate priority order):

  • PHP Support: The format should be natively supported in PHP, though it is still workable if we need to ship an additional library.
  • Performant: Particularly when looking at very large exports, it should be processed as quickly as possible, using minimal RAM.
  • Supports Binary Files: The first comments on my previous post asked about media support; we clearly should be treating it as a first-class citizen.
  • Standards Based: Is the format based on a documented standard? (Another way to ask this: are there multiple different implementations of the format? Do those implementations all function the same?)
  • Backward Compatible: Can the format be used by existing tools with no changes, or minimal changes?
  • Self Descriptive: Does the format include information about what data you’re currently looking at, or do you need to refer to a schema?
  • Human Readable: Can the file be opened and read in a text editor?

Given these priorities, what are some options?

WXR (XML-based)

Whether it’s the RSS-based schema that we already use or a custom-defined XML schema, the arguments for this format are pretty well known.

One argument that hasn’t been well covered is that there’s a definite trade-off when it comes to supporting binary files. Currently, the importer tries to scrape the media file from the original source, which is not particularly reliable. So, if we were to look at including media files in the WXR file, the best option for storing them is to base64 encode them. Unfortunately, that would have a serious effect on performance and file size (base64 inflates binary data by roughly a third), as well as readability: adding huge base64 strings would make even the smallest exports impossible to read.

Either way, this option would be mostly backwards compatible, though some tools may require a bit of reworking if we were to substantially change the schema.

WXR (ZIP-based)

To address the issues with media files, an alternative option might be to follow the path that Microsoft Word and OpenOffice use: put the text content in an XML file, put the binary content into folders, and compress the whole thing.

This addresses the performance and binary support problems, but is initially worse for readability: if you don’t know that it’s a ZIP file, you can’t read it in a text editor. Once you unzip it, however, it does become quite readable, and has the same level of backwards compatibility as the XML-based format.
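A sketch of how such a package could be assembled (the internal layout of a content.xml file plus a media/ folder is an assumption here, not a settled design):

import zipfile
from pathlib import Path

def write_export(archive_path: str, content_xml: str, media_dir: Path) -> None:
    """Bundle the structured content and the media files into one archive.
    The layout (content.xml, media/) is illustrative only."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("content.xml", content_xml)
        for media_file in media_dir.rglob("*"):
            if media_file.is_file():
                bundle.write(media_file, arcname=f"media/{media_file.name}")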

JSON

JSON could work as a replacement for XML in both of the above formats, with one additional caveat: there is no streaming JSON parser built into PHP. There are 3rd party libraries available, but given the documented differences between JSON parsers, I would be wary about using one library to produce the JSON, and another to parse it.

This format largely wouldn’t be backwards compatible, though tools which rely on the export file being plain text (eg, command line tools to do broad search-and-replaces on the file) can be modified relatively easily.

There are additional subjective arguments, both for and against, about the readability of JSON vs XML, but I’m not sure there’s anything to them beyond personal preference.

SQLite

The SQLite team wrote an interesting (indirect) argument on this topic: OpenOffice uses a ZIP-based format for storing documents, and the SQLite team argued that there would be benefits (particularly around performance and reliability) in OpenOffice switching to SQLite.

The key issues that I see are:

  • SQLite is included in PHP, but not enabled by default on Windows.
  • While the SQLite team have a strong commitment to providing long-term support, SQLite is not a standard, and the only implementation is the one provided by the SQLite team.
  • This option is not backwards compatible at all.

FlatBuffers

FlatBuffers is an interesting comparison, since it’s a data format focussed entirely on speed. The downside of this focus is that it requires a defined schema to read the data. Much like SQLite, the only standard for FlatBuffers is the implementation. Unlike SQLite, FlatBuffers has made no commitments to providing long-term support.

                        WXR (XML-based)  WXR (ZIP-based)  JSON   SQLite   FlatBuffers
Works in PHP?           ✅                ✅                ⚠      ⚠        ⚠
Performant?             ⚠                ✅                ⚠      ✅        ✅
Supports Binary Files?  ⚠                ✅                ⚠      ✅        ✅
Standards Based?        ✅                ✅                ✅      ⚠ / ❌    ❌
Backwards Compatible?   ⚠                ⚠                ❌      ❌        ❌
Self Descriptive?       ✅                ✅                ✅      ✅        ❌
Readable?               ✅                ⚠ / ❌            ✅      ❌        ❌

As with any decision, this is a matter of trade-offs. I’m certainly interested in hearing additional perspectives on these options, or thoughts on options that I haven’t considered.

Regardless of which particular format we choose for storing WordPress exports, every format should have (or in the case of FlatBuffers, requires) a schema. We can talk about schemata without going into implementation details, so I’ll be writing about that in the next post.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

Gary Pendergast: WordPress Importers: Stating the Problem

It’s time to focus on the WordPress Importers.

I’m not talking about tidying them up, or improving performance, or fixing some bugs, though these are certainly things that should happen. Instead, we need to consider their purpose, how they fit as a driver of WordPress’ commitment to Open Source, and how they can be a key element in helping to keep the Internet Open and Free.

The History

The WordPress Importers are arguably the key driver of WordPress’ early success. Before the importer plugins existed (before WordPress even supported plugins!) there were a handful of import-*.php scripts in the wp-admin directory that could be used to import blogs from other blogging platforms. When other platforms fell out of favour, WordPress already had an importer ready for people to move their site over. One of the most notable instances was in 2004, when Movable Type changed their license and prices, suddenly requiring personal blog authors to pay for something that had previously been free. WordPress was fortunate enough to be in the right place at the right time: many of WordPress’ earliest users came from Movable Type.

As time went on, WordPress became well known in its own right. Growth relied less on people wanting to switch from another provider, and more on people choosing to start their site with WordPress. For practical reasons, the importers were moved out of WordPress Core, and into their own plugins. Since then, they’ve largely been in maintenance mode: bugs are fixed when they come up, but since export formats rarely change, they’ve just continued to work for all these years.

An unfortunate side effect of this, however, is that new importers are rarely written. While a new breed of services have sprung up over the years, the WordPress importers haven’t kept up.

The New Services

There are many new CMS services that have cropped up in recent years, and we don’t have importers for any of them. WordPress.com has a few extra ones written, but they’ve been built on the WordPress.com infrastructure out of necessity.

You see, we’ve always assumed that other CMSes will provide some sort of export file that we can use to import into WordPress. That isn’t always the case, however. Some services (notably, Wix and GoDaddy Website Builder) deliberately don’t allow you to export your own content. Other services provide incomplete or fragmented exports, needlessly forcing stress upon site owners who want to use their own content outside of that service.

To work around this, WordPress.com has implemented importers that effectively scrape the site: while this has worked to some degree, it does require regular maintenance, and the importer has to do a lot of guessing about how the content should be transformed. This is clearly not a solution that would be maintainable as a plugin.

Problem Number 4

Some services work against their customers, and actively prevent site owners from controlling their own content.

This strikes at the heart of the WordPress Bill of Rights. WordPress is built with fundamental freedoms in mind: all of those freedoms point to owning your content, and being able to make use of it in any form you like. When a CMS actively works against providing such freedom to their community, I would argue that we have an obligation to help that community out.

A Variety of Content

It’s worth discussing how, when starting a modern CMS service, the bar for success is very high. You can’t get away with just providing a basic CMS: you need to provide all the options. Blogs, eCommerce, mailing lists, forums, themes, polls, statistics, contact forms, integrations, embeds, the list goes on. The closest comparison to modern CMS services is… the entire WordPress ecosystem: built on WordPress core, but with the myriad of plugins and themes available, along with the variety of services offered by a huge array of companies.

So, when we talk about the importers, we need to consider how they’ll be used.

Problem Number 3

To import from a modern CMS service into WordPress, your importer needs to map from service features to WordPress plugins.

Getting Our Own House In Order

Some of these problems don’t just apply to new services, however.

Out of the box, WordPress exports to WXR (WordPress eXtended RSS) files: an XML file that contains the content of the site. Back when WXR was first created, this was all you really needed, but much like the rest of the WordPress importers, it hasn’t kept up with the times. A modern WordPress site isn’t just the sum of its content: a WordPress site has plugins and themes. It has various options configured, it has huge quantities of media, it has masses of text content, far more than the first WordPress sites ever had.

Problem Number 2

WXR doesn’t contain a full export of a WordPress site.

In my view, WXR is a solid format for handling exports. An XML-based system is quite capable of containing all forms of content, so it’s reasonable that we could expand the WXR format to contain the entire site.

Built for the Future

If there’s one thing we can learn from the history of the WordPress importers, it’s that maintenance will potentially be sporadic. Importers are unlikely to receive the same attention that the broader WordPress Core project does, and owners may come and go. An importer will get attention if it breaks, of course, but it otherwise may go months or years without changing.

Problem Number 1

We can’t depend on regular importer maintenance in the future.

It’s quite possible to build code that will be running in 10+ years: we see examples all across the WordPress ecosystem. Doing it in a reliable fashion needs to be a deliberate choice, however.

What’s Next?

Having worked our way down from the larger philosophical reasons for the importers to some of the more technically-oriented implementation problems, I’d like to work our way back out again, focussing on each problem individually. In the following posts, I’ll start laying out how I think we can bring our importers up to speed, prepare them for the future, and make them available for everyone.

This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.


Michael Still: A super simple sourdough loaf


This is the fourth in a series of posts documenting my adventures in making bread during the COVID-19 shutdown.

This post has been a while coming, but my sister in law was interested in the sourdough loaf last night, so I figured I should finally document my process. First off you need to have a sourdough starter, which I wrote up in a previous post. I am sure less cheaty ways will work too, but the cheating was where it was at for me.

Then, you basically follow the process I use for my super simple non-breadmaker loaf, but tweaked a little to use the starter. For the loaf itself:

  • 2 cups of bakers flour (not plain white flour)
  • 1 tea spoon of salt
  • 2 cups of the sourdough starter
  • 1 cup water

Similarly to the super simple loaf, you want the dough to be a bit tacky when mixed — it gets runnier as the yeast does its thing, so it will be too runny if it doesn’t start out tacky.

I then just leave it on the kitchen bench under a cover for the day. In the evening it’s baked like the super simple loaf — heat a high thermal mass dutch oven for 30 minutes at 230 degrees Celsius, and then bake the bread in the dutch oven for the first 30 minutes with the lid on, and then 12 more minutes with the lid off.

You also need to feed the starter when you make the loaf dough. That’s just 1.5 cups of flour, and a cup of warm water mixed into the starter after you’ve taken out the starter for the loaf. I tweak the flour to water ratio to keep the starter at a fairly thick consistency, and you’ll learn over time what is right. You basically want pancake batter consistency.

We keep our starter in the fridge and need to feed it (which means baking) twice a week. If we kept it on the bench we’d need to bake daily.



Craige McWhirter: Sober Living for the Revolution

by Gabriel Kuhn

Sober Living for the Revolution: Hardcore Punk, Straight Edge, and Radical Politics

This is not a new book, having been published in 2010, but it's a fairly recent discovery for me.

I was never part of the straight edge scene here in Australia but was certainly aware of some of the more prominent bands and music in the punk scene in general. I've always had an ear for music with a political edge.

When it came to the straight edge scene I knew sweet FA. So that aspect of this book was pure curiosity. What attracted me to this work was the subject of radical sobriety and its lived experience amongst politically active people.

In life, if you decide to forgo something that everybody else does, it gives you a perspective on society that you wouldn't have if you were just engaging. It teaches you a lot about the world.

-- Ian MacKaye

This was one of the first parts of the book to really pop out at me. This rang true for my lived experience in other parts of my life where I'd forgone things that everyone else does. There were costs in not engaging but Ian is otherwise correct.

While entirely clear eyed about the problems of inebriation amongst Australian activists and in wider society as a whole, the titular concept of sober living for the revolution had not previously resonated with me.

But then I realised that if you do not speak that language, you recognise that they are not talking to you... In short, if you don't speak the language of violence, you are released from violence. This was a very profound discovery for me.

-- Ian MacKaye

While my quotes are pretty heavily centered on one individual, there are about 20 contributors from Europe, the Middle East and both North and South America, providing reasonably diverse perspectives on the music but, more importantly, on the inspiration and positive impacts of radical sobriety on their communities.

As someone who was reading primarily for the sober living insights, the book's focus on the straight edge scene was quite heavy to wade through, but the insights gained were worth the musical history lessons.

The only strategy for sharing good ideas that succeeds unfailingly... is the power of example — if you put “ecstatic sobriety” into action in your life, and it works, those who sincerely want similar things will join in.

-- Crimethinc

Overall this book pulled together a number of threads I'd been pulling on myself over my adult life and brought them into one comical phrase: lucid bacchanalism.

I was also particularly embarrassed to have not previously identified alcohol consumption as not merely a recreation but yet another insidious form of consumerism.

Well worth a read.

Russell Coker: PSI and Cgroup2

In the comments on my post about Load Average Monitoring [1] an anonymous person recommended that I investigate PSI. As an aside, why do I get so many great comments anonymously? Don’t people want to get credit for having good ideas and learning about new technology before others?

PSI is the Pressure Stall Information subsystem for Linux that is included in kernels 4.20 and above; if you want to use it in Debian then you need a kernel from Testing or Unstable (Buster has kernel 4.19). The place to start reading about PSI is the main Facebook page about it; it was originally developed at Facebook [2].

I am a little confused by the actual numbers I get out of PSI. While for the load average I can often see where they come from (e.g. 2 processes each taking 100% of a core will give a load average of about 2), it’s difficult to work out where the PSI numbers come from. For my own use I decided to treat them as unscaled numbers that just indicate problems (a higher number is worse) and not worry too much about what the number really means.

With the cgroup2 interface which is supported by the version of systemd in Testing (and which has been included in Debian backports for Buster) you get PSI files for each cgroup. I’ve just uploaded version 1.3.5-2 of etbemon (package mon) to Debian/Unstable which displays the cgroups with PSI numbers greater than 0.5% when the load average test fails.

System CPU Pressure: avg10=0.87 avg60=0.99 avg300=1.00 total=20556310510
/system.slice avg10=0.86 avg60=0.92 avg300=0.97 total=18238772699
/system.slice/system-tor.slice avg10=0.85 avg60=0.69 avg300=0.60 total=11996599996
/system.slice/system-tor.slice/tor@default.service avg10=0.83 avg60=0.69 avg300=0.59 total=5358485146

System IO Pressure: avg10=18.30 avg60=35.85 avg300=42.85 total=310383148314
 full avg10=13.95 avg60=27.72 avg300=33.60 total=216001337513
/system.slice avg10=2.78 avg60=3.86 avg300=5.74 total=51574347007
/system.slice full avg10=1.87 avg60=2.87 avg300=4.36 total=35513103577
/system.slice/mariadb.service avg10=1.33 avg60=3.07 avg300=3.68 total=2559016514
/system.slice/mariadb.service full avg10=1.29 avg60=3.01 avg300=3.61 total=2508485595
/system.slice/matrix-synapse.service avg10=2.74 avg60=3.92 avg300=4.95 total=20466738903
/system.slice/matrix-synapse.service full avg10=2.74 avg60=3.92 avg300=4.95 total=20435187166

Above is an extract from the output of the loadaverage check. It shows that tor is a major user of CPU time (the VM runs a Tor relay node and has close to 100% of one core devoted to that task). It also shows that Mariadb and Matrix are the main users of disk IO. When I installed Matrix the Debian package told me that using SQLite would give lower performance than MySQL, but that didn’t seem like a big deal as the server only has a few users. Maybe I should move Matrix to the Mariadb instance to improve overall system performance.
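For anyone who wants to poke at the same numbers themselves, here’s a rough Python sketch (not the etbemon code) that walks the cgroup2 hierarchy and prints cgroups whose CPU pressure avg10 is above the same 0.5% threshold:

from pathlib import Path

THRESHOLD = 0.5  # percent, matching the threshold mentioned above

def cgroup_cpu_pressure(root="/sys/fs/cgroup"):
    """Report cgroups whose CPU pressure avg10 exceeds THRESHOLD.
    A sketch only; reading some pressure files may require root."""
    for pressure_file in Path(root).rglob("cpu.pressure"):
        for line in pressure_file.read_text().splitlines():
            kind = line.split()[0]  # "some" or "full"
            fields = dict(f.split("=") for f in line.split()[1:])
            if float(fields["avg10"]) > THRESHOLD:
                yield str(pressure_file.parent), kind, float(fields["avg10"])

for cgroup, kind, avg10 in cgroup_cpu_pressure():
    print(f"{cgroup} {kind} avg10={avg10}")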

So far I have not written any code to display the memory PSI files. I don’t have a lack of RAM on systems I run at the moment and don’t have a good test case for this. I welcome patches from people who have the ability to test this and get some benefit from it.

We are probably about 6 months away from a new release of Debian and this is probably the last thing I need to do to make etbemon ready for that.

Russell Coker: RISC-V and Qemu

RISC-V is the latest RISC architecture that’s become popular. It is the 5th RISC architecture from the University of California Berkeley. It seems to be a competitor to ARM due to not having license fees or restrictions on alterations to the architecture (something you have to pay extra for when using ARM). RISC-V seems the most popular architecture to implement in FPGA.

When I first tried to run RISC-V under QEMU it didn’t work, which was probably due to running Debian/Unstable on my QEMU/KVM system and there being QEMU bugs in Unstable at the time. I have just tried it again and got it working.

The Debian Wiki page about RISC-V is pretty good [1]. The instructions there got it going for me. One thing I wasted some time on before reading that page was trying to get a netinst CD image, which is what I usually do for setting up a VM. Apparently there isn’t RISC-V hardware that boots from a CD/DVD so there isn’t a Debian netinst CD image. But debootstrap can install directly from the Debian web server (something I’ve never wanted to do in the past) and that gave me a successful installation.

Here are the commands I used to setup the base image:

apt-get install debootstrap qemu-user-static binfmt-support debian-ports-archive-keyring

debootstrap --arch=riscv64 --keyring /usr/share/keyrings/debian-ports-archive-keyring.gpg --include=debian-ports-archive-keyring unstable /mnt/tmp http://deb.debian.org/debian-ports

I first tried running RISC-V Qemu on Buster, but even ls didn’t work properly and the installation failed.

chroot /mnt/tmp bin/bash
# ls -ld .
/usr/bin/ls: cannot access '.': Function not implemented

When I ran it on Unstable, ls worked but strace didn’t work in a chroot; this gave enough functionality to complete the installation.

chroot /mnt/tmp bin/bash
# strace ls -l
/usr/bin/strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Function not implemented
/usr/bin/strace: ptrace(PTRACE_TRACEME, ...): Function not implemented
/usr/bin/strace: PTRACE_SETOPTIONS: Function not implemented
/usr/bin/strace: detach: waitpid(1602629): No child processes
/usr/bin/strace: Process 1602629 detached

When running the VM the operation was noticeably slower than the emulation of PPC64 and S/390x, which both ran at an apparently normal speed. When running on a server with an equivalent speed CPU, an ssh login was obviously slower due to the CPU time taken for encryption; an ssh connection from a system on the same LAN took 6 seconds to connect. I presume that because RISC-V is a newer architecture there hasn’t been as much effort made on optimising the Qemu emulation, and that a future version of Qemu will be faster. But I don’t think that Debian/Bullseye will give good Qemu performance for RISC-V; probably more changes are needed than can happen before the freeze. Maybe a version of Qemu with better RISC-V performance can be uploaded to backports some time after Bullseye is released.

Here’s the Qemu command I use to run RISC-V emulation:

qemu-system-riscv64 -machine virt -device virtio-blk-device,drive=hd0 -drive file=/vmstore/riscv,format=raw,id=hd0 -device virtio-blk-device,drive=hd1 -drive file=/vmswap/riscv,format=raw,id=hd1 -m 1024 -kernel /boot/riscv/vmlinux-5.10.0-1-riscv64 -initrd /boot/riscv/initrd.img-5.10.0-1-riscv64 -nographic -append net.ifnames=0 noresume security=selinux root=/dev/vda ro -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-device,rng=rng0 -device virtio-net-device,netdev=net0,mac=02:02:00:00:01:03 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper

Currently the program /usr/sbin/sefcontext_compile from the selinux-utils package needs execmem access on RISC-V while it doesn’t on any other architecture I have tested. I don’t know why and support for debugging such things seems to be in early stages of development, for example the execstack program doesn’t work on RISC-V now.

RISC-V emulation in Unstable seems adequate for people who are serious about RISC-V development. But if you want to just try a different architecture then PPC64 and S/390 will work better.


Linux Australia: Council Meeting Tuesday 12th January 2021 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

 

Apologies 

Benno Rice

 

Meeting opened at 1931 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • From Anchor to council@ on 16 Dec 2020: opensource.org.au is due for renewal by 15 Feb 2021.
  • From Big Orange Heart via web form: request for sponsorship of WordFest (https://wordfest.live) on 22 Jan 2021.
    [Response has been sent indicating Council in caretaker mode, suggesting a longer lead time in future and outlining LA’s funding priorities.]
  • From Anchor to council@ on 27 Dec 2020: lca2018.org requires renewal by 26 Feb 2021.
    [As per meeting on 15 Dec 2020, this will be allowed to expire if the few remaining services dependent on the domain have been moved by 26 Feb 2021.]
  • From <a member> via website form on 27 Dec 2020: will LA write a submission into the Productivity Commission “Right to Repair” enquiry? A response was sent:
    • The tight timing (closing date in early Feb) and existing commitments of the Council members will almost certainly prevent this happening unless a member steps up to do it.
    • If a member wished LA to sign a submission they prepared and it was consistent with LA’s values, Council would consider signing it.

<The member> is writing a submission.  He has subsequently indicated he won’t request LA sign it as it contains a lot of personal perspective and he doesn’t feel he can speak for others. He may share it on linux-aus for comments.

  • From Binh Hguyen on Grants list, asking about out-of-cycle funding for a project idea (a public information aggregator). Response sent indicating Council is in caretaker mode and suggesting that they wait for the 2021 Grant program if it eventuates.

 

3. Items for discussion

  • AGM –  Are we ready, who is doing what etc. 
    • Roles required: 
      • Chat Monitor
      • Votes/Poll Monitor
      • Hands Up – Reaction Monitor
      • Timer
  • Bringing together the nomination sheets, these need to be collated, distributed and a seconder needs to be found for each of the nominations.
  • Annual Report – what is missing what do we need to send out etc.
    • Hoping for reports from web team, pycon.
    • Once those are received the report can be sent out.
  • Rusty Wrench
    • Decision is to award to <see announcement>, Jon or one of the other formers will award
    • AI: Sae Ra to liaise with Jon
  • JW: Russell Coker has set up a replacement for planet.linux.org.au at https://planet.luv.asn.au/. It has been suggested that planet.linux.org.au either point to this URL or contain a link to it.
    • AI: Jonathan & Julien to action link on current static page
  • JW: <A member> suggests via council@ on 5 Jan 2021 that the “Should a digest be dispatched daily when the size threshold isn’t reached” setting for la-announce be set to “Yes”. Due to the low traffic volume on this list, users with digest mode set may only receive messages after a delay of many months.
    • All agree, however it turns out the setting has already been enabled.

4. Items for noting

  • None

5. Other business

  • None

6. In camera

  • No items were discussed in camera

Meeting closed at 2024

The post Council Meeting Tuesday 12th January 2021 – Minutes appeared first on Linux Australia.

Pia Andrews: Reflections on public sector transformation and COVID

Public sectors around the world are facing unprecedented challenges as the speed, scale and complexity of modern life grows exponentially. The 21st century is a large, complex, globalised and digital age unlike anything in the history of humans, but our systems of governance were largely forged in the industrial age. The 20th century alone saw enough change to merit a rethink: global population rose from 1.6 billion to 6 billion, two world wars spurred the creation of global economic and power structures, the number of nations rose from 77 to almost 200, and of course we entered the age of electronics and the internet, changing forever the experience, connectivity, access to knowledge, and increased individual empowerment of people everywhere. Between Climate Change, COVID-19, and globalism, nations worldwide are also now preparing for the likelihood of rolling emergencies, whether health, environmental, economic or social.

“Traditional” approaches to policy, service delivery and regulation are too slow, increasingly ineffective and result in increasingly hard to predict outcomes, making most public sectors and governments increasingly unable to meet the changing needs of the communities we serve.

Decades of austerity, hollowing out expertise, fragmentation of interdependent functions that are forced to compete, outsourcing and the inevitable ensuing existential crises have all left public sectors less prepared than ever, at the time when people most need us. Trust is declining and yet public sectors often feel unable to be authoritative sources of facts or information, independent of political or ideological influence, which exacerbates the trust and confidence deficit. Public sectors have become too reactive, too “business” focused, constantly pivoting all efforts on the latest emergency, cost efficiency, media release or whim of the Minister, whilst not investing in baseline systems, transformation, programs or services that are needed to be proactive and resilient. A values-based public sector that is engaged with, responsive to and serving the needs of (1) the Government, (2) the Parliament AND (3) the people – a difficult balancing act to be sure! – is critical, both to maintaining the trust of all three masters, and to being genuinely effective over time :)

Whether it is regulation, services or financial management, public sectors everywhere also need to embrace change as the new norm, which means our systems, processes and structures need to be engaged in continuously measuring, monitoring and responding to change, throughout the entire policy-delivery lifecycle. This means policy and delivery folk should be hand in hand throughout the entire process, so the baton passing between functionally segmented teams can end.

Faux transformation

Sadly today, most “transformation programs” appear to fall into one of three types:

  • Iteration or automation – iterative improvements, automation or new tech just thrown at existing processes and services, which doesn’t address the actual needs, systemic problems, or the gaping policy-delivery continuum chasm that has widened significantly in recent decades; or
  • Efficiency restructures - well marketed austerity measures to reduce the cost of government without actually improving the performance, policy outcomes or impact of government; or
  • Experimentation at the periphery - real transformation skills or units that are kept at the fringe and unable to drive or affect systemic change across any given public sector.

Most “transformation programs” I see are simply not particularly transformative, particularly when you scratch the surface to find how they would change things in future. If your answer is “we’ll have a new system” or “an x% improvement”, then it probably isn’t transformation; it is probably an iteration. Transformation should result in exponential solutions to exponential problems, and a test driven and high confidence policy-delivery continuum that takes days not months for implementation, with the effects of new policies clearly seen through consistently measured, monitored and continuously improved delivery. You should have a clear and clearly understood future state in mind to transform towards, otherwise it is certainly iteration on the status quo.

There are good exceptions to this normative pattern. Estonia, Taiwan, South Korea, Canada and several nations across South East Asia have and are investing in genuine and systemic transformation programs, often focused on improving the citizen experience as well as the quality of life of their citizens and communities. My favourite quote from 2020 was from Dr Sania Nishtar (Special Assistant to the Prime Minister of Pakistan on Poverty Alleviation and Social Protection) when she said ‘it is neither feasible nor desirable to return to the pre-COVID status’. It was part of a major UNDP summit on NextGenGov, where all attendees reflected the same sentiment that COVID exposed significant gaps in our public sectors, and we all need significant reform to be effective and responsive to rolling emergencies moving forward.

So what does good transformation look like?

I would categorise true transformation efforts in three types, with all three needed:

  1. Policy and service transformation means addressing and reimagining the policy-delivery continuum in the 21st century, and bringing policy and implementation people together in the same process and indeed, the same (virtual) room. This would mean new policies are better informed, able to be tested from inception through to implementation, are able to be immediately or at least swiftly implemented upon enactment in Parliament and are then continuously measured, monitored and iterated in accordance with the intended policy outcome. The exact same infrastructure used for delivery should be used for policy, and vice versa, to ensure there is no gap between, and to ensure policy outcomes are best realised whilst also responding to ongoing change. After all, when policy outcomes are not realized, regardless of whose fault it was, it is everyone’s failure. This kind of transformation is possible within any one department or agency, but ideally needs leadership across all of government to ensure consistency of policy impact and benefits realisation.
  2. Organizational transformation would mean getting back to basics and having a clear vision of the purpose and intended impact of the department as a whole, with clear overarching measurement of those goals, and clear line of sight for how all programs contribute to those goals, and with all staff clear in how their work supports the goals. This type of transformation requires structural cultural transformation that builds on the shared values and goals of the department, but gains a consistency of behaviours that are constructive and empathetic. This kind of transformation is entirely possible within the domain of any one department or agency, if the leadership support and participate in it.
  3. Systemic transformation means the addressing and reimagining of the public sector as a whole, including its role in society, the structures, incentive systems, assurance processes, budget management, 21st century levers (like open government), staff support and relationship to other sectors. It also means having a clear vision for what it means to be a proud, empowered and skilled public servant today, which necessarily includes system and design thinking, participatory governance skills and digital literacy (not just skills). This can’t be done in any one department and requires all of public sector investment, coordination and cross government mandate. This level of transformation has started to happen in some countries but it is early days and needs prioritization if public sectors are to truly and systemically transform. Such transformation efforts often focus on structure, but need to include scope for transformation of policy, services, workforce, funding and more across government.

As we enter the age of Artificial Intelligence, public sectors should also be planning what an augmented public sector looks like, one that keeps values, trust and accountability at the heart of what we do, whilst using machines to support better responsiveness, modelling, service delivery and to maintain diligent and proactive protection of the people and communities we serve. Most AI projects seem to be about iterative efforts, automation or cost savings, which misses the opportunity to design a modern public service that gets the best of humans and machines working together for the best public outcomes.

COVID-19

COVID has been a dramatic reminder of the ineffectiveness of government systems to respond to changing needs in at least three distinct ways:

  • heavy use of emergency powers has been relied upon to get anything of substance done, demonstrating key systemic barriers, but rather than changing the problematic business as usual processes, many are reverting to usual practice as soon as practical;
  • superhuman efforts have barely scratched the surface of the problems. The usual resourcing response to pressure is to just increase resources rather than to change how we respond to the problem, but there are not exponential resources available, so ironically the
  • inequities have been compounded by governments pressing on the same old levers with the same old processes without being able to measure, monitor and iterative or pivot in real time in response to the impacts of change.

Sadly, the pressure for ‘good news stories’ often drives a self-congratulatory tone and an increase to an already siloed mindset, as public servants struggle to respond to increased and often diametrically opposed expectations and needs from the public and political domains. Many have also mistaken teleworking for transformation, potentially missing a critical opportunity to transform towards a 21st century public sector.

Last word

I’m planning to do a bit more writing about this, so please leave your comments and thoughts below. I’d be keen to hear how you differentiate transformation from iterative efforts, and how to ensure we are doing both. There is, of course, value to be found in some iterative efforts. It is when 100% of our time and effort is focused on iteration that we see public sectors simply revert to playing whack-a-mole against an exponentially growing problem space, hence the need to have SOME proportion of our resource on genuine transformation efforts. Proportional planning is critical so we address both the important and the urgent, not one without the other.


Michael Still: The Mythical Man-Month


I expect everyone (well, almost everyone) involved in some way in software engineering has heard of this book. I decided that it was time to finally read it, largely prompted by this excellent blog post by apenwarr which discusses second systems effect among other things. Now, you can buy this book for a surprisingly large amount of money, but as Michael Carden pointed out, the PDF is also made available for free by the Internet Archive. I’d recommend going that route.

The book is composed of a series of essays, which discuss the trials of the OS/360 team in the mid-1960s, and uses those experiences to attempt to form a series of more general observations on the art of software development and systems engineering.

The first observation that resonates with me is that building a reliable program which integrates into a wider ecosystem is significantly harder than whipping up a quick script to do a thing. The book estimates that the additional effort of generalization, testing, documentation, and integration is nine times the effort of the original program. To be honest, I think that estimate is probably low.

The second essay is entitled The Mythical Man-Month, and is the essay we’ve all heard about and that lends its name to the book. It’s an exploration of how poor we are at resource estimation (I don’t think this has improved much in the last 50 years), and how simply adding resources doesn’t actually help. Brooks asserts that we are poor at estimation because we are optimists, with that optimism being driven by the inputs to our work being entirely ephemeral. I’m not sure I 100% agree with that; I think there are some other aspects not to be ignored, like poorly defined requirements and scope creep.

Next we hear about how a software engineering team can be constructed along similar lines to a surgical team — one person doing all the actual coding, but nine others acting in supporting roles. This essay fell flat to be honest, because I just can’t imagine a world in which this level of overhead would be acceptable. I feel Brooks would argue it’s not overhead — he in fact argues that it is our way out from the mythical man-month dilemma — but I’ve never seen anything like it implemented. I suspect the closest I’ve encountered is the visual effects industry, in which it is common to have a tools team supporting the artists.

The book moves on to an essay about how conceptual integrity in design is important to a successful project — that is, that the project needs to know what it is achieving, and maintain focus on those things, instead of incorporating every random good idea the team has along the way. This aligns well with some reading about OKRs that I did over the Christmas break, which starts with this quote:

“Myspace didn’t have an obvious idea of what it wanted to be or accomplish, and could never decide on the best way to get there. That’s not surprising. If you don’t know where you’re going, it’s hard to figure out the best path. And within a few years, Facebook dominated social media. Which should never have happened. When you have an existing network, maintaining it should be the simple part. But they weren’t able to do that.” — Kyle Evan’s excellent post on OKRs.

I find that alignment heartening, in that it’s the first time one of these essays has made me think that perhaps things have indeed improved in my industry since 1975.

Then again, Brooks then diverts into what I can only call a …strange… direction:

How is conceptual integrity to be achieved?
Brooks was a people person.

Although in his defence he goes on to argue that being a pleb implementer still allows for quite a large amount of creative work to be done, just in a constrained manner.

Brooks then introduces the concept of the Second System Effect, which I think is an important one — both OpenStack and Kubernetes may be characterised as Second Systems, and we still face design issues around such systems today. However, this chapter is disappointingly short compared with others, and I think it could have spent more than three paragraphs at the end discussing techniques to avoid creating new ones. Frankly, I think apenwarr and Joel Spolsky do a better job of explaining the concept (although I think Joel is wrong in stating that you should never rewrite a system — I would instead say you should never big bang rewrite a system).

Brooks next discussed formal written specifications (which are also covered as part of a suite of documents to be produced in a later chapter), a topic which seems to have suffered a terrible fate in recent years, as Agile techniques have moved us away from large and cumbersome formal documents entirely detailing the function of a system to a more incremental approach to building software.

The book uses the Tower of Babel as a case study in engineering project management. Projects require clear communication to succeed. The book proposes a very formal method of publishing memoranda which I feel is largely superceded by modern communication techniques. I feel this quote is however quite insightful:

“Organizations must be designed around the people available; not people fitted into pure-theory organizations.”

Next Brooks moves onto estimation. He’d previously proposed ratios for how long other delivery tasks such as testing should take compared with development (but in much more detail than I give here), but he now correctly points out that the time to develop a feature will vary drastically with the complexity of that feature — there is no linear relationship between complexity and time. However, Brooks also makes an important point that I think we often forget — studies at that time had shown that while estimates might have been largely correct, they failed to take into account other corporate duties, overhead, and sickness. I think this is probably true now too, we tend to be optimistic about what else will be happening while we try to work.

The chapter on programming space (the memory used by an old school mainframe program) is historically interesting, I don’t see a lot of relevance in today’s world, apart from the observation that each developer or team needs to account for the impact of their decisions on the experience of the end user of the system, as well as whether they are within the boundaries of their agreed responsibilities to other internal teams.

Next we encounter a chapter encouraging pilot implementations, which Brooks describes as “building one to throw away”. Brooks argues that you’ll throw away the first version anyway, so you may as well be honest to yourself and your stakeholders at the start. I wonder what Brooks would think of modern Agile approaches, where work is split into smaller chunks and then iterated on. That feels like a systematic approach to prototyping if done well to me.

There’s not a lot of interest in the last 50 pages of the book to be honest, with the focus being on things now considered outmoded — how to control how much RAM a component is using down to the byte level, what colour wire to use when field patching the circuitry of prototype mainframes, and so on.

One final gem is Brooks’ advice that milestones should be specific, and that any schedule slippage should be treated with the same severity — a slippage of a day should be just as much an opportunity to get the team to pull together as a slippage of a week or a month. Brooks also advises that junior managers need to signal to senior managers if their status reporting is intended as informational (the issue is already being addressed), or intended to prompt action from the senior manager.

The biggest flaw with this book is that it feels dated — repeated use of male pronouns to describe all programmers, and a general tendency to compare software engineering to the creative aspects of a Christian God seem out of place almost 50 years later. However, I found this book very readable, and quite insightful. I’m definitely going to be calling my junior devs plebs from now on.

In other news, I think this post holds the record for a book review on this site. There’s just so much here to digest. I think that’s one of the strengths of this book — it is information dense whilst still being approachable and easy to read. Overall I don’t think this book is too dated to be useful, which is impressive for something nearly 50 years old.

The Mythical Man-Month
Frederick P. Brooks, Jr.
Computer programming
Addison-Wesley Publishing Company: Reading, Mass.; Don Mills, Ont.
1975, 195 pages


Jan Schmidt: Rift CV1 – Adventures in Kalman filtering Part 2

In the last post I had started implementing an Unscented Kalman Filter for position and orientation tracking in OpenHMD. Over the Christmas break, I continued that work.

A Quick Recap

When reading below, keep in mind that the goal of the filtering code I’m writing is to combine 2 sources of information for tracking the headset and controllers.

The first piece of information is acceleration and rotation data from the IMU on each device, and the second is observations of the device position and orientation from 1 or more camera sensors.

The IMU motion data drifts quickly (at least for position tracking) and can’t tell which way the device is facing: it has no absolute yaw reference, although it can detect gravity and so recover pitch and roll.

The camera observations can tell exactly where each device is, but arrive at a much lower rate (52Hz vs 500/1000Hz) and can take a long time (hundreds of milliseconds) to analyse when acquiring or re-acquiring a lock on the tracked device(s).

The goal is to acquire tracking lock, then use the motion data to predict the motion closely enough that we always hit the ‘fast path’ of vision analysis. The key here is closely enough – the more closely the filter can track and predict the motion of devices between camera frames, the better.

Integration in OpenHMD

When I wrote the last post, I had the filter running as a standalone application, processing motion trace data collected by instrumenting a running OpenHMD app and moving my headset and controllers around. That’s a really good way to work, because it lets me run modifications on the same data set and see what changed.

However, the motion traces were captured using the current fusion/prediction code, which frequently loses tracking lock when the devices move – leading to big gaps in the camera observations and more interpolation for the filter.

By integrating the Kalman filter into OpenHMD, the predictions are improved, leading to generally much better results. Here’s one trace of me moving the headset around reasonably vigorously with no tracking loss at all.

Headset motion capture trace

If it worked this well all the time, I’d be ecstatic! The predicted position matched the observed position closely enough for every frame for the computer vision to match poses and track perfectly. Unfortunately, this doesn’t happen every time yet, and definitely not with the controllers – although I think the latter largely comes down to the current computer vision having more trouble matching controller poses. They have fewer LEDs to match against compared to the headset, and the LEDs are generally more side-on to a front-facing camera.

Taking a closer look at a portion of that trace, the drift between camera frames when the position is interpolated using the IMU readings is clear.

Headset motion capture – zoomed in view

This is really good. Most of the time, the drift between frames is within 1-2mm. The computer vision can only match the pose of the devices to within a pixel or two – so the observed jitter can also come from the pose extraction, not the filtering.

The worst tracking is again on the Z axis – distance from the camera in this case. Again, that makes sense – with a single camera matching LED blobs, distance is the most uncertain part of the extracted pose.

Losing Track

The trace above is good – the computer vision spots the headset and then the filtering + computer vision track it at all times. That isn’t always the case – the prediction goes wrong, or the computer vision fails to match (it’s definitely still far from perfect). When that happens, it needs to do a full pose search to reacquire the device, and there’s a big gap until the next pose report is available.

That looks more like this:

Headset motion capture trace with tracking errors

This trace has 2 kinds of errors – gaps in the observed position timeline during full pose searches and erroneous position reports where the computer vision matched things incorrectly.

Fixing the errors in position reports will require improving the computer vision algorithm and would fix most of the plot above. Outlier rejection is one approach to investigate on that front.

Latency Compensation

There is inherent delay involved in processing of the camera observations. Every 19.2ms, the headset emits a radio signal that triggers each camera to capture a frame. At the same time, the headset and controller IR LEDs light up brightly to create the light constellation being tracked. After the frame is captured, it is delivered over USB over the next 18ms or so and then submitted for vision analysis. In the fast case, where we’re already tracking the device, the computer vision is complete in a millisecond or so. In the slow case, it’s much longer.

Overall, that means that there’s at least a 20ms offset between when the devices are observed and when the position information is available for use. In the plot above, this delay is ignored and position reports are fed into the filter when they are available. In the worst case, that means the filter is being told where the headset was hundreds of milliseconds earlier.

To compensate for that delay, I implemented a mechanism in the filter where it keeps extra position and orientation entries in the state that can be used to retroactively apply the position observations.

The way that works is to make a prediction of the position and orientation of the device at the moment the camera frame is captured and copy that prediction into the extra state variable. After that, it continues integrating IMU data as it becomes available while keeping the auxiliary state constant.

When the camera frame analysis is complete, that delayed measurement is matched against the stored position and orientation prediction in the state and the error used to correct the overall filter. The cool thing is that in the intervening time, the filter covariance matrix has been building up the right correction terms to adjust the current position and orientation.
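To make the bookkeeping a bit more concrete, here is a heavily simplified Python sketch of the delayed-measurement idea. This is not the OpenHMD code, and it uses a plain linear Kalman update on a 1-D position rather than the full UKF state; all names and noise values are made up for the example. The filter copies its estimate into a slot when the frame is captured, keeps integrating motion on the live estimate only, and later corrects against the stored copy, letting the cross-covariance pull the live estimate along.

import numpy as np

class DelayedSlotFilter:
    # Toy 1-D illustration of delayed-measurement slots.
    def __init__(self):
        self.x = np.zeros(2)        # state: [live position, slot position]
        self.P = np.eye(2) * 0.1    # covariance over that state

    def capture_frame(self):
        # Camera frame captured *now*: freeze a copy of the current
        # position estimate into the slot, fully correlated with it.
        self.x[1] = self.x[0]
        self.P[1, 1] = self.P[0, 1] = self.P[1, 0] = self.P[0, 0]

    def predict(self, velocity, dt, q=0.01):
        # Integrate IMU-style motion on the live estimate only; the slot
        # stays constant but remains correlated with the live estimate.
        self.x[0] += velocity * dt
        self.P[0, 0] += q * dt      # process noise on the live estimate

    def correct_delayed(self, observed_pos, r=0.005):
        # The late camera result observes the *slot*; the cross terms in
        # P propagate the correction to the live estimate too.
        H = np.array([[0.0, 1.0]])
        y = observed_pos - self.x[1]
        S = (H @ self.P @ H.T)[0, 0] + r
        K = (self.P @ H.T / S).ravel()
        self.x += K * y
        self.P = (np.eye(2) - np.outer(K, H)) @ self.P

The important property is exactly the one described above: because the covariance keeps track of how the live estimate and the frozen slot have moved together, an observation of where the device was still improves the estimate of where it is now.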

Here’s a good example of the difference:

Before: Position filtering with no latency compensation
After: Latency-compensated position reports

Notice how most of the disconnected segments have now slotted back into position in the timeline. The ones that haven’t can either be attributed to incorrect pose extraction in the computer vision, or to not having enough auxiliary state slots for all the concurrent frames.

At any given moment, there can be a camera frame being analysed, one arriving over USB, and one awaiting “long term” analysis. The filter needs to track an auxiliary state variable for each frame that we expect to get pose information from later, so I implemented a slot allocation system and multiple slots.

The downside is that each slot adds 6 variables (3 position and 3 orientation) to the covariance matrix on top of the 18 base variables. Because the covariance matrix is square, the size grows quadratically with new variables. 5 new slots means 30 new variables – leading to a 48 x 48 covariance matrix instead of 18 x 18. That is a 7-fold increase in the size of the matrix (48 x 48 = 2304 vs 18 x 18 = 324) and unfortunately about a 10x slow-down in the filter run-time.

At that point, even after some optimisation and vectorisation on the matrix operations, the filter can only run about 3x real-time, which is too slow. Using fewer slots is quicker, but allows for fewer outstanding frames. With 3 slots, the slow-down is only about 2x.

There are some other possible approaches to this problem:

  • Running the filtering delayed, only integrating IMU reports once the camera report is available. This has the disadvantage of not reporting the most up-to-date estimate of the user pose, which isn’t great for an interactive VR system.
  • Keeping around IMU reports and rewinding / replaying the filter for late camera observations. This limits the overall increase in filter CPU usage to double (since we at most replay every observation twice), but potentially with large bursts when hundreds of IMU readings need replaying.
  • It might be possible to only keep 2 “full” delayed measurement slots with both position and orientation, and to keep some position-only slots for others. The orientation of the headset tends to drift much more slowly than position does, so when there’s a big gap in the tracking it would be more important to be able to correct the position estimate. Orientation is likely to still be close to correct.
  • Further optimisation in the filter implementation. I was hoping to keep everything dependency-free, so the filter implementation uses my own naive 2D matrix code, which only implements the features needed for the filter. A more sophisticated matrix library might perform better – but it’s hard to say without doing some testing on that front.

Controllers

So far in this post, I’ve only talked about the headset tracking and not mentioned controllers. The controllers are considerably harder to track right now, but most of the blame for that is in the computer vision part. Each controller has fewer LEDs than the headset, fewer are visible at any given moment, and they often aren’t pointing at the camera front-on.

Oculus Camera view of headset and left controller.

This screenshot is a prime example. The controller is the cluster of lights at the top of the image, and the headset is lower left. The computer vision has gotten confused and thinks the controller is the ring of random blue crosses near the headset. It corrected itself a moment later, but those false readings make life very hard for the filtering.

Position tracking of left controller with lots of tracking loss.

Here’s a typical example of the controller tracking right now. There are some very promising portions of good tracking, but they are interspersed with bursts of tracking losses, and wild drifting from the computer vision giving wrong poses – leading to the filter predicting incorrect acceleration and hence cascaded tracking losses. Particularly (again) on the Z axis.

Timing Improvements

One of the problems I was looking at in my last post is variability in the arrival timing of the various USB streams (Headset reports, Controller reports, camera frames). I improved things in OpenHMD on that front, to use timestamps from the devices everywhere (removing USB timing jitter from the inter-sample time).

There are still potential problems in when IMU reports from controllers get updated in the filters vs the camera frames. That can be on the order of 2-4ms jitter. Time will tell how big a problem that will be – after the other bigger tracking problems are resolved.

Sponsorships

All the work that I’m doing implementing this positional tracking is a combination of my free time, hours contributed by my employer Centricular and contributions from people via Github Sponsorships. If you’d like to help me spend more hours on this and fewer on other paying work, I appreciate any contributions immensely!

Next Steps

The next things on my todo list are:

  • Integrate the delayed-observation processing into OpenHMD (at the moment it is only in my standalone simulator).
  • Improve the filter code structure – this is my first kalman filter and there are some implementation decisions I’d like to revisit.
  • Publish the UKF branch for other people to try.
  • Circle back to the computer vision and look at ways to improve the pose extraction and better reject outlying / erroneous poses, especially for the controllers.
  • Think more about how to best handle / schedule analysis of frames from multiple cameras. At the moment each camera operates as a separate entity, capturing frames and analysing them in threads without considering what is happening in other cameras. That means any camera that can’t see a particular device starts doing full pose searches – which might be unnecessary if another camera still has a good view of the device. Coordinating those analyses across cameras could yield better CPU consumption, and let the filter retain fewer delayed observation slots.

Russell CokerMonopoly the Game

The Smithsonian Mag has an informative article about the history of the game Monopoly [1]. The main point about Monopoly teaching about the problems of inequality is one I was already aware of, but there are some aspects of the history that I learned from the article.

Here’s an article about using modified version of Monopoly to teach Sociology [2].

Maria Paino and Jeffrey Chin wrote an interesting paper about using Monopoly with revised rules to teach Sociology [3]. They publish the rules which are interesting and seem good for a class.

I think it would be good to have some new games which can teach about class differences. Maybe have an “Escape From Poverty” game where you have choices that include drug dealing to try and improve your situation or a cooperative game where people try to create a small business. While Monopoly can be instructive it’s based on the economic circumstances of the past. The vast majority of rich people aren’t rich from land ownership.

,

Tim RileyOpen source status update, December 2020

Happy new year! Before we get too far through January, here’s the recap of my December in OSS.

Advent of Code 2020 (in Go!)

This month started off a little differently to usual. After spending some time book-learning about Go, I decided to try the Advent of Code for the first time as a way to build some muscle memory for a new language. And gosh, it was a lot of fun! Turns out I like programming and problem-solving, go figure. After ~11 days straight, however, I decided to put the effort on hold. I could tell the pace wasn’t going to be sustainable for me (it was a lot of late nights), and I’d already begun to feel pretty comfortable with various aspects of Go, so that’s where I left it for now.

Rich dry-system component_dir configuration (and cleanups!)

Returning to my regular Ruby business, December was a good month for dry-system. After the work in November to prepare the way for Zeitwerk, I moved on to introducing a new component_dirs setting, which permits the addition of any number of component directories (i.e. where dry-system should look for your Ruby class files), each with their own specific configurations:

class MyApp::Container < Dry::System::Container
  configure do |config|
    config.root = __dir__

    config.component_dirs.add "lib" do |dir|
      dir.auto_register = true    # defaults to true
      dir.add_to_load_path = true # defaults to true
      dir.default_namespace = "my_app"
    end
  end
end

Along with this, I’m removing the following from Dry::System::Container:

  • The top-level default_namespace and auto_register settings
  • The .add_to_load_path! and .auto_register! methods

Together, this means there’ll be only a single place to configure the behaviour related to the loading of components from directories: the singular component_dirs setting.

This has been a rationalization I’ve been wanting to make for a long time, and happily, it’s proving to be a positive one: as I’ve been working through the changes, it’s allowed me to simplify some of the gnarlier parts of the gem.

What all of this provides is the right set of hooks for Hanami to specify the component directories for your app, as well as configure each one to work nicely with Zeitwerk. That’s the end goal, and I suspect we’ll arrive there in late January or February, but in the meantime, I’ve enjoyed the chance to tidy up the internals of this critical part of the Hanami 2.0 underpinnings.

You can follow my work in progress over in this PR.

Helpers for hanami-view 2.0

Towards the end of the month I had a call with Luca (the second in as many months, what a treat!), in which we discussed how we might bring about full support for view helpers in hanami-view 2.0.

Of course, these won’t be “helpers” in quite the same shape you’d expect from Rails or any of the Ruby static site generators, because if you’ve ever heard me talk about dry-view or hanami-view 2.0 (here’s a refresher), one of its main goals is to help move you from a gross, global soup of unrelated helpers towards view behaviour modelled as focused, testable, well-factored object oriented code.

In this case, we finished the discussion with a plan, and Luca turned it around within a matter of days, with a quickfire set of PRs!

First he introduced the concept of custom anonymous scopes for any view. A scope in dry-view/hanami-view parlance is the object that provides the total set of methods available to use within the template. For a while we’ve supported defining custom scope classes to add behavior for a view that doesn’t belong on any one of its particular exposures, but this requires a fair bit of boilerplate, especially if it’s just for a method or two:

class ArticleViewScope < Hanami::View::Scope
  def custom_method
    # Custom behavior here, can access all scope facilities, e.g. `locals` or `context`
  end
end

class ArticleView < Hanami::View
  config.scope = ArticleViewScope

  expose :article do |slug:|
    # logic to load article here
  end
end

So to make this easier, we now have this new class-level block:

class ArticleView < Hanami::View
  expose :article do |slug:|
    # logic to load article here
  end

  # New scope block!
  scope do
    def custom_method
      # Custom behavior here, can access all scope facilities, e.g. `locals` or `context`
    end
  end
end

So nice! Also nice? That it was a literal 3-line change to the hanami-view code. Also, also nice? You can still “upgrade” to a fully fledged class if the code ever requires it.

Along with this, Luca also began adapting the existing range of global helpers for use in hanami-view 2.0. I may dislike the idea of helpers in general, but I’m generally happy to see truly stateless things like HTML builders around, and with the improvements to template rendering we have over hanami-view 1.x, we’ll be able to make these a lot more expressive for Hanami view developers. This PR is just the first step, but I expect we’ll be able to make some quick strides once this is in place.

Thank you to my sponsors! 🙌

Thank you to my six GitHub sponsors for your continuing support! If you’re reading this and would like to chip in and help push forward the Ruby web application ecosystem for 2021, I’d really appreciate your support.

See you all next month!

,

Russell CokerPlanet Linux Australia

Linux Australia have decided to cease running the Planet installation on planet.linux.org.au. I believe that blogging is still useful and a web page with a feed of Australian Linux blogs is a useful service. So I have started running a new Planet Linux Australia on https://planet.luv.asn.au/. There has been discussion about getting some sort of redirection from the old Linux Australia page, but they don’t seem able to do that.

If you have a blog that has a reasonable portion of Linux and FOSS content and is based in or connected to Australia then email me on russell at coker.com.au to get it added.

When I started running this I took the old list of feeds from planet.linux.org.au, deleted all blogs that didn’t have posts for 5 years and all blogs that were broken and had no recent posts. I emailed people who had recently broken blogs so they could fix them. It seems that many people who run personal blogs aren’t bothered by a bit of downtime.

As an aside I would be happy to setup the monitoring system I use to monitor any personal web site of a Linux person and notify them by Jabber or email of an outage. I could set it to not alert for a specified period (10 mins, 1 hour, whatever you like) so it doesn’t alert needlessly on routine sysadmin work and I could have it check SSL certificate validity as well as the basic page header.

Russell CokerWeather and Boinc

I just wrote a Perl script to look at the Australian Bureau of Meteorology pages to find the current temperature in an area and then adjust BOINC settings accordingly. The Perl script (in this post after the break, which shouldn’t be in the RSS feed) takes the URL of a Bureau of Meteorology observation point as ARGV[0] and parses that to find the current (within the last hour) temperature. Then successive command line arguments are of the form “24:100” and “30:50” which indicate that at below 24C 100% of CPU cores should be used and below 30C 50% of CPU cores should be used. In warm weather having a couple of workstations in a room running BOINC (or any other CPU intensive task) will increase the temperature and also make excessive noise from cooling fans.

To change the number of CPU cores used the script changes /etc/boinc-client/global_prefs_override.xml and then tells BOINC to reload that config file. This code is a little ugly (it doesn’t properly parse XML, it just replaces a line of text) and could fail on a valid configuration file that wasn’t produced by the current BOINC code.
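As a hedged illustration of the “proper XML parsing” alternative, here is a minimal Python sketch that rewrites max_ncpus_pct via ElementTree and then asks BOINC to reload. The file path, element name, and boinccmd invocation are taken from the script below; the assumption that max_ncpus_pct sits directly under the document root is mine, so treat this as a starting point rather than a drop-in replacement.

#!/usr/bin/python3
# Sketch only: update max_ncpus_pct using an XML parser instead of
# line substitution. Assumes the element is a direct child of the root.
import subprocess
import xml.etree.ElementTree as ET

PREFS = "/etc/boinc-client/global_prefs_override.xml"

def set_cpu_percent(percent):
    tree = ET.parse(PREFS)
    node = tree.getroot().find("max_ncpus_pct")
    if node is None:
        raise SystemExit("no max_ncpus_pct element in " + PREFS)
    if abs(float(node.text) - percent) < 0.001:
        return                      # already set, nothing to do
    node.text = "%f" % percent      # e.g. "50.000000"
    tree.write(PREFS)
    subprocess.run(["boinccmd", "--read_global_prefs_override"], check=True)

if __name__ == "__main__":
    set_cpu_percent(50)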

The parsing of the BoM page is a little ugly too, it relies on the HTML code in the BoM page – they could make a page that looks identical which breaks the parsing or even a page that contains the same data that looks different. It would be nice if the BoM published some APIs for getting the weather. One thing that would be good is TXT records in the DNS. DNS supports caching with specified lifetime and is designed for high throughput in aggregate. If you had a million IOT devices polling the current temperature and forecasts every minute via DNS the people running the servers wouldn’t even notice the load, while a million devices polling a web based API would be a significant load. As an aside I recommend playing nice and only running such a script every 30 minutes, the BoM page seems to be updated on the half hour so I have my cron jobs running at 5 and 35 minutes past the hour.
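To make the DNS idea concrete, here is a tiny sketch of what a client could look like. The record name is entirely hypothetical (no such BoM service exists) and it assumes the dnspython package; the point is only that intermediate resolvers would cache the answer for its TTL, so the origin servers would see a tiny fraction of the query load.

# Hypothetical client for weather-over-DNS; the record name is invented.
import dns.resolver

def current_temp(record="melbourne.temp.weather.example.org"):
    for rdata in dns.resolver.resolve(record, "TXT"):   # e.g. TXT "23.4"
        return float(rdata.strings[0].decode())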

If this code works for you then that’s great. If it merely acts as an inspiration for developing your own code then that’s great too! BOINC users outside Australia could replace the code for getting meteorological data (or even interface to a digital thermometer). Australians who use other CPU intensive batch jobs could take the BoM parsing code and replace the BOINC related code. If you write scripts inspired by this please blog about it and comment here with a link to your blog post.

#!/usr/bin/perl
use strict;
use Sys::Syslog;

# St Kilda Harbour RMYS
# http://www.bom.gov.au/products/IDV60901/IDV60901.95864.shtml

my $URL = $ARGV[0];

open(IN, "wget -o /dev/null -O - $URL|") or die "Can't get $URL";
while(<IN>)
{
  if($_ =~ /tr class=.rowleftcolumn/)
  {
    last;
  }
}

sub get_data
{
  if(not $_[0] =~ /headers=.t1-$_[1]/)
  {
    return undef;
  }
  $_[0] =~ s/^.*headers=.t1-$_[1]..//;
  $_[0] =~ s/<.td.*$//;
  return $_[0];
}

my @datetime;
my $cur_temp = -100;

while(<IN>)
{
  chomp;
  if($_ =~ /^<.tr>$/)
  {
    last;
  }
  my $res;
  if($res = get_data($_, "datetime"))
  {
    @datetime = split(/\//, $res)
  }
  elsif($res = get_data($_, "tmp"))
  {
    $cur_temp = $res;
  }
}
close(IN);
if($#datetime != 1 or $cur_temp == -100)
{
  die "Can't parse BOM data";
}

my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime();

if($mday - $datetime[0] > 1 or ($datetime[0] > $mday and $mday != 1))
{
  die "Date wrong\n";
}

my $mins;
my @timearr = split(/:/, $datetime[1]);
$mins = $timearr[0] * 60 + $timearr [1];
if($timearr[1] =~ /pm/)
{
  $mins += 720;
}
if($mday != $datetime[0])
{
  $mins += 1440;
}

if($mins + 60 < $hour * 60 + $min)
{
  die "page outdated\n";
}

my %temp_hash;
foreach ( @ARGV[1..$#ARGV] )
{
  my @tmparr = split(/:/, $_);
  $temp_hash{$tmparr[0]} = $tmparr[1];
}
my @temp_list = sort {$a <=> $b} keys(%temp_hash); # numeric sort so thresholds order correctly
my $percent = 0;
my $i;
for($i = $#temp_list; $i >= 0 and $temp_list[$i] > $cur_temp; $i--)
{
  $percent = $temp_hash{$temp_list[$i]}
}

my $prefs = "/etc/boinc-client/global_prefs_override.xml";
open(IN, "<$prefs") or die "Can't read $prefs";
my @prefs_contents;
while(<IN>)
{
  push(@prefs_contents, $_);
}
close(IN);

openlog("boincmgr-cron", "", "daemon");

my @cpus_pct = grep(/max_ncpus_pct/, @prefs_contents);
my $cpus_line = $cpus_pct[0];
$cpus_line =~ s/..max_ncpus_pct.$//;
$cpus_line =~ s/^.*max_ncpus_pct.//;
if($cpus_line == $percent)
{
  syslog("info", "Temp $cur_temp" . "C, already set to $percent");
  exit 0;
}
open(OUT, ">$prefs.new") or die "Can't read $prefs.new";
for($i = 0; $i <= $#prefs_contents; $i++)
{
  if($prefs_contents[$i] =~ /max_ncpus_pct/)
  {
    print OUT "   <max_ncpus_pct>$percent.000000</max_ncpus_pct>\n";
  }
  else
  {
    print OUT $prefs_contents[$i];
  }
}
close(OUT);
rename "$prefs.new", "$prefs" or die "can't rename";
system("boinccmd --read_global_prefs_override");
syslog("info", "Temp $cur_temp" . "C, set percentage to $percent");

Francois MarierProgramming a DMR radio with its CPS

Here are some notes I took around programming my AnyTone AT-D878UV radio to operate on DMR using the CPS software that comes with it.

Note that you can always tune in to a VFO channel by hand if you haven't had time to add it to your codeplug yet.

DMR terminology

First of all, the terminology of DMR is quite different from that of the regular analog FM world.

Here are the basic terms:

  • Frequency: same meaning as in the analog world
  • Repeater: same meaning as in the analog world
  • Timeslot: Each frequency is split into two timeslots (1 and 2), which means that there can be two simultaneous transmissions on each frequency.
  • Color code: This is the digital equivalent of a CTCSS tone (sometimes called privacy tone) in that using the incorrect code means that you will tie up one of the timeslots on the frequency, but nobody else will hear you. These are not actually named after colors, but are instead just numerical IDs from 0 to 15.

There are two different identification mechanisms (both are required):

  • Callsign: This is the same identifier issued to you by your country's amateur radio authority. Mine is VA7GPL.
  • Radio ID: This is a unique numerical ID tied to your callsign which you must register for ahead of time and program into your radio. Mine is 3027260.

The following is where this digital mode becomes most interesting:

  • Talkgroup: a "chat room" where everything you say will be heard by anybody listening to that talkgroup
  • Network: a group of repeaters connected together over the Internet (typically) and sharing a common list of talkgroups
  • Hotspot: a personal simplex device which allows you to connect to a network with your handheld and access all of the talkgroups available on that network

The most active network these days is Brandmeister, but there are several others.

  • Access: This can either be Always on which means that a talkgroup will be permanently broadcasting on a timeslot and frequency, or PTT which means a talkgroup will not be broadcast until it is first "woken up" by pressing the push-to-talk button and then will broadcast for a certain amount of time before going to sleep again.
  • Channel: As in the analog world, this is what you select on your radio when you want to talk to a group of people. In the digital world however, it is tied not only to a frequency (and timeslot) and tone (color code), but also to a specific talkgroup.

Ultimately what you want to do when you program your radio is to find the talkgroups you are interested in (from the list offered by your local repeater) and then assign them to specific channel numbers on your radio. More on that later.

Callsign and Radio IDs

Before we get to talkgroups, let's set your callsign and Radio ID:

Then you need to download the latest list of Radio IDs so that your radio can display people's names and callsigns instead of just their numerical IDs.

One approach is to only download the list of users who recently talked on talkgroups you are interested in. For example, I used to download the contacts for the following talkgroups: 91,93,95,913,937,3026,3027,302,30271,30272,530,5301,5302,5303,5304,3100,3153,31330 but these days, what I normally do is to just download the entire worldwide database (user.csv) since my radio still has enough storage (200k entries) for it.

In order for the user.csv file to work with the AnyTone CPS, it needs to have particular columns and use the DOS end-of-line characters (apt install dos2unix if you want to do it manually). I wrote a script to do all of the work for me.
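I haven't reproduced Francois' script here, but a minimal sketch of the conversion step might look like the following. The column names are assumptions (check what your CPS version expects); the part that definitely matters is ending up with DOS (CRLF) line endings, which Python's csv.writer emits by default.

# Sketch only: reshape a radioid-style user.csv for CPS import.
# Column names below are assumptions; adjust to your CPS version.
import csv

def convert(src="user.csv", dst="user_cps.csv"):
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.writer(fout)   # writes \r\n line endings by default
        for row in reader:
            writer.writerow([
                row.get("RADIO_ID", ""),
                row.get("CALLSIGN", ""),
                row.get("FIRST_NAME", ""),
                row.get("CITY", ""),
                row.get("STATE", ""),
                row.get("COUNTRY", ""),
            ])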

If you use dmrconfig to program this radio instead, then the conversion is unnecessary. The user.csv file can be used directly, however it will be truncated due to an incorrect limit hard-coded in the software.

Talkgroups

Next, you need to pick the talkgroups you would like to allocate to specific channels on your radio.

Start by looking at the documentation for your local repeaters (e.g. VE7RAG and VE7NWR in the Vancouver area).

In addition to telling you the listen and transmit frequencies of the repeater (again, this works the same way as with analog FM), these will tell you which talkgroups are available and what timeslots and color codes they have been set to. It will also tell you the type of access for each of these talkgroups.

This is how I programmed a channel:

and a talkgroup on the VE7RAG repeater in my radio:

If you don't have a local repeater with DMR capability, or if you want to access talkgroups available on a different network, then you will need to get a DMR hotspot such as one that's compatible with the Pi-Star software.

This is an excerpt from the programming I created for the talkgroups I made available through my hotspot:

One of the unfortunate limitations of the CPS software for the AnyTone 878 is that talkgroup numbers are globally unique identifiers. This means that if TG1234 (hypothetical example) is Ragchew 3000 on DMR-MARC but Iceland-wide chat on Brandmeister, then you can't have two copies of it with different names. The solution I found for this was to give that talkgroup the name "TG1234" instead of "Ragchew3k" or "Iceland". I use a more memorable name for non-conflicting talkgroups, but for the problematic ones, I simply repeat the talkgroup number.

Simplex

Talkgroups are not required to operate on DMR. Just like analog FM, you can talk to another person point-to-point using a simplex channel.

The convention for all simplex channels is the following:

  • Talkgroup: 99
  • Color code: 1
  • Timeslot: 1
  • Admit criteria: Always
  • In Call Criteria: TX or Always

After talking to the British Columbia Amateur Radio Coordination Council, I found that the following frequency ranges are most suitable for DMR simplex:

  • 145.710-145.790 MHz (simplex digital transmissions)
  • 446.000-446.975 MHz (all simplex modes)

The VECTOR list identifies two frequencies in particular:

  • 446.075 MHz
  • 446.500 MHz

Learn more

If you'd like to learn more about DMR, I would suggest you start with this excellent guide (also mirrored here).

,

Tim Riley2020 in review

2020, hey? What a time to finally get back on my “year in review” horse. It was a calamitous year for many of us, but there’s a lot I’m thankful for from 2020.

Work

In January I started a new job with Culture Amp, as part of the Icelab closing and the whole team moving across. I couldn’t have found a better place to work: the people are inspiring, I’ve wound up with a great mentor/manager, and the projects are nourishing. There’s so much I can contribute to here, and I know I’m still only scratching the surface.

A new workplace with its own tech ecosystem meant I did a lot of learning for work this year. Among other things, I started working with event sourcing, distributed systems, AWS CDK, and even a little Go as the year came to an end.

I’m full-time remote with Culture Amp. Under ordinary circumstances (which went out the window before long), this would mean semi-regular visits to the Melbourne office, and I was lucky enough to do that a couple of times in January and February before travel became unworkable. Aside from that, I’ve enjoyed many hours over Zoom working with all my new colleagues.

And just to tie a bow on a big year of work-related things, Michael, Max, and I worked through to December putting the finishing touches on the very last Icelab project, Navigate Senate Committees.

I’m deeply grateful to be where I am now, to have smoothly closed one chapter of work while opening another that I’m excited to inhabit for many years to come.

OSS

This year in OSS has been all about Hanami 2.0 development. By the end of 2019, I’d already made the broad strokes required to make dry-system the new core of the framework. 2020 was all about smoothing around the edges. I worked pretty consistently at this throughout the year, focusing on a new technique for frictionless, zero-boilerplate integrated components, view/action integration, a revised approach for configuration, and lately, support for code loading with Zeitwerk.

Towards the beginning of the year, I decided to follow Piotr’s good example and write my own monthly OSS status updates. I’ve published these monthly since then, making for 9 in the year (and 10 once I do December’s, once this post is done!). I’m really glad I established this habit. It captures so much of the thinking I put into my work that would otherwise be lost with time, and in the case of the Hanami project, it’s a way for the community to follow along with progress. And I won’t lie, the thought of the upcoming post motivates me to squeeze just a little bit more into each month!

This year I shared my Hanami 2 application template as a way to try the framework while it’s still in development. We’re using it for three production services at work and it’s running well.

Helping to rewrite Hanami for 2.0 has been the biggest OSS project I’ve undertaken, and work on this was a slog at times, but I’m happy I managed to keep a steady pace. I also rounded out the year by being able to catch up with Luca and Piotr for a couple of face to face discussions, which was a delight after so many months of text-only collaboration.

On the conference side of things, given the travel restrictions, there was a lot less than normal, but I did have a great time at the one event I did attend, RubyConf AU back in February (which seems so long ago now). Sandwiched between the Australian summer bushfires and the onset of the coronavirus, the community here was amazingly lucky with the timing. Aside from this, the increase of virtual events meant I got to share a talk with Philly.rb and appear on GitHub’s open source Friday livestream series.

I also joined the GitHub sponsors program in May. I only have a small group of sponsors (I’d love more!), but receiving notice of each one was a true joy.

And in a first for me, I spent some time working on this year’s Advent of Code! I used it as an opportunity to develop some familiarity with Go. It was great fun! Code is here if you’re interested.

Home & family

The “stay at home” theme of 2020 was a blessing in many ways, because it meant more time with all the people I love. This year I got to join in on Clover learning to read, write, ride a bike, so many things! Iris is as gregarious as ever and definitely keen to begin her school journey this coming year.

As part of making the most of being home, I also started working out at home in March (thanks, PE with Joe!), which I managed to keep up at 5 days/week ever since! I haven’t felt this good in years.

And to close out, a few other notables:

  • Started visiting the local library more, I enjoyed reading paper books again
  • Some particularly good reads from the year: Stephen Baxter’s Northland trilogy, Cixin Liu’s The Supernova Era, Stephen Baxter’s The Medusa Chronicles, Roger Levy’s The Rig, Kim Stanley Robinson’s Red Moon, Mary Robinette Kowal’s The Relentless Moon, Kylie Maslen’s Show Me Where it Hurts, and Hugh Howey’s Sand
  • Ted Lasso on Apple TV+ was just great
  • Got a Nintendo Switch and played through Breath of the Wild, a truly spectacular experience
  • I think that’s all for now, see you next year!

Lev LafayetteImage Watermarks in Batch

A common need among those who engage in large scale image processing is to assign a watermark of some description to their images. Further, so I have been told, it is preferable to have multiple watermarks with slightly different kerning depending on whether the image is portrait or landscape. Thus there are two functions to this script: one to separate the mass of images in a directory according to whether they are portrait or landscape, and a second to apply the appropriate watermark. The script is therefore structured as follows; witness the neatness and advantages of structured coding, even in shell scripts. I learned a lot from first-year Pascal programming.

#!/bin/bash
separate() { # Separate original files in portrait and landscape directories
# content here
}

apply() { # Apply correct watermark to each directory
# content here
}

main() {
    separate
    apply
}

main
exit

The first function simply compares the width to the height of the image to make its determination. It assumes that the images are correctly orientated in the first place. Directories are created for the two types of files, and the script parses over each file in the directory, determines whether it is portrait or landscape using the identify utility from ImageMagick, and copies it (you want to keep the originals) into the appropriate directory. The content of the separate function therefore ends up like this:

separate() { # Separate original files in portrait and landscape directories
mkdir portraits; mkdir landscapes
for item in ./*.jpg
do
  orient=$(identify -format '%[fx:(h>w)]' "$item")
  if [ $orient -eq 1 ] ;
  then
      cp "$item" ./portraits
  else
      cp "$item" ./landscapes
  fi
done
}

The second function goes into each directory and applies the watermark by making use of ImageMagick's composite function, looping over each file in the directory. The files in the directories are overwritten (the $item is both input and output) but remember the originals have not been altered. The size of the watermark varies according to the size of the image and each is located in the "southeast" corner of the file. The function assumes a watermark offset 20 pixels by 20 pixels from the bottom-right corner and a watermark height of 0.1 of the height of the image. These can be changed as desired.

apply() { # Apply correct watermark to each directory
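# Scale the watermark so its height is 0.1 of the image height (keeping its
# aspect ratio), then composite it 20x20 pixels in from the "southeast"
# (bottom-right) corner of each image.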
cd portraits
for item in ./*.jpg; do convert "$item" ../watermarkp.png +distort affine "0,0 0,0 %[w],%[h] %[fx:t?v.w*(u.h/v.h*0.1):s.w],%[fx:t?v.h*(u.h/v.h*0.1):s.h]" -shave 1 -gravity southeast -geometry +20+20 -composite "$item" ; done
cd ../landscapes
for item in ./*.jpg; do convert "$item" ../watermarkl.png +distort affine "0,0 0,0 %[w],%[h] %[fx:t?v.w*(u.h/v.h*0.1):s.w],%[fx:t?v.h*(u.h/v.h*0.1):s.h]" -shave 1 -gravity southeast -geometry +20+20 -composite "$item" ; done
}

This file can also be combined with existing scripts for Batch Image Processing. In particular, jpegtran/exiftran, suggested by Michael Deegan, for EXIF rotation flags.

For what it's worth this script took around an hour to put together (mostly research, about 5 minutes coding, and 10 minutes testing and debugging). I suspect it will save readers who use it a great deal of time.

Ben MartinNew home for the Astrolabe, pocket day calc, and coin of sentimental value

 I turned a slice of a tree trunk into a matching pair of holders for sentimental objects over the break. This has a few coats of polyurethane and deeper coating on the bark. Having some layers on the bark takes away the sharper edges for you. I need to measure the thickness of the poly on the front and inside the pockets as it is certainly measurable. What was a nice fit without finish becomes a rather tight fit with the poly.

 

Behind the two instruments is the key chain which is tucked away into a deeper pocket. The pockets at the side of each object are to allow fingers to free the object for inspection or frustrating use in the case of the astrolabe.

I was going to go down the well trodden path of making a small coffee table top from the timber but I like this idea as it frees these objects from their boxes and the darker red timber really complements the objects embedded within it.

Tim SerongScope Creep

On December 22, I decided to brew an oatmeal stout (5kg Gladfield ale malt, 250g dark chocolate malt, 250g light chocolate malt, 250g dark crystal malt, 500g rolled oats, 150g rice hulls to stop the mash sticking, 25g Pride of Ringwood hops, Safale US-05 yeast). This all takes a good few hours to do the mash and the boil and everything, so while that was underway I thought it’d be a good opportunity to remove a crappy old cupboard from the laundry, so I could put our nice Miele upright freezer in there, where it’d be closer to the kitchen (the freezer is presently in a room at the other end of the house).

The cupboard was reasonably easy to rip out, but behind it was a mouldy and unexpectedly bright yellow wall with an ugly gap at the top where whoever installed it had removed the existing cornice.

Underneath the bottom half of the cupboard, I discovered not the cork tiles which cover the rest of the floor, but a layer of horrific faux-tile linoleum. Plus, more mould. No way was I going to put the freezer on top of that.

So, up came the floor covering, back to nice hardwood boards.

Of course, the sink had to come out too, to remove the flooring from under its cabinet, and that meant pulling the splashback tiles (they had ugly screw holes in them anyway from a shelf that had been bracketed up on top of them previously).

Removing the tiles meant replacing a couple of sections of wall.

Also, we still needed to be able to use the washing machine through all this, so I knocked up a temporary sink support.

New cornice went in.

The rest of the plastering was completed and a ceiling fan installed.

Waterproofing membrane was applied where new tiles will go around a new sink.

I removed the hideous old aluminium backed weather stripping from around the exterior door and plastered up the exposed groove.

We still need to paint everything, get the new sink installed, do the tiling work and install new taps.

As for the oatmeal stout, I bottled that on January 2. From a sample taken at the time, it should be excellent, but right now still needs to carbonate and mature.

Stewart SmithPhotos from Taiwan

A few years ago we went to Taiwan. I managed to capture some random bits of the city on film (and also some shots on my then phone, a Google Pixel). I find the different style of art on the streets around the world to be fascinating, and Taiwan had some good examples.

I’ve really enjoyed shooting Kodak E100VS film over the years, and some of my last rolls were shot in Taiwan. It’s a film that unfortunately is not made anymore, but at least we have a new Ektachrome to have fun with now.

Words for our time: “Where there is democracy, equality and freedom can exist; without democracy, equality and freedom are merely empty words”.

This is, of course, only a small number of the total photos I took there. I’d really recommend a trip to Taiwan, and I look forward to going back there some day.

,

Michael StillDeciding when to filter out large scale refactorings from code analysis


I want to be able to see the level of change between OpenStack releases. However, there are a relatively small number of changes with simply huge amounts of delta in them — they’re generally large refactors, or the wholesale deletion which happens when part of a repository is spun out into its own project.

I therefore wanted to explore what was a reasonable size for a change in OpenStack so that I could decide what maximum size to filter away as likely to be a refactor. After playing with a couple of approaches, including just randomly picking a number, it seems the logical way to decide is to simply plot a histogram of the various sizes, and then pick a reasonable place on the curve as the cutoff. Due to the large range of values (from zero lines of change to over a million!), I ended up deciding a logarithmic axis was the way to go.

For the projects listed in the OpenStack compute starter kit reference set, that produces the following histogram:

[Figure: a histogram of the sizes of various OpenStack commits]

I feel that filtering out commits over 10,000 lines of delta is justified based on that graph. For reference, the raw histogram buckets are:

Commit size   Count
< 2           25747
< 11          237436
< 101         326314
< 1001        148865
< 10001       16928
< 100001      3277
< 1000001     522
< 10000001    13
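For anyone wanting to reproduce this kind of analysis, here is a rough sketch of one way to collect per-commit delta sizes and bucket them on a log-ish scale. This isn't the tooling used for the graph above, just an illustration built on git log --numstat, with bucket edges chosen to match the table.

# Rough sketch: histogram of per-commit delta sizes for a git repository.
import subprocess
from collections import Counter

def commit_sizes(repo):
    out = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=format:@%H", "--numstat"],
        capture_output=True, text=True, check=True).stdout
    sizes, current = [], 0
    for line in out.splitlines():
        if line.startswith("@"):        # start of a new commit
            sizes.append(current)
            current = 0
        elif line.strip():              # "added<TAB>deleted<TAB>path"
            added, deleted, _ = line.split("\t", 2)
            if added != "-":            # skip binary file entries
                current += int(added) + int(deleted)
    sizes.append(current)
    return sizes[1:]                    # drop the empty leading entry

def histogram(sizes):
    edges = (2, 11, 101, 1001, 10001, 100001, 1000001, 10000001)
    buckets = Counter()
    for size in sizes:
        for edge in edges:
            if size < edge:
                buckets["< %d" % edge] += 1
                break
    return buckets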

Michael StillA quick summary of OpenStack release tags

I wanted a quick summary of OpenStack git release tags for a talk I am working on, and it turned out to be way more complicated than I expected. I ended up having to compile a table, and then turn that into a code snippet. In case it’s useful to anyone else, here it is:

Release    Release date    Final release tag
Austin     October 2010    2010.1
Bexar      February 2011   2011.1
Cactus     April 2011      2011.2
Diablo     September 2011  2011.3
Essex      April 2012      2012.1.3
Folsom     September 2012  2012.2.4
Grizzly    April 2013      2013.1.5
Havana     October 2013    2013.2.4
Icehouse   April 2014      2014.1.5
Juno       October 2014    2014.2.4
Kilo       April 2015      2015.1.4
Liberty    October 2015    Glance: 11.0.2, Keystone: 8.1.2, Neutron: 7.2.0, Nova: 12.0.6
Mitaka     April 2016      Glance: 12.0.0, Keystone: 9.3.0, Neutron: 8.4.0, Nova: 13.1.4
Newton     October 2016    Glance: 13.0.0, Keystone: 10.0.3, Neutron: 9.4.1, Nova: 14.1.0
Ocata      February 2017   Glance: 14.0.1, Keystone: 11.0.4, Neutron: 10.0.7, Nova: 15.1.5
Pike       August 2017     Glance: 15.0.2, Keystone: 12.0.3, Neutron: 11.0.8, Nova: 16.1.8
Queens     February 2018   Glance: 16.0.1, Keystone: 13.0.4, Neutron: 12.1.1, Nova: 17.0.13
Rocky      August 2018     Glance: 17.0.1, Keystone: 14.2.0, Neutron: 13.0.7, Nova: 18.3.0
Stein      April 2019      Glance: 18.0.1, Keystone: 15.0.1, Neutron: 14.4.2, Nova: 19.3.2
Train      October 2019    Glance: 19.0.4, Keystone: 16.0.1, Neutron: 15.3.0, Nova: 20.4.1
Ussuri     May 2020        Glance: 20.0.1, Keystone: 17.0.0, Neutron: 16.2.0, Nova: 21.1.1
Victoria   October 2020    Glance: 21.0.0, Keystone: 18.0.0, Neutron: 17.0.0, Nova: 22.0.1

Or in python form for those so inclined:

RELEASE_TAGS = {
    'austin': {'all': '2010.1'},
    'bexar': {'all': '2011.1'},
    'cactus': {'all': '2011.2'},
    'diablo': {'all': '2011.3'},
    'essex': {'all': '2012.1.3'},
    'folsom': {'all': '2012.2.4'},
    'grizzly': {'all': '2013.1.5'},
    'havana': {'all': '2013.2.4'},
    'icehouse': {'all': '2014.1.5'},
    'juno': {'all': '2014.2.4'},
    'kilo': {'all': '2015.1.4'},
    'liberty': {
        'glance': '11.0.2',
        'keystone': '8.1.2',
        'neutron': '7.2.0',
        'nova': '12.0.6'
    },
    'mitaka': {
        'glance': '12.0.0',
        'keystone': '9.3.0',
        'neutron': '8.4.0',
        'nova': '13.1.4'
    },
    'newton': {
        'glance': '13.0.0',
        'keystone': '10.0.3',
        'neutron': '9.4.1',
        'nova': '14.1.0'
    },
    'ocata': {
        'glance': '14.0.1',
        'keystone': '11.0.4',
        'neutron': '10.0.7',
        'nova': '15.1.5'
    },
    'pike': {
        'glance': '15.0.2',
        'keystone': '12.0.3',
        'neutron': '11.0.8',
        'nova': '16.1.8'
    },
    'queens': {
        'glance': '16.0.1',
        'keystone': '13.0.4',
        'neutron': '12.1.1',
        'nova': '17.0.13'
    },
    'rocky': {
        'glance': '17.0.1',
        'keystone': '14.2.0',
        'neutron': '13.0.7',
        'nova': '18.3.0'
    },
    'stein': {
        'glance': '18.0.1',
        'keystone': '15.0.1',
        'neutron': '14.4.2',
        'nova': '19.3.2'
    },
    'train': {
        'glance': '19.0.4',
        'keystone': '16.0.1',
        'neutron': '15.3.0',
        'nova': '20.4.1'
    },
    'ussuri': {
        'glance': '20.0.1',
        'keystone': '17.0.0',
        'neutron': '16.2.0',
        'nova': '21.1.1'
    },
    'victoria': {
        'glance': '21.0.0',
        'keystone': '18.0.0',
        'neutron': '17.0.0',
        'nova': '22.0.1'
    }
}

,

Paul WiseFLOSS Activities December 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian: restart bacula director, ping some people about disk usage
  • Debian wiki: unblock IP addresses, approve accounts, update email for accounts with bouncing email

Communication

  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors

All work was done on a volunteer basis.

,

Michael StillRejected talk proposal: Shaken Fist, thought experiments in simpler IaaS clouds

This proposal was submitted for FOSDEM 2021. Given that acceptances were meant to be sent out on 25 December and it’s basically a week later, I think we can assume that it’s been rejected. I’ve recently been writing up my rejected proposals, partially because I’ve put in the effort to write them and they might be useful elsewhere, but also because I think it’s important to demonstrate that it’s not unusual for experienced speakers to be rejected from these events.


OpenStack today is a complicated beast — not only does it try to perform well for large clusters, but it also embraces a diverse set of possible implementations across hypervisors, storage, networking, and more. This was a deliberate tactical choice made by the OpenStack community years ago, forming a so-called “Big Tent” for vendors to collaborate in to build Open Source cloud options. It made a lot of sense at the time to be honest. However, OpenStack today finds itself constrained by the large number of permutations it must support, ten years of software and backwards compatibility legacy, and a decreasing investment from those same vendors that OpenStack courted so actively.

Shaken Fist makes a series of simplifying assumptions that allow it to achieve a surprisingly large amount in not a lot of code. For example, it supports only one hypervisor, one hypervisor OS, one networking implementation, and lacks an image service. It tries hard to be respectful of compute resources while idle, and as fast as possible to deploy resources when requested — it’s entirely possible to deploy a new VM and start it booting in less than a second, for example (if the boot image is already held in cache). Shaken Fist is likely a good choice for small deployments such as home labs and telco edge applications. It is unlikely to be a good choice for large scale compute however.

Simon LyallAudiobooks – December 2020

The Perils of Perception: Why We’re Wrong About Nearly Everything by Bobby Duffy

Lots of examples of how people are wrong, usually about crime rates or levels of immigration. Divided into topics, with some comments on why and how to fix it. 3/5

The Knowledge: How to Rebuild our World from Scratch
by Lewis Dartnell

A how-to on rebooting civilization following a worldwide disaster. The tone is addressed to a present-day person rather than someone from the future which makes it more readable. 4/5

The Story of Silver: How the White Metal Shaped America and the Modern World by William L. Silber

Almost solely devoted to America, it covers major events around the metal, including its demonetization, government and private price manipulation, and speculation including the Hunt Brothers. 3/5

The First Four Years by Laura Ingalls Wilder

About half the length of the other books in the series and published posthumously. Laura and Almanzo try to make a success of farming for 4 years. Things don’t go well. The book is a bit more adult than some of the others. 3/5

Casino Royale by Ian Fleming

Interesting how close it is to the 2006 Movie. Also since it is set in ~1951, World War 2 looms large in many places & most characters are veterans. Very good and fairly quick read. 4/5

A Bridge too far: The Classic History of the Greatest Battle of World War II by Cornelius Ryan

An account of the failed airborne operation. Mostly a day-by-day account, with sources including interviews with participants. A little confusing without maps. 4/5

The Bomb: Presidents, Generals, and the Secret History of Nuclear War by Fred Kaplan

“The definitive history of American policy on nuclear war”. Lots of “War Plans” and “Targeting Policy” with back and forth between service factions. 3/5

The Sirens of Mars: Searching for Life on Another World
by Sarah Stewart Johnson

“Combines elements of memoir from Johnson with the history and science of attempts to discover life on Mars”. I liked this book a lot, very nicely written and inspiring. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Simon LyallDonations 2020

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended).

I also blog about it to hopefully inspire others. See: 2019, 2018, 2017, 2016, 2015

All amounts this year are in $US unless otherwise stated

My main donation was $750 to Givewell (to allocate to projects as they prioritize). Once again I’m happy that Givewell make efficient use of money donated. I decided this year to give a higher proportion of my giving to them than last year.

Software and Internet Infrastructure Projects

€20 to Syncthing which I’ve started to use instead of Dropbox.

$50 each to the Software Freedom Conservancy and Software in the Public Interest . Money not attached to any specific project.

$51 to the Internet Archive

$25 to Let’s Encrypt

Advocacy Organisations

$50 to the Electronic Frontier Foundation

Others including content creators

I donated $103 to Signum University to cover Corey Olsen’s Exploring the Lord of the Rings series plus other stuff I listen to that they put out.

I paid $100 to be a supporter of NZ News site The Spinoff

I also supported a number of creators on Patreon:

,

Adrian ChaddRepairing and bootstrapping an IBM PC/AT 5170, Part 3

So! In Parts 1 and 2 I covered getting this old thing cleaned up, getting it booting, hacking together a boot floppy disc and getting a working file transfer onto a work floppy disc.

Today I'm going to cover what it took to get it going off of a "hard disk", which in 2020 can look quite a bit different than 1990.

First up - what's the hard disk? Well, in the 90s we could still get plenty of MFM and early IDE hard disks to throw into systems like this. In 2020, well, we can get Very Large IDE disks from NOS (like multi hundred gigabyte parallel ATA interface devices), but BIOSes tend to not like them. The MFM disks are .. well, kinda dead. It's understandable - they didn't exactly build them for a 40 year shelf life.

The IBM PC, like most computers at the time, allows peripherals to include software support in ROM for the hardware you're trying to use. For example, my PC/AT BIOS doesn't know anything about IDE hardware - it only knows about ye olde ST-412/ST-506 Winchester/MFM drive controllers. But contemporary IDE hardware would include the driver code for the BIOS in an on-board ROM, which the BIOS would enumerate and use. Other drive interconnects such as SCSI did the same thing.

By the time the 80386's were out, basic dumb IDE was pretty well supported in BIOSes as, well, IDE is really code for "let's expose some of the 16 bit ISA bus on a 40 pin ribbon cable to the drives". But, more about that later.

Luckily some electronics minded folk have gone and implemented alternatives that we can use. Notably:

  • There's now an open-source "Universal IDE BIOS" available for computers that don't have IDE support in their BIOS - notably PC/XT and PC/AT, and
  • There are plenty of projects out there which break out the IDE bus on an XT or AT ISA bus - I'm using XT-IDE.

Now, I bought a little XT-IDE + compact flash card board off of ebay. They're cheap, it comes with the Universal IDE BIOS on a flash device, and ...

... well, I plugged it in and it didn't work. So, I wondered if I broke it. I bought a second one, as I don't have other ISA bus computers yet, and ...

IT didn't work. Ok, so I know that there's something up with my system, not these cards. I did the 90s thing of "remove all IO cards until it works" in case there was an IO port conflict and ...

.. wham! The ethernet card. Both wanted 0x300. I'd have to reflash the Universal IDE BIOS to get it to look at any other address, so off I went to get the Intel Etherexpress 8/16 card configuration utility.

Here's an inside shot of the PC/AT with the XT-IDE installed, and a big gaping hole where the Intel EtherExpress 8/16 NIC should be.





No wait. What I SHOULD do first is get the XT-IDE CF card booting and running.

Ok, so - first things first. I had to configure the BIOS drive as NONE, because the BIOS isn't servicing the drive - the IDE BIOS is. Unfortunately, the IDE BIOS is coming in AFTER the system BIOS disks, so I currently can't run MFM + IDE whilst booting from IDE. I'm sure I can figure out how at some point, but that point is not today.

Success! It boots!

To DOS 6.22!

And only the boot sector, and COMMAND.COM! Nooooo!

Ok so - I don't have a working 3.5" drive installed, I don't have DOS 6.22 media on 1.2MB, but I can copy my transfer program (DSZ) - and Alley Cat - onto the CF card. But - now I need the DOS 6.22 install media.

On the plus side - it's 2020 and this install media is everywhere. On the minus side - it's disk images that I can't easily use. On the double minus side - the common DOS raw disk read/write tools - RAWREAD/RAWRITE - don't know about 5.25" drives! Ugh!

However! Here's where a bit of hilarious old knowledge is helpful - although the normal DOS installers want to be run from floppy, there's a thing called the "DOS 6.22 Upgrade" - and this CAN be run from the hard disk. However! You need a blank floppy for it to write the "uninstallation" data to, so keep one of those handy.

I extracted the files from the disk images using MTOOLS - "MCOPY -i DISK.IMG ::*.* ." to get everything out  - used PKZIP and DSZ to get it over to the CF card, and then ran the upgrader.
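
If you have several disk images to unpack, that MTOOLS step is easy to script. Here's a minimal Python sketch of the loop - the image filenames are made up, and it assumes mtools is installed on the machine doing the extraction:

import subprocess
from pathlib import Path

# Hypothetical image names - substitute whatever your DOS 6.22 Upgrade images are called.
images = ["DISK1.IMG", "DISK2.IMG", "DISK3.IMG"]
outdir = Path("dos622")
outdir.mkdir(exist_ok=True)

for img in images:
    # mcopy -i <image> ::*.* <target> copies every file out of the FAT image
    subprocess.run(["mcopy", "-i", img, "::*.*", str(outdir)], check=True)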


Hello DOS 6.22 Upgrade Setup!


Ah yes! Here's the uninstall disc step! Which indeed I had on hand for this very moment!


I wonder if I should fill out the registration card for this install and send it to Microsoft.

Ok, so that's done and now I have a working full DOS 6.22 installation. Now I can do all the fun things like making a DOS boot disk and recovery things. (For reference - you do that using FORMAT /S A: to format a SYSTEM disk that you can boot from; then you add things to it using COPY.)

Finally, I made a boot disk with the Intel EtherExpress 8/16 config program on it, and reconfigured my NIC somewhere other than 0x300. Now, I had to open up the PC/AT, remove the XT-IDE and install the EtherExpress NIC to do this - so yes, I had to boot from floppy disc.





Once that was done, I added a bunch of basic things like Turbo C 2.0, Turbo Assembler and mTCP. Now, mTCP is a package that really needed to exist in the 90s. However, this and the RAM upgrade (which I don't think I've talked about yet!) will come in the next installment of "Adrian's remembering old knowledge from his teenage years!".

,

David RoweFreeDV 700E and Compression

FreeDV 700D [9] is built around an OFDM modem [6] and powerful LDPC codes, and was released in mid 2018. Since then our real world experience has shown that it struggles with fast fading channels. Much to my surprise, the earlier FreeDV 700C mode actually works better on fast fading channels. This is surprising as 700C doesn’t have any FEC, but instead uses a simple transmit diversity scheme – the signal is sent twice at two different frequencies.

So I decided to develop a new FreeDV 700E waveform [8] with the following features:

  1. The ability to handle fast fading through an increased pilot symbol rate, but also with FEC which is useful for static crashes, interference, and mopping up random bit errors.
  2. Uses a shorter frame (80ms), rather than the 160ms frame of 700D. This will reduce latency and makes sync faster.
  3. The faster pilot symbol rate will mean 700E can handle frequency offsets better, as well as fast fading.
  4. Increasing the cyclic prefix from 2 to 6ms, allowing the modem to handle up to 6ms of multipath delay spread.
  5. A wider RF bandwidth than 700D, which can help mitigate frequency selective fading. If one part of the spectrum is notched out, we can use FEC to recover data from other parts of the spectrum. On the flip side, narrower signals are more robust to some interference, and use less spectrum.
  6. Compression of the OFDM waveform, to increase the average power (and hence received SNR) for a given peak power.
  7. Trade off low SNR performance for fast fading channel performance. A higher pilot symbol rate and longer cyclic prefix mean less energy is available for data symbols, so low SNR performance won’t be as good as 700D.
  8. It uses the same Codec 2 700C voice codec, so speech quality will be the same as 700C and D when SNR is high.

Over the course of 2020, we’ve refactored the OFDM modem and FreeDV API to make implementing new modem waveforms much easier. This really helped – I designed, simulated, and released the FreeDV 700E mode in just one week of part time work. It’s already being used all over the world in the development version of FreeDV 1.5.0.

My bench tests indicate 700C/D/E are equivalent on moderate fading channels (1Hz Doppler/2ms spread). As the fading speeds up to 2Hz 700D falls over, but 700C/E perform well. On very fast fading (4Hz/4ms) 700E does better as 700C stops working. 700D works better at lower SNRs on slow fading channels (1Hz Doppler/2ms and slower).

The second innovation is compression of the 700C/D/E waveforms, to increase average power significantly (around 6dB from FreeDV 1.4.3). Please be careful adjusting the Tx drive and especially enabling the Tools – Options – Clipping. It can drive your PA quite hard. I have managed 40W RMS out of my 75W PEP transmitter. Make sure your transmitter can handle long periods of high average power.

I’ve also been testing against compressed SSB, which is pretty hard to beat as it’s so robust to fading. However 700E is hanging on quite well with fast fading, and unlike SSB becomes noise free as the SNR increases. At the same equivalent peak power – 700D is doing well when compressed SSB is -5dB SNR and rather hard on the ears.

SSB Compression

To make an “apples to apples” comparison between FreeDV and SSB at low SNRs I need SSB compressor software that I can run on the (virtual) bench. So I’ve developed a speech compressor using some online wisdom [1][2]. Turns out the “Hilbert Clipper” in [2] is very similar to how I am compressing my OFDM signals to improve their PAPR. This appeals to me – using the same compression algorithm on SSB and FreeDV.

The Hilbert transformer takes the “real” speech signal and converts it to a “complex” signal. It’s the same signal, but now it’s represented by in phase and quadrature signals, or alternatively a vector spinning around the origin. Turns out you can do a much better job at compression by limiting the magnitude of that vector, than by clipping the input speech signal. Any clipping tends to spread the signal in frequency, so we have a SSB filter at the output to limit the bandwidth. Good compressors can get down to about 6dB PAPR for SSB, mine is not too shabby at 7-8dB.
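
Here's a bare-bones Python/NumPy sketch of that recipe - analytic signal, magnitude limit, then a band-limiting filter. It isn't the compressor used here, just an illustration of the idea, and the clip level and filter taps are arbitrary:

import numpy as np
from scipy.signal import hilbert, firwin, lfilter

def hilbert_clip(speech, fs=8000, limit=None):
    analytic = hilbert(speech)                    # real speech -> complex (I/Q) signal
    if limit is None:
        limit = 1.5 * np.std(speech)              # arbitrary clip level relative to RMS
    mag = np.abs(analytic)
    scale = np.minimum(1.0, limit / np.maximum(mag, 1e-12))
    clipped = np.real(analytic * scale)           # limit the magnitude of the spinning vector
    # clipping spreads energy in frequency, so band-limit like an SSB filter
    taps = firwin(101, [300, 2700], pass_zero=False, fs=fs)
    return lfilter(taps, 1.0, clipped)

def papr_db(x):
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

speech = np.random.randn(8000)                    # stand-in for a real speech sample
out = hilbert_clip(speech)
print(f"PAPR before: {papr_db(speech):.1f} dB  after: {papr_db(out):.1f} dB")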

It certainly makes a difference to noisy speech, as you can see in this plot (in low SNR channel), and the samples below:

Compression SNR Sample
Off High Listen
On High Listen
Off Low Listen
On Low Listen

FreeDV Performance

Here are some simulated samples. They are all normalised to the same peak power, and all waveforms (SSB and FreeDV) are compressed. The noise power N is in dB but there is some arbitrary scaling (for historical reasons). A more negative N means less noise. For a given noise power N, the SNRs vary as different waveforms have different peak to average power ratio. I’m adopting the convention of comparing signals at the same (Peak Power)/(Noise Power) ratio. This matches real world transmitters – we need to do the best we can with a given PEP (peak power). So the idea below is to compare samples at the same noise power N, and channel type, as peak power is known to be constant. An AWGN channel is just plain noise, MPP is 1Hz Doppler, 2ms delay spread; and MPD is 2Hz Doppler, 4ms delay spread.

Test Mode Channel N SNR Sample
1 SSB AWGN -12.5 -5 Listen
700D AWGN -12.5 -1.8 Listen
2 SSB MPP -17.0 0 Listen
700E MPP -17.0 3 Listen
3 SSB MPD -23.0 8 Listen
700E MPD -23.0 9 Listen
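
To make the "same peak power, same noise power" convention concrete, here's a tiny numpy sketch with made-up stand-in waveforms: normalise each to the same peak, add noise of a fixed power, and the resulting SNR differs only because of each waveform's PAPR:

import numpy as np

def snr_at_fixed_peak(x, peak=1.0, noise_power_db=-20.0):
    x = x * (peak / np.max(np.abs(x)))            # same peak power for every waveform
    signal_power = np.mean(x ** 2)
    noise_power = 10 ** (noise_power_db / 10)
    return 10 * np.log10(signal_power / noise_power)

t = np.arange(8000) / 8000
high_papr = np.random.randn(8000)                 # speech/SSB-like stand-in
low_papr = np.cos(2 * np.pi * 1000 * t)           # near constant envelope stand-in

for name, wave in [("high PAPR", high_papr), ("low PAPR", low_papr)]:
    print(f"{name}: SNR = {snr_at_fixed_peak(wave):.1f} dB at the same peak and noise power")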

Here’s a sample of the simulated off air 700E modem signal at 8dB SNR in a MPP channel. It actually works up to 4 Hz Doppler and 6ms delay spread, which sounds like a UFO landing.

Comments:

  1. With digital, when you’re in a fade, you’re in a fade! You lose that chunk of speech. A short FEC code (less than twice the fade duration) isn’t going to help you much. We can’t extend the code length because of latency (this is PTT speech). Sigh.
  2. This explains why 700C (with no FEC) does so well – we lose speech during deep fades (where FEC breaks anyway) but it “hangs on” as the Doppler whirls around and sounds fine in the “anti-fades”. The voice codec is robust to a few % BER all by itself which helps.
  3. Analog SSB is nearly impervious to fading, no matter how fast. It’s taken a lot of work to develop modems that “hang on” in fast fading channels.
  4. Analog SSB degrades slowly with decreasing SNR, but also improves slowly with increasing SNR. It’s still noisy at high SNRs. DSP noise reduction can help.

Let’s take a look at the effect of compression. Here is a screen shot from my spectrum analyser in zero-span mode. It’s displaying power against time from my FT-817 being driven by two waveforms. Yellow is the previous, uncompressed 700D waveform, purple is the latest 700D with compression. You can really get a feel for how much higher the average power is. On my radio I jumped from 5-10W RMS to 40W RMS.

Jose’s demo

Jose, LU5DKI sent me a wave file sample of him “walking through the modes” over a 12,500km path between Argentina and New Zealand. The SSB is at the 2:30 mark:

This example shows how well 700E can handle fast fading over a path that includes Antartica:

A few caveats:

  • Jose’s TK-80 radio is 40 years old and doesn’t have any compression available for SSB.
  • FreeDV attenuates the “pass through” off air radio noise by about 6dB, so the level of the SSB will be lower than the FreeDV audio. However that might be a good idea with all that noise.
  • Some noise reduction DSP might help, although it tends to fall over at low SNRs. I don’t have a convenient command line tool for that. If you do – here is Jose’s sample. Please share the output with us.

I’m interested in objective comparisons of FreeDV and SSB using off air samples. I’m rather less interested in subjective opinions. Show me the samples …….

Conclusions and Further Work

I’m pleased with our recent modem waveform development and especially the compression. It’s also good fun to develop new waveforms, and it’s getting easier as the FreeDV API software matures. We’re getting pretty good performance over a range of channels now, and we’re learning how to make modems for digital voice play nicely over HF channels. I feel our SSB versus FreeDV comparisons are maturing too.

The main limitation is the Codec 2 700C vocoder – while usable in practice it’s not exactly HiFi. Unfortunately speech coding is hard – much harder than modems. More R&D than engineering, which means a lot more effort – with no guarantee of a useful result. Anyhoo, lets see if I can make some progress on speech quality at low SNRs in 2021!

Links

[1] Compression – good introduction from AB4OJ.
[2] DSP Speech Processor Experimentation 2012-2020 – Sophisticated speech processor.
[3] Playing with PAPR – my initial simulations from earlier in 2020.
[4] Jim Ahlstrom N2ADR has done some fine work on FreeDV filter C code – very useful once again for this project. Thanks Jim!
[5] Modems for HF Digital Voice Part 1 and Part 2 – gentle introduction in to modems for HF.
[6] Steve Ports an OFDM modem from Octave to C – the OFDM modem Steve and I built – it keeps getting better!
[7] FreeDV 700E uses one of Bill’s fine LDPC codes.
[8] Modem waveform design spreadsheet.
[9] FreeDV 700D Released

Command Lines

Writing these down so I can cut and paste them to repeat these tests in the future….

Typical FreeDV 700E simulation, wave file output:

./src/freedv_tx 700E ../raw/ve9qrp_10s.raw - --clip 1 | ./src/cohpsk_ch - - -23 --mpp --raw_dir ../raw --Fs 8000 | sox -t .s16 -r 8000 -c 1 - ~/drowe/blog/ve9qrp_10s_700e_23_mpd_rx.wav

Looking at the PDF (histogram) of signal magnitude is interesting. Let’s generate some compressed FreeDV 700E:

./src/freedv_tx 700D ../raw/ve9qrp.raw - --clip 1 | ./src/cohpsk_ch - - -100  --Fs 8000 --complexout > ve9qrp_700d_clip1.iq16

Now take the complex valued output signal and plot the PDF and CDF the magnitude (and time domain and spectrum):

octave:1> s=load_raw("../build_linux/ve9qrp_700d_clip1.iq16"); s=s(1:2:end)+j*s(2:2:end); figure(1); plot(abs(s)); S=abs(fft(s)); figure(2); clf; plot(20*log10(S)); figure(3); clf; [hh nn] = hist(abs(s),25,1); cdf = empirical_cdf(1:max(abs(s)),abs(s)); plotyy(nn,hh,1:max(abs(s)),cdf);


This is after clipping, so 100% of the samples have a magnitude less than 16384. Also see [3].

When testing with real radios it’s useful to play a sine wave at the same PEP level as the modem signals under test. I could get 75WRMS (and PEP) out of my IC-7200 using this test (13.8VDC power supply):

./misc/mksine - 1000 160 16384 | aplay -f S16_LE

We can measure the PAPR of the sine wave with the cohpsk_ch tool:

./misc/mksine - 1000 10 | ./src/cohpsk_ch - /dev/null -100 --Fs 8000
cohpsk_ch: Fs: 8000 NodB: -100.00 foff: 0.00 Hz fading: 0 nhfdelay: 0 clip: 32767.00 ssbfilt: 1 complexout: 0
cohpsk_ch: SNR3k(dB):    85.23  C/No....:   120.00
cohpsk_ch: peak.....: 10597.72  RMS.....:  9993.49   CPAPR.....:  0.51 
cohpsk_ch: Nsamples.:    80000  clipped.:     0.00%  OutClipped:  0.00%

CPAPR = 0.5dB, should be 0dB, but I think there’s a transient as the Hilbert Transformer FIR filter memory fills up. Close enough.

By chaining cohpsk_ch together in various ways we can build an SSB compressor, and simulate the channel by injecting noise and fading:

./src/cohpsk_ch ../raw/ve9qrp_10s.raw - -100 --Fs 8000 | ./src/cohpsk_ch - - -100 --Fs 8000 --clip 16384 --gain 10 | ./src/cohpsk_ch - - -100 --Fs 8000 --clip 16384 | ./src/cohpsk_ch - - -17 --raw_dir ../raw --mpd --Fs 8000 --gain 0.8 | aplay -f S16_LE

cohpsk_ch: peak.....: 16371.51  RMS.....:  7128.40   CPAPR.....:  7.22

A PAPR of 7.2 dB is pretty good for a few hours’ work – the cool kids get around 6dB [1][2].

,

Jan SchmidtRift CV1 – Adventures in Kalman filtering

In my last post I wrote about changes in my OpenHMD positional tracking branch to split analysis of the tracking frames from the camera sensors across multiple threads. In the 2 months since then, the only real change in the repository was to add some filtering to the pose search that rejects bad poses by checking if they align with the gravity vector observed by the IMU. That is in itself a nice improvement, but there is other work I’ve been doing that isn’t published yet.

The remaining big challenge (I think) to a usable positional tracking solution is fusing together the motion information that comes from the inertial tracking sensors (IMU) in the devices (headset, controllers) with the observations that come from the camera sensors (video frames). There are some high level goals for that fusion, and lots of fiddly details about why it’s hard.

At the high level, the IMUs provide partial information about the motion of each device at a high rate, while the cameras provide observations about the actual position in the room – but at a lower rate, and with sometimes large delays.

In the Oculus CV1, the IMU provides Accelerometer and Gyroscope readings at 1000Hz (500Hz for controllers), and from those it’s possible to compute the orientation of the device relative to the Earth (but not the compass direction it’s facing), and also to integrate acceleration readings to get velocity and position – but the position tracking from an IMU is only useful in the short term (a few seconds) as it drifts rapidly due to that double integration.

The accelerometers measure (surprise) the acceleration of the device, but are always also sensing the Earth’s gravity field. If a device is at rest, it will ideally report 9.81 m/s2, give or take noise and bias errors. When the device is in motion, the acceleration measured is the sum of the gravity field, bias errors and actual linear acceleration. To interpolate the position with any accuracy at all, you need to separate those 3 components with tight tolerance.
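
As a toy illustration of that drift (nothing to do with the OpenHMD code, and the numbers are invented): double-integrating a stationary accelerometer with a small uncorrected bias gives a position error that grows with the square of time.

import numpy as np

fs = 1000.0                      # headset IMU rate, 1000 Hz
dt = 1.0 / fs
n = int(5 * fs)                  # 5 seconds

bias = 0.02                      # 0.02 m/s^2 of unmodelled accelerometer bias
accel = bias + 0.05 * np.random.randn(n)   # true linear acceleration is zero

vel = np.cumsum(accel) * dt      # first integration: velocity
pos = np.cumsum(vel) * dt        # second integration: position

for s in (1, 2, 5):
    print(f"after {s}s the position error is about {pos[int(s * fs) - 1]:.2f} m")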

That’s about the point where the position observations from the cameras come into play. You can use those snapshots of the device position to determine the real direction that the devices are facing, and to correct for any errors in the tracked position and device orientation from the IMU integration – by teasing out the bias errors and gravity offset.

The current code uses some simple hacks to do the positional tracking – using the existing OpenHMD 3DOF complementary filter to compute the orientation, and some hacks to update the position when a camera finds the pose of a device.

The simple hacks work surprisingly well when devices don’t move too fast. The reason is (as previously discussed) that the video analysis takes a variable amount of time – if we can predict where a device is with a high accuracy and maintain “tracking lock”, then the video analysis is fast and runs in a few milliseconds. If tracking lock is lost, then a full search is needed to recover the tracking, and that can take hundreds of milliseconds to complete… by which time the device has likely moved a long way and requires another full pose search, which takes hundreds of milliseconds..

So, the goal of my current development is to write a single unified fusion filter that combines IMU and camera observations to better track and predict the motion of devices between camera frames. Better motion prediction means hitting the ‘fast analysis’ path more often, which leads to more frequent corrections of the unknowns in the IMU data, and (circularly) better motion predictions.

To do that, I am working on an Unscented Kalman Filter that tracks the position, velocity, acceleration, orientation and IMU accelerometer and gyroscope biases – with promising initial results.
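
For readers who haven't met Kalman filters, the sketch below is a toy, one-axis, linear constant-acceleration predict step - far simpler than the Unscented filter being built here, but it shows the position/velocity/acceleration part of the state layout and how uncertainty grows while the filter runs on IMU data alone between camera observations.

import numpy as np

def predict(x, P, dt, accel_var=1.0):
    # constant-acceleration state transition for [position, velocity, acceleration]
    F = np.array([[1, dt, 0.5 * dt * dt],
                  [0,  1, dt],
                  [0,  0, 1]])
    G = np.array([[0.5 * dt * dt], [dt], [1.0]])
    Q = accel_var * (G @ G.T)                 # process noise: acceleration may change
    return F @ x, F @ P @ F.T + Q

x = np.zeros((3, 1))                          # position, velocity, acceleration
P = np.eye(3) * 0.01                          # initial uncertainty
for _ in range(52):                           # e.g. 52 ms of IMU-only prediction
    x, P = predict(x, P, dt=0.001)
print("position variance after the predicts:", P[0, 0])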

Graph of position error (m) between predicted position and position from camera observations
Graph of orientation error (degrees) between predicted orientation and camera observed pose.

In the above graphs, the filter is predicting the position of the headset at each camera frame to within 1cm most of the time and the pose to within a few degrees, but with some significant spikes that still need fixing. The explanation for the spikes lies in the data sets that I’m testing against, and points to the next work that needs doing.

To develop the filter, I’ve modified OpenHMD to record traces as I move devices around. It saves out a JSON file for each device with a log of each IMU reading and each camera frame. The idea is to have a baseline data set that can be used to test each change in the filter – but there is a catch. The current data was captured using the upstream positional tracking code – complete with tracking losses and long analysis delays.

The spikes in the filter graph correspond with when the OpenHMD traces have big delays between when a camera frame was captured and when the analysis completes.

Delay (ms) between camera frame and analysis results.

What this means is that when the filter makes bad predictions, it’s because it’s trying to predict the position of the device at the time the sensor result became available, instead of when the camera frame was captured – hundreds of milliseconds earlier.

So, my next step is to integrate the Kalman filter code into OpenHMD itself, and hopefully capture a new set of motion data with fewer tracking losses to prove the filter’s accuracy more clearly.

Second – I need to extend the filter to compensate for that delay between when a camera frame is captured and when the results are available for use, by using an augmented state matrix and lagged covariances. More on that next time.

To finish up, here’s a taste of another challenge hidden in the data – variability in the arrival time of IMU updates. The IMU updates at 1000Hz – ideally we’d receive those IMU updates 1 per millisecond, but transfer across the USB and variability in scheduling on the host computer make that much noisier. Sometimes further apart, sometimes bunched together – and in one part there’s a 1.2 second gap.

IMU reading timing variability (nanoseconds)

Russell CokerMPV vs Mplayer

After writing my post about VDPAU in Debian [1] I received two great comments from anonymous people. One pointed out that I should be using VA-API (also known as VAAPI) on my Intel based Thinkpad and gave a reference to an Arch Linux Wiki page; as usual the Arch Linux Wiki is awesome and I learnt a lot of great stuff there. I also found the Debian Wiki page on Hardware Video Acceleration [2] which has some good information (unfortunately I had already found all that out through more difficult methods first, I should read the Debian Wiki more often).

It seems that mplayer doesn’t support VAAPI. The other comment suggested that I try the mpv fork of Mplayer which does support VAAPI but that feature is disabled by default in Debian.

I did a number of tests on playing different videos on my laptop running Debian/Buster with Intel video and my workstation running Debian/Unstable with ATI video. The first thing I noticed is that mpv was unable to use VAAPI on my laptop and that VDPAU won’t decode VP9 videos on my workstation and most 4K videos from YouTube seem to be VP9. So in most cases hardware decoding isn’t going to help me.

The Wikipedia page about Unified Video Decoder [3] shows that only VCN (Video Core Next) supports VP9 decoding while my R7-260x video card [4] has version 4.2 of the Unified Video Decoder which doesn’t support VP9, H.265, or JPEG. Basically I need a new high-end video card to get VP9 decoding and that’s not something I’m interested in buying now (I only recently bought this video card to do 4K at 60Hz).

The next thing I noticed is that for my combination of hardware and software at least mpv tends to take about 2/3 the CPU time to play videos that mplayer does on every video I tested. So it seems that using mpv will save me 1/3 of the power and heat from playing videos on my laptop and save me 1/3 of the CPU power on my workstation in the worst case while sometimes saving me significantly more than that.

Conclusion

To summarise quite a bit of time experimenting with video playing and testing things: I shouldn’t think too much about hardware decoding until VP9 hardware is available (years for me). But mpv provides some real benefits right now on the same hardware, I’m not sure why.

,

Russell CokerSMART and SSDs

The Hetzner server that hosts my blog among other things has 2*256G SSDs for the root filesystem. The smartctl and smartd programs report both SSDs as in FAILING_NOW state for the Wear_Leveling_Count attribute. I don’t have a lot of faith in SMART. I run it because it would be stupid not to consider data about possible drive problems, but don’t feel obliged to immediately replace disks with SMART errors when not convenient (if I ordered a new server and got those results I would demand replacement before going live).

Doing any sort of SMART scan will cause service outage. Replacing devices means 2 outages, 1 for each device.

I noticed the SMART errors 2 weeks ago, so I guess that the SMART claims that both of the drives are likely to fail within 24 hours have been disproved. The system is running BTRFS so I know there aren’t any unseen data corruption issues and it uses BTRFS RAID-1 so if one disk has an unreadable sector that won’t cause data loss.

Currently Hetzner Server Bidding has ridiculous offerings for systems with SSD storage. Search for a server with 16G of RAM and SSD storage and the minimum prices are only 2E cheaper than a new server with 64G of RAM and 2*512G NVMe. In the past Server Bidding has had servers with specs not much smaller than the newest systems going for rates well below the costs of the newer systems. The current Hetzner server is under a contract from Server Bidding which is significantly cheaper than the current Server Bidding offerings, so financially it wouldn’t be a good plan to replace the server now.

Monitoring

I have just released a new version of etbe-mon [1] which has a new monitor for SMART data (smartctl). It also has a change to the sslcert check to search all IPv6 and IPv4 addresses for each hostname, makes the freespace check look for the filesystem mountpoint, and makes the smtpswaks check use the latest swaks command line.

For the new smartctl check there is an option to treat “Marginal” alert status from smartctl as errors and there is an option to name attributes that will be treated as marginal even if smartctl thinks they are significant. So now I have my monitoring system checking the SMART data on the servers of mine which have real hard drives (not VMs) and aren’t using RAID hardware that obscures such things. Also it’s not alerting me about the Wear_Leveling_Count on that particular Hetzner server.
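
For the curious, here's a rough sketch in Python of that kind of check - parse smartctl -A output and report failed attributes, treating a configurable set (such as Wear_Leveling_Count) as merely marginal. This is not the etbe-mon implementation, just the shape of the idea:

import subprocess
import sys

TREAT_AS_MARGINAL = {"Wear_Leveling_Count"}

def check(device):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    errors, marginal = [], []
    for line in out.splitlines():
        fields = line.split()
        # attribute table rows start with a numeric ID and have a WHEN_FAILED column
        if len(fields) >= 10 and fields[0].isdigit():
            name, when_failed = fields[1], fields[8]
            if when_failed != "-":
                (marginal if name in TREAT_AS_MARGINAL else errors).append(name)
    return errors, marginal

errors, marginal = check(sys.argv[1] if len(sys.argv) > 1 else "/dev/sda")
print("errors:", errors, "marginal:", marginal)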

,

Russell CokerTesting VDPAU in Debian

VDPAU is the Video Decode and Presentation API for Unix [1]. I noticed an error with mplayer “Failed to open VDPAU backend libvdpau_i965.so: cannot open shared object file: No such file or directory“, Googling that turned up Debian Bug #869815 [2] which suggested installing the packages vdpau-va-driver and libvdpau-va-gl1 and setting the environment variable “VDPAU_DRIVER=va_gl” to enable VDPAU.

The command vdpauinfo from the vdpauinfo package shows the VDPAU capabilities, which showed that VDPAU was working with va_gl.

When mplayer was getting the error about a missing i915 driver it took 35.822s of user time and 1.929s of system time to play Self Control by Laura Branigan [3] (a good music video to watch several times while testing IMHO) on my Thinkpad Carbon X1 Gen1 with Intel video and a i7-3667U CPU. When I set “VDPAU_DRIVER=va_gl” mplayer took 50.875s of user time and 4.207s of system time but didn’t have the error.
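
If you want to repeat that sort of comparison without a stopwatch, a small script can collect the child CPU times. This is just a sketch - the video filename is a placeholder and the player invocation is kept minimal:

import os
import subprocess

def play(env_extra):
    env = dict(os.environ, **env_extra)
    before = os.times()
    subprocess.run(["mplayer", "-really-quiet", "video.mp4"], env=env, check=True)
    after = os.times()
    # os.times() accumulates CPU time of reaped child processes
    return (after.children_user - before.children_user,
            after.children_system - before.children_system)

for label, extra in [("default", {}), ("va_gl", {"VDPAU_DRIVER": "va_gl"})]:
    user, system = play(extra)
    print(f"{label}: user {user:.1f}s system {system:.1f}s")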

It’s possible that other applications on my Thinkpad might benefit from VDPAU with the va_gl driver, but it seems unlikely that any will benefit to such a degree that it makes up for mplayer taking more time. It’s also possible that the Self Control video I tested with was a worst case scenario, but even so taking almost 50% more CPU time made it unlikely that other videos would get a benefit.

For this sort of video (640×480 resolution) it’s not a problem, 38 seconds of CPU time to play a 5 minute video isn’t a real problem (although it would be nice to use less battery). For a 1600×900 resolution video (the resolution of the laptop screen) it took 131 seconds of user time to play a 433 second video. That’s still not going to be a problem when playing on mains power but will suck a bit when on battery. Most Thinkpads have Intel video and some have NVidia as well (which has issues from having 2 video cards and from having poor Linux driver support). So it seems that the only option for battery efficient video playing on the go right now is to use a tablet.

On the upside, screen resolution is not increasing at a comparable rate to Moore’s law so eventually CPUs will get powerful enough to do all this without using much electricity.

,

Tim SerongI Have No Idea How To Debug This

On my desktop system, I’m running XFCE on openSUSE Tumbleweed. When I leave my desk, I hit the “lock screen” button, the screen goes black, and the monitors go into standby. So far so good. When I come back and mash the keyboard, everything lights up again, the screens go white, and it says:

blank: Shows nothing but a black screen
Name: tserong@HOSTNAME
Password:
Enter password to unlock; select icon to lock

So I type my password, hit ENTER, and I’m back in action. So far so good again. Except… Several times recently, when I’ve come back and mashed the keyboard, the white overlay is gone. I can see all my open windows, my mail client, web browser, terminals, everything, but the screen is still locked. If I type my password and hit ENTER, it unlocks and I can interact again, but this is where it gets really weird. All the windows have moved down a bit on the screen. For example, a terminal that was previously neatly positioned towards the bottom of the screen is now partially off the screen. So “something” crashed – whatever overlay the lock thingy put there is gone? And somehow this affected the position of all my application windows? What in the name of all that is good and holy is going on here?

Update 2020-12-21: I’ve opened boo#1180241 to track this.

,

Adrian ChaddRepairing and bootstrapping an IBM 5170 PC/AT, part 2

Ok, so now it runs. But, what did it take to get here?

First up - I'm chasing down a replacement fusible PROM and I'll likely have to build a programmer for it. The programmer will need to run a bit at a time, which is very different to what the EPROM programmers available today support. It works for now, but I don't like it.

I've uploaded a dump of the PROM here - https://erikarn.github.io/pcat/notes.html .

Here's how the repair looks so far:



Next - getting files onto the device. Now, remember the hard disk is unstable, but even given that it's only DOS 5.0 which didn't really ship with any useful file transfer stuff. Everyone expected you'd have floppies available. But, no, I don't have DOS available on floppy media! And, amusingly, I don't have a second 1.2MB drive installed anywhere to transfer files.

I have some USB 3.5" drives that work, and I have a 3.5" drive and Gotek drive to install in the PC/AT. However, until yesterday I didn't have a suitable floppy cable - the 3.5" drive and Gotek USB floppy thingy both use IDC pin connectors, and this PC/AT uses 34 pin edge connectors. So, whatever I had to do, I had to do with what I had.

There are a few options available:

  • You can write files in DOS COMMAND.COM shell using COPY CON <file> - it either has to be all ascii, or you use ALT-<3 numbers> to write ALT CODES. For MS-DOS, this would just input that value into the keyboard buffer. For more information, Wikipedia has a nice write-up here: https://en.wikipedia.org/wiki/Alt_code .
  • You can use an ASCII only assembly as above: a popular one was TCOM.COM, which I kept here: https://erikarn.github.io/pcat/tcomtxt.asm
  • If you have MODE.COM, you could try setting up the serial port (COM1, COM2, etc) to a useful baud rate, turn on flow control, etc - and then COPY COM1 <file>. I didn't try this because I couldn't figure out how to enable hardware flow control, but now that I have it all (mostly) working I may give it a go.
  • If you have QBASIC, you can write some QBASIC!
I tried TCOM.COM, both at 300 and 2400 baud. Neither was reliable, and there's a reason for that - writing to the floppy is too slow! Far, far too slow! And, it wasn't enforcing hardware flow control, which was very problematic for reliable transfers.

So, I wrote some QBASIC. It's pretty easy to open a serial port and read/write to it, but it's not AS easy to have it work for binary file transfer. There are a few fun issues:

  • Remember, DOS (and Windows too, yay!) has a difference between files open for text reading/writing and files open for binary reading/writing.
  • QBASIC has sequential file access or random file access. For sequential, you use INPUT/PRINT, for random you use GET and PUT.
  • There's no byte type - you define it as a STRING type of a certain size.
  • This is an 8MHz 80286, and .. well, let's just say QBASIC isn't the fastest thing on the planet here.
I could do some basic IO fine, but I couldn't actually transfer and write out the file contents quickly and reliably. Even going from 1200 to 4800 and 9600 baud didn't increase the transfer rate! So even given an inner loop of reading/writing a single byte at a time with nothing else, it still can't keep up.

The other amusingly annoying thing is what to use on the remote side to send binary files. Now, you can use minicom and such on FreeBSD/Linux, but it doesn't have a "raw" transfer type - it has xmodem, ymodem, zmodem and ascii transfers. I wanted to transfer a ~ 50KB binary to let me do ZMODEM transfers, and .. well, this presents a bootstrapping problem.

After a LOT of trial and error, I ended up with the following:

  • I used tip on FreeBSD to talk to the serial port
  • I had to put "hf=true" into .tiprc to force hardware handshaking; it didn't seem to work when I set it after I started tip (~s to set a variable)
  • On the QBASIC side I had to open it up with hardware flow control to get reliable transfers;
  • And I had to use 128 byte records - not 1 byte records - to get decent transfer performance!
  • On tip to send the file I would ask it to fork 'dd' to do the transfer (using ~C) and asking it to pad to the 128 byte boundary:
    • dd if=file bs=128 conv=sync
The binary I chose (DSZ.COM) didn't mind the extra padding, it wasn't checksumming itself.
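
For comparison, here's roughly what the tip + dd sending recipe boils down to if you script it with Python and pyserial instead (an assumption - it's not what was used here). It pads the file out to 128 byte records just like conv=sync, and uses RTS/CTS flow control to match the QBASIC side:

import serial  # pyserial

RECORD = 128

with open("DSZ.COM", "rb") as f:
    data = f.read()
if len(data) % RECORD:
    data += b"\x00" * (RECORD - len(data) % RECORD)    # pad the final record

print("records to send:", len(data) // RECORD)         # matches size# on the QBASIC side

# 9600 8N1 with RTS/CTS hardware flow control, like the QBASIC OPEN "COM1:..." line
with serial.Serial("/dev/ttyUSB0", 9600, rtscts=True) as port:
    for i in range(0, len(data), RECORD):
        port.write(data[i:i + RECORD])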

Here's the hacky QBASIC program I hacked up to do the transfer:

OPEN "RB", #2, "MYFILE.TXT", 128

' Note: LEN = 128 is part of the OPEN line, not a separate line!
OPEN "COM1:9600,N,8,1,CD0,CS500,DS500,OP0,BIN,TB2048,RB32768" FOR RANDOM AS #1 LEN = 128

size# = 413 '413 * 128 byte transfer
DIM c AS STRING * 128 ' 128 byte record
FOR i = 1 TO size#
  GET #1, , c
  PUT #2, , c
NEXT i
CLOSE #2
CLOSE #1

Now, this is hackish, but specifically:
  • 9600 baud, 8N1, hardware flow control, 32K receive buffer.
  • 128 byte record size for both the file and UART transfers.
  • the DSZ.COM file size, padded to 128 bytes, was 413 blocks. So, 413 block transfers.
  • Don't forget to CLOSE the file once you've written, or DOS won't finalise the file and you'll end up with a 0 byte file.
This plus tip configured for 9600 and hardware flow control did the right thing. I then used DSZ to use ZMODEM to transfer a fresh copy of itself, and CAT.EXE (Alley Cat!)

Ok, so that bootstrapped enough of things to get a ZMODEM transfer binary onto a bootable floppy disc containing a half-baked DOS 5.0 installation. I can write software with QBASIC and I can transfer files on/off using ZMODEM.

Next up, getting XT-IDE going in this PC/AT and why it isn't ... well, complete.



,

Adrian ChaddRepairing and bootstrapping an IBM 5170 PC/AT, part 1

 I bought an IBM PC/AT 5170 a few years ago for a Hackerdojo project that didn't end up going anywhere.

So, I have a PC/AT with:

  • 8MHz 80286 (type 3 board)
  • 512K on board
  • 128K expansion board (with space for 512K extended RAM, 41256 RAM chip style)
  • ST4038 30MB MFM drive with some gunk on head or platter 3 (random head 3 read failures, sigh)
  • 1.2MB floppy drive
  • CGA card
  • Intel 8/16 ethernet card



Ok, so the bad disk was a pain in the ass. It's 2020, DOS on 1.2MB floppy disks isn't exactly the easiest thing to come across. But, it DOES occasionally boot.

But, first up - the BIOS battery replacement had leaked. Everywhere. So I replaced that, and had to type in a BASIC program into ROM BASIC to reprogram the BIOS NVRAM area with a default enough configuration to boot from floppy or hard disk.



Luckily someone had done that:

http://www.minuszerodegrees.net/5170/setup/5170_setup.htm

So, I got through that.

Then, I had to buy some double high density 5.25" discs. Ok, well, that took a bit, but they're still available as new old stock (no one's making floppy discs anymore, sigh.) I booted the hard disk and after enough attempts at it, it booted to the command prompt. At which point I promptly created a bootable system disc and copied as much of DOS 5.0 off of it as I could.




Then, since I am a child of the 80s and remember floppy discs, I promptly DISCCOPY'ed it to a second disc that I'm leaving as a backup.

And, for funsies, DOSSHELL.



Ok, so what's next?

I decided to buy an alternate BIOS - the Quadtel 286 image that's floating about - because quite frankly having to type in that BASIC program into ROM BASIC every time was a pain in the ass. So, in it went. Which was good, because...

Well, then it stopped working. It turns out that my clean-up of the battery leakage wasn't enough. The system booted with three short beeps and "0E" on the screen.

Now we get into deep, deep PC history.

Luckily, the Quadtel BIOS codes are available here:

http://www.bioscentral.com/postcodes/quadtelbios.htm

.. but with the Intel BIOS, it didn't beep, it didn't do anything. Just a black screen.

What gives?

So, starting with PC/AT and clone machines, the BIOS would write status updates during boot to a fixed IO port. Then if you have a diagnostic card that monitors that IO port, you'd get updates on where the system go to during boot before it hit a problem. These are called POST (power on self test) codes.

Here's a write-up of it and some POST cards:

http://www.minuszerodegrees.net/misc/post_cards.htm

Luckily, the Quadtel BIOS just spat it out on the screen for me. Phew.

So! 0xE says the 8254 interval timer wasn't working. I looked on the board and ... voila! It definitely had a lot of rusty looking crap on it. U115, a 32 byte PROM used for some address line decoding also was unhappy.

Here's how it looked before I had cleaned it up - this is circa July:




I had cleaned all of this out and used some vinegar on a Q-tip to neutralise the leaked battery gunk, but I didn't get underneath the ICs.

So, out they both came. I cleaned up the board, repaired some track damage and whacked in sockets.

Then in went the chips - same issue. Then I was sad.

Then! Into the boxes of ICs I went - where I found an 8254-2 that was spare! I lifted it from a dead PC clone controller board a while ago. In IT went, and the PC/AT came alive again.

(At this point I'd like to note that I was super afraid that the motherboard was really dead, as repairing PC/AT motherboards is not something I really wanted to do. Well, it's done and it works.)

Rightio! So, the PC boots, CGA monitor and all, from floppy disc. Now comes the fun stuff - how do I bootstrap said PC/AT with software, given no software on physical media? Aha, that's in part 2.

,

Linux AustraliaCouncil Meeting Tuesday 15th December 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Benno Rice

Apologies 

None

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • From Anchor to council@ on 2 Dec 2020: lca2018.org domain renewal is due by 1 Mar 2021.
    • Still a few services that need to be moved off.
    • AI: Joel to deal with.
  • From ASIC to council@ on 8 Dec 2020: “Open Source Australia” business registration due by 24 Nov 2020. This was dealt with on the day; an acknowledgement of renewal was received.
    • (Already done)

3. Items for discussion

  • Rusty Wrench
    • AI: Julien to send out to previous winners, without a suggestion from council
  • Code of Conduct wording clarification (background sent to council@)
    • Also the github copy turns out to be out of date, PyCon desire this refresh to happen early enough to be in place for the next PyCon.
    • Change will be announced.
    • AI: Jonathan to write up an announcement for policies, linux-aus.
  • YouTube Partner Program
    • Setup more formal escalation paths for LCA (and other LA-affiliated events) in future
    • Register for YouTube Partner Programme to get additional support avenues.
    • Need to create an AdSense account for this
    • Do not need to enable monetisation now or future, but do need to decide whether existing videos should have monetisation enabled when we join the program.
    • Sae Ra moves motion that we join the partner program, but do not enable monetisation for existing videos.
      • Passed, one abstention.
    • AI: Joel to register, and update Ryan.

4. Items for noting

  • Election/AGM announcement sent.
    • Need some volunteers for AGM meeting wrangling.
    • AGM is set for: 11am-midday on Friday the 15th of January (AEDT)
      • 8am Perth
    • AI: Julien to send call for AGM items ASAP.
  • LCA update <details redacted>

5. Other business

  • None

6. In camera

  • No items were discussed in camera

Meeting closed at 2020

The post Council Meeting Tuesday 15th December 2020 – Minutes appeared first on Linux Australia.

Stewart SmithTwo Photos from Healseville Sanctuary

If you’re near Melbourne, you should go to Healseville Sanctuary and enjoy the Australian native animals. I’ve been a number of times over the years, and here’s a couple of photos from a (relatively, as in, the last couple of years) trip.

Leah trying to photograph a much too close bird
Koalas seem to always look like they’ve just woken up. I’m pretty convinced this one just had.

Stewart SmithPhotos from Adelaide

Some shots on Kodak Portra 400 from Adelaide. These would have been shot with my Nikon F80 35mm body, I think all with the 50mm lens. These are all pre-pandemic, and I haven’t gone and looked up when exactly. I’m just catching up on scanning some negatives.

,

Francois MarierList of Planet Linux Australia blogs

I've been following Planet Linux Australia for many years and discovered many interesting FOSS blogs through it. I was sad to see that it got shut down a few weeks ago and so I decided to manually add all of the feeds to my RSS reader to avoid missing posts from people I have been indirectly following for years.

Since all feeds have been removed from the site, I recovered the list of blogs available from an old copy of the site preserved by the Internet Archive.

Here is the resulting .opml file if you'd like to subscribe.

Changes

Once I had the full list, I removed all blogs that are gone, empty or broken (e.g. domain not resolving, returning a 404, various database or server errors).

I updated the URLs of a few blogs which had moved but hadn't updated their feeds on the planet. I also updated the name of a blogger who was still listed under a previous last name.

Finally, I removed LA-specific tags from feeds since these are unlikely to be used again.
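
That kind of clean-up pass can be scripted. Here's a small sketch that walks an OPML file and reports feeds that no longer fetch - the filename is hypothetical and this isn't the exact process used for the list above:

import urllib.request
import xml.etree.ElementTree as ET

tree = ET.parse("planet-linux-australia.opml")   # hypothetical filename
for outline in tree.iter("outline"):
    url = outline.get("xmlUrl")
    if not url:
        continue
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            if resp.status >= 400:
                print("broken:", url, resp.status)
    except Exception as exc:
        print("gone:", url, exc)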

Work-arounds

The following LiveJournal feeds didn't work in my RSS reader but opened fine in a browser:

However, since none of them have been updated in the last 7 years, I just left them out.

A couple appear to be impossible to fetch over Tor, presumably due to a Cloudflare setting:

Since only the last two have been updated in the last 9 years, I added these to Feedburner and added the following "proxied" URLs to my reader:

Similarly, I couldn't fetch the following over Tor for some other reasons:

I excluded the first two which haven't been updated in 6 years and proxied the other ones:

Maxim ZakharovSafari; wss; OSStatus -9829

You can google up several explanations for the error "WebSocket network error: The operation couldn't be completed (OSStatus error -9829)" that appears when you attempt to connect to a secure websocket using the Safari web browser on Mac OS X 11.0.1 (Big Sur). One of them points out the stricter requirements for trusted website certificates on macOS and iOS, which I didn't know about.

However, in my case the error was caused by a limitation in the Safari browser - it turns out it simply does not send the user certificate when performing a secure websocket connection (wss:// URL scheme).

Fortunately, the Firefox browser doesn't have this limitation and everything works fine when you use it. Though you will need to install all the CA certificates and the user certificate into Firefox's own store, as it turns out Firefox doesn't use the system one.
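
A quick way to confirm the server side is fine and the problem really is the browser is to open the same wss:// URL from a small script with the client certificate attached. Here's a sketch using Python's websockets library - the URL and certificate paths are placeholders:

import asyncio
import ssl

import websockets  # pip install websockets

async def main():
    ctx = ssl.create_default_context(cafile="ca.pem")
    ctx.load_cert_chain(certfile="user.crt", keyfile="user.key")
    async with websockets.connect("wss://example.com/socket", ssl=ctx) as ws:
        await ws.send("ping")
        print(await ws.recv())

asyncio.run(main())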

Maxim ZakharovAsynchronous Consensus Algorithm

,

Francois MarierOpting your domain out of programmatic advertising

A few years ago, the advertising industry introduced the ads.txt project in order to defend against widespread domain spoofing vulnerabilities in programmatic advertising.

I decided to use this technology to opt out of having ads sold for my domains, at least through ad exchanges which perform this check, by hosting a text file containing this:

contact=ads@fmarier.org

at the following locations:

(In order to get this to work on my blog, running Ikiwiki on Branchable, I had to disable the txt plugin in order to get ads.txt to be served as a plain text file instead of being automatically rendered as HTML.)

Specification

The key parts of the specification for our purposes are:

[3.1] If the server response indicates the resource does not exist (HTTP Status Code 404), the advertising system can assume no declarations exist and that no advertising system is unauthorized to buy and sell ads on the website.

[3.2.1] Some publishers may choose to not authorize any advertising system by publishing an empty ads.txt file, indicating that no advertising system is authorized to buy and sell ads on the website. So that consuming systems properly read and interpret the empty file (differentiating between web servers returning error pages for the /ads.txt URL), at least one properly formatted line must be included which adheres to the format specification described above.

As you can see, the specification sadly ignores RFC8615 and requires that the ads.txt file be present directly in the root of your web server, like the venerable robots.txt file, but unlike the newer security.txt standard.

If you don't want to provide an email address in your ads.txt file, the specification recommends using the following line verbatim:

placeholder.example.com, placeholder, DIRECT, placeholder
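
It's also easy to check what a domain actually serves for /ads.txt yourself. Here's a small Python sketch (the domain is a placeholder) that distinguishes a 404 from a file with at least one non-comment line:

import urllib.error
import urllib.request

def check_ads_txt(domain):
    try:
        with urllib.request.urlopen(f"https://{domain}/ads.txt") as resp:
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as e:
        # Per section 3.1, a 404 means no declarations exist
        print(f"{domain}: HTTP {e.code}")
        return
    lines = [l.strip() for l in body.splitlines()
             if l.strip() and not l.strip().startswith("#")]
    print(f"{domain}: {len(lines)} non-comment line(s) in ads.txt")

check_ads_txt("example.com")    # placeholder domain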

Validation

A number of online validators exist, but I used the following to double-check my setup:

,

Tim RileyOpen source status update, November 2020

Hello again, dear OSS enthusiasts. November was quite a fun month for me. Not only did I merge all the PRs I outlined in October’s status update, I also got to begin work on an area I’d been dreaming about for months: integrating Hanami/dry-system with Zeitwerk!

Added an autoloading loader to dry-system

Zeitwerk is a configurable autoloader for Ruby applications and gems. The “auto” in autoloader means that, once configured, you should never have to manually require before referring to the classes defined in the directories managed by Zeitwerk.

dry-system, on the other hand, was requiring literally every file it encountered, by design! The challenge here was to allow it to work with or without an auto-loader, making either mode a configurable option, ideally without major disruption to the library.

Fortunately, many of the core Dry::System::Container behaviours are already separated into individually configurable components, and in the end, all we needed was a new Loader subclass implementing a 2-line method:

module Dry
  module System
    class Loader
      # Component loader for autoloading-enabled applications
      #
      # This behaves like the default loader, except instead of requiring the given path,
      # it loads the respective constant, allowing the autoloader to load the
      # corresponding file per its own configuration.
      #
      # @see Loader
      # @api public
      class Autoloading < Loader
        def require!
          constant
          self
        end
      end
    end
  end
end

This can be enabled for your container like so:

require "dry/system/loader/autoloading"

class MyContainer < Dry::System::Container
  configure do |config|
    config.loader = Dry::System::Loader::Autoloading
    # ...
  end
end

Truth is, it did take a fair bit of doing to arrive at this simple outcome. Check out the pull request for more detail. The biggest underlying change was moving the responsibility for requiring files out of Container itself and into the Loader (which is called via each Component in the container). While I was in there, I took the chance to tweak a few other things too:

  • Clarified the Container.load_paths! method by renaming it to add_to_load_path! (since it is modifying Ruby’s $LOAD_PATH)
  • Stopped automatically adding the system_dir to the load path, since with Zeitwerk support, it’s now reasonable to run dry-system without any of its managed directories being on the load path
  • Added a new component_dirs setting, defaulting to ["lib"], which is used to verify whether a given component is “local” to the container. This check was previously done using the directories passed to load_paths!, which we can’t rely upon now that we’re supporting autoloaders
  • Added a new add_component_dirs_to_load_path setting, defaulting to true, which will automatically add the configured component_dirs to the load path in an after-configure hook. This will help ease the transition from the previous behaviour, and make dry-system still work nicely when not using an autoloader

With all of this in place, a full working example with Zeitwerk looks like this. First, the container:

require "dry/system/container"
require "dry/system/loader/autoloading"

module Test
  class Container < Dry::System::Container
    config.root = Pathname(__dir__).join("..").realpath
    config.add_component_dirs_to_load_path = false
    config.loader = Dry::System::Loader::Autoloading
    config.default_namespace = "test"
  end
end

Then Zeitwerk setup:

loader = Zeitwerk::Loader.new
loader.push_dir Test::Container.config.root.join("lib").realpath
loader.setup

Then, given a component “foo_builder”, at lib/test/foo_builder.rb:

module Test
  class FooBuilder
    def call
      # We can now reference this constant without a require!
      Entities::Foo.new
    end
  end
end

With this in place, we can resolve Test::Container["foo_builder"], receive an instance of Test::FooBuilder as expected, then .call it to receive our Test::Entities::Foo instance. Tada!

I’m very happy with how all this came together.

Next steps with dry-system

Apart from cracking the Zeitwerk nut, this project also gave me the chance to dive into the guts of dry-system after quite a while. There’s quite a bit of tidying up I’d still like to do, which is my plan for the next month or so. I plan to:

  • Make it possible to configure all aspects of each component_dir via a single block passed to the container’s config
  • Remove the default_namespace top-level container setting (since this will now be configured per-component_dir)
  • Remove the .auto_register! method, since our component-loading behaviour requires component dirs to be configured, and this method bypasses that step (until now, it’s only really worked by happenstance)
  • Make Zeitwerk usable without additional config by providing a plugin that can be activated by a simple use :zeitwerk

Once these are done, I’ll hop up into the Hanami framework layer and get to work on passing the necessary configuration through to its own dry-system container so that it can also work with Zeitwerk out of the box.

Hanami core team meeting

This month I also had the (rare!) pleasure of catching up with Luca and Piotr in person to discuss our next steps for Hanami 2 development. Read my notes to learn more. If you’re at all interested in Hanami development (and if you’ve reached this point in my 9th straight monthly update, I assume you are), then this is well worth a read!

Of particular relevance to the topics above, we’ve decided to defer the next Hanami 2 alpha release until the Zeitwerk integration is in place. This will ensure we have a smooth transition across releases in terms of code loading behaviour (if we released sooner, we’d need to document a particular set of rules for alpha2 and then throw half of those out the window for alpha3, which would be just too disruptive).

Thank you to my sponsors!

After all this time, I’m still so appreciative of my tiny band of GitHub sponsors. This stuff is hard work, so I’d really appreciate your support.

See you all again next month, by which point we’ll all have a Ruby 3.0 release!

,

Simon LyallAudiobooks – November 2020

The Geography of Nowhere: The Rise and Decline of America’s Man-made Landscape by James Howard Kunstler

A classic in urban planning, covering the downside of post-war American urban design. It dates from 1993 but is still 90% relevant. 3/5

A Year in Paris: Season by Season in the City of Light
by John Baxter

A series of short chapters arranged in seasonal sections on Paris, People, the Author’s life and the French Revolutionary Calendar. Plenty of Interest. 3/5

These Happy Golden Years: Little House Series, Book 8 by Laura Ingalls Wilder

Covering Laura’s short time as a schoolteacher (aged 15!) and her courting with husband-to-be Almanzo. Most action is in the first half of the book though. 3/5

Pure Invention: How Japan’s Pop Culture Conquered the World by Matt Alt

In-depth chapters on things like the Walkman, Game Boy and Hello Kitty trace Japan’s rise, first in hardware and then in cultural influence. Excellent story. 4/5

On All Fronts: The Education of a Journalist by Clarissa Ward

A conflict-reporter memoir of her life and career. Based mainly in Moscow, Baghdad, and Beirut, she goes into particular detail about her missions into Syria during its civil war. 3/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

David RoweOpen IP over VHF/UHF 4

For the last few weeks I’ve been building up some automated test software for my fledgling IP over radio system.

Long term automated tests can help you thrash out a lot of issues. Part of my cautious approach in taking small steps to build a complex system. I’ve built up a frame repeater system where one terminal “pings” another terminal – and repeats this thousands of times. I enjoyed writing service scripts [3] to wrap up the complex command lines, and bring the system “up” and “down” cleanly. The services also provide some useful debug options like local loopback testing of the radio hardware on each terminal.

I started testing the system “over the bench” with two terminals pinging and ponging frames back and forth via cables. After a few hours I hit a bug where the RpiTx RF would stop. A few repeats showed this sometimes happened in a few minutes, and other times after a few hours.

This led to an interesting bug hunt. I quite enjoy this sort of thing, peeling off the layers of a complex system, getting closer and closer to the actual problem. It was fun to learn about the RpiTx [2] internals. A very clever system of a circular DMA buffer feeding PLL fractional divider values to the PLLC registers on the Pi. The software application chases that DMA read pointer around, trying to keep the buffer full.

By dumping the clock tree I eventually worked out some other process was messing with the PLLC register. Evariste on the RpiTx forum then suggested I try “force_turbo=1” [4]. That fixed it! My theory is the CPU freq driver (wherever that lives) was scaling all the PLLs when the CPU shifted clock speed. To avoid being caught again I added some logic to check PLLC and bomb out if it appears to have been changed.

A few other interesting things I noticed:

  1. I’m running 10 kbit/s for these tests, with a 10kHz shift between the two FSK tones and a carrier frequency of 144.5MHz. I use a FT-817 SSB Rx to monitor the transmission, which has a bandwidth of around 3 kHz. A lot of the time a FSK burst sounds like broadband noise, as the FT-817 is just hearing a part of the FSK spectrum. However if you tune to the high or low tone frequency (just under 144.500 or 144.510) you can hear the FSK tones. Nice audio illustration of FSK in action (see the short sketch after this list).
  2. On start up RPiTx uses ntp to calibrate the frequency, which leads to slight shifts in the frequency each time it starts. Enough to be heard by the human ear, although I haven’t measured them.
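
Here's a small numpy sketch of point 1 above: baseband 2FSK at 10 kbit/s with a 10 kHz shift concentrates its energy around the two tone frequencies, which is why a 3 kHz wide SSB receiver only hears distinct tones when tuned near one of them. It's purely a numerical illustration - no RpiTx or RTLSDR involved:

import numpy as np

fs = 80000                     # simulation sample rate
rb = 10000                     # 10 kbit/s
shift = 10000                  # 10 kHz tone spacing
bits = np.random.randint(0, 2, 2000)
symbols = np.repeat(bits, fs // rb)            # rectangular symbol pulses
phase = 2 * np.pi * np.cumsum(symbols * shift) / fs
tx = np.cos(phase)                             # continuous-phase 2FSK, tones at 0 and +10 kHz

spectrum = 20 * np.log10(np.abs(np.fft.rfft(tx)) + 1e-9)
freqs = np.fft.rfftfreq(len(tx), 1 / fs)
for tone in (0, shift):
    band = (freqs > tone - 1500) & (freqs < tone + 1500)
    print(f"peak energy within 1.5 kHz of {tone} Hz: {spectrum[band].max():.0f} dB")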

I’ve just finished a 24 hour test where the system sent 8600 bursts (about 6 Mbyte in each direction) over the link, and everything is running nicely (100% of packets were received). This gives me a lot of confidence in the system. I’d rather know if there are any stability issues now than when the device under test is deployed remotely.

I feel quite happy with that result – there’s quite a lot of signal processing software and hardware that must be playing nicely together to make that happen. Very satisfying.

Next Steps

Now it’s time to put the Pi in a box, connect a real antenna and try some over the air tests. My plan is:

  1. Set up the two terminals several km apart, and see if we can get a viable link at 10 kbit/s, although even 1 kbit/s would be fine for initial tests. Enough margin for 100 kbit/s would be even better, but happy to work on that milestone later.
  2. I’m anticipating some fine tuning of the FSK_LDPC waveforms will be required.
  3. I’m also anticipating problems with urban EMI, which will raise the noise floor and set the SNR of the link. I’ve instrumented the system to measure the noise power at both ends of the link, so I can measure this over time. I can also measure received signal power, and estimate path loss. Knowing the gain of the RTLSDR, we can measure signal power in dBm, and estimate noise power in dBm/Hz.
  4. There might be some EMI from the Pi, lets see what happens when the antenna is close.
  5. I’ll run the frame repeater system over several weeks, debug any stability issues, and collect data on S, N, SNR, and Packet Error Rate.

Reading Further

[1] Open IP over VHF/UHF Part 1 Part 2 Part 3
[2] RpiTx – Radio transmitter software for Raspberry Pis
[3] GitHub repo for this project with build scripts, a project plan and a bunch of command lines I use to run various tests. The latest work in progress will be an open pull request.
[4] RpiTx Group discussion of the PLLC bug discussed above

,

Paul WiseFLOSS Activities November 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian wiki: disable attachments due to security issue, approve accounts

Communication

  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors

The visdom, apt-listchanges work and lintian-brush bug report were sponsored by my employer. All other work was done on a volunteer basis.

,

Linux AustraliaCouncil Meeting Tuesday 1st December 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Benno Rice

Apologies

None

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Event Review

Drupal

Admin Team

Pycon

LCA 2020

LCA 2021

LCA 2022

3. Log of correspondence

  • 23 Nov 2020: Mailchimp policy update notification sent to council@.
    • N/A as the account has been closed
    • Data export saved into our Google Drive
  • From: Xero; Date: Mon 30 Nov 2020, Subject: Xero pricing changes, Summary: Price is going up by $2/mo.

4. Items for discussion

  • AGM timing
    • We’ll continue as planned, Julien & Sae Ra to ensure the announcements go out.
  • Do we have a meeting on the 29th of December, if not 12th Jan is normal (just before AGM)
    • No to the end of December, yes to the January
  • LCA YouTube account (Joel)
    • Setup more formal escalation paths for LCA (and other LA-affiliated events) in future
    • Register for YouTube Partner Programme to get additional support avenues.
    • Need to create an AdSense account for this
    • Do not need to enable monetisation now or in future, but do need to decide whether existing videos should have monetisation enabled when we join the program.
    • AI: Sae Ra will move this on list so people have time to review

5. Items for noting

  • Rusty wrench nom period ongoing
    • 2 proper nominations received
    • 1 resubmission of a nomination from last year requested
    • 1 apparently coming
  • Jonathan reached out to <a member> re Code of Conduct concerns, haven’t gotten a response
  • Grant application for Software Freedom Day, no response from them, so grant has lapsed.
  • <our contact on the CovidSafe analysis team> has yet to provide information about the FOI costs associated with his group’s work on the COVIDsafe app.

6. Other business 

  • None

7. In camera

  • No items were discussed in camera

2038 AEDT close

The post Council Meeting Tuesday 1st December 2020 – Minutes appeared first on Linux Australia.

,

Stewart SmithWhy you should use `nproc` and not grep /proc/cpuinfo

There’s something really quite subtle about how the nproc utility from GNU coreutils works. If you look at the man page, it’s even the very first sentence:

Print the number of processing units available to the current process, which may be less than the number of online processors.

So, what does that actually mean? Well, just because the computer some code is running on has a certain number of CPUs (and here I mean “number of hardware threads”) doesn’t necessarily mean that you can spawn a process that uses that many. What’s a simple example? Containers! Did you know that when you invoke docker to run a container, you can easily limit how much CPU the container can use? In this case, we’re looking at the --cpuset-cpus parameter, as the --cpus one works differently.

$ nproc
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# nproc
2
bash-4.2# exit

$ docker run --cpuset-cpus=0-2 --rm=true -it  amazonlinux:2
bash-4.2# nproc
3

As you can see, nproc here gets the right bit of information, so if you’re wanting to do a calculation such as “Please use up to the maximum available CPUs” as a parameter to the configuration of a piece of software (such as how many threads to run), you get the right number.

But what if you use some of the other common methods?

$ /usr/bin/lscpu -p | grep -c "^[0-9]"
8
$ grep -c 'processor' /proc/cpuinfo 
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# yum install -y /usr/bin/lscpu
......
bash-4.2# /usr/bin/lscpu -p | grep -c "^[0-9]"
8
bash-4.2# grep -c 'processor' /proc/cpuinfo 
8
bash-4.2# nproc
2

In this case, if you base your number of threads off grepping lscpu you take another dependency (on the util-linux package), which isn’t needed. You also get the wrong answer, as you do by grepping /proc/cpuinfo. So, what this will end up doing is just increase the number of context switches, possibly also degrading performance. It’s not just in docker containers where this could be an issue, of course: you can use the same mechanism that docker uses anywhere you want to control the resources of a process.
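
You don’t need a container to see this effect, by the way; nproc respects the CPU affinity of the current process, so restricting affinity with taskset shows the same thing (output from an 8 CPU machine, yours will differ):

$ nproc
8
$ taskset -c 0-1 nproc
2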

Another subtle thing to watch out for is differences in /proc/cpuinfo content depending on CPU architecture. You may not think it’s an issue today, but who wants to needlessly debug something?

tl;dr: for determining “how many processes to run”: use nproc, don’t grep lscpu or /proc/cpuinfo

,

Francois MarierRemoving a corrupted data pack in a Restic backup

I recently ran into a corrupted data pack in a Restic backup on my GnuBee. It led to consistent failures during the prune operation:

incomplete pack file (will be removed): b45afb51749c0778de6a54942d62d361acf87b513c02c27fd2d32b730e174f2e
incomplete pack file (will be removed): c71452fa91413b49ea67e228c1afdc8d9343164d3c989ab48f3dd868641db113
incomplete pack file (will be removed): 10bf128be565a5dc4a46fc2fc5c18b12ed2e77899e7043b28ce6604e575d1463
incomplete pack file (will be removed): df282c9e64b225c2664dc6d89d1859af94f35936e87e5941cee99b8fbefd7620
incomplete pack file (will be removed): 1de20e74aac7ac239489e6767ec29822ffe52e1f2d7f61c3ec86e64e31984919
hash does not match id: want 8fac6efe99f2a103b0c9c57293a245f25aeac4146d0e07c2ab540d91f23d3bb5, got 2818331716e8a5dd64a610d1a4f85c970fd8ae92f891d64625beaaa6072e1b84
github.com/restic/restic/internal/repository.Repack
        github.com/restic/restic/internal/repository/repack.go:37
main.pruneRepository
        github.com/restic/restic/cmd/restic/cmd_prune.go:242
main.runPrune
        github.com/restic/restic/cmd/restic/cmd_prune.go:62
main.glob..func19
        github.com/restic/restic/cmd/restic/cmd_prune.go:27
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra/command.go:838
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra/command.go:943
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra/command.go:883
main.main
        github.com/restic/restic/cmd/restic/main.go:86
runtime.main
        runtime/proc.go:204
runtime.goexit
        runtime/asm_amd64.s:1374

Thanks to the excellent support forum, I was able to resolve this issue by dropping a single snapshot.

First, I identified the snapshot which contained the offending pack:

$ restic -r sftp:hostname.local: find --pack 8fac6efe99f2a103b0c9c57293a245f25aeac4146d0e07c2ab540d91f23d3bb5
repository b0b0516c opened successfully, password is correct
Found blob 2beffa460d4e8ca4ee6bf56df279d1a858824f5cf6edc41a394499510aa5af9e
 ... in file /home/francois/.local/share/akregator/Archive/http___udd.debian.org_dmd_feed_
     (tree 602b373abedca01f0b007fea17aa5ad2c8f4d11f1786dd06574068bf41e32020)
 ... in snapshot 5535dc9d (2020-06-30 08:34:41)

Then, I could simply drop that snapshot:

$ restic -r sftp:hostname.local: forget 5535dc9d
repository b0b0516c opened successfully, password is correct
[0:00] 100.00%  1 / 1 files deleted

and run the prune command to remove the snapshot, as well as the incomplete packs that were also mentioned in the above output but could never be removed due to the other error:

$ restic -r sftp:hostname.local: prune
repository b0b0516c opened successfully, password is correct
counting files in repo
building new index for repo
[20:11] 100.00%  77439 / 77439 packs
incomplete pack file (will be removed): b45afb51749c0778de6a54942d62d361acf87b513c02c27fd2d32b730e174f2e
incomplete pack file (will be removed): c71452fa91413b49ea67e228c1afdc8d9343164d3c989ab48f3dd868641db113
incomplete pack file (will be removed): 10bf128be565a5dc4a46fc2fc5c18b12ed2e77899e7043b28ce6604e575d1463
incomplete pack file (will be removed): df282c9e64b225c2664dc6d89d1859af94f35936e87e5941cee99b8fbefd7620
incomplete pack file (will be removed): 1de20e74aac7ac239489e6767ec29822ffe52e1f2d7f61c3ec86e64e31984919
repository contains 77434 packs (2384522 blobs) with 367.648 GiB
processed 2384522 blobs: 1165510 duplicate blobs, 47.331 GiB duplicate
load all snapshots
find data that is still in use for 15 snapshots
[1:11] 100.00%  15 / 15 snapshots
found 1006062 of 2384522 data blobs still in use, removing 1378460 blobs
will remove 5 invalid files
will delete 13728 packs and rewrite 15140 packs, this frees 142.285 GiB
[4:58:20] 100.00%  15140 / 15140 packs rewritten
counting files in repo
[18:58] 100.00%  50164 / 50164 packs
finding old index files
saved new indexes as [340cb68f 91ff77ef ee21a086 3e5fa853 084b5d4b 3b8d5b7a d5c385b4 5eff0be3 2cebb212 5e0d9244 29a36849 8251dcee 85db6fa2 29ed23f6 fb306aba 6ee289eb 0a74829d]
remove 190 old index files
[0:00] 100.00%  190 / 190 files deleted
remove 28868 old packs
[1:23] 100.00%  28868 / 28868 files deleted
done

Michael Stillpngtools, code that can nearly drink in the US

I was recently contacted about availability problems with the code for pngtools. Frankly, I’m mildly surprised anyone still uses this code, but I am happy for them to do so. I have resurrected the code, placed it on github, and included the note below on all relevant posts:

A historical note from November 2020: this code is quite old, but still actively used. I have therefore converted the old subversion repository to git and it is hosted at https://github.com/mikalstill/pngtools. I will monitor there for issues and patches and try my best to remember what I was thinking 20 years ago…

,

Stewart SmithPhotos from Tasmania (2017)

On the random old photos train, there’s some from spending time in Tasmania post linux.conf.au 2017 in Hobart.

All of these are Kodak E100VS film, which was no doubt a bit out of date by the time I shot it (and when they stopped making Ektachrome for a while). It was a nice surprise to be reminded of a truly wonderful Tassie trip, taken with friends, and after the excellent linux.conf.au.

Linux AustraliaCouncil Meeting Tuesday 17th November 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Benno Rice

Joel Addison

Apologies 

None

 

Meeting opened at 1932 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • None

3. Items for discussion

  • Rusty Wrench timing.
    • Call for nominations draft looks good, (AI) Julien to send ASAP
  • Vala Tech Camp diversity scholarship
    • See mail to council@ 2020-11-09 from Sae Ra, “[LACTTE] For Next Meeting – VALA Tech Camp Diversity Scholarship”
    • Motion by Julien for Linux Australia to sponsor VALA Tech Camp A$2,750, seconded by Jonathan.
    • Passed, one abstention.

4. Items for noting

  • LCA21 <details redacted>

5. Other business

  • Quick discussion re covidsafe research and freedom of information requests, nothing for now, possibly next time
  • Moving the returning officer video into a doc would be nice, but no short term volunteers. Jonathan may look into it early 2021.

6. In camera

  • No items were discussed in camera

 

Meeting closed at 2006

The post Council Meeting Tuesday 17th November 2020 – Minutes appeared first on Linux Australia.

,

David RoweOpen IP over VHF/UHF 3

The goal of this project is to develop a “100 kbit/s IP link” for VHF/UHF using just a Pi and RTLSDR hardware, and open source signal processing software [1]. Since the last post, I’ve integrated a bunch of components and now have a half duplex radio data system running over the bench.

Recent progress:

  1. The Tx and Rx signal processing is now operating happily together on a Pi, CPU load is fine.
  2. The FSK_LDPC modem and FEC [2] has been integrated, so we can now send and receive coded frames. The Tx and Rx command line programs have been modified to send and receive bursts of frames.
  3. I’ve added a PIN diode Transmit/Receive switch, which I developed for the SM2000 project [3]. This is controlled by a GPIO from the Pi. There is also logic to start and stop the Pi Tx carrier at the beginning and end of bursts – so it doesn’t interfere with the Rx side.
  4. I’ve written a “frame repeater” application that takes packets received from the Rx and re-transmits them using the Tx. This will let me run “ping” tests over the air. A neat feature is it injects the received Signal and Noise power into the frame it re-transmits. This will let me measure the received power, the noise floor, and SNR at the remote station.
  5. The receiver in each terminal is very sensitive, and inconveniently picks up frames transmitted by that terminal. After trying a few approaches I settled on a “source filtering” design. When a packet is transmitted, the Tx places a “source byte” in the frame that is unique to that terminal. A one byte MAC address I guess. The local receiver then ignores (filters) any packets with that source address, and only outputs frames from other terminals. (There’s a small sketch of this idea just after this list.)
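
Here’s a minimal sketch of the source filtering idea (Python, with invented framing – the real code in the project repo [4] differs):

MY_SOURCE_BYTE = 0x42          # one byte "MAC address", unique per terminal (made-up value)

def tx_frame(payload: bytes) -> bytes:
    # Prepend this terminal's source byte before modulation
    return bytes([MY_SOURCE_BYTE]) + payload

def rx_filter(frame: bytes):
    # Drop frames we transmitted ourselves; pass everything else up the stack
    if not frame:
        return None
    source, payload = frame[0], frame[1:]
    if source == MY_SOURCE_BYTE:
        return None
    return payload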

Here is a block diagram of the Pi based terminal, showing hardware and software components:

When I build my next terminal, I will try separate Tx and Rx antennas, as a “minimalist” alternative to the TR switch. The next figure shows the transmit control signals in action. Either side of a burst we need to switch the TR switch and turn the Tx carrier on and off:

Here’s the current half duplex setup on the bench:

Terminal2, on the left, comprises the Pi, RTLSDR, and TR switch. Terminal1 (right) is the HackRF/RTLSDR connected to my laptop. Instead of a TR switch I’m using a hybrid combiner (a 3dB loss, but not an issue for these tests). This also shows how different SDR Tx/Rx hardware can be used with this system.

I’m using 10,000 bit/s for the current work, although that’s software configurable. When I start testing over the air I’ll include options for a range of bit rates, eventually shooting for 100 kbit/s.

Here’s a demo video of the system:

Next Steps

The command lines to run everything are getting unwieldy so I’ll encapsulate them in some “service” scripts to start and stop the system neatly. Then box everything up, try a local RF link, and check for stability over a few days. Once I’m happy I will deploy a terminal and start working through the real world issues. The key to getting complex systems going is taking tiny steps. Test and debug carefully at each step.

It’s coming together quite nicely, and I’m enjoying a few hours of work on the project every weekend. It’s very satisfying to build the layers up one by one, and a pleasant surprise when the pieces start playing nicely together and packets move magically across the system. I’m getting to play with RF, radios, modems, packets, and even building up small parts of a protocol. Good fun!

Reading Further

[1] Open IP over UHF/VHF Part 1 and Part 2.
[2] FSK LDPC Data Mode – open source data mode using a FSK modem and powerful LDPC codes.
[3] SM2000 Part 3 – PIN TR Switch and VHF PA
[4] GitHub repo for this project with build scripts, a project plan and a bunch of command lines I use to run various tests. The latest work in progress will be an open pull request.

Stewart SmithPhotos from Melbourne

I recently got around to scanning some film that took an awful long time to make its way back to me after being developed. There’s some pictures from home.

The rest of this roll of 35mm Fuji Velvia 50 is from Tasmania, which would place this all around December 2016.

Stewart SmithPhotos from long ago….

It’s strange to get unexpected photos from a while ago. It’s also joyous.

These photos above are from a park down the street from where we used to live. I believe it was originally a quarry, and a number of years ago the community got together and turned it into a park. It’s a quite decent size (Parkrun is held there), and there’s plenty of birds (and ducks!) to see.

Moorabbin Station

It’s a very strange feeling seeing photos from both the before time, and from where I used to live. I’m sure that if the world wasn’t the way it was now, and there wasn’t a pandemic, it would feel different.

All of the above were shot on a Nikon F80 with 35mm Fuji Velvia 50 film.

,

Glen TurnerBlocking a USB device

udev can be used to block a USB device (or even an entire class of devices, such as USB storage). Add a file /etc/udev/rules.d/99-local-blacklist.rules containing:

SUBSYSTEM=="usb", ATTRS{idVendor}=="0123", ATTRS{idProduct}=="4567", ATTR{authorized}="0"
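
The new rule is picked up the next time udev processes the device; something like the following (standard udevadm commands, shown here as a sketch) reloads the rules and re-triggers events without a reboot:

udevadm control --reload-rules
udevadm trigger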



,

Lev LafayetteeResearchAustralasia 2020

With annual conferences since 2007, eResearchAustralasia was hosted online this year due to the impacts of SARS-CoV-2. Typically conferences are held along the eastern seaboard of Australia, which does bring into question the "-asia" part of the suffix. Even the conference logo highlights Australia and New Zealand, to the exclusion of the rest of the world. I am not sure how eResearch NZ feels about this encroachment on their territory. To be fair, however, eResearchAustralasia did have some "-asian" content, primarily in the keynote address on data analytics for COVID-19 tracking in Indonesia and a presentation on data sharing, also with an Indonesian focus.

The conference had 582 attendees, up 119 from last year, and ran for five days from Monday, October 19 to Friday, October 23. Presentations and timetabling were quite varied, with a combination of single-session keynotes, three or four concurrent streams of oral presentations and "birds-of-a-feather" sessions, lightning talks, "solution showcases", a poster session (read, "download the PDF"), online exhibitions, and a rather poorly-thought-out "speed networking"; overall more than 120 presentations. The conference itself was conducted through a product called "OnAir", which from some accounts has a good event management suite (EventsAIR), but the user interface could certainly do with some significant improvement. One notable advantage of having an online conference with pre-recorded presentations is that the presenters could engage in a live Q&A and elaboration with attendees, which actually meant that more content could be derived.

It was, of course, impossible to attend all the events so any review is orientated to those sessions that could be visited, which is around fifty in my case. The conference has promised to make videos available at a later date on a public platform (e.g., their Youtube channel) as there was no means to download videos at the conference itself, although sometimes slidedecks were provided. Unsurprisingly, a good proportion of the sessions were orientated around the new work and research environments due to the pandemic (including a topic stream), ranging from data management services, DNA sequencing using Galaxy, and multiple sessions on moving training online.

Training, in fact, received its own topic stream which is good to see after many years of alerts on the growing gap between researcher skills, practices, and requirements. This became particularly evident in one presentation from Intersect, which highlighted the need for educational scaffolding. Another training feature of note came from the University of Otago who, with an interest in replication and reproducibility, reported on their training in containers. This provided for a very interesting comparison with the report on The Australian Research Container Orchestration Service (ARCOS) which is establishing an Australian Kubernetes Core Service to support the use and orchestration of containers.

It will be interesting to see how ARCOS interfaces not just with the Australian Research Data Commons (ARDC), but also with the newly announced Australian Research Environment (ARE), a partnership between Pawsey, AARnet, and NCI "to provide a streamlined, nationally integrated workspace connected by high speed links" in 2021. This is especially interesting in light of the presentation on The Future of Community Cloud and State Based Infrastructure Organisations and AARNet's presentation on Delivering sustainable Research Infrastructure, which emphasised the National Research Data Infrastructure and expansions of the CloudStor Active Research Data Storage and Analysis services.

Research Computing Services from the University of Melbourne and friends were well-represented at the Conference. Steve Manos, for example, gave the presentation on ARCOS. Bernard Meade was a speaker on the panel for Governance Models for Research Compute, whereas yours truly inflicted two presentations on a willing audience: one on Spartan: From Experimental Hybrid towards a Petascale Future, and another on contributing to the international HPC Certification Forum. I made sure I provided slidedecks and a transcript; I don't think anyone else did that. There was also one presentation on law and Natural Language Processing which garnered an additional mention of Spartan, albeit only to the extent that they said they hadn't gotten around to using the service yet! Also of note were the multiple presentations on Galaxy, which is, of course, prominent at Melbourne Bioinformatics.

This is, of course, only a taste of the presentations, both in terms of what was available at the conference and what your reviewer attended, but it does give some highlights of what were seen as significant contributions. Despite some situational difficulties in hosting the event, eResearchAustralasia have done quite well indeed in making the conference happen and deserve strong and sincere congratulations for that success and the impressive number of registrations. Whilst the conference made it clear that people are adapting to the current circumstances and eResearch has made enormous contributions to the scientific landscape on this matter, there are also clear indications of some national-level organisational initiatives. Whilst disruption and change are inevitable, especially in these circumstances, it is hoped that the scientific objectives remain the highest priority throughout.

Lev LafayetteThe Willsmere Cup

Like an increasing number of Australians I'm not going to partake in the Melbourne Cup "the race that stops a nation", at least not in a traditional manner. Such events cause trauma and death on the track, often enough, and the last race is always to the knackery. The horse racing industry is unnecessary and cruel.

I'll just leave this here for those who want to learn more:

https://www.abc.net.au/news/2019-10-18/slaughter-abuse-of-racehorses-und...

But as a game designer I thought to myself, why not run a simulation? So here it is; "The Willsmere Cup"

The Melbourne Cup was originally 2 miles (16 furlongs), so I've scaled it down to 32" because I'm using the nostalgic old measuring system here.

Each horse has a base move per thirty-second turn of 4", but with 2 FUDGE dice; I would reduce base move and increase FUDGE dice for poor weather, track quality, etc.

This means an average of 8 rolls (4 minutes) to complete the track, and, with a maximum move of 6" per turn, the race is winnable in as few as 6 rolls (3 minutes) at normal speeds. The actual record, again using the old length, is 3 minutes 19, or about 7 rolls.
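
For anyone who wants to run their own race, here's a minimal sketch of the turn mechanics in Python (the special horses below vary the base move or dice count as described; this just shows the core roll-and-move loop):

import random

TRACK_INCHES = 32

def fudge_dice(n):
    # Each FUDGE die comes up -1, 0 or +1
    return sum(random.choice((-1, 0, 1)) for _ in range(n))

def run_race(horses):
    # horses: dict of name -> (base_move, num_fudge_dice); returns finish order
    position = {name: 0 for name in horses}
    finished = []
    while len(finished) < len(horses):
        for name, (base, dice) in horses.items():
            if name in finished:
                continue
            position[name] += max(0, base + fudge_dice(dice))
            if position[name] >= TRACK_INCHES:
                finished.append(name)
    return finished

print(run_race({"Cupid": (4, 2), "Qilin": (4, 2), "Ziggy": (4, 3)}))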

The horses in this race are:

Qilin: From China and blessed with Taoist magics, that runs a sagacious and even-paced race (base move unchanged throughout).

Ziggy: The local favourite hails from a red dust plain. Some say it's from Mars, but really it was outback Australia. Has a bit of a sparkle to their personality which can lead to erratic running (3 FUDGE dice).

Cupid: The darling of the race, a real show pony. Completely average in all racing respects (base move unchanged), but who doesn't love the unicorn-pegasus of love?

Atargatis: An Assyrian breed, known for flying starts and flowing like water, but has trouble with endurance (base move of 5 for first 2 minutes, then base move 3 after that).

Twilight: The dark horse in the race with a brooding (one could say, 'Gothic') personality from the New England region of the United States. Known for starting slow (base move 3 for first 2 minutes), but really picks up speed after that (base move 5 after that). A real stalking horse.

Race at Willsmere at 11am. Unicorn-pegasus horsies provided by Erica Hoehn.

At the starting gate. A lovely day here for a miniature unicorn-pegasus race.

And they're off! Atargatis has taken an early lead (5"), then neck-and-neck for Qilin, Cupid, and Ziggy (4"), and with Twilight (3") bringing up the rear.

At the one-minute mark we see that Atargatis (5"+6"=11") has really opened up, and Twilight (3"+5"=8") has caught up and is now equal second with Qilin (4"+4"=8"). Cupid however is dragging (4"+2"=6") and something has really spooked Ziggy (4"+1"=5") who has fallen back to the last place.

One-and-a-half minute mark, Atargatis continues in the lead (5"+6"+4"=15) a very impressive time, almost half-way through the track. In second place, some three lengths behind is Qilin (4"+4"+4"=12") and in equal place is Cupid, having made an amazing burst (4"+2"+6"=12"). One length behind this pack Twilight (3"+5"+3"=11"), and bringing up the rear, but also with a burst of energy is Ziggy (4"+1"+5"=10")

Two-minute mark, traditionally the half-way point of the race. Wait! Is that a giant cat that has entered the grounds? Yes, Manannán mac Lir is rolling in the sun and enjoying the spectacle as well.

Atargatis continues to lead the field at pace (5"+6"+4"+4"=19"), but there has been a burst from Twilight (3"+5"+3"+5"=16") who is making their move into second place! The steady Qilin is now equal-second (4"+4"+4"+4"=16"), followed by Cupid (4"+2"+6"+3"=15"), slowing down a little, and Ziggy has really been distracted by the cat and falls further behind (4"+1"+5"+2"=12"). When you're a feisty runner, you can't make two mistakes like that in a race.

Two-and-a-half-minute mark, almost three-quarters done, and Twilight continues their burst (3"+5"+3"+5"+6"=22"), this is a great recovery, and into first place! Atargatis has really slowed down (5"+6"+4"+4"+2"=21"). A trip on Qilin! The normally steady unicorn-pegasus has slipped and is now in equal third with Cupid (4"+2"+6"+3"+3"=18"), and Ziggy brings up the rear (4"+1"+5"+2"+3"=15").

Three-minute mark, and Twilight really has a remarkable pace going now and is pulling clearly ahead (+6"=28"), Atargatis is making a real effort as well (+5"=26"), but is three lengths behind. Qilin is back to a steady gait (+4"=22"), with Cupid (+3"=21") and Ziggy (+4"=19") back in the field.

But it's all Twilight! At the 3.15 mark, that's a new record, Twilight crosses the finish line (+6"=35"). Atargatis will make some twenty seconds late (+5"+2"=33), with Cupid (+4"+5"=36") just getting a nose in front of Qilin (+5"+4"=35"), and finally at the rear is the unfortunate Ziggy (+3"+4"+4"=34")

,

Simon LyallAudiobooks – October 2020

Protocol: The Power of Diplomacy and How to Make It Work for You by Capricia Penavic Marshall

A mix of White House stories and tips about how to enhance your career through skills she has learnt. The stories are the best bit of the book. 3/5

Little Town on the Prairie: Little House Series, Book 7 by Laura Ingalls Wilder

Various incidents with 15 year old Laura now studying to become a school teacher while being courted. The family farm progresses and town grows. 3/5

Bold They Rise: The Space Shuttle Early Years (1972-1986) by David Hitt and Heather R. Smith

Covering up to and including the Challenger Disaster. Largely quotes from astronauts and people involved. Interesting seeing how missions quickly went to routine. 3/5

The X-15 Rocket Plane: Flying the First Wings into Space by Michelle Evans

A detailed look at the rocketplane programme. Structured around each of the pilots. Covers all the important flights and events. 4/5

The Time Traveller’s Almanac – Part III – Mazes & Traps by Multiple Authors

Around 18 short Sci-Fi stories about Time, the oldest from 1881. Not all stories strictly time travel. Plenty of hits among the collection. 3/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

,

Tim RileyOpen source status update, October 2020

October was the month! I finally got through the remaining tasks standing between me and an Hanami 2.0.0.alpha2 release. Let’s work through the list now!

Application views configured with application inflector

Now when you subclass Hanami::View inside an Hanami app, it will use the application’s configured inflector automatically. This is important because hanami-view uses the inflector to determine the class names for your view parts, and it’s just plain table stakes for a framework to apply inflections consistently (especially if you’ve configured custom inflection rules).

The implementation within hanami-view was quite interesting, because it was the first time I had to adjust an ApplicationConfiguration (this one being exposed as config.views on the Hanami::Application subclass) to hide one of its base settings. In this case, it hides the inflector setting because we know it will be configured with the application’s inflector as part of the ApplicationView behaviour (to refresh your memory, ApplicationView is a module that’s mixed in whenever Hanami::View is subclassed within a namespace managed by a full Hanami application).

Ordinarily, I’m all in favour of exposing as many settings as possible, but in this case, it didn’t make sense for a view-specific inflector to be independently configurable right alongside the application inflector itself.

Rest assured, you don’t lose access to this setting entirely, so if you ever have reason to give your views a different inflector, you can go right ahead and directly assign it in a view class:

module Main
  class View < Hanami::View
    # By default, the application inflector is configured

    # But you can also override it:
    config.inflector = MyCustomInflector
  end
end

There was a counterpart hanami PR for this change, and I was quite happy to see it all done, because it means we now have consistent handling of both action and view settings: each gem provides their own ApplicationConfiguration class, which is made accessible via config.actions and config.views respectively. This consistency should make it easier to maintain both of these imported configurations going forward (and, one day, to devise a system for any third party gem to register application-level settings).

Application views have their template configured always

One aspect of the ApplicationView behaviour is to automatically configure a template name on each view class. For example, a Main::Views::Articles::Index would have its template configured as "articles/index".

This is great, but there was a missing piece in the implementation. It assumed that your view hierarchy would always include an abstract base class defined within the application:

module Main
  # Abstract base view
  class View < Hanami::View
  end

  module Views
    module Articles
      # Concrete view
      class Index < View
      end
    end
  end
end

Under this assumption, the base view would never have its template automatically configured. That makes sense in the above arrangement, but if you ever wanted to directly inherit from Hanami::View for a single concrete view (and I can imagine cases where this would make sense), you’d lose the nice template name inference!

With this PR, this limitation is no more: every ApplicationView has a template configured in all circumstances.

Application views are configured with a Part namespace

Keeping with the theme of improving hanami-view integration, another gap I’d noticed was that application views are not automatically configured with a part namespace. This meant another wart if you wanted to use this feature:

require "main/views/parts"

module Main
  class View < Hanami::View
    # Ugh, I have to _type_ all of this out, _by hand?_
    config.part_namespace = Views::Parts
  end
end

Not any more! As of this PR, we now have a config.views.parts_path application-level setting, with a default value of "views/parts". When an ApplicationView is activated, it will take this value, convert it into a module (relative to the view’s application or slice namespace), and assign it as the view’s part_namespace. This would see any view defined in Main having Main::Views::Parts automatically set as its part namespace. Slick!

Security-related default headers restored

Sticking with configuration, but moving over to hanami-controller, Hanami::Action subclasses within an Hanami app (that is, any ApplicationAction) now have a set of security-related default headers configured out of the box.

These are set on the config.actions.default_headers application-level setting, which you can also tweak to suit your requirements.

Previously, these were part of a bespoke one-setting-per-header arrangement in the config.security application-level setting namespace, but I think this new arrangement is both easier to understand and much more maintainable, so I was happy to drop that whole class from hanami as part of rounding out this work.

Automatic cookie support based on configuration

The last change I made to hanami-controller was to move the config.cookies application-level setting, which was defined in the hanami gem, directly into the config.actions namespace, which is defined inside hanami-controller, much closer to the related behaviour.

We now also automatically include the Hanami::Action::Cookies module into any ApplicationAction if cookies are enabled. This removes yet another implementation detail and piece of boilerplate that users would otherwise need to consider when building their actions. I’m really happy with how the ApplicationAction idea is enabling this kind of integration in such a clean way.

Check out the finer details in the PR to hanami-controller and witness the corresponding code removal from hanami itself.

Released a minimal application template

It’s been a while now since I released my original Hanami 2 application template, which still serves as a helpful base for traditional all-in-one web applications.

But this isn’t the only good use for Hanami 2! I think it can serve as a helpful base for any kind of application. When I had a colleague ask me on the viability of Hanami to manage a long-running system service, I wanted to demonstrate how it could look, so I’ve now released an Hanami 2 minimal application template. This one is fully stripped back: nothing webby at all, just a good old lib/ and a bin/app to demonstrate an entry point. I think it really underscores the kind of versatility I want to achieve with Hanami 2. Go check it out!

Gave dry-types a nice require-time performance boost

Last but not least, one evening I was investigating just how many files were required as one of my applications booted. I noticed an unusually high number of concurrent-ruby files being required. Turns out this was an unintended consequence of requiring dry-types. One single-line PR later and now a require "dry/types" will load 242 fewer files!

Savouring this moment

It’s taken quite some doing to get to this moment, where an Hanami 2.0.0.alpha2 release finally feels feasible. As you’d detect from my previous posts, it’s felt tantalisingly close for every one of the last few months. As you’d also detect from this post, the final stretch has involved a lot of focused, fiddly, and let’s face it, not all that exciting work. But these are just the kind of details we need to get right for an excellent framework experience, and I’m glad I could continue for long enough to get these done.

I’m keenly aware that there’ll be much, much more of this kind of work ahead of us, but for the time being, I’m savouring this interstice.

In fact, I’ve even given myself a treat: I’ve already started some early explorations of how we could adapt dry-system to fit with zeitwerk so that we can make reliable autoloading a part of the core Hanami 2 experience. But more on that later ;)

Thank you to my sponsors!

I now have a sponsors page on this here site, which contains a small list of people to whom I am very thankful. I’d really love for you to join their numbers and sustain my open source work.

As for the next month, new horizons await: I’ll start working out some alpha2 release notes (can you believe it’s been nearly 2 years of work?), as well as continuing on the zeitwerk experiment.

See you all again, same place, same time!

,

David RoweSpeech Spectral Quantisation using VQ-VAE

As an exercise to learn more about machine learning, I’ve been experimenting with Vector Quantiser Variational AutoEncoders (VQ VAE) [2]. Sounds scary, but it’s basically embedding a vector quantiser in a Neural Network so they train together. I’ve come up with a simple network that quantises 80ms (8 x 10ms frames) of spectral magnitudes in 88 bits (about 1100 bits/s).

I arrived at my current model through trial and error, using this example [1] as a starting point. Each 10ms frame is a vector of energies from 14 mel-spaced filters, derived from LPCNet [6]. The network uses conv1D stages to downsample and upsample the vectors, with a two stage VQ (11 bits per stage) in the Autoencoder “bottleneck”. The VQ is also encoding total frame energy, so the remaining parameters for a vocoder would be pitch and (maybe) voicing.
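
For readers new to VQ-VAEs, here’s a minimal sketch of a vector quantiser layer with the usual straight-through gradient trick (illustrative only – the commitment and codebook losses you’d add in practice are omitted, and the actual network in [3] differs):

import tensorflow as tf
from tensorflow.keras import layers

class VectorQuantiser(layers.Layer):
    # Nearest-neighbour codebook lookup with a straight-through gradient estimator
    def __init__(self, num_entries, dim, **kwargs):
        super().__init__(**kwargs)
        self.codebook = self.add_weight(
            name="codebook", shape=(num_entries, dim),
            initializer="random_uniform", trainable=True)

    def call(self, x):
        # x has shape (batch, time, dim); flatten to (N, dim) for the search
        flat = tf.reshape(x, (-1, self.codebook.shape[1]))
        # squared distance from each input vector to each codebook entry
        dist = (tf.reduce_sum(flat ** 2, axis=1, keepdims=True)
                - 2.0 * tf.matmul(flat, self.codebook, transpose_b=True)
                + tf.reduce_sum(self.codebook ** 2, axis=1))
        idx = tf.argmin(dist, axis=1)
        quantised = tf.reshape(tf.gather(self.codebook, idx), tf.shape(x))
        # forward pass uses the quantised value, gradients flow straight back to x
        return x + tf.stop_gradient(quantised - x)

In a two stage arrangement like the one described above, the second stage would quantise the residual left over after the first stage.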

This work (spectral quantisation) is applicable to “old school” vocoders like Codec 2 and is also being used with newer Neural Vocoders in some research papers.

I haven’t used it to synthesise any speech yet but it sure does make nice plots. This one is a 2D histogram of the encoder space; the white dots are the stage 1 VQ entries. The 16 dimensional data has been reduced to 2 dimensions using PCA.

If the VQ is working, we should expect more dots in the brighter colour areas, and less in the darker areas.

Here is a sample input (green) output (red) of 8 frames:

This is a transition region, going from voiced to unvoiced speech. It seems to handle it OK. The numbers are (frame_number, SD), where SD is the Spectral Distortion in dB*dB. When we get a high SD frame, quite often it’s not crazy wrong, more an educated guess that will probably sound OK, e.g. a different interpolation profile for the frame energy across a transition. Formants are mostly preserved.

The VQ seems to be doing something sensible, after 20 epochs I can see most VQ entries are being used, and the SD gets better with more bits. The NN part trains much faster than the VQ.

Here is a histogram of the SDs for each frame:

The average SD is around 7.5 dB*dB, similar to some of the Codec 2 quantisers. However this is measured on every 10ms frame in an 8 frame sequence, so it’s a measure of how well it interpolates/decimates in time as well. As I mentioned above – some of the “misses” that push the mean SD higher are inconsequential.

Possible Bug in Codec 2 700C

I use similar spectral magnitude vectors for Codec 2 700C [5] – however when I tried that data the SD was about double. Hmmm. I looked into it and found some bugs/weaknesses in my approach for Codec 2 700C (for that codec the spectral magnitudes are dependent on the pitch estimator, which occasionally loses it). So that was a nice outcome – trying to get the same result two different ways can be a pretty useful test.

Further Work

Some ideas for further work:

  1. Use kmeans for training.
  2. Inject bit errors when training to make it robust to channel errors.
  3. Include filtered training material to make it robust to recording conditions.
  4. Integrate into a codec and listen to it.
  5. Try other networks – I’m still learning how to engineer an optimal network.
  6. Make it work with relu activations, I can only get it to work with tanh.

Reading Further

[1] VQ VAE Keras MNIST Example – my starting point for the VQ-VAE work
[2] Neural Discrete Representation Learning
[3] My Github repo for this work
[4] Good introduction to PCA
[5] Codec 2 700C – also uses VQ-ed mel-spaced vectors
[6] LPCNet: DSP-Boosted Neural Speech Synthesis

Linux AustraliaCouncil Meeting Tuesday 3rd November 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Benno Rice

Apologies

None

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Event Review

Drupal

Admin Team

Pycon

LCA 2020

LCA 2021

LCA 2022

3. Log of correspondence

  • From: ASIC; Date: Sun, 25 Oct 2020 19:16:44 +1100; Subject: Renewal: OPEN SOURCE AUSTRALIA
    • MOTION: Russell Stuart moves we pay ASIC AUD$87 to renew “OPEN SOURCE AUSTRALIA” for 3 years.
    • Seconder: Jonathan
    • Outcome: Passed
  • From Google to council@ on 27 Oct 2020. Use of Google App Engine beyond 31 Jan 2021 requires payment information be added to the linked account before 31 Jan 2021. Affected project is “sydney-linux-users-group-hr”.
    • Current SLUG site is hosted on LA infra, we’ll leave it and see if anything breaks.
  • From MailChimp to council@ on 28 Oct 2020. Policies have been updated (Standard Terms of Use, Data Processing Addendum).
    • AI: Sae Ra to close account, was used in migration onto CiviCRM
  • From AgileWare; Subject: New Invoice, due 30/11/2020; Date: Sat, 31 Oct 2020 10:00:27 +1100.  Summary: $330 renewal for 6 months hosting.
    • MOTION: Sae Ra moves LA pays AgileWare AUD$330 for 6 months of web site hosting in advance, and up to AUD$3000 for support renewal.
    • Seconder: Russell
    • Outcome:  Passed

4. Items for discussion

  • None

5. Items for noting

  • Stewart Smith has agreed to be returning officer.
  • Sae Ra needs photo & bio for council members for annual report.
  • Audit need bank statements, some are only just through as of this meeting.
  • Approached by Vala Tech Camp to sponsor for next year.

6. Other business 

  • Call for nominations for Rusty Wrench
    • Announce out by late November, 2-3 week nomination period, close mid-December
    • AI: Julien to update draft

7. In camera

  • No items were discussed in camera

2011 AEDT close

The post Council Meeting Tuesday 3rd November 2020 – Minutes appeared first on Linux Australia.

,

Francois MarierRecovering from a corrupt MariaDB index page

I ran into a corrupt MariaDB index page the other day and had to restore my MythTV database from the automatic backups I make as part of my regular maintenance tasks.

Signs of trouble

My troubles started when my daily backup failed on this line:

mysqldump --opt mythconverg -umythtv -pPASSWORD > mythconverg-200200923T1117.sql

with this error message:

mysqldump: Error 1034: Index for table 'recordedseek' is corrupt; try to repair it when dumping table `recordedseek` at row: 4059895

Comparing the dump that was just created to the database dumps in /var/backups/mythtv/, it was clear that it was incomplete since it was about 100 MB smaller.

I first tried a gentle OPTIMIZE TABLE recordedseek as suggested in this StackExchange answer but that caused the database to segfault:

mysqld[9141]: 2020-09-23 15:02:46 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace mythconverg/recordedseek page [page id: space=115871, page number=11373]. You may have to recover from a backup.
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
mysqld[9141]:  len 16384; hex 06177fa70000...
mysqld[9141]:  C     K     c      {\;
mysqld[9141]: InnoDB: End of page dump
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 102203303, calculated checksums for field1: crc32 806650270, innodb 1139779342,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 102203303, calculated checksums for field2: crc32 806650270, innodb 3322209073, none 3735928559,  page LSN 148 2450029404, low 4 bytes of LSN at page end 2450029404, page number (if stored to page already) 11373, space id (if created with >= MySQL-4.1.1 and stored already) 115871
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Page may be an index page where index id is 697207
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: Index 697207 is `PRIMARY` in table `mythconverg`.`recordedseek`
mysqld[9141]: 2020-09-23 15:02:46 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
mysqld[9141]: 200923 15:02:46 2020-09-23 15:02:46 0 [ERROR] InnoDB: Failed to read file './mythconverg/recordedseek.ibd' at offset 11373: Page read from tablespace is corrupted.
mysqld[9141]: [ERROR] mysqld got signal 11 ;
mysqld[9141]: Core pattern: |/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h ...
kernel: [820233.893658] mysqld[9186]: segfault at 90 ip 0000557a229f6d90 sp 00007f69e82e2dc0 error 4 in mysqld[557a224ef000+803000]
kernel: [820233.893665] Code: c4 20 83 bd e4 eb ff ff 44 48 89 ...
systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV
systemd[1]: mariadb.service: Failed with result 'signal'.
systemd-coredump[9240]: Process 9141 (mysqld) of user 107 dumped core.#012#012Stack trace of thread 9186: ...
systemd[1]: mariadb.service: Service RestartSec=5s expired, scheduling restart.
systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.
mysqld[9260]: 2020-09-23 15:02:52 0 [Warning] Could not increase number of max_open_files to more than 16364 (request: 32186)
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=638234502026
...
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Recovered page [page id: space=115875, page number=5363] from the doublewrite buffer.
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Starting final batch to recover 2 pages from redo log.
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] InnoDB: Waiting for purge to start
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Recovering after a crash using tc.log
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Starting crash recovery...
mysqld[9260]: 2020-09-23 15:02:53 0 [Note] Crash recovery finished.

and so I went with the nuclear option of dropping the MythTV database and restoring from backup.

Dropping the corrupt database

First of all, I shut down MythTV as root:

killall mythfrontend
systemctl stop mythtv-status.service
systemctl stop mythtv-backend.service

and took a full copy of my MariaDB databases just in case:

systemctl stop mariadb.service
cd /var/lib
apack /root/var-lib-mysql-20200923T1215.tgz mysql/
systemctl start mariadb.service

before dropping the MythTV database (mythconverg):

$ mysql -pPASSWORD

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| mythconverg        |
| performance_schema |
+--------------------+
4 rows in set (0.000 sec)

MariaDB [(none)]> drop database mythconverg;
Query OK, 114 rows affected (25.564 sec)

MariaDB [(none)]> quit
Bye

Restoring from backup

Then I re-created an empty database:

mysql -pPASSWORD < /usr/share/mythtv/sql/mc.sql

and restored the last DB dump prior to the detection of the corruption:

sudo -i -u mythtv
/usr/share/mythtv/mythconverg_restore.pl --directory /var/backups/mythtv --filename mythconverg-1350-20200923010502.sql.gz

In order to restart everything properly, I simply rebooted the machine:

systemctl reboot

Paul WiseFLOSS Activities October 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

  • Spam: reported 2 Debian bug reports and 147 Debian mailing list posts
  • Patches: merged libicns patches
  • Debian packages: sponsored iotop-c
  • Debian wiki: RecentChanges for the month
  • Debian screenshots:

Administration

  • Debian: get us removed from an RBL
  • Debian wiki: reset email addresses, approve accounts

Communication

Sponsors

The pytest-rerunfailures/pyemd/morfessor work was sponsored by my employer. All other work was done on a volunteer basis.

,

Francois MarierUpgrading from Ubuntu 18.04 bionic to 20.04 focal

I recently upgraded from Ubuntu 18.04.5 (bionic) to 20.04.1 (focal) and it was one of the roughest Ubuntu upgrades I've gone through in a while. Here are the notes I took on avoiding or fixing the problems I ran into.

Preparation

Before going through the upgrade, I disabled the configurations which I know interfere with the process:

  • Enable etckeeper auto-commits before install by putting the following in /etc/etckeeper/etckeeper.conf:

      AVOID_COMMIT_BEFORE_INSTALL=0
    
  • Remount /tmp as executable:

      mount -o remount,exec /tmp
    

Another step I should have taken but didn't, was to temporarily remove safe-rm since it caused some problems related to a Perl upgrade happening at the same time:

apt remove safe-rm

Network problems

After the upgrade, my network settings weren't really working properly and so I started by switching from ifupdown to netplan.io which seems to be the preferred way of configuring the network on Ubuntu now.

Then I found out that netplan.io is not automatically enabling the systemd-resolved handling of .local hostnames.

I would be able to resolve a hostname using avahi:

$ avahi-resolve --name machine.local
machine.local   192.168.1.5

but not with systemd:

$ systemd-resolve machine.local
machine.local: resolve call failed: 'machine.local' not found

$ resolvectl mdns
Global: no
Link 2 (enp4s0): no

The best solution I found involves keeping systemd-resolved and its /etc/resolv.conf symlink to /run/systemd/resolve/stub-resolv.conf.

I added the following in a new /etc/NetworkManager/conf.d/mdns.conf file:

[connection]
connection.mdns=1

which instructs NetworkManager to resolve mDNS on all network interfaces it manages but not register a hostname since that's done by avahi-daemon.

Then I enabled mDNS globally in systemd-resolved by setting the following in /etc/systemd/resolved.conf:

MulticastDNS=yes

before restarting both services:

systemctl restart NetworkManager.service systemd-resolved.service

With that in place, .local hostnames are resolved properly and I can see that mDNS is fully enabled:

$ resolvectl mdns
Global: yes
Link 2 (enp4s0): yes

Boot problems

For some reason I was able to boot with the kernel I got as part of the focal update, but a later kernel update rendered my machine unbootable.

Adding some missing RAID-related modules to /etc/initramfs-tools/modules:

raid1
dmraid
md-raid1

and then re-creating all initramfs:

update-initramfs -u -k all

seemed to do the trick.

,

Francois MarierCopying a GnuBee's root partition onto a new drive

Here is the process I followed when I moved my GnuBee's root partition from one flaky Kingston SSD drive to a brand new Samsung SSD.

It was relatively straightforward, but there are two key points:

  1. Make sure you label the root partition GNUBEE-ROOT.
  2. Make sure you copy the network configuration from the SSD, not the tmpfs mount.

Copy the partition table

First, with both drives plugged in, I replicated the partition table of the first drive (/dev/sda):

# fdisk -l /dev/sda
Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 799CD830-526B-42CE-8EE7-8C94EF098D46

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   8390655   8388608     4G Linux swap
/dev/sda2  8390656 234441614 226050959 107.8G Linux filesystem

onto the second drive (/dev/sde):

# fdisk /dev/sde

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xd011eaba.

Command (m for help): g
Created a new GPT disklabel (GUID: 83F70325-5BE0-034E-A9E1-1965FEFD8E9F).

Command (m for help): n
Partition number (1-128, default 1): 
First sector (2048-488397134, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-488397134, default 488397134): +4G

Created a new partition 1 of type 'Linux filesystem' and of size 4 GiB.

Command (m for help): t
Selected partition 1
Partition type (type L to list all types): 19
Changed type of partition 'Linux filesystem' to 'Linux swap'.

Command (m for help): n
Partition number (2-128, default 2): 
First sector (8390656-488397134, default 8390656): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (8390656-488397134, default 488397134): 234441614

Created a new partition 2 of type 'Linux filesystem' and of size 107.8 GiB.

Command (m for help): p
Disk /dev/sde: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 860 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 83F70325-5BE0-034E-A9E1-1965FEFD8E9F

Device       Start       End   Sectors   Size Type
/dev/sde1     2048   8390655   8388608     4G Linux swap
/dev/sde2  8390656 234441614 226050959 107.8G Linux filesystem

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

I wasted a large amount of space on the second drive, but that was on purpose in case I decide to later on move to a RAID-1 root partition with the Kingston SSD.

Format the partitions

Second, I formatted the new partitions:

# mkswap /dev/sde1 
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=7a85fbce-2493-45c1-a548-4ec6e827ec29

# mkfs.ext4 /dev/sde2 
mke2fs 1.44.5 (15-Dec-2018)
Discarding device blocks: done                            
Creating filesystem with 28256369 4k blocks and 7069696 inodes
Filesystem UUID: 732a76df-d369-4e7b-857a-dd55fd461bbc
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done   

and labeled the root partition so that the GnuBee can pick it up as it boots up:

e2label /dev/sde2 GNUBEE-ROOT

since GNUBEE-ROOT is what uboot will be looking for.

Copy the data over

Finally, I copied the data over from the original drive to the new one:

# umount /etc/network
# mkdir /mnt/root
# mount /dev/sde2 /mnt/root
# rsync -aHx --delete --exclude=/dev/* --exclude=/proc/* --exclude=/sys/* --exclude=/tmp/* --exclude=/mnt/* --exclude=/lost+found/* --exclude=/media/* --exclude=/rom/* /* /mnt/root/
# sync

Note that if you don't unmount /etc/network/, you'll be copying the override provided at boot time instead of the underlying config that's on the root partition. The reason that this matters is that the script that renames the network interfaces to ethblack and ethblue expects specific files in order to produce a working network configuration. If you copy the final modified config files, then you end up with a bind-mounted empty directory as /etc/network, and the network interfaces can't be brought up successfully.

,

Lev LafayetteContributing To the International HPC Certification Forum

As datasets grow in size and complexity faster than personal computational devices can handle, more researchers seek HPC systems as a solution to their computational problems. However, many researchers lack familiarity with the HPC environment and require training. The formal education curriculum has not yet responded sufficiently to this pressure, leaving HPC centres to provide basic training.

One proposed solution to this issue has been the International HPC Certification Forum, established in 2018 and developed from the Performance Conscious HPC (PeCoH) project run in 2017 with the Hamburg HPC Competence Center (HHCC), which had the explicit goal of creating broad standards for an “HPC driving license”. Since its establishment, the Forum has developed a detailed skill-tree across multiple branches (e.g., HPC Knowledge, HPC Use, Performance Engineering) and levels of competency (basic, intermediate, expert), where very specific skills have particular competencies. In addition, the Forum has developed a summative examination system and a PGP-signed certificate.

Whilst the Forum separates examination and certification from curriculum development and content delivery, it also requires a feedback mechanism from HPC education providers. Review of learning objectives and specific competencies, and development of branches in depth and breadth, all contribute to building a community ecosystem for the development of the Forum and its success. The availability of “HPC CF Endorsed Training”, with certifiable content, is a clear avenue for HPC centres to contribute to the Forum, and will be elaborated in this presentation with examples from current work.

A presentation to eResearchAustralasia 2020; slidedeck and transcript available.

,

Lev LafayetteSpartan: From Experimental Hybrid towards a Petascale Future

Previous presentations to eResearch Australiasia described the implementation of Spartan, the University of Melbourne’s general- purpose HPC system. Initially, this system was small but innovative, arguably even experimental. Features included making extensive use of cloud infrastructure for compute nodes, OpenStack for deployment, Ceph for the file system, ROCE for network, Slurm as the workload manager, EasyBuild and LMod, etc.

Based on consideration of job workload and basic principles of sunk, prospective, and opportunity costs, this combination maximised throughput on a low budget, and attracted international attention as a result. Flexibility in design also allowed the introduction of a large LIEF-supported GPGPU partition, the inclusion of older systems from Melbourne Bioinformatics, and departmental contributions. Early design decisions meant that Spartan has been able to provide performance and flexibility, and as a result continues to show high utilisation and job completion (close to 20 million jobs), with overall metrics well within what would be a “top 500” system. The inclusion of an extensive training programme based on andragogical principles has also helped significantly.

Very recently Spartan has undergone some significant architecture modifications, and this report will be of interest to other institutions. The adoption of the Spectrum Scale file system has further improved scalability, performance, and reliability, along with the adoption of a pure HPC environment with a significant increase in core count, designed for workload changes and especially queue times. Overall, these new developments in Spartan are designed to be integrated into the University’s Petascale Campus Initiative (PCI).

Presentation to eResearchAustralasia 2020

Slidedeck and transcript available.

David RoweFSK LDPC Data Mode

I’m developing an open source data mode using a FSK modem and powerful LDPC codes. The initial use case is the Open IP over UHF/VHF project, but it’s available in the FreeDV API as a general purpose mode for sending data over radio channels.

It uses 2FSK or 4FSK, has a variety of LDPC codes available, works with bursts or streaming frames, and the sample rate and symbol rate can be set at init time.

The FSK modem has been around for some time, and is used for several applications such as balloon telemetry and FreeDV digital voice modes. Bill, VK5DSP, has recently done some fine work to tightly integrate the LDPC codes with the modem. The FSK_LDPC system has been tested over the air in Octave simulation form, been ported to C, and bundled up into the FreeDV API to make using it straightforward from the command line, C or Python.

We’re not using a “black box” chipset here – this is ground up development of the physical layer using open source, careful simulation, automated testing, and verification of our work on real RF signals. As it’s open source the modem is not buried in proprietary silicon so we can look inside, debug issues and integrate powerful FEC codes. Using a standard RTLSDR with a 6dB noise figure, FSK_LDPC is roughly 10dB ahead of the receiver in a sample chipset. That’s a factor of 10 in power efficiency or bit rate – your choice!

Performance

The performance is pretty close to what is theoretically possible for coded FSK [6]. This is about Eb/No=8dB (2FSK) and Eb/No=6dB (4FSK) for error free transmission of coded data. You can work out what that means for your application using:

  MDS = Eb/No + 10*log10(Rb) + NF - 174
  SNR = Eb/No + 10*log10(Rb/B)

So if you were using 4FSK at 100 bits/s, with a 6dB Noise figure, the Minimum Detectable Signal (MDS) would be:

  MDS = 6 + 10*log10(100) + 6 - 174
      = -142dBm

Given a 3kHz noise bandwidth, the SNR would be:

  SNR = 6 + 10*log10(100/3000)
      = -8.8 dB
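
If you want to plug your own numbers into these sums, here is a minimal Python sketch (mine, not from the FreeDV code) using the values from the worked example above:

import math

def mds_dbm(ebno_db, rb, nf_db):
    # Minimum Detectable Signal: Eb/No + 10log10(Rb) + NF - 174 (thermal noise floor, dBm/Hz)
    return ebno_db + 10 * math.log10(rb) + nf_db - 174

def snr_db(ebno_db, rb, noise_bw):
    # SNR in a given noise bandwidth: Eb/No + 10log10(Rb/B)
    return ebno_db + 10 * math.log10(rb / noise_bw)

print(mds_dbm(6, 100, 6))    # -142.0 dBm
print(snr_db(6, 100, 3000))  # about -8.8 dB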

How it Works

Here is the FSK_LDPC frame design:

At the start of a burst we transmit a preamble to allow the modem to synchronise. Only one preamble is transmitted for each data burst, which can contain as many frames as you like. Each frame starts with a 32 bit Unique Word (UW), then the FEC codeword consisting of the data and parity bits. At the end of the data bits, we reserve 16 bits for a CRC.

This figure shows the processing steps for the receive side:

Unique Word Selection

The Unique Word (UW) is a known sequence of bits we use to obtain “frame sync”, or identify the start of the frame. We need this information so we can feed the received symbols into the LDPC decoder in the correct order.

To find the UW we slide it against the incoming bit stream and count the number of errors at each position. If the number of errors is beneath a certain threshold, we declare a valid frame and try to decode it with the LDPC decoder.

Even with pure noise (no signal) a random sequence of bits will occasionally get a partial match (better than our threshold) with the UW. That means the occasional dud frame detection. However if we dial up the threshold too far, we might miss good frames that just happen to have a few too many errors in the UW.

So how do we select the length of the UW and the threshold? Well, for the last few decades I’ve been guessing. However, despite being allergic to probability theory, I have recently started using the Binomial Distribution to answer this question.

Let’s say we have a 32 bit UW; let’s plot the Binomial PDF and CDF:


The x-axis is the number of errors. On each graph I’ve plotted two cases:

  1. A 50% Bit Error Rate (BER). This is what we get when no valid signal is present, just random bits from the demodulator.
  2. A 10% bit error rate. This is the worst case where we need to get frame sync – a valid, but low SNR signal. The rate half LDPC codes fall over at about 10% BER.

The CDF tells us “what is the chance of this many or less errors”. We can use it to pick the UW length and thresholds.

In this example, say we select a “valid UW threshold” of 6 bit errors out of 32. Imagine we are sliding the UW over random bits. Looking at the 50% BER CDF curve, we have a probability of 2.6E-4 (0.026%) of getting 6 or fewer errors. Looking at the 10% curve, we have a probability of 0.96 (96%) of detecting a valid frame – or in other words we will miss 100 – 96 = 4% of the valid frames that just happen to have 7 or more errors in the unique word.

So there is a trade off between false detection on random noise, and missing valid frames. A longer UW helps separate the two cases, but adds some overhead – as UW bits don’t carry any payload data. A lower threshold means you are less likely to trigger on noise, but more likely to miss a valid frame that has a few errors in the UW.

Continuing our example, let’s say we try to match the UW on a stream of random bits from off air noise. Because we don’t know where the frame starts, we need to test every single bit position. So at a bit rate of 1000 bits/s we attempt a match 1000 times a second. The probability of a random match in 1000 bits (1 second) is 1000*2.6E-4 = 0.26, or about 1 chance in 4. So every 4 seconds, on average, we will get an accidental UW match on random data. That’s not great, as we don’t want to output garbage frames to higher layers of our system. So a CRC on the decoded data is performed as a final check to determine if the frame is indeed valid.
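
Here is a small Python sketch (my own, not from the post) that reproduces these numbers with the Binomial CDF, so you can experiment with different UW lengths and thresholds:

from math import comb

def binom_cdf(k, n, p):
    # Probability of k or fewer bit errors in n bits, each in error with probability p
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

uw_bits, threshold = 32, 6
p_false = binom_cdf(threshold, uw_bits, 0.5)   # random bits: ~2.6E-4
p_detect = binom_cdf(threshold, uw_bits, 0.1)  # 10% BER signal: ~0.96
print(p_false, p_detect, 1000 * p_false)       # last value: ~0.26 false matches per second at 1000 bits/s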

Putting it all together

We prototyped the system in GNU Octave first, then ported the individual components to stand alone C programs that we can string together using stdin/stdout pipes:

$ cd codec2/build_linux
$ cd src/
$ ./ldpc_enc /dev/zero - --code H_256_512_4 --testframes 200 |
  ./framer - - 512 5186 | ./fsk_mod 4 8000 5 1000 100 - - |
  ./cohpsk_ch - - -10.5 --Fs 8000  |
  ./fsk_demod --mask 100 -s 4 8000 5 - - |
  ./deframer - - 512 5186  |
  ./ldpc_dec - /dev/null --code H_256_512_4 --testframes
--snip--
Raw   Tbits: 101888 Terr:   8767 BER: 0.086
Coded Tbits:  50944 Terr:    970 BER: 0.019
      Tpkts:    199 Tper:     23 PER: 0.116

The example above runs 4FSK at 5 symbols/second (10 bits/s), at a sample rate of 8000 Hz. It uses a rate 0.5 LDPC code, so the throughput is 5 bit/s and it works down to -24dB SNR (at around 10% PER). This is what it sounds like on a SSB receiver:

Yeah I know. But it’s in there. Trust me.

The command line programs above are great for development, but unwieldy for real world use. So they’ve been combined into single FreeDV API functions. These functions take data bytes, convert them to samples you send through your radio, then at the receiver back to bytes again. Here’s a simple example of sending some text using the FreeDV raw data API test programs:

$ cd codec2/build_linux/src
$ echo 'Hello World                    ' |
  ./freedv_data_raw_tx FSK_LDPC - - 2>/dev/null |
  ./freedv_data_raw_rx FSK_LDPC - - 2>/dev/null |
  hexdump -C
48 65 6c 6c 6f 20 57 6f  72 6c 64 20 20 20 20 20  |Hello World     |
20 20 20 20 20 20 20 20  20 20 20 20 20 20 11 c6  |              ..|

The “2>/dev/null” hides some of the verbose debug information, to make this example quieter. The 0x11c6 at the end is the 16 bit CRC. This particular example uses frames of 32 bytes, so I’ve padded the input data with spaces.

My current radio for real world testing is a Raspberry Pi Tx and RTLSDR Rx, but FSK_LDPC could be used over regular SSB radios (just pipe the audio into and out of your radio with a sound card), or other SDRs. FSK chips could be used as the Tx (although their receivers are often sub-optimal as we shall see). You could even try it on HF, and receive the signal remotely with a KiwiSDR.

I’ve used a HackRF as a Tx for low level testing. After a few days of tuning and tweaking it works as advertised – I’m getting within 1dB of theory when tested over the bench at rates between 500 and 20000 bits/s. In the table below Minimum Detectable Signal (MDS) is defined as 10% PER, measured over 100 packets. I send the packets arranged as 10 “bursts” of 10 packets each, with a gap between bursts. This gives the acquisition a bit of a work out (burst operation is typically tougher than streaming):

Info bit rate (bits/s)  Mode  NF (dB)  Expected MDS (dBm)  Measured MDS (dBm)  Si4464 MDS (dBm)
1000                    4FSK  6        -132                -131                -123
10000                   4FSK  6        -122                -120                -110
5000                    2FSK  6        -123                -123                -113

The Si4464 is used as an example of a chipset implementation. The Rx sensitivity figures were extrapolated from the nearest bit rate on Table 3 of the Si4464 data sheet. It’s hard to compare exactly as the Si4464 doesn’t have FEC. In fact it’s not possible to fully utilise the performance of high performance FEC codes on chipsets as they generally don’t have soft decision outputs.

FSK_LDPC can scale to any bit rate you like. The ratio of the sample rate to symbol rate Fs/Rs = 8000/1000 (8kHz, 1000 bits/s) is the same as Fs/Rs = 800000/100000 (800kHz, 100k bits/s), so it’s the same thing to the modem. I’ve tried FSK_LDPC between 5 and 40k bit/s so far.

With a decent LNA in front of the RTLSDR, I measured MDS figures about 4dB lower at each bit rate. I used a rate 0.5 code for the tests to date, but other codes are available (thanks to Bill and the CML library!).

There are a few improvements I’d like to make. In some tests I’m not seeing the 2dB advantage 4FSK should be delivering. Synchronisation is trickier for 4FSK, as we have 4 tones, and the raw modem operating point is 2dB further down the Eb/No curve than 2FSK. I’d also like to add some GFSK style pulse shaping to make the Tx spectrum cleaner. I’m sure some testing over real world links will also show up a few issues.

It’s fun building, then testing, tuning and pushing through one bug after another to build your very own physical layer! It’s a special sort of magic when the real world results start to approach what the theory says is possible.

Reading Further

[1] Open IP over UHF/VHF Part 1 and Part 2 – my first use case for the FSK_LDPC protocol described in this post.
[2] README_FSK – recently updated documentation on the Codec 2 FSK modem, including lots of examples.
[3] README_data – new documentation on Codec 2 data modes, including the FSK_LDPC mode described in this post.
[4] 4FSK on 25 Microwatts – Bill and I sending 4FSK signals across Adelaide, using an early GNU Octave simulation version of the FSK_LDPC mode described in this post.
[5] Bill’s LowSNR blog.
[6] Coded Modulation Library Overview – CML is a wonderful library that we are using in Codec 2 for our LDPC work. Slide 56 tells us the theoretical minimum Eb/No for coded FSK (about 8dB for 2FSK and 6dB for 4FSK).
[7] 4FSK LLR Estimation Part 2 – GitHub PR used for development of the FSK_LDPC mode.

Linux AustraliaCouncil Meeting Tuesday 20th October 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Joel Addison

Apologies 

Benno Rice

 

Meeting opened at 1930 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Log of correspondence

  • 1 Oct 2020 to @council: Thanks received from NetThing for LA’s support of their 2020 event.

3. Items for discussion

  • Grant application received from Pauline Clague: Online indigenous women’s possum skin cloak making workshop. Application received on 25 Sep 2020, community consultation closed Fri 9 Oct 2020. To be considered by Council on 20 Oct 2020.
    • MOTION BY Sae Ra That Linux Australia Accepts the Grant Proposal Online indigenous women’s possum skin cloak making workshop submitted by Pauline Clague.
    • Seconded: Julien
    • Motion failed.
    • AI: Jonathan to follow up
  • AGM Discussion
    • Agreement on the 15th.
    • Suggestion for Stewart Smith for returning officer
      • AI: Julien to approach

4. Items for noting

  • LCA 2021 <details redacted>
  • Jonathan talked to the Rockhampton Art Gallery folk about their proposal, resources created will be open, may come back.

5. Other business 

  • AI: Julien to provide redacted minutes for audit
  • AI: Julien to create static copy of planet site so admin team can just switch off

6. In camera

  • No items were discussed in camera

 

Meeting closed at 2000

The post Council Meeting Tuesday 20th October 2020 – Minutes appeared first on Linux Australia.

,

Francois MarierUsing a Let's Encrypt TLS certificate with Asterisk 16.2

In order to fix the following error after setting up SIP TLS in Asterisk 16.2:

asterisk[8691]: ERROR[8691]: tcptls.c:966 in __ssl_setup: TLS/SSL error loading cert file. <asterisk.pem>

I created a Let's Encrypt certificate using certbot:

apt install certbot
certbot certonly --standalone -d hostname.example.com

To enable the asterisk user to load the certificate successfully (it doesn't have permission to access the certificates under /etc/letsencrypt/), I copied it to the right directory:

cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key

Then I set the following variables in /etc/asterisk/sip.conf:

tlscertfile=/etc/asterisk/asterisk.cert
tlsprivatekey=/etc/asterisk/asterisk.key

Automatic renewal

The machine on which I run asterisk has a tricky Apache setup:

  • a webserver is running on port 80
  • port 80 is restricted to the local network

This meant that the certbot domain ownership checks would get blocked by the firewall, and I couldn't open that port without exposing the private webserver to the Internet.

So I ended up disabling the built-in certbot renewal mechanism:

systemctl disable certbot.timer certbot.service
systemctl stop certbot.timer certbot.service

and then writing my own script in /etc/cron.daily/certbot-francois:

#!/bin/bash
TEMPFILE=`mktemp`

# Stop Apache and backup firewall.
/bin/systemctl stop apache2.service
/usr/sbin/iptables-save > $TEMPFILE

# Open up port 80 to the whole world.
/usr/sbin/iptables -D INPUT -j LOGDROP
/usr/sbin/iptables -A INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables -A INPUT -j LOGDROP

# Renew all certs.
/usr/bin/certbot renew --quiet

# Restore firewall and restart Apache.
/usr/sbin/iptables -D INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables-restore < $TEMPFILE
/bin/systemctl start apache2.service

# Copy certificate into asterisk.
cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
/bin/systemctl restart asterisk.service

# Commit changes to etckeeper.
pushd /etc/ > /dev/null
/usr/bin/git add letsencrypt asterisk
DIFFSTAT="$(/usr/bin/git diff --cached --stat)"
if [ -n "$DIFFSTAT" ] ; then
    /usr/bin/git commit --quiet -m "Renewed letsencrypt certs."
    echo "$DIFFSTAT"
fi
popd > /dev/null

,

David RoweRTLSDR Strong Signals and Noise Figure

I’ve been exploring the strong and weak signal performance of the RTLSDR. This all came about after Bill and I performed some Over the Air tests using the RTLSDR. We found that if we turned the gain all the way up, lots of birdies and distortion appeared. However if we wind the gain back to improve strong signal performance, the Noise Figure (NF) increases, messing with our modem link budgets.

Fortunately, there’s been a lot of work on the RTLSDR internals from the open source community. So I’ve had a fun couple of weeks drilling down into the RTLSDR drivers and experimenting. It’s been really interesting, and I’ve learnt a lot about the trade offs in SDR. For USD$22, the RTLSDR is a fine teaching/learning tool – and a pretty good radio.

Strong and Weak Signals

Here is a block diagram of the RTLSDR as I understand it:

It’s a superhet, with an IF bandwidth of 2MHz or less. The IF is sampled by an 8-bit ADC that runs at 28 MHz. The down sampling (decimation) from 28MHz to 2MHz provides some “processing gain” which results in a respectable performance. One part I don’t really understand is the tracking BPF, but I gather it’s pretty broad, and has no impact on strong signals a few MHz away.

There are a few ways for strong signals to cause spurious signals to appear:

  1. A -30dBm signal several MHz away will block the LNA/mixer analog stages at the input. For example a pager signal at 148.6MHz while you are listening to 144.5 MHz. This causes a few birdies to gradually appear, and some compression of your wanted signal.
  2. A strong signal a few MHz away can be aliased into your passband, as the IF filter stop band attenuation is not particularly deep.
  3. A -68dBm signal inside the IF bandwidth will overload the ADC, and the whole radio will fall in a heap.

The levels quoted above are for maximum gain (-g 49), and are consistent with [1] and [5]. If you reduce the gain, the overload levels get higher, but so does your noise figure. You can sometimes work around the first two issues, e.g. if the birdies don’t happen to fall right on top of your signal they can be ignored. So the first two effects – while unfortunate – tend to be more benign than ADC overload.

At the weak signal end of the operating range, we are concerned about noise. Here is how I model the various noise contributions:

The idea of a radio is to use the tuner to remove the narrow band unwanted signals, leaving just your wanted signal. However noise tends to be evenly distributed in frequency, so we are stuck with any noise that is located in the bandwidth of our wanted signal. A common technique is to have enough gain before the ADC such that the signal being sampled is large compared to the ADC quantisation noise. That way we can ignore the noise contribution from the ADC.

However with strong signals, we need to back off the gain to prevent overload. Now the ADC noise becomes significant, and the overall NF of the radio increases.

Another way of looking at this – if the gain ahead of the ADC is small, we have a smaller signal hitting the ADC, which will toggle fewer bits of the ADC, resulting in coarse quantisation (more quantisation noise) from the ADC.

The final decimation stage reduces ADC quantisation noise. This figure shows our received signal (a single frequency spike), and the ADC quantisation noise (continuous line, with energy at every frequency):

The noise power is the sum of all the noise in our bandwidth of interest. The decimation filter limits this bandwidth, removing most of the noise except for the small band near our wanted signal (shaded blue area). So the total noise power is summed over a smaller bandwidth, noise power is reduced and our SNR goes up. This means that despite just being 8 bits, the ADC performs reasonably well.
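
As a back-of-envelope illustration (my own, assuming the ADC quantisation noise is spread evenly across the sampled bandwidth), the reduction in in-band quantisation noise from decimating 28MHz down to 2MHz is roughly:

import math

adc_rate = 28e6   # ADC sample rate (Hz)
if_rate = 2e6     # bandwidth after the decimation filter (Hz)

processing_gain_db = 10 * math.log10(adc_rate / if_rate)
print(f"{processing_gain_db:.1f} dB")   # about 11.5 dB less quantisation noise in the wanted bandwidth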

Off Air Strong Signals

Here is what my spectrum analyser sees when connected to my antenna. It’s 10MHz wide, centred on 147MHz. There are some strong pager signals that bounce around the -30dBm level, plus some weaker 2 metre band Ham signals between -80 and -100dBm.

When I tune to 144.5MHz, that pager signal is outside the IF bandwidth, so I don’t get a complete breakdown of the radio. However it does cause some strong signal compression, and some birdies/aliased signals pop up. Here is a screen shot from gqrx when the pager signal is active:

In this plot, the signal right on 144.5 is my wanted signal, a weak signal I injected on the bench. The hump at 144.7 is an artefact of the pager signal, which due to strong signal compression just happens to appear close to (but not on top of) my wanted signal. The wider hump is the IF filter. Here’s a closer look at the IF filter, set to 100kHz using the gqrx “Bandwidth” field:

To see the IF filter shape I connected a terminated LNA to the input of the RTLSDR. This generates wideband noise that is amplified and can be used to visualise the filter shape.

There has been some really cool recent work exploring the IF filtering capabilities of the R820T2 [2]. I tried a few of the newly available IF filter configurations. My experience was the shape factor isn’t great (the filters roll off slowly), so the IF filters don’t have a huge impact on close-in strong signal performance. They do help with attenuating aliased signals.

It was however a great learning and exploring experience, a real deep dive into SDR.

Gqrx is really cool for this sort of work. For example if you tune a strong signal from a signal generator either side of the IF passband, you can see aliases popping up and zooming across the screen. It’s also possible to link gqrx with different RTLSDR driver libraries, to explore their performance. I use the UDP output feature to send samples to my noise figure measuring tool.

Airspy and RTLSDR

The Airspy uses the same tuner chip so I tested it and found about the same strong signal results, i.e. it overloads at the same signal levels as the RTLSDR at max gain. Guess this is no surprise if it’s the same tuner. I once again [5] measured the Airspy noise figure at about 7dB, slightly higher than the 6dB of the RTLSDR. This is consistent with other measurements [1], but quite a way from Airspy’s quoted figure (3.5dB).

This is a good example of the high sample rate/decimation filter architecture in action – the 8-bit RTLSDR delivers a similar noise figure to the 12-bit Airspy.

But the RTLSDR NF probably doesn’t matter

I’ve spent a few weeks peering into driver code, messing with inter-stage gains, hammering the little RTLSDR with strong RF signals, and optimising noise figures. However it might all be for naught! At both my home and Bill’s, external noise appears to dominate. On the 2M band (144.5MHz) we are measuring noise from our (omni) antennas about 20dB above thermal, e.g. -150dBm/Hz compared to the ideal thermal noise level of -174dBm/Hz. Dreaded EMI from all the high speed electronics in urban environments.

EMI can look a lot like strong signal overload – at high gains all these lines pop up, just like ADC overload. If you reduce the gain a bit they drop down into the noise, although it’s a smooth reduction in level, unlike ADC overload which is very abrupt. I guess we are seeing harmonics of switching power supply signals or other nearby digital devices. Rather than an artefact of Rx overload, we are seeing a sensitive receiver detecting weak EMI signals right down near the noise floor.

So this changes the equation – rather than optimising internal NF we need to ensure enough link margin to get over the ambient noise. An external LNA won’t help, and even the loss from a lossy coax run might not matter much, as the SNR is more or less set at the antenna.

I’ve settled on the librtlsdr RTLSDR driver/library for my FSK modem experiments as it supports IF filter and inter-stage gain control. I also have a mash up called rtl_fsk.c where I integrate a FSK modem and some powerful LDPC codes, but that’s another story!

Reading Further

[1] Evaluation of SDR Boards V1.0 – A fantastic report on the performance of several SDRs.
[2] RTLSDR: driver extensions – a very interesting set of conference slides discussing recent work with the R820T2 by Hayati Aygun. Many useful links.
[3] librtlsdr fork of rtlsdr driver library that includes support for the IF filter configuration and individual gain stage control discussed in [2].
[4] My fork of librtlsdr – used for NF measurements and rtl_fsk mash up development.
[5] Some Measurements on E4000 and R802T tuners – Detailed look at the tuner by HB9AJG. Fig 5 & 6 bucket curves show overload levels consistent with [1] and my measurements.
[6] Measuring SDR Noise Figure in Real Time

,

Lev LafayetteMonitoring HPC Systems Against Compromised SSH

Secure Shell is a very well established cryptographic network protocol for accessing and operating network services, and is the typical way to access high-performance computing (HPC) systems, in preference to various unsecured remote shell protocols such as rlogin, telnet, and ftp. As with any security protocol it has undergone several changes to improve the strength of the program, most notably the improvement to SSH-2, which incorporated Diffie-Hellman key exchange. The security advantages of SSH are sufficient that there are strong arguments that computing users should use SSH "everywhere". Such a proposition is no mere fancy; as an adaptable network protocol SSH can be used not just for remote logins and operations, but also for secure mounting of remote file systems, file transfers, port forwarding, network tunnelling, and web-browsing through encrypted proxies.

Despite the justified popularity and engineering excellence of SSH, in May 2020 multiple HPC centres across Europe found themselves subject to cyber-attacks via compromised SSH credentials. This included, among others, major centres such as the University of Edinburgh's ARCHER supercomputer, the High-Performance Computing Center Stuttgart's Hawk supercomputer, the Julich Research Center's JURECA, JUDAC, and JUWELS supercomputers, and the Swiss Center of Scientific Computations (CSCS), with attacks being launched from compromised networks at the University of Krakow, Poland, Shanghai Jiaotong University, PR China, and the China Science and Technology Network, PR China. It is speculated that the attacks were making use of GPU infrastructure for cryptographic coin mining, certainly the most obvious vector for financial gain.

The phrase "compromised SSH credentials" does not imply a weakness in SSH as such, but rather practises around SSH-key use. As explicitly stated by system engineers, some researchers had been been using private SSH keys without passcodes and leaving them in their home directories. These would be used by users to login from one HPC system to another, as it is not unusual for researchers to to have accounts on multiple systems. It is noted that users engaging in such an approach are either unaware or ignored the principles of keeping a private key private, encrypting private keys, or making use of an SSH agent. Access to the keys could be achieved through inappropriate POSIX permissions, or more usual methods of access (e.g., ignoring policies of sharing accounts), with follow-up escalations. Passphraseless SSH keys are common as they are the default when creating a new key with `ssh-keygen` and are are convenient to use, without needing to set up an ssh agent. Passphraseless SSH is also offered by default as part of many cloud offerings, as a relatively secure way to provide a new user with access to their virtual machines.

Based on experiences at the University of Melbourne HPC, it is possible at a system level to script a search using `ssh-keygen` to detect all keys with an empty passphrase, even when they are named differently, with additional complexity required when parsing non-standard directories and configuration files. This is far more elegant than conducting a `grep` for `MII` and similar commonly suggested techniques. A further alternative is a test making direct use of `libssh` headers. This, however, requires a version of `libssh` which incorporates the new SSH key format, which is atypical for HPC systems, which tend to favour stability at the operating system level even if they make use of diverse versions and compilers at the application level. Of course, invoking a different version of `libssh` (e.g., through an environment modules approach) provides an alternative solution which can be incorporated into a small C program (`key_audit.c`) that elegantly tests whether an empty passphrase validates against a given keyfile.
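
As a minimal sketch of the idea (my own, not the `key_audit.c` program mentioned above, and with an assumed key location pattern), `ssh-keygen -y -P "" -f <keyfile>` only succeeds when the private key has an empty passphrase, which can be scripted:

import subprocess
from pathlib import Path

def has_empty_passphrase(keyfile):
    # ssh-keygen prints the public key and exits 0 only if no passphrase is needed
    result = subprocess.run(["ssh-keygen", "-y", "-P", "", "-f", str(keyfile)],
                            capture_output=True, text=True)
    return result.returncode == 0

# The glob pattern is an assumption; as noted above, keys may be named differently
# or live in non-standard directories.
for key in Path("/home").glob("*/.ssh/id_*"):
    if key.suffix != ".pub" and key.is_file() and has_empty_passphrase(key):
        print(f"unencrypted private key: {key}")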

For monitoring, such programs are extremely efficient; a test of more than 3000 user accounts takes less than 1.5 seconds on a contemporary system. Following this, `inotifywait` can be applied so that any new insecure keys are detected immediately instead of waiting for a cron task to run. The system can be further strengthened by using SSH key-only logins rather than allowing password authentication, or by restricting password authentication to VPN logins only with `sshd_config` and two-factor authentication. Prevention of shared private keys is achieved by checking for duplications in the `authorized_keys` file. Further, with `authorized_keys` managed through a repository with version control (e.g., GitHub, GitLab), another layer of protection would exist to prevent multiple users from logging in with the same key. Each key would be a separate file named after its own checksum, used via an `AuthorizedKeysCommand` directive.

Further research in this area would involve developing a university-wide API offering public keys for arbitrary ssh logins for various systems on the campus. Keys can be stored for user accounts, which are used for git clone actions, pull requests etc. Access to systems could also be implemented via a zero trust security framework (e.g., BeyondCorp), which would both protect systems from intruders who are already within a network perimeter and provide secure access to users who are outside it.

Biographies

Lev Lafayette is a senior HPC DevOps Engineer at the University of Melbourne, a role he has held since 2015. Prior to that he worked in a similar role at the Victorian Partnership for Advanced Computing (VPAC) for eight years. He is an experienced HPC educator and has a significant number of relevant publications in this field.

Narendra Chinnam is a Senior HPC DevOps Engineer at the University of Melbourne. Prior to that, he worked as a Systems Software Engineer in the HPC/AI division at Hewlett-Packard Enterprise (HPE) for thirteen years. He has made significant contributions to the HPE HPC cluster management software portfolio and was a member of several top500 cluster deployment projects, including "Eka" - the 4th fastest supercomputer in the world as of Nov-2007.

Timothy Rice is a DevOps/HPC Engineer at the University of Melbourne. In past lives, he was a researcher and tutor in applied mathematics, an operations analyst in the Department of Defence, and a Software Carpentry instructor.

Book chapter proposal

,

Jan SchmidtRift CV1 – multi-threaded tracking

This video shows the completion of work to split the tracking code into 3 threads – video capture, fast analysis and long analysis.

If the projected pose of an object doesn’t line up with the LEDs where we expect it to be, the frame is sent off for more expensive analysis in another thread. That way, it doesn’t block tracking of other objects – the fast analysis thread can continue with the next frame.

As a new blob is detected in a video frame, it is assigned an ID, and tracked between frames using motion flow. When the analysis results are available at some point in the future, the ID lets us find blobs that still exist in that most recent video frame. If the blobs are still unknowns in the new frame, the code labels them with the LED ID it found – and then hopefully in the next frame, the fast analysis is locked onto the object again.
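
As a hypothetical illustration of that split (not the actual OpenHMD code, which is C; the helper functions here are stand-ins), the fast thread only does the cheap pose check and hands anything it can't resolve to a worker thread via a queue:

import queue
import threading

# Stand-ins for the real computer-vision routines (assumptions, not OpenHMD code)
def pose_matches_blobs(frame): return frame.get("good_lock", False)
def update_tracking(frame): pass
def full_model_search(frame): return frame.get("blob_ids", [])
def apply_labels_to_latest_frame(labels): pass

slow_queue = queue.Queue()

def fast_analysis(frames):
    # Cheap per-frame check: if the predicted pose lines up with the observed LEDs,
    # keep tracking; otherwise defer the frame and move on to the next one.
    for frame in frames:
        if pose_matches_blobs(frame):
            update_tracking(frame)
        else:
            slow_queue.put(frame)

def long_analysis():
    # The expensive search runs here without blocking the fast path; results are
    # carried forward to the most recent frame via the blob IDs.
    while True:
        frame = slow_queue.get()
        apply_labels_to_latest_frame(full_model_search(frame))

threading.Thread(target=long_analysis, daemon=True).start()
fast_analysis([{"good_lock": True}, {"good_lock": False, "blob_ids": [3, 7]}])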

There are some obvious next things to work on:

  • It’s hard to decide what constitutes a ‘good’ pose match, especially around partially visible LEDs at the edges. More experimentation and refinement needed
  • The IMU dead-reckoning between frames is bad – the accelerometer biases especially for the controllers tends to make them zoom off very quickly and lose tracking. More filtering, bias extraction and investigation should improve that, and help with staying locked onto fast-moving objects.
  • The code that decides whether an LED is expected to be visible in a given pose can use some improving.
  • Often the orientation of a device is good, but the position is wrong – a matching mode that only searches for translational matches could be good.
  • Taking the gravity vector of the device into account can help reject invalid poses, as could some tests against plausible location based on human movements limits.

Code is at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

,

Tim RileyOpen source status update, September 2020

Well, didn’t September just fly by? Last month I predicted I’d get through the remaining tasks standing in the way of an Hanami 2.0.0.alpha2 release, and while I made some inroads, I didn’t quite get there. At this point I’ve realised that after many consecutive months of really strong productivity on OSS work (which for me right now is done entirely on nights and weekends), a downtick of a couple of months was inevitable.

Anyway, let’s take a look at what I did manage to achieve!

Reintroduced CSRF protection module to hanami-controller

Sometime during the upheaval that was hanami and hanami-controller’s initial rewrite for 2.0.0, we lost the important CSRFProtection module. I’ve brought it back now, this time locating it within hanami-controller instead of hanami, so it can live alongside the action classes that are meant to include it.

For now, you can manually include it in your action classes:

require "hanami/action"
require "hanami/action/csrf_protection"

class MyAction < Hanami::Action
  include Hanami::Action::CSRFProtection
end

And if you need to manually opt out of the protections for any reason, you can implement this method in any one of your action classes:

def verify_csrf_token?(req, res)
  false
end

Either way, I encourage you to check out the code; it’s a simple module and very readable.

Started on automatic enabling of CSRF protection

For a batteries included experience, having to manually include the CSRFProtection module isn’t ideal. So I’m currently working to make it so the module is automatically included when the Hanami application has sessions enabled. This is close to being done already, in this hanami-controller PR and this counterpart hanami PR. I’m also taking this as an opportunity to move all session-related config away from hanami and into hanami-controller, which I think is a more rational location both in terms of end-user understandability and future maintainability.

We’ll see this one fully wrapped up in next month’s update :)

Improving preservation of state in dry/hanami-view context objects

This one was a doozy. It started with my fixing a bug in my site to do with missing page titles, and then realising that it only partially fixed the problem. I wasn’t doing anything particularly strange in my site, just following a pattern of setting page-specific titles in individual templates:

- page_title "Writing"

h1 Writing
  / ... rest of page

And then rendering the title within the layout:

html
  head
    title = page_title

Both of these page_title invocations called a single method on my view context object:

def page_title(new_title = Undefined)
  if new_title == Undefined
    [@page_title, settings.site_title].compact.join(" | ")
  else
    @page_title = new_title
  end
end

Pretty straightforward, right? However, because the context is reinitialized from a base object for each different rendering environment (first the template, and then the layout), that @page_title we set in the template never goes anywhere else, so it’s not available afterwards in the layout.

This baffled me for quite a while, because I’ve written similar content_for-style helpers in context classes and they’ve always worked without a hitch. Well, it turns out I got kinda lucky in those cases, because I was using a hash (instead of a direct instance variable) to hold the provided pieces of content, and since hashes (like most objects in Ruby) are passed by reference, that just so happened to permit the same bits of content to be seen from all view context instances.

Once I made this realisation, I first committed this egregious hack just to get my site properly showing titles again, and then I mulled over a couple of options for properly fixing this inside hanami-view.

One option would be to acknowledge this particular use case and adjust the underlying gem to support it, ensuring that the template context is used to initialize the layout context. This works, and it’s certainly the smallest possible fix, but I think it papers over the fundamental issue here: the creation of multiple context instances is a low-level implementation detail and should not be something the user needs to think about. I think a user should feel free to set an ivar in a context instance and reasonably expect that it’ll be available at all points of the rendering cycle.

So how do we fix this? The obvious way would be to ensure we create only a single context object, and have it work as required for rendering both the template and the layout. The challenge here is that we require a different RenderEnvironment for each of those, so the correct partials can be looked up, whether they’re called from within templates, or within part or scope classes. This is why we took the approach of creating those multiple context objects in the first place, so each one could have an appropriate RenderEnvironment provided.

So how do we keep a single context instance but somehow swap around the underlying environment? Well, as a matter of fact, there’s a gem for that. After discovering this bug, I was inspired and stayed up to midnight spiking on an approach that relies upon dry-effects and a reader effect to provide the differing render_environment to a single context object.

(The other effect I felt was the extreme tiredness the next day, I’m not the spritely youth I used to be!)

Anyway, if you haven’t checked out dry-effects, I encourage you to do so: it may help you to discover some novel approaches to certain design challenges. In this case, all we need to do is include the effect module in our context class:

module Hanami
  class View
    class Context
      # Instance methods can now expect a `render_env` to be available
      include Dry::Effects.Reader(:render_env)
    end
  end
end

And ensure we’re wrapping a handler around any code expected to throw the effect:

module Hanami
  class View
    module StandaloneView
      # This provides `with_render_env`, used below
      include Dry::Effects::Handler.Reader(:render_env)

      def call(format: config.default_format, context: config.default_context, **input)
        # ...

        render_env = self.class.render_env(format: format, context: context)
        template_env = render_env.chdir(config.template)

        # Anything including Dry::Effects.Reader(:render_env) will have access to the
        # provided `template_env` inside this handler block
        output = with_render_env(template_env) {
          render_env.template(config.template, template_env.scope(config.scope, locals))
        }

        # ...
      end
    end
  end
end

With this in place, we have a design that lets us use just a single context object for the entirety of the render lifecycle. For the simplicity it brings to the user, I think this is a very worthwhile change, and I plan to spend time assessing it in detail this coming month. As Nikita (the author of dry-effects) points out, there’s a performance aspect to consider: although we’re saving ourselves some object allocations here, we now have to dispatch to the handler every time we throw the reader effect for the render_env. Still, it feels like a very promising direction.

Filed issues arising from production Hanami 2 applications

Over the month at work, we put the finishing touches on two brand new services built with Hanami 2. This helped us to identify a bunch of rough edges that will need addressing before we’re done with the release. I filed them on our public Trello board:

This goes to show how critical it is for frameworks like Hanami to have real-world testing, even at these very early stages of new release development. I’m glad I can also serve in this role, and grateful for the keenness and patience of our teams in working with cutting edge software!

Fixed accidental memoization of dry-configurable setting values

Last but not least, I fixed this bug in dry-configurable that arose from an earlier change I made to have it evaluate settings immediately if a value was provided.

This was a wonderful little bug to fix, and the perfect encapsulation of why I love programming: we started off with two potentially conflicting use cases, represented as two different test cases (one failing), and had to find a way to satisfy them both while still upholding the integrity of the gem’s overall design. I’m really happy with how this one turned out.

🙌 Thanks to my sponsors!

This month I was honoured to have a new sponsor come on board. Thank you Sven Schwyn for your support! If you’d like to give a boost to my open source work, please consider sponsoring me on GitHub.

See you all next month!

Linux AustraliaSaying Farewell to Planet Linux Australia

Planet Linux Australia (planet.linux.org.au) was started more than 15 years ago by Michael Davies. In the time since (and particularly before the rise of social media), it has provided a valuable service by encouraging the sharing of information and opinions within our Open Source community. However, due to the many diverse communication options now available over the internet, sites such as Planet Linux Australia are no longer used as heavily as they once were. With many other channels now available, the resources required to maintain Planet Linux Australia are becoming difficult to justify.

With this in mind and following the recommendation of Michael Davies, the Linux Australia Council has decided that it is time to close Planet Linux Australia. Linux Australia would like to express its profound appreciation for the work Michael and others have done to initiate and maintain this service. Our community has greatly benefited from this service over the years.

The post Saying Farewell to Planet Linux Australia appeared first on Linux Australia.

,

Jan SchmidtRift CV1 update

This is another in my series of updates on developing positional tracking for the Oculus Rift CV1 in OpenHMD

In the last post I ended with a TODO list. Since then I’ve crossed off a few things from that, and fixed a handful of very important bugs that were messing things up. I took last week off work, which gave me some extra hacking hours and enthusiasm too, and really helped push things forward.

Here’s the updated list:

  • The full model search for re-acquiring lock when we start, or when we lose tracking takes a long time. More work will mean avoiding that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed. Partially Fixed
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting. Partially Fixed
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test. Much Improved
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware information but ignored right now. Fixed
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • hot-plug detection of cameras and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

As you can see, a few of the top-level items have been fixed, or mostly so. I split the computer vision for the tracking into several threads:

  • 1 thread shared between all sensors to capture USB packets and assemble them into frames
  • 1 thread per sensor to analyse the frame and update poses

The goal with that initial split was to prevent the processing of multiple sensors from interfering with each other, but I found that it also has a strong benefit even with a single sensor. I realised something in the last week that I probably should have noted earlier: The Rift sensors capture a video frame every 19.2ms, but that frame then takes a full 17ms to deliver across the USB – that means that when everything was in one thread, even with 1 sensor there was only about 2.2ms for the full analysis to take place, or else we’d miss a packet of the next frame and have to throw it away. With the analysis now happening in a separate thread and a ping-pong double buffer in place, the analysis can take quite a bit longer without losing any video frames.
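
The timing budget works out like this (a trivial sketch of my own, just restating the numbers above):

frame_interval_ms = 19.2   # the Rift sensors capture a frame every 19.2 ms
usb_transfer_ms = 17.0     # each frame takes 17 ms to arrive over USB

single_thread_budget_ms = frame_interval_ms - usb_transfer_ms
print(f"{single_thread_budget_ms:.1f} ms")  # 2.2 ms left for analysis when capture and analysis share a thread
# With analysis on its own thread and a ping-pong double buffer, the budget grows to
# roughly the full frame interval before video frames start being dropped.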

I plan to add a 2nd per-sensor thread that will divide the analysis further. The current thread will do only fast pass validation of any existing tracking lock, and will defer any longer term analysis to the other thread. That means that if we have a good lock on the HMD, but can’t (for example) find one of the controllers, searching for the controller will be deferred and the fast pass thread will move onto the next frame and keep tracking lock on the headset.

I fixed some bugs in the calculations that move between frames of reference – converting to/from the global position and orientation in the world to the position and orientation relative to each camera sensor when predicting what the appearance of the LEDs should be. I also added in the IMU offset and orientation of the LED models from the firmware, to make the predictions more accurate when devices move in the time between camera exposures.

Yaw Correction: when a device is observed by a sensor, the orientation is incorporated into what the IMU is measuring. The IMU can sense gravity and knows which way is up or down, but not which way is forward. The observation from the camera now corrects for that yaw drift, to keep things pointing the way you expect them to.

Some other bits:

  • Fixing numerical overflow issues in the OpenHMD maths routines
  • Capturing the IMU orientation and prediction that most closely corresponds to the moment each camera image is recorded, instead of when the camera image finishes transferring to the PC (which is 17ms later)
  • Improving the annotated debug view, to help understand what’s happening in the tracking computer vision steps
  • A 1st order estimate of device velocity to help improve the next predicted position

I posted a longer form video walkthrough of the current code in action, and discussing some of the remaining shortcomings.

As previously, the code is available at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

Linux AustraliaCouncil Meeting Tuesday 6th October 2020 – Minutes

1. Meeting overview and key information

Present

Sae Ra Germaine

Jonathan Woithe

Julien Goodwin

Russell Stuart

Lisa Sands

Benno Rice

Apologies

Joel Addison

 

Meeting opened at 1931 AEDT by Sae Ra and quorum was achieved.

Minutes taken by Julien.

2. Event Review

Drupal

Admin Team

Pycon

LCA 2020

LCA 2021

LCA 2022

3. Log of correspondence

  • NSW Government Small Business Survey: reminder received via council@ on 23 Sep 2020. Original message received on 21 Sep 2020.
    • We’re not going to complete this survey.
  • Ampache grant progress report 2: received via council@ on 23 Sep 2020. 
    • Not fully final, but we do expect one more.
  • Grant application received from Pauline Clague: Online indigenous women’s possum skin cloak making workshop. Application received on 25 Sep 2020, community consultation closes Fri 9 Oct 2020. To be considered by Council on 20 Oct 2020.

4. Items for discussion

  • AGM
    • AI: Sae Ra will set up time with Julien to plan AGM, possibly next week
    • Annual report has been started
      • Will need photo / bio from people
      • AI: Julien to get minutes posted

5. Items for noting

  • newCardigan AGM (glam tech group), running using our Zoom account
  • Netthing happened
    • We got praised!
  • Software Freedom Day

6. Other business 

  • None

7. In camera

  • One item was discussed in camera.

2042 AEDT close

The post Council Meeting Tuesday 6th October 2020 – Minutes appeared first on Linux Australia.

,

Simon LyallAudiobooks – September 2020

The Demon-Haunted World: Science as a Candle in the Dark by Carl Sagan

Chapters on Pseudoscience vs Science, critical/skeptical thinking, science education and public policy. Hasn’t aged too badly and well written. 4/5

Don’t Fall For It: A Short History of Financial Scams by Ben Carlson

Real-life Stories of financial scams and scammers (some I’ve heard, some new) and then some lessons people can draw from them. Nice quick read. 3/5

The Bomb and America’s Missile Age by Christopher Gainor

A history of the forces that led to the Atlas program from the end of the War to 1954. Covers a wide range of led-up rocket programs, technical advances and political, cold-war and inter-service rivalries. 3/5

Girl on the Block: A True Story of Coming of Age Behind the Counter by Jessica Wragg

A memoir of the author’s life and career from 16 to her mid 20s. Mixture of story, information about meat (including recipes), the butchery trade and meat industry. 3/5

The One Device: The Secret History of the iPhone by Brian Merchant

A history of the iPhone and various technologies that went into it. Plus some tours of components and manufacturing. No cooperation from Apple so some big gaps but does okay. 4/5

Humble Pi: A Comedy of Maths Errors by Matt Parker

Lots of examples of where Maths went wrong. From Financial errors and misbuilt bridges to failed game shows. Mix of well-known and some more obscure stories. Well told. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Leon Brooks

,

Paul WiseFLOSS Activities September 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian wiki: unblock IP addresses, approve accounts

Communication

Sponsors

The gensim, cython-blis, python-preshed, pytest-rerunfailures, morfessor, nmslib, visdom and pyemd work was sponsored by my employer. All other work was done on a volunteer basis.

,

Hamish TaylorWattlebird feeding

While I hope to update this site again soon, here’s a photo I captured over the weekend in my back yard. The red flowering plant is attracting wattlebirds and honey-eaters. This wattlebird stayed still long enough for me to take this shot. After a little bit of editing, I think it has turned out rather well.

Photo taken with: Canon 7D Mark II & Canon 55-250mm lens.

Edited in Lightroom and Photoshop (to remove a sun glare spot off the eye).

Wattlebird feeding

Gary PendergastMore than 280 characters

It’s hard to be nuanced in 280 characters.

The Twitter character limit is a major factor of what can make it so much fun to use: you can read, publish, and interact, in extremely short, digestible chunks. But it doesn’t fit every topic, every time. Sometimes you want to talk about complex topics, having honest, thoughtful discussions. In an environment that encourages hot takes, however, it’s often easier to just avoid having those discussions. I can’t blame people for doing that, either: I find myself taking extended breaks from Twitter, as it can easily become overwhelming.

For me, the exception is Twitter threads.

Twitter threads encourage nuance and creativity.

Creative masterpieces like this Choose Your Own Adventure are not just possible, they rely on Twitter threads being the way they are.

Publishing a short essay about your experiences in your job can bring attention to inequality.

And Tumblr screenshot threads are always fun to read, even when they take a turn for the epic (over 4000 tweets in this thread, and it isn’t slowing down!)

Everyone can think of threads that they’ve loved reading.

My point is, threads are wildly underused on Twitter. I think a big part of that is the UI for writing threads: while it’s suited to writing a thread as a series of related tweet-sized chunks, it doesn’t lend itself to writing, revising, and editing anything more complex.

To help make this easier, I’ve been working on a tool that will help you publish an entire post to Twitter from your WordPress site, as a thread. It takes care of transforming your post into Twitter-friendly content, so you can just… write. 🙂

It doesn’t just handle the tweet embeds from earlier in the thread: it also handles uploading and attaching any images and videos you’ve included in your post.

All sorts of embeds work, too. 😉

It’ll be coming in Jetpack 9.0 (due out October 6), but you can try it now in the latest Jetpack Beta! Check it out and tell me what you think. 🙂

This might not fix all of Twitter’s problems, but I hope it’ll help you enjoy reading and writing on Twitter a little more. 💖

,

Jan SchmidtOculus Rift CV1 progress

In my last post here, I gave some background on how Oculus Rift devices work, and promised to talk more about Rift S internals. I’ll do that another day – today I want to provide an update on implementing positional tracking for the Rift CV1.

I was working on CV1 support quite a lot earlier in the year, and then I took a detour to get basic Rift S support in place. Now that the Rift S works as a 3DOF device, I’ve gone back to plugging away at getting full positional support on the older CV1 headset.

So, what have I been doing on that front? Back in March, I posted this video of a new tracking algorithm I was working on to match LED constellations to object models to get the pose of the headset and controllers:

The core of this matching is a brute-force search that (somewhat cleverly) takes permutations of observed LEDs in the video footage and tests them against permutations of LEDs from the device models. It then uses an implementation of the Lambda Twist P3P algorithm (courtesy of pH5) to compute the possible poses for each combination of LEDs. Next, it projects the points of the candidate pose to count how many other LED blobs will match LEDs in the 3D model and how closely. Finally, it computes a fitness metric and tracks the ‘best’ pose match for each tracked object.
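
As a rough illustration of that search (Python-flavoured pseudocode only; solve_p3p, project, the pixel gate and the blob/model structures are hypothetical stand-ins, not the OpenHMD implementation):

from itertools import permutations

def brute_force_pose_search(blobs, model_leds, camera, solve_p3p, project):
    # blobs: observed 2D LED detections (x, y); model_leds: 3D LED positions on the device.
    # solve_p3p and project stand in for the Lambda Twist P3P solver and the camera
    # projection; the 3.0 pixel gate is an arbitrary value chosen for illustration.
    best_pose, best_score = None, 0
    for obs in permutations(blobs, 3):
        for leds in permutations(model_leds, 3):
            for pose in solve_p3p(obs, leds, camera):  # up to 4 candidate poses per triple
                projected = [project(pose, led, camera) for led in model_leds]
                # fitness: how many observed blobs land close to a projected model LED
                score = sum(
                    1 for (bx, by) in blobs
                    if min(abs(bx - px) + abs(by - py) for (px, py) in projected) < 3.0
                )
                if score > best_score:
                    best_pose, best_score = pose, score
    return best_pose, best_score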

For the video above, I had the algorithm implemented as a GStreamer plugin that ran offline to process a recording of the device movements. I’ve now merged it back into OpenHMD, so it runs against the live camera data. When it runs in OpenHMD, it also has access to the IMU motion stream, which lets it predict motion between frames – which can help with retaining the tracking lock when the devices move around.

This weekend, I used that code to close the loop between the camera and the IMU. It’s a little simple for now, but it’s starting to work. What’s in place at the moment (with a rough sketch of the loop after this list) is:

  • At startup time, the devices track their movement only as 3DOF devices with default orientation and position.
  • When a camera view gets the first “good” tracking lock on the HMD, it calls that the zero position for the headset, and uses it to compute the pose of the camera relative to the playing space.
  • Camera observations of the position and orientation are now fed back into the IMU fusion to update the position and correct for yaw drift on the IMU (vertical orientation is still taken directly from the IMU detection of gravity)
  • Between camera frames, the IMU fusion interpolates the orientation and position.
  • When a new camera frame arrives, the current interpolated pose is transformed back into the camera’s reference frame and used to test if we still have a visual lock on the device’s LEDs, and to label any newly appearing LEDs if they match the tracked pose
  • The device’s pose is refined using all visible LEDs and fed back to the IMU fusion.
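
A rough sketch of the per-frame part of that loop (hypothetical object and method names, not the OpenHMD code):

def process_camera_frame(frame, fusion, tracker, camera):
    # fusion holds the IMU fusion state, tracker the LED/pose matcher; both are stand-ins.
    predicted = fusion.predict_pose(frame.timestamp)            # IMU interpolation between frames
    pose_in_camera = camera.world_to_camera(predicted)          # into the camera's reference frame
    if tracker.verify_lock(frame, pose_in_camera):              # do the LEDs still line up?
        tracker.label_new_leds(frame, pose_in_camera)
        refined = tracker.refine_pose(frame, pose_in_camera)    # refine using all visible LEDs
        fusion.apply_observation(camera.camera_to_world(refined), frame.timestamp)
    else:
        tracker.schedule_full_search(frame)                     # the expensive re-acquire path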

With this simple loop in place, OpenHMD can now track multiple devices, and can do it using multiple cameras – somewhat. The first time the tracking block associated with a camera thinks it has a good lock on the HMD, it uses that to compute the pose of that camera. As long as the lock is genuinely good at that point, and the pose the IMU fusion is tracking is good – then the relative pose between all the cameras is consistent and the tracking is OK. However, it’s easy for that to go wrong and end up with an inconsistency between different camera views that leads to jittery or jumpy tracking…

In the best case, it looks like this:

Which I am pretty happy with 🙂

In that test, I was using a single tracking camera, and had the controller sitting on the desk where the camera couldn’t see it, which is why it was just floating there. Despite the fact that SteamVR draws it with a Vive controller model, the various controller buttons and triggers work, but there’s still something weird going on with the controller tracking.

What next? I have a list of known problems and TODO items to tackle:

  • The full model search for re-acquiring lock when we start, or when we lose tracking, takes a long time. More work is needed to avoid that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed.
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting.
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test.
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware but is ignored right now.
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • Hot-plug detection of cameras and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

A nice side effect of all this work is that it can all feed in later to Rift S support. The Rift S uses inside-out tracking to determine the headset’s position in the world – but the filtering to combine those observations with the IMU data will be largely the same, and once you know where the headset is, finding and tracking the controller LED constellations still looks a lot like the CV1’s system.

If you want to try it out, or take a look at the code – it’s up on Github. I’m working in the rift-correspondence-search branch of my OpenHMD repository at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

,

Simon LyallTalks from KubeCon + CloudNativeCon Europe 2020 – Part 1

Various talks I watched from their YouTube playlist.

Application Autoscaling Made Easy With Kubernetes Event-Driven Autoscaling (KEDA) – Tom Kerkhove

I’ve been using Keda a little bit at work. Good way to scale on random stuff. At work I’m scaling pods against length of AWS SQS Queues and as a cron. Lots of other options. This talk is a 9 minute intro. A bit hard to read the small font on the screen of this talk.

Autoscaling at Scale: How We Manage Capacity @ Zalando – Mikkel Larsen, Zalando SE

  • These guys have their own HPA replacement for scaling: kube-metrics-adapter.
  • Outlines some new scaling features in 1.18 and 1.19.
  • They also have a fork of the Cluster Autoscaler (although some of what it does seems to duplicate Amazon Fleets).
  • Have up to 1000 nodes in some of their clusters. Have to play with address space per node; they also scale their control plane nodes vertically (control plane autoscaler).
  • Use the Vertical Pod Autoscaler, especially for things like Prometheus that vary with the size of the cluster. Have had problems with it scaling down too fast; they have some of their own custom changes in a fork.

Keynote: Observing Kubernetes Without Losing Your Mind – Vicki Cheung

  • Lots of metrics don’t cover what you want, and they get complex and hard to maintain
  • Monitor core user workflows (ie just test a pod launch and stop)
  • Tiny tools
    • 1 watches for events on cluster and logs them -> elastic
    • 2 watches container events -> elastic
    • End up with one timeline for a deploy/job covering everything
    • Empowers users to do their own debugging

Autoscaling and Cost Optimization on Kubernetes: From 0 to 100 – Guy Templeton & Jiaxin Shan

  • Intro to HPA and metric types. Plus some of the newer stuff like multiple metrics
  • Vertical pod autoscaler. Good for single pod deployments. Doesn’t work well with JVM-based workloads.
  • Cluster Autoscaler.
    • A few things like using preStop hooks to give pods time to shut down
    • Pod priorities for scaling.
    • --expendable-pods-priority-cutoff to not expand for low-priority jobs
    • Using the priority expander to try to expand spot instances first and then fall back to more expensive node types
    • Using a mixed instances policy with AWS. Lots of instance types (same CPU/RAM though) to choose from.
    • Look at PodDisruptionBudget
    • Some other CA flags like --scale-down-utilization-threshold to look at.
  • Mention of Keda
  • Best return is probably tuning HPA
  • There is also another similar talk. Note the male speaker talks very slowly, so crank up the speed.

Keynote: Building a Service Mesh From Scratch – The Pinterest Story – Derek Argueta

  • Changed to Envoy as a http proxy for incoming
  • Wrote own extension to make feature complete
  • Also another project migrating to mTLS
    • Huge amount of work for Java.
    • Lots of work to repeat for other languages
    • Looked at getting Envoy to do the work
    • Ingress LB -> Inbound Proxy -> App
  • Used j2 to build the Static config (with checking, tests, validation)
  • Rolled out to put envoy in front of other services with good TLS termination default settings
  • Extra Mesh Use Cases
    • Infrastructure-specific routing
    • SLI Monitoring
    • http cookie monitoring
  • Became a platform that people wanted to use.
  • Solving one problem first and incrementally using other things. Many groups had similar problems. “Just a node in a mesh”.

Improving the Performance of Your Kubernetes Cluster – Priya Wadhwa, Google

  • Tools – Mostly tested locally with Minikube (she is a Minikube maintainer)
  • Minikube pause – Pauses the Kubernetes system processes and leaves the app running; good if the cluster isn’t changing.
  • Looked at some articles from Brendan Gregg
  • Ran USE Method against Minikube
  • eBPF BCC tools against Minikube
  • biosnoop – noticed lots of writes from etcd
  • KVM Flamegraph – Lots of calls from ioctl
  • Theory that etcd writes might be a big contributor
  • How to tune etcd writes (updated the --snapshot-count flag to various numbers but it didn’t seem to help)
  • Noticed CPU spikes every few seconds
  • “pidstat 1 60”. Noticed a kubectl command running often: “kubectl apply addons” was being run regularly
  • Suspected addon manager running often
  • Could increase addon manager polltime but then addons would take a while to show up.
  • But in Minikube it’s not a problem, because minikube knows when new addons are added so it can run the addon manager directly rather than it polling.
  • 32% reduction in overhead from turning off addon polling
  • Also reduced coredns number to one.
  • pprof – go tool
  • kube-apiserver pprof data
  • Spending lots of times dealing with incoming requests
  • Lots of requests from kube-controller-manager and kube-scheduler around leader-election
  • But Minikube is only running one of each. No need to elect a leader!
  • Flag to turn both off: --leader-elect=false
  • 18% reduction from reducing coredns to 1 and turning leader election off.
  • Back to looking at etcd overhead with pprof
  • writeFrameAsync in http calls
  • Theory: could increase --proxy-refresh-interval from 30s up to 120s. A good value was 70s, but unsure what the behavior change was. Asked and it didn’t appear to be a big problem.
  • 4% reduction in overhead

,

David RowePlaying with PAPR

The average power of a FreeDV signal is surprisingly hard to measure as the parallel carriers produce a waveform that has many peaks and troughs as the various carriers come in and out of phase with each other. Peter, VK3RV, has been working on some interesting experiments to measure FreeDV power using calorimeters. His work got me thinking about FreeDV power and in particular ways to improve the Peak to Average Power Ratio (PAPR).

I’ve messed with a simple clipper for FreeDV 700C in the past, but decided to take a more scientific approach and use some simulations to measure the effect of clipping on FreeDV PAPR and BER. As usual, asking a few questions blew up into a several-week-long project. There were the usual bugs and strange, too-good-to-be-true initial results until I started to get results that felt sensible. I’ve tested some of the ideas over the air (blowing up an attenuator along the way), and learnt a lot about PAPR and related subjects like Peak Envelope Power (PEP).

The goal of this work is to explore the effect of a clipper on the average power and ultimately the BER of a received FreeDV signal, given a transmitter with a fixed peak output power.

Clipping to reduce PAPR

In normal operation we adjust our Tx drive so the peaks just trigger the ALC. This sets the average power at Ppeak – PAPR (working in dB), for example Pav = 100W PEP – 10dB = 10W average.

The idea of the clipper is to chop the tops off the FreeDV waveform so the PAPR is decreased. We can then increase the Tx drive, and get a higher average power. For example if PAPR is reduced from 10 to 4dB, we get Pav = 100W PEP – 4dB = 40W. That’s 4x the average power output of the 10dB PAPR case – Woohoo!
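
The dB arithmetic above is easy to sanity check in a few lines (an illustrative snippet only, using the figures quoted in this post):

def average_power_watts(pep_watts, papr_db):
    # average power is the peak envelope power backed off by the PAPR (in dB)
    return pep_watts / (10 ** (papr_db / 10))

print(average_power_watts(100, 10))  # 10.0 W average for the 10dB PAPR case
print(average_power_watts(100, 4))   # ~39.8 W average for the 4dB PAPR case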

In the example below the 16 carrier waveform was clipped and the PAPR reduced from 10.5 to 4.5dB. The filtering applied after the clipper smooths out the transitions (and limits the bandwidth to something reasonable).

However it gets complicated. Clipping actually reduces the average power, as we’ve removed the high energy parts of the waveform. It also distorts the signal. Here is a scatter diagram of the signal before and after clipping:


The effect looks like additive noise. Hmmm, and what happens on multipath channels, does the modem perform the same as for AWGN with clipped signals? Another question – how much clipping should we apply?

So I set about writing a simulation (papr_test.m) and doing some experiments to increase my understanding of clippers, PAPR, and OFDM modem performance using typical FreeDV waveforms. I started out trying a few different compression methods such as different compander curves, but found that clipping plus a bandpass filter gives about the same result. So for simplicity I settled on clipping. Throughout this post many graphs are presented in terms of Eb/No – for the purpose of comparison just consider this the same thing as SNR. If the Eb/No goes up by 1dB, so does the SNR.

Here’s a plot of PAPR versus the number of carriers, showing PAPR getting worse with the number of carriers used:

Random data was used for each symbol. As the number of carriers increases, you start to get phases in carriers cancelling due to random alignment, reducing the big peaks. Behaviour with real world data may be different; if there are instances where the phases of all carriers are aligned there may be larger peaks.

To define the amount of clipping I used an estimate of the PDF and CDF:

The PDF (or histogram) shows how likely a certain level is, and the CDF shows the cumulative PDF. High level samples are quite unlikely. The CDF shows us what proportion of samples are above and below a certain level. This CDF shows us that 80% of the samples have a level of less than 4, so only 20% of the samples are above 4. So a clip level of 0.8 means the clipper hard limits at a level of 4, which would affect the top 20% of the samples. A clip value of 0.6 would mean samples with a level of 2.7 and above are clipped.
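
In numpy terms, deriving a hard limit from the CDF might look something like this (a minimal sketch of the idea only, not the code in papr_test.m):

import numpy as np

def clip_threshold(tx_samples, clip_level):
    # e.g. clip_level = 0.8 returns the amplitude that 80% of samples fall below
    return np.quantile(np.abs(tx_samples), clip_level)

def hard_clip(tx_samples, threshold):
    # limit the magnitude of the (possibly complex) samples, preserving phase
    mag = np.abs(tx_samples)
    return tx_samples * np.minimum(mag, threshold) / np.maximum(mag, 1e-12)

In the simulations this is then followed by a bandpass filter to smooth the transitions and constrain the spectral spread, as described above.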

Effect of clipping on BER

Here are a bunch of curves that show the effect of clipping on an AWGN and a multipath channel (roughly CCIR poor). A 16 carrier signal was used – typical of FreeDV waveforms. The clipping level and resulting PAPR is shown in the legend. I also threw in a Tx diversity curve – sending each symbol twice on double the carriers. This is the approach used on FreeDV 700C and tends to help a lot on multipath channels.

As we clip the signal more and more, the BER performance gets worse (Eb/No x-axis) – but the PAPR is reduced so we can increase the average power, which improves the BER. I’ve tried to show the combined effect on the (peak Eb/No x-axis) curves, which scale each curve according to its PAPR requirements. This shows the peak power required for a given BER. Lower is better.




Take aways:

  1. The 0.8 and 0.6 clip levels work best on the peak Eb/No scale, i.e. when we combine the effect of the hit on BER performance (bad) and the PAPR improvement (good).
  2. There is about 4dB improvement across a range of operating points. This is pretty significant – similar to gains we get from Tx diversity or a good FEC code.
  3. AWGN and Multipath improvements are similar – good. Sometimes you get an algorithm that works well on AWGN but falls in a heap on multipath channels, which are typically much tougher to push bits through.
  4. I also tried 8 carrier waveforms, which produced results about 1dB better, as I guess fewer carriers have a lower PAPR to start with.
  5. Non-linear techniques like clipping spread the energy in frequency.
  6. Filtering to constrain the frequency spread brings the PAPR up again. We can trade off PAPR with bandwidth: lower PAPR, more bandwidth.
  7. Non-linear technqiques will mess with QAM more. So we may hit a wall at high data rates.

Testing on a Real PA

All these simulations are great, but how do they compare with operation on a real HF radio? I designed an experiment to find out.

First, some definitions.

The same FreeDV OFDM signal is represented in different ways as it winds its way through the FreeDV system:

  1. Complex valued samples are used for much of the internal signal processing.
  2. Real valued samples at the interfaces, e.g. for getting samples in and out of a sound card and standard HF radio.
  3. Analog baseband signals, e.g. voltage inside your radio.
  4. Analog RF signals, e.g. at the output of your PA, and input to your receiver terminals.
  5. An electromagnetic wave.

It’s the same signal, as we can convert freely between the representations with no loss of fidelity, but its representation can change the way measures like PAPR work. This caused me some confusion – for example the PAPR of the real signal is about 3dB higher than the complex valued version! I’m still a bit fuzzy on this one, but have satisfied myself that the PAPR of the complex signal is the same as the PAPR of the RF signal – which is what we really care about.
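
For reference, the PAPR of a block of samples is simply the peak instantaneous power over the mean power, expressed in dB. A minimal numpy version might be:

import numpy as np

def papr_db(samples):
    # works for real or complex sampled waveforms
    power = np.abs(samples) ** 2
    return 10 * np.log10(power.max() / power.mean())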

Another definition that I had to (re)study was Peak Envelope Power (PEP) – which is the peak power averaged over one or more carrier cycles. This is the RF equivalent to our “peak” in PAPR. When driven by any baseband input signal, it’s the maximum RF power of the radio, averaged over one or more carrier cycles. Signals such as speech and FreeDV waveforms will have occasional peaks that hit the PEP. A baseband sine wave driving the radio would generate a RF signal that sits at the PEP power continuously.

Here is the experimental setup:

The idea is to play canned files through the radio, and measure the average Tx power. It took me several attempts before my experiment gave sensible results. A key improvement was to make the peak power of each sampled signal the same. This means I don’t have to keep messing with the audio drive levels to ensure I have the same peak power. The samples are 16 bits, so I normalised each file such that the peak was at +/- 10000.
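
That normalisation step is straightforward; a sketch (assuming the samples are already loaded as 16-bit integers) might be:

import numpy as np

def normalise_peak(samples, target_peak=10000):
    # scale so the largest magnitude sample sits at +/- target_peak
    scaled = samples.astype(np.float64) * target_peak / np.max(np.abs(samples))
    return np.round(scaled).astype(np.int16)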

Here is the RF power sampler:

It works pretty well on signals from my FT-817 and IC-7200, and will help prevent any more damage to RF test equipment. I used my RF sampler after my first attempt using an SMA barrel attenuator resulted in its destruction when I accidentally put 5W into it! Suddenly it went from 30dB to 42dB attenuation. Oops.

For all the experiments I am tuned to 7.175 MHz and have the FT-817 on its lowest power level of 0.5W.

For my first experiment I played a 1000 Hz sine wave into the system, and measured the average power. I like to start with simple signals, something known that lets me check all the fiddly RF kit is actually working. After a few hours of messing about – I did indeed see 27dBm (0.5W) on my spec-an. So, for a signal with 0dB PAPR, we measure average power = PEP. Check.

In my next experiment, I measured the effect of ALC on Tx power. With the FT-817 on its lowest power setting (0.5W), I increased the drive until just before the ALC bars came on. Here is the relationship I found between the number of ALC bars and output power:

Bars Tx Power (dBm)
0 26.7
1 26.4
2 26.7
3 27.0

So the ALC really does clamp the power at the peak value.

On to more complex FreeDV signals.

Measuring the average power of OFDM/parallel tone signals proved much harder on the spec-an. The power bounces around over a period of several seconds as the OFDM waveform evolves, which can derail many power measurement techniques. The time constant, or measurement window, is important – we want to capture the total power over a few seconds and average the value.

After several attempts and lots of head scratching I settled on the following spec-an settings:

  1. 10s sweep time so the RBW filter is averaging a lot of time varying power at each point in the sweep.
  2. 100kHz span.
  3. RBW/VBW of 10 kHz so we capture all of the 1kHz wide OFDM signal in the RBW filter peak when averaging.
  4. Power averaging over 5 samples.

The two-tone signal was included to help me debug my spec-an settings, as it has a known (3dB) PAPR.

Here is a table showing the results for several test signals, all of which have the same peak power:

Sample Description PAPR Theory/Sim (dB) PAPR Measured (dB)
sine1000 sine wave at 1000 Hz 0 0
sine_800_1200 two tones at 800 and 1200Hz 3 4
vanilla 700D test frames unclipped 7.1 7
clip0.8 700D test frames clipped at 0.8 3.4 4
ve9qrp 700D with real speech payload data 11 10.5

Click on the file name to listen to a 5 second sample of the signal. The lower PAPR (higher average power) signals sound louder – I guess our ears work on average power too! I kept the drive constant and the PEP/peak just happened to hit 26dBm. It’s not critical, as long as the drive (and hence peak level) is the same across all waveforms tested.

Note the two tone “control” is 1dB off (4dB measured on a known 3dB PAPR signal); I’m not happy about that. This suggests a spec-an setup issue or a limitation of my spec-an (e.g. the way it averages power).

However the other signals line up OK to the simulated values, within about +/- 0.5dB, which suggests I’m on the right track with my simulations.

The modulated 700D test frame signals were generated by the Octave ofdm_tx.m script, which reports the PAPR of the complex signal. The same test frame repeats continuously, which makes BER measurements convenient, but is slightly unrealistic. The PAPR was lower than the ve9qrp signal which has real speech payload data. Perhaps because the more random, real world payload data leads to occasional frames where the phase of the carriers align leading to large peaks.

Another source of discrepancy is the non-flat frequency response of the baseband audio/crystal filter path the signal has to flow through before it emerges as RF.

The zero-span spec-an setting plots power over time, and is very useful for visualising PAPR. The first plot shows the power of our 1000 Hz sine signal (yellow), and the two tone test signal (purple):

You can see how mixing just two signals modulates the power over time, the effect on PAPR, and how the average power is reduced. Next we have the ve9qrp signal (yellow), and our clip 0.8 signal (purple):

It’s clear the clipped signal has a much higher average power. Note the random way the waveform power peaks and dips, as the various carriers come into phase. Note the very few high power peaks in the ve9qrp signal – in this sample we don’t have any that hit +26dBm, as they are fairly rare.

I found eye-balling the zero-span plots gave me similar values to non-zero span results in the table above, a good cross check.

Take aways:

  1. Clipping is indeed improving our measured average power, but there are some discrepancies between the measured PAPR values and those estimated from theory/simulation.
  2. Using an SDR to receive the signal and measure PAPR using my own maths might be easier than fiddling with the spec-an and guessing at its internal algorithms.
  3. PAPR is worse for real world signals (e.g. ve9qrp) than my canned test frames due to relatively rare alignments of the carrier phases. This might only happen once every few seconds, but significantly raises the PAPR, and hurts our average power. These occasional peaks might be triggering the ALC, pushing the average power down every time they occur. As they are rare, these peaks can be clipped with no impact on perceived speech quality. This is why I like the CDF/PDF method of setting thresholds, it lets us discard rare (low probability) outliers that might be hurting our average power.

Conclusions and Further work

The simulations suggest we can improve FreeDV by 4dB using the right clipper/filter combination. Initial tests over a real PA show we can indeed reduce PAPR in line with our simulations.

This project has led me down an interesting rabbit hole that has kept me busy for a few weeks! Just in case I haven’t had enough, some ideas for further work:

  1. Align these clipping levels and filtering to FreeDV 700D (and possibly 2020). There is existing clipper and filter code but the thresholds were set by educated guess several years ago for 700C.
  2. Currently each FreeDV waveform is scaled to have the same average power. This is the signal fed via the sound card to your Tx. Should the levels of each FreeDV waveform be adjusted to be the same peak value instead?
  3. Design an experiment to prove BER performance at a given SNR is improved by 4dB as suggested by these simulations. Currently all we have measured is the average power and PAPR – we haven’t actually verified the expected 4dB increase in performance (suggested by the BER simulations above) which is the real goal.
  4. Try the experiments on a SDR Tx – they tend to get results closer to theory due to no crystal filters/baseband audio filtering.
  5. Try the experiments on a 100WPEP Tx – I have ordered a dummy load to do that relatively safely.
  6. Explore the effect of ALC on FreeDV signals and why we set the signals to “just tickle” the ALC. This is something I don’t really understand, but have just assumed is good practice based on other people’s experiences with parallel tone/OFDM modems and on-air FreeDV use. I can see how ALC would compress the amplitude of the OFDM waveform – which this blog post suggests might be a good thing! Perhaps it does so in an uncontrolled manner – as the curves above show the amount of compression is pretty important. “Just tickling the ALC” guarantees us a linear PA – so we can handle any needed compression/clipping carefully in the DSP.
  7. Explore other ways of reducing PAPR.

To peel away the layers of a complex problem is very satisfying. It always takes me several goes; improvements come as the bugs fall out one by one. Writing these blog posts often makes me sit back and say “huh?”, as I discover things that don’t make sense when I write them up. I guess that’s the review process in action.

Links

Design for an RF Sampler I built; mine has a 46dB loss.

Peak to Average Power Ratio for OFDM – Nice discussion of PAPR for OFDM signals from DSPlog.

,

Glen TurnerConverting MPEG-TS to, well, MPEG

Digital TV uses MPEG Transport Stream, which is a container for video designed for lossy transmission, such as radio. To save CPU cycles, Personal Video Recorders often save the MPEG-TS stream directly to disk. The more usual MPEG is technically MPEG Program Stream, which is designed for lossless transmission, such as storage on a disk.

Since these are container formats, it should be possible to losslessly and quickly re-code from MPEG-TS to MPEG-PS.

# -ss seeks to the start time and -t reads a duration of data; -map 0 -map -0:2 keeps all streams except 0:2, and -c copy avoids re-encoding
ffmpeg -ss "${STARTTIME}" -t "${DURATION}" -i "${FILENAME}" -ignore_unknown -map 0 -map -0:2 -c copy "${FILENAME}.mpeg"


,

Chris NeugebauerTalk Notes: Practicality Beats Purity: The Zen Of Python’s Escape Hatch?

I gave the talk Practicality Beats Purity: The Zen of Python’s Escape Hatch as part of PyConline AU 2020, the very online replacement for PyCon AU this year. In that talk, I included a few interesting links and code samples which you may be interested in:

@apply

def apply(transform):

    def __decorator__(using_this):
        return transform(using_this)

    return __decorator__


numbers = [1, 2, 3, 4, 5]

@apply(lambda f: list(map(f, numbers)))
def squares(i):
  return i * i

print(list(squares))

# prints: [1, 4, 9, 16, 25]

Init.java

public class Init {
  public static void main(String[] args) {
    System.out.println("Hello, World!");
  }
}

@switch and @case

__NOT_A_MATCHER__ = object()
__MATCHER_SORT_KEY__ = 0

def switch(cls):

    inst = cls()
    methods = []

    for attr in dir(inst):
        method = getattr(inst, attr)
        matcher = getattr(method, "__matcher__", __NOT_A_MATCHER__)

        if matcher == __NOT_A_MATCHER__:
            continue

        methods.append(method)

    methods.sort(key = lambda i: i.__matcher_sort_key__)

    for method in methods:
        matches = method.__matcher__()
        if matches:
            return method()

    raise ValueError("no matcher matched")

def case(matcher):

    def __decorator__(f):
        global __MATCHER_SORT_KEY__

        f.__matcher__ = matcher
        f.__matcher_sort_key__ = __MATCHER_SORT_KEY__
        __MATCHER_SORT_KEY__ += 1
        return f

    return __decorator__



if __name__ == "__main__":
    for i in range(100):

        @switch
        class FizzBuzz:

            @case(lambda: i % 15 == 0)
            def fizzbuzz(self):
                return "fizzbuzz"

            @case(lambda: i % 3 == 0)
            def fizz(self):
                return "fizz"

            @case(lambda: i % 5 == 0)
            def buzz(self):
                return "buzz"

            @case(lambda: True)
            def default(self):
                return "-"

        print(f"{i} {FizzBuzz}")

,

Paul WiseFLOSS Activities August 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian: restarted RAM eating service
  • Debian wiki: unblock IP addresses, approve accounts

Sponsors

The cython-blis/preshed/thinc/theano bugs and smart-open/python-importlib-metadata/python-pyfakefs/python-zipp/python-threadpoolctl backports were sponsored by my employer. All other work was done on a volunteer basis.

,

Tim RileyOpen source status update, August 2020

Oh, hello there, has it been another month already? After my bumper month in July, August was a little more subdued (I had to devote more energy towards a work project), but I still managed to get a few nice things done.

Hanami session configuration back in action

In a nice little surprise, I realised that all the building blocks had fallen into place for Hanami’s standard session configuration to begin working again.

So with a couple of lines of config uncommented, Luca’s “soundeck” demo app has working cookie sessions again. Anyone pulling from my Hanami 2 application template will see the same config enabled after this commit, too.

Container auto-registration respects application inflector

Another small config-related change I made was to pass the Hanami 2 application inflector (https://github.com/hanami/hanami/pull/1069) through to the dry-system container handling component auto-registration.

With this in place, if you configure a custom inflection for your app, e.g.

module MyApp
  class Application < Hanami::Application
    config.inflector do |inflections|
      inflections.acronym "NBA"
    end
  end
end

Then it will be respected when your components are auto-registered, so you can use your custom inflections as part of your module namespacing.

With the setup above, if I had a file called lib/my_app/nba_jam/cheat_codes.rb, the container would rightly expect it to define MyApp::NBAJam::CheatCodes.

I’m delighted to see this in place. Having to deal with awkward namespaces (e.g. SomeApi instead of SomeAPI) purely because the framework wasn’t up to the task of handling it has long been an annoyance to me (these details matter!), and I’m really glad that Hanami 2 will make this a piece of cake.

This outcome is also a testament to the design approach we’ve taken for all the underpinning dry-rb gems. By ensuring important elements like an inflector were represented by a dedicated abstraction - and a configurable one at that - it was so easy for Hanami to provide its own inflector and see it used wherever necessary.

Customisable standard application components

Every Hanami 2 application will come with a few standard components, like a logger, inflector, and your settings. These are made available as registrations in your application container, e.g. Hanami.application["logger"], to make them easy to auto-inject into your other application components as required.

While it was my intention for these standard components to be replaceable by your own custom versions, what we learnt this month is that this was practically impossible! There was just no way to register your own replacements early enough for them to be seen during the application boot process.

After spending a morning trying to get this to work, I decided that this situation was in fact pointing to a missing feature in dry-system. So I went ahead and added support for multiple boot file directories in dry-system. Now you can configure an array of directories on this new bootable_dirs setting:

class MyContainer < Dry::System::Container
  config.bootable_dirs = [
    "config/boot/custom_components",
    "config/boot/standard_components"
  ]
end

When the container locates a bootable component, it will work with these bootable_dirs just like you’d expect your shell to work with its $PATH: it will search the directories, in order, and the first found instance of your component will be used.

With this in place, I updated Hanami to configure its own bootable_dirs and use its own directory for defining its standard components. The default directory is secondary to the directory specified for the application’s own bootable components, so this means if you want to replace Hanami’s standard logger, you can just create a config/boot/logger.rb and you’ll be golden!

Started rationalising flash

Last month when I was digging into some session-related details of the framework, I realised that the flash we inherited from Hanami 1 was pretty hard to work with. It didn’t seem to behave in the same way we expect a flash to work, e.g. to automatically preserve added messages and make them available to the next request. The code was also too complex. This is a solved problem, so I looked around and started rationalising the Hanami 2 flash system based on code from Roda’s flash plugin. I haven’t had the chance to finish this yet, but it’ll be first cab off the rank in September.

Plans for September

With a concerted effort, I think I could make September the month I knock off all my remaining tasks for a 2.0.0.alpha2 release. It’s been tantalisingly close for a while, but I think it could really happen!

Time to get stuck into it.

🙌 Thanks to my sponsors!

Lastly, my continued thanks to my little posse of GitHub sponsors for your support, especially Benjamin Klotz.

I’d really love for you to join the gang. If you care about a healthy, diverse future for Ruby application developers, please consider sponsoring my open source work!

,

Simon LyallAudiobooks – August 2020

Truth, Lies, and O-Rings: Inside the Space Shuttle Challenger Disaster by Allan J. McDonald

The author was a senior manager in the booster team who cooperated more fully with the investigation than NASA or his company’s bosses would have preferred. Mostly accounts of meetings, hearings & coverups with plenty of technical details. 3/5

The Ascent of Money: A Financial History of the World by Niall Ferguson

A quick tour though the rise of various financial concepts like insurance, bonds, stock markets, bubbles, etc. Nice quick intro and some well told stories. 4/5

The Other Side of the Coin: The Queen, the Dresser and the Wardrobe by Angela Kelly

An authorized book from the Queen’s dresser. Some interesting stories. Behind-the-scenes on typical days and regular events. Okay even without photos. 3/5

Second Wind: A Sunfish Sailor, an Island, and the Voyage That Brought a Family Together by Nathaniel Philbrick

A writer takes up competitive sailing after a gap of 15 years, training on winter ponds in prep for the Nationals. A nice read. 3/5

Spitfire Pilot by Flight-Lieutenant David M. Crook, DFC

An account of the Author’s experiences as a pilot during the Battle of Britain. Covering air-combat, missions, loss of friends/colleagues and off-duty life. 4/5

Wild City: A Brief History of New York City in 40 Animals by Thomas Hynes

A Chapter on each species. Usually information about incidents they were involved in (see “Tigers”) or the growth, decline, and comeback of their population & habitat. 3/5

Fire in the Sky: Cosmic Collisions, Killer Asteroids, and the Race to Defend Earth by Gordon L. Dillow

A history of the field and some of the characters. Covers space missions, searchers, discovery, movies and the like. Interesting throughout. 4/5

The Long Winter: Little House Series, Book 6 by Laura Ingalls Wilder

The family move into their store building in town for the winter. Blizzard after blizzard sweeps through the town over the next few months and starvation or freezing threatens. 3/5

The Time Traveller’s Almanac Part 1: Experiments edited by Anne and Jeff VanderMeer

First of 4 volumes of short stories. 14 stories, many by well known names (ie Silverberg, Le Guin). A good collection. 3/5

A Long Time Ago in a Cutting Room Far, Far Away: My Fifty Years Editing Hollywood Hits—Star Wars, Carrie, Ferris Bueller’s Day Off, Mission: Impossible, and More by Paul Hirsch

Details of the editing profession & technology. Lots of great stories. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Simon LyallLinkedin putting pressure on users to enable location tracking

I got this email from Linkedin this morning. It is telling me that they are going to change my location from “Auckland, New Zealand” to “Auckland, Auckland, New Zealand“.

Email from Linkedin on 30 August 2020

Since “Auckland, Auckland, New Zealand” sounds stupid to New Zealanders (Auckland is pretty much a big city with a single job market and is not a state or similar), I clicked on the link and opened the application to stick with what I currently have.

Except the problem is that the pulldown doesn’t offer me any other locations.

The only way to change the location is to click “use Current Location” and then allow Linkedin to access my device’s location.

According to the help page:

By default, the location on your profile will be suggested based on the postal code you provided in the past, either when you set up your profile or last edited your location. However, you can manually update the location on your LinkedIn profile to display a different location.

but it appears the manual method is disabled. I am guessing they have a fixed list of locations in my postcode and this can’t be changed.

So it appears that my options are to accept Linkedin’s crappy name for my location (Other NZers have posted problems with their location naming) or to allow Linkedin to spy on my location and it’ll probably still assign the same dumb name.

This basically appears to be a way for Linkedin to push users to enable location tracking, while at the same time forcing their own ideas of how New Zealand locations work on users.

,

Chris SmartHow to create bridges on bonds (with and without VLANs) using NetworkManager

Some production systems you face might make use of bonded network connections that you need to bridge in order to get VMs onto them. That bond may or may not have a native VLAN (in which case you bridge the bond), or it might have VLANs on top (in which case you want to bridge the VLANs), or perhaps you need to do both.

Let’s walk through an example where we have a bond that has a native VLAN, that also has the tagged VLAN 123 on top (and maybe a second VLAN 456), all of which need to be separately bridged. This means we will have the bond (bond0) with a matching bridge (br-bond0), plus a VLAN on the bond (bond0.123) with its matching bridge (br-vlan123). It should look something like this.

+------+   +---------+                           +---------------+
| eth0 |---|         |          +------------+   |  Network one  |
+------+   |         |----------|  br-bond0  |---| (native VLAN) |
           |  bond0  |          +------------+   +---------------+
+------+   |         |                                            
| eth1 |---|         |                                            
+------+   +---------+                           +---------------+
            | |   +---------+   +------------+   |  Network two  |
            | +---| vlan123 |---| br-vlan123 |---| (tagged VLAN) |
            |     +---------+   +------------+   +---------------+
            |                                                     
            |     +---------+   +------------+   +---------------+
            +-----| vlan456 |---| br-vlan456 |---| Network three |
                  +---------+   +------------+   | (tagged VLAN) |
                                                 +---------------+

To make it more complicated, let’s say that the native VLAN on the bond needs a static IP and to operate at an MTU of 1500 while the other uses DHCP and needs MTU of 9000.

OK, so how do we do that?

Start by creating the bridge, then later we create the interface that attaches to that bridge. When creating VLANs, they are created on the bond, but then attached as a slave to the bridge.

Create the bridge for the bond

First, let’s create the bridge for our bond. We’ll export some variables to make scripting easier, including the name, the value for spanning tree protocol (STP) and the MTU. Note that in this example the bridge will have an MTU of 1500 (but the bond itself will be 9000 to support other VLANs at that MTU size).

BRIDGE=br-bond0
BRIDGE_STP=yes
BRIDGE_MTU=1500

OK so let’s create the bridge for the native VLAN on the bond (which doesn’t exist yet).

nmcli con add ifname "${BRIDGE}" type bridge con-name "${BRIDGE}"
nmcli con modify "${BRIDGE}" bridge.stp "${BRIDGE_STP}"
nmcli con modify "${BRIDGE}" 802-3-ethernet.mtu "${BRIDGE_MTU}"

By default this will look for an address with DHCP. If you don’t want that you can either set it manually:

nmcli con modify "${BRIDGE}" ipv4.method static ipv4.address 192.168.0.123/24 ipv6.method ignore

Or disable IP addressing:

nmcli con modify "${BRIDGE}" ipv4.method disabled ipv6.method ignore

Finally, bring up the bridge. Yes, we don’t have anything attached to it yet, but that’s OK.

nmcli con up "${BRIDGE}"

You should be able to see it with nmcli and brctl tools (if available on your distro), although note that there is no device attached to this bridge yet.

nmcli con
brctl show

Next, we create the bond to attach to the bridge.

Create the bond and attach to the bridge

Let’s create the bond. In my example I’m using active-backup (mode 1) but your bond may use balance-rr (round robin, mode 0) or, depending on your switching, perhaps something like link aggregation control protocol (LACP) which is 802.3ad (mode 4).

Let’s say that your bond (we’re going to call bond0) has two interfaces, which are eth0 and eth1 respectively. Note that in this example, although the native interface on this bond wants an MTU of 1500, the VLANs which sit on top of the bond need a higher MTU of 9000. Thus, we set the bridge to 1500 in the previous step, but we need to set the bond and its interfaces to 9000. Let’s export those now to make scripting easier.

BOND=bond0
BOND_SLAVE0=eth0
BOND_SLAVE1=eth1
BOND_MODE=active-backup
BOND_MTU=9000

Now we can go ahead and create the bond, setting the options and the slave devices.

nmcli con add type bond ifname "${BOND}" con-name "${BOND}"
nmcli con modify "${BOND}" bond.options mode="${BOND_MODE}"
nmcli con modify "${BOND}" 802-3-ethernet.mtu "${BOND_MTU}"
nmcli con add type ethernet con-name "${BOND}-slave-${BOND_SLAVE0}" ifname "${BOND_SLAVE0}" master "${BOND}"
nmcli con add type ethernet con-name "${BOND}-slave-${BOND_SLAVE1}" ifname "${BOND_SLAVE1}" master "${BOND}"
nmcli con modify "${BOND}-slave-${BOND_SLAVE0}" 802-3-ethernet.mtu "${BOND_MTU}"
nmcli con modify "${BOND}-slave-${BOND_SLAVE1}" 802-3-ethernet.mtu "${BOND_MTU}"

OK at this point you have a bond specified, great! But now we need to attach it to the bridge, which is what will make the bridge actually work.

nmcli con modify "${BOND}" master "${BRIDGE}" slave-type bridge

Note that before we bring up the bond (or afterwards) we need to disable or delete any existing network connections for the individual interfaces. Check this with nmcli con and delete or disable those connections. Note that this may disconnect you, so make sure you have a console to the machine.

Now, we can bring the bond up which will also activate our interfaces.

nmcli con up "${BOND}"

We can check that the bond came up OK.

cat /proc/net/bonding/bond0

And this bond should also now be on the network, via the bridge which has an IP set.

Now if you look at the bridge you can see there is an interface (bond0) attached to it (your distro might not have brctl).

nmcli con
ls /sys/class/net/br-bond0/brif/
brctl show

Bridging a VLAN on a bond

Now that we have our bond, we can create the bridges for our tagged VLANs (remember that the bridge connected to the bond carries the native VLAN, so it didn’t need a VLAN interface).

Create the bridge for the VLAN on the bond

Create the new bridge, which for our example is going to use VLAN 123 which will use MTU of 9000.

VLAN=123
BOND=bond0
BRIDGE=br-vlan${VLAN}
BRIDGE_STP=yes
BRIDGE_MTU=9000

OK let’s go! (This is the same as the first bridge we created.)

nmcli con add ifname "${BRIDGE}" type bridge con-name "${BRIDGE}"
nmcli con modify "${BRIDGE}" bridge.stp "${BRIDGE_STP}"
nmcli con modify "${BRIDGE}" 802-3-ethernet.mtu "${BRIDGE_MTU}"

Again, this will look for an address with DHCP, so if you don’t want that, then disable it or set an address manually (as per first example). Then you can bring the device up.

nmcli con up "${BRIDGE}"

Create the VLAN on the bond and attach to bridge

OK, now we have the bridge, we create the VLAN on top of bond0 and then attach it to the bridge we just created.

nmcli con add type vlan con-name "${BOND}.${VLAN}" ifname "${BOND}.${VLAN}" dev "${BOND}" id "${VLAN}"
nmcli con modify "${BOND}.${VLAN}" master "${BRIDGE}" slave-type bridge
nmcli con modify "${BOND}.${VLAN}" 802-3-ethernet.mtu "${BRIDGE_MTU}"

If you look at bridges now, you should see the one you just created, attached to a VLAN device (note, your distro might not have brctl).

nmcli con
brctl show

And that’s about it! Now you can attach VMs to those bridges and have them on those networks. Repeat the process for any other VLANs you need on top of the bond.

,

Lev LafayetteProcess Locally, Backup Remotely

Recently, a friend expressed a degree of shock that I could pull old, even very old, items of conversation from emails, Facebook messenger, etc., with apparent ease. "But I wrote that 17 years ago". They were even dismayed when I revealed that this is all just stored as plain-text files, suggesting that perhaps I was like a spy, engaging in some sort of data collection on them by way of mutual conversations.

For my own part, I was equally shocked by their reaction. Another night of fitful sleep, where feelings of self-doubt percolate. Is this yet another example that I have some sort of alien psyche? But of course, this is not the case, as keeping old emails and the like as local text files is completely normal in computer science. All my work and professional colleagues do this.

What is the cause of this disparity between the computer scientist and the ubiquitous computer user? Once I realised that the disparity of expected behaviour was not personal, but professional, there was clarity. Essentially, the convenience of cloud technologies and their promotion of applications through Software as a Service (SaaS) has led to some very poor computational habits among general users that have significant real-world inefficiencies.

Webmail

The earliest example of a SaaS application that is convenient and inefficient is webmail. Early providers such as Hotmail and Yahoo! offered the advantage of one being able to access their email from anywhere on any device with a web-browser, and that convenience far out-weighed the more efficient method of processing email with a client with POP (Post-Office Protocol), because POP would delete the email from the mail server as part of the transfer.

Most people opted for various webmail implementations for convenience. There were some advantages of POP, such as storage being limited only by the local computer's capacity, the speed at which one could process emails being independent of the local Internet connection, and the security of having emails transferred locally rather than kept on a remote server. But these weren't enough to outweigh the convenience, and en masse people adopted cloud solutions, which have now been easily integrated into the corporate world.

During all this time a different email protocol, IMAP (Internet Message Access Protocol), became more common. IMAP provided numerous advantages over POP. Rather than deleting emails from the server, it copied them to the client and kept them on the server (although some still saw this as a security issue). IMAP clients stay connected to the server whilst active, rather than the short retrieve connection used by POP. IMAP allowed for multiple simultaneous connections, message state information, and multiple mail-boxes. Effectively, IMAP provided the local processing performance, but also the device and location convenience of webmail.

The processing performance hit that one suffers with webmail is effectively two-fold. One's ability to use and process webmail is dependent on the speed of the Internet connection to the mail server and the spare capacity of that mail server to perform tasks. For some larger providers (e.g., Gmail) it will only be the former that is really an issue. But even then, the more complex the regular expression in the search, the harder it is, as basic RegEx tools (grep, sed, awk, perl) typically aren't available. With local storage of emails, kept in plain text files, the complexity of the search criteria depends only on the competence of the user and the performance of the local system. That is why pulling up an archaic email with specific regular expression search terms is a relatively trivial task for computer professionals who keep their old emails stored locally, but less so for computer users who store them using a webmail application.
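
As a concrete (entirely hypothetical) example of the kind of local search being described, a few lines of Python can grep a locally stored mbox file; the path and pattern here are made up:

import mailbox
import re

pattern = re.compile(r"cluster.*quota", re.IGNORECASE)

for msg in mailbox.mbox("/home/user/Mail/archive-2003.mbox"):
    body = msg.get_payload(decode=True) or b""  # None for multipart messages
    if pattern.search(body.decode("utf-8", errors="replace")):
        print(msg["Date"], msg["Subject"])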

Messenger Systems

Where would we be without our various messenger systems? We love the instant message convenience; it's like SMS on steroids. Whether it's Facebook Messenger, WhatsApp, Instagram, Zoom, WeChat, Discord, or any range of similar products, there is the very same issue that one is confronted with in webmail, because they too operate as SaaS applications. Your ability to process data will depend on the speed of your Internet connection and the spare computational capacity on the receiving server. Yeah, good luck with that on your phone. Have you ever tried to scroll through a few thousand Facebook Messenger posts? It's absolutely hopeless.

But when something like an extensive Facebook message chat is downloaded as a plain HTML file, searching and scrolling become trivial and blindingly fast in comparison. Google Chrome, for example, has a pretty decent Facebook Message/Chat Downloader. WhatsApp backs up chats automatically to one's mobile device and can be set up to periodically back up to Google Drive, from where the files can be moved to a local device (regular expressions on 'phones are not the greatest, yet), and Slack (with some restrictions) offers a download feature. For Discord, there is an open-source chat exporter. Again, the advantages of local copies become clear: speed of processing, and the capability of conducting complex searches.

Supercomputers and the Big Reveal

What could this discussion of web apps and local processing possibly have to do with the operations of big iron supercomputers? Quite a lot actually, as the following example illustrates the issue at scale. Essentially the issue involved a research group that had a very large dataset that was stored interstate. The normal workflow in such a situation is to transfer the dataset as needed to the supercomputer and conduct the operations there. Finding this inconvenient, they made a request to have mount points on each of the compute nodes so the data could stay interstate, the computation could occur on the supercomputer, and they could ignore the annoying requirement of actually transferring the data. Except physics gets in the way, and physics always wins.

The problem was that the computational task primarily involved the use of the Burrows-Wheeler Aligner (BWA) which aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. In carrying out this activity this program makes very extensive use of disk read-writes. Now, you can imagine the issue of the data being sent interstate, the computation being run on the supercomputer, and then the data being sent back to the remote disk, thousands of times per second. The distance between where the data is and where the processing is carried out is meaningful. It is much more efficient to conduct the computational processing close to where the data is. As Grace Hopper, peace-be-upon-her said: "Mind your nanoseconds!"

I like to hand out c.30cm pieces of string at the start of introductory HPC training workshops, telling researchers that they should hang on to the piece of string to remind themselves to think about where their data is and where their compute is; 30cm is roughly how far light travels in a nanosecond, which was Grace Hopper's point. And that is the basic rule: keep the data that you want processed, whether the task is something as trivial as searching email or as complex as aligning short nucleotide sequences, close to the processor itself, and the closer the better.

What then is the possible advantage of cloud-enabled services? Despite their ubiquity, when it comes to certain performance-related tasks there isn't much, although centralised services do provide many advantages in themselves. Perhaps the best processing use is treating the cloud as "somebody else's computer", that is, a resource where one can offload data so that certain computational tasks can be conducted on the remote system. Rather like how an increasing number of researchers find themselves going to HPC facilities when they discover that their datasets and computational problems are too large or too complex for their personal systems.

There is also one extremely useful advantage of cloud services from the user perspective and that is, of course, off-site backup. Whilst various local NAS systems are very good for serving datasets within, say, a household, especially for the provision of media, they are perhaps not the wisest choice for the ultimate level of backup. Eventually disks will fail, and with disk failure there is quite a cost in data recovery. Large cloud storage providers, such as Google, offer massive levels of redundancy which ensure that data loss is extremely improbable, and with tools such as Rclone, synchronisation and backups of local data to remote services can be easily automated. In a nutshell: process data close to the CPU, back up remotely.

,

David RoweOpen IP over VHF/UHF 2

The goal of this project is to develop a “100 kbit/s IP link” for VHF/UHF using just a Pi and RTLSDR hardware, and open source modem software. Since the first post in this series there’s been quite a bit of progress:

  1. I have created a GitHub repo with build scripts, a project plan and command lines on how to run some basic tests.
  2. I’ve built some integrated applications, e.g. rtl_fsk.c – that combines rtl_sdr, a CSDR decimator, and the Codec 2 fsk_demod in one handy application.
  3. Developed a neat little GUI system so we can see what’s going on. I’ve found real time GUIs invaluable for physical layer experimentation. That’s one thing you don’t get with a chipset.

Spectral Purity

Bill and I have built and tested (on a spec-an) Tx filters for our Pis, that ensure the transmitted signal meets Australian Ham spurious requirements (which are aligned with international ITU requirements). I also checked the phase noise at 1MHz offset and measured -90dBm/Hz, similar to figures I have located online for my FT-817 at a 5W output level (e.g. DF9IC website quotes +37dBm-130dBc=-93dBm/Hz).

While there are no regulations for Ham phase noise in Australia, my Tx does appear to be compliant with ETSI EN 300 220-1 which deals with short range ISM band devices (maximum of -36dBm in a 100kHz bandwidth at 1MHz offset). Over an ideal 10km link, a -90dBm/Hz signal would be attenuated down to -180dBm/Hz, beneath the thermal noise floor of -174dBm/Hz.

I set up an experiment to pulse the Pi Tx signal off and on at 1 second intervals. Listening on a SSB Rx at +/-50kHz and +/-1MHz offsets, about 50m away and in direct line of sight to the roof mounted Pi Tx antenna, I can hear no evidence of interference from phase noise.

I am satisfied my Pi based Tx won’t be interfering with anyone.

However, as Mark VK5QI suggested, I would not recommend amplifying the Tx level to the several watt level. If greater powers are required there are some other Tx options. For example, the FSK transmitters in chipset-based radios work quite well, and some have better phase noise specs.

Over the Air Tests

Bill and I spent a few afternoons attempting to send packets at various bit rates. We measured our path loss at 135dB over a 10km, non-line-of-sight suburban path. Using our FT817s, this path gives a noisy copy on SSB using a few watts.

Before I start any tests I ask myself “what would we expect to see?”. Well, with 12dBm Tx power that’s +12 - 135 = -123dBm into the receiver. Re-arranging for Eb/No, and using Rb=1000 bits/s and an RTL-SDR noise figure of 7dB:

Rx    = Eb/No + 10log10(Rb) + NF - 174
Eb/No = Rx - 10log10(Rb) - NF + 174
      = -123 - 10*log10(1000) - 7 + 174
      = 14 dB
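As a quick sanity check of that arithmetic (my addition, not part of the original working), the same budget in a few lines of Python:

import math

def ebno_db(rx_dbm, rb, nf_db):
    # Rearranging Rx = Eb/No + 10*log10(Rb) + NF - 174 for Eb/No (dB)
    return rx_dbm - 10 * math.log10(rb) - nf_db + 174

print(ebno_db(rx_dbm=-123, rb=1000, nf_db=7))   # 14.0 dB, as above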

Here is a plot of Eb/No versus BER generated by the mfsk.m script:

Looking up our 2FSK Bit Error Rate (BER) for Eb/No of 14dB, we should get better than 1E-4 (it’s off the graph – but actually about 2E-6). So at 1000 bit/s, we expect practically no errors.

I was disappointed with the real world OTA results: I received packets at 1000 bit/s with 8% BER (equivalent to an Eb/No of 5.5dB) which suggests we are losing 8.5dB somewhere. Our receiver seems a bit deaf.
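Those numbers are consistent with the textbook curve for non-coherent 2FSK, BER = 0.5 * exp(-Eb/No / 2) with Eb/No in linear units. That detector model is my assumption rather than anything stated here, but it reproduces both the roughly 2E-6 figure and the 5.5dB estimate, and puts a number on the missing margin:

import math

def ber_2fsk(ebno_db):
    # Non-coherent 2FSK: BER = 0.5 * exp(-Eb/No / 2), with Eb/No linear
    ebno = 10 ** (ebno_db / 10)
    return 0.5 * math.exp(-ebno / 2)

def ebno_from_ber(ber):
    # Inverse of the above
    return 10 * math.log10(-2 * math.log(2 * ber))

print(ber_2fsk(14))                 # ~1.8e-6, "about 2E-6"
print(ebno_from_ber(0.08))          # ~5.6 dB, close to the quoted 5.5dB
print(14 - ebno_from_ber(0.08))     # ~8.4 dB of missing link margin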

Here’s a screenshot of the “dashboard” with a 4FSK signal sent by Bill (we experimented with 4FSK as well as 2FSK):

You can see a bunch of other signals (perhaps local EMI) towering over our little 4FSK signal. The red crosses show the frequency estimates of the demodulator – they should lock onto each of the FSK tones (4 in this case).

One risk with low cost SDRs (in a city) is strong signal interference causing the receiver to block and fall over entirely. When I connected my RTL-SDR to my antenna, I had to back the gain off about 3dB, not too bad. However Bill needed to back his gain off 20dB. So that’s one real-world factor we need to deal with.

Still we did it – sending packets 10km across town, through 135dB worth of trees and buildings with just a 12mW Pi and a RTLSDR! That’s a start, and no one said this would be easy! Time to dig in and find some missing dBs.

Over the Bench Testing

I decided to drill down into the MDS performance and system noise figure. After half a day’s wrangling with command line utilities I had a HackRF rigged up to play FSK signals. The HackRF has the benefit of a built in attenuator, so the output level can be quite low (-68dBm). This makes it much easier to reliably set levels in combination with an external switched attenuator, compared to using the relatively high (+14dBm) output of the Pi which gets into everything and requires lots of shielding. Low levels out of your Tx greatly simplify “over the bench” testing.

After a few days of RF and DSP fiddling I tracked down a problem with the sample rate. I was running the RTL-SDR at a sample rate of 240 kHz, using its internal hardware to handle the sample rate conversion. Sampling at 1.8MHz and reducing the sample rate externally improved the performance by 3dB. I’m guessing this is due to the internal fixed point precision of the RTL-SDR; it may have significant quantisation noise with weak signals.

OK, so now I was getting system noise figures between 7.5 and 9dB when tested “over the bench”. Close, but a few dB above what I would expect. I eventually traced that to a very subtle measurement bug. In the photo you can see a splitter at the output of the switched attenuator. One side feeds to the RTL-SDR, the other to my spec-an. To calibrate the system I play a single freq carrier from the HackRF, as this makes measurement easier on the spec-an using power averaging.

Turns out, the RTL-SDR input port is only terminated when RTL-SDR software is running, i.e. the dongle is active. I often had the software off when measuring levels, so the levels were high by about 1dB, as one port of the splitter was un-terminated!

The following table maps the system noise figure for various gain settings:

g    Rx (dBm)   BER     Eb/No est (dB)   NF (dB)
0    -128.0     0.01    9.0              7.0
49   -128.0     0.015   8.5              7.5
46   -128       0.06    6                10.0
40   -126.7     0.018   8                12
35   -123       0.068   6                15
30   -119       0.048   7                18
When repeating the low level measurements with -g 0 I obtained 8, 6.5, 7.0, 7.7, so there is some spread. The automatic gain (-g 0) seems about 0.5dB ahead of maximum manual gain (-g 49).

These results are consistent with those reported in this fine report, which measured the NF of the SDRs directly. I have also previously measured RTLSDR noise figures at around 7dB, although on an earlier model.

This helps us understand the effect of receiver gain. Lowering it is bad, especially lowering it a lot. However we may need to run at a lower gain setting, especially if the receiver is being overloaded by strong signals. At least this lets us engineer the link, and understand the effect of gain on system performance.

For fun I hooked up a LNA and using 4FSK I managed 2% BER at -136dBm, which works out to a 2.5dB system noise figure. This is a little higher than I would expect, however I could see some evidence of EMI on the dashboard. Such low levels are difficult on the bench without a Faraday cage of some sort.

Engineering the Physical Layer

With this project the physical layer (modem and radio) is wide open for us to explore. With chipset based approaches you get a link or you don’t, and perhaps a little diagnostic information like a received signal strength. Then again they “just work” most of the time so that’s probably OK! I like looking inside the black boxes and pushing up against the laws of physics, rather than the laws of business.

It gets interesting when you can measure the path loss. You have a variety of options to improve your bit error rate or increase your bit rate:

  1. Add transmit power
  2. Reposition your antenna above the terrain, or out of multipath nulls
  3. Lower loss coax run to your antenna, or mount your Pi and antenna together on top of the mast
  4. Use an antenna with a higher gain like Bill’s Yagi
  5. Add a low noise amplifier to reduce your noise figure
  6. Adjust your symbol/bit rate to spread that energy over fewer (or more) bits/s
  7. Use a more efficient modulation scheme, e.g. 4FSK performs 3dB better than 2FSK at the same bit rate
  8. Move to the country where the ambient RF and EMI is hopefully lower

Related Projects

There are some other cool projects in the “200kHz channel/several 100 kbit/s/IP” UHF Ham data space:

  1. New packet radio NPR70 project – very cool GFSK/TDMA system using a chipset modem approach. I’ve been having a nice discussion with the author Guillaume F4HDK around modem sensitivity and FEC. This project is very well documented.
  2. HNAP – a sophisticated OFDM/QAM/TDMA system that is being developed on Pluto SDR hardware as part of a masters thesis.

I’m a noob when it comes to protocols, so have a lot to learn from these projects. However I am pretty focussed on modem/FEC performance which is often sub-optimal (or delegated to a chipset) in the Ham Radio space. There are many dB to be gained from good modem design, which I prefer over adding a PA.

Next Steps

OK, so we have worked through a few bugs and can now get results consistent with the estimated NF of the receiver. It doesn’t explain the entire 8.5dB loss we experienced over the air, but it’s a step in the right direction. The bugs tend to reveal themselves one at a time ……

One possible reason for reduced sensitivity is EMI or ambient RF noise. There are some signs of this in the dashboard plot above. This is more subtle than strong signal overload, but could be increasing the effective noise figure on our link. All our calculations above assume no additional noise being fed into the antenna.

I feel it’s time for another go at Over The Air (OTA) tests. My goal is to get a solid 1000 bit/s link in both directions over our path, and understand any impact on performance such as strong signals or a raised noise floor. We can then proceed to the next steps in the project plan.

Reading Further

GitHub repo for this project with build scripts, a project plan and command lines on how to run some basic tests.
Open IP over VHF/UHF – first post in this series
Bill and I are documenting our OTA tests in this Pull Request
4FSK on 25 Microwatts low bit rate packets with sophisticated FEC at very low power levels. We’ll use the same FEC on this project.
Measuring SDR Noise Figure
Measuring SDR Noise Figure in Real Time
Evaluation of SDR Boards V1.0 – A fantastic report on the performance of several SDRs
NPR70 – New Packet Radio Web Site
HNAP Web site

,

Ben MartinSmall 1/4 inch socket set into a nicer walnut tray

I was recently thinking about how I could make a selection of 1/4 inch drive bits easier to use. It seems I am not alone in the crowd of people who leave the bits in the case they came in; some folks do that for many decades. Apart from being trapped into what "was in the set", this also creates an issue when you have some 1/4 inch parts in a case that includes many more 3/8 inch drive bits. I originally marked the smaller drive parts and thought about leaving them in the blow molded case, as is commonly done.

The CNC fiend in me eventually got the better of me and the below is the result. I cut a prototype in pine first, knowing that the chances of getting it all as I wanted on the first try were not impossible, but not probable either. Version 1 is shown below.

 

The advantage is that now I have the design in Fusion 360 I can cut this design in about an hour. So if I want to add a bunch of deep sockets to the set I can do that for the time cost, mostly, of gluing up a panel, fixturing it, and a little sanding and shellac. Not a trivial endeavour, but the result I think justifies the means.

Below is the board still fixtured in the CNC machine. I think I will make a jig with some sliding toggle clamps so I can fix panels to the jig and then bolt the jig into the CNC instead of directly using hold down clamps.

I have planned to use a bandsaw to cut a profile around the tools and may end up with some handle(s) on the tray. That part is something I have to think more about. Thinking about how I want the tools to be stored and accessed is an interesting side project in itself.

,

Lev LafayetteWillsmere and Cricket

Whether Australia's first notable cricketer, Tom Wills, was at Kew Asylum is apparently subject to debate.

The following Kew Asylum related cricket stories, however, are not. Note the inclusion of one of the greats of early Australian cricket, Hugh Trumble.

William Evans Midwinter

The plaque on his grave commemorates William Evans Midwinter (1851-1890), the only cricketer to play for Australia versus England (8 Tests) and England versus Australia (4 Tests).

William ("Billy") Evans Midwinter (19 June 1851– 3 December 1890) was an English born cricketer who played four Test matches for England, sandwiched in between eight Tests that he played for Australia. Midwinter holds a unique place in cricket history as the only cricketer to have played for Australia and England in Test Matches against each other.

By 1889, Midwinter's wife and two of his children had died, and his businesses were failed or failing. He became "hopelessly insane" and was confined to Bendigo Hospital in 1890. He was then transferred to the Kew Asylum, where he died later that year.

http://monumentaustralia.org.au/display/30687-william-evans-midwinter

KEW ASYLUM CRICKET CLUB.

From: The Australasian (Melbourne, Vic. : 1864 - 1946) Sat 2 Oct 1875
Page 11

A meeting of the members of the staff of the Kew Asylum was held on Saturday afternoon last, when it was resolved to form a cricket club at the establishment. Dr. Robertson was elected president, and Dr. Watkins and Mr. William Davis vice-presidents, Dr. Molloy hon. secretary and treasurer, and Messrs. Trumble, Johnston, Swift, and Flynn as committee.

The club starts with a large number of members, and with such players as Swift, Niall and Flynn, it is likely to prove rather formidable. The club has not had an opportunity of making any matches yet, but would be glad to receive a few challenges for the ensuing season.

KEW ASYLUM CRICKET CLUB.

From: Boyle & Scott's Australian cricketers' guide., no.1882/83, 1882-01-01 p114

The club has had a fairly successful season, although they had tough opponents in Bohemia, Kew, Fitzroy, Brighton, &c. Among the players, H. Trumble, T. Foley, and G. Roberts have shown improved form with the bat, whilst M'Michael, W. Trumble, and Swift are as effective as of old. In bowling, H. Trumble, Arnold, and Swift have been most destructive.

Batting Averages.

Player           Inns.  Not out  Runs  Most in inns.  Most in match  Aver.
J. M'Michael     19     9        528   64*            64*            52.8
J. W. Trumble    6      1        229   105*           105*           45.4
J. S. Swift      15     4        497   79             79             45.2
C. Ross          4      0        114   75             75             28.2
H. Trumble       18     7        234   52*            52*            21.3
G. Roberts       13     1        167   39             39             13.11
T. Foley         17     3        163   44             44             11.9
G. Arnold        14     2        103   31             31             8.7

Bowling Averages.

Player           Balls  Mdns.  Runs  Wkts.  Aver.
J. W. Trumble    268    16     50    13     3.11
J. S. Swift      394    16     148   25     5.23
G. Arnold        650    26     320   43     7.19
T. Foley         258    11     88    8      11
H. Trumble       834    33     349   28     12.13
G. M'Garvin      120    3      51    4      12.3
J. M'Michael     177    5      126   10     12.6

Swift, 2 no-balls; Trumble, 1.

CRICKET. KEW ASYLUM v. N. MELBOURNE.

From: The Reporter (Box Hill, Vic. : 1889 - 1925) Fri 12 Mar 1909 Page 7

The above match, played on the asylum ground on Saturday, was won by the home team by 7 wickets and 127 runs. North Melbourne scored 47 (Howlott 20 not out), while the Asylum lost 3 wickets for 174 (R. Morrison 110 not out, including 17 fourers, A. Walsh 38, R. Walsh 25 not out). Howlett, 1 for 34, and Buncle, 1 for 33, took the wickets for North Melbourne, and for Kew, Kenny 4 for 19, Crouch 4 for 21.

,

Tim RileyOpen source status update, July 2020

July was a great month for my work on Hanami!

After feeling like I stalled a little in June, this time around I was able to get to the very end of my initial plans for application/action/view integration, as well as improve clarity around what comes next for our 2.0 efforts overall.

Getting closer on extensible component configuration

Whenever I’ve worked on integrating the configuration of the standalone Hanami components (like hanami-controller or hanami-view) into the application core, I’ve asked myself, “if the app author chose not to use this component, would the application-level configuration still make sense?” I wanted to avoid baking in too many assumptions about hanami-controller or hanami-view particulars into the config that you can access on Hanami.application.config.

In the long term, I hope we can build a clean extensions API so that component gems can cleanly register themselves with the framework and expose their configuration that way. In the meantime, however, we need to take a practical, balanced approach, to make it easy for hanami-controller and hanami-view to do their job while still honouring that longer-term goal in spirit.

I’m happy to report that I think I’ve found a pretty good arrangement for all of this! You can see it in action with how we load the application.config.actions configuration:

module Hanami
  class Configuration
    attr_reader :actions

    def initialize(env:)
      # ...

      @actions = begin
        require_path = "hanami/action/application_configuration"
        require require_path
        Hanami::Action::ApplicationConfiguration.new
      rescue LoadError => e
        raise e unless e.path == require_path
        Object.new
      end
    end
  end
end

With this approach, if the hanami-controller gem is available, then we’ll make its own ApplicationConfiguration available as application.config.actions. This means the hanami gem itself doesn’t need to know anything else about how action configuration should be handled at the application level. This kind of detail makes much more sense to live in the hanami-controller gem, where those settings will actually be used.

Let’s take a look at that:

module Hanami
  class Action
    class ApplicationConfiguration
      include Dry::Configurable

      # Define settings that are _specific_ to application integration
      setting :name_inference_base, "actions"
      setting :view_context_identifier, "view.context"
      setting :view_name_inferrer, ViewNameInferrer
      setting :view_name_inference_base, "views"

      # Then clone all the standard settings from Action::Configuration
      Configuration._settings.each do |action_setting|
        _settings << action_setting.dup
      end

      def initialize(*)
        super

        # Apply defaults to standard settings for use within an app
        config.default_request_format = :html
        config.default_response_format = :html
      end

      # ...
    end
  end
end

This configuration class:

  • (a) Defines settings specifically for the Hanami::Action behaviour activated only when used within a full Hanami app
  • (b) Clones the standard settings from Hanami::Action::Configuration (which are there for standalone use) and makes them available
  • (c) Then tweaks some of the default values of those standard settings, to make them fit better with the full application experience

This feels like an ideal arrangement. It keeps the ApplicationConfiguration close to the code in ApplicationAction, which uses those new settings. It means that all the application integration code can live together and evolve in sync.

Further, because Hanami::Action::ApplicationConfiguration exposes a superset of the base Hanami::Action::Configuration settings, we can make it so any ApplicationAction (i.e. any action defined within an Hanami app) automatically configures every aspect of itself based on whatever settings are available on the application!

So for the application author, the result of all this groundwork should be a blessedly unsurprising experience: if they’re using hanami-controller, then they can go and tweak whatever settings they want right there on Hanami.application.config.actions, both the basic action settings as well as the integration-specific settings (though most of the time, I hope the defaults should be fine!).

When we do eventually implement an extensions API, we can at that point just remove this small piece of special-case code from Hanami::Application::Configuration and replace it with hanami-controller registering itself and making its settings available.

If you’re interested in following these changes in more detail, check out hanami/hanami#1068 for the change from the framework side, and then hanami/controller#321 for the ApplicationConfiguration and hanami/controller#320 for the self-configuring application actions. (I also took an initial pass at this in hanami/hanami#1065, but that was surpassed by all the changes linked previously - I took small steps, and learnt along the way!)

I also made matching changes to view configuration. All the same ideas apply: if you have hanami-view loaded, you’ll find an Hanami.application.config.views with all the view settings you need, and then application views will self-configure themselves based on those values! Check out hanami/hanami#1066 and hanami/view#176 for the implementation.

Fixed handle_exception inside actions

One of the settings on Hanami::Action classes is its array of config.handled_exceptions, which you can also supply one-by-one through the config.handle_exception convenience method.

It turns out another handle_exception still existed as a class method, clearly an overhang of the previous action behaviour. I took care of removing that, so now there should be no confusion whenever action authors configure this behaviour (especially since the old class-level method didn’t work with inheritance).

Automatically infer paired views for actions

Believe it or not, the work so far only took me about half-way through the month! This left enough time to roll through all my remaining “minimum viable action/view integration” tasks!

First up was inferring paired views for actions. The idea here is that if you’re building an Hanami 2 app and following the sensible convention of matching your view and action names, then the framework can take care of auto-injecting an action’s view for you.

So if you had an action class like this:

class Main
  module Actions
    module Articles
      class Index < Main::Action
        include Deps[view: "views.articles.index"]

        def handle(request, response)
          response.render view
        end
      end
    end
  end
end

Now, you can drop that include Deps[view: "…"] line. A matching view will now automatically be available as the view for the action!

This works even for RESTful-style actions too. For example, an Actions::Articles::Create action would have an instance of Views::Articles::New injected, since that’s the view you’d want to re-render in the case of presenting a form with errors.

If you need it, you can also configure your own custom view inference by providing your own Hanami.application.config.actions.view_name_inferrer object.

To learn more about the implementation, check out the PR and then this follow-up fix (in which I learnt I should always write integration tests that exercise at least two levels of inheritance).

Automatically render an action’s view

With the paired view inference above, our action class is now looking like this:

class Main
  module Actions
    module Articles
      class Index < Main::Action
        def handle(request, response)
          response.render view
        end
      end
    end
  end
end

But we can do better. For simple actions like this, we shouldn’t have to write that “please render your own view” boilerplate.

So how about just this?

class Main
  module Actions
    module Articles
      class Index < Main::Action
      end
    end
  end
end

Now, any call to this action will automatically render its paired view, passing through all request params for the view to handle as required.

And the beauty of this change was that, after all the groundwork laid so far, it was only a single line of code!

As Kent Beck has said, “for each desired change, make the change easy (warning: this may be hard), then make the easy change.” The easy change indeed. Moments like these are why I love being a programmer :)

Integrated, application-aware view context, and some helpers

Let’s keep going! This month I also gave the “automatic application integration� treatment to Hanami::View::Context. Now when you inherit from this within your application, it’ll be all set up to accept the request/response details that Hanami::Action is already passing through whenever you render a view from within an action.

With these in place, we’re now providing useful methods like session and flash for use within your view-related classes and templates. If you want to add additional behaviour, you can now access request and response on your application’s view context class, too.

While I was doing this, I also took the opportunity to hash out some initial steps towards a standard library of view context helpers with an Hanami::View::ContextHelpers::ContentHelpers module. If you mix this into your app’s view context class, you’ll also have a convenient content_for method that works like you’d expect. Longer term, I’ll look to move this into the hanami-helpers gem and update the existing helpers to work with the new views, including providing a nice way to opt in to whatever specific helpers you want your application to expose.

In the meantime, check out all this fresh view context goodness here.

Hanami 2.0 application template is up to date

After all of this, I took a moment to update my Hanami 2 application template. If you create an app from this template today, all the features I’ve described above will be in place and ready for you to try. I also enabled rack session middleware in the app, because this is a requirement for the flash and session objects as well as CSRF protection.

Hanami 2.0 Trello board

Last but not least, as I was finally seeing some clear air ahead, I took a chance to bring our Hanami 2.0 Trello board up to date!

As it currently stands, I have just 7-8 items left before I think we’ll be ready for the long-awaited Hanami 2.0.0.alpha2 release.

Beyond that, I hope the board will help everyone coordinate the remainder of our work in preparing 2.0.0. At the very least, I’m already feeling much better knowing we’re a little more organized, with a single, up-to-date place where it’s easy to see what’s next as well as add new items whenever we think of them (I’ve no doubt that plenty more little things will crop up).

Plans for August

So that was July. What. A. Month.

I tell you, I was exceedingly happy to have finally completed my “get views and actions properly working together for the first time” list, which turns out to have taken the better part of five months.

For August, I plan to knock out as many of my remaining 2.0.0.alpha2 tasks as I can. Some of them are pretty minor, but one or two are looming a little larger. We’ll see how many I can get through. One thing I’m accepting more and more is that when open sourcing across nights and weekends, patience is a virtue.

Thanks for sticking with me through this journey so far!

🙌 Thanks to my sponsors… could you be the next?

I’ve been working really hard on preparing a truly powerful, flexible Ruby application framework. I’m in this for the long haul, but it’s not easy.

If you’d like to help all of this come to fruition, I’d love for you to sponsor my open source work.

Thanks especially to Benjamin Klotz for your continued support.

,

Paul WiseFLOSS Activities July 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian wiki: unblock IP addresses, approve accounts, reset email addresses

Communication

Sponsors

The purple-discord, ifenslave and psqlodbc work was sponsored by my employer. All other work was done on a volunteer basis.

,

Chris SmartHow to create Linux bridges and Open vSwitch bridges with NetworkManager

My virtual infrastructure Ansible role supports connecting VMs to both Linux and Open vSwitch bridges, but they must already exist on the KVM host.

Here is how to convert an existing Ethernet device into a bridge. Be careful if doing this on a remote machine with only one connection! Make sure you have some other way to log in (e.g. console), or maybe add additional interfaces instead.

Export interfaces and existing connections

First, export the device you want to convert so we can easily reference it later (e.g. eth1).

export NET_DEV="eth1"

Now list the current NetworkManager connections for your device exported above, so we know what to disable later.

sudo nmcli con |egrep -w "${NET_DEV}"

This might be something like System eth1 or Wired connection 1, let’s export it too for later reference.

export NM_NAME="Wired connection 1"

Create a Linux bridge

Here is an example of creating a persistent Linux bridge with NetworkManager. It will take a device such as eth1 (substitute as appropriate) and convert it into a bridge. Note that we will be specifically giving it the device name of br0 as that’s the standard convention and what things like libvirt will look for.

Make sure you have exported your device as NET_DEV and its existing NetworkManager connection name as NM_NAME from above, you will use them below.

sudo nmcli con add ifname br0 type bridge con-name br0
sudo nmcli con add type bridge-slave ifname "${NET_DEV}" master br0 con-name br0-slave-"${NET_DEV}"

Note that br0 will probably have a different MAC address to your physical interface. If so, make sure you update any DHCP reservations (or be able to find the new IP once the bridge is brought up).

sudo ip link show dev br0
sudo ip link show dev "${NET_DEV}"

Configure the bridge

As mentioned above, by default the Linux bridge will get an address via DHCP. If you don’t want it to be on the network (you might have another dedicated interface) then disable DHCP on it.

sudo nmcli con modify br0 ipv4.method disabled ipv6.method disabled

Or, if you need to set a static IP you can do that too.

sudo nmcli con modify br0 ipv4.method static ipv4.address 192.168.123.100/24

If you need to set a specific MTU like 9000 (defaults to 1500), you can do that.

sudo nmcli con modify br0-slave-"${NET_DEV}" 802-3-ethernet.mtu 9000

Finally, spanning tree protocol is on by default, so disable it if you need to.

sudo nmcli con modify br0 bridge.stp no

Bring up the bridge

Now you can either simply reboot, or stop the current interface and bring up the bridge (do it in one command in case you’re using the one interface, else you’ll get disconnected). Note that your IP might change once bridge comes up, if you didn’t check the MAC address and update any static DHCP leases.

sudo nmcli con down "${NM_NAME}" ; \
sudo nmcli con up br0

Create an Open vSwitch (OVS) bridge

OVS bridges are often used for plumbing into libvirt for use with VLANs.

We can create an OVS bridge which will consist of the bridge itself and multiple ports and interfaces which connect everything together, including the physical device itself (so we can talk on the network) and virtual ports for VLANs and VMs. By default the physical port on the bridge will use the untagged (native) VLAN, but if all your traffic needs to be tagged then we can add a tagged interface.

Here is an example of creating a persistent OVS bridge with NetworkManager. It will take a device such as eth1 (substitute as appropriate) and convert it into an ovs-bridge.

Install dependencies

You will need openvswitch installed as well as the OVS NetworkManager plugin.

sudo dnf install -y NetworkManager-ovs openvswitch
sudo systemctl enable --now openvswitch
sudo systemctl restart NetworkManager

Create the bridge

Let’s create the bridge, its port and interface with these three commands.

sudo nmcli con add type ovs-bridge conn.interface ovs-bridge con-name ovs-bridge
sudo nmcli con add type ovs-port conn.interface port-ovs-bridge master ovs-bridge con-name ovs-bridge-port
sudo nmcli con add type ovs-interface slave-type ovs-port conn.interface ovs-bridge master ovs-bridge-port con-name ovs-bridge-int

Patch in our physical interface

Next, create another port on the bridge and patch in our physical device as an Ethernet interface so that real traffic can flow across the network. Make sure you have exported your device as NET_DEV and its existing NetworkManager connection name as NM_NAME from above, you will use them below.

sudo nmcli con add type ovs-port conn.interface ovs-port-eth master ovs-bridge con-name ovs-port-eth
sudo nmcli con add type ethernet conn.interface "${NET_DEV}" master ovs-port-eth con-name ovs-port-eth-int

OK now you should have an OVS bridge configured and patched to your local network via your Ethernet device, but not yet active.

Configure the bridge

By default the OVS bridge will be sending untagged traffic and requesting an IP address for ovs-bridge via DHCP. If you don’t want it to be on the network (you might have another dedicated interface) then disable DHCP on the interface.

sudo nmcli con modify ovs-bridge-int ipv4.method disabled ipv6.method disabled

Or if you need to set a static IP you can do that too.

sudo nmcli con modify ovs-bridge-int ipv4.method static ipv4.address 192.168.123.100/24

If you need to set a specific MTU like 9000 (defaults to 1500), you can do that.

sudo nmcli con modify ovs-bridge-int 802-3-ethernet.mtu 9000
sudo nmcli con modify ovs-port-eth-int 802-3-ethernet.mtu 9000

Bring up the bridge

Before you bring up the bridge, note that ovs-bridge will probably have a MAC address which is different to your physical interface. Keep that in mind if you manage DHCP static leases, and make sure you can find the new IP so that you can log back in once the bridge is brought up.

Now you can either simply reboot, or stop the current interface and bring up the bridge and its interfaces (in theory we just need to bring up ovs-port-eth-int, but let’s make sure and do it in one command in case you’re using the one interface, else you’ll get disconnected and not be able to log back in). Note that your MAC address may change here, so if you’re using DHCP you’ll get a new IP and your session will freeze; be sure you can find the new IP so you can log back in.

sudo nmcli con down "${NM_NAME}" ; \
sudo nmcli con up ovs-port-eth-int ; \
sudo nmcli con up ovs-bridge-int

Now you have a working Open vSwitch implementation!

Create OVS VLAN ports

From there you might want to create some port groups for specific VLANs. For example, if your network does not have a native VLAN, you will need to create a VLAN interface on the OVS bridge to get onto the network.

Let’s create a new port and interface for VLAN 123 which will use DHCP by default to get an address and bring it up.

sudo nmcli con add type ovs-port conn.interface vlan123 master ovs-bridge ovs-port.tag 123 con-name ovs-port-vlan123
sudo nmcli con add type ovs-interface slave-type ovs-port conn.interface vlan123 master ovs-port-vlan123 con-name ovs-int-vlan123
sudo nmcli con up ovs-int-vlan123

If you need to set a static address on the VLAN interface instead, you can do so by modifying the interface.

sudo nmcli con modify ovs-int-vlan123 ipv4.method static ipv4.address 192.168.123.100/24

View the OVS configuration

Show the switch config and bridge with OVS tools.

sudo ovs-vsctl show

Clean up old interface profile

It’s not really necessary, but you can disable the current NetworkManager config for the device so that it doesn’t conflict with the bridge, if you want to.

sudo nmcli con modify "${NM_NAME}" ipv4.method disabled ipv6.method disabled

Or you can even delete the old interface’s NetworkManager configuration if you want to (but it’s not necessary).

sudo nmcli con delete "${NM_NAME}"

That’s it!

,

Adrian ChaddRFI from crappy electronics, or "how's this sold in the US?"

I picked up a cheap charging cable for my Baofeng UV-9S. (https://www.amazon.com/gp/product/B07TSDSQ4Z/). It .. well, it works.

But it messes up operating my radios! I heard super strong interference on my HF receiver and my VHF receivers.

So, let's take a look. I set up a little antenna in my shack. The Baofeng was about 6ft away.



Here's DC to 120MHz. Those peaks to the right? Broadcast FM. The marker is at 28.5MHz.

Ok, let's plug in the baofeng and charger.

Ok, look at that noise. Ugh. That's unfun.

What about VHF? Let's look at that. 100-300MHz.

Ok, that's expected too. I think that's digital TV or something in there. Ok, now, let's plug in the charger, without it charging..


Whaaaaaaaaattttttt oh wait. Yeah, this is likely an unshielded buck converter and it's unloaded. Ok, let's load it up.


Whaaaaaa oh ok. Well that explains everything.

Let's pull it open:



Yup. A buck converter going from 5v to 9v; no shielding, no shielded power cable and no ground plane on the PCB. This is just amazing. The 3ft charge cable is basically an antenna. "Unintentional radiator" indeed.

So - even with a ferrite on the cable, it isn't quiet.


It's quiet at 28MHz now so I can operate on the 10m band with it charging, but this doesn't help at all at VHF.

Ew.

,

Michael StillShaken Fist 0.2.0

The other day we released Shaken Fist version 0.2, and I never got around to announcing it here. In fact, we’ve done a minor release since then and have another minor release in the wings ready to go out in the next day or so.

So what’s changed in Shaken Fist between version 0.1 and 0.2? Well, actually kind of a lot…

  • We moved from MySQL to etcd for storage of persistent state. This was partially done because we wanted distributed locking, but it was also because MySQL was a pain to work with.
  • We rearranged our repositories — the main repository is now in its own GitHub organisation, and the golang REST client, terraform provider, and deployment tooling have moved into their own repositories in that organisation. There is also a prototype javascript client now as well.
  • Some work has gone into making the API service more production grade, although there is still some work to be done there, probably in the 0.3 release — specifically, there is a timeout if a response takes more than 300 seconds, which can be the case when launching large VMs whose disk images are not in cache.

There were also some important features added:

  • Authentication of API requests.
  • Resource ownership.
  • Namespaces (a bit like Kubernetes namespaces or OpenStack projects).
  • Resource tagging, called metadata.
  • Support for local mirroring of common disk images.
  • …and a large number of bug fixes.

Shaken Fist is also now packaged on pypi, and the deployment tooling knows how to install from packages as well as source if that’s a thing you’re interested in. You can read more at shakenfist.com, but that site is a bit of a work in progress at the moment. The new github organisation is at github.com/shakenfist.

Michael StillThe KSM and I

I spent much of yesterday playing with KSM (Kernel Shared Memory, or Kernel Samepage Merging depending on which universe you come from). Unix kernels store memory in “pages” which are moved in and out of memory as a single block. On most Linux architectures pages are 4,096 bytes long.

KSM is a Linux Kernel feature which scans memory looking for identical pages, and then de-duplicating them. So instead of having two pages, we just have one and have two processes point at that same page. This has obvious advantages if you’re storing lots of repeating data. Why would you be doing such a thing? Well the traditional answer is virtual machines.

Take my employer’s systems for example. We manage virtual learning environments for students, where every student gets a set of virtual machines to do their learning thing on. So, if we have 50 students in a class, we have 50 sets of the same virtual machine. That’s a lot of duplicated memory. The promise of KSM is that instead of storing the same thing 50 times, we can store it once and therefore fit more virtual machines onto a single physical machine.

For my experiments I used libvirt / KVM on Ubuntu 18.04. To ensure KSM was turned on, I needed to:

  • Ensure KSM is turned on. /sys/kernel/mm/ksm/run should contain a “1” if it is enabled. If it is not, just write “1” to that file to enable it.
  • Ensure libvirt is enabling KSM. The KSM value in /etc/defaults/qemu-kvm should be set to “AUTO”.
  • Check KSM metrics:
# grep . /sys/kernel/mm/ksm/*
/sys/kernel/mm/ksm/full_scans:891
/sys/kernel/mm/ksm/max_page_sharing:256
/sys/kernel/mm/ksm/merge_across_nodes:1
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:1
/sys/kernel/mm/ksm/sleep_millisecs:200
/sys/kernel/mm/ksm/stable_node_chains:49
/sys/kernel/mm/ksm/stable_node_chains_prune_millisecs:2000
/sys/kernel/mm/ksm/stable_node_dups:1055
/sys/kernel/mm/ksm/use_zero_pages:0

My lab machines are currently set up with Shaken Fist, so I just quickly launched a few hundred identical VMs. This first graph is that experiment. It’s a little hard to see here, but on three machines I consumed about 40GB of RAM with identical VMs and then waited. After three or so hours I had saved about 2,500 pages of memory.

To be honest, that’s a pretty disappointing result. 2,500 4KB pages is only about 10MB of RAM, which isn’t very much at all. Also, three hours is a really long time for our workload, where students often fire up their labs for a couple of hours at a time before shutting them down again. If this was as good as KSM gets, it wasn’t for us.

After some pondering, I realised that KSM is configured by default to not work very well. The default value for pages_to_scan is 100, which means each scan run only inspects about half a megabyte of RAM. It would take a very very long time to scan a modern machine that way. So I tried setting pages_to_scan to 1,000,000,000 instead. One billion is an unreasonably large number for the real world, but hey. You update this number by writing a new value to /sys/kernel/mm/ksm/pages_to_scan.

This time we get a much better result — I launched as many VMs as would fit on each machine, and then sat back and waited (well, went to bed actually). Again the graph is a bit hard to read, but what it is saying is that after 90 minutes KSM had saved me over 300GB of RAM across the three machines. It’s still a little too slow for our workload, but for workloads where the VMs are relatively static that’s a real saving.

Now it should be noted that setting pages_to_scan to 1,000,000,000 comes at a cost — each of these machines now has one of its 48 cores dedicated to scanning memory and deduplicating. For my workload that’s something I am ok with because my workload is not CPU bound, but it might not work for you.
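As a rough sketch (my addition, not from the original post), here is how you might read the same /sys/kernel/mm/ksm counters and turn them into a saving estimate and an approximate scan rate. It assumes 4,096-byte pages and uses pages_sharing multiplied by the page size as the saving, which is the usual approximation; the scan rate estimate ignores the time the scan itself takes.

from pathlib import Path

KSM = Path("/sys/kernel/mm/ksm")
PAGE = 4096                     # page size assumed, as described above

def metric(name):
    return int((KSM / name).read_text())

# pages_sharing counts how many extra sites share a stable page,
# so pages_sharing * PAGE approximates the memory saved.
saved = metric("pages_sharing") * PAGE

# ksmd scans pages_to_scan pages, then sleeps for sleep_millisecs.
scan_rate = metric("pages_to_scan") * PAGE * 1000 / metric("sleep_millisecs")

print(f"memory saved: {saved / 2**20:.1f} MiB")
print(f"scan rate:    {scan_rate / 2**20:.1f} MiB/s (ignoring scan time)")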

,

Dave HallIf You’re not Using YAML for CloudFormation Templates, You’re Doing it Wrong

In my last blog post, I promised a rant about using YAML for CloudFormation templates. Here it is. If you persevere to the end I’ll also show you how to convert your existing JSON based templates to YAML.

Many of the points I raise below don’t just apply to CloudFormation. They are general comments about why you should use YAML over JSON for configuration when you have a choice.

One criticism of YAML is its reliance on indentation. A lot of the code I write these days is Python, so indentation being significant is normal. Use a decent editor or IDE and this isn’t a problem. It doesn’t matter if you’re using JSON or YAML, you will want to validate and lint your files anyway. How else will you find that trailing comma in your JSON object?

Now we’ve got that out of the way, let me try to convince you to use YAML.

As developers we are regularly told that we need to document our code. CloudFormation is Infrastructure as Code. If it is code, then we need to document it. That starts with the Description property at the top of the file. If you use JSON for your templates, that’s it, you have no other opportunity to document them. On the other hand, if you use YAML you can add inline comments. Anywhere you need a comment, drop in a hash # and your comment. Your team mates will thank you.

JSON templates don’t support multiline strings. These days many developers have 4K or ultra wide monitors, but we don’t want a string that spans the full width of our 34” screen. Text becomes harder to read once you exceed that “90ish” character limit. With JSON your multiline string becomes "[90ish-characters]\n[another-90ish-characters]\n[and-so-on]". If you opt for YAML, you can use the greater than symbol (>) and then start your multiline string like so:

Description: >
  This is the first line of my Description
  and it continues on my second line
  and I'll finish it on my third line.

As you can see it is much easier to work with multiline strings in YAML than JSON.

“Folded blocks” like the one above are created using the >, which replaces new lines with spaces. This allows you to format your text in a more readable way, while still allowing a machine to use it as intended. If you want to preserve the new lines, use the pipe (|) to create a “literal block”. This is great for inline Lambda functions where the code remains readable and maintainable.

  APIFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          import json
          import random


          def lambda_handler(event, context):
              return {"statusCode": 200, "body": json.dumps({"value": random.random()})}
      FunctionName: "GetRandom"
      Handler: "index.lambda_handler"
      MemorySize: 128
      Role: !GetAtt LambdaServiceRole.Arn
      Runtime: "python3.7"
      Timeout: 5
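If you want to see the folded-versus-literal behaviour for yourself, a tiny PyYAML check (my illustration, not part of the original post; requires the pyyaml package) makes the difference obvious:

import yaml

doc = """
folded: >
  line one
  line two
literal: |
  line one
  line two
"""

data = yaml.safe_load(doc)
print(repr(data["folded"]))    # 'line one line two\n'  - newlines folded to spaces
print(repr(data["literal"]))   # 'line one\nline two\n' - newlines preserved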

Both JSON and YAML require you to escape multibyte characters. That’s less of an issue with CloudFormation templates as generally you’re only using the ASCII character set.

In a YAML file you generally don’t need to quote your strings, but in JSON double quotes are used everywhere: keys, string values and so on. If your string contains a quote you need to escape it. The same goes for tabs, new lines, backslashes and so on. JSON based CloudFormation templates can be hard to read because of all the escaping. It also makes it harder to handcraft your JSON when your code is a long escaped string on a single line.

Some configuration in CloudFormation can only be expressed as JSON. Step Functions and some of the AppSync objects in CloudFormation only allow inline JSON configuration. You can still use a YAML template and it is easier if you do when working with these objects.

The JSON-only configuration needs to be inlined in your template. If you’re using JSON you have to supply this as an escaped string, rather than nested objects. If you’re using YAML you can inline it as a literal block. Both YAML and JSON templates support functions such as Sub being applied to these strings, but it is so much more readable with YAML. See this Step Function example lifted from the AWS documentation:

MyStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    DefinitionString:
      !Sub |
        {
          "Comment": "A simple AWS Step Functions state machine that automates a call center support session.",
          "StartAt": "Open Case",
          "States": {
            "Open Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
              "Next": "Assign Case"
            }, 
            "Assign Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:assign_case",
              "Next": "Work on Case"
            },
            "Work on Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:work_on_case",
              "Next": "Is Case Resolved"
            },
            "Is Case Resolved": {
                "Type" : "Choice",
                "Choices": [ 
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 1,
                    "Next": "Close Case"
                  },
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 0,
                    "Next": "Escalate Case"
                  }
              ]
            },
             "Close Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:close_case",
              "End": true
            },
            "Escalate Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:escalate_case",
              "Next": "Fail"
            },
            "Fail": {
              "Type": "Fail",
              "Cause": "Engage Tier 2 Support."    }   
          }
        }

If you’re feeling lazy you can use inline JSON for IAM policies that you’ve copied from elsewhere. It’s quicker than converting them to YAML.

YAML templates are smaller and more compact than the same configuration stored in a JSON based template. Smaller yet more readable is winning all round in my book.

If you’re still not convinced that you should use YAML for your CloudFormation templates, go read Amazon’s blog post from 2017 advocating the use of YAML based templates.

Amazon makes it easy to convert your existing templates from JSON to YAML. cfn-flip is a Python based AWS Labs tool for converting CloudFormation templates between JSON and YAML. I will assume you’ve already installed cfn-flip. Once you’ve done that, converting your templates with some automated cleanups is just a command away:

cfn-flip --clean template.json template.yaml

git rm the old JSON file, git add the new one, then git commit and git push your changes. Now you’re all set for your new life using YAML based CloudFormation templates.
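If you are curious what the conversion amounts to, a naive JSON-to-YAML dump can be done with PyYAML in a few lines; treat this as a sketch only, since it knows nothing about CloudFormation’s intrinsic functions or the other clean-ups that cfn-flip’s --clean option performs.

# Naive JSON -> YAML conversion with PyYAML (sort_keys needs PyYAML >= 5.1).
# For real CloudFormation templates, prefer cfn-flip as shown above.
import json
import sys

import yaml

with open(sys.argv[1]) as fh:     # e.g. template.json
    template = json.load(fh)

print(yaml.safe_dump(template, default_flow_style=False, sort_keys=False))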

If you want to learn more about YAML files in general, I recommend you check out Learn X in Y Minutes’ Guide to YAML. If you want to learn more about YAML based CloudFormation templates, check Amazon’s Guide to CloudFormation Templates.

,

Adrian ChaddFixing up ath_rate_sample to actually work well with 11n

Way back in 2011 when I was working on FreeBSD's Atheros 802.11n support I needed to go and teach some rate control code about 802.11n MCS rates. (As a side note, the other FreeBSD wifi hackers and I at the time taught wlan_amrr - the AMRR rate control in net80211 - about basic MCS support too, and fixing that will be the subject of a later post.)

The initial hacks I did to ath_rate_sample made it kind of do MCS rates OK, but it certainly wasn't great. To understand why then and what I've done now, it's best to go for a little trip down memory lane - the initial sample rate control algorithm by John Bicket. You can find a copy of the paper he wrote here - https://pdos.csail.mit.edu/papers/jbicket-ms.pdf .

Now, sample didn't try to optimise maximum throughput. Instead, it attempts to optimise for minimum airtime to get the job done, and also attempted to minimise the time spent sampling rates that had a low probability of working. Note this was all done circa 2005 - at the time the other popular rate control methods tried to maintain the highest PHY rate that met some basic success rate (eg packet loss, bit error rate, etc, etc.) The initial implementation in FreeBSD also included multiple packet size bins - 250 and 1600 bytes - to allow rate selection based on packet length.

However, it made some assumptions about rates that don't quite hold in the 802.11n MCS world. Notably, it didn't take the PHY bitrate into account when comparing rates. It mostly assumed that going up in rate code - except between CCK and OFDM rates - meant it was faster. Now, this is true for 11b, 11g and 11a rates - again except when you transition between 11b and 11g rates - but this definitely doesn't hold true in the 802.11n MCS rate world. Yes, between MCS0 to MCS7 the PHY bitrate goes up, but then MCS8 is MCS0 times two streams, and MCS16 is MCS0 times three streams.

So my 2011/2012 work just did the minimum hacks to choose /some/ MCS rates. It didn't take the length of aggregates into account; it just used the length of the first packet in the aggregate. Very suboptimal, but it got MCS rates going.

Now fast-forward to 2020. This works fine if you're close to the other end, but it's very terrible if you're at the fringes of acceptable behaviour. My access points at home are not well located and thus I'm reproducing this behaviour very often - so I decided to fix it.

First up - packet length.  I had to do some work to figure out how much data was in the transmit queue for a given node and TID. (Think "QoS category.") The amount of data in the queue wasn't good enough - chances are we couldn't transmit all of it because of 802.11 state (block-ack window, management traffic, sleep state, etc.) So I needed a quick way to query the amount of traffic in the queue taking into account 802.11 state. That .. ended up being a walk of each packet in the software queue for that node/TID list until we hit our limit, but for now that'll do.

So then I can call ath_rate_lookup() to get a rate control schedule knowing how long a packet may be. But depending upon the rate it returns, the amount of data that may be transmitted could be less - there's a 4ms limit on 802.11n aggregates, so at lower MCS rates you end up only sending much smaller frames (like 3KB at the slowest rate). So I needed a way to return how many bytes to form an aggregate for as well as the rate. That informed the A-MPDU formation routine how much data it could queue in the aggregate for the given rate.

I also stored that away to use when completing the transmit, just to line things up OK.

Ok, so now I'm able to make rate control decisions based on how much data needs to be sent. ath_rate_sample still only worked with 250 and 1600 byte packets. So, I extended that out to 65536 bytes in mostly-powers-of-two values.  This worked pretty well right out of the box, but the rate control process was still making pretty trash decisions.

The next bit is all "statistics". The decisions that ath_rate_sample makes depend upon accurate estimations of how long packet transmissions took. I found that a lot of the logic was drastically over-compensating for failures by accounting a LOT more time for failures at each attempted rate, rather than only accounting how much time failed at that rate. Here's two examples:
  • If a rate failed, then all the other rates would get failure accounted for the whole length of the transmission to that point. I changed it to only account for failures at that rate - so if three out of four rates failed, each failed rate would only get its individual time accounted to it, rather than everything.
  • Short (RTS/CTS) and long (no-ACK) retries were being accounted incorrectly. If 10 short retries occurred, then the failed transmission time for that rate can't be 10 times what a long-retry failure would have cost. It's a short retry; the only thing that could differ is the rate that RTS/CTS is being exchanged at. Penalising rates because of bursts of short failures was incorrect, so I changed that accounting.
There are a few more, but you can look at the change log / change history for sys/dev/ath/ath_rate/sample/ to see.

By and large, I pretty accurately nailed making sure that failed transmit rates account for THEIR failures, not the failures of other rates in the schedule. This was super important for MCS rates, because mis-accounting failures across the 24-odd rates you can choose from in 3-stream transmit can have pretty disastrous effects on throughput - channel conditions change super frequently, and if you penalise a rate for far, far too long it takes a lot of subsequent successful samples just to try using that rate again.

So that was the statistics side done.

Next up - choices.

Choices were a bit less problematic to fix. My earlier hacks mostly just made it possible to choose MCS rates, but they didn't really take their behaviour into account. When you're doing 11a/11g OFDM rates, you know that you go in lock-step from 6, 12, 18, 24, 36, 48 to 54 Mbit, and if a rate starts failing, the higher rate will likely also fail. However, MCS rates are different - the difference between MCS0 (1/2 BPSK, 1 stream) and MCS8 (1/2 BPSK, 2 streams) is only a couple of dB of extra required signal strength. So given a rate, you want to sample at MCS rates around it but also ACROSS streams. So I mostly had to make sure that if I was at, say, MCS3, I'd also test MCS2 and MCS4, but I'd also test MCS10/11/12 (the 2-stream versions of MCS2/3/4) and maybe MCS18/19/20 for 3-stream. I also shouldn't really bother testing too high up the MCS chain if I'm at a lower MCS rate - there's no guarantee that MCS7 is going to work (5/6 QAM64 - fast but needs a pretty clean channel) if I'm doing OK at MCS2. So I just made sure that the sampling logic wouldn't try all the MCS rates when operating at a given MCS rate. It works pretty well - sampling will try a couple of MCS rates either side to see if the average transmit time for that rate is higher or lower, and then it'll bump the rate up or down to minimise said average transmit time.
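
Here's a sketch of that candidate selection - the idea, not the ath_rate_sample code, and sample_candidates is a made-up helper: given the current MCS, sample one step either side within the same stream count, plus the equivalent modulations at the other stream counts.

#include <stdio.h>

/* Sketch only: pick which HT MCS rates (0-23) to sample around 'mcs'. */
static int
sample_candidates(int mcs, int cand[], int max)
{
    int stream = mcs / 8;   /* 0, 1, 2 for one/two/three streams */
    int mod = mcs % 8;      /* modulation/coding index within the stream */
    int n = 0;

    for (int s = 0; s < 3; s++) {
        for (int d = -1; d <= 1; d++) {
            int m = mod + d;

            if (m < 0 || m > 7)
                continue;
            if (s == stream && d == 0)
                continue;   /* the rate we're already using */
            if (n < max)
                cand[n++] = s * 8 + m;
        }
    }
    return n;
}

int main(void)
{
    int cand[16];
    int n = sample_candidates(3, cand, 16);

    /* Prints: MCS2 MCS4 MCS10 MCS11 MCS12 MCS18 MCS19 MCS20 */
    for (int i = 0; i < n; i++)
        printf("MCS%d ", cand[i]);
    printf("\n");
    return 0;
}

For MCS3 that gives MCS2/4, MCS10/11/12 and MCS18/19/20 - exactly the neighbourhood described above.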

However, the one gotcha - packet loss and A-MPDU.

ath_rate_sample was based on single frames, not aggregates. So the concept of average transmit time assumed that the data either got there or it didn't. But, with 802.11n A-MPDU aggregation we can have the higher rates succeed at transmitting SOMETHING - meaning that the average transmit time and long retry failure counts look great - but most of the frames in the A-MPDU are dropped. That means low throughput and more actual airtime being used.

When I did this initial work in 2011/2012 I noted this, so I kept an EWMA of the packet loss of both single frames and aggregates. I wouldn't choose higher rates whose EWMA was more than a couple of percent worse than the current best rate. It didn't matter how good a rate looked from the long-retry view - if only 5% of sub-frames were ACKed, I needed a quick way to dismiss it. The EWMA logic worked pretty well there and only needed a bit of tweaking.
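
As a sketch of the idea only - these aren't the driver's field names, weights or thresholds - the EWMA update and the "don't bother" check look roughly like this:

#include <stdio.h>

#define EWMA_WEIGHT 75  /* percent of the old value kept each update */

/* Track sub-frame success per rate in tenths of a percent (0..1000). */
static unsigned int
ewma_update(unsigned int old, unsigned int nframes, unsigned int nok)
{
    unsigned int sample = nok * 1000 / nframes;

    return (old * EWMA_WEIGHT + sample * (100 - EWMA_WEIGHT)) / 100;
}

/* Reject a candidate whose success EWMA is more than ~2% below the best. */
static int
rate_acceptable(unsigned int cand_ewma, unsigned int best_ewma)
{
    return cand_ewma + 20 >= best_ewma;
}

int main(void)
{
    unsigned int e = 1000;          /* start out fully successful */

    e = ewma_update(e, 32, 2);      /* only 2 of 32 sub-frames ACKed */
    printf("ewma = %u.%u%%, usable vs a 95%% best rate: %s\n",
        e / 10, e % 10, rate_acceptable(e, 950) ? "yes" : "no");
    return 0;
}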


A few things stand out after testing:

  • For shorter packets, it doesn't matter if it chooses the one, two or three stream rate; the bulk of the airtime is overhead and not data. Ie, the difference between MCS4, MCS12 and MCS20 is just the extra training symbols for the 2/3 stream rates and a few dB of extra required signal strength. So, typically it will alternate between them as they all behave roughly the same.
  • For longer packets, the bulk of the airtime starts becoming data, so it begins to choose rates that are obviously providing lower airtime and higher packet success EWMA. MCS12 is the choice for up to 4096 byte aggregates; the higher rates start rapidly dropping off in EWMA. This could be due to a variety of things, but importantly it's optimising things pretty well.
There's a bunch of future work to tidy this all up some more but it can wait.

Adrian ChaddI'm back into the grind of FreeBSD's wireless stack and 802.11ac

hi!

Yes, it's been a while since I posted here and yes, it's been a while since I was actively working on FreeBSD's wireless stack. Life's been .. well, life. I started the ath10k port in 2015. I wasn't expecting it to take 5 years, but here we are. My life has changed quite a lot since 2015 and a lot of the things I was doing in 2015 just stopped being fun for a while.

But the stars have aligned and it's fun again, so here I am.

Here's where things are right now.

First up - if_run. This is the Ralink (now MediaTek) 11abgn USB driver for the parts they made before MediaTek acquired them. A contributor named Ashish Gupta showed up on the #freebsd-wifi IRC channel on efnet to start working on 11n support for if_run, and he got it to the point where the basics worked - and I took it and ran with it enough to land 20MHz 11n support. It turns out I had a couple of suitable NICs to test with and, well, it just happened. I'm super happy Ashish came along to get 11n working on another NIC.

The if_run TODO list (which anyone is welcome to contribute to):

  • Ashish is looking at 40MHz wide channel support right now;
  • Short and long-GI support would be good to have;
  • we need to get 11n TX aggregation working via the firmware interface - it looks like the Linux driver has all the bits we need and it doesn't need retransmission support in net80211. The firmware will do it all if we set up the descriptors correctly.

net80211 work


Next up - net80211. So, net80211 has basic 11ac bits, even if people think they're not there. It doesn't know about MU-MIMO streams yet, but it'll do basic 11ac AP and STA operation if the driver and regulatory domain support it.

However, as I implement more of the ath10k port, I find more and more missing bits that really need to be in net80211.

A-MPDU / A-MSDU de-encapsulation


The hardware does A-MPDU and A-MSDU de-encapsulation in hardware/firmware, pushing up individual decrypted and de-encapsulated frames to the driver. It supports native wifi and 802.3 (ethernet) encapsulation, and right now we only support native wifi. (Note - net80211 supports 802.3 as well; I'll try to get that going once the driver lands.)

I added support to handle decryption offload with the ath10k supplied A-MPDU/A-MSDU frames (where there's no PN/MIC at all, it's all done in firmware/hardware!) so we could get SOME traffic. However, receive throughput just plainly sucked when I last poked at this. I also added A-MSDU offload support where we wouldn't drop the A-MSDU frames with the same receive 802.11 sequence number. However...

It turns out that my Mac was doing A-MSDU in A-MPDU in 11ac, and the net80211 receive A-MPDU reordering was faithfully dropping all A-MSDU frames with the same receive 802.11 sequence number. So TCP would just see massive packet loss and drop the throughput in a huge way. Implementing this feature requires buffering all A-MSDU frames in an A-MPDU sub-frame in the reordering queue rather than tossing them, and then reordering them as if they were a single frame.

So I modified the receive reordering logic to reorder queues of mbufs instead of mbufs, and patched things to allow queuing multiple mbufs as long as they were appropriately stamped as being A-MSDUs in a single A-MPDU subframe .. and now the receive traffic rate is where it should be (> 300mbit UDP/TCP.) Phew.


U-APSD support


I didn't want to implement full U-APSD support in the Atheros 11abgn driver because it requires a lot of driver work to get it right, but the actual U-APSD negotiation support in net80211 is significantly easier. If the NIC supports U-APSD offload (like ath10k does) then I just have to populate the WME QoS fields appropriately and call into the driver to notify them about U-APSD changes.

Right now net80211 doesn't support the ADD-TS / DEL-TS methods for clients requesting explicit QoS requirements.

Migrating more options to per-VAP state


There is a bunch of net80211 state which was still global rather than per-VAP. It makes sense in the old world - NICs that do things on the driver or net80211 side are driven in software, not in firmware, so things like "the current channel", "short/long preamble", etc are global state. However, later NICs that offload various things into firmware can now begin to do interesting things like background channel switching for scan, or background channel switching between STA and P2P-AP / P2P-STA. So a lot of state should be kept per-VAP rather than globally, so the "right" flags and IEs are set for a given VAP.

I've started migrating this state into per-VAP fields rather than global, but it showed a second shortcoming - because it was global, we weren't explicitly tracking these things per-channel. Ok, this needs a bit more explanation.

Say you're on a 2GHz channel and you need to determine whether you care about 11n, 11g or 11b clients. If you're only seeing and servicing 11n clients then you should be using the short slot time, short preamble and not require RTS/CTS protection to interoperate with pre-11n clients.

But then an 11g client shows up.

The 11g client doesn't need to interoperate with 11b, only 11n - so it doesn't need RTS/CTS. It can still use short preamble and short slot time. But the 11n clients need to interoperate with it, so they need to switch protection mode into legacy - and they will do RTS/CTS protection.

But then, an 11b client shows up.

At this point the 11g protection kicks in; everyone does RTS/CTS protection and long preamble/slot time kicks in.

Now - is this a property of a VAP, or of a channel? Technically speaking, it's the property of a channel. If any VAP on that channel sees an 11b or 11g client, ALL VAPs need to transition to update protection mode.
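
A little sketch makes the per-channel nature obvious - the flags and struct names here are hypothetical, not the net80211 structures - since every VAP on the channel has to derive the same answer:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-channel state: what kinds of stations have been seen. */
struct chan_state {
    bool saw_11b;       /* an 11b-only station is on this channel */
    bool saw_nonht;     /* a non-HT (11g/11a) station is on this channel */
};

struct prot_params {
    bool long_preamble;
    bool long_slot;
    bool erp_protection;    /* RTS/CTS protection for OFDM transmissions */
    bool ht_protection;     /* protect HT transmissions for legacy clients */
};

static struct prot_params
chan_protection(const struct chan_state *cs)
{
    struct prot_params p = { false, false, false, false };

    if (cs->saw_11b) {
        /* 11b present: everyone drops to long preamble/slot and RTS/CTS. */
        p.long_preamble = p.long_slot = true;
        p.erp_protection = p.ht_protection = true;
    } else if (cs->saw_nonht) {
        /* Only 11g/11a legacy: keep short preamble/slot, protect HT only. */
        p.ht_protection = true;
    }
    return p;
}

int main(void)
{
    struct chan_state cs = { .saw_11b = false, .saw_nonht = true };
    struct prot_params p = chan_protection(&cs);

    printf("ht_prot=%d erp_prot=%d long_preamble=%d long_slot=%d\n",
        p.ht_protection, p.erp_protection, p.long_preamble, p.long_slot);
    return 0;
}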

I migrated all of this to be per-VAP, but I kept the global state for literally all the drivers that currently consume it. The ath10k driver now uses the per-VAP state for the above, greatly simplifying things (and finishing TODO items in the driver!)


ath10k changes


And yes, I've been hacking on ath10k too.

Locking issues


I've had a bunch of feedback and pull requests from Bjorn and Geramy pointing out lock ordering / deadlock issues in ath10k. I'm slowly working through them; the straight conversion from Linux to FreeBSD showed the differences in our locking and how/when driver threads run. I will rant about this another day.

Encryption key programming


Encryption keys are programmed using firmware calls, but net80211 currently expects them to be done synchronously. We can't sleep in the net80211 crypto key updates without changing net80211's locks to all be SX locks (and I honestly think that's a bad solution that papers over non-asynchronous code which should just be made asynchronous.) Anyway, so the key updates and the node updates are done using deferred calls - but this required me to take complete copies of the encryption key contents. It turns out net80211 can pretty quickly recycle the key contents - including the key that is hiding inside the ieee80211_node. This fixed up the key reprogramming and deletion - it was sometimes sending garbage to the firmware. Whoops.


What's next?


So what's next? Well, I want to land the ath10k driver! There are still a whole bunch of things to do in both net80211 and the driver before I can do this.

Add 802.11ac channel entries to regdomain.xml


Yes, I added it - but only for FCC. I didn't add them for all the other regulatory domain codes. It's a lot of work because of how this file is implemented and I'd love help here.


Add MU-MIMO group notification


I'd like to make sure that we can at least support associating to a MU-MIMO AP. I think ath10k does it in firmware but we need to support the IE notifications.

Block traffic from being transmitted during a node creation or key update


Right now net80211 will transmit frames right after adding a node or sending a key update - it assumes the driver is completing it before returning. For software driven NICs like the pre-11ac Atheros chips this holds true, but for everything USB and newer firmware based devices this definitely doesn't hold.

For ath10k in particular if you try transmitting a frame without a node in firmware the whole transmit path just hangs. Whoops. So I've fixed that so we can't queue a frame if the firmware doesn't know about the node but ...

... net80211 will send the association responses in hostap mode once the node is created. This means the first association response doesn't make it to the associating client. Since net80211 doesn't yet do this traffic buffering, I'll do it in ath10k - I'll buffer frames during a key update and during node addition/deletion to make sure that nothing is sent OR dropped.

Clean up the Linux-y bits


There's a bunch of dead code which we don't need or don't use; as well as some compatibility bits that define Linux mac80211/nl80211 bits that should live in net80211. I'm going to turn these into net80211 methods and remove the Linux-y bits from ath10k. Bjorn's work to make linuxkpi wifi shims can then just translate the calls to the net80211 API bits I'll add, rather than having to roll full wifi methods inside linuxkpi.


To wrap up ..


.. job changes, relationship changes, having kids, getting a green card, buying a house and paying off old debts from your old hosting company can throw a spanner in the life machine. On the plus side, hacking on FreeBSD and wifi support are fun again and I'm actually able to sleep through the night once more, so ... here goes!

If you're interested in helping out, I've been updating the net80211/driver TODO list here: https://wiki.freebsd.org/WiFi/TodoStuff . I'd love some help, even on the small things!


,

Jan SchmidtOpenHMD and the Oculus Rift

For some time now, I’ve been involved in the OpenHMD project, working on building an open driver for the Oculus Rift CV1, and more recently the newer Rift S VR headsets.

This post is a bit of an overview of how the 2 devices work from a high level for people who might have used them or seen them, but not know much about the implementation. I also want to talk about OpenHMD and how it fits into the evolving Linux VR/AR API stack.

OpenHMD

http://www.openhmd.net/

In short, OpenHMD is a project providing open drivers for various VR headsets through a single simple API. I don’t know of any other project that provides support for as many different headsets as OpenHMD, so it’s the logical place to contribute for largest effect.

OpenHMD is supported as a backend in Monado, and in SteamVR via the SteamVR-OpenHMD plugin. Working drivers in OpenHMD open up a range of VR games – as well as non-gaming applications like Blender. I think it’s important that Linux and friends not get left behind – in what is basically a Windows-only activity right now.

One downside is that it does come with the usual disadvantages of an abstraction API, in that it doesn’t fully expose the varied capabilities of each device, but instead the common denominator. I hope we can fix that in time by extending the OpenHMD API, without losing its simplicity.

Oculus Rift S

I bought an Oculus Rift S in April, to supplement my original consumer Oculus Rift (the CV1) from 2017. At that point, the only way to use it was in Windows via the official Oculus driver as there was no open source driver yet. Since then, I’ve largely reverse engineered the USB protocol for it, and have implemented a basic driver that’s upstream in OpenHMD now.

I find the Rift S a somewhat interesting device. It’s not entirely an upgrade over the older CV1. The build quality, and some of the specifications are actually worse than the original device – but one area that it is a clear improvement is in the tracking system.

CV1 Tracking

The Rift CV1 uses what is called an outside-in tracking system, which has 2 major components. The first is input from Inertial Measurement Units (IMU) on each device – the headset and the 2 hand controllers. The 2nd component is infrared cameras (Rift Sensors) that you space around the room and then run a calibration procedure that lets the driver software calculate their positions relative to the play area.

IMUs provide readings of linear acceleration and angular velocity, which can be used to determine the orientation of a device, but don’t provide absolute position information. You can derive relative motion from a starting point using an IMU, but only over a short time frame as the integration of the readings is quite noisy.

This is where the Rift Sensors get involved. The cameras observe constellations of infrared LEDs on the headset and hand controllers, and use those in concert with the IMU readings to position the devices within the playing space – so that as you move, the virtual world accurately reflects your movements. The cameras and LEDs synchronise to a radio pulse from the headset, and the camera exposure time is kept very short. That means the picture from the camera is completely black, except for very bright IR sources. Hopefully that means only the LEDs are visible, although light bulbs and open windows can inject noise and make the tracking harder.

Rift Sensor view of the CV1 headset and 2 controllers.

If you have both IMU and camera data, you can build what we call a 6 Degree of Freedom (6DOF) driver. With only IMUs, a driver is limited to providing 3 DOF – allowing you to stand in one place and look around, but not to move.

OpenHMD provides a 3DOF driver for the CV1 at this point, with experimental 6DOF work in a branch in my fork. Getting to a working 6DOF driver is a real challenge. The official drivers from Oculus still receive regular updates to tweak the tracking algorithms.

I have given several presentations about the progress on implementing positional tracking for the CV1. Most recently at Linux.conf.au 2020 in January. There’s a recording at https://www.youtube.com/watch?v=PTHE-cdWN_s if you’re interested, and I plan to talk more about that in a future post.

Rift S Tracking

The Rift S uses Inside Out tracking, which inverts the tracking process by putting the cameras on the headset instead of around the room. With the cameras in fixed positions on the headset, the cameras and their view of the world move as the user’s head moves. For the Rift S, there are 5 individual cameras pointing outward in different directions to provide (overall) a very wide-angle view of the surroundings.

The role of the tracking algorithm in the driver in this scenario is to use the cameras to look for visual landmarks in the play area, and to combine that information with the IMU readings to find the position of the headset. This is called Visual Inertial Odometry.

There is then a 2nd part to the tracking – finding the position of the hand controllers. This part works the same as on the CV1 – looking for constellations of LED lights on the controllers and matching what you see to a model of the controllers.

This is where I think the tracking gets particularly interesting. The requirements for finding where the headset is in the room, and the goal of finding the controllers require 2 different types of camera view!

To find the landmarks in the room, the vision algorithm needs to be able to see everything clearly and you want a balanced exposure from the cameras. To identify the controllers, you want a very fast exposure synchronised with the bright flashes from the hand controller LEDs – the same as when doing CV1 tracking.

The Rift S satisfies both requirements by capturing alternating video frames with fast and normal exposures. Each time, it captures the 5 cameras simultaneously and stitches them together into 1 video frame to deliver over USB to the host computer. The driver then needs to split each frame according to whether it is a normal or fast exposure and dispatch it to the appropriate part of the tracking algorithm.

Rift S – normal room exposure for Visual Inertial Odometry.
Rift S – fast exposure with IR LEDs for controller tracking.

There are a bunch of interesting things to notice in these camera captures:

  • Each camera view is inserted into the frame in its own native orientation, and requires external information to make use of it
  • The cameras have a lot of fisheye distortion that will need correcting.
  • In the fast exposure frame, the light bulbs on my ceiling are hard to tell apart from the hand controller LEDs – another challenge for the computer vision algorithm.
  • The cameras are Infrared only, which is why the Rift S passthrough view (if you’ve ever seen it) is in grey-scale.
  • The top 16-pixels of each frame contain some binary data to help with frame identification. I don’t know how to interpret the contents of that data yet.

Status

This blog post is already too long, so I’ll stop here. In part 2, I’ll talk more about deciphering the Rift S protocol.

Thanks for reading! If you have any questions, hit me up at mailto:thaytan@noraisin.net or @thaytan on Twitter

Tim RileyPhilly.rb talk on hanami-view 2.0

Last month I had the honour of speaking at Philly.rb’s first remote meetup!

I took the opportunity to revise my year-ago talk about dry-view and update it for our current plans for hanami-view 2.0.

You can watch along here:


Doing a talk like this for a fully-remote audience was tricky. With its narrative structure and dramatic premise, I rely on audience cues to make sure I’m hitting the mark. Here, I just had to passionately project into the void… and hope it landed!

Fortunately, the Philly.rb crew was great, and the “after show” was where the remote format really excelled. A bunch of great questions came up, and I was able to jump into screen-sharing mode and do an interactive session on how both view rendering and the broader framework will hang together for Hanami 2.0.

All up, I really enjoyed the session. Thank you Ernesto for facilitating it! With our limited travel possibilities right now, I’d love to do this a little more: please reach out if you’d be interested in me contributing to your remote conference or meet-up.

,

Dave HallLogging Step Functions to CloudWatch

Many AWS Services log to CloudWatch. Some do it out of the box, others need to be configured to log properly. When Amazon released Step Functions, they didn’t include support for logging to CloudWatch. In February 2020, Amazon announced Step Functions could now log to CloudWatch. Step Functions still support CloudTrail logs, but CloudWatch logging is more useful for many teams.

Users need to configure Step Functions to log to CloudWatch. This is done on a per State Machine basis. Of course you could click around the console to enable it, but that doesn’t scale. If you use CloudFormation to manage your Step Functions, it is only a few extra lines of configuration to add the logging support.

In my example I will assume you are using YAML for your CloudFormation templates. I’ll save my “if you’re using JSON for CloudFormation you’re doing it wrong” rant for another day. This is a cut down example from one of my services:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: StepFunction with Logging Example.
Parameters:
Resources:
  StepFunctionExecRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: !Sub "states.${AWS::Region}.amazonaws.com"
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: StepFunctionExecRole
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - lambda:InvokeFunction
            - lambda:ListFunctions
            Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-lambdas-namespace-*"
          - Effect: Allow
            Action:
            - logs:CreateLogDelivery
            - logs:GetLogDelivery
            - logs:UpdateLogDelivery
            - logs:DeleteLogDelivery
            - logs:ListLogDeliveries
            - logs:PutResourcePolicy
            - logs:DescribeResourcePolicies
            - logs:DescribeLogGroups
            Resource: "*"
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /aws/stepfunction/my-step-function
      RetentionInDays: 14
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: my-step-function
      StateMachineType: STANDARD
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
             LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        IncludeExecutionData: True
        Level: ALL
      DefinitionString:
        !Sub |
        {
          ... JSON Step Function definition goes here
        }
      RoleArn: !GetAtt StepFunctionExecRole.Arn

The key pieces in this example are the second statement in the IAM Role with all the logging permissions, the LogGroup defined by MyStateMachineLogGroup and the LoggingConfiguration section of the Step Function definition.

The IAM role permissions are copied from the example policy in the AWS documentation for using CloudWatch Logging with Step Functions. The CloudWatch IAM permissions model is pretty weak, so we need to grant these broad permissions.

The LogGroup definition creates the log group in CloudWatch. You can use whatever value you want for the LogGroupName. I followed the Amazon convention of prefixing everything with /aws/[service-name]/ and then appended the Step Function name. I recommend using the RetentionInDays configuration. It stops old logs sticking around forever. In my case I send all my logs to ELK, so I don’t need to retain them in CloudWatch long term.

Finally we use the LoggingConfiguration to tell AWS where we want to send our logs. You can only specify a single destination under Destinations. The IncludeExecutionData setting determines if the inputs and outputs of each function call are logged. You should not enable this if you are passing sensitive information between your steps. The verbosity of logging is controlled by Level. Amazon has a page on Step Function log levels. For dev you probably want to use ALL to help with debugging, but in production you probably only need ERROR level logging.

I removed the Parameters and Output from the template. Use them as you need to.

,

Paul WiseFLOSS Activities June 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian BTS: usertags QA
  • Debian IRC channels: fixed a channel mode lock
  • Debian wiki: unblock IP addresses, approve accounts, ping folks with bouncing email

Communication

  • Respond to queries from Debian users and developers on the mailing lists and IRC

Sponsors

The ifenslave and apt-listchanges work was sponsored by my employer. All other work was done on a volunteer basis.

,

Craig SandersFuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.

Fuck Grey Text is a post from: Errata

,

Hamish TaylorBlog: A new beginning

Earlier today I launched this site. It is the result of a lot of work over the past few weeks. It began as an idea to publicise some of my photos, and morphed into the site you see now, including a store and blog that I’ve named “Photekgraddft”.

In the weirdly named blog, I want to talk about photography, the stories behind some of my more interesting shots, the gear and software I use, my technology career, my recent ADHD diagnosis and many other things.

This scares me quite a lot. I’ve never really put myself out onto the internet before. If you Google me, you’re not going to find anything much. Google Images has no photos of me. I’ve always liked it that way. Until now.

ADHD’ers are sometimes known for “oversharing”, one of the side-effects of the inability to regulate emotions well. I’ve always been the opposite, hiding, because I knew I was different, but didn’t understand why.

The combination of the COVID-19 pandemic and my recent ADHD diagnosis have given me a different perspective. I now know why I hid. And now I want to engage, and be engaged, in the world.

If I can be a force for positive change, around people’s knowledge and opinion of ADHD, then I will.

If talking about Business Analysis (my day job), and sharing my ideas for optimising organisations helps anyone at all, then I will.

If I can show my photos and brighten someone’s day by allowing them to enjoy a sunset, or a flying bird, then I will.

And if anyone buys any of my photos, then I will be shocked!

So welcome to my little vanity project. I hope it can be something positive, for me, if for no one else, in this new, odd world in which we now find ourselves living together.

,

Hamish TaylorPhoto: Rain on leaves

,

Paul WiseFLOSS Activities May 2020

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • nsntrace: talk to upstream about collaborative maintenance
  • Debian: deploy changes, debug issue with GPS markers file generation, migrate bls/DUCK from alioth-archive to salsa
  • Debian website: ran map cron job, synced mirrors
  • Debian wiki: approve accounts, ping folks with bouncing email

Communication

Sponsors

The apt-offline work and the libfile-libmagic-perl backports were sponsored. All other work was done on a volunteer basis.

,

Hamish TaylorVideo: A Foggy Autumn Morning

Video: A Foggy Autumn Morning

Hamish TaylorPhoto: Walking the dog on a cold Autumn morning

Photo: Walking the dog on a cold Autumn morning


,

Rusty Russell57 Varieties of Pyrite: Exchanges Are Now The Enemy of Bitcoin

TL;DR: exchanges are casinos and don’t want to onboard anyone into bitcoin. Avoid.

There’s a classic scam in the “crypto” space: advertise Bitcoin to get people in, then sell suckers something else entirely. Over the last few years, this bait-and-switch has become the core competency of “bitcoin” exchanges.

I recently visited the homepage of Australian exchange btcmarkets.net: what a mess. There was a list of dozens of identical-looking “cryptos”, with bitcoin second after something called “XRP”; seems like it was sorted by volume?

Incentives have driven exchanges to become casinos, and they’re doing exactly what you’d expect unregulated casinos to do. This is no place you ever want to send anyone.

Incentives For Exchanges

Exchanges make money on trading, not on buying and holding. Despite the fact that bitcoin is the only real attempt to create an open source money, scams with no future are given false equivalence, because more assets means more trading. Worse than that, they are paid directly to list new scams (the crappier, the more money they can charge!) and have recently taken the logical step of introducing and promoting their own crapcoins directly.

It’s like a gold dealer who also sells 57 varieties of pyrite, which give more margin than selling actual gold.

For a long time, I thought exchanges were merely incompetent. Most can’t even give out fresh addresses for deposits, batch their outgoing transactions, pay competent fee rates, perform RBF or use segwit.

But I misunderstood: they don’t want to sell bitcoin. They use bitcoin to get you in the door, but they want you to gamble. This matters: you’ll find subtle and not-so-subtle blockers to simply buying bitcoin on an exchange. If you send a friend off to buy their first bitcoin, they’re likely to come back with something else. That’s no accident.

Looking Deeper, It Gets Worse.

Regrettably, looking harder at specific exchanges makes the picture even bleaker.

Consider Binance: this mainland China backed exchange pretending to be a Hong Kong exchange appeared out of nowhere with fake volume and demonstrated the gullibility of the entire industry by being treated as if it were a respected member. They lost at least 40,000 bitcoin in a known hack, and they also lost all the personal information people sent them to KYC. They aggressively market their own coin. But basically, they’re just MtGox without Mark Karpelès’ PHP skills or moral scruples and much better marketing.

Coinbase is more interesting: an MBA-run “bitcoin” company which really dislikes bitcoin. They got where they are by spending big on regulations compliance in the US so they could operate in (almost?) every US state. (They don’t do much to dispel the wide belief that this regulation protects their users, when in practice it seems only USD deposits have any guarantee). Their natural interest is in increasing regulation to maintain that moat, and their biggest problem is Bitcoin.

They have much more affinity for the centralized coins (Ethereum) where they can have influence and control. The anarchic nature of a genuine open source community (not to mention the developers’ oft-stated aim to improve privacy over time) is not culturally compatible with a top-down company run by the Big Dog. It’s a running joke that their CEO can’t say the word “Bitcoin”, but their recent “what will happen to cryptocurrencies in the 2020s” article is breathtaking in its boldness: innovation is mainly happening on altcoins, and they’re going to overtake bitcoin any day now. Those scaling problems which the Bitcoin developers say they don’t know how to solve? This non-technical CEO knows better.

So, don’t send anyone to an exchange, especially not a “market leading” one. Find some service that actually wants to sell them bitcoin, like CashApp or Swan Bitcoin.

,

Stewart Smithop-build v2.5 firmware for the Raptor Blackbird

Well, following on from my post where I excitedly pointed out that Raptor Blackbird support is all upstream in op-build v2.5, I can now do another in my series of (close to) upstream Blackbird firmware builds.

This time, the only difference from straight upstream op-build v2.5 is my fixes for buildroot so that I can actually build it on Fedora 32.

So, head over to https://www.flamingspork.com/blackbird/op-build-v2.5-blackbird-images/ and grab blackbird.pnor to flash it on your blackbird, let me know how it goes!

Matthew OliverGNS3 FRR Appliance

In my spare time, what little I have, I’ve been wanting to play with some OSS networking projects. For those playing along at home, during last Suse hackweek I played with wireguard, and to test the environment I wanted to set up some routing.
For which I used FRR.

FRR is a pretty cool project: it brings the network routing stack to Linux, or rather gives us a full opensource routing stack. As most routers are actually Linux anyway.

Many years ago I happened to work at Fujitsu in a gateway environment, and started playing around with networking. And that was my first experience with GNS3, an opensource network simulator. Back then I needed to have a copy of Cisco IOS images to really play with routing protocols, which made things harder: a great open source product, but it needed access to proprietary router OSes.

FRR provides a CLI _very_ similar to Cisco’s, which made me think: hey, I wonder if there is an FRR appliance we can use in GNS3?
And there was!!!

When I downloaded it and decompressed the qcow2 image it was 1.5GB!!! For a single router image. It works great, but what if I wanted a bunch of routers to play with things like OSPF or BGP etc.? Surely we can make a smaller one.

Kiwi

At Suse we use kiwi-ng to build machine images and release media. And to make things even easier for me we already have a kiwi config for small OpenSuse Leap JEOS images, jeos is “just enough OS”. So I hacked one to include FRR. All extra tweaks needed to the image are also easily done by bash hook scripts.

I won’t go into too much detail on how, because I created a git repo where I have it all, including a detailed README: https://github.com/matthewoliver/frr_gns3

So feel free to check that out and build and use the image.

But today, I went one step further. OpenSuse’s Open Build Service (OBS), which is used to build all RPMs for OpenSuse but can also build debs and whatever else you need, also supports building docker containers and system images using kiwi!

So I have now got OBS to build the image for me. The image can be downloaded from: https://download.opensuse.org/repositories/home:/mattoliverau/images/

And if you want to send any OBS requests to change it the project/package is: https://build.opensuse.org/package/show/home:mattoliverau/FRR-OpenSuse-Appliance

To import it into GNS3 you need the gns3a file, which you can find in my git repo or in the OBS project page.

The best part is this image is only 300MB, which is much better than 1.5GB!
I did have it a little smaller, 200-250MB, but unfortunately the JEOS cut down kernel doesn’t contain the MPLS modules, so I had to pull in the full default SUSE kernel. If this became a real thing and not a pet project, I could go and build an FRR cutdown kernel to get the size down, but 300MB is already a lot better than where it was at.

Hostname Hack

When using GNS3 and you place a router, you want to be able to name the router, and when you access the console it’s _really_ nice to see the router name you specified in GNS3 as the hostname. Why? Because if you have a bunch of routers, having every console show the same localhost hostname on the command line… doesn’t really help.

The FRR image is using qemu, and there wasn’t a nice way to access the name of the VM from inside the guest, nor an easy way to insert the name from outside. But I found one approach that seems to be working: enter my dodgy hostname hack!

I also wanted to do it without hacking the gns3server code. I couldn’t easily pass the hostname in, but I could pass it in via a null device with the router name as its id:

/dev/virtio-ports/frr.router.hostname.%vm-name%

So I simply wrote a script that sets the hostname based on the existence of this device. Made the script a systemd oneshot service to start at boot and it worked!
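
The real thing is a small script run as a systemd oneshot, as described above; a rough C equivalent of the idea looks like this - purely illustrative, with only the device directory and name prefix taken from the post:

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Rough sketch of the hostname hack: find a virtio serial port named
 * frr.router.hostname.<name> and set the hostname to <name>. */
int main(void)
{
    const char *prefix = "frr.router.hostname.";
    DIR *d = opendir("/dev/virtio-ports");
    struct dirent *de;

    if (d == NULL)
        return 0;   /* no virtio ports - leave the hostname alone */

    while ((de = readdir(d)) != NULL) {
        if (strncmp(de->d_name, prefix, strlen(prefix)) == 0) {
            const char *name = de->d_name + strlen(prefix);

            if (sethostname(name, strlen(name)) != 0)
                perror("sethostname");
            break;
        }
    }
    closedir(d);
    return 0;
}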

This means that when you change the name of the FRR router in the GNS3 interface, all you need to do is restart the router (stop and start the device) and it’ll apply the name to the router. This saves you having to log in as root and run hostname yourself.

Or better, if you name all your FRR routers before turning them on, then it’ll just work.

In conclusion…

Hopefully now we can have a fully opensource, GNS3 + FRR appliance solution for network training, testing, and inspiring network engineers.

,

Matt PalmerPrivate Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece [1]:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BX – n
  • AgMBAA – e
  • ECEDKtuwD17gpagnASq1zQTY – d
  • ECCQDVTYVsjjF7IQ – p
  • IJANUYZsIjRsR3 – q
  • AgkAkahDUXL0RS – dmp1
  • ECCB78r2SnsJC9 – dmq1
  • AghaOK3FsKoELg== – iqmp

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script that contained a poorly-redacted private key. He since deleted it, by making a new commit, but of course because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers isn’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

,

Jonathan Adamczewskif32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing http://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)]) - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either if_true or if_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) causes the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normal, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on this problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision float point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range that can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)]), and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f *= SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

,

Chris NeugebauerReflecting on 10 years of not having to update WordPress

Over the weekend, the boredom of COVID-19 isolation motivated me to move my personal website from WordPress on a self-managed 10-year-old virtual private server to a generated static site on a static site hosting platform with a content delivery network.

This decision was overdue. WordPress never fit my brain particularly well, and it was definitely getting to a point where I wasn’t updating my website at all (my last post was two weeks before I moved from Hobart; I’ve been living in Petaluma for more than three years now).

Settling on a website framework wasn’t a terribly difficult choice (I chose Jekyll, everyone else seems to be using it), and I’ve had friends who’ve had success moving their blogs over. The difficulty I ended up facing was that the standard exporter that everyone uses to move from WordPress to Jekyll does not expect Debian’s package layout.

Backing up a bit: I made a choice, 10 years ago, to deploy WordPress on a machine that I ran myself, using the Debian system wordpress package, a simple aptitude install wordpress away. That decision was not particularly consequential then, but it chewed up 3 hours of my time on Saturday.

Why? The exporter plugin assumes that it will be able to find all of the standard WordPress files in the usual WordPress places, and when it didn’t find them, it broke in unexpected ways. And why couldn’t it find them?

Debian makes packaging choices that prioritise all the software on a system living side-by-side with minimal difficulty. It sets strict permissions. It separates application code from configuration from user data (which, in the case of WordPress, includes plugins), in a way that is consistent between applications. This choice makes it easy for Debian admins to understand how to find the bits of an application. It also minimises the chance of one PHP application clobbering another.
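
For what it’s worth, the Debian wordpress package splits things up roughly like this (from memory, so exact paths may vary between releases):

/usr/share/wordpress              # application code, read-only, owned by root
/etc/wordpress/config-<host>.php  # per-site configuration
/var/lib/wordpress/wp-content     # user data: uploads, themes and plugins

The exporter plugin, presumably, expects the standard layout where wp-content sits alongside the application code.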

10 years later, the install that I had set up was still working, having survived 3-4 Debian versions, and so 3-4 new WordPress versions. I don’t recall the last time I had to think about keeping my WordPress instance secure and updated. That’s quite a good run. I’ve had a working website despite not caring about keeping it updated for at least three years.

The same decisions that meant I spent 3 hours on Saturday doing a simple WordPress export saved me a bunch of time that I didn’t incrementally spend over the course a decade. Am I even? I have no idea.

Anyway, the least I can do is provide some help to people who might run into this same problem, so here’s a 5-step howto.

How to migrate a Debian WordPress site to Jekyll

Should you find the Jekyll exporter not working on your Debian WordPress install:

  1. Use the standard WordPress export to export an XML feed of your site.
  2. Spin up a new instance of WordPress (using WordPress.com, or on a new Virtual Private Server, whatever, really).
  3. Import the exported XML feed.
  4. Install the Jekyll exporter plugin.
  5. Follow the documentation and receive a Jekyll export of your site.

Basically, the plugin works with a stock WordPress install. If you don’t have one of those, it’s easy to move it over.
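
If you’d rather script steps 2–4 than click through a fresh install, WP-CLI can do most of it. A rough sketch – assuming the exporter’s wordpress.org slug is jekyll-exporter and your export file is called exported-site.xml:

# on the fresh, stock WordPress instance
wp plugin install wordpress-importer --activate
wp import exported-site.xml --authors=create
wp plugin install jekyll-exporter --activate

From there, follow the exporter plugin’s own documentation for the actual export step.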

,

Gary PendergastInstall the COVIDSafe app

I can’t think of a more unequivocal title than that. 🙂

The Australian government doesn’t have a good track record of either launching publicly visible software projects, or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is, while it has some relatively minor problems, it appears to be a solid first version.

Privacy

While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, neither do non-health departments (eg, law enforcement, intelligence).

Technical

It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

If it produces useful data, however, I’m willing to put up with some quirks. 🙂

Usefulness

I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

Install it!

The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

,

Craige McWhirterBuilding Daedalus Flight on NixOS

NixOS Daedalus Gears by Craige McWhirter

Daedalus Flight was recently released and this is how you can build and run this version of Daedalus on NixOS.

If you want to speed the build process up, you can add the IOHK Nix cache to your own NixOS configuration:

iohk.nix:

nix.binaryCaches = [
  "https://cache.nixos.org"
  "https://hydra.iohk.io"
];
nix.binaryCachePublicKeys = [
  "hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ="
];

If you haven't already, you can clone the Daedalus repo and specifically the 1.0.0 tagged commit:

$ git clone --branch 1.0.0 https://github.com/input-output-hk/daedalus.git

Once you've cloned the repo and checked you're on the 1.0.0 tagged commit, you can build Daedalus flight with the following command:

$ nix build -f . daedalus --argstr cluster mainnet_flight

Once the build completes, you're ready to launch Daedalus Flight:

$ ./result/bin/daedalus

To verify that you have in fact built Daedalus Flight, first head to the Daedalus menu, then About Daedalus. You should see a title such as "DAEDALUS 1.0.0". The second check is to press [Ctrl]+d to access Daedalus Diagnostics; your Daedalus state directory should have mainnet_flight at the end of the path.

If you've got these, give yourself a pat on the back and grab yourself a refreshing bevvy while you wait for blocks to sync.

Daedalus FC1 screenshot

,

Andrew RuthvenInstall Fedora CoreOS using FAI

I've spent the last couple of days trying to deploy Fedora CoreOS to some physical hardware/bare metal for a colleague using the official PXE installer from Fedora CoreOS. It wasn't very pleasant, and just wouldn't work reliably.

Maybe my expectations were too high, in that I thought I could use Ignition to prepare more of the system for me, the way my colleague has been able to with bare metal installs. I just tried to use Ignition as documented.

A few interesting aspects I encountered:

  1. The PXE installer for it has a 618MB initrd file. This takes quite a while to transfer via tftp!
  2. It can't build software RAID for the main install device (and the developers have no intention of adding this), and it seems very finicky to build other RAID sets for other partitions.
  3. And, well, I just kept having problems where the built systems would hang during boot for no obvious reason.
  4. The time to do an installation was incredibly long.
  5. The initrd image is really just running coreos-installer against the nominated device.

During the night I got fed up with that process and wrote a Fully Automatic Installer (FAI) profile that'd install CoreOS instead. I can now use setup-storage from FAI using its standard disk_config files. This allows me to build complicated disk configurations with software RAID and LVM easily.
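
For illustration only – this is not the actual profile (linked below), and the syntax is from memory, so check the setup-storage man page for the authoritative format – a disk_config for two disks in software RAID1 with LVM on top looks something like this:

disk_config disk1 disklabel:gpt bootable:1
primary /boot/efi  512M  vfat  rw
primary -          0-    -     -

disk_config disk2 sameas:disk1

disk_config raid
raid1 - disk1.2,disk2.2 - -

disk_config lvm
vg vg0 md0
vg0-root  /     20G  ext4  rw
vg0-var   /var  0-   ext4  rw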

A big bonus is that a rebuild is a lot faster, timed from typing reboot to a fresh login prompt is 10 minutes - and this is on physical hardware so includes BIOS POST and RAID controller set up, twice each.

I thought this might be of interest to other people, so the FAI profile I developed for this is located here: https://github.com/catalyst-cloud/fai-profile-fedora-coreos

FAI was initially developed to deploy Debian systems, and it has since been extended to install a number of other operating systems. I think this is a good example of how easy it is to deploy non-Debian derived operating systems using FAI without having to modify FAI itself.

,

Chris SmartAccessing USB serial devices in Fedora Silverblue

One of the things I do a lot on my Fedora machines is talk to devices via USB serial. While a device is correctly detected at /dev/ttyUSB0 and owned by the dialout group, adding myself to that group doesn’t work because the group can’t be found. This is because under Silverblue, there are two different group files (/usr/lib/group and /etc/group) with different content.

There are some easy ways to solve this, for example we can create the matching dialout group or write a udev rule. Let’s take a look!

On the host with groups

If you try to add yourself to the dialout group it will fail.

sudo gpasswd -a ${USER} dialout
gpasswd: group 'dialout' does not exist in /etc/group

Trying to re-create the group will also fail, as its GID is already in use.

sudo groupadd dialout -r -g 18
groupadd: GID '18' already exists

So instead, we can simply grab the entry from the OS group file and add it to /etc/group ourselves.

grep ^dialout: /usr/lib/group |sudo tee -a /etc/group

Now we are able to add ourselves to the dialout group!

sudo gpasswd -a ${USER} dialout

Activate that group in our current shell.

newgrp dialout

And now we can use a tool like screen to talk to the device (note that you will need to have installed screen with rpm-ostree and rebooted first).

screen /dev/ttyUSB0 115200

And that’s it. We can now talk to USB serial devices on the host.

Inside a container with udev

Inside a container is a little more tricky as the dialout group is not passed into it. Thus, inside the container the device is owned by nobody and the user will have no permissions to read or write to it.

One way to deal with this and still use the regular toolbox command is to create a udev rule and make yourself the owner of the device on the host, instead of root.

To do this, we create a generic udev rule for all usb-serial devices.

cat << EOF | sudo tee /etc/udev/rules.d/50-usb-serial.rules
SUBSYSTEM=="tty", SUBSYSTEMS=="usb-serial", OWNER="${USER}"
EOF

If you need to create a more specific rule, you can find other bits to match by (like kernel driver, etc) with the udevadm command.

udevadm info -a -n /dev/ttyUSB0

Once you have your rule, reload udev.

sudo udevadm control --reload-rules
sudo udevadm trigger

Now, unplug your serial device and plug it back in. You should notice that it is now owned by your user.

ls -l /dev/ttyUSB0
crw-rw----. 1 csmart dialout 188, 0 Apr 18 20:53 /dev/ttyUSB0

It should also be the same inside the toolbox container now.

[21:03 csmart ~]$ toolbox enter
⬢[csmart@toolbox ~]$ ls -l /dev/ttyUSB0 
crw-rw----. 1 csmart nobody 188, 0 Apr 18 20:53 /dev/ttyUSB0

And of course, as this is inside a container, you can just dnf install screen or whatever other program you need.

Of course, if you’re happy to create the udev rule then you don’t need to worry about the groups solution on the host.

Chris SmartMaking dnf on Fedora Silverblue a little easier with bash aliases

Fedora Silverblue doesn’t come with dnf because it’s an immutable operating system and uses a special tool called rpm-ostree to layer packages on top instead.

Most terminal work is designed to be done in containers with toolbox, but I still do a bunch of work outside of a container. Searching for packages to install with rpm-ostree still requires dnf inside a container, as rpm-ostree does not have a search function.

I add these two aliases to my ~/.bashrc file so that using dnf to search or install into the default container is possible from a regular terminal. This just makes Silverblue a little bit more like what I’m used to with regular Fedora.

cat >> ~/.bashrc << EOF
alias sudo="sudo "
alias dnf="bash -c '#skip_sudo'; toolbox -y create 2>/dev/null; toolbox run sudo dnf"
EOF

If the default container doesn’t exist, toolbox creates it. Note that the alias for sudo has a space at the end. This tells bash to also check the next command word for alias expansion, which is what makes sudo work with aliases, so both dnf and sudo dnf will work. The first part of the dnf alias is a harmless no-op that absorbs any leading sudo, so the rest of the command always runs as the regular user and both forms behave the same, as shown below.
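
To make that a bit more concrete, here is roughly what bash ends up running for each form (my annotation, not part of the snippet above):

# typing: dnf install -y vim
bash -c '#skip_sudo'; toolbox -y create 2>/dev/null; toolbox run sudo dnf install -y vim

# typing: sudo dnf install -y vim
# (the trailing space in the sudo alias makes bash expand "dnf" too; sudo only
#  runs the no-op, and the toolbox commands still run as the regular user)
sudo bash -c '#skip_sudo'; toolbox -y create 2>/dev/null; toolbox run sudo dnf install -y vim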

We need to source that file or run a new bash session to pick up the aliases.

bash

Now we can just use dnf command like normal. Search can be used to find packages to install with rpm-ostree while installing packages will go into the default toolbox container (both with and without sudo are the same).

sudo dnf search vim
dnf install -y vim
The container is automatically created with dnf

To run vim from the example, enter the container and it will be there.

Vim in a container

You can do whatever you normally do with dnf, like install RPMs like RPMFusion and list repos.

Installing RPMFusion RPMs into container
Listing repositories in the container

Anyway, just a little thing but it’s kind of helpful to me.

,

Craige McWhirterCrisis Proofing the Australian Economy

An Open Letter to Prime Minister Scott Morrison

To The Hon Scott Morrison MP, Prime Minister,

No doubt how to re-invigorate our economy is high on your mind, among other priorities in this time of crisis.

As you're acutely aware, the pandemic we're experiencing has accelerated a long-term high unemployment trajectory we were already on due to industry retraction, automation, off-shoring jobs etc.

Now is the right time to enact changes that will bring long-term crisis resilience, economic stability and prosperity to this nation.

  1. Introduce a 1% tax on all financial / stock / commodity market transactions.
  2. Use 100% of that to fund a Universal Basic Income for all adult Australian citizens.

Funding a Universal Basic Income will bring:

  • Economic resilience in times of emergency (bushfire, drought, pandemic)
  • Removal of the need for government financial aid in those emergencies
  • Removal of all forms of pension and unemployment benefits
  • A more predictable, reduced and balanced government budget
  • Dignity and autonomy to those impacted by economic events / crises
  • Space and security for the innovative amongst us to take entrepreneurial risks
  • A growth in social, artistic and economic activity that could not happen otherwise

This is both simple to collect and simple to distribute to all tax payers. It can be done both swiftly and sensibly, enabling you to remove the JobKeeper band aid and its related budgetary problems.

This is an opportunity to be seized, Mr Morrison.

There is also a second opportunity.

Post World War II, we had the Snowy River scheme. Today we have the housing affordability crisis, and many Australians will never own their own home. A public building programme to provide 25% of housing would create a permanent employment and building boom and, over time, resolve the housing affordability crisis.

If you cap repayments for those in public housing to 25% of their income, there will also be more disposable income circulating through the economy, creating prosperous times for all Australians.

Carpe diem, Mr Morrison.

Recognise the opportunity. Seize it.


Dear Readers,

If you support either or both of these ideas, please contact the Prime Minister directly and add your voice.

,

Chris SmartFedora Silverblue is an amazing immutable desktop

I recently switched my regular Fedora 31 workstation over to the 31 Silverblue release. I’ve played with Project Atomic before and have been meaning to try it out more seriously for a while, but never had the time. Silverblue provided the catalyst to do that.

What this brings to the table is quite amazing and seriously impressive. The base OS is immutable and everyone’s install is identical. This means quality can be improved as there are fewer combinations and it’s easier to test. Upgrades to the next major version of Fedora are fast and secure. Instead of updating thousands of RPMs in-place, the new image is downloaded and the system reboots into it. As the underlying images don’t change, it also offers full rollback support.

This is similar to how platforms like Chrome OS and Android work, but thanks to ostree it’s now available for Linux desktops! That is pretty neat.

It doesn’t come with a standard package manager like dnf. Instead, any packages or changes you need to make to the base OS are done using the rpm-ostree command, which actually layers them on top.

And while technically you can install anything using rpm-ostree, ideally this should be avoided as much as possible (some low level apps like shells and libvirt may require it, though). Flatpak apps and containers are the standard way to consume packages. As these are kept separate from the base OS, it also helps improve stability and reliability.

Installing Silverblue

I copied the Silverblue installer to a USB stick and booted it to do the install. As my Dell XPS has an NVIDIA card, I modified the installer’s kernel args and disabled the nouveau driver with the usual nouveau.modeset=0 to get the install GUI to show up.

I’m also running in UEFI mode and due to a bug you have to use a separate, dedicated /boot/efi partition for Silverblue (personally, I think that’s a good thing to do anyway). Otherwise, the install looks pretty much the same as regular Fedora and went smoothly.

Once installed, I blacklisted the nouveau driver and rebooted. To make these kernel arguments permanent, we don’t use grub2; we set kernel args with rpm-ostree.

rpm-ostree kargs --append=modprobe.blacklist=nouveau --append=rd.driver.blacklist=nouveau

The NVIDIA drivers from RPMFusion are supported, so following this I had to add the repositories and drivers as RPMs on the base image.

rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-31.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-31.noarch.rpm
systemctl reboot

Once rebooted I then installed the necessary packages and rebooted again to activate them.

rpm-ostree install akmod-nvidia xorg-x11-drv-nvidia-cuda libva-utils libva-vdpau-driver gstreamer1-libav
rpm-ostree kargs --append=nvidia-drm.modeset=1
systemctl reboot

That was the base setup complete, which all went pretty smoothly. What you’re left with is the base OS with GNOME and a few core apps.

GNOME in Silverblue

Working with Silverblue

Using Silverblue is a different way of working than I have been used to. As mentioned above, there is no dnf command and packages are layered on top of the base OS with the rpm-ostree command. Because this is a layer, installing a new RPM requires a reboot to activate it, which is quite painful when you’re in the middle of some work and realise you need a program.

The answer, though, is to use containers more, instead of the RPMs I’m used to.

Containers

As I wrote about in an earlier blog post, toolbox is a wrapper for setting up containers and complements Silverblue wonderfully. If you need to install any terminal apps, give this a shot. Creating and running a container is as simple as this.

toolbox create
toolbox enter
Container on Fedora Silverblue

Once inside your container use it like a normal Fedora machine (dnf is available!).

As rpm-ostree has no search function, using a container is the expected way to do this. Having created the container above, you can now use it (without entering it first) to perform package searches.

toolbox run dnf search vim

Apps

Graphical apps are managed with Flatpak, the new way to deliver secure, isolated programs on Linux. Silverblue is configured to use Fedora apps out of the box, and you can also add Flathub as a third party repo.

I experienced some small glitches with the Software GUI program when applying updates, but I don’t normally use it so I’m not sure if it’s just beta issues or not. As the default install is more sparse than usual, you’ll find yourself needing to install the apps you use. I really like this approach, it keeps the base system smaller and cleaner.

While Fedora provides their own Firefox package in Flatpak format (which is great), Mozilla also just recently started publishing their official package to Flathub. So, to install that, we simply add Flathub as a repository and install away!

flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo
flatpak update
flatpak install org.mozilla.firefox

After install, Firefox should appear as a regular app inside GNOME.

Official Firefox from Mozilla via Flatpak

If you need to revert to an earlier version of a Flatpak (which I did when I was testing out Firefox beta), you can fetch the remote log for the app, then update to a specific commit.

flatpak remote-info --log flathub-beta org.mozilla.firefox//beta
flatpak update \
--commit 908489d0a77aaa8f03ca8699b489975b4b75d4470ce9bac92e56c7d089a4a869 \
org.mozilla.firefox//beta

Replacing system packages

If you have installed a Flatpak, like Firefox, and no-longer want to use the RPM version included in the base OS, you can use rpm-ostree to override it.

rpm-ostree override remove firefox

After a reboot, you will only see your Flatpak version.

Upgrades

I upgraded from 31 to the 32 beta, which was very fast by comparison to regular Fedora (because it just needs to download the new base image) and pretty seamless.

The only hiccup I had was needing to remove RPMFusion 31 release RPMs first, upgrade the base to 32, then install the RPMFusion 32 release RPMs. After that, I did an update for good measure.

rpm-ostree uninstall rpmfusion-nonfree-release rpmfusion-free-release
rpm-ostree rebase fedora:fedora/32/x86_64/silverblue
rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-32.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-32.noarch.rpm
systemctl reboot

Then post reboot, I did a manual update of the system.

rpm-ostree upgrade

You can see the current status of your system with the rpm-ostree command.

rpm-ostree status 

On my system you can see the ostree I’m using, the commit as well as both layered and local packages.

State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch

  ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch

To revert to the previous version temporarily, simply select it from the grub boot menu and you’ll go back in time. If you want to make this permanent, you can rollback to the previous state instead and then just reboot.

rpm-ostree rollback

Silverblue is really impressive and works well. I will continue to use it as my daily driver and see how it goes over time.

Tips

I have run into a couple of issues, mostly around using the Software GUI (which I don’t normally use). These were things like it listing updates for Flatpaks which were not actually there for update, and when you tried to update, it didn’t do anything.

If you hit issues, you can try clearing out the Software data and loading the program again.

pkill gnome-software
rm -rf ~/.cache/gnome-software

If you need to, you can also clean out and refresh the rpm-ostree cache and do an update.

rpm-ostree cleanup -m
rpm-ostree update

To repair and update Flatpaks, if you need to.

flatpak repair
flatpak update

Also see

Making dnf on the host terminal a little easier with aliases.

Accessing USB serial devices on the host and in a toolbox container.

Pia AndrewsA temporary return to Australia due to COVID-19

The last few months have been a rollercoaster, and we’ve just had to make another big decision that we thought we’d share.

TL;DR: we returned to Australia last night, hopeful to get back to Canada when we can. Currently in Sydney quarantine and doing fine.

UPDATE: please note that this isn’t at all a poor reflection on Canada. To the contrary, we have loved even the brief time we’ve had there, the wonderful hospitality and kindness shown by everyone, and the excellent public services there.

We moved to Ottawa, Canada at the end of February, for an incredible job opportunity with Service Canada which also presented a great life opportunity for the family. We enjoyed 2 “normal” weeks of settling in, with the first week dedicated to getting set up, and the second week spent establishing a work / school routine – me in the office, little A in school and T looking at work opportunities and running the household.

Then, almost overnight, everything went into COVID lock down. Businesses and schools closed. Community groups stopped meeting. People are being affected by this every day, so we have been very lucky to be largely fine and in good health, and we thought we could ride it out safely staying in Ottawa, even if we hadn’t quite had the opportunity to establish ourselves.

But then a few things happened which changed our minds – at least for now.

Firstly, with the schools shut down before the A had really had a chance to make friends (she only attended for 5 days before the school shut down), she was left feeling very isolated. The school is trying to stay connected with its students by providing a half hour video class each day, with a half hour activity in the afternoons, but it’s no way to help her to make new friends. A has only gotten to know the kids of one family in Ottawa, who are also in isolation but have been amazingly supportive (thanks Julie and family!), so we had to rely heavily on video playdates with cousins and friends in Australia, for which the timezone difference only allows a very narrow window of opportunity each day. With every passing day, the estimated school closures have gone from weeks, to months, to very likely the rest of the school year (with the new school year commencing in September). If she’d had just another week or two, she would have likely found a friend, so that was a pity. It’s also affected the availability of summer camps for kids, which we were relying on to help us with A through the 2 month summer holiday period (July & August).

Secondly, we checked our health cover and luckily the travel insurance we bought covered COVID conditions, but we were keen to get full public health cover. Usually for new arrivals there is a 3 month waiting period before this can be applied for. However, in response to the COVID threat the Ontario Government recently waived that waiting period for public health insurance, so we rushed to register. Unfortunately, the one service office that is able to process applications from non-Canadian citizens had closed by that stage due to COVID, with no re-opening being contemplated. We were informed that there is currently no alternative way for non-citizens to apply online or over the phone.

Thirdly, the Australian Government has strongly encouraged all Australian citizens to return home, warning of the closing window for international travel. We became concerned we wouldn’t have full consulate support if something went wrong overseas. A good travel agent friend of ours told us the industry is preparing for a minimum of 6 months of international travel restrictions, which raised the very real issue that if anything went wrong for us, then neither could we get home, nor could family come to us. And, as we can now all appreciate, it’s probable that international travel disruptions and prohibitions will endure for much longer than 6 months.

Finally, we had a real scare. For context, we signed a lease for an apartment in a lovely part of central Ottawa, but we weren’t able to move in until early April, so we had to spend 5 weeks living in a hotel room. We did move into our new place just last Sunday and it was glorious to finally have a place, and for little A to finally have her own room, which she adored. Huge thanks to those who generously helped us make that move! The apartment is only 2 blocks away from A’s new school, which is incredibly convenient for us – it will be particularly good during the worst of Ottawa’s winter. But little A, who is now a very active and adventurous 4 year old, managed to face plant off her scooter (trying to bunnyhop down a stair!) and she knocked out a front tooth, on only the second day in the new place! She is ok, but we were all very, very lucky that it was a clean accident with the tooth coming out whole and no other significant damage. But we struggled to get any non-emergency medical support.

The Ottawa emergency dental service was directing us to a number that didn’t work. The phone health service was so busy that we were told we couldn’t even speak to a nurse for 24 hours. We could have called emergency services and gone to a hospital, which was comforting, but several Ottawa hospitals reported COVID outbreaks just that day, so we were nervous to do so. We ended up getting medical support from the dentist friend of a friend over text, but that was purely by chance. It was quite a wake up call as to the questions of what we would have done if it had been a really serious injury. We just don’t know the Ontario health system well enough, can’t get on the public system, and the pressure of escalating COVID cases clearly makes it all more complicated than usual.

If we’d had another month or two to establish ourselves, we think we might have been fine, and we know several ex-pats who are fine. But for us, with everything above, we felt too vulnerable to stay in Canada right now. If it was just Thomas and I it’d be a different matter.

So, we have left Ottawa and returned to Australia, with full intent to return to Canada when we can. As I write this, we are on day 2 of the 14 day mandatory isolation in Sydney. We were apprehensive about arriving in Sydney, knowing that we’d be put into mandatory quarantine, but the processing and screening of arrivals was done really well, professionally and with compassion. A special thank you to all the Sydney airport and Qatar Airways staff, immigration and medical officers, NSW Police, army soldiers and hotel staff who were all involved in the process. Each one acted with incredible professionalism and are a credit to their respective agencies. They’re also exposing themselves to the risk of COVID in order to help others. Amazing and brave people. A special thank you to Emma Rowan-Kelly who managed to find us these flights back amidst everything shutting down globally.

I will continue working remotely for Service Canada, on the redesign and implementation of a modern digital channel for government services. Every one of my team is working remotely now anyway, so this won’t be a significant issue apart from the timezone. I’ll essentially be a shift worker for this period. Our families are all self isolating, to protect the grandparents and great-grandparents, so the Andrews family will be self-isolating in a location still to be confirmed. We will be traveling directly there once we are released from quarantine, but we’ll be contactable via email, fb, whatsapp, video, etc.

We are still committed to spending a few years in Canada, working, exploring and experiencing Canadian cultures, and will keep the place in Ottawa with the hope we can return there in the coming 6 months or so. We are very, very thankful for all the support we have had from work, colleagues, little A’s school, new friends there, as well as that of friends and family back in Australia.

Thank you all – and stay safe. This is a difficult time for everyone, and we all need to do our part and look after each other best we can.

Chris SmartEasy containers on Fedora with toolbox

The toolbox program is a wrapper for setting up containers on Fedora. It’s not doing anything you can’t do yourself with podman, but it does make using and managing containers more simple and easy to do. It comes by default on Silverblue where it’s aimed for use with terminal apps and dev work, but you can try it on a regular Fedora workstation.

sudo dnf install toolbox

Creating containers

You can create just one container if you want, which will be called something like fedora-toolbox-32, or you can create separate containers for different things. Up to you. As an example, let’s create a container called testing-f32.

toolbox create --container testing-f32

By default toolbox uses the Fedora registry and creates a container which is the same version as your host. However you can specify a different version if you need to, for example if you needed a Fedora 30 container.

toolbox create --release f30 --container testing-f30

These containers are not yet running, they’ve just been created for you.

View your containers

You can see your containers with the list option.

toolbox list

This will show you both the images in your cache and the containers in a nice format.

IMAGE ID      IMAGE NAME                                        CREATED
c49513deb616  registry.fedoraproject.org/f30/fedora-toolbox:30  5 weeks ago
f7cf4b593fc1  registry.fedoraproject.org/f32/fedora-toolbox:32  4 weeks ago

CONTAINER ID  CONTAINER NAME  CREATED        STATUS   IMAGE NAME
b468de87277b  testing-f30     5 minutes ago  Created  registry.fedoraproject.org/f30/fedora-toolbox:30
1597ab1a00a5  testing-f32     5 minutes ago  Created  registry.fedoraproject.org/f32/fedora-toolbox:32

As toolbox is a wrapper, you can also see this information with podman, but with two commands; one for images and one for containers. Notice that with podman you can also see that these containers are not actually running (that’s the next step).

podman images ; podman ps -a
registry.fedoraproject.org/f32/fedora-toolbox   32       f7cf4b593fc1   4 weeks ago    360 MB
registry.fedoraproject.org/f30/fedora-toolbox   30       c49513deb616   5 weeks ago    404 MB

CONTAINER ID  IMAGE                                             COMMAND               CREATED             STATUS   PORTS  NAMES
b468de87277b  registry.fedoraproject.org/f30/fedora-toolbox:30  toolbox --verbose...  About a minute ago  Created         testing-f30
1597ab1a00a5  registry.fedoraproject.org/f32/fedora-toolbox:32  toolbox --verbose...  About a minute ago  Created         testing-f32

You can also use podman to inspect the containers and appreciate all the extra things toolbox is doing for you.

podman inspect testing-f32

Entering a container

Once you have a container created, to use it you just enter it with toolbox.

toolbox enter --container testing-f32

Now you are inside your container which is separate from your host, but it generally looks the same. A number of bind mounts were created automatically for you and you’re still in your home directory. It is important to note that all containers you run with toolbox will share your home directory! Thus it won’t isolate different versions of the same software, for example, you would still need to create separate virtual environments for Python.
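
A quick way to convince yourself of that shared home directory (using the hypothetical container name from earlier):

toolbox run --container testing-f32 touch ~/hello-from-container
ls -l ~/hello-from-container   # the file is visible on the host as well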

Any new shells or tabs you create in your terminal app will also be inside that container. Note the PS1 variable has changed to have a pink shape at the front (from /etc/profile.d/toolbox.sh).

Inside a container with toolbox

Note that you could also start and enter the container with podman.

podman start testing-f30
podman exec -it -u ${EUID} -w ${HOME} testing-f30 /usr/bin/bash

Hopefully you can see how toolbox make using containers easier!

Exiting a container

To get out of the container, just exit the shell and you’ll be back to your previous session on the host. The container will still exist and can be entered again, it is not deleted unless you delete it.

Removing a container

To remove a container, simply run toolbox with the rm option. Note that this still keeps the images around, it just deletes the instance of that image that’s running as that container.

toolbox rm -f testing-f32

Again, you can also delete this using podman.

Using containers

Once inside a container you can basically (mostly) treat your container system as a regular Fedora host. You can install any apps you want, such as terminal apps like screenfetch and even graphical programs like gedit (which work from inside the container).

sudo dnf install screenfetch gedit
screenfetch is always a favourite

For any programs that require RPMFusion, like ffmpeg, you first need to set up the repos as you would on a regular Fedora system.

sudo dnf install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install ffmpeg

These programs like screenfetch and ffmpeg are available inside your container, but not outside your container. They are isolated. To run them in the future you would enter the container and run the program.

Instead of entering and then running the program, you can also just use the run command. Here you can see screenfetch is not on my host, but I can run it in the container.

Those are pretty simple (silly?) examples, but hopefully they demonstrate the value of toolbox. It’s probably more useful for dev work where you can separate and manage different versions of various platforms, but it does make it really easy to quickly spin something up outside of your host system.

,

Chris SmartCustom WiFi enabled nightlight with ESPHome and Home Assistant

I built this custom night light for my kids as a fun little project. It’s pretty easy so thought someone else might be inspired to do something similar.

Custom WiFi connected nightlight

Hardware

The core hardware is just an ESP8266 module and an Adafruit NeoPixel Ring. I also bought a 240V bunker light and took the guts out to use as the housing, as it looked nice and had a diffuser (you could pick anything that you like).

Removing existing components from bunker light

While the data pin of the NeoPixel Ring can pretty much connect to any GPIO pin on the ESP, bitbanging can cause flickering. It’s better to use pins 1, 2 or 3 on an ESP8266 where we can use other methods to talk to the device.

These methods are exposed in ESPHome’s support for NeoPixel.

  • ESP8266_DMA (default for ESP8266, only on pin GPIO3)
  • ESP8266_UART0 (only on pin GPIO1)
  • ESP8266_UART1 (only on pin GPIO2)
  • ESP8266_ASYNC_UART0 (only on pin GPIO1)
  • ESP8266_ASYNC_UART1 (only on pin GPIO2)
  • ESP32_I2S_0 (ESP32 only)
  • ESP32_I2S_1 (default for ESP32)
  • BIT_BANG (can flicker a bit)

I chose GPIO2 and use ESP8266_UART1 method in the code below.

So, first things first, solder up some wires to 5V, GND and GPIO pin 2 on the ESP module. These connect to the 5V, GND and data pins on the NeoPixel Ring respectively.

It’s not very neat, but I used a hot glue gun to stick the ESP module into the bottom part of the bunker light, and fed the USB cable through for power and data.

I hot-glued the NeoPixel Ring in-place on the inside of the bunker light, in the centre, shining outwards towards the diffuser.

The bottom can then go back on and screws hold it in place. I used a hacksaw to create a little slot for the USB cable to sit in and then added hot-glue blobs for feet. All closed up, it looks like this underneath.

Looks a bit more professional from the top.

Code using ESPHome

I flashed the ESP8266 using ESPHome (see my earlier blog post) with this simple YAML config.

esphome:
  name: nightlight
  build_path: ./builds/nightlight
  platform: ESP8266
  board: huzzah
  esp8266_restore_from_flash: true

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Enable logging
logger:

# Enable Home Assistant API
api:
  password: '!secret api_password'

# Enable over the air updates
ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port

light:
  - platform: neopixelbus
    pin: GPIO2
    method: ESP8266_UART1
    num_leds: 16
    type: GRBW
    name: "Nightlight"
    effects:
      # Customize parameters
      - random:
          name: "Slow Random"
          transition_length: 30s
          update_interval: 30s
      - random:
          name: "Fast Random"
          transition_length: 4s
          update_interval: 5s
      - addressable_rainbow:
          name: Rainbow
          speed: 10
          width: 50
      - addressable_twinkle:
          name: Twinkle Effect
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_random_twinkle:
          name: Random Twinkle
          twinkle_probability: 5%
          progress_interval: 32ms
      - addressable_fireworks:
          name: Fireworks
          update_interval: 32ms
          spark_probability: 10%
          use_random_color: false
          fade_out_rate: 120
      - addressable_flicker:
          name: Flicker

The esp8266_restore_from_flash option is useful because if the light is on and someone accidentally turns it off, it will go back to the same state when it is turned back on. It does wear the flash out more quickly, however.

The important settings are the light component with the neopixelbus platform, which is where all the magic happens. We specify which GPIO on the ESP the data line on the NeoPixel Ring is connected to (pin 2 in my case). The method we use needs to match the pin (as discussed above) and in this example is ESP8266_UART1.

The number of LEDs must match the actual number on the NeoPixel Ring, in my case 16. This is used when talking to the on-chip LED driver and calculating effects, etc.

Similarly, the LED type is important as it determines which order the colours are in (swap around if colours don’t match). This must match the actual type of NeoPixel Ring, in my case I’m using an RGBW model which has a separate white LED and is in the order GRBW.

Finally, you get all sorts of effects for free, you just need to list the ones you want and any options for them. These show up in Home Assistant under the advanced view of the light (screenshot below).

Now it’s a matter of plugging the ESP module in and flashing it with esphome.

esphome nightlight.yaml run

Home Assistant

After a reboot, the device should automatically show up in Home Assistant under Configuration -> Devices. From here you can add it to the Lovelace dashboard and make Automations or Scripts for the device.

Nightlight in Home Assistant with automations

Adding it to Lovelace dashboard looks something like this, which lets you easily turn the light on and off and set the brightness.

You can also get advanced settings for the light, where you can change brightness, colours and apply effects.

Nightlight options

Effects

One of the great things about using ESPHome is all the effects which are defined in the YAML file. To apply an effect, choose it from the advanced device view in Home Assistant (as per screenshot above).

This is what rainbow looks like.

Nightlight running Rainbow effect

The kids love to select the colours and effects they want!

Automation

So, once you have the nightlight showing up in Home Assistant, we can create a simple automation to turn it on at sunset and off at sunrise.

Go to Configuration -> Automation and add a new one. You can fill in any name you like and there’s an Execute button there when you want to test it.

The trigger uses the Sun module and runs 10 minutes before sunset.

I don’t use Conditions, but you could. For example, only do this when someone’s at home.

The Actions are set to call the homeassistant.turn_on service and specify the device(s). Note this takes a comma separated list, so if you have more than one nightlight you can do it with the one automation rule.

That’s it! You can create another one for sunrise, but instead of calling homeassistant.turn_on just call homeassistant.turn_off and use Sunrise instead of Sunset.
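
If you prefer YAML over the UI, the sunset rule ends up looking something like this in automations.yaml (just a sketch; the entity id light.nightlight is an assumption, and the offset matches the 10 minutes before sunset mentioned above):

- alias: "Nightlight on at sunset"
  trigger:
    - platform: sun
      event: sunset
      offset: "-00:10:00"
  action:
    - service: homeassistant.turn_on
      entity_id: light.nightlight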

,

Gary PendergastBebo, Betty, and Jaco

Wait, wasn’t WordPress 5.4 just released?

It absolutely was, and congratulations to everyone involved! Inspired by the fine work done to get another release out, I finally completed the last step of co-leading WordPress 5.0, 5.1, and 5.2 (Bebo, Betty, and Jaco, respectively).

My study now has a bit more jazz in it. 🙂

,

Dave HallZoom's Make or Break Moment

Zoom is experiencing massive growth as large sections of the workforce transition to working from home. At the same time many problems with Zoom are coming to light. This is their make or break moment. If they fix the problems they end up with a killer video conferencing app. The alternative is that they join Cisco's Webex in the dumpster fire of awful enterprise software.

In the interest of transparency I am a paying Zoom customer and I use it for hours every day. I also use Webex (under protest) as it is a client's video conferencing platform of choice.

In the middle of last year Jonathan Leitschuh disclosed two bugs in Zoom with security and privacy implications. There was a string of failures that led to these bugs. To Zoom’s credit they published a long blog post about why these “features” were there in the first place.

Over the last couple of weeks other issues with Zoom have surfaced. “Zoom bombing”, or using random 9 digit numbers to find meetings, has become a thing. This is caused by Zoom’s meeting rooms having a 9 digit code to join. That’s really handy when you have to dial in and enter the number on your telephone keypad. The downside is that you have a 1 in 999 999 999 chance of joining a meeting when using a random number. Zoom does offer the option of requiring a password or PIN for each call. Unfortunately it isn’t the default. Publishing a blog post on how to secure your meetings isn’t enough; the app needs to be more secure by default. The app should default to enabling a 6 digit PIN when creating a meeting.

The Intercept is reporting Zoom’s marketing department got a little carried away when describing the encryption used in the product. This is an area where words matter. Encryption in transit is a baseline requirement in communication tools these days. Zoom has this, but their claims about end to end encryption appear to be false. End to end encryption is very important for some use cases. I await the blog post explaining this one.

I don’t know why Proton Mail’s privacy issues blog post got so much attention. This appears to be based on someone skimming the documentation rather than any real testing. Regardless the post got a lot of traction. Some of the same issues were flagged by the EFF.

Until recently Zoom’s FAQ read “Does Zoom sell Personal Data? […] Depends what you mean by ‘sell’”. I’m sure that sounded great in a meeting but it is worrying when you read it as a customer. Once called out on social media it was quickly updated and a blog post published. In the post, Zoom assures users it isn’t selling their data.

Joseph Cox reported late last week that Zoom was sending data to Facebook every time someone used their iOS app. It is unclear if Joe gave Zoom an opportunity to fix the issue before publishing the article. The company pushed out a fix after the story broke.

The most recent issue, which broke yesterday, is the Zoom macOS installer behaving like malware. This seems pretty shady behaviour, like their automatic reinstaller that was fixed last year. To his credit, Zoom founder and CEO Eric Yuan engaged with the issue on Twitter. This will be one to watch over the coming days.

Over the last year I have seen a consistent pattern when Zoom is called out on security and valid privacy issues with their platform. They respond publicly with “oops my bad” blog posts. Many of the issues appear to be a result of them trying to deliver a great user experience. Unfortunately they sometimes lean too far toward the UX and ignore the security and privacy implications of their choices. I hope that over the coming months we see Zoom correct this balance as problems are called out. If they do they will end up with an amazing platform in terms of UX while keeping their users safe.

Update: Since publishing this post, additional issues with Zoom were reported. Zoom’s CEO announced the company was committed to fixing their product.

,

Chris SmartDefining home automation devices in YAML with ESPHome and Home Assistant, no programming required!

Having built the core of my own “dumb” smart home system, I have been working on making it smart these past few years. As I’ve written about previously, the smart side of my home automation is managed by Home Assistant, which is an amazing, privacy focused open source platform. I’ve previously posted about running Home Assistant in Docker and in Podman.

Home Assistant, the privacy focused, open source home automation platform

I do have a couple of proprietary home automation products, including LIFX globes and Google Home. However, the vast majority of my home automation devices are ESP modules running open source firmware which connect to MQTT as the central protocol. I’ve built a number of sensors and lights and been working on making my light switches smart (more on that in a later blog post).

I already had experience with Arduino, so I started experimenting with this and it worked quite well. I then had a play with Micropython and really enjoyed it, but then I came across ESPHome and it blew me away. I have since migrated most of my devices to ESPHome.

ESPHome provides simple management of ESP devices

ESPHome is smart in making use of PlatformIO underneath, but its beauty lies in the way it abstracts away the complexities of programming for embedded devices. In fact, no programming is necessary! You simply have to define your devices in YAML and run a single command to compile the firmware blob and flash a device. Loops, initialising and managing multiple inputs and outputs, reading and writing to I/O, PWM, functions and callbacks, connecting to WiFi and MQTT, hosting an AP, logging and more is taken care of for you. Once up, the devices support mDNS and unencrypted over the air updates (which is fine for my local network). It supports both Home Assistant API and MQTT (over TLS for ESP8266) as well as lots of common components. There is even an addon for Home Assistant if you prefer using a graphical interface, but I like to do things on the command line.

When combined with Home Assistant, new devices are automatically discovered and appear in the web interface. When using MQTT, the channels are set with the retain flag, so that the devices themselves and their last known states are not lost on reboots (you can disable this for testing).

That’s a lot of things you get for just a little bit of YAML!

Getting started

Getting started is pretty easy, just install esphome using pip.

pip3 install --user esphome

Of course, you will need a real physical ESP device of some description. Thanks to PlatformIO, lots of ESP8266 and ESP32 devices are supported. Although built on similar SoCs, different devices break out different pins and can have different flashing requirements. Therefore, specifying the exact device is good and can be helpful, but it’s not strictly necessary.
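
For instance, something like the generic esp32dev definition will usually get an ESP32 module going while you track down the exact board name (just a sketch; swap in your own device name):

esphome:
  name: example
  platform: ESP32
  board: esp32dev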

It’s not just ESP modules that are supported. These days a number of commercial products are being built using ESP8266 chips which we can flash, like Sonoff power modules, Xiaomi temperature sensors, Brilliant Smart power outlets and Mirabella Genio light bulbs (I use one of these under my stairs).

For this post though, I will use one of my MH-ET Live ESP32Minikit devices as an example, which has the device name of mhetesp32minikit.

MH-ET Live ESP32Minikit

Managing configs with Git

Everything with your device revolves around your device’s YAML config file, including configuration, flashing, accessing logs, clearing out MQTT messages and more.

ESPHome has a wizard which will prompt you to enter your device details and WiFi credentials. It’s a good way to get started, however it only creates a skeleton file and you have to continue configuring the device manually to actually do anything anyway. So, I think ultimately it’s easier to just create and manage your own files, which we’ll do below. (If you want to give it a try, you can run the command esphome example.yaml wizard which will create an example.yaml file.)

I have two Git repositories to manage my ESPHome devices. The first one is for my WIFI and MQTT credentials (this is private and local, it is not pushed to GitHub), which are stored as variables in a file called secrets.yaml (store them in an Ansible vault, if you like). ESPHome automatically looks for this file when compiling firmware for a device and will use those variables.

Let’s create the Git repo and secrets file, replacing the details below with your own. Note that I am including the settings for an MQTT server, which is unencrypted in the example. If you’re using an MQTT server online you may want to use an ESP8266 device instead and enable TLS fingerprints for a more secure connection. I should also mention that MQTT is not required, devices can also use the Home Assistant API and if you don’t use MQTT those variables can be ignored (or you can leave them out).

mkdir ~/esphome-secrets
cd ~/esphome-secrets
cat > secrets.yaml << EOF
wifi_ssid: "ssid"
wifi_password: "wifi-password"
api_password: "api-password"
ota_password: "ota-password"
mqtt_broker: "mqtt-ip"
mqtt_port: 1883
mqtt_username: "mqtt-username"
mqtt_password: "mqtt-password"
EOF
git init
git add .
git commit -m "esphome secrets: add secrets"

The second Git repo has all of my device configs and references the secrets file from the other repo. I name each device’s config file the same as its name (e.g. study.yaml for the device that controls my study). Let’s create the Git repo and link to the secrets file and ignore things like the builds directory (where builds will go!).

mkdir ~/esphome-configs
cd ~/esphome-configs
ln -s ../esphome-secrets/secrets.yaml .
cat > .gitignore << EOF
/.esphome
/builds
/.*.swp
EOF
git init
git add .
git commit -m "esphome configs: link to secrets"

Creating a config

The config file contains different sections with core settings. You can leave some of these settings out, such as api, which will disable that feature on the device (esphome is required).

  • esphome – device details and build options
  • wifi – wifi credentials
  • logger – enable logging of device to see what’s happening
  • ota – enables over the air updates
  • api – enables the Home Assistant API to control the device
  • mqtt – enables MQTT to control the device

Now that we have our base secrets file, we can create our first device config! Note that settings with !secret are referencing the variables in our secrets.yaml file, thus keeping the values out of our device config. Here’s our new base config for an ESP32 device called example in a file called example.yaml which will connect to WiFi and MQTT.

cat > example.yaml << EOF
esphome:
  name: example
  build_path: ./builds/example
  platform: ESP32
  board: mhetesp32minikit

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:

api:
  password: !secret api_password

ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port
  # Set to true when finished testing to set MQTT retain flag
  discovery_retain: false
EOF

Compiling and flashing the firmware

First, plug your ESP device into your computer, which should bring up a new TTY, such as /dev/ttyUSB0 (check dmesg). Now that you have the config file, we can compile it and flash the device (you might need to be in the dialout group). The run command actually does a number of things, including a sanity check, compile, flash and tailing the log.

esphome example.yaml run

This will compile the firmware in the specified build dir (./builds/example) and prompt you to flash the device. As this is a new device, an over the air update will not work yet, so you’ll need to select the TTY device. Once the device is running and connected to WiFi you can use OTA.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 

Once it is flashed, the device is automatically rebooted. The terminal should now be automatically tailing the log of the device (we enabled logger in the config). If not, you can tell esphome to tail the log by running esphome example.yaml logs.

INFO Successfully uploaded program.
INFO Starting log output from /dev/ttyUSB0 with baud rate 115200
[21:30:17][I][logger:156]: Log initialized
[21:30:17][C][ota:364]: There have been 0 suspected unsuccessful boot attempts.
[21:30:17][I][app:028]: Running through setup()...
[21:30:17][C][wifi:033]: Setting up WiFi...
[21:30:17][D][wifi:304]: Starting scan...
[21:30:19][D][wifi:319]: Found networks:
[21:30:19][I][wifi:365]: - 'ssid' (02:18:E6:22:E2:1A) ▂▄▆█
[21:30:19][D][wifi:366]:     Channel: 1
[21:30:19][D][wifi:367]:     RSSI: -54 dB
[21:30:19][I][wifi:193]: WiFi Connecting to 'ssid'...
[21:30:23][I][wifi:423]: WiFi Connected!
[21:30:23][C][wifi:287]:   Hostname: 'example'
[21:30:23][C][wifi:291]:   Signal strength: -50 dB ▂▄▆█
[21:30:23][C][wifi:295]:   Channel: 1
[21:30:23][C][wifi:296]:   Subnet: 255.255.255.0
[21:30:23][C][wifi:297]:   Gateway: 10.0.0.123
[21:30:23][C][wifi:298]:   DNS1: 10.0.0.1
[21:30:23][C][ota:029]: Over-The-Air Updates:
[21:30:23][C][ota:030]:   Address: example.local:3232
[21:30:23][C][ota:032]:   Using Password.
[21:30:23][C][api:022]: Setting up Home Assistant API server...
[21:30:23][C][mqtt:025]: Setting up MQTT...
[21:30:23][I][mqtt:162]: Connecting to MQTT...
[21:30:23][I][mqtt:202]: MQTT Connected!
[21:30:24][I][app:058]: setup() finished successfully!
[21:30:24][I][app:100]: ESPHome version 1.14.3 compiled on Mar 30 2020, 21:29:41

You should see the device boot up and connect to your WiFi and MQTT server successfully.

Adding components

Great! Now we have a basic YAML file, let’s add some components to make it do something more useful. Components are high level groups, like sensors, lights, switches, fans, etc. Each component is divided into platforms which is where different devices of that type are supported. For example, two of the different platforms under the light component are rgbw and neopixelbus.

One thing that’s useful to know is that platform devices with the name property set in the config will appear in Home Assistant. Those without will only be local to the device and just have an id. This is how you can link multiple components together on the device, then present a single device to Home Assistant (like the garage remote below).

Software reset switch

The first thing we can do is add a software switch which will let us reboot the device from Home Assistant (or by publishing manually to MQTT or the API). To do this, we add the restart platform from the switch component. It’s as simple as adding this to the bottom of your YAML file.

switch:
  - platform: restart
    name: "Example Device Restart"

That’s it! Now we can re-run the compile and flash. This time you can use OTA to flash the device via mDNS (but if it’s still connected via TTY then you can still use that instead).

esphome example.yaml run

This is what OTA updates look like.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 2
INFO Resolving IP address of example.local
INFO  -> 10.0.0.123
INFO Uploading ./builds/example/.pioenvs/example/firmware.bin (856368 bytes)
Uploading: [=====================================                       ] 62% 

After the device reboots, the new reset button should automatically show up as a device in Home Assistant under Configuration -> Devices, with the name example.

Home Assistant with auto-detected example device and reboot switch

Because we set a name for the restart switch, it is visible and called Example Device Restart. If you want to make this visible on the main Overview dashboard, you can do so by selecting ADD TO LOVELACE.

Go ahead and toggle the switch while still tailing the log of the device and you should see it restart. If you’ve already disconnected your ESP device from your computer, you can tail the log using MQTT.

LED light switch

OK, so rebooting the device is cute. Now what if we want to add something more useful for home automation? Well that requires some soldering or breadboard action, but what we can do easily is use the built-in LED on the device as a light and control it through Home Assistant.

On the ESP32 module, the built-in LED is connected to GPIO pin 2. We will first define that pin as an output component using the ESP32 LEDC platform (supports PWM). We then attach a light component using the monochromatic platform to that output component. Let’s add those two things to our config!

output:
  # Built-in LED on the ESP32
  - platform: ledc
    pin: 2
    id: output_ledpin2

light:
  # Light created from built-in LED output
  - platform: monochromatic
    name: "Example LED"
    output: output_ledpin2

Build and flash the new firmware again.

esphome example.yaml run

After the device reboots, you should now be able to see the new Example LED automatically in Home Assistant.

Example device page in Home Assistant showing new LED light

If we toggle this light a few times, we can see the built-in LED on the ESP device fading in and out at the same time.

Other components

As mentioned previously, there are many devices we can easily add to a single board like relays, PIR sensors, temperature and humidity sensors, reed switches and more.

Reed switch, relay, PIR, temperature and humidity sensor (from top to bottom, left to right)

All we need to do is connect them up to appropriate GPIO pins and define them in the YAML.

PIR sensor

A PIR sensor connects to ground and 3-5V, with data connecting to a GPIO pin (let’s use 34 in the example). We read the GPIO pin and can tell when motion is detected because the control pin voltage is set to high. Under ESPHome we can use the binary_sensor component with gpio platform. If needed, pulling the pin down is easy, just set the default mode. Finally, we set the class of the device to motion which will set the appropriate icon in Home Assistant. It’s as simple as adding this to the bottom of your YAML file.

binary_sensor:
  - platform: gpio
    pin:
      number: 34
      mode: INPUT_PULLDOWN
    name: "Example PIR"
    device_class: motion

Again, compile and flash the firmware with esphome.

esphome example.yaml run

As before, after the device reboots again we should see the new PIR device appear in Home Assistant.

Example device page in Home Assistant showing new PIR input

Temperature and humidity sensor

Let’s do another example, a DHT22 temperature sensor connected to GPIO pin 16. Simply add this to the bottom of your YAML file.

sensor:
  - platform: dht
    pin: 16
    model: DHT22
    temperature:
      name: "Example Temperature"
    humidity:
      name: "Example Humidity"
    update_interval: 10s

Compile and flash.

esphome example.yaml run

After it reboots, you should see the new temperature and humidity inputs under devices in Home Assistant. Magic!

Example device page in Home Assistant showing new temperature and humidity inputs

Garage opener using templates and logic on the device

Hopefully you can see just how easy it is to add things to your ESP device and have them show up in Home Assistant. Sometimes though, you need to make things a little more tricky. Take opening a garage door for example, which only has one button to start and stop the motor in turn. To emulate pressing the garage opener, you need to apply voltage to the opener’s push button input for a short while and then turn it off again. We can do all of this easily on the device with ESPHome and present a single button to Home Assistant.

Let’s assume we have a relay connected up to a garage door opener’s push button (PB) input. The relay control pin is connected to our ESP32 on GPIO pin 22.

ESP32 device with relay module, connected to garage opener inputs

We need to add a couple of devices to the ESP module and then expose only the button out to Home Assistant. Note that the relay only has an id, so it is local only and not presented to Home Assistant. However, the template switch which uses the relay has a name, and it has an action which causes the relay to be turned on and off, emulating a button press.

Remember we already added a switch component for the restart platform? Now we need to add the new platform devices to that same section (don’t create a second switch entry).

switch:
  - platform: restart
    name: "Example Device Restart"

  # The relay control pin (local only)
  - platform: gpio
    pin: GPIO22
    id: switch_relay

  # The button to emulate a button press, uses the relay
  - platform: template
    name: "Example Garage Door Remote"
    icon: "mdi:garage"
    turn_on_action:
    - switch.turn_on: switch_relay
    - delay: 500ms
    - switch.turn_off: switch_relay

Compile and flash again.

esphome example.yaml run

After the device reboots, we should now see the new Garage Door Remote in the UI.

Example device page in Home Assistant showing new garage remote inputs

If you actually cabled this up and toggled the button in Home Assistant, the UI button would turn on and you would hear the relay click on, then off, then the UI button would go back to the off state. Pretty neat!

There are many other things you can do with ESPHome, but this is just a taste.

Commit your config to Git

Once you have a device to your liking, commit it to Git. This way you can track the changes you’ve made and can always go back to a working config.

git add example.yaml
git commit -m "adding my first example config"

Of course it’s probably a good idea to push your Git repo somewhere remote, perhaps even share your configs with others!

Creating automation in Home Assistant

Of course once you have all these devices it’s great to be able to use them in Home Assistant, but ultimately the point of it all is to automate the home. Thus, you can use Home Assistant to set up scripts and react to things that happen. That’s beyond the scope of this particular post though, as I really wanted to introduce ESPHome and show how you can easily manage devices and integrate them with Home Assistant. There is pretty good documentation online, though, and a rough sketch of what an automation can look like is below. Enjoy!
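
To give a flavour, a motion-light automation in Home Assistant could look something like this in YAML (just a sketch; the entity ids binary_sensor.example_pir and light.example_led are assumptions based on the names we set above, so check what your install actually calls them):

- alias: "Example LED on motion"
  trigger:
    - platform: state
      entity_id: binary_sensor.example_pir
      to: "on"
  action:
    - service: light.turn_on
      entity_id: light.example_led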

Overriding PlatformIO

As a final note, if you need to override something from PlatformIO, for example specifying a specific version of a dependency, you can do that by creating a modified platformio.ini file in your configs dir (copy from one of your build dirs and modify as needed). This way esphome will pick it up and apply it for you automatically.

,

Robert CollinsStrength training from home

For the last year I’ve been incrementally moving away from lifting static weights and towards body weight based exercises, or callisthenics. I’ve been doing this for a number of reasons, including better avoidance of injury (if I collapse, the entire stack is dynamic, if a bar held above my head drops on me, most of the weight is just dead weight – ouch), accessibility during travel – most hotel gyms are very poor, and functional relevance – I literally never need to put 100 kg on my back, but I do climb stairs, for instance.

Covid-19 shutting down the gym where I train is a mild inconvenience for me as a result, because even though I don’t do it, I am able to do nearly all my workouts entirely from home. And I thought a post about this approach might be of interest to other folk newly separated from their training facilities.

I’ve gotten most of my information from a few different youtube channels:

There are many more channels out there, and I encourage you to go and look and read and find out what works for you. Those 5 are my greatest hits, if you will. I’ve bought the FitnessFAQs exercise programs to help me with my training, and they are indeed very effective.

While you don’t need a gymnasium, you do need some equipment, particularly if you can’t go and use a local park. Exactly what you need will depend on what you choose to do – for instance, doing dips on the edge of a chair can avoid needing any equipment, but doing them with some portable parallel bars can be much easier. Similarly, doing pull ups on the edge of a door frame is doable, but doing them with a pull-up bar is much nicer on your fingers.

Depending on your existing strength you may not need bands, but I certainly did. Buying rings is optional – I love them, but they aren’t needed to have a good solid workout.

I bought parallettes for working on the planche. Parallel bars for dips and rows. A pull-up bar for pull-ups and chin-ups, though with the rings you can add flys, rows, face-pulls, unstable push-ups and more. The rings. And a set of 3 bands that combine for 7 different support amounts.

In terms of routine, I do an upper/lower split, with 3 days on upper body, one day off, one day on lower, and the weekends off entirely. I was doing 2 days on lower body, but found I was over-training with Aikido later that same day.

On upper body days I’ll do (roughly) chin ups or pull ups, push ups, rows, dips, hollow body and arch body holds, handstands and some grip work. Today, as I write this on Sunday evening, 2 days after my last training day on Friday, I can still feel my lats and biceps from training Friday afternoon. Zero issue keeping the intensity up.

For lower body, I’ll do pistol squats, nordic drops, quad extensions, wall sits, single leg calf raises, bent leg calf raises. Again, zero issues hitting enough intensity to achieve growth / strength increases. The only issue at home is having a stable enough step to get a good heel drop for the calf raises.

If you haven’t done bodyweight training at all before, when starting, don’t assume it will be easy – even if you’re a gym junkie, our bodies are surprisingly heavy, and there’s a lot of resistance just moving them around.

Good luck, train well!

OpenSTEMOnline Teaching

The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]

The post Online Teaching first appeared on OpenSTEM Pty Ltd.

Brendan ScottCovid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent.  The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in a 1000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting at testing immediately they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test.  Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections “epi graph” today looks like this:

200322_new-and-cumulative-covid-19-cases-in-australia-by-notification-date_1

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor the troughs are representative of infection surges/retreats; they simply reflect when tests are being processed. This seems to be a 4 day cycle, so on average it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show + time to seek a test + referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

,

OpenSTEMCOVID-19 (of course)

We thought it timely to review a few facts and observations, relying on published medical papers (or those submitted for peer review) and reliable sources.

The post COVID-19 (of course) first appeared on OpenSTEM Pty Ltd.

,

Ben MartinTerry2020 finally making the indoor beast more stable

Over time the old Terry robot had evolved from a basic "T" shape to have pan and tilt and a robot arm on board. The rear caster(s) were the weakest part of the robot, allowing the whole thing to rock around more than it should. I now have Terry 2020 on the cards.


Part of this is an upgrade to a Kinect2 for navigation. The power requirements of that (12v/3a or so) have led me to putting a better dc-dc bus on board and some relays to be able to pragmatically shut down and bring up features as needed and conserve power otherwise. The new base footprint is 300x400mm though the drive wheels stick out the side.

The wheels out the sides is partially due to the planetary gear motors (on the underside) being quite long. If it is an issue I can recut the lowest layer alloy and move them inward, but I don't really need the absolute minimal turning circle. If that were the case I would move the drive wheels to the middle of the chassis so it could turn on its center.

There will be 4 layers at the moment and a mezzanine below the arm. So there will be expansion room included in the build :)

The rebuild will allow Terry to move at top speed when self driving. Terry will never move at the speed of an outdoor robot but can move closer to its potential when it rolls again.

,

Ben MartinBidirectional rc joystick

With a bit of tinkering one can use the https://github.com/bmellink/IBusBM library to send information back to the remote controller. The info is tagged as either temperature, rpm, or voltage, with units set based on that. There is a limit of 9 user feedbacks so I have 3 of each exposed.


To do this I used one of the Mega 2560 boards that is in a small form factor configuration. This gave me 5 volts to run the actual rc receiver from and more than one UART to talk to the usb, input and output parts of the buses. I think you only need 2 UARTs but as I had a bunch I just used separate ones.

The 2560 also gives a lavish amount of ram so using ROS topics doesn't really matter. I have 9 subscribers and 1 publisher on the 2560. The 9 subscribers allow sending temp, voltage and rpm info back to the remote, and give flexibility in what is sent so that can be adjusted on the robot itself.

I used a servo extension cable to carry the base 5v, ground, and rx signals from the ibus out on the rc receiver unit. Handy as the servo plug ends can be taped together for the more bumpy environment that the hound likes to tackle. I wound up putting the diode floating between two extension wires on the (to tx) side of the bus.



The 1 publisher just sends an array with the raw RC values in it. With minimal delays I can get a reasonably steady 120hz publication of rc values. So now the houndbot can tell me when it is getting hungry for more fresh electrons from a great distance!

I had had some problems with the nano and the rc unit locking up. I think perhaps this was due to crystals as the uno worked ok. The 2560 board has been bench tested for 30 minutes, which was enough time to expose the issues on the nano.


Matthew OliverPOC Wireguard + FRR: Now with OSPFv2!

If you read my last post, I set up a POC with wireguard and FRR to have the power of wireguard (WG) but with all the routing worked out by FRR. But I had a problem. When using RIPv2, the broadcast messages seemed to get stuck in the WG interfaces until I tcpdumped them. This meant that once I tcpdumped, the routes would get through, but only to eventually go stale and disappear.

I talked with the awesome people in the #wireguard IRC channel on freenode and was told to simply stay clear of RIP.

So I revisited my POC env and swapped out RIP for OSPF.. and guess what.. it worked! Now all the routes get propagated and they stay there. Which means if I decided to add new WG links and make it grow, so should all the routing:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.1.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Isn’t that beautiful, all networks on one of the more distant nodes, including network 1 (172.16.1.0/24).

I realise this doesn’t make much sense unless you read the last post, but never fear, I thought I’d rework and append the build notes here, in case you’re interested again.

Build notes – This time with OSPFv2

The topology we’ll be building

Seeing that this is my Suse hackweek project and I now use OpenSuse, I’ll be using OpenSuse Leap 15.1 for all the nodes (and the KVM host too).

Build the env

I used ansible-virt-infra created by csmart to build the env. I created my own inventory file called wireguard.yml, which you can dump in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined, we do this in the kvmhost inventory file, here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml

Setting up the IPs and tunnels

The above infrastructure tool uses cloud-init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s ok because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure wireguard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the wireguard repo to the nodes and install it. I look forward to 5.6, where wireguard will be included in the kernel:

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have 2:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, lets create all the wireguard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

This time we’ll be using OSPFv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ospfd=no/ospfd=yes/' /etc/frr/daemons"

And with that now we just need to do all per server things like add IPs and configure all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands, obviously in a real env you’d want to use systemd-networkd or something to make these stick.

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
network 10.0.3.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.3.0/24 area 0.0.0.0
network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

After all this, you should now be where I’m up to: an environment that is sharing routes through the WG interfaces.

The current issue I have is that if I go and ping from wireguard-1 to wireguard-5, the ICMP packet happily routes through into the 10.0.3.0/24 tunnel. When it pops out in wg1 of wireguard-4 the kernel isn’t routing it onto wireguard-5 through wg0, or WG isn’t putting the packet into the IP stack or forwarding queue to continue its journey.

Well, that is my current assumption. Hopefully I’ll get to the bottom of it soon, in which case I’ll post it here 🙂

,

Matthew OliverPOC WireGuard + FRR Setup a.k.a dodgy meshy test network

It’s hackweek at Suse! Probably one of my favourite times of year, though I think they come up every 9 months or so.

Anyway, this hackweek I’ve been on a WireGuard journey. I started reading the paper and all the docs. Briefly looking into the code, sitting in the IRC channel and joining the mailing list to get a feel for the community.

There is still 1 day left of hackweek, so I hope to spend more time in the code, and maybe, just maybe, see if I can fix a bug.. although they don’t seem to have a tracker like most projects, so let’s see how that goes.

The community seems pretty cool. The tech is, frankly, pretty amazing; even I, from a cloud storage background, understood most of the paper.

I had set up a tunnel, tcpdumped traffic and used wireshark to look closely at the packets as I read the paper, which was very informative. But I really wanted to get a feel for how this tech could work. They do have a wg-dynamic project which is planning on using wg as a building block to do cooler things, like mesh networking. This sounds cool, so I wanted to sink my teeth in and see, not how wg-dynamic works, but whether I could build something similar out of existing OSS tech, and see where the gotchas are, outside of it obviously being less secure. It seemed like a good way to better understand the technology.

So on Wednesday, I decided to do just that. Today is Thursday and I’ve gotten to a point where I can say I partially succeeded. And before I delve in deeper and try and figure out my current stumbling block, I thought I’d write down where I am.. and how I got here.. to:

  1. Point the wireguard community at, in case they’re interested.
  2. So you all can follow along at home, because it’s pretty interesting, I think.

As the title suggests, the plan is/was to set up a bunch of tunnels and use FRR to set up some routing protocols to talk via these tunnels, auto-magically 🙂

UPDATE: The problem I describe in this post, routes becoming stale, only seems to happen when using RIPv2. When I change it to OSPFv2 all the routes work as expected!! Will write a follow up post to explain the differences.. in fact may rework the notes for it too 🙂

The problem at hand

Test network VM topology

A picture is worth 1000 words. The basic idea is to simulate a bunch of machines and networks connected over wireguard (WG) tunnels. So I created 6 vms, connected as you can see above.

I used Chris Smart’s ansible-virt-infra project, which is pretty awesome, to build up the VMs and networks as you see above. I’ll leave my build notes as an appendix to this post.

Once I had the infrastructure set up, I built all the tunnels as they are in the image. Then I went ahead and installed FRR on all the nodes with tunnels (nodes 1, 2, 4, and 5). To keep things simple, I started with the easiest routing protocol to configure, RIPv2.

Believe it or not, everything seemed to work.. well mostly. I can jump on, say, node 5 (wireguard-5 if you’re playing along at home) and:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Looks good, right? We see routes for networks 172.16.{0,2,3,4,5}.0/24. Network 1 isn’t there, but hey, that’s quite far away, maybe it hasn’t made it yet. Which leads to the real issue.

If I go and run ip r again, soon all these routes will become stale and disappear. Running ip -ts monitor shows just that.

So the question is, what’s happening to the RIP advertisements? And yes they’re still being sent. Then how come some made it to node 5, and never again.

The simple answer is, it was me. The long answer is, I’ve never used FRR before, and it just didn’t seem to be working. So I started debugging the env. To debug, I had a tmux session opened on the KVM host with a tab for each node running FRR. I’d go to each tab and run tcpdump to check to see if the RIP traffic was making it through the tunnel. And almost instantly, I saw traffic, like:

suse@wireguard-5:~> sudo tcpdump -v -U -i wg0 port 520
tcpdump: listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
03:01:00.006408 IP (tos 0xc0, ttl 64, id 62964, offset 0, flags [DF], proto UDP (17), length 52)
10.0.4.105.router > 10.0.4.255.router:
RIPv2, Request, length: 24, routes: 1 or less
AFI 0, 0.0.0.0/0 , tag 0x0000, metric: 16, next-hop: self
03:01:00.007005 IP (tos 0xc0, ttl 64, id 41698, offset 0, flags [DF], proto UDP (17), length 172)
10.0.4.104.router > 10.0.4.105.router:
RIPv2, Response, length: 144, routes: 7 or less
AFI IPv4, 0.0.0.0/0 , tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 10.0.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 10.0.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.0.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 172.16.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.4.0/24, tag 0x0000, metric: 1, next-hop: self

At first I thought it was good timing. I jumped to another host, and when I tcpdumped, the RIP packets turned up instantaneously. This happened again and again.. and yes, it took me longer than I’d like to admit before it dawned on me.

Why are routes going stale? It seems as though the packets are getting queued/stuck in the WG interface until I poked it with tcpdump!

The RIPv2 Request packets are sent as a broadcast, not directly to the other end of the tunnel. To stop them being dropped, I had to widen my WG peer allowed-ips from a /32 to a /24.
So now I wonder whether the broadcast, or just the fact that the packet is only 52 bytes, means it gets queued up and not sent through the tunnel, that is until I come along with a hammer and tcpdump the interface?
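For reference, widening the allowed-ips on node 5 was just a matter of re-running wg set for the node 4 peer with the broader range (wg set replaces a peer's allowed-ips list rather than appending to it):

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24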

Maybe one way I could test this is to speed up the RIP broadcasts and hopefully fill a buffer, or to see if I can put WG, or rather the kernel module, into debugging mode.
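Both should be doable. As a sketch (I haven't folded these into the build notes below), FRR's ripd lets you shrink the RIP update timer, and the WireGuard module logs through the kernel's dynamic debug facility (needs dynamic debug support and debugfs mounted):

# Drop the RIP update interval from the default 30s to 5s (update/timeout/garbage timers)
sudo vtysh -c "configure terminal" -c "router rip" -c "timers basic 5 15 10"

# Enable WireGuard's pr_debug messages, then watch dmesg -w for the module's view of those packets
echo 'module wireguard +p' | sudo tee /sys/kernel/debug/dynamic_debug/control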

Build notes

As promised, here is the current form of my build notes; they make reference to the topology image above.

BTW, I'm using openSUSE Leap 15.1 for all the nodes.

Build the env

I used ansible-virt-infra, created by csmart, to build the environment. I created my own inventory file, called wireguard.yml, which you can drop in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined. We do this in the kvmhost inventory file; here's a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml

Setting up the IPs and tunnels

The infrastructure tool above uses cloud-init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That's OK, because we want to use the numbers from our diagram anyway 🙂
Before we get to that, let's update all the nodes and make sure wireguard is set up.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the wireguard repo to the nodes and install it (I look forward to kernel 5.6, where wireguard will be included in mainline):

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to the nodes that have two tunnels:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now, while we're at it, let's create all the wireguard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let's make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (nodes 1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we're at it, we might as well add the network repo and then install FRR on those nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

We’ll be using RIPv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ripd=no/ripd=yes/' /etc/frr/daemons"

And with that done, we just need to do all the per-server things, like adding IPs and configuring the keys, peers, etc. We'll do this one host at a time.
NOTE: As this is a PoC we're just using ip commands; obviously in a real environment you'd want to use systemd-networkd or something to make these stick.
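For the record, a persistent version of one of these tunnels under systemd-networkd would look roughly like this sketch (shown for wg0 on node 1; the file names are arbitrary, the values are the same ones used in the commands below, and it needs a systemd new enough to know about WireGuard netdevs — older versions also only take the key inline via PrivateKey= rather than PrivateKeyFile=):

sudo tee /etc/systemd/network/wg0.netdev <<EOF
[NetDev]
Name=wg0
Kind=wireguard

[WireGuard]
PrivateKeyFile=/etc/wireguard/wg0-privatekey
ListenPort=51821

[WireGuardPeer]
PublicKey=P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s=
AllowedIPs=10.0.2.0/24
Endpoint=172.16.2.102:51822
EOF

sudo tee /etc/systemd/network/wg0.network <<EOF
[Match]
Name=wg0

[Network]
Address=10.0.2.101/24
EOF

With systemd-networkd enabled (systemctl enable --now systemd-networkd), that would bring the interface up at boot instead of the ad hoc commands below.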

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
network wg1
no passive-interface wg1
EOF

sudo systemctl restart frr

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0

network wg1
no passive-interface wg1
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

When this _is_ all working, we'd probably need to open up the allowed-ips on the WG tunnels. We could start by just adding 172.16.0.0/16 to the list. That might allow us to route packets to the other networks.
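As a rough example, on node 5 that would be another wg set, listing both the tunnel range and the new summary route (since the allowed-ips list is replaced, not merged):

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24,172.16.0.0/16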

If we want to find routes out to the internet, then we may need 0.0.0.0/0, but I'm not sure how WG will route that, as it uses the allowed-ips and public keys as a routing table. I guess it may not care, as we only have a 1:1 mapping on each tunnel, and if we can route to the WG interface it's pretty straightforward.
This is something I hope to test.

Another really beneficial test would be to rebuild this environment using IPv6 and see if things work better, as we wouldn't have any broadcasts anymore, only unicast and multicast.

It would also be worth trying some other routing protocols in general, like OSPF.

Finally, having to continually adjust allowed-ips, seemingly either opening it up more or adding more ranges, makes me realise why the wg-dynamic project exists, and why they want to come up with a secure routing protocol to use through the tunnels to do something similar. So let's keep an eye on that project.

,

Pia AndrewsA quick reflection on digital for posterity

On the eve of moving to Ottawa to join the Service Canada team (squee!) I thought it would be helpful to share a few things for posterity. There are three things below:

  • Some observations that might be useful
  • A short overview of the Pia Review: 20 articles about digital public sector reform
  • Additional references I think are outstanding and worth considering in public sector digital/reform programs, especially policy transformation

Some observations

Moving from deficit to aspirational planning

Risk! Risk!! Risk!!! That one word is responsible for an incredible amount of fear, inaction, redirection of investment and counter-productive behaviours, especially by public sectors for whom the stakes for the economy and society are so high. But when you focus all your efforts on mitigating risks, you are trying to drive by only using the rear vision mirror, planning your next step based on the issues you’ve already experienced without looking to where you need to be. It ultimately leads to people driving slower and slower, often grinding to a halt, because any action is considered more risky than inaction. This doesn’t really help our metaphorical driver to pick up the kids from school or get supplies from the store. In any case, inaction bears as many risks as action in a world that is continually changing. For example, if our metaphorical driver were to stop the car in an intersection, they would likely be hit by another vehicle, or eventually starve to death.

Action is necessary. Change is inevitable. So public sectors must balance our time between being responsive (not reactive) to change and risks, and being proactive towards a clear goals or future state.

Of course, risk mitigation is what many in government think they most urgently need to address; however, to engage only with that is to buy into and perpetuate the myth that the increasing pace of change is itself a bad thing. This is the difference between user polling and user research: users think they need faster horses but actually they need a better way to transport more people over longer distances, which could lead to alternatives to horses. Shifting from a change-pessimistic framing to change optimism is critical for public sectors to start to build responsiveness into their policy, program and project management. Until public servants embrace change as normal, natural and part of their work, fear and fear-based behaviours will drive reactivism and sub-optimal outcomes.

The OPSI model for innovation would be a helpful tool to ask senior public servants what proportion of their digital investment is in which box, as this will help identify how aspirational vs reactive, and how top down or bottom up they are, noting that there really should be some investment and tactics in all four quadrants.

Innovation Facets Diamond

My observation of many government digital programs is that teams spend a lot of their time doing top down (directed) work that focuses on areas of certainty, but miss out on building the capacity or vision required for bottom up innovation, or anything that genuinely explores and engages with areas of uncertainty. Central agencies and digital transformation teams are in the important and unique position to independently stand back, see the forest for the trees, and help shape systemic responses to all-of-system problems. My biggest recommendation would be for these teams to support public sector partners to build change optimism, proactive planning, and responsiveness/resilience into their approaches, so as to be more genuinely strategic and effective in dealing with change and, more importantly, to better plan strategically towards something meaningful for their context.

Repeatability and scale

All digital efforts might be considered through the lens of repeatability and scale.

  • If you are doing something, anything, could you publish it or a version of it for others to learn from or reuse? Can you work in the open for any of your work (not just publish after the fact)? If policy development, new services or even experimental projects could be done openly from the start, they will help drive a race to the top between departments.
  • How would the thing you are considering scale? How would you scale impact without scaling resources? Basically, for anything you do, if you’d need to dramatically scale resources to implement it, then you are not getting an exponential response to the problem.

Sometimes doing non scalable work is fine to test an idea, but actively trying to differentiate between work that addresses symptomatic relief versus work that addresses causal factors is critical, otherwise you will inevitably find 100% of your work program focused on symptomatic relief.

It is critical to balance programs according to both fast value (short term delivery projects) and long value (multi month/year program delivery), reactive and proactive measures, symptomatic relief and addressing causal factors, & differentiating between program foundations (gov as a platform) and programs themselves. When governments don’t invest in digital foundations, they end up duplicating infrastructure for each and every program, which leads to the reduction of capacity, agility and responsiveness to change.

Digital foundations

Most government digital programs seem to focus on small experiments, which is great for individual initiatives, but may not lay the reusable digital foundations for many programs. I would suggest that in whatever projects the team embark upon, some effort be made to explore and demonstrate what the digital foundations for government should look like. For example:

  • Digital public infrastructure - what are the things government is uniquely responsible for that it should make available as digital public infrastructure for others to build upon, and indeed for itself to consume. Eg, legislation as code, services registers, transactional service APIs, core information and data assets (spatial, research, statistics, budgets, etc), central budget management systems. “Government as a Platform” is a digital and transformation strategy, not just a technology approach.
  • Policy transformation and closing the implementation gap - many policy teams think the issue of policy intent not being realised is not their problem, so showing the value of multidisciplinary, test-driven and end-to-end policy design and implementation will dramatically shift digital efforts towards more holistic, sustainable and predictable policy and societal outcomes.
  • Participatory governance - departments need to engage the public in policy, services or program design, so demonstrating the value of participatory governance is key. This is not a nice-to-have, but rather a necessary part of delivering good services. Here is a recent article with some concepts and methods to consider, and the team needs capabilities to enable this that aren’t just communications skills, but rather genuine engagement and subject matter expertise.
  • Life Journey programs - putting digital transformation efforts, policies, service delivery improvements and indeed any other government work in the context of life journeys helps to make it real, gets multiple entities that play a part in that journey naturally involved and invested, and drives horizontal collaboration across and between jurisdictions. New Zealand led the way in this, NSW Government extended the methodology, Estonia has started the journey and they are systemically benefiting.
  • I’ve spoken about designing better futures, and I do believe this is also a digital foundation, as it provides a lens through which to prioritise, implement and realise value from all of the above. Getting public servants to “design the good” from a citizen perspective, a business perspective, an agency perspective, Government perspective and from a society perspective helps flush out assumptions, direction and hypotheses that need testing.

The Pia Review

I recently wrote a series of 20 articles about digital transformation and reform in public sectors. It was something I did for fun, in my own time, as a way of both recording and sharing my lessons learned from 20 years working at the intersection of tech, government and society (half in the private sector, half in the public sector). I called it the Public Sector Pia Review and I’ve been delighted by how it has been received, with a global audience republishing, sharing, commenting, and most important, starting new discussions about the sort of public sector they want and the sort of public servants they want to be. Below is a deck that has an insight from each of the 20 articles, and links throughout.

This is not just meant to be a series about digital, but rather about the matter of public sector reform in the broadest sense, and I hope it is a useful contribution to better public sectors, not just better public services.

The Pia Review – 20 years in 20 slides

There is also a collated version of the articles in two parts. These compilations are linked below for convenience, and all articles are linked in the references below for context.

  • Public-Sector-Pia-Review-Part-1 (6MB PDF) — essays written to provide practical tips, methods, tricks and ideas to help public servants to their best possible work today for the best possible public outcomes; and
  • Reimagining government (will link once published) — essays about possible futures, the big existential, systemic or structural challenges and opportunities as I’ve experienced them, paradigm shifts and the urgent need for everyone to reimagine how they best serve the government, the parliament and the people, today and into the future.

A huge thank you to the Mandarin, specifically Harley Dennett, for the support and encouragement to do this, as well as thanks to all the peer reviewers and contributors, and of course my wonderful husband Thomas who peer reviewed several articles, including the trickier ones!

My digital references and links from 2019

Below are a number of useful references for consideration in any digital government strategy, program or project, including some of mine :)

General reading

Life Journeys as a Strategy

Life Journey programs, whilst largely misunderstood and quite new to government, provide a surprisingly effective way to drive cross agency collaboration, holistic service and system design, prioritisation of investment for best outcomes, and a way to really connect policy, services and human outcomes with all involved on the usual service delivery supply chains in public sectors. Please refer to the following references, noting that New Zealand were the first to really explore this space, and are being rapidly followed by other governments around the world. Also please note the important difference between customer journey mapping (common), customer mapping that spans services but is still limited to a single agency/department (also common), and true life journey mapping which necessarily spans agencies, jurisdictions and even sectors (rare) like having a child, end of life, starting school or becoming an adult.

Policy transformation

Data in Government

Designing better futures to transform towards

If you don’t design a future state to work towards, then you end up just designing reactively to current, past or potential issues. This leads to a lack of strategic or cohesive direction, which leads to systemic fragmentation and ultimately system ineffectiveness and cannibalism. A clear direction isn’t just about principles or goals; it needs to be something people can see, connect with, align their work towards (even if they aren’t in your team), and get enthusiastic about. This is how you create change at scale: when people buy into the agenda, at all levels, and start naturally walking in the same direction regardless of their role. Here are some examples for consideration.

Rules as Code

Please find the relevant Rules as Code links below for easy reference.

Better Rules and RaC examples

,

Clinton Roylca2020 ReWatch 2020-02-02

As I was an organiser of the conference this year, I didn’t get to see many talks. Fortunately many of them were recorded, so I get to watch the conference well after the fact.

Conference Opening

That white balance on the lectern slides is indeed bad; I really should get around to adding this as a suggestion to the logos documentation. (With some help, I put up all the lectern covers; it was therapeutic and rush free.)

I actually think there was a lot of information in this introduction. Perhaps too much?

OpenZFS and Linux

A nice update on where zfs is these days.

Dev/Ops relationships, status: It’s Complicated

A bit of a war story about production systems, leading to a moment of empathy.

Samba 2020: Why are we still in the 1980s for authentication?

There are a lot of old security standards that are showing their age, and a lot of modern security standards, but which to choose?

Tyranny of the Clock

A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

Configuration Is (riskier than?) Code

Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuration is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.

Easy Geo-Redundant Handover + Failover with MARS + systemd

Using a local process organiser to handle a cluster: interesting, but not something I’d really promote. Not the best video cutting in this one either; lots of time with the speaker pointing to his slides offscreen.

 

,

Pia AndrewsWhere next: Spring starts when a heartbeat’s pounding…

Today I’m delighted to announce the next big adventure for my little family and I.

For my part, I will be joining the inspirational, aspirational and world leading Service Canada to help drive the Benefits Delivery Modernization program with Benoit Long, Tammy Belanger and their wonderful team, in collaboration with our wonderful colleagues across the Canadian Government! This enormous program aims to dramatically improve the experience of Canadians with a broad range of government services, whilst transforming the organization and helping create the digital foundations for a truly responsive, effective and human-centred public sector :)

This is a true digital transformation opportunity which will make a difference in the lives of so many people. It provides a chance to implement and really realise the benefits of human-centred service design, modular architecture (and Government as a Platform), Rules as Code, data analytics, life journey mapping, and all I have been working on for the last 10 years. I am extremely humbled and thankful for the chance to work with and learn from such a forward thinking team, whilst being able to contribute my experience and expertise to such an important and ambitious agenda.

I can’t wait to work with colleagues across ESDC and the broader Government of Canada, as well as from the many innovative provincial governments. I’ve been lucky enough to attend FWD50 in Ottawa for the last 3 years, and I am consistently impressed by the digital and public sector talent in Canada. Of course, because Canada is one of the “Digital Nations“, it also presents a great opportunity to collaborate closely with other leading digital governments, as I also found when working in New Zealand.

We’ll be moving to Ottawa in early March, so we will see everyone in Canada soon, and will be using the next month or so packing up, spending time with Australian friends and family, and learning about our new home :)

My husband and little one are looking forward to learning about Canadian and Indigenous cultures, learning French (and hopefully some Indigenous languages too, if appropriate!), introducing more z’s into my English, experiencing the cold (yes, snow is a novelty for Australians) and contributing how we can to the community in Ottawa. Over the coming years we will be exploring Canada and I can’t wait to share the particularly local culinary delight that is a Beavertail (a large, flat, hot doughnut like pastry) with my family!

For those who didn’t pick up the reference, the blog title had dual meaning: we are of course heading to Ottawa in the Spring, having had a last Australian Summer for a while (gah!), and it also was a little call out to one of the great Canadian bands, that I’ve loved for years, the Tragically Hip :)

,

sthbrx - a POWER technical bloglinux.conf.au 2020 recap

It's that time of year again. Most of OzLabs headed up to the Gold Coast for linux.conf.au 2020.

linux.conf.au is one of the longest-running community-led Linux and Free Software events in the world, and attracts a crowd from Australia, New Zealand and much further afield. OzLabbers have been involved in LCA since the very beginning, and this year was no exception, with me running the Kernel Miniconf and several others speaking.

The list below contains some of our highlights that we think you should check out. This is just a few of the talks that we managed to make it to - there's plenty more worthwhile stuff on the linux.conf.au YouTube channel.

We'll see you all at LCA2021 right here in Canberra...

Keynotes

A couple of the keynotes really stood out:

Sean is a forensic structural engineer who shows us a variety of examples, from structural collapses and firefighting disasters, where trained professionals were blinded by their expertise and couldn't bring themselves to do things that were obvious.

There's nothing quite like cryptography proofs presented to a keynote audience at 9:30 in the morning. Vanessa goes over the issues with electronic voting systems in Australia, and especially internet voting as used in NSW, including flaws in their implementation of cryptographic algorithms. There continues to be no good way to do internet voting, but with developments in methodologies like risk-limiting audits there may be reasonably safe ways to do in-person electronic voting.

OpenPOWER

There was an OpenISA miniconf, co-organised by none other than Hugh Blemings of the OpenPOWER Foundation.

Anton (on Mikey's behalf) introduces the Power OpenISA and the Microwatt FPGA core which has been released to go with it.

Anton live demos Microwatt in simulation, and also tries to synthesise it for his FPGA but runs out of time...

Paul presents an in-depth overview of the design of the Microwatt core.

Kernel

There were quite a few kernel talks, both in the Kernel Miniconf and throughout the main conference. These are just some of them:

There's been many cases where we've introduced a syscall only to find out later on that we need to add some new parameters - how do we make our syscalls extensible so we can add new parameters later on without needing to define a whole new syscall, while maintaining both forward and backward compatibility? It turns out it's pretty simple but needs a few more kernel helpers.

There are a bunch of tools out there which you can use to make your kernel hacking experience much more pleasant. You should use them.

Among other security issues with container runtimes, using procfs to setup security controls during the startup of a container is fraught with hilarious problems, because procfs and the Linux filesystem API aren't really designed to do this safely, and also have a bunch of amusing bugs.

Control Flow Integrity is a technique for restricting exploit techniques that hijack a program's control flow (e.g. by overwriting a return address on the stack (ROP), or overwriting a function pointer that's used in an indirect jump). Kees goes through the current state of CFI supporting features in hardware and what is currently available to enable CFI in the kernel.

Linux has supported huge pages for many years, which has significantly improved CPU performance. However, the huge page mechanism was driven by hardware advancements and is somewhat inflexible, and it's just as important to consider software overhead. Matthew has been working on supporting more flexible "large pages" in the page cache to do just that.

Spoiler: the magical fantasy land is a trap.

Community

Lots of community and ethics discussion this year - one talk which stood out to me:

Bradley and Karen argue that while open source has "won", software freedom has regressed in recent years, and present their vision for what modern, pragmatic Free Software activism should look like.

Other

Among the variety of other technical talks at LCA...

Quantum compilers are not really like regular classical compilers (indeed, they're really closer to FPGA synthesis tools). Matthew talks through how quantum compilers map a program on to IBM's quantum hardware and the types of optimisations they apply.

Clevis and Tang provide an implementation of "network bound encryption", allowing you to magically decrypt your secrets when you are on a secure network with access to the appropriate Tang servers. This talk outlines use cases and provides a demonstration.

Christoph discusses how to deal with the hardware and software limitations that make it difficult to capture traffic at wire speed on fast fibre networks.

,

Pia AndrewsDigital excellence in Ballarat

In December I had the opportunity to work with Matthew Swards and the Business Improvements team in the Ballarat Council to provide a little support for their ambitious digital and data program. The Ballarat Council developed the Ballarat Digital Services Strategy a couple of years ago, which is excellent and sets a strong direction for human centred, integrated, inclusive and data driven government services. Councils face all the same challenges that I’ve found in Federal and State Governments, so many of the same strategies apply, but it was a true delight to see some of the exceptional work happening in data and digital in Ballarat.

The Ballarat Digital Services Strategy has a clear intent which I found to be a great foundation for program planning and balancing short term delivery with long term sustainable architecture and system responsiveness to change:

  1. Develop online services that are citizen centric and integrated from the user’s perspective;
  2. Ensure where possible citizens and businesses are not left behind by a lack of digital capability;
  3. Harness technology to enhance and support innovation within council business units;
  4. Design systems, solutions and data repositories strategically but deploy them tactically;
  5. Create and articulate clear purpose by aligning projects and priorities with council’s priorities;
  6. Achieve best value for ratepayers by focusing on cost efficiency and cost transparency;
  7. Build, lead and leverage community partnerships in order to achieve better outcomes; and
  8. Re-use resources, data and systems in order to reduce overall costs and implementation times.

The Business Improvement team has been working across Council to try to meet these goals, and there has been great progress on several fronts from several different parts of the Council. I only had a few days but got to see great work on opening more Council data, improving Council data quality, bringing more user centred approaches to service design and delivery, exploration of emerging technologies (including IoT) for Council services, and helping bring a user-centred, multi-disciplinary and agile approach to service design and delivery, working closely with business and IT teams. It was particularly great to see cross Council groups around big ticket programs to draw on expertise and capabilities across the organisation, as this kind of horizontal governance is critical for holistic and coordinated efforts for big community outcomes.

Whilst in town, Matthew Swards and I wandered the five-minute walk to the tech precinct to catch up with George Fong, who gave us a quick tour, including of the local Tech School, as well as a great chat about digital strategies, connectivity, access, inclusiveness and foundations for regional and remote communities to engage in the digital economy. The local talent and innovation in Ballarat is great to see, and in such close vicinity to the Council itself! The opportunities for collaboration are many and it was great to see cross sector discussions about what is good for the future of Ballarat :)

The Tech School blew my mind! It is a great State Government initiative to have a shared technology centre for all the local schools to use, and included state of the art gaming, 3D digital and printing tech, a robotics lab, and even an industrial strength food lab! I told a few people that people would move to Ballarat for their kids to have access to such a facility, to which I was told “this is just one of 10 across the state”.

It was great to work with the Business Improvement team and consider ways to drive the digital and data agenda for the Council and for Ballarat more broadly. It was also great to be able to leverage so many openly available government standards and design systems, such as the GDS and DTA Digital Service Standards and the NSW Design System. Open governments approaches like this make it easier for all levels of government across the world to leverage good practice, reuse standards and code, and deliver better services for the community. It was excellent timing that the Australian National API Design Standard was released this week, as it will also be of great use to Ballarat Council and all other Councils across Australia. Victoria has a special advantage as well because of the Municipal Association of Victoria (MAV), which works with and supports all Victorian Councils. The amount of great innovation and coordinated co-development around Council needs is extraordinary, and you could imagine the opportunities for better services if MAV and the Councils were to adopt a standard Digital Service Standard for Councils :)

Many thanks to Matt and the BI team at Ballarat Council, as well as those who made the time to meet and discuss all things digital and data. I hope my small contribution can help, and I’m confident that Ballarat will continue to be a shining example of digital and data excellence in government. It was truly a delight to see great work happening in yet another innovative Local Council in Australia, it certainly seems a compelling place to live :)

,

Robert Collins2019 in the rearview

2019 was a very busy year for us. I hadn’t realised how busy it was until I sat down to write this post. There’s also some moderately heavy stuff in here – if you have topics that trigger you, perhaps make sure you have spoons before reading.

We had all the usual stuff. Movies – my top two were Alita and Abominable though the Laundromat and Ford v Ferrari were both excellent and moving pieces. I introduced Cynthia to Teppanyaki and she fell in love with having egg roll thrown at her face hole.

When Cynthia started school we dropped gymnastics due to the time overload – we wanted some downtime for her to process after school, and with violin having started that year she was just looking so tired after a full day of school we felt it was best not to have anything on. Then last year we added in a specific learning tutor to help with the things that she approaches differently to the other kids in her class, giving 2 days a week of extra curricular activity after we moved swimming to the weekends.

At the end of last year she was finally chipper and with it most days after school, and she had been begging to get into more stuff, so we all got together and negotiated drama class and Aikido.

The drama school we picked, HSPA, is pretty amazing. Cynthia adored her first teacher there, and while upset at a change when they rearranged classes slightly, is again fully engaged and thrilled with her time there. Part of the class is putting on a full scale production – they did a version of the Happy Prince near the end of term 3 – and every student gets a part, with the ability for the older students to audition for more parts. On the other hand she tells me tonight that she wants to quit. So shrug, who knows :).

I last did martial arts when I took Aikido with sensei Darren Friend at Aikido Yoshinkai NSW back in Sydney, in the late 2000’s. And there was quite a bit less of me then. Cynthia had been begging to take a martial art for about 4 years, and we’d said that when she was old enough, we’d sign her up, so this year we both signed up for Aikido at the Rangiora Aikido Dojo. The Rangiora dojo is part of the NZ organisation Aikido Shinryukan which is part of the larger Aikikai style, which is quite different, yet the same, as the Yoshinkai Aikido that I had been learning. There have been quite a few moments where I have had to go back to something core – such as my stance – and unlearn it, to learn the Aikikai technique. Cynthia has found the group learning dynamic a bit challenging – she finds the explanations – needed when there are twenty kids of a range of ages and a range of experience – from new intakes each term through to ones that have been doing it for 5 or so years – get boring, and I can see her just switch off. Then she misses the actual new bit of information she didn’t have previously :(. Which then frustrates her. But she absolutely loves doing it, and she’s made a couple of friends there (everyone is positive and friendly, but there are some girls that like to play with her after the kids lesson). I have gotten over the body disconnect and awkwardness and things are starting to flow, I’m starting to be able to reason about things without just freezing in overload all the time, so that’s not bad after a year. However, the extra weight is making my forward rolls super super awkward. I can backward roll easily, with moderately good form; forward rolls though my upper body strength is far from what’s needed to support my weight through the start of the roll – my arm just collapses – so I’m in a sort of limbo – if I get the moment just right I can just start the contact on the shoulder; but if I get the moment slightly wrong, it hurts quite badly. And since I don’t want large scale injuries, doing the higher rolls is very unnerving for me. I suspect its 90% psychological, but am not sure how to get from where I am to having confidence in my technique, other than rinse-and-repeat. My hip isn’t affecting training much, and sensei Chris seems to genuinely like training with Cynthia and I, which is very nice: we feel welcomed and included in the community.

Speaking of my hip – earlier this year something ripped cartilage in my right hip – ended up having to have an MRI scan – and those machines sound exactly like a dot matrix printer – to diagnose it. Interestingly, having the MRI improved my symptoms, but we are sadly in hurry-up-and-wait mode. Before the MRI, I’d wake up at night with some soreness, and my right knee bent, foot on the bed, then sleepily let my leg collapse sideways to the right – and suddenly be awake in screaming agony as the joint opened up with every nerve at its disposal. When the MRI was done, they pumped the joint full of local anaesthetic for two purposes – one is to get a clean read on the joint, and the second is so that they can distinguish between referred surrounding pain, vs pain from the joint itself. It is to be expected with a joint issue that the local will make things feel better (duh), for up to a day or so while the local dissipates. The expression on the specialists face when I told him that I had had a permanent improvement trackable to the MRI date was priceless. Now, when I wake up with joint pain, and my leg sleepily falls back to the side, its only mildly uncomfortable, and I readjust without being brought to screaming awakeness. Similarly, early in Aikido training many activities would trigger pain, and now there’s only a couple of things that do. In another 12 or so months if the joint hasn’t fully healed, I’ll need to investigate options such as stem cells (which the specialist was negative about) or steroids (which he was more negative about) or surgery (which he was even more negative about). My theory about the improvement is that the cartilage that was ripped was sitting badly and the inflation for the MRI allowed it to settle back into the appropriate place (and perhaps start healing better). I’m told that reducing inflammation systematically is a good option. Turmeric time.

Sadly Cynthia has had some issues at school – she doesn’t fit the average mould and while widespread bullying doesn’t seem to be a thing, there is enough of it, and she receives enough of it, that it’s impacted her happiness more than a little – this blows up at school and at home as well. We’ve been trying a few things to improve this – helping her understand why folk behave badly, what to do in the moment (e.g. this video), but also that anything that goes beyond speech is assault and she needs to report that to us or teachers no matter what.

We’ve also had some remarkably awful interactions with another family at the school. We thought we had a friendly relationship, but I managed to trigger a complete meltdown of the relationship – not by doing anything objectively wrong, but because we had (unknown to me) different folkways, and some perfectly routine and normal behaviour turned out to be stressful and upsetting to them, and then they didn’t discuss it with us at all until it had brewed up in their heads into a big mess… and it’s still not resolved (and may not ever be: they are avoiding us both).

I weighed in at 110kg this morning. Jan the 4th 2019 I was 130.7kg. Feb 1 2018 I was 115.2kg. This year I peaked at 135.4kg, and got down to 108.7kg before Christmas food set in. That’s pretty happy making all things considered. Last year I was diagnosed with Coitus headaches and though I didn’t know it the medicine I was put on has a known side effect of weight gain. And it did – I had put it down to ongoing failure to manage my diet properly, but once my weight loss doctor gave me an alternative prescription for the headaches, I was able to start losing weight immediately. Sadly, though the weight gain through 2018 was effortless, losing the weight through 2019 was not. Doable, but not effortless. I saw a neurologist for the headaches when they recurred in 2019, and got a much more informative readout on them, how to treat and so on – basically the headaches can be thought of as an instability in the system, and the medicines goal is to stabilise things, and once stable for a decent period, we can attempt to remove the crutch. Often that’s successful, sometimes not, sometimes its successful on a second or third time. Sometimes you’re stuck with it forever. I’ve been eating a keto / LCHF diet – not super strict keto, though Jonie would like me to be on that, I don’t have the will power most of the time – there’s a local truck stop that sells killer hotdogs. And I simply adore them.

I started this year working for one of the largest companies on the planet – VMware. I left there in February and wrote a separate post about that. I followed that job with nearly the polar opposite – a startup working on a blockchain content distribution system. I wrote about that too. Changing jobs is hard in lots of ways – for instance I usually make friendships at my jobs, and those suffer some when you disappear to a new context – not everyone makes connections with you outside of the job context. Then there’s the somewhat non-rational emotional impact of not being in paid employment. The puritans have a lot to answer for. I’m there again, looking for work – and hey, if you’re going to be at Linux.conf.au (Gold Coast, Australia, January 13-17), I’ll be giving a presentation about some of the interesting things I got up to in the last job interregnum I had.

My feet have been giving me trouble for a couple of years now. My podiatrist is reasonably happy with my progress – and I can certainly walk further than I could – I even did some running earlier in the year, until I got shin splints. However, I seem to have hypersensitive soles, so she can’t correct my pronation until we fix that, which at least for now means a 5-minute session where I touch my feet, someone else does, then something smooth, then something rough – called “sensory massage”.

In 2017 and 2018 I injured myself at the gym, and in 2019 I wanted to avoid that, so I sought out ways to reduce injury. Moving away from machines was a big part of that; more focus on technique another part. But perhaps the largest part was moving from lifting dead weight to focusing on body weight exercises – callisthenics. This shifts from a dead weight to control when things go wrong, to an active weight, which can help deal with whatever has happened. So far at least, this has been pretty successful – although I’ve had minor issues – I managed to inflame the fatty pad the olecranon displaces when your elbow locks out – I’m nearly entirely transitioned to a weights-free program – hand stands, pistol