Planet Linux Australia


Adrian Chadd: It took me WAY too long to get this Gotek to work...

The short version: when you buy an actual OG Gotek with the original Gotek firmware, it doesn't work in any of the convenient ways documented on the retro computing sites, because you DEFINITELY need to flash it with different firmware first.

Ok, so the less short version!

I picked up a busted Amiga 500 for free to help repair an Amstrad CPC464 for someone. It had a busted ROM socket, busted RAM and some other random crap that I needed to fix. Oh and 50 of the keys were not working, and the key membrane was broken!

So everything above is fixed - and I still need to fix the RTC in the A501 memory expansion - but now I need to get it to boot. And I don't have a spare floppy drive for this.

But I did have a Gotek - I'd bought it for the IBM PC stuff, but ended up bootstrapping that using USB 3.5 inch floppy drives instead.

.. and it didn't work. And I didn't understand why.

So, the summary.

I wanted to flash it with the FlashFloppy firmware, following https://github.com/keirf/flashfloppy/wiki/Firmware-Programming. Easy peasy: get a UART hooked up and connect it to a FreeBSD box.

Didn't work. Welp.

After reading around I found that someone had dropped the serial speed to 9600 baud, because it just wouldn't work any faster.

I did that and, well, now it works. It's quite possible that if I added the 10k pull-ups mentioned in the FlashFloppy hardware mods section I'd have more luck at higher speeds.



.. and flashing it from FreeBSD was easy.

Ok, next: hooking it up. Set it to be device 0, remove another config jumper. Easy. Also, apparently the pinout is .. reversed on the Amiga 500? In any case, pay super close attention to the orientation of the floppy drive connectors.


Anyway. Now it works. I then followed the installation instructions for a USB stick and whacked Workbench 1.3 images on - and now it boots fine into Workbench 1.3. I've ordered a replacement mounting frame for the Gotek board so it can sit inside with the case closed - but for now the machine works.

Well, besides the RTC. That's next. Ugh.

Francois Marier: CrashPlan 10 won't start on Ubuntu derivatives

CrashPlan recently updated itself to version 10 on my Pop!_OS laptop and stopped backing anything up.

When trying to start the client, I was faced with this error message:

Code42 cannot connect to its backend service.

Digging through log files

In /usr/local/crashplan/log/service.log.0, I found the reason why the service didn't start:

[05.18.22 07:40:05.756 ERROR main           com.backup42.service.CPService] Error starting up, java.lang.IllegalStateException: Failed to start authorized services.
STACKTRACE:: java.lang.IllegalStateException: Failed to start authorized services.
        at com.backup42.service.ClientServiceManager.authorize(ClientServiceManager.java:552)
        at com.backup42.service.CPService.startServices(CPService.java:2467)
        at com.backup42.service.CPService.start(CPService.java:562)
        at com.backup42.service.CPService.main(CPService.java:1574)
Caused by: com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) Error injecting constructor, java.lang.UnsatisfiedLinkError: Unable to load library 'uaw':
libuaw.so: cannot open shared object file: No such file or directory
libuaw.so: cannot open shared object file: No such file or directory
Native library (linux-x86-64/libuaw.so) not found in resource path (lib/com.backup42.desktop.jar:lang)
  at com.code42.service.useractivity.UserActivityWatcherServiceImpl.<init>(UserActivityWatcherServiceImpl.java:67)
  at com.code42.service.useractivity.UserActivityWatcherServiceImpl.class(UserActivityWatcherServiceImpl.java:23)
  while locating com.code42.service.useractivity.UserActivityWatcherServiceImpl
  at com.code42.service.AbstractAuthorizedModule.addServiceWithoutBinding(AbstractAuthorizedModule.java:77)
  while locating com.code42.service.IAuthorizedService annotated with @com.google.inject.internal.Element(setName=,uniqueId=34, type=MULTIBINDER, keyType=)
  while locating java.util.Set<com.code42.service.IAuthorizedService>

1 error
        at com.google.inject.internal.InternalProvisionException.toProvisionException(InternalProvisionException.java:226)
        at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1097)
        at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1126)
        at com.backup42.service.ClientServiceManager.getServices(ClientServiceManager.java:679)
        at com.backup42.service.ClientServiceManager.authorize(ClientServiceManager.java:513)
        ... 3 more
Caused by: java.lang.UnsatisfiedLinkError: Unable to load library 'uaw':
libuaw.so: cannot open shared object file: No such file or directory
libuaw.so: cannot open shared object file: No such file or directory
Native library (linux-x86-64/libuaw.so) not found in resource path (lib/com.backup42.desktop.jar:lang)
        at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:301)
        at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:461)
        at com.sun.jna.Library$Handler.<init>(Library.java:192)
        at com.sun.jna.Native.load(Native.java:596)
        at com.sun.jna.Native.load(Native.java:570)
        at com.code42.service.useractivity.UserActivityWatcherServiceImpl.<init>(UserActivityWatcherServiceImpl.java:72)
        at com.code42.service.useractivity.UserActivityWatcherServiceImpl$$FastClassByGuice$$4bcc96f8.newInstance(<generated>)
        at com.google.inject.internal.DefaultConstructionProxyFactory$FastClassProxy.newInstance(DefaultConstructionProxyFactory.java:89)
        at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:114)
        at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
        at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:306)
        at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
        at com.code42.service.AuthorizedScope$1.get(AuthorizedScope.java:38)
        at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:39)
        at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:62)
        at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
        at com.code42.service.AuthorizedScope$1.get(AuthorizedScope.java:38)
        at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:39)
        at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:42)
        at com.google.inject.internal.RealMultibinder$RealMultibinderProvider.doProvision(RealMultibinder.java:198)
        at com.google.inject.internal.RealMultibinder$RealMultibinderProvider.doProvision(RealMultibinder.java:151)
        at com.google.inject.internal.InternalProviderInstanceBindingImpl$Factory.get(InternalProviderInstanceBindingImpl.java:113)
        at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1094)
        ... 6 more
        Suppressed: java.lang.UnsatisfiedLinkError: libuaw.so: cannot open shared object file: No such file or directory
                at com.sun.jna.Native.open(Native Method)
                at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:191)
                ... 28 more
        Suppressed: java.lang.UnsatisfiedLinkError: libuaw.so: cannot open shared object file: No such file or directory
                at com.sun.jna.Native.open(Native Method)
                at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:204)
                ... 28 more
        Suppressed: java.io.IOException: Native library (linux-x86-64/libuaw.so) not found in resource path (lib/com.backup42.desktop.jar:lang)
                at com.sun.jna.Native.extractFromResourcePath(Native.java:1119)
                at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:275)
                ... 28 more

[05.18.22 07:40:05.756 INFO  main         42.service.history.HistoryLogger] HISTORY:: Code42 stopped, version 10.0.0
[05.18.22 07:40:05.756 INFO  main           com.backup42.service.CPService] *****  STOPPING  *****
[05.18.22 07:40:05.757 INFO  Thread-0       com.backup42.service.CPService] ShutdownHook...calling cleanup
[05.18.22 07:40:05.759 INFO  STOPPING       com.backup42.service.CPService] SHUTDOWN:: Stopping service...

This suggests that a new library dependency (uaw) didn't get installed during the last upgrade.

Looking at the upgrade log (/usr/local/crashplan/log/upgrade..log), I found that it detected my operating system as "pop 20":

Fri May 13 07:39:51 PDT 2022: Info : Resolve Native Libraries for pop 20...
Fri May 13 07:39:51 PDT 2022: Info :   Keep common libs
Fri May 13 07:39:51 PDT 2022: Info :   Keep pop 20 libs

I unpacked the official installer (login required):

$ tar zxf CrashPlanSmb_10.0.0_15252000061000_303_Linux.tgz 
$ cd code42-install
$ gzip -dc CrashPlanSmb_10.0.0.cpi | cpio -i

and found that libuaw.so is only shipped for 4 supported platforms (rhel7, rhel8, ubuntu18 and ubuntu20):

$ find nlib/
nlib/
nlib/common
nlib/common/libfreeblpriv3.chk
nlib/common/libsoftokn3.chk
nlib/common/libsmime3.so
nlib/common/libnss3.so
nlib/common/libplc4.so
nlib/common/libssl3.so
nlib/common/libsoftokn3.so
nlib/common/libnssdbm3.so
nlib/common/libjss4.so
nlib/common/libleveldb.so
nlib/common/libfreeblpriv3.so
nlib/common/libfreebl3.chk
nlib/common/libplds4.so
nlib/common/libnssutil3.so
nlib/common/libnspr4.so
nlib/common/libfreebl3.so
nlib/common/libc42core.so
nlib/common/libc42archive64.so
nlib/common/libnssdbm3.chk
nlib/rhel7
nlib/rhel7/libuaw.so
nlib/rhel8
nlib/rhel8/libuaw.so
nlib/ubuntu18
nlib/ubuntu18/libuaw.so
nlib/ubuntu20
nlib/ubuntu20/libuaw.so

Fixing the installation script

Others have fixed this problem by copying the files manually but since Pop!_OS is based on Ubuntu, I decided to fix this by forcing the OS to be detected as "ubuntu" in the installer.

I simply edited install.sh like this:

--- install.sh.orig 2022-05-18 16:47:52.176199965 -0700
+++ install.sh  2022-05-18 16:57:26.231723044 -0700
@@ -15,7 +15,7 @@
 readonly IS_ROOT=$([[ $(id -u) -eq 0 ]] && echo true || echo false)
 readonly REQ_CMDS="chmod chown cp cpio cut grep gzip hash id ls mkdir mv sed"
 readonly APP_VERSION_FILE="c42.version.properties"
-readonly OS_NAME=$(grep "^ID=" /etc/os-release | cut -d = -f 2 | tr -d \" | tr '[:upper:]' '[:lower:]')
+readonly OS_NAME=ubuntu
 readonly OS_VERSION=$(grep "^VERSION_ID=" /etc/os-release | cut -d = -f 2 | tr -d \" | cut -d . -f1)

 SCRIPT_DIR="${0:0:${#0} - ${#SCRIPT_NAME}}"

and then ran that install script as root again to upgrade my existing installation.


Tim Riley: Open source status update, April 2022

April was a pretty decent month for my OSS work! Got some things wrapped up, kept a few things moving, and opened up a promising thing for investigation. What are these things, you say? Let’s take a look!

Finished centralisation of Hanami action and view integrations

I wrote about the need to centralise these integrations last month, and in April, I finally got the work done!

This was a relief to get out. As a task, while necessary, it felt like drudge work – I’d been on it since early March, after all! I was also conscious that it was blocking Luca’s work on helpers all the while.

My prolonged work on this (along with Easter holidays and other such Real Life matters) contributed to us missing April’s Hanami release. The good thing is that it’s done now, and I’m hopeful we can have this released via another Hanami alpha sometime very soon.

In terms of the impact on Hanami apps, the biggest change is that your apps should use a new superclass for actions and views:

require "hanami/application/action"

module Main
  module Action
    # Used to inherit from Hanami::Action
    class Base < Hanami::Application::Action
    end
  end
end
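
The view side follows the same pattern. Here’s a sketch of the equivalent, assuming the analogous Hanami::Application::View superclass and the same Main slice layout as above:

require "hanami/application/view"

module Main
  module View
    # Used to inherit from Hanami::View
    class Base < Hanami::Application::View
    end
  end
end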

Aside from the benefit to us as maintainers of having this integration code kept together, this distinct superclass should also help make it clearer where to look when learning about how actions and views work within full Hanami apps.

Enabled proper access to full locals in view templates

I wound up doing a little more work in actions and views this month. The first was a quickie to unblock some more of Luca’s helpers work: making access to the locals hash within templates work like we always expected it would.

This turned out to be a fun one. For a bit of background, the context for every template rendering in hanami-view (i.e. what self is for any given template) is a Hanami::View::Scope instance. This instance contains the template’s locals, makes the full locals hash available as #locals (and #_locals, for various reasons), and uses #method_missing to also make each local directly available via its own name.

Luca found, however, that calling locals within the template didn’t work at all! After I took a look, it seemed that while locals didn’t work, self.locals or just plain _locals would work. Strange!

Turns out, this all came down to implementation details in Tilt, which we use as our low-level template renderer. The way Tilt works is that it will compile a template down into a single Ruby method that receives a locals param:

def compile_template_method(local_keys, scope_class=nil)
  source, offset = precompiled(local_keys)
  local_code = local_extraction(local_keys)

  # <...snip...>

  method_source << <<-RUBY
    TOPOBJECT.class_eval do
      def #{method_name}(locals)
        #{local_code}
  RUBY

Because of this, locals is actually a local variable in the context of that method execution, shadowing any method of the same name on the scope object that Tilt turns into self for the rendering.

Here is how we were originally rendering with Tilt:

tilt(path).render(scope, &block)

My first instinct was simply to pass our locals hash as the (optional) second argument to Tilt’s #render:

tilt(path).render(scope, scope._locals)

But even that didn’t work! Because in generating that local_code above, Tilt will actually take the locals and explode it out into individual variable assignments:

def local_extraction(local_keys)
  local_keys.map do |k|
    if k.to_s =~ /\A[a-z_][a-zA-Z_0-9]*\z/
      "#{k} = locals[#{k.inspect}]"
    else
      raise "invalid locals key: #{k.inspect} (keys must be variable names)"
    end
  end.join("\n")
end

But we don’t need this at all, since hanami-view’s scope object is already making those locals available individually, and we want to ensure access to those locals continues to run through the scope object.

So the ultimate fix is to make locals of our locals. Yo dawg:

tilt(path).render(scope, {locals: scope._locals}, &block)

This gives us our desired access to the locals hash in templates (because that locals key is itself turned into a solitary local variable), while preserving the rest of our existing scope-based functionality.
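
To see the mechanics in isolation, here’s a contrived little script. DemoScope below is a made-up stand-in for Hanami::View::Scope, just enough to show the locals/_locals/method_missing behaviour described above:

require "tilt"

# A made-up stand-in for Hanami::View::Scope: it exposes the full hash as
# #_locals and each individual local via #method_missing.
class DemoScope
  def initialize(locals)
    @locals = locals
  end

  def _locals
    @locals
  end

  def method_missing(name, *args, &block)
    @locals.key?(name) ? @locals[name] : super
  end

  def respond_to_missing?(name, include_private = false)
    @locals.key?(name) || super
  end
end

scope = DemoScope.new(greeting: "hello")
template = Tilt::ERBTemplate.new { "<%= locals[:greeting] %> and <%= greeting %>" }

# Passing {locals: scope._locals} makes `locals` itself the single
# template-local variable, while `greeting` still resolves via the scope.
template.render(scope, {locals: scope._locals}) # => "hello and hello"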

It also shows me that I probably should’ve written an integration test back when I introduced access to a scope’s locals back in January 2019. 😬

Either way, I’m excited this came up and I could fix it, because it’s an encouraging sign of just how much of this view system we’ll be able to put to use in creating a streamlined and powerful view layer for our future Hanami users!

Merged a fix to stop unwanted view rendering of halted requests

Thanks to our extensive use of Hanami at Culture Amp, my friend and colleague Andrew discovered and fixed a bug with our automatic rendering of views within actions, which I was happy to merge in.

Shipped some long awaited dry-configurable features

After keeping poor ojab waiting way too long, I also merged a couple of nice enhancements he made to dry-configurable.

I then released these as dry-configurable 0.15.0.

Started work on unifying Hanami slices and actions

Last but definitely not least, I started work on one of the last big efforts we need in place before 2.0: making Hanami slices act as much as possible like complete, miniature Hanami applications. I’m going to talk about this a lot more in future posts, but for now, I can point you to a few PRs:

  • Introducing Hanami::SliceName (a preliminary, minor refactoring to fix some slice and application name determination responsibilities that had somehow found their way into our configuration class).
  • A first, abandoned attempt at combining slices and applications, using a mixin for shared behaviour.
  • A much more promising attempt using a composed slice object within the application class, which is currently the base of my further work in this area.

Apart from opening up some really interesting possibilities around making slices fully a portable, mountable abstraction (imagine bringing in slices from gems!), even for our shorter-term needs, this work looks valuable, since I think it should provide a pathway for having application-wide settings kept on the application class, while still allowing per-slice customisation of those settings in whichever slices require them.

The overall slice structure is also something that’s barely changed since I put it in place way back in late 2019. Now it’s going to get the spit and polish it deserves. Hopefully I’ll be able to share more progress on this next month :) See you then!

Adrian Chadd: Amiga 1200 - Kickstart 3.1.4 upgrade, but not upgrading to AmigaOS 3.1

So I splurged a couple of bucks on the updated AmigaOS 3.1.4 from Hyperion. It came with both the ROMs and the AmigaOS software. Now, I have 3.1 installed already, and I wanted to just drop in the ROMs.

It almost worked.

It looks like they moved some libraries from the ROM out to disk/RAM in order to make space.



So I figured they were on one of the installation disks, and....

Easy peasy. I copied icons.library and workbench.library to my system partition and rebooted. All good!



It's really quite slick how Kickstart ROMs are basically a bootloader and then a whole bunch of libraries, some of which form the core OS and some are actually just libraries. I'm sad this concept didn't show up elsewhere.






Russell Coker: Elon and Free Speech

Elon Musk has made the news for spending billions to buy a share of Twitter for the alleged purpose of providing free speech. The problem with this claim is that having any company controlling a large portion of the world’s communication is inherently bad for free speech. The same applies for Facebook, but that’s not a hot news item at the moment.

If Elon wanted to provide free speech he would want decentralised messaging systems, so that someone who breaks the rules on one platform could find another with different rules. Among other things, free speech ideally permits people to debate with residents of other countries on issues arising from their different laws. If advocates for the Russian government get kicked off Twitter as part of the American sanctions against Russia then American citizens can’t debate the issue with Russian citizens via Twitter. Mastodon is one example of a federated competitor to Twitter [1]. With a federated messaging system each host could make independent decisions about interpretation of sanctions. Someone who used a Mastodon instance based in the US could get a second account in another country if they wanted to communicate with people in countries that are sanctioned by the US.

The problem with Mastodon at the moment is lack of use. It’s got a good set of features and support for different platforms: there are apps for Android and iPhone, as well as lots of other software using the API. But if the people you want to communicate with aren’t on it then it’s less useful. Elon could solve that problem by creating a Tesla Mastodon server and giving a free account to everyone who buys a new Tesla, which is the sort of thing that a lot of Tesla buyers would like. It’s quite likely that other companies selling prestige products would follow that example. Everyone has seen evidence of people sharing photos on social media with someone else’s expensive car; a Mastodon account on ferrari.com or mercedes.com would be proof of buying the cars in question. The number of people who buy expensive cars new is a very small portion of the world population, but it’s a group of people who are more influential than average, and others would join Mastodon servers to follow them.

The next thing that Elon could do to kill Twitter would be to have all his companies (which between them have more than a dozen verified Twitter accounts) use Mastodon accounts for their primary PR releases and then send the same content to Twitter with a 48 hour delay. That would force journalists and people who want to discuss those companies on social media to follow the Mastodon accounts. Again this wouldn’t be a significant number of people, but they would be influential people. Getting journalists to use a communications system increases its importance.

The question is whether Elon is lacking the vision necessary to plan a Mastodon deployment or whether he just wants to allow horrible people to run wild on Twitter.

The Verge has an interesting article from 2019 about Gab using Mastodon [2]. The fact that over the last 2.5 years I didn’t even hear of Gab using Mastodon suggests that the fears of some people significantly exceeded the problem. I’m sure that some Gab users managed to harass some Mastodon users, but generally they were apparently banned quickly. As an aside, the Mastodon server I use doesn’t appear to ban Gab; a search for Gab on it gave me a user posting about being “pureblood” at the top of the list.

Gab claims to have 4 million accounts and has an estimated 100,000 active users. If 5.5% of Tesla owners became active users on a hypothetical Tesla server that would be the largest Mastodon server. Elon could demonstrate his commitment to free speech by refusing to ban Gab in any way. The Wikipedia page about Gab [3] has a long list of horrible people and activities associated with it. Is that the “free speech” to associate with Tesla? Polestar makes some nice electric cars that appear quite luxurious [4] and doesn’t get negative PR from the behaviour of its owner; that’s something Elon might want to consider.

Is this really about bragging rights? Buying a controlling interest in a company that has a partial monopoly on Internet communication is something to boast about. Could users of commercial social media be considered serfs who serve their billionaire overlord?


Rusty Russell: Pickhardt Payments Implementation: Finding µ!

So, I’ve finally started implementing Pickhardt Payments in Core Lightning (#cln) and there are some practical complications beyond the paper which are worth noting for others who consider this!

In particular, the cost function in the paper cleverly combines the probability of success with the fee charged by the channel, giving a cost function of:

-log( (ce + 1 - fe) / (ce + 1)) + µ · fe · fee(e)

Which is great: bigger µ means fees matter more, smaller means they matter less. And the paper suggests various ways of adjusting them if you don’t like the initial results.

But, what’s a reasonable µ value? 1? 1000? 0.00001? Since the left term is the negative log of a probability, and the right is a value in millisats, it’s deeply unclear to me!

So it’s useful to look at the typical ranges of the first term, and the typical fees (the rest of the second term which is not µ), using stats from the real network.

If we want these two terms to be equal, we get:

-log( (ce + 1 - fe) / (ce + 1)) = µ · fe · fee(e)
=> µ = -log( (ce + 1 - fe) / (ce + 1)) / ( fe · fee(e))

Let’s assume that fee(e) is the median fee: 51 parts per million. I chose to look at amounts of 1sat, 10sat, 100sat, 1000sat, 10,000sat, 100,000sat and 1M sat, and calculated the µ values for each channel. It turns out that, for almost all those values, the 10th percentile µ value is 0.125 times the median, and the 90th percentile µ value is 12.5 times the median, though for 1M sats it’s 0.21 and 51x, which probably reflects that the median fee is not 51 for these channels!

Nonetheless, this suggests we can calculate the “expected µ” using the median capacity of channels we could use for a payment (i.e. those with capacity >= amount), and the median feerate of those channels. We can then bias it by a factor of 10 or so either way, to reasonably promote certainty over fees or vice versa.

So, in the internal API for the moment I accept a frugality factor, generally 0.1 (not frugal, prefer certainty to fees) to 10 (frugal, prefer fees to certainty), and derive µ:

µ = -log((median_capacity_msat + 1 - amount_msat) / (median_capacity_msat + 1)) * frugality / (median_fee + 1)

The median is selected only from the channels with capacity > amount, and the +1 on the median_fee covers the case where median fee turns out to be 0 (such as in one of my tests!).

Note that it’s possible to try to send a payment larger than any channel in the network, using MPP. This is a corner case, where you generally care less about fees, so I set median_capacity_msat in the “no channels” case to amount_msat, and the resulting µ is really large, but at that point you can’t be fussy about fees!
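
For illustration, here’s roughly what that derivation looks like as a Ruby sketch. This is not the actual implementation in CLN, and the channel hash keys (:capacity_msat, :fee_ppm) are assumptions:

# A Ruby sketch of the µ derivation above -- not the real Core Lightning code.
# Channels are assumed to be hashes with :capacity_msat and :fee_ppm keys.
def derive_mu(channels, amount_msat, frugality: 1.0)
  # Only consider channels big enough to carry the amount.
  usable = channels.select { |c| c[:capacity_msat] > amount_msat }

  caps = usable.map { |c| c[:capacity_msat] }.sort
  fees = usable.map { |c| c[:fee_ppm] }.sort

  # Corner case: nothing is big enough (an MPP-only payment), so treat the
  # median capacity as the amount itself, which makes µ very large.
  median_capacity = caps.empty? ? amount_msat : caps[caps.size / 2]
  median_fee      = fees.empty? ? 0 : fees[fees.size / 2]

  # frugality: ~0.1 (prefer certainty) through ~10 (prefer low fees).
  -Math.log((median_capacity + 1.0 - amount_msat) / (median_capacity + 1.0)) *
    frugality / (median_fee + 1)
end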


Simon Lyall: Audiobooks – April 2022

On Her Majesty’s Secret Service by Ian Fleming

Bond tracks Blofeld to a Swiss hideout. He infiltrates it and must discover and foil Blofeld’s plot. A romantic subplot adds interest. 3/5

The years of Lyndon Johnson 2 – Means of Ascent by Robert Caro

LBJ dodges the war, makes serious money from radio stations and steals the 1948 Senate Primary. Easy to follow and fascinating. 4/5

Blood, Sweat & Chrome: The Wild and True Story of Mad Max: Fury Road by Kyle Buchanan

The book is 90% interviews, drawing on a wide range of people involved with the movie. A wealth of interesting stories. 4/5

Vaxxers: The Inside Story of the Oxford AstraZeneca Vaccine and the Race Against the Virus by Sarah Gilbert and Catherine Green

Mainly covering early 2020 to mid-2021. Each author writes alternating chapters on the development and rollout of the vaccine. 3/5

Becoming Trader Joe: How I Did Business My Way and Still Beat the Big Guys by Joe Coulombe

The story of the author taking the chain through its various stages, and the key ways they did business differently from other firms and stayed profitable. 4/5

How Innovation Works: And Why It Flourishes in Freedom by Matt Ridley

Examples of how innovation works in the real world followed by the characteristics of innovation, how to promote it, and how it can go wrong. 4/5


My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all



Tim Riley: Two years of open source status updates

Back in March of 2020, I decided to take up the habit of writing monthly status updates for my open source software development. 22 updates and 25k words later, I’m happy to be celebrating two years of status updates!

As each month ticks around, I can find it hard to break away from cutting code and write my updates, but every time I get to publishing, I’m really happy to have captured my progress and thinking.

After all, these posts now help remind me I managed to do all of the following over the last two years (and these were just the highlights!):

  • Renamed dry-view to hanami-view and kicked off view/application integration (Mar 2020)
  • Received my first GitHub sponsor (Apr 2020), thank you Benny Klotz (who still sponsors me today!)
  • Shared my Hanami 2 application template (May 2020)
  • Achieved seamless view/action/application integration (May 2020)
  • Brought class-level configuration to Hanami::Action (Jun 2020)
  • Introduced application-level configuration for actions and views (Jul 2020)
  • Added automatic inference for an action’s paired view, along with automatic rendering (Jul 2020)
  • Introduced application integration for view context classes (Jul 2020)
  • Supported multiple boot file dirs in dry-system, allowing user-replacement of standard bootable components in Hanami (Aug 2020)
  • Rebuilt the Hanami Flash class (Aug 2020)
  • Resumed restoring hanami-controller features through automatic enabling of CSRF protection (Sep 2020)
  • Added automatic configuration to views (inflector, template, part namespace) (Oct 2020)
  • Released a non-web Hanami application template (Oct 2020)
  • Started the long road to Hanami/Zeitwerk integration with an autoloading loader for dry-system (Nov 2020)
  • Introduced a dedicated “component dir” abstraction to dry-system, along with major cleanups and consistency wins (Dec 2020/Jan 2021)
  • Added support for dry-system component dirs with mixed namespaces (Feb/Mar/Apr 2021)
  • Released dry-system with all these changes, along with Hanami with working Zeitwerk integration (Mar/Apr 2021)
  • Ported Hanami’s app configuration to dry-configurable (May 2021)
  • Laid the way for dry-configurable 1.0 with some API changes (May/Jul 2021)
  • Returned to dry-system and added configurable constant namespaces (Jun/Jul/Aug/Sep/Oct 2021)
  • Introduced compact slice source dirs to Hanami, using dry-system’s constant namespaces (Sep/Oct 2021)
  • Added fully configurable source dirs to Hanami (Nov/Dec 2021)
  • Shipped a huge amount of dry-system improvements over two weeks of dedicated OSS time in Jan 2022, including the overhaul of bootable components as part of their rename to providers, as well as partial container imports and exports, plus much more
  • Introduced concrete slice classes and other slice registration improvements to Hanami (Feb 2022)
  • Refactored and relocated action and view integration into the hanami gem itself, and introduced Hanami::SliceConfigurable to make it possible for similar components to integrate (Mar 2022)

This is a lot! To add some extra colour here, a big difference between now and pre-2020 is that I’ve been working on OSS exclusively in my personal time (nights and weekends), and I’ve also been slogging away at a single large goal (Hanami 2.0, if you hadn’t heard!). The combination of these can make the whole thing feel a little thankless. These monthly updates are timely punctuation and a valuable reminder that I am moving forward.

They also capture a lot of in-the-moment thinking that’d otherwise be lost to the sands of time. What I’ve grown to realise with my OSS work is that it’s as much about the process as anything else. For community-driven projects like dry-rb and Hanami, the work will be done when it’s done, and there’s not particularly much we can do to hurry it. However, what we should never forget is to make that work-in-progress readily accessible to our community, to bring people along for the ride, and to share whatever lessons we discover along the way. The passing of each month is a wonderful opportunity for me to do this 😀

Finally, a huge thank you from me to anyone who reads these updates. Hearing from folks and knowing there are people out there following along is a huge encouragement to me.

So, let’s keep this going. I’m looking forward to another year of updates, and – checks calendar – writing April’s post in the next week or so!


Paul Wise: FLOSS Activities April 2022

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

  • Spam: reported 33 Debian mailing list posts
  • Debian wiki: RecentChanges for the month
  • Debian BTS usertags: changes for the month
  • Debian screenshots:

Administration

  • Debian wiki: unblock IP addresses, approve accounts

Communication

Sponsors

The libpst, gensim, SPTAG work was sponsored. All other work was done on a volunteer basis.


Tim Riley: Salubrious Ruby: Don’t mutate what you don’t own

When we’re writing a method in Ruby and receiving objects as arguments, a helpful principle to follow is “don’t mutate what you don’t own.”

Why is this? Those arguments come from places that we as the method authors can’t know, and a well-behaved method shouldn’t alter the external environment unexpectedly.

Consider the following method, which takes an array of numbers and appends a new, incremented number:

def append_number(arr)
  last_number = arr.last || 0
  arr << last_number + 1
end

If we pass in an array, we’ll get a new number appended in the returned array:

my_arr = [1, 2]
my_new_arr = append_number(my_arr) # => [1, 2, 3]

But we’ll also quickly discover that this has been achieved by mutating our original array:

my_arr = [1, 2]
my_new_arr = append_number(my_arr) # => [1, 2, 3]
my_arr # => [1, 2, 3]

We can confirm by an object equality check that this is still the one same array:

my_new_arr.eql?(my_arr) # => true

This behavior is courtesy of Ruby’s Array#<< method (aka #append or #push), which appends the given object to the receiver (that is, self), before then returning that same self. This kind of self-mutating behaviour is common across both the Array and Hash classes, and while it can provide some conveniences in local use (such as a chain of #<< calls to append multiple items to the same array), it can lead to surprising results when that array or hash comes from anywhere non-local.
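
That chaining convenience works because each #<< returns the receiver itself:

arr = []
arr << 1 << 2 << 3 # each << returns arr, so the calls chain
arr                # => [1, 2, 3]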

Imagine our example above is part of a much larger application. In this case, my_arr will most likely come from somewhere far removed and well outside the purview of append_number or whatever class contains it. As the authors of append_number, we have no idea how that original array might otherwise be used! For this reason, the courteous approach to take is not to mutate the array, but instead create and return a new copy:

def append_number(arr)
  last_number = arr.last || 0

  # There are many ways we can achieve the copy; here's just one
  arr + [last_number + 1]
end

This way, the caller of our method can trust their original values to go unchanged, which is what they would likely expect, especially if our method doesn’t give any hint that it will mutate.

my_arr = [1, 2]
my_new_arr = append_number(my_arr) # => [1, 2, 3]
my_arr # => [1, 2]

This is a very simple example, but the same principle applies for all kinds of mutable objects passed to your methods.

A more telling story here comes from earlier days of Ruby, around how we handled options hashes passed to methods. We used to do things like this:

def my_method(options = {})
  some_opt = options.delete(:some_opt)
  # Do something with some_opt...
end

my_method(some_opt: "some value")

Using trailing options hashes like this was how we provided “keyword arguments” before Ruby had them as a language feature. Now the trouble with the method above is that we’re mutating that options hash by deleting the :some_opt key. So if the user of our method had code like this:

common_options = {some_opt: "some value"}

first_result = my_method(common_options)
second_result = my_method(common_options)

We’d find ourselves in trouble by the time we call my_method the second time, because at that point the common_options hash will no longer have its :some_opt key, since the first invocation of my_method deleted it — oops!

This is a great illustration of why modern Ruby’s keyword arguments work the way they do. When we accept a splatted keyword hash argument like **options, Ruby ensures it comes into the method as a new hash, which means that operations like options.delete(:some_opt) do in fact become local in scope, and therefore safe to use.
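
For example, here’s a sketch of the earlier method rewritten with real keyword arguments:

def my_method(**options)
  # options is a fresh hash created for this call, so deleting from it
  # doesn't touch anything the caller owns
  some_opt = options.delete(:some_opt)
  # Do something with some_opt...
end

common_options = {some_opt: "some value"}

first_result = my_method(**common_options)
second_result = my_method(**common_options) # :some_opt is still present here

common_options # => {some_opt: "some value"}, untouched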

So now that we’ve covered both arrays and hashes as Ruby’s most common “container” objects, what about the other kinds of application-specific structures that we might encounter in real world codebases? Objects representing domain models of various kinds, an instance of an ActiveRecord::Base subclass, even? Even in those cases, this principle still holds true. Our code is easier to understand and test when we can reduce the number of dimensions of its behaviour, and mutating passed-in objects is a big factor in this, especially if you think about methods calling other methods and so on. There are ways we can design our applications to make this a natural approach to take, even for rich domain objects, but that is a topic for another day!

Until then, hopefully this walkthrough here can serve as a reminder to keep our methods courteous, to respect mutable values provided from outside, and wherever possible, leave them undisturbed and unmutated. Salubrious!

Bonus content! In preparing this post, I thought about whether it might be helpful to note that Ruby is a “pass by reference” language, since that’s the key underlying behavior that can result in these accidental mutations. However, my intuition here was actually back to front! Thanks to this wonderful stackoverflow answer, I was reminded that Ruby is in fact a “pass by value” language, but that all values are references.


Tim Serong: Go With The Flow

We installed 5.94kW of solar PV in late 2017, with an ABB PVI-6000TL-OUTD inverter, and also a nice energy efficient Sanden heat pump hot water service to replace our crusty old conventional electric hot water system. In the four years since then we’ve generated something like 24MWh of electricity, but were actually only able to directly use maybe 45% of that – the rest was exported to the grid.

The plan had always been to get batteries once we are able to afford to do so, and this actually happened in August 2021, when we finally got a single 10kWh Redflow ZCell zinc bromine flow battery installed. We went with Redflow for several reasons:

  • Unlike every other type of battery, they’re not a horrible fire hazard (in fact, the electrolyte, while corrosive, is actually fire retardant – a good thing when you live in a bushfire prone area).
  • They’re modular, so you can just keep adding more of them.
  • 100% depth of discharge (i.e. they’re happy to keep being cycled, and can also be left discharged/idle for extended periods).
  • All the battery components are able to be recycled at end of life.
  • They’re Australian designed and developed, with manufacturing in Thailand.

Our primary reasons for wanting battery storage were to ensure we’re using renewable energy rather than fossil fuels, to try to actually use all our local power generation locally, and to attain some degree of disaster resilience.

Being in Tasmania, most of our grid power is from renewable sources anyway (hydro), so the renewable energy argument may seem a little weak at first, unless you cast your mind back to the time some idiot decided to deplete our dams by selling a whole lot of hydro-generated power to Victoria in an El Niño year, then the Basslink cable broke and Tasmania had to fire up a bunch of diesel generators to get through winter. Good times.

On the local generation and use front, as I mentioned at the start of this post, we’ve previously exported more than half the energy we generated, but the feed-in tariff we get from Aurora (the power company) is only $0.06501 per kWh. Compare that to the rate we pay for grid energy ($0.29852/kWh peak or $0.139/kWh off-peak). Say we exported 13.2MWh over the last four years: we would have received about $858 worth of credit… But then when we drew power back from the grid at night, or on cloudy days, we would have paid somewhere between $1834-$3940 for that same amount of power. Treating the grid as a proxy for battery storage does not make any sort of financial sense.
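
Spelling out that arithmetic with the figures above:

13,200 kWh × $0.06501/kWh = $858.13 in feed-in credit
13,200 kWh × $0.139/kWh = $1,834.80 to buy the same energy back off-peak
13,200 kWh × $0.29852/kWh = $3,940.46 to buy it back at peak rates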

As for disaster resilience, we don’t often have grid outages, but we do have them, and that can be a problem. Notably, all our potable water comes from rainwater tanks attached to the house and shed, and the pumps that push that water to the taps are electric, so if the grid is down we don’t have running water. Sure, I can get water out of the tanks with a bucket or a jug, and that’s fine for a little drinking or handwashing, but it’s not good long term. Then there’s our fridges and freezer – at any given time we’re likely to have a lot of stored frozen meat from animals we farm. We don’t want to lose that in a potential extended grid outage, as could happen in bushfire season or during a severe weather event. Also, it’s nice to still have power for our local network and NBN kit, so we can check the TasNetworks Power Outages page to find out WTF is going on.

Complete grid independence would be nice, and with our current power utilisation and a single Redflow battery we could almost do it in the height of summer, or even on some good days in spring or autumn, where we can generate all we need to run the house during the day and charge the battery to 100%, then draw it down overnight. The kink is that Redflow batteries need to undergo a maintenance cycle at least every three days where they are completely discharged for a few hours. If you’re grid connected you don’t really notice this, because the maintenance cycle commences at sunset and once the battery is drained you’re just using grid power again until the sun comes up, but it does mean we can’t be grid independent even if we theoretically have enough PV generation to do so, until we get a second battery (with more than one ZCell, the maintenance cycles are interleaved so at least one battery will always have some power in it).

The other problem with grid independence is that as much as Tasmania is excellent for solar PV generation in summer, it sucks in winter. Looking at our generation and usage figures for 2019, from mid-May to mid-August we were only able to generate 17% of the power we used, and I’ve seen days where we only generated 1-2kW in the entire day. Compare that with summer when we’ve peaked above 40kW some days in December.

Still, if the grid went away for a long time in the warmer half of the year with our current setup, it’d be irritating every few nights, but I reckon we’d manage OK. Of course, there would be some adjustments required to minimise our utilisation: I’d set the blockout timer on the Sanden so the hot water only heated during daylight hours, I’d turn most of our computer equipment off overnight, we’d try to avoid using the microwave at the same time as any other chunky electrical appliance so as not to pull more than 3kW continuously from the ZCell, there’s some lights we usually leave on that we’d just turn off, and so forth. In the colder half of the year, well, I guess we’d try to eat all the frozen food quickly then limp along as best as possible. We would still have some power, some of the time.

When we originally had the PV installed, it was AC coupled, i.e. the solar panels were connected to the inverter, and the inverter was connected to the grid, along with our loads. Our choice of inverter (ABB) was partly because it was Selectronic certified, and at the time we knew Redflow to be working with Selectronic battery inverters. Once we finally got to contacting Murray Roberts from Lifestyle Electrical Services in order to get a quote and talk about installation, circumstances had changed. It turns out that the Selectronic kit just doesn’t like flow batteries with their need to be completely discharged periodically. Victron Energy gear on the other hand works really well with – and is fully supported for use with – Redflow’s ZCell batteries. Lesson learned: with changing technology it doesn’t always pay to plan too far in advance.

Murray initially proposed a setup which would have hooked straightforwardly into our AC coupled solar, i.e. the ABB inverter and PV cells would remain connected as is, then there’d be a Victron energy meter to measure what was coming from the PV and from the grid, a Victron MultiPlus II inverter/charger to charge the battery and pull from the battery to run some loads, plus a Victron Cerbo GX and GX Touch to provide monitoring and control. Some of our power circuits to the house would be hooked up as Essential House Loads (i.e. to be supported by the battery during a grid outage), and some would be Non-Essential House Loads, i.e. powered by the grid and/or PV, but without battery backup. The Victron Cerbo GX and the Redflow ZCell Battery Management System (BMS) are internet connected to hook up with Victron’s VRM portal, which provides handy monitoring tools and graphs, and to allow remote support and assistance and firmware updates. The initial proposal looked something like this:

AC Coupled Solar schematic. Some detail missing (e.g. isolators) but you get the idea.

That configuration would have involved the least messing around, and met our goals of:

  • Utilising as much of your own energy as possible locally.
  • Dealing with occasional/unexpected grid outages (modulo the ZCell maintenance cycle).

But, it still fundamentally relied on the grid. With AC coupled solar, if the grid goes down your inverter automatically goes into anti-islanding mode, and won’t give you any power from your PV array even if the grid is down during the day and the sun is shining on your panels. Your first thought here may be “wait, if the grid is down but I still have sunlight, surely I should still have power”, and that’s an understandable reaction, but anti-islanding is actually a safety feature. If the juice goes from the PV cells to the inverter to the grid and your loads, and you’re exporting power while a power company employee is doing maintenance works on the grid side, you could electrocute them. This would not be a good thing.

A perhaps less obvious problem with this setup is that you can’t black start the site during an extended grid outage. If the grid is down and the inverter is thus in anti-islanding mode, you have no way to get power to restart the system and recharge the battery from empty (unless you’ve also got a generator). “But the grid never goes down for that long…” you might say, until you look at the outages really severe bushfires can cause (think: East Gippsland in the 2019-2020 Black Summer bushfires).

After some further discussion, Murray proposed getting rid of the ABB inverter, and doing DC coupled solar instead, with a Victron MPPT RS hooked up to the PV, two MultiPlus IIs, so that we could handle up to 10kW of power (that’s the maximum limit of all loads and grid export), with all house loads hooked up as Essential, so they can be supplied by whatever combination of grid, solar and battery power is available at any given time. Voilà:

DC Coupled Solar schematic. Again, some detail missing, but mostly accurate.

One thing that’s missing in the above diagram is a manual changeover switch we had installed later, so that if there’s ever a fault in the Victron cluster that requires major works, but the grid is still up, we can manually switch all our loads back over to the grid side for the duration. Not that I expect to need that functionality, but better to have it than not just in case.

The 10kW maximum has proven to be fine, by the way – we just don’t ever put anything like that much power through the system at once. During the day we might pull 400-700W continuously with spikes from 1-3kW or occasionally 4-5kW when multiple heavier loads come on. I think the highest we’ve managed was 7kW very briefly one time with a panel heater and the microwave and the hot water and the clothes dryer and gods only know what else was on at the time, the point is it’s not easy to get the load that high. Note that we currently have a gas stove and oven – if we switched that to electric we might want to be a bit cautious about running lots of other heavy loads while cooking, but I suspect we’d still be fine in general.

Rhys, one of Murray’s crew, did a fantastic job of installing all the kit over several days, then Murray came out to do the final commissioning and bring the system online on August 31, 2021. Here’s a couple of pictures:

RedFlow ZCell battery, 2x Victron MultiPlus IIs, Victron MPPT RS
MPPT again, with ethernet switch, sub board, Victron Cerbo GX and ZCell BMS off to the right.

The transparent box in the above photo contains the Cerbo GX and the ZCell BMS, along with a little 12V backup power supply so that those things keep running if the grid fails and the ZCell is at 0% State of Charge (SoC). The ethernet switch above the sub board hooks up to the network point I installed previously when I was using a Raspberry Pi to monitor the ABB inverter’s generation. It’s Power-over-Ethernet, run from a UPS in my office which also keeps the NBN box and router alive, so the whole system still has internet access for about an hour even if all other power sources are dead (handy if there were some sort of fault and we needed remote assistance).

Here’s what the main switchboard looks like now. The leftmost switch (“Main switch (grid supply)”) turns grid power on/off to the sub board under the house, which goes from there to the Multis’ AC input. The AC out comes back up here to the right-hand switch (“Victron Sub Board Backup Supply”) and thence to all the loads (the various switches in the middle).

Do not fuck with the electricity.

Inside the house there’s a neat little touchscreen console (the Victron GX Touch), which connects via an HDMI cable in the wall to the Cerbo under the house. This shows you what everything is doing at any given time, provides notifications of alarms (e.g.: “Grid Lost”) and has a series of menus for configuring the system. The exact same console is also accessible via a web browser or mobile phone, either over the local network, or remotely via the VRM portal.

Three days after everything was up and running, we went out to run some errands and came back home in the afternoon to discover the Cable PI device in our kitchen beeping like mad. All the power was still on in the house. I called Murray in a bit of a panic thinking something was broken, but it turned out we were experiencing an actual grid outage over a large area – all the way from Grove to Leslie Vale, 802 properties were without power due to wires down in strong wind, and the PV was powering our house. We didn’t feel a thing. The UPS in my office noticed a small dip for a second or two when the grid failed, but the microwave clock was still on, and other computer gear not hooked up to a UPS remained up during the cutover. And the battery kept on charging – it was at 65% SoC when the grid went out and up to 83% by the time the grid came back.

This was just awesome.

A few days after that, suddenly, at 19:39 on September 6, we were without power. The grid was up, but the Victron kit had shut down. This was not awesome. It happened during the ZCell’s regular maintenance cycle, and at the time we got a warning in the BMS logs, and a battery high voltage warning from the MPPT. It was unclear then exactly what the problem was though – our MPPT RS was a newer model so maybe it was different somehow from what had been previously tested? Also, we discovered the DVCC setting still needed to be turned on, so maybe that was the issue. Anyway, I power cycled the Victron kit and everything was fine again until a couple of weeks later on September 19, when the system shut down again during a maintenance cycle, and again coinciding with battery high voltage warnings.

Because of the previous shutdown, Murray had been in contact with Simon Hackett from Redflow, and Simon subsequently enabled the “DC-coupled PV – feed in excess” setting. The assumption here was that the extra power being delivered from the ZCell during the maintenance discharge wasn’t absorbed by our house loads, i.e. it was trying to discharge at 1kW, but our loads were utilising less than that, and there was nowhere for the excess to go, hence the shutdown. Enabling “DC-coupled PV – feed in excess” allows power from the DC bus to be sent to the grid if necessary, which turns out to be the case at our site. A second ZCell would mitigate this because the one under maintenance would simply be able to dump to the other (assuming it wasn’t already full), but we only have one battery so far.

At this point, after those two final settings changes (and a firmware update) the system was operating exactly as it should. Everything was configured correctly, and we could survive grid outages if there was charge in the battery and/or the sun was up. Our electricity meter was replaced on September 21 so we could switch to Tariff 93 Peak & Off-Peak billing. There were no further unexpected shutdowns. Everything was totally fine, except we were still seeing these weird battery high voltage warnings during the ZCell maintenance cycle, reported by both the BMS and the MPPT.

MPPT Alarm #2 Battery high voltage

This is something neither Murray nor Simon had seen before. What followed was several weeks of troubleshooting and analysis, which I found absolutely fascinating.

I was keeping detailed notes of what happened during each maintenance cycle, and what I saw in the BMS logs, and on the Cerbo console. Several maintenance cycles later I discovered a correlation between the battery high voltage warnings, and sudden large changes in AC loads, notably when a 2400W panel heater in our bedroom turned itself on and off overnight. Also, the high voltage warnings seemed to be more likely to occur if we went into maintenance with a high state of charge, versus a low state of charge.

Overnight load spikes

Simon, who knows and understands how the ZCell behaves during maintenance, explained the Energy Extraction Device (EED), the part of the unit whose purpose is to deliberately drain the battery down to zero for maintenance in a timely fashion. He wondered if there was some issue where very high power demands at short notice, while the EED was active, were causing the DC bus voltage to fluctuate and in turn cause the MPPT to respond in an unusual manner. We experimented with changing settings on the BMS to activate the EED later than usual in the maintenance cycle ("Start maintenance when SoC below x%"), and also experimented with limiting inverter output on the Multis, and tweaking the maximum charge voltage.

After a few more cycles of observation, the suspicion became that the MPPT itself wasn’t at fault, rather it was just being the messenger. Maybe these odd voltage spikes had always happened at other sites too, but the new MPPT RS units were doing a better job of noticing them?

Later in October, Simon noticed that we were seeing spikes very close to when the battery was nearly completely empty. At that point in time, the EED was telling the Multis that it had a 10 amp output capacity, but if the Multis tried to draw on that to handle a sudden rise in load (as from our panel heater for example), the battery voltage would collapse, and the system would oscillate a bit between those two states. That behaviour was fixed with a small firmware change which I believe later landed in the BMS 1.1.11 release. Unfortunately, that change by itself didn’t make the high voltage warnings go away.

A few days after that, we suddenly had a slew of #201 – Internal DC voltage errors from the MPPT, so we were back to being concerned that maybe there was something wrong with that piece of equipment, especially given those errors began to crop up more often as time passed.

MPPT Alarm #201 – Internal DC voltage error.

Victron’s documentation ominously stated that this error meant “a measurement circuit inside the unit is broken” and the unit “is really broken, not safe for use, and if it hadn’t stopped working already then it would have stopped working soon”. Clearly a replacement or repair was in order, but I’ll get to that later.

On November 4, looking at the logs I’d collected from November 2, Simon noticed that one of the high battery voltage warnings happened at quite a high state of charge (72%), which meant it wasn’t really about the battery running out of energy and having the voltage collapse. It looked like it was the EED being over-drawn, regardless of how much energy was still in the battery. It turns out there’s a thing that the ZCell does to handle surge demand when the EED is on, called an “EED switchback”. ZCells internally have three contactors, for Charge, Discharge and EED (also known as Strip). In normal operation, the C and D contactors are on, and E is off, so the battery can be charged or discharged at will, and the EED is doing nothing. During maintenance, the EED comes on, but it can’t deliver more than 20 amps. If the site pulls more than the EED can supply, the battery goes back to normal operation (C and D on, E off) while the high demand is present. Once the high demand goes away, it switches the EED back on, to keep discharging at the normal rate of 1kW. But, by default, that switchback process only happens five times per battery maintenance cycle so as to avoid the potential for excessive cycling of the contactors in weird edge cases.

Looking at our overnight load with the panel heater on, there’s loads of spikes from below 1kW to above 1kW, so we were getting through that switchback limit very quickly. After that, with the high load from the panel heater, it was entirely possible that the Multi cluster would try to pull more power than the EED could provide, and the EED would shut down in response, resulting in weirdness on the DC bus. Simon’s suspicion for why this hadn’t been seen before was that it needed at least four causal factors to be present at once:

  1. Site demand is spiky for the entire night.
  2. The spikes start below 1kW and end above 1kW to use up the switchback quota.
  3. The site has only one battery (if a second battery were present it would handle the surge load while the first was in maintenance).
  4. The two Multis are capable of running far more load than the battery can service.

If we were to change or remove any one of those factors, we wouldn’t see the problem. So, as a test, Simon changed the switchback limit from 5 to 50, and I watched what happened during the next maintenance cycle. The status page of the BMS web interface shows, among other things, the current state of the contactors. Here’s an example with C and D on, and E off:

The ZBM logs also show the contactor state over time. Here’s a snippet I’ve colourised to make the state changes obvious:

If we take the above section of ZBM logs from 2021-11-08 23:15 to 23:29, it looks like we had “_ D E” up until 23:17, then switched to “C D _” from 23:18 to 23:26, then finally to “_ _ E” at 23:27. Based on this I’d imagine we had one switchback event that lasted eight minutes. But I had earlier noticed on the status page that the contactors seemed to be toggling more rapidly, so I wrote a little script to scrape the BMS REST API once per second and dump that to a file, which shows Charge and EED toggling on/off about ten times in that window:

2021-11-08T23:14:28+11:00    "_ D E"
2021-11-08T23:17:59+11:00    "C D _"
2021-11-08T23:18:29+11:00    "C D E"
2021-11-08T23:18:34+11:00    "_ D E"
2021-11-08T23:18:56+11:00    "C D _"
2021-11-08T23:19:14+11:00    "C D E"
2021-11-08T23:19:20+11:00    "_ D E"
2021-11-08T23:19:54+11:00    "C D _"
2021-11-08T23:20:17+11:00    "C D E"
2021-11-08T23:20:18+11:00    "_ D E"
2021-11-08T23:20:49+11:00    "C D _"
2021-11-08T23:21:19+11:00    "C D E"
2021-11-08T23:21:22+11:00    "_ D E"
2021-11-08T23:21:46+11:00    "C D _"
2021-11-08T23:21:51+11:00    "_ D _"
2021-11-08T23:21:54+11:00    "C D _"
2021-11-08T23:22:23+11:00    "C D E"
2021-11-08T23:22:26+11:00    "_ D E"
2021-11-08T23:22:43+11:00    "C D E"
2021-11-08T23:22:45+11:00    "C D _"
2021-11-08T23:23:26+11:00    "C D E"
2021-11-08T23:23:30+11:00    "_ D E"
2021-11-08T23:23:43+11:00    "C D _"
2021-11-08T23:24:30+11:00    "C D E"
2021-11-08T23:24:34+11:00    "_ D E"
2021-11-08T23:24:43+11:00    "C D _"
2021-11-08T23:25:35+11:00    "C D E"
2021-11-08T23:25:39+11:00    "_ D E"
2021-11-08T23:25:49+11:00    "C D _"
2021-11-08T23:26:40+11:00    "C D E"
2021-11-08T23:26:44+11:00    "_ D E"
2021-11-08T23:26:49+11:00    "_ _ E"

On the assumption we were hitting way more switchbacks than expected, Simon just went and set the maximum switchback count to 999. A few days later, looking at the log from my script, I saw something like 150-160 switchbacks. Given we’d set the limit way high, that fixed all the high voltage warnings except for one, right at the very end of the maintenance cycle, when the discharge limit from the ZCell drops from 10 amps to 0 amps.

Simon discussed this final spike with the folks who built the EED, and found that when current draw from the EED stops, there is a voltage spike, of very low energy, for 10ms, and it can rise as high as 64V during that period before dropping back to the expected 57V. It’s normal for the EED to do this, and as there’s no real energy in it, it won’t be damaging anything. The thing about our site seems to be the new MPPT RS, with a new voltage sensing circuit that’s actually capable of noticing the spike, whereas the other gear (the MultiPlus IIs) misses it because it’s so short. The advice from the electrical engineer was to try adding more capacitors on the DC bus to absorb the spike. We already had two 47,000uF capacitors on there, so Murray went and ordered two more.

With the high battery voltage warnings out of the way, we were back to the #201 Internal DC voltage errors from the MPPT. On the assumption the unit was indeed faulty in that regard we requested a replacement, but Victron came back and said the problem could be fixed by replacing two resistors on the main board of the unit. I guess that makes sense – if you can fix a problem with $2 worth of resistors, that’s three orders of magnitude cheaper than replacing the whole unit.

By then we were getting into December, and what with pandemic-related shipping delays and the Christmas holiday period, it was later in January before we were able to get the additional capacitors installed on the DC bus, and replace the resistors in the MPPT’s voltage sensing circuit. The additional capacitors went just fine, the replacement resistors not so much. Once the MPPT was powered back up it claimed there was zero volts coming from the PV, even though the sun was shining, and the LCD display started flickering strangely. Something was definitely broken here, so we powered it back down, and Simon arranged for a replacement unit to be sent out, which took another few weeks, which is a damn shame in January/February, being prime solar PV generation time.

The delay did however allow me to spend some time messing around with scheduled charges to see if there was a cost benefit to grid-charging the battery during off-peak times, then drawing it back down during peak, because the reality is we’re going to want to do this in winter when there’s not much sun, so why not try it out in advance? TL;DR: Yes, it’s worth grid charging the battery off-peak, provided you use all that power during peak times, but it’s a bit irritating trying to figure out exactly what you’ll save. In one of my tests it was the difference between paying $3.85 for about 20kWh of usable electricity in a 24 hour period versus paying $4.70, so it’s not insignificant.
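For anyone wanting to do the same sums, the arithmetic is just cost divided by usable energy; here's a tiny sketch using the figures from that test (it ignores round-trip losses and the exact peak/off-peak split, so treat the result as indicative only):

# Back-of-the-envelope comparison of two 24 hour periods, using the dollar
# figures mentioned above. Effective rate = cost / usable energy; losses
# and the precise peak/off-peak split are ignored.
usable_kwh = 20.0
with_offpeak_charge = 3.85      # grid-charged off-peak, drawn down during peak
without_offpeak_charge = 4.70   # no scheduled off-peak charge

rate_with = with_offpeak_charge / usable_kwh        # ~$0.19/kWh
rate_without = without_offpeak_charge / usable_kwh  # ~$0.24/kWh

saving = without_offpeak_charge - with_offpeak_charge
print(f"saving: ${saving:.2f}/day (~{saving / without_offpeak_charge:.0%})")
# saving: $0.85/day (~18%)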

Rhys came out and installed the replacement MPPT on February 11, and was done by the middle of the day. Everything was running beautifully again, but when the unit came online there were ten instances of the dreaded #201 Internal DC Voltage Error, along with a #27 Charger Short Circuit. I used the VictronConnect app on my phone to see if I could get any more information directly from the MPPT. It told me there was a firmware update available from v1.05 to v1.08, so I went looking for information about that, and discovered that Victron’s error code documentation had been updated since I first saw it back in late October. In addition to the ominous warnings about broken measurement circuits it now also said:

“Make sure to update the firmware to at least v1.08, in previous firmwares the limits were too strict. And it could trigger falsely during MPPT start-up in the morning and MPPT shutdown in the evening.”

So I updated the firmware, and writing this now, two and a half months later, we’ve not seen a single #201 since. Could this have always been a firmware issue? Maybe, given the “accepted answer” on this Victron Community forum post says that firmware version v1.08 “solves the vast majority of MPPT RS, and Inverter RS, Error 201 issues”. Or maybe it was both – maybe we had a broken bit of kit and broken firmware too. Either way, it’s fixed now.

I continued to monitor regular maintenance cycles, and also deliberately forced maintenance a couple of times with a high state of charge to try to stress it as much as I could. During those periods I saw something like 3-10 switchbacks, so Simon set our switchback limit back down from 999 to 30. I understand a future Redflow firmware update will change this default for everyone to somewhere between 25-50, and I’m very happy that this unexpected testing at our site resulted in firmware improvements that will presumably benefit other ZCell users too.

By late April we’d been through 33 maintenance cycles since the extra capacitors went in, with 26 of those occurring since the new MPPT was installed. There had been only three occasions when the BMS briefly noticed alleged high battery voltages in that time. The MPPT was completely silent until April 20, when we got three battery high voltage warnings within an 8 minute interval right at the end of the maintenance cycle, when the battery was almost completely empty. But the weather had also started to get cold, and those warnings coincided with a spike from our panel heater, which is consistent with our earlier observations about load spikes with the EED on being “difficult”, and really just points to replacing the panel heater with a heat pump. Heat pumps are way more energy efficient, have a much smoother load, and can also be used for cooling in summer (they’re called “reverse cycle air conditioners” on the mainland).

That’s about the end of the story. The system is brilliant, and we could not be happier with the support we’ve received from Simon at Redflow, who’s been extremely generous with his time and knowledge, and Murray and Rhys of Lifestyle Electrical Services. Thanks for everything guys, I’ve learned a lot. In the eight months the system has been running we’ve generated 4631kWh of electricity and “only” sent 588kWh to the grid, which means we’ve used 87% of what we generated locally – much better than the pre-battery figure of 45%. I suspect we’ve reduced the amount of power we pull from the grid by about 30% too, but I’ll have to wait until we have a full year’s worth of data to be sure. We’ve also survived or shortened at least five grid outages with durations from a few minutes to a few hours.

The next thing to do is get a second ZCell, and possibly eventually think about a third. Given our current generation capability, two ZCells would allow us to store and utilise 100% of our generated power locally. We’d also have the ability to handle grid outages at any time, because with two batteries the maintenance cycles interleave and they can be configured to always ensure there’s a minimum amount of charge somewhere. A third would allow us to look at Standby Power System (SPS) mode where one battery is fully charged, then put into hibernation where it can remain for months. This sounds like a great way to have backup storage available for grid outages in the middle of winter when there’s no sunlight.

Appendix A – Settings Worth Messing With

Scheduled charging on the Cerbo GX console

In summer I scheduled charges of the battery to 15% between the hours of 04:00 and 08:00, and 30% between the hours of 15:00-17:00. Peak electricity hours during daylight savings are 08:00-11:00 and 17:00-22:00, and I found that a 15% charge overnight from the grid was enough to have a bit in the battery for the morning peak before the PV really got going. The afternoon charge was there mostly just in case we had a cloudy day – we’d usually get way more charge than that from the sun anyway. Now that we’re off daylight savings, peak hours change to 07:00-10:00 and 16:00-21:00, so I’ve set it to a 30% charge from 03:00-07:00 and a 50% charge from 14:00-16:00, which seems to be about right given our general peak utilisation and decreasing sunlight. I’m unlikely to set the afternoon charge higher than 50% because I don’t want to potentially go into a maintenance cycle with the battery very full, but I may re-evaluate that as we get deeper into winter.

It’s worth mentioning that during a scheduled charge, the power will come from wherever the Victron gear can find it, so if the sun is shining, you’ll be charging from the PV, not the grid. One thing to note is that during the scheduled charge period, the battery will not be used to support loads, even if it’s currently got a higher SoC than your limit. Some power will trickle away slowly though, I assume to run the pumps and supporting electronics of the battery itself.

My choice of timing for the overnight charge (four hours up to the start of the morning peak) is me wanting to have some power in the battery for as long as possible overnight in case of outages, without potentially interfering with maintenance cycles, which typically will have finished some time in the wee hours.

I also set up a 15% charge on weekend mornings. This doesn’t save us any money at all (actually it’ll be costing us a couple of tens of cents) because weekends are all off-peak power. The reason is again to have some opportunistic grid backup. Before I set this up we had an outage at 07:45 one Saturday morning with the battery empty, and had to wait until about 09:30 for the PV to bring the battery up to 10% again before everything came back online. Still, that then got us through the remainder of the grid outage which finished at about 10:15.

Battery Maintenance Settings on the ZCell BMS

On the Battery Maintenance screen of the ZCell BMS, I’ve got “Immediate maintenance for batteries with an EED” turned off, and “Start Maintenance When SoC Below 25%” enabled. This is to try to reduce the amount of time the EED runs, to limit switchbacks caused by our spiky load. In summer I also set “Daily SoC Limit Before Maintenance” to 50%, so the battery would not let itself be charged more than half way on those long hot days with late sunsets and early sunrises. This was to minimise maintenance cycle time, because I’d previously seen occasions where we went into maintenance with 100% SoC, and the cycle didn’t finish before the following morning when the sun came up. I also had a couple of times where I guess some timeout expired and the ZCell went into its final chemical maintenance state while it still had a few percent of charge. Not letting it get very full on maintenance days avoids these situations. Now that we’re getting towards winter I’ve removed that limit because the nights are longer and I expect our evening power utilisation to be higher, i.e. we should naturally use up whatever power we’re able to generate in plenty of time during winter maintenance cycles.

It’s also worth checking the latitude and longitude are set correctly on the Site Configuration page, because that’s how the BMS figures out when the sun sets and thus when to start maintenance by default.

Appendix B – VRM Portal

The VRM portal is a remote monitoring and management web interface which Victron provides gratis for users of their hardware. It provides a realtime view of the same live utilisation information you can get from the Cerbo console, plus handy graphs of solar, grid and battery consumption.

Consumption 2022-04-26. Red is grid, yellow is solar, blue is battery.

It also provides detailed graphs of just about anything you can think of from any of the system components. It’s extremely useful. Without this I never would have been able to correlate the battery high voltage warnings with load spikes and changes in the ZCell discharge current limits.

Viewing a bunch of interesting detail all at once

The data for the advanced graphs is stored for at least six months, and the solar yield and consumption data is stored for at least 5 years. The alarm logs don’t hang around that long – I suspect it may just be showing the last 1000 entries. Somewhat irritatingly, most of these are usually low battery alarms that we don’t care about (you see a lot of them during maintenance cycles).

Appendix C – Security / Connectivity / Internet Access

The ZCell BMS and Victron Cerbo GX both need to be connected to the internet for firmware updates, remote support, and to work with the VRM portal. They don’t need to be connected 100% of the time, but they do want the connection for those reasons. The system will operate just fine if the internet is down though, and you don’t have to use the VRM portal if you don’t want to. I’ve put everything on a separate network, so I can access the BMS and the Cerbo console from my desktop/laptop/phone, but the BMS and Cerbo can’t do the reverse. It’s not that I don’t trust Redflow or Victron, it’s just sensible to keep systems that allow any form of remote access isolated from the rest of your internal network.

The BMS and Cerbo both provide WiFi APs for initial configuration. I’ve since turned those off. I can use the wired connection to the BMS to turn the WiFi back on if I ever need it, and I can do the same for the Cerbo from its console.

The Cerbo and MPPT both speak Bluetooth, so you can use the VictronConnect app to talk to them from your phone, to view status and update firmware.

Appendix D – Hackability

The ZCell BMS has a REST API, which is documented in the online help available from its web interface. That’s how I was able to write a few scripts to log the battery state of charge, the contactor state, and the voltage and warning indicator status.
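A minimal poller looks something like the sketch below; note that the endpoint path and JSON field names here are placeholders rather than the real API (the online help on the BMS documents the actual routes):

#!/usr/bin/env python3
# Poll a BMS REST API once per second and append the interesting fields to
# a log file. The URL and field names are placeholders -- consult the API
# documentation in the BMS web interface for the real ones.
import json
import time
from urllib.request import urlopen

BMS_URL = "http://bms.example.lan/rest/status"   # placeholder address
LOGFILE = "bms-log.jsonl"

while True:
    try:
        with urlopen(BMS_URL, timeout=5) as resp:
            status = json.load(resp)
        record = {
            "time": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
            "soc": status.get("state_of_charge"),        # placeholder field names
            "contactors": status.get("contactor_state"),
            "voltage": status.get("voltage"),
            "warnings": status.get("warning_indicators"),
        }
        with open(LOGFILE, "a") as f:
            f.write(json.dumps(record) + "\n")
    except OSError as e:
        print(f"poll failed: {e}")
    time.sleep(1)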

A bunch of Victron stuff is open source, notably Venus OS, which is the software that runs on the Cerbo. It looks fairly straightforward to get root on these things. It’s also possible to hook up the Victron kit to Home Assistant. I haven’t tried actually doing any of these things yet myself.

Appendix E – Aurora Plus

Having switched to Tariff 93 and gotten a fancy new electricity meter, we were able to use the Aurora Plus service from the power company. This provides a web interface and mobile phone app for viewing your power usage down to the hour, colour coded to indicate peak and off-peak usage and solar feed in. You also get a monthly, rather than quarterly bill. This all sounded pretty neat, so I signed up.

Aside from having used it to confirm that the figures I get from the VRM Portal and the power company actually match, it’s turned out to not be especially great.

While the electricity meter records usage information every 15 minutes, it’s only sent back to Aurora once per day, so the usage data is never actually live. Sure, you can see history, but this is useless for adjusting your power consumption on the fly. Compare that with the VRM Portal or Cerbo console, where I can see at a glance how much power is being used and how much solar is being generated right now and decide to turn appliances on or off appropriately.

Also, it nags you to give them money. It’s continually telling me I have a big red negative dollar balance, and periodically notifies me to “top up now to get ahead of your monthly bill”. No. I will pay the bill by the due date listed on the bill, after the bill actually arrives.

Finally, it costs eleven cents a day for the privilege of having the service. Under the circumstances I think I’m going to cancel it and just go back to quarterly billing.

Aurora+, trying to trick me into paying in advance.

Russell Coker: PIN for Login

Windows 10 added a new “PIN” login method, an optional alternative to an Internet-based password through a Microsoft account or a domain password through Active Directory. Here is a web page explaining some of the technology (don’t watch the YouTube video) [1]. There are three issues here: whether a PIN is any good in concept, whether the specifics of how it works are any good, and whether we can copy any useful ideas for Linux.

Is a PIN Any Good?

A PIN in concept is a shorter password. I think that less secure methods of screen unlocking (fingerprint, face unlock, and a PIN) can be reasonably used in less hostile environments. For example if you go to the bathroom or to get a drink in a relatively secure environment like a typical home or office you don’t need to enter a long password afterwards. Having a short password that works for short time periods of screen locking and a long password for longer times could be a viable option.

It could also be an option to allow short passwords when the device is in a certain area (determined by GPS or Wifi connection). Android devices have in the past had options to disable passwords when at home.

Is the Windows 10 PIN Any Good?

The Windows 10 PIN is based on TPM security which can provide real benefits, but this is more of a failure of Windows local passwords in not using the TPM than a benefit for the PIN. When you login to a Windows 10 system you will be given a choice of PIN or the configured password (local password or AD password).

As a general rule providing a user a choice of ways to login is bad for security as an attacker can use whichever option is least secure.

The configuration options for Windows 10 allow either group policy in AD or the registry to determine whether PIN login is allowed, but don’t provide any control over when the PIN can be used, which seems like a major limitation to me.

The claim that the PIN is more secure than a password would only make sense if it was a viable option to disable the local password or AD domain password and only use the PIN. That’s unreasonably difficult for home users and usually impossible for people on machines with corporate management.

Ideas For Linux

I think it would be good to have separate options for short term and long term screen locks. This could be implemented by having a screen locking program use two different PAM configurations for unlocking after short term and long term lock periods.
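A rough sketch of that idea, assuming the third-party python-pam module and two hypothetical /etc/pam.d/ service files (one lenient, one strict) that you would create yourself:

# Sketch only: pick a different PAM service, and hence a different
# /etc/pam.d/ configuration, depending on how long the screen has been
# locked. Assumes the third-party python-pam package; the service names
# "screenlock-short" and "screenlock-long" are hypothetical.
import time
import pam

SHORT_LOCK_SECONDS = 10 * 60    # under 10 minutes: allow the short password/PIN

def unlock(username: str, password: str, locked_at: float) -> bool:
    locked_for = time.time() - locked_at
    service = "screenlock-short" if locked_for < SHORT_LOCK_SECONDS else "screenlock-long"
    return pam.pam().authenticate(username, password, service=service)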

Having local passwords based on the TPM might be useful. But if you have the root filesystem encrypted via the TPM using systemd-cryptenroll it probably doesn’t gain you a lot. One benefit of the TPM is limiting the number of incorrect attempts at guessing the password in hardware; the default is allowing 32 wrong attempts and then one every 10 minutes. Trying to enforce that limit in software would allow an attacker 32 guesses followed by a hardware reset to clear the counter, which could average out at something like 32 guesses per minute instead of 32 guesses per 320 minutes. Maybe something like fail2ban could help with this (a similar algorithm, but for password authentication guesses instead of network access).
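Something like the following minimal sketch could implement that policy in software (purely illustrative; the hard part, persisting the counter so a reboot doesn't clear it, isn't shown):

# Toy software equivalent of the TPM policy described above: allow a burst
# of 32 attempts, then one attempt per 10 minutes. Only useful if the
# counter survives reboots, which is exactly the weakness noted above.
import time

BURST = 32
LOCKOUT_SECONDS = 10 * 60

class GuessThrottle:
    def __init__(self):
        self.failures = 0
        self.last_attempt = 0.0

    def attempt_allowed(self) -> bool:
        if self.failures < BURST:
            return True
        # past the burst: one attempt per lockout period
        return time.time() - self.last_attempt >= LOCKOUT_SECONDS

    def record_failure(self):
        self.failures += 1
        self.last_attempt = time.time()

    def record_success(self):
        self.failures = 0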

Having a local login method to use when there is no Internet access and network authentication can’t work could be useful. But if the local login method is easier then an attacker could disrupt Internet access to force a less secure login method.

Is there a good federated authentication system for Linux? Something to provide comparable functionality to AD but with distributed operation as a possibility?


Russell Coker: Got Covid

I’ve currently got Covid, I believe I caught it on the 11th of April (my first flight since the pandemic started) with a runny nose on the 13th and a positive RAT on the evening of the 14th. I got an official PCR test on the 16th with a positive result returned on the 17th. I think I didn’t infect anyone else (yay)! Now I seem mostly OK but still have a lack of energy, sometimes I suddenly feel tired after 20 minutes of computer work.

The progression of the disease was very different to previous cold/flu diseases that I have had. What I expect is to start with a cough or runny nose, escalate with more of that, have a day or two of utter misery with congestion, joint pain, headache, etc, then have it suddenly decrease overnight. For Covid I had a runny nose for a couple of days which went away then I got congestion in my throat with serious coughing such that I became unable to speak. Then the coughing went away and I had a really bad headache for a day with almost no other symptoms. Then the headache went away and I was coughing a bit the next day. The symptoms seemed to be moving around my body.

I got a new job and they wanted me to fly to the head office to meet the team, I apparently got it on the plane a day before starting work. I’ve discussed this with a manager and stated my plan to drive instead of fly in future. It’s only a 7 hour drive and it’s not worth risking the disease to save 3-4 hours travel time, or even the 5 hours travel I’d have saved if the airports were working normally (apparently a lot of airport staff are off sick so there’s delays). Given the flight delays and the fact that I was advised to arrive extra early at the airport I ended up taking almost 7 hours for the entire trip!

7 hours driving is a bit of effort, but sitting in an airport waiting for a delayed flight while surrounded by diseased people isn’t fun either.


Russell Coker: Joplin Notes

In response to my post about Android phones without Google Play [1] I received an email recommending Joplin for notes on Android [2].

Joplin supports storing notes on a number of protocols including Nextcloud and WebDAV. I set up WebDAV because it’s easiest; here are Digital Ocean’s instructions for WebDAV on Apache [3]. That basically works. One problem for my use case is that the Joplin client doesn’t support accounts on multiple servers, and the only released way of sharing notes between accounts is using the paid Joplin Cloud service.

There is a Joplin Server in beta which allows sharing notes, but it is designed to run in Docker and is written in TypeScript, so it was too much pain to set up. One mitigating factor is that there are “Notebooks”, which are collections of notes. So if multiple people who trust each other share an account they can have Notebooks for personal notes and a Notebook for shared notes.

There is also a Snap install of the client for Debian [4]. Snap isn’t my favourite way of doing things but packaging JavaScript programs will probably be painful so I’ll do it if I continue using Joplin.

BlueHackers: Free psychologist service at conferences: April 2022 update

We’ve done this a number of times over the last decade, from OSDC to LCA. The idea is to provide a free psychologist or counsellor at an in-person conference. Attendees can do an anonymous booking by taking a stickynote (with the timeslot) from a signup sheet, and thus get a free appointment.

Many people find it difficult taking the first (very important) step towards getting professional help, and we’ve received good feedback that this approach indeed assists.

So far we’ve always focused on open source conferences. Now we’re moving into information security! First BrisSEC 2022 (Friday 29 April at the Hilton in Brisbane, QLD) and then AusCERT 2022 (10-13 May at the Star Hotel, Gold Coast QLD). The awesome and geek friendly Dr Carla Rogers will be at both events.

How does this get funded? Well, we’ve crowdfunded some, nudged sponsors, but mostly it gets picked up by the conference organisers (i.e. indirectly by the sponsors).

If you’re a conference organiser, or would like a particular upcoming conference to offer this service, do drop us a line and we’re happy to chase it up for you and help the organisers to make it happen. We know how to run that now.

In-person is best. But for virtual conferences, sure contact us as well.

The post Free psychologist service at conferences: April 2022 update first appeared on BlueHackers.org.


Russell Coker: Android Without Play

A while ago I was given a few reasonably high-end Android phones to give away. I gave two very nice phones to someone who looks after refugees so a couple of refugee families could make video calls to relatives. The third phone is a Huawei Nova 7i [1] which doesn’t have the Google Play Store. The Nova 7i is a ridiculously powerful computer (8G of RAM in a phone!!!) but without the Google Play Store it’s not much use to the average phone user. It has the “HuaWei App Gallery” which isn’t as bad as most of the proprietary app stores of small players in the Android world, it has SnapChat, TikTok, Telegram, Alibaba, WeChat, and Grays auction (an app I didn’t even know existed) along with many others. It also links to ApkPure (apparently a 3rd party app installer that “obtains” APK files for major commercial apps) for Facebook among others. The ApkPure thing might be Huawei outsourcing the violation of Facebook terms of service. For the moment I’ve decided to only use free software on this phone and use my old phone for non-free stuff (Facebook, LinkedIn, etc). The eventual aim is that I can only carry a phone with free software for normal use and carry a second phone if I’m active on LinkedIn or something. My recollection is that when I first got the phone (almost 2 years ago) it didn’t have such a range of apps.

The first thing to install was F-Droid [2] as the app repository. F-Droid has a repository of thousands of free software Android apps, as well as some apps that are slightly less free, which are tagged appropriately. You can install the F-Droid app from the web site. As an aside, I had to go to settings and enable “force old index format” to get the list of packages; I don’t know why, as other phones had worked without it.

Here are the F-Droid apps I installed:

  • Kdeconnect to transfer files to PC. This has some neat features including using the PC keyboard on Android. One downside is that there’s no convenient way to kill it. I don’t want it hanging around, I want to transfer a file and close it down to minimise exposure.
  • K9 is an Android app for email that I’ve used for over a decade now. Previously I’ve used it from the Play Store but it’s available in F-droid. I used Kdeconnect to transfer the exported configuration from my old phone to my PC and then from my PC to my new phone.
  • I’m now using SchildiChat for Matrix as a replacement for Google Hangouts (I previously wrote about how Google is killing Hangouts [3]). One advantage of SchildiChat is that it keeps a notification running 24*7 to reduce the incidence of Android killing it. The process of sending private messages with Matrix seems noticeably slower than Hangouts, while Google will inevitably be faster than a federated system (if only because they buy better hardware than I rent) the difference shouldn’t be enough to notice (my Matrix servers might need some work).
  • I used ffupdater to install Firefox. It can also install other browsers that don’t publish APK files. One of the options is “Ungoogled Chromium” which I’m not going to use even though I’ve found Google Chrome to be a great browser, I think I should go all the way in avoiding Google. There’s no description in the app of the differences between the browsers, the ffupdater web page has information about the browsers [4].
  • I use Tusky for Mastodon which is a replacement for Twitter. My Mastodon address is @etbe@mastodon.nzoss.nz. Currently Mastodon needs more users, there are plenty of free servers out there and the New Zealand Open Source Society is just one I have contact with.
  • I have used ConnectBot for ssh connections from Android for over 10 years, previously via the Play Store but it’s also in F-droid. To get the hash of a key from a server in the way ConnectBot displays it run “ssh-keygen -l -E md5 -f /etc/ssh/ssh_host_ed25519_key.pub“.
  • I initially changed the keyboard from MS SwiftKey to the Celia keyboard that came with the phone. But its spelling correction was terrible, almost never suggesting words with apostrophes when appropriate and also having no apparent option to disable adult words. I’m now using OpenBoard, which is a port of the Google Android keyboard, and it works well.
  • I’ve just installed “primitive ftpd” for file transfer, it supports ftp and sftp protocols and is well written.
  • I’ve installed the mpv video player which plays FullHD video at high quality using hardware decoding. I don’t need to do that sort of thing (the screen is too small to make it worth FullHD video), but it’s nice to have.
  • For barcodes and QR codes I’m using Binary Eye which seems better than the Play Store one I had used previously.
  • For playing music I’ve tried using the Simple Music Player (which is nice for mp3s), but it doesn’t play m4a or webm files. Auxio and Music Player Go play mp3 and m4a but not webm. So far the only programs I’ve found that can play webm are VLC and MPV, so I’m trying out VLC as a music player, which basically works, but a program with the same audio features and no video-related menu options would be better. Webm is important to me because I have some music videos downloaded from YouTube, and webm allows me to put a binary copy of the audio data into an audio file.

Future Plans

The current main things I’m missing are a calendar, a contact list, and a shared note taking system (like Google Keep). For calendaring and a contact list the CalDAV and CardDAV protocols seem best. The most common implementation on the server side appears to be DAViCal [5]. The Nextcloud system supports CalDAV, CardDAV, web editing of notes and documents (including LibreOffice if you install that plugin) [6]. But it is huge and demands write access to all its own code (bad for security), and it’s not packaged for Debian. Also in my tests it gave me an error 401 when I tried to authenticate to it from the Android Nextcloud client. I’ve seen a positive review about Radicale, a simple CalDAV and CardDAV server that doesn’t need a database [7]. I prefer the Unix philosophy of keeping things simple with file storage unless there’s a real need for anything else. I don’t think that anything I ever do with calendaring will require the PostgreSQL database that DAViCal uses.

I’ll give Radicale a go for CalDAV and CardDAV, but I still need something for shared notes (shopping lists etc). Suggestions welcome.

Current Status

Lack of a contacts list is a major loss of functionality in a phone. I could store contacts in the phone memory or on the SIM, but I would still have to get all my old contacts in there, and getting something half working reduces the motivation to get it working properly. Lack of a calendar is also a problem; again I could work around that by exporting all my Google calendars as iCal URLs, but I’d rather get it working correctly.

The lack of shared notes may be a harder problem to solve given the failure of Nextcloud. For that I would consider just having the keep.google.com web site always open in Mozilla at least in the short term.

At the moment I require two phones, my new Android phone without Google and the old one for my contacts list etc. Hopefully in a week or so I’ll have my new phone doing contacts, calendaring, and notes. Then my old phone will just be for proprietary apps which I don’t need most of the time and I can leave it at home when I don’t need that sort of thing.


FLOSS Down Under - online free software meetings: April Hack Day Report

The hack day didn’t go as well as I hoped, but it didn’t go too badly. Attendance was smaller than hoped and the discussion was mostly about things other than FLOSS. But everyone who attended had fun and learned interesting things, so generally I think it counts as a success. There was discussion on topics including military hardware, viruses (particularly Covid), rocketry, and literature. During the discussion one error in a Wikipedia page was discussed, and hopefully we can get that fixed.

I think that everyone who attended will be interested in more such meetings. Overall I think this is a reasonable start to the Hack Day meetings, when I previously ran such meetings they often ended up being more social events than serious hacking events and that’s OK too.

One conclusion that we came to regarding meetings is that they should always be well announced in email and that the iCal file isn’t useful for everyone. Discussion continues on the best methods of announcing meetings but I anticipate that better email will get more attendance.


David Rowe: Improving FreeDV 2020

It’s been a few years since FreeDV 2020 was released. On-air experience suggests FreeDV 2020 is less robust to low SNR and fading than the 700x modes, which means it can only be used on benign HF channels, with slow fading and a relatively high SNR. Factors include the high bit rate required for the codec [3], and the modem waveform design.

The goal of this project is to improve the robustness of 2020, and provide intelligible speech at the same operating point SNR as FreeDV 700E. In numerical terms, this can be expressed as the SNR where 700E achieves a coded bit error rate of 0.01, and a packet error rate (PER) of 0.1.

To explore these goals I ended up developing two new modes, 2020A and 2020B. Here are the key innovations and the theoretical improvements:

  1. Compression (clipping) of the 2020x modem waveforms has been added, which is worth about 4dB. This should improve vanilla 2020 straight away (a toy illustration of clipping follows this list).
  2. 2020A uses the same waveform as 2020, but an unequal error protection scheme. The most important 11 bits of the 52 bit LPCNet codec payload are heavily protected, the other bits not protected at all. This changes the “slope” of the speech quality against SNR curve. Compared to 2020, it may work (with poor speech quality) at lower SNRs, but still have a few audible errors even at higher SNRs.
  3. 2020B is to 2020 what 700E is to 700D – it works with fast fading but requires a few more dB. This will make it usable in European Winter (or over the South Pole Argentina to Australia) type channels – if you have enough SNR. The big challenge here was squeezing all the info we need (enough pilot symbols for fast fading, LPCNet, FEC bits) into a 2100 Hz channel – we are pushing up against the edges of many SSB filters. It also uses unequal FEC; just the most important 11 bits are protected, but not as well as in 2020A.
  4. Index optimisation of the LPCNet Vector Quantiser [7].
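As a toy illustration of why clipping helps, here's a Python/numpy sketch (for illustration only, not the actual codec2/FreeDV implementation; the 2x RMS clip threshold is an arbitrary choice). Clipping the rare peaks reduces the Peak to Average Power Ratio (PAPR), which lets the average transmit power rise for the same peak power:

# Toy PAPR-reduction-by-clipping demo -- not the FreeDV/codec2 code.
import numpy as np

rng = np.random.default_rng(1)

def papr_db(x):
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

# Fake multi-carrier signal: the sum of many random-phase carriers looks
# Gaussian and has occasional large peaks, like an OFDM modem waveform.
n_carriers, n_samples = 32, 8000
t = np.arange(n_samples) / n_samples
signal = np.zeros(n_samples, dtype=complex)
for k in range(1, n_carriers + 1):
    phase = rng.uniform(0, 2 * np.pi)
    signal += np.exp(1j * (2 * np.pi * k * 10 * t + phase))
signal /= np.sqrt(n_carriers)

print(f"PAPR before clipping: {papr_db(signal):.1f} dB")

# Clip the magnitude at 2x the RMS level (arbitrary threshold); a real
# modem would follow this with filtering to control spectral regrowth.
rms = np.sqrt(np.mean(np.abs(signal) ** 2))
threshold = 2.0 * rms
mag = np.abs(signal)
clipped = np.where(mag > threshold, signal / mag * threshold, signal)

print(f"PAPR after clipping:  {papr_db(clipped):.1f} dB")

The PAPR saving translates more or less directly into extra average power, and hence SNR, for the same peak power amplifier rating.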

Simulations

To explore the experimental ideas above I developed a script to test the new modes on simulated channels. I used FreeDV 700E and compressed analog SSB as controls. Tests were conducted at the same peak power/noise level. The peak power for the multipath channels (multipath poor MPP and multipath disturbed MPD) was set 5dB higher than for the AWGN channel.

Here are the results.

The reported SNR (right column of table) varies because:

  1. With clipping the average power S increases.
  2. Different waveforms have different Peak to Average Power Ratios (PAPR), so S will vary slightly across waveforms. SSB SNR is dependent on the compression and source material.

Comparing 5 & 8 (AWGN) and 6 & 9 (MPP), it appears that compression helps 2020. We get about 4dB SNR increase and improved speech quality with the same peak power.

Comparing 10 and 19, we can see that 2020B does indeed allow us to operate on fast fading channels. However it’s not error free – due to lack of FEC.

Comparing 12 & 13, index optimisation [7] helps a bit; there are fewer large amplitude pops and clicks. This is not as obvious on 19 & 20, so the improvement is not large.

Comparing 5, 11, and 16, the lack of FEC on 2020A and B can be heard; 2020 is nicer on the AWGN channel as FEC mops up the errors.

In 24-26 the noise level is increased (SNR decreased) by 3dB, and the 700E PER is 3%. In SNR terms this is just a fraction of a dB away from the 10% PER metric mentioned in the introduction. One of the goals of this project was to make a 2020 variant usable at the same low SNR as 700E. Both 2020 and 2020A get there, but they do take “one sentence” to sync up. I don’t think 2020A (with its partial but strong error protection) is any better than 2020 (which now has compression/clipping).

My take aways from these samples:

  • Waveform compression (clipping) helps on all 2020x waveforms, and gets us intelligible speech at SNRs similar to 700E, albeit sync is a bit slow.
  • The effect of index optimisation is marginal when applied to this codec.
  • 2020B works as advertised and can handle the faster fading MPD channel.
  • However the unequal error protection on 2020A and B means audible errors, even on relatively benign channels.
  • 2020A with its unequal error protection scheme isn’t any better than 2020 at low SNRs.

Controlled Over The Air (OTA) Tests

Next step was to test on real HF channels, using a similar approach to [6]. The samples were sent in groups of 4 (700E/2020/2020A/2020B) every 30 minutes over the course of one day.

Here are the results. Key:

  1. I used two KiwiSDRs, one less than 100km away (iron) to get a fast fading NVIS path, and one 800km away (am).
  2. Rx – off air sample. First my station ID, analog (SSB) test sample, digital 2020x test sample modem signal.
  3. AnDV – like Rx, but the 2020x modem signal has been decoded. Lets you compare SSB to DV.
  4. DV – just the decoded 2020x – useful for comparing one test to another.
  5. You can get bigger versions of the plots by opening them in another browser tab.

I note that (i) the station ID SSB at the start is a bit louder than the SSB test samples, so I need to do some more work on my SSB compressor tool, and (ii) 700E messes up the man’s voice; that codec needs some work. A project for later this year.

Sample 24 is a good example of the quality of 2020 compared to SSB.

Samples 59-62 are an example of fast fading and large delay spread, the 700E and 2020B modems obtain a better SNR (10dB compared to 5dB) as they are designed to handle this channel.

Samples 75-78 show 2020A/B doing a little better than 2020, but we still have some errors. Once again, the SNR of 2020B is quite a bit higher as it handles the NVIS channel better.

In samples 79-82, 2020 does better than 2020A/2020B. Once we reach a threshold SNR, the FEC kicks in and squashes any bit errors. It’s quite a bit better than SSB too.

Conclusions

With respect to the goals in the introduction:

  1. We have indeed improved the robustness of FreeDV 2020, waveform compression seems to be the most useful innovation. This can simply be applied to the existing 2020 mode with no compatibility issues. No new mode is required.
  2. We have shown 2020 providing intelligible speech at SNRs close to 700E, although in these tests sync was a bit slow.

Some of the OTA samples show speech quality that is competitive to SSB at similar SNRs.

2020B can handle fast fading channels, however I’m not convinced the partial protection techniques used in 2020A/2020B are worth it. The residual errors are annoying, even at high SNRs. Better to have FEC on everything, just like 700E and vanilla 2020.

Further work

Rather than the original goal, would it be nice to set a goal of “noise free” speech? That is a key potential benefit of DV over SSB. The residual errors on 2020A/B are audible and annoying. 2020B does work well on fast fading channels, but suffers from lack of FEC.

This testing was conducted with a very small set of samples. It would be useful to test wider, for example with different speakers and different channel conditions, and different radios. Can the average Ham radio station also achieve higher quality with FreeDV 2020 than SSB? If not, why not?

LPCNet has moved on a lot since I forked it for experimental FreeDV use. It would be useful to test the latest version. I am not sure if any of the recent changes are relevant to this work.

Some other ideas:

  1. In the automated OTA tests it would be useful to detect if anyone is using the frequency before starting to transmit. This is tricky as I can’t hear anyone on my local radio due to HF EMI, so it would need to be through the KiwiSDR
  2. Not sure if I have levels right – I need a better SSB compressor tool
  3. Sometimes we get a false sync during analog speech sections.
  4. Performance with lots of lightning crashes (impulse noise) is worth looking into. This really messes up SSB.
  5. It would be interesting to repeat the OTA tests using regular SSB radios, to see if the results are repeatable.
  6. Faster sync for 2020 would be nice.
  7. A way to squeeze FEC coverage of all bits into the 2020B signal. Unfortunately we use up a lot of bandwidth to deal with the fast fading (more pilots, higher symbol rate, larger cyclic prefix), so we don’t have “room” for a strong FEC code. We might be able to cover all bits with a high rate code, which would at least mop up any remaining errors on high SNR channels.
  8. We might be able to improve muting when there are channel errors, for example make it FEC based rather than a SNR based squelch.
  9. Further work on compression to reduce PAPR. For example we could design an interleaver to minimise the probability of high PAPR frames, using similar techniques to index optimisation [7].

Links

[1] FreeDV 2020 improvement brainstorms
[2] Experimental 2020A Mode
[3] Experimental version of LPCNet, forked for use with FreeDV (now quite out of date).
[4] FreeDV mode overview
[5] FreeDV 700E
[6] Controlled FreeDV Testing
[7] VQ Index Optimisation


Tim Riley: Open source status update, March 2022

My OSS work in March was a bit of a grind, but I made progress nonetheless. I worked mostly on relocating and refactoring the Hanami action and view integration code.

For some context, it was back in May 2020 that I first wrote the action/view integration code for Hanami 2.0. Back then, there were a couple of key motivators:

  • Reduce boilerplate to an absolute minimum, to the extent that simply inheriting from Hanami::View within a slice would give you a view class fully integrated with the Hanami application.
  • Locate the integration code in the non-core gems themselves (i.e. in the hanami-controller and hanami-view gems, rather than hanami), to help set an example for how alternative implementations may also integrate with the framework.

Since then, we’ve learnt a few things:

  • As we’ve gone about refining the core framework, we’ve wound up having to synchronize changes from time to time across the hanami, hanami-controller, and hanami-view gems all at once.
  • Other Hanami contributors have noted that the original integration approach was a little too “magical,” and didn’t allow users any path to opt out of the integration code.

Once I finished my work on the concrete slice classes last month, I decided that now was the time to address these concerns, to bring the action and view class integrations back into the hanami gem, and to take a different approach to activating the integration code.

The work in progress is over in this PR, and thankfully, it’s nearly done!

The impact within Hanami 2 applications will be fairly minimal: the biggest change is that your base action and view classes will now inherit from application variants:

# slices/main/lib/action/base.rb

require "hanami/application/action"

module Main
  module Action
    class Base < Hanami::Application::Action
      # Previously, this inherited from Hanami::Action
    end
  end
end

By using this explicit application superclass for actions and views, we hopefully make it easier for our users to understand and distinguish between the integrated and standalone variants of these classes. This distinct superclass should also provide us a clear place to hang extra API documentation relating to the integrated behavior of actions and views.

More importantly for the overall experience, Hanami::Application::Action and Hanami::Application::View are both now kept within the core hanami gem. While the framework heads into this final stretch of work before 2.0 final, this will allow us to keep together the aspects of the integration that tend to change together, giving us our best chance at providing a tested, reliable, streamlined actions and views experience.

This is a pragmatic move above all else — we’re a team with little time, so the more we can do to give ourselves confidence in this integrated experience working properly, like having all the code and tests together in one place, the quicker we should be able to get to the 2.0 release. Longer term, we’ll want to provide a first-class integration story for third party components, and I believe we can lead the way in how we deliver that via our actions and views, but that’s now firmly a post-2.0 concern in my mind.

In the meantime, I did take this opportunity to rethink and provide some better hooks for classes like Hanami::Application::View to integrate with the rest of the framework, chiefly via a new Hanami::SliceConfigurable module. You can see how it works by checking out the code for Hanami::Application::View itself:

# frozen_string_literal: true

require "hanami/view"
require_relative "../slice_configurable"
require_relative "view/slice_configured_view"

module Hanami
  class Application
    class View < Hanami::View
      extend Hanami::SliceConfigurable

      def self.configure_for_slice(slice)
        extend SliceConfiguredView.new(slice)
      end
    end
  end
end

Any class that extends Hanami::SliceConfigurable will have its own .configure_for_slice(slice) method called whenever it is subclassed within a module namespace that happens to match the namespace managed by an Hanami slice. Using the slice object passed to that hook, that class can then read any slice- or application-level config to set itself up to integrate with the application.

In the example above, we extend a slice-specific instance of SliceConfiguredView, which will copy across application level view configs, as well as configure the view’s part namespaces to match the slice’s namespace. The reason we build a module instance here (this module builder pattern is a whole technique that I’ll gladly go into one day, but it’s a little out of scope for these monthly updates) is so that we don’t have to keep any trace of the slice as state on the class after we’re done using it for configuration, making it so the resulting class is as standalone as possible, and not offering any way for its users to inadvertently couple themselves to the whole slice instance.

Overall, this change is feeling quite settled now. All the code has been moved in and refactored, and all that’s left is a final polishing pass before merge, which I hope I can get done this week! A huge thank you to Sean Collins for his original work in proposing an adjustment to our action integration code. It was Sean’s feedback and exploratory work that got me off the fence and made it so easy to get started with these changes!

That’s it for me for now. See you all again next month, hopefully with some more continued core framework polishing.


Lev Lafayette: Microprocessor Trend Usage in HPC Systems for 2022-2023

Background

In 2018 Intel x86 microprocessors were found to be particularly susceptible to the Meltdown security vulnerability, whereby any system that allowed out-of-order execution was potentially vulnerable to an attack where a process could read memory that it was not authorised to access [1]. As this vulnerability did not affect AMD processors, suggestions were raised that AMD could be a more effective choice for HPC environments. In the same year, a topic at the International Supercomputing Conference was the European Processor Initiative (EPI), a program to develop processors for domestic European supercomputers based on ARM (Advanced RISC Machine) and the RISC-V "European Processor Accelerator" system-on-a-chip [2]. With the benefit of four years of hindsight, it is valuable to consider the current trends in microprocessor architecture.

A wide-ranging analysis was recently presented at HPCAsia2021 [3], covering the trends of the last 27 years across over 10,000 computers from the Top500, with even more detailed analysis of 28 systems from 2009 to 2019. Of particular note in this context is the steady growth in recent years of heterogeneous supercomputers, i.e. systems with GPGPUs, to 28% of the Top500, with an increase of around 1% per annum. The authors note: "We expect this increasing trend will continue, particularly for addressing technological limitations controlling the power consumption", a claim that could certainly be justified with the use of Nvidia GPUs or Intel Xeon Phi (discontinued as of 2020) as co-processors. At the time most systems were clustered around 1 GB of memory per core, with only three contemporary systems at 2 GB per CPU core; there was a wide variation in compute performance and parallel file system storage, and an increasing use among the most powerful systems of burst buffer storage to overcome the performance gap between memory and the file system.

Recent Developments

It is also necessary to explore trend changes in microprocessor architecture, which are not covered by the HPCAsia paper, especially over the last eighteen months. In particular, AMD EPYC processors have grown strongly, increasing their Top500 share almost five-fold in the June 2021 list compared to a year earlier, and appearing in half of the 58 new entries on the June 2021 list [4]. Also of note is AMD's HPC Fund for COVID-19 research, which includes donated systems with EPYC processors and Instinct accelerators. Specifically, AMD was in 49 of the systems compared to 11 a year earlier, including 3 new entrants in the top 10, although none of these are systems with Instinct accelerators [5]. Intel is still dominant of course, with 431 systems in the Top 500 in July 2021, albeit down from 470 the previous year.

In the November 2021 list there was one new entry into the top 10, an AMD system (the Microsoft Azure system, "Voyager-EUS2") with NVIDIA A100 GPUs. In that top 10, AMD had four systems, Fujitsu one (albeit the first), Sunway one, IBM Power 9 two, and Intel Xeon two [6]. The total Top500 share for AMD rose to 73 in November 2021, and Intel's reduced to 401, although Intel added 42 new systems to AMD's 28 (however, AMD's core count in the new entries was higher). Much of this trend is driven by the EPYC Milan series, launched in March 2021, which compares strongly to Intel's Ice Lake, which came out a month later. Among accelerators, Nvidia's share in November 2021 was 143, roughly stable compared to the previous values of 138 and 141; Nvidia GPUs are in seven of the top 10 clusters and 14 of the top 20. It also can be stated with complete certainty that another AMD system, LUMI from a European consortium, will be in the top 10 at the next release [7].

Whilst the European Union's home-grown ARM/RISC-V exascale systems are not planned to be released to the public until 2023, the initiative remains within its planned timeline, with the completion of stage one at the end of 2021, which included the Rhea general-purpose processor, using ARM Neoverse V1 and with 29 RISC-V cores [8], with an emphasis on security, power utilisation, and integration with the European automotive industry. The EPI team heavily advocates the capacity of open-source RISC-V acceleration to transform the HPC space, with a number of architectures including long-vector processing units, stencil and tensor accelerators, and variable precision accelerators.

At the moment, however, ARM-based systems make up only six computers in the Top500, although that includes the world's top system, Fugaku, which has held the number one position since June 2020 (and is exascale on the mixed-precision HPL-AI benchmark). Of the six ARM-based systems, five use Fujitsu processors, while the other one uses Marvell's ThunderX2, a now-cancelled line. The prospect of ARM/RISC-V processors increasing their share in HPC depends very much on assembly-level or compiler software development. Complex Instruction Set Computers (CISC), such as x86, provide a more comprehensive set of instructions but at a cost of flexibility and power consumption. RISC systems, in contrast, have a smaller instruction set and, whilst requiring more cycles to achieve the same task in most cases, can more flexibly add new instructions and have a much lower power consumption for the same tasks.

In terms of SPEC (Standard Performance Evaluation Corporation) SPECspeed2017 and SPECrate2017 tests AMD's Milan outperforms Intel's Ice Lake in each of the 16 tests conducted in one and two socket versions, and for integer and floating point performance ratings [9]; this is perhaps unsurprising given that Milan offers more cores per processor (64 vs 40, albeit with individual cores at c25% lower speeds), increased PCIe lanes, very large L3 cache (up to 768 MB on the Milan-X with 3D-Cache) etc. The L3 cache will be particularly significant for simulation applications based on many data points and rapid read-writes, such as molecular modelling, climate and weather simulations, computational fluid dynamics, finite element analysis, etc. Further, the Milan also offers up to 42% lower power usage and 50% less rack space requirements.

Whilst the Intel Ice Lake is a very significant advance from its Cannon Lake and Whiskey Lake predecessors, essentially it seems to be a generation behind AMD EPYC. However, it is worth noting that since Ice Lake, Intel's server generation has been upgraded to Rocket Lake, with Sapphire Rapids promised in 2022 (Alder Lake, the equivalent desktop system, is already available). Nevertheless, a comparison between Rocket Lake (Intel Core i9-11900T) and AMD's Milan Zen 3 (EPYC 75F3) server specifications also reveals that the AMD Milan is ahead. Sapphire Rapids (originally expected in the last quarter of 2021) does promise 64GB of L4 cache, though this is obviously a far cry from Milan-X's 768 MB.

Perhaps it should be expected that Intel will offer heavy discounts to provide a more competitive offering, given the advantage to AMD systems on many baseline costs as well as performance (after all, Intel tried very hard in the US Supreme Court to invalidate AMD's and other x86 licenses). Another notable advantage is that Intel offers a greater variety of Ice Lake systems than AMD does Milan systems, providing greater specialisation for diverse workloads, and this variety extends across its other micro-architectures. For example, Intel's Alder Lake microprocessors are superior to AMD's equivalent Ryzen 5000 series parts on desktop systems [10]. Whilst this advantage has led to some rather upbeat end-of-year remarks from Intel, it would be wrong to automatically extend the same comparison to server systems or, for that matter, to market share. Whilst in the past AMD has struggled financially, this has not been the case for some years, with obvious benefits to consumers: "AMD has literally never been in a stronger position to face Intel’s challenge. The company has now been profitable every year since 2018... What we actually have for the first time in at least 20 years is two financially stable and healthy x86 CPU design firms slugging it out for your dollars." [11]

Whilst both AMD and Intel use the same ISA, the implementation in micro-architecture differs. This leads to another area where Intel systems enjoy an advantage due to market share: extensions to the x86 instruction set architecture (e.g. Intel's 256-bit AVX and 512-bit AVX-512 SIMD extensions, AMD's XOP, etc.) and hardware-assisted virtualisation. For example, in a previous generation of micro-architecture AMD adopted an over-clocking approach to replicate the performance of Intel's extensions [12]. As it turned out (cf. Meltdown), Intel's own performance metrics were on less than secure foundations. Today, whilst AMD does make an effort to incorporate Intel's instruction set extensions, performance can still be suboptimal for particular applications, ranging from "lower performance" to "export this environment variable for compatibility". One example of this has been seen with Gaussian and MATLAB, which require an additional environment variable (on older MKL releases the widely used workaround was setting MKL_DEBUG_CPU_TYPE=5). These differences, and requirements for environment modifications, are most evident in software that makes use of the Intel MKL [13]. Note that with hardware-assisted virtualisation (e.g. Intel HAXM) competitors do not even pretend to aim for compatibility.

Overall, it must be said that whilst Intel certainly retains the majority position in major HPC systems, and provides solid products with incremental improvements and a diverse range, the Milan series of EPYC processors from AMD has kept pace with comparable Intel systems and, with its enormous L3 cache, has really pinpointed where the "big data" problems in HPC systems lie. The gap between the processor and memory has been a known problem for quite a while; whilst many vendors have talked about various solutions (such as reviving the old PDP-11-era core memory idea with NVM caches), this is really the first architecture that addresses the problem at scale. It would be unwise to overlook this opportunity: if the University of Melbourne's flagship supercomputer does not take advantage of such an architecture, it would lose significant ground and fall away from being a world-class system. All other things being roughly equal, any microarchitecture that deals with the processor-memory gap - Intel, AMD, or ARM - should be considered as a priority.

Certainly, a heterogeneous system in terms of processors is possible, as long as system engineers are aware of the potential need for build switches and multiple builds. This is no simple task: architecture-specific builds for heterogeneous systems are an often overlooked hurdle, especially when the software in question has a complex collection of dependent builds.

GPUs and Accelerators

In the GPU and accelerator space, the release of AMD's Instinct MI250 and MI250X in November 2021 compares very favourably to Nvidia's A100 80GB GPU released in November 2020 [14]. Nvidia can be expected to have a major release with "Ampere Next" (aka "Hopper") in March 2022, although it would require a very significant improvement (around fivefold across 64-bit floating-point vector and matrix compute) to close the performance gap. Research based on a simulated next-generation "GPU-N" [15] suggests that: "Coming to the performance numbers, the 'GPU-N' (presumably Hopper GH100) produces 24.2 TFLOPs of FP32 (24% increase over A100) and 779 TFLOPs FP16 (2.5x increase over A100) which sounds really close to the 3x gains that were rumored for GH100 over A100. Compared to AMD's CDNA 2 'Aldebaran' GPU on the Instinct MI250X accelerator, the FP32 performance is less than half (95.7 TFLOPs vs 24.2 TFLOPs) but the FP16 performance is 2.15x higher." [16]

For their own part, albeit delayed by a year more than expected, Intel will certainly be launching the "Ponte Vecchio" Xe HPC GPU, and the promise of "petaflops in your palm" is quite plausible from the released specifications. "What stands out almost immediately is the amount of L2 cache leveraged by Ponte Vecchio: 408MB vs just 16MB on the Instinct MI200 and 40MB on the A100. However, in terms of raw compute, AMD has a lot more vector units: 7,040 across 110 CUs, resulting in an overall throughput of 95.7 TFLOPs, compared to just 19.5 TFLOPs on the NVIDIA A100. However, each of Intel’s CUs will be better fed with much higher cache hit rates and wider XMX matrix units. The MI250X has an 8192-bit wide bus paired with 128GB of HBM2e memory capable of transfer rates of up to 3.2TB/s. Intel hasn’t shared any details regarding the bus width or memory configuration of PVC just yet." [17]

At the same time, technical management must keep a careful eye on the development of stage two of the European Processor Initiative, and be prepared for the possibility that the next dominant processor type in high performance computing may not only be based on ARM/RISC-V, but may even be developed on open-source principles. This would be an extraordinary achievement, as the consumer benefits of competitive performance without vendor lock-in would be enormous. There is also an international shift in this direction [18]: the European Union's EPI is explicitly aimed at reducing reliance on foreign technologies, a goal which other major international powers are also pursuing. In 2021, Russia revealed a programme based around RISC-V parts, combined with that country's Elbrus processors, and the People's Republic of China also has a RISC-V chip family (XiangShan) for personal computers, following its use of the Matrix-2000 and Sunway SW26010 RISC processors in response to US sanctions on the export of Intel Xeon Phi systems.

Summary

* Heterogeneous systems (CPU/GPU) are now normal in HPC and this will expand to include other architectures.
* AMD Zen CPUs are increasing as a proportion of the Top500 and are overcoming MKL/AVX-512 concerns.
* RISC systems are currently a very small percentage, but include the top system. RISC-V will be important in the future.
* Despite very competitive hardware offerings from AMD and Intel, GPUs are still overwhelmingly dominated by Nvidia.

References

[1] CVE-2017-5754 Detail
https://www.cve.org/CVERecord?id=CVE-2017-5754

[2] Lev Lafayette, New Developments in Supercomputing, Presentation to Linux Users of Victoria, September 4, 2018
http://levlafayette.com/files/2018luvsupercomputers.pdf

[3] Awais Khan, Hyogi Sim, Sudharshan S. Vazhkudai, Ali R. Butt, Youngjae Kim. An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers. HPCAsia2021: International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery, 2021

[4] AMD Leads High Performance Computing Towards Exascale and Beyond, June 28, 2021
https://ir.amd.com/news-events/press-releases/detail/1012/amd-leads-high...

[5] AMD Quadrupled EPYC’s Top 500 Supercomputer Share In A Year, June 28, 2021
https://www.crn.com/news/components-peripherals/amd-quadrupled-epyc-s-to...

[6] November 2021 Top 500
https://www.top500.org/lists/top500/2021/11/

[7] Kurt Lust, EasyBuild on LUMI, a pre-exascale supercomputer, Proceedings of the 7th EasyBuild User Meeting, 24-28 January, 2022

[8] EPI Announces Successful Conclusion of European Processor Initiative Phase One, December 22, 2021
https://www.hpcwire.com/off-the-wire/epi-announces-successful-conclusion...

[9] AMD 3rd Gen Epyc CPUs Put Intel Xeon SPS on Ice in the Datacenter, NextPlatform, July 29, 2021
https://www.nextplatform.com/2021/07/29/amd-3rd-gen-epyc-cpus-put-intel-...
Nota Bene: Article sponsored by AMD.

[10] Paul Alcorn, CPU Benchmarks and Hierarchy 2022: Intel and AMD Processors Ranked, Tom's Hardware, January 8, 2022
https://www.tomshardware.com/reviews/cpu-hierarchy,4312.html

[11] Joel Hruska, Intel’s CEO is Wrong About AMD, ExtremeTech, January 19, 2022
https://www.extremetech.com/computing/330685-intels-ceo-is-wrong-about-amd

[12] Joel Hruska, Analyzing Bulldozer: Why AMD’s chip is so disappointing, ExtremeTech, October 24, 2011
https://www.extremetech.com/computing/100583-analyzing-bulldozers-scalin...

[13] Mingru Yang, MKL has bad performances on an AMD CPU, Nov 18, 2019
https://sites.google.com/a/uci.edu/mingru-yang/programming/mkl-has-bad-p...

[14] AMD Instinct MI200 Series Accelerator, December 2021
https://www.amd.com/system/files/documents/amd-instinct-mi200-datasheet.pdf

[15] Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee et al, ACM Transactions on Architecture and Code Optimization, Volume 19 Issue 1 March 2022
(pre-release available online December 2021)
https://doi.org/10.1145/3484505

[16] Hassan Mujtaba, Mysterious NVIDIA ‘GPU-N’ Could Be Next-Gen Hopper GH100 In Disguise, December 21, 2021
https://wccftech.com/mysterious-nvidia-gpu-n-could-be-next-gen-hopper-gh...

[17] Areej Syed, Intel Ponte Vecchio Specs. HardwareTimes, November 15, 2021
https://www.hardwaretimes.com/intel-ponte-vecchio-specs-1024-cores-408mb...

[18] Gareth Halfacree, First RISC-V computer chip lands at the European Processor Initiative, The Register, 22 Sep, 2021
https://www.theregister.com/2021/09/22/first_riscv_epi_chip/

,

Simon LyallAudiobooks – March 2021

The Great Beanie Baby Bubble: Mass Delusion and the Dark Side of Cute
by Zac Bissonnette

The toys, the bubble and the crazy guy behind it all. Fun roller-coaster of a read. Second review. 4/5

Overpaid, Oversexed and Over There: How a Few Skinny Brits with Bad Teeth Rocked America by David Hepworth

A bunch of amusing stories and observations of the British Invasion and its follow-ups. I love Hepworth’s style but your mileage may vary. 3/5

This is Not Normal: The Politics of Everyday Expectations by Cass R. Sunstein

A fairly short book that packs in some interesting ideas, mainly concentrating on how societal norms change. Worth a read. 4/5

Post Wall, Post Square: Rebuilding the World after 1989 by Kristina Spohr

An analysis of the upheavals of 1989 and the 3 years that followed them. Especially following the actions of Bush, Gorbachev and Kohl, it is mostly a history of the leaders and their policies. 3/5

A Naturalist at Large: The Best Essays of Bernd Heinrich by Bernd Heinrich

Around 35 short (mainly around 10-20 minutes) essays on plants, insects and birds. A delight to listen to. 4/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Russell CokerConverting to UEFI

When I got my HP ML110 Gen9 working as a workstation I initially was under the impression that boot wasn’t supported on NVMe and booted it from USB. I found USB booting with legacy boot to be unreliable so decided to try EFI booting and noticed that the NVMe devices were boot candidates with UEFI. Making one of them bootable was more complex than expected because no-one seems to have documented such things. So here’s my documentation, it’s not great but this method has worked once for me.

Before starting major partitioning work it’s best to run “parted -l” and save the output to a file, which will allow you to recreate partitions if you corrupt them. One thing I’m doing on systems I manage is putting “@reboot /usr/sbin/parted -l > /root/parted.log” in the root crontab, then when the system is backed up the backup server gets any recent changes to partitioning (I don’t back up /var/log on all my systems).

Firstly, run parted on the device to create the EFI and /boot partitions. Note that if you want to copy and paste from this you must do so one line at a time; a block paste seemed to confuse parted.

mklabel gpt
mkpart EFI fat32 1 99
mkpart boot ext3 99 300
toggle 1 boot
toggle 1 esp
p
# Model: CT1000P1SSD8 (nvme)
# Disk /dev/nvme1n1: 1000GB
# Sector size (logical/physical): 512B/512B
# Partition Table: gpt
# Disk Flags: 
#
# Number  Start   End     Size    File system  Name  Flags
#  1      1049kB  98.6MB  97.5MB  fat32        EFI   boot, esp
#  2      98.6MB  300MB   201MB   ext3         boot
q

Here are the commands needed to create the filesystems and install the necessary files. This is almost to the stage of being scriptable. Some minor changes need to be made to convert from NVMe device names to SATA/SAS but nothing serious.

mkfs.vfat /dev/nvme1n1p1
mkfs.ext3 -N 1000 /dev/nvme1n1p2
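# the file -s | sed pipelines below extract each filesystem's UUID (the volume
# serial number in the case of the VFAT ESP) and append ready-made /etc/fstab
# entries for /boot and /boot/efi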
file -s /dev/nvme1n1p2 | sed -e s/^.*UUID/UUID/ -e "s/ .*$/ \/boot ext3 noatime 0 1/" >> /etc/fstab
file -s /dev/nvme1n1p1 | tr "[a-f]" "[A-F]" |sed -e s/^.*numBEr.0x/UUID=/ -e "s/, .*$/ \/boot\/efi vfat umask=0077 0 1/" >> /etc/fstab
# edit /etc/fstab to put a hyphen between the 2 groups of 4 chars for the VFAT filesystem UUID
mount /boot
mkdir -p /boot/efi /boot/grub
mount /boot/efi
mkdir -p /boot/efi/EFI/debian
apt install efibootmgr shim-unsigned grub-efi-amd64
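# copy the shim and monolithic GRUB EFI binaries onto the ESP, then generate a
# small grub.cfg there that locates the boot filesystem by UUID and hands over
# to the main grub.cfg maintained by update-grub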
cp /usr/lib/shim/* /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi /boot/efi/EFI/debian
file -s /dev/nvme1n1p2 | sed -e "s/^.*UUID=/search.fs_uuid /" -e "s/ .needs.*$/ root hd0,gpt2/" > /boot/efi/EFI/debian/grub.cfg
echo "set prefix=(\$root)'/boot/grub'" >> /boot/efi/EFI/debian/grub.cfg
echo "configfile \$prefix/grub.cfg" >> /boot/efi/EFI/debian/grub.cfg
grub-install
update-grub

If someone would like to make a script that can handle the different partition names of regular SCSI/SATA disks, NVMe, CCISS, etc. then that would be great (the core device-naming logic is sketched below). It would be good to have a script in Debian that creates the partitions and sets up the EFI files.
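
For what it’s worth, the device-naming part of such a script is small. Here’s a rough sketch (in Ruby purely for illustration, and only covering the common naming patterns):

def partition_dev(disk, number)
  # nvme, mmcblk and cciss devices put a "p" between the disk name and the partition number
  disk =~ %r{(nvme|mmcblk|cciss/c\d+d)\d} ? "#{disk}p#{number}" : "#{disk}#{number}"
end

partition_dev("/dev/sda", 2)        # => "/dev/sda2"
partition_dev("/dev/nvme1n1", 2)    # => "/dev/nvme1n1p2"
partition_dev("/dev/cciss/c0d0", 2) # => "/dev/cciss/c0d0p2"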

If you want to have a second bootable device then the following commands will copy a GPT partition table and give it new UUIDs. Make very certain that $DISKB is the one you want to be wiped, and refer to my previous mention of “parted -l”. Also note that parted has a rescue command which works very well.

sgdisk /dev/$DISKA -R /dev/$DISKB 
sgdisk -G /dev/$DISKB

To back up a GPT partition table run a command like this (sgdisk’s --load-backup option will restore from such a backup). Note that if sgdisk is told to back up an MBR-partitioned disk it will say “Found invalid GPT and valid MBR; converting MBR to GPT format”, which is probably a viable way of converting MBR format to GPT.

sgdisk -b sda.bak /dev/sda

,

Tim RileyLet the shape of the code reflect its flow

Say you’re building a system for handling messages off some kind of queue. For each message, you need to run a series of steps: first to decode the message, next to wrap it in some common structure, and finally, to process the message based on some logic provided by the users of your system.

Let’s imagine the queue subscription as provided: we’ll have a subscriber object that yields a message to us via a #handle method:

subscriber.handle do |message|
  # Here's where we need to hook our logic
end

For each of these processing steps, let’s also imagine we have corresponding private methods in our class:

  1. #decode(message)
  2. #build_event(decoded_message) — with an “event” being that common structure I mentioned above
  3. #process(event)

With these set up, we could wire them all together in our handler block like so:

subscriber.handle do |message|
  process(build_event(decode(message)))
end

This is hard to grok, however. There’s a lot going on in that one line, and most critically, you have to read it inside out in order to understand its flow: start with decode, then work backwards to build_event and then process.

Instead, we should strive to let the shape of our code reflect its flow. We want to make it easy for the reader of the code to quickly understand the flow of logic even with just a glance.

One step in this direction could be to use intermediate variables to hold the results of each step:

subscriber.handle do |message|
  decoded_message = decode(message)
  event = build_event(decoded_message)
  process(event)
end

This isn’t bad, but the variable names at the beginning of the line add extra noise, and they push back the most meaningful part of each step — the private method names — into a less prominent location.

What I would recommend here is that we take advantage of Ruby’s Object#then to turn this into something that actually looks like a pipeline, since that’s the flow that we’re actually creating via these methods: the steps run in sequence, and the output of one step feeds into the next.

subscriber.handle do |message|
  message
    .then { |message| decode(message) }
    .then { |decoded| build_event(decoded) }
    .then { |event| process(event) }
end

This makes it much clearer that this is a pipeline of three distinct steps, with message as its starting point. Through the shape of those blocks, and the pipe separators distinguishing the block argument from the block body, it also brings greater prominence to the name of the method that we’re calling for each step.

Most importantly, we’ve made this code much more scannable. We’re giving the eye of the reader hooks to latch onto, via the repeated “thens” stacked on top of each other, in addition to their corresponding blocks. The shape of the code embodies its flow, and in doing so, we’ve created a table-of-contents-like structure that both summarises the behaviour, and can serve as a jumping off point for further exploration if required.

To further reduce noise here, we could try Ruby’s new numbered implicit block arguments:

subscriber.handle do |message|
  message
    .then { decode(_1) }
    .then { build_event(_1) }
    .then { process(_1) }
end

However, I’d consider this a step too far, since it takes away what is otherwise a helpful signal, with the block argument name previously serving as a hint to the type of value that we’re dealing with at each point in the pipeline.

By taking the time to consider the flow of our logic, and finding a way for the shape of code to embody that flow, we’ve made our code easier to understand, easier to maintain, and — why not say it? — truer to itself. This is a method I’d walk away from feeling very satisfied having written. Salubrious!

Tim RileySalubrious Ruby

I’ve been writing Ruby for over 21 years now — over half my life! — and in this time, I’ve learnt so much about both the language as well as software development in general. You might have noticed some of the products of that learning in my open source contributions, chiefly the dry-rb and Hanami projects.

I honestly feel blessed to have found a language that could accompany me on a learning journey over so many years.

For much of my time with Ruby, I worked with a tight-knit group of people: the developers at Icelab, and my OSS collaborators. Over the last few years, however, I’ve had the good fortune of working with a much larger group of Rubyists at Culture Amp, and have had many opportunities to share what goes into making a good Ruby application.

There’s little I enjoy more than talking quality code, but lately I’ve been so focused on shipping said code, that I haven’t taken much time to step back and acknowledge what I’ve learnt along the way.

With this announcement, I’m hoping to change this, and create a little accountability for myself along the way! So with no further ado, let me announce a new series here on this little blog, which I’m calling Salubrious Ruby.

I hope you’ll join me while I share the things both big and small that go into making a healthy, wholesome Ruby application!

As I go, I’ll keep pointers to all the articles below:

,

Tim RileyOpen source status update, 🇺🇦 February 2022

After the huge month of January, February was naturally a little quieter, but I did help get a couple of nice things in place.

Stand with Ukraine 💙💛

This is my first monthly OSS update since Russia began its brutal, senseless war on Ukraine. Though I was able to ship some work this month, there are millions of people whose lives and homeland have been torn to pieces. For a perspective from one of our Ruby friends in Ukraine, read this piece and this update from Victor Shepelev, aka zverok.

Let’s all continue to support Ukraine, and help the international community continue doing the same.

Concrete slice classes

For this month I focused mostly on getting concrete slice classes in place for Hanami applications. As I described in the alpha7 release announcement, concrete slice classes give you a nice place for any slice-specific configuration.

They live in config/slices/, and look like this:

# config/slices/main.rb:

module Main
  class Slice < Hanami::Slice
    # Slice config goes here...
  end
end

As of this moment, you can use the slice classes to configure your slice imports:

# config/slices/main.rb:

module Main
  class Slice < Hanami::Slice
    # Import all exported components from "search" slice
    import from: :search
  end
end

As well as particular components to export:

# config/slices/search.rb:

module Search
  class Slice < Hanami::Slice
    # Export the "index_entity" component only
    export ["index_entity"]
  end
end

Later on, I’ll look to expand this config and find a way for a subset of application-level settings to be configured on individual slices, too. I imagine that configuring source_dirs on a per-slice basis may be useful, for example, if you want particular source dirs to be used on one slice, but not others.

The other thing you can currently do on these slice classes is configure their container instance:

# config/slices/search.rb:

module Search
  class Slice < Hanami::Slice
    prepare_container do |container|
      # `container` (a Dry::System::Container subclass) is available here with
      # slice-specific configuration already applied
    end
  end
end

This is an advanced feature and not something we expect typical Hanami users to need. However, I wanted this in place so I could continue providing “escape valves” across the framework, to allow Hanami users to reach below the framework layer and manually tweak the lower-level parts without having to eject entirely from the framework and all the other niceties it provides.

Slice registration refactors

As part of implementing the concrete slice classes, I was able to make some quite nice refactors around the way we handle slices within the Hanami application:

  • All responsibility for slice loading and registration has now moved away from Application (which is already doing a lot of other work!) into a new SliceRegistrar.
  • The .prepare methods inside both Application and Slice are now roughly identical in structure, with their many constituent setup steps extracted into their own well-named methods (for example). This will make this phase of the boot process much easier to understand and maintain, and I also think it hints at a future in which we have an extensible boot process, wherein other gems may register their own steps as part of the overall sequence that is run when you .prepare an application or slice.

The single-file-app dream lives on

One nice outcome of the concrete slice work is the fact that these classes are not actually required in your Hanami application for it to boot and do its job. It will still look for directories under slices/ and dynamically create those classes if they don’t already exist in config/slices/. What’s even better, however, is that I made this behaviour more easily user-invokable via a public Application.register_slice method. This means you can choose to explicitly register a slice, in cases where the framework may not otherwise detect it:

module MyApp
  class Application < Hanami::Application
    # That's all! This will define a `Main::Slice` class for you.
    register_slice :main
  end
end

But that’s not all! Since these slice classes will now be the place for slice-specific configuration, we may need to provide this when explicitly registering a slice too. For this, you can provide a block that is then evaluated within the context of the generated slice class:

module MyApp
  # Defines `Main::Slice` class and instance_evals the given block
  class Application < Hanami::Application
    register_slice(:main) do
      import from: :search
    end
  end
end

And lastly, you can also provide your own concrete slice class at this point, too:

module MyApp
  class Application < Hanami::Application
  end
end

module Main
  class Slice < Hanami::Slice
  end
end

MyApp::Application.register_slice :main, Main::Slice

One of the guiding forces behind this level of flexibility (apart from it just feeling like the Right Thing To Do) is that I want to keep open the option for single-file Hanami applications. While the framework will always be designed primarily for fully-fledged applications, with their components spread across many source files, sometimes a single file app is still the right tool for the job, and I want Hanami to work here too. As I put the final polish on the core application and slice structures over the coming couple of months, I’ll be keeping this firmly in mind, and will look to share a nice example of this in a future blog post :)

Removed some unused (and unlikely to be used) flexibility

While I like to try and keep the Hanami framework flexible — and we’ve already looked at several approaches to this just above — I’m also conscious of the cost of this flexibility, and how in certain cases, those costs are just not worth it. One example of this was the removal of the configurable key separator in dry-system earlier this year. In this case, keeping the key separator configurable meant not only significant internal complexity, but also the fact that we could never write documentation that we could be fully confident would work for all people. To boot, we hadn’t heard of a single user wanting to change that separator over the whole of dry-system’s existence.

As part of my work this month, I removed a couple of similar settings from Hanami:

  • I removed the config.slices_namespace setting, which existed in theory to allow slices to also live inside the application’s own module namespace if a user desired (e.g. MyApp::Main instead of just ::Main). In reality, I think that extra level of nesting will be too inconvenient for users to want. More importantly, I think that having our slices always map to single top-level modules will be important for our documentation (and generators, and many other things I’m sure) to be clearer.
  • I also removed the config.slices_dir setting, for much the same reasons. Hanami will be far easier to document and support if slices are always loaded from slices/ and nowhere else.

Made Application.shutdown complete

Did you know that you can both boot and shut down an Hanami application? The latter will call stop on any registered providers, which can be useful if you need to actively disconnect from any external resources, such as database connections.

You can shutdown an Hanami application via Application.shutdown, but the implementation was only partially complete. As of this PR, shutdown now works for both slices (and their providers) and when shutting down an application, it will shutdown all the slices in turn.

Simplified configuration by permitting env to be provided just once

Another little one: the application configuration depends on knowing the current Hanami env (i.e. Hanami.env) in several ways, such as knowing when to set env-specific defaults, or apply user-provided env-specific config. Until now, it’s been theoretically possible to re-set the env even after the configuration has loaded, which makes the env-specific behaviour much harder to reason about. With this change, the env is now set just once (based on the HANAMI_ENV env var) when the configuration is initialized, allowing us to much more confidently address the env across all facets of the configuration behavior.

(This and the shutdown work came together in a single evening session. For many reasons, I was feeling down, and this was a nice little bit of therapy for me. So much of what I’ve been doing here lately spans multiple days and weeks, and having a task I could complete in an hour was a refreshing change.)

Worked to define a consistent action and view class structure

This effort was mostly driven by Luca, but we worked together to arrive at a consistent structure for the action and view classes to be generated in Hanami applications.

For actions, for example, the following classes will be generated:

  • A single application-level base class, e.g. MyApp::Action::Base in lib/my_app/action/base.rb. This is where you would put any logic or configuration that should apply to every action across all slices within the application.
  • A base class for each slice, e.g. Main::Action::Base in slices/main/lib/action/base.rb, inheriting from the application-level base class. This is where you would put anything that should apply to all the actions only in the particular slice.
  • Every individual action class would then go into the actions/ directory within the slice, e.g. Main::Actions::Articles::Index in slices/main/actions/articles/index.rb (sketched in code below).
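
To make that layout concrete, here’s a rough sketch of the action side. The class names and file paths are from the list above; the Hanami::Action superclass and the #handle signature are my assumptions about the generated code, not something confirmed in this post.

# lib/my_app/action/base.rb -- application-level base class
module MyApp
  module Action
    class Base < Hanami::Action  # superclass assumed
      # configuration and behaviour shared by every action in every slice
    end
  end
end

# slices/main/lib/action/base.rb -- slice-level base class
module Main
  module Action
    class Base < MyApp::Action::Base
      # behaviour shared only by actions in the Main slice
    end
  end
end

# slices/main/actions/articles/index.rb -- a concrete action
module Main
  module Actions
    module Articles
      class Index < Main::Action::Base
        def handle(request, response)  # signature assumed
          # ...
        end
      end
    end
  end
end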

For views, the structure is much the same, with MyApp::View::Base and Main::View::Base classes located within an identical structure.

The rationale for this structure is that it provides a clear place for any code to live that serves as supporting “infrastructure” for your application’s actions and views: it can go right alongside those Base classes, in their own directories, clearly separated from the rest of your concrete actions and views.

This isn’t an imagined requirement: in a standard Hanami 2 application, we’ll already be generating additional classes for the view layer, such as a view context class (e.g. Main::View::Context) and a base view part class (e.g. Main::View::Part).

This structure is intended to serve as a hint that your own application-level action and view behavior can and should be composed of their own single-responsibility classes as much as possible. This is one of the many ways in which Hanami as a framework can help our users make better design choices, and build this up as a muscle that they can apply to all facets of their application.

Released alpha7

Last but not least, I cut the release of Hanami 2.0.0.alpha7 and shared it with the world.

What’s next?

My next focus has been on a mostly internal refactor to move a bunch of framework integration code from hanami-controller and hanami-view back into the hanami gem itself, since a lot of that is interdependent and important to maintain in sync in order to provide a cohesive, integrated experience for people building full stack Hanami applications. This should hopefully be ready by the next alpha, and will then free me up to move back onto application/slice polish.

,

Simon LyallAudiobooks – February 2022

No Filter: The Inside Story of Instagram by Sarah Frier

A fairly straight story about the company, lots of fun anecdotes. A little biased towards founder Kevin Systrom, probably due to more access to him (and none to Zuckerberg). 3/5

A Walk Around the Block: Stoplight Secrets, Mischievous Squirrels, Manhole Mysteries & Other Stuff You See Every Day (And Know Nothing About) by Spike Carlsen

Short chapters about various bits of infrastructure and the people who manage them. Not huge amounts of detail but a few fun facts on each. An okay quick listen. 3/5

Yeager: An Autobiography by Chuck Yeager

A well written account of an aviation legend’s life. Interesting stories of World War 2 service, test pilot and other parts of his career and life. 4/5

The Spy Who Loved Me by Ian Fleming

A first-person account by a young woman. Fleeing some unfortunate love affairs via a road trip, she meets gangsters and James Bond. A different feel from most Bond books. 3/5

The Ministry for the Future by Kim Stanley Robinson

After a heat-wave kills 20 million in India, a UN agency (whose head is the main character) and others start getting serious about reversing climate change. Interesting and engaging. 4/5

The years of Lyndon Johnson: The Path to Power by Robert Caro

The first volume of the series covers Johnson from birth through his unsuccessful bid for a Senate seat in 1941. Detailed, entertaining and easy to follow. 4/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

FLOSS Down Under - online free software meetingsMarch 2022 Meeting

Meeting Report

The March 2022 meeting went reasonably well. Everyone seemed to have fun and learn useful things about computers. After 2 hours my Internet connection dropped out which stopped the people who were using VMs from doing the tutorial. Fortunately most people seemed ready for a break so we ended the meeting. The early and abrupt ending of the meeting was a disappointment but it wasn’t too bad, the meeting would probably only have gone for another half hour otherwise.

The BigBlueButton system was shown to be effective for training when one person got confused with the Debian package configuration options for Postfix and they were able to share the window with everyone else to get advice. I was also confused by that stage.

Future Meetings

The main feature of the meeting was training in setting up a mail server with Postfix; here are the lecture notes for it [1]. The consensus at the end of the meeting was that people wanted more of that, so for the April meeting I will extend the Postfix training to cover SpamAssassin, SPF, DKIM, and DMARC. For the start of the next meeting, instead of providing bare Debian installations for the VMs I’ll provide a basic Postfix/Dovecot setup so people can get straight into SpamAssassin etc.

For the May meeting training on SE Linux was requested.

Social Media

Towards the end of the meeting we discussed Matrix and federated social media. LUV has a Matrix server and I can give accounts to anyone who’s involved in FOSS in the Australia and New Zealand area. For Mastodon the NZOSS Mastodon server [2] seems like a good option. I have an account there to try Mastodon, my Mastodon address is @etbe@mastodon.nzoss.nz .

We are going to make Matrix a primary communication method for the Flounder group, the room is #flounder:luv.asn.au . My Matrix address is @etbe:luv.asn.au .

,

David RoweFreeDV Activity Day

On the weekend of 18-20 Feb I took part in the FreeDV Activity Day, kindly organised by Mooneer, K6AQ [1]. I was particularly interested in trying to receive FreeDV signals from other countries. To maximise my chances, I went camping at “Point Parnka” a quiet location about 200km SE of Adelaide. My station was a simple end fed dipole and IC7200 running 30W.

It was indeed radio-quiet there – my noise floor on 20m was S0, which on my IC7200 means less than -135dBm/Hz [2]. The 40M band was a bit noisier, about S1, and I could hear some 100 Hz buzzing on top of SSB signals. I looked around – in adjacent camp sites were caravans which I guessed had 12V to 240V inverters, and solar chargers (100Hz is the 2nd harmonic of 50Hz AC). Sure enough, the next day the caravans left, and took their EMI with them – 40M was now down to S0 and the 100 Hz buzzing gone.

Having such a low noise floor is interesting – I could hear many stations, and SSB was quite pleasant to listen to without the usual urban EMI. However not everyone could hear me. I had several one way contacts – they sounded great to me but I was lost in their local noise floor.

When I pressed the “Start” button on FreeDV, I could hear the USB link to my IC7200. I guess the USB bits are being modulated at HF frequencies. I messed around with some ferrites on the USB cable, which improved things but didn’t entirely remove this EMI. Next time I will position my antenna further away, and try a “pure analog” connection to my laptop.

The highlight of the weekend was decoding Mel, K0PFX, long path from the US using FreeDV 700D on 20M. That’s about half way around the world. Until now, I wasn’t sure if there was a modem issue with FreeDV that made international contact difficult. However I think it’s just down to SNR – with enough power, the right antennas and band conditions, our FreeDV waveforms can make the trip just fine. We have evidence from simulation and automated Over the Air (OTA) tests [3] that suggests FreeDV is competitive with SSB in terms of low SNR performance. Nice to support these results with a real QSO.

I had a few FreeDV (and SSB) contacts with Hams over the weekend, including some 700D and 2020 across 1000-2000 km paths in Australia. However I am a rare user of FreeDV. The problem is every time I use it I get frustrated about some aspect and end up kicking off another project to fix it! Oh well, the experimentation is fun.

Quite a bit of activity in other parts of the world, too:

A few observations:

  1. To date I’ve spent a lot of time working on the low SNR performance of FreeDV (where communications is marginal). Now I would like to see if I can improve speech quality on high SNR, “armchair copy” channels.
  2. I also discovered a problem with wind noise upsetting the codec. Making the codec more robust to acoustic background noise is an interesting project.
  3. High SNR SSB sounds great, but the quality drops off when the channel has impulse noise (e.g. due to lightning crashes). In theory FreeDV can correct these, but I haven’t carefully tested the modems to see how they perform. We only test against simulations of AWGN noise and multipath. The effect of radio AGC should also be studied – it makes fast jumps when there are static crashes.
  4. It’s hard to beat the low latency of PTT SSB for quick back and forth overs, especially compared to using a laptop to run FreeDV. However the inherent latency of the FreeDV modes is pretty good (e.g. 80ms for 700E). With a real time implementation on a microcontroller (or integrated into a HF Radio) the PTT experience could be pretty close.
  5. I was pleased with the performance of the FreeDV modems. I monitored one 40M 700E QSO where I could see “notches” about 300 Hz apart, indicating a delay spread of 3ms, and the modem was hanging on fine (see the quick arithmetic check below). This is very pleasing, and I am happy we are making progress in this area – multipath channels are tough on modems.
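
As a quick sanity check on that delay spread figure (my arithmetic, not from the post): for a two-path channel the notch spacing in the frequency response is roughly the reciprocal of the path delay difference, so

\tau \approx \frac{1}{\Delta f} = \frac{1}{300\ \text{Hz}} \approx 3.3\ \text{ms}

which lines up with the roughly 3 ms delay spread quoted above.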

I’m looking forward to the next FreeDV activity day in May.

Links

[1] FreeDV Activity Day
[2] Measuring Urban HF Noise with a Loop
[3] Controlled FreeDV Testing

FLOSS Down Under - online free software meetingsMailing List

We now have a mailing list; see https://lists.linux.org.au/mailman/listinfo/flounder for information. The address to post to the list is flounder@lists.linux.org.au.

We also have a new URL for the blog and events. See the right sidebar for the link to the iCal file which can be connected to Google Calendar and most online calendaring systems.

,

Lev LafayetteThe HPC Certification Forum : A Call for NZ Contribution

The 2022 eResearch New Zealand conference was held on the 9th to 11th of February, co-hosted by New Zealand eScience Infrastructure (NeSI), REANNZ, and Genomics Aotearoa, and preceded by Carpentries Connect, a community of institutions and universities that run Software Carpentry, Data Carpentry, and Library Carpentry workshops.

A brief presentation was given to the conference on the HPC Certification Forum and, in particular, a call for participation from New Zealand. The key elements of the presentation included identifying the growth of large datasets and complex problems, the demand for training by researchers, the formation and history of the HPC Certification Forum to provide an "HPC driving license", and the relationship between examinable material by the Forum and the delivery of content by institutions.

Aotearoa-New Zealand is identified as a country that could make a significant contribution to the HPC Certification Forum. In the past, it has made significant strides in reviewing and reviving the HPC Carpentry, and at least one institution (University of Waikato) has taught HPC usage to research scientists outside of specialist computer science courses. The range of potential ways to participate was outlined, and attendees were encouraged to get involved.

,

Tim RileyOpen source status update, December 2021 and January 2022

I’ve been on an absolute tear with my OSS work lately. I’ve been feeling such good momentum that every time I got a moment at the computer, I just had to keep pushing forward with code rather than writing one of these updates. Now that there’s been a couple of Hanami releases and a big dry-system one since my last update, it’s time for a roundup.

The best thing about the last two months was that I took the first two weeks in January off from work, and dedicated myself 9-5 to OSS while everyone was still on holidays (the reality was probably more like 10am-4pm then 9pm-12am, but I was having too much fun to stop). This allowed me to get through some really chunky efforts that would’ve been really hard to string along across post-work nighttime efforts alone.

All of this has meant I’ve got a lot to get through here, so I’m going to revert to dot list form with at most a comment or two, otherwise I’ll never get this post done (and you probably won’t want to read it all anyway). Here goes:

Last week I released all of that dry-system work in version 0.23.0, officially our biggest release ever (go read the release notes!). We’re now looking really close to a 1.0 release for dry-system, with just a few things left to go.

I then updated Hanami to use this latest dry-system (including all the updated terminology) and support partial slice imports and exports, which we then released as Hanami 2.0.0.alpha6 just a few days ago.

Getting to this point was a lot of work, and it represents a big milestone in our Hanami 2.0 journey. I’m extremely grateful that I could make this my sole focus for a little while.

Thank you to my sponsors ❤️

My work in Ruby OSS is kindly supported by my GitHub sponsors.

Thank you in particular to Jason Charnes and now also Seb Wilgosz (of HanamiMastery) for your support as my upper tier sponsors!

The 22 dot points in this post show that Hanami 2 is truly getting closer, but there are many dots left to go. I’d love for your support too in helping make this happen.

,

FLOSS Down Under - online free software meetingsFirst Meeting Success

We just had the first Flounder meeting, which went well. We had some interesting discussion of storage technology and I learnt a few new things. Some people did the ZFS and BTRFS training, and there was lots of good discussion around that too.

Andrew Pam gave a summary of new things in Linux and talked about the sites lwn.net, gamingonlinux.com, and cnx-software.com that he uses to find Linux news. One thing he talked about is the latest developments with SteamDeck which is driving Linux support in Steam games. The site protondb.com tracks Linux support in Steam games.

We had some discussion of BPF, for an introduction to that technology see the BPF lecture from LCA 2022.

Next Meeting

The next meeting (Saturday 5th of March 1PM Melbourne time) will focus on running your own mail server which is always of interest to people who are interested in system administration and which is probably of more interest than usual because of Google forcing companies with “a legacy G Suite subscription” to transition to a more expensive “Business family” offering.

,

Stewart SmithAdventures in the Apple Partition Map (Part 2 of the continuing adventures with the Apple Power Macintosh 7200/120 PC Compatible)

I “recently” wrote about obtaining a new (to me, actually quite old) computer over in The Apple Power Macintosh 7200/120 PC Compatible (Part 1). This post is a bit of a detour, but may help others understand why some images they download from the internet don’t work.

Disk partitioning is (of course) a way to divide up a single disk into multiple volumes (partitions) for different uses. While the idea is similar, computer platforms over the ages have done this in a variety of different ways, with varying formats on disk, and varying limitations. The ones that you’re most likely to be familiar with are the MBR partitioning scheme (from the IBM PC), and the GPT partitioning scheme (common for UEFI systems such as the modern PC and Mac). One you’re less likely to be familiar with is the Apple Partition Map scheme.

The way all IBM PCs and compatibles worked from the introduction of MS-DOS 2.0 in 1983 until some time after 2005 was the Master Boot Record partitioning scheme. It was outrageously simple: of the first 512 byte sector of a disk, the first 446 bytes was for the bootstrapping code (the “boot sector”), the last 2 bytes were for the magic two bytes telling the BIOS this disk was bootable, and the other 64 bytes were four entries of 16 bytes, each describing a disk partition. The Wikipedia page is a good overview of what it all looks like. Since “four partitions should be enough for anybody” wasn’t going to last, DOS 3.2 introduced “extended partitions” which was just using one of those 4 partitions as another similar data structure that could point to more partitions.
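
Purely as an illustration (my sketch, not something from the original post), that layout is simple enough that a few lines of Ruby can dump the four primary partition entries from a raw disk image (assumed here to be a file called disk.img):

# read the first 512-byte sector and check the 0x55AA boot signature
sector = File.binread("disk.img", 512)
abort "no MBR signature" unless sector[510, 2] == "\x55\xAA".b

# four 16-byte partition entries live at offset 446
4.times do |i|
  entry = sector[446 + i * 16, 16]
  # status byte, skip 3-byte CHS start, type byte, skip 3-byte CHS end,
  # then 32-bit little-endian start LBA and sector count
  status, type, lba_start, sectors = entry.unpack("Cx3Cx3VV")
  next if type.zero?  # empty slot
  printf("partition %d: type 0x%02x%s, start LBA %d, %d sectors\n",
         i + 1, type, status == 0x80 ? " (bootable)" : "", lba_start, sectors)
end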

In the 1980s (similar to today), the Macintosh was, of course, different. The Apple Partition Map is significantly more flexible than the MBR on PCs. For a start, you could have more than four partitions! You could actually have a lot more than four partitions, as the Apple Partition Map is a single 512-byte sector for each partition, and the partition map is itself a partition. Instead of being block 0 (like the MBR is), it actually starts at block 1, and is contiguous (The Driver Descriptor Record is what’s at block 0). So, once created, it’s hard to extend. Typically it’d be created as 64×512-byte entries, for 32kb… which turns out is actually about enough for anyone.

The Inside Macintosh reference on the SCSI Manager goes through more detail as to these structures. If you’re wondering what language all the coding examples are in, it’s Pascal – which was fairly popular for writing Macintosh applications in back in the day.

But the actual partition map isn’t the “interesting” part of all this (and yes, the quotation marks are significant here), because Macs are pretty darn finicky about what disks to boot off, which gets to be interesting if you’re trying to find a CD-ROM image on the internet from which to boot, and then use to install an Operating System from.

Stewart SmithEvery time I program a Mac…

… the preferred programming language changes.

I never programmed a 1980s Macintosh actually in the 1980s. It was sometime in the early 1990s that I first experienced Microsoft Basic for the Macintosh. I’d previously (unknowingly at the time as it was branded Commodore) experienced Microsoft BASIC on the Commodore 16, Commodore 64, and even the Apple ][, but the Macintosh version was something else. It let you do some pretty neat things such as construct a GUI with largely the same amount of effort as it took to construct a Text based UI on the micros I was familiar with.

Okay, to be fair, I’d also dabbled in Microsoft QBasic that came bundled with MS-DOS of the era, which let you do a whole bunch of graphics – so you could theoretically construct a GUI with it. Something I did attempt to do. Programming on the Mac was so much easier to construct a GUI.

Of course, Microsoft Basic wasn’t the preferred way to program on the Macintosh. At that time it was largely Pascal, with C being something that also existed – but you were going to see Pascal in Inside Macintosh. It was probably somewhat fortuitous that I’d poked at Pascal a bit as something alternate to look at in the high school computing classes. I can only remember using TurboPascal on DOS systems and never actually writing Pascal on the Macintosh.

By the middle part of the 1990s though, I was firmly incompetently writing C on the Mac. No doubt the quality of my code increased after I’d done some university courses actually covering the language rather than the only practical way I had to attempt to write anything useful being looking at Inside Macintosh examples in Pascal and “C for Dummies” which was very not-Macintosh. Writing C on UNIX/Linux was a lot easier – everything was made for it, including Actual Documentation!

Anyway, in the early 2000s I ran MacOS X for a bit on my white iBook G3, and did a (very) small amount of any GUI / Project Builder (the precursor to Xcode) related development – instead largely focusing on command line / X11 things. The latest coolness being to use Objective-C to program applications (unless you were bringing over your Classic MacOS Carbon based application, then you could still write C). Enter some (incompetent) Objective-C coding!

Then Apple went to x86, so the hardware ceased being interesting, and I had no reason to poke at it even as a side effect of having hardware that could run the software stack. Enter a long-ass time of Debian, Ubuntu, and Fedora on laptops.

Come 2022 though, and (for reasons I should really write up), I’m poking at a Mac again and it’s now Swift as the preferred way to write apps. So, I’m (incompetently) hacking away at Swift code. I have to admit, it’s pretty nice. I’ve managed to be somewhat productive in a relative short amount of time, and all the affordances in the language gear towards the kind of safety that is a PITA when coding in C.

So this is my WIP utility to be able to import photos from a Shotwell database into the macOS Photos app:

There’s a lot of rough edges and unknowns left, including how to actually do the import (it looks like there’s going to be Swift code doing AppleScript things as the PhotoKit API is inadequate). But hey, some incompetent hacking in not too much time has a kind-of photo browser thing going on that feels pretty snappy.

Simon LyallAudiobooks – January 2022

Termination Shock by Neal Stephenson

In the near future, a Texas billionaire starts a geoengineering project to counteract global warming. International intrigue results. Similar feel to his other books. 4/5

An Economist walks into a Brothel: And Other Unexpected Places to Understand Risk by Allison Schrager

Examples of how people in unusual situations handle risk and how you can apply it to your life. Interesting and useful. 4/5 – Accidental reread from July 2020.

The Devil’s Candy: The Bonfire of the Vanities Goes to Hollywood by Julie Salamon

A start-to-finish tale of the making of the 1990 big-budget Hollywood bomb. The writer embedded in the production and talked to just about everyone from the director down. A fascinating amount of behind-the-scenes detail and insight into the people making the film. 4/5

Leonardo da Vinci: The Biography by Walter Isaacson

Covering what little we know of his life but with analysis of his major works and notebooks. Helps if you have the PDF with all the pictures but listenable if not. 3/5

The Hobbit by J.R.R. Tolkien.

Thought I’d try this new version. I think I still prefer Rob Inglis. My general feelings are:

Pro: He does distinct voices for each character and generally good ones. The voices are influenced by the actors in the movies.
Con: His voice is a little indistinct. Not too bad since he’s an actor, but separate words are not always clear. He’s not the best with the songs/poems; I’ve heard similar about his LOTR presentation. 4/5

Footprints in the Dust: The Epic Voyages of Apollo, 1969-1975 edited by Colin Burgess.

Covering all Apollo, Skylab, Apollo-Soyuz and Soviet programs. Mostly tries to take different angles from other books so some new stuff even if you’ve read a few of them. 3/5

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Robert Collinshyper combinators in Rust

Recently I read Michael Snoyman’s post on combining Axum, Hyper, Tonic and Tower. While his solution worked, it irked me – it seemed like there should be a much tighter solution possible.

I can deep dive into the code in a later post perhaps, but I think there are four points of difference. One, since the post was written, Axum has started boxing its routes, so the enum dispatch approach taken, which delivers low overheads, actually has no benefits today.

Two, while writing out the entire type by hand has some benefits, async code is much more pithy.

Thirdly, the code in the post is entirely generic, except the routing function itself.

And fourth, the outer Service<AddrStream> is an unnecessary layer to abstract over: given the similar constraints (the inner Service must take Request<..>), it is possible to just not use a couple of helpers and instead work directly with Service<Request...>.

So, onto a pithier version.

First, the app server code itself.

use std::{convert::Infallible, net::SocketAddr};

use axum::routing::get;
use hyper::{server::conn::AddrStream, service::make_service_fn};
use hyper::{Body, Request};
use tonic::async_trait;

use demo::echo_server::{Echo, EchoServer};
use demo::{EchoReply, EchoRequest};

struct MyEcho;

#[async_trait]
impl Echo for MyEcho {
    async fn echo(
        &self,
        request: tonic::Request<EchoRequest>,
    ) -> Result<tonic::Response<EchoReply>, tonic::Status> {
        Ok(tonic::Response::new(EchoReply {
            message: format!("Echoing back: {}", request.get_ref().message),
        }))
    }
}

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([0, 0, 0, 0], 3000));

    let axum_service = axum::Router::new().route("/", get(|| async { "Hello world!" }));

    let grpc_service = tonic::transport::Server::builder()
        .add_service(EchoServer::new(MyEcho))
        .into_service();

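    // Compose the two services behind a single tower Service; the closure sends a
    // request to the gRPC service when its content-type is "application/grpc",
    // and to the axum service otherwise.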
    let both_service =
        demo_router::Router::new(axum_service, grpc_service, |req: &Request<Body>| {
            Ok::<bool, Infallible>(
                req.headers().get("content-type").map(|x| x.as_bytes())
                    == Some(b"application/grpc"),
            )
        });

    let make_service = make_service_fn(move |_conn: &AddrStream| {
        let both_service = both_service.clone();
        async { Ok::<_, Infallible>(both_service) }
    });

    let server = hyper::Server::bind(&addr).serve(make_service);

    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}

Note the Router: it takes the two services and an Fn to determine which to use on any given request. Then we just drop that composed service into make_service_fn and we’re done.

Next up we have the Router implementation. This is generic across any two Service<Request<...>> types as long as they are both Into<Bytes> for their Data, and Into<Box<dyn Error>> for errors.

use std::{future::Future, pin::Pin, task::Poll};

use http_body::combinators::UnsyncBoxBody;
use hyper::{body::HttpBody, Body, Request, Response};
use tower::Service;

#[derive(Clone)]
pub struct Router<First, Second, F> {
    first: First,
    second: Second,
    discriminator: F,
}

impl<First, Second, F> Router<First, Second, F> {
    pub fn new(first: First, second: Second, discriminator: F) -> Self {
        Self {
            first,
            second,
            discriminator,
        }
    }
}

impl<First, Second, FirstBody, FirstBodyError, SecondBody, SecondBodyError, F, FErr>
    Service<Request<Body>> for Router<First, Second, F>
where
    First: Service<Request<Body>, Response = Response<FirstBody>>,
    First::Error: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    First::Future: Send + 'static,
    First::Response: 'static,
    Second: Service<Request<Body>, Response = Response<SecondBody>>,
    Second::Error: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    Second::Future: Send + 'static,
    Second::Response: 'static,
    F: Fn(&Request<Body>) -> Result<bool, FErr>,
    FErr: Into<Box<dyn std::error::Error + Send + Sync>> + Send + 'static,
    FirstBody: HttpBody<Error = FirstBodyError> + Send + 'static,
    FirstBody::Data: Into<bytes::Bytes>,
    FirstBodyError: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
    SecondBody: HttpBody<Error = SecondBodyError> + Send + 'static,
    SecondBody::Data: Into<bytes::Bytes>,
    SecondBodyError: Into<Box<dyn std::error::Error + Send + Sync>> + 'static,
{
    type Response = Response<
        UnsyncBoxBody<
            <hyper::Body as HttpBody>::Data,
            Box<dyn std::error::Error + Send + Sync + 'static>,
        >,
    >;
    type Error = Box<dyn std::error::Error + Send + Sync + 'static>;
    type Future =
        Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>> + Send + 'static>>;

    fn poll_ready(
        &mut self,
        cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Result<(), Self::Error>> {
        match self.first.poll_ready(cx) {
            Poll::Ready(Ok(())) => match self.second.poll_ready(cx) {
                Poll::Ready(Ok(())) => Poll::Ready(Ok(())),
                Poll::Ready(Err(e)) => Poll::Ready(Err(e.into())),
                Poll::Pending => Poll::Pending,
            },
            Poll::Ready(Err(e)) => Poll::Ready(Err(e.into())),
            Poll::Pending => Poll::Pending,
        }
    }

    fn call(&mut self, req: Request<Body>) -> Self::Future {
        let discriminant = { (self.discriminator)(&req) };
        let (first, second) = if matches!(discriminant, Ok(false)) {
            (Some(self.first.call(req)), None)
        } else if matches!(discriminant, Ok(true)) {
            (None, Some(self.second.call(req)))
        } else {
            (None, None)
        };
        let f = async {
            Ok(match discriminant.map_err(Into::into)? {
                true => second
                    .unwrap()
                    .await
                    .map_err(Into::into)?
                    .map(|b| b.map_data(Into::into).map_err(Into::into).boxed_unsync()),
                false => first
                    .unwrap()
                    .await
                    .map_err(Into::into)?
                    .map(|b| b.map_data(Into::into).map_err(Into::into).boxed_unsync()),
            })
        };
        Box::pin(f)
    }
}

Interesting things here – I use boxed_unsync to abstract over the body concrete type, and I implement the future using async code rather than as a separate struct. It becomes much smaller even after a few bits of extra type constraining.

One thing that flummoxed me for a little while was the need to capture the future for the underlying response outside of the async block. Failing to do so provokes a 'static requirement which was tricky to debug. Fortunately there is already a rustc bug open about making this easier to diagnose. The underlying problem is that if you create the async block and then dereference self inside it, the type implementing .first has to live arbitrarily long. Whereas by capturing the future immediately, only the future itself has to live arbitrarily long, and that doesn’t require changing the signature of the function.

This is almost worth turning into a crate – I couldn’t see an existing one when I looked, though it does end up rather small – < 100 lines. What do you all think?

FLOSS Down Under - online free software meetingsFirst Meeting Agenda

The first meeting will start at 1PM Australian Eastern time (Melbourne/Sydney) which is +1100 on Saturday the 5th of February.

I will start the video chat an hour early in case someone makes a timezone mistake and gets there an hour before it starts. If anyone else joins early we will have random chat until the start time (deliberately avoiding topics worthy of the main meeting). The link http://b.coker.com.au will redirect to the meeting URL on the day.

The first scheduled talk is a summary and discussion of free software related news. Anyone who knows of something new that excites them is welcome to speak about it.

The main event is discussion of storage technology and hands-on training on BTRFS and ZFS for those who are interested. Here are the ZFS training notes and here are the BTRFS training notes. Feel free to do the training exercises on your own VM before the meeting if you wish.

Then discussion of the future of the group and the use of FOSS social media. While social media is never going to be compulsory, some people will want to use it to communicate, and we could run some servers for software that is considered good (lots of server capacity is available).

Finally we have to plan future meetings and decide on which communication methods are desired.

The BBB instance to be used for the video conference is sponsored by NZOSS and Catalyst Cloud.

,

OpenSTEMCovering the federal election, before the election

Since PM Scott Morrison did not announce the federal election date last week, it will now be held somewhere between March and May (see the post from ABC’s Antony Green for details). Various aspects of elections are covered in the Civics & Citizenship Australian Curriculum in Years 4, 5 and 6. Students are interested in […]

The post Covering the federal election, before the election first appeared on OpenSTEM Pty Ltd.

FLOSS Down Under - online free software meetingsFlounder Overview

Flounder is a new free software users group based in the Australia/NZ area. Flounder stands for FLOSS (Free Libre Open Source Software) down under.

Here is my blog post describing the initial idea, the comment from d3Xt3r suggested the name. Flounder is a group of fish that has species native to Australia and NZ.

The main aim is to provide educational benefits to free software users, via an online meeting with a scope larger than one country, that can’t be obtained by watching YouTube videos etc. When the pandemic ends we will keep running this, as there are benefits from a meeting with a wide geographic scope that can’t be obtained from meetings in a single city. People from other countries are welcome to attend but they aren’t the focus of the meeting.

Until we get a better DNS name the address http://b.coker.com.au will redirect to the BBB instance used for online meetings (the meeting address isn’t yet set up so it redirects to the blog). The aim is that there will always be a short URL for the meeting, so anyone whose device loses contact can quickly type the URL into their backup device.

The first meeting will be on the 5th of Feb 2022 at 1PM Melbourne time +1100. When we get a proper domain I’ll publish a URL for an iCal file with entries for all meetings. I will also find some suitable way for meeting times to be localised (I’m sure there’s a WordPress plugin for that).

For the hands-on part of the meetings there will be virtual machine images you can download to run on your own system (tested with KVM, should work with other VM systems) and the possibility of logging in to a running VM. The demonstration VMs will have public IPv6 addresses and will also be available through different ports on a single IPv4 address; having IPv6 on your workstation will be convenient for you but you can survive without it.

Linux Australia has a list of LUGs in Australia; is there a similar list for NZ? One thing I’d like to see is a list of links to iCal files for all the meetings, and also an iCal aggregator covering all iCal feeds of online meetings. I’ll host it myself if necessary, but it’s probably best to do it via Linux Australia (Linux Australasia?) if possible.

,

David RoweMeasuring HF Noise Around a City

I recently took my loop antenna [1] to a national park on the edge of Adelaide, and tried to measure the noise power at 7.1MHz. I was expecting a low level. Curiously, the noise seemed directional, there was a 10dB null off the loop in one direction. This made me wonder if I was “DFing Adelaide”, and seeing the ground wave signal [5] from the sum of all noise sources in the city.

So I hatched a plan to take my loop antenna to a bunch of national parks around the perimeter of the city, and see if I could get sensible bearings back towards the city. This time I used a spectrum analyser so I didn’t have to mess around with S-meters. I spent a pleasant day circumnavigating the city, and hacked up a Python script [2][3] to plot the results on a map. The base of each line is the measurement location, and the height of the line is the noise relative to the noise floor of the system.

Here’s a plot of noise power against the distance from the center of the city.

These results suggest noise levels drop as we get further away from the city, and are some evidence for ground wave propagation of noise from the city. I had wondered if being sited in a valley just outside the city would be enough to get a low noise floor, but perhaps distance is a factor as well.

I’m surprised a straight line fit of log(power) against linear(distance) works so well, as it should be a curve due to the inverse square law. Oh well, I guess it’s just a small amount of data.
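
For reference, here’s a minimal sketch of what a pure inverse square law predicts (the reference distance and level below are hypothetical, not measurements from this experiment): power in dB falls as 20*log10 of distance, so plotted against linear distance it should be a curve that flattens out, rather than a straight line.

import math

# Hypothetical reference point: noise density p0_dB at distance d0 from the city.
d0 = 5.0        # km (hypothetical)
p0_dB = -125.0  # dBm/Hz (hypothetical)

def inverse_square_dB(d_km):
    """Expected noise density at d_km under a 1/d^2 law, in dBm/Hz."""
    return p0_dB - 20.0 * math.log10(d_km / d0)

for d in [5, 10, 20, 40]:
    print(f"{d:3d} km: {inverse_square_dB(d):6.1f} dBm/Hz")
# Each doubling of distance drops the level by ~6 dB, so the dB-vs-km plot
# should flatten with distance rather than fall at a constant rate.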

The outlier at 30km was a site between two large sheds that were making machinery noises. I could see harmonics on the spec-an so suspect switch mode power supplies or similar nasties. Country sites can have local noise too.

The loop antenna/pre-amp/spec-an combination has a noise floor of -146 dBm/Hz. Some of my measurements were close to that level (e.g. -143). The measured noise power is actually the sum of the noise power collected by the antenna and the noise of the receiver. So if the received noise is the same level as the receiver (say both -146), the total noise power we measure is twice that (3dB higher) or -143. So I added some code to correct for the receiver noise power, this makes a few dB difference at low noise levels.
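
Here’s a minimal sketch of that correction (the helper name is just illustrative): subtract the receiver noise floor from the measured total in the linear (Watts) domain, then convert back to dBm/Hz.

import math

RX_FLOOR_DBM_HZ = -146.0  # noise floor of the loop/pre-amp/spec-an system

def correct_for_rx_noise(measured_dBm_Hz, floor_dBm_Hz=RX_FLOOR_DBM_HZ):
    """Remove the receiver's own noise from a measured noise density."""
    total_mw = 10 ** (measured_dBm_Hz / 10)   # total power in mW/Hz
    floor_mw = 10 ** (floor_dBm_Hz / 10)      # receiver contribution in mW/Hz
    return 10 * math.log10(total_mw - floor_mw)

print(correct_for_rx_noise(-143.0))  # 3 dB above the floor -> about -146 dBm/Hz
print(correct_for_rx_noise(-120.0))  # well above the floor -> correction is negligible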

Reference [6] Figure 4.9 has some curves of expected noise powers, calculated from an ITU-T document on the subject. Here is a comparison with my measurements at 7 MHz, using my loop antenna and pre-amp system [1]:

Condition      Expected Noise Factor (dB) [6]  Measured Noise (dBm/Hz)  Measured Noise Factor (dB)
Residential    50                              -120                     54
Quiet Country  30                              -146                     28

For the residential case I measured between -115dBm/Hz and -127dBm/Hz around my suburb so used -120 as the average. Much to my surprise my results are a reasonable match with [6]! Of course rather than being fixed values, the absolute noise levels vary quite a lot over different sites, times of day, and atmospheric conditions, which is noted in [6] and the ITU-T document.
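
For reference, the measured noise factor column is just the measured noise density relative to thermal noise at -174 dBm/Hz; a small sketch:

def noise_factor_dB(measured_dBm_Hz, thermal_dBm_Hz=-174.0):
    """Noise factor in dB: measured noise density relative to thermal noise."""
    return measured_dBm_Hz - thermal_dBm_Hz

print(noise_factor_dB(-120.0))  # residential average -> 54 dB
print(noise_factor_dB(-146.0))  # quiet country       -> 28 dB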

There is a 20dB difference in noise power from the best case in my suburb (-127 dBm/Hz) to the quietest location I visited (-146 dBm/Hz). This is consistent with [6]. That’s a factor of 100 if the noise power is expressed in Watts. Someone could use 1W to talk to me in the country, but would require 100 watts to achieve the same SNR if I was listening in my suburb. It’s not possible for the average 100W Ham station to increase their power to 10kW, which explains why HF reception from urban locations is getting so hard.

My experimental aim of “DF-ing” the noise was busted. I couldn’t find good nulls from most sites, especially the very low noise level sites.

Layers of Urban Noise

I have some vague hope of using DSP techniques and multiple antennas to cancel some of the noise and actually be able to receive 40M HF radio signals at home. These hopes are fading – it seems like there are several layers of noise:

  1. A city wide urban noise fog made up of the sum of millions of switch mode power supplies, VDSL [4], and high speed digital hardware. This is the best case S5-6 noise I see at certain times of day, and that I have established [1] is fairly uniform (+/- 5dB) around my suburb. An antenna with directivity could possibly help by rejecting noise from some directions, but I don’t really want to put up a 40M beam. I’m not sure if this noise can be reduced using two antenna phase cancellation techniques as the “urban fog” is probably the sum of energy collected from many directions, and phase cancellation effectively puts a null in the pattern of a two antenna array at one (or perhaps two) bearings. More thought required.
  2. High energy, very local stuff that sits on top of that urban noise fog, like my neighbour’s lights or power line noise. This is often impulsive, and pushes the noise over S9. There may be several of these local sources coming from different directions, complicating the use of phase-difference subtraction techniques.
  3. Sometimes you get all of these noise sources at the same time, which explains why the analog phase-cancellation based boxes aren’t always effective. They can only cancel one noise source.

Further Work

Some ideas for further work:

  1. Try a wire antenna with directivity.
  2. Try 20m, which seems to have a lower quiescent noise floor (S0 at times), but still gets hammered by the S9+ local noise sources at various times of day. However that might reduce the problem to just the local noise sources, as the “urban noise fog” is at an acceptable level. I think. Time to retune the loop!
  3. Think about noise reduction techniques for the strong local impulsive noise.

Links

[1] Measuring Urban HF Noise with a Loop
[2] Easy Steps To Plot Geographic Data on a Map
[3] GitHub repo for this project
[4] VDSL versus HF Radio
[5] Radio Noise W8JI
[6] A High Performance Active Antenna for the High Frequency Band

,

Jan SchmidtPulling on a thread

I’m attending the https://linux.conf.au/ conference online this weekend, which is always a good opportunity for some sideline hacking.

I found something boneheaded doing that today.

There have been a few times while inventing the OpenHMD Rift driver where I’ve noticed something strange and followed the thread until it made sense. Sometimes that leads to improvements in the driver, sometimes not.

In this case, I wanted to generate a graph of how long the computer vision processing takes – from the moment each camera frame is captured until poses are generated for each device.

To do that, I have some logging branches that output JSON events to log files, and I write scripts to process those. I used that data and produced:

Pose recognition latency.
dt = interpose spacing, delay = frame to pose latency

Two things caught my eye in this graph. The first is the way the baseline latency (pink lines) increases from ~20ms to ~58ms. The 2nd is the quantisation effect, where pose latencies are clearly moving in discrete steps.

Neither of those should be happening.

Camera frames are being captured from the CV1 sensors every 19.2ms, and it takes 17-18ms for them to be delivered across the USB. Depending on how many IR sources the cameras can see, figuring out the device poses can take a different amount of time, but the baseline should always hover around 17-18ms because the fast “device tracking locked” case takes as little as 1ms.

Did you see me mention 19.2ms as the interframe period? Guess what the spacing on those quantisation levels is in the graph? I recognised it as implying that something in the processing is tied to frame timing when it should not be.

OpenHMD Rift CV1 tracking timing

This 2nd graph helped me pinpoint what exactly was going on. This graph is cut from the part of the session where the latency has jumped up. What it shows is a ~1 frame delay between when the frame is received (frame-arrival-finish-local-ts) and when the initial analysis even starts!

That could imply that the analysis thread is just busy processing the previous frame and doesn’t get to start working on the new one yet – but the graph says that fast analysis is typically done in 1-10ms at most. It should rarely be busy when the next frame arrives.

This is where I found the boneheaded code – a rookie mistake I wrote when putting the image analysis threads in place early on in the driver development, and never noticed.

There are 3 threads involved:

  • USB service thread, reading video frame packets and assembling pixels in framebuffers
  • Fast analysis thread, that checks tracking lock is still acquired
  • Long analysis thread, which does brute-force pose searching to reacquire / match unknown IR sources to device LEDs

These 3 threads communicate using frame worker queues passing frames between each other. Each analysis thread does this pseudocode:

while driver_running:
    Pop a frame from the queue
    Process the frame
    Sleep for new frame notification

The problem is in the 3rd line. If the driver is ever still processing the frame in line 2 when a new frame arrives – say because the computer got really busy – the thread sleeps anyway and won’t wake up until the next frame arrives. At that point, there’ll be 2 frames in the queue, but it still only processes one – so the analysis gains a 1 frame latency from that point on. If it happens a second time, it gets later by another frame! Any further and it starts reclaiming frames from the queues to keep the video capture thread fed – but it only reclaims one frame at a time, so the latency remains!

The fix is simple:

while driver_running:
    Pop a frame
    Process the frame
    if queue_is_empty():
        sleep for new frame notification
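
If it helps to see the pattern concretely, here is a minimal Python sketch of the same consumer loop (the real driver is C, and the names here are purely illustrative): only sleep when the queue is actually empty, so any frames that arrived while we were busy get drained immediately.

import collections
import threading

frame_queue = collections.deque()
queue_cond = threading.Condition()
driver_running = True

def analysis_thread(process_frame):
    while driver_running:
        with queue_cond:
            # Only sleep when there is genuinely nothing to do; frames that
            # arrived while we were busy processing are handled immediately.
            while not frame_queue and driver_running:
                queue_cond.wait()          # sleep for new frame notification
            if not driver_running:
                return
            frame = frame_queue.popleft()  # pop a frame from the queue
        process_frame(frame)               # process outside the lock

def push_frame(frame):
    # Called by the USB service thread for each completed framebuffer.
    with queue_cond:
        frame_queue.append(frame)
        queue_cond.notify()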

Doing that for both the fast and long analysis threads changed the profile of the pose latency graph completely.

Pose latency and inter-pose spacing after fix

This is a massive win! To be clear, this has been causing problems in the driver for at least 18 months but was never obvious from the logs alone. A single good graph is worth a thousand logs.

What does this mean in practice?

The way the fusion filter I’ve built works, in between pose updates from the cameras, the position and orientation of each device are predicted / updated using the accelerometer and gyro readings. Particularly for position, using the IMU for prediction drifts fairly quickly. The longer the driver spends ‘coasting’ on the IMU, the less accurate the position tracking is. So, the sooner the driver can get a correction from the camera to the fusion filter the less drift we’ll get – especially under fast motion. Particularly for the hand controllers that get waved around.

Before: Left Controller pose delays by sensor
After: Left Controller pose delays by sensor

Poses are now being updated up to 40ms earlier and the baseline is consistent with the USB transfer delay.

You can also visibly see the effect of the JPEG decoding support I added over Christmas. The ‘red’ camera is directly connected to USB3, while the ‘khaki’ camera is feeding JPEG frames over USB2 that then need to be decoded, adding a few ms delay.

The latency reduction is nicely visible in the pose graphs, where the ‘drop shadow’ effect of pose updates tailing fusion predictions largely disappears and there are fewer large gaps in the pose observations when long analysis happens (visible as straight lines jumping from point to point in the trace):

Before: Left Controller poses
After: Left Controller poses

,

Linux AustraliaCouncil Meeting 12th January 2022 – Minutes

1. Meeting overview and key information

Present

  • Sae Ra Germaine (President)
  • Joel Addison (Vice President)
  • Russell Stuart (Treasurer)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Neil Cox (Council)

Apologies

Meeting opened at 20:25 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Joel

2. Log of correspondence

3. Items for discussion

  • PyCon 2022
    • Currently there is a large amount of deposit money sitting with the Adelaide Convention Centre
    • We need to work out what to do (and what we can do) with the deposit if the decision is made that an in person event is not feasible
    • The team indicate that they would not run an online event again, but in person would be possible
    • We will keep discussing this with Richard to determine next steps

4. Items for noting

  • Linux.conf.au 2022
    • Attendee numbers are down at the moment, which means we might have some budget concerns
    • Final preparations are being made at the moment and things are coming together

5. Other business

  • Annual Report
    • Russell is working on this at the moment
    • Aim to send it to members on Friday morning (a day before the AGM)

6. In Camera

1 item was discussed

The post Council Meeting 12th January 2022 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 22nd December 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Russell Coker (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

Apologies

–    Jonathan Woithe (Council)

Meeting opened at 19:35 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

3. Items for discussion

  • Rusty Award discussion underway.

ACTION ITEM: Clinton to put up notes about the previous rules for the Rusty Award on the website.

  • Sae Ra: someone needs to take the returning officer through the accreditation officer.

ACTION ITEM: Clinton and Joel to go through the prepared notes for how to create an election, then run through it with the returning officer.

ACTION ITEM: For all council officers to send their biographies into the secretary.

ACTION ITEM: for all council members to register for the AGM via Zoom.

ACTION ITEM: for the secretary to remind linux-aus to register for the AGM.

The post Council Meeting 22nd December 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 08th December 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Russell Coker (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

Apologies

Meeting opened at 19:32 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

  • Kathy Reid request for de-identified membership postcode/city/country data to create a map showing visual densities of member locations
    • Sae Ra notes that she would like to show the same map at the Linux Australia talk that she is doing with Julien at LCA2022. Sae Ra suggests that our members can legally request membership data under our constitution, although our privacy policy has wording that suggests data may not be shared with third parties unless this is required to carry out normal activities or we are compelled to by law enforcement. The committee commences reading the Associations Incorporations Act, constitution and privacy policy. Neill thinks the privacy policy blocks us from sending the information. Clinton wonders if the council can give Kathy the final result of the data she’s after without giving individual numbers. Joel mentions this would be normal operation if we generated this ourselves. Jonathan thinks the constitution and the privacy policy don’t clash, as there’s a provision for if we are legally obliged. Russell Coker thinks any member of good standing should not constitute a third party.  Clinton asks, again, if council can make the final product, a pretty heat map. It is mentioned that Kathy is on the Linux Australia Media team, so sharing data could come under that. Lots of back and forward discussion about how to make things clear between the constitution and privacy policy. ACTION ITEM: Jonathan to make an adjustment to privacy policy, removing ambiguity.
  • Sae Ra raises a motion to give the requested data to Kathy for the purpose of generating a heat map of members, Russell Coker seconds. A vote on the motion:

For: six.

Against: none.

Abstain: 1.

The motion is carried. ACTION ITEM: Sae Ra to get postcode information to Kathy.

3. Items for discussion

  • Bank kerfuffle. Russell rang the bank again today, they did not get back to him. Has also emailed them with no result. Russell guesses that it’s closed due to inactivity. If that is the case, it’s very hard to undo, and the easiest thing to do is to shift bank accounts. ACTION ITEM: Russell will update the account in Stripe and XERO.

4. Items for noting

  • Once the admin team has moved the instance of civicrm to our local instance, we’ll be ready to open up taking nominations for council. Sae Ra asks for the best time to hold the AGM during LCA, some non-important logistical discussion ensues. Saturday is chosen.
  • Jonathan has a work in progress task to look at possible voting platforms, at this point still thinks we should stick with Zoom, even though it’s not an ideal solution for a FOSS organisation. The 22nd of December meeting would mostly be looking at nominations. Looking at changing the Fifth of Jan meeting to the Twelfth to make things a little easier, an induction for any new council members.

5. Other business

  • Russell is still dealing with some outstanding invoices.

6. In Camera

  • 1 item was discussed in relation to LCA2022

The post Council Meeting 08th December 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 24th November 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Russell Coker (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

Apologies

Meeting opened at 19:35 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

  • WA open source group. Sae Ra to continue correspondence to work out more information.
  • Kathy’s NZ request. Discussion about the request. ACTION ITEM Sae Ra to respond to Kathy with a thank you, but no thanks.

3. Items for discussion

  • Sae Ra has organised a dummy election, asks everyone to vote after 9pm
  • Steve talks on behalf of the Admin team. Talks about the election site. Cron is being done manually atm. Email still going to disk atm (to stop mail going to the world). Sae Ra has seen some small issues with icons, will put a list of little niggles together. Has moved a lot of the mail across, needs to wait till more stuff is in our control to continue. Waiting for the mirror admin to move things across to upload pyconau stuff.
  • LCA2022. Tickets should be opening up shortly. Some keynotes have been opened up. AV company has been selected.

4. Items for noting

  • ACTION ITEM For all council members to check and update the council position descriptions up on github. https://github.com/linuxaustralia/position-descriptions

5. Other business

  • Nil

6. In Camera

  • Nil

The post Council Meeting 24th November 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 10th November 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Russell Coker (Council)
  • Neil Cox (Council)

Apologies

  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)

Meeting opened at 19:34 AEDT by Joel Addison  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence 

Snail mail: Bank term deposit slips.

3. Items for discussion

  • Defamation Law (Open Australia in particular?) We would like more information about their case. Jonathan has posted his one page summary to the list.  Clinton lays out a possible event chain that leads to someone writing a possibly defamatory note on Open Australia’s website about a housing developer. Joel comments that at least having an updated communications policy that makes things clear would be better. Cancelling our communications channels isn’t really feasible, neither is moderating them. Limiting them does make things better. Discussion about different channels and what moderation options already exist with them. General discussion around the judgement and what lawyers are saying about it. Neill suggests that we really don’t want to become the test case. Jonathan quotes from the communication policy, and it allows us to switch moderation on if we need it.
  • ACTION ITEM:Joel and Sae Ra:  itemize our communication channels, work out what can be moderated and what can’t be, what ones we can close.

Discussions about what forms of communications fall under the ruling. It’s about sites that are publically available that can take comments.

ACTION ITEM: Jonathan to modify the communication policy to make it clearer and easier for LA to protect itself; post to list when it is updated; council will vote at next normal meeting.

4. Items for noting

  • LCA AV provider, have received quotes from all providers, and have selected a preferred provider. Sae Ra and Joel are going to be meeting with the preferred supplier soon to discuss details.
  • LA Website – Testing occurring Sun 14 Nov
  • Linux Conf Au Session Selection Committee Post meeting Tomorrow, anyone available? Joel and Sae Ra are going to attend, to make up for Clinton’s having a little bit of a social life.

5. Other business

6. In Camera

  • 1 item was discussed

The post Council Meeting 10th November 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 27th October 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Russell Coker (Council)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

–    Russell Stuart (Treasurer)

Apologies

Meeting opened at 19:30 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence

Dave Sparks and Owen speaking for Drupal. First time for Dave at this meeting, taking over from Owen. Speakers are locked in, registrations trickling in, looking at 230/240 attendees. Looking at a $7k profit overall. Looking at in person events for next year. Did pull a Wellington venue at no cost for this year. Dave is in New Zealand. Dave will move into the chair role for the rest of his term, the end of 2022 currently. Owen staying on as events management role for his term 2023. Jonathan thanks them for the draft AGM report previously sent. Sae Ra asks if they want a thank you from the LA council at the conference.

Steve speaking on behalf of the admin team. Did a final sync of the website from the cloud  version. Needs to update some urls in the stand alone website so that it can send emails at a certain point in time.  Promises Joel that the cloudy stuff will be happening soon. Steve says there will be a few changes to the budget, only a couple of hundred dollars, not thousands. Migrating everything to the temporary mirror. From 4TB to 45TB storage.

Looking at rebuilding the gitlab instance, one for LA projects, one for others. This means project managers cannot see secrets in another project.

Patrick speaking for Joomla Australia. Close to a constitution, code of conduct etc documents. Another meeting on Tuesday, then those documents should be ready to submit to LA to become a subcommittee. Feeling the effects of Joomla 4 that was released last month. Some fine tuning with third party developers was required.

Richard, speaking for Pycon Australia: have some draft reports written up on the last event. Budget isn’t closed yet, mostly due to gifts not being sent out yet. Richard will delegate AGM report writing. Still have a considerable part of a deposit for a physical venue for next year. Can scale the booking depending on ticket sales. Trying to get another person or two onto the steering committee.

3. Items for discussion

  • Rusty wrench
  • Jonathan has drafted up a communications policy.
  • ACTION ITEM: Sae Ra to send out Rusty Wrench nominations.

4. Items for noting

  • Nil

5. Other business

  • Nil

6. In camera

2 items were discussed in camera

The post Council Meeting 27th October 2021 – Minutes appeared first on Linux Australia.

Linux AustraliaCouncil Meeting 13th October 2021 – Minutes

1. Meeting overview and key information

Present

  • Joel Addison (Vice President)
  • Clinton Roy (Secretary)
  • Jonathan Woithe (Council)
  • Russell Stuart (Treasurer)
  • Sae Ra Germaine (President)
  • Neil Cox (Council)

–    Russell Coker (Council)

Apologies

Meeting opened at 19:37 AEDT by Sae Ra  and quorum was achieved.

Minutes taken by Clinton Roy

2. Log of correspondence

3. Items for discussion

  • Portfolios. Sae Ra shows her Portfolios diagram. Minor feedback is given to the diagram. Sae Ra to make some minor adjustments.
  • LCA stuff to still deal with:
    • Comments on social media, defamation law judgement. TODO: Jonathan to come up with a one sheet giving an overview of the situation.
  • SSC virtual face to face meeting. Sae Ra says it turned out to be easier than in many previous years, the program sort of fell together. Was run really well on the day. The Jitsi platform wasn’t working terribly well.

4. Items for noting

  • Nil

5. Other business

  • Nil

6. In Camera

4 items were discussed

The post Council Meeting 13th October 2021 – Minutes appeared first on Linux Australia.

,

Colin CharlesThis thing is still on?

Yes, the blog is still on. January 2004 I moved to WordPress, and it is still here January 2022. I didn’t write much last year (neither here, nor experimenting with the Hey blog). I didn’t post anything to Instagram last year either from what I can tell, just a lot of stories.

August 16 2021, I realised I was 1,000 days till May 12 2024, which is when I become 40. As of today, that leaves 850 days. Did I squander the last 150 days? I’m back to writing almost daily in the Hobonichi Techo (I think last year and the year before were mostly washouts; I barely scribbled anything offline).

I got a new Apple Watch Series 7 yesterday. I can say I used the Series 4 well (79% battery life), purchased in the UK when I broke my Series 0 in Edinburgh airport.

TripIt stats for last year claimed 95 days on the road. This is, of course, a massive joke, but I’m glad I did get to visit London, Lisbon, New York, San Francisco, Los Angeles without issue. I spent a lot of time in Kuantan, made a bunch of Langkawi trips, and also stayed for many months at the Grand Hyatt Kuala Lumpur during the May lockdowns (I practically stayed there all lockdown).

With 850 days to go till I’m 40, I have plenty I would like to achieve. I think I’ll write a lot more here. And elsewhere. Get back into the habit of doing. And publishing by learning and doing. No fear. Not that I wasn’t doing, but it’s time to be prolific with what’s been going on.

The post This thing is still on? first appeared on Colin Charles Agenda.

,

Simon LyallAudiobooks – December 2021

How Smart Machines Think by Sean Gerrish

An introduction to Machine learning, covering advances of the last 10 years or so via stories about self-driving cars, the Netflix prize etc. 3/5

Harrier 809: Britain’s Legendary Jump Jet and the Untold Story of the Falklands War by Rowland White

Covering the Sea Harrier’s part in the Falklands as well as other parts of the air war, like operations in Chile and Argentinian units. Well researched and written. 4/5

Caroline: Little House, Revisited by Sarah Miller

A retelling of Little House on the Prairie from the perspective of Caroline Ingalls. Interesting re-reading events through an adult’s eyes. Second external review. 3/5

My Adventurous Life by Dick Smith

Autobiography by Australian Entrepreneur and Adventurer. Well packed with interesting stories of both business and other endeavors. 3/5

Nuclear Folly: A History of the Cuban Missile Crisis by Serhii Plokhy

Draws on Soviet and Ukrainian sources to give more details from the Russian side than previous books. Emphasizes the role of luck as both sides misread the other. 3/5

999 – My Life on the Frontline of the Ambulance Service by Dan Farnworth

Stories from the author’s career plus their personal struggles and advocacy for better mental-health support for Ambulance officers. 3/5

The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race by Walter Isaacson

As well as profiling Doudna it covers others in the field as well as the technology. Some early responses to the Covid19 pandemic. 3/5

Thunderball by Ian Fleming

James Bond travels to a Health Camp(!) and then to the Bahamas to investigate stolen Nuclear Weapons and Blackmail. Usual action ensues. 3/5

See also: Top Audiobooks I’ve listened to

My Audiobook Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average. in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

,

David RoweMeasuring Urban HF Noise with a Loop

For the last few years I’ve been interested in the problem of urban HF noise. As a first step I’d like to characterise the problem by taking a few measurements of noise at my home, around my suburb, and a few other locations. So I need a small, portable antenna. As I’m interested in high noise levels on HF, it doesn’t have to be very efficient.

On a “quiet” day I see 7MHz noise from my dipole at S5 on my IC7200. I think the S meter is adding up all the noise in the IF bandwidth of the radio. Hooking up my spec-an and referring this to a 1 Hz bandwidth, I measure the average noise at about -120 dBm/Hz, which is 54dB above thermal (-174dBm/Hz). So even a low gain antenna will receive plenty of noise without hitting the noise floor of the receiver. Ideally I’d like to see -135 dBm/Hz, which is S0 on my IC7200.

Small Loop Antenna

I decided to build a loop, as they don’t need a ground plane, and are relatively insensitive to low permeability surrounding objects (like me holding it) [2].

Using the equations from [2] I decided to use 3 turns of RG58 (about 3m) with a 0.3m diameter. The braid is the conductor, I don’t use the inner at all.

My loop is fed using a toroidal transformer [5]. I like the idea of using a transformer to ensure good balance. All three turns of the loop coax pass through the toroid (the primary), and the secondary is a bunch of turns I wind myself. Using the equations on [2], I estimated the radiation resistance Rr at just 0.006 ohms. To match this to 50 ohms means a lot of turns on the matching transformer.

I tried 50 and 100 turns but for some reason the return loss was still really poor. Also it didn’t seem to be receiving anything, and touching antenna or capacitor terminals had no effect. On a whim I reduced the number of turns and the return loss started to improve, I could see a noticeable dip of a few dB as I moved the tuning capacitor. If I touched the capacitor or the loop it would de-tune, indicating it really was starting to become an antenna!

Turns out that for small loops, the resistive loss Rl is often greater than Rr. I confirmed this by connecting the loop in series with a fixed 100pF capacitor across the spec-an input. When driven by the tracking generator I can see a dip at resonance. By measuring the bandwidth I could estimate the Q and hence the loss resistance Rl of about 2.8 ohms, much greater than Rr of 0.006 ohms.
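
A rough sketch of that estimate (the resonant frequency and bandwidth below are assumed example values, not figures from this post, and loading by the 50 ohm source and load is ignored): treat the loop plus the fixed 100pF capacitor as a series resonant circuit, take Q from the -3dB bandwidth of the dip, and use Rl ≈ Xc/Q.

import math

C = 100e-12   # the fixed series capacitor from the test setup, Farads
f0 = 7.1e6    # resonant frequency, Hz (assumed to be near the 40m band)
bw = 90e3     # measured -3dB bandwidth of the dip, Hz (hypothetical)

q = f0 / bw                       # Q of the series resonance
xc = 1 / (2 * math.pi * f0 * C)   # capacitor reactance at resonance, ~224 ohms
rl = xc / q                       # implied series loss resistance
print(f"Q ~= {q:.0f}, Xc ~= {xc:.0f} ohms, Rl ~= {rl:.1f} ohms")  # ~2.8 ohms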

At about 15 turns I obtained a return loss of better than 10dB – good enough. However it’s really just a match to Rl; consequently the antenna is quite inefficient, I estimate a gain of 10*log10(D*Rr/Rl) = -25dB (D=1.5, the directivity of a small loop). But that’s still not bad for a 30cm loop on 40m, and good enough for my mission.

The ratio of 15 turns secondary, 3 turns primary (15/3)^2 would transform 2.8 ohm into around 70 ohms, and lead to an expected return loss of 15dB, about what I’m seeing.
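
Putting those numbers together, here is a small sketch that checks the gain and matching figures quoted above (all of the input values are from this post):

import math

D = 1.5      # directivity of a small loop
Rr = 0.006   # estimated radiation resistance, ohms
Rl = 2.8     # estimated loss resistance, ohms

gain_dB = 10 * math.log10(D * Rr / Rl)
print(f"loop gain ~= {gain_dB:.0f} dB")   # about -25 dB

n_sec, n_pri, z0 = 15, 3, 50.0
z_in = Rl * (n_sec / n_pri) ** 2          # 2.8 ohms seen through the 15:3 transformer
gamma = (z_in - z0) / (z_in + z0)         # reflection coefficient against 50 ohms
return_loss_dB = -20 * math.log10(abs(gamma))
print(f"Zin ~= {z_in:.0f} ohms, return loss ~= {return_loss_dB:.1f} dB")  # ~70 ohms, ~15.6 dB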

EMI receiver

I decided to use my FT817 as a portable receiver to log noise values as I wander about with the loop. So I added a 26dB pre-amp [3] to compensate for the negative gain of the loop and to boost the Rx level high enough to move the S-meter. The pre-amp is increasing the signal and noise together, so unlike a regular pre-amp the receiver will not be more sensitive. Just louder.

I placed the loop on a 1m tripod at the front of my yard, away from the house, as that seemed to reduce the noise a bit. Using distant SSB signals, I compared the dipole to the loop over a period of a few days. I used an RTL-SDR running gqrx on the 40m band, with a 7MHz bandpass filter to prevent overload. At times the levels were quite close, at other times the loop was down about 10dB. Sometimes the SNR was better on the loop, other times the dipole. It was hard to get exact measurements. I guess there are differences in pattern, polarisation, and the signals received at the slightly different sites, and the power of SSB bounces around.

However it’s fair to say the loop (with its pre-amp) is roughly as sensitive as the dipole. If the loop detects a signal (such as noise) the dipole will also detect the same signal at roughly the same level. So the loop is a useful antenna for portable EMI measurement.

By connecting my spec-an in parallel with the loop antenna, I created a calibration curve of noise density against the S meter reading:

My FT817 S-meter seems to have a very narrow range, and is wildly different from the IC7200 S meter. However if I can measure less than S1 on the FT817 (blue, attenuator off), it looks like I will hit my target of -135dBm/Hz.

Blowing up my FT817

During some portable measurements I blew up my FT817! The fix and fault (shorted filter EMI choke and blown internal battery fuse) were exactly as described in [4]. I managed to repair the Rx but the Tx power is still low at 200mW, it’s almost like it’s folding back due to high SWR (although the SWR into the dummy load is Ok). So I’ll need to think about that fault some more. I’ve added external fuses to both the positive and negative FT817 supply leads now, as well as the Lithium battery I use for external power.

Noise measurements

I hopped on my bike and logged the noise at various sites around my suburb. They all bent the FT817 needle with the attenuator off (blue), so I used the “attenuator on” curve (red). The actual measurements are the red circles in the figure above.

So it’s not just my house, this seems to be a pervasive problem in my neighbourhood. The best case was walking out into the middle of a large sports ground. I guess that makes sense, as the EMI source must be around the perimeter, and we get some inverse square law fall off. Take that far enough and we have portable operation from a country site – putting some distance between us and the cloud of urban noise.

Discussion and Conclusions

I’ve been reading about antennas this year, and can recommend “Antenna Physics” by Robert Zavrel [1]. The technical level was just right for me – between typical Ham texts and deeply mathematical tomes. This project complemented my reading, and gave me some practical experience with the concepts of loss resistance (Rl) and radiation resistance (Rr).

It was pretty cool to see the return loss improve as I adjusted the matching network, and understand “why”. I really enjoy the learning experiences with Ham Radio.

I now have a portable, sorta-calibrated EMI receiver, that I can use to measure the noise levels at various sites, and compare it to my target. As a bonus the loop is directional, so I can DF noise. The range of the FT817 S-meter is a bit limited, but it’s calibrated well enough to tell me if the noise level is similar to my house or getting near my target level of -135 dBm/Hz. Nowhere in my suburb is close, and the noise is not localised to my house.

I’m quite surprised a 30cm loop can receive 7 MHz signals at all, in a quiet location it works just fine as a receive antenna.

The occasional improvement in SNR with the loop is worth looking into. Maybe it’s the location away from the house. Or the lack of near B-field noise, compared to the near E-field noise that the dipole will respond to. It’s difficult to put my dipole in that location (or move it at all), as it’s so big and there are power lines nearby. The compact size of the loop is very useful for repositioning.

Time is a factor, my noise level varies over the day as various noise sources come and go. Over the course of the day I can see at least three different noise sources, for example (i) high energy clicks 20ms apart that seem to be coming from power lines (ii) some sort of (lighting?) noise from a neighbour in the evening, plus (iii) some other noise with a 30us period. Urban HF noise is a complex problem.

Reading Further

[1] Antenna Physics – an Introduction
[2] Antenna Theory Loop Antenna Page
[3] Experimental methods in RF Design, Ch2, Fig 2.34, Class A amplifier
[4] FT817 short circuit power supply repair – the exact fault I induced in my FT-817. This post was very helpful in fixing it!
[5] VK6CS Magnetic Loop Antenna Matching

,

Simon LyallDonations 2021

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended).

In 2021 I cut down donations to random open source projects but donated $100 each to The Software Freedom Conservancy and Software in the Public Interest.

I do a blog post about it to hopefully inspire others. See: 2020, 2019, 2018, 2017, 2016, 2015

All amounts this year are in $US unless otherwise stated

General Charities

My main donation was $750 to Givewell (to their Maximum Impact Fund). Once again I’m happy that Givewell make efficient use of money donated. I decided this year to give a higher proportion of my giving to them than last year.

Software and Internet Infrastructure Projects

$100 each to the Software Freedom Conservancy and Software in the Public Interest. Money not attached to any specific project.

$51 to the Internet Archive
$25 to Let’s Encrypt

Others including content creators

I donated $103 to Signum University to cover Corey Olsen’s Exploring the Lord of the Rings series plus other stuff I listen to that they put out.

I paid $100 to be a supporter of NZ News site The Spinoff

Patreon

I support a number of creators via Patreon

,

Gary PendergastWordPress and web3

Blockchain. Cryptocurrency. Ethereum. NFTs. DAOs. Smart Contracts. web3. It’s impossible to avoid the blockchain hype machine these days, but it’s often just as difficult to decipher what it all means.

On top of that, discourse around web3 is extremely polarising: everyone involved is very keen to a) pick a team, and b) get you to join their team. If you haven’t picked a team, you must be secretly with the other team.

Max Read made a compelling argument that the web3 debate is in fact two different debates:

But, OK, what is the root disagreement, exactly? The way I read it there are two broad “is web3 bullshit?” debates, not just one, centered around the following questions:

Can the blockchain do anything that other currently existing technology cannot do and/or do anything better or more efficiently than other currently existing technology?

Will the blockchain form the architecture of the internet of the future (i.e. “web3”), and/or will blockchain-native companies and organizations become important and powerful?

Max Read — Is web3 bullshit?

I’m inclined to agree with Max’s analysis here: there’s a technical question, and there’s a business/cultural question. It’s hard to separate the two when every day sees new headlines about millions of dollars being stolen or scammed; or thousands of people putting millions of dollars into highly optimistic ventures. There are extreme positives and extreme negatives happening all the time in the web3 world.

With that in mind, I want to take a step back from the day-to-day excitement of cryptocurrency and web3, and look at some of the driving philosophies espoused by the movement.

Philosophies of web3

There are a lot of differing viewpoints on web3, every individual has a slightly different take on it. There are three broad themes that stand out, however.

Decentralised

Blockchain-based technology is inherently distributed (with some esoteric caveats, but we can safely ignore them for now). In a world where the web centres around a handful of major services, where we’ve seen the harm that the likes of Facebook and YouTube can inflict on society, it’s not surprising that decentralisation would be a powerful theme drawing in anyone looking for an alternative.

Decentralisation isn’t new to the Internet, of course: it’s right there in the name. This giant set of “interconnected networks” has been decentralised from the very beginning. It’s not perfect, of course: oppressive governments can take control of the borders of their portion of the Internet, and we’ve come to rely on a handful of web services to handle the trickier parts of using the web. But fundamentally, that decentralised architecture is still there. I can still set up a web site hosted on my home computer, which anyone in the world could access.

I don’t do that, however, for the same reason that web3 isn’t immune from centralised services: Centralisation is convenient. Just as we have Facebook, or Google, or Amazon as giant centralised services on the current web, we can already see similar services appearing for web3. For payments, Coinbase has established itself as a hugely popular place to exchange cryptocurrencies and traditional currencies. For NFTs, OpenSea is the service where you’ll find nearly every NFT collection. MetaMask keeps all of your crypto-based keys, tokens, and logins in a single “crypto wallet”.

Centralisation is convenient.

While web3 proponents give a lot of credence to the decentralised nature of cryptocurrency being a driver of popularity, I’m not so sure. At best, I’m inclined to think that decentralisation is table stakes these days: you can’t even get started as a global movement without a strong commitment to decentralisation.

But if decentralisation isn’t the key, what is?

Ownership

When we talk about ownership in web3, NFTs are clearly the flavour of the month, but recent research indicates that the entire NFT market is massively artificially inflated.

Rather than taking pot-shots at the NFT straw man, I think it’s more interesting to look at the idea of ownership in terms of attribution. The more powerful element of this philosophy isn’t about who owns something, it’s who created it. NFTs do something rather novel with attribution, allowing royalty payments to the original artist every time an NFT is resold. I love this aspect: royalties shouldn’t just be for movie stars, they should be for everyone.

Comparing that to the current web, take the 3 paragraphs written by Max Read that I quoted above. I was certainly under no technical obligation to show that it was a quote, to attribute it to him, or to link to the source. In fact, it would have been easier for me to just paste his words into this post, and pretend they were my own. I didn’t, of course, because I feel an ethical obligation to properly attribute the quote.

In a world where unethical actors will automatically copy/paste your content for SEO juice (indeed, I expect this blog post to show up on a bunch of these kinds of sites); where massive corporations will consume everything they can find about you, in order to advertise more effectively to you, it’s not at all surprising that people are looking for a technical solution for taking back control of their data, and for being properly attributed for their creations.

The interesting element of this philosophy isn’t about who owns something, it’s who created it.

That’s not to say that existing services discourage attribution: a core function of Twitter is retweets, a core function of Tumblr is reblogging. WordPress still supports trackbacks, even if many folks turn them off these days.

These are all blunt instruments, though, aimed at attributing an entire piece, rather than a more targeted approach. What I’d really like is a way to easily quote and attribute a small chunk of a post: 3 paragraphs (or blocks, if you want to see where I’m heading 😉), inserted into my post, linking back to where I got them from. If someone chooses to quote some of this post, I’d love to receive a pingback just for that quote, so it can be seen in the right context.

The functionality provided by Twitter and Tumblr is less of a technologically-based enforcement of attribution, and more of an example of paving the cow path: by and large, people want to properly attribute others; providing the tools to do so can easily become a fundamental part of how any software is used.

These tools only work so long as there’s an incentive to use them, however. web3 certainly provides the tools to attribute others, but much like SEO scammers copy/pasting blog posts, the economics of the NFT bubble is clearly a huge incentive to ignore those tools and ethical obligations, to the point that existing services have had to build additional features just to detect this abuse.

Monetisation

With every major blockchain also being a cryptocurrency, monetisation is at the heart of the entire web3 movement. Every level of the web3 tech stack involves a cryptocurrency-based protocol. This naturally permeates through the entire web3 ecosystem, where money becomes a major driving factor for every web3-based project.

And so, it’s impossible to look at web3 applications without also considering the financial aspect. When you have to pay just to participate, you have to ask whether every piece of content you create is “worth it”.

Again, let’s go back to the 3 paragraphs I quoted above. In a theoretical web3 world, I’d publish this post on a blockchain in some form or another, and that act would also likely include noting that I’d quoted 3 blocks of text attributed to Max Read. I’d potentially pay some amount of money to Max, along with the fees that every blockchain charges in order to perform a transaction. While this process is potentially helpful to the original author at first glance, I suspect the second and third order effects will be problematic. Having only just clicked the Publish button a few seconds earlier, I’m already some indeterminate amount of money out of pocket. Which brings me back to the question, is this post “worth it”? Will enough people tip/quote/remix/whatever me to cover the cost of publishing? When every creative work must be viewed through a lens of financial impact, it fundamentally alters that creative process.

When you have to pay just to participate, you have to ask whether every piece of content you create is “worth it”.

Ultimately, we live in a capitalist society, and everyone deserves the opportunity to profit off their work. But by baking monetisation into the underlying infrastructure of web3, it becomes impossible to opt-out. You either have the money to participate without being concerned about the cost, or you’re going to need to weigh up every interaction by whether or not you can afford it.

Web3 Philosophies in WordPress

After breaking it all down, we can see that it’s not all black-and-white. There are some positive parts of web3, and some negative parts. Not that different to the web of today, in fact. 🙂 That’s not to say that either approach is the correct one: instead, we should be looking to learn from both, and produce something better.

Decentralised

I’ve long been a proponent of leveraging the massive install base of WordPress to provide distributed services to anyone. Years ago, I spoke about an idea called “Connected WordPress” that would do exactly that. While the idea didn’t gain a huge amount of traction at the time, the DNA of the Connected WordPress concept shares a lot of similar traits to the decentralised nature of web3.

I’m a big fan of decentralised technologies as a way for individuals to claw back power over their own data from the governments and massive corporations that would prefer to keep it all centralised, and I absolutely think we should be exploring ways to make the existing web more resistant to censorship.

At the same time, we have to acknowledge that there are certainly benefits to centralisation. As long as people have the freedom to choose how and where they participate, and centralised services are required to play nicely with self hosted sites, is there a practical difference?

I quite like how Solid allows you to have it both ways, whilst maintaining control over your own data.

Ownership Attribution

Here’s the thing about attribution: you can’t enforce it with technology alone. Snapchat have indirectly demonstrated exactly this problem: in order to not lose a message, people would screenshot or record the message on their phone. In response, Snapchat implemented a feature to notify the other party when you screenshot a message from them. To avoid this, people will now use a second phone to take a photo or video of the message. While this example isn’t specifically about attribution, it demonstrates the problem that there’s no way to technologically restrict how someone interacts with content that you’ve published, once they’ve been granted access.

Instead of worrying about technical restrictions, then, we should be looking at how attribution can be made easier.

IndieWeb is a great example of how this can be done in a totally decentralised fashion.

Monetisation

I’m firmly of the opinion that monetisation of the things you create should be opt-in, rather than opt-out.

Modern society is currently obsessed with monetising everything, however. It comes in many different forms: hustle culture, side gigs, transforming hobbies into businesses, meme stocks, and cryptocurrencies are all symptoms of this obsession.

I would argue that, rather than accepting as fait accompli that the next iteration of the web will be monetised to the core, we should be pushing back against this approach. Fundamentally, we should be looking to build for a post scarcity society, rather than trying to introduce scarcity where there previously was none.

While we work towards that future, we should certainly make it easier for folks to monetise their work, but the current raft of cryptocurrencies just aren’t up to the task of operating as… currencies.

What Should You Do?

Well, that depends on what your priorities are. The conversations around web3 are taking up a lot of air right now, so it’s possible to get the impression that web3 will imminently replace everything. It’s important to keep perspective on this, though. While there’s a lot of money in the web3 ecosystem right now, it’s dwarfed by the sheer size of the existing web.

If you’re excited about the hot new tech, and feeling inspired by the ideas espoused in web3 circles? Jump right in! I’m certain you’ll find something interesting to work on.

Always wanted to get into currency speculation, but didn’t want to deal with all those pesky “regulations” and “safeguards”? Boy howdy, are cryptocurrencies or NFTs the place for you. (Please don’t pretend that this paragraph is investment advice, it is nothing of the sort.)

Want to continue building stuff on the web, and you’re willing to learn new things when you need them, but are otherwise happy with your trajectory? Just keep on doing what you’re doing. Even if web3 does manage to live up to the hype, it’ll take a long time for it to be adopted by the mainstream. You’ll have years to adapt.

Final Thoughts

There are some big promises associated with web3, many of which sound very similar to the promises that were made around web 2.0, particularly around open APIs and global interoperability. We saw what happened when those kinds of tools went wrong, and web3 doesn’t really solve those problems. It may exacerbate them in some ways, since it’s impossible to delete your data from a blockchain.

That said, (and I say this as a WordPress Core developer), just because a particular piece of software is not the optimal technical solution doesn’t mean it won’t become the most popular. Market forces can be a far stronger factor than technical superiority. There are many legitimate complaints about blockchain (including performance, bloat, fitness for purpose, and security) that have been levelled against WordPress in the past, but WordPress certainly isn’t slowing down. I’m not even close to convinced that blockchain is the right technology to base the web on, but I’ve been doing this for too long to bet everything against it.

Markets can remain irrational a lot longer than you and I can remain solvent.

—A. Gary Shilling

As for me, well… 😄

I remain sceptical of web3 as it’s currently defined, but I think there’s room to change it, and to adopt the best bits into the existing web. Web 1.0 didn’t magically disappear when Web 2.0 rolled in, it adapted. Maybe we’ll look back in 10 years and say this was a time when the web fundamentally changed. Or, maybe we’ll refer to blockchain in the same breath as pets.com and other examples from the dotcom boom of the 1990s.

The Net interprets censorship as damage and routes around it.

—John Gilmore

This quote was originally referring to Usenet, but it’s stayed highly relevant in the decades since. I think it applies here, too: if the artificial scarcity built into web3 behaves too much like censorship, preventing people from sharing what they want to share, the internet (or, more accurately, the billions of people who interact with the internet) will just… go around it. It won’t all be smooth sailing, but we’ll continue to experiment, evolve, and adapt as it changes.

Personally, I think now is a great time for us to be embracing the values and ideals of projects like Solid, and IndieWeb. Before web3 referred to blockchains, it was more commonly used in reference to the Semantic Web, which is far more in line with WordPress’ ideals, whilst also matching many of the values prioritised by the new web3. As a major driver of the Open Web, WordPress can help people own their content in a sustainable way, engage with others on their own terms, and build communities that don’t depend on massive corporations or hand-wavy magical tech solutions.

Don’t get too caught up in the drama of whatever is the flavour of the month. I’m optimistic about the long term resilience of the internet, and I think you should be, too. 🥳

,

Simon LyallAudiobooks – November 2021

The Secret Life of Groceries: The Dark Miracle of the American Supermarket by Benjamin Lorr

Several sections, each looking at a different aspect of the American Supermarket. From workers to suppliers to owners. Engaging. 4/5

Life Moves Pretty Fast: The lessons we learned from eighties movies (and why we don’t learn them from movies any more) by Hadley Freeman

A tour through mainstream 80s movies concentrating on one per chapter. Fun but covering serious topics too 4/5

The Boys: A Memoir of Hollywood and Family by Clint Howard and Ron Howard

Alternately narrated by Ron and Clint about their family, growing up in Hollywood and acting. Only briefly covers events after the mid-1980s. Excellent. 4/5

Remote, Inc. : How to Thrive at Work . . . Wherever You Are by Robert C. Pozen & Alexandra Samuel

Lots of good advice for people suddenly working at home due to the pandemic. Tells you to think of yourself as a “business of one”. 3/5

The History of Spain: Land on a Crossroad by Joyce E. Salisbury

24 Lectures on Spanish History from the Stone age to the early 2000s. Interesting and easy to follow. Covers culture etc, not just kings and politics 3/5

For Your Eyes Only and other stories by Ian Fleming

Five short stories involving James Bond. Three straightforward Bond adventures and two others. I found them all very enjoyable. 3/5

See also: Top Audiobooks I’ve listened to

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


,

Simon LyallRSS Feeds for Podcasts from The Spinoff

The Spinoff is a New Zealand news website. They have several podcasts but unfortunately don’t publish the RSS feeds for these. Instead they just list the Spotify and Apple Podcasts links. So if you use something other than Spotify or the official Apple Podcasts client it is difficult to listen to them.

I’ve found the feed links and listed them here, including those shows on break or no longer produced. These are the official RSS feeds on Acast (where The Spinoff host their podcasts). All Spinoff Podcasts are listed on The Spinoff Podcasts Page .

FAQ

Q1: How do you find these feeds?
A1: I follow the “Apple” link on the podcasts page to the “Apple Podcasts” page for the podcast. Then I put that link into https://getrssfeed.com/ which tells me the RSS feed.

Q2: Where are these feeds?
A2: The Spinoff publish their podcasts via Acast, who host them and insert ads. Acast provide RSS feeds for all their podcasts. They even provide a page for each show.

Q3: Why don’t The Spinoff publish RSS feeds?
A3: I assume they think most people use Spotify or Apple Podcasts and don’t want to clutter their website. I’ve asked but had no reply. I also suspect (based on errors they make) that some of their process is manual, so it would be extra work for each episode.

Q4: What is RSS?
A4: RSS is a special webpage that lists podcast episodes (or posts in a blog, or YouTube videos in a channel) in a format that can be scanned easily by software. It allows you to see/hear updates without having to visit sites regularly to check if there are new episodes.
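
For example, here is what “scanned easily by software” looks like in practice: a minimal Python sketch (assuming the third-party feedparser library is installed; the feed URL is a placeholder) that fetches a podcast feed and prints the latest episodes.

import feedparser

# Any of the Acast feed URLs listed above can be substituted here.
FEED_URL = "https://example.com/podcast/feed.xml"

feed = feedparser.parse(FEED_URL)
print(feed.feed.get("title", "Unknown podcast"))

# Show the five most recent episodes and their audio links.
for entry in feed.entries[:5]:
    enclosures = entry.get("enclosures") or []
    audio = enclosures[0].get("href", "no audio link") if enclosures else "no audio link"
    print(f"- {entry.get('title', 'Untitled')}: {audio}")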

Q5: Can you recommend a RSS viewer?
A5: I use NewsBlur and am very happy with it.

Q6: Can you recommend a Podcast Client?
A6: I manually download podcasts and copy them into the Android-based player called Listen Audiobook Player. I wouldn’t really recommend this workflow. Have a look at this article on the Best podcast apps of 2021 instead. They let you put in a podcast feed, and whenever a new episode is published it will be downloaded automatically to your phone.


,

David RoweOpen IP over VHF/UHF 5

I’ve been having fun testing my data radio system over the air for the first time. This involved a few false starts, careful testing, tracking down a few bugs, and tuning the system to handle local EMI and strong pager signals. The good news is – it works! Using 10mW of transmitter power I have established a 1 kbit/s link over a 15km urban path using a RpiTx transmitter and RTL-SDR receiver. Plus lots of software.

My aim for this phase of the project was: don’t expect miracles with 10mW, but it should work down to the predicted Minimum Detectable Signal (MDS) over real world urban paths.

The configuration was similar to the previous post:

Terminal 1 (T1) uses a HackRF Tx and RTL-SDR Rx. Instead of a T/R switch I just use a hybrid combiner. I’ve also added a small “PA” to boost the HackRF signal to 10mW. Terminal 2 (T2) is the prototype pirip terminal. This uses a RpiTx transmitter (10mW output after the low pass filter), an RTL-SDR receiver, and a home made T/R switch.

I installed T2 at my home, connected to a vertical dipole about 8m high via some low loss coax. I installed T1 in my car with a commercial 2m antenna, and took it mobile so I could try to establish a link from various sites.

The frame repeater service on T2 allows me to “ping” it from T1. T2 responds to any valid packet it receives and sends one right back to T1. The system also gives me signal and noise power estimates from both ends of the link. I worked up some maths to estimate the Rx signal power in dBm from the FSK detector output. This lets me monitor received signal levels – and the local noise floor (noise spectral density) in real time.

So off I went for a drive, pinging away at 10 kbit/s. However the system was a bit deaf, I couldn’t get further than 500m before losing the signal. Even given a non line of sight path, that didn’t feel right. I tried reducing the data rate to 1 kbit/s. This was a bit better, out to 1.5km if I chose my location carefully. However the reported receiver powers didn’t look right, and I could see a big difference in the sensitivity at the two ends of the link.

Over the Cable Tests

A radio system has a lot of moving parts. A lot can go wrong, and when you do over the air tests it’s really hard to spot the problems. So I’ve learnt to perform Over the Cable (OTC) tests first. The basic idea is you connect the Tx and Rx via coax and enough attenuators to reduce the Rx power down to the predicted MDS. You keep working through the bugs until it works at the MDS.

So I took T1 and T2 back to the test bench to perform a bi-directional MDS test. At 1 kbit/s, this system should have a Minimum Detectable Signal (MDS) of -132dBm. I placed the two terminals in different rooms, and connected them via coax and lots of attenuation. The usual fun and games occurred, it’s quite hard to attenuate even 10mW (10dBm) down to -132dBm without some RF leaking around the attenuators.

The RF from the Pi was particularly hard to contain. I think it’s pretty poorly coupled into the coax, so the coax shield and Pi itself radiates a lot. So I installed T2 into a metal box, powered it from a battery, and provided just one hole for the SMA connector. With the lid firmly screwed down, this formed a nice Faraday cage.

That helped, I could now attenuate the signals right down to -132dBm. However T2 was quite deaf – about 8dB off theory. I traced this to EMI from the Pi getting into the antenna switch, which was stuffed into the box close to the Pi. To fix this I built a little shielded box from PCB material for the switch. Yayy – now both ends of the link were behaving, with packets getting through right down to the MDS of -129dBm for T1 and -132dBm for T2. T1 is 3dB less sensitive as it has a hybrid splitter instead of a T/R switch. This means the Rx signal is attenuated by 3dB before it hits the LNA of the RTL-SDR.

I found another problem that was making the system deaf. When T1 transmits a packet, some of that transmitted signal leaks through the hybrid to the T1 receiver. Turns out the leakage is really quite strong, and is enough to make the FSK demodulator deaf to any off air signals for the next few hundred ms. So T1 becomes deaf to the weak replies from T2. This is due to some averaging code in the FSK demod frequency offset estimator that gets “pumped up” to the mean received level. The work around was to put a delay in the T2 frame repeater service: when it receives a packet from T1, it waits a second before sending back the reply. This is not a practical solution for real world use, so I’ll need to have another lap around the acquisition code in the near future. However it’s good enough to let me proceed with the aims of the current round of tests.

Urban RF Nasties

My receiver has a fixed gain, set by the “-g” parameter on the RTL-SDR. A high gain means a low noise figure, but makes the receiver prone to overload. I hooked my antenna up to my spec-an and could see two problems that would impact my experiments:

  1. A very strong signal: -22dBm on 148.33 MHz, I think a pager from a multi-story hospital just 700m away.
  2. A lot of noise – the antenna “noise” temperature was equivalent to a noise figure of 20dB. I presume this is urban EMI. That really sucks, as it’s a big hit to my link budget. At the input to the RTL-SDR at full gain we have a 6dB noise figure. Added to that LNA noise is now the urban EMI that is 20-6 = 14dB higher. So the urban EMI will dominate, meaning roughly 14dB off my link budget, the equivalent of dropping my MDS from -132 to -118dBm.

I hooked up my antenna to my RTL-SDR and fired up gqrx to get a feel for what the RTL-SDR was “seeing”. Sure enough at full gain it was suffering a lot of overload problems when that pager fired up. I could actually listen to a local AM radio station quite clearly, that was being mixed up to VHF by the overload distortion. I could also see the level of a beacon being shifted down when the pager transmitted, a sign of receiver desensitisation. So I backed off the RTL-SDR gain to “-g 40” for T1 and T2. Based on my earlier measurements this means a RTL-SDR noise figure of 11dB. Note that at the T1 end (installed at my home), the urban EMI noise from the antenna (20dB) dominates, so the receiver noise figure doesn’t matter much. We would expect a MDS of around -118dBm.

So I hopped in my car and went mobile again. The system was breathing a little easier now. I parked at a few places around my suburb and by moving the car back and forth a few metres could often find a place where multipath worked for me and establish a link. However at these power levels I really need a line of sight path, and where I live is quite flat. So I drove across town to the hills that overlook Adelaide. I started the system “pinging” from my car and sure enough I found a few spots that have a clear view over the plain and the link came up at a range of 15km!

Now 15km at 144MHz is a very neat 100dB attenuation over a line of sight path (assuming 0dB gain antennas). So we would expect +10dBm-100dB = -90dBm at the receiver. Here is the log from my ping service:
1634971678 Rx Frame Rxloc: -116.08 Noloc: -157.56 snrloc: 11.48 Rxrem: -115.22 Norem: -153.21 snrrem: 7.99
1634971699 Rx Frame Rxloc: -112.90 Noloc: -157.50 snrloc: 14.60 Rxrem: -115.14 Norem: -153.46 snrrem: 8.32
1634971706 Rx Frame Rxloc: -114.00 Noloc: -157.30 snrloc: 13.29 Rxrem: -111.90 Norem: -153.30 snrrem: 11.40
1634971720 Rx Frame Rxloc: -120.02 Noloc: -157.54 snrloc: 7.52 Rxrem: -115.07 Norem: -153.36 snrrem: 8.29

Rxloc and Noloc are the local Rx and No (noise density) power estimates (the T1 terminal in my car). Rxrem and Norem are the Rx and No power estimates at the remote T2 terminal (at my house), 15 km away. The Rx power levels are nice and symmetrical. You can see the noise levels are a bit higher at my house (Norem), due to all that urban EMI. This makes the remote SNR lower. The MDS of the system was estimated at -118dBm, a good match to our Rx power.

I didn’t have a perfect line of sight path, so instead of the expected -90dBm, it was about -115dBm, which means an extra 25dB of loss somewhere. However the link budget stacks up, and we have a working link over the top of a noisy urban RF environment. Yayyyyyy!
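
As a quick sanity check of that free-space figure, here is a short Python calculation (a rough sketch assuming 0dBi antennas and no obstructions, not part of the project software):

from math import log10

def fspl_db(d_km, f_mhz):
    # Free space path loss: 32.44 + 20*log10(d in km) + 20*log10(f in MHz)
    return 32.44 + 20 * log10(d_km) + 20 * log10(f_mhz)

tx_dbm = 10.0                 # 10mW transmitter
loss = fspl_db(15, 144)       # about 99dB for a 15km path at 144MHz
print(f"path loss {loss:.1f} dB, expected Rx {tx_dbm - loss:.1f} dBm")
# Prints roughly -89dBm; the measured -115dBm implies about 25dB of extra
# (non line of sight) loss, as noted above.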

This is not a practical system yet, there are still quite a few building blocks to stitch together, and some more code (like a better acquisition system) to be written. However it’s clear the concept works, we really can send data over meaningful distances with simple hardware and open source software. A 2V red LED with 5mA of current consumes 10mW – the same Tx power as this system.

I’m really enjoying the sheer breadth of technologies in this project (propagation, antennas, EMI, shielding, RF hardware, link budget and strong signal analysis, modems, software decimators, power estimation maths, service scripts, Raspberry Pi). It’s fun being in control of and actually building so many layers in the system. A lot of moving pieces, a chance to learn many skills, and a lot that can go wrong too! Makes you appreciate what is inside commercial radios and chipsets, that we take for granted when we press the PTT button, or send an IP packet on our phones.

Next steps:

  1. Rework the acquisition system so it can handle a strong signal immediately followed by a weak signal.
  2. A semi-permanent installation that runs for a month to gather long term stats and make sure nothing breaks in real world operation.
  3. A better link that can sustain 100 kbit/s. We need about 20dB more link budget for that. A directional antenna would be useful to try. My current antennas collect EMI power from all directions. A directional antenna would suppress EMI, and reduce the power from local strong signals, allowing me to bump up the RTL-SDR gain and enjoy a lower noise figure. A directional antenna would also increase the wanted signal receive power, further improving the link margin.

Reading Further

[1] Open IP over VHF/UHF Part 1 Part 2 Part 3 Part 4
[2] GitHub repo for this project with build scripts, a project plan

,

Jan Schmidt2.5 years of Oculus Rift

Once again time has passed, and another update on Oculus Rift support feels due! As always, it feels like I’ve been busy with work and not found enough time for Rift CV1 hacking. Nevertheless, looking back over the history since I last wrote, there’s quite a lot to tell!

In general, the controller tracking is now really good most of the time. Like, wildly-swing-your-arms-and-not-lose-track levels (most of the time). The problems I’m hunting now are intermittent and hard to identify in the moment while using the headset – hence my enthusiasm over the last updates for implementing stream recording and a simulation setup. I’ll get back to that.

Outlier Detection

Since I last wrote, the tracking improvements have mostly come from identifying and rejecting incorrect measurements. That is, if I have 2 sensors active and 1 sensor says the left controller is in one place, but the 2nd sensor says it’s somewhere else, we’ll reject one of those – choosing the pose that best matches what we already know about the controller: the last known position, the gravity direction the IMU is detecting, and the last known orientation. The tracker will now also reject observations for a time if (for example) the reported orientation is outside the range we expect. The IMU gyroscope can track the orientation of a device for quite a while, so can be relied on to identify strong pose priors once we’ve integrated a few camera observations to get the yaw correct.
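
As a toy illustration of that gating idea (not the actual OpenHMD code, and with made-up thresholds), the check boils down to something like this in Python:

import numpy as np

POS_GATE_M = 0.15          # made-up position threshold, metres
ANGLE_GATE_RAD = 0.35      # made-up orientation threshold, radians

def quat_angle(q1, q2):
    """Smallest rotation angle between two unit quaternions."""
    return 2.0 * np.arccos(np.clip(abs(np.dot(q1, q2)), -1.0, 1.0))

def accept_observation(obs_pos, obs_quat, prior_pos, prior_quat):
    """Accept a camera pose only if it agrees with the filter's prior."""
    if np.linalg.norm(obs_pos - prior_pos) > POS_GATE_M:
        return False
    if quat_angle(obs_quat, prior_quat) > ANGLE_GATE_RAD:
        return False
    return True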

It works really well, but I think improving this area is still where most future refinements will come. That and avoiding incorrect pose extractions in the first place.

Plot of headset tracking – orientation and position

The above plot is a sample of headset tracking, showing the extracted poses from the computer vision vs the pose priors / tracking from the Kalman filter. As you can see, there are excursions in both position and orientation detected from the video, but these are largely ignored by the filter, producing a steadier result.

Left Touch controller tracking – orientation and position

This plot shows the left controller being tracked during a Beat Saber session. The controller tracking plot is quite different, because controllers move a lot more than the headset, and have fewer LEDs to track against. There are larger gaps here in the timeline while the vision re-acquires the device – and in those gaps you can see the Kalman filter interpolating using IMU input only (sometimes well, sometimes less so).

Improved Pose Priors

Another nice improvement is a change to the way the search for a tracked device is made in a video frame. Before it starts looking for a particular device, it now always gets the latest estimate of the previous device position from the fusion filter. Previously, it would use the estimate of the device pose as it was when the camera exposure happened – but between then and the moment we start analysis, more IMU observations and other camera observations might arrive and be integrated into the filter, which will have updated the estimate of where the device was in the frame.

This is the bit where I think the Kalman filter is particularly clever: Estimates of the device position at an earlier or later exposure can improve and refine the filter’s estimate of where the device was when the camera captured the frame we’re currently analysing! So clever. That mechanism (lagged state tracking) is what allows the filter to integrate past tracking observations once the analysis is done – so even if the video frame search takes 150ms (for example), it will correct the filter’s estimate of where the device was 150ms in the past, which ripples through and corrects the estimate of where the device is now.

LED visibility model

To improve the identification of devices, I measured the actual angle from which LEDs are visible (about 75 degrees off axis) and measured their size. The pose matching now has a better idea of which LEDs should be visible for a proposed orientation and what pixel size we expect them to have at a particular distance.

Better Smoothing

I fixed a bug in the output pose smoothing filter where it would glitch as you turned completely around and crossed the point where the angle jumps from +pi to -pi or vice versa.
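
The usual fix for that class of bug is to wrap angle differences back into the [-pi, pi) range before smoothing them. A minimal Python sketch of the trick (not the actual OpenHMD code):

from math import fmod, pi

def wrapped_delta(a, b):
    """Smallest signed difference a - b, wrapped into [-pi, pi)."""
    d = fmod(a - b + pi, 2 * pi)
    if d < 0:
        d += 2 * pi
    return d - pi

Smoothing the wrapped delta and adding it back onto the previous output avoids the jump at the +pi/-pi crossover.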

Improved Display Distortion Correction

I got a wide-angle hi-res webcam and took photos of a checkerboard pattern through the lens of my headset, then used OpenCV and panotools to calculate new distortion and chromatic aberration parameters for the display. For me, this has greatly improved things. I’m waiting to hear if that’s true for everyone, or if I’ve just fixed it for my headset.

Persistent Config Cache

Config blocks! A long time ago, I prototyped code to create a persistent OpenHMD configuration file store in ~/.config/openhmd. The rift-kalman-filter branch now uses that to store the configuration blocks that it reads from the controllers. The first time a controller is seen, it will load the JSON calibration block as before, but it will now store it in that directory – removing a multiple second radio read process on every subsequent startup.

Persistent Room Configuration

To go along with that, I have an experimental rift-room-config branch that creates a rift-room-config.json file and stores the camera positions after the first startup. I haven’t pushed that to the rift-kalman-filter branch yet, because I’m a bit worried it’ll cause surprising problems for people. If the initial estimate of the headset pose is wrong, the code will back-project the wrong positions for the cameras, which will get written to the file and cause every subsequent run of OpenHMD to generate bad tracking until the file is removed. The goal is to have a loop that monitors whether the camera positions seem stable based on the tracking reports, and to use averaging and resetting to correct them if not – or at least to warn the user that they should re-run some (non-existent) setup utility.

Video Capture + Processing

The final big ticket item was a rewrite of how the USB video frame capture thread collects pixels and passes them to the analysis threads. This now does less work in the USB thread, so misses fewer frames, and also I made it so that every frame is now searched for LEDs and blob identities tracked with motion vectors, even when no further analysis will be done on that frame. That means that when we’re running late, it better preserves LED blob identities until the analysis threads can catch up – increasing the chances of having known LEDs to directly find device positions and avoid searching. This rewrite also opened up a path to easily support JPEG decode – which is needed to support Rift Sensors connected on USB 2.0 ports.

Session Simulator

I mentioned the recording simulator continues to progress. Since the tracking problems are now getting really tricky to figure out, this tool is becoming increasingly important. So far, I have code in OpenHMD to record all video and tracking data to a .mkv file. Then, there’s a simulator tool that loads those recordings. Currently it is capable of extracting the data back out of the recording, parsing the JSON and decoding the video, and presenting it to a partially implemented simulator that then runs the same blob analysis and tracking OpenHMD does. The end goal is a Godot based visualiser for this simulation, and to be able to step back and forth through time examining what happened at critical moments so I can improve the tracking for those situations.

To make recordings, there’s the rift-debug-gstreamer-record branch of OpenHMD. If you have GStreamer and the right plugins (gst-plugins-good) installed, and you set env vars like this, each run of OpenHMD will generate a recording in the target directory (make sure the target dir exists):

export OHMD_TRACE_DIR=/home/user/openhmd-traces/
export OHMD_FULL_RECORDING=1

Up Next

The next things that are calling to me are to improve the room configuration estimation and storage as mentioned above – to detect when the poses a camera is reporting don’t make sense because it’s been bumped or moved.

I’d also like to add back in tracking of the LEDs on the back of the headset headband, to support 360 tracking. I disabled those because they cause me trouble – the headband is adjustable relative to the headset, so the LEDs don’t appear where the 3D model says they should be and that causes jitter and pose mismatches. They need special handling.

One last thing I’m finding exciting is a new person taking an interest in Rift S and starting to look at inside-out tracking for that. That’s just happened in the last few days, so not much to report yet – but I’ll be happy to have someone looking at that while I’m still busy over here in CV1 land!

As always, if you have any questions, comments or testing feedback – hit me up at thaytan@noraisin.net or on @thaytan Twitter/IRC.

Thank you to the kind people signed up as Github Sponsors for this project!

,

Lev LafayetteStreamlined Workflow from Instrument to HPC

The complexity of many contemporary scientific workflows is well-known, both in the laboratory setting and the computational processes. One discipline where this is particularly true is biochemistry, and in 2017 the Nobel Prize in Chemistry was awarded for the development of cryo-electron microscopy (cryo-EM). This allows researchers to "freeze" biomolecules in mid-movement and visualize three-dimensional structures of them, aiding in understanding their function and interaction which is, of course, essential in drug discovery pipelines. However, cryo-EM unsurprisingly produces vast quantities of data which, when combined with the storage capabilities and processing capabilities available from High Performance Computing simulations, produces detailed 3D models of biological structures at sub-cellular and molecular scales.

Optimising the cryo-EM workflow is a significant challenge, from image acquisition with transmission electron microscopes and direct electron detectors, through to the preprocessing tasks of motion correction, particle picking and extraction, CTF estimation, then image classification and curation, image sharpening and refinement, and finally structure modelling. On the computational side, the right selection and balance of storage, network, GPU-enabled and optimised software is requisite.

Following previous presentations at eResearchAustralasia that have mapped the innovations of the University of Melbourne's HPC system, Spartan, an exploration is provided here on how a combination of Spectrum Scale storage, a significant LIEF-funded GPU partition, and the use of cryoSPARC contributes to rapid solutions and workflow simplification for cryo-EM structures, including SARS-CoV-2. Optimising the cryo-EM workflow is a significant challenge, from image acquisition, through to the preprocessing tasks of motion correction, particle picking and extraction, Contrast Transfer Function estimation, image classification and curation, sharpening and refinement, and finally structure modelling. On the computational side, there is the right selection of storage, network, GPU-enabled and optimised software. This short presentation will outline these steps and choices in a manner that is useful for other institutions.

  • Streamlined Workflow from Instrument to HPC. Presentation to eResearchAustralasia October 21, 2021.
,

    Matt PalmerDiscovering AWS IAM accounts

    Let’s say you’re someone who happens to discover an AWS account number, and would like to take a stab at guessing what IAM users might be valid in that account. Tricky problem, right? Not with this One Weird Trick!

    In your own AWS account, create a KMS key and try to reference an ARN representing an IAM user in the other account as the principal. If the policy is accepted by PutKeyPolicy, then that IAM user exists, and if the error says “Policy contains a statement with one or more invalid principals” then the user doesn’t exist.

    As an example, say you want to guess at IAM users in AWS account 111111111111. Then make sure this statement is in your key policy:

    {
      "Sid": "Test existence of user",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:user/bob"
      },
      "Action": "kms:DescribeKey",
      "Resource": "*"
    }
    

    If that policy is accepted, then the account has an IAM user named bob. Otherwise, the user doesn’t exist. Scripting this is left as an exercise for the reader.
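
    As one possible take on that exercise, here is a rough boto3 sketch. It assumes you already have a test KMS key of your own (the key ARN and account numbers below are placeholders), keeps your own account as key administrator so the key stays manageable, and treats a rejected policy as "user does not exist". Treat it as a starting point rather than a finished tool.

    import json
    import boto3
    from botocore.exceptions import ClientError

    kms = boto3.client("kms")
    KEY_ID = "arn:aws:kms:us-east-1:222222222222:key/your-test-key-id"  # placeholder
    MY_ACCOUNT = "222222222222"                                         # placeholder

    def iam_user_exists(account_id, username):
        policy = {
            "Version": "2012-10-17",
            "Statement": [
                {
                    # Keep full control for our own account so the key
                    # remains manageable after every policy update.
                    "Sid": "Key admin",
                    "Effect": "Allow",
                    "Principal": {"AWS": f"arn:aws:iam::{MY_ACCOUNT}:root"},
                    "Action": "kms:*",
                    "Resource": "*",
                },
                {
                    "Sid": "Test existence of user",
                    "Effect": "Allow",
                    "Principal": {"AWS": f"arn:aws:iam::{account_id}:user/{username}"},
                    "Action": "kms:DescribeKey",
                    "Resource": "*",
                },
            ],
        }
        try:
            kms.put_key_policy(KeyId=KEY_ID, PolicyName="default",
                               Policy=json.dumps(policy))
            return True
        except ClientError as e:
            # Invalid principals come back as a malformed policy document.
            if e.response["Error"]["Code"] == "MalformedPolicyDocumentException":
                return False
            raise

    print(iam_user_exists("111111111111", "bob"))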

    Sadly, wildcards aren’t accepted in the username portion of the ARN, otherwise you could do some funky searching with ...:user/a*, ...:user/b*, etc. You can’t have everything; where would you put it all?

    I did mention this to AWS as an account enumeration risk. They’re of the opinion that it’s a good thing you can know what users exist in random other AWS accounts. I guess that means this is a technique you can put in your toolbox safe in the knowledge it’ll work forever.

    Given this is intended behaviour, I assume you don’t need to use a key policy for this, but that’s where I stumbled over it. Also, you can probably use it to enumerate roles and anything else that can be a principal, but since I don’t see as much use for that, I didn’t bother exploring it.

    There you are, then. If you ever need to guess at IAM users in another AWS account, now you can!

    ,

    Linux AustraliaCouncil Meeting 29th September 2021 – Minutes

    1. Meeting overview and key information

    Present

    • Joel Addison (Vice President)
    • Clinton Roy (Secretary)
    • Jonathan Woite (Council)
    • Russell Coker (Council)
    • Russell Stuart (Treasurer)
    • Sae Ra Germaine (President)

    Apologies

    • Neil Cox (Council)

    Meeting opened at 19:30 AEDT by Sae Ra  and quorum was achieved.

    Minutes taken by Clinton Roy

    2. Log of correspondence

     

    3. Items for discussion

    • Defamation law? Clinton thinks we should be discussing it at least. Sae Ra has done some research. Sae Ra thinks we should update the email list policy such that we retain the right to remove any mails against our policy and widen it such that the policy can be applied across all social media. Clinton suggests we just close down comments on everything till the legal state is clearer. Sae Ra thinks we can police all of our social media platforms. TODO Jonathan to look at widening the email list policy to cover all social media.
    • Steve speaking for the admin team. Not a lot to report.
    • Owen speaking for the Drupal team. Another event coming up, a twin event to the previous one. Tickets for the August event are valid for the November event. Call for speakers is done, announcing the program first week of October. Will push registrations again, might pick up a few more. Picked up one more sponsor. Budget is balanced already, so no red flags or anything. Good feedback on first one, have adjusted the format based on that. The attendees liked the interactive sessions, but wanted slightly less. A shuffle is happening in the committee, David Sparks stepping into the chair role, go through the formal process in October, Owen will shift to an event focus for the remainder of his term. They have booked Wellington for this year, pushed back to 2022, the venue is happy to release them from the contract. Sae Ra asks for some media for the LCA report.
    • Sae Ra speaking on LCA2022: Session selection coming up on Oct 8.

    AGM working bee on Sunday the third, Sae Ra to send out a calendar invite.

    4. Items for noting

    5. Other business

    The post Council Meeting 29th September 2021 – Minutes appeared first on Linux Australia.

    ,

    Lev LafayetteIn The Long Run We Are All Dead

    "In The Long Run We Are All Dead: A Preliminary Sketch of Demographic, Economic, and Technological Trends for the University Sector"

    University participation and costs are both rising. What evidence is there that the public is receiving good value? Using demographic data, trends, and analysis from Australia, and considering contemporary developments in information and communications technology, a preliminary assessment is made for the economic and political future of the public university sector.

    Although the proportion of the population completing university education is increasing, so too is the per-student cost. Due to the cost-disease of the service sector, this cost is increasing at a greater rate than in other sectors. Public funding in such a situation would normally be subject to strong political pressure; however, this is mitigated by claims of positive externalities from the sector.

    Recent econometric research breaks this down on a disciplinary level, calculating social and personal benefits with precision. Further, contemporary technologies challenge the traditional "iron triangle" of vectors of quality, cost, and access. This breaks down the traditional cost-disease model, allowing for the triangle to be expanded across the vectors, and potentially freeing up educator resources for more creative contributions. Of course, this assumes that creativity is not a process that can emerge through automation. Until that storm arrives, however, we ought to take advantage of favourable currents.

    Presentation to the University of Otago Higher Education Development Centre (HEDC) Symposium 2021

    ,

    David RoweSox play Desktop Application for Ubuntu File Manager

    Every time I install Ubuntu I get stuck on this one. When browsing files on Ubuntu, I want to be able to click on a wave file icon and have it play with sox “play”. I don’t want it to launch some heavyweight audio editor or MP3 player. Just play my 3 second file without any fuss.

    The problem is I only install Ubuntu every few years, so I forget how I did it last time. A side effect of Linux being so reliable is that you only rarely need these obscure setup steps. So I’m documenting the steps here so Google can augment my memory.

    Place this code in ~/.local/share/applications/play.desktop

    [Desktop Entry]
    Name=play
    Comment=play wave files
    Exec=/usr/bin/play %F
    Terminal=false
    Type=Application
    Categories=Audio
    MimeType=audio/x-wav

    Then in the file manager, select a wave file, right click Open With Another Application and “play” should be one of the selections:

    Glen TurnerThe tyranny of product names

    For a long time computer manufacturers have tried to differentiate themselves and their products from their competitors with fancy names with odd capitalisation and spelling. But as an author, using these names does a disservice to the reader: how are they to know that DEC is pronounced as if it was written Dec ("deck")?

    It's time we pushed back, and wrote for our readers, not for corporations.

    It's time to use standard English rules for these Corporate Fancy Names. Proper names begin with a capital, unlike "ciscoSystems®" (so bad that Cisco itself moved away from it). Words are separated by spaces, so "Cisco Systems". Abbreviations and acronyms are written in lower case if they are pronounced as a word, in upper case if each letter is pronounced: so "ram" and "IBM®".

    So from here on in I'll be using the following:

    • Face Book. Formerly, "Facebook®".
    • Junos. Formerly JUNOS®.
    • ram. Formerly RAM.
    • Pan OS. Formerly PAN-OS®.
    • Unix. Formerly UNIX®.

    I'd encourage you to try this in your own writing. It does look odd for the first time, but the result is undeniably more readable. If we are not writing to be understood by our audience then we are nothing more than an unpaid member of some corporation's marketing team.




    ,

    David RoweVQ Index Optimisation

    Following on from the subset vector quantiser [1] I’ve been working on improving speech quality in the presence of bit errors without the use of Forward Error Correction (FEC). I’ve tried two techniques:

    1. Optimisation of VQ indexes [2][3]
    2. Trellis based maximum likelihood decoding of a sequence of VQ indexes [4]

    Digital speech and FEC isn’t a great mix. Speech codecs often work fine with a few percent BER, which is unheard of for data applications that need zero bit errors. This is because humans can extract intelligible information from payload speech data with errors. On the channels of interest we like to play at low SNRs and high BERs (around 10%). However not many FEC codes work at 10% BER, and the ones that do require large block sizes, introducing latency which is problematic for Push To Talk (PTT) speech. So I’m interested in exploring alternatives to FEC that allow gradual degradation of speech in channels with high bit error rates.

    Vector Quantisation

    Here is a plot that shows a 2 dimensional Vector Quantiser (VQ) in action. The cloud of small dots is the source data we wish to encode. Each dot represents 2 source data samples, plotted for convenience on a 2D plot. The circles are the VQ entries, which are trained to approximate the source data. We arrange the VQ entries in a table, each with a unique index. To encode a source data pair, we find the nearest VQ pair, and send the index of that VQ entry over the channel. At the decoder, we reconstruct the pair using a simple table look up.

    When we get a bit error in that index, we tend to jump to some random VQ entry, which can introduce a large error in the decoded pair. Index optimisation re-arranges the VQ indexes so a bit error in the received index will result in jumping to a nearby VQ entry, minimising the effect of the error. This is shown by the straight lines in the plot above. Each line shows the decoded VQ value for a single bit error. They aren’t super close (the nearest neighbours), as I guess it’s hard to simultaneously optimise all VQ entries for all single bit errors.

    For my experiments I used a 16th order VQ, so instead of two pairs there were 16 samples encoded for each VQ entry. These 16 samples represent the speech spectrum. It’s hard to plot a 16 dimensional value but the same ideas apply. We can optimise the VQ indexes so that a single bit error will lead to a decoded value “close” to the desired value. This gives us some robustness to bit errors for free. Unlike Forward Error Correction, no additional bits need to be sent. Also unlike FEC – the errors aren’t corrected – just masked to a certain extent (the decoded speech sounds a bit better).
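
    To make the idea concrete, here is a toy numpy sketch of the quantity being traded off: the distance in VQ space you land at after a single bit error in the index. Index optimisation shuffles the table to make this average distance small. This is just an illustration with a random codebook, not the binary switch algorithm from [3].

    import numpy as np

    rng = np.random.default_rng(0)
    bits = 4                                  # 16-entry toy codebook
    codebook = rng.normal(size=(2 ** bits, 2))

    def single_bit_error_distances(codebook, index, bits):
        """Distance from entry `index` to each entry reachable by one bit error."""
        return [np.linalg.norm(codebook[index] - codebook[index ^ (1 << b)])
                for b in range(bits)]

    # Average over all indexes: a good index assignment makes this small.
    avg = np.mean([np.mean(single_bit_error_distances(codebook, i, bits))
                   for i in range(2 ** bits)])
    print(f"mean single bit error distortion: {avg:.2f}")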

    The Trellis system is described in detail in [4]. This looks at a sequence of received VQ indexes, and the trajectory they take on the figure above. It makes the assumption that the speech signal changes fairly slowly in time, so the trajectory we trace on the figure above tends to be small jumps across the VQ “space”. A large jump means a possible bit error. Simultaneously, we look at the likelihood of receiving each vector index. In a noisy channel, we are more sure of some bits, and less sure of others. These are all arranged on a 2D “trellis” which we search to find the most likely path. This tends to work quite well when the vectors are sampled at a high rate (10 or 20ms), less so when we sample them less often (say 40ms).

    The details of the two algorithms tested are in the GitHub PRs [5][6].

    Results

    This plot compares a few methods. The x axis is normalised SNR, and the y-axis spectral distortion. Results were an average of 30 seconds of speech on an AWGN channel.

    1. No errors (blue, bottom), is the basic VQ quantiser with no channel errors. Compared to the input samples, it has an average distortion of 3dB. That gets you rough but usable “communications quality” speech.
    2. Vanilla AWGN (green) is the spectral distortion as we lower the Eb/No (SNR), and bit errors gradually creep in. FreeDV 700C [8] uses no FEC so would respond like this to bit errors.
    3. The red curve is similar to FreeDV 700D/E – we are using a rate 0.5 LDPC code to protect the speech data. This works well until it doesn’t – you hit a threshold in SNR and the code falls over, introducing more errors than it corrects.
    4. The upper blue curve is the index optimised VQ (using the binary switch algorithm). This works pretty well (compare to green) and zero cost – we’ve just shuffled a few indexes in the VQ table.
    5. Black is when we combine the FEC with index optimisation. Even better at high Eb/No, and the “knee” where the FEC falls over is much less obvious than “red”.
    6. Cyan is the Trellis decoder, quite a good result at low Eb/No, but a “long tail” – it makes a few mistakes even at high Eb/No.

    Here are some speech samples showing the index optimisation and trellis routines in action. They were generated at an Eb/No = 1dB (6% BER) operating point on the plot above. The Codec 2 700C mode is provided as a control. In these tests of spectral distortion the 700C mode uses 22 bits/frame at a 40ms frame rate, the “600” mode just 12 bits/frame at a 30ms frame rate. I’ve just applied the index optimisation and trellis decoding to the 600 mode.

    Mode  BER   VQ    Decoder  Sample1  Sample2  Sample3
    700C  0.00  Orig  Normal   Listen   Listen   Listen
    700C  0.06  Orig  Normal   Listen   Listen   Listen
    600   0.00  Orig  Normal   Listen   Listen   Listen
    600   0.06  Orig  Normal   Listen   Listen   Listen
    600   0.06  Opt   Normal   Listen   Listen   Listen
    600   0.06  Opt   Trellis  Listen   Listen   Listen

    The index optimisation seems effective, especially on samples 1 and 2. The improvements are less noticeable on the longer sample3, although the longer sample makes it harder to do a quick A/B test. The Trellis scheme is even better at reducing the pops and clicks, but I feel on sample1 at least it tends to “smooth” the speech, so it becomes a little less intelligible.

    Discussion

    In this experiment I compared the spectral distortion of two non-redundant techniques to FEC based protection on a Spectral Distortion versus Eb/No scale.

    While experimenting with this work I found an interesting trade off between update rate and error protection. With a higher update rate, we notice errors less. Unfortunately this increased the bit rate too.

    The non-FEC techniques have a gradual “fuzzy” degradation versus a knee. This is quite useful for digital speech systems, e.g. at the bottom of a fade we might get “readability 3” speech, that bounces back up to “readability 5” after the fade. The ear will put it all back together using “Brain FEC”. With FEC based schemes you get readability 5 – R2D2 noises in the fade – then readability 5.

    So non-FEC schemes have some potential to lower the “minimum SNR” the voice link can handle.

    It’s clear that index optimisation does help intelligibility, with or without FEC.

    At low Eb/No, the PER is 50%! So every 2nd 12-bit vector index has at least 1 bit error, and yet we are getting (readability 3 – readable with difficulty) speech. However the rule of thumb I have developed experimentally still applies – you still need PER=0.1/BER=0.01 for “readability 5” speech.

    There are some use cases where not using FEC might be useful. A rate 0.5 FEC system requires twice the RF bandwidth, and modem synchronisation is harder as each symbol has half the energy. It introduces latency as the FEC codewords for a decent code are larger than the vocoder frame size. When you lose a FEC codeword, you tend to lose a large chunk of speech. Frame sync is slower, as it happens at the FEC codeword rate, making recovery after a deep fade or PTT sync slower. So having a non-FEC alternative in our toolkit for low SNR digital speech is useful.

    On a personal note I quite enjoyed this project. It was tractable, and I managed to get results in a reasonable amount of time without falling down too many R&D rabbit holes. It was also fun and rewarding to come to grips with at least some of the math in [3]. I am quite pleased that (after the usual fight with the concept of coding gain) I managed to reconcile FEC and non-FEC results on a single plot that roughly compares to the perceptual quality of speech.

    Ideas for further work:

    1. Test across a larger number of samples to get a better feel for the effectiveness of these algorithms.
    2. The index optimisation could be applied to Codec 2 700C (and hence FreeDV 700C/D/E). This would however break compatibility.
    3. More work with the trellis scheme might be useful. In a general sense, this is a model that takes into account various probabilities, e.g. how likely is it that we received a certain codeword? We could also include source probability information – e.g. for a certain speaker (or globally across all speakers) some vectors will be more likely than others. The probability tables could be updated in real time, as when the channel is not faded, we can trust that each index is probably correct.
    4. The “600” mode above is a prototype Codec 2 mode based on this work and [1]. We could develop that into a real world Codec 2/FreeDV mode and see how it goes over the air.
    5. The new crop of Neural Net vocoders use the same parameter set and VQ, so index optimisation/FEC trade offs may also be useful there (they may do this already). For example we could run FreeDV 2020 without any FEC, freeing up RF bandwidth for more pilot symbols so it handles HF channels better.

    Reading Further

    [1] Subset Vector Quantiser
    [2] Codec 2 at 450 bit/s
    [3] Pseudo-Gray coding, K. Zeger; A. Gersho, 1990
    [4] Trellis Decoding for Codec 2
    [5] PR – Vector Quantiser Index Optimisation
    [6] PR – Trellis decoding of VQ
    [7] RST Scale
    [8] FreeDV Technology

    ,

    Dave HallYour Terraform Module Needs an Opinion

    Learn why your Terraform modules should be opinionated.

    ,

    Chris NeugebauerTalk Notes: On The Use and Misuse of Decorators

    I gave the talk On The Use and Misuse of Decorators as part of PyConline AU 2021, the second in an annoyingly long sequence of not-in-person PyCon AU events. Here’s some code samples that you might be interested in:

    Simple @property implementation

    This shows a demo of @property-style getters. Setters are left as an exercise :)

    
    def demo_property(f):
        f.is_a_property = True
        return f
    
    
    class HasProperties:
    
        def __getattribute__(self, name):
            ret = super().__getattribute__(name)
            if hasattr(ret, "is_a_property"):
                return ret()
            else:
                return ret
    
    class Demo(HasProperties):
    
        @demo_property
        def is_a_property(self):
            return "I'm a property"
    
        def is_a_function(self):
            return "I'm a function"
    
    
    a = Demo()
    print(a.is_a_function())
    print(a.is_a_property)
    

    @run (The Scoped Block)

    @run is a decorator that will run the body of the decorated function, and then store the result of that function in place of the function’s name. It makes it easier to assign the results of complex statements to a variable, and get the advantages of functions having less leaky scopes than if or loop blocks.

    def run(f):
        return f()
    
    @run
    def hello_world():
        return "Hello, World!"
    
    print(hello_world)
    

    @apply (Multi-line stream transformers)

    def apply(transformer, iterable_):
    
        def _applicator(f):
    
            return(transformer(f, iterable_))
    
        return _applicator
    
    @apply(map, range(100))
    def fizzbuzzed(i):
        if i % 3 == 0 and i % 5 == 0:
            return "fizzbuzz"
        if i % 3 == 0:
            return "fizz"
        elif i % 5 == 0:
            return "buzz"
        else:
            return str(i)
    
    

    Builders

    
    def html(f):
        builder = HtmlNodeBuilder("html")
        f(builder)
        return builder.build()
    
    
    class HtmlNodeBuilder:
        def __init__(self, tag_name):
            self.tag_name = tag_name
            self.nodes = []

        def node(self, f):
            builder = HtmlNodeBuilder(f.__name__)
            f(builder)
            self.nodes.append(builder.build())

        def text(self, text):
            self.nodes.append(text)

        def build(self):
            nodes = "\n".join(self.nodes)
            return f"<{self.tag_name}>\n{nodes}\n</{self.tag_name}>"


    @html
    def document(b):
        @b.node
        def head(b):
            @b.node
            def title(b):
                b.text("Hello, World!")

        @b.node
        def body(b):
            for i in range(10, 0, -1):
                @b.node
                def p(b):
                    b.text(f"{i}")
    
    

    Code Registries

    This is an incomplete implementation of a code registry for handling simple text processing tasks:

    def register(self, input, output):

        def _register_code(f):
            self.registry[(input, output)] = f
            return f

        return _register_code


    in_type = (iterable[str], (WILDCARD, ))
    out_type = (Counter, (WILDCARD, frequency))

    @registry.register(in_type, out_type)
    def count_strings(strings):
        return Counter(strings)


    @registry.register(
        (iterable[str], (WILDCARD, )),
        (iterable[str], (WILDCARD, lowercase))
    )
    def words_to_lowercase(words): …


    @registry.register(
        (iterable[str], (WILDCARD, )),
        (iterable[str], (WILDCARD, no_punctuation))
    )
    def words_without_punctuation(words): …


    def find_steps(
        self, input_type, input_attrs, output_type, output_attrs
    ):
        hand_wave()


    def give_me(self, input, output_type, output_attrs):

        steps = self.find_steps(
            type(input), (), output_type, output_attrs
        )

        temp = input
        for step in steps:
            temp = step(temp)

        return temp

    ,

    Michael StillLinux bridges have their MTU overwritten when you add an interface

    I discovered last night that network bridges on linux have their Maximum Transmission Unit (MTU) overwritten by whatever is the MTU value of the most recent interface added to the bridge. This is bad. Very bad. Specifically this is bad because MTU matters for accurately describing the capabilities of the network path the packets will travel on, so it shouldn’t be clobbered willy nilly.

    Here’s an example of the behaviour:

    # ip link add egr-br-ens1f0 mtu 1500 type bridge
    # ip link show dev egr-br-ens1f0
    3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 7e:33:1b:30:d8:00 brd ff:ff:ff:ff:ff:ff
    # ip link add egr-eaa64a-o mtu 8950 type veth peer name egr-eaa64a-i
    # ip link show dev egr-br-ens1f0
    3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 7e:33:1b:30:d8:00 brd ff:ff:ff:ff:ff:ff
    # brctl addif egr-br-ens1f0 egr-eaa64a-o
    # ip link show dev egr-br-ens1f0
    3: egr-br-ens1f0: <BROADCAST,MULTICAST> mtu 8950 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether da:82:cf:34:13:60 brd ff:ff:ff:ff:ff:ff

    So you can see here that the bridge had an MTU of 1,500 bytes. We create a veth pair with an MTU of 8,950 bytes and add it to the bridge. Suddenly the bridge’s MTU is 8,950 bytes!

    Perhaps this is my fault — brctl is pretty old school. Let’s use only ip commands to configure the bridge.

    # ip link add mgr-br-ens1f0 mtu 1500 type bridge
    # ip link show dev mgr-br-ens1f0
    6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 82:d8:df:15:40:01 brd ff:ff:ff:ff:ff:ff
    # ip link add mgr-eaa64a-o mtu 8950 type veth peer name mgr-eaa64a-i
    # ip link show dev mgr-br-ens1f0
    6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 82:d8:df:15:40:01 brd ff:ff:ff:ff:ff:ff
    # ip link set mgr-eaa64a-o master mgr-br-ens1f0
    # ip link show dev mgr-br-ens1f0
    6: mgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 8950 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 22:55:4a:a8:19:00 brd ff:ff:ff:ff:ff:ff

    The same problem occurs. Luckily, you can specify the MTU when you add an interface to a bridge, like this:

    # ip link add zgr-br-ens1f0 mtu 1500 type bridge
    # ip link show dev zgr-br-ens1f0
    9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 7a:54:2c:04:5f:a8 brd ff:ff:ff:ff:ff:ff
    # ip link add zgr-eaa64a-o mtu 8950 type veth peer name zgr-eaa64a-i
    # ip link show dev zgr-br-ens1f0
    9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 7a:54:2c:04:5f:a8 brd ff:ff:ff:ff:ff:ff
    # ip link set zgr-eaa64a-o master zgr-br-ens1f0 mtu 1500
    # ip link show dev zgr-br-ens1f0
    9: zgr-br-ens1f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether ae:59:0b:a6:46:94 brd ff:ff:ff:ff:ff:ff

    And that works nicely. In my case, this ended up with me writing code to look up the MTU of the bridge I was adding the interface to, and then specifying that MTU back when adding the interface. I hope this helps someone else.
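
    A rough Python sketch of that approach, mirroring the ip command shown above (the function and interface names here are just examples, not the code I actually wrote):

    import subprocess
    from pathlib import Path

    def add_to_bridge(interface: str, bridge: str) -> None:
        # Read the bridge's current MTU from sysfs...
        mtu = Path(f"/sys/class/net/{bridge}/mtu").read_text().strip()
        # ...and pass it back explicitly when enslaving the interface,
        # so the bridge's MTU isn't clobbered by the new port.
        subprocess.run(
            ["ip", "link", "set", interface, "master", bridge, "mtu", mtu],
            check=True,
        )

    # add_to_bridge("zgr-eaa64a-o", "zgr-br-ens1f0")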

    ,

    Jan SchmidtOpenHMD update

    A while ago, I wrote a post about how to build and test my Oculus CV1 tracking code in SteamVR using the SteamVR-OpenHMD driver. I have updated those instructions and moved them to https://noraisin.net/diary/?page_id=1048 – so use those if you’d like to try things out.

    The pandemic continues to sap my time for OpenHMD improvements. Since my last post, I have been working on various refinements. The biggest visible improvements are:

    • Adding velocity and acceleration API to OpenHMD.
    • Rewriting the pose transformation code that maps from the IMU-centric tracking space to the device pose needed by SteamVR / apps.

    Adding velocity and acceleration reporting is needed in VR apps that support throwing things. It means that throwing objects and using gravity-grab to fetch objects works in Half-Life: Alyx, making it playable now.

    The rewrite to the pose transformation code fixed problems where the rotation of controller models in VR didn’t match the rotation applied in the real world. Controllers would appear attached to the wrong part of the hand, and rotate around the wrong axis. Movements feel more natural now.

    Ongoing work – record and replay

    My focus going forward is on fixing glitches that are caused by tracking losses or outliers. Those problems happen when the computer vision code either fails to match what the cameras see to the device LED models, or when it matches incorrectly.

    Tracking failure leads to the headset view or controllers ‘flying away’ suddenly. Incorrect matching leads to controllers jumping and jittering to the wrong pose, or swapping hands. Either condition is very annoying.

    Unfortunately, as the tracking has improved, the remaining problems have become harder to understand and there is less low-hanging fruit for improvement. Further, when the computer vision runs at 52Hz, it’s impossible to diagnose the reasons for a glitch in real time.

    I’ve built a branch of OpenHMD that uses GStreamer to record the CV1 camera video, plus IMU and tracking logs into a video file.

    To go with those recordings, I’ve been working on a replay and simulation tool that uses the Godot game engine to visualise the tracking session. The goal is to show, frame-by-frame, where OpenHMD thought the cameras, headset and controllers were at each point in the session, and to be able to step back and forth through the recording.

    Right now, I’m working on the simulation portion of the replay, that will use the tracking logs to recreate all the poses.

    ,

    David RoweControlled FreeDV Testing

    This post presents some results from a controlled experiment to test FreeDV against SSB over a range of real world HF channels.

    As described in [2] I take a 10 second speech sample and transmit it using SSB then FreeDV. By transmitting them more or less at the same time, we get to test them over the same channel. The peak power of the SSB and FreeDV signals is adjusted to be the same. The SSB is compressed [3] so that we are generating as much SSB talk power as possible.

    Over the past few weeks we’ve collected 1158 samples of FreeDV and SSB signals, using a network of KiwiSDRs and some scripts to automate the collection of off-air samples. Every 30 minutes my IC7200 would click and whir into life as the laptop connected to it transmitted a test signal in various FreeDV modes. Simultaneously, the signals would be received by a remote KiwiSDR and recorded as a wave file. This was then decoded to produce a side-by-side SSB versus FreeDV audio signal.

    Jose, LU5DKI, also collected some samples for me using his station. Jose is very experienced at using FreeDV with SDRs over international DX paths.

    I’ll present some of the more interesting samples here. If you open the spectrogram in a new tab you should be able to see a larger version. The spectrograms are like a waterfall but time flows left to right. The tests were conducted with FreeDV 700C/D/E [4].

    Serial Comment SSB & FreeDV Spectrogram
    0036 Lower limit of 700D, which can handle lower SNRs than other modes. SSB very difficult copy. Listen
    0105 FreeDV 700C with some co-channel SSB, results in a few errors in the decoded voice, SSB affected more. Listen
    0106 Fade at the start of SSB and 700D, however 700D takes a few seconds to sync up. Listen
    0119 Impulse noise, e.g. atmospherics, lightning, largely suppressed in decoded FreeDV. Listen
    0146 Slow fade, occasionally nulling out entire signal. Listen
    0500 700D falling over on a weak fast fading/NVIS channel, note the moth-eaten spectrogram. Listen
    0501 700E coping with the same NVIS channel as Serial 0500 – 700E is designed to handle fast fading. Listen
    0629 “Barber pole” frequency selective fading carving a notch out of the 700D signal, but no errors at 5dB SNR. SSB improves only slowly with increasing SNR – still quite noisy. Listen
    j01 DX 700D from Argentina to Rotorua, New Zealand. Both SSB and FreeDV pretty good. Listen
    j03 700D DX sample from Argentina to Ireland, SSB a weak copy but 700D not getting sync. Listen
    j03 700E As above but with 700E, which is decoding with some errors. 700E was designed for long distance paths. Listen

    Notes:

    1. The initial “hello” sounds buzzy as the microphone equaliser [5] is still kicking in.
    2. Most of these samples are low SNR as that’s an interesting area of operation for me. Also – the experiment didn’t collect many high SNR samples, perhaps due to the limitations of my station (50W into a simple dipole) and band conditions during the experiment. I would have liked to have collected some high SNR/fast fading examples.
    3. Unfortunately the output audio levels aren’t normalised, so the FreeDV part of the sample will sound louder. I didn’t apply any noise cancellation to the SSB samples, but if you have access to such software please feel free to download the samples and see what you can do.

    Credits

    Special thanks to Jose LU5DKI, Mooneer K6AQ, Peter VK3RV for their help in manually collecting samples and discussion around these experiments. Thanks also to the KiwiSDR community, a very useful resource for experimental radio.

    Links

    [1] Codec 2 HF Data Modes Part 3 Similar script based automated testing for Codec 2 HF data modes.
    [2] Automated Voice Testing Pull Request, containing more details of the experiment and test software.
    [3] FreeDV 700E and Compression, including a description of the SSB compressor used here.
    [4] Summary of various modes in FreeDV manual.
    [5] Codec 2 700C Equaliser Part 2

    ,

    Robert CollinsA moment of history

    I’ve been asked more than once what it was like at the beginning of Ubuntu, before it was a company, when an email from someone I’d never heard of came into my mailbox.

    We’re coming up on 20 years now since Ubuntu was founded, and I had cause to do some spelunking into IMAP archives recently… while there I took the opportunity to grab the very first email I received.

    The Ubuntu long shot succeeded wildly. Of course, we liked to joke about how spammy those emails were: cold-calling a raft of Debian developers with job offers, some of them were closer to phishing attacks :). This very early one – I was the second employee (though I started at 4 days a week to transition my clients gradually) – was less so.

    I think it’s interesting though to note how explicit a gamble this was framed as: a time-limited experiment, funded for a year. As the company scaled, this very rapidly became a hiring problem and the horizon had to be pushed out to 2 years to get folk to join.

    And of course, while we started with arch in earnest, we rapidly hit significant usability problems, some of which were solvable with porcelain and shallow non-architectural changes, and we initially built patches, and then the bazaar VCS project, to tackle those. But others were not: for instance, I recall exceeding the 32K hard link limit on ext3 due to a single long history during a VCS conversion. The sum of these challenges led us to create the bzr project, a ground-up rethink of our version control needs, architecture, implementation and user experience. While ultimately git has conquered all, bzr had – still has in fact – extremely loyal advocates, due to its laser-sharp focus on usability.

    Anyhow, here it is: one of the original no-name-here-yet, aka Ubuntu, introductory emails (with permission from Mark, of course). When I clicked through to the website Mark provided there was a link there to a fantastical website about a space tourist… not what I had expected to be reading in Adelaide during LCA 2004.


    From: Mark Shuttleworth <xxx@xxx>
    To: Robert Collins <xxx@xxx>
    Date: Thu, 15 Jan 2004, 04:30

    Tom Lord gave me your email address, I believe he’s
    already sent you the email that I sent him so I’m sure
    you have some background.

    In short, I am going to fund some open source
    development for a year. This is part of a new project
    that I will be getting off the ground in the coming
    weeks. I don’t know where it will lead, it’s flying in
    the face of a stiff breeze but I think at the end of
    the day it will at least fund a few very good open
    source developers for a full year to work on the
    projects they like most.

    One of the pieces of the puzzle is high end source
    code management. I’ll be looking to build an
    infrastructure that will manage source code for
    between 100 and 8000 open source projects (yes,
    there’s a big difference between the two, I don’t know
    at which end of the spectrum we will be at the end of
    the year but our infrastructure will have to at least
    be capable of scaling to the latter within two years)
    with upwards of 2000 developers, drawing code from a
    variety of sources, playing with it and spitting it
    out regularly in nice packages.

    Arch and Subversion seem to be the two leading
    contenders for “next generation open source sccm”. I’d
    be interested in your thoughts on the two of them, and
    how they stack up. I’m looking to hire one person who
    will lead that part of the effort. They’ll work alone
    from home, and be responsible for two things. First,
    extending the tool (arch or svn) in ways that help the
    project. Such extensions will be released under an
    open source licence, and hopefully embraced by the
    tools maintainers and included in the mainline code
    for the tool. And second, they will be responsible for
    our large-scale implementation of SCCM, using that
    tool, and building the management scripts and other
    infrastructure to support such a large, and hopefully
    highly automated, set of repositories.

    Would you be interested in this position? What
    attributes and experience do you think would make you
    a great person to have on the team? What would your
    salary expectation be, as a monthly figure, for a one
    year contract full time?

    I’m currently on your continent, well, just off it. On
    Lizard Island, up North. Am headed today for Brisbane,
    then on the 17th to Launceston via Melbourne. If you
    happen to be on any of those stops, would you be
    interested in meeting up to discuss it further?

    If you’re curious you can find out a bit more about me
    at www.markshuttleworth.com. This project is much
    lower key than some of what you’ll find there. It’s a
    very long shot indeed. But if at worst all that
    happens is a bunch of open source work gets funded at
    my expense I’ll feel it was money well spent.

    Cheers,
    Mark

    =====

    “Good judgement comes from experience, and often experience
    comes from bad judgement” – Rita Mae Brown


    ,

    Arjen LentzClassic McEliece and the NIST search for post-quantum crypto

    I have always liked cryptography, and public-key cryptography in particular. When Pretty Good Privacy (PGP) first came out in 1991, I not only started using it, but also looked at the documentation and the code to see how it worked. I created my own implementation in C using very small keys, just to better understand it.

    Cryptography has been running a race against both faster and cheaper computing power. And these days, with banking and most other aspects of our lives entirely relying on secure communications, it’s a very juicy target for bad actors.

    About 5 years ago, the US National Institute of Standards and Technology (NIST) initiated a search for cryptographic algorithms that should withstand a near-future world where quantum computers with a significant number of qubits are a reality. There have been a number of rounds; mid-2020 saw round 3 and the selection of the finalists.

    This submission caught my eye some time ago: Classic McEliece, and out of the four finalists it’s the only one that is not lattice-based [wikipedia link].

    For Public Key Encryption and Key Exchange Mechanism, Prof Bill Buchanan thinks that the winner will be lattice-based, but I am not convinced.

    Robert McEliece at his retirement in 2007

    Tiny side-track: you may wonder where the McEliece name comes from. From mathematician Robert McEliece (1942-2019). McEliece developed his cryptosystem in 1978, so it’s not just named after him, he designed it. For various reasons that have nothing to do with the mathematical solidity of the ideas, it didn’t get used at the time. He did plenty of other cool things, too. From his Caltech obituary:

    He made fundamental contributions to the theory and design of channel codes for communication systems—including the interplanetary telecommunication systems that were used by the Voyager, Galileo, Mars Pathfinder, Cassini, and Mars Exploration Rover missions.

    Back to lattices, there are both unknowns (aspects that have not been studied in exhaustive depth) and recent mathematical attacks, both of which create uncertainty – in the crypto sphere as well as for business and politics. Given how long it takes for crypto schemes to get widely adopted, the latter two are somewhat relevant, particularly since cyber security is a hot topic.

    Lattices are definitely interesting, but given what we know so far, it is my feeling that systems based on lattices are more likely to be proven breakable than Classic McEliece, which comes to this finalists’ table with a 40+ year track record of in-depth analysis. Mind that all finalists are of course solid at this stage – but NIST’s thoughts on expected developments and breakthroughs are what is likely to decide the winner. NIST are not looking for shiny; they are looking for very, very solid in all possible ways.

    Prof Buchanan recently published implementations for the finalists, and did some benchmarks where we can directly compare them against each other.

    We can see that Classic McEliece’s key generation is CPU intensive, but is that really a problem? The large size of its public key may be more of a disadvantage; however, I think the small ciphertext more than offsets that.

    As we’re nearing the end of the NIST process, in my opinion, fast encryption/decryption and small ciphertext, combined with the long track record of in-depth analysis, may still see Classic McEliece come out the winner.

    The post Classic McEliece and the NIST search for post-quantum crypto first appeared on Lentz family blog.

    ,

    Lev LafayetteInaugural John Lions Distinguished Lectures, University of New South Wales

    The importance of John Lions to computing history has spanned decades and continues to do so. In 1976 he published, through the University of New South Wales, the Lions' Commentary on UNIX 6th Edition, with Source Code. The book was both an explanation of the UNIX kernel and a teaching tool. The content was extraordinarily well written, explaining difficult concepts with remarkable coherence and engaging in an early form of instructional scaffolding. When AT&T released UNIX v7 they specifically prohibited "classroom use" of the content, leading to thousands of educators, engineers, and learners around the world photocopying the "Lions Book", making it the most illegally copied book in computer science history. It wasn't until 1996 that the Santa Cruz Operation, the new owners of UNIX, allowed its legal publication. Of course, Lions was an academic (associate professor) and was involved in the community as the founding president of the Australian UNIX Users' Group.

    With this in mind, the University of New South Wales organised an extraordinary one-day conference with some of the most impressive figures in global IT infrastructure of the last fifty years. UNIX co-creator (and B language inventor, and Go language co-creator) Ken Thompson started the day with Scientia Professor Gernot Heiser. John O'Brien, a former student of John Lions and chair of the organising committee, then joined the always incredible Brian Kernighan for a thoroughly charming and clear description of the design principles behind UNIX. Kernighan, always able to enunciate these features, noted the great importance of operators such as pipes and redirection statements and the use of regular expressions. I could not help but feel a small sense of gratification knowing how much I emphasise these features in my own teaching. It is difficult to imagine what high performance computing would be like without such components.

    If there was an area that was somewhat unknown to me, it was the content-rich material from Margo Seltzer, one of the original developers of BerkeleyDB, who explored some of the conflicts that existed between databases and operating systems and how UNIX's development worked well with BerkeleyDB. Rob Pike, co-creator of the Plan 9 operating system, co-author of The UNIX Programming Environment, and co-creator of the Go language, provided an inspiring overview of the development and recent innovations of the latter, while Andrew Tridgell, creator of Samba and rsync, gave his take on the development of FOSS over the years - with a particular illustration of the "French cafe" method of learning proprietary protocols.

    It was not all deep tech, however, and the supporting processes are also a necessary (and often understated) part of the development of our major IT technologies. Butler Lampson spoke in a careful, detailed, and structured fashion on the design of computer systems and fielded a question on the relevant benefits of FOSS systems quite well. Elizabeth Churchill provided a very handy overview of how the emphasis on user experience over the years has a concurrent meaning for developer experience, especially using Fuchsia and Flutter, and the current president of Linux Australia, Sae Ra Germaine, gave some handy advice on the management of communities, especially including the less salubrious parts.

    The final two presentations, by Gernot Heiser and Andy Tanenbaum, were certainly capstones on the formal proceedings of the day. Gernot gave a potted history and a rather impressive list of current deployments of the seL4 microkernel, which provides security and performance at the kernel layer. I am especially interested in its RISC-V implementation. The final presentation of the day was by Andy Tanenbaum, creator of Minix and author of two of the most well-known books used in computer science education, on Computer Networks and Operating Systems. Apart from revisiting the classic debate between himself and Linus Torvalds over the relative virtues of micro versus monolithic kernels, Tanenbaum made the very important point that the Lions Book led directly to MINIX, which led to Linux, which led to Android (to which I will add that BSD UNIX and Mac OS X also trace a lineage to Lions). In other words, almost everything that we really know about computing today has been profoundly influenced by John Lions.

    Finishing the day were announcements by Heiser of the establishment of the UNSW Centre for Critical Digital Infrastructure and a significant John Lions prize for Open Source aligned with the Centre. There were also final words by the UNSW Vice-Chancellor Ian Jacobs before we departed to an evening re-dedication of the John Lions Garden. It was during that time that I engaged in conversation with his brother, his wife Marianne (who gave a charming speech), and his two daughters, both of whom were impressed but, I suspect, a little mystified by the importance of their late father (Pixel, the family dog, was quite a character as well). I must also mention, courtesy of spending much of the day in his delightful company, the contributions of one John Wulff, and especially his basic assembler language, balad. I look forward to further correspondence with this very experienced engineer from a different era.

    ,

    Dave HallA Rube Goldberg Machine for Container Workflows

    Learn how can you securely copy container images from GHCR to ECR.

    ,

    Chris NeugebauerAdding a PurpleAir monitor to Home Assistant

    Living in California, I’ve (sadly) grown accustomed to needing to keep track of our local air quality index (AQI) ratings, particularly as we live close to places where large wildfires happen every other year.

    Last year, Josh and I bought a PurpleAir outdoor air quality meter, which has been great. We contribute our data to a collection of very local air quality meters, which is important, since the hilly nature of the North Bay means that the nearest government air quality ratings can be significantly different to what we experience here in Petaluma.

    I recently went looking to pull my PurpleAir sensor data into my Home Assistant setup. Unfortunately, the PurpleAir API does not return the AQI metric for air quality, only the raw PM2.5/PM5/PM10 numbers. After some searching, I found a nice template sensor solution on the Home Assistant forums, which I’ve modernised by adding the AQI as a sub-sensor, and adding unique ID fields to each useful sensor, so that you can assign them to a location.

    You’ll end up with sensors for raw PM2.5, the PM2.5 AQI value, the US EPA air quality category, temperature, relative humidity and air pressure.
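    For reference, here’s a plain-Python rendering of the same EPA piecewise-linear PM2.5 to AQI conversion that the Jinja template below performs, using the same breakpoints; it’s just an illustration, not part of the Home Assistant configuration:

    def calc_aqi(cp, ih, il, bph, bpl):
        # Linear interpolation within one EPA breakpoint band.
        return round((ih - il) / (bph - bpl) * (cp - bpl) + il)

    # (concentration low, concentration high, AQI low, AQI high) per band
    PM25_BREAKPOINTS = [
        (0.0, 12.0, 0, 50),        # Good
        (12.1, 35.4, 51, 100),     # Moderate
        (35.5, 55.4, 101, 150),    # Unhealthy for Sensitive Groups
        (55.5, 150.4, 151, 200),   # Unhealthy
        (150.5, 250.4, 201, 300),  # Very Unhealthy
        (250.5, 350.4, 301, 400),  # Hazardous
        (350.5, 500.0, 401, 500),  # Hazardous
    ]

    def pm25_to_aqi(pm25):
        for bpl, bph, il, ih in PM25_BREAKPOINTS:
            if bpl <= pm25 <= bph:
                return calc_aqi(pm25, ih, il, bph, bpl)
        return None  # out of range, treat as invalid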

    How to use this

    First up, visit the PurpleAir Map, find the sensor you care about, click “Get This Widget”, and then “JSON”. That will give you the URL to set as the resource key in purpleair.yaml.

    Adding the configuration

    In HomeAssistant, add the following line to your configuration.yaml:

    sensor: !include purpleair.yaml
    

    and then add the following contents to purpleair.yaml

    
     - platform: rest
       name: 'PurpleAir'
    
       # Substitute in the URL of the sensor you care about.  To find the URL, go
       # to purpleair.com/map, find your sensor, click on it, click on "Get This
       # Widget" then click on "JSON".
       resource: https://www.purpleair.com/json?key={KEY_GOES_HERE}&show={SENSOR_ID}
    
       # Only query once a minute to avoid rate limits:
       scan_interval: 60
    
       # Set this sensor to be the AQI value.
       #
       # Code translated from JavaScript found at:
       # https://docs.google.com/document/d/15ijz94dXJ-YAZLi9iZ_RaBwrZ4KtYeCy08goGBwnbCU/edit#
       value_template: >
         {{ value_json["results"][0]["Label"] }}
       unit_of_measurement: ""
       # The value of the sensor can't be longer than 255 characters, but the
       # attributes can.  Store away all the data for use by the templates below.
       json_attributes:
         - results
    
     - platform: template
       sensors:
         purpleair_aqi:
           unique_id: 'purpleair_SENSORID_aqi_pm25'
           friendly_name: 'PurpleAir PM2.5 AQI'
           value_template: >
             {% macro calcAQI(Cp, Ih, Il, BPh, BPl) -%}
               {{ (((Ih - Il)/(BPh - BPl)) * (Cp - BPl) + Il)|round|float }}
             {%- endmacro %}
             {% if (states('sensor.purpleair_pm25')|float) > 1000 %}
               invalid
             {% elif (states('sensor.purpleair_pm25')|float) > 350.5 %}
               {{ calcAQI((states('sensor.purpleair_pm25')|float), 500.0, 401.0, 500.0, 350.5) }}
             {% elif (states('sensor.purpleair_pm25')|float) > 250.5 %}
               {{ calcAQI((states('sensor.purpleair_pm25')|float), 400.0, 301.0, 350.4, 250.5) }}
             {% elif (states('sensor.purpleair_pm25')|float) > 150.5 %}
               {{ calcAQI((states('sensor.purpleair_pm25')|float), 300.0, 201.0, 250.4, 150.5) }}
             {% elif (states('sensor.purpleair_pm25')|float) > 55.5 %}
               {{ calcAQI((states('sensor.purpleair_pm25')|float), 200.0, 151.0, 150.4, 55.5) }}
             {% elif (states('sensor.purpleair_pm25')|float) > 35.5 %}
               {{ calcAQI((states('sensor.purpleair_pm25')|float), 150.0, 101.0, 55.4, 35.5) }}
             {% elif (states('sensor.purpleair_pm25')|float) > 12.1 %}
               {{ calcAQI((states('sensor.purpleair_pm25')|float), 100.0, 51.0, 35.4, 12.1) }}
             {% elif (states('sensor.purpleair_pm25')|float) >= 0.0 %}
               {{ calcAQI((states('sensor.purpleair_pm25')|float), 50.0, 0.0, 12.0, 0.0) }}
             {% else %}
               invalid
             {% endif %}
           unit_of_measurement: "bit"
         purpleair_description:
           unique_id: 'purpleair_SENSORID_description'
           friendly_name: 'PurpleAir AQI Description'
           value_template: >
             {% if (states('sensor.purpleair_aqi')|float) >= 401.0 %}
               Hazardous
             {% elif (states('sensor.purpleair_aqi')|float) >= 301.0 %}
               Hazardous
             {% elif (states('sensor.purpleair_aqi')|float) >= 201.0 %}
               Very Unhealthy
             {% elif (states('sensor.purpleair_aqi')|float) >= 151.0 %}
               Unhealthy
             {% elif (states('sensor.purpleair_aqi')|float) >= 101.0 %}
               Unhealthy for Sensitive Groups
             {% elif (states('sensor.purpleair_aqi')|float) >= 51.0 %}
               Moderate
             {% elif (states('sensor.purpleair_aqi')|float) >= 0.0 %}
               Good
             {% else %}
               undefined
             {% endif %}
           entity_id: sensor.purpleair
         purpleair_pm25:
           unique_id: 'purpleair_SENSORID_pm25'
           friendly_name: 'PurpleAir PM 2.5'
           value_template: "{{ state_attr('sensor.purpleair','results')[0]['PM2_5Value'] }}"
           unit_of_measurement: "μg/m3"
           entity_id: sensor.purpleair
         purpleair_temp:
           unique_id: 'purpleair_SENSORID_temperature'
           friendly_name: 'PurpleAir Temperature'
           value_template: "{{ state_attr('sensor.purpleair','results')[0]['temp_f'] }}"
           unit_of_measurement: "°F"
           entity_id: sensor.purpleair
         purpleair_humidity:
           unique_id: 'purpleair_SENSORID_humidity'
           friendly_name: 'PurpleAir Humidity'
           value_template: "{{ state_attr('sensor.purpleair','results')[0]['humidity'] }}"
           unit_of_measurement: "%"
           entity_id: sensor.purpleair
         purpleair_pressure:
           unique_id: 'purpleair_SENSORID_pressure'
           friendly_name: 'PurpleAir Pressure'
           value_template: "{{ state_attr('sensor.purpleair','results')[0]['pressure'] }}"
           unit_of_measurement: "hPa"
           entity_id: sensor.purpleair
    
    

    Quirks

    I had difficulty getting the AQI to display as a numeric graph when I didn’t set a unit. I went with bit, and that worked just fine. 🤷‍♂️

    ,

    Stewart SmithAn Unearthly Child

    So, this idea has been brewing for a while now… try and watch all of Doctor Who. All of it. All 38 seasons. Today(ish), we started. First up, from 1963 (first aired not quite when intended due to the Kennedy assassination): An Unearthly Child. The first episode of the first serial.

    A lot of iconic things are there from the start: the music, the Police Box, embarrassing moments of not quite remembering what time one is in, and normal humans accidentally finding their way into the TARDIS.

    I first saw this way back when I was a child, when it was repeated on ABC TV in Australia for some anniversary of Doctor Who (I forget which one). Well, I saw all but the first episode, as the train home was delayed and stopped outside Caulfield for no reason for ages. Some things never change.

    Of course, being a show from the early 1960s, there’s some rougher spots. We’re not about to have the picture of diversity, and there’s going to be casual racism and sexism. What will be interesting is noticing these things today, and contrasting with my memory of them at the time (at least for episodes I’ve seen before), and what I know of the attitudes of the time.

    “This year-ometer is not calculating properly” is a very 2020 line though (technically from the second episode).

    ,

    Lev LafayetteThe Engineer's Curse

    For the third time in the past year I have sat down to watch "The Wind Rises", the fictionalised biography of Jiro Horikoshi by Hayao Miyazaki and animated by Studio Ghibli, the title derived from a line in Paul Valéry's "The Graveyard by the Sea", "Le vent se lève! Il faut tenter de vivre!" In this tale, Jiro is an engineer who follows his childhood dreams of designing aircraft. However, the setting is imperial Japan in the 1930s and Jiro's employer, Mitsubishi, is under direction to build efficient planes for the purposes of warfare. This is, of course, quite contrary to the claim of Jiro's dream-figure, the Italian engineer Giovanni Battista Caproni who says: "Airplanes are not tools for war. They are not for making money. Airplanes are beautiful dreams. Engineers turn dreams into reality."

    Engineers turn dreams into reality. Alas, Caproni's idealism runs against the harsh reality of politics and economics. Engineers are employed so they can make money for investors. Or they are directed by States, to make tools for war. In the course of the story, Jiro must make various aircraft, such as the Mitsubishi 1MF9 Falcon, a fighter plane, the Mitsubishi 1MF10, another fighter plane, the Mitsubishi A5M, another fighter plane, and of course the Mitsubishi A6M Zero. It was a particularly sad moment when Jiro, attempting to reduce the weight of one of the planes says: "One solution would be, we could leave out the guns". Which of course, would work. It would be a beautiful plane. But that's not what the military wants. Their planes are not designed for beauty, but for killing fellow human beings.

    If someone had told me in my undergraduate days, when I was a student of politics, philosophy, and sociology, that I would in the future be employed as an engineer, I probably wouldn't have believed it. To my mind at the time, engineers were a terribly conservative and often boorish lot. In hindsight, the latter part was more of the undergraduate culture. And as for the former, I had misinterpreted it. The conservative nature of engineers is because they understand that, unlike with people, you cannot change the opinion of reality through appeals to morality or aesthetics. Further, they approach their designs from minimalism to maximise efficiency, and that includes design principles. As Freeman Dyson once wrote:

    "A good scientist is a person with original ideas. A good engineer is a person who makes a design that works with as few original ideas as possible. There are no prima donnas in engineering." (Dyson, Disturbing the universe)

    Natural, factual reality will remain stubborn; it cannot be convinced or tricked, and the engineer must discover, uncover, and adapt to its inviolable rules. I suspect that the reason a lot of seemingly smart people turn, for example, to one of the many disciplines of computer engineering is that the machine does not lie. It does not revise facts according to its feelings. It does not negotiate, nor can it be induced by appeals or admonishments. There is a brutal and logical honesty in error codes.

    Of course, engineers do live with other people, and they must negotiate with and appeal to others. Engineers are human too, and they have to deal with their own emotional calls. In the course of the story, Jiro falls in love with the artist Naoko Satomi, who succumbs to tuberculosis, the time they spend together all too short, precious, beautiful, and rare. It is tragic that this genuine human relationship has to circumvent the officials who have raised their loyalty to abstract institutions to a level higher than the solidarity that one should show to visceral human beings. But that is just another elaboration of The Engineer's Curse, at least in this context of warfare.

    In this regard, my own situation is less cursed and more blessed. When I think of the sort of assistance that I provide university researchers there is often something quite beneficial about it. I have been charmed by studies into the effect of light on the nocturnal songs rate of willie wagtails. I have recognised the importance of comparing and predicting population samples for marine conservation areas. Or for that matter signaling for SARS-CoV-2, or the transmission of Q Fever, and many other examples. It goes to show that systems, both physical and social, can be designed for the betterment of the world rather than its destruction. It didn't work out that way for Jiro Horikoshi, who just wanted to build beautiful planes. It is in memory of such tragic stories that we must direct the practical intent of our engineering efforts.

    ,

    Lev LafayetteHow to withdraw from an unsuitable course

    When I rage-quit, I don't mince my words.

    Hi XXXXX,

    I am going to have to withdraw from the XXXXX and the XXXXX course.

    The content that has been provided from the latter is infuriating. It is everything that I have considered wrong in teaching computing tools over the past twenty years, and have witnessed ever-increasing numbers of learners who leave such courses who are increasingly ignorant of IT systems.

    There is no content on understanding the shopping list of tools in the "Ecology Of Resources" in terms of their underlying principles, or on how to integrate them with different operating systems and environments. Instead, there is just a superficial understanding of their use. This means that every two to three years, when a new tool is introduced or an existing one is upgraded, re-learning is required without a foundational understanding ever being acquired.

    This is contrary to the principles of structured learning and instructional scaffolding, the idea of which is that a learner gains an understanding of the concepts and can then elaborate to new understandings in an integrated manner. It is curious that such principles are taught from an educational perspective, but not applied reflexively to the teaching of the computing utilities themselves.

    Further, the actual tools themselves have not been checked with standard operating environments provided by the university. A major assignment depends on the use of Adobe Spark. Is this platform-independent? Is it available for all operating system environments provided by university equipment? Was this even checked? Do the tools satisfy the accessibility requirements for teachers and learners with, for example, visual disabilities? Was this even checked?

    As I complete my sixth degree (and the second post-graduate education degree), I have to say that this is simply the worst course I have ever participated in.

    Yours sincerely,

    Me.

    ,

    Jan SchmidtRift CV1 – Getting close now…

    It’s been a while since my last post about tracking support for the Oculus Rift in February. There’s been big improvements since then – working really well a lot of the time. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.

    Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.

    Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:

    Beat Saber – Skunkynator playing Expert+, Mar 16 2021

    Strap in. Here’s what I’ve worked on in the last 6 weeks:

    Pose Matching improvements

    Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.

    I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.

    Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.

    In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.

    I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.
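    As a rough illustration of the “strong” match test and the size bound described above, the per-candidate check amounts to something like the following sketch. The project() function, the data layout and the size bound are placeholders rather than the actual OpenHMD code, and real matching also has to cope with LEDs that aren’t visible from the camera:

    import numpy as np

    STRONG_MATCH_PIXELS = 1.5

    def is_strong_match(candidate_pose, led_model, blobs_xy, blob_radii, project):
        # project() maps the device's 3D LED positions, for this candidate pose,
        # to 2D pixel coordinates and expected radii in this camera's frame.
        expected_xy, expected_radii = project(candidate_pose, led_model)
        for led_xy, led_radius in zip(expected_xy, expected_radii):
            dists = np.linalg.norm(blobs_xy - led_xy, axis=1)
            nearest = np.argmin(dists)
            if dists[nearest] > STRONG_MATCH_PIXELS:
                return False  # no observed blob close enough to this LED
            if blob_radii[nearest] > 2.0 * led_radius:
                return False  # blob much larger than the size model predicts
        return True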

    LED labelling

    When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.

    The problem is that even sped-up the search can still take a few frame-durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later once their identity is known.

    IMU Gyro scale

    One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.

    The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.

    From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-raw * (2000 / 32768) * (π / 180) – or a divisor of 938.734.

    The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.
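    For clarity, here’s the corrected conversion as a small Python sketch, assuming the raw reading is the 16-bit signed value and the IMU is configured for the ±2000 °/s range:

    import math

    GYRO_FULL_SCALE_DPS = 2000.0   # configured full-scale range, degrees/second
    RAW_FULL_SCALE = 32768.0       # 16-bit signed integer range

    def gyro_raw_to_rad_per_sec(raw):
        deg_per_sec = raw * (GYRO_FULL_SCALE_DPS / RAW_FULL_SCALE)
        return math.radians(deg_per_sec)   # equivalent to raw / 938.734

    # The old divisor of 1024 under-estimates rotation speed by roughly 10%
    # (938.734 / 1024 ≈ 0.917).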

    Limited interpolation

    If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.
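    In pseudocode terms, the coasting limit looks roughly like this – the names here are illustrative rather than OpenHMD’s actual internals:

    COASTING_LIMIT_SEC = 0.5   # 500ms without a camera observation

    def predict_pose(device, imu_sample, now):
        if now - device.last_camera_fix <= COASTING_LIMIT_SEC:
            # Recent camera fix: keep integrating the IMU for full 6DOF motion.
            device.pose.integrate_imu(imu_sample)
        else:
            # Coasted too long: freeze the position and only keep updating the
            # orientation until the cameras find the device again.
            device.pose.integrate_rotation_only(imu_sample)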

    Exponential filtering

    I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.
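    For anyone unfamiliar with it, the core of a 1-Euro filter is only a few lines: an exponential smoother whose cutoff frequency rises with the speed of the signal, so jitter is suppressed when still and lag is minimised when moving. This is a generic sketch of the idea (per scalar component), with illustrative tunables rather than the values used in the driver:

    import math

    class OneEuro:
        def __init__(self, mincutoff=1.0, beta=0.05, dcutoff=1.0):
            self.mincutoff, self.beta, self.dcutoff = mincutoff, beta, dcutoff
            self.x_prev = None
            self.dx_prev = 0.0

        @staticmethod
        def _alpha(cutoff, dt):
            tau = 1.0 / (2.0 * math.pi * cutoff)
            return 1.0 / (1.0 + tau / dt)

        def filter(self, x, dt):
            if self.x_prev is None:
                self.x_prev = x
                return x
            # Smooth the derivative, then use it to raise the cutoff when the
            # signal moves fast (less lag) and lower it when still (less jitter).
            dx = (x - self.x_prev) / dt
            a_d = self._alpha(self.dcutoff, dt)
            dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
            cutoff = self.mincutoff + self.beta * abs(dx_hat)
            a = self._alpha(cutoff, dt)
            x_hat = a * x + (1.0 - a) * self.x_prev
            self.x_prev, self.dx_prev = x_hat, dx_hat
            return x_hat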

    Non-blocking radio

    Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.

    Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.

    As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regularly processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.

    Haptic feedback

    The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.

    In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.

    With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.

    The big question mark is around what API OpenHMD should provide for haptic feedback. I’ve implemented something simple for now, to get some discussion going. It works really well and adds hugely to the experience. That code is in the https://github.com/thaytan/OpenHMD/tree/rift-haptics branch, with a SteamVR-OpenHMD branch that uses it in https://github.com/thaytan/SteamVR-OpenHMD/tree/controller-haptics-wip

    Problem areas

    Unexpected tracking losses

    I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extractions when I’m not expecting them. Especially my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:

    To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.

    Sensor fusion efficiency

    The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, plus 7 for each of 3 latency-compensation slots. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.

    At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.

    Lens Distortion

    To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The LCD/OLED panels displaying the output get distorted heavily before they hit the user’s eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.

    Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.

    Camera / USB failures

    The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.

    Occasionally, some people experience failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error failing to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely and there’ll be no more tracking from that camera until the application is restarted.

    Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.

    Occluded cases

    Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.

    If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.

    Upstreaming

    April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.

    Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.

    One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.

    Others

    I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.

    Other Headsets

    At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.

    Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.

    Sponsorship

    This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.

    ,

    Stewart Smithlibeatmydata v129

    Every so often, I release a new libeatmydata. This has not happened for a long time. This release is just some bug fixes, most of which have been in the Debian package for some time; I’ve just been lazy and not sat down and merged them.

    git clone https://github.com/stewartsmith/libeatmydata.git

    Download the source tarball from here: libeatmydata-129.tar.gz and GPG signature: libeatmydata-129.tar.gz.asc from my GPG key.

    Or, feel free to grab some Fedora RPMs:

    Releases published also in the usual places:

    ,

    BlueHackersWorld bipolar day 2021

    Today, 30 March, is World Bipolar Day.

    Vincent van Gogh - Worn Out

    Why that particular date? It’s Vincent van Gogh’s birthday (1853), and there is a fairly strong argument that the Dutch painter suffered from bipolar (among other things).

    The image on the side is Vincent’s drawing “Worn Out” (from 1882), and it seems to capture the feeling rather well – whether (hypo)manic, depressed, or mixed. It’s exhausting.

    Bipolar is complicated, often undiagnosed or misdiagnosed, and when only treated with anti-depressants, it can trigger the (hypo)mania – essentially dragging that person into that state near-permanently.

    Have you heard of Bipolar II?

    Hypo-mania is the “lesser” form of mania that distinguishes Bipolar I (the classic “manic depressive” syndrome) from Bipolar II. It’s “lesser” only in the sense that, rather than someone going so hyper they may think they can fly (Bipolar I is often identified when someone in a manic state gets admitted to hospital – good catch!), with Bipolar II the hypo-mania may actually exhibit as anger. Anger in general, against nothing in particular but potentially everyone and everything around them. Or, if it’s a mixed episode, anger combined with strong negative thoughts. Either way, it does not look like classic mania. It is, however, exhausting and can be very debilitating.

    Bipolar II people often present to a doctor while in a depressed state, and GPs (not being psychiatrists) may not do a full diagnosis. Note that D.A.S. and similar test sheets are screening tools; they are not diagnostic. A proper diagnosis is more complex than filling in a form with some questions (who would have thought!)

    Call to action

    If you have a diagnosis of depression, only from a GP, and are on medication for this, I would strongly recommend you also get a referral to a psychiatrist to confirm that diagnosis.

    Our friends at the awesome Black Dog Institute have excellent information on bipolar, as well as a quick self-test – if that shows some likelihood of bipolar, go get that referral and follow up ASAP.

    I will be writing more about the topic in the coming time.

    The post World bipolar day 2021 first appeared on BlueHackers.org.

    ,

    Michael StillManipulating Docker images without Docker installed

    Recently I’ve been playing a bit more with Docker images and Docker image repositories. I had in the past written a quick hack to let me extract files from a Docker image, but I wanted to do something a little more mature than that.

    For example, sometimes you want to download an image from a Docker image repository without using Docker. Naively if you had Docker, you’d do something like this:

    docker pull busybox
    docker save busybox

    However, that assumes that you have Docker installed on the machine downloading the images, and that’s sometimes not possible for security reasons. The most obvious example I can think of is airgapped secure environments where you need to walk the data between two networks, and the unclassified network machine doesn’t allow administrator access to install Docker.

    So I wrote a little tool to do image manipulation for me. The tool is called Occy Strap, is written in Python, and is available on PyPI. That means installing it is relatively simple:

    python3 -m venv ~/virtualenvs/occystrap
    . ~/virtualenvs/occystrap/bin/activate
    pip install occystrap

    Which doesn’t require administrator permissions. There are then a few things we can do with Occy Strap.

    Downloading an image from a repository and storing as a tarball

    Let’s say we want to download an image from a repository and store it as a local tarball. This is a common thing to want to do in airgapped environments for example. You could do this with docker with a docker pull; docker save. The Occy Strap equivalent is:

    occystrap fetch-to-tarfile registry-1.docker.io library/busybox \
        latest busybox.tar

    In this example we’re pulling from the Docker Hub (registry-1.docker.io), and are downloading busybox’s latest version into a tarball named busybox.tar. This tarball can be loaded with docker load -i busybox.tar on an airgapped Docker environment.

    Downloading an image from a repository and storing as an extracted tarball

    The format of the tarball in the previous example is two JSON configuration files and a series of image layers as tarballs inside the main tarball. You can write these elements to a directory instead of to a tarball if you’d like to inspect them. For example:

    occystrap fetch-to-extracted registry-1.docker.io library/centos 7 \
        centos7

    This example will pull from the Docker Hub the Centos image with the label “7”, and write the content to a directory in the current working directory called “centos7”. If you tarred centos7 like this, you’d end up with a tarball equivalent to what fetch-to-tarfile produces, which could therefore be loaded with docker load:

    cd centos7; tar -cf ../centos7.tar *

    Downloading an image from a repository and storing it in a merged directory

    In scenarios where image layers are likely to be reused between images (for example many images which share a common base layer), you can save disk space by downloading images to a directory which contains more than one image. To make this work, you need to instruct Occy Strap to use unique names for the JSON elements within the image file:

    occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \ 
        homeassistant/home-assistant latest merged_images
    occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \ 
        homeassistant/home-assistant stable merged_images
    occystrap fetch-to-extracted --use-unique-names registry-1.docker.io \ 
        homeassistant/home-assistant 2021.3.0.dev20210219 merged_images

    Each of these images includes 21 layers, but at the time of writing the merged_images directory contains only 25 unique layers. You end up with a layout like this:

    0465ae924726adc52c0216e78eda5ce2a68c42bf688da3f540b16f541fd3018c
    10556f40181a651a72148d6c643ac9b176501d4947190a8732ec48f2bf1ac4fb
    ...
    catalog.json 
    cd8d37c8075e8a0195ae12f1b5c96fe4e8fe378664fc8943f2748336a7d2f2f3 
    d1862a2c28ec9e23d88c8703096d106e0fe89bc01eae4c461acde9519d97b062 
    d1ac3982d662e038e06cc7e1136c6a84c295465c9f5fd382112a6d199c364d20.json 
    ... 
    d81f69adf6d8aeddbaa1421cff10ba47869b19cdc721a2ebe16ede57679850f0.json 
    ...
    manifest-homeassistant_home-assistant-2021.3.0.dev20210219.json
    manifest-homeassistant_home-assistant-latest.json
    manifest-homeassistant_home-assistant-stable.json

    catalog.json is an Occy Strap specific artefact which maps which layers are used by which image. Each of the manifest files for the various images have been converted to have a unique name instead of manifest.json as well.

    To extract a single image from such a shared directory, use the recreate-image command:

    occystrap recreate-image merged_images homeassistant/home-assistant \
        latest ha-latest.tar

    Exploring the contents of layers and overwritten files

    Similarly, if you’d like the layers to be expanded from their tarballs to the filesystem, you can pass the --expand argument to fetch-to-extracted to have them extracted. This will also create a filesystem at the name of the manifest which is the final state of the image (the layers applied sequentially). For example:

    occystrap fetch-to-extracted --expand quay.io \ 
        ukhomeofficedigital/centos-base latest ukhomeoffice-centos

    Note that layers delete files from previous layers using files named “.wh.$previousfilename”. These whiteout files are not processed in the expanded layers, so that they remain visible to the user. They are however processed in the merged layer named for the manifest file.
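    For illustration, applying a layer on top of a merged tree while honouring those whiteout entries looks roughly like this – a simplified sketch of the standard Docker layer semantics (it ignores opaque whiteouts), not Occy Strap’s actual implementation:

    import os
    import shutil
    import tarfile

    def apply_layer(layer_tarball, merged_dir):
        # Layers must be applied in order, base layer first.
        with tarfile.open(layer_tarball) as layer:
            for member in layer.getmembers():
                dirname, basename = os.path.split(member.name)
                if basename.startswith('.wh.'):
                    # Whiteout marker: remove the named path inherited from the
                    # lower layers instead of extracting the marker itself.
                    victim = os.path.join(merged_dir, dirname,
                                          basename[len('.wh.'):])
                    if os.path.isdir(victim) and not os.path.islink(victim):
                        shutil.rmtree(victim)
                    elif os.path.lexists(victim):
                        os.unlink(victim)
                else:
                    layer.extract(member, path=merged_dir)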

    Michael StillComplexity Arrangements for Sustained Innovation: Lessons From 3M Corporation

    This is the second business paper I’ve read this week while reading along with my son’s university studies. The first is discussed here if you’re interested. This paper is better written, but more academic in its style. This ironically makes it harder to read, because its grammar style is more complicated and harder to parse.

    The main takeaway for me from this paper is that 3M is good at encouraging serendipity and opportune moments that create innovation. This is similar to Google’s attempts to build internal peer networks and deliberate lack of structure. In 3M’s case it’s partially expressed as 15% time, which is similar to Google’s 20% time. Specifically, “eureka moments” cannot be planned or scheduled, but require prior engagement.

    chance favors only the prepared mind — Pasteur

    3M has a variety of methods for encouraging peer networks, including technology fairs, “bootlegging” (borrowing idle resources from other teams), innovation grants, and so on.

    At the same time, 3M tries to keep at least a partial focus on events driven by schedules. The concept of time is important here — there is a “time to wait” (we are ahead of the market); “a time in between” (15% time); and “a time across” (several parallel efforts around related innovations to speed up the process).

    The idea of “a time to wait” is quite interesting. 3M has a history of discovering things where there is no current application, but somehow corporately remembering those things so that when there are applications years later they can jump in with a solution. They embrace story telling as part of their corporate memory, as well as a way of ensuring they learn from past success and failure.

    Finally, 3M is similar to Google in their deliberate flexibility with the rules. 15% time isn’t rigidly counted, for example — it might be 15% of a week, or 15% of a year, or more or less than that. As long as it can be justified as a good use of resources, it’s ok.

    This was a good read and I enjoyed it.

     

    Michael StillA corporate system for continuous innovation: The case of Google Inc

    So, one of my kids is studying some business units at university and was assigned this paper to read. I thought it looked interesting, so I gave it a read as well.

    While not being particularly well written in terms of style, this is an approachable introduction to the culture and values of Google and how they play into Google’s continued ability to innovate. The paper identifies seven important attributes of the company’s culture that promote innovation, as ranked by the interviewed employees:

    • The culture is innovation oriented.
    • They put a lot of effort into selecting individuals who will fit well with the culture at hiring time.
    • Leaders are seen as performing a facilitation role, not a directive one.
    • The organizational structure is loosely defined.
    • OKRs and aligned performance incentives.
    • A culture of organizational learning through postmortems and building internal social networks. Learning is considered a peer to peer activity that is not heavily structured.
    • External interaction — especially in the form of aggressive acquisition of skills and technologies in areas Google feels they are struggling in.

    Additionally, they identify eight habits of a good leader:

    • A good coach.
    • Empower your team and don’t micro-manage.
    • Express interest in employees’ success and well-being.
    • Be productive and results oriented.
    • Be a good communicator and listen to your team.
    • Help employees with career development.
    • Have a clear vision and strategy for the team.
    • Have key technical skills, so you can help advise the team.

    Overall, this paper is well worth the time to read. I enjoyed it and found it insightful.

    ,

    Stewart SmithThe Apple Power Macintosh 7200/120 PC Compatible (Part 1)

    So, I learned something recently: if you pick up your iPhone with eBay open on an auction bid screen in just the right way, you may accidentally click the bid button and end up buying an old computer. Totally not the worst thing ever, and certainly a creative way to make a decision.

    So, not too long later, a box arrives!

    In the 1990s, Apple created some pretty “interesting” computers and product lines. One thing you could get was a DOS Compatibility (or PC Compatibility) card. This was a card that went into one of the expansion slots on a Mac and had something really curious on it: most of the guts of a PC.

    Others have written on these cards too: https://www.engadget.com/2009-12-10-before-there-was-boot-camp-there-were-dos-compatibility-cards.html and http://www.edibleapple.com/2009/12/09/blast-from-the-past-a-look-back-at-apples-dos-compatibility-cards/. There’s also the Service Manual https://tim.id.au/laptops/apple/misc/pc_compatibility_card.pdf with some interesting details.

    The machine I’d bought was an Apple Power Macintosh 7200/120 with the PC Compatible card added afterwards (so it doesn’t have the PC Compatible label on the front like some models ended up getting).

    The Apple Power Macintosh 7200/120

    Wikipedia has a good article on the line, noting that it was first released in August 1995, and fitting for the era, was sold as about 14 million other model numbers (okay not quite that bad, it was only a total of four model numbers for essentially the same machine). This specific model, the 7200/120 was introduced on April 22nd, 1996, and the original web page describing it from Apple is on the wayback machine.

    For older Macs, Low End Mac is a good resource, and there’s a page on the 7200, and amazingly Apple still has the tech specs on their web site!

    The 7200 series replaced the 7100, which was one of the original PowerPC based Macs. The big change was the use of the industry standard PCI bus for its three expansion slots, rather than NuBus. Rather surprisingly, NuBus was not Apple specific, but you could not call it widely adopted by successful manufacturers. Apple first used NuBus in the 1987 Macintosh II.

    The PCI bus was standardized in 1992, and it’s almost certain that a successor to it is in the computer you’re using to read this. It really quite caught on as an industry standard.

    The processor of the machine is a PowerPC 601. The PowerPC was an effort of IBM, Apple, and Motorola (the AIM Alliance) to create a class of processors for personal computers based on IBM’s POWER Architecture. The PowerPC 601 was the first of these processors, initially used by Apple in its Power Macintosh range. The machine I have has one running at a whopping 120MHz. There continued to be PowerPC chips for a number of years, and IBM continued making POWER processors even after that. However, you are almost certainly not using a PowerPC derived processor in the computer you’re using to read this.

    The PC Compatibility card has on it a full on legit Pentium 100 processor, and hardware for doing VGA graphics, a Sound Blaster 16 and the other things you’d usually expect of a PC from 1996. Since it’s on a PCI card though, it’s a bit different than a PC of the era. It doesn’t have any expansion slots of its own, and in fact uses up one of the three PCI slots in the Mac. It also doesn’t have its own floppy drive, or hard drive. There’s software on the Mac that will let the PC card use the Mac’s floppy drive, and part of the Mac’s hard drive for the PC!

    The Pentium 100 was the first mass produced superscalar processor. You are quite likely to be using a computer with a processor related to the Pentium to read this, unless you’re using a phone or tablet, or one of the very latest Macs; in which case you’re using an ARM based processor. You likely have more ARM processors in your life than you have socks.

    Basically, this computer is a bit of a hodge-podge of historical technology, some of which ended up being successful, and other things less so.

    Let’s have a look inside!

    So, one of the PCI slots has a Vertex Twin Turbo 128M8A video card in it. There is not much about this card on the internet. There’s a photo of one on Wikimedia Commons though. I’ll have to investigate more.

    Does it work though? Yes! Here it is on my desk:

    The powered on Power Mac 7200/120

    Even with Microsoft Internet Explorer 4.0 that came with MacOS 8.6, you can find some places on the internet you can fetch files from, at a not too bad speed even!

    More fun times with this machine to come!

    ,

    Dave HallParameter Store vs Secrets Manager

    Which AWS managed service is best for storing and managing your secrets?

    ,

    Lev LafayetteInteractive HPC Computation with Open OnDemand and FastX

    As dataset size and complexity grow, researchers increasingly need to find additional computational power for processing. A preferred choice is high performance computing (HPC) which, due to its physical architecture, operating system, and optimised application installations, is best suited for such processing. However, HPC systems have historically been less effective at visual display, and least of all in an interactive manner, leading to the general truism of "compute on the HPC, visualise locally".

    This is primarily due to the tyranny of distance, but also to the additional latency introduced by contemporary graphics when remote display instructions are sent to a local X-server. With a demand for both HPC computational power and interactive graphics, the University of Melbourne has implemented two technologies, FastX and Open OnDemand, on their general-purpose HPC system "Spartan". This allows users to run graphical applications on Spartan, by submitting a job to the batch system which executes an XFCE graphical environment. In the Spartan environment, FastX has been coupled with Open OnDemand which provides web-enabled applications (e.g., RStudio, Jupyter Notebooks). In illustrating how this environment operates at the Spartan HPC system, the presentation will also illustrate recent research case studies from the University of Melbourne that have utilised this technology.

    A presentation to eResearchNZ 2021

    ,

    Dave HallA Lost Parcel Results in a New Website

    When Australia Post lost a parcel, we found a lot of problems with one of their websites.

    ,

    Dave HallWe Have a New Website (Finally)

    After 15 years we rebuilt our website. Learn more about the new site.

    ,

    Michael StillShaken Fist v0.4.2

    Shaken Fist v0.4.2 snuck out yesterday as part of shooting this tutorial video. That’s because I really wanted to demonstrate floating IPs, which I only recently got working nicely. Overall in v0.4.2 we:

    • Improved CI for image API calls.
    • Improved upgrade CI testing.
    • Improved network state tracking.
    • Floating IPs now work, and have covering CI. shakenfist#257
    • Resolve leaks of floating IPs from both direct use and NAT gateways. shakenfist#256
    • Resolve leaks of IPManagers on network delete. shakenfist#675
    • Use system packages for ansible during install.

    Michael StillStarting your first instance on Shaken Fist (a video tutorial)

    As a bit of an experiment, I’ve made this quick and dirty “vlog” style tutorial video to show you how to install Shaken Fist on a single machine and boot your first instance. I demonstrate how to install, set up your first virtual network, start the instance, inspect events that the instance has experienced, and then log in.

    Let me know if you think it’s useful.

    ,

    Jan SchmidtRift CV1 – Testing SteamVR

    Update:

    This post documented an older method of building SteamVR-OpenHMD. I’ve moved the instructions to a page here. That version will be kept up to date for any future changes, so go there.


    I’ve had a few people ask how to test my OpenHMD development branch of Rift CV1 positional tracking in SteamVR. Here’s what I do:

    • Make sure Steam + SteamVR are already installed.
    • Clone the SteamVR-OpenHMD repository:
    git clone --recursive https://github.com/ChristophHaag/SteamVR-OpenHMD.git
    • Switch the internal copy of OpenHMD to the right branch:
    cd subprojects/openhmd
    git remote add thaytan-github https://github.com/thaytan/OpenHMD.git
    git fetch thaytan-github
    git checkout -b rift-kalman-filter thaytan-github/rift-kalman-filter
    cd ../../
    • Use meson to build and register the SteamVR-OpenHMD binaries. You may need to install meson first (see below):
    meson -Dbuildtype=release build
    ninja -C build
    ./install_files_to_build.sh
    ./register.sh
    • It is important to configure in release mode, as the Kalman filtering code is generally too slow for real-time in debug mode (it has to run 2000 times per second).
    • Make sure your USB devices are accessible to your user account by configuring udev. See the OpenHMD guide here: https://github.com/OpenHMD/OpenHMD/wiki/Udev-rules-list
    • Please note – only Rift sensors on USB 3.0 ports will work right now. Supporting cameras on USB 2.0 requires someone implementing JPEG format streaming and decoding.
    • It can be helpful to test OpenHMD is working by running the simple example. Check that it’s finding camera sensors at startup, and that the position seems to change when you move the headset:
    ./build/subprojects/openhmd/openhmd_simple_example
    • Calibrate your expectations for how well tracking is working right now! Hint: It’s very experimental 🙂
    • Start SteamVR. Hopefully it should detect your headset and the light(s) on your Rift Sensor(s) should power on.

    Meson

    I prefer the Meson build system here. There’s also a cmake build for SteamVR-OpenHMD you can use instead, but I haven’t tested it in a while and it sometimes breaks as I work on my development branch.

    If you need to install meson, there are instructions here – https://mesonbuild.com/Getting-meson.html summarising the various methods.

    I use a copy in my home directory, but you need to make sure ~/.local/bin is in your PATH (for example by adding export PATH="$HOME/.local/bin:$PATH" to your shell profile):

    pip3 install --user meson

    ,

    Jan SchmidtRift CV1 – Pose rejection

    I spent some time this weekend implementing a couple of my ideas for improving the way the tracking code in OpenHMD filters and rejects (or accepts) possible poses when trying to match visible LEDs to the 3D models for each device.

    In general, the tracking proceeds in several steps (in parallel for each of the 3 devices being tracked):

    1. Do a brute-force search to match LEDs to 3D models, then (if matched)
      1. Assign labels to each LED blob in the video frame saying what LED they are.
      2. Send an update to the fusion filter about the position / orientation of the device
    2. Then, as each video frame arrives:
      1. Use motion flow between video frames to track the movement of each visible LED
      2. Use the IMU + vision fusion filter to predict the position/orientation (pose) of each device, and calculate which LEDs are expected to be visible and where.
    3. Try and match up and refine the poses using the predicted pose prior and labelled LEDs. In the best case, the LEDs are exactly where the fusion predicts they’ll be. More often, the orientation is mostly correct, but the position has drifted and needs correcting. In the worst case, we send the frame back to step 1 and do a brute-force search to reacquire an object.

    The goal is to always assign the correct LEDs to the correct device (so you don’t end up with the right controller in your left hand), and to avoid going back to the expensive brute-force search to re-acquire devices as much as possible.

    What I’ve been working on this week is steps 1 and 3 – initial acquisition of correct poses, and fast validation / refinement of the pose in each video frame, and I’ve implemented two new strategies for that.

    Gravity Vector matching

    The first new strategy is to reject candidate poses that don’t closely match the known direction of gravity for each device. I had a previous implementation of that idea which turned out to be wrong, so I’ve re-worked it and it helps a lot with device acquisition.

    The IMU accelerometer and gyro can usually tell us which way up the device is (roll and pitch) but not which way it is facing (yaw). The measure for ‘known gravity’ comes from the fusion Kalman filter covariance matrix – how certain the filter is about the orientation of the device. If that variance is small, this new strategy is used to reject possible poses that don’t have the same idea of gravity (while permitting rotations around the Y axis), with the filter variance as a tolerance.
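
    As a rough illustration of the idea, here is a Python sketch of that check. It is not the OpenHMD C implementation; the axis convention, the function name, and the tolerance handling are assumptions made purely for illustration:

    import numpy as np

    def pose_matches_gravity(candidate_rot, known_gravity, orientation_variance,
                             base_tolerance_deg=5.0):
        """Accept a candidate pose only if its 'down' direction agrees with the
        fusion filter's estimate of gravity. Illustrative sketch only."""
        # Assumed convention: rotate the device-frame "down" axis into world frame.
        down_world = candidate_rot @ np.array([0.0, -1.0, 0.0])
        cos_angle = np.clip(
            np.dot(down_world, known_gravity)
            / (np.linalg.norm(down_world) * np.linalg.norm(known_gravity)),
            -1.0, 1.0)
        angle_deg = np.degrees(np.arccos(cos_angle))
        # Widen the tolerance when the filter is less certain about orientation,
        # mirroring the "filter variance as a tolerance" idea described above.
        tolerance_deg = base_tolerance_deg + np.degrees(np.sqrt(orientation_variance))
        return angle_deg <= tolerance_deg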

    Partial tracking matches

    The 2nd strategy is based around tracking with fewer LED correspondences once a tracking lock is acquired. Initial acquisition of the device pose relies on some heuristics for how many LEDs must match the 3D model. The general heuristic threshold I settled on for now is that 2/3rds of the expected LEDs must be visible to acquire a cold lock.

    With the new strategy, if the pose prior has a good idea where the device is and which way it’s facing, it allows matching on far fewer LED correspondences. The idea is to keep tracking a device even down to just a couple of LEDs, and hope that more become visible soon.
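
    A minimal sketch of how such a threshold might be chosen, in Python. The numbers and names here are assumptions for illustration, not the actual OpenHMD heuristics:

    import math

    def min_led_matches(expected_visible_leds, have_good_pose_prior):
        """How many LED correspondences to require before accepting a pose."""
        if have_good_pose_prior:
            # Once locked, keep tracking on as few as a couple of LEDs.
            return 2
        # Cold acquisition: require roughly two thirds of the expected LEDs.
        return math.ceil(expected_visible_leds * 2 / 3)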

    While this definitely seems to help, I think the approach can use more work.

    Status

    With these two new approaches, tracking is improved but still quite erratic. Tracking of the headset itself is quite good now and for me rarely loses tracking lock. The controllers are better, but have a tendency to “fly off my hands” unexpectedly, especially after fast motions.

    I have ideas for more tracking heuristics to implement, and I expect a continuous cycle of refinement on the existing strategies and new ones for some time to come.

    For now, here’s a video of me playing Beat Saber using tonight’s code. The video shows the debug stream that OpenHMD can generate via Pipewire, showing the camera feed plus overlays of device predictions, LED device assignments and tracked device positions. Red is the headset, Green is the right controller, Blue is the left controller.

    Initial tracking is completely wrong – I see some things to fix there. For example, when the controllers go offline due to inactivity, the code keeps trying to match LEDs to them, and there are also some problems with how it relabels LEDs after they get incorrect assignments.

    After that, there are periods of good tracking with random tracking losses on the controllers – those show the problem cases to concentrate on.

    ,

    Michael StillBooks read in January 2021

    It’s been 10 years since I’ve read enough to write one of these summary posts… which I guess means something. This month I’ve been thinking a lot about systems design and how to avoid the Second System effect while growing a product, which guided my reading choices a fair bit. A fair bit of that reading has been in the form of blog posts and Twitter threads, so I am going to start including those in these listings of things I’ve read.

    Social media posts of note:

    Books:

    Colin CharlesLife with Rona 2.0 – Days 4, 5, 6, 7, 8 and 9

    These lack of updates are also likely because I’ve been quite caught up with stuff.

    Monday I had a steak from Bay Leaf Steakhouse for dinner. It was kind of weird eating it from packs, but then I’m reminded you could do this in economy class. Tuesday I wanted to attempt to go vegetarian and by the time I was done with a workout, the only place was a chap fan shop (Leong Heng) where I had a mixture of Chinese and Indian chap fan. The Indian stall is run by an ex-Hyatt staff member who immediately recognised me! Wednesday, Alice came to visit, so we got to Hanks, got some alcohol, and managed a smorgasbord of food from Pickers/Sate Zul/Lila Wadi. Night ended very late, and on Thursday, visited Hai Tian for their famous salted egg squid and prawns in a coconut shell. Friday was back to being normal, so I grabbed a pizza from Mint Pizza (this time I tried their Aussie variant). Saturday, today, I hit up Rasa Sayang for some matcha latte, but grabbed food from Classic Pilot Cafe, which Faeeza owns! It was the famous salted egg chicken, double portion, half rice.

    As for workouts, I did sign up for Mantas but found it pretty hard to do, timezone wise. I did spend a lot of time jogging on the beach (this has been almost a daily affair). Monday I also did 2 MD workouts, Tuesday 1 MD workout, Wednesday half a MD workout, Thursday I did a Ping workout at Pwrhouse (so good!), Friday 1 MD workout, and Saturday an Audrey workout at Pwrhouse and 1 MD workout.

    Wednesday I also found out that Rasmus passed away. Frankly, there are no words.

    Thursday, my Raspberry Pi 400 arrived. I set it up in under ten minutes, connecting it to the TV here. It “just works”. I made a video, which I should probably figure out how to upload to YouTube after I stitch it together. I have to work on using it a lot more.

    COVID-19 cases are through the roof in Malaysia. This weekend we’ve seen two days of case breaking records, with today being 5,728 (yesterday was something close). Nutty. Singapore suspended the reciprocal green lane (RGL) agreement with Malaysia for the next 3 months.

    I’ve managed to finish Bridgerton. I like the score. Finding something on Netflix is proving to be more difficult, regardless of having a VPN. Honestly, this is why Cable TV wins… linear programming that you’re just fed.

    Stock market wise, I’ve been following the GameStop short squeeze, and even funnier is the Top Glove one, that they’re trying to repeat in Malaysia. Bitcoin seems to be doing “reasonably well” and I have to say, I think people are starting to realise decentralised services have a future. How do we get there?

    What an interesting week, I look forward to more productive time. I’m still writing in my Hobonichi Techo, so at least that’s where most personal stuff ends up, I guess?

    The post Life with Rona 2.0 – Days 4, 5, 6, 7, 8 and 9 first appeared on Colin Charles Agenda.

    ,

    Jan SchmidtHitting a milestone – Beat Saber!

    I hit an important OpenHMD milestone tonight – I completed a Beat Saber level using my Oculus Rift CV1!

    I’ve been continuing to work on integrating Kalman filtering into OpenHMD, and on improving the computer vision that matches and tracks device LEDs. While I suspect no one will be completing Expert levels just yet, it’s working well enough that I was able to play through a complete level of Beat Saber. For a long time this has been my mental benchmark for tracking performance, and I’m really happy 🙂

    Check it out:

    I should admit at this point that completing this level took me multiple attempts. The tracking still has quite a tendency to lose track of controllers, or to get them confused and swap hands suddenly.

    I have a list of more things to work on. See you at the next update!

    Michael StillShaken Fist 0.4.1

    I don’t blog about every Shaken Fist release here, but I do feel like the 0.4 release (and the subsequent minor bug fix release 0.4.1) are a pretty big deal in the life of the project.

    Shaken Fist logo
    We also got a cool logo during the v0.4 cycle as well.

    The focus of the v0.4 series is reliability — we’ve used behaviour in the continuous integration pipeline as a proxy for that, but it should be a significant improvement in the real world as well. This has included:

    • much more extensive continuous integration coverage, including several new jobs.
    • checksumming image downloads, and retrying images where the checksum fails.
    • reworked locking.
    • etcd reliability improvements.
    • refactoring instances and networks to a new “non-volatile” object model where only immutable values are cached.
    • images now track a state much like instances and networks.
    • a reworked state model for instances, where it’s clearer why an instance ended up in an error state. This is documented in our developer docs.

    In terms of new features, we also added:

    • a network ping API, which will emit ICMP ping packets on the network node onto your virtual network. We use this in testing to ensure instances booted and ended up online.
    • networks are now checked to ensure that they have a reasonable minimum size.
    • addition of a simple etcd backup and restore tool (sf-backup).
    • improved data upgrade of previous installations.
    • VXLAN ids are now randomized, and this has forced a new naming scheme for network interfaces and bridges.
    • we are smarter about what networks we restore on startup, and don’t restore dead networks.

    We also now require python 3.8.

    Overall, Shaken Fist v0.4 is a release I’m much more comfortable running workloads I care about on than previous releases. It’s far from perfect, but we’re definitely moving in the right direction.

    ,

    Colin CharlesLife with Rona 2.0 – Day 3

    What an unplanned day. I woke up in time to do an MD workout, despite feeling a little sore. So maybe I was about 10 minutes late and I missed the first set, but his workouts are so long, and I think there were seven sets anyway. Had a good brunch shortly thereafter.

    Did a bit of reading, and then I decided to do a beach boardwalk walk… turns out they were policing the place, and you can’t hit the boardwalk. But the beach is fair game? So I went back to the hotel, dropped off my slippers, and went for a beach jog. Pretty nutty.

    Came back to read a little more and figured I might as well do another MD workout. Then I headed out for dinner, trying out a new place — Mint Pizza. Opened 20.12.2020, and they’re empty, and their pizza is actually pretty good. Lamb and BBQ chicken, they did half-and-half.

    Twitter was discussing Raspberry Pi’s, and all I could see is a lot of misinformation, which is truly shocking. The irony is that open source has been running the Internet for so long, and progressive web apps have come such a long way…

    Back in the day when I did OpenOffice.org or Linux training, we always said you should learn concepts and not tools. We ran Linux installfests in Sunway Pyramid back in the late-90s (back then, yes, Linux was hard, and you had winmodems), and I had forgotten that I even did stuff for school teachers and NGOs back in 2002… I won’t forget PC Gemilang either…

    Anyway, I placed an order again for another Raspberry Pi 400. I am certain that most people talk so much crap, without realising that Malaysia isn’t a developed nation and most people can’t afford a Mac let alone a PC. Laptops aren’t cheap. And there are so many other issues…. Saying Windows is still required in 2021 is the nuttiest thing I’ve heard in a long time. Easy to tweet, much harder to think about TCO, and realise where in the journey Malaysia is.

    Maybe the best thing was that Malaysian Twitter learned about technology. I doubt many realised the difference between a Pi board vs the 400, but hey, the fact that they talked about tech is still a win (misinformed, but a win).

    The post Life with Rona 2.0 – Day 3 first appeared on Colin Charles Agenda.

    ,

    Colin CharlesLife with Rona 2.0 – Days 1 & 2

    Today is the first day that in the state of Pahang, we have to encounter what many Malaysians are referring to as the Movement Control Order 2.0 (MCO 2.0). I think everyone finally agrees with the terminology that this is a lockdown now, because I remember back in the day when I was calling it that, I’d definitely offend a handful of journalists.

    This is one interesting change for me compared to when I last wrote Life with Rona – Day 56 of being indoors and not even leaving my household, in Kuala Lumpur. I am now not in the state; I am living in a hotel, and I am obviously moving around a little more since we have access to the beach.

    KL/Selangor and several other states have already been under the MCO 2.0 since January 13 2021, and while it was supposed to end on January 26, it seems like they’ve extended and harmonised the dates for Peninsular Malaysia to end on February 4 2021. I guess everyone got the “good news” yesterday. The Prime Minister announced some kind of aid last week, but it is still mostly a joke.

    Today was the 2nd day I woke up at around 2.30pm because I went to bed at around 8am. The first day I had a 23.5 hour uptime, and today was less brutal, but working from 1-8am with the PST timezone is pretty brutal. Consequently, I barely got much done, and had one meal, vegetarian, two packs that included rice. I did get to walk by the beach (between Teluk Cempedak and Teluk Cempedak 2), did quite a bit of exercise there and I think even the monkeys are getting hungry… lots of stray cats and monkeys. Starbucks closes at 7pm, and I rocked up at 7.10pm (this was just like yesterday, when I arrived at 9.55pm and was told they wouldn’t grant me a coffee!).

    While writing this entry, I did manage to get into a long video call with some friends and I guess it was good catching up with people in various states. It also is what prevented me from publishing this entry!

    Day 2

    I did wake up reasonable early today because I had pre-ordered room service to arrive at 9am. There is a fixed menu at the hotel for various cuisines (RM48/pax, thankfully gratis for me) and I told them I prefer not having to waste, so just give me what I want which is off menu items anyway. Roti telur double telur (yes, I know it is a roti jantan) with some banjir dhal and sambal and a bit of fruit on the side with two teh tariks. They delivered as requested. I did forget to ask for a jar of honey but that is OK, there is always tomorrow.

    I spent most of the day vacillating, and wouldn’t consider it productive by any measure. Just chit chats and napping. It did rain today after a long time, so the day seemed fairly dreary.

    When I finally did awaken from my nap, I went for a run on the beach. I did it barefoot. I have no idea if this is how it is supposed to be done, or if you are to run nearer the water or further up above, but I did move around between the two quite often. The beach is still pretty dead, but it is expected since no one is allowed to go unless you’re a hotel guest.

    The hotel has closed 3/4 of their villages (blocks) and moved everyone to the village I’m staying in (for long stay guests…). I’m thankful I have a pretty large suite, it is a little over 980sqft, and the ample space, while smaller than my home, is still welcome.

    Post beach run, I did a workout with MD via Instagram. It was strength/HIIT based, and I burnt a tonne, because he gave us one of his signature 1.5h classes. It was longer than the 80 minute class he normally charges RM50 for (I still think this is undervaluing his service, but he really does care and does it for the love of seeing his students grow!).

    Post-workout I decided to head downtown to find some dinner. Everything at the Teluk Cempedak block of shops was closed, so they’re not even bothering with takeaway. Sg. Lembing steakhouse seemed to have cars parked, Vanggey was empty (Crocodile Rock was open, can’t say if there was a crowd, because the shared parking lot was empty), there was a modest queue at Sate Zul, and further down, Lena was closed, Pickers was open for takeaway but looked pretty closed, Tjantek was open surprisingly, and then I thought I’d give Nusantara a try again, this time for food, but their chef had just gone home at about 8pm. Oops. So I drove to LAN burger, initially ordering just one chicken double special; however they looked like they could use the business so I added on a beef double special. They now accept Boost payments so have joined the e-wallet era. One less place to use cash, which is also why I really like Kuantan. On the drive back, Classic Pilot Cafe was also open and I guess I’ll be heading there too during this lockdown.

    Came back to the room to finish both burgers in probably under 15 minutes. While watching the first episode of Bridgerton on Netflix. I’m not sure what really captivates, but I will continue on (I still haven’t finished the first episode). I need to figure out how to use the 2 TVs that I have in this room — HDMI cable? Apple TV? Not normally using a TV, all this is clearly more complex than I care to admit.

    I soaked longer than expected, ended up a prune, but I’m sure it will give me good rest!

    One thought to leave with:

    “Learn to enjoy every minute of your life. Be happy now. Don’t wait for something outside of yourself to make you happy in the future.” — Earl Nightingale

    The post Life with Rona 2.0 – Days 1 & 2 first appeared on Colin Charles Agenda.

    ,

    Sam WatkinsDeveloping CZ, a dialect of C that looks like Python

    In my experience, the C programming language is still hard to beat, even 50 years after it was first developed (and I feel the same way about UNIX). When it comes to general-purpose utility, low-level systems programming, performance, and portability (even to tiny embedded systems), I would choose C over most modern or fashionable alternatives. In some cases, it is almost the only choice.

    Many developers believe that it is difficult to write secure and reliable software in C, due to its free pointers, the lack of enforced memory integrity, and the lack of automatic memory management; however in my opinion it is possible to overcome these risks with discipline and a more secure system of libraries constructed on top of C and libc. Daniel J. Bernstein and Wietse Venema are two developers who have been able to write highly secure, stable, reliable software in C.

    My other favourite language is Python. Although Python has numerous desirable features, my favourite is the light-weight syntax: in Python, block structure is indicated by indentation, and braces and semicolons are not required. Apart from the pleasure and relief of reading and writing such light and clear code, which almost appears to be executable pseudo-code, there are many other benefits. In C or JavaScript, if you omit a trailing brace somewhere in the code, or insert an extra brace somewhere, the compiler may tell you that there is a syntax error at the end of the file. These errors can be annoying to track down, and cannot occur in Python. Python not only looks better, the clear syntax helps to avoid errors.

    The obvious disadvantage of Python, and other dynamic interpreted languages, is that most programs run dramatically slower than C programs. This limits the scope and generality of Python. No AAA or performance-oriented video game engines are programmed in Python. The language is not suitable for low-level systems programming, such as operating system development, device drivers, filesystems, performance-critical networking servers, or real-time systems.

    C is a great all-purpose language, but the code is uglier than Python code. Once upon a time, when I was experimenting with the Plan 9 operating system (which is built on C, but lacks Python), I missed Python’s syntax, so I decided to do something about it and write a little preprocessor for C. This converts from a “Pythonesque” indented syntax to regular C with the braces and semicolons. Having forked a little dialect of my own, I continued from there adding other modules and features (which might have been a mistake, but it has been fun and rewarding).

    At first I called this translator Brace, because it added in the braces for me. I now call the language CZ. It sounds like “C-easy”. Ease-of-use for developers (DX) is the primary goal. CZ has all of the features of C, and translates cleanly into C, which is then compiled to machine code as normal (using any C compiler; I didn’t write one); and so CZ has the same features and performance as C, but enjoys a more pleasing syntax.

    CZ is now self-hosted, in that the translator is written in the language CZ. I confess that originally I wrote most of it in Perl; I’m proficient at Perl, but I consider it to be a fairly ugly language, and overly complicated.

    I intend for CZ’s new syntax to be “optional”: ideally, a developer will be able to choose to use the normal C syntax when editing CZ, if they prefer it. For this, I need a tool to convert C back to CZ, which I have not fully implemented yet. I am aware that, in addition to traditionalists, some vision-impaired developers prefer to use braces and semicolons, as screen readers might not clearly indicate indentation. A C to CZ translator would of course also be valuable when porting an existing C program to CZ.

    CZ has a number of useful features that are not found in standard C, but I did not go so far as C++, which language has been described as “an octopus made by nailing extra legs onto a dog”. I do not consider C to be a dog, at least not in a negative sense; but I think that C++ is not an improvement over plain C. I am creating CZ because I think that it is possible to improve on C, without losing any of its advantages or making it too complex.

    One of the most interesting features I added is a simple syntax for fast, light coroutines. I based this on Simon Tatham’s approach to Coroutines in C, which may seem hacky at first glance, but is very efficient and can work very well in practice. I implemented a very fast web server with very clean code using these coroutines. The cost of switching coroutines with this method is little more than the cost of a function call.

    CZ has hygienic macros. The regular cpp (C preprocessor) macros are not hygienic and many people consider them hacky and unsafe to use. My CZ macros are safe, and somewhat more powerful than standard C macros. They can be used to neatly add new program control structures. I have plans to further develop the macro system in interesting ways.

    I added automatic prototype and header generation, as I do not like having to repeat myself when copying prototypes to separate header files. I added support for the UNIX #! scripting syntax, and for cached executables, which means that CZ can be used like a scripting language without having to use a separate compile or make command, but the programs are only recompiled when something has been changed.

    For CZ, I invented a neat approach to portability without conditional compilation directives. Platform-specific library fragments are automatically included from directories having the name of that platform or platform-category. This can work very well in practice, and helps to avoid the nightmare of conditional compilation, feature detection, and Autotools. Using this method, I was able easily to implement portable interfaces to features such as asynchronous IO multiplexing (aka select / poll).

    The CZ library includes flexible error handling wrappers, inspired by W. Richard Stevens’ wrappers in his books on Unix Network Programming. If these wrappers are used, there is no need to check return values for error codes, and this makes the code much safer, as an error cannot accidentally be ignored.

    CZ has several major faults, which I intend to correct at some point. Some of the syntax is poorly thought out, and I need to revisit it. I developed a fairly rich library to go with the language, including safer data structures, IO, networking, graphics, and sound. There are many nice features, but my CZ library is more prototype than finished product: there are major omissions, and some features are misconceived or poorly implemented. The misfeatures should be weeded out for the time being, or moved to an experimental section of the library.

    I think that a good software library should come in two parts, the essential low-level APIs with the minimum necessary functionality, and a rich set of high-level convenience functions built on top of the minimal API. I need to clearly separate these two parts in order to avoid polluting the namespaces with all sorts of nonsense!

    CZ is lacking a good modern system of symbol namespaces. I can look to Python for a great example. I need to maintain compatibility with C, and avoid ugly symbol encodings. I think I can come up with something that will alleviate the need to type anything like gtk_window_set_default_size, and yet maintain compatibility with the library in question. I want all the power of C, but it should be easy to use, even for children. It should be as easy as BASIC or Processing, a child should be able to write short graphical demos and the like, without stumbling over tricky syntax or obscure compile errors.

    Here is an example of a simple CZ program which plots the Mandelbrot set fractal. I think that the program is fairly clear and easy to understand, although there is still some potential to improve and clarify the code.

    #!/usr/local/bin/cz --
    use b
    use ccomplex
    
    Main:
    	num outside = 16, ox = -0.5, oy = 0, r = 1.5
    	long i, max_i = 50, rb_i = 30
    	space()
    	uint32_t *px = pixel()  # CONFIGURE!
    	num d = 2*r/h, x0 = ox-d*w_2, y0 = oy+d*h_2
    	for(y, 0, h):
    		cmplx c = x0 + (y0-d*y)*I
    		repeat(w):
    			cmplx w = c
    			for i=0; i < max_i && cabs(w) < outside; ++i
    				w = w*w + c
    			*px++ = i < max_i ? rainbow(i*359 / rb_i % 360) : black
    			c += d

    I wrote a more elaborate variant of this program, which generates images like the one shown below. There are a few tricks used: continuous colouring, rainbow colours, and plotting the logarithm of the iteration count, which makes the plot appear less busy close to the black fractal proper. I sell some T-shirts and other products with these fractal designs online.

    An image from the Mandelbrot set, generated by a fairly simple CZ program.

    I am interested in graph programming, and have been for three decades since I was a teenager. By graph programming, I mean programming and modelling based on mathematical graphs or diagrams. I avoid the term visual programming, because there is no necessary reason that vision impaired folks could not use a graph programming language; a graph or diagram may be perceived, understood, and manipulated without having to see it.

    Mathematics is something that naturally exists, outside time and independent of our universe. We humans discover mathematics, we do not invent or create it. One of my main ideas for graph programming is to represent a mathematical (or software) model in the simplest and most natural way, using relational operators. Elementary mathematics can be reduced to just a few such operators:

    • + : add, subtract, disjoint union, zero
    • × : multiply, divide, cartesian product, one
    • ^ : power, root, logarithm
    • sin, cos, sin⁻¹, cos⁻¹, hypot, atan2
    • δ : differential, integral

    These form a set of minimal relational operators for elementary math.

    I think that a language and notation based on these few operators (and similar) can be considerably simpler and more expressive than conventional math or programming languages.

    CZ is for me a stepping-stone toward this goal of an expressive relational graph language. It is more pleasant for me to develop software tools in CZ than in C or another language.

    Thanks for reading. I wrote this article during the process of applying to join Toptal, which appears to be a freelancing portal for top developers; and in response to this article on toptal: After All These Years, the World is Still Powered by C Programming.

    My CZ project has been stalled for quite some time. I foolishly became discouraged after receiving some negative feedback. I now know that honest negative feedback should be valued as an opportunity to improve, and I intend to continue the project until it lacks glaring faults, and is useful for other people. If this project or this article interests you, please contact me and let me know. It is much more enjoyable to work on a project when other people are actively interested in it!

    Gary PendergastWordPress Importers: Free (as in Speech)

    Back at the start of this series, I listed four problems within the scope of the WordPress Importers that we needed to address. Three of them are largely technical problems, which I covered in previous posts. In wrapping up this series, I want to focus exclusively on the fourth problem, which has a philosophical side as well as a technical one — but that does not mean we cannot tackle it!

    Problem Number 4

    Some services work against their customers, and actively prevent site owners from controlling their own content.

    Some services are merely inconvenient: they provide exports, but it often involves downloading a bunch of different files. Your CMS content is in one export, your store products are in another, your orders are in another, and your mailing list is in yet another. It’s not ideal, but they at least let you get a copy of your data.

    However, there’s another class of services that actively work against their customers. It’s these services I want to focus on: the services that don’t provide any ability to export your content — effectively locking people in to using their platform. We could offer these folks an escape! The aim isn’t to necessarily make them use WordPress, it’s to give them a way out, if they want it. Whether they choose to use WordPress or not after that is immaterial (though I certainly hope they would, of course). The important part is freedom of choice.

    It’s worth acknowledging that this is a different approach to how WordPress has historically operated in relation to other CMSes. We provide importers for many CMSes, but we previously haven’t written exporters. However, I don’t think this is a particularly large step: for CMSes that already provide exports, we’d continue to use those export files. This is focussed on the few services that try to lock their customers in.

    Why Should WordPress Take This On?

    There are several aspects to why we should focus on this.

    First of all, it’s the WordPress mission. Underpinning every part of WordPress is the simplest of statements:

    Democratise Publishing

    The freedom to build. The freedom to change. The freedom to share.

    These freedoms are the pillars of a Free and Open Web, but they’re not invulnerable: at times, they need to be defended, and that needs people with the time and resources to offer a defence.

    Which brings me to my second point: WordPress has the people who can offer that defence! The WordPress project has so many individuals working on it, from such a wide variety of backgrounds, we’re able to take on a vast array of projects that a smaller CMS just wouldn’t have the bandwidth for. That’s not to say that we can do everything, but when there’s a need to defend the entire ecosystem, we’re able to devote people to the cause.

    Finally, it’s important to remember that WordPress doesn’t exist in a vacuum, we’re part of a broad ecosystem which can only exist through the web remaining open and free. By encouraging all CMSes to provide proper exports, and implementing them for those that don’t, we help keep our ecosystem healthy.

    We have the ability to take on these challenges, but we have a responsibility that goes alongside. We can’t do it solely to benefit WordPress, we need to make that benefit available to the entire ecosystem. This is why it’s important to define a WordPress export schema, so that any CMS can make use of the export we produce, not just WordPress. If you’ll excuse the imagery for a moment, we can be the knight in shining armour that frees people — then gives them the choice of what they do with that freedom, without obligation.

    How Can We Do It?

    Moving on to the technical side of this problem, I can give you some good news: the answer is definitely not screen scraping. 😄 Scraping a site is fragile, can’t reliably be transformed back into structured content, and provides an incomplete export of the site: anything that’s only available in the site dashboard can’t be obtained through scraping.

    I’ve recently been experimenting with an alternative approach to solving this problem. Rather than trying to create something resembling a traditional exporter, it turns out that modern CMSes provide the tools we need, in the form of REST APIs. All we need to do is call the appropriate APIs, and collate the results. The fun part is that we can authenticate with these APIs as the site owner, by calling them from a browser extension! So, that’s what I’ve been experimenting with, and it’s showing a lot of promise.

    If you’re interested in playing around with it, the experimental code is living in this repository. It’s a simple proof of concept, capable of exporting the text content of a blog on a Wix site, showing that we can make a smooth, comprehensive, easy-to-use exporter for any Wix site owner.

    Screenshot of the "Free (as in Speech)" browser extension UI.

    Clicking the export button starts a background script, which calls Wix’s REST APIs as the site owner, to get the original copy of the content. It then packages it up, and presents it as a WXR file to download.

    Screenshot of a Firefox download dialog, showing a Wix site packaged up as a WXR file.

    I’m really excited about how promising this experiment is. It can ultimately provide a full export of any Wix site, and we can add support for other CMS services that choose to artificially lock their customers in.

    Where Can I Help?

    If you’re a designer or developer who’s excited about working on something new, head on over to the repository and check out the open issues: if there’s something that isn’t already covered, feel free to open a new issue.

    Since this is new ground for a WordPress project, both technically and philosophically, I’d love to hear more points of view. It’s being discussed in the WordPress Core Dev Chat this week, and you can also let me know what you think in the comments!

    This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

    ,

    Gary PendergastWordPress Importers: Defining a Schema

    While schemata are usually implemented using language-specific tools (eg, XML uses XML Schema, JSON uses JSON Schema), they largely use the same concepts when talking about data. This is rather helpful, we don’t need to make a decision on data formats before we can start thinking about how the data should be arranged.

    Note: Since these concepts apply equally to all data formats, I’m using “WXR” in this post as shorthand for “the structured data section of whichever file format we ultimately use”, rather than specifically referring to the existing WXR format. 🙂

    Why is a Schema Important?

    It’s fair to ask: if the WordPress Importers have survived this entire time without a formal schema, why would we need one now?

    There are two major reasons why we haven’t needed one in the past:

    • WXR has remained largely unchanged in the last 10 years: there have been small additions or tweaks, but nothing significant. There’s been no need to keep track of changes.
    • WXR is currently very simple, with just a handful of basic elements. In a recent experiment, I was able to implement a JavaScript-based WXR generator in just a few days, entirely by referencing the Core implementation.

    These reasons are also why it would help to implement a schema for the future:

    • As work on WXR proceeds, there will likely need to be substantial changes to what data is included: adding new fields, modifying existing fields, and removing redundant fields. Tracking these changes helps ensure any WXR implementations can stay in sync.
    • These changes will result in a more complex schema: relying on the source to re-implement it will become increasingly difficult and error-prone. Following Gutenberg’s lead, it’s likely that we’d want to provide official libraries in both PHP and JavaScript: keeping them in sync is best done from a source schema, rather than having one implementation copy the other.

    Taking the time to plan out a schema now gives us a solid base to work from, and it allows for future changes to happen in a reliable fashion.

    WXR for all of WordPress

    With a well defined schema, we can start to expand what data will be included in a WXR file.

    Media

    Interestingly, many of the challenges around media files are less to do with WXR, and more to do with importer capabilities. The biggest headache is retrieving the actual files, which the importer currently handles by trying to retrieve the file from the remote server, as defined in the wp:attachment_url node. In context, this behaviour is understandable: 10+ years ago, personal internet connections were too slow to be moving media around, it was better to have the servers talk to each other. It’s a useful mechanism that we should keep as a fallback, but the more reliable solution is to include the media file with the export.
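
    A sketch of that fallback order, in Python purely for illustration (the real importer is PHP, and the function and parameter names here are made up):

    import os
    import urllib.request

    def locate_attachment(export_dir, bundled_path, attachment_url, dest):
        """Prefer a media file bundled with the export; fall back to fetching it
        from the remote server named in the export, as the importer does today."""
        bundled = os.path.join(export_dir, bundled_path)
        if os.path.exists(bundled):
            return bundled
        # Fallback: retrieve the file from the original server.
        urllib.request.urlretrieve(attachment_url, dest)
        return dest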

    Plugins and Themes

    There are two parts to plugins and themes: the code, and the content. Modern WordPress sites require plugins to function, and most are customised to suit their particular theme.

    For exporting the code, I wonder if a tiered solution could be applied:

    • Anything from WordPress.org would just need its slug, since it can be re-downloaded during import. Particularly as WordPress continues to move towards an auto-updated future, modified versions of plugins and themes are explicitly not supported.
    • Third party plugins and themes would be given a filter to use, where they can provide a download URL that can be included in the export file.
    • Third party plugins/themes that don’t provide a download URL would either need to be skipped, or zipped up and included in the export file.

    For exporting the content, WXR already includes custom post types, but doesn’t include custom settings, or custom tables. The former should be included automatically, and the latter would likely be handled by an appropriate action for the plugin to hook into.

    Settings

    There are currently a handful of special settings that are exported, but (as I just noted, particularly with plugins and themes being exported) this would likely need to be expanded to include most items in wp_options.

    Users

    Currently, the bare minimum information about users who’ve authored a post is included in the export. This would need to be expanded to include more user information, as well as users who aren’t post authors.

    WXR for parts of WordPress

    The modern use case for importers isn’t just to handle a full site, but to handle keeping sites in sync. For example, most news organisations will have a staging site (or even several layers of staging!) which is synchronised to production.

    While it’s well outside the scope of this project to directly handle every one of these use cases, we should be able to provide the framework for organisations to build reliable platforms on. Exports should be repeatable, objects in the export should have unique identifiers, and the importer should be able to handle any subset of WXR.

    WXR Beyond WordPress

    Up until this point, we’ve really been talking about WordPress→WordPress migrations, but I think WXR is a useful format beyond that. Instead of just containing direct exports of the data from particular plugins, we could also allow it to contain “types” of data. This turns WXR into an intermediary language, exports can be created from any source, and imported into WordPress.

    Let’s consider an example. Say we create a tool that can export a Shopify, Wix, or GoDaddy site to WXR, how would we represent an online store in the WXR file? We don’t want to export in the format that any particular plugin would use, since a WordPress Core tool shouldn’t be advantaging one plugin over others.

    Instead, it would be better if we could format the data in a platform-agnostic way, which plugins could then implement support for. As luck would have it, Schema.org provides exactly the kind of data structure we could use here. It’s been actively maintained for nearly nine years, it supports a wide variety of data types, and is intentionally platform-agnostic.
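
    For example, a product from an online store might be represented using the Schema.org Product type. The snippet below shows it as a plain Python dictionary purely for illustration; how such data would actually be encoded inside a WXR file is exactly the kind of decision the schema work needs to make:

    # A Schema.org "Product" expressed as plain data, ready to be serialised
    # into whatever encoding the export format ultimately settles on.
    product = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Example T-Shirt",
        "description": "A plain cotton t-shirt.",
        "offers": {
            "@type": "Offer",
            "price": "25.00",
            "priceCurrency": "AUD",
            "availability": "https://schema.org/InStock",
        },
    }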

    Gazing into my crystal ball for a moment, I can certainly imagine a future where plugins could implement and declare support for importing certain data types. When handling such an import (assuming one of those plugins wasn’t already installed), the WordPress Importer could offer them as options during the import process. This kind of seamless integration allows WordPress to show that it offers the same kind of fully-featured site building experience that modern CMS services do.

    Of course, reality is never quite as simple as crystal balls and magic wands make them out to be. We have to contend with services that provide incomplete or fragmented exports, and there are even services that deliberately don’t provide exports at all. In the next post, I’ll be writing about why we should address this problem, and how we might be able to go about it.

    This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

    ,

    Gary PendergastWordPress Importers: Getting Our House in Order

    The previous post talked about the broad problems we need to tackle to bring our importers up to speed, making them available for everyone to use.

    In this post, I’m going to focus on what we could do with the existing technology, in order to give us the best possible framework going forward.

    A Reliable Base

    Importers are an interesting technical problem. Much like you’d expect from any backup/restore code, importers need to be extremely reliable. They need to comfortably handle all sorts of unusual data, and they need to keep it all safe. Particularly considering their age, the WordPress Importers do a remarkably good job of handling most content you can throw at them.

    However, modern development practices have evolved and improved since the importers were first written, and we should certainly be making use of such practices, when they fit with our requirements.

    For building reliable software that we expect to largely run by itself, a variety of comprehensive automated testing is critical. This ensures we can confidently take on the broader issues, safe in the knowledge that we have a reliable base to work from.

    Testing must be the first item on this list. A variety of automated testing gives us confidence that changes are safe, and that the code can continue to be maintained in the future.

    Data formats must be well defined. While this is useful for ensuring data can be handled in a predictable fashion, it’s also a very clear demonstration of our commitment to data freedom.

    APIs for creating or extending importers should be straightforward to hook into.

    Performance Isn’t an Optional Extra

    With sites constantly growing in size (and with the export files potentially gaining a heap of extra data), we need to care about the performance of the importers.

    Luckily, there’s already been some substantial work done on this front.

    There are other groups in the WordPress world who’ve made performance improvements in their own tools: gathering all of that experience is a relatively quick way to bring in production-tested improvements.

    The WXR Format

    It’s worth talking about the WXR format itself, and determining whether it’s the best option for handling exports into the future. XML-based formats are largely viewed as a relic of days gone past, so (if we were to completely ignore backwards compatibility for a moment) is there a modern data format that would work better?

    The short answer… kind of. 🙂

    XML is actually well suited to this use case, and (particularly when looking at performance improvements) is the only data format for which PHP comes with a built-in streaming parser.
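    As a rough illustration of what streaming parsing buys you (sketched here in Python rather than PHP, with a hypothetical export.wxr file name), each item can be handled and released as it is read, so memory use stays flat no matter how large the export is:

    import xml.etree.ElementTree as ET

    def iter_wxr_items(path):
        # Yield each <item> element from a WXR/RSS export without loading
        # the whole document into memory.
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "item":
                yield elem
                elem.clear()  # free the element once it has been processed

    # Usage (hypothetical file):
    # for item in iter_wxr_items("export.wxr"):
    #     print(item.findtext("title"))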

    That said, WXR is basically an extension of the RSS format: as we add more data to the file that clearly doesn’t belong in RSS, there is likely an argument for defining an entirely WordPress-focused schema.

    Alternative Formats

    It’s important to consider what the priorities are for our export format, which will help guide any decision we make. So, I’d like to suggest the following priorities (in approximate priority order):

    • PHP Support: The format should be natively supported in PHP, though it is still workable if we need to ship an additional library.
    • Performant: Particularly when looking at very large exports, it should be processed as quickly as possible, using minimal RAM.
    • Supports Binary Files: The first comments on my previous post asked about media support; we clearly should be treating media as a first-class citizen.
    • Standards Based: Is the format based on a documented standard? (Another way to ask this: are there multiple different implementations of the format? Do those implementations all function the same?)
    • Backward Compatible: Can the format be used by existing tools with no changes, or minimal changes?
    • Self Descriptive: Does the format include information about what data you’re currently looking at, or do you need to refer to a schema?
    • Human Readable: Can the file be opened and read in a text editor?

    Given these priorities, what are some options?

    WXR (XML-based)

    Either the RSS-based schema that we already use, or a custom-defined XML schema, the arguments for this format are pretty well known.

    One argument that hasn’t been well covered is how there’s a definite trade-off when it comes to supporting binary files. Currently, the importer tries to scrape the media file from the original source, which is not particularly reliable. So, if we were to look at including media files in the WXR file, the best option for storing them is to base64 encode them. Unfortunately, that would have a serious effect on performance, as well as readability: adding huge base64 strings would make even the smallest exports impossible to read.
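    As a rough sketch of that trade-off (plain Python, with a random 3MB blob standing in for a media file), base64 inflates binary data to roughly 4/3 of its original size and turns it into a wall of meaningless characters:

    import base64
    import os

    media = os.urandom(3_000_000)    # stand-in for a ~3MB image
    encoded = base64.b64encode(media)

    print(len(media), len(encoded))  # 3000000 4000000
    print(encoded[:60])              # unreadable noise in the middle of the export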

    Either way, this option would be mostly backwards compatible, though some tools may require a bit of reworking if we were to substantially change the schema.

    WXR (ZIP-based)

    To address the issues with media files, an alternative option might be to follow the path that Microsoft Word and OpenOffice use: put the text content in an XML file, put the binary content into folders, and compress the whole thing.

    This addresses the performance and binary support problems, but is initially worse for readability: if you don’t know that it’s a ZIP file, you can’t read it in a text editor. Once you unzip it, however, it does become quite readable, and has the same level of backwards compatibility as the XML-based format.

    JSON

    JSON could work as a replacement for XML in both of the above formats, with one additional caveat: there is no streaming JSON parser built in to PHP. There are 3rd party libraries available, but given the documented differences between JSON parsers, I would be wary about using one library to produce the JSON, and another to parse it.

    This format largely wouldn’t be backwards compatible, though tools which rely on the export file being plain text (eg, command line tools to do broad search-and-replaces on the file) can be modified relatively easily.

    There are additional subjective arguments (both for and against) the readability of JSON vs XML, but I’m not sure there’s anything to them beyond personal preference.

    SQLite

    The SQLite team wrote an interesting (indirect) argument on this topic: OpenOffice uses a ZIP-based format for storing documents, and the SQLite team argued that there would be benefits (particularly around performance and reliability) if OpenOffice switched to SQLite.

    The key issues that I see are:

    • SQLite is included in PHP, but not enabled by default on Windows.
    • While the SQLite team have a strong commitment to providing long-term support, SQLite is not a standard, and the only implementation is the one provided by the SQLite team.
    • This option is not backwards compatible at all.

    FlatBuffers

    FlatBuffers is an interesting comparison, since it’s a data format focussed entirely on speed. The down side of this focus is that it requires a defined schema to read the data. Much like SQLite, the only standard for FlatBuffers is the implementation. Unlike SQLite, FlatBuffers has made no commitments to providing long-term support.

    Criterion | WXR (XML-based) | WXR (ZIP-based) | JSON | SQLite | FlatBuffers
    Works in PHP? | ✅ | ✅ | ⚠ | ⚠ | ⚠
    Performant? | ⚠ | ✅ | ⚠ | ✅ | ✅
    Supports Binary Files? | ⚠ | ✅ | ⚠ | ✅ | ✅
    Standards Based? | ✅ | ✅ | ✅ | ⚠ / ❌ | ❌
    Backwards Compatible? | ⚠ | ⚠ | ❌ | ❌ | ❌
    Self Descriptive? | ✅ | ✅ | ✅ | ✅ | ❌
    Readable? | ✅ | ⚠ / ❌ | ✅ | ❌ | ❌

    As with any decision, this is a matter of trade-offs. I’m certainly interested in hearing additional perspectives on these options, or thoughts on options that I haven’t considered.

    Regardless of which particular format we choose for storing WordPress exports, every format should have (or in the case of FlatBuffers, requires) a schema. We can talk about schemata without going into implementation details, so I’ll be writing about that in the next post.

    This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

    Gary PendergastWordPress Importers: Stating the Problem

    It’s time to focus on the WordPress Importers.

    I’m not talking about tidying them up, or improving performance, or fixing some bugs, though these are certainly things that should happen. Instead, we need to consider their purpose, how they fit as a driver of WordPress’ commitment to Open Source, and how they can be a key element in helping to keep the Internet Open and Free.

    The History

    The WordPress Importers are arguably the key driver of WordPress’ early success. Before the importer plugins existed (before WordPress even supported plugins!) there were a handful of import-*.php scripts in the wp-admin directory that could be used to import blogs from other blogging platforms. When other platforms fell out of favour, WordPress already had an importer ready for people to move their site over. One of the most notable instances was in 2004, when Movable Type changed their license and prices, suddenly requiring personal blog authors to pay for something that had previously been free. WordPress was fortunate enough to be in the right place at the right time: many of WordPress’ earliest users came from Movable Type.

    As time went on, WordPress became well known in its own right. Growth relied less on people wanting to switch from another provider, and more on people choosing to start their site with WordPress. For practical reasons, the importers were moved out of WordPress Core, and into their own plugins. Since then, they’ve largely been in maintenance mode: bugs are fixed when they come up, but since export formats rarely change, they’ve just continued to work for all these years.

    An unfortunate side effect of this, however, is that new importers are rarely written. While a new breed of services have sprung up over the years, the WordPress importers haven’t kept up.

    The New Services

    There are many new CMS services that have cropped up in recent years, and we don’t have importers for any of them. WordPress.com has a few extra ones written, but they’ve been built on the WordPress.com infrastructure out of necessity.

    You see, we’ve always assumed that other CMSes will provide some sort of export file that we can use to import into WordPress. That isn’t always the case, however. Some services (notably, Wix and GoDaddy Website Builder) deliberately don’t allow you to export your own content. Other services provide incomplete or fragmented exports, needlessly forcing stress upon site owners who want to use their own content outside of that service.

    To work around this, WordPress.com has implemented importers that effectively scrape the site: while this has worked to some degree, it does require regular maintenance, and the importer has to do a lot of guessing about how the content should be transformed. This is clearly not a solution that would be maintainable as a plugin.

    Problem Number 4

    Some services work against their customers, and actively prevent site owners from controlling their own content.

    This strikes at the heart of the WordPress Bill of Rights. WordPress is built with fundamental freedoms in mind: all of those freedoms point to owning your content, and being able to make use of it in any form you like. When a CMS actively works against providing such freedom to their community, I would argue that we have an obligation to help that community out.

    A Variety of Content

    It’s worth discussing how, when starting a modern CMS service, the bar for success is very high. You can’t get away with just providing a basic CMS: you need to provide all the options. Blogs, eCommerce, mailing lists, forums, themes, polls, statistics, contact forms, integrations, embeds, the list goes on. The closest comparison to modern CMS services is… the entire WordPress ecosystem: built on WordPress core, but with the myriad of plugins and themes available, along with the variety of services offered by a huge array of companies.

    So, when we talk about the importers, we need to consider how they’ll be used.

    Problem Number 3

    To import from a modern CMS service into WordPress, your importer needs to map from service features to WordPress plugins.

    Getting Our Own House In Order

    Some of these problems don’t just apply to new services, however.

    Out of the box, WordPress exports to WXR (WordPress eXtended RSS) files: an XML file that contains the content of the site. Back when WXR was first created, this was all you really needed, but much like the rest of the WordPress importers, it hasn’t kept up with the times. A modern WordPress site isn’t just the sum of its content: a WordPress site has plugins and themes. It has various options configured, it has huge quantities of media, it has masses of text content, far more than the first WordPress sites ever had.

    Problem Number 2

    WXR doesn’t contain a full export of a WordPress site.

    In my view, WXR is a solid format for handling exports. An XML-based system is quite capable of containing all forms of content, so it’s reasonable that we could expand the WXR format to contain the entire site.

    Built for the Future

    If there’s one thing we can learn from the history of the WordPress importers, it’s that maintenance will potentially be sporadic. Importers are unlikely to receive the same attention that the broader WordPress Core project does; owners may come and go. An importer will get attention if it breaks, of course, but it otherwise may go months or years without changing.

    Problem Number 1

    We can’t depend on regular importer maintenance in the future.

    It’s quite possible to build code that will be running in 10+ years: we see examples all across the WordPress ecosystem. Doing it in a reliable fashion needs to be a deliberate choice, however.

    What’s Next?

    Having worked our way down from the larger philosophical reasons for the importers to some of the more technically-oriented implementation problems, I’d like to work our way back out again, focussing on each problem individually. In the following posts, I’ll start laying out how I think we can bring our importers up to speed, prepare them for the future, and make them available for everyone.

    This post is part of a series, talking about the WordPress Importers, their history, where they are now, and where they could go in the future.

    ,

    Dave HallPrivacy Policy

    Skwashd Services Pty is committed to providing quality services to you and this policy outlines our ongoing obligations to you in respect of how we manage your Personal Information. We have adopted the Australian Privacy Principles (APPs) contained in the Privacy Act 1988 (Cth) (the Privacy Act). The APPs govern the way in which we collect, use, disclose, store, secure and dispose of your Personal Information. A copy of the Australian Privacy Principles may be obtained from the website of The Office of the Australian Information Commissioner at www.

    ,

    Glen TurnerCompiling and installing software for the uBITX v6 QRP amateur radio transceiver

    The uBITX uses an Arduino internally. This article describes how to update its software.

    Required hardware

    The connector on the back is a Mini-B USB connector, so you'll need a "Mini-B to A" USB cable. This is not the same cable as used with older Android smartphones. The Mini-B connector was used with a lot of cameras a decade ago.

    You'll also need a computer. I use a laptop with Fedora Linux installed.

    Required software for software development

    In Fedora all the required software is installed with sudo dnf install arduino git. Add yourself to the users and lock groups with sudo usermod -a -G users,lock $USER (on Debian-style systems use sudo usermod -a -G dialout,lock $USER). You'll need to log out and log in again for that to have an effect (if you want to see which groups you are already in, then use the id command).

    Run arduino as your ordinary non-root user to create the directories used by the Arduino IDE. You can quit the IDE once it starts.

    Obtain the uBITX software

    $ cd ~/Arduino
    $ git clone https://github.com/afarhan/ubitxv6.git ubitx_v6.1_code
    

    Connect the uBITX to your computer

    Plug in the USB cable and turn on the radio. Running dmesg will show the Arduino appearing as a "USB serial" device:

    usb 1-1: new full-speed USB device number 6 using xhci_hcd
    usb 1-1: New USB device found, idVendor=1a86, idProduct=7523, bcdDevice= 2.64
    usb 1-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0
    usb 1-1: Product: USB Serial
    usbcore: registered new interface driver ch341
    usbserial: USB Serial support registered for ch341-uart
    ch341 1-1:1.0: ch341-uart converter detected
    usb 1-1: ch341-uart converter now attached to ttyUSB1
    

    If you want more information about the USB device then use:

    $ lsusb -d 1a86:7523
    Bus 001 Device 006: ID 1a86:7523 QinHeng Electronics CH340 serial converter
    



    ,

    Jan SchmidtRift CV1 – Adventures in Kalman filtering Part 2

    In the last post I had started implementing an Unscented Kalman Filter for position and orientation tracking in OpenHMD. Over the Christmas break, I continued that work.

    A Quick Recap

    When reading below, keep in mind that the goal of the filtering code I’m writing is to combine 2 sources of information for tracking the headset and controllers.

    The first piece of information is acceleration and rotation data from the IMU on each device, and the second is observations of the device position and orientation from 1 or more camera sensors.

    The IMU motion data drifts quickly (at least for position tracking) and can’t tell which way the device is facing in yaw (although it can detect gravity, so pitch and roll are constrained).

    The camera observations can tell exactly where each device is, but they arrive at a much lower rate (52Hz vs 500/1000Hz) and can take a long time (hundreds of milliseconds) to analyse when acquiring or re-acquiring a lock on the tracked device(s).

    The goal is to acquire tracking lock, then use the motion data to predict the motion closely enough that we always hit the ‘fast path’ of vision analysis. The key here is closely enough – the more closely the filter can track and predict the motion of devices between camera frames, the better.

    Integration in OpenHMD

    When I wrote the last post, I had the filter running as a standalone application, processing motion trace data collected by instrumenting a running OpenHMD app and moving my headset and controllers around. That’s a really good way to work, because it lets me run modifications on the same data set and see what changed.

    However, the motion traces were captured using the current fusion/prediction code, which frequently loses tracking lock when the devices move – leading to big gaps in the camera observations and more interpolation for the filter.

    By integrating the Kalman filter into OpenHMD, the predictions are improved, leading to generally much better results. Here’s one trace of me moving the headset around reasonably vigorously with no tracking loss at all.

    Headset motion capture trace

    If it worked this well all the time, I’d be ecstatic! The predicted position matched the observed position closely enough for every frame for the computer vision to match poses and track perfectly. Unfortunately, this doesn’t happen every time yet, and definitely not with the controllers – although I think the latter largely comes down to the current computer vision having more trouble matching controller poses. They have fewer LEDs to match against compared to the headset, and the LEDs are generally more side-on to a front-facing camera.

    Taking a closer look at a portion of that trace, the drift between camera frames when the position is interpolated using the IMU readings is clear.

    Headset motion capture – zoomed in view

    This is really good. Most of the time, the drift between frames is within 1-2mm. The computer vision can only match the pose of the devices to within a pixel or two – so the observed jitter can also come from the pose extraction, not the filtering.

    The worst tracking is again on the Z axis – distance from the camera in this case. Again, that makes sense – with a single camera matching LED blobs, distance is the most uncertain part of the extracted pose.

    Losing Track

    The trace above is good – the computer vision spots the headset and then the filtering + computer vision track it at all times. That isn’t always the case – the prediction goes wrong, or the computer vision fails to match (it’s definitely still far from perfect). When that happens, it needs to do a full pose search to reacquire the device, and there’s a big gap until the next pose report is available.

    That looks more like this

    Headset motion capture trace with tracking errors

    This trace has 2 kinds of errors – gaps in the observed position timeline during full pose searches and erroneous position reports where the computer vision matched things incorrectly.

    Fixing the errors in position reports will require improving the computer vision algorithm and would fix most of the plot above. Outlier rejection is one approach to investigate on that front.

    Latency Compensation

    There is inherent delay involved in processing of the camera observations. Every 19.2ms, the headset emits a radio signal that triggers each camera to capture a frame. At the same time, the headset and controller IR LEDS light up brightly to create the light constellation being tracked. After the frame is captured, it is delivered over USB over the next 18ms or so and then submitted for vision analysis. In the fast case where we’re already tracking the device the computer vision is complete in a millisecond or so. In the slow case, it’s much longer.

    Overall, that means that there’s at least a 20ms offset between when the devices are observed and when the position information is available for use. In the plot above, this delay is ignored and position reports are fed into the filter when they are available. In the worst case, that means the filter is being told where the headset was hundreds of milliseconds earlier.

    To compensate for that delay, I implemented a mechanism in the filter where it keeps extra position and orientation entries in the state that can be used to retroactively apply the position observations.

    The way that works is to make a prediction of the position and orientation of the device at the moment the camera frame is captured and copy that prediction into the extra state variable. After that, it continues integrating IMU data as it becomes available while keeping the auxiliary state constant.

    When the camera frame analysis is complete, that delayed measurement is matched against the stored position and orientation prediction in the state, and the error is used to correct the overall filter. The cool thing is that in the intervening time, the filter covariance matrix has been building up the right correction terms to adjust the current position and orientation.
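    To make that a little more concrete, here’s a heavily simplified sketch of the delayed-measurement-slot idea: a plain linear Kalman filter in Python with a 1-D position/velocity state and invented noise values, not the actual OpenHMD UKF code.

    import numpy as np

    dt = 0.001                              # 1kHz IMU updates
    F_live = np.array([[1.0, dt],
                       [0.0, 1.0]])         # constant-velocity process model
    Q_live = np.diag([1e-6, 1e-4])          # process noise (invented values)

    def predict(x, P):
        # Propagate the live [position, velocity] block; cloned slots stay constant.
        n = len(x)
        F = np.eye(n); F[:2, :2] = F_live
        Q = np.zeros((n, n)); Q[:2, :2] = Q_live
        return F @ x, F @ P @ F.T + Q

    def clone_position(x, P):
        # At camera-trigger time, append a copy of the current position to the state.
        J = np.vstack([np.eye(len(x)), np.zeros((1, len(x)))])
        J[-1, 0] = 1.0                      # the new slot copies the position entry
        return J @ x, J @ P @ J.T

    def delayed_update(x, P, z, r):
        # Apply a late camera measurement z against the cloned slot, then drop it.
        n = len(x)
        H = np.zeros((1, n)); H[0, -1] = 1.0
        S = H @ P @ H.T + r                 # innovation covariance
        K = P @ H.T / S                     # Kalman gain
        x = x + (K * (z - H @ x)).ravel()
        P = (np.eye(n) - K @ H) @ P
        return x[:-1], P[:-1, :-1]          # discard the clone after the correction

    x, P = np.array([0.0, 0.0]), np.eye(2) * 0.01
    x, P = clone_position(x, P)             # camera frame captured "now"
    for _ in range(20):                     # ~20ms of IMU-only prediction
        x, P = predict(x, P)
    x, P = delayed_update(x, P, z=0.005, r=1e-6)   # late camera report arrives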

    Here’s a good example of the difference:

    Before: Position filtering with no latency compensation
    After: Latency-compensated position reports

    Notice how most of the disconnected segments have now slotted back into position in the timeline. The ones that haven’t can either be attributed to incorrect pose extraction in the computer vision, or to not having enough auxiliary state slots for all the concurrent frames.

    At any given moment, there can be a camera frame being analysed, one arriving over USB, and one awaiting “long term” analysis. The filter needs to track an auxiliary state variable for each frame that we expect to get pose information from later, so I implemented a slot allocation system and multiple slots.

    The downside is that each slot adds 6 variables (3 position and 3 orientation) to the covariance matrix on top of the 18 base variables. Because the covariance matrix is square, the size grows quadratically with new variables. 5 new slots means 30 new variables – leading to a 48 x 48 covariance matrix instead of 18 x 18. That is a 7-fold increase in the size of the matrix (48 x 48 = 2304 vs 18 x 18 = 324) and unfortunately about a 10x slow-down in the filter run-time.

    At that point, even after some optimisation and vectorisation on the matrix operations, the filter can only run about 3x real-time, which is too slow. Using fewer slots is quicker, but allows for fewer outstanding frames. With 3 slots, the slow-down is only about 2x.

    There are some other possible approaches to this problem:

    • Running the filtering delayed, only integrating IMU reports once the camera report is available. This has the disadvantage of not reporting the most up-to-date estimate of the user pose, which isn’t great for an interactive VR system.
    • Keeping around IMU reports and rewinding / replaying the filter for late camera observations. This limits the overall increase in filter CPU usage to double (since we at most replay every observation twice), but potentially with large bursts when hundreds of IMU readings need replaying.
    • It might be possible to only keep 2 “full” delayed measurement slots with both position and orientation, and to keep some position-only slots for others. The orientation of the headset tends to drift much more slowly than position does, so when there’s a big gap in the tracking it would be more important to be able to correct the position estimate. Orientation is likely to still be close to correct.
    • Further optimisation in the filter implementation. I was hoping to keep everything dependency-free, so the filter implementation uses my own naive 2D matrix code, which only implements the features needed for the filter. A more sophisticated matrix library might perform better – but it’s hard to say without doing some testing on that front.

    Controllers

    So far in this post, I’ve only talked about the headset tracking and not mentioned controllers. The controllers are considerably harder to track right now, but most of the blame for that is in the computer vision part. Each controller has fewer LEDs than the headset, fewer are visible at any given moment, and they often aren’t pointing at the camera front-on.

    Oculus Camera view of headset and left controller.

    This screenshot is a prime example. The controller is the cluster of lights at the top of the image, and the headset is lower left. The computer vision has gotten confused and thinks the controller is the ring of random blue crosses near the headset. It corrected itself a moment later, but those false readings make life very hard for the filtering.

    Position tracking of left controller with lots of tracking loss.

    Here’s a typical example of the controller tracking right now. There are some very promising portions of good tracking, but they are interspersed with bursts of tracking losses, and wild drifting from the computer vision giving wrong poses – leading to the filter predicting incorrect acceleration and hence cascaded tracking losses. Particularly (again) on the Z axis.

    Timing Improvements

    One of the problems I was looking at in my last post is variability in the arrival timing of the various USB streams (Headset reports, Controller reports, camera frames). I improved things in OpenHMD on that front, to use timestamps from the devices everywhere (removing USB timing jitter from the inter-sample time).

    There are still potential problems in when IMU reports from controllers get updated in the filters vs the camera frames. That can be on the order of 2-4ms jitter. Time will tell how big a problem that will be – after the other bigger tracking problems are resolved.

    Sponsorships

    All the work that I’m doing implementing this positional tracking is a combination of my free time, hours contributed by my employer Centricular and contributions from people via Github Sponsorships. If you’d like to help me spend more hours on this and fewer on other paying work, I appreciate any contributions immensely!

    Next Steps

    The next things on my todo list are:

    • Integrate the delayed-observation processing into OpenHMD (at the moment it is only in my standalone simulator).
    • Improve the filter code structure – this is my first Kalman filter and there are some implementation decisions I’d like to revisit.
    • Publish the UKF branch for other people to try.
    • Circle back to the computer vision and look at ways to improve the pose extraction and better reject outlying / erroneous poses, especially for the controllers.
    • Think more about how to best handle / schedule analysis of frames from multiple cameras. At the moment each camera operates as a separate entity, capturing frames and analysing them in threads without considering what is happening in other cameras. That means any camera that can’t see a particular device starts doing full pose searches – which might be unnecessary if another camera still has a good view of the device. Coordinating those analyses across cameras could yield better CPU consumption, and let the filter retain fewer delayed observation slots.

    ,

    Tim SerongScope Creep

    On December 22, I decided to brew an oatmeal stout (5kg Gladfield ale malt, 250g dark chocolate malt, 250g light chocolate malt, 250g dark crystal malt, 500g rolled oats, 150g rice hulls to stop the mash sticking, 25g Pride of Ringwood hops, Safale US-05 yeast). This all takes a good few hours to do the mash and the boil and everything, so while that was underway I thought it’d be a good opportunity to remove a crappy old cupboard from the laundry, so I could put our nice Miele upright freezer in there, where it’d be closer to the kitchen (the freezer is presently in a room at the other end of the house).

    The cupboard was reasonably easy to rip out, but behind it was a mouldy and unexpectedly bright yellow wall with an ugly gap at the top where whoever installed it had removed the existing cornice.

    Underneath the bottom half of the cupboard, I discovered not the cork tiles which cover the rest of the floor, but a layer of horrific faux-tile linoleum. Plus, more mould. No way was I going to put the freezer on top of that.

    So, up came the floor covering, back to nice hardwood boards.

    Of course, the sink had to come out too, to remove the flooring from under its cabinet, and that meant pulling the splashback tiles (they had ugly screw holes in them anyway from a shelf that had been bracketed up on top of them previously).

    Removing the tiles meant replacing a couple of sections of wall.

    Also, we still needed to be able to use the washing machine through all this, so I knocked up a temporary sink support.

    New cornice went in.

    The rest of the plastering was completed and a ceiling fan installed.

    Waterproofing membrane was applied where new tiles will go around a new sink.

    I removed the hideous old aluminium backed weather stripping from around the exterior door and plastered up the exposed groove.

    We still need to paint everything, get the new sink installed, do the tiling work and install new taps.

    As for the oatmeal stout, I bottled that on January 2. From a sample taken at the time, it should be excellent, but right now still needs to carbonate and mature.

    Stewart SmithPhotos from Taiwan

    A few years ago we went to Taiwan. I managed to capture some random bits of the city on film (and also some shots on my then phone, a Google Pixel). I find the different style of art on the streets around the world to be fascinating, and Taiwan had some good examples.

    I’ve really enjoyed shooting Kodak E100VS film over the years, and some of my last rolls were shot in Taiwan. It’s a film that unfortunately is not made anymore, but at least we have a new Ektachrome to have fun with now.

    Words for our time: “Where there is democracy, equality and freedom can exist; without democracy, equality and freedom are merely empty words”.

    This is, of course, only a small number of the total photos I took there. I’d really recommend a trip to Taiwan, and I look forward to going back there some day.

    ,

    Colin CharlesCiao, 2020

    Another year comes to a close, and this is the 4th year running I’m in Kuala Lumpur — 2017, 2018, 2019, and 2020… Wow. Maybe the biggest difference is that I’ve been in Malaysia for 306 days, thanks to the novel coronavirus. I have never spent this much time in Malaysia, in my entire life… I want to say KL, but I’ve managed to zip my way around to Kuantan (a lot), Penang, and Malacca. I can’t believe I flew back on February 29 2020 from Tokyo, and never got on a plane again! What a grounded globalist I’ve become.

    My travel stats are of course, pretty dismal. 39 days out of the country. Apparently I did a total of 13 trips, 92 days of travel (I don’t know if all my local trips are counted frankly), 60,766km, 17 cities, and still 7 countries :) I don’t even want to compare to what it was like in 2019.

    I ended that by saying, “I welcome 2020 with arms wide open.”. I’m not so sure how I feel about 2020. There is life beyond travel. COVID and our reaction to it, really worries me.

    KL has some pretty good food. Kuantan has some pretty good people. While in KL, I visited a spin studio at least once per day. I did a total of 272 spin classes over 366 days! Not to forget there was 56 days of complete lockdown, and studios didn’t open till about maybe mid-June… Sure I did do some spin in London and Paris too, but the bulk of all this happened while I was here in KL.

    I became reasonably friendlier, I became vulnerable, and like every time you do that, your chances of happiness and getting hurt probably straddle 50:50. Madonna – The Power of Good-bye can be apt.

    This is not to say I didn’t enjoy 2020. Glass half full. I really did. Carpe diem. Simplicity is best. If you can follow KISS principles in engineering, why would you pour your entire thought process out and overwhelm the other party?

    Anyway, I still look forward to 2021, with wide open arms, and while I really do think the COVID mess isn’t going away and things are going to be worse for many, I will still be focused on the most positive aspects of 2021. And I’ll work on being my old self again ;-)

    I also ended the year with a haircut (number 1/0.5 on the sides) on Monday 28 December 2020. Somewhat of an experiment (does CoQ10 help speed up hair growth?) but also somewhat of a reaction to saying goodbye to December 2020.

    The post Ciao, 2020 first appeared on Colin Charles Agenda.

    ,

    Tim SerongI Have No Idea How To Debug This

    On my desktop system, I’m running XFCE on openSUSE Tumbleweed. When I leave my desk, I hit the “lock screen” button, the screen goes black, and the monitors go into standby. So far so good. When I come back and mash the keyboard, everything lights up again, the screens go white, and it says:

    blank: Shows nothing but a black screen
    Name: tserong@HOSTNAME
    Password:
    Enter password to unlock; select icon to lock

    So I type my password, hit ENTER, and I’m back in action. So far so good again. Except… Several times recently, when I’ve come back and mashed the keyboard, the white overlay is gone. I can see all my open windows, my mail client, web browser, terminals, everything, but the screen is still locked. If I type my password and hit ENTER, it unlocks and I can interact again, but this is where it gets really weird. All the windows have moved down a bit on the screen. For example, a terminal that was previously neatly positioned towards the bottom of the screen is now partially off the screen. So “something” crashed – whatever overlay the lock thingy put there is gone? And somehow this affected the position of all my application windows? What in the name of all that is good and holy is going on here?

    Update 2020-12-21: I’ve opened boo#1180241 to track this.

    ,

    Stewart SmithTwo Photos from Healesville Sanctuary

    If you’re near Melbourne, you should go to Healesville Sanctuary and enjoy the Australian native animals. I’ve been a number of times over the years, and here’s a couple of photos from a relatively recent trip (as in, within the last couple of years).

    Leah trying to photograph a much too close bird
    Koalas seem to always look like they’ve just woken up. I’m pretty convinced this one just had.

    Stewart SmithPhotos from Adelaide

    Some shots on Kodak Portra 400 from Adelaide. These would have been shot with my Nikon F80 35mm body, I think all with the 50mm lens. These are all pre-pandemic, and I haven’t gone and looked up when exactly. I’m just catching up on scanning some negatives.

    ,

    Glen TurnerBlocking a USB device

    udev can be used to block a USB device (or even an entire class of devices, such as USB storage). Add a file /etc/udev/rules.d/99-local-blacklist.rules containing:

    SUBSYSTEM=="usb", ATTRS{idVendor}=="0123", ATTRS{idProduct}=="4567", ATTR{authorized}="0"
    
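    Note that udev only applies rules when a device event occurs, so after adding the file you may need to reload the rules (for example with sudo udevadm control --reload-rules) and re-plug the device for the block to take effect.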



    ,

    Gary PendergastMore than 280 characters

    It’s hard to be nuanced in 280 characters.

    The Twitter character limit is a major factor of what can make it so much fun to use: you can read, publish, and interact, in extremely short, digestible chunks. But it doesn’t fit every topic, every time. Sometimes you want to talk about complex topics, having honest, thoughtful discussions. In an environment that encourages hot takes, however, it’s often easier to just avoid having those discussions. I can’t blame people for doing that, either: I find myself taking extended breaks from Twitter, as it can easily become overwhelming.

    For me, the exception is Twitter threads.

    Twitter threads encourage nuance and creativity.

    Creative masterpieces like this Choose Your Own Adventure are not just possible, they rely on Twitter threads being the way they are.

    Publishing a short essay about your experiences in your job can bring attention to inequality.

    And Tumblr screenshot threads are always fun to read, even when they take a turn for the epic (over 4000 tweets in this thread, and it isn’t slowing down!)

    Everyone can think of threads that they’ve loved reading.

    My point is, threads are wildly underused on Twitter. I think a big part of that is the UI for writing threads: while it’s suited to writing a thread as a series of related tweet-sized chunks, it doesn’t lend itself to writing, revising, and editing anything more complex.

    To help make this easier, I’ve been working on a tool that will help you publish an entire post to Twitter from your WordPress site, as a thread. It takes care of transforming your post into Twitter-friendly content, you can just… write. 🙂

    It doesn’t just handle the tweet embeds from earlier in the thread: it also handles uploading and attaching any images and videos you’ve included in your post.

    All sorts of embeds work, too. 😉

    It’ll be coming in Jetpack 9.0 (due out October 6), but you can try it now in the latest Jetpack Beta! Check it out and tell me what you think. 🙂

    This might not fix all of Twitter’s problems, but I hope it’ll help you enjoy reading and writing on Twitter a little more. 💖

    ,

    Glen TurnerConverting MPEG-TS to, well, MPEG

    Digital TV uses MPEG Transport Stream, a container format for video designed for transmission over lossy channels, such as broadcast radio. To save CPU cycles, Personal Video Recorders often save the MPEG-TS stream directly to disk. The more usual MPEG is technically MPEG Program Stream, which is designed for reliable, lossless media such as storage on a disk.

    Since these are both container formats, it should be possible to losslessly and quickly re-code from MPEG-TS to MPEG-PS.

    ffmpeg -ss "${STARTTIME}" -to "${DURATION}" -i "${FILENAME}" -ignore_unknown -map 0 -map -0:2 -c copy "${FILENAME}.mpeg"
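    For reference: -ss and -to select the portion of the input to copy, -map 0 includes every stream from the input, -map -0:2 then excludes stream 2 (presumably an unwanted stream in this particular recording), -ignore_unknown skips stream types ffmpeg can’t identify rather than failing, and -c copy copies the selected streams without re-encoding, which is what keeps the conversion lossless and fast.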



    ,

    Chris NeugebauerTalk Notes: Practicality Beats Purity: The Zen Of Python’s Escape Hatch?

    I gave the talk Practicality Beats Purity: The Zen of Python’s Escape Hatch as part of PyConline AU 2020, the very online replacement for PyCon AU this year. In that talk, I included a few interesting links and code samples which you may be interested in:

    @apply

    def apply(transform):
    
        def __decorator__(using_this):
            return transform(using_this)
    
        return __decorator__
    
    
    numbers = [1, 2, 3, 4, 5]
    
    @apply(lambda f: list(map(f, numbers)))
    def squares(i):
      return i * i
    
    print(list(squares))
    
    # prints: [1, 4, 9, 16, 25]
    

    Init.java

    public class Init {
      public static void main(String[] args) {
        System.out.println("Hello, World!");
      }
    }
    

    @switch and @case

    __NOT_A_MATCHER__ = object()
    __MATCHER_SORT_KEY__ = 0
    
    def switch(cls):
    
        inst = cls()
        methods = []
    
        for attr in dir(inst):
            method = getattr(inst, attr)
            matcher = getattr(method, "__matcher__", __NOT_A_MATCHER__)
    
            if matcher == __NOT_A_MATCHER__:
                continue
    
            methods.append(method)
    
        methods.sort(key = lambda i: i.__matcher_sort_key__)
    
        for method in methods:
            matches = method.__matcher__()
            if matches:
                return method()
    
        raise ValueError(f"No matcher matches value {test_value}")
    
    def case(matcher):
    
        def __decorator__(f):
            global __MATCHER_SORT_KEY__
    
            f.__matcher__ = matcher
            f.__matcher_sort_key__ = __MATCHER_SORT_KEY__
            __MATCHER_SORT_KEY__ += 1
            return f
    
        return __decorator__
    
    
    
    if __name__ == "__main__":
        for i in range(100):
    
            @switch
            class FizzBuzz:
    
                @case(lambda: i % 15 == 0)
                def fizzbuzz(self):
                    return "fizzbuzz"
    
                @case(lambda: i % 3 == 0)
                def fizz(self):
                    return "fizz"
    
                @case(lambda: i % 5 == 0)
                def buzz(self):
                    return "buzz"
    
                @case(lambda: True)
                def default(self):
                    return "-"
    
            print(f"{i} {FizzBuzz}")
    

    ,

    Colin CharlesLinks on Rona #2

    This was easily a late April 2020 roundup, stuck in BBEdit, which may still be vaguely relevant.

    The post Links on Rona #2 first appeared on Colin Charles Agenda.

    ,

    Craig SandersFuck Grey Text

    fuck grey text on white backgrounds
    fuck grey text on black backgrounds
    fuck thin, spindly fonts
    fuck 10px text
    fuck any size of anything in px
    fuck font-weight 300
    fuck unreadable web pages
    fuck themes that implement this unreadable idiocy
    fuck sites that don’t work without javascript
    fuck reactjs and everything like it

    thank fuck for Stylus. and uBlock Origin. and uMatrix.

    Fuck Grey Text is a post from: Errata

    ,

    Matt PalmerPrivate Key Redaction: UR DOIN IT RONG

    Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

    I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

    The Private Lives of Private Keys

    Here is what a typical private key looks like, when you come across it:

    -----BEGIN RSA PRIVATE KEY-----
    MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
    CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
    sKoELg==
    -----END RSA PRIVATE KEY-----
    

    Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

    In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

    • A version number (0);
    • The “public modulus”, commonly referred to as “n”;
    • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
    • The “private exponent”, or “d”;
    • The two “private primes”, or “p” and “q”;
    • Two exponents, which are known as “dmp1” and “dmq1”; and
    • A coefficient, known as “iqmp”.

    Why Is This a Problem?

    The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.

    Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece1:

    • MGI – DER for “this is a sequence”
    • CAQ – version (0)
    • CxjdTmecltJEz2PLMpS4BX – n
    • AgMBAA – e
    • ECEDKtuwD17gpagnASq1zQTY – d
    • ECCQDVTYVsjjF7IQ – p
    • IJANUYZsIjRsR3 – q
    • AgkAkahDUXL0RS – dmp1
    • ECCB78r2SnsJC9 – dmq1
    • AghaOK3FsKoELg== – iqmp

    Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

    -----BEGIN RSA PRIVATE KEY-----
    ..............................................................EC
    CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
    ........
    -----END RSA PRIVATE KEY-----
    

    Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
    SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
    SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
    HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
    zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
    yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
    PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
    USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
    c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
    tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
    XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
    pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
    9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
    jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
    Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
    Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
    OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
    UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
    v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
    qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
    6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
    RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
    n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
    ++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
    qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
    -----END RSA PRIVATE KEY-----
    

    People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

    But Wait! There’s More!

    Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

    This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

    Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
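    To make that concrete, here’s a toy sketch in Python (with deliberately tiny primes, unrelated to any real key) of recomputing every other field from just e, p, and q:

    from math import gcd

    p, q, e = 61, 53, 17                          # toy values only
    n = p * q                                     # public modulus
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    d = pow(e, -1, lam)                           # private exponent (Python 3.8+ modular inverse)
    dmp1, dmq1 = d % (p - 1), d % (q - 1)         # CRT exponents
    iqmp = pow(q, -1, p)                          # CRT coefficient

    m = 42                                        # round-trip check: encrypt, then decrypt
    assert pow(pow(m, e, n), d, n) == m

    (Some implementations compute d modulo (p-1)*(q-1) instead of its lcm; either exponent decrypts correctly.)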

    Real World Redaction

    At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script with a poorly-redacted private key. He has since deleted it by making a new commit, but of course, because git never really deletes anything, it’s still available.

    Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

    #Add private key to .ssh folder
    cd /home/johan/.ssh/
    echo  "-----BEGIN RSA PRIVATE KEY-----
    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
    ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
    MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
    TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
    fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
    ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
    Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
    sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
    Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
    rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
    eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
    wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
    axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
    AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
    KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
    4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
    s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
    AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
    HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
    R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
    LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
    lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
    0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
    JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
    XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
    Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
    kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
    GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
    gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
    asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
    IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
    5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
    :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
    :::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
    LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
    ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
    ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
    ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
    YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
    gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
    nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
    O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
    06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
    KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
    sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
    AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
    ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
    Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
    GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
    9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
    Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
    PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
    Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
    FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
    cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
    Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
    EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
    -----END RSA PRIVATE KEY-----" >> id_rsa
    

    Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

    Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

    >> require "derparse"
    >> b64 = <<EOF
    MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
    TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
    fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
    ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
    Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
    sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
    Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
    rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
    eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
    wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
    axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
    AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
    KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
    4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
    s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
    AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
    HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
    R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
    LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
    lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
    0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
    JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
    XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
    Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
    kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
    GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
    gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
    asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
    IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
    5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
    EOF
    >> b64 += <<EOF
    gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
    nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
    O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
    06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
    KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
    sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
    AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
    ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
    Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
    GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
    9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
    Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
    PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
    Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
    FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
    cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
    Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
    EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
    EOF
    >> der = b64.unpack("m").first
    >> c = DerParse.new(der).first_node.first_child
    >> version = c.value
    => 0
    >> c = c.next_node
    >> n = c.value
    => 80071596234464993385068908004931... # (etc)
    >> c = c.next_node
    >> e = c.value
    => 65537
    >> c = c.next_node
    >> d = c.value
    => 58438813486895877116761996105770... # (etc)
    >> c = c.next_node
    >> p = c.value
    => 29635449580247160226960937109864... # (etc)
    >> c = c.next_node
    >> q = c.value
    => 27018856595256414771163410576410... # (etc)
    

    What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

    Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

    >> require "openssl"
    >> OpenSSL::BN.new(p).prime?
    => true
    >> OpenSSL::BN.new(q).prime?
    => true
    

    Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers aren’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

    >> require "openssl/pkey/rsa"
    >> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
    => #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
    >> k.valid?
    => true
    >> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
    => true
    

    … and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

    Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

    What About Other Key Types?

    EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.
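
    To make those proportions concrete, here’s a minimal Ruby sketch (mine, not from the original post) that generates a throwaway P-256 key and compares the private scalar to the public point. It assumes a reasonably recent openssl gem, where OpenSSL::PKey::EC.generate and EC::Point#to_octet_string are available:

    require "openssl"

    # Throwaway P-256 key, purely to illustrate the proportions described above.
    key = OpenSSL::PKey::EC.generate("prime256v1")

    # The secret: a single scalar, 32 bytes on a 256-bit curve
    # (occasionally a byte shorter if the leading byte happens to be zero).
    private_scalar = key.private_key.to_s(2)

    # The public point: 0x04 || X || Y in uncompressed form, 65 bytes.
    public_point = key.public_key.to_octet_string(:uncompressed)

    puts "private scalar: #{private_scalar.bytesize} bytes"
    puts "public point:   #{public_point.bytesize} bytes"
    puts "full DER key:   #{key.to_der.bytesize} bytes"

    Roughly 32 bytes of secret against 65 bytes of public point (plus curve parameters and ASN.1 overhead in the DER) is where the “about 1/3” figure comes from.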

    What Do We Do About It?

    In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

    Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


    1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

    ,

    Jonathan Adamczewskif32, u32, and const

    Some time ago, I wrote “floats, bits, and constant expressions” about converting a floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

    I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

    I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

    // Jonathan Adamczewski 2020-05-12
    //
    // Constructing the bit-representation of an IEEE 754 single precision floating 
    // point number, using integer and single-precision floating point math, at 
    // compile time, in rust, without unsafe blocks, while using as few unstable 
    // features as I can.
    //
    // or "What if this silly C++ thing https://brnz.org/hbr/?p=1518 but in Rust?"
    
    
    // Q. Why? What is this good for?
    // A. To the best of my knowledge, this code serves no useful purpose. 
    //    But I did learn a thing or two while writing it :)
    
    
    // This is needed to be able to perform floating point operations in a const 
    // function:
    #![feature(const_fn)]
    
    
    // bits_transmute(): Returns the bits representing a floating point value, by
    //                   way of std::mem::transmute()
    //
    // For completeness (and validation), and to make it clear the fundamentally 
    // unnecessary nature of the exercise :D - here's a short, straightforward, 
    // library-based version. But it needs the const_transmute flag and an unsafe 
    // block.
    #![feature(const_transmute)]
    const fn bits_transmute(f: f32) -> u32 {
      unsafe { std::mem::transmute::<f32, u32>(f) }
    }
    
    
    
    // get_if_u32(predicate:bool, if_true: u32, if_false: u32):
    //   Returns if_true if predicate is true, else if_false
    //
    // If and match are not able to be used in const functions (at least, not 
    // without #![feature(const_if_match)]) - so here's a branch-free select function
    // for u32s
    const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
      let pred_mask = (-1 * (predicate as i32)) as u32;
      let true_val = if_true & pred_mask;
      let false_val = if_false & !pred_mask;
      true_val | false_val
    }
    
    // get_if_f32(predicate, if_true, if_false):
    //   Returns if_true if predicate is true, else if_false
    //
    // A branch-free select function for f32s.
    // 
    // If either if_true or if_false is NaN or an infinity, the result will be NaN,
    // which is not ideal. I don't know of a better way to implement this function
    // within the arbitrary limitations of this silly little side quest.
    const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
      // can't convert bool to f32 - but can convert bool to i32 to f32
      let pred_sel = (predicate as i32) as f32;
      let pred_not_sel = ((!predicate) as i32) as f32;
      let true_val = if_true * pred_sel;
      let false_val = if_false * pred_not_sel;
      true_val + false_val
    }
    
    
    // bits(): Returns the bits representing a floating point value.
    const fn bits(f: f32) -> u32 {
      // the result value, initialized to a NaN value that will otherwise not be
      // produced by this function.
      let mut r = 0xffff_ffff;
    
      // These floating point operations (and others) cause the following error:
      //     only int, `bool` and `char` operations are stable in const fn
      // hence #![feature(const_fn)] at the top of the file
      
      // Identify special cases
      let is_zero    = f == 0_f32;
      let is_inf     = f == f32::INFINITY;
      let is_neg_inf = f == f32::NEG_INFINITY;
      let is_nan     = f != f;
    
      // Writing this as !(is_zero || is_inf || ...) causes the following error:
      //     Loops and conditional expressions are not stable in const fn
      // so instead write this as type conversions, and bitwise operations
      //
      // "normalish" here means that f is a normal or subnormal value
      let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                            (is_neg_inf as u32) | (is_nan as u32));
    
      // set the result value for each of the special cases
      r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
      r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
      r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
      r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
     
      // It was tempting at this point to try setting f to a "normalish" placeholder 
      // value so that special cases do not have to be handled in the code that 
      // follows, like so:
      // f = get_if_f32(is_normalish, f, 1_f32);
      //
      // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
      // Instead of switching the value, we work around the non-normalish cases 
      // later.
      //
      // (This whole function is branch-free, so all of it is executed regardless of 
      // the input value)
    
      // extract the sign bit
      let sign_bit  = get_if_u32(f < 0_f32,  1, 0);
    
      // compute the absolute value of f
      let mut abs_f = get_if_f32(f < 0_f32, -f, f);
    
      
      // This part is a little complicated. The algorithm is functionally the same 
      // as the C++ version linked from the top of the file.
      // 
      // Because of the various contrived constraints on this problem, we compute 
      // the exponent and significand, rather than extract the bits directly.
      //
      // The idea is this:
      // Every finite single precision float point number can be represented as a
      // series of (at most) 24 significant digits as a 128.149 fixed point number 
      // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
      // one more so that the decimal point falls on a power-of-two boundary :)
      // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
      // significand.)
      //
      // If we are able to scale the number such that all of the precision bits fall 
      // in the upper-most 64 bits of that fixed-point representation (while 
      // tracking our effective manipulation of the exponent), we can then 
      // predictably and simply scale that computed value back to a range that can 
      // be converted safely to a u64, count the leading zeros to determine the 
      // exact exponent, and then shift the result into position for the final u32 
      // representation.
      
      // Start with the largest possible exponent - subsequent steps will reduce 
      // this number as appropriate
      let mut exponent: u32 = 254;
      {
        // Hex float literals are really nice. I miss them.
    
        // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
        // be large enough that, when scaled down by 2^64, all the precision will 
        // fit nicely in a u64
        const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87
    
        // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
        // between 2^87 and 2^64 will not overflow in a single scaling step.
        const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41
    
        // Because loops are not available (no #![feature(const_loops)]), and 'if' is
        // not available (no #![feature(const_if_match)]), perform repeated branch-
        // free conditional multiplication of abs_f.
    
        // use a macro, because why not :D It's the most compact, simplest option I 
        // could find.
        macro_rules! maybe_scale {
          () => {{
            // care is needed: if abs_f is above the threshold, multiplying by 2^41 
            // will cause it to overflow (INFINITY) which will cause get_if_f32() to
            // return NaN, which will destroy the value in abs_f. So compute a safe 
            // scaling factor for each iteration.
            //
            // Roughly equivalent to :
            // if (abs_f < THRESHOLD) {
            //   exponent -= 41;
            //   abs_f *= SCALE_UP;
            // }
            let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
            exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
            abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
          }}
        }
        // 41 bits per iteration means up to 246 bits shifted.
        // Even the smallest subnormal value will end up in the desired range.
        maybe_scale!();  maybe_scale!();  maybe_scale!();
        maybe_scale!();  maybe_scale!();  maybe_scale!();
      }
    
      // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
      // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
      // loss of precision to u64.
      const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
      let a = (abs_f * INV_2_64) as u64;
    
      // Count the leading zeros.
      // (C++ doesn't provide a compile-time constant function for this. It's nice 
      // that rust does :)
      let mut lz = a.leading_zeros();
    
      // if the number isn't normalish, lz is meaningless: we stomp it with 
      // something that will not cause problems in the computation that follows - 
      // the result of which is meaningless, and will be ignored in the end for 
      // non-normalish values.
      lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }
    
      {
        // This step accounts for subnormal numbers, where there are more leading 
        // zeros than can be accounted for in a valid exponent value, and leading 
        // zeros that must remain in the final significand.
        //
        // If lz < exponent, reduce exponent to its final correct value - lz will be
        // used to remove all of the leading zeros.
        //
        // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
        // correct number of bits will remain (after multiplying by 2^41 six times - 
        // 2^246 - there are 7 leading zeros ahead of the original subnormal's
        // computed significand of 0.sss...)
        // 
        // The following is roughly equivalent to:
        // if (lz < exponent) {
        //   exponent = exponent - lz;
        // } else {
        //   exponent = 0;
        //   lz = 7;
        // }
    
        // we're about to mess with lz and exponent - compute and store the relative 
        // value of the two
        let lz_is_less_than_exponent = lz < exponent;
    
        lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
        exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
      }
    
      // compute the final significand.
      // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
      // Shifts are done in u64 (that leading bit is shifted into the void), then
      // the resulting bits are shifted back to their final resting place.
      let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;
    
      // combine the bits
      let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;
    
      // return the normalish result, or the non-normalish result, as appropriate
      get_if_u32(is_normalish, computed_bits, r)
    }
    
    
    // Compile-time validation - able to be examined in rust.godbolt.org output
    pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
    pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
    pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
    pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
    pub static BITS_ZERO: u32 = bits(0.0f32);
    pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
    pub static BITS_ONE: u32 = bits(1.0f32);
    pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
    pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
    pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
    pub static BITS_INF: u32 = bits(std::f32::INFINITY);
    pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
    pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
    pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
    pub static BITS_NAN: u32 = bits(std::f32::NAN);
    pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
    pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
    pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);
    
    
    // Run-time validation of many more values
    fn main() {
      let end: usize = 0xffff_ffff;
      let count = 9_876_543; // number of values to test
      let step = end / count;
      for u in (0..=end).step_by(step) {
          let v = u as u32;
          
          // reference
          let f = unsafe { std::mem::transmute::<u32, f32>(v) };
          
          // compute
          let c = bits(f);
    
          // validation
          if c != v && 
             !(f.is_nan() && c == 0x7fc0_0000) && // nans
             !(v == 0x8000_0000 && c == 0) { // negative 0
              println!("{:x?} {:x?}", v, c); 
          }
      }
    }
    

    Colin CharlesLife with Rona – Day 56

    One of the busiest days I’ve had, largely due to unplanned/non-calendared events that I would consider happy events. I was scheduled to start work for a West Coast client at 6am, but for some reason I was up by 4am… All this done, I took a short nap (it maybe lasted 30 minutes) before the power went out and rudely awakened me. I was pleasantly surprised to see cake, a platter of cheese/cold cuts and more (olives, beetroot, hummus, and more) that Radiance, Lauren and Sal sent over. Shortly thereafter, more cheese that Sara sent over (promptly to the fridge). So lunch was unplanned but much fun, taking me back to days in Spain (or really, any tapas bar I could be at).

    Back to the grind, and I did a short workout with Zuleeka at Fly. It was of the boxing variant, and I’m finding it really tough to grasp boxing, in general. It just doesn’t seem like the workout for me. Anyway, showered, back to the grind, and was surprised at dinner to note that my mother and 2 aunts had cooked up a storm of my favourite dishes (Crabs, petai, tempeh, fried chicken, mutton, and salad), plus there was also a cake waiting later (butter cake with lovely icing). Socialised, extremely grateful, extremely appreciative, considering we are in a Conditional Movement Control Order (CMCO) and all this was done. And then I went on to doing another call…

    Feel very lucky overall. It has been a while since I’ve celebrated my birthday in KL. Let alone in my home. Where I’ve been self-isolating for 56 days. I never thought it would be this fun! Uptime of some 20 hours… Thanks for all the wishes.

    Malaysia’s Covid-19 situation on Tuesday: 6,742 (+16) total cases, 5,223 (+110) recoveries, 109 dead. 16 in ICU beds, 3 on ventilators.

    Restaurants can restart in Selangor for dine-ins (yes!). We’ve also been told that just your name and telephone number needs to be left behind for contact tracing (so no identity card number — good). I see in Singapore, Terminal 4 in Changi (AirAsia, Cathay Pacific) is closing temporarily later this week… joining Terminal 2; makes me sad to note this is happening. When will travel recover?

    I am stepping out tomorrow to take a look at what is going on. I have some errands to run. I will be in a mask. I will likely be in gym clothes (I don’t have enough jeans or shorts to cycle on a daily basis anyway; I have way more Lululemons though). I wish I had more time to read today, but hey, something’s gotta give!

    The post Life with Rona – Day 56 first appeared on Colin Charles Agenda.

    ,

    Chris NeugebauerReflecting on 10 years of not having to update WordPress

    Over the weekend, the boredom of COVID-19 isolation motivated me to move my personal website from WordPress on a self-managed 10-year-old virtual private server to a generated static site on a static site hosting platform with a content delivery network.

    This decision was overdue. WordPress never fit my brain particularly well, and it was definitely getting to a point where I wasn’t updating my website at all (my last post was two weeks before I moved from Hobart; I’ve been living in Petaluma for more than three years now).

    Settling on a website framework wasn’t a terribly difficult choice (I chose Jekyll; everyone else seems to be using it), and I’ve had friends who’ve had success moving their blogs over. The difficulty I ended up facing was that the standard exporter everyone uses to move from WordPress to Jekyll does not expect Debian’s package layout.

    Backing up a bit: I made a choice, 10 years ago, to deploy WordPress on a machine that I ran myself, using the Debian system wordpress package, a simple aptitude install wordpress away. That decision was not particularly consequential then, but it chewed up 3 hours of my time on Saturday.

    Why? The exporter plugin assumes that it will be able to find all of the standard WordPress files in the usual WordPress places, and when it didn’t find them, it broke in unexpected ways. And why couldn’t it find them?

    Debian makes packaging choices that prioritise all the software on a system living side-by-side with minimal difficulty. It sets strict permissions. It separates application code from configuration from user data (which in the case of WordPress, includes plugins), in a way that is consistent between applications. This choice makes it easy for Debian admins to understand how to find bits of an application. It also minimises the chance of one PHP application clobbering another.

    10 years later, the install that I had set up was still working, having survived 3-4 Debian versions, and so 3-4 new WordPress versions. I don’t recall the last time I had to think about keeping my WordPress instance secure and updated. That’s quite a good run. I’ve had a working website despite not caring about keeping it updated for at least three years.

    The same decisions that meant I spent 3 hours on Saturday doing a simple WordPress export saved me a bunch of time that I didn’t incrementally spend over the course of a decade. Am I even? I have no idea.

    Anyway, the least I can do is provide some help to people who might run into this same problem, so here’s a 5-step howto.

    How to migrate a Debian WordPress site to Jekyll

    Should you find the Jekyll exporter not working on your Debian WordPress install:

    1. Use the standard WordPress export to export an XML feed of your site.
    2. Spin up a new instance of WordPress (using WordPress.com, or on a new Virtual Private Server, whatever, really).
    3. Import the exported XML feed.
    4. Install the Jekyll exporter plugin.
    5. Follow the documentation and receive a Jekyll export of your site.

    Basically, the plugin works with a stock WordPress install. If you don’t have one of those, it’s easy to move it over.

    Colin CharlesLife with Rona – Days 51, 52, 53, 54, and 55

    Long batch between writing. The daily bit failed. Oops. But the work has accelerated. From a workout perspective, Ping finally started charging for some workouts and I did one on Sunday (it was great). Mother’s Day lunch was good too, and we got a little additional gift from the Prime Minister — CMCO till June 9th. Frankly, my interest in staying home fully (self-imposed exile) is wearing thin. I think I’ll be out by May 13.

    Malaysia’s Covid-19 situation on Thursday: 6,467 (+39) total cases, 4,776 (+74) recoveries, 107 dead. 19 in ICU beds, 9 on ventilators.

    Malaysia’s Covid-19 situation on Friday: 6,535 (+68) total cases, 4,864 (+88) recoveries, 107 dead.

    Malaysia’s Covid-19 situation on Saturday: 6,589 (+54) total cases, 4,929 (+65) recoveries, 108 (+1) dead.

    Malaysia’s Covid-19 situation on Sunday: 6,656 (+67) total cases, 5,025 (+96) recoveries, 108 dead.

    Malaysia’s Covid-19 situation on Monday: 6,726 (+70) total cases, 5,113 (+88) recoveries, 109 (+1) dead. 20 in ICU beds, 7 on ventilators.

    We never did manage to bring the numbers down to single digits, did we? As long as the healthcare system can cope, right? Which then brings the question of how we have been tearing up the economy. MCO ups unemployment to 34-year high, says stats dept is truly scary. And it will get worse after the Hari Raya holidays. Things like Jonker Walk in Malacca are in trouble. Truly sad, considering this is part of visiting Malacca.

    While I do agree that no place should refuse custom without a mask, the tone was rather ridiculous; encourage people to wear masks. Always. Malaysia is also finding more foreign worker clusters, and my gut feeling is that it will be worse than Singapore, long term (we don’t have foreign worker dormitories, they fend for themselves; and we also have a lot of illegal immigration). It is no surprise, crowded living quarters are to blame. Lots of flip flopping around the serving of liquor, and it seems the general idea is yes, as long as you’re a restaurant (not a pub).

    Travel resumption seems to be a little nuts: China, South Korea Move to Revive Business Travel Between Them, Coronavirus: Australia and New Zealand consider opening borders to create ‘Trans-Tasman bubble’, Taiwan keeps its borders shut despite virus success. Why the UK’s New 14-Day Quarantine Rule Is Particularly Troubling to Business Travel Managers. All this seems to be long-term unsustainable.

    Now for a little gag, Royal Selangor Club bars closed after MCO violations. The Club reopened last week, and on Friday a circular went out to members saying we could even bring one guest per member. It didn’t last long. This is why we can’t have nice things.

    The post Life with Rona – Days 51, 52, 53, 54, and 55 first appeared on Colin Charles Agenda.

    ,

    Gary PendergastInstall the COVIDSafe app

    I can’t think of a more unequivocal title than that. 🙂

    The Australian government doesn’t have a good track record of either launching publicly visible software projects, or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is, while it has some relatively minor problems, it appears to be a solid first version.

    Privacy

    While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

    The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, neither do non-health departments (eg, law enforcement, intelligence).

    Technical

    It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

    Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

    If it produces useful data, however, I’m willing to put up with some quirks. 🙂

    Usefulness

    I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

    Install it!

    The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

    ,

    Andrew RuthvenInstall Fedora CoreOS using FAI

    I've spent the last couple of days trying to deploy Fedora CoreOS to some physical hardware/bare metal for a colleague using the official PXE installer from Fedora CoreOS. It wasn't very pleasant, and just wouldn't work reliably.

    Maybe my expectations were too high, in that I thought I could use Ignition to prepare more of the system for me, as my colleague has been able to do bare metal installs correctly. I just tried to use Ignition as documented.

    A few interesting aspects I encountered:

    1. The PXE installer for it has a 618MB initrd file. This takes quite a while to transfer via tftp!
    2. It can't build software RAID for the main install device (and the developers have no intention of adding this), and it seems very finicky to build other RAID sets for other partitions.
    3. And, well, I just kept having problems where the built systems would hang during boot for no obvious reason.
    4. The time to do an installation was incredibly long.
    5. The initrd image is really just running coreos-installer against the nominated device.

    During the night I got fed up with that process and wrote a Fully Automatic Installer (FAI) profile that'd install CoreOS instead. I can now use setup-storage from FAI using its standard disk_config files. This allows me to build complicated disk configurations with software RAID and LVM easily.

    A big bonus is that a rebuild is a lot faster: timed from typing reboot to a fresh login prompt, it takes 10 minutes - and this is on physical hardware, so that includes BIOS POST and RAID controller set up, twice each.

    I thought this might be of interest to other people, so the FAI profile I developed for this is located here: https://github.com/catalyst-cloud/fai-profile-fedora-coreos

    FAI was initially developed to deploy Debian systems; it has since been extended to install a number of other operating systems. I think this is a good example of how easy it is to deploy non-Debian-derived operating systems using FAI without having to modify FAI itself.

    ,

    Gary PendergastBebo, Betty, and Jaco

    Wait, wasn’t WordPress 5.4 just released?

    It absolutely was, and congratulations to everyone involved! Inspired by the fine work done to get another release out, I finally completed the last step of co-leading WordPress 5.0, 5.1, and 5.2 (Bebo, Betty, and Jaco, respectively).

    My study now has a bit more jazz in it. 🙂

    ,

    Robert CollinsStrength training from home

    For the last year I’ve been incrementally moving away from lifting static weights and towards body weight based exercises, or callisthenics. I’ve been doing this for a number of reasons, including better avoidance of injury (if I collapse, the entire stack is dynamic, if a bar held above my head drops on me, most of the weight is just dead weight – ouch), accessibility during travel – most hotel gyms are very poor, and functional relevance – I literally never need to put 100 kg on my back, but I do climb stairs, for instance.

    Covid-19 shutting down the gym where I train is a mild inconvenience for me as a result, because even though I don’t usually do it, I am able to do nearly all my workouts entirely from home. And I thought a post about this approach might be of interest to other folk newly separated from their training facilities.

    I’ve gotten most of my information from a few different youtube channels:

    There are many more channels out there, and I encourage you to go and look and read and find out what works for you. Those 5 are my greatest hits, if you will. I’ve bought the FitnessFAQs exercise programs to help me with my training, and they are indeed very effective.

    While you don’t need a gymnasium, you do need some equipment, particularly if you can’t go and use a local park. Exactly what you need will depend on what you choose to do – for instance, doing dips on the edge of a chair can avoid needing any equipment, but doing them with some portable parallel bars can be much easier. Similarly, doing pull ups on the edge of a door frame is doable, but doing them with a pull-up bar is much nicer on your fingers.

    Depending on your existing strength you may not need bands, but I certainly did. Buying rings is optional – I love them, but they aren’t needed to have a good solid workout.

    I bought parallettes for working on the planche. Parallel bars for dips and rows. A pull-up bar for pull-ups and chin-ups, though with the rings you can add flys, rows, face-pulls, unstable push-ups and more. The rings. And a set of 3 bands that combine for 7 different support amounts.

    In terms of routine, I do an upper/lower split, with 3 days on upper body, one day off, one day on lower, and the weekends off entirely. I was doing 2 days on lower body, but found I was over-training with Aikido later that same day.

    On upper body days I’ll do (roughly) chin ups or pull ups, push ups, rows, dips, hollow body and arch body holds, handstands and some grip work. Today, as I write this on Sunday evening, 2 days after my last training day on Friday, I can still feel my lats and biceps from training Friday afternoon. Zero issue keeping the intensity up.

    For lower body, I’ll do pistol squats, nordic drops, quad extensions, wall sits, single leg calf raises, bent leg calf raises. Again, zero issues hitting enough intensity to achieve growth / strength increases. The only issue at home is having a stable enough step to get a good heel drop for the calf raises.

    If you haven’t done bodyweight training at all before, when starting, don’t assume it will be easy – even if you’re a gym junkie, our bodies are surprisingly heavy, and there’s a lot of resistance just moving them around.

    Good luck, train well!

    OpenSTEMOnline Teaching

    The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]

    The post Online Teaching first appeared on OpenSTEM Pty Ltd.

    Brendan ScottCovid 19 Numbers – lag

    Recording some thoughts about Covid 19 numbers.

    Today’s figures

    The Government says:

    “As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

    The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

    Estimating Lag

    If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

    Incubation Lag – about 5 days

    When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent.  The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in a 1000 cases will develop symptoms after 14 days).

    Presentation Lag – about 2 days

    I think it’s fair to also assume that people are not presenting for testing as soon as they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test. Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

    Referral Lag – about a day

    Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

    Testing lag – about 2 days

    The graph of infections “epi graph” today looks like this:

    [Graph: new and cumulative COVID-19 cases in Australia by notification date, as at 22 March 2020]

    One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

    That would mean that neither the peaks nor the troughs are representative of infection surges/retreats; they simply reflect when tests are being processed. This seems to be a 4 day cycle, so, on average, it seems that it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

    Total lag

    From the date someone is infected to the time that they receive a positive confirmation is about:

    lag = time for symptoms to show + time to seek a test + referral time + time for the test to return a result

    So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

    If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.

    What the lag means for Physical (ie Social) Distancing

    The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

    The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

    Estimating Actual Infections as at Today

    How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.
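
    As a quick check of that compounding arithmetic (my sketch in Ruby, not from the original post):

    # Scale today's confirmed figure by (1 + daily growth rate) ** lag_days
    # to estimate infections as at today.
    lag_days = 10

    factor_at_10_percent   = (1.10  ** lag_days).round(1)  # => 2.6, the "about 2.5" factor
    factor_at_23_5_percent = (1.235 ** lag_days).round(1)  # => 8.3, the "about 8" factor
    puts factor_at_10_percent, factor_at_23_5_percent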

    There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

    Note:

    This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

    ,

    Clinton Roylca2020 ReWatch 2020-02-02

    As I was an organiser of the conference this year, I didn’t get to see many talks. Fortunately many of the talks were recorded, so I get to watch the conference well after the fact.

    Conference Opening

    That white balance on the lectern slides is indeed bad; I really should get around to adding this as a suggestion on the logos documentation. (With some help, I put up all the lectern covers – it was therapeutic and rush free.)

    I actually think there was a lot of information in this introduction. Perhaps too much?

    OpenZFS and Linux

    A nice update on where zfs is these days.

    Dev/Ops relationships, status: It’s Complicated

    A bit of  a war story about production systems, leading to a moment of empathy.

    Samba 2020: Why are we still in the 1980s for authentication?

    There are a lot of old security standards that are showing their age; there are a lot of modern security standards, but which to choose?

    Tyranny of the Clock

    A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

    Configuration Is (riskier than?) Code

    Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuration is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.

    Easy Geo-Redundant Handover + Failover with MARS + systemd

    Using a local process organiser to handle a cluster – interesting, but not something I’d really promote. Not the best video cutting in this video; lots of time with the speaker pointing to his slides offscreen.

     

    ,

    Robert Collins2019 in the rearview

    2019 was a very busy year for us. I hadn’t realised how busy it was until I sat down to write this post. There’s also some moderately heavy stuff in here – if you have topics that trigger you, perhaps make sure you have spoons before reading.

    We had all the usual stuff. Movies – my top two were Alita and Abominable though the Laundromat and Ford v Ferrari were both excellent and moving pieces. I introduced Cynthia to Teppanyaki and she fell in love with having egg roll thrown at her face hole.

    When Cynthia started school we dropped gymnastics due to the time overload – we wanted some downtime for her to process after school, and with violin having started that year she was just looking so tired after a full day of school we felt it was best not to have anything on. Then last year we added in a specific learning tutor to help with the things that she approaches differently to the other kids in her class, giving 2 days a week of extra curricular activity after we moved swimming to the weekends.

    At the end of last year she was finally chipper and with it most days after school, and she had been begging to get into more stuff, so we all got together and negotiated drama class and Aikido.

    The drama school we picked, HSPA, is pretty amazing. Cynthia adored her first teacher there, and while upset at a change when they rearranged classes slightly, is again fully engaged and thrilled with her time there. Part of the class is putting on a full scale production – they did a version of the Happy Prince near the end of term 3 – and every student gets a part, with the ability for the older students to audition for more parts. On the other hand she tells me tonight that she wants to quit. So shrug, who knows :).

    I last did martial arts when I took Aikido with sensei Darren Friend at Aikido Yoshinkai NSW back in Sydney, in the late 2000’s. And there was quite a bit less of me then. Cynthia had been begging to take a martial art for about 4 years, and we’d said that when she was old enough, we’d sign her up, so this year we both signed up for Aikido at the Rangiora Aikido Dojo. The Rangiora dojo is part of the NZ organisation Aikido Shinryukan which is part of the larger Aikikai style, which is quite different, yet the same, as the Yoshinkai Aikido that I had been learning. There have been quite a few moments where I have had to go back to something core – such as my stance – and unlearn it, to learn the Aikikai technique. Cynthia has found the group learning dynamic a bit challenging – she finds the explanations – needed when there are twenty kids of a range of ages and a range of experience – from new intakes each term through to ones that have been doing it for 5 or so years – get boring, and I can see her just switch off. Then she misses the actual new bit of information she didn’t have previously :(. Which then frustrates her. But she absolutely loves doing it, and she’s made a couple of friends there (everyone is positive and friendly, but there are some girls that like to play with her after the kids lesson). I have gotten over the body disconnect and awkwardness and things are starting to flow, I’m starting to be able to reason about things without just freezing in overload all the time, so that’s not bad after a year. However, the extra weight is making my forward rolls super super awkward. I can backward roll easily, with moderately good form; forward rolls though my upper body strength is far from what’s needed to support my weight through the start of the roll – my arm just collapses – so I’m in a sort of limbo – if I get the moment just right I can just start the contact on the shoulder; but if I get the moment slightly wrong, it hurts quite badly. And since I don’t want large scale injuries, doing the higher rolls is very unnerving for me. I suspect its 90% psychological, but am not sure how to get from where I am to having confidence in my technique, other than rinse-and-repeat. My hip isn’t affecting training much, and sensei Chris seems to genuinely like training with Cynthia and I, which is very nice: we feel welcomed and included in the community.

    Speaking of my hip – earlier this year something ripped cartilage in my right hip – ended up having to have an MRI scan – and those machines sound exactly like a dot matrix printer – to diagnose it. Interestingly, having the MRI improved my symptoms, but we are sadly in hurry-up-and-wait mode. Before the MRI, I’d wake up at night with some soreness, and my right knee bent, foot on the bed, then sleepily let my leg collapse sideways to the right – and suddenly be awake in screaming agony as the joint opened up with every nerve at its disposal. When the MRI was done, they pumped the joint full of local anaesthetic for two purposes – one is to get a clean read on the joint, and the second is so that they can distinguish between referred surrounding pain, vs pain from the joint itself. It is to be expected with a joint issue that the local will make things feel better (duh), for up to a day or so while the local dissipates. The expression on the specialists face when I told him that I had had a permanent improvement trackable to the MRI date was priceless. Now, when I wake up with joint pain, and my leg sleepily falls back to the side, its only mildly uncomfortable, and I readjust without being brought to screaming awakeness. Similarly, early in Aikido training many activities would trigger pain, and now there’s only a couple of things that do. In another 12 or so months if the joint hasn’t fully healed, I’ll need to investigate options such as stem cells (which the specialist was negative about) or steroids (which he was more negative about) or surgery (which he was even more negative about). My theory about the improvement is that the cartilage that was ripped was sitting badly and the inflation for the MRI allowed it to settle back into the appropriate place (and perhaps start healing better). I’m told that reducing inflammation systematically is a good option. Turmeric time.

    Sadly Cynthia has had some issues at school – she doesn’t fit the average mould and while widespread bullying doesn’t seem to be a thing, there is enough of it, and she receives enough of it that it’s impacted her happiness more than a little – this blows up in school and at home as well. We’ve been trying a few things to improve this – helping her understand why folk behave badly, what to do in the moment (e.g. this video), but also that anything that goes beyond speech is assault and she needs to report that to us or teachers no matter what.

    We’ve also had some remarkably awful interactions with another family at the school. We thought we had a friendly relationship, but I managed to trigger a complete meltdown of the relationship – not by doing anything objectively wrong, but because we had (unknown to me) different folkways, and some perfectly routine and normal behaviour turned out to be stressful and upsetting to them, and then they didn’t discuss it with us at all until it had brewed up in their heads into a big mess… and it’s still not resolved (and may not ever be: they are avoiding us both).

    I weighed in at 110kg this morning. Jan the 4th 2019 I was 130.7kg. Feb 1 2018 I was 115.2kg. This year I peaked at 135.4kg, and got down to 108.7kg before Christmas food set in. That’s pretty happy making all things considered. Last year I was diagnosed with Coitus headaches and though I didn’t know it the medicine I was put on has a known side effect of weight gain. And it did – I had put it down to ongoing failure to manage my diet properly, but once my weight loss doctor gave me an alternative prescription for the headaches, I was able to start losing weight immediately. Sadly, though the weight gain through 2018 was effortless, losing the weight through 2019 was not. Doable, but not effortless. I saw a neurologist for the headaches when they recurred in 2019, and got a much more informative readout on them, how to treat and so on – basically the headaches can be thought of as an instability in the system, and the medicines goal is to stabilise things, and once stable for a decent period, we can attempt to remove the crutch. Often that’s successful, sometimes not, sometimes its successful on a second or third time. Sometimes you’re stuck with it forever. I’ve been eating a keto / LCHF diet – not super strict keto, though Jonie would like me to be on that, I don’t have the will power most of the time – there’s a local truck stop that sells killer hotdogs. And I simply adore them.

    I started this year working for one of the largest companies on the planet – VMware. I left there in February and wrote a separate post about that. I followed that job with nearly the polar opposite – a startup working on a blockchain content distribution system. I wrote about that too. Changing jobs is hard in lots of ways – for instance I usually make friendships at my jobs, and those suffer some when you disappear to a new context – not everyone makes connections with you outside of the job context. Then there’s the somewhat non-rational emotional impact of not being in paid employment. The puritans have a lot to answer for. I’m there again, looking for work (and hey, if you’re going to be at Linux.conf.au (Gold Coast Australia January 13-17) I’ll be giving a presentation about some of the interesting things I got up to in the last job interregnum I had).

    My feet have been giving me trouble for a couple of years now. My podiatrist is reasonably happy with my progress – and I can certainly walk further than I could – I even did some running earlier in the year, until I got shin splints. However, I seem to have hyper sensitive soles, so she can’t correct my pro-nation until we fix that, which at least for now means a 5 minute session where I touch my feet, someone else does, then something smooth then something rough – called “sensory massage”.

    In 2017 and 2018 I injured myself at the gym, and in 2019 I wanted to avoid that, so I sought out ways to reduce injury. Moving away from machines was a big part of that; more focus on technique another part. But perhaps the largest part was moving from lifting dead weight to focusing on body weight exercises – callisthenics. This shifts from a dead weight to control when things go wrong, to an active weight, which can help deal with whatever has happened. So far at least, this has been pretty successful – although I’ve had minor issues – I managed to inflame the fatty pad the olecranon displaces when your elbow locks out – I’m nearly entirely transitioned to a weights-free program – hand stands, pistol squats, push ups, dead hangs and so on. My upper body strength needs to come along some before we can really go places though… and we’re probably going to max out the hamstring curl machine (at least for regular two-leg curls) before my core is strong enough to do a Nordic drop.

    Lynne has been worried about injuring herself with weight lifting at the gym for some time now, but recently saw my physio – Ben Cameron at Pegasus PhysioSouth – who is excellent, and he suggested that she could have less chronic back pain if she took weights back up again. She’s recently told me that I’m allowed one ‘told you so’ about this, since she found herself in a spot where previously she would have put herself in a poor lifting position, but the weight training gave her a better option and she intuitively used it, avoiding pain. So that’s a good thing – complicated because of her bodies complicated history, but an excellent trainer and physio team are making progress.

    Earlier this year she had a hell of a fright, with a regular eye checkup getting referred into a ‘you are going blind; maybe tomorrow, maybe within 10 years’ nightmare scenario. Fortunately a second opinion got a specialist who probably knows the same amount but was willing to communicate it with actual words… Lynne has a condition which diabetes (type I or II) can affect, and she has a vein that can alter state somewhat arbitrarily but will probably only degrade slowly, particularly if Lynne’s diet is managed as she has been doing.

    Diet-wise, Lynne also has been losing some weight but this is complicated by her chronic idiopathic pancreatitis. That’s code for ‘it keeps happening and we don’t know why’ pancreatitis. We’ve consulted a specialist in the North Island who comes highly recommended by Lynne’s GP, who said that rapid weight loss is a little known but possible cause of pancreatitis – and that fits the timelines involved. So Lynne needs to lose weight to manage the onset of type II diabetes. But not too fast, to avoid pancreatitis, which will hasten the onset of type II diabetes. Aiee. Slow but steady – she’s working with the same doctor I am for that, and a similar diet, though lower on the fats as she has no gall… bladder.

    In April our kitchen waste pipe started chronically blocking, and investigation with a drain robot revealed a slump in the pipe. Ground penetrating radar revealed an anomaly under the garage… and this escalated. We’re going to have to move out of the house for a week while half the house’s carpets are lifted, grout is pumped into the foundations to tighten it all back up again – and hopefully they don’t over pump it – and then it all gets replaced. Oh, and it looks like the drive will be replaced again, to fix the slumped pipe permanently. It will be lovely when done but right now we’re facing a wall of disruption and argh.

    Around September I think, we managed to have a gas poisoning scare – our gas hob was left on and triggered a fireball which fortunately only scared Lynne rather than flambéing her face. We did however not know how much exposure we’d had to the LPG, nor to partially combusted gas – which produces toxic CO as a by-product, so there was a trip into the hospital for observation with Cynthia, with Lynne opting out. Lynne and Cynthia had had plenty of the basic symptoms – headaches, dizziness and so on at the time, but after waiting for 2 hours in the ER queue that had faded. Le sigh. The hospital, bless their cotton socks, don’t have the necessary equipment to diagnose CO poisoning without a pretty invasive blood test, but still took Cynthia’s vitals using methods (manual observation and an infra-red reader) that are confounded by the carboxyhemoglobin that forms from the CO that has been inhaled. Pretty unimpressed – our GP was livid. (This is one recommended protocol). Oh, and when we got the gas hob checked out – as we were not sure if we had left it on, or it had been misbehaving – it turned out to have never been safe, got decertified, and had the pipe cut at the regulator. So we’re cooking on a portable induction hob for now.

    When we moved to Rangiora I was travelling a lot more, Christchurch itself had poorer air quality than Rangiora, and our financial base was a lot smaller. Now, Rangiora’s population has nearly doubled (13k to 19k conservatively – and that’s ignoring the surrounds that use Rangiora as a base), we have more to work with, the air situation in Christchurch has improved massively, and even a busy year’s travel is less than I was doing before Cynthia came along. We’re looking at moving – we’re not sure where yet; maybe more country, maybe more city.

    One lovely bright spot over the last few years has been reconnecting with friends from school, largely on Facebook – some of whom I had forgotten that I knew back at school – I had a little clique but was not very aware of the wider school population in hindsight (this was more than a little embarrassing to me, as I didn’t want to blurt out “who are you?!”) – and others whom I had not :). Some of these reconnections are just light touch person-X exists and cares somewhat – and that’s cool. One in particular has grown into a deeper friendship than we had back as schoolkids, and I am happy and grateful that that has happened.

    Our cats are fat and happy. Well mostly. Baggy is fat and stressed and spraying his displeasure everywhere whenever the stress gets too much :(. Cynthia calls him Mr Widdlepants. The rest of the time he cuddles and purrs and is generally happy with life. Dibbler and Kitten-of-the-wild are relatively fine with everything.

    Cynthia’s violin is coming along well. She did a small performance for her classroom (with her teacher) and wowed them. I’ve been inspired to start practising trumpet again. After 27 years of decay my skills are decidedly rusty, but they are coming along. Finding arrangements for violin + trumpet is a bit challenging, and my sight-reading-with-transposition struggles to cope, but we make do. Lynne is muttering about getting a clarinet or drum-kit and joining in.

    So, 2019. Whew. I hope yours was less stressful and had as many or more bright points than ours. Onwards to 2020.

    ,

    BlueHackersBlueHackers crowd-funding free psychology services at LCA and other conferences

    BlueHackers has in the past arranged for a free counsellor/psychologist at several conferences (LCA, OSDC). Given the popularity and great reception of this service, we want to make this a regular thing and try to get this service available at every conference possible – well, at least Australian open source and related events.

    Right now we’re trying to arrange for the service to be available at LCA2020 at the Gold Coast; we have excellent local psychologists already, and the LCA organisers are working on some of the logistical aspects.

    Meanwhile, we need to get the funds organised. Fortunately this has never been a problem with BlueHackers, people know this is important stuff. We can make a real difference.

    Unfortunately BlueHackers hasn’t yet completed its transition from OSDClub project to Linux Australia subcommittee, so this fundraiser is running in my personal name. Well, you know who I (Arjen) am, so I hope you’re all ok with that.

    We have a little over a week until LCA2020 starts, let’s make this happen! Thanks. You can donate via MyCause.


    ,

    Robert CollinsA Cachecash retrospective

    In June 2019 I started a new role as a software engineer at a startup called Cachecash. Today is probably the last day of payroll there, and as is my usual practice, I’m going to reflect back on my time there. Less commonly, I’m going to do so in public, as we’re about to open the code (yay), and it’s not a mega-corporation with everything shuttered up (also yay).

    Framing

    This is intended to be a blameless reflection on what has transpired. Blameless doesn’t mean inaccurate; but it means placing the focus on the process and system, not on the particular actor that happened to be wearing the hat at the time a particular event happened. Sometimes the system is defined by the actors, and in that case – well, I’ll let you draw your own conclusions if you encounter that case.

    A retrospective that we can’t learn from is useless. Worse than useless, because it takes time to write and time to read and that time is lost to us forever. So if a thing is a particular way, it is going to get said. Not to be mean, but because false niceness will waste everyone’s time. Mine and my ex-colleagues whose time I respect. And yours, if you are still reading this.

    What was Cachecash

    Cachecash was a startup – still is in a very technical sense, corporation law being what it is. But it is still a couple of code bases – and a nascent open source project (which will hopefully continue) – built to operationalise and productise this research paper that the Cachecash founders wrote.

    What it isn’t anymore is a company investing significant amounts of time and money in the form of engineering in making code, to make those code bases better.

    Cachecash was also a team of people. That obviously changed over time, but at the time I write this it is:

    • Ghada
    • Justin
    • Kevin
    • Marcus
    • Petar
    • Robert
    • Scott

    And we’re all pretty fantastic, if you ask me :).

    Technical overview

    The CAPNet paper that I linked above doesn’t describe a product. What it describes is a system that permits paying caches (think squid/varnish etc) for transmitting content to clients, while also detecting attempts by such caches to claim payment when they haven’t transmitted, or attempting to collude with a client to pretend to overtransmit and get paid that way. A classic incentives-aligned scheme.

    Note that there is no blockchain involved at this layer.

    The blockchain was added into this core system as a way to build a federated marketplace – the idea was that the blockchain provided a suitable substrate for negotiating the purchase and sale of contracts that would be audited using the CAPNet accounting system, the payments could be micropayments back onto the blockchain, and so on – we’d avoid the regular financial system, and we wouldn’t be building a fragile central system that would prevent other companies also participating.

    Miners would mine coins, publishers would buy coins then place them in escrow as a promise to pay caches to deliver content to clients, and a client would deliver proof of delivery back to the cache which would then claim payment from the publisher.

    Technical Challenges

    There were a few things that turned up as significant issues. In no particular order:

    The protocol

    The protocol itself adds additional round trips to multiple peers – in its ‘normal’ configuration the client ends up running GRPC (web-GRPC for browsers) connections to 5 endpoints (with all the normal windowing concerns, but potentially over QUIC), and then gets chunks of content in batches (concurrently) from 4 of the peers, runs a small crypto brute force operation on the combined result, and then moves onto the next group of content. This should be sounding suspiciously like TCP – it is basically a window management problem, and it has exactly the same performance management problems – fast start, maximum window size, how far to reduce it when problems are suffered. But accentuated: those 4 cache peers can all suffer their own independent noise problems, or be hostile. But they can also suffer correlated problems: they might all be in the same datacentre, or be all run by a hostile actor, or the client might be on a hostile WiFi link, or the client’s OS/browser might be hostile. Let’s just say that there is a long, rich road for optimising this new protocol to make it fast, robust and reliable. Much as we have taken many years to make HTTP into QUIC, drawing upon techniques like forward error correction rather than retries – similar techniques will need to be applied to give this protocol similar performance characteristics. And evolving the protocol while maintaining the security properties is a complicated task, with three actors involved, who may collude in various ways.
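
    To make the shape of that loop concrete, here is a minimal sketch in Rust of the fetch-group-then-brute-force cycle, using scoped threads for the concurrent requests. The function names, the four hard-coded cache names and the puzzle step are stand-ins for illustration, not the real Cachecash client API.

    use std::thread;

    // Stand-ins for the real client operations; names and signatures are illustrative only.
    fn fetch_chunk(cache: &str, group: usize) -> Vec<u8> {
        // In the real protocol this would be a (web-)GRPC request to one cache.
        format!("{cache}:{group}").into_bytes()
    }

    fn solve_puzzle(combined: &[u8]) -> u64 {
        // Placeholder for the small crypto brute-force step over the combined chunks.
        combined.iter().map(|b| *b as u64).sum()
    }

    fn main() {
        let caches = ["cache-a", "cache-b", "cache-c", "cache-d"];
        for group in 0..3 {
            // Fetch this group's chunk from all four caches concurrently...
            let chunks: Vec<Vec<u8>> = thread::scope(|s| {
                let handles: Vec<_> = caches
                    .iter()
                    .map(|c| s.spawn(move || fetch_chunk(c, group)))
                    .collect();
                handles.into_iter().map(|h| h.join().unwrap()).collect()
            });
            // ...then combine them and run the brute-force step before moving on.
            let combined: Vec<u8> = chunks.concat();
            println!("group {group}: puzzle solution {}", solve_puzzle(&combined));
        }
    }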

    An early performance analysis I did on the Go code implementation showed that the brute forcing work was a bottleneck because while the time (once optimised) per second was entirely modest for any small amount of data, the delay added per window element acts as a brake on performance for high capacity, low latency links. For a 1Gbps 25ms RTT link I estimated a need for 8 cores doing crypto brute forcing on the client.

    JS

    Cachecash is essentially implementing a new network protocol. There are some great hooks these days in browsers, and one can hook in and provide streams to things like video players to let them get one segment of video. However, for downloading an entire file – for instance, if one is downloading a full video – it is not so easy. This bug, open for 2 years now, is the standards-based way to do it. Even the non-standards-based way to do it involves buffering the entire content in memory, oh and reflecting everything through a static github service worker. (You of course host such a static page yourself, but then the whole idea of this federated distributed system breaks down a little).

    Our initial JS implementation was getting under 512KBps with all-local servers – part of that was the bandwidth delay product issue mentioned above. Moving to getting chunks of content from each cache concurrently using futures improved that up to 512KBps, but that’s still shocking for a system we want to be able to compete with the likes of YouTube, Cloudflare and Akamai.

    One of the hot spots turned out to be calculating SHA-256 values – the CAPNet algorithm calculates thousands (it’s tunable, but 8k in the set I was analysing) of independent SHA’s per chunk of received data. This is a problem – in browser SHA routines, even the recent native hosted ones – are slow per SHA. They are not slow per byte. Most folk want to make a small number of SHA calculations. Maybe thousands in total. Not tens of thousands per MB of data received….. So we wrote an implementation of the core crypto routines in Rust WASM, which took our performance locally up to 2MBps in Firefox and 6MBps in Chromium.
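
    For a sense of what that hot path looks like once it moves to Rust, here is a minimal sketch of batched SHA-256 hashing using the sha2 crate. It is illustrative only, not the actual Cachecash WASM code: the slicing scheme is made up, and the 8k-digests-per-chunk figure is just the number quoted above.

    use sha2::{Digest, Sha256};

    /// Compute many independent SHA-256 digests over small slices of one received chunk.
    /// The point is the sheer number of separate digests per chunk, which is what
    /// punishes per-call overhead in the browser SHA routines.
    fn hash_chunk(chunk: &[u8], digests_per_chunk: usize) -> Vec<[u8; 32]> {
        let piece = (chunk.len() / digests_per_chunk).max(1);
        chunk
            .chunks(piece)
            .take(digests_per_chunk)
            .map(|slice| {
                let mut hasher = Sha256::new();
                hasher.update(slice);
                let digest: [u8; 32] = hasher.finalize().into();
                digest
            })
            .collect()
    }

    fn main() {
        let chunk = vec![0u8; 256 * 1024];       // pretend this arrived from a cache
        let digests = hash_chunk(&chunk, 8_192); // ~8k digests per chunk, per the analysis above
        println!("computed {} digests", digests.len());
    }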

    It is also possible we’d show up as crypto-JS at that point and be blacklisted as malware!

    Blockchain

    Having chosen to involve a block chain in the stack we had to deal with that complexity. We chose to take bitcoin’s good bits and run with those rather than either running a sidechain, trying to fit new transaction types into bitcoin itself, or trying to shoehorn our particular model into e.g. Ethereum. This turned out to be a fairly large amount of work: not the core chain itself – cloning the parts of bitcoin that we wanted was very quick. But then layering on the changes that we needed, to start dealing with escrows and negotiating parameters between components and so forth. And some of the operational challenges below turned up here as well even just in developer test setups (in particular endpoint discovery).

    Operational Challenges

    The operational model was pretty interesting. The basic idea was that eventually there would be this big distributed system, a bit-coin like set of miners etc, and we’d be one actor in that ecosystem running some subset of the components, but that until then we’d be running:

    • A centralised ledger
    • Centralised random number generation for the micropayment system
    • Centralised deployment and operations for the cache fleet
    • Software update / vetting for the publisher fleet
    • Software update / publishing for the JS library
    • Some number of seed caches
    • Demo publishers to show things worked
    • Metrics, traces, chain explorer, centralised logging

    We had most of this live and running in some fashion for most of the time I was there – we evolved it and improved it a number of times as we iterated on things. Where appropriate we chose open source components like Jaeger, Prometheus and Elasticsearch. We also added policy layers on top of them to provide rate limiting and anti-spoofing facilities. We deployed stuff in AWS, with EKS, and there were glitches and things to work around but generally only a tiny amount of time went into that part of it. I think I spent a day on actual operations a month, or thereabouts.

    Other parties were then expected to bring along additional caches to expand the network, additional publishers to expand the content accessible via the network, and clients to use the network.

    Ensuring a process run by a third party is network reachable by a browser over HTTPS is a surprisingly non-simple problem. We partly simplified it by mandating that they run a docker container that we supplied, but there’s still the chance that they are running behind a firewall with asymmetric ingress. And after that we still need a domain name for their endpoint. You can give every cache a CNAME in a dedicated subdomain – say using their public key as the subdomain, so that only that cache can issue requests to update their endpoint information in DNS. It is all solvable, but doing it so that the amount of customer interaction and handholding is reduced to the bare minimum is important: a user with a fleet of 1000 machines doesn’t want to talk to us 1000 times, and we don’t want to talk to them either. But this was another bit of this-isn’t-really-distributed-is-it grit in the distributed-ointment.
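
    As an illustration of that naming scheme, a sketch of deriving a stable, DNS-safe label from a cache’s public key might look like the following; the zone name and the truncated-hash encoding are hypothetical, since the text above only says the public key identifies the subdomain, not the exact encoding used.

    use sha2::{Digest, Sha256};

    /// Derive a DNS label for a cache from its public key, so that only that cache
    /// can be authorised to update its own endpoint record. The zone name and the
    /// truncated-hash encoding are made-up examples.
    fn cache_subdomain(public_key: &[u8]) -> String {
        let digest = Sha256::digest(public_key);
        let label: String = digest.iter().take(16).map(|b| format!("{b:02x}")).collect();
        format!("{label}.caches.example.net")
    }

    fn main() {
        let fake_key = [0x42u8; 32]; // stand-in for a cache's real public key
        // The operator would then publish: <this name> CNAME <the cache's actual host>.
        println!("{}", cache_subdomain(&fake_key));
    }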

    Adoption Challenges

    ISPs with large fleets of machines are in principle happy to sell capacity on them in return for money – yay. But we have no revenue stream at the moment, so they aren’t really incentivised to put effort in, it becomes a matter of principle, not a fiscal “this is 10x better for my business” imperative. And right now, it’s 10x slower than HTTP. Or more.

    Content owners with large amounts of content being delivered without a CDN would like a radically cheaper CDN. Except – we’re not actually radically cheaper on a cost structure basis. Current CDNs are expensive for their 2nd and 3rd generation products because no-one else offers what they offer – seamless in-request edge computing. But that ISP that is contributing a cache to the fleet is going to want the cache paid for, and that’s the same cost structure as existing CDNs – who often have a free entry tier. We might have been able to make our network cheaper eventually, but I’m just not sure about the radically cheaper bit.

    Content owners who would like a CDN marketplace where the CDN caches are competing with each other – driving costs down – rather than the CDN operators competing – would absolutely love us. But I rather suspect that those owners want more sophisticated offerings. To be clear, I wasn’t on the customer development team, and didn’t get much in the way of customer development briefings. But things like edge computing workers, where completely custom code can run in the CDN network, adjacent to one’s users, are much more powerful offerings than simple static content shipping offerings, and offered by all major CDNs. These are trusted services – the CAPNet paper doesn’t solve the problem of running edge code and providing proof that it was run. Enarx might go some way, or even a long way, to running such code in an untrusted context, but providing a proof that it was run – so that running it can become a mining or mining-like operation – is a whole other question. Without such an answer, an edge computing network starts to depend on trusting the caches’ behaviour a lot more all over again – the network has no proof of execution to depend on.

    Rapid adjustment – load spikes – is another possible use case, but the use of the blockchain to negotiate escrows actually seemed to work against our ability to offer that. Akamai define a load spike in a time frame faster than many blockchains can decide that a transaction has actually been accepted. Offchain transactions are of course a known thing in the blockchain space but again that becomes additional engineering.

    Our use of a new network protocol – for all that it was layered on standard web technology – made it harder for potential content owners to adopt our technology. Rather than “we have 200 local proxies that will deliver content to your users, just generate a url of the form X.Y.Z”, our solution is “we do not trust the 200 local proxies that we have, so you need to run complicated JS in your browser/phone app etc” to verify that the proxies are actually doing their job. This is better in some ways – precisely because we don’t trust those proxies – but it also increases the runtime cost of using the service, the integration cost of adopting the service, and the complexity of debugging issues receiving content via the service.

    What did we learn?

    It is said that “A startup is an organization formed to search for a repeatable and scalable business model.” What did we uncover in our search? What can we take away going forward?

    In principle we have a classic two sided market – people with excess capacity close to users want to sell it, and people with excess demand for their content want to buy delivery capacity.

    The baseline market is saturated. The market as a whole is on its third or perhaps fourth (depending on how you define things) major iteration of functionality.

    Content delivery purchasers are ok with trusting their suppliers: any supply chain fraud happening in this space at the moment is so small that no-one I heard from is talking about it.

    Some of the things we were doing don’t seem to have been important to the customers we talked to – I don’t have a great read on this, but in particular, the blockchain aspect seems to have been more important to our long term vision than to the 2-sided market place that we perceived. It would be fascinating to me to validate that somehow – would cache capacity suppliers be willing to trust us enough to sell capacity to us with just the auditing mechanism, without the blockchain? Would content providers be happy buying credit from us rather than from a neutral exchange?

    What did I learn?

    I think in hindsight my startup muscles were atrophied – it had been some years since Canonical and it took a few months to start really thinking lean-startup again on a personal basis. That’s ok, because I was hired to build systems. But it’s not great, because I can do better. So number one: think lean-startup and really step up to help with learning and validation.

    I levelled up my Go lang skills. That was really nice – Kevin has deep knowledge there, and though I’ve written Go before I didn’t have a good appreciation for style or aesthetics, or why. I do now. Where before I’d say ‘I’m happy to dive in but it’s not a language I feel I really know’, I am now happy to say that I know Go. More to learn – there always is – but in a good place.

    I did a similar thing with my JS skills, but not to the same degree: I climbed fairly deeply into the JS client – which is written in TypeScript – converted its bundling system to webpack to work better with Rust-WASM, and so on. It’s still not my go-to place, but I’m much more comfortable there now.

    And of course playing with Rust-WASM was pure delight. Markus and I are both Rust aficionados, and having a genuine reason to write some Rust code for work was just delightful. Finding this bug was just a bonus :).

    It was also really really nice being back in a truly individual contributor role for a while. I really enjoyed being able to just fix bugs and get on with things while I got my bearings. I’ve ended up doing a bit more leadership – refining of requirements, translating between idea-and-specification and the like recently, but still about 80% of time has been able to be sit-down-and-code, and that really is a pleasant holiday.

    What am I going to change?

    I’m certainly going to get a new job :). If you’re hiring, hit me up. (If you don’t have my details already, linkedin is probably best).

    I think the core thing I need to do is more alignment of the day-to-day work I’m doing with the needs of customer development: I don’t want to take on or take over the customer development role – that will often be done best in person with a customer for startups, and I’m happy remote – but the more I can connect what I’m trying to achieve with what will get the customers to pay us, the more successful any business I’m working in will be. This may be a case for non-vanity metrics, or talking more with the customer-development team, or – well, I don’t know exactly what it will look like until I see the context I end up in, but I think more connection will be important.

    And I think the second major thing is to find a better balance between individual contribution and leadership. I love individual contribution, it is perhaps the least stressful and most Zen place to be. But it is also the least effective unless the project has exactly one team member. My most impactful and successful roles have been leadership roles, but the pure leadership role with no individual contribution slowly killed me inside. Pure individual contribution has been like I imagine crack to be, and perhaps just as toxic in the long term.

    ,

    Robert CollinsRust and distributions

    Daniel wrote a lovely blog post about Rust’s ability to be included in distributions, both as a language that you can get via the distribution, and as the language that components of the distribution are being written in.

    I think this is a great goal to raise and I have just a few thoughts and quibbles. First I want to acknowledge and agree with him on the Rust community, it’s so very nice, and he is doing a great thing as rustup lead; I wish I had more time to put in, I have more things I want to contribute to rustup. I’ll try to get back to the meetings soon.

    On trust

    I completely agree about the need for the crates index improvement: without those we cannot have a mirror network, and that’s a significant issue for offline users and slow-region users.

    On curlsh though

    It isn’t the worst possible thing, for all that it’s “untrusted bootstrapping”; the actual thing downloaded is https secured etc, and so is the rustup binary itself. Put another way, I think the horror is more perceptual than analyzed risk. Someone that trusts Verisign etc enough to download the Debian installer over it has exactly the same risk as someone trusting Verisign enough to download rustup at that point in time.

    Cross signing curlsh that with per-distro keys or something seems pretty ridiculous to me, since the root of trust is still that first download; unless you’re wandering up to someone who has bootstrapped their compiler by hand (to avoid reflections-on-trust attacks), to get an installer, to build a system, to then do reproducible builds, to check that other systems are actually safe… aieeee.

    I think it’s easier to package the curl|sh shell script in Debian itself perhaps? apt install get-rustup; then if / when rustup becomes packaged the user instructions don’t change but the root of trust would, as get-rustup would be updated to not download rustup, but to trigger a different package install, and so forth.

    I don’t think it’s desirable though, to have distribution forks of the contents that rustup manages – Debian+Redhat+Suse+… builds of nightly rust with all the things failing or not, and so on – I don’t see who that would help. And if we don’t have that then the root of trust would still not be shifted under the GPG keychain – it would still be the HTTPS infrastructure for downloading rust toolchains + the integrity of the rustup toolchain builds themselves. Making rustup, which currently shares that trust, have a different trust root, seems pointless.

    On duplication of dependencies

    I think Debian needs to become more inclusive here, not Rustup. Debian has spent – pauses, counts – yes, DECADES rejecting multiple entire ecosystems because of a prejudiced view about what the Right Way to manage dependencies is. And they are not right in a universal sense. They were right in an engineering sense: given constraints (builds are expensive, bandwidth is expensive, disk is expensive), they are right. But those are not universal constraints, and seeking to impose those constraints on Java and Node – it’s been an unmitigated disaster. It hasn’t made those upstreams better, or more secure, or systematically fixed problems for users. I have another post on this so rather than repeating I’m going to stop here :).

    I think Rust has – like those languages – made the crucial, maintainer and engineering efficiency important choice to embrace enabling incremental change across libraries, with the consequence that dependencies don’t shift atomically, and sure, this is basically incompatible with Debian packaging world view which says that point and patch releases of libraries are not distinct packages, and thus the shared libs for these things all coexist in the same file on disk. Boom! Crash!

    I assert that it is entirely possible to come up with a reasonable design for managing a repository of software that doesn’t make this conflation, would allow actual point and patch releases to exist as they are for the languages that have this characteristic, and be amenable to automation, auditing and reporting for security issues. E.g. Modernise Debian to cope with this fundamentally different language design decision… which would make Java and Node and Rust work so very much better.

    Alternatively, if Debian doesn’t want to make it possible to natively support languages that have made this choice, Debian could:

    • ship static-but-for-system-libs builds
    • not include things written in rust
    • ask things written in rust to converge their dependencies again and again and again (and only update them when the transitive dependencies across the entire distro have converged)

    I have a horrible suspicion about which Debian will choose to do :(. The blinkers / echo chamber are so very strong in that community.

    For Windows

    We got to parity with Linux for IO for non-McAfee users, but I guess there are a lot of them out there; we probably need to keep pushing on tweaking it until it works better for them too; perhaps autodetect McAfee and switch to minimal? I agree that making Windows users – like I am these days – feel tier one would be nice :). Maybe a survey of user experience would be a good starting point.

    Shared libraries

    Perhaps generating versioned symbols automatically and building many versions of the crate and then munging them together? But I’d also like to point here again that the whole focus on shared libraries is a bit of a distribution blind spot, and looking at the vast amount of distribution of software occurring in app stores and their model, suggests different ways of dealing with these things. See also the fairly specific suggestion I make about the packaging system in Debian that is the root of the problem in my entirely humble view.

    Bonus

    John Goerzen posted an entirely different thing recently, but in it he discusses programs that don’t properly honour terminfo. Sadly I happen to know that large chunks of the Rust ecosystem assume that everything is ANSI these days, and it certainly sounds like, at least for John, that isn’t true. So thats another way in which Rust could be more inclusive – use these things that have been built, rather than being modern and new age and reinventing the 95% match.

    ,

    Julien GoodwinSome thoughts on Storytelling as an engineering teaching tool

    Every week at work on Wednesday afternoons we have the SRE ops review, a relaxed two hour affair where SREs (& friends of, not all of whom are engineers) share interesting tidbits that have happened over the last week or so, this might be a great success, an outage, a weird case, or even a thorny unsolved problem. Usually these relate to a service the speaker is oncall for, or perhaps a dependency or customer service, but we also discuss major incidents both internal & external. Sometimes a recent issue will remind one of the old-guard (of which I am very much now a part) of a grand old story and we share those too.

    Often the discussion continues well into the evening as we decant to one of the local pubs for dinner & beer, sometimes chatting away until closing time (probably quite regularly actually, but I'm normally long gone).

    It was at one of these nights at the pub two months ago (sorry!), that we ended up chatting about storytelling as a teaching tool, and a colleague asked an excellent question, that at the time I didn't have a ready answer for, but I've been slowly pondering, and decided to focus on over an upcoming trip.

    As I start to write the first draft of this post I've just settled in for cruise on my first international trip in over six months[1], popping over to Singapore for the Melbourne Cup weekend, and whilst I'd intended this to be a holiday, I'm so terrible at actually having a holiday[2] that I've ended up booking two sessions of storytelling time, where I present the history of Google's production networks (for those of you reading this who are current or former engineering Googlers, similar to Traffic 101). It's with this perspective of planning, and having run those sessions that I'm going to try and answer the question that I was asked.

    Or at least, I'm going to split up the question I was asked and answer each part.

    "What makes storytelling good"

    On its own this is hard to answer, there are aspects that can help, such as good presentation skills (ideally keeping to spoken word, but simple graphs, diagrams & possibly photos can help), but a good story can be told in a dry technical monotone and still be a good story. That said, as with the rest of these items charisma helps.

    "What makes storytelling interesting"

    In short, a hook or connection to the audience, for a lot of my infrastructure related outage stories I have enough context with the audience to be able to tie the impact back in a way that resonates with a person. For larger disparate groups shared languages & context help ensure that I'm not just explaining to one person.

    In these recent sessions one was with a group of people who work in our Singapore data centre, in that session I focused primarily on the history & evolution of our data centre fabrics, giving them context to understand why some of the (at face level) stranger design decisions have been made that way.

    The second session was primarily people involved in the deployment side of our backbone networks, and so I focused more on the backbones, again linking with knowledge the group already had.

    "What makes storytelling entertaining"

    Entertaining storytelling is a matter of style, skills and charisma, and while many people can prepare (possibly with help) an entertaining talk, the ability to tell an entertaining story off the cuff is more of a skill, luckily for me, one I seem to do ok with. Two things that can work well are dropping in surprises, and where relevant some level of self-deprecation, however both need to be done very carefully.

    Surprises can work very well when telling a story chronologically "I assumed X because Y, <five minutes of waffling>, so it turned out I hadn't proved Y like I thought, so it wasn't X, it was Z", they can help the audience to understand why a problem wasn't solved so easily, and explaining "traps for young players" as Dave Jones (of the EEVblog) likes to say can themselves be really helpful learning elements. Dropping surprises that weren't surprises to the story's protagonist generally only works if it's as a punchline of a joke, and even then it often doesn't.

    Self-deprecation is an element that I've often used in the past, however more recently I've called others out on using it, and have been trying to reduce it myself, depending on the audience you might appear as a bumbling success or stupid, when the reality may be that nobody understood the situation properly, even if someone should have. In the ops review style of storytelling, it can also lead to a less experienced audience feeling much less confident in general than they should, which itself can harm productivity and careers.

    If the audience already had relevant experience (presenting a classic SRE issue to other SREs for example, a network issue to network engineers, etc.) then audience interaction can work very well for engagement. "So the latency graph for database queries was going up and to the right, what would you look at?" This is also similar to one of the ways to run a "wheel of misfortune" outage simulation.

    "What makes storytelling useful & informative at the same time"

    In the same way as interest, to make storytelling useful & informative for the audience involves consideration for the audience, as a presenter if you know the audience, at least in broad strokes this helps. As I mentioned above, when I presented my talk to a group of datacenter-focused people I focused on the DC elements, connecting history to the current incarnations; when I presented to a group of more general networking folk a few days later, I focused more on the backbones and other elements they'd encountered.

    Don't assume that a story will stick wholesale, just leaving a few keywords, or even just a vague memory with a few key words they can go digging for can make all the difference in the world. Repetition works too, sharing many interesting stories that share the same moral (for an example, one of the ops review classics is demonstrations about how lack of exponential backoff can make recovery from outages hard), hearing this over dozens of different stories over weeks (or months, or years...) it eventually seeps in as something to not even question having been demonstrated as such an obvious foundation of good systems.

    When I'm speaking to an internal audience I'm happy if they simply remember that I (or my team) exist and might be worth reaching out to in future if they have questions.

    Lastly, storytelling is a skill you need to practice, whether a keynote presentation in front of a few thousand people, or just telling tall tales to some mates at the pub, practice helps, and eventually many of the elements I've mentioned above become almost automatic. As can probably be seen from this post I could do with some more practice on the written side.

    1: As I write these words I'm aboard a Qantas A380 (QF1) flying towards Singapore, the book I'm currently reading, of all things about mechanical precision ("Exactly: How Precision Engineers Created the Modern World" or as it has been retitled for paperback "The Perfectionists"), has a chapter themed around QF32, the Qantas A380 that notoriously had to return to Singapore after an uncontained engine failure. Both the ATSB report on the incident and the captain Richard de Crespigny's book QF32 are worth reading. I remember I burned through QF32 one (very early) morning when I was stuck in GlobalSwitch Sydney waiting for approval to repatch a fibre, one of the few times I've actually dealt with the physical side of Google's production networks, and to date the only time the fact I live just a block from that facility has been used at all sensibly.

    2: To date, I don't think I've ever actually had a holiday that wasn't organised by family, or attached to some conference, event or work travel I'm attending. This trip is probably the closest I've ever managed (roughly equal to my burnout trip to Hawaii in 2014), and even then I've ruined it by turning two of the three weekdays into work. I'm much better at taking breaks that simply involve not leaving home or popping back to stay with family in Melbourne.

    ,

    Tim SerongNetwork Maintenance

    To my intense amazement, it seems that NBN Co have finally done sufficient capacity expansion on our local fixed wireless tower to actually resolve the evening congestion issues we’ve been having for the past couple of years. Where previously we’d been getting 22-23Mbps during the day and more like 2-3Mbps (or worse) during the evenings, we’re now back to 22-23Mbps all the time, and the status lights on the NTD remain a pleasing green, rather than alternating between green and amber. This is how things were way back at the start, six years ago.

    We received an email from iiNet in early July advising us of the pending improvements. It said:

    Your NBN™ Wireless service offers maximum internet speeds of 25Mbps download and 5Mbps upload.

    NBN Co have identified that your service is connected to a Wireless cell that is currently experiencing congestion, with estimated typical evening speeds of 3~6 Mbps. This congestion means that activities like browsing, streaming or gaming might have been and could continue to be slower than promised, especially when multiple people or devices are using the internet at the same time.

    NBN Co estimates that capacity upgrades to improve the speed congestion will be completed by Dec-19.

    At the time we were given the option of moving to a lower speed plan with a $10 refund because we weren’t getting the advertised speed, or to wait it out on our current plan. We chose the latter, because if we’d downgraded, that would have reduced our speed during the day, when everything was otherwise fine.

    We did not receive any notification from iiNet of exactly when works would commence, nor was I ever able to find any indication of planned maintenance on iiNet’s status page. Instead, I’ve come to rely on notifications from my neighbour, who’s with activ8me. He receives helpful emails like this:

    This is a courtesy email from Activ8me, Letting you know NBN will be performing Fixed Wireless Network capacity work in your area that might affect your connectivity to the internet. This activity is critical to the maintenance and optimisation of the network. The approximate dates of this maintenance/upgrade work will be:

    Impacted location: Neika, TAS & Downstream Sites & Upstream Sites
    NBN estimates interruption 1 (Listed Below) will occur between:
    Start: 24/09/19 7:00AM End: 24/09/19 8:00PM
    NBN estimates interruption 2 (Listed Below) will occur between:
    Start: 25/09/19 7:00AM End: 25/09/19 8:00PM
    NBN estimates interruption 3 (Listed Below) will occur between:
    Start: 01/10/19 7:00AM End: 01/10/19 8:00PM
    NBN estimates interruption 4 (Listed Below) will occur between:
    Start: 02/10/19 7:00AM End: 02/10/19 8:00PM
    NBN estimates interruption 5 (Listed Below) will occur between:
    Start: 03/10/19 7:00AM End: 03/10/19 8:00PM
    NBN estimates interruption 6 (Listed Below) will occur between:
    Start: 04/10/19 7:00AM End: 04/10/19 8:00PM
    NBN estimates interruption 7 (Listed Below) will occur between:
    Start: 05/10/19 7:00AM End: 05/10/19 8:00PM
    NBN estimates interruption 8 (Listed Below) will occur between:
    Start: 06/10/19 7:00AM End: 06/10/19 8:00PM

    Change start
    24/09/2019 07:00 Australian Eastern Standard Time

    Change end
    06/10/2019 20:00 Australian Eastern Daylight Time

    This is expected to improve your service with us however, occasional loss of internet connectivity may be experienced during the maintenance/upgrade work.
    Please note that the upgrades are performed by NBN Co and Activ8me has no control over them.
    Thank you for your understanding in this matter, and your patience for if it does affect your service. We appreciate it.

    The astute observer will note that this is pretty close to two weeks of scheduled maintenance. Sure enough, my neighbour and I (and presumably everyone else in the area) enjoyed major outages almost every weekday during that period, which is not ideal when you work from home. But, like I said at the start, they did finally get the job done.

    Interestingly, according to activ8me, there is yet more NBN maintenance scheduled from 21 October 07:00 ’til 27 October 21:00, then again from 28 October 07:00 ’til 3 November 21:00 (i.e. another two whole weeks). The only scheduled upgrade I could find listed on iiNet’s status page is CM-177373, starting “in 13 days” with a duration of 6 hours, so possibly not the same thing.

    Based on the above, I am convinced that there is some problem with iiNet’s status page not correctly reporting NBN incidents, but of course I have no idea whether this is NBN Co not telling iiNet, iiNet not listening to NBN Co, or if it’s just that the status web page is busted.

    ,

    Robert CollinsWant me to work with you?

    Reach out to me – I’m currently looking for something interesting to do. https://www.linkedin.com/in/rbtcollins/ and https://twitter.com/rbtcollins are good ways to grab me if you don’t already have my details.

    Should you reach out to me? Maybe :). First, a little retrospective.

    Three years ago, I wrote the following when reflecting on what I wanted to be doing:

    Priorities (roughly ordered most to least important):

    • Keep living in Rangiora (family)
    • Up to moderate travel requirements – 4 trips a year + LCA/PyCon
    • Significant autonomy (not at the expense of doing the right thing for the company, just I work best with the illusion of free will 🙂 )
    • Be doing something that matters
      • -> Being open source is one way to this, but not the only one
    • Something cutting edge would be awesome
      • -> Rust / Haskell / High performance requirements / scale / ….
    • Salary

    How well did that work for me? Pretty good. I had a good satisfying job at VMware for 3 years, met some wonderful people, achieved some very cool things. And those priorities above were broadly achieved.
    The one niggle that stands out was this – Did the things we were doing matter? Certainly there was no social impact – VMware isn’t a non-profit, being right at the core of capitalism as it is. There was direct connection and impact with the team, the staff we worked with and the users of the products… but it is just a bit hard to feel really connected through that: VMware is a very large company and there are many layers between users and developers.

    We were quite early adopters of Kubernetes, which allowed me to deepen my Go knowledge and experience some more fun with AWS scale operations. I had many interesting discussions about the relative strengths of Python Go and Rust and Java with colleagues there. (Hi Geoffrey).

    Company culture is very important to me, and VMware has a fantastically supportive culture. One of the most supportive companies I’ve been in, bar none. It isn’t a truly remote-organised company though: rather it’s a bunch of offices that talk to each other, which I think is sad. True remote-first offers so much more engagement.

    I enjoy building things to solve problems. I’ve either directly built, or shaped what is built, in all my most impactful and successful roles. Solving a problem once by hand is fine; solving it for years to come by creating a tool is far more powerful.

    I seem to veer into toolmaking very often: giving other people the ability to solve their problems takes the power of a tool and multiplies it even further.

    It should be no surprise then that I very much enjoy reading white papers like the original Dapper and Map-reduce ones, LinkedIn’s Kafka or for more recent fodder the Facebook Akkio paper. Excellent synthesis and toolmaking applied at industrial scale. I read those things and I want to be a part of the creation of those sorts of systems.

    I was fortunate enough to take some time to go back to university part-time, which though logistically challenging is something I want to see through.

    Thus I think my new roughly ordered (descending) list of priorities needs to be something like this:

    • Keep living in Rangiora (family)
    • Up to moderate travel requirements – 4 team-meeting trips a year + 2 conferences
    • Significant autonomy (not at the expense of doing the right thing for the company, just I work best with the illusion of free will 🙂 )
    • Be doing something that matters
      • Be working directly on a problem / system that has problems
    • Something cutting edge would be awesome
      • Rust / Haskell / High performance requirements / scale / ….
    • A generative (Westrum definition) + supportive company culture
    • Remote-first or at least very remote familiar environment
    • Support my part time study / self improvement initiative
    • Salary

    ,

    Clinton RoyRestricted Sleep Regime

    Since moving down to Melbourne my poor sleep has started up again. It’s really hard to say what the main factor driving this is. My doctor down here has put me onto a drug free way of trying to improve my sleep, and I think I kind of like it, while it’s no silver bullet, it is something I can go back to if I’m having trouble with my sleep, without having to get a prescription.

    The basic idea is to maximise sleep efficiency. If you’re only getting n hours sleep a night, only spend n hours  a night in bed. This forces you to stay up and go to bed rather late for a few nights. Hopefully, being tired will help you sleep through the night in one large segment. Once you’ve successfully slept through the night a few times, relax your bed time by say fifteen minutes, and get used to that. Slowly over time, you increase the amount of sleep you’re getting, while keeping your efficiency high.

    ,

    OpenSTEMElection Activity Bundle

    With the upcoming federal election, many teachers want to do some related activities in class – and we have the materials ready for you! To make selecting suitable resources a bit easier, we have an Election Activity Bundle containing everything you need, available for just $9.90. Did you know that the secret ballot is an Australian […]


    ,

    Julien GoodwinBuilding new pods for the Spectracom 8140 using modern components

    I've mentioned a bunch of times on the time-nuts list that I'm quite fond of the Spectracom 8140 system for frequency distribution. For those not familiar with it, it's simply running a 10MHz signal against a 12v DC power feed so that line-powered pods can tap off the reference frequency and use it as an input to either a buffer (10MHz output pods), decimation logic (1MHz, 100kHz etc.), or a full synthesizer (Versa-pods).

    It was only in October last year that I got a house frequency standard going using an old Efratom FRK-LN which now provides the reference; I'd use a GPSDO, but I live in a ground floor apartment without a usable sky view, which of course also makes it hard to test some of the GPS projects I'm doing. Despite living in a tiny apartment I have test equipment in two main places, so the 8140 is a great solution to allow me to lock all of them to the house standard.


    (The rubidium is in the chunky aluminium chassis underneath the 8140)

    Another benefit of the 8140 is that many modern pieces of equipment (such as my [HP/Agilent/]Keysight oscilloscope) have a single connector for reference frequency in/out, and should the external frequency ever go away it will switch back to its internal reference, but also send that back out the connector, which could lead to other devices sharing the same signal switching to it. The easy way to avoid that is to use a dedicated port from a distribution amplifier for each device like this, which works well enough until you have this situation in multiple locations.

    As previously mentioned the 8140 system uses pods to add outputs, while these pods are still available quite cheaply used on eBay (as of this writing, for as low as US$8, but ~US$25/pod has been common for a while), recently the cost of shipping to Australia has gone up to the point I started to plan making my own.

    By making my own pods I also get to add features that the original pods didn't have[1], I started with a quad-output pod with optional internal line termination. This allows me to have feeds for multiple devices with the annoying behaviour I mentioned earlier. The enclosure is a Pomona model 4656, with the board designed to slot in, and offer pads for the BNC pins to solder to for easy assembly.



    This pod uses a Linear Technologies (now Analog Devices) LTC6957 buffer for the input stage replacing a discrete transistor & logic gate combined input stage in the original devices. The most notable change is that this stage works reliably down to -30dBm input (possibly further, couldn't test beyond that), whereas the original pods stop working right around -20dBm.

    As it turns out, although it can handle lower input signal levels, in other ways including power usage it seems very similar. One notable downside is the chip tops out at 4v absolute maximum input, so a separate regulator is used just to feed this chip. The main regulator has also been changed from a 7805 to an LD1117 variant.

    On this version the output stage is the same TI 74S140 dual 4-input NAND gate as was used on the original pods, just in SOIC form factor.

    As with the next board there is one error on the board, the wire loop that forms the ground connection was intended to fit a U-type pin header, however the footprint I used on the boards was just too tight to allow the pins through, so I've used some thin bus wire instead.



    The second major variant I designed was a combo version, allowing sine & square outputs by just switching a jumper, or isolated[2] or line-regenerator (8040TA from Spectracom) versions with a simple sub-board containing just an inductor (TA) or 1:1 transformer (isolated).



    This is the second revision of that board, where the 74S140 has been replaced by a modern TI 74LVC1G17 buffer. This version of the pod, set for sine output, uses almost exactly 30mA of current (since both the old & new pods use linear supplies that's the most sensible unit), whereas the original pods are right around 33mA. The empty pads at the bottom-left are simply placeholders for two 100 ohm resistors to add 50 ohm line termination if desired.

    The board fits into the Pomona 2390 "Size A" enclosures, or for the isolated version the Pomona 3239 "Size B". This is the reason the BNC connectors have to be extended to reach the board, on the isolated boxes the BNC pins reach much deeper into the enclosure.

    If the jumpers were removed, plus the smaller buffer it should be easy to fit a pod into the Pomona "Miniature" boxes too.



    I was also due to create some new personal business cards, so I arranged the circuit down to a single layer (the only jumper is the requirement to connect both ground pins on the connectors) and merged it with some text converted to KiCad footprints to make a nice card on some 0.6mm PCBs. The paper on that photo is covering the link to the build instructions, which weren't written at the time (they're *mostly* done now, I may update this post with the link later).

    Finally, while I was out travelling at the start of April my new (to me) HP 4395A arrived so I've finally got some spectrum output. The output is very similar between the original and my version, with the major notable difference being that my version is 10dB worse at the third harmonic. I lack the equipment (and understanding) to properly measure phase noise, but if anyone in AU/NZ wants to volunteer their time & equipment for an afternoon I'd love an excuse for a field trip.



    Spectrum with input sourced from my house rubidium (natively a 5MHz unit) via my 8140 line. Note that despite saying "ExtRef" the analyzer is synced to its internal 10811 (which is an optional unit, and uses an external jumper, hence the display note).



    Spectrum with input sourced from the analyzer's own 10811, and power from the DC bias generator also from the analyzer.


    1: Or at least I didn't think they had, I've since found out that there was a multi output pod, and one is currently in the post heading to me.
    2: An option on the standard Spectracom pods, albeit a rare one.

    ,

    Jonathan AdamczewskiNavigation Mesh and Sunset Overdrive

    Navigation mesh encodes where in the game world an agent can stand, and where it can go. (here “agent” means bot, actor, enemy, NPC, etc)

    At runtime, the main thing navigation mesh is used for is to find paths between points using an algorithm like A*: https://en.wikipedia.org/wiki/A*_search_algorithm

    In Insomniac’s engine, navigation mesh is made of triangles. Triangle edge midpoints define a connected graph for pathfinding purposes.
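
    A minimal sketch of that representation and the search over it, in Rust rather than the engine’s own code: each node is one edge midpoint with a position and a list of reachable midpoints, and plain A* runs over that graph with straight-line distance as the heuristic. The structures and costs are simplified for illustration.

    use std::cmp::Ordering;
    use std::collections::BinaryHeap;

    // One node per triangle-edge midpoint; edges link midpoints that can reach each
    // other across a triangle. A simplified illustration, not Insomniac's data layout.
    #[derive(Clone)]
    struct Node {
        pos: [f32; 3],
        neighbours: Vec<usize>,
    }

    fn dist(a: [f32; 3], b: [f32; 3]) -> f32 {
        ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2) + (a[2] - b[2]).powi(2)).sqrt()
    }

    // BinaryHeap is a max-heap, so order candidates by reversed f-score (lowest pops first).
    struct Candidate { node: usize, f: f32 }
    impl PartialEq for Candidate { fn eq(&self, o: &Self) -> bool { self.f == o.f } }
    impl Eq for Candidate {}
    impl PartialOrd for Candidate {
        fn partial_cmp(&self, o: &Self) -> Option<Ordering> { Some(self.cmp(o)) }
    }
    impl Ord for Candidate {
        fn cmp(&self, o: &Self) -> Ordering { o.f.total_cmp(&self.f) }
    }

    /// Plain A* over the midpoint graph; the heuristic is straight-line distance to the goal.
    fn a_star(nodes: &[Node], start: usize, goal: usize) -> Option<Vec<usize>> {
        let mut g = vec![f32::INFINITY; nodes.len()];
        let mut came_from = vec![usize::MAX; nodes.len()];
        let mut open = BinaryHeap::new();
        g[start] = 0.0;
        open.push(Candidate { node: start, f: dist(nodes[start].pos, nodes[goal].pos) });

        while let Some(Candidate { node, .. }) = open.pop() {
            if node == goal {
                // Walk the predecessor chain backwards to recover the path.
                let mut path = vec![goal];
                let mut cur = goal;
                while cur != start { cur = came_from[cur]; path.push(cur); }
                path.reverse();
                return Some(path);
            }
            for &next in &nodes[node].neighbours {
                let tentative = g[node] + dist(nodes[node].pos, nodes[next].pos);
                if tentative < g[next] {
                    g[next] = tentative;
                    came_from[next] = node;
                    open.push(Candidate {
                        node: next,
                        f: tentative + dist(nodes[next].pos, nodes[goal].pos),
                    });
                }
            }
        }
        None
    }

    fn main() {
        // Toy graph: three midpoints in a row, 0 <-> 1 <-> 2.
        let nodes = vec![
            Node { pos: [0.0, 0.0, 0.0], neighbours: vec![1] },
            Node { pos: [1.0, 0.0, 0.0], neighbours: vec![0, 2] },
            Node { pos: [2.0, 0.0, 0.0], neighbours: vec![1] },
        ];
        println!("{:?}", a_star(&nodes, 0, 2)); // Some([0, 1, 2])
    }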

    In addition to triangles, we have off-mesh links (“Custom Nav Clues” in Insomniac parlance) that describe movement that isn’t across the ground. These are used to represent any kind of off-mesh connection – could be jumping over a car or railing, climbing up to a rooftop, climbing down a ladder, etc. Exactly what it means for a particular type of bot is handled by clue markup and game code.

    These links are placed by artists and designers in the game environment, and included in prefabs for commonly used bot-traversable objects in the world, like railings and cars.

    Navigation mesh makes certain operations much, much simpler than they would be if done by trying to reason about render or physics geometry.

    Our game world is made up of a lot of small objects, which are each typically made from many triangles.

    Using render or physics geometry to answer the question “can this bot stand here” hundreds of times every frame is not scalable. (Sunset Overdrive had 33ms frames. That’s not a lot of time.)

    It’s much faster to ask: is there navigation mesh where this bot is?

    Navigation mesh is relatively sparse and simple, so the question can be answered quickly. We pre-compute bounding volumes for navmesh, to make answering that question even faster, and if a bot was standing on navmesh last frame, it’s even less work to reason about where they are this frame.
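
    A sketch of that query pattern, with hypothetical structures rather than the engine’s: check the triangle the bot stood on last frame first, then only test tiles whose pre-computed bounds contain the point, then the triangles inside those tiles.

    #[derive(Clone, Copy)]
    struct Aabb { min: [f32; 2], max: [f32; 2] } // bounds in the ground (X-Z) plane

    #[derive(Clone, Copy)]
    struct Tri { a: [f32; 2], b: [f32; 2], c: [f32; 2] }

    struct Tile { bounds: Aabb, tris: Vec<Tri> }

    fn aabb_contains(b: &Aabb, p: [f32; 2]) -> bool {
        p[0] >= b.min[0] && p[0] <= b.max[0] && p[1] >= b.min[1] && p[1] <= b.max[1]
    }

    // Point-in-triangle via the signs of the three edge cross products.
    fn tri_contains(t: &Tri, p: [f32; 2]) -> bool {
        let sign = |a: [f32; 2], b: [f32; 2]| (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]);
        let (d0, d1, d2) = (sign(t.a, t.b), sign(t.b, t.c), sign(t.c, t.a));
        let has_neg = d0 < 0.0 || d1 < 0.0 || d2 < 0.0;
        let has_pos = d0 > 0.0 || d1 > 0.0 || d2 > 0.0;
        !(has_neg && has_pos)
    }

    /// "Is there navmesh where this bot is?": check last frame's triangle first,
    /// then only tiles whose bounds contain the point, then their triangles.
    fn on_navmesh(tiles: &[Tile], p: [f32; 2], cached: Option<(usize, usize)>) -> Option<(usize, usize)> {
        if let Some((ti, tri)) = cached {
            if tri_contains(&tiles[ti].tris[tri], p) {
                return Some((ti, tri)); // cheap early-out for the common case
            }
        }
        for (ti, tile) in tiles.iter().enumerate() {
            if !aabb_contains(&tile.bounds, p) { continue; } // pre-computed bounding volume
            for (i, t) in tile.tris.iter().enumerate() {
                if tri_contains(t, p) { return Some((ti, i)); }
            }
        }
        None
    }

    fn main() {
        let tile = Tile {
            bounds: Aabb { min: [0.0, 0.0], max: [16.0, 16.0] },
            tris: vec![Tri { a: [0.0, 0.0], b: [16.0, 0.0], c: [0.0, 16.0] }],
        };
        println!("{:?}", on_navmesh(&[tile], [1.0, 1.0], None)); // Some((0, 0))
    }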

    In addition to path-finding, navmesh can be useful to quickly and safely limit movement in a single direction. We sweep lines across navmesh to find boundaries to clamp bot movement. For example, a bot animating through a somersault will have its movement through the world clamped to the edge of navmesh, rather than rolling off into who-knows-what.

    (If you’re making a game where you want bots to be able to freely somersault in any direction, you can ignore the navmesh 🙂)
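
    The clamping idea can be sketched as a walk across triangles in the ground plane: follow the desired move from the triangle the bot is standing on, hop across shared edges, and stop at the first boundary edge crossed. This is simplified, hypothetical code, not the engine’s sweep; it assumes counter-clockwise winding and ignores plenty of edge cases the real system has to handle.

    #[derive(Clone, Copy, Debug)]
    struct P { x: f32, z: f32 }

    struct NavTris {
        verts: Vec<P>,
        tris: Vec<[usize; 3]>,               // counter-clockwise vertex indices
        neighbours: Vec<[Option<usize>; 3]>, // neighbour across edge i (v[i] -> v[(i+1)%3])
    }

    // 2D cross product; > 0 when p is to the left of the directed edge a -> b.
    fn side(a: P, b: P, p: P) -> f32 {
        (b.x - a.x) * (p.z - a.z) - (b.z - a.z) * (p.x - a.x)
    }

    // Parameter t along s0 -> s1 where it crosses e0 -> e1, if it does.
    fn crossing(s0: P, s1: P, e0: P, e1: P) -> Option<f32> {
        let d = (s1.x - s0.x) * (e1.z - e0.z) - (s1.z - s0.z) * (e1.x - e0.x);
        if d.abs() < 1e-6 { return None; }
        let t = ((e0.x - s0.x) * (e1.z - e0.z) - (e0.z - s0.z) * (e1.x - e0.x)) / d;
        let u = ((e0.x - s0.x) * (s1.z - s0.z) - (e0.z - s0.z) * (s1.x - s0.x)) / d;
        if (0.0..=1.0).contains(&t) && (0.0..=1.0).contains(&u) { Some(t) } else { None }
    }

    /// Clamp a move from `start` (inside triangle `tri`) towards `end` so it never
    /// leaves the navmesh: walk across shared edges, stop at the first boundary edge.
    fn clamp_move(mesh: &NavTris, mut tri: usize, start: P, end: P) -> P {
        for _ in 0..64 {                                  // guard against degenerate data
            let idx = mesh.tris[tri];
            let v = [mesh.verts[idx[0]], mesh.verts[idx[1]], mesh.verts[idx[2]]];
            let mut exit = None;
            for i in 0..3 {
                let (e0, e1) = (v[i], v[(i + 1) % 3]);
                // The move only leaves through an edge that `end` lies outside of.
                if side(e0, e1, end) < 0.0 {
                    if let Some(t) = crossing(start, end, e0, e1) { exit = Some((i, t)); break; }
                }
            }
            match exit {
                None => return end,                       // `end` is on the current triangle
                Some((i, t)) => match mesh.neighbours[tri][i] {
                    Some(next) => tri = next,             // keep walking across the shared edge
                    None => return P {                    // boundary: clamp to the crossing point
                        x: start.x + (end.x - start.x) * t,
                        z: start.z + (end.z - start.z) * t,
                    },
                },
            }
        }
        start
    }

    fn main() {
        // One CCW triangle with no neighbours: any move off it is clamped to its boundary.
        let mesh = NavTris {
            verts: vec![P { x: 0.0, z: 0.0 }, P { x: 4.0, z: 0.0 }, P { x: 0.0, z: 4.0 }],
            tris: vec![[0, 1, 2]],
            neighbours: vec![[None, None, None]],
        };
        let clamped = clamp_move(&mesh, 0, P { x: 1.0, z: 1.0 }, P { x: 1.0, z: 10.0 });
        println!("{:?}", clamped); // stops on the hypotenuse instead of rolling off
    }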

    Building navmesh requires a complete view of the static world. The generated mesh is only correct when it accounts for all objects: interactions between objects affect the generated mesh in ways that are not easy (or fast) to reason about independently.

    Intersecting objects can become obstructions to movement. Or they can form new surfaces that an agent can stand upon. You can’t really tell what it means to an agent until you mash it all together.

    To do as little work as possible at runtime, we required *all* of the static objects to be loaded at one time to pre-build mesh for Sunset City.

    We keep that pre-built navmesh loaded during the game at all times. For the final version of the game (with both of the areas added via DLC) this required ~55MB memory.

    We use Recast https://github.com/recastnavigation/recastnavigation to generate the triangle mesh, and (mostly for historical reasons) repack this into our own custom format.

    Sunset Overdrive had two meshes: one for “normal” humanoid-sized bots (2m tall, 0.5m radius)

    and one for “large” bots (4.5m tall, 1.35m radius)

    Both meshes are generated as 16x16m tiles, and use a cell size of 0.125m when rasterizing collision geometry.
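
    Back-of-the-envelope, and assuming (purely for illustration – this is my assumption, not a stated engine detail) that the vertical cell size matches the horizontal one, those numbers work out like this:

    import math

    CELL_SIZE = 0.125                        # metres per cell when rasterizing collision
    TILE_EDGE = 16.0                         # metres per tile edge

    print(int(TILE_EDGE / CELL_SIZE))        # 128 cells along each tile edge

    for name, height_m, radius_m in [("normal", 2.0, 0.5), ("large", 4.5, 1.35)]:
        # Agent dimensions expressed in whole cells, rounded up.
        print(name, math.ceil(height_m / CELL_SIZE), "cells tall,",
              math.ceil(radius_m / CELL_SIZE), "cells of radius")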

    There were a few tools used in Sunset Overdrive to add some sense of dynamism to the static environment:

    For pathfinding and bot-steering, we have runtime systems to control bot movement around dynamic obstacles.

    For custom nav clues, we keep track of whether they are in use, to make it less likely that multiple bots are jumping over the same thing at the same time. This can help fan out groups of bots, forcing them to take distinctly different paths.

    Since Sunset Overdrive, we’ve added a dynamic obstruction system based on Detour https://github.com/recastnavigation/recastnavigation to temporarily cut holes in navmesh for larger impermanent obstacles like stopped cars or temporary structures.

    We also have a way to mark-up areas of navmesh so that they can be toggled in a controlled fashion from script. It’s less flexible than the dynamic obstruction system – but it is very fast: toggling flags for tris rather than retriangulation.
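
    A sketch of what "toggling flags for tris" could look like (an illustrative data layout, not our actual format): each triangle carries a small bitmask, the script-facing toggle flips one bit across a marked-up set of triangle indices, and the pathfinder simply skips disabled triangles – no retriangulation involved.

    DISABLED = 1 << 0                        # example flag bit: triangle closed to pathfinding

    def set_region_enabled(tri_flags, tri_indices, enabled):
        """Toggle a marked-up group of triangles on or off in place.

        `tri_flags` is a mutable sequence of per-triangle bitmasks;
        `tri_indices` is the set of triangles the script-visible region covers.
        """
        for i in tri_indices:
            if enabled:
                tri_flags[i] &= ~DISABLED
            else:
                tri_flags[i] |= DISABLED

    def traversable(tri_flags, i):
        # The pathfinder checks this bit instead of rebuilding any mesh.
        return not (tri_flags[i] & DISABLED)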

    I spoke about Sunset Overdrive at the AI Summit a few years back – my slide deck is here:
    Sunset City Express: Improving the NavMesh Pipeline in Sunset Overdrive

    I can also highly recommend @AdamNoonchester‘s talk from GDC 2015:
    AI in the Awesomepocalypse – Creating the Enemies of Sunset Overdrive

    Here’s some navigation mesh, using the default in-engine debug draw (click for larger version)

    What are we looking at? This is a top-down orthographic view of a location in the middle of Sunset City.

    The different colors indicate different islands of navigation mesh – groups of triangles that are reachable from other islands via custom nav clues.
    Bright sections are where sections of navmesh overlap in the X-Z plane.

    There are multiple visualization modes for navmesh.

    Usually, this is displayed over some in-game geometry – it exists to debug/understand the data in game and editor. Depending on what the world looks like, some colors are easier to read than others. (click for larger versions)




    The second image shows the individual triangles – adjacent triangles do not reliably have different colors. And there is stable color selection as the camera moves, almost 🙂

    Also, if you squint, you can make out the 16x16m tile boundaries, so you can get a sense of scale.

    Here’s a map of the entirety of Sunset City:

    “The Mystery of the Mooil Rig” DLC area:

    “Dawn of the Rise of the Fallen Machine” DLC area:

    Referencing the comments from up-thread, these maps represent the places where agents can be. Additionally, there is connectivity information – we have visualization for that as well.

    This image has a few extra in-engine annotations, and some that I added:

    The purple lines represent custom nav clues – one line in each direction that is connected.

    Also marked are some railings with clues placed at regular intervals, a car with clues crisscrossing it, and moored boats with clues that allow enemies to chase the player.

    Also in this image are very faint lines on the mesh that show connectivity between triangles. When a bot is failing to navigate, it can be useful to visualize the connectivity that the mesh thinks it has :)

    The radio tower where the fight with Fizzie takes place:

    The roller coaster:

    The roller coaster tracks are one single, continuous and complete island of navmesh.

    Navigation mesh doesn’t line up neatly with collision geometry, or render geometry. To make it easier to see, we draw it offset +0.5m up in world-space, so that it’s likely to be above the geometry it has been generated for. (A while ago, I wrote a full-screen post effect that drew onto rendered geometry based on proximity to navmesh. I thought it was pretty cool, and it was nicely unambiguous & imho easier to read – but I never finished it, it bitrotted, and I never got back to it, alas.)

    Since shipping Sunset Overdrive, we added support for keeping smaller pieces of navmesh in memory – they’re now loaded in 128x128m parts, along with the rest of the open world.

    @despair‘s recent technical postmortem has a little more on how this works:
    ‘Marvel’s Spider-Man’: A Technical Postmortem

    Even so, we still load all of an open world region to build the navmesh: the asset pipeline doesn’t provide the information needed to generate navmesh for sub-regions efficiently & correctly, so it’s all-or-nothing. (I have ideas on how to improve this. One day…)

    Let me know if you have any questions – preferably via twitter @twoscomplement

     

    This post was originally a twitter thread:

    ,

    Tim SerongHerringback

    It occurs to me that I never wrote up the end result of the support ticket I opened with iiNet after discovering significant evening packet loss on our fixed wireless NBN connection in August 2017.

    The whole saga took about a month. I was asked to run a battery of tests (ping, traceroute, file download and speedtest, from a laptop plugged directly into the NTD) three times a day for three days, then send all the results in so that a fault could be lodged. I did this, but somehow there was a delay in the results being communicated, so that by the time someone actually looked at them, they were considered stale, and I had to run the whole set of tests all over again. It’s a good thing I work from home, because otherwise there’s no way it would be possible to spend half an hour three times a day running tests like this. Having finally demonstrated significant evening slowdowns, a fault was lodged, and eventually NBN Co admitted that there was congestion in the evenings.

    We have investigated and the cell which this user is connected to experiences high utilisation during busy periods. This means that the speed of this service is likely to be reduced, particularly in the evening when more people are using the internet.

    nbn constantly monitors the fixed wireless network for sites which require capacity expansion and we aim to upgrade site capacity before congestion occurs, however sometimes demand exceeds expectations, resulting in a site becoming congested.

    This site is scheduled for capacity expansion in Quarter 4, 2017 which should result in improved performance for users on the site. While we endeavour to upgrade sites on their scheduled date, it is possible for the date to change.

    I wasn’t especially happy with that reply after a support experience that lasted for a month, but some time in October that year, the evening packet loss became less, and the window of time where we experienced congestion shrank. So I guess they did do some sort of capacity expansion.

    It’s been mostly the same since then, i.e. slower in the evenings than during the day, but, well, it could be worse than it is. There was one glitch in November or December 2018 (poor speed / connection issues again, but this time during the day) which resulted in iiNet sending out a new router, but I don’t have a record of this, because it was a couple of hours of phone support that for some reason never appeared in the list of tickets in the iiNet toolbox, and even if it had, once a ticket is closed, it’s impossible to click it to view the details of what actually happened. It’s just a subject line, status and last modified date.

    Fast forward to Monday March 25 2019 – a day with a severe weather warning for damaging winds – and I woke up to 34% packet loss, ping times all over the place (32-494ms), continual disconnections from IRC and a complete inability to use a VPN connection I need for work. I did the power-cycle-everything dance to no avail. I contemplated a phone call to support, then tethered my laptop to my phone instead in order to get a decent connection, and decided to wait it out, confident that the issue had already been reported by someone else after chatting to my neighbour.

    hideous-packet-loss-march-2019

    Tuesday morning it was still horribly broken, so I unplugged the router from the NTD, plugged a laptop straight in, and started running ping, traceroute and speed tests. Having done that I called support and went through the whole story (massive packet loss, unusable connection). They asked me to run speed tests again, almost all of which failed immediately with a latency error. The one that did complete showed about 8Mbps down, compared to the usual ~20Mbps during the day. So iiNet lodged a fault, and said there was an appointment available on Thursday for someone to come out. I said fine, thank you, and plugged the router back in to the NTD.

    Curiously, very shortly after this, everything suddenly went back to normal. If I was a deeply suspicious person, I’d imagine that because I’d just given the MAC address of my router to support, this enabled someone to reset something that was broken at the other end, and fix my connection. But nobody ever told me that anything like this happened; instead I received a phone call the next day to say that the “speed issue” I had reported was just regular congestion and that the tower was scheduled for an upgrade later in the year. I thanked them for the call, then pointed out that the symptoms of this particular issue were completely different to regular congestion and that I was sure that something had actually been broken, but I was left with the impression that this particular feedback would be summarily ignored.

    I’m still convinced something was broken, and got fixed. I’d be utterly unsurprised if there had been some problem with the tower on the Sunday night, given the strong winds, and it took ’til mid-Tuesday to get it sorted. But we’ll never know, because NBN Co don’t publish information about congestion, scheduled upgrades, faults and outages anywhere the general public can see it. I’m not even sure they make this information consistently available to retail ISPs. My neighbour, who’s with a different ISP, sent me a notice that says there’ll be maintenance/upgrades occurring on April 18, then again from April 23-25. There’s nothing about this on iiNet’s status page when I enter my address.

    There was one time in the past few years though, when there was an outage that impacted me, and it was listed on iiNet’s status page. It said “customers in the area of Herringback may be affected”. I initially didn’t realise that meant me, as I’d never heard of a suburb, region, or area called Herringback. Turns out it’s the name of the mountain our NBN tower is on.

    ,

    Robert CollinsContinuous Delivery and software distributors

    Back in 2010 the continuous delivery meme was just gaining traction. Today it’s extremely well established… except in F/LOSS projects.

    I want that to change, so I’m going to try and really bring together a technical view on how that could work – which may require multiple blog posts – and if it gets traction I’ll put my fingers where my thoughts are and get into specifics with any project that wants to do this.

    This is however merely a worked model today: it may be possible to do things quite differently, and I welcome all discussion about the topic!

    tl;dr

    Pick a service discovery mechanism (e.g. environment variables), write two small APIs – one for flag delivery, with streaming updates, and one for telemetry, with an optional aggressive data hiding proxy, then use those to feed enough data to drive a true CI/CD cycle back to upstream open source projects.

    Who is in?

    Background

    (This assumes you know what C/D is – if you don’t, go read the link above, maybe wikipedia etc, then come back.)

    Consider a typical SaaS C/D pipeline:

    git -> build -> test -> deploy

    Here all stages are owned by the one organisation. Once deployed, the build is usable by users – it’s basically the simplest pipeline around.

    Now consider a typical on-premise C/D pipeline:

    git -> build -> test -> expose -> install

    Here the last stage, the install stage, takes place in the user’s context, but it may be under the control of the creator, or it may be under the control of the user. For instance, Google Play updates on an Android phone: when one selects ‘Update Now’, the install phase is triggered. Leaving the phone running with power and Wi-Fi will trigger it automatically, and security updates can be pushed anytime. Continuing the use of Google Play as an example, the expose step here is an API call to upload precompiled packages, so while there are three parties, the distributor – Google – isn’t performing any software development activities (they do gatekeep, but not develop).

    Where it gets awkward is when there are multiple parties doing development in the pipeline.

    Distributing and C/D

    Let’s consider an OpenStack cloud underlay circa 2015: an operating system, OpenStack itself, some configuration management tool (or tools), a log egress tool, a metrics egress handler, hardware mgmt vendor binaries. And let’s say we’re working on something reasonably standalone. Say horizon.

    OpenStack for most users is something obtained from a vendor. E.g. Cisco or Canonical or RedHat. And the model here is that the vendor is responsible for what the user receives; so security fixes – in particular embargoed security fixes – cannot simply be published publicly and then slowly propagate. They must reach users very quickly. Often, ideally, before the public publication.

    Now we have something like this:

    upstream ends with distribution, then vendor does an on-prem pipeline


    Can we not just say ‘the end of the C/D pipeline is a .tar.gz of horizon at the distribute step’? Then every organisation can make their own decisions?

    Maybe…

    Why C/D?

    • Lower risk upgrades (smaller changes that can be reasoned about better; incremental enablement of new implementations to limit blast radius, decoupling shipping and enablement of new features)
    • Faster delivery of new features (less time dealing with failed upgrades == more time available to work on new features; finished features spend less time in inventory before benefiting users).
    • Better code hygiene (the same disciplines needed to make C/D safe also make more aggressive refactoring and tidiness changes safer to do, so it gets done more often).

    1. If the upstream C/D pipeline stops at a tar.gz file, the lower-risk upgrade benefit is reduced or lost: the pipeline isn’t able to actually push all the way to installation, and thus we cannot tell when a particular upgrade workaround is no longer needed.

    But Robert, that is the vendor’s problem!

    I wish it was: in OpenStack so many vendors had the same problem that they created shared branches to work on it, then asked for shared time from the project to perform C/I on those branches. The benefit is only realised when the developer who is responsible for creating the issue can fix it, and can be sure that the fix has been delivered; this means either knowing that every install will transiently install every intermediary version, or that they will keep every workaround for every issue for some minimum time period, or that there will be a pipeline that can actually deliver the software.

    2. .tar.gz files are not installed and running systems. A key characteristic of a C/D pipeline is that it exercises the installation and execution of software; the ability to run a component up is quite tightly coupled to the component itself: for all that the ‘this is a process’ interface is very general, the specific ‘this is server X’ or ‘this is CLI utility Y’ interfaces are very concrete. Perhaps a container based approach, where in many ways a much narrower interface can be defined, could be used to mitigate this aspect. Then even if different vendors use different config tools to do last mile config, the dev cycle knows that configuration and execution works. We need to make sure that we don’t separate the teams and their products though: the pipeline upstream must only test code that is relevant to upstream – and downstream likewise. We may be able to find a balance here, but I think more work articulating what that looks like is needed.

    3. It will break the feedback cycle if the running metrics are not received upstream; yes, we need to be careful of privacy aspects, but basic telemetry – the upgrade worked, the upgrade failed, here is a crash dump – these are the tools for sifting through failure at scale, and a number of open source projects like Firefox, Ubuntu and Chromium have adopted them, with great success. Notably all three have direct delivery models: their preference is to own the relationship with the user and gather such telemetry directly.

    C/D and technical debt

    Sidebar: ignoring public APIs and external dependencies, because they form the contract that installations and end users interact with, which we can reasonably expect to be quite sticky, the rest of a system should be entirely up to the maintainers, right? Refactor the DB; switch frameworks, switch languages. Clean up classes and so on. With microservices there is a grey area: APIs that other microservices use which are not publicly supported.

    The grey area is crucial, because it is where development drag comes in: anything internal to the system can be refactored in a single commit, or in a series of small commits that is rolled up into one, or variations on this theme.

    But some aspect that another discrete component depends upon, with its own delivery cycle: that cannot be fixed, and unless it was built with the same care public APIs were, it may well have poor scaling or performance characteristics that make fixing it very important.

    Given two C/D’d components A and B, where A wants to remove some private API B uses, A cannot delete that API from its git repo until all B’s everywhere that receive A via C/D have been deployed with a version that does not use the private API.

    That is, old versions of B place technical debt on A across the interfaces of A that they use. And this actually applies to public interfaces too – even if they are more sticky, we can expect the components of an ecosystem to update to newer APIs that are cheaper to serve, and laggards hold performance back, keep stale code alive in the codebase for longer and so on.

    This places a secondary requirement on the telemetry: we need to be able to tell whether the fleet is upgraded or not.

    So what does a working model look like?

    I think we need a different diagram than the pipeline; the pipeline talks about the things most folk doing an API or some such project will have directly in hand, but it’s not actually the full story. The full story is rounded out with two additional features: feature flags and telemetry. And since we want to protect our users, and distributors probably will simply refuse to provide insights into actual users, let’s assume a near-zero-trust model around both.

    Feature flags

    As I discussed in my previous blog post, feature flags can be used for fairly arbitrary purposes, but in this situation, where trust is limited, I think we need to identify the crucial C/D enabling use cases, and design for them.

    I think that those can be reduced to soft launches – decoupling activating new code paths from getting them shipped out onto machines – and kill switches – killing off flawed / faulty code paths when they start failing, in advance of a massive cascade failure. We can implement both with essentially the same thing: some identifier for a code path and then a percentage of the deployed base to enable it on. If we define this API with efficient streaming updates and a consistent service discovery mechanism for the flag API, then this could be replicated by vendors and other distributors, or even each user, and the feature API data pulled downstream in near real time.
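
    To make that concrete (my illustration only, not a proposed wire format): hashing a stable install identifier together with the flag name into a percentage bucket gives a deterministic soft launch, and dropping the rollout percentage back to zero is the kill switch.

    import hashlib

    def flag_enabled(flag_name, install_id, rollout_percent):
        """Deterministic percentage rollout for a named code path.

        The same (flag, install) pair always lands in the same bucket, so ramping
        rollout_percent from 0 to 100 only ever adds installs; setting it back to
        0 disables the code path everywhere (the kill switch).
        """
        digest = hashlib.sha256(f"{flag_name}:{install_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < rollout_percent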

    Telemetry

    The difficulty with telemetry APIs is that they can egress anything. OTOH this is open source code, so malicious telemetry would be visible. But we can structure it to make it harder to violate privacy.

    What does the C/D cycle need from telemetry, and what privacy do we need to preserve?

    This very much needs discussion with stakeholders, but at a first approximation: the C/D cycle depends on knowing what versions are out there and whether they are working. It depends on knowing what feature flags have actually been activated in the running versions. It doesn’t depend on absolute numbers of either feature flags or versions.

    Using Google Play again as an example, there is prior art – https://support.google.com/firebase/answer/6317485 – but I want to think truly minimally, because the goal I have is to enable C/D in situations with vastly different trust levels than Google Play has. However, perhaps this isn’t enough; perhaps we do need generic events and the ability to get deeper telemetry to enable confidence.

    That said, let us sketch what an API document for that might look like:

    project:
    version:
    health:
    flags:
    - name:
      value:
    

    If that was reported by every deployed instance of a project, once per hour, maybe with a dependencies version list added to deal with variation in builds, it would trivially reveal the cardinality of reporters. Many reporters won’t care (for instance QA testbeds). Many will.

    If we aggregate through a cardinality hiding proxy, then that vector is addressed – something like this:

    - project:
      version:
      weight:
      health:
      flags:
      - name:
        value:
    - project: ...
    

    Because this data is really only best effort, such a proxy could be backed by memcache or even just an in-memory store, depending on what degree of ‘cloud-nativeness’ we want to offer. It would receive accurate data, then deduplicate to get relative weights, round those to (say) 5% as a minimum to avoid disclosing too much about long tail situations (and yes, the sum of 100 1% reports would exceed 100 :)), and then push that up.
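
    A sketch of that aggregation step (again illustrative, and ignoring transport entirely): deduplicate the raw reports, convert counts into relative weights, round to a 5% granularity with a 5% floor, and forward only the rounded figures.

    from collections import Counter

    def aggregate(reports, step_percent=5):
        """Collapse raw per-instance reports into rounded relative weights.

        `reports` is an iterable of (project, version, health, flags) tuples, one
        per reporting instance in the window, where `flags` is a tuple of
        (name, value) pairs. Absolute counts never leave the proxy.
        """
        counts = Counter(reports)
        total = sum(counts.values())
        rollup = []
        for (project, version, health, flags), n in counts.items():
            percent = 100.0 * n / total
            # Round to step_percent granularity with step_percent as the floor, so
            # long-tail configurations aren't individually identifiable; as noted
            # above, the rounded weights can legitimately sum to more than 100.
            weight = max(step_percent, step_percent * round(percent / step_percent))
            rollup.append({
                "project": project,
                "version": version,
                "weight": weight,
                "health": health,
                "flags": [{"name": k, "value": v} for k, v in flags],
            })
        return rollup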

    Open Questions

    • Should library projects report, or are they only used in the context of an application/service?
      • How can we help library projects answer questions like ‘has every user stopped using feature Y so that we can finally remove it’ ?
    • Would this be enough to get rid of the fixation on using stable branches everyone seems to have?
      • If not why not?
    • What have I forgotten?

    ,

    Glen TurnerJupyter notebook and R

    This has become substantially simpler in Fedora 29:

    sudo dnf install notebook R-IRKernel R-IRdisplay
    



    ,

    Glen TurnerRipping language-learning CDs

    It might be tempting to use MP3's variable bit rate for encoding ripped foreign-language CDs. With the large periods of silence, that would seem to make a lot of sense. But you lose the ability to rewind to an exact millisecond, which turns out to be essential when you want to hear a particular phrase a handful of times. So use CBR -- constant bit rate -- encoding, at a high bit rate like 160kbps.
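
    For example – assuming the rip has already produced WAV files and that lame is the encoder in use, both of which are my assumptions here – a constant-bit-rate batch encode might look like:

    import subprocess
    from pathlib import Path

    def encode_cbr(wav_dir, bitrate_kbps=160):
        """Encode each ripped WAV as a constant-bit-rate MP3 using lame."""
        for wav in sorted(Path(wav_dir).glob("*.wav")):
            subprocess.run(
                ["lame", "--cbr", "-b", str(bitrate_kbps), str(wav), str(wav.with_suffix(".mp3"))],
                check=True,
            )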




    ,

    Glen Turnerwpa_supplicant update trades off interoperation for security

    In case you want to choose a different security compromise, the update has a nice summary:

    wpasupplicant (2:2.6-19) unstable; urgency=medium
    
      With this release, wpasupplicant no longer respects the system
      default minimum TLS version, defaulting to TLSv1.0, not TLSv1.2. If
      you're sure you will never connect to EAP networks requiring anything less
      than 1.2, add this to your wpasupplicant configuration:
    
        tls_disable_tlsv1_0=1
        tls_disable_tlsv1_1=1
    
      wpasupplicant also defaults to a security level 1, instead of the system
      default 2. Should you need to change that, change this setting in your
      wpasupplicant configuration:
    
        openssl_ciphers=DEFAULT@SECLEVEL=2
    
      Unlike wpasupplicant, hostapd still respects system defaults.
    
     -- Andrej Shadura <…@debian.org>  Sat, 15 Dec 2018 14:22:18 +0100



    ,

    Glen TurnerFinding git credentials in libsecret

    To find passwords in libsecret you need to know what attributes to search for. These are often set by some shim but not documented. The attributes tend to vary by shim.

    For git's libsecret shim the attributes are: protocol, server, user.

    A worked example, the account gdt on git.example.org:

    $ secret-tool search --all 'protocol' 'https' 'server' 'git.example.org' 'user' 'gdt'
    [/org/freedesktop/secrets/collection/login/123]
    label = Git: https://git.example.org/
    secret = CvKxlezMsSDuR7piMBTzREJ7l8WL1T
    created = 2019-02-01 10:20:34
    modified = 2019-02-01 10:20:34
    schema = org.gnome.keyring.NetworkPassword
    attribute.protocol = https
    attribute.server = git.example.org
    attribute.user = gdt
    

    Note that the "label" is mere documentation, it's the "attribute" entries which matter.




    ,

    Tim SerongDistributed Storage is Easier Now: Usability from Ceph Luminous to Nautilus

    On January 21, 2019 I presented Distributed Storage is Easier Now: Usability from Ceph Luminous to Nautilus at the linux.conf.au 2019 Systems Administration Miniconf. Thanks to the incredible Next Day Video crew, the video was online the next day, and you can watch it here:

    If you’d rather read than watch, the meat of the talk follows, but before we get to that I have two important announcements:

    1. Cephalocon 2019 is coming up on May 19-20, in Barcelona, Spain. The CFP is open until Friday February 1, so time is rapidly running out for submissions. Get onto it.
    2. If you’re able to make it to FOSDEM on February 2-3, there’s a whole Software Defined Storage Developer Room thing going on, with loads of excellent content including What’s new in Ceph Nautilus – project status update and preview of the coming release and Managing and Monitoring Ceph with the Ceph Manager Dashboard, which will cover rather more than I was able to here.

    Back to the talk. At linux.conf.au 2018, Sage Weil presented “Making distributed storage easy: usability in Ceph Luminous and beyond”. What follows is somewhat of a sequel to that talk, covering the changes we’ve made in the meantime, and what’s still coming down the track. If you’re not familiar with Ceph, you should probably check out A Gentle Introduction to Ceph before proceeding. In brief though, Ceph provides object, block and file storage in a single, horizontally scalable cluster, with no single points of failure. It’s Free and Open Source software, it runs on commodity hardware, and it tries to be self-managing wherever possible, so it notices when disks fail, and replicates data elsewhere. It does background scrubbing, and it tries to balance data evenly across the cluster. But you do still need to actually administer it.

    This leads to one of the first points Sage made this time last year: Ceph is Hard. Status display and logs were traditionally difficult to parse visually, there were (and still are) lots of configuration options, tricky authentication setup, and it was difficult to figure out the number of placement groups to use (which is really an internal detail of how Ceph shards data across the cluster, and ideally nobody should need to worry about it). Also, you had to do everything with a CLI, unless you had a third-party GUI.

    I’d like to be able to flip this point to the past tense, because a bunch of those things were already fixed in the Luminous release in August 2017; status display and logs were cleaned up, a balancer module was added to help ensure data is spread more evenly, crush device classes were added to differentiate between HDDs and SSDs, a new in-tree web dashboard was added (although it was read-only, so just cluster status display, no admin tasks), plus a bunch of other stuff.

    But we can’t go all the way to saying “Ceph was hard”, because that might imply that everything is now easy. So until we reach that frabjous day, I’m just going to say that Ceph is easier now, and it will continue to get easier in future.

    At linux.conf.au in January 2018, we were half way through the Mimic development cycle, and at the time the major usability enhancements planned included:

    • Centralised configuration management
    • Slick deployment in Kubernetes with Rook
    • A vastly improved dashboard based on ceph-mgr and openATTIC
    • Placement Group merging

    We got some of that stuff done for Mimic, which was released in June 2018, and more of it is coming in the Nautilus release, which is due out very soon.

    In terms of usability improvements, Mimic gave us a new dashboard, inspired by and derived from openATTIC. This dashboard includes all the features of the Luminous dashboard, plus username/password authentication, SSL/TLS support, RBD and RGW management, and a configuration settings browser. Mimic also brought the ability to store and manage configuration options centrally on the MONs, which means we no longer need to set options in /etc/ceph/ceph.conf, replicate that across the cluster, and restart whatever daemons were affected. Instead, you can run `ceph config set ...` to make configuration changes. For initial cluster bootstrap, you can even use DNS SRV records rather than specifying MON hosts in the ceph.conf file.

    As I mentioned, the Nautilus release is due out really soon, and will include a bunch more good stuff:

    • PG autoscaling
    • More dashboard enhancements, including:
      • Multiple users/roles, also single sign on via SAML
      • Internationalisation and localisation
      • iSCSI and NFS Ganesha management
      • Embedded Grafana dashboards
      • The ability to mark OSDs up/down/in/out, and trigger scrubs/deep scrubs
      • Storage pool management
      • A configuration settings editor which actually tells you what the configuration settings mean, and do
      • To see what this all looks like, check out Ceph Manager Dashboard Screenshots as of 2019-01-17
    • Blinky lights, that being the ability to turn on or off the ident and fault LEDs for the disk(s) backing a given OSD, so you can find the damn things in your DC.
    • Orchestrator module(s)

    Blinky lights, and some of the dashboard functionality (notably configuring iSCSI gateways and NFS Ganesha) means that Ceph needs to be able to talk to whatever tool it was that deployed the cluster, which leads to the final big thing I want to talk about for the Nautilus release, which is the Orchestrator modules.

    There’s a bunch of ways to deploy Ceph, and your deployment tool will always know more about your environment, and have more power to do things than Ceph itself will, but if you’re managing Ceph, through the inbuilt dashboard and CLI tools, there’s things you want to be able to do as a Ceph admin, that Ceph itself can’t do. Ceph can’t deploy a new MDS, or RGW, or NFS Ganesha host. Ceph can’t deploy new OSDs by itself. Ceph can’t blink the lights on a disk on some host if Ceph itself has somehow failed, but the host is still up. For these things, you rely on your deployment tool, whatever it is. So Nautilus will include Orchestrator modules for Ansible, DeepSea/Salt, and Rook/Kubernetes, which allow the Ceph management tools to call out to your deployment tool as necessary to have it perform those tasks. This is the bit I’m working on at the moment.

    Beyond Nautilus, Octopus is the next release, due in a bit more than nine months, and on the usability front I know we can expect more dashboard and more orchestrator functionality, but before that, we have the Software Defined Storage Developer Room at FOSDEM on February 2-3 and Cephalocon 2019 on May 19-20. Hopefully some of you reading this will be able to attend 🙂

    Update 2019-02-04: Check out Sage’s What’s new in Ceph Nautilus FOSDEM talk for much more detail on what’s coming up in Nautilus and beyond.